Paragon AI Framework

High-performance, type-generic neural networks with WebGPU acceleration, ADHD metrics, and neural architecture search

🏆 MNIST Benchmark Results

99.13% Training Accuracy
97.62% Test Accuracy

RTX 3050 Mobile • 3m19s training • 56K samples

Introduction

A cutting-edge neural network framework built for performance and flexibility

Paragon is a revolutionary neural network framework written in Go, engineered for exceptional performance and unprecedented flexibility. It pioneers type-generic neural networks, WebGPU acceleration, and innovative evaluation metrics that set new standards for AI development.

Type-Generic Networks

Revolutionary support for multiple numeric types (float32, int32, uint32, int8, uint16) enabling optimized neural networks for different hardware configurations and precision requirements.

WebGPU Acceleration

Native WebGPU integration providing up to 3x performance improvements on compatible hardware. Supports NVIDIA RTX series, Intel Iris Xe, and other modern GPUs.

ADHD Metrics System

Advanced Accuracy Deviation Heatmap Distribution system provides granular performance insights, categorizing predictions into meaningful deviation buckets for targeted optimization.

Neural Architecture Search

Built-in network growth capabilities with dynamic architecture evolution, micro-network surgery, and distributed optimization for automatic model improvement.

Advanced Checkpointing

Comprehensive model persistence with JSON serialization, layer-wise checkpointing, and cross-platform compatibility for seamless deployment workflows.

Batch Processing

Efficient batch processing with GPU-optimized pipelines, supporting thousands of samples with automatic memory management and dynamic workgroup sizing.

🚀 Real-World Performance

MNIST Training
56,000 samples in 3m19s
GPU Acceleration
3x faster than CPU-only
Memory Efficiency
55 MiB VRAM usage
Cross-Platform
Linux, Windows, macOS

Getting Started

Quick setup guide to get Paragon running on your system

Prerequisites: Go 1.19+ installed. For WebGPU acceleration: compatible GPU with updated drivers (NVIDIA RTX series, Intel Iris Xe, or AMD RDNA2+).

Install Dependencies

Installation Commands bash
go mod init your-project
go get github.com/openfluke/paragon/[email protected]
go get github.com/openfluke/[email protected]
go get github.com/openfluke/webgpu@ea0f165

Create Your First Network

main.go - Basic Example go
package main

import (
    "fmt"
    "github.com/openfluke/paragon/v3"
)

func main() {
    // Create a type-generic neural network
    nn := paragon.NewNetwork[float32](
        []struct{ Width, Height int }{
            {28, 28},  // Input layer (784 neurons for MNIST)
            {32, 32},  // Hidden layer (1024 neurons)
            {10, 1},   // Output layer (10 classes)
        },
        []string{"linear", "relu", "softmax"},
        []bool{true, true, true}, // Fully connected
    )
    
    // Enable WebGPU acceleration
    nn.WebGPUNative = true
    if err := nn.InitializeOptimizedGPU(); err != nil {
        fmt.Printf("GPU init failed: %v, using CPU\n", err)
        nn.WebGPUNative = false
    } else {
        fmt.Println("✅ WebGPU acceleration enabled")
        defer nn.CleanupOptimizedGPU()
    }
    
    fmt.Printf("🚀 Network created: %d layers, %s acceleration\n", 
        len(nn.Layers), 
        map[bool]string{true: "GPU", false: "CPU"}[nn.WebGPUNative])
}

Train with Real Data

Training Example go
// Load your training data
trainInputs := [][][]float64{ /* your input data */ }
trainTargets := [][][]float64{ /* your target data */ }

// Train with GPU synchronization
nn.TrainWithGPUSync(
    trainInputs, trainTargets,
    10,        // epochs
    0.05,      // learning rate
    false,     // early stopping
    float32(2), float32(-2), // gradient clipping
)

// Evaluate with ADHD metrics
expected := []float64{ /* expected labels */ }
actual := []float64{ /* predicted labels */ }
nn.EvaluateModel(expected, actual)

fmt.Printf("📊 ADHD Score: %.2f%%\n", nn.Performance.Score)
fmt.Printf("📈 Accuracy: %.2f%%\n", 
    float64(nn.Performance.Buckets["0-10%"].Count) / 
    float64(nn.Performance.Total) * 100)

Cross-Platform GPU Setup

Platform-Specific Commands bash
# Linux (NVIDIA RTX with Optimus)
__NV_PRIME_RENDER_OFFLOAD=1 __GLX_VENDOR_LIBRARY_NAME=nvidia ./your-app

# Linux (Fedora with NVIDIA drivers)
sudo dnf install nvidia-driver nvidia-settings

# Windows (NVIDIA/Intel automatic detection)
./your-app.exe

# macOS (Metal backend)
./your-app

Performance Benchmarks

Real-world performance metrics and comparisons

🏆 MNIST Benchmark Results

GPU NVIDIA RTX 3050 Mobile

  • Training Time: 3m19s (10 epochs)
  • Training Accuracy: 99.13% (overall ADHD score; the 0-10% bucket accuracy is 98.68%)
  • Test Accuracy: 97.62%
  • VRAM Usage: 55 MiB
  • GPU Utilization: 75% @ 38W
  • Speed: ~19.96s/epoch

CPU Intel i5-12500H Fallback

  • Training Time: ~9m35s (10 epochs)
  • Speed: ~57.5s/epoch
  • Memory Usage: ~2GB RAM
  • CPU Threads: 16 utilized
  • Relative to GPU: ~3x slower

ADHD Performance Breakdown output
📈 ADHD Performance (Train Set):
- 0-10%: 55,263 samples (98.68%)    ✅ Excellent predictions
- 10-20%: 108 samples (0.19%)       ✅ Good predictions  
- 20-30%: 65 samples (0.12%)        ⚠️  Moderate errors
- 30-40%: 50 samples (0.09%)        ⚠️  Moderate errors
- 40-50%: 43 samples (0.08%)        ⚠️  High errors
- 50-100%: 224 samples (0.40%)      ❌ Very high errors
- 100%+: 247 samples (0.44%)        ❌ Complete failures

📊 Overall ADHD Score: 99.13%
🎯 Classification Accuracy: 98.68%
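The percentages in this breakdown follow directly from the bucket counts, and the classification accuracy equals the share of the 0-10% bucket. A minimal sketch that recomputes them is shown here; the `bucket` type and the helper names are illustrative and not part of Paragon.

```go
package main

import "fmt"

// bucket pairs a deviation range with its sample count.
type bucket struct {
	Label string
	Count int
}

// trainBuckets copies the counts from the training-set breakdown above.
var trainBuckets = []bucket{
	{"0-10%", 55263}, {"10-20%", 108}, {"20-30%", 65}, {"30-40%", 50},
	{"40-50%", 43}, {"50-100%", 224}, {"100%+", 247},
}

// totalCount sums the sample counts across all buckets.
func totalCount(buckets []bucket) int {
	total := 0
	for _, b := range buckets {
		total += b.Count
	}
	return total
}

// share returns count as a percentage of total.
func share(count, total int) float64 {
	return float64(count) / float64(total) * 100
}

func main() {
	total := totalCount(trainBuckets)
	fmt.Printf("total samples: %d\n", total)
	for _, b := range trainBuckets {
		fmt.Printf("%-8s %6d (%.2f%%)\n", b.Label, b.Count, share(b.Count, total))
	}
}
```

The counts add up to 56,000, which matches the training set size quoted in the benchmark.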

🚀 Performance Highlights

  • GPU Acceleration: 3x faster than CPU with automatic fallback
  • Memory Efficient: Only 55 MiB VRAM for 56K training samples
  • Cross-Platform: Consistent performance across Linux, Windows, macOS
  • Scalable: Reported to outperform TensorFlow/PyTorch without batching optimization; no comparative figures are included in these benchmarks
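The per-epoch times and the roughly 3x GPU advantage can be checked from the quoted totals. The sketch below uses the totals rounded to whole seconds, so its per-epoch values may differ slightly from the reported ones; the constant and function names are illustrative.

```go
package main

import (
	"fmt"
	"time"
)

// Training totals quoted in the benchmark lists, rounded to whole seconds.
const (
	gpuTotal = 3*time.Minute + 19*time.Second
	cpuTotal = 9*time.Minute + 35*time.Second
	epochs   = 10
)

// perEpoch divides a total training time evenly across n epochs.
func perEpoch(total time.Duration, n int) time.Duration {
	return total / time.Duration(n)
}

// speedup returns how many times longer the slow run took than the fast run.
func speedup(slow, fast time.Duration) float64 {
	return slow.Seconds() / fast.Seconds()
}

func main() {
	fmt.Printf("GPU: %v per epoch\n", perEpoch(gpuTotal, epochs))
	fmt.Printf("CPU: %v per epoch\n", perEpoch(cpuTotal, epochs))
	fmt.Printf("CPU/GPU time ratio: %.2fx\n", speedup(cpuTotal, gpuTotal))
}
```

With these inputs the ratio comes out at about 2.9x, consistent with the "~3x" figure.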

Type System

Revolutionary type-generic neural networks

Paragon pioneered type-generic neural networks, allowing you to choose the optimal numeric precision for your specific use case, hardware constraints, and performance requirements.

FLOAT Floating Point Types

  • float32: GPU-optimized, WebGPU native
  • float64: High precision, CPU intensive

Best for: GPU acceleration, standard training

INT Integer Types

  • int32: Memory efficient, fixed-point math
  • int8/int16: Ultra-low memory, quantized
  • uint32/uint16: Positive-only, specialized

Best for: Edge devices, quantized inference

Type-Generic Network Creation go
// GPU-optimized float32 network
nnFloat32 := paragon.NewNetwork[float32](layerSizes, activations, connectivity)

// Memory-efficient int32 network  
nnInt32 := paragon.NewNetwork[int32](layerSizes, activations, connectivity)

// Ultra-compact int8 network for edge deployment
nnInt8 := paragon.NewNetwork[int8](layerSizes, activations, connectivity)

// Specialized uint32 network for positive-only data
nnUint32 := paragon.NewNetwork[uint32](layerSizes, activations, connectivity)

// Convert between types seamlessly
convertedNet, err := paragon.ConvertNetwork[float32, int32](nnFloat32)
if err != nil {
    fmt.Printf("⚠️ Conversion failed: %v\n", err)
} else {
    fmt.Printf("✅ Successfully converted float32 → int32 network (%d layers)\n",
        len(convertedNet.Layers))
}

💡 Type Selection Guide: Use float32 for GPU acceleration and standard training. Choose int32 for memory-constrained environments. Use int8 for ultra-low-power edge devices. Consider uint32 for specialized positive-only data domains.
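To illustrate what "type-generic" means in Go terms, here is a minimal sketch of a numeric constraint and a generic weighted sum. The `Numeric` constraint below is a hypothetical stand-in covering the types listed above; Paragon's own constraint and neuron math may be defined differently.

```go
package main

import "fmt"

// Numeric is a hypothetical constraint covering the numeric types listed
// above. It is an assumption for illustration, not Paragon's definition.
type Numeric interface {
	~int8 | ~int16 | ~int32 | ~uint16 | ~uint32 | ~float32 | ~float64
}

// weightedSum computes bias + sum(inputs[i]*weights[i]) entirely in type T.
// Extra elements in the longer slice are ignored.
func weightedSum[T Numeric](inputs, weights []T, bias T) T {
	sum := bias
	for i := range inputs {
		if i >= len(weights) {
			break
		}
		sum += inputs[i] * weights[i]
	}
	return sum
}

func main() {
	fmt.Println(weightedSum([]float32{0.5, 0.25}, []float32{2, 4}, 1)) // 3
	fmt.Println(weightedSum([]int32{3, 4}, []int32{2, 5}, 1))          // 27
	// Narrow integer types wrap silently on overflow: 100*2 does not fit in int8.
	fmt.Println(weightedSum([]int8{100}, []int8{2}, 0)) // -56
}
```

The last call shows why narrow integer networks generally need scaled or quantized weights: arithmetic in `int8` wraps around instead of saturating.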

WebGPU Acceleration

Native GPU computing with automatic optimization

Paragon features cutting-edge WebGPU integration that automatically optimizes neural network computations for modern GPUs, delivering significant performance improvements with seamless CPU fallback.

NVIDIA RTX Series

  • RTX 3050, 3060, 3070, 3080
  • RTX 4060, 4070, 4080, 4090
  • Driver 575.57.08+ recommended

INTEL Iris Xe & Arc

  • Iris Xe integrated graphics
  • Arc A380, A750, A770
  • Latest Intel drivers required

AMD RDNA2+

  • RX 6600, 6700, 6800, 6900
  • RX 7600, 7700, 7800, 7900
  • Mesa 23.0+ on Linux

GPU Initialization & Optimization go
// Enable WebGPU with automatic optimization
nn.WebGPUNative = true

// Initialize optimized GPU pipelines
if err := nn.InitializeOptimizedGPU(); err != nil {
    fmt.Printf("⚠️ GPU initialization failed: %v\n", err)
    fmt.Println("   Falling back to CPU computation...")
    nn.WebGPUNative = false
} else {
    fmt.Println("✅ WebGPU acceleration enabled")
    
    // GPU info and capabilities
    gpuInfo, _ := nn.GetAllGPUInfo()
    for _, gpu := range gpuInfo {
        fmt.Printf("🚀 Using: %s (%s)\n", gpu["name"], gpu["vendorName"])
        fmt.Printf("   Max Buffer: %s MB\n", gpu["maxBufferSizeMB"])
        fmt.Printf("   Max Compute Invocations: %s\n", gpu["maxComputeInvocations"])
    }
    
    // Cleanup when done
    defer nn.CleanupOptimizedGPU()
}

// Training automatically uses GPU when available
nn.TrainWithGPUSync(inputs, targets, epochs, lr, false, clipUpper, clipLower)

// Forward pass with GPU acceleration
nn.Forward(testInput)

// Batch processing with GPU optimization
outputs, err := nn.ForwardBatch(batchInputs)

🔧 Platform Setup Tips:

  • Linux NVIDIA: Use __NV_PRIME_RENDER_OFFLOAD=1 for Optimus systems
  • Fedora: Install nvidia-driver package for WebGPU support
  • Windows: GPU auto-detection works out of the box
  • macOS: Metal backend provides excellent performance

ADHD Metrics System

Advanced Accuracy Deviation Heatmap Distribution analysis

The ADHD (Accuracy Deviation Heatmap Distribution) system revolutionizes model evaluation by categorizing prediction deviations into meaningful buckets, providing unprecedented insights into model behavior and enabling targeted optimization strategies.

📊 Deviation Categories

  • 0-10%: Excellent predictions (target zone)
  • 10-20%: Good predictions
  • 20-50%: Moderate errors
  • 50-100%: High errors
  • 100%+: Complete failures
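A minimal sketch of how such bucketing might work, assuming deviation is the absolute difference between prediction and expected value, expressed as a percentage of the expected value. The function names and the handling of a zero expected value are illustrative assumptions, not Paragon's actual implementation.

```go
package main

import (
	"fmt"
	"math"
)

// deviationPercent returns |actual-expected| as a percentage of |expected|.
// Assumption: when expected is 0, the absolute difference is scaled by 100
// directly to avoid dividing by zero.
func deviationPercent(expected, actual float64) float64 {
	diff := math.Abs(actual - expected)
	if expected == 0 {
		return diff * 100
	}
	return diff / math.Abs(expected) * 100
}

// bucketFor maps a deviation percentage to one of the categories listed above.
func bucketFor(dev float64) string {
	switch {
	case dev <= 10:
		return "0-10%"
	case dev <= 20:
		return "10-20%"
	case dev <= 50:
		return "20-50%"
	case dev <= 100:
		return "50-100%"
	default:
		return "100%+"
	}
}

func main() {
	expected := []float64{0, 1, 2, 1, 0}
	actual := []float64{0, 1, 2, 0, 0}

	counts := map[string]int{}
	for i := range expected {
		counts[bucketFor(deviationPercent(expected[i], actual[i]))]++
	}
	for _, name := range []string{"0-10%", "10-20%", "20-50%", "50-100%", "100%+"} {
		fmt.Printf("%s: %d\n", name, counts[name])
	}
}
```

Under these assumptions, four of the five samples land in the 0-10% bucket and the single misclassification (expected 1, predicted 0) lands in 50-100%.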

🎯 Advanced Metrics

  • ADHD Score: Weighted performance metric
  • Failure Rate: Critical error percentage
  • Distribution Analysis: Error pattern insights
  • Composite Performance: Multi-metric evaluation

ADHD Evaluation Implementation go
// Standard ADHD evaluation
expected := []float64{0, 1, 2, 1, 0} // Ground truth labels
actual := []float64{0, 1, 2, 0, 0}   // Model predictions
nn.EvaluateModel(expected, actual)

// Access detailed ADHD metrics
fmt.Printf("📊 ADHD Score: %.2f%%\n", nn.Performance.Score)
fmt.Printf("📈 Total Samples: %d\n", nn.Performance.Total)
fmt.Printf("❌ Failures (100%%+): %d\n", nn.Performance.Failures)

// Iterate through deviation buckets
for bucketName, bucket := range nn.Performance.Buckets {
    percentage := float64(bucket.Count) / float64(nn.Performance.Total) * 100
    fmt.Printf("   %s: %d samples (%.2f%%)\n", bucketName, bucket.Count, percentage)
}

// Advanced composite evaluation for detailed analysis
nn.EvaluateFull(expectedVector, actualVector)
nn.PrintFullDiagnostics()

// Sample-level performance for vector outputs
samplePerf := paragon.ComputePerSamplePerformance(
    expectedVectors, actualVectors, 
    0.01, // epsilon tolerance
    nn,
)
paragon.PrintSampleDiagnostics(samplePerf, 0.01)

🧠 ADHD Insights

The ADHD system enables you to:

  • Identify model weaknesses: Pinpoint which prediction ranges need improvement
  • Track training progress: Monitor how error distributions change over epochs
  • Compare model architectures: Use ADHD scores for objective model comparison
  • Optimize hyperparameters: Target specific deviation buckets for fine-tuning

Network Growth & NAS

Automatic neural architecture search and network evolution

Paragon features advanced neural architecture search (NAS) capabilities that automatically evolve and improve network architectures through micro-network surgery, distributed optimization, and intelligent growth strategies.

Micro-Network Surgery

Extract, optimize, and reintegrate sub-networks for targeted improvements without disrupting the entire model.

Dynamic Growth

Automatically add layers and neurons based on performance metrics and training data characteristics.

Activation Evolution

Intelligently search through activation function combinations to find optimal configurations.

Network Growth Implementation go
// Basic network growth
improved := nn.Grow(
    checkpointLayer,    // Layer to checkpoint from
    testInputs,         // Test data for evaluation  
    expectedOutputs,    // Expected labels
    50,                 // Number of candidate architectures
    10,                 // Training epochs per candidate
    0.01,               // Learning rate
    1e-6,               // Convergence tolerance
    float32(2), float32(-2), // Gradient clipping
    16, 64,             // Width constraints (min, max)
    1, 4,               // Height constraints (min, max)
    []string{"relu", "leaky_relu", "swish", "gelu"}, // Activation pool
    4,                  // Max parallel threads
)

if improved {
    fmt.Println("🌱 Network successfully grown!")
    nn.PrintGrowthHistory()
} else {
    fmt.Println("⚠️ No improvement found")
}

// Advanced iterative NAS
bestNet, bestScore := nn.IterativeInitNAS(
    10,                 // Clones per round
    5,                  // NAS epochs
    0.001,              // Base learning rate
    0.1,                // Weight mutation rate
    false,              // Early stopping
    true,               // Allow activation mutations
    95.0,               // Target ADHD score
    5,                  // Max attempts
    inputs, targets,    // Training data
    float32(1), float32(-1), // Clipping bounds
)

fmt.Printf("🏆 Best NAS result: %.2f%% ADHD score\n", bestScore)

// Manual network architecture modification
nn.AddLayer(2, 128, 64, "relu", true)  // Insert layer at position 2
nn.AddNeuronsToLayer(1, 32)            // Add 32 neurons to layer 1

// Save growth history
nn.SaveGrowthLogJSON("./models/growth_log.json")

🔬 Growth Strategies: The growth system uses checkpoint-based evaluation to test architectural changes without disrupting the main network. Micro-networks are extracted, improved through parallel training, and reintegrated only if they show measurable performance gains.
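The accept-only-if-better strategy can be sketched as a bounded parallel search: score each candidate concurrently, then keep the best one only when it beats the current score by more than a tolerance. Everything here (`candidate`, `pickBest`, `toyScore`) is a hypothetical stand-in; in a real run the scoring function would train and evaluate a micro-network.

```go
package main

import (
	"fmt"
	"sync"
)

// candidate is a hypothetical stand-in for one micro-network variant.
type candidate struct {
	Width, Height int
	Activation    string
}

// pickBest scores candidates with at most maxThreads running at once and
// returns the top scorer only if it beats baseline by more than tolerance.
func pickBest(cands []candidate, score func(candidate) float64,
	baseline, tolerance float64, maxThreads int) (candidate, float64, bool) {
	if maxThreads < 1 {
		maxThreads = 1
	}
	scores := make([]float64, len(cands))
	sem := make(chan struct{}, maxThreads) // limits concurrent evaluations
	var wg sync.WaitGroup
	for i, c := range cands {
		wg.Add(1)
		sem <- struct{}{}
		go func(i int, c candidate) {
			defer wg.Done()
			defer func() { <-sem }()
			scores[i] = score(c) // each goroutine writes its own slot
		}(i, c)
	}
	wg.Wait()

	best := -1
	for i, s := range scores {
		if best < 0 || s > scores[best] {
			best = i
		}
	}
	if best < 0 || scores[best] <= baseline+tolerance {
		return candidate{}, baseline, false // keep the original network
	}
	return cands[best], scores[best], true
}

// toyScore is a deterministic placeholder for "train, then evaluate".
func toyScore(c candidate) float64 {
	return 90 + float64(c.Width*c.Height)/32
}

func main() {
	cands := []candidate{{16, 1, "relu"}, {64, 2, "gelu"}, {32, 2, "swish"}}
	best, score, improved := pickBest(cands, toyScore, 93, 1e-6, 4)
	fmt.Printf("improved=%v best=%+v score=%.2f\n", improved, best, score)
}
```

Because the original network is only replaced when `improved` is true, a failed search leaves the main model untouched.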

Training & Evaluation

Advanced training algorithms and comprehensive evaluation metrics

🎯 Training Methods

Standard Training go
// CPU/GPU adaptive training
nn.Train(
    inputs, targets,
    100,      // epochs
    0.01,     // learning rate
    false,    // early stop on negative loss
    float32(2), float32(-2), // gradient clipping
)

// GPU-optimized training with synchronization
nn.TrainWithGPUSync(
    inputs, targets,
    50,       // epochs
    0.005,    // learning rate  
    true,     // early stopping
    float32(1), float32(-1), // clipping bounds
)

📊 Evaluation Metrics

Comprehensive Evaluation go
// ADHD classification evaluation
paragon.EvaluateWithADHD(nn, testInputs, testTargets)

// Standard accuracy computation
accuracy := paragon.ComputeAccuracy(nn, inputs, targets)
fmt.Printf("📈 Accuracy: %.2f%%\n", accuracy*100)

// Advanced composite metrics
nn.EvaluateFull(expected, actual)
fmt.Printf("🎯 Composite Score: %.2f\n", nn.Composite.Score)
fmt.Printf("✅ Exact Matches: %d/%d\n", 
    nn.Composite.ExactMatchCount, nn.Composite.TotalSamples)

⚡ Advanced Training Features

Layer Replay System

Intelligent layer replay with entropy-based gating for improved learning efficiency and stability.

// Enable dynamic replay
layer := &nn.Layers[2]
layer.ReplayEnabled = true
layer.ReplayBudget = 3
layer.ReplayGateFunc = entropyGate
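The snippet above does not define `entropyGate`. One plausible shape, sketched here under the assumption that the gate receives the layer's outputs as a probability distribution and returns whether to replay, is a Shannon-entropy threshold. The signature is hypothetical; check the `ReplayGateFunc` field type in the source before adapting it.

```go
package main

import (
	"fmt"
	"math"
)

// shannonEntropy returns the entropy, in bits, of a probability distribution.
// Zero-probability entries contribute nothing.
func shannonEntropy(probs []float64) float64 {
	h := 0.0
	for _, p := range probs {
		if p > 0 {
			h -= p * math.Log2(p)
		}
	}
	return h
}

// makeEntropyGate returns a gate that asks for a replay when the output
// distribution is more uncertain than threshold bits.
// Assumption: the gate signature here is illustrative, not Paragon's.
func makeEntropyGate(threshold float64) func(probs []float64) bool {
	return func(probs []float64) bool {
		return shannonEntropy(probs) > threshold
	}
}

func main() {
	gate := makeEntropyGate(1.0)
	confident := []float64{0.97, 0.01, 0.01, 0.01}
	uncertain := []float64{0.25, 0.25, 0.25, 0.25}
	fmt.Println(gate(confident)) // false: entropy is well below 1 bit
	fmt.Println(gate(uncertain)) // true: a uniform 4-way split has 2 bits
}
```

Gating on entropy spends the replay budget on samples where the layer is undecided and skips those it already handles confidently.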

Gradient Clipping

Advanced gradient clipping with type-aware bounds for stable training across all numeric types.

// Type-aware gradient clipping
nn.Train(inputs, targets, epochs, lr, 
    false, T(maxGrad), T(minGrad))
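A type-aware clip can be written once with generics and reused for every numeric type. This is a minimal sketch; the `Numeric` constraint and `clipGradient` name are assumptions for illustration, with the argument order mirroring the `clipUpper, clipLower` order used by `Train`.

```go
package main

import "fmt"

// Numeric is a hypothetical constraint; Paragon's own may differ.
type Numeric interface {
	~int8 | ~int16 | ~int32 | ~uint16 | ~uint32 | ~float32 | ~float64
}

// clipGradient clamps g into the closed range [lower, upper].
func clipGradient[T Numeric](g, upper, lower T) T {
	if g > upper {
		return upper
	}
	if g < lower {
		return lower
	}
	return g
}

func main() {
	for _, g := range []float32{-5, -1.5, 0.3, 7} {
		fmt.Println(clipGradient(g, 2, -2)) // -2, -1.5, 0.3, 2
	}
	fmt.Println(clipGradient[int32](9, 2, -2)) // 2
}
```

For unsigned types a negative lower bound is not representable, so the bounds would need to be chosen within the type's range.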

Weight Perturbation

Intelligent weight mutation for escaping local minima and improving generalization.

// Apply Gaussian noise to weights
nn.PerturbWeights(0.01, randomSeed)
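As a rough picture of what seeded Gaussian perturbation involves, the sketch below adds zero-mean noise to a weight slice. It assumes the first argument of `PerturbWeights` acts as a standard deviation and the second as a random seed; the standalone `perturbWeights` helper is illustrative and not Paragon's implementation.

```go
package main

import (
	"fmt"
	"math/rand"
)

// perturbWeights adds zero-mean Gaussian noise with standard deviation sigma
// to each weight in place. A fixed seed makes the result reproducible.
func perturbWeights(weights []float64, sigma float64, seed int64) {
	rng := rand.New(rand.NewSource(seed))
	for i := range weights {
		weights[i] += rng.NormFloat64() * sigma
	}
}

func main() {
	original := []float64{0.5, -0.25, 0.1}

	a := append([]float64(nil), original...)
	b := append([]float64(nil), original...)
	perturbWeights(a, 0.01, 42)
	perturbWeights(b, 0.01, 42)

	fmt.Println("perturbed:", a)
	fmt.Println("same seed gives same result:", a[0] == b[0] && a[1] == b[1] && a[2] == b[2])
}
```

Keeping the seed explicit makes a perturbation experiment repeatable, which helps when comparing runs.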

🎛️ Training Tips: Start with TrainWithGPUSync for GPU acceleration. Use gradient clipping to prevent exploding gradients. Enable layer replay for complex architectures. Monitor ADHD metrics during training to track learning progress and identify optimization opportunities.

Advanced Features

Cutting-edge capabilities for research and production

Advanced Checkpointing

Layer-wise checkpointing with JSON serialization, cross-platform compatibility, and incremental loading for large models.

// Save/load full models
nn.SaveJSON("model.json")
nn.LoadJSON("model.json")

// Layer-specific checkpointing
nn.SaveLayerState(3, "layer3_checkpoint.json")
state, _ := nn.LoadLayerState(3, "layer3_checkpoint.json")
nn.ForwardFromLayer(3, state)
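Layer-wise JSON checkpointing boils down to serializing a layer's state and reading it back. The sketch below shows that round trip with the standard library only; the `layerState` struct and its fields are hypothetical, and Paragon's actual checkpoint format may differ.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
	"path/filepath"
)

// layerState is a hypothetical snapshot of one layer's activations.
type layerState struct {
	Layer  int         `json:"layer"`
	Values [][]float64 `json:"values"`
}

// saveState writes s to path as indented JSON.
func saveState(path string, s layerState) error {
	data, err := json.MarshalIndent(s, "", "  ")
	if err != nil {
		return err
	}
	return os.WriteFile(path, data, 0o644)
}

// loadState reads a layerState back from path.
func loadState(path string) (layerState, error) {
	var s layerState
	data, err := os.ReadFile(path)
	if err != nil {
		return s, err
	}
	err = json.Unmarshal(data, &s)
	return s, err
}

// roundTrip saves s to a temporary file, loads it back, and cleans up.
func roundTrip(s layerState) (layerState, error) {
	dir, err := os.MkdirTemp("", "checkpoint")
	if err != nil {
		return layerState{}, err
	}
	defer os.RemoveAll(dir)

	path := filepath.Join(dir, "layer3_checkpoint.json")
	if err := saveState(path, s); err != nil {
		return layerState{}, err
	}
	return loadState(path)
}

func main() {
	restored, err := roundTrip(layerState{Layer: 3, Values: [][]float64{{0.5, 0.25}}})
	if err != nil {
		fmt.Println("checkpoint failed:", err)
		return
	}
	fmt.Printf("restored layer %d: %v\n", restored.Layer, restored.Values)
}
```

Because JSON is plain text, the same checkpoint file can be moved between Linux, Windows, and macOS without conversion.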

Batch Processing

GPU-optimized batch processing with automatic memory management, dynamic workgroup sizing, and efficient data streaming.

// Efficient batch processing
batchInputs := [][][]float64{input1, input2, input3}
outputs, err := nn.ForwardBatch(batchInputs)

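// GPU batch kernels with custom shaders are built by buildBatchGPUKernels.
// The method is unexported (lowercase), so it is callable only from inside
// the paragon package:
//     nn.buildBatchGPUKernels(batchSize)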

Type Conversion

Seamless conversion between numeric types with intelligent scaling, precision preservation, and automatic optimization.

// Convert between types
float32Net := paragon.NewNetwork[float32](...)
int32Net, err := paragon.ConvertNetwork[float32, int32](float32Net)

// Batch conversion to multiple types
results, err := paragon.BatchConvertNetworks(
    sourceNet, []string{"float32", "int32", "uint32"})

Network Introspection

Deep network analysis with method discovery, parameter counting, and performance profiling capabilities.

// Introspect network methods
methods, err := nn.GetphaseMethods()
methodsJSON, err := nn.GetphaseMethodsJSON()

// GPU device information
gpuInfo, err := nn.GetAllGPUInfo()
for _, gpu := range gpuInfo {
    fmt.Printf("GPU: %s (%s)\n", gpu["name"], gpu["vendorName"])
}

Performance Benchmarking

Comprehensive benchmarking suite with multi-threaded testing, GPU profiling, and detailed performance analysis.

// Benchmark numeric operations
result := paragon.BenchmarkNumericOps[float32]("float32", 5*time.Second, true)
fmt.Printf("Operations/sec: %d\n", result)

// Full benchmark suite
allResults := paragon.RunAllBenchmarks(10*time.Second)
fmt.Println(allResults) // JSON output

Reverse Propagation

Advanced reverse inference techniques for input reconstruction, attribution analysis, and interpretability studies.

// Reverse inference from output
targetOutput := [][]float64{{0, 0, 1, 0, 0}} // Desired output (one row)
inferredInput := nn.InferInputFromOutput(targetOutput, 100, 0.01)

// Bidirectional constraint propagation
nn.PropagateBidirectionalConstraint(actualInput, targetOutput, 0.1, 0.9)

API Reference

Complete reference for all Paragon types and methods

TYPE Network[T Numeric]

Main neural network structure supporting all numeric types with GPU acceleration

Core Methods:

NewNetwork[T](layers, activations, connected, seed...) - Create type-generic network

Forward(input) - CPU/GPU adaptive forward pass

ForwardBatch(batchInputs) - Optimized batch processing

Backward(targets, lr, clipUpper, clipLower) - Backpropagation with clipping

Train(inputs, targets, epochs, lr, earlyStop, clipUpper, clipLower) - Standard training

TrainWithGPUSync(inputs, targets, epochs, lr, earlyStop, clipUpper, clipLower) - GPU training

EvaluateModel(expected, actual) - ADHD evaluation

InitializeOptimizedGPU() - Enable WebGPU acceleration

Advanced Methods:

Grow(checkpointLayer, inputs, labels, candidates, epochs, lr, tolerance, ...) - Network evolution

AddLayer(idx, width, height, activation, fullyConnected) - Dynamic architecture

AddNeuronsToLayer(layerIdx, numToAdd) - Runtime expansion

SaveJSON(path) / LoadJSON(path) - Model persistence

SaveLayerState(layerIdx, filename) - Layer checkpointing

ForwardFromLayer(layerIdx, state) - Checkpoint resumption

TYPE Grid[T Numeric]

2D neural layer with Width × Height dimensions and replay capabilities

Properties:

Width, Height int - Layer dimensions

Neurons [][]*Neuron[T] - 2D neuron grid

ReplayEnabled bool - Replay system toggle

ReplayBudget int - Maximum replay operations

CachedOutputs []T - Cached activations

GetOutputValues() []float64 - Extract layer outputs

TYPE Neuron[T Numeric]

Individual neuron with type-generic values, multiple activation functions, and GPU compatibility

Core Properties:

Value T - Current activation value

Bias T - Neuron bias term

Activation string - Function type

Inputs []Connection[T] - Input connections

RevValue T - Reverse propagation value

IsNew bool - Growth tracking flag

Dimension *Network[T] - Sub-network pointer

Type string - Neuron type classification

ID int - Unique identifier

GPU WebGPU Integration

Native WebGPU acceleration with automatic optimization and cross-platform support

GPU Methods:

InitializeOptimizedGPU() error - Setup GPU pipelines

ForwardGPUOptimized(inputs) error - GPU forward pass

BackwardGPUOptimized(targets, lr, clipUpper, clipLower) error - GPU backprop

SyncGPUWeightsToCPU() error - GPU→CPU synchronization

GetAllGPUInfo() ([]map[string]string, error) - Device enumeration

CleanupOptimizedGPU() - Resource cleanup

UTILS Utility Functions

Helper functions for data processing, benchmarking, and network analysis

Core Utilities:

SplitDataset(inputs, targets, trainFrac) - Data splitting

ConvertNetwork[T1, T2](src) - Type conversion

Softmax(inputs) []float64 - Softmax normalization

ArgMax(arr) int - Maximum index finder

RunAllBenchmarks(duration) string - Performance testing

ReadCSV(filename) ([][]string, error) - CSV loading

ComputeAccuracy[T](nn, inputs, targets) float64 - Accuracy calculation

LoadNamedNetworkFromJSONString(jsonStr) (any, error) - Dynamic loading

📚 Complete Documentation: For detailed method signatures, parameter descriptions, and usage examples, explore the comprehensive source code documentation in the GitHub repository.
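For readers unfamiliar with the `Softmax` and `ArgMax` utilities listed above, here is a minimal standalone sketch of the standard techniques. The lowercase functions are illustrative re-implementations; Paragon's versions may handle edge cases such as empty input differently.

```go
package main

import (
	"fmt"
	"math"
)

// softmax converts raw scores into probabilities that sum to 1. Subtracting
// the largest score first keeps math.Exp from overflowing on big inputs.
func softmax(scores []float64) []float64 {
	if len(scores) == 0 {
		return nil
	}
	maxScore := scores[0]
	for _, s := range scores[1:] {
		if s > maxScore {
			maxScore = s
		}
	}
	out := make([]float64, len(scores))
	sum := 0.0
	for i, s := range scores {
		out[i] = math.Exp(s - maxScore)
		sum += out[i]
	}
	for i := range out {
		out[i] /= sum
	}
	return out
}

// argMax returns the index of the largest value, or -1 for an empty slice.
func argMax(values []float64) int {
	best := -1
	for i, v := range values {
		if best < 0 || v > values[best] {
			best = i
		}
	}
	return best
}

func main() {
	probs := softmax([]float64{1, 3, 2})
	fmt.Println("probabilities:", probs)
	fmt.Println("predicted class:", argMax(probs)) // 1
}
```

In a classifier, `argMax` over the output layer turns a probability vector into a single predicted label, which is the form the ADHD evaluation examples compare against ground truth.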

Examples & Tutorials

Practical examples and step-by-step tutorials

🚀 Quick Start Examples

Complete MNIST Training Example go
package main

import (
    "fmt"
    "github.com/openfluke/paragon/v3"
    "github.com/openfluke/pilot/experiments"
)

func main() {
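    // Prepare the MNIST dataset stage from the pilot experiments package
    mnist := experiments.NewMNISTDatasetStage("./data/mnist")
    _ = mnist // not used further here; keeps the example compiling

    // loadMNISTData is a helper you supply (not shown); it is expected to
    // return inputs and one-hot targets as [][][]float64
    allInputs, allTargets, err := loadMNISTData("./data/mnist")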
    if err != nil {
        panic(err)
    }
    
    // Split dataset 80/20
    trainInputs, trainTargets, testInputs, testTargets := 
        paragon.SplitDataset(allInputs, allTargets, 0.8)
    
    // Create optimized network
    nn := paragon.NewNetwork[float32](
        []struct{ Width, Height int }{{28, 28}, {32, 32}, {10, 1}},
        []string{"linear", "relu", "softmax"},
        []bool{true, true, true},
    )
    
    // Enable GPU acceleration
    nn.WebGPUNative = true
    if err := nn.InitializeOptimizedGPU(); err != nil {
        fmt.Printf("GPU unavailable: %v, using CPU\n", err)
        nn.WebGPUNative = false
    } else {
        // Release GPU resources only if initialization succeeded
        defer nn.CleanupOptimizedGPU()
    }
    
    // Train with GPU synchronization
    nn.TrainWithGPUSync(trainInputs, trainTargets, 10, 0.05, 
        false, float32(2), float32(-2))
    
    // Evaluate with ADHD metrics
    expected := make([]float64, len(testInputs))
    actual := make([]float64, len(testInputs))
    
    for i, input := range testInputs {
        nn.Forward(input)
        output := nn.ExtractOutput()
        expected[i] = float64(paragon.ArgMax(testTargets[i][0]))
        actual[i] = float64(paragon.ArgMax(output))
    }
    
    nn.EvaluateModel(expected, actual)
    fmt.Printf("🎯 Test ADHD Score: %.2f%%\n", nn.Performance.Score)
    
    // Save trained model
    nn.SaveJSON("./models/mnist_model.json")
}
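The 80/20 split in the example can be pictured with a small standalone sketch. It assumes an order-preserving cut with no shuffling, which may not match what `paragon.SplitDataset` does internally; the helper names are illustrative.

```go
package main

import (
	"fmt"
	"math"
)

// trainCount returns how many of total samples go to the training portion.
// trainFrac is clamped to the range [0, 1].
func trainCount(total int, trainFrac float64) int {
	if trainFrac < 0 {
		trainFrac = 0
	}
	if trainFrac > 1 {
		trainFrac = 1
	}
	return int(math.Round(float64(total) * trainFrac))
}

// splitDataset cuts paired inputs and targets at the same index, preserving
// order. Assumption: no shuffling; shuffle beforehand if order matters.
func splitDataset[T any](inputs, targets []T, trainFrac float64) (trainIn, trainTg, testIn, testTg []T) {
	n := len(inputs)
	if len(targets) < n {
		n = len(targets)
	}
	k := trainCount(n, trainFrac)
	return inputs[:k], targets[:k], inputs[k:n], targets[k:n]
}

func main() {
	inputs := []int{0, 1, 2, 3, 4, 5, 6, 7, 8, 9}
	targets := []int{0, 10, 20, 30, 40, 50, 60, 70, 80, 90}

	trainIn, trainTg, testIn, testTg := splitDataset(inputs, targets, 0.8)
	fmt.Println("train:", trainIn, trainTg)
	fmt.Println("test: ", testIn, testTg)

	// Assumption: 70,000 is the full MNIST sample count.
	fmt.Println("train samples from 70,000:", trainCount(70000, 0.8)) // 56000
}
```

If the full 70,000-sample MNIST set is used, an 80/20 split yields 56,000 training samples, which lines up with the 56K figure quoted in the benchmark.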

📊 Benchmarking Example

Performance Benchmarking go
// Comprehensive benchmark suite
results := paragon.RunAllBenchmarks(5 * time.Second)
fmt.Println("📊 Benchmark Results:")
fmt.Println(results)

// Type-specific benchmarks
float32Ops := paragon.BenchmarkNumericOps[float32]("float32", 
    2*time.Second, true)
int32Ops := paragon.BenchmarkNumericOps[int32]("int32", 
    2*time.Second, true)

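// formatNumber is a helper you supply (not shown) that renders the
// operation count as a string, e.g. with thousands separators
fmt.Printf("Float32: %s ops/sec\n", formatNumber(float32Ops))
fmt.Printf("Int32: %s ops/sec\n", formatNumber(int32Ops))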

// GPU device information
gpuInfo, err := nn.GetAllGPUInfo() // method on the network, as in the API reference
if err == nil {
    for _, gpu := range gpuInfo {
        fmt.Printf("🚀 GPU: %s (%s)\n", gpu["name"], gpu["vendorName"])
        fmt.Printf("   Max Buffer: %s MB\n", gpu["maxBufferSizeMB"])
    }
}

🌱 Network Growth Example

Automatic Architecture Search go
// Advanced network growth with NAS
improved := nn.Grow(
    2,                  // Checkpoint layer
    validationInputs,   // Validation data
    validationLabels,   // Validation labels
    20,                 // Candidate architectures
    5,                  // Training epochs per candidate
    0.01,               // Learning rate
    1e-6,               // Convergence tolerance  
    float32(1), float32(-1), // Gradient clipping
    8, 128,             // Width constraints
    1, 8,               // Height constraints
    []string{"relu", "leaky_relu", "swish"}, // Activation pool
    4,                  // Parallel threads
)

if improved {
    fmt.Println("🌱 Network architecture improved!")
    nn.PrintGrowthHistory()
    
    // Save growth log
    nn.SaveGrowthLogJSON("growth_history.json")
} else {
    fmt.Println("⚠️ No architectural improvements found")
}

// Iterative NAS for maximum optimization
bestNet, bestScore := nn.IterativeInitNAS(
    15,     // Clones per iteration
    3,      // NAS epochs
    0.001,  // Base learning rate
    0.05,   // Weight mutation rate
    false,  // Early stopping
    true,   // Allow activation mutations
    98.0,   // Target ADHD score
    10,     // Maximum attempts
    trainInputs, trainTargets,
    float32(2), float32(-2),
)

fmt.Printf("🏆 Best NAS Score: %.2f%%\n", bestScore)

🎓 Learning Resources

  • MNIST Benchmark: Complete working example with GPU acceleration
  • Type System: Explore float32, int32, and uint32 networks for different use cases
  • WebGPU Setup: Platform-specific GPU configuration guides
  • ADHD Metrics: Advanced evaluation techniques for model analysis
  • Network Growth: Automatic architecture search and optimization