🏆 MNIST Benchmark Results
RTX 3050 Mobile • 3m19s training • 56K samples
A cutting-edge neural network framework built for performance and flexibility
Paragon is a neural network framework written in Go, engineered for high performance and flexibility. It combines type-generic neural networks, WebGPU acceleration, and bucket-based evaluation metrics (ADHD) for fine-grained model analysis.
Support for multiple numeric types (float32, int32, uint32, int8, uint16), enabling neural networks optimized for different hardware configurations and precision requirements.
Native WebGPU integration providing up to 3x performance improvements on compatible hardware. Supports NVIDIA RTX series, Intel Iris Xe, and other modern GPUs.
Advanced Accuracy Deviation Heatmap Distribution system provides granular performance insights, categorizing predictions into meaningful deviation buckets for targeted optimization.
Built-in network growth capabilities with dynamic architecture evolution, micro-network surgery, and distributed optimization for automatic model improvement.
Comprehensive model persistence with JSON serialization, layer-wise checkpointing, and cross-platform compatibility for seamless deployment workflows.
Efficient batch processing with GPU-optimized pipelines, supporting thousands of samples with automatic memory management and dynamic workgroup sizing.
Quick setup guide to get Paragon running on your system
Prerequisites: Go 1.19+ installed. For WebGPU acceleration: compatible GPU with updated drivers (NVIDIA RTX series, Intel Iris Xe, or AMD RDNA2+).
go mod init your-project
go get github.com/openfluke/paragon/v3@latest
go get github.com/openfluke/pilot@latest
go get github.com/openfluke/webgpu@ea0f165
package main
import (
"fmt"
"github.com/openfluke/paragon/v3"
)
func main() {
// Create a type-generic neural network
nn := paragon.NewNetwork[float32](
[]struct{ Width, Height int }{
{28, 28}, // Input layer (784 neurons for MNIST)
{32, 32}, // Hidden layer (1024 neurons)
{10, 1}, // Output layer (10 classes)
},
[]string{"linear", "relu", "softmax"},
[]bool{true, true, true}, // Fully connected
)
// Enable WebGPU acceleration
nn.WebGPUNative = true
if err := nn.InitializeOptimizedGPU(); err != nil {
fmt.Printf("GPU init failed: %v, using CPU\n", err)
nn.WebGPUNative = false
} else {
fmt.Println("✅ WebGPU acceleration enabled")
defer nn.CleanupOptimizedGPU()
}
fmt.Printf("🚀 Network created: %d layers, %s acceleration\n",
len(nn.Layers),
map[bool]string{true: "GPU", false: "CPU"}[nn.WebGPUNative])
}
// Load your training data
trainInputs := [][][]float64{ /* your input data */ }
trainTargets := [][][]float64{ /* your target data */ }
// Train with GPU synchronization
nn.TrainWithGPUSync(
trainInputs, trainTargets,
10, // epochs
0.05, // learning rate
false, // early stopping
float32(2), float32(-2), // gradient clipping
)
// Evaluate with ADHD metrics
expected := []float64{ /* expected labels */ }
actual := []float64{ /* predicted labels */ }
nn.EvaluateModel(expected, actual)
fmt.Printf("📊 ADHD Score: %.2f%%\n", nn.Performance.Score)
fmt.Printf("📈 Accuracy: %.2f%%\n",
float64(nn.Performance.Buckets["0-10%"].Count) /
float64(nn.Performance.Total) * 100)
# Linux (NVIDIA RTX with Optimus)
__NV_PRIME_RENDER_OFFLOAD=1 __GLX_VENDOR_LIBRARY_NAME=nvidia ./your-app
# Linux (Fedora with NVIDIA drivers)
sudo dnf install nvidia-driver nvidia-settings
# Windows (NVIDIA/Intel automatic detection)
./your-app.exe
# macOS (Metal backend)
./your-app
Real-world performance metrics and comparisons
📈 ADHD Performance (Train Set):
- 0-10%: 55,263 samples (98.68%) ✅ Excellent predictions
- 10-20%: 108 samples (0.19%) ✅ Good predictions
- 20-30%: 65 samples (0.12%) ⚠️ Moderate errors
- 30-40%: 50 samples (0.09%) ⚠️ Moderate errors
- 40-50%: 43 samples (0.08%) ⚠️ High errors
- 50-100%: 224 samples (0.40%) ❌ Very high errors
- 100%+: 247 samples (0.44%) ❌ Complete failures
📊 Overall ADHD Score: 99.13%
🎯 Classification Accuracy: 98.68%
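These headline figures follow directly from the bucket counts. A quick sanity check in Go (counts copied from the run above; treating the 0-10% bucket as a correct classification):

// Verify the reported totals and accuracy from the bucket counts
package main

import "fmt"

func main() {
counts := []int{55263, 108, 65, 50, 43, 224, 247} // buckets 0-10% ... 100%+
total := 0
for _, c := range counts {
total += c
}
fmt.Println("total samples:", total) // 56000
fmt.Printf("accuracy: %.2f%%\n", float64(counts[0])/float64(total)*100) // 98.68%
}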
Revolutionary type-generic neural networks
Paragon pioneered type-generic neural networks, allowing you to choose the optimal numeric precision for your specific use case, hardware constraints, and performance requirements.
float32 - Best for: GPU acceleration, standard training
int8 - Best for: edge devices, quantized inference
// GPU-optimized float32 network
nnFloat32 := paragon.NewNetwork[float32](layerSizes, activations, connectivity)
// Memory-efficient int32 network
nnInt32 := paragon.NewNetwork[int32](layerSizes, activations, connectivity)
// Ultra-compact int8 network for edge deployment
nnInt8 := paragon.NewNetwork[int8](layerSizes, activations, connectivity)
// Specialized uint32 network for positive-only data
nnUint32 := paragon.NewNetwork[uint32](layerSizes, activations, connectivity)
// Convert between types seamlessly
convertedNet, err := paragon.ConvertNetwork[float32, int32](nnFloat32)
if err == nil {
fmt.Println("✅ Successfully converted float32 → int32 network")
}
💡 Type Selection Guide: Use float32 for GPU acceleration and standard training. Choose int32 for memory-constrained environments. Use int8 for ultra-low-power edge devices. Consider uint32 for specialized positive-only data domains.
Native GPU computing with automatic optimization
Paragon features cutting-edge WebGPU integration that automatically optimizes neural network computations for modern GPUs, delivering significant performance improvements with seamless CPU fallback.
// Enable WebGPU with automatic optimization
nn.WebGPUNative = true
// Initialize optimized GPU pipelines
if err := nn.InitializeOptimizedGPU(); err != nil {
fmt.Printf("⚠️ GPU initialization failed: %v\n", err)
fmt.Println(" Falling back to CPU computation...")
nn.WebGPUNative = false
} else {
fmt.Println("✅ WebGPU acceleration enabled")
// GPU info and capabilities
gpuInfo, _ := nn.GetAllGPUInfo()
for _, gpu := range gpuInfo {
fmt.Printf("🚀 Using: %s (%s)\n", gpu["name"], gpu["vendorName"])
fmt.Printf(" Max Buffer: %s MB\n", gpu["maxBufferSizeMB"])
fmt.Printf(" Compute Groups: %s\n", gpu["maxComputeInvocations"])
}
// Cleanup when done
defer nn.CleanupOptimizedGPU()
}
// Training automatically uses GPU when available
nn.TrainWithGPUSync(inputs, targets, epochs, lr, false, clipUpper, clipLower)
// Forward pass with GPU acceleration
nn.Forward(testInput)
// Batch processing with GPU optimization
outputs, err := nn.ForwardBatch(batchInputs)
🔧 Platform Setup Tips: On Linux Optimus systems, set __NV_PRIME_RENDER_OFFLOAD=1 so the app renders on the NVIDIA GPU. On Fedora, install the nvidia-driver package for WebGPU support.
Advanced Accuracy Deviation Heatmap Distribution analysis
The ADHD (Accuracy Deviation Heatmap Distribution) system categorizes prediction deviations into meaningful buckets, providing detailed insight into model behavior and enabling targeted optimization strategies.
// Standard ADHD evaluation
expected := []float64{0, 1, 2, 1, 0} // Ground truth labels
actual := []float64{0, 1, 2, 0, 0} // Model predictions
nn.EvaluateModel(expected, actual)
// Access detailed ADHD metrics
fmt.Printf("📊 ADHD Score: %.2f%%\n", nn.Performance.Score)
fmt.Printf("📈 Total Samples: %d\n", nn.Performance.Total)
fmt.Printf("❌ Failures (100%%+): %d\n", nn.Performance.Failures)
// Iterate through deviation buckets
for bucketName, bucket := range nn.Performance.Buckets {
percentage := float64(bucket.Count) / float64(nn.Performance.Total) * 100
fmt.Printf(" %s: %d samples (%.2f%%)\n", bucketName, bucket.Count, percentage)
}
// Advanced composite evaluation for detailed analysis
nn.EvaluateFull(expectedVector, actualVector)
nn.PrintFullDiagnostics()
// Sample-level performance for vector outputs
samplePerf := paragon.ComputePerSamplePerformance(
expectedVectors, actualVectors,
0.01, // epsilon tolerance
nn,
)
paragon.PrintSampleDiagnostics(samplePerf, 0.01)
The ADHD system enables you to:
- Place every prediction in a deviation bucket, from 0-10% (excellent) through 100%+ (complete failure)
- Distinguish near-misses from catastrophic errors instead of collapsing both into a single accuracy number
- Direct optimization effort at the specific deviation ranges where the model struggles
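For intuition, here is one plausible mapping from a per-sample percentage deviation to these buckets. The deviation formula is an illustrative assumption, not necessarily Paragon's exact internal computation (uses the standard math package):

// Illustrative only: map a percentage deviation to an ADHD-style bucket,
// assuming deviation = |expected - actual| / max(|expected|, eps) * 100
func bucketFor(expected, actual float64) string {
denom := math.Abs(expected)
if denom < 1e-9 {
denom = 1e-9 // avoid division by zero
}
dev := math.Abs(expected-actual) / denom * 100
switch {
case dev <= 10:
return "0-10%"
case dev <= 20:
return "10-20%"
case dev <= 30:
return "20-30%"
case dev <= 40:
return "30-40%"
case dev <= 50:
return "40-50%"
case dev <= 100:
return "50-100%"
default:
return "100%+"
}
}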
Automatic neural architecture search and network evolution
Paragon features advanced neural architecture search (NAS) capabilities that automatically evolve and improve network architectures through micro-network surgery, distributed optimization, and intelligent growth strategies.
Extract, optimize, and reintegrate sub-networks for targeted improvements without disrupting the entire model.
Automatically add layers and neurons based on performance metrics and training data characteristics.
Intelligently search through activation function combinations to find optimal configurations.
// Basic network growth
improved := nn.Grow(
checkpointLayer, // Layer to checkpoint from
testInputs, // Test data for evaluation
expectedOutputs, // Expected labels
50, // Number of candidate architectures
10, // Training epochs per candidate
0.01, // Learning rate
1e-6, // Convergence tolerance
float32(2), float32(-2), // Gradient clipping
16, 64, // Width constraints (min, max)
1, 4, // Height constraints (min, max)
[]string{"relu", "leaky_relu", "swish", "gelu"}, // Activation pool
4, // Max parallel threads
)
if improved {
fmt.Println("🌱 Network successfully grown!")
nn.PrintGrowthHistory()
} else {
fmt.Println("⚠️ No improvement found")
}
// Advanced iterative NAS
bestNet, bestScore := nn.IterativeInitNAS(
10, // Clones per round
5, // NAS epochs
0.001, // Base learning rate
0.1, // Weight mutation rate
false, // Early stopping
true, // Allow activation mutations
95.0, // Target ADHD score
5, // Max attempts
inputs, targets, // Training data
float32(1), float32(-1), // Clipping bounds
)
fmt.Printf("🏆 Best NAS result: %.2f%% ADHD score\n", bestScore)
// Manual network architecture modification
nn.AddLayer(2, 128, 64, "relu", true) // Insert layer at position 2
nn.AddNeuronsToLayer(1, 32) // Add 32 neurons to layer 1
// Save growth history
nn.SaveGrowthLogJSON("./models/growth_log.json")
🔬 Growth Strategies: The growth system uses checkpoint-based evaluation to test architectural changes without disrupting the main network. Micro-networks are extracted, improved through parallel training, and reintegrated only if they show measurable performance gains.
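The checkpoint mechanics can be approximated with the documented layer-state API. A minimal sketch (sampleInput is a placeholder; this shows the idea, not Grow's actual internals):

// Cache activations at the checkpoint layer once, then re-run only
// the layers above it for each candidate change
nn.Forward(sampleInput) // one full pass
nn.SaveLayerState(2, "ckpt_layer2.json") // snapshot layer-2 state

state, err := nn.LoadLayerState(2, "ckpt_layer2.json")
if err == nil {
// after a candidate modification above the checkpoint, resume from layer 2
nn.ForwardFromLayer(2, state)
candidateOutput := nn.ExtractOutput()
_ = candidateOutput // score the candidate on this output
}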
Advanced training algorithms and comprehensive evaluation metrics
// CPU/GPU adaptive training
nn.Train(
inputs, targets,
100, // epochs
0.01, // learning rate
false, // early stop on negative loss
float32(2), float32(-2), // gradient clipping
)
// GPU-optimized training with synchronization
nn.TrainWithGPUSync(
inputs, targets,
50, // epochs
0.005, // learning rate
true, // early stopping
float32(1), float32(-1), // clipping bounds
)
// ADHD classification evaluation
paragon.EvaluateWithADHD(nn, testInputs, testTargets)
// Standard accuracy computation
accuracy := paragon.ComputeAccuracy(nn, inputs, targets)
fmt.Printf("📈 Accuracy: %.2f%%\n", accuracy*100)
// Advanced composite metrics
nn.EvaluateFull(expected, actual)
fmt.Printf("🎯 Composite Score: %.2f\n", nn.Composite.Score)
fmt.Printf("✅ Exact Matches: %d/%d\n",
nn.Composite.ExactMatchCount, nn.Composite.TotalSamples)
Intelligent layer replay with entropy-based gating for improved learning efficiency and stability.
// Enable dynamic replay
layer := &nn.Layers[2]
layer.ReplayEnabled = true
layer.ReplayBudget = 3
layer.ReplayGateFunc = entropyGate // gate function; see the sketch below
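entropyGate is not defined in the snippet above. One plausible shape for it, assuming the gate receives the layer's output values and returns whether to replay (verify the actual ReplayGateFunc signature in the Paragon source), is a Shannon-entropy check over the softmaxed outputs:

// Hypothetical gate: replay only when the layer's output distribution
// is high-entropy, i.e. the layer looks uncertain (signature assumed)
func entropyGate(outputs []float64) bool {
probs := paragon.Softmax(outputs) // documented helper
entropy := 0.0
for _, p := range probs {
if p > 0 {
entropy -= p * math.Log2(p)
}
}
maxEntropy := math.Log2(float64(len(probs)))
return entropy > 0.8*maxEntropy // replay near-uniform outputs only
}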
Advanced gradient clipping with type-aware bounds for stable training across all numeric types.
// Type-aware gradient clipping
nn.Train(inputs, targets, epochs, lr,
false, T(maxGrad), T(minGrad))
Intelligent weight mutation for escaping local minima and improving generalization.
// Apply Gaussian noise to weights
nn.PerturbWeights(0.01, randomSeed)
🎛️ Training Tips: Start with TrainWithGPUSync for GPU acceleration. Use gradient clipping to prevent exploding gradients. Enable layer replay for complex architectures. Monitor ADHD metrics during training to track learning progress and identify optimization opportunities.
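One way to follow that last tip is to train a single epoch at a time and evaluate between epochs. A sketch using only the API shown above (valInputs and valTargets stand in for your validation split):

// Epoch-by-epoch training with ADHD tracking in between
for epoch := 0; epoch < 10; epoch++ {
nn.TrainWithGPUSync(trainInputs, trainTargets,
1, 0.05, false, float32(2), float32(-2)) // one epoch

expected := make([]float64, len(valInputs))
actual := make([]float64, len(valInputs))
for i, in := range valInputs {
nn.Forward(in)
expected[i] = float64(paragon.ArgMax(valTargets[i][0]))
actual[i] = float64(paragon.ArgMax(nn.ExtractOutput()))
}
nn.EvaluateModel(expected, actual)
fmt.Printf("epoch %d: ADHD score %.2f%%\n", epoch+1, nn.Performance.Score)
}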
Cutting-edge capabilities for research and production
Layer-wise checkpointing with JSON serialization, cross-platform compatibility, and incremental loading for large models.
// Save/load full models
nn.SaveJSON("model.json")
nn.LoadJSON("model.json")
// Layer-specific checkpointing
nn.SaveLayerState(3, "layer3_checkpoint.json")
state, _ := nn.LoadLayerState(3, "layer3_checkpoint.json")
nn.ForwardFromLayer(3, state)
GPU-optimized batch processing with automatic memory management, dynamic workgroup sizing, and efficient data streaming.
// Efficient batch processing
batchInputs := [][][]float64{input1, input2, input3}
outputs, err := nn.ForwardBatch(batchInputs)
// Note: batch GPU kernels (buildBatchGPUKernels) are package-internal
// and unexported; user code gets GPU batching through ForwardBatch above
Seamless conversion between numeric types with intelligent scaling, precision preservation, and automatic optimization.
// Convert between types
float32Net := paragon.NewNetwork[float32](...)
int32Net, err := paragon.ConvertNetwork[float32, int32](float32Net)
// Batch conversion to multiple types
results, err := paragon.BatchConvertNetworks(
sourceNet, []string{"float32", "int32", "uint32"})
Deep network analysis with method discovery, parameter counting, and performance profiling capabilities.
// Introspect network methods
methods, err := nn.GetphaseMethods()
methodsJSON, err := nn.GetphaseMethodsJSON()
// GPU device information
gpuInfo, err := nn.GetAllGPUInfo()
for _, gpu := range gpuInfo {
fmt.Printf("GPU: %s (%s)\n", gpu["name"], gpu["vendorName"])
}
Comprehensive benchmarking suite with multi-threaded testing, GPU profiling, and detailed performance analysis.
// Benchmark numeric operations
result := paragon.BenchmarkNumericOps[float32]("float32", 5*time.Second, true)
fmt.Printf("Operations/sec: %d\n", result)
// Full benchmark suite
allResults := paragon.RunAllBenchmarks(10*time.Second)
fmt.Println(allResults) // JSON output
Advanced reverse inference techniques for input reconstruction, attribution analysis, and interpretability studies.
// Reverse inference from output
targetOutput := [][]float64{{0, 0, 1, 0, 0}} // Desired output
inferredInput := nn.InferInputFromOutput(targetOutput, 100, 0.01)
// Bidirectional constraint propagation
nn.PropagateBidirectionalConstraint(actualInput, targetOutput, 0.1, 0.9)
Complete reference for all Paragon types and methods
Main neural network structure supporting all numeric types with GPU acceleration
NewNetwork[T](layers, activations, connected, seed...) - Create type-generic network
Forward(input) - CPU/GPU adaptive forward pass
ForwardBatch(batchInputs) - Optimized batch processing
Backward(targets, lr, clipUpper, clipLower) - Backpropagation with clipping
Train(inputs, targets, epochs, lr, earlyStop, clipUpper, clipLower) - Standard training
TrainWithGPUSync(inputs, targets, epochs, lr, earlyStop, clipUpper, clipLower) - GPU training
EvaluateModel(expected, actual) - ADHD evaluation
InitializeOptimizedGPU() - Enable WebGPU acceleration
Grow(checkpointLayer, inputs, labels, candidates, epochs, lr, tolerance, ...) - Network evolution
AddLayer(idx, width, height, activation, fullyConnected) - Dynamic architecture
AddNeuronsToLayer(layerIdx, numToAdd) - Runtime expansion
SaveJSON(path) / LoadJSON(path) - Model persistence
SaveLayerState(layerIdx, filename) - Layer checkpointing
ForwardFromLayer(layerIdx, state) - Checkpoint resumption
2D neural layer with Width × Height dimensions and replay capabilities
Width, Height int - Layer dimensions
Neurons [][]*Neuron[T] - 2D neuron grid
ReplayEnabled bool - Replay system toggle
ReplayBudget int - Maximum replay operations
CachedOutputs []T - Cached activations
GetOutputValues() []float64 - Extract layer outputs
Individual neuron with type-generic values, multiple activation functions, and GPU compatibility
Value T - Current activation value
Bias T - Neuron bias term
Activation string - Function type
Inputs []Connection[T] - Input connections
RevValue T - Reverse propagation value
IsNew bool - Growth tracking flag
Dimension *Network[T] - Sub-network pointer
Type string - Neuron type classification
ID int - Unique identifier
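Tying the Layer and Neuron fields together, a short inspection sketch (assumes the Neurons grid is indexed [row][column], i.e. [y][x]):

// Walk layer 1's neuron grid and read per-neuron state
layer := &nn.Layers[1]
fmt.Printf("layer 1: %dx%d, replay=%v\n", layer.Width, layer.Height, layer.ReplayEnabled)
for y := 0; y < layer.Height; y++ {
for x := 0; x < layer.Width; x++ {
n := layer.Neurons[y][x]
fmt.Printf("neuron(%d,%d): act=%s inputs=%d\n", x, y, n.Activation, len(n.Inputs))
}
}
// Or read all current activations at once via the documented helper
outputs := layer.GetOutputValues()
fmt.Println("layer output count:", len(outputs))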
Native WebGPU acceleration with automatic optimization and cross-platform support
InitializeOptimizedGPU() error - Setup GPU pipelines
ForwardGPUOptimized(inputs) error - GPU forward pass
BackwardGPUOptimized(targets, lr, clipUpper, clipLower) error - GPU backprop
SyncGPUWeightsToCPU() error - GPU→CPU synchronization
GetAllGPUInfo() ([]map[string]string, error) - Device enumeration
CleanupOptimizedGPU() - Resource cleanup
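Putting these methods together, a typical GPU lifecycle looks like this (a condensed sketch; input and target are placeholders):

// GPU lifecycle: init → forward/backward → sync weights → cleanup
nn.WebGPUNative = true
if err := nn.InitializeOptimizedGPU(); err != nil {
nn.WebGPUNative = false // CPU fallback
} else {
defer nn.CleanupOptimizedGPU()
if err := nn.ForwardGPUOptimized(input); err == nil {
_ = nn.BackwardGPUOptimized(target, 0.01, float32(2), float32(-2))
_ = nn.SyncGPUWeightsToCPU() // pull trained weights back to the CPU copy
}
}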
Helper functions for data processing, benchmarking, and network analysis
SplitDataset(inputs, targets, trainFrac) - Data splitting
ConvertNetwork[T1, T2](src) - Type conversion
Softmax(inputs) []float64 - Softmax normalization
ArgMax(arr) int - Maximum index finder
RunAllBenchmarks(duration) string - Performance testing
ReadCSV(filename) ([][]string, error) - CSV loading
ComputeAccuracy[T](nn, inputs, targets) float64 - Accuracy calculation
LoadNamedNetworkFromJSONString(jsonStr) (any, error) - Dynamic loading
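These helpers compose into a compact data-to-accuracy pipeline. A sketch (loadAll is a placeholder for your own data loader):

// Split, train, and score using only the documented utilities
inputs, targets := loadAll() // placeholder: your data loader
trainIn, trainTg, testIn, testTg := paragon.SplitDataset(inputs, targets, 0.8)
nn.Train(trainIn, trainTg, 20, 0.01, false, float32(2), float32(-2))
acc := paragon.ComputeAccuracy(nn, testIn, testTg)
fmt.Printf("test accuracy: %.2f%%\n", acc*100)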
📚 Complete Documentation: For detailed method signatures, parameter descriptions, and usage examples, explore the comprehensive source code documentation in the GitHub repository.
Practical examples and step-by-step tutorials
package main
import (
"fmt"
"github.com/openfluke/paragon/v3"
"github.com/openfluke/pilot/experiments"
)
func main() {
// Prepare the MNIST dataset via the pilot stage (download/setup details in the pilot docs)
_ = experiments.NewMNISTDatasetStage("./data/mnist")
// loadMNISTData is a user-provided helper that reads the prepared files
allInputs, allTargets, err := loadMNISTData("./data/mnist")
if err != nil {
panic(err)
}
// Split dataset 80/20
trainInputs, trainTargets, testInputs, testTargets :=
paragon.SplitDataset(allInputs, allTargets, 0.8)
// Create optimized network
nn := paragon.NewNetwork[float32](
[]struct{ Width, Height int }{{28, 28}, {32, 32}, {10, 1}},
[]string{"linear", "relu", "softmax"},
[]bool{true, true, true},
)
// Enable GPU acceleration
nn.WebGPUNative = true
if err := nn.InitializeOptimizedGPU(); err != nil {
fmt.Printf("GPU unavailable: %v\n", err)
nn.WebGPUNative = false
} else {
defer nn.CleanupOptimizedGPU()
}
// Train with GPU synchronization
nn.TrainWithGPUSync(trainInputs, trainTargets, 10, 0.05,
false, float32(2), float32(-2))
// Evaluate with ADHD metrics
expected := make([]float64, len(testInputs))
actual := make([]float64, len(testInputs))
for i, input := range testInputs {
nn.Forward(input)
output := nn.ExtractOutput()
expected[i] = float64(paragon.ArgMax(testTargets[i][0]))
actual[i] = float64(paragon.ArgMax(output))
}
nn.EvaluateModel(expected, actual)
fmt.Printf("🎯 Test Accuracy: %.2f%%\n", nn.Performance.Score)
// Save trained model
nn.SaveJSON("./models/mnist_model.json")
}
// Comprehensive benchmark suite
results := paragon.RunAllBenchmarks(5 * time.Second)
fmt.Println("📊 Benchmark Results:")
fmt.Println(results)
// Type-specific benchmarks
float32Ops := paragon.BenchmarkNumericOps[float32]("float32",
2*time.Second, true)
int32Ops := paragon.BenchmarkNumericOps[int32]("int32",
2*time.Second, true)
fmt.Printf("Float32: %s ops/sec\n", formatNumber(float32Ops))
fmt.Printf("Int32: %s ops/sec\n", formatNumber(int32Ops))
// GPU device information
gpuInfo, err := paragon.GetAllGPUInfo()
if err == nil {
for _, gpu := range gpuInfo {
fmt.Printf("🚀 GPU: %s (%s)\n", gpu["name"], gpu["vendorName"])
fmt.Printf(" Max Buffer: %s MB\n", gpu["maxBufferSizeMB"])
}
}
// Advanced network growth with NAS
improved := nn.Grow(
2, // Checkpoint layer
validationInputs, // Validation data
validationLabels, // Validation labels
20, // Candidate architectures
5, // Training epochs per candidate
0.01, // Learning rate
1e-6, // Convergence tolerance
float32(1), float32(-1), // Gradient clipping
8, 128, // Width constraints
1, 8, // Height constraints
[]string{"relu", "leaky_relu", "swish"}, // Activation pool
4, // Parallel threads
)
if improved {
fmt.Println("🌱 Network architecture improved!")
nn.PrintGrowthHistory()
// Save growth log
nn.SaveGrowthLogJSON("growth_history.json")
} else {
fmt.Println("⚠️ No architectural improvements found")
}
// Iterative NAS for maximum optimization
bestNet, bestScore := nn.IterativeInitNAS(
15, // Clones per iteration
3, // NAS epochs
0.001, // Base learning rate
0.05, // Weight mutation rate
false, // Early stopping
true, // Allow activation mutations
98.0, // Target ADHD score
10, // Maximum attempts
trainInputs, trainTargets,
float32(2), float32(-2),
)
fmt.Printf("🏆 Best NAS Score: %.2f%%\n", bestScore)