Next-Gen Neural Architecture

LOOM Neural Engine

Deterministic, CPU-first neural runtime with WebGPU acceleration. One JSON model format across Go, TypeScript/JavaScript, Python, C#, C-ABI, and WASM. Identical outputs across every platform.

CPU + WebGPU · Bit-for-bit parity · 3D Grid Architecture · Open source
Latest Release

What's New in v0.0.8

The "NeuroSymbolic" update brings recursive clustering, transformer components, and native MoE routing.

Recursive Neuro-Symbolic

New KMeansLayer enables differentiable clustering and symbolic reasoning within the neural graph.
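One way to make clustering differentiable is to replace the hard argmin of classic k-means with a softmax over negative squared distances to the centroids. The sketch below illustrates that generic idea in plain Python; it is not LOOM's actual KMeansLayer implementation.

```python
import math

def soft_kmeans_assign(x, centroids, temperature=1.0):
    """Soft cluster assignment: softmax over negative squared distances.

    Unlike the hard argmin in classic k-means, this is differentiable
    with respect to both the input and the centroids, so it can sit
    inside a neural graph and receive gradients.
    """
    logits = [-sum((xi - ci) ** 2 for xi, ci in zip(x, c)) / temperature
              for c in centroids]
    m = max(logits)                      # shift for numerical stability
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

# A point near centroid 0 gets most of the assignment mass.
weights = soft_kmeans_assign([0.0, 0.1], [[0.0, 0.0], [1.0, 1.0]])
```

Lowering the temperature sharpens the assignment toward hard clustering; raising it smooths the gradient signal.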

Transformer Inference

Native support for Multi-Head Attention (MHA), AdaNorm, GELU, and advanced Softmax variants.
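For context, GELU is usually computed with the tanh approximation popularized by GPT-style models. This is a generic sketch of that formula, not LOOM's internal kernel.

```python
import math

def gelu(x):
    """tanh approximation of GELU: 0.5 * x * (1 + tanh(sqrt(2/pi) * (x + 0.044715 x^3)))."""
    return 0.5 * x * (1.0 + math.tanh(math.sqrt(2.0 / math.pi)
                                      * (x + 0.044715 * x ** 3)))
```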

Grid Softmax (MoE)

Native Mixture-of-Experts routing using Grid Softmax and Grid Scatter modes.
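The standard sparse MoE gate keeps only the top-k expert logits, renormalizes a softmax over them, and zeroes the rest. The sketch below shows that routing pattern in isolation; it is illustrative only and does not reproduce LOOM's Grid Softmax or Grid Scatter internals.

```python
import math

def top_k_gate(logits, k=2):
    """Keep the k largest logits, softmax over them, zero the rest.

    Each token is routed to k experts, weighted by the renormalized
    softmax scores; the remaining experts receive zero weight.
    """
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    m = max(logits[i] for i in top)          # shift for stability
    exps = {i: math.exp(logits[i] - m) for i in top}
    z = sum(exps.values())
    return [exps[i] / z if i in exps else 0.0 for i in range(len(logits))]

gates = top_k_gate([2.0, 0.5, 1.0, -1.0], k=2)
```

The zeroed entries mean the corresponding experts are never evaluated for that token, which is where sparse MoE gets its compute savings.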

Neural Tweening

Geometric weight interpolation combined with backpropagation for robust adaptation.
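The core idea of mixing interpolation with backprop can be sketched as one update rule: geometrically close part of the gap to a target weight vector, then apply an ordinary gradient step. The coefficients and structure below are hypothetical, chosen only to illustrate the combination.

```python
def tween_update(w, w_target, grad, alpha=0.1, lr=0.01):
    """One hybrid update: interpolate toward w_target, then take a
    gradient step.

    alpha controls how fast the geometric gap is closed; lr is the
    usual backprop learning rate. Either term alone gives plain
    interpolation or plain SGD.
    """
    return [wi + alpha * (ti - wi) - lr * gi
            for wi, ti, gi in zip(w, w_target, grad)]

w = tween_update([0.0, 1.0], [1.0, 1.0], [0.5, -0.5])
```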

Recursive Safetensors

Full support for saving and loading complex, nested architectures using the Safetensors format.

Verified Numerical Types

Comprehensive benchmarking for int8, uint16, float64, and more.

GPU Acceleration Update: As of v0.0.8, WebGPU acceleration is enabled for standard Forward/Backward. Note that Step-based execution, Neural Tweening, and K-Means currently run on CPU only.

Architecture

Beyond Traditional Networks

Loom introduces revolutionary concepts that break free from sequential bottlenecks, enabling parallel processing and spatial organization.

Scene 1

The Bottleneck

Traditional neural networks think sequentially—like cars at a red light. Every layer must lock, process, and wait. If one layer stalls, the entire intelligence grinds to a halt.

Scene 2

The Roundabout (Stepping Mode)

Loom replaces the red light with a roundabout. In Stepping Mode, data flow is continuous—early layers ingest new data while deeper layers think about previous context. Training and inference happen simultaneously.
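The roundabout picture corresponds to a systolic pipeline: on every tick, each layer consumes what its predecessor produced on the previous tick, so new data enters the front while older samples are still moving through deeper layers. The toy loop below illustrates that scheduling idea only; it is not LOOM's Stepping Mode implementation.

```python
def pipeline_step(layers, states, new_input):
    """Advance every layer one tick.

    states[i] holds what layer i produced last tick; each layer now
    consumes its predecessor's previous output, so the first layer
    ingests fresh data while deeper layers digest older context.
    """
    prev = list(states)
    states[0] = layers[0](new_input)
    for i in range(1, len(layers)):
        states[i] = layers[i](prev[i - 1])
    return states

layers = [lambda x: x + 1, lambda x: x * 2]
states = [0, 0]
pipeline_step(layers, states, 10)  # tick 1: layer 0 sees the new input
pipeline_step(layers, states, 20)  # tick 2: layer 1 sees layer 0's earlier output
```

Every layer does useful work on every tick, which is exactly what the red-light model of strict layer-by-layer locking prevents.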

Scene 3

The 3D Grid (Zig-Zag)

Traditional models are flat stacks of pancakes. Loom models live in a 3D grid. Signals don't just go up—they travel through space, zig-zagging through rows and columns for spatial organization.
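The zig-zag path over a grid is a boustrophedon traversal: sweep each row, reversing direction on alternate rows. This sketch shows that visiting order; LOOM's actual grid scheduler may differ.

```python
def zigzag_order(rows, cols):
    """Visit a rows x cols grid row by row, reversing direction on
    alternate rows - the path a zig-zagging signal would trace."""
    order = []
    for r in range(rows):
        cs = range(cols) if r % 2 == 0 else range(cols - 1, -1, -1)
        order.extend((r, c) for c in cs)
    return order

path = zigzag_order(2, 3)
```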

Scene 4

Infinite Connections (Starburst)

With Parallel Linking, any part of the brain can talk to any other part instantly. A central hub broadcasts simultaneously to multiple corners—skipping the line entirely.

TypeScript / Node.js
npm install @openfluke/welvet
Python
pip install welvet
Go / C-ABI
go get github.com/openfluke/loom
C# / .NET
dotnet add package Welvet
Capabilities

What Makes LOOM Different

Key points from the LOOM capability report: deterministic, cross-language, CPU-first.

Deterministic Everywhere

Bit-for-bit parity across Go, Python, TypeScript/JS, C#, C, and WASM. CPU-first with optional WebGPU acceleration.

Unified API Surface

Same function names across languages: create, forward, train, save/load, evaluate. One JSON model format.

Layer Coverage

Dense, Conv2D, Multi-Head Attention, RNN/LSTM, LayerNorm, Residual, RMSNorm, SwiGLU, Softmax (10 variants, MoE).

Stepping API

Fine-grained forward/backward with manual gradient application for online/real-time learning scenarios.
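The pattern a stepping API exposes is: one forward pass, one explicit gradient computation, one manual weight update per incoming sample. The self-contained loop below demonstrates that pattern on a one-weight least-squares model; the function names are illustrative, not welvet's real API.

```python
def manual_training_steps(w, samples, lr=0.1):
    """Online learning with explicit gradient application.

    For a one-weight model y = w * x with squared error, each incoming
    (x, target) pair triggers one forward pass, one gradient
    computation, and one manual weight update.
    """
    for x, target in samples:
        y = w * x                       # forward
        grad = 2.0 * (y - target) * x   # backward: d(loss)/dw
        w -= lr * grad                  # apply the gradient manually
    return w

# Repeatedly stepping on the same sample drives w toward target/x = 2.0.
w = manual_training_steps(0.0, [(1.0, 2.0)] * 50)
```

Because each update is applied explicitly, the caller decides when (and whether) gradients take effect, which is what makes online and real-time learning loops possible.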

Quick Start

Code Examples

TypeScript
import { createNetworkFromJSON, forward } from "@openfluke/welvet";

const model = {
  layers: [
    { type: "dense", width: 4, height: 8, activation: "relu" },
    { type: "dense", width: 8, height: 2, activation: "softmax" },
  ],
};

const net = createNetworkFromJSON(model);
const output = forward(net, [[0.1, 0.2, 0.3, 0.4]]);
console.log("output", output);
Python
from welvet import create_network_from_json, forward

config = {
    "layers": [
        {"type": "dense", "width": 4, "height": 8, "activation": "relu"},
        {"type": "dense", "width": 8, "height": 2, "activation": "softmax"},
    ]
}

net = create_network_from_json(config)
print(forward(net, [[0.1, 0.5, 0.3, 0.7]]))
Go
package main

import (
    "fmt"
    "github.com/openfluke/loom"
)

func main() {
    layers := []loom.LayerConfig{
        {Type: "dense", Width: 4, Height: 8, Activation: "relu"},
        {Type: "dense", Width: 8, Height: 2, Activation: "softmax"},
    }
    net, err := loom.BuildNetworkFromJSON(layers)
    if err != nil {
        panic(err)
    }
    out := loom.ForwardCPU(net, [][]float32{{0.1, 0.2, 0.3, 0.4}})
    fmt.Println(out)
}
Key Strengths

Unique Advantages

What sets Loom apart from traditional runtimes—built for embedding, designed for portability.

True Embeddability

Compiles into a single binary with zero external dependencies. No Python runtime, no C++ bridges—just deploy and run.

Run Anywhere (Polyglot)

First-class C ABI and WebAssembly support. Train and infer in browsers, Python, C#, Rust, and Node.js with identical behavior.

Hybrid Gradient/Geometric Engine

"Neural Tweening" combines geometric gap-closing with backprop-guided momentum. Features Link Budget telemetry and Explosion Detection for self-healing training.

Structural Parallelism

LayerParallel system supports arbitrary branching with Concat, Add, Average, Grid Scatter, and Softmax-Gated MoE. Native Inception, ResNeXt, and Siamese architectures.

Native Mixed-Precision

Generic tensor backend supports int8, uint16, and float32 natively. Quantization-aware training without post-processing wrappers.
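The essence of integer-native compute is that the multiply-accumulate happens entirely in integer arithmetic, with a per-tensor scale applied only at the end. The sketch below shows that mechanic for a single int8 dot product; it illustrates the general technique, not LOOM's tensor backend.

```python
def quantize(xs, scale):
    """Map floats to int8 codes: round(x / scale), clamped to [-128, 127]."""
    return [max(-128, min(127, round(x / scale))) for x in xs]

def int8_dot(q_x, q_w, scale_x, scale_w):
    """Dot product in integer arithmetic; only the final accumulator
    is rescaled back to a float."""
    acc = sum(a * b for a, b in zip(q_x, q_w))  # int32-style accumulator
    return acc * scale_x * scale_w

sx, sw = 0.01, 0.02
q_x = quantize([0.50, -0.25], sx)   # -> [50, -25]
q_w = quantize([1.00, 0.40], sw)    # -> [50, 20]
y = int8_dot(q_x, q_w, sx, sw)      # recovers 0.5*1.0 + (-0.25)*0.4 = 0.4
```

Training directly on such representations is what removes the separate float32-train / simulate / convert pipeline that QAT wrappers impose.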

Universal Tokenizer

Pure Go BPE implementation compatible with HuggingFace tokenizer.json files. No Rust or C++ dependencies required.
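A BPE tokenizer's core loop is simple: given a ranked merge list (the same ordered structure a tokenizer.json ships), repeatedly apply the lowest-ranked merge present in the word. This minimal Python sketch shows that loop; LOOM's Go implementation is separate.

```python
def apply_bpe(word, merges):
    """Greedily apply ranked merges to a list of symbols.

    merges maps a symbol pair to its rank (lower rank = applied first),
    mirroring the ordered merge list a BPE tokenizer ships with.
    """
    symbols = list(word)
    while len(symbols) > 1:
        pairs = [(merges[(a, b)], i)
                 for i, (a, b) in enumerate(zip(symbols, symbols[1:]))
                 if (a, b) in merges]
        if not pairs:
            break                       # no applicable merge remains
        _, i = min(pairs)               # best-ranked pair, leftmost on ties
        symbols[i:i + 2] = [symbols[i] + symbols[i + 1]]
    return symbols

merges = {("l", "o"): 0, ("lo", "w"): 1}
tokens = apply_bpe("low", merges)       # fully merges to one token
```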

Complete Training Infrastructure

7 LR schedulers, 3 optimizers (SGD/AdamW/RMSprop), 10 softmax variants for MoE routing, and 5 activation functions with proper derivatives.
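As one representative scheduler, cosine annealing decays the learning rate from a base value to a floor over a fixed horizon; the other schedulers follow the same step-to-rate pattern. This is a generic formula sketch, not LOOM's scheduler API.

```python
import math

def cosine_lr(step, total_steps, base_lr, min_lr=0.0):
    """Cosine annealing: decay base_lr to min_lr over total_steps."""
    t = min(step, total_steps) / total_steps
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * t))

lrs = [cosine_lr(s, 100, 0.1) for s in (0, 50, 100)]
```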

Telemetry & Introspection

Runtime reflection via GetMethodsJSON(), ExtractNetworkBlueprint() for visualizing structure, and complete evaluation suite with deviation metrics.

Known Limitations: No central Model Zoo (relies on external checkpoints), WebGPU acceleration is beta/experimental, and broad operator coverage (3D Conv, Deformable Attention, FFTs) is limited compared to SciPy/JAX.

Comparison

The AI Landscape

See how Loom compares to major industry engines and the Go ecosystem.

Feature Loom (Go) PyTorch TensorFlow GoMLX Spago TF.js Candle
Core
Runtime Dependency None (Binary) Heavy (Pip) Binary CGo/XLA None Browser None
Auto-Differentiation ⚠️ Hybrid ✅ Full ✅ Full ✅ Full ⚠️ Manual ✅ Full ✅ Full
Loading & Format
Safetensors ✅ Native
Structure Inference ✅ Auto-Detect
Training
Neural Tweening ✅ Hybrid Engine
LR Schedulers ✅ 7 Types ⚠️ Basic
Layers
Parallel / MoE ✅ Structural ⚠️ Manual ⚠️ Manual
SwiGLU ✅ Native
Pure Go Tokenizer Rust/C++ C++
Platform
WASM Training ✅ Full ⚠️ Slow
Cross-Lang C-ABI ✅ Universal ⚠️
Advanced
Step-Based Forward ✅ Unique
Dynamic Arch Gen ✅ Built-in
Network Grafting ✅ Unique
Feature Loom GoMLX Gorgonia Spago Go-Deep Gonum
Foundation
Implementation Pure Go CGo (XLA) Pure Go + CGo Pure Go Pure Go Pure Go
Autograd ⚠️ Hybrid ✅ Full ✅ Symbolic ✅ Dynamic ✅ Backprop
Model Loading
Safetensors ✅ Native
Architectures
Transformer (MHA) ✅ Explicit ⚠️ Hard ✅ (BERT)
RNN / LSTM ✅ Full Gate ⚠️ Basic ✅ BiLSTM
SwiGLU
Parallel / MoE ✅ Structural ⚠️ Manual
Training
Hybrid Tweening ✅ Unique
Softmax Variants ✅ 10 Types ⚠️ Standard ⚠️ Standard ⚠️ Standard ⚠️ Standard
Advanced
RoPE (GQA) ✅ GQA Support
Network Grafting ✅ Unique
Step-Based Forward ✅ Unique
Dynamic Arch Gen ✅ Unique
Platform
C-ABI (Polyglot) ✅ Universal
WASM Training ✅ Full ❌ (XLA)
Ecosystem
Maintenance 🔥 Active 🔥 Active ⚠️ Slow ⏸️ Paused ⚠️ Slow 🔥 Active
Precision

Native Numerical Type Support

Train and infer on any numerical type without wrappers or post-processing. Most runtimes require QAT (Quantization-Aware Training)—a multi-step process where you train in float32, then simulate lower precision during fine-tuning, then convert to int8 for deployment. Loom skips this entirely: define your types upfront and train natively on int8, uint16, or any supported type from the start.

Numerical Type Loom GoMLX Gorgonia Spago PyTorch
Float32 (Standard) ✅ (Float64)
Float64 (High Precision) ✅ Native
Float16 / BF16 ✅ (XLA)
Int8 Training ✅ Native ⚠️ QAT Wrapper
Int8 Inference ✅ (Quant)
Int16, Int32, Int64 ✅ Native ✅ (XLA) ⚠️ Tensor ❌ Tensor Only
Uint8, Uint16, Uint32 ✅ Native ✅ (XLA) ⚠️ Tensor ✅ Uint8 Only

Complete Type System

Unlike runtimes that treat integers primarily as storage formats for quantization, Loom's Generics allow native training and inference on exotic types like uint16 (common in medical imaging), int32, or float64 (scientific simulations) across every layer type without changes to model code.

Verdict

When to Choose Each Engine

The right tool depends on your use case. Here's a quick decision guide.

Choose PyTorch

For research, SOTA models, or complex dynamic architectures requiring the Python ecosystem.

Choose TensorFlow / TFLite

For robust mobile/edge deployment with a mature toolchain and optimized inference.

Choose GoMLX

For high-performance training in Go when you can tolerate CGo and XLA C++ dependencies.

Choose Core ML

For iOS/macOS exclusive apps leveraging Apple's Neural Engine and native integration.

Choose Loom

For pure Go-native embedding (cloud/CLI/server), zero-dependency single binaries, Neural Tweening experimentation, Step-Based Forward for real-time inference, or Dynamic Architecture Generation for automated model exploration.

Resources

More Resources