Understanding Model Serialization
This guide explains how Loom saves and loads neural network models—what actually gets stored, how the formats work, and how to use serialization for different deployment scenarios.
What Gets Saved?
When you save a model, you're capturing:
- Architecture: The structure of the network (grid size, layer types, configuration)
- Weights: The learned parameters (millions of floating point numbers)
- Metadata: Version info, model ID, creation time
Saved Model File
┌─────────────────────────────────────────────────────────────────┐
│ │
│ Architecture │
│ ┌───────────────────────────────────────────────────────────┐ │
│ │ Grid: 2×3, LayersPerCell: 2 │ │
│ │ │ │
│ │ Layer[0,0,0]: Dense, 1024→512, ReLU │ │
│ │ Layer[0,0,1]: Dense, 512→256, ReLU │ │
│ │ Layer[0,1,0]: Attention, heads=8, dim=256 │ │
│ │ ... │ │
│ └───────────────────────────────────────────────────────────┘ │
│ │
│ Weights (encoded) │
│ ┌───────────────────────────────────────────────────────────┐ │
│ │ Format: Base64 │ │
│ │ Data: "eyJ0eXBlIjoiZmxvYXQzMi1hcnJheSIsImxl..." │ │
│ │ (millions of numbers compressed to text) │ │
│ └───────────────────────────────────────────────────────────┘ │
│ │
│ Metadata │
│ ┌───────────────────────────────────────────────────────────┐ │
│ │ ID: "my-classifier-v1" │ │
│ │ Type: "modelhost/bundle" │ │
│ │ Version: 1 │ │
│ └───────────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────┘
The Bundle Format
Loom uses a "bundle" format that can contain multiple models. This is useful for:
- Encoder-decoder pairs
- Ensemble models
- Different versions of the same model
{
"type": "modelhost/bundle",
"version": 1,
"models": [
{
"id": "encoder",
"cfg": { ... architecture ... },
"weights": { ... encoded weights ... }
},
{
"id": "decoder",
"cfg": { ... architecture ... },
"weights": { ... encoded weights ... }
}
]
}
Even single models use this format (just with one entry in the models array). This keeps the format consistent.
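If you ever need to work with the bundle JSON directly, here is a sketch of Go types that mirror the structure above. The struct and field names are illustrative, not Loom's internal definitions:

import "encoding/json"

// Bundle mirrors the top-level JSON shown above.
// Type names here are illustrative, not Loom's internals.
type Bundle struct {
    Type    string       `json:"type"`    // "modelhost/bundle"
    Version int          `json:"version"` // format version (currently 1)
    Models  []ModelEntry `json:"models"`  // one entry per model
}

// ModelEntry holds one model's architecture and weights.
type ModelEntry struct {
    ID      string          `json:"id"`      // e.g. "encoder"
    Cfg     json.RawMessage `json:"cfg"`     // architecture config
    Weights json.RawMessage `json:"weights"` // Base64-encoded tensors
}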
How Weights Are Encoded
The challenging part is encoding millions of floating-point numbers efficiently. Here's what happens:
Step 1: Binary Conversion
Float32 values are converted to their binary representation:
Float32: 3.14159...
In memory (IEEE 754):
Sign: 0 (positive)
Exponent: 10000000 (128, meaning 2^1)
Mantissa: 10010010000111111011011
Binary bytes: [0x40, 0x49, 0x0F, 0xDB]
Step 2: Base64 Encoding
Binary data is converted to text using Base64 (only uses safe ASCII characters):
Binary bytes: [0x40, 0x49, 0x0F, 0xDB, ...]
↓
Base64 string: "QEkP2w==" (roughly 33% larger than binary)
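A minimal sketch of both steps in Go, using only the standard library. Big-endian byte order is used here to match the bytes shown above; Loom's actual encoder may use a different byte order or framing:

import (
    "encoding/base64"
    "encoding/binary"
    "math"
)

// encodeWeights: float32 → bytes → Base64 text (Steps 1 and 2).
func encodeWeights(weights []float32) string {
    buf := make([]byte, 4*len(weights))
    for i, w := range weights {
        binary.BigEndian.PutUint32(buf[i*4:], math.Float32bits(w))
    }
    return base64.StdEncoding.EncodeToString(buf)
}

// decodeWeights reverses the process: Base64 → bytes → float32.
func decodeWeights(s string) ([]float32, error) {
    buf, err := base64.StdEncoding.DecodeString(s)
    if err != nil {
        return nil, err
    }
    weights := make([]float32, len(buf)/4)
    for i := range weights {
        weights[i] = math.Float32frombits(binary.BigEndian.Uint32(buf[i*4:]))
    }
    return weights, nil
}

// encodeWeights([]float32{3.14159265}) == "QEkP2w=="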
Why Base64?
JSON can't directly contain binary data (binary bytes might include special characters). Base64 ensures the weights are safe to embed in JSON:
Direct binary in JSON: BROKEN
{"weights": "AB\x00CD\xFF..."}
↑ ↑
Null byte Invalid UTF-8
breaks JSON breaks JSON
Base64 in JSON: SAFE
{"weights": "QEkP2w/rABC..."}
↑
Only letters, numbers, +, /, =
File-Based Serialization
Saving a Single Model
err := network.SaveModel("model.json", "my-classifier")
What happens:
1. Network traverses all layers
2. For each layer: records type, size, activation
3. For each weight matrix: converts to bytes, then Base64
4. Writes JSON to file
Network in memory model.json on disk
┌─────────────────┐ ┌─────────────────────┐
│ Grid: 2×2 │────────────────▶│ { │
│ Layers: [...] │ SaveModel() │ "type": "...", │
│ Weights: [...] │ │ "models": [...] │
└─────────────────┘ │ } │
└─────────────────────┘
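In practice you will also want error handling and to ensure the target directory exists. os.MkdirAll is from the standard library, not a Loom API:

import (
    "log"
    "os"
)

if err := os.MkdirAll("checkpoints", 0o755); err != nil {
    log.Fatal(err)
}
if err := network.SaveModel("checkpoints/model.json", "my-classifier"); err != nil {
    log.Fatalf("save failed: %v", err)
}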
Loading a Model
network, err := nn.LoadModel("model.json", "my-classifier")
What happens:
1. Reads JSON from file
2. Parses architecture configuration
3. Creates empty network with correct structure
4. Decodes Base64 weights back to floats
5. Populates network with weights
model.json Network in memory
┌─────────────────────┐ ┌─────────────────┐
│ { │ │ Grid: 2×2 │
│ "models": [{ │──────────▶ │ Layers: [Dense, │
│ "cfg": {...}, │ LoadModel()│ Softmax]│
│ "weights": "..."│ │ Weights: [1.2, │
│ }] │ │ -0.5, │
│ } │ │ ...] │
└─────────────────────┘ └─────────────────┘
JSON Configuration (Advanced)
Loom allows you to define complex architectures directly in JSON using nn.BuildNetworkFromJSON. This is particularly useful for recursive and parallel structures.
KMeans Layer Fields
| Field | Type | Description |
|---|---|---|
| type | string | Must be "kmeans" |
| num_clusters | int | Number of centroids (K) |
| kmeans_output_mode | string | "probabilities", "features", or "reconstruction" |
| attached_layer | object | Recursive: a full LayerDefinition for the internal sub-network |
Example:
{
"type": "kmeans",
"num_clusters": 8,
"attached_layer": {
"type": "dense", "input_size": 16, "output_size": 16, "activation": "tanh"
}
}
Parallel Layer (Gated MoE) Fields
| Field | Type | Description |
|---|---|---|
| type | string | Must be "parallel" |
| branches | array | List of LayerDefinition objects for each expert |
| combine_mode | string | "concat", "add", "avg", "filter", or "grid_scatter" |
| filter_gate | object | Gating: a LayerDefinition for the gate network (if mode is "filter") |
| filter_softmax | string | Gating normalization: "standard", "sparsemax", etc. |
| filter_temperature | float | Softness/sharpness of routing (default: 1.0) |
Example:
{
"type": "parallel",
"combine_mode": "filter",
"filter_softmax": "standard",
"filter_gate": { "type": "dense", "input_size": 16, "output_size": 2 },
"branches": [
{ "type": "dense", "output_size": 8 },
{ "type": "kmeans", "num_clusters": 4 }
]
}
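A usage sketch for building a network from these definitions. The exact signature of nn.BuildNetworkFromJSON is assumed here (taking the JSON definition and returning a network plus an error); consult the API reference for the real one:

def := `{
  "type": "parallel",
  "combine_mode": "filter",
  "filter_softmax": "standard",
  "filter_gate": { "type": "dense", "input_size": 16, "output_size": 2 },
  "branches": [
    { "type": "dense", "output_size": 8 },
    { "type": "kmeans", "num_clusters": 4 }
  ]
}`

// Assumed signature: BuildNetworkFromJSON(def string) (*Network, error)
network, err := nn.BuildNetworkFromJSON(def)
if err != nil {
    log.Fatal(err)
}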
String-Based Serialization
Sometimes you don't have a file system—for example, in WebAssembly or when sending models over a network.
Saving to String
jsonString, err := network.SaveModelToString("my-classifier")
// jsonString is now a complete JSON representation
The output is exactly the same as file-based, but returned as a string instead of written to disk.
Loading from String
network, err := nn.LoadModelFromString(jsonString, "my-classifier")
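This makes it easy to move models without touching disk. For example, here is a sketch of an HTTP endpoint that serves the current model; the handler wiring is illustrative, while SaveModelToString is the documented call (network is assumed to be in scope):

import (
    "io"
    "net/http"
)

http.HandleFunc("/model", func(w http.ResponseWriter, r *http.Request) {
    jsonString, err := network.SaveModelToString("my-classifier")
    if err != nil {
        http.Error(w, err.Error(), http.StatusInternalServerError)
        return
    }
    w.Header().Set("Content-Type", "application/json")
    io.WriteString(w, jsonString)
})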
Use Cases
File-based: String-based:
┌─────────────┐ ┌─────────────┐
│ Desktop app │ │ Browser │
│ Server │ │ WebAssembly │
│ CLI tools │ │ REST API │
│ Notebooks │ │ Database │
└─────────────┘ │ Serverless │
│ Mobile │
└─────────────┘
File-based works when you have String-based works when
disk access and want persistence. you need to move models
around without files.
Precision Options
Loom supports the full spectrum of 13 Safetensors DTypes, ranging from double-precision to 4-bit quantization:
Precision Size Range Use case
─────────────────────────────────────────────────────────────────
float64 (F64) 8 bytes ±10^308 High-precision research
float32 (F32) 4 bytes ±10^38 Standard training
float16 (F16) 2 bytes ±65504 Standard inference, GPU
bfloat16 (BF16) 2 bytes ±10^38 Modern LLM inference
float4 (F4) 0.5 bytes [0.25, 3.0] High-compression (8x vs F32)
int64 (I64) 8 bytes ±9 quintillion Large integer networks
int32 (I32) 4 bytes ±2 billion Standard integer networks
int16 (I16) 2 bytes ±32767 Quantized models
int8 (I8) 1 byte ±127 Edge devices, mobile
uint64 (U64) 8 bytes 0 to 18 quintillion Unsigned offsets
uint32 (U32) 4 bytes 0 to 4 billion Unsigned indices
uint16 (U16) 2 bytes 0 to 65535 Unsigned textures/images
uint8 (U8) 1 byte 0 to 255 Standard image data
FP4 (4-bit Float) Support
Loom implements the E2M1 (1 sign bit, 2 exponent bits, 1 mantissa bit) format for extreme model compression. By utilizing a "Shift-and-Scale" quantization strategy, it can maintain >99% quality while using only 0.5 bytes per parameter.
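To make E2M1 concrete, here is an illustrative decoder for a single 4-bit code. The exponent bias of 2 is inferred from the [0.25, 3.0] range in the table above; Loom's actual codec (including its Shift-and-Scale step) may differ:

import "math"

// decodeE2M1 turns a 4-bit code (1 sign, 2 exponent, 1 mantissa bit)
// into a float32, assuming exponent bias 2.
func decodeE2M1(nibble uint8) float32 {
    sign := float32(1)
    if nibble&0x8 != 0 {
        sign = -1
    }
    exp := (nibble >> 1) & 0x3 // 2 exponent bits
    mant := nibble & 0x1       // 1 mantissa bit

    if exp == 0 {
        // Subnormal: 0 or the smallest step (0.25 under this bias)
        return sign * float32(mant) * 0.25
    }
    // Normal: (1 + mant/2) * 2^(exp-2), giving 0.5 ... 3.0
    return sign * (1 + float32(mant)*0.5) *
        float32(math.Pow(2, float64(exp)-2))
}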
How Quantization Works (int8 Example)
Original weights (float32): [0.523, -0.127, 0.891, ...]
↓
Quantize to int8
↓
Quantized (int8): [67, -16, 114, ...]
+ scale factor: 0.00785
+ zero point: 0
↓
Encode to file
↓
On disk: compact int8 representation (4× smaller!)
↓
Decode on load
↓
Restored float32: [0.526, -0.126, 0.895, ...]
(small precision loss)
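A minimal symmetric int8 quantizer in Go, illustrating the scale-factor scheme above (zero point 0). Loom's internal encoder may differ in details such as per-channel scales:

import "math"

// quantizeInt8 maps float32 weights onto int8 with a single scale
// factor chosen so the largest magnitude lands at ±127.
func quantizeInt8(weights []float32) (q []int8, scale float32) {
    var maxAbs float32
    for _, w := range weights {
        if a := float32(math.Abs(float64(w))); a > maxAbs {
            maxAbs = a
        }
    }
    if maxAbs == 0 {
        maxAbs = 1 // avoid division by zero for all-zero tensors
    }
    scale = maxAbs / 127
    q = make([]int8, len(weights))
    for i, w := range weights {
        q[i] = int8(math.Round(float64(w / scale)))
    }
    return q, scale
}

// dequantizeInt8 restores approximate float32 values on load.
func dequantizeInt8(q []int8, scale float32) []float32 {
    out := make([]float32, len(q))
    for i, v := range q {
        out[i] = float32(v) * scale
    }
    return out
}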
Size Comparison
1 million weights:
float64: 8 MB
float32: 4 MB ← Standard
float16: 2 MB ← GPU inference
int8: 1 MB ← Edge/mobile
For a 7B parameter model:
float32: 28 GB
int8: 7 GB ← Actually deployable on consumer hardware!
Loading External Models: SafeTensors
HuggingFace models are often stored in "SafeTensors" format. Loom can load these directly.
What is SafeTensors?
SafeTensors is a simple format for storing tensors:
SafeTensors File Structure:
┌───────────────────────────────────────────────────────┐
│ Header (JSON) │
│ ┌───────────────────────────────────────────────────┐ │
│ │ { │ │
│ │ "model.embed_tokens.weight": { │ │
│ │ "dtype": "F16", │ │
│ │ "shape": [151552, 576], │ │
│ │ "data_offsets": [0, 174635008] │ │
│ │ }, │ │
│ │ "model.layers.0.self_attn.q_proj.weight": { │ │
│ │ "dtype": "F16", │ │
│ │ "shape": [576, 576], │ │
│ │ "data_offsets": [174635008, 174967424] │ │
│ │ }, │ │
│ │ ... │ │
│ │ } │ │
│ └───────────────────────────────────────────────────┘ │
├───────────────────────────────────────────────────────┤
│ Binary Data │
│ ┌───────────────────────────────────────────────────┐ │
│ │ [raw bytes for embed_tokens.weight] │ │
│ │ [raw bytes for q_proj.weight] │ │
│ │ [raw bytes for k_proj.weight] │ │
│ │ ... │ │
│ └───────────────────────────────────────────────────┘ │
└───────────────────────────────────────────────────────┘
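One detail the diagram glosses over: the file actually begins with an 8-byte little-endian integer giving the header's length. Here is a standalone sketch that reads just the header, independent of Loom's own loader:

import (
    "encoding/binary"
    "encoding/json"
    "io"
    "os"
)

// readSafetensorsHeader returns the raw JSON entry for each tensor.
func readSafetensorsHeader(path string) (map[string]json.RawMessage, error) {
    f, err := os.Open(path)
    if err != nil {
        return nil, err
    }
    defer f.Close()

    // First 8 bytes: header length as little-endian uint64.
    var lenBuf [8]byte
    if _, err := io.ReadFull(f, lenBuf[:]); err != nil {
        return nil, err
    }
    headerLen := binary.LittleEndian.Uint64(lenBuf[:])

    // Next headerLen bytes: the JSON header itself.
    headerBytes := make([]byte, headerLen)
    if _, err := io.ReadFull(f, headerBytes); err != nil {
        return nil, err
    }

    var header map[string]json.RawMessage
    err = json.Unmarshal(headerBytes, &header)
    return header, err
}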
Loading & Saving Network Weights
While LoadSafeTensors handles raw tensors, the Network object provides high-level methods to directly save and load model weights using the Safetensors format.
// 1. Save network weights in Safetensors format
//    (supported DTypes include "F32", "F64", "F16", "BF16", "F4", "I8")
err := network.SaveWeightsToSafetensors("model.safetensors")

// 2. Load weights back into an existing network architecture
err = network.LoadWeightsFromSafetensors("model.safetensors")
These methods are the preferred way to handle model persistence in Loom, as they handle byte conversion and layer mapping automatically.
Data Type Handling
SafeTensors may store weights in float16 or bfloat16. Loom automatically converts:
File contains: float16 (2 bytes each)
↓
Auto-convert
↓
In memory: float32 (4 bytes each)
You don't need to worry about the conversion!
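For the curious, this is what the widening looks like at the bit level. Loom handles it for you; the function below is purely illustrative:

import "math"

// f16ToF32 widens an IEEE 754 half-precision value to float32
// by rebiasing the exponent (15 → 127) and shifting the mantissa.
func f16ToF32(h uint16) float32 {
    sign := uint32(h>>15) & 1
    exp := uint32(h>>10) & 0x1F
    mant := uint32(h) & 0x3FF

    var bits uint32
    switch {
    case exp == 0 && mant == 0: // signed zero
        bits = sign << 31
    case exp == 0x1F: // Inf / NaN
        bits = sign<<31 | 0xFF<<23 | mant<<13
    case exp == 0: // subnormal: renormalize
        e := uint32(127 - 15 + 1)
        for mant&0x400 == 0 {
            mant <<= 1
            e--
        }
        bits = sign<<31 | e<<23 | (mant&0x3FF)<<13
    default: // normal: rebias exponent
        bits = sign<<31 | (exp+127-15)<<23 | mant<<13
    }
    return math.Float32frombits(bits)
}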
Generic Model Loading
What if you have an unknown model format? Loom can auto-detect:
network, detected, err := nn.LoadGenericFromBytes(weightsData, configData)
The Detection Process
Input: mystery safetensors file
│
▼
┌─────────────────────────────────────────────┐
│ 1. Parse safetensors header │
│ • Extract tensor names and shapes │
└─────────────────────┬───────────────────────┘
│
▼
┌─────────────────────────────────────────────┐
│ 2. Analyze tensor names │
│ • "model.layers.0.self_attn.q_proj" │
│ → Looks like attention │
│ • "model.embed_tokens" │
│ → Looks like embedding │
│ • "model.layers.0.mlp.gate_proj" │
│ → Looks like SwiGLU │
└─────────────────────┬───────────────────────┘
│
▼
┌─────────────────────────────────────────────┐
│ 3. Build network architecture │
│ • Create layers matching detected types │
│ • Wire them together appropriately │
│ • Load weights into correct layers │
└─────────────────────────────────────────────┘
│
▼
Ready-to-use Network!
The detected Return Value
The function returns detected tensor info for inspection:
for _, t := range detected {
fmt.Printf("%s: %v (%s)\n", t.Name, t.Shape, t.Type)
}
// Output:
// model.embed_tokens.weight: [151552, 576] (Embedding)
// model.layers.0.self_attn.q_proj.weight: [576, 576] (Attention)
// model.layers.0.self_attn.k_proj.weight: [192, 576] (Attention)
// ...
Transformer-Specific Loading
For Llama-style transformers, there's a specialized loader:
network, err := nn.LoadTransformerFromSafetensors("./models/llama-7b/")
What It Understands
Llama Architecture Pattern:
model.embed_tokens.weight → Embedding layer
model.norm.weight → Final RMSNorm
lm_head.weight → Output projection
For each of N layers:
model.layers.{i}.input_layernorm.weight → Pre-attention norm
model.layers.{i}.self_attn.q_proj.weight → Query projection
model.layers.{i}.self_attn.k_proj.weight → Key projection
model.layers.{i}.self_attn.v_proj.weight → Value projection
model.layers.{i}.self_attn.o_proj.weight → Output projection
model.layers.{i}.post_attention_layernorm.weight → Pre-MLP norm
model.layers.{i}.mlp.gate_proj.weight → SwiGLU gate
model.layers.{i}.mlp.up_proj.weight → SwiGLU up
model.layers.{i}.mlp.down_proj.weight → SwiGLU down
Supported Models
- Llama, Llama 2, Llama 3
- Mistral
- Qwen2.5
- TinyLlama
- SmolLM
- Any model using the Llama architecture
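A quick smoke-test sketch after loading one of these checkpoints. The import path, the model directory, and the 576-wide input are illustrative (576 matches the shapes shown earlier; adjust per model):

package main

import (
    "fmt"
    "log"

    "github.com/openfluke/loom/nn" // import path assumed
)

func main() {
    network, err := nn.LoadTransformerFromSafetensors("./models/TinyLlama-1.1B/")
    if err != nil {
        log.Fatalf("load failed: %v", err)
    }
    // Zeroed input sized to the model's hidden width, just to verify
    // the forward pass runs end to end.
    input := make([]float32, 576)
    output, err := network.ForwardCPU(input)
    if err != nil {
        log.Fatalf("forward failed: %v", err)
    }
    fmt.Println("first outputs:", output[:5])
}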
Cross-Platform Deployment
One of Loom's strengths is that saved models work across all platforms:
Model saved in Go
│
▼
model.json
│
├──────────────────────────────────────────┐
│ │ │
▼ ▼ ▼
┌─────────────┐ ┌─────────────┐ ┌─────────────┐
│ Go │ │ Browser │ │ Python │
│ Native │ │ WASM │ │ welvet │
└─────────────┘ └─────────────┘ └─────────────┘
│ │ │
▼ ▼ ▼
┌─────────────┐ ┌─────────────┐ ┌─────────────┐
│ C/C++ │ │ TypeScript │ │ C# │
│ via CABI │ │ bindings │ │ Welvet │
└─────────────┘ └─────────────┘ └─────────────┘
Loading in Different Languages
Go (native):
network, _ := nn.LoadModel("model.json", "my_model")
output, _ := network.ForwardCPU(input)
JavaScript (WASM):
const json = await fetch('model.json').then(r => r.text());
const network = loom.LoadNetworkFromString(json, "my_model");
const output = network.ForwardCPU(inputArray);
Python (welvet):
with open("model.json") as f:
    json_str = f.read()
network = welvet.load_model_from_string(json_str, "my_model")
output = network.forward_cpu(input_array)
C (CABI):
char* json = read_file("model.json");
Network* net = LoomLoadModel(json, "my_model");
float* output = LoomForward(net, input, input_len);
Practical Tips
Versioning Models
Include version in the model ID:
network.SaveModel("checkpoints/model_v2.1.0.json", "classifier_v2.1.0")
Checkpointing During Training
Save periodically to recover from crashes:
for epoch := 0; epoch < 1000; epoch++ {
// ... training ...
if epoch % 100 == 0 {
network.SaveModel(
fmt.Sprintf("checkpoints/epoch_%04d.json", epoch),
"training_checkpoint",
)
}
}
Validating Loaded Models
After loading, verify the model works:
loaded, err := nn.LoadModel("model.json", "my_model")
if err != nil {
    return err
}
// Test with known input
testInput := make([]float32, 1024)
output, err := loaded.ForwardCPU(testInput)
if err != nil {
    return err
}
// Check output is reasonable (not NaN, not all zeros)
allZero := true
for _, v := range output {
    if math.IsNaN(float64(v)) {
        return errors.New("model produces NaN")
    }
    if v != 0 {
        allZero = false
    }
}
if allZero {
    return errors.New("model produces all zeros")
}
Summary
Serialization captures the complete state of a trained model:
- Architecture: Layer types, sizes, configurations
- Weights: Millions of learned parameters
- Format: JSON with Base64-encoded binary weights
Key operations:
- SaveModel / LoadModel - File-based
- SaveModelToString / LoadModelFromString - String-based
- LoadSafeTensors - HuggingFace format
- LoadGenericFromBytes - Auto-detect format
- LoadTransformerFromSafetensors - Llama-style models
The same model file works across Go, WASM, Python, C#, and C.