ENTITY format (.entity)
Every Numerical Type In Native TopologY
Native Loom checkpoint files. One .entity file = one saved brain: full volumetric topology + all native-packed weights in a single binary artifact.
Implementation: poly/entity.go
Validated in Lucy menu [7] Seven-layer CPU suite — JSON and .entity save/reload run side by side for all 21 dtypes (lucy/examples/seven_layer/runner.go). Lucy [8] ENTITY Talk converts HF LLMs to .entity and runs GPU chat from native checkpoints (lucy/hf_entity.go).
Why we built this
HuggingFace .safetensors is the right import lane for PyTorch/HF checkpoints. Loom uses it for model download and HF decode (poly/safetensors.go, SoulGlitch, LoomCreateLLM).
It is not a native Loom checkpoint format. That is not a bug in SafeTensors — Loom simply does more than flat named tensors:
| Loom needs | SafeTensors |
|---|---|
| 21 DTypes with native on-disk packing (Int4 nibbles, Binary 8:1, Ternary, FP4, …) | Fixed HF dtype strings; sub-byte types are awkward; Loom export is F32-only |
| Per-layer Scale (quant mapping used at save time) | No standard field |
| Volumetric grid (Z, Y, X, L) per layer | Flat string keys only (model.layers.0…) |
| Topology — parallel branches, sequential stacks, metacognition | Requires separate config.json; no recursion |
| Bit-perfect reload of trained native dtypes | Import path usually decodes to FP32 master |
We already had full fidelity in persistence.go (SerializeNetwork / DeserializeNetwork) — JSON + Base64 native blobs. That works (Lucy save/reload PASS on all 21 dtypes) and remains the transparent debug lane, but it is large and slow for shipping brains to phones or edge nodes.
ENTITY is the native binary path:
- Same semantics as JSON persistence (same topology spec, same packing rules)
- SafeTensors-like wire safety: length-prefixed header + indexed blob section
- Raw weight bytes (no Base64)
- Different file extension so HF tooling does not assume HuggingFace semantics
```text
Import: model.safetensors ← HF yarn (read-only in product flow)
Native: fluffy.entity ← ENTITY (train, save, reload, ship)
Debug: model.json ← JSON persistence (same brain, verbose)
```
One file = topology + weights
Unlike HuggingFace’s split of config.json + model.safetensors, ENTITY keeps everything together:
```text
┌─────────────────────────────────────────┐
│ Fixed header (magic, version, flags) │
├─────────────────────────────────────────┤
│ JSON header │
│ • network topology (grid + layers) │ ← PersistenceNetworkSpec
│ • blob index (path, offset, dtype…) │
├─────────────────────────────────────────┤
│ Binary payload │ ← native-packed weight blobs
└─────────────────────────────────────────┘
```
After LoadEntity, you get a full VolumetricNetwork — grid dimensions, every layer’s type/activation/dtype/config, recursive branches, and quantized weights ready for forward or further training.
The unlock: HF models as native citizens
ENTITY is not only a smaller checkpoint format. It is the bridge that moves real LLM weights from HuggingFace’s flat tensor world into Loom’s volumetric brain format — the same container Lucy [7] uses for 3D grids, parallel branches, and per-layer dtypes.
Before vs after
Before — two separate worlds:
```text
HF .safetensors → flat tensor names → Poly Talk reads every run
↓
foreign format (no grid, no branches, no per-layer dtype in one file)
Lucy [7] seven-layer suite → 2×2×2 grids, remote links, 21 dtypes
↓
synthetic trained brains only
```
After — one native lane:
```text
HF snapshot → convert once → .entity → VolumetricNetwork + transformer globals
↓
same format as Lucy [7] save/reload
chat without HF weights at runtime (Lucy [8] ENTITY Talk)
```
HuggingFace models are no longer guests. They are .entity citizens — reloadable, trainable, graftable, and eligible for every volumetric feature the stack already implements.
The arc: simple → native → experimental → full 3D
| Stage | What it is | Status |
|---|---|---|
| Simple | Flat HF decoder; Poly Talk loads safetensors each run | ✅ Shipped |
| Native | HF → .entity; Q4 baked for decoder blocks; GPU chat from Lucy-owned checkpoint | ✅ Shipped (Lucy [8] ENTITY Talk) |
| Experimental | Graft, parallel branches, remote links, per-layer dtype mixes on imported LLM weights | 🔓 Unlocked in format + API; product UI not built |
| Full 3D | LLM blocks as cells in a (Z,Y,X,L) grid: experts, hops, evolution around a frozen core | 🔮 Next chapter |
Lucy [7] proved the volumetric stack on small trained grids. Lucy [8] brings real LLM weights into that same format. The chat path today is still a flat decoder layout; the container is already the full brain OS.
HF import layout today
ImportHFToEntity (hf_import.go) maps a Llama-style stack into a 1×1×1 grid with four sub-layers per block (pre-norm, MHA, post-norm, SwiGLU):
```go
net := NewVolumetricNetwork(1, 1, 1, dims.NumLayers*4)
InitHFDecoderBlocks(net, dims)
```
ENTITY Talk chat uses this linear layout. Nothing in the format prevents expanding to 2×2×N, parallel experts, or remote links — that is topology editing on a loaded VolumetricNetwork, then SaveEntityTransformer.
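The layout implies a simple index rule: sub-layer k of decoder block b sits at L = 4b + k, so a 24-block model becomes a 96-layer stack. A minimal Go sketch, assuming 0-based indices and the sub-layer order listed above; layerIndex, blockOf, and the sub* constants are hypothetical helpers, not part of poly.

```go
package main

import "fmt"

// Sub-layer order inside one decoder block (pre-norm, MHA, post-norm, SwiGLU).
// These names are illustrative, not poly identifiers.
const (
	subPreNorm = iota
	subMHA
	subPostNorm
	subSwiGLU
	subLayersPerBlock // = 4
)

// layerIndex returns the L coordinate of sub-layer sub in decoder block
// block, for a 1×1×1 grid with four sub-layers per block.
func layerIndex(block, sub int) int {
	return block*subLayersPerBlock + sub
}

// blockOf inverts layerIndex: which block and sub-layer a flat L belongs to.
func blockOf(l int) (block, sub int) {
	return l / subLayersPerBlock, l % subLayersPerBlock
}

func main() {
	numLayers := 24 // decoder blocks (dims.NumLayers); example value
	fmt.Println("total sub-layers:", numLayers*subLayersPerBlock)
	fmt.Println("block 2 MHA is at L =", layerIndex(2, subMHA))
	b, s := blockOf(95)
	fmt.Println("L = 95 is block", b, "sub-layer", s)
}
```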
What the format unlocks
| Capability | Supported by format / poly | ENTITY Talk UI today |
|---|---|---|
| Save / reload / train native state | ✅ | Convert + chat only |
| Different dtype per layer in one file | ✅ | Q4 decoder when user picks INT4 at convert |
| Parallel branches / MoE-style filter gates | ✅ parallel.go | ❌ |
| Spatial hops (IsRemoteLink) | ✅ dispatch.md | ❌ |
| Graft multiple networks into one parallel layer | ✅ grafting.go | ❌ |
| NEAT / topology evolution | ✅ evolution.md | ❌ |
| Selective layer load + block-wise GPU upload | ✅ DeserializeEntityWithOptions | ✅ block upload prompt |
| Merge two LLMs with mismatched hidden size / vocab | ❌ shapes must align | ❌ |
Principle: anything Lucy [7] could do to a trained .entity, you can now in principle do to an imported LLM .entity — graft a side branch, add an experimental layer, mix dtypes, evolve topology around a frozen decoder core. Wiring those flows into product UI is separate work; the format bridge is the prerequisite, and it exists.
Example directions (not shipped)
```go
// Load two checkpoints, graft parallel branches, save hybrid
a, errA := poly.LoadEntityTransformer("lucy_entities/Qwen--Qwen3-0.6B.entity")
b, errB := poly.LoadEntity("lucy_testing_output/my_swiglu_Int4.entity") // LoadEntity returns the network itself
if err := errors.Join(errA, errB); err != nil {
	log.Fatal(err)
}
graft, err := poly.GraftNetworksPolymorphic([]*poly.VolumetricNetwork{a.Network, b}, "concat")
if err != nil {
	log.Fatal(err)
}
// … embed graft in a new net topology, SaveEntityTransformer …
```
See parallel_sequential.md, evolution.md, and quick_reference.md for the underlying APIs.
LLM transformer checkpoints (Lucy [8])
Lucy menu [8] ENTITY Talk (lucy/hf_entity.go) converts supported HF models (SmolLM2, Qwen, Llama-style) to universal-transformer .entity files and runs GPU chat without loading safetensors at runtime.
Flow:
```text
HF cache → ImportHFToEntity (FP32 master) → SerializeEntityTransformer (Q4_0 bake if INT4)
→ lucy_entities/*.entity → LoadEntityTransformer → chat
```
Q4 on disk vs GPU (v1)
When the user selects Q4 (INT4) at convert time, implementation lives in entity_q4.go:
| Weight region | On disk (.entity) | On GPU at chat |
|---|---|---|
| Decoder MHA + SwiGLU | Q4_0 blocks (baked; no re-quant on load) | Q4_0 via cached Q4_0Packed / uploadQ4_0Cached |
| RMSNorm, MHA Q/K norms, final norm | FP32 | FP32 |
| Embeddings, LM head | FP32 | FP32 |
RMSNorm stays FP32 intentionally — quantizing norm gamma corrupts the forward pass. Globals stay FP32 in v1; that is why large-vocab untied models (e.g. Qwen3) may show little disk shrink vs BF16 safetensors even though decoder Q4 is real (GPU weights ~1450 MB vs ~4550 MB FP32 for Qwen3-0.6B).
Model-specific metadata persisted in the header includes expanded query_dim / kv_dim (Qwen-style MHA), MHA q_norm / k_norm auxiliary blobs, and lm_head_tied.
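The expanded dims matter because query_dim is not always hidden_size: when a config fixes head_dim explicitly, num_heads × head_dim can differ from hidden_size, and grouped-query attention makes kv_dim smaller still. A minimal Go sketch of that derivation; attnDims and queryKVDims are hypothetical, and the second call uses illustrative numbers for the explicit-head_dim case.

```go
package main

import "fmt"

// attnDims is a hypothetical subset of decoder dims that shape MHA weights.
type attnDims struct {
	HiddenSize int
	NumHeads   int
	NumKVHeads int // 0 means plain MHA (same count as NumHeads)
	HeadDim    int // 0 means "derive as HiddenSize / NumHeads"
}

// queryKVDims returns the output width of the Q projection and of each
// K/V projection.
func queryKVDims(d attnDims) (queryDim, kvDim int) {
	headDim := d.HeadDim
	if headDim == 0 {
		headDim = d.HiddenSize / d.NumHeads
	}
	kvHeads := d.NumKVHeads
	if kvHeads == 0 {
		kvHeads = d.NumHeads
	}
	return d.NumHeads * headDim, kvHeads * headDim
}

func main() {
	// head_dim implied by hidden_size: query_dim equals hidden_size.
	q, kv := queryKVDims(attnDims{HiddenSize: 2048, NumHeads: 32, NumKVHeads: 8})
	fmt.Println(q, kv) // 2048 512
	// Explicit head_dim: query_dim no longer equals hidden_size.
	q, kv = queryKVDims(attnDims{HiddenSize: 1024, NumHeads: 16, NumKVHeads: 8, HeadDim: 128})
	fmt.Println(q, kv) // 2048 1024
}
```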
Tokenizer and chat template still come from the HF snapshot; only weights move native.
Name and identity
| Field | Value |
|---|---|
| Format | ENTITY |
| Expansion | Every Numerical Type In Native TopologY |
| Extension | .entity |
| Magic | ENTITY\0\0 (8 bytes) |
| Format version | 1 (v1, implemented) |
Wire layout (v1)
```text
Offset Size Content
────── ──── ───────
0 8 magic "ENTITY\0\0"
8 2 u16 format_version (= 1)
10 2 u16 flags (reserved; 0 today)
12 8 u64 header_byte_length (LE)
20 N header JSON (see below)
20+N … native-packed weight blobs (contiguous)
```
Header JSON
The header is one JSON object:
```json
{
"format_version": 1,
"network": { /* PersistenceNetworkSpec — topology only, no weight strings */ },
"transformer": {
"architecture": "llama_style_decoder",
"hidden_size": 2048,
"vocab_size": 32000,
"lm_head_tied": true,
"has_final_norm": true,
"dims": { "num_layers": 24, "num_heads": 32, ... }
},
"blobs": [
{
"path": "layers.0",
"offset": 0,
"length": 1234,
"dtype": "INT4",
"scale": 0.01,
"native": true
}
]
}
```
| Field | Role |
|---|---|
| network | Same shape as PersistenceNetworkSpec: depth, rows, cols, layers_per_cell, and every PersistenceLayerSpec (type, activation, dtype, z/y/x/l, MHA/CNN dims, parallel/sequential recursion). No Base64 weight strings; those live in the blob section. |
| transformer | Optional universal-transformer add-on. When present, the global causal-LM weights (embeddings, LM head, final RMSNorm) live outside net.Layers. Used by ImportHFToEntity for SmolLM2, Qwen, and Llama-style decoders. The tokenizer and chat template still come from the HF snapshot (or your app). |
| blobs[] | Index into the payload. Each entry points at one weight store (main layer, nested branch, or transformer global). |
Blob paths mirror the in-memory tree:
| Path example | Weight store |
|---|---|
| layers.0 | Top-level layer index 0 |
| layers.3.sequential_layers.1 | Nested sequential sub-layer |
| layers.2.parallel_branches.0 | Parallel branch |
| layers.5.meta_observed_layer | Metacognition observed layer |
| transformer.embeddings | Token embedding matrix (FP32 blob) |
| transformer.lm_head | Output projection (omitted when lm_head_tied) |
| transformer.final_norm | Pre-head RMSNorm gamma (when has_final_norm) |
Each blob carries its own dtype, scale, and native flag — so a single checkpoint can hold different numerical types per layer (e.g. layer 0 Int4, layer 12 BFloat16, layer 40 Binary).
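Because every entry carries its own offset, length, and dtype, one weight store can be pulled out of the payload without decoding the others. A minimal Go sketch keyed by path (EntityBlobBytes is described as taking a blob index instead), assuming offsets are relative to the start of the payload section, as the example header's first blob at offset 0 suggests; blob, blobBytes, and the dtype strings are hypothetical.

```go
package main

import "fmt"

// blob is a cut-down index entry.
type blob struct {
	Path           string
	Offset, Length int64
	DType          string
}

// blobBytes finds the entry for path and returns its raw bytes from the
// payload, rejecting entries whose offset/length fall outside the payload.
func blobBytes(index []blob, payload []byte, path string) ([]byte, string, error) {
	n := int64(len(payload))
	for _, b := range index {
		if b.Path != path {
			continue
		}
		if b.Offset < 0 || b.Length < 0 || b.Offset > n || b.Length > n-b.Offset {
			return nil, "", fmt.Errorf("blob %q out of bounds", path)
		}
		return payload[b.Offset : b.Offset+b.Length], b.DType, nil
	}
	return nil, "", fmt.Errorf("blob %q not found", path)
}

func main() {
	payload := make([]byte, 13)
	for i := range payload {
		payload[i] = byte(i)
	}
	index := []blob{
		{Path: "layers.0", Offset: 0, Length: 3, DType: "INT4"},
		{Path: "layers.3.sequential_layers.1", Offset: 3, Length: 2, DType: "BINARY"},
		{Path: "transformer.final_norm", Offset: 5, Length: 8, DType: "FLOAT32"},
	}
	b, dt, err := blobBytes(index, payload, "layers.3.sequential_layers.1")
	fmt.Println(b, dt, err)
}
```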
Weight blobs
Payload bytes use the same bit-packing as JSON persistence:
- Implemented via EncodeNativeWeightsRaw / DecodeNativeWeightsRaw in persistence.go
- Documented in serialization.md
No Base64. No FP32-only export constraint (unlike SaveSafetensors).
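Dropping Base64 means an Int4 tensor of n weights occupies exactly ⌈n/2⌉ payload bytes. A minimal Go sketch of signed nibble packing, assuming two's-complement nibbles with the even-indexed value in the low nibble; the authoritative bit layout is EncodeNativeWeightsRaw as documented in serialization.md, and packInt4/unpackInt4 are hypothetical.

```go
package main

import "fmt"

// packInt4 stores signed 4-bit values (-8..7) two per byte: even index in
// the low nibble, odd index in the high nibble (assumed order).
func packInt4(vals []int8) []byte {
	out := make([]byte, (len(vals)+1)/2)
	for i, v := range vals {
		nib := byte(v) & 0x0F // keep the low 4 bits of the two's-complement value
		if i%2 == 0 {
			out[i/2] |= nib
		} else {
			out[i/2] |= nib << 4
		}
	}
	return out
}

// unpackInt4 reverses packInt4; n is the original number of values.
func unpackInt4(packed []byte, n int) []int8 {
	out := make([]int8, n)
	for i := 0; i < n; i++ {
		nib := packed[i/2]
		if i%2 == 1 {
			nib >>= 4
		}
		nib &= 0x0F
		out[i] = int8(nib<<4) >> 4 // arithmetic shift sign-extends the nibble
	}
	return out
}

func main() {
	vals := []int8{-8, -1, 0, 7, 3}
	packed := packInt4(vals)
	fmt.Println(len(packed), unpackInt4(packed, len(vals)))
}
```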
API (poly/entity.go)
| Function | Purpose |
|---|---|
| SerializeEntity(net) | Network → .entity bytes |
| DeserializeEntity(data) | Bytes → full network (topology + all weights) |
| DeserializeEntityWithOptions(data, opts) | Selective weight load (EntityLoadOptions.LayerIndices) |
| DeserializeEntityLayer(data, layerIndex) | Topology + one top-level layer’s weights |
| SaveEntity(path, net) / LoadEntity(path) | File I/O |
| SerializeEntityTransformer(et) / DeserializeEntityTransformer(data) | Universal transformer: decoder + embeddings/LM head/final norm |
| SaveEntityTransformer / LoadEntityTransformer / LoadEntityTransformerAs[T] | File I/O + NewTransformer wiring |
| ImportHFToEntity(modelDir, path, opts) | HF snapshot → universal .entity (hf_import.go) |
| ParseEntityHeader(data) | Header only (no weight decode; mmap-friendly planning) |
| LayerPersistenceFromEntity(data, layerIndex) | Raw blob + scale + native flag for one layer (parity checks) |
| EntityBlobBytes(data, blobIndex) | Raw bytes for blob i without dtype decode |
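Selective loading can work from the blob index alone: keep the entries whose paths sit under the requested top-level layers and skip the rest of the payload. A minimal Go sketch of how EntityLoadOptions.LayerIndices might map onto blob paths; selectBlobs is hypothetical, and whether DeserializeEntityWithOptions treats transformer globals this way (here they are simply not selected) is an assumption.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// selectBlobs keeps the paths that belong to the requested top-level layer
// indices, including nested branches and sub-layers under them.
// Matching "layers.<i>" exactly or followed by "." avoids treating
// layers.10 as a child of layers.1.
func selectBlobs(paths []string, layers []int) []string {
	var keep []string
	for _, p := range paths {
		for _, li := range layers {
			prefix := "layers." + strconv.Itoa(li)
			if p == prefix || strings.HasPrefix(p, prefix+".") {
				keep = append(keep, p)
				break
			}
		}
	}
	return keep
}

func main() {
	paths := []string{
		"layers.0",
		"layers.1",
		"layers.1.parallel_branches.0",
		"layers.10",
		"transformer.embeddings",
	}
	fmt.Println(selectBlobs(paths, []int{1}))
}
```

The boundary check is the detail that matters: a bare prefix test on "layers.1" would also pull in layers.10 through layers.19.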
Hub model: load any format → VolumetricNetwork → save as JSON, .entity, or (lossy) safetensors F32 export.
```go
net, err := poly.LoadEntity("brain.entity")
if err != nil {
	log.Fatal(err)
}
// … inference or re-quantize per layer (Morph) …
poly.SaveEntity("brain-v2.entity", net)
jsonWire, _ := poly.SerializeNetwork(net) // still valid debug export
```
Size vs JSON — observed compression (Lucy [7])
Source: lucy/lucy_testing_output/seven_layer.txt — full [7] Seven-layer CPU suite run (10 layer types × up to three grids × 21 dtypes). Checkpoints are written after MC training as tag_DType.json and tag_DType.entity.
Headline numbers
| Metric | Result |
|---|---|
| Runs compared | 546 dtype×suite rows (26 memory tables) |
| Save/reload | json=PASS entity=PASS on all 546 trained reload checks; entity=FAIL: 0 |
| Average disk saving | .entity is ~27.6% smaller than JSON (typical band 25–28%) |
| Runtime heap | Unchanged — savings are on-disk only (same trained-native weight RAM in the log’s Weights column) |
ENTITY v1 removes Base64 weight strings and pretty-printed JSON weights. It does not re-quantize. The header is still full topology JSON, so this is not safetensors-class compression.
Sample checkpoints (trained, after MC train)
SwiGLU 2×2×2 (8 cells × 7 layers = 56-layer stack) — the grid from the first live ENTITY comparison:
| DType | JSON ckpt | .entity ckpt | Saving |
|---|---|---|---|
| Float64 | 496.79 KiB | 372.49 KiB | 25% |
| Float32 | 258.17 KiB | 193.35 KiB | 25% |
| Int4 | 49.92 KiB | 36.90 KiB | 26% |
| Binary | 27.66 KiB | 20.29 KiB | 27% |
Dense 1×1×1 (7-layer pyramid stack):
| DType | JSON ckpt | .entity ckpt | Saving |
|---|---|---|---|
| Float32 | 57.95 KiB | 43.47 KiB | 25% |
| Int4 | 9.53 KiB | 7.12 KiB | 25% |
| Binary | 4.35 KiB | 3.24 KiB | 26% |
Dense 3×3×3 (27 cells × 7 = 189-layer stack):
| DType | JSON ckpt | .entity ckpt | Saving |
|---|---|---|---|
| Float32 | 83.90 KiB | 60.95 KiB | 27% |
| Int4 | 69.93 KiB | 49.68 KiB | 29% |
| Binary | 68.61 KiB | 49.09 KiB | 28% |
Across all 21 dtypes on SwiGLU 2×2×2, the .entity saving over JSON stays in a 25.0–26.6% band: the saving is almost entirely Base64 removal, not dtype-specific magic.
Three things the log teaches
1. ENTITY vs JSON ≈ fixed ~25% discount, not 10×
The ratio is stable because both formats carry the same topology JSON header and the same native weight bits; only the weight encoding in the file changes (Base64 strings → raw blob section).
2. Quant dtype still dominates absolute file size
Same topology, different dtype — SwiGLU 2×2×2:
| DType | JSON | .entity |
|---|---|---|
| Float64 | 497 KiB | 372 KiB |
| Int4 | 50 KiB | 37 KiB |
Int4 JSON is ~10% of Float64 JSON on the same brain. Picking Int4/Binary matters far more than picking .entity over .json.
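At the payload level the dtype sets a hard ratio: Int4 is 4/64 = 6.25% of Float64 per weight. The observed Int4 file is ~10% of the Float64 one, presumably because both files carry the same fixed-cost topology header. A minimal Go sketch of the payload arithmetic, ignoring per-blob scales and headers; bitsPerWeight and packedBytes are hypothetical, and only four of the 21 dtypes are listed.

```go
package main

import "fmt"

// bitsPerWeight lists storage widths for a few of the dtypes named above.
var bitsPerWeight = map[string]int{
	"Float64": 64,
	"Float32": 32,
	"Int4":    4,
	"Binary":  1,
}

// packedBytes is the payload size for n weights at the given bit width,
// rounded up to whole bytes.
func packedBytes(n, bits int) int {
	return (n*bits + 7) / 8
}

func main() {
	const n = 10000 // illustrative weight count
	for _, dt := range []string{"Float64", "Float32", "Int4", "Binary"} {
		fmt.Printf("%-8s %6d bytes\n", dt, packedBytes(n, bitsPerWeight[dt]))
	}
}
```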
3. Topology overhead grows with grid size; ENTITY shrinks the gap
SwiGLU 2×2×2 Float32 breakdown (from the log’s Weights vs checkpoint columns):
| Component | Size |
|---|---|
| Trained-native weights in RAM | 178.50 KiB |
| JSON checkpoint | 258.17 KiB (+80 KiB overhead ≈ 31% of file) |
| .entity checkpoint | 193.35 KiB (+15 KiB overhead ≈ 8% of file) |
On Dense 3×3×3 Float32, trained-native weights are only ~12 KiB in RAM but the JSON checkpoint is ~84 KiB — topology metadata dominates. ENTITY drops that to ~61 KiB (~27% saving), but the file is still mostly header, not weights.
Residual 3×3×3 is an extreme case in the log: ~42% smaller .entity vs JSON when per-layer weight blobs are tiny relative to the 189-layer spec.
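The overhead figures above are plain subtraction and division: overhead = checkpoint − weights, share = overhead / checkpoint. A minimal Go sketch reproducing the SwiGLU 2×2×2 Float32 row; overhead is a hypothetical helper, and the inputs are the KiB values from the table.

```go
package main

import "fmt"

// overhead returns the non-weight part of a checkpoint and its share of the
// file, given in-RAM native weight size and checkpoint size (same unit).
func overhead(weights, ckpt float64) (extra, share float64) {
	extra = ckpt - weights
	return extra, extra / ckpt
}

func main() {
	const weightsKiB = 178.50 // SwiGLU 2×2×2 Float32, trained-native weights
	for _, c := range []struct {
		name string
		ckpt float64
	}{{"JSON", 258.17}, {".entity", 193.35}} {
		extra, share := overhead(weightsKiB, c.ckpt)
		fmt.Printf("%-8s +%.2f KiB (%.1f%% of file)\n", c.name, extra, share*100)
	}
}
```

Plugging in the Dense 3×3×3 Float32 figures (~12 KiB of weights, ~84 KiB JSON) gives a share above 85%, which is what "mostly header" means in practice.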
Where to read it in the log
Each layer-type × grid block ends with a memory & weight footprint table:
| DType | Heap | Sys | Heap+train | Weights | JSON ckpt | .entity ckpt |
|---|---|---|---|---|---|---|
Pass lines also print both sizes inline, e.g. json=496.79 KiB entity=372.49 KiB on SwiGLU 2×2×2 Float64.
Why not safetensors-small?
- Weights are already at the bit-width floor for each dtype (Int4 nibbles, Binary 8:1, …).
- Topology JSON is a large fixed cost on big grids (56–189 layer specs with type, dtype, dims, z/y/x/l).
- SafeTensors omits that graph entirely — flat tensor names only.
See Future: smaller files with full topology for the planned binary topology + optional zstd path.
Comparison to SafeTensors
| | SafeTensors | ENTITY v1 | JSON persistence |
|---|---|---|---|
| Weights on disk | Raw HF dtypes | Raw Loom native packing | Base64 in JSON |
| Topology in same file | ❌ | ✅ (JSON header) | ✅ |
| Per-layer Loom dtype + Scale | ❌ | ✅ | ✅ |
| Volumetric (Z,Y,X,L) | ❌ | ✅ | ✅ |
| Parallel / sequential tree | ❌ | ✅ | ✅ |
| Typical header size | Tiny | Large on big grids | Largest (includes Base64 weights) |
SafeTensors wins on flat LLM weight dumps. ENTITY wins on native Loom brains you trained and need to reload exactly.
Idempotency
SerializeEntity → DeserializeEntity → SerializeEntity yields identical bytes for a given network state (tested in poly/tests/entity_test.go).
Topology fields are canonicalized on save (e.g. default seq_length omitted for non-sequence layers) so reload does not inflate the header.
Validation
| Suite | What it checks |
|---|---|
| Lucy [7] (seven_layer/runner.go) | Before/after train: JSON and .entity save/reload PASS; memory table shows both checkpoint sizes |
| Lucy [8] (lucy/hf_entity.go) | HF cache → Q4 .entity convert → GPU ENTITY Talk; SmolLM2 parity with Poly Talk; Qwen load + Q4 GPU path |
| poly/tests/entity_test.go | Round-trip, idempotent bytes, selective layer load, Q4_0 blob round-trip for transformers |
Checkpoints land in lucy/lucy_testing_output/ as tag_DType.json and tag_DType.entity. Full-run numbers and compression observations (from seven_layer.txt) are in the Size vs JSON section above.
Relationship to other I/O
| File | Role |
|---|---|
| safetensors.go | Read HF .safetensors; SaveSafetensors is F32-only export |
| persistence.go | JSON save/load; semantic reference for ENTITY topology and packing |
| entity.go | Native .entity binary save/load |
| serialization.go | Architecture-only JSON (BuildNetworkFromJSON): random init, no trained weights |
| universal_loader.go | Auto-detect from safetensors shapes; import only |
Future: smaller files with full topology
ENTITY v1 prioritizes correctness and debuggability. Planned v2+ improvements (same full topology, smaller wire):
- Binary topology section: string tables, dtype/layer-type enums, grid-implied (z,y,x,l) where regular
- Compact blob index: fixed records (node_id, u8 dtype, f32 scale, offsets) instead of JSON path strings
- Optional zstd (lossless) on header/index/payload via flags bits
- ConvertSafetensorsToEntity: import HF weights into a Loom topology wrapper
- Welvet C-ABI: LoomSaveEntity, LoomLoadEntity; loaders accept .entity or .safetensors
- SoulGlitch: prefer .entity for on-device trained saves; keep HF download as .safetensors
Weights stay on EncodeNativeWeightsRaw; the big disk wins are in topology + index, not re-quantizing weights.
See also
- serialization.md — JSON persistence, bit-packing, SafeTensors import, three save paths
- transformer.md — MHA, SwiGLU, HF decoder layout; links back here for native checkpoints
- parallel_sequential.md — parallel branches and combine modes (graft targets)
- evolution.md — NEAT, remote links, topology mutation
- numerical_types.md — 21 DTypes
- quantization.md — Scale, Morph, native packing, Q4_0
- testing_and_validation.md — Lucy [7] logs and tables
- bedrock_validation.md — seven-layer CPU suite overview