Docs ENTITY format (`.entity`)

ENTITY format (.entity)

Every Numerical Type In Native TopologY

Native Loom checkpoint files. One .entity file = one saved brain: full volumetric topology + all native-packed weights in a single binary artifact.

Implementation: poly/entity.go

Validated in Lucy menu [7] Seven-layer CPU suite — JSON and .entity save/reload run side by side for all 21 dtypes (lucy/examples/seven_layer/runner.go). Lucy [8] ENTITY Talk converts HF LLMs to .entity and runs GPU chat from native checkpoints (lucy/hf_entity.go).


Why we built this

HuggingFace .safetensors is the right import lane for PyTorch/HF checkpoints. Loom uses it for model download and HF decode (poly/safetensors.go, SoulGlitch, LoomCreateLLM).

It is not a native Loom checkpoint format. That is not a bug in SafeTensors — Loom simply does more than flat named tensors:

Loom needs SafeTensors
21 DTypes with native on-disk packing (Int4 nibbles, Binary 8:1, Ternary, FP4, …) Fixed HF dtype strings; sub-byte types are awkward; Loom export is F32-only
Per-layer Scale (quant mapping used at save time) No standard field
Volumetric grid (Z, Y, X, L) per layer Flat string keys only (model.layers.0…)
Topology — parallel branches, sequential stacks, metacognition Requires separate config.json; no recursion
Bit-perfect reload of trained native dtypes Import path usually decodes to FP32 master

We already had full fidelity in persistence.go (SerializeNetwork / DeserializeNetwork) — JSON + Base64 native blobs. That works (Lucy save/reload PASS on all 21 dtypes) and remains the transparent debug lane, but it is large and slow for shipping brains to phones or edge nodes.

ENTITY is the native binary path:

  • Same semantics as JSON persistence (same topology spec, same packing rules)
  • SafeTensors-like wire safety: length-prefixed header + indexed blob section
  • Raw weight bytes (no Base64)
  • Different file extension so HF tooling does not assume HuggingFace semantics
Import:   model.safetensors     ← HF yarn (read-only in product flow)
Native:   fluffy.entity         ← ENTITY (train, save, reload, ship)
Debug:    model.json            ← JSON persistence (same brain, verbose)

One file = topology + weights

Unlike HuggingFace’s split of config.json + model.safetensors, ENTITY keeps everything together:

┌─────────────────────────────────────────┐
│  Fixed header (magic, version, flags)   │
├─────────────────────────────────────────┤
│  JSON header                            │
│    • network topology (grid + layers)   │  ← PersistenceNetworkSpec
│    • blob index (path, offset, dtype…)  │
├─────────────────────────────────────────┤
│  Binary payload                         │  ← native-packed weight blobs
└─────────────────────────────────────────┘

After LoadEntity, you get a full VolumetricNetwork — grid dimensions, every layer’s type/activation/dtype/config, recursive branches, and quantized weights ready for forward or further training.


The unlock: HF models as native citizens

ENTITY is not only a smaller checkpoint format. It is the bridge that moves real LLM weights from HuggingFace’s flat tensor world into Loom’s volumetric brain format — the same container Lucy [7] uses for 3D grids, parallel branches, and per-layer dtypes.

Before vs after

Before — two separate worlds:

HF .safetensors  →  flat tensor names  →  Poly Talk reads every run
                      ↓
              foreign format (no grid, no branches, no per-layer dtype in one file)

Lucy [7] seven-layer suite  →  2×2×2 grids, remote links, 21 dtypes
                      ↓
              synthetic trained brains only

After — one native lane:

HF snapshot  →  convert once  →  .entity  →  VolumetricNetwork + transformer globals
                                      ↓
                         same format as Lucy [7] save/reload
                         chat without HF weights at runtime (Lucy [8] ENTITY Talk)

HuggingFace models are no longer guests. They are .entity citizens — reloadable, trainable, graftable, and eligible for every volumetric feature the stack already implements.

The arc: simple → native → experimental → full 3D

Stage What it is Status
Simple Flat HF decoder; Poly Talk loads safetensors each run ✅ Shipped
Native HF → .entity; Q4 baked for decoder blocks; GPU chat from Lucy-owned checkpoint ✅ Shipped (Lucy [8] ENTITY Talk)
Experimental Graft, parallel branches, remote links, per-layer dtype mixes on imported LLM weights 🔓 Unlocked in format + API; product UI not built
Full 3D LLM blocks as cells in a (Z,Y,X,L) grid — experts, hops, evolution around a frozen core 🔮 Next chapter

Lucy [7] proved the volumetric stack on small trained grids. Lucy [8] brings real LLM weights into that same format. The chat path today is still a flat decoder layout; the container is already the full brain OS.

HF import layout today

ImportHFToEntity (hf_import.go) maps a Llama-style stack into a 1×1×1 grid with four sub-layers per block (pre-norm, MHA, post-norm, SwiGLU):

net := NewVolumetricNetwork(1, 1, 1, dims.NumLayers*4)
InitHFDecoderBlocks(net, dims)

ENTITY Talk chat uses this linear layout. Nothing in the format prevents expanding to 2×2×N, parallel experts, or remote links — that is topology editing on a loaded VolumetricNetwork, then SaveEntityTransformer.

What the format unlocks

Capability Supported by format / poly ENTITY Talk UI today
Save / reload / train native state Convert + chat only
Different dtype per layer in one file Q4 decoder when user picks INT4 at convert
Parallel branches / MoE-style filter gates parallel.go
Spatial hops (IsRemoteLink) dispatch.md
Graft multiple networks into one parallel layer grafting.go
NEAT / topology evolution evolution.md
Selective layer load + block-wise GPU upload DeserializeEntityWithOptions ✅ block upload prompt
Merge two LLMs with mismatched hidden size / vocab ❌ shapes must align

Principle: anything Lucy [7] could do to a trained .entity, you can now in principle do to an imported LLM .entity — graft a side branch, add an experimental layer, mix dtypes, evolve topology around a frozen decoder core. Wiring those flows into product UI is separate work; the format bridge is the prerequisite, and it exists.

Example directions (not shipped)

// Load two checkpoints, graft parallel branches, save hybrid
a, _ := poly.LoadEntityTransformer("lucy_entities/Qwen--Qwen3-0.6B.entity")
b, _ := poly.LoadEntity("lucy_testing_output/my_swiglu_Int4.entity")
graft, err := poly.GraftNetworksPolymorphic([]*poly.VolumetricNetwork{a.Network, b.Network}, "concat")
// … embed graft in a new net topology, SaveEntityTransformer …

See parallel_sequential.md, evolution.md, and quick_reference.md for the underlying APIs.


LLM transformer checkpoints (Lucy [8])

Lucy menu [8] ENTITY Talk (lucy/hf_entity.go) converts supported HF models (SmolLM2, Qwen, Llama-style) to universal-transformer .entity files and runs GPU chat without loading safetensors at runtime.

Flow:

HF cache  →  ImportHFToEntity (FP32 master)  →  SerializeEntityTransformer (Q4_0 bake if INT4)
         →  lucy_entities/*.entity  →  LoadEntityTransformer  →  chat

Q4 on disk vs GPU (v1)

When the user selects Q4 (INT4) at convert time, implementation lives in entity_q4.go:

Weight region On disk (.entity) On GPU at chat
Decoder MHA + SwiGLU Q4_0 blocks (baked; no re-quant on load) Q4_0 via cached Q4_0Packed / uploadQ4_0Cached
RMSNorm, MHA Q/K norms, final norm FP32 FP32
Embeddings, LM head FP32 FP32

RMSNorm stays FP32 intentionally — quantizing norm gamma corrupts the forward pass. Globals stay FP32 in v1; that is why large-vocab untied models (e.g. Qwen3) may show little disk shrink vs BF16 safetensors even though decoder Q4 is real (GPU weights ~1450 MB vs ~4550 MB FP32 for Qwen3-0.6B).

Model-specific metadata persisted in the header includes expanded query_dim / kv_dim (Qwen-style MHA), MHA q_norm / k_norm auxiliary blobs, and lm_head_tied.

Tokenizer and chat template still come from the HF snapshot; only weights move native.


Name and identity

Format ENTITY
Expansion Every Numerical Type In Native TopologY
Extension .entity
Magic ENTITY\0\0 (8 bytes)
Format version 1 (v1, implemented)

Wire layout (v1)

Offset  Size  Content
──────  ────  ───────
0       8     magic "ENTITY\0\0"
8       2     u16 format_version  (= 1)
10      2     u16 flags           (reserved; 0 today)
12      8     u64 header_byte_length (LE)
20      N     header JSON (see below)
20+N    …     native-packed weight blobs (contiguous)

Header JSON

The header is one JSON object:

{
  "format_version": 1,
  "network": { /* PersistenceNetworkSpec — topology only, no weight strings */ },
  "transformer": {
    "architecture": "llama_style_decoder",
    "hidden_size": 2048,
    "vocab_size": 32000,
    "lm_head_tied": true,
    "has_final_norm": true,
    "dims": { "num_layers": 24, "num_heads": 32, ... }
  },
  "blobs": [
    {
      "path": "layers.0",
      "offset": 0,
      "length": 1234,
      "dtype": "INT4",
      "scale": 0.01,
      "native": true
    }
  ]
}
Field Role
network Same shape as PersistenceNetworkSpec: depth, rows, cols, layers_per_cell, and every PersistenceLayerSpec (type, activation, dtype, z/y/x/l, MHA/CNN dims, parallel/sequential recursion). No weights Base64 strings — those live in the blob section.
transformer Optional universal-transformer add-on. When present, global causal-LM weights live outside net.Layers: embeddings, LM head, final RMSNorm. Used by ImportHFToEntity for SmolLM2, Qwen, Llama-style decoders. Tokenizer/chat template still come from the HF snapshot (or your app).
blobs[] Index into the payload. Each entry points at one weight store (main layer, nested branch, or transformer global).

Blob paths mirror the in-memory tree:

Path example Weight store
layers.0 Top-level layer index 0
layers.3.sequential_layers.1 Nested sequential sub-layer
layers.2.parallel_branches.0 Parallel branch
layers.5.meta_observed_layer Metacognition observed layer
transformer.embeddings Token embedding matrix (FP32 blob)
transformer.lm_head Output projection (omitted when lm_head_tied)
transformer.final_norm Pre-head RMSNorm gamma (when has_final_norm)

Each blob carries its own dtype, scale, and native flag — so a single checkpoint can hold different numerical types per layer (e.g. layer 0 Int4, layer 12 BFloat16, layer 40 Binary).

Weight blobs

Payload bytes use the same bit-packing as JSON persistence:

No Base64. No FP32-only export constraint (unlike SaveSafetensors).


API (poly/entity.go)

Function Purpose
SerializeEntity(net) Network → .entity bytes
DeserializeEntity(data) Bytes → full network (topology + all weights)
DeserializeEntityWithOptions(data, opts) Selective weight load (EntityLoadOptions.LayerIndices)
DeserializeEntityLayer(data, layerIndex) Topology + one top-level layer’s weights
SaveEntity(path, net) / LoadEntity(path) File I/O
SerializeEntityTransformer(et) / DeserializeEntityTransformer(data) Universal transformer: decoder + embeddings/LM head/final norm
SaveEntityTransformer / LoadEntityTransformer / LoadEntityTransformerAs[T] File I/O + NewTransformer wiring
ImportHFToEntity(modelDir, path, opts) HF snapshot → universal .entity (hf_import.go)
ParseEntityHeader(data) Header only (no weight decode; mmap-friendly planning)
LayerPersistenceFromEntity(data, layerIndex) Raw blob + scale + native for one layer (parity checks)
EntityBlobBytes(data, blobIndex) Raw bytes for blob i without dtype decode

Hub model: load any format → VolumetricNetwork → save as JSON, .entity, or (lossy) safetensors F32 export.

net, err := poly.LoadEntity("brain.entity")
// … inference or re-quantize per layer (Morph) …
poly.SaveEntity("brain-v2.entity", net)
jsonWire, _ := poly.SerializeNetwork(net) // still valid debug export

Size vs JSON — observed compression (Lucy [7])

Source: lucy/lucy_testing_output/seven_layer.txt — full [7] Seven-layer CPU suite run (10 layer types × up to three grids × 21 dtypes). Checkpoints are written after MC training as tag_DType.json and tag_DType.entity.

Headline numbers

Metric Result
Runs compared 546 dtype×suite rows (26 memory tables)
Save/reload json=PASS entity=PASS on all 546 trained reload checks; entity=FAIL: 0
Average disk saving .entity is ~27.6% smaller than JSON (typical band 25–28%)
Runtime heap Unchanged — savings are on-disk only (same trained-native weight RAM in the log’s Weights column)

ENTITY v1 removes Base64 weight strings and pretty-printed JSON weights. It does not re-quantize. The header is still full topology JSON, so this is not safetensors-class compression.

Sample checkpoints (trained, after MC train)

SwiGLU 2×2×2 (8 cells × 7 layers = 56-layer stack) — the grid from the first live ENTITY comparison:

DType JSON ckpt .entity ckpt Saving
Float64 496.79 KiB 372.49 KiB 25%
Float32 258.17 KiB 193.35 KiB 25%
Int4 49.92 KiB 36.90 KiB 26%
Binary 27.66 KiB 20.29 KiB 27%

Dense 1×1×1 (7-layer pyramid stack):

DType JSON ckpt .entity ckpt Saving
Float32 57.95 KiB 43.47 KiB 25%
Int4 9.53 KiB 7.12 KiB 25%
Binary 4.35 KiB 3.24 KiB 26%

Dense 3×3×3 (27 cells × 7 = 189-layer stack):

DType JSON ckpt .entity ckpt Saving
Float32 83.90 KiB 60.95 KiB 27%
Int4 69.93 KiB 49.68 KiB 29%
Binary 68.61 KiB 49.09 KiB 28%

Across all 21 dtypes on SwiGLU 2×2×2, the ENTITY/JSON ratio stays in a 25.0–26.6% band — the saving is almost entirely Base64 removal, not dtype-specific magic.

Three things the log teaches

1. ENTITY vs JSON ≈ fixed ~25% discount, not 10×

The ratio is stable because both formats carry the same topology JSON header and the same native weight bits; only the weight encoding in the file changes (Base64 strings → raw blob section).

2. Quant dtype still dominates absolute file size

Same topology, different dtype — SwiGLU 2×2×2:

DType JSON .entity
Float64 497 KiB 372 KiB
Int4 50 KiB 37 KiB

Int4 JSON is ~10% of Float64 JSON on the same brain. Picking Int4/Binary matters far more than picking .entity over .json.

3. Topology overhead grows with grid size; ENTITY shrinks the gap

SwiGLU 2×2×2 Float32 breakdown (from the log’s Weights vs checkpoint columns):

Component Size
Trained-native weights in RAM 178.50 KiB
JSON checkpoint 258.17 KiB (+80 KiB overhead ≈ 31% of file)
.entity checkpoint 193.35 KiB (+15 KiB overhead ≈ 8% of file)

On Dense 3×3×3 Float32, trained-native weights are only ~12 KiB in RAM but the JSON checkpoint is ~84 KiB — topology metadata dominates. ENTITY drops that to ~61 KiB (~27% saving), but the file is still mostly header, not weights.

Residual 3×3×3 is an extreme case in the log: ~42% smaller .entity vs JSON when per-layer weight blobs are tiny relative to the 189-layer spec.

Where to read it in the log

Each layer-type × grid block ends with a memory & weight footprint table:

| DType | Heap | Sys | Heap+train | Weights | JSON ckpt | .entity ckpt |

Pass lines also print both sizes inline, e.g. json=496.79 KiB entity=372.49 KiB on SwiGLU 2×2×2 Float64.

Why not safetensors-small?

  1. Weights are already at the bit-width floor for each dtype (Int4 nibbles, Binary 8:1, …).
  2. Topology JSON is a large fixed cost on big grids (56–189 layer specs with type, dtype, dims, z/y/x/l).
  3. SafeTensors omits that graph entirely — flat tensor names only.

See Future: smaller files with full topology for the planned binary topology + optional zstd path.


Comparison to SafeTensors

SafeTensors ENTITY v1 JSON persistence
Weights on disk Raw HF dtypes Raw Loom native packing Base64 in JSON
Topology in same file ✅ (JSON header)
Per-layer Loom dtype + Scale
Volumetric (Z,Y,X,L)
Parallel / sequential tree
Typical header size Tiny Large on big grids Largest (includes Base64 weights)

SafeTensors wins on flat LLM weight dumps. ENTITY wins on native Loom brains you trained and need to reload exactly.


Idempotency

SerializeEntityDeserializeEntitySerializeEntity yields identical bytes for a given network state (tested in poly/tests/entity_test.go).

Topology fields are canonicalized on save (e.g. default seq_length omitted for non-sequence layers) so reload does not inflate the header.


Validation

Suite What it checks
Lucy [7] (seven_layer/runner.go) Before/after train: JSON and .entity save/reload PASS; memory table shows both checkpoint sizes
Lucy [8] (lucy/hf_entity.go) HF cache → Q4 .entity convert → GPU ENTITY Talk; SmolLM2 parity with Poly Talk; Qwen load + Q4 GPU path
poly/tests/entity_test.go Round-trip, idempotent bytes, selective layer load, Q4_0 blob round-trip for transformers

Checkpoints land in lucy/lucy_testing_output/ as tag_DType.json and tag_DType.entity. Full-run numbers and compression observations: entity.md — observed compression (from seven_layer.txt).


Relationship to other I/O

File Role
safetensors.go Read HF .safetensors; SaveSafetensors is F32-only export
persistence.go JSON save/load — semantic reference for ENTITY topology and packing
entity.go Native .entity binary save/load
serialization.go Architecture-only JSON (BuildNetworkFromJSON) — random init, no trained weights
universal_loader.go Auto-detect from safetensors shapes — import only

Future: smaller files with full topology

ENTITY v1 prioritizes correctness and debuggability. Planned v2+ improvements (same full topology, smaller wire):

  1. Binary topology section — string tables, dtype/layer-type enums, grid-implied (z,y,x,l) where regular
  2. Compact blob index — fixed records (node_id, u8 dtype, f32 scale, offsets) instead of JSON path strings
  3. Optional zstd (lossless) on header/index/payload via flags bits
  4. ConvertSafetensorsToEntity — import HF weights into a Loom topology wrapper
  5. Welvet C-ABILoomSaveEntity, LoomLoadEntity; loaders accept .entity or .safetensors
  6. SoulGlitch — prefer .entity for on-device trained saves; keep HF download as .safetensors

Weights stay on EncodeNativeWeightsRaw; the big disk wins are in topology + index, not re-quantizing weights.


See also