ENTITY format (`.entity`)

Every Numerical Type In Native TopologY

Native Loom checkpoint files. One .entity file = one saved brain: full volumetric topology + all native-packed weights in a single binary artifact.

Implementation: poly/entity.go

Validated in Lucy menu [7] Seven-layer CPU suite — JSON and .entity save/reload run side by side for all 21 dtypes (lucy/examples/seven_layer/runner.go). Lucy [8] ENTITY Talk converts HF LLMs to .entity and runs GPU chat from native checkpoints (lucy/hf_entity.go).

Why we built this

HuggingFace .safetensors is the right import lane for PyTorch/HF checkpoints. Loom uses it for model download and HF decode (poly/safetensors.go, SoulGlitch, LoomCreateLLM).

It is not a native Loom checkpoint format. That is not a bug in SafeTensors — Loom simply does more than flat named tensors:

Loom needs	SafeTensors
21 DTypes with native on-disk packing (Int4 nibbles, Binary 8:1, Ternary, FP4, …)	Fixed HF dtype strings; sub-byte types are awkward; Loom export is F32-only
Per-layer `Scale` (quant mapping used at save time)	No standard field
Volumetric grid `(Z, Y, X, L)` per layer	Flat string keys only (`model.layers.0…`)
Topology — parallel branches, sequential stacks, metacognition	Requires separate `config.json`; no recursion
Bit-perfect reload of trained native dtypes	Import path usually decodes to FP32 master

We already had full fidelity in persistence.go (SerializeNetwork / DeserializeNetwork) — JSON + Base64 native blobs. That works (Lucy save/reload PASS on all 21 dtypes) and remains the transparent debug lane, but it is large and slow for shipping brains to phones or edge nodes.

ENTITY is the native binary path:

Same semantics as JSON persistence (same topology spec, same packing rules)
SafeTensors-like wire safety: length-prefixed header + indexed blob section
Raw weight bytes (no Base64)
Different file extension so HF tooling does not assume HuggingFace semantics

Import:   model.safetensors     ← HF yarn (read-only in product flow)
Native:   fluffy.entity         ← ENTITY (train, save, reload, ship)
Debug:    model.json            ← JSON persistence (same brain, verbose)

One file = topology + weights

Unlike HuggingFace’s split of config.json + model.safetensors, ENTITY keeps everything together:

┌─────────────────────────────────────────┐
│  Fixed header (magic, version, flags)   │
├─────────────────────────────────────────┤
│  JSON header                            │
│    • network topology (grid + layers)   │  ← PersistenceNetworkSpec
│    • blob index (path, offset, dtype…)  │
├─────────────────────────────────────────┤
│  Binary payload                         │  ← native-packed weight blobs
└─────────────────────────────────────────┘

After LoadEntity, you get a full VolumetricNetwork — grid dimensions, every layer’s type/activation/dtype/config, recursive branches, and quantized weights ready for forward or further training.

The unlock: HF models as native citizens

ENTITY is not only a smaller checkpoint format. It is the bridge that moves real LLM weights from HuggingFace’s flat tensor world into Loom’s volumetric brain format — the same container Lucy [7] uses for 3D grids, parallel branches, and per-layer dtypes.

Before vs after

Before — two separate worlds:

HF .safetensors  →  flat tensor names  →  Poly Talk reads every run
                      ↓
              foreign format (no grid, no branches, no per-layer dtype in one file)

Lucy [7] seven-layer suite  →  2×2×2 grids, remote links, 21 dtypes
                      ↓
              synthetic trained brains only

After — one native lane:

HF snapshot  →  convert once  →  .entity  →  VolumetricNetwork + transformer globals
                                      ↓
                         same format as Lucy [7] save/reload
                         chat without HF weights at runtime (Lucy [8] ENTITY Talk)

HuggingFace models are no longer guests. They are .entity citizens — reloadable, trainable, graftable, and eligible for every volumetric feature the stack already implements.

The arc: simple → native → experimental → full 3D

Stage	What it is	Status
Simple	Flat HF decoder; Poly Talk loads safetensors each run	✅ Shipped
Native	HF → `.entity`; Q4 baked for decoder blocks; GPU chat from Lucy-owned checkpoint	✅ Shipped (Lucy [8] ENTITY Talk)
Experimental	Graft, parallel branches, remote links, per-layer dtype mixes on imported LLM weights	🔓 Unlocked in format + API; product UI not built
Full 3D	LLM blocks as cells in a `(Z,Y,X,L)` grid — experts, hops, evolution around a frozen core	🔮 Next chapter

Lucy [7] proved the volumetric stack on small trained grids. Lucy [8] brings real LLM weights into that same format. The chat path today is still a flat decoder layout; the container is already the full brain OS.

HF import layout today

ImportHFToEntity (hf_import.go) maps a Llama-style stack into a 1×1×1 grid with four sub-layers per block (pre-norm, MHA, post-norm, SwiGLU):

net := NewVolumetricNetwork(1, 1, 1, dims.NumLayers*4)
InitHFDecoderBlocks(net, dims)

ENTITY Talk chat uses this linear layout. Nothing in the format prevents expanding to 2×2×N, parallel experts, or remote links — that is topology editing on a loaded VolumetricNetwork, then SaveEntityTransformer.
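
As a rough illustration of this linear layout, the sketch below maps a (block, sub-layer) pair to a flat layer index inside the single 1×1×1 cell. It assumes the four sub-layers are stored in the order listed above and that blocks sit back to back; `subLayerIndex` and the `Sub…` constants are hypothetical names, not part of poly.

```go
package main

import "fmt"

// Sub-layer order within one decoder block, following the order given in the
// text (pre-norm, MHA, post-norm, SwiGLU). The names are illustrative only.
const (
	SubPreNorm = iota
	SubMHA
	SubPostNorm
	SubSwiGLU
	SubLayersPerBlock // 4 sub-layers per block
)

// subLayerIndex returns the flat layer index L inside the single cell,
// assuming decoder blocks are laid out one after another.
func subLayerIndex(block, sub int) int {
	return block*SubLayersPerBlock + sub
}

func main() {
	numBlocks := 24 // hypothetical value of dims.NumLayers
	fmt.Println("layers per cell:", numBlocks*SubLayersPerBlock)
	fmt.Println("block 3, MHA index:", subLayerIndex(3, SubMHA))
}
```

Under the same assumption, a blob path such as `layers.13` would refer to the MHA sub-layer of block 3.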

What the format unlocks

Capability	Supported by format / poly	ENTITY Talk UI today
Save / reload / train native state	✅	Convert + chat only
Different dtype per layer in one file	✅	Q4 decoder when user picks INT4 at convert
Parallel branches / MoE-style `filter` gates	✅ `parallel.go`	❌
Spatial hops (`IsRemoteLink`)	✅ `dispatch.md`	❌
Graft multiple networks into one parallel layer	✅ `grafting.go`	❌
NEAT / topology evolution	✅ `evolution.md`	❌
Selective layer load + block-wise GPU upload	✅ `DeserializeEntityWithOptions`	✅ block upload prompt
Merge two LLMs with mismatched hidden size / vocab	❌ shapes must align	❌

Principle: anything Lucy [7] could do to a trained .entity can now, in principle, be done to an imported LLM .entity: graft a side branch, add an experimental layer, mix dtypes, or evolve topology around a frozen decoder core. Wiring those flows into product UI is separate work; the format bridge is the prerequisite, and it exists.

Example directions (not shipped)

// Load two checkpoints, graft parallel branches, save hybrid (load errors ignored for brevity)
a, _ := poly.LoadEntityTransformer("lucy_entities/Qwen--Qwen3-0.6B.entity")
b, _ := poly.LoadEntity("lucy_testing_output/my_swiglu_Int4.entity")
graft, err := poly.GraftNetworksPolymorphic([]*poly.VolumetricNetwork{a.Network, b.Network}, "concat")
if err != nil {
	log.Fatal(err)
}
_ = graft // … embed graft in a new net topology, SaveEntityTransformer …

See parallel_sequential.md, evolution.md, and quick_reference.md for the underlying APIs.

LLM transformer checkpoints (Lucy [8])

Lucy menu [8] ENTITY Talk (lucy/hf_entity.go) converts supported HF models (SmolLM2, Qwen, Llama-style) to universal-transformer .entity files and runs GPU chat without loading safetensors at runtime.

Flow:

HF cache  →  ImportHFToEntity (FP32 master)  →  SerializeEntityTransformer (Q4_0 bake if INT4)
         →  lucy_entities/*.entity  →  LoadEntityTransformer  →  chat

Q4 on disk vs GPU (v1)

When the user selects Q4 (INT4) at convert time, weight regions are stored as follows (implementation in entity_q4.go):

Weight region	On disk (`.entity`)	On GPU at chat
Decoder MHA + SwiGLU	Q4_0 blocks (baked; no re-quant on load)	Q4_0 via cached `Q4_0Packed` / `uploadQ4_0Cached`
RMSNorm, MHA Q/K norms, final norm	FP32	FP32
Embeddings, LM head	FP32	FP32

RMSNorm stays FP32 intentionally — quantizing norm gamma corrupts the forward pass. Globals stay FP32 in v1; that is why large-vocab untied models (e.g. Qwen3) may show little disk shrink vs BF16 safetensors even though decoder Q4 is real (GPU weights ~1450 MB vs ~4550 MB FP32 for Qwen3-0.6B).
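
For intuition about what a 4-bit block bake involves, here is a minimal sketch of symmetric 4-bit block quantization. It is not the layout used by entity_q4.go: the block length of 32, the float32 scale, the [-7, 7] code range, and the low-nibble-first order are all assumptions chosen for illustration, and `q4Block`, `quantizeBlock`, and `dequantizeBlock` are hypothetical names.

```go
package main

import (
	"fmt"
	"math"
)

const blockSize = 32 // assumed block length; the real Q4_0 layout may differ

// q4Block is one quantized block: a scale plus 32 four-bit codes, two per byte.
type q4Block struct {
	Scale float32
	Codes [blockSize / 2]byte
}

// quantizeBlock maps 32 floats to signed codes in [-7, 7], stored with a +8 bias.
func quantizeBlock(x [blockSize]float32) q4Block {
	var maxAbs float64
	for _, v := range x {
		maxAbs = math.Max(maxAbs, math.Abs(float64(v)))
	}
	b := q4Block{Scale: float32(maxAbs / 7)}
	for i, v := range x {
		q := 0
		if b.Scale != 0 {
			q = int(math.Round(float64(v) / float64(b.Scale)))
		}
		if q > 7 {
			q = 7
		}
		if q < -7 {
			q = -7
		}
		code := byte(q + 8) // biased code in 1..15
		if i%2 == 0 {
			b.Codes[i/2] = code // even index: low nibble (assumption)
		} else {
			b.Codes[i/2] |= code << 4 // odd index: high nibble
		}
	}
	return b
}

// dequantizeBlock reverses quantizeBlock up to rounding error.
func dequantizeBlock(b q4Block) [blockSize]float32 {
	var out [blockSize]float32
	for i := range out {
		code := b.Codes[i/2]
		if i%2 == 1 {
			code >>= 4
		}
		code &= 0x0F
		out[i] = float32(int(code)-8) * b.Scale
	}
	return out
}

func main() {
	var x [blockSize]float32
	for i := range x {
		x[i] = float32(math.Sin(float64(i)))
	}
	b := quantizeBlock(x)
	y := dequantizeBlock(b)
	var worst float64
	for i := range x {
		worst = math.Max(worst, math.Abs(float64(x[i]-y[i])))
	}
	fmt.Printf("scale=%.4f worst error=%.4f bytes per block=%d\n", b.Scale, worst, 4+len(b.Codes))
}
```

In this sketch each block costs 20 bytes for 32 weights, about 5 bits per weight. A small tensor such as a norm gamma gains little from that saving, which fits the choice above to keep norms in FP32.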

Model-specific metadata persisted in the header includes expanded query_dim / kv_dim (Qwen-style MHA), MHA q_norm / k_norm auxiliary blobs, and lm_head_tied.

Tokenizer and chat template still come from the HF snapshot; only weights move native.

Name and identity


Format	ENTITY
Expansion	Every Numerical Type In Native TopologY
Extension	`.entity`
Magic	`ENTITY\0\0` (8 bytes)
Format version	`1` (v1, implemented)

Wire layout (v1)

Offset  Size  Content
──────  ────  ───────
0       8     magic "ENTITY\0\0"
8       2     u16 format_version  (= 1)
10      2     u16 flags           (reserved; 0 today)
12      8     u64 header_byte_length (LE)
20      N     header JSON (see below)
20+N    …     native-packed weight blobs (contiguous)
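
The layout above can be read with a few lines of standard-library Go. The sketch below validates the fixed 20-byte prefix and splits a file into header JSON and payload. It is a standalone illustration, not the code in poly/entity.go: `splitEntity` and `buildEntity` are hypothetical helpers, and reading the two u16 fields as little-endian is an assumption (the layout only states LE for the u64 length).

```go
package main

import (
	"bytes"
	"encoding/binary"
	"errors"
	"fmt"
)

const fixedHeaderSize = 20 // 8 magic + 2 version + 2 flags + 8 header length

var entityMagic = []byte("ENTITY\x00\x00")

// splitEntity checks the fixed header and returns the format version, the
// header JSON bytes, and the payload. It decodes neither JSON nor weights.
func splitEntity(data []byte) (version uint16, header, payload []byte, err error) {
	if len(data) < fixedHeaderSize {
		return 0, nil, nil, errors.New("entity: file shorter than fixed header")
	}
	if !bytes.Equal(data[:8], entityMagic) {
		return 0, nil, nil, errors.New("entity: bad magic")
	}
	// Assumption: u16 fields are little-endian, like the u64 length.
	version = binary.LittleEndian.Uint16(data[8:10])
	_ = binary.LittleEndian.Uint16(data[10:12]) // flags: reserved, 0 today
	n := binary.LittleEndian.Uint64(data[12:20])
	if n > uint64(len(data)-fixedHeaderSize) {
		return 0, nil, nil, errors.New("entity: header length exceeds file size")
	}
	end := fixedHeaderSize + int(n)
	return version, data[fixedHeaderSize:end], data[end:], nil
}

// buildEntity writes the same layout; it exists only to feed the demo.
func buildEntity(version uint16, header, payload []byte) []byte {
	buf := make([]byte, fixedHeaderSize, fixedHeaderSize+len(header)+len(payload))
	copy(buf, entityMagic)
	binary.LittleEndian.PutUint16(buf[8:10], version)
	binary.LittleEndian.PutUint16(buf[10:12], 0)
	binary.LittleEndian.PutUint64(buf[12:20], uint64(len(header)))
	buf = append(buf, header...)
	return append(buf, payload...)
}

func main() {
	file := buildEntity(1, []byte(`{"format_version":1}`), []byte{0xAB, 0xCD})
	v, h, p, err := splitEntity(file)
	fmt.Println(v, string(h), len(p), err)
}
```

Checking the declared header length against the remaining file size before slicing is what makes the length prefix safe to trust on untrusted input.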

Header JSON

The header is one JSON object:

{
  "format_version": 1,
  "network": { /* PersistenceNetworkSpec — topology only, no weight strings */ },
  "transformer": {
    "architecture": "llama_style_decoder",
    "hidden_size": 2048,
    "vocab_size": 32000,
    "lm_head_tied": true,
    "has_final_norm": true,
    "dims": { "num_layers": 24, "num_heads": 32, ... }
  },
  "blobs": [
    {
      "path": "layers.0",
      "offset": 0,
      "length": 1234,
      "dtype": "INT4",
      "scale": 0.01,
      "native": true
    }
  ]
}

Field	Role
`network`	Same shape as `PersistenceNetworkSpec`: `depth`, `rows`, `cols`, `layers_per_cell`, and every `PersistenceLayerSpec` (type, activation, dtype, z/y/x/l, MHA/CNN dims, parallel/sequential recursion). No `weights` Base64 strings — those live in the blob section.
`transformer`	Optional universal-transformer add-on. When present, global causal-LM weights live outside `net.Layers`: embeddings, LM head, final RMSNorm. Used by `ImportHFToEntity` for SmolLM2, Qwen, Llama-style decoders. Tokenizer/chat template still come from the HF snapshot (or your app).
`blobs[]`	Index into the payload. Each entry points at one weight store (main layer, nested branch, or transformer global).
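
A reader that only needs the blob index can decode the header without modelling the whole topology. The sketch below mirrors the field names from the sample header and keeps `network` and `transformer` as raw JSON. The Go types are hypothetical stand-ins (poly exposes its own `ParseEntityHeader`), and the numeric types chosen for `offset`, `length`, and `scale` are assumptions.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// BlobEntry mirrors one element of "blobs" in the sample header.
type BlobEntry struct {
	Path   string  `json:"path"`
	Offset uint64  `json:"offset"`
	Length uint64  `json:"length"`
	DType  string  `json:"dtype"`
	Scale  float64 `json:"scale"`
	Native bool    `json:"native"`
}

// EntityHeader leaves the topology sections undecoded so the blob index can
// be inspected without knowing the full PersistenceNetworkSpec schema.
type EntityHeader struct {
	FormatVersion int             `json:"format_version"`
	Network       json.RawMessage `json:"network"`
	Transformer   json.RawMessage `json:"transformer,omitempty"`
	Blobs         []BlobEntry     `json:"blobs"`
}

// parseHeader decodes the header JSON bytes into an EntityHeader.
func parseHeader(raw []byte) (EntityHeader, error) {
	var h EntityHeader
	if err := json.Unmarshal(raw, &h); err != nil {
		return h, fmt.Errorf("entity header: %w", err)
	}
	return h, nil
}

func main() {
	raw := []byte(`{
	  "format_version": 1,
	  "network": {"depth": 1, "rows": 1, "cols": 1, "layers_per_cell": 1},
	  "blobs": [
	    {"path": "layers.0", "offset": 0, "length": 1234,
	     "dtype": "INT4", "scale": 0.01, "native": true}
	  ]
	}`)
	h, err := parseHeader(raw)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, b := range h.Blobs {
		fmt.Printf("%s dtype=%s bytes=%d\n", b.Path, b.DType, b.Length)
	}
	fmt.Println("has transformer section:", len(h.Transformer) > 0)
}
```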

Blob paths mirror the in-memory tree:

Path example	Weight store
`layers.0`	Top-level layer index 0
`layers.3.sequential_layers.1`	Nested sequential sub-layer
`layers.2.parallel_branches.0`	Parallel branch
`layers.5.meta_observed_layer`	Metacognition observed layer
`transformer.embeddings`	Token embedding matrix (FP32 blob)
`transformer.lm_head`	Output projection (omitted when `lm_head_tied`)
`transformer.final_norm`	Pre-head RMSNorm gamma (when `has_final_norm`)

Each blob carries its own dtype, scale, and native flag — so a single checkpoint can hold different numerical types per layer (e.g. layer 0 Int4, layer 12 BFloat16, layer 40 Binary).
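
To show how such paths can be walked, the sketch below splits a blob path into (field, index) steps. It is a guess at a reasonable parser based only on the example paths in the table; `pathStep` and `parseBlobPath` are hypothetical, and the real resolver in poly may work differently.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// pathStep is one hop in a blob path: a field name and, when present, an index.
type pathStep struct {
	Field string
	Index int // -1 when the field carries no numeric index
}

// parseBlobPath turns "layers.3.sequential_layers.1" into
// [{layers 3} {sequential_layers 1}]. A field that is not followed by a
// number, such as "meta_observed_layer" or "embeddings", gets Index -1.
func parseBlobPath(path string) ([]pathStep, error) {
	if path == "" {
		return nil, fmt.Errorf("empty blob path")
	}
	parts := strings.Split(path, ".")
	var steps []pathStep
	for i := 0; i < len(parts); i++ {
		if parts[i] == "" {
			return nil, fmt.Errorf("empty segment in %q", path)
		}
		if _, err := strconv.Atoi(parts[i]); err == nil {
			return nil, fmt.Errorf("index without a field in %q", path)
		}
		step := pathStep{Field: parts[i], Index: -1}
		if i+1 < len(parts) {
			if n, err := strconv.Atoi(parts[i+1]); err == nil && n >= 0 {
				step.Index = n
				i++ // consume the index segment
			}
		}
		steps = append(steps, step)
	}
	return steps, nil
}

func main() {
	for _, p := range []string{
		"layers.0",
		"layers.3.sequential_layers.1",
		"layers.5.meta_observed_layer",
		"transformer.embeddings",
	} {
		steps, err := parseBlobPath(p)
		fmt.Println(p, steps, err)
	}
}
```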

Weight blobs

Payload bytes use the same bit-packing as JSON persistence:

Implemented via EncodeNativeWeightsRaw / DecodeNativeWeightsRaw in persistence.go
Documented in serialization.md

No Base64. No FP32-only export constraint (unlike SaveSafetensors).
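
To make "native packing" concrete, here is a minimal sketch of two sub-byte layouts named earlier: Int4 nibbles (two values per byte) and Binary 8:1 (one sign bit per weight). The authoritative bit order is whatever EncodeNativeWeightsRaw implements; the low-nibble-first and least-significant-bit-first orders below are assumptions, and all four function names are hypothetical.

```go
package main

import "fmt"

// packInt4 stores signed 4-bit values (-8..7) two per byte.
// Assumption: even indices occupy the low nibble.
func packInt4(vals []int8) []byte {
	out := make([]byte, (len(vals)+1)/2)
	for i, v := range vals {
		nib := byte(v) & 0x0F // low 4 bits of the two's-complement value
		if i%2 == 0 {
			out[i/2] |= nib
		} else {
			out[i/2] |= nib << 4
		}
	}
	return out
}

// unpackInt4 reads n values back; n must not exceed 2*len(packed).
func unpackInt4(packed []byte, n int) []int8 {
	out := make([]int8, n)
	for i := range out {
		nib := packed[i/2]
		if i%2 == 1 {
			nib >>= 4
		}
		nib &= 0x0F
		out[i] = int8(nib<<4) >> 4 // sign-extend the 4-bit value
	}
	return out
}

// packBinary stores one bit per weight, 8 weights per byte.
// Assumption: a set bit means +1, a clear bit means -1, LSB first.
func packBinary(signs []int8) []byte {
	out := make([]byte, (len(signs)+7)/8)
	for i, s := range signs {
		if s > 0 {
			out[i/8] |= 1 << (i % 8)
		}
	}
	return out
}

// unpackBinary reads n signs back; n must not exceed 8*len(packed).
func unpackBinary(packed []byte, n int) []int8 {
	out := make([]int8, n)
	for i := range out {
		if packed[i/8]&(1<<(i%8)) != 0 {
			out[i] = 1
		} else {
			out[i] = -1
		}
	}
	return out
}

func main() {
	w := []int8{-8, -1, 0, 7, 3}
	p := packInt4(w)
	fmt.Println(len(w), "int4 values ->", len(p), "bytes:", unpackInt4(p, len(w)))

	s := []int8{1, -1, -1, 1, 1, 1, -1, 1, -1}
	b := packBinary(s)
	fmt.Println(len(s), "binary values ->", len(b), "bytes:", unpackBinary(b, len(s)))
}
```

Because the element count is not recoverable from the packed bytes alone when it is odd (Int4) or not a multiple of 8 (Binary), a reader needs the layer dimensions from the topology to know how many values to unpack.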

API (`poly/entity.go`)

Function	Purpose
`SerializeEntity(net)`	Network → `.entity` bytes
`DeserializeEntity(data)`	Bytes → full network (topology + all weights)
`DeserializeEntityWithOptions(data, opts)`	Selective weight load (`EntityLoadOptions.LayerIndices`)
`DeserializeEntityLayer(data, layerIndex)`	Topology + one top-level layer’s weights
`SaveEntity(path, net)` / `LoadEntity(path)`	File I/O
`SerializeEntityTransformer(et)` / `DeserializeEntityTransformer(data)`	Universal transformer: decoder + embeddings/LM head/final norm
`SaveEntityTransformer` / `LoadEntityTransformer` / `LoadEntityTransformerAs[T]`	File I/O + `NewTransformer` wiring
`ImportHFToEntity(modelDir, path, opts)`	HF snapshot → universal `.entity` (`hf_import.go`)
`ParseEntityHeader(data)`	Header only (no weight decode; mmap-friendly planning)
`LayerPersistenceFromEntity(data, layerIndex)`	Raw blob + scale + native for one layer (parity checks)
`EntityBlobBytes(data, blobIndex)`	Raw bytes for blob `i` without dtype decode
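
Raw blob access comes down to a bounds-checked slice of the payload. The sketch below is a standalone stand-in for that idea, not the implementation of `EntityBlobBytes`: `blobRef` and `blobBytes` are hypothetical, and treating `offset` as relative to the start of the payload section is an assumption suggested by the sample header, where the first blob sits at offset 0.

```go
package main

import (
	"errors"
	"fmt"
)

// blobRef holds the two index fields needed to locate a blob.
type blobRef struct {
	Offset uint64
	Length uint64
}

// blobBytes returns payload[Offset : Offset+Length] after checking bounds.
// The comparison is written to avoid overflow on hostile offsets.
func blobBytes(payload []byte, ref blobRef) ([]byte, error) {
	size := uint64(len(payload))
	if ref.Offset > size || ref.Length > size-ref.Offset {
		return nil, errors.New("entity: blob range outside payload")
	}
	return payload[ref.Offset : ref.Offset+ref.Length], nil
}

func main() {
	payload := []byte{0, 1, 2, 3, 4, 5, 6, 7, 8, 9}

	b, err := blobBytes(payload, blobRef{Offset: 2, Length: 3})
	fmt.Println(b, err)

	_, err = blobBytes(payload, blobRef{Offset: 8, Length: 5})
	fmt.Println(err)
}
```

The returned slice aliases the input buffer, so no weight bytes are copied. That is the property a memory-mapped, header-first loading plan would rely on.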

Hub model: load any format → VolumetricNetwork → save as JSON, .entity, or (lossy) safetensors F32 export.

net, err := poly.LoadEntity("brain.entity")
if err != nil {
	log.Fatal(err)
}
// … inference or re-quantize per layer (Morph) …
poly.SaveEntity("brain-v2.entity", net)
jsonWire, _ := poly.SerializeNetwork(net) // still valid debug export

Size vs JSON — observed compression (Lucy [7])

Source: lucy/lucy_testing_output/seven_layer.txt — full [7] Seven-layer CPU suite run (10 layer types × up to three grids × 21 dtypes). Checkpoints are written after MC training as tag_DType.json and tag_DType.entity.

Headline numbers

Metric	Result
Runs compared	546 dtype×suite rows (26 memory tables)
Save/reload	`json=PASS entity=PASS` on all 546 trained reload checks; `entity=FAIL`: 0
Average disk saving	`.entity` is ~27.6% smaller than JSON (typical band 25–28%)
Runtime heap	Unchanged — savings are on-disk only (same trained-native weight RAM in the log’s Weights column)

ENTITY v1 removes Base64 weight strings and pretty-printed JSON weights. It does not re-quantize. The header is still full topology JSON, so this is not safetensors-class compression.

Sample checkpoints (trained, after MC train)

SwiGLU 2×2×2 (8 cells × 7 layers = 56-layer stack) — the grid from the first live ENTITY comparison:

DType	JSON ckpt	`.entity` ckpt	Saving
Float64	496.79 KiB	372.49 KiB	25%
Float32	258.17 KiB	193.35 KiB	25%
Int4	49.92 KiB	36.90 KiB	26%
Binary	27.66 KiB	20.29 KiB	27%

Dense 1×1×1 (7-layer pyramid stack):

DType	JSON ckpt	`.entity` ckpt	Saving
Float32	57.95 KiB	43.47 KiB	25%
Int4	9.53 KiB	7.12 KiB	25%
Binary	4.35 KiB	3.24 KiB	26%

Dense 3×3×3 (27 cells × 7 = 189-layer stack):

DType	JSON ckpt	`.entity` ckpt	Saving
Float32	83.90 KiB	60.95 KiB	27%
Int4	69.93 KiB	49.68 KiB	29%
Binary	68.61 KiB	49.09 KiB	28%

Across all 21 dtypes on SwiGLU 2×2×2, the .entity saving relative to JSON stays in a 25.0–26.6% band (an ENTITY/JSON size ratio of roughly 0.73–0.75); the saving is almost entirely Base64 removal, not dtype-specific magic.

Three things the log teaches

1. ENTITY vs JSON ≈ fixed ~25% discount, not 10×

The ratio is stable because both formats carry the same topology JSON header and the same native weight bits; only the weight encoding in the file changes (Base64 strings → raw blob section).
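
The ~25% figure follows from Base64 arithmetic: every 3 raw bytes become 4 encoded characters, so raw storage is 3/4 of the encoded size. The sketch below computes that saving with the standard library, assuming the JSON lane uses standard padded Base64 (the variant is not stated here); `base64Saving` is a hypothetical helper.

```go
package main

import (
	"encoding/base64"
	"fmt"
)

// base64Saving returns the fraction of bytes saved by storing n raw bytes
// instead of their standard padded Base64 encoding.
func base64Saving(n int) float64 {
	enc := base64.StdEncoding.EncodedLen(n)
	if enc == 0 {
		return 0
	}
	return 1 - float64(n)/float64(enc)
}

func main() {
	// 182784 bytes = 178.50 KiB, the Float32 SwiGLU 2×2×2 weight size
	// quoted in this document's breakdown table.
	for _, n := range []int{3, 3000, 182784} {
		fmt.Printf("raw=%d base64=%d saving=%.1f%%\n",
			n, base64.StdEncoding.EncodedLen(n), 100*base64Saving(n))
	}
}
```

Savings slightly above 25% in the measured tables are consistent with the removal of other JSON weight formatting in addition to the Base64 expansion itself.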

2. Quant dtype still dominates absolute file size

Same topology, different dtype — SwiGLU 2×2×2:

DType	JSON	`.entity`
Float64	497 KiB	372 KiB
Int4	50 KiB	37 KiB

Int4 JSON is ~10% of Float64 JSON on the same brain. Picking Int4/Binary matters far more than picking .entity over .json.

3. Topology overhead grows with grid size; ENTITY shrinks the gap

SwiGLU 2×2×2 Float32 breakdown (from the log’s Weights vs checkpoint columns):

Component	Size
Trained-native weights in RAM	178.50 KiB
JSON checkpoint	258.17 KiB (+80 KiB overhead ≈ 31% of file)
`.entity` checkpoint	193.35 KiB (+15 KiB overhead ≈ 8% of file)

On Dense 3×3×3 Float32, trained-native weights are only ~12 KiB in RAM but the JSON checkpoint is ~84 KiB — topology metadata dominates. ENTITY drops that to ~61 KiB (~27% saving), but the file is still mostly header, not weights.

Residual 3×3×3 is an extreme case in the log: ~42% smaller .entity vs JSON when per-layer weight blobs are tiny relative to the 189-layer spec.

Where to read it in the log

Each layer-type × grid block ends with a memory & weight footprint table:

| DType | Heap | Sys | Heap+train | Weights | JSON ckpt | .entity ckpt |

Pass lines also print both sizes inline, e.g. json=496.79 KiB entity=372.49 KiB on SwiGLU 2×2×2 Float64.

Why not safetensors-small?

Weights are already at the bit-width floor for each dtype (Int4 nibbles, Binary 8:1, …).
Topology JSON is a large fixed cost on big grids (56–189 layer specs with type, dtype, dims, z/y/x/l).
SafeTensors omits that graph entirely — flat tensor names only.

See Future: smaller files with full topology for the planned binary topology + optional zstd path.

Comparison to SafeTensors

	SafeTensors	ENTITY v1	JSON persistence
Weights on disk	Raw HF dtypes	Raw Loom native packing	Base64 in JSON
Topology in same file	❌	✅ (JSON header)	✅
Per-layer Loom dtype + Scale	❌	✅	✅
Volumetric (Z,Y,X,L)	❌	✅	✅
Parallel / sequential tree	❌	✅	✅
Typical header size	Tiny	Large on big grids	Largest (includes Base64 weights)

SafeTensors wins on flat LLM weight dumps. ENTITY wins on native Loom brains you trained and need to reload exactly.

Idempotency

SerializeEntity → DeserializeEntity → SerializeEntity yields identical bytes for a given network state (tested in poly/tests/entity_test.go).

Topology fields are canonicalized on save (e.g. default seq_length omitted for non-sequence layers) so reload does not inflate the header.
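
The idempotency property can be expressed as a small generic check. The sketch below is not the test in poly/tests/entity_test.go; it uses a toy JSON codec in place of the real serializer. `stableRoundTrip` and `toyLayer` are hypothetical, and the `omitempty` tag stands in for the canonicalization described above.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// stableRoundTrip reports whether ser(de(ser(v))) equals ser(v) byte for byte.
func stableRoundTrip[T any](v T, ser func(T) ([]byte, error), de func([]byte) (T, error)) (bool, error) {
	first, err := ser(v)
	if err != nil {
		return false, err
	}
	back, err := de(first)
	if err != nil {
		return false, err
	}
	second, err := ser(back)
	if err != nil {
		return false, err
	}
	return bytes.Equal(first, second), nil
}

// toyLayer is a stand-in for a layer spec. omitempty drops a zero seq_length
// on save, so reloading and saving again cannot grow the output.
type toyLayer struct {
	Type      string `json:"type"`
	SeqLength int    `json:"seq_length,omitempty"`
}

func serToy(l toyLayer) ([]byte, error) { return json.Marshal(l) }

func deToy(b []byte) (toyLayer, error) {
	var l toyLayer
	err := json.Unmarshal(b, &l)
	return l, err
}

func main() {
	ok, err := stableRoundTrip(toyLayer{Type: "dense"}, serToy, deToy)
	fmt.Println("dense stable:", ok, err)

	ok, err = stableRoundTrip(toyLayer{Type: "rnn", SeqLength: 16}, serToy, deToy)
	fmt.Println("rnn stable:", ok, err)
}
```

If `SerializeEntity` and `DeserializeEntity` return `([]byte, error)` and `(*VolumetricNetwork, error)` respectively (an assumption about their signatures), they could be passed to the same helper directly.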

Validation

Suite	What it checks
Lucy [7] (`seven_layer/runner.go`)	Before/after train: JSON and `.entity` save/reload PASS; memory table shows both checkpoint sizes
Lucy [8] (`lucy/hf_entity.go`)	HF cache → Q4 `.entity` convert → GPU ENTITY Talk; SmolLM2 parity with Poly Talk; Qwen load + Q4 GPU path
`poly/tests/entity_test.go`	Round-trip, idempotent bytes, selective layer load, Q4_0 blob round-trip for transformers

Checkpoints land in lucy/lucy_testing_output/ as tag_DType.json and tag_DType.entity. For full-run numbers and compression observations, see the "Size vs JSON" section of entity.md (observed compression, from seven_layer.txt).

Relationship to other I/O

File	Role
`safetensors.go`	Read HF `.safetensors`; `SaveSafetensors` is F32-only export
`persistence.go`	JSON save/load — semantic reference for ENTITY topology and packing
`entity.go`	Native `.entity` binary save/load
`serialization.go`	Architecture-only JSON (`BuildNetworkFromJSON`) — random init, no trained weights
`universal_loader.go`	Auto-detect from safetensors shapes — import only

Future: smaller files with full topology

ENTITY v1 prioritizes correctness and debuggability. Planned v2+ improvements (same full topology, smaller wire):

Binary topology section — string tables, dtype/layer-type enums, grid-implied (z,y,x,l) where regular
Compact blob index — fixed records (node_id, u8 dtype, f32 scale, offsets) instead of JSON path strings
Optional zstd (lossless) on header/index/payload via flags bits
ConvertSafetensorsToEntity — import HF weights into a Loom topology wrapper
Welvet C-ABI — LoomSaveEntity, LoomLoadEntity; loaders accept .entity or .safetensors
SoulGlitch — prefer .entity for on-device trained saves; keep HF download as .safetensors

Weights stay on EncodeNativeWeightsRaw; the big disk wins are in topology + index, not re-quantizing weights.
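
As one possible shape for the compact blob index, the sketch below encodes a fixed-size record with the fields named in the list (node_id, u8 dtype, f32 scale, offsets). Nothing here is a committed v2 layout: the field widths, the little-endian byte order, the 25-byte record size, and the names `blobRecord`, `appendTo`, and `decodeRecord` are all assumptions for illustration.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
	"math"
)

// Assumed widths: u32 node_id, u8 dtype, f32 scale, u64 offset, u64 length.
const recordSize = 4 + 1 + 4 + 8 + 8

// blobRecord is a hypothetical fixed-size replacement for one JSON blob entry.
type blobRecord struct {
	NodeID uint32
	DType  uint8
	Scale  float32
	Offset uint64
	Length uint64
}

// appendTo appends the 25-byte little-endian encoding of r to dst.
func (r blobRecord) appendTo(dst []byte) []byte {
	dst = binary.LittleEndian.AppendUint32(dst, r.NodeID)
	dst = append(dst, r.DType)
	dst = binary.LittleEndian.AppendUint32(dst, math.Float32bits(r.Scale))
	dst = binary.LittleEndian.AppendUint64(dst, r.Offset)
	return binary.LittleEndian.AppendUint64(dst, r.Length)
}

// decodeRecord reads one record from the front of src.
func decodeRecord(src []byte) (blobRecord, error) {
	if len(src) < recordSize {
		return blobRecord{}, errors.New("blob index: short record")
	}
	return blobRecord{
		NodeID: binary.LittleEndian.Uint32(src[0:4]),
		DType:  src[4],
		Scale:  math.Float32frombits(binary.LittleEndian.Uint32(src[5:9])),
		Offset: binary.LittleEndian.Uint64(src[9:17]),
		Length: binary.LittleEndian.Uint64(src[17:25]),
	}, nil
}

func main() {
	rec := blobRecord{NodeID: 7, DType: 3, Scale: 0.01, Offset: 4096, Length: 1234}
	wire := rec.appendTo(nil)
	back, err := decodeRecord(wire)
	fmt.Println(len(wire), back == rec, err)
}
```

For comparison, the single JSON blob entry in the v1 sample header runs to roughly a hundred bytes once keys and whitespace are counted, which is why a fixed-record index matters most on grids with many small layers.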
