v0.80.0 — Native Ship (ENTITY + Modern GPU)

Release: 0.79.0 "Bedrock Validation" → 0.80.0 "Native Ship"
Checklist: 111 / 142 (78.2%) → 114 / 142 (80.3%)

This wave ships native Loom checkpoints (ENTITY), moves production GPU to openfluke/webgpu v1.0.4 (wgpu-native v29), and validates real LLM inference on Metal, Vulkan (Intel + NVIDIA), and Windows ARM64. The Planet Bridging POC in ../planetbridging/ completes the “planets → Loom” half of the hub; it releases as its own repo/version after Loom 0.80 — Loom must land first.


What shipped

ENTITY — native .entity checkpoints

| Item | Detail |
| --- | --- |
| Format | entity.md — magic ENTITY, JSON topology header + native-packed weight blobs |
| Semantics | Same as JSON persistence: 21 dtypes, volumetric (Z,Y,X,L), parallel/sequential trees, per-layer Scale |
| Lucy [7] | Seven-layer CPU suite: JSON and .entity save/reload PASS on all trained rows |
| Lucy [8] | ENTITY Talk — HF cache → ImportHFToEntity → optional Q4 bake → GPU chat without safetensors at runtime |
| Size | ~25% smaller than JSON checkpoints (Base64 removed); quant dtype still dominates absolute size |
| Unlock | Real LLM weights become .entity citizens — same container as volumetric experiments (graft, remote links, per-layer dtype) |
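
As a rough illustration of the container idea in the table (a magic tag, a JSON topology header, then raw weight bytes), the Go sketch below writes and reads a toy file in memory. Only the ENTITY magic and the "JSON header plus packed blobs" shape come from these notes. The uint32 length prefix, the `Header` fields, and every function name are assumptions made for this example; entity.md defines the real layout, and this is not the code in poly/entity.go.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"encoding/json"
	"errors"
	"fmt"
	"io"
)

// magic is the tag named in the release notes; everything after it in this
// toy layout (uint32 length prefix, header fields) is an assumption.
var magic = []byte("ENTITY")

// Header is a hypothetical topology header, not Loom's real schema.
type Header struct {
	DType     string `json:"dtype"`
	Shape     []int  `json:"shape"`
	BlobBytes int    `json:"blob_bytes"`
}

// writeContainer emits: magic | uint32 LE header length | JSON header | raw weights.
func writeContainer(w io.Writer, h Header, weights []byte) error {
	h.BlobBytes = len(weights)
	hdr, err := json.Marshal(h)
	if err != nil {
		return err
	}
	if _, err := w.Write(magic); err != nil {
		return err
	}
	if err := binary.Write(w, binary.LittleEndian, uint32(len(hdr))); err != nil {
		return err
	}
	if _, err := w.Write(hdr); err != nil {
		return err
	}
	_, err = w.Write(weights)
	return err
}

// readContainer reverses writeContainer and rejects a wrong magic tag.
func readContainer(r io.Reader) (Header, []byte, error) {
	var h Header
	tag := make([]byte, len(magic))
	if _, err := io.ReadFull(r, tag); err != nil {
		return h, nil, err
	}
	if !bytes.Equal(tag, magic) {
		return h, nil, errors.New("not an ENTITY container")
	}
	var n uint32
	if err := binary.Read(r, binary.LittleEndian, &n); err != nil {
		return h, nil, err
	}
	hdr := make([]byte, n)
	if _, err := io.ReadFull(r, hdr); err != nil {
		return h, nil, err
	}
	if err := json.Unmarshal(hdr, &h); err != nil {
		return h, nil, err
	}
	if h.BlobBytes < 0 {
		return h, nil, errors.New("negative blob size")
	}
	weights := make([]byte, h.BlobBytes)
	if _, err := io.ReadFull(r, weights); err != nil {
		return h, nil, err
	}
	return h, weights, nil
}

// roundTrip writes to an in-memory buffer and reads the result back.
func roundTrip(h Header, weights []byte) (Header, []byte, error) {
	var buf bytes.Buffer
	if err := writeContainer(&buf, h, weights); err != nil {
		return Header{}, nil, err
	}
	return readContainer(&buf)
}

func main() {
	h, w, err := roundTrip(Header{DType: "float32", Shape: []int{1, 1, 1, 4}}, []byte{1, 2, 3, 4})
	if err != nil {
		panic(err)
	}
	fmt.Println(h.DType, h.Shape, len(w))
}
```

The point of the sketch is that the weights travel as raw bytes after the header instead of being text-encoded inside it, which is what the Size row attributes the savings to.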

Import lane unchanged: HuggingFace .safetensors for download. Ship lane: .entity for trained or converted brains.
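
The ~25% size figure quoted above is consistent with plain Base64 arithmetic: Base64 spends 4 output characters on every 3 input bytes, so the raw bytes are about a quarter smaller than their encoded form. The sketch below checks that with Go's standard library. It assumes the JSON checkpoints used standard padded Base64 for weights and it ignores the JSON header itself; the payload size is a made-up number.

```go
package main

import (
	"encoding/base64"
	"fmt"
)

// savings returns the fraction by which rawBytes is smaller than its
// standard (padded) Base64 encoding. It returns 0 for non-positive sizes.
func savings(rawBytes int) float64 {
	if rawBytes <= 0 {
		return 0
	}
	encoded := base64.StdEncoding.EncodedLen(rawBytes)
	return 1 - float64(rawBytes)/float64(encoded)
}

func main() {
	raw := 3_000_000 // hypothetical weight payload in bytes
	fmt.Printf("raw=%d base64=%d savings=%.1f%%\n",
		raw, base64.StdEncoding.EncodedLen(raw), 100*savings(raw))
}
```

Because the saving is proportional, it does not change which factor sets the absolute file size: a Q4 payload is still far smaller than an fp32 one in either container.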

WebGPU v29 — github.com/openfluke/webgpu@v1.0.4

| Item | Detail |
| --- | --- |
| Module | Standalone openfluke/webgpu (no longer a cogentcore fork) |
| Native stack | wgpu-native v29 C API — futures, WGPUStringView, Go-side validation error scopes |
| Loom dependency | require github.com/openfluke/webgpu v1.0.4 in root and lucy/go.mod |
| Binaries | Prebuilt libwgpu_native.a per platform under the module; ios/amd64 (Intel simulator) dropped to satisfy Go module size limits |

See webgpu README for platform table and version history.

Cross-platform GPU validation (Lucy Poly Talk / ENTITY Talk)

Same SmolLM2-135M-Instruct, Q4, block-wise GPU upload — webgpu v1.0.4 + poly WGSL:

| Platform | GPU | Backend | Decode (approx.) | Notes |
| --- | --- | --- | --- | --- |
| macOS arm64 | Apple M5 | Metal | ✅ parity with prior v29 work | Adapter → device → buffer → forward |
| Windows arm64 | Snapdragon | Vulkan | ✅ validated | Previously broken on old bindings |
| Linux | Intel Iris Xe | Vulkan (Mesa i915) | ~19 tok/s decode | Headless; tier fallback OK |
| Linux | RTX 3050 Mobile | Vulkan (NVIDIA) | ~69 tok/s decode, ~492 tok/s prefill | Requires healthy nvidia-smi + VK_ICD_FILENAMES |

Not a llama.cpp/Ollama tok/s contest yet — custom WGSL through wgpu-native — but ~3.5× decode vs iGPU on the same box confirms the v29 stack is production-real on NVIDIA Linux.
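
For readers who want to reproduce figures of this kind, a minimal sketch of the arithmetic follows: tokens divided by wall-clock time, and a ratio of two rates. The function names are hypothetical and the 138-token sample is invented; only the approximate 69 and 19 tok/s decode rates come from the table. With those two inputs the ratio works out to about 3.6, in line with the rough ~3.5× quoted above.

```go
package main

import (
	"fmt"
	"time"
)

// tokensPerSecond converts a token count and elapsed wall time to tok/s.
// It returns 0 when the elapsed time is not positive.
func tokensPerSecond(tokens int, elapsed time.Duration) float64 {
	if elapsed <= 0 {
		return 0
	}
	return float64(tokens) / elapsed.Seconds()
}

// speedup returns fast/slow, or 0 when slow is not positive.
func speedup(fast, slow float64) float64 {
	if slow <= 0 {
		return 0
	}
	return fast / slow
}

func main() {
	// 138 tokens in 2 s is a made-up sample, not a measured run.
	fmt.Printf("%.1f tok/s\n", tokensPerSecond(138, 2*time.Second))
	// Approximate decode rates from the table: RTX 3050 Mobile vs Iris Xe.
	fmt.Printf("%.1fx decode speedup\n", speedup(69, 19))
}
```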

Planet Bridging POC (in monorepo — separate release)

../planetbridging/ reached v0.5.0 internally:

  • Direction: planets → Loom (complete for standard volumetric layer types)
  • 13 compare tabs: Dense, CNN1/2/3, MHA, LSTM, RNN, LayerNorm, Embedding, RMSNorm, SwiGLU, Residual, Mixer v1/v2
  • Planets: PyTorch, TensorFlow, JAX (+ sklearn on Dense)
  • Mechanism: live weight stream → .stream.entity → Loom infer → PASS vs native (fp32 tolerance)
  • Mixer v2: 16-layer stack, all 12 types chained (~5e-5 max diff POC)
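
The PASS criterion in the list above (Loom output versus the native framework, within an fp32 tolerance) can be sketched as a maximum absolute difference check. This is an illustration, not the Planet Bridging compare code: the function names and the 1e-4 threshold are assumptions, chosen only so that a max diff near the reported ~5e-5 would pass.

```go
package main

import (
	"fmt"
	"math"
)

// maxAbsDiff returns the largest |a[i]-b[i]|. A NaN on either side is
// reported as +Inf so that it can never pass a tolerance check.
func maxAbsDiff(a, b []float32) (float64, error) {
	if len(a) != len(b) {
		return 0, fmt.Errorf("length mismatch: %d vs %d", len(a), len(b))
	}
	var worst float64
	for i := range a {
		d := math.Abs(float64(a[i]) - float64(b[i]))
		if math.IsNaN(d) {
			return math.Inf(1), nil
		}
		if d > worst {
			worst = d
		}
	}
	return worst, nil
}

// pass reports whether two outputs agree within tol (hypothetical threshold).
func pass(a, b []float32, tol float64) bool {
	d, err := maxAbsDiff(a, b)
	return err == nil && d <= tol
}

func main() {
	native := []float32{0.25, -1.5, 3.0}     // made-up framework output
	loom := []float32{0.25003, -1.5, 2.9999} // made-up Loom output
	d, _ := maxAbsDiff(native, loom)
	fmt.Printf("max diff %.2e, pass=%v\n", d, pass(native, loom, 1e-4))
}
```

An absolute threshold is the simplest choice; a relative or mixed tolerance may be a better fit for layers whose outputs span several orders of magnitude.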

Release order: Loom 0.80 first → then Planet Bridging 0.5.0 as its own published hub (v1.0 = Loom → ONNX/Safetensors/GGUF export).


What this release is (and is not)

You now have:

  • A shippable native checkpoint (.entity) beside JSON debug persistence and HF import
  • HF LLMs as Loom citizens via Lucy [8] — not just flat safetensors guests each run
  • Modern GPU bindings decoupled from upstream fork politics
  • Multi-vendor GPU proof on one engine (Metal, Qualcomm, Intel, NVIDIA Vulkan)
  • A complete planet→Loom POC waiting on Loom’s release tag

You do not yet claim:

  • Planet Bridging published (repo/version ships after Loom)
  • Loom → export hub formats (ONNX/GGUF out) — Planet Bridging v1.0
  • Ollama-class decode on every GPU (WGSL matmul path still has headroom)
  • ENTITY v2 binary topology (header still JSON; see entity.md — future)

Next named targets:

  • v0.81 — ASM rollout (Dense backward, SwiGLU, MHA); GPU kernel fusion
  • Planet Bridging v0.5.0 — publish after Loom 0.80 tag
  • Planet Bridging v1.0 — Loom → hub formats → any inference engine

How to verify

# Lucy ENTITY + GPU (from repo root)
cd lucy && go get github.com/openfluke/webgpu@v1.0.4 && go mod tidy
go run .   # [7] seven-layer (entity save/reload), [8] ENTITY Talk, [1] Poly Talk GPU

# ENTITY round-trip tests
cd ../poly/tests && go test -run Entity -v

# Planet Bridging compare host (POC — not part of Loom release artifact yet)
cd ../planetbridging && go run .

Linux NVIDIA:

export VK_ICD_FILENAMES=/usr/share/vulkan/icd.d/nvidia_icd.x86_64.json
export WGPU_ADAPTER_NAME=NVIDIA
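
Before launching a GPU run on Linux NVIDIA, it can help to confirm that both variables are set and that the ICD manifest exists. The preflight below is a hedged sketch in Go: `checkVulkanEnv` and `fileExists` are hypothetical helpers and not part of Loom or openfluke/webgpu. The two variable names come from the snippet above, and the check says nothing about whether the driver itself is healthy.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// checkVulkanEnv returns a list of human-readable problems with the two
// environment variables used above. getenv and exists are injected so the
// check can be exercised without touching the real environment.
func checkVulkanEnv(getenv func(string) string, exists func(string) bool) []string {
	var problems []string
	icd := getenv("VK_ICD_FILENAMES")
	if icd == "" {
		problems = append(problems, "VK_ICD_FILENAMES is not set")
	}
	// The variable may hold several manifests separated by the OS list separator.
	for _, p := range filepath.SplitList(icd) {
		if !exists(p) {
			problems = append(problems, "ICD manifest not found: "+p)
		}
	}
	if getenv("WGPU_ADAPTER_NAME") == "" {
		problems = append(problems, "WGPU_ADAPTER_NAME is not set")
	}
	return problems
}

// fileExists reports whether path can be stat'ed.
func fileExists(path string) bool {
	_, err := os.Stat(path)
	return err == nil
}

func main() {
	problems := checkVulkanEnv(os.Getenv, fileExists)
	for _, p := range problems {
		fmt.Println("warning:", p)
	}
	if len(problems) == 0 {
		fmt.Println("Vulkan environment looks set")
	}
}
```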

Key source files

| Area | Files |
| --- | --- |
| ENTITY | poly/entity.go, poly/entity_q4.go, poly/hf_import.go |
| Lucy [8] | lucy/hf_entity.go |
| WebGPU init | poly/wgpu_context_native.go |
| Docs | entity.md, gpu.md, transformer.md |
| Planet Bridging | planetbridging/README.md, planetbridging/PROGRESS.md |

See also