v0.81.0 — Accelerator Bridge (Intel NPU + vendor plugin model)
Release: 0.80.0 "Native Ship" → 0.81.0 "Accelerator Bridge"
Checklist: 112 / 146 (76.7%) on adjustments — Intel forward dispatch advances Accelerators & Distributed (experimental)
First public vendor accelerator path: Loom forwards individual layers through poly/accel into chaosglue-built plugins, starting with Intel OpenVINO CPU + NPU on Linux.
What shipped
poly/accel — vendor-neutral plugin loader
| Item | Detail |
|---|---|
| Package | poly/accel/ — Discover, Registry, Plugin, CompiledLayer |
| C ABI | loom_accel.h in chaosglue (Loom does not vendor OpenVINO) |
| Linux | dlopen via CGO (CGO_ENABLED=1) |
| Intel plugin | libloom_accel_intel.so — built from chaosglue/npu/intel/cabi/ |
Dispatch integration
| Item | Detail |
|---|---|
| accel_intel.go | DiscoverAccel, SyncToAccel, DispatchAccelForward, weight → FP32 bytes |
| forward.go | DispatchLayer calls accel when layer.ExecTarget.UseAccel() |
| VolumetricLayer | ExecTarget, AccelBinding fields |
| Init-once | SyncToAccel(sizeLabel) compiles + uploads weights; steady infer reuses handle |
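
The init-once pattern in the table can be sketched as follows: a sync step compiles and uploads weights a single time, and the dispatch step reuses the resulting handle on every later forward call. Every type and function here (execTarget, compiledHandle, layer, syncToAccel, dispatchLayer) is a hypothetical stand-in rather than the real poly code, the "layer" is a toy elementwise multiply, and falling back to the native path when no handle is bound is an assumption.

```go
package main

import "fmt"

// execTarget loosely mirrors the idea of VolumetricLayer.ExecTarget.
type execTarget int

const (
	targetLoom  execTarget = iota // native path
	targetAccel                   // vendor plugin path
)

func (t execTarget) useAccel() bool { return t == targetAccel }

// compiledHandle stands in for a plugin-side compiled layer.
type compiledHandle struct {
	weights []float32 // copy of the weights "uploaded" to the device
}

type layer struct {
	target   execTarget
	weights  []float32       // must be non-empty in this sketch
	binding  *compiledHandle // nil until syncToAccel has run
	compiles int             // counts compile+upload calls, for the demo
}

// syncToAccel compiles and uploads once; later calls reuse the handle.
func (l *layer) syncToAccel() {
	if l.binding != nil {
		return
	}
	w := make([]float32, len(l.weights))
	copy(w, l.weights)
	l.binding = &compiledHandle{weights: w}
	l.compiles++
}

// scale is the toy forward pass: multiply each input by a weight,
// cycling through the weights if the input is longer.
func scale(w, in []float32) []float32 {
	out := make([]float32, len(in))
	for i := range in {
		out[i] = in[i] * w[i%len(w)]
	}
	return out
}

// dispatchLayer uses the bound handle when the layer requests the
// accelerator and has been synced; otherwise it runs the native path.
func dispatchLayer(l *layer, in []float32) (out []float32, usedAccel bool) {
	if l.target.useAccel() && l.binding != nil {
		return scale(l.binding.weights, in), true
	}
	return scale(l.weights, in), false
}

func main() {
	l := &layer{target: targetAccel, weights: []float32{2, 3}}
	l.syncToAccel()
	l.syncToAccel() // no-op: the handle already exists
	out, accel := dispatchLayer(l, []float32{1, 1, 1})
	fmt.Println(out, accel, l.compiles)
}
```

Keeping compilation out of the dispatch function means the steady-state forward call does no setup work, which is the point of the init-once row above.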
Lucy [9] — Intel NPU bridge suite
| Item | Detail |
|---|---|
| Menu | [9] → [4] medium or [5] full matrix |
| Tables | Timing (Loom / Intel CPU / Intel NPU, speedup) + seven-style drift spectrum |
| Log | lucy_testing_output/nine_layer.txt |
| Proof | 90 cells: Intel infer 💎 EXACT repeat-forward; Conv2D large ~22× NPU vs Loom |
Documentation
| File | Contents |
|---|---|
| accelerators.md | User/developer guide: Intel now, Qualcomm + Google planned |
| chaosglue npu/docs/2025-06-26-loom-dispatch-integration-assessment.md | Full benchmark evidence |
What this release is (and is not)
You now have:
- A real dispatch hook — not a standalone bench binary
- Intel CPU + NPU on Linux with documented env + Lucy validation
- A plugin model ready for Qualcomm NPU and Google TPU (same ABI, new .so)
- Experimental label, appropriate for a first wild release
You do not yet claim:
- End-user “turn on NPU” without code (ExecTarget is manual)
- JSON network field for exec: intel-npu
- Training or backward on vendor path
- Bit-perfect Loom ↔ Intel parity on all layers
- Windows or macOS Intel plugin builds
- Qualcomm or Google plugins (roadmap only)
Quick start (developers)
# 1. Build Intel CABI (chaosglue)
cd ~/git/chaosglue/npu/intel/cabi && ./build.sh
# 2. OpenVINO + NPU environment
source ~/git/chaosglue/npu/intel/example/setup_env.sh
export LOOM_ACCEL_INTEL_SO=~/git/chaosglue/npu/intel/cabi/build/libloom_accel_intel.so
# 3. Run Lucy validation
cd ~/git/chaosglue/loom/lucy
CGO_ENABLED=1 go run .
# At the Lucy menu: 9 → 4 (Intel NPU bridge suite, medium matrix)
Or: ./run_npu_bridge.sh from lucy/.
Future vendors (planned)
| Vendor | Plugin (planned) | SDK / hardware |
|---|---|---|
| Intel | libloom_accel_intel.so | ✅ OpenVINO, Core Ultra NPU |
| Qualcomm | libloom_accel_qcom.so | QNN / Hexagon, Snapdragon X |
| Google | libloom_accel_google.so | TPU / PJRT (cloud + edge TBD) |
Loom code path is identical: DiscoverAccel → ExecTarget → SyncToAccel → ForwardPolymorphic.
Next targets (v0.82+)
- AccelPlanner: auto-select CPU vs Intel CPU vs Intel NPU from shape + layer type
- JSON exec field: "intel-npu" per layer in network JSON
- Parity: MatMul bias/layout, norm weight upload, shared INT8 quant
- Qualcomm CABI stub in chaosglue npu/qualcomm/
- ASM rollout (continues from v0.80 roadmap): Dense backward, SwiGLU, MHA
Key source files
| Area | Files |
|---|---|
| Accel package | poly/accel/*.go |
| Intel dispatch | poly/accel_intel.go, poly/forward.go |
| Types | poly/poly.go (ExecTarget, AccelBinding, net.Accel) |
| Lucy suite | lucy/examples/nine_layer/ |
| CABI | chaosglue npu/include/loom_accel.h, npu/intel/cabi/ |
See also
- accelerators.md: full developer guide
- v080_release.md: previous release (ENTITY + WebGPU)
- dispatch.md: DispatchLayer hub
- gpu.md: WebGPU (complementary backend)