v0.81.0 — Accelerator Bridge (Intel NPU + vendor plugin model)

Release: 0.80.0 "Native Ship" → 0.81.0 "Accelerator Bridge"
Checklist: 112 / 146 (76.7%) on adjustments — Intel forward dispatch advances Accelerators & Distributed (experimental)

First public vendor accelerator path: Loom forwards individual layers through poly/accel into chaosglue-built plugins, starting with Intel OpenVINO CPU + NPU on Linux.


What shipped

poly/accel — vendor-neutral plugin loader

| Item | Detail |
| --- | --- |
| Package | poly/accel/ (Discover, Registry, Plugin, CompiledLayer) |
| C ABI | loom_accel.h in chaosglue (Loom does not vendor OpenVINO) |
| Linux | dlopen via CGO (CGO_ENABLED=1) |
| Intel plugin | libloom_accel_intel.so, built from chaosglue/npu/intel/cabi/ |
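
As a rough illustration of the discovery step, the sketch below resolves a plugin path from an environment variable before anything is loaded. Only the variable name LOOM_ACCEL_INTEL_SO comes from this release; the `pluginEnv` map and the `resolvePlugin` helper are hypothetical and are not the poly/accel API. The actual dlopen call is left out so the example runs without CGO.

```go
package main

import (
	"fmt"
	"os"
)

// pluginEnv maps a vendor name to the environment variable that holds its
// plugin path. Only LOOM_ACCEL_INTEL_SO is documented; the map is an assumption.
var pluginEnv = map[string]string{
	"intel": "LOOM_ACCEL_INTEL_SO",
}

// resolvePlugin is a hypothetical helper. It returns the plugin path for a
// vendor, or an error if the vendor is unknown, the variable is unset, or the
// file does not exist. getenv is injected so the lookup can be tested.
func resolvePlugin(vendor string, getenv func(string) string) (string, error) {
	key, ok := pluginEnv[vendor]
	if !ok {
		return "", fmt.Errorf("no plugin registered for vendor %q", vendor)
	}
	path := getenv(key)
	if path == "" {
		return "", fmt.Errorf("%s is not set", key)
	}
	if _, err := os.Stat(path); err != nil {
		return "", fmt.Errorf("plugin %s: %w", path, err)
	}
	return path, nil
}

func main() {
	path, err := resolvePlugin("intel", os.Getenv)
	if err != nil {
		// Fallback behaviour here belongs to this sketch, not to Loom.
		fmt.Println("accel unavailable:", err)
		return
	}
	fmt.Println("would dlopen:", path)
}
```

Checking the path up front keeps a missing or misnamed .so from surfacing later as an opaque loader error.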

Dispatch integration

| Item | Detail |
| --- | --- |
| accel_intel.go | DiscoverAccel, SyncToAccel, DispatchAccelForward, weight → FP32 bytes |
| forward.go | DispatchLayer calls accel when layer.ExecTarget.UseAccel() |
| VolumetricLayer | ExecTarget, AccelBinding fields |
| Init-once | SyncToAccel(sizeLabel) compiles + uploads weights; steady infer reuses handle |
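
The init-once pattern and the dispatch check can be sketched roughly as follows. Every type here (`ExecTarget`, `AccelBinding`, `Layer`, `fakePlugin`) is a simplified stand-in that borrows names from the table above; none of the signatures are taken from the real poly package, and the "accelerated" path just reuses the same toy arithmetic in place of a plugin call.

```go
package main

import "fmt"

// ExecTarget is a stand-in for the per-layer execution choice.
type ExecTarget int

const (
	ExecLoom ExecTarget = iota
	ExecIntelNPU
)

// UseAccel reports whether the layer should go through a vendor plugin.
func (t ExecTarget) UseAccel() bool { return t != ExecLoom }

// AccelBinding holds the handle produced by the one-time sync.
type AccelBinding struct {
	Handle   int
	Compiled bool
}

// Layer is a toy layer; Weights is assumed to be non-empty.
type Layer struct {
	Weights    []float32
	ExecTarget ExecTarget
	Accel      AccelBinding
}

// fakePlugin counts compile calls so reuse of the handle can be observed.
type fakePlugin struct{ compiles int }

func (p *fakePlugin) compile(weights []float32) int {
	p.compiles++
	return p.compiles
}

// syncToAccel compiles and uploads once; later calls are no-ops.
func syncToAccel(p *fakePlugin, l *Layer) {
	if l.Accel.Compiled {
		return
	}
	l.Accel = AccelBinding{Handle: p.compile(l.Weights), Compiled: true}
}

// scale is the toy forward pass shared by both paths.
func scale(x, w []float32) []float32 {
	out := make([]float32, len(x))
	for i := range x {
		out[i] = x[i] * w[i%len(w)]
	}
	return out
}

// dispatchLayer takes the accel path only when requested and already synced.
func dispatchLayer(l *Layer, x []float32) ([]float32, string) {
	if l.ExecTarget.UseAccel() && l.Accel.Compiled {
		return scale(x, l.Weights), "accel" // real code would invoke the plugin handle
	}
	return scale(x, l.Weights), "loom"
}

func main() {
	plugin := &fakePlugin{}
	layer := &Layer{Weights: []float32{0.5, 2}, ExecTarget: ExecIntelNPU} // set manually
	syncToAccel(plugin, layer)
	for i := 0; i < 3; i++ {
		_, path := dispatchLayer(layer, []float32{1, 2, 3, 4})
		fmt.Println("forward", i, "via", path)
	}
	fmt.Println("compiles:", plugin.compiles)
}
```

Keeping compilation out of the forward call means the steady-state cost is only the dispatch check plus the inference itself.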

Lucy [9] — Intel NPU bridge suite

| Item | Detail |
| --- | --- |
| Menu | [9] → [4] medium or [5] full matrix |
| Tables | Timing (Loom / Intel CPU / Intel NPU, speedup) + seven-style drift spectrum |
| Log | lucy_testing_output/nine_layer.txt |
| Proof | 90 cells: Intel infer 💎 EXACT repeat-forward; Conv2D large ~22× NPU vs Loom |
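
For readers who want to reproduce the kind of numbers in these tables, here is a minimal sketch of the three quantities involved: a speedup ratio, a bit-exact repeat-forward check, and a maximum absolute drift against a reference output. The function names are hypothetical and the values in `main` are illustrative only; they are not taken from the Lucy log, and Lucy's own drift classification may differ.

```go
package main

import (
	"fmt"
	"math"
	"time"
)

// speedup returns how many times faster the accelerated run is than the baseline.
func speedup(baseline, accel time.Duration) float64 {
	return float64(baseline) / float64(accel)
}

// bitExact reports whether two outputs match bit for bit.
func bitExact(a, b []float32) bool {
	if len(a) != len(b) {
		return false
	}
	for i := range a {
		if math.Float32bits(a[i]) != math.Float32bits(b[i]) {
			return false
		}
	}
	return true
}

// maxAbsDrift returns the largest absolute element-wise difference,
// or +Inf when the lengths differ.
func maxAbsDrift(a, b []float32) float64 {
	if len(a) != len(b) {
		return math.Inf(1)
	}
	worst := 0.0
	for i := range a {
		if d := math.Abs(float64(a[i]) - float64(b[i])); d > worst {
			worst = d
		}
	}
	return worst
}

func main() {
	// Illustrative numbers only, not measurements.
	loom, npu := 10*time.Millisecond, 2*time.Millisecond
	fmt.Printf("speedup: %.1fx\n", speedup(loom, npu))

	first := []float32{0.25, -1.5, 3}
	second := []float32{0.25, -1.5, 3}
	fmt.Println("repeat-forward exact:", bitExact(first, second))
	fmt.Println("max drift vs reference:", maxAbsDrift(first, []float32{0.25, -1.5, 3.5}))
}
```

Comparing raw float bits rather than using a tolerance is what separates an exact repeat-forward result from one that is merely close.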

Documentation

| File | Contents |
| --- | --- |
| accelerators.md | User/developer guide: Intel now, Qualcomm + Google planned |
| chaosglue npu/docs/2025-06-26-loom-dispatch-integration-assessment.md | Full benchmark evidence |

What this release is (and is not)

You now have:

  • A real dispatch hook — not a standalone bench binary
  • Intel CPU + NPU on Linux with documented env + Lucy validation
  • A plugin model ready for Qualcomm NPU and Google TPU (same ABI, new .so)
  • Experimental label — appropriate for first wild release

You do not yet have:

  • End-user “turn on NPU” without code (ExecTarget is manual)
  • JSON network field for exec: intel-npu
  • Training or backward on vendor path
  • Bit-perfect Loom ↔ Intel parity on all layers
  • Windows or macOS Intel plugin builds
  • Qualcomm or Google plugins (roadmap only)

Quick start (developers)

# 1. Build Intel CABI (chaosglue)
cd ~/git/chaosglue/npu/intel/cabi && ./build.sh

# 2. OpenVINO + NPU environment
source ~/git/chaosglue/npu/intel/example/setup_env.sh
export LOOM_ACCEL_INTEL_SO=~/git/chaosglue/npu/intel/cabi/build/libloom_accel_intel.so

# 3. Run Lucy validation
cd ~/git/chaosglue/loom/lucy
CGO_ENABLED=1 go run .
# At the Lucy menu, choose 9 (Intel NPU bridge suite), then 4 (medium matrix)

Or: ./run_npu_bridge.sh from lucy/.


Future vendors (planned)

| Vendor | Plugin (planned) | SDK / hardware |
| --- | --- | --- |
| Intel | libloom_accel_intel.so ✅ | OpenVINO, Core Ultra NPU |
| Qualcomm | libloom_accel_qcom.so | QNN / Hexagon, Snapdragon X |
| Google | libloom_accel_google.so | TPU / PJRT (cloud + edge TBD) |

The Loom code path is identical across vendors: DiscoverAccel → ExecTarget → SyncToAccel → ForwardPolymorphic.


Next targets (v0.82+)

  • AccelPlanner — auto-select CPU vs Intel CPU vs Intel NPU from shape + layer type
  • JSON exec field: "intel-npu" per layer in network JSON
  • Parity — MatMul bias/layout, norm weight upload, shared INT8 quant
  • Qualcomm CABI stub in chaosglue npu/qualcomm/
  • ASM rollout (continues from v0.80 roadmap) — Dense backward, SwiGLU, MHA
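
To make the planned JSON exec field concrete, the sketch below shows one way a per-layer value could be parsed and validated. This is speculative: the feature has not shipped, only the value "intel-npu" appears in these notes, and the `layerSpec` shape, the `type` key, and the "loom" default are all assumptions made for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// layerSpec is a hypothetical per-layer JSON shape. Only the exec value
// "intel-npu" comes from the release notes; the rest is assumed.
type layerSpec struct {
	Type string `json:"type"`
	Exec string `json:"exec,omitempty"`
}

// parseExec returns the requested execution target for one layer.
// An absent field falls back to "loom" (an assumed name for the native path).
func parseExec(raw []byte) (string, error) {
	var spec layerSpec
	if err := json.Unmarshal(raw, &spec); err != nil {
		return "", err
	}
	switch spec.Exec {
	case "":
		return "loom", nil
	case "intel-npu":
		return spec.Exec, nil
	default:
		return "", fmt.Errorf("unknown exec target %q", spec.Exec)
	}
}

func main() {
	for _, doc := range []string{
		`{"type":"conv2d","exec":"intel-npu"}`,
		`{"type":"dense"}`,
		`{"type":"dense","exec":"warp-drive"}`,
	} {
		target, err := parseExec([]byte(doc))
		fmt.Println(doc, "=>", target, err)
	}
}
```

Rejecting unknown values at load time, rather than silently falling back, would make a typo in a network file visible before any layer runs.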

Key source files

| Area | Files |
| --- | --- |
| Accel package | poly/accel/*.go |
| Intel dispatch | poly/accel_intel.go, poly/forward.go |
| Types | poly/poly.go (ExecTarget, AccelBinding, net.Accel) |
| Lucy suite | lucy/examples/nine_layer/ |
| CABI | chaosglue npu/include/loom_accel.h, npu/intel/cabi/ |

See also