Anulum Institute
v3.12.0 · Open Source + Commercial

Director Class AI
Real-time LLM Hallucination Guardrail

Stop hallucinations before they reach your users. Token-level streaming halt, NLI fact-checking, prompt injection detection. Production-tested with 4,310+ tests and 12 Rust-accelerated compute functions.

75.8% balanced accuracy
14.6 ms per claim (GPU)
4,310+ tests
9.4× Rust speedup
12 SDK integrations

The problem

LLMs hallucinate. Your users trust them anyway. One wrong medical dosage. One fabricated legal citation. One invented financial figure. By the time a human reviewer catches it, the damage is done. Generic output filters catch obvious toxicity but miss subtle factual errors — the kind that sound perfectly plausible. Director-AI intercepts the stream before it reaches your users, scores every claim against your knowledge base, and halts generation the moment coherence degrades.

How it works

LLM Output → Claim Extraction → NLI Scoring (FactCG 0.4B) → RAG Fact-Check (your knowledge base) → Dual Entropy (confidence + divergence) → ■ Halt stream / ✓ Pass
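The pipeline above can be sketched as a short chain of checks. Everything in this sketch (function names, the word-overlap stand-in for NLI, the 0.5 threshold) is hypothetical and illustrative, not the actual Director-AI API:

```python
# Illustrative sketch of the guardrail pipeline above; every name and
# threshold here is hypothetical, not the Director-AI API.

def tokens(text: str) -> set[str]:
    return {w.strip(".,").lower() for w in text.split()}

def extract_claims(text: str) -> list[str]:
    # Naive claim extraction: one claim per sentence.
    return [s.strip() for s in text.split(".") if s.strip()]

def nli_score(claim: str, evidence: str) -> float:
    # Stand-in for FactCG entailment scoring: crude word overlap.
    c = tokens(claim)
    return len(c & tokens(evidence)) / max(len(c), 1)

def guard(output: str, knowledge_base: list[str], threshold: float = 0.5):
    for claim in extract_claims(output):
        if max(nli_score(claim, doc) for doc in knowledge_base) < threshold:
            return ("halt", claim)   # ■ sever the stream
    return ("pass", None)            # ✓ deliver to the user
```

With a one-sentence knowledge base, a supported claim passes while a fabricated one trips the halt and reports the offending claim.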

Core features

Token-level streaming halt
Severs LLM output mid-generation when coherence degrades. Not a post-hoc filter — a real-time guardrail that stops hallucinations before they reach the user.
Dual-entropy scoring
NLI contradiction detection (FactCG-DeBERTa, 0.4B params) combined with RAG fact-checking against your knowledge base. Two independent signals, one confidence score.
Injection detection
Intent-grounded, two-stage prompt injection detection: fast regex pre-filter + bidirectional NLI semantic analysis. 25 adversarial attack patterns tested.
Structured output verification
JSON schema validation, numeric consistency checking, reasoning chain verification, temporal freshness scoring. All stdlib-only, zero dependencies.
12 Rust accelerators
Performance-critical functions compiled to native via backfire-kernel (PyO3 FFI). Sanitiser 27×, temporal freshness 21×, confidence scoring 33× faster than pure Python.
EU AI Act compliance
Audit trails, adversarial robustness testing, domain presets (medical/finance/legal/creative), drift detection, and feedback loops. Built for regulated industries.
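The token-level halt described above can be imitated by wrapping any streaming token generator. The `coherence` signal and 0.4 cutoff below are toy stand-ins for the real NLI + entropy scoring, purely to show the control flow:

```python
# Hypothetical sketch of a mid-generation halt wrapper.
from typing import Iterable, Iterator

def coherence(text: str) -> float:
    # Toy stand-in signal; the real system uses NLI + dual entropy.
    return 0.0 if "unicorn" in text else 1.0

def guarded_stream(tokens: Iterable[str], cutoff: float = 0.4) -> Iterator[str]:
    seen = ""
    for tok in tokens:
        seen += tok
        if coherence(seen) < cutoff:
            yield "[halted]"   # sever output mid-generation, not post hoc
            return
        yield tok

out = list(guarded_stream(["The ", "Earth ", "orbits ", "a unicorn"]))
# the stream stops at the failing token instead of delivering it
```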

Integrations

Drop-in guards for every major LLM provider and framework. Zero code changes with the REST proxy.

LLM providers (SDK guards)

OpenAI · Anthropic (Claude) · AWS Bedrock · Google Gemini · Cohere

Frameworks

LangChain · LlamaIndex · LangGraph · Haystack · CrewAI · DSPy · Semantic Kernel

Deployment

FastAPI middleware · REST/gRPC proxy · Docker (CPU/GPU) · Kubernetes Helm · Voice AI (ElevenLabs/Deepgram)
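The "zero code changes" claim for the REST proxy amounts to repointing your client's base URL at the proxy, which forwards everything else unchanged. A stdlib sketch of the only change an application makes (the localhost port matches the quick start's `director-ai serve` example; headers and body are illustrative):

```python
# Hypothetical: route an existing OpenAI-style HTTP call through the proxy.
# Only the base URL changes; the request is otherwise identical.
import json
import urllib.request

PROXY = "http://localhost:8000/v1"   # Director-AI proxy from the quick start

body = json.dumps({
    "model": "gpt-4o-mini",
    "messages": [{"role": "user", "content": "How old is the Earth?"}],
}).encode()

req = urllib.request.Request(
    f"{PROXY}/chat/completions",
    data=body,
    headers={"Authorization": "Bearer sk-...",  # passed through to upstream
             "Content-Type": "application/json"},
)
# urllib.request.urlopen(req) would send it once the proxy is running.
```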

Benchmarks

Accuracy (LLM-AggreFact, 29,320 samples)

Scorer                  | Params | Balanced accuracy | Latency (GPU)
FactCG-DeBERTa          | 0.4B   | 75.8%             | 14.6 ms/pair
MiniCheck-Flan-T5-L     | 0.8B   | 77.4%             | ~40 ms/pair
Heuristic-only (no NLI) | 0      | ~55%              | <0.5 ms
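Balanced accuracy, the metric in the table above, is the mean of per-class recall, which matters on imbalanced claim sets: a scorer that blindly accepts every supported claim but coin-flips on unsupported ones gets no credit beyond its real discrimination. A quick sketch with an invented confusion matrix (the numbers are illustrative, not LLM-AggreFact results):

```python
# Balanced accuracy = (TPR + TNR) / 2, i.e. the mean of per-class recall.
def balanced_accuracy(tp: int, fn: int, tn: int, fp: int) -> float:
    tpr = tp / (tp + fn)   # recall on supported claims
    tnr = tn / (tn + fp)   # recall on unsupported claims
    return (tpr + tnr) / 2

# Accepts all supported claims, coin-flips on unsupported ones:
print(balanced_accuracy(tp=100, fn=0, tn=50, fp=50))  # 0.75
```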

Latency (p99, 16-pair batch)

Hardware           | Backend   | Latency
NVIDIA GTX 1060    | ONNX CUDA | 17.9 ms/pair
AMD RX 6600 XT     | ROCm      | 80.1 ms/pair
AMD EPYC 9575F     | CPU       | 118.9 ms/pair
Intel Xeon E5-2640 | CPU       | 207.3 ms/pair

Rust acceleration (backfire-kernel, 5000 iterations)

Function            | Python | Rust   | Speedup
sanitiser_score     | 57 µs  | 2.1 µs | 27×
probs_to_confidence | 486 µs | 15 µs  | 33×
temporal_freshness  | 53 µs  | 2.5 µs | 21×
lite_score          | 47 µs  | 26 µs  | 1.8×
Geometric mean (12 functions)        | 9.4×
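The headline 9.4× is a geometric mean over all 12 accelerated functions; only four appear in the table, so the figure cannot be reproduced from it. The aggregation itself is simply:

```python
import math

def geometric_mean(speedups: list[float]) -> float:
    # nth root of the product: the standard way to average ratios.
    return math.prod(speedups) ** (1 / len(speedups))

# The four speedups listed above; the remaining eight are not shown,
# so this does not reproduce the published 9.4× figure.
print(round(geometric_mean([27.0, 33.0, 21.0, 1.8]), 1))  # 13.5
```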

Quick start

# Install
pip install director-ai[all]

# Score a claim against a source
from director_ai import score
result = score("The Earth is 4.5 billion years old",
               "The Earth formed approximately 4.54 billion years ago.")
print(result)  # GuardResult(score=0.94, passed=True)

# Or run as a REST proxy (zero code changes to your app)
director-ai serve --port 8000 --upstream https://api.openai.com/v1

NLI models

FactCG-DeBERTa-v3-Large
Default scorer. 0.4B params, MIT licensed. Best speed/accuracy trade-off. ONNX + TensorRT GPU acceleration paths available.
MiniCheck-Flan-T5-L
0.8B params. Higher accuracy (77.4%) at ~3× latency cost. Best for offline batch verification.
MiniCheck-DeBERTa-L
0.4B params. Alternative DeBERTa backbone with different NLI training data.
Gemma 4 E4B (LLM-as-judge)
LLM-based scoring for complex claims. Highest accuracy but sends data to external provider. Off by default.
Heuristic-only (Lite)
Zero-dependency scorer using word overlap, numeric consistency, and structural checks. <0.5 ms. ~55% accuracy. CPU-only fallback.
Rust backend (backfire)
Native compiled compute via backfire-kernel. 12 accelerated functions. No Python GIL. No CUDA dependency for basic scoring.
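The Lite scorer's numeric-consistency check is easy to picture: a number asserted by a claim should also appear in the source. This regex sketch is a guess at the idea, not the shipped implementation:

```python
# Hypothetical sketch of a numeric-consistency check, in the spirit of
# the Lite scorer; not the actual director-ai implementation.
import re

NUM = re.compile(r"-?\d+(?:\.\d+)?")

def numbers_consistent(claim: str, source: str) -> bool:
    # Every number the claim asserts must literally appear in the source.
    return set(NUM.findall(claim)) <= set(NUM.findall(source))

src = "The Earth formed approximately 4.54 billion years ago."
assert numbers_consistent("Earth is 4.54 billion years old", src)
assert not numbers_consistent("Earth is 6.54 billion years old", src)
```

A real implementation would also need unit normalization and tolerance for rounding; exact string matching is the simplest possible version.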

Domain presets

Medical
Strict thresholds. Dosage verification. Citation requirements. HIPAA-aware logging.
Finance
Numeric precision. Temporal freshness. Market data validation. FINMA-compatible audit trails.
Legal
Citation verification. Precedent checking. Jurisdiction awareness. Privilege-safe logging.
Creative
Relaxed thresholds. Factual claims still checked but creative expression permitted.
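One plausible shape for these presets is a per-domain configuration table whose strict domains demand a higher score before output passes. The preset names mirror the list above, but the fields and threshold values here are invented for illustration:

```python
# Hypothetical preset table; field names and thresholds are invented,
# only the domain names come from the documentation above.
PRESETS = {
    "medical":  {"halt_threshold": 0.85, "require_citations": True},
    "finance":  {"halt_threshold": 0.80, "check_numeric": True},
    "legal":    {"halt_threshold": 0.80, "verify_citations": True},
    "creative": {"halt_threshold": 0.40, "require_citations": False},
}

def threshold_for(domain: str) -> float:
    # Stricter domains require a higher score to pass (halt earlier).
    return PRESETS[domain]["halt_threshold"]

assert threshold_for("medical") > threshold_for("creative")
```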

Licensing

Open Source
Free
AGPL-3.0-or-later. Use freely for research, personal projects, and open-source applications.
  • Full feature set
  • All NLI models
  • Rust accelerators
  • Community support
  • Copyleft: derivatives must be open-source
pip install director-ai
Commercial
Contact us
Proprietary license. Removes copyleft obligation for closed-source and SaaS deployments.
  • Full feature set
  • Closed-source permitted
  • SaaS deployment permitted
  • Priority support
  • Custom model fine-tuning
  • On-premise deployment assistance
Request commercial license

Architecture at a glance

136 Python files
32 top-level modules. Modular, testable, documented.
7 Rust crates
backfire-core, FFI, observers, physics, SSGF, types, WASM.
17+ CLI commands
serve, proxy, bench, tune, finetune, batch, review, adversarial-test, doctor...
217 test files
4,310+ test functions. 90% coverage enforced. CI on every push.
Python ≥3.11
Tested on 3.11, 3.12, 3.13. Minimal core dependencies (numpy + requests only).
23 optional extras
Install only what you need: NLI, vector DBs, server, SDKs, voice, enterprise, ONNX.