Anulum Institute
v3.12.0 · Open Source + Commercial

Director Class AI
Real-time LLM Hallucination Guardrail

Stop hallucinations before they reach your users. Token-level streaming halt, NLI fact-checking, prompt injection detection. Production-tested with 4,310+ tests and 12 Rust-accelerated compute functions.

75.8% balanced accuracy
14.6 ms per claim (GPU)
4,310+ tests
9.4× Rust speedup
12 SDK integrations

The problem

LLMs hallucinate. Your users trust them anyway. One wrong medical dosage. One fabricated legal citation. One invented financial figure. By the time a human reviewer catches it, the damage is done. Generic output filters catch obvious toxicity but miss subtle factual errors — the kind that sound perfectly plausible. Director-AI intercepts the stream before it reaches your users, scores every claim against your knowledge base, and halts generation the moment coherence degrades.

How it works

LLM Output → Claim Extraction → NLI Scoring (FactCG 0.4B) → RAG Fact-Check (your knowledge base) → Dual Entropy (confidence + divergence) → ■ Halt stream / ✓ Pass
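The pipeline above can be sketched as a short chain of checks. Everything in this sketch (function names, the word-overlap stand-in for NLI, the 0.5 threshold) is hypothetical and illustrative, not the actual Director-AI API:

```python
# Illustrative sketch of the guardrail pipeline above; every name and
# threshold here is hypothetical, not the Director-AI API.

def tokens(text: str) -> set[str]:
    return {w.strip(".,").lower() for w in text.split()}

def extract_claims(text: str) -> list[str]:
    # Naive claim extraction: one claim per sentence.
    return [s.strip() for s in text.split(".") if s.strip()]

def nli_score(claim: str, evidence: str) -> float:
    # Stand-in for FactCG entailment scoring: crude word overlap.
    c = tokens(claim)
    return len(c & tokens(evidence)) / max(len(c), 1)

def guard(output: str, knowledge_base: list[str], threshold: float = 0.5):
    for claim in extract_claims(output):
        if max(nli_score(claim, doc) for doc in knowledge_base) < threshold:
            return ("halt", claim)   # ■ sever the stream
    return ("pass", None)            # ✓ deliver to the user
```

With a one-sentence knowledge base, a supported claim passes while a fabricated one trips the halt and reports the offending claim.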

Core features

Token-level streaming halt
Severs LLM output mid-generation when coherence degrades. Not a post-hoc filter — a real-time guardrail that stops hallucinations before they reach the user.
Dual-entropy scoring
NLI contradiction detection (FactCG-DeBERTa, 0.4B params) combined with RAG fact-checking against your knowledge base. Two independent signals, one confidence score.
Injection detection
Intent-grounded, two-stage prompt injection detection: fast regex pre-filter + bidirectional NLI semantic analysis. 25 adversarial attack patterns tested.
Structured output verification
JSON schema validation, numeric consistency checking, reasoning chain verification, temporal freshness scoring. All stdlib-only, zero dependencies.
12 Rust accelerators
Performance-critical functions compiled to native via backfire-kernel (PyO3 FFI). Sanitiser 27×, temporal freshness 21×, confidence scoring 33× faster than pure Python.
EU AI Act compliance
Audit trails, adversarial robustness testing, domain presets (medical/finance/legal/creative), drift detection, and feedback loops. Built for regulated industries.
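The token-level halt described above can be imitated by wrapping any streaming token generator. The `coherence` signal and 0.4 cutoff below are toy stand-ins for the real NLI + entropy scoring, purely to show the control flow:

```python
# Hypothetical sketch of a mid-generation halt wrapper.
from typing import Iterable, Iterator

def coherence(text: str) -> float:
    # Toy stand-in signal; the real system uses NLI + dual entropy.
    return 0.0 if "unicorn" in text else 1.0

def guarded_stream(tokens: Iterable[str], cutoff: float = 0.4) -> Iterator[str]:
    seen = ""
    for tok in tokens:
        seen += tok
        if coherence(seen) < cutoff:
            yield "[halted]"   # sever output mid-generation, not post hoc
            return
        yield tok

out = list(guarded_stream(["The ", "Earth ", "orbits ", "a unicorn"]))
# the stream stops at the failing token instead of delivering it
```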

Integrations

Drop-in guards for every major LLM provider and framework. Zero code changes with the REST proxy.

LLM providers (SDK guards)

OpenAI · Anthropic (Claude) · AWS Bedrock · Google Gemini · Cohere

Frameworks

LangChain · LlamaIndex · LangGraph · Haystack · CrewAI · DSPy · Semantic Kernel

Deployment

FastAPI middleware · REST/gRPC proxy · Docker (CPU/GPU) · Kubernetes Helm · Voice AI (ElevenLabs/Deepgram)
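The "zero code changes" claim for the REST proxy amounts to repointing your client's base URL at the proxy, which forwards everything else unchanged. A stdlib sketch of the only change an application makes (the localhost port matches the quick start's `director-ai serve` example; headers and body are illustrative):

```python
# Hypothetical: route an existing OpenAI-style HTTP call through the proxy.
# Only the base URL changes; the request is otherwise identical.
import json
import urllib.request

PROXY = "http://localhost:8000/v1"   # Director-AI proxy from the quick start

body = json.dumps({
    "model": "gpt-4o-mini",
    "messages": [{"role": "user", "content": "How old is the Earth?"}],
}).encode()

req = urllib.request.Request(
    f"{PROXY}/chat/completions",
    data=body,
    headers={"Authorization": "Bearer sk-...",  # passed through to upstream
             "Content-Type": "application/json"},
)
# urllib.request.urlopen(req) would send it once the proxy is running.
```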

Benchmarks

Accuracy (LLM-AggreFact, 29,320 samples)

Scorer                  | Params | Balanced accuracy | Latency (GPU)
FactCG-DeBERTa          | 0.4B   | 75.8%             | 14.6 ms/pair
MiniCheck-Flan-T5-L     | 0.8B   | 77.4%             | ~40 ms/pair
Heuristic-only (no NLI) | 0      | ~55%              | <0.5 ms
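Balanced accuracy, the metric in the table above, is the mean of per-class recall, which matters on imbalanced claim sets: a scorer that blindly accepts every supported claim but coin-flips on unsupported ones gets no credit beyond its real discrimination. A quick sketch with an invented confusion matrix (the numbers are illustrative, not LLM-AggreFact results):

```python
# Balanced accuracy = (TPR + TNR) / 2, i.e. the mean of per-class recall.
def balanced_accuracy(tp: int, fn: int, tn: int, fp: int) -> float:
    tpr = tp / (tp + fn)   # recall on supported claims
    tnr = tn / (tn + fp)   # recall on unsupported claims
    return (tpr + tnr) / 2

# Accepts all supported claims, coin-flips on unsupported ones:
print(balanced_accuracy(tp=100, fn=0, tn=50, fp=50))  # 0.75
```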

Latency (p99, 16-pair batch)

Hardware           | Backend   | Latency
NVIDIA GTX 1060    | ONNX CUDA | 17.9 ms/pair
AMD RX 6600 XT     | ROCm      | 80.1 ms/pair
AMD EPYC 9575F     | CPU       | 118.9 ms/pair
Intel Xeon E5-2640 | CPU       | 207.3 ms/pair

Rust acceleration (backfire-kernel, 5000 iterations)

Function            | Python | Rust   | Speedup
sanitiser_score     | 57 µs  | 2.1 µs | 27×
probs_to_confidence | 486 µs | 15 µs  | 33×
temporal_freshness  | 53 µs  | 2.5 µs | 21×
lite_score          | 47 µs  | 26 µs  | 1.8×
Geometric mean (12 functions)        | 9.4×
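The headline 9.4× is a geometric mean over all 12 accelerated functions; only four appear in the table, so the figure cannot be reproduced from it. The aggregation itself is simply:

```python
import math

def geometric_mean(speedups: list[float]) -> float:
    # nth root of the product: the standard way to average ratios.
    return math.prod(speedups) ** (1 / len(speedups))

# The four speedups listed above; the remaining eight are not shown,
# so this does not reproduce the published 9.4× figure.
print(round(geometric_mean([27.0, 33.0, 21.0, 1.8]), 1))  # 13.5
```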

Quick start

# Install
pip install director-ai[all]

# Score a claim against a source
from director_ai import score
result = score("The Earth is 4.5 billion years old",
               "The Earth formed approximately 4.54 billion years ago.")
print(result)  # GuardResult(score=0.94, passed=True)

# Or run as a REST proxy (zero code changes to your app)
director-ai serve --port 8000 --upstream https://api.openai.com/v1

NLI models

FactCG-DeBERTa-v3-Large
Default scorer. 0.4B params, MIT licensed. Best speed/accuracy trade-off. ONNX + TensorRT GPU acceleration paths available.
MiniCheck-Flan-T5-L
0.8B params. Higher accuracy (77.4%) at ~3× latency cost. Best for offline batch verification.
MiniCheck-DeBERTa-L
0.4B params. Alternative DeBERTa backbone with different NLI training data.
Gemma 4 E4B (LLM-as-judge)
LLM-based scoring for complex claims. Highest accuracy but sends data to external provider. Off by default.
Heuristic-only (Lite)
Zero-dependency scorer using word overlap, numeric consistency, and structural checks. <0.5 ms. ~55% accuracy. CPU-only fallback.
Rust backend (backfire)
Native compiled compute via backfire-kernel. 12 accelerated functions. No Python GIL. No CUDA dependency for basic scoring.
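The Lite scorer's numeric-consistency check is easy to picture: a number asserted by a claim should also appear in the source. This regex sketch is a guess at the idea, not the shipped implementation:

```python
# Hypothetical sketch of a numeric-consistency check, in the spirit of
# the Lite scorer; not the actual director-ai implementation.
import re

NUM = re.compile(r"-?\d+(?:\.\d+)?")

def numbers_consistent(claim: str, source: str) -> bool:
    # Every number the claim asserts must literally appear in the source.
    return set(NUM.findall(claim)) <= set(NUM.findall(source))

src = "The Earth formed approximately 4.54 billion years ago."
assert numbers_consistent("Earth is 4.54 billion years old", src)
assert not numbers_consistent("Earth is 6.54 billion years old", src)
```

A real implementation would also need unit normalization and tolerance for rounding; exact string matching is the simplest possible version.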

Domain presets

Medical
Strict thresholds. Dosage verification. Citation requirements. HIPAA-aware logging.
Finance
Numeric precision. Temporal freshness. Market data validation. FINMA-compatible audit trails.
Legal
Citation verification. Precedent checking. Jurisdiction awareness. Privilege-safe logging.
Creative
Relaxed thresholds. Factual claims still checked but creative expression permitted.
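One plausible shape for these presets is a per-domain configuration table whose strict domains demand a higher score before output passes. The preset names mirror the list above, but the fields and threshold values here are invented for illustration:

```python
# Hypothetical preset table; field names and thresholds are invented,
# only the domain names come from the documentation above.
PRESETS = {
    "medical":  {"halt_threshold": 0.85, "require_citations": True},
    "finance":  {"halt_threshold": 0.80, "check_numeric": True},
    "legal":    {"halt_threshold": 0.80, "verify_citations": True},
    "creative": {"halt_threshold": 0.40, "require_citations": False},
}

def threshold_for(domain: str) -> float:
    # Stricter domains require a higher score to pass (halt earlier).
    return PRESETS[domain]["halt_threshold"]

assert threshold_for("medical") > threshold_for("creative")
```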

Licensing

Open Source
Free
AGPL-3.0-or-later. Use freely for research, personal projects, and open-source applications.
  • Full feature set
  • All NLI models
  • Rust accelerators
  • Community support
  • Copyleft: derivatives must be open-source
pip install director-ai
Commercial
Contact us
Proprietary license. Removes copyleft obligation for closed-source and SaaS deployments.
  • Full feature set
  • Closed-source permitted
  • SaaS deployment permitted
  • Priority support
  • Custom model fine-tuning
  • On-premise deployment assistance
Request commercial license

Architecture at a glance

136 Python files
32 top-level modules. Modular, testable, documented.
7 Rust crates
backfire-core, FFI, observers, physics, SSGF, types, WASM.
17+ CLI commands
serve, proxy, bench, tune, finetune, batch, review, adversarial-test, doctor...
217 test files
4,310+ test functions. 90% coverage enforced. CI on every push.
Python ≥3.11
Tested on 3.11, 3.12, 3.13. Minimal core dependencies (numpy + requests only).
23 optional extras
Install only what you need: NLI, vector DBs, server, SDKs, voice, enterprise, ONNX.