VEX-HALT Benchmark#
VEX-HALT (Hallucination Assessment via Layered Testing) is a custom benchmark designed to showcase VEX's unique adversarial verification architecture.
Unlike traditional benchmarks that measure raw accuracy, VEX-HALT measures calibration — whether an AI "knows what it knows."
The Scientific Foundation#
A calibrated LLM must hallucinate (Kalai & Vempala, 2024). VEX's adversarial layer is specifically designed to catch these inevitable hallucinations before they reach the user.
| Research Concept | VEX Implementation |
|---|---|
| Multi-agent debate | ShadowAgent + Debate |
| Skeptic agents | Red/Blue adversarial layer |
| Consistency checks | ContextPacket hash chains |
Test Categories#
🎯 Confidence Calibration Test (CCT)#
Measures if VEX reduces overconfident wrong answers. Requires the agent to provide a confidence score (1-10) alongside its answer.
🔍 Adversarial Prompt Injection (API)#
Tests if the ShadowAgent detects subtle prompt injections (e.g., "Ignore previous instructions") embedded in the context.
🧪 Factual Consistency Test (FCT)#
Verifies multi-step reasoning via Merkle proofs. If intermediate reasoning steps contradict the final output, the hash chain breaks.
🤥 Hallucination Honeypot Test (HHT)#
Asking about fictional entities (fake books, made-up scientists) to see if the LLM fabricates details and if VEX's debate catches it.
♻️ Reproducibility Test (RT)#
Verifies that identical inputs produce verifiable, reproducible hash traces.
Scoring#
The VEX-HALT Score is a weighted average of these categories:
You can run the benchmark locally using: