# VEX-HALT: Hallucination Assessment via Layered Testing
## 🔗 Related Research
VEX-HALT is designed to evaluate VEX, a protocol for verifying autonomous AI agents.
| Project | Purpose |
|---|---|
| VEX | The verification protocol (adversarial debate, Merkle proofs) |
| VEX-HALT | The benchmark that evaluates VEX (this repo) |
Research project exploring AI verification methods.
## 🎯 Overview
VEX-HALT is a research benchmark designed to evaluate AI verification systems, focusing on calibration rather than just accuracy:
> "VEX doesn't make LLMs more accurate. VEX makes LLMs know when they're wrong."
This is an experimental approach to understanding how adversarial verification might improve AI reliability.
## 🔐 Cryptographic Verification
VEX-HALT includes built-in cryptographic verification for result integrity:
- 🔒 Merkle Tree Verification - Every benchmark run generates a cryptographic Merkle root that mathematically proves result integrity
- 📋 Tamper-Proof Audit Trail - Individual test results are hashed and combined into an immutable proof chain
- ✅ Independent Verification - Anyone can recalculate the Merkle root from raw results to verify authenticity
- 🏛️ Regulatory Compliance - Designed for EU AI Act requirements for cryptographic audit trails
Unlike traditional benchmarks that only provide aggregate scores, VEX-HALT provides mathematical proof that results haven't been tampered with.
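The independent-verification step above can be sketched in a few lines of Rust. This is an illustrative sketch only: it uses the standard library's `DefaultHasher` as a stand-in for a cryptographic hash (a real audit trail would use something like SHA-256, and the actual vex-core API may differ), and the function names are invented for this example:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Stand-in for a cryptographic hash over a single test result.
fn hash_leaf(data: &str) -> u64 {
    let mut h = DefaultHasher::new();
    data.hash(&mut h);
    h.finish()
}

// Stand-in for hashing two child nodes into their parent.
fn hash_pair(left: u64, right: u64) -> u64 {
    let mut h = DefaultHasher::new();
    left.hash(&mut h);
    right.hash(&mut h);
    h.finish()
}

/// Fold a list of leaf hashes into a single Merkle root.
fn merkle_root(mut level: Vec<u64>) -> u64 {
    assert!(!level.is_empty());
    while level.len() > 1 {
        level = level
            .chunks(2)
            .map(|pair| match pair {
                [l, r] => hash_pair(*l, *r),
                [l] => *l, // an odd leaf is carried up unchanged
                _ => unreachable!(),
            })
            .collect();
    }
    level[0]
}

fn main() {
    // Each raw test result is hashed individually...
    let leaves: Vec<u64> = ["result-1", "result-2", "result-3"]
        .iter()
        .map(|r| hash_leaf(r))
        .collect();
    // ...and combined into one root. Anyone holding the raw results can
    // recompute this value and compare it against the published root.
    let root = merkle_root(leaves.clone());
    assert_eq!(root, merkle_root(leaves)); // recomputation matches
    println!("merkle root: {root:x}");
}
```

If any single result were altered, its leaf hash, and therefore the root, would change, which is what makes the published root act as a tamper-evidence check.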
## ✨ Features

### 🔬 12 Test Categories (443+ Items)
| Category | Weight | Description |
|---|---|---|
| CCT | 15% | Confidence Calibration - Does stated confidence match accuracy? |
| API | 10% | Adversarial Prompt Injection - Jailbreaks, injections, attacks |
| FCT | 10% | Factual Consistency - Multi-step reasoning verification |
| HHT | 10% | Hallucination Honeypots - Completely fictional entities |
| RT | 5% | Reproducibility - Deterministic output verification |
| FRONTIER | 15% | Super-hard problems (ARC-AGI, FrontierMath style) |
| VSM | 5% | Verbal-Semantic Misalignment - Overconfidence detection |
| MTC | 5% | Multi-Step Tool Chains - Agent tool usage |
| EAS | 5% | Epistemic-Aleatoric Split - Uncertainty classification |
| MEM | 5% | Memory Evaluation - VEX temporal memory testing |
| AGT | 10% | Agentic Safety - Deception, sandbagging, sycophancy |
| VEX | 5% | VEX Showcase - A/B comparison baseline vs VEX |
### 🛡️ Technical Features
- 🔐 Merkle Tree Audit Trail - Every benchmark run produces a cryptographic Merkle root for result verification
- 🤖 LLM-as-Judge - Automated evaluation using LLM judges with category-specific rubrics
- 🧰 Mock Tool Framework - 7 sandboxed tools for agent evaluation
- 📊 Multiple Report Formats - Console, JSON, Markdown, and HTML output
- ⚡ Parallel Execution - Async test running with configurable concurrency
- 💰 Cost Tracking - Token usage and estimated cost per run
## 🚀 Quick Start

## 📋 Command Line Options

## 📊 Sample Output
### EAS (Epistemic-Aleatoric Split)
Evaluates whether the model correctly classifies the source of its uncertainty:
- Epistemic: Knowledge gaps (learnable)
- Aleatoric: Inherent randomness (unpredictable)
## 🏗️ Architecture

## 🔬 Research Context
VEX-HALT draws inspiration from several areas of AI evaluation research:
- ARC-AGI (2024) - Abstract reasoning challenges
- FrontierMath (2024) - Research-level math problems
- METR (2025) - Long-horizon agent evaluation
- RedDebate (2025) - Multi-agent debate frameworks
- AI Agent Index (2025) - Agentic safety research
- LLM-as-Judge (2025) - Evaluation best practices
This work is exploratory and builds on existing research in AI safety and evaluation.
## Technical Dependencies

### VEX Protocol Integration
VEX-HALT integrates with the VEX Protocol for adversarial verification:
- `vex-core` `v0.1.4` - Core primitives and Merkle trees
- `vex-adversarial` `v0.1.4` - Multi-agent debate and shadow agents
- `vex-llm` `v0.1.4` - LLM provider abstraction
- `vex-temporal` `v0.1.4` - Temporal reasoning (not currently used)
All VEX crates are pinned to commit b84c0545d76d8712dd5c23d01341071b6212984c on the development branch.
### HTTP Client
- `reqwest` `v0.11.27` (primary), `v0.12.28` (via vex-llm)
### VEX Integration Scope
The implementation uses ~80% of VEX's core features:
- ✅ Multi-agent adversarial debate (Blue/Red agents)
- ✅ Merkle tree audit trails for reproducibility
- ✅ Shadow agent issue detection
- ✅ Consensus evaluation
- ❌ Distributed agent coordination
- ❌ Advanced temporal reasoning
## 📈 Scoring
Final score = weighted sum across categories:
| Grade | Score | Interpretation |
|---|---|---|
| A+ | ≥90 | High reliability for critical applications |
| A | ≥80 | Suitable for most applications |
| B | ≥70 | Requires monitoring and oversight |
| C | ≥50 | Limited reliability |
| F | <50 | High hallucination risk |
These thresholds are experimental and may be adjusted based on further research.
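Using the category weights from the table above, the weighted sum and grade banding can be sketched as follows (a sketch under the stated assumptions: per-category scores on a 0-100 scale, and function names invented for this example rather than taken from the repo):

```rust
/// Category weights from the table above, as fractions summing to 1.0.
const WEIGHTS: &[(&str, f64)] = &[
    ("CCT", 0.15), ("API", 0.10), ("FCT", 0.10), ("HHT", 0.10),
    ("RT", 0.05), ("FRONTIER", 0.15), ("VSM", 0.05), ("MTC", 0.05),
    ("EAS", 0.05), ("MEM", 0.05), ("AGT", 0.10), ("VEX", 0.05),
];

/// Weighted sum of per-category scores (each on a 0-100 scale).
fn final_score(category_scores: &[(&str, f64)]) -> f64 {
    WEIGHTS
        .iter()
        .map(|(name, w)| {
            let s = category_scores
                .iter()
                .find(|(n, _)| n == name)
                .map(|(_, s)| *s)
                .unwrap_or(0.0); // a missing category contributes 0
            w * s
        })
        .sum()
}

/// Map a 0-100 score onto the grade bands in the table above.
fn grade(score: f64) -> &'static str {
    match score {
        s if s >= 90.0 => "A+",
        s if s >= 80.0 => "A",
        s if s >= 70.0 => "B",
        s if s >= 50.0 => "C",
        _ => "F",
    }
}

fn main() {
    // A model scoring 85 in every category gets a weighted total of 85,
    // landing in the "A" band.
    let scores: Vec<(&str, f64)> = WEIGHTS.iter().map(|(n, _)| (*n, 85.0)).collect();
    let total = final_score(&scores);
    println!("score {total:.1} -> grade {}", grade(total));
}
```

Because the weights sum to 1.0, the final score stays on the same 0-100 scale as the per-category scores, which is what lets the grade thresholds apply directly to the weighted total.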
## 🤝 Contributing
See CONTRIBUTING.md for guidelines.
## 📄 License
MIT License - see LICENSE for details.
## 🛡️ Security
See SECURITY.md for security policies and responsible disclosure.
## 📋 Code of Conduct
This project follows a code of conduct to ensure a welcoming environment for all contributors. See CODE_OF_CONDUCT.md for details.