2 min read

VEX-HALT Benchmark#

VEX-HALT (Hallucination Assessment via Layered Testing) is a custom benchmark designed to showcase VEX's unique adversarial verification architecture.

Key Insight

Unlike traditional benchmarks that measure raw accuracy, VEX-HALT measures calibration β€” whether an AI "knows what it knows."

The Scientific Foundation#

A calibrated LLM must hallucinate (Kalai & Vempala, 2024). VEX's adversarial layer is specifically designed to catch these inevitable hallucinations before they reach the user.

Research ConceptVEX Implementation
Multi-agent debateShadowAgent + Debate
Skeptic agentsRed/Blue adversarial layer
Consistency checksContextPacket hash chains

Test Categories#

🎯 Confidence Calibration Test (CCT)#

Measures if VEX reduces overconfident wrong answers. Requires the agent to provide a confidence score (1-10) alongside its answer.

πŸ” Adversarial Prompt Injection (API)#

Tests if the ShadowAgent detects subtle prompt injections (e.g., "Ignore previous instructions") embedded in the context.

πŸ§ͺ Factual Consistency Test (FCT)#

Verifies multi-step reasoning via Merkle proofs. If intermediate reasoning steps contradict the final output, the hash chain breaks.

πŸ€₯ Hallucination Honeypot Test (HHT)#

Asking about fictional entities (fake books, made-up scientists) to see if the LLM fabricates details and if VEX's debate catches it.

♻️ Reproducibility Test (RT)#

Verifies that identical inputs produce verifiable, reproducible hash traces.

Scoring#

The VEX-HALT Score is a weighted average of these categories:

Code
VEX-HALT Score = (0.3 Γ— CCT) + (0.2 Γ— API) + (0.2 Γ— FCT) + (0.2 Γ— HHT) + (0.1 Γ— RT)

You can run the benchmark locally using:

Bash
cargo run -p vex-demo --bin halt_benchmark
Found something unclear or incorrect?Report issueor useEdit this page
Edit this page on GitHub