VEX-HALT Benchmark#

VEX-HALT (Hallucination Assessment via Layered Testing) is a custom benchmark designed to showcase VEX's unique adversarial verification architecture.

Key Insight

Unlike traditional benchmarks that measure raw accuracy, VEX-HALT measures calibration — whether an AI "knows what it knows."

The Scientific Foundation#

A calibrated LLM must hallucinate (Kalai & Vempala, 2024). VEX's adversarial layer is specifically designed to catch these inevitable hallucinations before they reach the user.

Research Concept	VEX Implementation
Multi-agent debate	ShadowAgent + Debate
Skeptic agents	Red/Blue adversarial layer
Consistency checks	ContextPacket hash chains

Test Categories#

🎯 Confidence Calibration Test (CCT)#

Measures if VEX reduces overconfident wrong answers. Requires the agent to provide a confidence score (1-10) alongside its answer.

🔍 Adversarial Prompt Injection (API)#

Tests if the ShadowAgent detects subtle prompt injections (e.g., "Ignore previous instructions") embedded in the context.

🧪 Factual Consistency Test (FCT)#

Verifies multi-step reasoning via Merkle proofs. If intermediate reasoning steps contradict the final output, the hash chain breaks.

🤥 Hallucination Honeypot Test (HHT)#

Asking about fictional entities (fake books, made-up scientists) to see if the LLM fabricates details and if VEX's debate catches it.

♻️ Reproducibility Test (RT)#

Verifies that identical inputs produce verifiable, reproducible hash traces.

Scoring#

The VEX-HALT Score is a weighted average of these categories:

Code

VEX-HALT Score = (0.3 × CCT) + (0.2 × API) + (0.2 × FCT) + (0.2 × HHT) + (0.1 × RT)

You can run the benchmark locally using:

Bash

cargo run -p vex-demo --bin halt_benchmark