ConsensusLLM Oracle
A 4-stage anti-hallucination oracle on Chainlink CRE that verifies any LLM answer through multi-source evidence, consensus, debate, and adjudication.
What it is
ConsensusLLM Oracle is a decentralized anti-hallucination oracle that applies Chainlink's price feed principle — multiple independent sources plus consensus — to LLM responses. Built for the Chainlink Convergence Hackathon 2026 (CRE & AI, Privacy & Confidential Data, and Tenderly Virtual TestNets tracks).
Problem: LLMs hallucinate: they generate coherent but factually incorrect answers. Traditional oracles verify structured data (prices, scores) but cannot verify answers to open-ended questions. When LLM-powered agents feed unverified answers to smart contracts with financial consequences, hallucinations become fund losses. A single LLM answering "What is the current ETH price?" can confidently return stale or fabricated data, and no amount of prompt engineering eliminates this risk.
Solution: A 4-stage cognitive pipeline chaining independent Chainlink CRE workflows through onchain events. Each stage implements a peer-reviewed anti-hallucination strategy:
Stage 1 — Evidence Gathering: Queries 4 independent search engines (Jina, Gemini with Google Search grounding, OpenAI Search, Groq Compound) plus Chainlink Data Feeds (ETH/USD, BTC/USD, LINK/USD via AggregatorV3Interface.latestRoundData() on Sepolia) for financial queries. For example, when asked "What is the price of ETH?", the pipeline cross-validates LLM answers against live Chainlink Data Feed values — a price mismatch triggers rejection. Sources are tagged with temporal freshness.
Stage 2 — Cognitive Consensus: 4 diverse LLMs from different families (Llama 3.3 70B, Kimi K2, Qwen3 32B, GPT-OSS 120B via Groq) answer the query using Stage 1 evidence. Responses are clustered using Belief-Calibrated Consensus (BCCS) with Fact Signature Extraction. The key insight: different LLMs from different families rarely hallucinate in the same way.
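To illustrate the clustering idea, here is a minimal sketch that groups model answers by a crude fact signature and maps the cluster sizes to a consensus level. It is not the project's BCCS implementation. The `factSignature` heuristic (numbers only, with a normalized-text fallback), the `FULL` label, and the strict-majority rule for `PARTIAL` are all assumptions made for this example.

```typescript
// Hypothetical sketch: NOT the project's BCCS algorithm.
type ConsensusLevel = "FULL" | "PARTIAL" | "NONE";

interface ModelAnswer {
  model: string;
  answer: string;
}

interface ConsensusResult {
  level: ConsensusLevel;
  majoritySignature: string | null;
  clusters: Record<string, string[]>; // signature -> model names
}

// Assumed heuristic: the signature is the sorted list of numbers in the answer;
// answers without numbers fall back to lowercase alphanumeric text.
function factSignature(answer: string): string {
  const numbers = answer.replace(/,/g, "").match(/\d+(?:\.\d+)?/g);
  if (numbers && numbers.length > 0) {
    return numbers.map(Number).sort((a, b) => a - b).join("|");
  }
  return answer.toLowerCase().replace(/[^a-z0-9]+/g, " ").trim();
}

function classifyConsensus(answers: ModelAnswer[]): ConsensusResult {
  // Object.create(null) avoids collisions with Object.prototype keys.
  const clusters: Record<string, string[]> = Object.create(null);
  for (const { model, answer } of answers) {
    const sig = factSignature(answer);
    if (!clusters[sig]) clusters[sig] = [];
    clusters[sig].push(model);
  }

  let majoritySignature: string | null = null;
  let largest = 0;
  for (const sig of Object.keys(clusters)) {
    if (clusters[sig].length > largest) {
      largest = clusters[sig].length;
      majoritySignature = sig;
    }
  }

  // Assumed rule: unanimous -> FULL, strict majority -> PARTIAL, else NONE.
  let level: ConsensusLevel = "NONE";
  if (answers.length > 0 && largest === answers.length) level = "FULL";
  else if (largest * 2 > answers.length) level = "PARTIAL";

  return { level, majoritySignature, clusters };
}
```

With four answers where three mention 2400 and one mentions 500, this sketch reports `PARTIAL`, which in the pipeline described above is the kind of result that would route to the debate stage.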
Stage 2b — Adversarial Debate (conditional): Only triggers when consensus is PARTIAL or NONE. Dissenting models see the majority position and must maintain or revise their answers (Multi-Agent Debate). Up to 2 rounds with adaptive early break.
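The debate loop can be sketched roughly as follows. This is an illustration only: the `ReviseFn` callback stands in for a real LLM call (and is synchronous here to keep the example short), answers are compared as exact strings, and the two early-break conditions (no dissenters left, or no dissenter changed its answer) are assumptions about what "adaptive early break" could mean.

```typescript
// Hypothetical sketch of a bounded debate loop; names are illustrative.
interface Position {
  model: string;
  answer: string;
}

// Stand-in for an LLM call: a dissenter sees the majority answer and
// returns either its old answer (maintain) or a new one (revise).
type ReviseFn = (dissenter: Position, majority: string, round: number) => string;

interface DebateOutcome {
  positions: Position[];
  roundsRun: number;
  converged: boolean;
}

// Most frequent answer; ties go to the answer that appears first.
function majorityAnswer(positions: Position[]): string {
  const counts: Record<string, number> = Object.create(null);
  let best = "";
  let bestCount = 0;
  for (const p of positions) {
    counts[p.answer] = (counts[p.answer] || 0) + 1;
    if (counts[p.answer] > bestCount) {
      bestCount = counts[p.answer];
      best = p.answer;
    }
  }
  return best;
}

function runDebate(initial: Position[], revise: ReviseFn, maxRounds = 2): DebateOutcome {
  let positions = initial.map((p) => ({ ...p }));
  let roundsRun = 0;

  for (let round = 1; round <= maxRounds; round++) {
    const majority = majorityAnswer(positions);
    const hasDissent = positions.some((p) => p.answer !== majority);
    if (!hasDissent) break; // early break: nothing left to debate

    roundsRun = round;
    let changed = false;
    positions = positions.map((p) => {
      if (p.answer === majority) return p;
      const next = revise(p, majority, round);
      if (next !== p.answer) changed = true;
      return { model: p.model, answer: next };
    });
    if (!changed) break; // early break: dissenters held their positions
  }

  const finalMajority = majorityAnswer(positions);
  const converged = positions.every((p) => p.answer === finalMajority);
  return { positions, roundsRun, converged };
}
```

Capping the loop at two rounds mirrors the limit stated above. A stubborn dissenter ends the debate after one round without convergence, leaving the final call to the adjudicator.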
Stage 3 — Adjudicator: An independent judge LLM (Llama 4 Maverick 17B, different family) evaluates the answer. Produces a Composite Integrity Score (evidence quality + model cohesion + fact grounding) with domain-adaptive thresholds — financial queries are verified against Chainlink Data Feed values in the fact grounding component, making price hallucinations detectable and rejectable.
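A composite score with domain-adaptive thresholds might look like the sketch below. The three components come from the description above, but the weights, the threshold values, and the function names are assumptions chosen for illustration, not the project's actual parameters.

```typescript
// Hypothetical sketch: weights and thresholds are illustrative assumptions.
type Domain = "financial" | "sports" | "science" | "current_affairs" | "general";

interface ScoreComponents {
  evidenceQuality: number; // each component is expected in [0, 1]
  modelCohesion: number;
  factGrounding: number;
}

// Assumed weights (sum to 1).
const WEIGHTS: ScoreComponents = {
  evidenceQuality: 0.5,
  modelCohesion: 0.25,
  factGrounding: 0.25,
};

// Assumed thresholds: stricter for domains with financial consequences.
const THRESHOLDS: Record<Domain, number> = {
  financial: 0.8,
  science: 0.75,
  sports: 0.7,
  current_affairs: 0.7,
  general: 0.6,
};

function clamp01(x: number): number {
  return Math.min(1, Math.max(0, x));
}

// Weighted sum of the clamped components.
function compositeIntegrityScore(c: ScoreComponents): number {
  return (
    WEIGHTS.evidenceQuality * clamp01(c.evidenceQuality) +
    WEIGHTS.modelCohesion * clamp01(c.modelCohesion) +
    WEIGHTS.factGrounding * clamp01(c.factGrounding)
  );
}

function adjudicate(c: ScoreComponents, domain: Domain) {
  const score = compositeIntegrityScore(c);
  const threshold = THRESHOLDS[domain];
  return { score, threshold, accepted: score >= threshold };
}
```

Under these assumed numbers, an answer with strong evidence and cohesion but zero fact grounding scores 0.75: enough for a general query, but rejected for a financial one.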
The pipeline supports any language, automatically classifies queries by domain (financial, sports, science, current affairs, general), and adapts verification intensity accordingly. Every stage writes results onchain via OraclePipelineCoordinator with full hash-based traceability. Oracle queries are monetized via Coinbase's x402 protocol (USDC on Base Sepolia). Deployed on Tenderly Virtual TestNets with a built-in transaction explorer.
Scale: ~69,000 lines of TypeScript, ~20,600 lines of Vue components, ~13,000 lines of CRE workflow code, ~3,400 lines of Solidity, 78 contract tests, 476+ workflow test assertions, and 50+ documented architectural decisions.
How it works
4 independent CRE workflows in TypeScript compiled to WASM via Javy/QuickJS, using @chainlink/cre-sdk v1.1.2. Each workflow runs as an isolated WASM binary on the Chainlink DON with its own 5-HTTP-call budget. Stages 1 and 3 use Chainlink's ConfidentialHTTPClient (TEE enclave execution with threshold-decrypted API keys). Stages 2/2b intentionally use regular HTTP because multi-node LLM execution IS the security property.
7 Solidity contracts (0.8.24) with a three-layer hierarchy: ReceiverTemplate -> CREConsumerBase -> ConsensusOracleConsumer, plus OraclePipelineCoordinator as event router with hash-based storage. Constructor-level identity validation and AtomicProxyDeployer for front-run-proof initialization. Built with @asanrom/smart-contract-wrapper (not Hardhat).
Chainlink Data Feeds (ETH/USD, BTC/USD, LINK/USD) are read via callContract() on Sepolia using AggregatorV3Interface.latestRoundData(), providing onchain price anchors that Stage 3 uses to validate financial query answers — if an LLM claims ETH is $500 but the Data Feed says $2,400, the response is rejected.
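The price-anchor check can be sketched as a comparison between a claimed price and a decoded `latestRoundData()` result. The `answer` and `updatedAt` fields and the separate `decimals()` value are part of AggregatorV3Interface. Everything else is an assumption for this example: the `FeedRound` shape (with the raw answer passed as a decimal string), the 2% tolerance, and the one-hour staleness limit.

```typescript
// Hypothetical sketch: tolerance and staleness limits are illustrative assumptions.
interface FeedRound {
  answer: string;    // raw int256 `answer` from latestRoundData(), as a decimal string
  decimals: number;  // from the feed's decimals(), commonly 8 for USD pairs
  updatedAt: number; // unix seconds, from latestRoundData()
}

interface PriceCheck {
  feedPrice: number;
  deviation: number; // relative deviation of the claim from the feed price
  stale: boolean;
  accepted: boolean;
}

// Scale the fixed-point answer to a floating-point price.
// Number() is precise enough here because price magnitudes stay far below 2^53.
function feedPrice(round: FeedRound): number {
  return Number(round.answer) / Math.pow(10, round.decimals);
}

function checkClaimedPrice(
  claimed: number,
  round: FeedRound,
  nowSeconds: number,
  maxDeviation = 0.02,  // assumed 2% tolerance
  maxAgeSeconds = 3600  // assumed 1 hour freshness window
): PriceCheck {
  const price = feedPrice(round);
  const stale = nowSeconds - round.updatedAt > maxAgeSeconds;
  if (!(price > 0)) {
    // A non-positive or unparsable feed value cannot anchor anything.
    return { feedPrice: price, deviation: Infinity, stale, accepted: false };
  }
  const deviation = Math.abs(claimed - price) / price;
  return { feedPrice: price, deviation, stale, accepted: !stale && deviation <= maxDeviation };
}
```

With a feed answer of `240000000000` at 8 decimals (a price of 2400), a claim of 500 deviates by roughly 79% and is rejected, matching the example above.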
Backend: Express.js + MongoDB with an event scanner that monitors Sepolia/Tenderly for PipelineStageCompleted events and auto-chains stages, including conditional debate detection. WebSocket notifications via Cluster IPC.
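The auto-chaining decision reduces to a small routing function: given the stage that just completed and, after Stage 2, the consensus level, pick the next workflow to trigger. The sketch below is an assumption-laden illustration. The stage names, the `FULL` label, and the `nextStage` function are hypothetical, and the real scanner's event decoding and persistence are omitted.

```typescript
// Hypothetical sketch of the stage-routing decision; names are illustrative.
type Stage = "EVIDENCE" | "CONSENSUS" | "DEBATE" | "ADJUDICATION";
type ConsensusLevel = "FULL" | "PARTIAL" | "NONE";

// Returns the next stage to trigger, or null when the pipeline is finished.
function nextStage(completed: Stage, consensus?: ConsensusLevel): Stage | null {
  switch (completed) {
    case "EVIDENCE":
      return "CONSENSUS";
    case "CONSENSUS":
      if (consensus === undefined) {
        throw new Error("consensus level is required after the CONSENSUS stage");
      }
      // Debate is conditional: only when consensus is PARTIAL or NONE.
      return consensus === "FULL" ? "ADJUDICATION" : "DEBATE";
    case "DEBATE":
      return "ADJUDICATION";
    case "ADJUDICATION":
      return null;
  }
}
```

Keeping this decision a pure function makes the conditional debate branch easy to unit-test apart from the chain scanner itself.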
Frontend: Vue 3 + Nuxt with real-time pipeline visualization, integrity score breakdowns, debate round views, confidence gauges, and PDF report export.
Payments: x402 protocol (Coinbase) with EIP-712 signed USDC authorization on Base Sepolia. API key system with credit-based access. Deployed on Tenderly Virtual TestNets forking Sepolia state including Chainlink Data Feeds. Scripts make it easy to configure the deployment for either Sepolia or Tenderly Virtual TestNets.
Created by
- Diego Valdeolmillos Villaverde