Image Generated by Author Using AI
You ask your database: "Show me customers who spent more than average last quarter."
The AI generates SQL. Executes it. Returns an error.
Tries again. Same error, different phrasing.
Third attempt. Still broken.
This happens constantly with Text-to-SQL systems. Even GPT-4 fails on complex joins, ambiguous columns, and aggregation logic. Most systems retry blindly, making the same mistakes in slightly different ways.
A recent paper from Max Planck Institute and AWS GenAI introduced SQL-of-Thought, a multi-agent framework that categorizes 31 types of SQL errors and fixes them systematically. It hits 91.59% accuracy on the Spider benchmark.
Instead of just explaining the research, I built a working demo to see how this approach handles real errors.
Here's what I learned.
The Problem with Single-Agent Text-to-SQL
Traditional approaches use one LLM call to generate SQL directly from natural language.
When the query fails, they retry. Maybe adjust the prompt. Try again.
No systematic understanding of what went wrong. No targeted fix based on error type.
Common failures include:
- Schema mismatches (wrong table or column names)
- Ambiguous column references (same name in multiple tables)
- Missing joins between related tables
- Incorrect aggregation functions
- Wrong filter conditions
The paper's insight: categorize errors into 31 types across 9 categories, then apply targeted corrections based on error classification.
Understanding the Paper's Architecture
Before diving into the demo, here's how SQL-of-Thought actually works.
Figure 1 from Research Paper - Multi-Agent Pipeline
The Multi-Agent Pipeline
The framework orchestrates six specialized agents in sequence:
- Schema Linking Agent receives the natural language question and database schema, then identifies relevant tables, columns, primary keys, and foreign key relationships.
- Subproblem Agent takes the schema-linked output and breaks the query into clause-level components: WHERE conditions, JOIN requirements, GROUP BY aggregations, ORDER BY sorting, HAVING filters, LIMIT clauses.
- Query Plan Agent uses chain-of-thought reasoning to create a step-by-step execution plan. This agent thinks through the problem procedurally before writing any SQL.
- SQL Agent converts the query plan into executable SQL, handling syntax and formatting.
- DB Execution Engine runs the query against the database.
- Correction Loop activates on failure:
  - Correction Plan Agent analyzes the error using the taxonomy, identifies the root cause, and generates a fix strategy
  - Correction SQL Agent regenerates the query based on the correction plan
  - The loop repeats until success or the maximum number of attempts is reached
The key innovation: when errors occur, the system doesn't just retry. It diagnoses the error type and applies a targeted fix.
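To make the flow concrete, here's a minimal orchestration sketch in Node.js. The run* agent functions and executeSql are hypothetical stand-ins for the LLM calls and database access described above; the paper doesn't prescribe this interface, so treat it as one way to wire the loop together.

// Sketch of the SQL-of-Thought pipeline. The run* functions and
// executeSql are illustrative stand-ins, not the paper's API.
async function textToSql(question, schema, maxAttempts = 3) {
  const linked = await runSchemaLinking(question, schema);
  const subproblems = await runSubproblem(question, linked);
  const plan = await runQueryPlan(question, linked, subproblems);
  let sql = await runSqlAgent(plan, linked);

  for (let attempt = 0; attempt <= maxAttempts; attempt++) {
    try {
      return await executeSql(sql); // success: return the result rows
    } catch (err) {
      if (attempt === maxAttempts) throw err;
      // Correction loop: diagnose via the taxonomy, then rewrite the SQL.
      const correction = await runCorrectionPlan(err.message, sql, linked);
      sql = await runCorrectionSql(correction, sql, linked);
    }
  }
}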
The Error Taxonomy
Figure 2 from Research Paper - Error Taxonomy
The correction system categorizes SQL failures into 9 major categories with 31 specific error types.
Table 1 from Research Paper - Error Taxonomy
Why This Taxonomy Matters
Traditional systems see only execution errors: "column not found" or "ambiguous reference." They retry with slight variations, often repeating the same mistake.
SQL-of-Thought categorizes the error type and applies the known solution pattern. If it's ambiguous_col, add table prefixes. If it's join_missing, identify the join path through foreign keys. If it's agg_no_groupby, add GROUP BY for non-aggregated columns.
The taxonomy transforms debugging from guesswork into systematic diagnosis.
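To illustrate, the category-to-fix mapping can be as simple as a lookup table the correction agent consults. The keys below are adapted from the paper's taxonomy labels; the strategy strings are my paraphrase, not the paper's wording.

// Illustrative subset of the taxonomy-to-fix mapping.
const FIX_STRATEGIES = {
  "schema_link.ambiguous_col":
    "Prefix every column reference with its table alias.",
  "join.join_missing":
    "Trace the foreign-key path between the tables and add the missing JOIN.",
  "agg.agg_no_groupby":
    "Add a GROUP BY listing every non-aggregated SELECT column.",
};

function fixStrategyFor(errorType) {
  // Fall back to a generic instruction for unmapped error types.
  return FIX_STRATEGIES[errorType] ?? "Re-derive the query plan from the question.";
}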
Building the Demo
I wanted to see this work with real data and real errors.
Database Choice
I used Chinook, a SQLite database modeling a music store. It has 11 tables: Artist, Album, Track, Invoice, Customer, Employee, and more. Realistic complexity with foreign keys and multiple join paths.
Tech Stack
- Backend: Node.js + Express + DuckDB (SQLite compatibility)
- Frontend: Vite + vanilla JavaScript
- LLM: OpenAI GPT-4o-mini (fast and cost-effective)
- Real-time updates: Server-Sent Events to stream each agent step
The 6 Specialized Agents
Agent Architecture - Image by Author
Following the paper's architecture, I built six specialized agents in Node.js:
1. Schema Linking Agent
Identifies relevant tables and columns from the database schema.
Takes the natural language question and full schema, and returns only the subset needed for the query.
For "top selling tracks," it identifies Track, InvoiceLine, Album, and Artist tables with their relationships.
2. Subproblem Agent
Breaks the question into SQL clause components.
Takes the question and linked schema, outputs structured breakdown of SELECT, FROM, JOIN, WHERE, GROUP BY, ORDER BY, LIMIT clauses.
Example breakdown:
{
  "SELECT": "Track.Name, SUM(revenue)",
  "FROM": "Track, InvoiceLine",
  "JOIN": "Track.TrackId = InvoiceLine.TrackId",
  "GROUP BY": "Track.Name",
  "ORDER BY": "revenue DESC",
  "LIMIT": "5"
}
3. Query Plan Agent (Chain-of-Thought)
Creates step-by-step execution plan with reasoning for each action.
Takes question, schema, and subproblems, outputs numbered steps explaining the logic:
- Step 1: Start with Track table (contains track names)
- Step 2: JOIN InvoiceLine (links tracks to revenue via TrackId)
- Step 3: Calculate SUM(UnitPrice × Quantity) for total revenue
- Step 4: GROUP BY Track.Name to aggregate per track
- Step 5: ORDER BY revenue DESC, LIMIT 5 for top results
This reasoning step prevents logical errors before SQL generation.
4. SQL Generation Agent
Converts the plan into executable SQL.
Takes the query plan and schema, outputs syntactically correct SQL with proper table aliases and join conditions.
5. SQL Execution
Runs the generated SQL against DuckDB.
Success returns results. Failure captures the error message and triggers correction.
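In the demo this is a thin wrapper around the duckdb Node.js package, roughly like the sketch below (the database file name is illustrative):

import duckdb from "duckdb";

const db = new duckdb.Database("chinook.duckdb");

// Promise wrapper around DuckDB's callback API. On failure, the
// rejected error is what feeds the correction loop.
function executeSql(sql) {
  return new Promise((resolve, reject) => {
    db.all(sql, (err, rows) => (err ? reject(err) : resolve(rows)));
  });
}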
6. Correction Agents (The Critical Innovation)
When execution fails, two agents collaborate:
- Correction Plan Agent runs DESCRIBE queries on problematic tables to inspect the actual schema. It classifies the error using the taxonomy (schema_link.ambiguous_col, join.wrong_condition, etc.) and identifies root cause with fix strategy.
- Correction SQL Agent applies the fix by rewriting SQL based on the correction plan. It uses actual column names from DESCRIBE output rather than guessing. Maximum 3 correction attempts with different strategies.
This correction loop is what makes the system robust. No blind retry; diagnostic analysis and targeted repair.
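Here's a sketch of the correction plan step, reusing the executeSql helper above; classifyError is a hypothetical name for the LLM call that maps the failure onto the taxonomy.

// Inspect real schemas, then classify the failure against the taxonomy.
async function buildCorrectionPlan(dbError, failedSql, tables) {
  const actualSchemas = {};
  for (const table of tables) {
    // DESCRIBE returns the table's actual columns and types in DuckDB.
    actualSchemas[table] = await executeSql(`DESCRIBE ${table}`);
  }
  // classifyError is a stand-in LLM call returning
  // { errorType, rootCause, fixStrategy }.
  return classifyError({ error: dbError.message, sql: failedSql, actualSchemas });
}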
The UI shows each agent's output in real-time. You watch schema linking, see the query plan being built, view generated SQL, and observe error correction when needed.
Demo 1: Simple Query Success
Let's start with a straightforward query.
Query: "What are the top 5 best-selling tracks by total revenue?"
Demo 1 - Simple Query Success - Image by Author
Here's what happens behind the scenes:
- Schema Linking identifies the tables needed: Track (for names), InvoiceLine (for sales), and the relationship TrackId connecting them.
- Subproblem decomposition breaks it into: SELECT track name and revenue sum, JOIN Track with InvoiceLine, GROUP BY track name, ORDER BY revenue descending, LIMIT to 5.
- Query Plan reasons through the logic: start with Track table, join to InvoiceLine via TrackId, calculate revenue as UnitPrice × Quantity, aggregate per track, sort by total, take top 5.
- SQL Generation produces:
SELECT t.Name AS TrackName,
       SUM(il.UnitPrice * il.Quantity) AS TotalRevenue
FROM Track t
JOIN InvoiceLine il ON t.TrackId = il.TrackId
GROUP BY t.Name
ORDER BY TotalRevenue DESC
LIMIT 5;
Result: ✅ Success on the first attempt. 5 rows returned in 2.8 seconds.
This shows the pipeline working smoothly when no errors occur.
Demo 2: Ambiguous Column Error with Guided Correction
Now the interesting part.
Query: "Get all invoice items with their unit price and track unit price"
Demo 2 - Ambiguous Column Error - Image Generated by Author
The problem: Both InvoiceLine and Track tables have a UnitPrice column. The initial SQL failed with "ambiguous column reference."
How the correction worked:
- Correction Plan Agent classified the error as schema_link.ambiguous_col
- It ran DESCRIBE InvoiceLine and DESCRIBE Track to see the actual schemas
- It identified that both tables contain UnitPrice
- It generated a fix strategy: add table prefixes to disambiguate
- Correction SQL Agent regenerated the SQL with proper table aliases
Result: ✅ Success on the second attempt after targeted correction.
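The regenerated query looked roughly like this (column list abridged; the aliases resolve the ambiguity):

SELECT il.InvoiceLineId,
       il.UnitPrice AS InvoiceUnitPrice,
       t.UnitPrice AS TrackUnitPrice
FROM InvoiceLine il
JOIN Track t ON il.TrackId = t.TrackId;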
Error Taxonomy in Action
Here's how the taxonomy guides corrections for different error types:
Error Categories and Solutions - Image by Author
Each category has a known solution pattern. The correction agent identifies the category and applies the appropriate fix.
Why This Approach Works
Systematic Error Handling
Instead of random retry, each error type has a diagnostic procedure and fix strategy. The agent knows what to check and how to repair it.
Dynamic Schema Inspection
The correction agent runs DESCRIBE queries to see actual table structures. No hardcoded schema knowledge; it adapts to any database.
Multi-Agent Collaboration
Each agent specializes:
- Schema linking focuses on identifying relevant tables
- Query planning focuses on logical flow
- SQL generation focuses on syntax
- Correction focuses on debugging
This separation reduces cognitive load on each LLM call.
Transparency
Users see every step. When errors occur, they understand what went wrong and how it was fixed. This builds trust in the system.
Limitations and Tradeoffs
1. No Value Retrieval Agent
The paper's agents don't retrieve sample data from tables to inform filtering decisions.
Example limitation: For "show me customers in California," the system must guess whether the database uses "California," "CA," "calif," or another variant.
A value retrieval agent could run:
SELECT DISTINCT State FROM Customer LIMIT 20;
This would show actual values: ["CA", "NY", "TX"…] and inform the WHERE clause construction (see the sketch after the list below).
Without this, queries fail when:
- State abbreviations vs. full names differ from the user's phrasing
- Date formats are ambiguous (MM/DD/YYYY vs YYYY-MM-DD)
- Category names use internal codes ("cat_001" vs "Electronics")
- Case sensitivity matters ("active" vs "Active")
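A minimal version of such an agent is easy to sketch. This is my extension, not part of the paper; it reuses the executeSql helper from earlier.

// Hypothetical value retrieval agent: sample distinct values so the
// SQL agent can match the user's phrasing to what's actually stored.
async function sampleColumnValues(table, column, limit = 20) {
  const rows = await executeSql(
    `SELECT DISTINCT ${column} FROM ${table} LIMIT ${limit}`
  );
  return rows.map((row) => row[column]);
}

// Usage: inject real state codes into the prompt before filtering.
// const states = await sampleColumnValues("Customer", "State");
// prompt += `\nKnown State values: ${JSON.stringify(states)}`;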
2. Token Costs
Multi-agent architecture uses more tokens than single-call approaches. The demo averaged:
- Simple query: ~2,000 tokens ($0.003)
- Complex query with correction: ~5,500 tokens ($0.008)
Production systems would need cost optimization through hybrid models (reasoning models for planning, cheaper models for execution).
3. Latency
Six agent calls take longer than one call. Average end-to-end time:
- Single-agent baseline: 1.8 seconds
- SQL-of-Thought: 4.2 seconds
The accuracy gain justifies the latency for most use cases.
Future Improvements
Several directions could enhance the demo:
- Fine-tuned models for specific agent roles. A small model trained on schema linking could reduce costs while maintaining accuracy.
- Query execution plan visualization. Show users how the database actually executes the generated SQL, not just the SQL itself.
- Support for more databases. PostgreSQL, MySQL, SQL Server each have dialect-specific quirks that need custom correction strategies.
- Interactive correction guidance. Let users provide hints when automatic correction fails, teaching the system new error patterns.
- Performance optimization tracking. Not just correctness; also measure query execution time and suggest index improvements.
Conclusion
Image Generated by Author Using AI
The SQL-of-Thought paper introduces a practical framework for reliable Text-to-SQL translation.
The key insight: categorize errors systematically and apply targeted fixes based on error type. This transforms SQL generation from a trial-and-error process into a diagnostic system.
Building the demo confirmed that multi-agent architecture with taxonomy-guided correction works on real data. The approach handles ambiguous columns, missing joins, wrong aggregations, and other common failures that break single-agent systems.
The 91.59% accuracy on the Spider benchmark isn't just a number. It represents a systematic approach to error handling that makes Text-to-SQL reliable enough for production use.
Try It Yourself
Full source code and working demo available on GitHub
Need Help Implementing Multi-Agent Text-to-SQL?
I offer consulting services for advanced Text-to-SQL systems. Let's discuss your requirements.
Schedule a Consultation