
Building a Text2SQL Agent With Long-Term Memory: A Production Grade Architecture for User Memory Isolation

How to implement user-specific, long-term memory for smarter database interactions


Long-term memory transforms ordinary text2SQL agents into intelligent assistants that remember user preferences across sessions. This tutorial shows how to build a custom implementation inspired by the Mem0 architecture research paper.

Important Clarification

This implementation doesn't use the official Mem0 libraries from https://mem0.ai/research. Instead, I developed the code from scratch, following the core principles described in the Mem0 research paper. This gives us more flexibility to adapt the architecture specifically for text2SQL applications.

Why Long-Term Memory Matters for Text2SQL

In traditional text2SQL systems, conversations restart from scratch each session. Preferences, terminology, and context vanish, forcing users to re-educate their AI assistant repeatedly.

This creates a significant cognitive load: financial analysts must specify "only show approved loans" every morning, property managers must redefine "luxury properties" each session, and follow-up questions require constant context refreshing.

Long-term memory transforms this experience by enabling your text2SQL agent to:

  • Remember preferences across sessions, automatically filtering by user-specific criteria like "exclude canceled orders"
  • Learn domain terminology that bridges user language and database schema, such as "quarterly performers" = "accounts with activity in last three months"
  • Maintain context over time, allowing natural follow-ups like "How many of those were from California?" days later
  • Build personalized understanding of each user's data interests, improving relevance without explicit instruction

This capability doesn't just enhance experience — it fundamentally changes how people interact with data, creating a more intuitive, human-like interface to complex database information.

What is Mem0 and How it Inspired Our Multi-User Text2SQL Architecture

Mem0 (pronounced "mem-zero") is a memory architecture from mem0.ai that solves limited context window problems through a two-phase system: extracting key information from conversations and updating memory to maintain consistency.

This architecture extends Mem0 with multi-user memory isolation and text2SQL-specific capabilities:

  1. Complete memory isolation: Each user's preferences stored separately, preventing cross-contamination.
  2. User-targeted extraction: The text2SQL memory component identifies SQL-relevant elements (entities, preferences, terminology, metrics) for each user.
  3. Database integration: Connects personalized memories directly to database schema for accurate SQL generation.

This architecture creates personalized database interfaces where financial analysts automatically filter by preferred metrics, property managers use specialized terminology, and executives reference custom KPIs — all without restating preferences each session.

Text2SQL Long-Term Memory Architecture

Architecture Overview - Image by Author

Core Components of This Text2SQL + Long-Term Memory Architecture

As illustrated in the diagram, this text2SQL long-term memory architecture consists of two primary phases, extraction and update, joined by a text2SQL-specific memory extraction component:

1. Extraction Phase (Blue)

The extraction phase captures important information from conversations:

  • Message Ingestion: The system processes new message pairs (user question + AI response).
  • Context Retrieval: Two sources of context are leveraged (see the sketch after this list):
    • A conversation summary that encapsulates the entire history
    • The most recent m messages for immediate context
  • LLM Processing: An LLM analyzes the conversation context to extract relevant information.
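
As a concrete illustration of the two context sources above, here is a minimal sketch of how the extraction context could be assembled; the ExtractionContext class and its method names are illustrative assumptions, not part of the implementation shown later:

Python
from collections import deque

class ExtractionContext:
    """Rolling context used by the extraction phase (illustrative sketch)"""

    def __init__(self, max_recent_messages: int = 10):
        self.conversation_summary = ""  # compressed representation of the full history
        self.recent_messages = deque(maxlen=max_recent_messages)  # the last m messages

    def add_message_pair(self, user_msg: str, ai_msg: str):
        """Record a new (question, answer) pair for immediate context"""
        self.recent_messages.append({"human": user_msg, "ai": ai_msg})

    def build_prompt_context(self) -> str:
        """Combine the summary and recent messages for the extraction LLM"""
        recent = "\n".join(
            f"Human: {m['human']}\nAI: {m['ai']}" for m in self.recent_messages
        )
        return f"Conversation summary: {self.conversation_summary}\nRecent messages:\n{recent}"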

2. Text2SQL User-Specific Memory Extraction (Red)

This specialized component transforms general conversation elements into structured, database-relevant memories:

  • Entity Extraction: Identifies database entities (tables, fields) that the user frequently references
  • Preference Capture: Records filtering and sorting preferences specific to each user
  • Terminology Recognition: Maps user-defined terms to their database equivalents
  • Metric Definition: Stores custom calculations or criteria defined by the user

3. Update Phase (Green)

The update phase maintains a consistent knowledge base:

  • Similarity Search: The system finds similar existing memories using the vector database
  • Operation Classification: The system determines whether to:
    • ADD: Create new memories when no similar ones exist
    • UPDATE: Enhance existing memories with new information
    • DELETE: Remove outdated or contradicted memories
    • NOOP: Make no changes when information is redundant

All operations center around the Vector Database at the top, which stores memories with proper user isolation, enabling personalized responses while maintaining privacy between users.
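
To make the operation classification concrete, here is a minimal sketch of the decision step, assuming the similarity search has already returned the closest user-scoped memories and the LLM's judgement has been reduced to a one-word verdict; MemoryOperation and decide_operation are illustrative names, not the actual implementation:

Python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class MemoryOperation:
    op: str                        # "ADD", "UPDATE", "DELETE", or "NOOP"
    memory_id: Optional[int] = None
    content: Optional[str] = None

def decide_operation(candidate: str,
                     similar: List[dict],
                     llm_verdict: str) -> MemoryOperation:
    """Choose an update-phase operation for a newly extracted fact.

    `similar` holds the user-scoped similarity search results, e.g.
    [{"id": 7, "content": "...", "score": 0.91}, ...], and `llm_verdict`
    is the LLM's judgement of how the candidate relates to the closest
    match: "new", "refines", "contradicts", or "redundant".
    """
    if not similar:
        return MemoryOperation(op="ADD", content=candidate)

    closest = similar[0]
    if llm_verdict == "redundant":
        return MemoryOperation(op="NOOP")
    if llm_verdict == "contradicts":
        return MemoryOperation(op="DELETE", memory_id=closest["id"])
    if llm_verdict == "refines":
        return MemoryOperation(op="UPDATE", memory_id=closest["id"],
                               content=candidate)
    return MemoryOperation(op="ADD", content=candidate)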

Implementing Multi-User Memory Storage

The heart of our text2SQL memory system is its PostgreSQL-based storage implementation for production deployments.

This implementation leverages the pgvector extension for efficient similarity searches across high-dimensional embedding vectors, handling three critical aspects:

  1. Vector-based similarity search: Stores embeddings as native vector types, performing cosine similarity operations directly in the database without transferring large vector data.
  2. User isolation: The user_id field serves as both security boundary and performance optimization, ensuring one user's memories don't affect another's.
  3. Hierarchical indexing: HNSW (Hierarchical Navigable Small World) index accelerates similarity searches through a layered graph structure, enabling logarithmic-time approximate nearest neighbor searches.
Python
class PostgresMemoryStore:
    """PostgreSQL-based memory storage with user isolation"""

    def __init__(self, connection_string: str):
        self.conn_string = connection_string
        self.embedding_dim = 1536  # OpenAI embedding dimension
        self._init_db()

    def _init_db(self):
        """Initialize database tables if they don't exist"""
        try:
            import psycopg2

            # Connect to PostgreSQL
            conn = psycopg2.connect(self.conn_string)
            cursor = conn.cursor()

            # Ensure vector extension is loaded
            cursor.execute("CREATE EXTENSION IF NOT EXISTS vector;")
            conn.commit()

            # Create memories table with user_id field
            cursor.execute("""
            CREATE TABLE IF NOT EXISTS memories (
                id SERIAL PRIMARY KEY,
                user_id TEXT NOT NULL,
                content TEXT NOT NULL,
                created_at FLOAT NOT NULL,
                source TEXT,
                metadata JSONB DEFAULT '{}'::JSONB,
                embedding vector(1536)
            );
            """)

            # Create index for vector similarity search
            # (IF NOT EXISTS keeps re-initialization idempotent)
            cursor.execute("""
            CREATE INDEX IF NOT EXISTS memories_embedding_idx
            ON memories USING hnsw (embedding vector_cosine_ops);
            """)

            conn.commit()
            cursor.close()
            conn.close()
        except Exception as e:
            # Surface initialization failures instead of failing silently
            print(f"Error initializing memory store: {e}")
            raise

The complete database schema maintains three core tables (only the memories table is shown above):

  • memories: Stores the actual memory content with embeddings
  • conversation_summaries: Maintains compressed representations of conversation history
  • recent_messages: Caches the most recent interactions for immediate context
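
To show how memories are retrieved from this schema at query time, here is a hedged sketch of a user-scoped similarity search; search_memories is an illustrative helper (not part of the excerpt above), and it assumes psycopg2 plus pgvector's <=> cosine-distance operator:

Python
import psycopg2

def search_memories(conn_string: str, user_id: str,
                    query_embedding: list, top_k: int = 5) -> list:
    """Return the given user's memories most similar to the query embedding."""
    # pgvector accepts a bracketed text literal, e.g. "[0.12,0.03,...]"
    vector_literal = "[" + ",".join(str(x) for x in query_embedding) + "]"

    conn = psycopg2.connect(conn_string)
    cursor = conn.cursor()
    cursor.execute(
        """
        SELECT id, content, 1 - (embedding <=> %s::vector) AS similarity
        FROM memories
        WHERE user_id = %s              -- user isolation: never cross users
        ORDER BY embedding <=> %s::vector
        LIMIT %s;
        """,
        (vector_literal, user_id, vector_literal, top_k),
    )
    rows = cursor.fetchall()
    cursor.close()
    conn.close()
    return [{"id": r[0], "content": r[1], "similarity": r[2]} for r in rows]

The HNSW index created in _init_db accelerates the ORDER BY ... <=> scan, and the user_id filter keeps each search inside a single user's memory space.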

Connecting to the Data Platform

This API client serves as a flexible connector between our memory system and the underlying data platform. Though our implementation uses Denodo in this example, the architecture is designed to be platform-agnostic and can easily be adapted to work with any database system:

  • Snowflake: Extract metadata using Snowflake's Information Schema views
  • Databricks: Access catalog metadata through Unity Catalog API
  • SQL Server/Oracle/PostgreSQL/MySQL: Generate semantic models through their respective system catalogs
Python
import asyncio
import base64
from typing import Any, Dict

import requests


class DenodoAPIClient:
    """Client for interacting with Denodo data virtualization platform"""

    def __init__(self, api_host: str):
        self.api_host = api_host

    def _get_basic_auth_header(self, username: str, password: str) -> str:
        """Build an HTTP Basic Auth header value from the credentials"""
        token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
        return f"Basic {token}"

    async def get_metadata(self, database_name: str,
                           username: str, password: str) -> Dict[str, Any]:
        """Get metadata from data platform"""
        try:
            url = f"{self.api_host}/getMetadata"

            headers = {
                "Content-Type": "application/json",
                "Authorization": self._get_basic_auth_header(username, password)
            }

            params = {
                "database_name": database_name,
                "include_schema": True
            }

            # requests is synchronous, so run it in a worker thread
            # to keep the event loop responsive
            response = await asyncio.to_thread(
                requests.get, url, headers=headers, params=params
            )

            if response.status_code != 200:
                return {"error": f"Status code: {response.status_code}"}

            return response.json()
        except Exception as e:
            return {"error": str(e)}

The Memory Agent Implementation

The Mem0Agent class serves as the cognitive core of our system, orchestrating the extraction, retrieval, and management of memories across user sessions.

Python
class Mem0Agent:
    """Memory agent inspired by Mem0 architecture for text2SQL"""

    def __init__(self, use_postgres: bool = False,
                 postgres_conn_string: str = None):
        # Initialize the appropriate memory store
        if use_postgres and postgres_conn_string:
            self.store = PostgresMemoryStore(postgres_conn_string)
            print("Using PostgreSQL for memory storage")
        else:
            self.store = JSONMemoryStore()

        self.db_schema = None
        self.current_user_id = None
        self.max_recent_messages = 10

    def load_user_context(self, user_id: str):
        """Load context for a specific user"""
        self.current_user_id = user_id
        self.memories = self.store.load_memories(user_id)
        self.conversation_summary = self.store.get_conversation_summary(user_id)
        self.recent_messages = self.store.get_recent_messages(user_id)

This implementation uses the strategy pattern to abstract storage mechanisms, allowing flexibility between PostgreSQL (production) and JSON (development).
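
The JSON store is not shown in the excerpt; a minimal development-only sketch that mirrors the interface the agent relies on (the method names are inferred from load_user_context above) might look like this:

Python
import json
import os

class JSONMemoryStore:
    """File-backed memory store for development; mirrors the PostgreSQL interface."""

    def __init__(self, path: str = "memories.json"):
        self.path = path
        self._data = {}
        if os.path.exists(path):
            with open(path) as f:
                self._data = json.load(f)

    def _user(self, user_id: str) -> dict:
        # Each user gets an isolated bucket, like the user_id column in PostgreSQL
        return self._data.setdefault(
            user_id, {"memories": [], "summary": "", "recent_messages": []}
        )

    def load_memories(self, user_id: str) -> list:
        return self._user(user_id)["memories"]

    def get_conversation_summary(self, user_id: str) -> str:
        return self._user(user_id)["summary"]

    def get_recent_messages(self, user_id: str) -> list:
        return self._user(user_id)["recent_messages"]

    def save(self):
        with open(self.path, "w") as f:
            json.dump(self._data, f, indent=2)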

Memory Extraction and Update Processes

The core of our system is how it extracts and updates memories:

Python
async def extract_memories(self, message_pair: List[str]) -> List[str]:
    """Extract salient memories from a message pair using LLM"""

    prompt = f"""You are an intelligent memory extraction system.
Your task is to identify important information from conversations
about database queries that should be remembered for future reference.

Context:
Conversation summary so far: {self.conversation_summary}
Recent messages: {self.recent_messages}
Current message pair:
Human: {message_pair[0]}
AI: {message_pair[1]}

Extract 0-3 concise, salient facts that would be useful to remember
in future database queries.

Pay special attention to the following categories and tag them accordingly:
1. [PREFERENCE] User's query preferences or filters
2. [TERM] Custom terminology or abbreviations defined by the user
3. [METRIC] Custom metrics or calculations defined by the user

Format each memory as a single, concise sentence with the appropriate
tag prefix. If no important information is worth retaining, return an
empty list.

Extracted memories:"""

    # Send prompt to LLM and parse response...

    return memories

The memory extraction process uses zero-shot prompting, with an LLM acting as a semantic filter to identify information worth remembering.
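
The parsing step elided above can be sketched as follows, assuming the LLM returns one tagged memory per line as the prompt instructs; parse_extracted_memories is an illustrative helper name:

Python
def parse_extracted_memories(llm_response: str) -> list:
    """Turn the LLM's tagged lines into structured memory records."""
    tag_to_category = {
        "[PREFERENCE]": "preference",
        "[TERM]": "terminology",
        "[METRIC]": "metric",
    }
    memories = []
    for line in llm_response.splitlines():
        line = line.strip(" -•\t")          # drop bullet characters and whitespace
        if not line:
            continue
        category = "general"
        for tag, name in tag_to_category.items():
            if line.startswith(tag):
                category = name
                line = line[len(tag):].strip()
                break
        memories.append({"category": category, "content": line})
    return memories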

Demonstrating the Power of Long-Term Memory in Text2SQL

Let's see how our implementation helps two different users, Sandra and Alex, query the bank database.

User 1: Sandra (Financial Analyst)

Sandra's First Session

Sandra's First Session - Image by Author

Session Flow

  1. Authentication: Sandra logs in, establishing her identity.
  2. Database Connection: System loads bank database schema to understand tables and relationships.
  3. Initial Query: "How many loans do we currently have in the system?"
  4. Preference Setting: "I'm only interested in approved loans going forward."
  5. Follow-up Query: "Show me the loans with the highest interest rates."
  6. Memory Application: System automatically filters to show only approved loans with high rates.

Memory System in Action

The right panel shows the memory system working; a sketch of how these memories reach the SQL-generation prompt follows the list:

  • Preference Memory: Stores "User is only interested in approved loans" and applies this filter automatically.
  • Entity Memories: Tracks tables Sandra is interested in.
  • Metric Memory: Records how loans were counted for consistency.
  • Database Context: Maintains schema knowledge for accurate SQL generation.
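
A minimal sketch of that injection step, assuming memories are stored as dictionaries with a category field (the build_sql_prompt helper is illustrative, not part of the excerpts above):

Python
def build_sql_prompt(question: str, user_memories: list, db_schema: str) -> str:
    """Compose the SQL-generation prompt with the user's stored memories."""
    preferences = [m["content"] for m in user_memories
                   if m.get("category") == "preference"]
    terminology = [m["content"] for m in user_memories
                   if m.get("category") == "terminology"]

    memory_block = ""
    if preferences or terminology:
        memory_block = (
            "Apply these remembered user preferences and definitions:\n"
            + "\n".join(f"- {item}" for item in preferences + terminology)
            + "\nMention in your answer that previously expressed preferences were applied.\n"
        )

    return (
        f"Database schema:\n{db_schema}\n\n"
        f"{memory_block}\n"
        f"User question: {question}\n"
        "Write a SQL query that answers the question."
    )
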
Sandra's Session 2

Sandra's Session 2 (3 days later) - Image by Author

When Sandra asks "Let's define high-risk loans as those with credit scores below 750. How many high-risk loans in California?", the system:

  1. Updates her metric definition (threshold from 650 to 750)
  2. Applies her approved loans preference automatically
  3. Generates SQL incorporating both preferences
  4. Acknowledges with "(Note: I've applied your previously expressed preferences)"

User 2: Alex (Property Manager)

Alex's Session

Alex's Session - Image by Author

This demonstration shows how the text2SQL agent with long-term memory creates a personalized experience for Property Manager Alex:

  1. Regional preference: Alex specifies he works with California properties, stored as a user preference.
  2. Custom terminology: Alex defines "luxury properties" as those valued over $300,000.
  3. Smart confirmation: When asked about luxury properties, system confirms whether to apply his California preference.
  4. Personalized results: After confirmation, system applies both preferences in the SQL query.

Performance Optimization for Production

For production deployment, consider these optimizations:

  1. Embedding cache: Store frequently used embeddings to reduce API calls (a minimal cache sketch follows this list)
  2. Database indexing: Ensure proper vector and text indices on memory tables
  3. Batch processing: Update memories in batches rather than individually
  4. Connection pooling: Maintain a pool of database connections for better throughput
  5. Asynchronous execution: Process memory operations in the background when possible
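
As a simple illustration of the first item, here is a minimal embedding cache sketch; the embed_fn parameter stands in for whatever embedding client is used and is an assumption, not a specific API:

Python
import hashlib
from typing import Callable, Dict, List

class EmbeddingCache:
    """In-memory cache that avoids re-embedding identical text."""

    def __init__(self, embed_fn: Callable[[str], List[float]]):
        self.embed_fn = embed_fn            # e.g. a wrapper around your embedding API
        self._cache: Dict[str, List[float]] = {}

    def get(self, text: str) -> List[float]:
        key = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if key not in self._cache:
            self._cache[key] = self.embed_fn(text)   # only call the API on a miss
        return self._cache[key]

In production this cache would typically live in Redis or a database table rather than process memory, so it survives restarts and can be shared across workers.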

Key Benefits for Text2SQL Agents

Adding long-term memory to text2SQL agents provides several advantages:

  1. Personalized experiences: Users don't need to repeat preferences in every session
  2. Terminology consistency: Custom definitions and abbreviations persist across sessions
  3. Reduced friction: Users can build on previous queries without reestablishing context
  4. Multi-user support: Each user maintains their own memory context
  5. More natural interactions: The agent remembers important details like humans do

Conclusion

Long-term memory transforms text2SQL agents from basic query translators into collaborative partners that learn user preferences over time. By implementing this memory system inspired by the Mem0 architecture, I have created a text2SQL agent that remembers what matters.

The system extracts, stores, and retrieves important facts from conversations, allowing agents to maintain context across sessions, remember user preferences, learn custom terminology, and provide increasingly personalized experiences.

While I have used Denodo as the data platform for this demonstration, the same architecture works equally well with Snowflake, Databricks, or traditional databases. The key innovation lies in the memory system itself, which remains consistent regardless of the underlying data platform.

Try It Yourself

Full source code and examples available on GitHub

Need Help Implementing This?

I offer consulting services for Text-to-SQL systems with long-term memory capabilities. Let's discuss your requirements.

Schedule a Consultation