This blog post is also available in German

Large Language Models (LLMs) struggle with transforming large datasets, particularly with aggregations that require calculation: essentially anything where you would normally reach for GROUP BY or ORDER BY in SQL. But what if we need to retrieve extensive data from a system and process it with an LLM, yet only have an API with limited functionality at our disposal? In this blog post, I present a solution I've implemented and tested.

My task was to build an assistant that finds and analyzes data in Salesforce CRM in response to natural language queries. Users can ask questions that may sound like database queries but often require a multi-stage strategy:

  • “I was on vacation. What happened with the Mustermann client in the last two weeks?”
  • “There’s a visit planned to Mustershop this afternoon, what should I absolutely keep in mind?”
  • “Which opportunities in my client portfolio should I prioritize?”

Strategy

In line with my blog post The Promise of Agents, I focused on a solution that makes no assumptions about which questions need to be answered. The LLM should determine the solution path itself using the tools I provide.

The tools access endpoints of the Salesforce API. I implemented functions for the following endpoints:

  • SOQL Query: SOQL is the universal query language in Salesforce. Objects are comparable to tables, and the SOQL syntax resembles SQL.
  • Read Object: Objects have a unique ID. This endpoint reads data for a specific object.
  • Describe Object: Retrieves metadata about the structure of Objects, including field definitions, constraints, connections to other Objects, etc.

With these capabilities, the LLM can inform itself about the data structure and formulate appropriate queries. GPT-5 demonstrates sufficient understanding of SOQL.
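As an illustration, here is a minimal sketch of how these three endpoints might be wrapped as tool functions. I use the simple-salesforce client here; the function names and wiring are assumptions for the sketch, not the actual implementation.

```python
# Minimal sketch of the three tool functions, using the simple-salesforce client.
# Names and wiring are illustrative, not the production implementation.
from simple_salesforce import Salesforce

sf = Salesforce(username="...", password="...", security_token="...")

def soql_query(soql: str) -> list[dict]:
    """SOQL Query: run a SOQL statement and return the raw records."""
    return sf.query_all(soql)["records"]

def read_object(object_name: str, record_id: str) -> dict:
    """Read Object: fetch the data of one record by its unique ID."""
    return getattr(sf, object_name).get(record_id)

def describe_object(object_name: str) -> dict:
    """Describe Object: fetch metadata such as fields, constraints, relationships."""
    return getattr(sf, object_name).describe()
```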

Obstacles

Customization

The Objects in Salesforce are extensively customized, including some Objects that don't exist in standard Salesforce at all. My solution consists of two parts:

  • The system prompt includes rules guiding the LLM on field usage and relationship structures.
  • The LLM independently retrieves metadata for Objects on-demand using the Describe-Object tool, with results being cached.

This approach works effectively. The LLM ignores irrelevant Salesforce knowledge and adheres to the specific rules.
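The caching for the second part can be as simple as memoizing the describe call. A sketch, building on the describe_object function from above:

```python
from functools import lru_cache

@lru_cache(maxsize=128)
def describe_object_cached(object_name: str) -> dict:
    """Metadata rarely changes within a dialogue, so one fetch per Object suffices."""
    return describe_object(object_name)
```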

Weak SOQL

SOQL has significant limitations that often rule out computing aggregations. Some calculated fields aren't sortable, groupable, or filterable, despite being necessary for business purposes. Joins are also not always possible in the desired way. While specialized programs could have addressed specific queries in Salesforce, that would have contradicted my strategy of building a universal system.

Solution: In-Memory Database as a Swiss Army Knife

My solution implements an in-memory SQL database that the LLM can freely use within a dialogue. I selected DuckDB because of its efficient JSON loader that imports SOQL results into a table in a single operation.

When making a SOQL query, the LLM can specify a self-chosen name for the target table in DuckDB. The results from Salesforce aren’t placed in the LLM context but are copied into the specified table. The LLM can then execute SQL queries against the in-memory database using another tool. Since the LLM knows exactly what data it retrieved from Salesforce, it can generate precise SQL without needing all the data in context.
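A minimal sketch of this mechanism: the SOQL tool runs the query, writes the records to a temporary JSON file, and lets DuckDB's JSON loader create the table, returning only a row count to the LLM. The function and parameter names are my choices for illustration.

```python
import json
import tempfile

import duckdb

con = duckdb.connect(":memory:")  # simplified: one connection stands in for the per-dialogue cache

def soql_to_table(soql: str, table_name: str) -> str:
    """Run SOQL and copy the results into a DuckDB table instead of the LLM context."""
    records = soql_query(soql)  # tool function sketched above
    for r in records:
        r.pop("attributes", None)  # drop Salesforce record metadata
    with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as f:
        json.dump(records, f)
        path = f.name
    # DuckDB's JSON loader infers the schema and imports in a single operation.
    # In a real system, table_name (chosen by the LLM) should be validated first.
    con.execute(
        f"CREATE OR REPLACE TABLE \"{table_name}\" AS SELECT * FROM read_json_auto('{path}')"
    )
    count = con.execute(f'SELECT count(*) FROM "{table_name}"').fetchone()[0]
    return f"{count} rows stored in table {table_name}"
```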

The LLM consistently uses this approach when planning complex, multi-stage analyses or when encountering fields that aren’t sortable or groupable. It even executes joins across multiple intermediate results that would be impossible with SOQL alone.
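The counterpart tool simply executes SQL against the dialogue's database. A query the LLM might generate after filling two tables could look like the following; the table and column names are hypothetical:

```python
def run_sql(sql: str) -> list[tuple]:
    """Execute SQL against the in-memory database and return the rows."""
    return con.execute(sql).fetchall()

# Hypothetical example: a join across two intermediate results,
# grouped and sorted in ways the source system would not allow.
run_sql("""
    SELECT o.Name, COUNT(*) AS open_tasks
    FROM opportunities o
    JOIN tasks t ON t.WhatId = o.Id
    GROUP BY o.Name
    ORDER BY open_tasks DESC
""")
```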

The in-memory database resides in a time-based cache[1] and is assigned to exactly one dialogue. The dialogue forms the scope for the database: a data context attached to the dialogue context that keeps the latter lean while remaining precisely accessible to the LLM.
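A sketch of that cache, assuming a simple TTL eviction; the 20-minute lifetime mirrors the footnote about OpenAI's sandbox, not my actual configuration:

```python
import time

import duckdb

TTL_SECONDS = 20 * 60  # assumed lifetime, analogous to the sandbox in footnote [1]
_databases: dict[str, tuple[duckdb.DuckDBPyConnection, float]] = {}

def db_for_dialogue(dialogue_id: str) -> duckdb.DuckDBPyConnection:
    """Return the dialogue's in-memory DB, creating a fresh one if missing or expired."""
    now = time.time()
    entry = _databases.get(dialogue_id)
    if entry is None or now - entry[1] > TTL_SECONDS:
        entry = (duckdb.connect(":memory:"), now)
        _databases[dialogue_id] = entry
    return entry[0]
```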

This approach allows us to process mass data precisely, deterministically(!), and efficiently without overwhelming the LLM context or burdening models with tasks they’re not optimized for.

The following image shows the schematic flow of a SOQL query, where results are stored in a DuckDB table and then queried using SQL.

[Figure: SOQL query with in-memory DB. The SOQL tool queries Salesforce (steps 1-2), the results are loaded into a DuckDB table (steps 3-4), and SQL query results flow back into the dialogue context (steps 5-7).]

MCP

This solution would not work well with MCP (Model Context Protocol). The in-memory DB is tightly integrated with the dialogue, which is managed by the assistant app. Since the SOQL tool populates the DB (the LLM decides at call time whether data should be copied from Salesforce), the DB would have to migrate to the MCP server, which would then have to manage it. The server would become entangled in dialogue management, and SQL results would travel over convoluted transport routes. When two components are logically this closely coupled, they should be treated as a single component.

MCP would only complicate this architecture. Efficient solutions sometimes prohibit decoupling, as is the case here.

Conclusion

An in-memory database is a versatile, universal tool for LLMs when working with data sources that have limited query capabilities and when intermediate results need multi-stage processing. It provides the LLM with extensive capabilities for developing solution strategies.

GPT-5 has exceeded my expectations. Even GPT-5-mini navigates impressively well through the complex data schema. The full GPT-5 generates remarkably sophisticated solution paths. In one example, I observed a sequence of 15 tool calls creating and using multiple tables.

It’s also notable how the model understands error messages from invalid SOQL or SQL queries and makes intelligent corrections.

While deep reasoning does require longer response times, domain experts are likely to accept this when they see the thorough solution path and quality results.

Outlook

The next logical evolution after interactive assistants with tools would be agents that react asynchronously to events and continuously perform tasks for domain experts in the background (headless). A CRM system offers numerous opportunities for agents to proactively identify important information and provide insights. However, we should progress methodically. It will take time to understand how domain experts use these assistants and to identify additional capabilities that could be beneficial.

The major innovation remains agentic AI systems whose tasks aren't predetermined in detail. As builders of such systems, we must master equipping LLMs with appropriate tools and guiding them in navigating data sources. This will enable entirely new categories of systems.

  1. OpenAI implements something similar with its code interpreter sandbox. You have 20 minutes to use it within a dialogue, after which it's discarded.