Learning Library

← Back to Library

Semantic Layer + LLM for Scalable Queries

Key Points

  • The speaker highlights the difficulty of reliably answering complex business questions (e.g., “impact of customer satisfaction on sales”) from large, multi‑table databases.
  • The desired solution must be **scalable**, **accurate**, and **consistent**, delivering the same answer to identical or similar queries.
  • A naïve approach that pulls whole tables into a large language model fails at scale due to token limits and lack of business‑specific context.
  • To address scalability, the workflow inserts an intermediate step where the LLM generates targeted SQL queries that retrieve only the necessary data before summarization.
  • Consistency and correctness are further ensured by adding a **semantic layer** that maps database schemas to business terminology and includes custom calculations, enabling the LLM to interpret data in the organization’s context.

Full Transcript

# Semantic Layer + LLM for Scalable Queries **Source:** [https://www.youtube.com/watch?v=sfyNLcHHDOM](https://www.youtube.com/watch?v=sfyNLcHHDOM) **Duration:** 00:05:33 ## Summary - The speaker highlights the difficulty of reliably answering complex business questions (e.g., “impact of customer satisfaction on sales”) from large, multi‑table databases. - The desired solution must be **scalable**, **accurate**, and **consistent**, delivering the same answer to identical or similar queries. - A naïve approach that pulls whole tables into a large language model fails at scale due to token limits and lack of business‑specific context. - To address scalability, the workflow inserts an intermediate step where the LLM generates targeted SQL queries that retrieve only the necessary data before summarization. - Consistency and correctness are further ensured by adding a **semantic layer** that maps database schemas to business terminology and includes custom calculations, enabling the LLM to interpret data in the organization’s context. ## Sections - [00:00:00](https://www.youtube.com/watch?v=sfyNLcHHDOM&t=0s) **Semantic Layer + LLM for Consistent Data Queries** - The speaker outlines a structured, scalable method that combines a semantic layer with a large language model to reliably translate complex business questions—like the impact of customer satisfaction on sales—into accurate, repeatable database queries. - [00:03:04](https://www.youtube.com/watch?v=sfyNLcHHDOM&t=184s) **Semantic Layer with Metric Definitions** - The speaker outlines using an LLM to generate focused SQL queries and then improving accuracy by applying a semantic layer that encodes business vocabularies and calculations, complemented by metric definitions that capture KPIs and limit the tables and columns needed for each query. ## Full Transcript
0:00Imagine you are working with lots of data, any new database, 0:05hundreds of tables, several columns, and you being asked the question, 0:08what would the impact of that customer satisfaction on sales last months? 0:13It's a complex question that even though you have access to the data, it's very difficult to pull out the right data. 0:19That's come down to this to this one question and be able to answer the question every single time reliably. 0:26So in this video, I will walk you through a structured approach 0:30where we're going to combine the semantic layer with a large language model, 0:34so you can answer this kind of question consistently and reliably. 0:40Starting with the data source that you have, 0:43let's say it a database or a data warehouse, 0:46and here we have our user who is asking these questions and over there will put that answer. 0:57But before we talk about the approach, let's first hit some goals for the approach. 1:03So we have three goals. 1:04We have the approach to be scalable. 1:07We also want to have accurate and consistent and consistent. 1:12Here means everybody asking the same question, or similar question, 1:15they need to get the same answer. 1:17A very simple approach would be to pull everything from the database and answer the question 1:23and will describe that by would have the tables and columns and select, 1:34we select some of these tables to answer the question. 1:39And then we'll pass it to a query layer 1:44that will pull the data. 1:47Based on the select set of tables. 1:50Once we have the data pulled out from the database, we'll pass it to a summarized task, 2:01which will use a large language model to answer the question, 2:09and then based on that, we'll have the answer. 2:13That approach works nicely if you have a very generic and a small set of data. 2:19It will break if the data is a lot more than a few 2:23thousand of those because you're going to go beyond the token limit for that large language model. 2:29And also in the same time, even if the data is small, 2:32the model doesn't understand the business definitions, the 2:36vocabulary or any of the calculations you have is specific to your business. 2:40So let's first solve the the scalability issue, which is our first goal. 2:45We'll introduce SQL step in the middle. 2:53In this step, and instead of loading everything from the database, we will rely on 3:00the large language model using the set of tables to generate sequence. 3:06So this task will call the LLM using that set of tables. 3:11Get back how we find a SQL that will scope down 3:15just write what we need to answer the question. 3:18Then pass that through there to the 3:20query summarize it and get the answer. 3:24That's great. 3:24It will scale up to any amount of data you have, 3:27but we didn't solve the other goals yet. 3:30Now let's focus on the accuracy and the consistency 3:36for those who will introduce the semantic layer. 3:41Think of the semantic layer as an artifact you create 3:45to describe your tables columns using your business definitions and vocabulary. 3:52It will also include any calculations like how do you calculate the cost of goods, 3:56or how do you calculate the sales over whatever amount of period of time. 4:03Using that, you will be able to teach the model how to answer questions well in a way that's relevant to your business. 4:11But that's not enough. 4:12Semantic layers don't really capture KPIs, 4:15like how they track the revenue of its target or the projected values for for evidence, 4:21or when you track sales, you calculate sales over a month, a quarter or a year. 4:28So that's where we introduce the metric definition. 4:34Alongside with the benefit I just mentioned about the definition. 4:37It will also scope down the amount of columns and tables we are talking about 4:42because they will the meeting definition will only include those that are relevant to the KPI that you are interested in. 4:51So by using the semantic layer and the metric definition, 4:56we can pass specifically what's required to answer the question to the large language model. Get the relevant query. 5:05That's that now is a lot more relevant to our question and our business and get the answer. 5:10In the same time we think definitions when the above list to a centralized metric definition catalog, 5:18that catalog will make them available to all of your users in the business and hence it will add to the consistency. 5:26So now we have added the scalability, the accuracy, and the consistency by combining those techniques together.