SQL Sandwich Architecture for Cloud Analytics
Key Points
- The “SQL sandwich” architecture layers a data warehouse between two object‑storage tiers: raw data landing at the top and archived, cold data at the bottom.
- Raw logs, IoT streams, and other raw inputs land in the upper object store, an inexpensive, elastic tier where they are explored, cleansed, and batch‑processed before entering the warehouse.
- The central data warehouse holds high‑quality, curated data for interactive analytics that demand low‑latency SLAs, despite its higher operational cost.
- Older, less‑frequently accessed data is moved to the lower object‑storage tier to reduce costs while still supporting compliance and long‑term reporting needs.
- An ETL (or ELT) pipeline acts as the “sauce,” extracting data from the landing store, transforming it, and loading it into the warehouse, then offloading stale data back to archive.
Sections
- SQL Sandwich Architecture Explained (00:00:00) - Torsten Steinbach outlines a cloud analytics pattern that sandwiches a data warehouse between raw‑data object storage on top and archive object storage below, enabling cost‑effective ingestion, high‑quality processing, and long‑term retention.
- Data Archiving and Lifecycle Management (00:03:18) - The speaker explains moving cold data from costly warehouses to cheap object storage via ETL, using a sandwich analogy to illustrate how archived data still supports compliance and other analytics while being managed as part of an overall lifecycle.
- Brief Expression of Thanks (00:06:31) - The speaker offers a concise statement of gratitude.
Source: https://www.youtube.com/watch?v=Ausqifk0ZiM
Duration: 00:06:32
Full Transcript
Hello, this is Torsten Steinbach, Architect at IBM for Data and Analytics in the Cloud
and I'm going to talk to you today about the SQL sandwich.
This is actually about databases and object storage.
So, as you can see in the image, we're putting it together as a nice and tasty sandwich.
So, at the center of the sandwich we have our data warehouse.
So, this is a very well‑established mechanism for storing and analyzing data, including big data, but we're now surrounding it with a more recent technology that is especially popular in the cloud: object storage.
We have it as the buns of the sandwich, basically, at the top and at the bottom.
Now, what's the purpose of having all of these components?
What we have actually at the top is our initial place where we are landing and storing all
of our raw data.
So, this might be things like log messages from applications or IoT messages from devices that are coming in in a pretty raw format, and we're able to just store them in the object storage for very little money, in a highly elastic manner.
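The landing step could be sketched in Python along these lines, with a local directory standing in for the object-storage bucket (the helper name and payload fields are illustrative, not a real IBM Cloud API):

```python
import json
import pathlib
import tempfile

# Stand-in for an object-storage bucket: a temp directory. A real
# deployment would use an S3/COS client; this keeps the sketch local.
bucket = pathlib.Path(tempfile.mkdtemp())

def land_raw(device_id, payload):
    """Append the message exactly as received: no schema, no cleansing yet."""
    key = bucket / f"{device_id}.jsonl"
    with key.open("a") as f:
        f.write(json.dumps(payload) + "\n")

land_raw("sensor-1", {"temp": "21.5", "ts": "2021-01-01T00:00:00Z"})
land_raw("sensor-1", {"temp": "bad-reading"})  # the raw tier accepts anything
```

The point of the top bun is exactly this tolerance: anything can land, cheaply, and quality is enforced later on the way into the warehouse.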
In the data warehouse, however, we want to have high‑quality data, because the data warehouse is a much more sophisticated but also much more expensive component to operate, to host, and to purchase.
And at the bottom again, why do we have another object storage at the bottom?
Well, this is where we have our archived data.
Now, all of these components basically make sense in an end‑to‑end big data analytics use case, because we have new data arriving and being stored in the object storage, and here in the object storage we are going to explore the data.
So, we try to find out what's actually in the data, and we prep the data: we prepare it, cleanse it, make it higher quality and more curated, and we also conduct, to some extent, batch analytics directly on the data that is stored in the object storage.
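A minimal sketch of that cleansing step, assuming JSON-lines log data (field names and values are made up for illustration): malformed records are dropped, the rest are typed and curated, and a simple batch aggregate runs directly on the result.

```python
import json

# Raw lines as they might sit in the landing object store; one is malformed.
raw_lines = [
    '{"device": "sensor-1", "temp": "21.5"}',
    'not json at all',
    '{"device": "sensor-2", "temp": "19.0"}',
]

def cleanse(lines):
    """Turn raw text into typed, curated records; drop what cannot be parsed."""
    curated = []
    for line in lines:
        try:
            rec = json.loads(line)
            curated.append({"device": rec["device"], "temp": float(rec["temp"])})
        except (ValueError, KeyError):
            continue  # skip records we cannot repair
    return curated

rows = cleanse(raw_lines)
# Batch analytics directly on the cleansed object-store data:
avg_temp = sum(r["temp"] for r in rows) / len(rows)
```

Only the curated output of a step like this would be loaded into the warehouse.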
In a data warehouse, however, we are able to do interactive analytics that require certain SLAs for latency, for performance, for response times, and so on.
And why do we have this archived data? Well, this is basically because you do not want to keep all of the data around for years in a data warehouse, because, as I told you, this is the more expensive component to run.
So, it makes sense, for the data that is not hot anymore and is not required for your daily business, to archive it off again into object storage.
So, for instance you can still run things like compliance reports that you're required
to do to be compliant with certain regulations.
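One way to picture the archiving step, with an in-memory SQLite database standing in for the warehouse (table and column names are invented for the sketch): rows older than a cutoff are moved out of the expensive tier but remain available, for example for a compliance total.

```python
import sqlite3

# Warehouse stand-in: an in-memory SQL database.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE sales (id INTEGER, day TEXT, amount REAL)")
db.executemany("INSERT INTO sales VALUES (?, ?, ?)", [
    (1, "2018-03-01", 100.0),   # old -> will be archived
    (2, "2021-06-01", 250.0),   # recent -> stays hot
])

def archive_before(cutoff):
    """Move rows older than `cutoff` out of the expensive warehouse."""
    cold = db.execute("SELECT * FROM sales WHERE day < ?", (cutoff,)).fetchall()
    db.execute("DELETE FROM sales WHERE day < ?", (cutoff,))
    return cold  # in a real system these rows land in cheap object storage

archived = archive_before("2020-01-01")
# The archive can still serve e.g. a compliance report:
total_archived = sum(row[2] for row in archived)
```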
So, these are the different types of analytics and the different things that you want to do with the data at the different stages of its lifecycle.
Of course, a real sandwich should not be this dry, so you will have some sauce in there.
So, what is the sauce, basically, that glues these things together?
It is basically an ETL mechanism.
It allows you, basically, to read data from the landing store and transform it so that it can be loaded into the warehouse, and the same from the warehouse down to the archive.
So, basically, data is traveling downward through the sandwich.
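The extract-transform-load leg of that journey might look like this in Python, with a list as the landing zone and in-memory SQLite as the warehouse (all names illustrative):

```python
import sqlite3

# Landing-zone stand-in: raw, untyped records as they arrived.
landing = [
    {"device": "sensor-1", "temp": "21.5"},
    {"device": "sensor-2", "temp": "19.0"},
]

warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE readings (device TEXT, temp REAL)")

# Extract from the top tier, transform (enforce types), load the warehouse:
for raw in landing:
    warehouse.execute("INSERT INTO readings VALUES (?, ?)",
                      (raw["device"], float(raw["temp"])))
landing.clear()  # the raw copy has served its purpose

# The warehouse now answers interactive SQL queries:
hottest = warehouse.execute("SELECT MAX(temp) FROM readings").fetchone()[0]
```

The same pattern, pointed the other way, moves stale warehouse rows down into the archive tier.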
Now, finally, you see, a sandwich is often something that you do not eat in pieces, right? You do not just eat the patty or just the bun.
You eat it as a whole.
So, we consume it as a whole, and the same analogy holds here: while it makes sense to prepare a system that can serve these different types of workloads in the most efficient and cost‑effective way possible, it's of course hard if you always have to think about which tier to go to for a given query.
It should be figured out automatically by the system, and for that reason we are putting this all into one nice box, putting it all together.
And we are using federation as a mechanism on top of all of that, which basically virtualizes the location of the data depending on its age or on the state of the pipeline that it is in.
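A toy sketch of that virtualization, assuming data is tiered by date (the cutoff and row layout are invented): the caller asks for a date range and never needs to know which tier physically holds the rows.

```python
# Federation sketch: one query interface over hot and archived data.
CUTOFF = "2020-01-01"                      # rows older than this are archived

warehouse_rows = [("2021-06-01", 250.0)]   # hot tier (recent)
archive_rows = [("2018-03-01", 100.0)]     # cold tier (old)

def federated_scan(start, end):
    """Route to the right tier(s) by comparing the range to the cutoff."""
    sources = []
    if start < CUTOFF:
        sources.append(archive_rows)       # range reaches into the archive
    if end >= CUTOFF:
        sources.append(warehouse_rows)     # range reaches into the hot tier
    return [row for src in sources for row in src if start <= row[0] <= end]

# A range spanning both tiers consults both of them transparently:
rows = federated_scan("2017-01-01", "2021-12-31")
```

A real SQL federation engine does this routing inside the query planner, but the principle is the same: location is decided by the system, not the user.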
Now, finally, it is called a SQL sandwich. Why is it called a SQL sandwich?
Well, because SQL is the essential thing for all of the pieces that you can see here.
SQL is used for doing these analytics; these are SQL queries that we are running here.
SQL is also used to do the ETL.
And finally, SQL is also used in order to federate those things together. It's a SQL federation.
Okay, that's our SQL sandwich, and I hope this helps you put into perspective the roles of object storage and data warehouses, and how these two things can be put together into this nice and tasty format so that you can get the most out of the technologies in combination.
Thank you very much.