Serverless Technology for Big Data Analytics

Key Points

  • Traditional big‑data analytics relied on highly integrated data warehouses, which excel at efficient query processing but are less flexible.
  • Hadoop disrupted this model in the 2000s by introducing openness to diverse data formats, analytics libraries, languages, and heterogeneous hardware, gaining rapid industry adoption.
  • The rise of cloud computing combined with consumer‑driven “sharing economy” behaviors has created a new form factor for big‑data analytics: serverless, which treats compute resources as a shared team utility.
  • While many equate serverless with Function‑as‑a‑Service, it actually represents a broader cloud‑native execution model that abstracts away servers, enabling on‑demand scaling of any workload.

Full Transcript

**Source:** [https://www.youtube.com/watch?v=HRfR4dJoKDc](https://www.youtube.com/watch?v=HRfR4dJoKDc)
**Duration:** 00:07:02

## Sections

- [00:00:00](https://www.youtube.com/watch?v=HRfR4dJoKDc&t=0s) **Evolution from Data Warehouses to Serverless** - Torsten Steinbach outlines the shift from traditional data‑warehouse architectures through Hadoop’s open, flexible model to the emerging serverless approach for big‑data analytics.
- [00:03:13](https://www.youtube.com/watch?v=HRfR4dJoKDc&t=193s) **Serverless Data Storage Explained** - The speaker clarifies that serverless goes beyond function‑as‑a‑service by including cloud‑native object storage that abstracts disk provisioning, provides durable, scalable access to data, and operates on a pay‑as‑you‑go consumption model.
- [00:06:31](https://www.youtube.com/watch?v=HRfR4dJoKDc&t=391s) **Serverless Big Data Tradeoffs** - The speaker explains how serverless platforms offer performance, cost, and use‑case‑specific trade‑offs, creating a new form factor for big‑data analytics solutions.

## Full Transcript
0:00 Hello, this is Torsten Steinbach, Architect at IBM for Data and Analytics in the Cloud, and today I'm going to talk to you about serverless technology and how it is applied to big data analytics.

0:12 When we look at big data in the past few decades, we can see that there has been a traditional form factor of big data systems that has been used for many decades already, and this is the form factor of a data warehouse. So, this is a highly integrated system, highly optimized for handling big data queries, big data analytics, in a very efficient manner.

0:43 Nevertheless, around the year 2000 we had Hadoop coming up and being adopted very rapidly, gaining a lot of popularity, and it is now widely adopted in the industry. Even though there was already big data analytics, why is it that Hadoop came up? This is because it brought, in addition to this integrated system, more openness to the table: more openness in terms of the type of data that it could handle, data formats, Bring Your Own Data (BYOD) formats, the types of analytics, analytics libraries, and languages that can be supported. And also the flexibility in terms of the hardware, the deployment options that you can have. You can bring your custom hardware, or even have heterogeneous hardware. So, that's why Hadoop basically gained a lot of traction and is now widely adopted.

1:35 Today, however, we are seeing a trend that basically results in yet another form factor of doing big data analytics, and this trend is driven by one thing that is happening, which is the rise of cloud. And another thing that actually goes hand in hand a little bit with the rise of cloud is the consumption behavior of many people, of end users, to be more oriented toward a sharing economy.
2:07 So, people are more and more just using ride shares instead of renting a car, not to speak of buying a car, just to get around. Or they are just going with Airbnb to sleep a night somewhere. So, this consumer behavior is also applied now to a team. And this term "serverless" is actually exactly this: serverless is, in fact, the sharing economy for a team. And it is enabled by cloud. And it is, in fact, the most consequent usage model of cloud.

2:46 And many of you have heard the term serverless, and probably most of you will associate a thing called "Function as a Service" with serverless. Many of you may think it's synonymous, which is not exactly true, but that is what many people think of. And Function as a Service is: I have my code that I need to run, my business logic, but I don't provision dedicated systems, dedicated hardware, or even dedicated software. I'm just sending it to a service and saying, "please run it for me". Run it for me maybe that many times. So, how to scale out, it's all done ad hoc. It's basically hiding the fact that there are servers. That's why it's called serverless.

3:30 Now, as I said, this is what many people think of when they hear the term serverless, but serverless is more than just Function as a Service, especially when we now look back again at our domain here, which is data, big data and analytics. The problem with big data analytics is that we are talking about state. State has to be kept; my data has to be kept safely, durably, and reliably. I need to be able to access it anytime I want it. And that's what these systems provide. But now in the cloud we have new options. We can actually extract the storage of data itself as a cloud service on its own. And that's also what's happening on the cloud, and there is basically cloud-native storage: object storage.
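
The Function as a Service idea described above (hand over business logic and let the service decide how to scale it out) can be illustrated with a small local sketch. This is only a simulation under stated assumptions: `handler` and `invoke_many` are hypothetical names, and a thread pool stands in for the platform. No real provider's API is shown.

```python
from concurrent.futures import ThreadPoolExecutor


def handler(event: dict) -> dict:
    """Stateless business logic: count the words in one chunk of text."""
    return {"chunk_id": event["chunk_id"], "words": len(event["text"].split())}


def invoke_many(fn, events, max_parallel=4):
    """Hypothetical stand-in for the platform: runs fn once per event.

    The caller never provisions workers; the degree of parallelism is an
    internal detail, which is the sense in which the servers are hidden.
    """
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        # pool.map returns results in the same order as the input events.
        return list(pool.map(fn, events))


events = [
    {"chunk_id": 0, "text": "serverless hides the servers"},
    {"chunk_id": 1, "text": "run it for me"},
    {"chunk_id": 2, "text": "please"},
]
results = invoke_many(handler, events)
total_words = sum(r["words"] for r in results)
print(total_words)
```

Because `handler` keeps no state between calls, any durable data would have to live in a separate storage service, which is the point the speaker turns to next.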
4:19 And object storage is basically serverless storage because you do not provision disk volumes, you do not configure disk volumes, you just bring your data and the system figures out how to store it, how to distribute it, make it highly available and so on. It's highly abstracted: you just have a REST API where you upload and download your data, and you can come with kilobytes of data, going up to terabytes of data, in the same organizational unit.

4:49 And the reason why it is serverless is also that it's a "Pay As You Go" consumption model. You don't just use it as you go, you also pay as you go. Which means you're just paying for the gigabytes that you're storing at this point, right now. And if you store less, you will be paying less, in a very elastic, completely seamlessly elastic way.

5:16 Now, we are talking about big data analytics. It's not just about storage of data, but also how we can analyze this data and process this data. And that's exactly what we are now seeing as well: driven by cloud, we are seeing additional services that are made available around object storage, such as "SQL as a Service", which allows you to run SQL, basically, on the data in object storage and just be billed for this one SQL, depending on how big the SQL was in terms of how much data it had to scan, and you do not pay for a database that is provisioned and standing around. Just a single SQL and that's it.

5:56 And there are other things that basically play into this, like, for instance, messaging as a service, so Kafka as a service, where you are just paying by the number of messages being processed and then eventually stored to the object storage.
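
The three consumption models just described (paying per gigabyte stored, per amount of data a query scans, and per message processed) can be put side by side in a small sketch. All prices and names here are made-up placeholders for illustration, not any provider's real rates or API.

```python
from dataclasses import dataclass

# Hypothetical unit prices, chosen only to make the arithmetic easy to follow.
PRICE_PER_GB_MONTH = 0.02          # object storage: per GB stored per month
PRICE_PER_TB_SCANNED = 5.00        # SQL as a service: per TB scanned by queries
PRICE_PER_MILLION_MESSAGES = 0.10  # messaging as a service: per million messages


@dataclass
class MonthlyUsage:
    gb_stored: float
    tb_scanned: float
    messages: int


def monthly_bill(usage: MonthlyUsage) -> float:
    """Sum the three usage-based charges; nothing is billed for idle capacity."""
    storage = usage.gb_stored * PRICE_PER_GB_MONTH
    queries = usage.tb_scanned * PRICE_PER_TB_SCANNED
    messaging = usage.messages / 1_000_000 * PRICE_PER_MILLION_MESSAGES
    return round(storage + queries + messaging, 2)


busy = MonthlyUsage(gb_stored=500, tb_scanned=2.0, messages=30_000_000)
quiet = MonthlyUsage(gb_stored=100, tb_scanned=0.0, messages=0)
print(monthly_bill(busy))
print(monthly_bill(quiet))
```

In the quiet month only the storage term is non-zero, which mirrors the speaker's point that you do not pay for a database that is provisioned and standing around.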
6:11 So there's a series of these services basically coming up, and in combination they are providing this new form factor of a big data and analytics system that is augmenting and actually complementing the existing form factors, because even though they are more established and older, there is still a point for using them. They have their sweet spots in terms of their own performance characteristics and response time guarantees, but, on the other side, there are maybe cost effectiveness benefits here. So, depending on your business model and requirements, you may use this, or this, or the combination of those things.

6:51 So, I hope this helps to put in perspective how serverless plays into big data analytics, and how it basically generates a whole new form factor of big data and analytics systems. Thank you very much.
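
The cost side of the trade-off mentioned above can be sketched as a simple break-even calculation between a flat-fee provisioned system and a pay-per-scan serverless service. The figures and function names are hypothetical, and the sketch deliberately ignores the performance characteristics and response time guarantees that the speaker names as the other side of the decision.

```python
# Hypothetical figures, chosen only to illustrate the shape of the trade-off.
PROVISIONED_COST_PER_MONTH = 1000.0   # always-on system, flat monthly fee
SERVERLESS_COST_PER_TB_SCANNED = 5.0  # billed only for data scanned by queries


def serverless_cost(tb_scanned_per_month: float) -> float:
    """Monthly cost when paying purely by the amount of data scanned."""
    return tb_scanned_per_month * SERVERLESS_COST_PER_TB_SCANNED


def cheaper_option(tb_scanned_per_month: float) -> str:
    """Return which option costs less at the given monthly scan volume."""
    if serverless_cost(tb_scanned_per_month) < PROVISIONED_COST_PER_MONTH:
        return "serverless"
    return "provisioned"


# Scan volume at which both options cost the same.
break_even_tb = PROVISIONED_COST_PER_MONTH / SERVERLESS_COST_PER_TB_SCANNED

print(break_even_tb)
print(cheaper_option(50))
print(cheaper_option(400))
```

Under these assumed prices, light or bursty workloads fall on the serverless side of the break-even point and heavy, steady workloads on the provisioned side, which is one way to read "you may use this, or this, or the combination".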