
From Cloud AI to Distributed AI

Key Points

  • Niru Desai explains that **distributed AI** enables scaling of data and AI workloads across hybrid environments—public cloud, on‑premises, and edge—while providing unified lifecycle management.
  • He traces the evolution from **cloud‑centric AI** (centralized training and inference with data streamed from plants to a core cloud) to **edge‑focused AI**, where more processing happens locally to reduce latency, bandwidth use, and sensitivity concerns.
  • The move toward **distributed AI** addresses key business challenges such as intermittent connectivity, large‑volume data transfers, and the need for real‑time decision making at remote sites.
  • IBM’s new distributed‑AI capabilities are available for free via the **IBM API Hub**, allowing developers to experiment with the tools and platforms that support this paradigm.

Full Transcript

Source: [https://www.youtube.com/watch?v=jevuDDjFEsM](https://www.youtube.com/watch?v=jevuDDjFEsM)
Duration: 00:15:57

Sections

  • [00:00:00](https://www.youtube.com/watch?v=jevuDDjFEsM&t=0s) Introducing Distributed AI Paradigm: Niru Desai outlines IBM's concept of distributed AI, tracing its evolution from cloud-based AI through edge AI, the business challenges it solves, and how developers can experiment with related APIs on IBM's API Hub.

[0:00] Hi, I'm Niru Desai from IBM. I'm here to talk to you about distributed AI. Distributed AI is a paradigm of computing that allows you to scale your data and AI applications across distributed cloud environments. Distributed cloud environments, as you may be familiar, allow you to have single-pane-of-glass application lifecycle management across public cloud, on-premises, and edge environments.

[0:25] Now, as we look at the emergence of distributed AI, I want to take you through the journey of how we arrived there. We started with cloud-based AI, we move to edge AI, and then we talk about distributed AI. I'm also going to introduce the challenges that distributed AI helps you address in your business. Finally, all the capabilities that we are creating for enabling distributed AI are available for you to try freely at IBM API Hub; see the link in the description.

[1:01] Without further ado, let me take you through the journey of where we've been. First, we're going to talk about cloud-based AI. Let's take a concrete example. That concrete example is going to involve a plant; it could actually be any location where you have your business operations and where you're making some local decisions. On the other side of the picture you have some kind of core location: it could be your enterprise data center, it could be a public cloud. I'm just going to take an example here, so let's say it's a public cloud. In this public cloud you have some kind of Kubernetes service with your data and AI middleware, and on top of it you're deploying one or more applications. When these applications are data- and AI-based, you may actually be doing some kind of training for your AI pipelines, and you may be doing inferencing as well.

[2:15] What happens on the business process side, at your plant, is that as the process takes place it generates a tremendous amount of data, and all that data gets pushed to the core location, where the decisions are made through AI pipeline inference. Those decisions are then communicated back to your plant, where they drive your downstream automation. Clearly, because you're sending all the data over to the core, and it could be a large amount of data, and it could be sensitive data, you can run into challenges such as intermittent connectivity with your core location. This approach has run into challenges, and that is why we are seeing the emergence of what I'm going to call edge AI.

[3:08] So what happens in edge AI? You still have your plant; I'm going to draw a slightly bigger box here because more is going to happen at the plant. And you still have your core location. Unlike before, where most of the decision making was actually happening in the core, the decision making is going to happen at the plant. You're going to take advantage of distributed cloud environments, of distributed cloud platform capabilities, to manage the application lifecycle from the core to all your plants. So what happens then is that you actually have a container platform with data and AI middleware deployed on it, and the application deployed right in your plant. Your core still does what it did before, except that it now takes care of deploying the application and managing its lifecycle. Let's complete the picture here: in the core you have data and AI middleware, and you also have the application deployed on it. Unlike before, you're going to train your applications in the core and deploy them through the distributed cloud platform in a single pane of glass; that's important. And these applications are going to make their inferences at the plant. So as your business process, your business operations, take place at the plant, your application is generating decisions, those decisions are driving your business process, and the process is feeding data back to the computing stack that is in the plant. What we have done here is localize decision making: we no longer have to continuously send data up to a core location and wait for it to make a decision that can then automate our business process. Of course, we still need to send some data over, and we have to use that data to train, or retrain, these AI pipelines and redeploy them.

[5:32] So we made some progress when we switched from cloud-based AI to edge AI. But when we try to deploy this pattern across a large number of locations and across a large variety of applications, we run into certain challenges, and so we need to address those challenges with the capabilities we are describing as distributed AI. The pattern of distributed AI is very similar to edge AI, but I'm going to replicate the picture and move away from the terminology of edge and cloud and core to talk about what matters the most: where is the data, and where does it need to be analyzed? It is possible that you have a vast amount of data sitting in a public cloud but you want to consume the AI capabilities from another cloud. In this case, the first cloud, where your data is, is what we call a spoke. On the other hand, the cloud where you have the AI capabilities, the application, and the analytics is what we call the hub, and this is where your control plane is.

[6:57] See how this allows us to talk about hubs and spokes, where hub and spoke carry no connotation of cloud or edge, or of whether this is a mobile vehicle or a stationary data center. It doesn't really matter. What matters is that your data is here and your control plane is there: you want to manage the deployment of applications from the hub to the spokes, and you also want to control the data and AI lifecycle from the hub. For the sake of completeness, let me complete this picture, which does not look very different from what we have seen before, so that I can take you through the challenges you're going to run into when you try to scale such a stack to a large number of spokes and a large number of applications. So let's say we have this application, and you have your business process here: the decisions are going down, the data is coming back, you are of course deploying through the hybrid distributed cloud environments, and you're pushing some data over, just to complete the picture.

[8:27] So what happens when you have a potentially large number of these spokes on which you're trying to enable this AI application? The first thing that comes to mind is that because you're still collecting data for training, you're pushing large training sets; these models consume large amounts of data. With a large number of applications and a large number of locations doing that, you're going to run into a challenge we call data gravity. It causes two main problems: you're putting tremendous pressure on the resources in the hub to manage all that data, and you're incurring costs in having to analyze that data and train on it, not to mention the network bandwidth limitations that may come in your way, especially as you try to do this for a large number of applications. So data gravity is a key challenge.

[9:35] The next challenge I want to introduce is the fact that each of these spokes may be slightly different. You're probably manufacturing a slightly different product mix at each of your plants, or each of your retail stores is serving slightly different demographics. Because of that, one model or one pipeline that you've trained in your hub is not going to fit all your spokes. So there is always going to be a challenge in dealing with that heterogeneity without resorting to manual work; we'll get to how we address that in a second.

[10:09] The third challenge is just the sheer scale. We've talked about scale, but it actually has two aspects. One is the number of spokes you have to deal with, and the computational complexity of training so many models and deploying so many applications in so many locations. The second is the variety in applications and data. Remember, in all the cases where we've looked at data, the data could come in many different types: images, sounds, sensor information, lidar, network information, and time-series information. There is a very wide variety of data modalities that the different applications you are trying to deploy and manage would need to consume. That variety in applications and data makes it even harder for you to scale and accelerate deployments.

[11:19] The last challenge I want to introduce is resource constraints. Although the spokes and the hubs may have some resources, it is also quite common for some of the spokes, such as plants and retail stores, to have a finite, small amount of resources. So you have a resource budget that must be respected as you deploy your data and AI pipelines to them, and that causes new challenges. Resource constraint is a key challenge as well.

[11:55] Now, we are very excited that at IBM we have addressed these challenges head on to enable distributed AI that scales across distributed cloud environments, across locations and applications. So how do we address data gravity? Well, the key in addressing data gravity is to collect not all the data but only the important data. Intelligent data collection is a key capability that we are going to bring to you, and you can actually try it out through API Hub, as I mentioned earlier. A lot of the data at the spokes is repetitive, and some of it is noisy, so you don't necessarily want to collect all of it; you only want to collect what's important. And identifying what's important, especially when you have a large number of locations and a vast variety of data modalities and applications, is a challenging problem to solve.

[12:52] The second part, heterogeneity, basically means that when you are deploying your AI pipelines or applications across different spokes, you want to target them: you want to adapt them to those spokes. And after deployment you want to monitor them to make sure they are performing well. Adaptation and monitoring at each of the spoke locations are critical in addressing the heterogeneity challenge.

[13:22] Then, in terms of scale, it simply means you need a greater amount of automation in controlling your data and AI lifecycles. Automation of the data lifecycle is basically about policy-based decision making: what data should stay where, when it should be purged, when and where it should be replicated, and what data-localization policies apply, so that you can respect those constraints as you take care of the data lifecycle. Automating that lets you address the large number of locations we've been talking about. Similarly, the AI lifecycle can also be automated: training the models, deploying them, monitoring them, retraining them if the data or the environment drifts, collecting the right kind of samples through intelligent data collection, and then using those samples for retraining the models. Automating that whole lifecycle is critical as well, because you may end up with hundreds if not thousands of different AI models and pipelines automating various aspects of your business as you start scaling this.

[14:36] Lastly, on resource constraints, what is essential is that we have some ability to optimize the data and AI pipelines. This brings to bear techniques like feature extraction, model compression, and pruning to make sure your resource budget is respected at all times during pipeline execution.

[15:06] In summary, we have introduced a new paradigm called distributed AI, and we've introduced some capabilities that actually bring it alive. Distributed AI will allow you to scale applications to a large number of locations, a large number of spokes, and across a wide variety of applications. Thank you.

[15:32] Thank you for watching this video. If you are interested in more content like this, please hit like and subscribe to this channel. Also, please check out the links in the description, which will get you started on the distributed AI APIs on IBM API Hub.
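The "data gravity" challenge the talk describes comes down to simple multiplication: hub ingest grows with the number of spokes and applications. A back-of-envelope sketch, where every number is an illustrative assumption rather than a figure from the talk:

```python
# Back-of-envelope sketch of "data gravity": the volume a hub must absorb
# grows with the number of spokes and applications pushing data to it.
# All numbers below are illustrative assumptions, not figures from the talk.

def daily_hub_ingest_gb(n_spokes: int, apps_per_spoke: int,
                        gb_per_app_per_day: float) -> float:
    """Total data pushed from all spokes to the hub per day, in GB."""
    return n_spokes * apps_per_spoke * gb_per_app_per_day

# 200 plants, 5 data/AI applications each, 10 GB/day per application:
total_gb = daily_hub_ingest_gb(200, 5, 10.0)   # 10000.0 GB/day, i.e. 10 TB
```

Even modest per-application volumes add up quickly at this scale, which is why the talk argues for filtering at the spoke rather than shipping everything to the hub.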
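The "intelligent data collection" capability mentioned above (collect only the important data, since spoke data is repetitive and noisy) can be sketched as a simple novelty filter. This is a minimal illustration of the idea, not IBM's actual API: the class name, the distance metric, and the threshold are all assumptions.

```python
# Minimal sketch of intelligent data collection at a spoke: keep a sample
# for upload to the hub only if it differs enough from recently kept ones.
import math
from collections import deque

class NoveltyFilter:
    """Keeps only samples whose feature vector is far from recent keepers."""

    def __init__(self, threshold: float, window: int = 100):
        self.threshold = threshold          # minimum distance to count as "novel"
        self.kept = deque(maxlen=window)    # sliding window of kept samples

    def _distance(self, a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    def offer(self, features) -> bool:
        """Return True (and remember the sample) if it is worth uploading."""
        if all(self._distance(features, k) >= self.threshold for k in self.kept):
            self.kept.append(features)
            return True
        return False  # repetitive / near-duplicate: drop at the spoke

# Usage: a stream of mostly repetitive sensor readings
readings = [(1.0, 1.0), (1.01, 0.99), (1.0, 1.02), (5.0, 5.0), (5.02, 4.98)]
f = NoveltyFilter(threshold=0.5)
uploaded = [r for r in readings if f.offer(r)]
# Only the first reading of each cluster survives; the near-duplicates are dropped.
```

A production system would weigh importance per application and modality (images, sound, time series), but the shape of the decision, drop at the spoke versus ship to the hub, is the same.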
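The per-spoke monitoring and retraining the talk describes hinges on detecting drift in the data or environment. One minimal way to sketch that is a running mean-shift check; real drift detectors are far more sophisticated, and the names and tolerance here are made up for illustration:

```python
# Minimal sketch of per-spoke model monitoring: compare the running mean of
# a feature at the spoke against the mean seen at training time, and flag
# the model for retraining when the shift exceeds a tolerance.

class DriftMonitor:
    def __init__(self, training_mean: float, tolerance: float):
        self.training_mean = training_mean  # statistic captured at training time
        self.tolerance = tolerance          # allowed deviation before retraining
        self.count = 0
        self.running_mean = 0.0

    def observe(self, value: float) -> bool:
        """Feed one observation; return True if retraining should trigger."""
        self.count += 1
        # Incremental mean update (Welford-style, mean only)
        self.running_mean += (value - self.running_mean) / self.count
        return abs(self.running_mean - self.training_mean) > self.tolerance

# Usage: the spoke's data slowly shifts away from the training distribution
monitor = DriftMonitor(training_mean=10.0, tolerance=2.0)
drifted = [monitor.observe(v) for v in (10.2, 9.8, 14.0, 15.5, 16.0)]
# The flag flips once the running mean has moved beyond the tolerance.
```

In the lifecycle the talk outlines, a triggered flag would feed back into intelligent data collection (gather the right samples) and automated retraining and redeployment from the hub.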
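The policy-based data lifecycle automation described above (what data stays where, when it is purged or replicated, which localization rules apply) can be sketched as a small rule evaluator. The attributes, rules, and action names are illustrative assumptions, not an actual policy engine:

```python
# Minimal sketch of policy-based data lifecycle decisions: given a data
# item's attributes, decide whether it stays at the spoke, is replicated
# to the hub, or is purged. The policies are made-up examples.
from dataclasses import dataclass

@dataclass
class DataItem:
    age_days: int
    sensitive: bool            # subject to data-localization rules
    needed_for_training: bool  # flagged by intelligent data collection

def lifecycle_action(item: DataItem) -> str:
    if item.age_days > 90 and not item.needed_for_training:
        return "purge"                 # retention policy expired
    if item.sensitive:
        return "keep-at-spoke"         # localization: never leaves the site
    if item.needed_for_training:
        return "replicate-to-hub"      # feeds (re)training in the hub
    return "keep-at-spoke"             # default: leave data where it is

# Usage: three items, three different fates
stale = lifecycle_action(DataItem(age_days=120, sensitive=False, needed_for_training=False))
local = lifecycle_action(DataItem(age_days=10, sensitive=True, needed_for_training=True))
train = lifecycle_action(DataItem(age_days=10, sensitive=False, needed_for_training=True))
```

Encoding these decisions as declarative rules is what lets the same lifecycle run unattended across hundreds of spokes, which is the point the talk makes about scale.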
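Finally, the resource-constraint discussion names model compression and pruning as ways to respect a spoke's resource budget. A minimal sketch of one such technique, magnitude pruning (zero out the smallest weights so the model fits a budget); real pipeline optimizers combine this with feature extraction and other methods:

```python
# Minimal sketch of magnitude pruning for resource-constrained spokes:
# keep only the largest-magnitude weights and zero out the rest, shrinking
# the model's effective size to fit a spoke's resource budget.

def prune_weights(weights: list[float], keep_fraction: float) -> list[float]:
    """Keep roughly the top keep_fraction of weights by magnitude."""
    n_keep = max(1, int(len(weights) * keep_fraction))
    # Magnitude of the n_keep-th largest weight becomes the cutoff
    cutoff = sorted((abs(w) for w in weights), reverse=True)[n_keep - 1]
    return [w if abs(w) >= cutoff else 0.0 for w in weights]

# Usage: keep only half of a small weight vector
w = [0.9, -0.05, 0.4, 0.01, -0.7, 0.002]
pruned = prune_weights(w, keep_fraction=0.5)
# The three smallest-magnitude weights are zeroed; the zeros can then be
# stored and computed sparsely at the spoke.
```

The design trade-off is the one the talk implies: a smaller, cheaper model at the spoke in exchange for some accuracy, with monitoring (and retraining from the hub) catching cases where the pruned model degrades too far.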