Leveraging Open Source in Watson X

Key Points

  • IBM is extending its long‑standing open‑source heritage to Watson X, using community‑driven tools to deliver the best AI models and innovation.
  • Watson X’s model‑training and validation layer is built on the open‑source CodeFlare project, which abstracts scaling, queuing and deployment by integrating Ray, Kubernetes (OpenShift) and PyTorch.
  • CodeFlare automatically provisions clusters, queues jobs, scales resources up or down when needed, and tears down the environment after training, freeing data scientists from infrastructure concerns.
  • The platform represents and runs models with PyTorch, leveraging its tensor operations, GPU support and distributed‑training capabilities for large foundation models.
  • Complementary open‑source components also handle model tuning/inferencing and data gathering/analytics, completing an end‑to‑end AI lifecycle in Watson X.
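The PyTorch capabilities called out above (tensor operations, automatic gradient calculation, training loops) can be glimpsed in a few lines. This is a generic, minimal sketch of how autograd drives one training step, not Watson X code:

```python
import torch

# A tensor of "weights" that autograd will track.
w = torch.tensor([0.5, -0.2, 0.1], requires_grad=True)
x = torch.tensor([1.0, 2.0, 3.0])   # input features
target = torch.tensor(2.0)          # desired output

# Forward pass: a weighted sum, then a squared-error loss.
pred = (w * x).sum()
loss = (pred - target) ** 2

# Backward pass: PyTorch computes d(loss)/dw automatically.
loss.backward()

# Gradient of (w.x - t)^2 w.r.t. w is 2*(w.x - t)*x.
expected = 2 * ((w * x).sum() - target).detach() * x
assert torch.allclose(w.grad, expected)

# One manual gradient-descent step (what a training loop automates).
with torch.no_grad():
    w -= 0.01 * w.grad
```

A real training loop repeats this forward/backward/update cycle over batches of data; PyTorch's built-in optimizers and data loaders automate the bookkeeping.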

Full Transcript

# Leveraging Open Source in Watson X

**Source:** [https://www.youtube.com/watch?v=Cgiqx0pJuLo](https://www.youtube.com/watch?v=Cgiqx0pJuLo)
**Duration:** 00:07:31

## Sections

- [00:00:00](https://www.youtube.com/watch?v=Cgiqx0pJuLo&t=0s) **IBM Watson X Open‑Source Stack** – The segment explains how IBM's Watson X platform uses open‑source tools, especially the CodeFlare project (integrating Ray, Kubernetes/OpenShift, and PyTorch), to streamline model training, tuning/inferencing, and data analytics for large foundation models.

## Full Transcript
[0:00] IBM has a rich history of both contributing to open source and leveraging open source in its offerings, and IBM continues that tradition with Watson X. What is Watson X? It's our new enterprise platform for AI and data. And why do we leverage open source in Watson X? Open source gives us the best AI, the best innovation, and the best models. So today we're going to look at the open source that's in Watson X from three different aspects: model training and validation, model tuning and inferencing, and data gathering and analytics.

[0:46] Let's get started with model training and validation. Training and validating models can take a large amount of cluster resources, especially when the models we're looking at are those huge multi-billion-parameter foundation models that everyone's talking about.

[1:04] To efficiently use a cluster, and to make it easier for data scientists, we have an open source project called CodeFlare. CodeFlare provides user-friendly abstractions for scaling, queuing, and deploying machine learning workloads. It integrates Ray, KubeRay, and PyTorch to provide these features. With Ray it provides a job abstraction; KubeRay allows Ray to run on Kubernetes platforms like OpenShift; and we'll talk a little bit more about PyTorch in a minute.

[1:34] Let's look at a typical CodeFlare use case. The first thing it allows us to do is spin up a Ray cluster. It then allows the data scientist to submit training jobs to the cluster. If the OpenShift cluster is heavily used and there aren't resources available, CodeFlare is able to queue the jobs and wait until resources are available to run them.
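CodeFlare drives this lifecycle against a live OpenShift cluster through its SDK, so it can't be reproduced here; as a rough, dependency-free sketch of the submit-queue-scale-teardown behavior described above (all class and method names are illustrative, not CodeFlare APIs):

```python
from collections import deque

class ClusterSim:
    """Toy model of the lifecycle CodeFlare automates:
    provision slots, queue jobs, scale up, tear down."""

    def __init__(self, slots: int, max_slots: int):
        self.slots = slots          # currently provisioned worker slots
        self.max_slots = max_slots  # scale-up ceiling
        self.running: list[str] = []
        self.queue: deque[str] = deque()

    def submit(self, job: str) -> str:
        if len(self.running) < self.slots:
            self.running.append(job)
            return "running"
        if self.slots < self.max_slots:   # cluster full: scale up if allowed
            self.slots += 1
            self.running.append(job)
            return "running (scaled up)"
        self.queue.append(job)            # otherwise wait for resources
        return "queued"

    def finish(self, job: str) -> None:
        self.running.remove(job)
        if self.queue:                    # a slot freed up: start a waiting job
            self.running.append(self.queue.popleft())

    def teardown(self) -> None:
        self.running.clear()              # training done: release everything
        self.queue.clear()

cluster = ClusterSim(slots=1, max_slots=2)
print(cluster.submit("train-a"))  # running
print(cluster.submit("train-b"))  # running (scaled up)
print(cluster.submit("train-c"))  # queued
cluster.finish("train-a")         # train-c leaves the queue and starts
cluster.teardown()
```

The point of CodeFlare is that the data scientist writes none of this: provisioning, queuing, scaling, and teardown all happen behind the job-submission abstraction.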
[1:58] And in some cases, if the cluster is full, it's possible to scale up the cluster from CodeFlare. Then, when all the training and validation is done, CodeFlare can delete the Ray jobs and take them off the cluster. Again, what's nice about CodeFlare is that it enables the data scientist to efficiently use a cluster, or in some cases multiple OpenShift clusters, without worrying about the infrastructure underneath.

[2:33] We just looked at how we run model training and validation on a cluster, but now let's look at how we actually represent those models. The open source project we use to represent the models is PyTorch. PyTorch provides some key features for representing models, one of which is tensor support. What's a tensor? It's a huge multi-dimensional array that holds all those weighted values, or probabilities, in the model that we tweak over time to get the model to predict things correctly.

[3:12] The other key features that PyTorch provides are GPU support and distributed training. When we train the models we're doing large amounts of computation, and the GPUs that PyTorch is able to use effectively allow us to do that very efficiently. PyTorch also provides distributed training, so for those large foundation models that wouldn't fit on a single machine, PyTorch enables us to train across a large number of machines.

[3:42] Let's look at the key features PyTorch provides. One is neural network creation: there are different types of neural networks, and PyTorch makes it easy to create all the popular ones. PyTorch also provides easy loading of data. Another key feature of PyTorch is training loops: built-in, easy-to-use training loops that adjust the model's weights to improve its ability to provide accurate inferencing.

[4:11] Finally, PyTorch provides built-in model adjustments. The key one here is automatic gradient calculation. Think back to your calculus days, when you were calculating gradients: having that feature built in makes the minor tweaks that improve the model over time into a better predictor. This is what PyTorch provides.

[4:36] We just looked at how to represent models; now let's look at model tuning and inferencing. What do we mean by this? We want to be able to serve a large number of AI models, and to do it at scale on OpenShift.

[4:54] The first key open source project here is KServe ModelMesh. This is what we use to actually serve up the models. Originally there was just KServe, which put one model in a single pod, one pod per model. That's not very efficient at all. KServe was then merged with another open source project called ModelMesh, which is much better at efficiently packing thousands of models into a single pod. Between these two technologies we're able to serve up thousands of models efficiently on an OpenShift cluster.

[5:36] Now, where are we going to find all these models? Hugging Face has over 200,000 open source models. It's typically referred to as the GitHub of models, and IBM has a partnership with Hugging Face. It's a great place to find great models to use with our IBM Watson X offerings.

[5:56] The other key open source technologies we have are Caikit and Kubeflow. Caikit is an open source project that provides APIs for prompt tuning. Typically on the inferencing side you're serving up the models, but in some cases you also need to do a little bit of tuning to improve the results, and Caikit provides tuning APIs to do that. Kubeflow provides orchestration of machine learning workloads, letting you build the machine learning pipelines you need to make life easy. So again, we have a wonderfully large number of open source projects that provide our prompt tuning and inferencing, all running on OpenShift.

[6:43] Now let's switch gears and look at data gathering and analytics, and the open source project we use for that is Presto. What is Presto? Presto is a SQL query engine used for open data analytics and for the open data lakehouse. Its key features: it's high performance, it's highly scalable, it provides federated queries, and it's able to query the data where it lives.

[7:15] I hope I've convinced you that Watson X has continued IBM's long tradition of contributing to open source and leveraging open source in its offerings. If you'd like to learn more, please check out the links below.
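Presto's federated queries join data across different source systems in one SQL statement, querying the data where it lives. The standard-library sketch below uses SQLite's ATTACH to give a small, local flavor of the same idea; the table and database names are made up for illustration, and Presto itself federates across far more heterogeneous sources:

```python
import os
import sqlite3
import tempfile

# Two separate databases stand in for two data sources
# (e.g. a warehouse and an object store that Presto would federate over).
sales = sqlite3.connect(":memory:")
sales.execute("CREATE TABLE orders (id INTEGER, customer_id INTEGER, amount REAL)")
sales.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                  [(1, 10, 99.0), (2, 11, 25.0)])

# Persist the second source to a file so it can be ATTACHed.
path = os.path.join(tempfile.mkdtemp(), "crm.db")
crm = sqlite3.connect(path)
crm.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
crm.executemany("INSERT INTO customers VALUES (?, ?)", [(10, "Ada"), (11, "Grace")])
crm.commit()
crm.close()

# "Federated" query: one SQL statement spanning both sources.
sales.execute("ATTACH DATABASE ? AS crm", (path,))
rows = sales.execute("""
    SELECT c.name, o.amount
    FROM orders AS o
    JOIN crm.customers AS c ON c.id = o.customer_id
    ORDER BY o.amount DESC
""").fetchall()
print(rows)  # [('Ada', 99.0), ('Grace', 25.0)]
```

In Presto the two sides of the join could live in entirely different systems (a relational database and a data lake, say), with no copy of the data required, which is what "query the data where it lives" means in practice.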