Kubeflow Pipelines Streamline the MLOps Journey
Key Points
- Data scientists follow a repeatable workflow—data prep/EDA, feature engineering, model training/tuning, deployment, and continuous monitoring—much like a 4‑year‑old’s busy schedule before bedtime.
- Kubeflow applies MLOps principles to automate and streamline this workflow by breaking each stage into independent, reusable pipeline components (e.g., separate Jupyter notebooks for EDA, training, and tuning).
- These pipeline components are portable; once a block works it can be executed on anything from a local laptop to a large Kubernetes cluster and reused across multiple projects.
- Kubeflow leverages Kubernetes under the hood, allowing the same pipeline to scale seamlessly while handling the underlying infrastructure automatically.
- A Python SDK in Kubeflow lets data scientists define the required YAML configuration programmatically, eliminating the need to write verbose Kubernetes manifests by hand.
Sections
- Data Science Workflow Meets Toddler Routine - The speaker humorously equates a data scientist’s end‑to‑end model pipeline—preparing data, training, deploying, and monitoring—to a 4‑year‑old’s pre‑bedtime chores, then highlights Kubeflow as an MLOps tool to simplify that process.
- Kubeflow Turns YAML into Python - The speaker explains how Kubeflow replaces complex Kubernetes YAML configurations with a Python SDK, creating a streamlined, visual workflow for data scientists.
Source: https://www.youtube.com/watch?v=Dbwj-NHnHfw
Duration: 00:03:42
Section timestamps: Data Science Workflow Meets Toddler Routine (00:00:00), Kubeflow Turns YAML into Python (00:03:03)
Full Transcript
The other day, a friend was telling me about his 4-year-old
who's learning about R.
The letter R, not the programming language.
If you understood that joke, you're likely a data scientist.
Today I want to explore
some of the things that data scientists have in common with 4-year-olds,
and that is we have a lot to get done before bedtime.
As a data scientist myself, the process of getting a model out
is usually along the following steps.
You start out with data, you have some data that's prepared
or given to you by the business,
and you have to do either some EDA on it,
Exploratory Data Analysis,
or some feature engineering and prepping just to make sure it's ready.
The next step is training, and this can be the fun part
because you get to choose and research different models,
but eventually you might have to fine-tune
and make sure that it's ready for the next step, which is deployment.
At the deployment stage,
we're looking at putting this in front of customers,
or putting this in front of whatever endpoint that's going to receive it.
And this might be an API or a front end.
And finally, we always have to monitor
and we monitor just to see the efficacy of the model,
make sure that the data drift isn't too much,
or maybe the business needs might change,
which usually kick-starts this process all over again.
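The loop described above can be sketched as ordinary Python functions. Every name and the toy "model" below are illustrative stand-ins, not from any real library:

```python
from typing import Callable

# Hypothetical sketch of the workflow as plain functions; the "model"
# is just an average, standing in for real training logic.

def prepare_data(raw: list) -> list:
    # EDA / feature engineering stand-in: drop missing values
    return [x for x in raw if x is not None]

def train(features: list) -> float:
    # Toy "model": the mean of the features
    return sum(features) / len(features)

def deploy(model: float) -> Callable[[float], float]:
    # "Deployment" here means exposing the model behind a callable endpoint
    return lambda x: x - model

def monitor(endpoint: Callable[[float], float], new_data: list) -> bool:
    # Flag drift when the average residual grows; True restarts the cycle
    residuals = [abs(endpoint(x)) for x in new_data]
    return sum(residuals) / len(residuals) > 1.0

features = prepare_data([1.0, None, 2.0, 3.0])
model = train(features)                            # mean of [1.0, 2.0, 3.0] -> 2.0
endpoint = deploy(model)
needs_retraining = monitor(endpoint, [5.0, 6.0])   # drifted data -> True
```

Running the stages by hand like this, once per project, is exactly the manual process that pipeline tooling is meant to replace.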
I learned about something called Kubeflow that can simplify
and improve this whole process.
So what is Kubeflow?
Kubeflow is a tool that uses MLOps
principles in data science.
So whereas I used to do this process manually step-by-step,
a different one for each project,
Kubeflow breaks this down into something called Kubeflow Pipelines,
abbreviated "KFP".
And what is the Kubeflow pipeline?
It looks something like this.
Each step of the design process can be broken out
and simplified into individual logical blocks of operation.
So first of all, we might have EDA take place
in one block using a Jupyter notebook.
The next step might be training in its own notebook,
and then tuning in another notebook
or with a tool,
and then finally deployment.
The great thing about Kubeflow is that there is already a library
of common data science components
that you can plug in and use right away.
The great thing about each of these logical blocks is that they're portable.
Once you have a block that runs,
it can run anywhere and you can reuse it for different projects.
The "kube" part of Kubeflow
implies that it uses Kubernetes, which is correct.
Once this process is defined, you can run it on a local laptop
or scale it up for a bigger training sequence.
Another cool thing about Kubeflow is,
if you're familiar with MLOps,
you're likely to have seen the YAMLs.
A lot of YAMLs to define what each Kubernetes pod is doing.
Kubeflow simplifies this by introducing a Python SDK
that lets you define the same configuration in Python,
which you're probably more familiar with as a data scientist.
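To make the contrast concrete, here is the kind of raw Kubernetes manifest a single pipeline step might otherwise require. This is an illustrative fragment written for this example, not actual Kubeflow output:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: train-model          # one pod per pipeline step
spec:
  containers:
    - name: train
      image: python:3.11
      command: ["python", "train.py"]
      resources:
        limits:
          cpu: "2"
          memory: 4Gi
  restartPolicy: Never
```

With the SDK, an equivalent step is just a decorated Python function, and the KFP compiler generates this kind of manifest for you.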
So Kubeflow takes our sometimes disparate process of data science
and streamlines it and creates a visual process
for us to easily use and follow.
I hope this helps with your understanding of Kubeflow.
Thanks! If you want more relevant content for your 4-year-olds,
like this video and subscribe.
If you have any questions, please drop them in the comments below.