Kubeflow Pipelines Streamline the MLOps Journey

Key Points

  • Data scientists follow a repeatable workflow (data prep/EDA, feature engineering, model training/tuning, deployment, and continuous monitoring), which the speaker likens to a 4-year-old's busy schedule before bedtime.
  • Kubeflow applies MLOps principles to automate and streamline this workflow by breaking each stage into independent, reusable pipeline components (e.g., separate Jupyter notebooks for EDA, training, and tuning).
  • These pipeline components are portable; once a block works it can be executed on anything from a local laptop to a large Kubernetes cluster and reused across multiple projects.
  • Kubeflow leverages Kubernetes under the hood, allowing the same pipeline to scale seamlessly while handling the underlying infrastructure automatically.
  • A Python SDK in Kubeflow lets data scientists define the required YAML configuration programmatically, eliminating the need to write verbose Kubernetes manifests by hand.
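
The idea of independent, reusable blocks can be illustrated with a small standard-library sketch. This is not the Kubeflow Pipelines SDK: `Step`, `run_pipeline`, and the three toy blocks are hypothetical names invented for illustration, and the "model" is just a mean. In the real SDK, components and pipelines are declared with its own decorators and compiled to a YAML spec. The sketch only shows the underlying shape, assuming each block declares what it reads and what it writes.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Sequence, Tuple


@dataclass(frozen=True)
class Step:
    """One self-contained block: what it reads, what it runs, what it writes."""
    name: str
    func: Callable[..., Any]
    inputs: Tuple[str, ...]
    output: str


def run_pipeline(steps: Sequence[Step], initial: Dict[str, Any]) -> Dict[str, Any]:
    """Run steps in order, passing named artifacts from one block to the next."""
    artifacts = dict(initial)  # copy so the caller's dict is not mutated
    for step in steps:
        missing = [key for key in step.inputs if key not in artifacts]
        if missing:
            raise KeyError(f"step {step.name!r} is missing inputs: {missing}")
        artifacts[step.output] = step.func(*(artifacts[key] for key in step.inputs))
    return artifacts


# Hypothetical blocks standing in for data prep, training, and evaluation.
def prepare(raw):
    """Drop missing values."""
    return [x for x in raw if x is not None]


def train(clean):
    """Toy 'model': the mean of the cleaned data."""
    return sum(clean) / len(clean)


def evaluate(model, clean):
    """Largest absolute error of the toy model on the cleaned data."""
    return max(abs(x - model) for x in clean)


PIPELINE = (
    Step("prepare", prepare, ("raw",), "clean"),
    Step("train", train, ("clean",), "model"),
    Step("evaluate", evaluate, ("model", "clean"), "max_error"),
)

result = run_pipeline(PIPELINE, {"raw": [1.0, None, 2.0, 3.0]})
print(result["model"], result["max_error"])
```

Because each block only depends on its named inputs, a subset such as `PIPELINE[:2]` can be reused in another project without touching the evaluation step.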

Full Transcript

**Source:** [https://www.youtube.com/watch?v=Dbwj-NHnHfw](https://www.youtube.com/watch?v=Dbwj-NHnHfw)

**Duration:** 00:03:42

## Sections

- [00:00:00](https://www.youtube.com/watch?v=Dbwj-NHnHfw&t=0s) **Data Science Workflow Meets Toddler Routine** - The speaker humorously equates a data scientist's end-to-end model pipeline (preparing data, training, deploying, and monitoring) to a 4-year-old's pre-bedtime chores, then highlights Kubeflow as an MLOps tool to simplify that process.
- [00:03:03](https://www.youtube.com/watch?v=Dbwj-NHnHfw&t=183s) **Kubeflow Turns YAML into Python** - The speaker explains how Kubeflow replaces complex Kubernetes YAML configurations with a Python SDK, creating a streamlined, visual workflow for data scientists.

## Full Transcript

[0:00] The other day, a friend was telling me about his 4-year-old who's learning about R. The letter R, not the programming language. If you understood that joke, you're likely a data scientist. Today I want to explore some of the things that data scientists have in common with 4-year-olds, and that is we have a lot to get done before bedtime.

[0:17] As a data scientist myself, the process of getting a model out usually goes along the following steps. You start out with data, you have some data that's prepared or given to you by the business, and you have to do either some EDA on it, Exploratory Data Analysis, or some feature engineering and prepping just to make sure it's ready.

[0:38] The next step is training, and this can be the fun part because you get to choose different models, research different models, but eventually you might have to fine-tune and make sure that it's ready for the next step, which is deployment.

[0:53] At the deployment stage, we're looking at putting this in front of customers, or putting this in front of whatever endpoint is going to receive it. And this might be an API or a front end.

[1:04] And finally, we always have to monitor, and we monitor just to see the efficacy of the model, make sure that the data drift isn't too much, or maybe the business needs might change, which usually kick-starts this process all over again.

[1:19] I learned about something called Kubeflow that can simplify and improve this whole process.

[1:28] So what is Kubeflow? Kubeflow is a tool that uses MLOps principles in data science. So whereas I used to do this process manually step-by-step, a different one for each project, Kubeflow breaks this down into something called Kubeflow Pipelines, "KFP", abbreviated. And what is the Kubeflow pipeline? It looks something like this.

[1:54] Each step of the design process can be broken out and simplified into individual logical blocks of operation. So first of all, we might have EDA take place in one block using a Jupyter notebook. The next step might be training in its own notebook, and then tuning in a notebook, or you might use a tool, and then finally deployment.

[2:21] The great thing about Kubeflow is that there is already a library of common data science protocols that you can plug in and just get ready to play.

[2:31] The great thing about each of these logical blocks is that they're portable. Once you have a block that runs, it can run anywhere and you can reuse it for different projects.

[2:42] The "kube" part of Kubeflow indicates and implies that it uses Kubernetes, which is correct. Once this process is defined, you can run it on a local laptop or scale it up for a bigger training sequence.

[2:57] Another cool thing about Kubeflow is, if you're familiar with MLOps, you're likely to have seen the YAMLs. A lot of YAMLs to define what each Kubernetes pod is doing. Kubeflow simplifies this by, first of all, introducing a Python SDK in which you can define the same YAML using Python, which you're probably more familiar with as a data scientist.

[3:20] So Kubeflow takes our sometimes disparate process of data science and streamlines it and creates a visual process for us to easily use and follow.

[3:30] I hope this helps with your understanding of Kubeflow. Thanks. If you want more relevant content for your 4-year-olds, like this video and subscribe. If you have any questions, please drop them in the comments below.
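
The monitoring step mentioned around 1:04 (checking that data drift "isn't too much") can be made concrete with a small metric. The sketch below uses the Population Stability Index, which is a technique chosen here for illustration and not something the video or Kubeflow prescribes. The function names, the bucket edges, the sample data, and the 0.2 alert threshold (a commonly quoted rule of thumb) are all assumptions.

```python
import math
from bisect import bisect_right
from typing import List, Sequence


def bucket_fractions(values: Sequence[float], edges: Sequence[float],
                     floor: float = 1e-6) -> List[float]:
    """Share of values in each bucket defined by sorted edges.

    Empty buckets are floored to a small value so the log below stays finite.
    """
    if not values:
        raise ValueError("values must not be empty")
    counts = [0] * (len(edges) + 1)
    for value in values:
        counts[bisect_right(edges, value)] += 1
    return [max(count / len(values), floor) for count in counts]


def population_stability_index(baseline: Sequence[float], live: Sequence[float],
                               edges: Sequence[float]) -> float:
    """PSI = sum over buckets of (actual - expected) * ln(actual / expected)."""
    expected = bucket_fractions(baseline, edges)
    actual = bucket_fractions(live, edges)
    return sum((a - e) * math.log(a / e) for a, e in zip(actual, expected))


# Illustrative data only: one feature at training time vs. in production.
EDGES = [0.25, 0.5, 0.75]
BASELINE = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
LIVE = [0.6, 0.7, 0.8, 0.8, 0.9, 0.9, 0.95, 0.99]
DRIFT_THRESHOLD = 0.2  # assumed rule-of-thumb cutoff, tune per use case

psi = population_stability_index(BASELINE, LIVE, EDGES)
if psi > DRIFT_THRESHOLD:
    print("drift detected: consider restarting the pipeline")
```

A check like this could run as its own block after deployment, with a result above the threshold serving as the trigger that, in the speaker's words, "kick-starts this process all over again."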