Data Scientist vs AI Engineer

10m • ai-ml • tutorial • intermediate

Key Points

Generative AI’s rapid breakthroughs have spun off a distinct discipline, AI engineering, making the AI engineer a new contender for the title of “sexiest job of the 21st century.”
Data scientists act as “data storytellers,” using descriptive (EDA, clustering) and predictive (regression, classification) analytics to turn messy raw data into insights about past and future events.
AI engineers are “AI system builders” who leverage foundation models to create generative AI solutions that reshape business processes.
One of AI engineers’ main focuses is prescriptive use cases, such as decision optimization and recommendation engines, which determine the best course of action for an organization, alongside generative use cases such as intelligent assistants and chatbots.
The speaker, a former data scientist turned AI engineer at IBM, outlines four key areas where the roles differ (use cases, data, models, and processes), emphasizing the shift from insight generation to actionable AI‑driven system design.
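
To make the descriptive, predictive, and prescriptive categories in the points above more concrete, here is a minimal Python sketch. It is a toy under stated assumptions, not how either role works in practice: the `HISTORY` figures, the function names, and the "profit = predicted revenue minus spend" rule are all invented for illustration and do not come from the video.

```python
from statistics import mean

# Hypothetical toy data, invented for illustration: (ad_spend, revenue) per month.
HISTORY = [(1.0, 3.1), (2.0, 4.9), (3.0, 7.2), (4.0, 8.8)]


def describe(rows):
    """Descriptive: summarize what already happened."""
    return {
        "mean_spend": mean(x for x, _ in rows),
        "mean_revenue": mean(y for _, y in rows),
    }


def fit_line(rows):
    """Predictive: least-squares fit of revenue = slope * spend + intercept."""
    mean_x = mean(x for x, _ in rows)
    mean_y = mean(y for _, y in rows)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in rows) / sum(
        (x - mean_x) ** 2 for x, _ in rows
    )
    return slope, mean_y - slope * mean_x


def best_spend(candidates, budget, slope, intercept):
    """Prescriptive: among affordable spends, pick the highest predicted profit."""
    affordable = [s for s in candidates if s <= budget]
    return max(affordable, key=lambda s: (slope * s + intercept) - s)


summary = describe(HISTORY)
slope, intercept = fit_line(HISTORY)
choice = best_spend(
    [1.0, 2.0, 3.0, 4.0, 5.0], budget=4.0, slope=slope, intercept=intercept
)
```

The three functions mirror the three questions: what happened (`describe`), what is likely to happen (`fit_line`), and what should be done about it (`best_spend`). A real decision-optimization system would typically use a dedicated solver and richer constraints than a single budget cap.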

Sections

00:00:00 Data Scientist vs Generative AI Engineer - The speaker explains how the rise of generative AI has birthed a distinct AI engineering role, contrasting it with traditional data scientists across four key areas of work.

Full Transcript

**Source:** [https://www.youtube.com/watch?v=Vxw0nE1qfZc](https://www.youtube.com/watch?v=Vxw0nE1qfZc) **Duration:** 00:10:32

0:00 For many years, data science has been called the sexiest job of the 21st century, but in recent years it seems like there's a new job vying for that title: the AI engineer. So who even are these new kids on the block? Are they just data scientists in disguise?

0:12 What's up, y'all? I'm Isaac Ke, and I'm a former data scientist turned AI engineer at IBM. To answer these questions, I'm going to lay out four key areas in which the work of a data scientist differs from that of an AI engineer, specifically a generative AI engineer. But before I dive into these differences, we first have to understand more about what's happening in the industry.

0:33 So traditionally, data scientists have always used AI models to do their analysis. So what's changed? Well, with the advent of generative AI, the boundaries of what AI can do are being pushed in ways that we've never seen before. These breakthroughs have been so groundbreaking that generative AI has split off into its own distinct field, and we call that AI engineering.

0:58 Okay, so now that we understand the landscape, let's dive into the differences. The first area of difference lies in the use cases.

1:07 So at a very high level, think of a data scientist as a data storyteller: they take massive amounts of messy real-world data and use mathematical models to translate this data into insights. On the other hand, think of an AI engineer as an AI system builder: they use foundation models to build generative AI systems that help to transform business processes.

1:31 So since data scientists are fantastic storytellers, they use a lot of descriptive analytics to describe the past. One example of this is what's called exploratory data analysis, or EDA, which is all about graphing the data and doing statistical inference. They can also do this through what's called
clustering, which groups similar data points based on similar characteristics, such as, say, doing customer segmentation.

1:59 Now, every good story has the reader trying to figure out what's going to come next, and that's where predictive use cases come in. As opposed to a book, however, a data scientist does not have the end already written, so they have to use what are called machine learning models to make their predictions. One example is regression models, which predict a numeric value, such as, say, a temperature or revenue. Another type is classification models, which predict a categorical value, such as a success or a failure.

2:33 So, putting on the AI engineering hat now: one of the main kinds of use cases that AI engineers work on is prescriptive use cases, which are all about choosing the best course of action. An example of this is a technique called decision optimization, which enables businesses to assess a set of possible actions and then choose the optimal path based on a set of requirements or standards. Another example of a prescriptive use case is creating what are called recommendation engines. As an example, this can involve suggesting targeted marketing campaigns for a select customer base.

3:16 In addition to prescriptive use cases, there are also generative use cases, hence the name generative AI. Foundation models, which I will touch on more in a bit, enable the creation of what are called intelligent assistants, for example a coding assistant or a digital adviser. They also enable the creation of chatbots, which enable conversational search through information retrieval and the summarization of various content.

3:46 So after we have a use case identified, we need data. Now, people say that data is the new
oil because, like oil, you have to search for and find the right data, and then use the right processes to transform it into various products, which then power various processes.

4:04 For a data scientist, the oil of choice is often structured data, also known as tabular data. Do note that data scientists still work with unstructured data, but not as much as AI engineers. These tables are often on the order of hundreds to hundreds of thousands of observations, and they require a lot of cleaning and preprocessing before the data can be modeled. Some of the cleaning involves, for example, removing outliers, joining and filtering on a new table, or even creating new features altogether. This clean data is then used to train various machine learning models.

4:48 Now, on the other hand, for an AI engineer the oil of choice is mainly unstructured data, such as text, images, videos, audio files, etc. Let's take a text-based foundation model called an LLM, or large language model, as an example. These models require anywhere from billions to trillions of tokens of text to be trained on, which is a much larger scale compared to traditional machine learning models.

5:18 This leads me to the next area of difference, which is the underlying models.

5:24 So the data science toolbox consists of hundreds of different models and algorithms to choose from. Due to the nature of these models, each different use case requires gathering a different data set and thus training a different model. As a result, the scope of these individual models is a lot more narrow, meaning that it's harder for them to generalize past the domain of data that they've been trained on. Generally speaking, these models are a lot smaller in size in terms of the number of parameters, they take less
compute power to train and do inference, and they require less time to train: anywhere from seconds to hours.

6:13 Now, on the other hand, the generative AI toolbox is a lot less cluttered, and it really only contains one type of model: the foundation model. Foundation models are revolutionary because they allow one single type of model to generalize to a wide range of tasks without having to be retrained; thus their scope is a lot wider. Due to the sophistication of these models, they are a lot larger in size, often billions of parameters. They require a lot more compute power to train (we're talking hundreds to thousands of GPUs), and they require a lot more training time: anywhere from weeks to months.

7:01 Due to the differences in the intrinsic nature of traditional machine learning models and foundation models, the underlying processes and techniques used to develop solutions with them also differ.

7:15 So a typical data science process will look something like this. You start off with a use case, and from that use case you pick the right data. After that data is prepared, you use it to train and validate a model, using techniques such as feature engineering, cross-validation, or hyperparameter tuning. This model is then deployed at some endpoint, for example in the cloud, to do real-time prediction and inference.

7:47 Now, on the other hand, the generative AI process also starts off with a use case, but then we can skip directly to working with a pre-trained model. What makes this possible is a phenomenon called AI democratization, which is a big fancy term that simply means making AI more widely accessible to everyday users. Some of the best foundation models out there are
published to open-source communities such as Hugging Face, and since these models are so generalizable and so powerful out of the box, they make it easy for developers to get started.

8:20 AI engineers interact with these foundation models via natural language instructions, prompting them to do various tasks, and this process is known as prompt engineering.

8:34 Prompt engineering can be used in conjunction with different frameworks to then build larger AI systems. Examples include chaining different prompts together; doing what's called parameter-efficient fine-tuning, or PEFT, on domain-specific data; doing retrieval-augmented generation, also known as RAG, to ground answers in truth; or even creating autonomous agents to reason through very complex multi-step problems. These are just a few examples of the building blocks that can be used to build larger AI applications.

9:14 The last step is to then embed the AI in a larger system or workflow. This can take the form of creating assistants or virtual agents, building a larger application with a UI, or even doing some sort of automation.

9:31 So, okay, let's take a step back and look at all the differences at a very high level. As we can see, the breakthroughs in generative AI underpin many of the differences in the use cases, data, models, and processes that data scientists and AI engineers work on. It's important to note that there is still overlap between the two fields: for example, data scientists will still work on prescriptive use cases, and an AI engineer will still work with structured data.

10:00 Regardless of these differences, both of these fields are continuing to evolve at a blazing-fast pace, with new research papers, new models, and new tools coming out every single day. With data, AI, and a
creative mind, really anything is possible with these.

10:14 Thank you for tuning in. I hope this was helpful. Until next time, peace.

10:20 If you like this video and want to see more like it, please like and subscribe. If you have any questions or want to share your thoughts about this topic, please leave a comment below.
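
The generative AI process described in the transcript (prompting a pre-trained model, chaining prompts, and grounding answers with retrieval-augmented generation) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the speaker's implementation: the `DOCUMENTS` list, every function name, and the word-overlap retrieval are invented here, and `placeholder_llm` is a hypothetical stand-in for a real foundation model call.

```python
import re
from typing import Callable, List

# Hypothetical mini knowledge base, invented for illustration.
DOCUMENTS = [
    "Refunds are processed within 5 business days.",
    "Support is available Monday through Friday.",
    "Premium plans include priority support.",
]


def tokenize(text: str) -> set:
    """Lowercase the text and split it into a set of words."""
    return set(re.findall(r"\w+", text.lower()))


def retrieve(question: str, documents: List[str], k: int = 1) -> List[str]:
    """Return the k documents sharing the most words with the question.

    Word overlap is a stand-in for the embedding search a real system might use.
    """
    q = tokenize(question)
    return sorted(documents, key=lambda d: len(q & tokenize(d)), reverse=True)[:k]


def build_prompt(question: str, context: List[str]) -> str:
    """Prompt engineering: wrap the question in natural-language instructions."""
    joined = "\n".join(context)
    return (
        "Answer using only the context below.\n"
        f"Context:\n{joined}\n"
        f"Question: {question}"
    )


def answer(question: str, llm: Callable[[str], str]) -> str:
    """Chain two prompts: a grounded draft, then a rewrite of that draft."""
    draft = llm(build_prompt(question, retrieve(question, DOCUMENTS)))
    return llm(f"Rewrite this answer in one friendly sentence: {draft}")


def placeholder_llm(prompt: str) -> str:
    """Hypothetical stand-in for a pre-trained model; echoes the prompt's last line."""
    return prompt.splitlines()[-1]
```

Passing the model in as a plain callable keeps the sketch independent of any particular provider: swapping `placeholder_llm` for a function that calls a hosted or locally loaded foundation model would leave `retrieve`, `build_prompt`, and `answer` unchanged. No model is trained anywhere in this flow, which is the contrast with the traditional data science process that the transcript draws.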