Generative AI Transforms Data Strategy
Key Points
- Data is the foundation of AI, and generative AI unlocks new value by effectively leveraging the massive, unstructured data that makes up most modern information.
- Large language models can autonomously dive into huge volumes of text and code, spotting patterns and connections that would be difficult for humans to see without extensive preprocessing.
- Generative AI can be applied to data‑management challenges, automatically normalizing and enriching heterogeneous legacy data across silos, turning scattered information into a cohesive, high‑quality asset.
- Customizing and fine‑tuning enterprise‑specific large language models with an organization’s own data transforms that data into a sustainable competitive advantage and intellectual property.
- IBM Watson X showcases how embedding generative AI into existing applications drives productivity gains, better business performance, and distinct market differentiation.
Source: https://www.youtube.com/watch?v=qtuzVc0N5o0
Duration: 00:11:30
Sections
- [00:00:00](https://www.youtube.com/watch?v=qtuzVc0N5o0&t=0s) Data-Driven Power of Generative AI: In this IBM Watson X briefing, leaders explain how generative AI unlocks competitive advantage by turning massive, unstructured data into actionable insights, emphasizing that AI's impact hinges on the quality and volume of data.
Full Transcript
Patterns and relationships in vast amounts of data unlock entirely new possibilities. Sometimes we learn about our past, or we discover something that helps us predict the future. For some time we've been collecting data without even knowing what might come of it, and then the volume just becomes overwhelming. That's when the relationship between data and AI gets really interesting.

Hello, and welcome to AI Academy. My name is Love Ugerwall, and I'm the worldwide sales leader for IBM Watson X. My partner today is Edward Cisper, vice president of product management for the IBM Watson X platform. It's great to reconnect with you here today in this amazing IBM research facility. I know, this place is incredible. This is an active working lab, and we're in front of a prototype system designed for AI. You and I both talk to a lot of clients, and it seems like every conversation these days is about generative AI, or gen AI. Well, every conversation starts with AI, but it usually ends with data, because the truth is there is no AI without data, and data is the only sustainable source of competitive advantage in business.
Generative AI is changing how we think about data in a couple of ways. First, gen AI models can make much more effective use of unstructured data, which is the majority of all new data. Gen AI can dive into large volumes of language data, primarily documents or software code, which is also a language, and spot patterns or make connections without much preparation or supervision. It can see things that we likely wouldn't see.
Yes. Second, we can apply gen AI to the data management problem itself, and it can help us organize, refine, and enrich data so that it's higher quality and easier to consume. Right. So, for example, if a client has different legacy applications that have been encoding data in different ways, like different formats for dates, or people's names and initials, or different columns and column headings, and that sort of thing, gen AI can help make sense of that. And the data is often scattered everywhere. Gen AI can look at those siloed systems and begin to understand what the data is and how it could be related, so you can take advantage of all of that data holistically and much more efficiently.
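The kind of normalization just described, reconciling dates and names that legacy systems encoded differently, can be sketched in a few lines. This is a hand-rolled illustration, not an IBM tool; the list of date formats and the name conventions are assumptions for the example.

```python
from datetime import datetime

# Formats the hypothetical legacy systems are assumed to use.
DATE_FORMATS = ["%Y-%m-%d", "%d/%m/%Y", "%m-%d-%Y", "%B %d, %Y"]

def normalize_date(raw: str) -> str:
    """Try each known legacy format and emit ISO 8601."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date format: {raw!r}")

def normalize_name(raw: str) -> str:
    """Reduce 'Doe, John', 'John Doe', and 'J. Doe' to 'doe, j'."""
    raw = raw.strip()
    if "," in raw:                      # surname-first encoding
        surname, given = (p.strip() for p in raw.split(",", 1))
    else:                               # given-name-first encoding
        *given_parts, surname = raw.split()
        given = " ".join(given_parts)
    initial = given[0] if given else ""
    return f"{surname.lower()}, {initial.lower()}"
```

Genuinely ambiguous values, such as 05/03/2021, still need a per-source rule, and proposing exactly that kind of mapping across silos is where a gen AI assistant could help.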
Big savings in time and energy. How effectively and efficiently you manage your data has real cost and business performance implications. It becomes intellectual property and a point of competitive advantage. So let's talk more about gen AI and data as a source of competitive advantage. High-quality data is essential to helping enterprises use gen AI to improve their business. Almost every one of our customers is at least experimenting with gen AI today, which means that everyone has access to essentially the same technology. So it's the customization or tuning of the large language models with enterprise data, and the infusion of gen AI into new and existing enterprise applications, that drives productivity gains, improved business performance, and competitive advantage. So every company has data, but there's really a spectrum of how different organizations take advantage of that data. Some might be stuck with an architectural problem, like data being inaccessible or locked in silos across on-prem and cloud environments. The data silos problem is pervasive, and you can't solve it by creating a new data silo in the cloud. So we have different approaches to solving it, such as building a virtual data layer and querying data across multiple sources, or consolidating data onto one platform like a data lakehouse, which is open and cost-effective.
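The virtual data layer idea mentioned above, one query spanning several physical stores instead of copying everything into a new silo, can be sketched with SQLite's ATTACH mechanism. The schemas and silo names here are invented for illustration; a production federation layer would use a dedicated query engine rather than SQLite.

```python
import sqlite3

# One connection acts as the "virtual layer"; each attached database
# stands in for a separate silo (say, an on-prem CRM and a cloud ERP).
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS erp")

conn.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
conn.execute("CREATE TABLE erp.orders (customer_id INTEGER, total REAL)")
conn.executemany("INSERT INTO customers VALUES (?, ?)",
                 [(1, "Acme"), (2, "Globex")])
conn.executemany("INSERT INTO erp.orders VALUES (?, ?)",
                 [(1, 120.0), (1, 80.0), (2, 40.0)])

# A single query spans both silos without consolidating the data first.
rows = conn.execute("""
    SELECT c.name, SUM(o.total)
    FROM customers c JOIN erp.orders o ON o.customer_id = c.id
    GROUP BY c.name ORDER BY c.name
""").fetchall()
print(rows)  # [('Acme', 200.0), ('Globex', 40.0)]
```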
And for other companies, what's holding them back from unlocking the full value of their data is something more subtle. It's almost a psychological barrier: they have to shift how they think about data and turn the business into data, so that they can then turn their data into a business. Data monetization is nirvana, when you can literally sell a version of your data, effectively a byproduct of your core business, as a product itself. To create those products, you need to ensure data quality, security, and governance, so that it doesn't become a business risk or regulatory exposure. So let's talk about those. First, you need just good, traditional data quality practices. Do the things you already know you should be doing. You need to catalog or organize your existing data into a business glossary. Now, AI can assist with that work, but the more you have ready, the faster you can get value from AI, and the financial incentives to do so are just going up and up. Second is having thoughtful data access policies. Like with any user, you really need to define what information you want AI to be able to access, or, said differently, which information you want to remove or redact in some way, like Social Security numbers or other personally identifiable information. And then there's a third aspect.
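The redaction policy just described, stripping identifiers such as Social Security numbers before data reaches a model, can be sketched with plain regular expressions. This is deliberately minimal: the two patterns below are illustrative only, and real enforcement would live in a governance layer covering far more identifier types.

```python
import re

# Patterns for two common identifier shapes: US SSNs and email addresses.
PII_PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
}

def redact(text: str) -> str:
    """Replace each match with a typed placeholder before the text moves on."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[REDACTED {label}]", text)
    return text

print(redact("Reach Jane at jane.doe@example.com, SSN 123-45-6789."))
# Reach Jane at [REDACTED EMAIL], SSN [REDACTED SSN].
```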
That third aspect of governance is monitoring and enforcement. So you don't just set policies and call it a day. Right, ideally you set policies centrally and enforce them locally, while actively monitoring model inputs and outputs to ensure that the policies are effective, including how the output of the model is changing, or drifting, as it's exposed to real-world interactions. I don't think we can
emphasize that enough: data and AI governance is absolutely critical. Risk and uncertainty about data and model outputs are two of the biggest barriers to the adoption of AI today. Everybody wants AI, but companies can't risk having their own data or their clients' data exposed. I think it's clear that quality, trusted data is essential to successfully implementing gen AI in business, but a lot of our clients are still struggling with moving beyond a handful of single prototypes to the next phase: customizing the models with their own data and deploying them into production across the enterprise. Okay, so let's talk about how a company can customize gen AI with their data and integrate it with their enterprise applications and workflows. There are two main ways to customize gen AI with your own data to make it work for you. The first is by tuning the model with your data, and the second is through retrieval-augmented generation, or, as it is commonly referred to by its acronym, RAG. Tuning a model involves instructing it, or partially retraining it, with good examples from your enterprise data of how it should respond to certain prompts. The model quickly learns from these examples and adapts to incorporate the language and structure of your business, so that it can become an integral part of your enterprise systems. On the other hand, RAG doesn't customize the model but rather leverages a knowledge base, again of your quality enterprise data, to improve the accuracy and even limit the responses of the model to known facts, thereby mitigating the risk of hallucinations.
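The RAG pattern just described can be sketched end to end. This toy version ranks documents by word overlap instead of vector similarity, and it stops at the assembled prompt rather than calling a real model; the document texts and function names are invented for the example. The point is the shape of the pipeline: retrieve grounding passages from the enterprise knowledge base, then constrain the model to them.

```python
import re

def tokens(text: str) -> set[str]:
    """Lowercase word tokens; a crude stand-in for an embedding."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(query: str, documents: list[str], k: int = 2) -> list[str]:
    """Rank documents by word overlap with the query, in place of
    vector similarity search over an embedded knowledge base."""
    q = tokens(query)
    ranked = sorted(documents, key=lambda d: len(q & tokens(d)), reverse=True)
    return ranked[:k]

def build_prompt(query: str, passages: list[str]) -> str:
    """Constrain the model to the retrieved passages to curb hallucination."""
    context = "\n".join(f"- {p}" for p in passages)
    return (f"Answer using ONLY these passages:\n{context}\n"
            f"Question: {query}\nAnswer:")

docs = [
    "Our refund policy allows returns within 30 days.",
    "Support is available by phone on weekdays.",
    "The warranty covers manufacturing defects for one year.",
]
prompt = build_prompt("How many days do I have to request a refund?",
                      retrieve("refund request days", docs, k=1))
# `prompt` is what would be sent to the (tuned or base) model.
```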
What about the data that's used to train the models themselves? Well, that's a great question, considering we're here at IBM Research, where we train our foundation models. As an example, our training system is built on a data lakehouse architecture, and it breaks down the walls between data, AI, and governance. First, we source, catalog, filter, and transform the data that is going to be used in the models. Next, we use it to train, test, and tune our large language models. And finally, we govern that end-to-end lifecycle, so we have the complete lineage of our datasets, data pipelines, and AI pipelines, and we're able to stand behind our models. And that's the sort of comprehensive approach that enterprises are going to want to build as a core capability.

So another way to think about a data lakehouse is like a commercial kitchen. I've used this analogy before, but if you think about a business like a restaurant, the ingredients for everything you make are the data. Now, it'd be silly to send one truck to get carrots from a farm in Connecticut, another truck to get peas from a grower in California, and a third truck to get beets from Minnesota, and then wait around while they bring it all back to the kitchen. What are you making, man? You know, I don't know, I'm not a chef, but it's proprietary. The point is that this is how many businesses approach their data architecture today, when what you really need is a well-stocked pantry where everything is already on hand, neatly organized in one place, and labeled to be quickly accessible. So ultimately, I believe the open data lakehouse architecture is the best way to achieve that well-stocked pantry. It combines the flexibility, scalability, and cost advantages of data lakes with the performance and functionality of data warehouses. Right, so it's the best of both worlds. And as an industry, we're really moving past the old monolithic architectures to these more flexible, open, and interoperable architectures that let you choose the right tool for the right job at the right price. So, for example, we can use one query engine for data ingest and transformation, another for interactive queries, and yet another for embedding documents as vectors for RAG. That sounds ideal, and this flexibility also enables optimal price performance, which is an essential enterprise consideration when deploying this technology at scale.

I'm actually a huge fan of all the experimentation and innovation we are seeing with gen AI in the market, in open source, and across the ecosystem. But in addition to the choice of tooling and cost considerations, there is another important lesson to be learned from our experience over the last decade or so of scaling the adoption of AI. A lot of good machine learning models never got deployed into production because they were built by data scientists working on the sidelines of core enterprise IT, and could not be properly assessed for risks, including regulatory compliance. Data and AI governance is very hard to do after the fact, because so much of it involves managing and tracking the end-to-end lifecycle. So in order for all of that experimentation to someday make its way efficiently into production, companies need to implement data and AI governance frameworks, or platforms, from the very beginning, and not as an afterthought. Right, and that all makes sense, Edward, but I see you sneaked in your beloved open source. So what role do you think open source plays in the evolution of this technology and the market? I think it plays a huge role. Simply put, open source technology and community-based innovation deliver superior transparency, security, and project oversight, which I think is critical at this early stage of maturity.
The transparency and security points are straightforward: with open source technology, you have more users and developers finding and fixing bugs and vulnerabilities. But the point on project oversight is more subtle. It's about having a broad and diverse community of both vendors and users of the technology driving its long-term roadmap, which ensures that it's aligned with the interests of the broader community, and not just those of a particular vendor.
The productivity gains of gen AI are so massive that how, and how rapidly, enterprises implement it using their own data could make the difference between the winners and losers in almost every industry. Being well prepared is great, but the best way to start building a new capability is to just do it, with the proper guardrails, of course. Absolutely. Well, thank you, Edward, and for everyone else, thank you for watching. Keep an eye on this space for more episodes of AI Academy, with expert perspectives and real talk about some of the most important issues in AI for business.