Learning Library

Generative AI Transforms Data Strategy

Key Points

  • Data is the foundation of AI, and generative AI unlocks new value by effectively leveraging the massive, unstructured data that makes up most modern information.
  • Large language models can autonomously dive into huge volumes of text and code, spotting patterns and connections that would be difficult for humans to see without extensive preprocessing.
  • Generative AI can be applied to data‑management challenges, automatically normalizing and enriching heterogeneous legacy data across silos, turning scattered information into a cohesive, high‑quality asset.
  • Customizing and fine‑tuning enterprise‑specific large language models with an organization’s own data transforms that data into a sustainable competitive advantage and intellectual property.
  • IBM watsonx showcases how embedding generative AI into existing applications drives productivity gains, better business performance, and distinct market differentiation.

Full Transcript

# Generative AI Transforms Data Strategy

**Source:** [https://www.youtube.com/watch?v=qtuzVc0N5o0](https://www.youtube.com/watch?v=qtuzVc0N5o0)
**Duration:** 00:11:30

## Sections

- [00:00:00](https://www.youtube.com/watch?v=qtuzVc0N5o0&t=0s) **Data-Driven Power of Generative AI**: In this IBM watsonx briefing, leaders explain how generative AI unlocks competitive advantage by turning massive, unstructured data into actionable insights, emphasizing that AI’s impact hinges on the quality and volume of data.

## Full Transcript
Patterns and relationships in vast amounts of data unlock entirely new possibilities. Sometimes we learn about our past, or we discover something that helps us predict the future. For some time we've been collecting data without even knowing what might come of it, and then the volume just becomes overwhelming. And that's when the relationship between data and AI gets really interesting.

Hello and welcome to AI Academy. My name is Luv Aggarwal, and I'm the worldwide sales leader for IBM watsonx. My partner today is Edward Calvesbert, vice president of product management for the IBM watsonx platform. It's great to reconnect with you here today in this amazing IBM research facility. I know, this place is incredible. This is an active working lab, and we're in front of a prototype system designed for AI.

You and I both talk to a lot of clients, and it seems like every conversation these days is about generative AI, or gen AI. Well, every conversation starts with AI, but it usually ends with data, because the truth is there is no AI without data, and data is the only sustainable source of competitive advantage in business.

Generative AI is changing how we think about data in a couple of ways. First, gen AI models can make much more effective use of unstructured data, which is the majority of all new data. Gen AI can dive into large volumes of language data, primarily documents or software code, which is also a language, and spot patterns or make connections without much preparation or supervision. It can see things that we likely wouldn't see. Yes. Second, we can apply gen AI to the data management problem itself, and it can help us organize, refine, and enrich data so that it's higher quality and easier to consume.

Right. So for example, if a client has different legacy applications that have been encoding data in
different ways, like different formats, or dates, or people's names and initials, or different columns and column headings, and that sort of thing, gen AI can help make sense of that. And the data is often scattered everywhere. Gen AI can look at those siloed systems and begin to understand what the data is and how it could be related, so you can take advantage of all of that data holistically and much more efficiently. Big savings in time and energy. How effectively and efficiently you manage your data has real cost and business performance implications. It becomes intellectual property and a point of competitive advantage.

So let's talk more about gen AI and data as a source of competitive advantage. High-quality data is essential to helping enterprises use gen AI to improve their business. Almost every one of our customers is at least experimenting with gen AI today, which means that everyone has access to essentially the same technology. So it's the customization or tuning of the large language models with enterprise data, and the infusion of gen AI into new and existing enterprise applications, that drives productivity gains, improved business performance, and competitive advantage.

So every company has data, but there's really a spectrum of how different organizations take advantage of that data. Some might be stuck with an architectural problem, like data being inaccessible or locked in silos across on-prem and cloud environments. The data silos problem is pervasive, and you can't solve it by creating a new data silo in the cloud. So we have different approaches to solving it, such as building a virtual data layer and querying data across multiple sources, or consolidating data onto one platform like a data lakehouse, which is open and cost-effective. And for other companies, what's holding them back from unlocking the
full value of their data is something more subtle. It's almost a psychological barrier: they have to shift how they think about data and turn the business into data, so that they can then turn their data into a business. Data monetization is nirvana, when you can literally sell a version of your data, effectively a byproduct of your core business, as a product itself. To create those products, you need to ensure data quality, security, and governance so that it doesn't become a business risk or regulatory exposure.

So let's talk about those. First, you need just good, traditional data quality practices. Do the things you already know you should be doing: you need to catalog or organize your existing data into a business glossary. Now, AI can assist with that work, but the more you have ready, the faster you can get value from AI, and the financial incentives to do so are just going up and up. Second is having thoughtful data access policies. Like with any user, you really need to define what information you want AI to be able to access, or, said differently, which information you want to remove or redact in some way, like Social Security numbers or other personally identifiable information. And the third aspect regarding governance is monitoring and enforcement. So you don't just set policies and call it a day. Right. Ideally you set policies centrally and enforce them locally, while actively monitoring model inputs and outputs to ensure that the policies are effective, including how the output of the model is changing or drifting as it's exposed to real-world interactions.

I don't think we can emphasize that enough. Data and AI governance is absolutely critical. Risk and uncertainty about data and model outputs are two of the biggest barriers to the adoption of AI today. Everybody wants AI, but companies can't risk
having their own data or their clients' data exposed. I think it's clear that quality, trusted data is essential to successfully implementing gen AI in business, but a lot of our clients are still struggling with moving beyond a handful of single prototypes to the next phase of customizing the models with their own data and deploying them into production across the enterprise.

Okay, so let's talk about how a company can customize gen AI with their data and integrate it with their enterprise applications and workflows. There are two main ways to customize gen AI with your own data to make it work for you. The first is by tuning the model with your data, and the second is through retrieval-augmented generation, commonly referred to by its acronym, RAG. Tuning a model involves instructing it, or partially retraining it, with good examples from your enterprise data of how it should respond to certain prompts. The model quickly learns from these examples and adapts to incorporate the language and structure of your business, so that it can become an integral part of your enterprise systems. On the other hand, RAG doesn't customize the model, but rather leverages a knowledge base, again of your quality enterprise data, to improve the accuracy and even limit the responses of the model to known facts, thereby mitigating the risk of hallucinations.

What about the data that's used to train the models themselves? Well, that's a great question, considering we're here at IBM Research, where we train our foundation models. As an example, our training system is built on a data lakehouse architecture, and it breaks down the walls between data, AI, and governance. First, we source, catalog, filter, and transform the data that is going to be used in the models. Next, we use it to train, test, and tune our large language models,
and finally, we govern that end-to-end lifecycle, so we have the complete lineage of our data sets, data pipelines, and AI pipelines, and we're able to stand behind our models. And that's the sort of comprehensive approach that enterprises are going to want to build as a core capability.

So another way to think about a data lakehouse is like a commercial kitchen, and I've used this analogy before. If you think about a business like a restaurant, the ingredients for everything you make are the data. Now, it'd be silly to send one truck to get carrots from a farm in Connecticut, another truck to get peas from a grower in California, and a third truck to get beets from Minnesota, and then wait around while they bring it all back to the kitchen. What are you making, man? You know, I don't know, I'm not a chef, but it's proprietary. The point is that this is how many businesses approach their data architecture today, when what you really need is a well-stocked pantry, where everything is already on hand, neatly organized in one place, and labeled to be quickly accessible.

So ultimately, I believe the open data lakehouse architecture is the best way to achieve that well-stocked pantry. It combines the flexibility, scalability, and cost advantages of data lakes with the performance and functionality of data warehouses. Right, so it's the best of both worlds. And as an industry, we're really moving past the old monolithic architectures to these more flexible, open, and interoperable architectures that let you choose the right tool for the right job at the right price. So for example, we can use one query engine for data ingest and transformation, another for interactive queries, and yet another for embedding documents as vectors for RAG. That sounds ideal, and this flexibility also enables
optimal price performance, which is an essential enterprise consideration when deploying this technology at scale. I'm actually a huge fan of all the experimentation and innovation we are seeing with gen AI in the market, in open source, and across the ecosystem. But in addition to the choice of tooling and cost considerations, there is another important lesson to be learned from our experience over the last decade or so of scaling the adoption of AI. A lot of good machine learning models never got deployed into production because they were built by data scientists working on the sidelines of core enterprise IT and could not be properly assessed for risks, including regulatory compliance. Data and AI governance is very hard to do after the fact, because so much of it involves managing and tracking the end-to-end lifecycle. So in order for all of that experimentation to someday make its way efficiently into production, companies need to implement data and AI governance frameworks or platforms from the very beginning, and not as an afterthought.

Right, and that all makes sense, Edward, but I see you sneaked in your beloved open source. So what role do you think open source plays in the evolution of this technology and the market? I think it plays a huge role. Simply put, open source technology and community-based innovation deliver superior transparency, security, and project oversight, which I think is critical at this early stage of maturity. The transparency and security points are straightforward: with open source technology, you have more users and developers finding and fixing bugs and vulnerabilities. But the point on project oversight is more subtle, and it's about having a broad and diverse community of both vendors and users of the technology driving its long-term roadmap, which
ensures that it's aligned with the interests of the broader community and not just those of a particular vendor.

The productivity gains of gen AI are so massive that how, and how rapidly, enterprises implement it using their own data could make the difference between the winners and losers in almost every industry. Being well prepared is great, but the best way to start building a new capability is to just do it, with the proper guardrails, of course. Absolutely. Well, thank you, Edward, and for everyone else, thank you for watching. Keep an eye on this space for more episodes of AI Academy, with expert perspectives and real talk about some of the most important issues in AI for business.
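The transcript's example of legacy applications encoding dates and names differently can be made concrete. Rule-based normalization is the traditional baseline that gen AI is said to extend to messier cases; a minimal sketch, where the silo records, field names, and date formats are all hypothetical:

```python
from datetime import datetime

# Hypothetical legacy silos: the same facts, encoded differently.
SILO_A = [{"customer": "Smith, J.", "signup": "03/15/2021"}]
SILO_B = [{"cust_name": "J. Smith", "start_date": "2021-03-15"}]

DATE_FORMATS = ["%m/%d/%Y", "%Y-%m-%d", "%d-%b-%Y"]

def normalize_date(raw: str) -> str:
    """Try each known legacy format; emit ISO 8601."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(raw, fmt).strftime("%Y-%m-%d")
        except ValueError:
            continue
    raise ValueError(f"unrecognized date: {raw!r}")

def normalize_record(rec: dict) -> dict:
    """Map silo-specific column names onto one canonical schema."""
    name = rec.get("customer") or rec.get("cust_name")
    date = rec.get("signup") or rec.get("start_date")
    return {"customer": name, "signup": normalize_date(date)}

unified = [normalize_record(r) for r in SILO_A + SILO_B]
```

Rules like these handle the formats they anticipate; deciding that "Smith, J." and "J. Smith" refer to the same person is the fuzzier, cross-silo reconciliation the speakers suggest gen AI can take on.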
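The access-policy point, removing or redacting identifiers such as Social Security numbers before data reaches a model, is often implemented as a pattern-based filter. A minimal sketch with hypothetical policy patterns (production systems typically combine patterns like these with trained PII detectors):

```python
import re

# Hypothetical redaction policy: patterns an organization might strip
# from text before it is sent to a model or used for tuning.
POLICIES = {
    "SSN":   re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
}

def redact(text: str) -> str:
    """Replace each policy match with a labeled placeholder, e.g. [SSN]."""
    for label, pattern in POLICIES.items():
        text = pattern.sub(f"[{label}]", text)
    return text

print(redact("Contact jane@example.com, SSN 123-45-6789."))
# → Contact [EMAIL], SSN [SSN].
```

Keeping the placeholder labeled, rather than deleting the match outright, preserves enough context for the model while supporting the monitoring step the speakers describe: redaction counts per label are easy to log centrally.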
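The RAG approach described in the transcript, grounding the model in a knowledge base instead of retraining it, reduces to two steps: retrieve relevant documents, then constrain the prompt to them. A deliberately minimal sketch with hypothetical documents, using word overlap as a stand-in for the vector similarity a real system would compute over embeddings:

```python
# Hypothetical enterprise knowledge base.
KNOWLEDGE_BASE = [
    "Refunds are processed within 14 days of the return request.",
    "Enterprise support is available 24/7 via the customer portal.",
    "Data retention policy: logs are kept for 90 days.",
]

def retrieve(query: str, docs: list[str], k: int = 1) -> list[str]:
    """Rank documents by how many words they share with the query."""
    q = set(query.lower().split())
    scored = sorted(docs,
                    key=lambda d: len(q & set(d.lower().split())),
                    reverse=True)
    return scored[:k]

def build_prompt(query: str) -> str:
    """Ground the model: ask it to answer only from retrieved context."""
    context = "\n".join(retrieve(query, KNOWLEDGE_BASE))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

print(build_prompt("How long are logs kept?"))
```

Because the prompt limits the model to retrieved facts, wrong answers tend to surface as "not in the context" rather than as hallucinations, which is exactly the risk-mitigation property the transcript attributes to RAG.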
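The "virtual data layer" and lakehouse ideas share one property: a single engine can answer a query over data that lives in separate stores. A toy illustration of that property using SQLite's `ATTACH` with two hypothetical in-memory silos (an actual lakehouse or federation layer would use warehouse-grade engines over open table formats, not SQLite):

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Two separate in-memory databases stand in for two data silos.
con.execute("ATTACH ':memory:' AS sales")
con.execute("ATTACH ':memory:' AS crm")
con.execute("CREATE TABLE sales.orders (customer_id INT, amount REAL)")
con.execute("CREATE TABLE crm.customers (id INT, name TEXT)")
con.executemany("INSERT INTO sales.orders VALUES (?, ?)",
                [(1, 120.0), (1, 80.0), (2, 50.0)])
con.executemany("INSERT INTO crm.customers VALUES (?, ?)",
                [(1, "Acme"), (2, "Globex")])

# One query joins data that lives in separate "silos".
rows = con.execute("""
    SELECT c.name, SUM(o.amount)
    FROM crm.customers c JOIN sales.orders o ON o.customer_id = c.id
    GROUP BY c.name ORDER BY c.name
""").fetchall()
print(rows)  # [('Acme', 200.0), ('Globex', 50.0)]
```

The point of the pantry analogy survives the toy scale: once the sources are reachable through one engine, the join is a single statement instead of three extract jobs and a manual merge.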