Generative AI Transforms Data Strategy
Key Points
- Data is the foundation of AI, and generative AI unlocks new value by effectively leveraging the massive, unstructured data that makes up most modern information.
- Large language models can autonomously dive into huge volumes of text and code, spotting patterns and connections that would be difficult for humans to see without extensive preprocessing.
- Generative AI can be applied to data‑management challenges, automatically normalizing and enriching heterogeneous legacy data across silos, turning scattered information into a cohesive, high‑quality asset.
- Customizing and fine‑tuning enterprise‑specific large language models with an organization’s own data transforms that data into a sustainable competitive advantage and intellectual property.
- IBM Watson X showcases how embedding generative AI into existing applications drives productivity gains, better business performance, and distinct market differentiation.
Source: https://www.youtube.com/watch?v=qtuzVc0N5o0
Duration: 00:11:30
Sections
- [00:00:00](https://www.youtube.com/watch?v=qtuzVc0N5o0&t=0s) Data-Driven Power of Generative AI: In this IBM Watson X briefing, leaders explain how generative AI unlocks competitive advantage by turning massive, unstructured data into actionable insights, emphasizing that AI's impact hinges on the quality and volume of data.
Full Transcript
Patterns and relationships in vast amounts of data unlock entirely new possibilities. Sometimes we learn about our past, or we discover something that helps us predict the future. For some time we've been collecting data without even knowing what might come of it, and then the volume just becomes overwhelming. That's when the relationship between data and AI gets really interesting.

Hello, and welcome to AI Academy. My name is Love Ugerwall, and I'm the worldwide sales leader for IBM Watson X. My partner today is Edward Cisper, vice president of product management for the IBM Watson X platform. It's great to reconnect with you here today in this amazing IBM research facility. I know, this place is incredible. This is an active working lab, and we're in front of a prototype system designed for AI. You and I both talk to a lot of clients, and it seems like every conversation these days is about generative AI, or gen AI. Well, every conversation starts with AI, but it usually ends with data, because the truth is there is no AI without data, and data is the only sustainable source of competitive advantage in business.
Generative AI is changing how we think about data in a couple of ways. First, gen AI models can make much more effective use of unstructured data, which is the majority of all new data. Gen AI can dive into large volumes of language data, primarily documents or software code, which is also a language, and spot patterns or make connections without much preparation or supervision. It can see things that we likely wouldn't see.
Yes. Second, we can apply gen AI to the data management problem itself, and it can help us organize, refine, and enrich data so that it's higher quality and easier to consume. Right. So, for example, if a client has different legacy applications that have been encoding data in different ways, like different formats for dates, or people's names and initials, or different columns and column headings, and that sort of thing, gen AI can help make sense of that. And the data is often scattered everywhere. Gen AI can look at those siloed systems and begin to understand what the data is and how it could be related, so you can take advantage of all of that data holistically and much more efficiently.
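The kind of normalization just described, reconciling dates and names that legacy systems encoded differently, can be sketched in a few lines. This is a hand-rolled illustration, not an IBM tool; the list of date formats and the name conventions are assumptions for the example.

```python
from datetime import datetime

# Formats the hypothetical legacy systems are assumed to use.
DATE_FORMATS = ["%Y-%m-%d", "%d/%m/%Y", "%m-%d-%Y", "%B %d, %Y"]

def normalize_date(raw: str) -> str:
    """Try each known legacy format and emit ISO 8601."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date format: {raw!r}")

def normalize_name(raw: str) -> str:
    """Reduce 'Doe, John', 'John Doe', and 'J. Doe' to 'doe, j'."""
    raw = raw.strip()
    if "," in raw:                      # surname-first encoding
        surname, given = (p.strip() for p in raw.split(",", 1))
    else:                               # given-name-first encoding
        *given_parts, surname = raw.split()
        given = " ".join(given_parts)
    initial = given[0] if given else ""
    return f"{surname.lower()}, {initial.lower()}"
```

Genuinely ambiguous values, such as 05/03/2021, still need a per-source rule, and proposing exactly that kind of mapping across silos is where a gen AI assistant could help.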
Big savings in time and energy. How effectively and efficiently you manage your data has real cost and business performance implications. It becomes intellectual property and a point of competitive advantage. So let's talk more about gen AI and data as a source of competitive advantage. High-quality data is essential to helping enterprises use gen AI to improve their business. Almost every one of our customers is at least experimenting with gen AI today, which means that everyone has access to essentially the same technology. So it's the customization or tuning of the large language models with enterprise data, and the infusion of gen AI into new and existing enterprise applications, that drives productivity gains, improved business performance, and competitive advantage. So every company has data, but there's really a spectrum of how different organizations take advantage of that data. Some might be stuck with an architectural problem, like data being inaccessible or locked in silos across on-prem and cloud environments. The data silos problem is pervasive, and you can't solve it by creating a new data silo in the cloud. So we have different approaches to solving it, such as building a virtual data layer and querying data across multiple sources, or consolidating data onto one platform like a data lakehouse, which is open and cost-effective.
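The virtual data layer idea mentioned above, one query spanning several physical stores instead of copying everything into a new silo, can be sketched with SQLite's ATTACH mechanism. The schemas and silo names here are invented for illustration; a production federation layer would use a dedicated query engine rather than SQLite.

```python
import sqlite3

# One connection acts as the "virtual layer"; each attached database
# stands in for a separate silo (say, an on-prem CRM and a cloud ERP).
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS erp")

conn.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
conn.execute("CREATE TABLE erp.orders (customer_id INTEGER, total REAL)")
conn.executemany("INSERT INTO customers VALUES (?, ?)",
                 [(1, "Acme"), (2, "Globex")])
conn.executemany("INSERT INTO erp.orders VALUES (?, ?)",
                 [(1, 120.0), (1, 80.0), (2, 40.0)])

# A single query spans both silos without consolidating the data first.
rows = conn.execute("""
    SELECT c.name, SUM(o.total)
    FROM customers c JOIN erp.orders o ON o.customer_id = c.id
    GROUP BY c.name ORDER BY c.name
""").fetchall()
print(rows)  # [('Acme', 200.0), ('Globex', 40.0)]
```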
And for other companies, what's holding them back from unlocking the full value of their data is something more subtle. It's almost a psychological barrier: they have to shift how they think about data and turn the business into data, so that they can then turn their data into a business. Data monetization is nirvana, when you can literally sell a version of your data, effectively a byproduct of your core business, as a product itself. To create those products, you need to ensure data quality, security, and governance, so that it doesn't become a business risk or regulatory exposure. So let's talk about those. First, you need just good, traditional data quality practices. Do the things you already know you should be doing. You need to catalog or organize your existing data into a business glossary. Now, AI can assist with that work, but the more you have ready, the faster you can get value from AI, and the financial incentives to do so are just going up and up. Second is having thoughtful data access policies. Like with any user, you really need to define what information you want AI to be able to access, or, said differently, which information you want to remove or redact in some way, like Social Security numbers or other personally identifiable information. And then there's a third aspect.
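The redaction policy just described, stripping identifiers such as Social Security numbers before data reaches a model, can be sketched with plain regular expressions. This is deliberately minimal: the two patterns below are illustrative only, and real enforcement would live in a governance layer covering far more identifier types.

```python
import re

# Patterns for two common identifier shapes: US SSNs and email addresses.
PII_PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
}

def redact(text: str) -> str:
    """Replace each match with a typed placeholder before the text moves on."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[REDACTED {label}]", text)
    return text

print(redact("Reach Jane at jane.doe@example.com, SSN 123-45-6789."))
# Reach Jane at [REDACTED EMAIL], SSN [REDACTED SSN].
```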
That third aspect of governance is monitoring and enforcement. So you don't just set policies and call it a day. Right, ideally you set policies centrally and enforce them locally, while actively monitoring model inputs and outputs to ensure that the policies are effective, including how the output of the model is changing, or drifting, as it's exposed to real-world interactions. I don't think we can
emphasize that enough: data and AI governance is absolutely critical. Risk and uncertainty about data and model outputs are two of the biggest barriers to the adoption of AI today. Everybody wants AI, but companies can't risk having their own data or their clients' data exposed. I think it's clear that quality, trusted data is essential to successfully implementing gen AI in business, but a lot of our clients are still struggling with moving beyond a handful of single prototypes to the next phase: customizing the models with their own data and deploying them into production across the enterprise. Okay, so let's talk about how a company can customize gen AI with their data and integrate it with their enterprise applications and workflows. There are two main ways to customize gen AI with your own data to make it work for you. The first is by tuning the model with your data, and the second is through retrieval-augmented generation, or, as it is commonly referred to by its acronym, RAG. Tuning a model involves instructing it, or partially retraining it, with good examples from your enterprise data of how it should respond to certain prompts. The model quickly learns from these examples and adapts to incorporate the language and structure of your business, so that it can become an integral part of your enterprise systems. On the other hand, RAG doesn't customize the model but rather leverages a knowledge base, again of your quality enterprise data, to improve the accuracy and even limit the responses of the model to known facts, thereby mitigating the risk of hallucinations.
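The RAG pattern just described can be sketched end to end. This toy version ranks documents by word overlap instead of vector similarity, and it stops at the assembled prompt rather than calling a real model; the document texts and function names are invented for the example. The point is the shape of the pipeline: retrieve grounding passages from the enterprise knowledge base, then constrain the model to them.

```python
import re

def tokens(text: str) -> set[str]:
    """Lowercase word tokens; a crude stand-in for an embedding."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(query: str, documents: list[str], k: int = 2) -> list[str]:
    """Rank documents by word overlap with the query, in place of
    vector similarity search over an embedded knowledge base."""
    q = tokens(query)
    ranked = sorted(documents, key=lambda d: len(q & tokens(d)), reverse=True)
    return ranked[:k]

def build_prompt(query: str, passages: list[str]) -> str:
    """Constrain the model to the retrieved passages to curb hallucination."""
    context = "\n".join(f"- {p}" for p in passages)
    return (f"Answer using ONLY these passages:\n{context}\n"
            f"Question: {query}\nAnswer:")

docs = [
    "Our refund policy allows returns within 30 days.",
    "Support is available by phone on weekdays.",
    "The warranty covers manufacturing defects for one year.",
]
prompt = build_prompt("How many days do I have to request a refund?",
                      retrieve("refund request days", docs, k=1))
# `prompt` is what would be sent to the (tuned or base) model.
```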
What about the data that's used to train the models themselves? Well, that's a great question, considering we're here at IBM Research, where we train our foundation models. As an example, our training system is built on a data lakehouse architecture, and it breaks down the walls between data, AI, and governance. First, we source, catalog, filter, and transform the data that is going to be used in the models. Next, we use it to train, test, and tune our large language models. And finally, we govern that end-to-end lifecycle, so we have the complete lineage of our datasets, data pipelines, and AI pipelines, and we're able to stand behind our models. And that's the sort of comprehensive approach that enterprises are going to want to build as a core capability.

So another way to think about a data lakehouse is like a commercial kitchen. I've used this analogy before, but if you think about a business like a restaurant, the ingredients for everything you make are the data. Now, it'd be silly to send one truck to get carrots from a farm in Connecticut, another truck to get peas from a grower in California, and a third truck to get beets from Minnesota, and then wait around while they bring it all back to the kitchen. What are you making, man? You know, I don't know, I'm not a chef, but it's proprietary. The point is that this is how many businesses approach their data architecture today, when what you really need is a well-stocked pantry where everything is already on hand, neatly organized in one place, and labeled to be quickly accessible. So ultimately, I believe the open data lakehouse architecture is the best way to achieve that well-stocked pantry. It combines the flexibility, scalability, and cost advantages of data lakes with the performance and functionality of data warehouses. Right, so it's the best of both worlds. And as an industry, we're really moving past the old monolithic architectures to these more flexible, open, and interoperable architectures that let you choose the right tool for the right job at the right price. So, for example, we can use one query engine for data ingest and transformation, another for interactive queries, and yet another for embedding documents as vectors for RAG. That sounds ideal, and this flexibility also enables optimal price performance, which is an essential enterprise consideration when deploying this technology at scale.

I'm actually a huge fan of all the experimentation and innovation we are seeing with gen AI in the market, in open source, and across the ecosystem. But in addition to the choice of tooling and cost considerations, there is another important lesson to be learned from our experience over the last decade or so of scaling the adoption of AI. A lot of good machine learning models never got deployed into production because they were built by data scientists working on the sidelines of core enterprise IT, and could not be properly assessed for risks, including regulatory compliance. Data and AI governance is very hard to do after the fact, because so much of it involves managing and tracking the end-to-end lifecycle. So in order for all of that experimentation to someday make its way efficiently into production, companies need to implement data and AI governance frameworks, or platforms, from the very beginning, and not as an afterthought. Right, and that all makes sense, Edward, but I see you sneaked in your beloved open source. So what role do you think open source plays in the evolution of this technology and the market? I think it plays a huge role. Simply put, open source technology and community-based innovation deliver superior transparency, security, and project oversight, which I think is critical at this early stage of maturity.
The transparency and security points are straightforward: with open source technology, you have more users and developers finding and fixing bugs and vulnerabilities. But the point on project oversight is more subtle. It's about having a broad and diverse community of both vendors and users of the technology driving its long-term roadmap, which ensures that it's aligned with the interests of the broader community, and not just those of a particular vendor.
The productivity gains of gen AI are so massive that how, and how rapidly, enterprises implement it using their own data could make the difference between the winners and losers in almost every industry. Being well prepared is great, but the best way to start building a new capability is to just do it, with the proper guardrails, of course. Absolutely. Well, thank you, Edward, and for everyone else, thank you for watching. Keep an eye on this space for more episodes of AI Academy, with expert perspectives and real talk about some of the most important issues in AI for business.