# Chunking Errors Cost Major Deals

**Source:** [https://www.youtube.com/watch?v=pMSXPgAUq_k](https://www.youtube.com/watch?v=pMSXPgAUq_k)
**Duration:** 00:21:38

## Summary

- Proper chunking of text is essential for effective retrieval‑augmented generation, as AI models rely on a few well‑chosen chunks to formulate accurate answers.
- A fintech company’s chatbot gave a wrong indemnification answer because a contract clause was split across token‑based chunks, illustrating that poor chunking, not model intelligence, caused the error.
- Incorrect chunk sizes lead to missed context, increase hallucinations, and inflate costs by forcing the system to retrieve and process unnecessary tokens.
- The primary challenge for organizations implementing RAG is not choosing the embedding model but designing a chunking strategy that preserves semantic continuity and fits the overall data pipeline.

## Sections

- [00:00:00](https://www.youtube.com/watch?v=pMSXPgAUq_k&t=0s) **Chunking Missteps Cost Fintech Deal** - A fintech’s AI chatbot misinterpreted a contract because it split the text into improper chunks, leading to a wrong indemnification answer and a near‑lost deal.
- [00:03:07](https://www.youtube.com/watch?v=pMSXPgAUq_k&t=187s) **Chunking vs Agentic Search** - The speaker emphasizes that proper chunking dramatically reduces costs and prevents hallucinations, while acknowledging that agentic search can tackle complex, multi‑source queries but does not eliminate the need for good chunking.
- [00:06:19](https://www.youtube.com/watch?v=pMSXPgAUq_k&t=379s) **The Critical Role of Chunking** - The speaker explains how proper semantic chunking is essential for effective retrieval‑augmented generation and agentic systems, outlining common pitfalls and introducing five principles for creating useful vector‑based document fragments.
- [00:10:13](https://www.youtube.com/watch?v=pMSXPgAUq_k&t=613s) **Chunking Financial Tables & Code** - The speaker outlines the intricacies of handling financial tables and source code, stressing that simple row‑by‑row chunking fails and that building dependency graphs and semantic “neighborhood” chunking is essential for effective retrieval and analysis.
- [00:13:26](https://www.youtube.com/watch?v=pMSXPgAUq_k&t=806s) **AI Demands Structured Data** - The speaker emphasizes that AI forces users to clean and hierarchically organize code, spreadsheets, and financial data—using clear chunking and semantic labeling—because agentic search alone cannot compensate for messy, poorly organized information.
- [00:18:04](https://www.youtube.com/watch?v=pMSXPgAUq_k&t=1084s) **Strategic Chunking for AI Retrieval** - The speaker stresses that treating data chunking as an afterthought harms downstream AI performance and that preserving metadata, document structure, and rearchitecting data are essential for effective retrieval.
- [00:21:25](https://www.youtube.com/watch?v=pMSXPgAUq_k&t=1285s) **Importance of Embeddings and Chunking** - The speaker stresses that while embeddings are crucial, they’re not a silver‑bullet solution, and proper chunking—often overlooked—remains essential to effective implementation.

## Full Transcript
I want to tell you the story of a
fintech company that almost lost a major
deal because they handled chunking
badly. You might think, what is
chunking? Chunking is the foundation of
so much efficient context engineering
and data work with AI. It sounds boring.
I know it sounds boring, but we are
going to go through it together. I'm
going to lay out the key principles of
chunking and embedding and I'm going to
explain why they matter. This is the
number one question I get as soon as
people understand that they need to put
their data into a position where they
make it ready for the AI. They're like,
"Okay, so I need something. Oh, it's probably RAG, a retrieval-augmented generation system. Well, now what?" And
this is where chunking appears. Now what
is chunking? And if you can't cut your text into appropriately sized chunks,
you're going to get into huge trouble.
And we're going to get into a lot of
specifics on this. Buckle in. Get
excited. Grab some coffee. Okay, so this
fintech company, their AI chatbot was
asked about indemnification for an NDA.
The contract said "Party A indemnifies Party B" in one chunk and "except as provided in section whatever" in the next chunk. It broke in the middle of the sentence because they were using fixed-size token chunking. So the AI
retrieved only the first chunk and
confidently said party A fully
indemnifies party B. That's the wrong
answer and it took a lot of billable
hours to clean up. Here's the thing.
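To make that failure mode concrete, here's a minimal sketch of fixed-size chunking splitting a clause mid-sentence. The clause text, chunk size, and section number are invented for illustration:

```python
# Illustrative only: naive fixed-size chunking by word count.
# Real pipelines split on tokens, but the failure mode is identical.
def fixed_size_chunks(text, size):
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

clause = ("Party A indemnifies Party B against all losses, "
          "except as provided in Section 7.2")
chunks = fixed_size_chunks(clause, 8)
# chunks[0] ends at "losses," and the exception lands in chunks[1],
# so retrieving only the first chunk yields the confident wrong answer.
```

The split point is arbitrary with respect to meaning, which is exactly the problem.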
That was not a model intelligence
problem. I've seen that happen over and
over again. You get an inaccurate
response and people assume this will get
fixed when ChatGPT 5 comes out. It won't
because it's a problem of context
engineering. You're not chunking your
data right and it matters. I've
consulted with a lot of companies
implementing RAG, and this is the top question. It's not which embedding model. It's how should we chunk our data. That's what matters. It's not how
do we prevent hallucinations. That is
actually also a question of chunking.
People just don't know how to ask it. So
if chunking is the foundation that
everything builds on, how do you think
about it in the context of your overall
pipeline? First, let's briefly understand how RAG, retrieval-augmented generation, works together with AI and chunking to get you answers. So when someone
asks a question of AI, you are going to
get three to five chunks back and that
is what the system is going to depend on
to formulate an answer. And those chunks
are going to be retrieved by their
semantic fit to the query. And so if the
true answer got split across multiple
chunks and part of it is missing from
that three to five chunk set like I
described, you're not going to get the
right answer. It doesn't matter how
smart the model is. Chunking also
directly impacts your costs. Bad
chunking means retrieving more chunks
than you need to get the information
you need. Pulling more chunks means
pulling more tokens, loading more into
the context window, which could
overwhelm
the system and ironically produce less
accurate responses because it has so
much meaningless context in there.
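Here's a back-of-envelope sketch of that cost effect. The chunk counts, sizes, and per-token price are made-up placeholders, not any provider's real rates:

```python
# Tokens sent to the model scale with chunks retrieved times chunk size.
def query_cost(num_chunks, tokens_per_chunk, usd_per_1k_tokens):
    return num_chunks * tokens_per_chunk * usd_per_1k_tokens / 1000

tight = query_cost(4, 500, 0.01)    # well-chunked: a few focused chunks
sloppy = query_cost(12, 900, 0.01)  # poorly chunked: over-retrieve to compensate
ratio = sloppy / tight              # over 5x more spent per query, before accuracy effects
```

Multiply that per-query gap across production traffic and the bill difference is real.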
Companies can reduce their bills to
major model makers significantly, like
double-digit percentages, by getting
chunking right. And again, I'm just
going to say this one more time.
Chunking is one of your first lines of
defense against models hallucinating.
You think models hallucinate because the
model itself is bad. What you don't realize is: what else is the model going to do when you give it bad chunks with incomplete information? The AI
fills in the gaps. That's where the
hallucinations come from. And that's
really on you for not chunking well.
Now, some of you are going to raise your
hands at this point and you're going to
say, "Nate, I've heard about this thing
called Agentic Search. It uses AI
agents, so it must be cool. Why would we
chunk at all?" Well, that's a fair question. Agentic search is a different
technology. An AI agent can iteratively
search and then read and then reason and
then search again. And it seems like it
could sidestep chunking altogether. In certain use cases, that's true. If you have a complex use case where you're reasoning across multiple types of data at the same time, agentic search can be
really, really effective. It's great for
exploratory queries. It's great for
answering complex challenges like what
is the total impact of our Q3 marketing
campaign across all channels, right?
Like that kind of a query where you're
going to have to go and look at multiple
tables, you're going to have to reason,
you're going to have to sum, etc. Do
some math, come back with an answer.
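That iterative search-read-reason loop can be sketched as follows; `search` and `llm` here are hypothetical stand-ins for a retrieval call and a model call, not any real framework's API:

```python
# Sketch of an agentic search loop: retrieve, reason, refine, repeat.
def agentic_answer(question, search, llm, max_steps=5):
    notes = []
    query = question
    for _ in range(max_steps):
        notes.extend(search(query))           # retrieve more evidence
        kind, payload = llm(question, notes)  # reason over everything so far
        if kind == "answer":
            return payload                    # confident: stop and answer
        query = payload                       # not yet: refine the search query
    return "no confident answer within budget"
```

Each pass burns another model call and more tokens, which is where the cost and latency gap against plain RAG retrieval comes from.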
There's a lot in that one sentence. RAG is more foundational, and chunking is more foundational still. RAG is good at solving the problem of fast, economical retrieval, and chunking is how you get that retrieval to be accurate. Chunking is like eating your vegetables.
People don't think of it as a super
amazing technology that's sexy, but that
doesn't matter. You either have accurate
retrieval and low hallucinations at an
economical price, or you pay a lot for agentic search that's going to be a lot slower. And those are both step-change differences. Agentic search can be 10 or more times slower than a good RAG retrieval. And it can be 10 or more times more expensive. Do you really want to 10x your expenses just to use agentic search and sidestep the hard conversations around embeddings and chunking? Most businesses don't when
they actually sit down and pencil out
the math. So this is where RAG wins. RAG wins when you need consistent responses, you need them fast, they need to be economical, and the questions map relatively cleanly to specific information you can retrieve. When the cost per query matters at scale, when you need predictable behavior, when queries are often semantic, meaning-related lookups: those are all cases where a good chunking strategy wins, and we will go through
some use cases. There's a lot of them
that you can dig into. On the other hand, I do recommend agentic search to some companies. It matters and
it makes a difference when you're using
multi-step reasoning with a query. It
matters when information is scattered
across a whole lot of documents. So,
retrieving the chunks would be very
difficult. It matters when you need to
follow references and you need to follow
links. It matters when the path to the
answer is really unknown. Agentic search
can be really helpful. So, the point is
agentic search is helpful, is useful.
But interestingly enough, if you've been
watching this far, you'll realize that
agentic systems also rely on good
chunking because they are also involved
in picking out semantic information.
Remember what I said about information scattered across many documents? That gets easier with chunking. Following references and
links, it would sure help if the
references and links were in the same
semantic unit of meaning as the original
context. Multi-step reasoning is easier
when you have clearly labeled chunks
that actually work as individual units
of content. All of this stuff, this
boring chunking stuff, turns out to add
value not just to cheap, efficient rag,
but also to agentic search. So, let's
get into chunking a little bit and we'll
talk about the five principles of
effective chunking. When you build a
retrieval augmented generation system,
you're not just feeding the whole
document into AI and saying, "God bless, off you go." You have to break it
into pieces into chunks that get stored
in a vector database. So your AI is
taking an open book exam, right? And
someone has to tear that book page by
page into little chunks. And if you tear
it wrong, your AI is reading half the
sentence. That's what you need to have
in your head as a picture. You're giving
the AI a book, but you're giving it in
pieces. And you have to have it retrieve
the right piece to get the answer. So
bad chunking is responsible for a huge amount of RAG failures in realistic production pipelines, and it can take
weeks or months. I've had teams spend
months working on figuring out chunking
strategies so that they get all of the
meaning in the query and they end up
iterating and iterating and iterating to get
there. I would like to make that easy
for you. I would like to make you more
of an expert on chunking than I was when
I got started. So, let me lay out the
five principles of effective chunking
that I've seen work over and over again.
Number one, context coherence. You are
doing context engineering when you
chunk. Never split meaning. Your AI can only work with what's in the chunk that it retrieves. If you split "the defendant shall pay damages" into one chunk and "unless gross negligence is proven" into another chunk, you've created a
hallucination waiting to happen. Respect
natural boundaries. For contracts, that
would be sections and subsections. If
it's code, it might be functions and
classes. I'm going to talk a little bit
more about code. Code is an interesting
case. For conversations, it's usually
speaker turns. It might be time windows.
Every data type has semantic boundaries.
Take the time to find them and use them.
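For a contract, respecting those natural boundaries can be as simple as splitting on the section headers themselves. A minimal sketch, with an invented header pattern and contract text:

```python
import re

# Split before each "Section N." header instead of at arbitrary token counts.
def split_on_sections(text):
    parts = re.split(r"(?m)^(?=Section \d+\.)", text)
    return [p.strip() for p in parts if p.strip()]

contract = """Section 1. Definitions
"Party" means a signatory to this agreement.
Section 2. Indemnification
Party A indemnifies Party B, except as provided in Section 3.
Section 3. Limitations
Liability is capped at fees paid."""

chunks = split_on_sections(contract)
# Each chunk is one complete section, so no clause is split mid-sentence.
```

Real documents need a more robust header pattern, but the principle is the same: cut where the document already cuts.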
Principle number two, there are three
levers that you can control in chunking
and know how to use them. Boundaries,
size, and overlap. Boundaries are where
you cut, maybe by sentence, by
paragraph, by section, whatever makes
semantic sense. Size is how big each
chunk gets. It's not an arbitrary token
count. It should be a complete unit of
meaning. And overlap is an insurance
policy. It is often the case that you
are 10, 15, 20% overlapped on your
chunks because you don't want to have
breaks in your chunks that give you the
risks I've described with AI
hallucinating contracts. Most people
only think about size. They'll set it at
whatever thousand tokens and call it
good. You don't want to live in that
world. Because then you're just ripping the pages out at arbitrary points in this book that we're imagining the AI reading, and the AI is going to be really confused reading the book, because it's ripped in weird places, not at the chapter breaks or the section breaks. Okay, so know your levers: boundaries, size, and overlap. Use
them all and use them in a way that
respects principle number one, context
coherence. Third principle, data type is
going to dictate your strategy. This is
where we'll get back into the code piece
a little bit. A legal contract chunks
differently than source code, which
everyone would envision is true, but so
few people think about it that way. You
can split on section markers in a legal
contract, and that's typically labeled
really cleanly. You want to be in a
place where you include the full
hierarchy of contracts in metadata so
that it's easy to read and understand.
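In practice, that can be as simple as storing the hierarchy alongside each chunk's text. The field names and document name here are illustrative, not any particular vector database's schema:

```python
# A chunk that carries its place in the contract hierarchy as metadata.
chunk = {
    "text": "Party A indemnifies Party B, except as provided in Section 7.2.",
    "metadata": {
        "document": "Master Services Agreement",  # invented document name
        "hierarchy": ["Section 7", "7.1 Indemnification"],
        "section_id": "7.1",
    },
}
# At retrieval time, the hierarchy tells the model where the clause lives,
# so cross-references like "Section 7.2" can be resolved rather than guessed.
```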
Financial tables, this can get complex.
Tables tend to have orthogonal
relationships. Rows relate to columns.
Cells reference other cells. Formulas
depend on ranges. A simple row-by-row chunk does not work. So, in a second, I'm going to explain to you an approach that can help a bit with financial tables. And then we'll get into sort of where to use that versus maybe where to use agentic search. And
then let's talk about source code too.
To me, that is the biggest elephant in
the room. Do you use real code and look
at all the dependencies and try and
build a semantically meaningful RAG
system with good chunking? How do you do
it? So in reality, if you have really
clean code, which most people don't, and
your functions are pure and
self-contained, it is possible to have
very useful semantic chunking with
source code that enables you to retrieve
bits of code and actually operate
against them. Oftentimes you need to be
in a place where you are retrieving
information across a really messy
dependency tree. So your function might
call three other functions. It
references class variables that are not
local, uses imported modules, whatever
it may be. Your code has side effects.
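To preview where this is heading: one way to chunk across a dependency tree is to parse the code and pull a function together with the local functions it calls. A toy sketch using Python's standard `ast` module, not a production indexer:

```python
import ast

source = """
def tax(amount):
    return amount * 0.2

def total(amount):
    return amount + tax(amount)
"""

tree = ast.parse(source)
funcs = {f.name: f for f in tree.body if isinstance(f, ast.FunctionDef)}

def neighborhood(name):
    """Return the source of a function plus every local function it calls."""
    seen, pieces = set(), []
    def visit(fn_name):
        if fn_name in seen or fn_name not in funcs:
            return
        seen.add(fn_name)
        pieces.append(ast.get_source_segment(source, funcs[fn_name]))
        for node in ast.walk(funcs[fn_name]):  # find calls to other local functions
            if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
                visit(node.func.id)
    visit(name)
    return "\n\n".join(pieces)

chunk = neighborhood("total")
# The chunk contains both total() and tax(), so the retrieved unit is self-contained.
```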
Everybody's code does. So the best way to
think about it is if you were going to
find use and value out of chunking code,
take the time to build dependency
graphs. Take the time to include all
called functions in your metadata.
Consider something like neighborhood
chunking where you include the function
plus everything that it's going to call
in one chunk. And if the code is really
highly coupled together, you might need
to chunk like an entire class or an
entire module of code together. The best
strategy if you really want clean,
semantically meaningful code is
sometimes to refactor it. And by the
way, AI is good at that. That's a
separate conversation, but AI can be
quite good at that. Bad code architecture leads to a huge amount of difficulty with chunking. And that, by
the way, that is why a lot of
organizations that are trying to figure
out how to get their code into AI are
employing Agentic Search. Agentic Search
enables them to not have to immediately
refactor their bad code and instead
they're going to burn tokens and have an
Agentic Search reason across a large and
messy codebase. Is it perfect? No. Is it
expensive? Yes. Is it something that may
be a way for them to go forward because
they know that chunking is going to be
hard here? Also, yes. Now, let's dive a
little bit more into Excel. I think this
deserves its own section because I see a
lot of disasters here. Excel data isn't
just rows and columns. You're basically
preserving a web of relationships. So,
the marketing dashboard might have a
time series that runs horizontally and
categories that run vertically and
formulas that reference various ranges,
etc. You can't chunk it rowby row and
expect it to work. So, here's a few ways
that people approach this. One, again,
you can go back to agentic search.
Sometimes that happens. Or if you really
want to get useful semantic meaning,
think about the natural semantic chunks.
And so you could take a particular time
window and chunk that and include all
categories like Q3 2024, chunk that. Uh
for formula heavy sheets, you may want
to trace dependencies, build a map, and
chunk calculable units together. Like if
cells A1 to A10 feed to a summary in
B15, they would all be the same chunk.
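A sketch of that idea, with invented cell values and formulas: the summary cell and the range feeding it travel as one chunk.

```python
# Group a formula cell with the cells it reads: one "calculable unit" per chunk.
values = {f"A{i}": 100 * i for i in range(1, 11)}             # A1..A10, made-up data
formulas = {"B15": ("SUM", [f"A{i}" for i in range(1, 11)])}  # B15 = SUM(A1:A10)

def calculable_unit(cell):
    op, refs = formulas[cell]
    lines = [f"{r} = {values[r]}" for r in refs]
    result = sum(values[r] for r in refs)
    lines.append(f"{cell} = {op}({refs[0]}:{refs[-1]}) = {result}")
    return "\n".join(lines)

chunk = calculable_unit("B15")
# The chunk now carries A1..A10 *and* the B15 summary they feed, together.
```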
This also takes work. Again, the AI is
norming us and pushing us toward cleaner
code and cleaner spreadsheets here. And
that is absolutely going to be a trend
in the workplace. If you're using pivot
tables, if you're using summaries, you
want to duplicate the summary in each
chunk or create a very clear hierarchy
so that detailed chunks can reference a
summary chunk. You want to be again
clear about what each thing does. And
sometimes, if you really want the semantic meaning, you will need to extract and convert the pivot table or whatever the Excel sheet is into natural language. I don't see that
happen very often. If it gets that bad,
most people go back to agentic search
because at least it works with the messy
data a little bit. Even so, if you are
dealing with a situation where you have
a lot of complex financial data, I would
recommend that you look at the semantic
borders and meaning of your financial
data because as we discussed at the top,
agentic search isn't a silver bullet.
It still needs good chunking strategies
if you have them to retrieve stuff
effectively. In a sense, one of the
things I want you to take away here is
there's no such thing as a free lunch.
There is no way to easily and
intuitively get away with not chunking
well, and agentic search is not a get-out-of-jail-free card on that one. You have
to wrestle with the challenge of your
own messy data and AI forces you to
confront it if you want the benefits of
reasoning with machine intelligence
across your data sets. Let's go to the
fourth principle. You want to size for
Goldilocks outcomes. If your chunks are
too small and your chunks lack context, your AI at best will say "I don't know" a lot. If they're too big, you're
wasting a lot of tokens. Your answers
will be really unfocused and you're
paying more. So, you want to think about
the sweet spot that reflects both the
semantic meaning in the data and the
natural language answer you want. And
so, maybe for legal clauses that ends up
being 750 tokens, somewhere between 500
and 1,000, something like that. Maybe
for technical documents, it's longer,
they're more complex. Uh, I know the
irony, right? Technical docs could be
more complex than legal docs. So maybe
it's closer to a thousand all the time.
For coupled code, you can have like
really large chunks where like entire
classes or modules are in there. They
can get into the thousands of tokens.
Uh, time series data. If you're pulling
a full period with all contexts, again,
that can be a somewhat sizable piece of context. It could run over a thousand
tokens. And the key is not "Nate said it was going to be X tokens, so that's what we use." I've been saying the entire
video, don't use an arbitrary token
boundary. Go for semantic meaning. You
want to build an evaluation set and you
want to test these evaluation questions
against various chunking strategies
until you find one that works. Evals
win. Accuracy is going to max out if you try
different chunking strategies against a
common evaluation set of questions. All
right. Fifth and final principle.
Remember overlap. Remember, overlap gets
underused so much. Overlap means the end
of chunk A appears at the beginning of
chunk B. And this matters because
important information can still span
boundaries sometimes. No matter how good
your semantic splits are, they may not
be perfect. And if you have a big enough
data set, you may not be able to
hand-check every split. And so you have
to have some overlap as insurance. One
of the catches is if you have orthogonal
data like spreadsheets, which way do you
overlap? For time series, maybe it's a
temporal overlap where you include like
a summary from the previous period. For
categorical data, is it a category
overlap? It's one of the things that
like gets sort of fraught when you get
into the details. And this is why I'm
making this video is I want to surface
and sunlight these conversations so that
everyone knows that we're all having
this discussion. It's a big question and
it's easier to have it as an AI
community so that we can actually review
it effectively. I have looked at
documentation, blogs, how you get
started on chunking from a lot of
different sources and for the most part
it's people shilling their own
solutions. I don't have a solution to
show. I'm just trying to build best
practices into the community here so
that we have an easier job building effective data sets that we can retrieve against and reason against with machine learning, with AI. So your biggest
leverage point in building any kind of
retrieval augmented generation system in
context engineering is getting the
context right. And that comes down to
chunking strategies and embeddings, which is why we're just drilling on this so
hard. So if that's you, if you've been
struggling with chunking, if you don't
know where to get started, I want you to
run a little bit of an audit on your
current strategy. Are you using flat
token and character splits? Are you
having no overlap between your chunks?
Are you using the same strategy for all
your data types? Are you not preserving
metadata? Are you splitting in ways that
ignore document structure? These are all
common issues. Is financial data chunked
with no thought for relationship
preservation? The fixes aren't simple.
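Those audit questions can even be roughly approximated in code. A sketch with illustrative heuristics (the checks are starting points, not rules):

```python
# Flag the common chunking red flags listed above.
def audit(chunks, overlap_tokens):
    sizes = [len(c.split()) for c in chunks]
    flags = []
    if overlap_tokens == 0:
        flags.append("no overlap between chunks")
    if len(chunks) > 1 and max(sizes) == min(sizes):
        flags.append("uniform chunk sizes: likely flat token/character splits")
    if any(not c.rstrip().endswith((".", "!", "?")) for c in chunks):
        flags.append("chunks ending mid-sentence: document structure ignored")
    return flags
```

A real audit would also sample chunks for human review; no heuristic replaces reading your own data.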
You are wrestling with the structure of
the data itself and how it expresses
semantic meaning. I am not here to
pretend it's an easy solve, but it is
the way forward toward truly
transformative AI retrieval. It is the difference between "well, the AI kind of makes sense when it looks at our big pile of messy data" and "wow, that is on point, that is correct, we use it all the time." If you want that latter world, you cannot treat chunking as an afterthought. It's the foundation for AI performance. If you have a large data set, bad chunking poisons everything downstream, whether that's RAG performance or prompt engineering or model upgrades or even agentic search. You need to start with
your highest value and most problematic
data type and just run into that pain,
run into the spaghetti nature of that
data and map it out and figure out how
you're going to deal with the chunking
side and fix it. Apply those five
principles and fix it. Sometimes what
you are really wrestling with is the
fact that you made data architecture decisions that are very
difficult to undo. And so it's possible
that you will need to rearchitect your
data sets for AI. And I see companies
being willing to do it because they see
the benefits of AI. They were not
willing to do it for the cloud, right?
They were not willing to do it for a SaaS company and their SaaS tool. They will do
it for AI and they will do it for AI
because they see the benefits. Getting
data architecture right matters. If you
are in the data architecture space as a
specialist, someone who designs good
data architectures, you have a sweet job
right now. People need your expertise.
If you really want to get your data into
AI in a way that generates that step
change in value that everybody on
LinkedIn likes to brag about, not
everybody on LinkedIn is actually
telling the truth, but we all know that
the value is actually there. It's just
there when you put the hard work in. And
this, by the way, this whole
conversation, the fact that I have to
make this video is why it is so
difficult to just stamp out solutions
for companies. Companies all have
different flavors of painful data. All
have different messes. Every data set is
painful in its own way. And so you need
to be able to take these principles and
figure out in your data environment what
chunking strategies make sense. And
that's why I've leaned on principles so
much because I think that in the end
that is the only thing that really
scales across really complex corporate
data sets. I looked at a bunch and these
are the things that keep standing out.
You got to maintain context coherence.
You have to be aware of boundaries,
size, and overlap. Your three levers.
Number three, you should recognize that
data type is dictating your strategy. We
talked about Excel. We talked about
code. We talked a little bit about
legal. You think about the data type. We
talked about conversations a little bit.
And then make sure that you are actually
getting your size right. So, size for Goldilocks outcomes. That's principle number four. Don't size them too big or you're going to get unfocused answers and wasted tokens. Don't size them too small or you're going to get context-starved answers that invite hallucination. And then the fifth
principle, remember overlap. Remember
overlap. Remember overlap. There you go.
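If it helps to see that fifth principle in code, here's overlap as a sliding window, using word counts instead of tokens for simplicity:

```python
# Sliding-window chunking with overlap: the tail of each chunk repeats
# at the head of the next, as insurance against imperfect boundaries.
def overlapping_chunks(words, size=100, overlap=15):
    step = size - overlap
    return [words[i:i + size] for i in range(0, max(len(words) - overlap, 1), step)]

words = [f"w{i}" for i in range(250)]
chunks = overlapping_chunks(words)
# chunks[0][-15:] == chunks[1][:15] -> boundary content appears in both chunks
```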
Those are the principles. That is why
embeddings matter. I don't want you to
walk away from this and think that there
is a handy silver bullet alternative.
Chunking matters and we don't talk about
it enough. Cheers.