# Local Document QA Using Docling and Granite

**Source:** [https://www.youtube.com/watch?v=eBJblHgDaRs](https://www.youtube.com/watch?v=eBJblHgDaRs)
**Duration:** 00:04:04

## Key Points

- The tutorial shows how to build a local document-based question-answering system using IBM's open-source Docling for format conversion and the Granite 3.1 model (run via Ollama) for large-context text processing.
- A six-step Jupyter notebook guides you through environment setup, creating helper functions for format detection and Docling conversion (to markdown), chunking the document, storing chunks in a vector store, and wiring the retrieval-augmented generation chain.
- The example works with a nearly 200-page IBM Redbook PDF, demonstrating Docling's ability to handle complex PDFs and Granite 3.1's capacity to use up to ten retrieved chunks, exploiting its extensive context window.
- The complete code is available on GitHub; processing time varies with your local hardware, and the presenter notes it takes roughly one minute on his machine.

## Sections

- [00:00:00](https://www.youtube.com/watch?v=eBJblHgDaRs&t=0s) **Untitled Section**
- [00:03:07](https://www.youtube.com/watch?v=eBJblHgDaRs&t=187s) **Docling and Granite QA Demo** - The speaker demonstrates converting a PDF to Markdown with Docling, feeding it to Granite 3.1 to create a local question-answering system, and shares the complete code on GitHub.

## Full Transcript
Welcome to this tutorial on creating a document-based question-and-answering system
that works locally on your own computer,
using IBM's open-source toolkit Docling in conjunction with the Granite 3.1 model.
Docling facilitates the parsing and conversion of various document formats,
while Granite 3.1, with its extensive context window, enables efficient processing of large textual data.
You're going to need to install Ollama on your machine as that's what we're going to use to run the Granite 3.1 model.
A few dependencies are also downloaded as part of the tutorial.
I recommend that you go through the tutorial step by step
in order to ensure that you're following along and understand what each step does.
The code for this tutorial is also available on our GitHub as a Jupyter Notebook.
And I'm going to walk you through all of those steps and go through and run this in a few moments.
So let's run through that Jupyter notebook.
There are six steps in this notebook, one to set up the environment.
Another for creating a helper function to detect the document format.
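The format-detection helper itself isn't shown in the video; a minimal sketch of what such a function might look like (the function name, the extension map, and the set of formats are illustrative assumptions, not the notebook's actual code):

```python
from pathlib import Path

# Illustrative map from file extensions to formats Docling can parse;
# the real notebook may support a different set of formats.
_EXTENSION_MAP = {
    ".pdf": "pdf",
    ".docx": "docx",
    ".pptx": "pptx",
    ".html": "html",
    ".md": "markdown",
}

def detect_document_format(path: str) -> str:
    """Return a format label for the file, based on its extension."""
    ext = Path(path).suffix.lower()
    try:
        return _EXTENSION_MAP[ext]
    except KeyError:
        raise ValueError(f"Unsupported document format: {ext!r}")
```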
Step three is a function from which we're going to call Docling to do the document conversion.
In our case, we're actually going to be converting the document to markdown so you can see what Docling is doing.
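The conversion step can be sketched roughly as follows. This is an illustration based on Docling 2.x's `DocumentConverter` API; the helper names (`markdown_path`, `convert_to_markdown`) are assumptions, not the notebook's actual code:

```python
from pathlib import Path

def markdown_path(src: str) -> Path:
    """Derive the output .md path next to the source document."""
    return Path(src).with_suffix(".md")

def convert_to_markdown(src: str) -> Path:
    """Convert a document to markdown with Docling and write it to disk."""
    # Imported inside the function so the sketch only needs Docling
    # installed when a conversion is actually run.
    from docling.document_converter import DocumentConverter

    result = DocumentConverter().convert(src)
    out = markdown_path(src)
    out.write_text(result.document.export_to_markdown(), encoding="utf-8")
    return out
```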
Step four is where we're going to set up our question and answer chain,
using the document, splitting it into chunks, storing it in a vector store, and then setting up our language model.
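The splitting step cuts the markdown into fixed-size, overlapping chunks before they're embedded and stored. The notebook's actual splitter isn't shown in detail; a simplified, dependency-free sketch of the idea, with sizes chosen purely for illustration:

```python
def split_into_chunks(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Split text into overlapping chunks so context isn't lost at boundaries."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks = []
    step = chunk_size - overlap  # each chunk starts this far after the last
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the final chunk already reaches the end of the text
    return chunks
```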
Step five is where we're going to set up our question answering interface.
And step six is where we actually perform the question answering itself.
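At answer time, the chain stuffs the retrieved chunks and the user's question into a single prompt for Granite. The template below is a hypothetical illustration of that wiring, not the prompt used in the notebook:

```python
def build_qa_prompt(question: str, chunks: list[str]) -> str:
    """Assemble a grounded QA prompt from retrieved chunks and a question."""
    context = "\n\n".join(f"[chunk {i + 1}]\n{c}" for i, c in enumerate(chunks))
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )
```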
Before I go through and run the cells in the notebook, I wanted to show you the document that we're going to use.
What I have in front of me here is an IBM Redbook that we downloaded from the IBM website.
And it's about creating OpenShift multi-architecture clusters with IBM Power.
As you can see, this PDF document is 12 chapters long and it has nearly 200 pages in it.
It's quite a large document.
So what we're going to do is let's run through these cells and see the output.
So first off, we're pulling our models locally using Ollama.
I'm assuming here that you've already installed Ollama on your machine.
We're then going to pull some dependencies down.
Now here is the dependency on Docling 2.0.
We're then going to import those things into our project.
We're going to create a helper function here.
We're going to create a document conversion function here.
We then set up our question answering chain.
As I showed you,
note that I have set my retriever here to retrieve ten documents from the retrieval chain in order to answer questions.
This is to take advantage of granite 3.1's large context window.
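The vector store ranks chunks by similarity to the question and returns the top ten. The notebook does this with an embedding model and a vector store; the dependency-free sketch below illustrates the same top-k idea with bag-of-words cosine similarity, which a real embedding model would replace in practice:

```python
import math
from collections import Counter

def _cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def retrieve(question: str, chunks: list[str], k: int = 10) -> list[str]:
    """Return the k chunks most similar to the question."""
    q = Counter(question.lower().split())
    ranked = sorted(chunks,
                    key=lambda c: _cosine(q, Counter(c.lower().split())),
                    reverse=True)
    return ranked[:k]
```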
Then the interface gets put together, and now we're performing the question and answering.
The type of machine that you have locally will determine how long this process takes.
At the moment it's taking me one minute and I think I have one more question waiting to be answered.
Great.
And there we go.
Here are all my answers.
So I've just shown you how you can use Docling and Granite 3.1
to build a chat-based question-and-answering system that works locally on your computer.
In our function to convert the document, we've actually chosen to convert it
to markdown so that you can see the work that Docling did.
So if I open up this markdown file here,
you can actually see this is Docling converting the PDF that I provided it into a format that we then sent to the LLM.
And as you can see, it's done a pretty good job.
The code for all of this, including this IBM Redbook, is available in our GitHub.
Have a play with it and let me know what you think.