AI-Powered Contract Summarization Workflow
**Source:** [https://www.youtube.com/watch?v=GyIaXarpq9w](https://www.youtube.com/watch?v=GyIaXarpq9w)
**Duration:** 00:05:02
Key Points
- The demo shows how to use generative AI to conversationally extract key information from lengthy client contracts and produce a concise summary in under 20 minutes.
- It leverages two LLMs—Granite 13b chat for extracting contract fields (title, parties, services, dates, compensation) into JSON, and Mistral Large to format that data into a readable table.
- The workflow uses LangChain (and its community tools) with PDF‑Parse to load the contract, Watsonx for authentication and API calls, and environment variables for IBM Cloud credentials.
- After obtaining a bearer token, prompts are sent to each model, the JSON output is cleaned of backticks, and the resulting table is written to a local Markdown file via Node’s fs module.
- The final product is a neatly formatted summary document that demonstrates the end‑to‑end automation of contract data extraction using LLMs.
Sections
- [00:00:00](https://www.youtube.com/watch?v=GyIaXarpq9w&t=0s) AI-Powered Contract Extraction Workflow - The speaker outlines how to use LLMs (Granite 13b.chat and Mistral Large) together with Watsonx, LangChain, PDF parsing, and Node.js utilities to programmatically extract key data from lengthy client contracts and generate a concise, tabular summary in under 20 minutes.
- [00:03:08](https://www.youtube.com/watch?v=GyIaXarpq9w&t=188s) LLM Pipeline: JSON to Markdown - The speaker walks through chaining two LLM calls: first converting data to clean JSON, then using a Mistral model to generate a table and write the results to a local Markdown file.
Full Transcript
This is how you can extract content from client contracts using Gen AI.
Contracts can be long.
Many are 50 pages or more.
Instead of scrolling through and reading these long documents,
what if we could interact with them in a conversational manner
and only extract the information that we need?
We're going to use LLMs to extract the data we want
and create a new summary document in less than 20 minutes.
The two LLMs we're going to use are Granite 13b.chat
to extract the text and Mistral Large to transform the output
into an easy-to-read table.
Let's get started.
Here's the contract in the public directory in our application.
The first thing we need to do is import our dependencies.
We'll be using LangChain and LangChain Community
for this project. They'll help us load and process our contract
with the help of PDF-Parse.
To interact with our LLMs we'll be using watsonx,
which doesn't need to be imported here.
We'll use dotenv for importing environment variables,
and axios for HTTP requests.
And lastly, we can use the built-in Node.js file system module
for adding our data to a new file.
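As a rough sketch, those imports might look like this; the exact import path for the PDF loader depends on the installed LangChain version, so treat it as an assumption:

```js
// Sketch of the dependencies mentioned above (import paths are assumptions).
import { PDFLoader } from "@langchain/community/document_loaders/fs/pdf"; // uses pdf-parse under the hood
import * as dotenv from "dotenv";
import axios from "axios";
import fs from "fs";

dotenv.config(); // pull the IBM Cloud credentials into process.env
```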
Before we start coding, we need to have our credentials defined in our .env file.
The credentials we'll need include an IBM Cloud API key,
our watsonx project ID, and the watsonx.ai API URL.
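For illustration only, the .env file could look something like this; the variable names and the regional endpoint are placeholders, not values taken from the video:

```
# .env (placeholder names and example endpoint)
IBM_CLOUD_API_KEY=<your IBM Cloud API key>
WATSONX_PROJECT_ID=<your watsonx project ID>
WATSONX_AI_ENDPOINT=https://us-south.ml.cloud.ibm.com
```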
For ease, we'll be implementing the solution all in one large function.
So let's define and export that function here.
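A skeleton of that function might be as simple as this; the function name is illustrative:

```js
// One large async function that holds the whole workflow (name is illustrative).
export async function summarizeContract() {
  // bearer token request, PDF loading, the two LLM calls, and the file write all go here
}
```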
Next, we'll need to reach out to watsonx to grab a bearer token
which we'll pass to our two API calls.
We can save the response as a variable to use later.
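As a sketch, assuming the standard IBM Cloud IAM token endpoint and an axios POST, that request could look like this; the access_token field in the response is the bearer token we save:

```js
// Exchange the IBM Cloud API key for an IAM bearer token (standard IBM Cloud IAM flow).
const tokenResponse = await axios.post(
  "https://iam.cloud.ibm.com/identity/token",
  new URLSearchParams({
    grant_type: "urn:ibm:params:oauth:grant-type:apikey",
    apikey: process.env.IBM_CLOUD_API_KEY,
  }),
  { headers: { "Content-Type": "application/x-www-form-urlencoded" } }
);
const bearerToken = tokenResponse.data.access_token; // reused for both model calls
```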
Now let's load our contract so we can interact with it.
We'll instantiate the PDF loader class
to load and process our PDF document.
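Continuing the sketch, loading the contract with LangChain's community PDF loader might look like this; the file path is a placeholder:

```js
// Load the contract PDF from the public directory (path is a placeholder).
const loader = new PDFLoader("public/contract.pdf");
const docs = await loader.load();
// Join the per-page documents into a single string we can drop into a prompt.
const contractText = docs.map((doc) => doc.pageContent).join("\n");
```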
Now we get to the LLMs.
Our first prompt will extract and transform
key pieces of information from our contract.
For this, we're going to use Granite 13b.chat,
and we're going to tell it to extract the title, the name,
the services, effective date,
and the compensation from the contract.
Then we want the LLM to transform this data into JSON format.
And we'll set the output to a variable that we can use later.
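A hedged sketch of that first call; the endpoint path, version query string, and model_id are assumptions based on the public watsonx.ai text-generation REST API and may differ from the code shown in the video:

```js
// First model call: ask Granite to pull the key fields out of the contract as JSON.
const extractionPrompt = `Extract the title, the parties, the services, the effective date,
and the compensation from the contract below, and return the result as JSON.

${contractText}`;

const graniteResponse = await axios.post(
  `${process.env.WATSONX_AI_ENDPOINT}/ml/v1/text/generation?version=2023-05-29`,
  {
    input: extractionPrompt,
    model_id: "ibm/granite-13b-chat-v2", // assumed identifier for Granite 13b.chat
    project_id: process.env.WATSONX_PROJECT_ID,
    parameters: { max_new_tokens: 500 },
  },
  { headers: { Authorization: `Bearer ${bearerToken}` } }
);

let extractedJson = graniteResponse.data.results[0].generated_text;
```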
Sometimes models respond with extra characters
in the output that we may not need.
This output contains backticks, so let's go ahead
and remove any backticks from the response output.
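For instance, a one-line cleanup:

```js
// Strip any Markdown backticks the model wrapped around the JSON.
extractedJson = extractedJson.replace(/`/g, "").trim();
```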
Okay, that output looks good.
So now we're going to set up our second LLM.
And this time we're going to be working
with the Mistral large model.
I'm going to copy the first LLM since we're going to be using
the same credentials and parameters
that we used in our first model call.
For our next prompt.
Let's tell it to refer to the previous output
and then create a simple table.
And we're going to save that to a variable as well.
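A sketch of that second call, reusing the same endpoint, credentials, and parameters; the Mistral model_id shown is an assumption, so check the model catalog for the exact identifier:

```js
// Second model call: turn the cleaned JSON into a simple Markdown table with Mistral Large.
const tablePrompt = `Refer to the following contract data and create a simple table in Markdown:

${extractedJson}`;

const mistralResponse = await axios.post(
  `${process.env.WATSONX_AI_ENDPOINT}/ml/v1/text/generation?version=2023-05-29`,
  {
    input: tablePrompt,
    model_id: "mistralai/mistral-large", // assumed identifier for Mistral Large
    project_id: process.env.WATSONX_PROJECT_ID,
    parameters: { max_new_tokens: 500 },
  },
  { headers: { Authorization: `Bearer ${bearerToken}` } }
);

const summaryTable = mistralResponse.data.results[0].generated_text;
```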
Okay, so once we have the responses back from our LLMs,
we're ready to write the output to a local file
using the fs.writeFile method.
So let's tell it to create a .md file.
We'll put in our callback.
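A minimal version of that write, with an illustrative output filename:

```js
// Write the generated table to a local Markdown file; the callback reports success or failure.
fs.writeFile("contract-summary.md", summaryTable, (err) => {
  if (err) {
    console.error("Failed to write summary:", err);
  } else {
    console.log("Summary written to contract-summary.md");
  }
});
```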
Okay. Now let's check the new summary file.
Okay, that looks great.
Congratulations.
You've successfully built a content extraction solution.