
AI-Powered Contract Summarization Workflow

Key Points

  • The demo shows how to use generative AI to conversationally extract key information from lengthy client contracts and produce a concise summary in under 20 minutes.
  • It leverages two LLMs—Granite 13b chat for extracting contract fields (title, parties, services, dates, compensation) into JSON, and Mistral Large to format that data into a readable table.
  • The workflow uses LangChain (and its community tools) with pdf-parse to load the contract, watsonx for authentication and API calls, and environment variables for IBM Cloud credentials.
  • After obtaining a bearer token, prompts are sent to each model, the JSON output is cleaned of backticks, and the resulting table is written to a local Markdown file via Node’s fs module.
  • The final product is a neatly formatted summary document that demonstrates the end‑to‑end automation of contract data extraction using LLMs.
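The backtick cleanup mentioned above can be sketched in a few lines. This is a minimal illustration; the function and variable names are ours, not from the demo:

```javascript
// LLMs sometimes wrap JSON answers in backticks; strip them before parsing.
function cleanJsonOutput(raw) {
  return raw.replace(/`/g, "").trim();
}

const raw = '`{"title":"Service Agreement"}`'; // illustrative model output
const cleaned = cleanJsonOutput(raw);
const parsed = JSON.parse(cleaned); // now valid JSON
```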

Full Transcript

**Source:** [https://www.youtube.com/watch?v=GyIaXarpq9w](https://www.youtube.com/watch?v=GyIaXarpq9w)
**Duration:** 00:05:02

Sections

  • [00:00:00](https://www.youtube.com/watch?v=GyIaXarpq9w&t=0s) **AI-Powered Contract Extraction Workflow** - The speaker outlines how to use LLMs (Granite 13b chat and Mistral Large) together with watsonx, LangChain, PDF parsing, and Node.js utilities to programmatically extract key data from lengthy client contracts and generate a concise, tabular summary in under 20 minutes.
  • [00:03:08](https://www.youtube.com/watch?v=GyIaXarpq9w&t=188s) **LLM Pipeline: JSON to Markdown** - The speaker walks through chaining two LLM calls: first converting data to clean JSON, then using a Mistral model to generate a table and write the results to a local Markdown file.
0:00 This is how you can extract content from client contracts using Gen AI.
0:06 Contracts can be long.
0:07 Many are 50 pages or more.
0:10 Instead of scrolling through and reading these long documents,
0:13 what if we could interact with them in a conversational manner
0:16 and only extract the information that we need?
0:20 We're going to use LLMs to extract the data we want
0:24 and create a new summary document in less than 20 minutes.
0:29 The two LLMs we're going to use are Granite 13b chat
0:33 to extract the text and Mistral Large to transform the output
0:37 into an easy-to-read table.
0:39 Let's get started.
0:41 Here's the contract in the public directory in our application.
0:46 The first thing we need to do is import our dependencies.
0:49 We'll be using LangChain and LangChain Community.
0:55 For this project, they'll help us load and process our contract
0:59 with the help of pdf-parse.
1:04 To interact with our LLMs we'll be using watsonx,
1:07 which doesn't need to be imported here.
1:10 We'll use dotenv for importing environment variables,
1:17 and axios for HTTP requests.
1:23 And lastly, we can use the built-in Node.js file system module
1:27 for adding our data to a new file.
1:33 Before we start coding, we need to have our credentials defined in our .env file.
1:39 The credentials we'll need include an IBM Cloud API key,
1:43 our watsonx project ID, and the watsonx.ai API URL.
1:53 For ease, we'll be implementing the solution all in one large function.
1:58 So let's define and export that function here.
2:04 Next, we'll need to reach out to watsonx to grab a bearer token,
2:08 which we'll pass to our two API calls.
2:20 We can save the response as a variable to use later.
2:29 Now let's load our contract so we can interact with it.
2:35 We'll instantiate the PDFLoader class
2:37 to load and process our PDF document.
2:46 Now we get to the LLMs.
2:49 Our first prompt will extract and transform
2:52 key pieces of information from our contract.
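The token exchange described here can be sketched as follows. The demo makes this call with axios; this minimal version uses Node 18+'s built-in fetch so it runs without extra dependencies. The endpoint URL and grant type follow IBM Cloud's documented IAM token flow:

```javascript
// Build the form-encoded body for IBM Cloud's IAM token endpoint.
function buildTokenRequestBody(apiKey) {
  return new URLSearchParams({
    grant_type: "urn:ibm:params:oauth:grant-type:apikey",
    apikey: apiKey,
  }).toString();
}

// Exchange an IBM Cloud API key for a short-lived bearer token.
// (The demo makes the equivalent POST with axios.)
async function getBearerToken(apiKey) {
  const res = await fetch("https://iam.cloud.ibm.com/identity/token", {
    method: "POST",
    headers: { "Content-Type": "application/x-www-form-urlencoded" },
    body: buildTokenRequestBody(apiKey),
  });
  const data = await res.json();
  return data.access_token; // sent later as "Authorization: Bearer <token>"
}
```

The contract itself is then loaded with the PDFLoader class from `@langchain/community/document_loaders/fs/pdf` (for example, `const docs = await new PDFLoader("public/contract.pdf").load()`, where the file name is illustrative), which uses pdf-parse under the hood.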
2:55 For this, we're going to use Granite 13b chat,
3:00 and we're going to tell it to extract the title, the name,
3:04 the services, the effective date,
3:06 and the compensation from the contract.
3:11 Then we want the LLM to transform this data into JSON format.
3:16 And we'll set the output to a variable that we can use later.
3:24 Sometimes models respond with extra characters
3:26 in the output that we may not need.
3:30 This output contains backticks, so let's go ahead
3:32 and remove any backticks from the response output.
3:40 Okay, that output looks good.
3:43 So now we're going to set up our second LLM.
3:46 And this time we're going to be working
3:47 with the Mistral Large model.
3:52 I'm going to copy the first LLM since we're going to be using
3:55 the same credentials and parameters
3:56 that we used in our first model call.
4:00 For our next prompt,
4:03 let's tell it to refer to the previous output
4:06 and then create a simple table.
4:11 And we're going to save that to a variable as well.
4:15 Okay, so once we have the responses back from our LLMs,
4:19 we're ready to write the output to a local file
4:25 using the fs.writeFile method.
4:30 So let's tell it to create a .md file.
4:36 We'll put in our callback.
4:41 Okay. Now let's check the new summary file.
4:54 Okay, that looks great.
4:56 Congratulations.
4:58 You've successfully built a content extraction solution.
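Putting the two model calls and the file write together, the pipeline can be sketched like this. The model IDs, API version date, prompt wording, and output file name are assumptions for illustration (the video shows its own prompts on screen), and the demo uses axios where this sketch uses Node's built-in fetch:

```javascript
// Build the request body for watsonx.ai's text generation endpoint.
// Model IDs and parameters below are assumptions, not taken from the video.
function buildGenerationRequest(modelId, projectId, input) {
  return {
    model_id: modelId,
    project_id: projectId,
    input,
    parameters: { decoding_method: "greedy", max_new_tokens: 500 },
  };
}

// One generation call against watsonx.ai, authorized with the bearer token.
async function generate(apiUrl, bearerToken, body) {
  const res = await fetch(`${apiUrl}/ml/v1/text/generation?version=2023-05-29`, {
    method: "POST",
    headers: {
      Authorization: `Bearer ${bearerToken}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify(body),
  });
  const data = await res.json();
  return data.results[0].generated_text;
}

async function summarizeContract(apiUrl, token, projectId, contractText) {
  // First call: Granite extracts the key fields as JSON.
  const extractPrompt = `Extract the title, name, services, effective date, and compensation from this contract and return them as JSON:\n${contractText}`;
  const jsonOut = await generate(apiUrl, token,
    buildGenerationRequest("ibm/granite-13b-chat-v2", projectId, extractPrompt));
  const cleaned = jsonOut.replace(/`/g, "").trim(); // strip stray backticks

  // Second call: Mistral Large turns the JSON into a simple table.
  const tablePrompt = `Create a simple Markdown table from this JSON:\n${cleaned}`;
  const table = await generate(apiUrl, token,
    buildGenerationRequest("mistralai/mistral-large", projectId, tablePrompt));

  // Persist the summary (the demo uses fs.writeFile with a callback).
  const { writeFile } = await import("node:fs/promises");
  await writeFile("summary.md", table);
}
```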