Learning Library

← Back to Library

Choosing the Right LLM Model

Key Points

  • The most important factor in choosing a language model is the specific problem you need to solve, as different tasks may require different trade‑offs in accuracy, speed, cost, and control.
  • Proprietary SaaS models like GPT are great for quick prototyping, but many organizations prefer open‑source options (e.g., Llama, Mistral) for full customization and flexibility.
  • Model intelligence generally correlates with higher price and slower performance, while smaller models can deliver faster inference at lower cost, especially for high‑volume query workloads.
  • Community‑driven evaluation tools such as the Chatbot Arena leaderboard and the Open LLM Leaderboard provide practical, user‑voted rankings and detailed metrics that help assess model suitability beyond traditional benchmarks.

Full Transcript

# Choosing the Right LLM Model

**Source:** [https://www.youtube.com/watch?v=pYax2rupKEY](https://www.youtube.com/watch?v=pYax2rupKEY)
**Duration:** 00:06:56

## Sections

- [00:00:00](https://www.youtube.com/watch?v=pYax2rupKEY&t=0s) **Choosing the Right LLM** - The speaker explains how developers can independently evaluate and compare proprietary and open-source large language models based on use case, performance, speed, and cost.
- [00:03:07](https://www.youtube.com/watch?v=pYax2rupKEY&t=187s) **Running Granite Locally with Ollama** - The speaker demonstrates how to launch the Granite 3.1 model via Ollama, verify its output, and then integrate it into a Retrieval-Augmented Generation workflow using the open-source Open WebUI interface.
- [00:06:16](https://www.youtube.com/watch?v=pYax2rupKEY&t=376s) **Closing Thoughts on Model Evaluation** - The speaker recaps various model testing methods—including leaderboards, benchmarks, and hybrid on-device approaches—while urging viewers to share their projects, like the video, and stay engaged.

## Full Transcript
**0:00** With the huge number of large language models out there today, it can be a bit overwhelming to choose the perfect one for your use case. Plus, the decision you make might have an impact on the accuracy of your results, as well as cost and performance. But don't worry. In the next few minutes, I'll show you as a developer how I independently evaluate different models, both proprietary and open source, and walk you through different demos of model use cases, like summarization, question answering on your data, and more.

**0:27** Now, some people start off by looking at benchmarks or leaderboards, but for me, the biggest consideration for model selection is the problem that you're trying to solve. Because while GPT and other SaaS-based models are an easy and fast way to begin prototyping, many organizations need the full control, customization, and flexibility that an open-source model like Llama or Mistral provides. But no matter what you choose, you'll need to consider the performance, speed, and price of the model. And there are a lot of tools to help out with this. So, let's get started.

**0:58** Here I'm starting off at Artificial Analysis, comparing the entire landscape of models, both proprietary and open source. You're probably going to see some familiar names here, but something I do want to note is that there are some trends. For example, higher intelligence typically comes with a higher price, while smaller models can deliver faster speeds at lower cost.

**1:21** Let's take intelligence as an example. The numbers they calculate actually result from a variety of benchmarks, such as MMLU-Pro and similar evaluations. But let's say that you're scaling things up to millions of queries to your model. You probably don't need a PhD-level AI for a simple task like that.
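The intelligence-versus-cost trade-off at scale can be made concrete with some back-of-the-envelope arithmetic. The per-token prices below are hypothetical placeholders, not real vendor rates:

```python
# Back-of-the-envelope cost comparison for a high-volume workload.
# Prices are HYPOTHETICAL placeholders, not real vendor rates.

def monthly_cost(queries: int, tokens_per_query: int, price_per_million_tokens: float) -> float:
    """Estimated monthly spend for a given per-million-token price."""
    total_tokens = queries * tokens_per_query
    return total_tokens / 1_000_000 * price_per_million_tokens

QUERIES = 1_000_000  # one million queries per month
TOKENS = 500         # average tokens (prompt + completion) per query

frontier = monthly_cost(QUERIES, TOKENS, price_per_million_tokens=15.0)
small = monthly_cost(QUERIES, TOKENS, price_per_million_tokens=0.5)

print(f"frontier model: ${frontier:,.0f}/month")  # $7,500/month
print(f"small model:    ${small:,.0f}/month")     # $250/month
```

A 30x price gap per token compounds directly at this volume, which is why a smaller model is often the better fit for simple, repetitive tasks.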
**1:39** But one of my favorite community-based platforms to evaluate models is the Chatbot Arena leaderboard by UC Berkeley and LMArena, which combines over a million blind user votes on models to rank them and essentially provide a vibe score. Because benchmarks can sometimes be reverse engineered by models, the Chatbot Arena is a great way to understand what the general AI community thinks is the best model. And this directly correlates to a model's abilities in reasoning, math, writing, and more.

**2:09** Plus, let's say, for example, that you want to compare two different models. You can actually do so in the interface. For example, I tried out this prompt to write an example customer response for a bank in JSON, and we're able to compare between Granite 8 billion and Llama 8 billion. So it's pretty cool, right?

**2:24** And finally, specifically for open-source foundation and fine-tuned models, the Open LLM Leaderboard has a wide variety of model metrics and filters to help you understand which model might be best for your specific use case. For example, if you have a GPU, want to run a model locally on your machine, or even do real-time inferencing on a mobile or edge device. What's great is that you can easily select these filters and see the model directly on Hugging Face. For example, the number three result here is the Granite model. And on Hugging Face, you can explore the millions of models and datasets that are on there and understand how you can use them on your machine.

**3:01** Now, we've taken a look at the general model landscape, but let's start testing these models out locally with our data. For example, this Granite model that we have here on Hugging Face. In order to test out different models and their use cases, we're going to use Ollama, which is a popular developer tool that enables everybody to run their own large language models on their own system.
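Once Ollama is installed, a model can be pulled and then queried over Ollama's local HTTP API. A minimal sketch, assuming Ollama's default port (11434); the `granite3.1-dense:8b` tag is an assumption — check the Ollama model library for the exact Granite tag:

```python
import json
from urllib import request

# Ollama's default local endpoint for one-shot (non-streaming) generation.
OLLAMA_URL = "http://localhost:11434/api/generate"

def build_generate_payload(model: str, prompt: str) -> dict:
    """Minimal request body for Ollama's /api/generate endpoint."""
    return {"model": model, "prompt": prompt, "stream": False}

# "granite3.1-dense:8b" is an assumed model tag; verify it in the Ollama library.
payload = build_generate_payload("granite3.1-dense:8b", "Talk like a pirate.")
print(json.dumps(payload))

# To actually send the request (requires a running Ollama server):
# req = request.Request(OLLAMA_URL, data=json.dumps(payload).encode(),
#                       headers={"Content-Type": "application/json"})
# print(json.loads(request.urlopen(req).read())["response"])
```

The same prompt can of course be run interactively with `ollama run <model-tag>` on the command line, which is what the demo below does.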
**3:21** It's open source, and it has a model repository, meaning that we can run chat, vision, tool-calling, and even RAG embedding models locally. So to start, we're going to run Granite, specifically that Granite 3.1 model that we took a look at earlier on Hugging Face. And here, it's already quantized, or optimized and compressed, for our machine.

**3:42** We're going to give it a quick question to make sure it's running: "Talk like a pirate." And there we go, we've got a funny response from our model.

**3:52** But now, with the model running locally on our machine, I want to use it with my own data to understand what it can do and its possibilities. We're going to use RAG, or retrieval-augmented generation, in order to do this. Here's an open-source AI interface called Open WebUI. It's going to allow us to use a local model that we have running with Ollama, for example Granite, or any OpenAI-compatible API model remotely as well.

**4:17** But let's think about it as an AI application, right? The back end could be our model and model server, and the front end could be a user interface like this that allows us to take in our own custom data, search the web, or build agentic applications, all with AI.

**4:33** So let's start off with RAG by attaching a file of something the model traditionally wouldn't know. This is specific enterprise data, right? Stuff that the model wasn't trained on originally, and specifically about Marty McFly. We're going to provide this to the model and ask a specific question: what happened to Marty McFly in the 1955 accident from the claim? Now, traditionally, a model wouldn't know about this information.
**4:57** But by using an embedding model in the background, as well as a vector database, we're able to pull certain information from that source document and even provide it in the citations here, giving a clear source of truth for our model's answers. So we're able to try out RAG here, and different agentic functions as well. It's a great place to get started with your own unique data.

**5:20** Now, let's say that you're building applications and you want to use a free coding assistant within your IDE. Traditionally, you'd need to use a SaaS offering or a specifically fine-tuned coding model. But more recently, one model can work with a variety of languages, including your code.

**5:36** So, what I've set up here is Continue. It's an open-source, free extension from the VS Code marketplace or IntelliJ, and I've configured it to use a local model that I have running with Ollama, that Granite model from earlier.

**5:54** What we're able to do is chat with our code base, explain entire files, and make edits. So here, I think we should add comments and some quick documentation on what this class is doing so that other developers can understand it as well. I'm going to ask it to add Javadoc comments describing the service. And it's going to go in and add this necessary and important documentation to my project inline, and then ask me to approve or deny it.

**6:16** So, I think it's pretty cool, and it's a great way to use an AI model with your code base as well.

**6:22** Okay, great. So now you know the various ways to evaluate and test models, both from online leaderboards and benchmarks, as well as from your own machine. But remember, it all comes down to your use case, and there are even hybrid approaches that use a more powerful model in conjunction with a small model on device.
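The RAG flow described above — embed the document chunks, store them, retrieve the best match for a question, and stuff it into the prompt — can be sketched with a toy bag-of-words "embedding" standing in for a real embedding model. The Marty McFly claim text is invented for illustration:

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy stand-in for a real embedding model: bag-of-words term counts.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse term-count vectors.
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# "Vector database": chunks of the attached claim document, paired with embeddings.
chunks = [
    "Policy renewal terms for the 2024 fiscal year.",
    "Claim report: Marty McFly was involved in a 1955 accident with a DeLorean.",
    "Contact details for the claims adjustment office.",
]
index = [(chunk, embed(chunk)) for chunk in chunks]

# Retrieval: embed the question, return the most similar chunk.
question = "What happened to Marty McFly in the 1955 accident?"
best_chunk = max(index, key=lambda item: cosine(embed(question), item[1]))[0]

# The retrieved chunk (and its citation) is then prepended to the prompt
# that gets sent to the local model.
prompt = f"Answer using only this context:\n{best_chunk}\n\nQuestion: {question}"
print(best_chunk)
```

A real setup like Open WebUI swaps in a trained embedding model and a persistent vector store, but the retrieve-then-generate shape is the same.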
**6:40** But we're just getting started, because after experimenting with models comes the stage of building something great with AI. Now, what are you working on these days? Please let us know in the comments below. But as always, thank you so much for watching. Be sure to leave a like if you learned something today, and have a good one.