Learning Library

Open‑Source LLMs vs Proprietary Models

6m • ai-ml • tutorial • intermediate

Key Points

Hugging Face hosts over 325,000 models, with thousands more being added. Large language models (LLMs) fall into two categories: proprietary models owned and controlled by companies, and open‑source models that are freely accessible and modifiable.
Proprietary LLMs tend to be larger in parameter count and come with usage licenses, but bigger size doesn’t automatically mean better performance, and many details remain opaque.
Open‑source LLMs can offer better insight into their architecture and training data, allow fine‑tuning on domain‑specific datasets, and benefit from community contributions that reduce reliance on a single vendor.
A growing variety of sectors—including NASA/IBM for geospatial analysis, healthcare for diagnostics and treatment planning, and finance with models like FinGPT—are deploying open‑source LLMs for specialized applications.
The expanding open‑source LLM ecosystem is increasingly challenging the proprietary business model, though both open‑source and proprietary LLMs carry risks such as hallucinations, bias, and security problems.
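
To put the parameter counts mentioned in the video (7 billion to 70 billion for Llama 2) in perspective, here is a rough sketch of how much memory the weights alone would need. It assumes the usual back-of-the-envelope rule that weight memory is parameter count times bytes per parameter. The function name `weight_memory_gb` and the precision table are illustrative and do not come from the video, and real deployments need additional memory for activations and caches.

```python
# Rough estimate only: weight memory = parameter count x bytes per parameter.
# The precision table and function name are illustrative assumptions.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}


def weight_memory_gb(num_params: int, dtype: str = "fp16") -> float:
    """Return the approximate size of the model weights in GB (10**9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


if __name__ == "__main__":
    # The two Llama 2 sizes named in the transcript.
    for name, params in [("7B", 7_000_000_000), ("70B", 70_000_000_000)]:
        for dtype in BYTES_PER_PARAM:
            size = weight_memory_gb(params, dtype)
            print(f"{name} at {dtype}: about {size:.0f} GB of weights")
```

Under these assumptions a 7-billion-parameter model at 16-bit precision needs roughly 14 GB for its weights and a 70-billion-parameter model roughly 140 GB, which is one practical reason a smaller model can be the better choice.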

Sections

00:00:00 Comparing Open and Proprietary LLMs - The speaker delineates the differences between proprietary and open‑source large language models, covering ownership, licensing, parameter scale, and the growing advantages of open‑source alternatives.

Full Transcript

**Source:** [https://www.youtube.com/watch?v=y9k-U9AuDeM](https://www.youtube.com/watch?v=y9k-U9AuDeM) **Duration:** 00:06:29

0:00 There are over 325,000 models on Hugging Face, and thousands more are being added. And why might you choose to use AI models like these? Well, let's start by getting a few things straight. The models we're talking about in this video, they're specifically LLMs, and that's large language models, which are foundation models that use artificial intelligence, deep learning, and massive datasets to generate text. We're talking generative AI. And there are two types of generative AI models: there are proprietary models, and there are open source models.

0:39 Now, proprietary LLMs, those are owned by a company that can control their usage. A proprietary LLM may include a license that restricts how the LLM can be used. On the other hand, open source LLMs are free and available for anyone to access, and developers and researchers are free to use, improve, or otherwise modify the model.

1:08 Now look, it's not true in every instance, but generally many proprietary LLMs are far larger in size than open source models, specifically in terms of parameter size. Some of the leading proprietary LLMs extend to thousands of billions of parameters. Probably? Actually, we don't necessarily know because, well, those LLMs and their parameter counts are proprietary. But bigger isn't necessarily better. And the open source model ecosystem is showing promise in challenging the proprietary LLM business model.

1:48 So let's discuss the benefits of open source LLMs. Let's talk about the types of organizations that are using them. Let's talk about some of the leading open source models available today, and we should talk about the risks associated with using them.

2:12 Now, clearly, one of the benefits of an open source large language model, that has to be transparency. Open source LLMs may offer better insight into how they work, their architecture, and the training data used to develop them.
2:27 Another big one is that pre-trained open source LLMs allow a process called fine-tuning. That means you can add features to the LLM that benefit your specific use case, and the LLMs can be trained on specific datasets. So I can fine-tune an LLM with my own data.

2:50 And community contributions are a big plus. Using a proprietary LLM means you're reliant on a single provider, whereas open source models benefit from community contributions and multiple service providers. You can experiment and use contributions from people with varying perspectives.

3:12 And these benefits have led all sorts of organizations to use open source LLMs. In another video, I addressed how NASA and IBM developed an open source LLM trained on geospatial data. Some healthcare organizations use open source LLMs for diagnostic tools and treatment optimization. There's an open source LLM called FinGPT. It was developed for the financial industry.

3:46 Which brings us onto the topic of some specific open source LLMs that you might find of interest. Now, Hugging Face maintains an Open LLM Leaderboard, and that tracks, ranks, and evaluates open source LLMs on various benchmarks, like which LLM is scoring highest on the TruthfulQA benchmark, which measures whether a language model is truthful in generating answers to questions. So it gives those answers a score. And the top spots on this leaderboard, they change frequently. And it's quite fun to watch the progress these models are making.

4:34 Many of the models on the leaderboard are variations on the Llama 2 open source LLM. That's the one provided by Meta AI. And Llama 2 encompasses pre-trained and fine-tuned generative text models from 70 billion all the way down to 7 billion parameters. And it's licensed for commercial use. Vicuna was created on top of the Llama model and fine-tuned to follow instructions.
5:04 And then there's also BLOOM by BigScience, which is a multilingual language model created by more than 1,000 AI researchers.

5:10 Now, one area that both proprietary and open source LLMs share is their associated risks. Although LLM outputs often sound fluent and authoritative, they can be confidently wrong. Hallucinations, they can result from the LLM being trained on incomplete, contradictory, or inaccurate data, or from misunderstanding context. Bias happens when the source of data is not diverse or not representative. And security problems can include leaking PII, and cybercriminals using the LLMs for malicious tasks like phishing. Especially in these early days of large language models, we do need to mitigate risk.

5:59 But open source LLMs are thriving in business. Here at IBM, the watsonx.ai studio makes available access to multiple Llama 2 models, and IBM has released a series of foundation models of its own called Granite. And this space is changing rapidly, making open source LLMs a field well worth keeping a close eye on.
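
The transcript names leaking PII as one of the security risks shared by open source and proprietary LLMs. One common, partial mitigation is to redact obvious identifiers from text before it is sent to a model or written to a log. The sketch below is a minimal illustration of that idea, not something shown in the video. The function name `redact_pii`, the placeholder tokens, and the two regular expressions (email addresses and US-style 10-digit phone numbers) are assumptions chosen for the example, and pattern matching like this will miss many kinds of PII.

```python
import re

# Illustrative patterns only; real PII detection needs far broader coverage.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b")


def redact_pii(text: str) -> str:
    """Replace email addresses and US-style phone numbers with placeholders."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    text = PHONE_RE.sub("[PHONE]", text)
    return text


if __name__ == "__main__":
    prompt = "Contact jane.doe@example.com or 555-123-4567."
    # The redacted string is what would be passed on to the model.
    print(redact_pii(prompt))
```

Redacting before the model sees the text keeps the identifiers out of prompts and logs, but it addresses only part of the risk described above and does nothing about what a model may have memorized from its training data.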