Open‑Source LLMs vs Proprietary Models

Key Points

  • Hugging Face hosts over 325,000 models, with thousands more being added. The large language models (LLMs) discussed here fall into two categories: proprietary models owned and controlled by a company, and open‑source models that are freely accessible and modifiable.
  • Proprietary LLMs tend to be larger in parameter count and come with usage licenses, but bigger size doesn’t automatically mean better performance, and many details remain opaque.
  • Open‑source LLMs may offer better insight into their architecture and training data, can be fine‑tuned on domain‑specific datasets, and benefit from community contributions that reduce reliance on a single vendor.
  • Organizations in a growing range of fields are deploying open‑source LLMs for specialized applications: NASA and IBM for geospatial data, healthcare organizations for diagnostic tools and treatment optimization, and the financial industry with models like FinGPT.
  • The expanding open‑source LLM ecosystem is increasingly challenging the proprietary business model, though open‑source and proprietary LLMs share risks such as hallucinations, bias, and security problems.

Full Transcript

# Open‑Source LLMs vs Proprietary Models

**Source:** [https://www.youtube.com/watch?v=y9k-U9AuDeM](https://www.youtube.com/watch?v=y9k-U9AuDeM)

**Duration:** 00:06:29

## Sections

- [00:00:00](https://www.youtube.com/watch?v=y9k-U9AuDeM&t=0s) **Comparing Open and Proprietary LLMs** - The speaker delineates the differences between proprietary and open‑source large language models, covering ownership, licensing, parameter scale, and the growing advantages of open‑source alternatives.

## Full Transcript
0:00 There are over 325,000 models on Hugging Face, and thousands more are being added. And why might you choose to use AI models like these? Well, let's start by getting a few things straight.

0:21 The models we're talking about in this video, they're specifically LLMs, and that's large language models, which are foundation models that use artificial intelligence, deep learning, and massive datasets to generate text. We're talking generative AI. And there are two types of generative AI models: there are proprietary models, and there are open source models.

0:39 Now, proprietary LLMs, those are owned by a company that can control their usage. A proprietary LLM may include a license that restricts how the LLM can be used. On the other hand, open source LLMs are free and available for anyone to access, and developers and researchers are free to use, improve, or otherwise modify the model.

1:08 Now look, it's not true in every instance, but generally many proprietary LLMs are far larger in size than open source models. And specifically in terms of parameter count. Some of the leading proprietary LLMs extend to thousands of billions of parameters. Probably? Actually, we don't necessarily know because, well, those LLMs and their parameter counts are proprietary. But bigger isn't necessarily better. And the open source model ecosystem is showing promise in challenging the proprietary LLM business model.

1:48 So let's discuss the benefits of open source LLMs. Let's talk about the types of organizations that are using them. Let's talk about some of the leading open source models available today, and we should talk about the risks associated with using them.

2:12 Now, clearly, one of the benefits of an open source large language model, that has to be transparency. Open source LLMs may offer better insight into how they work, their architecture, and the training data used to develop them.
2:27 Another big one is that pre-trained open source LLMs allow a process called fine-tuning. That means you can add features to the LLM that benefit your specific use case, and the LLM can be trained on specific datasets. So I can fine-tune an LLM with my own data.

2:50 And community contributions are a big plus. Using a proprietary LLM means you're reliant on a single provider, whereas open source models benefit from community contributions and multiple service providers. You can experiment and use contributions from people with varying perspectives.

3:12 And these benefits have led all sorts of organizations to use open source LLMs. In another video, I addressed how NASA and IBM developed an open source LLM trained on geospatial data. Some healthcare organizations use open source LLMs for diagnostic tools and treatment optimization. There's an open source LLM called FinGPT [fin / financial]. It was developed for the financial industry.

3:46 Which brings us onto the topic of some specific open source LLMs that you might find of interest. Now, Hugging Face maintains an Open LLM Leaderboard, and that tracks, ranks, and evaluates open source LLMs on various benchmarks, like which LLM is scoring highest on the TruthfulQA benchmark, which measures whether a language model is truthful in generating answers to questions. So it gives those answers a score. And the top spots on this leaderboard, they change frequently. And it's quite fun to watch the progress these models are making.

4:34 Many of the models on the leaderboard are variations on the Llama 2 open source LLM. That's the one provided by Meta AI. And Llama 2 encompasses pre-trained and fine-tuned generative text models from 70 billion all the way down to 7 billion parameters. And it's licensed for commercial use. Vicuna was created on top of the Llama model and fine-tuned to follow instructions.
5:04 And then there's also BLOOM by BigScience, which is a multilingual language model created by more than 1,000 AI researchers.

5:10 Now, one area that both proprietary and open source LLMs share is their associated risks. Although LLM outputs often sound fluent and authoritative, they can be confidently wrong. Hallucinations, they can result from the LLM being trained on incomplete, contradictory, or inaccurate data, or from misunderstanding context. Bias happens when the source of data is not diverse or not representative. And security problems can include leaking PII, and cybercriminals using the LLMs for malicious tasks like phishing.

5:51 Especially in these early days of large language models, we do need to mitigate risk. But open source LLMs are thriving in business. Here at IBM, the watsonx.ai studio makes available access to multiple Llama 2 models, and IBM has released a series of foundation models of its own called Granite. And this space is changing rapidly, making open source LLMs a field well worth keeping a close eye on.
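
To put the parameter counts mentioned in the transcript (7 billion to 70 billion for Llama 2) in perspective, the arithmetic below estimates how much memory the model weights alone might need. This is a rough sketch, not a sizing tool: the names `BYTES_PER_PARAM` and `weight_memory_gb` are made up for illustration, and the estimate ignores activations, caches, and any other runtime overhead.

```python
# Bytes needed to store one parameter at common numeric precisions.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}


def weight_memory_gb(num_params, dtype="fp16"):
    """Estimate the memory for model weights only, in decimal gigabytes."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


if __name__ == "__main__":
    for label, params in [("7B", 7_000_000_000), ("70B", 70_000_000_000)]:
        print(label, weight_memory_gb(params, "fp16"), "GB at fp16")
    # 7B  -> 14.0 GB at fp16
    # 70B -> 140.0 GB at fp16
```

Under these assumptions, a tenfold increase in parameters means a tenfold increase in weight storage, which is one practical reason the "bigger isn't necessarily better" point matters when choosing a model.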
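
The fine-tuning idea from the transcript (start from a pre-trained model, then keep training on your own data) can be illustrated with a deliberately tiny stand-in. The sketch below is not how an LLM is fine-tuned in practice: a two-parameter linear model replaces the LLM, plain gradient descent replaces a real training stack, and the datasets and the names `mse`, `train`, `general_data`, and `domain_data` are all invented for the example.

```python
def mse(weights, data):
    """Mean squared error of the linear model y = w * x + b on `data`."""
    w, b = weights
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)


def train(weights, data, lr=0.01, steps=2000):
    """Run gradient descent on the MSE, starting from the given weights."""
    w, b = weights
    n = len(data)
    for _ in range(steps):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return (w, b)


# "Pre-training": broad, generic data following y = 2x (toy data).
general_data = [(x, 2.0 * x) for x in range(-5, 6)]
# "Fine-tuning": a smaller domain-specific set following y = 2.5x + 1 (toy data).
domain_data = [(x, 2.5 * x + 1.0) for x in range(0, 6)]

pretrained = train((0.0, 0.0), general_data)
# Fine-tuning starts from the pre-trained weights rather than from scratch.
finetuned = train(pretrained, domain_data)

if __name__ == "__main__":
    print("domain error before fine-tuning:", mse(pretrained, domain_data))
    print("domain error after fine-tuning: ", mse(finetuned, domain_data))
```

The point of the toy is only the shape of the workflow: the second call to `train` reuses the weights learned in the first, so the model adapts to the domain data instead of being rebuilt.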
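
The transcript describes a leaderboard that tracks, ranks, and evaluates models across several benchmarks. One simple way such a ranking could work is sketched below. It assumes that models are ordered by the plain average of their benchmark scores, which is an assumption of this example rather than something stated in the video. The model names, benchmark names, and scores are all hypothetical.

```python
from statistics import mean

# Hypothetical scores (0-100) for made-up models; not real leaderboard data.
scores = {
    "model-a": {"truthfulness": 45.0, "reasoning": 60.0},
    "model-b": {"truthfulness": 52.0, "reasoning": 55.0},
    "model-c": {"truthfulness": 40.0, "reasoning": 58.0},
}


def rank_models(scores):
    """Return (model, average score) pairs, highest average first."""
    averages = {
        name: mean(per_benchmark.values())
        for name, per_benchmark in scores.items()
    }
    return sorted(averages.items(), key=lambda item: item[1], reverse=True)


def best_on(scores, benchmark):
    """Return the model with the highest score on a single benchmark."""
    return max(scores, key=lambda name: scores[name][benchmark])


if __name__ == "__main__":
    for name, avg in rank_models(scores):
        print(name, avg)
    print("best on truthfulness:", best_on(scores, "truthfulness"))
```

With this toy data the overall leader and the leader on a single benchmark can differ, which mirrors the transcript's example of asking which model scores highest on one particular benchmark.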