From the course: Generative AI: Working with Large Language Models
OPT and BLOOM
- [Instructor] You've probably noticed that up to this point, all of the language models have come from big tech firms. Although OpenAI made GPT-3 available via an API, it gave no access to the actual weights of the model, making it difficult for smaller research organizations and institutions to study these models.

The Meta, or Facebook, AI team then released OPT, or Open Pre-trained Transformers: a family of decoder-only pre-trained transformers ranging from 125 million to 66 billion parameters, which they shared with everyone. Interested research teams could also apply for access to the 175 billion parameter model. This effectively gave researchers access to a large language model that was very similar to GPT-3. The Facebook team also detailed the infrastructure challenges they faced and provided code for experimenting with the models. These models were primarily trained on English text.

The research teams behind the BLOOM model went one step further. The Hugging Face team, working together with the Montreal AI Ethics Institute, secured a 3 million euro grant for compute resources from research institutes in France. Then, working with a volunteer team of over 1,000 researchers from different countries and institutions, they created a 176 billion parameter decoder-only transformer model called BLOOM. This team has made everything openly available, from the dataset used for training to the model weights, which anyone can download and run, and they also released intermediate checkpoints. This allows organizations outside of big tech to experiment with these models. BLOOM can also generate text in 46 natural languages and 13 programming languages, and what makes it unique is that for most of these languages, such as Spanish, French, and Arabic, BLOOM is the first language model with over 100 billion parameters ever created.

Even if you only want to run these models for inference, you'll still need access to expensive hardware accelerators. What makes this project particularly exciting is that, because more research teams now have access to the models, the weights, and the datasets, some parts of the community might focus on making smaller versions of the model that can run on fewer and less expensive hardware accelerators, while other researchers might train it on languages not covered so far and produce the first 100 billion parameter model for those languages. These two initiatives from Facebook and Hugging Face have made large language models available to everyone, and only time will tell what impact this will have.
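To give a sense of what this openness makes possible, here is a minimal sketch (not part of the course) of loading one of the smaller publicly released checkpoints with the Hugging Face Transformers library. The checkpoint names facebook/opt-1.3b and bigscience/bloom-560m are examples hosted on the Hugging Face Hub; which sizes you can actually run depends on your hardware, and the largest variants still require multiple accelerators.

# Minimal sketch: download and run a smaller OPT or BLOOM checkpoint
# using the Hugging Face Transformers library.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "facebook/opt-1.3b"  # or "bigscience/bloom-560m" for a multilingual model

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

prompt = "Open access to large language models means"
inputs = tokenizer(prompt, return_tensors="pt")

# Generate a short continuation of the prompt
outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))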
Contents
- GPT-3 (4m 32s)
- GPT-3 use cases (5m 27s)
- Challenges and shortcomings of GPT-3 (4m 17s)
- GLaM (3m 6s)
- Megatron-Turing NLG Model (1m 59s)
- Gopher (5m 23s)
- Scaling laws (3m 14s)
- Chinchilla (7m 53s)
- BIG-bench (4m 24s)
- PaLM (5m 49s)
- OPT and BLOOM (2m 51s)
- GitHub models (2m 43s)
- Accessing Large Language Models using an API (6m 25s)
- Inference time vs. pre-training (4m 5s)