From the course: Generative AI: Introduction to Large Language Models

The evolution of large language models

- [Instructor] When the ancients carefully recorded their knowledge on scrolls of papyrus and housed them in the legendary Library of Alexandria, little could they have imagined that all that knowledge and more would be available at the fingertips of their descendants millennia later. That's the power and beauty of large language models. These models not only possess the capacity to address queries and unravel intricate puzzles, they also hold the ability to condense extensive volumes of information, translate languages, and extract meaning from diverse linguistic contexts. Large language models have a fascinating history that goes back to the middle of the 20th century. The earliest foundations of large language models can be traced back to experiments in automated language processing conducted in the 1950s. These experiments were aimed at enabling computers to process natural language. As part of that effort, researchers from IBM and Georgetown University worked together to create a system that could automatically translate phrases from Russian to English. By successfully translating 60 Russian sentences into English, the 1954 experiment marked the beginning of machine translation research. Moving ahead through the 1960s, a notable advancement emerged when MIT researchers introduced the world to the first chatbot, named ELIZA. Employing a rudimentary form of pattern matching, ELIZA simulated human conversation by transforming user inputs into questions and generating responses based on predefined rules. Though ELIZA was far from perfect, it marked the beginning of research into natural language processing and the development of more sophisticated language models. During the 1980s and 1990s, statistical approaches gained prominence in the field of natural language processing, and researchers turned to tools like hidden Markov models and N-gram language models to estimate the likelihood of word sequences. 
The N-gram model relied on how often words appear together in a dataset to make predictions, while hidden Markov models modeled sequences of hidden states, offering a more structured approach. These approaches represented a transition from systems based on hand-written rules to methods centered on data. From the late 1990s to the early 2000s, there was renewed interest in neural networks, largely due to advancements in the backpropagation algorithm. This improvement enabled more efficient training of neural networks with multiple layers. Notably, neural language models brought about a shift from traditional statistical techniques, presenting a novel approach to language representation and comprehension. Among the notable innovations of this era was the introduction of long short-term memory (LSTM) networks in 1997. These networks enabled the development of more intricate neural architectures, allowing for deeper models capable of effectively handling larger datasets. Another significant milestone arrived with the introduction of Stanford's CoreNLP suite in 2010. This suite brought together a collection of tools and algorithms that proved instrumental in addressing intricate natural language processing tasks, such as sentiment analysis and named entity recognition. The year 2011 saw the launch of Google Brain, which offered researchers access to robust computing resources and datasets. This platform also introduced advanced functionalities like word embeddings, which enhanced the ability of NLP systems to grasp the context of words more effectively. In 2013, researchers at Google introduced the Word2Vec algorithm for generating word embeddings. Word2Vec is a technique that learns vector representations for words from large amounts of text data. 
It captures the semantic relationships between words more effectively than previous techniques, which led to significant advancements in natural language processing and word representation. Google Brain's contributions paved the way for some major advancements in natural language processing, including the introduction of transformer models in 2017. The advent of transformers marked a turning point in natural language processing, with a self-attention mechanism enabling parallel sequence processing and improved model training times. The transformer would go on to become the basis for almost every subsequent large-scale language model. An example of such a model is BERT. Introduced by Google researchers in 2018, BERT, short for Bidirectional Encoder Representations from Transformers, used the transformer architecture bidirectionally. This enabled the model to understand context by considering both the left and right surroundings of a word. BERT demonstrated state-of-the-art performance on various NLP tasks, triggering extensive research in the domain of pre-trained language models. In 2018, while BERT was evolving, OpenAI introduced the initial model in its GPT series. Subsequently, GPT-2 was launched in 2019, followed by GPT-3 in 2020 and GPT-4 in 2023. GPT, short for Generative Pre-trained Transformer, refers to a family of transformer-based generative language models. These models employ unsupervised learning, training on vast text collections to generate coherent and contextually appropriate text. They established the benchmark for large language models and laid the groundwork for ChatGPT. Since the introduction of BERT and GPT, many other language models have been developed. Some notable examples include RoBERTa by Facebook AI, T5 by Google Research, PaLM 2, which powers the Bard chatbot, also by Google, and Llama 2 by Meta AI. 
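To make two of the techniques mentioned above concrete, here is a minimal sketch (not from the course) of an N-gram count model and of similarity between word vectors. The tiny corpus and the three-dimensional vectors are invented purely for illustration; real Word2Vec embeddings are learned from large corpora and have hundreds of dimensions.

```python
# Toy sketch of two classic NLP ideas: a bigram (2-gram) count model
# and cosine similarity over word vectors. All data here is made up
# for illustration only.
from collections import Counter
import math

# --- Bigram model: predict the next word from co-occurrence counts ---
corpus = "the cat sat on the mat the cat ate".split()
bigrams = Counter(zip(corpus, corpus[1:]))
unigrams = Counter(corpus)

def next_word_prob(prev, word):
    """Maximum-likelihood estimate of P(word | prev) from raw counts."""
    return bigrams[(prev, word)] / unigrams[prev]

# --- Word vectors: cosine similarity as a stand-in for learned embeddings ---
vectors = {
    "king":  [0.9, 0.8, 0.1],   # invented 3-d vectors, not real embeddings
    "queen": [0.9, 0.7, 0.2],
    "apple": [0.1, 0.2, 0.9],
}

def cosine(u, v):
    """Cosine of the angle between two vectors; higher means more similar."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)
```

With these toy inputs, `next_word_prob("the", "cat")` is 2/3 because "the" occurs three times and is followed by "cat" twice, and `cosine` ranks "queen" closer to "king" than "apple" is, which is the kind of semantic relationship Word2Vec captures at scale.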
The evolution of large language models, from the initial rule-based systems to today's transformer-based models, reflects the rapid progress in natural language processing. These strides highlight the boundless opportunities in AI-driven language comprehension and generation. As AI research continues to evolve, we can expect to witness even more sophisticated and versatile language models in the years ahead.
