
Intro

LLMs make good programmers great, but they do not make bad programmers good

Moving from the information age to the knowledge age

LLM

  • A large language model (LLM) is a type of artificial intelligence program that can recognize and generate text, among other tasks.
  • LLMs are very large models that are pre-trained on vast amounts of data.
  • Built on the transformer architecture, an LLM is a set of neural networks consisting of an encoder and a decoder with self-attention capabilities.
  • It can perform many different tasks, such as answering questions, summarizing documents, translating languages and completing sentences.
  • OpenAI's GPT-3 model has 175 billion parameters and accepts a context of up to 2,048 tokens in each prompt.
  • In simpler terms, an LLM is a computer program that has been fed enough examples to recognize and interpret human language or other types of complex data.
  • The quality of the samples impacts how well an LLM learns natural language, so an LLM's developers may use a more curated data set.
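The self-attention step mentioned above can be sketched in a few lines of numpy. This is a minimal illustration, not GPT-3's actual implementation: the weight matrices, dimensions, and single-head setup are all toy assumptions.

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention over a sequence of token vectors X."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])         # pairwise token affinities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax: each row sums to 1
    return weights @ V                              # weighted mix of value vectors

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))                         # 4 tokens, 8-dim embeddings
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)
print(out.shape)  # (4, 8): one updated vector per input token
```

Each output row is a context-aware blend of all the input tokens, which is what lets transformers relate words across a sentence.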

Types

Base LLM

  • Predicts the next word, based on its text training data
  • Prompt - What is the capital of France?
  • Possible completion - What is France's largest city?
  • Possible completion - What is France's population?
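The behavior above (continuing a question with more quiz-style questions rather than answering it) can be mimicked with a toy next-word predictor. The corpus and bigram-counting approach here are illustrative stand-ins for real pre-training, not how an actual LLM works.

```python
from collections import Counter, defaultdict

# Toy "training data": a base model only learns to continue text like what it saw.
corpus = ("what is the capital of france ? "
          "what is france 's largest city ? "
          "what is france 's population ?").split()

# Count bigrams: for each word, which word tends to follow it.
follow = defaultdict(Counter)
for w, nxt in zip(corpus, corpus[1:]):
    follow[w][nxt] += 1

def predict_next(word):
    """Greedy next-word prediction from bigram counts."""
    return follow[word].most_common(1)[0][0]

# After a question mark, the most likely continuation is... another question.
print(predict_next("?"))   # 'what'
```

Because the training text was a list of questions, the "model" continues a question with another question instead of answering it, which is exactly the Base LLM behavior shown above.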

Instruction Tuned LLM

  • Fine-tuned on instruction-response examples (often with human feedback) to follow instructions rather than just continue text
  • Prompt - What is the capital of France?
  • Ans - The capital of France is Paris.

GPT-3 / GPT-4

Generative Pre-trained Transformer 3 (GPT-3) is an autoregressive language model that uses deep learning to produce human-like text. Given an initial text as a prompt, it will produce text that continues the prompt.

The architecture is a standard transformer network (with a few engineering tweaks) with the unprecedented size of a 2048-token-long context and 175 billion parameters (requiring 800 GB of storage). The training method is "generative pretraining", meaning that it is trained to predict what the next token is. The model demonstrated strong few-shot learning on many text-based tasks.
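The "autoregressive" generation described above is just a loop: predict one token, append it, and feed the longer sequence back in, truncated to the context window. This sketch uses a hypothetical stand-in model; only the 2048-token context length comes from the text.

```python
import numpy as np

def generate(model, prompt_ids, n_new, context_len=2048):
    """Autoregressive decoding: each step feeds the growing sequence back in."""
    ids = list(prompt_ids)
    for _ in range(n_new):
        window = ids[-context_len:]       # GPT-3 sees at most 2048 tokens
        logits = model(window)            # a score for every vocabulary token
        next_id = int(np.argmax(logits))  # greedy pick (sampling is also common)
        ids.append(next_id)
    return ids

# Hypothetical stand-in model: always prefers token (last_id + 1) mod vocab.
vocab = 10
toy_model = lambda window: np.eye(vocab)[(window[-1] + 1) % vocab]
print(generate(toy_model, [3], 4))  # [3, 4, 5, 6, 7]
```

Real models replace `toy_model` with a transformer forward pass, but the decoding loop has the same shape.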

Past Present & Future

Datasets

MMLU (Massive Multitask Language Understanding) - a benchmark dataset, listed on Papers With Code
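MMLU items are multiple-choice questions scored by plain accuracy. The two items below are made up for illustration and are not from the real dataset; only the four-choice, letter-keyed shape is assumed.

```python
# Hypothetical items in MMLU's shape: a question, four choices, one answer key.
items = [
    {"question": "2 + 2 = ?",
     "choices": ["3", "4", "5", "6"], "answer": "B"},
    {"question": "What is the capital of France?",
     "choices": ["Paris", "Rome", "Oslo", "Bern"], "answer": "A"},
]

def accuracy(predict, items):
    """Fraction of items where the model's chosen letter matches the key."""
    correct = sum(predict(it["question"], it["choices"]) == it["answer"]
                  for it in items)
    return correct / len(items)

always_a = lambda question, choices: "A"  # trivial baseline "model"
print(accuracy(always_a, items))          # 0.5
```

A real evaluation would swap `always_a` for a call to an actual model, but the scoring logic stays this simple.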