
Intro

LLMs make good programmers great, but they do not make bad programmers good

Moving from the information age to the knowledge age

LLM

  • A large language model (LLM) is a type of artificial intelligence program that can recognize and generate text, among other tasks.
  • LLMs are very large models that are pre-trained on vast amounts of data.
  • Built on the transformer architecture, an LLM is a set of neural networks consisting of an encoder and a decoder with self-attention capabilities.
  • It can perform many different tasks, such as answering questions, summarizing documents, translating languages and completing sentences.
  • OpenAI's GPT-3 model has 175 billion parameters and accepts a context of up to 2,048 tokens in each prompt.
  • In simpler terms, an LLM is a computer program that has been fed enough examples to recognize and interpret human language or other types of complex data.
  • The quality of the samples impacts how well an LLM learns natural language, so an LLM's developers may use a more curated data set.
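The self-attention step mentioned above can be sketched in a few lines of numpy. This is a minimal illustration, not GPT-3's actual implementation: the weight matrices, dimensions, and single-head setup are all toy assumptions.

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention over a sequence of token vectors X."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])         # pairwise token affinities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax: each row sums to 1
    return weights @ V                              # weighted mix of value vectors

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))                         # 4 tokens, 8-dim embeddings
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)
print(out.shape)  # (4, 8): one updated vector per input token
```

Each output row is a context-aware blend of all the input tokens, which is what lets transformers relate words across a sentence.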

Types

Base LLM

  • Predicts the next word, based on its text training data
  • Prompt - What is the capital of France?
  • Possible completion - What is France's largest city?
  • Possible completion - What is France's population?
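The behavior above (continuing a question with more quiz-style questions rather than answering it) can be mimicked with a toy next-word predictor. The corpus and bigram-counting approach here are illustrative stand-ins for real pre-training, not how an actual LLM works.

```python
from collections import Counter, defaultdict

# Toy "training data": a base model only learns to continue text like what it saw.
corpus = ("what is the capital of france ? "
          "what is france 's largest city ? "
          "what is france 's population ?").split()

# Count bigrams: for each word, which word tends to follow it.
follow = defaultdict(Counter)
for w, nxt in zip(corpus, corpus[1:]):
    follow[w][nxt] += 1

def predict_next(word):
    """Greedy next-word prediction from bigram counts."""
    return follow[word].most_common(1)[0][0]

# After a question mark, the most likely continuation is... another question.
print(predict_next("?"))   # 'what'
```

Because the training text was a list of questions, the "model" continues a question with another question instead of answering it, which is exactly the Base LLM behavior shown above.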

Instruction Tuned LLM

  • Fine-tuned on instruction-response examples (often with human feedback) to follow instructions rather than just continue text
  • Prompt - What is the capital of France?
  • Ans - The capital of France is Paris.

GPT-3 / GPT-4

Generative Pre-trained Transformer 3 (GPT-3) is an autoregressive language model that uses deep learning to produce human-like text. Given an initial text as a prompt, it will produce text that continues the prompt.

The architecture is a standard transformer network (with a few engineering tweaks) with the unprecedented size of a 2048-token-long context and 175 billion parameters (requiring 800 GB of storage). The training method is "generative pretraining", meaning that it is trained to predict what the next token is. The model demonstrated strong few-shot learning on many text-based tasks.
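The "autoregressive" generation described above is just a loop: predict one token, append it, and feed the longer sequence back in, truncated to the context window. This sketch uses a hypothetical stand-in model; only the 2048-token context length comes from the text.

```python
import numpy as np

def generate(model, prompt_ids, n_new, context_len=2048):
    """Autoregressive decoding: each step feeds the growing sequence back in."""
    ids = list(prompt_ids)
    for _ in range(n_new):
        window = ids[-context_len:]       # GPT-3 sees at most 2048 tokens
        logits = model(window)            # a score for every vocabulary token
        next_id = int(np.argmax(logits))  # greedy pick (sampling is also common)
        ids.append(next_id)
    return ids

# Hypothetical stand-in model: always prefers token (last_id + 1) mod vocab.
vocab = 10
toy_model = lambda window: np.eye(vocab)[(window[-1] + 1) % vocab]
print(generate(toy_model, [3], 4))  # [3, 4, 5, 6, 7]
```

Real models replace `toy_model` with a transformer forward pass, but the decoding loop has the same shape.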

Past Present & Future

Datasets

MMLU (Massive Multitask Language Understanding) - a benchmark dataset, listed on Papers With Code
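MMLU items are multiple-choice questions scored by plain accuracy. The two items below are made up for illustration and are not from the real dataset; only the four-choice, letter-keyed shape is assumed.

```python
# Hypothetical items in MMLU's shape: a question, four choices, one answer key.
items = [
    {"question": "2 + 2 = ?",
     "choices": ["3", "4", "5", "6"], "answer": "B"},
    {"question": "What is the capital of France?",
     "choices": ["Paris", "Rome", "Oslo", "Bern"], "answer": "A"},
]

def accuracy(predict, items):
    """Fraction of items where the model's chosen letter matches the key."""
    correct = sum(predict(it["question"], it["choices"]) == it["answer"]
                  for it in items)
    return correct / len(items)

always_a = lambda question, choices: "A"  # trivial baseline "model"
print(accuracy(always_a, items))          # 0.5
```

A real evaluation would swap `always_a` for a call to an actual model, but the scoring logic stays this simple.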