Building

Architecture

emerging-llm-app-stack

Emerging Architectures for LLM Applications | Andreessen Horowitz

Transformers, explained: Understand the model behind GPT, BERT, and T5 - YouTube

Positional encodings
Attention
Self attention
GPT3 - 45tb of text data

chat-gpt-working

Let’s Architect! Discovering Generative AI on AWS | AWS Architecture Blog

Building

GitHub - karpathy/nanoGPT: The simplest, fastest repository for training/finetuning medium-sized GPTs. ⭐ 57k

LLM Working

Decoding Strategies

Greedy Search
Beam search in Large Language Models (LLMs) is a decoding strategy that explores multiple potential output sequences simultaneously, keeping track of the most promising "beams" (or sequences) at each step, to find the most likely output.
Decoding Demystified : How LLMs Generate Text - III - DEV Community
Decoding Strategies in Large Language Models – Maxime Labonne
Decoding Strategies in Large Language Models
Decoding Strategies: How LLMs Choose The Next Word
Understanding greedy search and beam search | by Jessica López Espejel | Medium

How to train your ChatGPT

Stage 1: Pretraining

Download ~10TB of text
Get a cluster of ~6,000 GPUs
Compress the text into a neural network, pay ~$2M, wait ~12 days
Obtain base model

Stage 2: Finetuning

Write labeling instructions
Hire people (or use scale.ai!), collect 100K high quality ideal Q&A responses, and/or comparisons
Finetune base model on this data, wait ~1 day
Obtain assistant model
Run a lot of evaluations
Deploy
Monitor, collect misbehaviors, go to step 1

LLM Security

Jailbreaking
Prompt injection
Backdoors & data poisoning
Adversarial inputs
Insecure output handling
Data extraction & privacy
Data reconstruction
Denial of service
Escalation
Watermarking & evasion
Model theft

1hr Talk Intro to Large Language Models - YouTube

Awesome ChatGPT Prompts | This repo includes ChatGPT prompt curation to use ChatGPT better.

SynthID - Google DeepMind

Distillation Attack

A distillation attack occurs when someone systematically queries a proprietary AI model and uses its outputs to train a smaller or competing model, effectively ‘stealing’ its capabilities without access to the original weights.

Instead of copying parameters, the attacker copies behaviour by treating the target model as a teacher and learning from its responses at scale. For developers building AI products, this raises serious concerns around API exposure, rate limits, watermarking, and model output monitoring.

Others

People have a lot of implicit knowledge — things we know but struggle to fully explain. People often use body-oriented metaphors for this phenomenon. We say that an insight is “on the tip of our tongue,” that we “can’t put our finger on” an idea, or that we know something “in our gut.”

Something similar is true of LLMs: their ability to perform cognitive tasks greatly exceeds their ability to explicitly explain how and why they’re able to perform them.

Dev Tools

Ollama / LM Studio

The easiest way to get up and running with large language models locally.

docker run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama

docker exec -it ollama ollama run llama2

docker exec -it ollama ollama run llama2-uncensored

docker exec -it ollama ollama run mistral

>>> /? # for help

Options

LM Studio – A polished desktop application with a built-in search for finding and downloading models from Hugging Face. It is the easiest way to visually manage models and adjust hardware settings like GPU offloading without touching a terminal.
Ollama – A lightweight command-line tool that runs as a background service to serve models via a simple API. It is the best choice for developers who want to integrate local AI into other apps or run models with a single terminal command.
MLX-LM – Apple’s official framework optimized specifically for Apple Silicon to achieve the highest possible inference speeds. It is the "performance king" for those comfortable with Python who want to squeeze every drop of power from their Mac's GPU.
Jan.ai – An open-source, privacy-focused assistant that provides a clean chat interface similar to ChatGPT but entirely offline. It is ideal for users who want organized chat history, file uploads, and a "set-it-and-forget-it" local workspace.
GPT4All – A beginner-friendly app designed to run efficiently on standard CPUs without needing a powerful graphics card. It features a built-in "LocalDocs" tool that lets you chat privately with your own PDF and text collections out of the box.

Ludwig

Ludwig is an open-source, declarative machine learning framework that makes it easy to define deep learning pipelines with a simple and flexible data-driven configuration system. Ludwig is suitable for a wide variety of AI tasks, and is hosted by the Linux Foundation AI & Data.

Ludwig enables you to apply state-of-the-art tabular, natural language processing, and computer vision models to your existing data and put them into production with just a few short commands.

GitHub - ludwig-ai/ludwig: Low-code framework for building custom LLMs, neural networks, and other AI models ⭐ 12k

Ludwig

What is Ludwig? - Ludwig

Others

GitHub - LMCache/LMCache: Supercharge Your LLM with the Fastest KV Cache Layer ⭐ 8.1k

SAAS

Resources

document-based-question-answering-system

Architecture​

Building​

Decoding Strategies​

How to train your ChatGPT​

Stage 1: Pretraining​

Stage 2: Finetuning​

LLM Security​

Distillation Attack​

Others​

Dev Tools​

Ollama / LM Studio​

Options​

Ludwig​

Others​

SAAS​

Resources​

Architecture

Building

Decoding Strategies

How to train your ChatGPT

Stage 1: Pretraining

Stage 2: Finetuning

LLM Security

Distillation Attack

Others

Dev Tools

Ollama / LM Studio

Options

Ludwig

Others

SAAS

Resources