Skip to main content

LLM Building

Architecture

emerging-llm-app-stack

Emerging Architectures for LLM Applications | Andreessen Horowitz

Transformers, explained: Understand the model behind GPT, BERT, and T5 - YouTube

  • Positional encodings
  • Attention
  • Self attention
  • GPT3 - 45tb of text data

chat-gpt-working

Let’s Architect! Discovering Generative AI on AWS | AWS Architecture Blog

Building

LLM Working

How to train your ChatGPT

Stage 1: Pretraining

  1. Download ~10TB of text
  2. Get a cluster of ~6,000 GPUs
  3. Compress the text into a neural network, pay ~$2M, wait ~12 days
  4. Obtain base model

Stage 2: Finetuning

  1. Write labeling instructions
  2. Hire people (or use scale.ai!), collect 100K high quality ideal Q&A responses, and/or comparisons
  3. Finetune base model on this data, wait ~1 day
  4. Obtain assistant model
  5. Run a lot of evaluations
  6. Deploy
  7. Monitor, collect misbehaviors, go to step 1

LLM Security

  • Jailbreaking
  • Prompt injection
  • Backdoors & data poisoning
  • Adversarial inputs
  • Insecure output handling
  • Data extraction & privacy
  • Data reconstruction
  • Denial of service
  • Escalation
  • Watermarking & evasion
  • Model theft

1hr Talk Intro to Large Language Models - YouTube

Awesome ChatGPT Prompts | This repo includes ChatGPT prompt curation to use ChatGPT better.

SynthID - Google DeepMind

Dev Tools

python -m pip install --upgrade langchain[llm]
pip install chromadb
pip install pypdf

pip install chainlit
chainlit hello

chainlit run document_qa.py

Langchain

Welcome to LangChain - 🦜🔗 LangChain 0.0.180

Langchain Modules

Langchain vs LlamaIndex

Both LangChain & LlamaIndex offer distinct approaches to implementing RAG workflows.

LangChain follows a modular pipeline starting with Document Loaders that handle various file formats, followed by Text Splitters for chunk management, and Embeddings for vector creation.

It then utilizes Vector Stores like SingleStore, FAISS or Chroma for storage, a Retriever for similarity search, and finally, an LLM Chain for response generation. This framework emphasizes composability and flexibility in pipeline construction.

On the other hand, LlamaIndex begins with Data Connectors for multi-source loading, employs a Node Parser for sophisticated document processing, and features diverse Index Construction options including vector, list, and tree structures.

It implements a Storage Context for persistent storage, an advanced Query Engine for retrieval, and Response Synthesis for context integration. LlamaIndex specializes in data indexing and retrieval, offering more sophisticated indexing structures out of the box, while maintaining a focus on ease of use with structured data.

The key distinction lies in their approaches: LangChain prioritizes customization and pipeline flexibility, while LlamaIndex emphasizes structured data handling and advanced indexing capabilities, making each framework suitable for different use cases in RAG implementations.

No matter what AI framework you pick, I always recommend using a robust data platform like SingleStore that supports not just vector storage but also hybrid search, low latency, fast data ingestion, all data types, AI frameworks integration, and much more.

A Beginner’s Guide to Building LLM-Powered Applications with LangChain! - DEV Community

Understanding LlamaIndex in 9 Minutes! - YouTube

Ollama / LM Studio

The easiest way to get up and running with large language models locally.

docker run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama

docker exec -it ollama ollama run llama2

docker exec -it ollama ollama run llama2-uncensored

docker exec -it ollama ollama run mistral

>>> /? # for help

Docker

LM Studio - SUPER EASY Text AI - Windows, Mac & Linux / How To - YouTube

LM Studio - Discover, download, and run local LLMs

oobabooga

A Gradio web UI for Large Language Models. Supports transformers, GPTQ, AWQ, EXL2, llama.cpp (GGUF), Llama models.

GitHub - oobabooga/text-generation-webui: A Gradio web UI for Large Language Models. Supports transformers, GPTQ, AWQ, EXL2, llama.cpp (GGUF), Llama models.

GitHub - oobabooga/text-generation-webui-extensions

Ludwig

Ludwig is an open-source, declarative machine learning framework that makes it easy to define deep learning pipelines with a simple and flexible data-driven configuration system. Ludwig is suitable for a wide variety of AI tasks, and is hosted by the Linux Foundation AI & Data.

Ludwig enables you to apply state-of-the-art tabular, natural language processing, and computer vision models to your existing data and put them into production with just a few short commands.

GitHub - ludwig-ai/ludwig: Low-code framework for building custom LLMs, neural networks, and other AI models

Ludwig

What is Ludwig? - Ludwig

SAAS

LLM Agent

An LLM Agent is a software entity capable of reasoning and autonomously executing tasks.

Resources

Development with Large Language Models Tutorial - OpenAI, Langchain, Agents, Chroma - YouTube

document-based-question-answering-system