Libraries
Top Python libraries of 2023 | Tryolabs
The 10 main picks
- LiteLLM - call any LLM using OpenAI format, and more
- PyApp - deploy self contained Python applications anywhere
- Taipy - build UIs for data apps, even in production
- MLX - machine learning on Apple silicon with NumPy-like API
- Unstructured - the ultimate toolkit for text preprocessing
- ZenML and AutoMLOps - portable, production-ready MLOps pipelines
- WhisperX - speech recognition with word-level timestamps & diarization
- AutoGen - LLM conversational collaborative suite
- Guardrails - babysit LLMs so they behave as intended
- Temporian - the “Pandas” built for preprocessing temporal data
Runners-up
Causal inference
- CausalTune - a library for automated tuning and selection for causal estimators.
- CausalPy - A Python package for causal inference in quasi-experimental settings.
- PyWhy-LLM - experimental library integrating LLM capabilities to support causal analyses.
CLI LLM Tools
- Chatblade - ChatGPT on the command line, providing utility methods to extract JSON or Markdown from ChatGPT responses.
- Elia - A terminal ChatGPT client built with Textual.
- Gorilla CLI - powers your command-line interactions with a user-centric tool. Simply state your objective, and Gorilla CLI will generate potential commands for execution.
- LLM - A CLI utility and Python library for interacting with Large Language Models, both via remote APIs and models that can be installed and run on your own machine. By the author of Datasette.
Code Tools
- Chainlit - “the Streamlit for ChatGPT”, create ChatGPT-like UIs on top of any Python code in minutes!
- pydistcheck - Linter that finds portability issues in Python package distributions (wheels and sdists).
- pyxet - lightweight interface for the XetHub platform, a blob-store with a filesystem like interface and git capabilities.
Code Review
- llm-code-review (luiyen/llm-code-review) - a container GitHub Action that reviews a pull request using a Hugging Face LLM.
- Revolutionizing Code Review with Large Language Models: A Deep Dive into code2prompt and its Peers - Medium article by Pınar Ersoy (ANOLYTICS, June 2024).
Computer vision
- deepdoctection - orchestrates document extraction and document layout analysis tasks using deep learning models.
- FaceFusion - Next generation face swapper and enhancer.
- MetaSeg - packaged version of the Segment Anything Model (SAM).
- VTracer - open source software to convert raster images (like jpg & png) into vector graphics (svg)
Data and Features
- Adala - Autonomous DAta (Labeling) Agent framework.
- Autolabel - Label, clean and enrich text datasets with LLMs.
- balance - simple workflow and methods for dealing with biased data samples when inferring from them to a target population of interest. See launch blog post. By Meta.
- Bytewax - Python framework that simplifies event and stream processing. Because Bytewax couples the stream and event processing capabilities of Flink, Spark, and Kafka Streams with the friendly and familiar interface of Python, you can re-use the Python libraries you already know and love.
- Featureform - feature store. Turn your existing data infrastructure into a feature store.
- Galactic - cleaning and curation tools for massive unstructured text datasets. See the post by Ben on X.
- Great Expectations - helps data teams build a shared understanding of their data through quality testing, documentation, and profiling.
- Firecrawl (mendableai/firecrawl) - turn entire websites into LLM-ready markdown or structured data. Scrape, crawl and extract with a single API.
Data Visualization
- PyGWalker - turn your pandas DataFrame into a Tableau-style User Interface for visual analysis.
- Vizro - a toolkit for creating modular data visualization applications. By McKinsey.
Embeddings and Vector DBs
- Epsilla - a high performance Vector Database Management System, focused on scalability, high performance, and cost-effectiveness of vector search.
- LanceDB - open-source database for vector-search built with persistent storage, which greatly simplifies retrieval, filtering and management of embeddings.
- SeaGOAT - local search tool that leverages vector embeddings to let you search your codebase semantically.
- Text Embeddings Inference - A blazing fast inference solution for text embeddings models.
Federated learning
- Flower - A Friendly Federated Learning Framework.
- MetisFL - federated learning framework that allows developers to easily federate their machine learning workflows and train their models across distributed data silos without ever collecting the data in a centralized location.
Generative AI
- AudioCraft - library for audio processing and generation with deep learning. By Meta.
- Image Eval - A toolkit for evaluating your favorite image generation models. LinkedIn Launch Post.
- imaginAIry - Pythonic generation of stable diffusion images.
- Modular Diffusion - Python library for designing and training your own Diffusion Models with PyTorch.
- SapientML - Generative AutoML for Tabular Data.
LLM Accuracy Enhancements
- AutoChain - build lightweight, extensible, and testable LLM agents.
- Auto-GPT - An experimental open-source attempt to make GPT-4 fully autonomous.
- Autotrain-Advanced - faster and easier training and deployments of state-of-the-art machine learning models.
- DSPy - framework for solving advanced tasks with language models (LMs) and retrieval models (RMs). DSPy unifies techniques for prompting and fine-tuning LMs - and approaches for reasoning and tool/retrieval augmentation. By Stanford NLP.
- GPTCache - a library for creating a semantic cache to store responses from LLM queries.
- Neural-Cherche - fine-tune neural search models such as Splade, ColBERT, and SparseEmbed on a specific dataset, and run efficient inference on a fine-tuned retriever or ranker.
- MemGPT - Teaching LLMs memory management for unbounded context 📚🦙.
- nanoGPT - The simplest, fastest repository for training/finetuning medium-sized GPTs.
- Promptify - common prompts that work well to leverage LLMs for a variety of scenarios.
- SymbolicAI - Compositional Differentiable Programming Library.
- zep - a long-term memory store for LLM / Chatbot applications. Easily add relevant documents, chat history memory & rich user data to your LLM app's prompts.
- Composio (ComposioHQ/composio) - equips agents with well-crafted tools, empowering them to tackle complex tasks.
LLM App Building
- autollm - Ship RAG based LLM web apps in seconds.
- Chidori - reactive runtime for building AI agents that are reactive, observable, and robust. It supports building agents with Node.js, Python, and Rust.
- FastChat - open platform for training, serving, and evaluating large language model based chatbots.
- GPTRouter - smoothly manage multiple LLMs and image models, speed up responses, and ensure non-stop reliability. Similar to LiteLLM, our top pick!
- guidance - a guidance language for controlling large language models.
- haystack - end-to-end NLP framework that enables you to build NLP applications powered by LLMs, Transformer models, vector search and more.
- Instructor - interact with OpenAI’s function call API from Python code, with Python structs / objects.
- Jsonformer - A Bulletproof Way to Generate Structured JSON from Language Models
- Langroid - easily build LLM-powered applications. Set up Agents, equip them with optional components (LLM, vector-store and methods), assign them tasks, and have them collaboratively solve a problem by exchanging messages.
- LLM App - build innovative AI applications by providing real-time human-like responses to user queries based on the most up-to-date knowledge available in your data sources.
- maccarone - AI-managed code blocks in Python, lets you delegate sections of your Python program to AI ownership.
- magentic - prompt LLMs as simple Python functions using decorators.
- Semantic Kernel - integrate cutting-edge LLM technology quickly and easily into your apps. Microsoft’s “version” of LangChain.
- ControlFlow - a Python framework for building agentic AI workflows.
LLM Code Tools
- aider - command line tool that lets you pair program with GPT-3.5/GPT-4, to edit code stored in your local git repository.
- ChatGDB - Harness the power of ChatGPT inside the GDB debugger!
- Dataherald - natural language-to-SQL engine built for enterprise-level question answering over structured data. HN launch post.
- FauxPilot - open-source GitHub Copilot server.
- GPT Engineer - Specify what you want it to build, the AI asks for clarification, and then builds it.
- gpt-repository-loader - command-line tool that converts the contents of a Git repository into a text format that can be interpreted by LLMs.
- ipython-gpt - extension that allows you to use ChatGPT directly from your Jupyter Notebook or IPython Shell.
- Jupyter AI - generative AI extension for JupyterLab.
- PlotAI - use ChatGPT to create plots in Python and Matplotlib directly in your Python script or notebook.
- sketch - AI code-writing assistant for pandas users that understands the context of your data, greatly improving the relevance of suggestions.
LLM Development
- distilabel - AI Feedback framework for scalable LLM alignment.
- language-model-arithmetic - controlled text generation via language model arithmetic.
- Lit-GPT - Hackable implementation of state-of-the-art open-source LLMs based on nanoGPT. Supports flash attention, 4-bit and 8-bit quantization, LoRA and LLaMA-Adapter fine-tuning, pre-training.
- Lit-LLaMA - Implementation of the LLaMA language model based on nanoGPT. Supports flash attention, Int8 and GPTQ 4bit quantization, LoRA and LLaMA-Adapter fine-tuning, pre-training.
- LMQL - a query language for programming (large) language models.
LLM Experimentation
- ChainForge - open-source visual programming environment for battle-testing prompts to LLMs.
- Langflow - UI for LangChain, designed with react-flow to provide an effortless way to experiment and prototype flows.
- PromptTools - a set of open-source, self-hostable tools for experimenting with, testing, and evaluating LLMs, vector databases, and prompts. HN launch post.
LLM Serving
- Aviary - an LLM serving solution that makes it easy to deploy and manage a variety of open source LLMs. By the authors of Ray.
- GPT4All - an ecosystem to run powerful and customized large language models that work locally on consumer grade CPUs and any GPU (ex pygpt4all/pyllamacpp) with python bindings.
- LLM Engine - engine for fine-tuning and serving large language models. By Scale AI.
- LLM Gateway - gateway for secure & reliable communications with OpenAI and other LLM providers.
- punica - serve multiple LoRA fine-tuned LLMs as one.
- Ollama - Get up and running with Llama 2 and other large language models locally.
- OnPrem.LLM - tool for running on-premises large language models with non-public data.
- OpenLLM - An open platform for operating large language models (LLMs) in production. Fine-tune, serve, deploy, and monitor any LLMs with ease. By BentoML.
- OpenLLMetry - Open-source observability for your LLM application, based on OpenTelemetry.
- privateGPT - Interact privately with your documents using the power of GPT, 100% privately, no data leaks.
LLM Tools
- IncarnaMind - Connect and chat with your multiple documents (pdf and txt) through GPT and Claude LLMs in a minute.
- Puncia - leveraging AI and other tools, it will tell you everything about a web domain or subdomain, like finding hidden subdomains.
- scrapeghost - experimental library for scraping websites using OpenAI's GPT API.
MLOps, LLMOps, DevOps
- phoenix - ML Observability in a Notebook - Uncover Insights, Surface Problems, Monitor, and Fine Tune your Generative LLM, CV and Tabular Models.
Multimodal AI Tools
- LLaVA - Visual Instruction Tuning - Large Language-and-Vision Assistant built towards multimodal GPT-4 level capabilities.
- Multimodal-Maestro - effective prompting for Large Multimodal Models like GPT-4 Vision, LLaVA or CogVLM.
- Nougat - the academic document PDF parser that understands LaTeX math and tables.
- UForm - Pocket-Sized Multi-Modal AI For Semantic Search & Recommendation Systems.
Python ML
- difflogic - A Library for Differentiable Logic Gate Networks by Felix Petersen.
- TensorDict - a dictionary-like class that inherits properties from tensors, such as indexing, shape operations, casting to device etc. The main purpose of TensorDict is to make code-bases more readable and modular by abstracting away tailored operations.
Performance and scalability
- AITemplate - Python framework which renders neural networks into high performance CUDA/HIP C++ code. Specialized for FP16 TensorCore (NVIDIA GPU) and MatrixCore (AMD GPU) inference.
- AutoGPTQ - easy-to-use LLMs quantization package with user-friendly APIs, based on GPTQ algorithm.
- composer - PyTorch library that enables you to train neural networks faster, at lower cost, and to higher accuracy. Implements more than two dozen speedup methods that can be applied to your training loop in just a few lines of code.
- fastLLaMa - Python wrapper to run Inference of LLaMA models using C++.
- hidet - open-source deep learning compiler, written in Python. It supports end-to-end compilation of DNN models from PyTorch and ONNX to efficient CUDA kernels.
- LPython - compiler that aggressively optimizes type-annotated Python code. It has several backends, including LLVM, C, C++, and WASM. LPython’s primary tenet is speed. Launch blog post.
- Petals - Run 100B+ language models at home, BitTorrent-style. Fine-tuning and inference up to 10x faster than offloading.
- TokenMonster - determine the tokens that optimally represent a dataset at any specific vocabulary size.
- LLMLingua (microsoft/LLMLingua) - compresses the prompt and KV cache to speed up LLM inference and improve the model's perception of key information, achieving up to 20x compression with minimal performance loss.
Python Programming
- Django Ninja CRUD - declarative CRUD Endpoints & Tests with Django Ninja.
- DotDict - A simple Python library to make chained attributes possible.
- grai-core - Data lineage made simple. Grai makes it easy to understand and test how your data relates across databases, warehouses, APIs and dashboards. HN launch blog post.
- pypipe - Python pipe command line tool.
- ReactPy - library for building user interfaces in Python without JavaScript, made from components which look and behave similarly to those found in ReactJS.
- Reflex - open source framework to build web apps in pure Python. Launch announcement.
- scrat - caching of expensive function results, like lru_cache but with persistence to disk.
- svcs - a dependency container for Python.
- view.py - lightning-fast, modern web framework. Currently in a very high alpha stage of development. HN launch post.
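
The scrat entry above describes caching "like lru_cache but with persistency to disk". As a rough illustration of that idea only, here is a minimal sketch built on the standard library. It is not scrat's API: the disk_cache decorator, the pickle-file layout, and the slow_square function are all hypothetical names made up for this example.

```python
import functools
import hashlib
import pickle
import tempfile
from pathlib import Path


def disk_cache(cache_dir):
    """Hypothetical decorator: memoize results as pickle files under cache_dir."""
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # Key on the function name and the repr of its arguments.
            # This is a simplification: it assumes arguments have stable reprs.
            raw_key = repr((func.__name__, args, sorted(kwargs.items())))
            key = hashlib.sha256(raw_key.encode()).hexdigest()
            path = cache_dir / f"{key}.pkl"
            if path.exists():
                # Cache hit: load the stored result instead of recomputing.
                return pickle.loads(path.read_bytes())
            result = func(*args, **kwargs)
            path.write_bytes(pickle.dumps(result))
            return result

        return wrapper

    return decorator


calls = []  # records every real execution of the function body


@disk_cache(tempfile.mkdtemp())
def slow_square(x):
    calls.append(x)
    return x * x


first = slow_square(4)   # computed and written to disk
second = slow_square(4)  # read back from disk, body not re-run
```

Unlike functools.lru_cache, which keeps results in memory and loses them when the process exits, the files written here would survive a restart if cache_dir pointed at a permanent location rather than a temporary directory.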
Optimization / Math
- Lineax - a JAX library for linear solves and linear least squares. Launch Tweet.
- pyribs - a bare-bones Python library for quality diversity optimization.
- Integrate Generative AI Into Your Applications Using LLMs - YouTube
Reinforcement Learning
- cheese - adaptive human in the loop evaluation of language and embedding models.
- imitation - Clean PyTorch implementations of imitation and reward learning algorithms.
- RL4LMs - modular RL library to fine-tune language models to human preferences. By AI2.
- trlX - distributed training of language models with Reinforcement Learning via Human Feedback (RLHF).
Time Series
- aeon - A unified framework for machine learning with time series.
Video Processing
- VapourSynth - video processing framework with simplicity in mind. Python docs.