Libraries
Deep Graph Library (DGL)
Python package built to ease deep learning on graphs, on top of existing DL frameworks.
Deep Graph Library (DGL) is a Python package for easily implementing the graph neural network model family on top of existing DL frameworks (currently PyTorch, MXNet, and TensorFlow). It offers versatile control of message passing, speed optimization via auto-batching and highly tuned sparse matrix kernels, and multi-GPU/CPU training to scale to graphs with hundreds of millions of nodes and edges.
https://docs.dgl.ai/index.html
https://github.com/dmlc/dgl/tree/master/examples
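To make the idea of message passing concrete, here is a minimal sketch in plain Python. It does not use DGL's API; the function name `message_passing_step` and the toy graph are illustrative assumptions. Each node sums the features of the nodes that point to it, which is one round of sum aggregation.

```python
def message_passing_step(edges, features):
    """One round of sum aggregation over directed edges (src, dst).

    Hypothetical helper for illustration only, not part of DGL.
    """
    # Every node starts with an empty (zero) mailbox.
    aggregated = {node: 0.0 for node in features}
    for src, dst in edges:
        # The source node sends its feature along the edge;
        # the destination node adds it to what it has received so far.
        aggregated[dst] += features[src]
    return aggregated


# Toy directed graph: 0 -> 1, 1 -> 2, 0 -> 2
edges = [(0, 1), (1, 2), (0, 2)]
features = {0: 1.0, 1: 2.0, 2: 3.0}
updated = message_passing_step(edges, features)
print(updated)  # {0: 0.0, 1: 1.0, 2: 3.0}
```

A library like DGL performs the same kind of aggregation with batched sparse matrix kernels instead of a Python loop.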
Libraries
dlib
- Implementations of state-of-the-art CV and ML algorithms (including face recognition)
- Dlib is a modern C++ toolkit containing machine learning algorithms and tools for creating complex software in C++ to solve real world problems. It is used in both industry and academia in a wide range of domains including robotics, embedded devices, mobile phones, and large high performance computing environments.
scikit-image
- Collection of algorithms for image processing. Contains some algorithm implementations that OpenCV does not.
SimpleCV
Imbalanced-learn - ML
https://pypi.org/project/imbalanced-l
Theano - deep learning library
http://deeplearning.net/software/theano
LightGBM - machine learning
https://github.com/microsoft/LightGBM
Eli5 - machine learning
PyMC3 - Probabilistic Programming in Python
fastText (by facebookResearch)
FastText is an open-source, free, lightweight library that allows users to learn text representations and text classifiers. It works on standard, generic hardware. Models can later be reduced in size to even fit on mobile devices.
CMU Sphinx
CMU Sphinx, or Sphinx for short, is the general term for a group of speech recognition systems developed at Carnegie Mellon University. These include a series of speech recognizers (Sphinx 2 to 4) and an acoustic model trainer (SphinxTrain).
SymPy
SymPy - algebraic evaluation, differentiation, expansion, complex numbers
https://www.sympy.org/en/index.html
https://wordsandbuttons.online/sympy_makes_math_fun_again.html
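A short sketch of the operations listed above (evaluation, differentiation, expansion, complex numbers), assuming SymPy is installed. The expressions are arbitrary examples chosen for illustration.

```python
from sympy import I, diff, expand, symbols

x = symbols("x")
expr = (x + 1) ** 2

expanded = expand(expr)              # x**2 + 2*x + 1
derivative = diff(expanded, x)       # 2*x + 2
value = expr.subs(x, 3)              # 16

# Complex arithmetic with the imaginary unit I
z = expand((1 + 2 * I) * (3 - I))    # 5 + 5*I

print(expanded, derivative, value, z)
```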
NetworkX
NetworkX is a Python package for the creation, manipulation, and study of the structure, dynamics, and functions of complex networks.
Features
- Data structures for graphs, digraphs, and multigraphs
- Many standard graph algorithms
- Network structure and analysis measures
- Generators for classic graphs, random graphs, and synthetic networks
- Nodes can be "anything" (e.g., text, images, XML records)
- Edges can hold arbitrary data (e.g., weights, time-series)
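A minimal sketch of these features, assuming NetworkX is installed. The city names, the `kind` attribute, and the weights are made-up illustration data.

```python
import networkx as nx

G = nx.Graph()

# Nodes can be any hashable object and can carry attributes.
G.add_node("paris", kind="city")

# Edges can hold arbitrary data, here a numeric weight.
G.add_edge("paris", "lyon", weight=4.0)
G.add_edge("lyon", "nice", weight=3.0)
G.add_edge("paris", "nice", weight=9.0)

# A standard graph algorithm: weighted shortest path.
path = nx.shortest_path(G, "paris", "nice", weight="weight")
print(path)  # ['paris', 'lyon', 'nice']

# A simple structure measure: the degree of each node.
degrees = dict(G.degree())
print(degrees)
```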
Other tools
- igraph
- SNAP
https://www.toptal.com/data-science/graph-data-science-python-networkx
Graph-tool
Graph-tool is an efficient Python module for manipulation and statistical analysis of graphs (a.k.a. networks).
GeoPy
Converts a given address string to a location, i.e. latitude and longitude values.
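A minimal sketch of address lookup, assuming geopy is installed and using its Nominatim (OpenStreetMap) geocoder as one possible backend. The helper names `address_to_latlon` and `build_nominatim_geocoder` and the `user_agent` string are hypothetical. The geocoder is passed in as an argument so the lookup logic can be exercised without network access.

```python
from typing import Optional, Tuple


def address_to_latlon(address: str, geocoder) -> Optional[Tuple[float, float]]:
    """Return (latitude, longitude) for an address, or None if not found.

    `geocoder` is any object with a geopy-style `geocode(query)` method.
    """
    location = geocoder.geocode(address)
    if location is None:
        return None
    return (location.latitude, location.longitude)


def build_nominatim_geocoder():
    """Create a Nominatim geocoder (needs geopy and network access)."""
    from geopy.geocoders import Nominatim

    # Nominatim requires an identifying user agent; this one is a placeholder.
    return Nominatim(user_agent="my-notes-example")
```

Usage would look like `address_to_latlon("175 5th Avenue NYC", build_nominatim_geocoder())`; the result depends on the live geocoding service.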
Biotite
The Biotite package bundles popular tools in computational biology into a unifying framework. It offers file I/O operations, analyses and manipulations for biological sequence and structure data. Furthermore, the package provides interfaces for popular biological databases and external software.
The internal structure and sequence representations are based on NumPy ndarrays, taking advantage of C-accelerated operations. Time-consuming operations that could not be vectorised are mostly implemented in Cython in order to achieve C-acceleration in those places, too.
Additionally, the package aims for simple usability and extensibility: the objects representing structures and sequences can be indexed and sliced like an ndarray. Even the actual internal ndarrays are easily accessible, allowing advanced users to implement their own algorithms upon the existing types.
Vaex
Vaex is a high-performance Python library for lazy out-of-core DataFrames (similar to Pandas), to visualize and explore big tabular datasets. It calculates statistics such as mean, sum, count, standard deviation, etc., on an N-dimensional grid for more than a billion (10^9) samples/rows per second. Visualization is done using histograms, density plots and 3D volume rendering, allowing interactive exploration of big data. Vaex uses memory mapping, a zero-memory-copy policy and lazy computations for best performance (no memory wasted).
https://github.com/vaexio/vaex
Polars
Polars is a blazingly fast DataFrame library implemented in Rust, using the Apache Arrow columnar format as its memory model.
- Lazy | eager execution
- Multi-threaded
- SIMD
- Query optimization
- Powerful expression API
- Hybrid Streaming (larger than RAM datasets)
Metatron
Self service solution for Big Data Discovery
All-in-one analytics from easy data preparation to fast visualization
Lyft Amundsen Data Discovery Metadata Engine
https://eng.lyft.com/amundsen-lyfts-data-discovery-metadata-engine-62d27254fbb9
Metacat
Making Big Data Discoverable and Meaningful at Netflix
Genie
Genie is a completely open source distributed job orchestration engine developed by Netflix. Genie provides RESTful APIs to run a variety of big data jobs such as Hadoop, Pig, Hive, Spark, Presto, Sqoop and more. It also provides APIs for managing the metadata of many distributed processing clusters and the commands and applications that run on them.
https://netflix.github.io/genie
- Stochastic Gradient Descent
- Backpropagation
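A minimal sketch of stochastic gradient descent in plain Python, fitting a line to noise-free points. The function name `sgd_linear_fit`, the learning rate, and the data are illustrative assumptions. The gradients come from the chain rule on a single-layer model, which is the same rule backpropagation applies layer by layer in deeper networks.

```python
import random


def sgd_linear_fit(data, lr=0.05, epochs=200, seed=0):
    """Fit y = w * x + b by SGD on the squared error 0.5 * (pred - y) ** 2."""
    rng = random.Random(seed)
    data = list(data)
    w, b = 0.0, 0.0
    for _ in range(epochs):
        rng.shuffle(data)          # "stochastic": visit samples in random order
        for x, y in data:
            pred = w * x + b       # forward pass
            err = pred - y         # dL/dpred
            w -= lr * err * x      # chain rule: dL/dw = dL/dpred * x
            b -= lr * err          # chain rule: dL/db = dL/dpred * 1
    return w, b


# Points on the line y = 2x + 1
points = [(x, 2.0 * x + 1.0) for x in [-2.0, -1.0, 0.0, 1.0, 2.0]]
w, b = sgd_linear_fit(points)
print(round(w, 3), round(b, 3))  # 2.0 1.0
```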
Monotonic AI Models
https://www.toptal.com/machine-learning/monotonic-ai-models
Pre-Trained Models
https://www.toptal.com/deep-learning/exploring-pre-trained-models
ResNet
https://towardsdatascience.com/an-overview-of-resnet-and-its-variants-5281e2f56035
MELTANO - Model, Extract, Load, Transform, Analyze, Notebook, Orchestrate
Meltano is a convention-over-configuration framework for analytics, business intelligence, and data science. It leverages open source software and software development best practices including version control, CI, CD, and review apps.
Generative Art
https://tylerxhobbs.com/essays/2020/flow-fields
Getting started with generative art | Nicola Rennie
Kyso
Kyso is a central knowledge hub to manage, share, and collaborate on technical reports. We convert data-science tools like Jupyter Notebooks to Medium-style blog posts published to the web. We're fully integrated with GitHub, meaning when you push your commits, the changes are reflected immediately on Kyso. It's very easy to get started: profiles, comments, etc. work right out of the box.
Jina
Cloud-native neural search framework for any kind of data
Jina allows you to build search-as-a-service powered by deep learning in just minutes.
- All data types: Large-scale indexing and querying of any kind of unstructured data: video, image, long/short text, music, source code, PDF, etc.
- Fast & cloud-native: Distributed architecture from day one, scalable and cloud-native by design, with containerizing, streaming, parallelizing, sharding, async scheduling, and HTTP/gRPC/WebSocket protocols.
- Save time: The design pattern of neural search systems, from zero to a production-ready system in minutes.
- Own your stack: Keep end-to-end stack ownership of your solution and avoid the integration pitfalls you get with fragmented, multi-vendor, generic legacy tools.
https://github.com/jina-ai/jina