Types of Databases
RDBMS / Relational database (ACID)
A relational database management system (RDBMS) is a program that allows you to create, update, and administer a relational database. Most relational database management systems use the SQL language to access the database.
A relational database is a type of database. It uses a structure that allows us to identify and access data in relation to another piece of data in the database. Often, data in a relational database is organized into tables.
Columns - Tables can have many columns of data. Each column is labeled with a descriptive name (say, age) and has a specific data type.
Rows/Records - Tables can have hundreds, thousands, sometimes even millions of rows of data. These rows are often called records.
- MySQL Cluster
- PostgreSQL
- VoltDB
- Clustrix
- ScaleBase
- NimbusDB
- Megastore over BigTable
- MariaDB
- SQLite
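The table, column, and row vocabulary above can be sketched with Python's built-in sqlite3 module. This is only a minimal illustration: the people and pets tables and their columns are hypothetical names chosen for the example, and the join shows one way of accessing data "in relation to" other data.

```python
import sqlite3

# In-memory SQLite database; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")

# Each column has a descriptive name and a declared data type.
conn.execute(
    "CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT NOT NULL, age INTEGER)"
)
conn.execute(
    "CREATE TABLE pets (id INTEGER PRIMARY KEY, "
    "owner_id INTEGER REFERENCES people(id), name TEXT NOT NULL)"
)

# Each inserted tuple becomes one row (record).
conn.executemany(
    "INSERT INTO people (id, name, age) VALUES (?, ?, ?)",
    [(1, "Ada", 36), (2, "Linus", 28)],
)
conn.executemany(
    "INSERT INTO pets (owner_id, name) VALUES (?, ?)",
    [(1, "Rex"), (1, "Tom"), (2, "Tux")],
)

# Relate rows in one table to rows in another through the owner_id column.
rows = conn.execute(
    "SELECT people.name, pets.name FROM pets "
    "JOIN people ON people.id = pets.owner_id "
    "ORDER BY pets.name"
).fetchall()
print(rows)

conn.close()
```

SQLite is used here only because it ships with Python; the same SQL would look much the same on the other systems in the list.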
NoSQL Databases (Scales better, Higher availability)
- While traditional SQL databases handle large amounts of structured data effectively, NoSQL (Not Only SQL) databases are meant for unstructured or semi-structured data
- NoSQL databases store unstructured data with no particular schema
- Each row can have its own set of column values. NoSQL databases can give better performance when storing massive amounts of data
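The idea that each row can carry its own set of columns can be sketched with plain Python dictionaries. This is a toy, assuming nothing about any particular NoSQL product; the records and the find and fields_in_use helpers are hypothetical.

```python
# Hypothetical records: no shared schema, each one carries its own fields.
records = [
    {"id": 1, "name": "Ada", "email": "ada@example.com"},
    {"id": 2, "name": "Linus", "languages": ["C", "Shell"]},
    {"id": 3, "sensor": "thermo-7", "reading": 21.5},
]


def find(records, field, value):
    """Return the records whose `field` equals `value`.

    Records that do not have the field at all are simply skipped,
    since there is no schema that requires it to exist.
    """
    return [r for r in records if field in r and r[field] == value]


def fields_in_use(records):
    """Return the sorted union of all field names seen across the records."""
    return sorted({name for r in records for name in r})


print(find(records, "name", "Ada"))
print(fields_in_use(records))
```

The flexibility comes at a price: the application, rather than the database, has to cope with fields that may or may not be present.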
Key-Value
- Project Voldemort
- Riak
- Redis
- Aerospike
- Scalaris
- Tokyo Cabinet
- Memcached, Membrain, and Membase
- LF (fully decentralized, fully replicated key/value store)
- etcd
Wide Column / Extensible Record Stores / Column-family
Each row can have many different columns, and the set of columns can differ from row to row.
- HBase
- HyperTable
- Cassandra
Column Oriented Database
Not to be confused with column-family databases, column-oriented databases are very similar to relational databases, but store data on disk by column instead of by row. This means that all of the data for a single column is together, allowing for faster aggregation on larger data sets. Since the columns are separate from each other, inserting or updating values is a performance intensive task, so column-oriented databases are primarily used for analytical work where entire data sets can be preloaded at one time.
- Druid
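The difference between storing by row and storing by column can be sketched in a few lines of Python. This is an in-memory illustration only, not how Druid or any real engine lays out data on disk; the data and the to_columns and insert_row helpers are hypothetical.

```python
# Row-oriented layout: one dict per record (hypothetical data).
rows = [
    {"city": "Oslo", "sales": 120, "returns": 3},
    {"city": "Lima", "sales": 80, "returns": 1},
    {"city": "Pune", "sales": 200, "returns": 7},
]


def to_columns(rows):
    """Pivot a list of row dicts into one list of values per column."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


def insert_row(columns, row):
    """Insert one record: every column list has to be touched separately."""
    for name in columns:
        columns[name].append(row[name])


columns = to_columns(rows)

# Aggregating one column reads a single contiguous list and ignores the rest.
total_sales = sum(columns["sales"])
print(total_sales)
```

The aggregation touches only the sales list, while insert_row has to append to every column, which mirrors the trade-off described above.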
Object Oriented Database
Object-oriented databases store data items as objects, seeking to bridge the gap between the representations used by object-oriented programming languages and databases. Although this solves many problems with translating between different data paradigms, historically, adoption has suffered due to increased complexity, lack of standardization, and difficulty decoupling the data from the original application.
Document Oriented Database / Document Stores
- Semi-structured data (XML, JSON)
- Flat File Database
Databases
- SimpleDB
- CouchDB
- MongoDB
- Terrastore
- SQLite
- RethinkDB
Hierarchical database / Graph based database (Entities, Relationships)
- Dgraph
- Nebula-graph - https://nebula-graph.io
- Alibaba Graph Database - A real-time, reliable, cloud-native graph database service that supports property graph model.
- Amazon Neptune - Fully-managed graph database service.
- Ultimate Scalable Graph Database: ArangoDB for Real-World Use Cases
- Bitsy - A small, fast, embeddable, durable in-memory graph database.
- Blazegraph - RDF graph database with OLTP support.
- CosmosDB - Microsoft's distributed OLTP graph database.
- ChronoGraph - A versioned graph database.
- DSEGraph - DataStax graph database with OLTP and OLAP support.
- GRAKN.AI - Distributed OLTP/OLAP knowledge graph system.
- Hadoop (Spark) - OLAP graph processor using Spark.
- HGraphDB - OLTP graph database running on Apache HBase.
- Huawei Graph Engine Service - Fully-managed, distributed, at-scale graph query and analysis service that provides a visualized interactive analytics platform.
- IBM Graph - OLTP graph database as a service.
- JanusGraph - Distributed OLTP and OLAP graph database with BerkeleyDB, Apache Cassandra and Apache HBase support.
- JanusGraph (Amazon) - The Amazon DynamoDB Storage Backend for JanusGraph.
https://medium.com/terminusdb/graph-fundamentals-part-1-rdf-60dcf8d0c459
- Neo4j - OLTP graph database (embedded and high availability) (open source, NoSQL graph database) - Build Graph Databases with Neo4j
- neo4j-gremlin-bolt - OLTP graph database (using Bolt Protocol).
- OrientDB - OLTP graph database
- Apache S2Graph - OLTP graph database running on Apache HBase.
- Sqlg - OLTP implementation on SQL databases.
- Stardog - RDF graph database with OLTP and OLAP support.
- TinkerGraph - In-memory OLTP and OLAP reference implementation.
- Titan - Distributed OLTP and OLAP graph database with BerkeleyDB, Apache Cassandra and Apache HBase support.
- Titan (Amazon) - The Amazon DynamoDB storage backend for Titan.
- Titan (Tupl) - The Tupl storage backend for Titan.
- Unipop - OLTP Elasticsearch and JDBC backed graph.
Examples
- Filesystems
- DNS
- LDAP directories
Network databases
- IDMS
Time-Series databases
- TimescaleDB
- InfluxDB
- OpenTSDB
- Prometheus
In-memory databases
- Redis
- RocksDB
- Memcached (a distributed memory object caching system)
In-Memory Databases (IMDB) and In-Memory Data Grids (IMDG)
One of the crucial differences between In-Memory Data Grids and In-Memory Databases lies in the ability to scale to hundreds or thousands of servers. In-Memory Data Grids are inherently capable of such scale because of their MPP (Massively Parallel Processing) architecture, whereas In-Memory Databases generally are not, because SQL joins cannot, in general, be performed efficiently in a distributed context.
https://www.gridgain.com/resources/blog/in-memory-database-vs-in-memory-data-grid-revisited
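The scaling argument above rests on partitioning: each key lives on exactly one node, and work is done where the data sits. A minimal single-process sketch follows, assuming a fixed node count and CRC32-based placement; NODE_COUNT, node_for, and the Grid class are hypothetical and do not reflect any real data grid's API.

```python
import zlib

NODE_COUNT = 4  # hypothetical cluster size


def node_for(key):
    """Pick the node that owns a key.

    CRC32 is used because it is stable across runs, unlike the built-in
    hash() for strings.
    """
    return zlib.crc32(key.encode("utf-8")) % NODE_COUNT


class Grid:
    """Toy data grid: each 'node' is just a dict holding its own partition."""

    def __init__(self):
        self.nodes = [{} for _ in range(NODE_COUNT)]

    def put(self, key, value):
        self.nodes[node_for(key)][key] = value

    def get(self, key):
        return self.nodes[node_for(key)].get(key)

    def total(self):
        # Each node sums its own partition; only the partial sums are combined.
        partials = [sum(node.values()) for node in self.nodes]
        return sum(partials)


grid = Grid()
for i in range(10):
    grid.put(f"order-{i}", i * 10)

print(grid.get("order-3"))
print(grid.total())
```

Key lookups and per-partition aggregates need no data movement between nodes. A join between two differently partitioned data sets would, which is the difficulty the paragraph above refers to.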
Cloud databases / on-line databases / Managed services
- Google Firebase
- Facebook Parse
- Amazon DynamoDB
- Amazon Aurora
- One-stop Generative AI Stack to Build Production-ready Apps | DataStax
- Astra DB | DataStax
Object Storage
Object storage (also known as object-based storage) is a computer data storage architecture that manages data as objects, as opposed to other storage architectures such as file systems, which manage data as a file hierarchy, and block storage, which manages data as blocks within sectors and tracks. Each object typically includes the data itself, a variable amount of metadata, and a globally unique identifier. Object storage can be implemented at multiple levels, including the device level (object-storage device), the system level, and the interface level. In each case, object storage seeks to enable capabilities not addressed by other storage architectures, like interfaces that can be directly programmable by the application, a namespace that can span multiple instances of physical hardware, and data-management functions like data replication and data distribution at object-level granularity.
Object storage systems allow retention of massive amounts of unstructured data. Object storage is used for purposes such as storing photos on Facebook, songs on Spotify, or files in online collaboration services, such as Dropbox.
- S3
- Azure Blob Storage
https://en.wikipedia.org/wiki/Object_storage
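The three parts of an object named above (the data, a variable amount of metadata, and a globally unique identifier) can be sketched as a small in-memory store. This is an illustration only and is not the S3 or Azure Blob Storage API; StoredObject and ObjectStore are hypothetical names.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class StoredObject:
    """One object: payload bytes, free-form metadata, and a unique ID."""

    data: bytes
    metadata: dict = field(default_factory=dict)
    object_id: str = field(default_factory=lambda: str(uuid.uuid4()))


class ObjectStore:
    """Toy object store with a flat namespace keyed by object ID."""

    def __init__(self):
        self._objects = {}

    def put(self, data, **metadata):
        """Store the bytes with arbitrary metadata and return the new ID."""
        obj = StoredObject(data=data, metadata=metadata)
        self._objects[obj.object_id] = obj
        return obj.object_id

    def get(self, object_id):
        """Return the stored object; raises KeyError for an unknown ID."""
        return self._objects[object_id]


store = ObjectStore()
photo_id = store.put(b"\x89PNG...", content_type="image/png", owner="ada")
print(store.get(photo_id).metadata)
```

There are no directories or block addresses here: an object is found by its identifier alone, and the metadata travels with it.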
NewSQL databases
NewSQL databases follow the relational structure and semantics, but are built using more modern, scalable designs. The goal is to offer greater scalability than traditional relational databases and greater consistency guarantees than NoSQL alternatives. They achieve this by sacrificing certain amounts of availability in the event of a network partition. The trade-off between consistency and availability is a fundamental problem of distributed databases described by the CAP theorem.
- MemSQL
- VoltDB
- Spanner
- Calvin
- CockroachDB
- FaunaDB
https://www.youtube.com/watch?v=2CipVwISumA&t=661s&ab_channel=Fireship
- yugabyteDB
Multi-model databases
Multi-model databases are databases that combine the functionality of more than one type of database. The benefits of this approach are clear - the same system can use different representations for different types of data.
- ArangoDB
- OrientDB
- Couchbase
Semantic RDF graph database
Semantic RDF graph databases are databases that map objects using the Resource Description Framework. This framework is a way to describe, in detail, objects and their relationships by categorizing pieces of data and connections. The idea is to map subjects, actions, and objects like you would in a sentence (for example, "Bill calls Sue"). For most use cases, labeled property graphs, usually just called graph databases, can express relationships more flexibly and concisely.
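The "Bill calls Sue" example can be sketched as a set of (subject, predicate, object) triples with a simple pattern match. This is a simplification: real RDF identifies resources with IRIs and is usually queried with SPARQL, whereas the plain strings and the match helper below are hypothetical.

```python
# Each fact is one (subject, predicate, object) triple.
triples = {
    ("Bill", "calls", "Sue"),
    ("Sue", "calls", "Ann"),
    ("Bill", "knows", "Ann"),
}


def match(triples, subject=None, predicate=None, obj=None):
    """Return the triples matching a pattern, in sorted order.

    A value of None acts as a wildcard for that position.
    """
    return sorted(
        t
        for t in triples
        if (subject is None or t[0] == subject)
        and (predicate is None or t[1] == predicate)
        and (obj is None or t[2] == obj)
    )


# Everything stated about Bill.
print(match(triples, subject="Bill"))

# Who calls Ann?
print(match(triples, predicate="calls", obj="Ann"))
```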
Ledger Databases
Embedded databases
Vector Databases
A vector database is a specialized DBMS that stores vector embeddings, using techniques for storage, indexing, and query processing designed for that kind of data. They offer data management capabilities such as CRUD, along with language bindings and integrations for widely used data science tools such as Python, SQL, Java, and TensorFlow. Additionally, they deliver advanced features such as high-speed ingestion, sharding, and replication.
Vector databases are designed to handle critical query and algorithmic styles seen in similarity search, anomaly search, observability, fraud detection, and IoT sensor analytics. Such emerging styles are the outcome of digital transformation and the rise of generative AI.
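Similarity search, the first query style mentioned above, can be sketched as a brute-force nearest-neighbour scan using cosine similarity. The three-dimensional embeddings and the nearest helper are made up for illustration; real embeddings have hundreds or thousands of dimensions, and vector databases typically rely on approximate indexes rather than scanning every vector.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical embeddings keyed by the item they represent.
embeddings = {
    "cat": [0.9, 0.1, 0.0],
    "kitten": [0.8, 0.2, 0.1],
    "truck": [0.0, 0.2, 0.9],
}


def nearest(query, embeddings, k=2):
    """Return the names of the k stored vectors most similar to the query."""
    scored = sorted(
        embeddings.items(),
        key=lambda item: cosine_similarity(query, item[1]),
        reverse=True,
    )
    return [name for name, _ in scored[:k]]


print(nearest([1.0, 0.0, 0.0], embeddings))
```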
A Comprehensive Guide to Vector Databases - KDB.AI
A Fun & Absurd Introduction to Vector Databases • Alexander Chatzizacharias • GOTO 2024 - YouTube
- Pinecone
- LanceDB
- Epsilla
- Welcome | Weaviate - Vector Database
- PostgresML: Leveraging Postgres as a Vector Database for AI
- Learn Vector Database in 10 Mins - Hottest AI Apps DB!
- What Are Vector Databases? | MongoDB
- Chroma - the AI-native open-source embedding database
- The 5 Best Vector Databases | A List With Examples | DataCamp
- GitHub - milvus-io/milvus: A cloud-native vector database, storage for next generation AI applications
- Key considerations when choosing a database for your generative AI applications | AWS Database Blog
Resources
- https://www.toptal.com/database/database-migrations-caterpillars-butterflies
- https://www.toptal.com/database/database-design-bad-practices
- https://dbdb.io
- https://www.sciencedirect.com/science/article/pii/S1319157816300453
- Rust at speed - building a fast concurrent database
- https://www.youtube.com/watch?v=Cym4TZwTCNU
- https://www.freecodecamp.org/news/learn-nosql-in-3-hours
- Trillions of Indexes: How Uber’s LedgerStore Supports Such Massive Scale