Trade-offs
Compute is cheap. Storage is cheap. Engineering time is expensive.
- Speed vs. memory
- Battery life vs. accuracy
- Fairness vs. accuracy
- Precision vs. recall
- Ease of implementation vs. maintainability
- Exploration vs. exploitation
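As a small illustration of the precision vs. recall trade-off listed above, the sketch below scores the same predictions at two decision thresholds. The scores, labels, and the `precision_recall` helper are made up for illustration; they are not taken from any particular library.

```python
def precision_recall(scores, labels, threshold):
    """Return (precision, recall) when scores >= threshold count as positive."""
    predicted = [score >= threshold for score in scores]
    tp = sum(p and l for p, l in zip(predicted, labels))
    fp = sum(p and not l for p, l in zip(predicted, labels))
    fn = sum((not p) and l for p, l in zip(predicted, labels))
    # Convention assumed here: an empty denominator yields 1.0.
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 1.0
    return precision, recall


# Hypothetical classifier scores and ground-truth labels.
scores = [0.9, 0.8, 0.6, 0.4, 0.2]
labels = [True, True, False, True, False]

strict = precision_recall(scores, labels, threshold=0.7)
lenient = precision_recall(scores, labels, threshold=0.3)
print("strict:", strict)    # fewer positives: precision up, recall down
print("lenient:", lenient)  # more positives: recall up, precision down
```

Raising the threshold removes the false positive but misses one true positive; lowering it does the opposite.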
Questions
Cost
- Operational complexity
- How easy is it to add capacity when TSDB storage runs low?
- How easily does performance scale as data volume grows?
- Are there any operational tasks that regularly need to be carried out?
Capabilities
- Does the TSDB support metrics and events?
- Can metadata be associated with an event?
- What is the precision, i.e., the smallest increment of time between events?
- What is the consistency model?
- Can the data have a Time To Live?
Performance
- How many events per second can be written in, for a given scale of the system?
- How many events per second can be read out, for a given scale of the system?
- How many bytes does each point take after compression? (How much storage is required?)
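The performance questions above feed directly into a capacity estimate. A rough sketch of that arithmetic follows, assuming a steady write rate and ignoring indexes, replication, and metadata overhead. The workload numbers are hypothetical.

```python
def estimate_storage_bytes(events_per_second, bytes_per_point, retention_days):
    """Approximate on-disk volume for one full retention window."""
    seconds = retention_days * 24 * 60 * 60
    return events_per_second * seconds * bytes_per_point


# Hypothetical workload: 10,000 events/s, 2 bytes per point after
# compression, 30 days of retention.
total = estimate_storage_bytes(10_000, 2.0, 30)
print(f"{total / 1e9:.2f} GB")  # 51.84 GB
```

Real systems will need more than this figure, so it is best treated as a lower bound when comparing candidates.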
Query capabilities
- How is the data queried? DSL? API?
- How easy is the query mechanism to use?
- Can the query mechanism aggregate data? Can it do it by date?
- Can duplicate data be removed, e.g., if the same event is ingested twice? Is there a uniqueness constraint?
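To make the aggregation and de-duplication questions concrete, here is a minimal in-memory sketch of what a TSDB would ideally do natively. It assumes that an event is uniquely identified by its series name and timestamp; the event shape (`series`, `ts`, `value`) and both helper names are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timezone


def dedupe(events):
    """Keep the first occurrence of each (series, timestamp) pair."""
    seen = set()
    unique = []
    for event in events:
        key = (event["series"], event["ts"])  # assumed uniqueness constraint
        if key not in seen:
            seen.add(key)
            unique.append(event)
    return unique


def daily_sum(events):
    """Sum values per series per UTC calendar day."""
    totals = defaultdict(float)
    for event in events:
        day = datetime.fromtimestamp(event["ts"], tz=timezone.utc).date().isoformat()
        totals[(event["series"], day)] += event["value"]
    return dict(totals)


events = [
    {"series": "cpu", "ts": 0, "value": 1.0},
    {"series": "cpu", "ts": 0, "value": 1.0},      # same event ingested twice
    {"series": "cpu", "ts": 3600, "value": 2.0},
    {"series": "cpu", "ts": 86400, "value": 4.0},  # next UTC day
]
print(daily_sum(dedupe(events)))
```

If the candidate TSDB cannot express either step in its query mechanism, this logic ends up in application code, which is worth counting as an operational cost.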
Ingestion
- How is the data sent to the TSDB? API? Log scraping? Multi-language clients?
Export
- How could the entire TSDB be exported for use elsewhere or in another TSDB?
Maturity
- How mature is the TSDB and its ecosystem? Is it likely to die soon? Does it have good support?
Community
- How large and active is the community using the TSDB? If I have a problem, will I be able to find someone to help me solve it?
Support
- Is paid support available for the TSDB, and if so, how much does it cost?
Security
- What are the mechanisms for authentication and authorisation?
- What other security implications are there?
Integration
- How easy is it to integrate with other services? Is there anything specific that helps?
Visualisation
- How can the data from the TSDB be visualised?
- Are there any dashboards / high level visualisations?
- Are the dashboards internal to the TSDB or can they be shared on a website?
Important Points
- Understand the functional and non-functional requirements before designing.
- Clearly define the use cases and constraints of the system.
- There is no perfect solution. It’s all about tradeoffs.
- Assume requirements will change and design the system to be flexible.
- Assume everything can and will fail. Make it fault tolerant.
- Don't add functionality until it's necessary. Avoid over-engineering.
- Design your system for scalability from the ground up.
- Prefer horizontal scaling over vertical scaling for scalability.
- Add Load Balancers to ensure high availability and distribute traffic.
- Consider using SQL Databases for structured data and ACID transactions.
- Opt for NoSQL Databases when dealing with unstructured data.
- Use Database sharding to scale SQL databases horizontally.
- Use Database Indexing and search engines for efficient data retrievals.
- Use Rate Limiting to prevent system overload and DoS attacks.
- Use WebSockets for real-time communication.
- Employ Heartbeat Mechanisms for failure detection.
- Consider using a message queue for asynchronous communication.
- Implement data partitioning and sharding for large datasets.
- Consider denormalizing databases for read-heavy workloads.
- Consider using event-driven architecture for decoupled systems.
- Use CDNs to reduce latency for a global user base.
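- Use a write-back (write-behind) cache for write-heavy applications; a write-through cache keeps cache and store consistent but adds latency to every write.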
- Use read-through cache for read-heavy applications.
- Use blob/object storage for large files and media such as videos.
- Implement Data Replication and Redundancy to avoid a single point of failure.
- Implement Autoscaling to handle traffic spikes smoothly.
- Use Asynchronous processing to run background tasks.
- Make operations idempotent where possible to simplify retry logic and error handling.
- When appropriate, use microservices for flexibility, scalability, and maintainability.
- Consider using a data lake or data warehouse for analytics and reporting.
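The point above about idempotent operations and retries can be sketched as follows. This is a minimal in-memory illustration, not a production design: `PaymentService`, `call_with_retry`, and the key `"req-42"` are hypothetical names, and a real service would persist the key-to-result map durably.

```python
class PaymentService:
    """Hypothetical service that applies each keyed request at most once."""

    def __init__(self):
        self.balance = 0
        self._results = {}  # idempotency key -> result of the first call

    def deposit(self, idempotency_key, amount):
        # A repeated key returns the stored result instead of depositing again.
        if idempotency_key in self._results:
            return self._results[idempotency_key]
        self.balance += amount
        self._results[idempotency_key] = self.balance
        return self.balance


def call_with_retry(operation, attempts=3):
    """Retry on ConnectionError; safe only because the operation is idempotent."""
    last_error = None
    for _ in range(attempts):
        try:
            return operation()
        except ConnectionError as error:
            last_error = error
    raise last_error


service = PaymentService()
calls = {"count": 0}


def flaky_deposit():
    calls["count"] += 1
    result = service.deposit("req-42", 100)
    if calls["count"] == 1:
        # Simulate a lost response: the server applied the deposit,
        # but the client never saw the reply.
        raise ConnectionError("response lost")
    return result


final_balance = call_with_retry(flaky_deposit)
print(final_balance, service.balance, calls["count"])
```

Without the idempotency key, the retry would deposit the amount twice; with it, the client can retry blindly and the error handling stays simple.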