1. Elimination of Data Silos - Unified Lakehouse Architecture
Challenge: Data spread across multiple sources leads to fragmented analytics, duplicated data, and inconsistent figures.
Requirement: Design and implement a centralised Lakehouse architecture to unify batch and streaming data, enabling a single source of truth for structured and semi-structured data.
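One way to meet the "unify batch and streaming" part of this requirement is to send both paths through a single, idempotent write operation into the same table. The table then converges to the same state however the records arrive. The sketch below assumes that design. It uses Python's built-in sqlite3 module as a stand-in for a lakehouse table, and the orders table and its order_id key are hypothetical. A production lakehouse would use its table format's own merge or upsert operation instead.

```python
import sqlite3

# Hypothetical "orders" table shared by the batch and streaming paths.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, status TEXT, amount REAL)"
)

def upsert(rows):
    """Single write path: insert new keys and overwrite existing ones (idempotent)."""
    with conn:  # commits the transaction on success
        conn.executemany(
            "INSERT OR REPLACE INTO orders (order_id, status, amount) VALUES (?, ?, ?)",
            rows,
        )

# Batch path: a bulk load.
batch = [(1, "placed", 20.0), (2, "placed", 35.5)]
upsert(batch)

# Streaming path: micro-batches, including a later update to order 1.
stream = [[(1, "shipped", 20.0)], [(3, "placed", 12.0)]]
for micro_batch in stream:
    upsert(micro_batch)

print(conn.execute("SELECT * FROM orders ORDER BY order_id").fetchall())
# [(1, 'shipped', 20.0), (2, 'placed', 35.5), (3, 'placed', 12.0)]
```

Because the write is keyed and idempotent, replaying a micro-batch after a failure leaves the table unchanged rather than duplicating rows.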
Challenge: Reporting systems break or require rework whenever the underlying data warehouse is switched.
Requirement: Introduce a semantic or federated query layer that supports standard SQL querying across different storage engines, so that reporting tools continue to work without queries being rewritten.
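The decoupling this requirement calls for can be shown in miniature. Reports query a stable logical name, and only the mapping behind that name changes when the storage changes. The sketch below uses Python's built-in sqlite3 module and a SQL view as a stand-in for a semantic layer. The physical tables (legacy_sales, new_sales) and the logical model sales are hypothetical. A real federated engine would route queries across separate systems rather than across tables in one database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Two hypothetical physical tables standing in for an old and a new warehouse,
# with different column names for the same data.
conn.execute("CREATE TABLE legacy_sales (region TEXT, amt REAL)")
conn.execute("CREATE TABLE new_sales (sales_region TEXT, amount_usd REAL)")
conn.executemany("INSERT INTO legacy_sales VALUES (?, ?)", [("EU", 100.0), ("US", 50.0)])
conn.executemany("INSERT INTO new_sales VALUES (?, ?)", [("EU", 100.0), ("US", 50.0)])

def bind_semantic_view(physical_sql):
    """Point the stable logical name `sales` at a physical source."""
    conn.execute("DROP VIEW IF EXISTS sales")
    conn.execute(f"CREATE VIEW sales AS {physical_sql}")

# The report only ever references the logical model, never a physical table.
REPORT_SQL = "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"

bind_semantic_view("SELECT region, amt AS amount FROM legacy_sales")
before = conn.execute(REPORT_SQL).fetchall()

# "Switch warehouses": only the view definition changes; REPORT_SQL is untouched.
bind_semantic_view("SELECT sales_region AS region, amount_usd AS amount FROM new_sales")
after = conn.execute(REPORT_SQL).fetchall()

print(before == after)  # True
```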
Challenge: Lack of visibility into data flow and transformations across the pipeline.
Requirement: Build or integrate a searchable data catalog that includes table descriptions, column definitions, data freshness, owners, and usage metrics. Implement a metadata governance tool to track data lineage, schema changes, and ownership, and to maintain a business glossary.
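A catalog entry covering the fields listed above, plus upstream references for lineage, might be modelled as follows. This is a minimal in-memory sketch. The CatalogEntry fields, the search, lineage, and is_stale helpers, and the example tables are all hypothetical and do not correspond to any particular catalog product.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone

@dataclass
class CatalogEntry:
    name: str
    description: str
    owner: str
    columns: dict[str, str]          # column name -> business definition
    last_updated: datetime           # basis for data freshness
    query_count_30d: int = 0         # usage metric
    upstream: list[str] = field(default_factory=list)  # lineage: direct sources

CATALOG: dict[str, CatalogEntry] = {}

def register(entry):
    CATALOG[entry.name] = entry

def search(term):
    """Case-insensitive match on table name, description, or column names/definitions."""
    term = term.lower()
    return sorted(
        e.name for e in CATALOG.values()
        if term in e.name.lower()
        or term in e.description.lower()
        or any(term in c.lower() or term in d.lower() for c, d in e.columns.items())
    )

def lineage(name):
    """All transitive upstream sources of a table (depth-first walk)."""
    seen, stack = set(), list(CATALOG[name].upstream)
    while stack:
        src = stack.pop()
        if src not in seen:
            seen.add(src)
            stack.extend(CATALOG[src].upstream if src in CATALOG else [])
    return sorted(seen)

def is_stale(name, max_age, now):
    """True if the table has not been updated within max_age."""
    return now - CATALOG[name].last_updated > max_age

# Usage with hypothetical tables.
now = datetime(2024, 1, 2, tzinfo=timezone.utc)
register(CatalogEntry("raw_orders", "Orders ingested from the shop API", "ingest-team",
                      {"order_id": "Unique order key"}, now - timedelta(hours=1)))
register(CatalogEntry("stg_orders", "Cleaned orders", "data-eng",
                      {"order_id": "Unique order key"}, now - timedelta(hours=2),
                      upstream=["raw_orders"]))
register(CatalogEntry("fct_revenue", "Daily revenue per region", "finance-analytics",
                      {"revenue": "Sum of order amounts in USD"}, now - timedelta(days=3),
                      query_count_30d=42, upstream=["stg_orders"]))

print(search("order"))                                  # ['fct_revenue', 'raw_orders', 'stg_orders']
print(lineage("fct_revenue"))                           # ['raw_orders', 'stg_orders']
print(is_stale("fct_revenue", timedelta(days=1), now))  # True
```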
Requirement: Establish a rule-based or ML-powered data quality layer, embedded in the pipeline, that performs anomaly detection, null checks, duplicate checks, and type-mismatch checks.
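The rule-based side of this requirement can be sketched as a check function that runs on each batch before it is published. The sketch adds a simple volume check and a gate that stops the pipeline step on failure. The schema, key column, and z-score threshold are hypothetical. The z-score rule is a basic statistical stand-in for anomaly detection, not an ML model. Type checking here is strict, so an int in a float column is flagged.

```python
from statistics import mean, stdev

# Hypothetical expected schema for one table: column -> Python type.
SCHEMA = {"order_id": int, "amount": float, "region": str}
KEY = "order_id"

def check_batch(rows, schema=SCHEMA, key=KEY):
    """Return (rule, row_index, column) violations for a batch of dict rows."""
    issues = []
    seen_keys = set()
    for i, row in enumerate(rows):
        for col, expected in schema.items():
            value = row.get(col)
            if value is None:
                issues.append(("null", i, col))
            elif not isinstance(value, expected):
                issues.append(("type_mismatch", i, col))
        k = row.get(key)
        if k is not None:
            if k in seen_keys:
                issues.append(("duplicate", i, key))
            seen_keys.add(k)
    return issues

def is_volume_anomaly(history, current, z=3.0):
    """Flag a row count more than z sample standard deviations from the historical mean."""
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return current != mu
    return abs(current - mu) / sigma > z

def quality_gate(rows):
    """Embed the checks in the pipeline: stop this step if any rule fails."""
    issues = check_batch(rows)
    if issues:
        raise ValueError(f"data quality check failed: {issues}")
    return rows

rows = [
    {"order_id": 1, "amount": 20.0, "region": "EU"},
    {"order_id": 1, "amount": None, "region": "US"},     # null + duplicate key
    {"order_id": 2, "amount": "35.5", "region": "EU"},   # string in a float column
]
print(check_batch(rows))
# [('null', 1, 'amount'), ('duplicate', 1, 'order_id'), ('type_mismatch', 2, 'amount')]
print(is_volume_anomaly([1000, 1020, 980, 1010, 990], 200))  # True
```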