SageMaker Lakehouse

Data lake-centric

Bring your data into the lakehouse without expensive pipeline management

Zero-ETL Integration

Amazon S3
1. Store your data in Amazon S3 buckets
2. Access your data using Apache Iceberg REST catalog APIs
3. Enable automatic table optimization for Apache Iceberg tables
4. Get high performance with managed statistics
5. Access data seamlessly from AWS and third-party (3P) engines
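
As an illustration of step 2 above, the following sketch builds the properties an Iceberg REST client might use to reach the catalog with SigV4-signed requests. The endpoint pattern, the use of the AWS account ID as the warehouse, and the example region and account number are assumptions for illustration; check the current AWS documentation before relying on them.

```python
def iceberg_rest_properties(region: str, account_id: str) -> dict[str, str]:
    """Build properties for an Iceberg REST catalog client.

    Assumptions (not stated in the slides): the catalog is served from the
    AWS Glue Iceberg REST endpoint, the warehouse is the AWS account ID,
    and requests are signed with SigV4 under the "glue" service name.
    """
    return {
        "type": "rest",
        "uri": f"https://glue.{region}.amazonaws.com/iceberg",
        "warehouse": account_id,
        "rest.sigv4-enabled": "true",
        "rest.signing-name": "glue",
        "rest.signing-region": region,
    }


# Placeholder region and account ID, for illustration only.
props = iceberg_rest_properties("us-east-1", "123456789012")
for key, value in props.items():
    print(f"{key} = {value}")
```

With PyIceberg installed and AWS credentials configured, these properties could be passed as `load_catalog("lakehouse", **props)`; the catalog name "lakehouse" is arbitrary.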
Amazon S3 Tables
1. New S3 storage class for Apache Iceberg data lakes
2. Amazon S3 APIs to read from and write to S3 tables
3. Managed Iceberg table maintenance
4. Simple integration with Lakehouse (preview)
5. 10x more requests per second than standard Amazon S3 buckets
Table Maintenance for Iceberg Tables
1. Compaction: Consolidate small objects into larger ones to improve query performance
2. Snapshot Retention: Remove unused snapshots
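
The two maintenance operations above can be sketched in a few lines of Python. This is a simplified model, not the managed service's actual algorithm: the target file size, the age threshold, the minimum number of snapshots to keep, and the `Snapshot` type are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    snapshot_id: int
    age_days: int


def plan_compaction(file_sizes_mb, target_mb=512):
    """Greedily group small files into bins of at most target_mb each.

    Files already at or above the target are left alone, and a bin that
    would hold a single file is dropped because rewriting it gains nothing.
    """
    bins, current, current_size = [], [], 0
    for size in sorted(file_sizes_mb):
        if size >= target_mb:
            continue  # already large enough
        if current and current_size + size > target_mb:
            bins.append(current)
            current, current_size = [], 0
        current.append(size)
        current_size += size
    if current:
        bins.append(current)
    return [b for b in bins if len(b) > 1]


def expire_snapshots(snapshots, max_age_days=5, min_to_keep=1):
    """Split snapshots into (kept, expired).

    The newest min_to_keep snapshots are always kept; older ones are kept
    only if they are no older than max_age_days.
    """
    newest_first = sorted(snapshots, key=lambda s: s.age_days)
    kept, expired = [], []
    for index, snap in enumerate(newest_first):
        if index < min_to_keep or snap.age_days <= max_age_days:
            kept.append(snap)
        else:
            expired.append(snap)
    return kept, expired


groups = plan_compaction([10, 20, 600, 30], target_mb=50)
kept, expired = expire_snapshots(
    [Snapshot(1, 10), Snapshot(2, 3), Snapshot(3, 0)], max_age_days=5
)
print(groups)
print([s.snapshot_id for s in kept], [s.snapshot_id for s in expired])
```

In this model, compaction reduces the number of objects a query has to open, while snapshot retention bounds the metadata and storage held by old table versions.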
Redshift Managed Storage (RMS)
1. Publish data from your existing Amazon Redshift data warehouses to the Lakehouse
2. Create new datasets for your data lake in Redshift Managed Storage natively in the Lakehouse
3. Benefit from ML-powered optimizations for frequently running workloads
Redshift Managed Storage use cases
1. Near real-time ingestion
2. Transactionally consistent change data capture (CDC) from operational data sources
3. Multi-statement and multi-table transactional consistency
4. 7x better throughput from Amazon Redshift for BI analytics
5. Faster performance for small writes in Apache Spark
6. Faster reads from Spark compared to Apache Iceberg tables

Dynamic catalog hierarchy to organize data in the storage system
Each catalog maps to a storage type
Managed catalogs to create new data
- Redshift Managed Storage
- Amazon S3
Bring data into a Federated Catalog
- Amazon Redshift
- Amazon S3 table buckets
- External sources such as MySQL and BigQuery

Support for fine-grained access control
- Allow/deny access at table level
- Allow/deny access at column level
- Allow/deny access at cell level
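
A minimal sketch of how the three levels compose, assuming cell-level control is modeled as a column allow-list combined with a row filter. The `Grant` type and `read_table` function are hypothetical names for illustration, not part of any AWS API.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Grant:
    table: str                      # table-level: which table may be read
    columns: frozenset              # column-level: which columns are visible
    row_filter: Optional[Callable[[dict], bool]] = None  # rows that are visible


def read_table(table_name, rows, grants):
    """Return only the cells the caller's grants allow."""
    grant = next((g for g in grants if g.table == table_name), None)
    if grant is None:
        raise PermissionError(f"no grant on table {table_name!r}")
    visible = []
    for row in rows:
        if grant.row_filter is not None and not grant.row_filter(row):
            continue  # row is filtered out
        visible.append({c: v for c, v in row.items() if c in grant.columns})
    return visible


orders = [
    {"id": 1, "region": "EU", "email": "a@example.com"},
    {"id": 2, "region": "US", "email": "b@example.com"},
]
eu_analyst = [Grant("orders", frozenset({"id", "region"}),
                    lambda row: row["region"] == "EU")]

result = read_table("orders", orders, eu_analyst)
print(result)
```

Here the analyst sees only EU rows and never the `email` column; a request for any other table raises `PermissionError`.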
Industry standard access controls for 3P engines
- Tag-based access to data (TBAC)
- Role-based access to data (RBAC)
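
Tag-based access can be illustrated with a small matching function. This sketch assumes a grant is expressed as a set of allowed values per tag key and that a resource must satisfy every key in the grant; the function name and the example tags are hypothetical.

```python
def tag_allows(resource_tags: dict, granted_expression: dict) -> bool:
    """Return True if the resource's tags satisfy every key in the grant.

    A resource that lacks a required tag key is denied.
    """
    return all(
        resource_tags.get(key) in allowed_values
        for key, allowed_values in granted_expression.items()
    )


grant = {"domain": {"sales"}, "sensitivity": {"public", "internal"}}

public_table = {"domain": "sales", "sensitivity": "public"}
pii_table = {"domain": "sales", "sensitivity": "pii"}
untagged_table = {"domain": "sales"}

print(tag_allows(public_table, grant))
print(tag_allows(pii_table, grant))
print(tag_allows(untagged_table, grant))
```

Compared with per-table grants, permissions here follow the tags: tagging a new table with matching values makes it accessible without issuing a new grant.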
Zero-copy data sharing within and across enterprises

Fine-Grained Access Control

Tag-based access control (TBAC)

Zero-copy data sharing models
