BigLake
BigLake is a storage engine that provides a unified interface for analytics and AI engines to query multiformat, multicloud, and multimodal data in a secure, governed, and performant manner. It lets you build a single-copy AI lakehouse designed to reduce the need to manage custom data infrastructure.
BigLake tables let you query structured data in external data stores with access delegation. Access delegation decouples access to the BigLake table from access to the underlying data store. An external connection associated with a service account is used to connect to the data store. Because the service account handles retrieving data from the data store, you only have to grant users access to the BigLake table. This lets you enforce fine-grained security at the table level, including row-level and column-level security. For BigLake tables based on Cloud Storage, you can also use dynamic data masking.
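As a concrete sketch of access delegation, the DDL below defines a BigLake table over Parquet files in Cloud Storage, using a pre-created external connection whose service account reads the bucket. The connection name, dataset, table, and bucket path are all hypothetical placeholders:

```sql
-- Assumes a Cloud resource connection `us.my-connection` already exists and its
-- service account has been granted read access to the bucket (names hypothetical).
CREATE EXTERNAL TABLE mydataset.biglake_sales
WITH CONNECTION `us.my-connection`
OPTIONS (
  format = 'PARQUET',
  uris = ['gs://my-example-bucket/sales/*.parquet']
);
```

Users querying `mydataset.biglake_sales` need access only to the table, not to the underlying bucket; the connection's service account retrieves the data on their behalf.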
Supported data stores
You can use BigLake tables with the following data stores:
- Amazon S3 by using BigQuery Omni
- Azure Blob Storage by using BigQuery Omni
- Cloud Storage
Comparison
| Item | BigQuery Native Table | External Table | BigLake Table | BigLake Iceberg Tables via BigLake Metastore | BigLake Managed Tables |
|---|---|---|---|---|---|
| Storage Format | Capacitor | CSV, ORC, Parquet, etc. | CSV, Iceberg, Parquet, etc. | Iceberg | Iceberg |
| Storage Location | Google internal | Customer GCS | Customer GCS | Customer GCS | Customer GCS |
| Read/Write | CRUD | Read only | Read only from BQ; updates via Spark (manual BQ metadata updates) | Read only from BQ; updates via Spark | CRUD |
| RLS / CLS / Data Masking | Yes | No | Yes | Yes | Yes |
| Fully Managed | Yes (recluster, optimize, etc.) | No | No | No | Yes (recluster, optimize, etc.) |
| Partitioning | Partitioning/Clustering | Partitioning | Partitioning | Partitioning | Clustering |
| Streaming (native) | Yes | No | No | No | Yes |
| Time Travel | Yes | No | Manual | No | Yes |
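The rightmost column, BigLake Managed Tables, corresponds to Iceberg tables that BigQuery itself manages (full CRUD, clustering, streaming, time travel). A minimal creation sketch, assuming the same kind of pre-existing connection and bucket as above (all names hypothetical):

```sql
-- Sketch of a BigLake managed (Iceberg) table; connection, dataset,
-- and storage_uri are hypothetical placeholders.
CREATE TABLE mydataset.managed_events (
  event_id STRING,
  event_ts TIMESTAMP,
  payload  JSON
)
WITH CONNECTION `us.my-connection`
OPTIONS (
  file_format = 'PARQUET',
  table_format = 'ICEBERG',
  storage_uri = 'gs://my-example-bucket/managed_events'
);
```

Unlike the read-only external variants, such a table accepts `INSERT`, `UPDATE`, and `DELETE` directly from BigQuery while the data stays in customer-owned GCS in Iceberg format.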
BigLake is a storage engine that unifies data stored in GCS (or other object stores) with BigQuery. It gives users a uniform BQ experience whether their data lives in native BQ storage or in an object store.
For example, if you want to keep all of your data in an open-source format like Parquet or Iceberg rather than ingesting it into BQ, you can define a BigLake table instead, and still apply fine-grained access control (e.g., row- and column-level security) on top, including in other public clouds. As with BQ native tables, you can also run BQML models on BigLake tables, or access them from other analytics engines like Spark or Presto.
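The row-level security mentioned above uses the same DDL as on native tables. A minimal sketch of a row access policy on a BigLake table, with all names (table, group, column) hypothetical:

```sql
-- Only members of the analysts group see rows where region = 'US'
-- (policy name, table, principal, and filter column are hypothetical).
CREATE ROW ACCESS POLICY us_only
ON mydataset.biglake_sales
GRANT TO ('group:analysts@example.com')
FILTER USING (region = 'US');
```

Because access delegation already hides the underlying bucket, the policy is enforced for every engine that reads the table through BigQuery's APIs.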
In a multicloud setup, BigLake is the storage component in other public clouds (e.g., data in S3), and BigQuery Omni is the compute component that runs in the other cloud (on a fleet of EC2 machines). Today you can see BQ native tables, GCS-backed BigLake tables, and S3- or Azure Blob-backed BigLake tables side by side in the familiar BQ console.
Unfortunately, multicloud tables cannot yet be joined, much as BQ native tables cannot be joined across regions; this is reportedly on the BQ team's roadmap.
Adapted from: What's the point of BigLake? : r/bigquery
Links
- Introduction to BigLake external tables | BigQuery | Google Cloud
- GCP BigLake introduction. BigLake is the name given by Google to… | by Neil Kolban | Google Cloud - Community | Medium
- Data Analytics Deep Dives - BigLake Managed Tables - YouTube
- BigLake: Build an Apache Iceberg lakehouse | Google Cloud
- BigLake: BigQuery’s Evolution toward a Multi-Cloud Lakehouse