
Unity Catalog

Unity Catalog is a unified governance solution for all data and AI assets including files, tables, machine learning models and dashboards in your lakehouse on any cloud.

  • Centralized governance for data and AI
  • Built-in data search and discovery
  • Performance and scale
  • Automated lineage for all workloads
  • Integrated with your existing tools
  • Unified and secure data search experience

Permissioning: privileges are inherited downward (a grant on a catalog applies to its schemas and their tables), but ownership is not inherited.
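The inheritance rule can be modelled in a few lines of Python. This is a deliberately simplified sketch, not Databricks code. The `Securable` class and the helper functions are hypothetical, and the model ignores metastore admins and the extra management rights that owners of parent objects have. Grants are collected by walking up the parent chain. Ownership is checked only on the object itself.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Securable:
    """Hypothetical node in the catalog > schema > table hierarchy."""
    name: str
    owner: str
    parent: Optional["Securable"] = None
    # Privileges granted directly on this object: principal -> set of privileges.
    grants: dict = field(default_factory=dict)

    def grant(self, principal: str, privilege: str) -> None:
        self.grants.setdefault(principal, set()).add(privilege)


def effective_privileges(obj: Securable, principal: str) -> set:
    """Union of grants on obj and on every ancestor (grants inherit downward)."""
    privileges = set()
    node = obj
    while node is not None:
        privileges |= node.grants.get(principal, set())
        node = node.parent
    return privileges


def is_owner(obj: Securable, principal: str) -> bool:
    """Ownership is looked up on the object itself only; it is not inherited."""
    return obj.owner == principal
```

For example, after granting `SELECT` to `analysts` on catalog `main`, `effective_privileges` reports `SELECT` for a table inside `main`. However, `is_owner(table, "alice")` is still false even if `alice` owns the catalog.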

Work with Unity Catalog and the legacy Hive metastore | Databricks on AWS

Work with Unity Catalog and the legacy Hive metastore - Azure Databricks | Microsoft Learn

Upgrade Your Objects in Hive Metastore to Unity Catalog - The Databricks Blog

Unity Catalog best practices | Databricks on AWS

Create tables

Managed tables

Managed tables are the default way to create tables in Unity Catalog. Unity Catalog manages the lifecycle and file layout for these tables. You should not use tools outside of Databricks to manipulate files in these tables directly.

By default, managed tables are stored in the root storage location that you configure when you create a metastore. You can optionally specify managed table storage locations at the catalog or schema levels, overriding the root storage location. Managed tables always use the Delta table format.
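The override order described above works like a lookup. The most specific configured level wins: schema, then catalog, then the metastore root. The following is a minimal sketch of that precedence. The function name and the bucket paths are hypothetical. It returns only the configured storage root and does not reproduce the per-table subdirectory that Unity Catalog generates beneath it.

```python
from typing import Optional


def resolve_managed_location(
    metastore_root: Optional[str],
    catalog_location: Optional[str] = None,
    schema_location: Optional[str] = None,
) -> str:
    """Return the storage root a new managed table would land under.

    Hypothetical helper: schema-level storage overrides catalog-level storage,
    which overrides the metastore root storage location.
    """
    for location in (schema_location, catalog_location, metastore_root):
        if location:
            return location
    raise ValueError("no managed storage location configured at any level")
```

Setting `catalog_location="s3://cat-bucket"` with no schema location therefore yields `s3://cat-bucket` instead of the metastore root.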

When a managed table is dropped, its underlying data is deleted from your cloud tenant within 30 days.

External tables

External tables are tables whose data is stored outside of the managed storage location specified for the metastore, catalog, or schema. Use external tables only when you require direct access to the data outside of Databricks clusters or Databricks SQL warehouses.

When you run DROP TABLE on an external table, Unity Catalog does not delete the underlying data. To drop a table, you must be its owner. You can manage privileges on external tables and use them in queries in the same way as managed tables. To create an external table with SQL, specify a LOCATION path in your CREATE TABLE statement. External tables can use the following file formats:

  • DELTA
  • CSV
  • JSON
  • AVRO
  • PARQUET
  • ORC
  • TEXT
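The following sketch generates the `CREATE TABLE ... USING ... LOCATION` statement described above. It checks the requested format against the list of supported formats. The helper name, table name, columns and S3 path are hypothetical. The clause order follows Databricks SQL `CREATE TABLE [USING]` syntax. The helper refuses paths containing a single quote instead of guessing at escaping rules.

```python
# File formats supported for external tables (from the list above).
EXTERNAL_FORMATS = {"DELTA", "CSV", "JSON", "AVRO", "PARQUET", "ORC", "TEXT"}


def external_table_ddl(table_name: str, columns: str, file_format: str, location: str) -> str:
    """Build a CREATE TABLE statement for a hypothetical external table.

    The LOCATION clause is what makes the table external; without it the
    table would be created as a managed table.
    """
    fmt = file_format.upper()
    if fmt not in EXTERNAL_FORMATS:
        raise ValueError(f"unsupported external table format: {file_format}")
    if "'" in location:
        # Keep the sketch simple: refuse paths that would need quote escaping.
        raise ValueError("location must not contain single quotes")
    return (
        f"CREATE TABLE {table_name} ({columns})\n"
        f"USING {fmt}\n"
        f"LOCATION '{location}'"
    )
```

In practice, the path must also be covered by an external location that you have permission to use before Unity Catalog will accept the statement.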

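Fully delete an external table

Because DROP TABLE leaves an external table's files in place, fully deleting one takes two steps. First, drop the table in SQL (you must be its owner):

DROP TABLE <table_name>

Then delete the underlying files, for example from a Python notebook. Passing True as the second argument (recurse) removes the directory and everything in it:

dbutils.fs.rm("s3://<path_to_table>", True)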
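The two steps above can be wrapped in one helper. This is a sketch under an assumption: the SQL runner and the recursive delete are passed in as plain callables, so the logic can be exercised off-cluster. The function and parameter names are hypothetical.

```python
from typing import Callable


def fully_delete_external_table(
    table_name: str,
    table_path: str,
    run_sql: Callable[[str], object],
    remove_recursive: Callable[[str], object],
) -> None:
    """Drop a hypothetical external table's metadata, then delete its files.

    DROP TABLE on an external table leaves the data in place, so the files
    at table_path are removed as a separate second step.
    """
    # Step 1: remove the table from Unity Catalog (requires ownership).
    run_sql(f"DROP TABLE {table_name}")
    # Step 2: delete the underlying files recursively.
    remove_recursive(table_path)
```

In a Databricks notebook, you would pass `spark.sql` as `run_sql` and `lambda p: dbutils.fs.rm(p, True)` as `remove_recursive`. Because the delete is irreversible, read the table's location first (for example from `DESCRIBE TABLE EXTENDED <table_name>`) rather than typing the path by hand.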

Create tables | Databricks on AWS
