Databricks Certified Data Engineer Associate

  1. Databricks Intelligence Platform - 10%
  2. Development and Ingestion - 30%
  3. Data Processing & Transformations - 31%
  4. Productionizing Data Pipelines - 18%
  5. Data Governance & Quality - 11%

Assessment Details

  • Total number of questions: 45
  • Time limit: 90 minutes
  • Registration fee: $200
  • Validity period: 2 years

Exam outline

Section 1: Databricks Intelligence Platform

  • Enable features that simplify data layout decisions and optimize query performance (see the sketch after this list).
  • Explain the value of the Data Intelligence Platform.
  • Identify the applicable compute to use for a specific use case.
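
As a quick illustration of the first objective, here is a minimal sketch, assuming a Databricks notebook where spark is predefined; the catalog, schema, table, and column names are hypothetical. It enables liquid clustering and predictive optimization, two features that simplify data layout decisions and help query performance.

    # Minimal sketch, assuming a Databricks notebook where `spark` is
    # predefined; catalog/schema/table names are hypothetical.

    # Liquid clustering: Databricks manages the data layout instead of
    # relying on manual partitioning or ZORDER.
    spark.sql("""
        CREATE TABLE IF NOT EXISTS main.sales.orders (
            order_id   BIGINT,
            order_date DATE,
            amount     DOUBLE
        )
        CLUSTER BY (order_date)
    """)

    # Predictive optimization: Databricks schedules OPTIMIZE and VACUUM
    # automatically for Unity Catalog managed tables in this schema.
    spark.sql("ALTER SCHEMA main.sales ENABLE PREDICTIVE OPTIMIZATION")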

Section 2: Development and Ingestion

  • Use Databricks Connect in a data engineering workflow.
  • Determine the capabilities of Notebooks functionality.
  • Classify valid Auto Loader sources and use cases.
  • Demonstrate knowledge of Auto Loader syntax (see the sketch after this list).
  • Use Databricks' built-in debugging tools to troubleshoot a given issue.
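
For the Auto Loader objectives, a minimal sketch follows, assuming a Databricks notebook where spark is predefined; the volume paths and target table are hypothetical.

    # Minimal Auto Loader sketch; paths and table name are hypothetical.
    df = (
        spark.readStream
        .format("cloudFiles")                     # Auto Loader source
        .option("cloudFiles.format", "json")      # format of incoming files
        .option("cloudFiles.schemaLocation",
                "/Volumes/main/raw/_schemas/events")  # where inferred schema is tracked
        .load("/Volumes/main/raw/events/")        # directory to watch
    )

    (
        df.writeStream
        .option("checkpointLocation", "/Volumes/main/raw/_checkpoints/events")
        .trigger(availableNow=True)               # process available files, then stop
        .toTable("main.bronze.events")            # write to a Delta table
    )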

Section 3: Data Processing & Transformations

  • Describe the three layers of the Medallion Architecture and explain the purpose of each layer in a data processing pipeline.
  • Classify the type of cluster and configuration for optimal performance based on the scenario in which the cluster is used.
  • Explain the advantages of LDP (Lakeflow Declarative Pipelines) for ETL processes in Databricks.
  • Implement data pipelines using LDP (see the first sketch after this list).
  • Identify DDL (Data Definition Language)/DML features.
  • Compute complex aggregations and metrics with PySpark DataFrames (see the second sketch after this list).
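
For the LDP objectives, here is a minimal pipeline sketch; the source path and table names are hypothetical, and the decorator module is still imported as dlt.

    # Minimal LDP sketch; source path and table names are hypothetical.
    import dlt
    from pyspark.sql.functions import col

    @dlt.table(comment="Raw events ingested with Auto Loader (bronze layer).")
    def bronze_events():
        return (
            spark.readStream.format("cloudFiles")
            .option("cloudFiles.format", "json")
            .load("/Volumes/main/raw/events/")
        )

    @dlt.table(comment="Cleaned events (silver layer).")
    @dlt.expect_or_drop("valid_id", "event_id IS NOT NULL")  # data-quality expectation
    def silver_events():
        return dlt.read_stream("bronze_events").where(col("amount") > 0)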
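
For the aggregation objective, a short sketch with PySpark DataFrames; the input table and column names are hypothetical.

    # Complex aggregation sketch; table and columns are hypothetical.
    from pyspark.sql import functions as F

    metrics = (
        spark.table("main.silver.orders")
        .groupBy("customer_id")
        .agg(
            F.count("*").alias("order_count"),
            F.sum("amount").alias("total_spend"),
            F.avg("amount").alias("avg_order_value"),
            F.countDistinct("product_id").alias("distinct_products"),
        )
        .orderBy(F.desc("total_spend"))
    )
    metrics.show(5)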

Section 4: Productionizing Data Pipelines

  • Identify the differences between Databricks Asset Bundles (DABs) and traditional deployment methods.
  • Identify the structure of Asset Bundles.
  • Deploy a workflow, and repair and rerun a task in case of failure (see the sketch after this list).
  • Use serverless compute for hands-off, auto-optimized resources managed by Databricks.
  • Analyze the Spark UI to optimize queries.
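
For the repair-and-rerun objective, a hedged sketch using the Databricks SDK for Python (the databricks-sdk package); the run ID and task key are hypothetical.

    # Repair-and-rerun sketch with the Databricks SDK for Python;
    # the run ID and task key are hypothetical.
    from databricks.sdk import WorkspaceClient

    w = WorkspaceClient()  # reads credentials from the environment/config

    # Repair the failed job run, rerunning only the task that failed.
    w.jobs.repair_run(
        run_id=123456789,
        rerun_tasks=["ingest_events"],
    )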

Section 5: Data Governance & Quality

  • Explain the difference between managed and external tables (see the first sketch after this list).
  • Identify how permissions are granted to users and groups within Unity Catalog (UC).
  • Identify key roles in UC.
  • Identify how audit logs are stored.
  • Use lineage features in Unity Catalog.
  • Use the Delta Sharing feature available with Unity Catalog to share data (see the second sketch after this list).
  • Identify the advantages and limitations of Delta Sharing.
  • Identify the types of Delta Sharing: Databricks-to-Databricks vs. external systems.
  • Analyze the cost considerations of data sharing across clouds.
  • Identify use cases for Lakehouse Federation when connecting to external sources.
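
For the managed-vs-external and permissions objectives, a minimal sketch, assuming a Databricks notebook where spark is predefined; all object names, the group name, and the storage path are hypothetical.

    # Managed vs. external tables and UC grants; all names and the
    # storage path are hypothetical.

    # Managed table: Unity Catalog owns both the metadata and the data files.
    spark.sql("""
        CREATE TABLE IF NOT EXISTS main.sales.orders_managed (id BIGINT, amount DOUBLE)
    """)

    # External table: UC tracks the metadata, but the data lives at a
    # storage path you manage.
    spark.sql("""
        CREATE TABLE IF NOT EXISTS main.sales.orders_external (id BIGINT, amount DOUBLE)
        LOCATION 'abfss://data@mystorage.dfs.core.windows.net/sales/orders'
    """)

    # Grant privileges to a group through Unity Catalog.
    spark.sql("GRANT USE CATALOG ON CATALOG main TO `data-analysts`")
    spark.sql("GRANT SELECT ON TABLE main.sales.orders_managed TO `data-analysts`")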
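
For the Delta Sharing objectives, a hedged recipient-side sketch using the open delta-sharing Python client (pip install delta-sharing); the profile file and share coordinates are hypothetical.

    # Recipient-side Delta Sharing sketch; the profile file and the
    # share/schema/table coordinates are hypothetical.
    import delta_sharing

    # The profile file is provided by the share provider.
    table_url = "config.share#my_share.my_schema.my_table"
    df = delta_sharing.load_as_pandas(table_url)
    print(df.head())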