Introduction to Snowflake Architecture
Multi-cluster, shared storage architecture
Snowflake is a cloud-based database offered as a pay-as-you-go service on AWS, Azure, and GCP. It is developed by Snowflake Inc. (formerly Snowflake Computing).
Snowflake adopts a shared-storage architecture. It uses cloud object storage (Amazon S3 on AWS) for its underlying data storage. It performs query execution within elastic clusters of virtual machines, called virtual warehouses. The Cloud Services layer is a collection of services that manage compute clusters, queries, and transactions, and it stores all metadata, such as database catalogs and access control information, in a key-value store (FoundationDB).
History
Implementation of Snowflake began in late 2012, and the service has been generally available since June 2015.
Concurrency Control
Multi-version Concurrency Control (MVCC)
Snowflake supports MVCC. Because the underlying data is stored in Amazon S3, a write operation does not modify data in place; instead, it creates an entire new file that includes the changes. The newly created file replaces the stale version, but the stale version is not deleted immediately. Snowflake lets users define how long stale versions are kept in S3, up to 90 days. Building on MVCC, Snowflake also supports time travel queries.
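The copy-on-write behaviour described above can be sketched in a few lines of Python. This is a toy model, not Snowflake's implementation: the class `VersionedTable`, its methods, and the use of plain integers as timestamps (read them as days) are all assumptions made for illustration.

```python
from bisect import bisect_right


class VersionedTable:
    """Hypothetical model: every write creates a new immutable version."""

    def __init__(self, retention):
        self.retention = retention   # how long stale versions are kept
        self.timestamps = []         # commit time of each version, ascending
        self.versions = []           # immutable tuples of rows

    def write(self, now, rows):
        # Never modify an existing version in place: append a new one.
        self.timestamps.append(now)
        self.versions.append(tuple(rows))

    def read(self, as_of=None):
        # Without as_of, return the current version; otherwise time travel.
        if as_of is None:
            return self.versions[-1]
        idx = bisect_right(self.timestamps, as_of) - 1
        if idx < 0:
            raise LookupError("no version visible at that time")
        return self.versions[idx]

    def purge(self, now):
        # A version becomes stale when the next version is written.
        # Drop stale versions whose retention period has expired.
        keep_from = 0
        for i in range(len(self.versions) - 1):
            replaced_at = self.timestamps[i + 1]
            if now - replaced_at > self.retention:
                keep_from = i + 1
        del self.timestamps[:keep_from]
        del self.versions[:keep_from]
```

With `retention=90`, a version replaced on day 10 can still be read with `read(as_of=5)` until a `purge` call on day 101 or later removes it. The current version is never purged.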
Data Model
Snowflake is relational as it supports ANSI SQL and ACID transactions. It offers built-in functions and SQL extensions for traversing, flattening, and nesting of semi-structured data, with support for popular formats such as JSON and Avro. When storing semi-structured data, Snowflake can perform automatic type inference to find the most common types and store them using the same compressed columnar format as native relational data. Thus it can accelerate query execution on them.
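To make the idea of type inference on semi-structured data concrete, here is a rough Python sketch. It is not Snowflake's algorithm: the functions `infer_columns` and `to_columns` are hypothetical, only top-level JSON keys are considered, and values that do not match the most common type are simply set aside in a fallback structure.

```python
import json
from collections import Counter, defaultdict


def infer_columns(documents):
    """Return {key: name of the most common value type} for top-level keys."""
    counts = defaultdict(Counter)
    for doc in documents:
        for key, value in doc.items():
            counts[key][type(value).__name__] += 1
    return {key: c.most_common(1)[0][0] for key, c in counts.items()}


def to_columns(documents, schema):
    """Split documents into typed columns plus a fallback for mismatches."""
    columns = {key: [] for key in schema}
    fallback = []
    for doc in documents:
        leftover = {}
        for key, type_name in schema.items():
            value = doc.get(key)
            if value is not None and type(value).__name__ == type_name:
                columns[key].append(value)
            else:
                # Keep the typed column aligned; park the odd value elsewhere.
                columns[key].append(None)
                if key in doc:
                    leftover[key] = value
        fallback.append(leftover)
    return columns, fallback


docs = [json.loads(s) for s in (
    '{"id": 1, "name": "a"}',
    '{"id": 2, "name": "b"}',
    '{"id": "x3", "name": "c"}',
)]
schema = infer_columns(docs)
columns, fallback = to_columns(docs, schema)
```

Once most values of a key share one type, that key can be stored like an ordinary column, which is what makes columnar compression and faster scans possible for such data.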
Foreign Keys
Snowflake supports defining and maintaining constraints, including foreign key constraints, but does not enforce them, except for NOT NULL constraints, which are always enforced.
Snowflake does not check these constraints during data modification or during query execution. This keeps data loading flexible, but it leaves the loading process responsible for integrity.
Snowflake therefore cannot guarantee referential integrity: it supports referential integrity and other constraints, but none of them are enforced except NOT NULL, which is always enforced. Constraints other than NOT NULL are created as disabled constraints.
Snowflake provides the following constraint functionality:
- Unique, primary, and foreign keys, and NOT NULL columns.
- Named constraints.
- Single-column and multi-column constraints.
- Creation of constraints inline and out-of-line.
- Support for creation, modification and deletion of constraints.
Overview of Constraints | Snowflake Documentation
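The practical consequence of unenforced constraints can be illustrated with a small Python model. It is only a sketch: the `Table` class and its methods are hypothetical and do not correspond to any Snowflake API. NOT NULL is checked on insert, while the primary key is recorded as metadata and never checked.

```python
class Table:
    """Hypothetical table: NOT NULL is enforced, key constraints are metadata."""

    def __init__(self, columns, not_null=(), primary_key=None):
        self.columns = list(columns)
        self.not_null = set(not_null)
        self.primary_key = primary_key   # recorded, never checked on insert
        self.rows = []

    def insert(self, row):
        # The only constraint checked at write time is NOT NULL.
        for column in self.not_null:
            if row.get(column) is None:
                raise ValueError(f"NULL in NOT NULL column {column!r}")
        self.rows.append({c: row.get(c) for c in self.columns})

    def duplicate_keys(self):
        """Application-side check that the table itself does not perform."""
        seen, duplicates = set(), set()
        for row in self.rows:
            key = row[self.primary_key]
            if key in seen:
                duplicates.add(key)
            else:
                seen.add(key)
        return duplicates


orders = Table(["id", "name"], not_null=["id"], primary_key="id")
orders.insert({"id": 1, "name": "a"})
orders.insert({"id": 1, "name": "b"})   # accepted despite the duplicate key
```

In this model the second insert succeeds, so a loading pipeline has to run something like `duplicate_keys()` itself if it needs uniqueness.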
Indexes
Snowflake does not support indexes, as maintaining them is expensive given its architecture. Instead, it uses min-max based pruning and other techniques to accelerate data access.
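Min-max pruning is simple enough to sketch in Python. The example assumes each file carries the minimum and maximum value of one column as metadata; the names `FileStats` and `prune` are made up for illustration and are not part of Snowflake.

```python
from dataclasses import dataclass


@dataclass
class FileStats:
    name: str
    min_value: int   # smallest value of the column in this file
    max_value: int   # largest value of the column in this file


def prune(files, low, high):
    """Keep only files whose [min, max] range overlaps the query range [low, high]."""
    return [f for f in files if f.max_value >= low and f.min_value <= high]


files = [
    FileStats("f1", 1, 100),
    FileStats("f2", 101, 200),
    FileStats("f3", 201, 300),
]
# A query such as "WHERE x BETWEEN 150 AND 180" only needs to read f2.
to_scan = prune(files, 150, 180)
```

Unlike an index, this metadata is cheap to maintain because it is computed once when a file is written and never updated afterwards.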
Isolation Levels
According to the Snowflake paper and talk, Snowflake supports Snapshot Isolation. However, the documentation states that Read Committed is the only supported isolation level.
Joins
Query Compilation
Query Execution
Snowflake processes data in a pipelined fashion, in batches of a few thousand rows in columnar format. It also uses a push model rather than a pull model: relational operators push their intermediate results to their downstream operators.
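A push-based, batch-at-a-time pipeline can be sketched as follows. This is an illustration under simple assumptions, not Snowflake's engine: a batch is a dict mapping column names to lists of values, and the operators `scan`, `Filter`, and `Collect` are hypothetical.

```python
class Collect:
    """Sink operator: stores every batch pushed to it."""

    def __init__(self):
        self.batches = []

    def push(self, batch):
        self.batches.append(batch)


class Filter:
    """Keeps the rows of a batch whose `column` value satisfies `predicate`."""

    def __init__(self, column, predicate, downstream):
        self.column = column
        self.predicate = predicate
        self.downstream = downstream

    def push(self, batch):
        keep = [i for i, v in enumerate(batch[self.column]) if self.predicate(v)]
        if keep:  # push only non-empty results downstream
            self.downstream.push(
                {name: [values[i] for i in keep] for name, values in batch.items()}
            )


def scan(table, batch_size, downstream):
    """Source: cuts a columnar table into batches and pushes each one."""
    n_rows = len(next(iter(table.values())))
    for start in range(0, n_rows, batch_size):
        downstream.push(
            {name: values[start:start + batch_size] for name, values in table.items()}
        )


table = {"id": list(range(10)), "v": [x * x for x in range(10)]}
sink = Collect()
scan(table, 4, Filter("id", lambda x: x % 2 == 0, sink))
```

Control flows from the source to the sink: the scan decides when a batch is ready and calls `push` on its consumer, rather than the consumer asking for the next row as in a pull (iterator) model.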
Query Interface
Snowflake's SQL query engine optimizes queries automatically. The query optimizer assesses each query and its execution plan, taking into account factors like table statistics, data distribution, and available compute resources. The aim is to execute queries efficiently on the platform's resources without manual tuning.
Storage Architecture
Snowflake stores its data in Amazon S3. During query execution, the responsible worker nodes use an HTTP-based interface to read and write data. Each worker node also uses its local disk as a cache.
Storage Model
Snowflake horizontally partitions data into large immutable files which are equivalent to blocks or pages in a traditional database system. Within each file, the values of each attribute or column are grouped together and heavily compressed, a well-known scheme called PAX or hybrid columnar. Each table file has a header which, among other metadata, contains the offsets of each column within the file.
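The role of the header offsets can be shown with a small Python sketch of a PAX-style file. The byte layout here (a 4-byte header length, a JSON header, then one zlib-compressed block per column) is an assumption chosen for brevity, not Snowflake's actual file format, and `write_file` and `read_column` are hypothetical names.

```python
import json
import struct
import zlib


def write_file(columns):
    """Serialize {column: values} as a header plus one compressed block per column."""
    blocks, offsets, position = [], {}, 0
    for name, values in columns.items():
        block = zlib.compress(json.dumps(values).encode("utf-8"))
        offsets[name] = (position, len(block))   # where the column starts, and its size
        blocks.append(block)
        position += len(block)
    header = json.dumps(offsets).encode("utf-8")
    # Layout: 4-byte header length, header, then the column blocks.
    return struct.pack("<I", len(header)) + header + b"".join(blocks)


def read_column(data, name):
    """Read one column via the header offsets, without touching other columns."""
    (header_len,) = struct.unpack_from("<I", data, 0)
    offsets = json.loads(data[4:4 + header_len])
    start, length = offsets[name]
    body_start = 4 + header_len
    block = data[body_start + start:body_start + start + length]
    return json.loads(zlib.decompress(block))


data = write_file({"id": [1, 2, 3], "name": ["a", "b", "c"]})
names = read_column(data, "name")
```

Because the header records where each column starts, a reader that needs only some columns can fetch the header first and then just the byte ranges of those columns.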
Stored Procedures
System Architecture
Snowflake uses Amazon S3 for its underlying data storage. It performs query execution within elastic clusters of virtual machines, called virtual warehouses. During query execution, virtual warehouses use an HTTP-based interface to read and write data in S3. The Cloud Services layer is a collection of services that manage compute clusters, queries, and transactions, and it stores all metadata, such as database catalogs and access control information, in FoundationDB.
Snowflake describes this as a multi-cluster, shared data architecture. The storage and compute layers are separate, and the data is stored in a centralized object store (like Amazon S3). Compute clusters, or virtual warehouses, can access and process this shared data concurrently.
Views
Features
- Multiple Cloud Provider Support
- Unlimited Storage & Compute
- Data Platform as Service
- Unique 3 Layer Architecture
- Virtual Warehouse (compute)
- Support for semi-structured data
- Snowflake Time Travel
- Snowflake Zero Copy Clone
- Continuous Data Loading (Snowpipe)
- Support for ANSI SQL + Extended SQL
- Snowflake Micropartition / Clustering
- Snowflake Data Security & Encryption
- Snowflake RBAC & DAC
- Data Sharing & Reader Accounts
- Data Replication & Failover
- Snowflake Connectors & Drivers
- Tasks / Task Scheduling / DAGs
- Streams (CDC - any changes in the table)
- Sequences - Used to generate unique numbers across sessions and statements, including concurrent statements. They can be used to generate values for a primary key or any column that requires a unique value.
- Snowpark for Python, Java, and Scala - Runtimes and libraries that securely deploy and process non-SQL code in Snowflake.
Releases
Snowflake Summit 2026 Highlights
Snowflake's 2026 highlights center entirely on transitioning from basic data storage into an "Agentic Enterprise" platform. Announced predominantly at the Snowflake Summit 2026 in early June, the major updates focus on bringing production-ready AI agents, semantic layers, and advanced infrastructure directly to governed corporate data.
🤖 AI Agents & Rebranding
Snowflake streamlined its primary interfaces to shift focus toward automated AI workflows:
- Snowflake CoCo: Formerly Cortex Code, CoCo is an AI coding agent for developers available across desktops, VS Code extensions, and Excel. It lets data teams build Python pipelines using a single conversational prompt.
- Snowflake CoWork: Formerly Snowflake Intelligence, this surface is expanded for everyday knowledge workers to automate operational business processes.
- Cortex Sense: An enterprise memory feature that analyzes query history and metadata to automatically learn an organization's business terms, boosting agent accuracy by 83%.
🔒 Security, Trust & Governance
With AI agents executing autonomous actions, Snowflake added critical control mechanisms:
- AI Agent Identity: Grants every AI agent a cryptographic identity with dedicated role-based access controls (RBAC) and audit trails.
- Data Movement Policies: Protects sensitive environments by blocking unauthorized downloads or cloud-stage data transfers by autonomous systems.
- AI Security Posture Management: Introduces native machine learning defenses within the Snowflake Trust Center to catch prompt injection attacks.
📊 Context & Semantic Layers
To reduce AI hallucination, Snowflake rolled out features to deliver "one version of the truth" across siloed data environments:
- Horizon Context & Semantic Studio: Tools allowing teams to map out business logic without deep SQL expertise.
- Open Semantics Initiative (OSI): A collaborative open standard adopted alongside dozens of external vendors to ensure seamless data interpretations across various software ecosystems.
- Multi-Party Data Clean Rooms: Upgraded data collaboration letting multiple corporate parties, advertisers, and publishers safely analyze shared campaign data concurrently without exposing individual raw data sets.
⚙️ Core Performance & Infrastructure Substrate
- Apache Iceberg v3 (GA): Delivers major interoperability upgrades to open-source table formats.
- Adaptive Compute: Automatically shifts software and hardware compute resources behind the scenes to optimize heavy AI processing workloads.
- Observe Acquisition: Integrates a complete observability suite (logs, infrastructure, and application monitoring) directly into native Snowflake workflows following its earlier corporate acquisition.
🤝 Ecosystem Partnerships
- Anthropic Expansion: Deepened integration to power Snowflake CoCo and CoWork natively using Anthropic's Claude models.
- AWS Alliance: Broadened cloud infrastructure commitments to accommodate enterprise clients migrating highly regulated data estates over to Snowflake's AI stack.
Snowflake Summit 2026 Platform Keynote - YouTube
Others
- Streamlit
Links
- Snowflake SnowPro Certification - Tutorial - YouTube
- Snowflake Tutorial - YouTube
- The Snowflake Data Cloud - Mobilize Data, Apps, and AI
- What is Snowflake? 8 Minute Demo - YouTube
- Snowflake Explained In 9 Mins | What Is Snowflake Database | Careers In Snowflake | MindMajix - YouTube
- Snowflake Documentation
- Top Snowflake Interview Questions and Answers (2023) - InterviewBit
- Top 50 Snowflake Interview Questions And Answers 2023
- Leveraging Cortex AISQL For Multi-Modal Analytics - YouTube