
Data Loading Using COPY INTO from AWS S3

Overview

COPY INTO is a bulk-loading command in Snowflake used to load data on demand from external storage such as Amazon S3 into Snowflake tables. Unlike Snowpipe (which provides continuous auto-ingestion), COPY INTO runs manually or on a schedule, giving you more control, better error handling, and strong batch-loading performance.
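
In its simplest form, the command reads staged files into an existing table. The snippet below is a minimal sketch with placeholder names (MY_TABLE, MY_S3_STAGE); the rest of this guide builds the real objects step by step.

COPY INTO MY_TABLE                              -- placeholder target table
FROM @MY_S3_STAGE                               -- placeholder external stage pointing at S3
FILE_FORMAT = (TYPE = 'CSV' SKIP_HEADER = 1);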

When to Use COPY INTO

Use Case                             | Choose COPY INTO?
One-time or periodic batch loads     | Yes
Scheduled ETL/ELT jobs               | Yes
Need more control over loading       | Yes
Continuous streaming-like ingestion  | Use Snowpipe instead
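
For the scheduled ETL/ELT case above, a common pattern is to wrap the COPY INTO statement in a Snowflake task. The sketch below assumes the warehouse, table, stage, and file format created later in this guide; the task name and the hourly schedule are illustrative.

-- Hypothetical task that runs the batch load once an hour
CREATE OR REPLACE TASK LOAD_CUSTOMER_DATA_TASK
    WAREHOUSE = COPY_WH
    SCHEDULE = '60 MINUTE'
AS
    COPY INTO CUSTOMER_DATA
    FROM @CUSTOMER_S3_STAGE
    FILE_FORMAT = (FORMAT_NAME = CSV_FMT)
    ON_ERROR = 'CONTINUE';

ALTER TASK LOAD_CUSTOMER_DATA_TASK RESUME;   -- tasks are created in a suspended state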

Prerequisites

Before you start, ensure:

  • Snowflake account is set up
  • Warehouse and database exist
  • IAM role for S3 access configured
  • Data files prepared (CSV, JSON, Parquet, etc.)
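
A quick way to confirm the Snowflake side of these prerequisites is a couple of context checks before creating anything; the IAM role itself is verified later when listing the stage.

-- Confirm session context and available compute
SELECT CURRENT_ACCOUNT(), CURRENT_REGION(), CURRENT_ROLE();
SHOW WAREHOUSES;
SHOW DATABASES;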

Steps

Step 1: Create Warehouse, Database, and Schema

CREATE WAREHOUSE IF NOT EXISTS COPY_WH WAREHOUSE_SIZE = 'XSMALL';

CREATE DATABASE IF NOT EXISTS COPY_DB;

CREATE SCHEMA IF NOT EXISTS COPY_DB.COPY_SCHEMA;

Step 2: Create Target Table

CREATE OR REPLACE TABLE CUSTOMER_DATA (
    CUSTOMER_ID INT,
    FIRST_NAME STRING,
    LAST_NAME STRING,
    EMAIL STRING,
    COUNTRY STRING,
    CREATED_AT TIMESTAMP
);

Step 3: Configure AWS S3 External Stage

Option A: Using AWS Access Key 

CREATE OR REPLACE STAGE CUSTOMER_S3_STAGE
URL='s3://my-bucket/customer-data/'
CREDENTIALS = (AWS_KEY_ID='YOUR_AWS_KEY' AWS_SECRET_KEY='YOUR_AWS_SECRET');

Option B: Using AWS IAM Role 

CREATE OR REPLACE STAGE CUSTOMER_S3_STAGE
URL='s3://my-bucket/customer-data/'
CREDENTIALS = (AWS_ROLE='arn:aws:iam::123456789012:role/mySnowflakeRole');

Step 4: Define File Format

Example for CSV:

CREATE OR REPLACE FILE FORMAT CSV_FMT
TYPE = 'CSV'
FIELD_DELIMITER = ','
SKIP_HEADER = 1
NULL_IF = ('NULL', 'null')
EMPTY_FIELD_AS_NULL = TRUE;

Step 5: Validate File Visibility

LIST @CUSTOMER_S3_STAGE;
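
If the stage path contains many files, LIST also accepts a PATTERN clause to narrow the output (the regular expression below is illustrative):

-- Show only CSV files under the stage path
LIST @CUSTOMER_S3_STAGE PATTERN = '.*\\.csv';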

Step 6: Load Data Using COPY INTO

COPY INTO CUSTOMER_DATA
FROM @CUSTOMER_S3_STAGE
FILE_FORMAT = (FORMAT_NAME = CSV_FMT)
ON_ERROR = 'CONTINUE';
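
To check the staged files for problems before committing a load, the same statement can be run in validation mode, which reports errors without loading any rows:

-- Dry run: report parse errors, load nothing
COPY INTO CUSTOMER_DATA
FROM @CUSTOMER_S3_STAGE
FILE_FORMAT = (FORMAT_NAME = CSV_FMT)
VALIDATION_MODE = RETURN_ERRORS;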

Optional Parameters

Parameter | Purpose                      | Example
ON_ERROR  | Continue or abort on errors  | ON_ERROR = 'SKIP_FILE'
PATTERN   | Load only matching filenames | PATTERN = '.*2025.*\\.csv'
FORCE     | Reload already loaded files  | FORCE = TRUE
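
For example, to reload only the 2025 files even if Snowflake has already recorded them as loaded (the pattern is illustrative):

COPY INTO CUSTOMER_DATA
FROM @CUSTOMER_S3_STAGE
FILE_FORMAT = (FORMAT_NAME = CSV_FMT)
PATTERN = '.*2025.*\\.csv'
FORCE = TRUE;                   -- Bypasses load-metadata deduplication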

Step 7: Verify Successful Load

SELECT COUNT(*) FROM CUSTOMER_DATA;

SELECT * FROM TABLE(INFORMATION_SCHEMA.LOAD_HISTORY()) ORDER BY LAST_LOAD_TIME DESC;
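
If ON_ERROR = 'CONTINUE' skipped rows, the VALIDATE table function returns the rows rejected by an earlier COPY INTO execution; '_last' refers to the most recent load in the current session.

-- Inspect rows rejected by the most recent COPY INTO in this session
SELECT * FROM TABLE(VALIDATE(CUSTOMER_DATA, JOB_ID => '_last'));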

Example: COPY INTO Script

1. Select Warehouse, Database & Schema

USE WAREHOUSE COPY_WH;       -- Warehouse for running COPY INTO
USE DATABASE COPY_DB;
USE SCHEMA COPY_SCHEMA;

2. Create Target Table

CREATE OR REPLACE TABLE CUSTOMER_DATA (
    CUSTOMER_ID INT,
    FIRST_NAME STRING,
    LAST_NAME STRING,
    EMAIL STRING,
    COUNTRY STRING,
    CREATED_AT TIMESTAMP
);

3. Create File Format (CSV with settings)

CREATE OR REPLACE FILE FORMAT CSV_FMT
    TYPE = 'CSV'
    FIELD_DELIMITER = ','
    SKIP_HEADER = 1
    NULL_IF = ('NULL', 'null', '')
    EMPTY_FIELD_AS_NULL = TRUE
    FIELD_OPTIONALLY_ENCLOSED_BY = '"';

4. Create STAGE to connect to S3

Best practice: Use IAM role, not access keys.

CREATE OR REPLACE STAGE CUSTOMER_S3_STAGE
    URL = 's3://my-bucket/customer-data/'
    CREDENTIALS = (AWS_ROLE = 'arn:aws:iam::123456789012:role/mySnowflakeRole')
    FILE_FORMAT = CSV_FMT;

(Note: a standard AWS S3 bucket in another region needs no extra stage parameter, although cross-region data transfer charges may apply. The ENDPOINT parameter applies only to stages that point at S3-compatible storage, not to regular s3:// URLs.)
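
Following the best-practice note above, Snowflake's recommended way to use an IAM role is to wrap it in a storage integration and reference that from the stage, so no credentials appear in SQL. This is a minimal sketch, assuming the same bucket path and role ARN; the integration name is illustrative.

-- Hypothetical storage integration holding the IAM role
CREATE OR REPLACE STORAGE INTEGRATION S3_CUSTOMER_INT
    TYPE = EXTERNAL_STAGE
    STORAGE_PROVIDER = 'S3'
    ENABLED = TRUE
    STORAGE_AWS_ROLE_ARN = 'arn:aws:iam::123456789012:role/mySnowflakeRole'
    STORAGE_ALLOWED_LOCATIONS = ('s3://my-bucket/customer-data/');

-- Stage that uses the integration instead of CREDENTIALS
CREATE OR REPLACE STAGE CUSTOMER_S3_STAGE
    URL = 's3://my-bucket/customer-data/'
    STORAGE_INTEGRATION = S3_CUSTOMER_INT
    FILE_FORMAT = CSV_FMT;

DESC INTEGRATION S3_CUSTOMER_INT returns the IAM user ARN and external ID that must be added to the role's trust policy on the AWS side.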

5. Validate connectivity by listing files

LIST @CUSTOMER_S3_STAGE;

6. Load data using COPY INTO

  • Duplicate prevention
  • ON_ERROR handling

COPY INTO CUSTOMER_DATA
FROM @CUSTOMER_S3_STAGE
FILE_FORMAT = (FORMAT_NAME = CSV_FMT)
ON_ERROR = 'CONTINUE'            -- Skip bad rows
FORCE = FALSE;                   -- Default: files already loaded are skipped

Duplicate prevention: Snowflake keeps load metadata for each table (retained for 64 days), so with FORCE = FALSE (the default) files that were already loaded from the stage are skipped automatically. Set FORCE = TRUE only when you deliberately want to reload them.

7. Verify load

SELECT COUNT(*) AS TOTAL_ROWS FROM CUSTOMER_DATA;

View load history

SELECT * FROM TABLE(INFORMATION_SCHEMA.COPY_HISTORY(
    TABLE_NAME => 'CUSTOMER_DATA',
    START_TIME => DATEADD(hour, -24, CURRENT_TIMESTAMP())
))
ORDER BY LAST_LOAD_TIME DESC;
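
The INFORMATION_SCHEMA functions only cover recent activity (roughly the last 14 days). For longer retention, the SNOWFLAKE.ACCOUNT_USAGE.COPY_HISTORY view can be queried instead; note that it lags real time by up to a couple of hours and requires access to the shared SNOWFLAKE database.

-- Account-level copy history: longer retention, some latency
SELECT FILE_NAME, STATUS, ROW_COUNT, FIRST_ERROR_MESSAGE, LAST_LOAD_TIME
FROM SNOWFLAKE.ACCOUNT_USAGE.COPY_HISTORY
WHERE TABLE_NAME = 'CUSTOMER_DATA'
ORDER BY LAST_LOAD_TIME DESC;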