SQL Data Types

SQL data type rules

Databricks uses several rules to resolve conflicts among data types:

You can also explicitly cast between many types:

Type precedence graph

SQL data type rules | Databricks on AWS

Varchar vs NVarchar

The key difference between VARCHAR and NVARCHAR lies in their character encoding, and therefore in which languages and characters they can store. VARCHAR typically uses a single-byte character set (ASCII or a code page that extends it) and is best suited to data primarily in English or similar languages. NVARCHAR, on the other hand, uses Unicode (typically UTF-16, at two bytes for most characters) and can store characters from multiple languages and scripts.
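The storage difference can be illustrated outside the database with a small Python sketch. It is only an analogy: the `encoded_size` helper, the sample strings, and the choice of `cp1252` as the single-byte code page are assumptions made for this example, and a real database picks its encoding from the column's collation or character set.

```python
# Illustration only: compares byte sizes in a single-byte code page and UTF-16.
# encoded_size, the samples, and cp1252 are assumptions made for this sketch.

def encoded_size(text, encoding):
    """Return the byte length of text in encoding, or None if it cannot be represented."""
    try:
        return len(text.encode(encoding))
    except UnicodeEncodeError:
        return None

samples = ["hello", "café", "日本語"]
for s in samples:
    single_byte = encoded_size(s, "cp1252")   # VARCHAR-like storage
    utf16 = encoded_size(s, "utf-16-le")      # NVARCHAR-like storage
    print(f"{s!r}: cp1252={single_byte}, utf-16-le={utf16}")
```

English text doubles in size under UTF-16, while the Japanese sample has no representation at all in the single-byte code page, which is the trade-off described above.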

Varchar vs Text

When a MySQL table has TEXT or BLOB columns, the table can't be stored in memory. This means every query that doesn't hit the cache has to access the file system, which is orders of magnitude slower than memory.

Therefore you should store the TEXT column in a separate table which is only accessed when you actually need it. This way the original table can be stored in memory and will be much faster.

Think of it as separating the data into one "memory table" and one "file table". The reason for doing this is to avoid accessing the file system except when necessary (i.e. only when you need the text).
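A minimal sketch of that split, using Python's built-in `sqlite3` module as a stand-in so it runs without a MySQL server. SQLite does not reproduce MySQL's storage behavior, so this only shows the schema layout and access pattern. The table and column names (`articles`, `article_bodies`, `title`, `body`) are hypothetical.

```python
import sqlite3

# SQLite stands in for MySQL here; this shows the layout, not MySQL's storage behavior.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Small, frequently queried columns (hypothetical schema).
    CREATE TABLE articles (
        id    INTEGER PRIMARY KEY,
        title VARCHAR(200) NOT NULL
    );
    -- The large TEXT column lives in its own table, keyed by the same id.
    CREATE TABLE article_bodies (
        article_id INTEGER PRIMARY KEY REFERENCES articles(id),
        body       TEXT NOT NULL
    );
""")
conn.execute("INSERT INTO articles (id, title) VALUES (1, 'Hello')")
conn.execute("INSERT INTO article_bodies (article_id, body) VALUES (1, 'Long text ...')")

# Listing queries touch only the small table.
titles = conn.execute("SELECT id, title FROM articles").fetchall()

# The text is joined in only when it is actually needed.
full = conn.execute(
    "SELECT a.title, b.body FROM articles AS a "
    "JOIN article_bodies AS b ON b.article_id = a.id WHERE a.id = ?",
    (1,),
).fetchone()
print(titles, full)
```

Most queries would use only `articles`; the join against `article_bodies` is reserved for the cases that need the text.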

MySQL varchar(2000) vs text? - Stack Overflow

Data types | Databricks on AWS