SQL Data Types

SQL data type rules

Databricks uses several rules to resolve conflicts among data types:

Promotion safely expands a type to a wider type.
Implicit downcasting narrows a type; it is the opposite of promotion.
Implicit crosscasting transforms a type into a type of another type family.

You can also explicitly cast between many types:

The cast function casts between most types and returns an error if it cannot.
The try_cast function works like cast but returns NULL when passed invalid values.
Other built-in functions cast between types using provided format directives.
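
The error-versus-NULL distinction between cast and try_cast can be mimicked outside Databricks. The following is a minimal Python analogy, not Databricks code: the helper names `cast_int` and `try_cast_int` are hypothetical, and `None` stands in for SQL NULL.

```python
from typing import Optional


def cast_int(value: str) -> int:
    """Imitates cast: convert the value or raise an error."""
    return int(value)  # raises ValueError on invalid input


def try_cast_int(value: str) -> Optional[int]:
    """Imitates try_cast: convert the value or return None (standing in for NULL)."""
    try:
        return int(value)
    except ValueError:
        return None
```

With these helpers, `try_cast_int("abc")` quietly yields `None`, while `cast_int("abc")` stops with an error. That is the trade-off to weigh when choosing between the two SQL functions.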

Type precedence graph

[Figure: type precedence graph]

SQL data type rules | Databricks on AWS

Varchar vs NVarchar

The key difference between VARCHAR and NVARCHAR lies in character encoding, and therefore in which languages and characters each type can hold. In systems that distinguish the two, such as SQL Server, VARCHAR stores non-Unicode data in a code page (often a single-byte character set) and is best suited for data primarily in English or similar languages. NVARCHAR, on the other hand, stores Unicode (typically UTF-16) and can hold characters from multiple languages and scripts.
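
The storage trade-off can be illustrated with plain string encodings. This is a rough sketch, assuming Latin-1 as a stand-in for a single-byte code page and UTF-16 for NVARCHAR-style storage. The function name `storage_bytes` is hypothetical, and real databases add their own length prefixes and collation rules.

```python
from typing import Optional, Tuple


def storage_bytes(text: str) -> Tuple[Optional[int], int]:
    """Return (single-byte size or None, UTF-16 size) for a string."""
    utf16_size = len(text.encode("utf-16-le"))  # 2 bytes per BMP character
    try:
        single_byte_size = len(text.encode("latin-1"))  # 1 byte per character
    except UnicodeEncodeError:
        single_byte_size = None  # not representable in this code page
    return single_byte_size, utf16_size


for sample in ["hello", "héllo", "日本語"]:
    print(sample, storage_bytes(sample))
```

English-like text costs half as much in the single-byte encoding, while the Japanese sample cannot be stored there at all and needs the Unicode form.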

Varchar vs Text

In MySQL, a table that has TEXT or BLOB columns can't be stored in memory (the MEMORY storage engine does not support these column types). This means every query that doesn't hit the cache has to access the file system, which is orders of magnitude slower than memory.

Therefore you should store the TEXT column in a separate table that is accessed only when you actually need it. This way the original table can be stored in memory and will be much faster.

Think of it as separating the data into one "memory table" and one "file table". The point is to avoid touching the filesystem except when necessary (i.e. only when you need the text).
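
The split described above can be sketched as two tables joined by key. This is a minimal illustration using Python's built-in sqlite3 module purely so that it runs anywhere. The table and column names (`article`, `article_body`) are hypothetical, and the in-memory behavior the advice relies on is specific to MySQL, not SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE article (
        id INTEGER PRIMARY KEY,
        title VARCHAR(200) NOT NULL
    );
    -- The wide TEXT column lives in its own table, keyed by article id.
    CREATE TABLE article_body (
        article_id INTEGER PRIMARY KEY REFERENCES article(id),
        body TEXT NOT NULL
    );
    """
)
conn.execute("INSERT INTO article (id, title) VALUES (1, 'Type rules')")
conn.execute(
    "INSERT INTO article_body (article_id, body) VALUES (1, 'Long text ...')"
)

# Listing queries touch only the narrow table.
titles = conn.execute("SELECT id, title FROM article").fetchall()

# The text is joined in only when it is actually needed.
row = conn.execute(
    "SELECT a.title, b.body FROM article a "
    "JOIN article_body b ON b.article_id = a.id WHERE a.id = ?",
    (1,),
).fetchone()
```

Most queries then read only `article`, and the join to `article_body` is paid for only on the pages that display the full text.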

MySQL varchar(2000) vs text? - Stack Overflow

SQL Data Types

Data types | Databricks on AWS
