Creating Tables in ClickHouse
Like most databases, ClickHouse logically groups tables into databases. Use the CREATE DATABASE command to create a new database in ClickHouse:
CREATE DATABASE IF NOT EXISTS helloworld
Similarly, use CREATE TABLE to define a new table. (If you do not specify the database name, the table will be in the
default database.) The following table named is my_first_table in the helloworld database:
CREATE TABLE helloworld.my_first_table
(
user_id UInt32,
message String,
timestamp DateTime,
metric Float32
)
ENGINE = MergeTree()
PRIMARY KEY (user_id, timestamp)
In the example above, my_first_table is a MergeTree table with four columns:
-
user_id: a 32-bit unsigned integer -
message: aStringdata type, which replaces types likeVARCHAR,BLOB,CLOBand others from other database systems -
timestamp: aDateTimevalue, which represents an instant in time -
metric: a 32-bit floating point numberNoteThe table engine determines:
-
How and where the data is stored
-
Which queries are supported
-
Whether or not the data is replicated
There are many engines to choose from, but for a simple table on a single-node ClickHouse server, MergeTree is your likely choice.
A Brief Intro to Primary Keys
Before you go any further, it is important to understand how primary keys work in ClickHouse (the implementation of primary keys might seem unexpected!):
- primary keys in ClickHouse are not unique for each row in a table
The primary key of a ClickHouse table determines how the data is sorted when written to disk. Every 8,192 rows or 10MB of data (referred to as the index granularity) creates an entry in the primary key index file. This granularity concept creates a sparse index that can easily fit in memory, and the granules represent a stripe of the smallest amount of column data that gets processed during
SELECTqueries.The primary key can be defined using the
PRIMARY KEYparameter. If you define a table without aPRIMARY KEYspecified, then the key becomes the tuple specified in theORDER BYclause. If you specify both aPRIMARY KEYand anORDER BY, the primary key must be a prefix of the sort order.The primary key is also the sorting key, which is a tuple of
(user_id, timestamp). Therefore, the data stored in each column file will be sorted byuser_id, thentimestamp. -
For more details, check out the Modeling Data training module in ClickHouse Academy.