This section provides reference information for the system-related functions in PlaidCloud Lakehouse.
This is the multi-page printable view of this section. Click here to print.
System Functions
- 1: CLUSTERING_INFORMATION
- 2: FUSE_BLOCK
- 3: FUSE_COLUMN
- 4: FUSE_ENCODING
- 5: FUSE_SEGMENT
- 6: FUSE_SNAPSHOT
- 7: FUSE_STATISTIC
1 - CLUSTERING_INFORMATION
Returns clustering information of a table.
SQL Syntax
CLUSTERING_INFORMATION('<database_name>', '<table_name>')
SQL Examples
CREATE TABLE mytable(a int, b int) CLUSTER BY(a+1);
INSERT INTO mytable VALUES(1,1),(3,3);
INSERT INTO mytable VALUES(2,2),(5,5);
INSERT INTO mytable VALUES(4,4);
SELECT * FROM CLUSTERING_INFORMATION('default','mytable')\G
*************************** 1. row ***************************
cluster_key: ((a + 1))
total_block_count: 3
constant_block_count: 1
unclustered_block_count: 0
average_overlaps: 1.3333
average_depth: 2.0
block_depth_histogram: {"00002":3}
Parameter | Description |
---|---|
cluster_key | The defined cluster key. |
total_block_count | The current count of blocks. |
constant_block_count | The count of blocks where min/max values are equal, meaning each block contains only one (group of) cluster_key value. |
unclustered_block_count | The count of blocks that have not yet been clustered. |
average_overlaps | The average ratio of overlapping blocks within a given range. |
average_depth | The average depth of overlapping partitions for the cluster key. |
block_depth_histogram | The number of partitions at each depth level. A higher concentration of partitions at lower depths indicates more effective table clustering. |
2 - FUSE_BLOCK
Returns the block information of the latest or specified snapshot of a table. For more information about what is block in PlaidCloud Lakehouse, see What are Snapshot, Segment, and Block?.
The command returns the location information of each parquet file referenced by a snapshot. This enables downstream applications to access and consume the data stored in the files.
See Also:
SQL Syntax
FUSE_BLOCK('<database_name>', '<table_name>'[, '<snapshot_id>'])
SQL Examples
CREATE TABLE mytable(c int);
INSERT INTO mytable values(1);
INSERT INTO mytable values(2);
SELECT * FROM FUSE_BLOCK('default', 'mytable');
---
+----------------------------------+----------------------------+----------------------------------------------------+------------+----------------------------------------------------+-------------------+
| snapshot_id | timestamp | block_location | block_size | bloom_filter_location | bloom_filter_size |
+----------------------------------+----------------------------+----------------------------------------------------+------------+----------------------------------------------------+-------------------+
| 51e84b56458f44269b05a059b364a659 | 2022-09-15 07:14:14.137268 | 1/7/_b/39a6dbbfd9b44ad5a8ec8ab264c93cf5_v0.parquet | 4 | 1/7/_i/39a6dbbfd9b44ad5a8ec8ab264c93cf5_v1.parquet | 221 |
| 51e84b56458f44269b05a059b364a659 | 2022-09-15 07:14:14.137268 | 1/7/_b/d0ee9688c4d24d6da86acd8b0d6f4fad_v0.parquet | 4 | 1/7/_i/d0ee9688c4d24d6da86acd8b0d6f4fad_v1.parquet | 219 |
+----------------------------------+----------------------------+----------------------------------------------------+------------+----------------------------------------------------+-------------------+
3 - FUSE_COLUMN
Returns the column information of the latest or specified snapshot of a table. For more information about what is block in PlaidCloud Lakehouse, see What are Snapshot, Segment, and Block?.
See Also:
SQL Syntax
FUSE_COLUMN('<database_name>', '<table_name>'[, '<snapshot_id>'])
SQL Examples
CREATE TABLE mytable(c int);
INSERT INTO mytable values(1);
INSERT INTO mytable values(2);
SELECT * FROM FUSE_COLUMN('default', 'mytable');
---
+----------------------------------+----------------------------+---------------------------------------------------------+------------+-----------+-----------+-------------+-------------+-----------+--------------+------------------+
| snapshot_id | timestamp | block_location | block_size | file_size | row_count | column_name | column_type | column_id | block_offset | bytes_compressed |
+----------------------------------+----------------------------+---------------------------------------------------------+------------+-----------+-----------+-------------+-------------+-----------+--------------+------------------+
| 3faefc1a9b6a48f388a8b59228dd06c1 | 2023-07-18 03:06:30.276502 | 1/118746/_b/44df130c207745cb858928135d39c1c0_v2.parquet | 4 | 196 | 1 | c | Int32 | 0 | 8 | 14 |
| 3faefc1a9b6a48f388a8b59228dd06c1 | 2023-07-18 03:06:30.276502 | 1/118746/_b/b6f8496d7e3f4f62a89c09572840cf70_v2.parquet | 4 | 196 | 1 | c | Int32 | 0 | 8 | 14 |
+----------------------------------+----------------------------+---------------------------------------------------------+------------+-----------+-----------+-------------+-------------+-----------+--------------+------------------+
4 - FUSE_ENCODING
import FunctionDescription from '@site/src/components/FunctionDescription';
Returns the encoding types applied to a specific column within a table. It helps you understand how data is compressed and stored in a native format within the table.
SQL Syntax
FUSE_ENCODING('<database_name>', '<table_name>', '<column_name>')
The function returns a result set with the following columns:
Column | Data Type | Description |
---|---|---|
VALIDITY_SIZE | Nullable(UInt32) | The size of a bitmap value that indicates whether each row in the column has a non-null value. This bitmap is used to track the presence or absence of null values in the column's data. |
COMPRESSED_SIZE | UInt32 | The size of the column data after compression. |
UNCOMPRESSED_SIZE | UInt32 | The size of the column data before applying encoding. |
LEVEL_ONE | String | The primary or initial encoding applied to the column. |
LEVEL_TWO | Nullable(String) | A secondary or recursive encoding method applied to the column after the initial encoding. |
SQL Examples
-- Create a table with an integer column 'c' and apply 'Lz4' compression
CREATE TABLE t(c INT) STORAGE_FORMAT = 'native' COMPRESSION = 'lz4';
-- Insert data into the table.
INSERT INTO t SELECT number FROM numbers(2048);
-- Analyze the encoding for column 'c' in table 't'
SELECT LEVEL_ONE, LEVEL_TWO, COUNT(*)
FROM FUSE_ENCODING('default', 't', 'c')
GROUP BY LEVEL_ONE, LEVEL_TWO;
level_one |level_two|count(*)|
------------+---------+--------+
DeltaBitpack| | 1|
-- Insert 2,048 rows with the value 1 into the table 't'
INSERT INTO t (c)
SELECT 1
FROM numbers(2048);
SELECT LEVEL_ONE, LEVEL_TWO, COUNT(*)
FROM FUSE_ENCODING('default', 't', 'c')
GROUP BY LEVEL_ONE, LEVEL_TWO;
level_one |level_two|count(*)|
------------+---------+--------+
OneValue | | 1|
DeltaBitpack| | 1|
5 - FUSE_SEGMENT
Returns the segment information of a specified table snapshot. For more information about what is segment in PlaidCloud Lakehouse, see What are Snapshot, Segment, and Block?.
See Also:
SQL Syntax
FUSE_SEGMENT('<database_name>', '<table_name>','<snapshot_id>')
SQL Examples
CREATE TABLE mytable(c int);
INSERT INTO mytable values(1);
INSERT INTO mytable values(2);
-- Obtain a snapshot ID
SELECT snapshot_id FROM FUSE_SNAPSHOT('default', 'mytable') limit 1;
---
+----------------------------------+
| snapshot_id |
+----------------------------------+
| 82c572947efa476892bd7c0635158ba2 |
+----------------------------------+
SELECT * FROM FUSE_SEGMENT('default', 'mytable', '82c572947efa476892bd7c0635158ba2');
---
+----------------------------------------------------+----------------+-------------+-----------+--------------------+------------------+
| file_location | format_version | block_count | row_count | bytes_uncompressed | bytes_compressed |
+----------------------------------------------------+----------------+-------------+-----------+--------------------+------------------+
| 1/319/_sg/d35fe7bf99584301b22e8f6a8a9c97f9_v1.json | 1 | 1 | 1 | 4 | 184 |
| 1/319/_sg/c261059d47c840e1b749222dabb4b2bb_v1.json | 1 | 1 | 1 | 4 | 184 |
+----------------------------------------------------+----------------+-------------+-----------+--------------------+------------------+
6 - FUSE_SNAPSHOT
Returns the snapshot information of a table. For more information about what is snapshot in PlaidCloud Lakehouse, see What are Snapshot, Segment, and Block?.
See Also:
SQL Syntax
FUSE_SNAPSHOT('<database_name>', '<table_name>')
SQL Examples
CREATE TABLE mytable(a int, b int) CLUSTER BY(a+1);
INSERT INTO mytable VALUES(1,1),(3,3);
INSERT INTO mytable VALUES(2,2),(5,5);
INSERT INTO mytable VALUES(4,4);
SELECT * FROM FUSE_SNAPSHOT('default','mytable');
---
| snapshot_id | snapshot_location | format_version | previous_snapshot_id | segment_count | block_count | row_count | bytes_uncompressed | bytes_compressed | index_size | timestamp |
|----------------------------------|------------------------------------------------------------|----------------|----------------------------------|---------------|-------------|-----------|--------------------|------------------|------------|----------------------------|
| a13d211b7421432898a3786848b8ced3 | 670655/783287/_ss/a13d211b7421432898a3786848b8ced3_v1.json | 1 | \N | 1 | 1 | 2 | 16 | 290 | 363 | 2022-09-19 14:51:52.860425 |
| cf08e6af6c134642aeb76bc81e6e7580 | 670655/783287/_ss/cf08e6af6c134642aeb76bc81e6e7580_v1.json | 1 | a13d211b7421432898a3786848b8ced3 | 2 | 2 | 4 | 32 | 580 | 726 | 2022-09-19 14:52:15.282943 |
| 1bd4f68b831a402e8c42084476461aa1 | 670655/783287/_ss/1bd4f68b831a402e8c42084476461aa1_v1.json | 1 | cf08e6af6c134642aeb76bc81e6e7580 | 3 | 3 | 5 | 40 | 862 | 1085 | 2022-09-19 14:52:20.284347 |
7 - FUSE_STATISTIC
Returns the estimated number of distinct values of each column in a table.
SQL Syntax
FUSE_STATISTIC('<database_name>', '<table_name>')
SQL Examples
You're most likely to use this function together with ANALYZE TABLE <table_name>
to generate and check the statistical information of a table. For more explanations and examples, see OPTIMIZE TABLE.