This is the multi-page printable view of this section. Click here to print.

Return to the regular view of this page.

System Functions

This section provides reference information for the system-related functions in PlaidCloud Lakehouse.

List of Functions:

1 - CLUSTERING_INFORMATION

Returns clustering information of a table.

SQL Syntax

CLUSTERING_INFORMATION('<database_name>', '<table_name>')

SQL Examples

CREATE TABLE mytable(a int, b int) CLUSTER BY(a+1);

INSERT INTO mytable VALUES(1,1),(3,3);
INSERT INTO mytable VALUES(2,2),(5,5);
INSERT INTO mytable VALUES(4,4);

SELECT * FROM CLUSTERING_INFORMATION('default','mytable')\G
*************************** 1. row ***************************
            cluster_key: ((a + 1))
      total_block_count: 3
   constant_block_count: 1
unclustered_block_count: 0
       average_overlaps: 1.3333
          average_depth: 2.0
  block_depth_histogram: {"00002":3}
ParameterDescription
cluster_keyThe defined cluster key.
total_block_countThe current count of blocks.
constant_block_countThe count of blocks where min/max values are equal, meaning each block contains only one (group of) cluster_key value.
unclustered_block_countThe count of blocks that have not yet been clustered.
average_overlapsThe average ratio of overlapping blocks within a given range.
average_depthThe average depth of overlapping partitions for the cluster key.
block_depth_histogramThe number of partitions at each depth level. A higher concentration of partitions at lower depths indicates more effective table clustering.

2 - FUSE_BLOCK

Returns the block information of the latest or specified snapshot of a table. For more information about what is block in PlaidCloud Lakehouse, see What are Snapshot, Segment, and Block?.

The command returns the location information of each parquet file referenced by a snapshot. This enables downstream applications to access and consume the data stored in the files.

See Also:

SQL Syntax

FUSE_BLOCK('<database_name>', '<table_name>'[, '<snapshot_id>'])

SQL Examples

CREATE TABLE mytable(c int);
INSERT INTO mytable values(1);
INSERT INTO mytable values(2); 

SELECT * FROM FUSE_BLOCK('default', 'mytable');

---
+----------------------------------+----------------------------+----------------------------------------------------+------------+----------------------------------------------------+-------------------+
| snapshot_id                      | timestamp                  | block_location                                     | block_size | bloom_filter_location                              | bloom_filter_size |
+----------------------------------+----------------------------+----------------------------------------------------+------------+----------------------------------------------------+-------------------+
| 51e84b56458f44269b05a059b364a659 | 2022-09-15 07:14:14.137268 | 1/7/_b/39a6dbbfd9b44ad5a8ec8ab264c93cf5_v0.parquet |          4 | 1/7/_i/39a6dbbfd9b44ad5a8ec8ab264c93cf5_v1.parquet |               221 |
| 51e84b56458f44269b05a059b364a659 | 2022-09-15 07:14:14.137268 | 1/7/_b/d0ee9688c4d24d6da86acd8b0d6f4fad_v0.parquet |          4 | 1/7/_i/d0ee9688c4d24d6da86acd8b0d6f4fad_v1.parquet |               219 |
+----------------------------------+----------------------------+----------------------------------------------------+------------+----------------------------------------------------+-------------------+

3 - FUSE_COLUMN

Returns the column information of the latest or specified snapshot of a table. For more information about what is block in PlaidCloud Lakehouse, see What are Snapshot, Segment, and Block?.

See Also:

SQL Syntax

FUSE_COLUMN('<database_name>', '<table_name>'[, '<snapshot_id>'])

SQL Examples

CREATE TABLE mytable(c int);
INSERT INTO mytable values(1);
INSERT INTO mytable values(2);

SELECT * FROM FUSE_COLUMN('default', 'mytable');

---
+----------------------------------+----------------------------+---------------------------------------------------------+------------+-----------+-----------+-------------+-------------+-----------+--------------+------------------+
| snapshot_id                      | timestamp                  | block_location                                          | block_size | file_size | row_count | column_name | column_type | column_id | block_offset | bytes_compressed |
+----------------------------------+----------------------------+---------------------------------------------------------+------------+-----------+-----------+-------------+-------------+-----------+--------------+------------------+
| 3faefc1a9b6a48f388a8b59228dd06c1 | 2023-07-18 03:06:30.276502 | 1/118746/_b/44df130c207745cb858928135d39c1c0_v2.parquet |          4 |       196 |         1 | c           | Int32       |         0 |            8 |               14 |
| 3faefc1a9b6a48f388a8b59228dd06c1 | 2023-07-18 03:06:30.276502 | 1/118746/_b/b6f8496d7e3f4f62a89c09572840cf70_v2.parquet |          4 |       196 |         1 | c           | Int32       |         0 |            8 |               14 |
+----------------------------------+----------------------------+---------------------------------------------------------+------------+-----------+-----------+-------------+-------------+-----------+--------------+------------------+

4 - FUSE_ENCODING

Returns the encoding types applied to a specific column within a table. It helps you understand how data is compressed and stored in a native format within the table.

SQL Syntax

FUSE_ENCODING('<database_name>', '<table_name>', '<column_name>')

The function returns a result set with the following columns:

ColumnData TypeDescription
VALIDITY_SIZENullable(UInt32)The size of a bitmap value that indicates whether each row in the column has a non-null value. This bitmap is used to track the presence or absence of null values in the column's data.
COMPRESSED_SIZEUInt32The size of the column data after compression.
UNCOMPRESSED_SIZEUInt32The size of the column data before applying encoding.
LEVEL_ONEStringThe primary or initial encoding applied to the column.
LEVEL_TWONullable(String)A secondary or recursive encoding method applied to the column after the initial encoding.

SQL Examples

-- Create a table with an integer column 'c' and apply 'Lz4' compression
CREATE TABLE t(c INT) STORAGE_FORMAT = 'native' COMPRESSION = 'lz4';

-- Insert data into the table.
INSERT INTO t SELECT number FROM numbers(2048);

-- Analyze the encoding for column 'c' in table 't'
SELECT LEVEL_ONE, LEVEL_TWO, COUNT(*) 
FROM FUSE_ENCODING('default', 't', 'c') 
GROUP BY LEVEL_ONE, LEVEL_TWO;

level_one   |level_two|count(*)|
------------+---------+--------+
DeltaBitpack|         |       1|

--  Insert 2,048 rows with the value 1 into the table 't'
INSERT INTO t (c)
SELECT 1
FROM numbers(2048);

SELECT LEVEL_ONE, LEVEL_TWO, COUNT(*) 
FROM FUSE_ENCODING('default', 't', 'c') 
GROUP BY LEVEL_ONE, LEVEL_TWO;

level_one   |level_two|count(*)|
------------+---------+--------+
OneValue    |         |       1|
DeltaBitpack|         |       1|

5 - FUSE_SEGMENT

Returns the segment information of a specified table snapshot. For more information about what is segment in PlaidCloud Lakehouse, see What are Snapshot, Segment, and Block?.

See Also:

SQL Syntax

FUSE_SEGMENT('<database_name>', '<table_name>','<snapshot_id>')

SQL Examples

CREATE TABLE mytable(c int);
INSERT INTO mytable values(1);
INSERT INTO mytable values(2); 

-- Obtain a snapshot ID
SELECT snapshot_id FROM FUSE_SNAPSHOT('default', 'mytable') limit 1;

---
+----------------------------------+
| snapshot_id                      |
+----------------------------------+
| 82c572947efa476892bd7c0635158ba2 |
+----------------------------------+

SELECT * FROM FUSE_SEGMENT('default', 'mytable', '82c572947efa476892bd7c0635158ba2');

---
+----------------------------------------------------+----------------+-------------+-----------+--------------------+------------------+
| file_location                                      | format_version | block_count | row_count | bytes_uncompressed | bytes_compressed |
+----------------------------------------------------+----------------+-------------+-----------+--------------------+------------------+
| 1/319/_sg/d35fe7bf99584301b22e8f6a8a9c97f9_v1.json |              1 |           1 |         1 |                  4 |              184 |
| 1/319/_sg/c261059d47c840e1b749222dabb4b2bb_v1.json |              1 |           1 |         1 |                  4 |              184 |
+----------------------------------------------------+----------------+-------------+-----------+--------------------+------------------+

6 - FUSE_SNAPSHOT

Returns the snapshot information of a table. For more information about what is snapshot in PlaidCloud Lakehouse, see What are Snapshot, Segment, and Block?.

See Also:

SQL Syntax

FUSE_SNAPSHOT('<database_name>', '<table_name>')

SQL Examples

CREATE TABLE mytable(a int, b int) CLUSTER BY(a+1);

INSERT INTO mytable VALUES(1,1),(3,3);
INSERT INTO mytable VALUES(2,2),(5,5);
INSERT INTO mytable VALUES(4,4);

SELECT * FROM FUSE_SNAPSHOT('default','mytable');

---
| snapshot_id                      | snapshot_location                                          | format_version | previous_snapshot_id             | segment_count | block_count | row_count | bytes_uncompressed | bytes_compressed | index_size | timestamp                  |
|----------------------------------|------------------------------------------------------------|----------------|----------------------------------|---------------|-------------|-----------|--------------------|------------------|------------|----------------------------|
| a13d211b7421432898a3786848b8ced3 | 670655/783287/_ss/a13d211b7421432898a3786848b8ced3_v1.json | 1              | \N                               | 1             | 1           | 2         | 16                 | 290              | 363        | 2022-09-19 14:51:52.860425 |
| cf08e6af6c134642aeb76bc81e6e7580 | 670655/783287/_ss/cf08e6af6c134642aeb76bc81e6e7580_v1.json | 1              | a13d211b7421432898a3786848b8ced3 | 2             | 2           | 4         | 32                 | 580              | 726        | 2022-09-19 14:52:15.282943 |
| 1bd4f68b831a402e8c42084476461aa1 | 670655/783287/_ss/1bd4f68b831a402e8c42084476461aa1_v1.json | 1              | cf08e6af6c134642aeb76bc81e6e7580 | 3             | 3           | 5         | 40                 | 862              | 1085       | 2022-09-19 14:52:20.284347 |

7 - FUSE_STATISTIC

Returns the estimated number of distinct values of each column in a table.

SQL Syntax

FUSE_STATISTIC('<database_name>', '<table_name>')

SQL Examples

You're most likely to use this function together with ANALYZE TABLE <table_name> to generate and check the statistical information of a table. For more explanations and examples, see OPTIMIZE TABLE.

8 - FUSE_TIME_TRAVEL_SIZE

Calculates the storage size of historical data (for Time Travel) for tables.

SQL Syntax

-- Calculate historical data size for all tables in all databases
SELECT ...
FROM fuse_time_travel_size();

-- Calculate historical data size for all tables in a specified database
SELECT ...
FROM fuse_time_travel_size('<database_name>');

-- Calculate historical data size for a specified table in a specified database
SELECT ...
FROM fuse_time_travel_size('<database_name>', '<table_name>'));

Output

The function returns a result set with the following columns:

ColumnDescription
database_nameThe name of the database where the table is located.
table_nameThe name of the table.
is_droppedIndicates whether the table has been dropped (true for dropped tables, false otherwise).
time_travel_sizeThe total storage size of historical data (for Time Travel) for the table, in bytes.
latest_snapshot_sizeThe storage size of the latest snapshot of the table, in bytes.
data_retention_period_in_hoursThe retention period for Time Travel data in hours (NULL means using the default retention policy).
errorAny error encountered while retrieving the storage size (NULL if no errors occurred).

SQL Examples

This example calculates the historical data for all tables in the default database:

SELECT * FROM fuse_time_travel_size('default')

┌───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
 database_name  table_name  is_dropped  time_travel_size  latest_snapshot_size  data_retention_period_in_hours        error      
├───────────────┼────────────┼────────────┼──────────────────┼──────────────────────┼────────────────────────────────┼──────────────────┤
 default        books       true                    2810                  1490                            NULL  NULL             
└───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘