This is the multi-page printable view of this section. Click here to print.

Return to the regular view of this page.

System Functions

1: CLUSTERING_INFORMATION
2: FUSE_BLOCK
3: FUSE_COLUMN
4: FUSE_ENCODING
5: FUSE_SEGMENT
6: FUSE_SNAPSHOT
7: FUSE_STATISTIC
8: FUSE_TIME_TRAVEL_SIZE

This section provides reference information for the system-related functions in PlaidCloud Lakehouse.

List of Functions:

1 - CLUSTERING_INFORMATION

Returns clustering information of a table.

SQL Syntax

CLUSTERING_INFORMATION('<database_name>', '<table_name>')

SQL Examples

CREATE TABLE mytable(a int, b int) CLUSTER BY(a+1);

INSERT INTO mytable VALUES(1,1),(3,3);
INSERT INTO mytable VALUES(2,2),(5,5);
INSERT INTO mytable VALUES(4,4);

SELECT * FROM CLUSTERING_INFORMATION('default','mytable')\G
*************************** 1. row ***************************
            cluster_key: ((a + 1))
      total_block_count: 3
   constant_block_count: 1
unclustered_block_count: 0
       average_overlaps: 1.3333
          average_depth: 2.0
  block_depth_histogram: {"00002":3}

Parameter	Description
cluster_key	The defined cluster key.
total_block_count	The current count of blocks.
constant_block_count	The count of blocks where min/max values are equal, meaning each block contains only one (group of) cluster_key value.
unclustered_block_count	The count of blocks that have not yet been clustered.
average_overlaps	The average ratio of overlapping blocks within a given range.
average_depth	The average depth of overlapping partitions for the cluster key.
block_depth_histogram	The number of partitions at each depth level. A higher concentration of partitions at lower depths indicates more effective table clustering.

2 - FUSE_BLOCK

Returns the block information of the latest or specified snapshot of a table. For more information about what is block in PlaidCloud Lakehouse, see What are Snapshot, Segment, and Block?.

The command returns the location information of each parquet file referenced by a snapshot. This enables downstream applications to access and consume the data stored in the files.

See Also:

SQL Syntax

FUSE_BLOCK('<database_name>', '<table_name>'[, '<snapshot_id>'])

SQL Examples

CREATE TABLE mytable(c int);
INSERT INTO mytable values(1);
INSERT INTO mytable values(2); 

SELECT * FROM FUSE_BLOCK('default', 'mytable');

---
+----------------------------------+----------------------------+----------------------------------------------------+------------+----------------------------------------------------+-------------------+
| snapshot_id                      | timestamp                  | block_location                                     | block_size | bloom_filter_location                              | bloom_filter_size |
+----------------------------------+----------------------------+----------------------------------------------------+------------+----------------------------------------------------+-------------------+
| 51e84b56458f44269b05a059b364a659 | 2022-09-15 07:14:14.137268 | 1/7/_b/39a6dbbfd9b44ad5a8ec8ab264c93cf5_v0.parquet |          4 | 1/7/_i/39a6dbbfd9b44ad5a8ec8ab264c93cf5_v1.parquet |               221 |
| 51e84b56458f44269b05a059b364a659 | 2022-09-15 07:14:14.137268 | 1/7/_b/d0ee9688c4d24d6da86acd8b0d6f4fad_v0.parquet |          4 | 1/7/_i/d0ee9688c4d24d6da86acd8b0d6f4fad_v1.parquet |               219 |
+----------------------------------+----------------------------+----------------------------------------------------+------------+----------------------------------------------------+-------------------+

3 - FUSE_COLUMN

Returns the column information of the latest or specified snapshot of a table. For more information about what is block in PlaidCloud Lakehouse, see What are Snapshot, Segment, and Block?.

See Also:

SQL Syntax

FUSE_COLUMN('<database_name>', '<table_name>'[, '<snapshot_id>'])

SQL Examples

CREATE TABLE mytable(c int);
INSERT INTO mytable values(1);
INSERT INTO mytable values(2);

SELECT * FROM FUSE_COLUMN('default', 'mytable');

---
+----------------------------------+----------------------------+---------------------------------------------------------+------------+-----------+-----------+-------------+-------------+-----------+--------------+------------------+
| snapshot_id                      | timestamp                  | block_location                                          | block_size | file_size | row_count | column_name | column_type | column_id | block_offset | bytes_compressed |
+----------------------------------+----------------------------+---------------------------------------------------------+------------+-----------+-----------+-------------+-------------+-----------+--------------+------------------+
| 3faefc1a9b6a48f388a8b59228dd06c1 | 2023-07-18 03:06:30.276502 | 1/118746/_b/44df130c207745cb858928135d39c1c0_v2.parquet |          4 |       196 |         1 | c           | Int32       |         0 |            8 |               14 |
| 3faefc1a9b6a48f388a8b59228dd06c1 | 2023-07-18 03:06:30.276502 | 1/118746/_b/b6f8496d7e3f4f62a89c09572840cf70_v2.parquet |          4 |       196 |         1 | c           | Int32       |         0 |            8 |               14 |
+----------------------------------+----------------------------+---------------------------------------------------------+------------+-----------+-----------+-------------+-------------+-----------+--------------+------------------+

4 - FUSE_ENCODING

Returns the encoding types applied to a specific column within a table. It helps you understand how data is compressed and stored in a native format within the table.

SQL Syntax

FUSE_ENCODING('<database_name>', '<table_name>', '<column_name>')

The function returns a result set with the following columns:

Column	Data Type	Description
VALIDITY_SIZE	Nullable(UInt32)	The size of a bitmap value that indicates whether each row in the column has a non-null value. This bitmap is used to track the presence or absence of null values in the column's data.
COMPRESSED_SIZE	UInt32	The size of the column data after compression.
UNCOMPRESSED_SIZE	UInt32	The size of the column data before applying encoding.
LEVEL_ONE	String	The primary or initial encoding applied to the column.
LEVEL_TWO	Nullable(String)	A secondary or recursive encoding method applied to the column after the initial encoding.

SQL Examples

-- Create a table with an integer column 'c' and apply 'Lz4' compression
CREATE TABLE t(c INT) STORAGE_FORMAT = 'native' COMPRESSION = 'lz4';

-- Insert data into the table.
INSERT INTO t SELECT number FROM numbers(2048);

-- Analyze the encoding for column 'c' in table 't'
SELECT LEVEL_ONE, LEVEL_TWO, COUNT(*) 
FROM FUSE_ENCODING('default', 't', 'c') 
GROUP BY LEVEL_ONE, LEVEL_TWO;

level_one   |level_two|count(*)|
------------+---------+--------+
DeltaBitpack|         |       1|

--  Insert 2,048 rows with the value 1 into the table 't'
INSERT INTO t (c)
SELECT 1
FROM numbers(2048);

SELECT LEVEL_ONE, LEVEL_TWO, COUNT(*) 
FROM FUSE_ENCODING('default', 't', 'c') 
GROUP BY LEVEL_ONE, LEVEL_TWO;

level_one   |level_two|count(*)|
------------+---------+--------+
OneValue    |         |       1|
DeltaBitpack|         |       1|

5 - FUSE_SEGMENT

Returns the segment information of a specified table snapshot. For more information about what is segment in PlaidCloud Lakehouse, see What are Snapshot, Segment, and Block?.

See Also:

SQL Syntax

FUSE_SEGMENT('<database_name>', '<table_name>','<snapshot_id>')

SQL Examples

CREATE TABLE mytable(c int);
INSERT INTO mytable values(1);
INSERT INTO mytable values(2); 

-- Obtain a snapshot ID
SELECT snapshot_id FROM FUSE_SNAPSHOT('default', 'mytable') limit 1;

---
+----------------------------------+
| snapshot_id                      |
+----------------------------------+
| 82c572947efa476892bd7c0635158ba2 |
+----------------------------------+

SELECT * FROM FUSE_SEGMENT('default', 'mytable', '82c572947efa476892bd7c0635158ba2');

---
+----------------------------------------------------+----------------+-------------+-----------+--------------------+------------------+
| file_location                                      | format_version | block_count | row_count | bytes_uncompressed | bytes_compressed |
+----------------------------------------------------+----------------+-------------+-----------+--------------------+------------------+
| 1/319/_sg/d35fe7bf99584301b22e8f6a8a9c97f9_v1.json |              1 |           1 |         1 |                  4 |              184 |
| 1/319/_sg/c261059d47c840e1b749222dabb4b2bb_v1.json |              1 |           1 |         1 |                  4 |              184 |
+----------------------------------------------------+----------------+-------------+-----------+--------------------+------------------+

6 - FUSE_SNAPSHOT

Returns the snapshot information of a table. For more information about what is snapshot in PlaidCloud Lakehouse, see What are Snapshot, Segment, and Block?.

See Also:

SQL Syntax

FUSE_SNAPSHOT('<database_name>', '<table_name>')

SQL Examples

CREATE TABLE mytable(a int, b int) CLUSTER BY(a+1);

INSERT INTO mytable VALUES(1,1),(3,3);
INSERT INTO mytable VALUES(2,2),(5,5);
INSERT INTO mytable VALUES(4,4);

SELECT * FROM FUSE_SNAPSHOT('default','mytable');

---
| snapshot_id                      | snapshot_location                                          | format_version | previous_snapshot_id             | segment_count | block_count | row_count | bytes_uncompressed | bytes_compressed | index_size | timestamp                  |
|----------------------------------|------------------------------------------------------------|----------------|----------------------------------|---------------|-------------|-----------|--------------------|------------------|------------|----------------------------|
| a13d211b7421432898a3786848b8ced3 | 670655/783287/_ss/a13d211b7421432898a3786848b8ced3_v1.json | 1              | \N                               | 1             | 1           | 2         | 16                 | 290              | 363        | 2022-09-19 14:51:52.860425 |
| cf08e6af6c134642aeb76bc81e6e7580 | 670655/783287/_ss/cf08e6af6c134642aeb76bc81e6e7580_v1.json | 1              | a13d211b7421432898a3786848b8ced3 | 2             | 2           | 4         | 32                 | 580              | 726        | 2022-09-19 14:52:15.282943 |
| 1bd4f68b831a402e8c42084476461aa1 | 670655/783287/_ss/1bd4f68b831a402e8c42084476461aa1_v1.json | 1              | cf08e6af6c134642aeb76bc81e6e7580 | 3             | 3           | 5         | 40                 | 862              | 1085       | 2022-09-19 14:52:20.284347 |

7 - FUSE_STATISTIC

Returns the estimated number of distinct values of each column in a table.

SQL Syntax

FUSE_STATISTIC('<database_name>', '<table_name>')

SQL Examples

You're most likely to use this function together with ANALYZE TABLE <table_name> to generate and check the statistical information of a table. For more explanations and examples, see OPTIMIZE TABLE.

8 - FUSE_TIME_TRAVEL_SIZE

Calculates the storage size of historical data (for Time Travel) for tables.

SQL Syntax

-- Calculate historical data size for all tables in all databases
SELECT ...
FROM fuse_time_travel_size();

-- Calculate historical data size for all tables in a specified database
SELECT ...
FROM fuse_time_travel_size('<database_name>');

-- Calculate historical data size for a specified table in a specified database
SELECT ...
FROM fuse_time_travel_size('<database_name>', '<table_name>'));

Output

The function returns a result set with the following columns:

Column	Description
`database_name`	The name of the database where the table is located.
`table_name`	The name of the table.
`is_dropped`	Indicates whether the table has been dropped (`true` for dropped tables, `false` otherwise).
`time_travel_size`	The total storage size of historical data (for Time Travel) for the table, in bytes.
`latest_snapshot_size`	The storage size of the latest snapshot of the table, in bytes.
`data_retention_period_in_hours`	The retention period for Time Travel data in hours (`NULL` means using the default retention policy).
`error`	Any error encountered while retrieving the storage size (`NULL` if no errors occurred).

SQL Examples

This example calculates the historical data for all tables in the default database:

SELECT * FROM fuse_time_travel_size('default')

┌───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ database_name │ table_name │ is_dropped │ time_travel_size │ latest_snapshot_size │ data_retention_period_in_hours │       error      │
├───────────────┼────────────┼────────────┼──────────────────┼──────────────────────┼────────────────────────────────┼──────────────────┤
│ default       │ books      │ true       │             2810 │                 1490 │                           NULL │ NULL             │
└───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘