This section provides reference information for the table-related functions in PlaidCloud Lakehouse.
1 - GENERATE_SERIES
Generates a set of values, starting from a specified point, ending at another specified point, and optionally incrementing by a step interval. The GENERATE_SERIES function works with the following data types:
- Integer
- Date
- Timestamp
Analyze Syntax
func.generate_series(<start>, <stop>[, <step_interval>])
Analyze Examples
func.generate_series(1, 10, 2);
generate_series|
---------------+
1|
3|
5|
7|
9|
SQL Syntax
GENERATE_SERIES(<start>, <stop>[, <step_interval>])
Arguments
| Argument | Description |
|---|---|
| start | The starting value, representing the first number, date, or timestamp in the sequence. |
| stop | The ending value, representing the last number, date, or timestamp in the sequence. |
| step_interval | The step interval, determining the difference between adjacent values in the sequence. For integer sequences, the default is 1; for date sequences, 1 day; for timestamp sequences, 1 microsecond. |
Return Type
Returns a set of rows containing a continuous sequence of numeric values, dates, or timestamps from start to stop.
SQL Examples
SQL Examples 1: Generating Numeric, Date, and Timestamp Data
SELECT * FROM GENERATE_SERIES(1, 10, 2);
generate_series|
---------------+
1|
3|
5|
7|
9|
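A descending sequence can be generated by passing a negative step interval when start is greater than stop (a sketch; behavior assumes negative steps are supported, as in Databend-style engines):

```sql
-- Step down from 10 to 1 in decrements of 3
SELECT * FROM GENERATE_SERIES(10, 1, -3);
```

This should yield the values 10, 7, 4, and 1.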
SELECT * FROM GENERATE_SERIES('2023-03-20'::date, '2023-03-27'::date);
generate_series|
---------------+
2023-03-20|
2023-03-21|
2023-03-22|
2023-03-23|
2023-03-24|
2023-03-25|
2023-03-26|
2023-03-27|
SELECT * FROM GENERATE_SERIES('2023-03-26 00:00'::timestamp, '2023-03-27 12:00'::timestamp, 86400000000);
generate_series |
-------------------+
2023-03-26 00:00:00|
2023-03-27 00:00:00|
SQL Examples 2: Filling Query Result Gaps
This example uses the GENERATE_SERIES function with a LEFT JOIN to fill gaps in query results caused by missing data in specific ranges.
CREATE TABLE t_metrics (
date Date,
value INT
);
INSERT INTO t_metrics VALUES
('2020-01-01', 200),
('2020-01-01', 300),
('2020-01-04', 300),
('2020-01-04', 300),
('2020-01-05', 400),
('2020-01-10', 700);
SELECT date, SUM(value), COUNT() FROM t_metrics GROUP BY date ORDER BY date;
date |sum(value)|count()|
----------+----------+-------+
2020-01-01| 500| 2|
2020-01-04| 600| 2|
2020-01-05| 400| 1|
2020-01-10| 700| 1|
To close the gaps between January 1st and January 10th, 2020, use the following query:
SELECT t.date, COALESCE(SUM(t_metrics.value), 0), COUNT(t_metrics.value)
FROM generate_series(
'2020-01-01'::Date,
'2020-01-10'::Date
) AS t(date)
LEFT JOIN t_metrics ON t_metrics.date = t.date
GROUP BY t.date ORDER BY t.date;
date |coalesce(sum(t_metrics.value), 0)|count(t_metrics.value)|
----------+---------------------------------+----------------------+
2020-01-01| 500| 2|
2020-01-02| 0| 0|
2020-01-03| 0| 0|
2020-01-04| 600| 2|
2020-01-05| 400| 1|
2020-01-06| 0| 0|
2020-01-07| 0| 0|
2020-01-08| 0| 0|
2020-01-09| 0| 0|
2020-01-10| 700| 1|
2 - INFER_SCHEMA
Automatically detects the file metadata schema and retrieves the column definitions.
INFER_SCHEMA currently supports only the Parquet file format.
SQL Syntax
INFER_SCHEMA(
LOCATION => '{ internalStage | externalStage }'
[ PATTERN => '<regex_pattern>']
)
Where:
internalStage
internalStage ::= @<internal_stage_name>[/<path>]
externalStage
externalStage ::= @<external_stage_name>[/<path>]
PATTERN = 'regex_pattern'
A PCRE2-based regular expression pattern string, enclosed in single quotes, specifying the file names to match. For PCRE2 syntax, see http://www.pcre.org/current/doc/html/pcre2syntax.html.
SQL Examples
Generate a parquet file in a stage:
CREATE STAGE infer_parquet FILE_FORMAT = (TYPE = PARQUET);
COPY INTO @infer_parquet FROM (SELECT * FROM numbers(10)) FILE_FORMAT = (TYPE = PARQUET);
LIST @infer_parquet;
+-------------------------------------------------------+------+------------------------------------+-------------------------------+---------+
| name | size | md5 | last_modified | creator |
+-------------------------------------------------------+------+------------------------------------+-------------------------------+---------+
| data_e0fd9cba-f45c-4c43-aa07-d6d87d134378_0_0.parquet | 258 | "7DCC9FFE04EA1F6882AED2CF9640D3D4" | 2023-02-09 05:21:52.000 +0000 | NULL |
+-------------------------------------------------------+------+------------------------------------+-------------------------------+---------+
Run INFER_SCHEMA on the staged file:
SELECT * FROM INFER_SCHEMA(location => '@infer_parquet/data_e0fd9cba-f45c-4c43-aa07-d6d87d134378_0_0.parquet');
+-------------+-----------------+----------+----------+
| column_name | type | nullable | order_id |
+-------------+-----------------+----------+----------+
| number | BIGINT UNSIGNED | 0 | 0 |
+-------------+-----------------+----------+----------+
INFER_SCHEMA with Pattern Matching
SELECT * FROM infer_schema(location => '@infer_parquet/', pattern => '.*parquet');
+-------------+-----------------+----------+----------+
| column_name | type | nullable | order_id |
+-------------+-----------------+----------+----------+
| number | BIGINT UNSIGNED | 0 | 0 |
+-------------+-----------------+----------+----------+
Create a Table From Parquet File
The INFER_SCHEMA function can only display the schema of a Parquet file; it cannot create a table from it.
To create a table from a parquet file:
CREATE TABLE mytable AS SELECT * FROM @infer_parquet/ (pattern=>'.*parquet') LIMIT 0;
DESC mytable;
+--------+-----------------+------+---------+-------+
| Field | Type | Null | Default | Extra |
+--------+-----------------+------+---------+-------+
| number | BIGINT UNSIGNED | NO | 0 | |
+--------+-----------------+------+---------+-------+
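After the table is created, the staged data can be loaded into it with COPY INTO (a sketch reusing the stage and pattern from the example above; adjust to your environment):

```sql
-- Load all Parquet files from the stage into the new table
COPY INTO mytable
FROM @infer_parquet
PATTERN = '.*parquet'
FILE_FORMAT = (TYPE = PARQUET);
```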
3 - INSPECT_PARQUET
Retrieves a table of comprehensive metadata from a staged Parquet file, including the following columns:
| Column | Description |
|---|---|
| created_by | The entity or source responsible for creating the Parquet file |
| num_columns | The number of columns in the Parquet file |
| num_rows | The total number of rows or records in the Parquet file |
| num_row_groups | The count of row groups within the Parquet file |
| serialized_size | The size of the Parquet file on disk (compressed) |
| max_row_groups_size_compressed | The size of the largest row group (compressed) |
| max_row_groups_size_uncompressed | The size of the largest row group (uncompressed) |
SQL Syntax
INSPECT_PARQUET('@<path-to-file>')
SQL Examples
This example retrieves the metadata from a staged sample Parquet file named books.parquet. The file contains two records:
Transaction Processing,Jim Gray,1992
Readings in Database Systems,Michael Stonebraker,2004
-- Show the staged file
LIST @my_internal_stage;
┌──────────────────────────────────────────────────────────────────────────────────────────────┐
│ name │ size │ md5 │ last_modified │ creator │
├───────────────┼────────┼──────────────────┼───────────────────────────────┼──────────────────┤
│ books.parquet │ 998 │ NULL │ 2023-04-19 19:34:51.303 +0000 │ NULL │
└──────────────────────────────────────────────────────────────────────────────────────────────┘
-- Retrieve metadata from the staged file
SELECT * FROM INSPECT_PARQUET('@my_internal_stage/books.parquet');
┌────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ created_by │ num_columns │ num_rows │ num_row_groups │ serialized_size │ max_row_groups_size_compressed │ max_row_groups_size_uncompressed │
├────────────────────────────────────┼─────────────┼──────────┼────────────────┼─────────────────┼────────────────────────────────┼──────────────────────────────────┤
│ parquet-cpp version 1.5.1-SNAPSHOT │ 3 │ 2 │ 1 │ 998 │ 332 │ 320 │
└────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
4 - LIST_STAGE
Lists the files in a stage. This allows you to filter files in a stage by their extensions and obtain comprehensive details about each file. The function is similar to the DDL command LIST STAGE FILES, but gives you the flexibility to retrieve specific file information with a SELECT statement, such as file name, size, MD5 hash, last modified timestamp, and creator, rather than all file information.
SQL Syntax
LIST_STAGE(
LOCATION => '{ internalStage | externalStage | userStage }'
[ PATTERN => '<regex_pattern>']
)
Where:
internalStage
internalStage ::= @<internal_stage_name>[/<path>]
externalStage
externalStage ::= @<external_stage_name>[/<path>]
userStage
userStage ::= @~[/<path>]
PATTERN
See COPY INTO table.
SQL Examples
SELECT * FROM list_stage(location => '@my_stage/', pattern => '.*[.]log');
+----------------+------+------------------------------------+-------------------------------+---------+
| name | size | md5 | last_modified | creator |
+----------------+------+------------------------------------+-------------------------------+---------+
| 2023/meta.log | 475 | "4208ff530b252236e14b3cd797abdfbd" | 2023-04-19 20:23:24.000 +0000 | NULL |
| 2023/query.log | 1348 | "1c6654b207472c277fc8c6207c035e18" | 2023-04-19 20:23:24.000 +0000 | NULL |
+----------------+------+------------------------------------+-------------------------------+---------+
-- Equivalent to the following statement:
LIST @my_stage PATTERN = '.*[.]log';
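Because LIST_STAGE appears in a SELECT statement, its output can be filtered and projected like any other table, which the DDL command cannot do. A sketch, assuming the same @my_stage stage as above:

```sql
-- Return only the names and sizes of files larger than 1 KB
SELECT name, size
FROM list_stage(location => '@my_stage/')
WHERE size > 1000;
```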
5 - RESULT_SCAN
Returns the result set of a previous command in the same session as if the result were a table.
SQL Syntax
RESULT_SCAN( { '<query_id>' | LAST_QUERY_ID() } )
SQL Examples
Create a simple table:
CREATE TABLE t1(a int);
Insert some values:
INSERT INTO t1(a) VALUES (1), (2), (3);
Query the table, then retrieve the same result set with RESULT_SCAN:
SELECT * FROM t1 ORDER BY a;
+-------+
| a |
+-------+
| 1 |
+-------+
| 2 |
+-------+
| 3 |
+-------+
SELECT * FROM RESULT_SCAN(LAST_QUERY_ID()) ORDER BY a;
+-------+
| a |
+-------+
| 1 |
+-------+
| 2 |
+-------+
| 3 |
+-------+
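Since RESULT_SCAN exposes the cached result as a table, it can also feed further computation, such as an aggregate over the previous query's output (a sketch; RESULT_SCAN also accepts an explicit query ID string in place of LAST_QUERY_ID()):

```sql
-- Aggregate over the result set of the most recent query
SELECT SUM(a) AS total FROM RESULT_SCAN(LAST_QUERY_ID());
```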