This is the multi-page printable view of this section. Click here to print.

Return to the regular view of this page.

Table Functions

This section provides reference information for the table-related functions in PlaidCloud Lakehouse.

1 - GENERATE_SERIES

import FunctionDescription from '@site/src/components/FunctionDescription';

Generates a dataset starting from a specified point, ending at another specified point, and optionally with an incrementing value. The GENERATE_SERIES function works with the following data types:

  • Integer
  • Date
  • Timestamp

Analyze Syntax

func.generate_series(<start>, <stop>[, <step_interval>])

Analyze Examples

func.generate_series(1, 10, 2);

generate_series|
---------------+
              1|
              3|
              5|
              7|
              9|

SQL Syntax

GENERATE_SERIES(<start>, <stop>[, <step_interval>])

Arguments

ArgumentDescription
startThe starting value, representing the first number, date, or timestamp in the sequence.
stopThe ending value, representing the last number, date, or timestamp in the sequence.
step_intervalThe step interval, determining the difference between adjacent values in the sequence. For integer sequences, the default value is 1. For date sequences, the default step interval is 1 day. For timestamp sequences, the default step interval is 1 microsecond.

Return Type

Returns a list containing a continuous sequence of numeric values, dates, or timestamps from start to stop.

SQL Examples

SQL Examples 1: Generating Numeric, Date, and Timestamp Data

SELECT * FROM GENERATE_SERIES(1, 10, 2);

generate_series|
---------------+
              1|
              3|
              5|
              7|
              9|

SELECT * FROM GENERATE_SERIES('2023-03-20'::date, '2023-03-27'::date);

generate_series|
---------------+
     2023-03-20|
     2023-03-21|
     2023-03-22|
     2023-03-23|
     2023-03-24|
     2023-03-25|
     2023-03-26|
     2023-03-27|

SELECT * FROM GENERATE_SERIES('2023-03-26 00:00'::timestamp, '2023-03-27 12:00'::timestamp, 86400000000);

generate_series    |
-------------------+
2023-03-26 00:00:00|
2023-03-27 00:00:00|

SQL Examples 2: Filling Query Result Gaps

This example uses the GENERATE_SERIES function and left join operator to handle gaps in query results caused by missing information in specific ranges.

CREATE TABLE t_metrics (
  date Date,
  value INT
);

INSERT INTO t_metrics VALUES
  ('2020-01-01', 200),
  ('2020-01-01', 300),
  ('2020-01-04', 300),
  ('2020-01-04', 300),
  ('2020-01-05', 400),
  ('2020-01-10', 700);

SELECT date, SUM(value), COUNT() FROM t_metrics GROUP BY date ORDER BY date;

date      |sum(value)|count()|
----------+----------+-------+
2020-01-01|       500|      2|
2020-01-04|       600|      2|
2020-01-05|       400|      1|
2020-01-10|       700|      1|

To close the gaps between January 1st and January 10th, 2020, use the following query:

SELECT t.date, COALESCE(SUM(t_metrics.value), 0), COUNT(t_metrics.value)
FROM generate_series(
  '2020-01-01'::Date,
  '2020-01-10'::Date
) AS t(date)
LEFT JOIN t_metrics ON t_metrics.date = t.date
GROUP BY t.date ORDER BY t.date;

date      |coalesce(sum(t_metrics.value), 0)|count(t_metrics.value)|
----------+---------------------------------+----------------------+
2020-01-01|                              500|                     2|
2020-01-02|                                0|                     0|
2020-01-03|                                0|                     0|
2020-01-04|                              600|                     2|
2020-01-05|                              400|                     1|
2020-01-06|                                0|                     0|
2020-01-07|                                0|                     0|
2020-01-08|                                0|                     0|
2020-01-09|                                0|                     0|
2020-01-10|                              700|                     1|

2 - INFER_SCHEMA

Automatically detects the file metadata schema and retrieves the column definitions.

SQL Syntax

INFER_SCHEMA(
  LOCATION => '{ internalStage | externalStage }'
  [ PATTERN => '<regex_pattern>']
)

Where:

internalStage

internalStage ::= @<internal_stage_name>[/<path>]

externalStage

externalStage ::= @<external_stage_name>[/<path>]

PATTERN = 'regex_pattern'

A PCRE2-based regular expression pattern string, enclosed in single quotes, specifying the file names to match. Click here to see an example. For PCRE2 syntax, see http://www.pcre.org/current/doc/html/pcre2syntax.html.

SQL Examples

Generate a parquet file in a stage:

CREATE STAGE infer_parquet FILE_FORMAT = (TYPE = PARQUET);
COPY INTO @infer_parquet FROM (SELECT * FROM numbers(10)) FILE_FORMAT = (TYPE = PARQUET);
LIST @infer_parquet;
+-------------------------------------------------------+------+------------------------------------+-------------------------------+---------+
| name                                                  | size | md5                                | last_modified                 | creator |
+-------------------------------------------------------+------+------------------------------------+-------------------------------+---------+
| data_e0fd9cba-f45c-4c43-aa07-d6d87d134378_0_0.parquet |  258 | "7DCC9FFE04EA1F6882AED2CF9640D3D4" | 2023-02-09 05:21:52.000 +0000 | NULL    |
+-------------------------------------------------------+------+------------------------------------+-------------------------------+---------+

infer_schema

SELECT * FROM INFER_SCHEMA(location => '@infer_parquet/data_e0fd9cba-f45c-4c43-aa07-d6d87d134378_0_0.parquet');
+-------------+-----------------+----------+----------+
| column_name | type            | nullable | order_id |
+-------------+-----------------+----------+----------+
| number      | BIGINT UNSIGNED |        0 |        0 |
+-------------+-----------------+----------+----------+

infer_schema with Pattern Matching

SELECT * FROM infer_schema(location => '@infer_parquet/', pattern => '.*parquet');
+-------------+-----------------+----------+----------+
| column_name | type            | nullable | order_id |
+-------------+-----------------+----------+----------+
| number      | BIGINT UNSIGNED |        0 |        0 |
+-------------+-----------------+----------+----------+

Create a Table From Parquet File

The infer_schema can only display the schema of a parquet file and cannot create a table from it.

To create a table from a parquet file:

CREATE TABLE mytable AS SELECT * FROM @infer_parquet/ (pattern=>'.*parquet') LIMIT 0;

DESC mytable;
+--------+-----------------+------+---------+-------+
| Field  | Type            | Null | Default | Extra |
+--------+-----------------+------+---------+-------+
| number | BIGINT UNSIGNED | NO   | 0       |       |
+--------+-----------------+------+---------+-------+

3 - INSPECT_PARQUET

import FunctionDescription from '@site/src/components/FunctionDescription';

Retrieves a table of comprehensive metadata from a staged Parquet file, including the following columns:

ColumnDescription
created_byThe entity or source responsible for creating the Parquet file
num_columnsThe number of columns in the Parquet file
num_rowsThe total number of rows or records in the Parquet file
num_row_groupsThe count of row groups within the Parquet file
serialized_sizeThe size of the Parquet file on disk (compressed)
max_row_groups_size_compressedThe size of the largest row group (compressed)
max_row_groups_size_uncompressedThe size of the largest row group (uncompressed)

SQL Syntax

INSPECT_PARQUET('@<path-to-file>')

SQL Examples

This example retrieves the metadata from a staged sample Parquet file named books.parquet. The file contains two records:

Transaction Processing,Jim Gray,1992
Readings in Database Systems,Michael Stonebraker,2004
-- Show the staged file
LIST @my_internal_stage;

┌──────────────────────────────────────────────────────────────────────────────────────────────┐
      name       size          md5                last_modified               creator     
├───────────────┼────────┼──────────────────┼───────────────────────────────┼──────────────────┤
 books.parquet     998  NULL              2023-04-19 19:34:51.303 +0000  NULL             
└──────────────────────────────────────────────────────────────────────────────────────────────┘

-- Retrieve metadata from the staged file
SELECT * FROM INSPECT_PARQUET('@my_internal_stage/books.parquet');

┌────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
             created_by              num_columns  num_rows  num_row_groups  serialized_size  max_row_groups_size_compressed  max_row_groups_size_uncompressed 
├────────────────────────────────────┼─────────────┼──────────┼────────────────┼─────────────────┼────────────────────────────────┼──────────────────────────────────┤
 parquet-cpp version 1.5.1-SNAPSHOT            3         2               1              998                             332                               320 
└────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘

4 - LIST_STAGE

import FunctionDescription from '@site/src/components/FunctionDescription';

Lists files in a stage. This allows you to filter files in a stage based on their extensions and obtain comprehensive details about each file. The function is similar to the DDL command LIST STAGE FILES, but provides you the flexibility to retrieve specific file information with the SELECT statement, such as file name, size, MD5 hash, last modified timestamp, and creator, rather than all file information.

SQL Syntax

LIST_STAGE(
  LOCATION => '{ internalStage | externalStage | userStage }'
  [ PATTERN => '<regex_pattern>']
)

Where:

internalStage

internalStage ::= @<internal_stage_name>[/<path>]

externalStage

externalStage ::= @<external_stage_name>[/<path>]

userStage

userStage ::= @~[/<path>]

PATTERN

See COPY INTO table.

SQL Examples

SELECT * FROM list_stage(location => '@my_stage/', pattern => '.*[.]log');
+----------------+------+------------------------------------+-------------------------------+---------+
|      name      | size |                md5                 |         last_modified         | creator |
+----------------+------+------------------------------------+-------------------------------+---------+
| 2023/meta.log  |  475 | "4208ff530b252236e14b3cd797abdfbd" | 2023-04-19 20:23:24.000 +0000 | NULL    |
| 2023/query.log | 1348 | "1c6654b207472c277fc8c6207c035e18" | 2023-04-19 20:23:24.000 +0000 | NULL    |
+----------------+------+------------------------------------+-------------------------------+---------+

-- Equivalent to the following statement:
LIST @my_stage PATTERN = '.log';

5 - RESULT_SCAN

Returns the result set of a previous command in same session as if the result was a table.

SQL Syntax

RESULT_SCAN( { '<query_id>' | LAST_QUERY_ID() } )

SQL Examples

Create a simple table:

CREATE TABLE t1(a int);

Insert some values;

INSERT INTO t1(a) VALUES (1), (2), (3);

result_scan

SELECT * FROM t1 ORDER BY a;
+-------+
|   a   |
+-------+
|   1   |
+-------+
|   2   |
+-------+
|   3   |
+-------+
SELECT * FROM RESULT_SCAN(LAST_QUERY_ID()) ORDER BY a;
+-------+
|   a   |
+-------+
|   1   |
+-------+
|   2   |
+-------+
|   3   |
+-------+