Aggregate Functions
Aggregate functions are essential tools in SQL that allow you to perform calculations on a set of values and return a single result.
These functions help you extract and summarize data from databases to gain valuable insights.
Function Name | What It Does |
---|
ANY | Checks if any row meets the specified condition |
APPROX_COUNT_DISTINCT | Estimates the number of distinct values with HyperLogLog |
ARG_MAX | Finds the arg value for the maximum val value |
ARG_MIN | Finds the arg value for the minimum val value |
AVG_IF | Calculates the average for rows meeting a condition |
ARRAY_AGG | Converts all the values of a column to an Array |
AVG | Calculates the average value of a specific column |
COUNT_DISTINCT | Counts the number of distinct values in a column |
COUNT_IF | Counts rows meeting a specified condition |
COUNT | Counts the number of rows that meet certain criteria |
COVAR_POP | Returns the population covariance of a set of number pairs |
COVAR_SAMP | Returns the sample covariance of a set of number pairs |
GROUP_ARRAY_MOVING_AVG | Returns an array with elements calculates the moving average of input values |
GROUP_ARRAY_MOVING_SUM | Returns an array with elements calculates the moving sum of input values |
KURTOSIS | Calculates the excess kurtosis of a set of values |
MAX_IF | Finds the maximum value for rows meeting a condition |
MAX | Finds the largest value in a specific column |
MEDIAN | Calculates the median value of a specific column |
MEDIAN_TDIGEST | Calculates the median value of a specific column using t-digest algorithm |
MIN_IF | Finds the minimum value for rows meeting a condition |
MIN | Finds the smallest value in a specific column |
QUANTILE_CONT | Calculates the interpolated quantile for a specific column |
QUANTILE_DISC | Calculates the quantile for a specific column |
QUANTILE_TDIGEST | Calculates the quantile using t-digest algorithm |
QUANTILE_TDIGEST_WEIGHTED | Calculates the quantile with weighted using t-digest algorithm |
RETENTION | Calculates retention for a set of events |
SKEWNESS | Calculates the skewness of a set of values |
STDDEV_POP | Calculates the population standard deviation of a column |
STDDEV_SAMP | Calculates the sample standard deviation of a column |
STRING_AGG | Converts all the non-NULL values to String, separated by the delimiter |
SUM_IF | Adds up the values meeting a condition of a specific column |
SUM | Adds up the values of a specific column |
WINDOW_FUNNEL | Analyzes user behavior in a time-ordered sequence of events |
1 - ANY
Aggregate function.
The ANY() function selects the first encountered (non-NULL) value, unless all rows have NULL values in that column. The query can be executed in any order and even in a different order each time, so the result of this function is indeterminate. To get a determinate result, you can use the ‘min’ or ‘max’ function instead of ‘any’.
Analyze Syntax
Analyze Examples
func.any(table.product_name).alias('any_product_name')
| any_product_name |
|------------------|
| Laptop |
SQL Syntax
Arguments
Arguments | Description |
---|
<expr> | Any expression |
Return Type
The first encountered (non-NULL) value, in the type of the value. If all values are NULL, the return value is NULL.
SQL Examples
Create a Table and Insert Sample Data
CREATE TABLE product_data (
id INT,
product_name VARCHAR NULL,
price FLOAT NULL
);
INSERT INTO product_data (id, product_name, price)
VALUES (1, 'Laptop', 1000),
(2, NULL, 800),
(3, 'Keyboard', NULL),
(4, 'Mouse', 25),
(5, 'Monitor', 150);
Query Demo: Retrieve the First Encountered Non-NULL Product Name
SELECT ANY(product_name) AS any_product_name
FROM product_data;
Result
| any_product_name |
|------------------|
| Laptop |
2 - APPROX_COUNT_DISTINCT
Estimates the number of distinct values in a data set with the HyperLogLog algorithm.
The HyperLogLog algorithm provides an approximation of the number of unique elements using little memory and time. Consider using this function when dealing with large data sets where an estimated result can be accepted. In exchange for some accuracy, this is a fast and efficient method of returning distinct counts.
To get an accurate result, use COUNT_DISTINCT. See Examples for more explanations.
Analyze Syntax
func.approx_count_distinct(<expr>)
Analyze Examples
func.approx_count_distinct(table.user_id).alias('approx_distinct_user_count')
| approx_distinct_user_count |
|----------------------------|
| 4 |
SQL Syntax
APPROX_COUNT_DISTINCT(<expr>)
Return Type
Integer.
SQL Examples
Create a Table and Insert Sample Data
CREATE TABLE user_events (
id INT,
user_id INT,
event_name VARCHAR
);
INSERT INTO user_events (id, user_id, event_name)
VALUES (1, 1, 'Login'),
(2, 2, 'Login'),
(3, 3, 'Login'),
(4, 1, 'Logout'),
(5, 2, 'Logout'),
(6, 4, 'Login'),
(7, 1, 'Login');
Query Demo: Estimate the Number of Distinct User IDs
SELECT APPROX_COUNT_DISTINCT(user_id) AS approx_distinct_user_count
FROM user_events;
Result
| approx_distinct_user_count |
|----------------------------|
| 4 |
3 - ARG_MAX
Calculates the arg
value for a maximum val
value. If there are several values of arg
for maximum values of val
, returns the first of these values encountered.
Analyze Syntax
Analyze Examples
func.arg_max(table.product, table.price).alias('max_price_product')
| max_price_product |
| ----------------- |
| Product C |
SQL Syntax
Arguments
Return Type
arg
value that corresponds to maximum val
value.
matches arg
type.
SQL Examples
Creating a Table and Inserting Sample Data
Let's create a table named "sales" and insert some sample data:
CREATE TABLE sales (
id INTEGER,
product VARCHAR(50),
price FLOAT
);
INSERT INTO sales (id, product, price)
VALUES (1, 'Product A', 10.5),
(2, 'Product B', 20.75),
(3, 'Product C', 30.0),
(4, 'Product D', 15.25),
(5, 'Product E', 25.5);
Query: Using ARG_MAX() Function
Now, let's use the ARG_MAX() function to find the product that has the maximum price:
SELECT ARG_MAX(product, price) AS max_price_product
FROM sales;
The result should look like this:
| max_price_product |
| ----------------- |
| Product C |
4 - ARG_MIN
Calculates the arg
value for a minimum val
value. If there are several different values of arg
for minimum values of val
, returns the first of these values encountered.
Analyze Syntax
Analyze Examples
func.arg_min(table.name, table.score).alias('student_name')
| student_name |
|--------------|
| Charlie |
SQL Syntax
Arguments
Return Type
arg
value that corresponds to minimum val
value.
matches arg
type.
SQL Examples
Let's create a table students with columns id, name, and score, and insert some data:
CREATE TABLE students (
id INT,
name VARCHAR,
score INT
);
INSERT INTO students (id, name, score) VALUES
(1, 'Alice', 80),
(2, 'Bob', 75),
(3, 'Charlie', 90),
(4, 'Dave', 80);
Now, we can use ARG_MIN to find the name of the student with the lowest score:
SELECT ARG_MIN(name, score) AS student_name
FROM students;
Result:
| student_name |
|--------------|
| Charlie |
5 - ARRAY_AGG
The ARRAY_AGG function (also known by its alias LIST) transforms all the values, including NULL, of a specific column in a query result into an array.
Analyze Syntax
Analyze Examples
table.movie_title, func.array_agg(table.rating).alias('ratings')
| movie_title | ratings |
|-------------|------------|
| Inception | [5, 4, 5] |
SQL Syntax
ARRAY_AGG(<expr>)
LIST(<expr>)
Arguments
Arguments | Description |
---|
<expr> | Any expression |
Return Type
Returns an Array with elements that are of the same type as the original data.
SQL Examples
This example demonstrates how the ARRAY_AGG function can be used to aggregate and present data in a convenient array format:
-- Create a table and insert sample data
CREATE TABLE movie_ratings (
id INT,
movie_title VARCHAR,
user_id INT,
rating INT
);
INSERT INTO movie_ratings (id, movie_title, user_id, rating)
VALUES (1, 'Inception', 1, 5),
(2, 'Inception', 2, 4),
(3, 'Inception', 3, 5),
(4, 'Interstellar', 1, 4),
(5, 'Interstellar', 2, 3);
-- List all ratings for Inception in an array
SELECT movie_title, ARRAY_AGG(rating) AS ratings
FROM movie_ratings
WHERE movie_title = 'Inception'
GROUP BY movie_title;
| movie_title | ratings |
|-------------|------------|
| Inception | [5, 4, 5] |
6 - AVG
Aggregate function.
The AVG() function returns the average value of an expression.
Note: NULL values are not counted.
Analyze Syntax
Analyze Examples
func.avg(table.price).alias('avg_price')
| avg_price |
| --------- |
| 20.4 |
SQL Syntax
Arguments
Arguments | Description |
---|
<expr> | Any numerical expression |
Return Type
double
SQL Examples
Creating a Table and Inserting Sample Data
Let's create a table named "sales" and insert some sample data:
CREATE TABLE sales (
id INTEGER,
product VARCHAR(50),
price FLOAT
);
INSERT INTO sales (id, product, price)
VALUES (1, 'Product A', 10.5),
(2, 'Product B', 20.75),
(3, 'Product C', 30.0),
(4, 'Product D', 15.25),
(5, 'Product E', 25.5);
Query: Using AVG() Function
Now, let's use the AVG() function to find the average price of all products in the "sales" table:
SELECT AVG(price) AS avg_price
FROM sales;
The result should look like this:
| avg_price |
| --------- |
| 20.4 |
7 - AVG_IF
The suffix -If can be appended to the name of any aggregate function. In this case, the aggregate function accepts an extra argument – a condition.
Analyze Syntax
func.avg_if(<column>, <cond>)
Analyze Examples
func.avg_if(table.salary, table.department=='IT').alias('avg_salary_it')
| avg_salary_it |
|-----------------|
| 65000.0 |
SQL Syntax
SQL Examples
Create a Table and Insert Sample Data
CREATE TABLE employees (
id INT,
salary INT,
department VARCHAR
);
INSERT INTO employees (id, salary, department)
VALUES (1, 50000, 'HR'),
(2, 60000, 'IT'),
(3, 55000, 'HR'),
(4, 70000, 'IT'),
(5, 65000, 'IT');
Query Demo: Calculate Average Salary for IT Department
SELECT AVG_IF(salary, department = 'IT') AS avg_salary_it
FROM employees;
Result
| avg_salary_it |
|-----------------|
| 65000.0 |
8 - COUNT
The COUNT() function returns the number of records returned by a SELECT query.
Caution: NULL values are not counted.
Analyze Syntax
Analyze Examples
func.count(table.grade).alias('count_valid_grades')
| count_valid_grades |
|--------------------|
| 4 |
SQL Syntax
Arguments
Arguments | Description |
---|
<expr> | Any expression. This may be a column name, the result of another function, or a math operation.
* is also allowed, to indicate pure row counting. |
Return Type
An integer.
SQL Examples
Create a Table and Insert Sample Data
CREATE TABLE students (
id INT,
name VARCHAR,
age INT,
grade FLOAT NULL
);
INSERT INTO students (id, name, age, grade)
VALUES (1, 'John', 21, 85),
(2, 'Emma', 22, NULL),
(3, 'Alice', 23, 90),
(4, 'Michael', 21, 88),
(5, 'Sophie', 22, 92);
Query Demo: Count Students with Valid Grades
SELECT COUNT(grade) AS count_valid_grades
FROM students;
Result
| count_valid_grades |
|--------------------|
| 4 |
9 - COUNT_DISTINCT
Aggregate function.
The count(distinct ...) function calculates the unique value of a set of values.
To obtain an estimated result from large data sets with little memory and time, consider using APPROX_COUNT_DISTINCT.
Caution: NULL values are not counted.
Analyze Syntax
func.count_distinct(<column>)
Analyze Examples
func.count_distinct(table.category).alias('unique_categories')
| unique_categories |
|-------------------|
| 2 |
SQL Syntax
COUNT(distinct <expr> ...)
UNIQ(<expr>)
Arguments
Arguments | Description |
---|
<expr> | Any expression, size of the arguments is [1, 32] |
Return Type
UInt64
SQL Examples
Create a Table and Insert Sample Data
CREATE TABLE products (
id INT,
name VARCHAR,
category VARCHAR,
price FLOAT
);
INSERT INTO products (id, name, category, price)
VALUES (1, 'Laptop', 'Electronics', 1000),
(2, 'Smartphone', 'Electronics', 800),
(3, 'Tablet', 'Electronics', 600),
(4, 'Chair', 'Furniture', 150),
(5, 'Table', 'Furniture', 300);
Query Demo: Count Distinct Categories
SELECT COUNT(DISTINCT category) AS unique_categories
FROM products;
Result
| unique_categories |
|-------------------|
| 2 |
10 - COUNT_IF
The suffix _IF
can be appended to the name of any aggregate function. In this case, the aggregate function accepts an extra argument – a condition.
Analyze Syntax
func.count_if(<column>, <cond>)
Analyze Examples
func.count_if(table.status, table.status=='Completed').alias('completed_orders')
| completed_orders |
|------------------|
| 3 |
SQL Example
COUNT_IF(<column>, <cond>)
SQL Examples
Create a Table and Insert Sample Data
CREATE TABLE orders (
id INT,
customer_id INT,
status VARCHAR,
total FLOAT
);
INSERT INTO orders (id, customer_id, status, total)
VALUES (1, 1, 'completed', 100),
(2, 2, 'completed', 200),
(3, 1, 'pending', 150),
(4, 3, 'completed', 250),
(5, 2, 'pending', 300);
Query Demo: Count Completed Orders
SELECT COUNT_IF(status, status = 'completed') AS completed_orders
FROM orders;
Result
| completed_orders |
|------------------|
| 3 |
11 - COVAR_POP
COVAR_POP returns the population covariance of a set of number pairs.
Analyze Syntax
func.covar_pop(<expr1>, <expr2>)
Analyze Examples
func.covar_pop(table.units_sold, table.revenue).alias('covar_pop_units_revenue')
| covar_pop_units_revenue |
|-------------------------|
| 20000.0 |
SQL Syntax
COVAR_POP(<expr1>, <expr2>)
Arguments
Arguments | Description |
---|
<expr1> | Any numerical expression |
<expr2> | Any numerical expression |
Return Type
float64
SQL Examples
Create a Table and Insert Sample Data
CREATE TABLE product_sales (
id INT,
product_id INT,
units_sold INT,
revenue FLOAT
);
INSERT INTO product_sales (id, product_id, units_sold, revenue)
VALUES (1, 1, 10, 1000),
(2, 2, 20, 2000),
(3, 3, 30, 3000),
(4, 4, 40, 4000),
(5, 5, 50, 5000);
Query Demo: Calculate Population Covariance between Units Sold and Revenue
SELECT COVAR_POP(units_sold, revenue) AS covar_pop_units_revenue
FROM product_sales;
Result
| covar_pop_units_revenue |
|-------------------------|
| 20000.0 |
12 - COVAR_SAMP
Aggregate function.
The covar_samp() function returns the sample covariance (Σ((x - x̅)(y - y̅)) / (n - 1)) of two data columns.
Caution: NULL values are not counted.
Analyze Syntax
func.covar_samp(<expr1>, <expr2>)
Analyze Examples
func.covar_samp(table.items_sold, table.profit).alias('covar_samp_items_profit')
| covar_samp_items_profit |
|-------------------------|
| 250000.0 |
SQL Syntax
COVAR_SAMP(<expr1>, <expr2>)
Arguments
Arguments | Description |
---|
<expr1> | Any numerical expression |
<expr2> | Any numerical expression |
Return Type
float64, when n <= 1, returns +∞.
SQL Examples
Create a Table and Insert Sample Data
CREATE TABLE store_sales (
id INT,
store_id INT,
items_sold INT,
profit FLOAT
);
INSERT INTO store_sales (id, store_id, items_sold, profit)
VALUES (1, 1, 100, 1000),
(2, 2, 200, 2000),
(3, 3, 300, 3000),
(4, 4, 400, 4000),
(5, 5, 500, 5000);
Query Demo: Calculate Sample Covariance between Items Sold and Profit
SELECT COVAR_SAMP(items_sold, profit) AS covar_samp_items_profit
FROM store_sales;
Result
| covar_samp_items_profit |
|-------------------------|
| 250000.0 |
13 - GROUP_ARRAY_MOVING_AVG
The GROUP_ARRAY_MOVING_AVG function calculates the moving average of input values. The function can take the window size as a parameter. If left unspecified, the function takes the window size equal to the number of input values.
Analyze Syntax
func.group_array_moving_avg(<expr1>)
Analyze Examples
table.user_id, func.group_array_moving_avg(table.request_num).alias('avg_request_num')
| user_id | avg_request_num |
|---------|------------------|
| 1 | [5.0,11.5,21.5] |
| 3 | [10.0,22.5,35.0] |
| 2 | [7.5,18.0,31.0] |
SQL Syntax
GROUP_ARRAY_MOVING_AVG(<expr>)
GROUP_ARRAY_MOVING_AVG(<window_size>)(<expr>)
Arguments
Arguments | Description |
---|
<window_size> | Any numerical expression |
<expr> | Any numerical expression |
Return Type
Returns an Array with elements of double or decimal depending on the source data type.
SQL Examples
-- Create a table and insert sample data
CREATE TABLE hits (
user_id INT,
request_num INT
);
INSERT INTO hits (user_id, request_num)
VALUES (1, 10),
(2, 15),
(3, 20),
(1, 13),
(2, 21),
(3, 25),
(1, 30),
(2, 41),
(3, 45);
SELECT user_id, GROUP_ARRAY_MOVING_AVG(2)(request_num) AS avg_request_num
FROM hits
GROUP BY user_id;
| user_id | avg_request_num |
|---------|------------------|
| 1 | [5.0,11.5,21.5] |
| 3 | [10.0,22.5,35.0] |
| 2 | [7.5,18.0,31.0] |
14 - GROUP_ARRAY_MOVING_SUM
The GROUP_ARRAY_MOVING_SUM function calculates the moving sum of input values. The function can take the window size as a parameter. If left unspecified, the function takes the window size equal to the number of input values.
Analyze Syntax
func.group_array_moving_sum(<expr>)
Analyze Examples
table.user_id, func.group_array_moving_sum(table.request_num)
| user_id | request_num |
|---------|-------------|
| 1 | [10,23,43] |
| 2 | [20,45,70] |
| 3 | [15,36,62] |
SQL Syntax
GROUP_ARRAY_MOVING_SUM(<expr>)
GROUP_ARRAY_MOVING_SUM(<window_size>)(<expr>)
Arguments
Arguments | Description |
---|
<window_size> | Any numerical expression |
<expr> | Any numerical expression |
Return Type
Returns an Array with elements that are of the same type as the original data.
SQL Examples
-- Create a table and insert sample data
CREATE TABLE hits (
user_id INT,
request_num INT
);
INSERT INTO hits (user_id, request_num)
VALUES (1, 10),
(2, 15),
(3, 20),
(1, 13),
(2, 21),
(3, 25),
(1, 30),
(2, 41),
(3, 45);
SELECT user_id, GROUP_ARRAY_MOVING_SUM(2)(request_num) AS request_num
FROM hits
GROUP BY user_id;
| user_id | request_num |
|---------|-------------|
| 1 | [10,23,43] |
| 2 | [20,45,70] |
| 3 | [15,36,62] |
15 - KURTOSIS
Aggregate function.
The KURTOSIS()
function returns the excess kurtosis of all input values.
Analyze Syntax
Analyze Examples
func.kurtosis(table.price).alias('excess_kurtosis')
| excess_kurtosis |
|-------------------------|
| 0.06818181325581445 |
SQL Syntax
Arguments
Arguments | Description |
---|
<expr> | Any numerical expression |
Return Type
Nullable Float64.
SQL Examples
Create a Table and Insert Sample Data
CREATE TABLE stock_prices (
id INT,
stock_symbol VARCHAR,
price FLOAT
);
INSERT INTO stock_prices (id, stock_symbol, price)
VALUES (1, 'AAPL', 150),
(2, 'AAPL', 152),
(3, 'AAPL', 148),
(4, 'AAPL', 160),
(5, 'AAPL', 155);
Query Demo: Calculate Excess Kurtosis for Apple Stock Prices
SELECT KURTOSIS(price) AS excess_kurtosis
FROM stock_prices
WHERE stock_symbol = 'AAPL';
Result
| excess_kurtosis |
|-------------------------|
| 0.06818181325581445 |
16 - MAX
Aggregate function.
The MAX() function returns the maximum value in a set of values.
Analyze Syntax
Analyze Examples
table.city, func.max(table.temperature).alias('max_temperature')
| city | max_temperature |
|------------|-----------------|
| New York | 32 |
SQL Syntax
MAX(<expr>)
Arguments
Arguments | Description |
---|
<expr> | Any expression |
Return Type
The maximum value, in the type of the value.
SQL Examples
Create a Table and Insert Sample Data
CREATE TABLE temperatures (
id INT,
city VARCHAR,
temperature FLOAT
);
INSERT INTO temperatures (id, city, temperature)
VALUES (1, 'New York', 30),
(2, 'New York', 28),
(3, 'New York', 32),
(4, 'Los Angeles', 25),
(5, 'Los Angeles', 27);
Query Demo: Find Maximum Temperature for New York City
SELECT city, MAX(temperature) AS max_temperature
FROM temperatures
WHERE city = 'New York'
GROUP BY city;
Result
| city | max_temperature |
|------------|-----------------|
| New York | 32 |
17 - MAX_IF
The suffix _IF
can be appended to the name of any aggregate function. In this case, the aggregate function accepts an extra argument – a condition.
Analyze Syntax
func.max_if(<column>, <cond>)
Analyze Examples
func.max_if(table.revenue, table.salesperson_id==1).alias('max_revenue_salesperson_1')
| max_revenue_salesperson_1 |
|---------------------------|
| 3000 |
SQL Example
SQL Examples
Create a Table and Insert Sample Data
CREATE TABLE sales (
id INT,
salesperson_id INT,
product_id INT,
revenue FLOAT
);
INSERT INTO sales (id, salesperson_id, product_id, revenue)
VALUES (1, 1, 1, 1000),
(2, 1, 2, 2000),
(3, 1, 3, 3000),
(4, 2, 1, 1500),
(5, 2, 2, 2500);
Query Demo: Find Maximum Revenue for Salesperson with ID 1
SELECT MAX_IF(revenue, salesperson_id = 1) AS max_revenue_salesperson_1
FROM sales;
Result
| max_revenue_salesperson_1 |
|---------------------------|
| 3000 |
18 - MEDIAN
Aggregate function.
The MEDIAN() function computes the median of a numeric data sequence.
Caution: NULL values are not counted.
Analyze Syntax
Analyze Examples
func.median(table.score).alias('median_score')
| median_score |
|----------------|
| 85.0 |
SQL Syntax
Arguments
Arguments | Description |
---|
<expr> | Any numerical expression |
Return Type
the type of the value.
SQL Examples
Create a Table and Insert Sample Data
CREATE TABLE exam_scores (
id INT,
student_id INT,
score INT
);
INSERT INTO exam_scores (id, student_id, score)
VALUES (1, 1, 80),
(2, 2, 90),
(3, 3, 75),
(4, 4, 95),
(5, 5, 85);
Query Demo: Calculate Median Exam Score
SELECT MEDIAN(score) AS median_score
FROM exam_scores;
Result
| median_score |
|----------------|
| 85.0 |
19 - MEDIAN_TDIGEST
Computes the median of a numeric data sequence using the t-digest algorithm.
Caution: NULL values are not included in the calculation.
Analyze Syntax
func.median_tdigest(<expr>)
Analyze Examples
func.median_tdigest(table.score).alias('median_score')
| median_score |
|----------------|
| 85.0 |
SQL Syntax
Arguments
Arguments | Description |
---|
<expr> | Any numerical expression |
Return Type
Returns a value of the same data type as the input values.
SQL Examples
-- Create a table and insert sample data
CREATE TABLE exam_scores (
id INT,
student_id INT,
score INT
);
INSERT INTO exam_scores (id, student_id, score)
VALUES (1, 1, 80),
(2, 2, 90),
(3, 3, 75),
(4, 4, 95),
(5, 5, 85);
-- Calculate median exam score
SELECT MEDIAN_TDIGEST(score) AS median_score
FROM exam_scores;
| median_score |
|----------------|
| 85.0 |
20 - MIN
Aggregate function.
The MIN() function returns the minimum value in a set of values.
Analyze Syntax
Analyze Examples
table.station_id, func.min(table.price).alias('min_price')
| station_id | min_price |
|------------|-----------|
| 1 | 3.45 |
SQL Syntax
MIN(<expr>)
Arguments
Arguments | Description |
---|
<expr> | Any expression |
Return Type
The minimum value, in the type of the value.
SQL Examples
title: MIN
Aggregate function.
The MIN() function returns the minimum value in a set of values.
SQL Syntax
MIN(expression)
Arguments
Arguments | Description |
---|
expression | Any expression |
Return Type
The minimum value, in the type of the value.
SQL Examples
Create a Table and Insert Sample Data
CREATE TABLE gas_prices (
id INT,
station_id INT,
price FLOAT
);
INSERT INTO gas_prices (id, station_id, price)
VALUES (1, 1, 3.50),
(2, 1, 3.45),
(3, 1, 3.55),
(4, 2, 3.40),
(5, 2, 3.35);
Query Demo: Find Minimum Gas Price for Station 1
SELECT station_id, MIN(price) AS min_price
FROM gas_prices
WHERE station_id = 1
GROUP BY station_id;
Result
| station_id | min_price |
|------------|-----------|
| 1 | 3.45 |
21 - MIN_IF
The suffix _IF
can be appended to the name of any aggregate function. In this case, the aggregate function accepts an extra argument – a condition.
Analyze Syntax
func.min_if(<column>, <cond>)
Analyze Examples
func.min_if(table.budget, table.departing=='IT').alias('min_it_budget')
| min_it_budget |
|---------------|
| 2000 |
SQL Syntax
MIN_IF(<column>, <cond>)
SQL Examples
Create a Table and Insert Sample Data
CREATE TABLE project_budgets (
id INT,
project_id INT,
department VARCHAR,
budget FLOAT
);
INSERT INTO project_budgets (id, project_id, department, budget)
VALUES (1, 1, 'HR', 1000),
(2, 1, 'IT', 2000),
(3, 1, 'Marketing', 3000),
(4, 2, 'HR', 1500),
(5, 2, 'IT', 2500);
Query Demo: Find Minimum Budget for IT Department
SELECT MIN_IF(budget, department = 'IT') AS min_it_budget
FROM project_budgets;
Result
| min_it_budget |
|---------------|
| 2000 |
22 - QUANTILE_CONT
Aggregate function.
The QUANTILE_CONT() function computes the interpolated quantile number of a numeric data sequence.
Caution: NULL values are not counted.
Analyze Syntax
func.quantile_cont(<levels>, <expr>)
Analyze Examples
func.quantile_cont(0.5, table.sales_amount).alias('median_sales_amount')
| median_sales_amount |
|-----------------------|
| 6000.0 |
SQL Syntax
QUANTILE_CONT(<levels>)(<expr>)
QUANTILE_CONT(level1, level2, ...)(<expr>)
Arguments
Arguments | Description |
---|
<level(s) | level(s) of quantile. Each level is constant floating-point number from 0 to 1. We recommend using a level value in the range of [0.01, 0.99] |
<expr> | Any numerical expression |
Return Type
Float64 or float64 array based on level number.
SQL Examples
Create a Table and Insert Sample Data
CREATE TABLE sales_data (
id INT,
sales_person_id INT,
sales_amount FLOAT
);
INSERT INTO sales_data (id, sales_person_id, sales_amount)
VALUES (1, 1, 5000),
(2, 2, 5500),
(3, 3, 6000),
(4, 4, 6500),
(5, 5, 7000);
Query Demo: Calculate 50th Percentile (Median) of Sales Amount using Interpolation
SELECT QUANTILE_CONT(0.5)(sales_amount) AS median_sales_amount
FROM sales_data;
Result
| median_sales_amount |
|-----------------------|
| 6000.0 |
23 - QUANTILE_DISC
Aggregate function.
The QUANTILE_DISC()
function computes the exact quantile number of a numeric data sequence.
The QUANTILE
alias to QUANTILE_DISC
Caution: NULL values are not counted.
Analyze Syntax
func.quantile_disc(<levels>, <expr>)
Analyze Examples
func.quantile_disc([0.25, 0.75], table.salary).alias('salary_quantiles')
| salary_quantiles |
|---------------------|
| [55000.0, 65000.0] |
SQL Syntax
QUANTILE_DISC(<levels>)(<expr>)
QUANTILE_DISC(level1, level2, ...)(<expr>)
Arguments
Arguments | Description |
---|
level(s) | level(s) of quantile. Each level is constant floating-point number from 0 to 1. We recommend using a level value in the range of [0.01, 0.99] |
<expr> | Any numerical expression |
Return Type
InputType or array of InputType based on level number.
SQL Examples
Create a Table and Insert Sample Data
CREATE TABLE salary_data (
id INT,
employee_id INT,
salary FLOAT
);
INSERT INTO salary_data (id, employee_id, salary)
VALUES (1, 1, 50000),
(2, 2, 55000),
(3, 3, 60000),
(4, 4, 65000),
(5, 5, 70000);
Query Demo: Calculate 25th and 75th Percentile of Salaries
SELECT QUANTILE_DISC(0.25, 0.75)(salary) AS salary_quantiles
FROM salary_data;
Result
| salary_quantiles |
|---------------------|
| [55000.0, 65000.0] |
24 - QUANTILE_TDIGEST
import FunctionDescription from '@site/src/components/FunctionDescription';
Computes an approximate quantile of a numeric data sequence using the t-digest algorithm.
Caution: NULL values are not included in the calculation.
Analyze Syntax
func.quantile_tdigest(<levels>, <expr>)
Analyze Examples
func.quantile_tdigest([0.5, 0.8], table.sales_amount).alias('sales_amounts')
| sales_amounts |
|-----------------------+
| [6000.0,7000.0] |
SQL Syntax
QUANTILE_TDIGEST(<level1>[, <level2>, ...])(<expr>)
Arguments
Arguments | Description |
---|
<level n> | A level of quantile represents a constant floating-point number ranging from 0 to 1. It is recommended to use a level value in the range of [0.01, 0.99]. |
<expr> | Any numerical expression |
Return Type
Returns either a Float64 value or an array of Float64 values, depending on the number of quantile levels specified.
SQL Examples
-- Create a table and insert sample data
CREATE TABLE sales_data (
id INT,
sales_person_id INT,
sales_amount FLOAT
);
INSERT INTO sales_data (id, sales_person_id, sales_amount)
VALUES (1, 1, 5000),
(2, 2, 5500),
(3, 3, 6000),
(4, 4, 6500),
(5, 5, 7000);
SELECT QUANTILE_TDIGEST(0.5)(sales_amount) AS median_sales_amount
FROM sales_data;
median_sales_amount|
-------------------+
6000.0|
SELECT QUANTILE_TDIGEST(0.5, 0.8)(sales_amount)
FROM sales_data;
quantile_tdigest(0.5, 0.8)(sales_amount)|
----------------------------------------+
[6000.0,7000.0] |
25 - QUANTILE_TDIGEST_WEIGHTED
import FunctionDescription from '@site/src/components/FunctionDescription';
Computes an approximate quantile of a numeric data sequence using the t-digest algorithm.
This function takes into account the weight of each sequence member. Memory consumption is log(n), where n is a number of values.
Caution: NULL values are not included in the calculation.
Analyze Syntax
func.quantile_tdigest_weighted(<levels>, <expr>, <weight_expr>)
Analyze Examples
func.quantile_tdigest_weighted([0.5, 0.8], table.sales_amount, 1).alias('sales_amounts')
| sales_amounts |
|-----------------------+
| [6000.0,7000.0] |
SQL Syntax
QUANTILE_TDIGEST_WEIGHTED(<level1>[, <level2>, ...])(<expr>, <weight_expr>)
Arguments
Arguments | Description |
---|
<level n> | A level of quantile represents a constant floating-point number ranging from 0 to 1. It is recommended to use a level value in the range of [0.01, 0.99]. |
<expr> | Any numerical expression |
<weight_expr> | Any unsigned integer expression. Weight is a number of value occurrences. |
Return Type
Returns either a Float64 value or an array of Float64 values, depending on the number of quantile levels specified.
SQL Examples
-- Create a table and insert sample data
CREATE TABLE sales_data (
id INT,
sales_person_id INT,
sales_amount FLOAT
);
INSERT INTO sales_data (id, sales_person_id, sales_amount)
VALUES (1, 1, 5000),
(2, 2, 5500),
(3, 3, 6000),
(4, 4, 6500),
(5, 5, 7000);
SELECT QUANTILE_TDIGEST_WEIGHTED(0.5)(sales_amount, 1) AS median_sales_amount
FROM sales_data;
median_sales_amount|
-------------------+
6000.0|
SELECT QUANTILE_TDIGEST_WEIGHTED(0.5, 0.8)(sales_amount, 1)
FROM sales_data;
quantile_tdigest_weighted(0.5, 0.8)(sales_amount)|
-------------------------------------------------+
[6000.0,7000.0] |
26 - RETENTION
Aggregate function
The RETENTION() function takes as arguments a set of conditions from 1 to 32 arguments of type UInt8 that indicate whether a certain condition was met for the event.
Any condition can be specified as an argument (as in WHERE).
The conditions, except the first, apply in pairs: the result of the second will be true if the first and second are true, of the third if the first and third are true, etc.
Analyze Syntax
func.retention(<cond1> , <cond2> , ..., <cond32>)
Analyze Examples
table.user_id, func.retention(table.event_type=='signup', table.event_type='login', table.event_type='purchase').alias('sales_amounts')
| user_id | retention |
|---------|-----------|
| 1 | [1, 1, 0] |
| 2 | [1, 0, 1] |
| 3 | [1, 1, 0] |
SQL Syntax
RETENTION( <cond1> , <cond2> , ..., <cond32> );
Arguments
Arguments | Description |
---|
<cond> | An expression that returns a Boolean result |
Return Type
The array of 1 or 0.
SQL Examples
Create a Table and Insert Sample Data
CREATE TABLE user_events (
id INT,
user_id INT,
event_date DATE,
event_type VARCHAR
);
INSERT INTO user_events (id, user_id, event_date, event_type)
VALUES (1, 1, '2022-01-01', 'signup'),
(2, 1, '2022-01-02', 'login'),
(3, 2, '2022-01-01', 'signup'),
(4, 2, '2022-01-03', 'purchase'),
(5, 3, '2022-01-01', 'signup'),
(6, 3, '2022-01-02', 'login');
Query Demo: Calculate User Retention Based on Signup, Login, and Purchase Events
SELECT
user_id,
RETENTION(event_type = 'signup', event_type = 'login', event_type = 'purchase') AS retention
FROM user_events
GROUP BY user_id;
Result
| user_id | retention |
|---------|-----------|
| 1 | [1, 1, 0] |
| 2 | [1, 0, 1] |
| 3 | [1, 1, 0] |
27 - SKEWNESS
Aggregate function.
The SKEWNESS()
function returns the skewness of all input values.
Analyze Syntax
Analyze Examples
func.skewness(table.temperature).alias('temperature_skewness')
| temperature_skewness |
|----------------------|
| 0.68 |
SQL Syntax
Arguments
Arguments | Description |
---|
<expr> | Any numerical expression |
Return Type
Nullable Float64.
SQL Examples
Create a Table and Insert Sample Data
CREATE TABLE temperature_data (
id INT,
city_id INT,
temperature FLOAT
);
INSERT INTO temperature_data (id, city_id, temperature)
VALUES (1, 1, 60),
(2, 1, 65),
(3, 1, 62),
(4, 2, 70),
(5, 2, 75);
Query Demo: Calculate Skewness of Temperature Data
SELECT SKEWNESS(temperature) AS temperature_skewness
FROM temperature_data;
Result
| temperature_skewness |
|----------------------|
| 0.68 |
28 - STDDEV_POP
Aggregate function.
The STDDEV_POP() function returns the population standard deviation(the square root of VAR_POP()) of an expression.
Note: STD() or STDDEV() can also be used, which are equivalent but not standard SQL.
Caution: NULL values are not counted.
Analyze Syntax
Analyze Examples
func.stddev_pop(table.score).alias('test_score_stddev_pop')
| test_score_stddev_pop |
|-----------------------|
| 7.07107 |
SQL Syntax
STDDEV_POP(<expr>)
STDDEV(<expr>)
STD(<expr>)
Arguments
Arguments | Description |
---|
<expr> | Any numerical expression |
Return Type
double
SQL Examples
Create a Table and Insert Sample Data
CREATE TABLE test_scores (
id INT,
student_id INT,
score FLOAT
);
INSERT INTO test_scores (id, student_id, score)
VALUES (1, 1, 80),
(2, 2, 85),
(3, 3, 90),
(4, 4, 95),
(5, 5, 100);
Query Demo: Calculate Population Standard Deviation of Test Scores
SELECT STDDEV_POP(score) AS test_score_stddev_pop
FROM test_scores;
Result
| test_score_stddev_pop |
|-----------------------|
| 7.07107 |
29 - STDDEV_SAMP
Aggregate function.
The STDDEV_SAMP() function returns the sample standard deviation(the square root of VAR_SAMP()) of an expression.
Caution: NULL values are not counted.
Analyze Syntax
Analyze Examples
func.stddev_samp(table.height).alias('height_stddev_samp')
| height_stddev_samp |
|--------------------|
| 0.240 |
SQL Syntax
Arguments
Arguments | Description |
---|
<expr> | Any numerical expression |
Return Type
double
SQL Examples
Create a Table and Insert Sample Data
CREATE TABLE height_data (
id INT,
person_id INT,
height FLOAT
);
INSERT INTO height_data (id, person_id, height)
VALUES (1, 1, 5.8),
(2, 2, 6.1),
(3, 3, 5.9),
(4, 4, 5.7),
(5, 5, 6.3);
Query Demo: Calculate Sample Standard Deviation of Heights
SELECT STDDEV_SAMP(height) AS height_stddev_samp
FROM height_data;
Result
| height_stddev_samp |
|--------------------|
| 0.240 |
30 - STRING_AGG
Aggregate function.
The STRING_AGG() function converts all the non-NULL values of a column to String, separated by the delimiter.
Analyze Syntax
func.string_agg(<expr> [, delimiter])
Analyze Examples
func.string_agg(table.language_name).alias('concatenated_languages')
| concatenated_languages |
|-----------------------------------------|
| Python, JavaScript, Java, C#, Ruby |
SQL Syntax
STRING_AGG(<expr>)
STRING_AGG(<expr> [, delimiter])
Note:If <expr>
is not a String expression, should use ::VARCHAR
to convert.
For example:
SELECT string_agg(number::VARCHAR, '|') AS s FROM numbers(5);
+-----------+
| s |
+-----------+
| 0|1|2|3|4 |
+-----------+
Arguments
Arguments | Description |
---|
<expr> | Any string expression (if not a string, use ::VARCHAR to convert) |
delimiter | Optional constant String, if not specified, use empty String |
Return Type
the String type
SQL Examples
Create a Table and Insert Sample Data
CREATE TABLE programming_languages (
id INT,
language_name VARCHAR
);
INSERT INTO programming_languages (id, language_name)
VALUES (1, 'Python'),
(2, 'JavaScript'),
(3, 'Java'),
(4, 'C#'),
(5, 'Ruby');
Query Demo: Concatenate Programming Language Names with a Delimiter
SELECT STRING_AGG(language_name, ', ') AS concatenated_languages
FROM programming_languages;
Result
| concatenated_languages |
|------------------------------------------|
| Python, JavaScript, Java, C#, Ruby |
31 - SUM
Aggregate function.
The SUM() function calculates the sum of a set of values.
Caution: NULL values are not counted.
Analyze Syntax
Analyze Examples
func.sum(table.quantity).alias('total_quantity_sold')
| total_quantity_sold |
|---------------------|
| 41 |
SQL Syntax
SUM(<expr>)
Arguments
Arguments | Description |
---|
<expr> | Any numerical expression |
Return Type
A double if the input type is double, otherwise integer.
SQL Examples
Create a Table and Insert Sample Data
CREATE TABLE sales_data (
id INT,
product_id INT,
quantity INT
);
INSERT INTO sales_data (id, product_id, quantity)
VALUES (1, 1, 10),
(2, 2, 5),
(3, 3, 8),
(4, 4, 3),
(5, 5, 15);
Query Demo: Calculate the Total Quantity of Products Sold
SELECT SUM(quantity) AS total_quantity_sold
FROM sales_data;
Result
| total_quantity_sold |
|---------------------|
| 41 |
32 - SUM_IF
The suffix -If can be appended to the name of any aggregate function. In this case, the aggregate function accepts an extra argument – a condition.
Analyze Syntax
func.sum_if(<column>, <cond>)
Analyze Examples
func.sum_if(table.amount, table.status=='Completed').alias('total_amount_completed')
| total_amount_completed |
|------------------------|
| 270.0 |
SQL Syntax
SUM_IF(<column>, <cond>)
SQL Examples
Create a Table and Insert Sample Data
CREATE TABLE order_data (
id INT,
customer_id INT,
amount FLOAT,
status VARCHAR
);
INSERT INTO order_data (id, customer_id, amount, status)
VALUES (1, 1, 100, 'Completed'),
(2, 2, 50, 'Completed'),
(3, 3, 80, 'Cancelled'),
(4, 4, 120, 'Completed'),
(5, 5, 75, 'Cancelled');
Query Demo: Calculate the Total Amount of Completed Orders
SELECT SUM_IF(amount, status = 'Completed') AS total_amount_completed
FROM order_data;
Result
| total_amount_completed |
|------------------------|
| 270.0 |
33 - WINDOW_FUNNEL
Funnel Analysis

Similar to windowFunnel
in ClickHouse (they were created by the same author), it searches for event chains in a sliding time window and calculates the maximum number of events from the chain.
The function works according to the algorithm:
The function searches for data that triggers the first condition in the chain and sets the event counter to 1. This is the moment when the sliding window starts.
If events from the chain occur sequentially within the window, the counter is incremented. If the sequence of events is disrupted, the counter isn’t incremented.
If the data has multiple event chains at varying completion points, the function will only output the size of the longest chain.
SQL Syntax
WINDOW_FUNNEL( <window> )( <timestamp>, <cond1>, <cond2>, ..., <condN> )
Arguments
<timestamp>
— Name of the column containing the timestamp. Data types supported: integer types and datetime types.<cond>
— Conditions or data describing the chain of events. Must be Boolean
datatype.
Parameters
<window>
— Length of the sliding window, it is the time interval between the first and the last condition. The unit of window
depends on the timestamp
itself and varies. Determined using the expression timestamp of cond1 <= timestamp of cond2 <= ... <= timestamp of condN <= timestamp of cond1 + window
.
Returned value
The maximum number of consecutive triggered conditions from the chain within the sliding time window.
All the chains in the selection are analyzed.
Type: UInt8
.
Example
Determine if a set period of time is enough for the user to SELECT a phone and purchase it twice in the online store.
Set the following chain of events:
- The user logged into their account on the store (
event_name = 'login'
). - The user land the page (
event_name = 'visit'
). - The user adds to the shopping cart(
event_name = 'cart'
). - The user complete the purchase (
event_name = 'purchase'
).
CREATE TABLE events(user_id BIGINT, event_name VARCHAR, event_timestamp TIMESTAMP);
INSERT INTO events VALUES(100123, 'login', '2022-05-14 10:01:00');
INSERT INTO events VALUES(100123, 'visit', '2022-05-14 10:02:00');
INSERT INTO events VALUES(100123, 'cart', '2022-05-14 10:04:00');
INSERT INTO events VALUES(100123, 'purchase', '2022-05-14 10:10:00');
INSERT INTO events VALUES(100125, 'login', '2022-05-15 11:00:00');
INSERT INTO events VALUES(100125, 'visit', '2022-05-15 11:01:00');
INSERT INTO events VALUES(100125, 'cart', '2022-05-15 11:02:00');
INSERT INTO events VALUES(100126, 'login', '2022-05-15 12:00:00');
INSERT INTO events VALUES(100126, 'visit', '2022-05-15 12:01:00');
Input table:
+---------+------------+----------------------------+
| user_id | event_name | event_timestamp |
+---------+------------+----------------------------+
| 100123 | login | 2022-05-14 10:01:00.000000 |
| 100123 | visit | 2022-05-14 10:02:00.000000 |
| 100123 | cart | 2022-05-14 10:04:00.000000 |
| 100123 | purchase | 2022-05-14 10:10:00.000000 |
| 100125 | login | 2022-05-15 11:00:00.000000 |
| 100125 | visit | 2022-05-15 11:01:00.000000 |
| 100125 | cart | 2022-05-15 11:02:00.000000 |
| 100126 | login | 2022-05-15 12:00:00.000000 |
| 100126 | visit | 2022-05-15 12:01:00.000000 |
+---------+------------+----------------------------+
Find out how far the user user_id
could get through the chain in an hour window slides.
Query:
SELECT
level,
count() AS count
FROM
(
SELECT
user_id,
window_funnel(3600000000)(event_timestamp, event_name = 'login', event_name = 'visit', event_name = 'cart', event_name = 'purchase') AS level
FROM events
GROUP BY user_id
)
GROUP BY level ORDER BY level ASC;
Note: The event_timestamp
type is timestamp, 3600000000
is a hour time window.
Result:
+-------+-------+
| level | count |
+-------+-------+
| 2 | 1 |
| 3 | 1 |
| 4 | 1 |
+-------+-------+
- User
100126
level is 2 (login -> visit
) . - user
100125
level is 3 (login -> visit -> cart
). - User
100123
level is 4 (login -> visit -> cart -> purchase
).