This is the full developer documentation for PlaidCloud

# PlaidCloud

> Unified financial analytics for business users.

* [Get started](/get-started/): Quickstart, concepts, and end-to-end tutorials.
* [Guides](/guides/): Task-oriented how-to documentation.
* [Reference](/reference/): Workflow steps, expressions, API, and CLI.
* [Integrations](/integrations/): AI coding agents and external tools.
* [Administration](/administration/): Access, security, and scheduled events.
* [Releases](/releases/): Monthly summaries of what's new.

# Administration

> Access management, security, and scheduled operations. For workspace admins and security owners.

* [Access management](/administration/access/): Organizations, workspaces, members, security groups, authentication, and single sign-on.
* [Control plane](/administration/control-plane/): Manage organizations, workspaces, services, branding, lakehouse access, and maintenance windows.
* [Scheduled events](/administration/scheduled-events/): Set up cron-based and event-driven workflow scheduling.

# Identity and Access Management (IAM)

> Manage PlaidCloud identity and access controls including user authentication, role-based permissions, and security groups.

PlaidCloud’s access controls are organized around a few core concepts:

* **Organization**: the top-level billing and identity boundary. An organization contains workspaces and members.
* **Workspace**: an isolated environment where actual work happens. Members get access at the workspace level.
* **Member**: a user with credentials who belongs to one or more workspaces in one or more organizations.
* **Security group**: a bundle of permissions inside a workspace. 
Members are assigned to security groups to grant them specific capabilities.
* **Single sign-on (SSO)**: optional SAML-based federation that delegates authentication to your identity provider (Okta, Auth0, Microsoft Entra, Google, AWS).

## Where to Start

If you’re setting up a new organization:

1. **[Organizations and workspaces explained](/administration/access/overview/organizations-and-workspaces-explained/)**: the boundaries between them and when to use each
2. **[Managing workspace members](/administration/access/overview/managing-workspace-members/)**: invite users, assign them to workspaces, grant capabilities
3. **[Managing security groups](/administration/access/managing-security-groups-and-assignments/)**: bundle permissions and assign them

If you’re integrating with an existing identity provider:

* **[Managing single sign-on for organization](/administration/access/advanced/managing-single-sign-on-for-organization/)**: overview of the SSO flow
* Vendor-specific guides:
  * [Okta SAML setup](/administration/access/advanced/okta-saml-setup/)
  * [Auth0 SAML setup](/administration/access/advanced/auth0-saml-setup/)
  * [Microsoft Entra SAML setup](/administration/access/advanced/entra-saml-setup/)
  * [Google SAML setup](/administration/access/advanced/google-saml-setup/)
  * [AWS SAML setup](/administration/access/advanced/aws-saml-setup/)

## Related

* [Member authentication](/administration/access/member-authentication/): password and MFA options for non-SSO members
* [Member management](/administration/access/member-management/): adding, removing, and updating members
* [Member user identity](/administration/access/member-user-identity/): identity attributes and how PlaidCloud uses them
* [Setting member expiration](/administration/access/advanced/setting-member-expiration-period/): automatic deactivation policies

# Advanced Operations

> Configure advanced PlaidCloud 
security features including SAML single sign-on, organization admin roles, and member expiration.

Advanced configuration for PlaidCloud security and identity: SAML single sign-on with major identity providers, organization admin roles, and member expiration policies.

# Setting Up Auth0 SAML for Single Sign-On

> Configure Auth0 as a SAML identity provider for PlaidCloud single sign-on to enable secure federated authentication for members.

PlaidCloud supports Single Sign-On (SSO) via SAML 2.0. This guide walks through configuring Auth0 as a SAML identity provider so your organization’s users can authenticate through Auth0 when accessing PlaidCloud.

**Note:** The PlaidCloud-side configuration is handled by the PlaidCloud team. Your responsibility is to set up the SAML application in Auth0 and provide PlaidCloud with your **Identity Provider Metadata URL**. PlaidCloud support will complete the remaining configuration.

## Prerequisites

* An Auth0 tenant
* An Auth0 account with the **Administrator** role
* Contact with PlaidCloud support to coordinate the setup and exchange configuration values

## Overview

The setup process involves two parties exchanging SAML metadata:

1. **You configure** an application in Auth0 with the SAML2 Web App addon enabled and provide PlaidCloud with your Identity Provider Metadata URL.
2. **PlaidCloud provides** you with the Service Provider (SP) Entity ID and ACS URL (Assertion Consumer Service URL) needed to complete your Auth0 application configuration.

Coordinate with PlaidCloud support to obtain the SP values before completing Step 3 below.

## Step 1: Create an Application

1. Sign in to the [Auth0 Dashboard](https://manage.auth0.com).
2. In the left sidebar, navigate to **Applications** > **Applications**.
3. Click **Create Application**.
4. 
   Enter a name for the application (e.g., `PlaidCloud SSO`).
5. Select **Regular Web Applications** as the application type.
6. Click **Create**.

## Step 2: Enable the SAML2 Web App Addon

1. On the application detail page, select the **Addons** tab.
2. Click the **SAML2 Web App** addon to enable it.
3. The addon settings panel will open. Leave it open; you will configure it in the next step.

## Step 3: Configure SAML Settings

**Note:** You will need the **SP Entity ID** and **ACS URL** from PlaidCloud before completing this step. Contact PlaidCloud support to obtain these values.

In the **SAML2 Web App** addon settings panel:

1. In the **Application Callback URL** field, enter the ACS URL provided by PlaidCloud.
2. In the **Settings** JSON editor, set the `audience` field to the SP Entity ID provided by PlaidCloud:

   ```json
   {
     "audience": "your-sp-entity-id-from-plaidcloud",
     "mappings": {
       "email": "http://schemas.xmlsoap.org/ws/2005/05/identity/claims/emailaddress",
       "given_name": "http://schemas.xmlsoap.org/ws/2005/05/identity/claims/givenname",
       "family_name": "http://schemas.xmlsoap.org/ws/2005/05/identity/claims/surname"
     },
     "nameIdentifierFormat": "urn:oasis:names:tc:SAML:1.1:nameid-format:emailAddress",
     "nameIdentifierProbes": [
       "http://schemas.xmlsoap.org/ws/2005/05/identity/claims/emailaddress"
     ]
   }
   ```

3. Click **Enable** (or **Save**) to apply the settings.

## Step 4: Retrieve and Send the Identity Provider Metadata URL

Once the addon is enabled, locate the metadata URL and send it to PlaidCloud so the integration can be completed.

1. In the **SAML2 Web App** addon settings panel, select the **Usage** tab.
2. 
   Copy the **Identity Provider Metadata** URL (formatted as `https://{your-auth0-domain}/samlp/metadata/{client-id}`).

**Send this Metadata URL to PlaidCloud support.** This is the Entity Descriptor URL that PlaidCloud needs to configure its side of the trust relationship with your identity provider. Once PlaidCloud receives this URL, the team will complete the Keycloak configuration and notify you when SSO is ready to test.

## Step 5: Configure Attribute Mappings for Groups (optional)

If your PlaidCloud configuration uses group-based security role assignments, you can pass group membership through the SAML assertion using Auth0 Rules or Actions.

### Using Auth0 Actions

1. In the left sidebar, navigate to **Actions** > **Library**.
2. Click **Build Custom** and create a new action for the **Login / Post Login** trigger.
3. Add logic to append group information to the SAML assertion. For example, if groups are stored in the user’s `app_metadata`:

   ```javascript
   exports.onExecutePostLogin = async (event, api) => {
     // Read the user's groups from app_metadata, defaulting to an empty list.
     const groups = event.user.app_metadata?.groups || [];
     // Add the groups to the SAML assertion as the "groups" attribute.
     api.samlResponse.setAttribute("groups", groups);
   };
   ```

4. Deploy the action and add it to the **Login** flow.

**Note:** Discuss with PlaidCloud support which group attribute name and format are expected so that group-based security role assignments work correctly in PlaidCloud.

## Step 6: Control User Access

Auth0 controls which users can authenticate based on the connections and rules attached to the application.

1. On the application detail page, select the **Connections** tab.
2. Enable the appropriate connections (e.g., your organization’s database connection, Active Directory, or social connections) for this application.
3. 
   Disable any connections that should not have access to PlaidCloud.

To restrict access to specific users within a connection, use Auth0 Actions or Rules to allow or deny authentication based on user attributes or group membership.

## Testing the Integration

After PlaidCloud confirms the configuration is complete:

1. Navigate to your organization’s PlaidCloud Workspace (e.g., `https://my-workspace.plaid.cloud`).
2. You will be redirected to the Auth0 sign-in page (or your configured connection’s login).
3. Sign in with your Auth0 credentials.
4. Upon successful authentication, you will be redirected back to PlaidCloud.

If you encounter errors, verify that:

* The Application Callback URL and audience match exactly what PlaidCloud provided
* The SAML2 Web App addon is enabled on the application
* The `nameIdentifierFormat` is set to the email address format
* The Metadata URL you sent to PlaidCloud is accessible
* The user’s connection is enabled on the application

# Setting Up AWS IAM Identity Center SAML for Single Sign-On

> Set up AWS IAM Identity Center as a SAML provider for PlaidCloud single sign-on to enable federated authentication for members.

PlaidCloud supports Single Sign-On (SSO) via SAML 2.0. This guide walks through configuring AWS IAM Identity Center (formerly AWS SSO) as a SAML identity provider so your organization’s users can authenticate through AWS when accessing PlaidCloud.

**Note:** The PlaidCloud-side configuration is handled by the PlaidCloud team. Your responsibility is to set up the custom SAML application in IAM Identity Center and provide PlaidCloud with your **IAM Identity Center SAML Metadata URL**. PlaidCloud support will complete the remaining configuration.
## Prerequisites

* An AWS account with **IAM Identity Center** enabled
* An IAM user or role with the **AWSSSOMasterAccountAdministrator** managed policy or equivalent permissions
* IAM Identity Center must be configured with an identity source (the built-in directory, Active Directory, or an external IdP)
* Contact with PlaidCloud support to coordinate the setup and exchange configuration values

## Overview

The setup process involves two parties exchanging SAML metadata:

1. **You configure** a custom SAML application in IAM Identity Center and provide PlaidCloud with your SAML Metadata URL.
2. **PlaidCloud provides** you with the Service Provider (SP) Entity ID and ACS URL (Assertion Consumer Service URL) needed to complete your application configuration.

Coordinate with PlaidCloud support to obtain the SP values before completing Step 3 below.

## Step 1: Create a Custom SAML Application

1. Sign in to the [AWS Management Console](https://console.aws.amazon.com) and navigate to **IAM Identity Center**.
2. In the left sidebar, select **Applications**.
3. Click **Add application**.
4. Select **I have an application I want to set up** and choose **Custom SAML 2.0 application**.
5. Click **Next**.
6. Enter a **Display name** for the application (e.g., `PlaidCloud SSO`) and optionally a description.

## Step 2: Retrieve the IAM Identity Center SAML Metadata URL

Before configuring the service provider details, locate your IAM Identity Center metadata URL to send to PlaidCloud.

1. On the application configuration page, scroll to the **IAM Identity Center metadata** section.
2. 
   Copy the **IAM Identity Center SAML metadata URL** (formatted as `https://portal.sso.{region}.amazonaws.com/saml/metadata/{instanceId}`).

**Send this Metadata URL to PlaidCloud support.** This is the Entity Descriptor URL that PlaidCloud needs to configure its side of the trust relationship with your identity provider. Once PlaidCloud receives this URL, the team will complete the Keycloak configuration and notify you when SSO is ready to test.

## Step 3: Configure Service Provider Details

**Note:** You will need the **SP Entity ID** and **ACS URL** from PlaidCloud before completing this step. Contact PlaidCloud support to obtain these values.

1. Scroll to the **Application properties** section.
2. In the **Application ACS URL** field, enter the ACS URL provided by PlaidCloud.
3. In the **Application SAML audience** field, enter the SP Entity ID provided by PlaidCloud.
4. Click **Submit**.

## Step 4: Configure Attribute Mappings

IAM Identity Center passes user attributes to PlaidCloud in the SAML assertion. Configure attribute mappings so PlaidCloud receives the necessary user details.

1. On the application detail page, select the **Attribute mappings** tab.
2. Click **Add new attribute mapping** and add the following:

   | User attribute in the application | Maps to this string value or user attribute in IAM Identity Center | Format       |
   | --------------------------------- | ------------------------------------------------------------------ | ------------ |
   | `Subject`                         | `${user:email}`                                                    | emailAddress |
   | `email`                           | `${user:email}`                                                    | unspecified  |
   | `firstName`                       | `${user:givenName}`                                                | unspecified  |
   | `lastName`                        | `${user:familyName}`                                               | unspecified  |

3. Click **Save changes**.
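To make the attribute mapping table above more concrete, the following sketch shows how the `${user:...}` placeholders would resolve for one user. It is an illustration only, not how IAM Identity Center is implemented: `resolveMappings` is a hypothetical helper, and the sample user is made up.

```javascript
// Attribute mappings copied from the table above
// (application attribute -> IAM Identity Center expression).
const mappings = [
  { attribute: "Subject", value: "${user:email}", format: "emailAddress" },
  { attribute: "email", value: "${user:email}", format: "unspecified" },
  { attribute: "firstName", value: "${user:givenName}", format: "unspecified" },
  { attribute: "lastName", value: "${user:familyName}", format: "unspecified" },
];

// Hypothetical helper: substitutes ${user:<field>} placeholders with values
// from a sample user. Unknown fields are left untouched so they stand out.
function resolveMappings(rows, user) {
  const resolved = {};
  for (const row of rows) {
    resolved[row.attribute] = row.value.replace(
      /\$\{user:([A-Za-z]+)\}/g,
      (match, field) => (field in user ? String(user[field]) : match)
    );
  }
  return resolved;
}

// Made-up sample user, for illustration only.
const sampleUser = {
  email: "jane.doe@example.com",
  givenName: "Jane",
  familyName: "Doe",
};

console.log(resolveMappings(mappings, sampleUser));
```

Both `Subject` and `email` resolve to the same address here, which is consistent with the table: the subject is sent in the emailAddress format, and the email is repeated as a plain attribute.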
### Group Membership (optional)

IAM Identity Center does not natively pass group membership as a SAML attribute in the same way as other providers. If your PlaidCloud configuration requires group-based security role assignments, discuss the available options with PlaidCloud support. Common approaches include using the built-in directory with group assignments or syncing groups from an external identity source such as Active Directory.

**Note:** Discuss with PlaidCloud support how group membership should be conveyed so that group-based security role assignments work correctly in PlaidCloud.

## Step 5: Assign Users and Groups to the Application

Only users and groups assigned to the application will be able to authenticate through this SSO configuration.

1. On the application detail page, select the **Assign users and groups** tab.
2. Click **Assign users and groups**.
3. Search for and select the users or groups that should have SSO access to PlaidCloud.
4. Click **Assign users**.

## Testing the Integration

After PlaidCloud confirms the configuration is complete:

1. Navigate to your organization’s PlaidCloud Workspace (e.g., `https://my-workspace.plaid.cloud`).
2. You will be redirected to the AWS IAM Identity Center sign-in page.
3. Sign in with your AWS IAM Identity Center credentials.
4. Upon successful authentication, you will be redirected back to PlaidCloud.
If you encounter errors, verify that:

* The ACS URL and SP Entity ID match exactly what PlaidCloud provided
* The user attempting to log in is assigned to the application in IAM Identity Center
* The Subject attribute is mapped to `${user:email}` with the **emailAddress** format
* The Metadata URL you sent to PlaidCloud is accessible from PlaidCloud’s servers

# Setting Up Microsoft Entra ID SAML for Single Sign-On

> Configure Microsoft Entra ID as a SAML identity provider for PlaidCloud single sign-on to enable secure federated authentication.

PlaidCloud supports Single Sign-On (SSO) via SAML 2.0. This guide walks through configuring Microsoft Entra ID (formerly Azure Active Directory) as a SAML identity provider so your organization’s users can authenticate through Entra when accessing PlaidCloud.

**Note:** The PlaidCloud-side configuration is handled by the PlaidCloud team. Your responsibility is to set up the Enterprise Application in Entra and provide PlaidCloud with your **App Federation Metadata URL**. PlaidCloud support will complete the remaining configuration.

## Prerequisites

* An active Microsoft Entra ID (Azure AD) tenant
* An account with one of the following Entra roles: **Global Administrator**, **Cloud Application Administrator**, or **Application Administrator**
* Contact with PlaidCloud support to coordinate the setup and exchange configuration values

## Overview

The setup process involves two parties exchanging SAML metadata:

1. **You configure** an Enterprise Application in Entra ID and provide PlaidCloud with your App Federation Metadata URL.
2. **PlaidCloud provides** you with the Service Provider (SP) Entity ID and Reply URL (Assertion Consumer Service URL) needed to complete your Entra configuration.

Coordinate with PlaidCloud support to obtain the SP values before completing Step 3 below.
## Step 1: Create an Enterprise Application

1. Sign in to the [Azure portal](https://portal.azure.com) and navigate to **Microsoft Entra ID**.
2. In the left sidebar, select **Enterprise Applications**.
3. Click **+ New application**.
4. Click **+ Create your own application**.
5. Enter a name for the application (e.g., `PlaidCloud SSO`).
6. Select **Integrate any other application you don’t find in the gallery (Non-gallery)**.
7. Click **Create**.

## Step 2: Enable SAML-Based Single Sign-on

1. After the application is created, select **Single sign-on** from the left sidebar under **Manage**.
2. On the “Select a single sign-on method” screen, click **SAML**.

## Step 3: Configure Basic SAML Settings

**Note:** You will need the **SP Entity ID** and **Reply URL (ACS URL)** from PlaidCloud before completing this step. Contact PlaidCloud support to obtain these values.

1. In the **Basic SAML Configuration** section, click **Edit**.
2. In the **Identifier (Entity ID)** field, enter the SP Entity ID provided by PlaidCloud.
3. In the **Reply URL (Assertion Consumer Service URL)** field, enter the ACS URL provided by PlaidCloud.
4. Click **Save**.

## Step 4: Configure Attributes and Claims

By default, Entra will pass the user’s email address and name in the SAML assertion. If your PlaidCloud configuration uses security group assignments from SSO, you should also include group claims.

### Add Group Claims

1. In the **Attributes & Claims** section, click **Edit**.
2. Click **+ Add a group claim**.
3. 
   Choose **Groups assigned to the application** (recommended to limit token size).
4. Under **Source attribute**, select an appropriate value:
   * **Group ID**: passes the Azure Object ID (UUID) of the group
   * **Cloud-only group display names**: passes the human-readable group name (for cloud-only groups)
   * **sAMAccountName**: passes the on-premises group name (for hybrid/synced environments)
5. Click **Save**.

**Note:** Discuss with PlaidCloud support which group attribute format is expected so that group-based security role assignments work correctly in PlaidCloud.

## Step 5: Assign Users and Groups to the Application

Only users and groups assigned to the Enterprise Application will be able to authenticate through this SSO configuration.

1. In the left sidebar, select **Users and groups** under **Manage**.
2. Click **+ Add user/group**.
3. Select the users or groups that should have SSO access to PlaidCloud.
4. Click **Assign**.

## Step 6: Retrieve and Send the App Federation Metadata URL

Once the application is configured, locate the Federation Metadata URL and send it to PlaidCloud so the integration can be completed.

1. Navigate to the **Single sign-on** page for your Enterprise Application.
2. Scroll to the **SAML Certificates** section.
3. Copy the **App Federation Metadata URL**.

**Send this URL to PlaidCloud support.** This is the Entity Descriptor URL that PlaidCloud needs to configure its side of the trust relationship with your identity provider. Once PlaidCloud receives this URL, the team will complete the Keycloak configuration and notify you when SSO is ready to test.
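Before sending the URL, it can help to confirm that it returns SAML metadata at all. The sketch below is a minimal smoke test, assuming Node.js 18 or later for the global `fetch`. The helper names `summarizeIdpMetadata` and `checkMetadataUrl` are hypothetical, and the sample XML is a made-up fragment rather than real Entra output. It uses simple pattern matching instead of an XML parser, so treat the result as a hint, not a validation.

```javascript
// Hypothetical helper: pulls a few fields out of SAML IdP metadata XML with
// simple pattern matching. It does not validate the XML or any signature.
function summarizeIdpMetadata(xml) {
  const entity = /<(?:\w+:)?EntityDescriptor\b[^>]*\bentityID="([^"]+)"/.exec(xml);
  const ssoPattern = /<(?:\w+:)?SingleSignOnService\b[^>]*\bLocation="([^"]+)"/g;
  return {
    entityId: entity ? entity[1] : null,
    hasIdpDescriptor: /<(?:\w+:)?IDPSSODescriptor\b/.test(xml),
    hasCertificate: /<(?:\w+:)?X509Certificate\b/.test(xml),
    ssoLocations: [...xml.matchAll(ssoPattern)].map((match) => match[1]),
  };
}

// Fetches a metadata URL and summarizes the response.
// Assumes Node.js 18+ (global fetch); not called in this sketch.
async function checkMetadataUrl(url) {
  const response = await fetch(url);
  if (!response.ok) {
    throw new Error(`Metadata URL returned HTTP ${response.status}`);
  }
  return summarizeIdpMetadata(await response.text());
}

// Made-up metadata fragment for illustration; real metadata is much longer.
const sampleMetadata = [
  '<EntityDescriptor entityID="https://idp.example.com/abc">',
  "<IDPSSODescriptor>",
  "<KeyDescriptor><X509Certificate>MIIB</X509Certificate></KeyDescriptor>",
  '<SingleSignOnService Location="https://idp.example.com/sso"/>',
  "</IDPSSODescriptor>",
  "</EntityDescriptor>",
].join("\n");

console.log(summarizeIdpMetadata(sampleMetadata));
```

To check the live URL, call `checkMetadataUrl` with the App Federation Metadata URL copied in Step 6. A successful check from your own machine does not prove that PlaidCloud’s servers can reach the URL, so firewall and conditional access rules still need to be reviewed separately.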
## Testing the Integration

After PlaidCloud confirms the configuration is complete:

1. Navigate to your organization’s PlaidCloud Workspace (e.g., `https://my-workspace.plaid.cloud`).
2. You will be redirected to the Microsoft login page.
3. Sign in with your Entra ID credentials.
4. Upon successful authentication, you will be redirected back to PlaidCloud.

If you encounter errors, verify that:

* The SP Entity ID and Reply URL match exactly what PlaidCloud provided
* The user attempting to log in is assigned to the Enterprise Application
* The App Federation Metadata URL you sent to PlaidCloud is accessible (not blocked by a firewall or conditional access policy)

# Setting Up Google Workspace SAML for Single Sign-On

> Set up Google Workspace as a SAML identity provider for PlaidCloud single sign-on to enable secure federated authentication.

PlaidCloud supports Single Sign-On (SSO) via SAML 2.0. This guide walks through configuring Google Workspace as a SAML identity provider so your organization’s users can authenticate through Google when accessing PlaidCloud.

**Note:** The PlaidCloud-side configuration is handled by the PlaidCloud team. Your responsibility is to set up the custom SAML app in Google Workspace and provide PlaidCloud with your **IdP Metadata URL**. PlaidCloud support will complete the remaining configuration.

## Prerequisites

* A Google Workspace subscription (Business Starter or higher)
* A Google Workspace user account with the **Super Admin** role
* Contact with PlaidCloud support to coordinate the setup and exchange configuration values

## Overview

The setup process involves two parties exchanging SAML metadata:

1. **You configure** a custom SAML app in Google Workspace and provide PlaidCloud with your IdP Metadata URL.
2. 
   **PlaidCloud provides** you with the Service Provider (SP) Entity ID and ACS URL (Assertion Consumer Service URL) needed to complete your Google Workspace configuration.

Coordinate with PlaidCloud support to obtain the SP values before completing Step 3 below.

## Step 1: Create a Custom SAML App

1. Sign in to the [Google Admin console](https://admin.google.com) as a Super Admin.
2. Navigate to **Apps** > **Web and mobile apps**.
3. Click **Add app** > **Add custom SAML app**.
4. Enter a name for the app (e.g., `PlaidCloud SSO`) and optionally add a description and icon.
5. Click **Continue**.

## Step 2: Retrieve the IdP Metadata URL

On the **Google Identity Provider details** screen, Google displays the identity provider information needed by PlaidCloud.

1. Copy the **SSO URL** and **Entity ID**, and download the **Certificate**; or
2. Click **Copy** next to the **IDP metadata** URL (formatted as `https://accounts.google.com/o/saml2/idp?idpid=XXXXXXXXX`).

   **Send this IdP Metadata URL to PlaidCloud support.** This is the Entity Descriptor URL that PlaidCloud needs to configure its side of the trust relationship with your identity provider. Once PlaidCloud receives this URL, the team will complete the Keycloak configuration and notify you when SSO is ready to test.

3. Click **Continue** to proceed to the Service Provider configuration.

## Step 3: Configure Service Provider Details

**Note:** You will need the **SP Entity ID** and **ACS URL** from PlaidCloud before completing this step. Contact PlaidCloud support to obtain these values.

1. In the **ACS URL** field, enter the ACS URL provided by PlaidCloud.
2. In the **Entity ID** field, enter the SP Entity ID provided by PlaidCloud.
3. 
   Leave **Start URL** blank unless PlaidCloud support instructs otherwise.
4. Set **Name ID format** to **EMAIL**.
5. Set **Name ID** to **Basic Information > Primary email**.
6. Click **Continue**.

## Step 4: Configure Attribute Mapping

Google Workspace passes user attributes to PlaidCloud in the SAML assertion. At minimum, map the user’s email address. If your PlaidCloud configuration uses group-based security role assignments, also map group membership.

### Basic Attribute Mapping

Add the following attribute mappings on the **Attribute mapping** screen:

| Google Directory attribute | App attribute |
| -------------------------- | ------------- |
| Primary email              | `email`       |
| First name                 | `firstName`   |
| Last name                  | `lastName`    |

Click **Add mapping** to add each row.

### Group Membership (optional)

If you want PlaidCloud to automatically assign users to security groups based on their Google group membership:

1. Click **Add mapping**.
2. Under **Google Directory attributes**, select **Group membership** and choose the relevant Google Groups.
3. Set the **App attribute** name to `groups` (confirm the expected name with PlaidCloud support).

**Note:** Discuss with PlaidCloud support which group attribute name and format are expected so that group-based security role assignments work correctly in PlaidCloud.

Click **Finish**.

## Step 5: Enable the App for Users

By default, a new SAML app is disabled for all users. Enable it for the appropriate organizational units or groups.

1. On the app detail page, click **User access**.
2. Select the organizational unit or groups that should have SSO access to PlaidCloud.
3. Set the service status to **ON**.
4. 
   Click **Save**.

## Testing the Integration

After PlaidCloud confirms the configuration is complete:

1. Navigate to your organization’s PlaidCloud Workspace (e.g., `https://my-workspace.plaid.cloud`).
2. You will be redirected to the Google sign-in page.
3. Sign in with your Google Workspace credentials.
4. Upon successful authentication, you will be redirected back to PlaidCloud.

If you encounter errors, verify that:

* The SP Entity ID and ACS URL match exactly what PlaidCloud provided
* The user attempting to log in belongs to an organizational unit or group with the app enabled
* The Name ID format is set to **EMAIL** and mapped to **Primary email**
* The IdP Metadata URL you sent to PlaidCloud is accessible

# Manage Organization Administrators

> Manage PlaidCloud organization administrator roles, assign admin privileges, and control top-level organizational security access.

Organizations in PlaidCloud provide a top-level area to control options such as single sign-on and member access capabilities. Each Organization contains at least one workspace, and workspaces serve as the main level of tenant separation within PlaidCloud. A workspace helps to align teams with specific areas of interest and isolate access as appropriate. PlaidCloud allows Organizations to have an unlimited number of workspaces.

## Managing Organization Administrators

Each Organization in PlaidCloud can assign multiple administrators. Administrators have special privileges to control the Organization: they can manage billing, update access management, and perform workspace management.

To manage administrators:

1. Select the “Organization Settings” menu from the top right of the screen
2. Click “Administrators”

This will display the table of current administrators. 
After the table opens, you may add new administrators, delete existing administrators, or alter administrative privileges.

## Adding an Administrator

To add an administrator:

1. Select the “Organization Settings” menu from the top right of the screen
2. Click “Administrators”
3. Click the “Add Organization Administrator” button
4. Complete the required fields
5. Click “Add as Administrator”

## Deleting an Administrator

To delete an administrator:

1. Select the “Organization Settings” menu from the top right of the screen
2. Click “Administrators”
3. Click the delete icon of the desired administrator
4. Confirm and click “Delete as Administrator”

# Managing Single Sign-On for Organization

> Configure and manage single sign-on settings for your PlaidCloud organization to streamline secure member authentication.

Each Organization can have a custom URL ([https://plaidcloud.com/sso](https://plaidcloud.com)/``) for members to access the single sign-on page you specified in the configuration.

**Note:** Single Sign-On uses SAML 2.0 protocols and is set up through the user interface.

To create a custom URL:

1. Select the “Organization Settings” menu from the top right of the screen
2. Click “Single Sign-On Security Credentials”
3. Adjust the Single Sign-On URL as desired
4. Click “Update Organization SSO Settings”

## Allow Creation of Users Automatically

If Single Sign-On is enabled, you can choose to automatically create members based on successful Single Sign-On authentication. New members will receive the default workspace and security roles specified in the Organization settings.

To automatically create members:

1. Select the “Organization Settings” menu from the top right of the screen
2. Click “Organization and User Settings”
3. 
Check the “Create Users Automatically from Single Sign-On” box 4. Choose the desired default workspace Use of this feature greatly simplifies member management because new members will automatically have access without any additional setup in PlaidCloud. Similarly, if members are removed from the Single Sign-On facility, they will no longer have access to PlaidCloud. ## Allow Security Group Assignments From Single Sign-on [Section titled “Allow Security Group Assignments From Single Sign-on”](#allow-security-group-assignments-from-single-sign-on) If Single Sign-On is enabled, you can choose to pass a group association list along with the positive authentication message. The list’s items will be used to assign a member to the specified groups and remove them from any not specified. This is an effective way to manage security group assignments by using a central user management service such as Active Directory or other LDAP service. Note If a member is marked as an administrator within a workspace, they will continue to have full access to that workspace regardless of the specific role they may be assigned through this automated process. If this option is enabled, security roles will be assigned using the supplied list the next time a member signs in. If the option is disabled, existing members will retain their current security roles until manually updated within PlaidCloud. # Setting Up Okta SAML for Single Sign-On > Configure Okta as a SAML identity provider for PlaidCloud single sign-on to enable secure federated authentication for members. PlaidCloud supports Single Sign-On (SSO) via SAML 2.0. This guide walks through configuring Okta as a SAML identity provider so your organization’s users can authenticate through Okta when accessing PlaidCloud. Note The PlaidCloud-side configuration is handled by the PlaidCloud team. Your responsibility is to set up the SAML application in Okta and provide PlaidCloud with your **Identity Provider Metadata URL**. 
PlaidCloud support will complete the remaining configuration. ## Prerequisites [Section titled “Prerequisites”](#prerequisites) * An Okta account with the **Administrator** role (Super Admin or Org Admin) * Contact with PlaidCloud support to coordinate the setup and exchange configuration values ## Overview [Section titled “Overview”](#overview) The setup process involves two parties exchanging SAML metadata: 1. **You configure** a SAML application in Okta and provide PlaidCloud with your Identity Provider Metadata URL. 2. **PlaidCloud provides** you with the Service Provider (SP) Entity ID and Single Sign-On URL (ACS URL) needed to complete your Okta application configuration. Coordinate with PlaidCloud support to obtain the SP values before completing Step 2 below. ## Step 1: Create a New SAML Application [Section titled “Step 1: Create a New SAML Application”](#step-1-create-a-new-saml-application) 1. Sign in to the [Okta Admin console](https://your-org.okta.com/admin). 2. In the left sidebar, navigate to **Applications** > **Applications**. 3. Click **Create App Integration**. 4. Select **SAML 2.0** as the sign-in method. 5. Click **Next**. 6. Enter a name for the application (e.g., `PlaidCloud SSO`) and optionally upload a logo. 7. Click **Next**. ## Step 2: Configure SAML Settings [Section titled “Step 2: Configure SAML Settings”](#step-2-configure-saml-settings) Note You will need the **SP Entity ID** and **Single Sign-On URL (ACS URL)** from PlaidCloud before completing this step. Contact PlaidCloud support to obtain these values. 1. In the **Single sign-on URL** field, enter the ACS URL provided by PlaidCloud. 2. In the **Audience URI (SP Entity ID)** field, enter the SP Entity ID provided by PlaidCloud. 3. Leave **Default RelayState** blank unless PlaidCloud support instructs otherwise. 4. Set **Name ID format** to **EmailAddress**. 5. Set **Application username** to **Email**. 6. Click **Next**.
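
The values entered in Step 2 need to match what PlaidCloud provided character for character. As a minimal sketch, assuming you keep the provided values and the values you typed into Okta in two plain dictionaries, the Python below compares them and names the most common kinds of near-miss. The function name, field names, and URLs are hypothetical placeholders for illustration; they are not PlaidCloud or Okta identifiers.

```python
def find_mismatches(expected: dict[str, str], entered: dict[str, str]) -> dict[str, str]:
    """Return {field: reason} for every field that is not an exact match."""
    problems = {}
    for field, want in expected.items():
        got = entered.get(field)
        if got is None:
            problems[field] = "missing"
        elif got == want:
            continue
        elif got.strip() == want:
            problems[field] = "extra whitespace"
        elif got.rstrip("/") == want.rstrip("/"):
            problems[field] = "trailing slash differs"
        elif got.lower() == want.lower():
            problems[field] = "letter case differs"
        else:
            problems[field] = "different value"
    return problems


# Placeholder values only; use the values PlaidCloud support sends you.
expected = {
    "single_sign_on_url": "https://sso.example.com/acs",
    "audience_uri": "https://sso.example.com/sp",
    "name_id_format": "EmailAddress",
    "application_username": "Email",
}
# What was (hypothetically) typed into the Okta form.
entered = {
    "single_sign_on_url": "https://sso.example.com/acs/",
    "audience_uri": "https://sso.example.com/sp ",
    "name_id_format": "EmailAddress",
    "application_username": "Email",
}

for field, reason in find_mismatches(expected, entered).items():
    print(f"{field}: {reason}")
```

With the placeholder values above, the sketch flags the ACS URL for a trailing slash and the Audience URI for trailing whitespace, two differences that are easy to miss by eye.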
## Step 3: Configure Attribute Statements [Section titled “Step 3: Configure Attribute Statements”](#step-3-configure-attribute-statements) On the same SAML settings screen, add attribute statements so that PlaidCloud receives user details in the SAML assertion. ### User Attributes [Section titled “User Attributes”](#user-attributes) In the **Attribute Statements** section, add the following: | Name | Name format | Value | | ----------- | ----------- | ---------------- | | `email` | Unspecified | `user.email` | | `firstName` | Unspecified | `user.firstName` | | `lastName` | Unspecified | `user.lastName` | ### Group Attributes (optional) [Section titled “Group Attributes (optional)”](#group-attributes-optional) If your PlaidCloud configuration uses group-based security role assignments, add a group attribute statement so group membership is passed in the assertion. In the **Group Attribute Statements** section, add the following: | Name | Name format | Filter | | -------- | ----------- | ---------------------------------------------------------------------------------------- | | `groups` | Unspecified | **Matches regex** — `.*` (or a more specific pattern to limit which groups are included) | Note Discuss with PlaidCloud support which group attribute name and filter are expected so that group-based security role assignments work correctly in PlaidCloud. Click **Next**, then select **I’m an Okta customer adding an internal app** and click **Finish**. ## Step 4: Retrieve and Send the Identity Provider Metadata URL [Section titled “Step 4: Retrieve and Send the Identity Provider Metadata URL”](#step-4-retrieve-and-send-the-identity-provider-metadata-url) Once the application is created, locate the metadata URL and send it to PlaidCloud so the integration can be completed. 1. On the application detail page, select the **Sign On** tab. 2. Scroll to the **SAML 2.0** section and click **More details**. 3. 
Copy the **Metadata URL** (formatted as `https://your-org.okta.com/app/your-app-id/sso/saml/metadata`). **Send this Metadata URL to PlaidCloud support.** This is the Entity Descriptor URL that PlaidCloud needs to configure the trust relationship on the identity provider side. Once PlaidCloud receives this URL, the team will complete the Keycloak configuration and notify you when SSO is ready to test. ## Step 5: Assign Users and Groups to the Application [Section titled “Step 5: Assign Users and Groups to the Application”](#step-5-assign-users-and-groups-to-the-application) Only users and groups assigned to the application will be able to authenticate through this SSO configuration. 1. On the application detail page, select the **Assignments** tab. 2. Click **Assign** and choose either **Assign to People** or **Assign to Groups**. 3. Select the users or groups that should have SSO access to PlaidCloud and click **Assign**. 4. Click **Done**. ## Testing the Integration [Section titled “Testing the Integration”](#testing-the-integration) After PlaidCloud confirms the configuration is complete: 1. Navigate to your organization’s PlaidCloud Workspace (e.g., `https://my-workspace.plaid.cloud`). 2. You will be redirected to the Okta sign-in page. 3. Sign in with your Okta credentials. 4. Upon successful authentication, you will be redirected back to PlaidCloud. If you encounter errors, verify that: * The ACS URL and SP Entity ID match exactly what PlaidCloud provided * The user attempting to log in is assigned to the application in Okta * The Name ID format is set to **EmailAddress** and the application username is set to **Email** * The Metadata URL you sent to PlaidCloud is accessible # Setting Member Expiration Period > Set member expiration periods in PlaidCloud to automatically manage access duration and enforce security compliance policies. 
If you do not want to retain inactive members, PlaidCloud can automatically remove members from the Organization after a specified period of inactivity. The period can be as short as one day. Note Setting this option to zero (0) disables automated removal for the Organization. **To set expiration of members:** 1. Select the “Organization Settings” menu from the top right of the screen 2. Click “Organization and User Settings” 3. Set the desired number of days until expiration 4. Click “Update” # Managing Security Groups and Assignments > Manage PlaidCloud security groups, assign members to groups, and configure group-based access permissions for your workspace. PlaidCloud’s security and access management is straightforward. A member is granted or denied access based on the groups with which the member is associated. A member’s security associations are easy to add or change. Note Each workspace is allowed an unlimited number of security groups, but we recommend minimizing the number in order to ease security management. ## Managing Security Groups [Section titled “Managing Security Groups”](#managing-security-groups) Security groups can be added, updated, or deleted. **To manage security groups:** 1. Open Identity 2. Select the “Security” tab 3. Click “Security Groups” in the dropdown menu (this will display a form with existing groups) 4. To add a group, click the “Create Security Group” button 5. To edit permissions of a group, click on the left-most icon **To manage group members:** 1. Open Identity 2. Select the “Security” tab 3. Click “Security Groups” in the dropdown menu 4. Click the Member icon 5.
Drag desired members from the “Unassigned Members” column to the “Assigned Members” column, or in the opposite direction to remove members ## Setting Default Security Groups [Section titled “Setting Default Security Groups”](#setting-default-security-groups) To reduce the time needed for adding new members, identify a set of default security groups. This provides a baseline set of security groups for new members without needing to manually assign each person. To make a group a default, check the “Assign to New Users by Default” box at the bottom of the Security Group window when adding the group. ## Performing a Security Audit [Section titled “Performing a Security Audit”](#performing-a-security-audit) The security audit shows group membership across all members and groups. **To perform a security audit:** 1. Open Identity 2. Select the “Security” tab 3. Click “Security Group Audit” in the dropdown menu Because all tables in PlaidCloud can be exported as CSV files, the group member associations can be reviewed outside of PlaidCloud for either historical purposes or just some fun offline viewing. **To export from the “Security Group Audit” form:** 1. Open Identity 2. Select the “Security” tab 3. Click “Security Group Audit” in the dropdown menu 4. Click the small icon to the far right of “Username” in the table 5. Click “Export CSV” or “Export XLSX” depending on your preference ## Viewing Available Permission Settings [Section titled “Viewing Available Permission Settings”](#viewing-available-permission-settings) Each application being used in the workspace has specific available permissions. The security group permissions are based on these application permissions. The complete list of available permissions for each application is viewable from the Security Bin. **To access the Security Bin:** 1. Open Identity 2. Select the “Security” tab 3.
Click “Security Bins” in the dropdown menu To view the detailed security settings for each application, select the tags icon on the far left. These settings are shown for information only. For details on managing permissions, refer to the Managing Security Groups section above. # Member Authentication > Configure PlaidCloud member authentication options including password management, multi-factor authentication, and login security. The Identity tab houses the security and authentication features that PlaidCloud focuses on in order to ensure a secure member platform. PlaidCloud offers three authentication types: * Password Only * Two-Factor Authentication * Single Sign-On The default authentication type is password only. However, two-factor authentication can also be activated. If a Single Sign-On SAML authentication provider is available, you can configure your PlaidCloud organization to use Single Sign-On. If you choose to create a personal account, the default authentication type is password only. To change this to two-factor authentication, follow the steps in the Two-Factor Authentication section. Note Members may have access to the Identity tab for security purposes or in order to manage members for the workspace. Details on managing security and authentication for new members or members without access can be found on the main “Identity” page. ## Changing Passwords [Section titled “Changing Passwords”](#changing-passwords) For members using two-factor or password-only authentication, password changes are simple and can be performed under the “Member” menu (gravatar icon) in the upper right corner. **To change passwords:** 1. Select the icon (gravatar) in the upper right * The “Member” menu icon will be different for each user 2. Click “Change Password” in the dropdown menu 3. Enter your current password where requested 4. Enter your new password where requested 5. Re-enter your new password (for confirmation) 6.
Click the “Update” button to save Note Only strong passwords are accepted, and the new password must be different from the current one. ## Password Only Authentication [Section titled “Password Only Authentication”](#password-only-authentication) Password-only authentication is the simplest and least secure option, even with long cryptic passwords. This option may be ideal for those looking to maintain quick and convenient access without too much concern about security risks. Password-only authentication continues to be a common practice but we highly recommend using Two-Factor instead. ## Two-Factor Authentication [Section titled “Two-Factor Authentication”](#two-factor-authentication) Two-Factor, or multi-factor, authentication provides a substantial increase in security over password-only because it requires both something “you know” (the password) and something “you have” (the access key). In other words, the password alone will not enable access. Passwords are susceptible to security threats because they represent a *single* piece of information that a malicious actor needs to gain access; two-factor provides additional security by requiring *additional information* to sign in. For this reason we **strongly** urge you to use two-factor for the safety of your account, not only on PlaidCloud, but on other websites that support it. ### Enabling Two-Factor [Section titled “Enabling Two-Factor”](#enabling-two-factor) **To enable two-factor and set your authentication code preferences:** 1. Select the icon (gravatar) in the upper right 2. Click “Manage Multi-Factor Authentication” in the dropdown menu 3. Select your preferred type of two-factor authentication code delivery. ### Types of Two-Factor Authentication [Section titled “Types of Two-Factor Authentication”](#types-of-two-factor-authentication) PlaidCloud has three options for receiving this additional information: * Via smartphone app (e.g. 
Google Authenticator, Authy, Okta, FreeOTP, etc…) * Via text message (or SMS) * Via a YubiKey from Yubico ### Smartphone-Based Authentication [Section titled “Smartphone-Based Authentication”](#smartphone-based-authentication) To get your code via a smartphone app, you will need to download an authenticator app, such as Google Authenticator, for your [iOS](https://itunes.apple.com/us/app/google-authenticator/id388497605?mt=8) or [Android](https://play.google.com/store/apps/details?id=com.google.android.apps.authenticator2) device. Note that there are other compatible authenticator apps that can be used, but this article assumes you’re using the Google Authenticator app. After downloading the app, open it and follow the in-app setup instructions. **Once you have the authenticator set up:** 1. Tap the “+” button 2. Select “Scan barcode” 3. Open “Manage Multi-Factor Authentication” under the gravatar icon on PlaidCloud 4. Select “Configure Authenticator” on PlaidCloud 5. When prompted, use your phone to scan the QR code displayed on PlaidCloud 6. After scanning the QR code, your authenticator app should display a six-digit authentication code which changes every 30 seconds 7. Enter this code into the text box at the bottom of the PlaidCloud “Configure SmartPhone Authentication” screen which should still be pulled up from the previous steps 8. Select “Verify.” 9. If the code is valid, Two-Factor will be enabled for your account and you will be shown a list of backup codes. 10. Once enabled, you can select “Manage Multi-Factor Authentication” again to view your backup codes or to disable two-factor. ### SMS-Based Authentication [Section titled “SMS-Based Authentication”](#sms-based-authentication) **To use SMS-based Authentication:** 1. Open “Manage Multi-Factor Authentication” under the gravatar icon on PlaidCloud 2. Select “Configure SMS” on PlaidCloud 3. Enter your mobile phone number and carrier 4. Click “Submit” 5. 
You will then be sent a text message containing an authentication code 6. Enter this code in the window that appears in PlaidCloud 7. If the code is valid, two-factor will be enabled for your account and PlaidCloud will text you a new code to enter whenever you log in 8. Once enabled, you can select “Manage Multi-Factor Authentication” again to update your contact information or to disable two-factor. ### YubiKey Authentication [Section titled “YubiKey Authentication”](#yubikey-authentication) Members using YubiKeys (hardware authentication devices manufactured by Yubico) can register up to five YubiKeys for their account. PlaidCloud offers a managed pool of PlaidCloud YubiKeys that can be administered by the person responsible for your workspace access security, or members can choose to use any standard YubiKey. Note Keys from the PlaidCloud YubiKey pool (YubiKeys specifically issued by PlaidCloud) count towards the five-key limit. To enable YubiKey authentication, you must first register at least one YubiKey. **To register a YubiKey:** 1. Select the icon (gravatar) in the upper right 2. Click “Change Registered YubiKeys” in the dropdown menu 3. Place the cursor in an open spot on the “My Registered YubiKeys” form 4. Insert the YubiKey into your computer 5. Press the YubiKey one-time password (OTP) button 6. When the OTP is filled in, click the “Update” button in the form to save After you register at least one YubiKey, you can configure it for your account. **To configure a YubiKey:** 1. Select the gravatar icon 2. Click “Manage Multi-Factor Authentication” 3. Select “Configure YubiKey” 4. Enter one of your YubiKey OTPs in the provided form. If the OTP is valid, two-factor will be enabled for your account and you will need to enter a YubiKey OTP each time you log in.
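
To make the registration steps more concrete, the sketch below shows what a YubiKey OTP looks like and how a five-key limit could be enforced. It relies on the general Yubico OTP format in its default configuration: 44 characters from the 16-letter "modhex" alphabet, where the first 12 characters identify the key and the rest change on every press. How PlaidCloud validates OTPs is not described in this guide, so the function names and the registration logic here are assumptions for illustration only, and a format check is not a substitute for real OTP verification.

```python
MODHEX = "cbdefghijklnrtuv"  # Yubico's 16-character "modhex" alphabet
OTP_LENGTH = 44              # default format: 12-char public ID + 32-char one-time part
PUBLIC_ID_LENGTH = 12
MAX_KEYS = 5                 # per-member limit stated in this guide


def public_id(otp: str) -> str:
    """Return the key's public ID, or raise ValueError if the format is wrong."""
    otp = otp.strip().lower()
    if len(otp) != OTP_LENGTH or any(ch not in MODHEX for ch in otp):
        raise ValueError("not a default-format Yubico OTP")
    return otp[:PUBLIC_ID_LENGTH]


def register_key(registered: set[str], otp: str) -> set[str]:
    """Hypothetical registration step: add the key's public ID, enforcing the limit."""
    key_id = public_id(otp)
    if key_id not in registered and len(registered) >= MAX_KEYS:
        raise ValueError(f"at most {MAX_KEYS} YubiKeys can be registered")
    return registered | {key_id}


# A synthetic, well-formed OTP (not produced by a real key).
sample_otp = "cccccccbdefg" + "hijklnrtuv" * 3 + "cb"

keys = register_key(set(), sample_otp)
print(sorted(keys))
```

Because only the public ID is stable between presses, it is the natural value to store when a key is registered; the remaining characters are different each time the button is pressed.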
### PlaidCloud YubiKey Pool [Section titled “PlaidCloud YubiKey Pool”](#plaidcloud-yubikey-pool) The Managed YubiKey Pool provides an easy way to manage two-factor authentication for members of the workspace. The managed keys are branded with the PlaidCloud logo and can be shipped directly to members or in bulk to an administrator. The managed pool provides advantages over individual YubiKeys in the following ways: * Lost keys are easily replaced without the member needing to store recovery codes * Assignment of keys is point and click. Members don’t have to register the key. * YubiKey assignments can be viewed and keys revoked with a point-and-click interface * New keys can be ordered and shipped directly to members * Managed YubiKeys are fully compatible with other services that accept YubiKey OTPs * YubiKeys can be reassigned to other members without compromising security as member turnover occurs **To order new keys:** 1. Open Identity 2. Select the “Security” tab 3. Click “PlaidCloud Security Keys” in the dropdown menu 4. Click the “Order More Keys” button in the form Once managed keys have been ordered, they will appear in the managed keys table. From the key assignment form, keys can be assigned, marked as unassigned, or marked as lost. In addition, each key can have a memo attached for keeping track of notes related to issuance of the key. To do this, simply click the edit icon and make the desired adjustments. Managed keys are a one-time cost. There are no additional ongoing charges for their use. Managed YubiKeys are $30 each plus shipping. ## What Recovery Codes Do [Section titled “What Recovery Codes Do”](#what-recovery-codes-do) For security reasons, PlaidCloud Support cannot immediately restore access to accounts with two-factor authentication enabled if you lose your phone or YubiKey. Recovery codes let you access your account after losing a phone or YubiKey and then reconfigure two-factor authentication from there.
After successfully setting up your two-factor authentication, you’ll be provided with a set of randomly generated recovery codes that you should view and save. We strongly recommend saving your recovery codes immediately. However, these codes can be downloaded at any point after enabling two-factor authentication. For more information, see [Downloading your two-factor authentication recovery codes](https://plaidcloud.com/docs/identity/downloading-your-two-factor-authentication-recovery-codes). Note If you do not have a backup code or a backup key registered, a much more stringent process is followed, which may require several days to validate the authenticity of the access request and maintain PlaidCloud security. ### Lost YubiKey [Section titled “Lost YubiKey”](#lost-yubikey) You can provide an SMS number as part of your profile. If you lose access to both your registered set of YubiKeys and your recovery codes, a backup SMS number can get you back into your account. Note This is not an automated process, so regaining access may require some time. If the member is using a managed pool key and loses it, the workspace pool administrator can mark the key as lost and issue a new one. This reduces the risk of being locked out of an account or having to retain recovery codes. To mark a key as lost: 1. Open Identity 2. Select “Security” 3. Click “PlaidCloud Security Keys” 4. Click the edit icon 5. Select “Lost” under the Key Usage Information section 6. Click “Update” This will mark the key as lost and allow you to issue a new one. ## Single Sign-on [Section titled “Single Sign-on”](#single-sign-on) Single Sign-On requires an external service to perform the actual authentication process, and PlaidCloud simply receives a positive or negative response. Use of Single Sign-On can reduce the administrative requirements for managing passwords across multiple applications and ensure good member management practices when employees leave or access restrictions are applied.
Single Sign-On is the easiest option for members to use. It is as secure as the authentication process the external party uses. Single Sign-On helps ensure passwords are up-to-date and synchronized with other services the member interacts with. While Single Sign-On does require a more extensive authentication process behind the scenes, and usually requires technical coordination with IT and/or network security, it can be used by anyone, although it is typically used by larger companies and academic institutions. For more information on setting up and managing Single Sign-On see the [Organization and Workspace management area.](/administration/access/advanced/managing-single-sign-on-for-organization) # Member Management > Add, remove, and suspend PlaidCloud workspace members, manage user roles, and control member access to projects and resources. Identity provides the ability to add, remove, and/or suspend members of the workspace. Since PlaidCloud members can be members of multiple workspaces, removing a member from the workspace does not delete the member account from PlaidCloud. ## New Members [Section titled “New Members”](#new-members) ### Adding New Members [Section titled “Adding New Members”](#adding-new-members) **To add members:** 1. Open Identity 2. Select the “Member” tab 3. Click “All” in the dropdown menu to display members 4. Click “Add Workspace Member” 5. Complete all required fields on the member form 6. Click the “Create” button ### New Member Welcome Email [Section titled “New Member Welcome Email”](#new-member-welcome-email) After adding a new member, a welcome email with sign-in credentials will be sent to their provided email address. The welcome email can be customized to provide additional information relevant to the new member’s PlaidCloud use. **To update or view the welcome email:** 1. Open Identity 2. Select the “Member” tab 3. Click “Email Welcome Message” from the dropdown menu 4. Make any additions or changes desired 5. 
Click the “Update” button ## Viewing and Managing Member Sessions [Section titled “Viewing and Managing Member Sessions”](#viewing-and-managing-member-sessions) **To view the current member sessions:** 1. Open Identity 2. Select the “Member” tab 3. Click “Session Manager” in the dropdown menu From this table, it’s possible to view session information (current sessions and last activity), as well as terminate sessions if desired. **To terminate a session:** 1. Highlight the member(s) you wish to log out 2. Click the “End Selected Sessions” button in the upper left ## Managing Distribution (distro) Lists [Section titled “Managing Distribution (distro) Lists”](#managing-distribution-distro-lists) Distribution lists (distros) are simply email distribution lists managed within PlaidCloud. They provide an easy way to quickly send reports, files, and/or other information to groups. The Distribution list feature allows for the management of lists on a workspace-by-workspace basis. This eliminates the need to rely on external lists that may include too many or too few of the intended recipients. **To manage lists:** 1. Open Identity 2. Select the “Distro Lists” tab 3. Click the “Create New Distro List” button to create a new list 4. Complete all required fields of the Distro List form 5. Click the “Create” button Note Distro lists can include both workspace members and non-members **To manage workspace members for each list:** 1. Select the workspace icon (cloud) in the table 2. To manage non-members, click on the globe icon. # Member (User) Identity > Configure PlaidCloud member identity settings including authentication methods, role-based security, and user profile management. PlaidCloud makes authentication and role-based security easy to control from one centralized location: the “Identity” tab, located on the left side of the screen. Identity provides the foundation for member management, security, and different types of authentication processes.
Member management includes everything from viewing current members and adding new members to sending mass emails. Security is a priority for PlaidCloud. The Security subset of the Identity tab allows you to perform security audits, set up security groups and default security groups for new members, and control the approved security level of each member. Authentication is where security starts. PlaidCloud offers multiple authentication options to support most use cases: * Password Only * Two-Factor Authentication * Single Sign-On # Overview > Overview of PlaidCloud member management and organization structure, including workspaces, roles, and access permissions. PlaidCloud access is structured in layers: organizations contain workspaces, workspaces contain members, and security groups grant capabilities. This section explains how the pieces fit together. # Managing Workspace Members > Add, manage, and remove workspace members in PlaidCloud including inviting new users, setting roles, and managing permissions. While members may be associated with other workspaces within an Organization, each workspace has its own access restrictions. Members must be granted permission to view and access a workspace. ## Adding Members [Section titled “Adding Members”](#adding-members) To add a member: 1. Select “Organization Settings” from the menu in the upper right of the browser 2. Click “Workspaces” 3. Click the members icon 4. Select the desired member and drag them to the appropriate column 5. Click “Submit” Note In order to add members to a workspace, the members must be part of the Organization and must appear on the member management form. If you want to add a member who does not appear on the member management form, you must first invite the member into the workspace. To send an invite: 1. Select “Organization Settings” from the menu in the upper right of the browser 2. Click “Workspaces” 3. Click the invite icon This process will send an email invitation to the member.
The member then needs to click the link in the email and follow the directions to log in or create an account if they are new to PlaidCloud. After a successful login, the member will be added to the workspace. ## Removing Members [Section titled “Removing Members”](#removing-members) To remove a member: 1. Select “Organization Settings” from the menu in the upper right of the browser 2. Click “Workspaces” 3. Click the members icon 4. Select the desired member and drag them to the appropriate column 5. Click “Submit” # Organizations and Workspaces Explained > Understand PlaidCloud organizations and workspaces, how they relate to each other, and how to structure your access hierarchy. Organizations are a collection of one or more workspaces. All data and projects exist within workspaces. Organizations only serve as a way to manage multiple workspaces. Security and access controls are managed by each workspace to cater to the workspace’s unique role within the organization. PlaidCloud’s workspaces aim to maximize collaboration and increase information access while restricting access to private or confidential information. In PlaidCloud, Organizations serve as the foundation, while Workspaces are designed to help support unique needs. Because PlaidCloud is a multi-tenant workspace service, it provides flexibility by eliminating the need to perform technical configurations of isolated workspace environments. PlaidCloud is designed to provide maximum collaboration and flexibility while ensuring that privacy and confidentiality are never compromised through complete isolation of people and data by workspace. PlaidCloud’s Organizations make managing small teams, large teams, and multinational organizations easy. They allow you to easily integrate authentication and member management into existing systems or, if you choose to, manually manage them.
PlaidCloud’s multiple tiers of access control simultaneously minimize management overhead and keep data and activities compartmentalized. While this may sound complex, we keep the process as simple as possible, so getting started and scaling up is not difficult. PlaidCloud is broken down into the following levels of access control: 1. Organization 2. Workspaces 3. Projects Each progressive layer of control enables administrators to apply access controls and permissions for certain operations. # Viewing and Managing Workspaces > View and manage your PlaidCloud workspaces including settings, membership, connected services, and workspace configuration options. Workspaces allow an Organization of any size to operate as its own cloud-based service. For example, small teams may have a single workspace in their Organization, while large Organizations may have hundreds of specialized workspaces. Workspaces manage access and visibility while providing isolated areas for an Organization’s members to operate. Workspace access is assigned to members in a private, multi-tenant environment for the Organization. With workspaces, teams can collaborate on open projects within some workspaces while maintaining strict confidentiality in other workspaces. Since workspaces are fully isolated, data cannot be directly shared or accessed across workspaces. However, workspaces can access the same shared Document area, so that sharing of files between workspaces is possible if desired. ## Viewing and Managing Workspaces [Section titled “Viewing and Managing Workspaces”](#viewing-and-managing-workspaces) Viewing and managing workspaces within an Organization is simple. You must be an Organization owner to manage workspaces. To view and manage workspaces: 1. Select “Organization Settings” from the menu in the upper right of the browser 2. Click “Workspaces” This will bring you to a table showing all the current workspaces within the Organization.
From here you can create, update, suspend, and delete workspaces, add apps to workspaces, and manage member access to each workspace. ### Creating a Workspace [Section titled “Creating a Workspace”](#creating-a-workspace) 1. Select “Organization Settings” from the menu in the upper right of the browser 2. Click “Workspaces” 3. Click the “New Workspace” button 4. Complete the required fields 5. Click “Submit” Note By default, the member who created the workspace will be assigned to it automatically. ### Updating a Workspace [Section titled “Updating a Workspace”](#updating-a-workspace) 1. Select “Organization Settings” from the menu in the upper right of the browser 2. Click “Workspaces” 3. Click the edit icon of the desired workspace 4. Adjust the fields as desired 5. Click “Submit” ### Suspending a Workspace [Section titled “Suspending a Workspace”](#suspending-a-workspace) 1. Select “Organization Settings” from the menu in the upper right of the browser 2. Click “Workspaces” 3. Uncheck the “Active” checkbox of the desired workspace 4. Click “Submit” ### Deleting a Workspace [Section titled “Deleting a Workspace”](#deleting-a-workspace) 1. Select “Organization Settings” from the menu in the upper right of the browser 2. Click “Workspaces” 3. Click the delete icon of the desired workspace 4. Click “Delete” again Note Deletion is a permanent action. This process will delete the workspace and all associated data. Be sure you have everything you need backed up before doing this. ### Managing Apps Available in Workspace [Section titled “Managing Apps Available in Workspace”](#managing-apps-available-in-workspace) By default, new workspaces have three apps automatically added: Analyze, Document, and Identity. While Identity cannot be removed because it is essential to managing access and roles within a workspace, Analyze and Document can be removed. To manage which apps are available in a workspace, including custom apps: 1. 
Select “Organization Settings” from the menu in the upper right of the browser 2. Click “Workspaces” 3. Click the apps icon for the workspace whose apps you want to modify 4. To remove an app, click the delete icon for that app and confirm the deletion 5. To add a new app, click the **Add App to Workspace** button, select the app you want to add, check the **Enable for Use** checkbox, and click the create button # Control Plane > Use the PlaidCloud control plane to manage organizations, workspaces, access, workspace services, branding, and maintenance windows. The PlaidCloud control plane is for partner administrators and customer administrators who manage organizations and workspaces. Use it to create workspaces, assign workspace ownership, choose release behavior, enable workspace services, configure branding, and manage organization-level access. Note The PlaidAdmin area is for PlaidCloud operations staff. This guide covers the partner and customer administration areas only. ## What You Can Manage [Section titled “What You Can Manage”](#what-you-can-manage) [Organizations ](/administration/control-plane/organizations/)Configure organization identity, billing metadata, SSO behavior, and organization administrators. [Workspace Configuration ](/administration/control-plane/workspace-configuration/)Review every workspace setting available in the control plane, including services, release channels, branding, lakehouse access, and maintenance. ## Navigation [Section titled “Navigation”](#navigation) After you sign in, use the left navigation: 1. Open **Organizations** to view or edit organizations you administer. 2. Open **Workspaces** to view, create, edit, pause, unpause, or bulk-update workspaces you can manage. 3. Use your profile menu to update your account profile or sign out. The actions you see depend on your organization roles.
For example, a user with workspace access can manage workspaces for an organization, while a user with security access can manage invitations and roles. ## Roles [Section titled “Roles”](#roles) Organization access is controlled by these roles: | Role | What It Allows | | --------- | --------------------------------------------------------------------------------- | | Admin | Edit organization settings and manage organization-level administration. | | Workspace | Create, edit, pause, unpause, delete, and version workspaces in the organization. | | Security | Invite users, remove users, and edit organization roles. | | Billing | View billing information when billing access is enabled. | Roles are assigned from the organization access dialog. A pending invitation becomes active after the invited person accepts it. # Organizations > Manage PlaidCloud organization settings, billing metadata, SSO requirements, and administrator access in the control plane. An organization is the administrative boundary for related PlaidCloud workspaces. Organization settings control the name shown in the control plane, billing metadata, SSO behavior, and who can administer the organization. ## Organization List [Section titled “Organization List”](#organization-list) The **Organizations** page shows the organizations you can access. From the table you can: 1. Search organizations by visible table fields. 2. Refresh the organization list. 3. View organization details. 4. Edit organization settings when you have admin access. 5. Open organization access management when you have security access. 6. Open billing records when you have billing access. ## Organization Settings [Section titled “Organization Settings”](#organization-settings) | Setting | Description | | ----------- | ------------------------------------------------------------------------------------ | | Name | Display name for the organization. | | ID | Stable organization identifier. It is set when the organization is created. 
| | Memo | Internal note or description shown in the control plane. | | Plan | Commercial plan: Enterprise, Team, or Free. | | Tax ID | Tax identifier used for billing records. | | Active | Whether the organization is active. | | Locked | Prevents normal organization changes while enabled. | | Require SSO | Requires members to sign in through the organization’s single sign-on configuration. | ## Billing Metadata [Section titled “Billing Metadata”](#billing-metadata) Billing fields describe how the organization is billed. They do not grant product access by themselves. | Setting | Options | | ------------- | ------------------------------------------------ | | Billing cycle | Monthly, Quarterly, Semi-Annual, Annual | | Billing price | Numeric billing amount for the selected cycle. | | Payment type | PO Invoiced, Free Trial, Free, Scheduled Payment | ## SSO Behavior [Section titled “SSO Behavior”](#sso-behavior) | Setting | Description | | --------------------------------- | --------------------------------------------------------------------------------------------------- | | Dynamic group assignment | Allows SSO group information to drive organization role assignment. | | Dynamic user creation | Allows SSO sign-in to create users automatically when the identity provider sends an accepted user. | | SSO dynamic group assignment name | The SSO group name used for dynamic assignment. | Note SSO provider setup is documented separately in the Access management guides. These control-plane settings determine how the organization uses the SSO configuration after it exists. ## Manage Access [Section titled “Manage Access”](#manage-access) Use **Manage Access** to invite people to the organization and assign organization roles. 1. Open **Organizations**. 2. Select the organization. 3. Open **Manage Access**. 4. Click **Invite User** to invite one or more email addresses. 5. Select the roles each person should receive. 6. Add an optional custom message. 7. Send the invitation. 
You can edit roles for accepted users, remove accepted users, or cancel pending invitations. Pending users cannot be edited until they accept the invitation. ## Organization Roles [Section titled “Organization Roles”](#organization-roles) | Role | Use It For | | --------- | ----------------------------------------------------------------------- | | Admin | Organization settings and administrator-level changes. | | Workspace | Workspace creation, configuration, release, pause, and unpause actions. | | Security | Invitations and organization role management. | | Billing | Billing record access. | # Workspace Configuration > Configure PlaidCloud workspace identity, release channel, services, branding, lakehouse access, invite links, and maintenance windows. The **Workspaces** page lists the PlaidCloud workspaces you can manage. Use it to create a workspace, edit workspace settings, open the workspace, open dashboards, open the SQL console, view logs and metrics, or apply bulk actions to selected workspaces. Caution Delete removes the workspace and its associated data. Confirm backups and retention requirements before deleting a workspace. ## Workspace List [Section titled “Workspace List”](#workspace-list) The workspace table includes name, unique ID, version, release channel, status, paused state, cluster, maintenance day, and maintenance time. The table supports search, refresh, row actions, and bulk actions. Available row actions include: | Action | Description | | -------------- | ---------------------------------------------------- | | View | Open read-only workspace details. | | Edit | Change workspace configuration. | | Delete | Delete the workspace. | | Home | Open the workspace application. | | Dashboards | Open the workspace dashboard service. | | SQL | Open the workspace SQL console. | | Logs & Metrics | Open operational logs and metrics for the workspace. 
| ## Bulk Actions [Section titled “Bulk Actions”](#bulk-actions) Select one or more workspaces to use bulk actions. | Bulk Action | Description | | ----------- | -------------------------------------------------------------------------------------------------------- | | Set Version | Pins selected workspaces to a chosen version. All selected workspaces must use the same release channel. | | Pause | Pauses selected workspaces. | | Unpause | Restores selected paused workspaces. | ## Primary Settings [Section titled “Primary Settings”](#primary-settings) Primary settings identify the workspace, assign ownership, and control release behavior. | Setting | Description | | --------------------- | ----------------------------------------------------------------------------------------------------------------------------------- | | Name | Display name for the workspace. | | ID | Unique workspace identifier. This becomes part of the workspace identity and is set during creation. | | Memo | Optional description or administrative note. | | Organization | Organization that owns the workspace. | | Owner | Owner email for the workspace. | | Data Center (Cluster) | Data center or cluster where the workspace runs. | | Release | Release channel: Rapid, Regular, Stable, or No Release (pinned). | | Version | Specific deployed version. | | Active | Marks the workspace as active. | | Paused | Pauses workspace operation without deleting the workspace. | | Invite Link Lifespan | How long new-user invitation links remain valid. Options are 12 hours, 1 day, 2 days, 3 days, 5 days, 7 days, 10 days, and 14 days. | ## Release Options [Section titled “Release Options”](#release-options) | Release | Typical Use | | ------------------- | ------------------------------------------------------------------------------------ | | Rapid | Receives updates earliest. Use for workspaces that can accept faster product change. | | Regular | Default update cadence for most workspaces. 
| | Stable | Slower cadence for workspaces that prioritize update stability. | | No Release (pinned) | Keeps the workspace on the selected version until an administrator changes it. | The maintenance window determines when automatic release updates are applied. ## Theming [Section titled “Theming”](#theming) Theming settings customize branding across workspace entry points. | Setting | Description | | ---------------------- | --------------------------------------------------------- | | App Logo | Logo shown in the workspace application. | | Splash Screen Logo | Logo shown on the workspace splash or sign-in screen. | | Superset Logo | Logo shown in the dashboard service. | | Superset Custom Themes | Custom dashboard color themes available to the workspace. | ## Services [Section titled “Services”](#services) Service settings enable optional applications and supporting services inside the workspace. | Setting | Description | | ----------------------------------- | --------------------------------------------------- | | JupyterHub | Enables hosted notebook access for workspace users. | | Web SQL Console (CloudBeaver) | Enables browser-based SQL access. | | Dashboards (Apache Superset) | Enables the dashboard service. | | User Forms & Workflows (Forms Flow) | Enables forms and workflow app support. | | Changed Data Capture (Apache Flink) | Enables changed-data-capture processing. | | SFTP Access and Web UI | Enables SFTP access and the SFTP web interface. | | Use PlaidCloud Proxy Download | Routes downloads through the PlaidCloud proxy path. | | Activate Vector Database (Weaviate) | Enables vector database support. | | Activate Custom App Sandbox | Enables custom app sandbox support. | | Activate Project Management App | Enables the project management application. | Some services may require additional provisioning outside the control-plane form before users can use them. 
## Lakehouse [Section titled “Lakehouse”](#lakehouse) Lakehouse settings control external database connectivity and extra database users. | Setting | Description | | ------------------------------- | ------------------------------------------------------------------------------------------------ | | Enable External Database Access | Allows external clients to connect to the workspace lakehouse. | | Allowed CIDRs | Comma-separated CIDR ranges that are allowed to connect. Leave blank if no allow list is needed. | | Denied CIDRs | Comma-separated CIDR ranges that are denied. | | Additional Lakehouse Users | Extra lakehouse usernames and passwords for database access. | Use the narrowest CIDR ranges that support your users and integrations. Remove unused lakehouse users when access is no longer needed. ## Maintenance [Section titled “Maintenance”](#maintenance) Maintenance settings define the workspace’s preferred update window. | Setting | Description | | ------- | ----------------------------------------------------- | | Day | Day of week for maintenance: Sunday through Saturday. | | Time | Time of day in 15-minute increments. | Release-channel upgrades use the workspace maintenance window. Pinned workspaces are skipped by automatic release-channel upgrades until an administrator selects a new version or channel. # Scheduled Workflows > Schedule and automate PlaidCloud workflow execution using the Event Scheduler with ordering, timing, and conditional triggers. Schedule PlaidCloud workflows to run on a calendar or be triggered by other events. Configure run windows, ordering, conditional triggers, and retry behavior. # Event Scheduler > Configure the PlaidCloud Event Scheduler to automate workflow execution with custom timing, ordering, and conditional triggers. ## Description [Section titled “Description”](#description) Scheduling specific workflows can be a useful organization tool, so PlaidCloud provides the ability to do just that. 
Using the Event Scheduler, you can schedule a workflow to run by month, day, hour, minute, or even on a financial workday schedule. With the financial workday schedule, you can also configure holiday schedules using various holiday calendars. The event description column of the Events Table indicates whether each event is scheduled by month, day, hour, and minute, or by workday. **To view events:** 1. Open Analyze 2. Select “Tools” 3. Click “Event Scheduler” This will open the **Events Table** showing all the current events configured for the workspace. Note If the event is active, the “Active” icon will be displayed. ## Creating an Event [Section titled “Creating an Event”](#creating-an-event) **To create an event:** 1. Open Analyze 2. Select “Tools” 3. Click “Event Scheduler” 4. Click “Add Scheduled Event” 5. Complete the required fields 6. Click “create” **Limit Running**: this section lets you schedule an event to run for a specific time period and a specific number of times. Otherwise, you can set the workflow to run using the **classic schedule** approach. **To use the classic schedule approach:** 1. Click the “Event Schedule” tab of the Event table 2. Under “Schedule type”, select “Use Classic Schedule” 3. Select the specific months, hours, minutes, and days you want the workflow to run **To set the workflow to run using the workday schedule approach:** 1. Click the “Event Schedule” tab of the Event table 2. Under “Schedule type”, select “Use Workday Schedule” 3. Choose the workday you would like the workflow to run on Note By default, the timezone for events is set to UTC but can be adjusted using the “Timezone” field. ## Editing an Event [Section titled “Editing an Event”](#editing-an-event) **To edit an event:** 1. Open Analyze 2. Select “Tools” 3. Click “Event Scheduler” 4. Click the edit icon 5. Adjust desired fields 6.
Click “Update” ## Deleting an Event [Section titled “Deleting an Event”](#deleting-an-event) **To delete an event:** 1. Open Analyze 2. Select “Tools” 3. Click “Event Scheduler” 4. Click the delete icon 5. Click delete again ## Pausing an Event [Section titled “Pausing an Event”](#pausing-an-event) **To temporarily pause an event:** 1. Open Analyze 2. Select “Tools” 3. Click “Event Scheduler” 4. Click the edit icon 5. Uncheck the “Active” checkbox 6. Click “Update” Saving the event after unchecking the active box means the event will no longer run on the specified schedule until it’s reactivated. ## Running Events on Demand [Section titled “Running Events on Demand”](#running-events-on-demand) **To run an event immediately:** 1. Open Analyze 2. Select “Tools” 3. Click “Event Scheduler” 4. Select the desired event or events 5. Click “Run Selected Events” # Upcoming Runs Calendar > Preview when scheduled PlaidCloud workflows will run on a month, week, or agenda calendar — and spot overlapping run windows before they collide. ## Description [Section titled “Description”](#description) The **Upcoming Runs Calendar** is a read-only preview of when your scheduled workflows will run next. It expands the same schedules that drive actual execution, so what you see on the calendar is what will run — without triggering anything. It answers two questions that the Events table alone can’t: * **When does everything run?** A single view of every enabled schedule’s upcoming runs. * **Do any runs collide?** Each run is drawn as a bar sized to the workflow’s typical duration, laid out side-by-side so overlapping execution windows are obvious. Note The calendar never starts a workflow. It reads upcoming occurrences from the scheduler and estimates each run’s length from past run history. Bar lengths are an estimate and are labelled as such. ## Views [Section titled “Views”](#views) Switch between three views: * **Month** — a calendar grid of upcoming runs across the month. 
* **Week** — a week at a time, with runs placed on the day and time they’ll fire. * **Agenda** — a chronological list of upcoming runs. Use **Previous**, **Next**, and **Today** to move through time, **Refresh** to re-pull the latest occurrences, and **Filter…** to narrow to specific schedules. Note Times are shown in each schedule’s own timezone. A note on the calendar reminds you of this, since a workspace can have schedules configured in different zones. ## Reading the Calendar [Section titled “Reading the Calendar”](#reading-the-calendar) * **Bars** represent a scheduled run; longer bars mean a longer typical run duration. * **Side-by-side bars** in the same window mean those runs overlap — a cue to stagger their schedules if they compete for the same resources or data. * **No upcoming scheduled runs** appears when nothing is scheduled in the visible range. Sensors (event-driven triggers) are **not** shown — they fire in response to events and have no deterministic future time. ## Scope [Section titled “Scope”](#scope) The calendar adapts to where you open it: * **Workspace- or project-wide** — the comprehensive deconfliction view, showing every enabled schedule’s upcoming runs together. * **A single schedule** — a focused agenda popover of just that schedule’s upcoming runs. ## Next Steps [Section titled “Next Steps”](#next-steps) * [Event Scheduler](/administration/scheduled-events/event-scheduler/) — create and edit the schedules shown here * [Advanced workflows](/guides/workflows/advanced-workflows/) — build workflows on the visual canvas # Get started > Begin with PlaidCloud — concepts, quickstart, and end-to-end tutorials. New to PlaidCloud? Start here. [Quickstart ](/get-started/quickstart/)Build your first workflow in about 10 minutes. [Concepts ](/get-started/concepts/)The data model — workspaces, projects, workflows, tables, dimensions, and allocations. [Tutorials ](/get-started/tutorials/)End-to-end scenarios that walk through real analytics work. 
[FAQ ](/get-started/faq/)Common questions about PlaidCloud — what it is, plans, capabilities, and getting help. [Start a free trial ](https://app.plaidcloud.com)Spin up a workspace and follow along with your own data. # Concepts > The PlaidCloud data model — workspaces, projects, workflows, tables, dimensions, and allocations. PlaidCloud is built around a small set of concepts that compose. Once these click, the rest of the documentation is mostly about how to do specific things with them. ## Organization, Workspace, Member [Section titled “Organization, Workspace, Member”](#organization-workspace-member) Your account starts at the **organization** level — the billing and identity boundary. Inside an organization are one or more **workspaces**, which are isolated environments where actual work happens. **Members** are users who belong to an organization and are granted access to specific workspaces with specific roles. * Most teams start with one workspace per environment (dev, staging, prod) or one per business unit. * Security groups inside a workspace control what each member can do. More: [Access management](/administration/access/) ## Project [Section titled “Project”](#project) A **project** is the unit of work inside a workspace. Each project owns its own data, workflows, dimensions, and audit history. Projects don’t share state with each other — they’re isolated. Use projects to separate distinct analyses, business processes, or data products from each other. More: [Projects](/guides/projects/) ## Connection [Section titled “Connection”](#connection) A **connection** is a saved configuration that lets PlaidCloud reach an external system — a database, a cloud storage account, an ERP, a REST API. Connections are reused by workflow steps so credentials aren’t duplicated across steps. 
More: [Connections (task)](/guides/connections/) · [Connectors (reference)](/reference/connectors/) ## Table and View [Section titled “Table and View”](#table-and-view) A **table** is structured data inside a project — rows and columns, like a SQL table. Tables come from imports, transformations, or external sources. **Views** are saved query results layered on top of tables. More: [Tables and views](/guides/data/) ## Workflow and Step [Section titled “Workflow and Step”](#workflow-and-step) A **workflow** is a pipeline that operates on tables. Each workflow is a sequence of **steps**: import a CSV, join two tables, filter rows, export to JSON, send a notification. Steps can run sequentially, in parallel, conditionally, or in loops. Steps come in categories: * **Import** — pull data in (CSV, Excel, SQL, Parquet, JSON, etc.) * **Tables** — transform tables (join, filter, melt, pivot, append, upsert) * **Export** — push data out * **Document** — handle PDFs, images, and arbitrary files * **Notifications** — send messages via email, Slack, Teams, SMS, webhook * **Allocation** — execute cost allocation models * **Dimension** — build and modify hierarchies * **SAP / SAP-PCM** — call SAP-specific operations * **Workflow control** — variables, loops, sub-workflows More: [Workflows (task)](/guides/workflows/) · [Workflow steps (reference)](/reference/workflow-steps/) ## Dimension [Section titled “Dimension”](#dimension) A **dimension** is a hierarchy — typically used for slicing or aggregating data. Cost centers, products, geography, time. Dimensions can be built from tables, loaded from external sources, or modified incrementally. Allocations use dimensions to decide *what to allocate to what*. More: [Dimensions](/guides/dimensions/) ## Allocation [Section titled “Allocation”](#allocation) An **allocation** spreads values from one set of rows to another based on driver data and rules. 
Think transfer pricing, activity-based costing, IT chargeback, profitability — any time you have a pool of cost that needs to be distributed across consumers. Allocations combine tables (values, drivers, results), dimensions (the rules), and workflow steps (to execute the model). More: [Allocations](/guides/allocations/) ## Dashboard [Section titled “Dashboard”](#dashboard) A **dashboard** is a published, interactive view of project data. Build from published tables and views. More: [Dashboards](/guides/dashboards/) ## AI Assistant [Section titled “AI Assistant”](#ai-assistant) A project-scoped chat for asking questions about your data and workflows. Conversations persist and are isolated per project. More: [AI Assistant](/guides/ai-assistant/) ## How They Fit Together [Section titled “How They Fit Together”](#how-they-fit-together) A typical end-to-end flow: 1. **Set up a connection** to your source system. 2. Inside a **project**, build a **workflow** that: * **Imports** data via the connection (creates tables) * **Transforms** with table steps * **Joins** with **dimensions** for context * **Allocates** if you’re doing cost spreading * **Publishes** the result 3. A **dashboard** reads the published tables. 4. **Workspace members** browse the dashboard or query results via the AI Assistant. Note Most documentation pages assume you understand these terms. If something on a guide or reference page seems to skip a step, it’s because that piece is covered here. # Frequently Asked Questions > Common questions about PlaidCloud — what it is, what it does, getting started, plans, support, and common gotchas. ## What Is PlaidCloud? [Section titled “What Is PlaidCloud?”](#what-is-plaidcloud) PlaidCloud is a unified financial analytics platform. 
Connect data sources, build workflows that transform and combine the data, define dimensions and hierarchies that match how your business is organized, run cost allocations, and publish results to dashboards or downstream systems — all in one platform. ## Who Uses PlaidCloud? [Section titled “Who Uses PlaidCloud?”](#who-uses-plaidcloud) Primarily finance, FP\&A, and analytics teams in mid-to-large organizations doing work like: * **Cost allocation** — activity-based costing, IT chargeback, shared-service distribution, transfer pricing * **Profitability analysis** — customer / product / channel margin at scale * **Financial consolidation** — combining data across entities and currencies * **Operational reporting** — dashboards over enterprise data with clean, governed metrics * **Data warehousing** — building a unified analytical layer over operational systems ## Getting Started [Section titled “Getting Started”](#getting-started) ### How Do I Try PlaidCloud? [Section titled “How Do I Try PlaidCloud?”](#how-do-i-try-plaidcloud) [Start a free trial](https://app.plaidcloud.com) — self-serve sign-up gets you a workspace in a few minutes. ### Where Do I Start After Signing Up? [Section titled “Where Do I Start After Signing Up?”](#where-do-i-start-after-signing-up) The [Quickstart](/get-started/quickstart/) walks through your first workflow in about 10 minutes. From there: [Quickstart ](/get-started/quickstart/)10-minute walkthrough — sign up, create a project, run your first workflow. [Concepts ](/get-started/concepts/)Understand the data model — workspaces, projects, workflows, tables, dimensions. [Tutorials ](/get-started/tutorials/)Longer end-to-end scenarios — load and transform data, build an allocation, connect an AI agent. ### Do I Need a Specific Technical Background? [Section titled “Do I Need a Specific Technical Background?”](#do-i-need-a-specific-technical-background) Most PlaidCloud users are business analysts comfortable with Excel and SQL fundamentals. 
You don’t need to be a developer. The platform exposes workflows and data operations through a visual interface, with SQL expressions available when you need them. ## Plans and Pricing [Section titled “Plans and Pricing”](#plans-and-pricing) ### What Plans Are Available? [Section titled “What Plans Are Available?”](#what-plans-are-available) PlaidCloud offers self-service trial workspaces, team plans, and enterprise plans. Plan limits, pricing, and feature availability differ across tiers. For current pricing and plan details, see [plaidcloud.com](https://plaidcloud.com/) or contact your account team. ### What’s the Free Trial Limit? [Section titled “What’s the Free Trial Limit?”](#whats-the-free-trial-limit) Trial workspaces have time-limited access and usage limits appropriate for evaluating the platform. Specifics are shown during signup. ### Can I Switch Plans Later? [Section titled “Can I Switch Plans Later?”](#can-i-switch-plans-later) Yes — workspaces can be upgraded without losing data or breaking integrations. Talk to your account team about the right path for your situation. ## Capabilities and Limits [Section titled “Capabilities and Limits”](#capabilities-and-limits) ### How Big Can Data Get? [Section titled “How Big Can Data Get?”](#how-big-can-data-get) PlaidCloud’s underlying Lakehouse engine handles small reference tables (hundreds of rows) up to multi-billion-row analytical datasets. For very large workloads, talk to your account team about sizing — compute resources are configurable. ### Does PlaidCloud Replace My Data Warehouse? [Section titled “Does PlaidCloud Replace My Data Warehouse?”](#does-plaidcloud-replace-my-data-warehouse) It can. PlaidCloud Lakehouse is a full analytical store that can serve as your primary data warehouse, or it can sit alongside an existing one (Snowflake, BigQuery, Redshift, etc.) and pull from it via [connectors](/reference/connectors/). ### Can I Run PlaidCloud On-Premises? 
[Section titled “Can I Run PlaidCloud On-Premises?”](#can-i-run-plaidcloud-on-premises) PlaidCloud is a SaaS platform. For accessing on-premises data sources, the [PlaidLink Agent](/reference/cli/plaidlink/) installs inside your network and bridges PlaidCloud to firewalled databases and file systems. ### Does PlaidCloud Have an API? [Section titled “Does PlaidCloud Have an API?”](#does-plaidcloud-have-an-api) Yes. The API is exposed per-tenant inside each workspace, so the interactive API documentation lives in your workspace rather than centrally on this docs site. For programmatic integration, the Jupyter / CLI access patterns at [Jupyter CLI](/reference/cli/jupyter/) and [PlaidLink](/reference/cli/plaidlink/) are good starting points. ## Working with the Product [Section titled “Working with the Product”](#working-with-the-product) ### Can I Use Excel with PlaidCloud? [Section titled “Can I Use Excel with PlaidCloud?”](#can-i-use-excel-with-plaidcloud) Yes — [PlaidXL](/reference/cli/plaidxl/) is an Excel add-in that lets you pull data from project tables into worksheets and refresh on demand. Useful for analysts whose primary modeling environment is Excel. ### Can I Use AI Tools Like Claude Code or Cursor? [Section titled “Can I Use AI Tools Like Claude Code or Cursor?”](#can-i-use-ai-tools-like-claude-code-or-cursor) Yes — PlaidCloud exposes an MCP (Model Context Protocol) server per workspace. See [AI coding agents](/integrations/ai-coding-agents/) for setup. ### Can I Use Jupyter Notebooks? [Section titled “Can I Use Jupyter Notebooks?”](#can-i-use-jupyter-notebooks) Yes — see [Jupyter CLI](/reference/cli/jupyter/). Authentication uses OAuth tokens, so the same credentials work across CLI, notebooks, and the REST API. ### Can I Use SQL Directly? [Section titled “Can I Use SQL Directly?”](#can-i-use-sql-directly) Yes — workflows accept SQL expressions for column computations, filters, and joins. 
The [Expressions reference](/reference/expressions/) covers every SQL function available, split by Lakehouse engine version.

### How Do I Schedule Workflows?

PlaidCloud has built-in scheduling. See [Scheduled events](/administration/scheduled-events/).

### How Do I Get Notified When a Workflow Finishes (Or Fails)?

Use a notification step: [Email](/reference/workflow-steps/notifications/notify-via-email/), [Slack](/reference/workflow-steps/notifications/notify-via-slack/), [Teams](/reference/workflow-steps/notifications/notify-via-microsoft-teams/), [SMS](/reference/workflow-steps/notifications/notify-via-sms/), or [webhook](/reference/workflow-steps/notifications/notify-via-web-hook/). You can also configure a remediation workflow that runs automatically on failure.

## Security and Access

### How Are Permissions Managed?

PlaidCloud uses **security groups** at the workspace level. Members are assigned to groups; groups grant specific capabilities. See [Access management](/administration/access/).

### Can I Use Single Sign-On (SSO)?

Yes. PlaidCloud supports SAML 2.0. See setup guides for [Okta](/administration/access/advanced/okta-saml-setup/), [Auth0](/administration/access/advanced/auth0-saml-setup/), [Microsoft Entra](/administration/access/advanced/entra-saml-setup/), [Google](/administration/access/advanced/google-saml-setup/), and [AWS](/administration/access/advanced/aws-saml-setup/).

### Where Is My Data Stored?
PlaidCloud data is stored in the cloud region configured for your tenant. Talk to your account team about region-specific deployments if you have data residency requirements.

### How Is Data Encrypted?

Data is encrypted both in transit and at rest. Encryption keys are managed by the platform; key-management options for regulated industries are available on enterprise plans.

## Common Gotchas

### My Workflow Step Errors With “No Rows Returned”

This usually means the filter or join didn’t match what you expected. Open the source tables in Table Explorer and check:

* Are the join key columns spelled the same in both tables (including casing)?
* Are there leading/trailing spaces or hidden characters in the key values?
* Is the filter condition more restrictive than you intended?

### My Allocation Results Don’t Reconcile

The sum of allocated amounts should equal the sum of source amounts. If it doesn’t:

* **Orphaned source rows**: a source row with no matching driver data won’t allocate. Check that every source row has a driver value.
* **Missing target members**: a dimension member with no driver entry won’t receive an allocation. Confirm the dimension and driver table are in sync.
* **Negative drivers**: these produce unexpected behavior. Filter them out or handle them explicitly.

See [Troubleshooting allocations](/guides/allocations/results/troubleshooting-allocations/).

### My Dimension Load Created Duplicate Members

Usually a casing or whitespace issue in the source data.
Dimensions treat “ACME Corp” and “Acme Corp” as different members. Normalize the source before loading, or use a transform step upstream to clean it.

### Can’t See a Project a Coworker Mentioned

Project visibility is controlled by workspace security groups. Ask a workspace administrator to add you to the group that grants access to the project.

## Getting Help

### How Do I Reach Support?

* For trial and self-serve users: email
* For enterprise customers: your dedicated support channel, typically Slack or a customer portal. Ask your account team

### Where Do I File a Bug or Feature Request?

Through the same channels as support. Bugs that affect documentation specifically can also be flagged via the **Edit Page** link at the top of any doc page.

### Is There a User Community?

PlaidCloud user forums and shared knowledge base are accessible from the in-product help menu once you have a workspace. The docs site is the public-facing resource.

# Quickstart

> Build your first PlaidCloud workflow in about 10 minutes.

This walkthrough takes you from a new workspace to a working data transformation in roughly 10 minutes.

[Start your free trial](https://app.plaidcloud.com). You'll need a PlaidCloud workspace to follow along. The free trial gets you one in a few minutes.

## What You’ll Build

A small workflow that:

1. Imports a CSV into a project table
2. Filters and transforms the data with a couple of steps
3. Publishes the result so other tools can consume it

Note: This is intentionally lightweight. It’s a tour of the moving parts, not a deep dive. Once you’ve finished, the [Concepts](/get-started/concepts/) page explains *why* things are organized the way they are, and the [Guides](/guides/) section covers each task in detail.

## 1. Open a Project

After signing in, you’ll land in your workspace. Open the **Projects** tab and create a new project. Give it a descriptive name like “Quickstart”.

A project is where your data, workflows, and dimensions live together. See [Managing projects](/guides/projects/managing-projects/) for more on the project lifecycle.

## 2. Import a CSV

Inside the project, open the **Workflows** tab and create a new workflow. Add an **Import → CSV** step. Either upload a small CSV from your local machine or point at a CSV in a connected document store.

Run that step. The CSV lands as a table in your project. Open the **Tables** tab to see it.

Reference: [Import CSV step](/reference/workflow-steps/import/import-csv/) · Guide: [Where are workflows?](/guides/workflows/where-are-workflows/)

## 3. Add a Transform Step

Back in the workflow, add a **Tables → Table Lookup** step (or any of the table steps). Configure source and target, choose which columns to keep, and apply a simple filter.

When you run the step, the output is a new table you can preview.

Reference: [Workflow steps](/reference/workflow-steps/)

## 4. Publish the Result

Add a final **Publish** step (under **Data → Publish**) so the table becomes available to dashboards, BI tools, or external consumers.

Guide: [Publishing data](/guides/data/publish/)

## 5. Run the Whole Workflow

Click **Run** on the workflow. Watch the log as each step executes. If anything errors, the [Managing step errors](/guides/workflows/managing-step-errors/) guide covers debugging.

## Where to Go Next

* **Understand the model**: [Concepts](/get-started/concepts/) explains what a workspace, project, workflow, table, and dimension actually are.
* **Build something real**: [Tutorials](/get-started/tutorials/) walk through end-to-end scenarios (loading and transforming data, building an allocation, publishing a dashboard).
* **Browse by task**: [Guides](/guides/) covers specific things you might want to do.
* **Look something up**: [Reference](/reference/) has every workflow step, expression, and connector.

# Tutorials

> End-to-end scenarios that walk through real analytics work in PlaidCloud.

Tutorials are longer, scenario-based walkthroughs. Each builds something concrete and references the underlying guides as it goes.

* [Load, Transform, and Publish Data](/get-started/tutorials/load-and-transform-data/): \~1 hour · End-to-end workflow that imports a CSV, transforms it through table steps, and publishes the result for dashboards.
* [Build an Allocation Model](/get-started/tutorials/build-an-allocation/): \~1 hour · Spread a cost pool across business units using driver data and a dimension hierarchy.
* [Connect an AI Coding Agent](/get-started/tutorials/mcp-with-ai-agent/): \~20 min · Wire Claude Code, Cursor, ChatGPT, or another MCP-compatible tool to your PlaidCloud workspace.

If you’d like more tutorials or a specific scenario covered, contact your account team; concrete use cases drive what we add here.

# Build an Allocation Model

> End-to-end tutorial: spread a pool of costs across consumers using driver data and a hierarchy.

This tutorial walks through building a complete cost allocation in PlaidCloud.
By the end you’ll have a working model that spreads a cost pool across a target dimension using driver data, the foundation of activity-based costing, IT chargeback, and shared-service distribution.

Takes about an hour. Allocations are PlaidCloud’s most distinctive feature; this is the best way to understand the model.

## What You’ll Build

Spread total IT department cost across business units, weighted by each unit’s user count.

```text
IT cost table  → ┐
                 ├→ Allocation step → results table
User counts    → ┤   (driver-based)
                 ┘
                 ↑
       Business unit dimension
```

## Prerequisites

* A PlaidCloud workspace ([start a free trial](https://app.plaidcloud.com) if you don’t have one)
* A project containing or able to import:
  * **Values to allocate**: IT cost by month, total or by sub-category
  * **Driver data**: user count by business unit
  * **A dimension**: hierarchy of business units (units → divisions → company)
* Familiarity with [Concepts](/get-started/concepts/): workspace, project, workflow, dimension, allocation

Note: If you don’t have the source data yet, you can prep small CSV files and follow [Load, Transform, and Publish Data](/get-started/tutorials/load-and-transform-data/) to import them first.

## Step 1: Set up the Inputs

Confirm you have these three things in your project:

| Object                     | What it holds                                                                         |
| -------------------------- | ------------------------------------------------------------------------------------- |
| `it_cost` table            | Total IT cost (e.g., one row per month with the amount)                               |
| `users_by_bu` table        | One row per business unit with a `user_count` column                                  |
| `business_units` dimension | Hierarchy of business units, with leaf nodes matching the unit names in `users_by_bu` |

If anything’s missing, load it via a workflow before continuing.
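
Because the allocation matches driver rows to dimension members by name, a quick pre-check of the inputs can save a debugging round later. The sketch below is plain Python over hand-typed sample rows, not a PlaidCloud API. The column name `business_unit`, the sample values, and the helper `check_inputs` are all assumptions made for illustration.

```python
# Hypothetical sample data mirroring the driver table and dimension leaves
# described in the table above. Names and values are invented.
users_by_bu = [
    {"business_unit": "Sales East", "user_count": 40},
    {"business_unit": "Sales West", "user_count": 35},
    {"business_unit": "Finance", "user_count": 25},
]
business_unit_leaves = ["Sales East", "Sales West", "Finance", "Operations"]


def check_inputs(driver_rows, leaves):
    """Report leaf members with no driver row, and driver rows with no leaf."""
    driver_names = {row["business_unit"] for row in driver_rows}
    leaf_names = set(leaves)
    return {
        "leaves_without_driver": sorted(leaf_names - driver_names),
        "drivers_without_leaf": sorted(driver_names - leaf_names),
    }


report = check_inputs(users_by_bu, business_unit_leaves)
print(report)
```

In this sample, `Operations` exists in the dimension but has no driver row, so it would be the first thing to fix before running an allocation.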
## Step 2: Create the Allocation Workflow

1. Open the project and switch to the **Workflows** tab.
2. Click **New Workflow**. Name it “IT Cost Allocation”.
3. Open the new workflow in the Workflow Explorer.

## Step 3: Add the Allocation Step

1. Add a new step from the **Allocation** category. Choose **Allocation Rules**, the most flexible option.
2. Configure the source:
   * **Source table**: `it_cost`
   * **Source amount column**: the column with the dollar amount
3. Configure the driver:
   * **Driver table**: `users_by_bu`
   * **Driver value column**: `user_count`
   * **Driver match column**: the column with the business unit name
4. Configure the target:
   * **Target dimension**: `business_units`
   * **Target level**: choose which level of the hierarchy to allocate to (usually the leaf level)
5. Configure the output:
   * **Result table**: `it_cost_allocated`

## Step 4: Run and Inspect

1. Run the step.
2. Open the **Tables** tab and click into `it_cost_allocated`.
3. Each row represents one slice of cost going to one business unit. Columns include:
   * The source amount (e.g., total IT cost for the month)
   * The target business unit
   * The driver value used (that unit’s user count)
   * The allocation rate (user count ÷ total user count across all units)
   * The allocated amount (source × rate)

## Step 5: Verify the Numbers

Three checks every allocation should pass:

1. **Total reconciliation**: sum of allocated amounts equals the source total (within rounding tolerance)
2. **Rate sum**: allocation rates sum to 1.0 (= 100%) per source row
3. **Coverage**: every business unit in your dimension that should receive a slice actually got one

The [Allocation results](/guides/allocations/results/allocation-results/) guide has a full checklist for verification. If something’s off, the most common issues are:

* **Missing driver data**: a business unit in the dimension with no row in `users_by_bu` won’t receive an allocation
* **Mismatched names**: the driver table says “Sales East” but the dimension says “Sales-East” (names that differ in punctuation, spacing, or casing won’t match)
* **Zero or negative drivers**: these produce zero or unexpected allocations

## Step 6: Use the Results

The result table is just like any other PlaidCloud table. It can be:

* **Joined** with other tables in further workflow steps (e.g., add a fully-loaded cost column to the GL)
* **Published** for dashboards or external consumers
* **Re-allocated** as a source for the next round of allocations (for cascading models)

## Variations to Try

Once the basic model works, common extensions:

* **Multiple cost pools**: replace the single `it_cost` row with one row per IT sub-category (compute, storage, licensing) and allocate each independently with different drivers
* **Multi-period**: partition the source by month and produce one allocation result per period
* **Layered allocations**: allocate divisional overhead to units, then unit costs to products, then product costs to customers. Each layer is its own allocation step
* **Recursive allocations**: when shared services consume each other (IT serves HR, HR serves IT).
See [Recursive allocations](/guides/allocations/setup/recursive-allocations/)

## What’s Next

* [Allocations guide](/guides/allocations/): every option and configuration choice
* [Rule-based tagging](/guides/allocations/getting-started/rule-based-tagging/): different allocation rules per source row
* [Allocation step reference](/reference/workflow-steps/allocation/): every workflow step in the Allocation category
* [Dimensions guide](/guides/dimensions/): building the hierarchies that allocations target

# Load, Transform, and Publish Data

> End-to-end tutorial: import a CSV, clean it with table steps, and publish the result for downstream consumers.

This tutorial takes about an hour. By the end you’ll have a working workflow that imports a CSV, transforms it through a few table steps, and publishes the result.

## What You’ll Build

A workflow that takes a raw sales CSV, cleans it, joins it to a product reference table, computes derived columns, and publishes the result as a clean fact table.

```text
sales.csv    → import → filter → join with products → add columns → publish
                                        ↓
products.csv → import
```

## Prerequisites

* A PlaidCloud workspace ([start a free trial](https://app.plaidcloud.com) if you don’t have one)
* A project to work in (create one from the **Projects** tab if needed)
* Two CSV files to import. For this tutorial we’ll use a simple sales transactions file and a product catalog file. You can use your own or generate sample data with any spreadsheet tool

## Step 1: Create the Workflow

1. Open your project and switch to the **Workflows** tab.
2. Click **New Workflow**. Name it something descriptive like “Sales Cleanup”.
3. Click **Create**. The empty workflow appears in your list.
4. Double-click the workflow to open the Workflow Explorer.

## Step 2: Import the Sales CSV

1. In the Workflow Explorer, add a new step.
2. Choose **Import → CSV** (or whichever import step matches your source format).
3. In the step configuration:
   * **Source file**: point at your sales CSV (upload, or pick from a connected document account)
   * **Target table**: name it `sales_raw`
   * **Delimiter, quote character, header row**: adjust if your file is non-standard
4. Run the step.

The CSV lands as a new table in your project. Check the **Tables** tab to see `sales_raw`. Click into it to verify the data looks right: column count, sample rows, data types.

## Step 3: Import the Product Catalog

Repeat Step 2 with your products CSV, targeting a table named `products_raw`. This gives you both tables needed for the join.

## Step 4: Filter Out Bad Rows

Real sales data has gaps: null amounts, test transactions, refunds you don’t want in the main fact table. Add a filter step to remove them.

1. Add a **Tables → Table Lookup** step (or any table transform that lets you filter).
2. Configure:
   * **Source table**: `sales_raw`
   * **Target table**: `sales_clean`
   * **Filter conditions**: e.g., `amount > 0 AND status = 'completed'`
3. Run the step.

`sales_clean` should have fewer rows than `sales_raw`.

Note: Filters apply as the data flows from source to target. The source table is unchanged. This is a core PlaidCloud pattern: each step reads from one or more sources and writes to one or more targets, leaving the originals intact for auditability.
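
The filter in Step 4 can be pictured as a pure function from source rows to target rows. What follows is a minimal Python sketch of that pattern, not how PlaidCloud executes the step. The sample rows and the `order_id` column are invented; `amount` and `status` come from the example filter condition above.

```python
# Invented sample rows standing in for the sales_raw table.
sales_raw = [
    {"order_id": 1, "amount": 250.0, "status": "completed"},
    {"order_id": 2, "amount": None, "status": "completed"},  # null amount
    {"order_id": 3, "amount": -40.0, "status": "refunded"},  # refund
    {"order_id": 4, "amount": 15.0, "status": "test"},       # test transaction
    {"order_id": 5, "amount": 90.0, "status": "completed"},
]


def keep_row(row):
    """Python equivalent of: amount > 0 AND status = 'completed'."""
    amount = row["amount"]
    return amount is not None and amount > 0 and row["status"] == "completed"


# Build a new list of copies; sales_raw itself is never modified.
sales_clean = [dict(row) for row in sales_raw if keep_row(row)]

print(len(sales_raw), len(sales_clean))
print([row["order_id"] for row in sales_clean])
```

Comparing the two row counts, as the step above suggests, is a cheap way to confirm the condition removed roughly what you expected.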
## Step 5: Join to the Product Catalog

Now combine the cleaned sales rows with product details from the catalog.

1. Add a **Tables → Table Inner Join** step.
2. Configure:
   * **Left table**: `sales_clean`
   * **Right table**: `products_raw`
   * **Join keys**: the column linking the two tables (e.g., `product_id`)
   * **Target table**: `sales_enriched`
3. Run the step.

`sales_enriched` now has every column from both tables. You probably want a subset; that comes next.

## Step 6: Select and Compute Columns

Cleaning typically means dropping columns you don’t need and computing derived ones. Add another table step:

1. Add a **Tables → Table Lookup** step (used here for column selection and computation).
2. Configure:
   * **Source**: `sales_enriched`
   * **Target**: `sales_final`
   * **Columns to keep**: the subset that matters for downstream consumers
   * **Computed columns**: e.g., `revenue = amount * price`, `margin = revenue - cost`
3. Run the step.

For column-level calculations, the [Expressions reference](/reference/expressions/) covers every function available: string operations, date math, conditional logic, aggregations.

## Step 7: Publish the Result

Make `sales_final` available to dashboards and downstream systems.

1. Add a **Data → Publish** step (or use the **Publish** option directly on the table).
2. Configure who can read the published table, typically other members of the workspace plus any external systems that have access.
3. Run the step.

The published table is now reachable by [Dashboards](/guides/dashboards/), BI tools, and any external consumer with the right permissions.
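
Steps 5 and 6 amount to an inner join followed by a column projection with derived values. The Python sketch below illustrates that logic on invented rows; it is not PlaidCloud code. The column names `product_id`, `amount`, `price`, and `cost` follow the examples in those steps, while `order_id`, `name`, the sample values, and the reading of `amount` as a unit quantity are assumptions.

```python
# Invented rows standing in for sales_clean and products_raw.
sales_clean = [
    {"order_id": 1, "product_id": "P1", "amount": 2},
    {"order_id": 5, "product_id": "P2", "amount": 3},
    {"order_id": 6, "product_id": "P9", "amount": 1},  # no catalog match
]
products_raw = [
    {"product_id": "P1", "name": "Widget", "price": 10.0, "cost": 12.0},
    {"product_id": "P2", "name": "Gadget", "price": 20.0, "cost": 45.0},
]

# Inner join: index the right table by key, keep only left rows that match.
products_by_id = {p["product_id"]: p for p in products_raw}
sales_enriched = [
    {**sale, **products_by_id[sale["product_id"]]}
    for sale in sales_clean
    if sale["product_id"] in products_by_id
]

# Column selection plus the two computed columns from Step 6.
sales_final = []
for row in sales_enriched:
    revenue = row["amount"] * row["price"]
    sales_final.append({
        "order_id": row["order_id"],
        "product": row["name"],
        "revenue": revenue,
        "margin": revenue - row["cost"],
    })

print(sales_final)
```

The order with key `P9` has no catalog match, so the inner join drops it without an error. Comparing row counts before and after the join is a simple way to notice that.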
## Step 8: Run the Whole Workflow

Click **Run** on the workflow (not just one step). Watch the log as each step executes in order. The complete pipeline runs from CSV import through to publish.

If any step errors, the [Managing step errors](/guides/workflows/managing-step-errors/) guide covers debugging. The most common issues are bad join keys (mismatch between tables) and unexpected null values in computed columns.

## What’s Next

* [Build an Allocation Model](/get-started/tutorials/build-an-allocation/): spread costs across consumers using driver data
* [Workflows guide](/guides/workflows/): error handling, conditions, loops, variables
* [Workflow steps reference](/reference/workflow-steps/): every step type and what it does

# Connect an AI Coding Agent

> End-to-end tutorial: wire Claude Code, Cursor, Copilot, or another AI coding agent to your PlaidCloud workspace via MCP.

This tutorial sets up an AI coding agent to interact with your PlaidCloud workspace using the **Model Context Protocol** (MCP). Once configured, you can ask the agent to read tables, run workflows, build allocations, and answer questions about your data, directly from your editor.

Takes about 20 minutes. Works with Claude Code, Cursor, Claude Desktop, ChatGPT, GitHub Copilot, and any MCP-compatible client.
## What You’ll Build

A working connection between your AI coding agent and your PlaidCloud workspace, where the agent can:

* List projects, tables, workflows, and dimensions
* Read table contents
* Inspect workflow definitions
* Trigger workflow runs
* Answer questions about your data without you switching contexts

## Prerequisites

* A PlaidCloud workspace ([start a free trial](https://app.plaidcloud.com) if you don’t have one)
* An AI coding agent you already use: Claude Code, Cursor, Claude Desktop, ChatGPT, Copilot, or Gemini
* The agent must support MCP (most current AI tools do)

## Step 1: Find Your Workspace’s MCP URL

Every PlaidCloud workspace exposes an MCP endpoint at:

```text
https://.plaid.cloud/mcp/
```

Replace `` with your workspace subdomain, the same one you use to sign in to the PlaidCloud UI.

## Step 2: Get an Authentication Token

1. While signed in to PlaidCloud in a browser, visit:

   ```plaintext
   https://.plaid.cloud/mcp/setup/token
   ```

2. Copy the bearer token shown on the page. Keep it safe; it grants the same access your account has.

Caution: Treat the token like a password. It bypasses interactive authentication and acts on your behalf. If it leaks, revoke it from the same endpoint.

## Step 3: Configure Your Agent

* Claude Code

  Run in your terminal:

  ```bash
  claude mcp add --transport http plaidcloud https://.plaid.cloud/mcp/
  ```

  For static-token authentication (no OAuth flow, simpler for long sessions), open the URL from Step 2 in a browser, copy the displayed config snippet, and paste it into your `.mcp.json` file.

  See [Claude Code setup](/integrations/ai-coding-agents/claude-code/) for full options.

* Cursor

  Get a bearer token from `https://.plaid.cloud/mcp/setup/token` and add this to your Cursor MCP config:

  ```json
  {
    "mcpServers": {
      "plaidcloud": {
        "url": "https://.plaid.cloud/mcp/",
        "headers": {
          "Authorization": "Bearer "
        }
      }
    }
  }
  ```

  See [Cursor setup](/integrations/ai-coding-agents/cursor/) for full options.

* Claude Desktop

  Open Settings → Developer → MCP Servers and add:

  * **Server URL**: `https://.plaid.cloud/mcp/`
  * Use OAuth login when prompted

  See [Claude Desktop setup](/integrations/ai-coding-agents/claude-desktop/) for full options.

* ChatGPT

  1. Settings → Connectors → Add custom connector
  2. Enter:
     * **Name**: `PlaidCloud`
     * **MCP server URL**: `https://.plaid.cloud/mcp/`
  3. ChatGPT redirects you to PlaidCloud for OAuth login. Approve the connection.
  4. Toggle the connector on inside any conversation that should use it.

  See [ChatGPT setup](/integrations/ai-coding-agents/chatgpt/) for full options.

* Copilot

  Get a bearer token and add this to `.vscode/mcp.json`:

  ```json
  {
    "servers": {
      "plaidcloud": {
        "url": "https://.plaid.cloud/mcp/",
        "headers": {
          "Authorization": "Bearer "
        }
      }
    }
  }
  ```

  VS Code reads this on startup and on file change.

  See [Copilot setup](/integrations/ai-coding-agents/copilot/) for full options.

## Step 4: Verify the Connection

Ask the agent something simple:

> “List the projects in my PlaidCloud workspace.”

The agent should respond with your project list. If it doesn’t, check the troubleshooting steps below.
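
The token-based configs in Step 3 share one shape, so generating them can keep the URL and header consistent across editors. The Python sketch below builds the same JSON structure as the Cursor and Copilot snippets. The function name `build_mcp_config` and the sample values `acme` and `example-token` are hypothetical, and the URL pattern assumes the workspace subdomain sits directly before `.plaid.cloud`, as the Troubleshooting section of this tutorial describes.

```python
import json


def build_mcp_config(workspace: str, token: str, root_key: str = "mcpServers") -> dict:
    """Build a token-based MCP client config for one PlaidCloud workspace.

    root_key is "mcpServers" for the Cursor example and "servers" for the
    .vscode/mcp.json example shown in Step 3.
    """
    if not workspace or not token:
        raise ValueError("workspace and token are both required")
    return {
        root_key: {
            "plaidcloud": {
                # Assumed URL pattern: subdomain directly before .plaid.cloud
                "url": f"https://{workspace}.plaid.cloud/mcp/",
                "headers": {"Authorization": f"Bearer {token}"},
            }
        }
    }


# Placeholder values only; never commit a real token to source control.
config = build_mcp_config("acme", "example-token")
vscode_config = build_mcp_config("acme", "example-token", root_key="servers")
print(json.dumps(config, indent=2))
```

Because the token acts on your behalf, a real script would read it from an environment variable or a secrets store rather than a literal string.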
## Step 5: Try Real Tasks

Once connected, try:

* **Explore your data**: “What columns does the `sales` table in my Quickstart project have?”
* **Run a workflow**: “Trigger the `monthly_close` workflow in the Financials project and let me know when it finishes.”
* **Build something**: “Create a Table Lookup step in my Test workflow that filters orders to the last 30 days.”

The agent’s responses will be grounded in your actual workspace state, not generic answers.

## Troubleshooting

If the agent can’t reach PlaidCloud or returns auth errors:

* **Token expired**: refresh it at `https://.plaid.cloud/mcp/setup/token`
* **Wrong scopes**: some tools require specific PlaidCloud scopes (e.g., `analyze.workflow.write`). Run `mcp_introspect(name='')` in the agent to see required scopes
* **Workspace subdomain wrong**: confirm by signing into the PlaidCloud UI; the subdomain is the part before `.plaid.cloud`

See [AI coding agents troubleshooting](/integrations/ai-coding-agents/troubleshooting/) for more.

## What’s Next

* [AI coding agents getting started](/integrations/ai-coding-agents/getting-started/): base setup details
* [AI coding agents reference](/integrations/ai-coding-agents/): every supported agent and its setup
* [Concepts](/get-started/concepts/): the PlaidCloud data model the agent will be reasoning over

# Guides

> Task-oriented how-to documentation for using PlaidCloud.

How to accomplish specific things in PlaidCloud. Each guide is task-focused: find the goal, follow the steps.

## Data

* [Connections](/guides/connections/): Connect projects to external databases, file stores, and APIs.
* [Tables and views](/guides/data/): Explore, publish, and manage project tables.
* [Dimensions](/guides/dimensions/): Build and load hierarchies for slicing and aggregating data.
* [Documents](/guides/documents/): Connect cloud storage accounts and manage documents.

## Modeling

* [Workflows](/guides/workflows/): Build, run, and manage data transformation pipelines.
* [Allocations](/guides/allocations/): Configure cost allocations, drivers, and recursive models.
* [Projects](/guides/projects/): Organize work into projects with hierarchies, editors, and audit logs.

## Analysis and Delivery

* [Dashboards](/guides/dashboards/): Build interactive dashboards from your published data.
* [AI Assistant](/guides/ai-assistant/): Project-scoped chat for asking questions about your data and workflows.
* [Panel apps](/guides/panel-apps/): Create and use interactive Panel apps.
* [Email](/guides/email/): Send email notifications from workflows.
* [Sandbox](/guides/sandbox/): A safe scratch space for trying things out.

# AI Assistant

> Use the built-in PlaidCloud AI Assistant to ask questions, generate expressions, and operate on your projects in natural language.

The PlaidCloud AI Assistant is the in-app chat experience for asking questions about your data, generating workflow expressions, and performing operations in natural language. It is separate from the [AI Agents (MCP)](../ai-agents/) area, which covers connecting external AI clients to your tenant.

# Using the AI Assistant

> Chat with the PlaidCloud AI Assistant: manage conversations, see token usage, and ask the assistant to draft expressions for workflow steps.

## Description

The AI Assistant is a project-scoped chat. Open the project, then click the **AI** tab alongside Home, Workflows, Tables, etc. The tab is split into two parts: a conversation history list on the left, and a tabbed chat workspace on the right. Each conversation is its own tab, so you can keep several threads open at once. Conversations persist, so they survive across sessions, browsers, and devices.
## Start a Conversation

1. Open the project’s **AI** tab
2. A new chat tab is created automatically; click in the input box at the bottom
3. Type your question and press Enter (or click `Send`)

The assistant streams its response back, including any tool calls it made along the way. Click `+` on the tab bar to start an additional conversation in parallel.

## Manage Past Conversations

The history list on the left of the **AI** tab shows every conversation in this project for your user, most recent first.

**To switch to a past conversation:**

1. Click the conversation in the history list
2. The full transcript opens in a new chat tab (or the existing one if it’s already open)

**To delete a conversation:**

1. Right-click the conversation in the history list
2. Select `Delete Thread`

Note: Deleting a conversation also closes its chat tab if one is open.

## Token Usage

Every AI response shows the token usage for that turn: input tokens, output tokens, and a running total for the conversation. Use this to keep an eye on cost as you work.

## Automatic Tool Selection

The assistant decides on its own which tools to call and which documents to consult for each question. There are no toggles to choose what’s used; tool selection happens behind the scenes by scoring the available tools against your prompt. If the answer doesn’t use the tool you expected, rephrase the question or include the table, project, or document name explicitly.

## Expression AI

The Expression Editor (used by Project Table, Calculate, Filter, and any other step that takes expressions) has the AI Assistant built in as a side panel.

1. Open a workflow step that uses expressions
2. Open the editor for the column you want to fill in
3. The AI panel sits alongside the expression editor; ask it to draft or fix the expression
4. Copy the suggested expression into the editor

The chat already has the column list and types from the current step, so you can ask questions like “concatenate first\_name and last\_name with a space” without restating the schema.

# Allocation Assignments

> Configure PlaidCloud allocation models for cost splitting, activity-based costing, IT chargeback, and driver-based distribution.

Allocations spread values from one set of rows (“source”) to another (“target”) using driver data and rules. PlaidCloud supports rule-based tagging, allocation split, dimension-driven allocation, and recursive allocations for transfer pricing, IT chargeback, and similar cost-distribution problems.

# Getting Started with Allocations

> Get started with PlaidCloud allocations including quick start guides, rule-based tagging, and understanding allocation use cases.

Get started with PlaidCloud allocations: what they are, common use cases (IT chargeback, transfer pricing, activity-based costing, profitability analysis), and a step-by-step walkthrough of your first allocation.

# Allocations Quick Start

> Quickly set up a basic cost allocation in PlaidCloud with this step-by-step guide covering sources, drivers, and target mapping.

This walkthrough takes you from raw cost data to a working allocation model in roughly 30 minutes.

## What You’ll Need

* A project with at least two tables:
  * **Values to allocate**: the costs (or revenues, or volumes) you want to spread. One row per source unit, one column with the amount.
  * **Driver data**: the basis for spreading. Headcount, square footage, transaction counts, revenue, or whatever else you want to allocate *by*.
* A dimension that ties source and target together (cost centers, departments, products, or whichever taxonomy fits your model).
If you don’t have all of that yet, the [Tables and views](/guides/data/) and [Dimensions](/guides/dimensions/) guides cover loading the inputs. ## Steps [Section titled “Steps”](#steps) 1. **Open the project** that holds your values and driver tables. 2. **Create a new workflow.** Allocations always run inside a workflow — they don’t operate on tables directly outside of one. 3. **Add an Allocation step.** Inside the workflow, add a step from the **Allocation** category. The most common starting point is **Allocation Rules** for straightforward driver-based spreading. 4. **Configure the source.** Point the step at your values table and pick the column holding the amount to allocate. 5. **Configure the driver.** Point the step at the driver table and pick the column holding the driver values. 6. **Map the dimension.** Identify which column on each table represents the dimension members. The allocation step uses these to match source rows to driver rows. 7. **Run the step.** The output is a new table with one row per spread amount. 8. **Inspect results.** Check that the totals match what you expected — sum of allocated amounts should equal sum of source amounts (within rounding tolerance). ## Common Follow-Ups [Section titled “Common Follow-Ups”](#common-follow-ups) * **Spreading recursively** — if a target itself contains drivers for further allocation, see [Recursive allocations](/guides/allocations/setup/recursive-allocations/). * **Tagging rows for allocation** — to drive *which* rows allocate to which targets, see [Rule-based tagging](/guides/allocations/getting-started/rule-based-tagging/). * **Investigating unexpected results** — if totals don’t reconcile or specific rows look wrong, see [Troubleshooting allocations](/guides/allocations/results/troubleshooting-allocations/). 
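The arithmetic behind steps 4 through 8 above can be sketched in a few lines of Python. This is an illustration of proportional, driver-based spreading only, not PlaidCloud's implementation of the Allocation step; the function name `allocate`, the tuple layouts, and the sample figures are all assumptions made for the example.

```python
from collections import defaultdict


def allocate(values, drivers):
    """Spread each source amount across targets in proportion to driver values.

    values:  list of (dimension_member, amount)                 (hypothetical layout)
    drivers: list of (dimension_member, target, driver_value)   (hypothetical layout)
    Returns a list of (dimension_member, target, allocated_amount).
    """
    # Group driver rows by the dimension member they belong to.
    by_member = defaultdict(list)
    for member, target, driver_value in drivers:
        by_member[member].append((target, driver_value))

    results = []
    for member, amount in values:
        rows = by_member.get(member, [])
        total = sum(driver_value for _, driver_value in rows)
        if total == 0:
            # No usable driver data for this source row, so nothing is spread.
            continue
        for target, driver_value in rows:
            results.append((member, target, amount * driver_value / total))
    return results


# Sample data (made up for illustration).
values = [("IT", 9000.0), ("HR", 6000.0)]
drivers = [
    ("IT", "Sales", 30), ("IT", "Ops", 60),
    ("HR", "Sales", 10), ("HR", "Ops", 20), ("HR", "Finance", 10),
]
result = allocate(values, drivers)

# Step 8: allocated total should equal source total within rounding tolerance.
source_total = sum(amount for _, amount in values)
allocated_total = sum(amount for _, _, amount in result)
assert abs(source_total - allocated_total) < 1e-6
```

A source row with no matching driver rows is silently skipped in this sketch, which is exactly the situation the reconciliation check in step 8 is meant to surface.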
## Next Steps [Section titled “Next Steps”](#next-steps) * [Why are allocations useful?](/guides/allocations/getting-started/why-are-allocations-useful/) — when to use them * [Configure an allocation](/guides/allocations/setup/configure-an-allocation/) — deeper configuration reference * [Allocation step types](/reference/workflow-steps/allocation/) — every workflow step in the Allocation category # Rule Based Tagging > Configure rule-based tagging in PlaidCloud allocations to automatically categorize and label data records using defined criteria. Rule-based tagging lets you mark source rows with metadata that the allocation engine uses to decide *where* those rows go. Use it when you need allocation behavior to vary by row — for example, when costs for one cost center should be spread by headcount but costs for another should be spread by revenue. ## When to Use It [Section titled “When to Use It”](#when-to-use-it) * A flat allocation rule doesn’t capture how cost should actually be spread (different rules for different cost types). * You want to direct certain source rows to specific targets while leaving others to spread normally. * You’re modeling a multi-pool allocation where each pool uses a different driver. ## How Tagging Works [Section titled “How Tagging Works”](#how-tagging-works) 1. **Tag the source.** The values table gets one or more tag columns that classify each row. 2. **Reference tags in the allocation rule.** When configuring the allocation step, you express rules in the form *“if source tag X = value Y, allocate using driver D and target dimension T.”* 3. **The engine routes rows.** Each source row is matched against rules in order; the first matching rule decides the allocation behavior. 
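The first-match routing described in step 3 can be pictured with a small Python sketch. The rule dictionaries, the `route` helper, and the tag names below are hypothetical; they model the ordering behavior only and say nothing about how rules are stored or evaluated inside PlaidCloud.

```python
def route(row, rules):
    """Return the first rule whose tag condition matches the row, or None."""
    for rule in rules:
        if row.get(rule["tag"]) == rule["value"]:
            return rule
    return None


# Rules are checked in order; the first match wins.
rules = [
    {"tag": "is_overhead", "value": True, "driver": "square_footage"},
    {"tag": "cost_category", "value": "payroll", "driver": "headcount"},
    {"tag": "cost_category", "value": "marketing", "driver": "revenue"},
]

rows = [
    {"cost_category": "payroll", "is_overhead": False, "amount": 50000},
    {"cost_category": "marketing", "is_overhead": False, "amount": 12000},
    # Matches both the overhead rule and the payroll rule; the overhead rule is first.
    {"cost_category": "payroll", "is_overhead": True, "amount": 7000},
    # Matches no rule at all.
    {"cost_category": "travel", "is_overhead": False, "amount": 900},
]

chosen = []
for row in rows:
    rule = route(row, rules)
    chosen.append(rule["driver"] if rule else None)
```

Because order decides ties, putting a broad rule ahead of a narrow one changes which driver a row uses. Rows that match no rule come back as `None` here and would need an explicit fallback.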
## Tag-Friendly Source Patterns

* A column named `cost_category` with values like `payroll`, `facilities`, `it`, `marketing`
* A column named `pool` that names the allocation pool the row belongs to
* A boolean column like `is_overhead` that triggers different treatment

## Example

A cost table with `cost_center` and `cost_category`:

| cost\_center | cost\_category | amount |
| ------------ | -------------- | ------ |
| 1010         | payroll        | 50,000 |
| 1010         | it             | 8,000  |
| 1020         | payroll        | 35,000 |

You can configure two allocation rules:

* **Payroll rows** spread by headcount driver
* **IT rows** spread by user-count driver

Both rules run against the same source table; tags decide which one applies to each row.

## Related

* [Allocations Quick Start](/guides/allocations/getting-started/allocations-quick-start/): basic flow before adding tagging
* [Configure an allocation](/guides/allocations/setup/configure-an-allocation/): full step reference
* [Allocation rules step](/reference/workflow-steps/allocation/allocation-rules/): workflow step that consumes tagged data

# Why are Allocations Useful

> Understand why cost allocations are useful in PlaidCloud for activity-based costing, chargeback, and profitability analysis.

Allocations spread a pool of values (typically cost, but it works for any aggregate) across consumers based on a measurable driver. They answer the question: *if we incur this cost, how should it be assigned to the things that consume it?*

## Common Use Cases

### Activity-Based Costing

You know your total marketing spend for a quarter. You want to attribute it to specific products based on something measurable: campaign hours, leads generated, qualified opportunities.
An allocation spreads the marketing pool across products using your chosen driver, giving each product a fully-loaded cost. ### IT Chargeback [Section titled “IT Chargeback”](#it-chargeback) You spend $X running shared infrastructure (compute, storage, licensing). Each business unit consumes a different amount. An allocation spreads the IT cost across units based on usage metrics — VM hours, storage GB, license seats — so each unit’s P\&L reflects what it actually consumed. ### Shared Service Distribution [Section titled “Shared Service Distribution”](#shared-service-distribution) Finance, HR, legal, facilities — central functions that serve the whole company. Allocations distribute their cost across the divisions they serve, typically by headcount, revenue, or a weighted blend. ### Transfer Pricing [Section titled “Transfer Pricing”](#transfer-pricing) For multi-entity organizations, allocations model how internal services are priced between entities. The output drives intercompany journal entries. ### Profitability Analysis [Section titled “Profitability Analysis”](#profitability-analysis) You have revenue at the product or customer level. You have costs at various pools (sales, support, infrastructure, COGS). Allocations bring everything together at the product/customer grain so you can see actual margin. ### Bill of Materials Costing [Section titled “Bill of Materials Costing”](#bill-of-materials-costing) Cost flows down a hierarchy of components. Each step in the BoM is an allocation: subassembly costs spread to assemblies, assemblies to finished goods, finished goods to SKUs. ## What Allocations Save You From [Section titled “What Allocations Save You From”](#what-allocations-save-you-from) Without an allocation engine, you’d build these models in spreadsheets — fragile, hard to audit, hard to repeat with updated data. 
PlaidCloud allocations give you: * **Reproducible models** that re-run automatically as source data refreshes * **Audit trail** showing which source rows contributed to which target rows * **Layered allocations** where outputs feed further allocations * **Dimensional integration** so allocations respect your existing hierarchies ## Next Steps [Section titled “Next Steps”](#next-steps) * [Allocations Quick Start](/guides/allocations/getting-started/allocations-quick-start/) — build one in 30 minutes * [Rule-Based Tagging](/guides/allocations/getting-started/rule-based-tagging/) — control allocation behavior by row * [Configure an allocation](/guides/allocations/setup/configure-an-allocation/) — full configuration reference # Results and Troubleshooting > Review PlaidCloud allocation results, analyze output data, and troubleshoot common allocation configuration issues. Review allocation outputs, validate that totals reconcile end-to-end, and troubleshoot common issues — orphaned source rows, missing target members, negative drivers, and rounding artifacts. # Allocation Results > Analyze PlaidCloud allocation results including reviewing output data, verifying distributions, and validating allocation accuracy. After running an allocation step, the output is a result table you can inspect, verify, and feed into downstream steps. 
## What the Result Table Contains [Section titled “What the Result Table Contains”](#what-the-result-table-contains) A typical allocation result row includes: * **Source identifier** — the row in the source table this allocation came from * **Target identifier** — the row in the target dimension or table that received the spread * **Allocated amount** — the share of the source amount assigned to this target * **Driver value** — the driver number that justified the spread (e.g., the headcount, the revenue, the hours) * **Allocation rate** — driver share as a proportion of the total * **Source amount** — the original total being spread (carried for auditability) * **Pool / tag / rule reference** — if rule-based tagging was used, which rule produced this row The exact columns depend on the allocation step type and your configuration. ## Verification Checklist [Section titled “Verification Checklist”](#verification-checklist) Before relying on the results: 1. **Reconciliation** — sum of allocated amounts equals sum of source amounts (within floating-point tolerance). If not, something didn’t spread. 2. **No orphaned source rows** — every source row produced at least one allocation row. Orphans usually mean no driver data matched the source’s tag or dimension member. 3. **No orphaned targets** — if you expected every target to receive something, check that every target dimension member appears in the results. 4. **Reasonable rates** — allocation rates should sum to 1.0 (100%) per source pool. Rates significantly off-target indicate driver data issues. 5. **Spot-check totals** — pick a high-value source row and verify its allocation matches what you’d compute by hand. 
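The first four checklist items lend themselves to an automated check. The sketch below assumes the result table has been exported as a list of dictionaries with `source`, `target`, `amount`, and `rate` keys; those key names, the `verify` helper, and the sample figures are assumptions for illustration, since the actual columns depend on the step type and configuration.

```python
import math


def verify(sources, results, expected_targets, tol=1e-6):
    """Run reconciliation, orphan, and rate checks on allocation results.

    sources:          {source_id: source_amount}
    results:          list of dicts with "source", "target", "amount", "rate"
    expected_targets: target members you expect to receive something
    """
    allocated_total = sum(row["amount"] for row in results)
    source_total = sum(sources.values())

    # Sum the allocation rates per source pool; each should come to 1.0.
    rate_sums = {}
    for row in results:
        rate_sums[row["source"]] = rate_sums.get(row["source"], 0.0) + row["rate"]

    return {
        "reconciles": math.isclose(allocated_total, source_total, abs_tol=tol),
        "orphaned_sources": sorted(set(sources) - {row["source"] for row in results}),
        "orphaned_targets": sorted(set(expected_targets) - {row["target"] for row in results}),
        "bad_rate_sources": sorted(
            source for source, total in rate_sums.items()
            if not math.isclose(total, 1.0, abs_tol=tol)
        ),
    }


# Sample data (made up): "Legal" has no driver rows, so it never spreads.
sources = {"IT": 9000.0, "HR": 6000.0, "Legal": 2000.0}
results = [
    {"source": "IT", "target": "Sales", "amount": 3000.0, "rate": 1 / 3},
    {"source": "IT", "target": "Ops", "amount": 6000.0, "rate": 2 / 3},
    {"source": "HR", "target": "Sales", "amount": 1500.0, "rate": 0.25},
    {"source": "HR", "target": "Ops", "amount": 3000.0, "rate": 0.5},
    {"source": "HR", "target": "Finance", "amount": 1500.0, "rate": 0.25},
]
report = verify(sources, results, expected_targets=["Sales", "Ops", "Finance", "R&D"])
```

In this sample the reconciliation fails because the orphaned `Legal` row never spread, which shows how checklist items 1 and 2 tend to fail together. The hand spot-check in item 5 is deliberately left out, since it depends on judgment about which rows matter.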
## Common Patterns to Look For [Section titled “Common Patterns to Look For”](#common-patterns-to-look-for) * **Zero allocations** — a target that received nothing usually means the driver row was missing or had a zero value * **Mass concentration** — most of the spread landing on one target usually means the driver column has one very large value (often a data quality issue upstream) * **Negative drivers** — depending on the allocation step, negative driver values may produce inverted spreads. Verify intent. ## Next Steps [Section titled “Next Steps”](#next-steps) * [Troubleshooting allocations](/guides/allocations/results/troubleshooting-allocations/) — what to do when reconciliation fails * [Publishing data](/guides/data/publish/) — once you trust the results, publish them for dashboards and downstream consumers # Troubleshooting Allocations > Troubleshoot PlaidCloud allocation issues including common errors, configuration problems, and techniques for debugging results. ## Stranded Cost [Section titled “Stranded Cost”](#stranded-cost) Stranded cost is… ## Over Allocation of Cost [Section titled “Over Allocation of Cost”](#over-allocation-of-cost) Over allocation of cost is when you end up with more output cost… ## Incorrect Allocation of Cost [Section titled “Incorrect Allocation of Cost”](#incorrect-allocation-of-cost) Incorrect allocation of costs happens when… # Configure Allocations > Configure PlaidCloud allocation models including allocation rules, driver data, source and target mapping, and recursion settings. Configure PlaidCloud allocation steps — define allocation rules, choose driver data, map source columns to targets, and control recursion and ordering. # Configure an Allocation > Configure a PlaidCloud allocation including source data, driver data, target mapping, allocation methods, and processing options. 
## Purpose

Allocations enable values (typically costs) to be shredded to a more granular level by applying a driver. Allocations are used for a multitude of purposes, including but not limited to **Activity-Based Costing**, **IT & Shared Service Chargeback**, and calculation of the fully loaded cost to produce and provide a good or service to customers. They are a fundamental tool for financial analysis and a cornerstone of managerial reporting operations such as **Customer & Product Profitability**. They are also a useful construct for establishing and managing global Intercompany Transfer Prices for goods and services.

## Setting up the Allocation Transform

From a practical standpoint, allocations are set up in PlaidCloud in much the same way as other data transforms such as joins and lookups. Four configuration parameters must be set for an Allocation transform to succeed.

1. **Specify Preallocated Data**: Specify the preallocated data table in the **Values To Allocate Table** section of the allocation transform.
2. **Specify Driver Data**: Driver data serves as the basis for the ratios used in the allocation. Choose the driver data table in the **Driver Data Table** section of the allocation transform.
3. **Specify the Results Table**: Post-allocated data must be stored in a table. Specify the table in the **Allocation Result Table** section of the allocation transform.
4. **Specify the Assignment Dimension**: Allocations require an assignment dimension, which prescribes how each record or set of records in the preallocated data will be assigned. Specify the assignment dimension in the **Assignment Dimension Hierarchy** section of the allocation transform.
## Key Concepts

The sum of values in an allocated dataset should tie out to the sum of values in the preallocated source data.

Allocations are available in PlaidCloud as a transform option. To set up an allocation, first set up assignments, then configure an allocation transform to use the assignments to allocate inbound records using a specified driver table.

Assignments are special dimensions. They are accessed within the Dimensions section of a PlaidCloud project. To set up an assignment dimension, perform the following steps:

1. From the project screen, navigate to the Dimensions tab
2. Create a new dimension

# Recursive Allocations

> Set up recursive allocations in PlaidCloud to handle multi-pass cost distribution where allocated costs feed subsequent rounds.

Recursive allocations handle the case where a target of one allocation becomes a *source* of the next. This is common when modeling shared services that consume each other: IT serves HR, but HR also serves IT.

## When You Need Recursion

* **Reciprocal services**: two cost pools that consume each other.
* **Layered spreads**: divisional costs cascade through a hierarchy and the lower levels need to absorb the upper levels before re-allocating.
* **Iterative balance**: the model needs to converge after multiple passes (cost pool A allocates some to B, B allocates some back, repeat until stable).
## How Recursion Works in PlaidCloud [Section titled “How Recursion Works in PlaidCloud”](#how-recursion-works-in-plaidcloud) Configure the allocation step with: * **Source table** — where the values start * **Driver table** — the basis for spreading * **Recursion mode** — direct (one pass), reciprocal (resolve mutual dependencies), or iterative (loop until convergence) * **Convergence tolerance** — for iterative mode, how close residuals must be to zero before the loop stops * **Maximum iterations** — safety cap so a non-converging model doesn’t loop forever The output table includes a generation or iteration column so downstream consumers can see which pass each row came from. ## Reciprocal vs Iterative [Section titled “Reciprocal vs Iterative”](#reciprocal-vs-iterative) * **Reciprocal** — solves a simultaneous equation in one mathematical pass. Use when relationships are well-defined and finite. * **Iterative** — runs allocations repeatedly until each round produces a residual smaller than your tolerance. Use when you want explicit control over how many passes happen, or when the relationship isn’t easily inverted. ## Tips [Section titled “Tips”](#tips) * Start with **direct** (single-pass) allocations and confirm the simple model behaves as expected before introducing recursion. * For iterative models, log the residual at each pass while tuning. If residuals don’t shrink, the model has a circular dependency the engine can’t resolve cleanly. * Recursive allocations can be expensive on large datasets. Test on a slice before running across the full source. 
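To make the difference between the iterative and reciprocal modes concrete, here is a small Python sketch with two service pools that consume each other. It illustrates the general techniques (repeated passes until the residual falls under a tolerance, and a closed-form solve of the two simultaneous equations), not PlaidCloud's engine; the function names, the share percentages, and the cost figures are assumptions.

```python
def iterative_allocation(pools, shares, tolerance=0.01, max_iterations=100):
    """Repeat passes until the cost still held in service pools is below tolerance.

    pools:  {pool: starting cost}
    shares: {pool: {other_pool: fraction sent to that pool}}
    Any fraction not sent to another pool leaves to final targets.
    Returns (cost delivered to final targets, residual, passes used).
    """
    balance = dict(pools)
    final = 0.0
    residual = sum(balance.values())
    passes = 0
    while residual >= tolerance and passes < max_iterations:
        next_balance = {pool: 0.0 for pool in pools}
        for pool, amount in balance.items():
            outgoing = shares.get(pool, {})
            for other, fraction in outgoing.items():
                next_balance[other] += amount * fraction
            final += amount * (1.0 - sum(outgoing.values()))
        balance = next_balance
        residual = sum(balance.values())
        passes += 1
    return final, residual, passes


def reciprocal_two_pools(cost_a, cost_b, a_to_b, b_to_a):
    """Solve x = cost_a + b_to_a * y and y = cost_b + a_to_b * x directly."""
    x = (cost_a + b_to_a * cost_b) / (1.0 - a_to_b * b_to_a)
    y = cost_b + a_to_b * x
    return x, y


# IT sends 20% of its cost to HR; HR sends 10% of its cost to IT (made-up figures).
pools = {"IT": 1000.0, "HR": 500.0}
shares = {"IT": {"HR": 0.2}, "HR": {"IT": 0.1}}

final, residual, passes = iterative_allocation(pools, shares, tolerance=0.01)

# Reciprocal mode: fully loaded pool costs in one mathematical pass.
it_total, hr_total = reciprocal_two_pools(1000.0, 500.0, a_to_b=0.2, b_to_a=0.1)
exact_final = it_total * (1 - 0.2) + hr_total * (1 - 0.1)
```

With these figures the residual shrinks on every pass and the loop stops once it is under the tolerance, while the closed-form solve reaches the same total without looping. If the shares in a cycle summed to 100%, the residual would never shrink and only the iteration cap would end the loop, which is the situation the tip about non-shrinking residuals describes.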
## Related [Section titled “Related”](#related) * [Configure an allocation](/guides/allocations/setup/configure-an-allocation/) — base configuration * [Allocation split step](/reference/workflow-steps/allocation/allocation-split/) — workflow step for split allocations * [Troubleshooting allocations](/guides/allocations/results/troubleshooting-allocations/) — when residuals don’t make sense # Connections > Set up and manage PlaidCloud connections — saved configurations that let workflows reach external databases, cloud storage, ERPs, and REST APIs. A **connection** is a saved configuration that lets PlaidCloud reach an external system — a database, cloud storage account, ERP, or REST API. Workflow steps that need to read from or write to that system reference the connection, so credentials and endpoint details live in one place. ## Guides [Section titled “Guides”](#guides) * [Clone a Connection](/guides/connections/clone-connection/) — duplicate an existing connection for a new environment or tenant. * [Singer Sources](/guides/connections/singer-sources/) — connect to sources such as Stripe, GitHub, Slack, and BigQuery with Singer taps, then import their data into project tables. ## Related [Section titled “Related”](#related) * [Connectors reference](/reference/connectors/) — the full catalog of supported systems and the fields each one needs. # Clone a Connection > Clone an existing external data connection in PlaidCloud to reuse its configuration as the starting point for a new connection. ## Description [Section titled “Description”](#description) Cloning duplicates the configuration of an existing connection — host, port, options, credentials reference — so you can edit a few fields and save it as a new connection rather than re-entering every setting. Cloning works for every external data connection type: database, ERP, REST, cloud service, Git, and document. ## Clone a Connection [Section titled “Clone a Connection”](#clone-a-connection) 1. 
Open **Tools > Connections** 2. Select the connection you want to copy 3. Click `Clone` in the toolbar (or right-click the row and select `Clone`) 4. Edit the new connection’s name and any fields that should differ 5. Click `Save` ## Owner-Only Actions [Section titled “Owner-Only Actions”](#owner-only-actions) `Edit`, `Clone`, and `Delete` are only available on connections you own. If those toolbar buttons are greyed out for the selected row, you are not the owner — ask the owner to clone the connection for you, or have them add you as an additional owner via `Edit Owners`. Note Cloning copies the configuration but not any test results or run history. Test the cloned connection before relying on it in a workflow. # Singer Sources > Connect to SaaS, API, and database sources such as Stripe, GitHub, Slack, and BigQuery with Singer taps, then import their data into project tables with a workflow step. ## Description [Section titled “Description”](#description) A **Singer source** lets PlaidCloud pull data from a wide catalog of SaaS apps, APIs, and databases — such as Stripe, GitHub, Slack, and BigQuery — using [Singer](https://www.singer.io/) taps. Each tap is a connector for one source; you pick the tap when you create the connection, and PlaidCloud shows the configuration fields that tap needs. Using a Singer source has two parts: 1. **A Singer Source connection** holds the tap choice and its settings (API token, account ID, start date, and so on). 2. **An Import Singer Source workflow step** discovers the tap’s available streams, lets you choose which to import and where each lands, and runs the extract. Note PlaidCloud ships a curated catalog of permissively licensed taps. The exact configuration fields differ from tap to tap — the connection form is generated from the tap you select. 
## Before You Start [Section titled “Before You Start”](#before-you-start) You’ll need credentials for the source system (for example, a GitHub personal access token or a Stripe API key), and the project and workflow where you want the data to land. ## Create a Singer Source Connection [Section titled “Create a Singer Source Connection”](#create-a-singer-source-connection) 1. Open **Tools > Connections**. 2. Click **New Connection** and choose **Singer Source**. 3. Give the connection a **Name** (for example, `GitHub (prod)`). 4. Choose a **Tap** from the dropdown. The form below it rebuilds to show that tap’s fields. See [Singer Sources](/reference/connectors/singer-sources/) for the full catalog and a link to each source’s configuration docs. 5. Fill in the tap’s configuration fields, then click **Create**. ### Configuration Field Types [Section titled “Configuration Field Types”](#configuration-field-types) The fields depend on the tap, and each is rendered to match the value the tap expects: * **Text** — a single-line value (for example, an account ID or start date). * **Password** — a secret such as an API token or key. Secrets are write-only: they aren’t shown when you edit the connection, and leaving one blank on save keeps the stored value. * **Number** — an integer or decimal (for example, a port or page size). * **List** — one entry per line (for example, a list of repositories or project IDs). * **JSON** — a structured value entered as JSON, used when a tap expects an object or an array of objects. For example, a CSV tap’s file definitions: ```json [ { "entity": "orders", "path": "/data/orders.csv", "keys": ["id"] } ] ``` * **Checkbox** — an on/off option. Required fields are marked, and the form validates them (including that JSON and number fields are well-formed) before it saves. ## Import Streams into a Workflow [Section titled “Import Streams into a Workflow”](#import-streams-into-a-workflow) 1. Open the workflow and go to the **Analyze Steps** tab. 
2. Add a step and choose **Import: Singer Source** as the type. The editor opens with a **Source** tab and a **Streams** tab. 3. On the **Source** tab: * Choose the **Connection** (your Singer Source connection). * Choose a **Sync Mode** — **Full table (replace each run)**, **Incremental (append new data)**, or **Upsert (merge on key)**. * Click **Discover Streams**. Discovery runs on the runner and may take up to about three minutes the first time; the status shows how many streams were found. 4. On the **Streams** tab, you’ll see one row per discovered stream. For each stream you want: * Open its **Stream** panel and check **Import this stream**. * Choose a **Target Table** for where the stream’s data lands. * If the sync mode is **Upsert**, set the stream’s **Key Columns** — one column name per line. The field defaults to the tap’s declared primary key; override it to merge on different columns. Every imported stream needs at least one key column when the mode is **Upsert**. 5. Save the step and run it as part of the workflow (or [on its own](/guides/workflows/running-one-step-in-a-workflow/)). Caution The streams you select *are* the saved set. To change which streams import, click **Discover Streams** again — streams that reappear keep the target table and selection you already set. ## Sync Modes [Section titled “Sync Modes”](#sync-modes) * **Full table (replace each run)** re-extracts the whole stream every run and replaces the target table. Use it for small or fully refreshed sources. * **Incremental (append new data)** extracts only the rows that are new since the last run and appends them, resuming from where the previous run left off. It works only for streams the tap can sync incrementally — those that expose a replication key (such as an updated-at timestamp or an incrementing ID). If you choose incremental and a selected stream has no replication key, the step asks you to switch that stream to full table or deselect it. 
* **Upsert (merge on key)** re-extracts the whole stream each run, then merges it into the target on the stream’s **Key Columns**: rows whose key matches an existing row are updated in place, and rows with a new key are inserted. Existing rows that aren’t in this run are kept. Use it to keep a table in step with a source whose records change over time — without the duplicates an append would create or the full rebuild a replace would do. The target table is created on the first run, so the initial upsert inserts every row. Note Incremental progress is tracked per step. The first incremental run extracts everything; later runs pick up only new rows. Note Upsert re-extracts the full stream each run (it doesn’t use a replication key), so it suits sources where existing rows are updated and you want one row per key. Set each stream’s **Key Columns** on the **Streams** tab; they default to the tap’s declared primary key. Caution Pick key columns that are always present. Upsert matches rows by exact key value, and an empty (null) key never matches another — so rows with a null key are always inserted rather than merged, and can accumulate across runs. ## How Credentials Are Handled [Section titled “How Credentials Are Handled”](#how-credentials-are-handled) The step stores a reference to the connection, not a copy of its credentials. Each run reads the connection’s current credentials at run time. So when you rotate a token or key, update it once on the connection and every step that uses it picks up the new value on its next run — there’s nothing to update on the individual steps. The extract runs in an isolated job that receives only the tap’s own configuration; it has no access to other connections or to PlaidCloud’s internal services. 
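The upsert behavior described under Sync Modes, including the null-key caution, can be modeled with plain Python lists and dictionaries. This is a sketch of the documented semantics only, not the merge PlaidCloud runs against the target table; the `upsert` helper and the sample rows are hypothetical.

```python
def upsert(target, incoming, key_columns):
    """Merge incoming rows into target (a list of dicts) on key_columns."""
    # Index existing rows by key; rows with a null in the key are never indexed.
    index = {}
    for position, row in enumerate(target):
        key = tuple(row.get(column) for column in key_columns)
        if None not in key:
            index[key] = position

    for row in incoming:
        key = tuple(row.get(column) for column in key_columns)
        if None not in key and key in index:
            target[index[key]] = dict(row)      # key matches: update in place
        else:
            target.append(dict(row))            # new key or null key: insert
            if None not in key:
                index[key] = len(target) - 1
    return target


target = [{"id": 1, "status": "open"}, {"id": 2, "status": "open"}]
extract = [
    {"id": 2, "status": "closed"},   # existing key, gets updated
    {"id": 3, "status": "open"},     # new key, gets inserted
    {"id": None, "status": "draft"}, # null key, never matches anything
]

upsert(target, extract, key_columns=["id"])
rows_after_first_run = len(target)

# Running the same extract again adds another copy of the null-key row.
upsert(target, extract, key_columns=["id"])
rows_after_second_run = len(target)
null_key_rows = sum(1 for row in target if row["id"] is None)
```

The row with `id` 1 is absent from the extract and is kept, matching the note that existing rows not in the run are retained. The null-key row is appended on every run, which is why key columns should always be populated.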
## Related [Section titled “Related”](#related) * [Singer Sources catalog](/reference/connectors/singer-sources/) — every available source and its configuration docs * [Import Singer Source step reference](/reference/workflow-steps/import/import-singer/) * [Connections](/guides/connections/) # Dashboards > Create and customize PlaidCloud dashboards to visualize data with interactive charts, graphs, and dynamic metric displays. Build interactive dashboards over PlaidCloud project data — charts, metrics, calculated columns, dynamic filters, and embedded data exploration. # Example Calculated Columns > Learn how to create calculated columns in PlaidCloud dashboards using formulas and expressions for custom data transformations. ## Description [Section titled “Description”](#description) Data in dashboards can be augmented with calculated columns. Each dataset will contain a section for calculated columns. Calculated columns can be written and modified with PostgreSQL-flavored SQL. ## Navigating to a dataset In order to view and edit metrics and calculated expressions, perform the following steps: 1. Sign into plaidcloud.com and navigate to dashboards 2. From within visualize.plaidcloud.com, navigate to Data > Datasets 3. Search for a dataset to view or modify 4. 
Modify the dataset by hovering over the `edit` button beneath `Actions`

## Examples

### Count

```sql
COUNT(*)
```

### Min

```sql
min("MyColumnName")
```

### Max

```sql
max("MyColumnName")
```

### Coalesce (useful for Converting Nulls to 0.0, for Instance)

```sql
coalesce("BaselineCost",0.0)
```

### Substring

```sql
substring("PERIOD",6,2)
```

### Cast

```sql
CAST("YEAR" AS integer)-1
```

### Concat

```sql
concat("Biller Entity" , ' ', "Country_biller")
```

### To\_char

```sql
to_char("date_created", 'YYYY-mm-dd')
```

### Left

```sql
left("period",4)
```

### Divide

Divide, with a workaround for avoiding division-by-zero errors:

```sql
sum("so_infull")/(count(*)+0.00001)
```

Note: A better way to do this would be to check for a null or zero denominator and then coalesce to zero rather than attempting the division.

### Conditional Statement

```sql
CASE
  WHEN "Field_A" = 'Foo' THEN max(coalesce("Value_A",0.0)) - max(coalesce("Value_B",0.0))
END
```

```sql
CASE
  WHEN "sol_otif_pod_missing" = 1 THEN 'POD is missing.'
  ELSE 'POD exists.'
END
```

```sql
case
  when "Customer DC" = "origin_dc" or "order_reason_type" = 'Off Schedule' or "mot_type" = 'UPS' then 'Yes'
  else 'No'
end
```

```sql
CASE
  WHEN "module_type" is NULL THEN '---'
  ELSE "module_type"
END
```

```sql
CASE
  WHEN "NODE_TYPE" = 'External' THEN '3rd Party'
  ELSE "ENTITY_LOCATION_DESCRIPTION"
END
```

### Concatenate

```sql
concat("Class",' > ',"Product Family",' > ',"Meta Series")
```

# Example Metrics

> Explore common dashboard metric examples in PlaidCloud including KPIs, aggregations, and calculated measures for data analysis.

## Description

Data in dashboards can be augmented with metrics. Each dataset will contain a section for Metrics. Metrics can be written and modified with PostgreSQL-flavored SQL.

## Navigating to a dataset

In order to view and edit metrics and calculated expressions, perform the following steps:

1. Sign in to plaidcloud.com and navigate to dashboards
2. From within visualize.plaidcloud.com, navigate to Data > Datasets
3. Search for a dataset to view or modify
4. Modify the dataset by hovering over the `edit` button beneath `Actions`

## Examples

Calculated columns are typically additional columns made by combining logic and existing columns.
### Convert a Date to Text

```sql
to_char("week_ending_sol_del_req", 'YYYY-mm-dd')
```

### Various SUM Examples

Each line below is a separate metric expression.

```sql
SUM(Value)
SUM(-1*"value_usd_mkp") / (0.0001+SUM(-1*"value_usd_base"))
(SUM("Value_USD_VAT")/SUM("Value_USD_HEADER"))*100
SUM(CASE WHEN Material_Type = 'Gloves' THEN delivery_cases ELSE 0 END)
sum("total_cost") / sum("delivery_count")
```

### Various Case Examples

Each line below is a separate metric expression.

```sql
CASE WHEN SUM("distance_dc_xd") = 0 THEN 0 ELSE sum("XD")/sum("distance_dc_xd") END
sum(CASE WHEN "FUNCTION" = 'OM' THEN "VALUE__FC" ELSE 0.0 END)
```

### Count

```sql
count(*)
```

### First and Cast

```sql
public.first(cast("PRETAX_SEQ" AS NUMERIC))
```

### Round

```sql
round(Sum("GROSS PROFIT"),0)
```

### Concat

```sql
concat("GCOA","CC Code")
```

# Formatting Numbers and Other Data Types

> Format numbers in PlaidCloud dashboards including currency, percentages, decimal places, and custom number display patterns.

## Formatting Numbers and Other Data Types

There are two ways of formatting numbers in PlaidCloud. One is to transform the values in the tables directly. The second, more common way is to format them on display, so the values don’t lose precision in the table and the user sees them in a cleaner, more appropriate form.

When I display a value on a dashboard, how do I format it the way I want?

The core way to display a value is through a chart object on a dashboard. Charts can be Tables, Big Numbers, Bar Charts, and so on. Each chart object may have a slightly different place or means to display the values.
For example, in Tables, you can change the format for each column, and for a Big Number, you can change the format of the number.

To change the format, edit the chart and locate the `D3 FORMAT` or `NUMBER FORMAT` field. For a Big Number chart, click on the `CUSTOMIZE` tab, and you will see `NUMBER FORMAT`. For a Table, click on the `CUSTOMIZE` tab, select a number column (displayed with a #) in `CUSTOMIZE COLUMN` and you will see the `D3 FORMAT` field.

The default value is `Adaptive formatting`, which adjusts the format based on the values. If you want a fixed format instead (for example, $12.23 or 12,345,678), select the format you want from the dropdown or manually type a different value (if the field allows).

## D3 Formatting - What is It? [Section titled “D3 Formatting - What is It?”](#d3-formatting---what-is-it)

D3 Formatting is a structured, formalized means to display data results in a particular format. For example, in certain situations you may wish to display a large value as 3B (3 billion), formatted as `.3s` in D3 format, or as 3,001,238,383, formatted as `,d`. Another common example is the choice between rounding dollar values to the nearest dollar with `$,d` and showing them with `$,.2f`, which gives a dollar sign, commas, 2 decimal precision, and fixed point notation.

For a deeper dive into D3, see the following site: [GitHub D3](https://github.com/d3/d3-format)

## General D3 Format [Section titled “General D3 Format”](#general-d3-format)

The general structure of D3 is the following:

`[[fill]align][sign][symbol][0][width][,][.precision][~][type]`

The fill can be any character (like a period, x or anything else). If you have a fill character, you then have an `align` character following it, which must be one of the following:

`>` - Right-aligned within the available space. (Default behavior.)

`<` - Left-aligned within the available space.

`^` - Centered within the available space.

`=` - like `>`, but with any sign and symbol to the left of any padding.

The `sign` can be:

`-` - blank for zero or positive and a minus sign for negative. (Default behavior.)

`+` - a plus sign for zero or positive and a minus sign for negative.

`(` - nothing for zero or positive and parentheses for negative.

(space) - a space for zero or positive and a minus sign for negative.

The `symbol` can be:

`$` - apply currency symbol.

The `zero` (0) option enables zero-padding; this implicitly sets fill to 0 and align to =.

The `width` defines the minimum field width; if not specified, then the width will be determined by the content. For example, if you have 8, the width of the field will be 8 characters.

The `comma` (,) option enables the use of commas as separators (e.g. for thousands).

Depending on the type, the `precision` can either indicate the number of digits that follow the decimal point (types f and %), or the number of significant digits (types (none), g, r, s and p). If the precision is not specified, it defaults to 6 for all types except (none), which defaults to 12.

The `tilde` (`~`) option trims insignificant trailing zeros across all format types. This is most commonly used in conjunction with types r, s and %.

`types`

| Type | Description |
| ---- | ----------- |
| f | fixed point notation. **(common)** |
| d | decimal notation, rounded to integer. **(common)** |
| % | multiply by 100, and then decimal notation with a percent sign. **(common)** |
| g | either decimal or exponent notation, rounded to significant digits. |
| r | decimal notation, rounded to significant digits. |
| s | decimal notation with an SI prefix, rounded to significant digits. |
| p | multiply by 100, round to significant digits, and then decimal notation with a percent sign. |

## Examples [Section titled “Examples”](#examples)

| Expression | Input | Output | Notes |
| ---------- | ----- | ------ | ----- |
| ,d | 12345.67 | 12,346 | rounds the value to the nearest integer, adds commas |
| ,.2f | 12345.678 | 12,345.68 | adds commas, rounds to 2 digits after the decimal |
| $,.2f | 12345.67 | $12,345.67 | adds a $ symbol, has commas, 2 digits after the decimal |
| $,d | 12345.67 | $12,346 | adds a $ symbol, has commas, rounds to the nearest integer |
| .<10, | 151925 | 151,925... | left-aligns the value and pads with periods on the right, 10 characters wide, with commas |
| 0>10 | 12345 | 0000012345 | pad the value with zeroes to the left, 10 characters wide. This works well for fixing the width of a code value |
| ,.2% | 13.215 | 1,321.50% | have commas, 2 digits to the right of a decimal, convert to percentage, and show a % symbol |
| x^+$16,.2f | 123456 | xx+$123,456.00xx | pad with “x”, centered, have a +/- symbol, $ symbol, 16 characters wide, have commas, 2 digit decimal |

# Learning About Dashboards

> Get started building PlaidCloud dashboards with this learning guide covering chart types, data sources, and layout configuration.

## Description [Section titled “Description”](#description)

Dashboards support a wide range of use cases from static reporting to dynamic analysis. Dashboards support complex reporting needs while also providing an intuitive point-and-click interface.

There may be times when you run into trouble. A member of the PlaidCloud Support Team is always available to assist you, but we have also compiled some tips below in case you run into a similar problem.
## **Common Questions and Answers for Dashboard** [Section titled “Common Questions and Answers for Dashboard”](#common-questions-and-answers-for-dashboard) ### Preferred Browser [Section titled “Preferred Browser”](#preferred-browser) Due to frequent caching, Google Chrome is usually the best web browser to use with Dashboard. If you are using another browser and encounter a problem, we suggest first clearing the cache and cookies to see if that resolves the issue. If not, then we suggest switching to Google Chrome and seeing if the problem recurs. ### Sync Delay [Section titled “Sync Delay”](#sync-delay) * *Problem:* After unpublishing and publishing tables in the Dashboards area, the data does not appear to be syncing properly. * *Solutions:* Refresh the dashboard. Currently, old table data is cached, so it is necessary to refresh the dashboard when rebuilding tables. ### Table Sync Error [Section titled “Table Sync Error”](#table-sync-error) * *Problem:* After recreating a table using the same published name as a previous table, the table is not syncing, even after hitting refresh on the dashboard, publishing, unpublishing, and republishing the table. * *Solutions:* Republish the table with a different name. The Dashboard data model does not allow for duplicate tables, or tables with the same published name and project ID. ### Cache Warning [Section titled “Cache Warning”](#cache-warning) * *Problem:* A warning popped up on the upper right saying “Loaded data cached **3 hours ago**. Click to force-refresh.” * *Solutions:* Click on the warning to force-refresh the cache. You can also click the drop-down menu beside “Edit dashboard” and select “Force refresh dashboard” there. Either of these options will refresh within the system and is preferred to refreshing the web browser itself. 
### Permission Warning [Section titled “Permission Warning”](#permission-warning) * *Problem:* My published dashboard is populating with the same error in each section where data should be populated: “This endpoint requires the datasource… permission” * *Solutions:* Check that the datasources are not old. Most likely, the charts are pulling from outdated material. If this happens, update the charts with new datasources. * *Problem:* I am getting the same permission warning from above, but my colleague can view the chart data. * *Solutions:* If the problem is that one individual can see the data in the charts and another cannot, the second person may need to be granted permission by someone within the permitted category. To do so: 1. Go to Charts 2. Select the second small icon of a pencil and paper next to the chart you want to grant access to 3. Click Edit Table 4. Click Detail 5. Click Owners and add the name of the person you want to grant access to and save. Note As a best practice, any time you create and save a new chart, add all applicable individuals to the Owners section at that time. Otherwise, you will have to go back through to edit and add Owners each time someone new needs access. ### Saving Modified Filters to Dashboard [Section titled “Saving Modified Filters to Dashboard”](#saving-modified-filters-to-dashboard) * *Problem:* I modified filters in my draft model and want to save them to my dashboard. The filters are not in the list. In my draft model, a warning stated, “There is no chart definition associated with this component, could it have been deleted? Delete this container and save to remove this message.” * *Solutions:* Go to “Edit Chart.” From there, make sure the “Dashboards” section has the correct dashboard filled in. If it is blank, add the correct dashboard name. ### Formatting Numbers: Breaks [Section titled “Formatting Numbers: Breaks”](#formatting-numbers-breaks) * *Problem:* My number formatting is broken and out of order. 
* *Solutions:* The most likely reason for this break is the use of nulls in a numeric column. Using a filter, eliminate all null numeric columns. Try running it again. If that does not work, review the material provided here: or here: . Finally, always feel free to reach out to a PlaidCloud Support team member. This problem is known, and a more permanent solution is being developed. ### Formatting Numbers [Section titled “Formatting Numbers”](#formatting-numbers) To round numbers to nearest integer: 1. *Do not use:* ,.0f 2. *Instead use:* ,d or $,d for dollars ### Importing Existing Dashboard [Section titled “Importing Existing Dashboard”](#importing-existing-dashboard) * *Problem:* I’m importing an existing dashboard and getting an error on my export. * *Solutions:* First, check whether the dashboard has a “Slug.” To do this, open Edit Dashboard, and the second section is titled Slug. If that section is empty or says “null,” then this is not the problem. Otherwise, if there is any other value in that field, you need to ensure that export JSON has a unique slug value. Change the slug to something unique. # Using Dashboards > Learn how to use and interact with PlaidCloud dashboards including filtering, drilling down, exporting, and sharing visualizations. ## Overview [Section titled “Overview”](#overview) Dashboards let you build interactive views over data from any project or workspace you have access to. A single dashboard can combine tables from multiple projects, mix visualizations with raw exploration, and serve both standing reports and ad-hoc analysis. Dashboards scale from small reference tables to billion-row datasets without configuration changes. ## Editing a Table [Section titled “Editing a Table”](#editing-a-table) The message you receive after creating a new table also directs you to edit the table configuration. While there are more advanced features to edit the configuration, we will start with a limited and more simple portion. 

To edit table configuration:

1. Click on the edit icon of the desired table
2. Click the “List Columns” tab
3. Arrange the columns as desired
4. Click “Save”

This allows you to define the way you want to use specific columns of your table when exploring your data.

* **Groupable:** If you want users to group metrics by a specific field
* **Filterable:** If you need to filter on a specific field
* **Count Distinct:** If you want to get the distinct count of this field
* **Sum:** If this is a metric you want to sum
* **Min:** If this is a metric you want to gather basic summary statistics for
* **Max:** If this is a metric you want to gather basic summary statistics for
* **Is temporal:** This should be checked for any date or time fields

## Exploring Your Data [Section titled “Exploring Your Data”](#exploring-your-data)

To start exploring your data, simply click on the desired table. By default, you’ll be presented with a Table View.

### Getting a Data Count [Section titled “Getting a Data Count”](#getting-a-data-count)

To get the count of all the records in the table:

1. Change the filter to “Since”
2. Enter the desired since filter
   * You can use simple phrases such as “3 years ago”
3. Enter the desired until filter
   * The upper time limit defaults to “now”
4. Select the “Group By” header
5. Type “Count” into the metrics section
6. Select “COUNT(\*)”
7. Click the “Query” button

You should then see your results in the table.

**If you want to find the count of a specific field or restriction:**

1. Type in the desired restriction(s) in the “Group By” field
2. Run the query

Note

When you use “measurement” in a restriction, it refers to the value of the measurement taken, which depends on the type of measurement. You should therefore ensure the measurement types are the same under the “filter” section (e.g. weather\_description and Maximum temperature).

### Restricting Result Number [Section titled “Restricting Result Number”](#restricting-result-number)

If you only need a certain number of results, such as the top 10:

1. Select “Options”
2. Type in the desired max result count in the “Row Limit” section
3. Click “Query”

### Additional Visualization Tools [Section titled “Additional Visualization Tools”](#additional-visualization-tools)

To expand abbreviated values to their full length:

1. Select “Edit Table Config”
2. Click “List Sql Metric”
3. Click “Edit Metric”
4. Click “D3Format”

To edit the unit of measurement:

1. Select “Edit Table Config”
2. Click “List Sql Metric”
3. Click “Edit Metric”
4. Click “SQL Expression”

To change the chart type:

1. Scroll to “Chart Options”
2. Fill in the required fields
3. Click “Query”

From here you are able to set axis labels, margins, ticks, etc.

# Data Management - Tabular

> Manage tabular data in PlaidCloud using tables, views, and the high-performance Lakehouse engine for any-scale data processing.

PlaidCloud’s data layer is built around **tables** (structured row-and-column data) and **views** (saved queries over tables). Both live inside a project and are powered by the Lakehouse engine, which scales from small reference tables to billion-row analytical datasets without configuration changes.
## What’s in This Section [Section titled “What’s in This Section”](#whats-in-this-section) * [Tables and views](/guides/data/tables-views/) — what each is, when to use which, and how they interact * [Table explorer](/guides/data/table-explorer/) — browse and inspect tables in your project * [Publishing data](/guides/data/publish/) — make project tables available to dashboards, BI tools, and downstream systems * [Selecting the latest record in a large history table](/guides/data/selecting-latest-record-in-large-history-table/) — a common pattern with a performance-aware solution ## Where Data Comes From [Section titled “Where Data Comes From”](#where-data-comes-from) Tables are typically populated by **workflows** — automated pipelines that import data, transform it, and write results back. See [Workflows](/guides/workflows/) for how to build them, and [Workflow step reference](/reference/workflow-steps/) for every step type you can use. For connecting external systems as data sources, see [Connections (guide)](/guides/connections/) and [Connectors (reference)](/reference/connectors/). ## Related [Section titled “Related”](#related) * [Concepts](/get-started/concepts/) — how tables relate to workflows, dimensions, and the broader data model * [Projects](/guides/projects/) — projects own the tables; tables don’t exist outside a project * [Dashboards](/guides/dashboards/) — consume published tables for visualization # Publishing Tables > Publish PlaidCloud data tables and views for controlled sharing with downstream applications, reports, and external consumers. Since data pipelines can generate many intermediate tables and views useful for validation and process checks but not suitable for final results reporting, PlaidCloud provides a `Publish` process to help reduce the noise when building Dashboards or pulling data in PlaidXL. The `Publish` process helps clarify which tables and views are final and reliable for reporting purposes. 

## Publish [Section titled “Publish”](#publish)

From the `Tables` tab in a PlaidCloud project configuration, find the table you wish to publish for use in dashboards and PlaidXL. Right-click on the table record and select `Set Published Table Reporting Name` from the menu. This will open a dialog where you can specify a unique published name.

This name does not need to be the same as the table or view name. A different name is often useful when referencing data sources in dashboards and PlaidXL because it can provide a friendlier name to users.

Once the table or view is published, its published name will appear in the `Published As` column in the `Tables` view.

Note

There are some restrictions on published names. They can be a maximum of 63 characters and are limited in the special characters they can contain. This is needed to ensure maximum compatibility with systems, tools, and processes outside of PlaidCloud.

## Unpublish [Section titled “Unpublish”](#unpublish)

Unpublishing a table or view is similar to the publish process. From the `Tables` tab in a PlaidCloud project configuration, find the table you wish to unpublish. Right-click on the table record and select `Set Published Table Reporting Name` from the menu. When the dialog appears to set the published name, select the `Unpublish` button. This will remove the table from Dashboard and PlaidXL usage. The published name will no longer appear in the `Published As` column.

## Renaming [Section titled “Renaming”](#renaming)

Renaming a published table or view is similar to the publish process. From the `Tables` tab in a PlaidCloud project configuration, find the published table you wish to rename. Right-click on the table record and select `Set Published Table Reporting Name` from the menu. When the dialog appears, change the published name to the new desired name. Press the `Publish` button to update the name.
The updated name will now appear in the `Published As` column as well as in Dashboard and PlaidXL.

# Selecting the Latest Record in a Large Version History Table

> Learn how to efficiently select the latest record from large history tables in PlaidCloud using optimized query techniques.

## Challenge [Section titled “Challenge”](#challenge)

You have a table that contains many versions of each record, but you need only the latest version of each.

## Discussion [Section titled “Discussion”](#discussion)

This problem could be solved by selecting the ID and MAX update date into a temporary table. Then that temporary table could be INNER JOINED back to the history table to obtain the result. Unfortunately, this requires two steps and storing an intermediate table that has no function other than finding the latest update.

The more elegant solution performs this operation in a single query, using a Window Function with a sort plus a filter.

## Solution [Section titled “Solution”](#solution)

### The Version History Table [Section titled “The Version History Table”](#the-version-history-table)

| employee\_id | department | salary | update\_date |
| ------------ | ---------- | ------ | ------------ |
| 3 | IT | 90000 | 2024-09-17 |
| 2 | HR | 85000 | 2024-09-17 |
| 5 | HR | 82000 | 2024-09-17 |
| 3 | IT | 77000 | 2023-10-01 |
| 3 | IT | 75000 | 2022-10-04 |
| 5 | IT | 72000 | 2024-07-12 |
| 2 | IT | 67000 | 2024-03-18 |
| 1 | Sales | 62000 | 2022-02-28 |
| 5 | Sales | 60000 | 2023-01-14 |
| 4 | Sales | 58000 | 2021-11-19 |

### Step Setup [Section titled “Step Setup”](#step-setup)

Using an Extract step, create a window function expression in a column called `Rank` like:

```python
func.rank().over(order_by=table.update_date.desc(), partition_by=table.employee_id)
```

On the filter tab in the Extract step, set a filter like:

```python
table.Rank == 1
```

### The Result [Section titled “The Result”](#the-result)

| employee\_id | department | salary | update\_date | Rank |
| ------------ | ---------- | ------ | ------------ | ---- |
| 3 | IT | 90000 | 2024-09-17 | 1 |
| 2 | HR | 85000 | 2024-09-17 | 1 |
| 5 | HR | 82000 | 2024-09-17 | 1 |
| 1 | Sales | 62000 | 2022-02-28 | 1 |
| 4 | Sales | 58000 | 2021-11-19 | 1 |

This approach is highly efficient and allows selection of the latest record in a multi-version history table in a single step. It works by ranking the records within each `employee_id` group by `update_date`, newest first, and then keeping only the records ranked first.

If there are multiple columns that make up the unique row key, you can add them to the `partition_by` argument as a list like:

```python
partition_by=[table.first_column, table.second_column, table.third_column]
```

If you need to apply multi-column sorts, you can pass a list of columns too, like:

```python
order_by=[table.first_column.desc(), table.second_column, table.third_column.desc()]
```

# Table Explorer

> Use the PlaidCloud Table Explorer to browse table schemas, preview data, view statistics, and manage your data table properties.

Table Explorer provides a powerful and readily accessible data exploration tool with built-in filtering, summarization, and other features to make life easy for people working with large and complex data.

Table Explorer supports exploration on any size dataset so you can use the same tool no matter how much your data grows. It also provides point-and-click filtering along with advanced filter capabilities to zero in on the data you need.

The best part is that anywhere in PlaidCloud with tables or views, you can click on those tables and views to explore with Table Explorer. By being fully integrated, data access is only a click away.

The `Grid` view provides a tabular view of the data. The `Details` view provides a summary of each column, a count of unique values, and summary statistics for numeric columns.

Data can be exported directly from a filtered set, and filters can be saved and shared with others.
Finally, the filters and column settings can be saved directly as a workflow `Extract` step.

## The Grid View [Section titled “The Grid View”](#the-grid-view)

The Grid view provides a tabular view of the data.

### Setting the Row Limit [Section titled “Setting the Row Limit”](#setting-the-row-limit)

By default, the row limit is set to 5,000 rows. However, this can be adjusted or disabled entirely.

The number of rows shown and the total size of the dataset are displayed at the bottom of the table. This display provides three key pieces of information:

1. The current row count shown based on the row limit applied
2. The size of the global data after filters are applied
3. The size of the unfiltered global data

Caution

Be careful about disabling the row limit when viewing larger tables (e.g. millions of rows) because this could cause your browser to run slowly. Try using filters to find the data instead.

### Sorting Locally Versus Globally [Section titled “Sorting Locally Versus Globally”](#sorting-locally-versus-globally)

The Grid view provides the ability to click on a column header and sort the data based on that column. However, this method only sorts the rows that have already been retrieved; it does not sort the full dataset. If your retrieved data contains the entire dataset, this distinction is immaterial. If your full dataset is larger than what appears in the browser, however, this may not be the desired sort result.

If you want to sort the global dataset before retrieving the limited data that will appear in your browser, apply the sorts to the columns in the `Details` view by clicking on the `Sort` icon at the top of each column. An additional benefit of using the global sort approach is that you can apply multiple sorts along with a mix of sort directions.
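
The difference between the two sort approaches can be sketched in plain Python. This is only an illustration under assumed data: the row limit, the sample rows, and the helper name `by_salary_desc` are hypothetical and are not part of PlaidCloud.

```python
# Hypothetical illustration; none of these names come from PlaidCloud.
ROW_LIMIT = 3

# The full ("global") dataset, in the order it would be retrieved.
full_dataset = [
    {"employee_id": 3, "salary": 90000},
    {"employee_id": 1, "salary": 62000},
    {"employee_id": 4, "salary": 58000},
    {"employee_id": 2, "salary": 85000},
    {"employee_id": 5, "salary": 82000},
]


def by_salary_desc(rows):
    """Return the rows sorted by salary, highest first."""
    return sorted(rows, key=lambda row: row["salary"], reverse=True)


# Local sort: the row limit is applied first, then the retrieved rows are sorted.
local_sort = by_salary_desc(full_dataset[:ROW_LIMIT])

# Global sort: the whole dataset is sorted first, then the row limit is applied.
global_sort = by_salary_desc(full_dataset)[:ROW_LIMIT]

local_ids = [row["employee_id"] for row in local_sort]
global_ids = [row["employee_id"] for row in global_sort]

print(local_ids)   # [3, 1, 4]
print(global_ids)  # [3, 2, 5]
```

The local sort never sees employees 2 and 5 because they fall outside the retrieved rows, which is why the two results differ whenever the dataset is larger than the row limit.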

## Quick Reference Column List [Section titled “Quick Reference Column List”](#quick-reference-column-list)

All of the columns in the table or view are shown on the left of the Table Explorer window by default. This column list can be toggled on and off using the column list toggle button.

The column list provides a number of useful quick-access features, including:

* Double clicking an item jumps to the column in the `Grid` or `Details` view
* Control visibility of the column through the visibility checkbox
* Use multi-select and right-click to include or exclude many columns at once
* Quickly view the data type of each column using the data type icons
* View the total column count

## The Details View [Section titled “The Details View”](#the-details-view)

The `Details` view provides an efficient way to view the data at a high level and exposes tools to quickly filter down to information with point-and-click operations.

Note

Column summaries are not automatically generated for views. You can click on the column refresh button to calculate the details, though.

### Column Data and Unique Counts [Section titled “Column Data and Unique Counts”](#column-data-and-unique-counts)

Each column is shown, provided it is currently marked as visible. The column summary displays the top 1,000 unique values by count. The number of unique values shown can be adjusted by choosing a different value in the `Detailed Rows Displayed` selection.

### Managing Point-and-Click Filters [Section titled “Managing Point-and-Click Filters”](#managing-point-and-click-filters)

Each column provides for point-and-click filtering by activating the filter toggle at the top of the column. Select the items in the column that you would like to include in the resulting data. Multi-select is supported.

Once you apply a filter, you may want to remove individual items or clear the entire column filter without clearing all filters.
This is accomplished by selecting the dropdown on the column filter button and unchecking columns or selecting the clear all option at the top. ### Managing Summarization [Section titled “Managing Summarization”](#managing-summarization) Summarization of the data can be applied by toggling the `Summarize` button to `On`. When the `Summarize` button is activated, each column will display a summarization type to apply. Adjust the summarization type desired for each column. When the desired summarizations are complete, refresh the data and the summarizations will be applied. Examples of summarization types are Min, Max, Sum, Count, and Count Distinct. ### Finding Distinct Values [Section titled “Finding Distinct Values”](#finding-distinct-values) Activating the `Distinct` button will help reduce the data to only a set of unique records. When the `Distinct` button is active, a *Distinct* checkbox will appear on each column. Uncheck the columns that *DO NOT* define uniqueness of the column to the dataset. For example, if you want to find the unique set of customers in a customer order table, you would only want to select the customer column rather than including the customer order number too. Caution If you include too many columns in the unique records determination, it will appear you have many more distinct results than you should. ### Summary Statistics for Numeric Columns [Section titled “Summary Statistics for Numeric Columns”](#summary-statistics-for-numeric-columns) Integer and numeric columns automatically display summary statistics at the bottom of the column information. This includes: * Min * Max * Mean * Sum * Standard Deviation * Variance These statistics are calculated on the full **filtered** dataset. ## Copying Data [Section titled “Copying Data”](#copying-data) It is sometimes useful to allow for copying of selected data from PlaidCloud so that it can be pasted into other applications such as a spreadsheet. 
From the Copy button in the upper right, there are several copy options available for the data:

* Copy All - Copies all of the data to the clipboard
* Copy Selection - Copies the selected data to the clipboard
* Copy Cell - Copies only the contents of a single cell to the clipboard
* Copy Column - Copies the full contents of the column to the clipboard

## Exporting Data [Section titled “Exporting Data”](#exporting-data)

Exporting from the Table Explorer interface exports the filtered data with only the visible columns. You can export in the following formats:

* Microsoft Excel (xlsx)
* CSV (Comma)
* TSV (Tab)
* PSV (Pipe)

The Download menu also offers the ability to download only the rows visible in the browser. This is based on the row limit specified.

## Additional Actions [Section titled “Additional Actions”](#additional-actions)

Additional useful actions are available under the `Actions` menu.

### Save as Extract Step [Section titled “Save as Extract Step”](#save-as-extract-step)

When exploring data, it is often in the context of determining how to filter it for a data pipeline process. This often consists of applying multiple filters, including advanced filters, to zero in on the desired result. Instead of attempting to replicate all the filters, columns, summarizations, and sorts in an Extract Step, you can simply save the existing Table Explorer settings as a new Extract Step.

### Save as View [Section titled “Save as View”](#save-as-view)

Similar to saving the current Table Explorer settings as an Extract Step above, you can also save the settings directly as a view. This can be particularly useful when trying to construct slices of data for reporting or other downstream processes that don’t require a data pipeline.

### Manage Saved Filters [Section titled “Manage Saved Filters”](#manage-saved-filters)

You never have to lose your filter work. You can save your Table Explorer settings as a saved filter.
Saved filters also include column visibility, summarizations, columns filters, advanced filters, and sorts. You can also let others use a saved filter by checking the `Public` checkbox when saving the filter. From the `Actions` menu you can also choose to delete and rename saved filters. ## Advanced Filters [Section titled “Advanced Filters”](#advanced-filters) While point-and-click column filters allow for quick application of filters to zero in on the desired results, sometimes filter conditions are complex and need more advanced specifications. The advanced filter area provides both a pre-aggregation filter as well as a post-aggregation filter, if `Summarize` is enabled. Any valid Python expression is acceptable to subset the data. Please see [Expressions](/reference/expressions/) for more details and examples. # Using Tables and Views > Create, manage, and query tables and views in PlaidCloud to organize and access your structured data for analysis workflows. Tabular data and information in PlaidCloud is stored in Greenplum data warehouses. This provides massive scalability and performance while using well understood and mature technology to minimize risk of data loss or corruption. In addition, utilizing a data warehouse that operates with a common syntax allows 3rd party tools to connect and explore data directly. Essentially, this makes the PlaidCloud data ecosystem open and explorable while also ensuring industry leading security and access controls. ## Tables [Section titled “Tables”](#tables) Tables hold the physical tabular data throughout PlaidCloud. Individual tables can hold many terabytes of data if needed. Data is stored across many physical servers and is automatically mirrored to ensure data integrity and high availability. Tables consist of columns of various data types. Using an appropriate data type can help with performance and especially the storage size of your data. 
PlaidCloud can do a better job of compressing the data if the data is using the most appropriate data type too. This is usually guessed by PlaidCloud but it is also possible to change the data types using the column mappers in workflow steps. ## Views [Section titled “Views”](#views) Views act just like tables but don’t hold any physical data. They are logical representations of tables derived through a query. Using views can save on storage. There are some limitations to the use of views though. Just be aware of the following: * View Stacking Performance - View stacking (view of a view of a view…etc) can impact performance on very large tables or complex calculations. It might be necessary to create intermediate tables to improve performance. * Dashboard Performance - While perfectly fine to publish a view for Dashboard use, for very large tables you may want to publish a table rather than a view for optimal user experience. * Dynamic Data - The data in a view changes when the underlying referenced table data changes. This can be both a benefit (everything updates automatically) or an unexpected headache if the desire was a static set of data. Note Using views can help speed up workflows since no data movement is necessary at workflow run time. Note Since views contain no data, you will notice that they cannot be used as a target for imports. A table must be used in that case. # Data Management - Dimensions > Manage hierarchical data dimensions in PlaidCloud including attributes, alternate hierarchies, properties, and calculated values. Dimensions are hierarchies you use to slice and aggregate data — cost centers, products, geography, time periods. This section covers managing attributes, alternate hierarchies, properties, and calculated values. # Using Dimensions (Hierarchies) > Create and manage hierarchical dimensions in PlaidCloud including member properties, attributes, and alternate roll-up structures. 
Dimensions in PlaidCloud are **hierarchies** — tree structures that organize things like cost centers, products, accounts, geography, or time periods. They’re the scaffolding that allocations, dashboards, and reports use to slice and aggregate data. A dimension can carry more than just parent-child relationships. Each node can hold properties, aliases, and values — so a cost center hierarchy can also tell you which currency each center reports in, what business unit it rolls up to in an alternate view, and what its operating budget is. Dimensions are managed in the **Dimensions** tab within each project. ## Main Hierarchy [Section titled “Main Hierarchy”](#main-hierarchy) Every dimension has a **main** hierarchy. The main hierarchy defines the complete set of leaf members — every leaf node anywhere in the dimension must appear here. Think of the main hierarchy as the canonical, single-truth tree. Anything in the dimension is a member of the main hierarchy; the question is just *where* in the tree it sits. ## Alternate (attribute) Hierarchies [Section titled “Alternate (attribute) Hierarchies”](#alternate-attribute-hierarchies) Alternate hierarchies are different views of the leaves in the main hierarchy. They can pick a subset of leaves, group them differently, or use entirely different roll-ups. Two common patterns: * **Subset view** — pull a specific set of leaves into a focused tree for a specific report or allocation. The alternate inherits any changes to its members from the main. * **Different roll-up** — same leaves, different parents. For example: the main hierarchy organizes cost centers by department; an alternate organizes the same cost centers by geography. Note Items in the main hierarchy carry attribute labels showing which alternate hierarchies they also belong to. 
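As a rough illustration of how an alternate hierarchy relates to the main one, the sketch below models each hierarchy as a plain child-to-parent mapping in Python. This is not PlaidCloud's API or storage format. The names `main`, `by_geography`, and `leaves`, and the member names, are hypothetical and exist only for this example.

```python
# Illustrative only: hierarchies modeled as child -> parent mappings.
# These structures and names are assumptions, not PlaidCloud objects.

def leaves(hierarchy):
    """Return the members that never appear as a parent."""
    parents = set(hierarchy.values())
    return {child for child in hierarchy if child not in parents}

# Main hierarchy: cost centers organized by department.
main = {
    "Finance": "Company",
    "IT": "Company",
    "CC100": "Finance",
    "CC200": "Finance",
    "CC300": "IT",
}

# Alternate hierarchy: the same cost centers rolled up by geography.
by_geography = {
    "Americas": "World",
    "EMEA": "World",
    "CC100": "Americas",
    "CC300": "Americas",
    "CC200": "EMEA",
}

# Every leaf used by an alternate must already be a leaf of the main hierarchy.
missing = leaves(by_geography) - leaves(main)
print(sorted(leaves(main)))
print(sorted(missing))
```

The two mappings share the same leaves but give them different parents, which is the "different roll-up" pattern. A "subset view" would simply list fewer leaves in the alternate mapping.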
## Managing Dimensions [Section titled “Managing Dimensions”](#managing-dimensions) ### Creating [Section titled “Creating”](#creating) From the **New** button in the toolbar, choose **New Dimension**. Enter a name, a directory (for folder-style organization), and a descriptive memo. Click **Create** — the dimension is ready immediately. You can also create one from a workflow using the [Dimension Create](/reference/workflow-steps/dimensions/dimension-create/) step. ### Deleting [Section titled “Deleting”](#deleting) Select the dimension, open the **Actions** menu, and choose **Delete Dimension**. This removes the dimension and all underlying data. You can also delete from a workflow using the [Dimension Delete](/reference/workflow-steps/dimensions/dimension-delete/) step. If you want to keep the dimension but reset its contents (clear all structure, values, aliases, properties, and alternate hierarchies), use [Dimension Clear](/reference/workflow-steps/dimensions/dimension-clear/) instead of delete. ### Copying [Section titled “Copying”](#copying) Select the dimension, open **Actions**, and choose **Copy Dimension**. Specify a name for the copy and click **Create Copy**. The copy includes values, aliases, properties, and all alternate hierarchies. ### Sorting [Section titled “Sorting”](#sorting) The dimension management area lets you move members up and down and change parents directly. For large hierarchies, doing this by hand gets tedious — use the [Dimension Sort](/reference/workflow-steps/dimensions/dimension-sort/) workflow step to sort programmatically. It’s a big time saver after data loads or major restructures. ## Loading Dimensions [Section titled “Loading Dimensions”](#loading-dimensions) Loading a dimension means converting tabular source data into hierarchical structure. PlaidCloud supports two data shapes for loads: * **Parent-Child** — Two columns: one for parent, one for child. Each row defines one edge in the tree. 
This works for arbitrarily deep, irregular hierarchies. * **Levels** — One column per level. Row by row, each column tells you the parent at that level. Best for regular hierarchies with predictable depth (e.g., country → region → city). Loads can also carry values, aliases, and properties alongside the structure. See the [Dimension Load](/reference/workflow-steps/dimensions/dimension-load/) workflow step for the full set of options. ## Dimension Property Inheritance [Section titled “Dimension Property Inheritance”](#dimension-property-inheritance) A dimension can be configured so that children inherit property values from their ancestors. To turn this on, click the dropdown next to **Properties** and select **Inherited Properties**. Notes on how inheritance behaves: * Inheritance applies to **all** properties in the dimension — you can’t enable it for one property and not others. * If you set a property on a child, then later delete that value, the child reverts to its parent’s value. Children cannot have a null property when the parent has a value. * Setting a property on a node propagates down to its descendants, overriding their inherited value until they’re explicitly set. * Inheritance walks the tree all the way down to leaf nodes. # Dimension Functions for Expressions and Aggregations > Use function expressions in PlaidCloud dimensions to define calculated values, conditional logic, and dynamic member properties. ## Functions for Use in Dimension Hierarchy Expressions [Section titled “Functions for Use in Dimension Hierarchy Expressions”](#functions-for-use-in-dimension-hierarchy-expressions) Within the Dimension Hierarchy screen it is possible to add ‘Aggregations’ and ‘Expressions’. A description for these is included below. 
## Aggregations [Section titled “Aggregations”](#aggregations) An Aggregation is used to display an aggregated value from a table (which can be ‘Sum’, ‘Count’, ‘Min’ or ‘Max’). The following image shows an Aggregation that has been configured to pull values from a ‘Line Item Values’ table so that values can be displayed for each ‘Period’ in the hierarchy. ![Dimension Load](/images/dimensions/expressions/dimension_expressions_1_agg.png) Aggregations can be filtered so that only items matching the filter are displayed. In the following image we have set up the aggregation to show values for a selected item in the ‘Account’ dimension. ![Dimension Load](/images/dimensions/expressions/dimension_expressions_2_agg.png) If these filters are left blank then the data can be filtered by using the dimension filter bar at the top of the screen, as can be seen in the following image: ![Dimension Load](/images/dimensions/expressions/dimension_expressions_3_agg.png) ## Expressions [Section titled “Expressions”](#expressions) Using Expressions it is possible to display values which are calculated based on values from Aggregations displayed for the dimension. Expressions are built using mathematical formulae, which can contain many kinds of operators and some special functions. See the [list of operators](https://mathjs.org/docs/expressions/syntax.html). The functions available are described below. ## Functions [Section titled “Functions”](#functions) ### Column(``) [Section titled “Column(\)”](#columncolumn_name) Fetch a value from a named column for the current row/node. Below we see an example of an Expression being defined to display the result of multiplying the Line Item Value by 2. ![Dimension Load](/images/dimensions/expressions/dimension_expressions_4_func_column.png) ![Dimension Load](/images/dimensions/expressions/dimension_expressions_5_func_column.png) ### Childcount() [Section titled “Childcount()”](#childcount) Returns the number of children for the current row/node.
If the current row/node is a leaf item this will return 0. In the following example this is being used to return the average value for the child nodes of a parent node. ![Dimension Load](/images/dimensions/expressions/dimension_expressions_6_func_childcount.png) ![Dimension Load](/images/dimensions/expressions/dimension_expressions_7_func_childcount.png) ### Leafcount() [Section titled “Leafcount()”](#leafcount) Returns the number of leaf items found in the tree for the current row/node. If the current row/node is a leaf item this will return 1. ![Dimension Load](/images/dimensions/expressions/dimension_expressions_8_func_leafcount.png) ### Descendantcount() [Section titled “Descendantcount()”](#descendantcount) Returns the total number of items found in the tree for the current row/node. If the current row/node is a leaf item this will return 0. ![Dimension Load](/images/dimensions/expressions/dimension_expressions_9_func_desccount.png) ### Siblingcount() [Section titled “Siblingcount()”](#siblingcount) Returns the number of sibling items for the current row/node. The value returned includes the current node. ![Dimension Load](/images/dimensions/expressions/dimension_expressions_10_func_siblingcount.png) ### Nodevalue(“``”,“``”) [Section titled “Nodevalue(“\”,“\”)”](#nodevaluenode_namecolumn_name) Returns the value from a named column for a named node. Here’s an example which is used to show the percentage of the “LIV” total for each row/node. ![Dimension Load](/images/dimensions/expressions/dimension_expressions_11_func_nodevalue.png) ![Dimension Load](/images/dimensions/expressions/dimension_expressions_12_func_nodevalue.png) ### Columntextcompare(“``”,“``”) [Section titled “Columntextcompare(“\”,“\”)”](#columntextcomparecolumn_name-text) Returns a numerical result indicating whether the text in a named column is greater than, less than, or equal to a provided value. If the text from the column equals the provided text then this function returns 0.
If the text from the column is less than the provided text then this function returns -1. If the text from the column is greater than the provided text then this function returns 1. The following example compares the name of the Period to “Jun”. ![Dimension Load](/images/dimensions/expressions/dimension_expressions_15_func_textcompare.png) ![Dimension Load](/images/dimensions/expressions/dimension_expressions_16_func_textcompare.png) ## Conditional Expressions [Section titled “Conditional Expressions”](#conditional-expressions) The examples shown above are fairly simplistic. By using conditionals within expressions it is possible to create more complex expressions. Within Expressions conditionals take the following form: `` ? `` : `` e.g. ‘12 > 6 ? 1000 : 0’ By combining expressions containing both conditionals and functions we can build more complex expressions, such as this example where 100,000 is added to a Line Item Value if the month is “Jun”. ![Dimension Load](/images/dimensions/expressions/dimension_expressions_17_cond.png) ![Dimension Load](/images/dimensions/expressions/dimension_expressions_18_cond.png) ## Another Example: Simple Allocation [Section titled “Another Example: Simple Allocation”](#another-example-simple-allocation) This example shows the amount of a parent’s Line Item Value consumed by using the Resource Driver Value for a leaf node. ![Dimension Load](/images/dimensions/expressions/dimension_expressions_19_alloc.png) ![Dimension Load](/images/dimensions/expressions/dimension_expressions_20_alloc.png) ## Limitations: [Section titled “Limitations:”](#limitations) It is currently not possible to build Expressions which are based on values from other Expressions. Expressions can only be built using values from Aggregations. # Getting Started with Dimensions > Get started with PlaidCloud dimensions to organize hierarchical data with attributes, properties, and calculated values.
Dimensions are PlaidCloud’s hierarchies — the trees you use to organize cost centers, products, accounts, geography, time periods, or any other rolled-up structure. They’re the foundation that allocations, dashboards, and reports build on. ## When to Use a Dimension [Section titled “When to Use a Dimension”](#when-to-use-a-dimension) Reach for a dimension when you need to: * Roll values up from leaves to parents (sum cost by department, then by region, then by company) * Allocate from one set of rows to another based on a shared hierarchy (spread IT cost across business units) * Slice a dashboard by a structure that’s deeper than a flat list (drill from continent → country → city) * Apply the same grouping logic in multiple places — define the tree once, reference it everywhere If your data is already flat and one-dimensional, you don’t need a dimension. A regular table is fine. ## How Dimensions Are Structured [Section titled “How Dimensions Are Structured”](#how-dimensions-are-structured) Every dimension has one **main hierarchy** — the canonical tree where every leaf is registered exactly once. On top of that, you can layer **alternate hierarchies** that re-roll-up the same leaves under different parents (e.g., the main tree groups cost centers by department, an alternate groups them by region). Each node in the tree can carry properties, aliases, and values, so a hierarchy is more than just a tree of names — it’s a structure with metadata that allocations and reports can reference. For the mechanics of creating, copying, sorting, and managing dimensions, see [Using Dimensions (Hierarchies)](/guides/dimensions/dimensions/). 
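To make the roll-up idea concrete, here is a minimal Python sketch of summing leaf values up through a parent-child tree. It only illustrates the concept and does not show how PlaidCloud computes roll-ups internally. The `edges` list, the `leaf_values` figures, and the `roll_up` helper are all made up for this example.

```python
from collections import defaultdict

# Hypothetical (child, parent) edges and leaf values, for illustration only.
edges = [
    ("Finance", "Company"),
    ("IT", "Company"),
    ("CC100", "Finance"),
    ("CC200", "Finance"),
    ("CC300", "IT"),
]
leaf_values = {"CC100": 120.0, "CC200": 80.0, "CC300": 50.0}

def roll_up(edges, leaf_values):
    """Add each leaf value to the leaf itself and to every ancestor above it."""
    parent_of = dict(edges)
    totals = defaultdict(float)
    for leaf, value in leaf_values.items():
        node = leaf
        while node is not None:
            totals[node] += value
            node = parent_of.get(node)  # None once the root is passed
    return dict(totals)

totals = roll_up(edges, leaf_values)
for name in ("CC100", "Finance", "IT", "Company"):
    print(name, totals[name])
```

Because the tree is defined once in `edges`, the same structure could feed a different set of values (budget, headcount) without restating the grouping logic, which is the point of defining a dimension rather than repeating groupings in flat tables.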
## How Dimensions Get Loaded [Section titled “How Dimensions Get Loaded”](#how-dimensions-get-loaded) Dimensions are typically populated from a source table via a workflow: * **Parent-Child format** — two columns (parent, child), one edge per row * **Levels format** — one column per hierarchy level, one full path per row See [Loading and unloading dimensions](/guides/dimensions/loading-unloading/) for the load process, and the [Dimension Load](/reference/workflow-steps/dimensions/dimension-load/) workflow step for the full set of options. ## What’s Next [Section titled “What’s Next”](#whats-next) * [Using Dimensions (Hierarchies)](/guides/dimensions/dimensions/) — full management reference * [Function expressions](/guides/dimensions/function-expressions/) — calculated values inside a hierarchy * [Loading and unloading](/guides/dimensions/loading-unloading/) — moving data in and out * [Allocations](/guides/allocations/) — the main consumer of dimension structure # Loading and Unloading Dimensions > Load and unload dimension data in PlaidCloud including bulk imports, data refresh, and synchronization with external data sources. Dimensions can be maintained from workflow operations by loading data. In addition, dimensional data can be flattened into tabular data and stored in tables. This is often useful for enriching reporting and analytics data. ## Loading Dimensions [Section titled “Loading Dimensions”](#loading-dimensions) Since dimensions represent hierarchical data structures, the load process must convey the relationships in the data. 
PlaidCloud supports two different data structures for loading dimensions: * Parent-Child - The data is organized vertically with a *Parent* column and *Child* column defining each parent of a child throughout the structure * Levels - The data is organized horizontally with each column representing a level in the hierarchy from left to right In addition to structure, other dimension information can be included in the load process such as values, aliases, and properties. See the Workflow Step for [Dimension Load](/reference/workflow-steps/dimensions/dimension-load) for more information. ## Unloading (exporting) Dimensions [Section titled “Unloading (exporting) Dimensions”](#unloading-exporting-dimensions) Exporting dimensions to tables supports two structural approaches: * Parent-Child - The data is organized vertically with a *Parent* column and *Child* column defining each parent of a child throughout the structure * Levels - The data is organized horizontally with each column representing a level in the hierarchy from left to right Properties and values can also be included in the flattened tabular data. See the Workflow Step for [Dimension Export](/reference/workflow-steps/dimensions/dimension-export) for more information. # File Management > Manage file storage accounts in PlaidCloud for importing and exporting data via CSV, Excel, Parquet, and other file formats. PlaidCloud Documents — store, organize, search, and share files alongside your data and workflows. Documents can be sourced from any connected storage account and referenced in workflows. # Account and Access Management > Control access to PlaidCloud document accounts including permissions, backups, ownership settings, and start path configuration. Control access to PlaidCloud document accounts — permissions, backups, ownership transfer, and start-path configuration so each account exposes only the relevant subtree of storage. 
# Control Document Account Access > Control who can access PlaidCloud document storage accounts by configuring permissions, roles, and access restrictions. Four types of access restrictions are available for an account: Private, Workspace, Member Only, and Security Group. The type of restriction set for a user is editable at any time from the account form. Note None of the account access levels reveal the account credentials used to access the documents. Only account owners can view the credentials. ## Updating Account Access [Section titled “Updating Account Access”](#updating-account-access) 1. Select `Document > Manage Accounts` within PlaidCloud 2. Enter the edit mode on the account you wish to change 3. Select the desired access level restriction located under `Security Model` 4. Select the Save button Note Depending on the selected Security Model, there will be different options for assigning which members or security groups are allowed access from the account list under Manage Accounts. ## Restriction Options [Section titled “Restriction Options”](#restriction-options) ### All Workspace Members [Section titled “All Workspace Members”](#all-workspace-members) This access is the simplest since it provides access to all members of the workspace and does not require any additional assignment of members. ### Specific Members Only [Section titled “Specific Members Only”](#specific-members-only) This access setting requires assignment of each member to an account. This option is particularly useful when combined with the single sign-on option of assigning members based on a list of groups sent with the authentication. However, for workspaces with large numbers of members, this approach can often require more effort than desired, which is where security groups become useful. To choose specific members only: 1. Select the members icon from the Manage Accounts list 2. 
Drag the desired members from the `Unassigned Members` column on the left, to the `Assigned Members` column on the right 3. To remove members, do the opposite 4. Select the Save button ### Specific Security Groups Only [Section titled “Specific Security Groups Only”](#specific-security-groups-only) With this option, permission to access an account is granted to specific security groups rather than just individuals. With access restrictions relying on association with a security group or groups, the administration of accounts with much larger user counts becomes much simpler. To edit assigned groups: 1. Select the groups icon from the Manage Accounts list 2. Drag the desired groups from the `Unassigned Groups` column on the left, to the `Assigned Groups` column on the right 3. To remove groups, do the opposite 4. Select the Save button ### Remote Agents [Section titled “Remote Agents”](#remote-agents) PlaidLink agents will often use Document accounts to store files or move files among systems. To allow remote agents access to Document accounts, agents MUST have permission granted. This is a security feature to limit unwanted access to potentially sensitive information. To add agents: 1. Select the agent icon from the Manage Accounts list 2. Drag desired agents from the `Unassigned Agents` column on the left, to the `Assigned Agents` column on the right 3. To remove agents, do the opposite 4. Select the Save button # Document Temporary Storage > Manage temporary file storage in PlaidCloud document accounts for intermediate data processing and short-term file staging. Temporary storage may sound counter-intuitive, but real-world use has shown it to be valuable. Typically, permanent storage is used to move large files between members or among other systems, and file cleanup in these storage locations often happens haphazardly, at best. This causes storage to fill with files that shouldn’t be there, eventually requiring manual cleanup. 
Temporary storage is perfect for sharing or transferring these types of large files because the files are automatically deleted after 24 hours. ## To View Temporary Storage Options [Section titled “To View Temporary Storage Options”](#to-view-temporary-storage-options) 1. Go to `Document > Temp Share` in PlaidCloud ## Shared Temporary Storage [Section titled “Shared Temporary Storage”](#shared-temporary-storage) Shared temporary storage is viewable by all members of the workspace but is not viewable across workspaces. To access the shared temporary storage area, select the `Temp Share` menu and click `Workspace Temp Share` to display a table of files currently in the workspace’s Temp Share area. ### To Add New Files to a Shared Temporary Storage Location [Section titled “To Add New Files to a Shared Temporary Storage Location”](#to-add-new-files-to-a-shared-temporary-storage-location) 1. Select the `Temp Share` menu along the top of the main Document page 2. Click `Workspace Temp Share` 3. Click `Browse` to browse locally stored items 4. Select the desired file and click `Open` 5. Click `Upload` to upload the file to the temporary storage location ### To Download Existing Files From Temporary Storage [Section titled “To Download Existing Files From Temporary Storage”](#to-download-existing-files-from-temporary-storage) 1. Click the left-most icon, which represents the file type ### To Manually Delete a File [Section titled “To Manually Delete a File”](#to-manually-delete-a-file) 1. Click the red delete icon to the left of the file name. Additional details on file management can be found below under “File Explorer”. ## Personal Temporary Storage [Section titled “Personal Temporary Storage”](#personal-temporary-storage) Personal temporary storage is only viewable by the member to which the temp share belongs. This storage option is beneficial because it’s accessible across workspaces.
This functionality makes it easy to move or use files across workspaces if the member is working in multiple workspaces simultaneously. All members of the workspace can upload files to a member’s personal share, using it as a drop box. ### To Upload a File to Another Member’s Personal Share: [Section titled “To Upload a File to Another Member’s Personal Share:”](#to-upload-a-file-to-another-members-personal-share) 1. Select the `Temp Share` menu along the top of the main Document page 2. Select `Drop File to Member Temp.` A list of members will be displayed. 3. Click the left-most icon associated with the member of your choosing 4. Click `Browse` to browse locally stored items 5. Select the desired file and then click `Open` 6. Click `Upload` to upload the file to the member’s personal storage Additional details on file uploading can be found below under “File Explorer”. # Managing Document Account Backups > Configure and manage backup settings for PlaidCloud document storage accounts to protect your files and ensure data recovery. Document enables the backup of any account on a nightly basis. This feature permits backup across different cloud storage providers and on local systems. Essentially, any account is a valid target for the backup of another account. Note You cannot back up an account to itself. The backup process is not limited to a single backup destination. It is possible to specify multiple redundant backup locations if desired. For example, the backup of an internal server to another server may be one location with a second backup sent to Amazon S3 for off-site storage. By using the prefix feature, it’s possible to have a single backup account contain the backups from multiple other accounts. Each account backup set begins its top level folder(s) with a different prefix, making it easy to identify the originating location and simplifying the restoration process.
For example, if you have three different Document accounts but want to set their backup destination to the same location, using a prefix would allow all three accounts to back up properly without the fear of a name collision. ## Reviewing Current Backup Settings [Section titled “Reviewing Current Backup Settings”](#reviewing-current-backup-settings) 1. Go to Document > Manage Accounts 2. Select the backup icon for the account you wish to review ## Creating a Backup Set [Section titled “Creating a Backup Set”](#creating-a-backup-set) 1. Go to Document > Manage Accounts 2. Select the backup icon for the account for which to create a backup 3. Select the `New Backup Set` button 4. Complete the required fields 5. Select the `Create` button The backup process is now scheduled to run nightly (US Time). ## Updating a Backup Set [Section titled “Updating a Backup Set”](#updating-a-backup-set) 1. Go to Document > Manage Accounts 2. Select the backup icon for the account for which to edit a backup 3. Select the edit icon of the desired backup set 4. Adjust the desired information 5. Select the `Update` button ## Deleting a Backup Set [Section titled “Deleting a Backup Set”](#deleting-a-backup-set) 1. Go to Document > Manage Accounts 2. Select the backup icon for the account for which to delete a backup 3. Select the delete icon of the desired backup set 4. Select the `Delete` button Note The backup sets already present will not be deleted but the backup process will no longer run. You can remove the existing backups using Document file and directory management processes. # Managing Document Account Owners > Assign and manage ownership of PlaidCloud document storage accounts to control administrative access and account settings. The member who creates the account is assigned as the owner by default. However, Document accounts are designed to support multiple owners. This feature is helpful when a team is responsible for managing account access or when there is member turnover.
Adding and removing owners is similar to adding and removing access permissions. ## Add or Remove Owners [Section titled “Add or Remove Owners”](#add-or-remove-owners) 1. Go to `Document > Manage Accounts` in PlaidCloud 2. Select the owners icon in the Manage Accounts list 3. Drag new owners from the `Unassigned Members` column on the left to the `Assigned Members` column on the right 4. To remove owners, do the opposite 5. Select the Save button Because only owners have the ability to view and edit an account, account administration is set up with two levels: * The member needs security access to view and manage accounts in general, and * The member must be an owner of the account to view, manage, and change settings of accounts Note The list of accounts to manage will show a member only the accounts to which they are assigned as an account owner. # Using Start Paths in Document Accounts > Configure start paths in PlaidCloud document accounts to control the default directory location when browsing file storage. The account management form allows the configuration of the storage connection information and a start path. A start path allows those who use the account to begin browsing the directory structure further down the directory tree. This particular option is useful when you have multiple teams that need segregated file storage, but you only want one underlying storage service account. The Start Path option in Document accounts is useful for the following reasons: * Controlling access to sub-directories for specific teams and groups * Granting access to only one bucket For example, setting a start path of *teams/team\_1/* for the `Team 1` Document account and *teams/team\_2/* for the `Team 2` Document account provides different start points on a shared account. When a member opens the Team 1 Document account they will begin file navigation inside *teams/team\_1*. They will not be able to move up the tree and see anything above *teams/team\_1*.
Team 2 would have a similar restriction of not being able to navigate into Team 1’s area. This provides the ability to restrict specific teams to lower levels of the tree while allowing other teams higher level access to the tree while not needing any additional cloud storage complexity like additional buckets or special permissions. ## Adding and Updating the Start Path [Section titled “Adding and Updating the Start Path”](#adding-and-updating-the-start-path) 1. Go to Document > Manage Accounts 2. Select the account you wish to edit and enter the edit mode 3. Add a Start Path in the Start Path text field 4. Select the save button ## Start Path Format [Section titled “Start Path Format”](#start-path-format) The path always begins with the bucket name followed by the sub-directories. ```text /folder1/folder2/ ``` # Adding New Document Accounts > Connect cloud and on-prem document storage to PlaidCloud — S3, GCS, Azure Blob, Google Drive, OneDrive, SFTP, WebDAV, and more. Connect cloud and on-prem document storage to PlaidCloud — S3, Google Cloud Storage, Azure Blob, Google Drive, OneDrive, SFTP, WebDAV, and more. Each provider has its own credentials and connection flow. # Add AWS S3 Account > Add an AWS S3 storage account to PlaidCloud for importing and exporting data files using Amazon cloud object storage. ## AWS S3 Setup [Section titled “AWS S3 Setup”](#aws-s3-setup) These steps need to be completed within the AWS console. 1. Sign into or create an Amazon Web Services (AWS) account 2. Go to `All services > Storage > S3` in the console 3. Create a default or test bucket. Note the bucket name and region (e.g. `us-east-1`). 4. Go to `All Services > Security Identity & Compliance > IAM > Users` in the console 5. Select the `Create User` button 6. When prompted, enter a username and select `Access Key - Programmatic access` only. Select the `Next: Permissions` button. 7. Select the option box called `Attach existing policies directly` 8. 
In the filter search box type `s3`. When the list filters down to S3 related items select `AmazonS3FullAccess` by checking the box to the left. Select the `Next: Tags` button. 9. Skip this step by selecting the `Next: Review` button 10. Review the User settings and select `Create user` 11. Capture the keys generated for the user by downloading the CSV or copy/pasting the keys somewhere for use later. You will not be able to retrieve this key again so keep track of it. If you need to regenerate a key simply go back to step 5 above. You should now have everything you need to add your S3 account to PlaidCloud Document. ## PlaidCloud Document Setup [Section titled “PlaidCloud Document Setup”](#plaidcloud-document-setup) 1. Sign into PlaidCloud 2. Select the workspace that the new Document account will reside 3. Go to `Document > Manage Accounts` 4. Select the `+ New Account` button 5. Select `Amazon S3` as the Service Type 6. Fill in a name and description 7. Enter the bucket name and optional path prefix into the **Start Path** field (e.g. `my-bucket` or `my-bucket/data`). The first path segment is the bucket name. 8. Select an appropriate **Security Model** for your use case. Leave it `Private` if unsure. 9. Paste the **Access Key** created in step 11 above into the Access Key ID field under Auth Credentials 10. Paste the **Secret Key** created in step 11 above into the Secret Access Key field under Auth Credentials 11. Enter the **Region** for your bucket (e.g. `us-east-1`, `eu-west-1`). If left blank, defaults to `us-east-2`. 12. Select the Save button and your new Document account is live # Add Azure Blob Storage Account > Add an Azure Blob Storage account to PlaidCloud for importing and exporting data files using Microsoft Azure cloud object storage. ## Azure Blob Storage Setup [Section titled “Azure Blob Storage Setup”](#azure-blob-storage-setup) These steps need to be completed within the Azure portal. 1. 
Sign in to the [Azure portal](https://portal.azure.com) 2. Navigate to **Storage accounts** and select or create a storage account 3. In the left sidebar under **Security + networking**, select **Access keys** 4. Copy the **Storage account name** and one of the **Key** values. Save both for the PlaidCloud Document setup below. 5. Navigate to **Containers** under **Data storage** and create a container if one does not already exist. Note the container name. You should now have everything you need to add your Azure Blob Storage account to PlaidCloud Document. ## PlaidCloud Document Setup [Section titled “PlaidCloud Document Setup”](#plaidcloud-document-setup) 1. Sign into PlaidCloud 2. Select the workspace that the new Document account will reside in 3. Go to `Document > Manage Accounts` 4. Select the `+ New Account` button 5. Select `Azure Blob Storage` as the Service Type 6. Fill in a name and description 7. Enter the container name and optional path prefix into the **Start Path** field (e.g. `my-container/data`). The first path segment is the container name. 8. Select an appropriate **Security Model** for your use case. Leave it `Private` if unsure. 9. Paste the **Storage account name** into the Account Name field under Auth Credentials 10. Paste the **Key** into the Account Key field under Auth Credentials 11. Select the Save button and your new Document account is live # Add Azure Data Lake Gen2 Account > Add an Azure Data Lake Storage Gen2 account to PlaidCloud for importing and exporting data files using hierarchical namespace storage on Azure. ## Azure Data Lake Gen2 Setup [Section titled “Azure Data Lake Gen2 Setup”](#azure-data-lake-gen2-setup) Azure Data Lake Storage Gen2 is built on top of Azure Blob Storage with a hierarchical namespace enabled. These steps need to be completed within the Azure portal. 1. Sign in to the [Azure portal](https://portal.azure.com) 2.
Navigate to **Storage accounts** and select or create a storage account that has **Hierarchical namespace** enabled 3. In the left sidebar under **Security + networking**, select **Access keys** 4. Copy the **Storage account name** and one of the **Key** values. Save both for the PlaidCloud Document setup below. 5. Navigate to **Containers** under **Data storage** and create a filesystem (container) if one does not already exist. Note the filesystem name. You should now have everything you need to add your Azure Data Lake Gen2 account to PlaidCloud Document. ## PlaidCloud Document Setup [Section titled “PlaidCloud Document Setup”](#plaidcloud-document-setup) 1. Sign into PlaidCloud 2. Select the workspace that the new Document account will reside in 3. Go to `Document > Manage Accounts` 4. Select the `+ New Account` button 5. Select `Azure Data Lake Gen2` as the Service Type 6. Fill in a name and description 7. Enter the filesystem name and optional path prefix into the **Start Path** field (e.g. `my-filesystem/data`). The first path segment is the filesystem name. 8. Select an appropriate **Security Model** for your use case. Leave it `Private` if unsure. 9. Paste the **Storage account name** into the Account Name field under Auth Credentials 10. Paste the **Key** into the Account Key field under Auth Credentials 11. Select the Save button and your new Document account is live # Add Backblaze B2 Account > Add a Backblaze B2 storage account to PlaidCloud for importing and exporting data files using affordable cloud object storage. ## Backblaze B2 Setup [Section titled “Backblaze B2 Setup”](#backblaze-b2-setup) These steps need to be completed within the Backblaze B2 console. 1. Sign in to the [Backblaze B2 console](https://secure.backblaze.com/b2_buckets.htm) 2. Navigate to **Buckets** and create a bucket if one does not already exist. Note the bucket name. 3. Navigate to **App Keys** 4. Select **Add a New Application Key** 5.
Give the key a name, select the bucket it should have access to, and choose the appropriate permissions (read and write) 6. Select **Create New Key** 7. Copy the **keyID** (this is your Access Key) and **applicationKey** (this is your Secret Key). Save both for the PlaidCloud Document setup below. The application key is only shown once. 8. Note the **S3 Endpoint** for your bucket’s region. It follows the pattern `https://s3.{region}.backblazeb2.com` (e.g. `https://s3.us-west-004.backblazeb2.com`). This can be found on the bucket details page. You should now have everything you need to add your Backblaze B2 account to PlaidCloud Document. ## PlaidCloud Document Setup [Section titled “PlaidCloud Document Setup”](#plaidcloud-document-setup) 1. Sign into PlaidCloud 2. Select the workspace that the new Document account will reside in 3. Go to `Document > Manage Accounts` 4. Select the `+ New Account` button 5. Select `Backblaze B2` as the Service Type 6. Fill in a name and description 7. Enter the **Start Path** as your S3-compatible endpoint followed by the bucket name: `https://s3.us-west-004.backblazeb2.com/my-bucket` 8. Enter the **Region** for your bucket (e.g. `us-west-004`) 9. Select an appropriate **Security Model** for your use case. Leave it `Private` if unsure. 10. Paste the **keyID** into the Access Key ID field under Auth Credentials 11. Paste the **applicationKey** into the Secret Access Key field under Auth Credentials 12. Select the Save button and your new Document account is live # Add Cloudflare R2 Account > Add a Cloudflare R2 storage account to PlaidCloud for importing and exporting data files using Cloudflare's zero-egress-fee object storage. ## Cloudflare R2 Setup [Section titled “Cloudflare R2 Setup”](#cloudflare-r2-setup) These steps need to be completed within the Cloudflare dashboard. 1. Sign in to the [Cloudflare dashboard](https://dash.cloudflare.com) 2. Select your account, then navigate to **R2 Object Storage** in the left sidebar 3.
Create a bucket if one does not already exist. Note the bucket name. 4. Navigate to **R2 Object Storage > Manage R2 API Tokens** 5. Select **Create API Token** 6. Give the token a name, select the bucket(s) it should have access to, and choose **Object Read & Write** permissions 7. Select **Create API Token** 8. Copy the **Access Key ID** and **Secret Access Key**. Save both for the PlaidCloud Document setup below. The secret is only shown once. 9. Note the **S3 API endpoint** for your account. It follows the pattern `https://{account_id}.r2.cloudflarestorage.com` and is shown on the R2 overview page. You should now have everything you need to add your Cloudflare R2 account to PlaidCloud Document. ## PlaidCloud Document Setup [Section titled “PlaidCloud Document Setup”](#plaidcloud-document-setup) 1. Sign into PlaidCloud 2. Select the workspace that the new Document account will reside in 3. Go to `Document > Manage Accounts` 4. Select the `+ New Account` button 5. Select `Cloudflare R2` as the Service Type 6. Fill in a name and description 7. Enter the **Start Path** as your R2 endpoint followed by the bucket name: `https://{account_id}.r2.cloudflarestorage.com/my-bucket` 8. The **Region** field can be set to `auto` or left blank; R2 automatically selects the closest region 9. Select an appropriate **Security Model** for your use case. Leave it `Private` if unsure. 10. Paste the **Access Key ID** into the Access Key ID field under Auth Credentials 11. Paste the **Secret Access Key** into the Secret Access Key field under Auth Credentials 12. Select the Save button and your new Document account is live # Add DigitalOcean Spaces Account > Add a DigitalOcean Spaces storage account to PlaidCloud for importing and exporting data files using DigitalOcean's S3-compatible object storage. ## DigitalOcean Spaces Setup [Section titled “DigitalOcean Spaces Setup”](#digitalocean-spaces-setup) These steps need to be completed within the DigitalOcean control panel. 1.
Sign in to the [DigitalOcean Control Panel](https://cloud.digitalocean.com) 2. Navigate to **Spaces Object Storage** in the left sidebar 3. Create a Space if one does not already exist. Note the Space name and region (e.g. `nyc3`). 4. Navigate to **API > Spaces Keys** (under the Tokens section) 5. Select **Generate New Key** 6. Give the key a name 7. Copy the **Key** (Access Key) and **Secret**. Save both for the PlaidCloud Document setup below. The secret is only shown once. 8. Note the endpoint URL for your Space’s region. It follows the pattern `https://{region}.digitaloceanspaces.com` (e.g. `https://nyc3.digitaloceanspaces.com`) You should now have everything you need to add your DigitalOcean Spaces account to PlaidCloud Document. ## PlaidCloud Document Setup [Section titled “PlaidCloud Document Setup”](#plaidcloud-document-setup) 1. Sign into PlaidCloud 2. Select the workspace that the new Document account will reside in 3. Go to `Document > Manage Accounts` 4. Select the `+ New Account` button 5. Select `DigitalOcean Spaces` as the Service Type 6. Fill in a name and description 7. Enter the **Start Path** as the endpoint URL followed by the Space name: `https://nyc3.digitaloceanspaces.com/my-space` 8. Enter the **Region** (e.g. `nyc3`, `sfo3`, `ams3`) 9. Select an appropriate **Security Model** for your use case. Leave it `Private` if unsure. 10. Paste the **Key** into the Access Key ID field under Auth Credentials 11. Paste the **Secret** into the Secret Access Key field under Auth Credentials 12. Select the Save button and your new Document account is live # Add FTP Account > Add an FTP (File Transfer Protocol) account to PlaidCloud for importing and exporting data files using traditional FTP servers. ## FTP Server Setup [Section titled “FTP Server Setup”](#ftp-server-setup) Ensure the following are available from your FTP server administrator: 1. The **FTP server URL** (e.g. `ftp://ftp.yourcompany.com` or `ftp://192.168.1.100`) 2.
A **username** with access to the target directory 3. A **password** for authentication Note: FTP transmits credentials and data in plain text. For production use, consider SFTP instead, which encrypts all traffic over SSH. Use FTP only when connecting to legacy systems that do not support SFTP. You should now have everything you need to add your FTP account to PlaidCloud Document. ## PlaidCloud Document Setup [Section titled “PlaidCloud Document Setup”](#plaidcloud-document-setup) 1. Sign into PlaidCloud 2. Select the workspace that the new Document account will reside in 3. Go to `Document > Manage Accounts` 4. Select the `+ New Account` button 5. Select `FTP` as the Service Type 6. Fill in a name and description 7. Enter the **FTP server URL** into the **Start Path** field (e.g. `ftp://ftp.yourcompany.com`) 8. Select an appropriate **Security Model** for your use case. Leave it `Private` if unsure. 9. Enter the **username** into the Username field under Auth Credentials 10. Enter the **password** into the Password field under Auth Credentials 11. Select the Save button and your new Document account is live # Add Google Cloud Storage Account > Add a Google Cloud Storage account to PlaidCloud for importing and exporting data files using Google cloud object storage. ## Google Cloud Setup [Section titled “Google Cloud Setup”](#google-cloud-setup) These steps need to be completed within Google Cloud Platform. 1. Sign into or create a Google Cloud Platform account 2. Select or create a project where the Google Cloud Storage account will reside 3. Go to `Cloud Storage > Browser` in the Google Cloud Platform console 4. Create a default or test bucket 5. Go to `IAM & Admin > Service Accounts` in the Google Cloud Platform console 6. Select the `+ Create Service Account` button 7. Complete the service account information and create the account 8. Find the service account just created in the list of service accounts and select `Manage Keys` from the context menu on the right 9.
Under the `Add Key` menu, select `Create a Key` 10. When prompted, select JSON format for the key. This will generate the key and automatically download it to your desktop. You will not be able to retrieve this key again, so keep track of it. If you need to regenerate a key, go back to step 8 above. 11. Go to `IAM & Admin > IAM` in the Google Cloud Platform console 12. Find the service account you just created and click on the edit permissions icon 13. Add `Storage Admin` and `Storage Transfer Admin` rights for the service account and save. Note that less permissive rights can be assigned, but doing so will limit the functionality available through Document. You should now have everything you need to add your GCS account to PlaidCloud Document. ## PlaidCloud Document Setup [Section titled “PlaidCloud Document Setup”](#plaidcloud-document-setup) 1. Sign into PlaidCloud 2. Select the workspace that the new Document account will reside in 3. Go to `Document > Manage Accounts` 4. Select the `+ New Account` button 5. Select `Google Cloud Storage` as the Service Type 6. Fill in a name and description 7. Enter the bucket name and optional path prefix into the **Start Path** field (e.g. `my-bucket` or `my-bucket/data`). The first path segment is the bucket name. 8. Select an appropriate **Security Model** for your use case. Leave it `Private` if unsure. 9. Open the Service Account JSON key file you downloaded in step 10 above and copy the entire contents 10. Paste the contents into the **System User JSON Key** field under Auth Credentials 11. Select the Save button and your new Document account is live # Add Google Drive Account > Add a Google Drive storage account to PlaidCloud for importing and exporting data files using Google Drive cloud storage. ## Google Drive Setup [Section titled “Google Drive Setup”](#google-drive-setup) These steps need to be completed within the Google Cloud Console to create a service account that PlaidCloud can use to access Google Drive.
### Create a Google Cloud Project [Section titled “Create a Google Cloud Project”](#create-a-google-cloud-project) 1. Sign in to the [Google Cloud Console](https://console.cloud.google.com) 2. Create a new project or select an existing one ### Enable the Google Drive API [Section titled “Enable the Google Drive API”](#enable-the-google-drive-api) 1. Navigate to **APIs & Services > Library** 2. Search for **Google Drive API** 3. Select it and click **Enable** ### Create a Service Account [Section titled “Create a Service Account”](#create-a-service-account) 1. Navigate to **APIs & Services > Credentials** 2. Select **+ Create Credentials > Service account** 3. Enter a name for the service account (e.g. `plaidcloud-drive`) 4. Select **Create and Continue** 5. Optionally grant roles (e.g. **Viewer** for read-only, or **Editor** for read/write). Select **Continue**. 6. Select **Done** ### Generate a Service Account Key [Section titled “Generate a Service Account Key”](#generate-a-service-account-key) 1. In the **Service Accounts** list, select the service account you just created 2. Navigate to the **Keys** tab 3. Select **Add Key > Create new key** 4. Choose **JSON** format and select **Create** 5. A JSON key file will download. Save this file securely — it contains the credentials PlaidCloud will use. ### Share Drive Content With the Service Account [Section titled “Share Drive Content With the Service Account”](#share-drive-content-with-the-service-account) 1. Copy the service account’s email address (e.g. `plaidcloud-drive@your-project.iam.gserviceaccount.com`) 2. In Google Drive, share the folder(s) you want PlaidCloud to access with this email address, granting **Editor** access You should now have everything you need to add your Google Drive account to PlaidCloud Document. ## PlaidCloud Document Setup [Section titled “PlaidCloud Document Setup”](#plaidcloud-document-setup) 1. Sign into PlaidCloud 2. 
Select the workspace that the new Document account will reside in 3. Go to `Document > Manage Accounts` 4. Select the `+ New Account` button 5. Select `Google Drive` as the Service Type 6. Fill in a name and description 7. Enter the shared folder path into the **Start Path** field, or leave it blank to access all shared content 8. Select an appropriate **Security Model** for your use case. Leave it `Private` if unsure. 9. Paste the entire contents of the JSON key file into the OAuth2 Credentials JSON field under Auth Credentials 10. Select the Save button and your new Document account is live # Add Linode Object Storage Account > Add a Linode (Akamai) Object Storage account to PlaidCloud for importing and exporting data files using Linode's S3-compatible cloud storage. ## Linode Object Storage Setup [Section titled “Linode Object Storage Setup”](#linode-object-storage-setup) These steps need to be completed within the Linode Cloud Manager. 1. Sign in to the [Linode Cloud Manager](https://cloud.linode.com) 2. Navigate to **Object Storage** in the left sidebar 3. Create a bucket if one does not already exist. Note the bucket name (called **label**) and region (e.g. `us-east-1`). 4. Navigate to **Object Storage > Access Keys** 5. Select **Create Access Key** 6. Give the key a label and select the bucket(s) it should have access to with read/write permissions 7. Select **Create Access Key** 8. Copy the **Access Key** and **Secret Key**. Save both for the PlaidCloud Document setup below. The secret is only shown once. 9. Note the endpoint URL for your bucket’s region. It follows the pattern `https://{region}.linodeobjects.com` (e.g. `https://us-east-1.linodeobjects.com`) You should now have everything you need to add your Linode Object Storage account to PlaidCloud Document. ## PlaidCloud Document Setup [Section titled “PlaidCloud Document Setup”](#plaidcloud-document-setup) 1. Sign into PlaidCloud 2. Select the workspace that the new Document account will reside in 3.
Go to `Document > Manage Accounts` 4. Select the `+ New Account` button 5. Select `Linode Object Storage` as the Service Type 6. Fill in a name and description 7. Enter the **Start Path** as the endpoint URL followed by the bucket name: `https://us-east-1.linodeobjects.com/my-bucket` 8. Enter the **Region** (e.g. `us-east-1`) 9. Select an appropriate **Security Model** for your use case. Leave it `Private` if unsure. 10. Paste the **Access Key** into the Access Key ID field under Auth Credentials 11. Paste the **Secret Key** into the Secret Access Key field under Auth Credentials 12. Select the Save button and your new Document account is live # Add MinIO Account > Add a MinIO storage account to PlaidCloud for importing and exporting data files using self-hosted S3-compatible object storage. ## MinIO Setup [Section titled “MinIO Setup”](#minio-setup) These steps need to be completed within the MinIO Console or via the `mc` CLI. 1. Sign in to the MinIO Console (e.g. `https://your-minio-host:9001`) 2. Navigate to **Buckets** and create a bucket if one does not already exist. Note the bucket name. 3. Navigate to **Identity > Users** (or **Access Keys**) 4. Create a new user or service account with read/write access to the target bucket 5. Copy the **Access Key** and **Secret Key** generated for the user. Save both for the PlaidCloud Document setup below. 6. Note the MinIO endpoint URL (e.g. `https://play.min.io` or `https://minio.yourcompany.com`) You should now have everything you need to add your MinIO account to PlaidCloud Document. ## PlaidCloud Document Setup [Section titled “PlaidCloud Document Setup”](#plaidcloud-document-setup) 1. Sign into PlaidCloud 2. Select the workspace that the new Document account will reside in 3. Go to `Document > Manage Accounts` 4. Select the `+ New Account` button 5. Select `MinIO` as the Service Type 6. Fill in a name and description 7.
Enter the **Start Path** as your MinIO endpoint URL followed by the bucket name and optional prefix: `https://minio.yourcompany.com/my-bucket/optional/prefix` 8. Enter the **Region** if your MinIO deployment uses regions; otherwise leave blank 9. Select an appropriate **Security Model** for your use case. Leave it `Private` if unsure. 10. Paste the **Access Key** into the Access Key ID field under Auth Credentials 11. Paste the **Secret Key** into the Secret Access Key field under Auth Credentials 12. Select the Save button and your new Document account is live # Add OneDrive Account > Add a Microsoft OneDrive storage account to PlaidCloud for importing and exporting data files using OneDrive cloud storage. ## Onedrive Setup [Section titled “Onedrive Setup”](#onedrive-setup) These steps need to be completed within the Azure portal to register an application and obtain the credentials PlaidCloud needs to access OneDrive. ### Register an Application in Azure [Section titled “Register an Application in Azure”](#register-an-application-in-azure) 1. Sign in to the [Azure portal](https://portal.azure.com) and navigate to **Microsoft Entra ID**. 2. In the left sidebar, select **App registrations**. 3. Click **+ New registration**. 4. Enter a name for the application (e.g., `PlaidCloud`). 5. Under **Supported account types**, select the option that matches your organization: * **Accounts in this organizational directory only** — for a single-tenant setup (most common) * **Accounts in any organizational directory** — for multi-tenant access 6. Leave the **Redirect URI** blank. 7. Click **Register**. ### Copy the Client ID and Tenant ID [Section titled “Copy the Client ID and Tenant ID”](#copy-the-client-id-and-tenant-id) After registration, you will land on the application overview page. 1. Copy the **Application (client) ID** — this is your **Client ID**. Save it for the PlaidCloud Document setup below. 2. Copy the **Directory (tenant) ID** — this is your **Tenant ID**. 
Save it as well. Both values are displayed on the application overview page immediately after registration. ### Create a Client Secret [Section titled “Create a Client Secret”](#create-a-client-secret) 1. In the left sidebar, select **Certificates & secrets** under **Manage**. 2. Click **+ New client secret**. 3. Enter a description (e.g., `PlaidCloud`) and choose an expiration period. 4. Click **Add**. 5. Copy the **Value** of the newly created secret immediately — this is your **Client Secret**. It will not be shown again after you leave this page. ### Grant API Permissions [Section titled “Grant API Permissions”](#grant-api-permissions) 1. In the left sidebar, select **API permissions** under **Manage**. 2. Click **+ Add a permission**. 3. Select **Microsoft Graph**. 4. Add the following permissions, selecting the type indicated for each: * `Directory.ReadWrite.All` (Application) — Read and write directory data * `Files.Read.All` (Application) — Read files in all site collections * `Files.ReadWrite.All` (Application) — Read and write files in all site collections * `Sites.ReadWrite.All` (Application) — Read and write items in all site collections * `User.Read` (Delegated) — Sign in and read user profile * `User.Read.All` (Application) — Read all users’ full profiles 5. Click **Add permissions**. 6. Click **Grant admin consent for \[your organization]** and confirm. *** ### Find the Onedrive Drive Path (start Path) [Section titled “Find the Onedrive Drive Path (start Path)”](#find-the-onedrive-drive-path-start-path) The **Start Path** in PlaidCloud Document controls which drive or folder in OneDrive is used as the root for the account. In the most common scenario, the registered application has access to multiple drives or SharePoint sites. In this case the Start Path must begin with the name of the drive or site. 
For most OneDrive for Business accounts this is simply: ```text Documents ``` To target a specific subfolder within that drive, append the folder path: ```text Documents/Finance Documents/Shared/Data ``` Note: If the application only has access to a single drive, the Start Path can be left blank to use the root of that drive. When in doubt, start with `Documents` as the drive name. *** You should now have everything you need to add your OneDrive account to PlaidCloud Document. ## PlaidCloud Document Setup [Section titled “PlaidCloud Document Setup”](#plaidcloud-document-setup) 1. Sign into PlaidCloud 2. Select the workspace that the new Document account will reside in 3. Go to `Document > Manage Accounts` 4. Select the `+ New Account` button 5. Select `OneDrive` as the Service Type 6. Fill in a name and description 7. Enter the folder path identified above into the **Start Path** field, or leave it blank to use the root of the drive 8. Select an appropriate **Security Model** for your use case. Leave it `Private` if unsure. 9. Paste the **Client ID** copied from the Azure app registration into the Public Key/User text field under Auth Credentials 10. Paste the **Client Secret** copied from the Azure app registration into the Private Key/Password text field under Auth Credentials 11. Paste the **Tenant ID** copied from the Azure app registration into the Tenant ID field under Auth Credentials 12. Select the Save button and your new Document account is live # Add SFTP Account > Add an SFTP (Secure File Transfer Protocol) account to PlaidCloud for importing and exporting data files using SSH-based file transfer. ## SFTP Server Setup [Section titled “SFTP Server Setup”](#sftp-server-setup) Ensure the following are available from your SFTP server administrator: 1. The **hostname or IP address** of the SFTP server 2. The **SSH port** (default is `22`) 3. A **username** with access to the target directory 4.
Either a **password** or an **SSH private key** for authentication 5. Optionally, the **server’s SSH host key fingerprint** for strict host verification You should now have everything you need to add your SFTP account to PlaidCloud Document. ## PlaidCloud Document Setup [Section titled “PlaidCloud Document Setup”](#plaidcloud-document-setup) 1. Sign into PlaidCloud 2. Select the workspace that the new Document account will reside in 3. Go to `Document > Manage Accounts` 4. Select the `+ New Account` button 5. Select `Secure File Transfer (SFTP)` as the Service Type 6. Fill in a name and description 7. Enter the remote directory path into the **Start Path** field (e.g. `/data/uploads`) 8. Select an appropriate **Security Model** for your use case. Leave it `Private` if unsure. 9. Enter the **username** into the Public Key/User field under Auth Credentials 10. Enter the **password** into the Private Key/Password field under Auth Credentials 11. Navigate to the **SSH Config** tab 12. Enter the **Host or IP Address** of the SFTP server 13. Enter the **SSH Connection Port** (default `22`) 14. If using key-based authentication instead of a password, paste the **RSA Private Key** into the RSA Private Key field. When a private key is provided, it takes precedence over the password. 15. Optionally paste the **Remote Server RSA Fingerprint** for strict host key verification. Leave blank to auto-fill on first connection. 16. Select the Save button and your new Document account is live Note: You can test the SSH connection using the **Test SSH Connection** button on the SSH Config tab before saving. # Add Wasabi Hot Storage Account > Add a Wasabi Hot Storage account to PlaidCloud for importing and exporting data files using cost-effective cloud storage. ## Wasabi Hot Storage Setup [Section titled “Wasabi Hot Storage Setup”](#wasabi-hot-storage-setup) These steps need to be completed within the Wasabi Hot Storage console. 1. Sign into or create a Wasabi Hot Storage account 2.
Go to `Buckets` in the console 3. Create a default or test bucket 4. Go to `Users` in the console 5. Select the `Create User` button 6. When prompted, enter a username and select `Programmatic (create API key)` as the user type 7. Skip the group assignment. Select the `Next` button 8. Select the plus icon next to the `WasabiFullAccess` policy to attach the policy to the user. Select the `Next` button. 9. Review the User settings and select `Create User` 10. Capture the keys generated for the user by downloading the CSV or copying the keys somewhere safe for later use. You will not be able to retrieve the secret key again, so keep track of it. If you need to regenerate a key, go back to step 5 above. You should now have everything you need to add your Wasabi account to PlaidCloud Document. ## PlaidCloud Document Setup [Section titled “PlaidCloud Document Setup”](#plaidcloud-document-setup) 1. Sign into PlaidCloud 2. Select the workspace that the new Document account will reside in 3. Go to `Document > Manage Accounts` 4. Select the `+ New Account` button 5. Select `Wasabi Hot Storage` as the Service Type 6. Fill in a name and description 7. Enter the bucket name and optional path prefix into the **Start Path** field (e.g. `my-bucket` or `my-bucket/data`). The first path segment is the bucket name. 8. Select an appropriate **Security Model** for your use case. Leave it `Private` if unsure. 9. Paste the **Access Key** created in step 10 above into the Access Key ID field under Auth Credentials 10. Paste the **Secret Key** created in step 10 above into the Secret Access Key field under Auth Credentials 11. Enter the **Region** if your Wasabi account uses a specific region; otherwise leave blank 12. Select the Save button and your new Document account is live # Add WebDAV Account > Add a WebDAV storage account to PlaidCloud for importing and exporting data files using WebDAV-compatible servers such as Nextcloud, ownCloud, or Apache.
## WebDAV Server Setup [Section titled “WebDAV Server Setup”](#webdav-server-setup) Ensure the following are available from your WebDAV server administrator: 1. The **WebDAV endpoint URL** (e.g. `https://nextcloud.yourcompany.com/remote.php/dav/files/username/`) 2. A **username** with access to the target directory 3. A **password** or app-specific password for authentication Note: Many cloud services expose a WebDAV interface. For example, Nextcloud uses `https://your-server/remote.php/dav/files/{username}/` and ownCloud uses `https://your-server/remote.php/webdav/`. Check your provider’s documentation for the correct URL. You should now have everything you need to add your WebDAV account to PlaidCloud Document. ## PlaidCloud Document Setup [Section titled “PlaidCloud Document Setup”](#plaidcloud-document-setup) 1. Sign into PlaidCloud 2. Select the workspace that the new Document account will reside in 3. Go to `Document > Manage Accounts` 4. Select the `+ New Account` button 5. Select `WebDAV` as the Service Type 6. Fill in a name and description 7. Enter the full **WebDAV endpoint URL** into the **Start Path** field (e.g. `https://nextcloud.yourcompany.com/remote.php/dav/files/username/`) 8. Select an appropriate **Security Model** for your use case. Leave it `Private` if unsure. 9. Enter the **username** into the Username field under Auth Credentials 10. Enter the **password** into the Password field under Auth Credentials 11. Select the Save button and your new Document account is live # Searching Documents > Find files in a PlaidCloud Document account using inline search with live progress, advanced filters, and reveal-in-folder. ## Description [Section titled “Description”](#description) The Document browser includes a search bar at the top of the file list. Searches run live against the connected backend (S3, Azure Blob, Google Drive, OneDrive, etc.) so results reflect the current state of the account, not a stale index.
## Run a Search [Section titled “Run a Search”](#run-a-search) 1. Open a Document account 2. Click in the **Search files…** field at the top of the file list 3. Type a name pattern 4. Watch results stream into the file list Press `Esc` or click the clear icon at the end of the field to exit search mode and return to the regular file list. The status line below the field shows how many folders have been scanned and how many matches have been found. ## Advanced Filters [Section titled “Advanced Filters”](#advanced-filters) Click `Advanced` next to the search field to show the filter row. Combine any of these filters with the name pattern: * **Ext:** — comma-separated extensions, e.g. `pdf,xlsx`. No spaces. * **Kind:** — `Files & folders` (default), `Files only`, or `Folders only`. * **Size (bytes):** — minimum and/or maximum size. * **Modified:** — on/after date and on/before date, both as `YYYY-MM-DD`. Filters apply on top of the name pattern. Click `Clear` to wipe the query and all filters. ## Highlighting and Reveal [Section titled “Highlighting and Reveal”](#highlighting-and-reveal) * While the search is running, results stream in arrival order so you can act on early matches without waiting. * Once the stream completes, the file list re-sorts by name and the matched substring is highlighted in each row. * Right-click any result and select `Reveal in folder` to jump to the file in its containing directory and exit search mode. ## Searching Across Backends [Section titled “Searching Across Backends”](#searching-across-backends) Search uses the storage backend’s own search API where one is available, so results match what you’d see in the backend’s native UI: * **Google Drive** — uses the Drive API. * **OneDrive / SharePoint** — uses Microsoft Graph. For object stores (S3 and S3-compatible, Azure Blob, GCS, etc.) PlaidCloud crawls the configured paths in parallel. 
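As an illustration only, the sketch below shows one way a name pattern, the advanced filters, and a parallel crawl of several folders could fit together. It is not PlaidCloud's implementation: the `Entry`, `Filters`, `matches`, and `search` names are hypothetical, the in-memory `listing` dictionary stands in for a storage backend's list call, and case-insensitive substring matching is an assumption.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from datetime import date
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Entry:
    """Hypothetical record for one file or folder returned by a listing."""
    path: str
    is_folder: bool = False
    size: int = 0                      # bytes
    modified: Optional[date] = None


@dataclass(frozen=True)
class Filters:
    """Hypothetical mirror of the search field and the Advanced filter row."""
    pattern: str = ""                  # name pattern typed in the search field
    ext: str = ""                      # comma-separated, no spaces, e.g. "pdf,xlsx"
    kind: str = "Files & folders"      # or "Files only" / "Folders only"
    min_size: Optional[int] = None
    max_size: Optional[int] = None
    modified_from: Optional[date] = None
    modified_to: Optional[date] = None


def matches(entry: Entry, f: Filters) -> bool:
    """Return True when the entry passes the name pattern and every set filter."""
    name = entry.path.rstrip("/").rsplit("/", 1)[-1]
    # Assumption: the pattern is a case-insensitive substring of the name.
    if f.pattern.lower() not in name.lower():
        return False
    if f.kind == "Files only" and entry.is_folder:
        return False
    if f.kind == "Folders only" and not entry.is_folder:
        return False
    if f.ext:
        wanted = {e.lower() for e in f.ext.split(",")}
        suffix = name.rsplit(".", 1)[-1].lower() if "." in name else ""
        if suffix not in wanted:
            return False
    if f.min_size is not None and entry.size < f.min_size:
        return False
    if f.max_size is not None and entry.size > f.max_size:
        return False
    if f.modified_from is not None and (
        entry.modified is None or entry.modified < f.modified_from
    ):
        return False
    if f.modified_to is not None and (
        entry.modified is None or entry.modified > f.modified_to
    ):
        return False
    return True


def search(listing: Dict[str, List[Entry]], folders: List[str],
           f: Filters, workers: int = 4) -> List[Entry]:
    """Scan each folder on a worker thread, then sort the hits by path."""
    def scan(folder: str) -> List[Entry]:
        # Stand-in for a backend list call restricted to one folder.
        return [e for e in listing.get(folder, []) if matches(e, f)]

    with ThreadPoolExecutor(max_workers=workers) as pool:
        hits = [e for found in pool.map(scan, folders) for e in found]
    return sorted(hits, key=lambda e: e.path)


listing = {
    "my-bucket/data": [
        Entry("my-bucket/data/report.pdf", False, 2048, date(2024, 1, 15)),
        Entry("my-bucket/data/report.xlsx", False, 10, date(2023, 1, 1)),
        Entry("my-bucket/data/reports", True),
    ],
    "my-bucket/archive": [
        Entry("my-bucket/archive/Report-old.PDF", False, 4096, date(2022, 5, 1)),
    ],
}
pdf_hits = search(listing, list(listing),
                  Filters(pattern="report", ext="pdf", kind="Files only"))
print([e.path for e in pdf_hits])
# ['my-bucket/archive/Report-old.PDF', 'my-bucket/data/report.pdf']
```

The filters are combined with a logical AND on top of the name pattern, which matches the description of the filter row; how the real service treats folders when an extension filter is set is not documented here, so the sketch simply excludes anything without a matching suffix.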
Note Each user is rate-limited to a small number of concurrent searches per account so a heavy search won’t starve other users. If you hit the limit the search bar reports an error — wait for one of your other searches to finish and try again. # Using Document Accounts > Browse, upload, search, and manage files in PlaidCloud Document storage using the two-pane file explorer. The Document browser is a two-pane split view: the folder tree on the left, the file list on the right. Most operations are available from a right-click menu in either pane. The right-click menu shows different options depending on whether a folder, a file, or empty space is selected. The root of an account is itself a viewable, droppable folder — clicking it shows everything at the top level and accepts dropped uploads. **To open the file explorer:** 1. Go to **Document > Shared Accounts** (or **Private Accounts**) 2. Click the folder icon (far left) for the account you want to explore The various file and folder operations are detailed below. ## Upload a File [Section titled “Upload a File”](#upload-a-file) **Drag-and-drop:** 1. Browse to the desired directory 2. Drag one or more files from your desktop onto the file list **From the right-click menu:** 1. Browse to the desired directory 2. Right-click in the file list and select `Upload Here` 3. Pick the files in the OS file picker **From the toolbar:** 1. Browse to the desired directory 2. Click the `Upload` button on the right pane Note Multiple files can be uploaded at once. If a target name already exists, you will be prompted to confirm overwrite or cancel. ## Download a File [Section titled “Download a File”](#download-a-file) 1. Browse to the desired directory 2. Left-click to select the desired file 3. Right-click and select `Download` ## Rename a File [Section titled “Rename a File”](#rename-a-file) 1. Browse to the desired directory 2. Left-click to select the desired file 3. 
Right-click and select `Rename` ## Move a File [Section titled “Move a File”](#move-a-file) 1. Browse to the desired directory 2. Left-click to select the desired file 3. Drag into the destination folder 4. Select `Move File` ## Copy a File [Section titled “Copy a File”](#copy-a-file) 1. Browse to the desired directory 2. Left-click to select the desired file 3. Right-click and select `Copy` ## Delete a File [Section titled “Delete a File”](#delete-a-file) 1. Browse to the desired directory 2. Left-click to select the desired file 3. Right-click and select `Delete` ## Create a Folder [Section titled “Create a Folder”](#create-a-folder) 1. Open the account 2. Click `New Top Level Folder` (or right-click an existing folder and select `New Folder`) 3. Enter a folder name 4. Click `Create` ## Rename a Folder [Section titled “Rename a Folder”](#rename-a-folder) 1. Browse to the desired directory 2. Left-click to select the desired folder 3. Right-click and select `Rename` ## Move a Folder [Section titled “Move a Folder”](#move-a-folder) 1. Browse to the desired directory 2. Left-click to select the desired folder 3. Drag into the destination folder 4. Select `Move Folder` ## Delete a Folder [Section titled “Delete a Folder”](#delete-a-folder) 1. Browse to the desired directory 2. Left-click to select the desired folder 3. Right-click and select `Delete` ## Download Folder Contents (zip File) [Section titled “Download Folder Contents (zip File)”](#download-folder-contents-zip-file) The `Download as Zip` option compresses every file under the selected folder into a single `.zip` and downloads it. The archive preserves the folder structure shown in the explorer. 1. Browse to the desired directory 2. Left-click to select the desired folder 3. 
Right-click and select `Download as ZIP` ## Read-Only Accounts [Section titled “Read-Only Accounts”](#read-only-accounts) If an account is marked read-only, upload affordances (drag-prompts, the toolbar `Upload` button, and the `Upload Here` context menu) are hidden automatically. Browse, download, and search still work. ## Search [Section titled “Search”](#search) For finding files across an account — including across many subfolders or across multiple connected accounts — see [Searching Documents](../searching-documents/). # Email > View sent transactional email and bounces from PlaidCloud, backed by your workspace's email virtual server. The **Email** area in the **Tools** menu shows the transactional email PlaidCloud sends on your workspace’s behalf. It is backed by your email virtual server. # Using the Email Area > Browse sent transactional email and bounces, filter by stream, recipient, status, type, or tag, and reactivate bounced recipients. ## Description [Section titled “Description”](#description) Open the area from **Tools > Email**. The page is split into two panels: **Sent Email** and **Bounces**. The stream selector at the top picks which Postmark stream to view; it defaults to the first transactional stream configured for your workspace. ## Sent Email [Section titled “Sent Email”](#sent-email) The **Sent Email** panel shows messages PlaidCloud has sent on your workspace’s behalf. Columns include `Sent At`, `Recipient`, `Subject`, `Status`, `Stream`, `Tag`, `Opens`, and `Clicks`. **To filter sent email:** 1. Open **Tools > Email > Sent Email** 2. Type into **Recipient** to search by To address (substring match) 3. Pick a **Status** to narrow by email delivery status (delivered, opened, etc.) 4. Pick a **Tag** to narrow to a specific message tag 5. Click `Apply` Click `Clear` to wipe the filters. Click any row to open a **Message Details** window with the full message metadata returned by Postmark. 
## Bounces [Section titled “Bounces”](#bounces) The **Bounces** panel shows delivery failures returned by Postmark. Columns include `Bounced At`, `Recipient`, `Type`, `Inactive`, `Description`, and `Stream`. **To filter bounces:** 1. Open **Tools > Email > Bounces** 2. Type into **Recipient** to search by To address (substring match) 3. Pick a **Type** to narrow by bounce type (HardBounce, SoftBounce, Transient, etc.) 4. Pick a **Tag** to narrow to a specific message tag 5. Tick **Inactive only** to show only recipients Postmark has marked inactive 6. Click `Apply` Click `Clear` to wipe the filters. Click any row to open the bounce details. ## Reactivate a Bounced Recipient [Section titled “Reactivate a Bounced Recipient”](#reactivate-a-bounced-recipient) When Postmark marks a recipient inactive (typically after a hard bounce), no further mail is sent to that address until the recipient is reactivated. 1. Filter to **Inactive only** 2. Select the recipient row 3. Click `Reactivate` After reactivation, future PlaidCloud-generated mail to that address will be attempted again. ## Paging [Section titled “Paging”](#paging) Both panels page through history rather than loading every message at once. Use `Prev` and `Next` to move through pages; the label between them shows your position (`1–25 of 837`). Note The Email area is for inspection and recipient reactivation only. Replying to messages, configuring servers, and managing templates still happen in the Postmark dashboard. # Panel Apps > Build and deploy interactive Holoviz Panel data applications natively within PlaidCloud for custom dashboards and data tools. Build custom interactive applications on top of PlaidCloud data using the Panel framework — parameterized inputs, live charts, and embedded data tables. # Creating and Registering Panel Apps in Plaidcloud > Create, load, and register Holoviz Panel applications in PlaidCloud for interactive data visualization and custom tool deployment. 
## Description [Section titled “Description”](#description) Documentation coming soon… # Using Panel Apps in Plaidcloud > Access and use deployed Holoviz Panel applications in PlaidCloud for interactive data exploration and custom analytics tools. ## Description [Section titled “Description”](#description) Documentation coming soon… # Projects > Set up and manage PlaidCloud projects to organize workflows, tables, data imports, and other analysis objects by purpose. A **project** is the unit of work in PlaidCloud. Each project owns its own data, workflows, dimensions, and audit history. Projects don’t share state with each other — they’re isolated, which makes them the natural boundary for separating distinct analyses, business processes, or data products. Most teams start with one project per analytical area: a project for headcount cost allocation, another for revenue analysis, another for monthly close — whatever maps to how your team thinks about its work. ## What’s in This Section [Section titled “What’s in This Section”](#whats-in-this-section) * [Manage projects](/guides/projects/managing-projects/) — create, configure, and organize projects in your workspace * [View projects](/guides/projects/viewing-projects/) — find and open existing projects * [Manage hierarchies](/guides/projects/managing-hierarchies/) — folder structure for organizing many projects * [Manage tables and views](/guides/projects/managing-tables-and-views/) — the data layer inside a project * [Manage data editors](/guides/projects/managing-data-editors/) — who can modify project data * [View the project log](/guides/projects/viewing-the-project-log/) — audit trail of changes * [Archive a project](/guides/projects/archive-a-project/) — preserve completed work without deleting * [Compare and merge projects](/guides/projects/compare-and-merge-projects/) — diff two projects and selectively copy changes between them ## Related [Section titled “Related”](#related) * 
[Concepts](/get-started/concepts/) — how projects relate to workspaces, members, and the broader data model * [Workflows](/guides/workflows/) — the automation primitive that lives inside a project * [Access management](/administration/access/) — workspace-level controls that govern who can see and edit projects # Archive a Project > Archive PlaidCloud projects to preserve completed work, free up workspace resources, and maintain a clean project environment. ## Creating an Archive [Section titled “Creating an Archive”](#creating-an-archive) Projects normally contain critical processes and logic, which are important to archive. If you ever need to restore the project to a specific state, having archives is essential. PlaidCloud allows you to archive projects at any point in time. Creation of archives complements the built-in point-in-time tracking of PlaidCloud by allowing for specific points in time to be captured. This might be particularly useful before a major change or to capture the exact state of a production environment for posterity. **Full backup**: This includes all the data tables included in a project. The archive may be quite large, depending on the volume of data in the project. **Partial backup:** This can be used if all of the project data can be derived from other sources. If this is the case, it is not necessary to archive the data in the project and have it remain elsewhere. Partial archives save time and storage space when creating the archive. To archive a project: 1. Open Analyze 2. Select the “Projects” tab ## Restoring an Archive [Section titled “Restoring an Archive”](#restoring-an-archive) Once you have an archive, you may want to restore it. You can restore an archive into a new project or into an existing project. To restore an archive: 1. Open Analyze 2. Select the “Projects” tab ## Archiving Schedule [Section titled “Archiving Schedule”](#archiving-schedule) Archives can also serve as a periodic backup of your project. 
PlaidCloud allows you to manage the backup schedule and set the retention period of the backup archives to whatever is most convenient or desired. Since all changes to a project are automatically tracked, archiving is not necessary for rollback purposes. However, it does provide specific snapshots of the project state, which is often useful for control purposes and/or having the ability to recover to a known point. To set an archiving schedule: 1. Open Analyze 2. Select the “Projects” tab 3. Click the backup icon 4. Choose a directory destination in a **Document** account 5. Choose the backup frequency and retention 6. Choose which items to backup 7. Click “Update” # Compare and Merge Projects > Compare two analyze projects side by side — workflows, steps, variables, dimensions, and more — and selectively merge changes from one project into another. Compare lets you see exactly what differs between two projects and, for the parts that can be safely copied, merge selected changes from one into the other. It’s built for two everyday situations: * **Promote changes between environments** — review what’s different between a QA (development) project and its Production counterpart, then move just the changes you want into Production. * **Review against an earlier snapshot or copy** — compare a project to a copy of itself (for example, a month-over-month clone) to see what changed. You stay in control the whole way: nothing is written until you select specific items and apply them, and you can validate first with a dry run. ## How Matching Works [Section titled “How Matching Works”](#how-matching-works) To compare two projects, PlaidCloud has to decide which item on the left corresponds to which item on the right. It picks the strategy automatically: * **Copies and snapshots of the same project** are matched by their internal id (a clone keeps the same ids), so even renamed items line up precisely. 
* **Independent projects** — such as a QA project and a separately built Production project — share no ids, so they’re matched **by name**. When this happens, items are tagged `(name-matched)`. Name matching is what makes a QA-to-Production comparison useful, since the two projects were built separately. It also has limits: items with duplicate names can’t always be paired one-to-one, and an item flagged `ambiguous match` means more than one candidate shared the same name — review those by hand before merging. ## Opening a Comparison [Section titled “Opening a Comparison”](#opening-a-comparison) 1. Open **Analyze** 2. Select **Projects** from the top menu bar 3. Right-click the project you want to start from (or use its actions menu) and choose **Compare to…** 4. In the **Compare Projects** window, the project you started from is filled in as the **Source** 5. Enter the **Target** project id — the project you want to compare against 6. Click **Compare** Note Use **Swap** to reverse the Source and Target before comparing — the direction matters, because a merge copies *from* Source *into* Target. Both projects must be in the same workspace. ## Reading the Comparison [Section titled “Reading the Comparison”](#reading-the-comparison) The left side groups every difference under its category: **Workflows** (with their **Steps** nested underneath), **Variables**, **Dimensions**, **Editors**, **UDFs**, **Views**, and **Tables**. Each item is badged with its status: * **Added** — present in Source but not in Target (it would be created in Target) * **Modified** — present in both, but the configuration differs * **Deleted** — present in Target but not in Source * **Unchanged** — identical on both sides (hidden by default) Select any item to see its details on the right: a field-by-field **Config diff** and the full text difference. If an item’s diff is very large it’s truncated — click **Expand full diff** to load the complete version. 
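The matching strategy and the four statuses described above can be sketched in a few lines. This is an illustration under stated assumptions, not PlaidCloud's implementation: the item shape (`id`, `name`, `config`), the function names `pick_key` and `compare`, and the rule "use ids if any are shared, otherwise names" are all hypothetical.

```python
from collections import Counter


def pick_key(source, target):
    """Match by id when the projects share any ids, otherwise by name."""
    shared = {i["id"] for i in source} & {i["id"] for i in target}
    return "id" if shared else "name"


def compare(source, target):
    """Return (match key, [(label, status), ...]) for two item lists."""
    key = pick_key(source, target)

    # A key that occurs more than once on either side cannot be paired 1:1.
    ambiguous = set()
    for side in (source, target):
        counts = Counter(item[key] for item in side)
        ambiguous |= {k for k, n in counts.items() if n > 1}

    src = {i[key]: i for i in source if i[key] not in ambiguous}
    tgt = {i[key]: i for i in target if i[key] not in ambiguous}

    rows = [(str(k), "ambiguous match") for k in sorted(ambiguous, key=str)]
    for k, item in src.items():
        if k not in tgt:
            status = "Added"        # in Source only: would be created in Target
        elif item["config"] != tgt[k]["config"]:
            status = "Modified"     # in both, configuration differs
        else:
            status = "Unchanged"
        rows.append((item["name"], status))
    # In Target only.
    rows += [(i["name"], "Deleted") for k, i in tgt.items() if k not in src]
    return key, rows


# Made-up QA and Production items with no ids in common.
qa = [
    {"id": 1, "name": "Load", "config": {"x": 1}},
    {"id": 2, "name": "Allocate", "config": {"x": 2}},
    {"id": 3, "name": "Report", "config": {}},
]
prod = [
    {"id": 101, "name": "Load", "config": {"x": 1}},
    {"id": 102, "name": "Allocate", "config": {"x": 9}},
    {"id": 103, "name": "Archive", "config": {}},
]
key, rows = compare(qa, prod)
```

Because `qa` and `prod` share no ids, the sketch falls back to name matching: `Load` is Unchanged, `Allocate` is Modified, `Report` is Added, and `Archive` is Deleted. Comparing a project against a clone of itself would instead match on `id`, so a renamed item would still line up with its counterpart.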
### Focusing the List [Section titled “Focusing the List”](#focusing-the-list) * **Filter by name or path** narrows the tree to matching items. * The status checkboxes show or hide **Added**, **Modified**, **Deleted**, and **Unchanged** items. * **Needs manual attention** shows only the changes that can’t be merged automatically (see below). ## What Can Be Merged [Section titled “What Can Be Merged”](#what-can-be-merged) Not every difference can be safely copied between projects. The comparison labels each item so you always know what’s mergeable: * **Workflows, Steps, and Variables** can be merged. * **Dimensions, Editors, UDFs, Views, and Tables** are **view-only** — they appear in the comparison so you can review them, but you copy them across using their own dedicated tools, not from here. Some individual changes are also marked **manual** (with a lock) even in a mergeable category, because applying them automatically wouldn’t be safe. The detail pane explains why and what to do instead. The most common cases: * **Steps in a name-matched (QA vs Production) comparison** are review-only — a step’s configuration can reference environment-specific connections and agents, so it’s copied by hand. Use the diff to see precisely what to change in the Target. * **Variable deletion** isn’t applied for you — remove the variable in the Target project by hand. The summary line tells you the split at a glance, for example *“12 mergeable · 5 view-only”* and *“3 need manual attention.”* ## Merging Changes [Section titled “Merging Changes”](#merging-changes) 1. Select the items you want to copy from Source into Target. Use **Select all** to pick every mergeable item currently shown, or multi-select individual rows. View-only and manual items can’t be selected. 2. Click **Dry run** to validate the selection without writing anything. PlaidCloud reports how many operations would apply (new vs. updated) and flags any that would fail. 3. 
When the dry run looks right, click **Apply** (either from the dry-run result or the footer). 4. Confirm the operation in the **Confirm merge** dialog. Merged items keep their identity in the Target, so you can run the comparison again later and merge further changes without creating duplicates. Caution A merge copies *into* the Target project and changes it. Review the diff and run a dry run first, and make sure Source and Target are the right way around. ### If Something Goes Wrong [Section titled “If Something Goes Wrong”](#if-something-goes-wrong) * **Target changed** — if someone else modified the Target after you loaded the comparison, the merge stops to avoid overwriting their work. Refresh the comparison and reapply. * **Merge partially applied** — if one operation fails partway through, the rest are reported as remaining. Fix the cause and choose **Retry remaining**, or cancel and refresh. ## Related [Section titled “Related”](#related) * [Manage projects](/guides/projects/managing-projects/) — versioning and point-in-time tracking within a single project * [Archive a project](/guides/projects/archive-a-project/) — capture a point-in-time snapshot you can compare against later * [Workflows](/guides/workflows/) — the automation primitive a comparison merges between projects # Managing Data Editors > Manage data editor assignments in PlaidCloud projects to control who can modify table data directly through the data interface. PlaidCloud offers the ability to organize and manage data editors, including labels. Data Editors allow editing table data or creating data by user interaction. PlaidCloud uses a path-based system to organize data editors, like you would use to navigate a series of folders, allowing for a more flexible and logical organization (control hierarchy) of the data editors. Using this system, data editors can move within a control hierarchy. 
Multiple references to one data editor from different locations in the control hierarchy (alternate hierarchies) can be created. The ability to manage data editors using this method allows the structure to reflect operational needs, reporting, and control. ## Searching [Section titled “Searching”](#searching) To search for data editors: 1. Use the filter box in the lower left of the control hierarchy The filter searches data editor names and labels for matches and shows the results in the control hierarchy above. ## Move [Section titled “Move”](#move) To move a data editor within the control hierarchy: 1. Drag it into the folder where you wish to place it ## Rename [Section titled “Rename”](#rename) To rename a data editor: 1. Right-click on the data editor 2. Select the rename option 3. Type in the new name and save it The data editor is now renamed but retains its original unique identifier. ## Delete [Section titled “Delete”](#delete) You can delete a single data editor or multiple data editors. To delete a data editor: 1. Select the data editors in the control hierarchy 2. Click the delete button on the top toolbar ## Create New Directory Structure [Section titled “Create New Directory Structure”](#create-new-directory-structure) To add a new folder to the control hierarchy: 1. Click the New Folder button on the toolbar To add a folder to an existing folder: 1. Right-click on the folder 2. Select New Folder ## Mark Hierarchy for Viewing Roles [Section titled “Mark Hierarchy for Viewing Roles”](#mark-hierarchy-for-viewing-roles) To set which roles can view a data editor: 1. Click in the Explorer or Manager checkboxes To update multiple data editors: 1. Select the data editors in the control hierarchy 2. Select the desired viewing role from the Actions menu on the top toolbar ## Memos to Describe Table Contents [Section titled “Memos to Describe Table Contents”](#memos-to-describe-table-contents) To add a memo to a data editor: 1.
Select the data editor 2. Update the memo in the right context form ## View Additional Hierarchy Attributes [Section titled “View Additional Hierarchy Attributes”](#view-additional-hierarchy-attributes) To view and edit additional data editor attributes: 1. Select the data editor and view the data editor context form on the right ## Duplicate a Data Editor [Section titled “Duplicate a Data Editor”](#duplicate-a-data-editor) To duplicate a data editor: 1. Select the data editor 2. Click on the Duplicate button on the top toolbar # Managing Hierarchies > Manage hierarchical dimensions within PlaidCloud projects including assigning, configuring, and organizing dimension structures. PlaidCloud offers the ability to organize and manage hierarchies, including labels. Hierarchies are available to all workflows within a project. PlaidCloud uses a path-based system to organize hierarchies, like you would use to navigate a series of folders, allowing for a more flexible and logical organization (control hierarchy) of the hierarchies. Using this system, hierarchies can be moved within a control hierarchy, and multiple references to one hierarchy can be created from different locations in the control hierarchy (alternate hierarchies). The ability to manage hierarchies using this method allows the structure to reflect operational needs, reporting, and control. ## Searching [Section titled “Searching”](#searching) To search for hierarchies: 1. Use the filter box in the lower left of the control hierarchy 2. The search filter will search hierarchy names and labels for matches and show the results in the control hierarchy above ## Move [Section titled “Move”](#move) To move a hierarchy within the control hierarchy: 1. Drag it into the folder where you wish to place it ## Rename [Section titled “Rename”](#rename) To rename a hierarchy: 1. Right-click on the hierarchy 2. Select the rename option 3. Type in the new name and save it 4.
The hierarchy is now renamed, but it will retain its original unique identifier ## Clear [Section titled “Clear”](#clear) You can clear a single hierarchy or multiple hierarchies. To clear a hierarchy: 1. Select the hierarchies in the control hierarchy 2. Click the clear button on the top toolbar ## Delete [Section titled “Delete”](#delete) You can delete a single hierarchy or multiple hierarchies. To delete a hierarchy: 1. Select the hierarchies in the control hierarchy 2. Click the delete button on the top toolbar The delete operation checks whether the hierarchy is in use by workflow steps, tables, or views. If so, you will be asked to remove those associations. Note You can also force delete the hierarchies. Force deletion leaves references broken, so use it sparingly. ## Create New Directory Structure [Section titled “Create New Directory Structure”](#create-new-directory-structure) To create a new folder: 1. Click the New Folder button on the toolbar To add a folder to an existing folder: 1. Right-click on the folder 2. Select New Folder ## Mark Hierarchy for Viewing Roles [Section titled “Mark Hierarchy for Viewing Roles”](#mark-hierarchy-for-viewing-roles) To set which roles can view a hierarchy: 1. Click in the Explorer or Manager checkboxes To update multiple hierarchies: 1. Select the hierarchies in the control hierarchy 2. Select the desired viewing role from the Actions menu on the top toolbar ## Memos to Describe Table Contents [Section titled “Memos to Describe Table Contents”](#memos-to-describe-table-contents) To add a memo to a hierarchy: 1. Select the hierarchy 2.
Update the memo in the right context form ## View Additional Hierarchy Attributes [Section titled “View Additional Hierarchy Attributes”](#view-additional-hierarchy-attributes) To view and edit additional hierarchy attributes: 1. Select a hierarchy 2. View the hierarchy context form on the right ## Duplicate a Hierarchy [Section titled “Duplicate a Hierarchy”](#duplicate-a-hierarchy) To duplicate a hierarchy: 1. Select the hierarchy 2. Click the duplicate button on the top toolbar # Managing Projects > Create, configure, and manage PlaidCloud projects including settings, permissions, and organizational structure for data analysis. ## Searching [Section titled “Searching”](#searching) To search for projects, use the filter box in the lower left of the hierarchy. The filter searches project names and labels for matches and shows the results in the hierarchy above. ## Creating New Projects [Section titled “Creating New Projects”](#creating-new-projects) To create a new project: 1. Open Analyze 2. Select “Projects” from the top menu bar 3. Click the “New Project” button 4. Complete the form information including the “Access Control” section 5. Click “Create” The project is now ready for updating access permissions, adding owners, and creating workflows. Note By default, the project will be accessible by all members of the current workspace ## Automatic Change Tracking [Section titled “Automatic Change Tracking”](#automatic-change-tracking) All changes to a project, including workflows, data editors, hierarchies, table structures, and UDFs, are tracked and allow point-in-time recovery of the state. This allows for easy recovery from user-introduced problems or simply copying a different point-in-time to another project for comparison. In addition to overall tracking, projects and their elements also allow for versioning. Not only is creating a version easy, you can also merge changes from one version to another.
This provides a simple way to keep track of snapshots or to create a version for development and then be able to merge those changes into the non-development version when you want. ## Managing Project Access [Section titled “Managing Project Access”](#managing-project-access) ### Types of Access [Section titled “Types of Access”](#types-of-access) Project security has been simplified into three types of access: * All Workspace Members * Specific Members Only * Specific Security Groups Only To set the project security: 1. Open Analyze 2. Select “Projects” 3. Click the edit icon of the project you want to restrict 4. Choose desired restriction under “Access Control” 5. Click “Update” ## All Workspace Members [Section titled “All Workspace Members”](#all-workspace-members) “All Workspace Members” access is the simplest option since it provides access to all members of the workspace and does not require any additional assignment of members. ## Specific Members Only [Section titled “Specific Members Only”](#specific-members-only) The “Specific Members Only” access setting requires assignment of each member to the project. To assign members to a project: 1. Open Analyze 2. Select “Projects” from the top menu bar 3. Click the members icon 4. Grant access to members by selecting the check box next to their name in the “Access” column 5. Click “Update” For workspaces with large numbers of members, this approach can often require more effort than desired, which is where security groups become useful. Note To add members, you must be a member of the workspace. ## Specific Security Groups Only [Section titled “Specific Security Groups Only”](#specific-security-groups-only) The “Specific Security Groups Only” option enables assigning specific security groups permission to access the project. With access restrictions relying on association with a security group or groups, the administration of project access for larger groups is much simpler.
This is particularly useful when combined with single sign-on automatic group association. By using single sign-on to set member group assignments, these groups can also enable and disable access to projects implicitly. To edit assigned groups: 1. Open Analyze 2. Select “Projects” from the top menu bar 3. Click the security groups icon 4. Grant access to security groups by selecting the check box next to their name in the “Access” column 5. Click “Update” ## Setting Different Viewing Roles [Section titled “Setting Different Viewing Roles”](#setting-different-viewing-roles) A project often requires several transformations and intermediate tables, while the end result consists of only a few tables. Members do not always need to see every element of the project; sometimes the final product is enough. PlaidCloud offers you the ability to set different viewing roles to easily declutter and control the visibility of each member. There are three built-in viewing roles: **Architect**, **Manager**, and **Explorer**. The **Architect** role is the simplest because it allows full visibility and control of projects, workflows, tables, variables, data editors, hierarchies, and user-defined functions. The **Manager** and **Explorer** roles have no specific access privileges but can be custom-defined. In other words, you can choose which items are visible to each group. Note **Manager** and **Explorer** are not security groups; they only provide a convenient way of segregating duties and visibility of information. You can make everyone an **Architect** if you feel visibility of everything within the project is needed; otherwise, you can designate members as **Manager** and/or **Explorer** project members and control visibility that way. To set a member’s role: 1. Open Analyze 2. Select “Projects” 3. Click the members icon 4. Select the member whose role you would like to change 5.
Double-click their current role in the “Role” column 6. Select the desired role 7. Click “Update” ## Managing Project Variables [Section titled “Managing Project Variables”](#managing-project-variables) When running a project or workflow it may be useful to set variables for recurring tasks in order to decrease clutter and save time. These variables operate just like a normal algebraic variable by allowing you to set what the variable represents and what operation should follow it. PlaidCloud allows you to set these variables at the project level, which will affect all the workflows within that project, or at the workflow level, which will only affect that specific workflow. To set a project level variable: 1. Open Analyze 2. Select “Projects” 3. Click the Manage Project Variables icon From the Variables Table you can view the variables and view/edit the current values. You can also add new or delete existing variables by clicking the “New Project Variable” button. ## Cloning a Project [Section titled “Cloning a Project”](#cloning-a-project) When a project is cloned, there may be project-related references, such as workflow steps, that run within the project. PlaidCloud offers two options for performing a full duplication: * Duplicate with updating project references * Duplicate without updating project references Duplicating **with** updating project references means all the related references point to the newly duplicated project. To duplicate **with** updating project references: 1. Open Analyze 2. Select “Projects” 3. Select the project you would like to duplicate 4. Click the “Actions” button 5. Select the “Duplicate with project reference updates” option Duplicating **without** updating project references means all of the related references continue pointing to the original project. To duplicate **without** updating project references: 1. Open Analyze 2. Select “Projects” 3. Select the project you would like to duplicate 4.
Click the “Actions” button 5. Select the “Duplicate without project reference updates” option ## Viewing the Project Report [Section titled “Viewing the Project Report”](#viewing-the-project-report) When a project or workflow is dynamic, maintaining detailed documentation becomes a challenge. To help solve this problem, PlaidCloud provides the ability to generate a project-level report that gives detailed documentation of workflows, workflow steps, user defined transforms, variables, and tables. This report is generated on-demand and reflects the current state of the project. To download the report: 1. Open Analyze 2. Select “Projects” 3. Click the report icon # Managing Tables and Views > Manage tables and views within PlaidCloud projects including creation, configuration, permissions, and data object organization. PlaidCloud offers the ability to organize and manage tables, including labels. Tables are available to all workflows within a project and have many tools and options. In addition to tables, PlaidCloud also offers Views based on table data. Using Views allows for instant updates when underlying table changes occur, as well as saving data storage space. Options include: * The same table can exist on multiple paths in the hierarchy (alternate hierarchies) * Tables are taggable for easier search and inclusion in PlaidCloud processes * Tables can be versioned * Tables can be published so they are available for Dashboard Visualizations PlaidCloud uses a path-based system to organize tables, like you would use to navigate a series of folders, allowing for a more flexible and logical organization of tables. Using this system, tables can be moved within a hierarchy, or multiple references to one table from different locations in the hierarchy (alternate hierarchies), can be created. The ability to manage tables using this method allows the structure to reflect operational needs, reporting, and control. 
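One way to picture the path-based system is to treat each path as a reference to a table rather than as the table itself. The sketch below is only a mental model under that assumption; the `TableCatalog` class, the paths, and the `tbl-001` identifier are hypothetical and are not part of any PlaidCloud API.

```python
from dataclasses import dataclass, field


@dataclass
class TableCatalog:
    """Toy model: a path is a reference to a table id, not the table itself."""

    paths: dict[str, str] = field(default_factory=dict)  # path -> table id

    def add_reference(self, path: str, table_id: str) -> None:
        # Several paths may point at the same table (alternate hierarchies).
        self.paths[path] = table_id

    def move(self, old_path: str, new_path: str) -> None:
        # Moving changes where the reference lives; the table id is unchanged.
        self.paths[new_path] = self.paths.pop(old_path)

    def paths_for(self, table_id: str) -> list[str]:
        return sorted(p for p, t in self.paths.items() if t == table_id)


catalog = TableCatalog()
catalog.add_reference("/finance/actuals/gl_detail", "tbl-001")
catalog.add_reference("/reporting/monthly/gl_detail", "tbl-001")  # alternate hierarchy
catalog.move("/finance/actuals/gl_detail", "/finance/ledger/gl_detail")
print(catalog.paths_for("tbl-001"))
```

In this model, moving a table only rewrites a path entry, which is why the same table can appear under an operational folder and a reporting folder at once.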
## Searching [Section titled “Searching”](#searching) To search for tables, use the filter box in the lower left of the hierarchy. The search filter will search table names and labels for matches and show the results in the hierarchy above. ## Move [Section titled “Move”](#move) **To move a table:** 1. Drag it into the folder where you wish it to be located ## Rename [Section titled “Rename”](#rename) **To rename a table:** 1. Right click on the table 2. Select the rename option 3. Type in the new name and save it 4. The table is now renamed, but it retains its original unique identifier. ## Clear [Section titled “Clear”](#clear) **To clear a table:** 1. Select the tables in the hierarchy 2. Click the clear button on the top toolbar. *Note: You can clear a single table or multiple tables* ## Delete [Section titled “Delete”](#delete) **To delete a table:** 1. Select the tables in the hierarchy 2. Click the delete button on the top toolbar 3. The delete operation will check to see if the table is in use by workflow steps or Views. If so, you will be asked to remove those associations before deletion can occur. *Note: You can also force delete the table(s). Force deletion of the table(s) will leave references broken, so this should be used sparingly.* ## Create New Directory Structure [Section titled “Create New Directory Structure”](#create-new-directory-structure) **To add a new folder:** 1. Click the New Folder button on the toolbar **To add a folder to an existing folder:** 1. Right-click on the folder 2. Select New Folder ## View Data (table Explorer) [Section titled “View Data (table Explorer)”](#view-data-table-explorer) Table data is viewed using the Data Explorer. The Data Explorer provides a grid view of the data as well as a column by column summary of values and statistics. Point-and-click filtering and exporting to familiar file formats are both available. The filter selections can also be saved as an Extract step usable in a workflow.
## Publish Table for Reporting [Section titled “Publish Table for Reporting”](#publish-table-for-reporting) Dashboard Visualizations are purposely limited to tables that have been published. When publishing a table, you can provide a unique name that distinguishes the data. This is useful when the table has an obscure name within the workflow that generated it but needs a clearer name for those building dashboards. Published tables do not have paths associated with them. They will appear as a list of tables for use in the dashboards area. ## Mark Table for Viewing Roles [Section titled “Mark Table for Viewing Roles”](#mark-table-for-viewing-roles) The viewing of tables by various roles can be controlled by clicking the Explorer or Manager checkboxes. If multiple tables need to be updated, select the tables in the hierarchy and select the desired viewing role from the Actions menu on the top toolbar. ## Memos to Describe Table Contents [Section titled “Memos to Describe Table Contents”](#memos-to-describe-table-contents) Add a memo to a table to help understand the data. ## View Table Shape, Size, and Last Updated Time [Section titled “View Table Shape, Size, and Last Updated Time”](#view-table-shape-size-and-last-updated-time) The number of rows, the number of columns, and the data size for each table are shown in the table hierarchy. For very large tables (multi-million rows) the row count may be estimated and an indicator for approximate row count will be shown. ## View Additional Table Attributes [Section titled “View Additional Table Attributes”](#view-additional-table-attributes) **To view and edit other table attributes:** 1. Select a table 2. View the table context form on the right. ## Duplicate a Table [Section titled “Duplicate a Table”](#duplicate-a-table) **To duplicate a table:** 1. Select the table 2. Click the duplicate button on the top toolbar.
# Viewing Projects > View and browse authorized PlaidCloud projects including project details, status, membership, and associated data resources. ## Description [Section titled “Description”](#description) Within **Analyze**, the Projects function provides a level of compartmentalization that makes controlling access and modifying privileges much easier. Projects are what provide the primary segregation of data within a workspace tab. While Projects fall under Analyze, workflows fall under Projects, meaning that Projects contain workflows. Workflows, simply put, perform a wide range of tasks including data transformation pipelines, data analysis, and even ETL processes. More information on workflows can be found under the “Workflows” section. ## Accessing Projects [Section titled “Accessing Projects”](#accessing-projects) **To access Projects:** 1. Open Analyze 2. Select “Projects” from the top menu bar This displays the Projects Hierarchy. From here, you will see a hierarchy of projects for which you have access. There may be additional projects within the workspace, but, if you are not an owner or assigned to the project, they will not be visible to you. # Viewing the Project Log > View the PlaidCloud project log to monitor workflow execution history, track changes, and troubleshoot data processing issues. ## Viewing and Sorting the Project Log [Section titled “Viewing and Sorting the Project Log”](#viewing-and-sorting-the-project-log) As actions occur within a project, such as assigning new members or running workflows, the Project Log stores the events. The Project Log consolidates the view of all individual workflow logs in order to provide a more comprehensive view of project activities. PlaidCloud also enables the viewer to sort and filter a Project Log and view details of a particular log entry. **To view the Project Log:** 1. Open Analyze 2. Select “Projects” 3. Click the log icon **To sort and filter the Project Log:** 1. 
Click the small icon to the right of the log and to the left of the “log message” 2. Select the desired sort and filter criteria **To view details of a particular log entry:** 1. Right click on the desired log entry 2. View the “Log Message” box for details ## Clearing the Project Log [Section titled “Clearing the Project Log”](#clearing-the-project-log) Clearing the Project Log may be desirable from time to time. Note Clearing the Project Log also deletes all the sub-logs for each workflow. **To clear the Project Log:** 1. Open Analyze 2. Select “Projects” 3. Click the log icon 4. Click the “Clear Log” button # Custom App Sandbox > Build and deploy custom applications within the PlaidCloud Sandbox environment using your preferred frameworks and languages. Sandbox environments let you test changes safely before promoting them. Use a sandbox project or workspace to validate workflow edits, allocation logic, and dimension changes without touching production. # Getting Started with the Custom App Sandbox > Get started building and deploying custom applications in the PlaidCloud Sandbox environment with setup and configuration steps. ## What is the Sandbox [Section titled “What is the Sandbox”](#what-is-the-sandbox) The PlaidCloud Sandbox allows for the deployment of your own custom apps with native local access to data and PlaidCloud operations. The Sandbox environment provides a full compute environment for building custom applications to augment your use of PlaidCloud. All custom apps run using Kubernetes deployment processes, so a basic understanding of Kubernetes objects is necessary. A [Hello World example](https://github.com/PlaidCloud/custom-app-template) is available to show you how to deploy a simple application. ## Available Resources [Section titled “Available Resources”](#available-resources) There is a soft resource limit on the Sandbox apps with the expectation that resource usage will not be abused.
We can support large amounts of compute if needed, but let’s discuss before attempting to deploy. Contact us if you expect to need significant resources. The applications running in the Sandbox will have direct access to the Lakehouse and any number of Postgres databases that you desire. Postgres databases are designed to handle moderately sized data, so they are well suited to storing configurations and other metadata. For primary data storage, use the Lakehouse, as it can store large amounts of data and remain performant. All PlaidCloud APIs are also available directly from the Sandbox without using a public URL to help with data transfer speeds. ## Image Requirements [Section titled “Image Requirements”](#image-requirements) Any image that supports a Docker based Kubernetes deployment is suitable for a custom app. Only \*nix based images are currently supported. If you have a need to run a Windows based image, please contact us. ## Integrate With Your CI/CD Pipeline [Section titled “Integrate With Your CI/CD Pipeline”](#integrate-with-your-cicd-pipeline) The Kubernetes deployment of the Sandbox app utilizes GitOps processes. This allows you to implement your own CI/CD process for image builds and deployments. Your custom app git repo is continuously monitored for changes, so your Sandbox app is updated as you make updates. # Workflows > Create and manage PlaidCloud workflows to load, transform, schedule, and automate data processing across your projects. Workflows are the automation unit in PlaidCloud — orchestrate imports, transforms, allocations, exports, and notifications. Steps can run sequentially, in parallel, conditionally, or in loops.
## Common Tasks [Section titled “Common Tasks”](#common-tasks) * [Migrate Alteryx Workflows](/guides/workflows/migrate-alteryx-workflows/) * [Alteryx Migration Readiness Checklist](/guides/workflows/alteryx-migration-readiness-checklist/) * [Package Alteryx Dependencies](/guides/workflows/package-alteryx-dependencies/) * [Validate Converted Alteryx Workflows](/guides/workflows/validate-converted-alteryx-workflows/) * [Use Converted Alteryx Apps](/guides/workflows/use-converted-alteryx-apps/) * [Orchestrate Alteryx Migrations With MCP](/guides/workflows/orchestrate-alteryx-migrations-with-mcp/) * [Tune Alteryx Imports](/guides/workflows/troubleshoot-alteryx-imports/) * [Validate Alteryx Reports And Artifacts](/guides/workflows/validate-alteryx-reports-and-artifacts/) * [Migrate Spatial Alteryx Workflows](/guides/workflows/migrate-spatial-alteryx-workflows/) * [Create A Macro](/guides/workflows/create-a-macro/) * [Run A Workflow](/guides/workflows/run-a-workflow/) # Advanced Workflows (Visual Workflow Designer) > Build and run PlaidCloud workflows on a visual DAG canvas — drag-and-drop steps, breakpoints, containers, simulation, and real-time collaboration. ## Description [Section titled “Description”](#description) An **Advanced workflow** runs on a visual canvas — the **Visual Workflow Designer** — where steps are nodes in a directed graph (a DAG). The lines between nodes show how data flows, and the runtime executes them in dependency order rather than top-to-bottom. Branches that don’t depend on each other run in parallel automatically. Advanced is one of PlaidCloud’s **workflow types**; the others are Standard. The capabilities in this guide — the visual canvas, **breakpoints**, **containers**, **Run From Here**, **Simulate**, and the docked **Inspector** — are all Advanced-only. The canvas also supports **real-time collaboration**: several people can open the same Advanced workflow and edit it together, seeing each other’s presence and changes live. 
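To make "dependency order" concrete, here is a minimal sketch that groups the steps of a small graph into waves, where every step in a wave is independent of the others in that wave and could run in parallel. It uses Python's standard `graphlib` module purely as an illustration; the step names are hypothetical, and nothing here describes how the PlaidCloud runtime is actually implemented.

```python
from graphlib import TopologicalSorter

# Hypothetical workflow: each key lists the steps it depends on.
dependencies = {
    "import_sales": set(),
    "import_costs": set(),
    "clean_sales": {"import_sales"},
    "clean_costs": {"import_costs"},
    "merge": {"clean_sales", "clean_costs"},
    "export": {"merge"},
}


def execution_waves(deps):
    """Group steps into waves; steps in one wave do not depend on each other."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError if the graph is not a DAG
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # every step whose inputs are complete
        waves.append(ready)
        sorter.done(*ready)  # mark the wave finished so dependents become ready
    return waves


waves = execution_waves(dependencies)
for number, wave in enumerate(waves, start=1):
    print(number, wave)
```

The two import steps land in the first wave and the two clean steps in the second, which mirrors the idea that branches with no dependency on each other do not have to wait for one another.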
## Standard vs Advanced [Section titled “Standard vs Advanced”](#standard-vs-advanced) Every workflow has a **type**, chosen when you create it:

| Type | Steps are arranged in… | …and execute |
| --- | --- | --- |
| **Standard Serial** | the Steps list | top to bottom, one at a time |
| **Standard Parallel** | the Steps list | from the list, in parallel where dependencies allow |
| **Advanced (DAG canvas)** | a visual graph of nodes and connections | in dependency order — independent branches run in parallel automatically |

Standard and Advanced run on **different engines**: a Standard workflow executes from its **Steps list**, while an Advanced workflow executes from the **graph** you draw, following the explicit producer/consumer connections between steps. That’s why the Visual Workflow Designer opens only for Advanced workflows — arrows drawn on a Standard workflow wouldn’t change how it runs. Note A workflow type called **Macro** is on the roadmap — a reusable, callable workflow with declared inputs and outputs — and appears in the type selector as “Macro (coming soon)”. ## Choose the Workflow Type [Section titled “Choose the Workflow Type”](#choose-the-workflow-type) You can set the type when creating a workflow, or convert an existing Standard workflow: * **At creation** — the workflow type selector offers **Standard Serial**, **Standard Parallel**, **Advanced (DAG canvas)**, and **Macro (coming soon)**. New workflows default to **Standard Serial**. * **Convert an existing workflow** — select a Standard workflow in the **Workflows** list and choose **Convert to Advanced…**, then confirm. Your steps, their configuration, and their dependencies are preserved — only how the workflow is displayed and executed changes. Note There’s no in-app “Convert to Standard” action, so treat the switch to Advanced as a forward move.
Advanced is per-workflow: a workspace can mix Standard and Advanced workflows freely, and converting one doesn’t affect the others. ## The Canvas [Section titled “The Canvas”](#the-canvas) Each step is a node, and the connections between nodes define the order steps run in. The Designer lays steps out automatically, and you can rearrange them freely. ### Navigate [Section titled “Navigate”](#navigate) * **Zoom in** / **Zoom out**, **Reset zoom to 100%**, and **Fit all nodes to view** frame the workflow at any size. * **Pan tool** — when active, dragging pans the canvas instead of selecting. Middle-click drag and Space+drag pan regardless of the toggle. * **Snap to grid** rounds node positions to a fixed grid when you drop them, for tidy alignment. ### Lay Out [Section titled “Lay Out”](#lay-out) * **Tidy Layout** (auto-arrange) re-runs the automatic left-to-right layout. It overwrites the current positions but preserves connections, notes, and highlights. * **Undo** and **Redo** step backward and forward through layout changes (Cmd/Ctrl+Z and Cmd/Ctrl+Shift+Z), and **History** opens the panel of changes. ### Annotate [Section titled “Annotate”](#annotate) Annotations are visual only — they document the diagram and never affect execution. * **Add Note** — drop a sticky note anywhere on the canvas, then **Edit Text…** to write in it (or **Delete Note** to remove it). * **Add Highlight** — draw a translucent box around a group of related steps and give it a label with **Edit Label…** (or **Delete Highlight** to remove it). * **Color** — color-code a step (or reset it to **Default**) to group work visually. ### Lock [Section titled “Lock”](#lock) A workflow can be **locked** to prevent accidental edits. Click the lock toggle to switch between *Workflow editing — click to lock* and *Workflow is locked — click to unlock for editing*. While locked, the canvas is read-only until you unlock it. 
### Export the Diagram [Section titled “Export the Diagram”](#export-the-diagram) * **Export layout as PNG** and **Export layout as PDF** save a picture of the canvas for documentation, review, or sharing. ## Add, Connect, and Edit Steps [Section titled “Add, Connect, and Edit Steps”](#add-connect-and-edit-steps) Drag a step type from the **palette** onto the canvas. To connect steps, drag from one step to another to draw a **connector** — this is what tells the runtime that one step’s output feeds the next. Caution The workflow must stay a DAG: a connector that would create a cycle is rejected (“that connector would create a cycle”). Right-click a connector to **Remove Connector** or jump to **Edit Source Step…** / **Edit Target Step…** at either end. Right-click any step for: * **Edit Step Configuration…** — the step’s settings form. * **Edit Step Details…** — name, memo, error handling, retry, and conditions. * **Convert Step Type…** — change the step to a different operation type. * **Duplicate Step…** — open the new-step form pre-populated from this step. * **Enable Step** / **Disable Step** — a disabled step is skipped at run time. * **View Step Inputs** / **View Step Outputs** — open the step’s data in the Inspector. * **Color** — apply or clear a step color. * **Delete Step…** — removes it from the workflow structure. Downstream steps that depended on its output will need to be reconfigured. ## The Step Palette [Section titled “The Step Palette”](#the-step-palette) The **Step Palette** lists every step type you can add. Use **Filter…** to find a step by name, and mark the ones you use most with **Add to Favorites** so they surface under **Favorites** at the top. ## Run Controls [Section titled “Run Controls”](#run-controls) The canvas runs the whole workflow or any part of it. Because execution follows the graph, “from here” and “selected” honor dependencies rather than list position. 

| Action | What it runs |
| --- | --- |
| **Run Workflow** | The entire workflow from its starting steps. |
| **Run This Step** | Only the selected step. |
| **Run From Here** | The selected step and every downstream step that would normally run after it. |
| **Run Selected** | Only the selected steps — they fire in parallel and the runtime sequences them by their dependencies. |
| **Run Section** | The selected steps plus every step the graph places between the topologically earliest and latest of your selection. |

While a workflow is running you can **Pause** (in-flight steps finish; new steps wait until you **Resume**), **Resume** a paused or stopped workflow from where it left off, or **Stop** (in-flight steps finish; queued steps are cancelled). ## Simulate [Section titled “Simulate”](#simulate) **Simulate** walks the workflow’s graph without running the actual transforms. Steps paint as they would during a real run, so you can visualize the order and branches before committing to compute. Nothing executes and no data changes. Tip Simulate is the fastest way to sanity-check a large or heavily branched workflow — confirm the order and dependencies are what you expect, then do a real run. ## Breakpoints [Section titled “Breakpoints”](#breakpoints) A **breakpoint** pauses a run when it reaches a step, so you can inspect upstream output before the rest of the workflow continues. 1. Right-click a step and choose **Set Breakpoint** (or **Clear Breakpoint** to remove it). 2. Run the workflow. When execution reaches a step with a breakpoint, the run pauses there; everything downstream waits. 3. Inspect the step’s inputs and outputs in the Inspector, then **Resume** to continue. Breakpoints are saved with the workflow, so a breakpoint you set persists across sessions until you clear it.
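As a rough illustration of what "downstream" means for a breakpoint, the sketch below walks a small graph and collects every step reachable from a chosen step. The same traversal, plus the step itself, gives the set that a Run From Here action would cover. The connector map and step names are hypothetical, and the sketch deliberately leaves open whether the breakpoint step itself has executed when the run pauses, since the text above does not specify it.

```python
from collections import deque

# Hypothetical connectors: producer step -> consumer steps.
connectors = {
    "import": ["clean", "audit"],
    "clean": ["allocate"],
    "allocate": ["export"],
    "audit": [],
    "export": [],
}


def downstream_of(graph, step):
    """Return every step reachable from `step` by following connectors."""
    seen = set()
    queue = deque(graph.get(step, []))
    while queue:
        current = queue.popleft()
        if current not in seen:
            seen.add(current)
            queue.extend(graph.get(current, []))
    return seen


# Steps that wait while the run is paused at a breakpoint on "clean".
waiting = downstream_of(connectors, "clean")
# Steps a Run From Here on "clean" would cover: the step plus its downstream.
run_from_here = {"clean"} | waiting
print(sorted(waiting))
print(sorted(run_from_here))
```

Note that `audit` is in neither set: it sits on a separate branch from `clean`, so it is not downstream of the breakpoint.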
## Containers [Section titled “Containers”](#containers) A **container** groups related steps into a labeled box you can collapse or disable as a unit — useful for organizing large workflows or toggling a whole sub-process on and off. * **Group into Container** — select the steps, then group them and give the container a name. * **Collapse Container** / **Expand Container** — fold the container down to a single tile to declutter the canvas, or open it back up. * **Disable Container** / **Enable Container** — disabling a container skips all of its member steps in one action. Enabling restores them. * **Rename Container…** — change its label. * **Ungroup Container** — remove the container; its member steps stay on the canvas. Note Disabling a container is the quickest way to skip an entire branch of work — for example, a block of export steps you want to leave out of a test run — without disabling each step individually. ## The Inspector [Section titled “The Inspector”](#the-inspector) The docked **Step Inspector** shows everything about the step you select: * **Inputs** and **Outputs** — the data flowing into and out of the step. * **Run Stats** — the step’s **Last Run** result, **Last Duration**, run count, and timing summaries including the **Average** and **p95** durations, plus the **Last Run Error** or **Last Run Warning** when there is one. * **Edit Step Configuration…** and **Edit Step Details…** — jump straight to the step’s forms, and rename the step inline by clicking its name. * **Rollback Step Config (Flashback)** — restore the step’s configuration to an earlier saved version. * **View Run History** — open this step’s full run history. Select a single step to inspect it; select several and the Inspector points you to the bulk actions in the canvas toolbar. ## Run History [Section titled “Run History”](#run-history) Choose **View this workflow’s run history** to see past runs with summary statistics. 
The canvas also paints a heatmap from recent run records, so frequently failing or slow steps stand out at a glance. ## Next Steps [Section titled “Next Steps”](#next-steps) * [Run a workflow](/guides/workflows/run-a-workflow/) — running a workflow end to end * [Managing step errors](/guides/workflows/managing-step-errors/) — debugging failures * [Upcoming runs calendar](/administration/scheduled-events/upcoming-runs-calendar/) — see when scheduled workflows will run # Alteryx Migration Readiness Checklist > Prepare Alteryx workflows, apps, macros, and dependencies for a smooth PlaidCloud migration. Use this checklist before importing Alteryx workflows into PlaidCloud. A complete package helps PlaidCloud create Advanced workflows, macro workflows, Document-backed dependencies, and controlled runtime inputs with minimal follow-up. Note For large portfolios, run this checklist once for the portfolio and again for each high-priority workflow family. ## Collect Workflow Files [Section titled “Collect Workflow Files”](#collect-workflow-files) Collect every workflow file that belongs to the migration: 1. Standard workflows: `.yxmd`. 2. Analytic apps: `.yxwz`. 3. Macros: `.yxmc`. 4. Nested or shared macros referenced by other workflows. 5. Workflow versions that are still used in production. Keep related workflows and macros together when they call each other. PlaidCloud uses those relationships to generate macro workflows and connect macro input and output ports. ## Collect Data Dependencies [Section titled “Collect Data Dependencies”](#collect-data-dependencies) Package the files that the workflows read, write, or inspect: * CSV, TSV, fixed-width, JSON, XML, Excel, Access, YXDB, and database extract files. * Folders used by Directory or Dynamic Input tools. * Expected output files for validation. * Report assets such as images, PDFs, templates, and map layers. * Lookup tables and rule tables used by joins, formulas, replacements, and matching. 
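Before packaging, it can help to inventory what you have. The following is a minimal sketch that walks a folder and groups files by the Alteryx extensions listed under Collect Workflow Files. The `inventory` helper and the sample file names are hypothetical and serve only as an illustration; this is not a PlaidCloud tool.

```python
import tempfile
from collections import defaultdict
from pathlib import Path

# Extensions named in the checklist, mapped to the kind of file they hold.
KINDS = {".yxmd": "workflow", ".yxwz": "analytic app", ".yxmc": "macro"}


def inventory(root):
    """Group Alteryx files under `root` by kind, as paths relative to `root`."""
    found = defaultdict(list)
    for path in sorted(Path(root).rglob("*")):
        kind = KINDS.get(path.suffix.lower())
        if kind and path.is_file():
            found[kind].append(path.relative_to(root).as_posix())
    return dict(found)


# Demo against a throwaway folder so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as package:
    for name in ("sales.yxmd", "prompts.yxwz", "shared/cleanse.yxmc", "notes.txt"):
        target = Path(package, name)
        target.parent.mkdir(parents=True, exist_ok=True)
        target.touch()
    result = inventory(package)

print(result)
```

Pointing `inventory` at the real migration folder gives a quick list to compare against the workflows you expect to migrate.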
If a workflow references a local desktop path, include that file or folder in the import package and choose the Document path where PlaidCloud should store it. ## Collect Spatial Sidecars [Section titled “Collect Spatial Sidecars”](#collect-spatial-sidecars) Spatial file formats often require multiple files to stay together. Include every sidecar file in the same folder: * Shapefile groups such as `.shp`, `.shx`, `.dbf`, and `.prj`. * MapInfo groups such as `.tab`, `.map`, `.id`, and `.dat`. * KML, GeoJSON, and other standalone spatial files. * Projection or coordinate reference files used by the workflow. Missing sidecars are a common source of spatial validation differences. ## Capture Runtime Inputs [Section titled “Capture Runtime Inputs”](#capture-runtime-inputs) For analytic apps, record the values users normally provide: 1. Text, numeric, date, file, and folder inputs. 2. Drop-down, radio button, check box, list, and tree selections. 3. Defaults used for scheduled or repeatable runs. 4. Values that trigger conditions, warnings, or errors. PlaidCloud converts these inputs to controlled workflow variables. ## Choose The Target Location [Section titled “Choose The Target Location”](#choose-the-target-location) Before import, decide: 1. Target PlaidCloud project. 2. Target Document account. 3. Target Document folder for imported files. 4. Naming convention for converted workflows and macros. 5. Whether the first import is a staging import or production import. For portfolio migrations, use a dedicated migration folder in Document so dependencies remain easy to audit. ## Choose Validation Evidence [Section titled “Choose Validation Evidence”](#choose-validation-evidence) For each workflow, choose the validation level: * Structural validation confirms the workflow converts into a runnable PlaidCloud DAG. * Output parity validation compares schema, row count, and row values against trusted Alteryx outputs. 
* Artifact validation reviews reports, PDFs, images, charts, maps, or model outputs. Store expected outputs with the migration package whenever output parity is required. ## Import Readiness Checklist [Section titled “Import Readiness Checklist”](#import-readiness-checklist) Before importing, confirm: * Workflow, app, and macro files are present. * Referenced macros are present. * Input files and folders are present. * Spatial sidecars are grouped together. * Expected outputs are available when parity validation is part of the migration plan. * Runtime input values are known for analytic apps. * Target project and Document path are selected. * Credentials or external connections needed by the workflow are available in PlaidCloud. ## Related Guides [Section titled “Related Guides”](#related-guides) * [Migrate Alteryx Workflows](/guides/workflows/migrate-alteryx-workflows/) * [Package Alteryx Dependencies](/guides/workflows/package-alteryx-dependencies/) * [Validate Converted Alteryx Workflows](/guides/workflows/validate-converted-alteryx-workflows/) # Change the order of steps in a workflow > Reorder steps within a PlaidCloud workflow using drag-and-drop or manual ordering to control the data processing sequence. There are two ways to update the order of steps in the workflow. The first way is to use the up and down arrows present in the **Workflows** table to move the step up or down. The second way is to use the **Step Move** option, which makes large changes much easier. The step move option allows you to move the step to the top, bottom, or after a specific step in one operation. # Column Propagation > Push a column rename, type change, or removal from one workflow step downstream through every step that consumes it.
## Description [Section titled “Description”](#description) When you change a column at the source — rename it, change its type, or remove it — every downstream step that maps to that column has to be updated to match. **Column Propagation** does that work for you in one confirmation. Propagation is available from any step that has a **ColumnMapper** (Project Table, Calculate, Append, Merge, etc.). Two buttons in the mapper toolbar drive it: * `Propagate All` — propagate every column in the mapper * `Propagate Selected` — propagate only the columns you have selected in the mapper grid ## Propagate a Column Change [Section titled “Propagate a Column Change”](#propagate-a-column-change) 1. Open the workflow step containing the column you want to change 2. In the **ColumnMapper**, make the change (rename, retype, etc.) 3. Click `Propagate All` (or select rows and click `Propagate Selected`) in the mapper toolbar 4. The **Propagate Downstream** dialog opens with a tree of every step that depends on the source step 5. Tick the steps you want to apply the change to — child steps cascade automatically 6. For aggregation steps, pick the aggregation function for any new columns in the lower **Aggregation** panel 7. Click `Confirm` The dialog defaults safely: when a downstream row has no explicit mapping for the column, the source name is reused as the target so no information is lost on the way through. ## What Propagates [Section titled “What Propagates”](#what-propagates) * Column rename — source → target name change * Type change — dtype updated wherever the column appears * New columns added in this step — flow forward into downstream mappers * Strip operations applied to the source Steps that don’t reference the column are still shown in the tree but are unticked by default. 
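The effect of propagating a rename can be pictured with a small sketch. It models each downstream mapper as a plain source-to-target dictionary, which is an assumption made for illustration only; the `propagate_rename` helper, the step names, and the column names are hypothetical and do not reflect PlaidCloud's internal representation.

```python
# Hypothetical mappers: each downstream step maps source column -> target column.
downstream_mappers = {
    "calculate_margin": {"rev": "rev", "cost": "cost"},
    "append_regions": {"rev": "revenue_usd"},
    "export_summary": {"cost": "cost"},
}


def propagate_rename(mappers, old, new, selected_steps):
    """Return new mappers with `old` renamed to `new` in the selected steps."""
    updated = {}
    for step, mapping in mappers.items():
        mapping = dict(mapping)  # copy so the input is left untouched
        if step in selected_steps and old in mapping:
            target = mapping.pop(old)
            # Assumption: a target equal to the old source name is a plain
            # pass-through, so it follows the rename; an explicit target stays.
            mapping[new] = new if target == old else target
        updated[step] = mapping
    return updated


result = propagate_rename(
    downstream_mappers, "rev", "revenue", selected_steps=set(downstream_mappers)
)
print(result)
```

Here `export_summary` is left unchanged because it never referenced the renamed column, and `append_regions` keeps its explicit `revenue_usd` target while its source follows the rename.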
## Errors and Retries [Section titled “Errors and Retries”](#errors-and-retries) If a downstream step has been edited by someone else since the dialog was opened, the propagation will fail with a stale-version error. The dialog refetches the dependency graph automatically and lets you retry without losing your selections. If the refetch itself fails, your selections are still preserved so you can resolve the underlying issue and try again. Note Column Propagation only modifies steps within the same workflow. If a downstream workflow consumes the table, update its mappers separately or run a [Dependency Audit](../view-a-dependency-audit/) to find the affected steps. # Continue on Error > Configure PlaidCloud workflow steps to continue execution on error, allowing subsequent steps to run despite earlier failures. Workflow steps can be set to continue processing even when there is an error. This might be useful in workflow start-up conditions or where data may be available intermittently. If the step errors, it will be recorded as an error but the workflow will continue to process. To set this option, click on the step edit option, the pencil icon in the workflow table, to open the edit form. Check the checkbox for **Continue On Error**. After saving the updated step, any errors with the step will not cause the workflow to stop. Steps that have been set to continue on error will have a special indicator in the workflow steps hierarchy table. # Controlling Parallel Execution > Control parallel step execution in PlaidCloud workflows to optimize performance by running independent steps simultaneously. Workflows in PlaidCloud can be executed as a combination of serial steps and parallel operations. To set a group of steps to run in parallel, place the steps in a group within the workflow hierarchy. Right click on the group folder and select the **Execute in Parallel** option. This will allow all the steps in the group to trigger simultaneously and execute in parallel. 
Once all steps in the group complete, the next step or group in the workflow after the group will activate.

# Copy & Paste steps

> Copy and paste workflow steps between PlaidCloud workflows to reuse step configurations and speed up workflow development.

## Copy Steps

It is often useful to copy steps instead of starting from scratch each time. PlaidCloud allows copying steps within a workflow, between workflows, and even across projects.

You can select multiple steps to copy at once. Select the workflow steps within the hierarchy and click the **Copy Selected Steps** button at the top of the table. This places the selected steps on the clipboard so they can be pasted into the current workflow or another one.

Copying a step makes a duplicate step within the project. If you want to place the same step in more than one location in a workflow, use the **Add Step** menu option to add a reference to the same step rather than a clone of the original step.

## Paste Steps

After selecting steps to copy and placing them on the clipboard, you can paste those steps into the same workflow or another workflow, even in another project.

There are two options when pasting the steps into the workflow:

* Append to the end of the workflow
* Insert after last selected row

The append option appends the steps to the end of the selected workflow. The insert option inserts the copied steps after the selected row.

If the steps on the clipboard were copied from multiple areas of a workflow, they are pasted in order but without the nested hierarchy they had when they were copied. The paste is always a flat list of steps. This might be unexpected, but it is safer than recreating in the target workflow all of the directory structure that existed in the source workflow.
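
The flat-paste behavior described above can be pictured with a small sketch. It is only an illustration, not PlaidCloud's implementation: the `flatten` and `paste` helpers and the tuple-based group representation are hypothetical, and it assumes that group folders themselves are dropped while their steps are kept in order.

```python
from typing import List, Optional, Union

# A copied selection: a step is a string, a group is a (name, children) tuple.
Node = Union[str, tuple]


def flatten(nodes: List[Node]) -> List[str]:
    """Return the steps in hierarchy order, dropping group nesting."""
    steps: List[str] = []
    for node in nodes:
        if isinstance(node, tuple):
            _group_name, children = node
            steps.extend(flatten(children))
        else:
            steps.append(node)
    return steps


def paste(target: List[str], copied: List[Node], after: Optional[int] = None) -> List[str]:
    """Append when `after` is None, otherwise insert after that row index."""
    flat = flatten(copied)
    if after is None:
        return target + flat
    return target[: after + 1] + flat + target[after + 1:]


clipboard = ["Import", ("Transforms", ["Filter", "Join"]), "Export"]
appended = paste(["A", "B", "C"], clipboard)
inserted = paste(["A", "B", "C"], clipboard, after=0)
print(appended)
print(inserted)
```

Both paste options receive the same flattened list; they differ only in where that list is spliced into the target workflow.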
# Create a Macro

> Define a Macro — a reusable, run-isolated sub-workflow with a typed input/output contract — and invoke it from another workflow.

A **Macro** is a reusable Advanced (DAG) workflow with a declared input/output contract. You invoke a Macro from another workflow with a [Macro Run](/reference/workflow-steps/workflow-control/run-macro/) or [Macro Concurrent](/reference/workflow-steps/workflow-control/macro-concurrent/) step — each invocation runs in its own isolated scratch schema, so the same Macro can be called concurrently from multiple parents or from multiple driver rows without the runs colliding.

Use a Macro when you have a repeatable, parameterized data transformation — for example, “process one month of sales data for one region” — that you want to call from a driver workflow once per month, per region, or both. The caller binds tables and variables to the Macro’s declared input ports; when it finishes, the declared output tables are copied back to caller-side destinations.

Note: Macros are an Advanced-only feature. Convert your workflow to Advanced first. Steps inside the Macro must be Macro-safe: table transforms, imports, exports, run-scoped variable steps, nested Macro Run calls, and a few control-flow steps. Dimension imports and dimension-updating steps are rejected because dimensions are project-global state.

## Steps

1. Open the Project containing the workflow you want to turn into a Macro.
2. Switch to the **Workflows** tab and select the workflow row.
3. In the right-side **Workflow Details** panel, confirm the **Workflow Type** is **Advanced**. If it’s Standard, use the **Convert to Advanced** action first.
4. In the **Macro** section of the Workflow Details panel, click **Convert to Macro…** and confirm in the dialog. The workflow flips to Macro mode and a Ports editor appears.
5. Click **Add Port** to declare each input or output the Macro accepts or produces. For each port:
   * **Name** — a short identifier the caller uses to bind to this port (for example, `month`, `region_sales`).
   * **Direction** — **Input** (the caller provides this value or table) or **Output** (the Macro produces this and the caller picks it up).
   * **Kind** — **Table** for a data table, **Scalar** for a single value (string / int / float / bool / date), or **Dimension** for a hierarchical dimension reference.
   * **Required** — clear if the Macro can run without this port being bound.
   * **Memo** — a short note about what this port represents, shown to authors in the caller-side binding form.
6. Click **Save Ports**.

The workflow is now a Macro. Authors who add a [Macro Run](/reference/workflow-steps/workflow-control/run-macro/) step in another workflow will see your declared ports and bind to them by name.

## Invoking a Macro

In the caller workflow:

1. Add a **Macro: Run** step.
2. In **Macro to Run**, select the project and the Macro workflow.
3. **Input Port Bindings** — for each input port the Macro declares, add a binding:
   * **Table** ports take a caller-side source table; optionally select a subset of columns and add a filter (a `WHERE` clause referencing the Macro’s scalar input variables) so only the needed slice is materialized into the Macro’s run schema.
   * **Scalar / Dimension** ports take a value (often a workflow variable from the caller) — set BEFORE table copy-in so the table-input filter can reference it.
4. **Output Port Bindings** — for each output port the Macro produces, point it at a caller-side destination table (created on the fly if it doesn’t already exist).
5. Save the step.

When the parent workflow runs and reaches the Macro Run step, the runner:

1. Mints a fresh `run_id` for this Macro invocation.
2. Creates a per-run scratch schema (`macrorun_`) in the project’s catalog.
3. Copies the bound input tables into the scratch schema (column projection + filter applied at copy-in, so the filter pushes down).
4. Sets the bound scalar / dimension input variables on the Macro’s run-scoped variable overlay.
5. Runs the Macro’s steps in-process. Every SQL step inside the Macro reads and writes the scratch schema instead of the project schema.
6. Copies the declared output tables back to the caller’s destinations.
7. Drops the scratch schema (always — even if a step inside the Macro failed).

Because each invocation has its own scratch schema, two concurrent calls to the same Macro (from a loop, a fan-out, or independent workflows) never collide on intermediate table names.

## Running One Macro per Driver Row

Use **Macro: Concurrent Run** when one caller table should drive many independent Macro invocations.

1. Add a **Macro: Concurrent Run** step.
2. On **Driver**, select the caller-side driver table and set **Concurrent Runs** to the maximum number of child Macro invocations to run at once.
3. On **Table Data Selection**, map the driver-table columns that each child invocation needs.
4. On **Driver Filter**, optionally restrict the driver rows to process.
5. On **Macro**, select the Macro workflow.
6. On **Input Bindings**, bind driver column values to Macro scalar or dimension variables and bind caller-side tables to Macro table ports.
7. On **Output Bindings**, map Macro output ports to caller-side destination tables.

Each selected driver row gets a separate `run_id` and scratch schema. Stopping the parent step stops all active child invocations and drops their run schemas.

## Demoting a Macro

You can clear the Macro flag at any time by clicking **Demote to Advanced…** in the Workflow Details panel.
The workflow reverts to a plain Advanced workflow and any caller-side Macro Run steps that reference it will fail at runtime with a “not a macro” error. The declared ports stay on the record so re-converting later restores them.

## Limitations (v1)

* Macros may contain table transforms, imports, and exports because table reads and writes are isolated to the invocation’s scratch schema. Dimension imports and dimension-updating steps are not allowed because dimensions are project-global state. Other non-table side-effect steps, such as document operations and agent calls, are not Macro-safe in v1.
* Macros must live in the same project as the caller. Cross-project Macros are not yet supported.
* A Macro Run step cannot be invoked from a [Run Workflow](/reference/workflow-steps/workflow-control/run-workflow/), [Workflow Loop](/reference/workflow-steps/workflow-control/workflow-loop/), or conditional Run Workflow — those steps run un-isolated and would break the per-run schema contract. Use a Macro Run step in the caller instead.

## Next Steps

* [Macro Run step reference](/reference/workflow-steps/workflow-control/run-macro/) — the per-field reference for the calling step.
* [Macro Concurrent step reference](/reference/workflow-steps/workflow-control/macro-concurrent/) — run one Macro invocation per driver-table row.
* [Run Workflow step](/reference/workflow-steps/workflow-control/run-workflow/) — for sub-workflows that don’t need per-invocation isolation.

# Create Workflow (Guide)

> Create a workflow in PlaidCloud and choose its type — Standard Serial, Standard Parallel, or Advanced (DAG canvas) — to load, transform, and export data.

To create a new workflow, you need an existing project. If you don’t have one yet, see [Manage projects](/guides/projects/managing-projects/).

## Steps

1. Open the project that should contain the workflow.
2. Switch to the **Workflows** tab.
3. Click **New Workflow** in the toolbar.
4. Fill in the form:
   * **Name** — short, descriptive (e.g., “Monthly close — load actuals”)
   * **Memo** — optional longer description for context
   * **Workflow Type** — Standard Serial (default), Standard Parallel, or Advanced (DAG canvas). See [Choosing a workflow type](#choosing-a-workflow-type) below.
   * **Trigger Remediation Workflow on Error** — optional; enable it to pick a remediation workflow (see [About remediation workflows](#about-remediation-workflows) below).
5. Click **Create**.

The workflow appears in the Workflows tab and is ready to have steps added to it. Double-click it to open the [Workflow Explorer](/guides/workflows/workflow-explorer/) and start building.

## Choosing a Workflow Type

The **Workflow Type** you pick when creating a workflow determines how its steps are arranged and run. It defaults to **Standard Serial**, and you set it from the type selector in the New Workflow form.

| Type | How steps are arranged and run |
| --- | --- |
| **Standard Serial** | Steps run from the **Steps list**, one at a time, in order. |
| **Standard Parallel** | Steps run from the **Steps list**, in parallel where their dependencies allow. |
| **Advanced (DAG canvas)** | Steps are laid out on a **visual canvas** and run in dependency order, with independent branches running in parallel. Advanced also unlocks breakpoints, containers, run-from-here, simulation, and real-time collaboration. |

A fourth type, **Macro** — a reusable, callable workflow with declared inputs and outputs — is coming soon and appears in the selector as disabled.
Choose **Advanced (DAG canvas)** here if you want the [Visual Workflow Designer](/guides/workflows/advanced-workflows/) from the start. The choice isn’t permanent: you can promote a Standard workflow later with **Convert to Advanced…** from the Workflows list.

## About Remediation Workflows

If the new workflow ends in an error, PlaidCloud can automatically run a **remediation workflow** in response. This is useful for:

* Sending a notification to a Slack channel, email distribution list, or webhook so someone investigates
* Triggering a rollback or cleanup workflow that restores a known-good state
* Logging the failure to an audit table

A remediation workflow is optional. You can leave it blank now and configure it later if needed. The remediation workflow only fires on terminal failures, not on per-step warnings.

## Next Steps

* [Workflow explorer](/guides/workflows/workflow-explorer/) — add steps to your new workflow
* [Advanced workflows](/guides/workflows/advanced-workflows/) — choose a workflow type and build on the visual canvas
* [Run a workflow](/guides/workflows/run-a-workflow/) — execute the workflow once it has steps
* [Managing step errors](/guides/workflows/managing-step-errors/) — debugging failures

# Duplicate or Clone a Workflow

> Duplicate workflows in place or clone them into another project — useful for replicating a process or staging changes safely.

Copying a workflow is useful when planning major changes or replicating a process with different options. Copies are completely independent — modifying a copy does not affect the original.
Two actions are available from the **Workflows** table:

* **Duplicate Selected Workflows** — fast in-place copy in the same project
* **Clone Workflow(s)** — copy into a target project of your choice (defaults to the current project)

## Duplicate Selected Workflows

1. Open the project’s **Workflows** table
2. Select one or more workflows
3. Click `Duplicate Selected Workflows` in the toolbar

Each duplicate lands in the same project, with “ copy” appended to the original name.

## Clone Workflow(s)

Use this when you want to copy workflows into another project — for example, promoting from a development project into a sibling project.

1. Open the source project’s **Workflows** table
2. Select one or more workflows
3. Open the **Actions** menu and click `Clone Workflow(s)`
4. In the dialog, pick the **Target Project** (defaults to the current project)
5. Click `Clone Workflow(s)`

Cloned workflows have “ copy” appended to their names so they don’t collide with anything already in the target project.

Note: Cloning copies the workflow definition and step configuration. Project-scoped resources referenced by the workflow (tables, dimensions, connections) must already exist in the target project, or you must clone them separately.

# LLM Step

> Run a prompt against an LLM inside a workflow — with scoped read-only access to your project's tables, dimensions, and documents — and route the structured response to outputs such as generated PDFs.

## Description

The **LLM Step** runs a prompt against a large language model as one step in a workflow. You can give the model scoped, read-only access to specific tables, dimensions, and documents in your project, and you require it to return a structured JSON response that matches a schema you define.
That structured response is then routed through one or more **outputs** — for example, generating one PDF per row of the response.

A common use is to summarize a financial table into per-business-unit commentary and write each commentary out as its own PDF in a document account.

Note: Scoped access to project data (tables, dimensions, documents) is available with **Anthropic** LLM connections, which use a secure connector to reach PlaidCloud’s read-only tools. Other providers can still run a prompt and return structured JSON, but they don’t get scoped data access — bindings are ignored for them.

## Before You Start

You’ll need:

* **An LLM connection.** The step requires a connection of kind **LLM** (for example, Anthropic). The connection holds the provider API key and an optional default model. See [Connections](/guides/connections/).
* **A document account** — only if you’re generating files. PDF output is written to a document account you choose.

## Add an LLM Step

1. Open the workflow and go to the **Analyze Steps** tab.
2. Add a step in the position you want, the same way you’d add a standard transform.
3. Choose **LLM Step** as the step type.

The editor opens with four sections: **LLM Request**, **Bindings**, **Outputs**, and **Limits**.

## Configure the Request

In the **LLM Request** group:

* **LLM Connection** *(required)* — the LLM connection the step uses.
* **Model** *(optional)* — a specific model name, such as `claude-opus-4-7`. Leave it blank to use the connection’s default model.
* **Prompt** *(required)* — the instruction sent to the model. You can reference bound objects inline with `{{tables.NAME}}`, `{{dimensions.NAME}}`, and `{{documents.NAME}}`; each reference must match a binding you declare below.
* **Result schema** *(required)* — a JSON Schema the model’s output must conform to. It must be a JSON object with `"type": "object"` at the root. The step validates the response against this schema, so your outputs can rely on its shape.

## Bind Project Data

The **Bindings** section grants the model read-only access to specific objects and tells it how to address them. It has three tables — **Tables**, **Dimensions**, and **Documents** — each with **Add row** and **Remove selected** buttons.

* **Tables** and **Dimensions** — `Name` (the label you reference in the prompt), `Reference` (the table or dimension ID), and `Mode` (`read`).
* **Documents** — `Name`, `Account` (document account ID), `Path`, `Mode` (`read`), and `Format` (such as `pdf`).

Note: The step is **read-only**. The model can query and read the objects you bind, but it cannot modify data, and it cannot reach objects you didn’t bind.

## Define Outputs

The **Outputs** section routes the model’s structured response to a destination. At least one output is required. Each row has a **Kind** and a **Config (JSON)** value.

The available kind is **`pdf_per_item`**, which renders one PDF per element of an array in the response and uploads each to a document account. Its config fields are:

| Field | Description |
| --- | --- |
| `iterator` | Name of the array field in the response to loop over — one PDF per element. |
| `content_field` | Per-item field holding the **Markdown** rendered into the PDF body. |
| `title_field` | Per-item field used as the PDF title (default `title`). |
| `document_account` | Document account ID to write to. |
| `path_template` | File path for each item, with `{field}` placeholders taken from the item. The date tokens `{yyyy}`, `{yyyy-mm}`, `{yyyy-mm-dd}`, and `{yyyymmdd}` are filled in automatically. |
| `on_collision` | `error` (default), `skip`, or `overwrite` when a file already exists at the target path. |
| `account_root` | Root path within the document account (default `/`). |

For a response that returns a `businesses` array whose items each have `business_name`, `title`, and `commentary`, the config is:

```json
{
  "iterator": "businesses",
  "title_field": "title",
  "content_field": "commentary",
  "document_account": "",
  "path_template": "/pl-analysis/{yyyy-mm}/{business_name}.pdf",
  "on_collision": "overwrite"
}
```

## Limits

The **Limits** group bounds the request:

* **Max output tokens** — the largest response the model may return (1,024 to 128,000; default 16,000).
* **Credential TTL (seconds)** — how long the step’s scoped, temporary data-access credential stays valid (60 to 3,600; default 1,800). The step’s job is bounded by this window, so set it long enough for the model to finish but no longer than necessary.

## Run the Step

The LLM Step runs like any other step — as part of a full workflow run, or on its own (see [Running one step in a workflow](/guides/workflows/running-one-step-in-a-workflow/)). When it runs, the step sends your prompt and scoped tool access to the model, validates the response against your result schema, and writes each output. The model call runs in its own job, so a long-running step doesn’t tie up the workflow runner.

Caution: Each run makes a real, billable call to your LLM provider and writes its outputs. When an output’s `on_collision` is `error`, re-running a step whose files already exist fails — use `overwrite` or `skip` for repeatable runs.
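
To make the `path_template` behavior from the output config concrete, here is a minimal sketch of how one path per item could be derived. It is an illustration under assumptions, not the step's actual renderer: `render_path` is a hypothetical helper, the sample response is made up, and letting date tokens take precedence over item fields of the same name is a choice made only for this example.

```python
import re
from datetime import date
from typing import Any, Dict


def render_path(template: str, item: Dict[str, Any], today: date) -> str:
    """Fill `{field}` placeholders from the item plus the documented date tokens."""
    values = {key: str(value) for key, value in item.items()}
    # Date tokens from the config table; they override item fields here (assumption).
    values.update({
        "yyyy": today.strftime("%Y"),
        "yyyy-mm": today.strftime("%Y-%m"),
        "yyyy-mm-dd": today.strftime("%Y-%m-%d"),
        "yyyymmdd": today.strftime("%Y%m%d"),
    })

    def substitute(match):
        key = match.group(1)
        if key not in values:
            raise KeyError(f"no value for placeholder {{{key}}}")
        return values[key]

    return re.sub(r"\{([^{}]+)\}", substitute, template)


# Made-up response shaped like the `businesses` example above.
response = {
    "businesses": [
        {"business_name": "Retail", "title": "Retail P&L", "commentary": "Margins improved."},
        {"business_name": "Wholesale", "title": "Wholesale P&L", "commentary": "Volume was flat."},
    ]
}
template = "/pl-analysis/{yyyy-mm}/{business_name}.pdf"
paths = [render_path(template, item, date(2025, 3, 14)) for item in response["businesses"]]
print(paths)
```

Because the month token is part of the path, a re-run within the same month produces the same paths, which is the situation where the `on_collision` setting matters.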
# Manage Workflow Variables

> Manage workflow variables in PlaidCloud to store and pass dynamic values between steps for flexible data processing logic.

PlaidCloud allows variables at both the project scope and the workflow scope. This lets you set project-wide variables or pass information easily between workflows.

To view the variables and their values, click the variables icon in the **Workflows** hierarchy. From the variables table you can view the variables and their current values, and edit the values. You can also add new variables or delete existing ones.

# Managing Step Errors

> Handle and manage step errors in PlaidCloud workflows including error notifications, retry logic, and failure recovery options.

If a workflow experiences an error during processing, an error indicator is displayed on both the workflow and the step that had the error.

PlaidCloud can retry a failed step multiple times. This is often useful if the step accesses remote systems or data that may not be highly available or that fail intermittently for unknown reasons. You can set the number of retries and add a delay between retries, from seconds to hours.

If no retry is selected or the maximum number of retries is exceeded, then the step will be marked as an error. PlaidCloud provides three levels of error handling in that case:

* Stop the workflow when an error occurs
* Mark the step as an error but keep processing the workflow
* Mark the step as an error and trigger a remediation workflow process instead of continuing the current workflow

## Stop the Workflow

Stopping the workflow when a step errors is the most common approach since workflows generally should run without errors. This will stop the workflow and present the error indicator on both the step and the workflow. The error will also be displayed in the activity monitor but no further action is taken.
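
Taken together, the retry setting and the default stop behavior can be pictured roughly as follows. This is a sketch for intuition only, not how PlaidCloud's runner is written: `run_step_with_retries`, its parameters, and the status strings are hypothetical, and the sleep function is injectable so the example runs instantly.

```python
import time
from typing import Callable, Tuple


def run_step_with_retries(
    step: Callable[[], object],
    max_retries: int,
    delay_seconds: float,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[str, int]:
    """Run `step`, retrying on failure; return (status, attempts made)."""
    attempts = 0
    while True:
        attempts += 1
        try:
            step()
            return "completed", attempts
        except Exception:
            if attempts > max_retries:
                # Retries exhausted: the step is marked as an error.
                return "error", attempts
            sleep(delay_seconds)  # wait before the next retry


calls = {"count": 0}


def flaky_step() -> None:
    """Fail twice, then succeed, imitating an intermittently available system."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("remote system unavailable")


status, attempts = run_step_with_retries(
    flaky_step, max_retries=3, delay_seconds=0, sleep=lambda seconds: None
)
# With the default handling, an errored step stops the workflow.
workflow_state = "running" if status == "completed" else "stopped"
print(status, attempts, workflow_state)
```

With `max_retries=0` the first failure is final, which corresponds to the "no retry is selected" case above.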
## Keep Processing

Each step can be set to continue on error in the step form. If this checkbox is enabled, a step that fails is still marked with an error, but the workflow treats the error as a completion of the step and continues. This is often useful for steps that can error when data is missing but whose failure is harmless to the overall process.

Because the workflow continues on error in this scenario, it does not display an error indicator and keeps showing a running indicator.

## Trigger Remediation Workflow

When a remediation workflow is set as part of the workflow setup, a workflow error immediately stops processing of the current workflow and starts the remediation workflow.

Note that a failure in a step marked to continue on error does not trigger the remediation workflow. Only step failures that would also cause the entire workflow to stop trigger the remediation process.

A remediation workflow can simply notify people that a failure has occurred, or it can perform more complex processing to attempt an automatic correction of whatever caused the original workflow to fail.

# Migrate Alteryx Workflows

> Convert Alteryx workflows, apps, and macros into PlaidCloud Advanced workflows with Document-backed dependencies, typed inputs, validation, and repeatable runs.

PlaidCloud converts Alteryx workflows, analytic apps, and macros into Advanced workflows that can be reviewed, scheduled, parameterized, and run in PlaidCloud. The importer preserves the workflow graph, uploads referenced files to Document, creates macro workflows when needed, and maps tools to native workflow steps or managed job executors.
Use this guide when you are moving a single workflow, a group of related workflows, or a larger Alteryx portfolio into PlaidCloud.

Note: Converted workflows are designed to require very little manual effort. For production workflows, PlaidCloud validation gives teams a clear readiness record before scheduling regular runs.

## What PlaidCloud Creates

PlaidCloud creates a runnable Advanced workflow from the Alteryx design:

* Workflow tools become PlaidCloud workflow steps with the original upstream and downstream relationships preserved.
* Alteryx macros become PlaidCloud macro workflows with explicit macro inputs and macro outputs.
* Analytic app questions become controlled workflow variables that users can set before a run.
* Input files, output files, spatial sidecars, images, PDFs, and generated artifacts are stored in Document at the path selected during import.
* Advanced operations such as fuzzy matching, spatial processing, PDF extraction, OCR, machine learning, NLP, and reporting run through PlaidCloud’s managed job executors when a native SQL or workflow operation is not the best fit.
* Browse, layout, annotations, and designer-only objects are retained where they help explain the converted workflow, but they do not add unnecessary runtime work.

## Before You Start

Collect the workflow files and dependencies together before importing:

1. Include Alteryx workflow, app, and macro files: `.yxmd`, `.yxwz`, and `.yxmc`.
2. Include input data files such as CSV, Excel, Access, YXDB, XML, JSON, and database extracts.
3. Include spatial sidecar files together. For example, keep shapefile groups and MapInfo files in the same folder.
4. Include report assets such as images, PDFs, map layers, and templates.
5. Choose the PlaidCloud project where the converted workflows should be created.
6. Choose the Document account and folder where PlaidCloud should upload imported files.
7. Decide whether this migration requires structural validation only or output parity validation.

## Import A Workflow

1. Open the target project in PlaidCloud.
2. Open **Workflows**.
3. Choose the import action for Alteryx workflows.
4. Select the `.yxmd`, `.yxwz`, or `.yxmc` file to import.
5. Choose the Document account and folder where imported files should be stored.
6. Add any referenced files or folders that the workflow needs at runtime.
7. Start the import.
8. Review the conversion summary for uploaded files, generated workflows, generated macros, readiness notes, and validation recommendations.
9. Open the generated Advanced workflow.

PlaidCloud stores imported dependencies in the Document location selected during import. Converted steps then reference those Document paths, so the workflow can run repeatedly without relying on a desktop file system.

## Use The Converted Workflow

After import, use the workflow like any other PlaidCloud Advanced workflow:

1. Open the converted workflow canvas.
2. Review the generated steps and branches.
3. Set workflow variables for any converted app questions or runtime parameters.
4. Run the workflow.
5. Review run history, step outputs, readiness notes, and generated artifacts.
6. Schedule the workflow when it is ready for repeatable operation.

Converted macros are available as PlaidCloud macro workflows. A workflow that called an Alteryx macro will call the generated PlaidCloud macro through the macro step. Macro runs are isolated from one another, so concurrent workflow runs can safely use the same macro definition.

## Validate The Conversion

PlaidCloud supports two practical validation levels.
### Structural Validation

Structural validation confirms that the workflow was converted into a runnable PlaidCloud DAG:

* Every Alteryx tool has a PlaidCloud conversion route.
* Required macros were found or generated.
* Required input files were uploaded to Document.
* Macro inputs and macro outputs are connected.
* Workflow variables were created for user-controlled inputs.
* The generated workflow opens and can be run in PlaidCloud.

Structural validation is useful for migration readiness, inventory review, and early portfolio conversion.

### Output Parity Validation

Output parity validation compares the PlaidCloud run against trusted Alteryx outputs:

* Output schemas match.
* Row counts match.
* Row values match.
* Row order is ignored unless the workflow explicitly depends on ordering.

For workflows that create reports, maps, PDFs, images, or model artifacts, validate the generated artifact or the data behind the artifact according to the way your team uses the output.

See [Validate Converted Alteryx Workflows](/guides/workflows/validate-converted-alteryx-workflows/) for a detailed validation checklist.

## Review Conversion Coverage

The [Alteryx Conversion Matrix](/reference/alteryx-conversion-matrix/) lists each supported Alteryx object, its coverage level, and the PlaidCloud operation used during conversion.

Use the matrix to understand how the importer handles each tool family:

* Native DAG steps for common data preparation, joins, filters, formulas, sorting, unions, sampling, and reshaping.
* Macro steps for macro inputs, macro outputs, macro invocation, control parameters, and macro concurrency.
* Controlled workflow variables for analytic app questions such as check boxes, drop-downs, text boxes, radio buttons, folder pickers, and file pickers.
* Document-backed file operations for input, output, directory, and dynamic file behavior.
* Managed job executors for specialized spatial, fuzzy matching, machine learning, PDF, OCR, NLP, reporting, and artifact work.
* Cloud-native equivalents where PlaidCloud creates a durable, shareable artifact rather than reproducing an Alteryx-specific desktop renderer or proprietary file format.

## Recommended Migration Practice

For a large portfolio, migrate in batches:

1. Import the workflows and macros into a migration project.
2. Complete dependency packages before reviewing individual formulas or business logic.
3. Run structural validation across the batch.
4. Prioritize output parity validation for production workflows, regulatory workflows, and workflows with downstream consumers.
5. Promote validated workflows into the target production project.
6. Schedule production runs and monitor run history.

## Related Guides

* [Alteryx Migration Readiness Checklist](/guides/workflows/alteryx-migration-readiness-checklist/)
* [Package Alteryx Dependencies](/guides/workflows/package-alteryx-dependencies/)
* [Validate Converted Alteryx Workflows](/guides/workflows/validate-converted-alteryx-workflows/)
* [Use Converted Alteryx Apps](/guides/workflows/use-converted-alteryx-apps/)
* [Orchestrate Alteryx Migrations With MCP](/guides/workflows/orchestrate-alteryx-migrations-with-mcp/)
* [Tune Alteryx Imports](/guides/workflows/troubleshoot-alteryx-imports/)
* [Validate Alteryx Reports And Artifacts](/guides/workflows/validate-alteryx-reports-and-artifacts/)
* [Migrate Spatial Alteryx Workflows](/guides/workflows/migrate-spatial-alteryx-workflows/)
* [Create A Macro](/guides/workflows/create-a-macro/)
* [Run A Workflow](/guides/workflows/run-a-workflow/)
* [Manage Workflow Variables](/guides/workflows/manage-workflow-variables/)
* [Alteryx Conversion Matrix](/reference/alteryx-conversion-matrix/)

## Migration Documentation Set

For larger migrations, use these focused guides with this migration guide:

* [Alteryx Migration Readiness Checklist](/guides/workflows/alteryx-migration-readiness-checklist/) for migration planning and package review.
* [Package Alteryx Dependencies](/guides/workflows/package-alteryx-dependencies/) for files, folders, macros, spatial sidecars, and expected outputs.
* [Use Converted Alteryx Apps](/guides/workflows/use-converted-alteryx-apps/) for controlled workflow variables and app-style runs.
* [Orchestrate Alteryx Migrations With MCP](/guides/workflows/orchestrate-alteryx-migrations-with-mcp/) for using an AI agent to organize files, work from connected shared storage, and coordinate many conversions through PlaidCloud’s MCP server.
* [Tune Alteryx Imports](/guides/workflows/troubleshoot-alteryx-imports/) for dependency completion, macro resolution, variables, validation comparisons, and executor readiness notes.
* [Validate Alteryx Reports And Artifacts](/guides/workflows/validate-alteryx-reports-and-artifacts/) for PDFs, images, maps, charts, dashboards, and model artifacts.
* [Migrate Spatial Alteryx Workflows](/guides/workflows/migrate-spatial-alteryx-workflows/) for spatial files, SQL geometry logic, managed spatial executors, and spatial validation.

# Migrate Spatial Alteryx Workflows

> Prepare, import, and validate Alteryx spatial workflows in PlaidCloud with Document-backed files, SQL geometry logic, and managed spatial executors.

PlaidCloud converts Alteryx spatial workflows into Advanced workflows that use Document-backed spatial inputs, SQL geometry logic where appropriate, and managed spatial executors for operations that need specialized geometry processing.

Note: PlaidCloud chooses the simplest reliable route for each spatial operation.
Some spatial logic can run as SQL, while nearest-neighbor, overlay, buffering, smoothing, trade area, and similar operations may run through managed executors. ## Package Spatial Inputs [Section titled “Package Spatial Inputs”](#package-spatial-inputs) Before import, collect every spatial dependency: * Shapefile groups with `.shp`, `.shx`, `.dbf`, and `.prj`. * MapInfo groups with `.tab`, `.map`, `.id`, and `.dat`. * KML, GeoJSON, and other spatial files. * Lookup tables used to join spatial and non-spatial records. * Expected spatial outputs for validation. Keep sidecar files together in the same folder before importing. ## How Spatial Tools Convert [Section titled “How Spatial Tools Convert”](#how-spatial-tools-convert) PlaidCloud uses the best available execution route for each spatial operation: * Point creation can convert to native geometry creation. * Spatial metadata can convert to SQL geometry expressions or a spatial transform. * Spatial matching, nearest-neighbor, overlay, buffer, smoothing, generalization, trade area, and polygon building can run through managed spatial executors. * Map and report-map outputs can convert to PlaidCloud map or report artifacts. This lets converted workflows use fast SQL logic when it is sufficient and executor-backed processing when the operation needs broader geometry support. ## Validate Geometry [Section titled “Validate Geometry”](#validate-geometry) For spatial output parity, compare: 1. Output schema. 2. Row count. 3. Key field values. 4. Geometry values or geometry-derived measurements. 5. Coordinate reference behavior. 6. Accepted tolerance for distances, areas, and simplified geometry. Row order does not need to match unless the workflow depends on ordering. ## Validate Spatial Artifacts [Section titled “Validate Spatial Artifacts”](#validate-spatial-artifacts) For maps and report maps, confirm: 1. Map layers contain the expected records. 2. Labels and grouped features are correct. 3. 
Boundaries, points, and polygons appear in the expected locations. 4. The artifact is usable by downstream reviewers. 5. Any intentional cloud-native artifact difference is recorded. ## Common Spatial Issues [Section titled “Common Spatial Issues”](#common-spatial-issues) If a converted spatial workflow does not validate: * Confirm sidecar files are present. * Confirm projection files are present. * Confirm the same source data was used in both runs. * Review geometry precision and coordinate reference differences. * Review executor notes. * Validate the data table behind a map artifact before comparing visual layout. ## Related Guides [Section titled “Related Guides”](#related-guides) * [Package Alteryx Dependencies](/guides/workflows/package-alteryx-dependencies/) * [Validate Alteryx Reports And Artifacts](/guides/workflows/validate-alteryx-reports-and-artifacts/) * [Alteryx Conversion Matrix](/reference/alteryx-conversion-matrix/) # Multi-Table Join Step > Join many tables in one workflow step using a visual join-graph designer — draw joins between columns, choose join types, filter, and export the diagram. ## Description [Section titled “Description”](#description) The **Multi-Table Join** step joins many tables in a single operation. Instead of chaining a series of two-table joins — where you lose the big picture and repeat the same output mapping — you lay every table out on a visual **join-graph designer**, draw the joins between their columns, and produce one result table. It’s built for the common shape of analytics joins: one fact table joined to several dimension or lookup tables. You can join up to 32 tables at once. Note This step replaces the pattern of chaining several Inner Join / Outer Join steps. One step, one diagram, one output mapping — and a single shared view of how the tables fit together. 
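As a rough illustration of the fact-plus-dimensions shape described above, the sketch below joins one fact table to two lookup tables in a single query using Python's built-in `sqlite3` module. The table names, aliases, and columns (`orders`, `customers`, `products`) are invented for the example. The SQL is only an analogy for what one Multi-Table Join step expresses; it is not the SQL PlaidCloud generates.

```python
import sqlite3


def star_join():
    """Join a hypothetical fact table to two lookup tables in one query."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE orders (order_id INTEGER, customer_id INTEGER,
                             product_id INTEGER, qty INTEGER);
        CREATE TABLE customers (customer_id INTEGER, name TEXT);
        CREATE TABLE products (product_id INTEGER, title TEXT);
        INSERT INTO orders VALUES (1, 10, 100, 2), (2, 11, 102, 1), (3, 12, 100, 5);
        INSERT INTO customers VALUES (10, 'Acme'), (11, 'Globex');
        INSERT INTO products VALUES (100, 'Widget'), (101, 'Gadget');
        """
    )
    # One statement, two joins: INNER keeps only orders with a known customer,
    # LEFT keeps the order even when the product lookup has no match.
    rows = conn.execute(
        """
        SELECT o.order_id, c.name, p.title, o.qty
        FROM orders AS o
        INNER JOIN customers AS c ON o.customer_id = c.customer_id
        LEFT JOIN products AS p ON o.product_id = p.product_id
        ORDER BY o.order_id
        """
    ).fetchall()
    conn.close()
    return rows
```

Order 3 drops out because its customer has no match under the `INNER` join, while order 2 survives with an empty product title because the product join is `LEFT`.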
## Add a Multi-Table Join [Section titled “Add a Multi-Table Join”](#add-a-multi-table-join) Add it like any other step and choose **Multi-Table Join** (under the **Tables** group of the step menu, or drag it from the palette on the [Advanced workflow canvas](/guides/workflows/advanced-workflows/)). The step’s editor has four tabs: 1. **Tables & Joins** — the visual designer where you add tables and draw joins. 2. **Output Columns** — the columns the result table will contain. 3. **Post-Join Filter** — an optional filter applied to the joined result. 4. **Advanced (server-set)** — settings managed by the server; you rarely touch these. A status indicator shows **Ready to save** or **Unsaved changes** as you work. ## Tables & Joins [Section titled “Tables & Joins”](#tables--joins) ### Add Tables [Section titled “Add Tables”](#add-tables) Choose **Add Table** to place a source table on the canvas. Each table shows as a card listing its columns, and gets an **alias** — a short name you use to reference its columns elsewhere (as `alias.column`). An alias must start with a letter or underscore and can’t be a SQL keyword. Add as many as you need (up to 32), and pick the **Target Table** the result is written to. If a table’s columns change in the catalog, use **Re-fetch from server** to replace the card’s columns with the latest values. ### Draw Joins [Section titled “Draw Joins”](#draw-joins) Drag from a column dot on one table to a column on another to create a **join** (an edge). Select an edge to open its editor on the right, where you set: * **Join type:** | Type | Keeps | | --------- | ------------------------------------------------------------ | | **INNER** | matches only | | **LEFT** | all rows from the left table | | **FULL** | all rows from both sides | | **CROSS** | every left row combined with every right row (no conditions) | * **Join conditions** — one or more comparisons between the two tables’ columns. 
Add more with **+ Add condition**, combine them with **AND** / **OR**, and use the full set of operators (`=`, `<>`, `<`, `<=`, `>`, `>=`, `BETWEEN`, `IS NULL`, `IS NOT NULL`, `IN`, `NOT IN`, `LIKE`, `NOT LIKE`). * **Label (optional)** — name the join (for example, `customer-to-orders`) to make the diagram easier to read. The designer keeps the join graph as a tree — each table connects into the result through exactly one join, so there are no cycles or ambiguous paths, and a table can’t be joined to itself. To remove a join, select the edge and choose **Delete edge**. ### Filter a Table Before It Joins [Section titled “Filter a Table Before It Joins”](#filter-a-table-before-it-joins) Each source table has an **Inbound Filter** applied *before* the join — use it to cut a table down to the rows you care about (reference its columns as `alias.column`). This is separate from the **Post-Join Filter**, which runs *after* all the joins. ### Designer Toolbar [Section titled “Designer Toolbar”](#designer-toolbar) * **Tidy Layout** auto-arranges the tables and joins left-to-right by join order. * **Fit to View** scales and centers so the whole graph fits; you can also pan and zoom manually. * **Filter columns…** narrows the columns shown on the cards when a table has many. * **Undo** / **Redo** step backward and forward through your edits (Ctrl+Z / Ctrl+Y), and **History** opens a panel of every change since you opened the dialog, with **Restore**. * **Export** downloads the join diagram as an SVG image — handy for documentation or review. ## Output Columns [Section titled “Output Columns”](#output-columns) On the **Output Columns** tab, choose the columns the result table will contain. Pick from any joined table — use **Add selected**, **Add all source columns**, or **Pick from canvas…** to choose visually. For each column you can rename it, set its data type, and apply an aggregation. 
When two source columns share a name, PlaidCloud prefixes them with their source alias so they don’t collide. ## Post-Join Filter [Section titled “Post-Join Filter”](#post-join-filter) The **Post-Join Filter** tab applies an optional filter to the joined result before it’s written to the target table — the equivalent of a SQL `HAVING` clause. Reference result columns by name. ## Validation [Section titled “Validation”](#validation) PlaidCloud validates the join as you build it and again when the step runs. If something is wrong — an unsupported configuration, a cycle, a duplicate alias, or a column that no longer exists — the designer marks the offending table or join with a ⚠ marker and a message, and the save is rejected with the reason. Use **Jump to issue** to go straight to the first unresolved problem. Tip A `CROSS` join produces every combination of rows and has no conditions. On large tables this can be very expensive — use it deliberately. ## Run the Step [Section titled “Run the Step”](#run-the-step) The Multi-Table Join runs like any other step. It executes all the joins in one operation and writes the mapped columns to the target table. ## Next Steps [Section titled “Next Steps”](#next-steps) * [Multi-Table Join reference](/reference/workflow-steps/tables/table-multi-table-join/) — concise field list * [Table steps](/reference/workflow-steps/tables/) — the full set of table transforms * [Advanced workflows](/guides/workflows/advanced-workflows/) — build on the visual canvas # Orchestrate Alteryx Migrations With MCP > Use an MCP-connected AI agent to coordinate Alteryx portfolio migrations into PlaidCloud. PlaidCloud’s MCP server lets an AI agent coordinate a migration across many Alteryx workflows. The agent can help stage and organize files in Document, inventory workflow packages, call the Alteryx converter, organize generated workflows, run validation workflows, and summarize progress for the migration team. 
Note MCP is ideal for portfolio-scale coordination. The PlaidCloud importer still performs the conversion; the agent helps plan, sequence, run, and summarize the work. ## When To Use MCP [Section titled “When To Use MCP”](#when-to-use-mcp) Use MCP orchestration when you are migrating many workflows or when you want an AI agent to help with repeatable migration tasks: * Inventory Alteryx files staged in Document. * Upload, copy, move, rename, and organize migration files through Document tools. * Work from a connected shared storage account such as OneDrive, Google Drive, SharePoint, S3, Azure Blob, or SFTP. * Group workflows, apps, and macros into migration batches. * Convert each `.yxmd`, `.yxwz`, or `.yxmc` file into PlaidCloud. * Track conversion results across the portfolio. * Run converted workflows or validation workflows. * Produce a migration summary for review. For a single workflow, the import form is often the fastest path. For a portfolio, MCP gives the agent a structured way to coordinate the same PlaidCloud capabilities repeatedly. ## What The Agent Can Call [Section titled “What The Agent Can Call”](#what-the-agent-can-call) The MCP tool catalog includes an Alteryx conversion tool named `alteryx_convert`. It creates PlaidCloud workflows from Alteryx files stored in Document. The conversion call includes: * Source Document account and path for the Alteryx file. * Destination PlaidCloud project. * Destination Document account and path for conversion artifacts. * Optional workflow, step, and table prefixes. * Workflow type, with Advanced workflows as the default. The agent can combine this with the normal MCP project, workflow, Document, workflow-run, and table tools to manage the broader migration. ## Prepare Files With The Agent [Section titled “Prepare Files With The Agent”](#prepare-files-with-the-agent) An MCP-connected agent can help with the file preparation work around the conversion: * Find `.yxmd`, `.yxwz`, and `.yxmc` files in Document. 
* Identify likely input files, macro files, report assets, spatial sidecars, and expected outputs. * Create a clean migration folder structure. * Copy, move, or rename files into that structure. * Keep related workflow, macro, data, spatial, report, and validation files together. * Summarize the package before conversion. This is useful when a migration package contains many folders or when teams want the agent to produce a repeatable inventory before conversion begins. ## Use Shared Storage Without Uploading [Section titled “Use Shared Storage Without Uploading”](#use-shared-storage-without-uploading) You do not have to upload files into a new PlaidCloud-owned folder before migration. PlaidCloud Document can connect directly to shared storage that already contains the Alteryx migration package. Common options include: * [Google Drive](/guides/documents/adding-accounts/add-google-drive-account/) * [OneDrive or SharePoint](/guides/documents/adding-accounts/add-onedrive-account/) * [AWS S3](/guides/documents/adding-accounts/add-aws-s3-account/) * [Azure Blob Storage](/guides/documents/adding-accounts/add-azure-blob-storage-account/) * [SFTP](/guides/documents/adding-accounts/add-sftp-account/) After the Document account is connected, the agent can work from that Document account and path. This keeps the migration close to the customer’s existing shared storage and can eliminate a separate upload step. ## Stage Or Select Files In Document [Section titled “Stage Or Select Files In Document”](#stage-or-select-files-in-document) Before asking an agent to orchestrate the migration, choose one of these paths: * Connect an existing shared storage location as a Document account. * Upload workflow packages to a Document account. * Ask the agent to organize files already available in Document. Then confirm: 1. Workflows, apps, macros, input files, spatial sidecars, reports, and expected outputs are available through Document. 2. The destination project is selected. 3. 
The Document path for converted workflow dependencies and artifacts is selected. 4. The agent has permission to use the relevant MCP Document and workflow tools. The agent can then reference stable Document paths when it calls the converter. ## Suggested Agent Prompt [Section titled “Suggested Agent Prompt”](#suggested-agent-prompt) Use a prompt like this with an MCP-connected agent: ```text In PlaidCloud, migrate the Alteryx workflows in the connected Document account "Migration Share" under /q4-alteryx-package into the project "Q4 Migration". First inventory the .yxmd, .yxwz, and .yxmc files. Organize the package by workflow, macro, input, spatial, report, and expected-output files. Group macros with the workflows that call them. Then convert the workflows as Advanced workflows using the Document output path /q4-alteryx-converted. Prefix created workflows with "Q4 - ". After each conversion, summarize the generated workflow, macros, readiness notes, and next validation step. Ask before making mutating calls. ``` Adjust the project name, source path, destination path, and prefix for your migration batch. ## Recommended Orchestration Flow [Section titled “Recommended Orchestration Flow”](#recommended-orchestration-flow) For portfolio migrations, ask the agent to follow this flow: 1. Connect or select the Document account that contains the migration package. 2. Inventory the source Document folder. 3. Identify `.yxmd`, `.yxwz`, and `.yxmc` files. 4. Organize files into workflow, macro, input, spatial, report, and expected-output groups. 5. Group related workflows and macros. 6. Confirm the destination project and Document output path. 7. Convert macros and workflows into Advanced workflows. 8. Open or describe the generated workflows. 9. Run structural validation. 10. Run output parity validation where expected outputs are available. 11. Summarize converted workflows, generated macros, artifacts, and validation status. 
This gives the migration team one progress report across the portfolio while still using PlaidCloud’s native importer for each conversion. ## Review Mutating Calls [Section titled “Review Mutating Calls”](#review-mutating-calls) Most MCP clients show tool calls before they run. Review conversion and workflow-run calls before approving them, especially in production projects. For staging migrations, use a dedicated migration project and Document folder. After validation, promote the converted workflows into the production project. ## Track Results [Section titled “Track Results”](#track-results) Ask the agent to produce a migration table with: * Source Alteryx file. * Generated PlaidCloud workflow. * Generated macro workflows. * Conversion status. * Validation level. * Output Document path. * Notes for follow-up. This summary is useful for project reporting, handoff, and production readiness review. ## Related Guides [Section titled “Related Guides”](#related-guides) * [AI Agents (MCP)](/integrations/ai-coding-agents/) * [Getting Started with AI Coding Agents](/integrations/ai-coding-agents/getting-started/) * [Add Document Accounts](/guides/documents/adding-accounts/) * [Migrate Alteryx Workflows](/guides/workflows/migrate-alteryx-workflows/) * [Package Alteryx Dependencies](/guides/workflows/package-alteryx-dependencies/) * [Validate Converted Alteryx Workflows](/guides/workflows/validate-converted-alteryx-workflows/) # Package Alteryx Dependencies > Package files, folders, macros, spatial sidecars, and validation fixtures before importing Alteryx workflows into PlaidCloud. PlaidCloud imports Alteryx workflow files from Document and uses Document paths for dependencies and artifacts. You can upload files into Document, ask an MCP-connected agent to organize files for you, or connect Document directly to shared storage such as OneDrive, Google Drive, SharePoint, S3, Azure Blob, or SFTP. 
Note Use the Document path picker during import to choose exactly where PlaidCloud should read source files and store generated artifacts. ## Upload Or Connect Shared Storage [Section titled “Upload Or Connect Shared Storage”](#upload-or-connect-shared-storage) Choose the source location that best fits your migration: * Upload the package to a Document account. * Connect an existing shared storage location as a Document account. * Ask an MCP-connected agent to copy, move, rename, and organize files already available in Document. Shared storage connections can reduce migration prep because the team can keep files where they already collaborate: * [Google Drive](/guides/documents/adding-accounts/add-google-drive-account/) * [OneDrive or SharePoint](/guides/documents/adding-accounts/add-onedrive-account/) * [AWS S3](/guides/documents/adding-accounts/add-aws-s3-account/) * [Azure Blob Storage](/guides/documents/adding-accounts/add-azure-blob-storage-account/) * [SFTP](/guides/documents/adding-accounts/add-sftp-account/) ## Recommended Folder Shape [Section titled “Recommended Folder Shape”](#recommended-folder-shape) For each migration batch, use a folder shape like this before importing: ```text workflow-package/ workflows/ macros/ inputs/ expected-outputs/ reports/ spatial/ ``` This structure is optional, but it makes review and validation easier. ## Workflows And Macros [Section titled “Workflows And Macros”](#workflows-and-macros) Place workflow files and macros in predictable folders: * Put `.yxmd` and `.yxwz` files in `workflows/`. * Put `.yxmc` files in `macros/`. * Keep nested macros with the rest of the macro library. * Keep duplicate macro names out of the same package unless they are intentionally versioned. PlaidCloud uses macro source files to create PlaidCloud macro workflows and connect macro calls from converted workflows. 
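The placement rules above can be sketched as a small helper that a team might run over a file listing before upload. This is a minimal illustration, not a PlaidCloud tool: the function names, the `workflow-package` root, and the case-insensitive matching are assumptions made for the example.

```python
from collections import Counter
from pathlib import PurePosixPath
from typing import Iterable, List, Optional

# Folder rules from the list above: .yxmd/.yxwz -> workflows/, .yxmc -> macros/.
FOLDER_BY_SUFFIX = {".yxmd": "workflows", ".yxwz": "workflows", ".yxmc": "macros"}


def package_path(filename: str, package_root: str = "workflow-package") -> Optional[str]:
    """Return the suggested package location, or None for non-Alteryx files."""
    path = PurePosixPath(filename)
    folder = FOLDER_BY_SUFFIX.get(path.suffix.lower())
    if folder is None:
        return None
    return str(PurePosixPath(package_root) / folder / path.name)


def duplicate_macro_names(filenames: Iterable[str]) -> List[str]:
    """List macro file names that appear more than once (compared case-insensitively)."""
    counts = Counter(
        PurePosixPath(f).name.lower()
        for f in filenames
        if PurePosixPath(f).suffix.lower() == ".yxmc"
    )
    return sorted(name for name, n in counts.items() if n > 1)
```

Any name reported by `duplicate_macro_names` is a prompt to check whether the copies are intentionally versioned before they go into the same package.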
## Input Files [Section titled “Input Files”](#input-files) Put input data under `inputs/`: * CSV, TSV, fixed-width, JSON, XML, Excel, Access, YXDB, SAS, SPSS, Stata, Avro, Parquet, and HDF files. * Lookup files used by formulas, dynamic replace, fuzzy matching, or joins. * Folders referenced by Directory and Dynamic Input tools. When PlaidCloud imports the package, converted steps point to the selected Document locations. ## Spatial Files [Section titled “Spatial Files”](#spatial-files) Put spatial files under `spatial/` and keep sidecars together: * Shapefiles: include `.shp`, `.shx`, `.dbf`, and `.prj`. * MapInfo files: include `.tab`, `.map`, `.id`, and `.dat`. * Include projection files and lookup layers when present. * Keep KML, GeoJSON, and other spatial inputs in the same dependency package. Do not split spatial sidecars across folders. PlaidCloud needs the full group to materialize geometry correctly. ## Report And Artifact Assets [Section titled “Report And Artifact Assets”](#report-and-artifact-assets) Put report assets under `reports/`: * Images used by report composer tools. * PDF inputs. * HTML fragments or templates. * Map layers used by report maps. * Example generated reports when artifact validation is required. Converted report steps create PlaidCloud artifacts that can be reviewed from Document and workflow run history. ## Expected Outputs [Section titled “Expected Outputs”](#expected-outputs) Put validation fixtures under `expected-outputs/`: * Trusted Alteryx output tables. * Expected reports, PDFs, maps, images, or charts. * Notes about accepted row ordering, numeric precision, and date handling. Expected outputs are optional for structural validation and are the best evidence for output parity validation. ## External Connections [Section titled “External Connections”](#external-connections) For database, API, and cloud-source workflows, capture: 1. Source system name. 2. Connection type. 3. Database, schema, table, endpoint, or bucket names. 
4. Credential owner or secret name. 5. Snapshot date when output parity depends on a point-in-time source. Use PlaidCloud connections or credentials for repeatable production runs. ## Related Guides [Section titled “Related Guides”](#related-guides) * [Alteryx Migration Readiness Checklist](/guides/workflows/alteryx-migration-readiness-checklist/) * [Migrate Alteryx Workflows](/guides/workflows/migrate-alteryx-workflows/) * [Orchestrate Alteryx Migrations With MCP](/guides/workflows/orchestrate-alteryx-migrations-with-mcp/) * [Tune Alteryx Imports](/guides/workflows/troubleshoot-alteryx-imports/) # REST Request Step > Call any REST API from a workflow — import an endpoint from Postman, OpenAPI, or HAR, send a test request, and capture the response to a table or variables. ## Description [Section titled “Description”](#description) The **REST Request** step calls a REST API as one step in a workflow. It’s a single, Postman-style step for the whole “build → send → inspect” loop: pick an endpoint (typed in by hand or imported from a Postman collection, an OpenAPI/Swagger spec, or a HAR capture), set headers, query parameters, and a body, fire a **test request** to see the live response, then save the step. You choose where the response goes. By default it’s parsed into a **table**. Alternatively, the step can capture the response into **workflow variables** so a later step can branch on the status code or read a value out of the body. Note The REST Request step replaces the older REST-flavored steps with one unified step. Use it for new work; existing REST steps keep running. ## Before You Start [Section titled “Before You Start”](#before-you-start) You’ll usually want a **connection** of the REST kind, which holds the base URL and authentication for the service. The step can also call a fully-qualified URL directly without a connection. See [REST connections](/reference/connectors/rest/). 
## Add a REST Request Step [Section titled “Add a REST Request Step”](#add-a-rest-request-step) 1. Open the workflow and add a step where you want it. 2. Choose **REST Request (unified)** as the step type (under the REST steps). The editor opens with a **Request** tab and a **Response** tab. ## Build the Request [Section titled “Build the Request”](#build-the-request) In the **Request** tab: ### Endpoint Source [Section titled “Endpoint Source”](#endpoint-source) Pick how you’ll supply the endpoint, then PlaidCloud fills in the request for you: * **Manual** — type the method and URL yourself. * **Postman collection (file)** / **Postman collection (URL)** — import from a Postman collection. * **OpenAPI / Swagger (URL)** / **OpenAPI / Swagger (file)** — import from an API spec. * **HAR archive (file)** — import a request captured from your browser’s network log. For an imported source, choose **Load Catalog** to list the available endpoints, filter to the one you want, and the step prefills the method, URL, headers, and parameters. ### Connection, Method, and Endpoint [Section titled “Connection, Method, and Endpoint”](#connection-method-and-endpoint) * **Connection** — the REST connection to authenticate with. Leave it unset to call a fully-qualified URL directly. * **Method** — `GET`, `POST`, `PUT`, `PATCH`, `DELETE`, or `HEAD`. * **Endpoint** — the path (when a connection is set) or a full URL. ### Headers, Query Parameters, and Body [Section titled “Headers, Query Parameters, and Body”](#headers-query-parameters-and-body) * **Headers** and **Query Parameters** are editable tables of name/value rows, each with an **On** toggle and a description. Add rows with **Add** and remove them with **Remove**. * **Body** holds the request payload (JSON, form, or raw). Anywhere in the endpoint, headers, query values, or body you can reference workflow variables with `${...}` — they’re substituted at run time. 
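To make the run-time substitution concrete, here is a minimal sketch that uses Python's standard `string.Template`, which happens to share the `${name}` placeholder syntax. It only mimics the idea; how PlaidCloud itself handles escaping or missing variables is not described here, and the variable names (`region`, `run_id`, `start_date`) and the `render_request` helper are hypothetical.

```python
from string import Template
from typing import Dict


def render(text: str, variables: Dict[str, str]) -> str:
    """Replace ${name} placeholders; raises KeyError if a variable is missing."""
    return Template(text).substitute(variables)


def render_request(request: dict, variables: Dict[str, str]) -> dict:
    """Apply substitution to the endpoint, header values, query values, and body."""
    return {
        "endpoint": render(request["endpoint"], variables),
        "headers": {k: render(v, variables) for k, v in request["headers"].items()},
        "query": {k: render(v, variables) for k, v in request["query"].items()},
        "body": render(request["body"], variables),
    }


example_request = {
    "endpoint": "/v1/orders/${region}",
    "headers": {"X-Run-Id": "${run_id}"},
    "query": {"since": "${start_date}"},
    "body": '{"region": "${region}"}',
}
example_variables = {"region": "emea", "run_id": "42", "start_date": "2024-01-01"}
resolved = render_request(example_request, example_variables)
```

In this sketch a missing variable fails loudly instead of sending a literal `${...}` to the API, which is one reasonable design choice for a pre-flight check outside PlaidCloud.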
### Test the Request [Section titled “Test the Request”](#test-the-request) * **Send Test Request** fires the request with your current settings and shows the live response inline, so you can confirm the call works before saving. * **Copy as curl** copies an equivalent curl command to the clipboard for sharing or debugging (with secrets masked). ## Route the Response [Section titled “Route the Response”](#route-the-response) In the **Response** tab, set the **Response Destination**: ### Table (default) [Section titled “Table (default)”](#table-default) The response is parsed into the step’s target table. The **Pagination & Parsing** group controls how: * **Row format** and **Items path** — where the rows live inside the response. * **Pagination mode**, **Mode params**, and **Mode paths** — how to follow multiple pages of results. * **Dump raw JSON instead of parsing rows** — store the raw payload rather than parsing it into columns. * **Retries** and **Timeout (s)** — how the request behaves under failure and how long it may run. ### Workflow variables [Section titled “Workflow variables”](#workflow-variables) Set **Destination** to **Workflow variables** and give a **Variable prefix** (a plain name such as `my_call`). The step fires **one** request and writes four workflow variables: | Variable | Contents | | --------------------- | ------------------------------------------------------------------- | | `{prefix}_status` | HTTP status code (for example, `200`). | | `{prefix}_body` | Response body as a string (JSON responses are stored as JSON text). | | `{prefix}_headers` | Response headers as JSON text. | | `{prefix}_elapsed_ms` | How long the request took, in milliseconds. | Reference them downstream like any workflow variable — for example `{my_call_status}`. This is the building block for “fire one request, then branch on the result” patterns. Caution The prefix must start with a letter or underscore and contain only letters, digits, and underscores. 
It can’t be one of the reserved names `cloud`, `project`, `model`, or `date`. In variable mode the step captures a single response — pagination and row parsing don’t apply. ## Run the Step [Section titled “Run the Step”](#run-the-step) The REST Request step runs like any other — as part of a full run or on its own (see [Running one step in a workflow](/guides/workflows/running-one-step-in-a-workflow/)). In table mode it writes the parsed (or raw) response to the target table; in variable mode it sets the four variables for later steps to read. ## Next Steps [Section titled “Next Steps”](#next-steps) * [REST Request step reference](/reference/workflow-steps/general/rest-request/) — concise field list * [Manage workflow variables](/guides/workflows/manage-workflow-variables/) — read and set variables across steps * [REST connections](/reference/connectors/rest/) — set up authenticated API connections # Run a workflow > Run a PlaidCloud workflow manually or on demand to execute all enabled steps in sequence for data processing and transformation. You can trigger a full workflow run by either clicking on the run icon from the **Workflows** hierarchy or by selecting **Run All** from the **Actions** menu within a specific workflow. You can also click on the **Toggle Start/Stop** button at the top of the workflow table. This toggle button will stop a running workflow or start a workflow. # Running a range of steps in a workflow > Run a specific range of steps within a PlaidCloud workflow to selectively execute portions of your data processing pipeline. While running individual steps is useful, it also may be useful to run subsets of an entire workflow for development, testing, or troubleshooting. To run a subset of steps, select all the steps you would like to run and select **Run Selected** from the **Actions** menu at the top of the workflow steps hierarchy. 
This triggers normal workflow processing, but the run starts at the first selected step and stops once the last selected step is complete. # Running one step in a workflow > Run a single step within a PlaidCloud workflow to test, debug, or selectively execute individual data processing operations. During initial workflow development, testing, or troubleshooting, it is often useful to run steps individually. To run a single step in isolation, right-click the step and select **Run Step** from the context menu. # Skip steps in a workflow > Skip specific steps in a PlaidCloud workflow to bypass operations during testing, debugging, or selective processing runs. Steps in a workflow can be set to skip during a workflow run. This is useful for debugging steps or old steps that you are not yet ready to remove from the workflow. There are two ways to skip a step: * Edit the step form * Uncheck the enabled checkbox in the workflow hierarchy To edit the step form, click the step’s edit option (the pencil icon in the workflow table) to open the edit form, then uncheck the enabled checkbox. After you save the updated step, it no longer runs as part of the workflow, but it can still be run using the single-step run process. Disabled steps show a disabled indicator in the workflow steps hierarchy table. # Conditional Step Execution > Configure conditional execution for PlaidCloud workflow steps to control which steps run based on variable values and logic. ## Overview [Section titled “Overview”](#overview) Workflow steps normally execute in the order defined for the workflow. However, it is often useful to have certain steps execute only if predefined conditions are met.
By using the step conditions capability, you can control execution based on the following options:

* Variable values
* Table has rows or is empty
* A document or folder exists in Document
* A document or folder is missing in Document
* Table query result
* Date and time conditions are met

For variable or table query result comparisons, you can use the following comparison types:

* Equal
* Does not equal
* Contains
* Does not contain
* Starts with
* Ends with
* Greater than
* Less than
* Greater than or equal
* Less than or equal

You can also define multiple conditions that must all be met for the step to execute. This gives you precise control over when a step runs.

## Adding and Controlling Conditions [Section titled “Adding and Controlling Conditions”](#adding-and-controlling-conditions)

To activate and add conditions on a step:

1. Find the step you want to add a condition to.
2. Click the **Edit Step Details** (pencil) icon.
3. Select the **Conditions** tab.
4. Check the **Check Conditions Before Running** checkbox to enable the dialog and add conditions.
5. In the **Condition Checks** section on the left, select “+” to add a new condition.
6. Add a condition from the tabbed section on the right.
7. Repeat steps 5 and 6 as needed to add all your conditions.

## Managing Conditions [Section titled “Managing Conditions”](#managing-conditions)

You can add as many conditions as necessary in the **Condition Checks** section. As you add them, give each one a descriptive name so you can find it easily later.

Once you add a condition, select it on the left and its evaluation criteria become editable on the right.

## Variable Conditions [Section titled “Variable Conditions”](#variable-conditions)

When checking variable conditions, the **Value Check Parameters** section must be completed so a comparison can be made. In the **Variable or Table Field**, fill in the variable name.
Select a comparison type and enter a comparison value.

## Basic Table Conditions [Section titled “Basic Table Conditions”](#basic-table-conditions)

If the condition checks whether a table has rows or is empty, you also need to define the table in the **Table Data Selection** tab.

## Advanced Table Conditions [Section titled “Advanced Table Conditions”](#advanced-table-conditions)

When using Advanced Table conditions, the **Value Check Parameters** section must be completed so a comparison can be made. In the **Variable or Table Field**, fill in the field name from the table selection. Select a comparison type and enter a comparison value.

In the **Table Data Selection** tab, select the table and complete the data mapping section with at least the field referenced for the condition comparison.

## Document Path Conditions [Section titled “Document Path Conditions”](#document-path-conditions)

If the condition checks whether a document or folder exists, pick the Document account and specify the document path to check in the **Document Path** tab.

## Date and Time Conditions [Section titled “Date and Time Conditions”](#date-and-time-conditions)

For date or time selections, you can add multiple conditions if a combination is necessary. For example, if you want a step to run only on Mondays at 2:05 a.m., create three conditions:

* Day of the week set to Monday (1)
* Hour of the day set to 2
* Minute of the hour set to 5

For “Use Financial Close Workday”, enter the day of the month on which your close happens. For example, if your close happens on the 5th day of the month, enter “5”.

# Tune Alteryx Imports

> Tune Alteryx migrations with dependency completion, macro resolution, variables, validation comparisons, and managed executors.

PlaidCloud imports Alteryx workflows into Advanced workflows and reports conversion details during import.
Use this guide to quickly complete dependency packages, review generated macros, tune variables, and interpret validation comparisons. Note Start with the conversion summary. It highlights dependency, macro, variable, executor, and validation items before you need to inspect individual steps. ## Complete Input Files [Section titled “Complete Input Files”](#complete-input-files) Symptoms: * A converted input step cannot find a file. * Dynamic input resolves to no files. * Validation row counts are lower than expected. Actions: 1. Confirm the file was included in the import package. 2. Confirm the file was uploaded to the selected Document path. 3. Confirm dynamic file patterns still match after upload. 4. Re-import or update the converted step to use the correct Document path. ## Complete Spatial Sidecars [Section titled “Complete Spatial Sidecars”](#complete-spatial-sidecars) Symptoms: * Spatial input needs a companion file. * Geometry fields are empty. * Spatial output differs from the expected result. Actions: 1. Confirm all shapefile or MapInfo sidecars were included. 2. Keep sidecars in the same Document folder. 3. Confirm projection files are present when the workflow depends on coordinate reference behavior. 4. Rerun the workflow after correcting the package. ## Complete Or Clarify Macros [Section titled “Complete Or Clarify Macros”](#complete-or-clarify-macros) Symptoms: * A macro call cannot resolve. * Macro inputs or outputs are not connected. * A workflow imports structurally and a macro output needs to be connected. Actions: 1. Include the `.yxmc` file for each referenced macro. 2. Include nested macros called by those macros. 3. Avoid duplicate macro names unless the package intentionally includes versioned macros. 4. Confirm the generated PlaidCloud macro has macro input and macro output steps. ## Workflow Variable Issues [Section titled “Workflow Variable Issues”](#workflow-variable-issues) Symptoms: * A converted app input is unset. 
* A condition fires unexpectedly. * A file or folder variable points to a desktop path. Actions: 1. Review workflow variables before running. 2. Set required values. 3. Replace desktop file paths with Document file or folder paths. 4. Confirm controlled choices match the expected app selection. 5. Rerun the workflow. ## Validation Differences [Section titled “Validation Differences”](#validation-differences) Symptoms: * Schema differs. * Row count differs. * Row values differ. * Artifact output has a difference to review. Actions: 1. Confirm the same input data was used in both runs. 2. Confirm variable values match. 3. Confirm source systems are from the same snapshot date. 4. Review null handling, date handling, numeric precision, and string collation. 5. For spatial outputs, review geometry format and coordinate reference behavior. 6. For artifacts, confirm the intended cloud-native output and compare the business content. ## Executor Notes [Section titled “Executor Notes”](#executor-notes) Specialized operations such as fuzzy matching, spatial processing, machine learning, PDF extraction, OCR, NLP, and reporting can run through managed executors. If an executor reports a note: 1. Open the workflow run details. 2. Review the step note. 3. Confirm required inputs and parameters are present. 4. Confirm expected output fixtures are available if parity validation is required. 5. Rerun after correcting inputs or settings. ## When To Re-Import [Section titled “When To Re-Import”](#when-to-re-import) Re-import when: * Important files or macros were added after the original package. * The wrong Document path was selected. * A newer workflow version should replace the imported version. * The workflow package needs a cleaner dependency layout. If only a variable value, credential, or Document path changed, updating the converted workflow may be enough. 
## Related Guides [Section titled “Related Guides”](#related-guides) * [Package Alteryx Dependencies](/guides/workflows/package-alteryx-dependencies/) * [Validate Converted Alteryx Workflows](/guides/workflows/validate-converted-alteryx-workflows/) * [Alteryx Conversion Matrix](/reference/alteryx-conversion-matrix/) # Use Converted Alteryx Apps > Run converted Alteryx analytic apps in PlaidCloud with controlled workflow variables and repeatable inputs. PlaidCloud converts Alteryx analytic app questions into controlled workflow variables. Users can set those variables before a run, then PlaidCloud applies the values to formulas, filters, file paths, conditions, macro parameters, and downstream step settings. Note Converted app inputs are designed for repeatable cloud runs. Use saved variable values for scheduled workflows and controlled user input for interactive runs. ## Converted Input Types [Section titled “Converted Input Types”](#converted-input-types) PlaidCloud converts common Alteryx app controls to typed workflow inputs: * Text boxes become text variables. * Numeric controls become numeric variables. * Date controls become ISO date variables. * Check boxes, radio buttons, drop-downs, list boxes, and trees become controlled choice variables. * File browse controls become Document file variables. * Folder browse controls become Document folder variables. The converted workflow uses these variables anywhere the Alteryx app used the original question value. ## Set Values Before A Run [Section titled “Set Values Before A Run”](#set-values-before-a-run) Before running a converted app workflow: 1. Open the converted workflow. 2. Review workflow variables. 3. Set required text, numeric, date, choice, file, and folder values. 4. Confirm Document file and folder paths point to the imported dependency location or another approved Document path. 5. Run the workflow. For scheduled runs, save the variable values that should be used each time the schedule runs. 
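As a rough illustration of the pre-run review described above, the sketch below checks a set of variable values against the converted input types (text, numeric, ISO date, controlled choice, Document path). It is plain Python, not a PlaidCloud API: the `DECLARED` structure, the type labels, and `check_values` are hypothetical names chosen for this example.

```python
from datetime import date

# Hypothetical declarations for a converted app. The variable names and
# type labels are illustrative and are not PlaidCloud identifiers.
DECLARED = {
    "region": {"type": "choice", "choices": ["EMEA", "AMER", "APAC"]},
    "threshold": {"type": "numeric"},
    "as_of": {"type": "date"},
    "input_file": {"type": "document_path"},
}


def check_values(declared, values):
    """Return a list of problems found in the supplied variable values."""
    problems = []
    for name, spec in declared.items():
        if name not in values or values[name] in ("", None):
            problems.append(f"{name}: required value is not set")
            continue
        value = values[name]
        kind = spec["type"]
        if kind == "numeric":
            # bool is a subclass of int, so exclude it explicitly.
            if isinstance(value, bool) or not isinstance(value, (int, float)):
                problems.append(f"{name}: expected a number")
        elif kind == "date":
            try:
                date.fromisoformat(value)
            except (TypeError, ValueError):
                problems.append(f"{name}: expected an ISO date such as 2024-01-31")
        elif kind == "choice":
            if value not in spec["choices"]:
                problems.append(f"{name}: {value!r} is not an allowed choice")
        elif kind == "document_path":
            text = str(value)
            # A drive letter or UNC prefix suggests a leftover desktop path.
            if ":\\" in text or text.startswith("\\\\"):
                problems.append(f"{name}: looks like a desktop path, not a Document path")
    return problems
```

Calling `check_values(DECLARED, {...})` with the values you intend to save for a scheduled run returns an empty list when every required value is present and has the expected shape.
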
## Conditions, Warnings, And Errors [Section titled “Conditions, Warnings, And Errors”](#conditions-warnings-and-errors) Alteryx app conditions and error checks convert to PlaidCloud step conditions. A condition can: * Allow the workflow to continue. * Emit a warning message. * Stop the workflow with a clear error when the app rule calls for it. * Route execution through a different branch. Use this behavior to preserve app-level validation while running the workflow in PlaidCloud. ## File And Folder Inputs [Section titled “File And Folder Inputs”](#file-and-folder-inputs) File and folder questions use Document paths: * File inputs select a file from Document. * Folder inputs select a Document folder. * Imported desktop dependencies are stored at the Document path selected during import. * Dynamic input steps can use variables to resolve the final Document path at runtime. This keeps converted apps portable across users and scheduled runs. ## Macro Parameters [Section titled “Macro Parameters”](#macro-parameters) When a converted app calls a macro, app variables can feed macro parameters. PlaidCloud passes those values into the generated macro workflow through macro input and control parameter handling. Concurrent macro runs are isolated, so multiple workflow runs can use the same converted macro without sharing intermediate state. ## Validation Checklist [Section titled “Validation Checklist”](#validation-checklist) For each converted app, confirm: * Every expected question appears as a workflow variable. * Defaults match the original app where defaults were configured. * Controlled lists contain the expected choices. * File and folder variables point to Document paths. * Conditions produce the expected messages, warnings, or stop behavior. * Output parity passes for the selected variable values. 
## Related Guides [Section titled “Related Guides”](#related-guides) * [Migrate Alteryx Workflows](/guides/workflows/migrate-alteryx-workflows/) * [Manage Workflow Variables](/guides/workflows/manage-workflow-variables/) * [Validate Converted Alteryx Workflows](/guides/workflows/validate-converted-alteryx-workflows/) # Validate Alteryx Reports And Artifacts > Validate converted Alteryx reports, PDFs, images, maps, charts, dashboards, and model artifacts in PlaidCloud. Some Alteryx workflows create files and visual outputs rather than only tables. PlaidCloud converts these steps into cloud-native artifacts that can be stored in Document, reviewed from workflow run history, and used by downstream processes. Note Artifact validation focuses on the business content and downstream usability of the generated output. PlaidCloud produces cloud-native artifacts designed for sharing, scheduling, and repeatable review. ## Artifact Types [Section titled “Artifact Types”](#artifact-types) Converted workflows can produce or use: * PDFs. * Images. * Charts. * Maps. * HTML or report fragments. * Report tables and layouts. * Dashboards or insight-style review artifacts. * Model, NLP, OCR, or text-analysis outputs. PlaidCloud stores generated files in Document when the converted workflow writes an artifact. ## Validate Report Content [Section titled “Validate Report Content”](#validate-report-content) For reports and PDFs, confirm: 1. The file was created in the expected Document path. 2. The report contains the expected tables, labels, sections, and values. 3. Images and logos appear where expected. 4. Page-level layout is acceptable for the business use. 5. Downstream users can open or distribute the file. When exact layout is important, compare the PlaidCloud artifact to a trusted Alteryx output. ## Validate Charts And Dashboards [Section titled “Validate Charts And Dashboards”](#validate-charts-and-dashboards) For charts and dashboard-style outputs, confirm: 1. 
The source data matches expected schema, row count, and values. 2. Measures, dimensions, and labels are correct. 3. Filters or parameters were applied correctly. 4. The generated chart or dashboard supports the intended review workflow. PlaidCloud may create a cloud-native visualization rather than reproducing an Alteryx desktop-specific renderer. ## Validate Maps [Section titled “Validate Maps”](#validate-maps) For map outputs, confirm: 1. Spatial inputs loaded successfully. 2. Coordinate reference behavior is acceptable. 3. Map layers contain the expected records. 4. Labels, boundaries, and geometry are correct for the business use. 5. Any intentional cloud-native artifact difference is documented. For data-critical map workflows, also validate the geometry table behind the artifact. ## Validate OCR, PDF, And NLP Outputs [Section titled “Validate OCR, PDF, And NLP Outputs”](#validate-ocr-pdf-and-nlp-outputs) For extraction and text workflows, confirm: 1. The expected files were processed. 2. Extracted text, tables, scores, topics, or classifications are present. 3. Output tables match expected schema and row counts. 4. Values match expected outputs within agreed tolerance. 5. Executor notes were reviewed. ## Acceptance Record [Section titled “Acceptance Record”](#acceptance-record) For production migrations, keep a validation record with: * Workflow name. * Run date. * Document output path. * Expected output source. * Validation level. * Accepted cloud-native artifact differences. * Approver. 
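One possible way to keep the acceptance record consistent across workflows is to store it as structured data. The sketch below is only an illustration: the `AcceptanceRecord` class and all example values are hypothetical, and the field names simply mirror the list above.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class AcceptanceRecord:
    """One validation record; the fields mirror the acceptance record list."""
    workflow_name: str
    run_date: str                  # ISO date of the validated run
    document_output_path: str
    expected_output_source: str
    validation_level: str          # for example "structural" or "output parity"
    approver: str
    accepted_differences: list = field(default_factory=list)


# Illustrative values only; none of these come from a real migration.
record = AcceptanceRecord(
    workflow_name="Monthly Sales Report",
    run_date="2024-01-31",
    document_output_path="reports/monthly_sales.pdf",
    expected_output_source="Trusted Alteryx output for the same period",
    validation_level="output parity",
    approver="Finance systems lead",
    accepted_differences=["Chart produced by a cloud-native renderer"],
)

# Serialize the record so it can be filed with the migration documentation.
print(json.dumps(asdict(record), indent=2))
```
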
## Related Guides [Section titled “Related Guides”](#related-guides) * [Validate Converted Alteryx Workflows](/guides/workflows/validate-converted-alteryx-workflows/) * [Migrate Spatial Alteryx Workflows](/guides/workflows/migrate-spatial-alteryx-workflows/) * [Alteryx Conversion Matrix](/reference/alteryx-conversion-matrix/) # Validate Converted Alteryx Workflows > Validate converted Alteryx workflows with structural checks, output parity checks, macro validation, and artifact review. Validation confirms that a converted Alteryx workflow is ready to run in PlaidCloud and, when expected outputs are available, produces the same business results. Use this guide after importing a workflow, app, or macro into PlaidCloud. Note Output parity means matching schema, row count, and row values. Row order does not need to match unless the workflow uses ordering as part of the business logic. ## Choose A Validation Level [Section titled “Choose A Validation Level”](#choose-a-validation-level) Most migration programs use two validation levels. ### Structural Validation [Section titled “Structural Validation”](#structural-validation) Structural validation confirms that PlaidCloud created a complete workflow from the Alteryx design. Use this level when you are measuring migration readiness, preparing a portfolio inventory, or converting workflows before expected outputs are available. Check that: 1. The converted workflow opens in the Visual Workflow Designer. 2. The workflow graph contains the expected branches, joins, macros, and outputs. 3. All required input files were uploaded to the selected Document path. 4. All required macros were imported or generated. 5. Macro inputs and macro outputs are connected. 6. Analytic app questions became workflow variables with controlled input fields. 7. The workflow run completes and reports clear readiness notes for data-dependent conditions. 
### Output Parity Validation [Section titled “Output Parity Validation”](#output-parity-validation) Output parity validation confirms that the converted workflow produces the same tabular results as the Alteryx workflow. Compare: 1. Output table schema. 2. Row count. 3. Row values. 4. Null handling. 5. Numeric precision and rounding. 6. Date and time values. 7. Geometry values when spatial outputs are part of the result. Ignore row order unless the workflow explicitly sorts data or the downstream process depends on ordered rows. ## Validate Inputs [Section titled “Validate Inputs”](#validate-inputs) Before comparing outputs, confirm that the converted workflow uses the same source data: 1. Open the Document folder selected during import. 2. Confirm that required files and sidecar files are present. 3. Confirm that dynamic input patterns resolve to the intended files. 4. Confirm that database or API credentials are available in the target environment. 5. Confirm that workflow variables match the values used in the Alteryx run. Aligned inputs make validation faster and keep comparisons focused on workflow behavior. ## Validate Macros [Section titled “Validate Macros”](#validate-macros) Converted Alteryx macros become PlaidCloud macro workflows. Validate the macro and the calling workflow together: 1. Open the generated macro workflow. 2. Confirm that each macro input has a matching macro input step. 3. Confirm that each macro output has a matching macro output step. 4. Run a workflow that calls the macro. 5. Confirm that repeated or concurrent runs remain isolated from one another. 6. Compare the macro output data when expected macro fixtures are available. ## Validate Analytic Apps [Section titled “Validate Analytic Apps”](#validate-analytic-apps) Converted Alteryx analytic apps use workflow variables for controlled user input. 
Check that: * Text boxes, numeric inputs, dates, file pickers, folder pickers, lists, trees, radio buttons, check boxes, and drop-downs expose the expected input controls. * Default values match the original app where defaults were present. * Required values are set before running the workflow. * Conditions produce the expected messages, warnings, or stop behavior for the configured input values. ## Validate Specialized Outputs [Section titled “Validate Specialized Outputs”](#validate-specialized-outputs) Some converted workflows create artifacts instead of only tables. Validate those outputs according to how they are used. For reports, PDFs, images, charts, and maps: 1. Confirm that the artifact was created in the expected Document path. 2. Confirm that the artifact contains the expected data, labels, images, and layout. 3. Confirm that downstream consumers can open or use the artifact. 4. Compare to an expected artifact when one is available. For machine learning, NLP, fuzzy matching, and spatial workflows: 1. Confirm that the managed job executor completed successfully. 2. Review executor notes. 3. Compare generated tables or scores to expected outputs. 4. Document any cloud-native artifact differences when PlaidCloud output intentionally differs from an Alteryx desktop-specific output. ## Resolve Validation Differences [Section titled “Resolve Validation Differences”](#resolve-validation-differences) When validation finds a difference, review items in this order: 1. Missing or different input files. 2. Different workflow variable values. 3. Different database snapshots or API responses. 4. Date, time zone, null, rounding, or string-collation differences. 5. Spatial reference or geometry format differences. 6. Cloud-native artifact differences for reports, maps, proprietary formats, or desktop-only renderers. After correcting the cause, rerun the workflow and repeat the comparison. 
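When tracking down a difference, it can help to run an order-insensitive comparison on exported copies of the two outputs. The following is a minimal sketch in plain Python, assuming each table has been loaded as a list of dictionaries. `compare_outputs` and `normalise` are hypothetical helpers, the schema check covers column names only (not data types), and floats are rounded to six places as an example tolerance.

```python
from collections import Counter


def normalise(value, places=6):
    """Round floats so tiny precision differences do not count as mismatches."""
    return round(value, places) if isinstance(value, float) else value


def compare_outputs(expected_rows, actual_rows):
    """Compare two tables (lists of dicts), ignoring row order.

    Returns an empty dict when column names, row count, and row values match.
    """
    expected_cols = sorted({col for row in expected_rows for col in row})
    actual_cols = sorted({col for row in actual_rows for col in row})
    if expected_cols != actual_cols:
        return {"schema": (expected_cols, actual_cols)}

    findings = {}
    if len(expected_rows) != len(actual_rows):
        findings["row_count"] = (len(expected_rows), len(actual_rows))

    def as_key(row):
        # One hashable tuple per row, with columns in a fixed order.
        return tuple(normalise(row.get(col)) for col in expected_cols)

    expected_counts = Counter(as_key(row) for row in expected_rows)
    actual_counts = Counter(as_key(row) for row in actual_rows)
    missing = expected_counts - actual_counts  # rows only in the expected output
    extra = actual_counts - expected_counts    # rows only in the converted output
    if missing or extra:
        findings["row_values"] = {"missing": dict(missing), "extra": dict(extra)}
    return findings
```

Using a `Counter` rather than a set keeps duplicate rows significant, so a converted workflow that drops or repeats a row is still reported.
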
## Promote A Validated Workflow [Section titled “Promote A Validated Workflow”](#promote-a-validated-workflow) When validation passes: 1. Move or copy the workflow to the production project if migration was done in a staging project. 2. Confirm production Document paths and credentials. 3. Schedule the workflow. 4. Monitor the first production runs in run history. 5. Keep the validation results with the migration record for audit and support. ## Related Guides [Section titled “Related Guides”](#related-guides) * [Migrate Alteryx Workflows](/guides/workflows/migrate-alteryx-workflows/) * [Alteryx Migration Readiness Checklist](/guides/workflows/alteryx-migration-readiness-checklist/) * [Package Alteryx Dependencies](/guides/workflows/package-alteryx-dependencies/) * [Use Converted Alteryx Apps](/guides/workflows/use-converted-alteryx-apps/) * [Orchestrate Alteryx Migrations With MCP](/guides/workflows/orchestrate-alteryx-migrations-with-mcp/) * [Validate Alteryx Reports And Artifacts](/guides/workflows/validate-alteryx-reports-and-artifacts/) * [Alteryx Conversion Matrix](/reference/alteryx-conversion-matrix/) * [Create A Macro](/guides/workflows/create-a-macro/) * [Run A Workflow](/guides/workflows/run-a-workflow/) # View a dependency audit > View dependency audit information for PlaidCloud workflows to understand data lineage and step-to-step dependencies in detail. The **Workflow Dependency Audit** is a very helpful tool to understand data and workflow dependencies in complex interconnected workflows. Over time, as workflow processes become more complex, it may become challenging to ensure all dependencies are in the correct order. When data already exists in tables, steps will run and appear correct in many cases but may actually have a dependency issue if the data is populated out of order. This tool will provide a dependency audit and identify issues with data dependency relationships. 
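To make the out-of-order problem concrete, the sketch below flags steps that read a table before the step that first writes it. It is a simplified illustration in plain Python and is not how the Workflow Dependency Audit is implemented. The step structure (`name`, `reads`, `writes`) and `find_order_issues` are assumptions made for this example.

```python
def find_order_issues(steps):
    """Flag steps that read a table before the step that first writes it.

    `steps` is a list of dicts in execution order, each with a name,
    the tables it reads, and the tables it writes.
    """
    first_writer = {}
    for position, step in enumerate(steps):
        for table in step["writes"]:
            first_writer.setdefault(table, position)

    issues = []
    for position, step in enumerate(steps):
        for table in step["reads"]:
            writer = first_writer.get(table)
            # A table that no step writes is treated as an external source.
            if writer is not None and writer > position:
                issues.append((step["name"], table, steps[writer]["name"]))
    return issues


# Hypothetical workflow: "Summarize" runs before the step that fills its input.
workflow = [
    {"name": "Import sales", "reads": [], "writes": ["sales_raw"]},
    {"name": "Summarize", "reads": ["sales_clean"], "writes": ["sales_summary"]},
    {"name": "Clean sales", "reads": ["sales_raw"], "writes": ["sales_clean"]},
]
issues = find_order_issues(workflow)
```

In this example `issues` contains one entry naming `Summarize`, the table `sales_clean`, and the later step `Clean sales`. If `sales_clean` already held data from an earlier run, `Summarize` would still complete, which is why this kind of issue is easy to miss.
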
# View Workflow Report

> View PlaidCloud workflow reports to review execution summaries, step completion status, timing, and processing statistics.

Maintaining detailed documentation to support both statutory and management requirements is challenging when projects and workflows are dynamic. To help solve this problem, PlaidCloud provides a workflow-level report that documents workflows, workflow steps, user-defined functions, and variables in detail. The report is generated on demand and reflects the current state of the workflow.

To download the report, click the Report icon in the **Workflows** hierarchy.

# Viewing Workflow Log

> View PlaidCloud workflow execution logs to monitor step progress, review output messages, and troubleshoot processing issues.

## Viewing the Workflow Log [Section titled “Viewing the Workflow Log”](#viewing-the-workflow-log)

Events within a workflow, such as steps running or warnings occurring, are recorded in the workflow log. This log is viewable from the **Project** area under the **Log** tab. The same entries also appear in the project log, which gives a more comprehensive view of logs across multiple workflows.

The log viewer lets you sort and filter the log and view the details of a particular log entry.

## Clearing the Workflow Log [Section titled “Clearing the Workflow Log”](#clearing-the-workflow-log)

You may want to clear the workflow log from time to time. From the log viewer, select the **Clear Log** button. This clears the log entries for the selected workflow and also removes those entries from the project-level log.

# Where are the Workflows

> Navigate to and manage PlaidCloud workflows within your projects using the workflow interface and project navigation tools.

Workflows live inside projects. To find them:

1. From the top menu, open **Projects**.
2. Click the project that contains the workflows you’re looking for.
3.
Switch to the **Workflows** tab. You’ll see every workflow in the project, organized in a folder-style hierarchy. ## What You’ll See for Each Workflow [Section titled “What You’ll See for Each Workflow”](#what-youll-see-for-each-workflow) * **Status** — running, completed normally, or finished with a warning or error * **Created** and **last updated** timestamps, plus the names of the people responsible * **Folder organization** — workflows can be grouped in nested folders for easier management in large projects Double-click a workflow to open the **Workflow Explorer**, where you can view steps, run the whole workflow, run a single step, or pick a range. ## Why a Workflow Might Not Be Visible [Section titled “Why a Workflow Might Not Be Visible”](#why-a-workflow-might-not-be-visible) The workflows you can see depend on two things: * **Project access** — your workspace administrator grants you access to specific projects. If you expect to see a project but don’t, ask a project owner to add you. * **Viewing role** — within a project you’re assigned one of three roles: * **Architect** — can see and edit everything * **Manager** — can see and run workflows but not modify them * **Explorer** — limited visibility; some workflows may be hidden If you expect to see specific workflows and don’t, your role may be filtering them out. A project Architect can confirm what you should see. ## Next Steps [Section titled “Next Steps”](#next-steps) * [Workflow explorer](/guides/workflows/workflow-explorer/) — what to do inside an open workflow * [Create a workflow](/guides/workflows/create-workflow/) — start a new one * [Run a workflow](/guides/workflows/run-a-workflow/) — execute end-to-end # Workflow Explorer > Use the PlaidCloud Workflow Explorer to view workflow details, step configurations, execution history, and dependency information. To view the details within a workflow, find it in the project and then double click on it to open up the workflow in the explorer. 
![Workflow Explorer](/images/workflow_explorer.png)

From here, you can manage workflow steps: create new steps, modify existing ones, change their order, execute them, and so on.

# Integrations

> Connect PlaidCloud to AI coding agents, PySpark, and other external tools your team already uses.

Connect PlaidCloud to the tools your team already uses.

[AI coding agents ](/integrations/ai-coding-agents/)Use Claude Code, Cursor, Copilot, ChatGPT, Gemini, and Claude Desktop with PlaidCloud's MCP server.

[PySpark ](/integrations/pyspark/)Run PySpark workloads against PlaidCloud data.

# AI Agents (MCP)

> Connect Claude, Cursor, GitHub Copilot, Gemini, and other AI agents to your PlaidCloud tenant through the Model Context Protocol (MCP) server.

PlaidCloud exposes a curated [Model Context Protocol](https://modelcontextprotocol.io) (MCP) server at `/mcp/` on every workspace. AI agents connect to it the same way they connect to any other MCP server, then call the tools to read projects, run workflows, query tables, manage dimensions, and more.

The pages in this section cover what the server exposes, how to authenticate, and step-by-step setup for the most common AI clients.

# ChatGPT

> Current state of MCP support in ChatGPT and recommended approaches for connecting ChatGPT to PlaidCloud.

ChatGPT’s support for user-added MCP servers is still rolling out and varies by plan tier and surface. This page describes what works today.

## ChatGPT Pro / Plus / Team: Connectors [Section titled “ChatGPT Pro / Plus / Team: Connectors”](#chatgpt-pro--plus--team--connectors)

If your ChatGPT plan exposes the **Connectors** UI (Settings → Connectors), you can add PlaidCloud as a custom MCP connector:

1. Go to **Settings → Connectors → Add custom connector**.
2. Enter:
   * **Name**: `PlaidCloud`
   * **MCP server URL**: `https://.plaid.cloud/mcp/`
3. ChatGPT will redirect you to PlaidCloud for OAuth login. Approve the connection.
4.
   Toggle the connector on inside any conversation that should be able to use it.

Note: The Connectors UI may not be visible on every account or in every region. If you don’t see “Add custom connector,” your plan or workspace policy doesn’t currently allow user-added MCP servers; use the workaround below.

## ChatGPT Enterprise [Section titled “ChatGPT Enterprise”](#chatgpt-enterprise)

Enterprise admins can pin MCP connectors at the workspace level through the admin console. Follow the same OAuth setup as above, but expect a workspace approval step from your admin before the connector becomes usable.

## Workaround: Custom GPTs With REST Actions [Section titled “Workaround: Custom GPTs With REST Actions”](#workaround--custom-gpts-with-rest-actions)

If your account doesn’t support custom MCP connectors, you can still drive PlaidCloud from ChatGPT through a **Custom GPT** that calls PlaidCloud’s REST API as an OpenAPI Action:

1. PlaidCloud’s REST surface is described by the OpenAPI document at `https://.plaidcloud.org/openapi_rest.json`.
2. In ChatGPT, create a Custom GPT (Explore GPTs → Create) and under **Actions** import that URL (or paste the JSON).
3. Configure authentication as **OAuth** and point at PlaidCloud’s Keycloak endpoints. Your PlaidCloud admin can supply the realm URLs and client ID.

This trades MCP’s tool-name conventions for direct REST endpoints. It is slightly more verbose for the model to navigate, but functionally equivalent for the read/write operations PlaidCloud exposes.

## Why MCP Isn’t Always Available [Section titled “Why MCP Isn’t Always Available”](#why-mcp-isnt-always-available)

OpenAI’s MCP support continues to evolve, and the available surfaces (Connectors UI, Actions schema, etc.) change between plan tiers and over time. If the official **Connectors** path is open on your account, prefer it: it has built-in OAuth refresh and a tool-call experience matching the rest of this section.
Falling back to Custom GPT Actions is only necessary when MCP isn’t yet exposed for your account.

# Claude Code

> Set up Claude Code (CLI, VSCode extension, JetBrains plugin) to call PlaidCloud's MCP tools using either OAuth or a static Bearer token.

[Claude Code](https://claude.com/claude-code) is Anthropic’s coding agent. It ships as a CLI, a VSCode extension, and a JetBrains plugin; all three share the same MCP configuration.

## Option A: OAuth (recommended) [Section titled “Option A: OAuth (recommended)”](#option-a--oauth-recommended)

OAuth is the lowest-maintenance path. Claude Code’s MCP bridge handles the browser redirect and refresh transparently.

1. From your project root, add the server. The CLI form:

   ```bash
   claude mcp add --transport http plaidcloud https://.plaid.cloud/mcp/
   ```

   Or in `.mcp.json` at the project root:

   ```json
   {
     "mcpServers": {
       "plaidcloud": {
         "type": "http",
         "url": "https://.plaid.cloud/mcp/"
       }
     }
   }
   ```

2. Restart Claude Code. The first time you ask it to use a `plaidcloud_` tool, the bridge will pop up an authorization URL. Open it, sign in to PlaidCloud, approve the connection, and paste the callback URL Claude Code asked for. The token is cached locally and refreshed automatically.

3. Verify with `claude mcp list` (CLI) or `/mcp` (in-session). The server should show as connected.

Note: The OAuth bridge state can occasionally get stuck on remote/SSH sessions where Claude Code can’t reach `localhost` from your browser. If that happens, fall back to Option B.

## Option B: Static Bearer Token [Section titled “Option B: Static Bearer Token”](#option-b--static-bearer-token)

When OAuth isn’t practical (remote sessions, devcontainers, agent runtimes that don’t survive browser redirects), use a Bearer token:

1. In a browser tab where you’re signed into PlaidCloud, open:

   ```plaintext
   https://.plaid.cloud/mcp/setup/token
   ```

   Click “Copy snippet.”

2.
Paste it into your project’s `.mcp.json` under `mcpServers`: ```json { "mcpServers": { "plaidcloud": { "type": "http", "url": "https://.plaid.cloud/mcp/", "headers": { "Authorization": "Bearer eyJhbGc…" } } } } ``` Or via the CLI: ```bash claude mcp add --transport http plaidcloud \ https://.plaid.cloud/mcp/ \ -H "Authorization: Bearer eyJhbGc…" ``` 3. Restart Claude Code. `claude mcp list` should show the server as connected. When the token expires, reload the `/mcp/setup/token` URL and replace the `Authorization` value. ## Multi-Tenant Setup [Section titled “Multi-Tenant Setup”](#multi-tenant-setup) You can configure multiple PlaidCloud tenants side-by-side — give each a distinct name: ```json { "mcpServers": { "plaidcloud-prod": { "type": "http", "url": "https://prod.plaid.cloud/mcp/" }, "plaidcloud-dev": { "type": "http", "url": "https://dev.plaid.cloud/mcp/" } } } ``` When you ask Claude Code to do something, name the tenant in your prompt (“in the dev tenant, find projects whose name starts with `Q4`”) so it picks the right server. ## Tips [Section titled “Tips”](#tips) * Run `mcp_introspect` early in a session so Claude Code understands the tool surface without re-reading the full manifest on every call. * Mutating tools (`*_upsert`, `*_organize`, `*_run`) accept `dry_run=True` — useful when you’re letting an agent script changes and want a plan to review first. * For long-running operations (workflow runs, query exports), prefer the `*_track` / `*_status` tools over poll loops — they’re the single source of truth and avoid flooding the agent’s context. # Claude Desktop and Claude.ai > Add PlaidCloud as a Custom Connector in Claude.ai (web) or Claude Desktop so the chat assistant can call MCP tools during conversations. The consumer Claude app — both the web UI at [claude.ai](https://claude.ai) and the desktop app — supports MCP servers through its **Custom Connectors** feature. 
Setup is a one-time OAuth dance, after which the connection is associated with your Claude account and follows you across devices. Note Custom Connectors are available on Claude Pro, Max, Team, and Enterprise plans. Team and Enterprise admins can pin connectors for everyone in the workspace. ## Setup [Section titled “Setup”](#setup) 1. In Claude (web or desktop), open **Settings → Connectors**. 2. Click **Add custom connector**. 3. Fill in: * **Name**: `PlaidCloud` (or `PlaidCloud (prod)`, `PlaidCloud (dev)` if you have multiple tenants). * **Server URL**: `https://.plaid.cloud/mcp/` 4. Click **Add**. Claude opens a browser tab to PlaidCloud’s Keycloak login. 5. Sign in and approve the connection. You’ll be redirected back to Claude with the connection saved. The connector is now available in any conversation. Toggle it on (or off) using the connectors picker in the chat composer. ## Usage [Section titled “Usage”](#usage) Once enabled, you can ask Claude things like: * “List the workflows in project `Q4 Forecast`.” * “Show me the last 10 failed runs for the `daily-load` workflow.” * “Run `mcp_recipes` and pick the right one for backfilling a step.” Claude will pick the appropriate MCP tool, call it, and incorporate the response into its reply. For mutating operations it will typically narrate what it’s about to do — review carefully before approving. ## Multiple Tenants [Section titled “Multiple Tenants”](#multiple-tenants) Add a separate connector for each tenant. Give them distinct names so Claude can tell them apart in conversation. Only the connectors you toggle on for a given chat are available — leaving production off by default and only enabling it when you’re sure is a sensible safety habit. ## Refreshing Access [Section titled “Refreshing Access”](#refreshing-access) Custom connectors store an OAuth refresh token, so re-authentication is rare. 
If you change your PlaidCloud password, get a new device, or have your session invalidated server-side, the connector may show “needs authentication.” Click **Reconnect** in the connectors settings to redo the OAuth flow.

## Disconnecting

Settings → Connectors → click the connector → **Remove**. This deletes the OAuth tokens stored with your Claude account. The PlaidCloud-side session is independent; log out of PlaidCloud separately if you want to invalidate the underlying Keycloak session.

# GitHub Copilot

> Configure GitHub Copilot's agent mode in VSCode to use PlaidCloud's MCP tools.

GitHub Copilot’s [agent mode](https://docs.github.com/copilot/using-github-copilot/copilot-chat-in-ide/using-mcp-with-copilot) in VSCode supports MCP servers through either a workspace-scoped `.vscode/mcp.json` or the user-scoped global VSCode `mcp.json`.

Note: Copilot’s MCP support is gated by Copilot plan tier and the `chat.mcp.enabled` setting in VSCode. Make sure both are enabled before configuring servers.

## Setup

1. Get a Bearer token by visiting `https://.plaid.cloud/mcp/setup/token` in a browser where you’re signed into PlaidCloud.
2. In VSCode, open the command palette and run **MCP: Add Server**, or create `.vscode/mcp.json` directly:

   ```json
   {
     "servers": {
       "plaidcloud": {
         "type": "http",
         "url": "https://.plaid.cloud/mcp/",
         "headers": {
           "Authorization": "Bearer eyJhbGc…"
         }
       }
     }
   }
   ```

3. Reload VSCode. Open the Copilot chat panel, switch the mode dropdown to **Agent**, and confirm the PlaidCloud tools appear in the tools list (typically shown as `plaidcloud_*`).
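
The setup steps above can also be scripted. The following is a minimal Python sketch, not an official PlaidCloud or VSCode tool: it merges a `plaidcloud` entry into `.vscode/mcp.json` using the same shape as the snippet in step 2. The function name `write_copilot_mcp_config` and its parameters are assumptions made for illustration; supply your own server URL and the token copied from the setup page.

```python
import json
from pathlib import Path


def write_copilot_mcp_config(project_root, server_url, token, name="plaidcloud"):
    """Merge one HTTP MCP server entry into <project_root>/.vscode/mcp.json.

    Hypothetical helper: the name and parameters are illustrative only.
    """
    config_path = Path(project_root) / ".vscode" / "mcp.json"
    config_path.parent.mkdir(parents=True, exist_ok=True)

    # Keep any servers that are already configured in the file.
    config = {}
    if config_path.exists():
        config = json.loads(config_path.read_text(encoding="utf-8"))

    servers = config.setdefault("servers", {})
    servers[name] = {
        "type": "http",
        "url": server_url,
        "headers": {"Authorization": f"Bearer {token}"},
    }

    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config_path
```

Because the file then holds a token in plain text, it is probably best kept out of version control. The sketch parses strict JSON, so an existing file that contains comments would need to be edited by hand instead.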
## Usage [Section titled “Usage”](#usage) Ask Copilot to perform PlaidCloud operations directly: * “Find all workflows in project `Q4 Forecast` whose last run failed.” * “Show me the schema of table `customers` and suggest indexes.” * “Run the `daily-load` workflow and report the run status.” Copilot picks the appropriate tool, executes it, and quotes results in its reply. For destructive operations (delete, organize, upsert without `dry_run`), it will typically ask for confirmation — review the planned action before approving. ## Refreshing the Token [Section titled “Refreshing the Token”](#refreshing-the-token) VSCode reads `.vscode/mcp.json` on startup and on file change. When the token expires, reload `https://.plaid.cloud/mcp/setup/token` and overwrite the `Authorization` value — VSCode reloads the server automatically. ## Restricting to Specific Tools [Section titled “Restricting to Specific Tools”](#restricting-to-specific-tools) If you want to limit which PlaidCloud tools Copilot can call, use VSCode’s per-server tool allow-list (Settings → Copilot → MCP → server-specific tool selection). This is helpful for read-only sessions or for keeping mutating tools (`*_upsert`, `*_run`) gated behind explicit re-enable. # Cursor > Configure Cursor IDE to call PlaidCloud's MCP tools through its built-in MCP support. [Cursor](https://cursor.com) supports MCP servers through a `mcp.json` config file. The shape is the same as Claude Code’s `.mcp.json`, so the same Bearer-token snippet works in both. ## Setup [Section titled “Setup”](#setup) 1. Get a Bearer token by visiting `https://.plaid.cloud/mcp/setup/token` in a browser where you’re signed into PlaidCloud. 2. Open Cursor’s MCP config: * **Project-scoped**: create `.cursor/mcp.json` in your project root. * **User-scoped (all projects)**: create `~/.cursor/mcp.json` in your home directory. 3. 
Add the PlaidCloud server: ```json { "mcpServers": { "plaidcloud": { "url": "https://.plaid.cloud/mcp/", "headers": { "Authorization": "Bearer eyJhbGc…" } } } } ``` 4. Open Cursor’s **Settings → MCP** to verify the server is connected. If it shows an error, see [Troubleshooting](../troubleshooting/). ## Using the Tools [Section titled “Using the Tools”](#using-the-tools) In Cursor’s Composer or chat panel, you can prompt the agent in plain English (“describe the structure of project `Q4 Forecast`”) and it will pick the appropriate `plaidcloud_*` tool. Tool calls and responses appear inline — review mutating operations before approving. ## Refreshing the Token [Section titled “Refreshing the Token”](#refreshing-the-token) When the token expires, reload `https://.plaid.cloud/mcp/setup/token` and paste the new value into `mcp.json`. Cursor picks up the change without a full restart — toggle the server off/on in **Settings → MCP** if needed. ## Multiple Tenants [Section titled “Multiple Tenants”](#multiple-tenants) Repeat the entry under a different name: ```json { "mcpServers": { "plaidcloud-prod": { "url": "https://prod.plaid.cloud/mcp/", "headers": { "Authorization": "Bearer …" } }, "plaidcloud-dev": { "url": "https://dev.plaid.cloud/mcp/", "headers": { "Authorization": "Bearer …" } } } } ``` # Google Gemini CLI > Configure Google's gemini-cli to call PlaidCloud's MCP tools. Google’s [`gemini-cli`](https://github.com/google-gemini/gemini-cli) supports MCP servers through `~/.gemini/settings.json` (user-scoped) or `.gemini/settings.json` in a project root (project-scoped). ## Setup [Section titled “Setup”](#setup) 1. Get a Bearer token by visiting `https://.plaid.cloud/mcp/setup/token` in a browser where you’re signed into PlaidCloud. 2. Edit `~/.gemini/settings.json` (create it if it doesn’t exist): ```json { "mcpServers": { "plaidcloud": { "httpUrl": "https://.plaid.cloud/mcp/", "headers": { "Authorization": "Bearer eyJhbGc…" } } } } ``` 3. Restart `gemini`. 
Inside a session, run `/mcp` to verify the server is connected and list its tools. ## Usage [Section titled “Usage”](#usage) Prompt Gemini to use the PlaidCloud tools directly: * “Use plaidcloud to list workflows in the `Sales Forecasting` project.” * “Describe the columns of the `customers` table and the most recent snapshot.” You can also call tools explicitly with the `/tools` command if you want to inspect a specific tool’s input schema before invoking it. ## Refreshing the Token [Section titled “Refreshing the Token”](#refreshing-the-token) When the token expires, reload `https://.plaid.cloud/mcp/setup/token` and update the `Authorization` value in `settings.json`. Restart `gemini` to pick up the change. ## Gemini Code Assist (IDE) [Section titled “Gemini Code Assist (IDE)”](#gemini-code-assist-ide) Gemini Code Assist in VSCode/JetBrains accepts the same `mcpServers` config under its agent settings. The schema matches the CLI form — paste the same snippet under the IDE’s MCP server settings panel. # Getting Started with AI Coding Agents > Overview of the PlaidCloud MCP server — what it exposes, authentication options, and the basics every AI agent client needs. ## What is the MCP Server? [Section titled “What is the MCP Server?”](#what-is-the-mcp-server) The [Model Context Protocol](https://modelcontextprotocol.io) is an open standard for letting AI agents talk to external tools and data. PlaidCloud runs an MCP server on every workspace that wraps the same core helpers the REST API uses — grouped by intent (`find`, `describe`, `upsert`, `run`, `organize`) so an agent can navigate the surface without loading 1,000+ low-level RPC method names. The server lives at: ```text https://.plaid.cloud/mcp/ ``` Replace `` with your workspace subdomain (whatever you use to log into the PlaidCloud UI). 
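
As a small illustration of how the URLs fit together, the sketch below derives the MCP endpoint and the token helper page from a workspace subdomain. The function name `plaidcloud_mcp_urls` and the `workspace` parameter are hypothetical; the `/mcp/` and `/mcp/setup/token` paths are the ones used throughout this section.

```python
import re


def plaidcloud_mcp_urls(workspace):
    """Return the MCP server URL and token helper URL for a workspace subdomain.

    Hypothetical helper for illustration; `workspace` is whatever subdomain
    you use to log into the PlaidCloud UI.
    """
    # Accept a single label only (letters, digits, hyphens, underscores),
    # so a stray scheme, dot, or path is caught early.
    if not re.fullmatch(r"[A-Za-z0-9](?:[A-Za-z0-9_-]*[A-Za-z0-9])?", workspace):
        raise ValueError(f"not a valid subdomain label: {workspace!r}")
    base = f"https://{workspace}.plaid.cloud"
    return {
        "server": f"{base}/mcp/",
        "token_page": f"{base}/mcp/setup/token",
    }
```

For example, `plaidcloud_mcp_urls("prod")["server"]` gives the same `https://prod.plaid.cloud/mcp/` value used in the multi-tenant examples.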
## What It Exposes [Section titled “What It Exposes”](#what-it-exposes) The catalog covers most of the day-to-day surface an agent needs: * **Projects, workflows, steps** — find/describe/upsert/run/organize, including step-level rerun and version history. * **Tables, views, queries** — schema introspection, query execution, exports, snapshots, branches. * **Dimensions** — describe, find, upsert, version, manage nodes/aliases/properties. * **Connections** — find/upsert/test connections to external systems. * **Lakehouse** — branches, snapshots, optimize/vacuum operations. * **Identity** — members, groups, sessions, distros. * **Documents, dashboards, UDFs, editors, agents, publishes** — domain-specific tools. * **Alteryx migration** — convert Alteryx workflows staged in Document and coordinate portfolio migration work. * **Workflow logs and run tracking** — `workflow_logs`, `workflow_run_status`, `workflow_job_track`. Every tool returns a uniform envelope `{ok, data, next_cursor?, total?}`; failures use `{ok: false, error: {code, retryable, message, hint?}}`. Mutations accept `dry_run=True` for plan-without-write validation. For the live catalog, point your agent at the server and call `mcp_introspect` (no arguments) — that returns the current tool count, per-domain summaries, and parameter signatures. Use `mcp_recipes` for common multi-tool playbooks (paginating large lists, snapshot-then-modify, rerun a failed step, etc.). ## Authentication [Section titled “Authentication”](#authentication) PlaidCloud’s MCP server accepts two authentication paths: 1. **OAuth 2.1 + PKCE via Dynamic Client Registration (DCR).** This is what Claude.ai’s custom-connector UI uses, and it’s also the default for Claude Code’s MCP bridge. The client registers itself, redirects you to PlaidCloud’s Keycloak login, and gets back a token transparently. You don’t need to do anything other than pick “OAuth” in the client and approve the login. 2. 
**Static Bearer token in an `Authorization` header.** For agent runtimes that don’t have a usable browser redirect or that want a long-lived token in a config file. PlaidCloud exposes a helper page to mint one for you (see below). ### Getting a Static Bearer Token [Section titled “Getting a Static Bearer Token”](#getting-a-static-bearer-token) Open this URL in a browser tab where you’re already signed into PlaidCloud: ```text https://.plaid.cloud/mcp/setup/token ``` The page returns a JSON snippet ready to paste into your agent’s MCP config. Each workspace has its own snippet. The token’s lifespan is governed by your Keycloak realm’s access-token-lifespan policy (typically a few hours to a day). To refresh, reload the same URL — your browser session re-mints the token automatically. Note This flow is intended for getting started and for clients that can’t complete an OAuth redirect. For long-lived agent access in production — service accounts, CI runners, scheduled jobs — use OAuth where the client supports it. Personal Access Tokens are planned but not yet generally available. ## Pick a Client [Section titled “Pick a Client”](#pick-a-client) The rest of this section walks through setup for specific AI agent clients: * [Claude Code](../claude-code/) — Anthropic’s coding agent (CLI, VSCode extension, JetBrains plugin). * [Claude Desktop and Claude.ai](../claude-desktop/) — the consumer Claude app (desktop) and web (`claude.ai`) using “Custom Connectors.” * [Cursor](../cursor/) — the AI-native code editor. * [GitHub Copilot](../copilot/) — Copilot agent mode in VSCode. * [Google Gemini CLI](../gemini/) — `gemini-cli` and Gemini Code Assist. * [ChatGPT](../chatgpt/) — current support status and recommended workaround. * [Troubleshooting](../troubleshooting/) — common errors and how to fix them. # Troubleshooting > Common issues connecting AI agents to the PlaidCloud MCP server and how to fix them. 
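
Several of the issues in this section trace back to an expired static Bearer token. The sketch below shows one way to check a token's expiry locally before digging further. It assumes the token is a JWT with a standard `exp` claim, which is typical for Keycloak access tokens and consistent with the `eyJhbGc…` prefix in the config examples, but that is an assumption rather than something this documentation states. The function name `seconds_until_expiry` is hypothetical. The payload is decoded without verifying the signature, so treat the result as a diagnostic hint only.

```python
import base64
import json
import time


def seconds_until_expiry(token, now=None):
    """Return seconds until the JWT's `exp` claim (negative if already expired).

    Hypothetical diagnostic helper; decodes the payload WITHOUT verifying
    the signature. Returns None if the token carries no `exp` claim.
    """
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("token does not look like a JWT (expected 3 dot-separated parts)")

    payload_b64 = parts[1]
    # base64url payloads omit padding; restore it before decoding.
    payload_b64 += "=" * (-len(payload_b64) % 4)
    claims = json.loads(base64.urlsafe_b64decode(payload_b64))

    exp = claims.get("exp")
    if exp is None:
        return None
    current = time.time() if now is None else now
    return exp - current
```

A negative result suggests the token needs to be re-minted from the setup page; a positive result points toward a different cause, such as missing scopes.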
## “Failed to Connect” / “Session Not Found”

Symptom: the client config looks correct, but the server shows as failed in the client’s MCP status panel. Server logs show 404 `Session not found` errors.

Cause: a known bug in some MCP clients (most notably Claude Code 2.1.111 with static `Authorization` headers) where the client doesn’t preserve `mcp-session-id` between successive HTTP requests. PlaidCloud’s MCP server runs in stateless HTTP mode to sidestep this. If you’re seeing the error on a tenant that hasn’t been redeployed since the fix, ask your administrator to redeploy. Until the redeploy lands, the OAuth flow (without static `Authorization` headers) still works because it uses a different code path in the client.

## “OAuth Flow Is Not in Progress” During Claude Code Login

Symptom: you authorize in the browser, paste the callback URL into Claude Code, and it says no flow is in progress.

Cause: the bridge’s in-memory OAuth state is per-port and can be lost between the `authenticate` and `complete_authentication` tool calls, particularly in remote/SSH sessions where the browser callback can’t reach `localhost`.

Fix: switch to the static Bearer flow. Open `https://.plaid.cloud/mcp/setup/token` in a browser where you’re signed into PlaidCloud, copy the snippet into your `.mcp.json`, and restart Claude Code.

## Token Expired

Symptom: tools that worked yesterday now return 401 Unauthorized.

Cause: static Bearer tokens follow your Keycloak realm’s access-token-lifespan policy (typically a few hours to a day).

Fix: reload `https://.plaid.cloud/mcp/setup/token` to mint a fresh token, then paste it into your config.
For long-lived sessions, prefer OAuth; clients that support it refresh tokens automatically.

## “No `access_token` in Session”

Symptom: opening `/mcp/setup/token` shows “Sign-in required” or “No access\_token in session” even though you’re signed into PlaidCloud.

Cause: your session was established through a sign-in path that didn’t cache the Keycloak token (uncommon but possible).

Fix: sign out of PlaidCloud and sign back in through the standard login page. The new session will carry the access token.

## Tools Missing or Incomplete Catalog

Symptom: `mcp_introspect` returns fewer tools than expected, or specific tools you need aren’t there.

Cause 1 (scopes): tools require specific PlaidCloud scopes (e.g. `analyze.workflow.write`). If your account lacks the scope, the tool will refuse to execute. Run `mcp_introspect(name='')` to see `required_scopes`. Ask your workspace admin to grant the scope, or run the operation through an account that has it.

Cause 2 (version mismatch): an older deployment may not have all the tools described in the latest docs. Compare the tool count from `mcp_introspect()` with your current version’s release notes; ask for a redeploy if needed.

## Multi-Tenant: Which Tenant Did the Agent Just Hit?

Symptom: you have multiple PlaidCloud tenants configured and the agent’s response could have come from any of them.

Fix: include the tenant explicitly in your prompt (“in the **dev** tenant, list workflows…”). The MCP server names you chose in your config (e.g. `plaidcloud-prod`, `plaidcloud-dev`) double as identifiers the model can disambiguate against. For high-stakes operations, keep production toggled off in the connectors picker until you actively need it.
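
One way to keep tenant names unambiguous is to generate the config rather than hand-edit it. The sketch below builds an `mcpServers` mapping with one `plaidcloud-<tenant>` entry per tenant, following the naming used in the examples above. The function name `build_multi_tenant_config` and the `tenants` argument are hypothetical, and the entries omit `headers`, so they assume the OAuth path.

```python
import json


def build_multi_tenant_config(tenants):
    """Build an `mcpServers` config with one named entry per tenant subdomain.

    Hypothetical helper: `tenants` is a list of workspace subdomains,
    for example ["prod", "dev"].
    """
    if len(set(tenants)) != len(tenants):
        raise ValueError("tenant names must be unique")
    return {
        "mcpServers": {
            f"plaidcloud-{tenant}": {
                "type": "http",
                "url": f"https://{tenant}.plaid.cloud/mcp/",
            }
            for tenant in tenants
        }
    }


if __name__ == "__main__":
    # Print JSON suitable for pasting into a project-level .mcp.json.
    print(json.dumps(build_multi_tenant_config(["prod", "dev"]), indent=2))
```

Distinct server names give both you and the model a stable label to refer to in prompts, which is the disambiguation the fix above relies on.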
## Rate Limits and Quota

PlaidCloud’s REST surface is rate-limited on a requests-per-minute basis by the same middleware that fronts the UI. MCP calls share that limit. If an agent fires off a long burst of `find` calls (e.g. trying to enumerate every project, workflow, and step), you may hit the limit. Use pagination (`cursor`, `limit`) and `count_only=True` for sizing checks instead of fetching the full result set.

## Getting Help

For server-side issues (auth failures, tools returning errors with no obvious cause, missing tools), check the response’s `error` envelope first. Every failure includes `code`, `retryable`, `message`, and often a `hint`. If the hint isn’t enough, contact your PlaidCloud administrator or open a support ticket with the request ID (returned in the `X-Request-Id` response header).

# PySpark and Spark Compute Clusters

> Build and run PySpark applications on PlaidCloud Spark compute clusters for distributed large-scale data analysis and processing.

Use PySpark with PlaidCloud to connect to project tables, read data into Spark DataFrames, and run distributed transformations alongside the rest of your data pipeline.

# Getting Started with PySpark

> Get started using PySpark in PlaidCloud for distributed data processing within user-defined functions and Jupyter Notebooks.

## PySpark Documentation

Working with PySpark is similar to working with Pandas, but PySpark supports distributed compute and is not RAM-bound. PySpark is available in both UDFs and Jupyter Notebooks.

## Spark Cluster

By default, workspaces do not have the Spark cluster enabled. To activate the Spark cluster, go to the Workspace management app and enable the “Spark Compute Cluster” service. Once activated, Spark jobs can be submitted to the cluster.
The cluster can be monitored from the `spark` subdomain for the Workspace (e.g. `https://spark.my_workspace.plaid.cloud`).

# Reference

> Look up workflow steps, expressions, connectors, and CLI commands.

Reference material, search-driven rather than browse-driven. Use the search bar (⌘K / Ctrl-K) to jump straight to a specific function, step, or connector.

* [Workflow steps](/reference/workflow-steps/): every step type, including import, export, table ops, allocations, notifications, document handling, SAP integrations, and more.
* [Alteryx Conversion Matrix](/reference/alteryx-conversion-matrix/): how Alteryx tools convert to PlaidCloud workflow steps, macros, variables, artifacts, and managed executors.
* [Expressions](/reference/expressions/): SQL functions for Lakehouse v1 and v2, including array, string, datetime, aggregate, window, geo, and others.
* [Connectors](/reference/connectors/): provider-by-provider reference for databases, ERPs, REST APIs, cloud storage, and open table formats.
* [CLI](/reference/cli/): PlaidLink, PlaidXL, and Jupyter CLI command references.
* [Glossary](/reference/glossary/): definitions for every PlaidCloud term used across the documentation.

# Alteryx Conversion Matrix

> Coverage reference for how PlaidCloud converts Alteryx tools into Advanced workflow steps, macros, typed variables, Document assets, and managed job executors.

PlaidCloud converts Alteryx workflows, apps, and macros into Advanced workflows. The importer maps each Alteryx object to a native workflow step, macro construct, controlled variable, Document-backed file operation, or managed job executor.

Coverage levels:

* **Fully Converts** - converted directly to native PlaidCloud DAG behavior.
* **Converts With Validation** - converted to PlaidCloud behavior and should be validated against expected outputs for option-level parity.
* **Converts To Executor** - converted to a managed PlaidCloud job executor for specialized processing.
* **Cloud-Native Equivalent** - converted to a useful PlaidCloud artifact or operation that preserves the business purpose in a cloud-native form.
* **Annotation Only** - retained as workflow context, layout, or pass-through behavior with no separate runtime operation.

| Alteryx Object | Coverage Level | PlaidCloud Operation | Notes |
| --- | --- | --- | --- |
| Action | Fully Converts | Variable binding and conditional step configuration | Updates downstream settings from converted app inputs. |
| AlteryxSelect | Fully Converts | Select and schema projection step | Keeps selected, renamed, and reordered fields. |
| AppendFields | Fully Converts | Append fields transform | Appends fields from one stream to another. |
| AutoField | Converts With Validation | Auto field sizing transform | Preserves inferred field sizing intent; validate schema where precision matters. |
| BrowseV2 | Annotation Only | Browse or passthrough marker | Preserved for inspection without adding runtime work. |
| Buffer | Converts To Executor | Spatial executor | Creates buffered geometries with validation against spatial fixtures. |
| CheckBoxGroup | Fully Converts | Controlled workflow variable | Converts app check box choices to controlled user input. |
| Classification | Converts To Executor | Machine learning executor | Runs classification logic through managed ML execution. |
| Condition | Fully Converts | Step condition with warning or error action | Uses workflow step conditions to trigger warnings, errors, or branches. |
| ControlParam | Fully Converts | Macro control parameter | Maps to PlaidCloud macro parameter handling. |
| CreatePoints | Fully Converts | Geometry point creation transform | Creates point geometry from coordinate fields. |
| CrossTab | Fully Converts | Pivot or cross-tab transform | Converts rows to columns. |
| DataCleansePro | Converts With Validation | Data cleanse transform | Cleans whitespace, nulls, punctuation, and casing according to configured options. |
| Date | Fully Converts | Workflow variable date value | Emits ISO date values for downstream steps and conditions. |
| DateTime | Converts With Validation | Date and time transform | Converts date and time parsing or formatting logic. |
| DbFileInput | Fully Converts | Document-backed file input or data materializer | Loads source files from Document into workflow data. |
| DbFileOutput | Fully Converts | Document-backed file output or table write | Writes output data to Document or PlaidCloud tables. |
| Detour | Fully Converts | Conditional branch routing | Converts route selection to DAG conditions. |
| DetourEnd | Fully Converts | Conditional branch merge | Rejoins conditionally selected branches. |
| Directory | Fully Converts | Document directory listing | Lists files from a Document path. |
| Distance | Converts To Executor | Spatial distance executor | Computes distance using managed spatial processing. |
| Download | Converts To Executor | HTTP download executor | Downloads external data or artifacts. |
| DropDown | Fully Converts | Controlled workflow variable | Converts app drop-down choices to controlled user input. |
| DynamicInput | Converts With Validation | Dynamic Document input | Resolves file patterns or variable-driven inputs at runtime. |
| DynamicRename | Fully Converts | Dynamic rename transform | Renames fields using metadata or configured rules. |
| DynamicReplace | Converts With Validation | Dynamic replace transform | Applies replacement rules from a second data stream. |
| DynamicSelect | Fully Converts | Dynamic field selection transform | Selects fields by type, name, or rule. |
| Error | Fully Converts | Step condition with error action | Converts configured error behavior to PlaidCloud step conditions. |
| FileBrowse | Fully Converts | Controlled Document file variable | Lets users choose a file for a converted app run. |
| Filter | Fully Converts | Filter transform | Splits records by expression into true and false paths. |
| FindNearest | Converts To Executor | Spatial nearest-neighbor executor | Finds nearest spatial records with managed spatial processing. |
| Fit | Converts To Executor | Model training executor | Trains or fits model behavior through managed execution. |
| FolderBrowse | Fully Converts | Controlled Document folder variable | Lets users choose a folder for a converted app run. |
| Formula | Fully Converts | Formula transform | Converts field expressions to PlaidCloud expressions or SQL-backed logic. |
| FuzzyMatch | Converts To Executor | Fuzzy matching executor | Uses managed fuzzy matching for match keys, thresholds, and candidate review. |
| Generalize | Converts To Executor | Spatial generalization executor | Simplifies geometry while preserving the requested spatial intent. |
| HtmlBox | Cloud-Native Equivalent | Report text or HTML artifact | Preserves content in PlaidCloud report or artifact output. |
| ImageToText | Converts To Executor | OCR executor | Extracts text from images through managed OCR. |
| Insights | Cloud-Native Equivalent | PlaidCloud dashboard or artifact output | Creates a cloud-native review artifact for repeatable sharing and review. |
| Join | Fully Converts | Join transform | Produces joined, left-only, and right-only streams. |
| JoinMultiple | Fully Converts | Multi-join transform | Joins multiple input streams. |
| Label | Annotation Only | Canvas label | Preserved as workflow context. |
| LabelGroup | Annotation Only | Canvas label group | Preserved as workflow context. |
| Link | Annotation Only | Canvas link or annotation | Preserved as workflow context. |
| ListBox | Fully Converts | Controlled workflow variable | Converts app list selections to controlled user input. |
| MacroInput | Fully Converts | PlaidCloud macro input port | Maps directly to a PlaidCloud macro input step. |
| MacroOutput | Fully Converts | PlaidCloud macro output port | Maps directly to a PlaidCloud macro output step. |
| Map | Cloud-Native Equivalent | Map artifact or spatial visualization | Creates a PlaidCloud map artifact for cloud review and sharing. |
| MapInput | Converts With Validation | Spatial input materializer | Loads spatial input data into the converted workflow. |
| Message | Fully Converts | Step condition with warning or message action | Emits workflow warning, message, or error based on configured condition. |
| Modeling | Converts To Executor | Machine learning executor | Runs model-oriented processing through managed execution. |
| MultiFieldFormula | Converts With Validation | Multi-field formula transform | Applies a formula across selected fields. |
| MultiRowFormula | Converts With Validation | Window or row-aware formula transform | Converts row-relative logic to PlaidCloud window behavior where possible. |
| NumericUpDown | Fully Converts | Controlled numeric workflow variable | Converts app numeric input to a typed variable. |
| Overlay | Converts To Executor | Spatial overlay executor | Performs spatial overlay operations through managed spatial processing. |
| PDFInput | Converts To Executor | PDF extraction executor | Extracts text or tables from PDFs. |
| PlotlyCharting | Cloud-Native Equivalent | Chart artifact | Creates a PlaidCloud chart artifact from converted data. |
| PolyBuild | Converts To Executor | Spatial polygon build executor | Builds polygon geometry from spatial inputs. |
| PortfolioComposerImage | Cloud-Native Equivalent | Report image artifact | Places images into generated PlaidCloud report artifacts. |
| PortfolioComposerLayout | Cloud-Native Equivalent | Report layout artifact | Converts layout intent to PlaidCloud report generation. |
| PortfolioComposerRender | Cloud-Native Equivalent | Report render artifact | Renders report output as a PlaidCloud artifact. |
| PortfolioComposerTable | Cloud-Native Equivalent | Report table artifact | Converts report table content to PlaidCloud report output. |
| PortfolioComposerText | Cloud-Native Equivalent | Report text artifact | Converts report text content to PlaidCloud report output. |
| Predict | Converts To Executor | Prediction executor | Scores records using managed model execution. |
| RadioButtonGroup | Fully Converts | Controlled workflow variable | Converts app radio choices to controlled user input. |
| RecordID | Fully Converts | Row identifier transform | Adds a deterministic record identifier. |
| RegEx | Fully Converts | Regular expression transform | Parses, matches, or replaces text using configured expressions. |
| Regression | Converts To Executor | Regression executor | Runs regression modeling through managed execution. |
| ReportMap | Cloud-Native Equivalent | Map report artifact | Produces a cloud-native map/report artifact. |
| Sample | Fully Converts | Sample transform | Keeps configured records by count, percentage, or grouping rule. |
| Smooth | Converts To Executor | Spatial smoothing executor | Smooths geometry through managed spatial processing. |
| Sort | Fully Converts | Sort transform | Sorts records by configured fields and directions. |
| SpatialInfo | Converts With Validation | Spatial metadata transform | Extracts spatial metadata such as area, length, centroid, or bounds. |
| SpatialMatch | Converts To Executor | Spatial match executor | Matches records by spatial relationship. |
| SpatialProcess | Converts To Executor | Spatial processing executor | Runs spatial operations that require executor-backed geometry handling. |
| Summarize | Fully Converts | Aggregate transform | Groups and aggregates records. |
| Tab | Annotation Only | App tab grouping | Preserved as converted app structure where relevant. |
| Test | Fully Converts | Step condition with warning or error action | Converts test assertions to PlaidCloud conditions. |
| TextBox | Fully Converts | Controlled text workflow variable | Converts app text input to a typed variable. |
| TextInput | Fully Converts | Inline table input | Creates inline data for the workflow. |
| TextPreProcessing | Converts To Executor | NLP preprocessing executor | Performs text normalization and preprocessing. |
| TextToColumns | Fully Converts | Split columns transform | Splits text into fields or rows. |
| Tile | Converts With Validation | Tile or grouping transform | Assigns tile groups according to configured rules. |
| ToolContainer | Annotation Only | Canvas container | Preserved as visual workflow organization. |
| TopicModel | Converts To Executor | Topic modeling executor | Runs topic modeling through managed NLP execution. |
| TradeArea | Converts To Executor | Spatial trade area executor | Creates trade area geometry through managed spatial processing. |
| Transformation | Converts With Validation | Transform step | Converts configured transformation logic to PlaidCloud expressions or SQL. |
| Transpose | Fully Converts | Unpivot transform | Converts columns to rows. |
| Tree | Fully Converts | Controlled workflow variable | Converts app tree selection to controlled user input. |
| Union | Fully Converts | Union transform | Combines streams by name, position, or configured field rules. |
| Unique | Fully Converts | Unique and duplicate split transform | Separates first unique records from duplicates. |
| VisualLayout | Annotation Only | Canvas layout metadata | Preserved as design context. |
| WordCloud | Cloud-Native Equivalent | Text visualization artifact | Creates a PlaidCloud visualization artifact from text analysis output. |
| XMLParse | Converts With Validation | XML parse transform | Extracts XML fields into workflow data. |
| Missing plugin reference | Fully Converts | Macro invocation or generated placeholder when resolved | Imports known macro sources and maps macro calls to PlaidCloud macro steps. |

## Validation Notes

For production workflows, validate converted outputs against trusted Alteryx outputs. PlaidCloud validation focuses on schema, row count, and row values, and ignores row order unless the workflow explicitly depends on ordered data.

Specialized operations such as spatial processing, fuzzy matching, machine learning, OCR, NLP, and reporting may run through managed job executors. These routes keep the converted workflow cloud-native while covering capabilities that are not best expressed as a single SQL transform.

# CLI Tools

> Reference for PlaidCloud command-line tools: the PlaidLink agent, the PlaidXL Excel add-in, and the Jupyter CLI for notebook integration.

PlaidCloud provides three command-line and on-machine tools for working with workspaces outside the web UI:

## PlaidLink

[PlaidLink](/reference/cli/plaidlink/) is an agent that runs inside your network to bridge PlaidCloud workflows to firewall-protected resources such as databases, file shares, and other systems that PlaidCloud can’t reach directly. It installs as a Windows service, Unix/Linux/Mac daemon, container, or Kubernetes pod.

* [Install](/reference/cli/plaidlink/install/)
* [Configure](/reference/cli/plaidlink/configure/)
* [Agents](/reference/cli/plaidlink/agents/)
* [Upgrade](/reference/cli/plaidlink/upgrade/)

## PlaidXL

[PlaidXL](/reference/cli/plaidxl/) is the PlaidCloud Excel add-in. It lets analysts pull data from project tables, refresh saved queries, and read PlaidCloud variables directly inside Microsoft Excel.
* [Install](/reference/cli/plaidxl/install/) * [Connect](/reference/cli/plaidxl/connect/) * [Retrieve data](/reference/cli/plaidxl/retrieve/) ## Jupyter CLI [Section titled “Jupyter CLI”](#jupyter-cli) [Jupyter CLI](/reference/cli/jupyter/) lets data scientists work with PlaidCloud project data from Jupyter notebooks using a PlaidCloud-aware CLI and Python helpers. * [Command line](/reference/cli/jupyter/command-line/) * [Jupyter notebook](/reference/cli/jupyter/jupyter-notebook/) * [OAuth setup](/reference/cli/jupyter/oauth-setup/) # Jupyter Notebooks and Command Line Interfaces > Access PlaidCloud directly through Jupyter Notebooks, command line interfaces, and API connections using OAuth token authentication. PlaidCloud’s Jupyter integration lets data scientists work with project tables, run queries, and use PlaidCloud as a backing store from notebook environments. Authentication uses OAuth tokens so the same credentials work across the CLI, notebooks, and the REST API. ## Topics [Section titled “Topics”](#topics) * [Command line](/reference/cli/jupyter/command-line/) — the standalone CLI for scripted interactions * [Jupyter notebook](/reference/cli/jupyter/jupyter-notebook/) — using PlaidCloud from notebook cells * [OAuth setup](/reference/cli/jupyter/oauth-setup/) — wiring up authentication # Command Line > Use the PlaidCloud command line interface to automate tasks, run scripts, and interact with PlaidCloud resources via terminal. PlaidCloud uses standard JSON-RPC requests and can be used with any application that can perform those requests. To make things easier, a Python package is available to simplify the connection and API running process. 
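For orientation, a raw JSON-RPC call is a small JSON document. The sketch below only builds a request body, assuming the JSON-RPC 2.0 envelope (`jsonrpc`, `method`, `params`, `id`). The method name `analyze/table/table` and the placeholder identifiers are hypothetical illustrations, not confirmed PlaidCloud values. The endpoint URL and the way the OAuth token is attached are not shown here; check the API reference inside your workspace for those details.

```python
import itertools
import json

# Monotonic request ids so a response can be matched to its request.
_request_ids = itertools.count(1)


def build_jsonrpc_request(method, params):
    """Return a JSON-RPC 2.0 request envelope as a plain dict."""
    return {
        "jsonrpc": "2.0",
        "method": method,
        "params": params,
        "id": next(_request_ids),
    }


# Hypothetical method name and placeholder identifiers, for illustration only.
payload = build_jsonrpc_request(
    "analyze/table/table",
    {"project_id": "<project-id>", "table_id": "<table-id>"},
)

# Serialized body that an HTTP client would send to the RPC endpoint.
body = json.dumps(payload)
print(body)
```

The Python package described in the following sections wraps this kind of envelope, so most users never need to build it by hand.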
## Required Installation

From a terminal, run the following command:

```bash
pip install plaidcloud-rpc
```

## Using the SimpleRPC Object to Make a Request

To make a request using the `plaidcloud-rpc` package, use the `SimpleRPC` object.

```python
from plaidcloud.rpc.connection.jsonrpc import SimpleRPC

auth_token = "Your PlaidCloud Auth Token"  # See Obtaining Token below
endpoint_uri = "plaidcloud.com"  # or plaidcloud.net

rpc = SimpleRPC(auth_token, endpoint_uri)
```

Once the `SimpleRPC` object is instantiated, you can issue RPC requests to PlaidCloud. This example requests the metadata for a table.

```python
# project_id and table_id identify the project and the table to look up
table = rpc.analyze.table.table(
    project_id=project_id,
    table_id=table_id
)
```

## What APIs Are Available?

There are many APIs available for use that control nearly every aspect of PlaidCloud. The interactive API reference is served per-tenant inside each PlaidCloud workspace. Open your workspace and navigate to the API documentation menu to see the live endpoint catalog.

## Obtaining an OAuth Token

See [OAuth setup](/reference/cli/jupyter/oauth-setup/) for more information on obtaining an OAuth token and how to configure the system for automated auth.

# Jupyter Notebooks

> Run Jupyter Notebooks in PlaidCloud for interactive data analysis, visualization, and Python-based data exploration workflows.

Jupyter Notebooks and JupyterLab provide exceptional interactive capabilities to analyze, explore, explain, and report data. PlaidCloud makes project data available directly in notebooks.

PlaidCloud provides JupyterHub within each tenant workspace if it is activated for use.
The documentation below helps with setting up Jupyter separately on a desktop or a separate server.

## Install Jupyter Notebook

The steps below assume you have a working Jupyter Notebook installation.

### Installing a Stand-Alone Jupyter Notebook

For more information on installing Jupyter Notebook locally, see [Jupyter’s installation documentation](https://jupyter.org/install).

### Add to VS Code

VS Code also provides an extension that allows you to run notebooks directly in VS Code. Install the extension from the [Visual Studio Marketplace](https://marketplace.visualstudio.com/items?itemName=ms-toolsai.jupyter).

## Install PlaidCloud Utilities

While PlaidCloud can be accessed using standard OAuth and JSON-RPC requests, it is recommended that you use our pre-built libraries for simplified access. In addition, the PlaidCloud utilities library includes handy data helpers for use with Pandas dataframes.

To install the PlaidCloud Utilities, run the following pip installs:

```bash
pip install plaidcloud-rpc@git+https://github.com/PlaidCloud/plaid-rpc.git@v1.4.0#egg=plaidcloud-rpc
```

```bash
pip install plaidcloud-utilities@git+https://github.com/PlaidCloud/plaid-utilities.git@v1.5.2#egg=plaidcloud-utilities
```

## Obtaining an OAuth Token

See [OAuth setup](/reference/cli/jupyter/oauth-setup/) for more information on obtaining an OAuth token and how to configure the system for automated auth.

## Open Jupyter Notebook User Interface

Launch your notebook server to get started.
Once you are signed into your Jupyter notebook server, create a new notebook from the UI. This will open a blank notebook.

Create a connection to communicate with PlaidCloud through the API endpoints.

```python
from plaidcloud.utilities.connect import PlaidConnection

conn = PlaidConnection()
```

Establish a local table object and then query it, with the results automatically placed in a [Pandas](https://pandas.pydata.org/) dataframe.

```python
tbl_sf_cust_master = conn.get_table('Salesforce_Customer_Master')  # This gets a table object
df_sf_cust_master = conn.get_data(tbl_sf_cust_master)  # This retrieves all the data into a dataframe
```

With that same table object you can also write more advanced queries using standard SQLAlchemy syntax.

```python
df_sf_cust_master_w_sales = conn.get_data(
    tbl_sf_cust_master.select().with_only_columns(
        [tbl_sf_cust_master.c.Id, tbl_sf_cust_master.c.CurrencyIsoCode, tbl_sf_cust_master.c.SyDSalesRegion]
    ).where(
        tbl_sf_cust_master.c.TotalSalesPast3Years > 0
    )
)
```

# OAuth Tokens

> Set up OAuth tokens for PlaidCloud API access to authenticate Jupyter Notebooks, CLI tools, and custom application connections.

PlaidCloud uses standard JSON-RPC requests and can be used with any application that can perform those requests. Requests are secured using OAuth tokens.

## Obtaining an OAuth Token

OAuth tokens are generated from the PlaidCloud app. To view the list of current OAuth tokens assigned to you and generate new ones, navigate to `Analyze > Tools > Registered Systems`. Once there you can view any existing tokens or choose to create a new one.

## Download OAuth PlaidCloud Config File

Select “Register a New System”. Fill out the form and note the name you entered so you can find it in the list.

Once created, open the registered system record by clicking on the gear icon.
This will display the configuration file text.

NOTE: Be sure to select the project you want to use this connection for from the drop down at the top. It will add the Project Unique Identifier to the configuration.

Copy this text into a `plaid.conf` file located on your system. Place this in the `.plaid` directory.

## Create a Config File Locally

Create a directory one level up from your notebook directory or from where you plan to use command line interaction. Name the directory `.plaid`.

Inside the `.plaid` directory, create a file called `plaid.conf` and paste the contents you copied above into the file.

Save the file. This will now allow you to connect using the PlaidCloud utilities and RPC methods.

## Advanced Uses

While it is convenient to locate the `.plaid` folder near its usage point, it can be placed anywhere in the upstream directory tree. The initialization process will traverse up the directory tree until it finds the `.plaid` directory.

Locating the `.plaid` directory higher up may be useful if you have multiple operations that need access but cannot coexist in the same lower level directory structures.

## Optional Paths Specification

If you are using a local Jupyter Notebook installation or operating from the command line, it is possible to export data, Excel files, and other data, as well as read local data into dataframes, using the helper tools. To do this, a `paths.yaml` file is necessary.

In addition to the `plaid.conf` file, create a `paths.yaml` file. The `paths.yaml` should be a sibling to the `plaid.conf` file inside the `.plaid` directory.
It should contain the following path information:

```yaml
paths:
  PROJECT_ROOT: '{WORKING_USER}/Documents'
  LOCAL_STORAGE: '{PROJECT_ROOT}/local_storage'
  DEBUG: '{PROJECT_ROOT}/local_storage'
  REPORTS: '{PROJECT_ROOT}/reports'
create: []
local: {}
```

# PlaidLink

> Install and configure PlaidLink agents for secure access to systems behind firewalls, enabling remote queries and file transfers.

PlaidLink provides indirect access to client systems and processes that are protected by firewalls or behind other restrictions that make direct connections from within PlaidCloud difficult.

By using a PlaidCloud Agent installed within the isolated area, PlaidCloud can request that the agent perform actions like running queries, downloading or uploading files, checking sensor conditions, interacting with SAP, and much more.

Since the agent initiates contact with PlaidCloud and communicates over standard HTTPS network protocols, it can normally operate with minimal setup. In addition, the agent can run as an unprivileged user to control access rights within a restricted environment.

## Topics

* [Install](/reference/cli/plaidlink/install/): getting PlaidLink running on Windows, Linux, macOS, or in a container
* [Configure](/reference/cli/plaidlink/configure/): connection settings, credentials, and runtime options
* [Agents](/reference/cli/plaidlink/agents/): managing multiple agents and their capabilities
* [Upgrade](/reference/cli/plaidlink/upgrade/): moving to a newer PlaidLink build

# PlaidLink Agents

> Manage PlaidLink agents in PlaidCloud including registration, monitoring, status checks, and handling multiple agent deployments.

## Description

Sometimes it’s necessary and desirable to access data or run processes from a remote system that does not allow external access. This is common in enterprise environments behind firewalls.
PlaidCloud provides this ability through PlaidLink, which enables access to remote systems behind a firewall or where direct access from PlaidCloud is not desired.

PlaidLink uses an agent-based system. This means that an agent, acting as the remote user, is installed on a system inside the firewall or other restricted area. The agent can then connect to PlaidCloud by using an outbound initiation process over a secure HTTPS websocket connection. It is as secure as any other encrypted web connection and usually does not require you to open non-standard ports.

Before gaining access, the agent must identify itself by sending its agent identifier. If authentication succeeds, the agent is granted access to the approved operations.

PlaidLink can be installed on Windows, Unix, and Linux systems and can run under low privilege users. On Windows systems, PlaidLink can operate as a Windows Service with full control from the Service panel. On Linux or Unix systems, it can run as a daemon process. PlaidLink can also run as a stand-alone Docker container or as a Kubernetes pod.

## Managing Agents

**To manage agents:**

1. Open Analyze
2. Select “Tools”
3. Click “PlaidLink Agents”

This brings you to the **PlaidLink Agents Table** where you can view, modify, and obtain credentials for the list of available agents.

## Creating an Agent

**To create an agent:**

1. Open Analyze
2. Select “Tools”
3. Click “PlaidLink Agents”
4. Click “Add PlaidLink Agent”
5. Complete the required fields
6. Click “Create”
7. Assign the agent to the necessary security groups to access resources needed to perform its job
8. Assign the agent to the necessary Document accounts to access documents needed to perform its job

Danger: For steps 7 and 8 above, the PlaidLink Agent must be assigned to the security groups and Document accounts necessary for performing the jobs you expect the Agent to perform. Otherwise it will be denied access.

Note: Any information not present on the new agent form will be automatically generated.

## Obtaining Agent Credentials

To configure PlaidLink agents on the remote system, you must first obtain the agent’s identifying information. This information includes both a public and a private key.

**To obtain these keys:**

1. Open Analyze
2. Select “Tools”
3. Click “PlaidLink Agents”
4. Click the edit icon

This will open a form where you can view the public and private key values.

## Regenerating Agent Credentials

To maintain security, it is a good idea to periodically regenerate the public and private keys and update the configuration of remote systems.

**To regenerate the credentials:**

1. Open Analyze
2. Select “Tools”
3. Click “PlaidLink Agents”
4. Click the regenerate icon

Once the credentials have been regenerated, they can be obtained in the same way a new agent’s credentials are obtained (described above).

## Enabling and Disabling an Agent

**To disable an agent:**

1. Open Analyze
2. Select “Tools”
3. Click “PlaidLink Agents”
4. Uncheck the “Active” checkbox

Note: When an agent is not marked as active, remote systems will not be able to connect using those agent credentials.

## Running Multiple Agents

PlaidLink is designed to allow operation of multiple agents using a single service installation.
This streamlined installation lets one install handle agents from multiple workspaces and/or agents with different levels of permissions for task execution.

To enable multiple agents, add each agent’s credentials to the PlaidLink configuration file.

## Running Multiple PlaidLink Services

Similar to running multiple agents within one PlaidLink service, it is also possible to run multiple PlaidLink services. This is sometimes necessary when system-based security or network access restrictions prevent communication across network boundaries.

Note: It is normally better to run multiple agents under a single service rather than multiple services on a single machine. However, depending on the use case it may be necessary to run multiple distinct services.

## Compute, Memory, and Disk Requirements

The PlaidLink service is extremely lightweight and only needs minimal compute and memory to operate. When processing significant data volumes it may be necessary to increase compute resources and especially memory.

Normally, the agent will happily run with 5% of CPU and 200MB of memory. For intense data operations, it is recommended to allocate an entire CPU and at least 4GB of RAM. For dynamic resource allocation systems like Kubernetes, it is fine if the agent has access to burstable resources rather than reserved resources.

Disk space for the agent is minimal too. Agent operations utilize disk space as a data buffer when transferring large amounts of data. Typically, 8GB of space is fine for normal operations. For intense data operations it is recommended that you scale disk up according to the expected data volumes.
There is no set amount because it depends on several factors, including CPU speed, network speed, and the amount of data. However, a good place to start is 20GB and adjust from there.

## Networking Requirements

The PlaidLink Agent is designed to operate with minimal configuration. It does not require any special VPN or network configuration other than allowing standard HTTPS network traffic. Agents communicate over the same protocol as normal web browser based traffic. The agent service always initiates communication with PlaidCloud, so there is no need to configure ingress access in firewalls.

Note: Sometimes firewall rules block all access, even standard HTTPS traffic. If the agent reports it is unable to contact PlaidCloud on startup, you will need to work with your networking team to open port 443 for outbound traffic.

# Configure

> Configure PlaidLink agent settings in PlaidCloud including connection parameters, security options, and communication preferences.

The PlaidLink Agent works in conjunction with the PlaidCloud service. The PlaidLink Agent provides the connection necessary to operate with systems that are not directly accessible, such as databases and file systems.

The agent performs a number of essential actions including:

* Reading and writing to databases
* Reading and writing files to network drives and servers
* Checking for sensor conditions
* Interacting with SAP ECC and SAP S/4HANA through Remote Function Calls (RFCs)
* Interacting with SAP Profitability and Cost Management (PCM)
* Sending messages and notifications to remote systems

## Create an Agent on PlaidCloud

PlaidLink Agent management takes place within the Analyze tab of PlaidCloud. The first step is to create a new PlaidLink Agent instance on PlaidCloud.
### To Create a New PlaidLink Agent

1. Select the Analyze tab
2. Select the Tools menu from the top
3. Click PlaidLink Agents
4. Create a new Agent with an appropriate name for the environment or server that it will be installed on for remote operations

### To View the Agent Public and Private Keys

1. Click on the edit icon to view the form
2. At the bottom of the form you will find the public and private keys that were randomly generated during the Agent creation process

Note: Record these keys, as they will be used in the agent configuration on the remote server.

### To Randomly Generate New Keys

1. Click on the Regenerate icon for the Agent record
2. Once the keys are regenerated, update the agent configuration file on the remote server with the new keys

Note: Retain the public and private keys for configuring the remote agent in the next step.

## Document Account Access

If the agent needs access to a Document account for uploading or downloading files, it must be granted permission to access the Document account.

### To Grant Account Access

1. In the Document tab select Manage Accounts
2. Once the table of accounts appears, click on the agent icon for the account to which the new Agent should have upload/download rights
3. Drag the new agent into the Assigned Agents column
4. Save the access control form

Note: Agents can only upload and download files if the agent has been granted access to one or more Document accounts.
## Data Connection Access

If the agent needs access to a data connection such as a database, it must be granted permission to access the external data connection information.

### To Grant Connection Access

1. In the Analyze tab select the Tools menu
2. Click External Data Connections
3. Once the table of data connections appears, click on the agent icon for the connection to which the new Agent should have usage rights
4. Drag the new agent into the Assigned Agents column and save the access control form

Note: Agent data connection credentials are managed in External Data Connections.

### Next Step: Installing PlaidLink (agent) on a Remote System

Follow these [Installation Instructions](/reference/cli/plaidlink/install) to install PlaidLink on the remote system.

# Install PlaidLink

> Install the PlaidLink agent on your local network or server to enable secure data access between PlaidCloud and protected systems.

## Download the Agent

Check the releases on [PlaidCloud.com](https://plaidcloud.com/) for **PlaidLink**.

## Extract the Agent

Extract the downloaded zip file to an install location of your choice.
Generally, this location will be:

```bash
C:\Users\\src\plaidlink
```

## Create a Configuration File

Note: If you are upgrading from a past version of the agent, the configuration file is still valid, and this step can be skipped.

Copy the `config-dist.yaml` file in the agent’s directory to `%ProgramData%\plaidcloud\`, and rename this copy `config.yaml`.

*(Edit this configuration with the values retrieved from PlaidCloud)*

## Install the Agent’s Service

Run the `install_windows_service.bat` file in the agent’s install directory

OR

From an administrator command prompt, navigate to the agent’s install directory and run:

```bash
.\PlaidLink.exe install
```

## Running the Agent

Note: To install a Windows service, you must have administrative privileges.

Type **`Services`** into Windows’ search bar and open the service manager. In the list of services, find **`PlaidCloud Agent`**. Right-click the service and select **“Start”** to start the agent.

## Freezing Updates

If at any point you want to disable the agent’s auto-update feature, open the agent’s YAML configuration file, add a line that reads `freeze_updates: true` at the root level of the file, and restart the agent’s service.

Caution: Disabling auto-updates is not recommended long-term.

# Upgrade

> Upgrade your PlaidLink agent to the latest version to access new features, security patches, and improved system compatibility.

A manual upgrade of PlaidLink may be necessary if the agent does not have sufficient privileges to update itself when new versions are released, or if a manual upgrade process is preferred.
## Download the Agent

Check the releases on [PlaidCloud.com](https://plaidcloud.com/) for **PlaidLink**.

## Stop the Current Agent

Type **`Services`** into Windows’ search bar and open the service manager. In the list of services, find **`PlaidCloud Agent`**. Right-click the **`PlaidCloud Agent`** service and select *Stop*. Once the service has stopped, continue to the next step.

## Extract the Agent

Navigate to the current location of the installed agent.

```bash
C:\Users\\src\
```

Rename the current installation folder so that it will no longer be referenced. For example, `Plaidlink_Old_12122022`.

Extract the downloaded zip file to install it in this location. Generally, this location will be:

```bash
C:\Users\\src\plaidlink
```

## Start the Agent

Return to the *Services* window. If you closed it, type **`Services`** into Windows’ search bar, open the service manager, and find **`PlaidCloud Agent`** in the list of services. Right-click the **`PlaidCloud Agent`** service and select *Start*.

Once the agent shows the **`Running`** state, it is operational again on the new version.

# PlaidXL

> Use the PlaidXL Excel Add-in to interact with PlaidCloud workspaces, projects, tables, and variables directly from Excel.

The PlaidCloud Office Add-in (PlaidXL) lets analysts work with PlaidCloud workspaces, projects, workflows, tables, views, and variables directly from Microsoft Excel.

PlaidXL provides Excel functions for pulling PlaidCloud data into worksheets and refreshing it on demand. This is useful for analysts who do their primary modeling in Excel but want their inputs to come from authoritative PlaidCloud project tables rather than copy-paste.
## Topics

* [Install](/reference/cli/plaidxl/install/): downloading and enabling the PlaidXL add-in
* [Connect](/reference/cli/plaidxl/connect/): signing in and choosing a workspace
* [Retrieve data](/reference/cli/plaidxl/retrieve/): pulling project table data into Excel cells

# Connecting

> Connect PlaidXL to your PlaidCloud workspace to start importing, exporting, and managing data directly from Microsoft Excel.

## For PlaidCloud Logins

Connecting PlaidXL works much like logging in to PlaidCloud directly. You will be asked for your email, password, and a multi-factor authentication code if one is enabled. Fill this out as normal, and begin using PlaidXL!

## For Single Sign-on Logins

If you normally use single sign-on to access PlaidCloud, the login process will be transparent for you as long as you are currently logged into your organization. If you are not logged in, you will be prompted to sign in.

# Install PlaidXL

> Install the PlaidXL Excel Add-in to connect Microsoft Excel directly to PlaidCloud for data retrieval and management operations.

## For Windows

1. From the `Insert > Add-ins` menu in Microsoft Excel, type `PlaidCloud` in the add-in search box
2. Select the PlaidCloud Office Add-in and install it

## For Mac

1. From the `Insert > Store` menu in Microsoft Excel for Mac, type `PlaidCloud` in the add-in search box
2. Select the PlaidCloud Office Add-in and install it

# Working with Data

> Retrieve PlaidCloud data tables and views directly into Microsoft Excel using PlaidXL for local analysis and reporting tasks.

## Retrieve Data

To retrieve data from PlaidCloud, select your desired project from the dropdown menu.
Once a project is selected, a list of tables in that project will appear. Click on a table to select it, and click the `Retrieve Table` button to import the selected table into Excel. The table will be placed in a new worksheet, named after the table.

For your convenience, the following will also happen when a table is retrieved:

* Column headers will be frozen
* Auto-filters will be enabled
* An offset-based named range will be generated to encompass the data
  * This range’s name will be the same as the table’s name, prefixed with an underscore and with all spaces replaced by underscores
  * For example, the range for a table named “Sample data” would be “\_Sample\_data”

## Save Data

If you make changes to data in the spreadsheet and want to push these changes to the PlaidCloud table, press the `Save Table (OVERWRITE!)` button.

Danger: Be careful. As the warning suggests, this will overwrite the data in PlaidCloud with the data in your spreadsheet.

Since you can open multiple PlaidCloud tables in PlaidXL, bulk operations are in place for your convenience. The pull/push all active tables buttons will retrieve the latest versions of all tables active in Excel, or upload all active tables back to PlaidCloud, respectively. In addition, pulling all tables will also refresh any pivot tables that use data from a refreshed table.

# Data and Service Connectors

> Connect PlaidCloud to external data sources and services including databases, ERPs, REST APIs, cloud storage, and Git repositories.

PlaidCloud connects to external data sources and services through purpose-built connectors. Each connector handles the authentication, protocol, and data-shape specifics of one provider family.

## Categories

### Databases and Data Lakes

Relational databases, cloud warehouses, query engines, and lakehouse formats.
* [Databases](/reference/connectors/databases/) — PostgreSQL, MySQL, SQL Server, Oracle, Snowflake, Redshift, BigQuery, Databricks, and 15+ more * [Open Tables](/reference/connectors/open-tables/) — Apache Iceberg, Delta Lake, Hudi, Hive open table formats ### Cloud and SaaS Services [Section titled “Cloud and SaaS Services”](#cloud-and-saas-services) * [REST](/reference/connectors/rest/) — Salesforce, NetSuite, Workday, QuickBooks, Stripe, Dynamics, and more * [ERP systems](/reference/connectors/erp/) — SAP ECC, S/4HANA, Oracle EBS/Fusion, Infor, JD Edwards * [Cloud services](/reference/connectors/cloud-services/) — third-party data services * [Google](/reference/connectors/google/) — BigQuery, Google Sheets * [Collaboration](/reference/connectors/collaboration/) — Slack, Microsoft Teams * [Singer Sources](/reference/connectors/singer-sources/) — 130+ Singer-tap sources: Stripe, GitHub, HubSpot, databases, and more ### Development and Source Control [Section titled “Development and Source Control”](#development-and-source-control) * [Git providers](/reference/connectors/git/) — GitHub, GitLab, Bitbucket, Azure Repos, CodeCommit ## Related [Section titled “Related”](#related) * [Connections guide](/guides/connections/) — task-oriented walkthrough for creating and managing connections * [Workflow steps reference](/reference/workflow-steps/) — what to do with a connection once it’s configured # Cloud Service Connections > Connect PlaidCloud to cloud data services including Quandl for financial and economic data integration into your workflows. Connectors for cloud-based data services that use proprietary or non-REST protocols. These don’t fit cleanly into the database or REST categories. 
## Providers [Section titled “Providers”](#providers) * [Quandl](/reference/connectors/cloud-services/quandl/) — financial and economic data (now Nasdaq Data Link) # Quandl Connector > Set up a Quandl cloud service connection in PlaidCloud to import financial, economic, and alternative data into your workflows. ## Connection Documentation [Section titled “Connection Documentation”](#connection-documentation) Quandl is now Nasdaq Data Link. See the [Nasdaq Data Link documentation](https://docs.data.nasdaq.com/). ## Setup [Section titled “Setup”](#setup) This connector uses a vendor-specific authentication flow and is configured directly from the **Connections** screen in your workspace. The configuration fields shown depend on the credentials your tenant administrator has provisioned for the integration. See the upstream [Nasdaq Data Link documentation](https://docs.data.nasdaq.com/) for the latest setup specifics. If you need help setting up this connector for your tenant, contact your account team — connector-specific credentials, environment URLs, and any required pre-provisioning typically need to be coordinated with PlaidCloud support. # Team Collaboration Connections > Connect PlaidCloud to collaboration platforms like Slack and Microsoft Teams for automated notifications and data sharing. PlaidCloud connects to team chat platforms so workflows can send notifications, alerts, or status updates directly into channels your team is already watching. These connections are most commonly used alongside the [Notify via Slack](/reference/workflow-steps/notifications/notify-via-slack/) and [Notify via Microsoft Teams](/reference/workflow-steps/notifications/notify-via-microsoft-teams/) workflow steps. ## Providers [Section titled “Providers”](#providers) * [Slack](/reference/connectors/collaboration/slack/) * [Microsoft Teams](/reference/connectors/collaboration/teams/) # Slack Connector > Configure a Slack connection in PlaidCloud to enable automated workflow notifications and data alerts to Slack channels.
## Connection Documentation [Section titled “Connection Documentation”](#connection-documentation) [Slack Admin documentation](https://slack.com/help). ## Setup [Section titled “Setup”](#setup) This connector uses a vendor-specific authentication flow and is configured directly from the **Connections** screen in your workspace. The configuration fields shown depend on the credentials your tenant administrator has provisioned for the integration. See the upstream [Slack documentation](https://api.slack.com/) for the latest setup specifics. If you need help setting up this connector for your tenant, contact your account team — connector-specific credentials, environment URLs, and any required pre-provisioning typically need to be coordinated with PlaidCloud support. # Microsoft Teams Connector > Set up a Microsoft Teams connection in PlaidCloud to enable automated workflow notifications and data alerts to Teams channels. ## Connection Documentation [Section titled “Connection Documentation”](#connection-documentation) [Microsoft Teams Admin documentation](https://learn.microsoft.com/en-us/microsoftteams/). ## Setup [Section titled “Setup”](#setup) This connector uses a vendor-specific authentication flow and is configured directly from the **Connections** screen in your workspace. The configuration fields shown depend on the credentials your tenant administrator has provisioned for the integration. See the upstream [Microsoft Teams documentation](https://learn.microsoft.com/en-us/microsoftteams/) for the latest setup specifics. If you need help setting up this connector for your tenant, contact your account team — connector-specific credentials, environment URLs, and any required pre-provisioning typically need to be coordinated with PlaidCloud support. # Database and Data Lake Connections > Database and Data Lake connections vary by service. Each connector has specific security and access requirements for PlaidCloud to connect.
PlaidCloud connects directly to databases, data lakes, query engines, and lakehouses. Connections can also route through a PlaidLink Agent when the target sits behind a firewall. The terms *database*, *lakehouse*, *query engine*, and *data warehouse* describe different underlying technologies but all expose a SQL-style query interface — so we treat them as one category here. ## Relational Databases [Section titled “Relational Databases”](#relational-databases) * [PostgreSQL](/reference/connectors/databases/postgres/) * [MySQL](/reference/connectors/databases/mysql/) * [Microsoft SQL Server](/reference/connectors/databases/microsoft-sql-server/) * [Oracle](/reference/connectors/databases/oracle/) * [IBM DB2](/reference/connectors/databases/ibm-db2/) * [Informix](/reference/connectors/databases/informix/) ## Cloud Data Warehouses [Section titled “Cloud Data Warehouses”](#cloud-data-warehouses) * [Snowflake](/reference/connectors/databases/snowflake/) * [Amazon Redshift](/reference/connectors/databases/amazon-redshift/) * [Amazon Athena](/reference/connectors/databases/amazon-athena/) * [Azure Databricks](/reference/connectors/databases/azure-databricks/) * [Microsoft Fabric](/reference/connectors/databases/microsoft-fabric/) * [SAP HANA](/reference/connectors/databases/sap-hana/) ## Analytical Databases [Section titled “Analytical Databases”](#analytical-databases) * [Greenplum](/reference/connectors/databases/greenplum/) * [Exasol](/reference/connectors/databases/exasol/) * [Databend](/reference/connectors/databases/databend/) — Lakehouse v1 engine * [StarRocks](/reference/connectors/databases/starrocks/) — Lakehouse v2 engine * [Doris](/reference/connectors/databases/doris/) * [PlaidCloud Lakehouse](/reference/connectors/databases/plaidcloud-lakehouse/) ## Query Engines [Section titled “Query Engines”](#query-engines) * [Presto](/reference/connectors/databases/presto/) * [Trino](/reference/connectors/databases/trino/) * [Apache 
Hive](/reference/connectors/databases/hive/) * [Apache Spark](/reference/connectors/databases/spark/) ## Generic [Section titled “Generic”](#generic) * [ODBC](/reference/connectors/databases/odbc/) — connect to any database with an ODBC driver # Amazon Athena > Configure an Amazon Athena connection in PlaidCloud to run serverless queries against data stored in Amazon S3 buckets. **Amazon Athena** is AWS’s serverless query engine over S3-hosted data, billed per-query. Use this connector to run Athena queries from PlaidCloud workflows — useful for joining S3 data lakes with PlaidCloud project tables. Authentication uses AWS access keys or IAM role assumption. ## Upstream Documentation [Section titled “Upstream Documentation”](#upstream-documentation) [Amazon Athena documentation](https://docs.aws.amazon.com/athena/). ## Configuration [Section titled “Configuration”](#configuration) These fields appear when creating or editing this connection. Required vs optional depends on the authentication options you enable. ### Identification [Section titled “Identification”](#identification) | Field | Type | Description | | ------------ | ----------------- | ---------------------------------------------------------------------- | | Name | Text | Display name for this connection. | | Alias | Text (multi-line) | Optional alias or notes about the connection. | | Is active | Toggle | Whether the connection is enabled. Disable to pause without deleting. | | Db read only | Toggle | Restrict the connection to read-only operations. | | Access type | Select | Read-only, write-only, or read-write access level for this connection. | ### Connection [Section titled “Connection”](#connection) | Field | Type | Description | | ---------- | ------ | ---------------------------------------------- | | Db host | Text | Hostname or IP address of the database server. | | Db port | Number | Port number for the database connection. | | Db catalog | Text | Database, catalog, or schema to connect to. 
| ### Authentication [Section titled “Authentication”](#authentication) | Field | Type | Description | | ----------- | -------- | ------------------------------------------------------------- | | Db user | Text | Username for database authentication. | | Use sso | Toggle | Authenticate via single sign-on instead of username/password. | | Db password | Password | Password for database authentication. | ### SSL / TLS [Section titled “SSL / TLS”](#ssl--tls) | Field | Type | Description | | -------------------- | ----------------- | ----------------------------------------------------------------------- | | Use ssl | Toggle | Encrypt the connection with SSL/TLS. | | Ssl mode | Select | SSL verification mode (e.g., disable, require, verify-ca, verify-full). | | Ssl auth client cert | Text (multi-line) | Client certificate (PEM) for mutual TLS authentication. | | Ssl auth client key | Text (multi-line) | Client private key (PEM) for mutual TLS authentication. | | Ssl auth root cert | Text (multi-line) | Root CA certificate (PEM) for verifying the server’s cert. | | Ssl auth cert revoke | Text (multi-line) | Certificate revocation list, if your environment uses one. | ### SSH Tunnel [Section titled “SSH Tunnel”](#ssh-tunnel) | Field | Type | Description | | --------------- | ----------------- | ----------------------------------------------------------------------- | | Use ssh | Toggle | Tunnel the connection through an SSH bastion. | | Ssh host | Text | SSH bastion hostname. | | Ssh port | Number | SSH bastion port (default 22). | | Ssh user | Text | SSH bastion username. | | Ssh password | Password | SSH bastion password (if password auth is used). | | Use ssh cert | Toggle | Authenticate to the SSH bastion with a private key instead of password. | | Ssh private key | Text (multi-line) | SSH private key (PEM) for bastion authentication. | | Ssh host key | Text (multi-line) | Expected SSH host key for bastion fingerprint verification. 
| # Amazon Redshift > Set up an Amazon Redshift database connection in PlaidCloud to query, import, and export data with your Redshift warehouse. **Amazon Redshift** is AWS’s managed cloud data warehouse, designed for analytical workloads over large datasets. Use this connector to read and write Redshift tables from PlaidCloud workflows. The connector speaks the PostgreSQL wire protocol; authentication uses standard database credentials or IAM-backed temporary credentials. ## Upstream Documentation [Section titled “Upstream Documentation”](#upstream-documentation) See the [Amazon Redshift documentation](https://docs.aws.amazon.com/redshift/) for guides and reference material. ## Configuration [Section titled “Configuration”](#configuration) These fields appear when creating or editing this connection. Required vs optional depends on the authentication options you enable. ### Identification [Section titled “Identification”](#identification) | Field | Type | Description | | ------------ | ----------------- | ---------------------------------------------------------------------- | | Name | Text | Display name for this connection. | | Alias | Text (multi-line) | Optional alias or notes about the connection. | | Is active | Toggle | Whether the connection is enabled. Disable to pause without deleting. | | Db read only | Toggle | Restrict the connection to read-only operations. | | Access type | Select | Read-only, write-only, or read-write access level for this connection. | ### Connection [Section titled “Connection”](#connection) | Field | Type | Description | | ---------- | ------ | ---------------------------------------------- | | Db host | Text | Hostname or IP address of the database server. | | Db port | Number | Port number for the database connection. | | Db catalog | Text | Database, catalog, or schema to connect to. 
| ### Authentication [Section titled “Authentication”](#authentication) | Field | Type | Description | | ----------- | -------- | ------------------------------------------------------------- | | Db user | Text | Username for database authentication. | | Use sso | Toggle | Authenticate via single sign-on instead of username/password. | | Db password | Password | Password for database authentication. | ### SSL / TLS [Section titled “SSL / TLS”](#ssl--tls) | Field | Type | Description | | -------------------- | ----------------- | ----------------------------------------------------------------------- | | Use ssl | Toggle | Encrypt the connection with SSL/TLS. | | Ssl mode | Select | SSL verification mode (e.g., disable, require, verify-ca, verify-full). | | Ssl auth client cert | Text (multi-line) | Client certificate (PEM) for mutual TLS authentication. | | Ssl auth client key | Text (multi-line) | Client private key (PEM) for mutual TLS authentication. | | Ssl auth root cert | Text (multi-line) | Root CA certificate (PEM) for verifying the server’s cert. | | Ssl auth cert revoke | Text (multi-line) | Certificate revocation list, if your environment uses one. | ### SSH Tunnel [Section titled “SSH Tunnel”](#ssh-tunnel) | Field | Type | Description | | --------------- | ----------------- | ----------------------------------------------------------------------- | | Use ssh | Toggle | Tunnel the connection through an SSH bastion. | | Ssh host | Text | SSH bastion hostname. | | Ssh port | Number | SSH bastion port (default 22). | | Ssh user | Text | SSH bastion username. | | Ssh password | Password | SSH bastion password (if password auth is used). | | Use ssh cert | Toggle | Authenticate to the SSH bastion with a private key instead of password. | | Ssh private key | Text (multi-line) | SSH private key (PEM) for bastion authentication. | | Ssh host key | Text (multi-line) | Expected SSH host key for bastion fingerprint verification. 
| # Azure Databricks > Configure an Azure Databricks connection in PlaidCloud to integrate Spark-based analytics and lakehouse data into workflows. **Azure Databricks** combines Apache Spark, Delta Lake, and a managed notebook environment on Microsoft Azure. Use this connector to read and write tables in a Databricks workspace from PlaidCloud workflows. Authentication uses a personal access token or service principal; the workspace URL and HTTP path identify the SQL warehouse to target. ## Upstream Documentation [Section titled “Upstream Documentation”](#upstream-documentation) [Azure Databricks documentation](https://learn.microsoft.com/en-us/azure/databricks/). ## Configuration [Section titled “Configuration”](#configuration) These fields appear when creating or editing this connection. Required vs optional depends on the authentication options you enable. ### Identification [Section titled “Identification”](#identification) | Field | Type | Description | | ------------ | ----------------- | ---------------------------------------------------------------------- | | Name | Text | Display name for this connection. | | Alias | Text (multi-line) | Optional alias or notes about the connection. | | Is active | Toggle | Whether the connection is enabled. Disable to pause without deleting. | | Db read only | Toggle | Restrict the connection to read-only operations. | | Access type | Select | Read-only, write-only, or read-write access level for this connection. | ### Connection [Section titled “Connection”](#connection) | Field | Type | Description | | ---------- | ---- | ---------------------------------------------- | | Db host | Text | Hostname or IP address of the database server. | | Db catalog | Text | Database, catalog, or schema to connect to. | | Db schema | Text | Schema name within the database. 
| ### Authentication [Section titled “Authentication”](#authentication) | Field | Type | Description | | ----------- | -------- | ------------------------------------------------------------- | | Db user | Text | Username for database authentication. | | Use sso | Toggle | Authenticate via single sign-on instead of username/password. | | Db password | Password | Password for database authentication. | ### SSL / TLS [Section titled “SSL / TLS”](#ssl--tls) | Field | Type | Description | | -------------------- | ----------------- | ----------------------------------------------------------------------- | | Use ssl | Toggle | Encrypt the connection with SSL/TLS. | | Ssl mode | Select | SSL verification mode (e.g., disable, require, verify-ca, verify-full). | | Ssl auth client cert | Text (multi-line) | Client certificate (PEM) for mutual TLS authentication. | | Ssl auth client key | Text (multi-line) | Client private key (PEM) for mutual TLS authentication. | | Ssl auth root cert | Text (multi-line) | Root CA certificate (PEM) for verifying the server’s cert. | | Ssl auth cert revoke | Text (multi-line) | Certificate revocation list, if your environment uses one. | ### SSH Tunnel [Section titled “SSH Tunnel”](#ssh-tunnel) | Field | Type | Description | | --------------- | ----------------- | ----------------------------------------------------------------------- | | Use ssh | Toggle | Tunnel the connection through an SSH bastion. | | Ssh host | Text | SSH bastion hostname. | | Ssh port | Number | SSH bastion port (default 22). | | Ssh user | Text | SSH bastion username. | | Ssh password | Password | SSH bastion password (if password auth is used). | | Use ssh cert | Toggle | Authenticate to the SSH bastion with a private key instead of password. | | Ssh private key | Text (multi-line) | SSH private key (PEM) for bastion authentication. | | Ssh host key | Text (multi-line) | Expected SSH host key for bastion fingerprint verification. 
| ### Other [Section titled “Other”](#other) | Field | Type | Description | | --------- | ---- | ----------- | | Http path | Text | HTTP path that identifies the SQL warehouse to target. | # Databend > Set up a Databend database connection in PlaidCloud to run cloud-native analytical queries with cost-effective data storage. **Databend** is the open-source SQL engine that powers PlaidCloud’s Lakehouse v1. Use this connector when you want to query a standalone Databend deployment. For in-product analytics, the lakehouse is reachable through PlaidCloud directly, without this connector. ## Upstream Documentation [Section titled “Upstream Documentation”](#upstream-documentation) [Databend documentation](https://docs.databend.com/guides/). ## Configuration [Section titled “Configuration”](#configuration) These fields appear when creating or editing this connection. Required vs optional depends on the authentication options you enable. ### Identification [Section titled “Identification”](#identification) | Field | Type | Description | | ------------ | ----------------- | ---------------------------------------------------------------------- | | Name | Text | Display name for this connection. | | Alias | Text (multi-line) | Optional alias or notes about the connection. | | Is active | Toggle | Whether the connection is enabled. Disable to pause without deleting. | | Db read only | Toggle | Restrict the connection to read-only operations. | | Access type | Select | Read-only, write-only, or read-write access level for this connection. | ### Connection [Section titled “Connection”](#connection) | Field | Type | Description | | ---------- | ------ | ---------------------------------------------- | | Db host | Text | Hostname or IP address of the database server. | | Db port | Number | Port number for the database connection. | | Db catalog | Text | Database, catalog, or schema to connect to.
| ### Authentication [Section titled “Authentication”](#authentication) | Field | Type | Description | | ----------- | -------- | ------------------------------------------------------------- | | Db user | Text | Username for database authentication. | | Use sso | Toggle | Authenticate via single sign-on instead of username/password. | | Db password | Password | Password for database authentication. | ### SSL / TLS [Section titled “SSL / TLS”](#ssl--tls) | Field | Type | Description | | -------------------- | ----------------- | ----------------------------------------------------------------------- | | Use ssl | Toggle | Encrypt the connection with SSL/TLS. | | Ssl mode | Select | SSL verification mode (e.g., disable, require, verify-ca, verify-full). | | Ssl auth client cert | Text (multi-line) | Client certificate (PEM) for mutual TLS authentication. | | Ssl auth client key | Text (multi-line) | Client private key (PEM) for mutual TLS authentication. | | Ssl auth root cert | Text (multi-line) | Root CA certificate (PEM) for verifying the server’s cert. | | Ssl auth cert revoke | Text (multi-line) | Certificate revocation list, if your environment uses one. | ### SSH Tunnel [Section titled “SSH Tunnel”](#ssh-tunnel) | Field | Type | Description | | --------------- | ----------------- | ----------------------------------------------------------------------- | | Use ssh | Toggle | Tunnel the connection through an SSH bastion. | | Ssh host | Text | SSH bastion hostname. | | Ssh port | Number | SSH bastion port (default 22). | | Ssh user | Text | SSH bastion username. | | Ssh password | Password | SSH bastion password (if password auth is used). | | Use ssh cert | Toggle | Authenticate to the SSH bastion with a private key instead of password. | | Ssh private key | Text (multi-line) | SSH private key (PEM) for bastion authentication. | | Ssh host key | Text (multi-line) | Expected SSH host key for bastion fingerprint verification. 
| # Apache Doris > Configure an Apache Doris database connection in PlaidCloud to run real-time analytical queries on large-scale data sets. **Apache Doris** is a high-performance MPP analytical database and the project from which StarRocks was forked. Use this connector to query Doris deployments from PlaidCloud workflows. Connection uses the MySQL wire protocol with standard username/password authentication. ## Upstream Documentation [Section titled “Upstream Documentation”](#upstream-documentation) See the [Apache Doris documentation](https://doris.apache.org/docs/4.x/gettingStarted/what-is-apache-doris) and the [Apache Doris project homepage](https://doris.apache.org/). ## Setup [Section titled “Setup”](#setup) This connector uses a vendor-specific authentication flow and is configured directly from the **Connections** screen in your workspace. The configuration fields shown depend on the credentials your tenant administrator has provisioned for the integration. See the upstream [Apache Doris documentation](https://doris.apache.org/docs/) for the latest setup specifics. If you need help setting up this connector for your tenant, contact your account team — connector-specific credentials, environment URLs, and any required pre-provisioning typically need to be coordinated with PlaidCloud support. # Exasol > Configure an Exasol database connection in PlaidCloud to run high-performance analytical queries and integrate your data. **Exasol** is an in-memory analytical database optimized for fast SQL over large datasets. Use this connector to read and write Exasol tables from PlaidCloud workflows. Connection uses standard username/password authentication with optional SSL. ## Upstream Documentation [Section titled “Upstream Documentation”](#upstream-documentation) [Exasol documentation](https://docs.exasol.com/home.htm). ## Configuration [Section titled “Configuration”](#configuration) These fields appear when creating or editing this connection.
Required vs optional depends on the authentication options you enable. ### Identification [Section titled “Identification”](#identification) | Field | Type | Description | | ------------ | ----------------- | ---------------------------------------------------------------------- | | Name | Text | Display name for this connection. | | Alias | Text (multi-line) | Optional alias or notes about the connection. | | Is active | Toggle | Whether the connection is enabled. Disable to pause without deleting. | | Db read only | Toggle | Restrict the connection to read-only operations. | | Access type | Select | Read-only, write-only, or read-write access level for this connection. | ### Connection [Section titled “Connection”](#connection) | Field | Type | Description | | ---------- | ------ | ---------------------------------------------- | | Db host | Text | Hostname or IP address of the database server. | | Db port | Number | Port number for the database connection. | | Db catalog | Text | Database, catalog, or schema to connect to. | ### Authentication [Section titled “Authentication”](#authentication) | Field | Type | Description | | ----------- | -------- | ------------------------------------------------------------- | | Db user | Text | Username for database authentication. | | Use sso | Toggle | Authenticate via single sign-on instead of username/password. | | Db password | Password | Password for database authentication. | ### SSL / TLS [Section titled “SSL / TLS”](#ssl--tls) | Field | Type | Description | | -------------------- | ----------------- | ----------------------------------------------------------------------- | | Use ssl | Toggle | Encrypt the connection with SSL/TLS. | | Ssl mode | Select | SSL verification mode (e.g., disable, require, verify-ca, verify-full). | | Ssl auth client cert | Text (multi-line) | Client certificate (PEM) for mutual TLS authentication. 
| | Ssl auth client key | Text (multi-line) | Client private key (PEM) for mutual TLS authentication. | | Ssl auth root cert | Text (multi-line) | Root CA certificate (PEM) for verifying the server’s cert. | | Ssl auth cert revoke | Text (multi-line) | Certificate revocation list, if your environment uses one. | ### SSH Tunnel [Section titled “SSH Tunnel”](#ssh-tunnel) | Field | Type | Description | | --------------- | ----------------- | ----------------------------------------------------------------------- | | Use ssh | Toggle | Tunnel the connection through an SSH bastion. | | Ssh host | Text | SSH bastion hostname. | | Ssh port | Number | SSH bastion port (default 22). | | Ssh user | Text | SSH bastion username. | | Ssh password | Password | SSH bastion password (if password auth is used). | | Use ssh cert | Toggle | Authenticate to the SSH bastion with a private key instead of password. | | Ssh private key | Text (multi-line) | SSH private key (PEM) for bastion authentication. | | Ssh host key | Text (multi-line) | Expected SSH host key for bastion fingerprint verification. | # Greenplum > Set up a Greenplum database connection in PlaidCloud to query, import, and export data with your Greenplum data warehouse. **Greenplum** is a massively parallel PostgreSQL-derived analytical database (originally Pivotal, now VMware Tanzu). Use this connector to read and write Greenplum tables from PlaidCloud workflows. The wire protocol is PostgreSQL-compatible, so most PostgreSQL tooling considerations apply. ## Upstream Documentation [Section titled “Upstream Documentation”](#upstream-documentation) [The Greenplum documentation](https://techdocs.broadcom.com/us/en/vmware-tanzu/data-solutions/tanzu-greenplum/7/greenplum-database/landing-index.html). ## Configuration [Section titled “Configuration”](#configuration) These fields appear when creating or editing this connection. Required vs optional depends on the authentication options you enable. 
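Because Greenplum speaks the PostgreSQL wire protocol, the Connection, Authentication, and SSL fields in the tables that follow line up with the parts of a standard PostgreSQL-style connection URI. The sketch below is illustrative only: the `GreenplumFields` class and the `to_libpq_uri` helper are hypothetical names, not part of PlaidCloud, and how PlaidCloud assembles the connection internally is not documented here. The port default of 5432 is the usual PostgreSQL default and is assumed for the example.

```python
from dataclasses import dataclass
from urllib.parse import quote, urlencode


@dataclass
class GreenplumFields:
    """Hypothetical container mirroring the form fields (not a PlaidCloud API)."""
    db_host: str
    db_catalog: str
    db_user: str
    db_password: str
    db_port: int = 5432        # PostgreSQL default; assumed for this example
    use_ssl: bool = False
    ssl_mode: str = "require"  # disable, require, verify-ca, or verify-full


def to_libpq_uri(f: GreenplumFields) -> str:
    """Build a PostgreSQL-style URI from the field values."""
    # Percent-encode credentials so characters like '@' or ':' stay unambiguous.
    user = quote(f.db_user, safe="")
    password = quote(f.db_password, safe="")
    database = quote(f.db_catalog, safe="")
    uri = f"postgresql://{user}:{password}@{f.db_host}:{f.db_port}/{database}"
    # The SSL mode is only meaningful when the "Use ssl" toggle is on.
    if f.use_ssl:
        uri += "?" + urlencode({"sslmode": f.ssl_mode})
    return uri


fields = GreenplumFields(
    db_host="gp.example.internal",
    db_catalog="analytics",
    db_user="report_user",
    db_password="p@ss:word",
    use_ssl=True,
    ssl_mode="verify-full",
)
print(to_libpq_uri(fields))
# postgresql://report_user:p%40ss%3Aword@gp.example.internal:5432/analytics?sslmode=verify-full
```

Reading the fields this way can help when testing the same credentials with other PostgreSQL tooling before entering them in the Connections screen.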
### Identification [Section titled “Identification”](#identification) | Field | Type | Description | | ------------ | ----------------- | ---------------------------------------------------------------------- | | Name | Text | Display name for this connection. | | Alias | Text (multi-line) | Optional alias or notes about the connection. | | Is active | Toggle | Whether the connection is enabled. Disable to pause without deleting. | | Db read only | Toggle | Restrict the connection to read-only operations. | | Access type | Select | Read-only, write-only, or read-write access level for this connection. | ### Connection [Section titled “Connection”](#connection) | Field | Type | Description | | ---------- | ------ | ---------------------------------------------- | | Db host | Text | Hostname or IP address of the database server. | | Db port | Number | Port number for the database connection. | | Db catalog | Text | Database, catalog, or schema to connect to. | ### Authentication [Section titled “Authentication”](#authentication) | Field | Type | Description | | ----------- | -------- | ------------------------------------------------------------- | | Db user | Text | Username for database authentication. | | Use sso | Toggle | Authenticate via single sign-on instead of username/password. | | Db password | Password | Password for database authentication. | ### SSL / TLS [Section titled “SSL / TLS”](#ssl--tls) | Field | Type | Description | | -------------------- | ----------------- | ----------------------------------------------------------------------- | | Use ssl | Toggle | Encrypt the connection with SSL/TLS. | | Ssl mode | Select | SSL verification mode (e.g., disable, require, verify-ca, verify-full). | | Ssl auth client cert | Text (multi-line) | Client certificate (PEM) for mutual TLS authentication. | | Ssl auth client key | Text (multi-line) | Client private key (PEM) for mutual TLS authentication. 
| | Ssl auth root cert | Text (multi-line) | Root CA certificate (PEM) for verifying the server’s cert. | | Ssl auth cert revoke | Text (multi-line) | Certificate revocation list, if your environment uses one. | ### SSH Tunnel [Section titled “SSH Tunnel”](#ssh-tunnel) | Field | Type | Description | | --------------- | ----------------- | ----------------------------------------------------------------------- | | Use ssh | Toggle | Tunnel the connection through an SSH bastion. | | Ssh host | Text | SSH bastion hostname. | | Ssh port | Number | SSH bastion port (default 22). | | Ssh user | Text | SSH bastion username. | | Ssh password | Password | SSH bastion password (if password auth is used). | | Use ssh cert | Toggle | Authenticate to the SSH bastion with a private key instead of password. | | Ssh private key | Text (multi-line) | SSH private key (PEM) for bastion authentication. | | Ssh host key | Text (multi-line) | Expected SSH host key for bastion fingerprint verification. | # Apache Hive > Set up an Apache Hive data lake connection in PlaidCloud to query and integrate large-scale data stored in Hadoop ecosystems. **Apache Hive** is the SQL layer over Hadoop-style distributed storage, common in older data lake deployments. Use this connector to read and write Hive tables. Authentication varies by deployment — common modes are LDAP, Kerberos, or no-auth on internal networks; check your Hive metastore configuration. ## Upstream Documentation [Section titled “Upstream Documentation”](#upstream-documentation) [Apache Hive documentation](https://hive.apache.org/docs/latest/). ## Configuration [Section titled “Configuration”](#configuration) These fields appear when creating or editing this connection. Required vs optional depends on the authentication options you enable. 
### Identification

| Field | Type | Description |
| ------------ | ----------------- | ---------------------------------------------------------------------- |
| Name | Text | Display name for this connection. |
| Alias | Text (multi-line) | Optional alias or notes about the connection. |
| Is active | Toggle | Whether the connection is enabled. Disable to pause without deleting. |
| Db read only | Toggle | Restrict the connection to read-only operations. |
| Access type | Select | Read-only, write-only, or read-write access level for this connection. |

### Connection

| Field | Type | Description |
| ---------- | ------ | ---------------------------------------------- |
| Db host | Text | Hostname or IP address of the database server. |
| Db port | Number | Port number for the database connection. |
| Db catalog | Text | Database, catalog, or schema to connect to. |

### Authentication

| Field | Type | Description |
| ----------- | -------- | ------------------------------------------------------------- |
| Db user | Text | Username for database authentication. |
| Use sso | Toggle | Authenticate via single sign-on instead of username/password. |
| Db password | Password | Password for database authentication. |

### SSL / TLS

| Field | Type | Description |
| -------------------- | ----------------- | ----------------------------------------------------------------------- |
| Use ssl | Toggle | Encrypt the connection with SSL/TLS. |
| Ssl mode | Select | SSL verification mode (e.g., disable, require, verify-ca, verify-full). |
| Ssl auth client cert | Text (multi-line) | Client certificate (PEM) for mutual TLS authentication. |
| Ssl auth client key | Text (multi-line) | Client private key (PEM) for mutual TLS authentication. |
| Ssl auth root cert | Text (multi-line) | Root CA certificate (PEM) for verifying the server’s cert. |
| Ssl auth cert revoke | Text (multi-line) | Certificate revocation list, if your environment uses one. |

### SSH Tunnel

| Field | Type | Description |
| --------------- | ----------------- | ----------------------------------------------------------------------- |
| Use ssh | Toggle | Tunnel the connection through an SSH bastion. |
| Ssh host | Text | SSH bastion hostname. |
| Ssh port | Number | SSH bastion port (default 22). |
| Ssh user | Text | SSH bastion username. |
| Ssh password | Password | SSH bastion password (if password auth is used). |
| Use ssh cert | Toggle | Authenticate to the SSH bastion with a private key instead of a password. |
| Ssh private key | Text (multi-line) | SSH private key (PEM) for bastion authentication. |
| Ssh host key | Text (multi-line) | Expected SSH host key for bastion fingerprint verification. |

# IBM DB2

> Configure an IBM DB2 database connection in PlaidCloud to query, import, and export data with your DB2 database instances.

**IBM DB2** is IBM’s enterprise database, common in mainframe and mid-range environments. Use this connector to read and write DB2 tables from PlaidCloud workflows. The DB2 listener must be reachable over the network; for mainframe deployments, this typically means an SSH or VPN tunnel.

## Upstream Documentation

[The IBM DB2 documentation](https://www.ibm.com/support/pages/db2-database-product-documentation).

## Configuration

These fields appear when creating or editing this connection. Which fields are required depends on the authentication options you enable.
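Because the database listener has to be reachable over the network, a quick TCP check can rule out firewall problems before you fill in the fields below. The sketch uses only the Python standard library, and `can_reach` is a hypothetical helper name. It only confirms that a port accepts connections from the machine where you run it. It does not prove that PlaidCloud itself can reach the host, and it does not test authentication.

```python
import socket


def can_reach(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port opens within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False
```

Run it from a host on the same network path the connection will use (for example, the SSH bastion), passing the values you plan to enter as Db host and Db port.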
### Identification

| Field | Type | Description |
| ------------ | ----------------- | ---------------------------------------------------------------------- |
| Name | Text | Display name for this connection. |
| Alias | Text (multi-line) | Optional alias or notes about the connection. |
| Is active | Toggle | Whether the connection is enabled. Disable to pause without deleting. |
| Db read only | Toggle | Restrict the connection to read-only operations. |
| Access type | Select | Read-only, write-only, or read-write access level for this connection. |

### Connection

| Field | Type | Description |
| ---------- | ------ | ---------------------------------------------- |
| Db host | Text | Hostname or IP address of the database server. |
| Db port | Number | Port number for the database connection. |
| Db catalog | Text | Database, catalog, or schema to connect to. |

### Authentication

| Field | Type | Description |
| ----------- | -------- | ------------------------------------------------------------- |
| Db user | Text | Username for database authentication. |
| Use sso | Toggle | Authenticate via single sign-on instead of username/password. |
| Db password | Password | Password for database authentication. |

### SSL / TLS

| Field | Type | Description |
| -------------------- | ----------------- | ----------------------------------------------------------------------- |
| Use ssl | Toggle | Encrypt the connection with SSL/TLS. |
| Ssl mode | Select | SSL verification mode (e.g., disable, require, verify-ca, verify-full). |
| Ssl auth client cert | Text (multi-line) | Client certificate (PEM) for mutual TLS authentication. |
| Ssl auth client key | Text (multi-line) | Client private key (PEM) for mutual TLS authentication. |
| Ssl auth root cert | Text (multi-line) | Root CA certificate (PEM) for verifying the server’s cert. |
| Ssl auth cert revoke | Text (multi-line) | Certificate revocation list, if your environment uses one. |

### SSH Tunnel

| Field | Type | Description |
| --------------- | ----------------- | ----------------------------------------------------------------------- |
| Use ssh | Toggle | Tunnel the connection through an SSH bastion. |
| Ssh host | Text | SSH bastion hostname. |
| Ssh port | Number | SSH bastion port (default 22). |
| Ssh user | Text | SSH bastion username. |
| Ssh password | Password | SSH bastion password (if password auth is used). |
| Use ssh cert | Toggle | Authenticate to the SSH bastion with a private key instead of a password. |
| Ssh private key | Text (multi-line) | SSH private key (PEM) for bastion authentication. |
| Ssh host key | Text (multi-line) | Expected SSH host key for bastion fingerprint verification. |

# IBM Informix

> Set up an IBM Informix database connection in PlaidCloud to query, import, and export data with your Informix instances.

**IBM Informix** is IBM’s transactional database, common in retail and OLTP deployments. Use this connector to read and write Informix tables from PlaidCloud workflows. Network access to the Informix server is required; SSH tunneling is supported for non-flat networks.

## Upstream Documentation

[IBM Informix documentation](https://www.ibm.com/docs/ar/informix-servers/14.10.0?).

## Configuration

These fields appear when creating or editing this connection. Which fields are required depends on the authentication options you enable.
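The SSL / TLS fields below include an Ssl mode whose example values (disable, require, verify-ca, verify-full) resemble the `sslmode` names used by PostgreSQL's libpq. This page does not say how PlaidCloud interprets each mode for a given connector. The sketch assumes libpq-like semantics and shows them as Python `ssl` settings; `context_for_ssl_mode` is a hypothetical helper, not part of any PlaidCloud interface.

```python
import ssl
from typing import Optional


def context_for_ssl_mode(mode: str,
                         root_cert_pem: Optional[str] = None) -> Optional[ssl.SSLContext]:
    """Build a TLS context for an sslmode-style value (assumed libpq-like meaning)."""
    if mode == "disable":
        return None                       # no TLS at all
    if mode not in ("require", "verify-ca", "verify-full"):
        raise ValueError(f"unknown SSL mode: {mode}")
    # Starts strict: certificate required and hostname checked.
    ctx = ssl.create_default_context(cadata=root_cert_pem)
    if mode == "require":
        # Encrypt only: skip certificate and hostname verification.
        ctx.check_hostname = False        # must be cleared before CERT_NONE
        ctx.verify_mode = ssl.CERT_NONE
    elif mode == "verify-ca":
        # Verify the certificate chain but not the hostname.
        ctx.check_hostname = False
    return ctx                            # verify-full keeps both checks on
```

Under this reading, only `verify-full` protects against a server presenting a valid certificate issued for a different hostname. That is the usual reason to supply the root CA certificate and choose the strictest mode your environment supports.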
### Identification

| Field | Type | Description |
| ------------ | ----------------- | ---------------------------------------------------------------------- |
| Name | Text | Display name for this connection. |
| Alias | Text (multi-line) | Optional alias or notes about the connection. |
| Is active | Toggle | Whether the connection is enabled. Disable to pause without deleting. |
| Db read only | Toggle | Restrict the connection to read-only operations. |
| Access type | Select | Read-only, write-only, or read-write access level for this connection. |

### Connection

| Field | Type | Description |
| ---------- | ------ | ---------------------------------------------- |
| Db host | Text | Hostname or IP address of the database server. |
| Db port | Number | Port number for the database connection. |
| Db catalog | Text | Database, catalog, or schema to connect to. |

### Authentication

| Field | Type | Description |
| ----------- | -------- | ------------------------------------------------------------- |
| Db user | Text | Username for database authentication. |
| Use sso | Toggle | Authenticate via single sign-on instead of username/password. |
| Db password | Password | Password for database authentication. |

### SSL / TLS

| Field | Type | Description |
| -------------------- | ----------------- | ----------------------------------------------------------------------- |
| Use ssl | Toggle | Encrypt the connection with SSL/TLS. |
| Ssl mode | Select | SSL verification mode (e.g., disable, require, verify-ca, verify-full). |
| Ssl auth client cert | Text (multi-line) | Client certificate (PEM) for mutual TLS authentication. |
| Ssl auth client key | Text (multi-line) | Client private key (PEM) for mutual TLS authentication. |
| Ssl auth root cert | Text (multi-line) | Root CA certificate (PEM) for verifying the server’s cert. |
| Ssl auth cert revoke | Text (multi-line) | Certificate revocation list, if your environment uses one. |

### SSH Tunnel

| Field | Type | Description |
| --------------- | ----------------- | ----------------------------------------------------------------------- |
| Use ssh | Toggle | Tunnel the connection through an SSH bastion. |
| Ssh host | Text | SSH bastion hostname. |
| Ssh port | Number | SSH bastion port (default 22). |
| Ssh user | Text | SSH bastion username. |
| Ssh password | Password | SSH bastion password (if password auth is used). |
| Use ssh cert | Toggle | Authenticate to the SSH bastion with a private key instead of a password. |
| Ssh private key | Text (multi-line) | SSH private key (PEM) for bastion authentication. |
| Ssh host key | Text (multi-line) | Expected SSH host key for bastion fingerprint verification. |

# Microsoft Fabric

> Configure a Microsoft Fabric connection in PlaidCloud to integrate analytics, data warehousing, and lakehouse capabilities.

**Microsoft Fabric** combines Power BI, Synapse, and Data Factory into a unified analytics platform. Use this connector to access Fabric warehouses and lakehouses as relational sources from PlaidCloud workflows. Authentication is through a SQL Server-compatible endpoint plus your Microsoft tenant credentials.

## Upstream Documentation

[The Microsoft Fabric documentation](https://learn.microsoft.com/en-us/fabric/).

## Configuration

These fields appear when creating or editing this connection. Which fields are required depends on the authentication options you enable.
### Identification

| Field | Type | Description |
| ------------ | ----------------- | ---------------------------------------------------------------------- |
| Name | Text | Display name for this connection. |
| Alias | Text (multi-line) | Optional alias or notes about the connection. |
| Is active | Toggle | Whether the connection is enabled. Disable to pause without deleting. |
| Db read only | Toggle | Restrict the connection to read-only operations. |
| Access type | Select | Read-only, write-only, or read-write access level for this connection. |

### Connection

| Field | Type | Description |
| ---------- | ------ | ---------------------------------------------- |
| Db host | Text | Hostname or IP address of the database server. |
| Db port | Number | Port number for the database connection. |
| Db catalog | Text | Database, catalog, or schema to connect to. |

### Authentication

| Field | Type | Description |
| ----------- | -------- | ------------------------------------- |
| Db user | Text | Username for database authentication. |
| Db password | Password | Password for database authentication. |

### SSL / TLS

| Field | Type | Description |
| -------------------- | ----------------- | ----------------------------------------------------------------------- |
| Use ssl | Toggle | Encrypt the connection with SSL/TLS. |
| Ssl mode | Select | SSL verification mode (e.g., disable, require, verify-ca, verify-full). |
| Ssl auth client cert | Text (multi-line) | Client certificate (PEM) for mutual TLS authentication. |
| Ssl auth client key | Text (multi-line) | Client private key (PEM) for mutual TLS authentication. |
| Ssl auth root cert | Text (multi-line) | Root CA certificate (PEM) for verifying the server’s cert. |
| Ssl auth cert revoke | Text (multi-line) | Certificate revocation list, if your environment uses one. |

### SSH Tunnel

| Field | Type | Description |
| --------------- | ----------------- | ----------------------------------------------------------------------- |
| Use ssh | Toggle | Tunnel the connection through an SSH bastion. |
| Ssh host | Text | SSH bastion hostname. |
| Ssh port | Number | SSH bastion port (default 22). |
| Ssh user | Text | SSH bastion username. |
| Ssh password | Password | SSH bastion password (if password auth is used). |
| Use ssh cert | Toggle | Authenticate to the SSH bastion with a private key instead of a password. |
| Ssh private key | Text (multi-line) | SSH private key (PEM) for bastion authentication. |
| Ssh host key | Text (multi-line) | Expected SSH host key for bastion fingerprint verification. |

### Other

| Field | Type | Description |
| ----------- | ------ | ----------- |
| Trust certs | Toggle | — |
| Driver type | Select | — |
| User auth | Toggle | — |

# Microsoft SQL Server

> Configure a Microsoft SQL Server connection in PlaidCloud to query, import, and export data with your SQL Server databases.

**Microsoft SQL Server** is the relational database commonly bundled with on-premises Microsoft enterprise stacks. Use this connector to read and write SQL Server tables. The connector supports both SQL Server Authentication (username/password) and integrated authentication; SSL and SSH tunneling are available for non-flat-network deployments.

## Upstream Documentation

[Microsoft SQL Server documentation](https://learn.microsoft.com/en-us/sql/sql-server/).

## Configuration

These fields appear when creating or editing this connection. Which fields are required depends on the authentication options you enable.
### Identification

| Field | Type | Description |
| ------------ | ----------------- | ---------------------------------------------------------------------- |
| Name | Text | Display name for this connection. |
| Alias | Text (multi-line) | Optional alias or notes about the connection. |
| Is active | Toggle | Whether the connection is enabled. Disable to pause without deleting. |
| Db read only | Toggle | Restrict the connection to read-only operations. |
| Access type | Select | Read-only, write-only, or read-write access level for this connection. |

### Connection

| Field | Type | Description |
| ---------- | ------ | ---------------------------------------------- |
| Db host | Text | Hostname or IP address of the database server. |
| Db port | Number | Port number for the database connection. |
| Db catalog | Text | Database, catalog, or schema to connect to. |

### Authentication

| Field | Type | Description |
| ----------- | -------- | ------------------------------------- |
| Db user | Text | Username for database authentication. |
| Db password | Password | Password for database authentication. |

### SSL / TLS

| Field | Type | Description |
| -------------------- | ----------------- | ----------------------------------------------------------------------- |
| Use ssl | Toggle | Encrypt the connection with SSL/TLS. |
| Ssl mode | Select | SSL verification mode (e.g., disable, require, verify-ca, verify-full). |
| Ssl auth client cert | Text (multi-line) | Client certificate (PEM) for mutual TLS authentication. |
| Ssl auth client key | Text (multi-line) | Client private key (PEM) for mutual TLS authentication. |
| Ssl auth root cert | Text (multi-line) | Root CA certificate (PEM) for verifying the server’s cert. |
| Ssl auth cert revoke | Text (multi-line) | Certificate revocation list, if your environment uses one. |

### SSH Tunnel

| Field | Type | Description |
| --------------- | ----------------- | ----------------------------------------------------------------------- |
| Use ssh | Toggle | Tunnel the connection through an SSH bastion. |
| Ssh host | Text | SSH bastion hostname. |
| Ssh port | Number | SSH bastion port (default 22). |
| Ssh user | Text | SSH bastion username. |
| Ssh password | Password | SSH bastion password (if password auth is used). |
| Use ssh cert | Toggle | Authenticate to the SSH bastion with a private key instead of a password. |
| Ssh private key | Text (multi-line) | SSH private key (PEM) for bastion authentication. |
| Ssh host key | Text (multi-line) | Expected SSH host key for bastion fingerprint verification. |

### Other

| Field | Type | Description |
| ----------- | ------ | ----------- |
| Trust certs | Toggle | — |
| Driver type | Select | — |
| User auth | Toggle | — |

# MySQL

> Configure a MySQL database connection in PlaidCloud to query, import, and export data with your MySQL database instances.

**MySQL** is one of the most widely deployed open-source relational databases. Use this connector to query, import, and export data from MySQL instances. The connector also works with MySQL-compatible databases (MariaDB, Aurora MySQL). Because compatibility can vary between these engines, test the connection on a small workload before relying on it in production.

## Upstream Documentation

[MySQL documentation](https://dev.mysql.com/doc/).

## Configuration

These fields appear when creating or editing this connection. Which fields are required depends on the authentication options you enable.
### Identification

| Field | Type | Description |
| ------------ | ----------------- | ---------------------------------------------------------------------- |
| Name | Text | Display name for this connection. |
| Alias | Text (multi-line) | Optional alias or notes about the connection. |
| Is active | Toggle | Whether the connection is enabled. Disable to pause without deleting. |
| Db read only | Toggle | Restrict the connection to read-only operations. |
| Access type | Select | Read-only, write-only, or read-write access level for this connection. |

### Connection

| Field | Type | Description |
| ---------- | ------ | ---------------------------------------------- |
| Db host | Text | Hostname or IP address of the database server. |
| Db port | Number | Port number for the database connection. |
| Db catalog | Text | Database, catalog, or schema to connect to. |

### Authentication

| Field | Type | Description |
| ----------- | -------- | ------------------------------------------------------------- |
| Db user | Text | Username for database authentication. |
| Use sso | Toggle | Authenticate via single sign-on instead of username/password. |
| Db password | Password | Password for database authentication. |

### SSL / TLS

| Field | Type | Description |
| -------------------- | ----------------- | ----------------------------------------------------------------------- |
| Use ssl | Toggle | Encrypt the connection with SSL/TLS. |
| Ssl mode | Select | SSL verification mode (e.g., disable, require, verify-ca, verify-full). |
| Ssl auth client cert | Text (multi-line) | Client certificate (PEM) for mutual TLS authentication. |
| Ssl auth client key | Text (multi-line) | Client private key (PEM) for mutual TLS authentication. |
| Ssl auth root cert | Text (multi-line) | Root CA certificate (PEM) for verifying the server’s cert. |
| Ssl auth cert revoke | Text (multi-line) | Certificate revocation list, if your environment uses one. |

### SSH Tunnel

| Field | Type | Description |
| --------------- | ----------------- | ----------------------------------------------------------------------- |
| Use ssh | Toggle | Tunnel the connection through an SSH bastion. |
| Ssh host | Text | SSH bastion hostname. |
| Ssh port | Number | SSH bastion port (default 22). |
| Ssh user | Text | SSH bastion username. |
| Ssh password | Password | SSH bastion password (if password auth is used). |
| Use ssh cert | Toggle | Authenticate to the SSH bastion with a private key instead of a password. |
| Ssh private key | Text (multi-line) | SSH private key (PEM) for bastion authentication. |
| Ssh host key | Text (multi-line) | Expected SSH host key for bastion fingerprint verification. |

# ODBC

> Set up an ODBC database connection in PlaidCloud to connect to any database system that provides an ODBC driver interface.

**ODBC (Open Database Connectivity)** is a universal database interface that lets PlaidCloud connect to any system that provides an ODBC driver. It is useful when a vendor has no dedicated PlaidCloud connector. The connection string and driver name vary per source; consult the vendor’s ODBC documentation for the right values.

## Upstream Documentation

The ODBC connector requires configuration specific to the target database. Although ODBC is a generic connection type, each database driver may need its own settings. Refer to the ODBC documentation for the target database.

## Configuration

These fields appear when creating or editing this connection. Which fields are required depends on the authentication options you enable.
### Identification

| Field | Type | Description |
| ------------ | ----------------- | ---------------------------------------------------------------------- |
| Name | Text | Display name for this connection. |
| Alias | Text (multi-line) | Optional alias or notes about the connection. |
| Is active | Toggle | Whether the connection is enabled. Disable to pause without deleting. |
| Db read only | Toggle | Restrict the connection to read-only operations. |
| Access type | Select | Read-only, write-only, or read-write access level for this connection. |

### Connection

| Field | Type | Description |
| -------------- | ------ | ---------------------------------------------- |
| Db host | Text | Hostname or IP address of the database server. |
| Db port | Number | Port number for the database connection. |
| Db odbc driver | Select | — |
| Db catalog | Text | Database, catalog, or schema to connect to. |

### Authentication

| Field | Type | Description |
| ----------- | -------- | ------------------------------------------------------------- |
| Db user | Text | Username for database authentication. |
| Use sso | Toggle | Authenticate via single sign-on instead of username/password. |
| Db password | Password | Password for database authentication. |

### SSL / TLS

| Field | Type | Description |
| -------------------- | ----------------- | ----------------------------------------------------------------------- |
| Use ssl | Toggle | Encrypt the connection with SSL/TLS. |
| Ssl mode | Select | SSL verification mode (e.g., disable, require, verify-ca, verify-full). |
| Ssl auth client cert | Text (multi-line) | Client certificate (PEM) for mutual TLS authentication. |
| Ssl auth client key | Text (multi-line) | Client private key (PEM) for mutual TLS authentication. |
| Ssl auth root cert | Text (multi-line) | Root CA certificate (PEM) for verifying the server’s cert. |
| Ssl auth cert revoke | Text (multi-line) | Certificate revocation list, if your environment uses one. |

### SSH Tunnel

| Field | Type | Description |
| --------------- | ----------------- | ----------------------------------------------------------------------- |
| Use ssh | Toggle | Tunnel the connection through an SSH bastion. |
| Ssh host | Text | SSH bastion hostname. |
| Ssh port | Number | SSH bastion port (default 22). |
| Ssh user | Text | SSH bastion username. |
| Ssh password | Password | SSH bastion password (if password auth is used). |
| Use ssh cert | Toggle | Authenticate to the SSH bastion with a private key instead of a password. |
| Ssh private key | Text (multi-line) | SSH private key (PEM) for bastion authentication. |
| Ssh host key | Text (multi-line) | Expected SSH host key for bastion fingerprint verification. |

# Oracle

> Set up an Oracle database connection in PlaidCloud to query, import, and export data with your Oracle database instances.

**Oracle Database** is an enterprise relational database used widely in financial, ERP, and operational systems. Use this connector to read and write Oracle tables. The Oracle TNS listener must be reachable from PlaidCloud, either directly or through an SSH tunnel; coordinate with your DBA on firewall rules before configuring the connection.

## Upstream Documentation

[The Oracle database documentation](https://docs.oracle.com/en/database/).

## Configuration

These fields appear when creating or editing this connection. Which fields are required depends on the authentication options you enable.
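If you route the connection through an SSH tunnel, the Ssh host key field in the SSH Tunnel table below lets you pin the bastion's identity. This page does not specify the exact format PlaidCloud expects in that field. As an aid for confirming with your administrator that you hold the right key, the sketch computes the OpenSSH-style SHA256 fingerprint of a public key line. `ssh_fingerprint` is a hypothetical helper name.

```python
import base64
import hashlib


def ssh_fingerprint(public_key_line: str) -> str:
    """Return the OpenSSH-style SHA256 fingerprint of a public key line."""
    # A public key line has the form: "<key type> <base64 blob> [comment]".
    blob = base64.b64decode(public_key_line.split()[1])
    digest = hashlib.sha256(blob).digest()
    # OpenSSH prints the digest as unpadded base64 with a "SHA256:" prefix.
    return "SHA256:" + base64.b64encode(digest).decode("ascii").rstrip("=")
```

Compare the result with the fingerprint your bastion administrator publishes before trusting the key.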
### Identification

| Field | Type | Description |
| ------------ | ----------------- | ---------------------------------------------------------------------- |
| Name | Text | Display name for this connection. |
| Alias | Text (multi-line) | Optional alias or notes about the connection. |
| Is active | Toggle | Whether the connection is enabled. Disable to pause without deleting. |
| Db read only | Toggle | Restrict the connection to read-only operations. |
| Access type | Select | Read-only, write-only, or read-write access level for this connection. |

### Connection

| Field | Type | Description |
| ---------- | ------ | ---------------------------------------------- |
| Db host | Text | Hostname or IP address of the database server. |
| Db port | Number | Port number for the database connection. |
| Db catalog | Text | Database, catalog, or schema to connect to. |

### Authentication

| Field | Type | Description |
| ----------- | -------- | ------------------------------------------------------------- |
| Db user | Text | Username for database authentication. |
| Use sso | Toggle | Authenticate via single sign-on instead of username/password. |
| Db password | Password | Password for database authentication. |

### SSL / TLS

| Field | Type | Description |
| -------------------- | ----------------- | ----------------------------------------------------------------------- |
| Use ssl | Toggle | Encrypt the connection with SSL/TLS. |
| Ssl mode | Select | SSL verification mode (e.g., disable, require, verify-ca, verify-full). |
| Ssl auth client cert | Text (multi-line) | Client certificate (PEM) for mutual TLS authentication. |
| Ssl auth client key | Text (multi-line) | Client private key (PEM) for mutual TLS authentication. |
| Ssl auth root cert | Text (multi-line) | Root CA certificate (PEM) for verifying the server’s cert. |
| Ssl auth cert revoke | Text (multi-line) | Certificate revocation list, if your environment uses one. |

### SSH Tunnel

| Field | Type | Description |
| --------------- | ----------------- | ----------------------------------------------------------------------- |
| Use ssh | Toggle | Tunnel the connection through an SSH bastion. |
| Ssh host | Text | SSH bastion hostname. |
| Ssh port | Number | SSH bastion port (default 22). |
| Ssh user | Text | SSH bastion username. |
| Ssh password | Password | SSH bastion password (if password auth is used). |
| Use ssh cert | Toggle | Authenticate to the SSH bastion with a private key instead of a password. |
| Ssh private key | Text (multi-line) | SSH private key (PEM) for bastion authentication. |
| Ssh host key | Text (multi-line) | Expected SSH host key for bastion fingerprint verification. |

### Other

| Field | Type | Description |
| --------------- | ------ | ----------- |
| Connection type | Select | — |
| Role | Select | — |
| Service mode | Select | — |
| Service | Text | — |

# PlaidCloud Lakehouse

> Configure the PlaidCloud Lakehouse database connection for high-performance querying and analytics on your lakehouse data.

**PlaidCloud Lakehouse** is the built-in analytical data store inside every PlaidCloud workspace. This connector is primarily used to read from one workspace’s lakehouse into another, or to share data between tenants in multi-tenant deployments. Within a single workspace, project tables are accessible directly without needing this connector.

## Upstream Documentation

The built-in PlaidCloud Lakehouse needs very little configuration. See the [service documentation](https://docs.plaidcloud.com/docs/plaidcloud/analyze/dw/getting-started/).
## Configuration [Section titled “Configuration”](#configuration) These fields appear when creating or editing this connection. Required vs optional depends on the authentication options you enable. ### Identification [Section titled “Identification”](#identification) | Field | Type | Description | | ------------ | ----------------- | ---------------------------------------------------------------------- | | Name | Text | Display name for this connection. | | Alias | Text (multi-line) | Optional alias or notes about the connection. | | Is active | Toggle | Whether the connection is enabled. Disable to pause without deleting. | | Db read only | Toggle | Restrict the connection to read-only operations. | | Access type | Select | Read-only, write-only, or read-write access level for this connection. | ### Connection [Section titled “Connection”](#connection) | Field | Type | Description | | ---------- | ------ | ------------------------------------------- | | Db port | Number | Port number for the database connection. | | Db catalog | Text | Database, catalog, or schema to connect to. | ### Authentication [Section titled “Authentication”](#authentication) | Field | Type | Description | | ----------- | -------- | ------------------------------------------------------------- | | Db user | Text | Username for database authentication. | | Use sso | Toggle | Authenticate via single sign-on instead of username/password. | | Db password | Password | Password for database authentication. | ### SSL / TLS [Section titled “SSL / TLS”](#ssl--tls) | Field | Type | Description | | -------------------- | ----------------- | ----------------------------------------------------------------------- | | Use ssl | Toggle | Encrypt the connection with SSL/TLS. | | Ssl mode | Select | SSL verification mode (e.g., disable, require, verify-ca, verify-full). | | Ssl auth client cert | Text (multi-line) | Client certificate (PEM) for mutual TLS authentication. 
| | Ssl auth client key | Text (multi-line) | Client private key (PEM) for mutual TLS authentication. | | Ssl auth root cert | Text (multi-line) | Root CA certificate (PEM) for verifying the server’s cert. | | Ssl auth cert revoke | Text (multi-line) | Certificate revocation list, if your environment uses one. | ### SSH Tunnel [Section titled “SSH Tunnel”](#ssh-tunnel) | Field | Type | Description | | --------------- | ----------------- | ----------------------------------------------------------------------- | | Use ssh | Toggle | Tunnel the connection through an SSH bastion. | | Ssh host | Text | SSH bastion hostname. | | Ssh port | Number | SSH bastion port (default 22). | | Ssh user | Text | SSH bastion username. | | Ssh password | Password | SSH bastion password (if password auth is used). | | Use ssh cert | Toggle | Authenticate to the SSH bastion with a private key instead of password. | | Ssh private key | Text (multi-line) | SSH private key (PEM) for bastion authentication. | | Ssh host key | Text (multi-line) | Expected SSH host key for bastion fingerprint verification. | ### Other [Section titled “Other”](#other) | Field | Type | Description | | --------- | ------ | ----------- | | Server | Text | — | | Lakehouse | Number | — | # PostgreSQL > Configure a PostgreSQL database connection in PlaidCloud to query, import, and export data with your Postgres instances. **PostgreSQL** is a widely used open-source relational database. Use this connector to query, import, and export data from any PostgreSQL instance — self-hosted, RDS, Cloud SQL, or other managed offerings. The connector supports SSL, SSH tunneling, and SSO authentication for secure connections. ## Upstream Documentation [Section titled “Upstream Documentation”](#upstream-documentation) [PostgreSQL documentation](https://www.postgresql.org/docs/) ## Configuration [Section titled “Configuration”](#configuration) These fields appear when creating or editing this connection.
Required vs optional depends on the authentication options you enable. ### Identification [Section titled “Identification”](#identification) | Field | Type | Description | | ------------ | ----------------- | ---------------------------------------------------------------------- | | Name | Text | Display name for this connection. | | Alias | Text (multi-line) | Optional alias or notes about the connection. | | Is active | Toggle | Whether the connection is enabled. Disable to pause without deleting. | | Db read only | Toggle | Restrict the connection to read-only operations. | | Access type | Select | Read-only, write-only, or read-write access level for this connection. | ### Connection [Section titled “Connection”](#connection) | Field | Type | Description | | ---------- | ------ | ---------------------------------------------- | | Db host | Text | Hostname or IP address of the database server. | | Db port | Number | Port number for the database connection. | | Db catalog | Text | Database, catalog, or schema to connect to. | ### Authentication [Section titled “Authentication”](#authentication) | Field | Type | Description | | ----------- | -------- | ------------------------------------------------------------- | | Db user | Text | Username for database authentication. | | Use sso | Toggle | Authenticate via single sign-on instead of username/password. | | Db password | Password | Password for database authentication. | ### SSL / TLS [Section titled “SSL / TLS”](#ssl--tls) | Field | Type | Description | | -------------------- | ----------------- | ----------------------------------------------------------------------- | | Use ssl | Toggle | Encrypt the connection with SSL/TLS. | | Ssl mode | Select | SSL verification mode (e.g., disable, require, verify-ca, verify-full). | | Ssl auth client cert | Text (multi-line) | Client certificate (PEM) for mutual TLS authentication. 
| | Ssl auth client key | Text (multi-line) | Client private key (PEM) for mutual TLS authentication. | | Ssl auth root cert | Text (multi-line) | Root CA certificate (PEM) for verifying the server’s cert. | | Ssl auth cert revoke | Text (multi-line) | Certificate revocation list, if your environment uses one. | ### SSH Tunnel [Section titled “SSH Tunnel”](#ssh-tunnel) | Field | Type | Description | | --------------- | ----------------- | ----------------------------------------------------------------------- | | Use ssh | Toggle | Tunnel the connection through an SSH bastion. | | Ssh host | Text | SSH bastion hostname. | | Ssh port | Number | SSH bastion port (default 22). | | Ssh user | Text | SSH bastion username. | | Ssh password | Password | SSH bastion password (if password auth is used). | | Use ssh cert | Toggle | Authenticate to the SSH bastion with a private key instead of password. | | Ssh private key | Text (multi-line) | SSH private key (PEM) for bastion authentication. | | Ssh host key | Text (multi-line) | Expected SSH host key for bastion fingerprint verification. | # Presto > Set up a Presto distributed query engine connection in PlaidCloud to run federated queries across multiple data sources. **Presto** is a distributed SQL query engine for federated queries across multiple data sources. Use this connector to query Presto deployments from PlaidCloud workflows. Authentication uses HTTP Basic Auth or Kerberos depending on your deployment. ## Upstream Documentation [Section titled “Upstream Documentation”](#upstream-documentation) [The Presto documentation](https://prestodb.io/docs/current/). ## Configuration [Section titled “Configuration”](#configuration) These fields appear when creating or editing this connection. Required vs optional depends on the authentication options you enable. 
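One way to read "required vs optional depends on the authentication options you enable" is sketched below. The rules are inferred from the field descriptions in the tables that follow (for example, the SSH password applies only when password auth is used). The function name, the snake_case keys, and the base set of always-required fields are assumptions made for illustration, not PlaidCloud's actual validation logic.

```python
def required_fields(options):
    """Return the fields that must be filled in for the enabled toggles.

    `options` maps hypothetical snake_case toggle names (derived from the
    field labels) to booleans. Missing toggles are treated as off.
    """
    # Assumed baseline: every connection needs a name and a server address.
    required = ["name", "db_host", "db_port"]

    # Username/password is only needed when single sign-on is off.
    if not options.get("use_sso", False):
        required += ["db_user", "db_password"]

    # An SSH tunnel needs the bastion address plus one credential type.
    if options.get("use_ssh", False):
        required += ["ssh_host", "ssh_port", "ssh_user"]
        if options.get("use_ssh_cert", False):
            required.append("ssh_private_key")
        else:
            required.append("ssh_password")

    # An encrypted connection needs a verification mode.
    if options.get("use_ssl", False):
        required.append("ssl_mode")

    return required
```

Calling `required_fields({"use_ssh": True, "use_ssh_cert": True})` would list the private key but not the SSH password, which matches the "instead of password" wording in the SSH Tunnel table.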
### Identification [Section titled “Identification”](#identification) | Field | Type | Description | | ------------ | ----------------- | ---------------------------------------------------------------------- | | Name | Text | Display name for this connection. | | Alias | Text (multi-line) | Optional alias or notes about the connection. | | Is active | Toggle | Whether the connection is enabled. Disable to pause without deleting. | | Db read only | Toggle | Restrict the connection to read-only operations. | | Access type | Select | Read-only, write-only, or read-write access level for this connection. | ### Connection [Section titled “Connection”](#connection) | Field | Type | Description | | ---------- | ------ | ---------------------------------------------- | | Db host | Text | Hostname or IP address of the database server. | | Db port | Number | Port number for the database connection. | | Db catalog | Text | Database, catalog, or schema to connect to. | ### Authentication [Section titled “Authentication”](#authentication) | Field | Type | Description | | ----------- | -------- | ------------------------------------------------------------- | | Db user | Text | Username for database authentication. | | Use sso | Toggle | Authenticate via single sign-on instead of username/password. | | Db password | Password | Password for database authentication. | ### SSL / TLS [Section titled “SSL / TLS”](#ssl--tls) | Field | Type | Description | | -------------------- | ----------------- | ----------------------------------------------------------------------- | | Use ssl | Toggle | Encrypt the connection with SSL/TLS. | | Ssl mode | Select | SSL verification mode (e.g., disable, require, verify-ca, verify-full). | | Ssl auth client cert | Text (multi-line) | Client certificate (PEM) for mutual TLS authentication. | | Ssl auth client key | Text (multi-line) | Client private key (PEM) for mutual TLS authentication. 
| | Ssl auth root cert | Text (multi-line) | Root CA certificate (PEM) for verifying the server’s cert. | | Ssl auth cert revoke | Text (multi-line) | Certificate revocation list, if your environment uses one. | ### SSH Tunnel [Section titled “SSH Tunnel”](#ssh-tunnel) | Field | Type | Description | | --------------- | ----------------- | ----------------------------------------------------------------------- | | Use ssh | Toggle | Tunnel the connection through an SSH bastion. | | Ssh host | Text | SSH bastion hostname. | | Ssh port | Number | SSH bastion port (default 22). | | Ssh user | Text | SSH bastion username. | | Ssh password | Password | SSH bastion password (if password auth is used). | | Use ssh cert | Toggle | Authenticate to the SSH bastion with a private key instead of password. | | Ssh private key | Text (multi-line) | SSH private key (PEM) for bastion authentication. | | Ssh host key | Text (multi-line) | Expected SSH host key for bastion fingerprint verification. | # SAP HANA > Set up an SAP HANA database connection in PlaidCloud to query, import, and export data with your HANA in-memory database. **SAP HANA** is SAP’s in-memory column-store database, common alongside SAP S/4HANA, BW/4HANA, and other SAP business applications. Use this connector to read and write HANA tables and views from PlaidCloud workflows. Authentication supports username/password plus optional SSL and SSH tunneling. ## Upstream Documentation [Section titled “Upstream Documentation”](#upstream-documentation) [The SAP HANA documentation](https://help.sap.com/docs/SAP_HANA_PLATFORM). ## Configuration [Section titled “Configuration”](#configuration) These fields appear when creating or editing this connection. Required vs optional depends on the authentication options you enable. 
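The `Ssl mode` values listed in the SSL / TLS table (disable, require, verify-ca, verify-full) use the same names as PostgreSQL's libpq settings. The sketch below shows what those modes conventionally mean, expressed with Python's standard `ssl` module. Whether PlaidCloud applies exactly these semantics to every connector is an assumption, and `context_for_mode` is a hypothetical helper.

```python
import ssl


def context_for_mode(mode, root_cert_pem=None):
    """Build a TLS client context matching a libpq-style SSL mode.

    Returns None for "disable" (no encryption). `root_cert_pem` plays the
    role of the "Ssl auth root cert" field: a PEM-encoded CA certificate.
    """
    if mode == "disable":
        return None

    # PROTOCOL_TLS_CLIENT starts out strict: certificate and hostname checks on.
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)

    if mode == "require":
        # Encrypt, but do not verify who the server is.
        ctx.check_hostname = False
        ctx.verify_mode = ssl.CERT_NONE
        return ctx

    if mode == "verify-ca":
        # Verify the certificate chain, but not the hostname.
        ctx.check_hostname = False
        ctx.verify_mode = ssl.CERT_REQUIRED
    elif mode == "verify-full":
        # Verify the chain and that the certificate matches the hostname.
        ctx.check_hostname = True
        ctx.verify_mode = ssl.CERT_REQUIRED
    else:
        raise ValueError(f"unknown SSL mode: {mode!r}")

    # Trust the supplied root CA if given, otherwise the system defaults.
    if root_cert_pem:
        ctx.load_verify_locations(cadata=root_cert_pem)
    else:
        ctx.load_default_certs()
    return ctx
```

The practical takeaway is that `require` protects against passive eavesdropping only, while `verify-full` is the mode that also guards against a server impersonating your database host.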
### Identification [Section titled “Identification”](#identification) | Field | Type | Description | | ------------ | ----------------- | ---------------------------------------------------------------------- | | Name | Text | Display name for this connection. | | Alias | Text (multi-line) | Optional alias or notes about the connection. | | Is active | Toggle | Whether the connection is enabled. Disable to pause without deleting. | | Db read only | Toggle | Restrict the connection to read-only operations. | | Access type | Select | Read-only, write-only, or read-write access level for this connection. | ### Connection [Section titled “Connection”](#connection) | Field | Type | Description | | ---------- | ------ | ---------------------------------------------- | | Db host | Text | Hostname or IP address of the database server. | | Db port | Number | Port number for the database connection. | | Db catalog | Text | Database, catalog, or schema to connect to. | ### Authentication [Section titled “Authentication”](#authentication) | Field | Type | Description | | ----------- | -------- | ------------------------------------------------------------- | | Db user | Text | Username for database authentication. | | Use sso | Toggle | Authenticate via single sign-on instead of username/password. | | Db password | Password | Password for database authentication. | ### SSL / TLS [Section titled “SSL / TLS”](#ssl--tls) | Field | Type | Description | | -------------------- | ----------------- | ----------------------------------------------------------------------- | | Use ssl | Toggle | Encrypt the connection with SSL/TLS. | | Ssl mode | Select | SSL verification mode (e.g., disable, require, verify-ca, verify-full). | | Ssl auth client cert | Text (multi-line) | Client certificate (PEM) for mutual TLS authentication. | | Ssl auth client key | Text (multi-line) | Client private key (PEM) for mutual TLS authentication. 
| | Ssl auth root cert | Text (multi-line) | Root CA certificate (PEM) for verifying the server’s cert. | | Ssl auth cert revoke | Text (multi-line) | Certificate revocation list, if your environment uses one. | ### SSH Tunnel [Section titled “SSH Tunnel”](#ssh-tunnel) | Field | Type | Description | | --------------- | ----------------- | ----------------------------------------------------------------------- | | Use ssh | Toggle | Tunnel the connection through an SSH bastion. | | Ssh host | Text | SSH bastion hostname. | | Ssh port | Number | SSH bastion port (default 22). | | Ssh user | Text | SSH bastion username. | | Ssh password | Password | SSH bastion password (if password auth is used). | | Use ssh cert | Toggle | Authenticate to the SSH bastion with a private key instead of password. | | Ssh private key | Text (multi-line) | SSH private key (PEM) for bastion authentication. | | Ssh host key | Text (multi-line) | Expected SSH host key for bastion fingerprint verification. | # Snowflake > Set up a Snowflake database connection in PlaidCloud to query, import, and export data with your Snowflake data warehouse. **Snowflake** is a cloud-native data warehouse with separate storage and compute. Use this connector to read and write Snowflake tables. Authentication supports username/password, key-pair, OAuth, and SSO; specify the warehouse (compute pool), database, role, and schema you want PlaidCloud to act under. ## Upstream Documentation [Section titled “Upstream Documentation”](#upstream-documentation) [The Snowflake documentation](https://docs.snowflake.com/). ## Configuration [Section titled “Configuration”](#configuration) These fields appear when creating or editing this connection. Required vs optional depends on the authentication options you enable. 
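The introduction above says you specify the warehouse, database, role, and schema that PlaidCloud acts under. On the Snowflake side, that session context corresponds to `USE ROLE`, `USE WAREHOUSE`, `USE DATABASE`, and `USE SCHEMA` statements. The sketch below builds those statements from plain values. The helper name is hypothetical, and whether PlaidCloud issues these statements or passes the values as connection parameters is not stated in this documentation.

```python
import re

# Snowflake unquoted identifiers: a letter or underscore, followed by
# letters, digits, underscores, or dollar signs.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_$]*$")


def session_context_statements(role=None, warehouse=None, database=None, schema=None):
    """Return the USE statements that set a Snowflake session context."""
    statements = []
    # Role goes first so later objects are resolved with its privileges.
    for keyword, value in (
        ("ROLE", role),
        ("WAREHOUSE", warehouse),
        ("DATABASE", database),
        ("SCHEMA", schema),
    ):
        if value is None:
            continue
        # Reject anything that is not a simple unquoted identifier.
        if not _IDENTIFIER.match(value):
            raise ValueError(f"not a simple identifier: {value!r}")
        statements.append(f"USE {keyword} {value}")
    return statements
```

Validating the identifiers before interpolating them keeps the sketch safe to adapt; quoted identifiers with spaces or mixed case would need extra handling that is left out here.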
### Identification [Section titled “Identification”](#identification) | Field | Type | Description | | ------------ | ----------------- | ---------------------------------------------------------------------- | | Name | Text | Display name for this connection. | | Alias | Text (multi-line) | Optional alias or notes about the connection. | | Is active | Toggle | Whether the connection is enabled. Disable to pause without deleting. | | Db read only | Toggle | Restrict the connection to read-only operations. | | Access type | Select | Read-only, write-only, or read-write access level for this connection. | ### Connection [Section titled “Connection”](#connection) | Field | Type | Description | | ---------- | ---- | ------------------------------------------- | | Db catalog | Text | Database, catalog, or schema to connect to. | ### Authentication [Section titled “Authentication”](#authentication) | Field | Type | Description | | ----------- | -------- | ------------------------------------------------------------- | | Db user | Text | Username for database authentication. | | Use sso | Toggle | Authenticate via single sign-on instead of username/password. | | Db password | Password | Password for database authentication. | ### SSL / TLS [Section titled “SSL / TLS”](#ssl--tls) | Field | Type | Description | | -------------------- | ----------------- | ----------------------------------------------------------------------- | | Use ssl | Toggle | Encrypt the connection with SSL/TLS. | | Ssl mode | Select | SSL verification mode (e.g., disable, require, verify-ca, verify-full). | | Ssl auth client cert | Text (multi-line) | Client certificate (PEM) for mutual TLS authentication. | | Ssl auth client key | Text (multi-line) | Client private key (PEM) for mutual TLS authentication. | | Ssl auth root cert | Text (multi-line) | Root CA certificate (PEM) for verifying the server’s cert. 
| | Ssl auth cert revoke | Text (multi-line) | Certificate revocation list, if your environment uses one. | ### SSH Tunnel [Section titled “SSH Tunnel”](#ssh-tunnel) | Field | Type | Description | | --------------- | ----------------- | ----------------------------------------------------------------------- | | Use ssh | Toggle | Tunnel the connection through an SSH bastion. | | Ssh host | Text | SSH bastion hostname. | | Ssh port | Number | SSH bastion port (default 22). | | Ssh user | Text | SSH bastion username. | | Ssh password | Password | SSH bastion password (if password auth is used). | | Use ssh cert | Toggle | Authenticate to the SSH bastion with a private key instead of password. | | Ssh private key | Text (multi-line) | SSH private key (PEM) for bastion authentication. | | Ssh host key | Text (multi-line) | Expected SSH host key for bastion fingerprint verification. | ### Other [Section titled “Other”](#other) | Field | Type | Description | | --------- | ---- | ----------- | | Server | Text | — | | Warehouse | Text | — | # Apache Spark > Set up an Apache Spark database connection in PlaidCloud to run distributed queries and integrate big data into workflows. **Apache Spark** is the distributed compute engine commonly used for ETL over large datasets. Use this connector to read and write data through Spark SQL endpoints (typically Spark Thrift Server). For Databricks-managed Spark, prefer the [Azure Databricks](../azure-databricks/) connector. ## Upstream Documentation [Section titled “Upstream Documentation”](#upstream-documentation) [The Apache Spark documentation](https://spark.apache.org/documentation.html). See also the [Apache project](https://spark.apache.org/) site. ## Setup [Section titled “Setup”](#setup) This connector uses a vendor-specific authentication flow and is configured directly from the **Connections** screen in your workspace. The configuration fields shown depend on the credentials your tenant administrator has provisioned for the integration.
See the upstream [Apache Spark documentation](https://spark.apache.org/docs/latest/) for the latest setup specifics. If you need help setting up this connector for your tenant, contact your account team — connector-specific credentials, environment URLs, and any required pre-provisioning typically need to be coordinated with PlaidCloud support. # StarRocks > Configure a StarRocks database connection in PlaidCloud to run high-performance analytical queries on large-scale data sets. **StarRocks** is the high-performance analytical database that powers PlaidCloud’s Lakehouse v2 (tracking StarRocks 4.1). Use this connector to query standalone StarRocks deployments — for the in-product lakehouse, you don’t need this connector. ## Upstream Documentation [Section titled “Upstream Documentation”](#upstream-documentation) [StarRocks documentation](https://docs.starrocks.io/docs/introduction/StarRocks_intro/). ## Setup [Section titled “Setup”](#setup) This connector uses a vendor-specific authentication flow and is configured directly from the **Connections** screen in your workspace. The configuration fields shown depend on the credentials your tenant administrator has provisioned for the integration. See the upstream [StarRocks documentation](https://docs.starrocks.io/) for the latest setup specifics. If you need help setting up this connector for your tenant, contact your account team — connector-specific credentials, environment URLs, and any required pre-provisioning typically need to be coordinated with PlaidCloud support. # Trino > Set up a Trino distributed query engine connection in PlaidCloud to run federated queries across multiple data sources. **Trino** (formerly PrestoSQL) is the distributed SQL query engine commonly used over data lakes. Use this connector to query Trino deployments from PlaidCloud workflows. Authentication uses HTTP Basic Auth or JWT; the catalog and schema you target determine which underlying data source the query hits.
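To make the Trino authentication and targeting options concrete, the sketch below builds the HTTP headers a Trino client sends: an `Authorization` header (Basic or Bearer for a JWT) plus the `X-Trino-User`, `X-Trino-Catalog`, and `X-Trino-Schema` headers from Trino's client protocol. The helper name and its parameters are hypothetical, and how PlaidCloud assembles its own requests is not documented here.

```python
import base64


def trino_headers(user, catalog, schema, password=None, jwt=None):
    """Return request headers for a Trino statement request.

    Pass `jwt` for token authentication or `password` for HTTP Basic Auth.
    If both are given, the JWT takes precedence in this sketch.
    """
    headers = {
        "X-Trino-User": user,
        # Catalog and schema select which underlying data source is queried.
        "X-Trino-Catalog": catalog,
        "X-Trino-Schema": schema,
    }
    if jwt is not None:
        headers["Authorization"] = f"Bearer {jwt}"
    elif password is not None:
        # Basic auth is base64("user:password").
        raw = f"{user}:{password}".encode("utf-8")
        headers["Authorization"] = "Basic " + base64.b64encode(raw).decode("ascii")
    return headers
```

Changing only the catalog and schema values, with the same credentials, is what redirects an otherwise identical query to a different underlying source.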
## Upstream Documentation [Section titled “Upstream Documentation”](#upstream-documentation) [The Trino documentation](https://trino.io/docs/current/index.html). ## Setup [Section titled “Setup”](#setup) This connector uses a vendor-specific authentication flow and is configured directly from the **Connections** screen in your workspace. The configuration fields shown depend on the credentials your tenant administrator has provisioned for the integration. See the upstream [Trino documentation](https://trino.io/docs/current/) for the latest setup specifics. If you need help setting up this connector for your tenant, contact your account team — connector-specific credentials, environment URLs, and any required pre-provisioning typically need to be coordinated with PlaidCloud support. # ERP System Connections > Connect PlaidCloud to enterprise ERP systems including SAP, Oracle, Infor, and JD Edwards for data extraction and integration. PlaidCloud provides dedicated connectors for major enterprise ERP systems. Each ERP exposes data through its own protocol mix — RFCs, SOAP, REST, or direct database access — so each connector encapsulates the right pattern for that vendor.
## SAP [Section titled “SAP”](#sap) * [SAP ECC](/reference/connectors/erp/sap-ecc/) * [SAP S/4HANA](/reference/connectors/erp/sap-s4/) * [SAP Analytics Cloud (SAC)](/reference/connectors/erp/sap-sac/) * [SAP Profitability and Performance Management (PaPM)](/reference/connectors/erp/sap-papm/) * [SAP Profitability and Cost Management (PCM)](/reference/connectors/erp/sap-pcm/) ## Oracle [Section titled “Oracle”](#oracle) * [Oracle EBS](/reference/connectors/erp/oracle-ebs/) * [Oracle Fusion](/reference/connectors/erp/oracle-fusion/) ## Other ERPs [Section titled “Other ERPs”](#other-erps) * [Infor](/reference/connectors/erp/infor/) * [JD Edwards (Legacy)](/reference/connectors/erp/jde-legacy/) # Infor Connector > Set up an Infor ERP system connection in PlaidCloud to integrate manufacturing, distribution, and financial data into workflows. ## Upstream Documentation [Section titled “Upstream Documentation”](#upstream-documentation) [The Infor documentation](https://docs.infor.com/en-us). ## Setup [Section titled “Setup”](#setup) This connector uses a vendor-specific authentication flow and is configured directly from the **Connections** screen in your workspace. The configuration fields shown depend on the credentials your tenant administrator has provisioned for the integration. See the upstream [Infor documentation](https://docs.infor.com/) for the latest setup specifics. If you need help setting up this connector for your tenant, contact your account team — connector-specific credentials, environment URLs, and any required pre-provisioning typically need to be coordinated with PlaidCloud support. # JD Edwards (Legacy) Connector > Configure a JD Edwards Legacy ERP connection in PlaidCloud to integrate financial and operational data into your workflows. ## Upstream Documentation [Section titled “Upstream Documentation”](#upstream-documentation) [The JDE documentation](https://www.oracle.com/technical-resources/documentation/jd-edwards-enterpriseone.html).
## Setup [Section titled “Setup”](#setup) This connector uses a vendor-specific authentication flow and is configured directly from the **Connections** screen in your workspace. The configuration fields shown depend on the credentials your tenant administrator has provisioned for the integration. See the upstream [JD Edwards documentation](https://docs.oracle.com/cd/E84502_01/index.htm) for the latest setup specifics. If you need help setting up this connector for your tenant, contact your account team — connector-specific credentials, environment URLs, and any required pre-provisioning typically need to be coordinated with PlaidCloud support. # Oracle EBS Connector > Set up an Oracle E-Business Suite connection in PlaidCloud to integrate ERP financial and operational data into workflows. ## Upstream Documentation [Section titled “Upstream Documentation”](#upstream-documentation) [The Oracle EBS documentation](https://docs.oracle.com/cd/E51111_01/current/html/docset.html). ## Setup [Section titled “Setup”](#setup) This connector uses a vendor-specific authentication flow and is configured directly from the **Connections** screen in your workspace. The configuration fields shown depend on the credentials your tenant administrator has provisioned for the integration. See the upstream [Oracle EBS documentation](https://docs.oracle.com/cd/E26401_01/index.htm) for the latest setup specifics. If you need help setting up this connector for your tenant, contact your account team — connector-specific credentials, environment URLs, and any required pre-provisioning typically need to be coordinated with PlaidCloud support. # Oracle Fusion Connector > Set up an Oracle Fusion Cloud ERP connection in PlaidCloud to integrate financial and operational data into your workflows.
## Upstream Documentation [Section titled “Upstream Documentation”](#upstream-documentation) [The Oracle Fusion applications documentation](https://www.oracle.com/middleware/technologies/fusion-apps-doc.html). ## Setup [Section titled “Setup”](#setup) This connector uses a vendor-specific authentication flow and is configured directly from the **Connections** screen in your workspace. The configuration fields shown depend on the credentials your tenant administrator has provisioned for the integration. See the upstream [Oracle Fusion documentation](https://docs.oracle.com/en/cloud/saas/) for the latest setup specifics. If you need help setting up this connector for your tenant, contact your account team — connector-specific credentials, environment URLs, and any required pre-provisioning typically need to be coordinated with PlaidCloud support. # SAP ECC Connector > Configure an SAP ECC ERP connection in PlaidCloud to integrate financial, logistics, and operational data into your workflows. ## Upstream Documentation [Section titled “Upstream Documentation”](#upstream-documentation) SAP has removed all ECC documentation and currently only provides documentation for [S/4HANA](https://help.sap.com/docs/SAP_S4HANA_ON-PREMISE). ## Configuration [Section titled “Configuration”](#configuration) These fields appear when creating or editing this connection. Required vs optional depends on the authentication options you enable. ### Identification [Section titled “Identification”](#identification) | Field | Type | Description | | ------------ | ----------------- | ---------------------------------------------------------------------- | | Name | Text | Display name for this connection. | | Alias | Text (multi-line) | Optional alias or notes about the connection. | | Is active | Toggle | Whether the connection is enabled. Disable to pause without deleting. | | Db read only | Toggle | Restrict the connection to read-only operations.
| | Access type | Select | Read-only, write-only, or read-write access level for this connection. | ### Other [Section titled “Other”](#other) | Field | Type | Description | | ------ | -------- | ----------- | | Client | Text | — | | Lang | Select | — | | Trace | Select | — | | Ashost | Text | — | | Sysnr | Text | — | | Mshost | Text | — | | Msserv | Text | — | | Sysid | Text | — | | Group | Text | — | | User | Text | — | | Passwd | Password | — | # SAP Profitability and Performance Management (PaPM) Connector > Set up an SAP PaPM connection in PlaidCloud to integrate profitability analysis and performance management data into workflows. ## Upstream Documentation [Section titled “Upstream Documentation”](#upstream-documentation) [The SAP PaPM documentation](https://help.sap.com/docs/SAP_PROFITABILITY_PERFORMANCE_MANAGEMENT). ## Setup [Section titled “Setup”](#setup) This connector uses a vendor-specific authentication flow and is configured directly from the **Connections** screen in your workspace. The configuration fields shown depend on the credentials your tenant administrator has provisioned for the integration. See the upstream [SAP PaPM documentation](https://help.sap.com/docs/SAP_PROFITABILITY_AND_PERFORMANCE_MANAGEMENT) for the latest setup specifics. If you need help setting up this connector for your tenant, contact your account team — connector-specific credentials, environment URLs, and any required pre-provisioning typically need to be coordinated with PlaidCloud support. # SAP Profitability and Cost Management (PCM) Connector > Configure an SAP Profitability and Cost Management connection in PlaidCloud to integrate cost allocation data into workflows. ## Upstream Documentation [Section titled “Upstream Documentation”](#upstream-documentation) [The SAP PCM legacy documentation](https://help.sap.com/docs/SAP_PROFITABILITY_AND_COST_MANAGEMENT).
## Configuration [Section titled “Configuration”](#configuration) These fields appear when creating or editing this connection. Required vs optional depends on the authentication options you enable. ### Identification [Section titled “Identification”](#identification) | Field | Type | Description | | ------------ | ----------------- | ---------------------------------------------------------------------- | | Name | Text | Display name for this connection. | | Alias | Text (multi-line) | Optional alias or notes about the connection. | | Is active | Toggle | Whether the connection is enabled. Disable to pause without deleting. | | Db read only | Toggle | Restrict the connection to read-only operations. | | Access type | Select | Read-only, write-only, or read-write access level for this connection. | ### Authentication [Section titled “Authentication”](#authentication) | Field | Type | Description | | ----------- | -------- | ------------------------------------------------------------- | | Db user | Text | Username for database authentication. | | Use sso | Toggle | Authenticate via single sign-on instead of username/password. | | Db password | Password | Password for database authentication. | # SAP S/4HANA Connector > Configure an SAP S/4HANA ERP connection in PlaidCloud to integrate real-time financial and operational data into workflows. ## Upstream Documentation [Section titled “Upstream Documentation”](#upstream-documentation) See the [SAP S/4HANA documentation](https://help.sap.com/docs/SAP_S4HANA_ON-PREMISE). ## Setup [Section titled “Setup”](#setup) This connector uses a vendor-specific authentication flow and is configured directly from the **Connections** screen in your workspace. The configuration fields shown depend on the credentials your tenant administrator has provisioned for the integration. See the upstream [SAP S/4HANA documentation](https://help.sap.com/docs/SAP_S4HANA_ON-PREMISE) for the latest setup specifics.
If you need help setting up this connector for your tenant, contact your account team — connector-specific credentials, environment URLs, and any required pre-provisioning typically need to be coordinated with PlaidCloud support. # SAP Analytics Cloud Connector > Configure an SAP Analytics Cloud connection in PlaidCloud to integrate planning, analytics, and reporting data into workflows. ## Upstream Documentation [Section titled “Upstream Documentation”](#upstream-documentation) [The SAP Analytics Cloud documentation](https://help.sap.com/docs/SAP_ANALYTICS_CLOUD). ## Setup [Section titled “Setup”](#setup) This connector uses a vendor-specific authentication flow and is configured directly from the **Connections** screen in your workspace. The configuration fields shown depend on the credentials your tenant administrator has provisioned for the integration. See the upstream [SAP Analytics Cloud documentation](https://help.sap.com/docs/SAP_ANALYTICS_CLOUD) for the latest setup specifics. If you need help setting up this connector for your tenant, contact your account team — connector-specific credentials, environment URLs, and any required pre-provisioning typically need to be coordinated with PlaidCloud support. # Git Repository Connections > Connect PlaidCloud to Git repositories including GitHub, GitLab, Bitbucket, Azure Repos, and AWS CodeCommit for version control. PlaidCloud connects to Git hosts so workflows can read from (or push to) version-controlled repositories. This is useful for sourcing configuration, scripts, or templated files that live in source control rather than a database or document account.
## Providers [Section titled “Providers”](#providers) * [GitHub](/reference/connectors/git/github/) * [GitLab](/reference/connectors/git/gitlab/) * [Bitbucket](/reference/connectors/git/bitbucket/) * [Azure Repos](/reference/connectors/git/azure-repos/) * [AWS CodeCommit](/reference/connectors/git/codecommit/) # Azure Repos Repository Connector > Configure an Azure Repos connection in PlaidCloud to integrate version-controlled code and configuration into your workflows. ## Service Documentation [Section titled “Service Documentation”](#service-documentation) [The Azure Repos service documentation](https://learn.microsoft.com/en-us/azure/devops/repos/). ## Configuration [Section titled “Configuration”](#configuration) These fields appear when creating or editing this connection. Required vs optional depends on the authentication options you enable. ### Identification [Section titled “Identification”](#identification) | Field | Type | Description | | ------------ | ----------------- | ---------------------------------------------------------------------- | | Name | Text | Display name for this connection. | | Alias | Text (multi-line) | Optional alias or notes about the connection. | | Is active | Toggle | Whether the connection is enabled. Disable to pause without deleting. | | Db read only | Toggle | Restrict the connection to read-only operations. | | Access type | Select | Read-only, write-only, or read-write access level for this connection. | ### Authentication [Section titled “Authentication”](#authentication) | Field | Type | Description | | ------- | ------ | ------------------------------------------------------------- | | Use sso | Toggle | Authenticate via single sign-on instead of username/password. | ### SSL / TLS [Section titled “SSL / TLS”](#ssl--tls) | Field | Type | Description | | -------------------- | ----------------- | ----------------------------------------------------------------------- | | Use ssl | Toggle | Encrypt the connection with SSL/TLS.
| | Ssl mode | Select | SSL verification mode (e.g., disable, require, verify-ca, verify-full). | | Ssl auth client cert | Text (multi-line) | Client certificate (PEM) for mutual TLS authentication. | | Ssl auth client key | Text (multi-line) | Client private key (PEM) for mutual TLS authentication. | | Ssl auth root cert | Text (multi-line) | Root CA certificate (PEM) for verifying the server’s cert. | | Ssl auth cert revoke | Text (multi-line) | Certificate revocation list, if your environment uses one. | ### SSH Tunnel [Section titled “SSH Tunnel”](#ssh-tunnel) | Field | Type | Description | | --------------- | ----------------- | ----------------------------------------------------------------------- | | Use ssh | Toggle | Tunnel the connection through an SSH bastion. | | Ssh host | Text | SSH bastion hostname. | | Ssh port | Number | SSH bastion port (default 22). | | Ssh user | Text | SSH bastion username. | | Ssh password | Password | SSH bastion password (if password auth is used). | | Use ssh cert | Toggle | Authenticate to the SSH bastion with a private key instead of password. | | Ssh private key | Text (multi-line) | SSH private key (PEM) for bastion authentication. | | Ssh host key | Text (multi-line) | Expected SSH host key for bastion fingerprint verification. | ### Other [Section titled “Other”](#other) | Field | Type | Description | | ---------------- | -------- | ----------- | | Repo path | Text | — | | Default branch | Text | — | | Start path | Text | — | | Service username | Text | — | | Token | Password | — | # Bitbucket Repository Connector > Set up a Bitbucket repository connection in PlaidCloud to integrate version-controlled code and configuration into your workflows. ## Service Documentation [Section titled “Service Documentation”](#service-documentation) [The Bitbucket service documentation](https://bitbucket.org/product/guides).
## Configuration [Section titled “Configuration”](#configuration) These fields appear when creating or editing this connection. Required vs optional depends on the authentication options you enable. ### Identification [Section titled “Identification”](#identification) | Field | Type | Description | | ------------ | ----------------- | ---------------------------------------------------------------------- | | Name | Text | Display name for this connection. | | Alias | Text (multi-line) | Optional alias or notes about the connection. | | Is active | Toggle | Whether the connection is enabled. Disable to pause without deleting. | | Db read only | Toggle | Restrict the connection to read-only operations. | | Access type | Select | Read-only, write-only, or read-write access level for this connection. | ### Authentication [Section titled “Authentication”](#authentication) | Field | Type | Description | | ------- | ------ | ------------------------------------------------------------- | | Use sso | Toggle | Authenticate via single sign-on instead of username/password. | ### SSL / TLS [Section titled “SSL / TLS”](#ssl--tls) | Field | Type | Description | | -------------------- | ----------------- | ----------------------------------------------------------------------- | | Use ssl | Toggle | Encrypt the connection with SSL/TLS. | | Ssl mode | Select | SSL verification mode (e.g., disable, require, verify-ca, verify-full). | | Ssl auth client cert | Text (multi-line) | Client certificate (PEM) for mutual TLS authentication. | | Ssl auth client key | Text (multi-line) | Client private key (PEM) for mutual TLS authentication. | | Ssl auth root cert | Text (multi-line) | Root CA certificate (PEM) for verifying the server’s cert. | | Ssl auth cert revoke | Text (multi-line) | Certificate revocation list, if your environment uses one. 
| ### SSH Tunnel [Section titled “SSH Tunnel”](#ssh-tunnel) | Field | Type | Description | | --------------- | ----------------- | ----------------------------------------------------------------------- | | Use ssh | Toggle | Tunnel the connection through an SSH bastion. | | Ssh host | Text | SSH bastion hostname. | | Ssh port | Number | SSH bastion port (default 22). | | Ssh user | Text | SSH bastion username. | | Ssh password | Password | SSH bastion password (if password auth is used). | | Use ssh cert | Toggle | Authenticate to the SSH bastion with a private key instead of password. | | Ssh private key | Text (multi-line) | SSH private key (PEM) for bastion authentication. | | Ssh host key | Text (multi-line) | Expected SSH host key for bastion fingerprint verification. | ### Other [Section titled “Other”](#other) | Field | Type | Description | | ---------------- | -------- | ----------- | | Repo path | Text | — | | Default branch | Text | — | | Start path | Text | — | | Service username | Text | — | | Token | Password | — | # AWS CodeCommit Repository Connector > Set up an AWS CodeCommit repository connection in PlaidCloud to integrate version-controlled code and configuration into workflows. ## Service Documentation [Section titled “Service Documentation”](#service-documentation) [The AWS CodeCommit service documentation](https://docs.aws.amazon.com/codecommit/). ## Configuration [Section titled “Configuration”](#configuration) These fields appear when creating or editing this connection. Required vs optional depends on the authentication options you enable. ### Identification [Section titled “Identification”](#identification) | Field | Type | Description | | ------------ | ----------------- | ---------------------------------------------------------------------- | | Name | Text | Display name for this connection. | | Alias | Text (multi-line) | Optional alias or notes about the connection. | | Is active | Toggle | Whether the connection is enabled. 
Disable to pause without deleting. | | Db read only | Toggle | Restrict the connection to read-only operations. | | Access type | Select | Read-only, write-only, or read-write access level for this connection. | ### Authentication [Section titled “Authentication”](#authentication) | Field | Type | Description | | ------- | ------ | ------------------------------------------------------------- | | Use sso | Toggle | Authenticate via single sign-on instead of username/password. | ### SSL / TLS [Section titled “SSL / TLS”](#ssl--tls) | Field | Type | Description | | -------------------- | ----------------- | ----------------------------------------------------------------------- | | Use ssl | Toggle | Encrypt the connection with SSL/TLS. | | Ssl mode | Select | SSL verification mode (e.g., disable, require, verify-ca, verify-full). | | Ssl auth client cert | Text (multi-line) | Client certificate (PEM) for mutual TLS authentication. | | Ssl auth client key | Text (multi-line) | Client private key (PEM) for mutual TLS authentication. | | Ssl auth root cert | Text (multi-line) | Root CA certificate (PEM) for verifying the server’s cert. | | Ssl auth cert revoke | Text (multi-line) | Certificate revocation list, if your environment uses one. | ### SSH Tunnel [Section titled “SSH Tunnel”](#ssh-tunnel) | Field | Type | Description | | --------------- | ----------------- | ----------------------------------------------------------------------- | | Use ssh | Toggle | Tunnel the connection through an SSH bastion. | | Ssh host | Text | SSH bastion hostname. | | Ssh port | Number | SSH bastion port (default 22). | | Ssh user | Text | SSH bastion username. | | Ssh password | Password | SSH bastion password (if password auth is used). | | Use ssh cert | Toggle | Authenticate to the SSH bastion with a private key instead of password. | | Ssh private key | Text (multi-line) | SSH private key (PEM) for bastion authentication. 
| | Ssh host key | Text (multi-line) | Expected SSH host key for bastion fingerprint verification. | ### Other [Section titled “Other”](#other) | Field | Type | Description | | ---------------- | -------- | ----------- | | Repo path | Text | — | | Default branch | Text | — | | Start path | Text | — | | Service username | Text | — | | Token | Password | — | # GitHub Repository Connector > Set up a GitHub repository connection in PlaidCloud to integrate version-controlled code and configuration into your workflows. ## Service Documentation [Section titled “Service Documentation”](#service-documentation) [The GitHub service documentation](https://docs.github.com/). ## Configuration [Section titled “Configuration”](#configuration) These fields appear when creating or editing this connection. Required vs optional depends on the authentication options you enable. ### Identification [Section titled “Identification”](#identification) | Field | Type | Description | | ------------ | ----------------- | ---------------------------------------------------------------------- | | Name | Text | Display name for this connection. | | Alias | Text (multi-line) | Optional alias or notes about the connection. | | Is active | Toggle | Whether the connection is enabled. Disable to pause without deleting. | | Db read only | Toggle | Restrict the connection to read-only operations. | | Access type | Select | Read-only, write-only, or read-write access level for this connection. | ### Authentication [Section titled “Authentication”](#authentication) | Field | Type | Description | | ------- | ------ | ------------------------------------------------------------- | | Use sso | Toggle | Authenticate via single sign-on instead of username/password. 
| ### SSL / TLS [Section titled “SSL / TLS”](#ssl--tls) | Field | Type | Description | | -------------------- | ----------------- | ----------------------------------------------------------------------- | | Use ssl | Toggle | Encrypt the connection with SSL/TLS. | | Ssl mode | Select | SSL verification mode (e.g., disable, require, verify-ca, verify-full). | | Ssl auth client cert | Text (multi-line) | Client certificate (PEM) for mutual TLS authentication. | | Ssl auth client key | Text (multi-line) | Client private key (PEM) for mutual TLS authentication. | | Ssl auth root cert | Text (multi-line) | Root CA certificate (PEM) for verifying the server’s cert. | | Ssl auth cert revoke | Text (multi-line) | Certificate revocation list, if your environment uses one. | ### SSH Tunnel [Section titled “SSH Tunnel”](#ssh-tunnel) | Field | Type | Description | | --------------- | ----------------- | ----------------------------------------------------------------------- | | Use ssh | Toggle | Tunnel the connection through an SSH bastion. | | Ssh host | Text | SSH bastion hostname. | | Ssh port | Number | SSH bastion port (default 22). | | Ssh user | Text | SSH bastion username. | | Ssh password | Password | SSH bastion password (if password auth is used). | | Use ssh cert | Toggle | Authenticate to the SSH bastion with a private key instead of password. | | Ssh private key | Text (multi-line) | SSH private key (PEM) for bastion authentication. | | Ssh host key | Text (multi-line) | Expected SSH host key for bastion fingerprint verification. | ### Other [Section titled “Other”](#other) | Field | Type | Description | | ---------------- | -------- | ----------- | | Repo path | Text | — | | Default branch | Text | — | | Start path | Text | — | | Service username | Text | — | | Token | Password | — | # GitLab Repository Connector > Configure a GitLab repository connection in PlaidCloud to integrate version-controlled code and configuration into your workflows. 
## Service Documentation [Section titled “Service Documentation”](#service-documentation) [The GitLab service documentation](https://docs.gitlab.com/). ## Configuration [Section titled “Configuration”](#configuration) These fields appear when creating or editing this connection. Required vs optional depends on the authentication options you enable. ### Identification [Section titled “Identification”](#identification) | Field | Type | Description | | ------------ | ----------------- | ---------------------------------------------------------------------- | | Name | Text | Display name for this connection. | | Alias | Text (multi-line) | Optional alias or notes about the connection. | | Is active | Toggle | Whether the connection is enabled. Disable to pause without deleting. | | Db read only | Toggle | Restrict the connection to read-only operations. | | Access type | Select | Read-only, write-only, or read-write access level for this connection. | ### Authentication [Section titled “Authentication”](#authentication) | Field | Type | Description | | ------- | ------ | ------------------------------------------------------------- | | Use sso | Toggle | Authenticate via single sign-on instead of username/password. | ### SSL / TLS [Section titled “SSL / TLS”](#ssl--tls) | Field | Type | Description | | -------------------- | ----------------- | ----------------------------------------------------------------------- | | Use ssl | Toggle | Encrypt the connection with SSL/TLS. | | Ssl mode | Select | SSL verification mode (e.g., disable, require, verify-ca, verify-full). | | Ssl auth client cert | Text (multi-line) | Client certificate (PEM) for mutual TLS authentication. | | Ssl auth client key | Text (multi-line) | Client private key (PEM) for mutual TLS authentication. | | Ssl auth root cert | Text (multi-line) | Root CA certificate (PEM) for verifying the server’s cert. 
| | Ssl auth cert revoke | Text (multi-line) | Certificate revocation list, if your environment uses one. | ### SSH Tunnel [Section titled “SSH Tunnel”](#ssh-tunnel) | Field | Type | Description | | --------------- | ----------------- | ----------------------------------------------------------------------- | | Use ssh | Toggle | Tunnel the connection through an SSH bastion. | | Ssh host | Text | SSH bastion hostname. | | Ssh port | Number | SSH bastion port (default 22). | | Ssh user | Text | SSH bastion username. | | Ssh password | Password | SSH bastion password (if password auth is used). | | Use ssh cert | Toggle | Authenticate to the SSH bastion with a private key instead of password. | | Ssh private key | Text (multi-line) | SSH private key (PEM) for bastion authentication. | | Ssh host key | Text (multi-line) | Expected SSH host key for bastion fingerprint verification. | ### Other [Section titled “Other”](#other) | Field | Type | Description | | ---------------- | -------- | ----------- | | Repo path | Text | — | | Default branch | Text | — | | Start path | Text | — | | Service username | Text | — | | Token | Password | — | # Google Service Connections > Connect PlaidCloud to Google services including BigQuery for analytics and Google Sheets for spreadsheet integration. PlaidCloud connects to Google services via Google Cloud service accounts (BigQuery) and OAuth (Google Sheets). Each connector targets a specific Google product family. ## Providers [Section titled “Providers”](#providers) * [BigQuery](/reference/connectors/google/big-query/) — Google’s cloud data warehouse * [Google Sheets](/reference/connectors/google/gspread/) — read and write spreadsheet data # Google BigQuery Connector > Configure a Google BigQuery connection in PlaidCloud to run analytical queries and integrate large-scale data into workflows. 
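BigQuery addresses tables with a three-part `project.dataset.table` identifier. The sketch below shows one way a project and dataset could serve as defaults for partially qualified table names. Whether PlaidCloud applies the connection's project and dataset values this way is an assumption, and `qualify_table` is a hypothetical helper, not part of PlaidCloud or the BigQuery client library.

```python
def qualify_table(name: str, project: str, dataset: str) -> str:
    """Expand a table name to a backtick-quoted project.dataset.table.

    Assumes project IDs contain no dots; legacy domain-scoped project
    IDs would need separate handling.
    """
    parts = name.split(".")
    if len(parts) == 1:
        # Bare table name: fill in both defaults.
        parts = [project, dataset, *parts]
    elif len(parts) == 2:
        # dataset.table: fill in the default project only.
        parts = [project, *parts]
    elif len(parts) != 3:
        raise ValueError(f"unexpected table reference: {name!r}")
    if not all(parts):
        raise ValueError(f"empty component in table reference: {name!r}")
    return "`" + ".".join(parts) + "`"


if __name__ == "__main__":
    # Placeholder project and dataset names.
    print(qualify_table("ledger", "acme-fin", "reporting"))
    print(qualify_table("staging.ledger", "acme-fin", "reporting"))
```

Backtick quoting keeps project IDs that contain hyphens valid in GoogleSQL queries.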
## Connection Documentation [Section titled “Connection Documentation”](#connection-documentation) [The Google BigQuery documentation](https://docs.cloud.google.com/bigquery/docs). ## Configuration [Section titled “Configuration”](#configuration) These fields appear when creating or editing this connection. Required vs optional depends on the authentication options you enable. ### Identification [Section titled “Identification”](#identification) | Field | Type | Description | | ------------ | ----------------- | ---------------------------------------------------------------------- | | Name | Text | Display name for this connection. | | Alias | Text (multi-line) | Optional alias or notes about the connection. | | Is active | Toggle | Whether the connection is enabled. Disable to pause without deleting. | | Db read only | Toggle | Restrict the connection to read-only operations. | | Access type | Select | Read-only, write-only, or read-write access level for this connection. | ### Connection [Section titled “Connection”](#connection) | Field | Type | Description | | ---------- | ---- | ------------------------------------------- | | Db project | Text | — | | Db dataset | Text | — | | Db catalog | Text | Database, catalog, or schema to connect to. | ### Authentication [Section titled “Authentication”](#authentication) | Field | Type | Description | | ----------- | -------- | ------------------------------------- | | Db user | Text | Username for database authentication. | | Db password | Password | Password for database authentication. | # Google Sheets > Set up a Google Sheets connection in PlaidCloud to import, export, and synchronize spreadsheet data within your workflows. ## Connection Documentation [Section titled “Connection Documentation”](#connection-documentation) Google Sheets is oriented more towards consumers. For technical documentation, refer to the [developer documentation](https://developers.google.com/workspace/sheets). 
## Configuration [Section titled “Configuration”](#configuration) These fields appear when creating or editing this connection. Required vs optional depends on the authentication options you enable. ### Identification [Section titled “Identification”](#identification) | Field | Type | Description | | ------------ | ----------------- | ---------------------------------------------------------------------- | | Name | Text | Display name for this connection. | | Alias | Text (multi-line) | Optional alias or notes about the connection. | | Is active | Toggle | Whether the connection is enabled. Disable to pause without deleting. | | Db read only | Toggle | Restrict the connection to read-only operations. | | Access type | Select | Read-only, write-only, or read-write access level for this connection. | # Open Table Format Connections > Connect PlaidCloud Lakehouse to open table formats including Apache Iceberg, Delta Lake, Hudi, and Hive for federated queries. PlaidCloud Lakehouse can federate queries directly against open table formats, letting you query data in place without moving it into PlaidCloud first. Useful for joining lakehouse data with external data lakes that are already managed in Iceberg, Delta Lake, or Hudi. ## Formats [Section titled “Formats”](#formats) * [Apache Iceberg](/reference/connectors/open-tables/iceberg/) * [Delta Lake](/reference/connectors/open-tables/delta-lake/) * [Apache Hudi](/reference/connectors/open-tables/hudi/) * [Apache Hive](/reference/connectors/open-tables/hive/) — Hive open table format (distinct from the [Hive query engine connector](/reference/connectors/databases/hive/)) # Delta Lake Open Table Format (Databricks Catalog) > Configure a Delta Lake open table format connection in PlaidCloud for hybrid query execution without moving your stored data. ## Catalog Documentation [Section titled “Catalog Documentation”](#catalog-documentation) [The Delta Lake documentation](https://docs.delta.io/). 
## Setup [Section titled “Setup”](#setup) This connector uses a vendor-specific authentication flow and is configured directly from the **Connections** screen in your workspace. The configuration fields shown depend on the credentials your tenant administrator has provisioned for the integration. See the upstream [Delta Lake documentation](https://docs.delta.io/latest/) for the latest setup specifics. If you need help setting up this connector for your tenant, contact your account team — connector-specific credentials, environment URLs, and any required pre-provisioning typically need to be coordinated with PlaidCloud support. # Apache Hive Open Table Format > Set up an Apache Hive catalog connection in PlaidCloud for open table format queries through the PlaidCloud Lakehouse service. ## Catalog Documentation [Section titled “Catalog Documentation”](#catalog-documentation) [Apache Hive documentation](https://hive.apache.org/docs/latest/). ## Configuration [Section titled “Configuration”](#configuration) These fields appear when creating or editing this connection. Required vs optional depends on the authentication options you enable. ### Identification [Section titled “Identification”](#identification) | Field | Type | Description | | ------------ | ----------------- | ---------------------------------------------------------------------- | | Name | Text | Display name for this connection. | | Alias | Text (multi-line) | Optional alias or notes about the connection. | | Is active | Toggle | Whether the connection is enabled. Disable to pause without deleting. | | Db read only | Toggle | Restrict the connection to read-only operations. | | Access type | Select | Read-only, write-only, or read-write access level for this connection. | ### Connection [Section titled “Connection”](#connection) | Field | Type | Description | | ---------- | ------ | ---------------------------------------------- | | Db host | Text | Hostname or IP address of the database server.
| | Db port | Number | Port number for the database connection. | | Db catalog | Text | Database, catalog, or schema to connect to. | ### Authentication [Section titled “Authentication”](#authentication) | Field | Type | Description | | ----------- | -------- | ------------------------------------------------------------- | | Db user | Text | Username for database authentication. | | Use sso | Toggle | Authenticate via single sign-on instead of username/password. | | Db password | Password | Password for database authentication. | ### SSL / TLS [Section titled “SSL / TLS”](#ssl--tls) | Field | Type | Description | | -------------------- | ----------------- | ----------------------------------------------------------------------- | | Use ssl | Toggle | Encrypt the connection with SSL/TLS. | | Ssl mode | Select | SSL verification mode (e.g., disable, require, verify-ca, verify-full). | | Ssl auth client cert | Text (multi-line) | Client certificate (PEM) for mutual TLS authentication. | | Ssl auth client key | Text (multi-line) | Client private key (PEM) for mutual TLS authentication. | | Ssl auth root cert | Text (multi-line) | Root CA certificate (PEM) for verifying the server’s cert. | | Ssl auth cert revoke | Text (multi-line) | Certificate revocation list, if your environment uses one. | ### SSH Tunnel [Section titled “SSH Tunnel”](#ssh-tunnel) | Field | Type | Description | | --------------- | ----------------- | ----------------------------------------------------------------------- | | Use ssh | Toggle | Tunnel the connection through an SSH bastion. | | Ssh host | Text | SSH bastion hostname. | | Ssh port | Number | SSH bastion port (default 22). | | Ssh user | Text | SSH bastion username. | | Ssh password | Password | SSH bastion password (if password auth is used). | | Use ssh cert | Toggle | Authenticate to the SSH bastion with a private key instead of password. | | Ssh private key | Text (multi-line) | SSH private key (PEM) for bastion authentication. 
| | Ssh host key | Text (multi-line) | Expected SSH host key for bastion fingerprint verification. | # Apache Hudi Open Table Format > Configure an Apache Hudi catalog connection in PlaidCloud for open table format queries through the PlaidCloud Lakehouse service. ## Catalog Documentation [Section titled “Catalog Documentation”](#catalog-documentation) [Apache Hudi documentation](https://hudi.apache.org/docs/overview/). ## Setup [Section titled “Setup”](#setup) This connector uses a vendor-specific authentication flow and is configured directly from the **Connections** screen in your workspace. The configuration fields shown depend on the credentials your tenant administrator has provisioned for the integration. See the upstream [Apache Hudi documentation](https://hudi.apache.org/docs/overview/) for the latest setup specifics. If you need help setting up this connector for your tenant, contact your account team — connector-specific credentials, environment URLs, and any required pre-provisioning typically need to be coordinated with PlaidCloud support. # Apache Iceberg Open Table Format > Set up an Apache Iceberg catalog connection in PlaidCloud for open table format queries through the PlaidCloud Lakehouse service. ## Catalog Documentation [Section titled “Catalog Documentation”](#catalog-documentation) [Apache Iceberg documentation](https://iceberg.apache.org/docs/latest/). ## Setup [Section titled “Setup”](#setup) This connector uses a vendor-specific authentication flow and is configured directly from the **Connections** screen in your workspace. The configuration fields shown depend on the credentials your tenant administrator has provisioned for the integration. See the upstream [Apache Iceberg documentation](https://iceberg.apache.org/docs/latest/) for the latest setup specifics.
If you need help setting up this connector for your tenant, contact your account team — connector-specific credentials, environment URLs, and any required pre-provisioning typically need to be coordinated with PlaidCloud support. # REST Connections > Connect PlaidCloud to REST API services including Salesforce, NetSuite, Workday, Dynamics, and other cloud-based platforms. PlaidCloud connects to REST API services using standard authentication patterns (OAuth, API keys, Basic Auth). Each provider has its own quirks in token flow, scope handling, and pagination — the dedicated connectors below encapsulate those specifics so you don’t have to. For any REST service that doesn’t have a dedicated connector, PlaidCloud provides a generic REST connector configurable to most authentication and response-parsing patterns. ## CRM and Sales [Section titled “CRM and Sales”](#crm-and-sales) * [Salesforce](/reference/connectors/rest/salesforce/) * [Dynamics](/reference/connectors/rest/dynamics/) — Microsoft Dynamics 365 ## Financial and Accounting [Section titled “Financial and Accounting”](#financial-and-accounting) * [NetSuite](/reference/connectors/rest/netsuite/) * [QuickBooks](/reference/connectors/rest/quickbooks/) * [Sage Intacct](/reference/connectors/rest/sage-intacct/) * [Stripe](/reference/connectors/rest/stripe/) * [Ramp](/reference/connectors/rest/ramp/) ## HR and Payroll [Section titled “HR and Payroll”](#hr-and-payroll) * [Workday](/reference/connectors/rest/workday/) * [Paycor](/reference/connectors/rest/paycor/) * [Gusto](/reference/connectors/rest/gusto/) ## Integration Platforms [Section titled “Integration Platforms”](#integration-platforms) * [MuleSoft](/reference/connectors/rest/mulesoft/) # Microsoft Dynamics 365 REST Connector > Configure a Microsoft Dynamics REST API connection in PlaidCloud to integrate ERP and CRM data into your analysis workflows. 
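This connector's fields include a tenant, an OAuth2 client ID, and an OAuth2 client secret, which matches the shape of an OAuth 2.0 client-credentials request against the Microsoft identity platform. The sketch below only builds such a request and sends nothing. That PlaidCloud uses this exact flow is an assumption, `build_token_request` is a hypothetical helper, and the tenant, client, and resource values are placeholders. The token endpoint pattern and the `/.default` scope suffix follow Microsoft's documented client-credentials flow.

```python
from urllib.parse import urlencode

# Microsoft identity platform v2.0 token endpoint.
TOKEN_URL_TEMPLATE = "https://login.microsoftonline.com/{tenant}/oauth2/v2.0/token"


def build_token_request(
    tenant: str, client_id: str, client_secret: str, resource_url: str
) -> tuple[str, str]:
    """Return (url, form_body) for a client-credentials token request."""
    url = TOKEN_URL_TEMPLATE.format(tenant=tenant)
    body = urlencode(
        {
            "grant_type": "client_credentials",
            "client_id": client_id,
            "client_secret": client_secret,
            # App-only tokens request the resource's ".default" scope.
            "scope": resource_url.rstrip("/") + "/.default",
        }
    )
    return url, body


if __name__ == "__main__":
    url, body = build_token_request(
        tenant="contoso.onmicrosoft.com",                 # placeholder tenant
        client_id="00000000-0000-0000-0000-000000000000",  # placeholder
        client_secret="not-a-real-secret",                 # placeholder
        resource_url="https://contoso.crm.dynamics.com/",  # placeholder
    )
    print(url)
    print(body)
```

The body is sent as `application/x-www-form-urlencoded` in a POST. The JSON response to a successful request carries an `access_token` that is then presented as a bearer token on API calls.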
## API Documentation [Section titled “API Documentation”](#api-documentation) The [vendor API reference](https://learn.microsoft.com/en-us/dynamics365/business-central/dev-itpro/api-reference/v2.0/) covers this connector’s endpoints. ## Configuration [Section titled “Configuration”](#configuration) These fields appear when creating or editing this connection. Required vs optional depends on the authentication options you enable. ### Identification [Section titled “Identification”](#identification) | Field | Type | Description | | ------------ | ----------------- | ---------------------------------------------------------------------- | | Name | Text | Display name for this connection. | | Alias | Text (multi-line) | Optional alias or notes about the connection. | | Is active | Toggle | Whether the connection is enabled. Disable to pause without deleting. | | Db read only | Toggle | Restrict the connection to read-only operations. | | Access type | Select | Read-only, write-only, or read-write access level for this connection. | ### Other [Section titled “Other”](#other) | Field | Type | Description | | -------------------- | ---- | ------------------------------------- | | Dynamics tenant | Text | — | | Oauth2 client id | Text | — | | Oauth2 client secret | Text | Secret credential — stored encrypted. | | Dynamics crm | Text | — | # Gusto REST Connector > Set up a Gusto REST API connection in PlaidCloud to integrate payroll, benefits, and HR data into your analysis workflows. ## API Documentation [Section titled “API Documentation”](#api-documentation) The [vendor API reference](https://docs.gusto.com/app-integrations/reference/get-v1-token-info) covers this connector’s endpoints. ## Configuration [Section titled “Configuration”](#configuration) These fields appear when creating or editing this connection. Required vs optional depends on the authentication options you enable. 
### Identification [Section titled “Identification”](#identification) | Field | Type | Description | | --------- | ----------------- | --------------------------------------------------------------------- | | Name | Text | Display name for this connection. | | Alias | Text (multi-line) | Optional alias or notes about the connection. | | Is active | Toggle | Whether the connection is enabled. Disable to pause without deleting. | ### Other [Section titled “Other”](#other) | Field | Type | Description | | ----------------------- | ------ | ------------- | | Host | Text | — | | Auth type | Select | — | | Enable ssl verification | Toggle | — | | Follow redirects | Toggle | — | | Redirect follow http | Toggle | — | | Redirect follow auth | Toggle | — | | Redirect remove referer | Toggle | — | | Strict http | Toggle | — | | Encode url | Toggle | URL endpoint. | | Disable cookie jar | Toggle | — | | Server cipher | Toggle | — | | Max redirects | Number | — | | Test endpoint | Text | — | | Test method | Select | — | # MuleSoft REST Connector > Set up a MuleSoft REST API connection in PlaidCloud to integrate enterprise data across systems through the Anypoint platform. ## API Documentation [Section titled “API Documentation”](#api-documentation) The applicable API documentation for this connector depends on the service endpoints that MuleSoft is handling. ## Configuration [Section titled “Configuration”](#configuration) These fields appear when creating or editing this connection. Required vs optional depends on the authentication options you enable. ### Identification [Section titled “Identification”](#identification) | Field | Type | Description | | --------- | ----------------- | --------------------------------------------------------------------- | | Name | Text | Display name for this connection. | | Alias | Text (multi-line) | Optional alias or notes about the connection. | | Is active | Toggle | Whether the connection is enabled. Disable to pause without deleting.
| ### Other [Section titled “Other”](#other) | Field | Type | Description | | ----------------------- | ------ | ------------- | | Host | Text | — | | Auth type | Select | — | | Enable ssl verification | Toggle | — | | Follow redirects | Toggle | — | | Redirect follow http | Toggle | — | | Redirect follow auth | Toggle | — | | Redirect remove referer | Toggle | — | | Strict http | Toggle | — | | Encode url | Toggle | URL endpoint. | | Disable cookie jar | Toggle | — | | Server cipher | Toggle | — | | Max redirects | Number | — | | Test endpoint | Text | — | | Test method | Select | — | # NetSuite REST Connector > Set up a NetSuite REST API connection in PlaidCloud to integrate ERP, financial, and e-commerce data into your workflows. ## API Documentation [Section titled “API Documentation”](#api-documentation) The [vendor API reference](https://system.netsuite.com/help/helpcenter/en_US/APIs/REST_API_Browser/record/v1/2023.1/index.html) covers this connector’s endpoints. ## Configuration [Section titled “Configuration”](#configuration) These fields appear when creating or editing this connection. Required vs optional depends on the authentication options you enable. ### Identification [Section titled “Identification”](#identification) | Field | Type | Description | | ------------ | ----------------- | ---------------------------------------------------------------------- | | Name | Text | Display name for this connection. | | Alias | Text (multi-line) | Optional alias or notes about the connection. | | Is active | Toggle | Whether the connection is enabled. Disable to pause without deleting. | | Db read only | Toggle | Restrict the connection to read-only operations. | | Access type | Select | Read-only, write-only, or read-write access level for this connection.
| ### Other [Section titled “Other”](#other) | Field | Type | Description | | ---------------------------- | ---- | ----------- | | Oauth2 client id | Text | — | | Netsuite certificate id | Text | — | | Netsuite account id | Text | — | | Netsuite private certificate | Text | — | # Paycor REST Connector > Configure a Paycor REST API connection in PlaidCloud to integrate payroll, HR, and workforce data into your analysis workflows. ## API Documentation [Section titled “API Documentation”](#api-documentation) The [vendor API reference](https://developers.paycor.com/explore) covers this connector’s endpoints. ## Paycor Setup [Section titled “Paycor Setup”](#paycor-setup) The Paycor API application and initiation process is more involved than for other REST providers. Be sure to go through the steps outlined on their [Quick Start Page](https://developers.paycor.com/guides#quickStartLabel). Key values you must capture are: * Application OAuth Client ID * Application OAuth Client Secret * APIm Subscription Key * Scope Key of `current` application version Caution Do not forget to “activate” the application before use. Activate it here, choosing Production or Sandbox depending on your need: | Environment | Activation Form URL | | ----------- | ------------------------------------------------------------ | | Sandbox | | | Production | | Danger If you have multiple organizations in Paycor, you will need separate logins for each organization. DO NOT merge them. The Developer Portal needs a dedicated, unique login for each organization in order to create an organization-specific Application for REST access. ## Configuration [Section titled “Configuration”](#configuration) These fields appear when creating or editing this connection. Required vs optional depends on the authentication options you enable.
### Identification [Section titled “Identification”](#identification) | Field | Type | Description | | ------------ | ----------------- | ---------------------------------------------------------------------- | | Name | Text | Display name for this connection. | | Alias | Text (multi-line) | Optional alias or notes about the connection. | | Is active | Toggle | Whether the connection is enabled. Disable to pause without deleting. | | Db read only | Toggle | Restrict the connection to read-only operations. | | Access type | Select | Read-only, write-only, or read-write access level for this connection. | ### Other [Section titled “Other”](#other) | Field | Type | Description | | ----------------------- | ---- | ------------------------------------- | | Oauth2 client id | Text | — | | Oauth2 client secret | Text | Secret credential — stored encrypted. | | Paycor subscription key | Text | Authentication key or token. | # Quickbooks REST Connector > Configure a QuickBooks REST API connection in PlaidCloud to integrate accounting and financial data into your analysis workflows. ## API Documentation [Section titled “API Documentation”](#api-documentation) The [vendor API reference](https://developer.intuit.com/app/developer/qbo/docs/learn/explore-the-quickbooks-online-api) covers this connector’s endpoints. ## Configuration [Section titled “Configuration”](#configuration) These fields appear when creating or editing this connection. Required vs optional depends on the authentication options you enable. ### Identification [Section titled “Identification”](#identification) | Field | Type | Description | | ------------ | ----------------- | ---------------------------------------------------------------------- | | Name | Text | Display name for this connection. | | Alias | Text (multi-line) | Optional alias or notes about the connection. | | Is active | Toggle | Whether the connection is enabled. Disable to pause without deleting. 
| | Db read only | Toggle | Restrict the connection to read-only operations. | | Access type | Select | Read-only, write-only, or read-write access level for this connection. | # Ramp REST Connector > Set up a Ramp REST API connection in PlaidCloud to integrate corporate card spending and expense data into your workflows. ## API Documentation [Section titled “API Documentation”](#api-documentation) The [vendor API reference](https://docs.ramp.com/developer-api/v1) covers this connector’s endpoints. ## Configuration [Section titled “Configuration”](#configuration) These fields appear when creating or editing this connection. Required vs optional depends on the authentication options you enable. ### Identification [Section titled “Identification”](#identification) | Field | Type | Description | | ------------ | ----------------- | ---------------------------------------------------------------------- | | Name | Text | Display name for this connection. | | Alias | Text (multi-line) | Optional alias or notes about the connection. | | Is active | Toggle | Whether the connection is enabled. Disable to pause without deleting. | | Db read only | Toggle | Restrict the connection to read-only operations. | | Access type | Select | Read-only, write-only, or read-write access level for this connection. | ### Other [Section titled “Other”](#other) | Field | Type | Description | | -------------------- | ---- | ------------------------------------- | | Oauth2 client id | Text | — | | Oauth2 client secret | Text | Secret credential — stored encrypted. | | Ramp scope | Text | — | # Sage Intacct REST Connector > Set up a Sage Intacct REST API connection in PlaidCloud to integrate financial and accounting data into your workflows. ## API Documentation [Section titled “API Documentation”](#api-documentation) The Sage Intacct REST API documentation is available at the [Sage Developer site](https://developer.sage.com/intacct/docs/1/sage-intacct-rest-api/get-started/quick-start). 
## Security Requirements [Section titled “Security Requirements”](#security-requirements) The connector authenticates with a Sage Intacct **Web Services** sender ID plus a user-level login. The sender credentials must be enabled for your company by Sage support; the user credentials must have permissions for every Intacct object the connector will read. Treat sender and user credentials as secrets — store them only via the **Credentials** area in PlaidCloud and reference them from the connection. ## Obtain Credentials [Section titled “Obtain Credentials”](#obtain-credentials) 1. Open the Sage Intacct **Company Setup** area 2. Enable Web Services for the sender ID provided by Sage 3. Create or select a Web Services user for PlaidCloud 4. Grant the user permissions on every object you intend to query 5. Record the company ID, user ID, user password, sender ID, and sender password ## Create REST Connector [Section titled “Create REST Connector”](#create-rest-connector) 1. Go to **Tools > Connections** and click `Add Connection` 2. Select **Sage Intacct** as the connection type 3. Enter: * **Connection Name** — friendly name shown in workflow steps * **Company ID** — the Intacct company you’re connecting to * **User ID** and **User Password** * **Sender ID** and **Sender Password** * **Entity** — optional, for multi-entity tenants 4. Click `Test` to validate the credentials 5. 
Click `Save` ## Use in Workflow Steps [Section titled “Use in Workflow Steps”](#use-in-workflow-steps) The connection is selectable from these workflow import steps: * [Import Sage AP](../../../workflow-steps/import/import-sage-ap/) — AP bill headers * [Import Sage AP Lines](../../../workflow-steps/import/import-sage-ap-lines/) — AP bill line detail * [Import Sage Intacct Query](../../../workflow-steps/import/import-intacct-query/) — generic query against any Intacct object # Salesforce REST Connector > Set up a Salesforce REST API connection in PlaidCloud to integrate CRM, sales, and customer data into your analysis workflows. ## API Documentation [Section titled “API Documentation”](#api-documentation) The [vendor API reference](https://developer.salesforce.com/docs/atlas.en-us.api_rest.meta/api_rest/intro_what_is_rest_api.html) covers this connector’s endpoints. ## Configuration [Section titled “Configuration”](#configuration) These fields appear when creating or editing this connection. Required vs optional depends on the authentication options you enable. ### Identification [Section titled “Identification”](#identification) | Field | Type | Description | | ----------- | ----------------- | ---------------------------------------------------------------------- | | Name | Text | Display name for this connection. | | Alias | Text (multi-line) | Optional alias or notes about the connection. | | Is active | Toggle | Whether the connection is enabled. Disable to pause without deleting. | | Access type | Select | Read-only, write-only, or read-write access level for this connection. | ### Authentication [Section titled “Authentication”](#authentication) | Field | Type | Description | | ------------- | -------- | ------------------------------------------- | | Client id | Text | OAuth client ID issued by the provider. | | Client secret | Password | OAuth client secret issued by the provider. 
| ### Other [Section titled “Other”](#other) | Field | Type | Description | | ----- | ---- | ----------- | | Host | Text | — | # Stripe REST Connector > Configure a Stripe REST API connection in PlaidCloud to integrate payment processing and financial data into your workflows. ## API Documentation [Section titled “API Documentation”](#api-documentation) The [vendor API reference](https://docs.stripe.com/api) covers this connector’s endpoints. ## Configuration [Section titled “Configuration”](#configuration) These fields appear when creating or editing this connection. Required vs optional depends on the authentication options you enable. ### Identification [Section titled “Identification”](#identification) | Field | Type | Description | | --------- | ----------------- | --------------------------------------------------------------------- | | Name | Text | Display name for this connection. | | Alias | Text (multi-line) | Optional alias or notes about the connection. | | Is active | Toggle | Whether the connection is enabled. Disable to pause without deleting. | ### Other [Section titled “Other”](#other) | Field | Type | Description | | ----------------------- | ------ | ------------- | | Host | Text | — | | Auth type | Select | — | | Enable ssl verification | Toggle | — | | Follow redirects | Toggle | — | | Redirect follow http | Toggle | — | | Redirect follow auth | Toggle | — | | Redirect remove referer | Toggle | — | | Strict http | Toggle | — | | Encode url | Toggle | URL endpoint. | | Disable cookie jar | Toggle | — | | Server cipher | Toggle | — | | Max redirects | Number | — | | Test endpoint | Text | — | | Test method | Select | — | # Workday REST Connector > Configure a Workday REST API connection in PlaidCloud to integrate HR, finance, and planning data into your workflows. 
## API Documentation [Section titled “API Documentation”](#api-documentation) The [vendor API reference](https://community.workday.com/sites/default/files/file-hosting/restapi/) covers this connector’s endpoints. ## Configuration [Section titled “Configuration”](#configuration) These fields appear when creating or editing this connection. Required vs optional depends on the authentication options you enable. ### Identification [Section titled “Identification”](#identification) | Field | Type | Description | | ------------ | ----------------- | ---------------------------------------------------------------------- | | Name | Text | Display name for this connection. | | Alias | Text (multi-line) | Optional alias or notes about the connection. | | Is active | Toggle | Whether the connection is enabled. Disable to pause without deleting. | | Db read only | Toggle | Restrict the connection to read-only operations. | | Access type | Select | Read-only, write-only, or read-write access level for this connection. | ### Other [Section titled “Other”](#other) | Field | Type | Description | | -------------------- | ---- | ------------------------------------- | | Oauth2 client id | Text | — | | Oauth2 client secret | Text | Secret credential — stored encrypted. | | Workday url | Text | URL endpoint. | | Oauth2 refresh token | Text | — | # Singer Sources > The catalog of Singer tap connectors available as PlaidCloud Singer sources — Stripe, GitHub, databases, and 130+ more SaaS and API sources, each linking to its connector docs. PlaidCloud can pull data from the SaaS apps, APIs, and databases below using [Singer](https://www.singer.io/) taps. Pick one as the **Tap** when you create a [Singer Source connection](/guides/connections/singer-sources/); the connection form then shows that tap’s exact configuration fields, each with inline help. For the full set of options a source supports, see its connector repository (linked in the table). 
The list below is the current curated, permissively licensed catalog and grows over time — the **Tap** dropdown in the connection editor is always the live source of truth. ## Available Sources (135) [Section titled “Available Sources (135)”](#available-sources-135) | Source | Tap | Configuration reference | | ------------------------------ | --------------------------- | ------------------------------------------------------------------------------------------------- | | Aircall | `tap-aircall` | [TicketSwap/tap-aircall](https://github.com/TicketSwap/tap-aircall) | | Airtable | `tap-airtable` | [tomasvotava/tap-airtable](https://github.com/tomasvotava/tap-airtable) | | Amazon Advertising | `tap-amazon-advertising` | [dbt-labs/tap-amazon-advertising](https://github.com/dbt-labs/tap-amazon-advertising) | | Amazon MWS | `tap-amazon-mws` | [adswerve/singer-tap-amazon-mws](https://github.com/adswerve/singer-tap-amazon-mws) | | Anvil | `tap-anvil` | [svinstech/tap-anvil](https://github.com/svinstech/tap-anvil) | | Apache Log Files | `tap-apachelog` | [omelark/tap-apachelog](https://github.com/omelark/tap-apachelog) | | Apaleo | `tap-apaleo` | [felixkoch/tap-apaleo](https://github.com/felixkoch/tap-apaleo) | | Apple Health | `tap-applehealth` | [felippecaso/tap-applehealth](https://github.com/felippecaso/tap-applehealth) | | Apple Search Ads | `tap-apple-search-ads` | [mighty-digital/tap-apple-search-ads](https://github.com/mighty-digital/tap-apple-search-ads) | | AskNicely | `tap-ask-nicely` | [Mashey/tap-ask-nicely](https://github.com/Mashey/tap-ask-nicely) | | AT Internet | `tap-atinternet` | [GendarmerieNationale/tap-atinternet](https://github.com/GendarmerieNationale/tap-atinternet) | | Athena | `tap-athena` | [MeltanoLabs/tap-athena](https://github.com/MeltanoLabs/tap-athena) | | AWS Cost Explorer | `tap-aws-cost-explorer` | [albert-marrero/tap-aws-cost-explorer](https://github.com/albert-marrero/tap-aws-cost-explorer) | | BambooHR | `tap-bamboohr` | 
[AutoIDM/autoidm-tap-bamboohr](https://github.com/AutoIDM/autoidm-tap-bamboohr) | | BigQuery | `tap-bigquery` | [anelendata/tap-bigquery](https://github.com/anelendata/tap-bigquery) | | Bitso | `tap-bitso` | [edgarrmondragon/tap-bitso](https://github.com/edgarrmondragon/tap-bitso) | | Bling | `tap-bling` | [Ricardo-Muhlstedt/tap-bling](https://github.com/Ricardo-Muhlstedt/tap-bling) | | Cassandra | `tap-cassandra` | [datarts-tech/tap-cassandra](https://github.com/datarts-tech/tap-cassandra) | | Chorusai | `tap-chorusai` | [andyoneal/tap-chorusai](https://github.com/andyoneal/tap-chorusai) | | ChurnZero | `tap-churnzero` | [MarkEstey/tap-churnzero](https://github.com/MarkEstey/tap-churnzero) | | CircleCI | `tap-circle-ci` | [MeltanoLabs/tap-circle-ci](https://github.com/MeltanoLabs/tap-circle-ci) | | ClickHouse | `tap-clickhouse` | [akurdyukov/tap-clickhouse](https://github.com/akurdyukov/tap-clickhouse) | | Clickup | `tap-clickup` | [AutoIDM/tap-clickup](https://github.com/AutoIDM/tap-clickup) | | ClinicalTrials.gov | `tap-clinicaltrials` | [edgarrmondragon/tap-clinicaltrials](https://github.com/edgarrmondragon/tap-clinicaltrials) | | Clockify | `tap-clockify` | [quantile-taps/tap-clockify](https://github.com/quantile-taps/tap-clockify) | | Cloudwatch | `tap-cloudwatch` | [meltanolabs/tap-cloudwatch](https://github.com/meltanolabs/tap-cloudwatch) | | Codat | `tap-codat` | [manuphatak/tap-codatio](https://github.com/manuphatak/tap-codatio) | | Codecov | `tap-codecov` | [pulumi/tap-codecov](https://github.com/pulumi/tap-codecov) | | Contentful | `tap-contentful` | [GtheSheep/tap-contentful](https://github.com/GtheSheep/tap-contentful) | | CrateDB | `tap-cratedb` | [crate/meltano-tap-cratedb](https://github.com/crate/meltano-tap-cratedb) | | CSV | `tap-csv` | [MeltanoLabs/tap-csv](https://github.com/MeltanoLabs/tap-csv) | | Dagster | `tap-dagster` | [voxmedia/tap-dagster](https://github.com/voxmedia/tap-dagster) | | dbt Artifacts | `tap-dbt-artifacts` | 
[Matatika/tap-dbt-artifacts](https://github.com/Matatika/tap-dbt-artifacts) | | dbt Cloud | `tap-dbt` | [meltanolabs/tap-dbt](https://github.com/meltanolabs/tap-dbt) | | Delighted | `tap-delighted` | [TicketSwap/tap-delighted](https://github.com/TicketSwap/tap-delighted) | | Domo | `tap-domo` | [Mashey/tap-domo](https://github.com/Mashey/tap-domo) | | DuckDB | `tap-duckdb` | [MeltanoLabs/tap-duckdb](https://github.com/MeltanoLabs/tap-duckdb) | | DynamoDB | `tap-dynamodb` | [MeltanoLabs/tap-dynamodb](https://github.com/MeltanoLabs/tap-dynamodb) | | Exact | `tap-exact` | [TicketSwap/tap-exact](https://github.com/TicketSwap/tap-exact) | | exchangerate.host | `tap-exchangeratehost` | [anelendata/tap-exchangeratehost](https://github.com/anelendata/tap-exchangeratehost) | | FaB DB | `tap-fabdb` | [dwallace0723/tap-fabdb](https://github.com/dwallace0723/tap-fabdb) | | Feed | `tap-feed` | [jawats/tap-feed](https://github.com/jawats/tap-feed) | | Fleetio | `tap-fleetio` | [fleetio/tap-fleetio](https://github.com/fleetio/tap-fleetio) | | Formbricks | `tap-formbricks` | [emilklindt/tap-formbricks](https://github.com/emilklindt/tap-formbricks) | | Formula 1 | `tap-f1` | [ReubenFrankel/tap-f1](https://github.com/ReubenFrankel/tap-f1) | | GainsightPX | `tap-gainsightpx` | [Widen/tap-gainsightpx](https://github.com/Widen/tap-gainsightpx) | | Geekbot | `tap-geekbot` | [edgarrmondragon/tap-geekbot](https://github.com/edgarrmondragon/tap-geekbot) | | Geospatial datasets | `tap-geo` | [celine-eu/tap-geo](https://github.com/celine-eu/tap-geo) | | GitHub | `tap-github` | [MeltanoLabs/tap-github](https://github.com/MeltanoLabs/tap-github) | | GMail | `tap-gmail` | [MeltanoLabs/tap-gmail](https://github.com/MeltanoLabs/tap-gmail) | | GMail CSV/Excel Attachments | `tap-gmail-csv` | [food-spotter/tap-gmail-csv](https://github.com/food-spotter/tap-gmail-csv) | | Google Analytics | `tap-google-analytics` | 
[MeltanoLabs/tap-google-analytics](https://github.com/MeltanoLabs/tap-google-analytics) | | Google Play (Reviews Scraper) | `tap-google-play` | [edgarrmondragon/tap-google-play](https://github.com/edgarrmondragon/tap-google-play) | | Google Play Store (GCS Export) | `tap-playstore` | [haleemur/tap-playstore](https://github.com/haleemur/tap-playstore) | | Google Search Console | `tap-google-search-console` | [MeltanoLabs/tap-google-search-console](https://github.com/MeltanoLabs/tap-google-search-console) | | Greenhouse | `tap-greenhouse` | [codyss/tap-greenhouse](https://github.com/codyss/tap-greenhouse) | | GRIB | `tap-grib` | [celine-eu/tap-grib](https://github.com/celine-eu/tap-grib) | | Healthchecks.io | `tap-healthchecksio` | [reservoir-data/tap-healthchecksio](https://github.com/reservoir-data/tap-healthchecksio) | | HighLevel | `tap-gohighlevel` | [MeltanoLabs/tap-gohighlevel](https://github.com/MeltanoLabs/tap-gohighlevel) | | IBM DB2 | `tap-db2` | [danielptv/tap-db2](https://github.com/danielptv/tap-db2) | | Iceberg | `tap-iceberg` | [shaped-ai/tap-iceberg](https://github.com/shaped-ai/tap-iceberg) | | Immuta | `tap-immuta` | [immuta/tap-immuta](https://github.com/immuta/tap-immuta) | | Impact | `tap-impact` | [voxmedia/tap-impact-publisher](https://github.com/voxmedia/tap-impact-publisher) | | Instagram | `tap-instagram` | [prratek/tap-instagram](https://github.com/prratek/tap-instagram) | | Instantly AI | `tap-instantly-ai` | [strvcom/tap-instantly-ai](https://github.com/strvcom/tap-instantly-ai) | | Intercom | `tap-intercom` | [TicketSwap/tap-intercom](https://github.com/TicketSwap/tap-intercom) | | Jaffle Shop Generator | `tap-jaffle-shop` | [MeltanoLabs/tap-jaffle-shop](https://github.com/MeltanoLabs/tap-jaffle-shop) | | Jotform | `tap-jotform` | [reservoir-data/tap-jotform](https://github.com/reservoir-data/tap-jotform) | | KiotViet | `tap-kiotviet` | [chienazazaz/tap-kiotviet](https://github.com/chienazazaz/tap-kiotviet) | | Klaviyo | `tap-klaviyo` | 
[hotgluexyz/tap-klaviyo](https://github.com/hotgluexyz/tap-klaviyo) | | Lever | `tap-lever` | [dbt-labs/tap-lever](https://github.com/dbt-labs/tap-lever) | | Mailchimp | `tap-mailchimp` | [lovepopcards/tap-mailchimp](https://github.com/lovepopcards/tap-mailchimp) | | Mailjet | `tap-mailjet` | [Somtom/tap-mailjet](https://github.com/Somtom/tap-mailjet) | | Megaphone | `tap-megaphone` | [yujoy/tap-megaphone](https://github.com/yujoy/tap-megaphone) | | Mercado Pago | `tap-mercadopago` | [a-rusi/tap-mercadopago](https://github.com/a-rusi/tap-mercadopago) | | Messagebird | `tap-messagebird` | [MeltanoLabs/tap-messagebird](https://github.com/MeltanoLabs/tap-messagebird) | | Microsoft Dataverse | `tap-dataverse` | [mjsqu/tap-dataverse](https://github.com/mjsqu/tap-dataverse) | | Microsoft Graph | `tap-ms-graph` | [Slalom-Consulting/tap-ms-graph](https://github.com/Slalom-Consulting/tap-ms-graph) | | Microsoft SQL Server | `tap-mssql` | [BuzzCutNorman/tap-mssql](https://github.com/BuzzCutNorman/tap-mssql) | | Miro | `tap-miro` | [Slalom-Consulting/tap-miro](https://github.com/Slalom-Consulting/tap-miro) | | MongoDB | `tap-mongodb` | [MeltanoLabs/tap-mongodb](https://github.com/MeltanoLabs/tap-mongodb) | | NASA | `tap-nasa` | [edgarrmondragon/tap-nasa](https://github.com/edgarrmondragon/tap-nasa) | | New Relic | `tap-newrelic` | [fixdauto/tap-newrelic](https://github.com/fixdauto/tap-newrelic) | | NHL Stats API | `tap-nhl` | [bicks-bapa-roob/tap-nhl](https://github.com/bicks-bapa-roob/tap-nhl) | | Open-Meteo | `tap-openmeteo` | [celine-eu/tap-openmeteo](https://github.com/celine-eu/tap-openmeteo) | | OpenProject | `tap-openproject` | [netspective-labs/tap-openproject](https://github.com/netspective-labs/tap-openproject) | | Oracle | `tap-oracle` | [Hamza-Bouali/tap-oracle](https://github.com/Hamza-Bouali/tap-oracle) | | Outbrain | `tap-outbrain` | [dbt-labs/tap-outbrain](https://github.com/dbt-labs/tap-outbrain) | | Parquet | `tap-parquet` | 
[AE-nv/tap-parquet](https://github.com/AE-nv/tap-parquet) | | Partnerize | `tap-partnerize` | [voxmedia/tap-partnerize](https://github.com/voxmedia/tap-partnerize) | | Partoo | `tap-partoo` | [GendarmerieNationale/tap-partoo](https://github.com/GendarmerieNationale/tap-partoo) | | Peloton | `tap-peloton` | [MeltanoLabs/tap-peloton](https://github.com/MeltanoLabs/tap-peloton) | | Pipedream | `tap-pipedream` | [edgarrmondragon/tap-pipedream](https://github.com/edgarrmondragon/tap-pipedream) | | PodBean | `tap-podbean` | [Slalom-Consulting/tap-podbean](https://github.com/Slalom-Consulting/tap-podbean) | | PowerBI | `tap-powerbi-metadata` | [dataops-tk/tap-powerbi-metadata](https://github.com/dataops-tk/tap-powerbi-metadata) | | Prometheus | `tap-prometheus` | [signal-ai/tap-prometheus](https://github.com/signal-ai/tap-prometheus) | | Pulumi Cloud | `tap-pulumi-cloud` | [MeltanoLabs/tap-pulumi-cloud](https://github.com/MeltanoLabs/tap-pulumi-cloud) | | Pushbullet | `tap-pushbullet` | [edgarrmondragon/tap-pushbullet](https://github.com/edgarrmondragon/tap-pushbullet) | | PxWeb API | `tap-pxwebapi` | [storebrand/tap-pxwebapi](https://github.com/storebrand/tap-pxwebapi) | | PyPI Stats | `tap-pypistats` | [edgarrmondragon/tap-pypistats](https://github.com/edgarrmondragon/tap-pypistats) | | Qualified | `tap-qualified` | [z3z1ma/tap-qualified](https://github.com/z3z1ma/tap-qualified) | | Quickbase | `tap-quickbase` | [MainspringEnergy/tap-quickbase-json](https://github.com/MainspringEnergy/tap-quickbase-json) | | Read the Docs | `tap-readthedocs` | [edgarrmondragon/tap-readthedocs](https://github.com/edgarrmondragon/tap-readthedocs) | | Recruitee | `tap-recruitee` | [rawwar/tap-recruitee](https://github.com/rawwar/tap-recruitee) | | Reddit Ads | `tap-redditads` | [Ella6882/tap-redditads](https://github.com/Ella6882/tap-redditads) | | Redshift | `tap-redshift` | [Monad-Inc/tap-redshift](https://github.com/Monad-Inc/tap-redshift) | | REST API | `tap-rest-api-msdk` | 
[Widen/tap-rest-api-msdk](https://github.com/Widen/tap-rest-api-msdk) | | Rick and Morty API | `tap-rickandmorty` | [clrcrl/tap-rickandmorty](https://github.com/clrcrl/tap-rickandmorty) | | SaasOptics | `tap-saasoptics` | [datarts-tech/tap-saasoptics](https://github.com/datarts-tech/tap-saasoptics) | | Salesloft | `tap-salesloft` | [MarkEstey/firehose-tap-salesloft](https://github.com/MarkEstey/firehose-tap-salesloft) | | Service Titan | `tap-service-titan` | [MeltanoLabs/tap-service-titan](https://github.com/MeltanoLabs/tap-service-titan) | | SharePoint Sites | `tap-sharepointsites` | [storebrand/tap-sharepointsites](https://github.com/storebrand/tap-sharepointsites) | | Shiphero | `tap-shiphero` | [definite-app/tap-shiphero](https://github.com/definite-app/tap-shiphero) | | Shopify (GraphQL) | `tap-shopify` | [sehnem/tap-shopify](https://github.com/sehnem/tap-shopify) | | Shortcut (formerly Clubhouse) | `tap-shortcut` | [edgarrmondragon/tap-shortcut](https://github.com/edgarrmondragon/tap-shortcut) | | Showpad | `tap-showpad` | [z3z1ma/tap-showpad](https://github.com/z3z1ma/tap-showpad) | | Slack | `tap-slack` | [MeltanoLabs/tap-slack](https://github.com/MeltanoLabs/tap-slack) | | Smartsheet | `tap-smartsheet` | [brooklyn-data/tap-smartsheet](https://github.com/brooklyn-data/tap-smartsheet) | | Socrata | `tap-socrata` | [MeltanoLabs/tap-socrata](https://github.com/MeltanoLabs/tap-socrata) | | Spreadsheets | `tap-spreadsheets` | [celine-eu/tap-spreadsheets](https://github.com/celine-eu/tap-spreadsheets) | | SSB Klass API | `tap-ssb-klass` | [storebrand/tap-ssb-klass](https://github.com/storebrand/tap-ssb-klass) | | StackExchange | `tap-stackexchange` | [MeltanoLabs/tap-stackexchange](https://github.com/MeltanoLabs/tap-stackexchange) | | Staffwise | `tap-staffwise` | [chartica/tap-staffwise](https://github.com/chartica/tap-staffwise) | | Strava | `tap-strava` | [dluftspring/tap-strava](https://github.com/dluftspring/tap-strava) | | Stripe | `tap-stripe` | 
[TicketSwap/tap-stripe](https://github.com/TicketSwap/tap-stripe) | | Substack | `tap-substack` | [tripleaceme/tap-substack](https://github.com/tripleaceme/tap-substack) | | Tempo | `tap-tempo` | [Broscorp-net/tap-tempo](https://github.com/Broscorp-net/tap-tempo) | | Tiktok Business | `tap-tiktok-business` | [hkuffel/tap-tiktok-business](https://github.com/hkuffel/tap-tiktok-business) | | Twitter | `tap-twitter` | [voxmedia/tap-twitter](https://github.com/voxmedia/tap-twitter) | | Typeform | `tap-typeform` | [albert-marrero/tap-typeform](https://github.com/albert-marrero/tap-typeform) | | Udemy for Business | `tap-udemy-for-business` | [immuta/tap-udemy-for-business](https://github.com/immuta/tap-udemy-for-business) | | Upwork | `tap-upwork` | [Automattic/tap-upwork](https://github.com/Automattic/tap-upwork) | | Userflow | `tap-userflow` | [kingalban/tap-userflow](https://github.com/kingalban/tap-userflow) | | Zendesk Sell | `tap-zendesk-sell` | [leag/tap-zendesk-sell](https://github.com/leag/tap-zendesk-sell) | | Zoom | `tap-zoom` | [robby-rob-slalom/tap-zoom](https://github.com/robby-rob-slalom/tap-zoom) | # Expressions > Reference for PlaidCloud expression functions — column aggregations, date math, string handling, and casting across Lakehouse v1 and v2. * [Lakehouse v1 Expressions](./lakehouse-v1/) — First generation of the PlaidCloud Lakehouse, based on Databend SQL functions * [Lakehouse v2 Expressions](./lakehouse-v2/) — Second generation of the PlaidCloud Lakehouse with Apache Iceberg open-table format, based on StarRocks 4.1 SQL functions ## Where to Look up Canonical Syntax [Section titled “Where to Look up Canonical Syntax”](#where-to-look-up-canonical-syntax) PlaidCloud Lakehouse uses the SQL function libraries from the underlying engines. For specifics on a function’s arguments, edge cases, and the most current behavior, consult the upstream docs alongside the PlaidCloud-flavored examples here. 
* **Lakehouse v1** → [Databend SQL function reference](https://docs.databend.com/sql/sql-functions/) * **Lakehouse v2** → [StarRocks SQL function reference](https://docs.starrocks.io/docs/sql-reference/sql-functions/) (PlaidCloud Lakehouse v2 tracks StarRocks 4.1) # Lakehouse v1 Expressions > Lakehouse v1 expressions based on Databend SQL functions with SQLAlchemy references using func. prefixes. Lakehouse v1 is built on the [Databend](https://databend.com/) SQL engine. For each function below, this site provides PlaidCloud-flavored syntax and examples; for the canonical upstream reference (with all edge cases and argument variants), see the **[Databend SQL function reference](https://docs.databend.com/sql/sql-functions/)**. ## Scalar Functions [Section titled “Scalar Functions”](#scalar-functions) * [Array Functions](./00-array-functions) — Perform array operations * [Bitwise Expression Functions](./01-bitmap-functions) — Perform bitwise operations and manipulations * [Conditional Expression Functions](./03-conditional-functions) — Implement conditional logic and case statements * [Context Functions](./15-context-functions) — Provide information about the current SQL execution context * [Conversion Functions](./02-conversion-functions) — Convert data types and cast values * [Date & Time Functions](./05-datetime-functions) — Manipulate and format dates and times * [Geospatial Functions](./09-geo-functions) — Handle and manipulate geospatial data * [Geometry Functions](./09-geometry-functions) — Handle and manipulate geospatial geometry data * [Interval Functions](./05-interval-functions) — Create and manipulate time intervals * [Map Functions](./10-map-functions) — Create and manipulate map data structures * [Numeric Functions](./04-numeric-functions) — Perform calculations and numeric operations * [Search Functions](./10-search-functions) — Find values using expressions * [Semi-structured and Structured Data Functions](./10-semi-structured-functions) — Work with 
JSON and other structured data formats * [String Functions](./06-string-functions) — Manipulate strings and perform regular expression operations ## Aggregate Functions [Section titled “Aggregate Functions”](#aggregate-functions) * [Aggregate Functions](./07-aggregate-functions) — Calculate summaries like sum, average, count, etc. * [Window Functions](./08-window-functions) — Provide aggregate calculations over a specified range of rows ## AI Functions [Section titled “AI Functions”](#ai-functions) * [AI Functions](./11-ai-functions) — Leverage AI and machine learning capabilities ## Specialized Functions [Section titled “Specialized Functions”](#specialized-functions) * [Hash Functions](./12-hash-functions) — Generate hash values for data security and comparison * [IP Address Functions](./14-ip-address-functions) — Manipulate and analyze IP address data * [UUID Functions](./13-uuid-functions) — Generate and manipulate UUIDs ## System and Table Functions [Section titled “System and Table Functions”](#system-and-table-functions) * [Sequence Functions](./18-sequence-functions) — Generate sequential values * [System Functions](./16-system-functions) — Access system-level information and perform control operations * [Table Functions](./17-table-functions) — Return results in a tabular format ## Other Functions [Section titled “Other Functions”](#other-functions) * [Dictionary Functions](./19-dictionary-functions) — Work with dictionary data structures * [Other Miscellaneous Functions](./20-other-functions) — A collection of various other functions * [Test Functions](./19-test-functions) — Functions used for testing purposes # Array Functions (Lakehouse v1) > Lakehouse v1 SQL array functions: build, query, transform, and aggregate array values. This section provides reference information for the array functions in PlaidCloud Lakehouse. # ARRAY_AGGREGATE (Lakehouse v1) > ARRAY_AGGREGATE — aggregates elements in the array with an aggregate function. 
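
As a rough mental model only, the plain-Python sketch below mimics how `ARRAY_AGGREGATE` picks an aggregate by name and applies it to the array. The helper `array_aggregate` and its lookup table are hypothetical illustrations, not PlaidCloud or Databend code. Only a handful of the supported aggregates are modeled, the name is assumed to be case-insensitive (the examples on this page pass both `'sum'` and `'SUM'`), and NULL handling is left out.

```python
import statistics

# Hypothetical lookup table: aggregate name -> plain-Python equivalent.
# Only a subset of the aggregates supported by the engine is modeled here.
_AGGREGATES = {
    "sum": sum,
    "avg": statistics.fmean,
    "min": min,
    "max": max,
    "count": len,
    "median": statistics.median,
}


def array_aggregate(values, agg_func):
    """Apply the named aggregate to a list, loosely mirroring ARRAY_AGGREGATE."""
    aggregate = _AGGREGATES.get(agg_func.lower())
    if aggregate is None:
        raise ValueError(f"unsupported aggregate in this sketch: {agg_func!r}")
    return aggregate(values)


print(array_aggregate([1, 2, 3, 4], "sum"))  # 10
print(array_aggregate([1, 2, 3, 4], "avg"))  # 2.5
```

The dictionary lookup plays the same role as the `array_sum` / `array_avg` shorthand: each shorthand is the general call with the aggregate name fixed.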
Aggregates elements in the array with an aggregate function. ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.array_aggregate( , '' ) ``` * Supported aggregate functions include `avg`, `count`, `max`, `min`, `sum`, `any`, `stddev_samp`, `stddev_pop`, `stddev`, `std`, `median`, `approx_count_distinct`, `kurtosis`, and `skewness`. * The syntax can be rewritten as `func.array_( )`. For example, `func.array_avg( )`. ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.array_aggregate([1, 2, 3, 4], 'sum'), func.array_sum([1, 2, 3, 4]) ┌──────────────────────────────────────────────────────────────────────────┐ │ func.array_aggregate([1, 2, 3, 4], 'sum') │ func.array_sum([1, 2, 3, 4])│ ├────────────────────────────────────────────┼─────────────────────────────┤ │ 10 │ 10 │ └──────────────────────────────────────────────────────────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql ARRAY_AGGREGATE( , '' ) ``` * Supported aggregate functions include `avg`, `count`, `max`, `min`, `sum`, `any`, `stddev_samp`, `stddev_pop`, `stddev`, `std`, `median`, `approx_count_distinct`, `kurtosis`, and `skewness`. * The syntax can be rewritten as `ARRAY_( )`. For example, `ARRAY_AVG( )`. ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT ARRAY_AGGREGATE([1, 2, 3, 4], 'SUM'), ARRAY_SUM([1, 2, 3, 4]); ┌────────────────────────────────────────────────────────────────┐ │ array_aggregate([1, 2, 3, 4], 'sum') │ array_sum([1, 2, 3, 4]) │ ├──────────────────────────────────────┼─────────────────────────┤ │ 10 │ 10 │ └────────────────────────────────────────────────────────────────┘ ``` # ARRAY_APPEND (Lakehouse v1) > ARRAY_APPEND — appends an element to the array. Appends an element to the end of the array.
## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.array_append( , ) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.array_append([3, 4], 5) ┌──────────────────────────────┐ │ func.array_append([3, 4], 5) │ ├──────────────────────────────┤ │ [3,4,5] │ └──────────────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql ARRAY_APPEND( , ) ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT ARRAY_APPEND([3, 4], 5); ┌─────────────────────────┐ │ array_append([3, 4], 5) │ ├─────────────────────────┤ │ [3,4,5] │ └─────────────────────────┘ ``` # ARRAY_APPLY (Lakehouse v1) > ARRAY_APPLY — alias for the ARRAY_TRANSFORM array function. Alias for [ARRAY\_TRANSFORM](../array-transform). # ARRAY_CONCAT (Lakehouse v1) > ARRAY_CONCAT — concatenates two arrays. Concatenates two arrays. ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.array_concat( , ) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.array_concat([1, 2], [3, 4]) ┌────────────────────────────────────┐ │ func.array_concat([1, 2], [3, 4]) │ ├────────────────────────────────────┤ │ [1,2,3,4] │ └────────────────────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql ARRAY_CONCAT( , ) ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT ARRAY_CONCAT([1, 2], [3, 4]); ┌──────────────────────────────┐ │ array_concat([1, 2], [3, 4]) │ ├──────────────────────────────┤ │ [1,2,3,4] │ └──────────────────────────────┘ ``` # ARRAY_CONTAINS (Lakehouse v1) > ARRAY_CONTAINS — alias for the CONTAINS array function. Alias for [CONTAINS](../contains). # ARRAY_DISTINCT (Lakehouse v1) > ARRAY_DISTINCT — removes all duplicates and NULLs from the array without preserving the original order. Removes all duplicates and NULLs from the array without preserving the original order.
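As a rough plain-Python analogy (not PlaidCloud or Databend code), the behavior described above resembles building a set from the non-null elements. The helper name `array_distinct_sketch` is hypothetical, `None` stands in for NULL, and the result order is deliberately left unspecified:

```python
def array_distinct_sketch(values):
    """Drop None (standing in for NULL) and duplicates; order is not guaranteed."""
    return list({v for v in values if v is not None})


result = array_distinct_sketch([1, 2, 2, None, 4, 3])
# Sort only for display, since the sketch makes no promise about order.
print(sorted(result))
```

Because the order is not preserved, downstream logic that needs a stable order should sort the result explicitly.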
## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.array_distinct( ) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.array_distinct([1, 2, 2, 4, 3]) ┌───────────────────────────────────────┐ │ func.array_distinct([1, 2, 2, 4, 3]) │ ├───────────────────────────────────────┤ │ [1,2,4,3] │ └───────────────────────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql ARRAY_DISTINCT( ) ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT ARRAY_DISTINCT([1, 2, 2, 4, 3]); ┌─────────────────────────────────┐ │ array_distinct([1, 2, 2, 4, 3]) │ ├─────────────────────────────────┤ │ [1,2,4,3] │ └─────────────────────────────────┘ ``` # ARRAY_FILTER (Lakehouse v1) > ARRAY_FILTER — constructs an array from those elements of the input array for which the lambda. Constructs an array from those elements of the input array for which the lambda function returns true. ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.array_filter( , ) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.array_filter([1, 2, 3], x -> (x > 1)) ┌─────────────────────────────────────────────┐ │ func.array_filter([1, 2, 3], x -> (x > 1)) │ ├─────────────────────────────────────────────┤ │ [2,3] │ └─────────────────────────────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql ARRAY_FILTER( , ) ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT ARRAY_FILTER([1, 2, 3], x -> x > 1); ┌───────────────────────────────────────┐ │ array_filter([1, 2, 3], x -> (x > 1)) │ ├───────────────────────────────────────┤ │ [2,3] │ └───────────────────────────────────────┘ ``` # ARRAY_FLATTEN (Lakehouse v1) > ARRAY_FLATTEN — flattens nested arrays, converting them into a single-level array. 
Flattens nested arrays, converting them into a single-level array. ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.array_flatten( ) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.array_flatten([[1, 2], [3, 4, 5]]) ┌──────────────────────────────────────────┐ │ func.array_flatten([[1, 2], [3, 4, 5]]) │ ├──────────────────────────────────────────┤ │ [1,2,3,4,5] │ └──────────────────────────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql ARRAY_FLATTEN( ) ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT ARRAY_FLATTEN([[1,2], [3,4,5]]); ┌────────────────────────────────────┐ │ array_flatten([[1, 2], [3, 4, 5]]) │ ├────────────────────────────────────┤ │ [1,2,3,4,5] │ └────────────────────────────────────┘ ``` # ARRAY_GET (Lakehouse v1) > ARRAY_GET — alias for the GET array function. Alias for [GET](../get). # ARRAY_INDEXOF (Lakehouse v1) > ARRAY_INDEXOF — returns the index(1-based) of an element if the array contains the element. Returns the index(1-based) of an element if the array contains the element. ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.array_indexof( , ) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.array_indexof([1, 2, 9], 9) ┌───────────────────────────────────┐ │ func.array_indexof([1, 2, 9], 9) │ ├───────────────────────────────────┤ │ 3 │ └───────────────────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql ARRAY_INDEXOF( , ) ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT ARRAY_INDEXOF([1, 2, 9], 9); ┌─────────────────────────────┐ │ array_indexof([1, 2, 9], 9) │ ├─────────────────────────────┤ │ 3 │ └─────────────────────────────┘ ``` # ARRAY_LENGTH (Lakehouse v1) > ARRAY_LENGTH — returns the length of an array. Returns the length of an array. 
## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.array_length( ) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.array_length([1, 2]) ┌────────────────────────────┐ │ func.array_length([1, 2]) │ ├────────────────────────────┤ │ 2 │ └────────────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql ARRAY_LENGTH( ) ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT ARRAY_LENGTH([1, 2]); ┌──────────────────────┐ │ array_length([1, 2]) │ ├──────────────────────┤ │ 2 │ └──────────────────────┘ ``` # ARRAY_PREPEND (Lakehouse v1) > ARRAY_PREPEND — prepends an element to the array. Prepends an element to the array. ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.array_prepend( , ) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.array_prepend(1, [3, 4]) ┌────────────────────────────────┐ │ func.array_prepend(1, [3, 4]) │ ├────────────────────────────────┤ │ [1,3,4] │ └────────────────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql ARRAY_PREPEND( , ) ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT ARRAY_PREPEND(1, [3, 4]); ┌──────────────────────────┐ │ array_prepend(1, [3, 4]) │ ├──────────────────────────┤ │ [1,3,4] │ └──────────────────────────┘ ``` # ARRAY_REDUCE (Lakehouse v1) > ARRAY_REDUCE — applies iteratively the lambda function to the elements of the array, so as to reduce the array to a single value. Applies iteratively the lambda function to the elements of the array, so as to reduce the array to a single value. 
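The reduction described above works like a left fold. The following plain-Python sketch is only an analogy built on the standard library's `functools.reduce`; the helper name `array_reduce_sketch` is an assumption for illustration and is not part of PlaidCloud:

```python
from functools import reduce


def array_reduce_sketch(values, fn):
    """Fold the values left to right with a two-argument function."""
    return reduce(fn, values)


# Mirrors the (x, y) -> (x + y) lambda used in the examples on this page.
total = array_reduce_sketch([1, 2, 3, 4], lambda x, y: x + y)
print(total)  # 10
```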
## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.array_reduce( , ) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.array_reduce([1, 2, 3, 4], (x, y) -> (x + y)) ┌─────────────────────────────────────────────────────┐ │ func.array_reduce([1, 2, 3, 4], (x, y) -> (x + y)) │ ├─────────────────────────────────────────────────────┤ │ 10 │ └─────────────────────────────────────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql ARRAY_REDUCE( , ) ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT ARRAY_REDUCE([1, 2, 3, 4], (x,y) -> x + y); ┌───────────────────────────────────────────────┐ │ array_reduce([1, 2, 3, 4], (x, y) -> (x + y)) │ ├───────────────────────────────────────────────┤ │ 10 │ └───────────────────────────────────────────────┘ ``` # ARRAY_REMOVE_FIRST (Lakehouse v1) > ARRAY_REMOVE_FIRST — Removes the first element from the array. Removes the first element from the array. ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.array_remove_first( ) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.array_remove_first([1, 2, 3]) ┌─────────────────────────────────────┐ │ func.array_remove_first([1, 2, 3]) │ ├─────────────────────────────────────┤ │ [2,3] │ └─────────────────────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql ARRAY_REMOVE_FIRST( ) ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT ARRAY_REMOVE_FIRST([1, 2, 3]); ┌───────────────────────────────┐ │ array_remove_first([1, 2, 3]) │ ├───────────────────────────────┤ │ [2,3] │ └───────────────────────────────┘ ``` # ARRAY_REMOVE_LAST (Lakehouse v1) > ARRAY_REMOVE_LAST — Removes the last element from the array. Removes the last element from the array. 
## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.array_remove_last( ) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.array_remove_last([1, 2, 3]) ┌────────────────────────────────────┐ │ func.array_remove_last([1, 2, 3]) │ ├────────────────────────────────────┤ │ [1,2] │ └────────────────────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql ARRAY_REMOVE_LAST( ) ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT ARRAY_REMOVE_LAST([1, 2, 3]); ┌──────────────────────────────┐ │ array_remove_last([1, 2, 3]) │ ├──────────────────────────────┤ │ [1,2] │ └──────────────────────────────┘ ``` # ARRAY_SIZE (Lakehouse v1) > ARRAY_SIZE — alias for the ARRAY_LENGTH array function. Alias for [ARRAY\_LENGTH](../array-length). # ARRAY_SLICE (Lakehouse v1) > ARRAY_SLICE — alias for the SLICE array function. Alias for [SLICE](../slice). # ARRAY_SORT (Lakehouse v1) > ARRAY_SORT — Sorts elements in the array in ascending order. Sorts elements in the array, in ascending order by default. ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.array_sort( [, , ] ) ``` | Parameter | Default | Description | | ------------ | ----------- | ---------------------------------------------------------------------------------------------------------------------------------------------- | | order | ASC | Specifies the sorting order as either ascending (ASC) or descending (DESC). | | nullposition | NULLS FIRST | Determines the position of NULL values in the sorting result, at the beginning (NULLS FIRST) or at the end (NULLS LAST) of the sorting output. |
## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.array_sort([1, 4, 3, 2]) ┌────────────────────────────────┐ │ func.array_sort([1, 4, 3, 2]) │ ├────────────────────────────────┤ │ [1,2,3,4] │ └────────────────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql ARRAY_SORT( [, , ] ) ``` | Parameter | Default | Description | | ------------ | ----------- | ---------------------------------------------------------------------------------------------------------------------------------------------- | | order | ASC | Specifies the sorting order as either ascending (ASC) or descending (DESC). | | nullposition | NULLS FIRST | Determines the position of NULL values in the sorting result, at the beginning (NULLS FIRST) or at the end (NULLS LAST) of the sorting output. | ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT ARRAY_SORT([1, 4, 3, 2]); ┌──────────────────────────┐ │ array_sort([1, 4, 3, 2]) │ ├──────────────────────────┤ │ [1,2,3,4] │ └──────────────────────────┘ ``` # ARRAY_TO_STRING (Lakehouse v1) > ARRAY_TO_STRING — concatenates elements of an array into a single string, using a specified separator. Concatenates elements of an array into a single string, using a specified separator.
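For intuition, joining with a separator behaves much like Python's `str.join`. This is a plain-Python sketch rather than PlaidCloud code; `array_to_string_sketch` is a hypothetical helper, and it does not attempt to model how the engine treats NULL elements:

```python
def array_to_string_sketch(values, separator):
    """Join the string form of each element with the given separator."""
    return separator.join(str(v) for v in values)


joined = array_to_string_sketch(["Apple", "Banana", "Cherry"], ", ")
print(joined)  # Apple, Banana, Cherry
```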
## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.array_to_string( , '' ) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.array_to_string(['Apple', 'Banana', 'Cherry'], ', ') ┌────────────────────────────────────────────────────────────┐ │ func.array_to_string(['Apple', 'Banana', 'Cherry'], ', ') │ ├────────────────────────────────────────────────────────────┤ │ Apple, Banana, Cherry │ └────────────────────────────────────────────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql ARRAY_TO_STRING( , '' ) ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT ARRAY_TO_STRING(['Apple', 'Banana', 'Cherry'], ', '); ┌──────────────────────────────────────────────────────┐ │ array_to_string(['apple', 'banana', 'cherry'], ', ') │ ├──────────────────────────────────────────────────────┤ │ Apple, Banana, Cherry │ └──────────────────────────────────────────────────────┘ ``` # ARRAY_TRANSFORM (Lakehouse v1) > ARRAY_TRANSFORM — returns an array that is the result of applying the lambda function to each element. Returns an array that is the result of applying the lambda function to each element of the input array.
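The element-wise mapping described above is comparable to a list comprehension. The sketch below is plain Python for illustration only; `array_transform_sketch` is a hypothetical name, not a PlaidCloud function:

```python
def array_transform_sketch(values, fn):
    """Apply fn to every element and collect the results in order."""
    return [fn(x) for x in values]


# Mirrors the x -> (x + 1) lambda used in the examples on this page.
incremented = array_transform_sketch([1, 2, 3], lambda x: x + 1)
print(incremented)  # [2, 3, 4]
```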
## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.array_transform( , ) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.array_transform([1, 2, 3], x -> (x + 1)) ┌───────────────────────────────────────────────┐ │ func.array_transform([1, 2, 3], x -> (x + 1)) │ ├───────────────────────────────────────────────┤ │ [2,3,4] │ └───────────────────────────────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql ARRAY_TRANSFORM( , ) ``` ## Aliases [Section titled “Aliases”](#aliases) * [ARRAY\_APPLY](../array-apply) ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT ARRAY_TRANSFORM([1, 2, 3], x -> x + 1); ┌──────────────────────────────────────────┐ │ array_transform([1, 2, 3], x -> (x + 1)) │ ├──────────────────────────────────────────┤ │ [2,3,4] │ └──────────────────────────────────────────┘ ``` # ARRAY_UNIQUE (Lakehouse v1) > ARRAY_UNIQUE — Counts unique elements in the array (except NULL). Counts unique elements in the array (except NULL). ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.array_unique( ) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.array_unique([1, 2, 3, 3, 4]) ┌─────────────────────────────────────┐ │ func.array_unique([1, 2, 3, 3, 4]) │ ├─────────────────────────────────────┤ │ 4 │ └─────────────────────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql ARRAY_UNIQUE( ) ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT ARRAY_UNIQUE([1, 2, 3, 3, 4]); ┌───────────────────────────────┐ │ array_unique([1, 2, 3, 3, 4]) │ ├───────────────────────────────┤ │ 4 │ └───────────────────────────────┘ ``` # ARRAYS_ZIP (Lakehouse v1) > ARRAYS_ZIP — Merges multiple arrays into a single array tuple. Merges multiple arrays into a single array tuple. 
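Zipping pairs up elements by position, so every input array needs the same length. The following plain-Python sketch shows the idea; `arrays_zip_sketch` is a hypothetical helper, and raising `ValueError` on mismatched lengths is an assumption of this sketch rather than documented engine behavior:

```python
def arrays_zip_sketch(*arrays):
    """Combine same-length arrays into a list of positional tuples."""
    if len({len(a) for a in arrays}) > 1:
        # Assumption: the sketch simply rejects inputs of differing lengths.
        raise ValueError("all arrays must have the same length")
    return list(zip(*arrays))


pairs = arrays_zip_sketch([1, 2, 3], ["a", "b", "c"])
print(pairs)  # [(1, 'a'), (2, 'b'), (3, 'c')]
```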
## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.arrays_zip( [, ...] ) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.arrays_zip([1, 2, 3], ['a', 'b', 'c']) ┌──────────────────────────────────────────────┐ │ func.arrays_zip([1, 2, 3], ['a', 'b', 'c']) │ ├──────────────────────────────────────────────┤ │ [(1,'a'),(2,'b'),(3,'c')] │ └──────────────────────────────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql ARRAYS_ZIP( [, ...] ) ``` ## Arguments [Section titled “Arguments”](#arguments) | Arguments | Description | | ---------- | ----------------- | | `` | The input ARRAYs. | Note * The length of each array must be the same. ## Return Type [Section titled “Return Type”](#return-type) Array(Tuple). ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT ARRAYS_ZIP([1, 2, 3], ['a', 'b', 'c']); ┌────────────────────────────────────────┐ │ arrays_zip([1, 2, 3], ['a', 'b', 'c']) │ ├────────────────────────────────────────┤ │ [(1,'a'),(2,'b'),(3,'c')] │ └────────────────────────────────────────┘ ``` # CONTAINS (Lakehouse v1) > CONTAINS — Checks if the array contains a specific element. Checks if the array contains a specific element. 
## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.contains( , ) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.contains([1, 2], 1) ┌───────────────────────────┐ │ func.contains([1, 2], 1) │ ├───────────────────────────┤ │ true │ └───────────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql CONTAINS( , ) ``` ## Aliases [Section titled “Aliases”](#aliases) * [ARRAY\_CONTAINS](../array-contains) ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT ARRAY_CONTAINS([1, 2], 1), CONTAINS([1, 2], 1); ┌─────────────────────────────────────────────────┐ │ array_contains([1, 2], 1) │ contains([1, 2], 1) │ ├───────────────────────────┼─────────────────────┤ │ true │ true │ └─────────────────────────────────────────────────┘ ``` # GET (Array, Lakehouse v1) > GET — Returns an element from an array by index (1-based). Returns an element from an array by index (1-based). ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.get( , ) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.get([1, 2], 2) ┌─────────────────────┐ │ func.get([1, 2], 2) │ ├─────────────────────┤ │ 2 │ └─────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql GET( , ) ``` ## Aliases [Section titled “Aliases”](#aliases) * [ARRAY\_GET](../array-get) ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT GET([1, 2], 2), ARRAY_GET([1, 2], 2); ┌───────────────────────────────────────┐ │ get([1, 2], 2) │ array_get([1, 2], 2) │ ├────────────────┼──────────────────────┤ │ 2 │ 2 │ └───────────────────────────────────────┘ ``` # RANGE (Lakehouse v1) > RANGE — Returns an array collected by [start, end). Returns an array collected by \[start, end). 
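The interval `[start, end)` is half-open: the start value is included and the end value is excluded. Python's built-in `range` follows the same convention, so it serves as a quick analogy (plain Python, not PlaidCloud code):

```python
# Half-open interval: 1 is included, 5 is excluded.
values = list(range(1, 5))
print(values)  # [1, 2, 3, 4]
```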
## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.range( , ) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.range(1, 5) ┌────────────────────┐ │ func.range(1, 5) │ ├────────────────────┤ │ [1,2,3,4] │ └────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql RANGE( , ) ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT RANGE(1, 5); ┌───────────────┐ │ range(1, 5) │ ├───────────────┤ │ [1,2,3,4] │ └───────────────┘ ``` # SLICE (Lakehouse v1) > SLICE — Extracts a slice from the array by index (1-based). Extracts a slice from the array by index (1-based). ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.slice( , [, ] ) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.slice([1, 21, 32, 4], 2, 3) ┌──────────────────────────────────┐ │ func.slice([1, 21, 32, 4], 2, 3) │ ├──────────────────────────────────┤ │ [21,32] │ └──────────────────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql SLICE( , [, ] ) ``` ## Aliases [Section titled “Aliases”](#aliases) * [ARRAY\_SLICE](../array-slice) ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT ARRAY_SLICE([1, 21, 32, 4], 2, 3), SLICE([1, 21, 32, 4], 2, 3); ┌─────────────────────────────────────────────────────────────────┐ │ array_slice([1, 21, 32, 4], 2, 3) │ slice([1, 21, 32, 4], 2, 3) │ ├───────────────────────────────────┼─────────────────────────────┤ │ [21,32] │ [21,32] │ └─────────────────────────────────────────────────────────────────┘ ``` # UNNEST (Lakehouse v1) > UNNEST — Unnests the array and returns the set of elements. Unnests the array and returns the set of elements.
## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.unnest( ) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.unnest([1, 2]) ┌──────────────────────┐ │ func.unnest([1, 2]) │ ├──────────────────────┤ │ 1 │ │ 2 │ └──────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql UNNEST( ) ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT UNNEST([1, 2]); ┌─────────────────┐ │ unnest([1, 2]) │ ├─────────────────┤ │ 1 │ │ 2 │ └─────────────────┘ -- UNNEST(array) can be used as a table function. SELECT * FROM UNNEST([1, 2]); ┌─────────────────┐ │ value │ ├─────────────────┤ │ 1 │ │ 2 │ └─────────────────┘ ``` ## A Practical Example [Section titled “A Practical Example”](#a-practical-example) The examples below use a table called contacts whose phones column is defined as an array of text. ```sql CREATE TABLE contacts ( id SERIAL PRIMARY KEY, name VARCHAR (100), phones TEXT [] ); ``` The phones column is a one-dimensional array that holds the phone numbers a contact may have. To define a multidimensional array, add one pair of square brackets per dimension. For example, you can define a two-dimensional array as follows: ```sql column_name data_type [][] ``` The following statement inserts data into that table: ```sql INSERT INTO contacts (name, phones) VALUES('John Doe',ARRAY [ '(408)-589-5846','(408)-589-5555' ]); ``` or ```sql INSERT INTO contacts (name, phones) VALUES('Lily Bush','{"(408)-589-5841"}'), ('William Gate','{"(408)-589-5842","(408)-589-5843"}'); ``` The unnest() function expands an array to a list of rows. For example, the following query expands all phone numbers of the phones array.
```sql SELECT name, unnest(phones) FROM contacts; ``` Output: | name | unnest | | ------------ | -------------- | | John Doe | (408)-589-5846 | | John Doe | (408)-589-5555 | | Lily Bush | (408)-589-5841 | | William Gate | (408)-589-5842 | | William Gate | (408)-589-5843 | # Bitmap Functions (Lakehouse v1) > Lakehouse v1 SQL bitmap functions: build and operate on roaring bitmap values for fast set arithmetic. This section provides reference information for the bitmap functions in PlaidCloud Lakehouse. # BITMAP_AND (Lakehouse v1) > BITMAP_AND — Performs a bitwise AND operation on the two bitmaps. Performs a bitwise AND operation on the two bitmaps. ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.bitmap_and( , ) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.cast(func.bitmap_and(func.build_bitmap([1, 4, 5]), func.build_bitmap([4, 5])), string) ┌─────────────────────────────────────────────────────────────────────────────────────────────┐ │ func.cast(func.bitmap_and(func.build_bitmap([1, 4, 5]), func.build_bitmap([4, 5])), string) │ ├─────────────────────────────────────────────────────────────────────────────────────────────┤ │ 4,5 │ └─────────────────────────────────────────────────────────────────────────────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql BITMAP_AND( , ) ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT BITMAP_AND(BUILD_BITMAP([1,4,5]), BUILD_BITMAP([4,5]))::String; ┌───────────────────────────────────────────────────────────────────┐ │ bitmap_and(build_bitmap([1, 4, 5]), build_bitmap([4, 5]))::string │ ├───────────────────────────────────────────────────────────────────┤ │ 4,5 │ └───────────────────────────────────────────────────────────────────┘ ``` # BITMAP_AND_COUNT (Lakehouse v1) > BITMAP_AND_COUNT — counts the number of bits set to 1 in the bitmap by performing a logical AND operation.
Counts the number of bits set to 1 in the bitmap by performing a logical AND operation. ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.bitmap_and_count( ) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.bitmap_and_count(to_bitmap('1, 3, 5')) ┌─────────────────────────────────────────────┐ │ func.bitmap_and_count(to_bitmap('1, 3, 5')) │ ├─────────────────────────────────────────────┤ │ 3 │ └─────────────────────────────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql BITMAP_AND_COUNT( ) ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT BITMAP_AND_COUNT(TO_BITMAP('1, 3, 5')); ┌────────────────────────────────────────┐ │ bitmap_and_count(to_bitmap('1, 3, 5')) │ ├────────────────────────────────────────┤ │ 3 │ └────────────────────────────────────────┘ ``` # BITMAP_AND_NOT (Lakehouse v1) > BITMAP_AND_NOT — alias for the BITMAP_NOT bitmap function. Alias for [BITMAP\_NOT](../bitmap-not). # BITMAP_CARDINALITY (Lakehouse v1) > BITMAP_CARDINALITY — alias for the BITMAP_COUNT bitmap function. Reference. Alias for [BITMAP\_COUNT](../bitmap-count). # BITMAP_CONTAINS (Lakehouse v1) > BITMAP_CONTAINS — Checks if the bitmap contains a specific value. Checks if the bitmap contains a specific value. 
## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.bitmap_contains( , ) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.bitmap_contains(build_bitmap([1, 4, 5]), 1) ┌───────────────────────────────────────────────────┐ │ func.bitmap_contains(build_bitmap([1, 4, 5]), 1) │ ├───────────────────────────────────────────────────┤ │ true │ └───────────────────────────────────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql BITMAP_CONTAINS( , ) ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT BITMAP_CONTAINS(BUILD_BITMAP([1,4,5]), 1); ┌─────────────────────────────────────────────┐ │ bitmap_contains(build_bitmap([1, 4, 5]), 1) │ ├─────────────────────────────────────────────┤ │ true │ └─────────────────────────────────────────────┘ ``` # BITMAP_COUNT (Lakehouse v1) > BITMAP_COUNT — Counts the number of bits set to 1 in the bitmap. Counts the number of bits set to 1 in the bitmap. 
## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.bitmap_count( ) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.bitmap_count(build_bitmap([1, 4, 5])) ┌────────────────────────────────────────────┐ │ func.bitmap_count(build_bitmap([1, 4, 5])) │ ├────────────────────────────────────────────┤ │ 3 │ └────────────────────────────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql BITMAP_COUNT( ) ``` ## Aliases [Section titled “Aliases”](#aliases) * [BITMAP\_CARDINALITY](../bitmap-cardinality) ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT BITMAP_COUNT(BUILD_BITMAP([1,4,5])), BITMAP_CARDINALITY(BUILD_BITMAP([1,4,5])); ┌─────────────────────────────────────────────────────────────────────────────────────┐ │ bitmap_count(build_bitmap([1, 4, 5])) │ bitmap_cardinality(build_bitmap([1, 4, 5])) │ ├───────────────────────────────────────┼─────────────────────────────────────────────┤ │ 3 │ 3 │ └─────────────────────────────────────────────────────────────────────────────────────┘ ``` # BITMAP_HAS_ALL (Lakehouse v1) > BITMAP_HAS_ALL — checks if the first bitmap contains all the bits in the second bitmap. Checks if the first bitmap contains all the bits in the second bitmap. 
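If each bitmap is thought of as a set of integers, the check described above is a subset test. The sketch below uses Python sets purely as an analogy; it is not PlaidCloud code and `bitmap_has_all_sketch` is a hypothetical name:

```python
def bitmap_has_all_sketch(first, second):
    """True when every value in second is also present in first."""
    return set(second) <= set(first)


# 2 is missing from the first collection, so the check fails.
has_all = bitmap_has_all_sketch([1, 4, 5], [1, 2])
print(has_all)  # False
```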
## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.bitmap_has_all( , ) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.bitmap_has_all(build_bitmap([1, 4, 5]), build_bitmap([1, 2])) ┌─────────────────────────────────────────────────────────────────────┐ │ func.bitmap_has_all(build_bitmap([1, 4, 5]), build_bitmap([1, 2])) │ ├─────────────────────────────────────────────────────────────────────┤ │ false │ └─────────────────────────────────────────────────────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql BITMAP_HAS_ALL( , ) ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT BITMAP_HAS_ALL(BUILD_BITMAP([1,4,5]), BUILD_BITMAP([1,2])); ┌───────────────────────────────────────────────────────────────┐ │ bitmap_has_all(build_bitmap([1, 4, 5]), build_bitmap([1, 2])) │ ├───────────────────────────────────────────────────────────────┤ │ false │ └───────────────────────────────────────────────────────────────┘ ``` # BITMAP_HAS_ANY (Lakehouse v1) > BITMAP_HAS_ANY — checks if the first bitmap has any bit matching the bits in the second bitmap. Checks if the first bitmap has any bit matching the bits in the second bitmap. 
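Continuing the set analogy, this check asks whether the two collections overlap at all. The plain-Python sketch below is illustrative only, and `bitmap_has_any_sketch` is a hypothetical name:

```python
def bitmap_has_any_sketch(first, second):
    """True when the two collections share at least one value."""
    return not set(first).isdisjoint(second)


# The value 1 appears in both collections, so the check succeeds.
has_any = bitmap_has_any_sketch([1, 4, 5], [1, 2])
print(has_any)  # True
```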
## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.bitmap_has_any( , ) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.bitmap_has_any(func.build_bitmap([1, 4, 5]), func.build_bitmap([1, 2])) ┌───────────────────────────────────────────────────────────────────────────────┐ │ func.bitmap_has_any(func.build_bitmap([1, 4, 5]), func.build_bitmap([1, 2])) │ ├───────────────────────────────────────────────────────────────────────────────┤ │ true │ └───────────────────────────────────────────────────────────────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql BITMAP_HAS_ANY( , ) ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT BITMAP_HAS_ANY(BUILD_BITMAP([1,4,5]), BUILD_BITMAP([1,2])); ┌───────────────────────────────────────────────────────────────┐ │ bitmap_has_any(build_bitmap([1, 4, 5]), build_bitmap([1, 2])) │ ├───────────────────────────────────────────────────────────────┤ │ true │ └───────────────────────────────────────────────────────────────┘ ``` # BITMAP_INTERSECT (Lakehouse v1) > BITMAP_INTERSECT — counts the number of bits set to 1 in the bitmap by performing a logical. Counts the number of bits set to 1 in the bitmap by performing a logical INTERSECT operation. 
## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.bitmap_intersect( ) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.bitmap_intersect(func.to_bitmap('1, 3, 5')) ┌──────────────────────────────────────────────────┐ │ func.bitmap_intersect(func.to_bitmap('1, 3, 5')) │ ├──────────────────────────────────────────────────┤ │ 1,3,5 │ └──────────────────────────────────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql BITMAP_INTERSECT( ) ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT BITMAP_INTERSECT(TO_BITMAP('1, 3, 5'))::String; ┌────────────────────────────────────────────────┐ │ bitmap_intersect(to_bitmap('1, 3, 5'))::string │ ├────────────────────────────────────────────────┤ │ 1,3,5 │ └────────────────────────────────────────────────┘ ``` # BITMAP_MAX (Lakehouse v1) > BITMAP_MAX — Gets the maximum value in the bitmap. Gets the maximum value in the bitmap. ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.bitmap_max( ) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.bitmap_max(func.build_bitmap([1, 4, 5])) ┌───────────────────────────────────────────────┐ │ func.bitmap_max(func.build_bitmap([1, 4, 5])) │ ├───────────────────────────────────────────────┤ │ 5 │ └───────────────────────────────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql BITMAP_MAX( ) ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT BITMAP_MAX(BUILD_BITMAP([1,4,5])); ┌─────────────────────────────────────┐ │ bitmap_max(build_bitmap([1, 4, 5])) │ ├─────────────────────────────────────┤ │ 5 │ └─────────────────────────────────────┘ ``` # BITMAP_MIN (Lakehouse v1) > BITMAP_MIN — Gets the minimum value in the bitmap. Gets the minimum value in the bitmap. 
## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.bitmap_min( ) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.bitmap_min(func.build_bitmap([1, 4, 5])) ┌───────────────────────────────────────────────┐ │ func.bitmap_min(func.build_bitmap([1, 4, 5])) │ ├───────────────────────────────────────────────┤ │ 1 │ └───────────────────────────────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql BITMAP_MIN( ) ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT BITMAP_MIN(BUILD_BITMAP([1,4,5])); ┌─────────────────────────────────────┐ │ bitmap_min(build_bitmap([1, 4, 5])) │ ├─────────────────────────────────────┤ │ 1 │ └─────────────────────────────────────┘ ``` # BITMAP_NOT (Lakehouse v1) > BITMAP_NOT — generates a new bitmap with elements from the first bitmap that are not in the second one. Generates a new bitmap with elements from the first bitmap that are not in the second one. 
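As a rough mental model, this behaves like a set difference. The sketch below uses plain Python sets rather than the Lakehouse engine, and the helper name `bitmap_not` is a hypothetical stand-in chosen to mirror the documented function.

```python
def bitmap_not(first, second):
    """Model: values of `first` that do not appear in `second`, in ascending order."""
    return sorted(set(first) - set(second))


# Same inputs as the documented example: 5 is removed because it is in both bitmaps.
print(bitmap_not([1, 4, 5], [5, 6, 7]))  # [1, 4]
```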
## Analyze Syntax

[Section titled “Analyze Syntax”](#analyze-syntax)

```python
func.bitmap_not( , )
```

## Analyze Examples

[Section titled “Analyze Examples”](#analyze-examples)

```python
func.cast(func.bitmap_not(func.build_bitmap([1, 4, 5]), func.build_bitmap([5, 6, 7])), Text)

┌──────────────────────────────────────────────────────────────────────────────────────────────┐
│ func.cast(func.bitmap_not(func.build_bitmap([1, 4, 5]), func.build_bitmap([5, 6, 7])), Text) │
├──────────────────────────────────────────────────────────────────────────────────────────────┤
│ 1,4                                                                                          │
└──────────────────────────────────────────────────────────────────────────────────────────────┘
```

## SQL Syntax

[Section titled “SQL Syntax”](#sql-syntax)

```sql
BITMAP_NOT( , )
```

## Aliases

[Section titled “Aliases”](#aliases)

* [BITMAP\_AND\_NOT](../bitmap-and-not)

## SQL Examples

[Section titled “SQL Examples”](#sql-examples)

```sql
SELECT BITMAP_NOT(BUILD_BITMAP([1,4,5]), BUILD_BITMAP([5,6,7]))::String;

┌──────────────────────────────────────────────────────────────────────┐
│ bitmap_not(build_bitmap([1, 4, 5]), build_bitmap([5, 6, 7]))::string │
├──────────────────────────────────────────────────────────────────────┤
│ 1,4                                                                  │
└──────────────────────────────────────────────────────────────────────┘

SELECT BITMAP_AND_NOT(BUILD_BITMAP([1,4,5]), BUILD_BITMAP([5,6,7]))::String;

┌──────────────────────────────────────────────────────────────────────────┐
│ bitmap_and_not(build_bitmap([1, 4, 5]), build_bitmap([5, 6, 7]))::string │
├──────────────────────────────────────────────────────────────────────────┤
│ 1,4                                                                      │
└──────────────────────────────────────────────────────────────────────────┘
```

# BITMAP_NOT_COUNT (Lakehouse v1)

> BITMAP_NOT_COUNT — counts the number of bits set to 0 in the bitmap by performing a logical NOT operation.

Counts the number of bits set to 0 in the bitmap by performing a logical NOT operation.
## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.bitmap_not_count( ) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.bitmap_not_count(func.to_bitmap('1, 3, 5')) ┌──────────────────────────────────────────────────┐ │ func.bitmap_not_count(func.to_bitmap('1, 3, 5')) │ ├──────────────────────────────────────────────────┤ │ 3 │ └──────────────────────────────────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql BITMAP_NOT_COUNT( ) ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT BITMAP_NOT_COUNT(TO_BITMAP('1, 3, 5')); ┌────────────────────────────────────────┐ │ bitmap_not_count(to_bitmap('1, 3, 5')) │ ├────────────────────────────────────────┤ │ 3 │ └────────────────────────────────────────┘ ``` # BITMAP_OR (Lakehouse v1) > BITMAP_OR — Performs a bitwise OR operation on the two bitmaps. Performs a bitwise OR operation on the two bitmaps. ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.bitmap_or( , ) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.bitmap_or(func.build_bitmap([1, 4, 5]), func.build_bitmap([6, 7])) ┌─────────────────────────────────────────────────────────────────────────┐ │ func.bitmap_or(func.build_bitmap([1, 4, 5]), func.build_bitmap([6, 7])) │ ├─────────────────────────────────────────────────────────────────────────┤ │ 1,4,5,6,7 │ └─────────────────────────────────────────────────────────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql BITMAP_OR( , ) ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT BITMAP_OR(BUILD_BITMAP([1,4,5]), BUILD_BITMAP([6,7]))::String; ┌──────────────────────────────────────────────────────────────────┐ │ bitmap_or(build_bitmap([1, 4, 5]), build_bitmap([6, 7]))::string │ 
├──────────────────────────────────────────────────────────────────┤ │ 1,4,5,6,7 │ └──────────────────────────────────────────────────────────────────┘ ``` # BITMAP_OR_COUNT (Lakehouse v1) > BITMAP_OR_COUNT — counts the number of bits set to 1 in the bitmap by performing a logical OR operation. Counts the number of bits set to 1 in the bitmap by performing a logical OR operation. ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.bitmap_or_count( ) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.bitmap_or_count(func.to_bitmap('1, 3, 5')) ┌─────────────────────────────────────────────────┐ │ func.bitmap_or_count(func.to_bitmap('1, 3, 5')) │ ├─────────────────────────────────────────────────┤ │ 3 │ └─────────────────────────────────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql BITMAP_OR_COUNT( ) ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT BITMAP_OR_COUNT(TO_BITMAP('1, 3, 5')); ┌───────────────────────────────────────┐ │ bitmap_or_count(to_bitmap('1, 3, 5')) │ ├───────────────────────────────────────┤ │ 3 │ └───────────────────────────────────────┘ ``` # BITMAP_SUBSET_IN_RANGE (Lakehouse v1) > BITMAP_SUBSET_IN_RANGE — generates a sub-bitmap of the source bitmap within a specified range. Generates a sub-bitmap of the source bitmap within a specified range. 
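The range semantics can be sketched in plain Python. This model assumes the start bound is inclusive and the end bound is exclusive, which is inferred only from the documented example (range 6 to 9 over 5, 7, 9 yields 7); the helper name `bitmap_subset_in_range` is a hypothetical stand-in, not the Lakehouse implementation.

```python
def bitmap_subset_in_range(values, start, end):
    """Model: keep values v with start <= v < end (end bound assumed exclusive)."""
    return [v for v in sorted(set(values)) if start <= v < end]


# Same inputs as the documented example: 5 is below the range and 9 hits the exclusive end.
print(bitmap_subset_in_range([5, 7, 9], 6, 9))  # [7]
```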
## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.bitmap_subset_in_range( , , ) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.bitmap_subset_in_range(func.build_bitmap([5, 7, 9]), 6, 9) ┌─────────────────────────────────────────────────────────────────┐ │ func.bitmap_subset_in_range(func.build_bitmap([5, 7, 9]), 6, 9) │ ├─────────────────────────────────────────────────────────────────┤ │ 7 │ └─────────────────────────────────────────────────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql BITMAP_SUBSET_IN_RANGE( , , ) ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT BITMAP_SUBSET_IN_RANGE(BUILD_BITMAP([5,7,9]), 6, 9)::String; ┌───────────────────────────────────────────────────────────────┐ │ bitmap_subset_in_range(build_bitmap([5, 7, 9]), 6, 9)::string │ ├───────────────────────────────────────────────────────────────┤ │ 7 │ └───────────────────────────────────────────────────────────────┘ ``` # BITMAP_SUBSET_LIMIT (Lakehouse v1) > BITMAP_SUBSET_LIMIT — generates a sub-bitmap of the source bitmap, beginning with a range from the start value, with a size limit. Generates a sub-bitmap of the source bitmap, beginning with a range from the start value, with a size limit. 
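A plain-Python sketch of this behavior, assuming the start value is an inclusive lower bound on the bitmap's values and the limit caps how many values are returned. Both assumptions are inferred from the documented example, and the helper name `bitmap_subset_limit` is a hypothetical stand-in.

```python
def bitmap_subset_limit(values, start, limit):
    """Model: the first `limit` values that are greater than or equal to `start`."""
    candidates = [v for v in sorted(set(values)) if v >= start]
    return candidates[:limit]


# Same inputs as the documented example: 1 is below the start value of 2.
print(bitmap_subset_limit([1, 4, 5], 2, 2))  # [4, 5]
```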
## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.bitmap_subset_limit( , , ) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.bitmap_subset_limit(func.build_bitmap([1, 4, 5]), 2, 2) ┌──────────────────────────────────────────────────────────────┐ │ func.bitmap_subset_limit(func.build_bitmap([1, 4, 5]), 2, 2) │ ├──────────────────────────────────────────────────────────────┤ │ 4,5 │ └──────────────────────────────────────────────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql BITMAP_SUBSET_LIMIT( , , ) ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT BITMAP_SUBSET_LIMIT(BUILD_BITMAP([1,4,5]), 2, 2)::String; ┌────────────────────────────────────────────────────────────┐ │ bitmap_subset_limit(build_bitmap([1, 4, 5]), 2, 2)::string │ ├────────────────────────────────────────────────────────────┤ │ 4,5 │ └────────────────────────────────────────────────────────────┘ ``` # BITMAP_UNION (Lakehouse v1) > BITMAP_UNION — counts the number of bits set to 1 in the bitmap by performing a logical UNION. Counts the number of bits set to 1 in the bitmap by performing a logical UNION operation. 
## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.bitmap_union( ) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.bitmap_union(func.to_bitmap('1, 3, 5')) ┌──────────────────────────────────────────────┐ │ func.bitmap_union(func.to_bitmap('1, 3, 5')) │ ├──────────────────────────────────────────────┤ │ 1,3,5 │ └──────────────────────────────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql BITMAP_UNION( ) ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT BITMAP_UNION(TO_BITMAP('1, 3, 5'))::String; ┌────────────────────────────────────────────┐ │ bitmap_union(to_bitmap('1, 3, 5'))::string │ ├────────────────────────────────────────────┤ │ 1,3,5 │ └────────────────────────────────────────────┘ ``` # BITMAP_XOR (Lakehouse v1) > BITMAP_XOR — performs a bitwise XOR (exclusive OR) operation on the two bitmaps. Performs a bitwise XOR (exclusive OR) operation on the two bitmaps. 
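XOR on bitmaps keeps the values that appear in exactly one of the two inputs, which corresponds to a symmetric difference on sets. The sketch below is a plain-Python model under that reading; `bitmap_xor` is a hypothetical helper name, not the Lakehouse implementation.

```python
def bitmap_xor(first, second):
    """Model: values present in exactly one of the two bitmaps, in ascending order."""
    return sorted(set(first) ^ set(second))


# Same inputs as the documented example: 5 is dropped because it is in both bitmaps.
print(bitmap_xor([1, 4, 5], [5, 6, 7]))  # [1, 4, 6, 7]
```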
## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.bitmap_xor( , ) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.bitmap_xor(func.build_bitmap([1, 4, 5]), func.build_bitmap([5, 6, 7])) ┌─────────────────────────────────────────────────────────────────────────────┐ │ func.bitmap_xor(func.build_bitmap([1, 4, 5]), func.build_bitmap([5, 6, 7])) │ ├─────────────────────────────────────────────────────────────────────────────┤ │ 1,4,6,7 │ └─────────────────────────────────────────────────────────────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql BITMAP_XOR( , ) ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT BITMAP_XOR(BUILD_BITMAP([1,4,5]), BUILD_BITMAP([5,6,7]))::String; ┌──────────────────────────────────────────────────────────────────────┐ │ bitmap_xor(build_bitmap([1, 4, 5]), build_bitmap([5, 6, 7]))::string │ ├──────────────────────────────────────────────────────────────────────┤ │ 1,4,6,7 │ └──────────────────────────────────────────────────────────────────────┘ ``` # BITMAP_XOR_COUNT (Lakehouse v1) > BITMAP_XOR_COUNT — counts the number of bits set to 1 in the bitmap by performing a logical XOR. Counts the number of bits set to 1 in the bitmap by performing a logical XOR (exclusive OR) operation. 
## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.bitmap_xor_count( ) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.bitmap_xor_count(func.to_bitmap('1, 3, 5')) ┌──────────────────────────────────────────────────┐ │ func.bitmap_xor_count(func.to_bitmap('1, 3, 5')) │ ├──────────────────────────────────────────────────┤ │ 3 │ └──────────────────────────────────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql BITMAP_XOR_COUNT( ) ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT BITMAP_XOR_COUNT(TO_BITMAP('1, 3, 5')); ┌────────────────────────────────────────┐ │ bitmap_xor_count(to_bitmap('1, 3, 5')) │ ├────────────────────────────────────────┤ │ 3 │ └────────────────────────────────────────┘ ``` # INTERSECT_COUNT (Lakehouse v1) > INTERSECT_COUNT — counts the number of intersecting bits between two bitmap columns. Counts the number of intersecting bits between two bitmap columns. 
## Analyze Syntax

[Section titled “Analyze Syntax”](#analyze-syntax)

```python
func.intersect_count(( '', '' ), ( , ))
```

## Analyze Examples

[Section titled “Analyze Examples”](#analyze-examples)

```python
# Given a dataset like this:
┌────┬─────┬─────────┐
│ id │ tag │ v       │
├────┼─────┼─────────┤
│ 1  │ a   │ 0, 1    │
│ 2  │ b   │ 0, 1, 2 │
│ 3  │ c   │ 1, 3, 4 │
└────┴─────┴─────────┘

# This is produced
func.intersect_count(('b', 'c'), (v, tag))

┌────┬────────────────────────────────────────────┐
│ id │ func.intersect_count(('b', 'c'), (v, tag)) │
├────┼────────────────────────────────────────────┤
│ 1  │ 0                                          │
│ 3  │ 3                                          │
│ 2  │ 3                                          │
└────┴────────────────────────────────────────────┘
```

## SQL Syntax

[Section titled “SQL Syntax”](#sql-syntax)

```sql
INTERSECT_COUNT( '', '' )( , )
```

## SQL Examples

[Section titled “SQL Examples”](#sql-examples)

```sql
CREATE TABLE agg_bitmap_test(id Int, tag String, v Bitmap);

INSERT INTO agg_bitmap_test(id, tag, v) VALUES
  (1, 'a', to_bitmap('0, 1')),
  (2, 'b', to_bitmap('0, 1, 2')),
  (3, 'c', to_bitmap('1, 3, 4'));

SELECT id, INTERSECT_COUNT('b', 'c')(v, tag) FROM agg_bitmap_test GROUP BY id;

┌────┬───────────────────────────────────┐
│ id │ intersect_count('b', 'c')(v, tag) │
├────┼───────────────────────────────────┤
│ 1  │ 0                                 │
│ 3  │ 3                                 │
│ 2  │ 3                                 │
└────┴───────────────────────────────────┘
```

# SUB_BITMAP (Lakehouse v1)

> SUB_BITMAP — generates a sub-bitmap of the source bitmap, beginning from the start index, with a specified size.

Generates a sub-bitmap of the source bitmap, beginning from the start index, with a specified size.
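Unlike the range-based variants, this function works on positions rather than values. A plain-Python sketch, assuming the start index is a zero-based offset into the sorted values (inferred from the documented example, where offset 1 and size 3 over 1 to 5 yields 2, 3, 4); `sub_bitmap` is a hypothetical helper name.

```python
def sub_bitmap(values, offset, size):
    """Model: `size` values starting at zero-based position `offset` in sorted order."""
    ordered = sorted(set(values))
    return ordered[offset:offset + size]


# Same inputs as the documented example: skip one value, then take three.
print(sub_bitmap([1, 2, 3, 4, 5], 1, 3))  # [2, 3, 4]
```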
## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.sub_bitmap( , , ) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.sub_bitmap(func.build_bitmap([1, 2, 3, 4, 5]), 1, 3) ┌───────────────────────────────────────────────────────────┐ │ func.sub_bitmap(func.build_bitmap([1, 2, 3, 4, 5]), 1, 3) │ ├───────────────────────────────────────────────────────────┤ │ 2,3,4 │ └───────────────────────────────────────────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql SUB_BITMAP( , , ) ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT SUB_BITMAP(BUILD_BITMAP([1, 2, 3, 4, 5]), 1, 3)::String; ┌─────────────────────────────────────────────────────────┐ │ sub_bitmap(build_bitmap([1, 2, 3, 4, 5]), 1, 3)::string │ ├─────────────────────────────────────────────────────────┤ │ 2,3,4 │ └─────────────────────────────────────────────────────────┘ ``` # Conversion Functions (Lakehouse v1) > Lakehouse v1 SQL conversion functions: cast values between types — CAST, TRY_CAST, parse, and format helpers. This section provides reference information for the conversion functions in PlaidCloud Lakehouse. Please note the following when converting a value from one type to another: * When converting from floating-point, decimal numbers, or strings to integers or decimal numbers with fractional parts, PlaidCloud Lakehouse rounds the values to the nearest integer. This is determined by the setting `numeric_cast_option` (defaults to ‘rounding’) which controls the behavior of numeric casting operations. When `numeric_cast_option` is explicitly set to ‘truncating’, PlaidCloud Lakehouse will truncate the decimal part, discarding any fractional values. 
```sql
SELECT CAST('0.6' AS DECIMAL(10, 0)), CAST(0.6 AS DECIMAL(10, 0)), CAST(1.5 AS INT);

┌──────────────────────────────────────────────────────────────────────────────────┐
│ cast('0.6' as decimal(10, 0)) │ cast(0.6 as decimal(10, 0)) │ cast(1.5 as int32) │
├───────────────────────────────┼─────────────────────────────┼────────────────────┤
│ 1                             │ 1                           │ 2                  │
└──────────────────────────────────────────────────────────────────────────────────┘

SET numeric_cast_option = 'truncating';

SELECT CAST('0.6' AS DECIMAL(10, 0)), CAST(0.6 AS DECIMAL(10, 0)), CAST(1.5 AS INT);

┌──────────────────────────────────────────────────────────────────────────────────┐
│ cast('0.6' as decimal(10, 0)) │ cast(0.6 as decimal(10, 0)) │ cast(1.5 as int32) │
├───────────────────────────────┼─────────────────────────────┼────────────────────┤
│ 0                             │ 0                           │ 1                  │
└──────────────────────────────────────────────────────────────────────────────────┘
```

The table below summarizes the supported numeric casts between source and target data types. Note that casting a String to an Int requires the source string to contain an integer value.

| Source Type  | Target Type |
| ------------ | ----------- |
| String       | Decimal     |
| Float        | Decimal     |
| Decimal      | Decimal     |
| Float        | Int         |
| Decimal      | Int         |
| String (Int) | Int         |

* PlaidCloud Lakehouse also offers a variety of functions for converting expressions into different date and time formats. For more information, see [Date & Time Functions](../05-datetime-functions).

# BUILD_BITMAP (Lakehouse v1)

> BUILD_BITMAP — converts an array of positive integers to a BITMAP value.

Converts an array of positive integers to a BITMAP value.
## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.build_bitmap( ) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.to_string(func.build_bitmap([1, 4, 5])) ┌───────────────────────────────────────────────┐ │ func.to_string(func.build_bitmap([1, 4, 5])) │ ├───────────────────────────────────────────────┤ │ 1,4,5 │ └───────────────────────────────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql BUILD_BITMAP( ) ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT BUILD_BITMAP([1,4,5])::String; ┌─────────────────────────────────┐ │ build_bitmap([1, 4, 5])::string │ ├─────────────────────────────────┤ │ 1,4,5 │ └─────────────────────────────────┘ ``` # CAST, :: (Lakehouse v1) > CAST, :: — converts a value from one data type to another. Converts a value from one data type to another. `::` is an alias for CAST. See also: [TRY\_CAST](../try-cast) ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.cast( , ) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.cast(1, string), func.to_string(1) ┌───────────────────────────────────────────┐ │ func.cast(1, string) │ func.to_string(1) │ ├──────────────────────┼────────────────────┤ │ 1 │ 1 │ └───────────────────────────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql CAST( AS ) :: ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT CAST(1 AS VARCHAR), 1::VARCHAR; ┌───────────────────────────────┐ │ cast(1 as string) │ 1::string │ ├───────────────────┼───────────┤ │ 1 │ 1 │ └───────────────────────────────┘ ``` # TO_BINARY (Lakehouse v1) > TO_BINARY — converts supported data types, including string, variant, bitmap, geometry, and geography, into their binary representation (hex format). 
Converts supported data types, including string, variant, bitmap, geometry, and geography, into their binary representation (hex format). See also: [TRY\_TO\_BINARY](../try-to-binary) ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.to_binary( ) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.to_binary('Databend') ┌───────────────────────────────┐ │ func.to_binary('Databend') │ ├───────────────────────────────┤ │ 4461746162656E64 │ └───────────────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql TO_BINARY( ) ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) This example converts a string to binary: ```sql SELECT TO_BINARY('Databend'); ┌───────────────────────┐ │ to_binary('Databend') │ ├───────────────────────┤ │ 4461746162656E64 │ └───────────────────────┘ ``` This example converts JSON data to binary: ```sql SELECT TO_BINARY(PARSE_JSON('{"key":"value", "number":123}')) AS binary_variant; ┌──────────────────────────────────────────────────────────────────────────┐ │ binary_variant │ ├──────────────────────────────────────────────────────────────────────────┤ │ 40000002100000031000000610000005200000026B65796E756D62657276616C7565507B │ └──────────────────────────────────────────────────────────────────────────┘ ``` This example converts bitmap data to binary: ```sql SELECT TO_BINARY(TO_BITMAP('10,20,30')) AS binary_bitmap; ┌──────────────────────────────────────────────────────────────────────┐ │ binary_bitmap │ ├──────────────────────────────────────────────────────────────────────┤ │ 0100000000000000000000003A3000000100000000000200100000000A0014001E00 │ └──────────────────────────────────────────────────────────────────────┘ ``` This example converts geometry data (WKT format) to binary: ```sql SELECT TO_BINARY(ST_GEOMETRYFROMWKT('SRID=4326;POINT(1.0 2.0)')) AS binary_geometry; ┌────────────────────────────────────────────────────┐ 
│ binary_geometry │ ├────────────────────────────────────────────────────┤ │ 0101000020E6100000000000000000F03F0000000000000040 │ └────────────────────────────────────────────────────┘ ``` This example converts geography data (EWKT format) to binary: ```sql SELECT TO_BINARY(ST_GEOGRAPHYFROMEWKT('SRID=4326;POINT(-122.35 37.55)')) AS binary_geography; ┌────────────────────────────────────────────────────┐ │ binary_geography │ ├────────────────────────────────────────────────────┤ │ 0101000020E61000006666666666965EC06666666666C64240 │ └────────────────────────────────────────────────────┘ ``` # TO_BITMAP (Lakehouse v1) > TO_BITMAP — Converts a value to BITMAP data type. Converts a value to BITMAP data type. ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.to_bitmap( ) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.to_bitmap('1101') ┌─────────────────────────┐ │ func.to_bitmap('1101') │ ├─────────────────────────┤ │ │ └─────────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql TO_BITMAP( ) ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT TO_BITMAP('1101'); ┌───────────────────┐ │ to_bitmap('1101') │ ├───────────────────┤ │ │ └───────────────────┘ ``` # TO_BOOLEAN (Lakehouse v1) > TO_BOOLEAN — Converts a value to BOOLEAN data type. Converts a value to BOOLEAN data type. 
## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.to_boolean( ) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.to_boolean('true') ┌──────────────────────────┐ │ func.to_boolean('true') │ ├──────────────────────────┤ │ true │ └──────────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql TO_BOOLEAN( ) ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT TO_BOOLEAN('true'); ┌────────────────────┐ │ to_boolean('true') │ ├────────────────────┤ │ true │ └────────────────────┘ ``` # TO_FLOAT32 (Lakehouse v1) > TO_FLOAT32 — Converts a value to FLOAT32 data type. Converts a value to FLOAT32 data type. ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.to_float32( ) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.to_float32('1.2') ┌─────────────────────────┐ │ func.to_float32('1.2') │ ├─────────────────────────┤ │ 1.2 │ └─────────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql TO_FLOAT32( ) ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT TO_FLOAT32('1.2'); ┌───────────────────┐ │ to_float32('1.2') │ ├───────────────────┤ │ 1.2 │ └───────────────────┘ ``` # TO_FLOAT64 (Lakehouse v1) > TO_FLOAT64 — Converts a value to FLOAT64 data type. Converts a value to FLOAT64 data type. 
## Analyze Syntax

[Section titled “Analyze Syntax”](#analyze-syntax)

```python
func.to_float64( )
```

## Analyze Examples

[Section titled “Analyze Examples”](#analyze-examples)

```python
func.to_float64('1.2')

┌────────────────────────┐
│ func.to_float64('1.2') │
├────────────────────────┤
│ 1.2                    │
└────────────────────────┘
```

## SQL Syntax

[Section titled “SQL Syntax”](#sql-syntax)

```sql
TO_FLOAT64( )
```

## SQL Examples

[Section titled “SQL Examples”](#sql-examples)

```sql
SELECT TO_FLOAT64('1.2');

┌───────────────────┐
│ to_float64('1.2') │
├───────────────────┤
│ 1.2               │
└───────────────────┘
```

# TO_HEX (Lakehouse v1)

> TO_HEX — for a string argument str, TO_HEX() returns a hexadecimal string representation of str where each byte of each character in str is converted to two hexadecimal digits.

For a string argument str, TO\_HEX() returns a hexadecimal string representation of str where each byte of each character in str is converted to two hexadecimal digits. The inverse of this operation is performed by the UNHEX() function.

For a numeric argument N, TO\_HEX() returns a hexadecimal string representation of the value of N treated as a longlong (BIGINT) number.
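The two cases described above can be illustrated in plain Python. This is a sketch under stated assumptions, not the Lakehouse implementation: strings are assumed to be UTF-8 encoded, only non-negative integers are handled, and `to_hex` is a hypothetical helper name.

```python
def to_hex(value):
    """Model: hex digits per byte for strings, base-16 digits for non-negative integers."""
    if isinstance(value, str):
        # Each byte of the (assumed UTF-8) encoding becomes two hexadecimal digits.
        return value.encode("utf-8").hex()
    if value < 0:
        raise ValueError("this sketch only covers non-negative integers")
    return format(value, "x")


print(to_hex("abc"))  # 616263
print(to_hex(255))    # ff

# The inverse direction for strings, comparable to what UNHEX() is described as doing.
print(bytes.fromhex("616263").decode("utf-8"))  # abc
```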
## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.to_hex() ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.to_hex('abc') ┌────────────────────┐ │ func.to_hex('abc') │ ├────────────────────┤ │ 616263 │ └────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql TO_HEX() ``` ## Aliases [Section titled “Aliases”](#aliases) * [HEX](../../06-string-functions/hex) ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT HEX('abc'), TO_HEX('abc'); ┌────────────────────────────┐ │ hex('abc') │ to_hex('abc') │ ├────────────┼───────────────┤ │ 616263 │ 616263 │ └────────────────────────────┘ SELECT HEX(255), TO_HEX(255); ┌────────────────────────┐ │ hex(255) │ to_hex(255) │ ├──────────┼─────────────┤ │ ff │ ff │ └────────────────────────┘ ``` # TO_INT16 (Lakehouse v1) > TO_INT16 — Converts a value to INT16 data type. Converts a value to INT16 data type. ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.to_int16( ) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.to_int16('123') ┌──────────────────────┐ │ func.to_int16('123') │ ├──────────────────────┤ │ 123 │ └──────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql TO_INT16( ) ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT TO_INT16('123'); ┌─────────────────┐ │ to_int16('123') │ ├─────────────────┤ │ 123 │ └─────────────────┘ ``` # TO_INT32 (Lakehouse v1) > TO_INT32 — Converts a value to INT32 data type. Converts a value to INT32 data type. 
## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.to_int32( ) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.to_int32('123') ┌──────────────────────┐ │ func.to_int32('123') │ ├──────────────────────┤ │ 123 │ └──────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql TO_INT32( ) ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT TO_INT32('123'); ┌─────────────────┐ │ to_int32('123') │ ├─────────────────┤ │ 123 │ └─────────────────┘ ``` # TO_INT64 (Lakehouse v1) > TO_INT64 — Converts a value to INT64 data type. Converts a value to INT64 data type. ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.to_int64( ) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.to_int64('123') ┌──────────────────────┐ │ func.to_int64('123') │ ├──────────────────────┤ │ 123 │ └──────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql TO_INT64( ) ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT TO_INT64('123'); ┌─────────────────┐ │ to_int64('123') │ ├─────────────────┤ │ 123 │ └─────────────────┘ ``` # TO_INT8 (Lakehouse v1) > TO_INT8 — converts a value to INT8 data type. Converts a value to INT8 data type. 
## Analyze Syntax

[Section titled “Analyze Syntax”](#analyze-syntax)

```python
func.to_int8( )
```

## Analyze Examples

[Section titled “Analyze Examples”](#analyze-examples)

```python
func.to_int8('123')

┌─────────────────────┐
│ func.to_int8('123') │
├─────────────────────┤
│ 123                 │
└─────────────────────┘
```

## SQL Syntax

[Section titled “SQL Syntax”](#sql-syntax)

```sql
TO_INT8( )
```

## SQL Examples

[Section titled “SQL Examples”](#sql-examples)

```sql
SELECT TO_INT8('123');

┌────────────────┐
│ to_int8('123') │
│ UInt8          │
├────────────────┤
│ 123            │
└────────────────┘
```

# TO_STRING (Conversion, Lakehouse v1)

> TO_STRING — converts a value to String data type, or converts a Date value to a specific string format.

Converts a value to String data type, or converts a Date value to a specific string format.

To customize the format of date and time values in PlaidCloud Lakehouse, use format specifiers. For a comprehensive list of supported specifiers, see Formatting Date and Time.
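To give a feel for how format specifiers shape the output, the sketch below uses Python's standard `datetime` module. It assumes the `%m`, `%d`, and `%Y` specifiers carry the same meaning as in Python's `strftime`, which matches the `'%m/%d/%Y'` pattern used in the SQL examples on this page. The helper name `format_date` is hypothetical, and the code does not call the Lakehouse.

```python
from datetime import datetime


def format_date(date_text, pattern):
    """Parse an ISO date string (YYYY-MM-DD) and render it with strftime-style specifiers."""
    parsed = datetime.strptime(date_text, "%Y-%m-%d")
    return parsed.strftime(pattern)


# Month/day/four-digit-year, the same pattern used in the SQL examples.
print(format_date("2022-12-25", "%m/%d/%Y"))  # 12/25/2022
```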
## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.to_string( '' ) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.date_format('1.23'), func.to_string('1.23'), func.to_text('1.23'), func.to_varchar('1.23'), func.json_to_string('1.23') ┌──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐ │ func.date_format('1.23') │ func.to_string('1.23') │ func.to_text('1.23') │ func.to_varchar('1.23') │ func.json_to_string('1.23') │ ├──────────────────────────┼────────────────────────┼──────────────────────┼─────────────────────────┼─────────────────────────────┤ │ 1.23 │ 1.23 │ 1.23 │ 1.23 │ 1.23 │ └──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql TO_STRING( '' ) TO_STRING( '', '' ) ``` ## Aliases [Section titled “Aliases”](#aliases) * [DATE\_FORMAT](../../05-datetime-functions/date-format) * [JSON\_TO\_STRING](../../10-semi-structured-functions/json-to-string) * [TO\_TEXT](../to-text) * [TO\_VARCHAR](../to-varchar) ## Return Type [Section titled “Return Type”](#return-type) String. 
## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT DATE_FORMAT('1.23'), TO_STRING('1.23'), TO_TEXT('1.23'), TO_VARCHAR('1.23'), JSON_TO_STRING('1.23'); ┌─────────────────────────────────────────────────────────────────────────────────────────────────────────┐ │ date_format('1.23') │ to_string('1.23') │ to_text('1.23') │ to_varchar('1.23') │ json_to_string('1.23') │ ├─────────────────────┼───────────────────┼─────────────────┼────────────────────┼────────────────────────┤ │ 1.23 │ 1.23 │ 1.23 │ 1.23 │ 1.23 │ └─────────────────────────────────────────────────────────────────────────────────────────────────────────┘ SELECT DATE_FORMAT('["Cooking", "Reading"]' :: JSON), TO_STRING('["Cooking", "Reading"]' :: JSON), TO_TEXT('["Cooking", "Reading"]' :: JSON), TO_VARCHAR('["Cooking", "Reading"]' :: JSON), JSON_TO_STRING('["Cooking", "Reading"]' :: JSON); ┌────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐ │ date_format('["cooking", "reading"]'::variant) │ to_string('["cooking", "reading"]'::variant) │ to_text('["cooking", "reading"]'::variant) │ to_varchar('["cooking", "reading"]'::variant) │ json_to_string('["cooking", "reading"]'::variant) │ ├────────────────────────────────────────────────┼──────────────────────────────────────────────┼────────────────────────────────────────────┼───────────────────────────────────────────────┼───────────────────────────────────────────────────┤ │ ["Cooking","Reading"] │ ["Cooking","Reading"] │ ["Cooking","Reading"] │ ["Cooking","Reading"] │ ["Cooking","Reading"] │ └────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘ -- With one 
argument, the function converts input to a string without validating as a date. SELECT DATE_FORMAT('20223-12-25'), TO_STRING('20223-12-25'), TO_TEXT('20223-12-25'), TO_VARCHAR('20223-12-25'), JSON_TO_STRING('20223-12-25'); ┌────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐ │ date_format('20223-12-25') │ to_string('20223-12-25') │ to_text('20223-12-25') │ to_varchar('20223-12-25') │ json_to_string('20223-12-25') │ ├────────────────────────────┼──────────────────────────┼────────────────────────┼───────────────────────────┼───────────────────────────────┤ │ 20223-12-25 │ 20223-12-25 │ 20223-12-25 │ 20223-12-25 │ 20223-12-25 │ └────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘ SELECT DATE_FORMAT('2022-12-25', '%m/%d/%Y'), TO_STRING('2022-12-25', '%m/%d/%Y'), TO_TEXT('2022-12-25', '%m/%d/%Y'), TO_VARCHAR('2022-12-25', '%m/%d/%Y'), JSON_TO_STRING('2022-12-25', '%m/%d/%Y'); ┌───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐ │ date_format('2022-12-25', '%m/%d/%y') │ to_string('2022-12-25', '%m/%d/%y') │ to_text('2022-12-25', '%m/%d/%y') │ to_varchar('2022-12-25', '%m/%d/%y') │ json_to_string('2022-12-25', '%m/%d/%y') │ ├───────────────────────────────────────┼─────────────────────────────────────┼───────────────────────────────────┼──────────────────────────────────────┼──────────────────────────────────────────┤ │ 12/25/2022 │ 12/25/2022 │ 12/25/2022 │ 12/25/2022 │ 12/25/2022 │ └───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘ ``` # TO_TEXT (Lakehouse v1) > TO_TEXT — alias for the TO_STRING conversion 
function. Alias for [TO\_STRING](../to-string). # TO_UINT16 (Lakehouse v1) > TO_UINT16 — Converts a value to UINT16 data type. Converts a value to UINT16 data type. ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.to_uint16( ) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.to_uint16('123') ┌───────────────────────┐ │ func.to_uint16('123') │ ├───────────────────────┤ │ 123 │ └───────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql TO_UINT16( ) ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT TO_UINT16('123'); ┌──────────────────┐ │ to_uint16('123') │ ├──────────────────┤ │ 123 │ └──────────────────┘ ``` # TO_UINT32 (Lakehouse v1) > TO_UINT32 — Converts a value to UINT32 data type. Converts a value to UINT32 data type. ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.to_uint32( ) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.to_uint32('123') ┌───────────────────────┐ │ func.to_uint32('123') │ ├───────────────────────┤ │ 123 │ └───────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql TO_UINT32( ) ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT TO_UINT32('123'); ┌──────────────────┐ │ to_uint32('123') │ ├──────────────────┤ │ 123 │ └──────────────────┘ ``` # TO_UINT64 (Lakehouse v1) > TO_UINT64 — Converts a value to UINT64 data type. Converts a value to UINT64 data type. 
## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.to_uint64( ) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.to_uint64('123') ┌───────────────────────┐ │ func.to_uint64('123') │ ├───────────────────────┤ │ 123 │ └───────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql TO_UINT64( ) ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT TO_UINT64('123'); ┌──────────────────┐ │ to_uint64('123') │ ├──────────────────┤ │ 123 │ └──────────────────┘ ``` # TO_UINT8 (Lakehouse v1) > TO_UINT8 — Converts a value to UINT8 data type. Converts a value to UINT8 data type. ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.to_uint8( ) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.to_uint8('123') ┌──────────────────────┐ │ func.to_uint8('123') │ ├──────────────────────┤ │ 123 │ └──────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql TO_UINT8( ) ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT TO_UINT8('123'); ┌─────────────────┐ │ to_uint8('123') │ ├─────────────────┤ │ 123 │ └─────────────────┘ ``` # TO_VARCHAR (Lakehouse v1) > TO_VARCHAR — alias for the TO_STRING conversion function. Alias for [TO\_STRING](../to-string). # TO_VARIANT (Lakehouse v1) > TO_VARIANT — Converts a value to VARIANT data type. Converts a value to VARIANT data type. 
## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.to_variant( ) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.to_variant(to_bitmap('100,200,300')) ┌───────────────────────────────────────────┐ │ func.to_variant(to_bitmap('100,200,300')) │ ├───────────────────────────────────────────┤ │ [100,200,300] │ └───────────────────────────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql TO_VARIANT( ) ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT TO_VARIANT(TO_BITMAP('100,200,300')); ┌──────────────────────────────────────┐ │ to_variant(to_bitmap('100,200,300')) │ ├──────────────────────────────────────┤ │ [100,200,300] │ └──────────────────────────────────────┘ ``` # TRY_CAST (Lakehouse v1) > TRY_CAST — Converts a value from one data type to another. Converts a value from one data type to another. Returns NULL on error. See also: [CAST](../cast) ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.try_cast( , ) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.try_cast(1, string) ┌──────────────────────────┐ │ func.try_cast(1, string) │ ├──────────────────────────┤ │ 1 │ └──────────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql TRY_CAST( AS ) ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT TRY_CAST(1 AS VARCHAR); ┌───────────────────────┐ │ try_cast(1 as string) │ ├───────────────────────┤ │ 1 │ └───────────────────────┘ ``` # TRY_TO_BINARY (Lakehouse v1) > TRY_TO_BINARY — convert an expression to a binary value, returning NULL on failure instead of raising an error. An enhanced version of [TO\_BINARY](../to_binary) that converts an input expression to a binary value, returning `NULL` if the conversion fails instead of raising an error. 
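The "return NULL instead of raising" behaviour shared by the `TRY_` functions can be modelled in plain Python. The sketch below is not PlaidCloud code: `try_convert` and `to_hex_binary` are hypothetical helpers written for illustration, with `None` standing in for SQL `NULL`.

```python
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def try_convert(convert: Callable[[object], T], value: object) -> Optional[T]:
    """Hypothetical helper: apply `convert`, returning None where a TRY_ function would return NULL."""
    if value is None:
        return None
    try:
        return convert(value)
    except (ValueError, TypeError):
        return None


def to_hex_binary(value: object) -> str:
    """Hypothetical stand-in for a strict conversion: UTF-8 bytes rendered as upper-case hex."""
    if not isinstance(value, str):
        raise TypeError("expected a string")
    return value.encode("utf-8").hex().upper()


print(try_convert(to_hex_binary, "Databend"))  # 4461746162656E64
print(try_convert(to_hex_binary, 42))          # None
print(try_convert(int, "abc"))                 # None
```

The strict function raises on bad input, while the wrapper turns that failure into a missing value, which is the difference between a conversion function and its `TRY_` counterpart.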
See also: [TO\_BINARY](../to_binary) ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.try_to_binary( ) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.try_to_binary('Databend') ┌───────────────────────────────────────┐ │ func.try_to_binary('Databend') │ ├───────────────────────────────────────┤ │ 4461746162656E64 │ └───────────────────────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql TRY_TO_BINARY( ) ``` ## Examples [Section titled “Examples”](#examples) This example successfully converts the JSON data to binary: ```sql SELECT TRY_TO_BINARY(PARSE_JSON('{"key":"value", "number":123}')) AS binary_variant_success; ┌──────────────────────────────────────────────────────────────────────────┐ │ binary_variant │ ├──────────────────────────────────────────────────────────────────────────┤ │ 40000002100000031000000610000005200000026B65796E756D62657276616C7565507B │ └──────────────────────────────────────────────────────────────────────────┘ ``` This example demonstrates that the function fails to convert when the input is `NULL`: ```sql SELECT TRY_TO_BINARY(PARSE_JSON(NULL)) AS binary_variant_invalid_json; ┌─────────────────────────────┐ │ binary_variant_invalid_json │ ├─────────────────────────────┤ │ NULL │ └─────────────────────────────┘ ``` # Conditional Functions (Lakehouse v1) > Lakehouse v1 SQL conditional functions: branch on values with IF, CASE, COALESCE, NULLIF, and related selectors. This section provides reference information for the conditional functions in PlaidCloud Lakehouse. # AND (Lakehouse v1) > AND — conditional AND operator. Conditional AND operator. Checks whether both conditions are true. 
## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python and_([, ...]) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python and_( table.color == 'green', table.shape == 'circle', table.price >= 1.25 ) ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql AND ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT * FROM table WHERE table.color = 'green' AND table.shape = 'circle' AND table.price >= 1.25; ``` # [ NOT ] BETWEEN (Lakehouse v1) > [ NOT ] BETWEEN — true if a numeric, string, or date value lies inside (or outside) the given range. Returns `true` if the given numeric, string, or date value falls inside the defined lower and upper limits, both of which are inclusive. With `NOT`, returns `true` if the value falls outside them. ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python table.column.between(, ) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python table.column.between(0, 5) ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql [ NOT ] BETWEEN AND ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT 'true' WHERE 5 BETWEEN 0 AND 5; ┌────────┐ │ 'true' │ ├────────┤ │ true │ └────────┘ SELECT 'true' WHERE 'data' BETWEEN 'data' AND 'databendcloud'; ┌────────┐ │ 'true' │ ├────────┤ │ true │ └────────┘ ``` # CASE (Lakehouse v1) > CASE — handles IF/THEN logic. Handles IF/THEN logic. It is structured with at least one pair of `WHEN` and `THEN` statements. Every `CASE` statement must be concluded with the `END` keyword. The `ELSE` statement is optional, providing a way to capture values not explicitly specified in the `WHEN` and `THEN` statements. ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python case( (, ), (, ), [ ... ] [ else_=] ) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ### A Simple Example [Section titled “A Simple Example”](#a-simple-example) This example returns a person’s name.
It first checks whether the first name column has a value (the “if”). If there is a value, it concatenates the first name with the last name and returns the result (the “then”). If there isn’t a first name, it returns the last name only (the “else”). ```python case( (table.first_name.is_not(None), func.concat(table.first_name, table.last_name)), else_=table.last_name ) ``` ### A More Complex Example With Multiple Conditions [Section titled “A More Complex Example With Multiple Conditions”](#a-more-complex-example-with-multiple-conditions) This example returns a price based on quantity. If the quantity in the order is more than 100, the customer gets the special price. If the first condition is not satisfied, the second is checked: if the quantity is greater than 10 (11-100), the customer gets the bulk price. Otherwise the customer gets the regular price. ```python case( (order_table.qty > 100, item_table.specialprice), (order_table.qty > 10, item_table.bulkprice), else_=item_table.regularprice ) ``` This example returns the first initial of the person’s first name. If the user’s name is wendy, return W. Otherwise, if the user’s name is jack, return J. Otherwise return E. ```python case( (users_table.name == "wendy", "W"), (users_table.name == "jack", "J"), else_='E' ) ``` The above may also be written in shorthand as: ```python case( {"wendy": "W", "jack": "J"}, value=users_table.name, else_='E' ) ``` ### Other Examples [Section titled “Other Examples”](#other-examples) This example is from a Table:Lookup step where we are updating the “dock\_final” column when the table1.dock\_final value is Null. ```python case( (table1.dock_final == Null, table2.dock_final), else_ = table1.dock_final ) ``` This example is from a Table:Lookup step where we are updating the “Marketing Channel” column when “Marketing Channel” in table1 is not ‘none’ or the “Serial Number” contains a ’\_’.
```python case( (get_column(table1, 'Marketing Channel') != 'none', get_column(table1, 'Marketing Channel')), (get_column(table1, 'Serial Number').contains('_'), get_column(table1, 'Marketing Channel')), (get_column(table2, 'Marketing Channel').is_not(Null), get_column(table2, 'Marketing Channel')), else_ = 'none' ) ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax-1) ```sql CASE WHEN THEN [ WHEN THEN ] [ ... ] [ ELSE ] END AS ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) This example categorizes employee salaries using a CASE statement, presenting details with a dynamically assigned column named “SalaryCategory”: ```sql -- Create a sample table CREATE TABLE Employee ( EmployeeID INT, FirstName VARCHAR(50), LastName VARCHAR(50), Salary INT ); -- Insert some sample data INSERT INTO Employee VALUES (1, 'John', 'Doe', 50000); INSERT INTO Employee VALUES (2, 'Jane', 'Smith', 60000); INSERT INTO Employee VALUES (3, 'Bob', 'Johnson', 75000); INSERT INTO Employee VALUES (4, 'Alice', 'Williams', 90000); -- Add a new column 'SalaryCategory' using CASE statement -- Categorize employees based on their salary SELECT EmployeeID, FirstName, LastName, Salary, CASE WHEN Salary < 60000 THEN 'Low' WHEN Salary >= 60000 AND Salary < 80000 THEN 'Medium' WHEN Salary >= 80000 THEN 'High' ELSE 'Unknown' END AS SalaryCategory FROM Employee; ┌──────────────────────────────────────────────────────────────────────────────────────────┐ │ employeeid │ firstname │ lastname │ salary │ salarycategory │ ├─────────────────┼──────────────────┼──────────────────┼─────────────────┼────────────────┤ │ 1 │ John │ Doe │ 50000 │ Low │ │ 2 │ Jane │ Smith │ 60000 │ Medium │ │ 4 │ Alice │ Williams │ 90000 │ High │ │ 3 │ Bob │ Johnson │ 75000 │ Medium │ └──────────────────────────────────────────────────────────────────────────────────────────┘ ``` # COALESCE (Lakehouse v1) > COALESCE — returns the first non-NULL expression within its arguments; if all arguments are. 
Returns the first non-NULL expression within its arguments; if all arguments are NULL, it returns NULL. ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.coalesce([, ...]) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.coalesce(table.UOM, 'none') func.coalesce(get_column(table2, 'TECHNOLOGY_RATE'), 0.0) func.coalesce(table_beta.adjusted_price, table_alpha.override_price, table_alpha.price) * table_beta.quantity_sold ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql COALESCE([, ...]) ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT COALESCE(1), COALESCE(1, NULL), COALESCE(NULL, 1, 2); ┌────────────────────────────────────────────────────────┐ │ coalesce(1) │ coalesce(1, null) │ coalesce(null, 1, 2) │ ├─────────────┼───────────────────┼──────────────────────┤ │ 1 │ 1 │ 1 │ └────────────────────────────────────────────────────────┘ SELECT COALESCE('a'), COALESCE('a', NULL), COALESCE(NULL, 'a', 'b'); ┌────────────────────────────────────────────────────────────────┐ │ coalesce('a') │ coalesce('a', null) │ coalesce(null, 'a', 'b') │ ├───────────────┼─────────────────────┼──────────────────────────┤ │ a │ a │ a │ └────────────────────────────────────────────────────────────────┘ ``` # Comparison Methods (Lakehouse v1) > Comparison Methods — these comparison methods are available in Analyze expressions. These comparison methods are available in Analyze expressions.
| Category | Expression | Structure | Example | Description |
| --- | --- | --- | --- | --- |
| General Usage | > | > | table.column > 23 | Greater than |
| General Usage | < | < | table.column < 23 | Less than |
| General Usage | >= | >= | table.column >= 23 | Greater than or equal to |
| General Usage | <= | <= | table.column <= 23 | Less than or equal to |
| General Usage | == | == | table.column == 23 | Equal to |
| General Usage | != | != | table.column != 23 | Not equal to |
| General Usage | and\_ | and\_() | and\_(table.a > 23, table.b == u'blue') [Additional Examples](../and) | Creates an AND SQL condition |
| General Usage | any\_ | any\_() | table.column.any(('red', 'blue', 'yellow')) | Applies the SQL ANY() condition to a column |
| General Usage | between | between | table.column.between(23, 46); get\_column(table, 'LAST\_CHANGED\_DATE').between({start\_date}, {end\_date}) | Applies the SQL BETWEEN condition |
| General Usage | contains | contains | table.column.contains('mno'); table.SOURCE\_SYSTEM.contains('TEST') | Applies the SQL LIKE '%value%' pattern (substring match) |
| General Usage | endswith | endswith | table.column.endswith('xyz'); table.Parent.endswith(':EBITX'); table.PERIOD.endswith("01") | Applies the SQL LIKE '%value' pattern (suffix match) |
| General Usage | FALSE | FALSE | FALSE | False, false, FALSE - Alias for Python False |
| General Usage | ilike | ilike | table.column.ilike('%foobar%') | Applies the SQL ILIKE method |
| General Usage | in\_ | in\_() | table.column.in\_((1, 2, 3)); get\_column(table, 'Source Country').in\_(\['CN','SG','BR']); table.MONTH.in\_(\['01','02','03','04','05','06','07','08','09']) | Tests whether the value is within a tuple or list of values |
| General Usage | is\_ | is\_ | table.column.is\_(None); get\_column(table, 'Min SafetyStock').is\_(None); get\_column(table, 'date\_pod').is\_(None) | Applies the SQL IS operator, for checks such as IS NULL |
| General Usage | isnot | isnot | table.column.isnot(None) | Applies the SQL IS NOT operator, for checks such as IS NOT NULL |
| General Usage | like | like | table.column.like('%foobar%'); table.SOURCE\_SYSTEM.like('%Adjustments%') | Applies the SQL LIKE method |
| General Usage | not\_ | not\_() | not\_(and\_(table.a > 23, table.b == u'blue')) | Inverts the condition |
| General Usage | notilike | notilike | table.column.notilike('%foobar%') | Applies the SQL NOT ILIKE method |
| General Usage | notin | notin | table.column.notin((1, 2, 3)); table.LE.notin\_(\['12345','67890']) | Inverts the IN condition |
| General Usage | notlike | notlike | table.column.notlike('%foobar%') | Applies the SQL NOT LIKE method |
| General Usage | NULL | NULL | NULL | Null, null, NULL - Alias for Python None |
| General Usage | or\_ | or\_() | or\_(table.a > 23, table.b == u'blue') [Additional Examples](../or) | Creates an OR SQL condition |
| General Usage | startswith | startswith | table.column.startswith('abc'); get\_column(table, 'Zip Code').startswith('9'); get\_column(table1, 'GL Account').startswith('CORP') | Applies the SQL LIKE 'value%' pattern (prefix match) |
| General Usage | TRUE | TRUE | TRUE | True, true, TRUE - Alias for Python True |
| Math Expressions | + | + | + | 2+3=5 |
| Math Expressions | - | - | - | 2-3=-1 |
| Math Expressions | \* | \* | \* | 2\*3=6 |
| Math Expressions | / | / | / | 4/2=2 |
| Math Expressions | column.op | column.op(operator) | column.op('%') | 5%4=1 |
| Math Expressions | column.op | column.op(operator) | column.op('^') | 2.0^3.0=8 |
| Math Expressions | column.op | column.op(operator) | column.op('!') | 5!=120 |
| Math Expressions | column.op | column.op(operator) | column.op('!!') | !!5=120 |
| Math Expressions | column.op | column.op(operator) | column.op('@') | @-5.0=5 |
| Math Expressions | column.op | column.op(operator) | column.op('&') | 91&15=11 |
| Math Expressions | column.op | column.op(operator) | column.op('#') | 17#5=20 |
| Math Expressions | column.op | column.op(operator) | column.op('\~') | \~1=-2 |
| Math Expressions | column.op | column.op(operator) | column.op('<<') | 1<<4=16 |
| Math Expressions | column.op | column.op(operator) | column.op('>>') | 8>>2=2 |

# ERROR_OR (Lakehouse v1) > ERROR_OR — Returns the first non-error expression among its inputs. Returns the first non-error expression among its inputs. If all expressions result in errors, it returns NULL. ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.error_or(expr1, expr2, ...) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python # Returns the valid date if no errors occur # Returns the current date if the conversion results in an error func.now(), func.error_or(func.to_date('2024-12-25'), func.now()) ┌──────────────────────────────────────────────────────────────────────────────────────────┐ │ func.now() │ func.error_or(func.to_date('2024-12-25'), func.now()) │ ├─────────────────────────────────┼────────────────────────────────────────────────────────┤ │ 2024-03-18 01:22:39.460320 │ 2024-12-25 │ └──────────────────────────────────────────────────────────────────────────────────────────┘ # Returns NULL because the conversion results in an error func.error_or(func.to_date('2024-1234')) ┌────────────────────────────────────────────┐ │ func.error_or(func.to_date('2024-1234')) │ ├────────────────────────────────────────────┤ │ NULL │ └────────────────────────────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql ERROR_OR(expr1, expr2, ...)
``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql -- Returns the valid date if no errors occur -- Returns the current date if the conversion results in an error SELECT NOW(), ERROR_OR('2024-12-25'::DATE, NOW()::DATE); ┌────────────────────────────────────────────────────────────────────────┐ │ now() │ error_or('2024-12-25'::date, now()::date) │ ├────────────────────────────┼───────────────────────────────────────────┤ │ 2024-03-18 01:22:39.460320 │ 2024-12-25 │ └────────────────────────────────────────────────────────────────────────┘ -- Returns NULL because the conversion results in an error SELECT ERROR_OR('2024-1234'::DATE); ┌─────────────────────────────┐ │ error_or('2024-1234'::date) │ ├─────────────────────────────┤ │ NULL │ └─────────────────────────────┘ ``` # GREATEST (Lakehouse v1) > GREATEST — Returns the maximum value from a set of values. Returns the maximum value from a set of values. ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.greatest(, ...) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.greatest((5, 9, 4)) ┌──────────────────────────┐ │ func.greatest((5, 9, 4)) │ ├──────────────────────────┤ │ 9 │ └──────────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql GREATEST(, ...) ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT GREATEST(5, 9, 4); ┌───────────────────┐ │ greatest(5, 9, 4) │ ├───────────────────┤ │ 9 │ └───────────────────┘ ``` # IF (Lakehouse v1) > IF — returns the result paired with the first condition that is TRUE. If the first condition is TRUE, it returns the first result. Otherwise, if the second condition is TRUE, it returns the second result, and so on. If no condition is TRUE, it returns the final expression.
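The branch-selection rule can be modelled in a few lines of plain Python. This is a sketch of the semantics only, not the Lakehouse implementation: `if_` is a hypothetical helper, and unlike a SQL engine it receives arguments that Python has already evaluated.

```python
def if_(*args):
    """Hypothetical model of IF(cond1, result1, [cond2, result2, ...], default)."""
    if len(args) < 3 or len(args) % 2 == 0:
        raise ValueError("expected condition/result pairs followed by a default")
    *pairs, default = args
    # Walk the (condition, result) pairs in order and stop at the first TRUE condition.
    for i in range(0, len(pairs), 2):
        condition, result = pairs[i], pairs[i + 1]
        if condition is True:
            return result
    return default


# Mirrors IF(1 > 2, 3, 4 < 5, 6, 7): the first condition is false, the second is true.
print(if_(1 > 2, 3, 4 < 5, 6, 7))  # 6
```

A condition that is `None` (the stand-in for NULL here) is treated as not TRUE, so evaluation moves on to the next pair.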
## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.if(, , [, ...], ) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.if((1 > 2), 3, (4 < 5), 6, 7) ┌────────────────────────────────────┐ │ func.if((1 > 2), 3, (4 < 5), 6, 7) │ ├────────────────────────────────────┤ │ 6 │ └────────────────────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql IF(, , [, ...], ) ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT IF(1 > 2, 3, 4 < 5, 6, 7); ┌───────────────────────────────┐ │ if((1 > 2), 3, (4 < 5), 6, 7) │ ├───────────────────────────────┤ │ 6 │ └───────────────────────────────┘ ``` # IFNULL (Lakehouse v1) > IFNULL — if the first expression is NULL, returns the second expression; otherwise returns the first. If the first expression is NULL, returns the second expression; otherwise returns the first expression. ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.ifnull(, ) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.ifnull(null, 'b'), func.ifnull('a', 'b') ┌────────────────────────────────────────────────┐ │ func.ifnull(null, 'b') │ func.ifnull('a', 'b') │ ├────────────────────────┼───────────────────────┤ │ b │ a │ └────────────────────────────────────────────────┘ func.ifnull(null, 2), func.ifnull(1, 2) ┌──────────────────────────────────────────┐ │ func.ifnull(null, 2) │ func.ifnull(1, 2) │ ├──────────────────────┼───────────────────┤ │ 2 │ 1 │ └──────────────────────────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql IFNULL(, ) ``` ## Aliases [Section titled “Aliases”](#aliases) * [NVL](../nvl) ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT IFNULL(NULL, 'b'), IFNULL('a', 'b'); ┌──────────────────────────────────────┐ │ ifnull(null, 'b') │ ifnull('a', 'b') │ ├───────────────────┼──────────────────┤ │ b │ a │ └──────────────────────────────────────┘ SELECT IFNULL(NULL, 2), IFNULL(1, 
2); ┌────────────────────────────────┐ │ ifnull(null, 2) │ ifnull(1, 2) │ ├─────────────────┼──────────────┤ │ 2 │ 1 │ └────────────────────────────────┘ ``` # [ NOT ] IN (Lakehouse v1) > [ NOT ] IN — true if a value equals (or does not equal) any item in an explicit list. Checks whether a value is (or is not) in an explicit list. ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python table.column.in_((, ...)) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python table.column.in_((2, 3)) ┌──────────────────────────┐ │ table.column.in_((2, 3)) │ ├──────────────────────────┤ │ true │ └──────────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql [ NOT ] IN (, ...) ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT 1 NOT IN (2, 3); ┌────────────────┐ │ 1 not in(2, 3) │ ├────────────────┤ │ true │ └────────────────┘ ``` # IS [ NOT ] DISTINCT FROM (Lakehouse v1) > IS [ NOT ] DISTINCT FROM — compares whether two expressions are equal (or not equal) with awareness of nullability, meaning it treats NULLs as known values. Compares whether two expressions are equal (or not equal) with awareness of nullability, meaning it treats NULLs as known values for comparing equality. ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql IS [ NOT ] DISTINCT FROM ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT NULL IS DISTINCT FROM NULL; ┌────────────────────────────┐ │ null is distinct from null │ ├────────────────────────────┤ │ false │ └────────────────────────────┘ ``` # IS_ERROR (Lakehouse v1) > IS_ERROR — returns a Boolean value indicating whether an expression is an error value. Returns a Boolean value indicating whether an expression is an error value.
See also: [IS\_NOT\_ERROR](../is-not-error) ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.is_error( ) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python # Indicates division by zero, hence an error func.is_error((1 / 0)), func.is_not_error((1 / 0)) ┌─────────────────────────────────────────────────────┐ │ func.is_error((1 / 0)) │ func.is_not_error((1 / 0)) │ ├────────────────────────┼────────────────────────────┤ │ true │ false │ └─────────────────────────────────────────────────────┘ # The conversion to DATE is successful, hence not an error func.is_error(func.to_date('2024-03-17')), func.is_not_error(func.to_date('2024-03-17')) ┌───────────────────────────────────────────────────────────────────────────────────────────┐ │ func.is_error(func.to_date('2024-03-17')) │ func.is_not_error(func.to_date('2024-03-17')) │ ├───────────────────────────────────────────┼───────────────────────────────────────────────┤ │ false │ true │ └───────────────────────────────────────────────────────────────────────────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql IS_ERROR( ) ``` ## Return Type [Section titled “Return Type”](#return-type) Returns `true` if the expression is an error, otherwise `false`. 
## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql -- Indicates division by zero, hence an error SELECT IS_ERROR(1/0), IS_NOT_ERROR(1/0); ┌───────────────────────────────────────────┐ │ is_error((1 / 0)) │ is_not_error((1 / 0)) │ ├───────────────────┼───────────────────────┤ │ true │ false │ └───────────────────────────────────────────┘ -- The conversion to DATE is successful, hence not an error SELECT IS_ERROR('2024-03-17'::DATE), IS_NOT_ERROR('2024-03-17'::DATE); ┌─────────────────────────────────────────────────────────────────┐ │ is_error('2024-03-17'::date) │ is_not_error('2024-03-17'::date) │ ├──────────────────────────────┼──────────────────────────────────┤ │ false │ true │ └─────────────────────────────────────────────────────────────────┘ ``` # IS_NOT_ERROR (Lakehouse v1) > IS_NOT_ERROR — returns a Boolean value indicating whether an expression is not an error value. Returns a Boolean value indicating whether an expression is not an error value. See also: [IS\_ERROR](../is-error) ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.is_not_error( ) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python # Indicates division by zero, hence an error func.is_error((1 / 0)), func.is_not_error((1 / 0)) ┌─────────────────────────────────────────────────────┐ │ func.is_error((1 / 0)) │ func.is_not_error((1 / 0)) │ ├────────────────────────┼────────────────────────────┤ │ true │ false │ └─────────────────────────────────────────────────────┘ # The conversion to DATE is successful, hence not an error func.is_error(func.to_date('2024-03-17')), func.is_not_error(func.to_date('2024-03-17')) ┌───────────────────────────────────────────────────────────────────────────────────────────┐ │ func.is_error(func.to_date('2024-03-17')) │ func.is_not_error(func.to_date('2024-03-17')) │ ├───────────────────────────────────────────┼───────────────────────────────────────────────┤ │ false │ true │ 
└───────────────────────────────────────────────────────────────────────────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql IS_NOT_ERROR( ) ``` ## Return Type [Section titled “Return Type”](#return-type) Returns `true` if the expression is not an error, otherwise `false`. ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql -- Indicates division by zero, hence an error SELECT IS_ERROR(1/0), IS_NOT_ERROR(1/0); ┌───────────────────────────────────────────┐ │ is_error((1 / 0)) │ is_not_error((1 / 0)) │ ├───────────────────┼───────────────────────┤ │ true │ false │ └───────────────────────────────────────────┘ -- The conversion to DATE is successful, hence not an error SELECT IS_ERROR('2024-03-17'::DATE), IS_NOT_ERROR('2024-03-17'::DATE); ┌─────────────────────────────────────────────────────────────────┐ │ is_error('2024-03-17'::date) │ is_not_error('2024-03-17'::date) │ ├──────────────────────────────┼──────────────────────────────────┤ │ false │ true │ └─────────────────────────────────────────────────────────────────┘ ``` # IS_NOT_NULL (Lakehouse v1) > IS_NOT_NULL — Checks whether a value is not NULL. Checks whether a value is not NULL. ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.is_not_null() ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.is_not_null(1) ┌─────────────────────┐ │ func.is_not_null(1) │ ├─────────────────────┤ │ true │ └─────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql IS_NOT_NULL() ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT IS_NOT_NULL(1); ┌────────────────┐ │ is_not_null(1) │ ├────────────────┤ │ true │ └────────────────┘ ``` # IS_NULL (Lakehouse v1) > IS_NULL — checks whether a value is NULL. Checks whether a value is NULL. 
## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.is_null() ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.is_null(1) ┌─────────────────┐ │ func.is_null(1) │ ├─────────────────┤ │ false │ └─────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql IS_NULL() ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT IS_NULL(1); ┌────────────┐ │ is_null(1) │ ├────────────┤ │ false │ └────────────┘ ``` # LEAST (Lakehouse v1) > LEAST — Returns the minimum value from a set of values. Returns the minimum value from a set of values. ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.least((, ...)) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.least((5, 9, 4)) ┌───────────────────────┐ │ func.least((5, 9, 4)) │ ├───────────────────────┤ │ 4 │ └───────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql LEAST(, ...) ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT LEAST(5, 9, 4); ┌────────────────┐ │ least(5, 9, 4) │ ├────────────────┤ │ 4 │ └────────────────┘ ``` # NULLIF (Lakehouse v1) > NULLIF — Returns NULL if two expressions are equal. Returns NULL if two expressions are equal. Otherwise return expr1. They must have the same data type. 
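The comparison rule can be sketched in plain Python, with `None` standing in for NULL. The function `nullif` below is a hypothetical local helper written for illustration, not the PlaidCloud `func.nullif` expression, and it assumes that a NULL operand never compares equal to anything.

```python
def nullif(expr1, expr2):
    """Return None (standing in for NULL) when both values are equal, else expr1."""
    # Assumption: a NULL operand never compares equal, so expr1 passes through.
    if expr1 is None or expr2 is None:
        return expr1
    return None if expr1 == expr2 else expr1


same = nullif(5, 5)          # equal values collapse to None
different = nullif(5, 7)     # unequal values return the first argument
with_null = nullif(0, None)  # mirrors the NULLIF(0, NULL) example on this page
```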
## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.nullif(, ) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.nullif(0, null) ┌──────────────────────┐ │ func.nullif(0, null) │ ├──────────────────────┤ │ 0 │ └──────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql NULLIF(, ) ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT NULLIF(0, NULL); ┌─────────────────┐ │ nullif(0, null) │ ├─────────────────┤ │ 0 │ └─────────────────┘ ``` # NVL (Lakehouse v1) > NVL — If is NULL, returns , otherwise returns . If `` is NULL, returns ``, otherwise returns ``. ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.nvl(, ) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.nvl(null, 'b'), func.nvl('a', 'b') ┌──────────────────────────────────────────┐ │ func.nvl(null, 'b') │ func.nvl('a', 'b') │ ├─────────────────────┼────────────────────┤ │ b │ a │ └──────────────────────────────────────────┘ func.nvl(null, 2), func.nvl(1, 2) ┌────────────────────────────────────┐ │ func.nvl(null, 2) │ func.nvl(1, 2) │ ├───────────────────┼────────────────┤ │ 2 │ 1 │ └────────────────────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql NVL(, ) ``` ## Aliases [Section titled “Aliases”](#aliases) * [IFNULL](../ifnull) ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT NVL(NULL, 'b'), NVL('a', 'b'); ┌────────────────────────────────┐ │ nvl(null, 'b') │ nvl('a', 'b') │ ├────────────────┼───────────────┤ │ b │ a │ └────────────────────────────────┘ SELECT NVL(NULL, 2), NVL(1, 2); ┌──────────────────────────┐ │ nvl(null, 2) │ nvl(1, 2) │ ├──────────────┼───────────┤ │ 2 │ 1 │ └──────────────────────────┘ ``` # NVL2 (Lakehouse v1) > NVL2 — returns if is not NULL; otherwise, it returns . 
Returns `` if `` is not NULL; otherwise, it returns ``. ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.nvl2( , , ) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.nvl2('a', 'b', 'c'), func.nvl2(null, 'b', 'c') ┌──────────────────────────────────────────────────────┐ │ func.nvl2('a', 'b', 'c') │ func.nvl2(null, 'b', 'c') │ ├──────────────────────────┼───────────────────────────┤ │ b │ c │ └──────────────────────────────────────────────────────┘ func.nvl2(1, 2, 3), func.nvl2(null, 2, 3) ┌────────────────────────────────────────────┐ │ func.nvl2(1, 2, 3) │ func.nvl2(null, 2, 3) │ ├────────────────────┼───────────────────────┤ │ 2 │ 3 │ └────────────────────────────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql NVL2( , , ) ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT NVL2('a', 'b', 'c'), NVL2(NULL, 'b', 'c'); ┌────────────────────────────────────────────┐ │ nvl2('a', 'b', 'c') │ nvl2(null, 'b', 'c') │ ├─────────────────────┼──────────────────────┤ │ b │ c │ └────────────────────────────────────────────┘ SELECT NVL2(1, 2, 3), NVL2(NULL, 2, 3); ┌──────────────────────────────────┐ │ nvl2(1, 2, 3) │ nvl2(null, 2, 3) │ ├───────────────┼──────────────────┤ │ 2 │ 3 │ └──────────────────────────────────┘ ``` # OR (Lakehouse v1) > OR — conditional OR operator. Conditional OR operator. Checks whether either condition is true. 
## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python or_([, ...]) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python or_( table.color == 'green', table.shape == 'circle', table.price >= 1.25 ) ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql OR ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT * FROM table WHERE table.color = 'green' OR table.shape = 'circle' OR table.price >= 1.25; ``` # Numeric Functions (Lakehouse v1) > Lakehouse v1 SQL numeric functions: arithmetic, rounding, log, power, and trigonometric helpers for numbers. This section provides reference information for the numeric functions in PlaidCloud Lakehouse. # ABS (Lakehouse v1) > ABS — returns the absolute value of x. Returns the absolute value of `x`. ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.abs( ) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.abs((- 5)) ┌─────────────────┐ │ func.abs((- 5)) │ ├─────────────────┤ │ 5 │ └─────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql ABS( ) ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT ABS(-5); ┌────────────┐ │ abs((- 5)) │ ├────────────┤ │ 5 │ └────────────┘ ``` # ACOS (Lakehouse v1) > ACOS — Returns the arc cosine of x, that is, the value whose cosine is x. Returns the arc cosine of `x`, that is, the value whose cosine is `x`. Returns NULL if `x` is not in the range -1 to 1. 
## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.acos( ) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.acos(1) ┌──────────────┐ │ func.acos(1) │ ├──────────────┤ │ 0 │ └──────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql ACOS( ) ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT ACOS(1); ┌─────────┐ │ acos(1) │ ├─────────┤ │ 0 │ └─────────┘ ``` # ADD (Lakehouse v1) > ADD — Alias for PLUS. Adds two numeric values together. Alias for [PLUS](../plus). # ASIN (Lakehouse v1) > ASIN — Returns the arc sine of x, that is, the value whose sine is x. Returns the arc sine of `x`, that is, the value whose sine is `x`. Returns NULL if `x` is not in the range -1 to 1. ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.asin( ) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.asin(0.2) ┌────────────────────┐ │ func.asin(0.2) │ ├────────────────────┤ │ 0.2013579207903308 │ └────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql ASIN( ) ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT ASIN(0.2); ┌────────────────────┐ │ asin(0.2) │ ├────────────────────┤ │ 0.2013579207903308 │ └────────────────────┘ ``` # ATAN (Lakehouse v1) > ATAN — returns the arc tangent of x, that is, the value whose tangent is x. Returns the arc tangent of `x`, that is, the value whose tangent is `x`.
## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.atan( ) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.atan(-2) ┌─────────────────────┐ │ func.atan((- 2)) │ ├─────────────────────┤ │ -1.1071487177940906 │ └─────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql ATAN( ) ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT ATAN(-2); ┌─────────────────────┐ │ atan((- 2)) │ ├─────────────────────┤ │ -1.1071487177940906 │ └─────────────────────┘ ``` # ATAN2 (Lakehouse v1) > ATAN2 — Returns the arc tangent of the two variables x and y. Returns the arc tangent of the two variables `x` and `y`. It is similar to calculating the arc tangent of `y` / `x`, except that the signs of both arguments are used to determine the quadrant of the result. `ATAN(y, x)` is a synonym for `ATAN2(y, x)`. ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.atan2( ) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.atan2((- 2), 2) ┌─────────────────────┐ │ func.atan2((- 2), 2)│ ├─────────────────────┤ │ -0.7853981633974483 │ └─────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql ATAN2( ) ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT ATAN2(-2, 2); ┌─────────────────────┐ │ atan2((- 2), 2) │ ├─────────────────────┤ │ -0.7853981633974483 │ └─────────────────────┘ ``` # CBRT (Lakehouse v1) > CBRT — Returns the cube root of a nonnegative number x. Returns the cube root of a nonnegative number `x`. 
## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.cbrt( ) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.cbrt(27) ┌───────────────┐ │ func.cbrt(27) │ ├───────────────┤ │ 3 │ └───────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql CBRT( ) ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT CBRT(27); ┌──────────┐ │ cbrt(27) │ ├──────────┤ │ 3 │ └──────────┘ ``` # CEIL (Lakehouse v1) > CEIL — rounds the number up. Rounds the number up. ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.ceil( ) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.ceil((- 1.23)) ┌─────────────────────┐ │ func.ceil((- 1.23)) │ ├─────────────────────┤ │ -1 │ └─────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql CEIL( ) ``` ## Aliases [Section titled “Aliases”](#aliases) * [CEILING](../ceiling) ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT CEILING(-1.23), CEIL(-1.23); ┌────────────────────────────────────┐ │ ceiling((- 1.23)) │ ceil((- 1.23)) │ ├───────────────────┼────────────────┤ │ -1 │ -1 │ └────────────────────────────────────┘ ``` # CEILING (Lakehouse v1) > CEILING — alias for the CEIL numeric function. Alias for [CEIL](../ceil). # COS (Lakehouse v1) > COS — Returns the cosine of x, where x is given in radians. Returns the cosine of `x`, where `x` is given in radians. 
## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.cos( ) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.cos(func.pi()) ┌─────────────────────┐ │ func.cos(func.pi()) │ ├─────────────────────┤ │ -1 │ └─────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql COS( ) ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT COS(PI()); ┌───────────┐ │ cos(pi()) │ ├───────────┤ │ -1 │ └───────────┘ ``` # COT (Lakehouse v1) > COT — Returns the cotangent of x, where x is given in radians. Returns the cotangent of `x`, where `x` is given in radians. ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.cot( ) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.cot(12) ┌─────────────────────┐ │ func.cot(12) │ ├─────────────────────┤ │ -1.5726734063976895 │ └─────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql COT( ) ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT COT(12); ┌─────────────────────┐ │ cot(12) │ ├─────────────────────┤ │ -1.5726734063976895 │ └─────────────────────┘ ``` # CRC32 (Lakehouse v1) > CRC32 — returns the CRC32 checksum of x, where 'x' is expected to be a string and (if possible). Returns the CRC32 checksum of `x`, where ‘x’ is expected to be a string and (if possible) is treated as one if it is not. 
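To illustrate the "treated as a string if possible" rule, here is a small sketch using Python's standard `zlib.crc32`. The helper `crc32_of` is hypothetical, and it is an assumption that the Lakehouse function uses the same CRC-32 variant and UTF-8 encoding, so the numeric results are not guaranteed to match the engine's output.

```python
import zlib


def crc32_of(value):
    """CRC-32 checksum of a value, coercing non-strings to their string form first."""
    text = value if isinstance(value, str) else str(value)
    # zlib.crc32 works on bytes; UTF-8 is an assumed encoding for this sketch.
    return zlib.crc32(text.encode("utf-8"))


checksum = crc32_of("databend")
from_number = crc32_of(123)   # coerced to the string "123" before hashing
from_string = crc32_of("123")
```

Because the number is converted to text first, `from_number` and `from_string` are the same checksum.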
## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.crc32( '' ) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.crc32('databend') ┌────────────────────────┐ │ func.crc32('databend') │ ├────────────────────────┤ │ 1177678456 │ └────────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql CRC32( '' ) ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT CRC32('databend'); ┌───────────────────┐ │ crc32('databend') │ ├───────────────────┤ │ 1177678456 │ └───────────────────┘ ``` # DEGREES (Lakehouse v1) > DEGREES — returns the argument x, converted from radians to degrees, where x is given in radians. Returns the argument `x`, converted from radians to degrees, where `x` is given in radians. ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.degrees( ) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.degrees(func.pi()) ┌─────────────────────────┐ │ func.degrees(func.pi()) │ ├─────────────────────────┤ │ 180 │ └─────────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql DEGREES( ) ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT DEGREES(PI()); ┌───────────────┐ │ degrees(pi()) │ ├───────────────┤ │ 180 │ └───────────────┘ ``` # DIV (Lakehouse v1) > DIV — returns the quotient by dividing the first number by the second one, rounding down to the closest smaller integer. Returns the quotient by dividing the first number by the second one, rounding down to the closest smaller integer. Equivalent to the division operator `//`. 
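The rounding rule can be sketched in plain Python, where `//` and `math.floor` behave the same way for the positive values used on this page. The function `div` below is a hypothetical local helper with `None` standing in for NULL; it is not the PlaidCloud `func.div` expression, and raising `ZeroDivisionError` is only a stand-in for the engine's "divided by zero" error.

```python
import math


def div(x, y):
    """Divide x by y and round down to an integer; None models NULL."""
    if x is None or y is None:
        return None  # a NULL operand yields NULL
    if y == 0:
        raise ZeroDivisionError("divided by zero")
    return math.floor(x / y)


quotient = div(6.1, 2)     # 6.1 / 2 = 3.05, rounded down
null_case = div(6.1, None)
```

For a variant that never raises on a zero divisor, see DIV0 and DIVNULL.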
See also: * [DIV0](../div0) * [DIVNULL](../divnull) ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.div(, ) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python # Equivalent to the division operator "//" func.div(6.1, 2) ┌───────────────────────────────┐ │ func.div(6.1, 2) │ (6.1 // 2) │ ├──────────────────┼────────────┤ │ 3 │ 3 │ └───────────────────────────────┘ # Error when divided by 0 error: APIError: ResponseError with 1006: divided by zero while evaluating function `div(6.1, 0)` ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql DIV ``` ## Aliases [Section titled “Aliases”](#aliases) * [INTDIV](../intdiv) ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql -- Equivalent to the division operator "//" SELECT 6.1 DIV 2, 6.1//2; ┌──────────────────────────┐ │ (6.1 div 2) │ (6.1 // 2) │ ├─────────────┼────────────┤ │ 3 │ 3 │ └──────────────────────────┘ SELECT 6.1 DIV 2, INTDIV(6.1, 2), 6.1 DIV NULL; ┌───────────────────────────────────────────────┐ │ (6.1 div 2) │ intdiv(6.1, 2) │ (6.1 div null) │ ├─────────────┼────────────────┼────────────────┤ │ 3 │ 3 │ NULL │ └───────────────────────────────────────────────┘ -- Error when divided by 0 SELECT 6.1 DIV 0; error: APIError: ResponseError with 1006: divided by zero while evaluating function `div(6.1, 0)` ``` # DIV0 (Lakehouse v1) > DIV0 — returns the quotient by dividing the first number by the second one. Returns the quotient by dividing the first number by the second one. Returns 0 if the second number is 0.
See also: * [DIV](../div) * [DIVNULL](../divnull) ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.div0(, ) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.div0(20, 6), func.div0(20, 0), func.div0(20, null) ┌─────────────────────────────────────────────────────────────┐ │ func.div0(20, 6) │ func.div0(20, 0) │ func.div0(20, null) │ ├────────────────────┼──────────────────┼─────────────────────┤ │ 3.3333333333333335 │ 0 │ NULL │ └─────────────────────────────────────────────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql DIV0(, ) ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT DIV0(20, 6), DIV0(20, 0), DIV0(20, NULL); ┌───────────────────────────────────────────────────┐ │ div0(20, 6) │ div0(20, 0) │ div0(20, null) │ ├────────────────────┼─────────────┼────────────────┤ │ 3.3333333333333335 │ 0 │ NULL │ └───────────────────────────────────────────────────┘ ``` # DIVNULL (Lakehouse v1) > DIVNULL — returns the quotient by dividing the first number by the second one. Returns the quotient by dividing the first number by the second one. Returns NULL if the second number is 0 or NULL. 
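The difference from DIV0 is easiest to see side by side. The sketch below models both functions in plain Python with `None` standing in for NULL; `div0` and `divnull` are hypothetical local helpers that follow the descriptions on these pages, not the PlaidCloud `func` expressions.

```python
def div0(x, y):
    """Quotient of x / y; a zero divisor yields 0, a NULL operand yields NULL."""
    if x is None or y is None:
        return None
    return 0 if y == 0 else x / y


def divnull(x, y):
    """Quotient of x / y; a zero or NULL divisor yields NULL."""
    if x is None or y is None or y == 0:
        return None
    return x / y


div0_row = (div0(20, 6), div0(20, 0), div0(20, None))
divnull_row = (divnull(20, 6), divnull(20, 0), divnull(20, None))
```

Both helpers agree on ordinary and NULL divisors and differ only in what a zero divisor produces.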
See also: * [DIV](../div) * [DIV0](../div0) ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.divnull(, ) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.divnull(20, 6), func.divnull(20, 0), func.divnull(20, null) ┌───────────────────────────────────────────────────────────────────┐ │ func.divnull(20, 6)│ func.divnull(20, 0) │ func.divnull(20, null) │ ├────────────────────┼─────────────────────┼────────────────────────┤ │ 3.3333333333333335 │ NULL │ NULL │ └───────────────────────────────────────────────────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql DIVNULL(, ) ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT DIVNULL(20, 6), DIVNULL(20, 0), DIVNULL(20, NULL); ┌─────────────────────────────────────────────────────────┐ │ divnull(20, 6) │ divnull(20, 0) │ divnull(20, null) │ ├────────────────────┼────────────────┼───────────────────┤ │ 3.3333333333333335 │ NULL │ NULL │ └─────────────────────────────────────────────────────────┘ ``` # EXP (Lakehouse v1) > EXP — returns the value of e (the base of natural logarithms) raised to the power of x. Returns the value of e (the base of natural logarithms) raised to the power of `x`. ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.exp( ) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.exp(2) ┌──────────────────┐ │ func.exp(2) │ ├──────────────────┤ │ 7.38905609893065 │ └──────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql EXP( ) ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT EXP(2); ┌──────────────────┐ │ exp(2) │ ├──────────────────┤ │ 7.38905609893065 │ └──────────────────┘ ``` # FACTORIAL (Lakehouse v1) > FACTORIAL — Returns the factorial of x. Returns the factorial of `x`.
If `x` is less than or equal to 0, the function returns 0. ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.factorial( ) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.factorial(5) ┌───────────────────┐ │ func.factorial(5) │ ├───────────────────┤ │ 120 │ └───────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql FACTORIAL( ) ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT FACTORIAL(5); ┌──────────────┐ │ factorial(5) │ ├──────────────┤ │ 120 │ └──────────────┘ ``` # FLOOR (Lakehouse v1) > FLOOR — rounds the number down. Rounds the number down. ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.floor( ) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.floor(1.23) ┌──────────────────┐ │ func.floor(1.23) │ ├──────────────────┤ │ 1 │ └──────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql FLOOR( ) ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT FLOOR(1.23); ┌─────────────┐ │ floor(1.23) │ ├─────────────┤ │ 1 │ └─────────────┘ ``` # INTDIV (Lakehouse v1) > INTDIV — alias for the DIV numeric function. Alias for [DIV](../div). # LN (Lakehouse v1) > LN — returns the natural logarithm of x; that is, the base-e logarithm of x. Returns the natural logarithm of `x`; that is, the base-e logarithm of `x`. If x is less than or equal to 0.0E0, the function returns NULL. 
## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.ln( ) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.ln(2) ┌────────────────────┐ │ func.ln(2) │ ├────────────────────┤ │ 0.6931471805599453 │ └────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql LN( ) ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT LN(2); ┌────────────────────┐ │ ln(2) │ ├────────────────────┤ │ 0.6931471805599453 │ └────────────────────┘ ``` # LOG10 (Lakehouse v1) > LOG10 — Returns the base-10 logarithm of x. Reference. Returns the base-10 logarithm of `x`. If `x` is less than or equal to 0.0E0, the function returns NULL. ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.log10( ) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.log10(100) ┌─────────────────┐ │ func.log10(100) │ ├─────────────────┤ │ 2 │ └─────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql LOG10( ) ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT LOG10(100); ┌────────────┐ │ log10(100) │ ├────────────┤ │ 2 │ └────────────┘ ``` # LOG2 (Lakehouse v1) > LOG2 — returns the base-2 logarithm of x. Returns the base-2 logarithm of `x`. If `x` is less than or equal to 0.0E0, the function returns NULL. 
## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.log2( ) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.log2(65536) ┌──────────────────┐ │ func.log2(65536) │ ├──────────────────┤ │ 16 │ └──────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql LOG2( ) ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT LOG2(65536); ┌─────────────┐ │ log2(65536) │ ├─────────────┤ │ 16 │ └─────────────┘ ``` # LOG(b, x) (Lakehouse v1) > LOG(b, x) — returns the base-b logarithm of x. Returns the base-b logarithm of `x`. If `x` is less than or equal to 0.0E0, the function returns NULL. ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.log( ) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.log(2, 65536) ┌────────────────────┐ │ func.log(2, 65536) │ ├────────────────────┤ │ 16 │ └────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql LOG( ) ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT LOG(2, 65536); ┌───────────────┐ │ log(2, 65536) │ ├───────────────┤ │ 16 │ └───────────────┘ ``` # LOG(x) (Lakehouse v1) > LOG(x) — returns the natural logarithm of x. Returns the natural logarithm of `x`. If x is less than or equal to 0.0E0, the function returns NULL. 
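The one-argument and two-argument forms can be modeled together in plain Python. The function `log` below is a hypothetical local helper with `None` standing in for NULL, not the PlaidCloud `func.log` expression. Note that Python's `math.log(x, base)` takes the base second, the reverse of `LOG(b, x)`. Invalid bases (for example 1 or negative values) are not covered by the descriptions here and are left unhandled.

```python
import math


def log(*args):
    """log(x) is the natural logarithm; log(b, x) is the base-b logarithm."""
    if len(args) == 1:
        (x,) = args
        return math.log(x) if x > 0 else None
    b, x = args
    # math.log takes (value, base), so the arguments are swapped here.
    return math.log(x, b) if x > 0 else None


natural = log(2)
base_two = log(2, 65536)
non_positive = log(0)   # x <= 0 maps to None (NULL)
```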
## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.log( ) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.log(2) ┌────────────────────┐ │ func.log(2) │ ├────────────────────┤ │ 0.6931471805599453 │ └────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql LOG( ) ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT LOG(2); ┌────────────────────┐ │ log(2) │ ├────────────────────┤ │ 0.6931471805599453 │ └────────────────────┘ ``` # MINUS (Lakehouse v1) > MINUS — negates a numeric value. Negates a numeric value. ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.minus( ) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.minus(func.pi()) ┌─────────────────────────┐ │ func.minus(func.pi()) │ ├─────────────────────────┤ │ -3.141592653589793 │ └─────────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql MINUS( ) ``` ## Aliases [Section titled “Aliases”](#aliases) * [NEG](../neg) * [NEGATE](../negate) * [SUBTRACT](../subtract) ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT MINUS(PI()), NEG(PI()), NEGATE(PI()), SUBTRACT(PI()); ┌───────────────────────────────────────────────────────────────────────────────────┐ │ minus(pi()) │ neg(pi()) │ negate(pi()) │ subtract(pi()) │ ├────────────────────┼────────────────────┼────────────────────┼────────────────────┤ │ -3.141592653589793 │ -3.141592653589793 │ -3.141592653589793 │ -3.141592653589793 │ └───────────────────────────────────────────────────────────────────────────────────┘ ``` # MOD (Lakehouse v1) > MOD — alias for the MODULO numeric function. Alias for [MODULO](../modulo). # MODULO (Lakehouse v1) > MODULO — Returns the remainder of x divided by y. Returns the remainder of `x` divided by `y`. If `y` is 0, it returns an error. 
## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.modulo( , ) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.modulo(9, 2) ┌───────────────────┐ │ func.modulo(9, 2) │ ├───────────────────┤ │ 1 │ └───────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql MODULO( , ) ``` ## Aliases [Section titled “Aliases”](#aliases) * [MOD](../mod) ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT MOD(9, 2), MODULO(9, 2); ┌──────────────────────────┐ │ mod(9, 2) │ modulo(9, 2) │ ├───────────┼──────────────┤ │ 1 │ 1 │ └──────────────────────────┘ ``` # NEG (Lakehouse v1) > NEG — alias for the MINUS numeric function. Alias for [MINUS](../minus). # NEGATE (Lakehouse v1) > NEGATE — alias for the MINUS numeric function. Alias for [MINUS](../minus). # PI (Lakehouse v1) > PI — Returns the value of π as a floating-point value. Returns the value of π as a floating-point value. ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.pi() ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.pi() ┌───────────────────┐ │ func.pi() │ ├───────────────────┤ │ 3.141592653589793 │ └───────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql PI() ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT PI(); ┌───────────────────┐ │ pi() │ ├───────────────────┤ │ 3.141592653589793 │ └───────────────────┘ ``` # PLUS (Lakehouse v1) > PLUS — Calculates the sum of two numeric or decimal values. Calculates the sum of two numeric or decimal values. 
## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.plus(, ) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.plus(1, 2.3) ┌────────────────────┐ │ func.plus(1, 2.3) │ ├────────────────────┤ │ 3.3 │ └────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql PLUS(, ) ``` ## Aliases [Section titled “Aliases”](#aliases) * [ADD](../add) ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT ADD(1, 2.3), PLUS(1, 2.3); ┌───────────────────────────────┐ │ add(1, 2.3) │ plus(1, 2.3) │ ├───────────────┼───────────────┤ │ 3.3 │ 3.3 │ └───────────────────────────────┘ ``` # POW (Lakehouse v1) > POW — returns the value of x to the power of y. Returns the value of `x` to the power of `y`. ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.pow( ) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.pow(-2, 2) ┌────────────────────┐ │ func.pow((- 2), 2) │ ├────────────────────┤ │ 4 │ └────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql POW( ) ``` ## Aliases [Section titled “Aliases”](#aliases) * [POWER](../power) ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT POW(-2, 2), POWER(-2, 2); ┌─────────────────────────────────┐ │ pow((- 2), 2) │ power((- 2), 2) │ ├───────────────┼─────────────────┤ │ 4 │ 4 │ └─────────────────────────────────┘ ``` # POWER (Lakehouse v1) > POWER — alias for the POW numeric function. Alias for [POW](../pow). # RADIANS (Lakehouse v1) > RADIANS — Returns the argument x, converted from degrees to radians. Returns the argument `x`, converted from degrees to radians. 
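The conversion is a multiplication by π/180, and DEGREES is its inverse. A minimal plain-Python sketch, using hypothetical local helpers rather than the PlaidCloud `func` expressions:

```python
import math


def radians(degrees_value):
    """Convert degrees to radians: multiply by pi / 180."""
    return degrees_value * math.pi / 180


def degrees(radians_value):
    """Convert radians to degrees: multiply by 180 / pi."""
    return radians_value * 180 / math.pi


quarter_turn = radians(90)     # a right angle, pi / 2 radians
half_turn = degrees(math.pi)   # pi radians is half a full turn
```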
## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.radians( ) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.radians(90) ┌────────────────────┐ │ func.radians(90) │ ├────────────────────┤ │ 1.5707963267948966 │ └────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql RADIANS( ) ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT RADIANS(90); ┌────────────────────┐ │ radians(90) │ ├────────────────────┤ │ 1.5707963267948966 │ └────────────────────┘ ``` # RAND() (Lakehouse v1) > RAND() — returns a random floating-point value v in the range 0 <= v < 1. Returns a random floating-point value v in the range 0 <= v < 1.0. To obtain a random integer R in the range i <= R < j, use the expression FLOOR(i + RAND() \* (j − i)). ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.rand() ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.rand() ┌────────────────────┐ │ func.rand() │ ├────────────────────┤ │ 0.5191511074382174 │ └────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql RAND() ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT RAND(); ┌────────────────────┐ │ rand() │ ├────────────────────┤ │ 0.5191511074382174 │ └────────────────────┘ ``` # RAND(n) (Lakehouse v1) > RAND(n) — returns a random floating-point value v in the range 0 <= v < 1.0. Returns a random floating-point value v in the range 0 <= v < 1.0. To obtain a random integer R in the range i <= R < j, use the expression FLOOR(i + RAND() \* (j − i)). Argument `n` is used as the seed value. For equal argument values, RAND(n) returns the same value each time, and thus produces a repeatable sequence of column values.
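The integer-range formula and the effect of a seed can be sketched in plain Python. This uses Python's own `random.Random` generator as a stand-in, so the actual numbers will differ from what the Lakehouse engine returns; `rand_int` is a hypothetical helper name.

```python
import math
import random


def rand_int(i, j, rng):
    """Random integer R with i <= R < j, via FLOOR(i + RAND() * (j - i))."""
    return math.floor(i + rng.random() * (j - i))


# Two generators with the same seed produce the same sequence.
rng_a = random.Random(1)
rng_b = random.Random(1)
seq_a = [rand_int(10, 20, rng_a) for _ in range(5)]
seq_b = [rand_int(10, 20, rng_b) for _ in range(5)]
```

Because `rng.random()` is always below 1, the upper bound `j` is never reached, which is why the range is half-open.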
## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.rand( ) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.rand(1) ┌────────────────────┐ │ func.rand(1) │ ├────────────────────┤ │ 0.7133693869548766 │ └────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql RAND( ) ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT RAND(1); ┌────────────────────┐ │ rand(1) │ ├────────────────────┤ │ 0.7133693869548766 │ └────────────────────┘ ``` # ROUND (Lakehouse v1) > ROUND — Rounds the argument x to d decimal places. Rounds the argument x to d decimal places. The rounding algorithm depends on the data type of x. d defaults to 0 if not specified. d can be negative to cause d digits left of the decimal point of the value x to become zero. The maximum absolute value for d is 30; any digits in excess of 30 (or -30) are truncated. The return data type is DOUBLE, so using the result in further calculations can introduce floating-point precision errors. Cast to DECIMAL when exact results are needed: ```sql SELECT ROUND(4/7, 4) - ROUND(3/7, 4); -- Result: 0.14280000000000004 SELECT ROUND(4/7, 4)::DECIMAL(8,4) - ROUND(3/7, 4)::DECIMAL(8,4); -- Result: 0.1428 ``` ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.round( ) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.round(0.123, 2) ┌──────────────────────┐ │ func.round(0.123, 2) │ ├──────────────────────┤ │ 0.12 │ └──────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql ROUND( ) ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT ROUND(0.123, 2); ┌─────────────────┐ │ round(0.123, 2) │ ├─────────────────┤ │ 0.12 │ └─────────────────┘ ``` # SIGN (Lakehouse v1) > SIGN — returns the sign of the argument as -1, 0, or 1, depending on whether x is negative, zero, or positive.
Returns the sign of the argument as -1, 0, or 1, depending on whether `x` is negative, zero, or positive, or NULL if the argument is NULL. ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.sign( ) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.sign(0) ┌──────────────┐ │ func.sign(0) │ ├──────────────┤ │ 0 │ └──────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql SIGN( ) ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT SIGN(0); ┌─────────┐ │ sign(0) │ ├─────────┤ │ 0 │ └─────────┘ ``` # SIN (Lakehouse v1) > SIN — Returns the sine of x, where x is given in radians. Returns the sine of `x`, where `x` is given in radians. ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.sin( ) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.sin(90) ┌────────────────────┐ │ func.sin(90) │ ├────────────────────┤ │ 0.8939966636005579 │ └────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql SIN( ) ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT SIN(90); ┌────────────────────┐ │ sin(90) │ ├────────────────────┤ │ 0.8939966636005579 │ └────────────────────┘ ``` # SQRT (Lakehouse v1) > SQRT — Returns the square root of a nonnegative number x. Returns the square root of a nonnegative number `x`. Returns NaN for negative input.
## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.sqrt( ) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.sqrt(4) ┌──────────────┐ │ func.sqrt(4) │ ├──────────────┤ │ 2 │ └──────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql SQRT( ) ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT SQRT(4); ┌─────────┐ │ sqrt(4) │ ├─────────┤ │ 2 │ └─────────┘ ``` # SUBTRACT (Lakehouse v1) > SUBTRACT — alias for the MINUS numeric function. Alias for [MINUS](../minus). # TAN (Lakehouse v1) > TAN — Returns the tangent of x, where x is given in radians. Returns the tangent of `x`, where `x` is given in radians. ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.tan( ) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.tan(90) ┌────────────────────┐ │ func.tan(90) │ ├────────────────────┤ │ -1.995200412208242 │ └────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql TAN( ) ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT TAN(90); ┌────────────────────┐ │ tan(90) │ ├────────────────────┤ │ -1.995200412208242 │ └────────────────────┘ ``` # TRUNCATE (Lakehouse v1) > TRUNCATE — Returns the number x, truncated to d decimal places. Returns the number `x`, truncated to `d` decimal places. If `d` is 0, the result has no decimal point or fractional part. `d` can be negative to cause `d` digits left of the decimal point of the value `x` to become zero. The maximum absolute value for `d` is 30; any digits in excess of 30 (or -30) are truncated. 
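The effect of positive, zero, and negative `d` described above can be sketched in Python with the standard `decimal` module. This is an illustration only, not the Lakehouse implementation: the helper name `truncate` is hypothetical, truncation toward zero is assumed, and the limit of 30 on the absolute value of `d` is not modeled.

```python
from decimal import Decimal, ROUND_DOWN


def truncate(x, d=0):
    """Truncate x toward zero, keeping d decimal places (d may be negative)."""
    # Decimal(1).scaleb(-d) is 10**(-d): 0.1 for d=1, 1 for d=0, 100 for d=-2.
    quantum = Decimal(1).scaleb(-d)
    # Going through str() avoids carrying binary floating-point noise into Decimal.
    return Decimal(str(x)).quantize(quantum, rounding=ROUND_DOWN)


one_place = truncate(1.223, 1)      # keeps one digit after the decimal point
no_fraction = truncate(1.9)         # d defaults to 0: no fractional part remains
hundreds = truncate(1234.5, -2)     # zeroes the two digits left of the decimal point
```

`ROUND_DOWN` in the `decimal` module rounds toward zero, so negative inputs lose digits without moving further from zero.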
## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.truncate( ) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.truncate(1.223, 1) ┌─────────────────────────┐ │ func.truncate(1.223, 1) │ ├─────────────────────────┤ │ 1.2 │ └─────────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql TRUNCATE( ) ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT TRUNCATE(1.223, 1); ┌────────────────────┐ │ truncate(1.223, 1) │ ├────────────────────┤ │ 1.2 │ └────────────────────┘ ``` # Date & Time Functions (Lakehouse v1) > Lakehouse v1 SQL date and time functions: parse, format, and arithmetic on dates, times, and timestamps. This section provides reference information for the datetime-related functions in PlaidCloud Lakehouse. ## Conversion Functions [Section titled “Conversion Functions”](#conversion-functions) * [DATE](date) * [TO\_MONTH](to-month) * [MONTH](month) * [TO\_DATE](to-date) * [TO\_DATETIME](to-datetime) * [TODAY](today) * [TO\_DAY\_OF\_MONTH](to-day-of-month) * [DAY](day) * [TO\_DAY\_OF\_WEEK](to-day-of-week) * [TO\_DAY\_OF\_YEAR](to-day-of-year) * [TO\_HOUR](to-hour) * [TO\_MINUTE](to-minute) * [TO\_MONDAY](to-monday) * [TOMORROW](tomorrow) * [TO\_QUARTER](to-quarter) * [QUARTER](quarter) * [TO\_SECOND](to-second) * [TO\_START\_OF\_DAY](to-start-of-day) * [TO\_START\_OF\_FIFTEEN\_MINUTES](to-start-of-fifteen-minutes) * [TO\_START\_OF\_FIVE\_MINUTES](to-start-of-five-minutes) * [TO\_START\_OF\_HOUR](to-start-of-hour) * [TO\_START\_OF\_ISO\_YEAR](to-start-of-iso-year) * [TO\_START\_OF\_MINUTE](to-start-of-minute) * [TO\_START\_OF\_MONTH](to-start-of-month) * [TO\_START\_OF\_QUARTER](to-start-of-quarter) * [TO\_START\_OF\_SECOND](to-start-of-second) * [TO\_START\_OF\_TEN\_MINUTES](to-start-of-ten-minutes) * [TO\_START\_OF\_WEEK](to-start-of-week) * [TO\_START\_OF\_YEAR](to-start-of-year) * [TO\_TIMESTAMP](to-timestamp) *
[TO\_UNIX\_TIMESTAMP](to-unix-timestamp) * [TO\_WEEK\_OF\_YEAR](to-week-of-year) * [WEEK](week) * [WEEKOFYEAR](weekofyear) * [TO\_YEAR](to-year) * [YEAR](year) * [TO\_YYYYMM](to-yyyymm) * [TO\_YYYYMMDD](to-yyyymmdd) * [TO\_YYYYMMDDHH](to-yyyymmddhh) * [TO\_YYYYMMDDHHMMSS](to-yyyymmddhhmmss) ## Date Arithmetic Functions [Section titled “Date Arithmetic Functions”](#date-arithmetic-functions) * [ADD INTERVAL](addinterval) * [DATE\_ADD](date-add) * [DATE\_SUB](date-sub) * [SUBTRACT INTERVAL](subtractinterval) ## Date Information Functions [Section titled “Date Information Functions”](#date-information-functions) * [DATE\_PART](date-part) * [DATE\_DIFF](date-diff) * [DATE\_FORMAT](date-format) * [DATE\_TRUNC](date-trunc) * [NOW](now) * [CURRENT\_TIMESTAMP](current-timestamp) ## Others [Section titled “Others”](#others) * [EXTRACT](extract) * [TIME\_SLOT](time-slot) * [TIMEZONE](timezone) * [YESTERDAY](yesterday) # ADD TIME INTERVAL (Lakehouse v1) > ADD TIME INTERVAL — add years, months, days, hours, minutes, or seconds to a date or timestamp value. Adds a time interval to a date or timestamp and returns the result as a date or timestamp, matching the input type.
## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.add_years(, ) func.add_quarters(, ) func.add_months(, ) func.add_days(, ) func.add_hours(, ) func.add_minutes(, ) func.add_seconds(, ) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.to_date(18875), func.add_years(func.to_date(18875), 2) ┌─────────────────────────────────┬───────────────────────────────────────────────────┐ │ func.to_date(18875) │ func.add_years(func.to_date(18875), 2) │ ├─────────────────────────────────┼───────────────────────────────────────────────────┤ │ 2021-09-05 │ 2023-09-05 │ └─────────────────────────────────┴───────────────────────────────────────────────────┘ func.to_date(18875), func.add_quarters(func.to_date(18875), 2) ┌─────────────────────────────────┬───────────────────────────────────────────────────┐ │ func.to_date(18875) │ func.add_quarters(func.to_date(18875), 2) │ ├─────────────────────────────────┼───────────────────────────────────────────────────┤ │ 2021-09-05 │ 2022-03-05 │ └─────────────────────────────────┴───────────────────────────────────────────────────┘ func.to_date(18875), func.add_months(func.to_date(18875), 2) ┌─────────────────────────────────┬───────────────────────────────────────────────────┐ │ func.to_date(18875) │ func.add_months(func.to_date(18875), 2) │ ├─────────────────────────────────┼───────────────────────────────────────────────────┤ │ 2021-09-05 │ 2021-11-05 │ └─────────────────────────────────┴───────────────────────────────────────────────────┘ func.to_date(18875), func.add_days(func.to_date(18875), 2) ┌─────────────────────────────────┬───────────────────────────────────────────────────┐ │ func.to_date(18875) │ func.add_days(func.to_date(18875), 2) │ ├─────────────────────────────────┼───────────────────────────────────────────────────┤ │ 2021-09-05 │ 2021-09-07 │ └─────────────────────────────────┴───────────────────────────────────────────────────┘
func.to_datetime(1630833797), func.add_hours(func.to_datetime(1630833797), 2) ┌─────────────────────────────────┬───────────────────────────────────────────────────┐ │ func.to_datetime(1630833797) │ func.add_hours(func.to_datetime(1630833797), 2) │ ├─────────────────────────────────┼───────────────────────────────────────────────────┤ │ 2021-09-05 09:23:17.000000 │ 2021-09-05 11:23:17.000000 │ └─────────────────────────────────┴───────────────────────────────────────────────────┘ func.to_datetime(1630833797), func.add_minutes(func.to_datetime(1630833797), 2) ┌─────────────────────────────────┬───────────────────────────────────────────────────┐ │ func.to_datetime(1630833797) │ func.add_minutes(func.to_datetime(1630833797), 2) │ ├─────────────────────────────────┼───────────────────────────────────────────────────┤ │ 2021-09-05 09:23:17.000000 │ 2021-09-05 09:25:17.000000 │ └─────────────────────────────────┴───────────────────────────────────────────────────┘ func.to_datetime(1630833797), func.add_seconds(func.to_datetime(1630833797), 2) ┌─────────────────────────────────┬───────────────────────────────────────────────────┐ │ func.to_datetime(1630833797) │ func.add_seconds(func.to_datetime(1630833797), 2) │ ├─────────────────────────────────┼───────────────────────────────────────────────────┤ │ 2021-09-05 09:23:17.000000 │ 2021-09-05 09:23:19.000000 │ └─────────────────────────────────┴───────────────────────────────────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql ADD_YEARS(, ) ADD_QUARTERS(, ) ADD_MONTHS(, ) ADD_DAYS(, ) ADD_HOURS(, ) ADD_MINUTES(, ) ADD_SECONDS(, ) ``` ## Return Type [Section titled “Return Type”](#return-type) `DATE` or `TIMESTAMP`, depending on the input.
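The examples on this page start from the numbers 18875 and 1630833797. The sketch below reproduces the same arithmetic in plain Python to show where those values come from. It is not the Lakehouse implementation: it assumes the numbers are days and seconds since 1970-01-01 in UTC (which matches the outputs shown), and the helper names, including the month-end clamping in `add_months`, are illustrative assumptions.

```python
import calendar
from datetime import date, datetime, timedelta, timezone

EPOCH = date(1970, 1, 1)


def date_from_days(days):
    """Interpret an integer as days since 1970-01-01 (assumed meaning of to_date(n))."""
    return EPOCH + timedelta(days=days)


def datetime_from_seconds(seconds):
    """Interpret an integer as seconds since the epoch in UTC (assumed)."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


def add_months(d, months):
    """Add calendar months; the day is clamped to the target month's length (assumption)."""
    total = d.year * 12 + (d.month - 1) + months
    year, month0 = divmod(total, 12)
    last = calendar.monthrange(year, month0 + 1)[1]
    return d.replace(year=year, month=month0 + 1, day=min(d.day, last))


start = date_from_days(18875)                   # 2021-09-05
plus_two_years = add_months(start, 24)          # years as 12-month steps
plus_two_quarters = add_months(start, 6)        # quarters as 3-month steps
plus_two_days = start + timedelta(days=2)

stamp = datetime_from_seconds(1630833797)       # 2021-09-05 09:23:17 UTC
plus_two_hours = stamp + timedelta(hours=2)
```

Fixed-length units (days, hours, minutes, seconds) map directly onto `timedelta`, while years, quarters, and months need calendar arithmetic because their lengths vary.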
## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT to_date(18875), add_years(to_date(18875), 2); ┌────────────────┬──────────────────────────────┐ │ to_date(18875) │ add_years(to_date(18875), 2) │ ├────────────────┼──────────────────────────────┤ │ 2021-09-05 │ 2023-09-05 │ └────────────────┴──────────────────────────────┘ SELECT to_date(18875), add_quarters(to_date(18875), 2); ┌────────────────┬─────────────────────────────────┐ │ to_date(18875) │ add_quarters(to_date(18875), 2) │ ├────────────────┼─────────────────────────────────┤ │ 2021-09-05 │ 2022-03-05 │ └────────────────┴─────────────────────────────────┘ SELECT to_date(18875), add_months(to_date(18875), 2); ┌────────────────┬───────────────────────────────┐ │ to_date(18875) │ add_months(to_date(18875), 2) │ ├────────────────┼───────────────────────────────┤ │ 2021-09-05 │ 2021-11-05 │ └────────────────┴───────────────────────────────┘ SELECT to_date(18875), add_days(to_date(18875), 2); ┌────────────────┬─────────────────────────────┐ │ to_date(18875) │ add_days(to_date(18875), 2) │ ├────────────────┼─────────────────────────────┤ │ 2021-09-05 │ 2021-09-07 │ └────────────────┴─────────────────────────────┘ SELECT to_datetime(1630833797), add_hours(to_datetime(1630833797), 2); ┌────────────────────────────┬───────────────────────────────────────┐ │ to_datetime(1630833797) │ add_hours(to_datetime(1630833797), 2) │ ├────────────────────────────┼───────────────────────────────────────┤ │ 2021-09-05 09:23:17.000000 │ 2021-09-05 11:23:17.000000 │ └────────────────────────────┴───────────────────────────────────────┘ SELECT to_datetime(1630833797), add_minutes(to_datetime(1630833797), 2); ┌────────────────────────────┬─────────────────────────────────────────┐ │ to_datetime(1630833797) │ add_minutes(to_datetime(1630833797), 2) │ ├────────────────────────────┼─────────────────────────────────────────┤ │ 2021-09-05 09:23:17.000000 │ 2021-09-05 09:25:17.000000 │ 
└────────────────────────────┴─────────────────────────────────────────┘ SELECT to_datetime(1630833797), add_seconds(to_datetime(1630833797), 2); ┌────────────────────────────┬─────────────────────────────────────────┐ │ to_datetime(1630833797) │ add_seconds(to_datetime(1630833797), 2) │ ├────────────────────────────┼─────────────────────────────────────────┤ │ 2021-09-05 09:23:17.000000 │ 2021-09-05 09:23:19.000000 │ └────────────────────────────┴─────────────────────────────────────────┘ ``` # CURRENT_TIMESTAMP (Lakehouse v1) > CURRENT_TIMESTAMP — alias for the NOW datetime function. Alias for [NOW](../now). # DATE (Lakehouse v1) > DATE — alias for the TO_DATE datetime function. Alias for [TO\_DATE](../to-date). # DATE_ADD (Lakehouse v1) > DATE_ADD — add the time interval or date interval to the provided date or date with time. Add the time interval or date interval to the provided date or date with time (timestamp/datetime). ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.date_add(, , ) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.date_add('YEAR', 1, func.to_date('2018-01-02')) ┌──────────────────────────────────────────────────────┐ │ func.date_add('YEAR', 1, func.to_date('2018-01-02')) │ ├──────────────────────────────────────────────────────┤ │ 2019-01-02 │ └──────────────────────────────────────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql DATE_ADD(, , ) ``` ## Arguments [Section titled “Arguments”](#arguments) | Arguments | Description | | --------------------- | ----------------------------------------------------------------------------------------------------------------- | | `` | Must be of the following values: `YEAR`, `QUARTER`, `MONTH`, `DAY`, `HOUR`, `MINUTE` and `SECOND` | | `` | This is the number of units of time that you want to add. For example, if you want to add 2 days, this will be 2. 
| | `` | A value of `DATE` or `TIMESTAMP` type | ## Return Type [Section titled “Return Type”](#return-type) The function returns a value of the same type as the `` argument. ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT date_add(YEAR, 1, to_date('2018-01-02')); ┌───────────────────────────────────────────────────┐ │ DATE_ADD(YEAR, INTERVAL 1, to_date('2018-01-02')) │ ├───────────────────────────────────────────────────┤ │ 2019-01-02 │ └───────────────────────────────────────────────────┘ ``` # DATE DIFF (Lakehouse v1) > DATE DIFF — PlaidCloud Lakehouse does not provide a date_diff function yet, but it supports direct arithmetic operations on dates and times. PlaidCloud Lakehouse does not provide a `date_diff` function yet, but it supports direct arithmetic operations on dates and times. For example, the expression `TO_DATE(NOW())-2` returns the date from two days ago. Subtracting one date from another returns the number of days between them, as the example below shows: ```sql CREATE TABLE tasks ( task_name VARCHAR(50), start_date DATE, end_date DATE ); INSERT INTO tasks (task_name, start_date, end_date) VALUES ('Task 1', '2023-06-15', '2023-06-20'), ('Task 2', '2023-06-18', '2023-06-25'), ('Task 3', '2023-06-20', '2023-06-23'); SELECT task_name, end_date - start_date AS duration FROM tasks; task_name|duration| ---------+--------+ Task 1 | 5| Task 2 | 7| Task 3 | 3| ``` # DATE_FORMAT (Lakehouse v1) > DATE_FORMAT — alias for the TO_STRING datetime function. Alias for [TO\_STRING](../../02-conversion-functions/to-string). # DATE_PART (Lakehouse v1) > DATE_PART — retrieves the designated portion of a date, time, or timestamp. Retrieves the designated portion of a date, time, or timestamp.
See also: [EXTRACT](../extract) ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.date_part(, ) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.now() | ---------------------+ 2023-10-16 02:09:28.0| func.date_part('day', func.now()) func.date_part('day', func.now())| ---------------------------------+ 16 | ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql DATE_PART( YEAR | QUARTER | MONTH | WEEK | DAY | HOUR | MINUTE | SECOND | DOW | DOY, ) ``` * DOW: Day of Week. * DOY: Day of Year. ## Return Type [Section titled “Return Type”](#return-type) Integer. ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT NOW(); now() | ---------------------+ 2023-10-16 02:09:28.0| SELECT DATE_PART(DAY, NOW()); date_part(day, now())| ---------------------+ 16| -- October 16, 2023, is a Monday SELECT DATE_PART(DOW, NOW()); date_part(dow, now())| ---------------------+ 1| -- October 16, 2023, is the 289th day of the year SELECT DATE_PART(DOY, NOW()); date_part(doy, now())| ---------------------+ 289| SELECT DATE_PART(MONTH, TO_DATE('2022-05-13')); date_part(month, to_date('2022-05-13'))| ---------------------------------------+ 5| ``` # DATE_SUB (Lakehouse v1) > DATE_SUB — subtract the time interval or date interval from the provided date or date with time (timestamp/datetime). Subtract the time interval or date interval from the provided date or date with time (timestamp/datetime).
## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.date_sub(, , ) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.date_sub('YEAR', 1, func.to_date('2018-01-02')) ┌──────────────────────────────────────────────────────┐ │ func.date_sub('YEAR', 1, func.to_date('2018-01-02')) │ ├──────────────────────────────────────────────────────┤ │ 2017-01-02 │ └──────────────────────────────────────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql DATE_SUB(, , ) ``` ## Arguments [Section titled “Arguments”](#arguments) | Arguments | Description | | --------------------- | ----------------------------------------------------------------------------------------------------------------- | | `` | Must be one of the following values: `YEAR`, `QUARTER`, `MONTH`, `DAY`, `HOUR`, `MINUTE`, or `SECOND` | | `` | The number of units of time that you want to subtract. For example, if you want to subtract 2 days, this will be 2. | | `` | A value of `DATE` or `TIMESTAMP` type | ## Return Type [Section titled “Return Type”](#return-type) The function returns a value of the same type as the `` argument. ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT date_sub(YEAR, 1, to_date('2018-01-02')); ┌───────────────────────────────────────────────────┐ │ DATE_SUB(YEAR, INTERVAL 1, to_date('2018-01-02')) │ ├───────────────────────────────────────────────────┤ │ 2017-01-02 │ └───────────────────────────────────────────────────┘ ``` # DATE_TRUNC (Lakehouse v1) > DATE_TRUNC — truncates a date, time, or timestamp value to a specified precision. Truncates a date, time, or timestamp value to a specified precision. For example, if you truncate `2022-07-07` to `MONTH`, the result will be `2022-07-01`; if you truncate `2022-07-07 01:01:01.123456` to `SECOND`, the result will be `2022-07-07 01:01:01.000000`.
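The truncation behavior described above can be sketched in plain Python: every field smaller than the chosen unit is reset to its lowest value. This is an illustration under assumptions, not the Lakehouse implementation; the helper name `date_trunc` and its lookup tables are hypothetical, and the sketch accepts `datetime` values only.

```python
from datetime import datetime

# Fields below the chosen unit are reset to these values, in this order.
_RESET = {"month": 1, "day": 1, "hour": 0, "minute": 0, "second": 0, "microsecond": 0}
# Number of leading _RESET fields that each unit leaves untouched.
_KEEP = {"YEAR": 0, "MONTH": 1, "DAY": 2, "HOUR": 3, "MINUTE": 4, "SECOND": 5}


def date_trunc(unit, ts):
    """Return ts with every field finer than `unit` reset to its minimum."""
    unit = unit.upper()
    if unit == "QUARTER":
        # Quarters start in months 1, 4, 7, and 10.
        first_month = 3 * ((ts.month - 1) // 3) + 1
        return date_trunc("MONTH", ts).replace(month=first_month)
    fields = list(_RESET)
    return ts.replace(**{name: _RESET[name] for name in fields[_KEEP[unit]:]})


month_start = date_trunc("MONTH", datetime(2022, 7, 7))
whole_second = date_trunc("SECOND", datetime(2022, 7, 7, 1, 1, 1, 123456))
```

An unsupported unit raises `KeyError` in this sketch, which mirrors the restriction to the listed precision values.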
## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.date_trunc(, ) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.date_trunc('month', func.to_date('2022-07-07')) ┌──────────────────────────────────────────────────────┐ │ func.date_trunc('month', func.to_date('2022-07-07')) │ ├──────────────────────────────────────────────────────┤ │ 2022-07-01 │ └──────────────────────────────────────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql DATE_TRUNC(, ) ``` ## Arguments [Section titled “Arguments”](#arguments) | Arguments | Description | | --------------------- | ------------------------------------------------------------------------------------------------- | | `` | Must be one of the following values: `YEAR`, `QUARTER`, `MONTH`, `DAY`, `HOUR`, `MINUTE`, or `SECOND` | | `` | A value of `DATE` or `TIMESTAMP` type | ## Return Type [Section titled “Return Type”](#return-type) The function returns a value of the same type as the `` argument. ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT date_trunc(month, to_date('2022-07-07')); ┌──────────────────────────────────────────┐ │ date_trunc(month, to_date('2022-07-07')) │ ├──────────────────────────────────────────┤ │ 2022-07-01 │ └──────────────────────────────────────────┘ ``` # DAY (Lakehouse v1) > DAY — alias for the TO_DAY_OF_MONTH datetime function. Alias for [TO\_DAY\_OF\_MONTH](../to-day-of-month). # EXTRACT (Lakehouse v1) > EXTRACT — retrieves the designated portion of a date, time, or timestamp. Retrieves the designated portion of a date, time, or timestamp. See also: [DATE\_PART](../date-part) ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql EXTRACT( YEAR | QUARTER | MONTH | WEEK | DAY | HOUR | MINUTE | SECOND | DOW | DOY FROM ) ``` * DOW: Day of the Week. * DOY: Day of Year. ## Return Type [Section titled “Return Type”](#return-type) Integer.
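To make the meaning of each part concrete, the following Python sketch computes the same integers from a `datetime`. It is an approximation for illustration: the helper name `extract` is hypothetical, `WEEK` is assumed to be the ISO week number, and `DOW` is assumed to number Monday as 1 through Sunday as 7 (the page only confirms that Monday is 1).

```python
from datetime import datetime

_DIRECT = {"YEAR", "MONTH", "DAY", "HOUR", "MINUTE", "SECOND"}


def extract(part, ts):
    """Return the requested part of ts as an integer."""
    part = part.upper()
    if part in _DIRECT:
        return getattr(ts, part.lower())
    if part == "QUARTER":
        return (ts.month - 1) // 3 + 1
    if part == "WEEK":
        return ts.isocalendar()[1]      # ISO week number (assumption)
    if part == "DOW":
        return ts.isoweekday()          # Monday=1 ... Sunday=7 (assumption for Sunday)
    if part == "DOY":
        return ts.timetuple().tm_yday   # 1-based day of the year
    raise ValueError(f"unsupported part: {part}")


ts = datetime(2023, 10, 16, 2, 9, 28)
day, dow, doy = extract("DAY", ts), extract("DOW", ts), extract("DOY", ts)
```

For the timestamp used in the examples on this page, the sketch gives day 16, day-of-week 1 (Monday), and day-of-year 289.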
## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT NOW(); now() | ---------------------+ 2023-10-16 02:09:28.0| SELECT EXTRACT(DAY FROM NOW()); extract(day from now())| -----------------------+ 16| -- October 16, 2023, is a Monday SELECT EXTRACT(DOW FROM NOW()); extract(dow from now())| -----------------------+ 1| -- October 16, 2023, is the 289th day of the year SELECT EXTRACT(DOY FROM NOW()); extract(doy from now())| -----------------------+ 289| SELECT EXTRACT(MONTH FROM TO_DATE('2022-05-13')); extract(month from to_date('2022-05-13'))| -----------------------------------------+ 5| ``` # LAST_DAY (Lakehouse v1) > LAST_DAY — returns the last day of the specified interval (week, month, quarter, or year). Returns the last day of the specified interval (week, month, quarter, or year) based on the provided date or timestamp. ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.last_day(, ) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.last_day(func.to_date('2024-11-13'), 'month') ┌──────────────────────────────────────────────────────┐ │ func.last_day(func.to_date('2024-11-13'), 'month') │ ├──────────────────────────────────────────────────────┤ │ 2024-11-30 │ └──────────────────────────────────────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql LAST_DAY(, ) ``` | Parameter | Description | | ------------------- | ------------------------------------------------------------------------------------------------------------- | | `` | A DATE or TIMESTAMP value to calculate the last day of the specified interval. | | `` | The interval type for which to find the last day. Accepted values are `week`, `month`, `quarter`, and `year`. | ## Return Type [Section titled “Return Type”](#return-type) Date. 
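The interval logic can be sketched in Python with the standard `calendar` module. This is a hedged illustration rather than the Lakehouse implementation: the helper name `last_day` is hypothetical, and the `week` branch assumes weeks run Monday through Sunday, which this page does not state.

```python
import calendar
from datetime import date, timedelta


def last_day(d, interval):
    """Return the last date of the week, month, quarter, or year containing d."""
    interval = interval.lower()
    if interval == "week":
        # Assumption: the week ends on Sunday (weekday() == 6).
        return d + timedelta(days=6 - d.weekday())
    if interval == "month":
        end_month = d.month
    elif interval == "quarter":
        end_month = 3 * ((d.month - 1) // 3) + 3   # 3, 6, 9, or 12
    elif interval == "year":
        end_month = 12
    else:
        raise ValueError(f"unsupported interval: {interval}")
    # monthrange returns (weekday of first day, number of days in month).
    return date(d.year, end_month, calendar.monthrange(d.year, end_month)[1])


billing_date = last_day(date(2024, 11, 13), "month")
```

Using `monthrange` for the day count handles leap years without special cases.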
## SQL Examples [Section titled “SQL Examples”](#sql-examples) Let’s say you want to determine the billing date, which is always the last day of the month, based on an arbitrary date of a transaction (e.g., 2024-11-13): ```sql SELECT LAST_DAY(to_date('2024-11-13'), month) AS billing_date; ┌──────────────┐ │ billing_date │ ├──────────────┤ │ 2024-11-30 │ └──────────────┘ ``` # MONTH (Lakehouse v1) > MONTH — alias for the TO_MONTH datetime function. Alias for [TO\_MONTH](../to-month). # MONTHS_BETWEEN (Lakehouse v1) > MONTHS_BETWEEN — Returns the number of months between date1 and date2. Returns the number of months between *date1* and *date2*. ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.months_between(, ) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.months_between(func.to_date('2024-03-15'), func.to_date('2024-02-15')) ┌───────────────────────────────────────────────────────────────────────────────┐ │ func.months_between(func.to_date('2024-03-15'), func.to_date('2024-02-15')) │ ├───────────────────────────────────────────────────────────────────────────────┤ │ 1 │ └───────────────────────────────────────────────────────────────────────────────┘ ``` ```python func.months_between(func.to_date('2024-02-15'), func.to_date('2024-03-15')) ┌───────────────────────────────────────────────────────────────────────────────┐ │ func.months_between(func.to_date('2024-02-15'), func.to_date('2024-03-15')) │ ├───────────────────────────────────────────────────────────────────────────────┤ │ -1 │ └───────────────────────────────────────────────────────────────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql MONTHS_BETWEEN( , ) ``` ## Arguments [Section titled “Arguments”](#arguments) *date1* and *date2* can be of DATE type, TIMESTAMP type, or a mix of both. 
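The rules listed under Return Type on this page (sign, whole months when the days match or both dates are month-ends, otherwise a fraction based on a 31-day month) can be sketched in Python as follows. This is an approximation for `DATE` inputs only; the helper names are hypothetical and time components are not modeled.

```python
import calendar
from datetime import date


def _is_month_end(d):
    """True when d is the last day of its month."""
    return d.day == calendar.monthrange(d.year, d.month)[1]


def months_between(date1, date2):
    """Months from date2 to date1; negative when date1 is earlier than date2."""
    whole = (date1.year - date2.year) * 12 + (date1.month - date2.month)
    if date1.day == date2.day or (_is_month_end(date1) and _is_month_end(date2)):
        # Same day of month, or both month-ends: the result is a whole number.
        return float(whole)
    # Otherwise the leftover days count as a fraction of a 31-day month.
    return whole + (date1.day - date2.day) / 31


forward = months_between(date(2024, 3, 15), date(2024, 2, 15))
fractional = months_between(date(2024, 8, 5), date(2024, 1, 1))   # 7 + 4/31
```

The fractional case corresponds to the `2024-08-05` and `2024-01-01` example: seven whole months plus four days out of 31.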
## Return Type [Section titled “Return Type”](#return-type) The function returns a FLOAT value based on the following rules: * If *date1* is earlier than *date2*, the function returns a negative value; otherwise, it returns a positive value. ```sql SELECT MONTHS_BETWEEN('2024-03-15'::DATE, '2024-02-15'::DATE), MONTHS_BETWEEN('2024-02-15'::DATE, '2024-03-15'::DATE); -[ RECORD 1 ]----------------------------------- months_between('2024-03-15'::date, '2024-02-15'::date): 1 months_between('2024-02-15'::date, '2024-03-15'::date): -1 ``` * If *date1* and *date2* fall on the same day of their respective months or both are the last day of their respective months, the result is an integer. Otherwise, the function calculates the fractional portion of the result based on a 31-day month. ```sql SELECT MONTHS_BETWEEN('2024-02-29'::DATE, '2024-01-29'::DATE), MONTHS_BETWEEN('2024-02-29'::DATE, '2024-01-31'::DATE); -[ RECORD 1 ]----------------------------------- months_between('2024-02-29'::date, '2024-01-29'::date): 1 months_between('2024-02-29'::date, '2024-01-31'::date): 1 SELECT MONTHS_BETWEEN('2024-08-05'::DATE, '2024-01-01'::DATE); -[ RECORD 1 ]----------------------------------- months_between('2024-08-05'::date, '2024-01-01'::date): 7.129032258064516 ``` * If *date1* and *date2* are the same date, the function ignores any time components and returns 0. ```sql SELECT MONTHS_BETWEEN('2024-08-05'::DATE, '2024-08-05'::DATE), MONTHS_BETWEEN('2024-08-05 02:00:00'::TIMESTAMP, '2024-08-05 01:00:00'::TIMESTAMP); -[ RECORD 1 ]----------------------------------- months_between('2024-08-05'::date, '2024-08-05'::date): 0 months_between('2024-08-05 02:00:00'::timestamp, '2024-08-05 01:00:00'::timestamp): 0 ``` # NEXT_DAY (Lakehouse v1) > NEXT_DAY — returns the date of the upcoming specified day of the week after the given date or timestamp. Returns the date of the upcoming specified day of the week after the given date or timestamp. 
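As a rough illustration of the weekday arithmetic, the Python sketch below finds the next occurrence of a named day. The helper name `next_day` is hypothetical, and it assumes "after" is strict, so a date that already falls on the target day yields the date one week later; the Lakehouse behavior in that case is not stated here.

```python
from datetime import date, timedelta

_DAYS = ["monday", "tuesday", "wednesday", "thursday", "friday", "saturday", "sunday"]


def next_day(d, target):
    """Return the first date strictly after d that falls on the target weekday."""
    target_index = _DAYS.index(target.lower())   # Monday=0, matching date.weekday()
    delta = (target_index - d.weekday()) % 7
    # A delta of 0 means d is already the target day; move a full week (assumption).
    return d + timedelta(days=delta or 7)


next_monday = next_day(date(2024, 11, 13), "monday")
```

2024-11-13 is a Wednesday, so the sketch advances five days to Monday, 2024-11-18.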
## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.next_day(<date_expression>, ) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.next_day(func.to_date('2024-11-13'), 'monday') ┌──────────────────────────────────────────────────────┐ │ func.next_day(func.to_date('2024-11-13'), 'monday') │ ├──────────────────────────────────────────────────────┤ │ 2024-11-18 │ └──────────────────────────────────────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql NEXT_DAY(, ) ``` | Parameter | Description | | ------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------- | | `` | A `DATE` or `TIMESTAMP` value to calculate the next occurrence of the specified day. | | `` | The target day of the week to find the next occurrence of. Accepted values include `monday`, `tuesday`, `wednesday`, `thursday`, `friday`, `saturday`, and `sunday`. | ## Return Type [Section titled “Return Type”](#return-type) Date. ## SQL Examples [Section titled “SQL Examples”](#sql-examples) To find the next Monday after a specific date, such as 2024-11-13: ```sql SELECT NEXT_DAY(to_date('2024-11-13'), monday) AS next_monday; ┌─────────────┐ │ next_monday │ ├─────────────┤ │ 2024-11-18 │ └─────────────┘ ``` # NOW (Lakehouse v1) > NOW — returns the current date and time. Returns the current date and time.
## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.now() ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.current_timestamp(), func.now() ┌─────────────────────────────────────────────────────────┐ │ func.current_timestamp() │ func.now() │ ├────────────────────────────┼────────────────────────────┤ │ 2024-01-29 04:38:12.584359 │ 2024-01-29 04:38:12.584417 │ └─────────────────────────────────────────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql NOW() ``` ## Return Type [Section titled “Return Type”](#return-type) TIMESTAMP ## Aliases [Section titled “Aliases”](#aliases) * [CURRENT\_TIMESTAMP](../current-timestamp) ## SQL Examples [Section titled “SQL Examples”](#sql-examples) This example returns the current date and time: ```sql SELECT CURRENT_TIMESTAMP(), NOW(); ┌─────────────────────────────────────────────────────────┐ │ current_timestamp() │ now() │ ├────────────────────────────┼────────────────────────────┤ │ 2024-01-29 04:38:12.584359 │ 2024-01-29 04:38:12.584417 │ └─────────────────────────────────────────────────────────┘ ``` # PREVIOUS_DAY (Lakehouse v1) > PREVIOUS_DAY — returns the date of the most recent specified day of the week before the given date or timestamp. Returns the date of the most recent specified day of the week before the given date or timestamp.
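The backward weekday search can be sketched in Python as shown below. The helper name `previous_day` is hypothetical, and the sketch assumes "before" is strict, so a date that already falls on the target day yields the date one week earlier; that edge case is an assumption, not documented behavior.

```python
from datetime import date, timedelta

_DAYS = ["monday", "tuesday", "wednesday", "thursday", "friday", "saturday", "sunday"]


def previous_day(d, target):
    """Return the last date strictly before d that falls on the target weekday."""
    target_index = _DAYS.index(target.lower())   # Monday=0, matching date.weekday()
    delta = (d.weekday() - target_index) % 7
    # A delta of 0 means d is already the target day; step back a full week (assumption).
    return d - timedelta(days=delta or 7)


last_friday = previous_day(date(2024, 11, 13), "friday")
```

2024-11-13 is a Wednesday, so the sketch steps back five days to Friday, 2024-11-08.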
## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.previous_day(<date_expression>, ) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.previous_day(func.to_date('2024-11-13'), 'friday') ┌──────────────────────────────────────────────────────────┐ │ func.previous_day(func.to_date('2024-11-13'), 'friday') │ ├──────────────────────────────────────────────────────────┤ │ 2024-11-08 │ └──────────────────────────────────────────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql PREVIOUS_DAY(, ) ``` | Parameter | Description | | ------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | | `` | A `DATE` or `TIMESTAMP` value to calculate the previous occurrence of the specified day. | | `` | The target day of the week to find the previous occurrence of. Accepted values include `monday`, `tuesday`, `wednesday`, `thursday`, `friday`, `saturday`, and `sunday`. | ## Return Type [Section titled “Return Type”](#return-type) Date. ## SQL Examples [Section titled “SQL Examples”](#sql-examples) If you need to find the previous Friday before a given date, such as 2024-11-13: ```sql SELECT PREVIOUS_DAY(to_date('2024-11-13'), friday) AS last_friday; ┌─────────────┐ │ last_friday │ ├─────────────┤ │ 2024-11-08 │ └─────────────┘ ``` # QUARTER (Lakehouse v1) > QUARTER — alias for the TO_QUARTER datetime function. Alias for [TO\_QUARTER](../to-quarter). # STR_TO_DATE (Lakehouse v1) > STR_TO_DATE — alias for the TO_DATE datetime function. Alias for [TO\_DATE](../to-date). # STR_TO_TIMESTAMP (Lakehouse v1) > STR_TO_TIMESTAMP — alias for the TO_TIMESTAMP datetime function. Alias for [TO\_TIMESTAMP](../to-timestamp).
# SUBTRACT TIME INTERVAL (Lakehouse v1) > SUBTRACT TIME INTERVAL — subtract years, months, days, hours, minutes, or seconds from a date or timestamp value. Subtract time interval from a date or timestamp, return the result of date or timestamp type. ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.subtract_years(, ) func.subtract_quarters(, ) func.subtract_months(, ) func.subtract_days(, ) func.subtract_hours(, ) func.subtract_minutes(, ) func.subtract_seconds(, ) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.to_date(18875), func.subtract_years(func.to_date(18875), 2) ┌─────────────────────────────────┬────────────────────────────────────────────────────────┐ │ func.to_date(18875) │ func.subtract_years(func.to_date(18875), 2) │ ├─────────────────────────────────┼────────────────────────────────────────────────────────┤ │ 2021-09-05 │ 2019-09-05 │ └─────────────────────────────────┴────────────────────────────────────────────────────────┘ func.to_date(18875), func.subtract_quarters(func.to_date(18875), 2) ┌─────────────────────────────────┬────────────────────────────────────────────────────────┐ │ func.to_date(18875) │ subtract_quarters(func.to_date(18875), 2) │ ├─────────────────────────────────┼────────────────────────────────────────────────────────┤ │ 2021-09-05 │ 2021-03-05 │ └─────────────────────────────────┴────────────────────────────────────────────────────────┘ func.to_date(18875), func.subtract_months(func.to_date(18875), 2) ┌─────────────────────────────────┬────────────────────────────────────────────────────────┐ │ func.to_date(18875) │ func.subtract_months(func.to_date(18875), 2) │ ├─────────────────────────────────┼────────────────────────────────────────────────────────┤ │ 2021-09-05 │ 2021-07-05 │ └─────────────────────────────────┴────────────────────────────────────────────────────────┘ func.to_date(18875), func.subtract_days(func.to_date(18875), 2) 
┌─────────────────────────────────┬────────────────────────────────────────────────────────┐ │ func.to_date(18875) │ func.subtract_days(func.to_date(18875), 2) │ ├─────────────────────────────────┼────────────────────────────────────────────────────────┤ │ 2021-09-05 │ 2021-09-03 │ └─────────────────────────────────┴────────────────────────────────────────────────────────┘ func.to_datetime(1630833797), func.subtract_hours(func.to_datetime(1630833797), 2) ┌─────────────────────────────────┬────────────────────────────────────────────────────────┐ │ func.to_datetime(1630833797) │ func.subtract_hours(func.to_datetime(1630833797), 2) │ ├─────────────────────────────────┼────────────────────────────────────────────────────────┤ │ 2021-09-05 09:23:17.000000 │ 2021-09-05 07:23:17.000000 │ └─────────────────────────────────┴────────────────────────────────────────────────────────┘ func.to_datetime(1630833797), func.subtract_minutes(func.to_datetime(1630833797), 2) ┌─────────────────────────────────┬────────────────────────────────────────────────────────┐ │ func.to_datetime(1630833797) │ func.subtract_minutes(func.to_datetime(1630833797), 2) │ ├─────────────────────────────────┼────────────────────────────────────────────────────────┤ │ 2021-09-05 09:23:17.000000 │ 2021-09-05 09:21:17.000000 │ └─────────────────────────────────┴────────────────────────────────────────────────────────┘ func.to_datetime(1630833797), func.subtract_seconds(func.to_datetime(1630833797), 2) ┌─────────────────────────────────┬────────────────────────────────────────────────────────┐ │ func.to_datetime(1630833797) │ func.subtract_seconds(func.to_datetime(1630833797), 2) │ ├─────────────────────────────────┼────────────────────────────────────────────────────────┤ │ 2021-09-05 09:23:17.000000 │ 2021-09-05 09:23:15.000000 │ └─────────────────────────────────┴────────────────────────────────────────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql SUBTRACT_YEARS(, 
) SUBTRACT_QUARTERS(, ) SUBTRACT_MONTHS(, ) SUBTRACT_DAYS(, ) SUBTRACT_HOURS(, ) SUBTRACT_MINUTES(, ) SUBTRACT_SECONDS(, ) ``` ## Return Type [Section titled “Return Type”](#return-type) `DATE`, `TIMESTAMP` depends on the input. ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT to_date(18875), subtract_years(to_date(18875), 2); ┌────────────────┬───────────────────────────────────┐ │ to_date(18875) │ subtract_years(to_date(18875), 2) │ ├────────────────┼───────────────────────────────────┤ │ 2021-09-05 │ 2019-09-05 │ └────────────────┴───────────────────────────────────┘ SELECT to_date(18875), subtract_quarters(to_date(18875), 2); ┌────────────────┬──────────────────────────────────────┐ │ to_date(18875) │ subtract_quarters(to_date(18875), 2) │ ├────────────────┼──────────────────────────────────────┤ │ 2021-09-05 │ 2021-03-05 │ └────────────────┴──────────────────────────────────────┘ SELECT to_date(18875), subtract_months(to_date(18875), 2); ┌────────────────┬────────────────────────────────────┐ │ to_date(18875) │ subtract_months(to_date(18875), 2) │ ├────────────────┼────────────────────────────────────┤ │ 2021-09-05 │ 2021-07-05 │ └────────────────┴────────────────────────────────────┘ SELECT to_date(18875), subtract_days(to_date(18875), 2); ┌────────────────┬──────────────────────────────────┐ │ to_date(18875) │ subtract_days(to_date(18875), 2) │ ├────────────────┼──────────────────────────────────┤ │ 2021-09-05 │ 2021-09-03 │ └────────────────┴──────────────────────────────────┘ SELECT to_datetime(1630833797), subtract_hours(to_datetime(1630833797), 2); ┌────────────────────────────┬────────────────────────────────────────────┐ │ to_datetime(1630833797) │ subtract_hours(to_datetime(1630833797), 2) │ ├────────────────────────────┼────────────────────────────────────────────┤ │ 2021-09-05 09:23:17.000000 │ 2021-09-05 07:23:17.000000 │ └────────────────────────────┴────────────────────────────────────────────┘ SELECT 
to_datetime(1630833797), subtract_minutes(to_datetime(1630833797), 2); ┌────────────────────────────┬──────────────────────────────────────────────┐ │ to_datetime(1630833797) │ subtract_minutes(to_datetime(1630833797), 2) │ ├────────────────────────────┼──────────────────────────────────────────────┤ │ 2021-09-05 09:23:17.000000 │ 2021-09-05 09:21:17.000000 │ └────────────────────────────┴──────────────────────────────────────────────┘ SELECT to_datetime(1630833797), subtract_seconds(to_datetime(1630833797), 2); ┌────────────────────────────┬──────────────────────────────────────────────┐ │ to_datetime(1630833797) │ subtract_seconds(to_datetime(1630833797), 2) │ ├────────────────────────────┼──────────────────────────────────────────────┤ │ 2021-09-05 09:23:17.000000 │ 2021-09-05 09:23:15.000000 │ └────────────────────────────┴──────────────────────────────────────────────┘ ``` # TIME_SLOT (Lakehouse v1) > TIME_SLOT — rounds the time to the half-hour. Rounds the time to the half-hour. ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.time_slot() ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.time_slot('2023-11-12 09:38:18.165575') ┌───────────────────────────────-───-───-──────┐ │ func.time_slot('2023-11-12 09:38:18.165575') │ │ Timestamp │ ├─────────────────────────────────-───-────────┤ │ 2023-11-12 09:30:00 │ └─────────────────────────────────-───-────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql time_slot() ``` ## Arguments [Section titled “Arguments”](#arguments) | Arguments | Description | | --------- | ----------- | | `` | timestamp | ## Return Type [Section titled “Return Type”](#return-type) `TIMESTAMP`, returns in “YYYY-MM-DD hh:mm:ss.ffffff” format. 
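For intuition, the half-hour rounding can be reproduced in plain Python. This is a minimal sketch rather than the Lakehouse implementation: the helper name `time_slot` is hypothetical, and it assumes the function rounds down, as the example input `09:38:18` mapping to `09:30:00` suggests.

```python
from datetime import datetime


def time_slot(ts: datetime) -> datetime:
    """Round ts down to the start of its half-hour slot."""
    # Drop the minutes past the last half-hour boundary, then clear
    # seconds and microseconds.
    return ts.replace(minute=ts.minute - ts.minute % 30, second=0, microsecond=0)


print(time_slot(datetime(2023, 11, 12, 9, 38, 18, 165575)))  # 2023-11-12 09:30:00
```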
## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT time_slot('2023-11-12 09:38:18.165575') ┌─────────────────────────────────────────┐ │ time_slot('2023-11-12 09:38:18.165575') │ │ Timestamp │ ├─────────────────────────────────────────┤ │ 2023-11-12 09:30:00 │ └─────────────────────────────────────────┘ ``` # TIMESTAMP_DIFF (Lakehouse v1) > TIMESTAMP_DIFF — calculates the difference between two timestamps and returns the result as an INTERVAL. Calculates the difference between two timestamps and returns the result as an INTERVAL. ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.timestamp_diff(, ) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.timestamp_diff(func.to_timestamp('2025-02-01'), func.to_timestamp('2025-01-01')) ┌────────────────────────────────────────────────────────────────────────────────────────┐ │ func.timestamp_diff(func.to_timestamp('2025-02-01'), func.to_timestamp('2025-01-01')) │ ├────────────────────────────────────────────────────────────────────────────────────────┤ │ 744:00:00 │ └────────────────────────────────────────────────────────────────────────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql TIMESTAMP_DIFF(, ) ``` ## Return Type [Section titled “Return Type”](#return-type) INTERVAL (formatted as `hours:minutes:seconds`). 
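The `hours:minutes:seconds` form can be checked by hand with a small standard-library sketch. The helper name `timestamp_diff` below is hypothetical, and the handling of negative differences (a leading minus sign) is an assumption, not documented behavior.

```python
from datetime import datetime


def timestamp_diff(later: datetime, earlier: datetime) -> str:
    """Format later - earlier as hours:minutes:seconds."""
    total_seconds = int((later - earlier).total_seconds())
    sign = "-" if total_seconds < 0 else ""  # assumed formatting for negatives
    hours, remainder = divmod(abs(total_seconds), 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{sign}{hours}:{minutes:02d}:{seconds:02d}"


# 31 days between the two dates, so 31 * 24 = 744 hours.
print(timestamp_diff(datetime(2025, 2, 1), datetime(2025, 1, 1)))  # 744:00:00
```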
## SQL Examples

This example shows that the time difference between February 1, 2025, and January 1, 2025, is 744 hours, corresponding to 31 days:

```sql
SELECT TIMESTAMP_DIFF('2025-02-01'::TIMESTAMP, '2025-01-01'::TIMESTAMP);

┌──────────────────────────────────────────────────────────────────┐
│ timestamp_diff('2025-02-01'::TIMESTAMP, '2025-01-01'::TIMESTAMP) │
├──────────────────────────────────────────────────────────────────┤
│ 744:00:00                                                        │
└──────────────────────────────────────────────────────────────────┘
```

# TIMEZONE (Lakehouse v1)

> TIMEZONE — Returns the timezone for the current connection.

Returns the timezone for the current connection. PlaidCloud Lakehouse uses UTC (Coordinated Universal Time) as the default timezone and allows you to change the timezone to your current geographic location. The `timezone` setting takes a time zone name such as `Asia/Shanghai`. See the examples below for details.

## Analyze Syntax

```python
func.timezone()
```

## Analyze Examples

```python
func.timezone()
┌─────────────────────┐
│ timezone            │
├─────────────────────┤
│ UTC                 │
└─────────────────────┘
```

## SQL Syntax

```sql
SELECT TIMEZONE();
```

## SQL Examples

```sql
-- Return the current timezone
SELECT TIMEZONE();

┌─────────────────┐
│ TIMEZONE('UTC') │
├─────────────────┤
│ UTC             │
└─────────────────┘

-- Set the timezone to China Standard Time
SET timezone='Asia/Shanghai';
SELECT TIMEZONE();

┌───────────────────────────┐
│ TIMEZONE('Asia/Shanghai') │
├───────────────────────────┤
│ Asia/Shanghai             │
└───────────────────────────┘
```

# TO_DATE (Lakehouse v1)

> TO_DATE — Converts an expression to a date.
Converts an expression to a date, including: * **Converting a timestamp-format string to a date**: Extracts a date from the given string. * **Converting an integer to a date**: Interprets the integer as the number of days before (for negative numbers) or after (for positive numbers) the Unix epoch (midnight on January 1, 1970). Please note that a Date value ranges from `1000-01-01` to `9999-12-31`. PlaidCloud Lakehouse would return an error if you run “SELECT TO\_DATE(9999999999999999999)”. * **Converting a string to a date using the specified format**: The function takes two arguments, converting the first string to a date based on the format specified in the second string. To customize the date and time format in PlaidCloud Lakehouse, specifiers can be used. For a comprehensive list of supported specifiers, see Formatting Date and Time. Date formats are expressed using the `strftime` specification. see the [quick reference](https://devhints.io/strftime). See also: [TO\_TIMESTAMP](../to-timestamp) ## Strftime Parameters [Section titled “Strftime Parameters”](#strftime-parameters) ### Quick Reference [Section titled “Quick Reference”](#quick-reference) #### Date [Section titled “Date”](#date) | Example | Output | | --------------- | ---------------------- | | `%m/%d/%Y` | `06/05/2013` | | `%A, %B %e, %Y` | `Sunday, June 5, 2013` | | `%b %e %a` | `Jun 5 Sun` | #### Time [Section titled “Time”](#time) | Example | Output | | ---------- | ---------- | | `%H:%M` | `23:05` | | `%I:%M %p` | `11:05 PM` | ### Date [Section titled “Date”](#date-1) | Symbol | Example | Area | | ------ | ------------------------ | ----------- | | `%a` | `Sun` | **Weekday** | | `%A` | `Sunday` | | | `%w` | `0`..`6` *(Sunday is 0)* | | | --- | --- | --- | | `%y` | `13` | **Year** | | `%Y` | `2013` | | | --- | --- | --- | | `%b` | `Jan` | **Month** | | `%B` | `January` | | | `%m` | `01`..`12` | | | --- | --- | --- | | `%d` | `01`..`31` | **Day** | | `%e` | `1`..`31` | | ### Time [Section titled 
“Time”](#time-1) | Symbol | Example | Area | | ------ | ------------ | ------------------- | | `%l` | `1` | Hour | | `%H` | `00`..`23` | 24h Hour | | `%I` | `01`..`12` | 12h Hour | | — | --- | --- | | `%M` | `00`..`59` | Minute | | `%S` | `00`..`60` | Second | | --- | --- | --- | | `%p` | `AM` | AM or PM | | `%Z` | `+08` | Time zone | | --- | --- | --- | | `%j` | `001`..`366` | Day of the year | | `%%` | `%` | Literal % character | ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.to_date('') func.to_date() func.to_date('', '') ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.typeof(func.to_date('2022-01-02')), func.typeof(func.str_to_date('2022-01-02')) ┌───────────────────────────────────────────────────────────────────────────────────────┐ │ func.typeof(func.to_date('2022-01-02')) │ func.typeof(func.str_to_date('2022-01-02')) │ ├─────────────────────────────────────────┼─────────────────────────────────────────────┤ │ DATE │ DATE │ └───────────────────────────────────────────────────────────────────────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql -- Convert a timestamp-format string TO_DATE('') -- Convert an integer TO_DATE() -- Convert a string using the given format TO_DATE('', '') ``` ## Aliases [Section titled “Aliases”](#aliases) * [DATE](../date) * [STR\_TO\_DATE](../str-to-date) ## Return Type [Section titled “Return Type”](#return-type) The function returns a date in the format “YYYY-MM-DD”: ```sql SELECT TYPEOF(TO_DATE('2022-01-02')), TYPEOF(STR_TO_DATE('2022-01-02')); ┌───────────────────────────────────────────────────────────────────┐ │ typeof(to_date('2022-01-02')) │ typeof(str_to_date('2022-01-02')) │ ├───────────────────────────────┼───────────────────────────────────┤ │ DATE │ DATE │ └───────────────────────────────────────────────────────────────────┘ ``` To convert the returned date back to a string, use the 
[DATE\_FORMAT](../date-format) function: ```sql SELECT DATE_FORMAT(TO_DATE('2022-01-02')) AS dt, TYPEOF(dt); ┌─────────────────────────┐ │ dt │ typeof(dt) │ ├────────────┼────────────┤ │ 2022-01-02 │ VARCHAR │ └─────────────────────────┘ ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ### SQL Examples 1: Converting a Timestamp-Format String [Section titled “SQL Examples 1: Converting a Timestamp-Format String”](#sql-examples-1-converting-a-timestamp-format-string) ```sql SELECT TO_DATE('2022-01-02T01:12:00+07:00'), STR_TO_DATE('2022-01-02T01:12:00+07:00'); ┌─────────────────────────────────────────────────────────────────────────────────┐ │ to_date('2022-01-02t01:12:00+07:00') │ str_to_date('2022-01-02t01:12:00+07:00') │ ├──────────────────────────────────────┼──────────────────────────────────────────┤ │ 2022-01-01 │ 2022-01-01 │ └─────────────────────────────────────────────────────────────────────────────────┘ SELECT TO_DATE('2022-01-02'), STR_TO_DATE('2022-01-02'); ┌───────────────────────────────────────────────────┐ │ to_date('2022-01-02') │ str_to_date('2022-01-02') │ ├───────────────────────┼───────────────────────────┤ │ 2022-01-02 │ 2022-01-02 │ └───────────────────────────────────────────────────┘ ``` ### SQL Examples 2: Converting an Integer [Section titled “SQL Examples 2: Converting an Integer”](#sql-examples-2-converting-an-integer) ```sql SELECT TO_DATE(1), STR_TO_DATE(1), TO_DATE(-1), STR_TO_DATE(-1); ┌───────────────────────────────────────────────────────────────────┐ │ to_date(1) │ str_to_date(1) │ to_date((- 1)) │ str_to_date((- 1)) │ │ Date │ Date │ Date │ Date │ ├────────────┼────────────────┼────────────────┼────────────────────┤ │ 1970-01-02 │ 1970-01-02 │ 1969-12-31 │ 1969-12-31 │ └───────────────────────────────────────────────────────────────────┘ ``` ### SQL Examples 3: Converting a String Using the Given Format [Section titled “SQL Examples 3: Converting a String Using the Given 
Format”](#sql-examples-3-converting-a-string-using-the-given-format) ```sql SELECT TO_DATE('12/25/2022','%m/%d/%Y'), STR_TO_DATE('12/25/2022','%m/%d/%Y'); ┌───────────────────────────────────────────────────────────────────────────┐ │ to_date('12/25/2022', '%m/%d/%y') │ str_to_date('12/25/2022', '%m/%d/%y') │ ├───────────────────────────────────┼───────────────────────────────────────┤ │ 2022-12-25 │ 2022-12-25 │ └───────────────────────────────────────────────────────────────────────────┘ ``` # TO_DATETIME (Lakehouse v1) > TO_DATETIME — alias for the TO_TIMESTAMP datetime function. Alias for [TO\_TIMESTAMP](../to-timestamp). # TO_DAY_OF_MONTH (Lakehouse v1) > TO_DAY_OF_MONTH — convert a date or date with time (timestamp/datetime) to a UInt8 number. Convert a date or date with time (timestamp/datetime) to a UInt8 number containing the number of the day of the month (1-31). ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.to_day_of_month() ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.now(), func.to_day_of_month(func.now()), func.day(func.now()) ┌──────────────────────────────────────────────────────────────────────────────────────┐ │ func.now() │ func.to_day_of_month(func.now()) │ func.day(func.now()) │ ├────────────────────────────┼──────────────────────────────────┼──────────────────────┤ │ 2024-03-14 23:35:41.947962 │ 14 │ 14 │ └──────────────────────────────────────────────────────────────────────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql TO_DAY_OF_MONTH() ``` ## Arguments [Section titled “Arguments”](#arguments) | Arguments | Description | | --------- | -------------- | | `` | date/timestamp | ## Aliases [Section titled “Aliases”](#aliases) * [DAY](../day) ## Return Type [Section titled “Return Type”](#return-type) `TINYINT` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT NOW(), TO_DAY_OF_MONTH(NOW()), 
DAY(NOW()); ┌──────────────────────────────────────────────────────────────────┐ │ now() │ to_day_of_month(now()) │ day(now()) │ ├────────────────────────────┼────────────────────────┼────────────┤ │ 2024-03-14 23:35:41.947962 │ 14 │ 14 │ └──────────────────────────────────────────────────────────────────┘ ```

# TO_DAY_OF_WEEK (Lakehouse v1)

> TO_DAY_OF_WEEK — converts a date or date with time (timestamp/datetime) to a UInt8 number.

Converts a date or date with time (timestamp/datetime) to a UInt8 number containing the number of the day of the week (Monday is 1, and Sunday is 7).

## Analyze Syntax

```python
func.to_day_of_week(<expr>)
```

## Analyze Examples

```python
func.to_day_of_week('2023-11-12 09:38:18.165575')
┌────────────────────────────────────────────────────┐
│ func.to_day_of_week('2023-11-12 09:38:18.165575')  │
│ UInt8                                              │
├────────────────────────────────────────────────────┤
│ 7                                                  │
└────────────────────────────────────────────────────┘
```

## SQL Syntax

```sql
TO_DAY_OF_WEEK(<expr>)
```

## Arguments

| Arguments | Description |
| --------- | -------------- |
| `<expr>` | date/timestamp |

## Return Type

`TINYINT`

## SQL Examples

```sql
SELECT to_day_of_week('2023-11-12 09:38:18.165575')

┌──────────────────────────────────────────────┐
│ to_day_of_week('2023-11-12 09:38:18.165575') │
│ UInt8                                        │
├──────────────────────────────────────────────┤
│ 7                                            │
└──────────────────────────────────────────────┘
```

# TO_DAY_OF_YEAR (Lakehouse v1)

> TO_DAY_OF_YEAR — convert a date or date with time (timestamp/datetime) to a UInt16 number.

Convert a date or date with time (timestamp/datetime) to a UInt16 number containing the number of the day of the year (1-366).
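The day-of-week and day-of-year numbering used on these pages matches what the Python standard library reports, which makes for a quick sanity check. The snippet below is only an illustration in plain Python and does not call any PlaidCloud function.

```python
from datetime import datetime

ts = datetime(2023, 11, 12, 9, 38, 18, 165575)

# ISO weekday numbering: Monday is 1 and Sunday is 7.
day_of_week = ts.isoweekday()

# Day of the year, counted from 1 (up to 366 in leap years).
day_of_year = ts.timetuple().tm_yday

print(day_of_week, day_of_year)  # 7 316
```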
## Analyze Syntax

```python
func.to_day_of_year(<expr>)
```

## Analyze Examples

```python
func.to_day_of_year('2023-11-12 09:38:18.165575')
┌────────────────────────────────────────────────────┐
│ func.to_day_of_year('2023-11-12 09:38:18.165575')  │
│ UInt16                                             │
├────────────────────────────────────────────────────┤
│ 316                                                │
└────────────────────────────────────────────────────┘
```

## SQL Syntax

```sql
TO_DAY_OF_YEAR(<expr>)
```

## Arguments

| Arguments | Description |
| --------- | -------------- |
| `<expr>` | date/timestamp |

## Return Type

`SMALLINT`

## SQL Examples

```sql
SELECT to_day_of_year('2023-11-12 09:38:18.165575')

┌──────────────────────────────────────────────┐
│ to_day_of_year('2023-11-12 09:38:18.165575') │
│ UInt16                                       │
├──────────────────────────────────────────────┤
│ 316                                          │
└──────────────────────────────────────────────┘
```

# TO_HOUR (Lakehouse v1)

> TO_HOUR — converts a date with time (timestamp/datetime) to a UInt8 number containing the number of the hour in 24-hour time (0-23).

Converts a date with time (timestamp/datetime) to a UInt8 number containing the number of the hour in 24-hour time (0-23). This function assumes that if clocks are moved ahead, it is by one hour and occurs at 2 a.m., and if clocks are moved back, it is by one hour and occurs at 3 a.m. (which is not always true – even in Moscow the clocks were twice changed at a different time).
## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.to_hour() ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.to_hour('2023-11-12 09:38:18.165575') ┌────────────────────────────────────────────────────┐ │ func.to_hour('2023-11-12 09:38:18.165575') │ │ UInt8 │ ├────────────────────────────────────────────────────┤ │ 9 │ └────────────────────────────────────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql TO_HOUR() ``` ## Arguments [Section titled “Arguments”](#arguments) | Arguments | Description | | --------- | ----------- | | `` | timestamp | ## Return Type [Section titled “Return Type”](#return-type) `TINYINT` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT to_hour('2023-11-12 09:38:18.165575') ┌───────────────────────────────────────┐ │ to_hour('2023-11-12 09:38:18.165575') │ │ UInt8 │ ├───────────────────────────────────────┤ │ 9 │ └───────────────────────────────────────┘ ``` # TO_MINUTE (Lakehouse v1) > TO_MINUTE — converts a date with time (timestamp/datetime) to a UInt8 number containing the number of the minute of the hour (0-59). Converts a date with time (timestamp/datetime) to a UInt8 number containing the number of the minute of the hour (0-59). 
## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.to_minute() ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.to_minute('2023-11-12 09:38:18.165575') ┌────────────────────────────────────────────────────┐ │ func.to_minute('2023-11-12 09:38:18.165575') │ │ UInt8 │ ├────────────────────────────────────────────────────┤ │ 38 │ └────────────────────────────────────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql TO_MINUTE() ``` ## Arguments [Section titled “Arguments”](#arguments) | Arguments | Description | | --------- | ----------- | | `` | timestamp | ## Return Type [Section titled “Return Type”](#return-type) `TINYINT` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT to_minute('2023-11-12 09:38:18.165575') ┌─────────────────────────────────────────┐ │ to_minute('2023-11-12 09:38:18.165575') │ │ UInt8 │ ├─────────────────────────────────────────┤ │ 38 │ └─────────────────────────────────────────┘ ``` # TO_MONDAY (Lakehouse v1) > TO_MONDAY — round down a date or date with time (timestamp/datetime) to the nearest Monday. Round down a date or date with time (timestamp/datetime) to the nearest Monday. Returns the date. 
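Rounding down to Monday amounts to subtracting the number of days elapsed since the start of the week. A minimal plain-Python sketch of that idea follows; the helper name `to_monday` is hypothetical and the code is not the Lakehouse implementation.

```python
from datetime import date, datetime, timedelta


def to_monday(value: datetime) -> date:
    """Return the Monday on or before the given timestamp, as a date."""
    d = value.date()
    # date.weekday() is 0 for Monday, so it is also the number of days to step back.
    return d - timedelta(days=d.weekday())


# 2023-11-12 is a Sunday, so the result is the Monday six days earlier.
print(to_monday(datetime(2023, 11, 12, 9, 38, 18, 165575)))  # 2023-11-06
```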
## Analyze Syntax

```python
func.to_monday(<expr>)
```

## Analyze Examples

```python
func.to_monday('2023-11-12 09:38:18.165575')
┌────────────────────────────────────────────────────┐
│ func.to_monday('2023-11-12 09:38:18.165575')       │
│ Date                                               │
├────────────────────────────────────────────────────┤
│ 2023-11-06                                         │
└────────────────────────────────────────────────────┘
```

## SQL Syntax

```sql
TO_MONDAY(<expr>)
```

## Arguments

| Arguments | Description |
| --------- | -------------- |
| `<expr>` | date/timestamp |

## Return Type

`DATE`, returns date in “YYYY-MM-DD” format.

## SQL Examples

```sql
SELECT to_monday('2023-11-12 09:38:18.165575')

┌─────────────────────────────────────────┐
│ to_monday('2023-11-12 09:38:18.165575') │
│ Date                                    │
├─────────────────────────────────────────┤
│ 2023-11-06                              │
└─────────────────────────────────────────┘
```

# TO_MONTH (Lakehouse v1)

> TO_MONTH — convert a date or date with time (timestamp/datetime) to a UInt8 number containing the month number (1-12).

Convert a date or date with time (timestamp/datetime) to a UInt8 number containing the month number (1-12).
## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.to_month() ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.now(), func.to_month(func.now()), func.month(func.now()) ┌─────────────────────────────────────────────────────────────────────────────────┐ │ func.now() │ func.to_month(func.now()) │ func.month(func.now()) │ ├────────────────────────────┼───────────────────────────┼────────────────────────┤ │ 2024-03-14 23:34:02.161291 │ 3 │ 3 │ └─────────────────────────────────────────────────────────────────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql TO_MONTH() ``` ## Arguments [Section titled “Arguments”](#arguments) | Arguments | Description | | --------- | -------------- | | `` | date/timestamp | ## Aliases [Section titled “Aliases”](#aliases) * [MONTH](../month) ## Return Type [Section titled “Return Type”](#return-type) `TINYINT` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT NOW(), TO_MONTH(NOW()), MONTH(NOW()); ┌─────────────────────────────────────────────────────────────┐ │ now() │ to_month(now()) │ month(now()) │ ├────────────────────────────┼─────────────────┼──────────────┤ │ 2024-03-14 23:34:02.161291 │ 3 │ 3 │ └─────────────────────────────────────────────────────────────┘ ``` # TO_QUARTER (Lakehouse v1) > TO_QUARTER — retrieves the quarter (1, 2, 3, or 4) from a given date or timestamp. Retrieves the quarter (1, 2, 3, or 4) from a given date or timestamp. 
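The quarter follows directly from the month number: months 1 to 3 fall in quarter 1, months 4 to 6 in quarter 2, and so on. A short plain-Python sketch of that arithmetic, with a hypothetical helper name `to_quarter`:

```python
from datetime import date


def to_quarter(d: date) -> int:
    """Map a date to its calendar quarter (1 to 4)."""
    return (d.month - 1) // 3 + 1


print(to_quarter(date(2024, 3, 14)))   # 1
print(to_quarter(date(2024, 10, 1)))   # 4
```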
## Analyze Syntax

```python
func.to_quarter(<expr>)
```

## Analyze Examples

```python
func.now(), func.to_quarter(func.now()), func.quarter(func.now())
┌─────────────────────────────────────────────────────────────────────────────────────┐
│ func.now()                 │ func.to_quarter(func.now()) │ func.quarter(func.now()) │
├────────────────────────────┼─────────────────────────────┼──────────────────────────┤
│ 2024-03-14 23:32:52.743133 │ 1                           │ 1                        │
└─────────────────────────────────────────────────────────────────────────────────────┘
```

## SQL Syntax

```sql
TO_QUARTER(<expr>)
```

## Aliases

* [QUARTER](../quarter)

## Return Type

Integer.

## SQL Examples

```sql
SELECT NOW(), TO_QUARTER(NOW()), QUARTER(NOW());

┌─────────────────────────────────────────────────────────────────┐
│ now()                      │ to_quarter(now()) │ quarter(now()) │
├────────────────────────────┼───────────────────┼────────────────┤
│ 2024-03-14 23:32:52.743133 │ 1                 │ 1              │
└─────────────────────────────────────────────────────────────────┘
```

# TO_SECOND (Lakehouse v1)

> TO_SECOND — converts a date with time (timestamp/datetime) to a UInt8 number containing the number of the second in the minute (0-59).

Converts a date with time (timestamp/datetime) to a UInt8 number containing the number of the second in the minute (0-59).
## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.to_second() ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.to_second('2023-11-12 09:38:18.165575') ┌──────────────────────────────────────────────┐ │ func.to_second('2023-11-12 09:38:18.165575') │ │ UInt8 │ ├──────────────────────────────────────────────┤ │ 18 │ └──────────────────────────────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql TO_SECOND() ``` ## Arguments [Section titled “Arguments”](#arguments) | Arguments | Description | | --------- | ----------- | | `` | timestamp | ## Return Type [Section titled “Return Type”](#return-type) `TINYINT` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT to_second('2023-11-12 09:38:18.165575') ┌─────────────────────────────────────────┐ │ to_second('2023-11-12 09:38:18.165575') │ │ UInt8 │ ├─────────────────────────────────────────┤ │ 18 │ └─────────────────────────────────────────┘ ``` # TO_START_OF_DAY (Lakehouse v1) > TO_START_OF_DAY — rounds down a date with time (timestamp/datetime) to the start of the day. Rounds down a date with time (timestamp/datetime) to the start of the day. 
## Analyze Syntax

```python
func.to_start_of_day(<expr>)
```

## Analyze Examples

```python
func.to_start_of_day('2023-11-12 09:38:18.165575')
┌────────────────────────────────────────────────────┐
│ func.to_start_of_day('2023-11-12 09:38:18.165575') │
│ Timestamp                                          │
├────────────────────────────────────────────────────┤
│ 2023-11-12 00:00:00                                │
└────────────────────────────────────────────────────┘
```

## SQL Syntax

```sql
TO_START_OF_DAY(<expr>)
```

## Arguments

| Arguments | Description |
| --------- | ----------- |
| `<expr>` | timestamp |

## Return Type

`TIMESTAMP`, returns date in “YYYY-MM-DD hh:mm:ss.ffffff” format.

## SQL Examples

```sql
SELECT to_start_of_day('2023-11-12 09:38:18.165575')

┌───────────────────────────────────────────────┐
│ to_start_of_day('2023-11-12 09:38:18.165575') │
│ Timestamp                                     │
├───────────────────────────────────────────────┤
│ 2023-11-12 00:00:00                           │
└───────────────────────────────────────────────┘
```

# TO_START_OF_FIFTEEN_MINUTES (Lakehouse v1)

> TO_START_OF_FIFTEEN_MINUTES — rounds down the date with time (timestamp/datetime) to the start of the fifteen-minute interval.

Rounds down the date with time (timestamp/datetime) to the start of the fifteen-minute interval.
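The fixed-interval rounding functions (five minutes, fifteen minutes, and similar) share one pattern: drop the minutes past the last interval boundary and zero out the smaller fields. The sketch below shows that pattern in plain Python; `to_start_of_interval` is a hypothetical helper written for illustration, not a Lakehouse function.

```python
from datetime import datetime


def to_start_of_interval(ts: datetime, minutes: int) -> datetime:
    """Round ts down to the start of its interval of the given length within the hour."""
    if minutes <= 0 or 60 % minutes != 0:
        raise ValueError("minutes must be a positive divisor of 60")
    return ts.replace(minute=ts.minute - ts.minute % minutes, second=0, microsecond=0)


ts = datetime(2023, 11, 12, 9, 38, 18, 165575)
print(to_start_of_interval(ts, 15))  # 2023-11-12 09:30:00
print(to_start_of_interval(ts, 5))   # 2023-11-12 09:35:00
```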
## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.to_start_of_fifteen_minutes() ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.to_start_of_fifteen_minutes('2023-11-12 09:38:18.165575') ┌────────────────────────────────────────────────────────────────┐ │ func.to_start_of_fifteen_minutes('2023-11-12 09:38:18.165575') │ │ Timestamp │ ├────────────────────────────────────────────────────────────────┤ │ 2023-11-12 09:30:00 │ └────────────────────────────────────────────────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql TO_START_OF_FIFTEEN_MINUTES() ``` ## Arguments [Section titled “Arguments”](#arguments) | Arguments | Description | | --------- | ----------- | | `` | timestamp | ## Return Type [Section titled “Return Type”](#return-type) `TIMESTAMP`, returns date in “YYYY-MM-DD hh:mm:ss.ffffff” format. ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT to_start_of_fifteen_minutes('2023-11-12 09:38:18.165575') ┌───────────────────────────────────────────────────────────┐ │ to_start_of_fifteen_minutes('2023-11-12 09:38:18.165575') │ │ Timestamp │ ├───────────────────────────────────────────────────────────┤ │ 2023-11-12 09:30:00 │ └───────────────────────────────────────────────────────────┘ ``` # TO_START_OF_FIVE_MINUTES (Lakehouse v1) > TO_START_OF_FIVE_MINUTES — rounds down a date with time (timestamp/datetime) to the start of the five-minute interval. Rounds down a date with time (timestamp/datetime) to the start of the five-minute interval. 
## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.to_start_of_five_minutes() ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.to_start_of_five_minutes('2023-11-12 09:38:18.165575') ┌────────────────────────────────────────────────────────────────┐ │ func.to_start_of_five_minutes('2023-11-12 09:38:18.165575') │ │ Timestamp │ ├────────────────────────────────────────────────────────────────┤ │ 2023-11-12 09:35:00 │ └────────────────────────────────────────────────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql TO_START_OF_FIVE_MINUTES() ``` ## Arguments [Section titled “Arguments”](#arguments) | Arguments | Description | | --------- | ----------- | | `` | timestamp | ## Return Type [Section titled “Return Type”](#return-type) `TIMESTAMP`, returns date in “YYYY-MM-DD hh:mm:ss.ffffff” format. ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT to_start_of_five_minutes('2023-11-12 09:38:18.165575') ┌────────────────────────────────────────────────────────┐ │ to_start_of_five_minutes('2023-11-12 09:38:18.165575') │ │ Timestamp │ ├────────────────────────────────────────────────────────┤ │ 2023-11-12 09:35:00 │ └────────────────────────────────────────────────────────┘ ``` # TO_START_OF_HOUR (Lakehouse v1) > TO_START_OF_HOUR — rounds down a date with time (timestamp/datetime) to the start of the hour. Rounds down a date with time (timestamp/datetime) to the start of the hour. 
## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.to_start_of_hour() ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.to_start_of_hour('2023-11-12 09:38:18.165575') ┌────────────────────────────────────────────────────────────────┐ │ func.to_start_of_hour('2023-11-12 09:38:18.165575') │ │ Timestamp │ ├────────────────────────────────────────────────────────────────┤ │ 2023-11-12 09:00:00 │ └────────────────────────────────────────────────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql TO_START_OF_HOUR() ``` ## Arguments [Section titled “Arguments”](#arguments) | Arguments | Description | | --------- | ----------- | | `` | timestamp | ## Return Type [Section titled “Return Type”](#return-type) `TIMESTAMP`, returns date in “YYYY-MM-DD hh:mm:ss.ffffff” format. ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT to_start_of_hour('2023-11-12 09:38:18.165575') ┌────────────────────────────────────────────────┐ │ to_start_of_hour('2023-11-12 09:38:18.165575') │ │ Timestamp │ ├────────────────────────────────────────────────┤ │ 2023-11-12 09:00:00 │ └────────────────────────────────────────────────┘ ``` # TO_START_OF_ISO_YEAR (Lakehouse v1) > TO_START_OF_ISO_YEAR — returns the first day of the ISO year for a date or a date with time. Returns the first day of the ISO year for a date or a date with time (timestamp/datetime). 
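For readers unfamiliar with ISO years, the following Python sketch shows how the first day of an ISO year can be derived with the standard library. The function name `start_of_iso_year` is an assumption made for illustration; it is not a PlaidCloud API.

```python
from datetime import date


def start_of_iso_year(d: date) -> date:
    """Return the Monday that starts ISO week 1 of the ISO year containing d."""
    iso_year = d.isocalendar()[0]
    # ISO week 1, weekday 1 (Monday) is the first day of the ISO year.
    return date.fromisocalendar(iso_year, 1, 1)


print(start_of_iso_year(date(2023, 11, 12)))  # 2023-01-02
```

Note that a date early in January can belong to the previous ISO year: 2023-01-01 is a Sunday, so it falls in ISO year 2022, which starts on 2022-01-03.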
## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.to_start_of_iso_year() ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.to_start_of_iso_year('2023-11-12 09:38:18.165575') ┌────────────────────────────────────────────────────────────────┐ │ func.to_start_of_iso_year('2023-11-12 09:38:18.165575') │ │ Date │ ├────────────────────────────────────────────────────────────────┤ │ 2023-01-02 │ └────────────────────────────────────────────────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql TO_START_OF_ISO_YEAR() ``` ## Arguments [Section titled “Arguments”](#arguments) | Arguments | Description | | --------- | -------------- | | `` | date/timestamp | ## Return Type [Section titled “Return Type”](#return-type) `DATE`, returns date in “YYYY-MM-DD” format. ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT to_start_of_iso_year('2023-11-12 09:38:18.165575') ┌────────────────────────────────────────────────────┐ │ to_start_of_iso_year('2023-11-12 09:38:18.165575') │ │ Date │ ├────────────────────────────────────────────────────┤ │ 2023-01-02 │ └────────────────────────────────────────────────────┘ ``` # TO_START_OF_MINUTE (Lakehouse v1) > TO_START_OF_MINUTE — rounds down a date with time (timestamp/datetime) to the start of the minute. Rounds down a date with time (timestamp/datetime) to the start of the minute. 
## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.to_start_of_minute() ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.to_start_of_minute('2023-11-12 09:38:18.165575') ┌────────────────────────────────────────────────────────────────┐ │ func.to_start_of_minute('2023-11-12 09:38:18.165575') │ │ Timestamp │ ├────────────────────────────────────────────────────────────────┤ │ 2023-11-12 09:38:00 │ └────────────────────────────────────────────────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql TO_START_OF_MINUTE( ) ``` ## Arguments [Section titled “Arguments”](#arguments) | Arguments | Description | | --------- | ----------- | | `` | timestamp | ## Return Type [Section titled “Return Type”](#return-type) `TIMESTAMP`, returns date in “YYYY-MM-DD hh:mm:ss.ffffff” format. ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT to_start_of_minute('2023-11-12 09:38:18.165575') ┌──────────────────────────────────────────────────┐ │ to_start_of_minute('2023-11-12 09:38:18.165575') │ │ Timestamp │ ├──────────────────────────────────────────────────┤ │ 2023-11-12 09:38:00 │ └──────────────────────────────────────────────────┘ ``` # TO_START_OF_MONTH (Lakehouse v1) > TO_START_OF_MONTH — rounds down a date or date with time (timestamp/datetime) to the first day. Rounds down a date or date with time (timestamp/datetime) to the first day of the month. Returns the date. 
## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.to_start_of_month() ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.to_start_of_month('2023-11-12 09:38:18.165575') ┌────────────────────────────────────────────────────────────────┐ │ func.to_start_of_month('2023-11-12 09:38:18.165575') │ │ Date │ ├────────────────────────────────────────────────────────────────┤ │ 2023-11-01 │ └────────────────────────────────────────────────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql TO_START_OF_MONTH() ``` ## Arguments [Section titled “Arguments”](#arguments) | Arguments | Description | | --------- | -------------- | | `` | date/timestamp | ## Return Type [Section titled “Return Type”](#return-type) `DATE`, returns date in “YYYY-MM-DD” format. ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT to_start_of_month('2023-11-12 09:38:18.165575') ┌─────────────────────────────────────────────────┐ │ to_start_of_month('2023-11-12 09:38:18.165575') │ │ Date │ ├─────────────────────────────────────────────────┤ │ 2023-11-01 │ └─────────────────────────────────────────────────┘ ``` # TO_START_OF_QUARTER (Lakehouse v1) > TO_START_OF_QUARTER — rounds down a date or date with time (timestamp/datetime) to the first. Rounds down a date or date with time (timestamp/datetime) to the first day of the quarter. The first day of the quarter is either 1 January, 1 April, 1 July, or 1 October. Returns the date. 
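As a rough illustration of the quarter rule, the Python sketch below maps any date to 1 January, 1 April, 1 July, or 1 October. The helper `start_of_quarter` is hypothetical and only mirrors the description above.

```python
from datetime import date


def start_of_quarter(d: date) -> date:
    """Return the first day of the calendar quarter containing d."""
    # Months 1-3 map to 1, 4-6 to 4, 7-9 to 7, 10-12 to 10.
    first_month = 3 * ((d.month - 1) // 3) + 1
    return date(d.year, first_month, 1)


print(start_of_quarter(date(2023, 11, 12)))  # 2023-10-01
```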
## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.to_start_of_quarter() ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.to_start_of_quarter('2023-11-12 09:38:18.165575') ┌────────────────────────────────────────────────────────────────┐ │ func.to_start_of_quarter('2023-11-12 09:38:18.165575') │ │ Date │ ├────────────────────────────────────────────────────────────────┤ │ 2023-10-01 │ └────────────────────────────────────────────────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql TO_START_OF_QUARTER() ``` ## Arguments [Section titled “Arguments”](#arguments) | Arguments | Description | | --------- | -------------- | | `` | date/timestamp | ## Return Type [Section titled “Return Type”](#return-type) `DATE`, returns date in “YYYY-MM-DD” format. ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT to_start_of_quarter('2023-11-12 09:38:18.165575') ┌───────────────────────────────────────────────────┐ │ to_start_of_quarter('2023-11-12 09:38:18.165575') │ │ Date │ ├───────────────────────────────────────────────────┤ │ 2023-10-01 │ └───────────────────────────────────────────────────┘ ``` # TO_START_OF_SECOND (Lakehouse v1) > TO_START_OF_SECOND — rounds down a date with time (timestamp/datetime) to the start of the second. Rounds down a date with time (timestamp/datetime) to the start of the second. 
## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.to_start_of_second() ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.to_start_of_second('2023-11-12 09:38:18.165575') ┌────────────────────────────────────────────────────────────────┐ │ func.to_start_of_second('2023-11-12 09:38:18.165575') │ │ Timestamp │ ├────────────────────────────────────────────────────────────────┤ │ 2023-11-12 09:38:18 │ └────────────────────────────────────────────────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql TO_START_OF_SECOND() ``` ## Arguments [Section titled “Arguments”](#arguments) | Arguments | Description | | --------- | ----------- | | `` | timestamp | ## Return Type [Section titled “Return Type”](#return-type) `TIMESTAMP`, returns date in “YYYY-MM-DD hh:mm:ss.ffffff” format. ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT to_start_of_second('2023-11-12 09:38:18.165575') ┌──────────────────────────────────────────────────┐ │ to_start_of_second('2023-11-12 09:38:18.165575') │ │ Timestamp │ ├──────────────────────────────────────────────────┤ │ 2023-11-12 09:38:18 │ └──────────────────────────────────────────────────┘ ``` # TO_START_OF_TEN_MINUTES (Lakehouse v1) > TO_START_OF_TEN_MINUTES — rounds down a date with time (timestamp/datetime) to the start of the ten-minute interval. Rounds down a date with time (timestamp/datetime) to the start of the ten-minute interval. 
## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.to_start_of_ten_minutes() ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.to_start_of_ten_minutes('2023-11-12 09:38:18.165575') ┌────────────────────────────────────────────────────────────────┐ │ func.to_start_of_ten_minutes('2023-11-12 09:38:18.165575') │ │ Timestamp │ ├────────────────────────────────────────────────────────────────┤ │ 2023-11-12 09:30:00 │ └────────────────────────────────────────────────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql TO_START_OF_TEN_MINUTES() ``` ## Arguments [Section titled “Arguments”](#arguments) | Arguments | Description | | --------- | ----------- | | `` | timestamp | ## Return Type [Section titled “Return Type”](#return-type) `TIMESTAMP`, returns date in “YYYY-MM-DD hh:mm:ss.ffffff” format. ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT to_start_of_ten_minutes('2023-11-12 09:38:18.165575') ┌───────────────────────────────────────────────────────┐ │ to_start_of_ten_minutes('2023-11-12 09:38:18.165575') │ │ Timestamp │ ├───────────────────────────────────────────────────────┤ │ 2023-11-12 09:30:00 │ └───────────────────────────────────────────────────────┘ ``` # TO_START_OF_WEEK (Lakehouse v1) > TO_START_OF_WEEK — returns the first day of the week for a date or a date with time. Returns the first day of the week for a date or a date with time (timestamp/datetime). The first day of a week can be Sunday or Monday, which is specified by the argument `mode`. 
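The effect of `mode` can be sketched in plain Python. The function below is a hypothetical stand-in written for illustration, assuming that mode 0 means the week starts on Sunday and any other value means it starts on Monday, as the argument description states.

```python
from datetime import date, timedelta


def start_of_week(d: date, mode: int = 0) -> date:
    """Return the first day of the week containing d.

    mode == 0: weeks start on Sunday; any other value: weeks start on Monday.
    """
    if mode == 0:
        # date.weekday() is Monday=0 ... Sunday=6, so shift by one for Sunday starts.
        days_back = (d.weekday() + 1) % 7
    else:
        days_back = d.weekday()
    return d - timedelta(days=days_back)


print(start_of_week(date(2023, 11, 12)))     # 2023-11-12 (a Sunday)
print(start_of_week(date(2023, 11, 12), 1))  # 2023-11-06 (the preceding Monday)
```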
## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.to_start_of_week() ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.to_start_of_week('2023-11-12 09:38:18.165575') ┌────────────────────────────────────────────────────────────────┐ │ func.to_start_of_week('2023-11-12 09:38:18.165575') │ │ Date │ ├────────────────────────────────────────────────────────────────┤ │ 2023-11-12 │ └────────────────────────────────────────────────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql TO_START_OF_WEEK( [, mode]) ``` ## Arguments [Section titled “Arguments”](#arguments) | Arguments | Description | | --------- | --------------------------------------------------------------------------------------------------- | | `` | date/timestamp | | `[mode]` | Optional. If it is 0, the result is Sunday, otherwise, the result is Monday. The default value is 0 | ## Return Type [Section titled “Return Type”](#return-type) `DATE`, returns date in “YYYY-MM-DD” format. ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT to_start_of_week('2023-11-12 09:38:18.165575') ┌────────────────────────────────────────────────┐ │ to_start_of_week('2023-11-12 09:38:18.165575') │ │ Date │ ├────────────────────────────────────────────────┤ │ 2023-11-12 │ └────────────────────────────────────────────────┘ ``` # TO_START_OF_YEAR (Lakehouse v1) > TO_START_OF_YEAR — returns the first day of the year for a date or a date with time. Returns the first day of the year for a date or a date with time (timestamp/datetime). 
## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.to_start_of_year() ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.to_start_of_year('2023-11-12 09:38:18.165575') ┌────────────────────────────────────────────────────────────────┐ │ func.to_start_of_year('2023-11-12 09:38:18.165575') │ │ Date │ ├────────────────────────────────────────────────────────────────┤ │ 2023-01-01 │ └────────────────────────────────────────────────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql TO_START_OF_YEAR() ``` ## Arguments [Section titled “Arguments”](#arguments) | Arguments | Description | | --------- | -------------- | | `` | date/timestamp | ## Return Type [Section titled “Return Type”](#return-type) `DATE`, returns date in “YYYY-MM-DD” format. ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT to_start_of_year('2023-11-12 09:38:18.165575') ┌────────────────────────────────────────────────┐ │ to_start_of_year('2023-11-12 09:38:18.165575') │ │ Date │ ├────────────────────────────────────────────────┤ │ 2023-01-01 │ └────────────────────────────────────────────────┘ ``` # TO_TIMESTAMP (Lakehouse v1) > TO_TIMESTAMP — converts an expression to a date with time (timestamp/datetime). TO\_TIMESTAMP converts an expression to a date with time (timestamp/datetime). The function can accept one or two arguments. If given one argument that is a string, the function extracts a timestamp from the string. If the argument is an integer, the function interprets the integer as the number of seconds, milliseconds, or microseconds before (for a negative number) or after (for a positive number) the Unix epoch (midnight on January 1, 1970): * If the integer is less than 31,536,000,000, it is treated as seconds. * If the integer is greater than or equal to 31,536,000,000 and less than 31,536,000,000,000, it is treated as milliseconds.
* If the integer is greater than or equal to 31,536,000,000,000, it is treated as microseconds. If given two arguments, the function converts the first string to a timestamp based on the format specified in the second string. To customize the format of date and time in PlaidCloud Lakehouse, use format specifiers, which define the desired format for date and time values. For a comprehensive list of supported specifiers, see Formatting Date and Time. * The output timestamp reflects your PlaidCloud Lakehouse timezone. * The timezone information must be included in the string you want to convert; otherwise NULL is returned. Date formats are expressed using the `strftime` specification. See the [quick reference](https://devhints.io/strftime). See also: [TO\_DATE](../to-date) ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.to_timestamp() ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.to_timestamp('2022-01-02T03:25:02.868894-07:00') ┌────────────────────────────────────────────────────────────────┐ │ func.to_timestamp('2022-01-02T03:25:02.868894-07:00') │ │ Timestamp │ ├────────────────────────────────────────────────────────────────┤ │ 2022-01-02 10:25:02.868894 │ └────────────────────────────────────────────────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql -- Convert a string or integer to a timestamp TO_TIMESTAMP() -- Convert a string to a timestamp using the given pattern TO_TIMESTAMP() ``` ## Return Type [Section titled “Return Type”](#return-type) Returns a timestamp in the format “YYYY-MM-DD hh:mm:ss.ffffff”. If the given string matches this format but does not have the time part, it is automatically extended to this pattern. The padding value is 0.
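The integer thresholds described for TO\_TIMESTAMP can be sketched in Python as follows. This is an illustration only, not the Lakehouse implementation: the function name `int_to_timestamp` is hypothetical, the rule is applied exactly as worded above (so every negative value is treated as seconds, which is an assumption), and the result is a naive UTC value rather than one adjusted to a session timezone.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)
MILLISECOND_THRESHOLD = 31_536_000_000      # below this: seconds
MICROSECOND_THRESHOLD = 31_536_000_000_000  # at or above this: microseconds


def int_to_timestamp(value: int) -> datetime:
    """Interpret value as seconds, milliseconds, or microseconds since the epoch."""
    if value < MILLISECOND_THRESHOLD:
        return EPOCH + timedelta(seconds=value)
    if value < MICROSECOND_THRESHOLD:
        return EPOCH + timedelta(milliseconds=value)
    return EPOCH + timedelta(microseconds=value)


print(int_to_timestamp(1))                  # 1970-01-01 00:00:01
print(int_to_timestamp(-1))                 # 1969-12-31 23:59:59
print(int_to_timestamp(1_699_781_898_000))  # 2023-11-12 09:38:18
```

The thresholds make the unit unambiguous for realistic values: 31,536,000,000 seconds is roughly a thousand years, so a larger integer is far more plausibly a millisecond or microsecond count.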
## Aliases [Section titled “Aliases”](#aliases) * [TO\_DATETIME](../to-datetime) * [STR\_TO\_TIMESTAMP](../str-to-timestamp) ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ### Given a String Argument [Section titled “Given a String Argument”](#given-a-string-argument) ```sql SELECT TO_TIMESTAMP('2022-01-02T03:25:02.868894-07:00'); --- 2022-01-02 10:25:02.868894 SELECT TO_TIMESTAMP('2022-01-02 02:00:11'); --- 2022-01-02 02:00:11.000000 SELECT TO_TIMESTAMP('2022-01-02T02:00:22'); --- 2022-01-02 02:00:22.000000 SELECT TO_TIMESTAMP('2022-01-02T01:12:00-07:00'); --- 2022-01-02 08:12:00.000000 SELECT TO_TIMESTAMP('2022-01-02T01'); --- 2022-01-02 01:00:00.000000 ``` ### Given an Integer Argument [Section titled “Given an Integer Argument”](#given-an-integer-argument) ```sql SELECT TO_TIMESTAMP(1); --- 1970-01-01 00:00:01.000000 SELECT TO_TIMESTAMP(-1); --- 1969-12-31 23:59:59.000000 ``` Note: A Timestamp value ranges from 1000-01-01 00:00:00.000000 to 9999-12-31 23:59:59.999999. PlaidCloud Lakehouse returns an error if you run the following statement: `SELECT TO_TIMESTAMP(9999999999999999999);` ### Given Two Arguments [Section titled “Given Two Arguments”](#given-two-arguments) ```sql SET GLOBAL timezone ='Japan'; SELECT TO_TIMESTAMP('2022 年 2 月 4 日、8 時 58 分 59 秒、タイムゾーン:+0900', '%Y年%m月%d日、%H時%M分%S秒、タイムゾーン:%z'); --- 2022-02-04 08:58:59.000000 SET GLOBAL timezone ='America/Toronto'; SELECT TO_TIMESTAMP('2022 年 2 月 4 日、8 時 58 分 59 秒、タイムゾーン:+0900', '%Y年%m月%d日、%H時%M分%S秒、タイムゾーン:%z'); --- 2022-02-03 18:58:59.000000 ``` # TO_UNIX_TIMESTAMP (Lakehouse v1) > TO_UNIX_TIMESTAMP — converts a timestamp in a date/time format to a Unix timestamp format. Converts a timestamp in a date/time format to a Unix timestamp format. A Unix timestamp represents the number of seconds that have elapsed since January 1, 1970, at 00:00:00 UTC.
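The definition above can be checked with a short Python sketch. The helper name `to_unix_timestamp` is hypothetical here, and the sketch assumes the input is a naive timestamp expressed in UTC; fractional seconds are dropped.

```python
import calendar
from datetime import datetime


def to_unix_timestamp(ts: datetime) -> int:
    """Whole seconds elapsed since 1970-01-01 00:00:00 UTC, treating ts as UTC."""
    return calendar.timegm(ts.timetuple())


print(to_unix_timestamp(datetime(2023, 11, 12, 9, 38, 18, 165575)))  # 1699781898
```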
## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.to_unix_timestamp() ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.to_unix_timestamp('2023-11-12 09:38:18.165575') ┌────────────────────────────────────────────────────────────────┐ │ func.to_unix_timestamp('2023-11-12 09:38:18.165575') │ │ UInt32 │ ├────────────────────────────────────────────────────────────────┤ │ 1699781898 │ └────────────────────────────────────────────────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql TO_UNIX_TIMESTAMP() ``` ## Arguments [Section titled “Arguments”](#arguments) | Arguments | Description | | --------- | ----------- | | `` | Timestamp | For more information about the timestamp data type, see Date & Time. ## Return Type [Section titled “Return Type”](#return-type) `BIGINT` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT to_unix_timestamp('2023-11-12 09:38:18.165575') ┌─────────────────────────────────────────────────┐ │ to_unix_timestamp('2023-11-12 09:38:18.165575') │ │ UInt32 │ ├─────────────────────────────────────────────────┤ │ 1699781898 │ └─────────────────────────────────────────────────┘ ``` # TO_WEEK_OF_YEAR (Lakehouse v1) > TO_WEEK_OF_YEAR — calculates the week number within a year for a given date. Calculates the week number within a year for a given date. ISO week numbering works as follows: January 4th is always considered part of the first week. If January 1st is a Thursday, then the week that spans from Monday, December 29th, to Sunday, January 4th, is designated as ISO week 1. If January 1st falls on a Friday, then the week that goes from Monday, January 4th, to Sunday, January 10th, is marked as ISO week 1. 
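The ISO week rule above matches what Python's standard library computes, so it can be explored with a brief sketch. The wrapper name `week_of_year` is an assumption for illustration and is not a PlaidCloud function.

```python
from datetime import date


def week_of_year(d: date) -> int:
    """Return the ISO 8601 week number (1 to 53) for d."""
    return d.isocalendar()[1]


# 1 January 2026 is a Thursday, so ISO week 1 runs from Monday 29 December 2025.
print(week_of_year(date(2025, 12, 29)))  # 1
# 1 January 2027 is a Friday, so it still belongs to the last ISO week of 2026.
print(week_of_year(date(2027, 1, 1)))    # 53
print(week_of_year(date(2027, 1, 4)))    # 1
```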
## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.to_week_of_year() ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.now(), func.to_week_of_year(func.now()), func.week(func.now()), func.weekofyear(func.now()) ┌─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐ │ func.now() │ func.to_week_of_year(func.now()) │ func.week(func.now()) │ func.weekofyear(func.now()) │ ├────────────────────────────┼──────────────────────────────────┼───────────────────────┼─────────────────────────────┤ │ 2024-03-14 23:30:04.011624 │ 11 │ 11 │ 11 │ └─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql TO_WEEK_OF_YEAR() ``` ## Arguments [Section titled “Arguments”](#arguments) | Arguments | Description | | --------- | -------------- | | `` | date/timestamp | ## Aliases [Section titled “Aliases”](#aliases) * [WEEK](../week) * [WEEKOFYEAR](../weekofyear) ## Return Type [Section titled “Return Type”](#return-type) Returns an integer that represents the week number within a year, with numbering ranging from 1 to 53. ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT NOW(), TO_WEEK_OF_YEAR(NOW()), WEEK(NOW()), WEEKOFYEAR(NOW()); ┌───────────────────────────────────────────────────────────────────────────────────────┐ │ now() │ to_week_of_year(now()) │ week(now()) │ weekofyear(now()) │ ├────────────────────────────┼────────────────────────┼─────────────┼───────────────────┤ │ 2024-03-14 23:30:04.011624 │ 11 │ 11 │ 11 │ └───────────────────────────────────────────────────────────────────────────────────────┘ ``` # TO_YEAR (Lakehouse v1) > TO_YEAR — converts a date or date with time (timestamp/datetime) to a UInt16 number containing. 
Converts a date or date with time (timestamp/datetime) to a UInt16 number containing the year number (AD). ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.to_year() ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.now(), func.to_year(func.now()), func.year(func.now()) ┌───────────────────────────────────────────────────────────────────────────────┐ │ func.now() │ func.to_year(func.now()) │ func.year(func.now()) │ ├────────────────────────────┼──────────────────────────┼───────────────────────┤ │ 2024-03-14 23:37:03.895166 │ 2024 │ 2024 │ └───────────────────────────────────────────────────────────────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql TO_YEAR() ``` ## Arguments [Section titled “Arguments”](#arguments) | Arguments | Description | | --------- | -------------- | | `` | date/timestamp | ## Aliases [Section titled “Aliases”](#aliases) * [YEAR](../year) ## Return Type [Section titled “Return Type”](#return-type) `SMALLINT` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT NOW(), TO_YEAR(NOW()), YEAR(NOW()); ┌───────────────────────────────────────────────────────────┐ │ now() │ to_year(now()) │ year(now()) │ ├────────────────────────────┼────────────────┼─────────────┤ │ 2024-03-14 23:37:03.895166 │ 2024 │ 2024 │ └───────────────────────────────────────────────────────────┘ ``` # TO_YYYYMM (Lakehouse v1) > TO_YYYYMM — converts a date or date with time (timestamp/datetime) to a UInt32 number. Converts a date or date with time (timestamp/datetime) to a UInt32 number containing the year and month number. 
## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.to_yyyymm() ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.to_yyyymm('2023-11-12 09:38:18.165575') ┌──────────────────────────────────────────────┐ │ func.to_yyyymm('2023-11-12 09:38:18.165575') │ │ UInt32 │ ├──────────────────────────────────────────────┤ │ 202311 │ └──────────────────────────────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql TO_YYYYMM() ``` ## Arguments [Section titled “Arguments”](#arguments) | Arguments | Description | | --------- | -------------- | | `` | date/timestamp | ## Return Type [Section titled “Return Type”](#return-type) `INT`, returns in `YYYYMM` format. ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT to_yyyymm('2023-11-12 09:38:18.165575') ┌─────────────────────────────────────────┐ │ to_yyyymm('2023-11-12 09:38:18.165575') │ │ UInt32 │ ├─────────────────────────────────────────┤ │ 202311 │ └─────────────────────────────────────────┘ ``` # TO_YYYYMMDD (Lakehouse v1) > TO_YYYYMMDD — converts a date or date with time (timestamp/datetime) to a UInt32 number. Converts a date or date with time (timestamp/datetime) to a UInt32 number containing the year, month, and day (YYYY \* 10000 + MM \* 100 + DD).
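The packing formula can be reproduced directly in Python. This is a minimal sketch with a hypothetical helper name, shown only to make the arithmetic concrete.

```python
from datetime import date


def to_yyyymmdd(d: date) -> int:
    """Pack a date into an integer as YYYY * 10000 + MM * 100 + DD."""
    return d.year * 10000 + d.month * 100 + d.day


print(to_yyyymmdd(date(2023, 11, 12)))  # 20231112
```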
## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.to_yyyymmdd() ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.to_yyyymmdd('2023-11-12 09:38:18.165575') ┌────────────────────────────────────────────────┐ │ func.to_yyyymmdd('2023-11-12 09:38:18.165575') │ │ UInt32 │ ├────────────────────────────────────────────────┤ │ 20231112 │ └────────────────────────────────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql TO_YYYYMMDD() ``` ## Arguments [Section titled “Arguments”](#arguments) | Arguments | Description | | --------- | ------------- | | `` | date/datetime | ## Return Type [Section titled “Return Type”](#return-type) `INT`, returns in `YYYYMMDD` format. ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT to_yyyymmdd('2023-11-12 09:38:18.165575') ┌───────────────────────────────────────────┐ │ to_yyyymmdd('2023-11-12 09:38:18.165575') │ │ UInt32 │ ├───────────────────────────────────────────┤ │ 20231112 │ └───────────────────────────────────────────┘ ``` # TO_YYYYMMDDHH (Lakehouse v1) > TO_YYYYMMDDHH — formats a given date or timestamp into a string representation in the format. Formats a given date or timestamp into a string representation in the format “YYYYMMDDHH” (Year, Month, Day, Hour). 
## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.to_yyyymmddhh() ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.to_yyyymmddhh('2023-11-12 09:38:18.165575') ┌──────────────────────────────────────────────────┐ │ func.to_yyyymmddhh('2023-11-12 09:38:18.165575') │ │ UInt32 │ ├──────────────────────────────────────────────────┤ │ 2023111209 │ └──────────────────────────────────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql TO_YYYYMMDDHH() ``` ## Arguments [Section titled “Arguments”](#arguments) | Arguments | Description | | --------- | ------------- | | `` | date/datetime | ## Return Type [Section titled “Return Type”](#return-type) Returns an unsigned 64-bit integer (UInt64) in the format “YYYYMMDDHH”. ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT to_yyyymmddhh('2023-11-12 09:38:18.165575') ┌─────────────────────────────────────────────┐ │ to_yyyymmddhh('2023-11-12 09:38:18.165575') │ │ UInt32 │ ├─────────────────────────────────────────────┤ │ 2023111209 │ └─────────────────────────────────────────────┘ ``` # TO_YYYYMMDDHHMMSS (Lakehouse v1) > TO_YYYYMMDDHHMMSS — converts a date or date with time (timestamp/datetime) to a UInt64 number. Converts a date or date with time (timestamp/datetime) to a UInt64 number containing the year, month, day, hour, minute, and second (YYYY \* 10000000000 + MM \* 100000000 + DD \* 1000000 + hh \* 10000 + mm \* 100 + ss).
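To make the formula concrete, here is a minimal Python sketch that applies it term by term. The helper name is hypothetical and the code is an illustration of the arithmetic, not the Lakehouse implementation; fractional seconds are ignored.

```python
from datetime import datetime


def to_yyyymmddhhmmss(ts: datetime) -> int:
    """Pack a timestamp into an integer in YYYYMMDDhhmmss form."""
    return (
        ts.year * 10_000_000_000
        + ts.month * 100_000_000
        + ts.day * 1_000_000
        + ts.hour * 10_000
        + ts.minute * 100
        + ts.second
    )


print(to_yyyymmddhhmmss(datetime(2023, 11, 12, 9, 38, 18, 165575)))  # 20231112093818
```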
## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.to_yyyymmddhhmmss() ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.to_yyyymmddhhmmss('2023-11-12 09:38:18.165575') ┌──────────────────────────────────────────────────────┐ │ func.to_yyyymmddhhmmss('2023-11-12 09:38:18.165575') │ │ UInt64 │ ├──────────────────────────────────────────────────────┤ │ 20231112093818 │ └──────────────────────────────────────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql TO_YYYYMMDDHHMMSS() ``` ## Arguments [Section titled “Arguments”](#arguments) | Arguments | Description | | --------- | -------------- | | `` | date/timestamp | ## Return Type [Section titled “Return Type”](#return-type) `BIGINT`, returns in `YYYYMMDDhhmmss` format. ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT to_yyyymmddhhmmss('2023-11-12 09:38:18.165575') ┌─────────────────────────────────────────────────┐ │ to_yyyymmddhhmmss('2023-11-12 09:38:18.165575') │ │ UInt64 │ ├─────────────────────────────────────────────────┤ │ 20231112093818 │ └─────────────────────────────────────────────────┘ ``` # TODAY (Lakehouse v1) > TODAY — returns the current date. Returns the current date. ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.today() ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.today() ┌──────────────┐ │ func.today() │ ├──────────────┤ │ 2021-09-03 │ └──────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql TODAY() ``` ## Return Type [Section titled “Return Type”](#return-type) `DATE`, returns date in “YYYY-MM-DD” format. ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT TODAY(); ┌────────────┐ │ TODAY() │ ├────────────┤ │ 2021-09-03 │ └────────────┘ ``` # TOMORROW (Lakehouse v1) > TOMORROW — returns tomorrow's date, same as today() + 1.
Returns tomorrow's date, same as `today() + 1`. ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.tomorrow() ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.tomorrow() ┌─────────────────┐ │ func.tomorrow() │ ├─────────────────┤ │ 2021-09-04 │ └─────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql TOMORROW() ``` ## Return Type [Section titled “Return Type”](#return-type) `DATE`, returns date in “YYYY-MM-DD” format. ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT TOMORROW(); ┌────────────┐ │ TOMORROW() │ ├────────────┤ │ 2021-09-04 │ └────────────┘ SELECT TODAY()+1; ┌───────────────┐ │ (TODAY() + 1) │ ├───────────────┤ │ 2021-09-04 │ └───────────────┘ ``` # TRY_TO_DATETIME (Lakehouse v1) > TRY_TO_DATETIME — alias for the TRY_TO_TIMESTAMP datetime function. Alias for [TRY\_TO\_TIMESTAMP](../try-to-timestamp). # TRY_TO_TIMESTAMP (Lakehouse v1) > TRY_TO_TIMESTAMP — a variant of TO_TIMESTAMP in PlaidCloud Lakehouse that returns NULL instead of raising an error when the conversion fails. A variant of [TO\_TIMESTAMP](../to-timestamp) in PlaidCloud Lakehouse that performs the same conversion of an input expression to a timestamp but returns NULL if the conversion fails instead of raising an error.
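The NULL-on-failure behavior is the familiar "try" pattern: attempt the conversion and swallow the error. The following plain-Python sketch is only an analogy, not the Lakehouse implementation; `try_parse_timestamp` is a hypothetical helper, and the single `%Y-%m-%d %H:%M:%S` pattern is an assumption that says nothing about which input formats the Lakehouse accepts.

```python
from datetime import datetime
from typing import Optional


def try_parse_timestamp(text: str, fmt: str = "%Y-%m-%d %H:%M:%S") -> Optional[datetime]:
    """Return the parsed timestamp, or None (standing in for NULL) on failure."""
    try:
        return datetime.strptime(text, fmt)
    except ValueError:
        # A strict conversion would raise here; the "try" variant does not.
        return None


print(try_parse_timestamp("2022-01-02 02:00:11"))  # 2022-01-02 02:00:11
print(try_parse_timestamp("plaidcloud"))           # None
```

Returning a null value lets a query continue over dirty input and filter out the unparseable rows afterwards.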
See also: [TO\_TIMESTAMP](../to-timestamp) ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.try_to_timestamp() ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.try_to_timestamp('2022-01-02 02:00:11'), func.try_to_datetime('2022-01-02 02:00:11'), func.try_to_timestamp('plaidcloud') ┌──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐ │ func.try_to_timestamp('2022-01-02 02:00:11') │ func.try_to_datetime('2022-01-02 02:00:11') │ func.try_to_timestamp('plaidcloud') │ │ Timestamp │ Timestamp │ │ ├─────────────────────────────────────────┼──────────────────────────────────────────────────┤─────────────────────────────────────│ │ 2022-01-02 02:00:11 │ 2022-01-02 02:00:11 │ NULL │ └──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql -- Convert a string or integer to a timestamp TRY_TO_TIMESTAMP() -- Convert a string to a timestamp using the given pattern TRY_TO_TIMESTAMP() ``` ## Aliases [Section titled “Aliases”](#aliases) * [TRY\_TO\_DATETIME](../try-to-datetime) ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT TRY_TO_TIMESTAMP('2022-01-02 02:00:11'), TRY_TO_DATETIME('2022-01-02 02:00:11'); ┌──────────────────────────────────────────────────────────────────────────────────┐ │ try_to_timestamp('2022-01-02 02:00:11') │ try_to_datetime('2022-01-02 02:00:11') │ │ Timestamp │ Timestamp │ ├─────────────────────────────────────────┼────────────────────────────────────────┤ │ 2022-01-02 02:00:11 │ 2022-01-02 02:00:11 │ └──────────────────────────────────────────────────────────────────────────────────┘ SELECT TRY_TO_TIMESTAMP('databend'), TRY_TO_DATETIME('databend'); ┌────────────────────────────────────────────────────────────┐ │ 
try_to_timestamp('databend') │ try_to_datetime('databend') │ ├──────────────────────────────┼─────────────────────────────┤ │ NULL │ NULL │ └────────────────────────────────────────────────────────────┘ ``` # WEEK (Lakehouse v1) > WEEK — alias for the TO_WEEK_OF_YEAR datetime function. Alias for [TO\_WEEK\_OF\_YEAR](../to-week-of-year). # WEEKOFYEAR (Lakehouse v1) > WEEKOFYEAR — alias for the TO_WEEK_OF_YEAR datetime function. Alias for [TO\_WEEK\_OF\_YEAR](../to-week-of-year). # YEAR (Lakehouse v1) > YEAR — alias for the TO_YEAR datetime function. Alias for [TO\_YEAR](../to-year). # YESTERDAY (Lakehouse v1) > YESTERDAY — Returns yesterday's date, same as today() - 1. Returns yesterday's date, same as `today() - 1`. ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.yesterday() ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.yesterday() ┌──────────────────┐ │ func.yesterday() │ ├──────────────────┤ │ 2021-09-02 │ └──────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql YESTERDAY() ``` ## Return Type [Section titled “Return Type”](#return-type) `DATE`, returns date in “YYYY-MM-DD” format. ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT YESTERDAY(); ┌─────────────┐ │ YESTERDAY() │ ├─────────────┤ │ 2021-09-02 │ └─────────────┘ SELECT TODAY()-1; ┌───────────────┐ │ (TODAY() - 1) │ ├───────────────┤ │ 2021-09-02 │ └───────────────┘ ``` # Interval Functions (Lakehouse v1) > Lakehouse v1 SQL interval functions: add, subtract, and manipulate time intervals in date arithmetic. This section provides reference information for the interval functions in PlaidCloud Lakehouse. # EPOCH (Lakehouse v1) > EPOCH — alias for the TO_SECONDS interval function. Alias for [TO\_SECONDS](../to-seconds). # TO_CENTURIES (Lakehouse v1) > TO_CENTURIES — converts a specified number of centuries into an Interval type.
Converts a specified number of centuries into an Interval type. * Accepts positive integers, zero, and negative integers as input. ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.to_centuries() ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.to_centuries(2) ┌──────────────────────────────────────────────────────┐ │ func.to_centuries(2) │ ├──────────────────────────────────────────────────────┤ │ 200 years │ └──────────────────────────────────────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql TO_CENTURIES() ``` ## Return Type [Section titled “Return Type”](#return-type) Interval (represented in years). ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT TO_CENTURIES(2), TO_CENTURIES(0), TO_CENTURIES(-2); ┌───────────────────────────────────────────────────────┐ │ to_centuries(2) │ to_centuries(0) │ to_centuries(- 2) │ ├─────────────────┼─────────────────┼───────────────────┤ │ 200 years │ 00:00:00 │ -200 years │ └───────────────────────────────────────────────────────┘ ``` # TO_DAYS (Lakehouse v1) > TO_DAYS — Converts a specified number of days into an Interval type. Converts a specified number of days into an Interval type. * Accepts positive integers, zero, and negative integers as input. ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.to_days() ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.to_days(2) ┌──────────────────────────────────────────────────────┐ │ func.to_days(2) │ ├──────────────────────────────────────────────────────┤ │ 2 days │ └──────────────────────────────────────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql TO_DAYS() ``` ## Return Type [Section titled “Return Type”](#return-type) Interval (represented in days).
## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT TO_DAYS(2), TO_DAYS(0), TO_DAYS(-2); ┌────────────────────────────────────────┐ │ to_days(2) │ to_days(0) │ to_days(- 2) │ ├────────────┼────────────┼──────────────┤ │ 2 days │ 00:00:00 │ -2 days │ └────────────────────────────────────────┘ ``` # TO_DECADES (Lakehouse v1) > TO_DECADES — converts a specified number of decades into an Interval type. Converts a specified number of decades into an Interval type. * Accepts positive integers, zero, and negative integers as input. ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.to_decades() ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.to_decades(2) ┌──────────────────────────────────────────────────────┐ │ func.to_decades(2) │ ├──────────────────────────────────────────────────────┤ │ 20 years │ └──────────────────────────────────────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql TO_DECADES() ``` ## Return Type [Section titled “Return Type”](#return-type) Interval (represented in years). ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT TO_DECADES(2), TO_DECADES(0), TO_DECADES((- 2)); ┌─────────────────────────────────────────────────┐ │ to_decades(2) │ to_decades(0) │ to_decades(- 2) │ ├───────────────┼───────────────┼─────────────────┤ │ 20 years │ 00:00:00 │ -20 years │ └─────────────────────────────────────────────────┘ ``` # TO_HOURS (Lakehouse v1) > TO_HOURS — Converts a specified number of hours into an Interval type. Converts a specified number of hours into an Interval type. * Accepts positive integers, zero, and negative integers as input. 
## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.to_hours() ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.to_hours(2) ┌──────────────────────────────────────────────────────┐ │ func.to_hours(2) │ ├──────────────────────────────────────────────────────┤ │ 2:00:00 │ └──────────────────────────────────────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql TO_HOURS() ``` ## Return Type [Section titled “Return Type”](#return-type) Interval (in the format `hh:mm:ss`). ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT TO_HOURS(2), TO_HOURS(0), TO_HOURS((- 2)); ┌───────────────────────────────────────────┐ │ to_hours(2) │ to_hours(0) │ to_hours(- 2) │ ├─────────────┼─────────────┼───────────────┤ │ 2:00:00 │ 00:00:00 │ -2:00:00 │ └───────────────────────────────────────────┘ ``` # TO_MICROSECONDS (Lakehouse v1) > TO_MICROSECONDS — converts a specified number of microseconds into an Interval type. Converts a specified number of microseconds into an Interval type. * Accepts positive integers, zero, and negative integers as input. ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.to_microseconds() ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.to_microseconds(2) ┌──────────────────────────────────────────────────────┐ │ func.to_microseconds(2) │ ├──────────────────────────────────────────────────────┤ │ 0:00:00.000002 │ └──────────────────────────────────────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql TO_MICROSECONDS() ``` ## Return Type [Section titled “Return Type”](#return-type) Interval (in the format `hh:mm:ss.ssssss`).
## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT TO_MICROSECONDS(2), TO_MICROSECONDS(0), TO_MICROSECONDS((- 2)); ┌────────────────────────────────────────────────────────────────┐ │ to_microseconds(2) │ to_microseconds(0) │ to_microseconds(- 2) │ ├────────────────────┼────────────────────┼──────────────────────┤ │ 0:00:00.000002 │ 00:00:00 │ -0:00:00.000002 │ └────────────────────────────────────────────────────────────────┘ ``` # TO_MILLENNIA (Lakehouse v1) > TO_MILLENNIA — converts a specified number of millennia into an Interval type. Converts a specified number of millennia into an Interval type. * Accepts positive integers, zero, and negative integers as input. ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.to_millennia() ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.to_millennia(2) ┌──────────────────────────────────────────────────────┐ │ func.to_millennia(2) │ ├──────────────────────────────────────────────────────┤ │ 2000 years │ └──────────────────────────────────────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql TO_MILLENNIA() ``` ## Return Type [Section titled “Return Type”](#return-type) Interval (represented in years). ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT TO_MILLENNIA(2), TO_MILLENNIA(0), TO_MILLENNIA((- 2)); ┌───────────────────────────────────────────────────────┐ │ to_millennia(2) │ to_millennia(0) │ to_millennia(- 2) │ ├─────────────────┼─────────────────┼───────────────────┤ │ 2000 years │ 00:00:00 │ -2000 years │ └───────────────────────────────────────────────────────┘ ``` # TO_MILLISECONDS (Lakehouse v1) > TO_MILLISECONDS — converts a specified number of milliseconds into an Interval type. Converts a specified number of milliseconds into an Interval type. * Accepts positive integers, zero, and negative integers as input. 
## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.to_milliseconds() ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.to_milliseconds(2) ┌──────────────────────────────────────────────────────┐ │ func.to_milliseconds(2) │ ├──────────────────────────────────────────────────────┤ │ 0:00:00.002 │ └──────────────────────────────────────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql TO_MILLISECONDS() ``` ## Return Type [Section titled “Return Type”](#return-type) Interval (in the format `hh:mm:ss.sss`). ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT TO_MILLISECONDS(2), TO_MILLISECONDS(0), TO_MILLISECONDS((- 2)); ┌────────────────────────────────────────────────────────────────┐ │ to_milliseconds(2) │ to_milliseconds(0) │ to_milliseconds(- 2) │ ├────────────────────┼────────────────────┼──────────────────────┤ │ 0:00:00.002 │ 00:00:00 │ -0:00:00.002 │ └────────────────────────────────────────────────────────────────┘ ``` # TO_MINUTES (Lakehouse v1) > TO_MINUTES — converts a specified number of minutes into an Interval type. Converts a specified number of minutes into an Interval type. * Accepts positive integers, zero, and negative integers as input. ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.to_minutes() ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.to_minutes(2) ┌──────────────────────────────────────────────────────┐ │ func.to_minutes(2) │ ├──────────────────────────────────────────────────────┤ │ 0:02:00 │ └──────────────────────────────────────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql TO_MINUTES() ``` ## Return Type [Section titled “Return Type”](#return-type) Interval (in the format `hh:mm:ss`). 
## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT TO_MINUTES(2), TO_MINUTES(0), TO_MINUTES((- 2)); ┌─────────────────────────────────────────────────┐ │ to_minutes(2) │ to_minutes(0) │ to_minutes(- 2) │ ├───────────────┼───────────────┼─────────────────┤ │ 0:02:00 │ 00:00:00 │ -0:02:00 │ └─────────────────────────────────────────────────┘ ``` # TO_MONTHS (Lakehouse v1) > TO_MONTHS — Converts a specified number of months into an Interval type. Converts a specified number of months into an Interval type. * Accepts positive integers, zero, and negative integers as input. ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.to_months() ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.to_months(2) ┌──────────────────────────────────────────────────────┐ │ func.to_months(2) │ ├──────────────────────────────────────────────────────┤ │ 2 months │ └──────────────────────────────────────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql TO_MONTHS() ``` ## Return Type [Section titled “Return Type”](#return-type) Interval (represented in months). ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT TO_MONTHS(2), TO_MONTHS(0), TO_MONTHS((- 2)); ┌──────────────────────────────────────────────┐ │ to_months(2) │ to_months(0) │ to_months(- 2) │ ├──────────────┼──────────────┼────────────────┤ │ 2 months │ 00:00:00 │ -2 months │ └──────────────────────────────────────────────┘ ``` # TO_SECONDS (Lakehouse v1) > TO_SECONDS — converts a specified number of seconds into an Interval type. Converts a specified number of seconds into an Interval type. * Accepts positive integers, zero, and negative integers as input. 
## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.to_seconds() ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.to_seconds(2) ┌──────────────────────────────────────────────────────┐ │ func.to_seconds(2) │ ├──────────────────────────────────────────────────────┤ │ 0:00:02 │ └──────────────────────────────────────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql TO_SECONDS() ``` ## Aliases [Section titled “Aliases”](#aliases) * [EPOCH](../epoch) ## Return Type [Section titled “Return Type”](#return-type) Interval (in the format `hh:mm:ss`). ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT TO_SECONDS(2), TO_SECONDS(0), TO_SECONDS((- 2)); ┌─────────────────────────────────────────────────┐ │ to_seconds(2) │ to_seconds(0) │ to_seconds(- 2) │ ├───────────────┼───────────────┼─────────────────┤ │ 0:00:02 │ 00:00:00 │ -0:00:02 │ └─────────────────────────────────────────────────┘ ``` # TO_WEEKS (Lakehouse v1) > TO_WEEKS — Converts a specified number of weeks into an Interval type. Converts a specified number of weeks into an Interval type. * Accepts positive integers, zero, and negative integers as input. ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.to_weeks() ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.to_weeks(2) ┌──────────────────────────────────────────────────────┐ │ func.to_weeks(2) │ ├──────────────────────────────────────────────────────┤ │ 14 days │ └──────────────────────────────────────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql TO_WEEKS() ``` ## Return Type [Section titled “Return Type”](#return-type) Interval (represented in days). 
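Because the interval is represented in days, two weeks and fourteen days are the same value. The following sketch uses Python's standard `datetime.timedelta` as a stand-in for the Lakehouse Interval type; that mapping is an assumption made for illustration, and the printed text format differs from the Lakehouse output.

```python
from datetime import date, timedelta

two_weeks = timedelta(weeks=2)

print(two_weeks.days)                   # 14
print(two_weeks == timedelta(days=14))  # True
print(timedelta(weeks=-2).days)         # -14

# Adding the interval to a date moves it forward by 14 calendar days.
print(date(2021, 9, 3) + two_weeks)     # 2021-09-17
```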
## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT TO_WEEKS(2), TO_WEEKS(0), TO_WEEKS((- 2)); ┌───────────────────────────────────────────┐ │ to_weeks(2) │ to_weeks(0) │ to_weeks(- 2) │ ├─────────────┼─────────────┼───────────────┤ │ 14 days │ 00:00:00 │ -14 days │ └───────────────────────────────────────────┘ ``` # TO_YEARS (Lakehouse v1) > TO_YEARS — Converts a specified number of years into an Interval type. Converts a specified number of years into an Interval type. * Accepts positive integers, zero, and negative integers as input. ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.to_years() ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.to_years(2) ┌──────────────────────────────────────────────────────┐ │ func.to_years(2) │ ├──────────────────────────────────────────────────────┤ │ 2 years │ └──────────────────────────────────────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql TO_YEARS() ``` ## Return Type [Section titled “Return Type”](#return-type) Interval (represented in years). ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT TO_YEARS(2), TO_YEARS(0), TO_YEARS((- 2)); ┌───────────────────────────────────────────┐ │ to_years(2) │ to_years(0) │ to_years(- 2) │ ├─────────────┼─────────────┼───────────────┤ │ 2 years │ 00:00:00 │ -2 years │ └───────────────────────────────────────────┘ ``` # String Functions (Lakehouse v1) > Lakehouse v1 SQL string functions: manipulate text — case, trim, split, search, replace, format, and encode. This section provides reference information for the string-related functions in PlaidCloud Lakehouse. 
## String Manipulation: [Section titled “String Manipulation:”](#string-manipulation) * [CONCAT](concat) * [CONCAT\_WS](concat-ws) * [INSERT](insert) * [LEFT](left) * [LPAD](lpad) * [REPEAT](repeat) * [REPLACE](replace) * [REVERSE](reverse) * [RIGHT](right) * [RPAD](rpad) * [SPLIT](split) * [SPLIT\_PART](split-part) * [SUBSTR](substr) * [SUBSTRING](substring) * [TRANSLATE](translate) * [TRIM](trim) ## String Information: [Section titled “String Information:”](#string-information) * [ASCII](ascii) * [BIT\_LENGTH](bit-length) * [CHAR\_LENGTH](char-length) * [CHARACTER\_LENGTH](character-length) * [INSTR](instr) * [LENGTH](length) * [LOCATE](locate) * [OCTET\_LENGTH](octet-length) * [ORD](ord) * [POSITION](position) * [STRCMP](strcmp) ## Case Conversion: [Section titled “Case Conversion:”](#case-conversion) * [LCASE](lcase) * [LOWER](lower) * [UCASE](ucase) * [UPPER](upper) ## Regular Expressions: [Section titled “Regular Expressions:”](#regular-expressions) * [LIKE](like) * [NOT\_LIKE](not-like) * [NOT\_REGEXP](not-regexp) * [NOT\_RLIKE](not-rlike) * [REGEXP](regexp) * [REGEXP\_INSTR](regexp-instr) * [REGEXP\_LIKE](regexp-like) * [REGEXP\_REPLACE](regexp-replace) * [REGEXP\_SUBSTR](regexp-substr) * [RLIKE](rlike) ## Encoding and Decoding: [Section titled “Encoding and Decoding:”](#encoding-and-decoding) * [BIN](bin) * [FROM\_BASE64](from-base64) * [HEX](hex) * [OCT](oct) * [TO\_BASE64](to-base64) * [UNHEX](unhex) ## Miscellaneous: [Section titled “Miscellaneous:”](#miscellaneous) * [CHAR](char) * [MID](mid) * [QUOTE](quote) * [SOUNDEX](soundex) * [SOUNDSLIKE](soundslike) * [SPACE](space) # ASCII (Lakehouse v1) > ASCII — returns the numeric value of the leftmost character of the string str. Returns the numeric value of the leftmost character of the string str. 
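For plain ASCII input, the same value can be obtained in Python with the built-in `ord`. This is a minimal sketch with a hypothetical helper name, `leftmost_code`; behavior for empty or non-ASCII strings is not described on this page and is not modeled here.

```python
def leftmost_code(text: str) -> int:
    """Return the code of the first character; later characters are ignored."""
    return ord(text[0])


print(leftmost_code("2"))     # 50
print(leftmost_code("2023"))  # 50
print(leftmost_code("A"))     # 65
```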
## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.ascii() ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.ascii('2') ┌─────────────────┐ │ func.ascii('2') │ ├─────────────────┤ │ 50 │ └─────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql ASCII() ``` ## Arguments [Section titled “Arguments”](#arguments) | Arguments | Description | | --------- | ----------- | | `` | The string. | ## Return Type [Section titled “Return Type”](#return-type) `TINYINT` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT ASCII('2'); ┌────────────┐ │ ASCII('2') │ ├────────────┤ │ 50 │ └────────────┘ ``` # BIN (Lakehouse v1) > BIN — Returns a string representation of the binary value of N. Returns a string representation of the binary value of N. ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.bin() ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.bin(12) ┌──────────────┐ │ func.bin(12) │ ├──────────────┤ │ 1100 │ └──────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql BIN() ``` ## Arguments [Section titled “Arguments”](#arguments) | Arguments | Description | | --------- | ----------- | | `` | The number. | ## Return Type [Section titled “Return Type”](#return-type) `VARCHAR` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT BIN(12); ┌─────────┐ │ BIN(12) │ ├─────────┤ │ 1100 │ └─────────┘ ``` # BIT_LENGTH (Lakehouse v1) > BIT_LENGTH — Return the length of a string in bits. Return the length of a string in bits. 
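The bit length is the byte length multiplied by eight. The sketch below assumes the string is measured in its UTF-8 encoding, which is an assumption of this example rather than something stated on this page; `bit_length` here is a hypothetical plain-Python helper.

```python
def bit_length(text: str, encoding: str = "utf-8") -> int:
    """Number of bits needed to store text in the given encoding."""
    return len(text.encode(encoding)) * 8


print(bit_length("Word"))  # 32 (4 one-byte characters)
print(bit_length(""))      # 0
```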
## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.bit_length() ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.bit_length('Word') ┌─────────────────────────┐ │ func.bit_length('Word') │ ├─────────────────────────┤ │ 32 │ └─────────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql BIT_LENGTH() ``` ## Arguments [Section titled “Arguments”](#arguments) | Arguments | Description | | --------- | ----------- | | `` | The string. | ## Return Type [Section titled “Return Type”](#return-type) `BIGINT` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT BIT_LENGTH('Word'); ┌────────────────────────────┐ │ SELECT BIT_LENGTH('Word'); │ ├────────────────────────────┤ │ 32 │ └────────────────────────────┘ ``` # CHAR (Lakehouse v1) > CHAR — Return the character for each integer passed. Return the character for each integer passed. ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.char(N,...) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.char(77,121,83,81,76) ┌─────────────────────────────┐ │ func.char(77,121,83,81,76) │ ├─────────────────────────────┤ │ 4D7953514C │ └─────────────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql CHAR(N, ...) ``` ## Arguments [Section titled “Arguments”](#arguments) | Arguments | Description | | --------- | -------------- | | N | Numeric Column | ## Return Type [Section titled “Return Type”](#return-type) `BINARY` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) This example shows both the binary value returned as well as the string representation. 
```sql SELECT CHAR(77,121,83,81,76) as a, a::String; ┌────────────────────────┐ │ a │ a::string │ │ Binary │ String │ ├────────────┼───────────┤ │ 4D7953514C │ MySQL │ └────────────────────────┘ ``` # CHAR_LENGTH (Lakehouse v1) > CHAR_LENGTH — alias for the LENGTH string function. Alias for [LENGTH](../length). # CHARACTER_LENGTH (Lakehouse v1) > CHARACTER_LENGTH — alias for the LENGTH string function. Alias for [LENGTH](../length). # CONCAT (Lakehouse v1) > CONCAT — Returns the string that results from concatenating the arguments. Returns the string that results from concatenating the arguments. May have one or more arguments. If all arguments are nonbinary strings, the result is a nonbinary string. If the arguments include any binary strings, the result is a binary string. A numeric argument is converted to its equivalent nonbinary string form. ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.concat(, ...) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.concat('data', 'bend') ┌─────────────────────────────┐ │ func.concat('data', 'bend') │ ├─────────────────────────────┤ │ databend │ └─────────────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql CONCAT(, ...) ``` ## Arguments [Section titled “Arguments”](#arguments) | Arguments | Description | | --------- | ----------- | | `` | string | ## Return Type [Section titled “Return Type”](#return-type) A `VARCHAR` data type value or `NULL`.
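The NULL handling of CONCAT can be modeled in a few lines. This is a plain-Python sketch with a hypothetical `concat` helper, using `None` in place of NULL; it covers only string and numeric arguments and leaves out the binary-string case described above.

```python
from typing import Optional


def concat(*args: Optional[object]) -> Optional[str]:
    """Concatenate arguments; any None (NULL) argument makes the result None."""
    if any(arg is None for arg in args):
        return None
    # Numeric arguments are converted to their string form first.
    return "".join(str(arg) for arg in args)


print(concat("data", "bend"))        # databend
print(concat("data", None, "bend"))  # None
print(concat("14.3"))                # 14.3
```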
## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT CONCAT('data', 'bend'); ┌────────────────────────┐ │ concat('data', 'bend') │ ├────────────────────────┤ │ databend │ └────────────────────────┘ SELECT CONCAT('data', NULL, 'bend'); ┌──────────────────────────────┐ │ CONCAT('data', NULL, 'bend') │ ├──────────────────────────────┤ │ NULL │ └──────────────────────────────┘ SELECT CONCAT('14.3'); ┌────────────────┐ │ concat('14.3') │ ├────────────────┤ │ 14.3 │ └────────────────┘ ``` # CONCAT_WS (Lakehouse v1) > CONCAT_WS — CONCAT_WS() stands for Concatenate With Separator and is a special form of CONCAT(). CONCAT\_WS() stands for Concatenate With Separator and is a special form of CONCAT(). The first argument is the separator for the rest of the arguments. The separator is added between the strings to be concatenated. The separator can be a string, as can the rest of the arguments. If the separator is NULL, the result is NULL. CONCAT\_WS() does not skip empty strings. However, it does skip any NULL values after the separator argument. ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.concat_ws(, , ...) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.concat_ws(',', 'data', 'fuse', 'labs', '2021') ┌─────────────────────────────────────────────────────┐ │ func.concat_ws(',', 'data', 'fuse', 'labs', '2021') │ ├─────────────────────────────────────────────────────┤ │ data,fuse,labs,2021 │ └─────────────────────────────────────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql CONCAT_WS(, , ...) ``` ## Arguments [Section titled “Arguments”](#arguments) | Arguments | Description | | ------------- | ------------- | | `` | string column | | `` | value column | ## Return Type [Section titled “Return Type”](#return-type) A `VARCHAR` data type value or `NULL`.
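The three rules stated for CONCAT_WS (a NULL separator gives NULL, NULL values are skipped, empty strings are kept) can be sketched in plain Python. The `concat_ws` helper below is hypothetical and uses `None` in place of NULL.

```python
from typing import Optional


def concat_ws(separator: Optional[str], *args: Optional[str]) -> Optional[str]:
    """Join args with separator, skipping None values but keeping empty strings."""
    if separator is None:
        return None
    return separator.join(arg for arg in args if arg is not None)


print(concat_ws(",", "data", "fuse", "labs", "2021"))  # data,fuse,labs,2021
print(concat_ws(",", "data", None, "bend"))            # data,bend
print(concat_ws(None, "data", "fuse", "labs"))         # None
print(repr(concat_ws(",", "a", "", "b")))              # 'a,,b'
print(repr(concat_ws(",", None)))                      # ''
```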
## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT CONCAT_WS(',', 'data', 'fuse', 'labs', '2021'); ┌────────────────────────────────────────────────┐ │ CONCAT_WS(',', 'data', 'fuse', 'labs', '2021') │ ├────────────────────────────────────────────────┤ │ data,fuse,labs,2021 │ └────────────────────────────────────────────────┘ SELECT CONCAT_WS(',', 'data', NULL, 'bend'); ┌──────────────────────────────────────┐ │ CONCAT_WS(',', 'data', NULL, 'bend') │ ├──────────────────────────────────────┤ │ data,bend │ └──────────────────────────────────────┘ SELECT CONCAT_WS(',', 'data', NULL, NULL, 'bend'); ┌────────────────────────────────────────────┐ │ CONCAT_WS(',', 'data', NULL, NULL, 'bend') │ ├────────────────────────────────────────────┤ │ data,bend │ └────────────────────────────────────────────┘ SELECT CONCAT_WS(NULL, 'data', 'fuse', 'labs'); ┌─────────────────────────────────────────┐ │ CONCAT_WS(NULL, 'data', 'fuse', 'labs') │ ├─────────────────────────────────────────┤ │ NULL │ └─────────────────────────────────────────┘ SELECT CONCAT_WS(',', NULL); ┌──────────────────────┐ │ CONCAT_WS(',', NULL) │ ├──────────────────────┤ │ │ └──────────────────────┘ ``` # FROM_BASE64 (Lakehouse v1) > FROM_BASE64 — takes a string encoded with the base-64 encoding rules and returns the decoded result as a binary. Takes a string encoded with the base-64 encoding rules and returns the decoded result as a binary. The result is NULL if the argument is NULL or not a valid base-64 string.
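The decode-or-NULL behavior can be reproduced with Python's standard `base64` module. This is a sketch under stated assumptions: `from_base64` is a hypothetical helper, `None` stands in for NULL, and strict validation (`validate=True`) is assumed to approximate what the Lakehouse treats as "not a valid base-64 string".

```python
import base64
import binascii
from typing import Optional


def from_base64(text: Optional[str]) -> Optional[bytes]:
    """Decode base-64 text to bytes; return None for None or invalid input."""
    if text is None:
        return None
    try:
        return base64.b64decode(text, validate=True)
    except (binascii.Error, ValueError):
        return None


decoded = from_base64("YWJj")
print(decoded)                      # b'abc'
print(decoded.hex().upper())        # 616263
print(from_base64("not base64!"))   # None
```

The hexadecimal form printed on the second line is the binary representation of the decoded bytes, and converting those bytes to a string yields `abc`.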
## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.from_base64() ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.from_base64('YWJj') ┌──────────────────────────┐ │ func.from_base64('YWJj') │ ├──────────────────────────┤ │ abc │ └──────────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql FROM_BASE64() ``` ## Arguments [Section titled “Arguments”](#arguments) | Arguments | Description | | --------- | ----------------- | | `` | The string value. | ## Return Type [Section titled “Return Type”](#return-type) `BINARY` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT TO_BASE64('abc'), FROM_BASE64(TO_BASE64('abc')) as b, b::String; ┌───────────────────────────────────────┐ │ to_base64('abc') │ b │ b::string │ │ String │ Binary │ String │ ├──────────────────┼────────┼───────────┤ │ YWJj │ 616263 │ abc │ └───────────────────────────────────────┘ ``` # FROM_HEX (Lakehouse v1) > FROM_HEX — alias for the UNHEX string function. Alias for [UNHEX](../unhex). # HEX (Lakehouse v1) > HEX — alias for the TO_HEX string function. Alias for [TO\_HEX](../../02-conversion-functions/to-hex). # INSERT (Lakehouse v1) > INSERT — returns the string str, with the substring beginning at position pos and len characters. Returns the string str, with the substring beginning at position pos and len characters long replaced by the string newstr. Returns the original string if pos is not within the length of the string. Replaces the rest of the string from position pos if len is not within the length of the rest of the string. Returns NULL if any argument is NULL. 
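The rules in the paragraph above can be modeled with ordinary slicing. The `insert` helper below is a hypothetical plain-Python sketch with 1-based positions and `None` in place of NULL; treating a negative length as "replace to the end of the string" is an assumption of this sketch.

```python
from typing import Optional


def insert(s: Optional[str], pos: Optional[int], length: Optional[int],
           newstr: Optional[str]) -> Optional[str]:
    """Replace `length` characters of `s`, starting at 1-based `pos`, with `newstr`."""
    if s is None or pos is None or length is None or newstr is None:
        return None
    if pos < 1 or pos > len(s):
        return s  # position outside the string: return it unchanged
    start = pos - 1
    end = start + length
    if length < 0 or end > len(s):
        end = len(s)  # length runs past the end: replace the rest (negative case assumed)
    return s[:start] + newstr + s[end:]


print(insert("Quadratic", 3, 4, "What"))    # QuWhattic
print(insert("Quadratic", -1, 4, "What"))   # Quadratic
print(insert("Quadratic", 3, 100, "What"))  # QuWhat
print(insert("123456789", 4, 4, "aaa"))     # 123aaa89
```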
## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.insert(, , , ) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.insert('Quadratic', 3, 4, 'What') ┌────────────────────────────────────────┐ │ func.insert('Quadratic', 3, 4, 'What') │ ├────────────────────────────────────────┤ │ QuWhattic │ └────────────────────────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql INSERT(, , , ) ``` ## Arguments [Section titled “Arguments”](#arguments) | Arguments | Description | | ---------- | --------------- | | `` | The string. | | `` | The position. | | `` | The length. | | `` | The new string. | ## Return Type [Section titled “Return Type”](#return-type) `VARCHAR` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT INSERT('Quadratic', 3, 4, 'What'); ┌───────────────────────────────────┐ │ INSERT('Quadratic', 3, 4, 'What') │ ├───────────────────────────────────┤ │ QuWhattic │ └───────────────────────────────────┘ SELECT INSERT('Quadratic', -1, 4, 'What'); ┌───────────────────────────────────────┐ │ INSERT('Quadratic', (- 1), 4, 'What') │ ├───────────────────────────────────────┤ │ Quadratic │ └───────────────────────────────────────┘ SELECT INSERT('Quadratic', 3, 100, 'What'); ┌─────────────────────────────────────┐ │ INSERT('Quadratic', 3, 100, 'What') │ ├─────────────────────────────────────┤ │ QuWhat │ └─────────────────────────────────────┘ ┌────────────────────────────────────────────┬────────┐ │ INSERT('123456789', number, number, 'aaa') │ number │ ├────────────────────────────────────────────┼────────┤ │ 123456789 │ 0 │ │ aaa23456789 │ 1 │ │ 1aaa456789 │ 2 │ │ 12aaa6789 │ 3 │ │ 123aaa89 │ 4 │ │ 1234aaa │ 5 │ │ 12345aaa │ 6 │ │ 123456aaa │ 7 │ │ 1234567aaa │ 8 │ │ 12345678aaa │ 9 │ │ 123456789 │ 10 │ │ 123456789 │ 11 │ │ 123456789 │ 12 │ └────────────────────────────────────────────┴────────┘ ``` # INSTR (Lakehouse v1) > INSTR — 
returns the position of the first occurrence of substring substr in string str. Returns the position of the first occurrence of substring substr in string str. This is the same as the two-argument form of LOCATE(), except that the order of the arguments is reversed. ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.instr(, ) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.instr('foobarbar', 'bar') ┌────────────────────────────────┐ │ func.instr('foobarbar', 'bar') │ ├────────────────────────────────┤ │ 4 │ └────────────────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql INSTR(, ) ``` ## Arguments [Section titled “Arguments”](#arguments) | Arguments | Description | | ---------- | -------------- | | `` | The string. | | `` | The substring. | ## Return Type [Section titled “Return Type”](#return-type) `BIGINT` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT INSTR('foobarbar', 'bar'); ┌───────────────────────────┐ │ INSTR('foobarbar', 'bar') │ ├───────────────────────────┤ │ 4 │ └───────────────────────────┘ SELECT INSTR('xbar', 'foobar'); ┌─────────────────────────┐ │ INSTR('xbar', 'foobar') │ ├─────────────────────────┤ │ 0 │ └─────────────────────────┘ ``` # JARO_WINKLER (Lakehouse v1) > JARO_WINKLER — Calculates the Jaro-Winkler distance between two strings. Calculates the [Jaro-Winkler distance](https://en.wikipedia.org/wiki/Jaro%E2%80%93Winkler_distance) between two strings. It is commonly used for measuring the similarity between strings, with values ranging from 0.0 (completely dissimilar) to 1.0 (identical strings). 
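For readers who want to see how the score is derived, the following is a minimal pure-Python sketch of the textbook Jaro-Winkler formula. It assumes the common defaults (a prefix scale of 0.1 and a prefix cap of 4 characters) and may differ in detail from the engine's implementation; the function names are illustrative only.

```python
from typing import Optional


def jaro(s1: str, s2: str) -> float:
    """Plain Jaro similarity between two strings."""
    if not s1 and not s2:
        return 1.0
    if not s1 or not s2:
        return 0.0
    # Characters match only if they are equal and within this distance.
    window = max(max(len(s1), len(s2)) // 2 - 1, 0)
    matched1 = [False] * len(s1)
    matched2 = [False] * len(s2)
    matches = 0
    for i, ch in enumerate(s1):
        lo = max(0, i - window)
        hi = min(len(s2), i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Transpositions: matched characters that appear in a different order.
    a = [c for c, m in zip(s1, matched1) if m]
    b = [c for c, m in zip(s2, matched2) if m]
    transpositions = sum(x != y for x, y in zip(a, b)) // 2
    return (matches / len(s1) + matches / len(s2)
            + (matches - transpositions) / matches) / 3


def jaro_winkler(s1: Optional[str], s2: Optional[str],
                 prefix_scale: float = 0.1) -> Optional[float]:
    """Jaro similarity boosted by the length of the common prefix (max 4)."""
    if s1 is None or s2 is None:
        return None
    base = jaro(s1, s2)
    prefix = 0
    for x, y in zip(s1[:4], s2[:4]):
        if x != y:
            break
        prefix += 1
    return base + prefix * prefix_scale * (1 - base)


print(round(jaro_winkler('databend', 'Databend'), 4))  # 0.9167
print(round(jaro_winkler('databend', 'database'), 4))  # 0.9
```

In the first call the leading `d` and `D` differ, so there is no prefix bonus and the result is the plain Jaro score. In the second call the shared prefix lifts the score to roughly 0.9.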
## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.jaro_winkler(, ) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.jaro_winkler('databend', 'Databend') ┌───────────────────────────────────────────┐ │ func.jaro_winkler('databend', 'Databend') │ ├───────────────────────────────────────────┤ │ 0.9166666666666666 │ └───────────────────────────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql JARO_WINKLER(, ) ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) The JARO\_WINKLER function returns a FLOAT64 value representing the similarity between the two input strings. The return value follows these rules: * Similarity Range: The result ranges from 0.0 (completely dissimilar) to 1.0 (identical). ```sql SELECT JARO_WINKLER('databend', 'Databend') AS similarity; ┌────────────────────┐ │ similarity │ ├────────────────────┤ │ 0.9166666666666666 │ └────────────────────┘ SELECT JARO_WINKLER('databend', 'database') AS similarity; ┌────────────┐ │ similarity │ ├────────────┤ │ 0.9 │ └────────────┘ ``` * NULL Handling: If either string1 or string2 is NULL, the result is NULL. ```sql SELECT JARO_WINKLER('databend', NULL) AS similarity; ┌────────────┐ │ similarity │ ├────────────┤ │ NULL │ └────────────┘ ``` * Empty Strings: * Comparing two empty strings returns 1.0. ```sql SELECT JARO_WINKLER('', '') AS similarity; ┌────────────┐ │ similarity │ ├────────────┤ │ 1 │ └────────────┘ ``` * Comparing an empty string with a non-empty string returns 0.0. ```sql SELECT JARO_WINKLER('databend', '') AS similarity; ┌────────────┐ │ similarity │ ├────────────┤ │ 0 │ └────────────┘ ``` # LCASE (Lakehouse v1) > LCASE — alias for the LOWER string function. Alias for [LOWER](../lower). # LEFT (Lakehouse v1) > LEFT — returns the leftmost len characters from the string str, or NULL if any argument is NULL. 
Returns the leftmost `len` characters from the string `str`, or NULL if any argument is NULL.

## Analyze Syntax

```python
func.left(<str>, <len>)
```

## Analyze Examples

```python
func.left('foobarbar', 5)

┌───────────────────────────┐
│ func.left('foobarbar', 5) │
├───────────────────────────┤
│ fooba                     │
└───────────────────────────┘
```

## SQL Syntax

```sql
LEFT(<str>, <len>);
```

## Arguments

| Arguments | Description                                  |
| --------- | -------------------------------------------- |
| `<str>`   | The string from which to extract characters. |
| `<len>`   | The number of characters to return.          |

## Return Type

`VARCHAR`

## SQL Examples

```sql
SELECT LEFT('foobarbar', 5);

┌──────────────────────┐
│ LEFT('foobarbar', 5) │
├──────────────────────┤
│ fooba                │
└──────────────────────┘
```

# LENGTH (Lakehouse v1)

> LENGTH — Returns the length of a given input string or binary value.

Returns the length of a given input string or binary value. For strings, the length is the number of characters, with each UTF-8 character counted as a single character. For binary data, the length is the number of bytes.
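The difference between counting characters and counting bytes can be illustrated in plain Python. This is only an analogy for the behavior described above, not PlaidCloud code; the helper names are hypothetical.

```python
def char_length(text: str) -> int:
    """Count characters, as LENGTH does for string input."""
    return len(text)


def byte_length(text: str) -> int:
    """Count UTF-8 bytes, as LENGTH does for binary input."""
    return len(text.encode("utf-8"))


sample = "na\u00efve"  # "naive" spelled with U+00EF, which needs two bytes in UTF-8

print(char_length(sample))  # 5
print(byte_length(sample))  # 6
```

For pure ASCII input such as `'Hello'`, both counts are the same.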
## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.length() ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.length('Hello') ┌──────────────────────┐ │ func.length('Hello') │ ├──────────────────────┤ │ 5 │ └──────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql LENGTH() ``` ## Aliases [Section titled “Aliases”](#aliases) * [CHAR\_LENGTH](../char-length) * [CHARACTER\_LENGTH](../character-length) * [LENGTH\_UTF8](../length-utf8) ## Return Type [Section titled “Return Type”](#return-type) BIGINT ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT LENGTH('Hello'), LENGTH_UTF8('Hello'), CHAR_LENGTH('Hello'), CHARACTER_LENGTH('Hello'); ┌───────────────────────────────────────────────────────────────────────────────────────────┐ │ length('hello') │ length_utf8('hello') │ char_length('hello') │ character_length('hello') │ ├─────────────────┼──────────────────────┼──────────────────────┼───────────────────────────┤ │ 5 │ 5 │ 5 │ 5 │ └───────────────────────────────────────────────────────────────────────────────────────────┘ ``` # LENGTH_UTF8 (Lakehouse v1) > LENGTH_UTF8 — alias for the LENGTH string function. Alias for [LENGTH](../length). # LIKE (Lakehouse v1) > LIKE — pattern matching using an SQL pattern. Pattern matching using an SQL pattern. Returns 1 (TRUE) or 0 (FALSE). If either expr or pat is NULL, the result is NULL. 
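As a rough mental model, a LIKE pattern can be translated into an anchored regular expression. The sketch below assumes the standard SQL wildcards (`%` for any run of characters, `_` for exactly one character), ignores escape characters, and is case-sensitive; the helper name `sql_like` is hypothetical and this is not how the engine evaluates LIKE internally.

```python
import re
from typing import Optional


def sql_like(value: Optional[str], pattern: Optional[str]) -> Optional[bool]:
    """Approximate `value LIKE pattern` with a full-string regex match."""
    # NULL on either side yields NULL.
    if value is None or pattern is None:
        return None
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")   # any run of characters, including none
        elif ch == "_":
            parts.append(".")    # exactly one character
        else:
            parts.append(re.escape(ch))  # everything else is literal
    return re.fullmatch("".join(parts), value, flags=re.DOTALL) is not None


print(sql_like("plaid pants", "plaid%"))  # True
print(sql_like("shoes", "plaid%"))        # False
print(sql_like("plaid hat", "plaid h_t")) # True
```

Note that the whole value must match the pattern, which is why `'plaid%'` does not match a value that merely contains `plaid` somewhere after its first character.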
## Analyze Syntax

```python
<expr>.like('plaid%')
```

## Analyze Examples

```python
my_clothes.like('plaid%')

┌─────────────────┐
│ my_clothes      │
├─────────────────┤
│ plaid pants     │
│ plaid hat       │
│ plaid shirt     │
└─────────────────┘
```

## SQL Syntax

```sql
<expr> LIKE <pat>
```

## SQL Examples

```sql
SELECT name, category FROM system.functions WHERE name like 'tou%' ORDER BY name;

┌──────────┬────────────┐
│ name     │ category   │
├──────────┼────────────┤
│ touint16 │ conversion │
│ touint32 │ conversion │
│ touint64 │ conversion │
│ touint8  │ conversion │
└──────────┴────────────┘
```

# LOCATE (Lakehouse v1)

> LOCATE — the first syntax returns the position of the first occurrence of substring substr in string str.

The first syntax returns the position of the first occurrence of substring `substr` in string `str`. The second syntax returns the position of the first occurrence of substring `substr` in string `str`, starting at position `pos`. Returns 0 if `substr` is not in `str`. Returns NULL if any argument is NULL.
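The 1-based position rules can be sketched in plain Python with `str.find`. This is an illustration of the documented semantics only; the helper name `locate` is hypothetical, and returning 0 for a start position below 1 is an assumption.

```python
from typing import Optional


def locate(substr: Optional[str], s: Optional[str],
           pos: Optional[int] = 1) -> Optional[int]:
    """Return the 1-based position of substr in s, searching from pos."""
    # NULL in, NULL out.
    if substr is None or s is None or pos is None:
        return None
    # Assumption: a start position below 1 never matches.
    if pos < 1:
        return 0
    # str.find is 0-based and returns -1 when nothing is found,
    # so adding 1 yields a 1-based position, or 0 for "not found".
    return s.find(substr, pos - 1) + 1


print(locate('bar', 'foobarbar'))     # 4
print(locate('xbar', 'foobar'))       # 0
print(locate('bar', 'foobarbar', 5))  # 7
```

The third call skips the first `bar` (which starts at position 4) because the search begins at position 5.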
## Analyze Syntax

```python
func.locate(<substr>, <str>, <pos>)
```

## Analyze Examples

```python
func.locate('bar', 'foobarbar')

┌─────────────────────────────────┐
│ func.locate('bar', 'foobarbar') │
├─────────────────────────────────┤
│ 4                               │
└─────────────────────────────────┘
```

```python
func.locate('bar', 'foobarbar', 5)

┌────────────────────────────────────┐
│ func.locate('bar', 'foobarbar', 5) │
├────────────────────────────────────┤
│ 7                                  │
└────────────────────────────────────┘
```

## SQL Syntax

```sql
LOCATE(<substr>, <str>)
LOCATE(<substr>, <str>, <pos>)
```

## Arguments

| Arguments  | Description    |
| ---------- | -------------- |
| `<substr>` | The substring. |
| `<str>`    | The string.    |
| `<pos>`    | The position.  |

## Return Type

`BIGINT`

## SQL Examples

```sql
SELECT LOCATE('bar', 'foobarbar');

┌────────────────────────────┐
│ LOCATE('bar', 'foobarbar') │
├────────────────────────────┤
│ 4                          │
└────────────────────────────┘

SELECT LOCATE('xbar', 'foobar');

┌──────────────────────────┐
│ LOCATE('xbar', 'foobar') │
├──────────────────────────┤
│ 0                        │
└──────────────────────────┘

SELECT LOCATE('bar', 'foobarbar', 5);

┌───────────────────────────────┐
│ LOCATE('bar', 'foobarbar', 5) │
├───────────────────────────────┤
│ 7                             │
└───────────────────────────────┘
```

# LOWER (Lakehouse v1)

> LOWER — Returns a string with all characters changed to lowercase.

Returns a string with all characters changed to lowercase.
## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.lower() ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.lower('Hello, PlaidCloud!') ┌──────────────────────────────────┐ │ func.lower('Hello, PlaidCloud!') │ ├──────────────────────────────────┤ │ hello, plaidcloud! │ └──────────────────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql LOWER() ``` ## Aliases [Section titled “Aliases”](#aliases) * [LCASE](../lcase) ## Return Type [Section titled “Return Type”](#return-type) VARCHAR ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT LOWER('Hello, Databend!'), LCASE('Hello, Databend!'); ┌───────────────────────────────────────────────────────┐ │ lower('hello, databend!') │ lcase('hello, databend!') │ ├───────────────────────────┼───────────────────────────┤ │ hello, databend! │ hello, databend! │ └───────────────────────────────────────────────────────┘ ``` # LPAD (Lakehouse v1) > LPAD — returns the string str, left-padded with the string padstr to a length of len characters. Returns the string str, left-padded with the string padstr to a length of len characters. If str is longer than len, the return value is shortened to len characters. ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.lpad(, , ) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.lpad('hi',4,'??') ┌────────────────────────┐ │ func.lpad('hi',4,'??') │ ├────────────────────────┤ │ ??hi │ └────────────────────────┘ ``` ```python func.lpad('hi',1,'??') ┌────────────────────────┐ │ func.lpad('hi',1,'??') │ ├────────────────────────┤ │ h │ └────────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql LPAD(, , ) ``` ## Arguments [Section titled “Arguments”](#arguments) | Arguments | Description | | ---------- | --------------- | | `` | The string. 
| | `` | The length. | | `` | The pad string. | ## Return Type [Section titled “Return Type”](#return-type) `VARCHAR` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT LPAD('hi',4,'??'); ┌─────────────────────┐ │ LPAD('hi', 4, '??') │ ├─────────────────────┤ │ ??hi │ └─────────────────────┘ SELECT LPAD('hi',1,'??'); ┌─────────────────────┐ │ LPAD('hi', 1, '??') │ ├─────────────────────┤ │ h │ └─────────────────────┘ ``` # LTRIM (Lakehouse v1) > LTRIM — removes all occurrences of any character present in the specified trim string from the left side of the string. Removes all occurrences of any character present in the specified trim string from the left side of the string. See also: * [TRIM\_LEADING](../trim-leading) * [RTRIM](../rtrim) ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.ltrim(, ) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.ltrim('xxdatabend', 'x') ┌────────────────────────────────┐ │ func.ltrim('xxdatabend', 'x') │ ├────────────────────────────────┤ │ databend │ └────────────────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql LTRIM(, ) ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT LTRIM('xxdatabend', 'xx'), LTRIM('xxdatabend', 'xy'); ┌───────────────────────────────────────────────────────┐ │ ltrim('xxdatabend', 'xx') │ ltrim('xxdatabend', 'xy') │ ├───────────────────────────┼───────────────────────────┤ │ databend │ databend │ └───────────────────────────────────────────────────────┘ ``` # MID (Lakehouse v1) > MID — alias for the SUBSTR string function. Alias for [SUBSTR](../substr). # NOT LIKE (Lakehouse v1) > NOT LIKE — pattern not matching using an SQL pattern. Pattern not matching using an SQL pattern. Returns 1 (TRUE) or 0 (FALSE). If either expr or pat is NULL, the result is NULL. 
## Analyze Syntax

```python
<expr>.not_like(<pat>)
```

## Analyze Examples

```python
my_clothes.not_like('%pants')

┌─────────────────┐
│ my_clothes      │
├─────────────────┤
│ plaid pants XL  │
│ plaid hat       │
│ plaid shirt     │
└─────────────────┘
```

## SQL Syntax

```sql
<expr> NOT LIKE <pat>
```

## SQL Examples

```sql
SELECT name, category FROM system.functions WHERE name like 'tou%' AND name not like '%64' ORDER BY name;

┌──────────┬────────────┐
│ name     │ category   │
├──────────┼────────────┤
│ touint16 │ conversion │
│ touint32 │ conversion │
│ touint8  │ conversion │
└──────────┴────────────┘
```

# NOT REGEXP (Lakehouse v1)

> NOT REGEXP — returns 1 if the string expr doesn't match the regular expression specified by the pattern pat, 0 otherwise.

Returns 1 if the string `expr` doesn’t match the regular expression specified by the pattern `pat`, 0 otherwise.
## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python not_(.regexp_match()) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python With an input table of: ┌─────────────────┐ │ my_clothes │ ├─────────────────┤ │ plaid pants │ │ plaid hat │ │ plaid shirt │ │ shoes │ └─────────────────┘ not_(my_clothes.regexp_match('p*')) ┌─────────────────────────────────────┐ │ not_(my_clothes.regexp_match('p*')) │ ├─────────────────────────────────────┤ │ false │ │ false │ │ false │ │ true │ └─────────────────────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql NOT REGEXP ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT 'databend' NOT REGEXP 'd*'; ┌──────────────────────────────┐ │ ('databend' not regexp 'd*') │ ├──────────────────────────────┤ │ 0 │ └──────────────────────────────┘ ``` # NOT RLIKE (Lakehouse v1) > NOT RLIKE — returns 1 if the string expr doesn't match the regular expression specified by the pattern pat, 0 otherwise. Returns 1 if the string expr doesn’t match the regular expression specified by the pattern pat, 0 otherwise. 
## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python not_(.regexp_match()) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python With an input table of: ┌─────────────────┐ │ my_clothes │ ├─────────────────┤ │ plaid pants │ │ plaid hat │ │ plaid shirt │ │ shoes │ └─────────────────┘ not_(my_clothes.regexp_match('p*')) ┌─────────────────────────────────────┐ │ not_(my_clothes.regexp_match('p*')) │ ├─────────────────────────────────────┤ │ false │ │ false │ │ false │ │ true │ └─────────────────────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql NOT RLIKE ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT 'databend' not rlike 'd*'; ┌─────────────────────────────┐ │ ('databend' not rlike 'd*') │ ├─────────────────────────────┤ │ 0 │ └─────────────────────────────┘ ``` # OCT (Lakehouse v1) > OCT — Returns a string representation of the octal value of N. Returns a string representation of the octal value of N. ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.oct() ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.oct(12) ┌─────────────────┐ │ func.oct(12) │ ├─────────────────┤ │ 014 │ └─────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql OCT() ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT OCT(12); ┌─────────┐ │ OCT(12) │ ├─────────┤ │ 014 │ └─────────┘ ``` # OCTET_LENGTH (Lakehouse v1) > OCTET_LENGTH — OCTET_LENGTH() is a synonym for LENGTH(). OCTET\_LENGTH() is a synonym for LENGTH(). 
## Analyze Syntax

```python
func.octet_length()
```

## Analyze Examples

```python
func.octet_length('databend')

┌───────────────────────────────┐
│ func.octet_length('databend') │
├───────────────────────────────┤
│ 8                             │
└───────────────────────────────┘
```

## SQL Syntax

```sql
OCTET_LENGTH()
```

## SQL Examples

```sql
SELECT OCTET_LENGTH('databend');

┌──────────────────────────┐
│ OCTET_LENGTH('databend') │
├──────────────────────────┤
│ 8                        │
└──────────────────────────┘
```

# ORD (Lakehouse v1)

> ORD — return the character code for the leftmost character of a string (ASCII value for single-byte, computed value for multibyte).

If the leftmost character is not a multibyte character, ORD() returns the same value as the ASCII() function. If the leftmost character of the string `str` is a multibyte character, ORD() returns the code for that character, calculated from the numeric values of its constituent bytes using this formula:

```sql
(1st byte code) + (2nd byte code * 256) + (3rd byte code * 256^2) ...
```

## Analyze Syntax

```python
func.ord(<str>)
```

## Analyze Examples

```python
func.ord('2')

┌───────────────┐
│ func.ord('2') │
├───────────────┤
│ 50            │
└───────────────┘
```

## SQL Syntax

```sql
ORD(<str>)
```

## Arguments

| Arguments | Description |
| --------- | ----------- |
| `<str>`   | The string.
| ## Return Type [Section titled “Return Type”](#return-type) `BIGINT` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT ORD('2') ┌────────┐ │ ORD(2) │ ├────────┤ │ 50 │ └────────┘ ``` # POSITION (Lakehouse v1) > POSITION — POSITION(substr IN str) is a synonym for LOCATE(substr,str). POSITION(substr IN str) is a synonym for LOCATE(substr,str). Returns the position of the first occurrence of substring substr in string str. Returns 0 if substr is not in str. Returns NULL if any argument is NULL. ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.position(, ) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.position('bar', 'foobarbar') ┌───────────────────────────────────┐ │ func.position('bar', 'foobarbar') │ ├───────────────────────────────────┤ │ 4 │ └───────────────────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql POSITION( IN ) ``` ## Arguments [Section titled “Arguments”](#arguments) | Arguments | Description | | ---------- | -------------- | | `` | The substring. | | `` | The string. | ## Return Type [Section titled “Return Type”](#return-type) `BIGINT` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT POSITION('bar' IN 'foobarbar') ┌────────────────────────────┐ │ POSITION('bar' IN 'foobarbar') │ ├────────────────────────────┤ │ 4 │ └────────────────────────────┘ SELECT POSITION('xbar' IN 'foobar') ┌──────────────────────────┐ │ POSITION('xbar' IN 'foobar') │ ├──────────────────────────┤ │ 0 │ └──────────────────────────┘ ``` # QUOTE (Lakehouse v1) > QUOTE — quotes a string to produce a result that can be used as a properly escaped data value in an SQL statement. Quotes a string to produce a result that can be used as a properly escaped data value in an SQL statement. 
## Analyze Syntax

```python
func.quote(<str>)
```

## Analyze Examples

```python
func.quote('Don\'t!')

┌───────────────────────┐
│ func.quote('Don\'t!') │
├───────────────────────┤
│ Don\'t!               │
└───────────────────────┘
```

## SQL Syntax

```sql
QUOTE(<str>)
```

## SQL Examples

```sql
SELECT QUOTE('Don\'t!');

┌─────────────────┐
│ QUOTE('Don't!') │
├─────────────────┤
│ Don\'t!         │
└─────────────────┘

SELECT QUOTE(NULL);

┌─────────────┐
│ QUOTE(NULL) │
├─────────────┤
│ NULL        │
└─────────────┘
```

# REGEXP (Lakehouse v1)

> REGEXP — returns true if the string matches the regular expression specified by the pattern, false otherwise.

Returns `true` if the string `<expr>` matches the regular expression specified by the `<pattern>`, `false` otherwise.

## Analyze Syntax

```python
<expr>.regexp_match(<pattern>)
```

## Analyze Examples

```python
# With an input table of:

┌─────────────────┐
│ my_clothes      │
├─────────────────┤
│ plaid pants     │
│ plaid hat       │
│ plaid shirt     │
│ shoes           │
└─────────────────┘

my_clothes.regexp_match('p*')

┌───────────────────────────────┐
│ my_clothes.regexp_match('p*') │
├───────────────────────────────┤
│ true                          │
│ true                          │
│ true                          │
│ false                         │
└───────────────────────────────┘
```

## SQL Syntax

```sql
<expr> REGEXP <pattern>
```

## Aliases

* [RLIKE](../rlike)

## SQL Examples

```sql
SELECT 'databend' REGEXP 'd*', 'databend' RLIKE 'd*';

┌────────────────────────────────────────────────────┐
│ ('databend' regexp 'd*') │ ('databend' rlike 'd*') │
├──────────────────────────┼─────────────────────────┤
│ true                     │ true                    │
└────────────────────────────────────────────────────┘
```

# REGEXP_INSTR (Lakehouse v1) >
REGEXP_INSTR — returns the starting index of the substring of the string expr that matches the regular expression specified by the pattern pat, 0 if there is. Returns the starting index of the substring of the string `expr` that matches the regular expression specified by the pattern `pat`, `0` if there is no match. If `expr` or `pat` is NULL, the return value is NULL. Character indexes begin at `1`. ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.regexp_instr(, ) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.regexp_instr('dog cat dog', 'dog') ┌─────────────────────────────────────────┐ │ func.regexp_instr('dog cat dog', 'dog') │ ├─────────────────────────────────────────┤ │ 1 │ └─────────────────────────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql REGEXP_INSTR(, ) ``` ## Arguments [Section titled “Arguments”](#arguments) | Arguments | Description | | -------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | | expr | The string expr that to be matched | | pat | The regular expression | | pos | Optional. The position in expr at which to start the search. If omitted, the default is 1. | | occurrence | Optional. Which occurrence of a match to search for. If omitted, the default is 1. | | return\_option | Optional. Which type of position to return. If this value is 0, REGEXP\_INSTR() returns the position of the matched substring’s first character. If this value is 1, REGEXP\_INSTR() returns the position following the matched substring. If omitted, the default is 0. | | match\_type | Optional. A string that specifies how to perform matching. The meaning is as described for REGEXP\_LIKE(). 
| ## Return Type [Section titled “Return Type”](#return-type) A number data type value. ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT REGEXP_INSTR('dog cat dog', 'dog'); ┌────────────────────────────────────┐ │ REGEXP_INSTR('dog cat dog', 'dog') │ ├────────────────────────────────────┤ │ 1 │ └────────────────────────────────────┘ SELECT REGEXP_INSTR('dog cat dog', 'dog', 2); ┌───────────────────────────────────────┐ │ REGEXP_INSTR('dog cat dog', 'dog', 2) │ ├───────────────────────────────────────┤ │ 9 │ └───────────────────────────────────────┘ SELECT REGEXP_INSTR('aa aaa aaaa', 'a{2}'); ┌─────────────────────────────────────┐ │ REGEXP_INSTR('aa aaa aaaa', 'a{2}') │ ├─────────────────────────────────────┤ │ 1 │ └─────────────────────────────────────┘ SELECT REGEXP_INSTR('aa aaa aaaa', 'a{4}'); ┌─────────────────────────────────────┐ │ REGEXP_INSTR('aa aaa aaaa', 'a{4}') │ ├─────────────────────────────────────┤ │ 8 │ └─────────────────────────────────────┘ ``` # REGEXP_LIKE (Lakehouse v1) > REGEXP_LIKE — rEGEXP_LIKE function is used to check that whether the string matches the regular. REGEXP\_LIKE function is used to check that whether the string matches the regular expression. ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.regexp_like(, ) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.regexp_like('a', '^[a-d]') ┌─────────────────────────────────┐ │ func.regexp_like('a', '^[a-d]') │ ├─────────────────────────────────┤ │ 1 │ └─────────────────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql REGEXP_LIKE(, ) ``` ## Arguments [Section titled “Arguments”](#arguments) | Arguments | Description | | -------------- | ---------------------------------------------------------------------------------- | | `` | The string expr that to be matched | | `` | The regular expression | | `[match_type]` | Optional. 
The `match_type` argument is a string that specifies how to perform matching. |

`match_type` may contain any or all of the following characters:

* `c`: Case-sensitive matching.
* `i`: Case-insensitive matching.
* `m`: Multiple-line mode. Recognize line terminators within the string. The default behavior is to match line terminators only at the start and end of the string expression.
* `n`: The `.` character matches line terminators. The default is for `.` matching to stop at the end of a line.
* `u`: Unix-only line endings. Not currently supported.

## Return Type

`BIGINT`

Returns `1` if the string `expr` matches the regular expression specified by the pattern `pat`, `0` otherwise. If `expr` or `pat` is NULL, the return value is NULL.

## SQL Examples

```sql
SELECT REGEXP_LIKE('a', '^[a-d]');

┌────────────────────────────┐
│ REGEXP_LIKE('a', '^[a-d]') │
├────────────────────────────┤
│ 1                          │
└────────────────────────────┘

SELECT REGEXP_LIKE('abc', 'ABC');

┌───────────────────────────┐
│ REGEXP_LIKE('abc', 'ABC') │
├───────────────────────────┤
│ 1                         │
└───────────────────────────┘

SELECT REGEXP_LIKE('abc', 'ABC', 'c');

┌────────────────────────────────┐
│ REGEXP_LIKE('abc', 'ABC', 'c') │
├────────────────────────────────┤
│ 0                              │
└────────────────────────────────┘

SELECT REGEXP_LIKE('new*\n*line', 'new\\*.\\*line');

┌────────────────────────────────────────────┐
│ REGEXP_LIKE('new*\n*line', 'new\*.\*line') │
├────────────────────────────────────────────┤
│ 0                                          │
└────────────────────────────────────────────┘

SELECT REGEXP_LIKE('new*\n*line', 'new\\*.\\*line', 'n');

┌─────────────────────────────────────────────────┐
│ REGEXP_LIKE('new*\n*line', 'new\*.\*line', 'n') │
├─────────────────────────────────────────────────┤
│ 1                                               │
└─────────────────────────────────────────────────┘
```

# REGEXP_REPLACE (Lakehouse v1)

> REGEXP_REPLACE — replaces occurrences in the string expr that match the regular expression.
Replaces occurrences in the string `expr` that match the regular expression specified by the pattern `pat` with the replacement string `repl`, and returns the resulting string. If `expr`, `pat`, or `repl` is NULL, the return value is NULL. ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.regexp_replace(, , ) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.regexp_replace('a b c', 'b', 'X') ┌────────────────────────────────────────┐ │ func.regexp_replace('a b c', 'b', 'X') │ ├────────────────────────────────────────┤ │ a X c │ └────────────────────────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql REGEXP_REPLACE(, , ) ``` ## Arguments [Section titled “Arguments”](#arguments) | Arguments | Description | | ----------- | ----------------------------------------------------------------------------------------------------------------------- | | expr | The string expr that to be matched | | pat | The regular expression | | repl | The replacement string | | pos | Optional. The position in expr at which to start the search. If omitted, the default is 1. | | occurrence | Optional. Which occurrence of a match to replace. If omitted, the default is 0 (which means “replace all occurrences”). | | match\_type | Optional. A string that specifies how to perform matching. The meaning is as described for REGEXP\_LIKE(). 
| ## Return Type [Section titled “Return Type”](#return-type) `VARCHAR` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT REGEXP_REPLACE('a b c', 'b', 'X'); ┌───────────────────────────────────┐ │ REGEXP_REPLACE('a b c', 'b', 'X') │ ├───────────────────────────────────┤ │ a X c │ └───────────────────────────────────┘ SELECT REGEXP_REPLACE('abc def ghi', '[a-z]+', 'X', 1, 3); ┌────────────────────────────────────────────────────┐ │ REGEXP_REPLACE('abc def ghi', '[a-z]+', 'X', 1, 3) │ ├────────────────────────────────────────────────────┤ │ abc def X │ └────────────────────────────────────────────────────┘ SELECT REGEXP_REPLACE('周 周周 周周周', '周+', 'X', 3, 2); ┌───────────────────────────────────────────────────────────┐ │ REGEXP_REPLACE('周 周周 周周周', '周+', 'X', 3, 2) │ ├───────────────────────────────────────────────────────────┤ │ 周 周周 X │ └───────────────────────────────────────────────────────────┘ ``` # REGEXP_SUBSTR (Lakehouse v1) > REGEXP_SUBSTR — returns the substring of the string expr that matches the regular expression. Returns the substring of the string `expr` that matches the regular expression specified by the pattern `pat`, NULL if there is no match. If expr or pat is NULL, the return value is NULL. 
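The way the optional `pos` and `occurrence` arguments (listed in the Arguments table) select a match can be sketched with Python's `re` module. Python's regex dialect is not guaranteed to match the engine's, the helper name `regexp_substr` is hypothetical, and returning NULL for a `pos` or `occurrence` below 1 is an assumption.

```python
import re
from typing import Optional


def regexp_substr(expr: Optional[str], pat: Optional[str],
                  pos: int = 1, occurrence: int = 1) -> Optional[str]:
    """Return the Nth match of pat in expr, searching from 1-based pos."""
    # NULL in, NULL out.
    if expr is None or pat is None:
        return None
    # Assumption: out-of-range pos or occurrence yields no match.
    if pos < 1 or occurrence < 1:
        return None
    # Pattern.finditer accepts a 0-based start offset.
    matches = re.compile(pat).finditer(expr, pos - 1)
    for n, match in enumerate(matches, start=1):
        if n == occurrence:
            return match.group(0)
    return None  # fewer than `occurrence` matches


print(regexp_substr('abc def ghi', '[a-z]+'))        # abc
print(regexp_substr('abc def ghi', '[a-z]+', 1, 3))  # ghi
print(regexp_substr('abc def ghi', '[0-9]+'))        # None
```

The second call walks past `abc` and `def` and returns the third match, which mirrors the `REGEXP_SUBSTR('abc def ghi', '[a-z]+', 1, 3)` example in the SQL Examples for this function.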
## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.regexp_substr(, ) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.regexp_substr('abc def ghi', '[a-z]+') ┌─────────────────────────────────────────────┐ │ func.regexp_substr('abc def ghi', '[a-z]+') │ ├─────────────────────────────────────────────┤ │ abc │ └─────────────────────────────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql REGEXP_SUBSTR(, ) ``` ## Arguments [Section titled “Arguments”](#arguments) | Arguments | Description | | ----------- | ---------------------------------------------------------------------------------------------------------- | | expr | The string expr that to be matched | | pat | The regular expression | | pos | Optional. The position in expr at which to start the search. If omitted, the default is 1. | | occurrence | Optional. Which occurrence of a match to search for. If omitted, the default is 1. | | match\_type | Optional. A string that specifies how to perform matching. The meaning is as described for REGEXP\_LIKE(). 
| ## Return Type [Section titled “Return Type”](#return-type) `VARCHAR` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT REGEXP_SUBSTR('abc def ghi', '[a-z]+'); ┌────────────────────────────────────────┐ │ REGEXP_SUBSTR('abc def ghi', '[a-z]+') │ ├────────────────────────────────────────┤ │ abc │ └────────────────────────────────────────┘ SELECT REGEXP_SUBSTR('abc def ghi', '[a-z]+', 1, 3); ┌──────────────────────────────────────────────┐ │ REGEXP_SUBSTR('abc def ghi', '[a-z]+', 1, 3) │ ├──────────────────────────────────────────────┤ │ ghi │ └──────────────────────────────────────────────┘ SELECT REGEXP_SUBSTR('周 周周 周周周 周周周周', '周+', 2, 3); ┌──────────────────────────────────────────────────────────────────┐ │ REGEXP_SUBSTR('周 周周 周周周 周周周周', '周+', 2, 3) │ ├──────────────────────────────────────────────────────────────────┤ │ 周周周周 │ └──────────────────────────────────────────────────────────────────┘ ``` # REPEAT (Lakehouse v1) > REPEAT — returns a string consisting of the string str repeated count times. Returns a string consisting of the string str repeated count times. If count is less than 1, returns an empty string. Returns NULL if str or count is NULL. ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.repeat(<str>, <count>) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.repeat('plaid', 3) ┌─────────────────────────┐ │ func.repeat('plaid', 3) │ ├─────────────────────────┤ │ plaidplaidplaid │ └─────────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql REPEAT(<str>, <count>) ``` ## Arguments [Section titled “Arguments”](#arguments) | Arguments | Description | | --------- | ----------- | | `<str>` | The string. | | `<count>` | The number. 
| ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT REPEAT('databend', 3); ┌──────────────────────────┐ │ REPEAT('databend', 3) │ ├──────────────────────────┤ │ databenddatabenddatabend │ └──────────────────────────┘ SELECT REPEAT('databend', 0); ┌───────────────────────┐ │ REPEAT('databend', 0) │ ├───────────────────────┤ │ │ └───────────────────────┘ SELECT REPEAT('databend', NULL); ┌──────────────────────────┐ │ REPEAT('databend', NULL) │ ├──────────────────────────┤ │ NULL │ └──────────────────────────┘ ``` # REPLACE (Lakehouse v1) > REPLACE — returns the string str with all occurrences of the string from_str replaced by the string to_str. Returns the string str with all occurrences of the string from\_str replaced by the string to\_str. ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.replace(<str>, <from_str>, <to_str>) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.replace('plaidCloud', 'p', 'P') ┌──────────────────────────────────────┐ │ func.replace('plaidCloud', 'p', 'P') │ ├──────────────────────────────────────┤ │ PlaidCloud │ └──────────────────────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql REPLACE(<str>, <from_str>, <to_str>) ``` ## Arguments [Section titled “Arguments”](#arguments) | Arguments | Description | | ------------ | ---------------- | | `<str>` | The string. | | `<from_str>` | The from string. | | `<to_str>` | The to string. | ## Return Type [Section titled “Return Type”](#return-type) `VARCHAR` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT REPLACE('www.mysql.com', 'w', 'Ww'); ┌─────────────────────────────────────┐ │ REPLACE('www.mysql.com', 'w', 'Ww') │ ├─────────────────────────────────────┤ │ WwWwWw.mysql.com │ └─────────────────────────────────────┘ ``` # REVERSE (Lakehouse v1) > REVERSE — returns the string str with the order of the characters reversed. Returns the string str with the order of the characters reversed. 
## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.reverse(<str>) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.reverse('abc') ┌─────────────────────┐ │ func.reverse('abc') │ ├─────────────────────┤ │ cba │ └─────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql REVERSE(<str>) ``` ## Arguments [Section titled “Arguments”](#arguments) | Arguments | Description | | --------- | ----------------- | | `<str>` | The string value. | ## Return Type [Section titled “Return Type”](#return-type) `VARCHAR` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT REVERSE('abc'); ┌────────────────┐ │ REVERSE('abc') │ ├────────────────┤ │ cba │ └────────────────┘ ``` # RIGHT (Lakehouse v1) > RIGHT — returns the rightmost len characters from the string str, or NULL if any argument is NULL. Returns the rightmost len characters from the string str, or NULL if any argument is NULL. 
## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.right(<str>, <len>) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.right('foobarbar', 4) ┌────────────────────────────┐ │ func.right('foobarbar', 4) │ ├────────────────────────────┤ │ rbar │ └────────────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql RIGHT(<str>, <len>); ``` ## Arguments [Section titled “Arguments”](#arguments) | Arguments | Description | | --------- | -------------------------------------------------------- | | `<str>` | The string from which the characters are extracted | | `<len>` | The number of characters to return | ## Return Type [Section titled “Return Type”](#return-type) `VARCHAR` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT RIGHT('foobarbar', 4); ┌───────────────────────┐ │ RIGHT('foobarbar', 4) │ ├───────────────────────┤ │ rbar │ └───────────────────────┘ ``` # RLIKE (Lakehouse v1) > RLIKE — alias for the REGEXP string function. Alias for [REGEXP](../regexp). # RPAD (Lakehouse v1) > RPAD — returns the string str, right-padded with the string padstr to a length of len characters. Returns the string str, right-padded with the string padstr to a length of len characters. If str is longer than len, the return value is shortened to len characters. ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.rpad(<str>, <len>, <padstr>) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.rpad('hi',5,'?') ┌───────────────────────┐ │ func.rpad('hi',5,'?') │ ├───────────────────────┤ │ hi??? 
│ └───────────────────────┘ func.rpad('hi',1,'?') ┌───────────────────────┐ │ func.rpad('hi',1,'?') │ ├───────────────────────┤ │ h │ └───────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql RPAD(, , ) ``` ## Arguments [Section titled “Arguments”](#arguments) | Arguments | Description | | ---------- | --------------- | | `` | The string. | | `` | The length. | | `` | The pad string. | ## Return Type [Section titled “Return Type”](#return-type) `VARCHAR` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT RPAD('hi',5,'?'); ┌────────────────────┐ │ RPAD('hi', 5, '?') │ ├────────────────────┤ │ hi??? │ └────────────────────┘ SELECT RPAD('hi',1,'?'); ┌────────────────────┐ │ RPAD('hi', 1, '?') │ ├────────────────────┤ │ h │ └────────────────────┘ ``` # RTRIM (Lakehouse v1) > RTRIM — removes all occurrences of any character present in the specified trim string from the right side of the string. Removes all occurrences of any character present in the specified trim string from the right side of the string. 
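The key point in this description is that the trim string is treated as a set of characters, not as a single substring. Python's built-in `str.rstrip` follows the same idea, so it makes a convenient stand-in for experimenting. The helper name `rtrim` below is only illustrative, and the sketch assumes the Lakehouse function matches `rstrip` for ordinary strings.

```python
def rtrim(value, trim_chars):
    """Strip trailing characters that appear anywhere in trim_chars."""
    # str.rstrip treats its argument as a set of characters, so 'xy'
    # removes trailing 'x' and 'y' in any order and any quantity.
    return value.rstrip(trim_chars)


print(rtrim('databendxx', 'x'))
print(rtrim('databendyx', 'xy'))
```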
See also: * [TRIM\_TRAILING](../trim-trailing) * [LTRIM](../ltrim) ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.rtrim(, ) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.rtrim('databendxx', 'x') ┌────────────────────────────────┐ │ func.rtrim('databendxx', 'x') │ ├────────────────────────────────┤ │ databend │ └────────────────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql RTRIM(, ) ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT RTRIM('databendxx', 'x'), RTRIM('databendxx', 'xy'); ┌──────────────────────────────────────────────────────┐ │ rtrim('databendxx', 'x') │ rtrim('databendxx', 'xy') │ ├──────────────────────────┼───────────────────────────┤ │ databend │ databend │ └──────────────────────────────────────────────────────┘ ``` # SOUNDEX (Lakehouse v1) > SOUNDEX — Generates the Soundex code for a string. Generates the Soundex code for a string. * A Soundex code consists of a letter followed by three numerical digits. PlaidCloud Lakehouse’s implementation returns more than 4 digits, but you can [SUBSTR](../substr) the result to get a standard Soundex code. * All non-alphabetic characters in the string are ignored. * All international alphabetic characters outside the A-Z range are ignored unless they’re the first letter. Note What is Soundex? Soundex converts an alphanumeric string to a four-character code that is based on how the string sounds when spoken in English. 
For more information, see See also: [SOUNDS LIKE](../soundslike) ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.soundex() ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.soundex('PlaidCloud Lakehouse') ┌──────────────────────────────────────┐ │ func.soundex('PlaidCloud Lakehouse') │ ├──────────────────────────────────────┤ │ D153 │ └──────────────────────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql SOUNDEX() ``` ## Arguments [Section titled “Arguments”](#arguments) | Arguments | Description | | --------- | ----------- | | str | The string. | ## Return Type [Section titled “Return Type”](#return-type) Returns a code of type VARCHAR or a NULL value. ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT SOUNDEX('PlaidCloud Lakehouse'); --- D153 -- All non-alphabetic characters in the string are ignored. SELECT SOUNDEX('PlaidCloud Lakehouse!'); --- D153 -- All international alphabetic characters outside the A-Z range are ignored unless they're the first letter. SELECT SOUNDEX('PlaidCloud Lakehouse,你好'); --- D153 SELECT SOUNDEX('你好,PlaidCloud Lakehouse'); --- 你3153 -- SUBSTR the result to get a standard Soundex code. SELECT SOUNDEX('databend cloud'),SUBSTR(SOUNDEX('databend cloud'),1,4); soundex('databend cloud')|substring(soundex('databend cloud') from 1 for 4)| -------------------------+-------------------------------------------------+ D153243 |D153 | SELECT SOUNDEX(NULL); ┌─────────────────────────────────────┐ │ `SOUNDEX(NULL)` │ ├─────────────────────────────────────┤ │ │ └─────────────────────────────────────┘ ``` # SOUNDS LIKE (Lakehouse v1) > SOUNDS LIKE — compares the pronunciation of two strings by their Soundex codes. Compares the pronunciation of two strings by their Soundex codes. 
Soundex is a phonetic algorithm that produces a code representing the pronunciation of a string, allowing for approximate matching of strings based on their pronunciation rather than their spelling. PlaidCloud Lakehouse offers the [SOUNDEX](../soundex) function that allows you to get the Soundex code from a string. SOUNDS LIKE is frequently employed in the WHERE clause of SQL queries to narrow down rows using fuzzy string matching, such as for names and addresses; see [Filtering Rows](#filtering-rows) in [Examples](#examples). Note While the function can be useful for approximate string matching, it is important to note that it is not always accurate. The Soundex algorithm is based on English pronunciation rules and may not work well for strings from other languages or dialects. ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.sounds_like(<str1>, <str2>) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.sounds_like('Monday', 'Sunday') ┌──────────────────────────────────────┐ │ func.sounds_like('Monday', 'Sunday') │ ├──────────────────────────────────────┤ │ 0 │ └──────────────────────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql <str1> SOUNDS LIKE <str2> ``` ## Arguments [Section titled “Arguments”](#arguments) | Arguments | Description | | ---------- | ------------------------ | | str1, str2 | The strings you compare. | ## Return Type [Section titled “Return Type”](#return-type) Returns a Boolean value of 1 if the Soundex codes for the two strings are the same (which means they sound alike) and 0 otherwise. 
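To make the comparison concrete, here is a minimal sketch of the classic four-character American Soundex together with a `sounds_like` helper built on it. Both function names are illustrative. The sketch deliberately implements the textbook algorithm, which is an assumption: the Lakehouse SOUNDEX variant returns longer codes, so individual codes may differ even where the equal or not-equal outcome agrees.

```python
# Letter groups of classic Soundex; vowels, H, W and Y carry no code.
_GROUPS = {"1": "BFPV", "2": "CGJKQSXZ", "3": "DT", "4": "L", "5": "MN", "6": "R"}
_CODES = {letter: digit for digit, letters in _GROUPS.items() for letter in letters}


def soundex(text):
    """Classic 4-character Soundex (not the longer Lakehouse variant)."""
    letters = [c for c in text.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    result = [letters[0]]
    previous = _CODES.get(letters[0], "")
    for letter in letters[1:]:
        if letter in "HW":
            continue  # H and W do not separate letters with the same code
        code = _CODES.get(letter, "")
        if code and code != previous:
            result.append(code)
        previous = code  # a vowel resets the previous code
    return ("".join(result) + "000")[:4]


def sounds_like(left, right):
    """Return 1 when both strings share a Soundex code, else 0."""
    return 1 if soundex(left) == soundex(right) else 0


print(sounds_like("two", "too"))
print(sounds_like("Monday", "Sunday"))
```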
## SQL Examples [Section titled “SQL Examples”](#sql-examples) ### Comparing Strings [Section titled “Comparing Strings”](#comparing-strings) ```sql SELECT 'two' SOUNDS LIKE 'too' ---- 1 SELECT CONCAT('A', 'B') SOUNDS LIKE 'AB'; ---- 1 SELECT 'Monday' SOUNDS LIKE 'Sunday'; ---- 0 ``` ### Filtering Rows [Section titled “Filtering Rows”](#filtering-rows) ```sql SELECT * FROM employees; id|first_name|last_name|age| --+----------+---------+---+ 0|John |Smith | 35| 0|Mark |Smythe | 28| 0|Johann |Schmidt | 51| 0|Eric |Doe | 30| 0|Sue |Johnson | 45| SELECT * FROM employees WHERE first_name SOUNDS LIKE 'John'; id|first_name|last_name|age| --+----------+---------+---+ 0|John |Smith | 35| 0|Johann |Schmidt | 51| ``` # SPACE (Lakehouse v1) > SPACE — Returns a string consisting of N blank space characters. Returns a string consisting of N blank space characters. ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.space() ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.space(20) ┌─────────────────┐ │ func.space(20) │ ├─────────────────┤ │ │ └─────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql SPACE(); ``` ## Arguments [Section titled “Arguments”](#arguments) | Arguments | Description | | --------- | -------------------- | | `` | The number of spaces | ## Return Type [Section titled “Return Type”](#return-type) String data type value. ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT SPACE(20) ┌──────────────────────┐ │ SPACE(20) │ ├──────────────────────┤ │ │ └──────────────────────┘ ``` # SPLIT (Lakehouse v1) > SPLIT — splits a string using a specified delimiter and returns the resulting parts as an array. Splits a string using a specified delimiter and returns the resulting parts as an array. 
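The behavior documented on this page (a NULL input gives NULL, and an empty or absent delimiter returns the whole string as a single element) can be mimicked in a few lines of Python. This is a sketch only: the helper name `split` is illustrative, and Python's `None` stands in for SQL NULL.

```python
def split(value, delimiter):
    """Mimic the documented SPLIT behavior with plain Python strings."""
    if value is None or delimiter is None:
        return None  # NULL input or delimiter gives NULL
    if delimiter == "":
        # str.split("") raises ValueError, so handle the empty delimiter
        # explicitly and return the whole string as one part.
        return [value]
    return value.split(delimiter)


print(split("PlaidCloud Lakehouse", " "))
print(split("PlaidCloud Lakehouse", ""))
```

The explicit empty-delimiter branch is the one place where Python's own `str.split` and the documented SQL behavior diverge.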
See also: [SPLIT\_PART](../split-part) ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.split('', '') ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.split('PlaidCloud Lakehouse', ' ') ┌─────────────────────────────────────────┐ │ func.split('PlaidCloud Lakehouse', ' ') │ ├─────────────────────────────────────────┤ │ ['PlaidCloud','Lakehouse'] │ └─────────────────────────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql SPLIT('', '') ``` ## Return Type [Section titled “Return Type”](#return-type) Array of strings. SPLIT returns NULL when either the input string or the delimiter is NULL. ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql -- Use a space as the delimiter -- SPLIT returns an array with two parts. SELECT SPLIT('PlaidCloud Lakehouse', ' '); split('PlaidCloud Lakehouse', ' ')| ----------------------------------+ ['PlaidCloud','Lakehouse'] | -- Use an empty string as the delimiter or a delimiter that does not exist in the input string -- SPLIT returns an array containing the entire input string as a single part. SELECT SPLIT('PlaidCloud Lakehouse', ''); split('PlaidCloud Lakehouse', '')| ----------------------------------+ ['PlaidCloud Lakehouse'] | SELECT SPLIT('PlaidCloud Lakehouse', ','); split('PlaidCloud Lakehouse', ',')| ----------------------------------+ ['PlaidCloud Lakehouse'] | -- Use '\t' (tab) as the delimiter -- SPLIT returns an array with timestamp, log level, and message. SELECT SPLIT('2023-10-19 15:30:45\tINFO\tLog message goes here', '\t'); split('2023-10-19 15:30:45\tinfo\tlog message goes here', '\t')| ---------------------------------------------------------------+ ['2023-10-19 15:30:45','INFO','Log message goes here'] | ``` # SPLIT_PART (Lakehouse v1) > SPLIT_PART — splits a string using a specified delimiter and returns the specified part. Splits a string using a specified delimiter and returns the specified part. 
See also: [SPLIT](../split) ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.split_part('', '', '') ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.split_part('PlaidCloud Lakehouse', ' ', 1) ┌─────────────────────────────────────────────────┐ │ func.split_part('PlaidCloud Lakehouse', ' ', 1) │ ├─────────────────────────────────────────────────┤ │ PlaidCloud │ └─────────────────────────────────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql SPLIT_PART('', '', '') ``` The *position* argument specifies which part to return. It uses a 1-based index but can also accept positive, negative, or zero values: * If *position* is a positive number, it returns the part at the position from the left to the right, or NULL if it doesn’t exist. * If *position* is a negative number, it returns the part at the position from the right to the left, or NULL if it doesn’t exist. * If *position* is 0, it is treated as 1, effectively returning the first part of the string. ## Return Type [Section titled “Return Type”](#return-type) String. SPLIT\_PART returns NULL when the input string, the delimiter, or the position is NULL. ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql -- Use a space as the delimiter -- SPLIT_PART returns a specific part. SELECT SPLIT_PART('PlaidCloud Lakehouse', ' ', 1); split_part('PlaidCloud Lakehouse', ' ', 1)| ------------------------------------------+ PlaidCloud | -- Use an empty string as the delimiter or a delimiter that does not exist in the input string -- SPLIT_PART returns the entire input string. 
SELECT SPLIT_PART('PlaidCloud Lakehouse', '', 1); split_part('PlaidCloud Lakehouse', '', 1)| -----------------------------------+ PlaidCloud Lakehouse | SELECT SPLIT_PART('PlaidCloud Lakehouse', ',', 1); split_part('PlaidCloud Lakehouse', ',', 1)| ------------------------------------+ PlaidCloud Lakehouse | -- Use '\t' (tab) as the delimiter -- SPLIT_PART returns individual fields. SELECT SPLIT_PART('2023-10-19 15:30:45\tINFO\tLog message goes here', '\t', 3); split_part('2023-10-19 15:30:45\tinfo\tlog message goes here', '\t', 3)| --------------------------------------------------------------------------+ Log message goes here | -- SPLIT_PART returns an empty string as the specified part does not exist at all. SELECT SPLIT_PART('2023-10-19 15:30:45\tINFO\tLog message goes here', '\t', 4); split_part('2023-10-19 15:30:45\tinfo\tlog message goes here', '\t', 4)| --------------------------------------------------------------------------+ | ``` # STRCMP (Lakehouse v1) > STRCMP — returns 0 if the strings are the same, -1 if the first argument is smaller than the second, and 1 otherwise. Returns 0 if the strings are the same, -1 if the first argument is smaller than the second, and 1 otherwise. 
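The three-way result can be reproduced in Python with a common comparison idiom. This is a sketch under one assumption worth stating: Python compares strings by Unicode code point, which may not match the ordering the Lakehouse applies to every pair of strings. The helper name `strcmp` is illustrative.

```python
def strcmp(left, right):
    """Return -1, 0, or 1 depending on how left sorts relative to right."""
    # Booleans are integers in Python, so the difference of the two
    # comparisons yields exactly -1, 0, or 1.
    return (left > right) - (left < right)


print(strcmp('text', 'text2'))
print(strcmp('text2', 'text'))
print(strcmp('text', 'text'))
```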
## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.strcmp( ,) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.strcmp('text', 'text2') ┌──────────────────────────────┐ │ func.strcmp('text', 'text2') │ ├──────────────────────────────┤ │ -1 │ └──────────────────────────────┘ func.strcmp('text2', 'text') ┌──────────────────────────────┐ │ func.strcmp('text2', 'text') │ ├──────────────────────────────┤ │ 1 │ └──────────────────────────────┘ func.strcmp('text', 'text') ┌──────────────────────────────┐ │ func.strcmp('text', 'text') │ ├──────────────────────────────┤ │ 0 │ └──────────────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql STRCMP( ,) ``` ## Arguments [Section titled “Arguments”](#arguments) | Arguments | Description | | --------- | ----------- | | `` | The string. | | `` | The string. | ## Return Type [Section titled “Return Type”](#return-type) `BIGINT` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT STRCMP('text', 'text2'); ┌─────────────────────────┐ │ STRCMP('text', 'text2') │ ├─────────────────────────┤ │ -1 │ └─────────────────────────┘ SELECT STRCMP('text2', 'text'); ┌─────────────────────────┐ │ STRCMP('text2', 'text') │ ├─────────────────────────┤ │ 1 │ └─────────────────────────┘ SELECT STRCMP('text', 'text'); ┌────────────────────────┐ │ STRCMP('text', 'text') │ ├────────────────────────┤ │ 0 │ └────────────────────────┘ ``` # SUBSTR (Lakehouse v1) > SUBSTR — extracts a string containing a specific number of characters from a particular position. Extracts a string containing a specific number of characters from a particular position of a given string. * The forms without a `len` argument return a substring from string `str` starting at position `pos`. * The forms with a `len` argument return a substring `len` characters long from string `str`, starting at position `pos`. 
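The two forms, together with the 1-based, negative, and zero `pos` rules this page describes, can be sketched in Python as follows. The helper name `substr` is illustrative. Returning an empty string when `pos` points outside the string is an assumption made for the sketch, since this page does not state what happens in that case.

```python
def substr(value, pos, length=None):
    """Mimic SUBSTR with a 1-based pos; a negative pos counts from the end."""
    if pos == 0:
        return ""  # documented: a pos of 0 returns an empty string
    start = pos - 1 if pos > 0 else len(value) + pos
    if start < 0 or start >= len(value):
        return ""  # assumption: out-of-range positions yield an empty string
    if length is None:
        return value[start:]
    return value[start:start + max(length, 0)]


print(substr('Quadratically', 5))
print(substr('Quadratically', 5, 6))
print(substr('Quadratically', -3))
```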
A negative value can also be used for `pos`, in any form of this function. In that case, the substring begins `pos` characters from the end of the string rather than from the beginning. A value of 0 for `pos` returns an empty string. Positions are 1-based: the first character of the string is at position 1. ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.substr(<str>, <pos>, <len>) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.substr('Quadratically', 5, 6) ┌────────────────────────────────────┐ │ func.substr('Quadratically', 5, 6) │ ├────────────────────────────────────┤ │ ratica │ └────────────────────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql SUBSTR(<str>, <pos>) SUBSTR(<str>, <pos>, <len>) ``` ## Arguments [Section titled “Arguments”](#arguments) | Arguments | Description | | --------- | ------------------------------------------------------------------------------------------ | | `<str>` | The string from which the characters are extracted | | `<pos>` | The position (starting from 1) at which the substring starts. 
If negative, counts from the end | | `` | The maximum length of the substring to extract | ## Aliases [Section titled “Aliases”](#aliases) * [SUBSTRING](../substring) * [MID](../mid) ## Return Type [Section titled “Return Type”](#return-type) VARCHAR ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT SUBSTRING('Quadratically', 5), SUBSTR('Quadratically', 5), MID('Quadratically', 5); ┌─────────────────────────────────────────────────────────────────────────────────────────────────┐ │ substring('quadratically' from 5) │ substring('quadratically' from 5) │ mid('quadratically', 5) │ ├───────────────────────────────────┼───────────────────────────────────┼─────────────────────────┤ │ ratically │ ratically │ ratically │ └─────────────────────────────────────────────────────────────────────────────────────────────────┘ SELECT SUBSTRING('Quadratically', 5, 6), SUBSTR('Quadratically', 5, 6), MID('Quadratically', 5, 6); ┌────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐ │ substring('quadratically' from 5 for 6) │ substring('quadratically' from 5 for 6) │ mid('quadratically', 5, 6) │ ├─────────────────────────────────────────┼─────────────────────────────────────────┼────────────────────────────┤ │ ratica │ ratica │ ratica │ └────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘ ``` # SUBSTRING (Lakehouse v1) > SUBSTRING — alias for the SUBSTR string function. Alias for [SUBSTR](../substr). # TO_BASE64 (Lakehouse v1) > TO_BASE64 — converts the string argument to base-64 encoded form and returns the result as a character string. Converts the string argument to base-64 encoded form and returns the result as a character string. If the argument is not a string, it is converted to a string before conversion takes place. The result is NULL if the argument is NULL. 
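The encoding itself is standard Base64, so the result for a simple ASCII string can be checked with Python's `base64` module. In this sketch the helper name `to_base64` is illustrative, the UTF-8 encoding step is an assumption about how the string is turned into bytes, and `None` stands in for SQL NULL.

```python
import base64


def to_base64(value):
    """Base64-encode a value, converting non-strings to strings first."""
    if value is None:
        return None  # NULL in, NULL out
    raw = str(value).encode("utf-8")  # assumption: text is encoded as UTF-8
    return base64.b64encode(raw).decode("ascii")


print(to_base64('abc'))
print(to_base64(123))
```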
## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.to_base64() ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.to_base64('abc') ┌───────────────────────┐ │ func.to_base64('abc') │ ├───────────────────────┤ │ YWJj │ └───────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql TO_BASE64() ``` ## Arguments [Section titled “Arguments”](#arguments) | Arguments | Description | | --------- | ----------- | | `` | The value. | ## Return Type [Section titled “Return Type”](#return-type) `VARCHAR` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT TO_BASE64('abc'); ┌──────────────────┐ │ TO_BASE64('abc') │ ├──────────────────┤ │ YWJj │ └──────────────────┘ ``` # TRANSLATE (Lakehouse v1) > TRANSLATE — transforms a given string by replacing specific characters with corresponding replacements. Transforms a given string by replacing specific characters with corresponding replacements, as defined by the provided mapping. ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.translate('', '', '') ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.translate('databend', 'de', 'DE') ┌────────────────────────────────────────┐ │ func.translate('databend', 'de', 'DE') │ ├────────────────────────────────────────┤ │ DatabEnD │ └────────────────────────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql TRANSLATE('', '', '') ``` | Parameter | Description | | ------------------------- | ----------------------------------------------------------------------------------------------- | | `` | The input string to be transformed. | | `` | The string containing characters to be replaced in the input string. | | `` | The string containing replacement characters corresponding to those in ``. 
| ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql -- Replace 'd' with '$' in 'databend' SELECT TRANSLATE('databend', 'd', '$'); --- $ataben$ -- Replace 'd' with 'D' in 'databend' SELECT TRANSLATE('databend', 'd', 'D'); --- DatabenD -- Replace 'd' with 'D' and 'e' with 'E' in 'databend' SELECT TRANSLATE('databend', 'de', 'DE'); --- DatabEnD -- Remove 'd' from 'databend' SELECT TRANSLATE('databend', 'd', ''); --- ataben ``` # TRIM (Lakehouse v1) > TRIM — returns the string without leading or trailing occurrences of the specified remove string. Returns the string without leading or trailing occurrences of the specified remove string. If remove string is omitted, spaces are removed. The Analyze function automatically trims both leading and trailing spaces. ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.trim(str) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.trim(' plaidcloud ') ┌────────────────────────────────┐ │ func.trim(' plaidcloud ') │ ├────────────────────────────────┤ │ 'plaidcloud' │ └────────────────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql TRIM([{BOTH | LEADING | TRAILING} [remstr] FROM ] str) ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) Please note that ALL the examples in this section will return the string ‘databend’. The following example removes the leading and trailing string ‘xxx’ from the string ‘xxxdatabendxxx’: ```sql SELECT TRIM(BOTH 'xxx' FROM 'xxxdatabendxxx'); ``` The following example removes the leading string ‘xxx’ from the string ‘xxxdatabend’: ```sql SELECT TRIM(LEADING 'xxx' FROM 'xxxdatabend' ); ``` The following example removes the trailing string ‘xxx’ from the string ‘databendxxx’: ```sql SELECT TRIM(TRAILING 'xxx' FROM 'databendxxx' ); ``` If no remove string is specified, the function removes all leading and trailing spaces. 
The following examples remove the leading and/or trailing spaces: ```sql SELECT TRIM(' databend '); SELECT TRIM(' databend'); SELECT TRIM('databend '); ``` # TRIM_BOTH (Lakehouse v1) > TRIM_BOTH — removes all occurrences of the specified trim string from the beginning, end, or both sides of the string. Removes all occurrences of the specified trim string from the beginning, end, or both sides of the string. See also: [TRIM](../trim) ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.trim_both(, ) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.trim_both('xxdatabendxx', 'x') ┌──────────────────────────────────────┐ │ func.trim_both('xxdatabendxx', 'x') │ ├──────────────────────────────────────┤ │ databend │ └──────────────────────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql TRIM_BOTH(, ) ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT TRIM_BOTH('xxdatabendxx', 'xxx'), TRIM_BOTH('xxdatabendxx', 'xx'), TRIM_BOTH('xxdatabendxx', 'x'); ┌─────────────────────────────────────────────────────────────────────────────────────────────────────┐ │ trim_both('xxdatabendxx', 'xxx') │ trim_both('xxdatabendxx', 'xx') │ trim_both('xxdatabendxx', 'x') │ ├──────────────────────────────────┼─────────────────────────────────┼────────────────────────────────┤ │ xxdatabendxx │ databend │ databend │ └─────────────────────────────────────────────────────────────────────────────────────────────────────┘ ``` # TRIM_LEADING (Lakehouse v1) > TRIM_LEADING — removes all occurrences of the specified trim string from the beginning of the string. Removes all occurrences of the specified trim string from the beginning of the string. 
See also: * [LTRIM](../ltrim) * [TRIM\_TRAILING](../trim-trailing) ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.trim_leading(, ) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.trim_leading('xxdatabendxx', 'x') ┌──────────────────────────────────────────┐ │ func.trim_leading('xxdatabendxx', 'x') │ ├──────────────────────────────────────────┤ │ databendxx │ └──────────────────────────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql TRIM_LEADING(, ) ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT TRIM_LEADING('xxdatabend', 'xxx'), TRIM_LEADING('xxdatabend', 'xx'), TRIM_LEADING('xxdatabend', 'x'); ┌────────────────────────────────────────────────────────────────────────────────────────────────────────┐ │ trim_leading('xxdatabend', 'xxx') │ trim_leading('xxdatabend', 'xx') │ trim_leading('xxdatabend', 'x') │ ├───────────────────────────────────┼──────────────────────────────────┼─────────────────────────────────┤ │ xxdatabend │ databend │ databend │ └────────────────────────────────────────────────────────────────────────────────────────────────────────┘ ``` # TRIM_TRAILING (Lakehouse v1) > TRIM_TRAILING — removes all occurrences of the specified trim string from the end of the string. Removes all occurrences of the specified trim string from the end of the string. 
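Unlike a character-set trim, this function removes whole copies of the trim string, which is why a trim string longer than the trailing run leaves the input untouched. A minimal Python sketch of that rule follows. The helper name `trim_trailing` is illustrative, and returning the input unchanged for an empty trim string is an assumption made to keep the loop finite.

```python
def trim_trailing(value, trim):
    """Remove whole copies of trim from the end of value."""
    if not trim:
        return value  # assumption: an empty trim string changes nothing
    while value.endswith(trim):
        value = value[:-len(trim)]
    return value


print(trim_trailing('databendxx', 'xxx'))
print(trim_trailing('databendxx', 'xx'))
print(trim_trailing('databendxx', 'x'))
```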
See also: * [RTRIM](../rtrim) * [TRIM\_LEADING](../trim-leading) ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.trim_trailing(, ) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.trim_trailing('xxdatabendxx', 'x') ┌──────────────────────────────────────────┐ │ func.trim_trailing('xxdatabendxx', 'x') │ ├──────────────────────────────────────────┤ │ xxdatabend │ └──────────────────────────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql TRIM_TRAILING(, ) ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT TRIM_TRAILING('databendxx', 'xxx'), TRIM_TRAILING('databendxx', 'xx'), TRIM_TRAILING('databendxx', 'x'); ┌───────────────────────────────────────────────────────────────────────────────────────────────────────────┐ │ trim_trailing('databendxx', 'xxx') │ trim_trailing('databendxx', 'xx') │ trim_trailing('databendxx', 'x') │ ├────────────────────────────────────┼───────────────────────────────────┼──────────────────────────────────┤ │ databendxx │ databend │ databend │ └───────────────────────────────────────────────────────────────────────────────────────────────────────────┘ ``` # UCASE (Lakehouse v1) > UCASE — alias for the UPPER string function. Alias for [UPPER](../upper). # UNHEX (Lakehouse v1) > UNHEX — for a string argument str, UNHEX(str) interprets each pair of characters in the argument. For a string argument str, UNHEX(str) interprets each pair of characters in the argument as a hexadecimal number and converts it to the byte represented by the number. The return value is a binary string. 
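The pairing of hexadecimal digits into bytes is the same operation Python exposes as `bytes.fromhex`, which makes it easy to see why the binary result displays as hex while its string form reads as text. The helper name `unhex` is illustrative, and decoding the bytes as UTF-8 is an assumption standing in for a cast to a string type.

```python
def unhex(hex_string):
    """Interpret each pair of hex digits as one byte."""
    return bytes.fromhex(hex_string)


raw = unhex('6461746162656e64')
print(raw.hex().upper())    # the binary value shown as hexadecimal
print(raw.decode('utf-8'))  # the same bytes read as text
```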
## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.unhex() ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.unhex('6461746162656e64') ┌────────────────────────────────┐ │ func.unhex('6461746162656e64') │ ├────────────────────────────────┤ │ 6461746162656E64 │ └────────────────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql UNHEX() ``` ## Aliases [Section titled “Aliases”](#aliases) * [FROM\_HEX](../from-hex) ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT UNHEX('6461746162656e64') as c1, typeof(c1),UNHEX('6461746162656e64')::varchar as c2, typeof(c2), FROM_HEX('6461746162656e64'); ┌─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐ │ c1 │ typeof(c1) │ c2 | typeof(c2) | from_hex('6461746162656e64') | ├───────────────────────────┼────────────────────────|──────────────────┤───────────────────|─────────────────────────────────┤ │ 6461746162656E64 │ binary │ databend | varchar | 6461746162656E64 | └─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘ SELECT UNHEX(HEX('string')), unhex(HEX('string'))::varchar; ┌──────────────────────────────────────────────────────┐ │ unhex(hex('string')) │ unhex(hex('string'))::varchar │ ├──────────────────────┼───────────────────────────────┤ │ 737472696E67 │ string │ └──────────────────────────────────────────────────────┘ ``` # UPPER (Lakehouse v1) > UPPER — Returns a string with all characters changed to uppercase. Returns a string with all characters changed to uppercase. 
## Analyze Syntax

```python
func.upper()
```

## Analyze Examples

```python
func.upper('hello, plaidcloud lakehouse!')
┌────────────────────────────────────────────┐
│ func.upper('hello, plaidcloud lakehouse!') │
├────────────────────────────────────────────┤
│ 'HELLO, PLAIDCLOUD LAKEHOUSE!'             │
└────────────────────────────────────────────┘
```

## SQL Syntax

```sql
UPPER()
```

## Aliases

* [UCASE](../ucase)

## Return Type

VARCHAR

## SQL Examples

```sql
SELECT UPPER('hello, databend!'), UCASE('hello, databend!');
┌───────────────────────────────────────────────────────┐
│ upper('hello, databend!') │ ucase('hello, databend!') │
├───────────────────────────┼───────────────────────────┤
│ HELLO, DATABEND!          │ HELLO, DATABEND!          │
└───────────────────────────────────────────────────────┘
```

# Aggregate Functions (Lakehouse v1)

> Lakehouse v1 SQL aggregate functions: summarise rows — SUM, AVG, MIN, MAX, COUNT, and statistical aggregates.

Aggregate functions perform a calculation on a set of values and return a single result. Use them to summarize data stored in your tables.
| Function Name | What It Does |
| ------------- | ------------ |
| [ANY](aggregate-any) | Returns the first encountered non-NULL value |
| [APPROX\_COUNT\_DISTINCT](aggregate-approx-count-distinct) | Estimates the number of distinct values with HyperLogLog |
| [ARG\_MAX](aggregate-arg-max) | Finds the arg value for the maximum val value |
| [ARG\_MIN](aggregate-arg-min) | Finds the arg value for the minimum val value |
| [AVG\_IF](aggregate-avg-if) | Calculates the average for rows meeting a condition |
| [ARRAY\_AGG](aggregate-array-agg) | Converts all the values of a column to an Array |
| [AVG](aggregate-avg) | Calculates the average value of a specific column |
| [COUNT\_DISTINCT](aggregate-count-distinct) | Counts the number of distinct values in a column |
| [COUNT\_IF](aggregate-count-if) | Counts rows meeting a specified condition |
| [COUNT](aggregate-count) | Counts the number of rows that meet certain criteria |
| [COVAR\_POP](aggregate-covar-pop) | Returns the population covariance of a set of number pairs |
| [COVAR\_SAMP](aggregate-covar-samp) | Returns the sample covariance of a set of number pairs |
| [GROUP\_ARRAY\_MOVING\_AVG](aggregate-group-array-moving-avg) | Returns an array containing the moving average of the input values |
| [GROUP\_ARRAY\_MOVING\_SUM](aggregate-group-array-moving-sum) | Returns an array containing the moving sum of the input values |
| [KURTOSIS](aggregate-kurtosis) | Calculates the excess kurtosis of a set of values |
| [MAX\_IF](aggregate-max-if) | Finds the maximum value for rows meeting a condition |
| [MAX](aggregate-max) | Finds the largest value in a specific column |
| [MEDIAN](aggregate-median) | Calculates the median value of a specific column |
| [MEDIAN\_TDIGEST](aggregate-median-tdigest) | Calculates the median value of a specific column using the t-digest algorithm |
| [MIN\_IF](aggregate-min-if) | Finds the minimum value for rows meeting a condition |
| [MIN](aggregate-min) | Finds the smallest value in a specific column |
| [QUANTILE\_CONT](aggregate-quantile-cont) | Calculates the interpolated quantile for a specific column |
| [QUANTILE\_DISC](aggregate-quantile-disc) | Calculates the quantile for a specific column |
| [QUANTILE\_TDIGEST](aggregate-quantile-tdigest) | Calculates the quantile using the t-digest algorithm |
| [QUANTILE\_TDIGEST\_WEIGHTED](aggregate-quantile-tdigest-weighted) | Calculates the weighted quantile using the t-digest algorithm |
| [RETENTION](aggregate-retention) | Calculates retention for a set of events |
| [SKEWNESS](aggregate-skewness) | Calculates the skewness of a set of values |
| [STDDEV\_POP](aggregate-stddev-pop) | Calculates the population standard deviation of a column |
| [STDDEV\_SAMP](aggregate-stddev-samp) | Calculates the sample standard deviation of a column |
| [STRING\_AGG](aggregate-string-agg) | Concatenates all non-NULL values into a string, separated by the delimiter |
| [SUM\_IF](aggregate-sum-if) | Adds up the values of a specific column for rows meeting a condition |
| [SUM](aggregate-sum) | Adds up the values of a specific column |
| [WINDOW\_FUNNEL](aggregate-windowfunnel) | Analyzes user behavior in a time-ordered sequence of events |

# ANY (Lakehouse v1)

> ANY — return the first encountered non-NULL value from a set; result is indeterminate when execution order varies.

Aggregate function.

The ANY() function selects the first encountered non-NULL value. If every row is NULL in that column, it returns NULL. Because the query can be executed in any order, and even in a different order each time, the result of this function is indeterminate. To get a deterministic result, use the `MIN` or `MAX` function instead of `ANY`.
## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.any() ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.any(table.product_name).alias('any_product_name') | any_product_name | |------------------| | Laptop | ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql ANY() ``` ## Arguments [Section titled “Arguments”](#arguments) | Arguments | Description | | --------- | -------------- | | `` | Any expression | ## Return Type [Section titled “Return Type”](#return-type) The first encountered (non-NULL) value, in the type of the value. If all values are NULL, the return value is NULL. ## SQL Examples [Section titled “SQL Examples”](#sql-examples) **Create a Table and Insert Sample Data** ```sql CREATE TABLE product_data ( id INT, product_name VARCHAR NULL, price FLOAT NULL ); INSERT INTO product_data (id, product_name, price) VALUES (1, 'Laptop', 1000), (2, NULL, 800), (3, 'Keyboard', NULL), (4, 'Mouse', 25), (5, 'Monitor', 150); ``` **Query Demo: Retrieve the First Encountered Non-NULL Product Name** ```sql SELECT ANY(product_name) AS any_product_name FROM product_data; ``` **Result** ```sql | any_product_name | |------------------| | Laptop | ``` # APPROX_COUNT_DISTINCT (Lakehouse v1) > APPROX_COUNT_DISTINCT — estimates the number of distinct values in a data set with the HyperLogLog algorithm. Estimates the number of distinct values in a data set with the [HyperLogLog](https://en.wikipedia.org/wiki/HyperLogLog) algorithm. The HyperLogLog algorithm provides an approximation of the number of unique elements using little memory and time. Consider using this function when dealing with large data sets where an estimated result can be accepted. In exchange for some accuracy, this is a fast and efficient method of returning distinct counts. To get an accurate result, use [COUNT\_DISTINCT](../aggregate-count-distinct). See [Examples](#examples) for more explanations. 
## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.approx_count_distinct() ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.approx_count_distinct(table.user_id).alias('approx_distinct_user_count') | approx_distinct_user_count | |----------------------------| | 4 | ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql APPROX_COUNT_DISTINCT() ``` ## Return Type [Section titled “Return Type”](#return-type) Integer. ## SQL Examples [Section titled “SQL Examples”](#sql-examples) **Create a Table and Insert Sample Data** ```sql CREATE TABLE user_events ( id INT, user_id INT, event_name VARCHAR ); INSERT INTO user_events (id, user_id, event_name) VALUES (1, 1, 'Login'), (2, 2, 'Login'), (3, 3, 'Login'), (4, 1, 'Logout'), (5, 2, 'Logout'), (6, 4, 'Login'), (7, 1, 'Login'); ``` **Query Demo: Estimate the Number of Distinct User IDs** ```sql SELECT APPROX_COUNT_DISTINCT(user_id) AS approx_distinct_user_count FROM user_events; ``` **Result** ```sql | approx_distinct_user_count | |----------------------------| | 4 | ``` # ARG_MAX (Lakehouse v1) > ARG_MAX — Calculates the arg value for a maximum val value. Calculates the `arg` value for a maximum `val` value. If there are several values of `arg` for maximum values of `val`, returns the first of these values encountered. 
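The tie-breaking rule is easier to see in a small local model. The sketch below is plain Python, not Lakehouse code: `arg_max` is a hypothetical helper, rows are processed in list order (the engine's actual row order is not guaranteed), and the tied row `Product F` is added purely for illustration.

```python
def arg_max(rows):
    """Return the `arg` paired with the largest `val`, keeping the first on ties.

    `rows` is an iterable of (arg, val) tuples. Hypothetical helper.
    """
    best = None
    for arg, val in rows:
        # Strict '>' means a later row with an equal val does not replace
        # the one seen first.
        if best is None or val > best[1]:
            best = (arg, val)
    return None if best is None else best[0]


sales = [
    ("Product A", 10.5),
    ("Product B", 20.75),
    ("Product C", 30.0),
    ("Product D", 15.25),
    ("Product E", 25.5),
]

print(arg_max(sales))                         # Product C
print(arg_max(sales + [("Product F", 30.0)])) # Product C (first of the tied rows)
```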
## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.arg_max() ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.arg_max(table.product, table.price).alias('max_price_product') | max_price_product | | ----------------- | | Product C | ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql ARG_MAX(, ) ``` ## Arguments [Section titled “Arguments”](#arguments) | Arguments | Description | | --------- | ------------------------------------------------------------ | | `` | Argument of any data type that PlaidCloud Lakehouse supports | | `` | Value of any data type that PlaidCloud Lakehouse supports | ## Return Type [Section titled “Return Type”](#return-type) `arg` value that corresponds to maximum `val` value. matches `arg` type. ## SQL Examples [Section titled “SQL Examples”](#sql-examples) **Creating a Table and Inserting Sample Data** Let’s create a table named “sales” and insert some sample data: ```sql CREATE TABLE sales ( id INTEGER, product VARCHAR(50), price FLOAT ); INSERT INTO sales (id, product, price) VALUES (1, 'Product A', 10.5), (2, 'Product B', 20.75), (3, 'Product C', 30.0), (4, 'Product D', 15.25), (5, 'Product E', 25.5); ``` **Query: Using ARG\_MAX() Function** Now, let’s use the ARG\_MAX() function to find the product that has the maximum price: ```sql SELECT ARG_MAX(product, price) AS max_price_product FROM sales; ``` The result should look like this: ```sql | max_price_product | | ----------------- | | Product C | ``` # ARG_MIN (Lakehouse v1) > ARG_MIN — Calculates the arg value for a minimum val value. Calculates the `arg` value for a minimum `val` value. If there are several different values of `arg` for minimum values of `val`, returns the first of these values encountered. 
## Analyze Syntax

```python
func.arg_min()
```

## Analyze Examples

```python
func.arg_min(table.name, table.score).alias('student_name')

| student_name |
|--------------|
| Bob          |
```

## SQL Syntax

```sql
ARG_MIN(, )
```

## Arguments

| Arguments | Description                                                  |
| --------- | ------------------------------------------------------------ |
| ``        | Argument of any data type that PlaidCloud Lakehouse supports |
| ``        | Value of any data type that PlaidCloud Lakehouse supports    |

## Return Type

The `arg` value that corresponds to the minimum `val` value. The return type matches the `arg` type.

## SQL Examples

Let's create a table `students` with columns `id`, `name`, and `score`, and insert some data:

```sql
CREATE TABLE students (
  id INT,
  name VARCHAR,
  score INT
);

INSERT INTO students (id, name, score) VALUES
  (1, 'Alice', 80),
  (2, 'Bob', 75),
  (3, 'Charlie', 90),
  (4, 'Dave', 80);
```

Now we can use ARG\_MIN to find the name of the student with the lowest score:

```sql
SELECT ARG_MIN(name, score) AS student_name
FROM students;
```

Result (Bob has the lowest score, 75):

```sql
| student_name |
|--------------|
| Bob          |
```

# ARRAY_AGG (Lakehouse v1)

> ARRAY_AGG — the ARRAY_AGG function (also known by its alias LIST) transforms all the values.

The ARRAY\_AGG function (also known by its alias LIST) transforms all the values, including NULL, of a specific column in a query result into an array.
## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.array_agg() ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python table.movie_title, func.array_agg(table.rating).alias('ratings') | movie_title | ratings | |-------------|------------| | Inception | [5, 4, 5] | ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql ARRAY_AGG() LIST() ``` ## Arguments [Section titled “Arguments”](#arguments) | Arguments | Description | | --------- | -------------- | | `` | Any expression | ## Return Type [Section titled “Return Type”](#return-type) Returns an Array with elements that are of the same type as the original data. ## SQL Examples [Section titled “SQL Examples”](#sql-examples) This example demonstrates how the ARRAY\_AGG function can be used to aggregate and present data in a convenient array format: ```sql -- Create a table and insert sample data CREATE TABLE movie_ratings ( id INT, movie_title VARCHAR, user_id INT, rating INT ); INSERT INTO movie_ratings (id, movie_title, user_id, rating) VALUES (1, 'Inception', 1, 5), (2, 'Inception', 2, 4), (3, 'Inception', 3, 5), (4, 'Interstellar', 1, 4), (5, 'Interstellar', 2, 3); -- List all ratings for Inception in an array SELECT movie_title, ARRAY_AGG(rating) AS ratings FROM movie_ratings WHERE movie_title = 'Inception' GROUP BY movie_title; | movie_title | ratings | |-------------|------------| | Inception | [5, 4, 5] | ``` # AVG (Lakehouse v1) > AVG — return the average value of an expression, ignoring NULL values. Aggregate function. The AVG() function returns the average value of an expression. **Note:** NULL values are not counted. 
## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.avg() ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.avg(table.price).alias('avg_price') | avg_price | | --------- | | 20.4 | ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql AVG() ``` ## Arguments [Section titled “Arguments”](#arguments) | Arguments | Description | | --------- | ------------------------ | | `` | Any numerical expression | ## Return Type [Section titled “Return Type”](#return-type) double ## SQL Examples [Section titled “SQL Examples”](#sql-examples) **Creating a Table and Inserting Sample Data** Let’s create a table named “sales” and insert some sample data: ```sql CREATE TABLE sales ( id INTEGER, product VARCHAR(50), price FLOAT ); INSERT INTO sales (id, product, price) VALUES (1, 'Product A', 10.5), (2, 'Product B', 20.75), (3, 'Product C', 30.0), (4, 'Product D', 15.25), (5, 'Product E', 25.5); ``` **Query: Using AVG() Function** Now, let’s use the AVG() function to find the average price of all products in the “sales” table: ```sql SELECT AVG(price) AS avg_price FROM sales; ``` The result should look like this: ```sql | avg_price | | --------- | | 20.4 | ``` # AVG_IF (Lakehouse v1) > AVG_IF — the suffix -If can be appended to the name of any aggregate function. The suffix -If can be appended to the name of any aggregate function. In this case, the aggregate function accepts an extra argument – a condition. 
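As a rough local model of what the extra condition argument does, the sketch below averages only the values whose condition is true. It is plain Python rather than Lakehouse code; `avg_if` is a hypothetical helper, and returning `None` when no row matches is an assumption made for the sketch, not documented engine behavior.

```python
def avg_if(pairs):
    """Average the values whose paired condition is True.

    `pairs` is an iterable of (value, condition) tuples. Hypothetical helper;
    returning None for no matching rows is an assumption of this sketch.
    """
    selected = [value for value, keep in pairs if keep]
    return sum(selected) / len(selected) if selected else None


# (id, salary, department)
employees = [
    (1, 50000, "HR"),
    (2, 60000, "IT"),
    (3, 55000, "HR"),
    (4, 70000, "IT"),
    (5, 65000, "IT"),
]

avg_salary_it = avg_if((salary, dept == "IT") for _, salary, dept in employees)
print(avg_salary_it)  # 65000.0
```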
## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.avg_if(, ) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.avg_if(table.salary, table.department=='IT').alias('avg_salary_it') | avg_salary_it | |-----------------| | 65000.0 | ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql AVG_IF(, ) ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) **Create a Table and Insert Sample Data** ```sql CREATE TABLE employees ( id INT, salary INT, department VARCHAR ); INSERT INTO employees (id, salary, department) VALUES (1, 50000, 'HR'), (2, 60000, 'IT'), (3, 55000, 'HR'), (4, 70000, 'IT'), (5, 65000, 'IT'); ``` **Query Demo: Calculate Average Salary for IT Department** ```sql SELECT AVG_IF(salary, department = 'IT') AS avg_salary_it FROM employees; ``` **Result** ```sql | avg_salary_it | |-----------------| | 65000.0 | ``` # COUNT (Lakehouse v1) > COUNT — Returns the number of records returned by a SELECT query. The COUNT() function returns the number of records returned by a SELECT query. Caution NULL values are not counted. ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.count() ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.count(table.grade).alias('count_valid_grades') | count_valid_grades | |--------------------| | 4 | ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql COUNT() ``` ## Arguments [Section titled “Arguments”](#arguments) | Arguments | Description | | --------- | --------------------------------------------------------------------------------------------------------------------------------------------------- | | `` | Any expression. This may be a column name, the result of another function, or a math operation. `*` is also allowed, to indicate pure row counting. | ## Return Type [Section titled “Return Type”](#return-type) An integer. 
## SQL Examples [Section titled “SQL Examples”](#sql-examples) **Create a Table and Insert Sample Data** ```sql CREATE TABLE students ( id INT, name VARCHAR, age INT, grade FLOAT NULL ); INSERT INTO students (id, name, age, grade) VALUES (1, 'John', 21, 85), (2, 'Emma', 22, NULL), (3, 'Alice', 23, 90), (4, 'Michael', 21, 88), (5, 'Sophie', 22, 92); ``` **Query Demo: Count Students with Valid Grades** ```sql SELECT COUNT(grade) AS count_valid_grades FROM students; ``` **Result** ```sql | count_valid_grades | |--------------------| | 4 | ``` # COUNT_DISTINCT (Lakehouse v1) > COUNT(DISTINCT …) — count the number of unique non-NULL values in a column. Aggregate function. The count(distinct …) function calculates the unique value of a set of values. To obtain an estimated result from large data sets with little memory and time, consider using [APPROX\_COUNT\_DISTINCT](../aggregate-approx-count-distinct). Caution NULL values are not counted. ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.count_distinct() ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.count_distinct(table.category).alias('unique_categories') | unique_categories | |-------------------| | 2 | ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql COUNT(distinct ...) 
UNIQ() ``` ## Arguments [Section titled “Arguments”](#arguments) | Arguments | Description | | --------- | ------------------------------------------------- | | `` | Any expression, size of the arguments is \[1, 32] | ## Return Type [Section titled “Return Type”](#return-type) UInt64 ## SQL Examples [Section titled “SQL Examples”](#sql-examples) **Create a Table and Insert Sample Data** ```sql CREATE TABLE products ( id INT, name VARCHAR, category VARCHAR, price FLOAT ); INSERT INTO products (id, name, category, price) VALUES (1, 'Laptop', 'Electronics', 1000), (2, 'Smartphone', 'Electronics', 800), (3, 'Tablet', 'Electronics', 600), (4, 'Chair', 'Furniture', 150), (5, 'Table', 'Furniture', 300); ``` **Query Demo: Count Distinct Categories** ```sql SELECT COUNT(DISTINCT category) AS unique_categories FROM products; ``` **Result** ```sql | unique_categories | |-------------------| | 2 | ``` # COUNT_IF (Lakehouse v1) > COUNT_IF — the suffix _IF can be appended to the name of any aggregate function. The suffix `_IF` can be appended to the name of any aggregate function. In this case, the aggregate function accepts an extra argument – a condition. 
## Analyze Syntax

```python
func.count_if(, )
```

## Analyze Examples

```python
func.count_if(table.status, table.status=='completed').alias('completed_orders')

| completed_orders |
|------------------|
| 3                |
```

## SQL Syntax

```sql
COUNT_IF(, )
```

## SQL Examples

**Create a Table and Insert Sample Data**

```sql
CREATE TABLE orders (
  id INT,
  customer_id INT,
  status VARCHAR,
  total FLOAT
);

INSERT INTO orders (id, customer_id, status, total) VALUES
  (1, 1, 'completed', 100),
  (2, 2, 'completed', 200),
  (3, 1, 'pending', 150),
  (4, 3, 'completed', 250),
  (5, 2, 'pending', 300);
```

**Query Demo: Count Completed Orders**

```sql
SELECT COUNT_IF(status, status = 'completed') AS completed_orders
FROM orders;
```

**Result**

```sql
| completed_orders |
|------------------|
| 3                |
```

# COVAR_POP (Lakehouse v1)

> COVAR_POP — returns the population covariance of a set of number pairs.

COVAR\_POP returns the population covariance of a set of number pairs.
## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.covar_pop(, ) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.covar_pop(table.units_sold, table.revenue).alias('covar_pop_units_revenue') | covar_pop_units_revenue | |-------------------------| | 20000.0 | ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql COVAR_POP(, ) ``` ## Arguments [Section titled “Arguments”](#arguments) | Arguments | Description | | --------- | ------------------------ | | `` | Any numerical expression | | `` | Any numerical expression | ## Return Type [Section titled “Return Type”](#return-type) float64 ## SQL Examples [Section titled “SQL Examples”](#sql-examples) **Create a Table and Insert Sample Data** ```sql CREATE TABLE product_sales ( id INT, product_id INT, units_sold INT, revenue FLOAT ); INSERT INTO product_sales (id, product_id, units_sold, revenue) VALUES (1, 1, 10, 1000), (2, 2, 20, 2000), (3, 3, 30, 3000), (4, 4, 40, 4000), (5, 5, 50, 5000); ``` **Query Demo: Calculate Population Covariance between Units Sold and Revenue** ```sql SELECT COVAR_POP(units_sold, revenue) AS covar_pop_units_revenue FROM product_sales; ``` **Result** ```sql | covar_pop_units_revenue | |-------------------------| | 20000.0 | ``` # COVAR_SAMP (Lakehouse v1) > COVAR_SAMP — return the sample covariance of two numeric data columns. Aggregate function. The covar\_samp() function returns the sample covariance (Σ((x - x̅)(y - y̅)) / (n - 1)) of two data columns. Caution NULL values are not counted. 
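To make the formula concrete, the sketch below computes both the sample covariance (divide by n - 1) and the population covariance (divide by n) in plain Python. The `covariance` helper is hypothetical and runs locally; it requires at least two pairs for the sample form and does not model the engine's handling of NULLs or of n <= 1.

```python
def covariance(xs, ys, sample=True):
    """Covariance of paired values: sum((x - mean_x) * (y - mean_y)) / divisor.

    The divisor is n - 1 for the sample form and n for the population form.
    Hypothetical helper for illustration only.
    """
    n = len(xs)
    if n != len(ys):
        raise ValueError("xs and ys must have the same length")
    if n < (2 if sample else 1):
        raise ValueError("not enough pairs")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    total = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return total / (n - 1 if sample else n)


items_sold = [100, 200, 300, 400, 500]
profit = [1000, 2000, 3000, 4000, 5000]

print(covariance(items_sold, profit, sample=True))   # 250000.0
print(covariance(items_sold, profit, sample=False))  # 200000.0
```

The only difference between the two results is the divisor: the summed products total 1,000,000, which is divided by 4 for the sample form and by 5 for the population form.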
## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.covar_samp(, ) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.covar_samp(table.items_sold, table.profit).alias('covar_samp_items_profit') | covar_samp_items_profit | |-------------------------| | 250000.0 | ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql COVAR_SAMP(, ) ``` ## Arguments [Section titled “Arguments”](#arguments) | Arguments | Description | | --------- | ------------------------ | | `` | Any numerical expression | | `` | Any numerical expression | ## Return Type [Section titled “Return Type”](#return-type) float64, when n <= 1, returns +∞. ## SQL Examples [Section titled “SQL Examples”](#sql-examples) **Create a Table and Insert Sample Data** ```sql CREATE TABLE store_sales ( id INT, store_id INT, items_sold INT, profit FLOAT ); INSERT INTO store_sales (id, store_id, items_sold, profit) VALUES (1, 1, 100, 1000), (2, 2, 200, 2000), (3, 3, 300, 3000), (4, 4, 400, 4000), (5, 5, 500, 5000); ``` **Query Demo: Calculate Sample Covariance between Items Sold and Profit** ```sql SELECT COVAR_SAMP(items_sold, profit) AS covar_samp_items_profit FROM store_sales; ``` **Result** ```sql | covar_samp_items_profit | |-------------------------| | 250000.0 | ``` # GROUP_ARRAY_MOVING_AVG (Lakehouse v1) > GROUP_ARRAY_MOVING_AVG — the GROUP_ARRAY_MOVING_AVG function calculates the moving average of input values. The GROUP\_ARRAY\_MOVING\_AVG function calculates the moving average of input values. The function can take the window size as a parameter. If left unspecified, the function takes the window size equal to the number of input values. 
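The windowing behavior can be modeled locally as follows. This is a sketch in plain Python, not the engine's implementation: `moving_avg` is a hypothetical helper, and the rule that the running sum is always divided by the full window size (even for the first few values, where fewer inputs are available) is inferred from the example output on this page.

```python
def moving_avg(values, window=None):
    """Moving average over the last `window` values.

    The running sum is divided by `window` at every position, matching the
    example output on this page. Hypothetical helper for illustration.
    """
    if window is None:
        window = len(values)  # default: window spans all input values
    result = []
    running = 0
    for i, value in enumerate(values):
        running += value
        if i >= window:
            running -= values[i - window]  # drop the value leaving the window
        result.append(running / window)
    return result


print(moving_avg([10, 13, 30], window=2))  # [5.0, 11.5, 21.5]
print(moving_avg([20, 25, 45], window=2))  # [10.0, 22.5, 35.0]
```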
## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.group_array_moving_avg() ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python table.user_id, func.group_array_moving_avg(table.request_num).alias('avg_request_num') | user_id | avg_request_num | |---------|------------------| | 1 | [5.0,11.5,21.5] | | 3 | [10.0,22.5,35.0] | | 2 | [7.5,18.0,31.0] | ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql GROUP_ARRAY_MOVING_AVG() GROUP_ARRAY_MOVING_AVG()() ``` ## Arguments [Section titled “Arguments”](#arguments) | Arguments | Description | | --------------- | ------------------------ | | `` | Any numerical expression | | `` | Any numerical expression | ## Return Type [Section titled “Return Type”](#return-type) Returns an Array with elements of double or decimal depending on the source data type. ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql -- Create a table and insert sample data CREATE TABLE hits ( user_id INT, request_num INT ); INSERT INTO hits (user_id, request_num) VALUES (1, 10), (2, 15), (3, 20), (1, 13), (2, 21), (3, 25), (1, 30), (2, 41), (3, 45); SELECT user_id, GROUP_ARRAY_MOVING_AVG(2)(request_num) AS avg_request_num FROM hits GROUP BY user_id; | user_id | avg_request_num | |---------|------------------| | 1 | [5.0,11.5,21.5] | | 3 | [10.0,22.5,35.0] | | 2 | [7.5,18.0,31.0] | ``` # GROUP_ARRAY_MOVING_SUM (Lakehouse v1) > GROUP_ARRAY_MOVING_SUM — the GROUP_ARRAY_MOVING_SUM function calculates the moving sum of input values. The GROUP\_ARRAY\_MOVING\_SUM function calculates the moving sum of input values. The function can take the window size as a parameter. If left unspecified, the function takes the window size equal to the number of input values. 
## Analyze Syntax

```python
func.group_array_moving_sum()
```

## Analyze Examples

```python
table.user_id, func.group_array_moving_sum(table.request_num)

| user_id | request_num |
|---------|-------------|
| 1       | [10,23,43]  |
| 2       | [15,36,62]  |
| 3       | [20,45,70]  |
```

## SQL Syntax

```sql
GROUP_ARRAY_MOVING_SUM()
GROUP_ARRAY_MOVING_SUM()()
```

## Arguments

| Arguments | Description              |
| --------- | ------------------------ |
| ``        | Any numerical expression |
| ``        | Any numerical expression |

## Return Type

Returns an Array with elements that are of the same type as the original data.

## SQL Examples

```sql
-- Create a table and insert sample data
CREATE TABLE hits (
  user_id INT,
  request_num INT
);

INSERT INTO hits (user_id, request_num) VALUES
  (1, 10), (2, 15), (3, 20),
  (1, 13), (2, 21), (3, 25),
  (1, 30), (2, 41), (3, 45);

SELECT user_id, GROUP_ARRAY_MOVING_SUM(2)(request_num) AS request_num
FROM hits
GROUP BY user_id;

| user_id | request_num |
|---------|-------------|
| 1       | [10,23,43]  |
| 2       | [15,36,62]  |
| 3       | [20,45,70]  |
```

# HISTOGRAM (Lakehouse v1)

> HISTOGRAM — generates a data distribution histogram using equal-height bucketing strategy.

Generates a data distribution histogram using an “equal height” bucketing strategy.
## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.histogram() ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) See SQL Example for details ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql HISTOGRAM() -- The following two forms are equivalent: HISTOGRAM()() HISTOGRAM( [, ]) ``` | Parameter | Description | | ----------------- | ----------------------------------------------------------------------------------- | | `expr` | The data type of `expr` should be sortable. | | `max_num_buckets` | Optional positive integer specifying the maximum number of buckets. Default is 128. | ## Return Type [Section titled “Return Type”](#return-type) Returns either an empty string or a JSON object with the following structure: * **buckets**: List of buckets with detailed information: * **lower**: Lower bound of the bucket. * **upper**: Upper bound of the bucket. * **count**: Number of elements in the bucket. * **pre\_sum**: Cumulative count of elements up to the current bucket. * **ndv**: Number of distinct values in the bucket. 
## SQL Examples [Section titled “SQL Examples”](#sql-examples) This example shows how the HISTOGRAM function analyzes the distribution of `c_int` values in the `histagg` table, returning bucket boundaries, distinct value counts, element counts, and cumulative counts: ```sql CREATE TABLE histagg ( c_id INT, c_tinyint TINYINT, c_smallint SMALLINT, c_int INT ); INSERT INTO histagg VALUES (1, 10, 20, 30), (1, 11, 21, 33), (1, 11, 12, 13), (2, 21, 22, 23), (2, 31, 32, 33), (2, 10, 20, 30); SELECT HISTOGRAM(c_int) FROM histagg; ┌───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐ │ histogram(c_int) │ ├───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤ │ [{"lower":"13","upper":"13","ndv":1,"count":1,"pre_sum":0},{"lower":"23","upper":"23","ndv":1,"count":1,"pre_sum":1},{"lower":"30","upper":"30","ndv":1,"count":2,"pre_sum":2},{"lower":"33","upper":"33","ndv":1,"count":2,"pre_sum":4}] │ └───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘ ``` The result is returned as a JSON array: ```json [ { "lower": "13", "upper": "13", "ndv": 1, "count": 1, "pre_sum": 0 }, { "lower": "23", "upper": "23", "ndv": 1, "count": 1, "pre_sum": 1 }, { "lower": "30", "upper": "30", "ndv": 1, "count": 2, "pre_sum": 2 }, { "lower": "33", "upper": "33", "ndv": 1, "count": 2, "pre_sum": 4 } ] ``` This example shows how `HISTOGRAM(2)` groups c\_int values into two buckets: ```sql SELECT HISTOGRAM(2)(c_int) FROM histagg; -- Or SELECT HISTOGRAM(c_int, 2) 
FROM histagg; ┌───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐ │ histogram(2)(c_int) │ ├───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤ │ [{"lower":"13","upper":"30","ndv":3,"count":4,"pre_sum":0},{"lower":"33","upper":"33","ndv":1,"count":2,"pre_sum":4}] │ └───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘ ``` The result is returned as a JSON array: ```json [ { "lower": "13", "upper": "30", "ndv": 3, "count": 4, "pre_sum": 0 }, { "lower": "33", "upper": "33", "ndv": 1, "count": 2, "pre_sum": 4 } ] ``` # JSON_ARRAY_AGG (Lakehouse v1) > JSON_ARRAY_AGG — converts values into a JSON array while skipping NULLs. Converts values into a JSON array while skipping NULLs. See also: [JSON\_OBJECT\_AGG](../aggregate-json-object-agg) ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.json_array_agg() ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) See SQL Example for details ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql JSON_ARRAY_AGG() ``` ## Return Type [Section titled “Return Type”](#return-type) JSON array. 
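The NULL-skipping behaviour can be pictured with a small Python sketch. It is only an illustration of the semantics described above, not the engine's implementation: SQL NULL is modelled as Python `None`, and the helper name `json_array_agg` is reused here purely for readability.

```python
import json


def json_array_agg(values):
    """Collect non-NULL (non-None) values into a compact JSON array string, in input order."""
    kept = [v for v in values if v is not None]
    return json.dumps(kept, separators=(",", ":"))


# Column b from the example table in this section: 'abc', 'de', NULL, 'xyz'.
print(json_array_agg(["abc", "de", None, "xyz"]))
```

A JSON `null` stored inside a VARIANT column is a value rather than a SQL NULL, which is why it still appears in `aggregated_d` in the example; this sketch does not model that distinction.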
## Examples [Section titled “Examples”](#examples) This example demonstrates how JSON\_ARRAY\_AGG aggregates values from each column into JSON arrays: ```sql CREATE TABLE d ( a DECIMAL(10, 2), b STRING, c INT, d VARIANT, e ARRAY(STRING) ); INSERT INTO d VALUES (20, 'abc', NULL, '{"k":"v"}', ['a','b']), (10, 'de', 100, 'null', []), (4.23, NULL, 200, '"uvw"', ['x','y']), (5.99, 'xyz', 300, '[1,2,3]', ['z']); SELECT json_array_agg(a) AS aggregated_a, json_array_agg(b) AS aggregated_b, json_array_agg(c) AS aggregated_c, json_array_agg(d) AS aggregated_d, json_array_agg(e) AS aggregated_e FROM d; -[ RECORD 1 ]----------------------------------- aggregated_a: [20.0,10.0,4.23,5.99] aggregated_b: ["abc","de","xyz"] aggregated_c: [100,200,300] aggregated_d: [{"k":"v"},null,"uvw",[1,2,3]] aggregated_e: [["a","b"],[],["x","y"],["z"]] ``` # JSON_OBJECT_AGG (Lakehouse v1) > JSON_OBJECT_AGG — Converts key-value pairs into a JSON object. Converts key-value pairs into a JSON object. For each row in the input, it generates a key-value pair where the key is derived from the `` and the value is derived from the ``. These key-value pairs are then combined into a single JSON object. See also: [JSON\_ARRAY\_AGG](../aggregate-json-array-agg) ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.json_object_agg(, ) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) See SQL Example for details ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql JSON_OBJECT_AGG(, ) ``` | Parameter | Description | | ----------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------ | | key\_expression | Specifies the key in the JSON object. **Only supports string** expressions. If the `key_expression` evaluates to NULL, the key-value pair is skipped. | | value\_expression | Specifies the value in the JSON object. 
It can be any supported data type. If the `value_expression` evaluates to NULL, the key-value pair is skipped. | ## Return Type [Section titled “Return Type”](#return-type) JSON object. ## SQL Examples [Section titled “SQL Examples”](#sql-examples) This example demonstrates how JSON\_OBJECT\_AGG can be used to aggregate different types of data—such as decimals, integers, JSON variants, and arrays—into JSON objects, with the column b as the key for each JSON object: ```sql CREATE TABLE d ( a DECIMAL(10, 2), b STRING, c INT, d VARIANT, e ARRAY(STRING) ); INSERT INTO d VALUES (20, 'abc', NULL, '{"k":"v"}', ['a','b']), (10, 'de', 100, 'null', []), (4.23, NULL, 200, '"uvw"', ['x','y']), (5.99, 'xyz', 300, '[1,2,3]', ['z']); SELECT json_object_agg(b, a) AS json_a, json_object_agg(b, c) AS json_c, json_object_agg(b, d) AS json_d, json_object_agg(b, e) AS json_e FROM d; -[ RECORD 1 ]----------------------------------- json_a: {"abc":20.0,"de":10.0,"xyz":5.99} json_c: {"de":100,"xyz":300} json_d: {"abc":{"k":"v"},"de":null,"xyz":[1,2,3]} json_e: {"abc":["a","b"],"de":[],"xyz":["z"]} ``` # KURTOSIS (Lakehouse v1) > KURTOSIS — return the excess kurtosis (peakedness vs. normal) of all input values. Aggregate function. The `KURTOSIS()` function returns the excess kurtosis of all input values. ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.kurtosis() ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.kurtosis(table.price).alias('excess_kurtosis') | excess_kurtosis | |-------------------------| | 0.06818181325581445 | ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql KURTOSIS() ``` ## Arguments [Section titled “Arguments”](#arguments) | Arguments | Description | | --------- | ------------------------ | | `` | Any numerical expression | ## Return Type [Section titled “Return Type”](#return-type) Nullable Float64. 
## SQL Examples [Section titled “SQL Examples”](#sql-examples) **Create a Table and Insert Sample Data** ```sql CREATE TABLE stock_prices ( id INT, stock_symbol VARCHAR, price FLOAT ); INSERT INTO stock_prices (id, stock_symbol, price) VALUES (1, 'AAPL', 150), (2, 'AAPL', 152), (3, 'AAPL', 148), (4, 'AAPL', 160), (5, 'AAPL', 155); ``` **Query Demo: Calculate Excess Kurtosis for Apple Stock Prices** ```sql SELECT KURTOSIS(price) AS excess_kurtosis FROM stock_prices WHERE stock_symbol = 'AAPL'; ``` **Result** ```sql | excess_kurtosis | |-------------------------| | 0.06818181325581445 | ``` # MAX (Lakehouse v1) > MAX — return the maximum value in a set of values. Aggregate function. The MAX() function returns the maximum value in a set of values. ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.max() ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python table.city, func.max(table.temperature).alias('max_temperature') | city | max_temperature | |------------|-----------------| | New York | 32 | ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql MAX() ``` ## Arguments [Section titled “Arguments”](#arguments) | Arguments | Description | | --------- | -------------- | | `` | Any expression | ## Return Type [Section titled “Return Type”](#return-type) The maximum value, in the type of the value. 
## SQL Examples [Section titled “SQL Examples”](#sql-examples) **Create a Table and Insert Sample Data** ```sql CREATE TABLE temperatures ( id INT, city VARCHAR, temperature FLOAT ); INSERT INTO temperatures (id, city, temperature) VALUES (1, 'New York', 30), (2, 'New York', 28), (3, 'New York', 32), (4, 'Los Angeles', 25), (5, 'Los Angeles', 27); ``` **Query Demo: Find Maximum Temperature for New York City** ```sql SELECT city, MAX(temperature) AS max_temperature FROM temperatures WHERE city = 'New York' GROUP BY city; ``` **Result** ```sql | city | max_temperature | |------------|-----------------| | New York | 32 | ``` # MAX_IF (Lakehouse v1) > MAX_IF — the suffix _IF can be appended to the name of any aggregate function. The suffix `_IF` can be appended to the name of any aggregate function. In this case, the aggregate function accepts an extra argument – a condition. ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.max_if(, ) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.max_if(table.revenue, table.salesperson_id==1).alias('max_revenue_salesperson_1') | max_revenue_salesperson_1 | |---------------------------| | 3000 | ``` ## SQL Example [Section titled “SQL Example”](#sql-example) ```sql MAX_IF(, ) ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) **Create a Table and Insert Sample Data** ```sql CREATE TABLE sales ( id INT, salesperson_id INT, product_id INT, revenue FLOAT ); INSERT INTO sales (id, salesperson_id, product_id, revenue) VALUES (1, 1, 1, 1000), (2, 1, 2, 2000), (3, 1, 3, 3000), (4, 2, 1, 1500), (5, 2, 2, 2500); ``` **Query Demo: Find Maximum Revenue for Salesperson with ID 1** ```sql SELECT MAX_IF(revenue, salesperson_id = 1) AS max_revenue_salesperson_1 FROM sales; ``` **Result** ```sql | max_revenue_salesperson_1 | |---------------------------| | 3000 | ``` # MEDIAN (Lakehouse v1) > MEDIAN — compute the median of a numeric data sequence. 
Aggregate function. The MEDIAN() function computes the median of a numeric data sequence. Caution NULL values are not counted. ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.median() ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.median(table.score).alias('median_score') | median_score | |----------------| | 85.0 | ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql MEDIAN() ``` ## Arguments [Section titled “Arguments”](#arguments) | Arguments | Description | | --------- | ------------------------ | | `` | Any numerical expression | ## Return Type [Section titled “Return Type”](#return-type) the type of the value. ## SQL Examples [Section titled “SQL Examples”](#sql-examples) **Create a Table and Insert Sample Data** ```sql CREATE TABLE exam_scores ( id INT, student_id INT, score INT ); INSERT INTO exam_scores (id, student_id, score) VALUES (1, 1, 80), (2, 2, 90), (3, 3, 75), (4, 4, 95), (5, 5, 85); ``` **Query Demo: Calculate Median Exam Score** ```sql SELECT MEDIAN(score) AS median_score FROM exam_scores; ``` **Result** ```sql | median_score | |----------------| | 85.0 | ``` # MEDIAN_TDIGEST (Lakehouse v1) > MEDIAN_TDIGEST — computes the median of a numeric data sequence using the t-digest algorithm. Computes the median of a numeric data sequence using the [t-digest](https://github.com/tdunning/t-digest/blob/master/docs/t-digest-paper/histo.pdf) algorithm. Caution NULL values are not included in the calculation. 
## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.median_tdigest() ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.median_tdigest(table.score).alias('median_score') | median_score | |----------------| | 85.0 | ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql MEDIAN_TDIGEST() ``` ## Arguments [Section titled “Arguments”](#arguments) | Arguments | Description | | --------- | ------------------------ | | `` | Any numerical expression | ## Return Type [Section titled “Return Type”](#return-type) Returns a value of the same data type as the input values. ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql -- Create a table and insert sample data CREATE TABLE exam_scores ( id INT, student_id INT, score INT ); INSERT INTO exam_scores (id, student_id, score) VALUES (1, 1, 80), (2, 2, 90), (3, 3, 75), (4, 4, 95), (5, 5, 85); -- Calculate median exam score SELECT MEDIAN_TDIGEST(score) AS median_score FROM exam_scores; | median_score | |----------------| | 85.0 | ``` # MIN (Lakehouse v1) > MIN — return the minimum value in a set of values. Aggregate function. The MIN() function returns the minimum value in a set of values. ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.min() ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python table.station_id, func.min(table.price).alias('min_price') | station_id | min_price | |------------|-----------| | 1 | 3.45 | ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql MIN() ``` ## Arguments [Section titled “Arguments”](#arguments) | Arguments | Description | | --------- | -------------- | | `` | Any expression | ## Return Type [Section titled “Return Type”](#return-type) The minimum value, in the type of the value. 
## SQL Examples [Section titled “SQL Examples”](#sql-examples) **Create a Table and Insert Sample Data** ```sql CREATE TABLE gas_prices ( id INT, station_id INT, price FLOAT ); INSERT INTO gas_prices (id, station_id, price) VALUES (1, 1, 3.50), (2, 1, 3.45), (3, 1, 3.55), (4, 2, 3.40), (5, 2, 3.35); ``` **Query Demo: Find Minimum Gas Price for Station 1** ```sql SELECT station_id, MIN(price) AS min_price FROM gas_prices WHERE station_id = 1 GROUP BY station_id; ``` **Result** ```sql | station_id | min_price | |------------|-----------| | 1 | 3.45 | ``` # MIN_IF (Lakehouse v1) > MIN_IF — the suffix _IF can be appended to the name of any aggregate function. The suffix `_IF` can be appended to the name of any aggregate function. In this case, the aggregate function accepts an extra argument – a condition.
## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.min_if(, ) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.min_if(table.budget, table.department=='IT').alias('min_it_budget') | min_it_budget | |---------------| | 2000 | ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql MIN_IF(, ) ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) **Create a Table and Insert Sample Data** ```sql CREATE TABLE project_budgets ( id INT, project_id INT, department VARCHAR, budget FLOAT ); INSERT INTO project_budgets (id, project_id, department, budget) VALUES (1, 1, 'HR', 1000), (2, 1, 'IT', 2000), (3, 1, 'Marketing', 3000), (4, 2, 'HR', 1500), (5, 2, 'IT', 2500); ``` **Query Demo: Find Minimum Budget for IT Department** ```sql SELECT MIN_IF(budget, department = 'IT') AS min_it_budget FROM project_budgets; ``` **Result** ```sql | min_it_budget | |---------------| | 2000 | ``` # QUANTILE_CONT (Lakehouse v1) > QUANTILE_CONT — return the interpolated (continuous) quantile of a numeric data sequence. Aggregate function. The QUANTILE\_CONT() function computes the interpolated quantile of a numeric data sequence. Caution NULL values are not counted.
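To make "interpolated" concrete, here is a minimal Python sketch of a continuous quantile. It assumes linear interpolation between the two closest ranks (the usual `PERCENTILE_CONT` convention); the source does not state the exact formula the engine uses, so treat this as an illustration rather than a specification. `None` stands in for SQL NULL.

```python
import math


def quantile_cont(values, level):
    """Interpolated quantile: linear interpolation between the closest ranks."""
    data = sorted(v for v in values if v is not None)  # NULLs are not counted
    if not data:
        return None
    position = level * (len(data) - 1)  # fractional index into the sorted data
    lower = math.floor(position)
    upper = math.ceil(position)
    fraction = position - lower
    return data[lower] + (data[upper] - data[lower]) * fraction


# Sales amounts from the example in this section.
print(quantile_cont([5000, 5500, 6000, 6500, 7000], 0.5))
```

With an even number of values the result can fall between two inputs, for example `quantile_cont([1, 2], 0.5)` evaluates to `1.5`.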
## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.quantile_cont(, ) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.quantile_cont(0.5, table.sales_amount).alias('median_sales_amount') | median_sales_amount | |-----------------------| | 6000.0 | ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql QUANTILE_CONT()() QUANTILE_CONT(level1, level2, ...)() ``` ## Arguments [Section titled “Arguments”](#arguments) | Arguments | Description | | ----------- | ---------------------------------------------------------------------------------------------------------------------------------------------- | | `level(s)` | level(s) of quantile. Each level is constant floating-point number from 0 to 1. We recommend using a level value in the range of \[0.01, 0.99] | | `` | Any numerical expression | ## Return Type [Section titled “Return Type”](#return-type) Float64, or an array of Float64 when more than one level is given. ## SQL Examples [Section titled “SQL Examples”](#sql-examples) **Create a Table and Insert Sample Data** ```sql CREATE TABLE sales_data ( id INT, sales_person_id INT, sales_amount FLOAT ); INSERT INTO sales_data (id, sales_person_id, sales_amount) VALUES (1, 1, 5000), (2, 2, 5500), (3, 3, 6000), (4, 4, 6500), (5, 5, 7000); ``` **Query Demo: Calculate 50th Percentile (Median) of Sales Amount using Interpolation** ```sql SELECT QUANTILE_CONT(0.5)(sales_amount) AS median_sales_amount FROM sales_data; ``` **Result** ```sql | median_sales_amount | |-----------------------| | 6000.0 | ``` # QUANTILE_DISC (Lakehouse v1) > QUANTILE_DISC — return the exact (discrete) quantile of a numeric data sequence. Aggregate function. The `QUANTILE_DISC()` function computes the exact quantile of a numeric data sequence. `QUANTILE` is an alias for `QUANTILE_DISC`. Caution NULL values are not counted.
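A discrete quantile always returns one of the input values. The Python sketch below assumes the common `PERCENTILE_DISC` convention (the smallest value whose cumulative share of rows reaches the level). That convention is an assumption, since the source does not spell out the rule, but it does reproduce the salary result shown in this section.

```python
import math


def quantile_disc(values, level):
    """Discrete quantile: smallest value whose cumulative share is >= level."""
    data = sorted(v for v in values if v is not None)  # NULLs are not counted
    if not data:
        return None
    rank = max(math.ceil(level * len(data)), 1)  # 1-based rank into sorted data
    return data[rank - 1]


salaries = [50000.0, 55000.0, 60000.0, 65000.0, 70000.0]
print([quantile_disc(salaries, level) for level in (0.25, 0.75)])
```

Unlike the interpolated variant, the result is never a value that is absent from the data.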
## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.quantile_disc(, ) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.quantile_disc([0.25, 0.75], table.salary).alias('salary_quantiles') | salary_quantiles | |---------------------| | [55000.0, 65000.0] | ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql QUANTILE_DISC()() QUANTILE_DISC(level1, level2, ...)() ``` ## Arguments [Section titled “Arguments”](#arguments) | Arguments | Description | | ---------- | ---------------------------------------------------------------------------------------------------------------------------------------------- | | `level(s)` | level(s) of quantile. Each level is constant floating-point number from 0 to 1. We recommend using a level value in the range of \[0.01, 0.99] | | `` | Any numerical expression | ## Return Type [Section titled “Return Type”](#return-type) InputType or array of InputType based on level number. ## SQL Examples [Section titled “SQL Examples”](#sql-examples) **Create a Table and Insert Sample Data** ```sql CREATE TABLE salary_data ( id INT, employee_id INT, salary FLOAT ); INSERT INTO salary_data (id, employee_id, salary) VALUES (1, 1, 50000), (2, 2, 55000), (3, 3, 60000), (4, 4, 65000), (5, 5, 70000); ``` **Query Demo: Calculate 25th and 75th Percentile of Salaries** ```sql SELECT QUANTILE_DISC(0.25, 0.75)(salary) AS salary_quantiles FROM salary_data; ``` **Result** ```sql | salary_quantiles | |---------------------| | [55000.0, 65000.0] | ``` # QUANTILE_TDIGEST (Lakehouse v1) > QUANTILE_TDIGEST — computes an approximate quantile of a numeric data sequence using the t-digest algorithm. Computes an approximate quantile of a numeric data sequence using the [t-digest](https://github.com/tdunning/t-digest/blob/master/docs/t-digest-paper/histo.pdf) algorithm. Caution NULL values are not included in the calculation. 
## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.quantile_tdigest(, ) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.quantile_tdigest([0.5, 0.8], table.sales_amount).alias('sales_amounts') | sales_amounts | |-----------------------+ | [6000.0,7000.0] | ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql QUANTILE_TDIGEST([, , ...])() ``` ## Arguments [Section titled “Arguments”](#arguments) | Arguments | Description | | ----------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------- | | `` | A level of quantile represents a constant floating-point number ranging from 0 to 1. It is recommended to use a level value in the range of \[0.01, 0.99]. | | `` | Any numerical expression | ## Return Type [Section titled “Return Type”](#return-type) Returns either a Float64 value or an array of Float64 values, depending on the number of quantile levels specified. ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql -- Create a table and insert sample data CREATE TABLE sales_data ( id INT, sales_person_id INT, sales_amount FLOAT ); INSERT INTO sales_data (id, sales_person_id, sales_amount) VALUES (1, 1, 5000), (2, 2, 5500), (3, 3, 6000), (4, 4, 6500), (5, 5, 7000); SELECT QUANTILE_TDIGEST(0.5)(sales_amount) AS median_sales_amount FROM sales_data; median_sales_amount| -------------------+ 6000.0| SELECT QUANTILE_TDIGEST(0.5, 0.8)(sales_amount) FROM sales_data; quantile_tdigest(0.5, 0.8)(sales_amount)| ----------------------------------------+ [6000.0,7000.0] | ``` # QUANTILE_TDIGEST_WEIGHTED (Lakehouse v1) > QUANTILE_TDIGEST_WEIGHTED — computes an approximate quantile of a numeric data sequence using. 
Computes an approximate quantile of a numeric data sequence using the [t-digest](https://github.com/tdunning/t-digest/blob/master/docs/t-digest-paper/histo.pdf) algorithm. This function takes into account the weight of each sequence member. Memory consumption is **log(n)**, where **n** is a number of values. Caution NULL values are not included in the calculation. ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.quantile_tdigest_weighted(, , ) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.quantile_tdigest_weighted([0.5, 0.8], table.sales_amount, 1).alias('sales_amounts') | sales_amounts | |-----------------------+ | [6000.0,7000.0] | ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql QUANTILE_TDIGEST_WEIGHTED([, , ...])(, ) ``` ## Arguments [Section titled “Arguments”](#arguments) | Arguments | Description | | --------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------- | | `` | A level of quantile represents a constant floating-point number ranging from 0 to 1. It is recommended to use a level value in the range of \[0.01, 0.99]. | | `` | Any numerical expression | | `` | Any unsigned integer expression. Weight is a number of value occurrences. | ## Return Type [Section titled “Return Type”](#return-type) Returns either a Float64 value or an array of Float64 values, depending on the number of quantile levels specified. 
## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql -- Create a table and insert sample data CREATE TABLE sales_data ( id INT, sales_person_id INT, sales_amount FLOAT ); INSERT INTO sales_data (id, sales_person_id, sales_amount) VALUES (1, 1, 5000), (2, 2, 5500), (3, 3, 6000), (4, 4, 6500), (5, 5, 7000); SELECT QUANTILE_TDIGEST_WEIGHTED(0.5)(sales_amount, 1) AS median_sales_amount FROM sales_data; median_sales_amount| -------------------+ 6000.0| SELECT QUANTILE_TDIGEST_WEIGHTED(0.5, 0.8)(sales_amount, 1) FROM sales_data; quantile_tdigest_weighted(0.5, 0.8)(sales_amount)| -------------------------------------------------+ [6000.0,7000.0] | ``` # RETENTION (Lakehouse v1) > RETENTION — aggregate function. Aggregate function. The RETENTION() function takes from 1 to 32 arguments of type UInt8, each indicating whether a certain condition was met for the event. Any condition can be specified as an argument (as in WHERE). The conditions, except the first, apply in pairs: the result of the second will be true if the first and second are true, of the third if the first and third are true, etc. ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.retention( , , ..., ) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python table.user_id, func.retention(table.event_type=='signup', table.event_type=='login', table.event_type=='purchase').alias('retention') | user_id | retention | |---------|-----------| | 1 | [1, 1, 0] | | 2 | [1, 0, 1] | | 3 | [1, 1, 0] | ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql RETENTION( , , ..., ); ``` ## Arguments [Section titled “Arguments”](#arguments) | Arguments | Description | | --------- | ------------------------------------------- | | `` | An expression that returns a Boolean result | ## Return Type [Section titled “Return Type”](#return-type) An array of 1s and 0s, one element per condition.
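The pairing rule can be sketched in a few lines of Python. This is one reading of the description above, not the engine's code: each condition counts as met for a group if any row in the group satisfies it, and every condition after the first is additionally ANDed with the first. The function name and the dictionary-based rows are illustrative choices.

```python
def retention(rows, conditions):
    """Return a list of 1/0 flags, one per condition, for one group of rows."""
    met = [any(cond(row) for row in rows) for cond in conditions]
    first = met[0]
    # The first flag stands alone; each later flag also requires the first.
    return [int(first)] + [int(first and flag) for flag in met[1:]]


conditions = [
    lambda row: row["event_type"] == "signup",
    lambda row: row["event_type"] == "login",
    lambda row: row["event_type"] == "purchase",
]

# Rows for user 2 in the example data: a signup followed by a purchase.
user_2 = [{"event_type": "signup"}, {"event_type": "purchase"}]
print(retention(user_2, conditions))
```

Grouping by `user_id` is assumed to happen before the call, mirroring the `GROUP BY` in the SQL example.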
## SQL Examples [Section titled “SQL Examples”](#sql-examples) **Create a Table and Insert Sample Data** ```sql CREATE TABLE user_events ( id INT, user_id INT, event_date DATE, event_type VARCHAR ); INSERT INTO user_events (id, user_id, event_date, event_type) VALUES (1, 1, '2022-01-01', 'signup'), (2, 1, '2022-01-02', 'login'), (3, 2, '2022-01-01', 'signup'), (4, 2, '2022-01-03', 'purchase'), (5, 3, '2022-01-01', 'signup'), (6, 3, '2022-01-02', 'login'); ``` **Query Demo: Calculate User Retention Based on Signup, Login, and Purchase Events** ```sql SELECT user_id, RETENTION(event_type = 'signup', event_type = 'login', event_type = 'purchase') AS retention FROM user_events GROUP BY user_id; ``` **Result** ```sql | user_id | retention | |---------|-----------| | 1 | [1, 1, 0] | | 2 | [1, 0, 1] | | 3 | [1, 1, 0] | ``` # SKEWNESS (Lakehouse v1) > SKEWNESS — return the skewness (asymmetry of the distribution) of all input values. Aggregate function. The `SKEWNESS()` function returns the skewness of all input values. ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.skewness() ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.skewness(table.temperature).alias('temperature_skewness') | temperature_skewness | |----------------------| | 0.68 | ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql SKEWNESS() ``` ## Arguments [Section titled “Arguments”](#arguments) | Arguments | Description | | --------- | ------------------------ | | `` | Any numerical expression | ## Return Type [Section titled “Return Type”](#return-type) Nullable Float64. 
## SQL Examples [Section titled “SQL Examples”](#sql-examples) **Create a Table and Insert Sample Data** ```sql CREATE TABLE temperature_data ( id INT, city_id INT, temperature FLOAT ); INSERT INTO temperature_data (id, city_id, temperature) VALUES (1, 1, 60), (2, 1, 65), (3, 1, 62), (4, 2, 70), (5, 2, 75); ``` **Query Demo: Calculate Skewness of Temperature Data** ```sql SELECT SKEWNESS(temperature) AS temperature_skewness FROM temperature_data; ``` **Result** ```sql | temperature_skewness | |----------------------| | 0.68 | ``` # STDDEV_POP (Lakehouse v1) > STDDEV_POP — return the population standard deviation of an expression (square root of VAR_POP). Aggregate function. The STDDEV\_POP() function returns the population standard deviation(the square root of VAR\_POP()) of an expression. Note STD() or STDDEV() can also be used, which are equivalent but not standard SQL. Caution NULL values are not counted. ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.stddev_pop() ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.stddev_pop(table.score).alias('test_score_stddev_pop') | test_score_stddev_pop | |-----------------------| | 7.07107 | ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql STDDEV_POP() STDDEV() STD() ``` ## Arguments [Section titled “Arguments”](#arguments) | Arguments | Description | | --------- | ------------------------ | | `` | Any numerical expression | ## Return Type [Section titled “Return Type”](#return-type) double ## SQL Examples [Section titled “SQL Examples”](#sql-examples) **Create a Table and Insert Sample Data** ```sql CREATE TABLE test_scores ( id INT, student_id INT, score FLOAT ); INSERT INTO test_scores (id, student_id, score) VALUES (1, 1, 80), (2, 2, 85), (3, 3, 90), (4, 4, 95), (5, 5, 100); ``` **Query Demo: Calculate Population Standard Deviation of Test Scores** ```sql SELECT STDDEV_POP(score) AS test_score_stddev_pop FROM test_scores; 
``` **Result** ```sql | test_score_stddev_pop | |-----------------------| | 7.07107 | ``` # STDDEV_SAMP (Lakehouse v1) > STDDEV_SAMP — return the sample standard deviation of an expression (square root of VAR_SAMP). Aggregate function. The STDDEV\_SAMP() function returns the sample standard deviation(the square root of VAR\_SAMP()) of an expression. Caution NULL values are not counted. ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.stddev_samp() ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.stddev_samp(table.height).alias('height_stddev_samp') | height_stddev_samp | |--------------------| | 0.240 | ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql STDDEV_SAMP() ``` ## Arguments [Section titled “Arguments”](#arguments) | Arguments | Description | | --------- | ------------------------ | | `` | Any numerical expression | ## Return Type [Section titled “Return Type”](#return-type) double ## SQL Examples [Section titled “SQL Examples”](#sql-examples) **Create a Table and Insert Sample Data** ```sql CREATE TABLE height_data ( id INT, person_id INT, height FLOAT ); INSERT INTO height_data (id, person_id, height) VALUES (1, 1, 5.8), (2, 2, 6.1), (3, 3, 5.9), (4, 4, 5.7), (5, 5, 6.3); ``` **Query Demo: Calculate Sample Standard Deviation of Heights** ```sql SELECT STDDEV_SAMP(height) AS height_stddev_samp FROM height_data; ``` **Result** ```sql | height_stddev_samp | |--------------------| | 0.240 | ``` # STRING_AGG (Lakehouse v1) > STRING_AGG — concatenate non-NULL column values into a single string, separated by a delimiter. Aggregate function. The STRING\_AGG() function converts all the non-NULL values of a column to String, separated by the delimiter. 
## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.string_agg( [, delimiter]) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.string_agg(table.language_name, ', ').alias('concatenated_languages') | concatenated_languages | |-----------------------------------------| | Python, JavaScript, Java, C#, Ruby | ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql STRING_AGG() STRING_AGG( [, delimiter]) ``` Note If the expression is not a String expression, use `::VARCHAR` to convert it. For example: `sql SELECT string_agg(number::VARCHAR, '|') AS s FROM numbers(5); ┌───────────┐ │ s │ ├───────────┤ │ 0│1│2│3│4 │ └───────────┘` ## Arguments [Section titled “Arguments”](#arguments) | Arguments | Description | | ----------- | ------------------------------------------------------------------- | | `` | Any string expression (if not a string, use `::VARCHAR` to convert) | | `delimiter` | Optional constant String; if not specified, an empty String is used | ## Return Type [Section titled “Return Type”](#return-type) String. ## SQL Examples [Section titled “SQL Examples”](#sql-examples) **Create a Table and Insert Sample Data** ```sql CREATE TABLE programming_languages ( id INT, language_name VARCHAR ); INSERT INTO programming_languages (id, language_name) VALUES (1, 'Python'), (2, 'JavaScript'), (3, 'Java'), (4, 'C#'), (5, 'Ruby'); ``` **Query Demo: Concatenate Programming Language Names with a Delimiter** ```sql SELECT STRING_AGG(language_name, ', ') AS concatenated_languages FROM programming_languages; ``` **Result** ```sql | concatenated_languages | |------------------------------------------| | Python, JavaScript, Java, C#, Ruby | ``` # SUM (Lakehouse v1) > SUM — calculate the sum of a set of values. Aggregate function. The SUM() function calculates the sum of a set of values. Caution NULL values are not counted.
## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.sum() ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.sum(table.quantity).alias('total_quantity_sold') | total_quantity_sold | |---------------------| | 41 | ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql SUM() ``` ## Arguments [Section titled “Arguments”](#arguments) | Arguments | Description | | --------- | ------------------------ | | `` | Any numerical expression | ## Return Type [Section titled “Return Type”](#return-type) A double if the input type is double, otherwise integer. ## SQL Examples [Section titled “SQL Examples”](#sql-examples) **Create a Table and Insert Sample Data** ```sql CREATE TABLE sales_data ( id INT, product_id INT, quantity INT ); INSERT INTO sales_data (id, product_id, quantity) VALUES (1, 1, 10), (2, 2, 5), (3, 3, 8), (4, 4, 3), (5, 5, 15); ``` **Query Demo: Calculate the Total Quantity of Products Sold** ```sql SELECT SUM(quantity) AS total_quantity_sold FROM sales_data; ``` **Result** ```sql | total_quantity_sold | |---------------------| | 41 | ``` # SUM_IF (Lakehouse v1) > SUM_IF — the suffix _IF can be appended to the name of any aggregate function. The suffix `_IF` can be appended to the name of any aggregate function. In this case, the aggregate function accepts an extra argument – a condition.
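In plain terms, an `_IF` aggregate only feeds the aggregate the rows for which the condition holds. A minimal Python sketch of that idea follows; the helper name `sum_if` and the tuple layout are illustrative assumptions, and the rows mirror the `order_data` example in this section.

```python
def sum_if(rows, value, condition):
    """Sum value(row) over the rows where condition(row) is true."""
    return sum(value(row) for row in rows if condition(row))


# (amount, status) pairs taken from the order_data example in this section.
orders = [
    (100.0, "Completed"),
    (50.0, "Completed"),
    (80.0, "Cancelled"),
    (120.0, "Completed"),
    (75.0, "Cancelled"),
]

total_completed = sum_if(
    orders,
    value=lambda row: row[0],
    condition=lambda row: row[1] == "Completed",
)
print(total_completed)
```

The same filtering pattern applies to the other `_IF` variants such as `MAX_IF` and `MIN_IF`, with the inner aggregate swapped out.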
## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.sum_if(<column>, <cond>) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.sum_if(table.amount, table.status=='Completed').alias('total_amount_completed') | total_amount_completed | |------------------------| | 270.0 | ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql SUM_IF(<column>, <cond>) ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) **Create a Table and Insert Sample Data** ```sql CREATE TABLE order_data ( id INT, customer_id INT, amount FLOAT, status VARCHAR ); INSERT INTO order_data (id, customer_id, amount, status) VALUES (1, 1, 100, 'Completed'), (2, 2, 50, 'Completed'), (3, 3, 80, 'Cancelled'), (4, 4, 120, 'Completed'), (5, 5, 75, 'Cancelled'); ``` **Query Demo: Calculate the Total Amount of Completed Orders** ```sql SELECT SUM_IF(amount, status = 'Completed') AS total_amount_completed FROM order_data; ``` **Result** ```sql | total_amount_completed | |------------------------| | 270.0 | ``` # WINDOW_FUNNEL (Lakehouse v1) > WINDOW_FUNNEL — search for ordered event chains within a sliding time window and return the longest match length. ![](https://datafuse-1253727613.cos.ap-hongkong.myqcloud.com/learn/databend-funnel.png) Similar to `windowFunnel` in ClickHouse (they were created by the same author), it searches for event chains in a sliding time window and calculates the maximum number of events from the chain that occurred. The function works as follows: * The function searches for data that triggers the first condition in the chain and sets the event counter to 1. This is the moment when the sliding window starts. * If events from the chain occur sequentially within the window, the counter is incremented. If the sequence of events is disrupted, the counter isn’t incremented. * If the data has multiple event chains at varying completion points, the function will only output the size of the longest chain.
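The steps above can be sketched in Python as follows. This is a simplified model for a single user's events, not the engine's implementation: the `window_funnel` function, its argument shapes, and the sample events are assumptions made for illustration.

```python
from datetime import datetime, timedelta


def window_funnel(events, steps, window):
    """Return the longest prefix of `steps` matched in order within `window`.

    events: iterable of (timestamp, event_name) pairs for a single user.
    steps:  ordered, non-empty list of event names forming the chain.
    window: timedelta measured from the event that matches the first step.
    """
    ordered = sorted(events)
    best = 0
    for i, (start_ts, name) in enumerate(ordered):
        if name != steps[0]:
            continue  # a chain can only start at the first condition
        level = 1
        for ts, later_name in ordered[i + 1:]:
            if ts > start_ts + window:
                break  # this event falls outside the sliding window
            if level < len(steps) and later_name == steps[level]:
                level += 1  # the next condition in the chain was met
        best = max(best, level)  # keep only the longest chain
    return best


steps = ["login", "visit", "cart", "purchase"]
user_events = [
    (datetime(2022, 5, 15, 11, 0), "login"),
    (datetime(2022, 5, 15, 11, 1), "visit"),
    (datetime(2022, 5, 15, 11, 2), "cart"),
]
print(window_funnel(user_events, steps, timedelta(hours=1)))  # 3
```

The sketch restarts the search at every event that matches the first condition, which keeps the logic close to the description at the cost of quadratic time; a production implementation would likely track candidate chains in a single pass.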
## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql WINDOW_FUNNEL( <window> )( <timestamp>, <cond1>, <cond2>, ..., <condN> ) ``` **Arguments** * `<timestamp>` — Name of the column containing the timestamp. Data types supported: integer types and datetime types. * `<cond>` — Conditions or data describing the chain of events. Must be `Boolean` datatype. **Parameters** * `<window>` — Length of the sliding window, that is, the time interval between the first and the last condition. The unit of `window` depends on the type of `timestamp`. A chain matches when `timestamp of cond1 <= timestamp of cond2 <= ... <= timestamp of condN <= timestamp of cond1 + window`. **Returned value** The maximum number of consecutive triggered conditions from the chain within the sliding time window. All the chains in the selection are analyzed. Type: `UInt8`. **Example** Determine whether a set period of time is enough for the user to select a phone and purchase it in the online store. Set the following chain of events: 1. The user logs into their account on the store (`event_name = 'login'`). 2. The user lands on the page (`event_name = 'visit'`). 3. The user adds an item to the shopping cart (`event_name = 'cart'`). 4. The user completes the purchase (`event_name = 'purchase'`).
```sql CREATE TABLE events(user_id BIGINT, event_name VARCHAR, event_timestamp TIMESTAMP); INSERT INTO events VALUES(100123, 'login', '2022-05-14 10:01:00'); INSERT INTO events VALUES(100123, 'visit', '2022-05-14 10:02:00'); INSERT INTO events VALUES(100123, 'cart', '2022-05-14 10:04:00'); INSERT INTO events VALUES(100123, 'purchase', '2022-05-14 10:10:00'); INSERT INTO events VALUES(100125, 'login', '2022-05-15 11:00:00'); INSERT INTO events VALUES(100125, 'visit', '2022-05-15 11:01:00'); INSERT INTO events VALUES(100125, 'cart', '2022-05-15 11:02:00'); INSERT INTO events VALUES(100126, 'login', '2022-05-15 12:00:00'); INSERT INTO events VALUES(100126, 'visit', '2022-05-15 12:01:00'); ``` Input table: ```sql ┌─────────┬────────────┬────────────────────────────┐ │ user_id │ event_name │ event_timestamp │ ├─────────┼────────────┼────────────────────────────┤ │ 100123 │ login │ 2022-05-14 10:01:00.000000 │ │ 100123 │ visit │ 2022-05-14 10:02:00.000000 │ │ 100123 │ cart │ 2022-05-14 10:04:00.000000 │ │ 100123 │ purchase │ 2022-05-14 10:10:00.000000 │ │ 100125 │ login │ 2022-05-15 11:00:00.000000 │ │ 100125 │ visit │ 2022-05-15 11:01:00.000000 │ │ 100125 │ cart │ 2022-05-15 11:02:00.000000 │ │ 100126 │ login │ 2022-05-15 12:00:00.000000 │ │ 100126 │ visit │ 2022-05-15 12:01:00.000000 │ └─────────┴────────────┴────────────────────────────┘ ``` Find out how far each user (`user_id`) gets through the chain within a one-hour sliding window. Query: ```sql SELECT level, count() AS count FROM ( SELECT user_id, window_funnel(3600000000)(event_timestamp, event_name = 'login', event_name = 'visit', event_name = 'cart', event_name = 'purchase') AS level FROM events GROUP BY user_id ) GROUP BY level ORDER BY level ASC; ``` Note The `event_timestamp` column is of type timestamp, so the window is expressed in microseconds: `3600000000` is a one-hour time window. Result: ```sql ┌───────┬───────┐ │ level │ count │ ├───────┼───────┤ │ 2 │ 1 │ │ 3 │ 1 │ │ 4 │ 1 │ └───────┴───────┘ ``` * User `100126` level is 2 (`login -> visit`).
* User `100125` level is 3 (`login -> visit -> cart`). * User `100123` level is 4 (`login -> visit -> cart -> purchase`). # Window Functions (Lakehouse v1) > Lakehouse v1 SQL window functions: compute over row windows — ranking, running totals, leads, and lags. ## Overview [Section titled “Overview”](#overview) A window function operates on a group (“window”) of related rows. For each input row, a window function returns one output row that depends on the specific row passed to the function and the values of the other rows in the window. There are two main types of order-sensitive window functions: * `Rank-related functions`: Rank-related functions list information based on the “rank” of a row. For example, when stores are ranked in descending order by annual profit, the most profitable store is ranked 1, the second-most profitable store is ranked 2, and so on. * `Window frame functions`: Window frame functions enable you to perform rolling operations, such as calculating a running total or a moving average, on a subset of the rows in the window. ## List of Functions That Support Windows [Section titled “List of Functions That Support Windows”](#list-of-functions-that-support-windows) The table below lists all the window functions.
| Function Name | Category | Window | Window Frame | Notes | | ------------------------------------------------------------------- | ------------ | ------ | ------------ | ----- | | [ARRAY\_AGG](../07-aggregate-functions/aggregate-array-agg) | General | ✔ | | | | [AVG](../07-aggregate-functions/aggregate-avg) | General | ✔ | ✔ | | | [AVG\_IF](../07-aggregate-functions/aggregate-avg-if) | General | ✔ | ✔ | | | [COUNT](../07-aggregate-functions/aggregate-count) | General | ✔ | ✔ | | | [COUNT\_IF](../07-aggregate-functions/aggregate-count-if) | General | ✔ | ✔ | | | [COVAR\_POP](../07-aggregate-functions/aggregate-covar-pop) | General | ✔ | | | | [COVAR\_SAMP](../07-aggregate-functions/aggregate-covar-samp) | General | ✔ | | | | [MAX](../07-aggregate-functions/aggregate-max) | General | ✔ | ✔ | | | [MAX\_IF](../07-aggregate-functions/aggregate-max-if) | General | ✔ | ✔ | | | [MIN](../07-aggregate-functions/aggregate-min) | General | ✔ | ✔ | | | [MIN\_IF](../07-aggregate-functions/aggregate-min-if) | General | ✔ | ✔ | | | [STDDEV\_POP](../07-aggregate-functions/aggregate-stddev-pop) | General | ✔ | ✔ | | | [STDDEV\_SAMP](../07-aggregate-functions/aggregate-stddev-samp) | General | ✔ | ✔ | | | [MEDIAN](../07-aggregate-functions/aggregate-median) | General | ✔ | ✔ | | | [QUANTILE\_CONT](../07-aggregate-functions/aggregate-quantile-cont) | General | ✔ | ✔ | | | [QUANTILE\_DISC](../07-aggregate-functions/aggregate-quantile-disc) | General | ✔ | ✔ | | | [KURTOSIS](../07-aggregate-functions/aggregate-kurtosis) | General | ✔ | ✔ | | | [SKEWNESS](../07-aggregate-functions/aggregate-skewness) | General | ✔ | ✔ | | | [SUM](../07-aggregate-functions/aggregate-sum) | General | ✔ | ✔ | | | [SUM\_IF](../07-aggregate-functions/aggregate-sum-if) | General | ✔ | ✔ | | | [CUME\_DIST](cume-dist) | Rank-related | ✔ | | | | [PERCENT\_RANK](percent_rank) | Rank-related | ✔ | ✔ | | | [DENSE\_RANK](dense-rank) | Rank-related | ✔ | ✔ | | | [RANK](rank) | Rank-related | ✔ | ✔ | | | 
[ROW\_NUMBER](row-number) | Rank-related | ✔ | | | | [NTILE](ntile) | Rank-related | ✔ | | | | [FIRST\_VALUE](first-value) | Rank-related | ✔ | ✔ | | | [FIRST](first) | Rank-related | ✔ | ✔ | | | [LAST\_VALUE](last-value) | Rank-related | ✔ | ✔ | | | [LAST](last) | Rank-related | ✔ | ✔ | | | [NTH\_VALUE](nth-value) | Rank-related | ✔ | ✔ | | | [LEAD](lead) | Rank-related | ✔ | | | | [LAG](lag) | Rank-related | ✔ | | | ## Window Syntax [Section titled “Window Syntax”](#window-syntax) ```sql <function> ( [ <arguments> ] ) OVER ( { named window | inline window } ) named window ::= { window_name | ( window_name ) } inline window ::= [ PARTITION BY <expression_list> ] [ ORDER BY <expression_list> ] [ window frame ] ``` The `named window` is a window that is defined in the `WINDOW` clause of the `SELECT` statement, e.g. `SELECT a, SUM(a) OVER w FROM t WINDOW w AS ( inline window )`. The `<function>` is one of ([aggregate function](../07-aggregate-functions), rank function, value function). The `OVER` clause specifies that the function is being used as a window function. The `PARTITION BY` sub-clause allows rows to be grouped into sub-groups, for example by city or by year. The `PARTITION BY` clause is optional; you can analyze an entire group of rows without breaking it into sub-groups. The `ORDER BY` clause orders rows within the window. The `window frame` clause specifies the window frame type and the window frame extent. The `window frame` clause is optional. If you omit it, the default window frame type is `RANGE` and the default window frame extent is `UNBOUNDED PRECEDING AND CURRENT ROW`.
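The default frame matters when the `ORDER BY` key contains ties. Under standard SQL semantics (assumed here, so verify the behavior on your own data), a `RANGE` frame that ends at `CURRENT ROW` includes every peer row sharing the current sort key, whereas a `ROWS` frame stops at the physical row. The Python sketch below models both for a running sum; the function names and sample data are hypothetical.

```python
def running_sum_rows(values):
    """ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW: stop at the current row."""
    total, out = 0, []
    for v in values:
        total += v
        out.append(total)
    return out


def running_sum_range(sort_keys, values):
    """RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW: include all peer rows.

    Assumes rows are already sorted ascending by sort_keys.
    """
    return [
        sum(v for k, v in zip(sort_keys, values) if k <= key)
        for key in sort_keys
    ]


keys = [1, 2, 2, 3]          # the two middle rows are peers (same sort key)
amounts = [10, 20, 30, 40]

print(running_sum_rows(amounts))         # [10, 30, 60, 100]
print(running_sum_range(keys, amounts))  # [10, 60, 60, 100]
```

The two peer rows get different running totals under `ROWS` but the same total under `RANGE`, which is why specifying the frame explicitly is usually safer when the sort key is not unique.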
## Window Frame Syntax [Section titled “Window Frame Syntax”](#window-frame-syntax) `window frame` can be one of the following types: ```sql cumulativeFrame ::= { { ROWS | RANGE } BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW | { ROWS | RANGE } BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING } ``` ```sql slidingFrame ::= { ROWS BETWEEN <N> { PRECEDING | FOLLOWING } AND <N> { PRECEDING | FOLLOWING } | ROWS BETWEEN UNBOUNDED PRECEDING AND <N> { PRECEDING | FOLLOWING } | ROWS BETWEEN <N> { PRECEDING | FOLLOWING } AND UNBOUNDED FOLLOWING } ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) **Create the table** ```sql CREATE TABLE employees ( employee_id INT, first_name VARCHAR, last_name VARCHAR, department VARCHAR, salary INT ); ``` **Insert data** ```sql INSERT INTO employees (employee_id, first_name, last_name, department, salary) VALUES (1, 'John', 'Doe', 'IT', 75000), (2, 'Jane', 'Smith', 'HR', 85000), (3, 'Mike', 'Johnson', 'IT', 90000), (4, 'Sara', 'Williams', 'Sales', 60000), (5, 'Tom', 'Brown', 'HR', 82000), (6, 'Ava', 'Davis', 'Sales', 62000), (7, 'Olivia', 'Taylor', 'IT', 72000), (8, 'Emily', 'Anderson', 'HR', 77000), (9, 'Sophia', 'Lee', 'Sales', 58000), (10, 'Ella', 'Thomas', 'IT', 67000); ``` **Example 1: Ranking employees by salary** In this example, we use the RANK() function to rank employees based on their salaries in descending order. The highest salary will get a rank of 1, and the lowest salary will get the highest rank number.
```sql SELECT employee_id, first_name, last_name, department, salary, RANK() OVER (ORDER BY salary DESC) AS rank FROM employees; ``` Result: | employee\_id | first\_name | last\_name | department | salary | rank | | ------------ | ----------- | ---------- | ---------- | ------ | ---- | | 3 | Mike | Johnson | IT | 90000 | 1 | | 2 | Jane | Smith | HR | 85000 | 2 | | 5 | Tom | Brown | HR | 82000 | 3 | | 8 | Emily | Anderson | HR | 77000 | 4 | | 1 | John | Doe | IT | 75000 | 5 | | 7 | Olivia | Taylor | IT | 72000 | 6 | | 10 | Ella | Thomas | IT | 67000 | 7 | | 6 | Ava | Davis | Sales | 62000 | 8 | | 4 | Sara | Williams | Sales | 60000 | 9 | | 9 | Sophia | Lee | Sales | 58000 | 10 | **Example 2: Calculating the total salary per department** In this example, we use the SUM() function with PARTITION BY to calculate the total salary paid per department. Each row will show the department and the total salary for that department. ```sql SELECT department, SUM(salary) OVER (PARTITION BY department) AS total_salary FROM employees; ``` Result: | department | total\_salary | | ---------- | ------------- | | HR | 244000 | | HR | 244000 | | HR | 244000 | | IT | 304000 | | IT | 304000 | | IT | 304000 | | IT | 304000 | | Sales | 180000 | | Sales | 180000 | | Sales | 180000 | **Example 3: Calculating a running total of salaries per department** In this example, we use the SUM() function with a cumulative window frame to calculate a running total of salaries within each department. The running total is calculated based on the employee’s salary ordered by their employee\_id. 
```sql SELECT employee_id, first_name, last_name, department, salary, SUM(salary) OVER (PARTITION BY department ORDER BY employee_id ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_total FROM employees; ``` Result: | employee\_id | first\_name | last\_name | department | salary | running\_total | | ------------ | ----------- | ---------- | ---------- | ------ | -------------- | | 2 | Jane | Smith | HR | 85000 | 85000 | | 5 | Tom | Brown | HR | 82000 | 167000 | | 8 | Emily | Anderson | HR | 77000 | 244000 | | 1 | John | Doe | IT | 75000 | 75000 | | 3 | Mike | Johnson | IT | 90000 | 165000 | | 7 | Olivia | Taylor | IT | 72000 | 237000 | | 10 | Ella | Thomas | IT | 67000 | 304000 | | 4 | Sara | Williams | Sales | 60000 | 60000 | | 6 | Ava | Davis | Sales | 62000 | 122000 | | 9 | Sophia | Lee | Sales | 58000 | 180000 | # CUME_DIST (Lakehouse v1) > CUME_DIST — returns the cumulative distribution of a given value in a set of values. Returns the cumulative distribution of a given value in a set of values. It is calculated as the number of rows with values less than or equal to the specified value, divided by the total number of rows. The resulting value is greater than 0 and at most 1.
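The definition can be checked with a small Python sketch. The `cume_dist` helper below models the calculation for one partition ordered ascending, and `percent_rank` is included for comparison using the standard `(rank - 1) / (rows - 1)` formula; both helpers and the sample scores are illustrative assumptions, not part of the product API.

```python
def cume_dist(values):
    """Fraction of rows whose value is <= the current row's value."""
    n = len(values)
    return [sum(1 for other in values if other <= v) / n for v in values]


def percent_rank(values):
    """(rank - 1) / (rows - 1), where rank - 1 is the count of smaller values."""
    n = len(values)
    if n <= 1:
        return [0.0] * n  # a single-row partition has percent rank 0
    return [sum(1 for other in values if other < v) / (n - 1) for v in values]


scores = [62, 62, 72, 72]  # one partition with tied values

print(cume_dist(scores))     # [0.5, 0.5, 1.0, 1.0]
print(percent_rank(scores))  # [0.0, 0.0, 0.6666666666666666, 0.6666666666666666]
```

Tied rows share the same result in both functions, but `cume_dist` never returns 0, while the lowest value always has a `percent_rank` of 0.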
See also: [PERCENT\_RANK](../percent_rank) ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.cume_dist().over(partition_by=[], order_by=[]) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python table.name, table.score, table.grade, func.cume_dist().over(partition_by=[table.grade], order_by=table.score).alias('cume_dist_val') name |score|grade|cume_dist_val| --------+-----+-----+-------------+ Smith | 81|A | 0.25| Davies | 84|A | 0.5| Evans | 87|A | 0.75| Johnson | 100|A | 1.0| Taylor | 62|B | 0.5| Brown | 62|B | 0.5| Wilson | 72|B | 1.0| Thomas | 72|B | 1.0| Jones | 55|C | 1.0| Williams| 55|C | 1.0| ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql CUME_DIST() OVER ( PARTITION BY expr, ... ORDER BY expr [ASC | DESC], ... ) ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) This example retrieves the students’ names, scores, grades, and the cumulative distribution values (cume\_dist\_val) within each grade using the CUME\_DIST() window function. ```sql CREATE TABLE students ( name VARCHAR(20), score INT NOT NULL, grade CHAR(1) NOT NULL ); INSERT INTO students (name, score, grade) VALUES ('Smith', 81, 'A'), ('Jones', 55, 'C'), ('Williams', 55, 'C'), ('Taylor', 62, 'B'), ('Brown', 62, 'B'), ('Davies', 84, 'A'), ('Evans', 87, 'A'), ('Wilson', 72, 'B'), ('Thomas', 72, 'B'), ('Johnson', 100, 'A'); SELECT name, score, grade, CUME_DIST() OVER (PARTITION BY grade ORDER BY score) AS cume_dist_val FROM students; name |score|grade|cume_dist_val| --------+-----+-----+-------------+ Smith | 81|A | 0.25| Davies | 84|A | 0.5| Evans | 87|A | 0.75| Johnson | 100|A | 1.0| Taylor | 62|B | 0.5| Brown | 62|B | 0.5| Wilson | 72|B | 1.0| Thomas | 72|B | 1.0| Jones | 55|C | 1.0| Williams| 55|C | 1.0| ``` # DENSE_RANK (Lakehouse v1) > DENSE_RANK — returns the rank of a value within a group of values, without gaps in the ranks. 
Returns the rank of a value within a group of values, without gaps in the ranks. The rank value starts at 1 and continues up sequentially. If two values are the same, they have the same rank. ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.dense_rank().over(partition_by=[<columns>], order_by=[<columns>]) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python table.department, func.sum(table.salary).alias('total_salary'), func.dense_rank().over(order_by=func.sum(table.salary).desc()).alias('dense_rank') | department | total_salary | dense_rank | |------------|--------------|------------| | IT | 172000 | 1 | | HR | 160000 | 2 | | Sales | 77000 | 3 | ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql DENSE_RANK() OVER ( [ PARTITION BY <expr1> ] ORDER BY <expr2> [ ASC | DESC ] [ <window_frame> ] ) ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) **Create the table** ```sql CREATE TABLE employees ( employee_id INT, first_name VARCHAR, last_name VARCHAR, department VARCHAR, salary INT ); ``` **Insert data** ```sql INSERT INTO employees (employee_id, first_name, last_name, department, salary) VALUES (1, 'John', 'Doe', 'IT', 90000), (2, 'Jane', 'Smith', 'HR', 85000), (3, 'Mike', 'Johnson', 'IT', 82000), (4, 'Sara', 'Williams', 'Sales', 77000), (5, 'Tom', 'Brown', 'HR', 75000); ``` **Ranking departments by total salary using DENSE\_RANK** ```sql SELECT department, SUM(salary) AS total_salary, DENSE_RANK() OVER (ORDER BY SUM(salary) DESC) AS dense_rank FROM employees GROUP BY department; ``` Result: | department | total\_salary | dense\_rank | | ---------- | ------------- | ----------- | | IT | 172000 | 1 | | HR | 160000 | 2 | | Sales | 77000 | 3 | # FIRST (Lakehouse v1) > FIRST — alias for the FIRST_VALUE window function. Alias for [FIRST\_VALUE](../first-value). # FIRST_VALUE (Lakehouse v1) > FIRST_VALUE — Returns the first value from an ordered group of values. Returns the first value from an ordered group of values.
See also: * [LAST\_VALUE](../last-value) * [NTH\_VALUE](../nth-value) ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.first_value().over(partition_by=[], order_by=[]) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python table.employee_id, table.first_name, table.last_name, table.salary, func.first_value(table.first_name).over(order_by=table.salary.desc()).alias('highest_salary_first_name') employee_id | first_name | last_name | salary | highest_salary_first_name ------------+------------+-----------+---------+-------------------------- 4 | Mary | Williams | 7000.00 | Mary 2 | Jane | Smith | 6000.00 | Mary 3 | David | Johnson | 5500.00 | Mary 1 | John | Doe | 5000.00 | Mary 5 | Michael | Brown | 4500.00 | Mary ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql FIRST_VALUE(expression) OVER ([PARTITION BY partition_expression] ORDER BY order_expression [window_frame]) ``` For the syntax of window frame, see [Window Frame Syntax](..#window-frame-syntax). 
## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql CREATE TABLE employees ( employee_id INT, first_name VARCHAR(50), last_name VARCHAR(50), salary DECIMAL(10,2) ); INSERT INTO employees (employee_id, first_name, last_name, salary) VALUES (1, 'John', 'Doe', 5000.00), (2, 'Jane', 'Smith', 6000.00), (3, 'David', 'Johnson', 5500.00), (4, 'Mary', 'Williams', 7000.00), (5, 'Michael', 'Brown', 4500.00); -- Use FIRST_VALUE to retrieve the first name of the employee with the highest salary SELECT employee_id, first_name, last_name, salary, FIRST_VALUE(first_name) OVER (ORDER BY salary DESC) AS highest_salary_first_name FROM employees; employee_id | first_name | last_name | salary | highest_salary_first_name ------------+------------+-----------+---------+-------------------------- 4 | Mary | Williams | 7000.00 | Mary 2 | Jane | Smith | 6000.00 | Mary 3 | David | Johnson | 5500.00 | Mary 1 | John | Doe | 5000.00 | Mary 5 | Michael | Brown | 4500.00 | Mary ``` # LAG (Lakehouse v1) > LAG — allows you to access the value of a column from a preceding row within the same result set. LAG allows you to access the value of a column from a preceding row within the same result set. It is typically used to retrieve the value of a column in the previous row, based on a specified ordering.
See also: [LEAD](../lead) ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.lag(, ).over(partition_by=[], order_by=[]) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python table.product_name, table.sale_amount, func.lag(table.sale_amount, 1).over(partition_by=table.product_name, order_by=table.sale_id).alias('previous_sale_amount') product_name | sale_amount | previous_sale_amount ----------------------------------------------- Product A | 1000.00 | NULL Product A | 1500.00 | 1000.00 Product A | 2000.00 | 1500.00 Product B | 500.00 | NULL Product B | 800.00 | 500.00 Product B | 1200.00 | 800.00 ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql LAG(expression [, offset [, default]]) OVER (PARTITION BY partition_expression ORDER BY sort_expression) ``` * *offset*: Specifies the number of rows ahead (LEAD) or behind (LAG) the current row within the partition to retrieve the value from. Defaults to 1. > Note that setting a negative offset has the same effect as using the [LEAD](../lead) function. * *default*: Specifies a value to be returned if the LEAD or LAG function encounters a situation where there is no value available due to the offset exceeding the partition’s boundaries. Defaults to NULL. 
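The `offset` and `default` arguments can be modeled with a short Python sketch over one ordered partition. The `lag` and `lead` helpers below are hypothetical and use `None` in place of SQL NULL; they only illustrate the semantics described above, including the equivalence between a negative offset and the opposite function.

```python
def lag(values, offset=1, default=None):
    """Value from `offset` rows before the current row, or `default` if out of range."""
    n = len(values)
    return [
        values[i - offset] if 0 <= i - offset < n else default
        for i in range(n)
    ]


def lead(values, offset=1, default=None):
    """Value from `offset` rows after the current row; same as lag with a negated offset."""
    return lag(values, -offset, default)


sales = [1000.0, 1500.0, 2000.0]  # one partition, already ordered

print(lag(sales))        # [None, 1000.0, 1500.0]
print(lag(sales, -1))    # [1500.0, 2000.0, None]
print(lead(sales))       # [1500.0, 2000.0, None]
print(lag(sales, 1, 0))  # [0, 1000.0, 1500.0]
```

Supplying a `default` is useful when the result feeds arithmetic, because it avoids propagating NULL for the first (or last) row of each partition.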
## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql CREATE TABLE sales ( sale_id INT, product_name VARCHAR(50), sale_amount DECIMAL(10, 2) ); INSERT INTO sales (sale_id, product_name, sale_amount) VALUES (1, 'Product A', 1000.00), (2, 'Product A', 1500.00), (3, 'Product A', 2000.00), (4, 'Product B', 500.00), (5, 'Product B', 800.00), (6, 'Product B', 1200.00); SELECT product_name, sale_amount, LAG(sale_amount) OVER (PARTITION BY product_name ORDER BY sale_id) AS previous_sale_amount FROM sales; product_name | sale_amount | previous_sale_amount ----------------------------------------------- Product A | 1000.00 | NULL Product A | 1500.00 | 1000.00 Product A | 2000.00 | 1500.00 Product B | 500.00 | NULL Product B | 800.00 | 500.00 Product B | 1200.00 | 800.00 -- The following statements return the same result. SELECT product_name, sale_amount, LAG(sale_amount, -1) OVER (PARTITION BY product_name ORDER BY sale_id) AS next_sale_amount FROM sales; SELECT product_name, sale_amount, LEAD(sale_amount) OVER (PARTITION BY product_name ORDER BY sale_id) AS next_sale_amount FROM sales; product_name|sale_amount|next_sale_amount| ------------+-----------+----------------+ Product A | 1000.00| 1500.00| Product A | 1500.00| 2000.00| Product A | 2000.00| | Product B | 500.00| 800.00| Product B | 800.00| 1200.00| Product B | 1200.00| | ``` # LAST (Lakehouse v1) > LAST — alias for the LAST_VALUE window function. Alias for [LAST\_VALUE](../last-value). # LAST_VALUE (Lakehouse v1) > LAST_VALUE — Returns the last value from an ordered group of values. Returns the last value from an ordered group of values. 
See also: * [FIRST\_VALUE](../first-value) * [NTH\_VALUE](../nth-value) ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.last_value().over(partition_by=[], order_by=[]) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python table.employee_id, table.first_name, table.last_name, table.salary, func.last_value(table.first_name).over(order_by=table.salary.desc()).alias('lowest_salary_first_name') employee_id | first_name | last_name | salary | lowest_salary_first_name ------------+------------+-----------+---------+------------------------ 4 | Mary | Williams | 7000.00 | Michael 2 | Jane | Smith | 6000.00 | Michael 3 | David | Johnson | 5500.00 | Michael 1 | John | Doe | 5000.00 | Michael 5 | Michael | Brown | 4500.00 | Michael ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql LAST_VALUE(expression) OVER ([PARTITION BY partition_expression] ORDER BY order_expression [window_frame]) ``` For the syntax of window frame, see [Window Frame Syntax](..#window-frame-syntax). 
## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql CREATE TABLE employees ( employee_id INT, first_name VARCHAR(50), last_name VARCHAR(50), salary DECIMAL(10,2) ); INSERT INTO employees (employee_id, first_name, last_name, salary) VALUES (1, 'John', 'Doe', 5000.00), (2, 'Jane', 'Smith', 6000.00), (3, 'David', 'Johnson', 5500.00), (4, 'Mary', 'Williams', 7000.00), (5, 'Michael', 'Brown', 4500.00); -- Use LAST_VALUE to retrieve the first name of the employee with the lowest salary SELECT employee_id, first_name, last_name, salary, LAST_VALUE(first_name) OVER (ORDER BY salary DESC) AS lowest_salary_first_name FROM employees; employee_id | first_name | last_name | salary | lowest_salary_first_name ------------+------------+-----------+---------+------------------------ 4 | Mary | Williams | 7000.00 | Michael 2 | Jane | Smith | 6000.00 | Michael 3 | David | Johnson | 5500.00 | Michael 1 | John | Doe | 5000.00 | Michael 5 | Michael | Brown | 4500.00 | Michael ``` # LEAD (Lakehouse v1) > LEAD — allows you to access the value of a column from a subsequent row within the same result set. LEAD allows you to access the value of a column from a subsequent row within the same result set. It is typically used to retrieve the value of a column in the next row, based on a specified ordering.
See also: [LAG](../lag) ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.lead(, ).over(partition_by=[], order_by=[]) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python table.product_name, table.sale_amount, func.lead(table.sale_amount, 1).over(partition_by=table.product_name, order_by=table.sale_id).alias('next_sale_amount') product_name | sale_amount | next_sale_amount ---------------------------------------------- Product A | 1000.00 | 1500.00 Product A | 1500.00 | 2000.00 Product A | 2000.00 | NULL Product B | 500.00 | 800.00 Product B | 800.00 | 1200.00 Product B | 1200.00 | NULL ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql LEAD(expression [, offset [, default]]) OVER (PARTITION BY partition_expression ORDER BY sort_expression) ``` * *offset*: Specifies the number of rows ahead (LEAD) or behind (LAG) the current row within the partition to retrieve the value from. Defaults to 1. > Note that setting a negative offset has the same effect as using the [LAG](../lag) function. * *default*: Specifies a value to be returned if the LEAD or LAG function encounters a situation where there is no value available due to the offset exceeding the partition’s boundaries. Defaults to NULL. 
## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql CREATE TABLE sales ( sale_id INT, product_name VARCHAR(50), sale_amount DECIMAL(10, 2) ); INSERT INTO sales (sale_id, product_name, sale_amount) VALUES (1, 'Product A', 1000.00), (2, 'Product A', 1500.00), (3, 'Product A', 2000.00), (4, 'Product B', 500.00), (5, 'Product B', 800.00), (6, 'Product B', 1200.00); SELECT product_name, sale_amount, LEAD(sale_amount) OVER (PARTITION BY product_name ORDER BY sale_id) AS next_sale_amount FROM sales; product_name | sale_amount | next_sale_amount ---------------------------------------------- Product A | 1000.00 | 1500.00 Product A | 1500.00 | 2000.00 Product A | 2000.00 | NULL Product B | 500.00 | 800.00 Product B | 800.00 | 1200.00 Product B | 1200.00 | NULL -- The following statements return the same result. SELECT product_name, sale_amount, LEAD(sale_amount, -1) OVER (PARTITION BY product_name ORDER BY sale_id) AS previous_sale_amount FROM sales; SELECT product_name, sale_amount, LAG(sale_amount) OVER (PARTITION BY product_name ORDER BY sale_id) AS previous_sale_amount FROM sales; product_name|sale_amount|previous_sale_amount| ------------+-----------+--------------------+ Product A | 1000.00| | Product A | 1500.00| 1000.00| Product A | 2000.00| 1500.00| Product B | 500.00| | Product B | 800.00| 500.00| Product B | 1200.00| 800.00| ``` # NTH_VALUE (Lakehouse v1) > NTH_VALUE — Returns the Nth value from an ordered group of values. Returns the Nth value from an ordered group of values. 
See also: * [FIRST\_VALUE](../first-value) * [LAST\_VALUE](../last-value) ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.nth_value(, ).over(partition_by=[], order_by=[]) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python table.employee_id, table.first_name, table.last_name, table.salary, func.nth_value(table.first_name, 2).over(order_by=table.salary.desc()).alias('second_highest_salary_first_name') employee_id | first_name | last_name | salary | second_highest_salary_first_name ------------+------------+-----------+---------+---------------------------------- 4 | Mary | Williams | 7000.00 | Jane 2 | Jane | Smith | 6000.00 | Jane 3 | David | Johnson | 5500.00 | Jane 1 | John | Doe | 5000.00 | Jane 5 | Michael | Brown | 4500.00 | Jane ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql NTH_VALUE(expression, n) OVER ([PARTITION BY partition_expression] ORDER BY order_expression [window_frame]) ``` For the syntax of window frame, see [Window Frame Syntax](..#window-frame-syntax). 
## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql CREATE TABLE employees ( employee_id INT, first_name VARCHAR(50), last_name VARCHAR(50), salary DECIMAL(10,2) ); INSERT INTO employees (employee_id, first_name, last_name, salary) VALUES (1, 'John', 'Doe', 5000.00), (2, 'Jane', 'Smith', 6000.00), (3, 'David', 'Johnson', 5500.00), (4, 'Mary', 'Williams', 7000.00), (5, 'Michael', 'Brown', 4500.00); -- Use NTH_VALUE to retrieve the first name of the employee with the second highest salary SELECT employee_id, first_name, last_name, salary, NTH_VALUE(first_name, 2) OVER (ORDER BY salary DESC) AS second_highest_salary_first_name FROM employees; employee_id | first_name | last_name | salary | second_highest_salary_first_name ------------+------------+-----------+---------+---------------------------------- 4 | Mary | Williams | 7000.00 | Jane 2 | Jane | Smith | 6000.00 | Jane 3 | David | Johnson | 5500.00 | Jane 1 | John | Doe | 5000.00 | Jane 5 | Michael | Brown | 4500.00 | Jane ``` # NTILE (Lakehouse v1) > NTILE — divides the sorted result set into a specified number of buckets or groups. Divides the sorted result set into a specified number of buckets or groups. It evenly distributes the sorted rows into these buckets and assigns a bucket number to each row. The NTILE function is typically used with the ORDER BY clause to sort the results. Please note that the NTILE function evenly distributes the rows into buckets based on the sorting order of the rows and ensures that the number of rows in each bucket is as equal as possible. If the number of rows cannot be evenly distributed into the buckets, some buckets may have one extra row compared to the others. 
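The distribution rule can be sketched in Python. The `ntile` helper below is a hypothetical model that assigns bucket numbers to an already sorted partition of a given size, giving the extra rows to the lowest-numbered buckets, which is the conventional behavior assumed here.

```python
def ntile(num_rows, n_buckets):
    """Bucket number for each row of a sorted partition with num_rows rows."""
    base, extra = divmod(num_rows, n_buckets)
    buckets = []
    for bucket in range(1, n_buckets + 1):
        # The first `extra` buckets each receive one additional row.
        size = base + (1 if bucket <= extra else 0)
        buckets.extend([bucket] * size)
    return buckets


print(ntile(4, 3))   # [1, 1, 2, 3]
print(ntile(2, 3))   # [1, 2]
print(ntile(10, 3))  # [1, 1, 1, 1, 2, 2, 2, 3, 3, 3]
```

When there are fewer rows than buckets, the higher-numbered buckets simply stay empty, as the two-row case shows.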
## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.ntile(<n>).over(partition_by=[<columns>], order_by=[<columns>]) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python table.name, table.score, table.grade, func.ntile(3).over(partition_by=[table.grade], order_by=table.score.desc()).alias('bucket') name |score|grade|bucket| --------+-----+-----+------+ Johnson | 100|A | 1| Evans | 87|A | 1| Davies | 84|A | 2| Smith | 81|A | 3| Wilson | 72|B | 1| Thomas | 72|B | 1| Taylor | 62|B | 2| Brown | 62|B | 3| Jones | 55|C | 1| Williams| 55|C | 2| ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql NTILE(n) OVER ( PARTITION BY expr, ... ORDER BY expr [ASC | DESC], ... ) ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) This example retrieves the students’ names, scores, and grades, and assigns them to buckets based on their scores within each grade using the NTILE() window function. ```sql CREATE TABLE students ( name VARCHAR(20), score INT NOT NULL, grade CHAR(1) NOT NULL ); INSERT INTO students (name, score, grade) VALUES ('Smith', 81, 'A'), ('Jones', 55, 'C'), ('Williams', 55, 'C'), ('Taylor', 62, 'B'), ('Brown', 62, 'B'), ('Davies', 84, 'A'), ('Evans', 87, 'A'), ('Wilson', 72, 'B'), ('Thomas', 72, 'B'), ('Johnson', 100, 'A'); SELECT name, score, grade, ntile(3) OVER (PARTITION BY grade ORDER BY score DESC) AS bucket FROM students; name |score|grade|bucket| --------+-----+-----+------+ Johnson | 100|A | 1| Evans | 87|A | 1| Davies | 84|A | 2| Smith | 81|A | 3| Wilson | 72|B | 1| Thomas | 72|B | 1| Taylor | 62|B | 2| Brown | 62|B | 3| Jones | 55|C | 1| Williams| 55|C | 2| ``` # PERCENT_RANK (Lakehouse v1) > PERCENT_RANK — returns the relative rank of a given value within a set of values. Returns the relative rank of a given value within a set of values. The resulting value falls between 0 and 1, inclusive. Please note that the first row in any set has a PERCENT\_RANK of 0.
See also: [CUME\_DIST](../cume-dist) ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.percent_rank().over(partition_by=[], order_by=[]) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python table.name, table.score, table.grade, func.percent_rank().over(partition_by=[table.grade], order_by=table.score).alias('percent_rank') name |score|grade|percent_rank | --------+-----+-----+------------------+ Smith | 81|A | 0.0| Davies | 84|A |0.3333333333333333| Evans | 87|A |0.6666666666666666| Johnson | 100|A | 1.0| Taylor | 62|B | 0.0| Brown | 62|B | 0.0| Wilson | 72|B |0.6666666666666666| Thomas | 72|B |0.6666666666666666| Jones | 55|C | 0.0| Williams| 55|C | 0.0| ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql PERCENT_RANK() OVER ( PARTITION BY expr, ... ORDER BY expr [ASC | DESC], ... ) ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) This example retrieves the students’ names, scores, grades, and the percentile ranks (percent\_rank) within each grade using the PERCENT\_RANK() window function. 
```sql CREATE TABLE students ( name VARCHAR(20), score INT NOT NULL, grade CHAR(1) NOT NULL ); INSERT INTO students (name, score, grade) VALUES ('Smith', 81, 'A'), ('Jones', 55, 'C'), ('Williams', 55, 'C'), ('Taylor', 62, 'B'), ('Brown', 62, 'B'), ('Davies', 84, 'A'), ('Evans', 87, 'A'), ('Wilson', 72, 'B'), ('Thomas', 72, 'B'), ('Johnson', 100, 'A'); SELECT name, score, grade, PERCENT_RANK() OVER (PARTITION BY grade ORDER BY score) AS percent_rank FROM students; name |score|grade|percent_rank | --------+-----+-----+------------------+ Smith | 81|A | 0.0| Davies | 84|A |0.3333333333333333| Evans | 87|A |0.6666666666666666| Johnson | 100|A | 1.0| Taylor | 62|B | 0.0| Brown | 62|B | 0.0| Wilson | 72|B |0.6666666666666666| Thomas | 72|B |0.6666666666666666| Jones | 55|C | 0.0| Williams| 55|C | 0.0| ``` # RANK (Lakehouse v1) > RANK — assigns a rank to each value within an ordered group of values. The RANK() function assigns a rank to each value within an ordered group of values. The rank starts at 1 and increases in sort order. If two values are the same, they have the same rank.
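A minimal sketch of the tie behaviour, assuming standard SQL RANK semantics: tied values share a rank, and the next distinct value skips ahead by the number of tied rows. The helper name `rank_desc` is hypothetical and only illustrates the rule for a descending sort.

```python
def rank_desc(values):
    """Return the rank of each value when sorted in descending order.

    Hypothetical helper: a value's rank is 1 plus the number of values
    strictly greater than it, so ties share a rank and leave a gap after them.
    """
    return [1 + sum(other > value for other in values) for value in values]


# Distinct salaries rank 1 through 5.
print(rank_desc([90000, 85000, 82000, 77000, 75000]))  # [1, 2, 3, 4, 5]
# Two tied values share rank 2, and the next value is ranked 4.
print(rank_desc([90, 85, 85, 70]))  # [1, 2, 2, 4]
```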
## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.rank().over(partition_by=[], order_by=[]) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python table.employee_id, table.first_name, table.last_name, table.department, table.salary, func.rank().over(order_by=table.salary.desc()).alias('rank') | employee_id | first_name | last_name | department | salary | rank | |-------------|------------|-----------|------------|--------|------| | 1 | John | Doe | IT | 90000 | 1 | | 2 | Jane | Smith | HR | 85000 | 2 | | 3 | Mike | Johnson | IT | 82000 | 3 | | 4 | Sara | Williams | Sales | 77000 | 4 | | 5 | Tom | Brown | HR | 75000 | 5 | ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql RANK() OVER ( [ PARTITION BY expr1 ] ORDER BY expr2 [ { ASC | DESC } ] [ window_frame ] ) ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) **Create the table** ```sql CREATE TABLE employees ( employee_id INT, first_name VARCHAR, last_name VARCHAR, department VARCHAR, salary INT ); ``` **Insert data** ```sql INSERT INTO employees (employee_id, first_name, last_name, department, salary) VALUES (1, 'John', 'Doe', 'IT', 90000), (2, 'Jane', 'Smith', 'HR', 85000), (3, 'Mike', 'Johnson', 'IT', 82000), (4, 'Sara', 'Williams', 'Sales', 77000), (5, 'Tom', 'Brown', 'HR', 75000); ``` **Ranking employees by salary** ```sql SELECT employee_id, first_name, last_name, department, salary, RANK() OVER (ORDER BY salary DESC) AS rank FROM employees; ``` Result: | employee\_id | first\_name | last\_name | department | salary | rank | | ------------ | ----------- | ---------- | ---------- | ------ | ---- | | 1 | John | Doe | IT | 90000 | 1 | | 2 | Jane | Smith | HR | 85000 | 2 | | 3 | Mike | Johnson | IT | 82000 | 3 | | 4 | Sara | Williams | Sales | 77000 | 4 | | 5 | Tom | Brown | HR | 75000 | 5 | # ROW_NUMBER (Lakehouse v1) > ROW_NUMBER — assigns a temporary sequential number to each row within a partition of a result set.
Assigns a temporary sequential number to each row within a partition of a result set, starting at 1 for the first row in each partition. ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.row_number().over(partition_by=[], order_by=[]) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python table.employee_id, table.first_name, table.last_name, table.department, table.salary, func.row_number().over(partition_by=table.department, order_by=table.salary.desc()).alias('row_num') ┌──────────────────────────────────────────────────────────────────────────────────────────────────────┐ │ employee_id │ first_name │ last_name │ department │ salary │ row_num │ ├─────────────────┼──────────────────┼──────────────────┼──────────────────┼─────────────────┼─────────┤ │ 2 │ Jane │ Smith │ HR │ 85000 │ 1 │ │ 5 │ Tom │ Brown │ HR │ 75000 │ 2 │ │ 1 │ John │ Doe │ IT │ 90000 │ 1 │ │ 3 │ Mike │ Johnson │ IT │ 82000 │ 2 │ │ 4 │ Sara │ Williams │ Sales │ 77000 │ 1 │ └──────────────────────────────────────────────────────────────────────────────────────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql ROW_NUMBER() OVER ( [ PARTITION BY expr1 [, expr2 ... ] ] ORDER BY expr3 [ , expr4 ... ] [ { ASC | DESC } ] ) ``` | Parameter | Required? | Description | | ---------- | --------- | ---------------------------------------------------------------------------------- | | ORDER BY | Yes | Specifies the order of rows within each partition. | | ASC / DESC | No | Specifies the sorting order within each partition. ASC (ascending) is the default. | | QUALIFY | No | Filters rows based on conditions. | ## SQL Examples [Section titled “SQL Examples”](#sql-examples) This example demonstrates the use of ROW\_NUMBER() to assign sequential numbers to employees within their departments, ordered by descending salary.
```sql -- Prepare the data CREATE TABLE employees ( employee_id INT, first_name VARCHAR, last_name VARCHAR, department VARCHAR, salary INT ); INSERT INTO employees (employee_id, first_name, last_name, department, salary) VALUES (1, 'John', 'Doe', 'IT', 90000), (2, 'Jane', 'Smith', 'HR', 85000), (3, 'Mike', 'Johnson', 'IT', 82000), (4, 'Sara', 'Williams', 'Sales', 77000), (5, 'Tom', 'Brown', 'HR', 75000); -- Select employee details along with the row number partitioned by department and ordered by salary in descending order. SELECT employee_id, first_name, last_name, department, salary, ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) AS row_num FROM employees; ┌──────────────────────────────────────────────────────────────────────────────────────────────────────┐ │ employee_id │ first_name │ last_name │ department │ salary │ row_num │ ├─────────────────┼──────────────────┼──────────────────┼──────────────────┼─────────────────┼─────────┤ │ 2 │ Jane │ Smith │ HR │ 85000 │ 1 │ │ 5 │ Tom │ Brown │ HR │ 75000 │ 2 │ │ 1 │ John │ Doe │ IT │ 90000 │ 1 │ │ 3 │ Mike │ Johnson │ IT │ 82000 │ 2 │ │ 4 │ Sara │ Williams │ Sales │ 77000 │ 1 │ └──────────────────────────────────────────────────────────────────────────────────────────────────────┘ ``` # Geography Functions (Lakehouse v1) > Lakehouse v1 SQL geography functions: compute distances, bearings, and containment between geographic points and shapes on a spheroid. This section provides reference information for the geography functions in PlaidCloud Lakehouse. These functions are based on the [H3 hierarchical hexagonal indexing system](https://www.uber.com/blog/h3/) developed by Uber for computing geographic relationships. 
## Coordinate Conversion [Section titled “Coordinate Conversion”](#coordinate-conversion) * [GEO\_TO\_H3](geo-to-h3) * [GEOHASH\_DECODE](geohash-decode) * [GEOHASH\_ENCODE](geohash-encode) * [STRING\_TO\_H3](string-to-h3) * [H3\_TO\_GEO](h3-to-geo) * [H3\_TO\_STRING](h3-to-string) ## Hexagon Properties [Section titled “Hexagon Properties”](#hexagon-properties) * [H3\_CELL\_AREA\_M2](h3-cell-area-m2) * [H3\_CELL\_AREA\_RADS2](h3-cell-area-rads2) * [H3\_HEX\_AREA\_KM2](h3-hex-area-km2) * [H3\_HEX\_AREA\_M2](h3-hex-area-m2) * [H3\_GET\_BASE\_CELL](h3-get-base-cell) * [H3\_GET\_FACES](h3-get-faces) * [H3\_GET\_RESOLUTION](h3-get-resolution) * [H3\_TO\_CENTER\_CHILD](h3-to-center-child) * [H3\_TO\_CHILDREN](h3-to-children) * [H3\_TO\_GEO\_BOUNDARY](h3-to-geo-boundary) * [H3\_TO\_PARENT](h3-to-parent) * [H3\_NUM\_HEXAGONS](h3-num-hexagons) ## Hexagon Relationships [Section titled “Hexagon Relationships”](#hexagon-relationships) * [H3\_HEX\_RING](h3-hex-ring) * [H3\_K\_RING](h3-k-ring) * [H3\_INDEXES\_ARE\_NEIGHBORS](h3-indexes-are-neighbors) * [H3\_IS\_PENTAGON](h3-is-pentagon) * [H3\_IS\_RES\_CLASS\_III](h3-is-res-class-iii) * [H3\_IS\_VALID](h3-is-valid) * [H3\_GET\_DESTINATION\_INDEX\_FROM\_UNIDIRECTIONAL\_EDGE](h3-get-destination-index-from-unidirectional-edge) * [H3\_GET\_INDEXES\_FROM\_UNIDIRECTIONAL\_EDGE](h3-get-indexes-from-unidirectional-edge) * [H3\_GET\_ORIGIN\_INDEX\_FROM\_UNIDIRECTIONAL\_EDGE](h3-get-origin-index-from-unidirectional-edge) * [H3\_GET\_UNIDIRECTIONAL\_EDGE\_BOUNDARY](h3-get-unidirectional-edge-boundary) * [H3\_GET\_UNIDIRECTIONAL\_EDGE](h3-get-unidirectional-edge) * [H3\_GET\_UNIDIRECTIONAL\_EDGES\_FROM\_HEXAGON](h3-get-unidirectional-edges-from-hexagon) * [H3\_UNIDIRECTIONAL\_EDGE\_IS\_VALID](h3-unidirectional-edge-is-valid) ## Measurement [Section titled “Measurement”](#measurement) * [H3\_DISTANCE](h3-distance) * [H3\_EDGE\_ANGLE](h3-edge-angle) * [H3\_EDGE\_LENGTH\_KM](h3-edge-length-km) * [H3\_EDGE\_LENGTH\_M](h3-edge-length-m) * 
[H3\_EXACT\_EDGE\_LENGTH\_KM](h3-exact-edge-length-km) * [H3\_EXACT\_EDGE\_LENGTH\_M](h3-exact-edge-length-m) * [H3\_EXACT\_EDGE\_LENGTH\_RADS](h3-exact-edge-length-rads) ## General Utility [Section titled “General Utility”](#general-utility) * [POINT\_IN\_POLYGON](point-in-polygon) * [H3\_LINE](h3-line) # GEO_TO_H3 (Lakehouse v1) > GEO_TO_H3 — returns the H3 index of the hexagon cell where the given location resides. Returns the [H3](https://eng.uber.com/h3/) index of the hexagon cell where the given location resides. Returning 0 means an error occurred. ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.geo_to_h3(lon, lat, res) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.geo_to_h3(37.79506683, 55.71290588, 15) ┌──────────────────────────────────────────────┐ │ func.geo_to_h3(37.79506683, 55.71290588, 15) │ ├──────────────────────────────────────────────┤ │ 644325524701193974 │ └──────────────────────────────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql GEO_TO_H3(lon, lat, res) ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT GEO_TO_H3(37.79506683, 55.71290588, 15); ┌─────────────────────────────────────────┐ │ geo_to_h3(37.79506683, 55.71290588, 15) │ ├─────────────────────────────────────────┤ │ 644325524701193974 │ └─────────────────────────────────────────┘ ``` # GEOHASH_DECODE (Lakehouse v1) > GEOHASH_DECODE — converts a Geohash-encoded string into latitude/longitude coordinates. Converts a [Geohash](https://en.wikipedia.org/wiki/Geohash)-encoded string into latitude/longitude coordinates. 
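To show what decoding involves, here is a self-contained sketch of the public Geohash algorithm in plain Python. It is not the Lakehouse implementation; the helper name `geohash_decode` and the choice to return the cell centre as a `(longitude, latitude)` tuple are assumptions made to match the output order shown in the examples on this page.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # Geohash alphabet (no a, i, l, o)


def geohash_decode(geohash):
    """Return the (longitude, latitude) centre of the cell named by a Geohash string.

    Hypothetical helper: each character carries 5 bits, and the bits alternate
    between halving the longitude range and halving the latitude range.
    """
    lon_range = [-180.0, 180.0]
    lat_range = [-90.0, 90.0]
    use_lon = True  # the first bit always refines longitude
    for char in geohash:
        value = BASE32.index(char)
        for shift in range(4, -1, -1):
            bit = (value >> shift) & 1
            rng = lon_range if use_lon else lat_range
            mid = (rng[0] + rng[1]) / 2
            if bit:
                rng[0] = mid  # keep the upper half
            else:
                rng[1] = mid  # keep the lower half
            use_lon = not use_lon
    return ((lon_range[0] + lon_range[1]) / 2, (lat_range[0] + lat_range[1]) / 2)


print(geohash_decode("ezs42"))  # (-5.60302734375, 42.60498046875)
```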
## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.geohash_decode(geohash) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.geohash_decode('ezs42') ┌─────────────────────────────────┐ │ func.geohash_decode('ezs42') │ ├─────────────────────────────────┤ │ (-5.60302734375,42.60498046875) │ └─────────────────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql GEOHASH_DECODE(geohash) ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT GEOHASH_DECODE('ezs42'); ┌─────────────────────────────────┐ │ geohash_decode('ezs42') │ ├─────────────────────────────────┤ │ (-5.60302734375,42.60498046875) │ └─────────────────────────────────┘ ``` # GEOHASH_ENCODE (Lakehouse v1) > GEOHASH_ENCODE — converts a pair of latitude and longitude coordinates into a Geohash-encoded string. Converts a pair of latitude and longitude coordinates into a [Geohash](https://en.wikipedia.org/wiki/Geohash)-encoded string.
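The encoding direction can be sketched the same way. The helper below is a plain-Python illustration of the public Geohash algorithm, not the Lakehouse implementation; the name `geohash_encode`, the `precision` parameter, and its default of 12 characters are assumptions chosen to match the length of the output in the examples on this page.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # Geohash alphabet (no a, i, l, o)


def geohash_encode(lon, lat, precision=12):
    """Encode a longitude/latitude pair as a Geohash string of `precision` characters.

    Hypothetical helper: bits alternate between longitude and latitude, each bit
    recording which half of the current range contains the coordinate.
    """
    lon_range = [-180.0, 180.0]
    lat_range = [-90.0, 90.0]
    chars = []
    use_lon = True  # the first bit always describes longitude
    bit_count = 0
    value = 0
    while len(chars) < precision:
        rng, coord = (lon_range, lon) if use_lon else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if coord >= mid:
            value = (value << 1) | 1
            rng[0] = mid  # coordinate is in the upper half
        else:
            value <<= 1
            rng[1] = mid  # coordinate is in the lower half
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:  # five bits make one base-32 character
            chars.append(BASE32[value])
            bit_count = 0
            value = 0
    return "".join(chars)


print(geohash_encode(-5.60302734375, 42.593994140625))  # ezs42d000000
```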
## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.geohash_encode(lon, lat) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.geohash_encode(-5.60302734375, 42.593994140625) ┌─────────────────────────────────────────────────────────┐ │ func.geohash_encode((- 5.60302734375), 42.593994140625) │ ├─────────────────────────────────────────────────────────┤ │ ezs42d000000 │ └─────────────────────────────────────────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql GEOHASH_ENCODE(lon, lat) ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT GEOHASH_ENCODE(-5.60302734375, 42.593994140625); ┌────────────────────────────────────────────────────┐ │ geohash_encode((- 5.60302734375), 42.593994140625) │ ├────────────────────────────────────────────────────┤ │ ezs42d000000 │ └────────────────────────────────────────────────────┘ ``` # H3_CELL_AREA_M2 (Lakehouse v1) > H3_CELL_AREA_M2 — returns the exact area of a specific cell in square meters. Returns the exact area of a specific cell in square meters.
## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.h3_cell_area_m2(h3) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.h3_cell_area_m2(599119489002373119) ┌──────────────────────────────────────────┐ │ func.h3_cell_area_m2(599119489002373119) │ ├──────────────────────────────────────────┤ │ 127785582.60809991 │ └──────────────────────────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql H3_CELL_AREA_M2(h3) ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT H3_CELL_AREA_M2(599119489002373119); ┌─────────────────────────────────────┐ │ h3_cell_area_m2(599119489002373119) │ ├─────────────────────────────────────┤ │ 127785582.60809991 │ └─────────────────────────────────────┘ ``` # H3_CELL_AREA_RADS2 (Lakehouse v1) > H3_CELL_AREA_RADS2 — returns the exact area of a specific cell in square radians. Returns the exact area of a specific cell in square radians. ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.h3_cell_area_rads2(h3) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.h3_cell_area_rads2(599119489002373119) ┌─────────────────────────────────────────────┐ │ func.h3_cell_area_rads2(599119489002373119) │ ├─────────────────────────────────────────────┤ │ 0.000003148224310427697 │ └─────────────────────────────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql H3_CELL_AREA_RADS2(h3) ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT H3_CELL_AREA_RADS2(599119489002373119); ┌────────────────────────────────────────┐ │ h3_cell_area_rads2(599119489002373119) │ ├────────────────────────────────────────┤ │ 0.000003148224310427697 │ └────────────────────────────────────────┘ ``` # H3_DISTANCE (Lakehouse v1) > H3_DISTANCE — returns the grid distance between the given two H3 indexes.
Returns the grid distance between the given two [H3](https://eng.uber.com/h3/) indexes. Note: H3 distance calculations only work between hexes that are neighbors. Using this function with non-neighbor hexes raises an error. ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.h3_distance(h3, a_h3) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.h3_distance(599119489002373119, 599119491149856767) ┌──────────────────────────────────────────────────────────┐ │ func.h3_distance(599119489002373119, 599119491149856767) │ ├──────────────────────────────────────────────────────────┤ │ 1 │ └──────────────────────────────────────────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql H3_DISTANCE(h3, a_h3) ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT H3_DISTANCE(599119489002373119, 599119491149856767); ┌─────────────────────────────────────────────────────┐ │ h3_distance(599119489002373119, 599119491149856767) │ ├─────────────────────────────────────────────────────┤ │ 1 │ └─────────────────────────────────────────────────────┘ ``` # H3_EDGE_ANGLE (Lakehouse v1) > H3_EDGE_ANGLE — returns the average length of the H3 hexagon edge in degrees. Returns the average length of the H3 hexagon edge in degrees.
## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.h3_edge_angle(res) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.h3_edge_angle(10) ┌────────────────────────────┐ │ func.h3_edge_angle(10) │ ├────────────────────────────┤ │ 0.0006822586214153981 │ └────────────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql H3_EDGE_ANGLE(res) ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT H3_EDGE_ANGLE(10); ┌───────────────────────┐ │ h3_edge_angle(10) │ ├───────────────────────┤ │ 0.0006822586214153981 │ └───────────────────────┘ ``` # H3_EDGE_LENGTH_KM (Lakehouse v1) > H3_EDGE_LENGTH_KM — returns the average hexagon edge length in kilometers at the given resolution. Returns the average hexagon edge length in kilometers at the given resolution. Excludes pentagons. ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.h3_edge_length_km(res) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.h3_edge_length_km(1) ┌───────────────────────────┐ │ func.h3_edge_length_km(1) │ ├───────────────────────────┤ │ 483.0568390711111 │ └───────────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql H3_EDGE_LENGTH_KM(res) ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT H3_EDGE_LENGTH_KM(1); ┌──────────────────────┐ │ h3_edge_length_km(1) │ ├──────────────────────┤ │ 483.0568390711111 │ └──────────────────────┘ ``` # H3_EDGE_LENGTH_M (Lakehouse v1) > H3_EDGE_LENGTH_M — returns the average hexagon edge length in meters at the given resolution. Returns the average hexagon edge length in meters at the given resolution. Excludes pentagons. 
## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.h3_edge_length_m(res) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.h3_edge_length_m(1) ┌──────────────────────────┐ │ func.h3_edge_length_m(1) │ ├──────────────────────────┤ │ 483056.8390711111 │ └──────────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql H3_EDGE_LENGTH_M(res) ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT H3_EDGE_LENGTH_M(1); ┌─────────────────────┐ │ h3_edge_length_m(1) │ ├─────────────────────┤ │ 483056.8390711111 │ └─────────────────────┘ ``` # H3_EXACT_EDGE_LENGTH_KM (Lakehouse v1) > H3_EXACT_EDGE_LENGTH_KM — computes the length of this directed edge, in kilometers. Computes the length of this directed edge, in kilometers. ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.h3_exact_edge_length_km(h3) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.h3_exact_edge_length_km(1319695429381652479) ┌───────────────────────────────────────────────────┐ │ func.h3_exact_edge_length_km(1319695429381652479) │ ├───────────────────────────────────────────────────┤ │ 8.267326832647143 │ └───────────────────────────────────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql H3_EXACT_EDGE_LENGTH_KM(h3) ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT H3_EXACT_EDGE_LENGTH_KM(1319695429381652479); ┌──────────────────────────────────────────────┐ │ h3_exact_edge_length_km(1319695429381652479) │ ├──────────────────────────────────────────────┤ │ 8.267326832647143 │ └──────────────────────────────────────────────┘ ``` # H3_EXACT_EDGE_LENGTH_M (Lakehouse v1) > H3_EXACT_EDGE_LENGTH_M — computes the length of this directed edge, in meters. Computes the length of this directed edge, in meters.
## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.h3_exact_edge_length_m(h3) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.h3_exact_edge_length_m(1319695429381652479) ┌──────────────────────────────────────────────────┐ │ func.h3_exact_edge_length_m(1319695429381652479) │ ├──────────────────────────────────────────────────┤ │ 8267.326832647143 │ └──────────────────────────────────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql H3_EXACT_EDGE_LENGTH_M(h3) ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT H3_EXACT_EDGE_LENGTH_M(1319695429381652479); ┌─────────────────────────────────────────────┐ │ h3_exact_edge_length_m(1319695429381652479) │ ├─────────────────────────────────────────────┤ │ 8267.326832647143 │ └─────────────────────────────────────────────┘ ``` # H3_EXACT_EDGE_LENGTH_RADS (Lakehouse v1) > H3_EXACT_EDGE_LENGTH_RADS — computes the length of this directed edge, in radians. Computes the length of this directed edge, in radians. 
## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.h3_exact_edge_length_rads(h3) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.h3_exact_edge_length_rads(1319695429381652479) ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql H3_EXACT_EDGE_LENGTH_RADS(h3) ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT H3_EXACT_EDGE_LENGTH_RADS(1319695429381652479); ``` # H3_GET_BASE_CELL (Lakehouse v1) > H3_GET_BASE_CELL — Returns the base cell number of the given H3 index. Returns the base cell number of the given [H3](https://eng.uber.com/h3/) index.
## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.h3_get_base_cell(h3) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.h3_get_base_cell(644325524701193974) ┌───────────────────────────────────────────┐ │ func.h3_get_base_cell(644325524701193974) │ ├───────────────────────────────────────────┤ │ 8 │ └───────────────────────────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql H3_GET_BASE_CELL(h3) ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT H3_GET_BASE_CELL(644325524701193974); ┌──────────────────────────────────────┐ │ h3_get_base_cell(644325524701193974) │ ├──────────────────────────────────────┤ │ 8 │ └──────────────────────────────────────┘ ``` # H3_GET_DESTINATION_INDEX_FROM_UNIDIRECTIONAL_EDGE (Lakehouse v1) > H3_GET_DESTINATION_INDEX_FROM_UNIDIRECTIONAL_EDGE — returns the destination hexagon index from the unidirectional edge H3Index. Returns the destination hexagon index from the unidirectional edge H3Index. 
## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.h3_get_destination_index_from_unidirectional_edge(h3) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.h3_get_destination_index_from_unidirectional_edge(1248204388774707199) ┌─────────────────────────────────────────────────────────────────────────────┐ │ func.h3_get_destination_index_from_unidirectional_edge(1248204388774707199) │ ├─────────────────────────────────────────────────────────────────────────────┤ │ 599686043507097599 │ └─────────────────────────────────────────────────────────────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql H3_GET_DESTINATION_INDEX_FROM_UNIDIRECTIONAL_EDGE(h3) ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT H3_GET_DESTINATION_INDEX_FROM_UNIDIRECTIONAL_EDGE(1248204388774707199); ┌────────────────────────────────────────────────────────────────────────┐ │ h3_get_destination_index_from_unidirectional_edge(1248204388774707199) │ ├────────────────────────────────────────────────────────────────────────┤ │ 599686043507097599 │ └────────────────────────────────────────────────────────────────────────┘ ``` # H3_GET_FACES (Lakehouse v1) > H3_GET_FACES — finds all icosahedron faces intersected by the given H3 index. Finds all icosahedron faces intersected by the given [H3](https://eng.uber.com/h3/) index. Faces are represented as integers from 0-19. 
## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.h3_get_faces(h3) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.h3_get_faces(599119489002373119) ┌───────────────────────────────────────┐ │ func.h3_get_faces(599119489002373119) │ ├───────────────────────────────────────┤ │ [0,1,2,3,4] │ └───────────────────────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql H3_GET_FACES(h3) ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT H3_GET_FACES(599119489002373119); ┌──────────────────────────────────┐ │ h3_get_faces(599119489002373119) │ ├──────────────────────────────────┤ │ [0,1,2,3,4] │ └──────────────────────────────────┘ ``` # H3_GET_INDEXES_FROM_UNIDIRECTIONAL_EDGE (Lakehouse v1) > H3_GET_INDEXES_FROM_UNIDIRECTIONAL_EDGE — returns the origin and destination hexagon indexes. Returns the origin and destination hexagon indexes from the given unidirectional edge H3Index. 
## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.h3_get_indexes_from_unidirectional_edge(h3) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.h3_get_indexes_from_unidirectional_edge(1248204388774707199) ┌────────────────────────────────────────────────────────────────────┐ │ func.h3_get_indexes_from_unidirectional_edge(1248204388774707199) │ ├────────────────────────────────────────────────────────────────────┤ │ (599686042433355775,599686043507097599) │ └────────────────────────────────────────────────────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql H3_GET_INDEXES_FROM_UNIDIRECTIONAL_EDGE(h3) ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT H3_GET_INDEXES_FROM_UNIDIRECTIONAL_EDGE(1248204388774707199); ┌──────────────────────────────────────────────────────────────┐ │ h3_get_indexes_from_unidirectional_edge(1248204388774707199) │ ├──────────────────────────────────────────────────────────────┤ │ (599686042433355775,599686043507097599) │ └──────────────────────────────────────────────────────────────┘ ``` # H3_GET_ORIGIN_INDEX_FROM_UNIDIRECTIONAL_EDGE (Lakehouse v1) > H3_GET_ORIGIN_INDEX_FROM_UNIDIRECTIONAL_EDGE — returns the origin hexagon index from the unidirectional edge H3Index. Returns the origin hexagon index from the unidirectional edge H3Index. 
## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.h3_get_origin_index_from_unidirectional_edge(h3) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.h3_get_origin_index_from_unidirectional_edge(1248204388774707199) ┌────────────────────────────────────────────────────────────────────────┐ │ func.h3_get_origin_index_from_unidirectional_edge(1248204388774707199) │ ├────────────────────────────────────────────────────────────────────────┤ │ 599686042433355775 │ └────────────────────────────────────────────────────────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql H3_GET_ORIGIN_INDEX_FROM_UNIDIRECTIONAL_EDGE(h3) ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT H3_GET_ORIGIN_INDEX_FROM_UNIDIRECTIONAL_EDGE(1248204388774707199); ┌───────────────────────────────────────────────────────────────────┐ │ h3_get_origin_index_from_unidirectional_edge(1248204388774707199) │ ├───────────────────────────────────────────────────────────────────┤ │ 599686042433355775 │ └───────────────────────────────────────────────────────────────────┘ ``` # H3_GET_RESOLUTION (Lakehouse v1) > H3_GET_RESOLUTION — Returns the resolution of the given H3 index. Returns the resolution of the given [H3](https://eng.uber.com/h3/) index. 
## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.h3_get_resolution(h3) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.h3_get_resolution(644325524701193974) ┌────────────────────────────────────────────┐ │ func.h3_get_resolution(644325524701193974) │ ├────────────────────────────────────────────┤ │ 15 │ └────────────────────────────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql H3_GET_RESOLUTION(h3) ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT H3_GET_RESOLUTION(644325524701193974); ┌───────────────────────────────────────┐ │ h3_get_resolution(644325524701193974) │ ├───────────────────────────────────────┤ │ 15 │ └───────────────────────────────────────┘ ``` # H3_GET_UNIDIRECTIONAL_EDGE (Lakehouse v1) > H3_GET_UNIDIRECTIONAL_EDGE — returns the edge between the given two H3 indexes. Returns the edge between the given two [H3](https://eng.uber.com/h3/) indexes. 
## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.h3_get_unidirectional_edge(h3, a_h3) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.h3_get_unidirectional_edge(644325524701193897, 644325524701193754) ┌─────────────────────────────────────────────────────────────────────────┐ │ func.h3_get_unidirectional_edge(644325524701193897, 644325524701193754) │ ├─────────────────────────────────────────────────────────────────────────┤ │ 1581074247194257065 │ └─────────────────────────────────────────────────────────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql H3_GET_UNIDIRECTIONAL_EDGE(h3, a_h3) ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT H3_GET_UNIDIRECTIONAL_EDGE(644325524701193897, 644325524701193754); ┌────────────────────────────────────────────────────────────────────┐ │ h3_get_unidirectional_edge(644325524701193897, 644325524701193754) │ ├────────────────────────────────────────────────────────────────────┤ │ 1581074247194257065 │ └────────────────────────────────────────────────────────────────────┘ ``` # H3_GET_UNIDIRECTIONAL_EDGE_BOUNDARY (Lakehouse v1) > H3_GET_UNIDIRECTIONAL_EDGE_BOUNDARY — returns the coordinates defining the unidirectional edge. Returns the coordinates defining the unidirectional edge. 
## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.h3_get_unidirectional_edge_boundary(h3) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.h3_get_unidirectional_edge_boundary(1248204388774707199) ┌─────────────────────────────────────────────────────────────────────────────────┐ │ func.h3_get_unidirectional_edge_boundary(1248204388774707199) │ ├─────────────────────────────────────────────────────────────────────────────────┤ │ [(37.42012867767778,-122.03773496427027),(37.33755608435298,-122.090428929044)] │ └─────────────────────────────────────────────────────────────────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql H3_GET_UNIDIRECTIONAL_EDGE_BOUNDARY(h3) ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT H3_GET_UNIDIRECTIONAL_EDGE_BOUNDARY(1248204388774707199); ┌─────────────────────────────────────────────────────────────────────────────────┐ │ h3_get_unidirectional_edge_boundary(1248204388774707199) │ ├─────────────────────────────────────────────────────────────────────────────────┤ │ [(37.42012867767778,-122.03773496427027),(37.33755608435298,-122.090428929044)] │ └─────────────────────────────────────────────────────────────────────────────────┘ ``` # H3_GET_UNIDIRECTIONAL_EDGES_FROM_HEXAGON (Lakehouse v1) > H3_GET_UNIDIRECTIONAL_EDGES_FROM_HEXAGON — returns all of the unidirectional edges from the provided H3Index. Returns all of the unidirectional edges from the provided H3Index. 
## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.h3_get_unidirectional_edges_from_hexagon(h3) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.h3_get_unidirectional_edges_from_hexagon(644325524701193754) ┌───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐ │ func.h3_get_unidirectional_edges_from_hexagon(644325524701193754) │ ├───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤ │ [1292843871042545178,1364901465080473114,1436959059118401050,1509016653156328986,1581074247194256922,1653131841232184858] │ └───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql H3_GET_UNIDIRECTIONAL_EDGES_FROM_HEXAGON(h3) ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT H3_GET_UNIDIRECTIONAL_EDGES_FROM_HEXAGON(644325524701193754); ┌───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐ │ h3_get_unidirectional_edges_from_hexagon(644325524701193754) │ ├───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤ │ [1292843871042545178,1364901465080473114,1436959059118401050,1509016653156328986,1581074247194256922,1653131841232184858] │ └───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘ ``` # H3_HEX_AREA_KM2 (Lakehouse v1) > H3_HEX_AREA_KM2 — returns the average hexagon area in square kilometers at the given resolution. Returns the average hexagon area in square kilometers at the given resolution. Excludes pentagons. 
## Analyze Syntax

```python
func.h3_hex_area_km2(res)
```

## Analyze Examples

```python
func.h3_hex_area_km2(1)

┌─────────────────────────┐
│ func.h3_hex_area_km2(1) │
├─────────────────────────┤
│       609788.4417941332 │
└─────────────────────────┘
```

## SQL Syntax

```sql
H3_HEX_AREA_KM2(res)
```

## SQL Examples

```sql
SELECT H3_HEX_AREA_KM2(1);

┌────────────────────┐
│ h3_hex_area_km2(1) │
├────────────────────┤
│  609788.4417941332 │
└────────────────────┘
```

# H3_HEX_AREA_M2 (Lakehouse v1)

> H3_HEX_AREA_M2 — returns the average hexagon area in square meters at the given resolution.

Returns the average hexagon area in square meters at the given resolution. Excludes pentagons.

## Analyze Syntax

```python
func.h3_hex_area_m2(res)
```

## Analyze Examples

```python
func.h3_hex_area_m2(1)

┌────────────────────────┐
│ func.h3_hex_area_m2(1) │
├────────────────────────┤
│      609788441794.1339 │
└────────────────────────┘
```

## SQL Syntax

```sql
H3_HEX_AREA_M2(res)
```

## SQL Examples

```sql
SELECT H3_HEX_AREA_M2(1);

┌───────────────────┐
│ h3_hex_area_m2(1) │
├───────────────────┤
│ 609788441794.1339 │
└───────────────────┘
```

# H3_HEX_RING (Lakehouse v1)

> H3_HEX_RING — returns the hollow ring of hexagons at grid distance k from a given origin index.

Returns the “hollow” ring of hexagons at exactly grid distance `k` from the given [H3](https://eng.uber.com/h3/) index.
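As a rough sanity check on result sizes, a hollow ring at grid distance `k` holds `6 * k` cells for `k >= 1`, while the filled disk around the same origin holds `1 + 3 * k * (k + 1)`. The sketch below is plain Python rather than a PlaidCloud API; the helper names are hypothetical, and the counts assume that no pentagon lies within distance `k` of the origin.

```python
def ring_size(k: int) -> int:
    """Number of cells at exactly grid distance k (hexagon-only neighbourhood)."""
    if k < 0:
        raise ValueError("k must be non-negative")
    return 1 if k == 0 else 6 * k


def disk_size(k: int) -> int:
    """Number of cells within grid distance k, origin included."""
    if k < 0:
        raise ValueError("k must be non-negative")
    return 1 + 3 * k * (k + 1)


print(ring_size(2))  # 12
print(disk_size(1))  # 7
```

These counts agree with the 12 indexes in the H3_HEX_RING example for `k = 2` and the 7 indexes in the H3_K_RING example for `k = 1`.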
## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.h3_hex_ring(h3, k) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.h3_hex_ring(599686042433355775, 2) ┌───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐ │ func.h3_hex_ring(599686042433355775, 2) │ ├───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤ │ [599686018811035647,599686034917163007,599686029548453887,599686032769679359,599686198125920255,599686040285872127,599686041359613951,599686039212130303,599686023106002943,599686027400970239,599686013442326527,599686012368584703] │ └───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql H3_HEX_RING(h3, k) ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT H3_HEX_RING(599686042433355775, 2); ┌───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐ │ h3_hex_ring(599686042433355775, 2) │ ├───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤ │ 
[599686018811035647,599686034917163007,599686029548453887,599686032769679359,599686198125920255,599686040285872127,599686041359613951,599686039212130303,599686023106002943,599686027400970239,599686013442326527,599686012368584703] │ └───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘ ``` # H3_INDEXES_ARE_NEIGHBORS (Lakehouse v1) > H3_INDEXES_ARE_NEIGHBORS — returns whether or not the provided H3 indexes are neighbors. Returns whether or not the provided [H3](https://eng.uber.com/h3/) indexes are neighbors. ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.h3_indexes_are_neighbors(h3, a_h3) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.h3_indexes_are_neighbors(644325524701193974, 644325524701193897) ┌───────────────────────────────────────────────────────────────────────┐ │ func.h3_indexes_are_neighbors(644325524701193974, 644325524701193897) │ ├───────────────────────────────────────────────────────────────────────┤ │ true │ └───────────────────────────────────────────────────────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql H3_INDEXES_ARE_NEIGHBORS(h3, a_h3) ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT H3_INDEXES_ARE_NEIGHBORS(644325524701193974, 644325524701193897); ┌──────────────────────────────────────────────────────────────────┐ │ h3_indexes_are_neighbors(644325524701193974, 644325524701193897) │ ├──────────────────────────────────────────────────────────────────┤ │ true │ └──────────────────────────────────────────────────────────────────┘ ``` # H3_IS_PENTAGON (Lakehouse v1) > H3_IS_PENTAGON — checks if the given H3 index represents a pentagonal cell. 
Checks if the given [H3](https://eng.uber.com/h3/) index represents a pentagonal cell. ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.h3_is_pentagon(h3) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.h3_is_pentagon(599119489002373119) ┌─────────────────────────────────────────┐ │ func.h3_is_pentagon(599119489002373119) │ ├─────────────────────────────────────────┤ │ true │ └─────────────────────────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql H3_IS_PENTAGON(h3) ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT H3_IS_PENTAGON(599119489002373119); ┌────────────────────────────────────┐ │ h3_is_pentagon(599119489002373119) │ ├────────────────────────────────────┤ │ true │ └────────────────────────────────────┘ ``` # H3_IS_RES_CLASS_III (Lakehouse v1) > H3_IS_RES_CLASS_III — checks if the given H3 index has a resolution with Class III orientation. Checks if the given [H3](https://eng.uber.com/h3/) index has a resolution with Class III orientation. 
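In the H3 grid system, Class III orientation corresponds to odd resolutions, and the resolution is stored in bits 52 to 55 of the 64-bit index. The following plain-Python sketch reproduces the check by hand under that assumption; the helper names are hypothetical and are not part of PlaidCloud.

```python
def h3_resolution(h3: int) -> int:
    """Read the 4-bit resolution field (bits 52-55) of an H3 index."""
    return (h3 >> 52) & 0xF


def is_res_class_iii(h3: int) -> bool:
    """Odd resolutions use the Class III orientation."""
    return h3_resolution(h3) % 2 == 1


print(h3_resolution(635318325446452991))     # 13
print(is_res_class_iii(635318325446452991))  # True
```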
## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.h3_is_res_class_iii(h3) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.h3_is_res_class_iii(635318325446452991) ┌──────────────────────────────────────────────┐ │ func.h3_is_res_class_iii(635318325446452991) │ ├──────────────────────────────────────────────┤ │ true │ └──────────────────────────────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql H3_IS_RES_CLASS_III(h3) ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT H3_IS_RES_CLASS_III(635318325446452991); ┌─────────────────────────────────────────┐ │ h3_is_res_class_iii(635318325446452991) │ ├─────────────────────────────────────────┤ │ true │ └─────────────────────────────────────────┘ ``` # H3_IS_VALID (Lakehouse v1) > H3_IS_VALID — Checks if the given H3 index is valid. Checks if the given [H3](https://eng.uber.com/h3/) index is valid. ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.h3_is_valid(h3) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.h3_is_valid(644325524701193974) ┌──────────────────────────────────────┐ │ func.h3_is_valid(644325524701193974) │ ├──────────────────────────────────────┤ │ true │ └──────────────────────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql H3_IS_VALID(h3) ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT H3_IS_VALID(644325524701193974); ┌─────────────────────────────────┐ │ h3_is_valid(644325524701193974) │ ├─────────────────────────────────┤ │ true │ └─────────────────────────────────┘ ``` # H3_K_RING (Lakehouse v1) > H3_K_RING — returns an array containing the H3 indexes of the k-ring hexagons surrounding the input H3 index. 
Returns an array containing the [H3](https://eng.uber.com/h3/) indexes of the k-ring hexagons surrounding the input H3 index. Each element in this array is an H3 index. ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.h3_k_ring(h3, k) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.h3_k_ring(644325524701193974, 1) ┌────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐ │ func.h3_k_ring(644325524701193974, 1) │ ├────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤ │ [644325524701193974,644325524701193899,644325524701193869,644325524701193970,644325524701193968,644325524701193972,644325524701193897] │ └────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql H3_K_RING(h3, k) ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT H3_K_RING(644325524701193974, 1); ┌────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐ │ h3_k_ring(644325524701193974, 1) │ ├────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤ │ [644325524701193974,644325524701193899,644325524701193869,644325524701193970,644325524701193968,644325524701193972,644325524701193897] │ └────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘ ``` # H3_LINE (Lakehouse v1) > H3_LINE — Returns the line of indexes between the given two H3 indexes. Returns the line of indexes between the given two [H3](https://eng.uber.com/h3/) indexes. 
## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.h3_line(h3, a_h3) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.h3_line(599119489002373119, 599119491149856767) ┌──────────────────────────────────────────────────────┐ │ func.h3_line(599119489002373119, 599119491149856767) │ ├──────────────────────────────────────────────────────┤ │ [599119489002373119,599119491149856767] │ └──────────────────────────────────────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql H3_LINE(h3, a_h3) ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT H3_LINE(599119489002373119, 599119491149856767); ┌─────────────────────────────────────────────────┐ │ h3_line(599119489002373119, 599119491149856767) │ ├─────────────────────────────────────────────────┤ │ [599119489002373119,599119491149856767] │ └─────────────────────────────────────────────────┘ ``` # H3_NUM_HEXAGONS (Lakehouse v1) > H3_NUM_HEXAGONS — returns the number of unique H3 indexes at the given resolution. Returns the number of unique [H3](https://eng.uber.com/h3/) indexes at the given resolution. ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.h3_num_hexagons(res) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.h3_num_hexagons(10) ┌──────────────────────────┐ │ func.h3_num_hexagons(10) │ ├──────────────────────────┤ │ 33897029882 │ └──────────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql H3_NUM_HEXAGONS(res) ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT H3_NUM_HEXAGONS(10); ┌─────────────────────┐ │ h3_num_hexagons(10) │ ├─────────────────────┤ │ 33897029882 │ └─────────────────────┘ ``` # H3_TO_CENTER_CHILD (Lakehouse v1) > H3_TO_CENTER_CHILD — returns the center child index at the specified resolution. 
Returns the center child index at the specified resolution. ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.h3_to_center_child(h3, res) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.h3_to_center_child(599119489002373119, 15) ┌─────────────────────────────────────────────────┐ │ func.h3_to_center_child(599119489002373119, 15) │ ├─────────────────────────────────────────────────┤ │ 644155484202336256 │ └─────────────────────────────────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql H3_TO_CENTER_CHILD(h3, res) ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT H3_TO_CENTER_CHILD(599119489002373119, 15); ┌────────────────────────────────────────────┐ │ h3_to_center_child(599119489002373119, 15) │ ├────────────────────────────────────────────┤ │ 644155484202336256 │ └────────────────────────────────────────────┘ ``` # H3_TO_CHILDREN (Lakehouse v1) > H3_TO_CHILDREN — returns the indexes contained by h3 at resolution child_res. Returns the indexes contained by `h3` at resolution `child_res`. 
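To make the parent/child relationship concrete, the sketch below derives the direct children of a cell with integer bit operations. It is a simplified illustration, not the engine's implementation: it assumes the standard H3 bit layout (resolution in bits 52 to 55, one 3-bit digit per resolution with the digit for resolution `r` at bit offset `3 * (15 - r)`), handles only hexagonal cells, and goes only one resolution finer. The function names are hypothetical.

```python
RES_SHIFT = 52                 # bits 52-55 hold the resolution
RES_MASK = 0xF << RES_SHIFT
MAX_RES = 15


def h3_resolution(h3: int) -> int:
    """Read the resolution field of an H3 index."""
    return (h3 & RES_MASK) >> RES_SHIFT


def direct_children(h3: int) -> list[int]:
    """Children exactly one resolution finer than a hexagonal cell."""
    child_res = h3_resolution(h3) + 1
    if child_res > MAX_RES:
        raise ValueError("cell is already at the finest resolution")
    # Rewrite the resolution field, then clear the digit for the new resolution.
    base = (h3 & ~RES_MASK) | (child_res << RES_SHIFT)
    digit_shift = 3 * (MAX_RES - child_res)
    base &= ~(0b111 << digit_shift)
    # A hexagon has seven children, one per digit value 0..6.
    return [base | (digit << digit_shift) for digit in range(7)]


for child in direct_children(635318325446452991):
    print(child)
```

For the resolution-13 index `635318325446452991`, this produces the same seven resolution-14 indexes as the H3_TO_CHILDREN example. Pentagonal cells have only five or six children per step and are deliberately out of scope here.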
## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.h3_to_children(h3, child_res) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.h3_to_children(635318325446452991, 14) ┌────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐ │ func.h3_to_children(635318325446452991, 14) │ ├────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤ │ [639821925073823431,639821925073823439,639821925073823447,639821925073823455,639821925073823463,639821925073823471,639821925073823479] │ └────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql H3_TO_CHILDREN(h3, child_res) ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT H3_TO_CHILDREN(635318325446452991, 14); ┌────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐ │ h3_to_children(635318325446452991, 14) │ ├────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤ │ [639821925073823431,639821925073823439,639821925073823447,639821925073823455,639821925073823463,639821925073823471,639821925073823479] │ └────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘ ``` # H3_TO_GEO (Lakehouse v1) > H3_TO_GEO — returns the longitude and latitude corresponding to the given H3 index. Returns the longitude and latitude corresponding to the given [H3](https://eng.uber.com/h3/) index. 
## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.h3_to_geo(h3) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.h3_to_geo(644325524701193974) ┌────────────────────────────────────────┐ │ func.h3_to_geo(644325524701193974) │ ├────────────────────────────────────────┤ │ (37.79506616830255,55.712902431456676) │ └────────────────────────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql H3_TO_GEO(h3) ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT H3_TO_GEO(644325524701193974); ┌────────────────────────────────────────┐ │ h3_to_geo(644325524701193974) │ ├────────────────────────────────────────┤ │ (37.79506616830255,55.712902431456676) │ └────────────────────────────────────────┘ ``` # H3_TO_GEO_BOUNDARY (Lakehouse v1) > H3_TO_GEO_BOUNDARY — returns an array containing the longitude and latitude coordinates of the vertices of the hexagon corresponding to the H3 index. Returns an array containing the longitude and latitude coordinates of the vertices of the hexagon corresponding to the [H3](https://eng.uber.com/h3/) index. 
## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.h3_to_geo_boundary(h3) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.h3_to_geo_boundary(644325524701193974) ┌───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐ │ func.h3_to_geo_boundary(644325524701193974) │ ├───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤ │ [(37.79505811173477,55.712900225355526),(37.79506506997187,55.71289713485417),(37.795073126539855,55.71289934095484),(37.795074224871684,55.71290463755745),(37.79506726663349,55.71290772805916),(37.79505921006456,55.712905521957914)] │ └───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql H3_TO_GEO_BOUNDARY(h3) ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT H3_TO_GEO_BOUNDARY(644325524701193974); ┌───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐ │ h3_to_geo_boundary(644325524701193974) │ ├───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤ │ 
[(37.79505811173477,55.712900225355526),(37.79506506997187,55.71289713485417),(37.795073126539855,55.71289934095484),(37.795074224871684,55.71290463755745),(37.79506726663349,55.71290772805916),(37.79505921006456,55.712905521957914)] │ └───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘ ``` # H3_TO_PARENT (Lakehouse v1) > H3_TO_PARENT — returns the parent index containing the h3 at resolution parent_res. Returns the parent index containing the `h3` at resolution `parent_res`. Returning 0 means an error occurred. ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.h3_to_parent(h3, parent_res) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.h3_to_parent(635318325446452991, 12) ┌───────────────────────────────────────────┐ │ func.h3_to_parent(635318325446452991, 12) │ ├───────────────────────────────────────────┤ │ 630814725819082751 │ └───────────────────────────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql H3_TO_PARENT(h3, parent_res) ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT H3_TO_PARENT(635318325446452991, 12); ┌──────────────────────────────────────┐ │ h3_to_parent(635318325446452991, 12) │ ├──────────────────────────────────────┤ │ 630814725819082751 │ └──────────────────────────────────────┘ ``` # H3_TO_STRING (Lakehouse v1) > H3_TO_STRING — converts the representation of the given H3 index to the string representation. Converts the representation of the given [H3](https://eng.uber.com/h3/) index to the string representation. 
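The string form of an H3 index is the lowercase hexadecimal rendering of the 64-bit integer without leading zeros. A minimal plain-Python sketch of the round trip follows; the function names mirror the SQL functions for readability only and are hypothetical helpers, not PlaidCloud calls.

```python
def h3_to_string(h3: int) -> str:
    """Render an H3 index as lowercase hexadecimal."""
    return format(h3, "x")


def string_to_h3(text: str) -> int:
    """Parse the hexadecimal string form back into an integer index."""
    return int(text, 16)


print(h3_to_string(635318325446452991))  # 8d11aa6a38826ff
print(string_to_h3("8d11aa6a38826ff"))   # 635318325446452991
```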
## Analyze Syntax

```python
func.h3_to_string(h3)
```

## Analyze Examples

```python
func.h3_to_string(635318325446452991)

┌───────────────────────────────────────┐
│ func.h3_to_string(635318325446452991) │
├───────────────────────────────────────┤
│ 8d11aa6a38826ff                       │
└───────────────────────────────────────┘
```

## SQL Syntax

```sql
H3_TO_STRING(h3)
```

## SQL Examples

```sql
SELECT H3_TO_STRING(635318325446452991);

┌──────────────────────────────────┐
│ h3_to_string(635318325446452991) │
├──────────────────────────────────┤
│ 8d11aa6a38826ff                  │
└──────────────────────────────────┘
```

# H3_UNIDIRECTIONAL_EDGE_IS_VALID (Lakehouse v1)

> H3_UNIDIRECTIONAL_EDGE_IS_VALID — determines if the provided H3Index is a valid unidirectional edge index.

Determines if the provided H3Index is a valid unidirectional edge index. Returns 1 if it’s a unidirectional edge and 0 otherwise.
## Analyze Syntax

```python
func.h3_unidirectional_edge_is_valid(h3)
```

## Analyze Examples

```python
func.h3_unidirectional_edge_is_valid(1248204388774707199)

┌───────────────────────────────────────────────────────────┐
│ func.h3_unidirectional_edge_is_valid(1248204388774707199) │
├───────────────────────────────────────────────────────────┤
│ true                                                      │
└───────────────────────────────────────────────────────────┘
```

## SQL Syntax

```sql
H3_UNIDIRECTIONAL_EDGE_IS_VALID(h3)
```

## SQL Examples

```sql
SELECT H3_UNIDIRECTIONAL_EDGE_IS_VALID(1248204388774707199);

┌──────────────────────────────────────────────────────┐
│ h3_unidirectional_edge_is_valid(1248204388774707199) │
├──────────────────────────────────────────────────────┤
│ true                                                 │
└──────────────────────────────────────────────────────┘
```

# POINT_IN_POLYGON (Lakehouse v1)

> POINT_IN_POLYGON — calculates whether a given point falls within the polygon formed by joining multiple points.

Calculates whether a given point falls within the polygon formed by joining multiple points. A polygon is a closed shape connected by coordinate pairs in the order they appear. Changing the order of coordinate pairs can result in a different shape.

## Analyze Syntax

```python
func.point_in_polygon((x,y), [(a,b), (c,d), (e,f) ... ])
```

## Analyze Examples

```python
func.point_in_polygon((3., 3.), [(6, 0), (8, 4), (5, 8), (0, 2)])

┌─────────────────────────────────────────────────────────────────┐
│ func.point_in_polygon((3, 3), [(6, 0), (8, 4), (5, 8), (0, 2)]) │
├─────────────────────────────────────────────────────────────────┤
│                                                               1 │
└─────────────────────────────────────────────────────────────────┘
```

## SQL Syntax

```sql
POINT_IN_POLYGON((x,y), [(a,b), (c,d), (e,f) ... ])
```

## SQL Examples

```sql
SELECT POINT_IN_POLYGON((3., 3.), [(6, 0), (8, 4), (5, 8), (0, 2)]);

┌────────────────────────────────────────────────────────────┐
│ point_in_polygon((3, 3), [(6, 0), (8, 4), (5, 8), (0, 2)]) │
├────────────────────────────────────────────────────────────┤
│                                                          1 │
└────────────────────────────────────────────────────────────┘
```

# STRING_TO_H3 (Lakehouse v1)

> STRING_TO_H3 — converts the string representation to H3 (uint64) representation.

Converts the string representation to [H3](https://eng.uber.com/h3/) (uint64) representation.
## Analyze Syntax

```python
func.string_to_h3(h3)
```

## Analyze Examples

```python
func.string_to_h3('8d11aa6a38826ff')

┌──────────────────────────────────────┐
│ func.string_to_h3('8d11aa6a38826ff') │
├──────────────────────────────────────┤
│                   635318325446452991 │
└──────────────────────────────────────┘
```

## SQL Syntax

```sql
STRING_TO_H3(h3)
```

## SQL Examples

```sql
SELECT STRING_TO_H3('8d11aa6a38826ff');

┌─────────────────────────────────┐
│ string_to_h3('8d11aa6a38826ff') │
├─────────────────────────────────┤
│              635318325446452991 │
└─────────────────────────────────┘
```

# Geometry Functions (Lakehouse v1)

> Lakehouse v1 SQL geometry functions: work with planar geometry types — construct, transform, and query shapes.

This section provides reference information for the geometry and distance functions in PlaidCloud Lakehouse.

# HAVERSINE (Lakehouse v1)

> HAVERSINE — calculates the great circle distance in kilometers between two points on the Earth’s surface, using the Haversine formula.

Calculates the great circle distance in kilometers between two points on the Earth’s surface, using the [Haversine formula](https://en.wikipedia.org/wiki/Haversine_formula). The two points are specified by their latitude and longitude in degrees.

## SQL Syntax

```sql
HAVERSINE(<lat1>, <lon1>, <lat2>, <lon2>)
```

## Arguments

| Arguments | Description                        |
| --------- | ---------------------------------- |
| `<lat1>`  | The latitude of the first point.   |
| `<lon1>`  | The longitude of the first point.  |
| `<lat2>`  | The latitude of the second point.  |
| `<lon2>`  | The longitude of the second point. |

## Return Type

Double.
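For readers who want to cross-check results outside the Lakehouse, the Haversine formula can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the engine's implementation: the mean Earth radius of 6371 km is assumed here and the constant used by the Lakehouse may differ, and `haversine_km` is a hypothetical helper name.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius; the engine's constant may differ


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometers between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    # a is the squared half-chord length between the two points.
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# New York to Los Angeles, the same coordinates as the SQL example.
print(round(haversine_km(40.7127, -74.0059, 34.0500, -118.2500), 1))
```

With the assumed radius, the New York to Los Angeles distance should come out close to the roughly 3936 km reported by the SQL example.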
## SQL Examples

```sql
SELECT HAVERSINE(40.7127, -74.0059, 34.0500, -118.2500) AS distance;

┌────────────────┐
│ distance       │
├────────────────┤
│ 3936.390533556 │
└────────────────┘
```

# ST_ASBINARY (Lakehouse v1)

> ST_ASBINARY — alias for the ST_ASWKB geometry function.

Alias for [ST\_ASWKB](../st-aswkb).

# ST_ASEWKB (Lakehouse v1)

> ST_ASEWKB — converts a GEOMETRY object into an EWKB(extended well-known-binary) format.

Converts a GEOMETRY object into an [EWKB(extended well-known-binary)](https://postgis.net/docs/ST_GeomFromEWKB.html) format representation.

## SQL Syntax

```sql
ST_ASEWKB(<geometry>)
```

## Arguments

| Arguments    | Description                                          |
| ------------ | ---------------------------------------------------- |
| `<geometry>` | The argument must be an expression of type GEOMETRY. |

## Return Type

Binary.

## SQL Examples

```sql
SELECT
  ST_ASEWKB(
    ST_GEOMETRYFROMWKT(
      'SRID=4326;LINESTRING(400000 6000000, 401000 6010000)'
    )
  ) AS pipeline_ewkb;

┌────────────────────────────────────────────────────────────────────────────────────────────┐
│ pipeline_ewkb                                                                              │
├────────────────────────────────────────────────────────────────────────────────────────────┤
│ 0102000020E61000000200000000000000006A18410000000060E3564100000000A07918410000000024ED5641 │
└────────────────────────────────────────────────────────────────────────────────────────────┘

SELECT
  ST_ASEWKB(
    ST_GEOMETRYFROMWKT(
      'SRID=4326;POINT(-122.35 37.55)'
    )
  ) AS pipeline_ewkb;

┌────────────────────────────────────────────────────┐
│ pipeline_ewkb                                      │
├────────────────────────────────────────────────────┤
│ 0101000020E61000006666666666965EC06666666666C64240 │
└────────────────────────────────────────────────────┘
```

# ST_ASEWKT (Lakehouse v1)

> ST_ASEWKT — converts a GEOMETRY object into an EWKT(extended well-known-text) format.

Converts a GEOMETRY object into an [EWKT(extended well-known-text)](https://postgis.net/docs/ST_GeomFromEWKT.html) format representation.

## SQL Syntax

```sql
ST_ASEWKT(<geometry>)
```

## Arguments

| Arguments    | Description                                          |
| ------------ | ---------------------------------------------------- |
| `<geometry>` | The argument must be an expression of type GEOMETRY. |

## Return Type

String.

## SQL Examples

```sql
SELECT
  ST_ASEWKT(
    ST_GEOMETRYFROMWKT(
      'SRID=4326;LINESTRING(400000 6000000, 401000 6010000)'
    )
  ) AS pipeline_ewkt;

┌─────────────────────────────────────────────────────┐
│ pipeline_ewkt                                       │
├─────────────────────────────────────────────────────┤
│ SRID=4326;LINESTRING(400000 6000000,401000 6010000) │
└─────────────────────────────────────────────────────┘

SELECT
  ST_ASEWKT(
    ST_GEOMETRYFROMWKT(
      'SRID=4326;POINT(-122.35 37.55)'
    )
  ) AS pipeline_ewkt;

┌────────────────────────────────┐
│ pipeline_ewkt                  │
├────────────────────────────────┤
│ SRID=4326;POINT(-122.35 37.55) │
└────────────────────────────────┘
```

# ST_ASGEOJSON (Lakehouse v1)

> ST_ASGEOJSON — Converts a GEOMETRY object into a GeoJSON representation.

Converts a GEOMETRY object into a [GeoJSON](https://geojson.org/) representation.

## SQL Syntax

```sql
ST_ASGEOJSON(<geometry>)
```

## Arguments

| Arguments    | Description                                          |
| ------------ | ---------------------------------------------------- |
| `<geometry>` | The argument must be an expression of type GEOMETRY. |

## Return Type

Variant.
## SQL Examples

```sql
SELECT
  ST_ASGEOJSON(
    ST_GEOMETRYFROMWKT(
      'SRID=4326;LINESTRING(400000 6000000, 401000 6010000)'
    )
  ) AS pipeline_geojson;

┌─────────────────────────────────────────────────────────────────────────┐
│ pipeline_geojson                                                        │
├─────────────────────────────────────────────────────────────────────────┤
│ {"coordinates":[[400000,6000000],[401000,6010000]],"type":"LineString"} │
└─────────────────────────────────────────────────────────────────────────┘
```

# ST_ASTEXT (Lakehouse v1)

> ST_ASTEXT — alias for the ST_ASWKT geometry function.

Alias for [ST\_ASWKT](../st-aswkt).

# ST_ASWKB (Lakehouse v1)

> ST_ASWKB — converts a GEOMETRY object into a WKB(well-known-binary) format representation.

Converts a GEOMETRY object into a [WKB(well-known-binary)](https://en.wikipedia.org/wiki/Well-known_text_representation_of_geometry#Well-known_binary) format representation.

## SQL Syntax

```sql
ST_ASWKB(<geometry>)
```

## Aliases

* [ST\_ASBINARY](../st-asbinary)

## Arguments

| Arguments    | Description                                          |
| ------------ | ---------------------------------------------------- |
| `<geometry>` | The argument must be an expression of type GEOMETRY. |

## Return Type

Binary.
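The hexadecimal output of the binary formats can be unpacked by hand, which helps when debugging values exported from the Lakehouse. The sketch below decodes a single POINT from either WKB or EWKB using only the Python standard library. It is a minimal illustration with a hypothetical helper name: it assumes the usual layout (one byte-order byte, a 32-bit geometry type, an optional 32-bit SRID flagged by `0x20000000` in EWKB, then two doubles) and rejects every geometry type other than POINT.

```python
import struct

EWKB_SRID_FLAG = 0x20000000  # set in the type word when an SRID follows
WKB_POINT = 1


def decode_point(hex_text: str):
    """Return (srid, x, y) for a WKB or EWKB POINT given as a hex string."""
    data = bytes.fromhex(hex_text)
    order = "<" if data[0] == 1 else ">"  # 1 = little-endian, 0 = big-endian
    (geom_type,) = struct.unpack_from(order + "I", data, 1)
    offset = 5
    srid = None
    if geom_type & EWKB_SRID_FLAG:
        (srid,) = struct.unpack_from(order + "I", data, offset)
        offset += 4
        geom_type &= ~EWKB_SRID_FLAG
    if geom_type != WKB_POINT:
        raise ValueError("only POINT is handled in this sketch")
    x, y = struct.unpack_from(order + "dd", data, offset)
    return srid, x, y


# Hex strings taken from the ST_ASWKB and ST_ASEWKB examples.
print(decode_point("01010000006666666666965EC06666666666C64240"))          # (None, -122.35, 37.55)
print(decode_point("0101000020E61000006666666666965EC06666666666C64240"))  # (4326, -122.35, 37.55)
```

The only difference between the two inputs is the SRID flag and the four SRID bytes, which is why the plain WKB output drops the `SRID=4326` information.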
## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT ST_ASWKB( ST_GEOMETRYFROMWKT( 'SRID=4326;LINESTRING(400000 6000000, 401000 6010000)' ) ) AS pipeline_wkb; ┌────────────────────────────────────────────────────────────────────────────────────┐ │ pipeline_wkb │ ├────────────────────────────────────────────────────────────────────────────────────┤ │ 01020000000200000000000000006A18410000000060E3564100000000A07918410000000024ED5641 │ └────────────────────────────────────────────────────────────────────────────────────┘ SELECT ST_ASBINARY( ST_GEOMETRYFROMWKT( 'SRID=4326;POINT(-122.35 37.55)' ) ) AS pipeline_wkb; ┌────────────────────────────────────────────┐ │ pipeline_wkb │ ├────────────────────────────────────────────┤ │ 01010000006666666666965EC06666666666C64240 │ └────────────────────────────────────────────┘ ``` # ST_ASWKT (Lakehouse v1) > ST_ASWKT — converts a GEOMETRY object into a WKT(well-known-text) format representation. Converts a GEOMETRY object into a [WKT(well-known-text)](https://en.wikipedia.org/wiki/Well-known_text_representation_of_geometry) format representation. ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql ST_ASWKT() ``` ## Aliases [Section titled “Aliases”](#aliases) * [ST\_ASTEXT](../st-astext) ## Arguments [Section titled “Arguments”](#arguments) | Arguments | Description | | ------------ | ---------------------------------------------------- | | `` | The argument must be an expression of type GEOMETRY. | ## Return Type [Section titled “Return Type”](#return-type) String. 
## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT ST_ASWKT( ST_GEOMETRYFROMWKT( 'SRID=4326;LINESTRING(400000 6000000, 401000 6010000)' ) ) AS pipeline_wkt; ┌───────────────────────────────────────────┐ │ pipeline_wkt │ ├───────────────────────────────────────────┤ │ LINESTRING(400000 6000000,401000 6010000) │ └───────────────────────────────────────────┘ SELECT ST_ASTEXT( ST_GEOMETRYFROMWKT( 'SRID=4326;POINT(-122.35 37.55)' ) ) AS pipeline_wkt; ┌──────────────────────┐ │ pipeline_wkt │ ├──────────────────────┤ │ POINT(-122.35 37.55) │ └──────────────────────┘ ``` # ST_CONTAINS (Lakehouse v1) > ST_CONTAINS — returns TRUE if the second GEOMETRY object is completely inside the first. Returns TRUE if the second GEOMETRY object is completely inside the first GEOMETRY object. ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql ST_CONTAINS(, ) ``` ## Arguments [Section titled “Arguments”](#arguments) | Arguments | Description | | ------------- | -------------------------------------------------------------------------------------------- | | `` | The argument must be an expression of type GEOMETRY object that is not a GeometryCollection. | | `` | The argument must be an expression of type GEOMETRY object that is not a GeometryCollection. | Note * The function reports an error if the two input GEOMETRY objects have different SRIDs. ## Return Type [Section titled “Return Type”](#return-type) Boolean. 
## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT ST_CONTAINS(TO_GEOMETRY('POLYGON((-2 0, 0 2, 2 0, -2 0))'), TO_GEOMETRY('POLYGON((-1 0, 0 1, 1 0, -1 0))')) AS contains ┌──────────┐ │ contains │ ├──────────┤ │ true │ └──────────┘ SELECT ST_CONTAINS(TO_GEOMETRY('POLYGON((-2 0, 0 2, 2 0, -2 0))'), TO_GEOMETRY('LINESTRING(-1 1, 0 2, 1 1)')) AS contains ┌──────────┐ │ contains │ ├──────────┤ │ false │ └──────────┘ SELECT ST_CONTAINS(TO_GEOMETRY('POLYGON((-2 0, 0 2, 2 0, -2 0))'), TO_GEOMETRY('LINESTRING(-2 0, 0 0, 0 1)')) AS contains ┌──────────┐ │ contains │ ├──────────┤ │ true │ └──────────┘ ``` # ST_DIMENSION (Lakehouse v1) > ST_DIMENSION — Return the dimension for a geometry object. Return the dimension for a geometry object. The dimension of a GEOMETRY object is: | Geospatial Object Type | Dimension | | ---------------------------- | --------- | | Point / MultiPoint | 0 | | LineString / MultiLineString | 1 | | Polygon / MultiPolygon | 2 | ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql ST_DIMENSION() ``` ## Arguments [Section titled “Arguments”](#arguments) | Arguments | Description | | ------------ | ---------------------------------------------------- | | `` | The argument must be an expression of type GEOMETRY. | ## Return Type [Section titled “Return Type”](#return-type) UInt8. 
## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT ST_DIMENSION( ST_GEOMETRYFROMWKT( 'POINT(-122.306100 37.554162)' ) ) AS pipeline_dimension; ┌────────────────────┐ │ pipeline_dimension │ ├────────────────────┤ │ 0 │ └────────────────────┘ SELECT ST_DIMENSION( ST_GEOMETRYFROMWKT( 'LINESTRING(-124.20 42.00, -120.01 41.99)' ) ) AS pipeline_dimension; ┌────────────────────┐ │ pipeline_dimension │ ├────────────────────┤ │ 1 │ └────────────────────┘ SELECT ST_DIMENSION( ST_GEOMETRYFROMWKT( 'POLYGON((-124.20 42.00, -120.01 41.99, -121.1 42.01, -124.20 42.00))' ) ) AS pipeline_dimension; ┌────────────────────┐ │ pipeline_dimension │ ├────────────────────┤ │ 2 │ └────────────────────┘ ``` # ST_DISTANCE (Lakehouse v1) > ST_DISTANCE — returns the minimum Euclidean distance between two GEOMETRY objects. Returns the minimum [Euclidean distance](https://en.wikipedia.org/wiki/Euclidean_distance) between two GEOMETRY objects. ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql ST_DISTANCE(, ) ``` ## Arguments [Section titled “Arguments”](#arguments) | Arguments | Description | | ------------- | ----------------------------------------------------------------------------- | | `` | The argument must be an expression of type GEOMETRY and must contain a Point. | | `` | The argument must be an expression of type GEOMETRY and must contain a Point. | Note * Returns NULL if one or more input points are NULL. * The function reports an error if the two input GEOMETRY objects have different SRIDs. ## Return Type [Section titled “Return Type”](#return-type) Double. ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT ST_DISTANCE( TO_GEOMETRY('POINT(0 0)'), TO_GEOMETRY('POINT(1 1)') ) AS distance ┌─────────────┐ │ distance │ ├─────────────┤ │ 1.414213562 │ └─────────────┘ ``` # ST_ENDPOINT (Lakehouse v1) > ST_ENDPOINT — Returns the last Point in a LineString. Returns the last Point in a LineString. 
## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql ST_ENDPOINT(<geometry>) ``` ## Arguments [Section titled “Arguments”](#arguments) | Arguments | Description | | ------------ | --------------------------------------------------------------------------------- | | `<geometry>` | The argument must be an expression of type GEOMETRY that represents a LineString. | ## Return Type [Section titled “Return Type”](#return-type) Geometry. ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT ST_ENDPOINT( ST_GEOMETRYFROMWKT( 'LINESTRING(1 1, 2 2, 3 3, 4 4)' ) ) AS pipeline_endpoint; ┌───────────────────┐ │ pipeline_endpoint │ ├───────────────────┤ │ POINT(4 4) │ └───────────────────┘ ``` # ST_GEOHASH (Lakehouse v1) > ST_GEOHASH — Returns the geohash for a GEOMETRY object. Returns the [geohash](https://en.wikipedia.org/wiki/Geohash) for a GEOMETRY object. A geohash is a short base32 string that identifies a geodesic rectangle containing a location in the world. The optional `precision` argument specifies the number of characters in the returned geohash. For example, passing 5 for `precision` returns a shorter geohash (5 characters long) that is less precise. ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql ST_GEOHASH(<geometry> [, <precision>]) ``` ## Arguments [Section titled “Arguments”](#arguments) | Arguments | Description | | ------------- | ------------------------------------------------------------------------- | | `geometry` | The argument must be an expression of type GEOMETRY. | | `[precision]` | Optional. Specifies the precision of the returned geohash; default is 12. | ## Return Type [Section titled “Return Type”](#return-type) String.
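To make the effect of `precision` concrete, the following Python sketch implements the standard public geohash encoding: longitude and latitude ranges are bisected alternately, and every five bits become one base32 character. It is an illustration only, not the Lakehouse implementation, and the `geohash_encode` name and signature are hypothetical.

```python
# Standard geohash alphabet (base32 without the letters a, i, l, o).
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_encode(lon: float, lat: float, precision: int = 12) -> str:
    """Encode a lon/lat pair as a geohash of `precision` characters."""
    lon_range = [-180.0, 180.0]
    lat_range = [-90.0, 90.0]
    chars = []
    bits = 0
    bit_count = 0
    use_lon = True  # bits alternate, starting with longitude

    while len(chars) < precision:
        rng, value = (lon_range, lon) if use_lon else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if value >= mid:
            bits = (bits << 1) | 1  # upper half of the range
            rng[0] = mid
        else:
            bits <<= 1              # lower half of the range
            rng[1] = mid
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:          # five bits make one base32 character
            chars.append(BASE32[bits])
            bits = 0
            bit_count = 0
    return "".join(chars)


print(geohash_encode(-122.35, 37.55, 5))
```

Under this sketch, `geohash_encode(-122.35, 37.55, 5)` yields `9q8vx`, the value shown in the precision-5 SQL example for this function. Each extra character narrows the rectangle further, which is why a shorter geohash is less precise.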
## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT ST_GEOHASH( ST_GEOMETRYFROMWKT( 'POINT(-122.306100 37.554162)' ) ) AS pipeline_geohash; ┌──────────────────┐ │ pipeline_geohash │ ├──────────────────┤ │ 9q9j8ue2v71y │ └──────────────────┘ SELECT ST_GEOHASH( ST_GEOMETRYFROMWKT( 'SRID=4326;POINT(-122.35 37.55)' ), 5 ) AS pipeline_geohash; ┌──────────────────┐ │ pipeline_geohash │ ├──────────────────┤ │ 9q8vx │ └──────────────────┘ ``` # ST_GEOM_POINT (Lakehouse v1) > ST_GEOM_POINT — alias for the ST_MAKEGEOMPOINT geometry function. Alias for [ST\_MAKEGEOMPOINT](../st-makegeompoint). # ST_GEOMETRYFROMEWKB (Lakehouse v1) > ST_GEOMETRYFROMEWKB — alias for the ST_GEOMETRYFROMWKB geometry function. Alias for [ST\_GEOMETRYFROMWKB](../st-geometryfromwkb). # ST_GEOMETRYFROMEWKT (Lakehouse v1) > ST_GEOMETRYFROMEWKT — alias for the ST_GEOMETRYFROMWKT geometry function. Alias for [ST\_GEOMETRYFROMWKT](../st-geometryfromwkt). # ST_GEOMETRYFROMTEXT (Lakehouse v1) > ST_GEOMETRYFROMTEXT — alias for the ST_GEOMETRYFROMWKT geometry function. Alias for [ST\_GEOMETRYFROMWKT](../st-geometryfromwkt). # ST_GEOMETRYFROMWKB (Lakehouse v1) > ST_GEOMETRYFROMWKB — parses a WKB(well-known-binary) or EWKB(extended well-known-binary) input. Parses a [WKB(well-known-binary)](https://en.wikipedia.org/wiki/Well-known_text_representation_of_geometry#Well-known_binary) or [EWKB(extended well-known-binary)](https://postgis.net/docs/ST_GeomFromEWKB.html) input and returns a value of type GEOMETRY.
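For readers unfamiliar with the byte layout, here is a minimal Python sketch that unpacks a hexadecimal WKB or EWKB Point by hand, assuming the common layout of a byte-order flag, a 4-byte type code, an optional 4-byte SRID (signalled by the EWKB flag `0x20000000`), and two 8-byte doubles. The `decode_point` helper is hypothetical and handles only 2D Points; it is not how the Lakehouse parses its input.

```python
import struct

EWKB_SRID_FLAG = 0x20000000  # set in the type code when an SRID follows


def decode_point(hex_wkb: str):
    """Return (srid, x, y) for a hex-encoded WKB/EWKB 2D Point."""
    data = bytes.fromhex(hex_wkb)
    # First byte: 1 = little-endian, 0 = big-endian.
    byte_order = "<" if data[0] == 1 else ">"
    (type_code,) = struct.unpack_from(byte_order + "I", data, 1)
    offset = 5

    srid = None
    if type_code & EWKB_SRID_FLAG:
        (srid,) = struct.unpack_from(byte_order + "I", data, offset)
        offset += 4

    geometry_type = type_code & ~EWKB_SRID_FLAG
    if geometry_type != 1:  # 1 = Point
        raise ValueError("this sketch only handles 2D Point geometries")

    x, y = struct.unpack_from(byte_order + "dd", data, offset)
    return srid, x, y


# EWKB hex string taken from this function's SQL examples.
print(decode_point("0101000020797f000066666666a9cb17411f85ebc19e325641"))
```

The embedded SRID decodes to 32633 and the coordinates to approximately 389866.35 and 5819003.03, which agrees with the first SQL example for this function. A plain WKB string has no SRID bytes, so the helper returns `None` for the SRID in that case.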
## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql ST_GEOMETRYFROMWKB(, []) ST_GEOMETRYFROMWKB(, []) ``` ## Aliases [Section titled “Aliases”](#aliases) * [ST\_GEOMFROMWKB](../st-geomfromwkb) * [ST\_GEOMETRYFROMEWKB](../st-geometryfromewkb) * [ST\_GEOMFROMEWKB](../st-geomfromewkb) ## Arguments [Section titled “Arguments”](#arguments) | Arguments | Description | | ---------- | ------------------------------------------------------------------------------ | | `` | The argument must be a string expression in WKB or EWKB in hexadecimal format. | | `` | The argument must be a binary expression in WKB or EWKB format. | | `` | The integer value of the SRID to use. | ## Return Type [Section titled “Return Type”](#return-type) Geometry. ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT ST_GEOMETRYFROMWKB( '0101000020797f000066666666a9cb17411f85ebc19e325641' ) AS pipeline_geometry; ┌────────────────────────────────────────┐ │ pipeline_geometry │ ├────────────────────────────────────────┤ │ SRID=32633;POINT(389866.35 5819003.03) │ └────────────────────────────────────────┘ SELECT ST_GEOMETRYFROMWKB( FROM_HEX('0101000020797f000066666666a9cb17411f85ebc19e325641'), 4326 ) AS pipeline_geometry; ┌───────────────────────────────────────┐ │ pipeline_geometry │ ├───────────────────────────────────────┤ │ SRID=4326;POINT(389866.35 5819003.03) │ └───────────────────────────────────────┘ ``` # ST_GEOMETRYFROMWKT (Lakehouse v1) > ST_GEOMETRYFROMWKT — parses a WKT(well-known-text) or EWKT(extended well-known-text) input and returns a value of type GEOMETRY. Parses a [WKT(well-known-text)](https://en.wikipedia.org/wiki/Well-known_text_representation_of_geometry) or [EWKT(extended well-known-text)](https://postgis.net/docs/ST_GeomFromEWKT.html) input and returns a value of type GEOMETRY. 
## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql ST_GEOMETRYFROMWKT(<string>, [<srid>]) ``` ## Aliases [Section titled “Aliases”](#aliases) * [ST\_GEOMFROMWKT](../st-geomfromwkt) * [ST\_GEOMETRYFROMEWKT](../st-geometryfromewkt) * [ST\_GEOMFROMEWKT](../st-geomfromewkt) * [ST\_GEOMFROMTEXT](../st-geomfromtext) * [ST\_GEOMETRYFROMTEXT](../st-geometryfromtext) ## Arguments [Section titled “Arguments”](#arguments) | Arguments | Description | | ---------- | --------------------------------------------------------------- | | `<string>` | The argument must be a string expression in WKT or EWKT format. | | `<srid>` | The integer value of the SRID to use. | ## Return Type [Section titled “Return Type”](#return-type) Geometry. ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT ST_GEOMETRYFROMWKT( 'POINT(1820.12 890.56)' ) AS pipeline_geometry; ┌───────────────────────┐ │ pipeline_geometry │ ├───────────────────────┤ │ POINT(1820.12 890.56) │ └───────────────────────┘ SELECT ST_GEOMETRYFROMWKT( 'POINT(1820.12 890.56)', 4326 ) AS pipeline_geometry; ┌─────────────────────────────────┐ │ pipeline_geometry │ │ Geometry │ ├─────────────────────────────────┤ │ SRID=4326;POINT(1820.12 890.56) │ └─────────────────────────────────┘ ``` # ST_GEOMFROMEWKB (Lakehouse v1) > ST_GEOMFROMEWKB — alias for the ST_GEOMETRYFROMWKB geometry function. Alias for [ST\_GEOMETRYFROMWKB](../st-geometryfromwkb). # ST_GEOMFROMEWKT (Lakehouse v1) > ST_GEOMFROMEWKT — alias for the ST_GEOMETRYFROMWKT geometry function. Alias for [ST\_GEOMETRYFROMWKT](../st-geometryfromwkt). # ST_GEOMFROMGEOHASH (Lakehouse v1) > ST_GEOMFROMGEOHASH — returns a GEOMETRY object for the polygon that represents the boundaries of a geohash. Returns a GEOMETRY object for the polygon that represents the boundaries of a [geohash](https://en.wikipedia.org/wiki/Geohash).
## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql ST_GEOMFROMGEOHASH(<geohash>) ``` ## Arguments [Section titled “Arguments”](#arguments) | Arguments | Description | | ----------- | ------------------------------- | | `<geohash>` | The argument must be a geohash. | ## Return Type [Section titled “Return Type”](#return-type) Geometry. ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT ST_GEOMFROMGEOHASH( '9q60y60rhs' ) AS pipeline_geometry; ┌──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐ │ pipeline_geometry │ ├──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤ │ POLYGON((-120.66230535507202 35.30029535293579,-120.66230535507202 35.30030071735382,-120.66229462623596 35.30030071735382,-120.66229462623596 35.30029535293579,-120.66230535507202 35.30029535293579)) │ └──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘ ``` # ST_GEOMFROMTEXT (Lakehouse v1) > ST_GEOMFROMTEXT — alias for the ST_GEOMETRYFROMWKT geometry function. Alias for [ST\_GEOMETRYFROMWKT](../st-geometryfromwkt). # ST_GEOMFROMWKB (Lakehouse v1) > ST_GEOMFROMWKB — alias for the ST_GEOMETRYFROMWKB geometry function. Alias for [ST\_GEOMETRYFROMWKB](../st-geometryfromwkb). # ST_GEOMFROMWKT (Lakehouse v1) > ST_GEOMFROMWKT — alias for the ST_GEOMETRYFROMWKT geometry function. Alias for [ST\_GEOMETRYFROMWKT](../st-geometryfromwkt). # ST_GEOMPOINTFROMGEOHASH (Lakehouse v1) > ST_GEOMPOINTFROMGEOHASH — returns a GEOMETRY object for the point that represents the center of a geohash.
Returns a GEOMETRY object for the point that represents center of a [geohash](https://en.wikipedia.org/wiki/Geohash). ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql ST_GEOMPOINTFROMGEOHASH() ``` ## Arguments [Section titled “Arguments”](#arguments) | Arguments | Description | | ----------- | ------------------------------- | | `` | The argument must be a geohash. | ## Return Type [Section titled “Return Type”](#return-type) Geometry. ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT ST_GEOMPOINTFROMGEOHASH( 's02equ0' ) AS pipeline_geometry; ┌──────────────────────────────────────────────┐ │ pipeline_geometry │ │ Geometry │ ├──────────────────────────────────────────────┤ │ POINT(1.0004425048828125 2.0001983642578125) │ └──────────────────────────────────────────────┘ ``` # ST_LENGTH (Lakehouse v1) > ST_LENGTH — returns the Euclidean length of the LineString(s) in a GEOMETRY object. Returns the Euclidean length of the LineString(s) in a GEOMETRY object. ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql ST_LENGTH() ``` ## Arguments [Section titled “Arguments”](#arguments) | Arguments | Description | | ------------ | --------------------------------------------------------------------------- | | `` | The argument must be an expression of type GEOMETRY containing linestrings. | Note * If `` is not a `LineString`, `MultiLineString`, or `GeometryCollection` containing linestrings, returns 0. * If `` is a `GeometryCollection`, returns the sum of the lengths of the linestrings in the collection. ## Return Type [Section titled “Return Type”](#return-type) Double. 
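The rule in the note above can be sketched in a few lines of Python: the Euclidean length of a LineString is the sum of the straight-line distances between consecutive vertices, and a collection's length is the sum over its linestrings. The function names are hypothetical and this is only an illustration of the arithmetic, not the Lakehouse implementation.

```python
import math


def linestring_length(points):
    """Sum of Euclidean distances between consecutive (x, y) vertices."""
    return sum(math.dist(a, b) for a, b in zip(points, points[1:]))


def collection_length(linestrings):
    """Total length of every linestring in a collection."""
    return sum(linestring_length(line) for line in linestrings)


# LINESTRING(0 0, 1 1) has one segment of length sqrt(2).
print(linestring_length([(0, 0), (1, 1)]))
```

A geometry with no segments, such as a single point, contributes a length of 0 under this sketch, which mirrors the documented behavior for non-linestring inputs.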
## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT ST_LENGTH(TO_GEOMETRY('POINT(1 1)')) AS length ┌─────────┐ │ length │ ├─────────┤ │ 0 │ └─────────┘ SELECT ST_LENGTH(TO_GEOMETRY('LINESTRING(0 0, 1 1)')) AS length ┌─────────────┐ │ length │ ├─────────────┤ │ 1.414213562 │ └─────────────┘ SELECT ST_LENGTH( TO_GEOMETRY('POLYGON((0 0, 0 1, 1 1, 1 0, 0 0))') ) AS length ┌─────────┐ │ length │ ├─────────┤ │ 0 │ └─────────┘ ``` # ST_MAKE_LINE (Lakehouse v1) > ST_MAKE_LINE — alias for the ST_MAKELINE geometry function. Alias for [ST\_MAKELINE](../st-makeline). # ST_MAKEGEOMPOINT (Lakehouse v1) > ST_MAKEGEOMPOINT — constructs a GEOMETRY object that represents a Point with the specified. Constructs a GEOMETRY object that represents a Point with the specified longitude and latitude. ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql ST_MAKEGEOMPOINT(, ) ``` ## Aliases [Section titled “Aliases”](#aliases) * [ST\_GEOM\_POINT](../st-geom-point) ## Arguments [Section titled “Arguments”](#arguments) | Arguments | Description | | ------------- | --------------------------------------------- | | `` | A Double value that represents the longitude. | | `` | A Double value that represents the latitude. | ## Return Type [Section titled “Return Type”](#return-type) Geometry. ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT ST_MAKEGEOMPOINT( 7.0, 8.0 ) AS pipeline_point; ┌────────────────┐ │ pipeline_point │ ├────────────────┤ │ POINT(7 8) │ └────────────────┘ SELECT ST_MAKEGEOMPOINT( -122.3061, 37.554162 ) AS pipeline_point; ┌────────────────────────────┐ │ pipeline_point │ ├────────────────────────────┤ │ POINT(-122.3061 37.554162) │ └────────────────────────────┘ ``` # ST_MAKELINE (Lakehouse v1) > ST_MAKELINE — constructs a GEOMETRY object that represents a line connecting the points in the input two GEOMETRY objects. 
Constructs a GEOMETRY object that represents a line connecting the points in the input two GEOMETRY objects. ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql ST_MAKELINE(, ) ``` ## Aliases [Section titled “Aliases”](#aliases) * [ST\_MAKE\_LINE](../st-make-line) ## Arguments [Section titled “Arguments”](#arguments) | Arguments | Description | | ------------- | ----------------------------------------------------------------------------------------------------------- | | `` | A GEOMETRY object containing the points to connect. This object must be a Point, MultiPoint, or LineString. | | `` | A GEOMETRY object containing the points to connect. This object must be a Point, MultiPoint, or LineString. | ## Return Type [Section titled “Return Type”](#return-type) Geometry. ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT ST_MAKELINE( ST_GEOMETRYFROMWKT( 'POINT(-122.306100 37.554162)' ), ST_GEOMETRYFROMWKT( 'POINT(-104.874173 56.714538)' ) ) AS pipeline_line; ┌───────────────────────────────────────────────────────┐ │ pipeline_line │ ├───────────────────────────────────────────────────────┤ │ LINESTRING(-122.3061 37.554162,-104.874173 56.714538) │ └───────────────────────────────────────────────────────┘ ``` # ST_MAKEPOLYGON (Lakehouse v1) > ST_MAKEPOLYGON — constructs a GEOMETRY object that represents a Polygon without holes. Constructs a GEOMETRY object that represents a Polygon without holes. The function uses the specified LineString as the outer loop. ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql ST_MAKEPOLYGON() ``` ## Aliases [Section titled “Aliases”](#aliases) * [ST\_POLYGON](../st-polygon) ## Arguments [Section titled “Arguments”](#arguments) | Arguments | Description | | ------------ | ---------------------------------------------------- | | `` | The argument must be an expression of type GEOMETRY. | ## Return Type [Section titled “Return Type”](#return-type) Geometry. 
## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT ST_MAKEPOLYGON( ST_GEOMETRYFROMWKT( 'LINESTRING(0.0 0.0, 1.0 0.0, 1.0 2.0, 0.0 2.0, 0.0 0.0)' ) ) AS pipeline_polygon; ┌────────────────────────────────┐ │ pipeline_polygon │ ├────────────────────────────────┤ │ POLYGON((0 0,1 0,1 2,0 2,0 0)) │ └────────────────────────────────┘ ``` # ST_NPOINTS (Lakehouse v1) > ST_NPOINTS — Returns the number of points in a GEOMETRY object. Returns the number of points in a GEOMETRY object. ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql ST_NPOINTS() ``` ## Aliases [Section titled “Aliases”](#aliases) * [ST\_NUMPOINTS](../st-numpoints) ## Arguments [Section titled “Arguments”](#arguments) | Arguments | Description | | ------------ | ----------------------------------------------------------- | | `` | The argument must be an expression of type GEOMETRY object. | ## Return Type [Section titled “Return Type”](#return-type) UInt8. ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT ST_NPOINTS(TO_GEOMETRY('POINT(66 12)')) AS npoints ┌─────────┐ │ npoints │ ├─────────┤ │ 1 │ └─────────┘ SELECT ST_NPOINTS(TO_GEOMETRY('MULTIPOINT((45 21),(12 54))')) AS npoints ┌─────────┐ │ npoints │ ├─────────┤ │ 2 │ └─────────┘ SELECT ST_NPOINTS(TO_GEOMETRY('LINESTRING(40 60,50 50,60 40)')) AS npoints ┌─────────┐ │ npoints │ ├─────────┤ │ 3 │ └─────────┘ SELECT ST_NPOINTS(TO_GEOMETRY('MULTILINESTRING((1 1,32 17),(33 12,73 49,87.1 6.1))')) AS npoints ┌─────────┐ │ npoints │ ├─────────┤ │ 5 │ └─────────┘ SELECT ST_NPOINTS(TO_GEOMETRY('GEOMETRYCOLLECTION(POLYGON((-10 0,0 10,10 0,-10 0)),LINESTRING(40 60,50 50,60 40),POINT(99 11))')) AS npoints ┌─────────┐ │ npoints │ ├─────────┤ │ 8 │ └─────────┘ ``` # ST_NUMPOINTS (Lakehouse v1) > ST_NUMPOINTS — alias for the ST_NPOINTS geometry function. Alias for [ST\_NPOINTS](../st-npoints). # ST_POINTN (Lakehouse v1) > ST_POINTN — Returns a Point at a specified index in a LineString. 
Returns a Point at a specified index in a LineString. ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql ST_POINTN(<geometry>, <index>) ``` ## Arguments [Section titled “Arguments”](#arguments) | Arguments | Description | | ------------ | --------------------------------------------------------------------------------- | | `<geometry>` | The argument must be an expression of type GEOMETRY that represents a LineString. | | `<index>` | The index of the Point to return. | Note The index is 1-based, and a negative index is used as an offset from the end of the LineString. If the index is out of bounds, the function returns an error. ## Return Type [Section titled “Return Type”](#return-type) Geometry. ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT ST_POINTN( ST_GEOMETRYFROMWKT( 'LINESTRING(1 1, 2 2, 3 3, 4 4)' ), 1 ) AS pipeline_pointn; ┌─────────────────┐ │ pipeline_pointn │ ├─────────────────┤ │ POINT(1 1) │ └─────────────────┘ SELECT ST_POINTN( ST_GEOMETRYFROMWKT( 'LINESTRING(1 1, 2 2, 3 3, 4 4)' ), -2 ) AS pipeline_pointn; ┌─────────────────┐ │ pipeline_pointn │ ├─────────────────┤ │ POINT(3 3) │ └─────────────────┘ ``` # ST_POLYGON (Lakehouse v1) > ST_POLYGON — alias for the ST_MAKEPOLYGON geometry function. Alias for [ST\_MAKEPOLYGON](../st-makepolygon). # ST_SETSRID (Lakehouse v1) > ST_SETSRID — returns a GEOMETRY object that has its SRID (spatial reference system identifier) set to the specified value. Returns a GEOMETRY object that has its [SRID (spatial reference system identifier)](https://en.wikipedia.org/wiki/Spatial_reference_system#Identifier) set to the specified value. This function changes only the SRID and does not affect the coordinates of the object. If you also need to change the coordinates to match the new SRS (spatial reference system), use [ST\_TRANSFORM](../st-transform) instead.
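The difference between relabeling and reprojecting can be illustrated with a small Python sketch. It models a point as an `(x, y, srid)` tuple and, purely as an assumed example, uses the well-known spherical Web Mercator formula for going from SRID 4326 to SRID 3857. Both helper functions are hypothetical and say nothing about how the Lakehouse implements `ST_SETSRID` or `ST_TRANSFORM`.

```python
import math

EARTH_RADIUS_M = 6378137.0  # sphere radius used by Web Mercator (EPSG:3857)


def set_srid(point, srid):
    """Relabel only: the coordinates are returned unchanged."""
    x, y, _ = point
    return (x, y, srid)


def transform_4326_to_3857(point):
    """Reproject lon/lat degrees to Web Mercator metres."""
    lon, lat, srid = point
    if srid != 4326:
        raise ValueError("this sketch only converts from SRID 4326")
    x = EARTH_RADIUS_M * math.radians(lon)
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(lat) / 2))
    return (x, y, 3857)


p = (13.0, 51.0, 4326)
print(set_srid(p, 3857))          # same numbers, new label
print(transform_4326_to_3857(p))  # new numbers and new label
```

In the first call the coordinates stay at 13 and 51 even though the label changes, which is only appropriate when the original SRID was wrong. The second call produces coordinates in metres that actually match the new reference system.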
## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql ST_SETSRID(<geometry>, <srid>) ``` ## Arguments [Section titled “Arguments”](#arguments) | Arguments | Description | | ------------ | ----------------------------------------------------------- | | `<geometry>` | The argument must be an expression of type GEOMETRY object. | | `<srid>` | The SRID integer to set in the returned GEOMETRY object. | ## Return Type [Section titled “Return Type”](#return-type) Geometry. ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SET GEOMETRY_OUTPUT_FORMAT = 'EWKT'; SELECT ST_SETSRID(TO_GEOMETRY('POINT(13 51)'), 4326) AS geometry; ┌────────────────────────┐ │ geometry │ ├────────────────────────┤ │ SRID=4326;POINT(13 51) │ └────────────────────────┘ ``` # ST_SRID (Lakehouse v1) > ST_SRID — returns the SRID (spatial reference system identifier) of a GEOMETRY object. Returns the SRID (spatial reference system identifier) of a GEOMETRY object. ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql ST_SRID(<geometry>) ``` ## Arguments [Section titled “Arguments”](#arguments) | Arguments | Description | | ------------ | ---------------------------------------------------- | | `<geometry>` | The argument must be an expression of type GEOMETRY. | ## Return Type [Section titled “Return Type”](#return-type) INT32. Note If the GEOMETRY object has no SRID, the default value 4326 is returned. ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT ST_SRID( TO_GEOMETRY( 'POINT(-122.306100 37.554162)', 1234 ) ) AS pipeline_srid; ┌───────────────┐ │ pipeline_srid │ ├───────────────┤ │ 1234 │ └───────────────┘ SELECT ST_SRID( ST_MAKEGEOMPOINT( 37.5, 45.5 ) ) AS pipeline_srid; ┌───────────────┐ │ pipeline_srid │ ├───────────────┤ │ 4326 │ └───────────────┘ ``` # ST_STARTPOINT (Lakehouse v1) > ST_STARTPOINT — Returns the first Point in a LineString. Returns the first Point in a LineString.
## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql ST_STARTPOINT(<geometry>) ``` ## Arguments [Section titled “Arguments”](#arguments) | Arguments | Description | | ------------ | --------------------------------------------------------------------------------- | | `<geometry>` | The argument must be an expression of type GEOMETRY that represents a LineString. | ## Return Type [Section titled “Return Type”](#return-type) Geometry. ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT ST_STARTPOINT( ST_GEOMETRYFROMWKT( 'LINESTRING(1 1, 2 2, 3 3, 4 4)' ) ) AS pipeline_startpoint; ┌─────────────────────┐ │ pipeline_startpoint │ ├─────────────────────┤ │ POINT(1 1) │ └─────────────────────┘ ``` # ST_TRANSFORM (Lakehouse v1) > ST_TRANSFORM — converts a GEOMETRY object from one spatial reference system (SRS) to another. Converts a GEOMETRY object from one [spatial reference system (SRS)](https://en.wikipedia.org/wiki/Spatial_reference_system) to another. If you just need to change the SRID without changing the coordinates (e.g. if the SRID was incorrect), use [ST\_SETSRID](../st-setsrid) instead. ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql ST_TRANSFORM(<geometry> [, <from_srid>], <to_srid>) ``` ## Arguments [Section titled “Arguments”](#arguments) | Arguments | Description | | ------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------- | | `<geometry>` | The argument must be an expression of type GEOMETRY object. | | `<from_srid>` | Optional SRID identifying the current SRS of the input GEOMETRY object. If this argument is omitted, the SRID specified in the input GEOMETRY object is used. | | `<to_srid>` | The SRID that identifies the target SRS. The input GEOMETRY object is transformed to a new object that uses this SRS. | ## Return Type [Section titled “Return Type”](#return-type) Geometry.
## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SET GEOMETRY_OUTPUT_FORMAT = 'EWKT'; SELECT ST_TRANSFORM(ST_GEOMFROMWKT('POINT(389866.35 5819003.03)', 32633), 3857) AS transformed_geom; ┌───────────────────────────────────────────────┐ │ transformed_geom │ ├───────────────────────────────────────────────┤ │ SRID=3857;POINT(1489140.093766 6892872.19868) │ └───────────────────────────────────────────────┘ SELECT ST_TRANSFORM(ST_GEOMFROMWKT('POINT(4.500212 52.161170)'), 4326, 28992) AS transformed_geom; ┌──────────────────────────────────────────────┐ │ transformed_geom │ ├──────────────────────────────────────────────┤ │ SRID=28992;POINT(94308.670475 464038.168827) │ └──────────────────────────────────────────────┘ ``` # ST_X (Lakehouse v1) > ST_X — returns the longitude (X coordinate) of a Point represented by a GEOMETRY object. Returns the longitude (X coordinate) of a Point represented by a GEOMETRY object. ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql ST_X(<geometry>) ``` ## Arguments [Section titled “Arguments”](#arguments) | Arguments | Description | | ------------ | ----------------------------------------------------------------------------- | | `<geometry>` | The argument must be an expression of type GEOMETRY and must contain a Point. | ## Return Type [Section titled “Return Type”](#return-type) Double. ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT ST_X( ST_MAKEGEOMPOINT( 37.5, 45.5 ) ) AS pipeline_x; ┌────────────┐ │ pipeline_x │ ├────────────┤ │ 37.5 │ └────────────┘ ``` # ST_XMAX (Lakehouse v1) > ST_XMAX — returns the maximum longitude (X coordinate) of all points contained in the specified GEOMETRY object. Returns the maximum longitude (X coordinate) of all points contained in the specified GEOMETRY object.
## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql ST_XMAX() ``` ## Arguments [Section titled “Arguments”](#arguments) | Arguments | Description | | ------------ | ---------------------------------------------------- | | `` | The argument must be an expression of type GEOMETRY. | ## Return Type [Section titled “Return Type”](#return-type) Double. ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT ST_XMAX( TO_GEOMETRY( 'GEOMETRYCOLLECTION(POINT(40 10),LINESTRING(10 10,20 20,10 40),POINT EMPTY)' ) ) AS pipeline_xmax; ┌───────────────┐ │ pipeline_xmax │ ├───────────────┤ │ 40 │ └───────────────┘ SELECT ST_XMAX( TO_GEOMETRY( 'GEOMETRYCOLLECTION(POINT(40 10),LINESTRING(10 10,20 20,10 40),POLYGON((40 40,20 45,45 30,40 40)))' ) ) AS pipeline_xmax; ┌───────────────┐ │ pipeline_xmax │ ├───────────────┤ │ 45 │ └───────────────┘ ``` # ST_XMIN (Lakehouse v1) > ST_XMIN — returns the minimum longitude (X coordinate) of all points contained in the specified GEOMETRY object. Returns the minimum longitude (X coordinate) of all points contained in the specified GEOMETRY object. ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql ST_XMIN() ``` ## Arguments [Section titled “Arguments”](#arguments) | Arguments | Description | | ------------ | ---------------------------------------------------- | | `` | The argument must be an expression of type GEOMETRY. | ## Return Type [Section titled “Return Type”](#return-type) Double. 
## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT ST_XMIN( TO_GEOMETRY( 'GEOMETRYCOLLECTION(POINT(180 10),LINESTRING(20 10,30 20,40 40),POINT EMPTY)' ) ) AS pipeline_xmin; ┌───────────────┐ │ pipeline_xmin │ ├───────────────┤ │ 20 │ └───────────────┘ SELECT ST_XMIN( TO_GEOMETRY( 'GEOMETRYCOLLECTION(POINT(40 10),LINESTRING(20 10,30 20,10 40),POLYGON((40 40,20 45,45 30,40 40)))' ) ) AS pipeline_xmin; ┌───────────────┐ │ pipeline_xmin │ ├───────────────┤ │ 10 │ └───────────────┘ ``` # ST_Y (Lakehouse v1) > ST_Y — returns the latitude (Y coordinate) of a Point represented by a GEOMETRY object. Returns the latitude (Y coordinate) of a Point represented by a GEOMETRY object. ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql ST_Y() ``` ## Arguments [Section titled “Arguments”](#arguments) | Arguments | Description | | ------------ | ----------------------------------------------------------------------------- | | `` | The argument must be an expression of type GEOMETRY and must contain a Point. | ## Return Type [Section titled “Return Type”](#return-type) Double. ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT ST_Y( ST_MAKEGEOMPOINT( 37.5, 45.5 ) ) AS pipeline_y; ┌────────────┐ │ pipeline_y │ ├────────────┤ │ 45.5 │ └────────────┘ ``` # ST_YMAX (Lakehouse v1) > ST_YMAX — returns the maximum latitude (Y coordinate) of all points contained in the specified. Returns the maximum latitude (Y coordinate) of all points contained in the specified GEOMETRY object. ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql ST_YMAX() ``` ## Arguments [Section titled “Arguments”](#arguments) | Arguments | Description | | ------------ | ---------------------------------------------------- | | `` | The argument must be an expression of type GEOMETRY. | ## Return Type [Section titled “Return Type”](#return-type) Double. 
## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT ST_YMAX( TO_GEOMETRY( 'GEOMETRYCOLLECTION(POINT(180 50),LINESTRING(10 10,20 20,10 40),POINT EMPTY)' ) ) AS pipeline_ymax; ┌───────────────┐ │ pipeline_ymax │ ├───────────────┤ │ 50 │ └───────────────┘ SELECT ST_YMAX( TO_GEOMETRY( 'GEOMETRYCOLLECTION(POINT(40 10),LINESTRING(10 10,20 20,10 40),POLYGON((40 40,20 45,45 30,40 40)))' ) ) AS pipeline_ymax; ┌───────────────┐ │ pipeline_ymax │ ├───────────────┤ │ 45 │ └───────────────┘ ``` # ST_YMIN (Lakehouse v1) > ST_YMIN — returns the minimum latitude (Y coordinate) of all points contained in the specified GEOMETRY object. Returns the minimum latitude (Y coordinate) of all points contained in the specified GEOMETRY object. ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql ST_YMIN() ``` ## Arguments [Section titled “Arguments”](#arguments) | Arguments | Description | | ------------ | ---------------------------------------------------- | | `` | The argument must be an expression of type GEOMETRY. | ## Return Type [Section titled “Return Type”](#return-type) Double. ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT ST_YMIN( TO_GEOMETRY( 'GEOMETRYCOLLECTION(POINT(-180 -10),LINESTRING(-179 0, 179 30),POINT EMPTY)' ) ) AS pipeline_ymin; ┌───────────────┐ │ pipeline_ymin │ ├───────────────┤ │ -10 │ └───────────────┘ SELECT ST_YMIN( TO_GEOMETRY( 'GEOMETRYCOLLECTION(POINT(180 0),LINESTRING(-60 -30, 60 30),POLYGON((40 40,20 45,45 30,40 40)))' ) ) AS pipeline_ymin; ┌───────────────┐ │ pipeline_ymin │ ├───────────────┤ │ -30 │ └───────────────┘ ``` # TO_GEOMETRY (Lakehouse v1) > TO_GEOMETRY — Parses an input and returns a value of type GEOMETRY. Parses an input and returns a value of type GEOMETRY. `TRY_TO_GEOMETRY` returns a NULL value if an error occurs during parsing.
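The difference between the two variants shows up only on input that cannot be parsed. As a hedged sketch (the input string and the `is_invalid` alias are made up for illustration), the query below relies on the documented behavior that `TRY_TO_GEOMETRY` yields NULL where `TO_GEOMETRY` would report an error:

```sql
-- 'not a geometry' is not valid WKT, EWKT, WKB, EWKB, or GeoJSON.
-- TO_GEOMETRY would raise a parsing error for this input;
-- TRY_TO_GEOMETRY is documented to return NULL instead,
-- so the IS NULL check can be used to flag bad rows.
SELECT TRY_TO_GEOMETRY('not a geometry') IS NULL AS is_invalid;
```

This pattern may be useful when loading geometry text from an untrusted source, since one malformed value then does not abort the whole query.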
## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql TO_GEOMETRY(, []) TO_GEOMETRY(, []) TO_GEOMETRY(, []) TRY_TO_GEOMETRY(, []) TRY_TO_GEOMETRY(, []) TRY_TO_GEOMETRY(, []) ``` ## Arguments [Section titled “Arguments”](#arguments) | Arguments | Description | | ----------- | --------------------------------------------------------------------------------------------------------- | | `` | The argument must be a string expression in WKT, EWKT, WKB or EWKB in hexadecimal format, GeoJSON format. | | `` | The argument must be a binary expression in WKB or EWKB format. | | `` | The argument must be a JSON OBJECT in GeoJSON format. | | `` | The integer value of the SRID to use. | ## Return Type [Section titled “Return Type”](#return-type) Geometry. ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT TO_GEOMETRY( 'POINT(1820.12 890.56)' ) AS pipeline_geometry; ┌───────────────────────┐ │ pipeline_geometry │ ├───────────────────────┤ │ POINT(1820.12 890.56) │ └───────────────────────┘ SELECT TO_GEOMETRY( '0101000020797f000066666666a9cb17411f85ebc19e325641', 4326 ) AS pipeline_geometry; ┌───────────────────────────────────────┐ │ pipeline_geometry │ ├───────────────────────────────────────┤ │ SRID=4326;POINT(389866.35 5819003.03) │ └───────────────────────────────────────┘ SELECT TO_GEOMETRY( FROM_HEX('0101000020797f000066666666a9cb17411f85ebc19e325641'), 4326 ) AS pipeline_geometry; ┌───────────────────────────────────────┐ │ pipeline_geometry │ ├───────────────────────────────────────┤ │ SRID=4326;POINT(389866.35 5819003.03) │ └───────────────────────────────────────┘ SELECT TO_GEOMETRY( '{"coordinates":[[389866,5819003],[390000,5830000]],"type":"LineString"}' ) AS pipeline_geometry; ┌───────────────────────────────────────────┐ │ pipeline_geometry │ ├───────────────────────────────────────────┤ │ LINESTRING(389866 5819003,390000 5830000) │ └───────────────────────────────────────────┘ SELECT TO_GEOMETRY( 
PARSE_JSON('{"coordinates":[[389866,5819003],[390000,5830000]],"type":"LineString"}') ) AS pipeline_geometry; ┌───────────────────────────────────────────┐ │ pipeline_geometry │ ├───────────────────────────────────────────┤ │ LINESTRING(389866 5819003,390000 5830000) │ └───────────────────────────────────────────┘ ``` # TO_STRING (Geometry, Lakehouse v1) > TO_STRING — Converts a GEOMETRY object into a String representation. Converts a GEOMETRY object into a String representation. The display format of the output data is controlled by the `geometry_output_format` setting, which contains the following types: | Parameter | Description | | ----------------- | ------------------------------------------------------------------- | | GeoJSON (default) | The GEOMETRY result is rendered as a JSON object in GeoJSON format. | | WKT | The GEOMETRY result is rendered as a String in WKT format. | | WKB | The GEOMETRY result is rendered as a Binary in WKB format. | | EWKT | The GEOMETRY result is rendered as a String in EWKT format. | | EWKB | The GEOMETRY result is rendered as a Binary in EWKB format. | ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql TO_STRING() ``` ## Arguments [Section titled “Arguments”](#arguments) | Arguments | Description | | ------------ | ---------------------------------------------------- | | `` | The argument must be an expression of type GEOMETRY. | ## Return Type [Section titled “Return Type”](#return-type) String. 
## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SET geometry_output_format='GeoJSON'; SELECT TO_STRING( ST_GEOMETRYFROMWKT( 'SRID=4326;LINESTRING(400000 6000000, 401000 6010000)' ) ) AS pipeline_geometry; ┌────────────────────────────────────────────────────────────────────────────┐ │ pipeline_geometry │ ├────────────────────────────────────────────────────────────────────────────┤ │ {"type": "LineString", "coordinates": [[400000,6000000],[401000,6010000]]} │ └────────────────────────────────────────────────────────────────────────────┘ SET geometry_output_format='WKT'; SELECT TO_STRING( ST_GEOMETRYFROMWKT( 'SRID=4326;LINESTRING(400000 6000000, 401000 6010000)' ) ) AS pipeline_geometry; ┌───────────────────────────────────────────┐ │ pipeline_geometry │ ├───────────────────────────────────────────┤ │ LINESTRING(400000 6000000,401000 6010000) │ └───────────────────────────────────────────┘ SET geometry_output_format='EWKT'; SELECT TO_STRING( ST_GEOMETRYFROMWKT( 'SRID=4326;LINESTRING(400000 6000000, 401000 6010000)' ) ) AS pipeline_geometry; ┌─────────────────────────────────────────────────────┐ │ pipeline_geometry │ ├─────────────────────────────────────────────────────┤ │ SRID=4326;LINESTRING(400000 6000000,401000 6010000) │ └─────────────────────────────────────────────────────┘ SET geometry_output_format='WKB'; SELECT TO_STRING( ST_GEOMETRYFROMWKT( 'SRID=4326;LINESTRING(400000 6000000, 401000 6010000)' ) ) AS pipeline_geometry; ┌────────────────────────────────────────────────────────────────────────────────────┐ │ pipeline_geometry │ ├────────────────────────────────────────────────────────────────────────────────────┤ │ 01020000000200000000000000006A18410000000060E3564100000000A07918410000000024ED5641 │ └────────────────────────────────────────────────────────────────────────────────────┘ SET geometry_output_format='EWKB'; SELECT TO_STRING( ST_GEOMETRYFROMWKT( 'SRID=4326;LINESTRING(400000 6000000, 401000 6010000)' ) ) AS
pipeline_geometry; ┌────────────────────────────────────────────────────────────────────────────────────────────┐ │ pipeline_geometry │ ├────────────────────────────────────────────────────────────────────────────────────────────┤ │ 0102000020E61000000200000000000000006A18410000000060E3564100000000A07918410000000024ED5641 │ └────────────────────────────────────────────────────────────────────────────────────────────┘ ``` # Map Functions (Lakehouse v1) > Lakehouse v1 SQL map functions: build and query key-value MAP values. This section provides reference information for the map functions in PlaidCloud Lakehouse. # MAP_CAT (Lakehouse v1) > MAP_CAT — returns the concatenation of two MAPs. Returns the concatenation of two MAPs. ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql MAP_CAT( , ) ``` ## Arguments [Section titled “Arguments”](#arguments) | Arguments | Description | | --------- | ------------------------------- | | `` | The source MAP. | | `` | The MAP to be appended to map1. | Note * If both map1 and map2 have a value with the same key, then the output map contains the value from map2. * If either argument is NULL, the function returns NULL without reporting any error. ## Return Type [Section titled “Return Type”](#return-type) Map. ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT MAP_CAT({'a':1,'b':2,'c':3}, {'c':5,'d':6}); ┌─────────────────────────────────────────────┐ │ map_cat({'a':1,'b':2,'c':3}, {'c':5,'d':6}) │ ├─────────────────────────────────────────────┤ │ {'a':1,'b':2,'c':5,'d':6} │ └─────────────────────────────────────────────┘ ``` # MAP_CONTAINS_KEY (Lakehouse v1) > MAP_CONTAINS_KEY — determines whether the specified MAP contains the specified key. Determines whether the specified MAP contains the specified key.
## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql MAP_CONTAINS_KEY( , ) ``` ## Arguments [Section titled “Arguments”](#arguments) | Arguments | Description | | --------- | ----------------------- | | `` | The map to be searched. | | `` | The key to find. | ## Return Type [Section titled “Return Type”](#return-type) Boolean. ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT MAP_CONTAINS_KEY({'a':1,'b':2,'c':3}, 'c'); ┌────────────────────────────────────────────┐ │ map_contains_key({'a':1,'b':2,'c':3}, 'c') │ ├────────────────────────────────────────────┤ │ true │ └────────────────────────────────────────────┘ SELECT MAP_CONTAINS_KEY({'a':1,'b':2,'c':3}, 'x'); ┌────────────────────────────────────────────┐ │ map_contains_key({'a':1,'b':2,'c':3}, 'x') │ ├────────────────────────────────────────────┤ │ false │ └────────────────────────────────────────────┘ ``` # MAP_DELETE (Lakehouse v1) > MAP_DELETE — Returns an existing MAP with one or more keys removed. Returns an existing MAP with one or more keys removed. ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql MAP_DELETE( , [, , ... ] ) MAP_DELETE( , ) ``` ## Arguments [Section titled “Arguments”](#arguments) | Arguments | Description | | --------- | ------------------------------------------------------ | | `` | The MAP that contains the KEY to remove. | | `` | The KEYs to be omitted from the returned MAP. | | `` | The Array of KEYs to be omitted from the returned MAP. | Note * The types of the key expressions and the keys in the map must be the same. * Key values not found in the map will be ignored. ## Return Type [Section titled “Return Type”](#return-type) Map. 
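The note about missing keys can be illustrated with a small sketch. The key `'x'` and the alias `remaining` are chosen only for this illustration; the behavior described in the comments follows from the note above rather than from a recorded query run.

```sql
-- 'x' is not a key of the input map, so per the note above it is ignored
-- rather than raising an error; only the 'a' entry is removed, which
-- should leave the 'b' and 'c' entries in the returned map.
SELECT MAP_DELETE({'a':1,'b':2,'c':3}, 'a', 'x') AS remaining;
```

Because unknown keys are skipped silently, a typo in a key name does not fail the query, so it may be worth checking the result with MAP_CONTAINS_KEY when the removal matters.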
## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT MAP_DELETE({'a':1,'b':2,'c':3}, 'a', 'c'); ┌───────────────────────────────────────────┐ │ map_delete({'a':1,'b':2,'c':3}, 'a', 'c') │ ├───────────────────────────────────────────┤ │ {'b':2} │ └───────────────────────────────────────────┘ SELECT MAP_DELETE({'a':1,'b':2,'c':3}, ['a', 'b']); ┌─────────────────────────────────────────────┐ │ map_delete({'a':1,'b':2,'c':3}, ['a', 'b']) │ ├─────────────────────────────────────────────┤ │ {'c':3} │ └─────────────────────────────────────────────┘ ``` # MAP_FILTER (Lakehouse v1) > MAP_FILTER — filters key-value pairs from a map using a lambda expression to define the condition. Filters key-value pairs from a map using a lambda expression to define the condition. ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql MAP_FILTER(, (, ) -> ) ``` ## Return Type [Section titled “Return Type”](#return-type) Returns a map that includes only the key-value pairs meeting the condition specified by the lambda expression. ## SQL Examples [Section titled “SQL Examples”](#sql-examples) This example returns a map containing only the products with stock quantities below 10: ```sql SELECT MAP_FILTER({101:15, 102:8, 103:12, 104:5}, (product_id, stock) -> (stock < 10)) AS low_stock_products; ┌────────────────────┐ │ low_stock_products │ ├────────────────────┤ │ {102:8,104:5} │ └────────────────────┘ ``` # MAP_INSERT (Lakehouse v1) > MAP_INSERT — returns a new MAP consisting of the input MAP with a new key-value pair inserted, or with an existing key updated with a new value. Returns a new MAP consisting of the input MAP with a new key-value pair inserted, or with an existing key updated with a new value. ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql MAP_INSERT( , , [, ] ) ``` ## Arguments [Section titled “Arguments”](#arguments) | Arguments | Description | | -------------- | -------------------------------------------------------------------------------------------- | | `` | The input MAP.
| | `` | The new key to insert into the MAP. | | `` | The new value to insert into the MAP. | | `` | The boolean flag indicates whether an existing key can be overwritten. The default is FALSE. | ## Return Type [Section titled “Return Type”](#return-type) Map. ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT MAP_INSERT({'a':1,'b':2,'c':3}, 'd', 4); ┌─────────────────────────────────────────┐ │ map_insert({'a':1,'b':2,'c':3}, 'd', 4) │ ├─────────────────────────────────────────┤ │ {'a':1,'b':2,'c':3,'d':4} │ └─────────────────────────────────────────┘ SELECT MAP_INSERT({'a':1,'b':2,'c':3}, 'a', 5, true); ┌───────────────────────────────────────────────┐ │ map_insert({'a':1,'b':2,'c':3}, 'a', 5, TRUE) │ ├───────────────────────────────────────────────┤ │ {'a':5,'b':2,'c':3} │ └───────────────────────────────────────────────┘ ``` # MAP_KEYS (Lakehouse v1) > MAP_KEYS — returns the keys in a map. Returns the keys in a map. ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql MAP_KEYS( ) ``` ## Arguments [Section titled “Arguments”](#arguments) | Arguments | Description | | --------- | -------------- | | `` | The input map. | ## Return Type [Section titled “Return Type”](#return-type) Array. ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT MAP_KEYS({'a':1,'b':2,'c':3}); ┌───────────────────────────────┐ │ map_keys({'a':1,'b':2,'c':3}) │ ├───────────────────────────────┤ │ ['a','b','c'] │ └───────────────────────────────┘ ``` # MAP_PICK (Lakehouse v1) > MAP_PICK — returns a new MAP containing the specified key-value pairs from an existing MAP. Returns a new MAP containing the specified key-value pairs from an existing MAP. ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql MAP_PICK( , [, , ... ] ) MAP_PICK( , ) ``` ## Arguments [Section titled “Arguments”](#arguments) | Arguments | Description | | --------- | ------------------------------------------------------- | | `` | The input MAP. 
| | `` | The KEYs to be included in the returned MAP. | | `` | The Array of KEYs to be included in the returned MAP. | Note * The types of the key expressions and the keys in the map must be the same. * Key values not found in the map will be ignored. ## Return Type [Section titled “Return Type”](#return-type) Map. ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT MAP_PICK({'a':1,'b':2,'c':3}, 'a', 'c'); ┌─────────────────────────────────────────┐ │ map_pick({'a':1,'b':2,'c':3}, 'a', 'c') │ ├─────────────────────────────────────────┤ │ {'a':1,'c':3} │ └─────────────────────────────────────────┘ SELECT MAP_PICK({'a':1,'b':2,'c':3}, ['a', 'b']); ┌───────────────────────────────────────────┐ │ map_pick({'a':1,'b':2,'c':3}, ['a', 'b']) │ ├───────────────────────────────────────────┤ │ {'a':1,'b':2} │ └───────────────────────────────────────────┘ ``` # MAP_SIZE (Lakehouse v1) > MAP_SIZE — returns the size of a MAP. Returns the size of a MAP. ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql MAP_SIZE( ) ``` ## Arguments [Section titled “Arguments”](#arguments) | Arguments | Description | | --------- | -------------- | | `` | The input map. | ## Return Type [Section titled “Return Type”](#return-type) UInt64. ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT MAP_SIZE({'a':1,'b':2,'c':3}); ┌───────────────────────────────┐ │ map_size({'a':1,'b':2,'c':3}) │ ├───────────────────────────────┤ │ 3 │ └───────────────────────────────┘ ``` # MAP_TRANSFORM_KEYS (Lakehouse v1) > MAP_TRANSFORM_KEYS — applies a transformation to each key in a map using a lambda expression. Applies a transformation to each key in a map using a lambda expression.
## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql MAP_TRANSFORM_KEYS(, (, ) -> ) ``` ## Return Type [Section titled “Return Type”](#return-type) Returns a map with the same values as the input map but with keys modified according to the specified lambda transformation. ## SQL Examples [Section titled “SQL Examples”](#sql-examples) This example adds 1,000 to each product ID, creating a new map with updated keys while keeping the associated prices the same: ```sql SELECT MAP_TRANSFORM_KEYS({101: 29.99, 102: 45.50, 103: 15.00}, (product_id, price) -> product_id + 1000) AS updated_product_ids; ┌────────────────────────────────────┐ │ updated_product_ids │ ├────────────────────────────────────┤ │ {1101:29.99,1102:45.50,1103:15.00} │ └────────────────────────────────────┘ ``` # MAP_TRANSFORM_VALUES (Lakehouse v1) > MAP_TRANSFORM_VALUES — applies a transformation to each value in a map using a lambda expression. Applies a transformation to each value in a map using a lambda expression. ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql MAP_TRANSFORM_VALUES(, (, ) -> ) ``` ## Return Type [Section titled “Return Type”](#return-type) Returns a map with the same keys as the input map but with values modified according to the specified lambda transformation. ## SQL Examples [Section titled “SQL Examples”](#sql-examples) This example reduces each product’s price by 10%, while the product IDs (keys) remain unchanged: ```sql SELECT MAP_TRANSFORM_VALUES({101: 100.0, 102: 150.0, 103: 200.0}, (product_id, price) -> price * 0.9) AS discounted_prices; ┌───────────────────────────────────┐ │ discounted_prices │ ├───────────────────────────────────┤ │ {101:90.00,102:135.00,103:180.00} │ └───────────────────────────────────┘ ``` # MAP_VALUES (Lakehouse v1) > MAP_VALUES — returns the values in a map. Returns the values in a map. 
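MAP_VALUES is the counterpart of MAP_KEYS, and the two are often read side by side together with MAP_SIZE. A minimal sketch, assuming nothing beyond the functions documented in this section (the aliases `key_list`, `value_list`, and `entry_count` are illustrative):

```sql
-- Inspect one map from three angles in a single query.
-- Per the individual reference pages, these return
-- ['a','b','c'], [1,2,3], and 3 for this input.
SELECT
  MAP_KEYS({'a':1,'b':2,'c':3})   AS key_list,
  MAP_VALUES({'a':1,'b':2,'c':3}) AS value_list,
  MAP_SIZE({'a':1,'b':2,'c':3})   AS entry_count;
```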
## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql MAP_VALUES( ) ``` ## Arguments [Section titled “Arguments”](#arguments) | Arguments | Description | | --------- | -------------- | | `` | The input map. | ## Return Type [Section titled “Return Type”](#return-type) Array. ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT MAP_VALUES({'a':1,'b':2,'c':3}); ┌─────────────────────────────────┐ │ map_values({'a':1,'b':2,'c':3}) │ ├─────────────────────────────────┤ │ [1,2,3] │ └─────────────────────────────────┘ ``` # Search Functions (Lakehouse v1) > Lakehouse v1 SQL search functions: full-text and similarity search helpers. This section provides reference information for the search functions in PlaidCloud Lakehouse. # MATCH (Lakehouse v1) > MATCH — Searches for documents containing specified keywords. Searches for documents containing specified keywords. Please note that the MATCH function can only be used in a WHERE clause. PlaidCloud Lakehouse’s MATCH function is inspired by Elasticsearch’s [MATCH](https://www.elastic.co/guide/en/elasticsearch/reference/current/sql-functions-search.html#sql-functions-search-match). ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql MATCH( '', ''[, ''] ) ``` | Parameter | Description | | ------------ | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | | `` | A comma-separated list of column names in the table to search for the specified keywords, with optional weighting using the syntax (^), which allows assigning different weights to each column, influencing the importance of each column in the search. | | `` | The keywords to match against the specified columns in the table. 
This parameter also supports prefix matching: a search term followed by an asterisk (\*) matches any number of trailing characters or words. | | `` | A set of configuration options, separated by semicolons `;`, that customize the search behavior. See the table below for details. | | Option | Description | Example | Explanation | | --------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ---------------------------------------------------------------------------------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | | fuzziness | Allows matching terms within a specified Levenshtein distance. `fuzziness` can be set to 1 or 2. | `SELECT id, score(), content FROM t WHERE match(content, 'box', 'fuzziness=1');` | When matching the query term “box”, `fuzziness=1` allows matching terms like “fox”, since “box” and “fox” have a Levenshtein distance of 1. | | operator | Specifies how multiple query terms are combined. Can be set to OR (default) or AND. OR returns results containing any of the query terms, while AND returns results containing all query terms. | `SELECT id, score(), content FROM t WHERE match(content, 'action works', 'fuzziness=1;operator=AND');` | With `operator=AND`, the query requires both “action” and “works” to be present in the results. Due to `fuzziness=1`, it matches terms like “Actions” and “words”, so “Actions speak louder than words” is returned. | | lenient | Controls whether errors are reported when the query text is invalid. Defaults to `false`. If set to `true`, no error is reported, and an empty result set is returned if the query text is invalid.
| SELECT id, score(), content FROM t WHERE match(content, ’()’, ‘lenient=true’); | If the query text `()` is invalid, setting `lenient=true` prevents an error from being thrown and returns an empty result set instead. | ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql CREATE TABLE test(title STRING, body STRING); CREATE INVERTED INDEX idx ON test(title, body); INSERT INTO test VALUES ('The Importance of Reading', 'Reading is a crucial skill that opens up a world of knowledge and imagination.'), ('The Benefits of Exercise', 'Exercise is essential for maintaining a healthy lifestyle.'), ('The Power of Perseverance', 'Perseverance is the key to overcoming obstacles and achieving success.'), ('The Art of Communication', 'Effective communication is crucial in everyday life.'), ('The Impact of Technology on Society', 'Technology has revolutionized our society in countless ways.'); -- Retrieve documents where the 'title' column matches 'art power' SELECT * FROM test WHERE MATCH('title', 'art power'); ┌────────────────────────────────────────────────────────────────────────────────────────────────────┐ │ title │ body │ ├───────────────────────────┼────────────────────────────────────────────────────────────────────────┤ │ The Power of Perseverance │ Perseverance is the key to overcoming obstacles and achieving success. │ │ The Art of Communication │ Effective communication is crucial in everyday life. 
│ └────────────────────────────────────────────────────────────────────────────────────────────────────┘ -- Retrieve documents where the 'title' column contains values that start with 'The' followed by any characters SELECT * FROM test WHERE MATCH('title', 'The*') ┌──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐ │ title │ body │ │ Nullable(String) │ Nullable(String) │ ├─────────────────────────────────────┼────────────────────────────────────────────────────────────────────────────────┤ │ The Importance of Reading │ Reading is a crucial skill that opens up a world of knowledge and imagination. │ │ The Benefits of Exercise │ Exercise is essential for maintaining a healthy lifestyle. │ │ The Power of Perseverance │ Perseverance is the key to overcoming obstacles and achieving success. │ │ The Art of Communication │ Effective communication is crucial in everyday life. │ │ The Impact of Technology on Society │ Technology has revolutionized our society in countless ways. │ └──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘ -- Retrieve documents where either the 'title' or 'body' column matches 'knowledge technology' SELECT *, score() FROM test WHERE MATCH('title, body', 'knowledge technology'); ┌──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐ │ title │ body │ score() │ ├─────────────────────────────────────┼────────────────────────────────────────────────────────────────────────────────┼───────────┤ │ The Importance of Reading │ Reading is a crucial skill that opens up a world of knowledge and imagination. │ 1.1550591 │ │ The Impact of Technology on Society │ Technology has revolutionized our society in countless ways. 
│ 2.6830134 │ └──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘ -- Retrieve documents where either the 'title' or 'body' column matches 'knowledge technology', with weighted importance on both columns SELECT *, score() FROM test WHERE MATCH('title^5, body^1.2', 'knowledge technology'); ┌──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐ │ title │ body │ score() │ ├─────────────────────────────────────┼────────────────────────────────────────────────────────────────────────────────┼───────────┤ │ The Importance of Reading │ Reading is a crucial skill that opens up a world of knowledge and imagination. │ 1.3860708 │ │ The Impact of Technology on Society │ Technology has revolutionized our society in countless ways. │ 7.8053584 │ └──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘ -- Retrieve documents where the 'body' column contains both "knowledge" and "imagination" (allowing for minor typos). SELECT * FROM test WHERE MATCH('body', 'knowledg imaginatio', 'fuzziness = 1; operator = AND'); -[ RECORD 1 ]----------------------------------- title: The Importance of Reading body: Reading is a crucial skill that opens up a world of knowledge and imagination. ``` # QUERY (Lakehouse v1) > QUERY — Searches for documents satisfying a specified query expression. Searches for documents satisfying a specified query expression. Please note that the QUERY function can only be used in a WHERE clause. PlaidCloud Lakehouse’s QUERY function is inspired by Elasticsearch’s [QUERY](https://www.elastic.co/guide/en/elasticsearch/reference/current/sql-functions-search.html#sql-functions-search-query). ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql QUERY( ''[, ''] ) ``` The query expression supports the following syntaxes. 
Please note that `` also supports prefix matching: a search term followed by an asterisk (\*) matches any number of trailing characters or words. | Syntax | Description | Examples | | ------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | --------------------------------------- | | `:` | Matches documents where the specified column contains the specified keyword. | `QUERY('title:power')` | | `:IN [, ...]` | Matches documents where the specified column contains any of the keywords listed within the square brackets. | `QUERY('title:IN [power, art]')` | | `: AND / OR ` | Matches documents where the specified column contains both or either of the specified keywords. In queries with both AND and OR, AND operations are prioritized over OR, meaning that ‘a AND b OR c’ is read as ‘(a AND b) OR c’. | `QUERY('title:power AND art')` | | `:+ -` | Matches documents where the specified positive keyword exists in the specified column and excludes documents where the specified negative keyword exists. | `QUERY('title:+the -reading')` | | `:""` | Matches documents where the specified column contains the exact specified phrase. | `QUERY('title:"Benefits of Exercise"')` | | `:^ :^` | Matches documents where the specified keyword exists in the specified columns with the specified boosts to increase their relevance in the search. This syntax allows setting different weights for multiple columns to influence the search relevance.
| `QUERY('title:art^5 body:reading^1.2')` | | Option | Description | Example | Explanation | | --------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | --------------------------------------------------------------------------------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | | fuzziness | Allows matching terms within a specified Levenshtein distance. `fuzziness` can be set to 1 or 2. | SELECT id, score(), content FROM t WHERE query(‘content:box’, ‘fuzziness=1’); | When matching the query term “box”, `fuzziness=1` allows matching terms like “fox”, since “box” and “fox” have a Levenshtein distance of 1. | | operator | Specifies how multiple query terms are combined. Can be set to OR (default) or AND. OR returns results containing any of the query terms, while AND returns results containing all query terms. | SELECT id, score(), content FROM t WHERE query(‘content:action works’, ‘fuzziness=1;operator=AND’); | With `operator=AND`, the query requires both “action” and “works” to be present in the results. Due to `fuzziness=1`, it matches terms like “Actions” and “words”, so “Actions speak louder than words” is returned. | | lenient | Controls whether errors are reported when the query text is invalid. Defaults to `false`. If set to `true`, no error is reported, and an empty result set is returned if the query text is invalid. | SELECT id, score(), content FROM t WHERE query(‘content:()’, ‘lenient=true’); | If the query text `()` is invalid, setting `lenient=true` prevents an error from being thrown and returns an empty result set instead. 
| ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql CREATE TABLE test(title STRING, body STRING); CREATE INVERTED INDEX idx ON test(title, body); INSERT INTO test VALUES ('The Importance of Reading', 'Reading is a crucial skill that opens up a world of knowledge and imagination.'), ('The Benefits of Exercise', 'Exercise is essential for maintaining a healthy lifestyle.'), ('The Power of Perseverance', 'Perseverance is the key to overcoming obstacles and achieving success.'), ('The Art of Communication', 'Effective communication is crucial in everyday life.'), ('The Impact of Technology on Society', 'Technology has revolutionized our society in countless ways.'); -- Retrieve documents where the 'title' column contains the keyword 'power' SELECT * FROM test WHERE QUERY('title:power'); ┌────────────────────────────────────────────────────────────────────────────────────────────────────┐ │ title │ body │ ├───────────────────────────┼────────────────────────────────────────────────────────────────────────┤ │ The Power of Perseverance │ Perseverance is the key to overcoming obstacles and achieving success. │ └────────────────────────────────────────────────────────────────────────────────────────────────────┘ -- Retrieve documents where the 'title' column contains values that start with 'The' followed by any characters SELECT * FROM test WHERE QUERY('title:The*'); ┌──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐ │ title │ body │ ├─────────────────────────────────────┼────────────────────────────────────────────────────────────────────────────────┤ │ The Importance of Reading │ Reading is a crucial skill that opens up a world of knowledge and imagination. │ │ The Benefits of Exercise │ Exercise is essential for maintaining a healthy lifestyle. │ │ The Power of Perseverance │ Perseverance is the key to overcoming obstacles and achieving success. 
│ │ The Art of Communication │ Effective communication is crucial in everyday life. │ │ The Impact of Technology on Society │ Technology has revolutionized our society in countless ways. │ └──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘ -- Retrieve documents where the 'title' column contains either the keyword 'power' or 'art' SELECT * FROM test WHERE QUERY('title:power OR art'); ┌────────────────────────────────────────────────────────────────────────────────────────────────────┐ │ title │ body │ ├───────────────────────────┼────────────────────────────────────────────────────────────────────────┤ │ The Power of Perseverance │ Perseverance is the key to overcoming obstacles and achieving success. │ │ The Art of Communication │ Effective communication is crucial in everyday life. │ └────────────────────────────────────────────────────────────────────────────────────────────────────┘ SELECT * FROM test WHERE QUERY('title:IN [power, art]') ┌────────────────────────────────────────────────────────────────────────────────────────────────────┐ │ title │ body │ │ Nullable(String) │ Nullable(String) │ ├───────────────────────────┼────────────────────────────────────────────────────────────────────────┤ │ The Power of Perseverance │ Perseverance is the key to overcoming obstacles and achieving success. │ │ The Art of Communication │ Effective communication is crucial in everyday life. 
│ └────────────────────────────────────────────────────────────────────────────────────────────────────┘ -- Retrieve documents where the 'title' column contains the positive keyword 'the' but not 'reading' SELECT * FROM test WHERE QUERY('title:+the -reading'); ┌──────────────────────────────────────────────────────────────────────────────────────────────────────────────┐ │ title │ body │ ├─────────────────────────────────────┼────────────────────────────────────────────────────────────────────────┤ │ The Benefits of Exercise │ Exercise is essential for maintaining a healthy lifestyle. │ │ The Power of Perseverance │ Perseverance is the key to overcoming obstacles and achieving success. │ │ The Art of Communication │ Effective communication is crucial in everyday life. │ │ The Impact of Technology on Society │ Technology has revolutionized our society in countless ways. │ └──────────────────────────────────────────────────────────────────────────────────────────────────────────────┘ -- Retrieve documents where the 'title' column contains the exact phrase 'Benefits of Exercise' SELECT * FROM test WHERE QUERY('title:"Benefits of Exercise"'); ┌───────────────────────────────────────────────────────────────────────────────────────┐ │ title │ body │ ├──────────────────────────┼────────────────────────────────────────────────────────────┤ │ The Benefits of Exercise │ Exercise is essential for maintaining a healthy lifestyle. 
│ └───────────────────────────────────────────────────────────────────────────────────────┘ -- Retrieve documents where the 'title' column contains the keyword 'art' with a boost of 5 and the 'body' column contains the keyword 'reading' with a boost of 1.2 SELECT *, score() FROM test WHERE QUERY('title:art^5 body:reading^1.2'); ┌────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐ │ title │ body │ score() │ ├───────────────────────────┼────────────────────────────────────────────────────────────────────────────────┼───────────┤ │ The Importance of Reading │ Reading is a crucial skill that opens up a world of knowledge and imagination. │ 1.3860708 │ │ The Art of Communication │ Effective communication is crucial in everyday life. │ 7.1992116 │ └────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘ -- Retrieve documents where the 'body' column contains both "knowledge" and "imagination" (allowing for minor typos). SELECT * FROM test WHERE QUERY('body:knowledg OR imaginatio', 'fuzziness = 1; operator = AND'); -[ RECORD 1 ]----------------------------------- title: The Importance of Reading body: Reading is a crucial skill that opens up a world of knowledge and imagination. ``` # SCORE (Lakehouse v1) > SCORE — Returns the relevance of the query string. Returns the relevance of the query string. The higher the score, the more relevant the data. Please note that SCORE function can only be used with the [QUERY](../query) or [MATCH](../match) function. PlaidCloud Lakehouse’s SCORE function is inspired by Elasticsearch’s [SCORE](https://www.elastic.co/guide/en/elasticsearch/reference/current/sql-functions-search.html#sql-functions-search-score). 
## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql SCORE() ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql CREATE TABLE test(title STRING, body STRING); CREATE INVERTED INDEX idx ON test(title, body); INSERT INTO test VALUES ('The Importance of Reading', 'Reading is a crucial skill that opens up a world of knowledge and imagination.'), ('The Benefits of Exercise', 'Exercise is essential for maintaining a healthy lifestyle.'), ('The Power of Perseverance', 'Perseverance is the key to overcoming obstacles and achieving success.'), ('The Art of Communication', 'Effective communication is crucial in everyday life.'), ('The Impact of Technology on Society', 'Technology has revolutionized our society in countless ways.'); -- Retrieve documents where the 'title' column contains the keyword 'art' with a boost of 5 and the 'body' column contains the keyword 'reading' with a boost of 1.2, along with their relevance scores SELECT *, score() FROM test WHERE QUERY('title:art^5 body:reading^1.2'); ┌────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐ │ title │ body │ score() │ ├───────────────────────────┼────────────────────────────────────────────────────────────────────────────────┼───────────┤ │ The Importance of Reading │ Reading is a crucial skill that opens up a world of knowledge and imagination. │ 1.3860708 │ │ The Art of Communication │ Effective communication is crucial in everyday life. 
│ 7.1992116 │ └────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘ -- Retrieve documents where the 'title' column contains the keyword 'reading' with a boost of 5 and the 'body' column contains the keyword 'everyday' with a boost of 1.2, along with their relevance scores SELECT *, score() FROM test WHERE MATCH('title^5, body^1.2', 'reading everyday'); ┌────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐ │ title │ body │ score() │ ├───────────────────────────┼────────────────────────────────────────────────────────────────────────────────┼───────────┤ │ The Importance of Reading │ Reading is a crucial skill that opens up a world of knowledge and imagination. │ 8.585282 │ │ The Art of Communication │ Effective communication is crucial in everyday life. │ 1.8575745 │ └────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘ ``` # Semi-Structured Functions (Lakehouse v1) > Lakehouse v1 SQL semi-structured functions: extract, traverse, and transform JSON, variant, and nested data. This section provides reference information for the semi-structured data functions in PlaidCloud Lakehouse. 
## JSON Parsing, Conversion & Type Checking:

[Section titled “JSON Parsing, Conversion & Type Checking:”](#json-parsing-conversion--type-checking)

* [CHECK\_JSON](check-json)
* [JSON\_PRETTY](json-pretty)
* [JSON\_TYPEOF](json-typeof)
* [PARSE\_JSON](parse-json)
* [FLATTEN](flatten)
* [IS\_ARRAY](is-array)
* [IS\_BOOLEAN](is-boolean)
* [IS\_FLOAT](is-float)
* [IS\_INTEGER](is-integer)
* [IS\_NULL\_VALUE](is-null-value)
* [IS\_OBJECT](is-object)
* [IS\_STRING](is-string)

## JSON Query and Extraction:

[Section titled “JSON Query and Extraction:”](#json-query-and-extraction)

* [JSON\_ARRAY\_ELEMENTS](json-array-elements)
* [JSON\_EACH](json-each)
* [JSON\_EXTRACT\_PATH\_TEXT](json-extract-path-text)
* [JSON\_PATH\_EXISTS](json-path-exists)
* [JSON\_PATH\_MATCH](json-path-match)
* [JSON\_PATH\_QUERY](json-path-query)
* [JSON\_PATH\_QUERY\_ARRAY](json-path-query-array)
* [JSON\_PATH\_QUERY\_FIRST](json-path-query-first)

## JSON Data Manipulation:

[Section titled “JSON Data Manipulation:”](#json-data-manipulation)

* [JSON\_ARRAY](json-array)
* [JSON\_STRIP\_NULLS](json-strip-nulls)

## Object Operations:

[Section titled “Object Operations:”](#object-operations)

* [GET](get)
* [GET\_IGNORE\_CASE](get-ignore-case)
* [GET\_PATH](get-path)
* [OBJECT\_KEYS](object-keys)

## Type Conversion:

[Section titled “Type Conversion:”](#type-conversion)

* [AS\_TYPE](as-type)

# AS_ (Lakehouse v1)

> AS_ — strictly casts VARIANT values to other data types.

Strictly casts `VARIANT` values to other data types. If the input data type is not `VARIANT`, the output is `NULL`. If the type of the value held in the `VARIANT` does not match the requested output type, the output is `NULL`.
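The strict-cast rule above can be illustrated outside the engine. The following is a minimal Python sketch, not PlaidCloud's implementation: it assumes a `VARIANT` can be modeled as the result of `json.loads`, and the helper names are hypothetical stand-ins that only mirror the SQL function names. `AS_FLOAT`, `AS_ARRAY`, and `AS_OBJECT` are left out because the text does not say how the engine treats borderline inputs (for example, an integer passed to `AS_FLOAT`).

```python
import json


def as_boolean(variant):
    """Return the value only if it is a JSON boolean; otherwise None (SQL NULL)."""
    return variant if isinstance(variant, bool) else None


def as_integer(variant):
    """Return the value only if it is a JSON integer; otherwise None (SQL NULL)."""
    # bool is a subclass of int in Python, so it has to be excluded explicitly.
    if isinstance(variant, int) and not isinstance(variant, bool):
        return variant
    return None


def as_string(variant):
    """Return the value only if it is a JSON string; otherwise None (SQL NULL)."""
    return variant if isinstance(variant, str) else None


print(as_integer(json.loads('123')))    # the type matches, so the value comes back
print(as_integer(json.loads('"123"')))  # a string is not converted; the result is None
```

The point of the sketch is that a strict cast never converts: a value of the wrong type yields `NULL` rather than a best-effort conversion.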
## Analyze Syntax

[Section titled “Analyze Syntax”](#analyze-syntax)

```python
func.as_boolean(<variant>)
func.as_integer(<variant>)
func.as_float(<variant>)
func.as_string(<variant>)
func.as_array(<variant>)
func.as_object(<variant>)
```

## SQL Syntax

[Section titled “SQL Syntax”](#sql-syntax)

```sql
AS_BOOLEAN( <variant> )
AS_INTEGER( <variant> )
AS_FLOAT( <variant> )
AS_STRING( <variant> )
AS_ARRAY( <variant> )
AS_OBJECT( <variant> )
```

## Arguments

[Section titled “Arguments”](#arguments)

| Arguments   | Description       |
| ----------- | ----------------- |
| `<variant>` | The VARIANT value |

## Return Type

[Section titled “Return Type”](#return-type)

* AS\_BOOLEAN: BOOLEAN
* AS\_INTEGER: BIGINT
* AS\_FLOAT: DOUBLE
* AS\_STRING: VARCHAR
* AS\_ARRAY: VARIANT containing an ARRAY
* AS\_OBJECT: VARIANT containing an OBJECT

## SQL Examples

[Section titled “SQL Examples”](#sql-examples)

```sql
SELECT as_boolean(parse_json('true'));
┌────────────────────────────────┐
│ as_boolean(parse_json('true')) │
├────────────────────────────────┤
│ 1                              │
└────────────────────────────────┘

SELECT as_integer(parse_json('123'));
┌───────────────────────────────┐
│ as_integer(parse_json('123')) │
├───────────────────────────────┤
│ 123                           │
└───────────────────────────────┘

SELECT as_float(parse_json('12.34'));
┌───────────────────────────────┐
│ as_float(parse_json('12.34')) │
├───────────────────────────────┤
│ 12.34                         │
└───────────────────────────────┘

SELECT as_string(parse_json('"abc"'));
┌────────────────────────────────┐
│ as_string(parse_json('"abc"')) │
├────────────────────────────────┤
│ abc                            │
└────────────────────────────────┘

SELECT as_array(parse_json('[1,2,3]'));
┌─────────────────────────────────┐
│ as_array(parse_json('[1,2,3]')) │
├─────────────────────────────────┤
│ [1,2,3]                         │
└─────────────────────────────────┘

SELECT as_object(parse_json('{"k":"v","a":"b"}'));
┌────────────────────────────────────────────┐
│ as_object(parse_json('{"k":"v","a":"b"}')) │
├────────────────────────────────────────────┤
│ {"k":"v","a":"b"}                          │
└────────────────────────────────────────────┘
```

# CHECK_JSON (Lakehouse v1)
> CHECK_JSON — Checks the validity of a JSON document.

Checks the validity of a JSON document. If the input string is a valid JSON document or `NULL`, the output is `NULL`. If the input cannot be parsed as a valid JSON value, the output is a string containing the error message.

## Analyze Syntax

[Section titled “Analyze Syntax”](#analyze-syntax)

```python
func.check_json(<expr>)
```

## Analyze Examples

[Section titled “Analyze Examples”](#analyze-examples)

```python
func.check_json('[1,2,3]');
┌────────────────────────────┐
│ func.check_json('[1,2,3]') │
├────────────────────────────┤
│ NULL                       │
└────────────────────────────┘
```

## SQL Syntax

[Section titled “SQL Syntax”](#sql-syntax)

```sql
CHECK_JSON( <expr> )
```

## Arguments

[Section titled “Arguments”](#arguments)

| Arguments | Description                  |
| --------- | ---------------------------- |
| `<expr>`  | An expression of string type |

## Return Type

[Section titled “Return Type”](#return-type)

String

## SQL Examples

[Section titled “SQL Examples”](#sql-examples)

```sql
SELECT check_json('[1,2,3]');
┌───────────────────────┐
│ check_json('[1,2,3]') │
├───────────────────────┤
│ NULL                  │
└───────────────────────┘

SELECT check_json('{"key":"val"}');
┌─────────────────────────────┐
│ check_json('{"key":"val"}') │
├─────────────────────────────┤
│ NULL                        │
└─────────────────────────────┘

SELECT check_json('{"key":');
┌──────────────────────────────────────────────┐
│ check_json('{"key":')                        │
├──────────────────────────────────────────────┤
│ EOF while parsing a value at line 1 column 7 │
└──────────────────────────────────────────────┘
```

# FLATTEN (Lakehouse v1)

> FLATTEN — transforms nested JSON data into a tabular format, where each element or field is represented as a separate row.

Transforms nested JSON data into a tabular format, where each element or field is represented as a separate row.
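To make the row-per-element idea concrete, here is a minimal Python sketch of the expansion. It is an illustration only and not how the engine is implemented: it assumes the input has already been parsed with `json.loads`, the function name `flatten` and its `_prefix` parameter are hypothetical, and it covers only the KEY, PATH, INDEX, VALUE, and THIS columns plus the RECURSIVE option (SEQ, OUTER, and MODE are omitted). Object keys are visited in sorted order because that is the order the documented output on this page shows.

```python
import json


def flatten(value, recursive=False, _prefix=""):
    """Yield one row (a dict) per element of a parsed JSON object or array."""
    if isinstance(value, dict):
        # Object fields carry a key and no index.
        items = [(key, None, f"{_prefix}.{key}" if _prefix else key, child)
                 for key, child in sorted(value.items())]
    elif isinstance(value, list):
        # Array elements carry an index and no key.
        items = [(None, i, f"{_prefix}[{i}]", child)
                 for i, child in enumerate(value)]
    else:
        return  # scalars have nothing to expand

    for key, index, path, child in items:
        yield {"key": key, "path": path, "index": index,
               "value": child, "this": value}
        if recursive:
            # Descend into nested objects and arrays, extending the path.
            yield from flatten(child, recursive=True, _prefix=path)


doc = json.loads(
    '{"name": "John", "languages": ["English", "Spanish", "French"], '
    '"address": {"city": "New York", "state": "NY"}}'
)
for row in flatten(doc, recursive=True):
    print(row["path"], row["key"], row["index"], row["value"])
```

Calling `flatten(doc["languages"], _prefix="languages")` plays roughly the role of `PATH => 'languages'`: only the elements under that path are expanded, and their paths keep the `languages[...]` prefix.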
## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql [LATERAL] FLATTEN ( INPUT => [ , PATH => ] [ , OUTER => TRUE | FALSE ] [ , RECURSIVE => TRUE | FALSE ] [ , MODE => 'OBJECT' | 'ARRAY' | 'BOTH' ] ) ``` | Parameter / Keyword | Description | Default | | ------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ------- | | INPUT | Specifies the JSON or array data to flatten. | - | | PATH | Specifies the path to the array or object within the input data to flatten. | - | | OUTER | If set to TRUE, rows with zero results will still be included in the output, but the values in the KEY, INDEX, and VALUE columns of those rows will be set to NULL. | FALSE | | RECURSIVE | If set to TRUE, the function will continue to flatten nested elements. | FALSE | | MODE | Controls whether to flatten only objects (‘OBJECT’), only arrays (‘ARRAY’), or both (‘BOTH’). | ’BOTH’ | | LATERAL | LATERAL is an optional keyword used to reference columns defined to the left of the LATERAL keyword within the FROM clause. LATERAL enables cross-referencing between the preceding table expressions and the function. | - | ## Output [Section titled “Output”](#output) The following table describes the output columns of the FLATTEN function: Note When using the LATERAL keyword with FLATTEN, these output columns may not be explicitly provided, as LATERAL introduces dynamic cross-referencing, altering the output structure. | Column | Description | | ------ | --------------------------------------------------------------------------------------------- | | SEQ | A unique sequence number associated with the input. | | KEY | Key to the expanded value. If the flattened element does not contain a key, it’s set to NULL. | | PATH | Path to the flattened element. 
| | INDEX | If the element is an array, this column contains its index; otherwise, it’s set to NULL. | | VALUE | Value of the flattened element. | | THIS | This column identifies the element currently being flattened. | ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ### SQL Examples 1: Demonstrating PATH, OUTER, RECURSIVE, and MODE Parameters [Section titled “SQL Examples 1: Demonstrating PATH, OUTER, RECURSIVE, and MODE Parameters”](#sql-examples-1-demonstrating-path-outer-recursive-and-mode-parameters) This example demonstrates the behavior of the FLATTEN function with respect to the PATH, OUTER, RECURSIVE, and MODE parameters. ```sql SELECT * FROM FLATTEN ( INPUT => PARSE_JSON ( '{"name": "John", "languages": ["English", "Spanish", "French"], "address": {"city": "New York", "state": "NY"}}' ) ); ┌────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐ │ seq │ key │ path │ index │ value │ this │ ├────────┼──────────────────┼──────────────────┼──────────────────┼──────────────────────────────────┼───────────────────────────────────────────────────────────────────────────────────────────────────────┤ │ 1 │ address │ address │ NULL │ {"city":"New York","state":"NY"} │ {"address":{"city":"New York","state":"NY"},"languages":["English","Spanish","French"],"name":"John"} │ │ 1 │ languages │ languages │ NULL │ ["English","Spanish","French"] │ {"address":{"city":"New York","state":"NY"},"languages":["English","Spanish","French"],"name":"John"} │ │ 1 │ name │ name │ NULL │ "John" │ {"address":{"city":"New York","state":"NY"},"languages":["English","Spanish","French"],"name":"John"} │ └────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘ -- PATH helps in 
selecting elements at a specific path from the original JSON data. SELECT * FROM FLATTEN ( INPUT => PARSE_JSON ( '{"name": "John", "languages": ["English", "Spanish", "French"], "address": {"city": "New York", "state": "NY"}}' ), PATH => 'languages' ); ┌──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐ │ seq │ key │ path │ index │ value │ this │ ├────────┼──────────────────┼──────────────────┼──────────────────┼───────────────────┼────────────────────────────────┤ │ 1 │ NULL │ languages[0] │ 0 │ "English" │ ["English","Spanish","French"] │ │ 1 │ NULL │ languages[1] │ 1 │ "Spanish" │ ["English","Spanish","French"] │ │ 1 │ NULL │ languages[2] │ 2 │ "French" │ ["English","Spanish","French"] │ └──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘ -- RECURSIVE enables recursive flattening of nested structures. SELECT * FROM FLATTEN ( INPUT => PARSE_JSON ( '{"name": "John", "languages": ["English", "Spanish", "French"], "address": {"city": "New York", "state": "NY"}}' ), RECURSIVE => TRUE ); ┌────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐ │ seq │ key │ path │ index │ value │ this │ ├────────┼──────────────────┼──────────────────┼──────────────────┼──────────────────────────────────┼───────────────────────────────────────────────────────────────────────────────────────────────────────┤ │ 1 │ address │ address │ NULL │ {"city":"New York","state":"NY"} │ {"address":{"city":"New York","state":"NY"},"languages":["English","Spanish","French"],"name":"John"} │ │ 1 │ city │ address.city │ NULL │ "New York" │ {"city":"New York","state":"NY"} │ │ 1 │ state │ address.state │ NULL │ "NY" │ {"city":"New York","state":"NY"} │ │ 1 │ languages │ languages │ NULL │ ["English","Spanish","French"] │ 
{"address":{"city":"New York","state":"NY"},"languages":["English","Spanish","French"],"name":"John"} │ │ 1 │ NULL │ languages[0] │ 0 │ "English" │ ["English","Spanish","French"] │ │ 1 │ NULL │ languages[1] │ 1 │ "Spanish" │ ["English","Spanish","French"] │ │ 1 │ NULL │ languages[2] │ 2 │ "French" │ ["English","Spanish","French"] │ │ 1 │ name │ name │ NULL │ "John" │ {"address":{"city":"New York","state":"NY"},"languages":["English","Spanish","French"],"name":"John"} │ └────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘ -- MODE specifies whether only objects ('OBJECT'), only arrays ('ARRAY'), or both ('BOTH') should be flattened. -- In this example, MODE => 'ARRAY' is used, which means that only arrays within the JSON data will be flattened. SELECT * FROM FLATTEN ( INPUT => PARSE_JSON ( '{"name": "John", "languages": ["English", "Spanish", "French"], "address": {"city": "New York", "state": "NY"}}' ), MODE => 'ARRAY' ); --- -- OUTER determines the inclusion of zero-row expansions in the output. -- In this first example, OUTER => TRUE is used with an empty JSON array, which results in zero-row expansions. -- Rows are included in the output even when there are no values to flatten. SELECT * FROM FLATTEN (INPUT => PARSE_JSON ('[]'), OUTER => TRUE); ┌─────────────────────────────────────────────────────────────────────────────────────────────────────────┐ │ seq │ key │ path │ index │ value │ this │ ├────────┼──────────────────┼──────────────────┼──────────────────┼───────────────────┼───────────────────┤ │ 1 │ NULL │ NULL │ NULL │ NULL │ NULL │ └─────────────────────────────────────────────────────────────────────────────────────────────────────────┘ -- In this second example, OUTER is omitted, and the output shows how rows with zero results are not included when OUTER is not specified. 
SELECT * FROM FLATTEN (INPUT => PARSE_JSON ('[]')); ``` ### SQL Examples 2: Demonstrating LATERAL FLATTEN [Section titled “SQL Examples 2: Demonstrating LATERAL FLATTEN”](#sql-examples-2-demonstrating-lateral-flatten) This example demonstrates the behavior of the FLATTEN function when used in conjunction with the LATERAL keyword. ```sql -- Create a table for Tim Hortons transactions with multiple items CREATE TABLE tim_hortons_transactions ( transaction_id INT, customer_id INT, items VARIANT ); -- Insert data for Tim Hortons transactions with multiple items INSERT INTO tim_hortons_transactions (transaction_id, customer_id, items) VALUES (101, 1, parse_json('[{"item":"coffee", "price":2.50}, {"item":"donut", "price":1.20}]')), (102, 2, parse_json('[{"item":"bagel", "price":1.80}, {"item":"muffin", "price":2.00}]')), (103, 3, parse_json('[{"item":"timbit_assortment", "price":5.00}]')); -- Show Tim Hortons transactions with multiple items using LATERAL FLATTEN SELECT t.transaction_id, t.customer_id, f.value:item::STRING AS purchased_item, f.value:price::FLOAT AS price FROM tim_hortons_transactions t, LATERAL FLATTEN(input => t.items) f; ┌───────────────────────────────────────────────────────────────────────────┐ │ transaction_id │ customer_id │ purchased_item │ price │ ├─────────────────┼─────────────────┼───────────────────┼───────────────────┤ │ 101 │ 1 │ coffee │ 2.5 │ │ 101 │ 1 │ donut │ 1.2 │ │ 102 │ 2 │ bagel │ 1.8 │ │ 102 │ 2 │ muffin │ 2 │ │ 103 │ 3 │ timbit_assortment │ 5 │ └───────────────────────────────────────────────────────────────────────────┘ -- Find maximum, minimum, and average prices of the purchased items SELECT MAX(f.value:price::FLOAT) AS max_price, MIN(f.value:price::FLOAT) AS min_price, AVG(f.value:price::FLOAT) AS avg_price FROM tim_hortons_transactions t, LATERAL FLATTEN(input => t.items) f; ┌───────────────────────────────────────────────────────────┐ │ max_price │ min_price │ avg_price │ 
├───────────────────┼───────────────────┼───────────────────┤
│ 5                 │ 1.2               │ 2.5               │
└───────────────────────────────────────────────────────────┘
```

# GET (Semi-Structured, Lakehouse v1)

> GET — extracts a value from a VARIANT that contains an ARRAY by index, or from a VARIANT that contains an OBJECT by field_name.

Extracts a value from a `VARIANT` that contains an `ARRAY` by `index`, or from a `VARIANT` that contains an `OBJECT` by `field_name`. The value is returned as a `VARIANT`, or `NULL` if either argument is `NULL`.

`GET` applies case-sensitive matching to `field_name`. For case-insensitive matching, use `GET_IGNORE_CASE`.

## Analyze Syntax

[Section titled “Analyze Syntax”](#analyze-syntax)

```python
func.get(<variant>, <index>)
# or
func.get(<variant>, <field_name>)
```

## Analyze Examples

[Section titled “Analyze Examples”](#analyze-examples)

```python
func.get(func.parse_json('[2.71, 3.14]'), 0);
┌──────────────────────────────────────────────┐
│ func.get(func.parse_json('[2.71, 3.14]'), 0) │
├──────────────────────────────────────────────┤
│ 2.71                                         │
└──────────────────────────────────────────────┘

func.get(func.parse_json('{"aa":1, "aA":2, "Aa":3}'), 'aa');
┌─────────────────────────────────────────────────────────────┐
│ func.get(func.parse_json('{"aa":1, "aA":2, "Aa":3}'), 'aa') │
├─────────────────────────────────────────────────────────────┤
│ 1                                                           │
└─────────────────────────────────────────────────────────────┘
```

## SQL Syntax

[Section titled “SQL Syntax”](#sql-syntax)

```sql
GET( <variant>, <index> )
GET( <variant>, <field_name> )
```

## Arguments

[Section titled “Arguments”](#arguments)

| Arguments      | Description                                                      |
| -------------- | ---------------------------------------------------------------- |
| `<variant>`    | The VARIANT value that contains either an ARRAY or an OBJECT     |
| `<index>`      | The Uint32 value specifies the position of the value in ARRAY    |
| `<field_name>` | The String value specifies the key in a key-value pair of OBJECT |

## Return Type

[Section titled “Return Type”](#return-type)

VARIANT

## SQL Examples

[Section titled “SQL Examples”](#sql-examples)

```sql
SELECT get(parse_json('[2.71, 3.14]'), 0);
┌────────────────────────────────────┐
│ get(parse_json('[2.71, 3.14]'), 0) │
├────────────────────────────────────┤
│ 2.71                               │
└────────────────────────────────────┘

SELECT get(parse_json('{"aa":1, "aA":2, "Aa":3}'), 'aa');
┌───────────────────────────────────────────────────┐
│ get(parse_json('{"aa":1, "aA":2, "Aa":3}'), 'aa') │
├───────────────────────────────────────────────────┤
│ 1                                                 │
└───────────────────────────────────────────────────┘

SELECT get(parse_json('{"aa":1, "aA":2, "Aa":3}'), 'AA');
┌───────────────────────────────────────────────────┐
│ get(parse_json('{"aa":1, "aA":2, "Aa":3}'), 'AA') │
├───────────────────────────────────────────────────┤
│ NULL                                              │
└───────────────────────────────────────────────────┘
```

# GET_IGNORE_CASE (Lakehouse v1)

> GET_IGNORE_CASE — extracts a value from a VARIANT that contains an OBJECT by field_name.

Extracts a value from a `VARIANT` that contains an `OBJECT` by `field_name`. The value is returned as a `VARIANT`, or `NULL` if either argument is `NULL`.

`GET_IGNORE_CASE` is similar to `GET` but applies case-insensitive matching to field names. It first looks for a field whose name matches exactly; if none is found, it returns the first case-insensitive match, taking field names in alphabetical order.
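The lookup order described above (exact match first, then a case-insensitive match in alphabetical order) can be sketched in Python. This is an illustration rather than the engine's implementation: it assumes an `OBJECT` can be modeled as a Python `dict`, and it assumes "alphabetically" means plain code-point ordering of the keys, which is consistent with the documented example where `'AA'` resolves to the field `Aa`. The helper names are hypothetical and only mirror the SQL functions.

```python
import json


def get(variant, field_name):
    """Case-sensitive lookup, mirroring GET on an OBJECT."""
    if not isinstance(variant, dict) or field_name is None:
        return None
    return variant.get(field_name)


def get_ignore_case(variant, field_name):
    """Exact match first, then the first case-insensitive match in sorted key order."""
    if not isinstance(variant, dict) or field_name is None:
        return None
    if field_name in variant:
        return variant[field_name]
    wanted = field_name.lower()
    for key in sorted(variant):  # code-point order: 'Aa' < 'aA' < 'aa'
        if key.lower() == wanted:
            return variant[key]
    return None


doc = json.loads('{"aa":1, "aA":2, "Aa":3}')
print(get(doc, 'AA'))              # no field is named exactly 'AA'
print(get_ignore_case(doc, 'AA'))  # falls back to the field 'Aa'
```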
## Analyze Syntax

[Section titled “Analyze Syntax”](#analyze-syntax)

```python
func.get_ignore_case(<variant>, <field_name>)
```

## Analyze Examples

[Section titled “Analyze Examples”](#analyze-examples)

```python
func.get_ignore_case(func.parse_json('{"aa":1, "aA":2, "Aa":3}'), 'AA')
┌─────────────────────────────────────────────────────────────────────────┐
│ func.get_ignore_case(func.parse_json('{"aa":1, "aA":2, "Aa":3}'), 'AA') │
├─────────────────────────────────────────────────────────────────────────┤
│ 3                                                                       │
└─────────────────────────────────────────────────────────────────────────┘
```

## SQL Syntax

[Section titled “SQL Syntax”](#sql-syntax)

```sql
GET_IGNORE_CASE( <variant>, <field_name> )
```

## Arguments

[Section titled “Arguments”](#arguments)

| Arguments      | Description                                                      |
| -------------- | ---------------------------------------------------------------- |
| `<variant>`    | The VARIANT value that contains either an ARRAY or an OBJECT     |
| `<field_name>` | The String value specifies the key in a key-value pair of OBJECT |

## Return Type

[Section titled “Return Type”](#return-type)

VARIANT

## SQL Examples

[Section titled “SQL Examples”](#sql-examples)

```sql
SELECT get_ignore_case(parse_json('{"aa":1, "aA":2, "Aa":3}'), 'AA');
┌───────────────────────────────────────────────────────────────┐
│ get_ignore_case(parse_json('{"aa":1, "aA":2, "Aa":3}'), 'AA') │
├───────────────────────────────────────────────────────────────┤
│ 3                                                             │
└───────────────────────────────────────────────────────────────┘
```

# GET_PATH (Lakehouse v1)

> GET_PATH — Extracts value from a VARIANT by path_name.

Extracts a value from a `VARIANT` by `path_name`. The value is returned as a `VARIANT`, or `NULL` if either argument is `NULL`.

`GET_PATH` is equivalent to a chain of `GET` calls. `path_name` is a concatenation of field names, each preceded by a period (`.`) or a colon (`:`), and index operators (`[index]`). The first field name does not need a leading period or colon.
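The "chain of `GET` calls" reading of `path_name` can be sketched as a small path walker. This is a hedged Python illustration, not the engine's parser: it assumes the `VARIANT` is the result of `json.loads`, the function name and the regular expression are hypothetical, and it handles only plain field names separated by `.` or `:` plus `[index]` operators (quoted field names are not handled).

```python
import json
import re

# One token is either an index operator such as [0] or a bare field name.
_TOKEN = re.compile(r"\[(\d+)\]|([^.:\[\]]+)")


def get_path(variant, path_name):
    """Walk path_name one step at a time; return None (SQL NULL) if a step is missing."""
    if variant is None or path_name is None:
        return None
    current = variant
    for index, field in _TOKEN.findall(path_name):
        if index:
            position = int(index)
            if not isinstance(current, list) or position >= len(current):
                return None
            current = current[position]
        else:
            if not isinstance(current, dict) or field not in current:
                return None
            current = current[field]
    return current


doc = json.loads('{"k1":[0,1,2], "k2":{"k3":3,"k4":4}}')
for path in ("k1[0]", "k2:k3", "k2.k4", "k2.k5"):
    print(path, get_path(doc, path))
```

Each token corresponds to one `GET` call, so `k2.k4` behaves like `GET(GET(doc, 'k2'), 'k4')`.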
## Analyze Syntax

[Section titled “Analyze Syntax”](#analyze-syntax)

```python
func.get_path(<variant>, <path_name>)
```

## Analyze Examples

[Section titled “Analyze Examples”](#analyze-examples)

```python
func.get_path(func.parse_json('{"k1":[0,1,2], "k2":{"k3":3,"k4":4}}'), 'k2.k4')
┌─────────────────────────────────────────────────────────────────────────────────┐
│ func.get_path(func.parse_json('{"k1":[0,1,2], "k2":{"k3":3,"k4":4}}'), 'k2.k4') │
├─────────────────────────────────────────────────────────────────────────────────┤
│ 4                                                                               │
└─────────────────────────────────────────────────────────────────────────────────┘
```

## SQL Syntax

[Section titled “SQL Syntax”](#sql-syntax)

```sql
GET_PATH( <variant>, <path_name> )
```

## Arguments

[Section titled “Arguments”](#arguments)

| Arguments     | Description                                                      |
| ------------- | ---------------------------------------------------------------- |
| `<variant>`   | The VARIANT value that contains either an ARRAY or an OBJECT     |
| `<path_name>` | The String value that consists of a concatenation of field names |

## Return Type

[Section titled “Return Type”](#return-type)

VARIANT

## SQL Examples

[Section titled “SQL Examples”](#sql-examples)

```sql
SELECT get_path(parse_json('{"k1":[0,1,2], "k2":{"k3":3,"k4":4}}'), 'k1[0]');
┌───────────────────────────────────────────────────────────────────────┐
│ get_path(parse_json('{"k1":[0,1,2], "k2":{"k3":3,"k4":4}}'), 'k1[0]') │
├───────────────────────────────────────────────────────────────────────┤
│ 0                                                                     │
└───────────────────────────────────────────────────────────────────────┘

SELECT get_path(parse_json('{"k1":[0,1,2], "k2":{"k3":3,"k4":4}}'), 'k2:k3');
┌───────────────────────────────────────────────────────────────────────┐
│ get_path(parse_json('{"k1":[0,1,2], "k2":{"k3":3,"k4":4}}'), 'k2:k3') │
├───────────────────────────────────────────────────────────────────────┤
│ 3                                                                     │
└───────────────────────────────────────────────────────────────────────┘

SELECT get_path(parse_json('{"k1":[0,1,2], "k2":{"k3":3,"k4":4}}'), 'k2.k4');
┌───────────────────────────────────────────────────────────────────────┐
│ get_path(parse_json('{"k1":[0,1,2], "k2":{"k3":3,"k4":4}}'), 'k2.k4') │
├───────────────────────────────────────────────────────────────────────┤
│ 4                                                                     │
└───────────────────────────────────────────────────────────────────────┘

SELECT get_path(parse_json('{"k1":[0,1,2], "k2":{"k3":3,"k4":4}}'), 'k2.k5');
┌───────────────────────────────────────────────────────────────────────┐
│ get_path(parse_json('{"k1":[0,1,2], "k2":{"k3":3,"k4":4}}'), 'k2.k5') │
├───────────────────────────────────────────────────────────────────────┤
│ NULL                                                                  │
└───────────────────────────────────────────────────────────────────────┘
```

# IS_ARRAY (Lakehouse v1)

> IS_ARRAY — Checks if the input value is a JSON array.

Checks if the input value is a JSON array. Note that a JSON array is not the same as the ARRAY data type. A JSON array is an ordered collection of values enclosed in square brackets `[ ]`. It can hold values of different types, including strings, numbers, booleans, objects, and nulls.
JSON Array Example: ```json [ "Apple", 42, true, {"name": "John", "age": 30, "isStudent": false}, [1, 2, 3], null ] ``` ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.is_array() ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.is_array(func.parse_json('true')), func.is_array(func.parse_json('[1,2,3]')) ┌────────────────────────────────────────────────────────────────────────────────────┐ │ func.is_array(func.parse_json('true')) │ func.is_array(func.parse_json('[1,2,3]')) │ ├────────────────────────────────────────┼───────────────────────────────────────────┤ │ false │ true │ └────────────────────────────────────────────────────────────────────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql IS_ARRAY( ) ``` ## Return Type [Section titled “Return Type”](#return-type) Returns `true` if the input value is a JSON array, and `false` otherwise. ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT IS_ARRAY(PARSE_JSON('true')), IS_ARRAY(PARSE_JSON('[1,2,3]')); ┌────────────────────────────────────────────────────────────────┐ │ is_array(parse_json('true')) │ is_array(parse_json('[1,2,3]')) │ ├──────────────────────────────┼─────────────────────────────────┤ │ false │ true │ └────────────────────────────────────────────────────────────────┘ ``` # IS_BOOLEAN (Lakehouse v1) > IS_BOOLEAN — Checks if the input JSON value is a boolean. Checks if the input JSON value is a boolean. 
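As a rough mental model of these type checks, the following Python sketch classifies values that have already been parsed from JSON. It is only an analogy under stated assumptions: the parsed value comes from `json.loads`, and the helper names are hypothetical mirrors of `IS_ARRAY` and `IS_BOOLEAN`, not the engine's code.

```python
import json


def is_array(value):
    """True only for a parsed JSON array."""
    return isinstance(value, list)


def is_boolean(value):
    """True only for a parsed JSON true/false."""
    return isinstance(value, bool)


for text in ('true', '[1,2,3]'):
    parsed = json.loads(text)
    print(text, is_array(parsed), is_boolean(parsed))
```

Each check looks at the type of the parsed value, not at the text it was parsed from, which is why the examples on this page wrap their input in `PARSE_JSON`.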
## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.is_boolean() ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.is_boolean(func.parse_json('true')), func.is_boolean(func.parse_json('[1,2,3]')) ┌────────────────────────────────────────────────────────────────────────────────────────┐ │ func.is_boolean(func.parse_json('true')) │ func.is_boolean(func.parse_json('[1,2,3]')) │ ├──────────────────────────────────────────┼─────────────────────────────────────────────┤ │ true │ false │ └────────────────────────────────────────────────────────────────────────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql IS_BOOLEAN( ) ``` ## Return Type [Section titled “Return Type”](#return-type) Returns `true` if the input JSON value is a boolean, and `false` otherwise. ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT IS_BOOLEAN(PARSE_JSON('true')), IS_BOOLEAN(PARSE_JSON('[1,2,3]')); ┌────────────────────────────────────────────────────────────────────┐ │ is_boolean(parse_json('true')) │ is_boolean(parse_json('[1,2,3]')) │ ├────────────────────────────────┼───────────────────────────────────┤ │ true │ false │ └────────────────────────────────────────────────────────────────────┘ ``` # IS_FLOAT (Lakehouse v1) > IS_FLOAT — Checks if the input JSON value is a float. Checks if the input JSON value is a float. 
## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.is_float() ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.is_float(func.parse_json('1.23')), func.is_float(func.parse_json('[1,2,3]')) ┌────────────────────────────────────────────────────────────────────────────────────────┐ │ func.is_float(func.parse_json('1.23')) │ func.is_float(func.parse_json('[1,2,3]')) │ ├──────────────────────────────────────────┼─────────────────────────────────────────────┤ │ true │ false │ └────────────────────────────────────────────────────────────────────────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql IS_FLOAT( ) ``` ## Return Type [Section titled “Return Type”](#return-type) Returns `true` if the input JSON value is a float, and `false` otherwise. ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT IS_FLOAT(PARSE_JSON('1.23')), IS_FLOAT(PARSE_JSON('[1,2,3]')); ┌────────────────────────────────────────────────────────────────┐ │ is_float(parse_json('1.23')) │ is_float(parse_json('[1,2,3]')) │ ├──────────────────────────────┼─────────────────────────────────┤ │ true │ false │ └────────────────────────────────────────────────────────────────┘ ``` # IS_INTEGER (Lakehouse v1) > IS_INTEGER — Checks if the input JSON value is an integer. Checks if the input JSON value is an integer. 
## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.is_integer() ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.is_integer(func.parse_json('123')), func.is_integer(func.parse_json('[1,2,3]')) ┌────────────────────────────────────────────────────────────────────────────────────────┐ │ func.is_integer(func.parse_json('123')) │ func.is_integer(func.parse_json('[1,2,3]')) │ ├──────────────────────────────────────────┼─────────────────────────────────────────────┤ │ true │ false │ └────────────────────────────────────────────────────────────────────────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql IS_INTEGER( ) ``` ## Return Type [Section titled “Return Type”](#return-type) Returns `true` if the input JSON value is an integer, and `false` otherwise. ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT IS_INTEGER(PARSE_JSON('123')), IS_INTEGER(PARSE_JSON('[1,2,3]')); ┌───────────────────────────────────────────────────────────────────┐ │ is_integer(parse_json('123')) │ is_integer(parse_json('[1,2,3]')) │ ├───────────────────────────────┼───────────────────────────────────┤ │ true │ false │ └───────────────────────────────────────────────────────────────────┘ ``` # IS_NULL_VALUE (Lakehouse v1) > IS_NULL_VALUE — checks whether the input value is a JSON null. Checks whether the input value is a JSON `null`. Please note that this function examines JSON `null`, not SQL NULL. To check if a value is SQL NULL, use [IS\_NULL](../../03-conditional-functions/is-null). 
JSON null Example: ```json { "name": "John", "age": null } ``` ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.is_null_value() ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.is_null_value(func.get_path(func.parse_json('{"k1":[0,1,2], "k2":{"k3":3,"k4":4}}'), 'k2.k5')) ┌─────────────────────────────────────────────────────────────────────────────────────────────────────┐ │ func.is_null_value(func.get_path(func.parse_json('{"k1":[0,1,2], "k2":{"k3":3,"k4":4}}'), 'k2.k5')) │ ├─────────────────────────────────────────────────────────────────────────────────────────────────────┤ │ true │ └─────────────────────────────────────────────────────────────────────────────────────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql IS_NULL_VALUE( ) ``` ## Return Type [Section titled “Return Type”](#return-type) Returns `true` if the input value is a JSON `null`, and `false` otherwise. ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT IS_NULL_VALUE(PARSE_JSON('{"name":"John", "age":null}') :age), --JSON null IS_NULL(NULL); --SQL NULL ┌──────────────────────────────────────────────────────────────────────────────┐ │ is_null_value(parse_json('{"name":"john", "age":null}'):age) │ is_null(null) │ ├──────────────────────────────────────────────────────────────┼───────────────┤ │ true │ true │ └──────────────────────────────────────────────────────────────────────────────┘ ``` # IS_OBJECT (Lakehouse v1) > IS_OBJECT — Checks if the input value is a JSON object. Checks if the input value is a JSON object. 
## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.is_object() ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.is_object(func.parse_json('{"a":"b"}')), func.is_object(func.parse_json('["a","b","c"]')) ┌──────────────────────────────────────────────────────────────────────────────────────────────────┐ │ func.is_object(func.parse_json('{"a":"b"}')) │ func.is_object(func.parse_json('["a","b","c"]')) │ ├───────────────────────────────────────────────┼──────────────────────────────────────────────────┤ │ true │ false │ └──────────────────────────────────────────────────────────────────────────────────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql IS_OBJECT( ) ``` ## Return Type [Section titled “Return Type”](#return-type) Returns `true` if the input JSON value is a JSON object, and `false` otherwise. ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT IS_OBJECT(PARSE_JSON('{"a":"b"}')), -- JSON Object IS_OBJECT(PARSE_JSON('["a","b","c"]')); --JSON Array ┌─────────────────────────────────────────────────────────────────────────────┐ │ is_object(parse_json('{"a":"b"}')) │ is_object(parse_json('["a","b","c"]')) │ ├────────────────────────────────────┼────────────────────────────────────────┤ │ true │ false │ └─────────────────────────────────────────────────────────────────────────────┘ ``` # IS_STRING (Lakehouse v1) > IS_STRING — Checks if the input JSON value is a string. Checks if the input JSON value is a string. 
## Analyze Syntax

```python
func.is_string()
```

## Analyze Examples

```python
func.is_string(func.parse_json('"abc"')), func.is_string(func.parse_json('123'))
┌───────────────────────────────────────────────────────────────────────────────────┐
│ func.is_string(func.parse_json('"abc"')) │ func.is_string(func.parse_json('123')) │
├──────────────────────────────────────────┼────────────────────────────────────────┤
│ true                                     │ false                                  │
└───────────────────────────────────────────────────────────────────────────────────┘
```

## SQL Syntax

```sql
IS_STRING( )
```

## Return Type

Returns `true` if the input JSON value is a string, and `false` otherwise.

## SQL Examples

```sql
SELECT
  IS_STRING(PARSE_JSON('"abc"')),
  IS_STRING(PARSE_JSON('123'));

┌───────────────────────────────────────────────────────────────┐
│ is_string(parse_json('"abc"')) │ is_string(parse_json('123')) │
├────────────────────────────────┼──────────────────────────────┤
│ true                           │ false                        │
└───────────────────────────────────────────────────────────────┘
```

# JQ (Lakehouse v1)

> JQ — a set-returning SQL function that applies jq filters to JSON data stored in Variant columns.

The JQ function is a set-returning SQL function that allows you to apply [jq](https://jqlang.github.io/jq/) filters to JSON data stored in Variant columns. With this function, you can process JSON data by applying a specified jq filter, returning the results as a set of rows.
## SQL Syntax

```sql
JQ (<jq_expression>, <json_data>)
```

| Parameter       | Description |
| --------------- | ----------- |
| `jq_expression` | A `jq` filter expression that defines how to process and transform JSON data using the `jq` syntax. This expression can specify how to select, modify, and manipulate data within JSON objects and arrays. For information on the syntax, filters, and functions supported by jq, please refer to the [jq Manual](https://jqlang.github.io/jq/manual/#basic-filters). |
| `json_data`     | The JSON-formatted input that you want to process or transform using the `jq` filter expression. It can be a JSON object, array, or any valid JSON data structure. |

## Return Type

The JQ function returns a set of JSON values, where each value corresponds to an element of the transformed or extracted result based on the `<jq_expression>`.
## SQL Examples [Section titled “SQL Examples”](#sql-examples) To start, we create a table named `customer_data` with columns for `id` and `profile`, where `profile` is a JSON type to store user information: ```sql CREATE TABLE customer_data ( id INT, profile JSON ); INSERT INTO customer_data VALUES (1, '{"name": "Alice", "age": 30, "city": "New York"}'), (2, '{"name": "Bob", "age": 25, "city": "Los Angeles"}'), (3, '{"name": "Charlie", "age": 35, "city": "Chicago"}'); ``` This example extracts specific fields from the JSON data: ```sql SELECT id, jq('.name', profile) AS customer_name FROM customer_data; ┌─────────────────────────────────────┐ │ id │ customer_name │ ├─────────────────┼───────────────────┤ │ 1 │ "Alice" │ │ 2 │ "Bob" │ │ 3 │ "Charlie" │ └─────────────────────────────────────┘ ``` This example selects the user ID and the age incremented by 1 for each user: ```sql SELECT id, jq('.age + 1', profile) AS updated_age FROM customer_data; ┌─────────────────────────────────────┐ │ id │ updated_age │ ├─────────────────┼───────────────────┤ │ 1 │ 31 │ │ 2 │ 26 │ │ 3 │ 36 │ └─────────────────────────────────────┘ ``` This example converts city names to uppercase: ```sql SELECT id, jq('.city | ascii_upcase', profile) AS city_uppercase FROM customer_data; ┌─────────────────────────────────────┐ │ id │ city_uppercase │ ├─────────────────┼───────────────────┤ │ 1 │ "NEW YORK" │ │ 2 │ "LOS ANGELES" │ │ 3 │ "CHICAGO" │ └─────────────────────────────────────┘ ``` # JSON_ARRAY (Lakehouse v1) > JSON_ARRAY — Creates a JSON array with specified values. Creates a JSON array with specified values. 
## Analyze Syntax

```python
func.json_array(value1[, value2[, ...]])
```

## Analyze Examples

```python
func.json_array('fruits', func.json_array('apple', 'banana', 'orange'), func.json_object('price', 1.2, 'quantity', 3)) |
-----------------------------------------------------------------------------------------------------------------------+
["fruits",["apple","banana","orange"],{"price":1.2,"quantity":3}]                                                      |
```

## SQL Syntax

```sql
JSON_ARRAY(value1[, value2[, ...]])
```

## Return Type

JSON array.

## SQL Examples

### SQL Examples 1: Creating JSON Array With Constant Values or Expressions

```sql
SELECT JSON_ARRAY('PlaidCloud Lakehouse', 3.14, NOW(), TRUE, NULL);

json_array('PlaidCloud Lakehouse', 3.14, now(), true, null)         |
--------------------------------------------------------------------+
["PlaidCloud Lakehouse",3.14,"2023-09-06 07:23:55.399070",true,null]|

SELECT JSON_ARRAY('fruits', JSON_ARRAY('apple', 'banana', 'orange'), JSON_OBJECT('price', 1.2, 'quantity', 3));

json_array('fruits', json_array('apple', 'banana', 'orange'), json_object('price', 1.2, 'quantity', 3))|
-------------------------------------------------------------------------------------------------------+
["fruits",["apple","banana","orange"],{"price":1.2,"quantity":3}]                                      |
```

### SQL Examples 2: Creating JSON Array From Table Data

```sql
CREATE TABLE products (
    ProductName VARCHAR(255),
    Price DECIMAL(10, 2)
);

INSERT INTO products (ProductName, Price)
VALUES
    ('Apple', 1.2),
    ('Banana', 0.5),
('Orange', 0.8); SELECT JSON_ARRAY(ProductName, Price) FROM products; json_array(productname, price)| ------------------------------+ ["Apple",1.2] | ["Banana",0.5] | ["Orange",0.8] | ``` # JSON_ARRAY_APPLY (Lakehouse v1) > JSON_ARRAY_APPLY — alias for the JSON_ARRAY_TRANSFORM semi-structured data function. Alias for [JSON\_ARRAY\_TRANSFORM](../json-array-transform). # JSON_ARRAY_DISTINCT (Lakehouse v1) > JSON_ARRAY_DISTINCT — removes duplicate elements from a JSON array and returns an array with only distinct elements. Removes duplicate elements from a JSON array and returns an array with only distinct elements. ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql JSON_ARRAY_DISTINCT() ``` ## Return Type [Section titled “Return Type”](#return-type) JSON array. ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT JSON_ARRAY_DISTINCT('["apple", "banana", "apple", "orange", "banana"]'::VARIANT); -[ RECORD 1 ]----------------------------------- json_array_distinct('["apple", "banana", "apple", "orange", "banana"]'::VARIANT): ["apple","banana","orange"] ``` # JSON_ARRAY_ELEMENTS (Lakehouse v1) > JSON_ARRAY_ELEMENTS — extract elements from a JSON array as individual rows; nested arrays are not recursively expanded. Extracts the elements from a JSON array, returning them as individual rows in the result set. JSON\_ARRAY\_ELEMENTS does not recursively expand nested arrays; it treats them as single elements. 
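The non-recursive behavior described above can be sketched in plain Python. This is only an analogue built on the standard `json` module: the function name `json_array_elements` is reused for readability, and raising `TypeError` for non-array input is an assumption, not documented Lakehouse behavior.

```python
import json
from typing import Iterator


def json_array_elements(text: str) -> Iterator[str]:
    """Yield each top-level element of a JSON array as compact JSON text."""
    items = json.loads(text)
    if not isinstance(items, list):
        # Assumption: the real function's handling of non-arrays may differ.
        raise TypeError("expected a JSON array")
    for item in items:
        # Nested arrays and objects are emitted whole, never expanded.
        yield json.dumps(item, separators=(",", ":"))


rows = list(json_array_elements('[1, [2, 3], {"a": [4, 5]}]'))
print(rows)  # ['1', '[2,3]', '{"a":[4,5]}']
```

Three rows come back for three top-level elements; the nested array `[2, 3]` stays a single row.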
## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.json_array_elements() ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.json_array_elements(func.parse_json('[ \n {"product": "laptop", "brand": "apple", "price": 1500},\n {"product": "smartphone", "brand": "samsung", "price": 800},\n {"product": "headphones", "brand": "sony", "price": 150}\n]')) ┌───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐ │ func.json_array_elements(func.parse_json('[ \n {"product": "laptop", "brand": "apple", "price": 1500},\n {"product": "smartphone", "brand": "samsung", "price": 800},\n {"product": "headphones", "brand": "sony", "price": 150}\n]')) │ ├───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤ │ {"brand":"Apple","price":1500,"product":"Laptop"} │ │ {"brand":"Samsung","price":800,"product":"Smartphone"} │ │ {"brand":"Sony","price":150,"product":"Headphones"} │ └───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql JSON_ARRAY_ELEMENTS() ``` ## Return Type [Section titled “Return Type”](#return-type) JSON\_ARRAY\_ELEMENTS returns a set of VARIANT values, each representing an element extracted from the input JSON array. 
## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql -- Extract individual elements from a JSON array containing product information SELECT JSON_ARRAY_ELEMENTS( PARSE_JSON ( '[ {"product": "Laptop", "brand": "Apple", "price": 1500}, {"product": "Smartphone", "brand": "Samsung", "price": 800}, {"product": "Headphones", "brand": "Sony", "price": 150} ]' ) ); ┌─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐ │ json_array_elements(parse_json('[ \n {"product": "laptop", "brand": "apple", "price": 1500},\n {"product": "smartphone", "brand": "samsung", "price": 800},\n {"product": "headphones", "brand": "sony", "price": 150}\n]')) │ ├─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤ │ {"brand":"Apple","price":1500,"product":"Laptop"} │ │ {"brand":"Samsung","price":800,"product":"Smartphone"} │ │ {"brand":"Sony","price":150,"product":"Headphones"} │ └─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘ -- Display data types of the extracted elements SELECT TYPEOF ( JSON_ARRAY_ELEMENTS( PARSE_JSON ( '[ {"product": "Laptop", "brand": "Apple", "price": 1500}, {"product": "Smartphone", "brand": "Samsung", "price": 800}, {"product": "Headphones", "brand": "Sony", "price": 150} ]' ) ) ); ┌─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐ │ 
typeof(json_array_elements(parse_json('[ \n {"product": "laptop", "brand": "apple", "price": 1500},\n {"product": "smartphone", "brand": "samsung", "price": 800},\n {"product": "headphones", "brand": "sony", "price": 150}\n]'))) │ ├─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤ │ VARIANT NULL │ │ VARIANT NULL │ │ VARIANT NULL │ └─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘ ``` # JSON_ARRAY_EXCEPT (Lakehouse v1) > JSON_ARRAY_EXCEPT — returns a new JSON array containing the elements from the first JSON array that are not present in the second JSON array. Returns a new JSON array containing the elements from the first JSON array that are not present in the second JSON array. ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql JSON_ARRAY_EXCEPT(, ) ``` ## Return Type [Section titled “Return Type”](#return-type) JSON array. ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT JSON_ARRAY_EXCEPT( '["apple", "banana", "orange"]'::JSON, '["banana", "grapes"]'::JSON ); -[ RECORD 1 ]----------------------------------- json_array_except('["apple", "banana", "orange"]'::VARIANT, '["banana", "grapes"]'::VARIANT): ["apple","orange"] -- Return an empty array because all elements in the first array are present in the second array. 
SELECT json_array_except('["apple", "banana", "orange"]'::VARIANT, '["apple", "banana", "orange"]'::VARIANT) -[ RECORD 1 ]----------------------------------- json_array_except('["apple", "banana", "orange"]'::VARIANT, '["apple", "banana", "orange"]'::VARIANT): [] ``` # JSON_ARRAY_FILTER (Lakehouse v1) > JSON_ARRAY_FILTER — filters elements from a JSON array based on a specified Lambda. Filters elements from a JSON array based on a specified Lambda expression, returning only the elements that satisfy the condition. For more information about Lambda expression, see Lambda Expressions. ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql JSON_ARRAY_FILTER(, ) ``` ## Return Type [Section titled “Return Type”](#return-type) JSON array. ## SQL Examples [Section titled “SQL Examples”](#sql-examples) This example filters the array to return only the strings that start with the letter `a`, resulting in `["apple", "avocado"]`: ```sql SELECT JSON_ARRAY_FILTER( ['apple', 'banana', 'avocado', 'grape']::JSON, d -> d::String LIKE 'a%' ); -[ RECORD 1 ]----------------------------------- json_array_filter(['apple', 'banana', 'avocado', 'grape']::VARIANT, d -> d::STRING LIKE 'a%'): ["apple","avocado"] ``` # JSON_ARRAY_INSERT (Lakehouse v1) > JSON_ARRAY_INSERT — inserts a value into a JSON array at the specified index and returns the updated JSON array. Inserts a value into a JSON array at the specified index and returns the updated JSON array. ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql JSON_ARRAY_INSERT(, , ) ``` | Parameter | Description | | -------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | | `` | The JSON array to modify. | | `` | The position at which the value will be inserted. 
Positive indices insert at the specified position or append if out of range; negative indices insert from the end or at the beginning if out of range. | | `` | The JSON value to insert into the array. | ## Return Type [Section titled “Return Type”](#return-type) JSON array. ## SQL Examples [Section titled “SQL Examples”](#sql-examples) When the `` is a non-negative integer, the new element is inserted at the specified position, and existing elements are shifted to the right. ```sql -- The new element is inserted at position 0 (the beginning of the array), shifting all original elements to the right SELECT JSON_ARRAY_INSERT('["task1", "task2", "task3"]'::VARIANT, 0, '"new_task"'::VARIANT); -[ RECORD 1 ]----------------------------------- json_array_insert('["task1", "task2", "task3"]'::VARIANT, 0, '"new_task"'::VARIANT): ["new_task","task1","task2","task3"] -- The new element is inserted at position 1, between task1 and task2 SELECT JSON_ARRAY_INSERT('["task1", "task2", "task3"]'::VARIANT, 1, '"new_task"'::VARIANT); -[ RECORD 1 ]----------------------------------- json_array_insert('["task1", "task2", "task3"]'::VARIANT, 1, '"new_task"'::VARIANT): ["task1","new_task","task2","task3"] -- If the index exceeds the length of the array, the new element is appended at the end of the array SELECT JSON_ARRAY_INSERT('["task1", "task2", "task3"]'::VARIANT, 6, '"new_task"'::VARIANT); -[ RECORD 1 ]----------------------------------- json_array_insert('["task1", "task2", "task3"]'::VARIANT, 6, '"new_task"'::VARIANT): ["task1","task2","task3","new_task"] ``` A negative `` counts from the end of the array, with `-1` representing the position before the last element, `-2` before the second last, and so on. 
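The index rules above happen to match Python's built-in `list.insert`, which also clamps out-of-range positions, so a short Python analogue can make them concrete. This is a sketch under that assumption, using the standard `json` module rather than any Lakehouse API; the function name is borrowed only for readability.

```python
import json


def json_array_insert(array_text: str, index: int, value_text: str) -> str:
    """Insert a JSON value into a JSON array and return compact JSON text."""
    items = json.loads(array_text)
    if not isinstance(items, list):
        raise TypeError("expected a JSON array")
    # list.insert clamps: large indices append, large negatives prepend.
    items.insert(index, json.loads(value_text))
    return json.dumps(items, separators=(",", ":"))


tasks = '["task1", "task2", "task3"]'
print(json_array_insert(tasks, 1, '"new_task"'))   # ["task1","new_task","task2","task3"]
print(json_array_insert(tasks, -1, '"new_task"'))  # ["task1","task2","new_task","task3"]
print(json_array_insert(tasks, 6, '"new_task"'))   # ["task1","task2","task3","new_task"]
print(json_array_insert(tasks, -6, '"new_task"'))  # ["new_task","task1","task2","task3"]
```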
```sql -- The new element is inserted just before the last element (task3) SELECT JSON_ARRAY_INSERT('["task1", "task2", "task3"]'::VARIANT, -1, '"new_task"'::VARIANT); -[ RECORD 1 ]----------------------------------- json_array_insert('["task1", "task2", "task3"]'::VARIANT, - 1, '"new_task"'::VARIANT): ["task1","task2","new_task","task3"] -- Since the negative index exceeds the array’s length, the new element is inserted at the beginning SELECT JSON_ARRAY_INSERT('["task1", "task2", "task3"]'::VARIANT, -6, '"new_task"'::VARIANT); -[ RECORD 1 ]----------------------------------- json_array_insert('["task1", "task2", "task3"]'::VARIANT, - 6, '"new_task"'::VARIANT): ["new_task","task1","task2","task3"] ``` # JSON_ARRAY_INTERSECTION (Lakehouse v1) > JSON_ARRAY_INTERSECTION — returns the common elements between two JSON arrays. Returns the common elements between two JSON arrays. ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql JSON_ARRAY_INTERSECTION(, ) ``` ## Return Type [Section titled “Return Type”](#return-type) JSON array. 
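As a rough illustration of what an intersection of two JSON arrays means, the following Python sketch keeps the elements of the first array that also occur in the second. Preserving first-array order and dropping duplicates are assumptions made for this sketch; the Lakehouse function may order or deduplicate differently.

```python
import json


def json_array_intersection(a_text: str, b_text: str) -> str:
    """Return the elements of array a that also appear in array b."""
    a, b = json.loads(a_text), json.loads(b_text)
    common = []
    for item in a:
        # Equality comparison works for nested lists and dicts as well.
        if item in b and item not in common:
            common.append(item)
    return json.dumps(common, separators=(",", ":"))


first = json_array_intersection(
    '["Electronics", "Books", "Toys"]',
    '["Books", "Fashion", "Electronics"]',
)
print(first)  # ["Electronics","Books"]

# Chain the call to intersect with a third array.
print(json_array_intersection(first, '["Electronics", "Books", "Clothing"]'))
```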
## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql -- Find the intersection of two JSON arrays SELECT json_array_intersection('["Electronics", "Books", "Toys"]'::JSON, '["Books", "Fashion", "Electronics"]'::JSON); -[ RECORD 1 ]----------------------------------- json_array_intersection('["Electronics", "Books", "Toys"]'::VARIANT, '["Books", "Fashion", "Electronics"]'::VARIANT): ["Electronics","Books"] -- Find the intersection of the result from the first query with a third JSON array using an iterative approach SELECT json_array_intersection( json_array_intersection('["Electronics", "Books", "Toys"]'::JSON, '["Books", "Fashion", "Electronics"]'::JSON), '["Electronics", "Books", "Clothing"]'::JSON ); -[ RECORD 1 ]----------------------------------- json_array_intersection(json_array_intersection('["Electronics", "Books", "Toys"]'::VARIANT, '["Books", "Fashion", "Electronics"]'::VARIANT), '["Electronics", "Books", "Clothing"]'::VARIANT): ["Electronics","Books"] ``` # JSON_ARRAY_MAP (Lakehouse v1) > JSON_ARRAY_MAP — alias for the JSON_ARRAY_TRANSFORM semi-structured data function. Alias for [JSON\_ARRAY\_TRANSFORM](../json-array-transform). # JSON_ARRAY_OVERLAP (Lakehouse v1) > JSON_ARRAY_OVERLAP — checks if there is any overlap between two JSON arrays and returns true if there are common elements; otherwise, it returns false. Checks if there is any overlap between two JSON arrays and returns `true` if there are common elements; otherwise, it returns `false`. ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql JSON_ARRAY_OVERLAP(, ) ``` ## Return Type [Section titled “Return Type”](#return-type) The function returns a boolean value: * `true` if there is at least one common element between the two JSON arrays, * `false` if there are no common elements. 
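The overlap test reduces to "does any element of the first array occur in the second". A minimal Python sketch of that rule, using only the standard `json` module (the function name mirrors the SQL function for readability and is not a Lakehouse API):

```python
import json


def json_array_overlap(a_text: str, b_text: str) -> bool:
    """Return True if the two JSON arrays share at least one element."""
    a, b = json.loads(a_text), json.loads(b_text)
    # any() stops at the first shared element.
    return any(item in b for item in a)


print(json_array_overlap('["apple", "banana", "cherry"]', '["banana", "kiwi", "mango"]'))  # True
print(json_array_overlap('["grape", "orange"]', '["apple", "kiwi"]'))                      # False
```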
## SQL Examples

```sql
SELECT json_array_overlap(
    '["apple", "banana", "cherry"]'::JSON,
    '["banana", "kiwi", "mango"]'::JSON
);

-[ RECORD 1 ]-----------------------------------
json_array_overlap('["apple", "banana", "cherry"]'::VARIANT, '["banana", "kiwi", "mango"]'::VARIANT): true

SELECT json_array_overlap(
    '["grape", "orange"]'::JSON,
    '["apple", "kiwi"]'::JSON
);

-[ RECORD 1 ]-----------------------------------
json_array_overlap('["grape", "orange"]'::VARIANT, '["apple", "kiwi"]'::VARIANT): false
```

# JSON_ARRAY_REDUCE (Lakehouse v1)

> JSON_ARRAY_REDUCE — reduces a JSON array to a single value by applying a specified Lambda expression.

Reduces a JSON array to a single value by applying a specified Lambda expression. For more information about Lambda expressions, see Lambda Expressions.

## SQL Syntax

```sql
JSON_ARRAY_REDUCE(, )
```

## SQL Examples

This example multiplies all the elements in the array (2 \* 3 \* 4):

```sql
SELECT JSON_ARRAY_REDUCE(
    [2, 3, 4]::JSON,
    (acc, d) -> acc::Int * d::Int
);

-[ RECORD 1 ]-----------------------------------
json_array_reduce([2, 3, 4]::VARIANT, (acc, d) -> acc::Int32 * d::Int32): 24
```

# JSON_ARRAY_TRANSFORM (Lakehouse v1)

> JSON_ARRAY_TRANSFORM — transforms each element of a JSON array using a specified Lambda expression.

Transforms each element of a JSON array using a specified transformation Lambda expression. For more information about Lambda expressions, see Lambda Expressions.

## SQL Syntax

```sql
JSON_ARRAY_TRANSFORM(, )
```

## Aliases

* [JSON\_ARRAY\_APPLY](../json-array-apply)
* [JSON\_ARRAY\_MAP](../json-array-map)

## Return Type

JSON array.
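For readers more familiar with Python, the element-wise transformation corresponds to a list comprehension over the parsed array, and the related reduction corresponds to `functools.reduce`. The sketch below is a plain-Python analogue under that reading; the function names are borrowed from the SQL functions and Python lambdas stand in for Lakehouse Lambda expressions.

```python
import json
from functools import reduce
from typing import Any, Callable


def json_array_transform(text: str, fn: Callable[[Any], Any]) -> str:
    """Apply fn to every element and return the new array as compact JSON."""
    return json.dumps([fn(item) for item in json.loads(text)], separators=(",", ":"))


def json_array_reduce(text: str, fn: Callable[[Any, Any], Any]) -> Any:
    """Fold the array into one value, left to right."""
    return reduce(fn, json.loads(text))


print(json_array_transform("[1, 2, 3, 4]", lambda data: data * 10))  # [10,20,30,40]
print(json_array_reduce("[2, 3, 4]", lambda acc, d: acc * d))        # 24
```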
## SQL Examples [Section titled “SQL Examples”](#sql-examples) In this example, each numeric element in the array is multiplied by 10, transforming the original array into `[10, 20, 30, 40]`: ```sql SELECT JSON_ARRAY_TRANSFORM( [1, 2, 3, 4]::JSON, data -> (data::Int * 10) ); -[ RECORD 1 ]----------------------------------- json_array_transform([1, 2, 3, 4]::VARIANT, data -> data::Int32 * 10): [10,20,30,40] ``` # JSON_EACH (Lakehouse v1) > JSON_EACH — extracts key-value pairs from a JSON object, breaking down the structure into individual rows in the result set. Extracts key-value pairs from a JSON object, breaking down the structure into individual rows in the result set. Each row represents a distinct key-value pair derived from the input JSON expression. ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.json_each() ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python ┌────────────────────────────────────────────────────────────────────────────────────────────────────────────┐ │ func.json_each(func.parse_json('{"name": "john", "age": 25, "isstudent": false, "grades": [90, 85, 92]}')) │ ├────────────────────────────────────────────────────────────────────────────────────────────────────────────┤ │ ('age','25') │ │ ('grades','[90,85,92]') │ │ ('isStudent','false') │ │ ('name','"John"') │ └────────────────────────────────────────────────────────────────────────────────────────────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql JSON_EACH() ``` ## Return Type [Section titled “Return Type”](#return-type) JSON\_EACH returns a set of tuples, each consisting of a STRING key and a corresponding VARIANT value. 
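The shape of the result, one (key, value) pair per row, can be sketched in Python with the standard `json` module. This is an illustrative analogue only: values are rendered as compact JSON text, and rows follow the object's own key order, whereas the Lakehouse output shown in this section lists keys alphabetically.

```python
import json
from typing import Iterator, Tuple


def json_each(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (key, value-as-JSON-text) for each member of a JSON object."""
    obj = json.loads(text)
    if not isinstance(obj, dict):
        # Assumption: the real function's handling of non-objects may differ.
        raise TypeError("expected a JSON object")
    for key, value in obj.items():
        yield key, json.dumps(value, separators=(",", ":"))


person = '{"name": "John", "age": 25, "isStudent": false, "grades": [90, 85, 92]}'
for row in sorted(json_each(person)):
    print(row)
```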
## SQL Examples

```sql
-- Extract key-value pairs from a JSON object representing information about a person
SELECT
  JSON_EACH(
    PARSE_JSON (
      '{"name": "John", "age": 25, "isStudent": false, "grades": [90, 85, 92]}'
    )
  );

┌──────────────────────────────────────────────────────────────────────────────────────────────────┐
│ json_each(parse_json('{"name": "john", "age": 25, "isstudent": false, "grades": [90, 85, 92]}')) │
├──────────────────────────────────────────────────────────────────────────────────────────────────┤
│ ('age','25')                                                                                     │
│ ('grades','[90,85,92]')                                                                          │
│ ('isStudent','false')                                                                            │
│ ('name','"John"')                                                                                │
└──────────────────────────────────────────────────────────────────────────────────────────────────┘

-- Display data types of the extracted values
SELECT
  TYPEOF (
    JSON_EACH(
      PARSE_JSON (
        '{"name": "John", "age": 25, "isStudent": false, "grades": [90, 85, 92]}'
      )
    )
  );

┌──────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ typeof(json_each(parse_json('{"name": "john", "age": 25, "isstudent": false, "grades": [90, 85, 92]}'))) │
├──────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ TUPLE(STRING, VARIANT) NULL                                                                              │
│ TUPLE(STRING, VARIANT) NULL                                                                              │
│ TUPLE(STRING, VARIANT) NULL                                                                              │
│ TUPLE(STRING, VARIANT) NULL                                                                              │
└──────────────────────────────────────────────────────────────────────────────────────────────────────────┘
```

# JSON_EXTRACT_PATH_TEXT (Lakehouse v1)

> JSON_EXTRACT_PATH_TEXT — extracts a value from a JSON string by path_name.

Extracts a value from a JSON string by `path_name`. The value is returned as a `String`; the result is `NULL` if either argument is `NULL`. This function is equivalent to `to_varchar(GET_PATH(PARSE_JSON(JSON), PATH_NAME))`.
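The parse, walk, stringify sequence behind this function can be sketched in Python. This is an illustrative analogue, not the Lakehouse implementation: the path grammar handled here (field names separated by `.` or `:`, plus `[n]` array indexes) is inferred from the examples on this page, and returning string values without quotes is an assumption.

```python
import json
import re
from typing import Optional

# Either an array index like [0] or a field name delimited by '.' or ':'.
_TOKEN = re.compile(r"\[(\d+)\]|([^.:\[\]]+)")


def json_extract_path_text(json_text: Optional[str], path: Optional[str]) -> Optional[str]:
    """Return the value at path as text, or None if missing or an input is None."""
    if json_text is None or path is None:
        return None
    current = json.loads(json_text)
    for index, key in _TOKEN.findall(path):
        if index:
            if not isinstance(current, list) or int(index) >= len(current):
                return None
            current = current[int(index)]
        else:
            if not isinstance(current, dict) or key not in current:
                return None
            current = current[key]
    if isinstance(current, str):
        return current  # assumption: strings are returned unquoted
    return json.dumps(current, separators=(",", ":"))


doc = '{"k1":[0,1,2], "k2":{"k3":3,"k4":4}}'
print(json_extract_path_text(doc, "k1[0]"))  # 0
print(json_extract_path_text(doc, "k2:k3"))  # 3
print(json_extract_path_text(doc, "k2.k5"))  # None
```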
## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.json_extract_path_text(, ) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.json_extract_path_text('{"k1":[0,1,2], "k2":{"k3":3,"k4":4}}', 'k2.k4') ┌──────────────────────────────────────────────────────────────────────────────┐ │ func.json_extract_path_text('{"k1":[0,1,2], "k2":{"k3":3,"k4":4}}', 'k2.k4') │ ├──────────────────────────────────────────────────────────────────────────────┤ │ 4 │ └──────────────────────────────────────────────────────────────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql JSON_EXTRACT_PATH_TEXT( , ) ``` ## Arguments [Section titled “Arguments”](#arguments) | Arguments | Description | | ------------- | ---------------------------------------------------------------- | | `` | The Json String value | | `` | The String value that consists of a concatenation of field names | ## Return Type [Section titled “Return Type”](#return-type) String ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT json_extract_path_text('{"k1":[0,1,2], "k2":{"k3":3,"k4":4}}', 'k1[0]'); ┌─────────────────────────────────────────────────────────────────────────┐ │ json_extract_path_text('{"k1":[0,1,2], "k2":{"k3":3,"k4":4}}', 'k1[0]') │ ├─────────────────────────────────────────────────────────────────────────┤ │ 0 │ └─────────────────────────────────────────────────────────────────────────┘ SELECT json_extract_path_text('{"k1":[0,1,2], "k2":{"k3":3,"k4":4}}', 'k2:k3'); ┌─────────────────────────────────────────────────────────────────────────┐ │ json_extract_path_text('{"k1":[0,1,2], "k2":{"k3":3,"k4":4}}', 'k2:k3') │ ├─────────────────────────────────────────────────────────────────────────┤ │ 3 │ └─────────────────────────────────────────────────────────────────────────┘ SELECT json_extract_path_text('{"k1":[0,1,2], "k2":{"k3":3,"k4":4}}', 'k2.k4'); 
┌─────────────────────────────────────────────────────────────────────────┐
│ json_extract_path_text('{"k1":[0,1,2], "k2":{"k3":3,"k4":4}}', 'k2.k4') │
├─────────────────────────────────────────────────────────────────────────┤
│ 4                                                                       │
└─────────────────────────────────────────────────────────────────────────┘

SELECT json_extract_path_text('{"k1":[0,1,2], "k2":{"k3":3,"k4":4}}', 'k2.k5');

┌─────────────────────────────────────────────────────────────────────────┐
│ json_extract_path_text('{"k1":[0,1,2], "k2":{"k3":3,"k4":4}}', 'k2.k5') │
├─────────────────────────────────────────────────────────────────────────┤
│ NULL                                                                    │
└─────────────────────────────────────────────────────────────────────────┘
```

# JSON_MAP_FILTER (Lakehouse v1)

> JSON_MAP_FILTER: filters key-value pairs in a JSON object based on a specified condition.

Filters key-value pairs in a JSON object based on a specified condition, defined using a lambda expression.

## SQL Syntax

[Section titled “SQL Syntax”](#sql-syntax)

```sql
JSON_MAP_FILTER(, (, ) -> )
```

## Return Type

[Section titled “Return Type”](#return-type)

Returns a JSON object with only the key-value pairs that satisfy the specified condition.

## SQL Examples

[Section titled “SQL Examples”](#sql-examples)

This example extracts only the `"status": "active"` key-value pair from the JSON object, filtering out the other fields:

```sql
SELECT JSON_MAP_FILTER('{"status":"active", "user":"admin", "time":"2024-11-01"}'::VARIANT, (k, v) -> k = 'status') AS filtered_metadata;

┌─────────────────────┐
│  filtered_metadata  │
├─────────────────────┤
│ {"status":"active"} │
└─────────────────────┘
```

# JSON_MAP_TRANSFORM_KEYS (Lakehouse v1)

> JSON_MAP_TRANSFORM_KEYS: applies a transformation to each key in a JSON object.

Applies a transformation to each key in a JSON object using a lambda expression.
## SQL Syntax

[Section titled “SQL Syntax”](#sql-syntax)

```sql
JSON_MAP_TRANSFORM_KEYS(, (, ) -> )
```

## Return Type

[Section titled “Return Type”](#return-type)

Returns a JSON object with the same values as the input JSON object, but with keys modified according to the specified lambda transformation.

## SQL Examples

[Section titled “SQL Examples”](#sql-examples)

This example appends “\_v1” to each key, creating a new JSON object with modified keys:

```sql
SELECT JSON_MAP_TRANSFORM_KEYS('{"name":"John", "role":"admin"}'::VARIANT, (k, v) -> CONCAT(k, '_v1')) AS versioned_metadata;

┌──────────────────────────────────────┐
│          versioned_metadata          │
├──────────────────────────────────────┤
│ {"name_v1":"John","role_v1":"admin"} │
└──────────────────────────────────────┘
```

# JSON_MAP_TRANSFORM_VALUES (Lakehouse v1)

> JSON_MAP_TRANSFORM_VALUES: applies a transformation to each value in a JSON object.

Applies a transformation to each value in a JSON object using a lambda expression.

## SQL Syntax

[Section titled “SQL Syntax”](#sql-syntax)

```sql
JSON_MAP_TRANSFORM_VALUES(, (, ) -> )
```

## Return Type

[Section titled “Return Type”](#return-type)

Returns a JSON object with the same keys as the input JSON object, but with values modified according to the specified lambda transformation.
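The three lambda-based functions (JSON_MAP_FILTER, JSON_MAP_TRANSFORM_KEYS, JSON_MAP_TRANSFORM_VALUES) all pass a key and a value to the lambda. The plain-Python sketch below mirrors that callback shape using the inputs from the documented examples. The helper names are hypothetical emulations and say nothing about how the engine evaluates lambdas.

```python
import json

def json_map_filter(json_text, predicate):
    """Keep only the pairs for which predicate(key, value) is true."""
    obj = json.loads(json_text)
    return {k: v for k, v in obj.items() if predicate(k, v)}

def json_map_transform_keys(json_text, fn):
    """Replace each key with fn(key, value); values are unchanged."""
    obj = json.loads(json_text)
    return {fn(k, v): v for k, v in obj.items()}

def json_map_transform_values(json_text, fn):
    """Replace each value with fn(key, value); keys are unchanged."""
    obj = json.loads(json_text)
    return {k: fn(k, v) for k, v in obj.items()}

filtered = json_map_filter(
    '{"status":"active", "user":"admin", "time":"2024-11-01"}',
    lambda k, v: k == "status",
)
renamed = json_map_transform_keys(
    '{"name":"John", "role":"admin"}', lambda k, v: k + "_v1"
)
promo = json_map_transform_values(
    '{"product1":"laptop", "product2":"phone"}',
    lambda k, v: v + " - Special Offer",
)
print(filtered, renamed, promo, sep="\n")
```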
## SQL Examples

[Section titled “SQL Examples”](#sql-examples)

This example appends “ - Special Offer” to each product description:

```sql
SELECT JSON_MAP_TRANSFORM_VALUES('{"product1":"laptop", "product2":"phone"}'::VARIANT, (k, v) -> CONCAT(v, ' - Special Offer')) AS promo_descriptions;

┌──────────────────────────────────────────────────────────────────────────┐
│                            promo_descriptions                            │
├──────────────────────────────────────────────────────────────────────────┤
│ {"product1":"laptop - Special Offer","product2":"phone - Special Offer"} │
└──────────────────────────────────────────────────────────────────────────┘
```

# JSON_OBJECT_DELETE (Lakehouse v1)

> JSON_OBJECT_DELETE: deletes specified keys from a JSON object and returns the modified object.

Deletes specified keys from a JSON object and returns the modified object. If a specified key doesn’t exist in the object, it is ignored.

## SQL Syntax

[Section titled “SQL Syntax”](#sql-syntax)

```sql
json_object_delete(<json_object>, <key1> [, <key2>, ...])
```

## Parameters

[Section titled “Parameters”](#parameters)

| Parameter     | Description                                                                      |
| ------------- | -------------------------------------------------------------------------------- |
| json\_object  | A JSON object (VARIANT type) from which to delete keys.                          |
| key1, key2, … | One or more string literals representing the keys to be deleted from the object. |

## Return Type

[Section titled “Return Type”](#return-type)

Returns a VARIANT containing the modified JSON object with specified keys removed.
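The delete semantics, including the rule that unknown keys are silently ignored, can be sketched in a few lines of plain Python. The helper below is a hypothetical emulation for illustration only, not the engine's code.

```python
import json

def json_object_delete(json_text, *keys):
    """Return the object without the listed keys; unknown keys are ignored."""
    obj = json.loads(json_text)
    drop = set(keys)
    return {k: v for k, v in obj.items() if k not in drop}

print(json_object_delete('{"a":1,"b":2,"c":3}', "a"))
print(json_object_delete('{"a":1,"b":2,"d":4}', "a", "c"))
print(json_object_delete('{"a":1,"b":2}', "x"))
```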
## SQL Examples

[Section titled “SQL Examples”](#sql-examples)

Delete a single key:

```sql
SELECT json_object_delete('{"a":1,"b":2,"c":3}'::VARIANT, 'a');
-- Result: {"b":2,"c":3}
```

Delete multiple keys:

```sql
SELECT json_object_delete('{"a":1,"b":2,"d":4}'::VARIANT, 'a', 'c');
-- Result: {"b":2,"d":4}
```

Delete a non-existent key (key is ignored):

```sql
SELECT json_object_delete('{"a":1,"b":2}'::VARIANT, 'x');
-- Result: {"a":1,"b":2}
```

# JSON_OBJECT_INSERT (Lakehouse v1)

> JSON_OBJECT_INSERT: inserts or updates a key-value pair in a JSON object.

Inserts or updates a key-value pair in a JSON object.

## SQL Syntax

[Section titled “SQL Syntax”](#sql-syntax)

```sql
JSON_OBJECT_INSERT(, , [, ])
```

| Parameter | Description |
| --------- | ----------- |
| ``        | The input JSON object. |
| ``        | The key to be inserted or updated. |
| ``        | The value to assign to the key. |
| ``        | A boolean flag that controls whether to replace the value if the specified key already exists in the JSON object. If `true`, the function replaces the value if the key already exists. If `false` (or omitted), an error occurs if the key exists. |

## Return Type

[Section titled “Return Type”](#return-type)

Returns the updated JSON object.
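The interaction between an existing key and the optional boolean flag is the part most likely to surprise, so here is a small plain-Python sketch of it. The function name, the `update_flag` parameter name, and the `ObjectDuplicateKey` exception class are hypothetical stand-ins. Returning the keys in sorted order is an assumption that mirrors the documented output.

```python
import json

class ObjectDuplicateKey(Exception):
    """Hypothetical stand-in for the engine's duplicate-key error."""

def json_object_insert(json_text, key, value, update_flag=False):
    obj = json.loads(json_text)
    if key in obj and not update_flag:
        # Without the flag, an existing key is an error rather than an update.
        raise ObjectDuplicateKey(key)
    obj[key] = value
    return dict(sorted(obj.items()))  # assumption: keys are shown in sorted order

source = '{"a":1,"b":2,"d":4}'
inserted = json_object_insert(source, "c", 3)
updated = json_object_insert(source, "a", 10, True)
try:
    json_object_insert(source, "a", 10)
    duplicate_rejected = False
except ObjectDuplicateKey:
    duplicate_rejected = True
print(inserted, updated, duplicate_rejected)
```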
## SQL Examples

[Section titled “SQL Examples”](#sql-examples)

This example demonstrates how to insert a new key ‘c’ with the value 3 into the existing JSON object:

```sql
SELECT JSON_OBJECT_INSERT('{"a":1,"b":2,"d":4}'::variant, 'c', 3);

┌────────────────────────────────────────────────────────────┐
│ json_object_insert('{"a":1,"b":2,"d":4}'::VARIANT, 'c', 3) │
├────────────────────────────────────────────────────────────┤
│ {"a":1,"b":2,"c":3,"d":4}                                  │
└────────────────────────────────────────────────────────────┘
```

This example shows how to update the value of an existing key ‘a’ from 1 to 10 using the update flag set to `true`, allowing the key’s value to be replaced:

```sql
SELECT JSON_OBJECT_INSERT('{"a":1,"b":2,"d":4}'::variant, 'a', 10, true);

┌───────────────────────────────────────────────────────────────────┐
│ json_object_insert('{"a":1,"b":2,"d":4}'::VARIANT, 'a', 10, TRUE) │
├───────────────────────────────────────────────────────────────────┤
│ {"a":10,"b":2,"d":4}                                              │
└───────────────────────────────────────────────────────────────────┘
```

This example demonstrates an error that occurs when trying to insert a value for an existing key ‘a’ without specifying the update flag set to `true`:

```sql
SELECT JSON_OBJECT_INSERT('{"a":1,"b":2,"d":4}'::variant, 'a', 10);

error: APIError: ResponseError with 1006: ObjectDuplicateKey while evaluating function `json_object_insert('{"a":1,"b":2,"d":4}', 'a', 10)` in expr `json_object_insert('{"a":1,"b":2,"d":4}', 'a', 10)`
```

# JSON_OBJECT_KEEP_NULL (Lakehouse v1)

> JSON_OBJECT_KEEP_NULL: creates a JSON object with keys and values.

Creates a JSON object with keys and values.

* The arguments are zero or more key-value pairs (where keys are strings, and values are of any type).
* If a key is NULL, the key-value pair is omitted from the resulting object. However, if a value is NULL, the key-value pair will be kept.
* The keys must be distinct from each other, and their order in the resulting JSON might be different from the order you specify. * `TRY_JSON_OBJECT_KEEP_NULL` returns a NULL value if an error occurs when building the object. See also: JSON\_OBJECT ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql JSON_OBJECT_KEEP_NULL(key1, value1[, key2, value2[, ...]]) TRY_JSON_OBJECT_KEEP_NULL(key1, value1[, key2, value2[, ...]]) ``` ## Return Type [Section titled “Return Type”](#return-type) JSON object. ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT JSON_OBJECT_KEEP_NULL(); ┌─────────────────────────┐ │ json_object_keep_null() │ ├─────────────────────────┤ │ {} │ └─────────────────────────┘ SELECT JSON_OBJECT_KEEP_NULL('a', 3.14, 'b', 'xx', 'c', NULL); ┌────────────────────────────────────────────────────────┐ │ json_object_keep_null('a', 3.14, 'b', 'xx', 'c', null) │ ├────────────────────────────────────────────────────────┤ │ {"a":3.14,"b":"xx","c":null} │ └────────────────────────────────────────────────────────┘ SELECT JSON_OBJECT_KEEP_NULL('fruits', ['apple', 'banana', 'orange'], 'vegetables', ['carrot', 'celery']); ┌────────────────────────────────────────────────────────────────────────────────────────────────────┐ │ json_object_keep_null('fruits', ['apple', 'banana', 'orange'], 'vegetables', ['carrot', 'celery']) │ ├────────────────────────────────────────────────────────────────────────────────────────────────────┤ │ {"fruits":["apple","banana","orange"],"vegetables":["carrot","celery"]} │ └────────────────────────────────────────────────────────────────────────────────────────────────────┘ SELECT JSON_OBJECT_KEEP_NULL('key'); | 1 | SELECT JSON_OBJECT_KEEP_NULL('key') | ^^^^^^^^^^^^^^^^^^ The number of keys and values must be equal while evaluating function `json_object_keep_null('key')` SELECT TRY_JSON_OBJECT_KEEP_NULL('key'); ┌──────────────────────────────────┐ │ try_json_object_keep_null('key') │ 
├──────────────────────────────────┤
│ NULL                             │
└──────────────────────────────────┘
```

# JSON_OBJECT_KEYS (Lakehouse v1)

> JSON_OBJECT_KEYS: returns an Array containing the list of keys in the input Variant OBJECT.

Returns an Array containing the list of keys in the input Variant OBJECT.

## Analyze Syntax

[Section titled “Analyze Syntax”](#analyze-syntax)

```python
func.json_object_keys()
```

## Analyze Examples

[Section titled “Analyze Examples”](#analyze-examples)

```python
func.json_object_keys(func.parse_json('{"a": 1, "b": [1,2,3]}')), func.json_object_keys(func.parse_json('{"b": [2,3,4]}'))

┌─────────────────────────────────────────────────────────────────┐
│       id       │ json_object_keys(var)  │ json_object_keys(var) │
├────────────────┼────────────────────────┼───────────────────────┤
│              1 │ ["a","b"]              │ ["a","b"]             │
│              2 │ ["b"]                  │ ["b"]                 │
└─────────────────────────────────────────────────────────────────┘
```

## SQL Syntax

[Section titled “SQL Syntax”](#sql-syntax)

```sql
JSON_OBJECT_KEYS()
```

## Arguments

[Section titled “Arguments”](#arguments)

| Arguments | Description                               |
| --------- | ----------------------------------------- |
| ``        | The VARIANT value that contains an OBJECT |

## Aliases

[Section titled “Aliases”](#aliases)

* [OBJECT\_KEYS](../object-keys)

## Return Type

[Section titled “Return Type”](#return-type)

Array``

## SQL Examples

[Section titled “SQL Examples”](#sql-examples)

```sql
CREATE TABLE IF NOT EXISTS objects_test1(id TINYINT, var VARIANT);

INSERT INTO objects_test1 VALUES (1, parse_json('{"a": 1, "b": [1,2,3]}'));
INSERT INTO objects_test1 VALUES (2, parse_json('{"b": [2,3,4]}'));

SELECT id, object_keys(var), json_object_keys(var) FROM objects_test1;

┌────────────────────────────────────────────────────────────┐
│       id       │ object_keys(var)  │ json_object_keys(var) │
├────────────────┼───────────────────┼───────────────────────┤
│              1 │ ["a","b"]         │ ["a","b"]             │
│              2 │ ["b"]             │ ["b"]                 │
└────────────────────────────────────────────────────────────┘
```

# JSON_OBJECT_PICK (Lakehouse v1)

> JSON_OBJECT_PICK: creates a new JSON object containing only the specified keys.

Creates a new JSON object containing only the specified keys from the input JSON object. If a specified key doesn’t exist in the input object, it is omitted from the result.

## SQL Syntax

[Section titled “SQL Syntax”](#sql-syntax)

```sql
json_object_pick(<json_object>, <key1> [, <key2>, ...])
```

## Parameters

[Section titled “Parameters”](#parameters)

| Parameter     | Description                                                                            |
| ------------- | -------------------------------------------------------------------------------------- |
| json\_object  | A JSON object (VARIANT type) from which to pick keys.                                  |
| key1, key2, … | One or more string literals representing the keys to be included in the result object. |

## Return Type

[Section titled “Return Type”](#return-type)

Returns a VARIANT containing a new JSON object with only the specified keys and their corresponding values.

## SQL Examples

[Section titled “SQL Examples”](#sql-examples)

Pick a single key:

```sql
SELECT json_object_pick('{"a":1,"b":2,"c":3}'::VARIANT, 'a');
-- Result: {"a":1}
```

Pick multiple keys:

```sql
SELECT json_object_pick('{"a":1,"b":2,"d":4}'::VARIANT, 'a', 'b');
-- Result: {"a":1,"b":2}
```

Pick with non-existent key (non-existent keys are ignored):

```sql
SELECT json_object_pick('{"a":1,"b":2,"d":4}'::VARIANT, 'a', 'c');
-- Result: {"a":1}
```

# JSON_PATH_EXISTS (Lakehouse v1)

> JSON_PATH_EXISTS: checks whether a specified path exists in JSON data.

Checks whether a specified path exists in JSON data.

## Analyze Syntax

[Section titled “Analyze Syntax”](#analyze-syntax)

```python
func.json_path_exists(<json_data>, <json_path_expression>)
```

## Analyze Examples

[Section titled “Analyze Examples”](#analyze-examples)

```python
func.json_path_exists(func.parse_json('{"a": 1, "b": 2}'), '$.a ? (@ == 1)'), func.json_path_exists(func.parse_json('{"a": 1, "b": 2}'), '$.a ? (@ > 1)')

┌─────────────────────────────┐
│     Item 1     │   Item 2   │
├────────────────┼────────────┤
│ True           │ False      │
└─────────────────────────────┘
```

## SQL Syntax

[Section titled “SQL Syntax”](#sql-syntax)

```sql
JSON_PATH_EXISTS(<json_data>, <json_path_expression>)
```

* json\_data: Specifies the JSON data you want to search within. It can be a JSON object or an array.
* json\_path\_expression: Specifies the path, starting from the root of the JSON data represented by `$`, that you want to check within the JSON data. You can also include conditions within the expression, using `@` to refer to the current node or element being evaluated, to filter the results.

## Return Type

[Section titled “Return Type”](#return-type)

The function returns:

* `true` if the specified JSON path (and conditions if any) exists within the JSON data.
* `false` if the specified JSON path (and conditions if any) does not exist within the JSON data.
* NULL if either the json\_data or json\_path\_expression is NULL or invalid.

## SQL Examples

[Section titled “SQL Examples”](#sql-examples)

```sql
SELECT JSON_PATH_EXISTS(parse_json('{"a": 1, "b": 2}'), '$.a ? (@ == 1)');

----
true

SELECT JSON_PATH_EXISTS(parse_json('{"a": 1, "b": 2}'), '$.a ? (@ > 1)');

----
false

SELECT JSON_PATH_EXISTS(NULL, '$.a');

----
NULL

SELECT JSON_PATH_EXISTS(parse_json('{"a": 1, "b": 2}'), NULL);

----
NULL
```

# JSON_PATH_MATCH (Lakehouse v1)

> JSON_PATH_MATCH: checks whether a specified JSON path expression matches certain conditions within JSON data.

Checks whether a specified JSON path expression matches certain conditions within JSON data. Note that the `@@` operator is synonymous with this function. For more information, see JSON Operators.

## Analyze Syntax

[Section titled “Analyze Syntax”](#analyze-syntax)

```python
func.json_path_match(<json_data>, <json_path_expression>)
```

## Analyze Examples

[Section titled “Analyze Examples”](#analyze-examples)

```python
func.json_path_match(func.parse_json('{"a":1,"b":[1,2,3]}'), '$.b[0] > 1')

┌────────────────────────────────────────────────────────────────────────────┐
│ func.json_path_match(func.parse_json('{"a":1,"b":[1,2,3]}'), '$.b[0] > 1') │
├────────────────────────────────────────────────────────────────────────────┤
│ false                                                                      │
└────────────────────────────────────────────────────────────────────────────┘
```

## SQL Syntax

[Section titled “SQL Syntax”](#sql-syntax)

```sql
JSON_PATH_MATCH(<json_data>, <json_path_expression>)
```

* `json_data`: Specifies the JSON data you want to examine. It can be a JSON object or an array.
* `json_path_expression`: Specifies the conditions to be checked within the JSON data. This expression describes the specific path or criteria to be matched, such as verifying whether specific field values in the JSON structure meet certain conditions. The `$` symbol represents the root of the JSON data. It is used to start the path expression and indicates the top-level object in the JSON structure. ## Return Type [Section titled “Return Type”](#return-type) The function returns: * `true` if the specified JSON path expression matches the conditions within the JSON data. * `false` if the specified JSON path expression does not match the conditions within the JSON data. * NULL if either `json_data` or `json_path_expression` is NULL or invalid. ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql -- Check if the value at JSON path $.a is equal to 1 SELECT JSON_PATH_MATCH(parse_json('{"a":1,"b":[1,2,3]}'), '$.a == 1'); ┌────────────────────────────────────────────────────────────────┐ │ json_path_match(parse_json('{"a":1,"b":[1,2,3]}'), '$.a == 1') │ ├────────────────────────────────────────────────────────────────┤ │ true │ └────────────────────────────────────────────────────────────────┘ -- Check if the first element in the array at JSON path $.b is greater than 1 SELECT JSON_PATH_MATCH(parse_json('{"a":1,"b":[1,2,3]}'), '$.b[0] > 1'); ┌──────────────────────────────────────────────────────────────────┐ │ json_path_match(parse_json('{"a":1,"b":[1,2,3]}'), '$.b[0] > 1') │ ├──────────────────────────────────────────────────────────────────┤ │ false │ └──────────────────────────────────────────────────────────────────┘ -- Check if any element in the array at JSON path $.b -- from the second one to the last are greater than or equal to 2 SELECT JSON_PATH_MATCH(parse_json('{"a":1,"b":[1,2,3]}'), '$.b[1 to last] >= 2'); ┌───────────────────────────────────────────────────────────────────────────┐ │ json_path_match(parse_json('{"a":1,"b":[1,2,3]}'), 
'$.b[1 to last] >= 2') │ ├───────────────────────────────────────────────────────────────────────────┤ │ true │ └───────────────────────────────────────────────────────────────────────────┘ -- NULL is returned if either the json_data or json_path_expression is NULL or invalid. SELECT JSON_PATH_MATCH(parse_json('{"a":1,"b":[1,2,3]}'), NULL); ┌──────────────────────────────────────────────────────────┐ │ json_path_match(parse_json('{"a":1,"b":[1,2,3]}'), null) │ ├──────────────────────────────────────────────────────────┤ │ NULL │ └──────────────────────────────────────────────────────────┘ SELECT JSON_PATH_MATCH(NULL, '$.a == 1'); ┌───────────────────────────────────┐ │ json_path_match(null, '$.a == 1') │ ├───────────────────────────────────┤ │ NULL │ └───────────────────────────────────┘ ``` # JSON_PATH_QUERY (Lakehouse v1) > JSON_PATH_QUERY — get all JSON items returned by JSON path for the specified JSON. Get all JSON items returned by JSON path for the specified JSON value. ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.json_path_query(, ) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python table.name, func.json_path_query(table.details, '$.features.*').alias('all_features') ┌────────────┬──────────────┐ │ name │ all_features │ ├────────────┼──────────────┤ │ Laptop │ "16GB" │ │ Laptop │ "512GB" │ │ Smartphone │ "4GB" │ │ Smartphone │ "128GB" │ │ Headphones │ "20h" │ │ Headphones │ "5.0" │ └────────────┴──────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql JSON_PATH_QUERY(, '') ``` ## Return Type [Section titled “Return Type”](#return-type) VARIANT ## SQL Examples [Section titled “SQL Examples”](#sql-examples) **Create a Table and Insert Sample Data** ```sql CREATE TABLE products ( name VARCHAR, details VARIANT ); INSERT INTO products (name, details) VALUES ('Laptop', '{"brand": "Dell", "colors": ["Black", "Silver"], "price": 1200, "features": {"ram": "16GB", 
"storage": "512GB"}}'), ('Smartphone', '{"brand": "Apple", "colors": ["White", "Black"], "price": 999, "features": {"ram": "4GB", "storage": "128GB"}}'), ('Headphones', '{"brand": "Sony", "colors": ["Black", "Blue", "Red"], "price": 150, "features": {"battery": "20h", "bluetooth": "5.0"}}'); ``` **Query Demo: Extracting All Features from Product Details** ```sql SELECT name, JSON_PATH_QUERY(details, '$.features.*') AS all_features FROM products; ``` **Result** ```sql ┌────────────┬──────────────┐ │ name │ all_features │ ├────────────┼──────────────┤ │ Laptop │ "16GB" │ │ Laptop │ "512GB" │ │ Smartphone │ "4GB" │ │ Smartphone │ "128GB" │ │ Headphones │ "20h" │ │ Headphones │ "5.0" │ └────────────┴──────────────┘ ``` # JSON_PATH_QUERY_ARRAY (Lakehouse v1) > JSON_PATH_QUERY_ARRAY — get all JSON items returned by JSON path for the specified. Get all JSON items returned by JSON path for the specified JSON value and wrap a result into an array. ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.json_path_query_array(, ) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python table.name, func.json_path_query_array(table.details, '$.features.*').alias('all_features') name | all_features ------------+----------------------- Laptop | ["16GB", "512GB"] Smartphone | ["4GB", "128GB"] Headphones | ["20h", "5.0"] ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql JSON_PATH_QUERY_ARRAY(, '') ``` ## Return Type [Section titled “Return Type”](#return-type) VARIANT ## SQL Examples [Section titled “SQL Examples”](#sql-examples) **Create a Table and Insert Sample Data** ```sql CREATE TABLE products ( name VARCHAR, details VARIANT ); INSERT INTO products (name, details) VALUES ('Laptop', '{"brand": "Dell", "colors": ["Black", "Silver"], "price": 1200, "features": {"ram": "16GB", "storage": "512GB"}}'), ('Smartphone', '{"brand": "Apple", "colors": ["White", "Black"], "price": 999, "features": {"ram": 
"4GB", "storage": "128GB"}}'), ('Headphones', '{"brand": "Sony", "colors": ["Black", "Blue", "Red"], "price": 150, "features": {"battery": "20h", "bluetooth": "5.0"}}'); ``` **Query Demo: Extracting All Features from Product Details as an Array** ```sql SELECT name, JSON_PATH_QUERY_ARRAY(details, '$.features.*') AS all_features FROM products; ``` **Result** ```text name | all_features -----------+----------------------- Laptop | ["16GB", "512GB"] Smartphone | ["4GB", "128GB"] Headphones | ["20h", "5.0"] ``` # JSON_PATH_QUERY_FIRST (Lakehouse v1) > JSON_PATH_QUERY_FIRST — get the first JSON item returned by JSON path for the specified JSON value. Get the first JSON item returned by JSON path for the specified JSON value. ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.json_path_query_first(, ) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python table.name, func.json_path_query_first(table.details, '$.features.*').alias('first_feature') ┌────────────┬───────────────┐ │ name │ first_feature │ ├────────────┼───────────────┤ │ Laptop │ "16GB" │ │ Laptop │ "16GB" │ │ Smartphone │ "4GB" │ │ Smartphone │ "4GB" │ │ Headphones │ "20h" │ │ Headphones │ "20h" │ └────────────┴───────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql JSON_PATH_QUERY_FIRST(, '') ``` ## Return Type [Section titled “Return Type”](#return-type) VARIANT ## SQL Examples [Section titled “SQL Examples”](#sql-examples) **Create a Table and Insert Sample Data** ```sql CREATE TABLE products ( name VARCHAR, details VARIANT ); INSERT INTO products (name, details) VALUES ('Laptop', '{"brand": "Dell", "colors": ["Black", "Silver"], "price": 1200, "features": {"ram": "16GB", "storage": "512GB"}}'), ('Smartphone', '{"brand": "Apple", "colors": ["White", "Black"], "price": 999, "features": {"ram": "4GB", "storage": "128GB"}}'), ('Headphones', '{"brand": "Sony", "colors": ["Black", "Blue", "Red"], "price": 150, 
"features": {"battery": "20h", "bluetooth": "5.0"}}'); ``` **Query Demo: Extracting the First Feature from Product Details** ```sql SELECT name, JSON_PATH_QUERY(details, '$.features.*') AS all_features, JSON_PATH_QUERY_FIRST(details, '$.features.*') AS first_feature FROM products; ``` **Result** ```sql ┌────────────┬──────────────┬───────────────┐ │ name │ all_features │ first_feature │ ├────────────┼──────────────┼───────────────┤ │ Laptop │ "16GB" │ "16GB" │ │ Laptop │ "512GB" │ "16GB" │ │ Smartphone │ "4GB" │ "4GB" │ │ Smartphone │ "128GB" │ "4GB" │ │ Headphones │ "20h" │ "20h" │ │ Headphones │ "5.0" │ "20h" │ └────────────┴──────────────┴───────────────┘ ``` # JSON_PRETTY (Lakehouse v1) > JSON_PRETTY — formats JSON data, making it more readable and presentable. Formats JSON data, making it more readable and presentable. It automatically adds indentation, line breaks, and other formatting to the JSON data for better visual representation. ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.json_pretty() ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.json_pretty(func.parse_json('{"person": {"name": "bob", "age": 25}, "location": "city"}')) ┌─────────────────────────────────────────────────────────────────────────────────────────────────┐ │ func.json_pretty(func.parse_json('{"person": {"name": "bob", "age": 25}, "location": "city"}')) │ │ String │ ├─────────────────────────────────────────────────────────────────────────────────────────────────┤ │ { │ │ "location": "City", │ │ "person": { │ │ "age": 25, │ │ "name": "Bob" │ │ } │ │ } │ └─────────────────────────────────────────────────────────────────────────────────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql JSON_PRETTY() ``` ## Return Type [Section titled “Return Type”](#return-type) String. 
## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT JSON_PRETTY(PARSE_JSON('{"name":"Alice","age":30}')); --- ┌──────────────────────────────────────────────────────┐ │ json_pretty(parse_json('{"name":"alice","age":30}')) │ │ String │ ├──────────────────────────────────────────────────────┤ │ { │ │ "age": 30, │ │ "name": "Alice" │ │ } │ └──────────────────────────────────────────────────────┘ SELECT JSON_PRETTY(PARSE_JSON('{"person": {"name": "Bob", "age": 25}, "location": "City"}')); --- ┌───────────────────────────────────────────────────────────────────────────────────────┐ │ json_pretty(parse_json('{"person": {"name": "bob", "age": 25}, "location": "city"}')) │ │ String │ ├───────────────────────────────────────────────────────────────────────────────────────┤ │ { │ │ "location": "City", │ │ "person": { │ │ "age": 25, │ │ "name": "Bob" │ │ } │ │ } │ └───────────────────────────────────────────────────────────────────────────────────────┘ ``` # JSON_STRIP_NULLS (Lakehouse v1) > JSON_STRIP_NULLS — removes all properties with null values from a JSON object. Removes all properties with null values from a JSON object. 
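As a minimal sketch of the behavior just described, the plain-Python helper below drops top-level properties whose value is null. The helper name is a hypothetical emulation. This page does not say whether nulls inside nested objects or arrays are also removed, so the sketch deliberately handles only the top level.

```python
import json

def json_strip_nulls(json_text):
    """Hypothetical emulation: drop top-level properties whose value is null."""
    obj = json.loads(json_text)
    if not isinstance(obj, dict):
        return obj  # assumption: non-object values are returned unchanged
    return {k: v for k, v in obj.items() if v is not None}

stripped = json_strip_nulls('{"name": "Alice", "age": 30, "city": null}')
print(json.dumps(stripped, sort_keys=True, separators=(",", ":")))
# prints {"age":30,"name":"Alice"}
```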
## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.json_strip_nulls() ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.json_strip_nulls(func.parse_json('{"name": "alice", "age": 30, "city": null}')) ┌─────────────────────────────────────────────────────────────────────────────────────────────────┐ │ func.json_strip_nulls(func.parse_json('{"name": "alice", "age": 30, "city": null}')) │ │ String │ ├─────────────────────────────────────────────────────────────────────────────────────────────────┤ │ {"age":30,"name":"Alice"} │ └─────────────────────────────────────────────────────────────────────────────────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql JSON_STRIP_NULLS() ``` ## Return Type [Section titled “Return Type”](#return-type) Returns a value of the same type as the input JSON value. ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT JSON_STRIP_NULLS(PARSE_JSON('{"name": "Alice", "age": 30, "city": null}')); json_strip_nulls(parse_json('{"name": "alice", "age": 30, "city": null}'))| --------------------------------------------------------------------------+ {"age":30,"name":"Alice"} | ``` # JSON_TO_STRING (Lakehouse v1) > JSON_TO_STRING — alias for the TO_STRING semi-structured data function. Alias for [TO\_STRING](../../02-conversion-functions/to-string). # JSON_TYPEOF (Lakehouse v1) > JSON_TYPEOF — returns the type of the main-level of a JSON structure. Returns the type of the main-level of a JSON structure. 
## Analyze Syntax

[Section titled “Analyze Syntax”](#analyze-syntax)

```python
func.json_typeof()
```

## Analyze Examples

[Section titled “Analyze Examples”](#analyze-examples)

```python
func.json_typeof(func.parse_json('null'))|
-----------------------------------------+
null                                     |
--
func.json_typeof(func.parse_json('true'))|
-----------------------------------------+
boolean                                  |
--
func.json_typeof(func.parse_json('"plaidcloud"'))|
-------------------------------------------------+
string                                           |
--
func.json_typeof(func.parse_json('-1.23'))|
------------------------------------------+
number                                    |
--
func.json_typeof(func.parse_json('[1,2,3]'))|
--------------------------------------------+
array                                       |
--
func.json_typeof(func.parse_json('{"name": "alice", "age": 30}'))|
-----------------------------------------------------------------+
object                                                           |
```

## SQL Syntax

[Section titled “SQL Syntax”](#sql-syntax)

```sql
JSON_TYPEOF()
```

## Return Type

[Section titled “Return Type”](#return-type)

JSON\_TYPEOF returns a string that indicates the data type of the parsed JSON value. The possible return values are: ‘null’, ‘boolean’, ‘string’, ‘number’, ‘array’, and ‘object’.
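The mapping from a parsed value to one of the six type names can be sketched in plain Python as follows. The helper is a hypothetical emulation built on the standard `json` module. Returning `None` for a SQL NULL input is an assumption based on the empty result shown in the examples on this page.

```python
import json

def json_typeof(json_text):
    """Hypothetical emulation: name the top-level type of a JSON document."""
    if json_text is None:
        return None  # assumption: SQL NULL in, NULL out
    value = json.loads(json_text)
    if value is None:
        return "null"
    if isinstance(value, bool):  # must be checked before numbers: bool is an int in Python
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array"
    return "object"

for text in ("null", "true", '"plaidcloud"', "-1.23", "[1,2,3]", '{"name": "Alice", "age": 30}'):
    print(text, "->", json_typeof(text))
```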
## SQL Examples

[Section titled “SQL Examples”](#sql-examples)

```sql
-- Parsing a JSON value that is NULL
SELECT JSON_TYPEOF(PARSE_JSON(NULL));

--
json_typeof(parse_json(null))|
-----------------------------+
                             |

-- Parsing a JSON value that is the string 'null'
SELECT JSON_TYPEOF(PARSE_JSON('null'));

--
json_typeof(parse_json('null'))|
-------------------------------+
null                           |

SELECT JSON_TYPEOF(PARSE_JSON('true'));

--
json_typeof(parse_json('true'))|
-------------------------------+
boolean                        |

SELECT JSON_TYPEOF(PARSE_JSON('"PlaidCloud Lakehouse"'));

--
json_typeof(parse_json('"plaidcloud lakehouse"'))|
-------------------------------------------------+
string                                           |

SELECT JSON_TYPEOF(PARSE_JSON('-1.23'));

--
json_typeof(parse_json('-1.23'))|
--------------------------------+
number                          |

SELECT JSON_TYPEOF(PARSE_JSON('[1,2,3]'));

--
json_typeof(parse_json('[1,2,3]'))|
----------------------------------+
array                             |

SELECT JSON_TYPEOF(PARSE_JSON('{"name": "Alice", "age": 30}'));

--
json_typeof(parse_json('{"name": "alice", "age": 30}'))|
-------------------------------------------------------+
object                                                 |
```

# OBJECT_KEYS (Lakehouse v1)

> OBJECT_KEYS: alias for the JSON_OBJECT_KEYS semi-structured data function.

Alias for [JSON\_OBJECT\_KEYS](../json-object-keys).

# PARSE_JSON (Lakehouse v1)

> Interprets input JSON string, producing a VARIANT value.

`parse_json` and `try_parse_json` interpret an input string as a JSON document, producing a VARIANT value.

`try_parse_json` returns a NULL value if an error occurs during parsing.
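The difference between the strict and the `try_` variant comes down to error handling, which the following plain-Python sketch illustrates with the standard `json` module. Both helper names are hypothetical emulations, and Python's `None` stands in for SQL NULL.

```python
import json

def parse_json(text):
    """Strict variant: invalid input raises json.JSONDecodeError."""
    return json.loads(text)

def try_parse_json(text):
    """Lenient variant: invalid input yields None instead of an error."""
    try:
        return json.loads(text)
    except (ValueError, TypeError):  # JSONDecodeError is a subclass of ValueError
        return None

print(parse_json('[-1, 12, 289, 2188, false]'))
print(try_parse_json('{ "x" : "abc", "y" : false, "z": 10} '))
print(try_parse_json('{not valid json'))
```

Use the strict form when malformed input should stop the query, and the lenient form when bad rows should simply produce NULL.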
## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.parse_json() or func.try_parse_json() ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.parse_json('[-1, 12, 289, 2188, false]') ┌───────────────────────────────────────────────┐ │ func.parse_json('[-1, 12, 289, 2188, false]') │ ├───────────────────────────────────────────────┤ │ [-1,12,289,2188,false] │ └───────────────────────────────────────────────┘ func.try_parse_json('{ "x" : "abc", "y" : false, "z": 10} ') ┌──────────────────────────────────────────────────────────────┐ │ func.try_parse_json('{ "x" : "abc", "y" : false, "z": 10} ') │ ├──────────────────────────────────────────────────────────────┤ │ {"x":"abc","y":false,"z":10} │ └──────────────────────────────────────────────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql PARSE_JSON() TRY_PARSE_JSON() ``` ## Arguments [Section titled “Arguments”](#arguments) | Arguments | Description | | --------- | ------------------------------------------------------------------------------ | | `` | An expression of string type (e.g. VARCHAR) that holds valid JSON information. 
| ## Return Type [Section titled “Return Type”](#return-type) VARIANT ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT parse_json('[-1, 12, 289, 2188, false]'); ┌──────────────────────────────────────────┐ │ parse_json('[-1, 12, 289, 2188, false]') │ ├──────────────────────────────────────────┤ │ [-1,12,289,2188,false] │ └──────────────────────────────────────────┘ SELECT try_parse_json('{ "x" : "abc", "y" : false, "z": 10} '); ┌─────────────────────────────────────────────────────────┐ │ try_parse_json('{ "x" : "abc", "y" : false, "z": 10} ') │ ├─────────────────────────────────────────────────────────┤ │ {"x":"abc","y":false,"z":10} │ └─────────────────────────────────────────────────────────┘ ``` # AI Functions (Lakehouse v1) > Using SQL-based AI Functions for Knowledge Base Search and Text Completion. This document demonstrates how to leverage PlaidCloud Lakehouse’s built-in AI functions for creating document embeddings, searching for similar documents, and generating text completions based on context. # AI_TO_SQL (Lakehouse v1) > AI_TO_SQL — converts natural language instructions into SQL queries with the latest model. Converts natural language instructions into SQL queries with the latest model `text-davinci-003`. PlaidCloud Lakehouse offers an efficient solution for constructing SQL queries by incorporating OLAP and AI. Through this function, instructions written in a natural language can be converted into SQL query statements that align with the table schema. For example, the function can be provided with a sentence like “Get all items that cost 10 dollars or less” as an input and generate the corresponding SQL query “SELECT \* FROM items WHERE price <= 10” as output. See the [upstream implementation](https://github.com/datafuselabs/databend/blob/1e93c5b562bd159ecb0f336bb88fd1b7f9dc4a62/src/query/service/src/table_functions/openai/ai_to_sql.rs). 
Note The SQL query statements generated adhere to the PostgreSQL standards, so they might require manual revisions to align with the syntax of PlaidCloud Lakehouse. Note Starting from PlaidCloud Lakehouse v1.1.47, PlaidCloud Lakehouse supports the [Azure OpenAI service](https://azure.microsoft.com/en-au/products/cognitive-services/openai-service). This integration offers improved data privacy. To use Azure OpenAI, add the following configurations to the `[query]` section: `toml # Azure OpenAI openai_api_chat_base_url = "https://.openai.azure.com/openai/deployments//" openai_api_embedding_base_url = "https://.openai.azure.com/openai/deployments//" openai_api_version = "2023-03-15-preview"` Caution PlaidCloud Lakehouse relies on (Azure) OpenAI for `AI_TO_SQL` but only sends the table schema to (Azure) OpenAI, not the data. The function works only when the PlaidCloud Lakehouse configuration includes the `openai_api_key`; otherwise it is inactive. This function is available by default on PlaidCloud Lakehouse using an Azure OpenAI key. If you use it, you acknowledge that your table schema will be sent to Azure OpenAI by us. ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.ai_to_sql('') ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) In this example, an SQL query statement is generated from an instruction with the AI\_TO\_SQL function, and the resulting statement is executed to obtain the query results. ```python func.ai_to_sql('List the total amount spent by users from the USA who are older than 30 years, grouped by their names, along with the number of orders they made in 2022') ``` A SQL statement is generated by the function as the output: ```sql *************************** 1. 
row *************************** database: openai generated_sql: SELECT name, SUM(price) AS total_spent, COUNT(order_id) AS total_orders FROM users JOIN orders ON users.id = orders.user_id WHERE country = 'USA' AND age > 30 AND order_date BETWEEN '2022-01-01' AND '2022-12-31' GROUP BY name; ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql USE ; SELECT * FROM ai_to_sql(''); ``` Note Obtain and Config OpenAI API Key - To obtain your openAI API key, please visit and generate a new key. - Configure the **databend-query.toml** file with the openai\_api\_key setting. `toml [query] ... ... openai_api_key = ""` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) In this example, an SQL query statement is generated from an instruction with the AI\_TO\_SQL function, and the resulting statement is executed to obtain the query results. 1. Prepare data. ```sql CREATE DATABASE IF NOT EXISTS openai; USE openai; CREATE TABLE users( id INT, name VARCHAR, age INT, country VARCHAR ); CREATE TABLE orders( order_id INT, user_id INT, product_name VARCHAR, price DECIMAL(10,2), order_date DATE ); -- Insert sample data into the users table INSERT INTO users VALUES (1, 'Alice', 31, 'USA'), (2, 'Bob', 32, 'USA'), (3, 'Charlie', 45, 'USA'), (4, 'Diana', 29, 'USA'), (5, 'Eva', 35, 'Canada'); -- Insert sample data into the orders table INSERT INTO orders VALUES (1, 1, 'iPhone', 1000.00, '2022-03-05'), (2, 1, 'OpenAI Plus', 20.00, '2022-03-06'), (3, 2, 'OpenAI Plus', 20.00, '2022-03-07'), (4, 2, 'MacBook Pro', 2000.00, '2022-03-10'), (5, 3, 'iPad', 500.00, '2022-03-12'), (6, 3, 'AirPods', 200.00, '2022-03-14'); ``` 2. Run the AI\_TO\_SQL function with an instruction written in English as the input. 
```sql SELECT * FROM ai_to_sql( 'List the total amount spent by users from the USA who are older than 30 years, grouped by their names, along with the number of orders they made in 2022'); ``` A SQL statement is generated by the function as the output: ```sql *************************** 1. row *************************** database: openai generated_sql: SELECT name, SUM(price) AS total_spent, COUNT(order_id) AS total_orders FROM users JOIN orders ON users.id = orders.user_id WHERE country = 'USA' AND age > 30 AND order_date BETWEEN '2022-01-01' AND '2022-12-31' GROUP BY name; ``` 3. Run the generated SQL statement to get the query results. ```sql ┌─────────┬─────────────┬─────────────┐ │ name │ order_count │ total_spent │ ├─────────┼─────────────┼─────────────┤ │ Bob │ 2 │ 2020.00 │ │ Alice │ 2 │ 1020.00 │ │ Charlie │ 2 │ 700.00 │ └─────────┴─────────────┴─────────────┘ ``` # AI_EMBEDDING_VECTOR (Lakehouse v1) > Creating embeddings using the ai_embedding_vector function in PlaidCloud Lakehouse. This document provides an overview of the ai\_embedding\_vector function in PlaidCloud Lakehouse and demonstrates how to create document embeddings using this function. See the [upstream implementation](https://github.com/datafuselabs/databend/blob/1e93c5b562bd159ecb0f336bb88fd1b7f9dc4a62/src/common/openai/src/embedding.rs). By default, PlaidCloud Lakehouse leverages the [text-embedding-ada](https://platform.openai.com/docs/models/embeddings) model for generating embeddings. Note Starting from PlaidCloud Lakehouse v1.1.47, PlaidCloud Lakehouse supports the [Azure OpenAI service](https://azure.microsoft.com/en-au/products/cognitive-services/openai-service). This integration offers improved data privacy. 
To use Azure OpenAI, add the following configurations to the `[query]` section: `toml # Azure OpenAI openai_api_chat_base_url = "https://.openai.azure.com/openai/deployments//" openai_api_embedding_base_url = "https://.openai.azure.com/openai/deployments//" openai_api_version = "2023-03-15-preview"` Caution PlaidCloud Lakehouse relies on (Azure) OpenAI for `AI_EMBEDDING_VECTOR` and sends the embedding column data to (Azure) OpenAI. The function works only when the PlaidCloud Lakehouse configuration includes the `openai_api_key`; otherwise it is inactive. This function is available by default on PlaidCloud Lakehouse using an Azure OpenAI key. If you use it, you acknowledge that your data will be sent to Azure OpenAI by us. ## Overview of Ai\_embedding\_vector [Section titled “Overview of Ai\_embedding\_vector”](#overview-of-ai_embedding_vector) The `ai_embedding_vector` function in PlaidCloud Lakehouse is a built-in function that generates vector embeddings for text data. It is useful for natural language processing tasks, such as document similarity, clustering, and recommendation systems. The function takes a text input and returns a high-dimensional vector that represents the input text’s semantic meaning and context. The embeddings are created using pre-trained models on large text corpora, capturing the relationships between words and phrases in a continuous space. ## Creating Embeddings Using Ai\_embedding\_vector [Section titled “Creating Embeddings Using Ai\_embedding\_vector”](#creating-embeddings-using-ai_embedding_vector) To create embeddings for a text document using the `ai_embedding_vector` function, follow the example below. 1. Create a table to store the documents: ```sql CREATE TABLE documents ( id INT, title VARCHAR, content VARCHAR, embedding ARRAY(FLOAT32) ); ``` 2. 
Insert example documents into the table: ```sql INSERT INTO documents(id, title, content) VALUES (1, 'A Brief History of AI', 'Artificial intelligence (AI) has been a fascinating concept of science fiction for decades...'), (2, 'Machine Learning vs. Deep Learning', 'Machine learning and deep learning are two subsets of artificial intelligence...'), (3, 'Neural Networks Explained', 'A neural network is a series of algorithms that endeavors to recognize underlying relationships...'); ``` 3. Generate the embeddings: ```sql UPDATE documents SET embedding = ai_embedding_vector(content) WHERE length(embedding) = 0; ``` After running the query, the embedding column in the table will contain the generated embeddings. The embeddings are stored as an array of `FLOAT32` values in the embedding column, which has the `ARRAY(FLOAT32)` column type. You can now use these embeddings for various natural language processing tasks, such as finding similar documents or clustering documents based on their content. 4. Inspect the embeddings: ```sql SELECT length(embedding) FROM documents; ┌───────────────────┐ │ length(embedding) │ ├───────────────────┤ │ 1536 │ │ 1536 │ │ 1536 │ └───────────────────┘ ``` The query above shows that the embedding generated for each document has 1536 dimensions. # AI_TEXT_COMPLETION (Lakehouse v1) > Generating text completions using the ai_text_completion function in PlaidCloud Lakehouse. This document provides an overview of the `ai_text_completion` function in PlaidCloud Lakehouse and demonstrates how to generate text completions using this function. See the [upstream implementation](https://github.com/datafuselabs/databend/blob/1e93c5b562bd159ecb0f336bb88fd1b7f9dc4a62/src/common/openai/src/completion.rs). Note Starting from PlaidCloud Lakehouse v1.1.47, PlaidCloud Lakehouse supports the [Azure OpenAI service](https://azure.microsoft.com/en-au/products/cognitive-services/openai-service). This integration offers improved data privacy. 
To use Azure OpenAI, add the following configurations to the `[query]` section: `toml # Azure OpenAI openai_api_chat_base_url = "https://.openai.azure.com/openai/deployments//" openai_api_embedding_base_url = "https://.openai.azure.com/openai/deployments//" openai_api_version = "2023-03-15-preview"` Caution PlaidCloud Lakehouse relies on (Azure) OpenAI for `AI_TEXT_COMPLETION` and sends the completion prompt data to (Azure) OpenAI. The function works only when the PlaidCloud Lakehouse configuration includes the `openai_api_key`; otherwise it is inactive. This function is available by default on PlaidCloud Lakehouse using an Azure OpenAI key. If you use it, you acknowledge that your data will be sent to Azure OpenAI by us. ## Overview of Ai\_text\_completion [Section titled “Overview of Ai\_text\_completion”](#overview-of-ai_text_completion) The `ai_text_completion` function in PlaidCloud Lakehouse is a built-in function that generates text completions based on a given prompt. It is useful for natural language processing tasks, such as question answering, text generation, and autocompletion systems. The function takes a text prompt as input and returns a generated completion for the prompt. The completions are created using pre-trained models on large text corpora, capturing the relationships between words and phrases in a continuous space. 
## Generating Text Completions Using Ai\_text\_completion [Section titled “Generating Text Completions Using Ai\_text\_completion”](#generating-text-completions-using-ai_text_completion) Here is a simple example using the `ai_text_completion` function in PlaidCloud Lakehouse to generate a text completion: ```sql SELECT ai_text_completion('What is artificial intelligence?') AS completion; ``` Result: ```sql ┌────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐ │ completion │ ├────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤ │ Artificial intelligence (AI) is the field of study focused on creating machines and software capable of thinking, learning, and solving problems in a way that mimics human intelligence. This includes areas such as machine learning, natural language processing, computer vision, and robotics. │ └────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘ ``` In this example, we provide the prompt “What is artificial intelligence?” to the `ai_text_completion` function, and it returns a generated completion that briefly describes artificial intelligence. # COSINE_DISTANCE (Lakehouse v1) > Measuring similarity using the cosine_distance function in PlaidCloud Lakehouse. This document provides an overview of the cosine\_distance function in PlaidCloud Lakehouse and demonstrates how to measure document similarity using this function. Note The cosine\_distance function performs vector computations within PlaidCloud Lakehouse and does not rely on the (Azure) OpenAI API. The cosine\_distance function in PlaidCloud Lakehouse is a built-in function that calculates the cosine distance between two vectors. It is commonly used in natural language processing tasks, such as document similarity and recommendation systems. 
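For intuition, cosine distance can be worked out by hand. The sketch below assumes the common definition, one minus the cosine similarity of the two vectors, and uses plain Python lists. `cosine_distance_sketch` is a hypothetical helper for illustration, not the Lakehouse implementation.

```python
import math


def cosine_distance_sketch(a, b):
    """Return 1 - cos(angle between a and b) for two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_a * norm_b)


# Same direction, different length: distance 0.
print(cosine_distance_sketch([1.0, 0.0], [2.0, 0.0]))
# Orthogonal vectors: distance 1.
print(cosine_distance_sketch([1.0, 0.0], [0.0, 3.0]))
```

Because only the angle matters, scaling either vector leaves the distance unchanged, which is why the measure suits embeddings of texts with different lengths.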
Cosine distance measures how dissimilar two vectors are, based on the cosine of the angle between them. The function takes two input vectors and returns a distance in which 0 indicates vectors pointing in the same direction and 1 indicates orthogonal (completely dissimilar) vectors; the smaller the value, the more similar the inputs. ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.cosine_distance(, ) ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) **Creating a Table and Inserting Sample Data** Let’s create a table to store some sample text documents and their corresponding embeddings: ```sql CREATE TABLE articles ( id INT, title VARCHAR, content VARCHAR, embedding ARRAY(FLOAT32) ); ``` Now, let’s insert some sample documents into the table: ```sql INSERT INTO articles (id, title, content, embedding) VALUES (1, 'Python for Data Science', 'Python is a versatile programming language widely used in data science...', ai_embedding_vector('Python is a versatile programming language widely used in data science...')), (2, 'Introduction to R', 'R is a popular programming language for statistical computing and graphics...', ai_embedding_vector('R is a popular programming language for statistical computing and graphics...')), (3, 'Getting Started with SQL', 'Structured Query Language (SQL) is a domain-specific language used for managing relational databases...', ai_embedding_vector('Structured Query Language (SQL) is a domain-specific language used for managing relational databases...')); ``` **Querying for Similar Documents** Now, let’s find the documents that are most similar to a given query using the cosine\_distance function: ```sql SELECT id, title, content, cosine_distance(embedding, ai_embedding_vector('How to use Python in data analysis?')) AS similarity FROM articles ORDER BY similarity ASC LIMIT 3; ``` Result: ```sql 
┌──────┬──────────────────────────┬─────────────────────────────────────────────────────────────────────────────────────────────────────────┬────────────┐ │ id │ title │ content │ similarity │ ├──────┼──────────────────────────┼─────────────────────────────────────────────────────────────────────────────────────────────────────────┼────────────┤ │ 1 │ Python for Data Science │ Python is a versatile programming language widely used in data science... │ 0.1142081 │ │ 2 │ Introduction to R │ R is a popular programming language for statistical computing and graphics... │ 0.18741018 │ │ 3 │ Getting Started with SQL │ Structured Query Language (SQL) is a domain-specific language used for managing relational databases... │ 0.25137568 │ └──────┴──────────────────────────┴─────────────────────────────────────────────────────────────────────────────────────────────────────────┴────────────┘ ``` # Hash Functions (Lakehouse v1) > Lakehouse v1 SQL hash functions: compute deterministic hashes (MD5, SHA, xxHash, CityHash) for dedup, sampling, and integrity checks. This section provides reference information for the Hash functions in PlaidCloud Lakehouse. # BLAKE3 (Lakehouse v1) > BLAKE3 — Calculates a BLAKE3 256-bit checksum for a string. Calculates a BLAKE3 256-bit checksum for a string. The value is returned as a string of 64 hexadecimal digits or NULL if the argument was NULL. 
## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.blake3() ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.blake3('1234567890') ┌──────────────────────────────────────────────────────────────────┐ │ func.blake3('1234567890') │ ├──────────────────────────────────────────────────────────────────┤ │ d12e417e04494572b561ba2c12c3d7f9e5107c4747e27b9a8a54f8480c63e841 │ └──────────────────────────────────────────────────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql BLAKE3() ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT BLAKE3('1234567890'); ┌──────────────────────────────────────────────────────────────────┐ │ blake3('1234567890') │ ├──────────────────────────────────────────────────────────────────┤ │ d12e417e04494572b561ba2c12c3d7f9e5107c4747e27b9a8a54f8480c63e841 │ └──────────────────────────────────────────────────────────────────┘ ``` # CITY64WITHSEED (Lakehouse v1) > CITY64WITHSEED — Calculates a City64WithSeed 64-bit hash for a string. Calculates a City64WithSeed 64-bit hash for a string. 
## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.city64withseed(, ) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.city64withseed('1234567890', 12) ┌───────────────────────────────────────┐ │ func.city64withseed('1234567890', 12) │ ├───────────────────────────────────────┤ │ 10660895976650300430 │ └───────────────────────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql CITY64WITHSEED(, ) ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT CITY64WITHSEED('1234567890', 12); ┌──────────────────────────────────┐ │ city64withseed('1234567890', 12) │ ├──────────────────────────────────┤ │ 10660895976650300430 │ └──────────────────────────────────┘ ``` # MD5 (Lakehouse v1) > MD5 — Calculates an MD5 128-bit checksum for a string. Calculates an MD5 128-bit checksum for a string. The value is returned as a string of 32 hexadecimal digits or NULL if the argument was NULL. ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.md5() ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.md5('1234567890') ┌──────────────────────────────────────────┐ │ func.md5('1234567890') │ ├──────────────────────────────────────────┤ │ e807f1fcf82d132f9bb018ca6738a19f │ └──────────────────────────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql MD5() ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT MD5('1234567890'); ┌──────────────────────────────────┐ │ md5('1234567890') │ ├──────────────────────────────────┤ │ e807f1fcf82d132f9bb018ca6738a19f │ └──────────────────────────────────┘ ``` # SHA (Lakehouse v1) > SHA — calculates an SHA-1 160-bit checksum for the string, as described in RFC 3174 (Secure Hash. Calculates an SHA-1 160-bit checksum for the string, as described in RFC 3174 (Secure Hash Algorithm). 
The value is returned as a string of 40 hexadecimal digits or NULL if the argument was NULL. ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.sha() ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.sha('1234567890') ┌──────────────────────────────────────────┐ │ func.sha('1234567890') │ ├──────────────────────────────────────────┤ │ 01b307acba4f54f55aafc33bb06bbbf6ca803e9a │ └──────────────────────────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql SHA() ``` ## Aliases [Section titled “Aliases”](#aliases) * [SHA1](../sha1) ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT SHA('1234567890'), SHA1('1234567890'); ┌─────────────────────────────────────────────────────────────────────────────────────┐ │ sha('1234567890') │ sha1('1234567890') │ ├──────────────────────────────────────────┼──────────────────────────────────────────┤ │ 01b307acba4f54f55aafc33bb06bbbf6ca803e9a │ 01b307acba4f54f55aafc33bb06bbbf6ca803e9a │ └─────────────────────────────────────────────────────────────────────────────────────┘ ``` # SHA1 (Lakehouse v1) > SHA1 — Alias for SHA. Computes a cryptographic hash of the input. Alias for [SHA](../sha). # SHA2 (Lakehouse v1) > SHA2 — calculates the SHA-2 family of hash functions (SHA-224, SHA-256, SHA-384, and SHA-512). Calculates the SHA-2 family of hash functions (SHA-224, SHA-256, SHA-384, and SHA-512). If the hash length is not one of the permitted values, the return value is NULL. Otherwise, the function result is a hash value containing the desired number of bits as a string of hexadecimal digits. 
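The bit-length argument can be pictured as a selector over the SHA-2 variants. The following sketch uses Python's standard `hashlib`. It assumes that a length of 0 is treated like 256, which is inferred from the example in this section where a length of 0 yields a 64-digit digest. The helper name `sha2_sketch` is hypothetical and the code is an illustration, not the Lakehouse implementation.

```python
import hashlib

# Permitted hash lengths mapped to hashlib constructors.
_SHA2_BY_LENGTH = {
    0: hashlib.sha256,    # assumption: 0 behaves like 256
    224: hashlib.sha224,
    256: hashlib.sha256,
    384: hashlib.sha384,
    512: hashlib.sha512,
}


def sha2_sketch(text, hash_length):
    """Return the hex digest for the chosen SHA-2 variant, or None.

    None stands in for SQL NULL when hash_length is not a permitted value.
    """
    hasher = _SHA2_BY_LENGTH.get(hash_length)
    if hasher is None:
        return None
    return hasher(text.encode("utf-8")).hexdigest()


for bits in (0, 224, 256, 384, 512, 100):
    digest = sha2_sketch("1234567890", bits)
    print(bits, None if digest is None else len(digest))
```

Each hexadecimal digit encodes 4 bits, so a 256-bit hash prints as 64 digits and a 512-bit hash as 128.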
## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.sha2(, ) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.sha2('1234567890', 0) ┌──────────────────────────────────────────────────────────────────┐ │ func.sha2('1234567890', 0) │ ├──────────────────────────────────────────────────────────────────┤ │ c775e7b757ede630cd0aa1113bd102661ab38829ca52a6422ab782862f268646 │ └──────────────────────────────────────────────────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql SHA2(, ) ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT SHA2('1234567890', 0); ┌──────────────────────────────────────────────────────────────────┐ │ sha2('1234567890', 0) │ ├──────────────────────────────────────────────────────────────────┤ │ c775e7b757ede630cd0aa1113bd102661ab38829ca52a6422ab782862f268646 │ └──────────────────────────────────────────────────────────────────┘ ``` # SIPHASH (Lakehouse v1) > SIPHASH — alias for the SIPHASH64 hash function. Alias for [SIPHASH64](../siphash64). # SIPHASH64 (Lakehouse v1) > SIPHASH64 — produces a 64-bit SipHash hash value. Produces a 64-bit [SipHash](https://en.wikipedia.org/wiki/SipHash) hash value. 
## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.siphash64() ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.siphash64('1234567890') ┌───────────────────────────────┐ │ func.siphash64('1234567890') │ ├───────────────────────────────┤ │ 18110648197875983073 │ └───────────────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql SIPHASH64() ``` ## Aliases [Section titled “Aliases”](#aliases) * [SIPHASH](../siphash) ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT SIPHASH('1234567890'), SIPHASH64('1234567890'); ┌─────────────────────────────────────────────────┐ │ siphash('1234567890') │ siphash64('1234567890') │ ├───────────────────────┼─────────────────────────┤ │ 18110648197875983073 │ 18110648197875983073 │ └─────────────────────────────────────────────────┘ ``` # XXHASH32 (Lakehouse v1) > XXHASH32 — Calculates an xxHash32 32-bit hash value for a string. Calculates an xxHash32 32-bit hash value for a string. The value is returned as a UInt32 or NULL if the argument was NULL. ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.xxhash32() ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.xxhash32('1234567890') ┌─────────────────────────────┐ │ func.xxhash32('1234567890') │ ├─────────────────────────────┤ │ 3896585587 │ └─────────────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql XXHASH32() ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT XXHASH32('1234567890'); ┌────────────────────────┐ │ xxhash32('1234567890') │ ├────────────────────────┤ │ 3896585587 │ └────────────────────────┘ ``` # XXHASH64 (Lakehouse v1) > XXHASH64 — Calculates an xxHash64 64-bit hash value for a string. Calculates an xxHash64 64-bit hash value for a string. 
The value is returned as a UInt64 or NULL if the argument was NULL. ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.xxhash64() ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.xxhash64('1234567890') ┌─────────────────────────────┐ │ func.xxhash64('1234567890') │ ├─────────────────────────────┤ │ 12237639266330420150 │ └─────────────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql XXHASH64() ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT XXHASH64('1234567890'); ┌────────────────────────┐ │ xxhash64('1234567890') │ ├────────────────────────┤ │ 12237639266330420150 │ └────────────────────────┘ ``` # UUID Functions (Lakehouse v1) > Lakehouse v1 SQL uuid functions: generate and parse UUID values. This section provides reference information for the UUID-related functions in PlaidCloud Lakehouse. # GEN_RANDOM_UUID (Lakehouse v1) > GEN_RANDOM_UUID — generates a random UUID based on v4. Generates a random UUID based on v4. 
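A version 4 UUID is built from random bits, so separate calls are expected to yield different values. As a point of comparison outside the database, Python's standard `uuid` module produces the same kind of identifier. This sketch is illustrative only and says nothing about how the Lakehouse generates its values.

```python
import uuid

# uuid4() draws its bits from a random source and sets the version field to 4.
new_id = uuid.uuid4()
other_id = uuid.uuid4()

print(new_id)              # 36 characters: 32 hex digits plus 4 hyphens
print(new_id.version)      # version field of the generated UUID
print(new_id != other_id)  # two random UUIDs are expected to differ
```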
## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.gen_random_uuid() ``` ## Analyze Examples [Section titled “Analyze Examples”](#sql-examples) ```python func.gen_random_uuid() ┌───────────────────────────────────────┐ │ func.gen_random_uuid() │ ├───────────────────────────────────────┤ │ f88e7efe-1bc2-494b-806b-3ffe90db8f47 │ └───────────────────────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql GEN_RANDOM_UUID() ``` ## Aliases [Section titled “Aliases”](#aliases) * [UUID](../uuid) ## SQL Examples [Section titled “SQL Examples”](#sql-examples-1) ```sql SELECT GEN_RANDOM_UUID(), UUID(); ┌─────────────────────────────────────────────────────────────────────────────┐ │ gen_random_uuid() │ uuid() │ ├──────────────────────────────────────┼──────────────────────────────────────┤ │ f88e7efe-1bc2-494b-806b-3ffe90db8f47 │ f88e7efe-1bc2-494b-806b-3ffe90db8f47 │ └─────────────────────────────────────────────────────────────────────────────┘ ``` # UUID (Lakehouse v1) > UUID — alias for the GEN_RANDOM_UUID UUID function. Alias for [GEN\_RANDOM\_UUID](../gen-random-uuid). # IP Address Functions (Lakehouse v1) > Lakehouse v1 SQL ip address functions: work with IPv4 and IPv6 values — parse, compare, range-check, and convert addresses. This section provides reference information for the IP address-related functions in PlaidCloud Lakehouse. # INET_ATON (Lakehouse v1) > INET_ATON — converts an IPv4 address to a 32-bit integer. Converts an IPv4 address to a 32-bit integer. 
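The conversion treats the four dotted-quad octets as the bytes of one big-endian 32-bit number. The sketch below shows that arithmetic, and its inverse, in plain Python. The helper names are hypothetical and the code is an illustration rather than the Lakehouse implementation.

```python
def inet_aton_sketch(address):
    """Convert a dotted-quad IPv4 string to its 32-bit integer value."""
    octets = [int(part) for part in address.split(".")]
    if len(octets) != 4 or any(o < 0 or o > 255 for o in octets):
        raise ValueError(f"not a dotted-quad IPv4 address: {address!r}")
    a, b, c, d = octets
    # The first octet is the most significant byte.
    return (a << 24) | (b << 16) | (c << 8) | d


def inet_ntoa_sketch(number):
    """Convert a 32-bit integer back to a dotted-quad IPv4 string."""
    if not 0 <= number <= 0xFFFFFFFF:
        raise ValueError(f"not a 32-bit unsigned integer: {number!r}")
    return ".".join(str((number >> shift) & 0xFF) for shift in (24, 16, 8, 0))


print(inet_aton_sketch("1.2.3.4"))  # 16909060
print(inet_ntoa_sketch(16909060))   # 1.2.3.4
```

Worked by hand, `1.2.3.4` is 1 × 16777216 + 2 × 65536 + 3 × 256 + 4 = 16909060, which matches the example output in this section.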
## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.inet_aton() ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.inet_aton('1.2.3.4') ┌───────────────────────────────┐ │ func.inet_aton('1.2.3.4') │ ├───────────────────────────────┤ │ 16909060 │ └───────────────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql INET_ATON() ``` ## Aliases [Section titled “Aliases”](#aliases) * [IPV4\_STRING\_TO\_NUM](../ipv4-string-to-num) ## Return Type [Section titled “Return Type”](#return-type) Integer. ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT IPV4_STRING_TO_NUM('1.2.3.4'), INET_ATON('1.2.3.4'); ┌──────────────────────────────────────────────────────┐ │ ipv4_string_to_num('1.2.3.4') │ inet_aton('1.2.3.4') │ ├───────────────────────────────┼──────────────────────┤ │ 16909060 │ 16909060 │ └──────────────────────────────────────────────────────┘ ``` # INET_NTOA (Lakehouse v1) > INET_NTOA — converts a 32-bit integer to an IPv4 address. Converts a 32-bit integer to an IPv4 address. ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.inet_ntoa() ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.inet_ntoa(16909060) ┌──────────────────────────────┐ │ func.inet_ntoa(16909060) │ ├──────────────────────────────┤ │ 1.2.3.4 │ └──────────────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql INET_NTOA( ) ``` ## Aliases [Section titled “Aliases”](#aliases) * [IPV4\_NUM\_TO\_STRING](../ipv4-num-to-string) ## Return Type [Section titled “Return Type”](#return-type) String. 
## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT IPV4_NUM_TO_STRING(16909060), INET_NTOA(16909060); ┌────────────────────────────────────────────────────┐ │ ipv4_num_to_string(16909060) │ inet_ntoa(16909060) │ ├──────────────────────────────┼─────────────────────┤ │ 1.2.3.4 │ 1.2.3.4 │ └────────────────────────────────────────────────────┘ ``` # IPV4_NUM_TO_STRING (Lakehouse v1) > IPV4_NUM_TO_STRING — alias for the INET_NTOA IP address function. Alias for [INET\_NTOA](../inet-ntoa). # IPV4_STRING_TO_NUM (Lakehouse v1) > IPV4_STRING_TO_NUM — alias for the INET_ATON IP address function. Alias for [INET\_ATON](../inet-aton). # TRY_INET_ATON (Lakehouse v1) > TRY_INET_ATON — converts an IPv4 address (dotted-quad string) to its 32-bit integer representation. The try\_inet\_aton function takes the dotted-quad representation of an IPv4 address as a string and returns the numeric value of that address as an integer. ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.try_inet_aton() ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.try_inet_aton('10.0.5.9') ┌────────────────────────────────┐ │ func.try_inet_aton('10.0.5.9') │ ├────────────────────────────────┤ │ 167773449 │ └────────────────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql TRY_INET_ATON( ) ``` ## Aliases [Section titled “Aliases”](#aliases) * [TRY\_IPV4\_STRING\_TO\_NUM](../try-ipv4-string-to-num) ## Return Type [Section titled “Return Type”](#return-type) Integer. 
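The sketch below illustrates the "try" style of conversion with Python's standard `ipaddress` module. It assumes that `TRY_INET_ATON` yields NULL rather than an error for input that is not a valid IPv4 address, following the convention of the other `TRY_` functions in this reference; that behavior is an assumption here. `try_inet_aton_sketch` is a hypothetical name and `None` stands in for SQL NULL.

```python
import ipaddress


def try_inet_aton_sketch(address):
    """Return the 32-bit integer for a dotted-quad string, or None if invalid."""
    if not isinstance(address, str):
        return None
    try:
        return int(ipaddress.IPv4Address(address))
    except ipaddress.AddressValueError:
        # Raised for malformed strings and out-of-range octets.
        return None


print(try_inet_aton_sketch("10.0.5.9"))   # 167773449
print(try_inet_aton_sketch("256.1.1.1"))  # None: octet out of range
print(try_inet_aton_sketch("not-an-ip"))  # None: not a dotted quad
```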
## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT TRY_INET_ATON('10.0.5.9'), TRY_IPV4_STRING_TO_NUM('10.0.5.9'); ┌────────────────────────────────────────────────────────────────┐ │ try_inet_aton('10.0.5.9') │ try_ipv4_string_to_num('10.0.5.9') │ │ UInt32 │ UInt32 │ ├───────────────────────────┼────────────────────────────────────┤ │ 167773449 │ 167773449 │ └────────────────────────────────────────────────────────────────┘ ``` # TRY_INET_NTOA (Lakehouse v1) > TRY_INET_NTOA — convert an IPv4 address in network byte order to its dotted-quad string representation, returning NULL on failure. Takes an IPv4 address in network byte order and returns the address in its dotted-quad string representation. ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.try_inet_ntoa() ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.try_inet_ntoa(167773449) ┌───────────────────────────────┐ │ func.try_inet_ntoa(167773449) │ ├───────────────────────────────┤ │ 10.0.5.9 │ └───────────────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql TRY_INET_NTOA( ) ``` ## Aliases [Section titled “Aliases”](#aliases) * [TRY\_IPV4\_NUM\_TO\_STRING](../try-ipv4-num-to-string) ## Return Type [Section titled “Return Type”](#return-type) String. ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT TRY_INET_NTOA(167773449), TRY_IPV4_NUM_TO_STRING(167773449); ┌──────────────────────────────────────────────────────────────┐ │ try_inet_ntoa(167773449) │ try_ipv4_num_to_string(167773449) │ ├──────────────────────────┼───────────────────────────────────┤ │ 10.0.5.9 │ 10.0.5.9 │ └──────────────────────────────────────────────────────────────┘ ``` # TRY_IPV4_NUM_TO_STRING (Lakehouse v1) > TRY_IPV4_NUM_TO_STRING — alias for the TRY_INET_NTOA IP address function. Alias for [TRY\_INET\_NTOA](../try-inet-ntoa).
# TRY_IPV4_STRING_TO_NUM (Lakehouse v1) > TRY_IPV4_STRING_TO_NUM — alias for the TRY_INET_ATON IP address function. Alias for [TRY\_INET\_ATON](../try-inet-aton). # Context Functions (Lakehouse v1) > Lakehouse v1 SQL context functions: access query context — current database, role, session ID, and timing. This section provides reference information for the context-related functions in PlaidCloud Lakehouse. # CONNECTION_ID (Lakehouse v1) > CONNECTION_ID — Returns the connection ID for the current connection. Returns the connection ID for the current connection. ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.connection_id() ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.connection_id() ┌──────────────────────────────────────┐ │ func.connection_id() │ ├──────────────────────────────────────┤ │ 23cb06ec-583e-4eba-b790-7c8cf72a53f8 │ └──────────────────────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql CONNECTION_ID() ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT CONNECTION_ID(); ┌──────────────────────────────────────┐ │ connection_id() │ ├──────────────────────────────────────┤ │ 23cb06ec-583e-4eba-b790-7c8cf72a53f8 │ └──────────────────────────────────────┘ ``` # CURRENT_CATALOG (Lakehouse v1) > CURRENT_CATALOG — returns the name of the catalog currently in use for the session. Returns the name of the catalog currently in use for the session. ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql CURRENT_CATALOG() ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT CURRENT_CATALOG(); ┌───────────────────┐ │ current_catalog() │ ├───────────────────┤ │ default │ └───────────────────┘ ``` # CURRENT_USER (Lakehouse v1) > CURRENT_USER — returns the user name and host name combination for the account that the server used to authenticate the current client.
Returns the user name and host name combination for the account that the server used to authenticate the current client. This account determines your access privileges. The return value is a string in the utf8 character set. ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.current_user() ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.current_user() ┌─────────────────────┐ │ func.current_user() │ ├─────────────────────┤ │ 'root'@'%' │ └─────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql CURRENT_USER() ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT CURRENT_USER(); ┌────────────────┐ │ current_user() │ ├────────────────┤ │ 'root'@'%' │ └────────────────┘ ``` # DATABASE (Lakehouse v1) > DATABASE — Returns the name of the currently selected database. Returns the name of the currently selected database. If no database is selected, then this function returns `default`. ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.database() ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.database() ┌─────────────────┐ │ func.database() │ ├─────────────────┤ │ default │ └─────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql DATABASE() ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT DATABASE(); ┌────────────┐ │ database() │ ├────────────┤ │ default │ └────────────┘ ``` # LAST_QUERY_ID (Lakehouse v1) > LAST_QUERY_ID — returns the ID of a query in the current session, selected by index. Returns the ID of a query in the current session. The index can be an expression such as -1, 1, or 1+2; an out-of-range index returns an empty string.
## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.last_query_id() ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.last_query_id(-1) ┌──────────────────────────────────────┐ │ func.last_query_id((- 1)) │ ├──────────────────────────────────────┤ │ a6f615c6-5bad-4863-8558-afd01889448c │ └──────────────────────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql LAST_QUERY_ID() ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT LAST_QUERY_ID(-1); ┌──────────────────────────────────────┐ │ last_query_id((- 1)) │ ├──────────────────────────────────────┤ │ a6f615c6-5bad-4863-8558-afd01889448c │ └──────────────────────────────────────┘ ``` # VERSION (Lakehouse v1) > VERSION — Returns the current version of PlaidCloud LakehouseQuery. Returns the current version of PlaidCloud LakehouseQuery. ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.version() ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.version() ┌───────────────────────────────────────────────────────────────────────────────────────────────────────────┐ │ func.version() │ ├───────────────────────────────────────────────────────────────────────────────────────────────────────────┤ │ PlaidCloud LakehouseQuery v1.2.252-nightly-193ed56304(rust-1.75.0-nightly-2023-12-12T22:07:25.371440000Z) │ └───────────────────────────────────────────────────────────────────────────────────────────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql VERSION() ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT VERSION(); ┌───────────────────────────────────────────────────────────────────────────────────────────────────────────┐ │ version() │ ├───────────────────────────────────────────────────────────────────────────────────────────────────────────┤ │
PlaidCloud LakehouseQuery v1.2.252-nightly-193ed56304(rust-1.75.0-nightly-2023-12-12T22:07:25.371440000Z) │ └───────────────────────────────────────────────────────────────────────────────────────────────────────────┘ ``` # System Functions (Lakehouse v1) > Lakehouse v1 SQL system functions: inspect runtime state — version, current user, session, and database. This section provides reference information for the system-related functions in PlaidCloud Lakehouse. # CLUSTERING_INFORMATION (Lakehouse v1) > CLUSTERING_INFORMATION — Returns clustering information of a table. Returns clustering information of a table. ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql CLUSTERING_INFORMATION('', '') ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql CREATE TABLE mytable(a int, b int) CLUSTER BY(a+1); INSERT INTO mytable VALUES(1,1),(3,3); INSERT INTO mytable VALUES(2,2),(5,5); INSERT INTO mytable VALUES(4,4); SELECT * FROM CLUSTERING_INFORMATION('default','mytable')\G *************************** 1. row *************************** cluster_key: ((a + 1)) total_block_count: 3 constant_block_count: 1 unclustered_block_count: 0 average_overlaps: 1.3333 average_depth: 2.0 block_depth_histogram: {"00002":3} ``` | Parameter | Description | | ------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------- | | cluster\_key | The defined cluster key. | | total\_block\_count | The current count of blocks. | | constant\_block\_count | The count of blocks where min/max values are equal, meaning each block contains only one (group of) cluster\_key value. | | unclustered\_block\_count | The count of blocks that have not yet been clustered. | | average\_overlaps | The average ratio of overlapping blocks within a given range. | | average\_depth | The average depth of overlapping partitions for the cluster key. 
| | block\_depth\_histogram | The number of partitions at each depth level. A higher concentration of partitions at lower depths indicates more effective table clustering. | # FUSE_BLOCK (Lakehouse v1) > FUSE_BLOCK — returns the block information of the latest or specified snapshot of a table. Returns the block information of the latest or specified snapshot of a table. For more information about what is block in PlaidCloud Lakehouse, see What are Snapshot, Segment, and Block?. The command returns the location information of each parquet file referenced by a snapshot. This enables downstream applications to access and consume the data stored in the files. See Also: * [FUSE\_SNAPSHOT](../fuse_snapshot) * [FUSE\_SEGMENT](../fuse_segment) ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql FUSE_BLOCK('', ''[, '']) ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql CREATE TABLE mytable(c int); INSERT INTO mytable values(1); INSERT INTO mytable values(2); SELECT * FROM FUSE_BLOCK('default', 'mytable'); --- ┌──────────────────────────────────┬────────────────────────────┬────────────────────────────────────────────────────┬────────────┬────────────────────────────────────────────────────┬───────────────────┐ │ snapshot_id │ timestamp │ block_location │ block_size │ bloom_filter_location │ bloom_filter_size │ ├──────────────────────────────────┼────────────────────────────┼────────────────────────────────────────────────────┼────────────┼────────────────────────────────────────────────────┼───────────────────┤ │ 51e84b56458f44269b05a059b364a659 │ 2022-09-15 07:14:14.137268 │ 1/7/_b/39a6dbbfd9b44ad5a8ec8ab264c93cf5_v0.parquet │ 4 │ 1/7/_i/39a6dbbfd9b44ad5a8ec8ab264c93cf5_v1.parquet │ 221 │ │ 51e84b56458f44269b05a059b364a659 │ 2022-09-15 07:14:14.137268 │ 1/7/_b/d0ee9688c4d24d6da86acd8b0d6f4fad_v0.parquet │ 4 │ 1/7/_i/d0ee9688c4d24d6da86acd8b0d6f4fad_v1.parquet │ 219 │ 
└──────────────────────────────────┴────────────────────────────┴────────────────────────────────────────────────────┴────────────┴────────────────────────────────────────────────────┴───────────────────┘ ``` # FUSE_COLUMN (Lakehouse v1) > FUSE_COLUMN — returns the column information of the latest or specified snapshot of a table. Returns the column information of the latest or specified snapshot of a table. For more information about what is block in PlaidCloud Lakehouse, see What are Snapshot, Segment, and Block?. See Also: * [FUSE\_SNAPSHOT](../fuse_snapshot) * [FUSE\_SEGMENT](../fuse_segment) * [FUSE\_BLOCK](../fuse_block) ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql FUSE_COLUMN('', ''[, '']) ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql CREATE TABLE mytable(c int); INSERT INTO mytable values(1); INSERT INTO mytable values(2); SELECT * FROM FUSE_COLUMN('default', 'mytable'); --- ┌──────────────────────────────────┬────────────────────────────┬─────────────────────────────────────────────────────────┬────────────┬───────────┬───────────┬─────────────┬─────────────┬───────────┬──────────────┬──────────────────┐ │ snapshot_id │ timestamp │ block_location │ block_size │ file_size │ row_count │ column_name │ column_type │ column_id │ block_offset │ bytes_compressed │ ├──────────────────────────────────┼────────────────────────────┼─────────────────────────────────────────────────────────┼────────────┼───────────┼───────────┼─────────────┼─────────────┼───────────┼──────────────┼──────────────────┤ │ 3faefc1a9b6a48f388a8b59228dd06c1 │ 2023-07-18 03:06:30.276502 │ 1/118746/_b/44df130c207745cb858928135d39c1c0_v2.parquet │ 4 │ 196 │ 1 │ c │ Int32 │ 0 │ 8 │ 14 │ │ 3faefc1a9b6a48f388a8b59228dd06c1 │ 2023-07-18 03:06:30.276502 │ 1/118746/_b/b6f8496d7e3f4f62a89c09572840cf70_v2.parquet │ 4 │ 196 │ 1 │ c │ Int32 │ 0 │ 8 │ 14 │ 
└──────────────────────────────────┴────────────────────────────┴─────────────────────────────────────────────────────────┴────────────┴───────────┴───────────┴─────────────┴─────────────┴───────────┴──────────────┴──────────────────┘ ``` # FUSE_ENCODING (Lakehouse v1) > FUSE_ENCODING — returns the encoding types applied to a specific column within a table. Returns the encoding types applied to a specific column within a table. It helps you understand how data is compressed and stored in a native format within the table. ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql FUSE_ENCODING('', '', '') ``` The function returns a result set with the following columns: | Column | Data Type | Description | | ------------------ | ---------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | | VALIDITY\_SIZE | Nullable(UInt32) | The size of a bitmap value that indicates whether each row in the column has a non-null value. This bitmap is used to track the presence or absence of null values in the column’s data. | | COMPRESSED\_SIZE | UInt32 | The size of the column data after compression. | | UNCOMPRESSED\_SIZE | UInt32 | The size of the column data before applying encoding. | | LEVEL\_ONE | String | The primary or initial encoding applied to the column. | | LEVEL\_TWO | Nullable(String) | A secondary or recursive encoding method applied to the column after the initial encoding. | ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql -- Create a table with an integer column 'c' and apply 'Lz4' compression CREATE TABLE t(c INT) STORAGE_FORMAT = 'native' COMPRESSION = 'lz4'; -- Insert data into the table. 
INSERT INTO t SELECT number FROM numbers(2048); -- Analyze the encoding for column 'c' in table 't' SELECT LEVEL_ONE, LEVEL_TWO, COUNT(*) FROM FUSE_ENCODING('default', 't', 'c') GROUP BY LEVEL_ONE, LEVEL_TWO; level_one |level_two|count(*)| ------------+---------+--------+ DeltaBitpack| | 1| -- Insert 2,048 rows with the value 1 into the table 't' INSERT INTO t (c) SELECT 1 FROM numbers(2048); SELECT LEVEL_ONE, LEVEL_TWO, COUNT(*) FROM FUSE_ENCODING('default', 't', 'c') GROUP BY LEVEL_ONE, LEVEL_TWO; level_one |level_two|count(*)| ------------+---------+--------+ OneValue | | 1| DeltaBitpack| | 1| ``` # FUSE_SEGMENT (Lakehouse v1) > FUSE_SEGMENT — returns the segment information of a specified table snapshot. Returns the segment information of a specified table snapshot. For more information about what is segment in PlaidCloud Lakehouse, see What are Snapshot, Segment, and Block?. See Also: * [FUSE\_SNAPSHOT](../fuse_snapshot) * [FUSE\_BLOCK](../fuse_block) ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql FUSE_SEGMENT('', '','') ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql CREATE TABLE mytable(c int); INSERT INTO mytable values(1); INSERT INTO mytable values(2); -- Obtain a snapshot ID SELECT snapshot_id FROM FUSE_SNAPSHOT('default', 'mytable') limit 1; --- ┌──────────────────────────────────┐ │ snapshot_id │ ├──────────────────────────────────┤ │ 82c572947efa476892bd7c0635158ba2 │ └──────────────────────────────────┘ SELECT * FROM FUSE_SEGMENT('default', 'mytable', '82c572947efa476892bd7c0635158ba2'); --- ┌────────────────────────────────────────────────────┬────────────────┬─────────────┬───────────┬────────────────────┬──────────────────┐ │ file_location │ format_version │ block_count │ row_count │ bytes_uncompressed │ bytes_compressed │ ├────────────────────────────────────────────────────┼────────────────┼─────────────┼───────────┼────────────────────┼──────────────────┤ │ 
1/319/_sg/d35fe7bf99584301b22e8f6a8a9c97f9_v1.json │ 1 │ 1 │ 1 │ 4 │ 184 │ │ 1/319/_sg/c261059d47c840e1b749222dabb4b2bb_v1.json │ 1 │ 1 │ 1 │ 4 │ 184 │ └────────────────────────────────────────────────────┴────────────────┴─────────────┴───────────┴────────────────────┴──────────────────┘ ``` # FUSE_SNAPSHOT (Lakehouse v1) > FUSE_SNAPSHOT — Returns the snapshot information of a table. Returns the snapshot information of a table. For more information about what is snapshot in PlaidCloud Lakehouse, see What are Snapshot, Segment, and Block?. See Also: * [FUSE\_SEGMENT](../fuse_segment) * [FUSE\_BLOCK](../fuse_block) ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql FUSE_SNAPSHOT('', '') ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql CREATE TABLE mytable(a int, b int) CLUSTER BY(a+1); INSERT INTO mytable VALUES(1,1),(3,3); INSERT INTO mytable VALUES(2,2),(5,5); INSERT INTO mytable VALUES(4,4); SELECT * FROM FUSE_SNAPSHOT('default','mytable'); --- | snapshot_id | snapshot_location | format_version | previous_snapshot_id | segment_count | block_count | row_count | bytes_uncompressed | bytes_compressed | index_size | timestamp | |----------------------------------|------------------------------------------------------------|----------------|----------------------------------|---------------|-------------|-----------|--------------------|------------------|------------|----------------------------| | a13d211b7421432898a3786848b8ced3 | 670655/783287/_ss/a13d211b7421432898a3786848b8ced3_v1.json | 1 | \N | 1 | 1 | 2 | 16 | 290 | 363 | 2022-09-19 14:51:52.860425 | | cf08e6af6c134642aeb76bc81e6e7580 | 670655/783287/_ss/cf08e6af6c134642aeb76bc81e6e7580_v1.json | 1 | a13d211b7421432898a3786848b8ced3 | 2 | 2 | 4 | 32 | 580 | 726 | 2022-09-19 14:52:15.282943 | | 1bd4f68b831a402e8c42084476461aa1 | 670655/783287/_ss/1bd4f68b831a402e8c42084476461aa1_v1.json | 1 | cf08e6af6c134642aeb76bc81e6e7580 | 3 | 3 | 5 | 40 | 862 | 1085 | 2022-09-19 
14:52:20.284347 | ``` # FUSE_STATISTIC (Lakehouse v1) > FUSE_STATISTIC — returns the estimated number of distinct values of each column in a table. Returns the estimated number of distinct values of each column in a table. ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql FUSE_STATISTIC('', '') ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) You’re most likely to use this function together with `ANALYZE TABLE ` to generate and check the statistical information of a table. For more explanations and examples, see OPTIMIZE TABLE. # FUSE_TIME_TRAVEL_SIZE (Lakehouse v1) > FUSE_TIME_TRAVEL_SIZE — calculates the storage size of historical data (for Time Travel) for tables. Calculates the storage size of historical data (for Time Travel) for tables. ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql -- Calculate historical data size for all tables in all databases SELECT ... FROM fuse_time_travel_size(); -- Calculate historical data size for all tables in a specified database SELECT ... FROM fuse_time_travel_size(''); -- Calculate historical data size for a specified table in a specified database SELECT ... FROM fuse_time_travel_size('', ''); ``` ## Output [Section titled “Output”](#output) The function returns a result set with the following columns: | Column | Description | | -------------------------------- | ----------------------------------------------------------------------------------------------------- | | `database_name` | The name of the database where the table is located. | | `table_name` | The name of the table. | | `is_dropped` | Indicates whether the table has been dropped (`true` for dropped tables, `false` otherwise). | | `time_travel_size` | The total storage size of historical data (for Time Travel) for the table, in bytes. | | `latest_snapshot_size` | The storage size of the latest snapshot of the table, in bytes.
| | `data_retention_period_in_hours` | The retention period for Time Travel data in hours (`NULL` means using the default retention policy). | | `error` | Any error encountered while retrieving the storage size (`NULL` if no errors occurred). | ## SQL Examples [Section titled “SQL Examples”](#sql-examples) This example calculates the historical data for all tables in the `default` database: ```sql SELECT * FROM fuse_time_travel_size('default') ┌───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐ │ database_name │ table_name │ is_dropped │ time_travel_size │ latest_snapshot_size │ data_retention_period_in_hours │ error │ ├───────────────┼────────────┼────────────┼──────────────────┼──────────────────────┼────────────────────────────────┼──────────────────┤ │ default │ books │ true │ 2810 │ 1490 │ NULL │ NULL │ └───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘ ``` # Table Functions (Lakehouse v1) > Lakehouse v1 SQL table functions: return tabular results — generators, splitters, and set-returning helpers. This section provides reference information for the table-related functions in PlaidCloud Lakehouse. # INFER_SCHEMA (Lakehouse v1) > INFER_SCHEMA — automatically detects the file metadata schema and retrieves the column definitions. Automatically detects the file metadata schema and retrieves the column definitions. Caution `infer_schema` currently only supports parquet file format. 
## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql INFER_SCHEMA( LOCATION => '{ internalStage | externalStage }' [ PATTERN => ''] ) ``` Where: ### Internalstage [Section titled “Internalstage”](#internalstage) ```sql internalStage ::= @[/] ``` ### Externalstage [Section titled “Externalstage”](#externalstage) ```sql externalStage ::= @[/] ``` ### PATTERN = ‘regex\_pattern’ [Section titled “PATTERN = ‘regex\_pattern’”](#pattern--regex_pattern) A [PCRE2](https://www.pcre.org/current/doc/html/)-based regular expression pattern string, enclosed in single quotes, specifying the file names to match. See [`infer_schema` With Pattern Matching](#infer_schema-with-pattern-matching) for an example. For PCRE2 syntax, see . ## SQL Examples [Section titled “SQL Examples”](#sql-examples) Generate a parquet file in a stage: ```sql CREATE STAGE infer_parquet FILE_FORMAT = (TYPE = PARQUET); COPY INTO @infer_parquet FROM (SELECT * FROM numbers(10)) FILE_FORMAT = (TYPE = PARQUET); ``` ```sql LIST @infer_parquet; ┌───────────────────────────────────────────────────────┬──────┬────────────────────────────────────┬───────────────────────────────┬─────────┐ │ name │ size │ md5 │ last_modified │ creator │ ├───────────────────────────────────────────────────────┼──────┼────────────────────────────────────┼───────────────────────────────┼─────────┤ │ data_e0fd9cba-f45c-4c43-aa07-d6d87d134378_0_0.parquet │ 258 │ "7DCC9FFE04EA1F6882AED2CF9640D3D4" │ 2023-02-09 05:21:52.000 +0000 │ NULL │ └───────────────────────────────────────────────────────┴──────┴────────────────────────────────────┴───────────────────────────────┴─────────┘ ``` ### `infer_schema` [Section titled “infer\_schema”](#infer_schema) ```sql SELECT * FROM INFER_SCHEMA(location => '@infer_parquet/data_e0fd9cba-f45c-4c43-aa07-d6d87d134378_0_0.parquet'); ┌─────────────┬─────────────────┬──────────┬──────────┐ │ column_name │ type │ nullable │ order_id │ ├─────────────┼─────────────────┼──────────┼──────────┤ │ number │ BIGINT UNSIGNED │ 0 │ 0 │
└─────────────┴─────────────────┴──────────┴──────────┘ ``` ### `infer_schema` With Pattern Matching [Section titled “infer\_schema With Pattern Matching”](#infer_schema-with-pattern-matching) ```sql SELECT * FROM infer_schema(location => '@infer_parquet/', pattern => '.*parquet'); ┌─────────────┬─────────────────┬──────────┬──────────┐ │ column_name │ type │ nullable │ order_id │ ├─────────────┼─────────────────┼──────────┼──────────┤ │ number │ BIGINT UNSIGNED │ 0 │ 0 │ └─────────────┴─────────────────┴──────────┴──────────┘ ``` ### Create a Table From Parquet File [Section titled “Create a Table From Parquet File”](#create-a-table-from-parquet-file) The `infer_schema` can only display the schema of a parquet file and cannot create a table from it. To create a table from a parquet file: ```sql CREATE TABLE mytable AS SELECT * FROM @infer_parquet/ (pattern=>'.*parquet') LIMIT 0; DESC mytable; ┌────────┬─────────────────┬──────┬─────────┬───────┐ │ Field │ Type │ Null │ Default │ Extra │ ├────────┼─────────────────┼──────┼─────────┼───────┤ │ number │ BIGINT UNSIGNED │ NO │ 0 │ │ └────────┴─────────────────┴──────┴─────────┴───────┘ ``` # INSPECT_PARQUET (Lakehouse v1) > INSPECT_PARQUET — retrieves a table of comprehensive metadata from a staged Parquet file. 
Retrieves a table of comprehensive metadata from a staged Parquet file, including the following columns: | Column | Description | | ------------------------------------ | -------------------------------------------------------------- | | created\_by | The entity or source responsible for creating the Parquet file | | num\_columns | The number of columns in the Parquet file | | num\_rows | The total number of rows or records in the Parquet file | | num\_row\_groups | The count of row groups within the Parquet file | | serialized\_size | The size of the Parquet file on disk (compressed) | | max\_row\_groups\_size\_compressed | The size of the largest row group (compressed) | | max\_row\_groups\_size\_uncompressed | The size of the largest row group (uncompressed) | ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql INSPECT_PARQUET('@') ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) This example retrieves the metadata from a staged sample Parquet file named [books.parquet](https://datafuse-1253727613.cos.ap-hongkong.myqcloud.com/data/books.parquet). 
The file contains two records: books.parquet ```text Transaction Processing,Jim Gray,1992 Readings in Database Systems,Michael Stonebraker,2004 ``` ```sql -- Show the staged file LIST @my_internal_stage; ┌──────────────────────────────────────────────────────────────────────────────────────────────┐ │ name │ size │ md5 │ last_modified │ creator │ ├───────────────┼────────┼──────────────────┼───────────────────────────────┼──────────────────┤ │ books.parquet │ 998 │ NULL │ 2023-04-19 19:34:51.303 +0000 │ NULL │ └──────────────────────────────────────────────────────────────────────────────────────────────┘ -- Retrieve metadata from the staged file SELECT * FROM INSPECT_PARQUET('@my_internal_stage/books.parquet'); ┌────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐ │ created_by │ num_columns │ num_rows │ num_row_groups │ serialized_size │ max_row_groups_size_compressed │ max_row_groups_size_uncompressed │ ├────────────────────────────────────┼─────────────┼──────────┼────────────────┼─────────────────┼────────────────────────────────┼──────────────────────────────────┤ │ parquet-cpp version 1.5.1-SNAPSHOT │ 3 │ 2 │ 1 │ 998 │ 332 │ 320 │ └────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘ ``` # LIST_STAGE (Lakehouse v1) > LIST_STAGE — lists files in a stage. Lists files in a stage. This allows you to filter files in a stage based on their extensions and obtain comprehensive details about each file. The function is similar to the DDL command LIST STAGE FILES, but provides you the flexibility to retrieve specific file information with the SELECT statement, such as file name, size, MD5 hash, last modified timestamp, and creator, rather than all file information. 
## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql LIST_STAGE( LOCATION => '{ internalStage | externalStage | userStage }' [ PATTERN => ''] ) ``` Where: ### Internalstage [Section titled “Internalstage”](#internalstage) ```sql internalStage ::= @[/] ``` ### Externalstage [Section titled “Externalstage”](#externalstage) ```sql externalStage ::= @[/] ``` ### Userstage [Section titled “Userstage”](#userstage) ```sql userStage ::= @~[/] ``` ### PATTERN [Section titled “PATTERN”](#pattern) See COPY INTO table. ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT * FROM list_stage(location => '@my_stage/', pattern => '.*[.]log'); ┌────────────────┬──────┬────────────────────────────────────┬───────────────────────────────┬─────────┐ │ name │ size │ md5 │ last_modified │ creator │ ├────────────────┼──────┼────────────────────────────────────┼───────────────────────────────┼─────────┤ │ 2023/meta.log │ 475 │ "4208ff530b252236e14b3cd797abdfbd" │ 2023-04-19 20:23:24.000 +0000 │ NULL │ │ 2023/query.log │ 1348 │ "1c6654b207472c277fc8c6207c035e18" │ 2023-04-19 20:23:24.000 +0000 │ NULL │ └────────────────┴──────┴────────────────────────────────────┴───────────────────────────────┴─────────┘ -- Equivalent to the following statement: LIST @my_stage PATTERN = '.log'; ``` # RESULT_SCAN (Lakehouse v1) > RESULT_SCAN — returns the result set of a previous command in same session as if the result was a table. Returns the result set of a previous command in same session as if the result was a table. 
## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql RESULT_SCAN( { '' | LAST_QUERY_ID() } ) ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) Create a simple table: ```sql CREATE TABLE t1(a int); ``` Insert some values: ```sql INSERT INTO t1(a) VALUES (1), (2), (3); ``` ### `result_scan` [Section titled “result\_scan”](#result_scan) ```sql SELECT * FROM t1 ORDER BY a; ┌───────┐ │ a │ ├───────┤ │ 1 │ ├───────┤ │ 2 │ ├───────┤ │ 3 │ └───────┘ ``` ```sql SELECT * FROM RESULT_SCAN(LAST_QUERY_ID()) ORDER BY a; ┌───────┐ │ a │ ├───────┤ │ 1 │ ├───────┤ │ 2 │ ├───────┤ │ 3 │ └───────┘ ``` # GENERATE_SERIES (Lakehouse v1) > GENERATE_SERIES — generates a dataset starting from a specified point, ending at another. Generates a dataset starting from a specified point, ending at another specified point, and optionally with an incrementing value. The GENERATE\_SERIES function works with the following data types: * Integer * Date * Timestamp ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.generate_series(, [, ]) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.generate_series(1, 10, 2); generate_series| ---------------+ 1| 3| 5| 7| 9| ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql GENERATE_SERIES(, [, ]) ``` ## Arguments [Section titled “Arguments”](#arguments) | Argument | Description | | -------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | | start | The starting value, representing the first number, date, or timestamp in the sequence. | | stop | The ending value, representing the last number, date, or timestamp in the sequence.
| | step\_interval | The step interval, determining the difference between adjacent values in the sequence. For integer sequences, the default value is 1. For date sequences, the default step interval is 1 day. For timestamp sequences, the default step interval is 1 microsecond. | Note When dealing with functions like GENERATE\_SERIES and RANGE, a key distinction lies in their boundary traits. GENERATE\_SERIES is bound by both the left and right sides, while RANGE is bound on the left side only. For example, utilizing RANGE(1, 11) is equivalent to GENERATE\_SERIES(1, 10). ## Return Type [Section titled “Return Type”](#return-type) Returns a list containing a continuous sequence of numeric values, dates, or timestamps from *start* to *stop*. ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ### SQL Examples 1: Generating Numeric, Date, and Timestamp Data [Section titled “SQL Examples 1: Generating Numeric, Date, and Timestamp Data”](#sql-examples-1-generating-numeric-date-and-timestamp-data) ```sql SELECT * FROM GENERATE_SERIES(1, 10, 2); generate_series| ---------------+ 1| 3| 5| 7| 9| SELECT * FROM GENERATE_SERIES('2023-03-20'::date, '2023-03-27'::date); generate_series| ---------------+ 2023-03-20| 2023-03-21| 2023-03-22| 2023-03-23| 2023-03-24| 2023-03-25| 2023-03-26| 2023-03-27| SELECT * FROM GENERATE_SERIES('2023-03-26 00:00'::timestamp, '2023-03-27 12:00'::timestamp, 86400000000); generate_series | -------------------+ 2023-03-26 00:00:00| 2023-03-27 00:00:00| ``` ### SQL Examples 2: Filling Query Result Gaps [Section titled “SQL Examples 2: Filling Query Result Gaps”](#sql-examples-2-filling-query-result-gaps) This example uses the GENERATE\_SERIES function and left join operator to handle gaps in query results caused by missing information in specific ranges. 
```sql CREATE TABLE t_metrics ( date Date, value INT ); INSERT INTO t_metrics VALUES ('2020-01-01', 200), ('2020-01-01', 300), ('2020-01-04', 300), ('2020-01-04', 300), ('2020-01-05', 400), ('2020-01-10', 700); SELECT date, SUM(value), COUNT() FROM t_metrics GROUP BY date ORDER BY date; date |sum(value)|count()| ----------+----------+-------+ 2020-01-01| 500| 2| 2020-01-04| 600| 2| 2020-01-05| 400| 1| 2020-01-10| 700| 1| ``` To close the gaps between January 1st and January 10th, 2020, use the following query: ```sql SELECT t.date, COALESCE(SUM(t_metrics.value), 0), COUNT(t_metrics.value) FROM generate_series( '2020-01-01'::Date, '2020-01-10'::Date ) AS t(date) LEFT JOIN t_metrics ON t_metrics.date = t.date GROUP BY t.date ORDER BY t.date; date |coalesce(sum(t_metrics.value), 0)|count(t_metrics.value)| ----------+---------------------------------+----------------------+ 2020-01-01| 500| 2| 2020-01-02| 0| 0| 2020-01-03| 0| 0| 2020-01-04| 600| 2| 2020-01-05| 400| 1| 2020-01-06| 0| 0| 2020-01-07| 0| 0| 2020-01-08| 0| 0| 2020-01-09| 0| 0| 2020-01-10| 700| 1| ``` # SHOW_GRANTS (Lakehouse v1) > SHOW_GRANTS — lists privileges explicitly granted to a user, to a role, or on a specific object. Lists privileges explicitly granted to a user, to a role, or on a specific object. ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql SHOW_GRANTS('role', '') SHOW_GRANTS('user', '') SHOW_GRANTS('stage', '') SHOW_GRANTS('udf', '') SHOW_GRANTS('table', '', '', '') SHOW_GRANTS('database', '', '') ``` ## Configuring `enable_expand_roles` Setting [Section titled “Configuring enable\_expand\_roles Setting”](#configuring-enable_expand_roles-setting) The `enable_expand_roles` setting controls whether the SHOW\_GRANTS function expands role inheritance when displaying privileges. * `enable_expand_roles=1` (default): * SHOW\_GRANTS recursively expands inherited privileges, meaning that if a role has been granted another role, it will display all the inherited privileges. 
* Users will also see all privileges granted through their assigned roles. * `enable_expand_roles=0`: * SHOW\_GRANTS only displays privileges that are directly assigned to the specified role or user. * However, the result will still include GRANT ROLE statements to indicate role inheritance. For example, role `a` has the `SELECT` privilege on `t1`, and role `b` has the `SELECT` privilege on `t2`: ```sql SELECT grants FROM show_grants('role', 'a') ORDER BY object_id; ┌──────────────────────────────────────────────────────┐ │ grants │ ├──────────────────────────────────────────────────────┤ │ GRANT SELECT ON 'default'.'default'.'t1' TO ROLE `a` │ └──────────────────────────────────────────────────────┘ SELECT grants FROM show_grants('role', 'b') ORDER BY object_id; ┌──────────────────────────────────────────────────────┐ │ grants │ ├──────────────────────────────────────────────────────┤ │ GRANT SELECT ON 'default'.'default'.'t2' TO ROLE `b` │ └──────────────────────────────────────────────────────┘ ``` If you grant role `b` to role `a` and check the grants on role `a` again, you can see that the `SELECT` privilege on `t2` is now included in role `a`: ```sql GRANT ROLE b TO ROLE a; ``` ```sql SELECT grants FROM show_grants('role', 'a') ORDER BY object_id; ┌──────────────────────────────────────────────────────┐ │ grants │ ├──────────────────────────────────────────────────────┤ │ GRANT SELECT ON 'default'.'default'.'t1' TO ROLE `a` │ │ GRANT SELECT ON 'default'.'default'.'t2' TO ROLE `a` │ └──────────────────────────────────────────────────────┘ ``` If you set `enable_expand_roles` to `0` and check the grants on role `a` again, the result shows the `GRANT ROLE` statement instead of listing the specific privileges inherited from role `b`: ```sql SET enable_expand_roles=0; ``` ```sql SELECT grants FROM show_grants('role', 'a') ORDER BY object_id; ┌──────────────────────────────────────────────────────┐ │ grants │ 
├──────────────────────────────────────────────────────┤ │ GRANT SELECT ON 'default'.'default'.'t1' TO ROLE `a` │ │ GRANT ROLE b to ROLE `a` │ │ GRANT ROLE public to ROLE `a` │ └──────────────────────────────────────────────────────┘ ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) This example illustrates how to list privileges granted to a user, a role, and on a specific object. ```sql -- Create a new user CREATE USER 'user1' IDENTIFIED BY 'password'; -- Create a new role CREATE ROLE analyst; -- Grant the analyst role to the user GRANT ROLE analyst TO 'user1'; -- Create a stage CREATE STAGE my_stage; -- Grant privileges on the stage to the role GRANT READ ON STAGE my_stage TO ROLE analyst; -- List privileges granted to the user SELECT * FROM SHOW_GRANTS('user', 'user1'); ┌───────────────────────────────────────────────────────────────────────────────────────────────────────────────┐ │ privileges │ object_name │ object_id │ grant_to │ name │ grants │ ├────────────┼─────────────┼──────────────────┼──────────┼────────┼─────────────────────────────────────────────┤ │ Read │ my_stage │ NULL │ USER │ user1 │ GRANT Read ON STAGE my_stage TO 'user1'@'%' │ └───────────────────────────────────────────────────────────────────────────────────────────────────────────────┘ -- List privileges granted to the role SELECT * FROM SHOW_GRANTS('role', 'analyst'); ┌───────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐ │ privileges │ object_name │ object_id │ grant_to │ name │ grants │ ├────────────┼─────────────┼──────────────────┼──────────┼─────────┼────────────────────────────────────────────────┤ │ Read │ my_stage │ NULL │ ROLE │ analyst │ GRANT Read ON STAGE my_stage TO ROLE `analyst` │ └───────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘ -- List privileges granted on the stage SELECT * FROM SHOW_GRANTS('stage', 'my_stage'); 
┌─────────────────────────────────────────────────────────────────────────────────────┐ │ privileges │ object_name │ object_id │ grant_to │ name │ grants │ ├────────────┼─────────────┼──────────────────┼──────────┼─────────┼──────────────────┤ │ Read │ my_stage │ NULL │ ROLE │ analyst │ │ └─────────────────────────────────────────────────────────────────────────────────────┘ ``` # STREAM_STATUS (Lakehouse v1) > STREAM_STATUS — returns whether a stream has change-data-capture records to consume (true/false). Provides information about the status of a specified stream, yielding a single-column result (`has_data`) that can take on values of `true` or `false`: * `true`: Indicates that the stream **might contain** change data capture records. * `false`: Indicates that the stream currently does not contain any change data capture records. Note The presence of a `true` in the result (`has_data`) does **not** ensure the definite existence of change data capture records. Other operations, such as performing a table compact operation, could also lead to a `true` value even when there are no actual change data capture records. Note When using `STREAM_STATUS` in tasks, you must include the database name when referencing the stream (e.g., `STREAM_STATUS('mydb.stream_name')`). 
## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql SELECT * FROM STREAM_STATUS('.'); -- OR SELECT * FROM STREAM_STATUS(''); -- Uses current database ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql -- Create a table 't' with a column 'c' CREATE TABLE t (c int); -- Create a stream 's' on the table 't' CREATE STREAM s ON TABLE t; -- Check the initial status of the stream 's' SELECT * FROM STREAM_STATUS('s'); -- The result should be 'false' indicating no change data capture records initially ┌──────────┐ │ has_data │ ├──────────┤ │ false │ └──────────┘ -- Insert a value into the table 't' INSERT INTO t VALUES (1); -- Check the updated status of the stream 's' after the insertion SELECT * FROM STREAM_STATUS('s'); -- The result should now be 'true' indicating the presence of change data capture records ┌──────────┐ │ has_data │ ├──────────┤ │ true │ └──────────┘ -- Example with database name specified SELECT * FROM STREAM_STATUS('mydb.s'); ``` # TASK_HISTORY (Lakehouse v1) > TASK_HISTORY — displays the run history of tasks, filtered by the given arguments. Displays the run history of tasks, filtered by the given arguments. ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql TASK_HISTORY( [ SCHEDULED_TIME_RANGE_START => ] [, SCHEDULED_TIME_RANGE_END => ] [, RESULT_LIMIT => ] [, TASK_NAME => '' ] [, ERROR_ONLY => { TRUE | FALSE } ] [, ROOT_TASK_ID => ''] ) ``` ## Arguments [Section titled “Arguments”](#arguments) All the arguments are optional. `SCHEDULED_TIME_RANGE_START => `, `SCHEDULED_TIME_RANGE_END => ` Time range (in TIMESTAMP\_LTZ format), within the last 7 days, in which the task execution was scheduled. If the time range does not fall within the last 7 days, an error is returned. * If `SCHEDULED_TIME_RANGE_END` is not specified, the function returns those tasks that have already completed, are currently running, or are scheduled in the future. 
* If `SCHEDULED_TIME_RANGE_END` is CURRENT\_TIMESTAMP, the function returns those tasks that have already completed or are currently running. Note that a task that is executed immediately before the current time might still be identified as scheduled. * To query only those tasks that have already completed or are currently running, include `WHERE query_id IS NOT NULL` as a filter. The QUERY\_ID column in the TASK\_HISTORY output is populated only when a task has started running. If no start or end time is specified, the most recent tasks are returned, up to the specified RESULT\_LIMIT value. `RESULT_LIMIT => ` A number specifying the maximum number of rows returned by the function. If the number of matching rows is greater than this limit, the task executions with the most recent timestamp are returned, up to the specified limit. Range: `1` to `10000` Default: `100`. `TASK_NAME => ` A case-insensitive string specifying a task. Only non-qualified task names are supported. Only executions of the specified task are returned. Note that if multiple tasks have the same name, the function returns the history for each of these tasks. `ERROR_ONLY => { TRUE | FALSE }` When set to TRUE, this function returns only task runs that failed or were cancelled. `ROOT_TASK_ID => ` Unique identifier for the root task in a task graph. This ID matches the ID column value in the SHOW TASKS output for the same task. Specify the ROOT\_TASK\_ID to show the history of the root task and any child tasks that are part of the task graph. ## Usage Notes [Section titled “Usage Notes”](#usage-notes) * This function returns a maximum of 10,000 rows, set in the RESULT\_LIMIT argument value. The default value is 100. * This function returns results only for the ACCOUNTADMIN role. 
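The argument rules above (a scheduled-time range within the last 7 days, and a RESULT\_LIMIT between 1 and 10000) can be checked on the client before a query is sent. The following is a minimal sketch only: `build_task_history_call` is a hypothetical helper, not part of PlaidCloud, and it assumes that only the start of the range needs to fall inside the 7-day window and that timestamps are passed as ISO 8601 strings to `TO_TIMESTAMP`.

```python
from datetime import datetime, timedelta, timezone

MAX_LOOKBACK = timedelta(days=7)  # documented window for the scheduled-time range
MAX_RESULT_LIMIT = 10_000         # documented upper bound for RESULT_LIMIT


def build_task_history_call(start, end, result_limit=100, now=None):
    """Hypothetical helper: validate arguments, then return the SQL text."""
    now = now or datetime.now(timezone.utc)
    if start > end:
        raise ValueError("start must not be later than end")
    # Assumption: the server rejects ranges that begin more than 7 days ago.
    if start < now - MAX_LOOKBACK:
        raise ValueError("time range must fall within the last 7 days")
    if not 1 <= result_limit <= MAX_RESULT_LIMIT:
        raise ValueError("RESULT_LIMIT must be between 1 and 10000")
    return (
        "SELECT * FROM TASK_HISTORY("
        f"SCHEDULED_TIME_RANGE_START=>TO_TIMESTAMP('{start.isoformat()}'), "
        f"SCHEDULED_TIME_RANGE_END=>TO_TIMESTAMP('{end.isoformat()}'), "
        f"RESULT_LIMIT=>{result_limit}) "
        # Keep only runs that have started, as described in the arguments above.
        "WHERE query_id IS NOT NULL ORDER BY scheduled_time;"
    )


if __name__ == "__main__":
    end_time = datetime.now(timezone.utc)
    print(build_task_history_call(end_time - timedelta(hours=1), end_time, 50))
```

Validating on the client is optional; the function itself is described as returning an error for an out-of-range window.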
## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT * FROM TASK_HISTORY() order by scheduled_time; ``` The query above retrieves the most recent task history records, ordered by the scheduled\_time column. Because no RESULT\_LIMIT is given, at most 100 rows (the default) are returned; the limit can be raised to a maximum of 10,000. ```sql SELECT * FROM TASK_HISTORY( SCHEDULED_TIME_RANGE_START=>TO_TIMESTAMP('2022-01-02T01:12:00-07:00'), SCHEDULED_TIME_RANGE_END=>TO_TIMESTAMP('2022-01-02T01:12:30-07:00')); ``` The query above retrieves the task history records whose scheduled time falls between `2022-01-02T01:12:00-07:00` and `2022-01-02T01:12:30-07:00`, that is, the tasks that were scheduled to run within this 30-second window. # Sequence Functions (Lakehouse v1) > Lakehouse v1 SQL sequence functions: detect ordered sequences and event chains within partitioned data. This section provides reference information for the sequence functions in PlaidCloud Lakehouse. # NEXTVAL (Lakehouse v1) > NEXTVAL — Retrieves the next value from a sequence. Retrieves the next value from a sequence. ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql NEXTVAL() ``` ## Return Type [Section titled “Return Type”](#return-type) Integer. ## SQL Examples [Section titled “SQL Examples”](#sql-examples) This example demonstrates how the NEXTVAL function works with a sequence: ```sql CREATE SEQUENCE my_seq; SELECT NEXTVAL(my_seq), NEXTVAL(my_seq), NEXTVAL(my_seq); ┌─────────────────────────────────────────────────────┐ │ nextval(my_seq) │ nextval(my_seq) │ nextval(my_seq) │ ├─────────────────┼─────────────────┼─────────────────┤ │ 1 │ 2 │ 3 │ └─────────────────────────────────────────────────────┘ ``` This example shows how a sequence and the NEXTVAL function can generate and assign unique identifiers to rows in a table. 
```sql -- Create a new sequence named staff_id_seq CREATE SEQUENCE staff_id_seq; -- Create a new table named staff with columns for staff_id, name, and department CREATE TABLE staff ( staff_id INT, name VARCHAR(50), department VARCHAR(50) ); -- Insert a new row into the staff table, using the next value from the staff_id_seq sequence for the staff_id column INSERT INTO staff (staff_id, name, department) VALUES (NEXTVAL(staff_id_seq), 'John Doe', 'HR'); -- Insert another row into the staff table, using the next value from the staff_id_seq sequence for the staff_id column INSERT INTO staff (staff_id, name, department) VALUES (NEXTVAL(staff_id_seq), 'Jane Smith', 'Finance'); SELECT * FROM staff; ┌───────────────────────────────────────────────────────┐ │ staff_id │ name │ department │ ├─────────────────┼──────────────────┼──────────────────┤ │ 2 │ Jane Smith │ Finance │ │ 1 │ John Doe │ HR │ └───────────────────────────────────────────────────────┘ ``` # Dictionary Functions (Lakehouse v1) > Lakehouse v1 SQL dictionary functions: look up values from dictionary objects for low-latency joins and enrichment. This section provides reference information for the dictionary functions in PlaidCloud Lakehouse. # DICT_GET (Lakehouse v1) > DICT_GET — retrieves the value of a specified attribute from a dictionary using a provided. Retrieves the value of a specified attribute from a dictionary using a provided key expression. ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql DICT_GET([db_name.], '', ) ``` | Parameter | Description | | ---------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------- | | dict\_name | The name of the dictionary. | | attr\_name | The name of the attribute in the dictionary that you want to retrieve the value for. | | key\_expr | The key expression used to locate a specific entry in the dictionary. 
It represents the value of the dictionary’s primary key to retrieve the corresponding data. | ## SQL Examples [Section titled “SQL Examples”](#sql-examples) # Test Functions (Lakehouse v1) > Lakehouse v1 SQL test functions: assertion and validation helpers for testing data values. This section provides reference information for the test functions in PlaidCloud Lakehouse. # SLEEP (Lakehouse v1) > SLEEP — sleeps for the given number of seconds on each data block. Sleeps `seconds` seconds on each data block. Warning Use this function only for testing that requires a delay. ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql SLEEP(seconds) ``` ## Arguments [Section titled “Arguments”](#arguments) | Arguments | Description | | --------- | -------------------------------------------------------------- | | seconds | Must be a constant, nonnegative integer or float. | ## Return Type [Section titled “Return Type”](#return-type) UInt8 ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT sleep(2); ┌──────────┐ │ sleep(2) │ ├──────────┤ │ 0 │ └──────────┘ ``` # Other Functions (Lakehouse v1) > Lakehouse v1 SQL other functions: miscellaneous helpers that don't fit other categories. This section provides reference information for other miscellaneous functions in PlaidCloud Lakehouse. # ASSUME_NOT_NULL (Lakehouse v1) > ASSUME_NOT_NULL — results in an equivalent non-Nullable value for a Nullable type. Results in an equivalent non-`Nullable` value for a Nullable type. If the original value is `NULL`, the result is undefined. 
## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.assume_not_null() ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python With a table like: ┌────────────────────┐ │ x │ y │ ├────────────────────┤ │ 1 │ NULL │ │ 2 │ 3 │ └────────────────────┘ func.assume_not_null(y) ┌─────────────────────────┐ │ func.assume_not_null(y) │ ├─────────────────────────┤ │ 0 │ │ 3 │ └─────────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql ASSUME_NOT_NULL() ``` ## Aliases [Section titled “Aliases”](#aliases) * [REMOVE\_NULLABLE](../remove-nullable) ## Return Type [Section titled “Return Type”](#return-type) Returns the original datatype from the non-`Nullable` type; Returns the embedded non-`Nullable` datatype for `Nullable` type. ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql CREATE TABLE default.t_null ( x int, y int null); INSERT INTO default.t_null values (1, null), (2, 3); SELECT ASSUME_NOT_NULL(y), REMOVE_NULLABLE(y) FROM t_null; ┌─────────────────────────────────────────┐ │ assume_not_null(y) │ remove_nullable(y) │ ├────────────────────┼────────────────────┤ │ 0 │ 0 │ │ 3 │ 3 │ └─────────────────────────────────────────┘ ``` # EXISTS (Lakehouse v1) > EXISTS — the exists condition is used in combination with a subquery and is considered \"to be. The exists condition is used in combination with a subquery and is considered “to be met” if the subquery returns at least one row. ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql WHERE EXISTS ( ); ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT number FROM numbers(5) AS A WHERE exists (SELECT * FROM numbers(3) WHERE number=1); ┌────────┐ │ number │ ├────────┤ │ 0 │ │ 1 │ │ 2 │ │ 3 │ │ 4 │ └────────┘ ``` # GROUPING (Lakehouse v1) > GROUPING — returns a bit mask indicating which GROUP BY expressions are not included in the current grouping set. 
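As a rough illustration of the bit mask, the sketch below computes the value GROUPING would report for one grouping set. It assumes the rule documented for this function: each argument contributes one bit, the rightmost argument maps to the least-significant bit, and a bit is 1 when the expression is not part of the current grouping set. `grouping_mask` is a hypothetical helper written for this page, not a PlaidCloud function.

```python
def grouping_mask(args, grouping_set):
    """Return the GROUPING(...) bit mask for one grouping set.

    args: expressions passed to GROUPING, in left-to-right order.
    grouping_set: expressions included in the current grouping set.
    """
    mask = 0
    for expr in args:
        # Shifting left on each step leaves the rightmost argument
        # in the least-significant bit.
        mask = (mask << 1) | (0 if expr in grouping_set else 1)
    return mask


if __name__ == "__main__":
    for grouping_set in [("a", "b"), ("a",), ("b",), ()]:
        print(
            grouping_set,
            grouping_mask(["a", "b"], grouping_set),
            grouping_mask(["b", "a"], grouping_set),
        )
```

For the grouping set `(a)`, the sketch yields 1 for `grouping(a, b)` and 2 for `grouping(b, a)`, which matches the rows where `b` is NULL in the SQL example for this function.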
Returns a bit mask indicating which `GROUP BY` expressions are not included in the current grouping set. Bits are assigned with the rightmost argument corresponding to the least-significant bit; each bit is 0 if the corresponding expression is included in the grouping criteria of the grouping set generating the current result row, and 1 if it is not included. ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql GROUPING ( expr [, expr, ...] ) ``` Note `GROUPING` can only be used with `GROUPING SETS`, `ROLLUP`, or `CUBE`, and its arguments must be in the grouping sets list. ## Arguments [Section titled “Arguments”](#arguments) Grouping sets items. ## Return Type [Section titled “Return Type”](#return-type) UInt32. ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql select a, b, grouping(a), grouping(b), grouping(a,b), grouping(b,a) from t group by grouping sets ((a,b),(a),(b), ()) ; ┌──────┬──────┬─────────────┬─────────────┬────────────────┬────────────────┐ │ a │ b │ grouping(a) │ grouping(b) │ grouping(a, b) │ grouping(b, a) │ ├──────┼──────┼─────────────┼─────────────┼────────────────┼────────────────┤ │ NULL │ A │ 1 │ 0 │ 2 │ 1 │ │ a │ NULL │ 0 │ 1 │ 1 │ 2 │ │ b │ A │ 0 │ 0 │ 0 │ 0 │ │ NULL │ NULL │ 1 │ 1 │ 3 │ 3 │ │ a │ A │ 0 │ 0 │ 0 │ 0 │ │ b │ B │ 0 │ 0 │ 0 │ 0 │ │ b │ NULL │ 0 │ 1 │ 1 │ 2 │ │ a │ B │ 0 │ 0 │ 0 │ 0 │ │ NULL │ B │ 1 │ 0 │ 2 │ 1 │ └──────┴──────┴─────────────┴─────────────┴────────────────┴────────────────┘ ``` # HUMANIZE_NUMBER (Lakehouse v1) > HUMANIZE_NUMBER — returns a readable number. Returns a readable number. 
## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.humanize_number(x); ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.humanize_number(1000 * 1000) ┌─────────────────────────────────────┐ │ func.humanize_number((1000 * 1000)) │ ├─────────────────────────────────────┤ │ 1 million │ └─────────────────────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql HUMANIZE_NUMBER(x); ``` ## Arguments [Section titled “Arguments”](#arguments) | Arguments | Description | | --------- | --------------------- | | x | The number to format. | ## Return Type [Section titled “Return Type”](#return-type) String. ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT HUMANIZE_NUMBER(1000 * 1000); ┌────────────────────────────────┐ │ HUMANIZE_NUMBER((1000 * 1000)) │ ├────────────────────────────────┤ │ 1 million │ └────────────────────────────────┘ ``` # HUMANIZE_SIZE (Lakehouse v1) > HUMANIZE_SIZE — Returns the readable size with a suffix (KiB, MiB, etc.). Returns the readable size with a suffix (KiB, MiB, etc.). ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.humanize_size(x); ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.humanize_size(1024 * 1024) ┌───────────────────────────────────┐ │ func.humanize_size((1024 * 1024)) │ ├───────────────────────────────────┤ │ 1 MiB │ └───────────────────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql HUMANIZE_SIZE(x); ``` ## Arguments [Section titled “Arguments”](#arguments) | Arguments | Description | | --------- | ------------------- | | x | The numerical size. | ## Return Type [Section titled “Return Type”](#return-type) String. 
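To make the binary suffixes concrete, here is a minimal Python sketch of the conversion: divide by 1024 until the value drops below 1024, then attach the matching suffix. `humanize_size_sketch` is a hypothetical stand-in, not the Lakehouse implementation, and the rounding (two decimals with trailing zeros trimmed) is an assumption; the exact digits the engine prints may differ.

```python
def humanize_size_sketch(num_bytes):
    """Format a byte count with a binary suffix (B, KiB, MiB, ...)."""
    units = ["B", "KiB", "MiB", "GiB", "TiB", "PiB", "EiB"]
    value = float(num_bytes)
    index = 0
    # Step up one unit for every factor of 1024.
    while abs(value) >= 1024 and index < len(units) - 1:
        value /= 1024
        index += 1
    # Assumed formatting: two decimals, trailing zeros trimmed.
    text = f"{value:.2f}".rstrip("0").rstrip(".")
    return f"{text} {units[index]}"


if __name__ == "__main__":
    print(humanize_size_sketch(1024 * 1024))
    print(humanize_size_sketch(1536))
```

The first call corresponds to the `1024 * 1024` input used in the examples for this function.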
## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT HUMANIZE_SIZE(1024 * 1024); ┌──────────────────────────────┐ │ HUMANIZE_SIZE((1024 * 1024)) │ ├──────────────────────────────┤ │ 1 MiB │ └──────────────────────────────┘ ``` # IGNORE (Lakehouse v1) > IGNORE — with INSERT IGNORE, rows with invalid data that would cause an error are skipped and valid rows are inserted. With an `INSERT IGNORE` statement, rows with invalid data that would cause an error are skipped, and rows with valid data are inserted into the table. ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql INSERT ignore INTO TABLE(column_list) VALUES( value_list), ( value_list), ... ``` # REMOVE_NULLABLE (Lakehouse v1) > REMOVE_NULLABLE — alias for the ASSUME_NOT_NULL utility function. Alias for [ASSUME\_NOT\_NULL](../assume-not-null). # TO_NULLABLE (Lakehouse v1) > TO_NULLABLE — Converts a value to its nullable equivalent. Converts a value to its nullable equivalent. If the value's type can already hold NULL, the function returns the value unchanged. Otherwise, TO\_NULLABLE wraps the value in a nullable type so that it can hold NULL values. 
## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.to_nullable(x); ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.typeof(3), func.to_nullable(3), func.typeof(func.to_nullable(3)) func.typeof(3) | func.to_nullable(3) | func.typeof(func.to_nullable(3)) | -----------------+---------------------+----------------------------------+ TINYINT UNSIGNED | 3 | TINYINT UNSIGNED NULL | ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql TO_NULLABLE(x); ``` ## Arguments [Section titled “Arguments”](#arguments) | Arguments | Description | | --------- | ------------------- | | x | The original value. | ## Return Type [Section titled “Return Type”](#return-type) Returns a value of the same data type as the input value, but wrapped in a nullable container if the input value is not already nullable. ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT typeof(3), TO_NULLABLE(3), typeof(TO_NULLABLE(3)); typeof(3) |to_nullable(3)|typeof(to_nullable(3))| ----------------+--------------+----------------------+ TINYINT UNSIGNED| 3|TINYINT UNSIGNED NULL | ``` # TYPEOF (Lakehouse v1) > TYPEOF — TYPEOF function is used to return the name of a data type. TYPEOF function is used to return the name of a data type. ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.typeof( ) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.typeof(1) ┌──────────────────┐ │ func.typeof(1) │ ├──────────────────┤ │ INT │ └──────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql TYPEOF( ) ``` ## Arguments [Section titled “Arguments”](#arguments) | Arguments | Description | | --------- | ----------------------------------------------------------------------------------------------- | | `` | Any expression. This may be a column name, the result of another function, or a math operation. 
| ## Return Type [Section titled “Return Type”](#return-type) String ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT typeof(1::INT); ┌──────────────────┐ │ typeof(1::Int32) │ ├──────────────────┤ │ INT │ └──────────────────┘ ``` # Lakehouse v2 Expressions > Lakehouse v2 expressions based on StarRocks 4.1 SQL functions with SQLAlchemy references. Lakehouse v2 is built on [StarRocks](https://www.starrocks.io/) 4.1. For each function below, this site provides PlaidCloud-flavored syntax and examples; for the canonical upstream reference (with all edge cases and argument variants), see the **[StarRocks SQL function reference](https://docs.starrocks.io/docs/sql-reference/sql-functions/)**. ## Aggregate Functions [Section titled “Aggregate Functions”](#aggregate-functions) * [Aggregate Functions](./01-aggregate-functions) — Calculate summaries like sum, average, count, etc. * [Percentile Functions](./15-percentile-functions) — Calculate percentile values and manage percentile states ## Scalar Functions [Section titled “Scalar Functions”](#scalar-functions) * [Array Functions](./02-array-functions) — Perform array operations * [Binary Functions](./03-binary-functions) — Convert between binary and string formats * [Bit Functions](./04-bit-functions) — Perform bitwise operations * [Bitmap Functions](./05-bitmap-functions) — Perform bitmap operations and manipulations * [Condition Functions](./06-condition-functions) — Implement conditional logic and case statements * [Cryptographic Functions](./07-cryptographic-functions) — Encrypt, decrypt, and hash data * [Date and Time Functions](./08-date-time-functions) — Manipulate and format dates and times * [Dictionary Functions](./09-dictionary-functions) — Look up values in dictionary tables * [Hash Functions](./10-hash-functions) — Generate hash values using MurmurHash and xxHash * [JSON Functions](./11-json-functions) — Create, query, and manipulate JSON data * [Map Functions](./12-map-functions) — 
Create and manipulate map data structures * [Math Functions](./13-math-functions) — Perform calculations and numeric operations * [Pattern Matching Functions](./14-pattern-matching-functions) — Match strings with LIKE and regular expressions * [Scalar Functions (HLL)](./16-scalar-functions) — HyperLogLog cardinality estimation functions * [Spatial Functions](./17-spatial-functions) — Handle and manipulate geospatial data * [String Functions](./18-string-functions) — Manipulate strings and perform text operations * [Struct Functions](./19-struct-functions) — Create and work with struct data types * [Variant Functions](./22-variant-functions) — Query and inspect VARIANT data types ## Table Functions [Section titled “Table Functions”](#table-functions) * [Table Functions](./20-table-functions) — Return results in a tabular format ## Utility and Meta Functions [Section titled “Utility and Meta Functions”](#utility-and-meta-functions) * [Utility Functions](./21-utility-functions) — Access system information and session utilities * [Meta Functions](./23-meta-functions) — Inspect materialized views, memory, and diagnostics # Aggregate Functions (Lakehouse v2) > Lakehouse v2 SQL aggregate functions: summarise rows — SUM, AVG, MIN, MAX, COUNT, and statistical aggregates. This section provides reference information for the aggregate functions in PlaidCloud Lakehouse. 
## Functions [Section titled “Functions”](#functions) * [ANY\_VALUE](any-value/) * [APPROX\_COUNT\_DISTINCT](approx-count-distinct/) * [APPROX\_TOP\_K](approx-top-k/) * [AVG](avg/) * [BITMAP](bitmap/) * [BOOL\_OR](bool-or/) * [CORR](corr/) * [COUNT\_IF](count-if/) * [COUNT](count/) * [COVAR\_POP](covar-pop/) * [COVAR\_SAMP](covar-samp/) * [DS\_HLL\_ACCUMULATE](ds-hll-accumulate/) * [DS\_HLL\_COMBINE](ds-hll-combine/) * [DS\_HLL\_COUNT\_DISTINCT](ds-hll-count-distinct/) * [DS\_HLL\_ESTIMATE](ds-hll-estimate/) * [DS\_THETA\_COUNT\_DISTINCT](ds-theta-count-distinct/) * [GROUP\_CONCAT](group-concat/) * [GROUPING\_ID](grouping-id/) * [GROUPING](grouping/) * [HLL\_RAW\_AGG](hll-raw-agg/) * [HLL\_UNION\_AGG](hll-union-agg/) * [HLL\_UNION](hll-union/) * [MANN\_WHITNEY\_U\_TEST](mann-whitney-u-test/) * [MAX\_BY](max-by/) * [MAX](max/) * [MIN\_BY](min-by/) * [MIN](min/) * [MULTI\_DISTINCT\_COUNT](multi-distinct-count/) * [MULTI\_DISTINCT\_SUM](multi-distinct-sum/) * [PERCENTILE\_APPROX\_WEIGHT](percentile-approx-weight/) * [PERCENTILE\_APPROX](percentile-approx/) * [PERCENTILE\_CONT](percentile-cont/) * [PERCENTILE\_DISC\_LC](percentile-disc-lc/) * [PERCENTILE\_DISC](percentile-disc/) * [RETENTION](retention/) * [STD](std/) * [STDDEV\_POP](stddev-pop/) * [STDDEV\_SAMP](stddev-samp/) * [STDDEV](stddev/) * [SUM\_MAP](sum-map/) * [SUM](sum/) * [VAR\_POP](var-pop/) * [VAR\_SAMP](var-samp/) * [VARIANCE\_POP](variance-pop/) * [VARIANCE\_SAMP](variance-samp/) * [VARIANCE](variance/) * [WINDOW\_FUNNEL](window-funnel/) # ANY_VALUE (Lakehouse v2) > ANY_VALUE — Returns any arbitrary value from a group of rows. Returns any arbitrary value from a group of rows. 
## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.any_value(get_column(table, 'department')) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.any_value(get_column(table, 'department')) ┌───────┐ │ Sales │ └───────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql ANY_VALUE() ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT ANY_VALUE(department) FROM employees; ┌───────┐ │ Sales │ └───────┘ ``` # APPROX_COUNT_DISTINCT (Lakehouse v2) > APPROX_COUNT_DISTINCT — returns an approximate count of distinct values using HyperLogLog. Returns an approximate count of distinct values using HyperLogLog. ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.approx_count_distinct(get_column(table, 'user_id')) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.approx_count_distinct(get_column(table, 'user_id')) ┌─────┐ │ 985 │ └─────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql APPROX_COUNT_DISTINCT() ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT APPROX_COUNT_DISTINCT(user_id) FROM page_views; ┌─────┐ │ 985 │ └─────┘ ``` # APPROX_TOP_K (Lakehouse v2) > APPROX_TOP_K — returns the top-k most frequent values and their approximate counts. Returns the top-k most frequent values and their approximate counts. 
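When checking approximate results on a small sample, an exact count gives a useful baseline. The following is a minimal pure-Python sketch, not the Lakehouse implementation. `exact_top_k` is a hypothetical helper name, the `item` and `count` keys only mirror the output shape shown in the examples for this function, and skipping nulls is an assumption.

```python
from collections import Counter

def exact_top_k(values, k):
    """Return the k most frequent values with their exact counts."""
    # Assumption: nulls (None) are ignored, as most SQL aggregates do.
    counts = Counter(v for v in values if v is not None)
    # most_common(k) yields (value, count) pairs in descending count order.
    return [{"item": item, "count": count} for item, count in counts.most_common(k)]

cities = ["NYC", "LA", "NYC", "Chicago", "NYC", "LA"]
top_two = exact_top_k(cities, 2)
```

An approximate top-k may report counts that differ slightly from this exact baseline, especially for values near the cutoff.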
## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.approx_top_k(get_column(table, 'city'), 3) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.approx_top_k(get_column(table, 'city'), 3) ┌──────────────────────────────┐ │ [{"item":"NYC","count":150}] │ └──────────────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql APPROX_TOP_K(, 3) ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT APPROX_TOP_K(city, 3) FROM customers; ┌───────────────────────────────────┐ │ [{"item":"New York","count":150}] │ └───────────────────────────────────┘ ``` # AVG (Lakehouse v2) > AVG — returns the average value of a numeric column. Returns the average value of a numeric column. ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.avg(get_column(table, 'salary')) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.avg(get_column(table, 'salary')) ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql AVG() ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT department, AVG(salary) FROM employees GROUP BY department; ┌────────────┬──────────────┐ │ department │ avg(salary) │ ├────────────┼──────────────┤ │ Sales │ 65000.00 │ │ IT │ 82000.00 │ └────────────┴──────────────┘ ``` # BITMAP (Lakehouse v2) > BITMAP — returns a bitmap union of a set of values. Typically used with BITMAP_AGG. Returns a bitmap union of a set of values. Typically used with BITMAP\_AGG. 
## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.bitmap_agg(get_column(table, 'id')) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.bitmap_agg(get_column(table, 'id')) ┌──────────┐ │ (bitmap) │ └──────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql BITMAP() ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT BITMAP_COUNT(BITMAP_AGG(id)) FROM user_tags; ┌─────┐ │ 500 │ └─────┘ ``` # BOOL_OR (Lakehouse v2) > BOOL_OR — returns TRUE if any value in the group is TRUE. Returns TRUE if any value in the group is TRUE. ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.bool_or(get_column(table, 'is_active')) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.bool_or(get_column(table, 'is_active')) ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql BOOL_OR() ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT department, BOOL_OR(is_active) FROM employees GROUP BY department; ┌────────────┬─────────────────────┐ │ department │ bool_or(is_active) │ ├────────────┼─────────────────────┤ │ Sales │ 1 │ │ IT │ 1 │ └────────────┴─────────────────────┘ ``` # CORR (Lakehouse v2) > CORR — returns the Pearson correlation coefficient between two expressions. Returns the Pearson correlation coefficient between two expressions. 
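For reference, the Pearson coefficient is the sum of cross-deviations divided by the square root of the product of the summed squared deviations. The sketch below computes it in plain Python. `pearson_corr` is a hypothetical helper, and returning `None` for a constant input is an assumption, not documented Lakehouse behavior.

```python
import math

def pearson_corr(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 rows")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sums of squared deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return None  # undefined when either input is constant (assumed handling)
    return sxy / math.sqrt(sxx * syy)

revenue = [10.0, 20.0, 30.0, 40.0]
ad_spend = [1.0, 2.0, 3.0, 4.0]
r = pearson_corr(revenue, ad_spend)  # perfectly linear, so r is 1.0
```

The result always lies between -1 and 1. Values near 0 indicate little linear relationship.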
## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.corr(get_column(table, 'revenue'), get_column(table, 'ad_spend')) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.corr(get_column(table, 'revenue'), get_column(table, 'ad_spend')) ┌───────┐ │ 0.872 │ └───────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql CORR(, ) ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT CORR(revenue, ad_spend) FROM marketing; ┌───────┐ │ 0.872 │ └───────┘ ``` # COUNT (Lakehouse v2) > COUNT — returns the number of rows or non-NULL values. Returns the number of rows or non-NULL values. ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.count() ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.count() ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql COUNT() ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT COUNT(*) FROM employees; ┌──────────┐ │ count(*) │ ├──────────┤ │ 1000 │ └──────────┘ SELECT COUNT(DISTINCT department) FROM employees; ┌──────────────────────────────┐ │ count(distinct department) │ ├──────────────────────────────┤ │ 5 │ └──────────────────────────────┘ ``` # COUNT_IF (Lakehouse v2) > COUNT_IF — returns the number of rows for which the expression is TRUE. Returns the number of rows for which the expression is TRUE. 
## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.count_if(get_column(table, 'salary') > 80000) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.count_if(get_column(table, 'salary') > 80000) ┌────┐ │ 42 │ └────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql COUNT_IF( > 80000) ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT COUNT_IF(salary > 80000) FROM employees; ┌────┐ │ 42 │ └────┘ ``` # COVAR_POP (Lakehouse v2) > COVAR_POP — Returns the population covariance of two expressions. Returns the population covariance of two expressions. ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.covar_pop(get_column(table, 'y'), get_column(table, 'x')) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.covar_pop(get_column(table, 'height'), get_column(table, 'weight')) ┌────────┐ │ 102.46 │ └────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql COVAR_POP(, ) ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT COVAR_POP(height, weight) FROM measurements; ┌────────┐ │ 102.46 │ └────────┘ ``` # COVAR_SAMP (Lakehouse v2) > COVAR_SAMP — Returns the sample covariance of two expressions. Returns the sample covariance of two expressions. 
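Sample and population covariance differ only in the divisor: the sample form divides by `n - 1`, the population form by `n`. The sketch below illustrates this in plain Python. `covariance` is a hypothetical helper, and returning `None` for a single row is an assumption.

```python
def covariance(xs, ys, sample=False):
    """Covariance of two equal-length sequences (population by default)."""
    n = len(xs)
    if n != len(ys) or n == 0:
        raise ValueError("need two equal-length, non-empty sequences")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    total = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    divisor = n - 1 if sample else n
    if divisor == 0:
        return None  # sample covariance is undefined for one row (assumed handling)
    return total / divisor

height = [150, 160, 170, 180]
weight = [50, 60, 65, 85]
pop = covariance(height, weight)                # 550 / 4 = 137.5
samp = covariance(height, weight, sample=True)  # 550 / 3
```

With few rows the two results differ noticeably. As the row count grows they converge.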
## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.covar_samp(get_column(table, 'y'), get_column(table, 'x')) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.covar_samp(get_column(table, 'height'), get_column(table, 'weight')) ┌────────┐ │ 103.01 │ └────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql COVAR_SAMP(, ) ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT COVAR_SAMP(height, weight) FROM measurements; ┌────────┐ │ 103.01 │ └────────┘ ``` # DS_HLL_ACCUMULATE (Lakehouse v2) > Use the DS_HLL_ACCUMULATE aggregate function in PlaidCloud Lakehouse. Accumulates values into a DataSketches HLL sketch for approximate distinct counting. Accumulates values into a DataSketches HLL sketch for approximate distinct counting. ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.ds_hll_accumulate(get_column(table, 'user_id')) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.ds_hll_accumulate(get_column(table, 'user_id')) ┌──────────────┐ │ (hll sketch) │ └──────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql DS_HLL_ACCUMULATE() ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT DS_HLL_ESTIMATE(DS_HLL_ACCUMULATE(user_id)) FROM visits; ┌──────┐ │ 9856 │ └──────┘ ``` # DS_HLL_COMBINE (Lakehouse v2) > DS_HLL_COMBINE — combines multiple DataSketches HLL sketches into a single sketch. Combines multiple DataSketches HLL sketches into a single sketch. 
## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.ds_hll_combine(get_column(table, 'hll_sketch')) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.ds_hll_combine(get_column(table, 'hll_sketch')) ┌───────────────────┐ │ (combined sketch) │ └───────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql DS_HLL_COMBINE() ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT DS_HLL_ESTIMATE(DS_HLL_COMBINE(sketch_col)) FROM daily_sketches; ┌───────┐ │ 25000 │ └───────┘ ``` # DS_HLL_COUNT_DISTINCT (Lakehouse v2) > Use the DS_HLL_COUNT_DISTINCT aggregate function in PlaidCloud Lakehouse. Returns an approximate distinct count using the DataSketches HLL algorithm. Returns an approximate distinct count using the DataSketches HLL algorithm. More accurate than `APPROX_COUNT_DISTINCT`. ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.ds_hll_count_distinct(get_column(table, 'user_id')) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.ds_hll_count_distinct(get_column(table, 'user_id')) ┌───────┐ │ 10042 │ └───────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql DS_HLL_COUNT_DISTINCT() ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT DS_HLL_COUNT_DISTINCT(user_id) FROM page_views; ┌───────┐ │ 10042 │ └───────┘ ``` # DS_HLL_ESTIMATE (Lakehouse v2) > DS_HLL_ESTIMATE — estimates the cardinality from a DataSketches HLL sketch. Estimates the cardinality from a DataSketches HLL sketch.
## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.ds_hll_estimate(get_column(table, 'hll_sketch')) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.ds_hll_estimate(get_column(table, 'hll_sketch')) ┌──────┐ │ 9856 │ └──────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql DS_HLL_ESTIMATE() ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT DS_HLL_ESTIMATE(DS_HLL_ACCUMULATE(user_id)) FROM visits; ┌──────┐ │ 9856 │ └──────┘ ``` # DS_THETA_COUNT_DISTINCT (Lakehouse v2) > Use the DS_THETA_COUNT_DISTINCT aggregate function in PlaidCloud Lakehouse. Returns an approximate distinct count using the DataSketches Theta algorithm. Returns an approximate distinct count using the DataSketches Theta algorithm. Supports set operations like intersection and difference. ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.ds_theta_count_distinct(get_column(table, 'user_id')) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.ds_theta_count_distinct(get_column(table, 'user_id')) ┌───────┐ │ 10035 │ └───────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql DS_THETA_COUNT_DISTINCT() ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT DS_THETA_COUNT_DISTINCT(user_id) FROM page_views; ┌───────┐ │ 10035 │ └───────┘ ``` # GROUP_CONCAT (Aggregate, Lakehouse v2) > GROUP_CONCAT — concatenates values from a group into a single string with a separator. Concatenates values from a group into a single string with a separator.
## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.group_concat(get_column(table, 'name'), literal(',')) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.group_concat(get_column(table, 'name')) ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql GROUP_CONCAT(, ',') ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT department, GROUP_CONCAT(name ORDER BY name SEPARATOR ', ') FROM employees GROUP BY department; ┌────────────┬──────────────────────────┐ │ department │ group_concat(name) │ ├────────────┼──────────────────────────┤ │ Sales │ Alice, Bob, Charlie │ │ IT │ Dave, Eve │ └────────────┴──────────────────────────┘ ``` # GROUPING (Lakehouse v2) > Use the GROUPING aggregate function in PlaidCloud Lakehouse. Indicates whether a specified column in a GROUP BY clause is aggregated. Returns 1 if aggregated, 0 otherwise. Indicates whether a specified column in a GROUP BY clause is aggregated. Returns 1 if aggregated, 0 otherwise. ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.grouping(get_column(table, 'department')) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.grouping(get_column(table, 'department')) ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql GROUPING() ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT department, GROUPING(department), SUM(salary) FROM employees GROUP BY ROLLUP(department); ┌────────────┬───────────────────────┬─────────────┐ │ department │ grouping(department) │ sum(salary) │ ├────────────┼───────────────────────┼─────────────┤ │ Sales │ 0 │ 195000 │ │ IT │ 0 │ 246000 │ │ NULL │ 1 │ 441000 │ └────────────┴───────────────────────┴─────────────┘ ``` # GROUPING_ID (Lakehouse v2) > GROUPING_ID — returns a bitmask corresponding to the grouping of columns.
Returns a bitmask corresponding to the grouping of columns. ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.grouping_id(get_column(table, 'a'), get_column(table, 'b')) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.grouping_id(get_column(table, 'department'), get_column(table, 'year')) ┌───┐ │ 0 │ └───┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql GROUPING_ID(, ) ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT department, year, GROUPING_ID(department, year), SUM(salary) FROM employees GROUP BY ROLLUP(department, year); ┌───┐ │ 0 │ └───┘ ``` # HLL_RAW_AGG (Lakehouse v2) > HLL_RAW_AGG — Aggregates HLL values into a single HLL value. Aggregates HLL values into a single HLL value. ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.hll_raw_agg(get_column(table, 'hll_col')) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.hll_raw_agg(get_column(table, 'hll_col')) ┌─────────────┐ │ (hll value) │ └─────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql HLL_RAW_AGG() ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT HLL_CARDINALITY(HLL_RAW_AGG(hll_col)) FROM sketches; ┌──────┐ │ 5000 │ └──────┘ ``` # HLL_UNION (Lakehouse v2) > HLL_UNION — returns the union of multiple HLL values. Returns the union of multiple HLL values. 
## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.hll_union(get_column(table, 'hll_col')) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.hll_union(get_column(table, 'hll_col')) ┌─────────────┐ │ (hll value) │ └─────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql HLL_UNION() ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT HLL_CARDINALITY(HLL_UNION(hll_col)) FROM daily_sketches; ┌───────┐ │ 12500 │ └───────┘ ``` # HLL_UNION_AGG (Lakehouse v2) > HLL_UNION_AGG — Aggregates HLL values by computing the union. Aggregates HLL values by computing the union. ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.hll_union_agg(get_column(table, 'hll_col')) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.hll_union_agg(get_column(table, 'hll_col')) ┌─────────────┐ │ (hll value) │ └─────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql HLL_UNION_AGG() ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT HLL_CARDINALITY(HLL_UNION_AGG(hll_col)) FROM segments; ┌──────┐ │ 8000 │ └──────┘ ``` # MANN_WHITNEY_U_TEST (Lakehouse v2) > MANN_WHITNEY_U_TEST — performs a Mann-Whitney U test on two independent samples. Performs a Mann-Whitney U test on two independent samples. 
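The U statistic behind this test can be computed from rank sums. The sketch below is a plain-Python illustration under stated assumptions: `average_ranks` and `mann_whitney_u` are hypothetical helpers, the second argument is treated as a boolean flag marking the second sample, and no p-value is computed because this page does not say how the engine derives it.

```python
def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def mann_whitney_u(values, in_second_sample):
    """Return (U1, U2) for the first and second sample."""
    ranks = average_ranks(values)
    n1 = sum(1 for flag in in_second_sample if not flag)
    n2 = len(values) - n1
    rank_sum_1 = sum(r for r, flag in zip(ranks, in_second_sample) if not flag)
    # U1 counts pairs where a first-sample value exceeds a second-sample
    # value, with ties counting one half.
    u1 = rank_sum_1 - n1 * (n1 + 1) / 2
    u2 = n1 * n2 - u1
    return u1, u2

scores = [1, 2, 3, 4, 5, 6]
treated = [False, False, False, True, True, True]
u_pair = mann_whitney_u(scores, treated)
```

Which of the two U values an engine reports varies by implementation, so the sketch returns both.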
## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.mann_whitney_u_test(get_column(table, 'sample'), get_column(table, 'treatment')) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.mann_whitney_u_test(get_column(table, 'score'), get_column(table, 'group_id')) ┌─────────────────────────────┐ │ {"U":245.0,"p-value":0.032} │ └─────────────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql MANN_WHITNEY_U_TEST(, ) ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT MANN_WHITNEY_U_TEST(score, group_id) FROM experiment; ┌─────────────────────────────┐ │ {"U":245.0,"p-value":0.032} │ └─────────────────────────────┘ ``` # MAX (Lakehouse v2) > MAX — returns the maximum value in a set of values. Returns the maximum value in a set of values. ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.max(get_column(table, 'salary')) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.max(get_column(table, 'salary')) ┌────────┐ │ 150000 │ └────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql MAX() ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT MAX(salary) FROM employees; ┌────────┐ │ 150000 │ └────────┘ ``` # MAX_BY (Lakehouse v2) > MAX_BY — returns the value of one column associated with the maximum value of another column. Returns the value of one column associated with the maximum value of another column. 
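Conceptually, this picks the row with the largest ordering value and returns a different column from that row. The following plain-Python sketch shows the idea. `max_by` is a hypothetical helper, and skipping rows whose ordering value is null is an assumption.

```python
def max_by(rows, value_key, order_key):
    """Return rows[value_key] from the row with the largest rows[order_key]."""
    # Assumption: rows with a null ordering value are ignored.
    candidates = [row for row in rows if row[order_key] is not None]
    if not candidates:
        return None
    best = max(candidates, key=lambda row: row[order_key])
    return best[value_key]

employees = [
    {"name": "Bob", "salary": 35000},
    {"name": "Alice", "salary": 150000},
    {"name": "Carol", "salary": None},
]
top_earner = max_by(employees, "name", "salary")
```

When several rows share the maximum, which one is returned should be treated as unspecified.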
## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.max_by(get_column(table, 'name'), get_column(table, 'salary')) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.max_by(get_column(table, 'name'), get_column(table, 'salary')) ┌───────┐ │ Alice │ └───────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql MAX_BY(, ) ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT MAX_BY(name, salary) FROM employees; ┌───────┐ │ Alice │ └───────┘ ``` # MIN (Lakehouse v2) > MIN — returns the minimum value in a set of values. Returns the minimum value in a set of values. ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.min(get_column(table, 'salary')) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.min(get_column(table, 'salary')) ┌───────┐ │ 35000 │ └───────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql MIN() ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT MIN(salary) FROM employees; ┌───────┐ │ 35000 │ └───────┘ ``` # MIN_BY (Lakehouse v2) > MIN_BY — returns the value of one column associated with the minimum value of another column. Returns the value of one column associated with the minimum value of another column. 
## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.min_by(get_column(table, 'name'), get_column(table, 'salary')) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.min_by(get_column(table, 'name'), get_column(table, 'salary')) ┌─────┐ │ Bob │ └─────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql MIN_BY(, ) ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT MIN_BY(name, salary) FROM employees; ┌─────┐ │ Bob │ └─────┘ ``` # MULTI_DISTINCT_COUNT (Lakehouse v2) > MULTI_DISTINCT_COUNT — returns the count of distinct values. Equivalent to COUNT(DISTINCT). Returns the count of distinct values. Equivalent to COUNT(DISTINCT). ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.multi_distinct_count(get_column(table, 'dept')) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.multi_distinct_count(get_column(table, 'department')) ┌───┐ │ 5 │ └───┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql MULTI_DISTINCT_COUNT() ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT MULTI_DISTINCT_COUNT(department) FROM employees; ┌───┐ │ 5 │ └───┘ ``` # MULTI_DISTINCT_SUM (Lakehouse v2) > MULTI_DISTINCT_SUM — returns the sum of distinct values. Returns the sum of distinct values. 
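The difference from a plain sum is that each distinct value contributes once. A minimal plain-Python sketch follows. `multi_distinct_sum` is a hypothetical helper, and ignoring nulls is an assumption.

```python
def multi_distinct_sum(values):
    """Sum each distinct non-null value exactly once."""
    distinct = {v for v in values if v is not None}
    return sum(distinct) if distinct else None

bonuses = [5000, 5000, 10000, None, 10000, 10000]
distinct_total = multi_distinct_sum(bonuses)             # 5000 + 10000
plain_total = sum(b for b in bonuses if b is not None)   # every row counted
```

Here the distinct sum is 15000 while the plain sum is 40000, so the two are not interchangeable when values repeat.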
## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.multi_distinct_sum(get_column(table, 'bonus')) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.multi_distinct_sum(get_column(table, 'bonus')) ┌───────┐ │ 25000 │ └───────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql MULTI_DISTINCT_SUM() ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT MULTI_DISTINCT_SUM(bonus) FROM employees; ┌───────┐ │ 25000 │ └───────┘ ``` # PERCENTILE_APPROX (Lakehouse v2) > PERCENTILE_APPROX — returns an approximate percentile value using the t-digest algorithm. Returns an approximate percentile value using the t-digest algorithm. ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.percentile_approx(get_column(table, 'response_time'), 0.95) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.percentile_approx(get_column(table, 'response_time'), 0.95) ┌───────┐ │ 245.3 │ └───────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql PERCENTILE_APPROX(, 0.95) ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT PERCENTILE_APPROX(response_time, 0.95) FROM requests; ┌───────┐ │ 245.3 │ └───────┘ ``` # PERCENTILE_APPROX_WEIGHT (Lakehouse v2) > PERCENTILE_APPROX_WEIGHT — returns a weighted approximate percentile value. Returns a weighted approximate percentile value. 
## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.percentile_approx_weight(get_column(table, 'val'), get_column(table, 'weight'), 0.5) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.percentile_approx_weight(get_column(table, 'val'), get_column(table, 'weight'), 0.5) ┌──────┐ │ 72.5 │ └──────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql PERCENTILE_APPROX_WEIGHT(, , 0.5) ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT PERCENTILE_APPROX_WEIGHT(val, weight, 0.5) FROM data; ┌──────┐ │ 72.5 │ └──────┘ ``` # PERCENTILE_CONT (Aggregate, Lakehouse v2) > PERCENTILE_CONT — returns an interpolated percentile value based on a continuous distribution. Returns an interpolated percentile value based on a continuous distribution. ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.percentile_cont(0.5) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.percentile_cont(0.5) ┌──────────┐ │ 72500.00 │ └──────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql PERCENTILE_CONT(0.5) ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY salary) FROM employees; ┌──────────┐ │ 72500.00 │ └──────────┘ ``` # PERCENTILE_DISC (Aggregate, Lakehouse v2) > Use the PERCENTILE_DISC aggregate function in PlaidCloud Lakehouse. Returns the smallest value whose cumulative distribution is >= the specified percentile. Returns the smallest value whose cumulative distribution is >= the specified percentile. 
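The definition above can be read as: sort the values, then take the first one whose position `k` satisfies `k / n >= p`. The sketch below implements that reading in plain Python and contrasts it with the interpolating behavior described for `PERCENTILE_CONT`. Both helpers are hypothetical, they follow the standard SQL definitions, and the engine's handling of nulls and edge cases is assumed.

```python
import math

def percentile_disc(values, p):
    """Smallest value whose cumulative distribution is >= p."""
    ordered = sorted(v for v in values if v is not None)
    if not ordered:
        return None
    # The k-th sorted value (1-based) has cumulative distribution k / n.
    k = max(1, math.ceil(p * len(ordered)))
    return ordered[k - 1]

def percentile_cont(values, p):
    """Linear interpolation between the two nearest sorted values."""
    ordered = sorted(v for v in values if v is not None)
    if not ordered:
        return None
    pos = p * (len(ordered) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)

salaries = [60000, 72000, 73000, 90000]
median_disc = percentile_disc(salaries, 0.5)  # an actual row value
median_cont = percentile_cont(salaries, 0.5)  # may fall between rows
```

The discrete form always returns a value present in the data, which matters for columns where interpolated values are meaningless.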
## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.percentile_disc(0.5) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.percentile_disc(0.5) ┌───────┐ │ 72000 │ └───────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql PERCENTILE_DISC(0.5) ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT PERCENTILE_DISC(0.5) WITHIN GROUP (ORDER BY salary) FROM employees; ┌───────┐ │ 72000 │ └───────┘ ``` # PERCENTILE_DISC_LC (Lakehouse v2) > PERCENTILE_DISC_LC — returns the percentile value using a low-cardinality optimized algorithm. Returns the percentile value using a low-cardinality optimized algorithm. ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.percentile_disc_lc(0.5) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.percentile_disc_lc(0.5) ┌───────┐ │ 72000 │ └───────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql PERCENTILE_DISC_LC(0.5) ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT PERCENTILE_DISC_LC(0.5) WITHIN GROUP (ORDER BY salary) FROM employees; ┌───────┐ │ 72000 │ └───────┘ ``` # RETENTION (Lakehouse v2) > RETENTION — calculates retention for a set of events. Returns an array of 0s and 1s. Calculates retention for a set of events. Returns an array of 0s and 1s. 
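One common reading of this kind of function, assumed here rather than confirmed by this page, is: the first flag is 1 if any row in the group meets the first condition, and each later flag is 1 only if both the first condition and that condition are met somewhere in the group. The plain-Python sketch below follows that reading. `retention` here is a hypothetical helper that takes predicates.

```python
def retention(rows, conditions):
    """Return a list of 0/1 flags, one per condition, for one group of rows."""
    if not conditions:
        return []
    # hit[i] is True if any row in the group satisfies condition i.
    hit = [any(cond(row) for row in rows) for cond in conditions]
    # Later flags only count when the first condition was also met.
    return [int(hit[0])] + [int(hit[0] and h) for h in hit[1:]]

conditions = [
    lambda row: row["dt"] == "2024-01-01",
    lambda row: row["dt"] == "2024-01-02",
    lambda row: row["dt"] == "2024-01-03",
]
user_1 = [{"dt": "2024-01-01"}, {"dt": "2024-01-02"}]
user_4 = [{"dt": "2024-01-02"}]
flags_1 = retention(user_1, conditions)
flags_4 = retention(user_4, conditions)
```

Under this reading a user who never met the first condition gets all zeros, even if later conditions were met.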
## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.retention(cond1, cond2) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.retention(get_column(table, 'day1'), get_column(table, 'day2')) ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql RETENTION(cond1, cond2) ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT uid, RETENTION(dt='2024-01-01', dt='2024-01-02', dt='2024-01-03') FROM events GROUP BY uid; ┌─────┬────────────┐ │ uid │ retention │ ├─────┼────────────┤ │ 1 │ [1,1,0] │ │ 2 │ [1,0,0] │ │ 3 │ [1,1,1] │ └─────┴────────────┘ ``` # STD (Lakehouse v2) > STD — returns the population standard deviation. Alias for `STDDEV_POP`. Returns the population standard deviation. Alias for `STDDEV_POP`. ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.std(get_column(table, 'score')) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.std(get_column(table, 'score')) ┌───────┐ │ 15.32 │ └───────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql STD() ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT STD(score) FROM test_results; ┌───────┐ │ 15.32 │ └───────┘ ``` # STDDEV (Lakehouse v2) > STDDEV — returns the population standard deviation. Alias for `STDDEV_POP`. Returns the population standard deviation. Alias for `STDDEV_POP`. 
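Population and sample standard deviation differ by the divisor under the square root (`n` versus `n - 1`). The sketch below uses Python's standard `statistics` module to show both on the same data. `spread` is a hypothetical helper, and the data is made up for illustration.

```python
import math
import statistics

def spread(values):
    """Population and sample standard deviation of the same values."""
    return {
        "stddev_pop": statistics.pstdev(values),   # divides by n
        "stddev_samp": statistics.stdev(values),   # divides by n - 1
    }

scores = [2, 4, 4, 4, 5, 5, 7, 9]
result = spread(scores)
# The two are related by a factor of sqrt(n / (n - 1)).
expected_ratio = math.sqrt(len(scores) / (len(scores) - 1))
```

The sample form is always the larger of the two for the same data, which is worth remembering when comparing `STDDEV` results with `STDDEV_SAMP`.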
## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.stddev(get_column(table, 'score')) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.stddev(get_column(table, 'score')) ┌───────┐ │ 15.32 │ └───────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql STDDEV() ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT STDDEV(score) FROM test_results; ┌───────┐ │ 15.32 │ └───────┘ ``` # STDDEV_POP (Lakehouse v2) > STDDEV_POP — returns the population standard deviation. Returns the population standard deviation. ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.stddev_pop(get_column(table, 'score')) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.stddev_pop(get_column(table, 'score')) ┌───────┐ │ 15.32 │ └───────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql STDDEV_POP() ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT STDDEV_POP(score) FROM test_results; ┌───────┐ │ 15.32 │ └───────┘ ``` # STDDEV_SAMP (Lakehouse v2) > STDDEV_SAMP — returns the sample standard deviation. Returns the sample standard deviation. ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.stddev_samp(get_column(table, 'score')) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.stddev_samp(get_column(table, 'score')) ┌───────┐ │ 15.89 │ └───────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql STDDEV_SAMP() ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT STDDEV_SAMP(score) FROM test_results; ┌───────┐ │ 15.89 │ └───────┘ ``` # SUM (Lakehouse v2) > SUM — returns the sum of all values in a group. Returns the sum of all values in a group. 
## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.sum(get_column(table, 'amount')) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.sum(get_column(table, 'amount')) ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql SUM() ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT department, SUM(salary) FROM employees GROUP BY department; ┌────────────┬─────────────┐ │ department │ sum(salary) │ ├────────────┼─────────────┤ │ Sales │ 195000 │ │ IT │ 246000 │ │ HR │ 210000 │ └────────────┴─────────────┘ ``` # SUM_MAP (Lakehouse v2) > SUM_MAP — sums values grouped by keys in map columns. Sums values grouped by keys in map columns. ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.sum_map(get_column(table, 'key_col'), get_column(table, 'val_col')) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.sum_map(get_column(table, 'keys'), get_column(table, 'values')) ┌─────────────────┐ │ {'a':10,'b':20} │ └─────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql SUM_MAP(, ) ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT SUM_MAP(keys, values) FROM metrics; ┌─────────────────┐ │ {"a":10,"b":20} │ └─────────────────┘ ``` # VAR_POP (Lakehouse v2) > VAR_POP — Returns the population variance. Alias for `VARIANCE_POP`. Returns the population variance. Alias for `VARIANCE_POP`. 
## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.var_pop(get_column(table, 'score')) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.var_pop(get_column(table, 'score')) ┌────────┐ │ 234.72 │ └────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql VAR_POP() ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT VAR_POP(score) FROM test_results; ┌────────┐ │ 234.72 │ └────────┘ ``` # VAR_SAMP (Lakehouse v2) > VAR_SAMP — Returns the sample variance. Alias for `VARIANCE_SAMP`. Returns the sample variance. Alias for `VARIANCE_SAMP`. ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.var_samp(get_column(table, 'score')) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.var_samp(get_column(table, 'score')) ┌────────┐ │ 252.48 │ └────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql VAR_SAMP() ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT VAR_SAMP(score) FROM test_results; ┌────────┐ │ 252.48 │ └────────┘ ``` # VARIANCE (Lakehouse v2) > VARIANCE — Returns the population variance. Alias for `VARIANCE_POP`. Returns the population variance. Alias for `VARIANCE_POP`. ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.variance(get_column(table, 'score')) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.variance(get_column(table, 'score')) ┌────────┐ │ 234.72 │ └────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql VARIANCE() ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT VARIANCE(score) FROM test_results; ┌────────┐ │ 234.72 │ └────────┘ ``` # VARIANCE_POP (Lakehouse v2) > VARIANCE_POP — returns the population variance. Returns the population variance. 
## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.variance_pop(get_column(table, 'score')) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.variance_pop(get_column(table, 'score')) ┌────────┐ │ 234.72 │ └────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql VARIANCE_POP(<expr>) ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT VARIANCE_POP(score) FROM test_results; ┌────────┐ │ 234.72 │ └────────┘ ``` # VARIANCE_SAMP (Lakehouse v2) > VARIANCE_SAMP — returns the sample variance. Returns the sample variance. ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.variance_samp(get_column(table, 'score')) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.variance_samp(get_column(table, 'score')) ┌────────┐ │ 252.48 │ └────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql VARIANCE_SAMP(<expr>) ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT VARIANCE_SAMP(score) FROM test_results; ┌────────┐ │ 252.48 │ └────────┘ ``` # WINDOW_FUNNEL (Lakehouse v2) > Use the WINDOW_FUNNEL aggregate function in PlaidCloud Lakehouse. Searches for event chains in a time-ordered sequence and returns the maximum chain length. Searches for event chains in a time-ordered sequence and returns the maximum chain length. 
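To make "maximum chain length" concrete, here is a minimal plain-Python sketch of funnel matching. It assumes the window is measured in seconds from the first matched step and that events arrive sorted by timestamp; both are assumptions for illustration, and the helper name and event tuples are hypothetical rather than part of the Lakehouse API.

```python
def window_funnel(window, events, steps):
    """Return the longest prefix of `steps` matched in order.

    events: list of (timestamp_seconds, event_name), sorted by timestamp.
    steps:  ordered list of event names that make up the funnel.
    A chain starts at any event matching steps[0]; later steps count only
    if they occur within `window` seconds of that starting event.
    """
    best = 0
    for i, (start_ts, name) in enumerate(events):
        if name != steps[0]:
            continue
        level = 1  # the first step has matched
        for ts, later_name in events[i + 1:]:
            if ts - start_ts > window:
                break  # outside the window opened by the first step
            if level < len(steps) and later_name == steps[level]:
                level += 1
        best = max(best, level)
    return best


funnel = ["view", "cart", "purchase"]
full = [(0, "view"), (100, "cart"), (200, "purchase")]
late_purchase = [(0, "view"), (100, "cart"), (90000, "purchase")]
view_only = [(0, "view")]

assert window_funnel(86400, full, funnel) == 3
assert window_funnel(86400, late_purchase, funnel) == 2
assert window_funnel(86400, view_only, funnel) == 1
```

Under these assumptions, a user who completes every step inside the window scores the full funnel length, while a user whose purchase falls outside the 86400-second window stops at the last step reached in time.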
## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.window_funnel(86400, get_column(table, 'ts'), cond1, cond2) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.window_funnel(86400, get_column(table, 'event_time'), get_column(table, 'step1'), get_column(table, 'step2')) ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql WINDOW_FUNNEL(86400, <timestamp>, cond1, cond2) ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT user_id, WINDOW_FUNNEL(86400, event_time, event = 'view', event = 'cart', event = 'purchase') FROM user_events GROUP BY user_id; ┌─────────┬─────────────────┐ │ user_id │ window_funnel │ ├─────────┼─────────────────┤ │ 1 │ 3 │ │ 2 │ 2 │ │ 3 │ 1 │ └─────────┴─────────────────┘ ``` # Array Functions (Lakehouse v2) > Lakehouse v2 SQL array functions: build, query, transform, and aggregate array values. This section provides reference information for the array functions in PlaidCloud Lakehouse. 
## Functions [Section titled “Functions”](#functions) * [ALL\_MATCH](all-match/) * [ANY\_MATCH](any-match/) * [ARRAY\_AGG](array-agg/) * [ARRAY\_APPEND](array-append/) * [ARRAY\_AVG](array-avg/) * [ARRAY\_CONCAT](array-concat/) * [ARRAY\_CONTAINS\_ALL](array-contains-all/) * [ARRAY\_CONTAINS\_SEQ](array-contains-seq/) * [ARRAY\_CONTAINS](array-contains/) * [ARRAY\_CUM\_SUM](array-cum-sum/) * [ARRAY\_DIFFERENCE](array-difference/) * [ARRAY\_DISTINCT](array-distinct/) * [ARRAY\_FILTER](array-filter/) * [ARRAY\_FLATTEN](array-flatten/) * [ARRAY\_GENERATE](array-generate/) * [ARRAY\_INTERSECT](array-intersect/) * [ARRAY\_JOIN](array-join/) * [ARRAY\_LENGTH](array-length/) * [ARRAY\_MAP](array-map/) * [ARRAY\_MAX](array-max/) * [ARRAY\_MIN](array-min/) * [ARRAY\_POSITION](array-position/) * [ARRAY\_REMOVE](array-remove/) * [ARRAY\_REPEAT](array-repeat/) * [ARRAY\_SLICE](array-slice/) * [ARRAY\_SORT](array-sort/) * [ARRAY\_SORTBY](array-sortby/) * [ARRAY\_SUM](array-sum/) * [ARRAY\_TO\_BITMAP](array-to-bitmap/) * [ARRAY\_TOP\_N](array-top-n/) * [ARRAY\_UNIQUE\_AGG](array-unique-agg/) * [ARRAYS\_OVERLAP](arrays-overlap/) * [ARRAYS\_ZIP](arrays-zip/) * [CARDINALITY](cardinality/) * [ELEMENT\_AT](element-at/) * [REVERSE](reverse/) * [UNNEST](unnest/) # ALL_MATCH (Lakehouse v2) > ALL_MATCH — returns TRUE if all elements in an array match the given predicate. Returns TRUE if all elements in an array match the given predicate. 
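As a rough mental model, `ALL_MATCH` and its sibling `ANY_MATCH` behave like Python's built-in `all` and `any` applied to a predicate. The sketch below is plain Python with hypothetical helper names; how Lakehouse treats empty arrays or NULL elements is not covered here.

```python
def all_match(values, predicate):
    """True when every element satisfies the predicate."""
    return all(predicate(v) for v in values)


def any_match(values, predicate):
    """True when at least one element satisfies the predicate."""
    return any(predicate(v) for v in values)


assert all_match([1, 2, 3], lambda x: x > 0)
assert not all_match([1, -2, 3], lambda x: x > 0)
assert any_match([1, 2, 8], lambda x: x > 5)
```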
## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.all_match(get_column(table, 'arr'), lambda x: x > 0) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.all_match([1, 2, 3], lambda x: x > 0) ┌───┐ │ 1 │ └───┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql ALL_MATCH(<array>, x -> x > 0) ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT ALL_MATCH([1, 2, 3], x -> x > 0); ┌───┐ │ 1 │ └───┘ ``` # ANY_MATCH (Lakehouse v2) > ANY_MATCH — returns TRUE if any element in an array matches the given predicate. Returns TRUE if any element in an array matches the given predicate. ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.any_match(get_column(table, 'arr'), lambda x: x > 5) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.any_match([1, 2, 8], lambda x: x > 5) ┌───┐ │ 1 │ └───┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql ANY_MATCH(<array>, x -> x > 5) ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT ANY_MATCH([1, 2, 8], x -> x > 5); ┌───┐ │ 1 │ └───┘ ``` # ARRAY_AGG (Lakehouse v2) > ARRAY_AGG — aggregates values into an array. Aggregates values into an array. 
## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.array_agg(get_column(table, 'name')) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.array_agg(get_column(table, 'name')) ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql ARRAY_AGG(<expr>) ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT ARRAY_AGG(name) FROM employees; ┌─────────────────────────────┐ │ array_agg(name) │ ├─────────────────────────────┤ │ ["Alice","Bob","Charlie"] │ └─────────────────────────────┘ ``` # ARRAY_APPEND (Lakehouse v2) > ARRAY_APPEND — appends an element to the end of an array. Appends an element to the end of an array. ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.array_append([1, 2, 3], 4) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.array_append([1, 2, 3], 4) ┌───────────┐ │ [1,2,3,4] │ └───────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql ARRAY_APPEND([1, 2, 3], 4) ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT ARRAY_APPEND([1, 2, 3], 4); ┌───────────┐ │ [1,2,3,4] │ └───────────┘ ``` # ARRAY_AVG (Lakehouse v2) > ARRAY_AVG — returns the average of elements in an array. Returns the average of elements in an array. ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.array_avg([1, 2, 3, 4]) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.array_avg([10, 20, 30]) ┌──────┐ │ 20.0 │ └──────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql ARRAY_AVG([1, 2, 3, 4]) ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT ARRAY_AVG([10, 20, 30]); ┌──────┐ │ 20.0 │ └──────┘ ``` # ARRAY_CONCAT (Lakehouse v2) > ARRAY_CONCAT — Concatenates multiple arrays into a single array. 
Concatenates multiple arrays into a single array. ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.array_concat([1, 2], [3, 4]) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.array_concat([1, 2], [3, 4]) ┌───────────┐ │ [1,2,3,4] │ └───────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql ARRAY_CONCAT([1, 2], [3, 4]) ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT ARRAY_CONCAT([1, 2], [3, 4]); ┌───────────┐ │ [1,2,3,4] │ └───────────┘ ``` # ARRAY_CONTAINS (Lakehouse v2) > ARRAY_CONTAINS — Checks whether an array contains a specific element. Checks whether an array contains a specific element. ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.array_contains([1, 2, 3], 2) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.array_contains([1, 2, 3], 2) ┌───┐ │ 1 │ └───┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql ARRAY_CONTAINS([1, 2, 3], 2) ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT ARRAY_CONTAINS([1, 2, 3], 2); ┌───┐ │ 1 │ └───┘ ``` # ARRAY_CONTAINS_ALL (Lakehouse v2) > ARRAY_CONTAINS_ALL — checks whether an array contains all elements of another array. Checks whether an array contains all elements of another array. 
## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.array_contains_all([1,2,3], [1,2]) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.array_contains_all([1, 2, 3], [1, 2]) ┌───┐ │ 1 │ └───┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql ARRAY_CONTAINS_ALL([1,2,3], [1,2]) ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT ARRAY_CONTAINS_ALL([1, 2, 3], [1, 2]); ┌───┐ │ 1 │ └───┘ ``` # ARRAY_CONTAINS_SEQ (Lakehouse v2) > ARRAY_CONTAINS_SEQ — checks whether an array contains all elements of another array in order. Checks whether an array contains all elements of another array in order. ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.array_contains_seq([1,2,3], [1,2]) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.array_contains_seq([1, 2, 3, 4], [2, 3]) ┌───┐ │ 1 │ └───┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql ARRAY_CONTAINS_SEQ([1,2,3], [1,2]) ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT ARRAY_CONTAINS_SEQ([1, 2, 3, 4], [2, 3]); ┌───┐ │ 1 │ └───┘ ``` # ARRAY_CUM_SUM (Lakehouse v2) > ARRAY_CUM_SUM — Returns the cumulative sum of elements in an array. Returns the cumulative sum of elements in an array. 
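A cumulative sum replaces each element with the running total up to and including that position. A minimal plain-Python equivalent, using a hypothetical helper name, looks like this:

```python
from itertools import accumulate


def array_cum_sum(values):
    """Running total: element i is the sum of values[0..i]."""
    return list(accumulate(values))


assert array_cum_sum([1, 2, 3, 4]) == [1, 3, 6, 10]
```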
## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.array_cum_sum([1, 2, 3, 4]) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.array_cum_sum([1, 2, 3, 4]) ┌────────────┐ │ [1,3,6,10] │ └────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql ARRAY_CUM_SUM([1, 2, 3, 4]) ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT ARRAY_CUM_SUM([1, 2, 3, 4]); ┌────────────┐ │ [1,3,6,10] │ └────────────┘ ``` # ARRAY_DIFFERENCE (Lakehouse v2) > ARRAY_DIFFERENCE — returns an array of differences between consecutive elements. Returns an array of differences between consecutive elements. ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.array_difference([1, 3, 6, 10]) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.array_difference([1, 3, 6, 10]) ┌───────────┐ │ [0,2,3,4] │ └───────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql ARRAY_DIFFERENCE([1, 3, 6, 10]) ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT ARRAY_DIFFERENCE([1, 3, 6, 10]); ┌───────────┐ │ [0,2,3,4] │ └───────────┘ ``` # ARRAY_DISTINCT (Lakehouse v2) > ARRAY_DISTINCT — removes duplicate elements from an array. Removes duplicate elements from an array. 
## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.array_distinct([1, 2, 2, 3, 3]) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.array_distinct([1, 2, 2, 3, 3]) ┌─────────┐ │ [1,2,3] │ └─────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql ARRAY_DISTINCT([1, 2, 2, 3, 3]) ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT ARRAY_DISTINCT([1, 2, 2, 3, 3]); ┌─────────┐ │ [1,2,3] │ └─────────┘ ``` # ARRAY_FILTER (Lakehouse v2) > ARRAY_FILTER — Filters elements in an array using a lambda expression. Filters elements in an array using a lambda expression. ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.array_filter(get_column(table, 'arr'), lambda x: x > 2) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.array_filter([1, 2, 3, 4, 5], lambda x: x > 2) ┌─────────┐ │ [3,4,5] │ └─────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql ARRAY_FILTER(<array>, x -> x > 2) ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT ARRAY_FILTER([1, 2, 3, 4, 5], x -> x > 2); ┌─────────┐ │ [3,4,5] │ └─────────┘ ``` # ARRAY_FLATTEN (Lakehouse v2) > ARRAY_FLATTEN — Flattens nested arrays into a single-level array. Flattens nested arrays into a single-level array. 
## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.array_flatten([[1,2],[3,4]]) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.array_flatten([[1, 2], [3, 4]]) ┌───────────┐ │ [1,2,3,4] │ └───────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql ARRAY_FLATTEN([[1,2],[3,4]]) ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT ARRAY_FLATTEN([[1, 2], [3, 4]]); ┌───────────┐ │ [1,2,3,4] │ └───────────┘ ``` # ARRAY_GENERATE (Lakehouse v2) > ARRAY_GENERATE — generates an array of sequential values. Generates an array of sequential values. ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.array_generate(1, 5) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.array_generate(1, 5) ┌─────────────┐ │ [1,2,3,4,5] │ └─────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql ARRAY_GENERATE(1, 5) ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT ARRAY_GENERATE(1, 5); ┌─────────────┐ │ [1,2,3,4,5] │ └─────────────┘ ``` # ARRAY_INTERSECT (Lakehouse v2) > ARRAY_INTERSECT — returns the intersection of two arrays. Returns the intersection of two arrays. ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.array_intersect([1,2,3], [2,3,4]) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.array_intersect([1, 2, 3], [2, 3, 4]) ┌───────┐ │ [2,3] │ └───────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql ARRAY_INTERSECT([1,2,3], [2,3,4]) ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT ARRAY_INTERSECT([1, 2, 3], [2, 3, 4]); ┌───────┐ │ [2,3] │ └───────┘ ``` # ARRAY_JOIN (Lakehouse v2) > ARRAY_JOIN — Concatenates array elements into a string with a separator. 
Concatenates array elements into a string with a separator. ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.array_join([1,2,3], '-') ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.array_join(['a', 'b', 'c'], '-') ┌─────────┐ │ 'a-b-c' │ └─────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql ARRAY_JOIN([1,2,3], '-') ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT ARRAY_JOIN(['a', 'b', 'c'], '-'); ┌───────┐ │ a-b-c │ └───────┘ ``` # ARRAY_LENGTH (Lakehouse v2) > ARRAY_LENGTH — returns the number of elements in an array. Returns the number of elements in an array. ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.array_length([1, 2, 3]) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.array_length([10, 20, 30]) ┌───┐ │ 3 │ └───┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql ARRAY_LENGTH([1, 2, 3]) ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT ARRAY_LENGTH([10, 20, 30]); ┌───┐ │ 3 │ └───┘ ``` # ARRAY_MAP (Lakehouse v2) > ARRAY_MAP — Applies a lambda expression to each element of an array. Applies a lambda expression to each element of an array. ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.array_map(get_column(table, 'arr'), lambda x: x * 2) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.array_map([1, 2, 3], lambda x: x * 2) ┌─────────┐ │ [2,4,6] │ └─────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql ARRAY_MAP(<array>, x -> x * 2) ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT ARRAY_MAP([1, 2, 3], x -> x * 2); ┌─────────┐ │ [2,4,6] │ └─────────┘ ``` # ARRAY_MAX (Lakehouse v2) > ARRAY_MAX — returns the maximum element in an array. 
Returns the maximum element in an array. ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.array_max([3, 1, 4, 1, 5]) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.array_max([3, 1, 4, 1, 5]) ┌───┐ │ 5 │ └───┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql ARRAY_MAX([3, 1, 4, 1, 5]) ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT ARRAY_MAX([3, 1, 4, 1, 5]); ┌───┐ │ 5 │ └───┘ ``` # ARRAY_MIN (Lakehouse v2) > ARRAY_MIN — returns the minimum element in an array. Returns the minimum element in an array. ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.array_min([3, 1, 4, 1, 5]) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.array_min([3, 1, 4, 1, 5]) ┌───┐ │ 1 │ └───┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql ARRAY_MIN([3, 1, 4, 1, 5]) ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT ARRAY_MIN([3, 1, 4, 1, 5]); ┌───┐ │ 1 │ └───┘ ``` # ARRAY_POSITION (Lakehouse v2) > ARRAY_POSITION — returns the position of the first occurrence of an element (1-indexed). Returns the position of the first occurrence of an element (1-indexed). ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.array_position([10, 20, 30], 20) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.array_position([10, 20, 30], 20) ┌───┐ │ 2 │ └───┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql ARRAY_POSITION([10, 20, 30], 20) ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT ARRAY_POSITION([10, 20, 30], 20); ┌───┐ │ 2 │ └───┘ ``` # ARRAY_REMOVE (Lakehouse v2) > ARRAY_REMOVE — removes all occurrences of a specified element from an array. Removes all occurrences of a specified element from an array. 
## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.array_remove([1, 2, 3, 2, 1], 2) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.array_remove([1, 2, 3, 2, 1], 2) ┌─────────┐ │ [1,3,1] │ └─────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql ARRAY_REMOVE([1, 2, 3, 2, 1], 2) ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT ARRAY_REMOVE([1, 2, 3, 2, 1], 2); ┌─────────┐ │ [1,3,1] │ └─────────┘ ``` # ARRAY_REPEAT (Lakehouse v2) > ARRAY_REPEAT — creates an array containing a specified element repeated N times. Creates an array containing a specified element repeated N times. ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.array_repeat('x', 3) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.array_repeat('x', 3) ┌───────────────┐ │ ['x','x','x'] │ └───────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql ARRAY_REPEAT('x', 3) ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT ARRAY_REPEAT('x', 3); ┌───────────────┐ │ ["x","x","x"] │ └───────────────┘ ``` # ARRAY_SLICE (Lakehouse v2) > ARRAY_SLICE — returns a slice of an array from a start position with a given length. Returns a slice of an array from a start position with a given length. 
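Because the start position is 1-indexed and the third argument is a length, the slice maps onto Python's 0-indexed, end-exclusive slicing as shown in this sketch. The helper name is hypothetical, and the behaviour for a start below 1 or a negative length is not modelled.

```python
def array_slice(values, start, length):
    """Take `length` elements beginning at 1-indexed position `start`."""
    begin = start - 1  # convert to Python's 0-indexed offset
    return values[begin:begin + length]


assert array_slice([1, 2, 3, 4, 5], 2, 3) == [2, 3, 4]
```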
## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.array_slice([1,2,3,4,5], 2, 3) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.array_slice([1, 2, 3, 4, 5], 2, 3) ┌─────────┐ │ [2,3,4] │ └─────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql ARRAY_SLICE([1,2,3,4,5], 2, 3) ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT ARRAY_SLICE([1, 2, 3, 4, 5], 2, 3); ┌─────────┐ │ [2,3,4] │ └─────────┘ ``` # ARRAY_SORT (Lakehouse v2) > ARRAY_SORT — Sorts the elements of an array in ascending order. Sorts the elements of an array in ascending order. ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.array_sort([3, 1, 4, 1, 5]) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.array_sort([3, 1, 4, 1, 5]) ┌─────────────┐ │ [1,1,3,4,5] │ └─────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql ARRAY_SORT([3, 1, 4, 1, 5]) ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT ARRAY_SORT([3, 1, 4, 1, 5]); ┌─────────────┐ │ [1,1,3,4,5] │ └─────────────┘ ``` # ARRAY_SORTBY (Lakehouse v2) > ARRAY_SORTBY — sorts elements of one array by corresponding elements of another array. Sorts elements of one array by corresponding elements of another array. 
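Sorting one array by another means ordering the positions by the key array and then reading the value array in that order. A plain-Python sketch with a hypothetical helper name, assuming both arrays have the same length and ascending key order:

```python
def array_sortby(values, keys):
    """Reorder `values` so that their paired `keys` are ascending."""
    order = sorted(range(len(values)), key=lambda i: keys[i])
    return [values[i] for i in order]


assert array_sortby(["c", "a", "b"], [3, 1, 2]) == ["a", "b", "c"]
```

Sorting indices rather than zipped pairs avoids comparing the values themselves when two keys tie.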
## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.array_sortby(get_column(table, 'names'), get_column(table, 'scores')) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.array_sortby(['c','a','b'], [3,1,2]) ┌───────────────┐ │ ['a','b','c'] │ └───────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql ARRAY_SORTBY(<array>, <keys>) ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT ARRAY_SORTBY(['c','a','b'], [3,1,2]); ┌───────────────┐ │ ["a","b","c"] │ └───────────────┘ ``` # ARRAY_SUM (Lakehouse v2) > ARRAY_SUM — returns the sum of elements in an array. Returns the sum of elements in an array. ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.array_sum([1, 2, 3, 4, 5]) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.array_sum([10, 20, 30]) ┌────┐ │ 60 │ └────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql ARRAY_SUM([1, 2, 3, 4, 5]) ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT ARRAY_SUM([10, 20, 30]); ┌────┐ │ 60 │ └────┘ ``` # ARRAY_TO_BITMAP (Lakehouse v2) > ARRAY_TO_BITMAP — converts an array of integers to a bitmap. Converts an array of integers to a bitmap. ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.array_to_bitmap([1, 2, 3]) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.array_to_bitmap([1, 2, 3]) ┌──────────┐ │ (bitmap) │ └──────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql ARRAY_TO_BITMAP([1, 2, 3]) ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT BITMAP_TO_STRING(ARRAY_TO_BITMAP([1, 2, 3])); ┌───────┐ │ 1,2,3 │ └───────┘ ``` # ARRAY_TOP_N (Lakehouse v2) > ARRAY_TOP_N — returns the top N elements from an array. Returns the top N elements from an array. 
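Judging by the examples on this page, "top" means the largest elements, returned in descending order. Under that assumption, a plain-Python equivalent with a hypothetical helper name is:

```python
import heapq


def array_top_n(values, n):
    """Return the n largest elements, largest first."""
    return heapq.nlargest(n, values)


assert array_top_n([3, 1, 4, 1, 5], 3) == [5, 4, 3]
```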
## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.array_top_n([3,1,4,1,5], 3) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.array_top_n([3, 1, 4, 1, 5], 3) ┌─────────┐ │ [5,4,3] │ └─────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql ARRAY_TOP_N([3,1,4,1,5], 3) ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT ARRAY_TOP_N([3, 1, 4, 1, 5], 3); ┌─────────┐ │ [5,4,3] │ └─────────┘ ``` # ARRAY_UNIQUE_AGG (Lakehouse v2) > ARRAY_UNIQUE_AGG — Aggregates values into an array of distinct values. Aggregates values into an array of distinct values. ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.array_unique_agg(get_column(table, 'tag')) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.array_unique_agg(get_column(table, 'tag')) ┌───────────────┐ │ ['a','b','c'] │ └───────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql ARRAY_UNIQUE_AGG(<expr>) ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT ARRAY_UNIQUE_AGG(tag) FROM tags; ┌───────────────┐ │ ["a","b","c"] │ └───────────────┘ ``` # ARRAYS_OVERLAP (Lakehouse v2) > ARRAYS_OVERLAP — Checks whether two arrays have any common elements. Checks whether two arrays have any common elements. 
## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.arrays_overlap([1,2,3], [3,4,5]) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.arrays_overlap([1, 2, 3], [3, 4, 5]) ┌───┐ │ 1 │ └───┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql ARRAYS_OVERLAP([1,2,3], [3,4,5]) ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT ARRAYS_OVERLAP([1, 2, 3], [3, 4, 5]); ┌───┐ │ 1 │ └───┘ ``` # ARRAYS_ZIP (Lakehouse v2) > ARRAYS_ZIP — merges multiple arrays into an array of structs. Merges multiple arrays into an array of structs. ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.arrays_zip([1,2], ['a','b']) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.arrays_zip([1, 2, 3], ['a', 'b', 'c']) ┌───────────────────────────┐ │ [{1,'a'},{2,'b'},{3,'c'}] │ └───────────────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql ARRAYS_ZIP([1,2], ['a','b']) ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT ARRAYS_ZIP([1, 2, 3], ['a', 'b', 'c']); ┌───────────────────────────────────────────────────┐ │ [{"0":1,"1":"a"},{"0":2,"1":"b"},{"0":3,"1":"c"}] │ └───────────────────────────────────────────────────┘ ``` # CARDINALITY (Array, Lakehouse v2) > CARDINALITY — returns the number of elements in an array. Alias for `ARRAY_LENGTH`. Returns the number of elements in an array. Alias for `ARRAY_LENGTH`. 
## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.cardinality([1, 2, 3]) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.cardinality([10, 20, 30]) ┌───┐ │ 3 │ └───┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql CARDINALITY([1, 2, 3]) ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT CARDINALITY([10, 20, 30]); ┌───┐ │ 3 │ └───┘ ``` # ELEMENT_AT (Array, Lakehouse v2) > ELEMENT_AT — returns the element at a specified position in an array (1-indexed). Returns the element at a specified position in an array (1-indexed). ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.element_at([10, 20, 30], 2) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.element_at([10, 20, 30], 2) ┌────┐ │ 20 │ └────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql ELEMENT_AT([10, 20, 30], 2) ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT ELEMENT_AT([10, 20, 30], 2); ┌────┐ │ 20 │ └────┘ ``` # REVERSE (Array, Lakehouse v2) > REVERSE — returns an array with elements in reverse order. Returns an array with elements in reverse order. ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.reverse([1, 2, 3]) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.reverse([1, 2, 3]) ┌─────────┐ │ [3,2,1] │ └─────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql REVERSE([1, 2, 3]) ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT ARRAY_REVERSE([1, 2, 3]); ┌─────────┐ │ [3,2,1] │ └─────────┘ ``` # UNNEST (Lakehouse v2) > UNNEST — expands an array into a set of rows. Expands an array into a set of rows. 
## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.unnest([1, 2, 3]) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python # Used in FROM clause # SELECT * FROM UNNEST([1, 2, 3]) ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql UNNEST([1, 2, 3]) ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT * FROM UNNEST([1, 2, 3]) AS t(val); ┌─────┐ │ val │ ├─────┤ │ 1 │ │ 2 │ │ 3 │ └─────┘ ``` # Binary Functions (Lakehouse v2) > Lakehouse v2 SQL binary functions: manipulate binary (BLOB) data — slicing, encoding, and conversion. This section provides reference information for the binary functions in PlaidCloud Lakehouse. ## Functions [Section titled “Functions”](#functions) * [FROM\_BINARY](from-binary/) * [TO\_BINARY](to-binary/) # FROM_BINARY (Lakehouse v2) > FROM_BINARY — converts a binary value to a VARCHAR string based on the specified binary format. Converts a binary value to a VARCHAR string based on the specified binary format. ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.from_binary(<binary>, <format>) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.from_binary(b'\x48\x65\x6c\x6c\x6f', 'utf8') ┌─────────┐ │ 'Hello' │ └─────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql FROM_BINARY(<binary>, <format>) ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT FROM_BINARY(X'48656C6C6F', 'utf8'); ┌───────┐ │ Hello │ └───────┘ ``` # TO_BINARY (Lakehouse v2) > TO_BINARY — converts a VARCHAR string to a binary value based on the specified binary format. Converts a VARCHAR string to a binary value based on the specified binary format. 
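The hexadecimal output shown for the SQL example on this page is just the UTF-8 bytes of the input rendered two hex digits per byte. The round trip can be checked in plain Python; this illustrates the encoding only and is not a call into the Lakehouse API.

```python
text = "Hello"

# String -> bytes, the direction TO_BINARY describes for the 'utf8' format.
raw = text.encode("utf-8")

# Hex rendering of those bytes, as a SQL client might display them.
hex_view = raw.hex().upper()

# Bytes -> string, the direction FROM_BINARY describes.
round_trip = bytes.fromhex(hex_view).decode("utf-8")

assert raw == b"Hello"
assert hex_view == "48656C6C6F"
assert round_trip == text
```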
## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.to_binary(<string>, <format>) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.to_binary('Hello', 'utf8') ┌──────────┐ │ b'Hello' │ └──────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql TO_BINARY(<string>, <format>) ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT TO_BINARY('Hello', 'utf8'); ┌────────────┐ │ 48656C6C6F │ └────────────┘ ``` # Bit Functions (Lakehouse v2) > Lakehouse v2 SQL bit functions: bitwise operations on integers — AND, OR, XOR, shifts. This section provides reference information for the bit functions in PlaidCloud Lakehouse. ## Functions [Section titled “Functions”](#functions) * [BIT\_SHIFT\_LEFT](bit-shift-left/) * [BIT\_SHIFT\_RIGHT\_LOGICAL](bit-shift-right-logical/) * [BIT\_SHIFT\_RIGHT](bit-shift-right/) * [BITAND](bitand/) * [BITNOT](bitnot/) * [BITOR](bitor/) * [BITXOR](bitxor/) # BIT_SHIFT_LEFT (Lakehouse v2) > BIT_SHIFT_LEFT — shifts the bits of a numeric value to the left by a specified number of positions. Shifts the bits of a numeric value to the left by a specified number of positions. ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.bit_shift_left(<value>, <shift>) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.bit_shift_left(1, 4) ┌────┐ │ 16 │ └────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql BIT_SHIFT_LEFT(<value>, <shift>) ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT BIT_SHIFT_LEFT(1, 4); ┌────┐ │ 16 │ └────┘ ``` # BIT_SHIFT_RIGHT (Lakehouse v2) > Use the BIT_SHIFT_RIGHT bit function in PlaidCloud Lakehouse. Shifts the bits of a numeric value to the right by a specified number of positions (arithmetic). Shifts the bits of a numeric value to the right by a specified number of positions (arithmetic). 
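Arithmetic and logical right shifts give the same result for non-negative inputs and differ only for negative ones: an arithmetic shift copies the sign bit into the vacated positions, while a logical shift fills them with zeros. The sketch below models both in plain Python under the assumption of 64-bit integers; the width and helper names are illustrative and not taken from the Lakehouse documentation.

```python
WIDTH = 64                  # assumed integer width for the logical shift
MASK = (1 << WIDTH) - 1     # all ones across WIDTH bits


def shift_right_arithmetic(value, n):
    """Sign-preserving shift; Python's >> already behaves this way."""
    return value >> n


def shift_right_logical(value, n):
    """Zero-filling shift on the WIDTH-bit two's complement pattern."""
    return (value & MASK) >> n


# Non-negative inputs: both shifts agree.
assert shift_right_arithmetic(16, 2) == 4
assert shift_right_logical(16, 2) == 4

# Negative inputs: the results diverge.
assert shift_right_arithmetic(-16, 2) == -4
assert shift_right_logical(-16, 2) == (1 << 62) - 4
```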
## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.bit_shift_right(, ) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.bit_shift_right(16, 2) ┌───┐ │ 4 │ └───┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql BIT_SHIFT_RIGHT(, ) ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT BIT_SHIFT_RIGHT(16, 2); ┌───┐ │ 4 │ └───┘ ``` # BIT_SHIFT_RIGHT_LOGICAL (Lakehouse v2) > Use the BIT_SHIFT_RIGHT_LOGICAL bit function in PlaidCloud Lakehouse. Shifts the bits of a numeric value to the right by a specified number of positions. Shifts the bits of a numeric value to the right by a specified number of positions (logical). ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.bit_shift_right_logical(, ) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.bit_shift_right_logical(16, 2) ┌───┐ │ 4 │ └───┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql BIT_SHIFT_RIGHT_LOGICAL(, ) ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT BIT_SHIFT_RIGHT_LOGICAL(16, 2); ┌───┐ │ 4 │ └───┘ ``` # BITAND (Lakehouse v2) > BITAND — returns the bitwise AND of two numeric values. Returns the bitwise AND of two numeric values. ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.bitand(, ) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.bitand(12, 10) ┌───┐ │ 8 │ └───┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql BITAND(, ) ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT BITAND(12, 10); ┌───┐ │ 8 │ └───┘ ``` # BITNOT (Lakehouse v2) > BITNOT — returns the bitwise NOT of a numeric value. Returns the bitwise NOT of a numeric value. 
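For signed integers in two's complement, the bitwise NOT of `x` equals `-x - 1`, which is why `BITNOT(0)` yields `-1`. Python's integer operators follow the same rule, so the results shown for the bit functions can be checked locally. The sketch uses plain Python operators, not the Analyze `func` API.

```python
x = 0
not_x = ~x            # bitwise NOT: -x - 1

a, b = 12, 10         # 0b1100 and 0b1010
and_ab = a & b        # bits set in both operands
or_ab = a | b         # bits set in either operand
xor_ab = a ^ b        # bits set in exactly one operand

print(not_x, and_ab, or_ab, xor_ab)
```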
## Analyze Syntax

```python
func.bitnot()
```

## Analyze Examples

```python
func.bitnot(0)
┌────┐
│ -1 │
└────┘
```

## SQL Syntax

```sql
BITNOT()
```

## SQL Examples

```sql
SELECT BITNOT(0);
┌────┐
│ -1 │
└────┘
```

# BITOR (Lakehouse v2)

> BITOR — returns the bitwise OR of two numeric values.

Returns the bitwise OR of two numeric values.

## Analyze Syntax

```python
func.bitor(, )
```

## Analyze Examples

```python
func.bitor(12, 10)
┌────┐
│ 14 │
└────┘
```

## SQL Syntax

```sql
BITOR(, )
```

## SQL Examples

```sql
SELECT BITOR(12, 10);
┌────┐
│ 14 │
└────┘
```

# BITXOR (Lakehouse v2)

> BITXOR — returns the bitwise XOR of two numeric values.

Returns the bitwise XOR of two numeric values.

## Analyze Syntax

```python
func.bitxor(, )
```

## Analyze Examples

```python
func.bitxor(12, 10)
┌───┐
│ 6 │
└───┘
```

## SQL Syntax

```sql
BITXOR(, )
```

## SQL Examples

```sql
SELECT BITXOR(12, 10);
┌───┐
│ 6 │
└───┘
```

# Bitmap Functions (Lakehouse v2)

> Lakehouse v2 SQL bitmap functions: build and operate on roaring bitmap values for fast set arithmetic.

This section provides reference information for the bitmap functions in PlaidCloud Lakehouse.
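Conceptually, a bitmap here behaves like a set of non-negative integers, and the two-argument bitmap functions map onto ordinary set operations. The sketch below models that with Python's built-in `set`. It is only a mental model: the helper names are local to this example, and a real roaring bitmap is a compressed structure, not a Python set.

```python
def bitmap_from_string(text):
    """Parse a comma-separated string such as '1,2,3' into a set of ints."""
    return {int(part) for part in text.split(",") if part.strip()}


def bitmap_to_string(bitmap):
    """Render a set of ints as a sorted, comma-separated string."""
    return ",".join(str(value) for value in sorted(bitmap))


a = bitmap_from_string("1,2,3")
b = bitmap_from_string("2,3,4")

intersection = bitmap_to_string(a & b)  # comparable to BITMAP_AND
union = bitmap_to_string(a | b)         # comparable to BITMAP_OR
difference = bitmap_to_string(a - b)    # comparable to BITMAP_ANDNOT
symmetric = bitmap_to_string(a ^ b)     # comparable to BITMAP_XOR

print(intersection, union, difference, symmetric)
```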
## Functions [Section titled “Functions”](#functions) * [BASE64\_TO\_BITMAP](base64-to-bitmap/) * [BITMAP\_AGG](bitmap-agg/) * [BITMAP\_AND](bitmap-and/) * [BITMAP\_ANDNOT](bitmap-andnot/) * [BITMAP\_CONTAINS](bitmap-contains/) * [BITMAP\_COUNT](bitmap-count/) * [BITMAP\_EMPTY](bitmap-empty/) * [BITMAP\_FROM\_BINARY](bitmap-from-binary/) * [BITMAP\_FROM\_STRING](bitmap-from-string/) * [BITMAP\_HAS\_ANY](bitmap-has-any/) * [BITMAP\_HASH](bitmap-hash/) * [BITMAP\_HASH64](bitmap-hash64/) * [BITMAP\_INTERSECT](bitmap-intersect/) * [BITMAP\_MAX](bitmap-max/) * [BITMAP\_MIN](bitmap-min/) * [BITMAP\_OR](bitmap-or/) * [BITMAP\_REMOVE](bitmap-remove/) * [BITMAP\_SUBSET\_IN\_RANGE](bitmap-subset-in-range/) * [BITMAP\_SUBSET\_LIMIT](bitmap-subset-limit/) * [BITMAP\_TO\_ARRAY](bitmap-to-array/) * [BITMAP\_TO\_BASE64](bitmap-to-base64/) * [BITMAP\_TO\_BINARY](bitmap-to-binary/) * [BITMAP\_TO\_STRING](bitmap-to-string/) * [BITMAP\_UNION\_COUNT](bitmap-union-count/) * [BITMAP\_UNION\_INT](bitmap-union-int/) * [BITMAP\_UNION](bitmap-union/) * [BITMAP\_XOR](bitmap-xor/) * [INTERSECT\_COUNT](intersect-count/) * [SUB\_BITMAP](sub-bitmap/) * [SUBDIVIDE\_BITMAP](subdivide-bitmap/) * [TO\_BITMAP](to-bitmap/) * [UNNEST\_BITMAP](unnest-bitmap/) # BASE64_TO_BITMAP (Lakehouse v2) > BASE64_TO_BITMAP — Converts a base64-encoded string to a bitmap. Converts a base64-encoded string to a bitmap. 
## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.base64_to_bitmap() ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.base64_to_bitmap(get_column(table, 'b64_col')) ┌──────────┐ │ (bitmap) │ └──────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql BASE64_TO_BITMAP() ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT BITMAP_TO_STRING(BASE64_TO_BITMAP(b64_col)) FROM data; ┌───────┐ │ 1,2,3 │ └───────┘ ``` # BITMAP_AGG (Lakehouse v2) > BITMAP_AGG — aggregates integer values into a bitmap. Aggregates integer values into a bitmap. ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.bitmap_agg(get_column(table, 'id')) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.bitmap_agg(get_column(table, 'user_id')) ┌──────────┐ │ (bitmap) │ └──────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql BITMAP_AGG() ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT BITMAP_COUNT(BITMAP_AGG(user_id)) FROM visits; ┌─────┐ │ 500 │ └─────┘ ``` # BITMAP_AND (Lakehouse v2) > BITMAP_AND — returns the intersection of two bitmaps. Returns the intersection of two bitmaps. ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.bitmap_and(, ) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.bitmap_and(get_column(table, 'bm1'), get_column(table, 'bm2')) ┌──────────┐ │ (bitmap) │ └──────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql BITMAP_AND(, ) ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT BITMAP_TO_STRING(BITMAP_AND(TO_BITMAP(1), TO_BITMAP(1))); ┌───┐ │ 1 │ └───┘ ``` # BITMAP_ANDNOT (Lakehouse v2) > BITMAP_ANDNOT — returns the difference of two bitmaps (elements in first but not second). 
Returns the difference of two bitmaps (elements in first but not second). ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.bitmap_andnot(, ) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.bitmap_andnot(get_column(table, 'bm1'), get_column(table, 'bm2')) ┌──────────┐ │ (bitmap) │ └──────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql BITMAP_ANDNOT(, ) ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT BITMAP_TO_STRING(BITMAP_ANDNOT( BITMAP_FROM_STRING('1,2,3'), BITMAP_FROM_STRING('2,3'))); ┌───┐ │ 1 │ └───┘ ``` # BITMAP_CONTAINS (Lakehouse v2) > BITMAP_CONTAINS — Checks whether a bitmap contains a specific value. Checks whether a bitmap contains a specific value. ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.bitmap_contains(, ) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.bitmap_contains(get_column(table, 'bm'), 42) ┌───┐ │ 1 │ └───┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql BITMAP_CONTAINS(, ) ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT BITMAP_CONTAINS(BITMAP_FROM_STRING('1,2,42'), 42); ┌───┐ │ 1 │ └───┘ ``` # BITMAP_COUNT (Lakehouse v2) > BITMAP_COUNT — returns the number of set bits in a bitmap. Returns the number of set bits in a bitmap. ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.bitmap_count() ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.bitmap_count(get_column(table, 'bm')) ┌───┐ │ 3 │ └───┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql BITMAP_COUNT() ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT BITMAP_COUNT(BITMAP_FROM_STRING('1,2,3')); ┌───┐ │ 3 │ └───┘ ``` # BITMAP_EMPTY (Lakehouse v2) > BITMAP_EMPTY — returns an empty bitmap. 
Returns an empty bitmap. ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.bitmap_empty() ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.bitmap_empty() ┌────────────────┐ │ (empty bitmap) │ └────────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql BITMAP_EMPTY() ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT BITMAP_COUNT(BITMAP_EMPTY()); ┌───┐ │ 0 │ └───┘ ``` # BITMAP_FROM_BINARY (Lakehouse v2) > BITMAP_FROM_BINARY — converts a binary value to a bitmap. Converts a binary value to a bitmap. ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.bitmap_from_binary() ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.bitmap_from_binary(get_column(table, 'bin_col')) ┌──────────┐ │ (bitmap) │ └──────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql BITMAP_FROM_BINARY() ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT BITMAP_COUNT(BITMAP_FROM_BINARY(bm_binary)) FROM data; ┌───┐ │ 5 │ └───┘ ``` # BITMAP_FROM_STRING (Lakehouse v2) > BITMAP_FROM_STRING — converts a comma-separated string of integers to a bitmap. Converts a comma-separated string of integers to a bitmap. ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.bitmap_from_string() ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.bitmap_from_string('1,2,3,4,5') ┌──────────┐ │ (bitmap) │ └──────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql BITMAP_FROM_STRING() ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT BITMAP_COUNT(BITMAP_FROM_STRING('1,2,3,4,5')); ┌───┐ │ 5 │ └───┘ ``` # BITMAP_HAS_ANY (Lakehouse v2) > BITMAP_HAS_ANY — Checks whether two bitmaps have any common elements. 
Checks whether two bitmaps have any common elements. ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.bitmap_has_any(, ) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.bitmap_has_any(get_column(table, 'bm1'), get_column(table, 'bm2')) ┌───┐ │ 1 │ └───┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql BITMAP_HAS_ANY(, ) ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT BITMAP_HAS_ANY(BITMAP_FROM_STRING('1,2'), BITMAP_FROM_STRING('2,3')); ┌───┐ │ 1 │ └───┘ ``` # BITMAP_HASH (Lakehouse v2) > BITMAP_HASH — computes a 32-bit hash of a value and returns a bitmap containing that hash. Computes a 32-bit hash of a value and returns a bitmap containing that hash. ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.bitmap_hash() ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.bitmap_hash('hello') ┌──────────┐ │ (bitmap) │ └──────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql BITMAP_HASH() ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT BITMAP_COUNT(BITMAP_HASH('hello')); ┌───┐ │ 1 │ └───┘ ``` # BITMAP_HASH64 (Lakehouse v2) > BITMAP_HASH64 — computes a 64-bit hash of a value and returns a bitmap containing that hash. Computes a 64-bit hash of a value and returns a bitmap containing that hash. 
## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.bitmap_hash64() ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.bitmap_hash64('hello') ┌──────────┐ │ (bitmap) │ └──────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql BITMAP_HASH64() ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT BITMAP_COUNT(BITMAP_HASH64('hello')); ┌───┐ │ 1 │ └───┘ ``` # BITMAP_INTERSECT (Lakehouse v2) > BITMAP_INTERSECT — returns the intersection of a set of bitmaps (aggregate). Returns the intersection of a set of bitmaps (aggregate). ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.bitmap_intersect(get_column(table, 'bm')) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.bitmap_intersect(get_column(table, 'user_bitmap')) ┌──────────┐ │ (bitmap) │ └──────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql BITMAP_INTERSECT() ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT BITMAP_COUNT(BITMAP_INTERSECT(user_bitmap)) FROM segments; ┌─────┐ │ 100 │ └─────┘ ``` # BITMAP_MAX (Lakehouse v2) > BITMAP_MAX — returns the maximum value in a bitmap. Returns the maximum value in a bitmap. ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.bitmap_max() ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.bitmap_max(get_column(table, 'bm')) ┌─────┐ │ 100 │ └─────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql BITMAP_MAX() ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT BITMAP_MAX(BITMAP_FROM_STRING('1,50,100')); ┌─────┐ │ 100 │ └─────┘ ``` # BITMAP_MIN (Lakehouse v2) > BITMAP_MIN — returns the minimum value in a bitmap. Returns the minimum value in a bitmap. 
## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.bitmap_min() ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.bitmap_min(get_column(table, 'bm')) ┌───┐ │ 1 │ └───┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql BITMAP_MIN() ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT BITMAP_MIN(BITMAP_FROM_STRING('1,50,100')); ┌───┐ │ 1 │ └───┘ ``` # BITMAP_OR (Lakehouse v2) > BITMAP_OR — returns the union of two bitmaps. Returns the union of two bitmaps. ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.bitmap_or(, ) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.bitmap_or(get_column(table, 'bm1'), get_column(table, 'bm2')) ┌──────────┐ │ (bitmap) │ └──────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql BITMAP_OR(, ) ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT BITMAP_TO_STRING(BITMAP_OR( BITMAP_FROM_STRING('1,2'), BITMAP_FROM_STRING('2,3'))); ┌───────┐ │ 1,2,3 │ └───────┘ ``` # BITMAP_REMOVE (Lakehouse v2) > BITMAP_REMOVE — removes a specific value from a bitmap. Removes a specific value from a bitmap. ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.bitmap_remove(, ) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.bitmap_remove(get_column(table, 'bm'), 2) ┌──────────┐ │ (bitmap) │ └──────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql BITMAP_REMOVE(, ) ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT BITMAP_TO_STRING(BITMAP_REMOVE(BITMAP_FROM_STRING('1,2,3'), 2)); ┌─────┐ │ 1,3 │ └─────┘ ``` # BITMAP_SUBSET_IN_RANGE (Lakehouse v2) > BITMAP_SUBSET_IN_RANGE — returns a subset of a bitmap within a specified range. Returns a subset of a bitmap within a specified range. 
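The SQL example for this function returns `2,3,4` for the range `2, 5`, which suggests the start is inclusive and the end is exclusive. The sketch below models that reading with a Python set. The half-open interpretation is inferred from that single example, and the helper is a local stand-in rather than the Lakehouse function.

```python
def bitmap_subset_in_range(values, start, end):
    """Keep values v with start <= v < end (half-open range, assumed from the example)."""
    return sorted(v for v in values if start <= v < end)


subset = bitmap_subset_in_range({1, 2, 3, 4, 5, 6}, 2, 5)
print(subset)
```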
## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.bitmap_subset_in_range(, , ) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.bitmap_subset_in_range(get_column(table, 'bm'), 2, 5) ┌──────────┐ │ (bitmap) │ └──────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql BITMAP_SUBSET_IN_RANGE(, , ) ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT BITMAP_TO_STRING(BITMAP_SUBSET_IN_RANGE( BITMAP_FROM_STRING('1,2,3,4,5,6'), 2, 5)); ┌───────┐ │ 2,3,4 │ └───────┘ ``` # BITMAP_SUBSET_LIMIT (Lakehouse v2) > BITMAP_SUBSET_LIMIT — returns a subset of a bitmap starting from an offset with a cardinality limit. Returns a subset of a bitmap starting from an offset with a cardinality limit. ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.bitmap_subset_limit(, , ) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.bitmap_subset_limit(get_column(table, 'bm'), 0, 3) ┌──────────┐ │ (bitmap) │ └──────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql BITMAP_SUBSET_LIMIT(, , ) ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT BITMAP_TO_STRING(BITMAP_SUBSET_LIMIT( BITMAP_FROM_STRING('1,2,3,4,5'), 0, 3)); ┌───────┐ │ 1,2,3 │ └───────┘ ``` # BITMAP_TO_ARRAY (Lakehouse v2) > BITMAP_TO_ARRAY — converts a bitmap to an array of integers. Converts a bitmap to an array of integers. 
## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.bitmap_to_array() ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.bitmap_to_array(get_column(table, 'bm')) ┌─────────┐ │ [1,2,3] │ └─────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql BITMAP_TO_ARRAY() ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT BITMAP_TO_ARRAY(BITMAP_FROM_STRING('1,2,3')); ┌─────────┐ │ [1,2,3] │ └─────────┘ ``` # BITMAP_TO_BASE64 (Lakehouse v2) > BITMAP_TO_BASE64 — Converts a bitmap to a base64-encoded string. Converts a bitmap to a base64-encoded string. ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.bitmap_to_base64() ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.bitmap_to_base64(get_column(table, 'bm')) ┌──────────┐ │ 'AQI...' │ └──────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql BITMAP_TO_BASE64() ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT BITMAP_TO_BASE64(BITMAP_FROM_STRING('1,2,3')); ┌─────────────────┐ │ (base64 string) │ └─────────────────┘ ``` # BITMAP_TO_BINARY (Lakehouse v2) > BITMAP_TO_BINARY — converts a bitmap to a binary value. Converts a bitmap to a binary value. 
## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.bitmap_to_binary() ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.bitmap_to_binary(get_column(table, 'bm')) ┌──────────┐ │ (binary) │ └──────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql BITMAP_TO_BINARY() ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT HEX(BITMAP_TO_BINARY(BITMAP_FROM_STRING('1,2,3'))); ┌──────────────┐ │ (hex string) │ └──────────────┘ ``` # BITMAP_TO_STRING (Lakehouse v2) > BITMAP_TO_STRING — Converts a bitmap to a comma-separated string. Converts a bitmap to a comma-separated string. ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.bitmap_to_string() ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.bitmap_to_string(get_column(table, 'bm')) ┌─────────┐ │ '1,2,3' │ └─────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql BITMAP_TO_STRING() ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT BITMAP_TO_STRING(BITMAP_FROM_STRING('1,2,3')); ┌───────┐ │ 1,2,3 │ └───────┘ ``` # BITMAP_UNION (Lakehouse v2) > BITMAP_UNION — Returns the union of a set of bitmaps (aggregate). Returns the union of a set of bitmaps (aggregate). 
## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.bitmap_union(get_column(table, 'bm')) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.bitmap_union(get_column(table, 'user_bitmap')) ┌──────────┐ │ (bitmap) │ └──────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql BITMAP_UNION() ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT BITMAP_COUNT(BITMAP_UNION(user_bitmap)) FROM daily_visits; ┌──────┐ │ 5000 │ └──────┘ ``` # BITMAP_UNION_COUNT (Lakehouse v2) > BITMAP_UNION_COUNT — returns the count of distinct values in the union of a set of bitmaps. Returns the count of distinct values in the union of a set of bitmaps. ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.bitmap_union_count(get_column(table, 'bm')) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.bitmap_union_count(get_column(table, 'user_bitmap')) ┌──────┐ │ 5000 │ └──────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql BITMAP_UNION_COUNT() ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT BITMAP_UNION_COUNT(user_bitmap) FROM daily_visits; ┌──────┐ │ 5000 │ └──────┘ ``` # BITMAP_UNION_INT (Lakehouse v2) > BITMAP_UNION_INT — returns the count of distinct integer values (aggregate). Returns the count of distinct integer values (aggregate). 
## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.bitmap_union_int(get_column(table, 'id')) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.bitmap_union_int(get_column(table, 'user_id')) ┌─────┐ │ 500 │ └─────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql BITMAP_UNION_INT() ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT BITMAP_UNION_INT(user_id) FROM visits; ┌─────┐ │ 500 │ └─────┘ ``` # BITMAP_XOR (Lakehouse v2) > BITMAP_XOR — returns the symmetric difference of two bitmaps. Returns the symmetric difference of two bitmaps. ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.bitmap_xor(, ) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.bitmap_xor(get_column(table, 'bm1'), get_column(table, 'bm2')) ┌──────────┐ │ (bitmap) │ └──────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql BITMAP_XOR(, ) ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT BITMAP_TO_STRING(BITMAP_XOR( BITMAP_FROM_STRING('1,2,3'), BITMAP_FROM_STRING('2,3,4'))); ┌─────┐ │ 1,4 │ └─────┘ ``` # INTERSECT_COUNT (Lakehouse v2) > Use the INTERSECT_COUNT bitmap function in PlaidCloud Lakehouse. Returns the count of elements in the intersection of multiple bitmaps filtered by dimension. Returns the count of elements in the intersection of multiple bitmaps filtered by dimension. 
## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.intersect_count(get_column(table, 'bm'), get_column(table, 'dim'), val1, val2) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.intersect_count(get_column(table, 'bm'), get_column(table, 'tag'), 1, 2) ┌─────┐ │ 150 │ └─────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql INTERSECT_COUNT(, , val1, val2) ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT INTERSECT_COUNT(user_bm, tag, 1, 2) FROM segments; ┌─────┐ │ 150 │ └─────┘ ``` # SUB_BITMAP (Lakehouse v2) > SUB_BITMAP — returns a sub-bitmap starting from a specified position with a cardinality limit. Returns a sub-bitmap starting from a specified position with a cardinality limit. ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.sub_bitmap(, , ) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.sub_bitmap(get_column(table, 'bm'), 0, 3) ┌──────────┐ │ (bitmap) │ └──────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql SUB_BITMAP(, , ) ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT BITMAP_TO_STRING(SUB_BITMAP(BITMAP_FROM_STRING('1,2,3,4,5'), 0, 3)); ┌───────┐ │ 1,2,3 │ └───────┘ ``` # SUBDIVIDE_BITMAP (Lakehouse v2) > SUBDIVIDE_BITMAP — splits a bitmap into multiple sub-bitmaps of a given size. Splits a bitmap into multiple sub-bitmaps of a given size. 
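One way to picture the split is to chunk the sorted values into groups of at most the given size. The sketch below does this with a Python set. Treating the size as an element count is an assumption based on the SQL example for this function (`'1,2,3,4'` with size `2` yields `1,2` and `3,4`), and the helper is a local stand-in.

```python
def subdivide_bitmap(values, size):
    """Split the sorted values into consecutive chunks of at most `size` elements."""
    if size <= 0:
        raise ValueError("size must be positive")
    ordered = sorted(values)
    return [ordered[i:i + size] for i in range(0, len(ordered), size)]


chunks = subdivide_bitmap({1, 2, 3, 4}, 2)
print(chunks)
```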
## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.subdivide_bitmap(, ) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.subdivide_bitmap(get_column(table, 'bm'), 2) ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql SUBDIVIDE_BITMAP(, ) ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT BITMAP_TO_STRING(bm) FROM TABLE( SUBDIVIDE_BITMAP(BITMAP_FROM_STRING('1,2,3,4'), 2)); ┌──────┐ │ bm │ ├──────┤ │ 1,2 │ │ 3,4 │ └──────┘ ``` # TO_BITMAP (Lakehouse v2) > TO_BITMAP — converts an integer value to a bitmap containing that single value. Converts an integer value to a bitmap containing that single value. ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.to_bitmap() ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.to_bitmap(42) ┌──────────┐ │ (bitmap) │ └──────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql TO_BITMAP() ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT BITMAP_TO_STRING(TO_BITMAP(42)); ┌────┐ │ 42 │ └────┘ ``` # UNNEST_BITMAP (Lakehouse v2) > UNNEST_BITMAP — expands a bitmap into a set of rows. Expands a bitmap into a set of rows. ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.unnest_bitmap() ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.unnest_bitmap(get_column(table, 'bm')) ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql UNNEST_BITMAP() ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT * FROM TABLE(UNNEST_BITMAP(BITMAP_FROM_STRING('1,2,3'))); ┌───────┐ │ value │ ├───────┤ │ 1 │ │ 2 │ │ 3 │ └───────┘ ``` # Condition Functions (Lakehouse v2) > Lakehouse v2 SQL condition functions: branch on values with IF, CASE, COALESCE, NULLIF, and related selectors. 
This section provides reference information for the condition functions in PlaidCloud Lakehouse.

## Functions

* [CASE](case-when/)
* [COALESCE](coalesce/)
* [IF](if/)
* [IFNULL](ifnull/)
* [NULLIF](nullif/)

# CASE (Lakehouse v2)

> CASE — evaluates conditions in order and returns the value for the first condition that is met.

Evaluates conditions in order and returns the value for the first condition that is met. If no condition is met, the `ELSE` value (`else_` in Analyze) is returned.

## Analyze Syntax

```python
case((get_column(table, 'status') == 1, 'Active'), else_='Inactive')
```

## Analyze Examples

```python
case((get_column(table, 'salary') > 100000, 'High'),
     (get_column(table, 'salary') > 60000, 'Medium'),
     else_='Low')
```

## SQL Syntax

```sql
CASE WHEN condition THEN result [WHEN condition THEN result ...] [ELSE result] END
```

## SQL Examples

```sql
SELECT name,
       CASE WHEN salary > 100000 THEN 'High'
            WHEN salary > 60000 THEN 'Medium'
            ELSE 'Low' END AS band
FROM employees;
┌─────────┬────────┐
│ name    │ band   │
├─────────┼────────┤
│ Alice   │ High   │
│ Bob     │ Medium │
│ Charlie │ Low    │
└─────────┴────────┘
```

# COALESCE (Lakehouse v2)

> COALESCE — returns the first non-NULL expression from a list of expressions.

Returns the first non-NULL expression from a list of expressions.
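The fallback behavior can be pictured as scanning the arguments left to right and stopping at the first one that is not NULL. The plain-Python sketch below uses `None` in place of NULL. The `coalesce` helper and the sample rows are hypothetical and exist only for this illustration.

```python
def coalesce(*values):
    """Return the first argument that is not None, or None if all are None."""
    for value in values:
        if value is not None:
            return value
    return None


# Hypothetical rows, with None standing in for NULL.
rows = [
    {"nickname": "Bob", "first_name": "Robert"},
    {"nickname": None, "first_name": "Alice"},
    {"nickname": None, "first_name": None},
]

display_names = [coalesce(r["nickname"], r["first_name"], "Unknown") for r in rows]
print(display_names)
```

Placing a literal such as `'Unknown'` last guarantees a non-NULL result for every row.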
## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.coalesce(get_column(table, 'nickname'), get_column(table, 'name')) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.coalesce(get_column(table, 'nickname'), get_column(table, 'name'), 'Unknown') ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql COALESCE(, ) ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT COALESCE(nickname, first_name, 'Unknown') AS display_name FROM users; ┌──────────────┐ │ display_name │ ├──────────────┤ │ Bob │ │ Alice │ │ Unknown │ └──────────────┘ ``` # IF (Lakehouse v2) > IF — returns one of two values depending on whether a condition is TRUE or FALSE. Returns one of two values depending on whether a condition is TRUE or FALSE. ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.if_(get_column(table, 'score') >= 60, 'Pass', 'Fail') ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.if_(get_column(table, 'score') >= 60, 'Pass', 'Fail') ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql IF( >= 60, 'Pass', 'Fail') ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT name, IF(score >= 60, 'Pass', 'Fail') AS result FROM students; ┌─────────┬────────┐ │ name │ result │ ├─────────┼────────┤ │ Alice │ Pass │ │ Bob │ Fail │ │ Charlie │ Pass │ └─────────┴────────┘ ``` # IFNULL (Lakehouse v2) > IFNULL — returns the first expression if it is not NULL, otherwise returns the second expression. Returns the first expression if it is not NULL, otherwise returns the second expression. 
## Analyze Syntax

```python
func.ifnull(get_column(table, 'phone'), 'N/A')
```

## Analyze Examples

```python
func.ifnull(get_column(table, 'phone'), 'N/A')
```

## SQL Syntax

```sql
IFNULL(, 'N/A')
```

## SQL Examples

```sql
SELECT name, IFNULL(phone, 'N/A') AS phone FROM contacts;
┌─────────┬──────────┐
│ name    │ phone    │
├─────────┼──────────┤
│ Alice   │ 555-1234 │
│ Bob     │ N/A      │
└─────────┴──────────┘
```

# NULLIF (Lakehouse v2)

> NULLIF — returns NULL if two expressions are equal, otherwise returns the first expression.

Returns NULL if two expressions are equal, otherwise returns the first expression.

## Analyze Syntax

```python
func.nullif(get_column(table, 'value'), 0)
```

## Analyze Examples

```python
func.nullif(get_column(table, 'divisor'), 0)
```

## SQL Syntax

```sql
NULLIF(, 0)
```

## SQL Examples

```sql
SELECT 100 / NULLIF(divisor, 0) AS safe_division FROM calculations;
┌───────────────┐
│ safe_division │
├───────────────┤
│ 50.00         │
│ NULL          │
│ 25.00         │
└───────────────┘
```

# Cryptographic Functions (Lakehouse v2)

> Lakehouse v2 SQL cryptographic functions: encryption, decryption, hashing, and base64 helpers.

This section provides reference information for the cryptographic functions in PlaidCloud Lakehouse.
## Functions [Section titled “Functions”](#functions) * [AES\_DECRYPT](aes-decrypt/) * [AES\_ENCRYPT](aes-encrypt/) * [BASE64\_DECODE\_BINARY](base64-decode-binary/) * [BASE64\_DECODE\_STRING](base64-decode-string/) * [FROM\_BASE64](from-base64/) * [MD5](md5/) * [MD5SUM\_NUMERIC](md5sum-numeric/) * [MD5SUM](md5sum/) * [SHA2](sha2/) * [SM3](sm3/) * [TO\_BASE64](to-base64/) # AES_DECRYPT (Lakehouse v2) > AES_DECRYPT — decrypts a value encrypted with AES. Decrypts a value encrypted with AES. ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.aes_decrypt(, ) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.aes_decrypt(get_column(table, 'encrypted_data'), 'secret_key') ┌──────────────┐ │ 'plain text' │ └──────────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql AES_DECRYPT(, ) ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT AES_DECRYPT(encrypted_col, 'secret_key') FROM data; ┌────────────┐ │ plain text │ └────────────┘ ``` # AES_ENCRYPT (Lakehouse v2) > AES_ENCRYPT — encrypts a value using AES encryption. Encrypts a value using AES encryption. ## Analyze Syntax [Section titled “Analyze Syntax”](#analyze-syntax) ```python func.aes_encrypt(, ) ``` ## Analyze Examples [Section titled “Analyze Examples”](#analyze-examples) ```python func.aes_encrypt('hello', 'secret_key') ┌──────────┐ │ (binary) │ └──────────┘ ``` ## SQL Syntax [Section titled “SQL Syntax”](#sql-syntax) ```sql AES_ENCRYPT(, ) ``` ## SQL Examples [Section titled “SQL Examples”](#sql-examples) ```sql SELECT HEX(AES_ENCRYPT('hello', 'secret_key')); ┌──────────┐ │ A7B4C... │ └──────────┘ ``` # BASE64_DECODE_BINARY (Lakehouse v2) > BASE64_DECODE_BINARY — decodes a base64-encoded string to a binary value. Decodes a base64-encoded string to a binary value. 
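To see what the decoding step produces, the same base64 string can be decoded with Python's standard `base64` module. This is a local check of the encoding itself, not a call into the Lakehouse. The separate decode to text mirrors the distinction between a binary result and a VARCHAR result.

```python
import base64

raw = base64.b64decode("SGVsbG8=")  # bytes, comparable to a binary result
text = raw.decode("utf-8")          # string, comparable to a VARCHAR result

print(raw, text)
```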
## Analyze Syntax ```python func.base64_decode_binary() ``` ## Analyze Examples ```python func.base64_decode_binary('SGVsbG8=') ┌──────────┐ │ b'Hello' │ └──────────┘ ``` ## SQL Syntax ```sql BASE64_DECODE_BINARY() ``` ## SQL Examples ```sql SELECT BASE64_DECODE_BINARY('SGVsbG8='); ┌───────┐ │ Hello │ └───────┘ ``` # BASE64_DECODE_STRING (Lakehouse v2) > BASE64_DECODE_STRING — decodes a base64-encoded string to a VARCHAR string. Decodes a base64-encoded string to a VARCHAR string. ## Analyze Syntax ```python func.base64_decode_string() ``` ## Analyze Examples ```python func.base64_decode_string('SGVsbG8=') ┌─────────┐ │ 'Hello' │ └─────────┘ ``` ## SQL Syntax ```sql BASE64_DECODE_STRING() ``` ## SQL Examples ```sql SELECT BASE64_DECODE_STRING('SGVsbG8='); ┌───────┐ │ Hello │ └───────┘ ``` # FROM_BASE64 (Lakehouse v2) > FROM_BASE64 — decodes a base64-encoded string. Alias for `BASE64_DECODE_STRING`. Decodes a base64-encoded string. Alias for `BASE64_DECODE_STRING`. ## Analyze Syntax ```python func.from_base64() ``` ## Analyze Examples ```python func.from_base64('SGVsbG8=') ┌─────────┐ │ 'Hello' │ └─────────┘ ``` ## SQL Syntax ```sql FROM_BASE64() ``` ## SQL Examples ```sql SELECT FROM_BASE64('SGVsbG8='); ┌───────┐ │ Hello │ └───────┘ ``` # MD5 (Lakehouse v2) > MD5 — returns the MD5 hash of a string as a 32-character hexadecimal string.
Returns the MD5 hash of a string as a 32-character hexadecimal string. ## Analyze Syntax ```python func.md5() ``` ## Analyze Examples ```python func.md5('hello') ┌────────────────────────────────────┐ │ '5d41402abc4b2a76b9719d911017c592' │ └────────────────────────────────────┘ ``` ## SQL Syntax ```sql MD5() ``` ## SQL Examples ```sql SELECT MD5('hello'); ┌──────────────────────────────────┐ │ 5d41402abc4b2a76b9719d911017c592 │ └──────────────────────────────────┘ ``` # MD5SUM (Lakehouse v2) > MD5SUM — returns the MD5 hash of multiple strings concatenated together. Returns the MD5 hash of multiple strings concatenated together. ## Analyze Syntax ```python func.md5sum([, , ...]) ``` ## Analyze Examples ```python func.md5sum('hello', 'world') ┌────────────────────────────────────┐ │ 'fc5e038d38a57032085441e7fe7010b0' │ └────────────────────────────────────┘ ``` ## SQL Syntax ```sql MD5SUM([, , ...]) ``` ## SQL Examples ```sql SELECT MD5SUM('hello', 'world'); ┌──────────────────────────────────┐ │ fc5e038d38a57032085441e7fe7010b0 │ └──────────────────────────────────┘ ``` # MD5SUM_NUMERIC (Lakehouse v2) > MD5SUM_NUMERIC — returns the MD5 hash of multiple strings as a 128-bit numeric value. Returns the MD5 hash of multiple strings as a 128-bit numeric value.
## Analyze Syntax ```python func.md5sum_numeric([, , ...]) ``` ## Analyze Examples ```python func.md5sum_numeric('hello') ┌────────────┐ │ (largeint) │ └────────────┘ ``` ## SQL Syntax ```sql MD5SUM_NUMERIC([, , ...]) ``` ## SQL Examples ```sql SELECT MD5SUM_NUMERIC('hello'); ┌─────────────────────────────────────────┐ │ 268224579284945131320661290344218476706 │ └─────────────────────────────────────────┘ ``` # SHA2 (Lakehouse v2) > SHA2 — returns the SHA-2 hash of a string for a specified bit length (224, 256, 384, or 512). Returns the SHA-2 hash of a string for a specified bit length (224, 256, 384, or 512). ## Analyze Syntax ```python func.sha2(, ) ``` ## Analyze Examples ```python func.sha2('hello', 256) ┌───────────────┐ │ '2cf24dba...' │ └───────────────┘ ``` ## SQL Syntax ```sql SHA2(, ) ``` ## SQL Examples ```sql SELECT SHA2('hello', 256); ┌──────────────────────────────────────────────────────────────────┐ │ 2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824 │ └──────────────────────────────────────────────────────────────────┘ ``` # SM3 (Lakehouse v2) > SM3 — returns the SM3 hash of a string. Returns the SM3 hash of a string. ## Analyze Syntax ```python func.sm3() ``` ## Analyze Examples ```python func.sm3('hello') ┌───────────────┐ │ 'becbbfaa...'
│ └───────────────┘ ``` ## SQL Syntax ```sql SM3() ``` ## SQL Examples ```sql SELECT SM3('hello'); ┌──────────────────────────────────────────────────────────────────┐ │ becbbfaae6548b8bf0cfcad5a27183cd1be6093b1cceccc303d9c61d0a645268 │ └──────────────────────────────────────────────────────────────────┘ ``` # TO_BASE64 (Lakehouse v2) > TO_BASE64 — encodes a string to a base64-encoded string. Encodes a string to a base64-encoded string. ## Analyze Syntax ```python func.to_base64() ``` ## Analyze Examples ```python func.to_base64('Hello') ┌────────────┐ │ 'SGVsbG8=' │ └────────────┘ ``` ## SQL Syntax ```sql TO_BASE64() ``` ## SQL Examples ```sql SELECT TO_BASE64('Hello'); ┌──────────┐ │ SGVsbG8= │ └──────────┘ ``` # Date and Time Functions (Lakehouse v2) > Lakehouse v2 SQL date and time functions: parse, format, and arithmetic on dates, times, and timestamps. This section provides reference information for the date and time functions in PlaidCloud Lakehouse.
## Functions * [ADD\_MONTHS](add-months/) * [ADDDATE](adddate/) * [CONVERT\_TZ](convert-tz/) * [CURDATE](curdate/) * [CURRENT\_DATE](current-date/) * [CURRENT\_TIME](current-time/) * [CURRENT\_TIMESTAMP](current-timestamp/) * [CURRENT\_TIMEZONE](current-timezone/) * [CURTIME](curtime/) * [DATE\_ADD](date-add/) * [DATE\_DIFF](date-diff/) * [DATE\_FORMAT](date-format/) * [DATE\_SLICE](date-slice/) * [DATE\_SUB](date-sub/) * [DATE\_TRUNC](date-trunc/) * [DATE](date/) * [DATEDIFF](datediff/) * [DAY](day/) * [DAYNAME](dayname/) * [DAYOFMONTH](dayofmonth/) * [DAYOFWEEK\_ISO](dayofweek-iso/) * [DAYOFWEEK](dayofweek/) * [DAYOFYEAR](dayofyear/) * [DAYS\_ADD](days-add/) * [DAYS\_DIFF](days-diff/) * [DAYS\_SUB](days-sub/) * [FROM\_DAYS](from-days/) * [FROM\_UNIXTIME](from-unixtime/) * [HOUR](hour/) * [HOURS\_ADD](hours-add/) * [HOURS\_DIFF](hours-diff/) * [HOURS\_SUB](hours-sub/) * [JODATIME\_FORMAT](jodatime-format/) * [LAST\_DAY](last-day/) * [LOCALTIME](localtime/) * [LOCALTIMESTAMP](localtimestamp/) * [MAKEDATE](makedate/) * [MICROSECONDS\_ADD](microseconds-add/) * [MICROSECONDS\_SUB](microseconds-sub/) * [MILLISECONDS\_DIFF](milliseconds-diff/) * [MINUTE](minute/) * [MINUTES\_ADD](minutes-add/) * [MINUTES\_DIFF](minutes-diff/) * [MINUTES\_SUB](minutes-sub/) * [MONTH](month/) * [MONTHNAME](monthname/) * [MONTHS\_ADD](months-add/) * [MONTHS\_DIFF](months-diff/) * [MONTHS\_SUB](months-sub/) * [NEXT\_DAY](next-day/) * [NOW](now/) * [PREVIOUS\_DAY](previous-day/) * [QUARTER](quarter/) * [SEC\_TO\_TIME](sec-to-time/) * [SECOND](second/) * [SECONDS\_ADD](seconds-add/) * [SECONDS\_DIFF](seconds-diff/) * [SECONDS\_SUB](seconds-sub/) * [STR\_TO\_DATE](str-to-date/) * [STR\_TO\_JODATIME](str-to-jodatime/) * [STR2DATE](str2date/) * [SUBDATE](subdate/) * [TIME\_FORMAT](time-format/) * [TIME\_SLICE](time-slice/) * [TIME\_TO\_SEC](time-to-sec/) * [TIMEDIFF](timediff/) * [TIMESTAMP](timestamp/) * [TIMESTAMPADD](timestampadd/) *
[TIMESTAMPDIFF](timestampdiff/) * [TO\_DATE](to-date/) * [TO\_DATETIME\_NTZ](to-datetime-ntz/) * [TO\_DATETIME](to-datetime/) * [TO\_DAYS](to-days/) * [TO\_ISO8601](to-iso8601/) * [TO\_TERA\_DATE](to-tera-date/) * [TO\_TERA\_TIMESTAMP](to-tera-timestamp/) * [UNIX\_TIMESTAMP](unix-timestamp/) * [UTC\_TIME](utc-time/) * [UTC\_TIMESTAMP](utc-timestamp/) * [WEEK\_ISO](week-iso/) * [WEEK](week/) * [WEEKDAY](weekday/) * [WEEKOFYEAR](weekofyear/) * [WEEKS\_ADD](weeks-add/) * [WEEKS\_DIFF](weeks-diff/) * [WEEKS\_SUB](weeks-sub/) * [YEAR](year/) * [YEARS\_ADD](years-add/) * [YEARS\_DIFF](years-diff/) * [YEARS\_SUB](years-sub/) * [YEARWEEK](yearweek/) # ADD_MONTHS (Lakehouse v2) > ADD_MONTHS — adds a specified number of months to a date. Adds a specified number of months to a date. ## Analyze Syntax ```python func.add_months(, ) ``` ## Analyze Examples ```python func.add_months('2024-01-31', 1) ┌──────────────┐ │ '2024-02-29' │ └──────────────┘ ``` ## SQL Syntax ```sql ADD_MONTHS(, ) ``` ## SQL Examples ```sql SELECT ADD_MONTHS('2024-01-31', 1); ┌────────────┐ │ 2024-02-29 │ └────────────┘ ``` # ADDDATE (Lakehouse v2) > ADDDATE — adds a specified time interval to a date. Alias for `DATE_ADD`. Adds a specified time interval to a date. Alias for `DATE_ADD`.
## Analyze Syntax ```python func.adddate(, INTERVAL ) ``` ## Analyze Examples ```python func.adddate('2024-01-01', text('INTERVAL 7 DAY')) ┌──────────────┐ │ '2024-01-08' │ └──────────────┘ ``` ## SQL Syntax ```sql ADDDATE(, INTERVAL ) ``` ## SQL Examples ```sql SELECT ADDDATE('2024-01-01', INTERVAL 7 DAY); ┌────────────┐ │ 2024-01-08 │ └────────────┘ ``` # CONVERT_TZ (Lakehouse v2) > CONVERT_TZ — converts a datetime from one time zone to another. Converts a datetime from one time zone to another. ## Analyze Syntax ```python func.convert_tz(, , ) ``` ## Analyze Examples ```python func.convert_tz('2024-01-01 12:00:00', 'UTC', 'America/New_York') ┌───────────────────────┐ │ '2024-01-01 07:00:00' │ └───────────────────────┘ ``` ## SQL Syntax ```sql CONVERT_TZ(, , ) ``` ## SQL Examples ```sql SELECT CONVERT_TZ('2024-01-01 12:00:00', 'UTC', 'America/New_York'); ┌─────────────────────┐ │ 2024-01-01 07:00:00 │ └─────────────────────┘ ``` # CURDATE (Lakehouse v2) > CURDATE — returns the current date. Returns the current date. ## Analyze Syntax ```python func.curdate() ``` ## Analyze Examples ```python func.curdate() ┌──────────────┐ │ '2024-06-15' │ └──────────────┘ ``` ## SQL Syntax ```sql CURDATE() ``` ## SQL Examples ```sql SELECT CURDATE(); ┌────────────┐ │ 2024-06-15 │ └────────────┘ ``` # CURRENT_DATE (Lakehouse v2) > CURRENT_DATE — returns the current date. Alias for `CURDATE`.
Returns the current date. Alias for `CURDATE`. ## Analyze Syntax ```python func.current_date() ``` ## Analyze Examples ```python func.current_date() ┌──────────────┐ │ '2024-06-15' │ └──────────────┘ ``` ## SQL Syntax ```sql CURRENT_DATE() ``` ## SQL Examples ```sql SELECT CURRENT_DATE(); ┌────────────┐ │ 2024-06-15 │ └────────────┘ ``` # CURRENT_TIME (Lakehouse v2) > CURRENT_TIME — returns the current time. Returns the current time. ## Analyze Syntax ```python func.current_time() ``` ## Analyze Examples ```python func.current_time() ┌────────────┐ │ '14:30:00' │ └────────────┘ ``` ## SQL Syntax ```sql CURRENT_TIME() ``` ## SQL Examples ```sql SELECT CURRENT_TIME(); ┌──────────┐ │ 14:30:00 │ └──────────┘ ``` # CURRENT_TIMESTAMP (Lakehouse v2) > CURRENT_TIMESTAMP — returns the current date and time. Returns the current date and time. ## Analyze Syntax ```python func.current_timestamp() ``` ## Analyze Examples ```python func.current_timestamp() ┌───────────────────────┐ │ '2024-06-15 14:30:00' │ └───────────────────────┘ ``` ## SQL Syntax ```sql CURRENT_TIMESTAMP() ``` ## SQL Examples ```sql SELECT CURRENT_TIMESTAMP(); ┌─────────────────────┐ │ 2024-06-15 14:30:00 │ └─────────────────────┘ ``` # CURRENT_TIMEZONE (Lakehouse v2) > CURRENT_TIMEZONE — returns the current session time zone. Returns the current session time zone.
## Analyze Syntax ```python func.current_timezone() ``` ## Analyze Examples ```python func.current_timezone() ┌────────────────────┐ │ 'America/New_York' │ └────────────────────┘ ``` ## SQL Syntax ```sql CURRENT_TIMEZONE() ``` ## SQL Examples ```sql SELECT CURRENT_TIMEZONE(); ┌──────────────────┐ │ America/New_York │ └──────────────────┘ ``` # CURTIME (Lakehouse v2) > CURTIME — returns the current time. Alias for `CURRENT_TIME`. Returns the current time. Alias for `CURRENT_TIME`. ## Analyze Syntax ```python func.curtime() ``` ## Analyze Examples ```python func.curtime() ┌────────────┐ │ '14:30:00' │ └────────────┘ ``` ## SQL Syntax ```sql CURTIME() ``` ## SQL Examples ```sql SELECT CURTIME(); ┌──────────┐ │ 14:30:00 │ └──────────┘ ``` # DATE (Lakehouse v2) > DATE — extracts the date part from a datetime expression. Extracts the date part from a datetime expression. ## Analyze Syntax ```python func.date() ``` ## Analyze Examples ```python func.date('2024-06-15 14:30:00') ┌──────────────┐ │ '2024-06-15' │ └──────────────┘ ``` ## SQL Syntax ```sql DATE() ``` ## SQL Examples ```sql SELECT DATE('2024-06-15 14:30:00'); ┌────────────┐ │ 2024-06-15 │ └────────────┘ ``` # DATE_ADD (Lakehouse v2) > DATE_ADD — adds a specified time interval to a date or datetime. Adds a specified time interval to a date or datetime.
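Day-based interval arithmetic can be cross-checked outside the Lakehouse with Python's standard `datetime` module. This is only a sketch for day intervals; the helper name `add_days` is hypothetical and not part of PlaidCloud.

```python
from datetime import date, timedelta


def add_days(start: str, days: int) -> str:
    """Add a number of days to an ISO date string (hypothetical helper)."""
    return (date.fromisoformat(start) + timedelta(days=days)).isoformat()


print(add_days("2024-01-01", 30))  # 2024-01-31
```

Month and year intervals are not covered here because `timedelta` has no calendar-aware month unit.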
## Analyze Syntax ```python func.date_add(, INTERVAL ) ``` ## Analyze Examples ```python func.date_add('2024-01-01', text('INTERVAL 30 DAY')) ┌──────────────┐ │ '2024-01-31' │ └──────────────┘ ``` ## SQL Syntax ```sql DATE_ADD(, INTERVAL ) ``` ## SQL Examples ```sql SELECT DATE_ADD('2024-01-01', INTERVAL 30 DAY); ┌────────────┐ │ 2024-01-31 │ └────────────┘ ``` # DATE_DIFF (Lakehouse v2) > DATE_DIFF — returns the difference between two dates in the specified unit. Returns the difference between two dates in the specified unit. ## Analyze Syntax ```python func.date_diff(, , ) ``` ## Analyze Examples ```python func.date_diff('DAY', '2024-01-01', '2024-03-01') ┌────┐ │ 60 │ └────┘ ``` ## SQL Syntax ```sql DATE_DIFF(, , ) ``` ## SQL Examples ```sql SELECT DATE_DIFF('DAY', '2024-01-01', '2024-03-01'); ┌────┐ │ 60 │ └────┘ ``` # DATE_FORMAT (Lakehouse v2) > DATE_FORMAT — formats a date or datetime value according to a format string. Formats a date or datetime value according to a format string.
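The `%Y`, `%m`, and `%d` specifiers also exist in Python's `strftime`, so an expected result can be previewed locally. This sketch assumes the two agree for these three specifiers only; other specifiers may behave differently in the Lakehouse.

```python
from datetime import date

# Parse an ISO date, then render it with slash separators.
d = date.fromisoformat("2024-06-15")
formatted = d.strftime("%Y/%m/%d")

print(formatted)  # 2024/06/15
```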
## Analyze Syntax ```python func.date_format(, ) ``` ## Analyze Examples ```python func.date_format('2024-06-15', '%Y/%m/%d') ┌──────────────┐ │ '2024/06/15' │ └──────────────┘ ``` ## SQL Syntax ```sql DATE_FORMAT(, ) ``` ## SQL Examples ```sql SELECT DATE_FORMAT('2024-06-15', '%Y/%m/%d'); ┌────────────┐ │ 2024/06/15 │ └────────────┘ ``` # DATE_SLICE (Lakehouse v2) > Use the DATE_SLICE date/time function in PlaidCloud Lakehouse. Converts a given time to the beginning or end of a time interval based on the specified period. Converts a given time to the beginning or end of a time interval based on the specified period. ## Analyze Syntax ```python func.date_slice(, INTERVAL [, ]) ``` ## Analyze Examples ```python func.date_slice('2024-06-15 14:35:00', text('INTERVAL 1 HOUR')) ┌───────────────────────┐ │ '2024-06-15 14:00:00' │ └───────────────────────┘ ``` ## SQL Syntax ```sql DATE_SLICE(, INTERVAL [, ]) ``` ## SQL Examples ```sql SELECT DATE_SLICE('2024-06-15 14:35:00', INTERVAL 1 HOUR); ┌─────────────────────┐ │ 2024-06-15 14:00:00 │ └─────────────────────┘ ``` # DATE_SUB (Lakehouse v2) > DATE_SUB — subtracts a specified time interval from a date or datetime. Subtracts a specified time interval from a date or datetime.
## Analyze Syntax ```python func.date_sub(, INTERVAL ) ``` ## Analyze Examples ```python func.date_sub('2024-03-01', text('INTERVAL 1 MONTH')) ┌──────────────┐ │ '2024-02-01' │ └──────────────┘ ``` ## SQL Syntax ```sql DATE_SUB(, INTERVAL ) ``` ## SQL Examples ```sql SELECT DATE_SUB('2024-03-01', INTERVAL 1 MONTH); ┌────────────┐ │ 2024-02-01 │ └────────────┘ ``` # DATE_TRUNC (Lakehouse v2) > DATE_TRUNC — truncates a date or datetime value to the specified precision. Truncates a date or datetime value to the specified precision. ## Analyze Syntax ```python func.date_trunc(, ) ``` ## Analyze Examples ```python func.date_trunc('MONTH', '2024-06-15') ┌──────────────┐ │ '2024-06-01' │ └──────────────┘ ``` ## SQL Syntax ```sql DATE_TRUNC(, ) ``` ## SQL Examples ```sql SELECT DATE_TRUNC('MONTH', '2024-06-15'); ┌────────────┐ │ 2024-06-01 │ └────────────┘ ``` # DATEDIFF (Lakehouse v2) > DATEDIFF — returns the number of days between two dates. Returns the number of days between two dates. ## Analyze Syntax ```python func.datediff(, ) ``` ## Analyze Examples ```python func.datediff('2024-03-01', '2024-01-01') ┌────┐ │ 60 │ └────┘ ``` ## SQL Syntax ```sql DATEDIFF(, ) ``` ## SQL Examples ```sql SELECT DATEDIFF('2024-03-01', '2024-01-01'); ┌────┐ │ 60 │ └────┘ ``` # DAY (Lakehouse v2) > DAY — returns the day of the month from a date. Returns the day of the month from a date.
## Analyze Syntax ```python func.day() ``` ## Analyze Examples ```python func.day('2024-06-15') ┌────┐ │ 15 │ └────┘ ``` ## SQL Syntax ```sql DAY() ``` ## SQL Examples ```sql SELECT DAY('2024-06-15'); ┌────┐ │ 15 │ └────┘ ``` # DAYNAME (Lakehouse v2) > DAYNAME — returns the name of the weekday for a date. Returns the name of the weekday for a date. ## Analyze Syntax ```python func.dayname() ``` ## Analyze Examples ```python func.dayname('2024-06-15') ┌────────────┐ │ 'Saturday' │ └────────────┘ ``` ## SQL Syntax ```sql DAYNAME() ``` ## SQL Examples ```sql SELECT DAYNAME('2024-06-15'); ┌──────────┐ │ Saturday │ └──────────┘ ``` # DAYOFMONTH (Lakehouse v2) > DAYOFMONTH — returns the day of the month from a date. Alias for `DAY`. Returns the day of the month from a date. Alias for `DAY`. ## Analyze Syntax ```python func.dayofmonth() ``` ## Analyze Examples ```python func.dayofmonth('2024-06-15') ┌────┐ │ 15 │ └────┘ ``` ## SQL Syntax ```sql DAYOFMONTH() ``` ## SQL Examples ```sql SELECT DAYOFMONTH('2024-06-15'); ┌────┐ │ 15 │ └────┘ ``` # DAYOFWEEK (Lakehouse v2) > DAYOFWEEK — returns the day of the week index for a date (1=Sunday, 7=Saturday). Returns the day of the week index for a date (1=Sunday, 7=Saturday).
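The Sunday-based index differs from Monday-based ISO numbering. As a rough illustration in plain Python (not the Lakehouse implementation; `sunday_based_index` is a hypothetical helper), the ISO weekday can be remapped like this:

```python
from datetime import date


def sunday_based_index(iso_date: str) -> int:
    """Return 1 for Sunday through 7 for Saturday (hypothetical helper)."""
    iso = date.fromisoformat(iso_date).isoweekday()  # 1=Monday .. 7=Sunday
    return iso % 7 + 1


print(sunday_based_index("2024-06-15"))  # 7 (a Saturday)
print(sunday_based_index("2024-06-16"))  # 1 (a Sunday)
```

The modulo step wraps ISO Sunday (7) around to 0 before the shift, which is what moves Sunday to the front of the week.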
## Analyze Syntax ```python func.dayofweek() ``` ## Analyze Examples ```python func.dayofweek('2024-06-15') ┌───┐ │ 7 │ └───┘ ``` ## SQL Syntax ```sql DAYOFWEEK() ``` ## SQL Examples ```sql SELECT DAYOFWEEK('2024-06-15'); ┌───┐ │ 7 │ └───┘ ``` # DAYOFWEEK_ISO (Lakehouse v2) > DAYOFWEEK_ISO — returns the ISO day of the week index for a date (1=Monday, 7=Sunday). Returns the ISO day of the week index for a date (1=Monday, 7=Sunday). ## Analyze Syntax ```python func.dayofweek_iso() ``` ## Analyze Examples ```python func.dayofweek_iso('2024-06-15') ┌───┐ │ 6 │ └───┘ ``` ## SQL Syntax ```sql DAYOFWEEK_ISO() ``` ## SQL Examples ```sql SELECT DAYOFWEEK_ISO('2024-06-15'); ┌───┐ │ 6 │ └───┘ ``` # DAYOFYEAR (Lakehouse v2) > DAYOFYEAR — returns the day of the year from a date. Returns the day of the year from a date. ## Analyze Syntax ```python func.dayofyear() ``` ## Analyze Examples ```python func.dayofyear('2024-06-15') ┌─────┐ │ 167 │ └─────┘ ``` ## SQL Syntax ```sql DAYOFYEAR() ``` ## SQL Examples ```sql SELECT DAYOFYEAR('2024-06-15'); ┌─────┐ │ 167 │ └─────┘ ``` # DAYS_ADD (Lakehouse v2) > DAYS_ADD — adds a specified number of days to a date. Adds a specified number of days to a date.
## Analyze Syntax ```python func.days_add(, ) ``` ## Analyze Examples ```python func.days_add('2024-01-01', 30) ┌──────────────┐ │ '2024-01-31' │ └──────────────┘ ``` ## SQL Syntax ```sql DAYS_ADD(, ) ``` ## SQL Examples ```sql SELECT DAYS_ADD('2024-01-01', 30); ┌────────────┐ │ 2024-01-31 │ └────────────┘ ``` # DAYS_DIFF (Lakehouse v2) > DAYS_DIFF — returns the number of days between two dates. Returns the number of days between two dates. ## Analyze Syntax ```python func.days_diff(, ) ``` ## Analyze Examples ```python func.days_diff('2024-03-01', '2024-01-01') ┌────┐ │ 60 │ └────┘ ``` ## SQL Syntax ```sql DAYS_DIFF(, ) ``` ## SQL Examples ```sql SELECT DAYS_DIFF('2024-03-01', '2024-01-01'); ┌────┐ │ 60 │ └────┘ ``` # DAYS_SUB (Lakehouse v2) > DAYS_SUB — subtracts a specified number of days from a date. Subtracts a specified number of days from a date. ## Analyze Syntax ```python func.days_sub(, ) ``` ## Analyze Examples ```python func.days_sub('2024-01-31', 30) ┌──────────────┐ │ '2024-01-01' │ └──────────────┘ ``` ## SQL Syntax ```sql DAYS_SUB(, ) ``` ## SQL Examples ```sql SELECT DAYS_SUB('2024-01-31', 30); ┌────────────┐ │ 2024-01-01 │ └────────────┘ ``` # FROM_DAYS (Lakehouse v2) > FROM_DAYS — converts a day count to a date. Converts a day count to a date.
## Analyze Syntax ```python func.from_days() ``` ## Analyze Examples ```python func.from_days(739251) ┌──────────────┐ │ '2024-01-01' │ └──────────────┘ ``` ## SQL Syntax ```sql FROM_DAYS() ``` ## SQL Examples ```sql SELECT FROM_DAYS(739251); ┌────────────┐ │ 2024-01-01 │ └────────────┘ ``` # FROM_UNIXTIME (Lakehouse v2) > FROM_UNIXTIME — converts a Unix timestamp to a datetime string. Converts a Unix timestamp to a datetime string. ## Analyze Syntax ```python func.from_unixtime([, ]) ``` ## Analyze Examples ```python func.from_unixtime(1704067200) ┌───────────────────────┐ │ '2024-01-01 00:00:00' │ └───────────────────────┘ ``` ## SQL Syntax ```sql FROM_UNIXTIME([, ]) ``` ## SQL Examples ```sql SELECT FROM_UNIXTIME(1704067200); ┌─────────────────────┐ │ 2024-01-01 00:00:00 │ └─────────────────────┘ ``` # HOUR (Lakehouse v2) > HOUR — returns the hour from a datetime. Returns the hour from a datetime. ## Analyze Syntax ```python func.hour() ``` ## Analyze Examples ```python func.hour('2024-06-15 14:30:00') ┌────┐ │ 14 │ └────┘ ``` ## SQL Syntax ```sql HOUR() ``` ## SQL Examples ```sql SELECT HOUR('2024-06-15 14:30:00'); ┌────┐ │ 14 │ └────┘ ``` # HOURS_ADD (Lakehouse v2) > HOURS_ADD — adds a specified number of hours to a datetime. Adds a specified number of hours to a datetime.
## Analyze Syntax ```python func.hours_add(, ) ``` ## Analyze Examples ```python func.hours_add('2024-01-01 10:00:00', 5) ┌───────────────────────┐ │ '2024-01-01 15:00:00' │ └───────────────────────┘ ``` ## SQL Syntax ```sql HOURS_ADD(, ) ``` ## SQL Examples ```sql SELECT HOURS_ADD('2024-01-01 10:00:00', 5); ┌─────────────────────┐ │ 2024-01-01 15:00:00 │ └─────────────────────┘ ``` # HOURS_DIFF (Lakehouse v2) > HOURS_DIFF — returns the number of hours between two datetimes. Returns the number of hours between two datetimes. ## Analyze Syntax ```python func.hours_diff(, ) ``` ## Analyze Examples ```python func.hours_diff('2024-01-02 10:00:00', '2024-01-01 10:00:00') ┌────┐ │ 24 │ └────┘ ``` ## SQL Syntax ```sql HOURS_DIFF(, ) ``` ## SQL Examples ```sql SELECT HOURS_DIFF('2024-01-02 10:00:00', '2024-01-01 10:00:00'); ┌────┐ │ 24 │ └────┘ ``` # HOURS_SUB (Lakehouse v2) > HOURS_SUB — subtracts a specified number of hours from a datetime. Subtracts a specified number of hours from a datetime.
## Analyze Syntax ```python func.hours_sub(, ) ``` ## Analyze Examples ```python func.hours_sub('2024-01-01 15:00:00', 5) ┌───────────────────────┐ │ '2024-01-01 10:00:00' │ └───────────────────────┘ ``` ## SQL Syntax ```sql HOURS_SUB(, ) ``` ## SQL Examples ```sql SELECT HOURS_SUB('2024-01-01 15:00:00', 5); ┌─────────────────────┐ │ 2024-01-01 10:00:00 │ └─────────────────────┘ ``` # JODATIME_FORMAT (Lakehouse v2) > JODATIME_FORMAT — formats a date or datetime using Joda-Time format patterns. Formats a date or datetime using Joda-Time format patterns. ## Analyze Syntax ```python func.jodatime_format(, ) ``` ## Analyze Examples ```python func.jodatime_format('2024-06-15', 'yyyy/MM/dd') ┌──────────────┐ │ '2024/06/15' │ └──────────────┘ ``` ## SQL Syntax ```sql JODATIME_FORMAT(, ) ``` ## SQL Examples ```sql SELECT JODATIME_FORMAT('2024-06-15', 'yyyy/MM/dd'); ┌────────────┐ │ 2024/06/15 │ └────────────┘ ``` # LAST_DAY (Lakehouse v2) > LAST_DAY — returns the last day of the month for a given date. Returns the last day of the month for a given date.
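The month-end calculation, including the leap-year case, can be reproduced locally for a quick sanity check. The sketch below uses Python's standard `calendar` module; `last_day_of_month` is a hypothetical helper, not a PlaidCloud function.

```python
import calendar
from datetime import date


def last_day_of_month(iso_date: str) -> str:
    """Return the final calendar day of the month containing iso_date."""
    d = date.fromisoformat(iso_date)
    # monthrange returns (weekday of first day, number of days in month).
    days_in_month = calendar.monthrange(d.year, d.month)[1]
    return d.replace(day=days_in_month).isoformat()


print(last_day_of_month("2024-02-15"))  # 2024-02-29
```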
## Analyze Syntax ```python func.last_day([, ]) ``` ## Analyze Examples ```python func.last_day('2024-02-15') ┌──────────────┐ │ '2024-02-29' │ └──────────────┘ ``` ## SQL Syntax ```sql LAST_DAY([, ]) ``` ## SQL Examples ```sql SELECT LAST_DAY('2024-02-15'); ┌────────────┐ │ 2024-02-29 │ └────────────┘ ``` # LOCALTIME (Lakehouse v2) > LOCALTIME — returns the current date and time. Alias for `NOW`. Returns the current date and time. Alias for `NOW`. ## Analyze Syntax ```python func.localtime() ``` ## Analyze Examples ```python func.localtime() ┌───────────────────────┐ │ '2024-06-15 14:30:00' │ └───────────────────────┘ ``` ## SQL Syntax ```sql LOCALTIME() ``` ## SQL Examples ```sql SELECT LOCALTIME(); ┌─────────────────────┐ │ 2024-06-15 14:30:00 │ └─────────────────────┘ ``` # LOCALTIMESTAMP (Lakehouse v2) > LOCALTIMESTAMP — returns the current date and time. Alias for `NOW`. Returns the current date and time. Alias for `NOW`. ## Analyze Syntax ```python func.localtimestamp() ``` ## Analyze Examples ```python func.localtimestamp() ┌───────────────────────┐ │ '2024-06-15 14:30:00' │ └───────────────────────┘ ``` ## SQL Syntax ```sql LOCALTIMESTAMP() ``` ## SQL Examples ```sql SELECT LOCALTIMESTAMP(); ┌─────────────────────┐ │ 2024-06-15 14:30:00 │ └─────────────────────┘ ``` # MAKEDATE (Lakehouse v2) > MAKEDATE — creates a date from a year and day-of-year value.

Creates a date from a year and day-of-year value.

## Analyze Syntax

```python
func.makedate(, )
```

## Analyze Examples

```python
func.makedate(2024, 100)

┌──────────────┐
│ '2024-04-09' │
└──────────────┘
```

## SQL Syntax

```sql
MAKEDATE(, )
```

## SQL Examples

```sql
SELECT MAKEDATE(2024, 100);

┌────────────┐
│ 2024-04-09 │
└────────────┘
```

# MICROSECONDS_ADD (Lakehouse v2)

> MICROSECONDS_ADD — adds a specified number of microseconds to a datetime.

Adds a specified number of microseconds to a datetime.

## Analyze Syntax

```python
func.microseconds_add(, )
```

## Analyze Examples

```python
func.microseconds_add('2024-01-01 00:00:00', 1000000)

┌───────────────────────┐
│ '2024-01-01 00:00:01' │
└───────────────────────┘
```

## SQL Syntax

```sql
MICROSECONDS_ADD(, )
```

## SQL Examples

```sql
SELECT MICROSECONDS_ADD('2024-01-01 00:00:00', 1000000);

┌─────────────────────┐
│ 2024-01-01 00:00:01 │
└─────────────────────┘
```

# MICROSECONDS_SUB (Lakehouse v2)

> MICROSECONDS_SUB — subtracts a specified number of microseconds from a datetime.

Subtracts a specified number of microseconds from a datetime.
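
As a point of comparison, the same arithmetic can be sketched with Python's standard `datetime` module. This only illustrates the unit conversion (1,000,000 microseconds equal one second); it makes no claim about how the Lakehouse implements `MICROSECONDS_SUB`, and the helper name `subtract_microseconds` is hypothetical.

```python
from datetime import datetime, timedelta


def subtract_microseconds(moment: datetime, microseconds: int) -> datetime:
    """Return moment shifted back by the given number of microseconds."""
    # timedelta normalizes units, so 1_000_000 microseconds becomes 1 second.
    return moment - timedelta(microseconds=microseconds)


start = datetime(2024, 1, 1, 0, 0, 1)
print(subtract_microseconds(start, 1_000_000))  # 2024-01-01 00:00:00
```
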
## Analyze Syntax

```python
func.microseconds_sub(, )
```

## Analyze Examples

```python
func.microseconds_sub('2024-01-01 00:00:01', 1000000)

┌───────────────────────┐
│ '2024-01-01 00:00:00' │
└───────────────────────┘
```

## SQL Syntax

```sql
MICROSECONDS_SUB(, )
```

## SQL Examples

```sql
SELECT MICROSECONDS_SUB('2024-01-01 00:00:01', 1000000);

┌─────────────────────┐
│ 2024-01-01 00:00:00 │
└─────────────────────┘
```

# MILLISECONDS_DIFF (Lakehouse v2)

> MILLISECONDS_DIFF — returns the number of milliseconds between two datetimes.

Returns the number of milliseconds between two datetimes.

## Analyze Syntax

```python
func.milliseconds_diff(, )
```

## Analyze Examples

```python
func.milliseconds_diff('2024-01-01 00:00:01', '2024-01-01 00:00:00')

┌──────┐
│ 1000 │
└──────┘
```

## SQL Syntax

```sql
MILLISECONDS_DIFF(, )
```

## SQL Examples

```sql
SELECT MILLISECONDS_DIFF('2024-01-01 00:00:01', '2024-01-01 00:00:00');

┌──────┐
│ 1000 │
└──────┘
```

# MINUTE (Lakehouse v2)

> MINUTE — returns the minute from a datetime.

Returns the minute from a datetime.
## Analyze Syntax

```python
func.minute()
```

## Analyze Examples

```python
func.minute('2024-06-15 14:30:00')

┌────┐
│ 30 │
└────┘
```

## SQL Syntax

```sql
MINUTE()
```

## SQL Examples

```sql
SELECT MINUTE('2024-06-15 14:30:00');

┌────┐
│ 30 │
└────┘
```

# MINUTES_ADD (Lakehouse v2)

> MINUTES_ADD — adds a specified number of minutes to a datetime.

Adds a specified number of minutes to a datetime.

## Analyze Syntax

```python
func.minutes_add(, )
```

## Analyze Examples

```python
func.minutes_add('2024-01-01 10:00:00', 45)

┌───────────────────────┐
│ '2024-01-01 10:45:00' │
└───────────────────────┘
```

## SQL Syntax

```sql
MINUTES_ADD(, )
```

## SQL Examples

```sql
SELECT MINUTES_ADD('2024-01-01 10:00:00', 45);

┌─────────────────────┐
│ 2024-01-01 10:45:00 │
└─────────────────────┘
```

# MINUTES_DIFF (Lakehouse v2)

> MINUTES_DIFF — returns the number of minutes between two datetimes.

Returns the number of minutes between two datetimes.

## Analyze Syntax

```python
func.minutes_diff(, )
```

## Analyze Examples

```python
func.minutes_diff('2024-01-01 11:00:00', '2024-01-01 10:00:00')

┌────┐
│ 60 │
└────┘
```

## SQL Syntax

```sql
MINUTES_DIFF(, )
```

## SQL Examples

```sql
SELECT MINUTES_DIFF('2024-01-01 11:00:00', '2024-01-01 10:00:00');

┌────┐
│ 60 │
└────┘
```

# MINUTES_SUB (Lakehouse v2)

> MINUTES_SUB — subtracts a specified number of minutes from a datetime.


Subtracts a specified number of minutes from a datetime.

## Analyze Syntax

```python
func.minutes_sub(, )
```

## Analyze Examples

```python
func.minutes_sub('2024-01-01 10:45:00', 45)

┌───────────────────────┐
│ '2024-01-01 10:00:00' │
└───────────────────────┘
```

## SQL Syntax

```sql
MINUTES_SUB(, )
```

## SQL Examples

```sql
SELECT MINUTES_SUB('2024-01-01 10:45:00', 45);

┌─────────────────────┐
│ 2024-01-01 10:00:00 │
└─────────────────────┘
```

# MONTH (Lakehouse v2)

> MONTH — returns the month from a date.

Returns the month from a date.

## Analyze Syntax

```python
func.month()
```

## Analyze Examples

```python
func.month('2024-06-15')

┌───┐
│ 6 │
└───┘
```

## SQL Syntax

```sql
MONTH()
```

## SQL Examples

```sql
SELECT MONTH('2024-06-15');

┌───┐
│ 6 │
└───┘
```

# MONTHNAME (Lakehouse v2)

> MONTHNAME — returns the name of the month for a date.

Returns the name of the month for a date.

## Analyze Syntax

```python
func.monthname()
```

## Analyze Examples

```python
func.monthname('2024-06-15')

┌────────┐
│ 'June' │
└────────┘
```

## SQL Syntax

```sql
MONTHNAME()
```

## SQL Examples

```sql
SELECT MONTHNAME('2024-06-15');

┌──────┐
│ June │
└──────┘
```

# MONTHS_ADD (Lakehouse v2)

> MONTHS_ADD — adds a specified number of months to a date.

Adds a specified number of months to a date.
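
Month arithmetic is less obvious than day or hour arithmetic because months differ in length. The sketch below shows one way to do it in plain Python. This page does not say how `MONTHS_ADD` treats a day that does not exist in the target month (for example January 31 plus one month); clamping to the last day of the month is an assumption of this sketch, and the helper name `add_months` is hypothetical.

```python
import calendar
from datetime import date


def add_months(d: date, months: int) -> date:
    """Shift d by a number of months (negative values move backwards)."""
    # Count months from year 0 so that year rollover is a single divmod.
    index = d.year * 12 + (d.month - 1) + months
    year, month_zero_based = divmod(index, 12)
    month = month_zero_based + 1
    # Assumption: clamp the day to the length of the target month.
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


print(add_months(date(2024, 1, 15), 3))   # 2024-04-15
print(add_months(date(2024, 6, 15), -3))  # 2024-03-15
```
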
## Analyze Syntax

```python
func.months_add(, )
```

## Analyze Examples

```python
func.months_add('2024-01-15', 3)

┌──────────────┐
│ '2024-04-15' │
└──────────────┘
```

## SQL Syntax

```sql
MONTHS_ADD(, )
```

## SQL Examples

```sql
SELECT MONTHS_ADD('2024-01-15', 3);

┌────────────┐
│ 2024-04-15 │
└────────────┘
```

# MONTHS_DIFF (Lakehouse v2)

> MONTHS_DIFF — returns the number of months between two dates.

Returns the number of months between two dates.

## Analyze Syntax

```python
func.months_diff(, )
```

## Analyze Examples

```python
func.months_diff('2024-06-01', '2024-01-01')

┌───┐
│ 5 │
└───┘
```

## SQL Syntax

```sql
MONTHS_DIFF(, )
```

## SQL Examples

```sql
SELECT MONTHS_DIFF('2024-06-01', '2024-01-01');

┌───┐
│ 5 │
└───┘
```

# MONTHS_SUB (Lakehouse v2)

> MONTHS_SUB — subtracts a specified number of months from a date.

Subtracts a specified number of months from a date.

## Analyze Syntax

```python
func.months_sub(, )
```

## Analyze Examples

```python
func.months_sub('2024-06-15', 3)

┌──────────────┐
│ '2024-03-15' │
└──────────────┘
```

## SQL Syntax

```sql
MONTHS_SUB(, )
```

## SQL Examples

```sql
SELECT MONTHS_SUB('2024-06-15', 3);

┌────────────┐
│ 2024-03-15 │
└────────────┘
```

# NEXT_DAY (Lakehouse v2)

> NEXT_DAY — returns the date of the next specified weekday after a given date.

Returns the date of the next specified weekday after a given date.
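
The "next weekday after a date" rule can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the Lakehouse implementation: it reads "after" as strictly after (a date that already falls on the requested weekday moves forward a full week), it accepts full English weekday names only, and the helper name `next_weekday` is hypothetical.

```python
from datetime import date, timedelta

# Index positions match date.weekday(): Monday is 0, Sunday is 6.
WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday",
            "friday", "saturday", "sunday"]


def next_weekday(d: date, weekday_name: str) -> date:
    """Return the first date strictly after d that falls on weekday_name."""
    target = WEEKDAYS.index(weekday_name.lower())
    # The offset is always in 1..7, so the result is never d itself.
    days_ahead = (target - d.weekday() - 1) % 7 + 1
    return d + timedelta(days=days_ahead)


# 2024-06-15 is a Saturday, so the next Monday is two days later.
print(next_weekday(date(2024, 6, 15), "Monday"))  # 2024-06-17
```
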
## Analyze Syntax

```python
func.next_day(, )
```

## Analyze Examples

```python
func.next_day('2024-06-15', 'Monday')

┌──────────────┐
│ '2024-06-17' │
└──────────────┘
```

## SQL Syntax

```sql
NEXT_DAY(, )
```

## SQL Examples

```sql
SELECT NEXT_DAY('2024-06-15', 'Monday');

┌────────────┐
│ 2024-06-17 │
└────────────┘
```

# NOW (Lakehouse v2)

> NOW — returns the current date and time.

Returns the current date and time.

## Analyze Syntax

```python
func.now()
```

## Analyze Examples

```python
func.now()

┌───────────────────────┐
│ '2024-06-15 14:30:00' │
└───────────────────────┘
```

## SQL Syntax

```sql
NOW()
```

## SQL Examples

```sql
SELECT NOW();

┌─────────────────────┐
│ 2024-06-15 14:30:00 │
└─────────────────────┘
```

# PREVIOUS_DAY (Lakehouse v2)

> PREVIOUS_DAY — returns the date of the previous specified weekday before a given date.

Returns the date of the previous specified weekday before a given date.

## Analyze Syntax

```python
func.previous_day(, )
```

## Analyze Examples

```python
func.previous_day('2024-06-15', 'Monday')

┌──────────────┐
│ '2024-06-10' │
└──────────────┘
```

## SQL Syntax

```sql
PREVIOUS_DAY(, )
```

## SQL Examples

```sql
SELECT PREVIOUS_DAY('2024-06-15', 'Monday');

┌────────────┐
│ 2024-06-10 │
└────────────┘
```

# QUARTER (Lakehouse v2)

> QUARTER — returns the quarter of the year from a date (1-4).

Returns the quarter of the year from a date (1-4).
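
For reference, the calendar-quarter mapping (months 1-3 give 1, 4-6 give 2, and so on) reduces to one integer expression. The plain-Python sketch below assumes calendar quarters starting in January; the helper name `quarter_of` is hypothetical and the code is not the Lakehouse implementation.

```python
from datetime import date


def quarter_of(d: date) -> int:
    """Return the calendar quarter (1-4) that contains d."""
    # Months 1-3 map to 0, 4-6 to 1, and so on; add 1 for a 1-based quarter.
    return (d.month - 1) // 3 + 1


print(quarter_of(date(2024, 6, 15)))  # 2
```
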
## Analyze Syntax

```python
func.quarter()
```

## Analyze Examples

```python
func.quarter('2024-06-15')

┌───┐
│ 2 │
└───┘
```

## SQL Syntax

```sql
QUARTER()
```

## SQL Examples

```sql
SELECT QUARTER('2024-06-15');

┌───┐
│ 2 │
└───┘
```

# SEC_TO_TIME (Lakehouse v2)

> SEC_TO_TIME — converts seconds to a time value.

Converts seconds to a time value.

## Analyze Syntax

```python
func.sec_to_time()
```

## Analyze Examples

```python
func.sec_to_time(3661)

┌────────────┐
│ '01:01:01' │
└────────────┘
```

## SQL Syntax

```sql
SEC_TO_TIME()
```

## SQL Examples

```sql
SELECT SEC_TO_TIME(3661);

┌──────────┐
│ 01:01:01 │
└──────────┘
```

# SECOND (Lakehouse v2)

> SECOND — returns the second from a datetime.

Returns the second from a datetime.

## Analyze Syntax

```python
func.second()
```

## Analyze Examples

```python
func.second('2024-06-15 14:30:45')

┌────┐
│ 45 │
└────┘
```

## SQL Syntax

```sql
SECOND()
```

## SQL Examples

```sql
SELECT SECOND('2024-06-15 14:30:45');

┌────┐
│ 45 │
└────┘
```

# SECONDS_ADD (Lakehouse v2)

> SECONDS_ADD — adds a specified number of seconds to a datetime.

Adds a specified number of seconds to a datetime.
## Analyze Syntax

```python
func.seconds_add(, )
```

## Analyze Examples

```python
func.seconds_add('2024-01-01 00:00:00', 90)

┌───────────────────────┐
│ '2024-01-01 00:01:30' │
└───────────────────────┘
```

## SQL Syntax

```sql
SECONDS_ADD(, )
```

## SQL Examples

```sql
SELECT SECONDS_ADD('2024-01-01 00:00:00', 90);

┌─────────────────────┐
│ 2024-01-01 00:01:30 │
└─────────────────────┘
```

# SECONDS_DIFF (Lakehouse v2)

> SECONDS_DIFF — returns the number of seconds between two datetimes.

Returns the number of seconds between two datetimes.

## Analyze Syntax

```python
func.seconds_diff(, )
```

## Analyze Examples

```python
func.seconds_diff('2024-01-01 00:01:30', '2024-01-01 00:00:00')

┌────┐
│ 90 │
└────┘
```

## SQL Syntax

```sql
SECONDS_DIFF(, )
```

## SQL Examples

```sql
SELECT SECONDS_DIFF('2024-01-01 00:01:30', '2024-01-01 00:00:00');

┌────┐
│ 90 │
└────┘
```

# SECONDS_SUB (Lakehouse v2)

> SECONDS_SUB — subtracts a specified number of seconds from a datetime.

Subtracts a specified number of seconds from a datetime.
## Analyze Syntax

```python
func.seconds_sub(, )
```

## Analyze Examples

```python
func.seconds_sub('2024-01-01 00:01:30', 90)

┌───────────────────────┐
│ '2024-01-01 00:00:00' │
└───────────────────────┘
```

## SQL Syntax

```sql
SECONDS_SUB(, )
```

## SQL Examples

```sql
SELECT SECONDS_SUB('2024-01-01 00:01:30', 90);

┌─────────────────────┐
│ 2024-01-01 00:00:00 │
└─────────────────────┘
```

# STR_TO_DATE (Lakehouse v2)

> STR_TO_DATE — parses a string into a date using a format string.

Parses a string into a date using a format string.

## Analyze Syntax

```python
func.str_to_date(, )
```

## Analyze Examples

```python
func.str_to_date('Jun 15 2024', '%b %d %Y')

┌──────────────┐
│ '2024-06-15' │
└──────────────┘
```

## SQL Syntax

```sql
STR_TO_DATE(, )
```

## SQL Examples

```sql
SELECT STR_TO_DATE('Jun 15 2024', '%b %d %Y');

┌────────────┐
│ 2024-06-15 │
└────────────┘
```

# STR_TO_JODATIME (Lakehouse v2)

> STR_TO_JODATIME — parses a string into a datetime using Joda-Time format patterns.

Parses a string into a datetime using Joda-Time format patterns.
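
Joda-Time patterns use letter runs such as `yyyy`, `MM`, and `dd` where `STR_TO_DATE`-style formats use `%Y`, `%m`, and `%d`. The sketch below shows that correspondence for a small subset of tokens by translating a Joda-style pattern into a Python `strptime` format. It is an illustration only: the token table is deliberately incomplete, quoted literals and literal letters in the pattern are not handled, and the names `JODA_TO_STRPTIME` and `parse_joda` are hypothetical rather than part of PlaidCloud.

```python
import re
from datetime import datetime

# Subset of Joda-Time tokens and their strptime equivalents (assumed sufficient
# for simple numeric patterns only).
JODA_TO_STRPTIME = {
    "yyyy": "%Y",  # four-digit year
    "MM": "%m",    # two-digit month
    "dd": "%d",    # two-digit day of month
    "HH": "%H",    # hour of day, 00-23
    "mm": "%M",    # minute of hour
    "ss": "%S",    # second of minute
}
_TOKEN = re.compile("|".join(JODA_TO_STRPTIME))


def parse_joda(value: str, pattern: str) -> datetime:
    """Parse value using a simple Joda-style pattern."""
    # Replace each known token; separators such as '/' or ':' pass through.
    strptime_format = _TOKEN.sub(lambda m: JODA_TO_STRPTIME[m.group(0)], pattern)
    return datetime.strptime(value, strptime_format)


print(parse_joda("2024/06/15", "yyyy/MM/dd").date())  # 2024-06-15
```
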
## Analyze Syntax

```python
func.str_to_jodatime(, )
```

## Analyze Examples

```python
func.str_to_jodatime('2024/06/15', 'yyyy/MM/dd')

┌──────────────┐
│ '2024-06-15' │
└──────────────┘
```

## SQL Syntax

```sql
STR_TO_JODATIME(, )
```

## SQL Examples

```sql
SELECT STR_TO_JODATIME('2024/06/15', 'yyyy/MM/dd');

┌────────────┐
│ 2024-06-15 │
└────────────┘
```

# STR2DATE (Lakehouse v2)

> STR2DATE — parses a string into a date using a format string. Alias for `STR_TO_DATE`.

Parses a string into a date using a format string. Alias for `STR_TO_DATE`.

## Analyze Syntax

```python
func.str2date(, )
```

## Analyze Examples

```python
func.str2date('2024/06/15', '%Y/%m/%d')

┌──────────────┐
│ '2024-06-15' │
└──────────────┘
```

## SQL Syntax

```sql
STR2DATE(, )
```

## SQL Examples

```sql
SELECT STR2DATE('2024/06/15', '%Y/%m/%d');

┌────────────┐
│ 2024-06-15 │
└────────────┘
```

# SUBDATE (Lakehouse v2)

> SUBDATE — subtracts a time interval from a date. Alias for `DATE_SUB`.

Subtracts a time interval from a date. Alias for `DATE_SUB`.
## Analyze Syntax

```python
func.subdate(, INTERVAL )
```

## Analyze Examples

```python
func.subdate('2024-01-31', text('INTERVAL 7 DAY'))

┌──────────────┐
│ '2024-01-24' │
└──────────────┘
```

## SQL Syntax

```sql
SUBDATE(, INTERVAL )
```

## SQL Examples

```sql
SELECT SUBDATE('2024-01-31', INTERVAL 7 DAY);

┌────────────┐
│ 2024-01-24 │
└────────────┘
```

# TIME_FORMAT (Lakehouse v2)

> TIME_FORMAT — formats a time value according to a format string.

Formats a time value according to a format string.

## Analyze Syntax

```python
func.time_format(