Apache Spark

Apache Spark is a distributed compute engine commonly used for ETL over large datasets. Use this connector to read and write data through a Spark SQL endpoint, typically the Spark Thrift Server. For Databricks-managed Spark, prefer the Azure Databricks connector.
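
As a rough illustration of what a Spark SQL endpoint looks like to a client: Spark Thrift Server speaks the HiveServer2 protocol, so JDBC clients usually address it with a jdbc:hive2:// URL, and port 10000 is the usual default. The sketch below only assembles such a URL. The ThriftEndpoint class, its field names, and the host name are hypothetical and are not the connector's actual configuration fields, which depend on what your tenant administrator has provisioned.

```python
from dataclasses import dataclass
from urllib.parse import quote


@dataclass(frozen=True)
class ThriftEndpoint:
    """Hypothetical holder for Spark Thrift Server connection details."""

    host: str
    port: int = 10000           # usual HiveServer2-compatible default port
    database: str = "default"   # Spark's default database name

    def jdbc_url(self) -> str:
        # Spark Thrift Server is HiveServer2-compatible, hence the hive2 scheme.
        return f"jdbc:hive2://{self.host}:{self.port}/{quote(self.database)}"


# Assumed host name, for illustration only.
endpoint = ThriftEndpoint(host="spark-thrift.example.internal")
print(endpoint.jdbc_url())
# prints: jdbc:hive2://spark-thrift.example.internal:10000/default
```

Whether your deployment uses the default port, a different transport mode, or additional URL parameters is something to confirm with whoever operates the Spark cluster.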

Upstream Documentation

The Apache Spark documentation.

The Apache project.

Setup

This connector uses a vendor-specific authentication flow and is configured directly from the Connections screen in your workspace. The configuration fields shown depend on the credentials your tenant administrator has provisioned for the integration.

See the upstream Apache Spark documentation for the latest setup specifics.

If you need help setting up this connector for your tenant, contact your account team. Connector-specific credentials, environment URLs, and any required pre-provisioning typically need to be coordinated with PlaidCloud support.