Load, Transform, and Publish Data
This tutorial takes about an hour. By the end you’ll have a working workflow that imports a CSV, transforms it through a few table steps, and publishes the result.
What You’ll Build
Section titled “What You’ll Build”A workflow that takes a raw sales CSV, cleans it, joins it to a product reference table, computes derived columns, and publishes the result as a clean fact table.
sales.csv → import → filter → join with products → add columns → publish ↓ products.csv → importPrerequisites
Section titled “Prerequisites”- A PlaidCloud workspace (start a free trial if you don’t have one)
- A project to work in (create one from the Projects tab if needed)
- Two CSV files to import — for this tutorial we’ll use a simple sales transactions file and a product catalog file. You can use your own or generate sample data with any spreadsheet tool
Step 1: Create the Workflow
Section titled “Step 1: Create the Workflow”- Open your project and switch to the Workflows tab.
- Click New Workflow. Name it something descriptive like “Sales Cleanup”.
- Click Create. The empty workflow appears in your list.
- Double-click the workflow to open the Workflow Explorer.
Step 2: Import the Sales CSV
Section titled “Step 2: Import the Sales CSV”- In the Workflow Explorer, add a new step.
- Choose Import → CSV (or whichever import step matches your source format).
- In the step configuration:
- Source file — point at your sales CSV (upload, or pick from a connected document account)
- Target table — name it
sales_raw - Delimiter, quote character, header row — adjust if your file is non-standard
- Run the step. The CSV lands as a new table in your project.
Check the Tables tab to see sales_raw. Click into it to verify the data looks right — column count, sample rows, data types.
Step 3: Import the Product Catalog
Section titled “Step 3: Import the Product Catalog”Repeat Step 2 with your products CSV, targeting a table named products_raw. This gives you both reference tables needed for the join.
Step 4: Filter Out Bad Rows
Section titled “Step 4: Filter Out Bad Rows”Real sales data has gaps — null amounts, test transactions, refunds you don’t want in the main fact table. Add a filter step to remove them.
- Add a Tables → Table Lookup step (or any table transform that lets you filter).
- Configure:
- Source table:
sales_raw - Target table:
sales_clean - Filter conditions: e.g.,
amount > 0 AND status = 'completed'
- Source table:
- Run the step.
sales_cleanshould have fewer rows thansales_raw.
Step 5: Join to the Product Catalog
Section titled “Step 5: Join to the Product Catalog”Now combine the cleaned sales rows with product details from the catalog.
- Add a Tables → Table Inner Join step.
- Configure:
- Left table:
sales_clean - Right table:
products_raw - Join keys: the column linking the two tables (e.g.,
product_id) - Target table:
sales_enriched
- Left table:
- Run the step.
sales_enriched now has every column from both tables. You probably want a subset — that comes next.
Step 6: Select and Compute Columns
Section titled “Step 6: Select and Compute Columns”Cleaning typically means dropping columns you don’t need and computing derived ones. Add another table step:
- Add a Tables → Table Lookup step (used here for column selection and computation).
- Configure:
- Source:
sales_enriched - Target:
sales_final - Columns to keep: the subset that matters for downstream consumers
- Computed columns: e.g.,
revenue = amount * price,margin = revenue - cost
- Source:
- Run the step.
For column-level calculations, the Expressions reference covers every function available — string operations, date math, conditional logic, aggregations.
Step 7: Publish the Result
Section titled “Step 7: Publish the Result”Make sales_final available to dashboards and downstream systems.
- Add a Data → Publish step (or use the Publish option directly on the table).
- Configure who can read the published table — typically other members of the workspace plus any external systems that have access.
- Run the step.
The published table is now reachable by Dashboards, BI tools, and any external consumer with the right permissions.
Step 8: Run the Whole Workflow
Section titled “Step 8: Run the Whole Workflow”Click Run on the workflow (not just one step). Watch the log as each step executes in order. The complete pipeline runs from CSV import through to publish.
If any step errors, the Managing step errors guide covers debugging — the most common issues are bad join keys (mismatch between tables) and unexpected null values in computed columns.
What’s Next
Section titled “What’s Next”- Build an Allocation Model — spread costs across consumers using driver data
- Workflows guide — error handling, conditions, loops, variables
- Workflow steps reference — every step type and what it does