From two raw CSV files to a live analytics dashboard
UK giftware retailer, production-grade analytics platform built from scratch
Part 1
UK retailer, 38 countries, raw CSV data, not clean, not structured, not usable for analytics
| InvoiceNo | StockCode | Quantity | InvoiceDate | CustomerID |
|---|---|---|---|---|
| 536365 | 85123A | 6 | 12/1/10 8:26 | 17850 |
| C536379 | D | -1 | 12/1/10 9:41 | 14527 |
| 536370 | POST | 3 | 12/1/10 8:45 | 12583 |
C-prefixed invoices: cancellations with negative quantitiesPOST, D stock codes: non-product rowsStockCode with different descriptions and pricesRaw files must be loaded, then cleaned, structured, and modelled before they can answer anything
.sql file. dbt compiles, runs, and tests them against the warehouse in the right order.
| Model | Problem | Solution |
|---|---|---|
dim_datetime |
Dates as text, two formats | PARSE_DATETIME selected by string length |
dim_product |
StockCode not unique |
Surrogate key = hash(code + desc + price) |
dim_customer |
No ISO codes, duplicates | Deduplicate + join country reference |
fct_invoices |
Returns, noise, no totals | Filter Qty > 0 + compute Qty × Price |
541,909 raw rows → 397,884 validated sale facts
·
Tests: unique, not_null, relationships
A modelling pattern where one central fact table is surrounded by dimension tables, like a star. Optimised for analytical queries.
Pre-aggregated for Looker Studio, one model per business question
report_customer_invoices
report_product_invoices
report_year_invoices
Part 2
| Component | Role |
|---|---|
| GCS | Raw CSV file storage, where the data journey begins |
| Eventarc | Fires on object.finalized: zero lag, reacts on file landing |
| Cloud Workflows | Orchestrates: load CSVs first → then trigger dbt |
| BigQuery | 3 datasets: raw, transform, report |
| Cloud Run Job | Runs dbt Docker image on demand, stateless, exits when done |
| Secret Manager | BigQuery credentials, no passwords hardcoded |
| Artifact Registry | Stores the dbt Docker image |
| Cloud Build | CI/CD: rebuilds and pushes the dbt image on git push |
Part 3
Infrastructure as Code: every GCP resource in version-controlled files
modules/platform
modules/pipeline
modules/observability
terraform apply → creates every resource from scratchPart 4
Looker Studio → BigQuery → report dataset · View live dashboard
report_customer_invoices · Bar chartreport_product_invoices · Bar chartreport_year_invoices · Line chartraw_invoice · Horizontal bar by revenueExpected: UK dominates · Q4 spike (Oct to Dec) · small decorative items drive volume
From two CSV files and four unanswered questions
to a fully automated, tested, event-driven analytics platform.
Zero manual steps between data landing and dashboard updating.