Building Event-Driven Data Warehouses: Lessons from Scaling 9-Figure eCommerce
How we built a real-time data pipeline using Pub/Sub and PostgreSQL that handles millions of events daily for Aroma360 and Hotel Collection.
When I joined z9 Ventures in 2023, both Aroma360 and Hotel Collection were generating massive amounts of data but struggling to turn it into actionable insights. Customer events were scattered across platforms, attribution was broken, and our executive team was making million-dollar decisions based on day-old reports.
Fast forward 18 months: we now process over 2 million events daily through a unified data warehouse that powers real-time dashboards, enables precise attribution modeling, and has directly contributed to scaling daily revenue from $100K to $400K.
The Challenge: Data Chaos at Scale
Running two 9-figure eCommerce brands means dealing with data from everywhere: Shopify, Google Analytics, Facebook Ads, call tracking systems, email platforms, SMS providers, and dozens of other tools. Each platform had its own data format, update frequency, and API limitations.
The existing setup was a patchwork of scheduled ETL jobs, CSV exports, and manual data pulls. Critical business metrics like customer lifetime value and marketing attribution were calculated differently across teams, leading to conflicting reports and delayed decision-making.
The Architecture: Universal Webhook Handler
The breakthrough came when I designed what we call our "Universal Webhook Handler": a cloud function-based system that standardizes how we ingest events from any platform in real time. It's designed primarily to work with any webhook subscription, but it's flexible enough to support event-style processing for conventional APIs as well.
How It Works
The architecture centers around parameterized webhook URLs that encode metadata directly in the endpoint. When setting up integrations with platforms like Shopify, Facebook, or our call tracking system, we generate URLs like:
https://webhooks.z9data.com?platform=shopify&event=order_created&business_unit=aroma360
https://webhooks.z9data.com?platform=klaviyo&event=sms_sent&business_unit=hotel_collection
Each URL contains four key parameters:
- Platform: The source system (shopify, facebook, callrail, etc.)
- Event Type: The specific action (order_created, purchase, call_completed)
- Business Unit: Which brand the event belongs to (aroma360, hotel_collection)
- Additional Context: Campaign IDs, user segments, or other metadata
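To make this concrete, here's a minimal sketch of how those URLs can be generated when setting up a new integration. The helper function and the extra campaign_id parameter are illustrative, not part of our actual codebase:

```python
from urllib.parse import urlencode

WEBHOOK_BASE = "https://webhooks.z9data.com"  # base endpoint from the examples above

def build_webhook_url(platform: str, event: str, business_unit: str, **extra) -> str:
    """Encode routing metadata directly in the webhook URL's query string."""
    params = {"platform": platform, "event": event, "business_unit": business_unit, **extra}
    return f"{WEBHOOK_BASE}?{urlencode(params)}"

# Example: the URL we hand to Shopify when subscribing to order webhooks.
print(build_webhook_url("shopify", "order_created", "aroma360", campaign_id="summer_sale"))
```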
The Ingestion Flow
When an event fires, our cloud function extracts the metadata from the URL's query parameters, validates the payload, and enriches it with additional context like timestamps, IP geolocation, and parsed user-agent details. The enriched event is then published to a Google Pub/Sub topic.
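Here's a simplified sketch of what that handler looks like as an HTTP-triggered Cloud Function publishing to Pub/Sub. The project and topic names are placeholders, and the real function does considerably more validation and enrichment than shown:

```python
import json
from datetime import datetime, timezone

import functions_framework
from google.cloud import pubsub_v1

publisher = pubsub_v1.PublisherClient()
TOPIC = publisher.topic_path("my-gcp-project", "raw-events")  # placeholder project/topic names

@functions_framework.http
def handle_webhook(request):
    """Tag the incoming payload with routing metadata from the query string and publish it."""
    event = {
        "platform": request.args.get("platform"),
        "event_type": request.args.get("event"),
        "business_unit": request.args.get("business_unit"),
        "received_at": datetime.now(timezone.utc).isoformat(),
        "user_agent": request.headers.get("User-Agent"),
        "payload": request.get_json(silent=True) or {},
    }
    publisher.publish(TOPIC, json.dumps(event).encode("utf-8"))
    return ("", 204)
```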
This approach has several key advantages:
- Zero Configuration: New integrations require no code changes, just URL generation
- Automatic Routing: Events are automatically tagged and routed based on URL structure
- Failure Resilience: Pub/Sub handles retries automatically, and dead-letter topics catch events that repeatedly fail processing
- Scalability: Cloud functions scale to zero when idle and absorb traffic spikes seamlessly
Real-Time Processing with Local Subscribers
While cloud-native solutions are great for scalability, we needed the flexibility and cost control of on-premises processing for our analytics workloads. Our solution: a locally hosted Python service that subscribes to the Pub/Sub topics and processes events in real-time.
This subscriber service runs on a dedicated server in our office and handles:
- Data Validation: Schema validation and data quality checks
- Transformation: Converting platform-specific formats to our unified schema
- Enrichment: Adding calculated fields, customer segments, and attribution data
- Database Insertion: Writing clean, structured data to PostgreSQL
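A stripped-down sketch of the subscriber loop, using Pub/Sub's streaming pull and psycopg2. The subscription name, target table, and transformation logic are placeholders rather than our production code:

```python
import json

import psycopg2
from google.cloud import pubsub_v1

SUBSCRIPTION = "projects/my-gcp-project/subscriptions/raw-events-local"  # placeholder name
conn = psycopg2.connect("dbname=analytics user=etl")  # local PostgreSQL; shared for simplicity

def handle_message(message: pubsub_v1.subscriber.message.Message) -> None:
    """Validate, transform, and insert one event; ack only after the commit succeeds."""
    event = json.loads(message.data)
    if not event.get("platform") or not event.get("event_type"):
        message.ack()  # drop malformed events (in practice these go to a quarantine table)
        return
    with conn.cursor() as cur:
        cur.execute(
            "INSERT INTO raw.events (platform, event_type, business_unit, payload, received_at) "
            "VALUES (%s, %s, %s, %s, %s)",
            (event["platform"], event["event_type"], event.get("business_unit"),
             json.dumps(event.get("payload", {})), event.get("received_at")),
        )
    conn.commit()
    message.ack()

subscriber = pubsub_v1.SubscriberClient()
streaming_pull = subscriber.subscribe(SUBSCRIPTION, callback=handle_message)
streaming_pull.result()  # block forever, processing events as they arrive
```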
PostgreSQL as the Foundation
All processed events land in a carefully designed PostgreSQL database with separate schemas for raw events, transformed data, and analytics views. We use materialized views extensively to pre-compute common aggregations and ensure sub-second query performance.
Key design decisions:
- Event Sourcing: We never delete or modify raw events, only append new ones
- Partitioning: Tables are partitioned by date and business unit for query performance
- Materialized Views: Pre-computed aggregations for common metrics (daily revenue, customer cohorts, attribution models)
- Indexing Strategy: Composite indexes on common query patterns (customer_id + timestamp, campaign_id + event_type)
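To illustrate those decisions, here's a simplified sketch of the kind of DDL involved, run from Python to match the other examples. The table, view, and column names are illustrative, and the production schema also sub-partitions by business unit:

```python
import psycopg2

DDL = """
CREATE SCHEMA IF NOT EXISTS raw;
CREATE SCHEMA IF NOT EXISTS analytics;

-- Raw events are append-only and partitioned by date for query performance.
CREATE TABLE IF NOT EXISTS raw.events (
    event_id      BIGSERIAL,
    platform      TEXT NOT NULL,
    event_type    TEXT NOT NULL,
    business_unit TEXT NOT NULL,
    payload       JSONB NOT NULL,
    received_at   TIMESTAMPTZ NOT NULL
) PARTITION BY RANGE (received_at);

-- Composite index matching a common query pattern.
CREATE INDEX IF NOT EXISTS idx_events_bu_time
    ON raw.events (business_unit, received_at);

-- Pre-computed daily revenue per business unit and channel.
CREATE MATERIALIZED VIEW IF NOT EXISTS analytics.daily_revenue AS
SELECT business_unit,
       payload ->> 'channel'                AS channel,
       date_trunc('day', received_at)::date AS day,
       SUM((payload ->> 'total')::numeric)  AS revenue
FROM raw.events
WHERE event_type = 'order_created'
GROUP BY 1, 2, 3;
"""

with psycopg2.connect("dbname=analytics user=etl") as conn, conn.cursor() as cur:
    cur.execute(DDL)
```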
Analytics and Visualization
With clean, real-time data flowing into PostgreSQL, we connected multiple BI platforms to serve different stakeholders:
- Grafana: Real-time operational dashboards for the engineering team
- Tableau: Executive reporting and deep-dive analysis for leadership
- Custom APIs: Direct database connections for our internal tools and applications
The materialized views are refreshed every 5 minutes, giving us near real-time visibility into key metrics like:
- Revenue by channel, campaign, and product
- Customer acquisition costs and lifetime value
- Attribution modeling across the entire customer journey
- Inventory levels and demand forecasting
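The refresh job itself is straightforward. Here's a sketch of the loop, assuming each view has the unique index that CONCURRENTLY requires; the view names are illustrative:

```python
import time

import psycopg2

# View names are illustrative; CONCURRENTLY requires each view to have a unique index.
VIEWS = ["analytics.daily_revenue", "analytics.customer_cohorts", "analytics.attribution"]

conn = psycopg2.connect("dbname=analytics user=etl")
conn.autocommit = True  # CONCURRENTLY refreshes cannot run inside a transaction block

while True:
    with conn.cursor() as cur:
        for view in VIEWS:
            # CONCURRENTLY keeps the view readable by dashboards during the refresh.
            cur.execute(f"REFRESH MATERIALIZED VIEW CONCURRENTLY {view};")
    time.sleep(300)  # every 5 minutes, matching the cadence described above
```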
Results and Impact
The impact of this architecture has been transformative:
- Revenue Growth: Enabled scaling from $100K to $400K daily revenue through better attribution and optimization
- Decision Speed: Executive team now has real-time visibility into business metrics
- Data Quality: Unified schema eliminated discrepancies between teams and reports
- Engineering Velocity: New integrations take hours instead of weeks to implement
- Cost Efficiency: Hybrid cloud/on-premises approach reduced data processing costs by 80%
Lessons Learned
Building this system taught me several important lessons about data architecture at scale:
1. Design for Change
The most valuable aspect of our webhook handler isn't the technology - it's the flexibility. By encoding metadata in URLs rather than hardcoding it in application logic, we can adapt to new platforms and requirements without code changes.
2. Embrace Hybrid Architectures
Pure cloud solutions aren't always optimal. Our hybrid approach gives us the scalability of cloud functions for ingestion with the cost control and flexibility of on-premises processing.
3. Materialized Views Are Your Friend
Pre-computing common aggregations through materialized views dramatically improved query performance and user experience. The trade-off in storage cost is minimal compared to the value of fast analytics.
4. Event Sourcing Pays Dividends
Maintaining immutable event logs has saved us countless times when debugging attribution issues or implementing new analytics models. The ability to replay events and recalculate metrics is invaluable.
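As a concrete example, replaying events to recompute a metric is just a query over the immutable raw table. A sketch with illustrative names:

```python
import psycopg2

# Recompute last-touch attribution from scratch by replaying the immutable raw events.
REPLAY_SQL = """
SELECT payload ->> 'customer_id'                            AS customer_id,
       (ARRAY_AGG(payload ->> 'utm_source'
                  ORDER BY received_at DESC))[1]            AS last_touch_source,
       COUNT(*) FILTER (WHERE event_type = 'order_created') AS orders
FROM raw.events
WHERE business_unit = %s
GROUP BY 1;
"""

with psycopg2.connect("dbname=analytics user=etl") as conn, conn.cursor() as cur:
    cur.execute(REPLAY_SQL, ("aroma360",))
    for customer_id, last_touch, orders in cur.fetchall():
        ...  # feed into a revised attribution model without touching the raw data
```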
What's Next
We're currently working on several enhancements to this architecture:
- Machine Learning Integration: Adding real-time ML models for fraud detection and personalization
- Data Mesh: Expanding the architecture to support additional business units and use cases
The foundation we've built has proven robust and scalable, handling everything from Black Friday traffic spikes to new product launches without missing a beat.
Want to discuss data architecture challenges or share your own experiences with event-driven systems? Drop me a line - I'd love to hear from you.