Apr 21, 2025

How to count 100B events: Comparing architectures

Reddit built a powerful architecture in 2017 to count views and unique viewers on posts. How does it compare to our simpler Tinybird approach?
Ariel Pérez
Head of Product & Technology

In the previous posts, we looked at how to build, optimize, and scale a simple view counter service to handle 100B+ views and a high cardinality of unique viewers. These posts were inspired by a 2017 Reddit architecture, which we simplified and optimized with Tinybird.

Now, let's compare the original Reddit architecture with our simpler Tinybird approach, focusing on the real costs, both obvious and hidden.

Setting the stage

To make an apples-to-apples comparison, let's establish some reasonable assumptions about the scale and usage patterns we're dealing with:

  1. Content Creation: 10,000 posts are created each month
  2. Viewing Patterns: Each post receives 10M views within that month
  3. Analytics Usage: View counts are only visible on Post Insights pages, accessed by post creators
  4. Infrastructure Scope: We'll start our comparison at the initial Kafka stream, assuming all upstream infrastructure (web servers, load balancers, etc.) remains the same

These assumptions help us create a realistic but simplified model of the system's requirements.


The numbers

Data volume

  • Total views: 100B/month (10K posts × 10M views)
  • Event size: ~100 bytes (JSON format including property names and structure)
  • Total raw data: ~10TB/month

Traffic patterns

  • Ingestion: 38.5K events/second (100B events ÷ 30 days ÷ 86,400 seconds)
  • Post Insights Usage:
    • Each creator (assume 10K creators) checks their post stats 2-3 times per day
    • Some viral posts get checked more frequently
    • Results in up to ~30K post insight checks/day
    • Average: 0.35 QPS
    • Peak: 2-5 QPS during high-activity periods
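
These headline numbers follow directly from the assumptions above; a quick back-of-the-envelope in Python (the constants are the post's assumptions, not measurements):

```python
# Scale assumptions from "Setting the stage"
POSTS_PER_MONTH = 10_000
VIEWS_PER_POST = 10_000_000
EVENT_BYTES = 100                  # ~100-byte JSON event
SECONDS_PER_MONTH = 30 * 86_400

total_views = POSTS_PER_MONTH * VIEWS_PER_POST           # 100B events/month
raw_bytes = total_views * EVENT_BYTES                    # ~10TB/month
events_per_second = total_views / SECONDS_PER_MONTH      # ~38.5K events/s
throughput_mib_s = events_per_second * EVENT_BYTES / 2**20  # ~3.67 MiB/s

insight_checks_per_day = 10_000 * 3                      # up to ~30K checks/day
avg_qps = insight_checks_per_day / 86_400                # ~0.35 QPS

print(f"{total_views:,} views, {raw_bytes / 1e12:.0f}TB raw")
print(f"{events_per_second:,.0f} events/s, {throughput_mib_s:.2f} MiB/s")
print(f"{avg_qps:.2f} avg QPS")
```

The same 3.67 MiB/s figure reappears later as the Redpanda write and read throughput.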

The Architectures

Reddit's original architecture

Each component serves a specific purpose:

  • Kafka handles event ingestion and routing
  • Consumer services validate events and maintain counts
  • Redis maintains real-time counts
  • Cassandra stores the data permanently

This is, of course, the "right" way to build a system at scale. After all, it follows all the classic big data architecture patterns:

  • Multiple specialized data stores? Check.
  • Event streaming with Kafka? Check.
  • Separate processing stages? Check.
  • Caching layer? Check.
  • Durable storage? Check.

This is exactly what tech blogs, conference talks, and architecture diagrams have been teaching us for years. It's almost like a rite of passage: you haven't really built at scale until you've connected Kafka, Redis, and some form of durable storage together with multiple processing services, right?

Or... maybe not. What if we could take that same raw_view_events Kafka topic and just process it directly? No need for additional topics, multiple processing stages, or complex recovery processes. Just stream the data into a system designed to handle both real-time ingestion and serving at scale.

The Tinybird approach

One system handles everything:

  • Direct ingestion from the Kafka topic
  • Real-time processing and counting
  • Efficient storage with 17.5:1 columnar compression
  • Sub-25ms API responses

That's it. That's the entire architecture.
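
Conceptually, the consolidated design collapses ingestion, aggregation, and serving into one place. A toy sketch of that loop in Python (purely illustrative; Tinybird implements this with SQL over columnar storage, not an in-process dict, and the field names are made up):

```python
from collections import defaultdict

# One store shared by ingestion and serving: no hand-off between systems.
view_counts: defaultdict[str, int] = defaultdict(int)

def ingest(event: dict) -> None:
    """Real-time processing: update the aggregate as each event arrives."""
    view_counts[event["post_id"]] += 1

def views_endpoint(post_id: str) -> dict:
    """The 'API response': a lookup against the same store."""
    return {"post_id": post_id, "views": view_counts[post_id]}

for e in [{"post_id": "p1"}, {"post_id": "p1"}, {"post_id": "p2"}]:
    ingest(e)

print(views_endpoint("p1"))  # {'post_id': 'p1', 'views': 2}
```

The point is the shape, not the code: one path from the Kafka topic to the query result, with no intermediate topics or recovery choreography.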

Data processing & storage costs*

*I tried to select the most cost-effective versions or equivalents of these services.

Reddit Architecture

| Component | Monthly Cost | Notes |
|---|---|---|
| Consumer Services | $186.00 | |
| EC2 c4.2xlarge (spot) | $113.00 | 8 vCPUs, 15 GiB RAM |
| EC2 c4.large (on-demand) | $73.00 | 2 vCPUs, 3.75 GiB RAM |
| AWS ElastiCache (Redis) | $28.50 | |
| Storage (120MB) | $10.80 | $0.125/GB-hour |
| Processing (ECPUs) | $17.70 | 185.82M ECPUs/month |
| Cassandra (DataStax Astra) | $161.10 | |
| Storage (120MB) | $0.03 | $0.25/GB/month |
| Daily restores: reads (300K) | $0.11 | $0.37/1M reads |
| Daily restores: writes (300K) | $0.19 | $0.62/1M writes |
| 10s backups: writes (259.2M) | $160.70 | 8,640 backup cycles/day × 30 days |
| Data Transfer | $0.07 | $0.02/GB for ~3.6GB/month |
| Redpanda Serverless | $893.95 | |
| Instance Hours | $73.00 | $0.10/hour |
| Partitions (1) | $1.10 | $0.0015/partition/hour |
| Write (3.67MB/s) | $434.02 | $0.045/GB |
| Read (3.67MB/s) | $385.79 | $0.04/GB |
| Storage (1 day) | $0.04 | $0.00012/GB |
| Total Processing | $1,268.55 | |

Processing calculation breakdown

  1. Consumer Services:
  • Single EC2 c4.2xlarge spot instance for the consumer groups
    • Runs both consumer groups (Nazar and Abacus)
    • Processes events and maintains counts
    • Spot pricing reduces costs but requires handling instance termination
  • Single EC2 c4.large on-demand instance for the API service
    • Serves Post Insights API requests
    • Can run two VMs for redundancy
    • Handles the peak of 2-5 QPS easily
  2. Redis Operations:
  • Daily restore: 10K reads from Cassandra, 10K writes to Redis (an ElastiCache Redis node can be expected to restart about once a day on average)
  • Every 10 seconds: read all 10K HLLs, write them to Cassandra
  • Post Insights API: peak 5 QPS = 432K reads/day
  • Total monthly ECPUs:
    • Restore/backup: 172.82M ECPUs
    • Post Insights reads: 13M ECPUs
    • Total: 185.82M ECPUs at $0.0034/M
  3. Cassandra Operations:
  • Daily: 10K reads, 10K writes for Redis restores
  • Every 10 seconds: 10K writes for HLL backups
  • Monthly writes: 259.2M
  4. Redpanda Operations:
  • Event size: 100 bytes
  • Events per second: 38.5K
  • One topic: processed_view_events (raw_view_events is excluded because it exists in both architectures)
  • Throughput: 3.67MB/s
  • Total write: 3.67MB/s (to processed_view_events)
  • Total read: 3.67MB/s (from processed_view_events)
  • Monthly data processed: ~10TB
  • Minimal retention (1 day) to keep storage costs down
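
The backup loop above revolves around HyperLogLog (HLL) sketches: one fixed-size structure per post that estimates unique viewers without storing every viewer ID. A minimal, simplified HLL in Python shows the core idea (an illustrative sketch, not the Redis implementation; the register count and hash choice are arbitrary):

```python
import hashlib
import math

class MiniHLL:
    """Simplified HyperLogLog: m = 2^p fixed-size registers estimate
    the number of distinct items added, however many that is."""

    def __init__(self, p: int = 12):
        self.p = p
        self.m = 1 << p                # 4096 registers for p=12
        self.registers = [0] * self.m

    def add(self, value: str) -> None:
        # 64-bit hash: low p bits pick a register, the rest feed the rank
        h = int.from_bytes(hashlib.sha1(value.encode()).digest()[:8], "big")
        idx = h & (self.m - 1)
        w = h >> self.p
        rank = (64 - self.p) - w.bit_length() + 1   # leading zeros + 1
        self.registers[idx] = max(self.registers[idx], rank)

    def count(self) -> int:
        alpha = 0.7213 / (1 + 1.079 / self.m)
        estimate = alpha * self.m * self.m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if estimate <= 2.5 * self.m and zeros:      # small-range correction
            estimate = self.m * math.log(self.m / zeros)
        return int(estimate)

hll = MiniHLL()
for i in range(50_000):
    hll.add(f"viewer-{i}")
print(hll.count())  # ≈ 50,000, typically within a few percent
```

The memory win is the point: 4,096 one-byte registers track millions of distinct viewers, which is why 10K of these can be read and backed up every 10 seconds.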

Tinybird architecture (S1 Plan)

| Component | Monthly Cost | Notes |
|---|---|---|
| Base Plan (S1) | $99.00 | Includes 600 vCPU hours, 25GB storage, 25 QPS |
| Additional Storage | $31.67 | 546GB additional at $0.058/GB |
| Additional Compute | $21.06 | 130 additional vCPU hours at $0.162/hour |
| Total Processing | $151.73 | |

Compute resource analysis

  • 1 vCPU running constantly (730 hours/month)
  • Handles both ingestion (38.5K events/second) and serving (2-5 QPS peak)
  • Base plan includes 600 hours
  • Only need 130 additional hours
  • No QPS overage charges (well within 25 QPS included)
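
The overage math above can be reproduced from the plan limits and the 17.5:1 compression ratio; a quick sketch (prices and limits are the figures quoted in the post):

```python
# S1 plan inclusions and overage rates quoted in the post
BASE_PLAN = 99.00
INCLUDED_VCPU_HOURS = 600
INCLUDED_STORAGE_GB = 25
STORAGE_RATE = 0.058            # $/GB/month
COMPUTE_RATE = 0.162            # $/vCPU-hour

HOURS_PER_MONTH = 730           # 1 vCPU running constantly
RAW_TB = 10                     # ~10TB/month of raw events
COMPRESSION = 17.5              # 17.5:1 columnar compression

stored_gb = RAW_TB * 1000 / COMPRESSION                       # ~571 GB on disk
extra_storage_gb = round(stored_gb - INCLUDED_STORAGE_GB)     # ~546 GB over
extra_hours = HOURS_PER_MONTH - INCLUDED_VCPU_HOURS           # 130 hours over

total = (BASE_PLAN
         + extra_storage_gb * STORAGE_RATE
         + extra_hours * COMPUTE_RATE)
print(f"${total:.2f}")  # $151.73
```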

Operational costs

Here's where hidden costs become apparent. The Reddit architecture requires a team to maintain multiple systems, each with its own operational complexities. You typically need:

  • Kafka specialists
  • Redis experts
  • Cassandra administrators
  • Integration specialists

The Tinybird architecture, being consolidated, typically requires just a single developer who understands SQL.

Support & tooling requirements

| Architecture | Additional Tools Needed |
|---|---|
| Reddit | Overall monitoring and observability service; log aggregation service; APM tools; Kafka monitoring; CloudWatch; Cassandra monitoring |
| Tinybird | Basic monitoring and observability service |

Total Cost Comparison

Reddit Architecture Total

| Category | Monthly Cost | Notes |
|---|---|---|
| Data Processing & Serving | $1,268.55 | Multiple specialized systems |
| Infrastructure Support | Varies | Monitoring and tooling listed above |
| Total Infrastructure | $1,268.55+ | |

Add to this several full-time engineers and specialists to operate and maintain it.

Tinybird Architecture Total

| Category | Monthly Cost | Notes |
|---|---|---|
| Data Processing & Serving | $151.73 | S1 plan + overages |
| Infrastructure Support | Varies | Basic monitoring and tools |
| Total Infrastructure | $151.73 | |

Add to this a single developer to operate and maintain it.

Performance Characteristics

The Tinybird S1 plan handles this workload efficiently:

  • Ingestion: Constant 38.5K events/second (well within capacity)
  • Query latency: 20-40ms for view count queries
  • Autoscaling: Can burst to 2 vCPU if needed

Architecture Trade-offs

While both architectures can handle the required scale, they differ significantly in complexity and operational overhead:

Development complexity

  • Reddit Approach: Requires coordinating multiple services, managing state across systems, and handling complex failure scenarios
  • Tinybird Approach: Single system to manage, SQL-based transformations, simple API endpoints

Operational overhead

  • Reddit Approach:
    • Multiple systems to monitor and maintain
    • Complex recovery procedures
    • Multiple points of failure to manage
    • Specialized expertise needed for each component
  • Tinybird Approach:
    • Single system to monitor
    • Built-in recovery mechanisms
    • Consolidated logging and monitoring
    • SQL knowledge is the main requirement

Time to value

The architectural differences directly impact development speed:

Initial setup

  • Reddit Approach: Weeks to set up and coordinate multiple services, configure monitoring, and establish operational procedures
  • Tinybird Approach: Hours to create data sources, write transformations, and deploy API endpoints

New feature development

  • Reddit Approach: Days to implement changes across multiple services
  • Tinybird Approach: Minutes to modify transformations and update endpoints

For example, adding a new aggregation (like counting views by country) requires:

  • Reddit Approach: Modifying consumer logic, updating Redis storage, changing Cassandra schema, and updating API services
  • Tinybird Approach: Adding a single SQL transformation and publishing a new endpoint
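
For illustration, the logic of that single transformation, counting views per post and country, amounts to a GROUP BY; here is the equivalent in plain Python (the field names are hypothetical, not Reddit's actual event schema):

```python
from collections import Counter

# Hypothetical view events as they might arrive on the Kafka topic
events = [
    {"post_id": "p1", "country": "US"},
    {"post_id": "p1", "country": "DE"},
    {"post_id": "p1", "country": "US"},
    {"post_id": "p2", "country": "FR"},
]

# The new aggregation: views per (post, country) pair.
# In SQL this is one GROUP BY post_id, country with a count().
views_by_country = Counter((e["post_id"], e["country"]) for e in events)

print(views_by_country[("p1", "US")])  # 2
```

In the Reddit architecture the same change would touch the consumer code, the Redis key layout, the Cassandra schema, and the API service; here it is one new aggregation over data already flowing in.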

The Bottom Line

The real difference isn't just in the raw numbers; it's in the operational simplicity. The Reddit architecture, while functional, requires managing multiple specialized systems and the team to support them. The Tinybird approach consolidates everything after Kafka into a single platform, dramatically reducing infrastructure, development, and personnel costs.

More importantly, this simplification means:

  • Faster feature development
  • Easier troubleshooting
  • Lower training costs
  • Reduced operational risk
  • Better cost predictability

The choice between these architectures isn't just about monthly bills—it's about how much of your development team's time you want spent maintaining infrastructure versus building features that matter to your users.

Want to try this yourself? Check out the Tinybird documentation to get started.
