In the previous posts, we looked at how to build, optimize, and scale a simple view counter service to be able to handle 100B+ views and a high cardinality of unique viewers. These posts were inspired by a 2017 Reddit architecture, which we simplified and optimized with Tinybird.
Now, let's compare the original Reddit architecture with our simpler Tinybird approach, focusing on the real costs - both obvious and hidden.
Setting the stage
To make an apples-to-apples comparison, let's establish some reasonable assumptions about the scale and usage patterns we're dealing with:
- Content Creation: 10,000 posts are created each month
- Viewing Patterns: Each post receives 10M views within that month
- Analytics Usage: View counts are only visible on Post Insights pages, accessed by post creators
- Infrastructure Scope: We'll start our comparison at the initial Kafka stream, assuming all upstream infrastructure (web servers, load balancers, etc.) remains the same
These assumptions help us create a realistic but simplified model of the system's requirements.
The numbers
Data volume
- Total views: 100B/month (10K posts × 10M views)
- Event size: ~100 bytes (JSON format including property names and structure)
- Total raw data: ~10TB/month
Traffic patterns
- Ingestion: 38.5K events/second (100B events ÷ 30 days ÷ 86,400 seconds)
- Post Insights Usage:
- Each creator (assume 10K creators) checks their post stats 2-3 times per day
- Some viral posts get checked more frequently
- Results in ~30K post insight checks/day
- Average: 0.35 QPS
- Peak: 2-5 QPS during high-activity periods
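The workload numbers above fall out of a few multiplications. A quick sanity check in Python, using only the assumptions stated in this post (10K posts, 10M views each, ~100-byte events):

```python
# Back-of-envelope check of the workload numbers above.
POSTS_PER_MONTH = 10_000
VIEWS_PER_POST = 10_000_000
EVENT_SIZE_BYTES = 100
SECONDS_PER_MONTH = 30 * 86_400

total_views = POSTS_PER_MONTH * VIEWS_PER_POST        # 100B views/month
raw_bytes = total_views * EVENT_SIZE_BYTES            # ~10 TB/month
ingest_eps = total_views / SECONDS_PER_MONTH          # ~38.5K events/s

insight_checks_per_day = POSTS_PER_MONTH * 3          # ~30K checks/day
avg_qps = insight_checks_per_day / 86_400             # ~0.35 QPS

print(f"{total_views:,} views/month")
print(f"{raw_bytes / 1e12:.1f} TB raw/month")
print(f"{ingest_eps / 1e3:.1f}K events/s, {avg_qps:.2f} avg QPS")
```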
The Architectures
Reddit's original architecture
Each component serves a specific purpose:
- Kafka handles event ingestion and routing
- Consumer services validate events and maintain counts
- Redis maintains real-time counts
- Cassandra stores the data permanently
This is, of course, the "right" way to build a system at scale. After all, it follows all the classic big data architecture patterns:
- Multiple specialized data stores? Check.
- Event streaming with Kafka? Check.
- Separate processing stages? Check.
- Caching layer? Check.
- Durable storage? Check.
This is exactly what tech blogs, conference talks, and architecture diagrams have been teaching us for years. It's almost like a rite of passage - you haven't really built a system at scale until you've connected Kafka, Redis, and some form of durable storage together with multiple processing services, right?
Or... maybe not. What if we could take that same raw_view_events Kafka topic and just process it directly? No need for additional topics, multiple processing stages, or complex recovery processes. Just stream the data into a system designed to handle both real-time ingestion and serving at scale.
The Tinybird approach
One system handles everything:
- Direct ingestion from the Kafka topic
- Real-time processing and counting
- Efficient storage with 17.5:1 columnar compression
- Sub-25ms API responses
That's it. That's the entire architecture.
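To make the serving side concrete, here is a minimal sketch of how a Post Insights page could fetch a count: the view count is published as a Tinybird pipe endpoint and queried over plain HTTPS. The pipe name (`post_views`), the `post_id` parameter, and the response field are hypothetical, not taken from the earlier posts.

```python
# Hypothetical sketch: building the URL for a published Tinybird pipe endpoint.
# Pipe name and parameter names are illustrative assumptions.

def insights_url(host: str, pipe: str, post_id: str, token: str) -> str:
    """Build the URL a Post Insights page would call for one post's count."""
    return f"https://{host}/v0/pipes/{pipe}.json?post_id={post_id}&token={token}"

url = insights_url("api.tinybird.co", "post_views", "abc123", "<READ_TOKEN>")
# At runtime the page would GET this URL and read the count from the
# JSON response body (field names depend on the pipe's SQL).
```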
Data processing & storage costs*
*I tried to select the most cost-effective versions or equivalents of these services.
Reddit Architecture
Component | Monthly Cost | Notes |
---|---|---|
Consumer Services | $186.00 | |
EC2 c4.2xlarge (spot) | $113.00 | 8 vCPUs, 15 GiB RAM |
EC2 c4.large (on-demand) | $73.00 | 2 vCPUs, 3.75 GiB RAM |
AWS Elasticache (Redis) | $28.50 | |
Storage (120MB) | $10.80 | $0.125/GB-hour |
Processing (ECPUs) | $17.70 | 185.82M ECPUs/month |
Cassandra (Datastax Astra) | $161.10 | |
Storage (120MB) | $0.03 | $0.25/GB/month |
Daily Restores | ||
- Reads (300K) | $0.11 | $0.37/1M reads |
- Writes (300K) | $0.19 | $0.62/1M writes |
10s Backups | ||
- Writes (259.2M) | $160.70 | $0.62/1M writes |
Data Transfer | $0.07 | $0.02/GB for ~3.6GB/month |
Redpanda Serverless | $893.95 | |
Instance Hours | $73.00 | $0.10/hour |
Partitions (1) | $1.10 | $0.0015/partition/hour |
Write (3.67MB/s) | $434.02 | $0.045/GB |
Read (3.67MB/s) | $385.79 | $0.04/GB |
Storage (1 day) | $0.04 | $0.00012/GB |
Total Processing | $1,269.55 |
Processing calculation breakdown
- Consumer Services:
- Single EC2 c4.2xlarge spot instance for consumer groups
- Runs both consumer groups (Nazar and Abacus)
- Processes events and maintains counts
- Spot pricing reduces costs but requires handling instance termination
- Single EC2 c4.large on-demand instance for API service
- Serves Post Insights API requests
- Could run a second VM for redundancy
- Handles peak of 2-5 QPS efficiently
- Redis Operations:
- Daily restore: 10K reads from Cassandra, 10K writes to Redis (Elasticache Redis can be expected to restart once a day on average)
- Every 10 seconds: Read all 10K HLLs, write to Cassandra
- Post Insights API: Peak 5 QPS = 432K reads/day
- Total monthly ECPUs:
- Restore/backup: 172.82M ECPUs
- Post Insights reads: 13M ECPUs
- Total: 185.82M ECPUs at $0.0034/M
- Cassandra Operations:
- Daily: 10K reads, 10K writes for Redis restores
- Every 10 seconds: 10K writes for HLL backups
- Monthly writes: 259.2M
- Redpanda Operations:
- Event size: 100 bytes
- Events per second: 38.5K
- One topic: processed_view_events (not including raw_view_events because it's in both architectures)
- Throughput: 3.67MB/s
- Total write: 3.67MB/s (writing to processed_view_events)
- Total read: 3.67MB/s (reading processed_view_events)
- Monthly data processed: ~10TB
- Minimal retention (1 day) to optimize costs
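The Redpanda line items in the table can be reproduced from the stated rates. A short sketch, assuming a 730-hour billing month (which is what the table's figures imply) and the 3.67 MB/s throughput above:

```python
# Reproducing the Redpanda Serverless line items from the stated rates.
HOURS_PER_MONTH = 730        # billing month implied by the table's figures
THROUGHPUT_MB_S = 3.67       # ~38.5K events/s x ~100 bytes

monthly_gb = THROUGHPUT_MB_S * 3600 * HOURS_PER_MONTH / 1000  # ~9.6 TB

instance = 0.10 * HOURS_PER_MONTH          # $0.10/hour
partitions = 0.0015 * HOURS_PER_MONTH      # 1 partition at $0.0015/hour
write = monthly_gb * 0.045                 # $0.045/GB written
read = monthly_gb * 0.04                   # $0.04/GB read

total = instance + partitions + write + read
print(f"write=${write:.2f} read=${read:.2f} total=${total:.2f}")
```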
Tinybird architecture (S1 Plan)
Component | Monthly Cost | Notes |
---|---|---|
Base Plan (S1) | $99.00 | Includes 600 vCPU hours, 25GB storage, 25 QPS |
Additional Storage | $31.67 | 546GB additional at $0.058/GB |
Additional Compute | $21.06 | 130 additional vCPU hours at $0.162/hour |
Total Processing | $151.73 |
Compute resource analysis
- 1 vCPU running constantly (730 hours/month)
- Handles both ingestion (38.5K events/second) and serving (2-5 QPS peak)
- Base plan includes 600 hours
- Only need 130 additional hours
- No QPS overage charges (well within 25 QPS included)
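The Tinybird bill is simple enough to model in a few lines. A sketch using the figures above (10 TB/month raw, the observed 17.5:1 compression, and one vCPU running all month); small rounding differences against the table are expected:

```python
# Tinybird S1 cost model from the figures above.
BASE_PLAN = 99.00            # S1: 600 vCPU hours, 25 GB storage, 25 QPS
INCLUDED_VCPU_HOURS = 600
INCLUDED_STORAGE_GB = 25

raw_gb_per_month = 10_000    # ~10 TB of raw events
COMPRESSION_RATIO = 17.5     # observed columnar compression

stored_gb = raw_gb_per_month / COMPRESSION_RATIO          # ~571 GB on disk
extra_storage = (stored_gb - INCLUDED_STORAGE_GB) * 0.058  # $0.058/GB
extra_compute = (730 - INCLUDED_VCPU_HOURS) * 0.162        # 1 vCPU, all month

total = BASE_PLAN + extra_storage + extra_compute
print(f"storage=${extra_storage:.2f} compute=${extra_compute:.2f} total=${total:.2f}")
```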
Operational costs
Here's where hidden costs become apparent. The Reddit architecture requires a team to maintain multiple systems, each with its own operational complexities. You typically need:
- Kafka specialists
- Redis experts
- Cassandra administrators
- Integration specialists
The Tinybird architecture, being consolidated, typically requires just a single developer who understands SQL.
Support & tooling requirements
Architecture | Additional Tools Needed |
---|---|
Reddit | Overall monitoring and observability service, log aggregation service, APM tools, Kafka monitoring, CloudWatch, Cassandra monitoring |
Tinybird | Basic monitoring and observability service |
Total Cost Comparison
Reddit Architecture Total
Category | Monthly Cost | Notes |
---|---|---|
Data Processing & Serving | $1,269.55 | Multiple specialized systems |
Infrastructure Support | Varied | Multiple monitoring and observability tools |
Total Infrastructure | $1,269.55+ |
Add to this several full-time engineers and specialists to operate and maintain it.
Tinybird Architecture Total
Category | Monthly Cost | Notes |
---|---|---|
Data Processing & Serving | $151.73 | S1 plan + overages |
Infrastructure Support | Varied | Basic monitoring and tools |
Total Infrastructure | $151.73 |
Add to this a single developer to operate and maintain it.
Performance Characteristics
The Tinybird S1 plan handles this workload efficiently:
- Ingestion: Constant 38.5K events/second (well within capacity)
- Query latency: 20-40ms for view count queries
- Autoscaling: Can burst to 2 vCPU if needed
Architecture Trade-offs
While both architectures can handle the required scale, they differ significantly in complexity and operational overhead:
Development complexity
- Reddit Approach: Requires coordinating multiple services, managing state across systems, and handling complex failure scenarios
- Tinybird Approach: Single system to manage, SQL-based transformations, simple API endpoints
Operational overhead
- Reddit Approach:
- Multiple systems to monitor and maintain
- Complex recovery procedures
- Multiple points of failure to manage
- Specialized expertise needed for each component
- Tinybird Approach:
- Single system to monitor
- Built-in recovery mechanisms
- Consolidated logging and monitoring
- SQL knowledge is the main requirement
Time to value
The architectural differences directly impact development speed:
Initial setup
- Reddit Approach: Weeks to set up and coordinate multiple services, configure monitoring, and establish operational procedures
- Tinybird Approach: Hours to create data sources, write transformations, and deploy API endpoints
New feature development
- Reddit Approach: Days to implement changes across multiple services
- Tinybird Approach: Minutes to modify transformations and update endpoints
For example, adding a new aggregation (like counting views by country) requires:
- Reddit Approach: Modifying consumer logic, updating Redis storage, changing Cassandra schema, and updating API services
- Tinybird Approach: Adding a single SQL transformation and publishing a new endpoint
The Bottom Line
The real difference isn't just in the raw numbers; it's in the operational simplicity. The Reddit architecture, while functional, requires managing multiple specialized systems and the team to support them. The Tinybird approach consolidates everything after Kafka into a single platform, dramatically reducing infrastructure, development, and personnel costs.
More importantly, this simplification means:
- Faster feature development
- Easier troubleshooting
- Lower training costs
- Reduced operational risk
- Better cost predictability
The choice between these architectures isn't just about monthly bills—it's about how much of your development team's time you want spent maintaining infrastructure versus building features that matter to your users.
Want to try this yourself? Check out the Tinybird documentation to get started.