Dec 03, 2024

Building Real-Time Live Sports Viewer Analytics with Tinybird and AWS

Ariel Pérez
Field CTO

Ever tried to show millions of viewers real-time stats about how many other people like them are watching the same event? It's a bit like trying to count grains of sand while they're being poured into your bucket. Fun times! Let's look at how to build this without breaking the bank (or your sanity).

The Challenge: Fan Engagement at Massive Scale

Imagine you're streaming a major live event and want to show each viewer some engaging stats:

  • How many people in their state are watching?
  • How many fans of their team are tuned in?
  • What's the total viewer count in their country?
  • What's the global audience size?

Sounds simple? Well, here's the catch - you need to handle:

  • 3.3M concurrent viewers
  • 350,000 events/second
  • 17 GB/minute of incoming data
  • A globally distributed audience
  • And keep it all fresh within an average of 5 seconds (we're optimizing for cost here!)
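To get a feel for what those numbers imply per event and per viewer, here is a quick back-of-the-envelope sketch. It only uses the figures from the list above; the one assumption is that "GB" means decimal gigabytes.

```python
# Back-of-the-envelope sizing from the workload figures listed above.
EVENTS_PER_SECOND = 350_000
GB_PER_MINUTE = 17
CONCURRENT_VIEWERS = 3_300_000
BYTES_PER_GB = 1_000_000_000  # assumption: decimal gigabytes

# Average payload size of a single event.
bytes_per_second = GB_PER_MINUTE * BYTES_PER_GB / 60
avg_event_bytes = bytes_per_second / EVENTS_PER_SECOND

# How often a single viewer emits an event, on average.
seconds_per_event_per_viewer = CONCURRENT_VIEWERS / EVENTS_PER_SECOND

print(f"Average event size: {avg_event_bytes:.0f} bytes")
print(f"One event per viewer every {seconds_per_event_per_viewer:.1f} s")
```

In other words, each event is a small JSON-sized payload, and the volume comes from the number of viewers rather than from chatty clients.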

Drawing inspiration from how FOX Sports built their near real-time internal analytics at massive scale, we're going to take it a step further. While their architecture excelled at delivering internal BI analytics, we want to extend it to power real-time viewer segmentation and engagement features for millions of concurrent viewers. We'll show you how to build upon their robust foundation to create engaging, personalized experiences.

The AWS Well-Architected Solution

First, let's look at how you might build this with AWS services. It follows AWS Well-Architected principles to ensure reliability and scale:

An architecture that separates concerns for reliability and scale while keeping costs in check

Component Design & Optimization

The architecture is carefully tuned to balance performance, reliability, and cost. Each choice introduces additional complexity but serves a specific performance or cost goal:

Infrastructure Choices

  1. Kinesis Data Streams with on-demand capacity handles unpredictable traffic patterns with 1ms ingestion latency
  2. Lightweight Lambda functions (128MB) provide 100ms processing time for JSON handling while keeping costs low
  3. CloudFront Functions reduce latency to single-digit milliseconds for JWT validation compared to Lambda@Edge
  4. HTTP API instead of REST API cuts request latency by 60% for simple authentication needs

Request Optimization

  1. JWT validation at CloudFront enables request coalescing, critical for handling traffic spikes
  2. Reduces backend load from millions to thousands of requests per second
  3. Without this optimization, DynamoDB costs would be 3-4x higher with potential throughput issues
  4. Each layer adds complexity but is necessary for cost management at scale

Storage Strategy

  1. Hot path: DynamoDB delivers consistent sub-10ms reads for real-time data
  2. CloudFront provides low double-digit millisecond latencies via globally distributed PoPs
  3. Cold path: S3 with Parquet format (7:1 compression ratio) optimizes for analytical queries
  4. Geographic partitioning reduces Athena query costs by up to 90% for location-based analytics
  5. Each storage tier requires different access patterns and maintenance strategies

Operational Efficiency

  1. 4-hour runtime window for streaming components saves 80% on Kinesis costs
  2. 30-day data retention balances analysis needs with storage costs
  3. 10-second cache TTL cuts backend requests by 95% while maintaining reasonable freshness
  4. Each optimization requires careful monitoring and adjustment
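To see why a short cache TTL takes so much load off the backend, here is a minimal in-process sketch. In the architecture above CloudFront plays this role; the `TTLCache` class, the key name, and the request rate below are all hypothetical and only illustrate the mechanism. The actual reduction depends on how many distinct keys viewers ask for.

```python
from typing import Callable, Dict, List, Tuple


class TTLCache:
    """Tiny TTL cache: at most one backend fetch per key per TTL window."""

    def __init__(self, fetch: Callable[[str], int], ttl_seconds: float,
                 clock: Callable[[], float]) -> None:
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, int]] = {}  # key -> (expires_at, value)
        self.backend_calls = 0

    def get(self, key: str) -> int:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now < entry[0]:
            return entry[1]  # still fresh: served from cache
        self.backend_calls += 1  # expired or missing: hit the backend once
        value = self._fetch(key)
        self._entries[key] = (now + self._ttl, value)
        return value


def simulate(requests_per_second: int, seconds: int, ttl_seconds: float) -> Tuple[int, int]:
    """Return (viewer requests, backend fetches) for a single hot key."""
    fake_now: List[float] = [0.0]  # mutable fake clock, advanced by the loop
    cache = TTLCache(fetch=lambda key: 42, ttl_seconds=ttl_seconds,
                     clock=lambda: fake_now[0])
    total = 0
    for second in range(seconds):
        fake_now[0] = float(second)
        for _ in range(requests_per_second):
            cache.get("viewers:state:CA")  # hypothetical cache key
            total += 1
    return total, cache.backend_calls


total_requests, backend_fetches = simulate(requests_per_second=1_000, seconds=60, ttl_seconds=10)
print(f"{total_requests} viewer requests -> {backend_fetches} backend fetches")
```

The trade-off is the one the list describes: every extra second of TTL is a second of staleness viewers may see.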

Data Flow

  1. User actions hit Kinesis Data Streams (4 of them!)
  2. Kinesis Analytics does the real-time number crunching
  3. Then things get interesting:
    - Need instant stats? That's the DynamoDB path
    - Building dashboards? Off to RDS Postgres and JSON in S3 + Athena you go
    - Want historical analysis? Parquet on S3 and Athena have your back
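The "real-time number crunching" step is essentially windowed aggregation. Here is a rough Python stand-in for that idea, not the Kinesis Analytics application itself: it counts distinct viewers per state in tumbling windows. The event shape `(timestamp, viewer_id, state)` and the 1-second window are assumptions for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

# Hypothetical event shape: (epoch_seconds, viewer_id, state)
Event = Tuple[float, str, str]


def tumbling_viewer_counts(events: Iterable[Event],
                           window_seconds: int = 1) -> Dict[Tuple[int, str], int]:
    """Count distinct viewers per (window start, state)."""
    viewers: Dict[Tuple[int, str], Set[str]] = defaultdict(set)
    for timestamp, viewer_id, state in events:
        # Align the timestamp to the start of its tumbling window.
        window_start = int(timestamp // window_seconds) * window_seconds
        viewers[(window_start, state)].add(viewer_id)
    return {key: len(ids) for key, ids in viewers.items()}


sample_events = [
    (100.1, "v1", "CA"),
    (100.4, "v2", "CA"),
    (100.9, "v1", "CA"),  # same viewer in the same window: counted once
    (100.5, "v3", "NY"),
    (101.2, "v2", "CA"),
]
window_counts = tumbling_viewer_counts(sample_events)
for (window_start, state), count in sorted(window_counts.items()):
    print(window_start, state, count)
```

Each downstream path (DynamoDB, Postgres, S3) then receives these small per-window aggregates instead of the raw event firehose.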

By the Numbers

Here's what you can expect performance-wise:

Near Real-time Stats (User App Path):

  • Freshness: 1-11 seconds
  • Query Response: 26-103 milliseconds (thank you, caching!)

Streaming Analytics (BI Tool Path):

  • Freshness: 1 minute
  • Query Response: 505ms

Batch Analytics Path (BI Tool Path):

  • Freshness: 1 minute
  • Query Response: 2.5 seconds

Ad-Hoc Analytics Path:

  • Freshness: ~6 minutes
  • Query Response: 35 seconds

And it'll cost you about $1,592.29 for a four-hour event (we'll break down those numbers in detail later).

The Tinybird Approach

Remember that complex AWS architecture we just looked at? Here's the same thing with Tinybird:

Not a typo - it's really that simple! And here's the kicker:

Performance Characteristics

  • Consistent performance whether you're querying last minute's or last month's data

Near Real-time Stats (User App Path):

  • Freshness: 2-3 seconds
  • Query Response: 10-25 milliseconds

Optimized Aggregate Analytics (BI Tool Path):

  • Freshness: 3 seconds
  • Query Response: 1 second

Ad-Hoc Analytics (BI Tool Path):

  • Freshness: 2 seconds
  • Query Response: 6 seconds

Operational Benefits and Cost Efficiency

Simplified Total Cost of Ownership

  1. No separate paths needed for real-time vs historical data
  2. Fewer moving parts means fewer things to break
  3. Simpler architecture makes troubleshooting a breeze
  4. When something goes wrong, you're not playing detective across multiple services

Rapid Development and Iteration

  1. Need a new feature? Just write SQL
  2. Want to transform data differently? SQL
  3. Need to expose a new API endpoint? You guessed it - SQL

Streamlined Feature Development

Want to add new analytics? Let's see how it works in both approaches:

AWS Approach:

  • Modify Kinesis Analytics application
  • Update Lambda functions
  • Add new DynamoDB tables/indexes
  • Update the streaming aggregates in Postgres
  • Modify the batch processing pipeline
  • Test each component separately
  • Hope everything still works together

Tinybird Approach:

Write the new query in SQL, then publish it as an API endpoint. Done!
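Once an endpoint is published, clients read it over plain HTTP. Here is a minimal sketch of what a call might look like, assuming a hypothetical pipe named `viewers_by_state` with a `state` parameter, a placeholder read token, and the default `api.tinybird.co` host (your region's host may differ). The request is built but not sent, so nothing here touches the network.

```python
from urllib.parse import urlencode
from urllib.request import Request

TINYBIRD_HOST = "https://api.tinybird.co"  # assumption: host varies by region


def build_endpoint_request(pipe_name: str, token: str, **params: str) -> Request:
    """Build (but do not send) a GET request for a published pipe endpoint."""
    url = f"{TINYBIRD_HOST}/v0/pipes/{pipe_name}.json"
    query = urlencode(params)
    if query:
        url = f"{url}?{query}"
    return Request(url, headers={"Authorization": f"Bearer {token}"}, method="GET")


# "viewers_by_state" and "<READ_TOKEN>" are placeholders, not real resources.
endpoint_request = build_endpoint_request("viewers_by_state", token="<READ_TOKEN>", state="CA")
print(endpoint_request.full_url)
# To execute it, pass the request to urllib.request.urlopen() and parse the JSON body.
```

The point is that the client side stays this small no matter how the SQL behind the endpoint evolves.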

How much does it cost?

Total Live Event Cost: $1,270.91

That’s 20% less than the AWS Well-Architected Solution! To be fair, all the costs are estimates based on the expected workloads and the current published prices for both AWS and Tinybird, but even within the margin of error, the Tinybird implementation puts less strain on your budget. Run this for an entire month and the differences are even more dramatic.
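If you want to check the percentage yourself, the arithmetic is short. The two totals below are the "Total Live Event Cost" figures from the appendix tables.

```python
AWS_TOTAL = 1592.29       # Total Live Event Cost, AWS table in the appendix
TINYBIRD_TOTAL = 1270.91  # Total Live Event Cost, Tinybird table in the appendix

savings = AWS_TOTAL - TINYBIRD_TOTAL
savings_pct = savings / AWS_TOTAL * 100

print(f"Savings per event: ${savings:.2f} ({savings_pct:.1f}%)")
```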

Implementation Considerations

| Go With AWS If: | Pick Tinybird When: |
|---|---|
| You're already deep in the AWS ecosystem | You want sub-second query latency without the headache |
| Your team dreams in Lambda functions | Your team prefers writing SQL to managing infrastructure |
| You need very specific control over data storage locations | You need to iterate quickly on new analytics features |
| You enjoy building and maintaining complex pipelines (hey, some people do!) | You don't want to manage numerous moving parts |

The Bottom Line

Both approaches can handle the scale - that's not the question. The real decision comes down to what you value more: operational and architectural simplicity or operational and architectural uniformity. If you're building something new, Tinybird's approach lets you move faster and sleep easier. But if you're heavily invested in AWS services, the AWS solution, while more complex, might fit better into your existing workflow.

Appendix

Detailed Cost Estimates

The costs for both approaches are based on published on-demand pricing - both platforms offer discounts if you're willing to commit long-term.

AWS Approach

| Path/Component | Cost | Notes |
|---|---|---|
| Total Live Event Cost | $1,592.29 | |
| Ingestion | $653.44 | |
| Kinesis Data Streams | $653.44 | |
| Kinesis Data Streams (Data In) | $326.40 | 17GB/min * 240 min * $0.08/GB |
| Kinesis Data Streams (Data Out) | $326.40 | 17GB/min * 240 min * $0.04/GB * 2 (1/4 * 4 Windowed KDAs + Raw Firehose) |
| Kinesis Data Streams (Stream Hours) | $0.64 | 4 streams * 4 hours * $0.04/stream-hour |
| Analytics | $712.57 | |
| Kinesis Data Analytics | $27.28 | 62 KPUs * 4 hours * $0.11/KPU-hour |
| Near Real-time to Mobile/Web Apps | $682.59 | |
| Lambda | $1.85 | |
| Lambda (Requests) | $1.00 | 87 requests/s per stream * 4 streams * 14400s * $0.2/M requests |
| Lambda (Duration) | $0.85 | 87 requests/s/stream * 4 streams * 14400s * 100ms/request * $0.0000000017/ms |
| CloudFront | $677.33 | |
| CloudFront (HTTPS Requests to Origin) | $0.13 | 87 stat requests/10s * 14400s * $0.01/10K requests |
| CloudFront (Functions) | $475.20 | 3.3M viewers * 1 request/10s/viewer * 14400s * $0.10/M invocations |
| CloudFront (Data Transfer) | $202.00 | 3.3M viewers * 1 request/10s/viewer * 14400s * 0.0000004657GB/request * $0.085/GB |
| API Gateway | $0.13 | 87 stat requests/10s * 14400s * $1.00/million HTTP API Requests |
| DynamoDB | $3.29 | |
| DynamoDB (Writes) | $3.13 | 87 writes/s per stream * 4 streams * 14400s * 1 WRU/write * $0.625/M WRUs |
| DynamoDB (Reads) | $0.16 | 87 reads/10s * 14400s * 1 RRU/read * $1.25/M RRUs |
| Streaming Aggregates (Postgres/BI) | $1.98 | |
| Lambda | $1.85 | |
| Lambda (Requests) | $1.00 | 87 requests/s per stream * 4 streams * 14400s * $0.2/M requests |
| Lambda (Duration) | $0.85 | 87 requests/s/stream * 4 streams * 14400s * 100ms/request * $0.0000000017/ms |
| RDS PostgreSQL | $0.13 | 4 hours * $0.032/hour |
| Batched Aggregates (Athena/BI) | $0.73 | |
| Kinesis Firehose | $0.01 | 87 objects/s/stream * 14400 s * 4 streams * 100 bytes/object * $0.029/GB |
| S3 | $0.43 | 87 objects/s/stream * 14400 s * 4 streams * 100 bytes/object * $0.023/GB + 87 objects/s/stream * 14400 s * 4 streams * 1 POST/min * 1 min/60 s * $0.005/1000 POSTs |
| Athena | $0.29 | 1 query/min * 240 min * 0.000238 TB (avg)/query * $5.00/TB |
| Ad-hoc Raw Data Queries (Athena/BI) | $226.27 | |
| Firehose | $203.76 | |
| Firehose (Ingestion) | $118.32 | 17GB/min * 240 min * $0.029/GB |
| Firehose (Format Conversion) | $73.44 | 17GB/min * 240 min * $0.018/GB |
| Firehose (Dynamic Partitioning) | $12.00 | 17 GB/min * 240 min * $0.02/GB + 53 partitions/min * 1 object per partition * 240 min * $0.005/1000 objects + 4 hours * $0.07/hour |
| S3 | $13.41 | 17 GB/min * 240 min * 1/7 parquet compression + 53 POSTs/min * 240 min * $0.005/1000 POSTs |
| Athena | $9.11 | 1 query/30 min * 240 min * 80% of records/query * 50% of columns/query * 0.0166TB/min * 1/7 parquet compression * $5.00/TB |
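The largest AWS line items are straightforward to reproduce from the formulas in the Notes column. Here is a small sketch that recomputes the Kinesis Data Streams, Firehose, and CloudFront Functions rows; every rate comes from the table above, and the variable names are just for illustration.

```python
GB_PER_MIN = 17
EVENT_MINUTES = 240
EVENT_HOURS = 4
EVENT_SECONDS = 14_400
STREAMS = 4
VIEWERS = 3_300_000

total_gb = GB_PER_MIN * EVENT_MINUTES  # data ingested over the whole event

kinesis = {
    "data_in": total_gb * 0.08,
    "data_out": total_gb * 0.04 * 2,
    "stream_hours": STREAMS * EVENT_HOURS * 0.04,
}
firehose = {
    "ingestion": total_gb * 0.029,
    "format_conversion": total_gb * 0.018,
}
# One stats request per viewer every 10 seconds, billed per million invocations.
cloudfront_function_invocations = VIEWERS * (EVENT_SECONDS / 10)
cloudfront_functions = cloudfront_function_invocations * 0.10 / 1_000_000

print(f"Kinesis Data Streams: ${sum(kinesis.values()):.2f}")
for name, cost in firehose.items():
    print(f"Firehose {name}: ${cost:.2f}")
print(f"CloudFront Functions: ${cloudfront_functions:.2f}")
```

Notice that the two biggest buckets scale with different things: Kinesis and Firehose with data volume, CloudFront Functions with the number of viewers polling for stats.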

Tinybird Approach

| Cost Dimension | Cost | Notes |
|---|---|---|
| Total Live Event Cost | $1,270.91 | |
| Total Stored | $693.61 | |
| Ingestion | $693.60 | 17 GB/min * 240 min * 50% size reduction * $0.34/GB |
| Materialized Views | $0.01 | |
| States MV | $0.00 | 14 bytes/record * 50 records/s * 14400s * $0.34/GB |
| US/ex-US MV | $0.00 | 14 bytes/record * 2 records/s * 14400s * $0.34/GB |
| Teams MV | $0.00 | 22 bytes/record * 32 records/s * 14400s * $0.34/GB |
| Favored Winner MV | $0.00 | 18 bytes/record * 2 records/s * 14400s * $0.34/GB |
| Total Processed | $577.31 | |
| Ingestion | $285.60 | 17 GB/min * 240 min * $0.07/GB |
| Materialization | $34.18 | |
| States Materialization | $10.51 | 350,000 events/s * 14400s * 18 bytes/event * $0.07/GB + 350,000 events/s * 14400s * 14 bytes/event * $0.07/GB |
| US/ex-US Materialization | $10.51 | 350,000 events/s * 14400s * 18 bytes/event * $0.07/GB + 350,000 events/s * 14400s * 14 bytes/event * $0.07/GB |
| Teams Materialization | $13.14 | 350,000 events/s * 14400s * 26 bytes/event * $0.07/GB + 350,000 events/s * 14400s * 14 bytes/event * $0.07/GB |
| Favored Winner Materialization | $0.01 | 19 bytes/record * 1.42 (avg) decisions/viewer * 3.3M viewers * $0.07/GB + 15 bytes/record * 1.42 (avg) decisions/viewer * 3.3M viewers * $0.07/GB |
| API Endpoints | $0.49 | |
| States Endpoint | $0.00 | 14 bytes/read * 50 reads/s * 14400s * $0.07/GB |
| US/ex-US Endpoint | $0.00 | 14 bytes/read * 2 reads/s * 14400s * $0.07/GB |
| Teams Endpoint | $0.00 | 22 bytes/read * 32 reads/s * 14400s * $0.07/GB |
| Favored Teams Endpoint | $0.49 | 2 reads/s * 14400s * 0.000241 GB/read (avg) * $0.07/GB |
| Ad-Hoc BI Queries | $257.04 | 1 query/30 min * 240 min * 80% of records/query * 50% of columns/query * 17GB/min * 50% size reduction * 56.25% (avg data available to scan per query) * $0.07/GB |
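The Tinybird figures can be recomputed the same way. This sketch follows the Notes column, with two assumptions worth flagging: the ad-hoc query volume is taken over the full 240 minutes of ingested data, and the materialization rows are converted with binary gigabytes (2^30 bytes), since that appears to be what reproduces the published amounts.

```python
GB_PER_MIN = 17
EVENT_MINUTES = 240
EVENT_SECONDS = 14_400
EVENTS_PER_SECOND = 350_000
STORAGE_RATE = 0.34    # $/GB stored
PROCESSED_RATE = 0.07  # $/GB processed
BYTES_PER_GIB = 2 ** 30  # assumption: binary GB for the materialization rows

total_gb = GB_PER_MIN * EVENT_MINUTES

stored_ingestion = total_gb * 0.50 * STORAGE_RATE
processed_ingestion = total_gb * PROCESSED_RATE


def materialization_cost(read_bytes: int, write_bytes: int) -> float:
    """Cost of reading and writing the given bytes per event for every event."""
    total_bytes = EVENTS_PER_SECOND * EVENT_SECONDS * (read_bytes + write_bytes)
    return total_bytes / BYTES_PER_GIB * PROCESSED_RATE


states_materialization = materialization_cost(18, 14)
teams_materialization = materialization_cost(26, 14)

# Assumption: each query scans a share of the full event's data (17 GB/min * 240 min).
adhoc_queries = EVENT_MINUTES / 30
adhoc_cost = adhoc_queries * 0.80 * 0.50 * total_gb * 0.50 * 0.5625 * PROCESSED_RATE

print(f"Stored ingestion:       ${stored_ingestion:.2f}")
print(f"Processed ingestion:    ${processed_ingestion:.2f}")
print(f"States materialization: ${states_materialization:.2f}")
print(f"Teams materialization:  ${teams_materialization:.2f}")
print(f"Ad-hoc BI queries:      ${adhoc_cost:.2f}")
```

Storage and ad-hoc scans dominate here, so compression ratio and query selectivity are the levers that move this bill the most.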

Detailed Performance Estimates

AWS Approach

| Component | Activity | Latency/Freshness (s) |
|---|---|---|
| Near Real-time to Mobile/Web Apps | | 11.208 |
| Write Path | | 1.105 |
| Kinesis Data Streams | Write/Read | 0.002 |
| Kinesis Data Analytics | Window Processing | 1.000 |
| Lambda | Process Events | 0.100 |
| DynamoDB | Write | 0.003 |
| Read Path | | 10.103 |
| CloudFront | Cache TTL | 10.000 |
| API Gateway | Auth/Route | 0.100 |
| DynamoDB | Read | 0.003 |
| Streaming Aggregates (Postgres/BI) | | 61.612 |
| Write Path | | 1.107 |
| Kinesis Data Streams | Write/Read | 0.002 |
| Kinesis Data Analytics | Window Processing | 1.000 |
| Lambda | Process Events | 0.100 |
| RDS Postgres | Write | 0.005 |
| Read Path | | 60.505 |
| RDS Postgres | Read | 0.005 |
| BI Tools | Refresh Rate | 60.000 |
| BI Tools | Query Processing | 0.500 |
| Batched Aggregates (Athena/BI) | | 123.602 |
| Write Path | | 61.102 |
| Kinesis Data Streams | Write/Read | 0.002 |
| Kinesis Data Analytics | Window Processing | 1.000 |
| Kinesis Firehose | Buffer | 60.000 |
| S3 | Write | 0.100 |
| Read Path | | 62.500 |
| Athena | Query | 2.000 |
| BI Tools | Refresh Rate | 60.000 |
| BI Tools | Query Processing | 0.500 |
| Ad-hoc Raw Data Queries (Athena/BI) | | 371.502 |
| Write Path | | 335.502 |
| Kinesis Data Streams | Write/Read | 0.002 |
| Kinesis Firehose | Parquet Buffer | 60.000 |
| Kinesis Firehose | Parquet Conversion | 0.500 |
| S3 | Write | 275.000 |
| Read Path | | 36.000 |
| Athena | Complex Query | 35.000 |
| BI Tools | Query Processing | 1.000 |
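The path totals are simply the sums of their stages. As a sanity check, this sketch adds up the near real-time path from the table above. Reading the "1-11 seconds" freshness range as "write path only" (cache just refreshed) versus "write path plus read path" (cache about to expire) is an interpretation, not something the table states outright.

```python
# Stage latencies in seconds, copied from the near real-time rows above.
write_stages = {
    "Kinesis Data Streams (write/read)": 0.002,
    "Kinesis Data Analytics (window)": 1.000,
    "Lambda (process events)": 0.100,
    "DynamoDB (write)": 0.003,
}
read_stages = {
    "CloudFront (cache TTL)": 10.000,
    "API Gateway (auth/route)": 0.100,
    "DynamoDB (read)": 0.003,
}

write_path = sum(write_stages.values())
read_path = sum(read_stages.values())
worst_case_freshness = write_path + read_path

# On a cache miss the viewer waits for API Gateway plus DynamoDB, not for the TTL.
cache_miss_ms = (read_stages["API Gateway (auth/route)"] + read_stages["DynamoDB (read)"]) * 1000

print(f"Write path: {write_path:.3f} s")
print(f"Read path:  {read_path:.3f} s")
print(f"Worst-case freshness: {worst_case_freshness:.3f} s")
print(f"Cache-miss response:  {cache_miss_ms:.0f} ms")
```

The cache TTL accounts for almost all of the worst-case staleness, which is why it is the first knob to turn when trading cost against freshness.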

Tinybird Approach

| Component | Activity | Latency/Freshness (s) |
|---|---|---|
| Near Real-time to Mobile/Web Apps | | 3.100 |
| Write Path | | 2.050 |
| Events API Ingestion | Write | 2.000 |
| Materialization | Window Processing | 0.050 |
| Read Path | | 1.035 |
| Endpoints | Cache TTL | 1.000 |
| Cache Read | Cache Read | 0.010 |
| Database | Read | 0.025 |
| Optimized Aggregate Queries (BI) | | 3.050 |
| Write Path | | 2.050 |
| Events API Ingestion | Write | 2.000 |
| Materialization | Window Processing | 0.050 |
| Read Path | | 1.025 |
| Database | Read | 0.025 |
| BI Tools | Query Processing | 1.000 |
| Ad-hoc Unoptimized Queries (BI) | | 8.050 |
| Write Path | | 2.050 |
| Events API Ingestion | Write | 2.000 |
| Materialization | Window Processing | 0.050 |
| Read Path | | 6.000 |
| Database | Read | 5.000 |
| BI Tools | Query Processing | 1.000 |
