Real-time data systems often process petabytes of data or more every day, serving requests to thousands or millions of concurrent users with the expectation of sub-second API response times. Infrastructure is provisioned to handle steady-state throughput, but increases in traffic or usage can lead to load spikes that can bring down a production system. In real-time systems, load testing becomes critical to ensure uptime even during surges.
In this resource, we share lessons from our experience running hundreds of load tests at Tinybird, where our customers build real-time data APIs that serve billions of requests a day.
What is load testing?
Load testing evaluates how well an infrastructure handles expected traffic, measuring response times and stability as requests increase. In real-time data systems such as Tinybird, where low-latency API response times are critical for many customers, load testing helps predict system behavior during traffic spikes, ensures SLO compliance, and prevents unexpected downtime.
Why do you need to perform load tests?
There are some scenarios where load testing is essential:
- Expected traffic surges during scheduled events: For example, marketing campaigns that lead to a significant increase in visits and queries, or Black Friday for e-commerce companies.
- Creation of new endpoints: Before launching a new use case, it is crucial to validate its performance under different load levels.
- Validation of existing infrastructure: Ensuring that the current infrastructure can handle the expected traffic and identifying potential bottlenecks.
Failing to conduct load tests can lead to significant risks and costs:
- Risks: Service outages, high response latencies, or data loss can negatively impact user experience and product reputation.
- Costs: Unnecessary overprovisioning increases operational costs, while underestimation can cause failures during traffic spikes.
Key considerations for planning a load test in real-time data systems
Defining the objectives and scope of a load test correctly is key, as is establishing the right metrics and indicators. It's also important to select representative examples of API calls, their distribution, and the expected volume. Making 10x more calls to an endpoint that retrieves data for the last week is not the same as making 10x more calls to the same endpoint to retrieve data for the entire last year. Understanding the distribution of API requests, as well as the type of query to be performed, is key in this context.
Another important consideration is the increase in ingestion load. This affects both the machine load and the volume of data that needs to be processed.
Consider, for example, a scenario in Tinybird. You have an endpoint that calculates the average of some metric over the last 24 hours of data, and this endpoint queries the raw data source where you ingest your events directly. If you experience a 10x ingestion spike, your endpoint will have to read 10 times more data. This leads to higher endpoint latency and ties up I/O for longer, limiting the system's capacity to handle other requests. If the increase in ingestion persists over time, your endpoints will keep reading more and more data, and the situation will progressively worsen. The chart below shows an example of how endpoint latency increases in proportion to the increase in ingestion.
This highlights the importance of accounting for these factors when designing and running load tests.
Important variables for your load test
When performing a load test, it is generally assumed that a set of resources (machines with memory and CPU) will be available. These resources may need to be scaled up or down to support the expected load.
For example, if endpoints are reading large amounts of data, they could create an I/O bottleneck on the machine, reducing the number of queries that can be handled concurrently. To increase concurrency without adding resources, you would need to reduce the volume of data each query reads, and at some point you will hit a limit. Additionally, the more data that is read, the higher the latency you can expect.
Understanding how certain metrics change under peak loads or as the load increases is crucial for properly optimizing and scaling the system. The following variables are important to track during a load test:
Queries per second (QPS)
Estimating the number of simultaneous calls to the API will help you understand how to allocate compute resources.
Request latency
Defining a maximum acceptable response time for end users creates a benchmark for successful or failed load tests.
Processed data volume
Reducing the amount of data each query processes or returns will free up resources during load spikes. If large volumes of data must be processed while maintaining low latency and high concurrency, the infrastructure must be scaled accordingly, leading to additional costs.
Consider the example below. The two queries return the same result, but with a marked difference in processed data and latency. The first query processes 70MB of data in 484ms. The second, which is better optimized, processes only 1KB in 3.5ms. This was achieved by pre-aggregating metrics with incremental materialized views, minimizing both the response time and the I/O needed to run the query. Unoptimized queries may be less noticeable during steady state, but extensive load testing will expose them. Before performing a load test, evaluate whether optimizations like these can be applied so that resources are allocated properly.
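The original queries aren't reproduced here, but a minimal sketch of the pattern, assuming a hypothetical raw `events` data source (with `timestamp` and `load_time_ms` columns) and an incremental materialized view `daily_metrics_mv` that stores pre-aggregated average states per hour, could look like this:

```sql
-- Unoptimized: scans every raw event from the last 24 hours on each request
SELECT avg(load_time_ms) AS avg_load_time
FROM events
WHERE timestamp >= now() - INTERVAL 1 DAY;

-- Optimized: merges pre-aggregated states written by an incremental materialized view,
-- so only a handful of rows are read instead of the raw events
SELECT avgMerge(avg_load_time_state) AS avg_load_time
FROM daily_metrics_mv
WHERE hour >= now() - INTERVAL 1 DAY;
```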
Ingestion load increase
It's common to generate more data during events like Black Friday, big sporting events, or a large marketing campaign. You should consider the increase in ingestion when performing the load test.
Designing the load test
The load test design process involves the following steps:
- Define the objective:
  - What is the purpose of the test?
  - Which endpoints will be evaluated?
  - What traffic volume is expected?
- Review current endpoint performance:
  - Ensure that the processed data volume and latency are reasonable.
  - If uncertain, consult an expert to validate acceptable values.
- Obtain test data:
  - Request distribution: it can be random, uniform, or based on specific usage patterns.
- Define success criteria:
  - Establish acceptable latency and stability thresholds, such as "latency < 100 ms for 99% of calls."
- Extract metrics and analyze the results of the test:
  - In Tinybird, for example, you can query service data sources to extract and monitor metrics. You can see an example query below.
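For illustration only (not necessarily the exact query from the original article), and assuming `pipe_stats_rt` exposes columns such as `pipe_name`, `start_datetime`, `status_code`, and `duration` in seconds, a metrics query could look like:

```sql
SELECT
    pipe_name,
    count() AS requests,
    round(avg(duration) * 1000) AS avg_latency_ms,
    round(quantile(0.99)(duration) * 1000) AS p99_latency_ms,
    countIf(status_code >= 400) AS errors
FROM tinybird.pipe_stats_rt
WHERE start_datetime >= now() - INTERVAL 1 HOUR
GROUP BY pipe_name
ORDER BY requests DESC
```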
Keep in mind that load tests are not static processes. Typically, multiple iterations are required to adjust configurations and evaluate different scenarios.
Importance of a representative sample
A crucial part of preparing a load test is selecting an appropriate sample of API calls. Endpoints typically include input parameters that allow filtering the queried data. The test sample must reflect the real distribution of requests made by end users.
For example, if an endpoint usually filters the last week of data but is now expected to extend queries to the last month, the test should account for this increase in volume. Otherwise, the test results will not accurately evaluate performance under the new scenario.
When historical data is unavailable, estimates can be based on:
- Parameters used in similar endpoints.
- Expected query distribution based on the use case.
A practical load test example
To see how the considerations above are applied, let's walk through a practical load test example.
Imagine we're an electronics e-commerce company preparing for Black Friday. We've built API endpoints using Tinybird to serve real-time, personalized offers to our customers at scale. These endpoints are the backbone of our online store, enabling customers to find the products they need and discover the best deals quickly and efficiently. During our Black Friday sales, these endpoints will experience massive load increases, and we'll rely on Tinybird to allocate the necessary resources to manage that load.
Throughout the year, our endpoints experience stable traffic, but we anticipate a 10x increase in requests on Black Friday along with a 10x increase in data ingestion. We need to verify that our search and sales endpoints can manage this peak without any performance degradation.
Step 1: Identify Critical Endpoints
Before initiating any tests, we need to pinpoint the most critical endpoints for our business. In this case, we'll focus on:
- `/search`: Used by customers to find products and get recommendations based on previous searches.
- `/sales`: Used to display discounted products during Black Friday whenever a user accesses the page.
These endpoints will be our primary focus during the load testing.
Step 2: Gather Baseline Metrics
To effectively evaluate a service's performance during a load test, it’s key to gather baseline metrics. A load test typically sends hundreds or even thousands of requests per second over a prolonged period. By analyzing these requests, you can understand the latency distribution, identify bottlenecks, and evaluate how the service will perform under peak loads or in production environments.
The available statistics for all critical metrics include:
- Mean: The average value of a dataset, calculated by summing all values and dividing by the number of data points.
- Median: The middle value when the data is ordered, with 50% of the values above and 50% below it.
- Mode: The value that appears most frequently in the dataset.
- Maximum: The highest value in the dataset.
- Minimum: The lowest value in the dataset.
- Percentiles: Divide the dataset into 100 equal parts. The 99th percentile, for example, shows the value below which 99% of the requests fall.
Which statistic should I use?
When evaluating load test results, it’s important to consider the mean, median, and percentiles. A successful test demonstrates that a service can handle most requests with latencies below a set threshold. The 99th percentile is particularly useful, as it indicates the latency under which 99% of requests fall, ensuring a good user experience when combined with a low error rate.
The mean can be significantly impacted by outliers, distorting the true performance and obscuring the response times for users affected by tail latency—slower responses that occur in the "long tail" of the distribution. These latencies are key for understanding the full range of user experiences, especially in high-traffic scenarios. In such cases, the median is a more reliable measure, as it is less influenced by extreme values.
The mode, meanwhile, helps reveal where most values are concentrated and highlights any skew in the distribution, providing a clearer picture of typical performance.
In Tinybird, you can easily capture these statistics by running the following query against the `pipe_stats_rt` service data source, which contains real-time request logs for the API endpoints published on Tinybird's infrastructure:
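The exact query isn't reproduced here, but a sketch of how such a baseline could be computed, assuming requests carry a `category` parameter and that `duration` is stored in seconds (check the current `pipe_stats_rt` schema for the exact columns), might be:

```sql
SELECT
    pipe_name,
    round(count() / greatest(dateDiff('second', min(start_datetime), max(start_datetime)), 1)) AS qps,
    round(avg(duration) * 1000) AS avg_latency_ms,
    round(quantile(0.99)(duration) * 1000) AS p99_latency_ms,
    countIf(status_code >= 400) AS errors,
    count() AS total_requests,
    topK(3)(parameters['category']) AS top_categories
FROM tinybird.pipe_stats_rt
WHERE start_datetime >= now() - INTERVAL 1 DAY
  AND pipe_name IN ('search', 'sales')
GROUP BY pipe_name
```

Running a query like this over a representative steady-state window gives us a baseline similar to the following: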
| Pipe Name | QPS | Avg Latency (ms) | p99 Latency (ms) | Errors | Total Requests | Top Categories |
|---|---|---|---|---|---|---|
| /search | 10 | 80 | 150 | 2 | 14,400 | ['smartphones', 'laptops', 'headphones'] |
| /sales | 5 | 120 | 250 | 1 | 7,200 | ['smartphones', 'laptops'] |
Analysis of baseline
- Our `/search` endpoint handles 10 QPS at steady state, with an average latency of 80ms and a p99 latency of 150ms.
- Our `/sales` endpoint handles 5 QPS at steady state, with an average latency of 120ms and a p99 latency of 250ms.
- The error rate is negligible.
- The most searched categories are smartphones, laptops, and headphones.
Step 3: Define Load Test Parameters
Our experience from previous events indicates that the behavior of our endpoints will be:
- A 10x increase in traffic, meaning `/search` should handle 100 QPS and `/sales` 50 QPS.
- A 10x increase in ingestion load.
- Our goal is to maintain the average latency below 200ms and the p99 latency below 400ms.
- We'll set an error threshold of 0.1%.
Assessing request distribution
In Tinybird, we can easily extract a sample of endpoint calls that follows the production distribution by running the following query over the `pipe_stats_rt` service data source:
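The original query isn't included above; as a sketch, assuming `pipe_stats_rt` records the full request `url` for each call, a random sample that preserves the production mix of endpoints and parameters could be obtained with:

```sql
-- Random sample of recent production requests, preserving the real mix of endpoints and parameters
SELECT url
FROM tinybird.pipe_stats_rt
WHERE start_datetime >= now() - INTERVAL 7 DAY
  AND pipe_name IN ('search', 'sales')
ORDER BY rand()
LIMIT 10000
```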
You can extract this sample of queries as a CSV file (e.g. `query_distribution.csv`) to seed your load test using your preferred tool.
Going a step further, a truly representative sample requires mirroring not only the distribution of endpoints, reflecting production load (e.g., 40% endpoint 1, 60% endpoint 2), but also the distribution of parameters within those endpoints. This ensures accurate simulation of real-world data processing. For instance, if endpoint 1 predominantly handles year-long date ranges, or if a recommendation endpoint processes data for high-interaction users, the sample must reflect these parameter distributions to accurately represent the production environment.
In the following example, you can see how to get the distribution of requests per category: the query returns the number of requests grouped by that query parameter, which you can use to fine-tune your load test.
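As a sketch, assuming the category is passed as a `category` query parameter and that `parameters` is a map of each request's query parameters:

```sql
SELECT
    parameters['category'] AS category,
    count() AS requests
FROM tinybird.pipe_stats_rt
WHERE start_datetime >= now() - INTERVAL 7 DAY
  AND pipe_name = 'search'
GROUP BY category
ORDER BY requests DESC
```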
Step 4: Perform the load tests
There are several tools available for load testing, each with its own strengths and capabilities. Some popular options include JMeter, Gatling, Locust, and wrk, among others. Each tool allows you to simulate traffic, measure performance, and analyze the results based on different parameters.
In this particular example, we use `wrk` to execute the test, using the previously generated file and the following configuration:
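A sketch of the invocation (the endpoint URL and token are placeholders, and `multi-request-json.lua` is the script that replays the sampled requests):

```bash
wrk -t12 -c400 -d30s \
  -s multi-request-json.lua \
  "https://api.tinybird.co/v0/pipes/search.json?token=<YOUR_TOKEN>"
```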
- `-t12`: specifies 12 execution threads.
- `-c400`: defines 400 concurrent connections.
- `-d30s`: sets a 30-second duration for the test.
Don't forget to customize the `multi-request-json.lua` script to point to your `query_distribution.csv` file.
Step 5: Analyze the result
Now that we've covered how to set up and run a load test in Tinybird, and which metrics to use for evaluation, let's discuss how to extract test results from Tinybird’s service tables. Full documentation on service data sources is available here: Tinybird Service Data Sources.
To analyze endpoint behavior during a load test, we use the `pipe_stats_rt` service data source, which contains detailed information about all endpoint requests made to our APIs in the last 7 days. The following SQL query retrieves the average latency, median, percentiles, and counts of successful and failed requests during a load test.
Since Tinybird records all the parameters you send to the endpoint, even if they are not used by the query, you can use a placeholder parameter to tag calls as part of a load test. For example, by passing `test=test20250206` as a query parameter, you can later isolate the results of that specific load test:
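As a sketch of such a query, assuming `parameters` is a map of each request's query parameters and `duration` is stored in seconds:

```sql
SELECT
    pipe_name,
    count() AS total_requests,
    countIf(status_code < 400) AS successes,
    countIf(status_code >= 400) AS errors,
    round(avg(duration) * 1000) AS avg_latency_ms,
    round(quantile(0.5)(duration) * 1000) AS median_latency_ms,
    round(quantile(0.75)(duration) * 1000) AS p75_latency_ms,
    round(quantile(0.9)(duration) * 1000) AS p90_latency_ms,
    round(quantile(0.99)(duration) * 1000) AS p99_latency_ms
FROM tinybird.pipe_stats_rt
WHERE parameters['test'] = 'test20250206'
GROUP BY pipe_name
```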
This query extracts all the relevant metrics, making it sufficient for evaluating infrastructure and service performance during a load test.
It is also important to monitor CPU and memory usage during the load test. To prevent instance crashes, tests should be conducted incrementally, with continuous monitoring to avoid overwhelming the machines and causing service disruptions. The following queries help monitor CPU and memory status.
You can use our template for monitoring to help you build a dashboard based on the metrics reported by Tinybird.
CPU Usage:
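The original monitoring queries aren't reproduced here. As a rough sketch only, using a hypothetical `infra_metrics` source that reports one row per host and timestamp with a `cpu_usage_percent` column (adapt it to whatever metrics source your monitoring setup exposes):

```sql
-- Hypothetical source and column names; adapt to your monitoring setup
SELECT
    toStartOfMinute(timestamp) AS minute,
    host,
    max(cpu_usage_percent) AS max_cpu_pct
FROM infra_metrics
WHERE timestamp >= now() - INTERVAL 1 HOUR
GROUP BY minute, host
ORDER BY minute, host
```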
Memory Usage:
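Again, a rough sketch against the same hypothetical source, this time assuming memory is reported as a percentage of each node's capacity:

```sql
-- Hypothetical source and column names; adapt to your monitoring setup
SELECT
    toStartOfMinute(timestamp) AS minute,
    host,
    max(memory_usage_percent) AS max_memory_pct
FROM infra_metrics
WHERE timestamp >= now() - INTERVAL 1 HOUR
GROUP BY minute, host
ORDER BY minute, host
```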
If resource usage hits limits before reaching the expected load, you may need to resize the infrastructure or optimize endpoints to resolve bottlenecks. CPU usage is especially critical — exceeding 60% can cause latency spikes, so keeping it within 50-60% is ideal. If latency becomes unacceptable during the test, it’s often a sign of instance overload, requiring infrastructure adjustments.
Going back to our example, applying the queries mentioned above we get the following results:
| Endpoint | QPS | Average Latency (ms) | Median Latency (ms) | 75th Percentile Latency (ms) | 90th Percentile Latency (ms) | 99th Percentile Latency (ms) | Total Requests | Successes | Errors |
|---|---|---|---|---|---|---|---|---|---|
| /search | 98 | 190 | 185 | 210 | 280 | 380 | 846,720 | 846,300 | 420 |
| /sales | 49 | 230 | 220 | 260 | 350 | 450 | 423,360 | 423,000 | 360 |
Interpreting the results:
`/search` endpoint:

- Handling an average of 98 queries per second.
- Average latency is 190ms, with a median of 185ms. This suggests that the latency distribution is fairly symmetrical.
- 75th percentile latency (p75) is 210ms, meaning 75% of requests are served within 210ms.
- 99th percentile latency (p99) is 380ms, indicating that 1% of requests experience latencies higher than 380ms.
- The error rate is very low (420 errors out of 846,720 requests).

`/sales` endpoint:

- Handling 49 queries per second on average.
- Average latency is 230ms, with a median of 220ms.
- p99 latency is 450ms, slightly higher than `/search`.
- The error rate is also low but slightly higher than `/search` (360 errors out of 423,360 requests).
The results suggest that both endpoints are performing reasonably well under load. However, `/sales` has slightly higher latencies and a higher error rate compared to `/search`. This indicates that `/sales` might be a bottleneck and could benefit from further optimization.
CPU Usage Query Results:
We can see the CPU usage across different nodes (`node-1` and `node-2`) over time.
CPU usage is increasing gradually but remains within acceptable limits (below 60%). This indicates that the nodes are handling the load effectively.
Memory Usage Query Results:
Memory usage is also increasing but is well below the capacity of the nodes. This suggests that memory is not a bottleneck at this point. Our recommendation is to keep the memory usage always below 60%.
Important Notes:
- These are just examples. Your actual results will vary depending on your infrastructure, load test parameters, and the specific queries you run.
- It's crucial to monitor these metrics in real-time during your load tests to identify any potential issues early on.
- If you see CPU usage approaching 60% or memory usage nearing capacity, you may need to consider scaling your infrastructure to handle the increased load.
Step 6: Adjustments and iterative testing
Based on the results, we can apply optimizations and rerun the tests. Some strategies include:
- Optimize queries: We reviewed the `/sales` queries and found that a costly aggregation was causing the latency and errors. We optimized the query using pre-calculated aggregations.
- Implement caching: We implemented caching to reduce the load on frequently used endpoints.
- Ensure infrastructure can scale appropriately: We verified that our resources have sufficient CPU and memory to handle the load.
After applying the optimizations, we reran the load tests and verified that the results were within the defined thresholds.
Conclusion
Conducting load testing is essential for preparing for high-traffic events like Black Friday. Tinybird, along with load testing tools like `wrk`, enables you to assess the performance of your endpoints and optimize them to provide a seamless experience for your customers.
By following these steps, you can prepare for traffic spikes and ensure your app or service is ready to handle the load without interruptions even on the most important day of the year.