Analyze the performance of your API Endpoints¶
You can use the pipe_stats
and pipe_stats_rt
Service Data Sources to analyze the performance of your API Endpoints. Read on to see several practical examples that show what you can do with these Data Sources.
Knowing what to optimize¶
Before you optimize, you need to know what to optimize. The pipe_stats
and pipe_stats_rt
Service Data Sources let you see how your API Endpoints are performing, so you can find causes of overhead and improve performance.
These Service Data Sources provide performance data and consumption data for every single request. You can also filter and sort results by Tokens to see who is accessing your API Endpoints and how often.
The difference between pipe_stats_rt
and pipe_stats
is that pipe_stats
provides aggregate stats, like average request duration and total read bytes, per day, whereas pipe_stats_rt
offers the same information but without aggregation. Every single request is stored in pipe_stats_rt
. The examples in this guide use pipe_stats_rt
, but you can use the same logic with pipe_stats
if you need more than 7 days of lookback.
Before you start¶
You need a high-level understanding of Tinybird's Service Data Sources.
Understand the core stats¶
This guide focuses on the following fields in the pipe_stats_rt
Service Data Source:
pipe_name
(String): Pipe name as returned in Pipes API.duration
(Float): the duration in seconds of each specific request.read_bytes
(UInt64): How much data was scanned for this particular request.read_rows
(UInt64): How many rows were scanned.token_name
(String): The name of the Token used in a particular request.status_code
(Int32): The HTTP status code returned for this particular request.
You can find the full schema for pipe_stats_rt
in the API docs.
The value of pipe_name
is "query_api" in the event as it's a Query API request. The following section covers how to monitor query performance when using the Query API.
Use the Query API with metadata parameters¶
If you are using the Query API to run queries in Tinybird you can still track query performance using the pipe_stats_rt
Service Data Source. Add metadata related to the query as request parameters, as well as any existing parameters used in your query.
For example, when running a query against the Query API you can leverage a parameter called app_name
to track all queries from the "explorer" application. Here's an example using curl
:
Using the metadata parameters with the Query API
curl -X POST \ -H "Authorization: Bearer <PIPE:READ token>" \ --data "% SELECT * FROM events LIMIT {{Int8(my_limit, 10)}}" \ "https://api.tinybird.co/v0/sql?my_limit=10&app_name=explorer"
When you run the following queries, use the parameters
attribute to access those queries where app_name
equals "explorer":
Simple Parameterized Query
SELECT * FROM tinybird.pipe_stats_rt WHERE parameters['app_name'] = 'explorer'
Detect errors in your API Endpoints¶
If you want to monitor the number of errors per Endpoint over the last hour, you can run the following query:
Errors in the last hour
SELECT pipe_name, status_code, count() as error_count FROM tinybird.pipe_stats_rt WHERE status_code >= 400 AND start_datetime > now() - INTERVAL 1 HOUR GROUP BY pipe_name, status_code ORDER BY status_code desc
If you have errors, the query would return something like:
Pipe_a | 404 | 127 Pipe_b | 403 | 32
With one query, you can see in real time if your API Endpoints are experiencing errors, and investigate further if so.
Analyze the performance of API Endpoints over time¶
You can also use pipe_stats_rt
to track how long API calls take using the duration
field, and seeing how that changes over time. API performance is directly related to how much data you are reading per request, so if your API Endpoint is dynamic, request duration varies.
For instance, it might be receiving start and end date parameters that alter how long a period is being read.
API Endpoint performance over time
SELECT toStartOfMinute(start_datetime) t, pipe_name, avg(duration) avg_duration, quantile(.95)(duration) p95_duration, count() requests FROM tinybird.pipe_stats_rt WHERE start_datetime >= {{DateTime(start_date_time, '2022-05-01 00:00:00', description="Start date time")}} AND start_datetime < {{DateTime(end_date_time, '2022-05-25 00:00:00', description="End date time")}} GROUP BY t, pipe_name ORDER BY t desc, pipe_name
Find the endpoints that process the most data¶
You might want to find Endpoints that repeatedly scan large amounts of data. They are your best candidates for optimization to reduce time and spend.
Here's an example of using pipe_stats_rt
to find the API Endpoints that have processed the most data as a percentage of all processed data in the last 24 hours:
Most processed data last 24 hours
WITH ( SELECT sum(read_bytes) FROM tinybird.pipe_stats_rt WHERE start_datetime >= now() - INTERVAL 24 HOUR ) as total, sum(read_bytes) as processed_byte SELECT pipe_id, quantile(0.9)(duration) as p90, formatReadableSize(processed_byte) AS processed_formatted, processed_byte*100/total as percentage FROM tinybird.pipe_stats_rt WHERE start_datetime >= now() - INTERVAL 24 HOUR GROUP BY pipe_id ORDER BY percentage DESC
Include consumption of the Query API¶
If you use Tinybird's Query API to query your Data Sources directly, you probably want to include in your analysis which queries are consuming more.
Whenever you use the Query API, the field pipe_name
contain the value query_api
. The actual query is included as part of the q
parameter in the url
field. You can modify the query in the previous section to extract the SQL query that's processing the data.
Using the Query API
WITH ( SELECT sum(read_bytes) FROM tinybird.pipe_stats_rt WHERE start_datetime >= now() - INTERVAL 24 HOUR ) as total, sum(read_bytes) as processed_byte SELECT if(pipe_name = 'query_api', normalizeQuery(extractURLParameter(decodeURLComponent(url), 'q')),pipe_name) as pipe_name, quantile(0.9)(duration) as p90, formatReadableSize(processed_byte) AS processed_formatted, processed_byte*100/total as percentage FROM tinybird.pipe_stats_rt WHERE start_datetime >= now() - INTE RVAL 24 HOUR GROUP BY pipe_name ORDER BY percentage DESC
Monitor usage of Tokens¶
If you use your API Endpoint with different Tokens, for example if allowing different customers to check their own data, you can track and control which Tokens are being used to access these endpoints.
The following example shows, for the last 24 hours, the number and size of requests per Token:
Token usage last 24 hours
SELECT count() requests, formatReadableSize(sum(read_bytes)) as total_read_bytes, token_name FROM tinybird.pipe_stats_rt WHERE start_datetime >= now() - INTERVAL 24 HOUR GROUP BY token_name ORDER BY requests DESC
To get this information, request the Token name (token_name
column) or id (token
column).
Check the limits page for limits on ingestion, queries, API Endpoints, and more.
Next steps¶
- Want to optimize further? Read Monitor your ingestion.
- Want to use Prometheus or Datadog? Consume API Endpoints in Prometheus format
- Learn how to monitor jobs in your Workspace.
- Monitor the latency of your API Endpoints.
- Learn how to build Charts of your data.