Check the health of your data sources¶
After you have fixed all the possible errors in your source files, matched the data source schema to your needs, and done on-the-fly transformations, you can start ingesting data periodically. Knowing the status of your ingestion processes helps you to keep your data clean and consistent.
Data source summary¶
Select a data source to see the size of the data source, the number of rows, the number of rows in the quarantine data source, and when it was last updated. The log contains details of the events for the data source, which appears as the results of the query.
Service data sources for continuous monitoring¶
Service data sources can help you with ingestion health checks. You can use them like any other data source in your Workspace, which means you can create API Endpoints to monitor your ingestion processes.
Querying the tinybird.datasources_ops_log directly, you can, for example, lists your ingest processes during the last week:
SELECT *
FROM tinybird.datasources_ops_log
WHERE toDate(timestamp) > now() - INTERVAL 7 DAY
ORDER BY timestamp DESC
This query calculates the percentage of quarantined rows for a given period of time:
SELECT
countIf(result != 'ok') / countIf(result == 'ok') * 100 percentage_failed,
sum(rows_quarantine) / sum(rows) * 100 quarantined_rows
FROM tinybird.datasources_ops_log
The following query monitors the average duration of your periodic ingestion processes for a given data source:
SELECT avg(elapsed_time) avg_duration
FROM tinybird.datasources_ops_log
WHERE datasource_id = 't_8417d5126ed84802aa0addce7d1664f2'
If you want to configure or build an external service that monitors these metrics, you need to create an API Endpoint and raise an alert when passing a threshold. When you receive an alert, you can check the quarantine data source or the Operations log to see what's going on and fix your source files or ingestion processes.
Monitoring API endpoints¶
You can use the pipe_stats and pipe_stats_rt Service data sources to monitor the performance of your API Endpoints.
Every request to a pipe is logged to tinybird.pipe_stats_rt and kept in this data source for the last week.
The following example API Endpoint aggregates the statistics for each hour for the selected pipe.
SELECT
toStartOfHour(start_datetime) as hour,
count() as view_count,
round(avg(duration), 2) as avg_time,
arrayElement(quantiles(0.50)(duration),1) as quantile_50,
arrayElement(quantiles(0.95)(duration),1) as quantile_95,
arrayElement(quantiles(0.99)(duration),1) as quantile_99
FROM tinybird.pipe_stats_rt
WHERE pipe_id = 'PIPE_ID'
GROUP BY hour
ORDER BY hour
pipe_stats contains statistics about your pipe Endpoints' API calls aggregated per day using intermediate states.
SELECT
date,
sum(view_count) view_count,
sum(error_count) error_count,
avgMerge(avg_duration_state) avg_time,
quantilesTimingMerge(0.9, 0.95, 0.99)(quantile_timing_state) quantiles_timing_in_millis_array
FROM tinybird.pipe_stats
WHERE pipe_id = 'PIPE_ID'
GROUP BY date
ORDER BY date
You can use these API endpoints to trigger alerts whenever statistics pass predefined thresholds. Export API endpoint statistics in Prometheus format to integrate with your monitoring and alerting tools.