Check the health of your Data Sources¶
After you have fixed any errors in your source files, matched the Data Source schema to your needs, and applied on-the-fly transformations, you can start ingesting data periodically. Knowing the status of your ingestion processes helps you keep your data clean and consistent.
Data Sources log¶
From the Data Sources log in your Workspace overview, you can check whether there are new rows in quarantine, if jobs are failing, or if there is any other problem.
Operations Log¶
Select a Data Source to see its size, the number of rows, the number of rows in the quarantine Data Source, and when it was last updated. The Operations log shows the details of each event for the Data Source as the result of a query.
Service Data Sources for continuous monitoring¶
Service Data Sources can help you with ingestion health checks. You can use them like any other Data Source in your Workspace, which means you can create API Endpoints to monitor your ingestion processes.
Querying 'tinybird.datasources_ops_log' directly, you can, for example, list your ingestion processes during the last week:
LISTING INGESTIONS IN THE LAST 7 DAYS

SELECT *
FROM tinybird.datasources_ops_log
WHERE toDate(timestamp) > now() - INTERVAL 7 DAY
ORDER BY timestamp DESC
This query calculates the percentage of quarantined rows for a given period of time:
CALCULATE % OF ROWS THAT WENT TO QUARANTINE

SELECT
    countIf(result != 'ok') / count() * 100 AS percentage_failed,
    sum(rows_quarantine) / sum(rows) * 100 AS quarantined_rows
FROM tinybird.datasources_ops_log
The following query monitors the average duration of your periodic ingestion processes for a given Data Source:
CALCULATING AVERAGE INGEST DURATION

SELECT avg(elapsed_time) AS avg_duration
FROM tinybird.datasources_ops_log
WHERE datasource_id = 't_8417d5126ed84802aa0addce7d1664f2'
If you want to build or configure an external service that monitors these metrics, create an API Endpoint that exposes them and raise an alert whenever a threshold is passed. When you receive an alert, check the quarantine Data Source or the Operations log to see what's going on, then fix your source files or ingestion processes.
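As a minimal sketch of such an external check, the following Python snippet assumes you have published the quarantine-percentage query above as an endpoint (the endpoint name 'quarantine_stats', the threshold value, and the token are placeholders, not Tinybird-defined names):

```python
import json
import urllib.request

# Hypothetical endpoint name; publish your own Pipe over
# tinybird.datasources_ops_log and substitute its URL here.
ENDPOINT_URL = "https://api.tinybird.co/v0/pipes/quarantine_stats.json"

def fetch_stats(token: str) -> dict:
    """Fetch the endpoint's JSON result (requires a valid read token)."""
    req = urllib.request.Request(
        ENDPOINT_URL, headers={"Authorization": f"Bearer {token}"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

def should_alert(stats: dict, threshold: float = 5.0) -> bool:
    """Return True when the quarantined-row percentage passes the threshold."""
    row = stats["data"][0]
    return row["quarantined_rows"] > threshold

# Example decision on a response shaped like the query above:
sample = {"data": [{"percentage_failed": 0.4, "quarantined_rows": 7.2}]}
print(should_alert(sample, threshold=5.0))
```

Run a script like this on a schedule (cron, a serverless function, or your monitoring agent) and page whenever `should_alert` returns True.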
Monitoring API Endpoints¶
You can use the 'pipe_stats' and 'pipe_stats_rt' Service Data Sources to monitor the performance of your API Endpoints.
Every request to a Pipe is logged to 'tinybird.pipe_stats_rt', where it is retained for one week.
The following example API Endpoint aggregates the statistics for each hour for the selected Pipe.
PIPE_STATS_RT_BY_HR

SELECT
    toStartOfHour(start_datetime) AS hour,
    count() AS view_count,
    round(avg(duration), 2) AS avg_time,
    arrayElement(quantiles(0.50)(duration), 1) AS quantile_50,
    arrayElement(quantiles(0.95)(duration), 1) AS quantile_95,
    arrayElement(quantiles(0.99)(duration), 1) AS quantile_99
FROM tinybird.pipe_stats_rt
WHERE pipe_id = 'PIPE_ID'
GROUP BY hour
ORDER BY hour
'pipe_stats' contains statistics about your Pipe Endpoints' API calls aggregated per day using intermediate states.
PIPE_STATS_BY_DATE

SELECT
    date,
    sum(view_count) AS view_count,
    sum(error_count) AS error_count,
    avgMerge(avg_duration_state) AS avg_time,
    quantilesTimingMerge(0.9, 0.95, 0.99)(quantile_timing_state) AS quantiles_timing_in_millis_array
FROM tinybird.pipe_stats
WHERE pipe_id = 'PIPE_ID'
GROUP BY date
ORDER BY date
You can use these API Endpoints to trigger alerts whenever statistics pass predefined thresholds. You can also export API Endpoint statistics in Prometheus format to integrate with your monitoring and alerting tools.
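If your stack scrapes Prometheus, one option is a small exporter that renders the endpoint statistics in the Prometheus text exposition format. The sketch below is illustrative only: the metric and label names are assumptions, not a schema Tinybird defines.

```python
def to_prometheus(rows: list[dict], metric: str = "tinybird_endpoint") -> str:
    """Render per-Pipe endpoint stats as Prometheus text exposition format.

    `rows` is assumed to be the `data` array of an endpoint shaped like the
    pipe_stats queries above (pipe_id, view_count, avg_time).
    """
    lines = [
        f"# HELP {metric}_requests_total API Endpoint request count",
        f"# TYPE {metric}_requests_total gauge",
    ]
    for row in rows:
        labels = f'pipe="{row["pipe_id"]}"'
        lines.append(f'{metric}_requests_total{{{labels}}} {row["view_count"]}')
        lines.append(f'{metric}_avg_duration_seconds{{{labels}}} {row["avg_time"]}')
    return "\n".join(lines) + "\n"

sample = [{"pipe_id": "t_123", "view_count": 42, "avg_time": 0.12}]
print(to_prometheus(sample))
```

Serve the resulting text from an HTTP handler and point a Prometheus scrape job at it; alert rules can then fire on `view_count` drops or latency spikes.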
To see how you can monitor Pipes and Data Sources health in a dashboard, see Operational Analytics in Real Time with Tinybird and Retool.