Check the health of your Data Sources

After you have fixed all the possible errors in your source files, matched the Data Source schema to your needs, and applied on-the-fly transformations, you can start ingesting data periodically. Knowing the status of your ingestion processes helps you keep your data clean and consistent.

Data Sources log

From the Data Sources log in your Workspace overview, you can check whether there are new rows in quarantine, if jobs are failing, or if there is any other problem.

Operations Log

Select a Data Source to see its size, the number of rows it contains, the number of rows in its quarantine Data Source, and when it was last updated. The Operations log contains the details of each event for the Data Source, displayed as the results of a query.

Service Data Sources for continuous monitoring

Service Data Sources can help you with ingestion health checks. You can use them like any other Data Source in your Workspace, which means you can create API Endpoints to monitor your ingestion processes.

By querying 'tinybird.datasources_ops_log' directly, you can, for example, list your ingestion processes during the last week:

LISTING INGESTIONS IN THE LAST 7 DAYS
SELECT * 
FROM tinybird.datasources_ops_log
WHERE timestamp > now() - INTERVAL 7 DAY
ORDER BY timestamp DESC

This query calculates the percentage of quarantined rows for a given period of time:

CALCULATE % OF ROWS THAT WENT TO QUARANTINE
SELECT 
  countIf(result != 'ok') / count() * 100 percentage_failed,
  sum(rows_quarantine) / sum(rows) * 100 quarantined_rows
FROM tinybird.datasources_ops_log

The following query monitors the average duration of your periodic ingestion processes for a given Data Source:

CALCULATING AVERAGE INGEST DURATION
SELECT avg(elapsed_time) avg_duration 
FROM tinybird.datasources_ops_log
WHERE datasource_id = 't_8417d5126ed84802aa0addce7d1664f2'

If you want to configure or build an external service that monitors these metrics, create an API Endpoint that exposes them and raise an alert whenever a threshold is passed. When you receive an alert, you can check the quarantine Data Source or the Operations log to see what's going on and fix your source files or ingestion processes.
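A minimal sketch of such an alert check, assuming a hypothetical API Endpoint named 'quarantine_stats' that publishes the quarantined-rows percentage computed above; the endpoint name, token, and threshold value are placeholders:

```python
import json
import urllib.request

# Hypothetical API Endpoint publishing the quarantine-percentage query above.
ENDPOINT_URL = "https://api.tinybird.co/v0/pipes/quarantine_stats.json"
TOKEN = "YOUR_READ_TOKEN"  # placeholder: a Token with read access to the Pipe
THRESHOLD = 5.0            # alert when more than 5% of rows are quarantined


def exceeds_threshold(payload: dict, threshold: float) -> bool:
    """Return True when the quarantined-rows percentage passes the threshold.

    `payload` is the JSON body returned by a Tinybird API Endpoint,
    whose result rows live under the "data" key.
    """
    rows = payload.get("data", [])
    return any(row.get("quarantined_rows", 0) > threshold for row in rows)


def check() -> bool:
    """Fetch the endpoint and evaluate the threshold."""
    req = urllib.request.Request(
        ENDPOINT_URL, headers={"Authorization": f"Bearer {TOKEN}"}
    )
    with urllib.request.urlopen(req) as resp:
        payload = json.load(resp)
    return exceeds_threshold(payload, THRESHOLD)
```

You can run a check like this on a schedule (for example, from a cron job) and page whoever is on call when it returns True.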

Monitoring API Endpoints

You can use the 'pipe_stats' and 'pipe_stats_rt' Service Data Sources to monitor the performance of your API Endpoints.

Every request to a Pipe is logged to 'tinybird.pipe_stats_rt' and kept in this Data Source for the last week.

The following example API Endpoint aggregates the statistics for each hour for the selected Pipe.

PIPE_STATS_RT_BY_HR
SELECT
  toStartOfHour(start_datetime) as hour,
  count() as view_count,
  round(avg(duration), 2) as avg_time,
  arrayElement(quantiles(0.50)(duration),1) as quantile_50,
  arrayElement(quantiles(0.95)(duration),1) as quantile_95,
  arrayElement(quantiles(0.99)(duration),1) as quantile_99
FROM tinybird.pipe_stats_rt
WHERE pipe_id = 'PIPE_ID'
GROUP BY hour
ORDER BY hour

'pipe_stats' contains statistics about your Pipe Endpoints' API calls aggregated per day using intermediate states.

PIPE_STATS_BY_DATE
SELECT
  date,
  sum(view_count) view_count,
  sum(error_count) error_count,
  avgMerge(avg_duration_state) avg_time,
  quantilesTimingMerge(0.9, 0.95, 0.99)(quantile_timing_state) quantiles_timing_in_millis_array
FROM tinybird.pipe_stats
WHERE pipe_id = 'PIPE_ID'
GROUP BY date
ORDER BY date

You can use these API Endpoints to trigger alerts whenever statistics pass predefined thresholds. Export API endpoint statistics in Prometheus format to integrate with your monitoring and alerting tools.
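One way to feed such statistics into Prometheus-based tooling is to render the endpoint's JSON rows as Prometheus exposition text. A minimal sketch, where the metric names and the payload shape are assumptions based on the PIPE_STATS_BY_DATE query above:

```python
def to_prometheus(payload: dict) -> str:
    """Render daily Pipe statistics as Prometheus exposition format.

    Expects rows shaped like the PIPE_STATS_BY_DATE results:
    {"date": ..., "view_count": ..., "error_count": ..., "avg_time": ...}
    """
    lines = [
        "# HELP pipe_view_count API Endpoint requests per day",
        "# TYPE pipe_view_count gauge",
        "# HELP pipe_error_count Failed API Endpoint requests per day",
        "# TYPE pipe_error_count gauge",
    ]
    for row in payload.get("data", []):
        label = f'{{date="{row["date"]}"}}'
        lines.append(f"pipe_view_count{label} {row['view_count']}")
        lines.append(f"pipe_error_count{label} {row['error_count']}")
    return "\n".join(lines) + "\n"
```

Serving this text from a small HTTP handler gives Prometheus a scrape target backed by 'pipe_stats'.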

To see how you can monitor Pipes and Data Sources health in a dashboard, see Operational Analytics in Real Time with Tinybird and Retool.
