Check the health of your Data Sources

After you have fixed all the possible errors in your source files, matched the Data Source schema to your needs, and applied on-the-fly transformations, you can start ingesting data periodically. Knowing the status of your ingestion processes helps you keep your data clean and consistent.

Data Sources log

From the Data Sources log in your Workspace overview, you can check whether there are new rows in quarantine, if jobs are failing, or if there is any other problem.

Operations Log

Select a Data Source to see its size, the number of rows it contains, the number of rows in its quarantine Data Source, and when it was last updated. The Operations log contains the details of each event for the Data Source, displayed as the results of a query.

Service Data Sources for continuous monitoring

Service Data Sources can help you with ingestion health checks. You can use them like any other Data Source in your Workspace, which means you can create API Endpoints to monitor your ingestion processes.

By querying 'tinybird.datasources_ops_log' directly, you can, for example, list your ingestion processes during the last week:

LISTING INGESTIONS IN THE LAST 7 DAYS
SELECT * 
FROM tinybird.datasources_ops_log
WHERE timestamp > now() - INTERVAL 7 DAY
ORDER BY timestamp DESC

This query calculates the percentage of quarantined rows for a given period of time:

CALCULATE % OF ROWS THAT WENT TO QUARANTINE
SELECT 
  countIf(result != 'ok') / count() * 100 percentage_failed,
  sum(rows_quarantine) / sum(rows) * 100 quarantined_rows
FROM tinybird.datasources_ops_log

The following query monitors the average duration of your periodic ingestion processes for a given Data Source:

CALCULATING AVERAGE INGEST DURATION
SELECT avg(elapsed_time) avg_duration 
FROM tinybird.datasources_ops_log
WHERE datasource_id = 't_8417d5126ed84802aa0addce7d1664f2'

If you want to configure or build an external service that monitors these metrics, create an API Endpoint that exposes them and raise an alert whenever a threshold is passed. When you receive an alert, you can check the quarantine Data Source or the Operations log to see what's going on and fix your source files or ingestion processes.
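A minimal sketch of such an alert check, assuming a hypothetical API Endpoint named 'quarantine_stats' that publishes the quarantined-rows percentage computed above; the endpoint name, token, and threshold value are placeholders:

```python
import json
import urllib.request

# Hypothetical API Endpoint publishing the quarantine-percentage query above.
ENDPOINT_URL = "https://api.tinybird.co/v0/pipes/quarantine_stats.json"
TOKEN = "YOUR_READ_TOKEN"  # placeholder: a Token with read access to the Pipe
THRESHOLD = 5.0            # alert when more than 5% of rows are quarantined


def exceeds_threshold(payload: dict, threshold: float) -> bool:
    """Return True when the quarantined-rows percentage passes the threshold.

    `payload` is the JSON body returned by a Tinybird API Endpoint,
    whose result rows live under the "data" key.
    """
    rows = payload.get("data", [])
    return any(row.get("quarantined_rows", 0) > threshold for row in rows)


def check() -> bool:
    """Fetch the endpoint and evaluate the threshold."""
    req = urllib.request.Request(
        ENDPOINT_URL, headers={"Authorization": f"Bearer {TOKEN}"}
    )
    with urllib.request.urlopen(req) as resp:
        payload = json.load(resp)
    return exceeds_threshold(payload, THRESHOLD)
```

You can run a check like this on a schedule (for example, from a cron job) and page whoever is on call when it returns True.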

Monitoring API Endpoints

You can use the 'pipe_stats' and 'pipe_stats_rt' Service Data Sources to monitor the performance of your API Endpoints.

Every request to a Pipe is logged to 'tinybird.pipe_stats_rt' and kept in this Data Source for the last week.

The following example API Endpoint aggregates the statistics for each hour for the selected Pipe.

PIPE_STATS_RT_BY_HR
SELECT
  toStartOfHour(start_datetime) as hour,
  count() as view_count,
  round(avg(duration), 2) as avg_time,
  arrayElement(quantiles(0.50)(duration),1) as quantile_50,
  arrayElement(quantiles(0.95)(duration),1) as quantile_95,
  arrayElement(quantiles(0.99)(duration),1) as quantile_99
FROM tinybird.pipe_stats_rt
WHERE pipe_id = 'PIPE_ID'
GROUP BY hour
ORDER BY hour

'pipe_stats' contains statistics about your Pipe Endpoints' API calls aggregated per day using intermediate states.

PIPE_STATS_BY_DATE
SELECT
  date,
  sum(view_count) view_count,
  sum(error_count) error_count,
  avgMerge(avg_duration_state) avg_time,
  quantilesTimingMerge(0.9, 0.95, 0.99)(quantile_timing_state) quantiles_timing_in_millis_array
FROM tinybird.pipe_stats
WHERE pipe_id = 'PIPE_ID'
GROUP BY date
ORDER BY date

You can use these API Endpoints to trigger alerts whenever statistics pass predefined thresholds. Export API endpoint statistics in Prometheus format to integrate with your monitoring and alerting tools.
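One way to feed such statistics into Prometheus-based tooling is to render the endpoint's JSON rows as Prometheus exposition text. A minimal sketch, where the metric names and the payload shape are assumptions based on the PIPE_STATS_BY_DATE query above:

```python
def to_prometheus(payload: dict) -> str:
    """Render daily Pipe statistics as Prometheus exposition format.

    Expects rows shaped like the PIPE_STATS_BY_DATE results:
    {"date": ..., "view_count": ..., "error_count": ..., "avg_time": ...}
    """
    lines = [
        "# HELP pipe_view_count API Endpoint requests per day",
        "# TYPE pipe_view_count gauge",
        "# HELP pipe_error_count Failed API Endpoint requests per day",
        "# TYPE pipe_error_count gauge",
    ]
    for row in payload.get("data", []):
        label = f'{{date="{row["date"]}"}}'
        lines.append(f"pipe_view_count{label} {row['view_count']}")
        lines.append(f"pipe_error_count{label} {row['error_count']}")
    return "\n".join(lines) + "\n"
```

Serving this text from a small HTTP handler gives Prometheus a scrape target backed by 'pipe_stats'.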

To see how you can monitor Pipes and Data Sources health in a dashboard, see Operational Analytics in Real Time with Tinybird and Retool.
