Monitor jobs in your Workspace

Many operations in your Tinybird Workspace, like Imports, Copy Jobs, Sinks, and Populates, are executed as background jobs within the platform. This approach ensures that the system can handle a large volume of requests efficiently without causing timeouts or delays in your workflow.

Monitoring and managing jobs, such as querying job statuses, types, and execution details, is essential for maintaining a healthy Workspace. The two mechanisms for generic job monitoring are the Jobs API and the jobs_log Data Source.

You can also track more specific operations using dedicated Service Data Sources, such as datasources_ops_log for imports, replaces, and copies, or sinks_ops_log for sink operations. You can track jobs across Organizations with organization.jobs_log. See the Service Data Sources docs.
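
For instance, here is a minimal sketch (assuming the timestamp, event_type, datasource_name, and result columns of datasources_ops_log) that summarizes Data Source operations over the past day:

SELECT event_type, datasource_name, result, count() AS events
FROM tinybird.datasources_ops_log
WHERE timestamp > now() - INTERVAL 1 DAY
GROUP BY event_type, datasource_name, result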

The Jobs API and the jobs_log Data Source return identical information about job execution. However, the Jobs API has some limitations: it reports on a single Workspace, returns at most 100 records, and only covers the last 48 hours. To monitor jobs outside these parameters, use the jobs_log Data Source.
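
For example, jobs_log lets you look further back than 48 hours. The following query summarizes all jobs by type and status over the past seven days, which isn't possible with the Jobs API alone:

SELECT job_type, status, count() AS jobs
FROM tinybird.jobs_log
WHERE created_at > now() - INTERVAL 7 DAY
GROUP BY job_type, status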

Track a specific job

The most basic use case is tracking a specific job. You can do this using the Jobs API or SQL queries.

Jobs API

The Jobs API is a convenient way to programmatically check the status of a job. By sending a GET request, you can retrieve detailed information about a specific job. This method is particularly useful for integration into scripts or applications.

curl \
    -X GET "https://$TB_HOST/v0/jobs/{job_id}" \
    -H "Authorization: Bearer $TOKEN"

Replace {job_id} with the actual job ID, and replace the Tinybird API hostname ($TB_HOST) with the API region that matches your Workspace.

SQL queries

Alternatively, you can use SQL to query the jobs_log Data Source directly from within a Tinybird Pipe. This method is ideal if you're comfortable with SQL and prefer to run queries directly against the data, then expose the results through an endpoint or use them in other operations.

SELECT * FROM tinybird.jobs_log WHERE job_id='{job_id}'

Replace {job_id} with the desired job ID. This query retrieves all columns for the specified job, providing comprehensive details about its execution.
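
For instance, to expose this query as a parameterized endpoint, you can use Tinybird's templating syntax in the Pipe node. A minimal sketch, where job_id becomes a query parameter of the published endpoint and the default value is only a placeholder:

%
SELECT *
FROM tinybird.jobs_log
WHERE job_id = {{ String(job_id, 'your_job_id') }}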

Track specific job types

Tracking jobs by type lets you monitor and analyze all jobs of a certain category, such as all copy jobs. This can help you understand the performance and status of specific job types across your entire Workspace.

Jobs API

You can fetch all jobs of a specific type by making a GET request against the Jobs API:

curl \
    -X GET "https://$TB_HOST/v0/jobs?kind=copy" \
    -H "Authorization: Bearer $TOKEN"

Replace copy with the type of job you want to track. Make sure you have set your Tinybird host ($TB_HOST) and authorization token ($TOKEN) correctly.

SQL queries

Alternatively, you can run an SQL query to fetch all jobs of a specific type from the jobs_log Data Source:

SELECT * FROM tinybird.jobs_log WHERE job_type='copy'

Replace copy with the desired job type. This query retrieves all columns for jobs of the specified type.
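
If you're not sure which job types exist in your Workspace, a quick query like this lists the types seen over the past week:

SELECT DISTINCT job_type
FROM tinybird.jobs_log
WHERE created_at > now() - INTERVAL 7 DAY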

Track ongoing jobs

To keep track of jobs that are currently running, you can query the status of jobs in progress. This helps in monitoring the real-time workload and managing system performance.

Jobs API

By making an HTTP GET request to the Jobs API, you can fetch all jobs that are currently in the working status:

curl \
    -X GET "https://$TB_HOST/v0/jobs?status=working" \
    -H "Authorization: Bearer $TOKEN"

This call retrieves jobs that are actively running. Ensure you have set your Tinybird host ($TB_HOST) and authorization token ($TOKEN) correctly.

SQL queries

You can also use an SQL query to fetch currently running jobs from the jobs_log Data Source:

SELECT * FROM tinybird.jobs_log WHERE status='working'

This query retrieves all columns for jobs with the status working, allowing you to monitor ongoing operations.
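
To spot long-running jobs, you can also compute how long each in-progress job has been running. A minimal sketch using the started_at column from jobs_log:

SELECT
    job_id,
    job_type,
    date_diff('s', started_at, now()) AS running_for_secs
FROM tinybird.jobs_log
WHERE status = 'working'
ORDER BY running_for_secs DESC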

Track errored jobs

Tracking errored jobs is crucial for identifying and resolving issues that arise during job execution. Use the Jobs API or SQL queries against jobs_log to monitor jobs that errored during execution.

Jobs API

You can use the Jobs API to fetch details of jobs that have ended in error.

Use the following curl command to retrieve all jobs that have a status of error:

curl \
    -X GET "https://$TB_HOST/v0/jobs?status=error" \
    -H "Authorization: Bearer $TOKEN"

This call fetches the jobs currently in an errored state, providing details you can use for further analysis or debugging. Make sure you've set your Tinybird host ($TB_HOST) and authorization token ($TOKEN) correctly.

SQL queries

Alternatively, you can use SQL to query the jobs_log Data Source directly.

Use the following SQL query to fetch job IDs, job types, and error messages for jobs that have encountered errors in the past day:

SELECT job_id, job_type, error
FROM tinybird.jobs_log
WHERE
    status='error' AND
    created_at > now() - INTERVAL 1 DAY
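
To see which failures are most common, you can group the same data by error message:

SELECT error, count() AS occurrences
FROM tinybird.jobs_log
WHERE
    status='error' AND
    created_at > now() - INTERVAL 1 DAY
GROUP BY error
ORDER BY occurrences DESC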

Track success rate

Building on the errored jobs queries, you can also use jobs_log to calculate the success rate of your Workspace jobs:

SELECT
    job_type,
    pipe_id,
    countIf(status='done') AS job_success,
    countIf(status='error') AS job_error,
    job_success / (job_success + job_error) AS success_rate
FROM tinybird.jobs_log
WHERE
    created_at > now() - INTERVAL 1 DAY
GROUP BY job_type, pipe_id

Get job execution metadata

The jobs_log Data Source includes a property called job_metadata that contains metadata related to job executions. This includes the execution type (manual or scheduled) for Copy and Sink jobs, the count of quarantined rows for append operations, and many other properties.

You can extract and analyze this metadata using JSON functions within SQL queries. This allows you to gain valuable information about job executions directly from the jobs_log Data Source.

The following SQL query shows how to extract specific fields from the job_metadata property, such as the import mode and the counts of quarantined rows and invalid lines, and how to aggregate this data for analysis:

SELECT
    job_type,
    JSONExtractString(job_metadata, 'mode') AS import_mode, 
    sum(simpleJSONExtractUInt(job_metadata, 'quarantine_rows')) AS quarantine_rows,
    sum(simpleJSONExtractUInt(job_metadata, 'invalid_lines')) AS invalid_lines
FROM tinybird.jobs_log
WHERE
    job_type='import' AND
    created_at >= toStartOfDay(now())
GROUP BY job_type, import_mode

You can build many other use cases with the properties in job_metadata; see the advanced use cases below.

Advanced use cases

Beyond basic tracking, you can leverage the jobs_log Data Source for more advanced use cases, such as gathering statistics and performance metrics. This can help you optimize job scheduling and resource allocation.

Get queue status

The following SQL query returns, for the past day, the number of jobs waiting to be executed, the number in progress, and the number that have completed or errored:

SELECT
    job_type,
    countIf(status='waiting') AS jobs_in_queue,
    countIf(status='working') AS jobs_in_progress,
    countIf(status='done') AS jobs_succeeded,
    countIf(status='error') AS jobs_errored
FROM tinybird.jobs_log
WHERE
    created_at > now() - INTERVAL 1 DAY
GROUP BY job_type

Run time statistics grouped by type of job

The following SQL query calculates the maximum, minimum, median, and p95 running time (in seconds) grouped by type of job over the past day. This helps in understanding the efficiency of different job types:

SELECT
    job_type,
    max(date_diff('s', started_at, updated_at)) AS max_run_time_in_secs,
    min(date_diff('s', started_at, updated_at)) AS min_run_time_in_secs,
    median(date_diff('s', started_at, updated_at)) AS median_run_time_in_secs,
    quantile(0.95)(date_diff('s', started_at, updated_at)) AS p95_run_time_in_secs
FROM tinybird.jobs_log
WHERE
    created_at > now() - INTERVAL 1 DAY
GROUP BY job_type
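
To move from aggregates to individual outliers, a variation of the same query surfaces the ten slowest jobs of the past day:

SELECT
    job_id,
    job_type,
    date_diff('s', started_at, updated_at) AS run_time_in_secs
FROM tinybird.jobs_log
WHERE
    created_at > now() - INTERVAL 1 DAY
ORDER BY run_time_in_secs DESC
LIMIT 10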

Statistics on queue time by type of job

The following SQL query calculates the maximum, minimum, median, and p95 queue time (in seconds) by type of job over the past day. This can help you identify bottlenecks in job scheduling:

SELECT
    job_type,
    max(date_diff('s', created_at, started_at)) AS max_queue_time_in_secs,
    min(date_diff('s', created_at, started_at)) AS min_queue_time_in_secs,
    median(date_diff('s', created_at, started_at)) AS median_queue_time_in_secs,
    quantile(0.95)(date_diff('s', created_at, started_at)) AS p95_queue_time_in_secs
FROM tinybird.jobs_log
WHERE
    created_at > now() - INTERVAL 1 DAY
GROUP BY job_type

Get statistics on job completion rate

The following SQL query calculates the success rate by job type and Pipe over the past day. This helps you assess the reliability and efficiency of your workflows by measuring the completion rate of jobs, and identify potential issues and areas for improvement:

SELECT
    job_type,
    pipe_id,
    countIf(status='done') AS job_success,
    countIf(status='error') AS job_error,
    job_success / (job_success + job_error) AS success_rate
FROM tinybird.jobs_log
WHERE
    created_at > now() - INTERVAL 1 DAY
GROUP BY job_type, pipe_id

Statistics on manual vs. scheduled job runs

The following SQL query calculates the counts and percentage of manual versus scheduled executions for copy jobs. Understanding this distribution can surface on-demand runs performed outside your regular schedules:

SELECT
    job_type,
    countIf(JSONExtractString(job_metadata, 'execution_type')='manual') AS job_manual,
    countIf(JSONExtractString(job_metadata, 'execution_type')='scheduled') AS job_scheduled,
    round(100 * job_manual / (job_manual + job_scheduled), 2) AS manual_pct
FROM tinybird.jobs_log
WHERE
    job_type='copy' AND
    created_at > now() - INTERVAL 1 DAY
GROUP BY job_type
