Mar 28, 2025

Real-time inventory management with lambda architecture

Learn how to use Tinybird to build a lambda architecture for real-time inventory, unifying batch and real-time workflows in a single platform.
Ana Guerrero
Data Engineer

Real-time inventory systems help prevent stockouts or overstocking and ensure a good customer experience.

But turning raw inventory and transaction data into a useful real-time inventory API isn’t trivial. You'll need to slice and dice the raw data by product, location, or time to serve an accurate inventory within a specific context. Flexibility adds complexity, and performing on-the-fly aggregations of inventory levels (especially with high-cardinality datasets and complex filters) can be very resource-intensive.

Transactional database limitations

Transactional databases are often used to maintain a source of truth for inventory systems, but they introduce limitations for real-time inventory management at scale. While potentially adequate for low-volume scenarios, simple "latest state" queries or direct scans of raw data struggle to keep up with high event ingestion rates. To effectively manage inventory levels in real time at scale, you need a more refined architectural approach, specifically, a system that handles real-time updates and historical analysis both performantly and cost-effectively.

As an alternative, you can use an analytical database for real-time inventory systems.

Analytical databases like Tinybird leverage columnar storage and specialized indexing to deliver high-throughput, real-time analytics. They store data in columnar files, often compressed and organized into parts designed for efficient sequential reads, which is crucial for analytical queries that typically scan large portions of the data. To maintain this efficiency, data within these parts is often kept contiguous on disk and close to memory. These performance characteristics make analytical databases like Tinybird well suited to real-time inventory systems.

Analytical database limitations

However, there are some tradeoffs.

Unlike transactional databases, where updates and deletes modify individual rows in place to maintain state, analytical databases like Tinybird are designed for immutability using an append-only event log. In a transactional database, an update or delete alters a single record; an analytical database, by contrast, must rewrite the entire column segment within the affected data part, create new parts that are merged in the background, and keep associated indexes up to date, which is particularly costly for large datasets.

To better understand this concept, imagine a library of books organized by subject, with an incredibly detailed catalog. Finding a specific book is lightning-fast because of this meticulous organization. This is akin to how analytical databases use columnar storage and indexing for rapid data retrieval.

However, imagine you need to correct a single page in one of those books. What seems like a simple task becomes complex:

  • Locating the book: First, you must pinpoint the exact book (the data part) containing the error.
  • Rewriting the section: Instead of just changing the page, you might have to rewrite a large section, or even the entire book (rewriting the column segment or creating a new data part).
  • Updating the catalog: The library's catalog (indexes) must be updated to reflect the change.

Given this, you might think that maintaining an accurate state of inventory in real-time wouldn't work with analytical databases. There are some strategies, however, to overcome these limitations and still reap the benefits of the performance of an analytical database for real-time inventory management.

Specifically, pre-aggregated snapshots and the lambda architecture allow you to maintain real-time state without record updates. The lambda architecture leverages the aggregation and query performance strengths of analytical databases to produce an accurate, real-time inventory state.

Lambda architecture

The lambda architecture is a data-processing architecture that combines both batch and real-time data processing to provide up-to-date results while minimizing computational load.

Lambda architecture works very well for real-time inventory management because it combines recent inventory state snapshots with real-time transactional data to arrive at up-to-the-second live, accurate inventory metrics. Rather than aggregating all inventory counts at query time, lambda architecture uses pre-aggregated snapshots merged with real-time aggregations to reduce compute requirements while still calculating inventory states in real time.

A lambda architecture diagram implemented in Tinybird
An example lambda architecture implementation in Tinybird, an analytical database

Pre-aggregated snapshots

Imagine tracking stock level changes for thousands of products across multiple warehouses. A real-time data source captures every transaction and inventory change. To analyze stock levels over a month, a direct query would require scanning all transactions within that period, potentially involving billions of rows and leading to slow queries.

Pre-aggregated snapshots solve this. They summarize data at specific intervals (e.g. daily), reducing scans to a single row per product/warehouse, improving query performance and resource utilization.
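To illustrate the difference, compare a direct aggregation over raw events with a snapshot lookup. This is a sketch in ClickHouse-style SQL (which Tinybird uses); the table and column names (stock_events, stock_snapshots_daily, quantity_change) are hypothetical:

```sql
-- Direct query: scans every transaction in the period
-- (potentially billions of rows at scale)
SELECT item_id, warehouse, sum(quantity_change) AS stock_level
FROM stock_events
WHERE timestamp >= now() - INTERVAL 30 DAY
GROUP BY item_id, warehouse;

-- Snapshot query: reads one pre-aggregated row per product/warehouse/day,
-- keeping the most recent daily value
SELECT item_id, warehouse, argMax(stock_level, snapshot_date) AS stock_level
FROM stock_snapshots_daily
WHERE snapshot_date >= today() - 30
GROUP BY item_id, warehouse;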

It's important to note that snapshot generation queries are typically batch processes that periodically create summarized data views. This allows the system to offload computationally intensive aggregations, maintaining the real-time data ingestion and query performance of the primary real-time pipeline.

You can't rely solely on snapshots, however, because they lack real-time granularity. The ideal approach blends pre-aggregated snapshots with real-time query capabilities, offering both historical depth and current data accuracy. This is the core principle of the lambda architecture.

Herein lies the challenge with lambda architecture. Because it combines two modalities, batch and real-time, it can in practice require several technical handoffs between different data systems, resulting in brittle data pipelines that demand constant maintenance.

Tinybird for lambda architecture

Tinybird, a real-time data platform used by many e-commerce and retail companies for real-time inventory management, simplifies the lambda architecture implementation by supporting both batch and real-time modalities in a unified platform.

To demonstrate this, let's walk through how to implement lambda architecture in Tinybird for real-time inventory management.


Implementing lambda architecture with Tinybird

Instead of juggling separate data systems to implement the lambda architecture, Tinybird offers a single platform for building a unified, efficient pipeline.

With Tinybird, you can combine pre-aggregated inventory snapshots with real-time transactional queries, merging the two to produce an up-to-date inventory state in real time.

Here’s a simplified view of how it works:

A diagram showing lambda architecture implementation in Tinybird
How to implement a lambda architecture in Tinybird

Real-time data capture: The Ingestion Layer

Retail stores produce a continuous stream of stock updates. Every sale, return, or adjustment must be captured in real time. 

Tinybird's Events API can be used to stream thousands of stock events per second from anywhere you can make an HTTP request. In addition, Tinybird's native Kafka connector can be used to consume streams from inventory topics on any existing Kafka-compatible infrastructure.

These systems can be used to ingest hundreds of thousands of inventory events per second.

Events are written into a raw, landing data source with less than a few seconds of end-to-end latency.

Pre-aggregated snapshots: The Batch Layer

Instead of setting up a separate batch processing system, you create periodic, summarized snapshots of your stock data directly within Tinybird. These snapshots, stored in a separate data source, provide a stable, historical record. 

You can create daily snapshots for a broad overview or more frequent snapshots, like every five minutes, for finer detail. Snapshots are generated using a combination of copy pipes and materialized views, which aggregate data from the real-time data source and save the results to a new one. 

Unified, real-time inventory API: The Serving Layer

The serving layer involves combining real-time events and pre-aggregated snapshots within a single, dynamic API endpoint pipe. 

Using SQL, you can:

  • Fetch the most recent snapshot as a starting point.
  • Append the latest real-time updates to this baseline.
  • Aggregate and filter the combined data to calculate the latest state.
  • Publish the state calculation as a scalable, real-time REST API.
  • Use SQL templates to define query parameters for dynamic filtering at query time.

Furthermore, you can use advanced logic in Tinybird pipes to automatically adjust the data source used and aggregation level based on user-defined parameters, such as date filters. For example, if a user wants to see inventory states over multiple days, the endpoint pipe can query daily snapshots. But if they zoom in on a specific hour, the same pipe can select from hourly snapshots, providing the appropriate level of detail. This dynamic adaptation ensures optimal performance and data granularity, all within a single, streamlined workflow.

A practical example

To better demonstrate how to build a real-time inventory management system using lambda architecture, let's walk through a practical implementation using Tinybird.

The following code snippets illustrate each stage of the pipeline, from data ingestion to generating aggregated snapshots and creating a dynamic endpoint for visualization.

This is an admittedly simple dataflow, designed to clarify the concepts discussed above. We will end up with a system that aligns with the diagram below.

The basic lambda architecture we will build to support real-time inventory management

Ingestion: The landing data source

The first data source, which we call the "landing data source," serves as the entry point for all inventory change events. It contains all raw inventory events. The data source schema defines columns for item_id, location, inventory status, and inventory quantity, in addition to timestamps for when the inventory state was created and updated.
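As a sketch, a landing data source definition might look like the following Tinybird .datasource file. The file name (stock_events_landing) and the column types are assumptions for illustration; only the column roles come from the description above:

```
# stock_events_landing.datasource -- file and column names are assumptions

SCHEMA >
    `item_id` String `json:$.item_id`,
    `location` String `json:$.location`,
    `status` String `json:$.status`,
    `quantity` Int32 `json:$.quantity`,
    `created_at` DateTime `json:$.created_at`,
    `updated_at` DateTime `json:$.updated_at`

ENGINE "MergeTree"
ENGINE_SORTING_KEY "updated_at, item_id, location"
```

Events streamed through the Events API or the Kafka connector land in this data source as they arrive.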

Real-time processing: Assigning snapshot IDs

A materialized pipe then assigns a snapshot_id to each stock event, grouping events into snapshots based on pre-defined time intervals. In this example, we use 5-minute intervals, but the time interval is up to you and depends on your specific use case and performance requirements.

The results from the materialized pipe are stored in a new data source: stock_events_snapshot, with the following schema:
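A sketch of both the materialized pipe and the resulting stock_events_snapshot data source follows. The pipe file name and the landing source name are assumptions carried over from the previous section; here snapshot_id is simply the start of each 5-minute interval:

```
# stock_snapshot_id.pipe -- pipe name is an assumption

NODE assign_snapshot_id
SQL >
    SELECT
        toStartOfInterval(updated_at, INTERVAL 5 MINUTE) AS snapshot_id,
        item_id,
        location,
        status,
        quantity,
        updated_at
    FROM stock_events_landing

TYPE materialized
DATASOURCE stock_events_snapshot
```

```
# stock_events_snapshot.datasource

SCHEMA >
    `snapshot_id` DateTime,
    `item_id` String,
    `location` String,
    `status` String,
    `quantity` Int32,
    `updated_at` DateTime

ENGINE "ReplacingMergeTree"
ENGINE_VER "updated_at"
ENGINE_SORTING_KEY "snapshot_id, item_id, status, location"
ENGINE_TTL "snapshot_id + toIntervalDay(7)"
```

The ENGINE_VER column tells ReplacingMergeTree which row to keep when deduplicating rows that share the same sorting key.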

Considerations:

  • Materialized views are computed at ingest time, so this data source is updated as soon as new events arrive.
  • The ReplacingMergeTree engine optimizes storage by retaining only the most recent update for each item_id, status, and location combination within a 5-minute interval. Deduplication is applied in the background when parts are merged.
  • The Time to Live (TTL) setting maintains only the last 7 days' worth of snapshots, reducing storage.

Batch processing: Generating raw snapshots

To handle the batch layer in Tinybird, we can use a copy pipe, which runs an SQL query over a data source and copies the result into another data source on a set schedule. In this example, the copy pipe generates raw snapshots of inventory data every 5 minutes.
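A sketch of what such a copy pipe could look like. The pipe name is an assumption, and, as noted below, the completeness check on received changes is deliberately omitted for simplicity:

```
# stock_snapshots_copy.pipe -- pipe name is an assumption

NODE merge_current_and_previous
SQL >
    WITH (SELECT max(snapshot_id) FROM stock_snapshots_raw) AS last_snapshot
    SELECT
        toStartOfInterval(now(), INTERVAL 5 MINUTE) AS snapshot_id,
        item_id,
        location,
        status,
        argMax(quantity, updated_at) AS quantity,
        max(updated_at) AS updated_at
    FROM
    (
        -- changes received since the previous snapshot
        SELECT item_id, location, status, quantity, updated_at
        FROM stock_events_snapshot
        WHERE snapshot_id > last_snapshot
        UNION ALL
        -- rows that did not change, carried over from the previous snapshot
        SELECT item_id, location, status, quantity, updated_at
        FROM stock_snapshots_raw
        WHERE snapshot_id = last_snapshot
    )
    GROUP BY item_id, location, status

TYPE copy
TARGET_DATASOURCE stock_snapshots_raw
COPY_SCHEDULE */5 * * * *
```

Because newer events have a later updated_at, argMax picks the fresh value for items that changed and carries over the previous snapshot value for items that did not.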

Considerations:

  • The copy pipe query is designed to merge the current 5-minute snapshot with the previous snapshot, so the first snapshot must be created manually.
  • In essence, this pipe merges the changes belonging to the snapshot currently being generated with the rows that did not change from the previous snapshot.
  • Note that, for simplicity, we have not added a check to verify that all changes for the last 5 minutes have been received (which would avoid generating an incomplete snapshot). This is a recommended practice, and can be implemented by checking that the last updated_at received is greater than the snapshot_id being generated.

The results of this copy pipe are written into a new data source: stock_snapshots_raw.

This data source stores raw inventory snapshots with a 5-day TTL.
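A sketch of the stock_snapshots_raw definition, with the same assumed columns as above:

```
# stock_snapshots_raw.datasource

SCHEMA >
    `snapshot_id` DateTime,
    `item_id` String,
    `location` String,
    `status` String,
    `quantity` Int32,
    `updated_at` DateTime

ENGINE "MergeTree"
ENGINE_SORTING_KEY "snapshot_id, item_id, location"
ENGINE_TTL "snapshot_id + toIntervalDay(5)"
```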

Considerations:

  • The 5-day TTL prevents excessive storage consumption and can be adjusted to your particular use case. We have found that 5 days of 5-minute granularity performs well for our use case.
  • Direct queries against this data source may consume significant resources, so pre-aggregation is recommended.

Batch processing: Generating aggregated snapshots

In addition to the fine-grained 5-minute snapshots, we generate pre-aggregated snapshots over larger time intervals, which reduce query size when longer time ranges are selected.

For example, here is a materialized pipe that generates daily aggregated snapshots of inventory data from the raw snapshot data.

The results of this materialized pipe, which contains daily inventory snapshots, are stored in a new data source: stock_snapshots_agg_daily.
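A sketch of the daily rollup pipe and its target data source. The pipe name is an assumption; the aggregate state column (quantity_state) keeps the latest quantity per day using ClickHouse's argMaxState:

```
# stock_snapshots_daily.pipe -- pipe name is an assumption

NODE daily_rollup
SQL >
    SELECT
        toDate(snapshot_id) AS snapshot_date,
        item_id,
        location,
        status,
        argMaxState(quantity, snapshot_id) AS quantity_state
    FROM stock_snapshots_raw
    GROUP BY snapshot_date, item_id, location, status

TYPE materialized
DATASOURCE stock_snapshots_agg_daily
```

```
# stock_snapshots_agg_daily.datasource

SCHEMA >
    `snapshot_date` Date,
    `item_id` String,
    `location` String,
    `status` String,
    `quantity_state` AggregateFunction(argMax, Int32, DateTime)

ENGINE "AggregatingMergeTree"
ENGINE_SORTING_KEY "snapshot_date, item_id, location, status"
```

Queries over this data source finalize the state with argMaxMerge(quantity_state).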

Considerations:

  • Aggregated views drastically improve query performance for common analytical tasks.
  • Since the aggregated snapshots are considerably smaller than the raw ones, we don’t set a TTL, allowing us to keep a daily, historical view of our stock evolution.
  • For the purpose of this example, daily aggregated views are enough, but we could also set hourly aggregations or 5-minute aggregations if we need finer granularity.

Serving Layer: Dynamic endpoint

Using a combination of the pre-aggregated snapshots and the real-time events, we can create a new endpoint pipe: stock_evolution.

This endpoint pipe provides an API endpoint to visualize stock evolution over time with dynamic filtering by location and time range. Using advanced logic templates, the pipe dynamically selects the appropriate snapshot based on the requested time range to minimize resource consumption.

Considerations:

  • We use the agg parameter when calling the endpoint to determine the kind of aggregation to show, and choose which pre-aggregated snapshot to select from based on it.
  • We are also adding dynamic filters like start_date, end_date, and warehouse.
  • When the user is not querying daily aggregations, we generate the last snapshot of our data on the fly to include inventory changes that occurred after the latest snapshot generation.
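Putting these pieces together, here is a sketch of the stock_evolution endpoint pipe using Tinybird's template syntax. The default value for agg and the omission of date filters in the finer-grained branch are simplifications for illustration:

```
# stock_evolution.pipe

NODE stock_evolution_node
SQL >
    %
    {% if String(agg, 'daily') == 'daily' %}
    -- Broad time ranges: read the small daily pre-aggregated snapshots
    SELECT
        snapshot_date AS t,
        item_id,
        location,
        status,
        argMaxMerge(quantity_state) AS quantity
    FROM stock_snapshots_agg_daily
    WHERE 1 = 1
        {% if defined(start_date) %} AND snapshot_date >= {{ Date(start_date) }} {% end %}
        {% if defined(end_date) %} AND snapshot_date <= {{ Date(end_date) }} {% end %}
        {% if defined(warehouse) %} AND location = {{ String(warehouse) }} {% end %}
    GROUP BY t, item_id, location, status
    {% else %}
    -- Finer granularity: raw 5-minute snapshots plus events newer than the
    -- latest snapshot, computed on the fly
    SELECT
        snapshot_id AS t,
        item_id,
        location,
        status,
        argMax(quantity, updated_at) AS quantity
    FROM
    (
        SELECT snapshot_id, item_id, location, status, quantity, updated_at
        FROM stock_snapshots_raw
        UNION ALL
        SELECT snapshot_id, item_id, location, status, quantity, updated_at
        FROM stock_events_snapshot
        WHERE snapshot_id > (SELECT max(snapshot_id) FROM stock_snapshots_raw)
    )
    WHERE 1 = 1
        {% if defined(warehouse) %} AND location = {{ String(warehouse) }} {% end %}
    GROUP BY t, item_id, location, status
    {% end %}

TYPE endpoint
```

A request like GET /v0/pipes/stock_evolution.json?agg=daily&warehouse=madrid would then return the daily stock evolution for that warehouse.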

If you want to check out a full example of lambda architecture in Tinybird with more details and nuance, click here.

Next Steps

We've shown you how to build a dynamic, real-time inventory management system using Tinybird and lambda architecture. Tinybird simplifies the process by giving you the tools and infrastructure to build the entire architecture, from data ingestion to API endpoints, without patching together batch and real-time systems.

To take this further:

  • Experiment with the provided code snippets. Tailor the SQL and data source configurations to match your specific inventory data and reporting needs. Here's the code repository.
  • Explore Tinybird's documentation. Focus on understanding how copy pipes and materialized views can be used to create your own snapshotting and aggregation strategies using lambda architecture.
  • Implement a basic dashboard. Use the generated API endpoint to visualize your stock evolution data, and iterate on the dashboard design as you gain more insights.

