Understanding Watermark

In stream processing, a watermark is a mechanism used to track the progress of data processing in one or more streams. It helps determine when all events up to a specific timestamp have likely arrived, enabling the system to process time-based operations like window aggregations.

Since stream processing deals with unbounded, out-of-order data, watermarks are essential for defining bounded windows and for discarding late or outdated data in a consistent way. Watermarks allow the system to advance event-time, trigger computations (e.g., closing a tumbling window and emitting results), and maintain accuracy even when data arrives with delays.
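As a rough illustration of the idea above, the sketch below tracks a watermark as the largest event time seen so far minus a fixed delay, and labels each incoming event. The class name `WatermarkTracker`, the `delay` parameter, and the three labels are assumptions made for this example. It is not how any particular engine implements watermarks.

```python
from dataclasses import dataclass


@dataclass
class WatermarkTracker:
    """Hypothetical helper: watermark = max event time seen - delay."""

    delay: float  # tolerated out-of-orderness, in seconds (assumed parameter)
    max_event_time: float = float("-inf")

    @property
    def watermark(self) -> float:
        # Events older than this point are assumed to have already arrived.
        return self.max_event_time - self.delay

    def observe(self, event_time: float) -> str:
        """Label an event by comparing it with the current state."""
        if event_time < self.watermark:
            # Older than the watermark: too late to be counted.
            return "late"
        if event_time < self.max_event_time:
            # Behind the newest event, but still within the allowed delay.
            return "out-of-order"
        # Newest event so far: it advances event time and the watermark.
        self.max_event_time = event_time
        return "normal"
```

With `delay=2.0`, feeding event times 1.0, 3.0, 2.5, 6.0, 3.5 in that order labels 2.5 as out-of-order (the watermark is 1.0 at that point) and 3.5 as late (the watermark has moved to 4.0).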

Try It Out

To better understand how watermarks work, use the following tool. It generates random data and simulates watermark progression for a tumbling window aggregation with a 5-second window size and a 2-second watermark delay.

Click the Play button to start the simulation, or click Next or Prev to move forward or backward one step at a time.

Watermark & Window Processing

SELECT window_start, count(), sum(v), avg(v)
FROM tumble(stream, 5s)
GROUP BY window_start
EMIT AFTER WATERMARK WITH DELAY 2s
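[Interactive timeline: event-time axis from 0s to 30s in 5-second steps; status: Computing; Watermark: 0s]

Legend: N is the sequence number when the event was received. Events are marked as Normal Event, Out-of-Order Event, or Late Event.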
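The behavior of the query above can also be approximated offline. The following is a minimal sketch under stated assumptions, not the engine's actual implementation. It assumes that the watermark equals the largest event time seen minus the 2-second delay, that a window closes once the watermark reaches its end, and that events belonging to an already closed window are dropped. The function name `simulate` and the constants `WINDOW` and `DELAY` are hypothetical.

```python
from collections import defaultdict

WINDOW = 5  # window size in seconds, mirroring tumble(stream, 5s)
DELAY = 2   # watermark delay in seconds, mirroring WITH DELAY 2s


def simulate(events):
    """Simulate a tumbling-window aggregation driven by a watermark.

    events: iterable of (event_time_seconds, value) in arrival order.
    Returns (emitted, dropped), where emitted holds one
    (window_start, count, sum, avg) tuple per closed window and
    dropped holds the late events that were discarded.
    """
    open_windows = defaultdict(list)  # window_start -> buffered values
    emitted, dropped = [], []
    max_event_time = None
    watermark = float("-inf")

    for event_time, v in events:
        window_start = (event_time // WINDOW) * WINDOW
        if window_start + WINDOW <= watermark:
            # The window was already closed: treat the event as late.
            dropped.append((event_time, v))
            continue
        open_windows[window_start].append(v)

        # Only a new maximum event time advances the watermark.
        if max_event_time is None or event_time > max_event_time:
            max_event_time = event_time
            watermark = max_event_time - DELAY

        # Close and emit every window whose end the watermark has reached.
        for start in sorted(open_windows):
            if start + WINDOW <= watermark:
                vals = open_windows.pop(start)
                emitted.append((start, len(vals), sum(vals), sum(vals) / len(vals)))

    return emitted, dropped
```

For example, an event with time 4 that arrives after an event with time 6 is out of order but still counted, because the watermark is only 4 and the window from 0s to 5s is still open. Once an event with time 7 arrives, the watermark reaches 5, that window is emitted, and a subsequent event with time 3 is dropped as late.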