Tumble/Hop/Session Windows
Tumble
Tumble slices the unbounded data into separate windows according to its parameters. Internally, Timeplus observes the streaming data and automatically decides when to close a sliced window and emit the final results for that window.
SELECT <column_name1>, <column_name2>, <aggr_function>
FROM tumble(<stream_name>, [<timestamp_column>], <tumble_window_size>, [<time_zone>])
[WHERE clause]
GROUP BY [window_start | window_end], ...
EMIT <window_emit_policy>
SETTINGS <key1>=<value1>, <key2>=<value2>, ...
A tumble window is a fixed-size, non-overlapping time window. Here is an example of 5-second tumble windows:
["2020-01-01 00:00:00", "2020-01-01 00:00:05")
["2020-01-01 00:00:05", "2020-01-01 00:00:10")
["2020-01-01 00:00:10", "2020-01-01 00:00:15")
...
A tumble window in Timeplus is left-closed and right-open, [), meaning it includes all events whose timestamps are greater than or equal to the lower bound of the window but less than the upper bound of the window.
tumble in the above SQL spec is a table function whose core responsibility is to assign a tumble window to each event in a streaming way. The tumble table function generates 2 new columns, window_start and window_end, which correspond to the low and high bounds of a tumble window.
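The window assignment described above can be modeled in a few lines of Python. This is only a sketch of the semantics, not Timeplus code: the function name tumble_window is hypothetical, and it assumes windows are aligned to multiples of the window size counted from the Unix epoch, which is consistent with the 5-second example shown earlier.

```python
from datetime import datetime, timedelta, timezone
from typing import Tuple


def tumble_window(ts: datetime, size: timedelta) -> Tuple[datetime, datetime]:
    """Return (window_start, window_end) for the tumble window containing ts.

    Assumption: windows are aligned to multiples of `size` from the Unix epoch.
    The window is left-closed and right-open: window_start <= ts < window_end.
    """
    epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
    # Floor the elapsed time to a whole number of windows.
    index = (ts - epoch) // size
    start = epoch + index * size
    return start, start + size


# An event exactly on a boundary belongs to the window that starts there.
event_time = datetime(2020, 1, 1, 0, 0, 5, tzinfo=timezone.utc)
start, end = tumble_window(event_time, timedelta(seconds=5))
print(start.time(), end.time())  # 00:00:05 00:00:10
```

Because the lower bound is inclusive, the event at 00:00:05 lands in the second window rather than the first.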
The tumble table function accepts 4 parameters: <timestamp_column> and <time_zone> are optional; the others are mandatory.
When the <timestamp_column> parameter is omitted from the query, the stream's default event timestamp column, _tp_time, will be used.
When the <time_zone> parameter is omitted, the system's default timezone will be used. <time_zone> is a string type parameter, for example UTC.
<tumble_window_size> is an interval parameter of the form <n><UNIT>, where <UNIT> supports s, m, h, d, w. It doesn't yet support M, q, y. For example, tumble(my_stream, 5s).
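To make the <n><UNIT> format concrete, here is a small Python sketch that converts such an interval into a duration. It is an illustration only: parse_window_size is a hypothetical helper, not a Timeplus API, and the unit meanings (seconds, minutes, hours, days, weeks) are assumed from the letters listed above. Units are case-sensitive, so m is accepted while M is rejected.

```python
import re
from datetime import timedelta

# Supported units as listed in the text; meanings are assumed from the letters.
_UNITS = {"s": "seconds", "m": "minutes", "h": "hours", "d": "days", "w": "weeks"}


def parse_window_size(text: str) -> timedelta:
    """Parse an interval such as '5s' or '2w' into a timedelta."""
    match = re.fullmatch(r"(\d+)([A-Za-z])", text)
    if match is None:
        raise ValueError(f"expected <n><UNIT>, got {text!r}")
    count, unit = int(match.group(1)), match.group(2)
    if unit not in _UNITS:
        # M, q and y are not supported according to the text above.
        raise ValueError(f"unsupported unit {unit!r}")
    return timedelta(**{_UNITS[unit]: count})


print(parse_window_size("5s"))  # 0:00:05
```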
More concrete examples:
SELECT device, max(cpu_usage)
FROM tumble(device_utils, 5s)
GROUP BY device, window_end
The above example SQL continuously aggregates max cpu usage per device per tumble window for the stream device_utils. Every time a window is closed, Timeplus Proton emits the aggregation results.
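What this query computes can be approximated offline with the Python sketch below. It is a batch model under stated assumptions, not how Timeplus executes the query: the function name tumble_max is hypothetical, timestamps are plain epoch seconds, windows are assumed to be aligned to multiples of the window size, and all results are returned at once instead of being emitted as each window closes.

```python
from typing import Dict, Iterable, Tuple


def tumble_max(
    events: Iterable[Tuple[int, str, float]], size_s: int
) -> Dict[Tuple[str, int], float]:
    """Max cpu_usage per (device, window_end).

    events: (epoch_seconds, device, cpu_usage) tuples.
    """
    result: Dict[Tuple[str, int], float] = {}
    for ts, device, cpu in events:
        # Left-closed, right-open window: the end is the next multiple of size_s.
        window_end = (ts // size_s) * size_s + size_s
        key = (device, window_end)
        result[key] = max(cpu, result.get(key, cpu))
    return result


events = [(0, "a", 10.0), (3, "a", 30.0), (5, "a", 20.0), (4, "b", 5.0)]
print(tumble_max(events, 5))
# {('a', 5): 30.0, ('a', 10): 20.0, ('b', 5): 5.0}
```

The event at second 5 starts a new window for device a, which is why it does not compete with the 30.0 reading from the first window.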
Let's change tumble(stream, 5s) to tumble(stream, timestamp, 5s):
SELECT device, max(cpu_usage)
FROM tumble(devices, timestamp, 5s)
GROUP BY device, window_end
EMIT AFTER WINDOW CLOSE WITH DELAY 2s;
This is the same tumble window aggregation as above, except that the query names a specific time column, timestamp, for tumble windowing and uses EMIT AFTER WINDOW CLOSE WITH DELAY 2s.
The example below shows so-called processing-time processing, which uses wall-clock time to assign windows. Timeplus internally processes now/now64 in a streaming way.
SELECT device, max(cpu_usage)
FROM tumble(devices, now64(3, 'UTC'), 5s)
GROUP BY device, window_end
EMIT AFTER WINDOW CLOSE WITH DELAY 2s;
Hop
Like Tumble, Hop also slices the unbounded streaming data into smaller windows, and it has an additional sliding step.
SELECT <column_name1>, <column_name2>, <aggr_function>
FROM hop(<stream_name>, [<timestamp_column>], <hop_slide_size>, [<hop_window_size>], [<time_zone>])
[WHERE clause]
GROUP BY [window_start | window_end], ...
EMIT <window_emit_policy>
SETTINGS <key1>=<value1>, <key2>=<value2>, ...
A hop window is a generalization of a tumble window. It takes an additional parameter, <hop_slide_size>, which is how far the window advances at each step. There are 3 cases:
<hop_slide_size> is less than <hop_window_size>. Hop windows overlap, meaning an event can fall into several hop windows.
<hop_slide_size> is equal to <hop_window_size>. The hop window degenerates to a tumble window.
<hop_slide_size> is greater than <hop_window_size>. Windows have gaps between them. This is usually not useful, hence not supported so far.
Please note, at this point, you need to use the same time unit in <hop_slide_size> and <hop_window_size>, for example hop(device_utils, 1s, 60s) instead of hop(device_utils, 1s, 1m).
Here is a hop window example with a 2-second slide and a 5-second window:
["2020-01-01 00:00:00", "2020-01-01 00:00:05")
["2020-01-01 00:00:02", "2020-01-01 00:00:07")
["2020-01-01 00:00:04", "2020-01-01 00:00:09")
["2020-01-01 00:00:06", "2020-01-01 00:00:11")
...
Apart from allowing overlaps, hop windows have the same semantics as tumble windows.
SELECT device, max(cpu_usage)
FROM hop(device_utils, 2s, 5s)
GROUP BY device, window_end
EMIT AFTER WINDOW CLOSE;
The above example SQL continuously aggregates max cpu usage per device per hop window for the stream device_utils. Every time a window is closed, Timeplus emits the aggregation results.
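The way a single event can fall into several hop windows can be modeled with the Python sketch below. It is an illustration under assumptions rather than the Timeplus implementation: hop_windows is a hypothetical name, timestamps are plain integer seconds, and window starts are assumed to be multiples of the slide size, which matches the 2-second slide, 5-second window example earlier in this section.

```python
from typing import List, Tuple


def hop_windows(ts: int, slide: int, size: int) -> List[Tuple[int, int]]:
    """Return every [start, end) hop window that contains ts, oldest first."""
    if slide <= 0 or size <= 0:
        raise ValueError("slide and size must be positive")
    if slide > size:
        # Gapped windows are described as unsupported in the text.
        raise ValueError("slide greater than window size is not supported")
    windows = []
    # Latest window start at or before ts, aligned to the slide.
    start = (ts // slide) * slide
    # Walk backwards while the window still covers ts (start + size > ts).
    while start > ts - size:
        windows.append((start, start + size))
        start -= slide
    return windows[::-1]


print(hop_windows(4, 2, 5))  # [(0, 5), (2, 7), (4, 9)]
print(hop_windows(7, 5, 5))  # [(5, 10)]
```

With slide equal to size, each event maps to exactly one window, which is the tumble case described above.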
Session
Session windows are similar to tumble and hop windows. Please check the session function.