_tp_time (Event time)

All data with an event time

Streams are where data live and each data contains a _tp_time column as the event time. Timeplus takes this attribute as one important identity of an event.

Event time is used to identify when the event is generated, like a birthday to a human being. It can be the exact timestamp when the order is placed, when the user logins a system, when an error occurs, or when an IoT device reports its status. If there is no suitable timestamp attribute in the event, Timeplus will generate the event time based on the data ingestion time.

By default, the _tp_time column is in datetime64(3, 'UTC') type with millisecond precision. You can also create it in datetime type with second precision.

When you are about to create a new stream, please choose the right column as the event time. If no column is specified, then Timeplus will use the current timestamp as the value of _tp_time It's not recommended to rename a column as _tp_time at the query time, since it will lead to unexpected behaviour, specially for Time Travel.

Why event time is treated differently

Event time is used almost everywhere in Timeplus data processing and analysis workflow:

while doing time window based aggregations, such as tumble or hop to get the downsampled data or outlier in each time window, Timeplus will use the event time to decide whether certain event belongs to a specific window
in such time sensitive analytics, event time is also used to identify out of order events or late events, and drop them in order to get timely streaming insights.
when one data stream is joined with the other, event time is the key to collate the data, without expecting two events to happen in exactly the same millisecond.
event time also plays an important role to device how long the data will be kept in the stream

How to specify the event time

Specify during data ingestion

When you ingest data into Timeplus, you can specify an attribute in the data which best represents the event time. Even if the attribute is in String type, Timeplus will automatically convert it to a timestamp for further processing.

If you don't choose an attribute in the wizard, then Timeplus will use the ingestion time to present the event time, i.e. when Timeplus receives the data. This may work well for most static or dimensional data, such as city names with zip codes.

Specify during query

The tumble or hop window functions take an optional parameter as the event time column. By default, we will use the event time in each data. However you can also specify a different column as the event time.

Taking an example for taxi passengers. The data stream can be

car_id	user_id	trip_start	trip_end	fee
c001	u001	2022-03-01 10:00:00	2022-03-01 10:30:00	45

The data may come from a Kafka topic. When it's configured, we may set trip_end as the (default) event time. So that if we want to figure out how many passengers in each hour, we can run query like this

select count(*) from tumble(taxi_data,1h) group by window_end

This query uses trip_end , the default event time, to run the aggregation. If the passenger ends the trip on 00:01 at midnight, it will be included in the 00:00-00:59 time window.

In some cases, you as the analyst, may want to focus on how many passengers get in the taxi, instead of leaving the taxi, in each hour, then you can set trip_start as the event time for the query via tumble(taxi_data,trip_start,1h)

Full query:

select count(*) from tumble(taxi_data,trip_start,1h) group by window_end

All data with an event time​

Why event time is treated differently​

How to specify the event time​

Specify during data ingestion​

Specify during query​

All data with an event time

Why event time is treated differently

How to specify the event time

Specify during data ingestion

Specify during query