Query Settings
Timeplus supports some advanced SETTINGS
to fine tune the streaming query processing behaviors, listed below:
query_mode
query_mode=<table|streaming>
. By default, if it's omitted, it's streaming
. A general setting which decides if the overall query is streaming data processing or historical data processing. This can be overwritten in the port. If you use 3128, default is streaming. If you use 8123, default is historical.
seek_to
seek_to=<timestamp|earliest|latest>
. By default, if it's omitted, it's latest
. A setting which tells Timeplus to seek old data in the streaming storage by timestamp. It can be a relative timestamp or an absolute timestamp. By default, it is latest
, which tells Timeplus to not seek old data. Example:seek_to='2022-01-12 06:00:00.000'
, seek_to='-2h'
, or seek_to='earliest'
Please note, as of Jan 2023, we no longer recommend you use SETTINGS seek_to=..
(except for External Stream). Please use WHERE _tp_time>='2023-01-01'
or similar. _tp_time
is the special timestamp column in each raw stream to represent the event time. You can use >
, <
, BETWEEN .. AND
operations to filter the data in Timeplus storage. The only exception is External Stream. If you need to scan all existing data in the Kafka topic, you need to run the SQL with seek_to, e.g. select raw from my_ext_stream settings seek_to='earliest'
enable_backfill_from_historical_store
enable_backfill_from_historical_store=0|1
. By default, if it's omitted, it's 1
.
- When it's 0, the query engine either loads data from streaming storage, or from historical storage.
- When it's 1, the query engine evaluates whether it's necessary to load data from historical storage(such as the time range is outside of the streaming storage), or it'll be more efficient to get data from historical storage(for example, count/min/max is pre-computed in historical storage, faster than scanning data in streaming storage).
force_backfill_in_order
force_backfill_in_order=0|1
. By default, if it's omitted, it's 0
.
- When it's 0, the data from the historical storage are turned without extra sorting. This would improve the performance.
- When it's 1, the data from the historical storage are turned with extra sorting. This would decrease the performance. So turn on this flag carefully.
emit_during_backfill
emit_during_backfill=0|1
. By default, if it's omitted, it's 0
.
- When it's 0, the query engine won't emit intermediate aggregation results during the historical data backfill.
- When it's 1, the query engine will emit intermediate aggregation results during the historical data backfill. This will ignore the
force_backfill_in_order
setting. As long as there are aggregation functions and time window functions(e.g. tumble/hop/session) in the streaming SQL, when theemit_during_backfill
is on,force_backfill_in_order
will be applied to 1 automatically.
recovery_policy
recovery_policy=<strict|best_effort>
. By default, if it's omitted, it's strict
. The main use case for materialized views, if new events fail to process, such as converting a string to an int32, the default behavior will make the materialized view unusable. You may monitor the Timeplus logs to act on the dirty data. However, if you set SETTINGS recovery_policy=best_effort
, then Timeplus will attempt to recover from checkpoint and try up to 3 times, then skip dirty data and continue processing the rest of the data.
replay_speed
replay_speed=N
When replay_speed set to 1, it will use the replay_time_column
(_tp_append_time
as the default) to replay the historical data. When replay_speed is not set or set to 0, historical data will be replayed as fast as possible. When replay_speed set to 0 to 1, it will replay slower. If it's greater than 1, it will replay faster.
e.g.
select * from test_stream where _tp_time > earliest_timestamp()
settings replay_speed=1, replay_time_column='time_col'
replay_time_column
replay_time_column=columnName
Specify the replay time column, default column is _tp_append_time
.
default_hash_table
Default value is memory
. To optimize memory usage for large data streams, you can set default_hash_table='hybrid'
. This will use both memory and disk to store the hash table. Available since Timeplus Enterprise 2.6.0.
default_hash_join
Default value is memory
. To optimize memory usage for joining large data streams, you can set default_hash_join='hybrid'
. This will use both memory and disk to store the hash table. Available since Timeplus Enterprise 2.6.0.
max_threads
Type: uint64
Default: 0, which means the number of threads for request execution is equal to the number of CPU cores.
enable_idempotent_processing
Type: bool
Default: false
Whether to enable idempotent processing for the query. If set to true, the query will be processed idempotently. If set to 0, the query will be processed non-idempotently. Learn more.