Insert Idempotency
Since Timeplus Enterprise v2.4 there are new settings idempotent_id
and enable_idempotent_processing
:
- INSERT INTO .. SETTINGS idempotent_id='..' VALUES ..
- SELECT .. FROM .. SETTINGS enable_idempotent_processing=true
These settings allow you to define a unique ID for each batch INSERT. Sending the data with the same value of idempotent_id won't result in duplicated data in the target stream. You can retry safely with those settings.
Here is an example.
Create a stream for testing
Let's create a stream with 2 columns:
CREATE STREAM test_stream(`i` int, `v` string)
This is an append-only stream. You can insert duplicated data, e.g.
INSERT INTO test43_stream (i, v) VALUES (1, 'a') (1, 'a')
Insert with idempotent_id
Recreate the stream if you have inserted any data. You can run the following SQL multiple times:
INSERT INTO test_stream (i, v) SETTINGS idempotent_id = 'batch1' VALUES (1, 'a') (2, 'b');
Then run
SELECT count() FROM table(test_stream)
You will get 2.
If you remove SETTINGS idempotent_id = 'batch1'
and run the SQL again, duplicated data will be inserted.
If you push data to Timeplus via REST API, you can also set x-timeplus-idempotent-id
in the HTTP header.
Streaming SQL with enable_idempotent_processing
When you insert data with idempotent_id
, querying the stream with table
function will retrieve the historical data of the stream, without duplication. However if you run:
SELECT count() FROM test_stream
Every time you run the INSERT SQL, no matter with idempotent_id
or not, a bigger count number will be returned. To enable idempotent processing, set the flag to true in the Streaming SQL, e.g.
SELECT count() FROM test_stream SETTINGS enable_idempotent_processing=true;
With this setting, your Streaming SQL will only emit results when it accepts new data.
max_idempotent_ids
Timeplus keeps a set of idempotent IDs in the memory and on the disk. By default the maximum number of idempotent IDs is 1000 (per stream). As an advanced setting, before you start the server for the first time, you can set max_idempotent_ids
to a different value in the /server/config.yaml
then start the server. Each node in the cluster should set the same value of max_idempotent_ids
.