Skip to main content

Insert Idempotency

Since Timeplus Enterprise v2.4 there are new settings idempotent_id and enable_idempotent_processing:

  • INSERT INTO .. SETTINGS idempotent_id='..' VALUES ..
  • SELECT .. FROM .. SETTINGS enable_idempotent_processing=true

These settings allow you to define a unique ID for each batch INSERT. Sending the data with the same value of idempotent_id won't result in duplicated data in the target stream. You can retry safely with those settings.

Here is an example.

Create a stream for testing

Let's create a stream with 2 columns:

CREATE STREAM test_stream(`i` int,  `v` string)

This is an append-only stream. You can insert duplicated data, e.g.

INSERT INTO test43_stream (i, v) VALUES (1, 'a') (1, 'a')

Insert with idempotent_id

Recreate the stream if you have inserted any data. You can run the following SQL multiple times:

INSERT INTO test_stream (i, v) SETTINGS idempotent_id = 'batch1' VALUES (1, 'a') (2, 'b');

Then run

SELECT count() FROM table(test_stream)

You will get 2.

If you remove SETTINGS idempotent_id = 'batch1' and run the SQL again, duplicated data will be inserted.

If you push data to Timeplus via REST API, you can also set x-timeplus-idempotent-id in the HTTP header.

Streaming SQL with enable_idempotent_processing

When you insert data with idempotent_id, querying the stream with table function will retrieve the historical data of the stream, without duplication. However if you run:

SELECT count() FROM test_stream

Every time you run the INSERT SQL, no matter with idempotent_id or not, a bigger count number will be returned. To enable idempotent processing, set the flag to true in the Streaming SQL, e.g.

SELECT count() FROM test_stream SETTINGS enable_idempotent_processing=true;

With this setting, your Streaming SQL will only emit results when it accepts new data.

max_idempotent_ids

Timeplus keeps a set of idempotent IDs in the memory and on the disk. By default the maximum number of idempotent IDs is 1000 (per stream). As an advanced setting, before you start the server for the first time, you can set max_idempotent_ids to a different value in the /server/config.yaml then start the server. Each node in the cluster should set the same value of max_idempotent_ids.