PostgreSQL External Table
Timeplus can read or write PostgreSQL tables directly. This unlocks a set of new use cases, such as
- Use Timeplus to efficiently process real-time data in Kafka/Redpanda, apply flat transformations or stateful aggregations, then write the data to a local or remote PostgreSQL for further analysis or visualization.
- Enrich live data with static or slowly-changing data in PostgreSQL by applying a streaming JOIN.
- Use Timeplus to query historical or recent data in PostgreSQL.
This integration is done by introducing "External Table" in Timeplus. Similar to External Stream, no data is persisted in Timeplus. However, since the data in PostgreSQL is in the form of a table, not a data stream, we call this an External Table. Currently, we support S3, MySQL, PostgreSQL and ClickHouse. On the roadmap, we will support more integrations by introducing other types of External Table.
CREATE EXTERNAL TABLE
Syntax
CREATE EXTERNAL TABLE name
SETTINGS type='postgresql',
address='host:port',
user='..',
password='..',
database='..',
secure=true|false,
config_file='..',
table='..';
The required settings are type and address. For the other settings, the default values are:
- user: 'default'
- password: '' (empty string)
- database: 'default'
- secure: false
- table: if omitted, the name of the external table is used
The config_file setting is available since Timeplus Enterprise 2.7. You can specify the path to a file that contains the configuration settings. The file should be in the format of key=value pairs, one pair per line. You can set the PostgreSQL user and password in the file. Please follow the example in Kafka External Stream.
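As a sketch of how this could look (the file path below is hypothetical, and the credentials match the local Docker example later in this page), the config file would contain one key=value pair per line:

```
user=postgres
password=foo
```

It can then be referenced when creating the external table, instead of putting the credentials inline:

```sql
CREATE EXTERNAL TABLE pg_with_config
SETTINGS type='postgresql',
         address='localhost:5432',
         database='postgres',
         config_file='/path/to/pg.credentials',
         table='dim_products';
```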
You don't need to specify the columns, since the table schema will be fetched from the PostgreSQL server.
Once the external table is created successfully, you can run the following SQL to list the columns:
DESCRIBE name
The data types in the output will be Timeplus data types, such as uint8, instead of PostgreSQL data types. Timeplus maintains a mapping for those types. Learn more.
You can define the external table and use it to read data from the PostgreSQL table, or write to it.
Connect to a local PostgreSQL
You can use the following command to start a local PostgreSQL via Docker:
docker run --name=postgres --rm --env=POSTGRES_PASSWORD=foo -p 5432:5432 postgres:latest -c log_statement=all
Then open a new terminal and run the following command to connect to the PostgreSQL server:
psql -p 5432 -U postgres -h localhost
Create a table and add some rows:
-- Table Definition
CREATE TABLE "public"."dim_products" (
"product_id" varchar NOT NULL,
"price" float8,
PRIMARY KEY ("product_id")
);
INSERT INTO "public"."dim_products" ("product_id", "price") VALUES ('1', '10.99'), ('2', '19.99'), ('3', '29.99');
In Timeplus, you can create an external table to read data from the PostgreSQL table:
CREATE EXTERNAL TABLE pg_local
SETTINGS type='postgresql',
address='localhost:5432',
database='postgres',
user='postgres',
password='foo',
table='dim_products';
Then query the table:
SELECT * FROM pg_local;
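To illustrate the enrichment use case mentioned at the top of this page, here is a sketch of a streaming JOIN against the external table. The `orders` stream and its `quantity` column are assumptions for illustration; only `pg_local` and its columns come from the example above:

```sql
-- enrich a live orders stream with product prices stored in PostgreSQL;
-- 'orders' is a hypothetical stream with product_id and quantity columns
SELECT o.product_id, o.quantity * p.price AS amount
FROM orders AS o
JOIN pg_local AS p ON o.product_id = p.product_id;
```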
Connect to Aiven for PostgreSQL
Example SQL to connect to Aiven for PostgreSQL:
CREATE EXTERNAL TABLE postgres_aiven
SETTINGS type='postgresql',
address='abc.aivencloud.com:28851',
user='avnadmin',
password='..',
database='defaultdb',
secure=true,
table='events';
Read data from PostgreSQL
Once the external table is created successfully, it means Timeplus can connect to the PostgreSQL server and fetch the table schema.
You can query it via a regular SELECT .. FROM table_name.
Please note, in the current implementation, all rows (with the selected columns) are fetched from PostgreSQL to Timeplus. Timeplus then applies the SQL functions and LIMIT n locally. It's not recommended to run SELECT * on a large PostgreSQL table.
Also note, use Timeplus function names when you query the external table, such as to_int, instead of PostgreSQL's naming convention, e.g. CONVERT. In the current implementation, the SQL functions are applied in the Timeplus engine. We plan to support push-down of some functions to PostgreSQL in future versions.
Limitations:
- tumble/hop/session/table functions are not supported for External Table (coming soon)
- scalar or aggregation functions are performed by Timeplus, not the remote PostgreSQL
- LIMIT n is performed by Timeplus, not the remote PostgreSQL
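Given these limitations, one practical way to reduce the amount of data transferred is to select only the columns you need. A small sketch against the dim_products table created earlier:

```sql
-- both listed columns are fetched from PostgreSQL;
-- the WHERE filter is then evaluated by Timeplus, not pushed down
SELECT product_id, price FROM pg_local WHERE price > 15;
```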
Write data to PostgreSQL
You can run a regular INSERT INTO to add data to the PostgreSQL table, such as:
INSERT INTO pg_local (product_id, price) VALUES ('10', 90.99), ('20', 199.99);
Please note, since the price column is of float8 type, in Timeplus you need to insert the number 90.99, instead of a string "90.99" as in a PostgreSQL INSERT command.
However, it's more common to use a Materialized View in Timeplus to send streaming SQL results to PostgreSQL.
Say you have created an external table pg_table. You can create a materialized view to read Kafka data (via an external stream), transform/aggregate the data, and send it to the external table:
-- setup the ETL pipeline via a materialized view
CREATE MATERIALIZED VIEW mv INTO pg_table AS
SELECT now64() AS _tp_time,
raw:requestedUrl AS url,
raw:method AS method,
lower(hex(md5(raw:ipAddress))) AS ip
FROM kafka_events;
Batching Settings
In Timeplus Enterprise, additional performance tuning settings are available, such as
INSERT INTO pg_table
SELECT * FROM some_source_stream
SETTINGS max_insert_block_size=10, max_insert_block_bytes=1024, insert_block_timeout_ms = 100;
- max_insert_block_size – the maximum block size for insertion, i.e. the maximum number of rows in a batch. Default value: 65409.
- max_insert_block_bytes – the maximum size in bytes of a block for insertion. Default value: 1 MiB.
- insert_block_timeout_ms – the maximum time in milliseconds for constructing a block for insertion. Increasing the value gives a greater chance of creating bigger blocks (limited by max_insert_block_bytes and max_insert_block_size), but also increases latency. Negative numbers mean no timeout. Default value: 500.
Supported data types
All PostgreSQL data types are supported in the external table. While reading or writing data, Timeplus applies a data type mapping, such as converting Timeplus' uint8 to PostgreSQL's SMALLINT. If you find anything wrong with the data types, please let us know.