Timeplus Proton
How to install Proton
Proton can be installed as a single binary on Linux or Mac, via:

```bash
curl https://install.timeplus.com/oss | sh
```
Once the `proton` binary is available, you can run Timeplus Proton in different modes:

- Local Mode. Run `proton local` to start it for fast processing on local and remote files using SQL, without having to install a full server.
- Config-less Mode. Run `proton server` to start the server and put the config/logs/data in a `proton-data` folder in the current directory. Then run `proton client` in another terminal to start the SQL client.
- Server Mode. Run `sudo proton install` to install the server in a predefined path with a default configuration file. Then run `sudo proton server -C /etc/proton-server/config.yaml` to start the server and `proton client` in another terminal to start the SQL client.
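For example, Local Mode lets you query a file ad hoc without a running server. A minimal sketch, assuming a ClickHouse-style `-q` flag and a hypothetical CSV file path:

```bash
# Count rows in a local CSV file without installing or starting a server.
proton local -q "SELECT count() FROM file('data/my.csv', 'CSV')"
```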
For Mac users, you can also use Homebrew to manage the install/upgrade/uninstall:

```bash
brew tap timeplus-io/timeplus
brew install proton
```
You can also install Proton in Docker, Docker Compose or Kubernetes.

```bash
docker run -d --pull always -p 8123:8123 -p 8463:8463 --name proton d.timeplus.com/timeplus-io/proton:latest
```
Please check Server Ports to determine which ports to expose, so that other tools such as DBeaver can connect to Timeplus.
The Docker Compose stack demonstrates how to read/write data in Kafka/Redpanda with external streams.
You can also try Proton in the fully-managed Timeplus Cloud.
Running single-node Proton via Kubernetes is possible. For on-prem deployments, we recommend contacting us to deploy Timeplus Enterprise.
How to read/write Kafka or Redpanda
You use External Stream to read from Kafka topics or write data to them. We verified the integration with Apache Kafka, Confluent Cloud, Confluent Platform, Redpanda, WarpStream and many more.
```sql
CREATE EXTERNAL STREAM [IF NOT EXISTS] stream_name
    (<col_name1> <col_type>)
SETTINGS type='kafka', brokers='ip:9092', topic='..', security_protocol='..',
         username='..', password='..', sasl_mechanism='..'
```
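For example, here is a minimal sketch that reads JSON messages from a hypothetical `page_views` topic on a local broker (the broker address, topic name, and columns are assumptions for illustration):

```sql
-- External stream over a Kafka/Redpanda topic of JSON messages.
CREATE EXTERNAL STREAM IF NOT EXISTS page_views
(
  `viewer_id` string,
  `url` string
)
SETTINGS type='kafka', brokers='localhost:9092', topic='page_views', data_format='JSONEachRow';

-- Read from the topic with streaming SQL:
SELECT * FROM page_views;

-- Write to the topic:
INSERT INTO page_views (viewer_id, url) VALUES ('u1', 'https://example.com');
```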
How to load data from PostgreSQL/MySQL/ClickHouse
For PostgreSQL, MySQL or other OLTP databases, you can apply CDC (Change Data Capture) technology to load realtime changes into Proton via Debezium and Kafka/Redpanda. An example configuration is available in the cdc folder of the proton repo. This blog shows the Timeplus Cloud UI, but the same approach applies to Proton too.
If you have data in local ClickHouse or ClickHouse Cloud, you can also use External Table to read data.
How to read/write ClickHouse
You use External Table to read from ClickHouse tables or write data to them. We verified the integration with self-hosted ClickHouse, ClickHouse Cloud, Aiven for ClickHouse and many more.
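A minimal sketch of an External Table (the address, credentials, table name, and columns are assumptions for illustration):

```sql
-- External table pointing to a ClickHouse table; adjust address/database/table to your setup.
CREATE EXTERNAL TABLE ch_events
SETTINGS type='clickhouse', address='localhost:9000', user='default', password='',
         database='default', table='events';

-- Read from ClickHouse:
SELECT * FROM ch_events;

-- Write to ClickHouse (assuming the hypothetical table has these columns):
INSERT INTO ch_events (id, value) VALUES (1, 'hello');
```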
How to handle UPSERT or DELETE
By default, streams in Timeplus are append-only. But you can create a stream in `versioned_kv` or `changelog_kv` mode to support data mutation or deletion. A Versioned Stream supports UPSERT (Update or Insert), and a Changelog Stream supports both UPSERT and DELETE.
You can use tools like Debezium to send CDC messages to Timeplus, or just use `INSERT` SQL to add data. Values with the same primary key(s) will be overwritten.
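A minimal sketch of a Versioned Stream (the stream and column names are illustrative):

```sql
-- Keep only the latest value per primary key.
CREATE STREAM product_prices
(
  `product_id` string,
  `price` float64
)
PRIMARY KEY product_id
SETTINGS mode='versioned_kv';

INSERT INTO product_prices (product_id, price) VALUES ('p1', 9.99);
-- Re-inserting the same primary key overwrites the previous value (UPSERT semantics):
INSERT INTO product_prices (product_id, price) VALUES ('p1', 10.99);
```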
How to work with JSON
Proton supports powerful, yet easy-to-use JSON processing. You can save the entire JSON document in a `raw` column of `string` type, then use the JSON path as a shortcut to access those values as strings, for example `raw:a.b.c`. If your data is an int/float/bool or another type, you can use `::` to cast it, for example `raw:a.b.c::int`. If you want to read JSON documents in Kafka topics, you can choose to read each JSON document as a raw string, or read each top-level key/value pair as a column. Please check the doc for details.
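A minimal sketch of the JSON path shortcuts (the stream name and document shape are assumptions for illustration):

```sql
CREATE STREAM json_events (`raw` string);

INSERT INTO json_events (raw) VALUES ('{"a": {"b": {"c": 42}}, "name": "test"}');

-- table() runs a bounded query over historical data instead of an unbounded streaming query.
SELECT raw:name, raw:a.b.c::int AS c FROM table(json_events);
```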
How to load CSV files
If you only need to load a single CSV file, you can create a stream, then use the `INSERT INTO .. SELECT .. FROM file(..)` syntax. For example, if there are 3 fields in the CSV file: timestamp, price, volume, you can create the stream via
```sql
CREATE STREAM stream
(
  `timestamp` datetime64(3),
  `price` float64,
  `volume` float64
)
SETTINGS event_time_column = 'timestamp';
```
Please note there will be a 4th column in the stream, `_tp_time`, as the Event Time.
To import the CSV content, use the `file` table function to specify the file path, format, and column data types:
```sql
INSERT INTO stream (timestamp, price, volume)
SELECT timestamp, price, volume
FROM file('data/my.csv', 'CSV', 'timestamp datetime64(3), price float64, volume float64')
SETTINGS max_insert_threads=8;
```
Please note:

- You need to specify the column names. Otherwise `SELECT *` will return 3 columns while there are 4 columns in the data stream.
- For security reasons, Proton only reads files under the `proton-data/user_files` folder. If you install proton via the `proton install` command on Linux servers, the folder will be `/var/lib/proton/user_files`. If you don't install proton and run the proton binary directly via `proton server start`, the folder will be `proton-data/user_files`.
- We recommend `max_insert_threads=8` to use multiple threads and maximize ingestion performance. If your file system has high IOPS, you can create the stream with `SETTINGS shards=3` and set a higher `max_insert_threads` value in the `INSERT` statement, as shown in the sketch after this list.
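A minimal sketch of that sharded setup (the stream name, shard count, and thread count are illustrative):

```sql
-- Spread ingestion across 3 shards; useful when the file system has high IOPS.
CREATE STREAM trades
(
  `timestamp` datetime64(3),
  `price` float64,
  `volume` float64
)
SETTINGS event_time_column = 'timestamp', shards = 3;

INSERT INTO trades (timestamp, price, volume)
SELECT timestamp, price, volume
FROM file('data/my.csv', 'CSV', 'timestamp datetime64(3), price float64, volume float64')
SETTINGS max_insert_threads = 16;
```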
If you need to import multiple CSV files to a single stream, you can do something similar. You can even add one more column to track the file path.
```sql
CREATE STREAM kraken_all
(
  `path` string,
  `timestamp` datetime64(3),
  `price` float64,
  `volume` float64,
  `_tp_time` datetime64(3, 'UTC') DEFAULT timestamp CODEC(DoubleDelta, LZ4),
  INDEX _tp_time_index _tp_time TYPE minmax GRANULARITY 2
)
ENGINE = Stream(1, 1, rand())
PARTITION BY to_YYYYMM(_tp_time)
ORDER BY to_start_of_hour(_tp_time)
SETTINGS event_time_column = 'timestamp', index_granularity = 8192;
```
```sql
INSERT INTO kraken_all (path, timestamp, price, volume)
SELECT _path, timestamp, price, volume
FROM file('data/*.csv', 'CSV', 'timestamp datetime64(3), price float64, volume float64')
SETTINGS max_insert_threads=8;
```
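To verify the import, you could run a bounded query over the historical data (a sketch; `table()` avoids an unbounded streaming scan):

```sql
-- Count imported rows per source file.
SELECT path, count() AS rows
FROM table(kraken_all)
GROUP BY path;
```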
How to visualize Timeplus query results with Grafana or Metabase
The official Grafana plugin for Timeplus is available here. The source code is at https://github.com/timeplus-io/proton-grafana-source. With the plugin you can run streaming SQL and build live charts in Grafana, without having to refresh the dashboard. Check out here for a sample setup.
We also provide a plugin for Metabase: https://github.com/timeplus-io/metabase-proton-driver, which is based on the Proton JDBC driver.
How to access Timeplus Proton programmatically
SQL is the main interface to work with Proton. The Ingest REST API allows you to push realtime data to Proton with any language.
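A minimal sketch of the Ingest REST API (a hedged example: it assumes the default HTTP port 3218 and an existing stream named `foo` with columns `a` and `b`):

```bash
# Push two rows into the stream "foo" over HTTP.
curl -X POST http://localhost:3218/proton/v1/ingest/streams/foo \
  -H "Content-Type: application/json" \
  -d '{"columns": ["a", "b"], "data": [[1, "hello"], [2, "world"]]}'
```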
The following drivers are available:
- https://github.com/timeplus-io/proton-java-driver JDBC and other Java clients
- https://github.com/timeplus-io/proton-go-driver for Golang
- https://github.com/timeplus-io/proton-python-driver for Python