Proton Quickstart

Follow these compact guides to work with common Proton functionality.

How to install Proton

Proton can be installed as a single binary on Linux or macOS, via:

curl https://install.timeplus.com | sh

On macOS, you can also use Homebrew to manage the install, upgrade, and uninstall:

brew tap timeplus-io/timeplus
brew install proton

You can also install Proton via Docker, Docker Compose, or Kubernetes.

docker run -d --pull always --name proton ghcr.io/timeplus-io/proton:latest

The Docker Compose stack demonstrates how to read/write data in Kafka/Redpanda with external streams.

You can also try Proton in the fully-managed Timeplus Cloud.

How to read/write Kafka or Redpanda

Use an External Stream to read from Kafka topics or write data to them. We have verified the integration with Apache Kafka, Confluent Cloud, Confluent Platform, Redpanda, WarpStream, Upstash, and many more.

CREATE EXTERNAL STREAM [IF NOT EXISTS] stream_name (<col_name1> <col_type>)
SETTINGS type='kafka', brokers='ip:9092', topic='..', security_protocol='..',
         username='..', password='..', sasl_mechanism='..'
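
For example, here is a minimal sketch that reads raw messages from a topic on a local, unauthenticated Kafka or Redpanda broker (the stream name, broker address, and topic are illustrative assumptions):

CREATE EXTERNAL STREAM IF NOT EXISTS orders_raw
(
`raw` string
)
SETTINGS type='kafka', brokers='localhost:9092', topic='orders';

-- each Kafka message arrives as one raw string; this is a streaming (unbounded) read
SELECT raw FROM orders_raw;

The same external stream name can also be used as the target of an INSERT INTO statement to write data back to the topic.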

How to load data from PostgreSQL/MySQL/ClickHouse

For PostgreSQL, MySQL, or other OLTP databases, you can apply CDC (Change Data Capture) technology to load real-time changes into Proton via Debezium and Kafka/Redpanda. An example configuration is available in the cdc folder of the Proton repo. This blog shows the Timeplus Cloud UI, but the steps apply to Proton too.

If you have data in a local ClickHouse or in ClickHouse Cloud, you can also use an External Table to read it.

How to read/write ClickHouse

Use an External Table to read from or write to ClickHouse tables. We have verified the integration with self-hosted ClickHouse, ClickHouse Cloud, Aiven for ClickHouse, and many more.
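
A minimal sketch, assuming a ClickHouse server on localhost at its native port with an existing events table in the default database (the names, address, and columns are assumptions; the column list is derived from the ClickHouse table):

CREATE EXTERNAL TABLE ch_events
SETTINGS type='clickhouse',
         address='localhost:9000',
         database='default',
         table='events';

-- read from ClickHouse
SELECT * FROM ch_events;

-- write to ClickHouse (column names assume the table's schema)
INSERT INTO ch_events (id, value) VALUES (1, 'hello');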

How to work with JSON

Proton supports powerful yet easy-to-use JSON processing. You can save the entire JSON document as a raw column of string type, then use a JSON path as a shortcut to access those values as strings, for example raw:a.b.c. If the value is an int/float/bool or another type, you can use :: to convert it, for example raw:a.b.c::int. If you want to read JSON documents in Kafka topics, you can choose to read each JSON document as a raw string, or read each top-level key/value pair as a column. Please check the doc for details.
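
As a quick illustration (the stream name, column, and document shape are assumptions):

CREATE STREAM logs(`raw` string);

INSERT INTO logs(raw) VALUES ('{"request": {"method": "GET"}, "response": {"code": 200}}');

-- access nested values as strings with :, or cast them with ::
-- wrapping the stream in table(..) makes this a bounded historical query instead of a streaming one
SELECT raw:request.method AS method, raw:response.code::int AS code
FROM table(logs);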

How to load CSV files

If you only need to load a single CSV file, you can create a stream, then use the INSERT INTO .. SELECT .. FROM file(..) syntax. For example, if the CSV file has 3 fields (timestamp, price, volume), you can create the stream via:

CREATE STREAM stream
(
`timestamp` datetime64(3),
`price` float64,
`volume` float64
)
SETTINGS event_time_column = 'timestamp';

Please note there will be a 4th column in the stream, _tp_time, as the Event Time.

To import the CSV content, use the file table function to specify the file path, format, and column names/types.

INSERT INTO stream (timestamp,price,volume) 
SELECT timestamp,price,volume
FROM file('data/my.csv', 'CSV', 'timestamp datetime64(3), price float64, volume float64')
SETTINGS max_insert_threads=8;

Please note:

  1. You need to specify the column names; otherwise SELECT * would return 3 columns while the data stream has 4.
  2. For security reasons, Proton only reads files under the proton-data/user_files folder. If you installed Proton via the proton install command on a Linux server, the folder is /var/lib/proton/user_files. If you run the proton binary directly via proton server start, the folder is proton-data/user_files.
  3. We recommend max_insert_threads=8 to use multiple threads and maximize ingestion performance. If your file system has high IOPS, you can create the stream with SETTINGS shards=3 and set an even higher max_insert_threads value in the INSERT statement, as sketched below.
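
Here is a minimal sketch of that multi-shard variant, reusing the 3-column schema from above (the stream name and thread count are illustrative assumptions):

CREATE STREAM trades_sharded
(
`timestamp` datetime64(3),
`price` float64,
`volume` float64
)
SETTINGS event_time_column = 'timestamp', shards = 3;

INSERT INTO trades_sharded (timestamp, price, volume)
SELECT timestamp, price, volume
FROM file('data/my.csv', 'CSV', 'timestamp datetime64(3), price float64, volume float64')
SETTINGS max_insert_threads = 16;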

If you need to import multiple CSV files into a single stream, you can do something similar. You can even add one more column to track the file path.

CREATE STREAM kraken_all
(
`path` string,
`timestamp` datetime64(3),
`price` float64,
`volume` float64,
`_tp_time` datetime64(3, 'UTC') DEFAULT timestamp CODEC(DoubleDelta, LZ4),
INDEX _tp_time_index _tp_time TYPE minmax GRANULARITY 2
)
ENGINE = Stream(1, 1, rand())
PARTITION BY to_YYYYMM(_tp_time)
ORDER BY to_start_of_hour(_tp_time)
SETTINGS event_time_column = 'timestamp', index_granularity = 8192;

INSERT INTO kraken_all (path,timestamp,price,volume)
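-- _path is a virtual column exposed by the file() table function; it records each source file's path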
SELECT _path,timestamp,price,volume
FROM file('data/*.csv', 'CSV', 'timestamp datetime64(3), price float64, volume float64')
SETTINGS max_insert_threads=8;

How to visualize Proton query results with Grafana or Metabase

The official Grafana plugin for Proton is available at https://grafana.com/grafana/plugins/timeplus-proton-datasource/. The source code is at https://github.com/timeplus-io/proton-grafana-source. You can run streaming SQL with the plugin and build live charts in Grafana, without having to refresh the dashboard. Check out https://github.com/timeplus-io/proton/tree/develop/examples/grafana for a sample setup.

We also provide a plugin for Metabase: https://github.com/timeplus-io/metabase-proton-driver. This is based on the Proton JDBC driver.

How to access Proton programmatically

SQL is the main interface for working with Proton. The Ingest REST API allows you to push real-time data to Proton from any language.
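
For example, here is a minimal sketch of pushing two rows with curl, assuming a stream named orders with string order_id and float amount columns (the port, endpoint path, stream name, and columns are all assumptions based on a default local install; check the Ingest REST API doc for the exact details):

curl -s -X POST http://localhost:3218/proton/v1/ingest/streams/orders \
  -H "Content-Type: application/json" \
  -d '{"columns": ["order_id", "amount"], "data": [["o-1001", 19.99], ["o-1002", 5.25]]}'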

The following drivers are available: