Load streaming data from Apache Kafka
As of today, Apache Kafka is the primary data source (and sink) for Timeplus. You can also create external streams to analyze data in Confluent/Kafka/Redpanda without moving data.
Apache Kafka Source
- From the left navigation menu, click Data Ingestion, then click the Add Data button in the top-right corner.
- In the pop-up, you’ll see the sources you can connect to, along with other ways to add your data. Click Apache Kafka.
- Enter the broker URL. You can also enable TLS or authentication, if needed.
- Enter the name of the Kafka topic, and specify the ‘read as’ data format. We currently support JSON, AVRO and Text formats.
- If the data in the Kafka topic is in JSON format but its schema may change over time, we recommend choosing Text. This way, the entire JSON document is saved as a string, and you can apply JSON-related functions to extract values even when the schema changes.
- If you choose AVRO, there is an option for 'Auto Extraction'. It is off by default, which means the entire message is saved as a string. If you turn it on, each top-level attribute in the AVRO message is put into its own column. This is more convenient to query, but it doesn't support schema evolution. When AVRO is selected, you also need to specify the address, API key, and secret key for the schema registry.
- In the next “Preview” step, we will show you at least one event from your specified Apache Kafka source.
- By default, your new source will create a new stream in Timeplus. Give this new stream a name and verify the column information (column names and data types). You can also set a column as the event time column. If you don’t, we will use the ingestion time as the event time. Alternatively, you can select an existing stream from the dropdown.
- After previewing your data, you can give the source a name and an optional description, and review the configuration. Once you click Finish, your streaming data will be available in the specified stream immediately.
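To illustrate why the Text option tolerates schema changes, here is a minimal Python sketch (not Timeplus code) that keeps each message as a raw JSON string and only extracts fields when they are read. The helper `extract_path` and the sample messages are hypothetical; inside Timeplus you would use its JSON functions instead.

```python
import json
from typing import Any


def extract_path(raw: str, path: str, default: Any = None) -> Any:
    """Return the value at a dot-separated path in a raw JSON string.

    Missing keys (e.g. a field added or dropped by a schema change)
    yield `default` instead of raising, so older and newer messages
    can be read side by side.
    """
    try:
        value: Any = json.loads(raw)
    except json.JSONDecodeError:
        return default
    for key in path.split("."):
        if isinstance(value, dict) and key in value:
            value = value[key]
        else:
            return default
    return value


# Hypothetical messages: the second one gained a nested "device" object.
messages = [
    '{"user": "alice", "amount": 10}',
    '{"user": "bob", "amount": 25, "device": {"os": "ios"}}',
]

for raw in messages:
    print(extract_path(raw, "user"), extract_path(raw, "device.os", "unknown"))
```

Because nothing is parsed into fixed columns at ingestion time, a new or missing attribute only changes what the extraction returns, not whether the message can be stored.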
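The event time rule from the stream setup step can be sketched as follows: if a column is designated as the event time column, its value is used; otherwise the ingestion time is. This is a hypothetical Python illustration of the rule, not Timeplus internals; the function `resolve_event_time` and the assumption that the column holds an ISO 8601 string are ours.

```python
from datetime import datetime, timezone
from typing import Optional


def resolve_event_time(
    row: dict,
    event_time_column: Optional[str],
    ingested_at: datetime,
) -> datetime:
    """Pick the event time for one row.

    Assumption: the event time column holds an ISO 8601 string.
    Without a designated column, fall back to the ingestion time.
    """
    if event_time_column is None:
        return ingested_at
    return datetime.fromisoformat(row[event_time_column])


ingested = datetime(2023, 1, 1, 12, 0, tzinfo=timezone.utc)
row = {"user": "alice", "ts": "2023-01-01T11:59:30+00:00"}

print(resolve_event_time(row, "ts", ingested))  # uses the row's own timestamp
print(resolve_event_time(row, None, ingested))  # falls back to ingestion time
```

Picking a real event time column matters when messages arrive late or out of order, since the ingestion time then no longer reflects when the event happened.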
Custom Kafka Deployment
Follow the same steps as above, and make sure Timeplus can reach your Kafka broker(s). You can use tools like ngrok to securely expose local Kafka brokers to the internet so that Timeplus Cloud can connect to them. Check out this blog for more details.
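Before configuring the source, it can help to confirm that a broker address accepts TCP connections at all. The sketch below uses only Python's standard library; `broker_reachable` is a hypothetical helper, and a success only shows the broker is reachable from the machine running the script, not from Timeplus Cloud. For a tunnel such as ngrok, pass the public host and port the tunnel gives you.

```python
import socket


def broker_reachable(address: str, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to "host:port" succeeds.

    Expects a hostname or IPv4 address followed by a port. This checks
    network reachability only; it does not speak the Kafka protocol,
    verify TLS, or test authentication.
    """
    host, _, port = address.rpartition(":")
    try:
        with socket.create_connection((host, int(port)), timeout=timeout):
            return True
    except (OSError, ValueError):
        # OSError covers refused connections, timeouts, and DNS failures;
        # ValueError covers a missing or non-numeric port.
        return False
```

For example, `broker_reachable("localhost:9092")` returns False if nothing is listening on that port. Keep in mind that Kafka clients reconnect to the addresses the broker advertises, so a reachable bootstrap address is necessary but not sufficient.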
If you maintain an IP whitelist, you'll need to add our static IP to it:
22.214.171.124 for us.timeplus.cloud
Notes for Kafka source
- Currently, we support JSON, AVRO, and Text formats for messages in Kafka topics.
- Top-level JSON attributes will be converted to stream columns. Each nested attribute will be saved as a String column, which you can later query with one of the JSON functions.
- Number and boolean values in the JSON message will be converted to the corresponding types in the stream.
- Datetime or timestamp values will be saved as String columns. You can convert them back to DateTime with the to_time function.
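The conversion rules above can be approximated in Python. In this sketch, top-level attributes become columns, numbers and booleans keep their types, nested objects or arrays are re-serialized into a string column, and everything else, including datetime text, is labeled string. The helper `to_stream_row` and its type labels are hypothetical stand-ins, not the exact Timeplus types, and `strptime` plays the role that to_time plays in Timeplus.

```python
import json
from datetime import datetime
from typing import Any, Dict, Tuple


def to_stream_row(raw: str) -> Dict[str, Tuple[str, Any]]:
    """Map one JSON message to {column: (type_label, value)}.

    Type labels are illustrative placeholders, not exact Timeplus types.
    """
    row: Dict[str, Tuple[str, Any]] = {}
    for key, value in json.loads(raw).items():
        if isinstance(value, bool):  # check bool first: bool is a subclass of int
            row[key] = ("bool", value)
        elif isinstance(value, int):
            row[key] = ("int", value)
        elif isinstance(value, float):
            row[key] = ("float", value)
        elif isinstance(value, (dict, list)):
            # Nested element kept as a JSON string for later JSON functions.
            row[key] = ("string", json.dumps(value))
        else:
            # Anything else (plain strings such as datetime text, or null)
            # gets the string label.
            row[key] = ("string", value)
    return row


row = to_stream_row(
    '{"id": 7, "price": 9.5, "ok": true,'
    ' "ts": "2023-01-01 12:00:00", "meta": {"tags": ["a"]}}'
)
# Analogous to to_time in Timeplus: parse the string column back to a datetime.
event_time = datetime.strptime(row["ts"][1], "%Y-%m-%d %H:%M:%S")
```

Keeping nested values as strings means the column set depends only on the top-level keys, which is why nested data is queried through JSON functions rather than dedicated columns.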