Kafka External Stream

Timeplus allows users to read from and write to Apache Kafka (and compatible platforms like Confluent Cloud and Redpanda) using Kafka External Streams.

By combining external streams with Materialized Views and Target Streams, users can build robust real-time streaming pipelines.

Tutorial with Docker Compose

Explore the following hands-on tutorials:

CREATE EXTERNAL STREAM

Use the following SQL command to create a Kafka external stream:

CREATE EXTERNAL STREAM [IF NOT EXISTS] <stream_name>
    (<col_name1> <col_type>)
SETTINGS
    type='kafka', -- required
    brokers='ip:9092', -- required
    topic='..', -- required
    security_protocol='..',
    sasl_mechanism='..',
    username='..',
    password='..',
    config_file='..',
    data_format='..',
    format_schema='..',
    one_message_per_row=..,
    kafka_schema_registry_url='..',
    kafka_schema_registry_credentials='..',
    ssl_ca_cert_file='..',
    ssl_ca_pem='..',
    skip_ssl_cert_check=..,
    properties='..'

Settings

type

Must be set to kafka. Compatible with:

Apache Kafka
Confluent Platform or Cloud
Redpanda
Other Kafka-compatible systems

brokers

Comma-separated list of broker addresses (host:port), e.g.:

kafka1:9092,kafka2:9092,kafka3:9092

topic

Kafka topic name to connect to.

security_protocol

The supported values for security_protocol are:

PLAINTEXT: the default value when this setting is omitted.
SASL_SSL: when this value is set, username and password should be specified.
- To use your own CA certificate file, add the setting ssl_ca_cert_file='/ssl/ca.pem'. Alternatively, put the full content of the PEM file as a string in the ssl_ca_pem setting.
- To skip SSL certificate verification, set skip_ssl_cert_check=true.

sasl_mechanism

The supported values for sasl_mechanism are:

PLAIN: the default value for sasl_mechanism when security_protocol is set to SASL_SSL.
SCRAM-SHA-256
SCRAM-SHA-512
AWS_MSK_IAM: for AWS MSK IAM role-based access, when the EC2 instance or Kubernetes pod is configured with an appropriate IAM role.

username / password

Required when sasl_mechanism is set to PLAIN, SCRAM-SHA-256 or SCRAM-SHA-512.

Alternatively, use config_file to securely pass credentials.

config_file

Path to a file containing key=value configuration lines for the Kafka external stream, e.g.:

username=my_username
password=my_password
data_format='Avro'
one_message_per_row=true

This is especially useful in Kubernetes environments with secrets managed via HashiCorp Vault.

HashiCorp Vault injection example:

annotations:
        vault.hashicorp.com/agent-inject: "true"
        vault.hashicorp.com/agent-inject-status: "update"
        vault.hashicorp.com/agent-inject-secret-kafka-secret: "secret/kafka-secret"
        vault.hashicorp.com/agent-inject-template-kafka-secret: |
          {{- with secret "secret/kafka-secret" -}}
          username={{ .Data.data.username }}
          password={{ .Data.data.password }}
          {{- end }}
        vault.hashicorp.com/role: "vault-role"

Note: settings specified in the DDL override those in config_file. Only settings from config_file that are not explicitly specified in the DDL are merged in.
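The precedence rule can be modeled with a short Python sketch: parse the key=value lines of a config_file, then let every setting given explicitly in the DDL win. The function names and the stripping of optional single quotes are assumptions for illustration, not Timeplus's actual parser.

```python
def parse_config_file(text: str) -> dict[str, str]:
    """Parse key=value lines into a dict (hypothetical model of config_file)."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        key, _, value = line.partition("=")
        # Assumption: optional single quotes, as in data_format='Avro', are stripped
        settings[key.strip()] = value.strip().strip("'")
    return settings


def merge_settings(ddl: dict[str, str], config_file_text: str) -> dict[str, str]:
    """Start from config_file, then let explicit DDL settings override it."""
    merged = parse_config_file(config_file_text)
    merged.update(ddl)  # DDL values win; config_file only fills in missing keys
    return merged


config_text = """username=my_username
password=my_password
data_format='Avro'
one_message_per_row=true
"""
ddl = {"type": "kafka", "brokers": "localhost:9092", "topic": "github_events",
       "data_format": "JSONEachRow"}

effective = merge_settings(ddl, config_text)
print(effective["data_format"], effective["username"])  # JSONEachRow my_username
```

Here data_format comes from the DDL even though config_file also sets it, while username, password and one_message_per_row are filled in from the file.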

data_format

Defines how Kafka messages are parsed and written. Supported formats are:

Format	Description
`JSONEachRow`	Parses one JSON document per line
`CSV`	Parses comma-separated values
`TSV`	Like CSV, but tab-delimited
`ProtobufSingle`	One Protobuf message per Kafka message
`Protobuf`	Multiple Protobuf messages per Kafka message
`Avro`	Avro-encoded messages
`RawBLOB`	Raw text, no parsing (default)

format_schema

Required for these data formats:

ProtobufSingle
Protobuf
Avro

one_message_per_row

Set to true to ensure each Kafka message contains exactly one row (with JSONEachRow, exactly one JSON document). This matters mainly when writing.

kafka_schema_registry_url

URL of the Kafka Schema Registry. The URL must include the protocol (http:// or https://).

kafka_schema_registry_credentials

Credentials for the registry, in username:password format.

ssl_ca_cert_file / ssl_ca_pem

Use either:

ssl_ca_cert_file='/path/to/cert.pem'
ssl_ca_pem='-----BEGIN CERTIFICATE-----\n...'

skip_ssl_cert_check

Default: false
Set to true to bypass SSL verification.

properties

Used for advanced configuration. These settings are passed directly to the underlying Kafka client as librdkafka configuration options, to fine-tune the behavior of the Kafka producer, consumer, or topic.

For more, see the Properties for Kafka client section.

Read Data from Kafka

Timeplus allows reading Kafka messages in multiple data formats, including:

Plain string (raw)
CSV / TSV
JSON
Protobuf
Avro

Read Kafka Messages as Raw String

Use this mode when:

Messages contain unstructured text or binary data
No built-in format is applicable
You want to debug raw Kafka messages

Raw String Example

CREATE EXTERNAL STREAM ext_application_logs
         (raw string)
SETTINGS type='kafka',
         brokers='localhost:9092',
         topic='application_logs'

You can then use functions such as regular expression extraction or JSON extraction to further process the raw string.

Regex Example – Parse Application Logs

SELECT 
    to_time(extract(raw, '^(\\d{4}\\.\\d{2}\\.\\d{2} \\d{2}:\\d{2}:\\d{2}\\.\\d+)')) AS timestamp, 
    extract(raw, '} <(\\w+)>') AS level,
    extract(raw, '} <\\w+> (.*)') AS message
FROM ext_application_logs;
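The patterns can be checked locally before running the query. The Python sketch below applies the same regular expressions (single backslashes, since a Python raw string needs no SQL escaping) to a made-up log line that assumes a ClickHouse-style layout. The helper extract() is hypothetical and assumes the SQL function returns the first capture group, or an empty string when nothing matches.

```python
import re


def extract(raw: str, pattern: str) -> str:
    """Return the first capture group, or '' if the pattern does not match (assumed SQL behavior)."""
    m = re.search(pattern, raw)
    return m.group(1) if m else ""


# Hypothetical sample line in the layout the regexes expect
line = "2025.01.15 10:23:45.123456 [ 1234 ] {} <Information> Application: Starting up"

timestamp = extract(line, r"^(\d{4}\.\d{2}\.\d{2} \d{2}:\d{2}:\d{2}\.\d+)")
level = extract(line, r"} <(\w+)>")
message = extract(line, r"} <\w+> (.*)")
print(timestamp, level, message, sep=" | ")
# 2025.01.15 10:23:45.123456 | Information | Application: Starting up
```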

Read JSON Kafka Message

Assume each Kafka message contains JSON text with this schema:

{
  "actor": string,
  "created_at": timestamp,
  "id": string,
  "payload": string,
  "repo": string,
  "type": string
}

You can process JSON in two ways:

Option A: Parse with JSON Extract Functions

Create a raw stream:

CREATE EXTERNAL STREAM ext_json_raw
    (raw string)
SETTINGS type='kafka',
         brokers='localhost:9092',
         topic='github_events';

Extract fields using JSON extract shortcut syntax or JSON extract functions:

SELECT 
    raw:actor AS actor,
    raw:created_at::datetime64(3, 'UTC') AS created_at,
    raw:id AS id,
    raw:payload AS payload,
    raw:repo AS repo,
    raw:type AS type
FROM ext_json_raw;

This method is the most flexible. It works best for dynamic JSON text with new or missing fields, and it can also extract nested JSON fields.
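Why per-field extraction tolerates schema drift can be seen in a small Python model: each field is looked up independently, so a missing field yields an empty value instead of an error, an extra field is ignored, and a nested object is kept as JSON text for further parsing. The empty-string default for missing fields is an assumption of this sketch; check the JSON function reference for the exact Timeplus behavior.

```python
import json

FIELDS = ["actor", "created_at", "id", "payload", "repo", "type"]


def extract_fields(raw: str) -> dict[str, str]:
    """Look up each field on its own; missing fields become '' (assumed default)."""
    doc = json.loads(raw)
    out = {}
    for name in FIELDS:
        value = doc.get(name, "")
        # Nested objects and arrays are kept as JSON text so they can be parsed further
        out[name] = json.dumps(value) if isinstance(value, (dict, list)) else str(value)
    return out


# Hypothetical message: "payload" is missing, "repo" is nested, "org" is an unexpected new field
msg = ('{"actor": "alice", "created_at": "2025-01-01T00:00:00Z", "id": "1", '
       '"repo": {"name": "timeplus"}, "type": "PushEvent", "org": "x"}')
row = extract_fields(msg)
print(row["payload"] == "", row["repo"])  # True {"name": "timeplus"}
```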

Option B: Use JSONEachRow Format

Define a Kafka external stream whose columns map to the JSON fields, and set data_format to JSONEachRow.

CREATE EXTERNAL STREAM ext_json_parsed
    (
        actor string,
        created_at datetime64(3, 'UTC'),
        id string,
        payload string,
        repo string,
        type string
    )
SETTINGS type='kafka',
         brokers='localhost:9092',
         topic='github_events',
         data_format='JSONEachRow'

When you query the ext_json_parsed stream, the JSON fields are parsed and cast to the target column types automatically.

This method is most convenient when the JSON text has a stable schema. It maps top-level JSON fields to columns.

Read CSV Kafka Messages

Similar to JSONEachRow, you can read Kafka messages in CSV format:

CREATE EXTERNAL STREAM ext_csv_parsed
    (
        actor string,
        created_at datetime64(3, 'UTC'),
        id string,
        payload string,
        repo string,
        type string
    )
SETTINGS type='kafka',
         brokers='localhost:9092',
         topic='csv_topic',
         data_format='CSV';

Read TSV Kafka Messages

Identical to CSV, but expects tab-separated values:

SETTINGS data_format='TSV';

Read Avro or Protobuf Messages

To read Avro-encoded or Protobuf-encoded Kafka messages, refer to Schema and Schema Registry for details.

Access Kafka Message Metadata

Timeplus provides virtual columns for Kafka message metadata.

Virtual Column	Description	Type
`_tp_time`	Kafka message timestamp	`datetime64(3, 'UTC')`
`_tp_message_key`	Kafka message key	`string`
`_tp_message_headers`	Kafka headers as key-value map	`map(string, string)`
`_tp_sn`	Kafka message offset	`int64`
`_tp_shard`	Kafka partition ID	`int32`

Kafka Message Metadata Examples

-- View message time and payload
SELECT _tp_time, raw FROM ext_github_events;

-- View message key
SELECT _tp_message_key, raw FROM ext_github_events;

-- Access headers
SELECT _tp_message_headers['trace_id'], raw FROM ext_github_events;

-- View message offset and partition
SELECT _tp_sn, _tp_shard, raw FROM ext_github_events;

Query Settings for Kafka External Streams

Timeplus supports several query-level settings to control how data is read from Kafka topics. These settings can be especially useful for targeting specific partitions or replaying messages from a defined point in time.

Read from Specific Kafka Partitions

By default, Timeplus reads from all partitions of a Kafka topic. You can override this by using the shards setting to specify which partitions to read from.

Read from a Single Partition

SELECT raw FROM ext_stream SETTINGS shards='0'

Read from Multiple Partitions

Separate partition IDs with commas:

SELECT raw FROM ext_stream SETTINGS shards='0,2'

Rewind via seek_to

By default, Timeplus only reads new messages published after the query starts. To read historical messages, use the seek_to setting.

Rewind to the Earliest Offset (All Partitions)

SELECT raw FROM ext_stream SETTINGS seek_to='earliest'

Rewind to Specific Offsets (Per Partition)

Offsets are specified in partition order. For example:

SELECT raw FROM ext_stream SETTINGS seek_to='5,3,11'

This seeks to:

Offset 5 in partition 0
Offset 3 in partition 1
Offset 11 in partition 2
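The positional mapping above can be modeled in a few lines of Python. This is an illustrative sketch of how a comma-separated seek_to value is interpreted, not Timeplus code; it assumes the value lists one offset per partition and does not handle 'earliest' or timestamps.

```python
def parse_seek_offsets(seek_to: str) -> dict[int, int]:
    """Map a comma-separated offset list to {partition_id: offset}, by position."""
    offsets = [int(part) for part in seek_to.split(",")]
    # The i-th offset applies to partition i
    return {partition: offset for partition, offset in enumerate(offsets)}


print(parse_seek_offsets("5,3,11"))  # {0: 5, 1: 3, 2: 11}
```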

Rewind to a Specific Timestamp (All Partitions)

You can also rewind based on a timestamp:

SELECT raw FROM ext_stream SETTINGS seek_to='2025-01-01T00:00:00.000'

Note: internally, Timeplus uses the Kafka API to convert the timestamp to the corresponding offset in each partition.
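Kafka's timestamp lookup (the ListOffsets request, exposed in the Java client as KafkaConsumer.offsetsForTimes) returns, for each partition, the earliest offset whose message timestamp is greater than or equal to the requested time. The Python sketch below models that rule over invented in-memory partitions; a real broker uses its time index instead of a linear scan, and treating the seek_to timestamp as UTC is an assumption.

```python
from datetime import datetime, timezone
from typing import List, Optional, Tuple


def offset_for_time(messages: List[Tuple[int, datetime]], target: datetime) -> Optional[int]:
    """Earliest offset whose timestamp is >= target, or None if no message qualifies."""
    for offset, ts in messages:  # scan in offset order
        if ts >= target:
            return offset
    return None


def utc(*parts: int) -> datetime:
    return datetime(*parts, tzinfo=timezone.utc)


# Hypothetical partition contents as (offset, message timestamp) pairs
partition_0 = [(0, utc(2024, 12, 31, 23, 59)), (1, utc(2025, 1, 1, 0, 0, 5)), (2, utc(2025, 1, 1, 0, 1))]
partition_1 = [(0, utc(2024, 12, 30)), (1, utc(2024, 12, 31))]

target = utc(2025, 1, 1)  # seek_to='2025-01-01T00:00:00.000', assumed to be UTC here
print(offset_for_time(partition_0, target), offset_for_time(partition_1, target))  # 1 None
```

A partition with no message at or after the timestamp (partition_1 here) has no matching offset, so reading would start at its end.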

Write Data to Kafka

Timeplus supports writing data to Kafka using various encoding formats such as strings, JSON, CSV, TSV, Avro, and Protobuf. You can write to Kafka using SQL INSERT statements, the Ingest REST API, or as the target of a Materialized View.

Write as Raw String

You can encode data as a raw string in Kafka messages:

CREATE EXTERNAL STREAM ext_github_events (raw string)
SETTINGS type='kafka',
         brokers='localhost:9092',
         topic='github_events'

You can then write data via:

INSERT INTO ext_github_events VALUES ('some string')
Ingest REST API
Materialized View

Note: internally, the data_format is RawBLOB, and one_message_per_row is true by default.

Pay attention to the kafka_max_message_size setting. When multiple rows can be written to the same Kafka message, this setting controls how much data is put into each message, ensuring no message exceeds the kafka_max_message_size limit.

Write as JSONEachRow

Encode each row as a separate JSON object (aka JSONL or jsonlines):

CREATE EXTERNAL STREAM target(
    _tp_time datetime64(3),
    url string,
    method string,
    ip string)
    SETTINGS type='kafka',
             brokers='redpanda:9092',
             topic='masked-fe-event',
             data_format='JSONEachRow',
             one_message_per_row=true;

Messages are written to the topic in this form:

{
    "_tp_time":"2023-10-29 05:36:21.957",
    "url":"https://www.nationalweb-enabled.io/methodologies/killer/web-readiness",
    "method":"POST",
    "ip":"c4ecf59a9ec27b50af9cc3bb8289e16c"
}

Note: by default, multiple JSON documents are written to the same Kafka message, one JSON document per row/line (JSONEachRow, also known as JSONL). This default aims for maximum write throughput to Kafka/Redpanda, but you need to make sure downstream applications can properly process JSON lines.

If each Kafka message must be a single valid JSON document rather than JSONL, set one_message_per_row=true, e.g.:

CREATE EXTERNAL STREAM target(_tp_time datetime64(3), url string, ip string)
SETTINGS type='kafka', brokers='redpanda:9092', topic='masked-fe-event',
         data_format='JSONEachRow',one_message_per_row=true

The default value of one_message_per_row is false for data_format='JSONEachRow' and true for data_format='RawBLOB'.
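The difference between the two modes can be illustrated by a Python sketch that turns rows into Kafka message payloads. With one_message_per_row=True every row becomes its own JSON document; otherwise rows are packed as JSON lines into one message until a size limit is reached. The max_bytes parameter stands in for kafka_max_message_size, and the packing policy is an assumption for illustration, not Timeplus's exact algorithm.

```python
import json


def encode_messages(rows: list[dict], one_message_per_row: bool, max_bytes: int = 1_000_000) -> list[bytes]:
    """Encode rows as JSONEachRow payloads (sketch; packing policy is assumed)."""
    lines = [json.dumps(row, separators=(",", ":")).encode() for row in rows]
    if one_message_per_row:
        return lines  # one valid JSON document per Kafka message

    messages, current = [], b""
    for line in lines:
        candidate = current + line + b"\n"
        # Flush the current message if adding this line would exceed the limit;
        # a single oversized line still goes into a message on its own
        if current and len(candidate) > max_bytes:
            messages.append(current)
            candidate = line + b"\n"
        current = candidate
    if current:
        messages.append(current)
    return messages


rows = [{"url": "/a", "ip": "1"}, {"url": "/b", "ip": "2"}, {"url": "/c", "ip": "3"}]
print(len(encode_messages(rows, True)), len(encode_messages(rows, False)))  # 3 1
print(len(encode_messages(rows, False, max_bytes=50)))  # 2
```

Each encoded row is 22 bytes including its newline, so a 50-byte limit fits two rows in the first message and pushes the third into a second one.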

Write as CSV

Each row is encoded as one CSV line:

CREATE EXTERNAL STREAM target(
    _tp_time datetime64(3),
    url string,
    method string,
    ip string)
    SETTINGS type='kafka',
             brokers='redpanda:9092',
             topic='masked-fe-event',
             data_format='CSV';

Messages are written to the topic in this form:

"2023-10-29 05:35:54.176","https://www.nationalwhiteboard.info/sticky/recontextualize/robust/incentivize","PUT","3eaf6372e909e033fcfc2d6a3bc04ace"

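To reproduce the line format above outside Timeplus, for example when building test fixtures for a downstream consumer, Python's csv module with every field quoted yields the same shape. Quoting every field is inferred from the sample output; how Timeplus quotes non-string columns is not covered here.

```python
import csv
import io


def encode_csv_row(values: list[str]) -> str:
    """Encode one row as a single CSV line with every field double-quoted."""
    buf = io.StringIO()
    writer = csv.writer(buf, quoting=csv.QUOTE_ALL, lineterminator="\n")
    writer.writerow(values)
    return buf.getvalue()


line = encode_csv_row([
    "2023-10-29 05:35:54.176",
    "https://www.nationalwhiteboard.info/sticky/recontextualize/robust/incentivize",
    "PUT",
    "3eaf6372e909e033fcfc2d6a3bc04ace",
])
print(line, end="")
```

Embedded double quotes are escaped by doubling them, which is the standard CSV convention.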
Write as TSV

Same as CSV, but uses tab characters as delimiters instead of commas.

Write as ProtobufSingle

To write Protobuf-encoded messages to Kafka topics, refer to the Protobuf Schema and Kafka Schema Registry pages for details.

Write as Avro

To write Avro-encoded messages to Kafka topics, refer to the Avro Schema and Kafka Schema Registry pages for details.

Write Kafka Message Metadata

_tp_message_key

To populate the Kafka message key when producing data to a Kafka topic, define a _tp_message_key column when creating the external stream.

For example:

CREATE EXTERNAL STREAM foo (
    id int32,
    name string,
    _tp_message_key string
) SETTINGS type='kafka',...;

After inserting a row into the stream like this:

INSERT INTO foo(id,name,_tp_message_key) VALUES (1, 'John', 'some-key');

The Kafka key will be 'some-key'.
The message body will be {"id": 1, "name": "John"}; the key is excluded from the message body.
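The split between key and body can be modeled as follows: the _tp_message_key column is pulled out of the row and becomes the record key, and the remaining columns are serialized as the value. This Python sketch assumes a JSON body as in the example above (the real encoding follows the stream's data_format), and to_kafka_record is a hypothetical helper name.

```python
import json


def to_kafka_record(row: dict) -> tuple:
    """Split a row into (key, value): _tp_message_key becomes the key and is dropped from the body."""
    body = {k: v for k, v in row.items() if k != "_tp_message_key"}
    key = row.get("_tp_message_key")  # None models a nullable key left at its default
    return key, json.dumps(body)


key, value = to_kafka_record({"id": 1, "name": "John", "_tp_message_key": "some-key"})
print(key, value)  # some-key {"id": 1, "name": "John"}
```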

_tp_message_key supports these types:

Numeric: uint8/16/32/64, int8/16/32/64
Others: string, bool, float32, float64, fixed_string
Nullable types are also supported:

CREATE EXTERNAL STREAM foo (
    id int32,
    name string,
    _tp_message_key nullable(string) default null
) SETTINGS type='kafka',...;

_tp_message_headers

Add Kafka headers via _tp_message_headers (map of key-value pairs):

CREATE EXTERNAL STREAM example (
    s string,
    i int,
    ...,
    _tp_message_headers map(string, string)
) settings type='kafka',...;

Then, when rows are inserted into the external stream via INSERT INTO or a Materialized View, the value of _tp_message_headers in each row is set as the headers of the corresponding Kafka message.

sharding_expr

sharding_expr is used to control how rows are distributed to Kafka partitions:

CREATE EXTERNAL STREAM foo (
    id int32,..
) SETTINGS type='kafka', sharding_expr='hash(id)'...;

When inserting rows, the partition ID will be evaluated based on the sharding_expr and Timeplus will put the message into the corresponding Kafka partition.
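A common way to turn such an expression into a partition ID, and a reasonable mental model here, is to reduce the expression's value modulo the topic's partition count. The Python sketch below uses CRC32 as a stand-in hash; that choice is an assumption and does not reproduce Timeplus's hash(), so concrete partition numbers will differ. What it does show is that rows with the same id always land in the same partition.

```python
import zlib


def choose_partition(row_id: int, num_partitions: int) -> int:
    """Model of sharding_expr='hash(id)': hash the id, then reduce modulo the partition count."""
    # CRC32 is a placeholder hash; Timeplus's hash() will produce different values
    h = zlib.crc32(str(row_id).encode("utf-8"))
    return h % num_partitions


num_partitions = 3
rows = [{"id": 7, "name": "a"}, {"id": 42, "name": "b"}, {"id": 7, "name": "c"}]
partitions = [choose_partition(row["id"], num_partitions) for row in rows]
print(partitions)  # both rows with id=7 get the same partition
```

Keying the partition on id preserves per-id ordering, because Kafka only guarantees ordering within a single partition.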

Properties for Kafka client

In advanced use cases, you may want to fine-tune the behavior of the Kafka consumer, producer, or topic when creating Kafka external streams, for example to balance latency against throughput. Timeplus allows this through the properties setting, which passes configuration options directly to the underlying librdkafka client.

These settings can control aspects like message size limits, retry behavior, timeouts, and more. For a full list of available configuration options, refer to the librdkafka configuration documentation.

Kafka Client Properties Example

CREATE EXTERNAL STREAM ext_github_events(raw string)
SETTINGS type='kafka',
         brokers='localhost:9092',
         topic='github_events',
         properties='message.max.bytes=1000000;message.timeout.ms=6000'

This example sets the maximum Kafka message size to 1MB and the message timeout to 6 seconds.
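The properties value is a semicolon-separated list of key=value pairs. A minimal Python sketch that splits it into individual librdkafka options, which could help validate a value before putting it in the DDL, is shown below; rejecting malformed pairs and ignoring empty segments are assumptions of this sketch, not documented Timeplus behavior.

```python
def parse_properties(properties: str) -> dict[str, str]:
    """Split 'k1=v1;k2=v2' into {'k1': 'v1', 'k2': 'v2'}; empty segments are ignored."""
    options = {}
    for pair in properties.split(";"):
        pair = pair.strip()
        if not pair:
            continue
        key, sep, value = pair.partition("=")
        if not sep or not key.strip():
            raise ValueError(f"malformed property: {pair!r}")
        options[key.strip()] = value.strip()
    return options


opts = parse_properties("message.max.bytes=1000000;message.timeout.ms=6000")
print(opts)  # {'message.max.bytes': '1000000', 'message.timeout.ms': '6000'}
print(int(opts["message.max.bytes"]) / 1_000_000, "MB,", int(opts["message.timeout.ms"]) / 1000, "s")  # 1.0 MB, 6.0 s
```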

Kafka Client Properties

Please note that while most librdkafka configuration properties are supported, Timeplus may restrict or ignore certain settings. The supported properties are listed below.

(C/P legend: C = Consumer, P = Producer, * = both)

Property	C/P	Range	Default	Importance	Description
client.id	*		rdkafka	low	Client identifier. Type: string
message.max.bytes	*	1000 .. 1000000000	1000000	medium	Maximum Kafka protocol request message size. Due to differing framing overhead between protocol versions the producer is unable to reliably enforce a strict max message limit at produce time and may exceed the maximum size by one message in protocol ProduceRequests, the broker will enforce the topic's `max.message.bytes` limit (see Apache Kafka documentation). Type: integer
message.copy.max.bytes	*	0 .. 1000000000	65535	low	Maximum size for message to be copied to buffer. Messages larger than this will be passed by reference (zero-copy) at the expense of larger iovecs. Type: integer
receive.message.max.bytes	*	1000 .. 2147483647	100000000	medium	Maximum Kafka protocol response message size. This serves as a safety precaution to avoid memory exhaustion in case of protocol hiccups. This value must be at least `fetch.max.bytes` + 512 to allow for protocol overhead; the value is adjusted automatically unless the configuration property is explicitly set. Type: integer
max.in.flight.requests.per.connection	*	1 .. 1000000	1000000	low	Maximum number of in-flight requests per broker connection. This is a generic property applied to all broker communication, however it is primarily relevant to produce requests. In particular, note that other mechanisms limit the number of outstanding consumer fetch request per broker to one. Type: integer
max.in.flight	*	1 .. 1000000	1000000	low	Alias for `max.in.flight.requests.per.connection`: Maximum number of in-flight requests per broker connection. This is a generic property applied to all broker communication, however it is primarily relevant to produce requests. In particular, note that other mechanisms limit the number of outstanding consumer fetch request per broker to one. Type: integer
metadata.request.timeout.ms	*	10 .. 900000	60000	low	Non-topic request timeout in milliseconds. This is for metadata requests, etc. Type: integer
topic.metadata.refresh.interval.ms	*	-1 .. 3600000	300000	low	Period of time in milliseconds at which topic and broker metadata is refreshed in order to proactively discover any new brokers, topics, partitions or partition leader changes. Use -1 to disable the intervaled refresh (not recommended). If there are no locally referenced topics (no topic objects created, no messages produced, no subscription or no assignment) then only the broker list will be refreshed every interval but no more often than every 10s. Type: integer
metadata.max.age.ms	*	1 .. 86400000	900000	low	Metadata cache max age. Defaults to topic.metadata.refresh.interval.ms * 3. Type: integer
topic.metadata.refresh.fast.interval.ms	*	1 .. 60000	250	low	When a topic loses its leader a new metadata request will be enqueued with this initial interval, exponentially increasing until the topic metadata has been refreshed. This is used to recover quickly from transitioning leader brokers. Type: integer
topic.metadata.refresh.fast.cnt	*	0 .. 1000	10	low	DEPRECATED No longer used. Type: integer
topic.metadata.refresh.sparse	*	true, false	true	low	Sparse metadata requests (consumes less network bandwidth) Type: boolean
topic.metadata.propagation.max.ms	*	0 .. 3600000	30000	low	Apache Kafka topic creation is asynchronous and it takes some time for a new topic to propagate throughout the cluster to all brokers. If a client requests topic metadata after manual topic creation but before the topic has been fully propagated to the broker the client is requesting metadata from, the topic will seem to be non-existent and the client will mark the topic as such, failing queued produced messages with `ERR__UNKNOWN_TOPIC`. This setting delays marking a topic as non-existent until the configured propagation max time has passed. The maximum propagation time is calculated from the time the topic is first referenced in the client, e.g., on produce(). Type: integer
topic.blacklist	*			low	Topic blacklist, a comma-separated list of regular expressions for matching topic names that should be ignored in broker metadata information as if the topics did not exist. Type: pattern list
debug	*	generic, broker, topic, metadata, feature, queue, msg, protocol, cgrp, security, fetch, interceptor, plugin, consumer, admin, eos, mock, assignor, conf, all		medium	A comma-separated list of debug contexts to enable. Detailed Producer debugging: broker,topic,msg. Consumer: consumer,cgrp,topic,fetch Type: CSV flags
socket.timeout.ms	*	10 .. 300000	60000	low	Default timeout for network requests. Producer: ProduceRequests will use the lesser value of `socket.timeout.ms` and remaining `message.timeout.ms` for the first message in the batch. Consumer: FetchRequests will use `fetch.wait.max.ms` + `socket.timeout.ms`. Admin: Admin requests will use `socket.timeout.ms` or explicitly set `rd_kafka_AdminOptions_set_operation_timeout()` value. Type: integer
socket.blocking.max.ms	*	1 .. 60000	1000	low	DEPRECATED No longer used. Type: integer
socket.send.buffer.bytes	*	0 .. 100000000	0	low	Broker socket send buffer size. System default is used if 0. Type: integer
socket.receive.buffer.bytes	*	0 .. 100000000	0	low	Broker socket receive buffer size. System default is used if 0. Type: integer
socket.keepalive.enable	*	true, false	false	low	Enable TCP keep-alives (SO_KEEPALIVE) on broker sockets Type: boolean
socket.nagle.disable	*	true, false	false	low	Disable the Nagle algorithm (TCP_NODELAY) on broker sockets. Type: boolean
socket.max.fails	*	0 .. 1000000	1	low	Disconnect from broker when this number of send failures (e.g., timed out requests) is reached. Disable with 0. WARNING: It is highly recommended to leave this setting at its default value of 1 to avoid the client and broker becoming desynchronized in case of request timeouts. NOTE: The connection is automatically re-established. Type: integer
broker.address.ttl	*	0 .. 86400000	1000	low	How long to cache the broker address resolving results (milliseconds). Type: integer
broker.address.family	*	any, v4, v6	any	low	Allowed broker IP address families: any, v4, v6 Type: enum value
reconnect.backoff.jitter.ms	*	0 .. 3600000	0	low	DEPRECATED No longer used. See `reconnect.backoff.ms` and `reconnect.backoff.max.ms`. Type: integer
reconnect.backoff.ms	*	0 .. 3600000	100	medium	The initial time to wait before reconnecting to a broker after the connection has been closed. The time is increased exponentially until `reconnect.backoff.max.ms` is reached. -25% to +50% jitter is applied to each reconnect backoff. A value of 0 disables the backoff and reconnects immediately. Type: integer
reconnect.backoff.max.ms	*	0 .. 3600000	10000	medium	The maximum time to wait before reconnecting to a broker after the connection has been closed. Type: integer
statistics.interval.ms	*	0 .. 86400000	0	high	librdkafka statistics emit interval. The application also needs to register a stats callback using `rd_kafka_conf_set_stats_cb()`. The granularity is 1000ms. A value of 0 disables statistics. Type: integer
log_level	*	0 .. 7	6	low	Logging level (syslog(3) levels) Type: integer
log.thread.name	*	true, false	true	low	Print internal thread name in log messages (useful for debugging librdkafka internals) Type: boolean
log.connection.close	*	true, false	true	low	Log broker disconnects. It might be useful to turn this off when interacting with 0.9 brokers with an aggressive `connection.max.idle.ms` value. Type: boolean
api.version.request.timeout.ms	*	1 .. 300000	10000	low	Timeout for broker API version requests. Type: integer
api.version.fallback.ms	*	0 .. 604800000	0	medium	Dictates how long the `broker.version.fallback` fallback is used in the case the ApiVersionRequest fails. NOTE: The ApiVersionRequest is only issued when a new connection to the broker is made (such as after an upgrade). Type: integer
broker.version.fallback	*		0.10.0	medium	Older broker versions (before 0.10.0) provide no way for a client to query for supported protocol features (ApiVersionRequest, see `api.version.request`) making it impossible for the client to know what features it may use. As a workaround a user may set this property to the expected broker version and the client will automatically adjust its feature set accordingly if the ApiVersionRequest fails (or is disabled). The fallback broker version will be used for `api.version.fallback.ms`. Valid values are: 0.9.0, 0.8.2, 0.8.1, 0.8.0. Any other value >= 0.10, such as 0.10.2.1, enables ApiVersionRequests. Type: string
ssl.cipher.suites	*			low	A cipher suite is a named combination of authentication, encryption, MAC and key exchange algorithm used to negotiate the security settings for a network connection using TLS or SSL network protocol. See manual page for `ciphers(1)` and `SSL_CTX_set_cipher_list(3)`. Type: string
ssl.curves.list	*			low	The supported-curves extension in the TLS ClientHello message specifies the curves (standard/named, or 'explicit' GF(2^k) or GF(p)) the client is willing to have the server use. See manual page for `SSL_CTX_set1_curves_list(3)`. OpenSSL >= 1.0.2 required. Type: string
ssl.sigalgs.list	*			low	The client uses the TLS ClientHello signature_algorithms extension to indicate to the server which signature/hash algorithm pairs may be used in digital signatures. See manual page for `SSL_CTX_set1_sigalgs_list(3)`. OpenSSL >= 1.0.2 required. Type: string
ssl.key.location	*			low	Path to client's private key (PEM) used for authentication. Type: string
ssl.key.password	*			low	Private key passphrase (for use with `ssl.key.location` and `set_ssl_cert()`) Type: string
ssl.key.pem	*			low	Client's private key string (PEM format) used for authentication. Type: string
ssl.certificate.location	*			low	Path to client's public key (PEM) used for authentication. Type: string
ssl.certificate.pem	*			low	Client's public key string (PEM format) used for authentication. Type: string
ssl.ca.location	*			low	File or directory path to CA certificate(s) for verifying the broker's key. Defaults: On Windows the system's CA certificates are automatically looked up in the Windows Root certificate store. On Mac OSX this configuration defaults to `probe`. It is recommended to install openssl using Homebrew, to provide CA certificates. On Linux install the distribution's ca-certificates package. If OpenSSL is statically linked or `ssl.ca.location` is set to `probe` a list of standard paths will be probed and the first one found will be used as the default CA certificate location path. If OpenSSL is dynamically linked the OpenSSL library's default path will be used (see `OPENSSLDIR` in `openssl version -a`). Type: string
ssl.ca.certificate.stores	*		Root	low	Comma-separated list of Windows Certificate stores to load CA certificates from. Certificates will be loaded in the same order as stores are specified. If no certificates can be loaded from any of the specified stores an error is logged and the OpenSSL library's default CA location is used instead. Store names are typically one or more of: MY, Root, Trust, CA. Type: string
ssl.crl.location	*			low	Path to CRL for verifying broker's certificate validity. Type: string
ssl.keystore.location	*			low	Path to client's keystore (PKCS#12) used for authentication. Type: string
ssl.keystore.password	*			low	Client's keystore (PKCS#12) password. Type: string
enable.ssl.certificate.verification	*	true, false	true	low	Enable OpenSSL's builtin broker (server) certificate verification. This verification can be extended by the application by implementing a certificate_verify_cb. Type: boolean
ssl.endpoint.identification.algorithm	*	none, https	none	low	Endpoint identification algorithm to validate broker hostname using broker certificate. https - Server (broker) hostname verification as specified in RFC2818. none - No endpoint verification. OpenSSL >= 1.0.2 required. Type: enum value
ssl.certificate.verify_cb	*			low	Callback to verify the broker certificate chain. Type: see dedicated API
sasl.kerberos.service.name	*		kafka	low	Kerberos principal name that Kafka runs as, not including /hostname@REALM Type: string
sasl.kerberos.principal	*		kafkaclient	low	This client's Kerberos principal name. (Not supported on Windows, will use the logon user's principal). Type: string
sasl.kerberos.kinit.cmd	*
sasl.kerberos.keytab	*			low	Path to Kerberos keytab file. This configuration property is only used as a variable in `sasl.kerberos.kinit.cmd` as `... -t "%{sasl.kerberos.keytab}"`. Type: string
sasl.kerberos.min.time.before.relogin	*	0 .. 86400000	60000	low	Minimum time in milliseconds between key refresh attempts. Disable automatic key refresh by setting this property to 0. Type: integer
sasl.password	*			high	SASL password for use with the PLAIN and SASL-SCRAM-.. mechanism Type: string
sasl.oauthbearer.config	*			low	SASL/OAUTHBEARER configuration. The format is implementation-dependent and must be parsed accordingly. The default unsecured token implementation (see https://tools.ietf.org/html/rfc7515#appendix-A.5) recognizes space-separated name=value pairs with valid names including principalClaimName, principal, scopeClaimName, scope, and lifeSeconds. The default value for principalClaimName is "sub", the default value for scopeClaimName is "scope", and the default value for lifeSeconds is 3600. The scope value is CSV format with the default value being no/empty scope. For example: `principalClaimName=azp principal=admin scopeClaimName=roles scope=role1,role2 lifeSeconds=600`. In addition, SASL extensions can be communicated to the broker via `extension_NAME=value`. For example: `principal=admin extension_traceId=123` Type: string
enable.sasl.oauthbearer.unsecure.jwt	*	true, false	false	low	Enable the builtin unsecure JWT OAUTHBEARER token handler if no oauthbearer_refresh_cb has been set. This builtin handler should only be used for development or testing, and not in production. Type: boolean
partition.assignment.strategy	C		range,roundrobin	medium	The name of one or more partition assignment strategies. The elected group leader will use a strategy supported by all members of the group to assign partitions to group members. If there is more than one eligible strategy, preference is determined by the order of this list (strategies earlier in the list have higher priority). Cooperative and non-cooperative (eager) strategies must not be mixed. Available strategies: range, roundrobin, cooperative-sticky. Type: string
session.timeout.ms	C	1 .. 3600000	10000	high	Client group session and failure detection timeout. The consumer sends periodic heartbeats (heartbeat.interval.ms) to indicate its liveness to the broker. If no heartbeats are received by the broker for a group member within the session timeout, the broker will remove the consumer from the group and trigger a rebalance. The allowed range is configured with the broker configuration properties `group.min.session.timeout.ms` and `group.max.session.timeout.ms`. Also see `max.poll.interval.ms`. Type: integer
heartbeat.interval.ms	C	1 .. 3600000	3000	low	Group session keepalive heartbeat interval. Type: integer
group.protocol.type	C		consumer	low	Group protocol type. NOTE: Currently, the only supported group protocol type is `consumer`. Type: string
coordinator.query.interval.ms	C	1 .. 3600000	600000	low	How often to query for the current client group coordinator. If the currently assigned coordinator is down the configured query interval will be divided by ten to more quickly recover in case of coordinator reassignment. Type: integer
max.poll.interval.ms	C	1 .. 86400000	300000	high	Maximum allowed time between calls to consume messages (e.g., rd_kafka_consumer_poll()) for high-level consumers. If this interval is exceeded the consumer is considered failed and the group will rebalance in order to reassign the partitions to another consumer group member. Warning: Offset commits may not be possible at this point. Note: It is recommended to set `enable.auto.offset.store=false` for long-time processing applications and then explicitly store offsets (using offsets_store()) after message processing, to make sure offsets are not auto-committed before processing has finished. The interval is checked two times per second. See KIP-62 for more information. Type: integer
auto.commit.interval.ms	C	0 .. 86400000	5000	medium	The frequency in milliseconds that the consumer offsets are committed (written) to offset storage. (0 = disable). This setting is used by the high-level consumer. Type: integer
queued.min.messages	C	1 .. 10000000	100000	medium	Minimum number of messages per topic+partition librdkafka tries to maintain in the local consumer queue. Type: integer
queued.max.messages.kbytes	C	1 .. 2097151	65536	medium	Maximum number of kilobytes of queued pre-fetched messages in the local consumer queue. If using the high-level consumer this setting applies to the single consumer queue, regardless of the number of partitions. When using the legacy simple consumer or when separate partition queues are used this setting applies per partition. This value may be overshot by fetch.message.max.bytes. This property has higher priority than queued.min.messages. Type: integer
fetch.wait.max.ms	C	0 .. 300000	500	low	Maximum time the broker may wait to fill the Fetch response with fetch.min.bytes of messages. Type: integer
fetch.message.max.bytes	C	1 .. 1000000000	1048576	medium	Initial maximum number of bytes per topic+partition to request when fetching messages from the broker. If the client encounters a message larger than this value it will gradually try to increase it until the entire message can be fetched. Type: integer
max.partition.fetch.bytes	C	1 .. 1000000000	1048576	medium	Alias for `fetch.message.max.bytes`: Initial maximum number of bytes per topic+partition to request when fetching messages from the broker. If the client encounters a message larger than this value it will gradually try to increase it until the entire message can be fetched. Type: integer
fetch.max.bytes	C	0 .. 2147483135	52428800	medium	Maximum amount of data the broker shall return for a Fetch request. Messages are fetched in batches by the consumer and if the first message batch in the first non-empty partition of the Fetch request is larger than this value, then the message batch will still be returned to ensure the consumer can make progress. The maximum message batch size accepted by the broker is defined via `message.max.bytes` (broker config) or `max.message.bytes` (broker topic config). `fetch.max.bytes` is automatically adjusted upwards to be at least `message.max.bytes` (consumer config). Type: integer
fetch.min.bytes	C	1 .. 100000000	1	low	Minimum number of bytes the broker responds with. If fetch.wait.max.ms expires the accumulated data will be sent to the client regardless of this setting. Type: integer
fetch.error.backoff.ms	C	0 .. 300000	500	medium	How long to postpone the next fetch request for a topic+partition in case of a fetch error. Type: integer
offset.store.method	C	none, file, broker	broker	low	DEPRECATED Offset commit store method: 'file' - DEPRECATED: local file store (offset.store.path, et al.), 'broker' - broker commit store (requires Apache Kafka 0.8.2 or later on the broker). Type: enum value
isolation.level	C	read_uncommitted, read_committed	read_committed	high	Controls how to read messages written transactionally: `read_committed` - only return transactional messages which have been committed. `read_uncommitted` - return all messages, even transactional messages which have been aborted. Type: enum value
check.crcs	C	true, false	false	medium	Verify CRC32 of consumed messages, ensuring no on-the-wire or on-disk corruption to the messages occurred. This check comes at slightly increased CPU usage. Type: boolean
allow.auto.create.topics	C	true, false	false	low	Allow automatic topic creation on the broker when subscribing to or assigning non-existent topics. The broker must also be configured with `auto.create.topics.enable=true` for this configuration to take effect. Note: The default value (false) is different from the Java consumer (true). Requires broker version >= 0.11.0.0; for older broker versions only the broker configuration applies. Type: boolean
client.rack	*			low	A rack identifier for this client. This can be any string value which indicates where this client is physically located. It corresponds with the broker config `broker.rack`. Type: string
transactional.id	P			high	Enables the transactional producer. The transactional.id is used to identify the same transactional producer instance across process restarts. It allows the producer to guarantee that transactions corresponding to earlier instances of the same producer have been finalized prior to starting any new transactions, and that any zombie instances are fenced off. If no transactional.id is provided, then the producer is limited to idempotent delivery (if enable.idempotence is set). Requires broker version >= 0.11.0. Type: string
transaction.timeout.ms	P	1000 .. 2147483647	60000	medium	The maximum amount of time in milliseconds that the transaction coordinator will wait for a transaction status update from the producer before proactively aborting the ongoing transaction. If this value is larger than the `transaction.max.timeout.ms` setting in the broker, the init_transactions() call will fail with ERR_INVALID_TRANSACTION_TIMEOUT. The transaction timeout automatically adjusts `message.timeout.ms` and `socket.timeout.ms`, unless explicitly configured in which case they must not exceed the transaction timeout (`socket.timeout.ms` must be at least 100ms lower than `transaction.timeout.ms`). This is also the default timeout value if no timeout (-1) is supplied to the transactional API methods. Type: integer
enable.idempotence	P	true, false	false	high	When set to `true`, the producer will ensure that messages are successfully produced exactly once and in the original produce order. The following configuration properties are adjusted automatically (if not modified by the user) when idempotence is enabled: `max.in.flight.requests.per.connection=5` (must be less than or equal to 5), `retries=INT32_MAX` (must be greater than 0), `acks=all`, `queuing.strategy=fifo`. Producer instantiation will fail if user-supplied configuration is incompatible. Type: boolean
enable.gapless.guarantee	P	true, false	false	low	EXPERIMENTAL: subject to change or removal. When set to `true`, any error that could result in a gap in the produced message series when a batch of messages fails will raise a fatal error (ERR__GAPLESS_GUARANTEE) and stop the producer. Messages failing due to `message.timeout.ms` are not covered by this guarantee. Requires `enable.idempotence=true`. Type: boolean
queue.buffering.max.messages	P	1 .. 10000000	100000	high	Maximum number of messages allowed on the producer queue. This queue is shared by all topics and partitions. Type: integer
queue.buffering.max.kbytes	P	1 .. 2147483647	1048576	high	Maximum total message size sum allowed on the producer queue. This queue is shared by all topics and partitions. This property has higher priority than queue.buffering.max.messages. Type: integer
queue.buffering.max.ms	P	0 .. 900000	5	high	Delay in milliseconds to wait for messages in the producer queue to accumulate before constructing message batches (MessageSets) to transmit to brokers. A higher value allows larger and more effective (less overhead, improved compression) batches of messages to accumulate at the expense of increased message delivery latency. Type: float
linger.ms	P	0 .. 900000	5	high	Alias for `queue.buffering.max.ms`: Delay in milliseconds to wait for messages in the producer queue to accumulate before constructing message batches (MessageSets) to transmit to brokers. A higher value allows larger and more effective (less overhead, improved compression) batches of messages to accumulate at the expense of increased message delivery latency. Type: float
message.send.max.retries	P	0 .. 2147483647	2147483647	high	How many times to retry sending a failing Message. Note: retrying may cause reordering unless `enable.idempotence` is set to true. Type: integer
retries	P	0 .. 2147483647	2147483647	high	Alias for `message.send.max.retries`: How many times to retry sending a failing Message. Note: retrying may cause reordering unless `enable.idempotence` is set to true. Type: integer
retry.backoff.ms	P	1 .. 300000	100	medium	The backoff time in milliseconds before retrying a protocol request. Type: integer
queue.buffering.backpressure.threshold	P	1 .. 1000000	1	low	The threshold of outstanding not yet transmitted broker requests needed to backpressure the producer's message accumulator. If the number of not yet transmitted requests equals or exceeds this number, produce request creation that would have otherwise been triggered (for example, in accordance with linger.ms) will be delayed. A lower number yields larger and more effective batches. A higher value can improve latency when using compression on slow machines. Type: integer
compression.codec	P	none, gzip, snappy, lz4, zstd	none	medium	Compression codec to use for compressing message sets. This is the default value for all topics and may be overridden by the topic configuration property `compression.codec`. Type: enum value
compression.type	P	none, gzip, snappy, lz4, zstd	none	medium	Alias for `compression.codec`: compression codec to use for compressing message sets. This is the default value for all topics and may be overridden by the topic configuration property `compression.codec`. Type: enum value
batch.num.messages	P	1 .. 1000000	10000	medium	Maximum number of messages batched in one MessageSet. The total MessageSet size is also limited by batch.size and message.max.bytes. Type: integer
batch.size	P	1 .. 2147483647	1000000	medium	Maximum size (in bytes) of all messages batched in one MessageSet, including protocol framing overhead. This limit is applied after the first message has been added to the batch, regardless of that message's size; this ensures that messages exceeding batch.size are still produced. The total MessageSet size is also limited by batch.num.messages and message.max.bytes. Type: integer
delivery.report.only.error	P	true, false	false	low	Only provide delivery reports for failed messages. Type: boolean
sticky.partitioning.linger.ms	P	0 .. 900000	10	low	Delay in milliseconds to wait to assign new sticky partitions for each topic. By default, set to double the time of linger.ms. To disable sticky behavior, set to 0. This behavior affects messages with the key NULL in all cases, and messages with key lengths of zero when the consistent_random partitioner is in use. These messages would otherwise be assigned randomly. A higher value allows for more effective batching of these messages. Type: integer
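The consumer-side (C) properties in the table are applied to an external stream through its `properties` setting. The following is a minimal sketch, assuming `properties` accepts semicolon-separated `key=value` pairs that are handed to the underlying librdkafka client; the stream name `orders_raw`, the topic `orders`, and the broker addresses are hypothetical.

```sql
-- Hypothetical stream, topic, and broker names; adjust to your environment.
-- Assumption: `properties` is a semicolon-separated list of key=value pairs.
CREATE EXTERNAL STREAM IF NOT EXISTS orders_raw
    (raw string)
SETTINGS
    type='kafka',
    brokers='kafka1:9092,kafka2:9092',
    topic='orders',
    -- isolation.level=read_committed (also the default) only returns committed
    --   transactional messages.
    -- check.crcs=true verifies CRC32 of consumed messages at some CPU cost.
    -- fetch.min.bytes / fetch.wait.max.ms: the broker waits up to 200 ms to
    --   accumulate 64 KiB before answering a fetch request.
    properties='isolation.level=read_committed;check.crcs=true;fetch.min.bytes=65536;fetch.wait.max.ms=200';
```

Raising `fetch.min.bytes` reduces the number of fetch round trips on low-volume topics, while `fetch.wait.max.ms` caps the extra latency that waiting adds.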
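Producer-side (P) properties work the same way when the external stream is used as a write target. A sketch under the same assumption about the `properties` format, with a hypothetical stream `orders_out` writing JSON rows to a hypothetical topic `orders_enriched`, enabling idempotent delivery and trading a little latency for larger, compressed batches:

```sql
-- Hypothetical stream and topic names; `properties` format assumed as above.
CREATE EXTERNAL STREAM IF NOT EXISTS orders_out
    (order_id string, amount float64)
SETTINGS
    type='kafka',
    brokers='kafka1:9092,kafka2:9092',
    topic='orders_enriched',
    data_format='JSONEachRow',
    -- enable.idempotence=true: librdkafka adjusts acks, retries, and
    --   max.in.flight.requests.per.connection automatically (see table),
    --   so leave those unset or keep them compatible.
    -- linger.ms=50: wait up to 50 ms for messages to accumulate into a batch.
    -- compression.type=lz4: compress each message set before sending.
    properties='enable.idempotence=true;linger.ms=50;compression.type=lz4';

-- Rows written to the stream are batched and sent to Kafka.
INSERT INTO orders_out(order_id, amount) VALUES ('o-1', 9.99);
```

As the `linger.ms` entry notes, a higher value produces larger and better-compressed batches at the expense of per-message delivery latency; the default of 5 ms favors latency.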
