Kafka 外部流
You can read data from Apache Kafka (as well as Confluent Cloud, or Redpanda) in Timeplus with External Stream. 结合 物化视图 和 目标流,你还可以使用外部流向 Apache Kafka 写入数据。
创建外部流
In Timeplus Proton, the external stream supports Kafka API as the only type.
In Timeplus Enterprise, it also supports External Stream for Apache Pulsar and External Stream for other Timeplus deployment.
To create an external stream for Apache Kafka or Kafka-compatiable messaging platforms, you can run the following DDL SQL:
CREATE EXTERNAL STREAM [IF NOT EXISTS] stream_name
(<col_name1> <col_type>)
SETTINGS
type='kafka',
brokers='ip:9092',
topic='..',
security_protocol='..',
username='..',
password='..',
sasl_mechanism='..',
data_format='..',
kafka_schema_registry_url='..',
kafka_schema_registry_credentials='..',
ssl_ca_cert_file='..',
ssl_ca_pem='..',
skip_ssl_cert_check=..
The supported values for security_protocol
are:
- 纯文本:省略此选项时,这是默认值。
- SASL_SSL:设置此值时,应指定用户名和密码。
- 如果你需要指定自己的 SSL 认证文件,可以添加另一个设置
ssl_ca_cert_file='/ssl/ca.pem'
Proton 1.5.5 中的新增内容,如果你不想或无法使用文件路径,例如在 Timeplus Cloud 或 Docker/Kuker/Kup 中,也可以将 pem 文件的全部内容作为字符串放入ssl_ca_pem
设置中伯内特斯环境。 - 可以通过
设置 skip_ssl_cert_check=true
来跳过 SSL 认证验证。
- 如果你需要指定自己的 SSL 认证文件,可以添加另一个设置
The supported values for sasl_mechanism
are:
- PLAIN:当你将 security_protocol 设置为 SASL_SSL 时,这是 sasl_mechanmic 的默认值。
- SCRAM-SHA-256
- SCRAM-SHA-512
The supported values for data_format
are:
- JSONEachRow: parse each row of the message as a single JSON document. The top level JSON key/value pairs will be parsed as the columns. 了解更多.
- CSV:不太常用。 了解更多.
- ProtobufSingle: for single Protobuf message per message
- Protobuf: there could be multiple Protobuf messages in a single message.
- Avro:在 Proton 1.5.2 中添加
- rawBlob:默认值。 Read/write message as plain text.
For examples to connect to various Kafka API compatitable message platforms, please check this doc.
定义列
Single column to read
如果 Kafka 主题中的消息是纯文本格式或 JSON,则可以创建只有 字符串
类型的 原始
列的外部流。
示例:
Then use query time JSON extraction functions or shortcut to access the values, e.g. raw:id
.
以纯文本写入 Kafka
您可以使用带有单列的外部流向 Kafka 主题写入纯文本消息。
然后使用 INSERT INTO <stream_name> VALUES (v)
或 Ingest REST API,或者将其设置为物化视图向卡夫卡主题写入消息的目标流。 实际的 data_format
值为 rawBlob
但可以省略。 By default one_message_per_row
is true
.
Since Timeplus Proton 1.5.11, a new setting kafka_max_message_size
is available. When multiple rows can be written to the same Kafka message, this setting will control how many data will be put in a Kafka message, ensuring it won't exceed the kafka_max_message_size
limit.
从 Kafka 中读取多列
If the keys in the JSON message never change, or you don't care about the new columns, you can also create the external stream with multiple columns.
您可以在 JSON 中选取一些顶级键作为列,或将所有可能的键选为列。
Please note the behaviors are changed in recent versions, based on user feedback:
版本 | 默认行为 | 如何覆盖 |
---|---|---|
1.4.2 或以上 | 假设在 JSON 中有 5 个顶级键/值对,则可以在外部流中定义 5 列或少于 5 列。 数据将被正确读取。 | 如果你不想读取带有意外列的新事件,请在 CREATE DDL 中设置 input_format_skip_unknown_fields=false 。 |
1.3.24 到 1.4.1 | 假设JSON中有5个顶级键/值对,则可能需要定义5列才能全部读取。 或者在 DDL 中定义少于 5 列,并确保在每个 SELECT 查询设置中添加 input_format_skip_unknown_fields=true ,否则不会返回任何搜索结果。 | or only define some keys as columns and append this to your query: SETTINGS input_format_skip_unknown_fields=true |
1.3.23 或更高版本 | 你必须为整个 JSON 文档定义一个 字符串 列,并将查询时 JSON 解析应用于提取字段。 | 不适用 |
示例:
如果消息中有嵌套的复杂 JSON,则可以将该列定义为字符串类型。 实际上,任何 JSON 值都可以保存在字符串列中。
Protobuf messages can be read with all or partial columns. Please check this page.
要写入 Kafka 的多列
To write data to Kafka topics, you can choose different data formats:
jsoneaChrow
You can use data_format='JSONEachRow',one_message_per_row=true
to inform Timeplus to write each event as a JSON document. 外部流的列将转换为 JSON 文档中的密钥。 例如:
消息将在特定主题中生成
Please note, by default multiple JSON documents will be inserted to the same Kafka message. 每行/每行一个 JSON 文档。 这种默认行为旨在为Kafka/Redpanda获得最大的写入性能。 但是你需要确保下游应用程序能够正确拆分每条 Kafka 消息的 JSON 文档。
如果你需要每条 Kafka 消息的有效的 JSON,而不是 JSONL,请设置 one_message_per_row=true
例如
The default value of one_message_per_row, if not specified, is false for data_format='JSONEachRow'
and true for data_format='RawBLOB'
.
Since Timeplus Proton 1.5.11, a new setting kafka_max_message_size
is available. When multiple rows can be written to the same Kafka message, this setting will control how many data will be put in a Kafka message and when to create new Kafka message, ensuring each message won't exceed the kafka_max_message_size
limit.
CSV
You can use data_format='CSV'
to inform Timeplus to write each event as a JSON document. 外部流的列将转换为 JSON 文档中的密钥。 例如:
消息将在特定主题中生成
“2023-10-29 05:35:54.176 “,” https://www.nationalwhiteboard.info/sticky/recontextualize/robust/incentivize","PUT","3eaf6372e909e033fcfc2d6a3bc04ace”
ProtobufSingle
你可以在 Proton中定义 Protobuf 格式,也可以在创建外部流时指定 Kafka 架构注册表 。
Avro
Starting from Proton 1.5.2, you can use Avro format when you specify the Kafka Schema Registry when you create the external stream.
读/写 Kafka 消息密钥
对于 Kafka 主题中的每条消息,其价值肯定至关重要。 密钥是可选的,但可以携带重要的元数据。
_message_key
_message_key
is deprecated since Timeplus Proton 1.5.15 and timeplusd 2.3.10. Please use _tp_message_key
From Timeplus Proton 1.5.4 to 1.5.14, it supports _message_key
as a virtual column in Kafka external streams. 如果你运行 SELECT * FROM ext_stream
,则不会查询这样的虚拟列。 你需要明确选择该列来检索消息密钥,例如 SELECT _message_key,* FROM ext_stream
。
To write the message key and value, you need to set the message_key
in the CREATE
DDL. 它是一个返回字符串值的表达式,该表达式返回的值将用作每条消息的密钥。
示例:
message_key
可以与 sharding_expr
(在 Kafka 主题中指定目标分区号)一起使用,而且 sharding_expr
将优先级更高。
_tp_message_key
Based on user feedback, we introduced a better way to read or write the message key. Starting from timeplusd 2.3.10, you can define the _tp_message_key
column when you create the external stream. This new approach provides more intuitive and flexible way to write any content as the message key, not necessarily mapping to a specify column or a set of columns.
例如:
CREATE EXTERNAL STREAM foo (
id int32,
name string,
_tp_message_key string
) SETTINGS type='kafka',...;
You can insert any data to the Kafka topic.
When insert a row to the stream like:
INSERT INTO foo(id,name,_tp_message_key) VALUES (1, 'John', 'some-key');
'some-key'
will be used for the message key for the Kafka message (and it will be excluded from the message body, so the message will be {"id": 1, "name": "John"}
for the above SQL).
When doing a SELECT query, the message key will be populated to the _tp_message_key
column as well. SELECT * FROM foo
will return 'some-key'
for the _tp_message_key
message.
_tp_message_key
support the following types: uint8
, uint16
, uint32
, uint64
, int8
, int16
, int32
, int64
, bool
, float32
, float64
, string
, and fixed_string
.
_tp_message_key
also support nullable
. Thus we can create an external stream with optional message key. 例如:
CREATE EXTERNAL STREAM foo (
id int32,
name string,
_tp_message_key nullable(string) default null
) SETTINGS type='kafka',...;
删除外部流
DROP STREAM [IF EXISTS] stream_name
使用 SQL 查询 Kafka 数据
你可以在外部流上运行流式传输 SQL,例如
阅读现有消息
When you run SELECT raw FROM ext_stream
, Timeplus will read the new messages in the topics, not the existing ones. 如果您需要阅读所有现有消息,则可以使用以下设置:
或者如果你运行的是 Proton 1.5.9 或更高版本,则使用以下 SQL:
:: warning
请避免通过 select * from table (ext_stream)
扫描所有数据。 However select count(*) from table(ext_stream)
is optimized to get the number of current message count from the Kafka topic.
:::
读取指定分区
从 Proton 1.3.18 开始,你还可以在指定的 Kafka 分区中读取。 默认情况下,将读取所有分区。 但是你也可以通过 shards
设置从单个分区读取,例如
或者你可以指定一组用逗号分隔的分区 ID,例如
使用 SQL 写入 Kafka
你可以使用物化视图将数据作为外部流写入到 Kafka,例如
Docker Compose 教程
请查看教程:
Kafka 客户端的属性
对于更高级的用例,可以在创建外部流时指定自定义属性。 For more advanced use cases, you can specify customized properties while creating the external streams. Those properties will be passed to the underlying Kafka client, which is librdkafka.
例如:
请注意,并非支持 librdkafka 中的所有属性。 今天,Proton 接受了以下内容。 有关详细信息,请查看 librdkafka 的配置指南。
(C/P legend: C = Consumer, P = Producer, * = both)
属性 | C/P | 范围 | 默认值 | Importance | 描述 |
---|---|---|---|---|---|
client.id | * | rdkafka | low | Client identifier. Type: string | |
message.max.bytes | * | 1000 .. 1000 .. 1000000000 | 1000000 | 中 | Kafka 协议请求消息的最大大小。 Due to differing framing overhead between protocol versions the producer is unable to reliably enforce a strict max message limit at produce time and may exceed the maximum size by one message in protocol ProduceRequests, the broker will enforce the topic's max.message.bytes limit (see Apache Kafka documentation). Type: integer |
message.copy.max.bytes | * | 1 .. 1000000 1000 .. 1000000000 | 65535 | low | Maximum size for message to be copied to buffer. Messages larger than this will be passed by reference (zero-copy) at the expense of larger iovecs. Type: integer |
receive.message.max.bytes | * | 1000 .. 2147483647 | 100000000 | 中 | Maximum Kafka protocol response message size. This serves as a safety precaution to avoid memory exhaustion in case of protocol hickups. This value must be at least fetch.max.bytes + 512 to allow for protocol overhead; the value is adjusted automatically unless the configuration property is explicitly set. Type: integer |
max.in.flight.requests.per.connection | * | 1.。 1000000 | 1000000 | low | Maximum number of in-flight requests per broker connection. This is a generic property applied to all broker communication, however it is primarily relevant to produce requests. In particular, note that other mechanisms limit the number of outstanding consumer fetch request per broker to one. Type: integer |
max.in.flight | * | 1.。 1000000 | 1000000 | low | Alias for max.in.flight.requests.per.connection : Maximum number of in-flight requests per broker connection. This is a generic property applied to all broker communication, however it is primarily relevant to produce requests. In particular, note that other mechanisms limit the number of outstanding consumer fetch request per broker to one. Type: integer |
metadata.request.timeout.ms | * | 10 .. 0 .. 900000 | 60000 | low | Non-topic request timeout in milliseconds. This is for metadata requests, etc. Type: integer |
topic.metadata.refresh.interval.ms | * | -1。。 3600000 | 1 .. 300000 | low | 刷新主题和代理元数据以便主动发现任何新的代理、主题、分区或分区领导变更的时间段(以毫秒为单位)。 Use -1 to disable the intervalled refresh (not recommended). If there are no locally referenced topics (no topic objects created, no messages produced, no subscription or no assignment) then only the broker list will be refreshed every interval but no more often than every 10s. Type: integer |
metadata.max.age.ms | * | 1.。 86400000 | 0 .. 900000 | low | Metadata cache max age. Defaults to topic.metadata.refresh.interval.ms * 3 Type: integer |
topic.metadata.refresh.fast.interval.ms | * | 1.。 60000 | 250 | low | When a topic loses its leader a new metadata request will be enqueued with this initial interval, exponentially increasing until the topic metadata has been refreshed. This is used to recover quickly from transitioning leader brokers. Type: integer |
topic.metadata.refresh.fast.cnt | * | 0。。 1000 | 10 | low | DEPRECATED No longer used. Type: integer |
topic.metadata.refresh.sparse | * | 真的,假的 | true | low | Sparse metadata requests (consumes less network bandwidth) Type: boolean |
topic.metadata.propagation.max.ms | * | 0。。 3600000 | 30000 | low | Apache Kafka topic creation is asynchronous and it takes some time for a new topic to propagate throughout the cluster to all brokers. If a client requests topic metadata after manual topic creation but before the topic has been fully propagated to the broker the client is requesting metadata from, the topic will seem to be non-existent and the client will mark the topic as such, failing queued produced messages with ERR__UNKNOWN_TOPIC . This setting delays marking a topic as non-existent until the configured propagation max time has passed. The maximum propagation time is calculated from the time the topic is first referenced in the client, e.g., on produce(). Type: integer |
topic.blacklist | * | low | Topic blacklist, a comma-separated list of regular expressions for matching topic names that should be ignored in broker metadata information as if the topics did not exist. Type: pattern list | ||
debug | * | generic, broker, topic, metadata, feature, queue, msg, protocol, cgrp, security, fetch, interceptor, plugin, consumer, admin, eos, mock, assignor, conf, all | 中 | A comma-separated list of debug contexts to enable. Detailed Producer debugging: broker,topic,msg. Consumer: consumer,cgrp,topic,fetch Type: CSV flags | |
socket.timeout.ms | * | 10 .. 1 .. 300000 | 60000 | low | Default timeout for network requests. Producer: ProduceRequests will use the lesser value of socket.timeout.ms and remaining message.timeout.ms for the first message in the batch. Consumer: FetchRequests will use fetch.wait.max.ms + socket.timeout.ms . Admin: Admin requests will use socket.timeout.ms or explicitly set rd_kafka_AdminOptions_set_operation_timeout() value. Type: integer |
socket.blocking.max.ms | * | 1.。 60000 | 1000 | low | DEPRECATED No longer used. Type: integer |
socket.send.buffer.bytes | * | 0。。 100000000 | 0 | low | Broker socket send buffer size. System default is used if 0. Type: integer |
socket.receive.buffer.bytes | * | 0。。 100000000 | 0 | low | Broker socket receive buffer size. System default is used if 0. Type: integer |
socket.keepalive.enable | * | 真的,假的 | false | low | Enable TCP keep-alives (SO_KEEPALIVE) on broker sockets Type: boolean |
socket.nagle.disable | * | 真的,假的 | false | low | Disable the Nagle algorithm (TCP_NODELAY) on broker sockets. Type: boolean |
socket.max.fails | * | 0。。 1000000 | 1 | low | Disconnect from broker when this number of send failures (e.g., timed out requests) is reached. Disable with 0. WARNING: It is highly recommended to leave this setting at its default value of 1 to avoid the client and broker to become desynchronized in case of request timeouts. NOTE: The connection is automatically re-established. Type: integer |
broker.address.ttl | * | 0。。 86400000 | 1000 | low | How long to cache the broker address resolving results (milliseconds). Type: integer |
broker.address.family | * | any, v4, v6 | 任何 | low | Allowed broker IP address families: any, v4, v6 Type: enum value |
reconnect.backoff.jitter.ms | * | 0。。 3600000 | 0 | low | DEPRECATED No longer used. See reconnect.backoff.ms and reconnect.backoff.max.ms . Type: integer |
reconnect.backoff.ms | * | 0。。 3600000 | 100 | 中 | The initial time to wait before reconnecting to a broker after the connection has been closed. The time is increased exponentially until reconnect.backoff.max.ms is reached. -25% to +50% jitter is applied to each reconnect backoff. A value of 0 disables the backoff and reconnects immediately. Type: integer |
reconnect.backoff.max.ms | * | 0。。 3600000 | 10000 | 中 | The maximum time to wait before reconnecting to a broker after the connection has been closed. Type: integer |
statistics.interval.ms | * | 0。。 86400000 | 0 | high | librdkafka statistics emit interval. The application also needs to register a stats callback using rd_kafka_conf_set_stats_cb() . The granularity is 1000ms. A value of 0 disables statistics. Type: integer |
log_level | * | 0。。 7 | 6 | low | Logging level (syslog(3) levels) Type: integer |
log.thread.name | * | 真的,假的 | true | low | Print internal thread name in log messages (useful for debugging librdkafka internals) Type: boolean |
log.connection.close | * | 真的,假的 | true | low | Log broker disconnects. It might be useful to turn this off when interacting with 0.9 brokers with an aggressive connection.max.idle.ms value. Type: boolean |
api.version.request.timeout.ms | * | 1.。 1 .. 300000 | 10000 | low | Timeout for broker API version requests. Type: integer |
api.version.fallback.ms | * | 0。。 604800000 | 0 | 中 | Dictates how long the broker.version.fallback fallback is used in the case the ApiVersionRequest fails. NOTE: The ApiVersionRequest is only issued when a new connection to the broker is made (such as after an upgrade). Type: integer |
broker.version.fallback | * | 0.10.0 | 中 | Older broker versions (before 0.10.0) provide no way for a client to query for supported protocol features (ApiVersionRequest, see api.version.request ) making it impossible for the client to know what features it may use. As a workaround a user may set this property to the expected broker version and the client will automatically adjust its feature set accordingly if the ApiVersionRequest fails (or is disabled). The fallback broker version will be used for api.version.fallback.ms . Valid values are: 0.9.0, 0.8.2, 0.8.1, 0.8.0. Any other value >= 0.10, such as 0.10.2.1, enables ApiVersionRequests. Type: string | |
ssl.cipher.suites | * | low | A cipher suite is a named combination of authentication, encryption, MAC and key exchange algorithm used to negotiate the security settings for a network connection using TLS or SSL network protocol. See manual page for ciphers(1) and `SSL_CTX_set_cipher_list(3). Type: string | ||
ssl.curves.list | * | low | The supported-curves extension in the TLS ClientHello message specifies the curves (standard/named, or 'explicit' GF(2^k) or GF(p)) the client is willing to have the server use. See manual page for SSL_CTX_set1_curves_list(3) . OpenSSL >= 1.0.2 required. Type: string | ||
ssl.sigalgs.list | * | low | The client uses the TLS ClientHello signature_algorithms extension to indicate to the server which signature/hash algorithm pairs may be used in digital signatures. See manual page for SSL_CTX_set1_sigalgs_list(3) . OpenSSL >= 1.0.2 required. Type: string | ||
ssl.key.location | * | low | Path to client's private key (PEM) used for authentication. Type: string | ||
ssl.key.password | * | low | Private key passphrase (for use with ssl.key.location and set_ssl_cert() ) Type: string | ||
ssl.key.pem | * | low | Client's private key string (PEM format) used for authentication. Type: string | ||
ssl.certificate.location | * | low | Path to client's public key (PEM) used for authentication. Type: string | ||
ssl.certificate.pem | * | low | Client's public key string (PEM format) used for authentication. Type: string | ||
ssl.ca.location | * | low | File or directory path to CA certificate(s) for verifying the broker's key. Defaults: On Windows the system's CA certificates are automatically looked up in the Windows Root certificate store. On Mac OSX this configuration defaults to probe . It is recommended to install openssl using Homebrew, to provide CA certificates. On Linux install the distribution's ca-certificates package. If OpenSSL is statically linked or ssl.ca.location is set to probe a list of standard paths will be probed and the first one found will be used as the default CA certificate location path. If OpenSSL is dynamically linked the OpenSSL library's default path will be used (see OPENSSLDIR in openssl version -a ). Type: string | ||
ssl.ca.certificate.stores | * | Root | low | Comma-separated list of Windows Certificate stores to load CA certificates from. Certificates will be loaded in the same order as stores are specified. If no certificates can be loaded from any of the specified stores an error is logged and the OpenSSL library's default CA location is used instead. Store names are typically one or more of: MY, Root, Trust, CA. Type: string | |
ssl.crl.location | * | low | Path to CRL for verifying broker's certificate validity. Type: string | ||
ssl.keystore.location | * | low | Path to client's keystore (PKCS#12) used for authentication. Type: string | ||
ssl.keystore.password | * | low | Client's keystore (PKCS#12) password. Type: string | ||
enable.ssl.certificate.verification | * | 真的,假的 | true | low | Enable OpenSSL's builtin broker (server) certificate verification. This verification can be extended by the application by implementing a certificate_verify_cb. Type: boolean |
ssl.endpoint.identification.algorithm | * | none, https | 无 | low | Endpoint identification algorithm to validate broker hostname using broker certificate. https - Server (broker) hostname verification as specified in RFC2818. none - No endpoint verification. OpenSSL >= 1.0.2 required. Type: enum value |
ssl.certificate.verify_cb | * | low | Callback to verify the broker certificate chain. Type: see dedicated API | ||
sasl.kerberos.service.name | * | kafka | low | Kerberos principal name that Kafka runs as, not including /hostname@REALM Type: string | |
sasl.kerberos.principal | * | kafkaclient | low | This client's Kerberos principal name. (Not supported on Windows, will use the logon user's principal). Type: string | |
sasl.kerberos.kinit.cmd | * | ||||
sasl.kerberos.keytab | * | low | Path to Kerberos keytab file. This configuration property is only used as a variable in sasl.kerberos.kinit.cmd as ... -t "%{sasl.kerberos.keytab}" . Type: string | ||
sasl.kerberos.min.time.before.relogin | * | 0。。 86400000 | 60000 | low | Minimum time in milliseconds between key refresh attempts. Disable automatic key refresh by setting this property to 0. Type: integer |
sasl.password | * | high | SASL password for use with the PLAIN and SASL-SCRAM-.. mechanism Type: string | ||
sasl.oauthbearer.config | * | low | SASL/OAUTHBEARER configuration. The format is implementation-dependent and must be parsed accordingly. The default unsecured token implementation (see https://tools.ietf.org/html/rfc7515#appendix-A.5) recognizes space-separated name=value pairs with valid names including principalClaimName, principal, scopeClaimName, scope, and lifeSeconds. The default value for principalClaimName is "sub", the default value for scopeClaimName is "scope", and the default value for lifeSeconds is 3600. The scope value is CSV format with the default value being no/empty scope. For example: principalClaimName=azp principal=admin scopeClaimName=roles scope=role1,role2 lifeSeconds=600 . In addition, SASL extensions can be communicated to the broker via extension_NAME=value . For example: principal=admin extension_traceId=123 Type: string | ||
enable.sasl.oauthbearer.unsecure.jwt | * | 真的,假的 | false | low | Enable the builtin unsecure JWT OAUTHBEARER token handler if no oauthbearer_refresh_cb has been set. This builtin handler should only be used for development or testing, and not in production. Type: boolean |
partition.assignment.strategy | C | range,roundrobin | 中 | The name of one or more partition assignment strategies. The elected group leader will use a strategy supported by all members of the group to assign partitions to group members. If there is more than one eligible strategy, preference is determined by the order of this list (strategies earlier in the list have higher priority). Cooperative and non-cooperative (eager) strategies must not be mixed. Available strategies: range, roundrobin, cooperative-sticky. Type: string | |
session.timeout.ms | C | 1.。 3600000 | 10000 | high | Client group session and failure detection timeout. The consumer sends periodic heartbeats (heartbeat.interval.ms) to indicate its liveness to the broker. If no hearts are received by the broker for a group member within the session timeout, the broker will remove the consumer from the group and trigger a rebalance. The allowed range is configured with the broker configuration properties group.min.session.timeout.ms and group.max.session.timeout.ms . Also see max.poll.interval.ms . Type: integer |
heartbeat.interval.ms | C | 1.。 3600000 | 3000 | low | Group session keepalive heartbeat interval. Type: integer |
group.protocol.type | C | consumer | low | Group protocol type. NOTE: Currently, the only supported group protocol type is consumer . Type: string | |
coordinator.query.interval.ms | C | 1.。 3600000 | 600000 | low | How often to query for the current client group coordinator. If the currently assigned coordinator is down the configured query interval will be divided by ten to more quickly recover in case of coordinator reassignment. Type: integer |
max.poll.interval.ms | C | 1.。 86400000 | 1 .. 300000 | high | Maximum allowed time between calls to consume messages (e.g., rd_kafka_consumer_poll()) for high-level consumers. If this interval is exceeded the consumer is considered failed and the group will rebalance in order to reassign the partitions to another consumer group member. Warning: Offset commits may be not possible at this point. Note: It is recommended to set enable.auto.offset.store=false for long-time processing applications and then explicitly store offsets (using offsets_store()) after message processing, to make sure offsets are not auto-committed prior to processing has finished. The interval is checked two times per second. See KIP-62 for more information. Type: integer |
auto.commit.interval.ms | C | 0。。 86400000 | 5000 | 中 | The frequency in milliseconds that the consumer offsets are committed (written) to offset storage. (0 = disable). This setting is used by the high-level consumer. Type: integer |
queued.min.messages | C | 1.。 10000000 | 100000 | 中 | Minimum number of messages per topic+partition librdkafka tries to maintain in the local consumer queue. Type: integer |
queued.max.messages.kbytes | C | 1.。 2097151 | 65536 | 中 | Maximum number of kilobytes of queued pre-fetched messages in the local consumer queue. If using the high-level consumer this setting applies to the single consumer queue, regardless of the number of partitions. When using the legacy simple consumer or when separate partition queues are used this setting applies per partition. This value may be overshot by fetch.message.max.bytes. This property has higher priority than queued.min.messages. Type: integer |
fetch.wait.max.ms | C | 0。。 1 .. 300000 | 500 | low | Maximum time the broker may wait to fill the Fetch response with fetch.min.bytes of messages. Type: integer |
fetch.message.max.bytes | C | 1.。 1000 .. 1000000000 | 1048576 | 中 | Initial maximum number of bytes per topic+partition to request when fetching messages from the broker. If the client encounters a message larger than this value it will gradually try to increase it until the entire message can be fetched. Type: integer |
max.partition.fetch.bytes | C | 1.。 1000 .. 1000000000 | 1048576 | 中 | Alias for fetch.message.max.bytes : Initial maximum number of bytes per topic+partition to request when fetching messages from the broker. If the client encounters a message larger than this value it will gradually try to increase it until the entire message can be fetched. Type: integer |
fetch.max.bytes | C | 0。。 2147483135 | 52428800 | 中 | Maximum amount of data the broker shall return for a Fetch request. Messages are fetched in batches by the consumer and if the first message batch in the first non-empty partition of the Fetch request is larger than this value, then the message batch will still be returned to ensure the consumer can make progress. The maximum message batch size accepted by the broker is defined via message.max.bytes (broker config) or max.message.bytes (broker topic config). fetch.max.bytes is automatically adjusted upwards to be at least message.max.bytes (consumer config). Type: integer |
fetch.min.bytes | C | 1.。 100000000 | 1 | low | Minimum number of bytes the broker responds with. If fetch.wait.max.ms expires the accumulated data will be sent to the client regardless of this setting. Type: integer |
fetch.error.backoff.ms | C | 0。。 1 .. 300000 | 500 | 中 | How long to postpone the next fetch request for a topic+partition in case of a fetch error. Type: integer |
offset.store.method | C | none, file, broker | broker | low | DEPRECATED Offset commit store method: 'file' - DEPRECATED: local file store (offset.store.path, et.al), 'broker' - broker commit store (requires Apache Kafka 0.8.2 or later on the broker). Type: enum value |
isolation.level | C | read_uncommitted, read_committed | read_committed | high | Controls how to read messages written transactionally: read_committed - only return transactional messages which have been committed. read_uncommitted - return all messages, even transactional messages which have been aborted. Type: enum value |
check.crcs | C | 真的,假的 | false | 中 | Verify CRC32 of consumed messages, ensuring no on-the-wire or on-disk corruption to the messages occurred. This check comes at slightly increased CPU usage. Type: boolean |
allow.auto.create.topics | C | 真的,假的 | false | low | Allow automatic topic creation on the broker when subscribing to or assigning non-existent topics. The broker must also be configured with auto.create.topics.enable=true for this configuraiton to take effect. Note: The default value (false) is different from the Java consumer (true). Requires broker version >= 0.11.0.0, for older broker versions only the broker configuration applies. Type: boolean |
client.rack | * | low | A rack identifier for this client. This can be any string value which indicates where this client is physically located. It corresponds with the broker config broker.rack . Type: string | ||
transactional.id | P | high | Enables the transactional producer. The transactional.id is used to identify the same transactional producer instance across process restarts. It allows the producer to guarantee that transactions corresponding to earlier instances of the same producer have been finalized prior to starting any new transactions, and that any zombie instances are fenced off. If no transactional.id is provided, then the producer is limited to idempotent delivery (if enable.idempotence is set). Requires broker version >= 0.11.0. Type: string | ||
transaction.timeout.ms | P | 1000 .. 2147483647 | 60000 | 中 | The maximum amount of time in milliseconds that the transaction coordinator will wait for a transaction status update from the producer before proactively aborting the ongoing transaction. If this value is larger than the transaction.max.timeout.ms setting in the broker, the init_transactions() call will fail with ERR_INVALID_TRANSACTION_TIMEOUT. The transaction timeout automatically adjusts message.timeout.ms and socket.timeout.ms , unless explicitly configured in which case they must not exceed the transaction timeout (socket.timeout.ms must be at least 100ms lower than transaction.timeout.ms ). This is also the default timeout value if no timeout (-1) is supplied to the transactional API methods. Type: integer |
enable.idempotence | P | 真的,假的 | false | high | 当设置为 true 时,生产者将确保消息仅按原始生成顺序成功生成一次。 The following configuration properties are adjusted automatically (if not modified by the user) when idempotence is enabled: max.in.flight.requests.per.connection=5 (must be less than or equal to 5), retries=INT32_MAX (must be greater than 0), acks=all , queuing.strategy=fifo . Producer instantation will fail if user-supplied configuration is incompatible. Type: boolean |
enable.gapless.guarantee | P | 真的,假的 | false | low | EXPERIMENTAL: subject to change or removal. When set to true , any error that could result in a gap in the produced message series when a batch of messages fails, will raise a fatal error (ERR__GAPLESS_GUARANTEE) and stop the producer. Messages failing due to message.timeout.ms are not covered by this guarantee. Requires enable.idempotence=true . Type: boolean |
队列缓冲最大消息 | P | 1.。 10000000 | 100000 | high | 生产者队列中允许的最大消息数。 This queue is shared by all topics and partitions. Type: integer |
队列缓冲最大千字节 | P | 1.。 2147483647 | 1048576 | high | 生产者队列允许的最大消息总大小总和。 This queue is shared by all topics and partitions. This property has higher priority than queue.buffering.max.messages. Type: integer |
队列缓冲区.max.ms | P | 0。。 0 .. 900000 | 5 | high | 在构造要传输给代理的消息批处理 (MessageSet) 之前,等待生产者队列中的消息累积的延迟(以毫秒为单位)。 A higher value allows larger and more effective (less overhead, improved compression) batches of messages to accumulate at the expense of increased message delivery latency. Type: float |
linger.ms | P | 0。。 0 .. 900000 | 5 | high | Alias for queue.buffering.max.ms : Delay in milliseconds to wait for messages in the producer queue to accumulate before constructing message batches (MessageSets) to transmit to brokers. A higher value allows larger and more effective (less overhead, improved compression) batches of messages to accumulate at the expense of increased message delivery latency. Type: float |
message.send.max.retries | P | 0。。 2147483647 | 2147483647 | high | 重试发送失败的消息多少次。 Note: retrying may cause reordering unless enable.idempotence is set to true. Type: integer |
retries | P | 0。。 2147483647 | 2147483647 | high | message.send.max.retries 的别名 :重试发送失败的消息多少次。 Note: retrying may cause reordering unless enable.idempotence is set to true. Type: integer |
重试.backoff.ms | P | 1.。 1 .. 300000 | 100 | 中 | The backoff time in milliseconds before retrying a protocol request. Type: integer |
queue.buffering.backpressure.threshold | P | 1.。 1000000 | 1 | low | The threshold of outstanding not yet transmitted broker requests needed to backpressure the producer's message accumulator. If the number of not yet transmitted requests equals or exceeds this number, produce request creation that would have otherwise been triggered (for example, in accordance with linger.ms) will be delayed. A lower number yields larger and more effective batches. A higher value can improve latency when using compression on slow machines. Type: integer |
压缩编解码器 | P | 无、gzip、snappy、lz4、zstd | 无 | 中 | compression codec to use for compressing message sets. This is the default value for all topics, may be overridden by the topic configuration property compression.codec . Type: enum value |
压缩类型 | P | 无、gzip、snappy、lz4、zstd | 无 | 中 | compression.codec 的别名:用于压缩消息集的压缩编解码器。 This is the default value for all topics, may be overridden by the topic configuration property compression.codec . Type: enum value |
batch.num.messages | P | 1.。 1000000 | 10000 | 中 | 一个 MessageSet 中批处理的最大消息数。 The total MessageSet size is also limited by batch.size and message.max.bytes. Type: integer |
批次大小 | P | 1.。 2147483647 | 1000000 | 中 | 一个 MessageSet 中批处理的所有消息的最大大小(以字节为单位),包括协议帧开销。 This limit is applied after the first message has been added to the batch, regardless of the first message's size, this is to ensure that messages that exceed batch.size are produced. The total MessageSet size is also limited by batch.num.messages and message.max.bytes. Type: integer |
delivery.report.only.error | P | 真的,假的 | false | low | Only provide delivery reports for failed messages. Type: boolean |
sticky.partitioning.linger.ms | P | 0。。 0 .. 900000 | 10 | low | Delay in milliseconds to wait to assign new sticky partitions for each topic. By default, set to double the time of linger.ms. To disable sticky behavior, set to 0. This behavior affects messages with the key NULL in all cases, and messages with key lengths of zero when the consistent_random partitioner is in use. These messages would otherwise be assigned randomly. A higher value allows for more effective batching of these messages. Type: integer |
限制
基于 Kafka 的外部流有一些限制,因为 TimePlus 不控制外部流的储存或数据格式。
- The UI wizard to setup Kafka External Stream only supports JSON or TEXT. To use Avro, Protobuf, or schema registry service, you need the SQL DDL.
_tp_time
可在外部流中使用(自 Proton 1.3.30 起)。_tp_time
is available in the external streams (since Proton 1.3.30)._tp_append_time
is set only when message timestamp is an append time.- 与正常流不同的是,外部流没有历史存储。 In recent versions, you can run
table(kafka_ext_stream)
but it will scan all messages in the topic, unless you are running acount()
. If you need to frequently run query for historical data, you can use a Materialized View to query the Kafka External Stream and save the data in Timeplus columnar or row storage. This will improve the query performance. - 在 Timeplus 中没有关于外部流的保留政策。 您需要在 Kafka/Confluent/Redpanda 配置保留政策。 如果外部系统不再提供数据,则不能在 Timeplus 搜索。