Skip to main content

Process Text

lower

lower(str)

Converts ASCII Latin symbols in a string to lowercase.

upper

upper(str)

Converts ASCII Latin symbols in a string to uppercase.

format

format(template,args) Formatting constant pattern with the string listed in the arguments.

For example, format('{} {}', 'Hello', 'World')gets Hello World

concat

concat(str1,str2 [,str3]) Combine 2 or more strings as a single string. For example, concat('95','%') to get 95%. You can also use || as the shortcut syntax, e.g. '95' || '%' Each parameter in this function needs to be a string. You can use to_string function to convert them, for example to_string(95) || '%'

substr

substr(str,index [,length]) Returns the substring of str from index (starting from 1). length is optional.

trim

trim(string)

Removes all specified characters from the start or end of a string. By default removes all consecutive occurrences of common whitespace (ASCII character 32) from both ends of a string.

split_by_string

split_by_string(sep,string)

Splits a string into substrings separated by a string. It uses a constant string sep of multiple characters as the separator. If the string sep is empty, it will split the string string into an array of single characters.

For example split_by_string('b','abcbxby')will get an array with string ['a','c','x','y']

match

match(string,pattern) determines whether the string matches the given regular expression. For example, to check whether the text contains a sensitive AWS ARN, you can run match(text,'arn:aws:kms:us-east-1:\d{12}:key/.{36}')

multi_search_any

multi_search_any(text, array) determines whether the text contains any of the strings from the given array. For example, to check whether the text contains any sensitive keywords, you can run multi_search_any(text,['password','token','secret'])

replace_one

replace_one(string,pattern,replacement)

Replace pattern with the 3rd argument replacement in string.

For example replace_one('abca','a','z') will get zbca

replace

replace(string,pattern,replacement)

Replace pattern with the 3rd argument replacement in string.

For example replace('aabc','a','z') will get zzbc

replace_regex

replace_regex(string,pattern,replacement)

Replaces all occurrences of the pattern.

This can be used to mask data, e.g. to hide the full phone number, you can run replace_regex('604-123-4567','(\\d{3})-(\\d{3})-(\\d{4})','\\1-***-****') to get 604-***-****

extract

Process plain text with regular expression and extract the content. For example, extract('key1=value1, key2=value2','key1=(\\w+)'), this will get “value1”. If the log lines are put into a single text column, you can create a view with the extracted fields, e.g.

create view logs as
select extract(value, 'key1=(\\w+)') as key1,
extract(value, 'key2=(\\w+)') as key2
from log_stream

extract_all_groups

extract_all_groups(haystack, pattern)

Matches all groups of the haystack string using the pattern regular expression. Returns an array of arrays, where the first array includes keys and the second array includes all values.

SELECT
extract_all_groups('v1=111, v2=222, v3=333', '("[^"]+"|\\w+)=("[^"]+"|\\w+)') as groups
-- return [ [ "v1", "v2", "v3" ], [ "111", "222", "333" ] ]

extract_all_groups_horizontal

extract_all_groups_horizontal(haystack, pattern)

Matches all groups of the haystack string using the pattern regular expression. Returns an array of arrays, where the first array includes all fragments matching the first group, the second array matching the second group, etc.

SELECT
extract_all_groups_horizontal('v1=111, v2=222, v3=333', '("[^"]+"|\\w+)=("[^"]+"|\\w+)') as groups
-- [ [ "v1", "111" ], [ "v2", "222" ], [ "v3", "333" ] ]

extract_key_value_pairs

extract_key_value_pairs(string)

Extract key value pairs from the string and return a map. For example, extract_key_value_pairs('name:neymar, age:31 team:psg,nationality:brazil') will return a map with keys: name, age, team, ad nationality.

For the advanced usage of the function, please check the doc.

grok

grok(string,pattern)

Extract value from plan text without using regular expression. e.g. SELECT grok('My name is Jack. I am 23 years old.','My name is %{DATA:name}. I am %{INT:age} years old.') as m will get {"name":"Jack","age":"23"} as the m.

Please note all keys and values in the returned map are in string type. You can convert them to other type, e.g. (m['age'])::int

coalesce

coalesce(value1, value2,..) Checks from left to right whether NULL arguments were passed and returns the first non-NULL argument. If you get error messages related to Nullable type, e.g. "Nested type array(string) cannot be inside Nullable type", you can use this function to turn the data into non-NULL For example json_extract_array(coalesce(raw:payload, ''))

hex

hex(argument)Returns a string containing the argument’s hexadecimal representation. argument can be any type.

unhex

unhex(string) Performs the opposite operation of hex. It interprets each pair of hexadecimal digits (in the argument) as a number and converts it to the byte represented by the number. The return value is a binary string (BLOB).

uuid

uuid() or uuid(x) Generates a universally unique identifier (UUID) which is a 16-byte number used to identify records. In order to generate multiple UUID in one row, pass a parameter in each function call, such as SELECT uuid(1) as a, uuid(2) as b Otherwise if there is no parameter while calling multiple uuid functions in one SQL statement, the same UUID value will be returned.

base64_encode

base64_encode(string) Encodes a string or fixed_string as base64.

For example base64_encode('hello') returns aGVsbG8=

base64_decode

base64_decode(string) Decode a base64 string to a string.

For example base64_decode('aGVsbG8=') returns hello

base58_encode

base58_encode(string) Encodes a string or fixed_string as base58 in the "Bitcoin" alphabet.

For example base58_encode('hello') returns Cn8eVZg

base58_decode

base58_decode(string) Decode a base58 string to a string.

For example base58_decode('Cn8eVZg') returns hello

format_readable_quantity

format_readable_quantity(number) Returns a rounded number with suffix (thousand, million, billion, etc.) as string. For example, format_readable_quantity(10036) returns "10.04 thousand".

format_readable_size

format_readable_size(number) Returns a rounded number with suffix (KiB, GiB, etc.) as string. For example, format_readable_size(10036) returns "9.80 KiB".