Functions
Timeplus supports ANSI-SQL standard syntax. The following functions are provided for various use cases. Please contact us if you need more functions.
Type Conversion
to_time
to_time(time_string [, default_time_zone] [,defaultValue])
to convert the string to a datetime64 value
For example to_time('1/2/22')
or to_time('1/2/22','America/New_York')
Some of the common timezones are:
UTC
: same asGMT
(Greenwich Mean Time)EST
: US Eastern TimeMST
: US Mountain TimePST8PDT
: US Pacific TimeAmerica/New_York
: same asEST
America/Los_Angeles
: same asPST8PDT
America/Vancouver
: same asPST8PDT
Asia/Shanghai
: same asPRC
For the full list of possible timezones, please check "TZ database name" column in the wikipedia page.
You can also convert the time between timezones via to_timezone
to_int
to_int(string)
Convert a string to an integer.
to_float
to_float(string)
Convert a string to a float number, e.g. to_float('3.1415926')
to_decimal
to_decimal(number_or_string, scale)
For example to_decimal('3.1415926',2)
to get 3.14
to_string
Convert any data type to a string, so that you can do other string operations, such as concat
to_bool
Convert the value to a bool
type. e.g. select to_bool(1), to_bool(true),to_bool(True),to_bool('true')
all return true
. Please note you cannot run to_bool('True')
cast
Convert an input value to the specified data type. Three syntax variants are supported:
cast(x, T)
cast(x as t)
x::t
While
- x — A value to convert. May be of any type.
- T — The name of the target data type. String.
- t — The target data type
For example
select
cast('1', 'integer'),
cast('1' as integer),
cast(3.1415, 'decimal(3, 2)'),
json_extract_string('{"a":"001"}','a')::integer
to_type_name
to_type_name(x)
to show the type name of the argument x
. This is mainly for troubleshooting purpose to understand the date type for a function call.
Access Compound Type
array_cast
array_cast(element1,element2,..)
create a new array with the given elements, e.g. array_cast(1,2)
will get [1,2]
Please note, the elements should be in the same type, such as array_cast('a','n')
, not array_cast('a',0)
length
length(array)
Get the length of the array.
array[index]
You can easily access any element in the array, just using arrayName[index], such as topValues[2]
The first element's index is 1, instead of 0.
index_of
index_of(arr,x)
returns the index of x
in the array arr
. The first element's index is 1. Return 0 if x
is not in the array.
array_compact
array_compact(arr)
Removes consecutive duplicate elements from an array, e.g. array_compact([1,1,2,2,2,3,4,4,5])
returns [1,2,3,4,5]
array_concat
array_concat(array1,array2)
Concatenates two arrays into one.
array_difference
array_difference(arr)
calculates the difference between adjacent array elements. Returns an array where the first element will be 0, the second is the difference between a[1] - a[0]
, etc. e.g. array_difference([1,2,3,5])
returns [0,1,1,2]
array_distinct
array_distinct(arr)
returns an array containing the distinct elements only. e.g. array_distinct([1,1,2,3,3,1])
return [1,2,3], while array_compact([1,1,2,3,3,1])
returns [1,2,3,1]
array_flatten
array_flatten(array1, array2,..)
Converts an array of arrays to a flat array. e.g. array_flatten([[[1]], [[2], [3]]])
returns [1,2,3]
array_string_concat
array_string_concat(arr[, separator])
Concatenates string representations of values listed in the array with the separator. separator
is an optional parameter: a constant string, set to an empty string by default.
For example array_string_concat([1,2,3],'-')
to get a string 1-2-3
array_join
array_join(an_array)
This is a special function. group_array(col)
to group the column value from multiple rows to a single value in a row. array_join
does the opposite: it can convert one row with an array value to multiple rows.
For example select array_join([10,20]) as v, 'text' as t
will get 2 rows
v | t |
---|---|
10 | text |
20 | text |
array_pop_back
array_pop_back(array)
removes the last item from the array. e.g. array_pop_back([1,2,3])
returns [1,2]
array_pop_front
array_pop_front(array)
removes the first item from the array. e.g. array_pop_front([1,2,3])
returns [2,3]
array_push_back
array_push_back(array, value)
add the value to the array as the last item. e.g. array_push_back([1,2,3],4)
returns [1,2,3,4]
array_push_front
array_push_front(array, value)
add the value to the array as the first item. e.g. array_push_front([1,2,3],4)
returns [4,1,2,3]
array_product
array_product(array)
multiplies elements in the array. e.g. array_product([2,3,4])
returns 24 (2 x 3 x 4)
array_resize
array_resize(array, size [,extender])
changes the length of the array. If size
is smaller than the current length of the array, the array is truncated. Otherwise, a new array with the specified size is created, filling value with the specified extender
. e.g. array_resize([3,4],1)
returns [3]. array_resize([3,4],4,5)
returns [3,4,5,5]
array_reverse
array_reverse(arr)
returns an array with the reversed order of the original array, e.g. array_reverse([1,2,3])
returns [3,2,1]
array_slice
array_slice(arr, offset [,length])
returns a slice of the array. If length
is not specified, then slice to the end of the array, e.g. array_slice([1,2,3,4,5],2)
returns [2,3,4,5]. If offset
is greater than the array lenghth, returns an empty array []. If length
is specfied, this is the lenght of new array, e.g. array_slice([1,2,3,4,5],2,3)
returns [2,3,4]
array_uniq
array_uniq(arr)
returns the number of unique values in the array, e.g. array_uniq([1,1,2,3])
returns 3
array_zip
array_zip(arr1,arr2,.. arrN)
group elements from different arrays to a new array of tuples. e.g. array_zip([1,2,3],['a','b','c'])
returns [(1,'a'),(2,'b'),(3,'c')]
array_all
array_all([func,] array)
returns 1(true) or 0(false) if all elements in the array meet the condition. For example, array_all([1,2])
return 1, and array_all([0,0])
return 0. You can pass a lambda function to it as the first argument to customize the condition check, such as array_all(x->x%2==0,[2,4,6])
to check whether each element in the array is even. It returns 1.
array_avg
array_avg([func,] array)
returns the average value in the array. For example, array_avg([2,6])
return 4. You can pass a lambda function to it as the first argument to apply on each element before calculating the average, such as array_avg(x->x*x,[2,6])
to get the average for 2*2 and 6*6, which is 20.
array_count
array_count([func,] array)
returns the number of elements in the array meeting the condition. By default, check whether the value is not 0. e.g. array_count([0,0,1,2])
returns 2. You can pass a lambda function to it as the first argument to apply on each element before calculating the count, such as array_count(x->x>1,[1,2])
to get the number of numbers which is greater than 1, it returns 1.
array_cum_sum
array_cum_sum([func,] array)
returns an array of partial sums of elements in the source array (a running sum). e.g. array_cum_sum([1,1,1])
returns [1,2,3]. You can pass a lambda function to it as the first argument to apply on each element before calculating the moving sum, such as array_cum_sum(x->x*x,[1,2])
to get [1,5]
array_exists
array_exists([func,] array)
returns 1(true) or 0(false) if any element in the array meet the condition. For example, array_exists([0,1,2])
return 1, and array_exists([0,0])
return 0. You can pass a lambda function to it as the first argument to customize the condition check, such as array_exists(x->x%2==0,[2,3,4])
to check whether any element in the array is even. It returns 1. To check whether all elements meet the condition, use array_all
array_filter
array_filter(func, array)
returns an array containing only the element that matches the condition of the specified function. e.g. array_filter(x->x%2==0, [1,2,3,4])
returns [2,4]
array_first
array_first(func, array)
returns the first element that matches the condition of the specified function. e.g. array_first(x->x%2==0, [1,2,3,4])
returns 2.
array_first_index
array_first_index(func, array)
returns the index of the first element that matches the condition of the specified function. e.g. array_first_index(x->x%2==0, [1,2,3,4])
returns 2.
array_last
array_last(func, array)
returns the last element that matches the condition of the specified function. e.g. array_last(x->x%2==0, [1,2,3,4])
returns 4. If nothing found, return 0.
array_last_index
array_last_index(func, array)
returns the index of the last element that matches the condition of the specified function. e.g. array_last_index(x->x%2==0, [1,2,3,4])
returns 4. If nothing found, return 0.
array_map
array_map(func, array)
apply the function to every element in the array and returns a new array. e.g. array_map(x->x*x,[1,2])
returns [1,4]
array_max
array_max(func, array)
apply the function to every element in the array and then returns the maximum value e.g. array_max(x->x*x,[1,2])
returns 4
array_min
array_min(func, array)
apply the function to every element in the array and then returns the minimum value e.g. array_min(x->x*x,[1,2])
returns 1
array_sort
array_sort(func, array)
sorts the array elements in asecending order. e.g. array_sort([3,2,5,4])
returns [2,3,4,5]. You can pass a lambda function to it as the first argument to apply the function before the sort, e.g. array_sort(x->-x,[3,2,5,4])
returns [5,4,3,2]
array_sum
array_sum([func,] array)
returns the sum value in the array. For example, array_sum([2,6])
return 8. You can pass a lambda function to it as the first argument to apply on each element before calculating the sum, such as array_sum(x->x*x,[2,6])
to get the sum for 2*2 and 6*6, which is 40.
map[key]
You can easily access any element in the map, just using mapName[keyName], such as kv['key1']
map_cast
map_cast(array1, array2)
to generate a map with keys from array1
and values from array2
(these 2 arrays should be with same size). For example map_cast(['k1','k2'],[91,95])
will get {'k1':91,'k2':95}
Alternatively, you can use map_cast(key1,value1,key2,value2..)
tuple_cast
tuple_cast(item1,item2)
to generate a tuple with these 2 elements. You can also use the shortcut syntax: (item1,item2)
to create the tuple directly.
Process Date and Time
year
year(date)
Get the year, for example year(today())
will be 2022
.
quarter
quarter(date)
Get the quarter, for example quarter(today())
will be 1
if it's in Q1.
month
month(date)
Get the month, for example month(today())
will be 2
if it's Feb.
day
day(date)
Get the the day in the month.
weekday
weekday(date)
Get the the day in the week. Monday is 1. Sunday is 7.
day_of_year
day_of_year(date)
Get the number of the day of the year (1-366).
hour
hour(datetime)
Get the hour of the datetime.
minute
minute(datetime)
Get the minute of the datetime.
second
second(datetime)
Get the second of the datetime.
to_unix_timestamp
Get the UNIX timestamp of the datetime. Returns a number in uint32
For example to_unix_timestamp(now())
gets 1644272032
to_start_of_year
to_start_of_year(date)
Rounds down a date or date with time to the first day of the year. Returns the date.
to_start_of_quarter
to_start_of_quarter(date)
Rounds down a date or date with time to the first day of the quarter. Returns the date.
to_start_of_month
to_start_of_month(date)
Rounds down a date or date with time to the first day of the month. Returns the date.
to_start_of_day
to_start_of_day(date)
Rounds down a date with time to the start of the day.
to_start_of_hour
to_start_of_hour(datetime)
Rounds down a date or date with time to the start of the hour.
to_start_of_minute
to_start_of_minute(datetime)
Rounds down a date or date with time to the start of the minute.
to_start_of_second
to_start_of_second(datetime64)
Rounds down a date or date with time to the start of the second.
Unlike other to_start_of_
functions, this function expects a datetime with millisecond, such as to_start_of_second(now64())
to_date
to_date(string)
convert a date string to a date type, e.g. to_date('1953-11-02')
to_datetime
to_datetime(value)
convert the value to a datetime type, e.g. to_datetime(1655265661)
or to_datetime(today())
today
today()
Returns the current date.
to_YYYYMM
to_YYYYMM(date)
Get a number. For example to_YYYYMM(today())
will get the number 202202
to_YYYYMMDD
to_YYYYMMDD(date)
Get a number.
to_YYYYMMDDhhmmss
to_YYYYMMDDhhmmss(date)
Get a number.
to_timezone
to_timezone(datetime_in_a_timezone,target_timezone)
Convert the datetime from one timezone to the other.
For the full list of possible timezones, please check "TZ database name" column in the wikipedia page. For the common timezones, pls check to_time
For example,
SELECT
to_time('2022-05-16', 'America/New_York') AS t1, to_timezone(t1, 'UTC') AS t2
Output:
t1 | t2 |
---|---|
2022-05-16 00:00:00.000 | 2022-05-16 04:00:00.000 |
format_datetime
format_datetime(time,format,timezone)
Format the datetime as a string. The 3rd argument is optional. The following placeholders are supported
Placeholder | Description | Output String |
---|---|---|
%Y | Year with 4 digits | 2022 |
%y | Year with 2 digits | 22 |
%m | Month with 2 digits | 01 |
%d | Day with 2 digits | 02 |
%F | short YYYY-MM-DD date, equivalent to %Y-%m-%d | 2022-01-02 |
%D | short MM/DD/YY date, equivalent to %m/%d/%y | 01/02/22 |
%H | Hour with 2 digits (00-23) | 13 |
%M | Minute with 2 digits (00-59) | 44 |
%S | Second with 2 digits (00-59) | 44 |
%w | Weekday as a decimal number with Sunday as 0 (0-6) | 1 |
date_diff
date_diff(unit,begin,end)
Calculate the difference between begin
and end
and produce a number in unit
. For example date_diff('second',window_start,window_end)
date_diff_within
date_diff_within(timegap,time1, time2)
Return true or false. This function only works in stream-to-stream join. Check whether the gap between time1
and time2
are within the specific range. For example date_diff_within(10s,payment.time,notification.time)
to check whether the payment time and notification time are within 10 seconds or less.
date_trunc
date_trunc(unit, value[, timezone])
Truncates date and time data to the specified part of date. For example, date_trunc('month',now())
to get the datetime at the beginning of the current month. Possible unit values are:
- year
- quarter
- month
- day
- hour
- minute
- second
date_add
It supports both date_add(unit, value, date)
and a shortcut solution data_add(date,timeExpression)
date_add(HOUR, 2, now())
will get a new datetime in 2 hours, whiledate_add(HOUR, -2, now())
will get a new datetime 2 hours back.date_add(now(),2h)
anddate_add(now(),-2h)
also work
date_sub
It supports both date_sub(unit, value, date)
and a shortcut solution data_sub(date,timeExpression)
date_sub(HOUR, 2, now())
will get a new datetime 2 hours backdate_sub(now(),2h)
also work
earliest_timestamp
earliest_timestamp()
returns "1970-1-1 00:00:00"
earliest_ts
earliest_ts()
shortcut for earliest_timestamp()
Process JSON
json_extract_int
json_extract_int(json, key)
to get the integer value from the specified JSON document and key. For example json_extract_int('{"a":10,"b":3.13}','a')
will get the number 10
json_extract_float
json_extract_float(json, key)
to get the float value from the specified JSON document and key. For example json_extract_int('{"a":10,"b":3.13}','b')
will get the float value 3.13
json_extract_bool
json_extract_bool(json, key)
to get the boolean value from the specified JSON document and key. For example json_extract_bool('{"a":true}','a')
will get the boolean value true
or 1
json_extract_string
json_extract_string(json, key)
to get the string value from the specified JSON document and key. For example json_extract_string('{"a":true,"b":{"c":1}}','b')
will get the string value {"c":1}
and you can keep applying JSON functions to extract the values.
json_extract_array
json_extract_array(json, key)
to get the array value from the specified JSON document and key. For example json_extract_array('{"a": "hello", "b": [-100, 200.0, "hello"]}', 'b')
will get the array value ['-100','200','"hello"']
If the entire JSON document is an array, the 2nd parameter key
can be omitted to turn the json string as an array, e.g. json_extract_array(arrayString)
json_extract_keys
json_extract_keys(jsonStr)
to parse the JSON string and extracts the keys. e.g. select '{"system_diskio_name":"nvme0n1"}' as tags,json_extract_keys(tags)
will get an array: [ "system_diskio_name" ]
is_valid_json
is_valid_json(str)
to check whether the given string is a valid JSON or not. Return true(1) or false(0)
json_has
json_has(json, key)
to check whether specified key exists in the JSON document. For example json_has('{"a":10,"b":20}','c')
returns 0(false).
json_value
json_value(json, path)
allows you to access the nested JSON objects. For example, json_value('{"a":true,"b":{"c":1}}','$.b.c')
will return the number 1
json_query
json_query(json, path)
allows you to access the nested JSON objects as JSON array or JSON object. If the value doesn't exist, an empty string will be returned. For example, json_query('{"a":true,"b":{"c":1}}','$.b.c')
will return an array with 1 element [1]
In a more complex example, json_query('{"records":[{"b":{"c":1}},{"b":{"c":2}},{"b":{"c":3}}]}','$.records[*].b.c')
will get [1,2,3]
Process text
lower
lower(str)
Converts ASCII Latin symbols in a string to lowercase.
upper
upper(str)
Converts ASCII Latin symbols in a string to uppercase.
format
format(template,args)
Formatting constant pattern with the string listed in the arguments.
For example, format('{} {}', 'Hello', 'World')
gets Hello World
concat
concat(str1,str2 [,str3])
Combine 2 or more strings as a single string. For example, concat('95','%')
to get 95%. You can also use ||
as the shortcut syntax, e.g. '95' || '%'
Each parameter in this function need to be a string. You can use to_string function to convert them, for example to_string(95) || '%'
substr
substr(str,index [,length])
Returns the substring of str
from index
(starting from 1). length
is optional.
trim
trim(string)
Removes all specified characters from the start or end of a string. By default removes all consecutive occurrences of common whitespace (ASCII character 32) from both ends of a string.
split_by_string
split_by_string(sep,string)
Splits a string into substrings separated by a string. It uses a constant string sep
of multiple characters as the separator. If the string sep
is empty, it will split the string string
into an array of single characters.
For example split_by_string('b','abcbxby')
will get an array with string ['a','c','x','y']
match
match(string,pattern)
determine whether the string matches the given regular expression. For example, to check whether the text contains a sensitive AWS ARN, you can run match(text,'arn:aws:kms:us-east-1:\d{12}:key/.{36}')
multi_search_any
multi_search_any(text, array)
determine whether the text contains any of string from the given array. For example, to check whether the text contains any sensitive keywords, you can run multi_search_any(text,['password','token','secret'])
replace_one
replace_one(string,pattern,replacement)
Replace pattern
with the 3rd argument replacement
in string
.
For example replace_one('abca','a','z')
will get zbca
replace
replace(string,pattern,replacement)
Replace pattern
with the 3rd argument replacement
in string
.
For example replace('aabc','a','z')
will get zzbc
replace_regex
replace_regex(string,pattern,replacement)
Replaces all occurrences of the pattern.
This can be used to mask data, e.g. to hide the full phone number, you can run replace_regex('604-123-4567','(\\d{3})-(\\d{3})-(\\d{4})','\\1-***-****')
to get 604-***-****
extract
Process plain text with regular expression and extract the content. For example, extract('key1=value1, key2=value2','key1=(\\w+)')
, this will get “value1”. If the log lines are put into a single text column, you can create a view with the extracted fields, e.g.
create view logs as
select extract(value, 'key1=(\\w+)') as key1,
extract(value, 'key2=(\\w+)') as key2
from log_stream
extract_all_groups
extract_all_groups(haystack, pattern)
Matches all groups of the haystack
string using the pattern
regular expression. Returns an array of arrays, where the first array includes keys and the second array includes all values.
SELECT
extract_all_groups('v1=111, v2=222, v3=333', '("[^"]+"|\\w+)=("[^"]+"|\\w+)') as groups
-- return [ [ "v1", "v2", "v3" ], [ "111", "222", "333" ] ]
extract_all_groups_horizontal
extract_all_groups_horizontal(haystack, pattern)
Matches all groups of the haystack
string using the pattern
regular expression. Returns an array of arrays, where the first array includes all fragments matching the first group, the second array matching the second group, etc.
SELECT
extract_all_groups('v1=111, v2=222, v3=333', '("[^"]+"|\\w+)=("[^"]+"|\\w+)') as groups
-- [ [ "v1", "111" ], [ "v2", "222" ], [ "v3", "333" ] ]
grok
grok(string,pattern)
Extract value from plan text without using regular expression. e.g. SELECT grok('My name is Jack. I am 23 years ago.','My name is %{DATA:name}. I am %{INT:age} years ago.') as m
will get {"name":"Jack","age":"23"}
as the m
.
Please note all keys and values in the returned map is in string type. You can convert them to other type, e.g. (m['age'])::int
coalesce
coalesce(value1, value2,..)
Checks from left to right whether NULL
arguments were passed and return the first non-NULL
argument. If you get error messages related to Nullable type, e.g. "Nested type array(string) cannot be inside Nullable type", you can use this function to turn the data into non-NULL
For example json_extract_array(coalesce(raw:payload, ''))
hex
hex(argument)
Returns a string containing the argument’s hexadecimal representation. argument
can be any type.
Unique identifier
uuid
uuid()
or uuid(x)
Generate a universally unique identifier (UUID) which is a 16-byte number used to identify records. In order to generate multiple UUID in one row, pass a parameter in each function call, such as SELECT uuid(1) as a, uuid(2) as b
Otherwise if there is no parameter while calling multiple uuid
functions in one SQL statement, the same UUID value will be returned.
Logic
if
if(condition,yesValue,noValue)
Controls conditional branching.
For example if(1=2,'a','b')
will get b
multi_if
multi_if(condition1, then1, condition2, then2.. ,else)
An easier way to write if/self or case/when.
Aggregation
count
count(*)
to get the row number, or count(col)
to get the number of rows when col
is not NULL
count_distinct
count_distinct(col)
to get the number of unique value for the col
column. Same as count(distinct col)
count_if
count_if(condition)
to apply a filter with condition
and get the number of records. e.g. count_if(speed_kmh>80)
distinct
distinct(col)
to get the distinct value for the col
column.
unique
unique(<column_name1>[, <column_name2>, ...])
: Calculates the approximate number of different values of the columns.
unique_exact
unique_exact(<column_name1>[, <column_name2>, ...])
Calculates the exact number of different values of the columns.
unique_exact_if
unique_exact_if(col,condition)
to apply a filter with condition
and get the distincat count of col
, e.g. to get the cars with high speed unique_exact_if(cid,speed_kmh>80)
min
min(<column_name>)
: minimum value of a column. For String column, the comparison is lexicographic order.
max
max(<column_name>)
: maximum value of a column. For String column, the comparison is lexicographic order.
sum
sum(<column_name>)
: sum of the columns. Only works for numerics.
avg
avg(<column_name>)
: average value of a column (sum(column) / count(column)). Only works for numeric column.
median
median(<column_name>)
Calculate median of a numeric data sample.
quantile
quantile(column,level)
Calculate an approximate quantile of a numeric data sequence. e.g. quantile(a,0.9)
to get the P90 for the column and quantile(a,0.5)
to get the median number
p90
short for quantile(a,0.9)
p95
short for quantile(a,0.95)
p99
short for quantile(a,0.99)
top_k
top_k(<column_name>,K [,true/false])
: Top frequent K items in column_name. Return an array.
e.g. top_k(cid, 3)
may get [('c01',1200),('c02,800)',('c03',700)]
if these 3 ids appear most frequently in the aggregation window.
If you don't need the event count, you can set false for the 3rd parameter, e.g. top_k(cid, 3, false)
may get ['c01','c02','c03']
min_k
min_k(<column_name>,K [,context_column])
: The least K items in column_name. Return an array. You can also add a list of columns to get more context of the values in same row, such as min_k(price,3,product_id,last_updated)
This will return an array with each element as a tuple, such as [(5.12,'c42664'),(5.12,'c42664'),(15.36,'c84068')]
max_k
max_k(<column_name>,K[,context_column])
: The greatest K items in column_name. You can also add a list of columns to get more context of the values in same row, such as max_k(price,3,product_id,last_updated)
arg_min
arg_min(argument, value_column)
Gets the value in argument
column for a minimal value in value_column
. If there are several different values of argument
for minimal values of value_column
, returns the first of these values encountered. You can achieve the same query with min_k(value_column,1, argument)[1].2
. But this is much easier.
arg_max
arg_max(argument, value_column)
Gets the value in argument
column for a maximum value in value_column
. If there are several different values of argument
for maximum values of value_column
, returns the first of these values encountered. You can achieve the same query with max_k(value_column,1, argument)[1].2
. But this is much easier.
group_array
group_array(<column_name>)
to combine the values of the specific column as an array. For example, if there are 3 rows and the values for this columns are "a","b","c". This function will generate a single row and single column with value ['a','b','c']
moving_sum
moving_sum(column)
returns an array with the moving sum of the specified column. For example, select moving_sum(a) from(select 1 as a union select 2 as a union select 3 as a)
will return [1,3,6]
Math
abs
abs(value)
returns the absolute value of the number. If the a<0, then return -a.
round
round(x [,N])
Rounds a value to a specified number of decimal places.
- If
N
is omitted, we consider N as 0 and the function rounds the value to the near integer, e.g.round(3.14)
as 3 - If
N
>0, the function rounds the value to the right of the decimal point, e.g.round(3.14,1)
as 3.1 - If
N
<0, the function rounds the value to the left of the decimal point. e.g.round(314.15,-2)
as 300
e
e()
returns a float
number that is close to the number e
pi
pi()
returns a float
number that is close to the number π
exp
exp(x)
returns a float
number that is close to the exponent of the argument x
.
exp2
exp2(x)
returns a float
number that is close to 2 to the power of x
.
exp10
exp10(x)
returns a float
number that is close to 10 to the power of x
.
log
log(x)
returns a float
number that is close to the natural logarithm of the argument x
.
log2
log2(x)
returns a float
number that is close to the binary logarithm of the argument x
.
log10
log10(x)
returns a float
number that is close to the decimal logarithm of the argument x
.
sqrt
sqrt(x)
returns a float
number that is close to the square root of the argument x
.
cbrt
cbrt(x)
returns a float
number that is close to the cubic root of the argument x
.
lgamma
lgamma(x)
the logarithm of the gamma function.
tgamma
tgamma(x)
the gamma function
sin
sin(x)
the sine
cos
cos(x)
the cosine
tan
tan(x)
the tangent
asin
asin(x)
the arc sine
acos
acos(x)
the arc cosine
atan
atan(x)
the arc tangent
pow
pow(x,y)
returns a float
number that is close to x
to the power of y
power
power(x,y)
returns a float
number that is close to x
to the power of y
sign
sign(x)
returns the sign of the number x
. If x<0, return -1. If x>0, return 1. Otherwise, return 0.
degrees
degrees(x)
converts the input value in radians to degrees. E.g. degress(3.14)
returns 180.
radians
radians(x)
converts the input value in degrees to radians . E.g. radians(180)
returns 3.14.
is_finite
is_finite(x)
return 1 when the value x
is not infinite and not a NaN, othewise return 0.
is_infinite
is_infinite(x)
to return 1 when the value x
is infinite, otherwise return 0.
is_nan
is_nan(x)
to return 1 if the x
is Not-a-Number(NaN), otherwise return 0.
Hash
Hash functions can be used for the deterministic pseudo-random shuffling of elements.
md5
md5(string)
Calculates the MD5 from a string and returns the resulting set of bytes as fixed_string(16)
. If you want to get the same result as output by the md5sum utility, use lower(hex(md5(s)))
.
md4
md4(string)
Calculates the MD4 from a string and returns the resulting set of bytes as fixed_string(16)
.
Financial
xirr
Calculates the internal rate of return of an investment based on a specified series of potentially irregularly spaced cash flows.
xirr(cashflow_column,date_column [, rate_guess])
Geo Location
point_in_polygon
Checks whether the point belongs to the polygon. point_in_polygon((x,y),[(a,b),(c,d)..])
e.g. SELECT point_in_polygon((3., 3.), [(6, 0), (8, 4), (5, 8), (0, 2)]) AS res
returns 1
since the point (3,3) is in the defined polygon.
geo_distance
Calculates the distance on WGS-84 ellipsoid. geo_distance(lon1,lat1,lon2,lat2)
Streaming processing
table
table(stream)
turns the unbounded data stream as a bounded table, and query its historical data. For example, you may load the clickstream data from a Kafka topic into the clicks
stream in Timeplus. By default, if you run SELECT .. FROM clicks ..
This is a streaming query with unbounded data. The query will keep sending you new results whenever it's available. If you only need to analyze the past data, you can put the stream into the table
function. Taking a count
as an example:
- running
select count(*) from clicks
will show latest count every 2 seconds and never ends, until the query is cancelled by the user - running
select count(*) from table(clicks)
will return immediately with the row count for the historical data for this data stream.
You can create views such as create view histrical_view as select * from table(stream_name)
, if you want to query the data in the table mode more than once. This may work well for static data, such as lookup information(city names and their zip code).
Learn more about Non-streaming queries.
tumble
tumble(stream [,timeCol], windowSize)
Create a tumble window view for the data stream, for example tumble(iot,5s)
will create windows for every 5 seconds for the data stream iot
. The SQL must end with group by
with either window_start
or window_end
or both.
hop
hop(stream [,timeCol], step, windowSize)
Create a hopping window view for the data stream, for example hop(iot,1s,5s)
will create windows for every 5 seconds for the data stream iot
and moving the window forwards every second. The SQL must end with group by
with either window_start
or window_end
or both.
session
session(stream [,timeCol], idle, [maxLength,] [startCondition,endCondition] )
Create dynamic windows based on the activities in the data stream.
Parameters:
stream
a data stream, a view, or a CTE/subquerytimeCol
optional, by default it will be__tp_time
(the event time for the record)idle
how long the events will be automatically splitted to 2 session windowsmaxLength
the max length of the session window. Optional. Default value is the 5 times ofidle
[startCondition, endCondition]
Optional. If specified, the session window will start when thestartCondition
is met and will close whenendCondition
is met. You can use[expression1, expression2]
to indicate start and end events will be included in the session, or(expression1, expression2]
to indicate the ending events will be included but not the starting events.
For example, if the car keeps sending data when it's moving and stops sending data when it's parked or waiting for the traffic light
session(car_live_data, 1m) partition by cid
will create session windows for each car with 1 minute idle time. Meaning if the car is not moved within one minute, the window will be closed and a new session window will be created for future events. If the car keeps moving for more than 5 minutes, different windows will be created (every 5 minutes), so that as analysts, you can get near real-time results, without waiting too long for the car to be stopped.session(car_live_data, 1m, [speed>50,speed<50)) partition by cid
create session windows to detect when the car is speeding. The first event with speed over 50 will be included, and the last event with speed lower than 50 will not be included in the session window.session(access_log, 5m, [action='login',action='logout']) partition by uid
create session windows when the user logins the system and logout. If there is no activity within 5 minutes, the window will be closed automatically.
dedup
dedup(stream, column1 [,otherColumns..] [liveInSecond,limit])
Apply the deduplication at the given data stream with the specified column(s). liveInSecond
is specify how long the keys will be kept in the memory/state. By default forever. But if you only want to avoid duplicating within a certain time peroid, say 2 minutes, you can set 120s
, e.g. dedup(subquery,myId,120s)
The last parameter limit
is optional which is 100000
by default. It limits the max unique keys maintained in the query engine. If the limit reaches, the system will recycle the earliest keys to maintain this limit.
You can cascade this table function like tumble(dedup(table(....
and so far the wrapping order must in this sequence : tumble/hop/session -> dedup -> table.
lag
lag(<column_name> [, <offset=1>][, <default_value>])
: Work for both streaming query and historical query. If you omit the offset
the last row will be compared. E.g.
lag(total)
to get the value of total
from the last row. lag(total, 12)
to get the value from 12 rows ago. lag(total, 12, 0)
to use 0 as the default value if the specified row is not available.
lags
lags(<column_name>, begin_offset, end_offset [, <default_value>])
simliar to lag
function but can get a list of value. e.g. lags(total,1,3)
will return an array for the last 1, last 2 and last 3 values.
latest
latest(<column_name>)
get the latest value for specific column, working with streaming aggregation with group by.
earliest
earliest(<column_name>)
get the earliest value for specific column, working with streaming aggregation with group by.
now
now()
Show the current date time, such as 2022-01-28 05:08:16
If the now() is used in a streaming query, no matter SELECT
or WHERE
or tumble/hop
window, it will reflect the current time when the row is projected.
now64
Similar to now()
but with extra millisecond information, such as 2022-01-28 05:08:22.680
It can be also used in streaming query to show latest datetime with millisecond.
emit_version
emit_version()
to show an auto-increasing number for each emit of streaming query result. It only works with streaming aggregation, not tail or filter.
For example, if you run select emit_version(),count(*) from car_live_data
the query will emit results every 2 seconds and the first result will be with emit_version=0, the second result with emit_version=1. This function is particularly helpful when there are multiple rows in each emit result. For example, you can run a tumble window aggregation with a group by. All results from the same aggregation window will be in the same emit_version. You can then show a chart with all rows in the same aggregation window.