Skip to main content

Local UDF in JavaScript

In addition to Remote UDF, Timeplus Proton also supports JavaScript-based UDF running in the SQL engine. You can develop User-defined scalar functions (UDFs) or User-defined aggregate functions (UDAFs) with modern JavaScript (powered by V8). No need to deploy extra server/service for the UDF. More languages will be supported in the future.

info

The JavaScript-based UDF can run in both Timeplus and Proton local deployments. It runs "locally" in the database engine. It doesn't mean this feature is only available for local deployment.

Register a JS UDF via SQL

Please check CREATE FUNCTION page for the SQL syntax.

Register a JS UDF via Web Console

  1. Open "UDFs" from the navigation menu on the left, and click the 'Register New Function' button.
  2. Specify a function name, such as second_max. Make sure the name won't conflict with built-in functions or other UDF. Description is optional.
  3. Choose the data type for input parameters and return value.
  4. Choose "JavaScript" as the UDF type.
  5. Specify whether the function is for aggregation or not.
  6. Enter the JavaScript source for the UDF. (We will explain more how to write the code.)
  7. Click Create button to register the function.

Arguments

Unlike Remote UDF, the argument names don't matter when you register a JS UDF. Make sure you the list of arguments matches the input parameter lists in your JavaScript function.

The input data are in Timeplus data type. They will be converted to JavaScript data type.

Timeplus Data TypesJavaScript Data Types
int8/16/32/64, uint8/16/32/64,float32/64number
boolboolean
fixed_string/stringstring
date/date32/datetime/datetime64Date (in milliseconds)
array(Type)Array

Returned value

The JavaScript UDF can return the following data types and they will be converted back to the specified Timeplus data types. The supported return type are similar to argument types. The only difference is that if you return a complex data structure as an object, it will be converted to a named tuple in Timeplus.

JavaScript Data TypesTimeplus Data Types
numberint8/16/32/64, uint8/16/32/64,float32/64
booleanbool
stringfixed_string/string
Date (in milliseconds)date/date32/datetime/datetime64
Arrayarray(Type)
objecttuple

Develop a scalar function

A scalar function is a function that returns one value per invocation; in most cases, you can think of this as returning one value per row. This contrasts with Aggregate Functions, which returns one value per group of rows.

Scalar function with 1 argument

For example, you would like to check whether the user sets a work email in their profile. Although this could be doable with plain SQL but it'll be nice if you can create a UDF to make the SQL more readable, e.g.

SELECT * FROM user_clicks where is_work_email(email)

You can use the following code to define a new function is_work_email with one input type string and return bool.

function is_work_email(values){
return values.map(email=>!email.endsWith("@gmail.com"));
}

Notes:

  1. The first line defines a function with the exact same name as the UDF. The number of arguments should match what you specify in the UDF form.
  2. Please note the input is actually a JavaScript list. For the sake of high performance, Timeplus will reduce the number of function calls by combining the arguments together. You need to return a list with the exact same length of the input.
  3. values.map(..) creates a new array populated with the results of calling a provided function on every element in the calling array (doc).
  4. email=>email.endsWith("@gmail.com") is the shortcut to return a bool by checking whether the email ends with "@gmail.com". You can add more complex logic, or write in multiple lines and end with return ...

Scalar function with 2 or more arguments

Let's enhance the previous example, by defining a list of email domains which won't be considered as work-related. e.g.

SELECT * FROM user_clicks where email_not_in(email,'gmail.com,icloud.com,live.com')

Similar to the last tutorial, you create a new function called email_not_in. This time you specify two arguments in string. Note: currently JS UDF doesn't support complex data types, such as array(string).

The following code implements this new function:

function email_not_in(emails,lists){
let list=lists[0].split(','); // convert string to array(string)
return emails.map(email=>{
for(let i=0;i<list.length;i++){
if(email.endsWith('@'+list[i]))
return false; // if the email ends with any of the domain, return false, otherwise continue
}
return true; // no match, return true confirming the email is in none of the provided domains
});
}

Scalar function with no argument

Currently we don't support JS UDF without arguments. As a workaround, you can define a single argument, e.g.

SELECT *, magic_number(1) FROM user_clicks

The magic_number takes an int argument as a workaround.

function magic_number(values){
return values.map(v=>42)
}

In this case, the function will return 42 no matter what parameter is specified.

Develop an aggregate function

An aggregate function returns one value per group of rows. When you register the UDF, make sure you turn on the option to indicate this is an aggregation function. Compared to scalar functions, the life cycle is a bit more complex.

3 required and 3 optional functions

Let's take an example of a function to get the second maximum values from the group.

OrderFunctionRequired?DescriptionExample
1initialize()YesInitialize the states.function(){
this.max=-1.0;
this.sec_max=-1.0;
}
2process(args..)YesMain logic for the functionfunction(values){
values.map(..)
}
3finalize()YesReturn the final aggregation resultfunction(){
return this.sec_max
}
4serialize()NoSerialize JS internal state to a string, so that Timeplus can persist for failover/recovery.function(){
return JSON.stringify({'max':this.max,'sec_max':this.sec_max})
}
5deserialize(str)NoOpposite to serialize(). Read the string and convert back to JS internal state.function(str){
let s=JSON.parse(str);
this.max=s['max'];
this.sec_max=s['sec_max'];
}
6merge(str)NoMerges two states into one. Used for multiple shards processing.function(str){
let s=JSON.parse(str);
if..else..}

Example: get second largest number

The full source code for this JS UDAF is

{
initialize: function() {
this.max = -1.0;
this.sec_max = -1.0;
},

process: function(values) {
for (let i = 0; i < values.length; i++) {
this._update(values[i]);
}
},

_update: function(value) {
if (value > this.max) {
this.sec_max = this.max;
this.max = value;
} else if (value > this.sec_max) {
this.sec_max = value;
}
},

finalize: function() {
return this.sec_max
},

serialize: function() {
return JSON.stringify({
'max': this.max,
'sec_max': this.sec_max
});
},

deserialize: function(state_str) {
let s = JSON.parse(state_str);
this.max = s['max'];
this.sec_max = s['sec_max']
},

merge: function(state_str) {
let s = JSON.parse(state_str);
this._update(s['max']);
this._update(s['sec_max']);
}
};

To register this function, steps are different in Timeplus Cloud and Proton:

  • With Timeplus UI: choose JavaScript as UDF type, make sure to turn on 'is aggregation'. Set the function name say second_max (you don't need to repeat the function name in JS code). Add one argument in float type and set return type to float too. Please note, unlike JavaScript scalar function, you need to put all functions under an object {}. You can define internal private functions, as long as the name won't conflict with native functions in JavaScript, or in the UDF lifecycle.
  • With SQL in Proton Client: check the example at here.

Advanced Example for Complex Event Processing

User-Defined Aggregation Function can be used for Complex Event Processing (CEP). Here is an example to count the number of failed login attempts for the same user. If there are more than 5 failed logins, create an alert message. If there is a successful login, reset the counter. Assuming the stream name is logins , with timestamp, user, login_status_code, this SQL can continuously monitor the login attempts:

SELECT window_start, user, login_fail_event(login_status_code)
FROM hop(logins, 1m, 1h) GROUP BY window_start, user

The UDAF is registered in this way:

CREATE AGGREGATE FUNCTION login_fail_event(msg string)
RETURNS string LANGUAGE JAVASCRIPT AS $$
{
has_customized_emit: true,

initialize: function() {
this.failed = 0; //internal state, number of login failures
this.result = [];
},

process: function (events) {
for (let i = 0; i < events.length; i++) {
if (events[i]=="failed") {
this.failed = this.failed + 1;
}
else if (events[i]=="ok") {
this.failed = 0; //reset to 0 if there is login_ok before 5 login_fail
}

if (this.failed >= 5) {
this.result.push("alert"); //we can also attach a timestamp
this.failed = 0; //reset to 0 there are 5 login_fail
}
}
return this.result.length; //show the number of alerts for the users
},

finalize: function () {
var old_result = this.result;
this.initialize();
return old_result;
},

serialize: function() {
let s = {
'failed': this.failed
};
return JSON.stringify(s);
},

deserialize: function (state_str) {
let s = JSON.parse(state_str);
this.failed = s['failed'];
},

merge: function(state_str) {
let s = JSON.parse(state_str);
this.failed = this.failed + s['failed'];
}
}
$$;

There is an advanced setting has_customized_emit. When this is set to true:

  • initialize() is called to prepare a clean state for each function invocation.
  • Proton partitions the data according to group by keys and feeds the partitioned data to the JavaScript UDAF. process(..) is called to run the customized aggregation logic. If the return value of process(..) is 0, no result will be emitted. If a none-zero value is returned by process(..), then finalize() function will be called to get the aggregation result. Proton will emit the results immediately. finalize() function should also reset its state for next aggregation and emit.

Caveats:

  1. One streaming SQL supports up to 1 UDAF with has_customized_emit=true
  2. If there are 1 million unique key, there will be 1 million UDAF invocations and each of them handles its own partitioned data.
  3. If one key has aggregation results to emit, but other keys don't have, then Proton only emit results for that key.

This is an advanced feature. Please contact us or discuss your use case in Community Slack with us.

Debug Tips

console.log

Staring from Timeplus Proton 1.6.5 or Timeplus Enterprise 2.5, you can use console.log(..) to add logging messages. The logs will be available in the server logs, such as /var/log/proton-server/proton-server.log for the Linux-based docker container console.log, e.g.

2024.12.09 19:55:51.585993 [ 34 ] {c4569424-a7b7-4a89-b5a6-2adc22c96628} <Information> JavaScriptUDF(test_add_five_5): show some log
2024.12.09 19:55:51.586039 [ 34 ] {c4569424-a7b7-4a89-b5a6-2adc22c96628} <Information> JavaScriptUDF(test_add_five_5): [1]
2024.12.09 19:55:51.586117 [ 34 ] {c4569424-a7b7-4a89-b5a6-2adc22c96628} <Information> JavaScriptUDF(test_add_five_5): about to return
2024.12.09 19:55:51.586120 [ 34 ] {c4569424-a7b7-4a89-b5a6-2adc22c96628} <Information> JavaScriptUDF(test_add_five_5): [6]

Test the JS UDF without running in Timeplus

To improve the debug efficiency, you can test the JS UDF without running them in Timeplus SQL. Taking the is_work_email UDF as an example, you can create a JS file and run it with node directly, e.g.

//the JS UDF to test
function is_work_email(values){
return values.map(email=>!email.endsWith("@gmail.com"));
}
//create some testing data
var tests=['a@gmail.com','eng@timeplus.com']
//run the UDF
var results=is_work_email(tests)
console.log(results)

Then you can run the JavaScript file with nodejs, e.g.

node test.js
[ false, true ]

You may use any IDE and set breakpoint or watch the variables.