Aggregation

count

count(*) to get the row number, or count(col) to get the number of rows when col is not NULL

count_distinct

count_distinct(col) to get the number of unique values for the col column. Same as count(distinct col)

count_if

count_if(condition) to apply a filter with condition and get the number of records. e.g. count_if(speed_kmh>80)

distinct

distinct(col)to get the distinct value for the col column.

unique

unique(<column_name1>[, <column_name2>, ...]): Calculates the approximate number of different values of the columns.

unique_exact

unique_exact(<column_name1>[, <column_name2>, ...])Calculates the exact number of different values of the columns.

unique_exact_if

unique_exact_if(col,condition) to apply a filter with condition and get the distinct count of col, e.g. to get the cars with high speed unique_exact_if(cid,speed_kmh>80)

min

min(<column_name>): minimum value of a column. For String column, the comparison is lexicographic order.

max

max(<column_name>): maximum value of a column. For String column, the comparison is lexicographic order.

sum

sum(<column_name>): sum of the columns. Only works for numerics.

avg

avg(<column_name>): average value of a column (sum(column) / count(column)). Only works for numeric columns.

median

median(<column_name>) Calculate median of a numeric data sample.

quantile

quantile(column,level)Calculate an approximate quantile of a numeric data sequence. e.g. quantile(a,0.9)to get the P90 for the column and quantile(a,0.5) to get the median number

p90

short for quantile(a,0.9)

p95

short for quantile(a,0.95)

p99

short for quantile(a,0.99)

top_k

top_k(<column_name>,K [,true/false]): Top frequent K items in column_name. Return an array.

e.g. top_k(cid, 3) may get [('c01',1200),('c02,800)',('c03',700)] if these 3 ids appear most frequently in the aggregation window.

If you don't need the event count, you can set false for the 3rd parameter, e.g. top_k(cid, 3, false) may get ['c01','c02','c03']

min_k

min_k(<column_name>,K [,context_column]): The least K items in column_name. Return an array. You can also add a list of columns to get more context of the values in same row, such as min_k(price,3,product_id,last_updated) This will return an array with each element as a tuple, such as [(5.12,'c42664'),(5.12,'c42664'),(15.36,'c84068')]

max_k

max_k(<column_name>,K[,context_column]): The greatest K items in column_name. You can also add a list of columns to get more context of the values in same row, such as max_k(price,3,product_id,last_updated)

arg_min

arg_min(argument, value_column) Gets the value in the argument column for a minimal value in the value_column. If there are several different values of argument for minimal values of value_column, it returns the first of these values encountered. You can achieve the same query with min_k(value_column,1, argument)[1].2 . But this is much easier.

arg_max

arg_max(argument, value_column) Gets the value in the argument column for a maximum value in the value_column. If there are several different values of argument for maximum values of value_column, it returns the first of these values encountered. You can achieve the same query with max_k(value_column,1, argument)[1].2 . But this is much easier.

group_array

group_array(<column_name>) to combine the values of the specific column as an array. For example, if there are 3 rows and the values for these columns are "a","b","c". This function will generate a single row and single column with value ['a','b','c']

group_uniq_array

group_uniq_array(<column_name>) to combine the values of the specific column as an array, making sure only unique values in it. For example, if there are 3 rows and the values for these columns are "a","a","c". This function will generate a single row and single column with value ['a','c']

moving_sum

moving_sum(column) returns an array with the moving sum of the specified column. For example, select moving_sum(a) from(select 1 as a union select 2 as a union select 3 as a) will return [1,3,6]

any

any(column) Selects the first encountered (non-NULL) value, unless all rows have NULL values in that column. The query can be executed in any order and even in a different order each time, so the result of this function is indeterminate. To get a determinate result, you can use the min or max function instead of any.

last_value

last_value(column) Selects the last encountered value.

Aggregation

count​

count_distinct​

count_if​

distinct​

unique​

unique_exact​

unique_exact_if​

min​

max​

sum​

avg​

median​

quantile​

p90​

p95​

p99​

top_k​

min_k​

max_k​

arg_min​

arg_max​

group_array​

group_uniq_array​

moving_sum​

any​

last_value​