Aggregate

../../_images/aggregateNode.png

The Clario Aggregate node allows you to summarize data to a level specified by any attribute(s) on the input data stream. You may aggregate attributes using functions such as min, max, average, and sum.

The incoming data stream must be sorted by the Group By attribute(s).

Configuration

The Aggregate node has two configuration tabs: Configure and Summary.

Configure Tab

../../_images/aggregate_configureTab.png

Functions

Group By

For each unique value of the Group By attribute(s) the values of the aggregated attribute(s) are summarized per the specified function. That is, each unique value of the selected Group By attribute(s) will correspond to a summarized row on the output data stream. If no Group By attribute is specified the aggregated attribute(s) are summarized per the specified functions across all rows.

To select the Group By attribute(s):

  1. Select ‘Group By’ in the Functions list.
  2. Drag and drop the desired attribute(s) from the Available Attributes list to the Selected Attributes list. The Available Attributes list displays all of the attributes from the data stream attached to the incoming connector.

Attributes selected in the ‘Group By’ list will not be available for use in other aggregation functions.

Other Aggregations

To define aggregations:

  1. Select a function in the Functions list. Available functions include:
Function Type Description
Average Numeric, Date Returns the mathematical average of values
Count Any Counts the number of non-null values following SQL conventions. If the selected attribute is * the count is of all null and non-null values.
Count Distinct Any Counts the number of distinct non-null values
Sum Numeric Returns the sum of values
Minimum Numeric, Date Returns the minimum value
Maximum Numeric, Date Returns the maximum value
First Any Returns the first value
Last Any Returns the last value
  1. Drag and drop the desired attribute(s) from the Available Attributes list to the Selected Attributes list. The Available Attributes list displays all of the attributes of acceptable Type from the data stream attached to the incoming connector.

The same attribute may be selected for aggregation via multiple functions. To remove defined aggregations, select the function from the Functions list, drag and drop the attribute from the Selected Attributes list into the Available Attributes list.

Pivot Groups

Pivot Groups on the Aggregate node allow users to transform data structured in an EAV (Entity-Attribute-Value) model to a more conventional analytic format. It also allows for simplified and more performant transformation of transactional data into customer/household attributes for predictive modeling. It may help to think of this as a “two-dimensional group by”.

To define a pivot group, first click the [+] below the Pivot Groups list and enter a name. This will bring you to the Pivot Group Tab.

Pivot Group Tab
../../_images/aggregate_pivotGroupTab.png

Key Attribute

Drag and drop an attribute from the Available Attributes list into the Key Attribute field. A new attribute will be created for each unique value of the Key Attribute.

Match Values and Attribute Names

The Aggregate node will try to produce a new attribute for each Match Value specified. A Match Value is a unique value of the Key Attribute. The Match Values entered must exactly match the values in the underlying data, including case.

Define the Match Values and the Attribute Names to be created from these values by clicking the [+] on the bottom left of the Key Attribute area. Attribute names may only consist of the following characters: A-Z, a-z, 0-9, ‘-‘, and ‘_’. To remove a match value and attribute name, select the pair from the list and click the [-] button.

Click [Import] to paste a list of unique values for the Key Attribute which will automatically create a match value and attribute name for each value.

Value Attribute

Drag and drop an attribute from the Available Attributes list into the Value Attribute field. After Pivot Group creation, the values of this attribute will be aggregated according to the specified function(s) within each Match Value.

Data Type Tab
../../_images/aggregate_dataTypes.png

The Data Type tab allows you to specify the data types of the attributes created by the Pivot Groups. To set a data type, click on a data type (Number or String) in the Types list, and drag and drop the attribute(s) from the Available Attributes list to the Selected Attributes list. If you need a data type that is not listed, click the [+] at the bottom of the Types list, then select the appropriate dropdown values. To delete a data type, click the [-].

A data type must be specified for each attribute created within the Pivot Group.

Newly created pivot attributes will now be listed in the Available Attributes box when defining aggregations. The naming convention for attributes created by Pivot Groups is ‘Attribute Name | Pivot Group’.

Summary Tab

../../_images/aggregate_summaryTab.png

The Summary tab contains a table summarizing the aggregation. The table includes the name of the source attribute, the role of the attribute, the aggregate function, the name of the outgoing attribute, and the outgoing attribute’s data type. The outgoing attribute names default to (attribute name)_(function).

To rename Outgoing Attributes, click the [Rename] button. In the Rename Attributes dialog, you may manually edit attribute names or paste names copied from a file. Attribute names may only consist of the following characters: A-Z, a-z, 0-9, ‘-‘, and ‘_’.

To export the aggregate summary, click [Export] and enter a filename before downloading the spreadsheet.

Output Stream

The data stream sent to the Aggregate node’s outgoing connector contains the attributes and types displayed in the Summary tab.