Reduce

../../_images/reduceNode.png

The Clario Reduce node is used to reduce the number of numeric attributes by eliminating those with high levels of multi-collinearity. For groups of attributes that are highly correlated with each other, only the attributes most related to the dependent attribute will be retained. First, factor analysis is used to find the unique dimensions of the data. Next, Ordinary Least Squares (OLS) stepwise regression is used to determine which of the attributes from each factor are most strongly related to the dependent attribute. The strongest attributes from each factor survive, and there is typically representation from each factor in the resulting dataset.
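The selection idea can be illustrated with a deliberately simplified sketch. It assumes the factor grouping is already known (the hypothetical `factor_of` mapping) and swaps the node's OLS stepwise regression for a plain ranking by absolute Pearson correlation with the dependent attribute. It shows the shape of the reduction, not the node's actual algorithm. All names and data are illustrative.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def strongest_per_factor(columns, factor_of, dependent):
    """Keep, for each factor, the attribute with the largest |correlation|
    with the dependent attribute (a stand-in for stepwise regression)."""
    best = {}  # factor id -> (attribute name, |r|)
    for name, values in columns.items():
        strength = abs(pearson(values, dependent))
        factor = factor_of[name]
        if factor not in best or strength > best[factor][1]:
            best[factor] = (name, strength)
    return sorted(name for name, _ in best.values())


# Hypothetical data: two income-like attributes share factor 1, age is factor 2.
columns = {
    "income": [10, 20, 30, 40, 50],
    "salary": [12, 19, 33, 38, 52],
    "age": [50, 20, 40, 10, 30],
}
factor_of = {"income": 1, "salary": 1, "age": 2}
dependent = [11, 21, 29, 41, 49]

kept = strongest_per_factor(columns, factor_of, dependent)
```

Here `income` and `salary` are nearly redundant, so only the one more closely related to the dependent attribute survives, while `age` is retained as the sole representative of its factor.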

Configuration

The Reduce node has four configuration tabs: Dependent Attribute, Select Attributes, Configure, and Write File.

Dependent Attribute Tab

../../_images/reduce_dependentAttributeTab.png

In the Dependent Attribute tab, select the Dependent Attribute by dragging and dropping a numeric attribute from the Available Attributes list to the Dependent Attribute field. Once an attribute is selected as the Dependent Attribute, it will be unavailable for the Candidate or Pass-Through lists in the Select Attributes tab. To remove the Dependent Attribute, click the [-] button.

Select Attributes Tab

../../_images/reduce_selectAttributesTab.png

Available Attributes

The Available Attributes are all of the attributes from the incoming data stream except the Dependent Attribute. Drag and drop these attributes into the Candidate Attributes field to evaluate them, or into the Pass-Through Attributes field to keep them. Attributes remaining in the Available Attributes list will be dropped from the resulting dataset.

Candidate Attributes

Attributes placed in the Candidate Attributes list will be evaluated during the reduce process. Candidate Attributes must be numeric. At least one attribute must be placed in the Candidate Attributes list.

Pass-Through Attributes

Attributes placed in the Pass-Through Attributes list will be kept regardless of their relationship to other attributes. These will not be evaluated during the reduce process.

Configure Tab

../../_images/reduce_configureTab.png

Factor Analysis Settings

In Factor Analysis Settings, factors are built to represent the unique dimensions of your data. Set the Number of Factors Method to Automatic or Manual.

If Automatic is selected, you must also specify:

  • Proportion of Variance %: the percentage of variance in the dataset that you want the set of factors to explain. The valid range is 0 to 100, with one decimal place of precision. A higher percentage results in more factors. The default is 75%.
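The relationship between the variance percentage and the factor count can be sketched as follows. This is a minimal illustration that assumes each factor's explained variance is available as an eigenvalue of the correlation matrix, which is a common convention in factor analysis. The source does not state how the node computes it, and `factors_for_variance` and the eigenvalues below are hypothetical.

```python
def factors_for_variance(eigenvalues, proportion_pct=75.0):
    """Smallest number of factors whose cumulative share of the total
    variance reaches proportion_pct (a percentage from 0 to 100)."""
    if not 0 <= proportion_pct <= 100:
        raise ValueError("proportion_pct must be between 0 and 100")
    total = sum(eigenvalues)
    cumulative = 0.0
    # Take factors from largest to smallest explained variance.
    for count, value in enumerate(sorted(eigenvalues, reverse=True), start=1):
        cumulative += value
        if 100.0 * cumulative / total >= proportion_pct:
            return count
    return len(eigenvalues)


# Hypothetical eigenvalues for five candidate attributes.
eigenvalues = [2.5, 1.5, 0.6, 0.3, 0.1]
at_default = factors_for_variance(eigenvalues)        # cumulative 50%, 80%
at_ninety = factors_for_variance(eigenvalues, 90.0)   # cumulative 50%, 80%, 92%
```

With these values the 75% default is reached after two factors, while asking for 90% requires a third.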

If Manual is selected, you must also specify:

  • Number of Factors: the number of factors you want to create to represent the dataset. Valid range is greater than or equal to 1, and less than or equal to the number of attributes in Candidate Attributes in the Select Attributes tab.

Choose the Rotation Method to use in the factor analysis. The choices are None, Varimax, and Equamax.
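The valid ranges described above can be summarized as a small validation sketch. The function name, parameter names, and message wording are hypothetical and are not part of Clario. Only the ranges and option names come from this page.

```python
ROTATION_METHODS = {"None", "Varimax", "Equamax"}


def validate_factor_settings(method, n_candidates, rotation,
                             proportion_pct=None, n_factors=None):
    """Return a list of problems found in the Factor Analysis Settings."""
    problems = []
    if method == "Automatic":
        if proportion_pct is None or not 0 <= proportion_pct <= 100:
            problems.append("Proportion of Variance % must be between 0 and 100")
    elif method == "Manual":
        if n_factors is None or not 1 <= n_factors <= n_candidates:
            problems.append(
                "Number of Factors must be between 1 and the number of "
                "Candidate Attributes"
            )
    else:
        problems.append("Number of Factors Method must be Automatic or Manual")
    if rotation not in ROTATION_METHODS:
        problems.append("Rotation Method must be None, Varimax, or Equamax")
    return problems


ok = validate_factor_settings("Automatic", 5, "Equamax", proportion_pct=75.0)
too_many = validate_factor_settings("Manual", 5, "Varimax", n_factors=6)
```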

Linear Regression Settings

Select the maximum p-value at which an attribute may enter each regression (p to enter) and the minimum p-value at which an attribute is removed from each regression (p to remove). The two values cannot be equal.
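How the two thresholds interact in a stepwise procedure can be sketched as a single decision step. This follows the textbook form of stepwise selection and is an assumption about the node's behavior: the p-values are passed in as hypothetical dictionaries rather than computed from a fitted regression, and treating both thresholds as inclusive is a choice made for illustration.

```python
def next_action(p_in_model, p_candidates, p_enter, p_remove):
    """Decide the next stepwise move from the current p-values.

    p_in_model:   attribute -> p-value for attributes already in the model
    p_candidates: attribute -> p-value each outside attribute would have
    """
    if p_enter == p_remove:
        raise ValueError("p to enter and p to remove must differ")
    # Removal is checked first: drop the weakest attribute in the model.
    if p_in_model:
        worst = max(p_in_model, key=p_in_model.get)
        if p_in_model[worst] >= p_remove:
            return ("remove", worst)
    # Otherwise admit the most significant outside attribute, if it qualifies.
    if p_candidates:
        best = min(p_candidates, key=p_candidates.get)
        if p_candidates[best] <= p_enter:
            return ("enter", best)
    return ("stop", None)


# Hypothetical thresholds and p-values.
first = next_action({"a": 0.01, "b": 0.20}, {"c": 0.03}, 0.05, 0.10)
second = next_action({"a": 0.01}, {"c": 0.03, "d": 0.50}, 0.05, 0.10)
third = next_action({"a": 0.01}, {"d": 0.50}, 0.05, 0.10)
```

In this sketch, raising either threshold keeps more attributes: a higher p to enter admits weaker candidates, and a higher p to remove lets weaker attributes stay in the model.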

Reduce Settings

In Reduce Settings, specify the minimum tolerance value to keep an attribute in each regression and the minimum correlation with the Dependent Attribute to keep an attribute in each regression.

Note

For both Linear Regression and Reduce Settings, to keep more attributes, raise the p to enter and p to remove criteria, and/or lower the minimum tolerance. To keep fewer attributes, lower the p to enter and p to remove criteria, and/or raise the minimum tolerance.
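This page does not define tolerance. In regression diagnostics it is commonly computed as 1 minus the R² obtained when an attribute is regressed on the other predictors, so values near 0 indicate an attribute that is almost fully explained by the others. The sketch below assumes the node follows that convention and covers only the two-attribute case, where R² is the squared correlation between the pair. Comparing the absolute value of the correlation with the Dependent Attribute is also an assumption, and the function names are hypothetical.

```python
def pair_tolerance(r):
    """Tolerance of an attribute given its correlation r with one other
    predictor: 1 - r^2 (two-attribute special case of 1 - R^2)."""
    return 1.0 - r * r


def passes_reduce_settings(tolerance, dep_correlation,
                           min_tolerance, min_correlation):
    """True when an attribute clears both hypothetical Reduce Settings checks."""
    return (tolerance >= min_tolerance
            and abs(dep_correlation) >= min_correlation)


# Two predictors correlated at 0.95 leave very little independent variation.
tolerance = pair_tolerance(0.95)
kept = passes_reduce_settings(tolerance, dep_correlation=0.60,
                              min_tolerance=0.10, min_correlation=0.30)
kept_if_relaxed = passes_reduce_settings(tolerance, dep_correlation=0.60,
                                         min_tolerance=0.05, min_correlation=0.30)
```

Lowering the minimum tolerance from 0.10 to 0.05 is what lets the attribute through here, which matches the direction described in the note above.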

Write File Tab

../../_images/reduce_writeFileTab.png

The Write File tab allows the Reduce node to output the newly created dataset to a delimited file. See Write File Tab for configuration details.

The new dataset will contain only the Pass-Through Attributes, the Selected Attributes, and the Dependent Attribute, in this order.
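The column order can be sketched with Python's standard `csv` module. This is an illustration of the ordering rule only. The function name, the sample row, and the comma delimiter are hypothetical and do not describe how the node writes its file.

```python
import csv
import io


def write_reduced(rows, pass_through, selected, dependent, delimiter=","):
    """Write rows as delimited text with columns ordered as Pass-Through,
    then Selected, then the Dependent Attribute. Other columns are dropped."""
    columns = list(pass_through) + list(selected) + [dependent]
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, delimiter=delimiter,
                            extrasaction="ignore", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical input row: "salary" was rejected, so it is not written.
rows = [{"id": 1, "income": 10, "salary": 12, "spend": 11}]
text = write_reduced(rows, pass_through=["id"], selected=["income"],
                     dependent="spend")
```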

Results

Summary Tab

../../_images/reduce_results.png

Results Summary

  • Number of Factors: number of factors produced by the factor analysis step
  • Total Attributes Considered: number of non-constant attributes processed in the factor analysis step
  • Total Attributes Selected: number of final attributes in the resulting dataset, after linear regression

Results Details

The results set contains a table listing all the Candidate, Pass-Through, and Dependent attributes.

  • Attribute: name of the attribute
  • Factor: the factor that the attribute belongs to (represented by a number). A factor of 0 means this attribute was not considered.
  • Reason: reason the attribute was either kept or dropped.

Reason                              Description
Selected                            This attribute is one of the final selected attributes.
Nonsignif_Rejected                  This attribute was rejected because it was not significant in the regression within one factor.
Nonsignif_Rejected_From_Final_Pool  This attribute was rejected because it was not significant among all factors.
Collinear_Rejected                  This attribute was rejected because it is highly correlated with another attribute in the same factor.
Constant_Rejected                   This attribute was not considered in the factor analysis or regressions because it has a constant value.
Passthrough_Attribute               This attribute was specified as a Pass-Through Attribute.
Dependent_Attribute                 This attribute was specified as the Dependent Attribute.
Min_Dependent_Correlation_Rejected  This attribute was rejected because it does not meet the minimum correlation with the Dependent Attribute.
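If the results table is exported for further analysis, the Reason codes make it straightforward to separate kept attributes from dropped ones. The sketch below assumes the rows are available as dictionaries with Attribute, Factor, and Reason keys. The rows are hypothetical, including the factor value of 0 shown for the Pass-Through and Dependent attributes.

```python
# Reasons that correspond to attributes present in the resulting dataset.
KEPT_REASONS = {"Selected", "Passthrough_Attribute", "Dependent_Attribute"}


def split_by_outcome(details):
    """Split result rows into kept names and dropped names grouped by reason."""
    kept = []
    dropped = {}  # reason -> list of attribute names
    for row in details:
        if row["Reason"] in KEPT_REASONS:
            kept.append(row["Attribute"])
        else:
            dropped.setdefault(row["Reason"], []).append(row["Attribute"])
    return kept, dropped


# Hypothetical rows mirroring the Attribute / Factor / Reason columns.
details = [
    {"Attribute": "id", "Factor": 0, "Reason": "Passthrough_Attribute"},
    {"Attribute": "income", "Factor": 1, "Reason": "Selected"},
    {"Attribute": "salary", "Factor": 1, "Reason": "Collinear_Rejected"},
    {"Attribute": "region_code", "Factor": 0, "Reason": "Constant_Rejected"},
    {"Attribute": "spend", "Factor": 0, "Reason": "Dependent_Attribute"},
]
kept, dropped = split_by_outcome(details)
```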

Keep Statement Tab

../../_images/reduce_resultsKeepStatementTab.png

The Keep Statement tab displays the ClarioScript code for the Pass-Through, Selected, and Dependent attributes. This code can be copied and pasted into the Code Editor of a Transform node.