Univariate

../../_images/univariateNode1.png

The Clario Univariate node allows you to understand and explore your data by looking at frequency distributions, graphs, and a variety of statistical metadata for all selected attributes.

Configuration

The Univariate node has only one configuration tab.

Configuration Tab

../../_images/univariate_configurationTab.png

Available Attributes

The Available Attributes box displays all of the attributes available on the input data stream.

Selected Attributes

Drag and drop the attribute(s) to be analyzed from Available Attributes to Selected Attributes. At least one attribute must be placed in the Selected Attributes list.

Settings

Analysis Level

In the Settings box, select the analysis level. Detailed takes longer to run but produces more detailed results. Summary runs faster but returns fewer statistics. The default is Summary.

Weight Attribute

Drag and drop a numeric Available Attribute into the Weight Attribute field to replicate each row a number of times equal to that row's Weight Attribute value.

Weight Attributes must contain positive integers. Univariate will exclude rows containing negative, zero, or null Weight Attribute values and will identify these rows as Invalid Rows in the Results. Decimal Weight Attribute values are not allowed and will cause a workflow to fail.

To remove an attribute, press the [-] button. If you do not wish to apply a weight, leave the Weight Attribute field empty.
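
The weighting rules above can be illustrated with a minimal Python sketch. This is not Clario's implementation: the function name `apply_weight`, the dictionary row format, and the choice to accept integer-valued floats such as 2.0 are assumptions made for illustration, as is the reading of Weighted Rows as the sum of the weights over valid rows.

```python
def apply_weight(rows, weight_key):
    """Classify rows by weight and count weighted rows.

    Hypothetical helper that mirrors the documented rules only.
    """
    valid, invalid = [], []
    for row in rows:
        w = row.get(weight_key)
        if w is None:
            invalid.append(row)  # null weight: reported as an invalid row
        elif isinstance(w, float) and not w.is_integer():
            # decimal weights are not allowed and fail the workflow
            raise ValueError(f"decimal weight not allowed: {w!r}")
        elif w <= 0:
            invalid.append(row)  # zero or negative weight: invalid row
        else:
            valid.append(row)
    # assumption: each valid row counts once per unit of its weight
    weighted_rows = sum(int(r[weight_key]) for r in valid)
    return valid, invalid, weighted_rows


rows = [{"w": 3}, {"w": 0}, {"w": None}, {"w": -2}, {"w": 1}]
valid, invalid, weighted_rows = apply_weight(rows, "w")
```

In this example, two rows are valid, three are invalid, and the weighted row count is 3 + 1 = 4.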

By Attribute

Drag and drop an Available Attribute into the By Attribute field to create a separate set of statistics for each distinct value of the By Attribute. To remove an attribute, press the [-] button. If you do not wish to use a By Attribute, leave the field empty.
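
As a rough illustration of what a By Attribute does, the sketch below groups rows by the distinct values of one attribute and computes a few statistics per group. The function `stats_by`, the row format, and the sample attribute names `region` and `sales` are hypothetical; only the output names (`rows`, `values`, `nullValues`, `min`, `max`, `mean`) follow the Outgoing Attribute Names used in this document.

```python
from collections import defaultdict
from statistics import mean


def stats_by(rows, by_key, value_key):
    """Compute a separate set of statistics for each distinct by_key value."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[by_key]].append(row[value_key])

    result = {}
    for key, column in groups.items():
        non_null = [v for v in column if v is not None]
        result[key] = {
            "rows": len(column),
            "values": len(non_null),
            "nullValues": len(column) - len(non_null),
            "min": min(non_null) if non_null else None,
            "max": max(non_null) if non_null else None,
            "mean": mean(non_null) if non_null else None,
        }
    return result


rows = [
    {"region": "east", "sales": 10},
    {"region": "east", "sales": 30},
    {"region": "west", "sales": None},
    {"region": "west", "sales": 5},
]
by_region = stats_by(rows, "region", "sales")
```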

Allow Zero Rows

Check the Allow Zero Rows box to let Univariate finish processing successfully even if the incoming data stream has zero rows. If the box is left unchecked and the incoming data stream has zero rows, Univariate raises an error and the workflow fails.

Results

The Univariate node has two results tabs: Results Explorer and Numeric Summary.

Results Explorer Tab

../../_images/univariate_annotatedResultsExplorer.png

Click the Results Explorer tab for more detailed statistics and charts for one attribute at a time.

Attribute List

Click any attribute in the Attribute List to display its details in the Attribute Statistics and Attribute Values sections. To export results to a spreadsheet, select one or more attributes in the Attribute List, click the Export to Spreadsheet button on the Toolbar, and enter a filename when prompted.

Attribute Statistics

Click on the section headings to expand or collapse each section. The statistics shown depend on how the Settings are configured in the Univariate Configuration tab. In the tables below, an x marks each configuration in which a statistic is produced.

Summary

| Label | Description | Outgoing Attribute Name | Summary | Summary with Weight | Summary with By Attribute | Detailed | Detailed with Weight | Detailed with By Attribute |
|---|---|---|---|---|---|---|---|---|
| Mean | Mean value | mean | x | x | x | x | x | x |
| Minimum | Minimum value | min | x | x | x | x | x | x |
| Maximum | Maximum value | max | x | x | x | x | x | x |
| Standard Deviation | Standard deviation | std | x | x | x | x | x | x |
| Sum | Sum of values | sum | x | x | x | x | x | x |
| Median | Median value | median | | | | x | x | x |
| Processed Rows | Number of rows on the incoming data stream | rows | x | x | x | x | x | x |
| Valid Rows | Number of rows with a valid weight | validRows | | x | | | x | |
| Invalid Rows | Number of rows with an invalid weight | invalidRows | | x | | | x | |
| Weighted Rows | Number of valid rows multiplied by the weight attribute | weightedRows | | x | | | x | |
| Values | Number of rows where the attribute value is not null | values | x | x | x | x | x | x |
| Null Values | Number of rows where the attribute value is null | nullValues | x | x | x | x | x | x |
| Weighted Values | Values multiplied by the weight attribute | weightedValues | | x | | | x | |
| Weighted Null Values | Null values multiplied by the weight attribute | weightedNullValues | | x | | | x | |
| Distinct Values | Number of distinct non-null values | distinctValues | | | | x | x | x |
| Mode % | Percentage of non-null values that appears most often | modePct | | | | x | x | x |
| Coverage % | Percentage of non-null values (Values / Processed Rows) | coveragePct | x | x | x | x | x | x |
| Zero Values | Number of rows where the attribute value is “0” | zeroValues | x | x | x | x | x | x |
| Zero Values % | Percentage of values where the attribute value is “0” (Zero Values / Values) | zeroValuePct | x | x | x | x | x | x |

Location

| Label | Description | Outgoing Attribute Name | Summary | Summary with Weight | Summary with By Attribute | Detailed | Detailed with Weight | Detailed with By Attribute |
|---|---|---|---|---|---|---|---|---|
| Mode | Mode value | mode | | | | x | x | x |
| Variance | Square of standard deviation | var | x | x | x | x | x | x |
| Range | Maximum - Minimum | range | x | x | x | x | x | x |
| Interquartile Range | See external reference | iqr | | | | x | x | x |

Moments

| Label | Description | Outgoing Attribute Name | Summary | Summary with Weight | Summary with By Attribute | Detailed | Detailed with Weight | Detailed with By Attribute |
|---|---|---|---|---|---|---|---|---|
| Skewness | See external reference | skew | | | | x | x | x |
| Kurtosis | See external reference | kurt | | | | x | x | x |
| Uncorrected Sum of Squares | Sum of Squares not corrected for the mean | ss | x | x | x | x | x | x |
| Corrected Sum of Squares | Sum of Squares corrected for the mean | css | | | | x | x | x |
| Standard Error of Mean | See external reference | sem | x | x | x | x | x | x |
| Coefficient of Variation | See external reference | cv | x | x | x | x | x | x |
| Values Sum | Sum of values | sum | x | x | x | x | x | x |
| Z of Minimum | (Minimum - Mean) / Standard Deviation | zmin | x | x | x | x | x | x |
| Z of Maximum | (Maximum - Mean) / Standard Deviation | zmax | x | x | x | x | x | x |

Quantiles

| Label | Description | Outgoing Attribute Name | Summary | Summary with Weight | Summary with By Attribute | Detailed | Detailed with Weight | Detailed with By Attribute |
|---|---|---|---|---|---|---|---|---|
| Nth Percentile | See external reference | qN | | | | x | x | x |

Extremes

| Label | Description | Outgoing Attribute Name | Summary | Summary with Weight | Summary with By Attribute | Detailed | Detailed with Weight | Detailed with By Attribute |
|---|---|---|---|---|---|---|---|---|
| Nth Minimum | Nth smallest value | minN | | | | x | x | x |
| Nth Maximum | Nth largest value | maxN | | | | x | x | x |
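
Several of the statistics above are defined by simple formulas, and a short Python sketch may make those definitions concrete. It is an illustration under stated assumptions, not the node's actual computation: the function name `derived_stats` is hypothetical, the standard deviation is assumed to be the sample standard deviation (n - 1 denominator), and percentages are assumed to be on a 0 to 100 scale. The sketch requires at least two non-null values that are not all equal.

```python
from statistics import mean, stdev


def derived_stats(column):
    """Compute a subset of the documented statistics for one attribute."""
    values = [v for v in column if v is not None]
    rows = len(column)              # Processed Rows
    n = len(values)                 # Values (non-null)
    avg = mean(values)
    std = stdev(values)             # assumption: sample standard deviation
    lo, hi = min(values), max(values)
    zeros = sum(1 for v in values if v == 0)
    return {
        "rows": rows,
        "values": n,
        "nullValues": rows - n,
        "coveragePct": 100.0 * n / rows,     # Values / Processed Rows
        "zeroValues": zeros,
        "zeroValuePct": 100.0 * zeros / n,   # Zero Values / Values
        "range": hi - lo,                    # Maximum - Minimum
        "var": std ** 2,                     # square of standard deviation
        "ss": sum(v * v for v in values),    # uncorrected sum of squares
        "zmin": (lo - avg) / std,            # (Minimum - Mean) / Std
        "zmax": (hi - avg) / std,            # (Maximum - Mean) / Std
    }


stats = derived_stats([0, 2, 4, None])
```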

Attribute Values

../../_images/univariate_histogramResultsExplorer.png

Histogram

For numeric and date attributes, a histogram is displayed in the Attribute Values section when the analysis level is set to Detailed. Hover over a bar to display its value range and count. Above the histogram, a drop-down menu lets you restrict the view to values that lie within two or three standard deviations of the mean, and a check box lets you view the histogram on a logarithmic scale.
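
The standard-deviation filter can be pictured with the following sketch, which keeps only the values within k standard deviations of the mean. The function `within_std` is hypothetical, and the use of the sample standard deviation and an inclusive boundary are assumptions; the documentation does not specify either.

```python
from statistics import mean, stdev


def within_std(values, k):
    """Return the values that lie within k standard deviations of the mean."""
    avg = mean(values)
    std = stdev(values)  # assumption: sample standard deviation
    return [v for v in values if abs(v - avg) <= k * std]


data = [1, 2, 3, 4, 5, 100]
two_sigma = within_std(data, 2)    # drops the outlier 100
three_sigma = within_std(data, 3)  # keeps every value
```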

../../_images/univariate_frequencyTableResultsExplorer.png

Frequency Table

For string attributes, this area lists the distinct values of the attribute and the frequency of each value, up to 1,000 distinct values. The Copy to Clipboard buttons ([Values] and [Both]) copy the frequency information so that it can be pasted elsewhere.
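
A frequency table of this kind can be sketched in a few lines of Python. The function `frequency_table` is hypothetical. Ordering by descending frequency and excluding nulls are assumptions, since the documentation states only the 1,000-value limit.

```python
from collections import Counter

MAX_DISTINCT = 1000  # limit stated in the documentation


def frequency_table(values, limit=MAX_DISTINCT):
    """Return (value, count) pairs for up to `limit` distinct non-null values."""
    counts = Counter(v for v in values if v is not None)
    # assumption: most frequent values are listed first
    return counts.most_common(limit)


table = frequency_table(["a", "b", "a", None, "c", "a", "b"])
top_two = frequency_table(["a", "b", "a", None, "c", "a", "b"], limit=2)
```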

Numeric Summary Tab

../../_images/univariate_numericSummaryTab.png

Click the Numeric Summary tab for a high-level statistical summary of numeric and date attributes, including Valid Values, Mean, Minimum, Maximum, Standard Deviation, Sum, and Median.

To export results to a spreadsheet, select one or more attributes in the Attribute List, click the Export to Spreadsheet button on the Toolbar, and enter a filename when prompted.

Output Stream

The data stream sent to the Univariate node’s outgoing connector contains all of the available attributes in the Attribute Statistics table. These attributes are labeled using the Outgoing Attribute Name.

When you use a By Attribute, there is no output stream from the Univariate node.