Treatment

../../_images/treatmentNode.png

Treatment is a data-cleansing node used to replace missing values and outliers across attributes in a data stream. Replacement values can be directly assigned or leverage univariate metadata.

Connecting Sources

When you connect a data source to a Treatment node, the Select Connectors dialog opens and asks you to specify whether the data stream is the data source (required) or the univariate source (optional).

The data source is the set of data to which you want to apply the treatment. When only a data source is connected, default treatments that require univariate statistics are unavailable and are highlighted in red. Remove these treatments for the node to be properly configured.

The univariate source provides univariate statistics that can be used when authoring treatments.
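As a rough illustration of the kind of statistics the default treatments refer to (mean, standard deviation, mode, minimum, maximum, percentiles), the following Python sketch summarizes a single numeric attribute. It is not the product's implementation: the function name `univariate_stats`, the dictionary keys, and the choice of population standard deviation and inclusive percentile interpolation are all assumptions made for the example.

```python
import statistics

def univariate_stats(values):
    """Summarize one numeric attribute, skipping missing (None) values.

    Hypothetical helper: the statistics chosen mirror the names used by the
    default treatments, not a documented product API.
    """
    present = [v for v in values if v is not None]
    # 99 cut points; 'inclusive' keeps every percentile within [min, max].
    pct = statistics.quantiles(present, n=100, method="inclusive")
    return {
        "mean": statistics.fmean(present),
        "std": statistics.pstdev(present),  # population std (an assumption)
        "mode": statistics.mode(present),
        "min": min(present),
        "max": max(present),
        "pct1": pct[0],    # 1st percentile
        "pct99": pct[98],  # 99th percentile
    }

stats = univariate_stats([1, 2, 2, 3, None, 4, 100])
print(stats["mode"], stats["min"], stats["max"])
```

A different percentile or standard deviation convention would give slightly different bounds, so treat the numbers from this sketch as indicative only.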

To reassign the sources, terminate the connected links and reconnect the data streams.

Configuration

The Treatment node has two configuration tabs: Configure and Summary.

Configure Tab

../../_images/treatment_configureTab.png

Assign Attributes to a Treatment

In the Configure tab, you can specify which treatments to apply to particular attributes from the incoming data stream.

Select a treatment, then drag and drop the desired attribute(s) from the Available Attributes list to the Selected Attributes list. The Available Attributes list displays all attributes of an acceptable Type from the data stream attached to the incoming connector. Each attribute can be selected for only one treatment.
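The one-treatment-per-attribute rule can be pictured as a mapping from treatment names to attribute lists in which no attribute appears twice. The Python sketch below models that constraint; the `assign` function and the attribute name `income` are hypothetical and exist only for illustration.

```python
def assign(assignments, treatment, attributes):
    """Add attributes to a treatment, rejecting any attribute that is
    already assigned to a different treatment (hypothetical helper)."""
    # Reverse lookup: attribute -> treatment that currently owns it.
    taken = {a: t for t, attrs in assignments.items() for a in attrs}
    for attr in attributes:
        owner = taken.get(attr)
        if owner is not None and owner != treatment:
            raise ValueError(f"{attr!r} is already assigned to {owner!r}")
    # Only mutate once every attribute has passed the check.
    current = assignments.setdefault(treatment, [])
    for attr in attributes:
        if attr not in current:
            current.append(attr)
    return assignments

assignments = assign({}, "REPLACE NULL | MEAN", ["income"])

try:
    assign(assignments, "VALUE 0 | PCT 99", ["income"])
    conflict = False
except ValueError:
    conflict = True  # income already belongs to another treatment
```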

Manage Treatments

To create a new treatment, click [New] at the bottom left of the Treatments list and name the new treatment in the resulting dialog. This will open the Expression Builder. Alternatively, click [Edit Code] to open the Code Editor and author a new treatment.

To edit an existing treatment, double-click on the treatment to open the Expression Builder or click [Edit Code] to open the Code Editor.

To remove a treatment, select the treatment and press [Delete], or click [Edit Code] and delete the corresponding line of code in the Code Editor.

Unique ClarioScript/Expression Builder Features

In the Expression Library:

  • the Selected Attribute function implies the treatment will be applied to each selected attribute
  • the [att] tab only contains attributes from the univariate source

In ClarioScript, treatments are written in the form :TREATMENT “name_of_treatment” = [treatment expression]

Default Treatments

The Treatment node has a group of default treatments for each of its two primary use cases.

Default treatments for treating outliers:

Name                        Description
NONE | PCT 99               Bounds the selected attribute above by the value of its 99th percentile
VALUE 0 | PCT 99            Bounds the selected attribute below by zero and above by the value of its 99th percentile
VALUE 0 | NONE              Bounds the selected attribute below by zero
NONE | STD 3                Bounds the selected attribute above by the mean value of the attribute plus three standard deviations
VALUE 0 | STD 3             Bounds the selected attribute below by zero and above by the mean value of the attribute plus three standard deviations
PCT 1 | PCT 99              Bounds the selected attribute below by the value of its 1st percentile and above by the value of its 99th percentile
STD 3 | STD 3               Bounds the selected attribute below by its mean value minus three standard deviations and above by its mean value plus three standard deviations
VALUE 0 | EXTREME 10        Bounds the selected attribute below by zero and above by the 10th most extreme value
VALUE 0 | EXTREME 10,000    Bounds the selected attribute below by zero and above by 10,000

Default treatments for treating missing values:

Name                        Description
REPLACE NULL | MEAN         Replaces a missing value with the mean value of its attribute
REPLACE NULL | MODE         Replaces a missing value with the mode of its attribute
REPLACE NULL | MIN          Replaces a missing value with the minimum value of its attribute
REPLACE NULL | MAX          Replaces a missing value with the maximum value of its attribute
REPLACE NULL | ZERO         Replaces a missing value with a zero
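The effect of the default treatments listed above can be sketched in a few lines of Python. This is an illustration of the bounding and replacement logic only, not ClarioScript and not the node's actual code; the helper names `bound` and `replace_null`, the sample data, and the use of a population standard deviation are assumptions.

```python
import statistics

def bound(values, lower=None, upper=None):
    """Clamp each non-missing value into [lower, upper].
    A bound of None means that side is left unbounded."""
    out = []
    for v in values:
        if v is not None:
            if lower is not None and v < lower:
                v = lower
            if upper is not None and v > upper:
                v = upper
        out.append(v)
    return out

def replace_null(values, replacement):
    """Substitute a replacement for every missing (None) value."""
    return [replacement if v is None else v for v in values]

data = [-5, 10, 12, None, 11, 500]
present = [v for v in data if v is not None]
mean = statistics.fmean(present)
std = statistics.pstdev(present)  # population std (an assumption)

# "VALUE 0 | NONE": bounded below by zero, unbounded above.
floor_zero = bound(data, lower=0)

# "STD 3 | STD 3": bounded by the mean plus or minus three standard deviations.
std_bounded = bound(data, lower=mean - 3 * std, upper=mean + 3 * std)

# "REPLACE NULL | MEAN": missing values become the attribute mean.
filled = replace_null(data, mean)

print(floor_zero)  # [0, 10, 12, None, 11, 500]
```

Note that bounding leaves missing values untouched and replacement leaves outliers untouched, which matches the source's rule that each attribute receives a single treatment.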

Summary Tab

../../_images/treatment_summaryTab.png

The Summary tab lists each attribute, its type, and any treatment applied to it. To export the treatment summary to a spreadsheet, click [Export]. You will be prompted to enter a filename before downloading.

Results

The Treatment node has two results tabs: Summary and Transform Code.

Summary Tab

../../_images/treatment_resultsSummaryTab.png

The results shown for each attribute in the Summary tab include the attribute name, the treatment applied, and the number of values treated. For treatments that affect both the lower and upper bounds of an attribute, the number of values treated is displayed separately for each bound.
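To make the per-bound counts concrete, the Python sketch below tallies how many values fall outside a lower and an upper bound. The function name `count_treated`, the sample data, and the bounds are hypothetical; the node's own counting logic is not documented here.

```python
def count_treated(values, lower=None, upper=None):
    """Count non-missing values that a bounding treatment would change,
    reported separately for the lower and the upper bound."""
    below = sum(
        1 for v in values
        if v is not None and lower is not None and v < lower
    )
    above = sum(
        1 for v in values
        if v is not None and upper is not None and v > upper
    )
    return {"lower": below, "upper": above}

counts = count_treated([-5, 10, 12, None, 11, 500], lower=0, upper=100)
print(counts)  # {'lower': 1, 'upper': 1}
```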

Transform Code Tab

../../_images/treatment_resultsTransformCodeTab.png

The Transform Code tab contains the ClarioScript for how the treatment was applied to the data. This code can be copied and pasted into the Code Editor of a Transform node.

Output Stream

The data stream sent to the Treatment node’s outgoing connector contains the attributes and types displayed in the Summary tab.