Linear Regression

../../_images/linearNode.png

The Clario Linear Regression node fits a linear regression model for a numeric dependent attribute, which may be discrete or continuous. The resulting model equation can be used to create a predictive score based on one or more independent attributes.

Configuration

The Linear Regression node has three configuration tabs: Dependent Attribute, Weight Attribute, and Predictor Attributes.

Dependent Attribute Tab

../../_images/linearRegression_dependentAttributeTabStepwise.png

The Dependent Attribute tab contains an Available Attributes list box, a Dependent Attribute field, and a Settings area with the Attribute Selection Method drop-down.

Drag the dependent attribute from the Available Attributes list box and drop it into the Dependent Attribute field (required). The dependent attribute must be numeric.

Next, choose the Attribute Selection Method. The choices are None and Stepwise. If Stepwise is chosen, two additional parameters appear and must be set: ‘Maximum p to Enter’ and ‘Minimum p to Remove’.
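As a rough illustration of how two thresholds of this kind are commonly used in stepwise selection, the sketch below picks the next action from a set of p-values. It is a generic outline, not Clario's confirmed algorithm: the function name `stepwise_decision`, its arguments, the inclusive comparisons, and the rule that force-entry attributes are never removed are all assumptions made for the example.

```python
def stepwise_decision(in_model_p, candidate_p, max_p_to_enter,
                      min_p_to_remove, forced=()):
    """Return ("remove", name), ("enter", name), or ("stop", None).

    in_model_p:  dict of attribute -> p-value for attributes in the model
    candidate_p: dict of attribute -> p-value each candidate would have
                 if it were added to the model
    forced:      attributes that are never removed (assumed behavior)
    """
    # Removal check: drop the least significant non-forced attribute
    # if its p-value has risen to the removal threshold.
    removable = {a: p for a, p in in_model_p.items() if a not in forced}
    if removable:
        worst = max(removable, key=removable.get)
        if removable[worst] >= min_p_to_remove:
            return ("remove", worst)

    # Entry check: add the most significant candidate if it is
    # at or below the entry threshold.
    if candidate_p:
        best = min(candidate_p, key=candidate_p.get)
        if candidate_p[best] <= max_p_to_enter:
            return ("enter", best)

    # Nothing qualifies in either direction, so selection ends.
    return ("stop", None)


# Hypothetical p-values, for illustration only.
action = stepwise_decision({"age": 0.01}, {"crp": 0.03, "kernig": 0.20},
                           max_p_to_enter=0.05, min_p_to_remove=0.10)
print(action)
```

In this outline the removal threshold is kept above the entry threshold so that an attribute cannot be entered and then immediately removed on the next step.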

Weight Attribute Tab

../../_images/linearRegression_weightAttributeTab.png

If the incoming data set is weighted, drag and drop the weight attribute into the Weight Attribute field. The weight attribute must contain non-zero integer values.

Predictor Attributes Tab

../../_images/linearRegression_predictorAttributesTabNone.png

Select the desired predictor attribute(s) by dragging and dropping them from the Available Attributes box to the Force-Entry Attributes list box.

If the Attribute Selection Method is ‘None’ on the Dependent Attribute tab, every predictor must be force-entered into the model. At least one attribute must be placed into the Force-Entry Attributes box.

If the Selection Method is ‘Stepwise’, attributes may be chosen as Candidates or be selected for Force-Entry into the model. At least one attribute must be placed into either the Force-Entry or Candidate Attributes box.

../../_images/linearRegression_predictorAttributesTabStepwise.png

Results

The Linear Regression node produces one results set with three tabs: Detailed Results, Step History, and Model Equation. When Attribute Selection Method is set to None, the Step History tab is omitted from the results set.

Detailed Results Tab

../../_images/linearRegression_detailResultsStepwise.png

Steps

If the Stepwise selection method is chosen, the model steps appear in this box. Select a step to see its detailed results on the right.

Model Summary

This box contains the following statistics for each model step: R² and Adjusted R² (Coefficient of Determination), Standard Error of Estimate, and Dependent Mean.
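For readers who want to check these figures by hand, the sketch below computes them from observed and fitted values using the standard textbook definitions. Whether Clario uses exactly these formulas (for example, with weighted data) is not stated here, and the function name `model_summary` and its arguments are hypothetical.

```python
import math


def model_summary(y, y_hat, n_predictors):
    """Compute summary statistics for an unweighted model with an intercept.

    y:            observed values of the dependent attribute
    y_hat:        fitted values from the model
    n_predictors: number of attributes in the model (excluding the intercept)
    """
    n = len(y)
    mean_y = sum(y) / n
    sse = sum((obs - fit) ** 2 for obs, fit in zip(y, y_hat))  # residual SS
    sst = sum((obs - mean_y) ** 2 for obs in y)                # total SS
    df_resid = n - n_predictors - 1

    r2 = 1 - sse / sst
    adj_r2 = 1 - (1 - r2) * (n - 1) / df_resid
    std_error_estimate = math.sqrt(sse / df_resid)
    return {
        "r2": r2,
        "adj_r2": adj_r2,
        "std_error_estimate": std_error_estimate,
        "dependent_mean": mean_y,
    }


# Made-up observed and fitted values, for illustration only.
print(model_summary([1, 2, 3, 4, 5], [1.2, 1.9, 3.0, 4.1, 4.8], n_predictors=1))
```

Adjusted R² penalizes R² for the number of attributes in the model, which makes it the more useful of the two when comparing steps that contain different numbers of attributes.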

Analysis of Variance

The Analysis of Variance (ANOVA) table contains the following statistics for each model step: Source of Variance, Degrees of Freedom, Sum of Squares, Mean Squares, F-statistic, and corresponding p-value.

Coefficients

For each model attribute, the following statistics are displayed: Degrees of Freedom, Regression Coefficient, Standard Error, Standardized Coefficient, Model Contribution, t-value, p-value, and Tolerance.
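Two of these columns follow directly from the others under the usual definitions: the t-value is the coefficient divided by its standard error, and the standardized coefficient rescales the coefficient by the ratio of the predictor's standard deviation to the dependent attribute's. The sketch below assumes those standard definitions and unweighted data; the function names are hypothetical and are not part of Clario.

```python
import statistics


def standardized_coefficient(coef, x_values, y_values):
    """Rescale a raw coefficient to standard-deviation units."""
    return coef * statistics.stdev(x_values) / statistics.stdev(y_values)


def t_value(coef, std_error):
    """Ratio of a coefficient to its standard error."""
    return coef / std_error


# Made-up data in which y is exactly 2 * x, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]
print(standardized_coefficient(2.0, x, y))
print(t_value(1.5, 0.5))
```

Because standardized coefficients are unit-free, they allow the relative influence of attributes measured on different scales to be compared.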

Step History Tab

../../_images/linearRegression_stepHistoryStepwise.png

This tab (Stepwise method only) contains one row for each step in the model-building process. Each row lists the step number, the attribute entered or removed on that step, and the resulting model R².

Model Equation Tab

../../_images/linearRegression_modelEquationTab.png

This tab contains the model equation for the linear regression, expressed as code that can be copied and pasted into the Code Editor of a Transform node.

Output Stream

The output stream contains three attributes: Component, Description, and Value. If the Linear Regression results are written to a file to be used in a scoring application, make sure ‘Full Precision’ is selected as the number format to avoid truncation of model coefficients.

Below is an example of an output stream from Linear Regression.

COMPONENT,DESCRIPTION,VALUE
ModelType,LINEAR,
InterceptCoefficient,,1.0552939929444425
Coefficient,age,-0.0035260526656904004
Coefficient,catscan,-0.29600915114346016
Coefficient,crp,-0.02508035288183742
Coefficient,cspinal_poly,-1.757772385166236E-5
Coefficient,cult_neisseria,-0.750689995339289
Coefficient,cult_staph,-0.7689190279231668
Coefficient,cult_strep,-0.417500088493898
Coefficient,kernig,-0.1655508370605784
InterceptStandardCoefficient,,0.0
StandardCoefficient,age,-0.12224377165778993
StandardCoefficient,catscan,-0.28264839644194134
StandardCoefficient,crp,-0.23582395458011013
StandardCoefficient,cspinal_poly,-0.2219313201368109
StandardCoefficient,cult_neisseria,-0.14865877088447363
StandardCoefficient,cult_staph,-0.21444127143491745
StandardCoefficient,cult_strep,-0.2269238137668467
StandardCoefficient,kernig,-0.15271832696069013
DependentAttribute,DX_VIRAL,
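
A scoring application could read this stream and apply the model roughly as follows. This is a minimal sketch, assuming the stream has been saved as comma-separated text with the header row shown above and that the score is the intercept plus the sum of each coefficient times its attribute value. The function names `load_model` and `score` and the input record are hypothetical.

```python
import csv
import io

# The Coefficient rows from the example above, at full precision.
STREAM = """COMPONENT,DESCRIPTION,VALUE
ModelType,LINEAR,
InterceptCoefficient,,1.0552939929444425
Coefficient,age,-0.0035260526656904004
Coefficient,catscan,-0.29600915114346016
Coefficient,crp,-0.02508035288183742
Coefficient,cspinal_poly,-1.757772385166236E-5
Coefficient,cult_neisseria,-0.750689995339289
Coefficient,cult_staph,-0.7689190279231668
Coefficient,cult_strep,-0.417500088493898
Coefficient,kernig,-0.1655508370605784
DependentAttribute,DX_VIRAL,
"""


def load_model(stream_text):
    """Return (intercept, {attribute: coefficient}) from the stream text."""
    intercept = 0.0
    coefficients = {}
    for row in csv.DictReader(io.StringIO(stream_text)):
        if row["COMPONENT"] == "InterceptCoefficient":
            intercept = float(row["VALUE"])
        elif row["COMPONENT"] == "Coefficient":
            coefficients[row["DESCRIPTION"]] = float(row["VALUE"])
    return intercept, coefficients


def score(record, intercept, coefficients):
    """Apply the linear model equation to one record of attribute values."""
    return intercept + sum(c * record[name] for name, c in coefficients.items())


intercept, coefficients = load_model(STREAM)

# Hypothetical input record: every attribute 0 except age.
record = {name: 0 for name in coefficients}
record["age"] = 10
print(score(record, intercept, coefficients))
```

Parsing the values with `float` keeps every digit written to the file, which is why the ‘Full Precision’ number format matters: any digits dropped on export cannot be recovered at scoring time.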