Logistic Regression

The Clario Logistic Regression node uses logistic regression to build a model for a binary (two-class) dependent attribute. The model predicts the probability that a row belongs to the success value of the dependent attribute.
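For reference, the standard binary logistic regression model can be written as below. Here p is the predicted probability of the success value, x_1, ..., x_k are the predictor attributes, and beta_0, ..., beta_k are the fitted coefficients. These symbols are introduced for illustration only; beta_0 presumably corresponds to the InterceptCoefficient row and each beta_i to a Coefficient row of the output stream.

```latex
p(\mathbf{x}) = P(y = \text{success} \mid \mathbf{x})
  = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k)}},
\qquad
\ln\frac{p(\mathbf{x})}{1 - p(\mathbf{x})} = \beta_0 + \sum_{i=1}^{k} \beta_i x_i
```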

Configuration

The Logistic Regression node has three configuration tabs: Dependent Attribute, Weight Attribute, and Predictor Attributes.

Dependent Attribute Tab

Drag the dependent attribute from the Available Attributes box and drop it into the Dependent Attribute area (required). This attribute cannot contain null values or have more than two distinct values.
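A minimal Python sketch of these dependent-attribute rules, assuming the column is available as a list of values: it rejects nulls and more than two distinct values, then encodes the Success value (entered under Settings) as 1 and the Failure value as 0. The function name encode_dependent is hypothetical and is not part of Clario.

```python
from typing import Hashable, Optional, Sequence


def encode_dependent(values: Sequence[Optional[Hashable]],
                     success: Hashable, failure: Hashable) -> list[int]:
    """Check the documented dependent-attribute rules, then encode 1/0.

    Hypothetical helper for illustration; not Clario's implementation.
    """
    # Rule 1: the dependent attribute cannot be null.
    if any(v is None for v in values):
        raise ValueError("dependent attribute contains null values")
    # Rule 2: at most two distinct values.
    distinct = set(values)
    if len(distinct) > 2:
        raise ValueError(f"expected at most 2 distinct values, found {len(distinct)}")
    # Every value must match one of the Success/Failure values from Settings.
    unknown = distinct - {success, failure}
    if unknown:
        raise ValueError(f"values other than success/failure: {sorted(map(str, unknown))}")
    # Success -> 1, Failure -> 0.
    return [1 if v == success else 0 for v in values]
```

For example, encode_dependent(["1", "0", "1"], success="1", failure="0") returns [1, 0, 1].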

Under Settings, enter both the Success and Failure values of the dependent attribute. Next, choose the Attribute Selection Method: None or Stepwise. If Stepwise is chosen, two additional parameters appear and must be defined: ‘Maximum p to Enter’ and ‘Minimum p to Remove’.
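The two Stepwise thresholds act as a decision rule at each step. The sketch below shows one common form of that rule, assuming the p-values have already been obtained by fitting the candidate models: an attribute already in the model is removed if its p-value exceeds ‘Minimum p to Remove’; otherwise the strongest candidate is entered if its p-value is below ‘Maximum p to Enter’. Checking removal before entry, the strict inequalities, and never removing force-entered attributes are assumptions, not documented Clario behavior, and stepwise_action is a hypothetical name.

```python
from typing import Optional


def stepwise_action(entry_p: dict[str, float], removal_p: dict[str, float],
                    max_p_enter: float, min_p_remove: float,
                    forced: frozenset[str] = frozenset()) -> Optional[tuple[str, str]]:
    """Decide the next stepwise action from precomputed p-values.

    entry_p:   p-value each candidate would have if it were added next.
    removal_p: p-value of each attribute currently in the model.
    Returns ("remove", name), ("enter", name), or None when selection stops.
    Assumed order: removal is checked before entry.
    """
    # Force-entered attributes are assumed never to be removed.
    removable = {a: p for a, p in removal_p.items() if a not in forced}
    if removable:
        worst = max(removable, key=removable.get)
        if removable[worst] > min_p_remove:
            return ("remove", worst)
    # Otherwise enter the candidate with the smallest p-value, if it qualifies.
    if entry_p:
        best = min(entry_p, key=entry_p.get)
        if entry_p[best] < max_p_enter:
            return ("enter", best)
    return None
```

Setting ‘Maximum p to Enter’ no larger than ‘Minimum p to Remove’ is the usual way to keep an attribute from being entered and removed repeatedly.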

Weight Attribute Tab

If the incoming data set is weighted, drag the weight attribute and drop it into the Weight Attribute field. The weight attribute must contain non-zero integer values.
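If the weight is a frequency weight (an assumption suggested by the integer requirement, not stated by Clario), a row with weight w contributes to the log-likelihood exactly as w identical unweighted rows would. The hypothetical helper below computes that weighted Bernoulli log-likelihood from observed outcomes (1 = success, 0 = failure) and predicted success probabilities.

```python
import math
from typing import Sequence


def weighted_log_likelihood(y: Sequence[int], p: Sequence[float],
                            w: Sequence[int]) -> float:
    """Bernoulli log-likelihood where each row counts w times.

    Assumes frequency weights; probabilities must lie strictly between 0 and 1.
    """
    total = 0.0
    for yi, pi, wi in zip(y, p, w):
        # Log-likelihood of one row, multiplied by its (assumed frequency) weight.
        total += wi * (yi * math.log(pi) + (1 - yi) * math.log(1 - pi))
    return total
```

With y = [1, 0], p = [0.8, 0.3], and w = [3, 2], the result equals the unweighted log-likelihood of the five expanded rows (three successes at 0.8 and two failures at 0.3).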

Predictor Attributes Tab

Select the desired predictor attribute(s) by dragging and dropping them from the Available Attributes box to either the Force-Entry Attributes or the Candidate Attributes box.

If the Attribute Selection Method on the Dependent Attribute tab is ‘None’, no automatic selection takes place, so the predictors to include must be chosen explicitly: at least one attribute must be placed in the Force-Entry Attributes box.

If the Attribute Selection Method is ‘Stepwise’, attributes can be placed in the Candidate Attributes box, from which the stepwise procedure selects, or in the Force-Entry Attributes box to be forced into the model. At least one attribute must be placed in either the Force-Entry or the Candidate Attributes box.
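The two predictor rules can be expressed as a small validation function. This is a hypothetical sketch: the method strings "None" and "Stepwise" mirror the option names on the Dependent Attribute tab, and returning a list of error messages instead of raising is a design choice made for illustration.

```python
def predictor_errors(method: str, force_entry: list[str],
                     candidates: list[str]) -> list[str]:
    """Return messages for any violated predictor-attribute rule (empty if valid)."""
    errors = []
    if method == "None":
        # No automatic selection, so at least one attribute must be force-entered.
        if not force_entry:
            errors.append("Selection method 'None' needs at least one Force-Entry attribute.")
    elif method == "Stepwise":
        # Stepwise accepts force-entered attributes, candidates, or both.
        if not force_entry and not candidates:
            errors.append("Stepwise needs at least one Force-Entry or Candidate attribute.")
    else:
        errors.append(f"Unknown attribute selection method: {method!r}")
    return errors
```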

Results

The Logistic Regression node produces one result set with four tabs: Response Profile, Detailed Results, Step History, and Model Equation. When the Attribute Selection Method is None, the Step History tab is omitted.

Response Profile Tab

This tab contains summary statistics such as Success Frequency, Failure Frequency, the number of rows with missing values (Missing), Success Weight, and Failure Weight.
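A sketch of how such a profile could be tallied, under stated assumptions: "frequency" is taken to mean a row count and "weight" the sum of the weight attribute; a row is counted as missing when any predictor value is None, and such rows are excluded from the success and failure tallies. The function and key names are hypothetical.

```python
from typing import Hashable, Optional, Sequence


def response_profile(y: Sequence[Hashable],
                     x: Sequence[Sequence[Optional[float]]],
                     w: Sequence[float],
                     success: Hashable) -> dict[str, float]:
    """Tally a Response Profile-style summary (hypothetical field names)."""
    profile = {"success_frequency": 0, "failure_frequency": 0, "missing": 0,
               "success_weight": 0.0, "failure_weight": 0.0}
    for yi, xi, wi in zip(y, x, w):
        if any(v is None for v in xi):
            # Assumption: rows with a missing predictor are only counted here.
            profile["missing"] += 1
        elif yi == success:
            profile["success_frequency"] += 1
            profile["success_weight"] += wi
        else:
            profile["failure_frequency"] += 1
            profile["failure_weight"] += wi
    return profile
```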

Detailed Results Tab

This tab contains Global Fit Statistics; Model Fit Statistics (AIC, the Akaike Information Criterion; SC, the Schwarz Criterion; and -2*Log Likelihood); and the Analysis of Maximum Likelihood Estimates, which reports for each parameter the Degrees of Freedom, Standard Error, Standardized Coefficient, Model Contribution, Chi-square, and p-value.
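The model fit statistics and per-parameter chi-square follow standard definitions, sketched below on the assumption that Clario uses them unchanged: AIC = -2 log L + 2k and SC = -2 log L + k ln(n), where k counts the intercept and all coefficients and n is the number of observations (whether n is the row count or the summed weight for weighted data is not stated here). A parameter's Wald chi-square is (coefficient / standard error)^2 with one degree of freedom. The function names are hypothetical.

```python
import math


def fit_statistics(neg2_log_likelihood: float, n_params: int,
                   n_obs: float) -> dict[str, float]:
    """AIC and SC (Schwarz criterion) from -2*Log Likelihood.

    n_params counts the intercept plus every coefficient in the model.
    """
    return {
        "-2LogL": neg2_log_likelihood,
        "AIC": neg2_log_likelihood + 2 * n_params,
        "SC": neg2_log_likelihood + n_params * math.log(n_obs),
    }


def wald_test(coefficient: float, standard_error: float) -> tuple[float, float]:
    """Wald chi-square (1 degree of freedom) and its p-value for one parameter."""
    chi_square = (coefficient / standard_error) ** 2
    # Upper tail of a chi-square with 1 df: P(X > c) = erfc(sqrt(c / 2)).
    p_value = math.erfc(math.sqrt(chi_square / 2))
    return chi_square, p_value
```

For example, wald_test(2.0, 1.0) gives a chi-square of 4.0 and a p-value of about 0.0455.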

Step History Tab

This tab (Stepwise method only) contains one row for each step of the model-building process. Each row lists the attribute entered or removed at that step along with the resulting model Chi-square value.
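A sketch of how Step History rows could be assembled, assuming the model Chi-square is the usual likelihood-ratio statistic: -2*Log Likelihood of the intercept-only model minus -2*Log Likelihood of the model after the step. That definition, the Step record, and the step_history function are assumptions for illustration; the attribute names x1 and x2 and the numbers in the test are made up.

```python
from typing import NamedTuple


class Step(NamedTuple):
    number: int
    action: str              # "entered" or "removed"
    attribute: str
    model_chi_square: float


def step_history(neg2ll_null: float,
                 steps: list[tuple[str, str, float]]) -> list[Step]:
    """Build Step History rows from (action, attribute, -2LogL after the step).

    Assumption: model chi-square = -2LogL(intercept only) - (-2LogL(current model)).
    """
    return [Step(i, action, attribute, neg2ll_null - neg2ll)
            for i, (action, attribute, neg2ll) in enumerate(steps, start=1)]
```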

Model Equation Tab

This tab contains the model equation for the logistic regression. The equation code can be copied and pasted into the Code Editor of a Transform node.
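The Transform node's code language is not shown in this section, so the following is a Python equivalent of applying a logistic model equation to one row: compute the linear predictor from the intercept and coefficients, then convert it to a probability with the logistic function. The function name score and its dictionary inputs are hypothetical. The two-branch form avoids floating-point overflow when the linear predictor is large in magnitude.

```python
import math
from typing import Mapping


def score(intercept: float, coefficients: Mapping[str, float],
          row: Mapping[str, float]) -> float:
    """Predicted probability of the success value for one row."""
    # Linear predictor: intercept + sum of coefficient * attribute value.
    linear = intercept + sum(b * row[name] for name, b in coefficients.items())
    # Numerically stable logistic function: exp() is only called on values <= 0.
    if linear >= 0:
        return 1.0 / (1.0 + math.exp(-linear))
    z = math.exp(linear)
    return z / (1.0 + z)
```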

Output Stream

The output stream contains three attributes: Component, Description, and Value. If the Logistic Regression results are written to a file for use in a scoring application, select ‘Full Precision’ as the number format so that the model coefficients are not truncated.

Below is an example of an output stream from the Logistic Regression node.

COMPONENT,DESCRIPTION,VALUE
ModelType,LOGISTIC,
InterceptCoefficient,,-24.4077
Coefficient,pub_priv,5.007
Coefficient,sat_v,0.0157
Coefficient,hstop25,-0.003
Coefficient,sat_m,0.0083
Coefficient,admit_rate,2.0278
Coefficient,fac_phd,0.0289
Coefficient,admits,0.0006
Coefficient,app_rec,-0.0003
InterceptStandardCoefficient,,0
StandardCoefficient,pub_priv,1.5307
StandardCoefficient,sat_v,0.5957
StandardCoefficient,hstop25,-0.0401
StandardCoefficient,sat_m,0.3677
StandardCoefficient,admit_rate,0.1924
StandardCoefficient,fac_phd,0.3222
StandardCoefficient,admits,1.0863
StandardCoefficient,app_rec,-0.8239
DependentAttribute,tuit_binary,
DependentSuccessType,STRING,
DependentSuccessValue,1,
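A scoring application needs the intercept and coefficients back from this output stream. The Python sketch below parses the example above with the standard csv module. The dictionary layout it returns is an assumption for illustration, and the StandardCoefficient rows are skipped because they are not needed to compute a prediction.

```python
import csv
import io

# The example output stream shown above.
OUTPUT_STREAM = """COMPONENT,DESCRIPTION,VALUE
ModelType,LOGISTIC,
InterceptCoefficient,,-24.4077
Coefficient,pub_priv,5.007
Coefficient,sat_v,0.0157
Coefficient,hstop25,-0.003
Coefficient,sat_m,0.0083
Coefficient,admit_rate,2.0278
Coefficient,fac_phd,0.0289
Coefficient,admits,0.0006
Coefficient,app_rec,-0.0003
InterceptStandardCoefficient,,0
StandardCoefficient,pub_priv,1.5307
StandardCoefficient,sat_v,0.5957
StandardCoefficient,hstop25,-0.0401
StandardCoefficient,sat_m,0.3677
StandardCoefficient,admit_rate,0.1924
StandardCoefficient,fac_phd,0.3222
StandardCoefficient,admits,1.0863
StandardCoefficient,app_rec,-0.8239
DependentAttribute,tuit_binary,
DependentSuccessType,STRING,
DependentSuccessValue,1,
"""


def parse_output_stream(text: str) -> dict:
    """Collect the pieces needed for scoring (hypothetical layout)."""
    model = {"model_type": None, "intercept": None, "coefficients": {},
             "dependent": None, "success_value": None}
    for row in csv.DictReader(io.StringIO(text)):
        component, description, value = row["COMPONENT"], row["DESCRIPTION"], row["VALUE"]
        if component == "ModelType":
            model["model_type"] = description
        elif component == "InterceptCoefficient":
            model["intercept"] = float(value)
        elif component == "Coefficient":
            model["coefficients"][description] = float(value)
        elif component == "DependentAttribute":
            model["dependent"] = description
        elif component == "DependentSuccessValue":
            model["success_value"] = description
        # StandardCoefficient and other rows are ignored for scoring.
    return model
```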