Candidates

../../_images/candidatesNode.png

The Clario Candidates node builds a series of regression models to help you determine the ‘best’ predictive model. You can control the number of predictor attributes in each model and the number of models generated. The node's input connector can be attached to a variety of nodes (e.g., Read File, Aggregate, Append), but it requires a valid stream of data.

Configuration

The Candidates node has two configuration tabs: Dependent Attribute and Candidate Attributes.

Dependent Attribute Tab

../../_images/candidatesdependent.png

Dependent Attribute Tab

The Dependent Attribute tab contains an Available Attributes list box along with a Dependent Attribute field. Drag and drop one attribute (required) into the Dependent Attribute field. This attribute must be numeric for the Candidates node to run, so only numeric attributes are listed in the Available Attributes list box. The reason is that Candidates builds Ordinary Least Squares (OLS) regression models. Even if you ultimately intend to build a logistic model on a string dependent attribute, the Candidates node requires a numeric representation of that attribute.
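As an illustration of what a numeric representation of a string dependent attribute can look like, the following minimal Python sketch maps a two-valued string attribute to 1.0/0.0. It is not part of Clario; the function name `encode_binary_target` and the sample labels are hypothetical, and within Clario the recoding would be done upstream of the Candidates node.

```python
def encode_binary_target(values, positive_label):
    """Map a string-valued dependent attribute to 1.0 / 0.0.

    Hypothetical helper for illustration only: every value equal to
    positive_label becomes 1.0, everything else becomes 0.0.
    """
    return [1.0 if v == positive_label else 0.0 for v in values]


# Example: a string dependent attribute with two outcomes.
responses = ["yes", "no", "no", "yes"]
encoded = encode_binary_target(responses, positive_label="yes")
print(encoded)
```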

Candidate Attributes Tab

../../_images/candidatescand.png

Candidate Attributes Tab

The Candidate Attributes tab contains an Available Attributes list box, a Candidate Attributes list box, as well as a Settings area containing several model settings. First, drag and drop the attributes you want to use in constructing models from the Available Attributes box to the Candidate Attributes box. You must drag and drop at least one attribute into the Candidate Attributes list box.

Below the Candidate Attributes box, the Settings area lets you select the Minimum and Maximum number of attributes per model. For example, you can restrict the node to models with at least four and at most six attributes. You can also specify how many models to produce for each number of attributes by setting the number of models per size (valid range is 1 to 5). The following constraints apply: the minimum number of attributes must be less than or equal to the number of Candidate Attributes; the maximum number of attributes must also be less than or equal to the number of Candidate Attributes, and greater than the minimum number of attributes; and the number of models per size must be less than the number of Candidate Attributes.
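The constraints on these settings can be summarized as a small check. The sketch below is only a restatement of the rules in the paragraph above in Python; the function `check_candidates_settings` and its parameter names are hypothetical and do not correspond to any Clario API.

```python
def check_candidates_settings(n_candidates, min_attrs, max_attrs, models_per_size):
    """Return a list of rule violations (empty if the settings look valid).

    Hypothetical helper that mirrors the documented rules; it is not
    how the node itself validates its configuration.
    """
    problems = []
    if n_candidates < 1:
        problems.append("at least one candidate attribute is required")
    if min_attrs > n_candidates:
        problems.append("minimum exceeds the number of candidate attributes")
    if max_attrs > n_candidates:
        problems.append("maximum exceeds the number of candidate attributes")
    if max_attrs <= min_attrs:
        problems.append("maximum must be greater than the minimum")
    if not 1 <= models_per_size <= 5:
        problems.append("models per size must be between 1 and 5")
    if models_per_size >= n_candidates:
        problems.append("models per size must be less than the number of candidate attributes")
    return problems


# Eight candidate attributes, models with four to six attributes, three models per size.
print(check_candidates_settings(8, 4, 6, 3))
```

An empty list means that none of the documented rules is violated.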

Results

There is one results set for the Candidates node. It contains four tables (Models, Model Summary, Analysis of Variance, and Coefficients) and a model graph.

../../_images/candidatesresults.png

Results

The two panels on the left describe all of the resulting models. The Models table (upper left) lists every model produced, along with each model’s Cp statistic and R² (coefficient of determination) statistic. The graph below it plots all models, with R² on the x-axis and the Cp statistic on the y-axis. Each model is represented by a colored circle, and the color corresponds to a specific number of attributes. Hovering the mouse over a circle shows model-specific information, including the number of predictors. In general, models with higher R² statistics are better, and models with lower Cp statistics are better.
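To make the two statistics concrete, here is a minimal, standard-library Python sketch of a best-subsets search that computes R² and Cp for each OLS model. Several things in it are assumptions rather than documented Clario behavior: that Cp is Mallows' Cp, computed as SSE_p / s² - n + 2p with s² taken from the model containing all candidate attributes and p counting the intercept; that every subset is enumerated exhaustively; and that models of each size are ranked by Cp. All function names and the sample data are hypothetical.

```python
from itertools import combinations


def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def sse(rows, y, cols):
    """Residual sum of squares of an OLS fit (with intercept) on the given columns."""
    x = [[1.0] + [row[c] for c in cols] for row in rows]
    k = len(x[0])
    xtx = [[sum(r[i] * r[j] for r in x) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(x, y)) for i in range(k)]
    beta = solve(xtx, xty)  # normal equations: (X'X) beta = X'y
    return sum((yi - sum(b * v for b, v in zip(beta, r))) ** 2 for r, yi in zip(x, y))


def best_subsets(rows, y, min_attrs, max_attrs, models_per_size):
    """Return the lowest-Cp models for each model size (assumed ranking rule)."""
    n, n_cand = len(y), len(rows[0])
    mean_y = sum(y) / n
    sst = sum((v - mean_y) ** 2 for v in y)
    # Error variance estimated from the model with all candidate attributes.
    s2 = sse(rows, y, range(n_cand)) / (n - n_cand - 1)
    results = []
    for size in range(min_attrs, max_attrs + 1):
        fits = []
        for cols in combinations(range(n_cand), size):
            e = sse(rows, y, cols)
            cp = e / s2 - n + 2 * (size + 1)  # p = size + 1 (intercept included)
            fits.append({"attrs": cols, "r2": 1 - e / sst, "cp": cp})
        fits.sort(key=lambda f: f["cp"])
        results.extend(fits[:models_per_size])
    return results


# Made-up data: y depends mainly on the first two attributes.
rows = [[1, 2, 5], [2, 1, 3], [3, 4, 8], [4, 3, 1],
        [5, 6, 4], [6, 5, 9], [7, 8, 2], [8, 7, 6]]
y = [4.1, 4.8, 10.3, 10.9, 16.2, 16.7, 22.1, 22.9]

results = best_subsets(rows, y, min_attrs=1, max_attrs=3, models_per_size=2)
for model in results:
    print(model["attrs"], round(model["r2"], 4), round(model["cp"], 2))
```

Under this definition, the model that contains every candidate attribute always has Cp equal to its number of parameters, so Cp is mainly useful for comparing the smaller models against it.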

The three tables on the right (Model Summary, Analysis of Variance, and Coefficients) describe an individual model. Clicking a model in the Models table brings up the three tables that correspond to the selected model. The upper two tables on the right contain overall model statistics. The Coefficients table (lower right) lists the attributes in the model, along with the coefficient and other statistics for each attribute. To compare multiple models easily, you can export each model of interest to a spreadsheet by clicking the Export to Spreadsheet button on the Toolbar, and then compare them side by side.

Output Stream

The Candidates node results tables can be exported to Excel by clicking the Export to Spreadsheet button on the Toolbar. Because the Candidates node is a terminal node, it produces no output data file. Once you choose a final model, you can go back and use either the Linear Regression node or the Logistic Regression node to build it. From there you can go on to Rank and Evaluate the model.