Sample

../../_images/sampleNode.png

The Clario Sample node draws either a simple random sample or a stratified sample from an input data stream.

Configuration

The Sample node has only one configuration tab.

Configuration Tab

Settings

Seed

Specify the seed value (an integer between 1 and 1,000,000,000).

Number of Samples

Specify the number of samples (an integer between 1 and 99).

Replicate ID Attribute

The Sample node writes a unique value for each sample to the Replicate ID Attribute, which is added to the end of the output data stream metadata. To rename this attribute, click in the Replicate ID Attribute text box and type a new name. Attribute names may contain only the following characters: A-Z, a-z, 0-9, ‘-’, and ‘_’.
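The naming rule above can be checked with a short pattern match. This is only an illustrative sketch in Python, not part of the Sample node itself; the function name is hypothetical, and rejecting an empty name is an assumption the documentation does not state.

```python
import re

# Characters the documentation lists as valid: A-Z, a-z, 0-9, '-', and '_'.
_VALID_NAME = re.compile(r"[A-Za-z0-9_-]+")

def is_valid_attribute_name(name: str) -> bool:
    """Return True if every character of `name` is in the allowed set.

    Hypothetical helper; an empty name is rejected (an assumption).
    """
    return _VALID_NAME.fullmatch(name) is not None

print(is_valid_attribute_name("Replicate_ID"))  # True
print(is_valid_attribute_name("Replicate ID"))  # False (contains a space)
```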

Sample Method

Specify the sampling method desired: Simple Random or Stratified.

Sample Size Type

Specify the sample size type: Rows or Percent.

Simple Random

../../_images/sample_simpleRandom.png

With the simple random sampling method, each row of data has an equal probability of being chosen.

Sample Size

Specify the sample size according to the Sample Size Type chosen: either a number of rows between 1 and the number of rows in the input data stream, or a percent from 0.001 to 100.
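As a rough illustration of how Seed, Sample Size Type, and Sample Size interact, the following Python sketch draws a simple random sample from a list of rows. It is not the node's implementation: the function name is hypothetical, and sampling without replacement and rounding a percent to the nearest whole row are both assumptions.

```python
import random

def simple_random_sample(rows, size, size_type="Rows", seed=1):
    """Draw a simple random sample without replacement (an assumption)."""
    if size_type == "Percent":
        # Convert a percent of the input into a row count (rounding is an assumption).
        k = round(len(rows) * size / 100)
    else:
        k = int(size)
    if not 0 <= k <= len(rows):
        raise ValueError("sample size must not exceed the number of input rows")
    rng = random.Random(seed)  # a fixed seed makes the draw repeatable
    return rng.sample(rows, k)

rows = list(range(1000))
print(len(simple_random_sample(rows, 50)))             # 50
print(len(simple_random_sample(rows, 10, "Percent")))  # 100
```

Running the function twice with the same seed returns the same rows, which is the usual reason for exposing a seed setting.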

Stratified

../../_images/sample_stratified.png

The stratified sampling method divides the population into subgroups (strata) and samples from each stratum individually.

Available Attributes

The Available Attributes box displays all of the string attributes available on the input data stream.

Class Attribute

Drag and drop an Available Attribute into the Class Attribute field. Only String type attributes will be available for selection.

Strata

../../_images/sample_createStratum.png

To create a stratum, click [+] at the bottom of the Strata box to open the Create Stratum dialog. The name of the stratum must match a value in the Class Attribute. The Value entered is the corresponding sample size (number of rows or percent) of the current stratum.

To remove a stratum, click [-] at the bottom of the Strata box.

To import strata, click the [Import…] button. Strata must be entered with the format: STRATUM,VALUE.
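The following Python sketch shows one way to read STRATUM,VALUE lines and sample each stratum separately. It is a minimal illustration under stated assumptions, not the node's implementation: the function names, the `region` attribute, and the stratum names `EAST` and `WEST` are hypothetical, sampling is assumed to be without replacement, and a stratum that asks for more rows than exist is simply capped.

```python
import random
from collections import defaultdict

def parse_strata(text):
    """Parse lines of the form STRATUM,VALUE into a {stratum: value} dict."""
    strata = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split on the last comma so a stratum name may contain commas (assumption).
        name, value = line.rsplit(",", 1)
        strata[name.strip()] = float(value)
    return strata

def stratified_sample(rows, class_attr, strata, size_type="Rows", seed=1):
    """Sample each stratum on its own, without replacement (an assumption)."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[class_attr]].append(row)  # group rows by Class Attribute value
    rng = random.Random(seed)
    sample = []
    for name, value in strata.items():
        members = groups.get(name, [])
        if size_type == "Percent":
            k = round(len(members) * value / 100)
        else:
            k = int(value)
        sample.extend(rng.sample(members, min(k, len(members))))
    return sample

# 100 rows: 25 with region WEST, 75 with region EAST.
rows = [{"id": i, "region": "EAST" if i % 4 else "WEST"} for i in range(100)]
strata = parse_strata("EAST,30\nWEST,10")
picked = stratified_sample(rows, "region", strata)
print(sum(r["region"] == "EAST" for r in picked))  # 30
print(sum(r["region"] == "WEST" for r in picked))  # 10
```

Rows whose Class Attribute value matches no defined stratum are never sampled in this sketch, which follows from the requirement that each stratum name match a value in the Class Attribute.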

Results

../../_images/sample_results.png

The table columns are as follows:

Column                   Description
Name                     Value of the attribute
Total Row Count          Total number of rows in the input file with the specified attribute value
Sample Row Count         Total number of rows sampled with the specified attribute value
Selection Probability    Sample Row Count / Total Row Count
Sampling Weight          1 / Selection Probability

If the Sample Method is Simple Random, the Name will be ‘Simple’. If the Sample Method is Stratified, the Name in each row will be the Class Attribute value of each stratum defined in the Configuration Tab.
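The two derived columns follow directly from the row counts. As a worked example in Python (the function name and the counts 200 and 50 are hypothetical, chosen only for illustration):

```python
def results_row(name, total_row_count, sample_row_count):
    """Compute the derived Results columns from the two row counts."""
    selection_probability = sample_row_count / total_row_count
    # The weight is undefined when nothing was sampled; None is used here (assumption).
    sampling_weight = 1 / selection_probability if sample_row_count else None
    return {
        "Name": name,
        "Total Row Count": total_row_count,
        "Sample Row Count": sample_row_count,
        "Selection Probability": selection_probability,
        "Sampling Weight": sampling_weight,
    }

row = results_row("Simple", 200, 50)
print(row["Selection Probability"])  # 0.25
print(row["Sampling Weight"])        # 4.0
```

A Sampling Weight of 4.0 means that each sampled row stands in for four rows of the input.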

Output Stream

The data stream sent to the Sample node’s outgoing connector contains all of the available attributes plus an additional Replicate ID attribute. For each sample specified, the rows on the output data stream are identified with a unique Replicate ID value.
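To make the shape of the output concrete, the sketch below draws several samples and tags every row with the sample it belongs to. It is illustrative only: the function name and the attribute name `ReplicateID` are hypothetical, and numbering the replicates 1, 2, 3, ... is an assumption, since the documentation states only that each sample receives a unique value.

```python
import random

def replicate_samples(rows, sample_size, number_of_samples, seed=1,
                      replicate_attr="ReplicateID"):
    """Draw `number_of_samples` simple random samples and tag each row."""
    rng = random.Random(seed)
    output = []
    for replicate_id in range(1, number_of_samples + 1):
        for row in rng.sample(rows, sample_size):
            tagged = dict(row)                     # copy so input rows are not modified
            tagged[replicate_attr] = replicate_id  # new attribute added after the others
            output.append(tagged)
    return output

rows = [{"id": i} for i in range(20)]
out = replicate_samples(rows, sample_size=5, number_of_samples=3)
print(len(out))                                 # 15
print(sorted({r["ReplicateID"] for r in out}))  # [1, 2, 3]
```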