Read File

../../_images/readFileNode1.png

The Clario Read File node is used to read in either a delimited or fixed-length flat file. Files can be read in from either the Shared Files or Project Files location.

Configuration

The Read File node has two configuration tabs: Define File and Define Attributes.

Define File Tab

../../_images/readFile_defineFileTab.png

Specifying Files

The first step in configuring a Read File node is specifying the input file name(s). This can be accomplished in one of three ways: specifying new files, selecting files, or importing a list of files. New or imported file names and locations may contain User-Defined Function references. If a constant reference is used to identify the location, it must resolve to “project” or “shared.” If no location is specified, it defaults to project. Make sure you select the correct file location, as files with the same name may exist in both Shared Files and Project Files.

If multiple files are specified, Read File appends them during execution in the order the file names are listed. All files must have identical structures (e.g. type, enclosure, header rows, and attributes).

New Files

Specify a new file by clicking [New] and entering the file name. Next, specify the location the file will be read from, either by selecting Project or Shared from the drop-down menu or by typing directly in the Location field.

../../_images/readFile_newFileDialog.png

Select Files

Select one or more files by clicking [Select], which launches the File Browser. Select the desired file(s) from the list and click [OK].

Import Files

Import one or more files by clicking [Import], which opens a text dialog. Files must be listed in the format: Name, Location.

../../_images/readFile_importFile.png
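As a rough illustration of the Name, Location format and the project default described under Specifying Files, the Python sketch below parses such a list. It is not part of Clario: the function name, the sample file names, the one-entry-per-line layout, and the case-insensitive handling of the location are all assumptions made for this example.

```python
VALID_LOCATIONS = {"project", "shared"}

def parse_file_list(text):
    """Parse lines of 'Name, Location' into (name, location) pairs."""
    entries = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        name, _, location = line.partition(",")
        # Assumption: a missing location falls back to "project".
        location = location.strip().lower() or "project"
        if location not in VALID_LOCATIONS:
            raise ValueError(f"unknown location: {location!r}")
        entries.append((name.strip(), location))
    return entries

# Hypothetical file names, for illustration only.
files = parse_file_list("sales_jan.csv, Shared\nsales_feb.csv")
```

Here the second entry has no location, so it resolves to project.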

Edit File Names

Edit the name of a file by double-clicking the file name in the list.

Raw File Preview

Once one or more files have been specified, select a file from the list and click the [Preview] button to display a Raw File Preview: the first 50 rows of the selected file, unformatted.

Type

Clario supports delimited and fixed-length file formats. If delimited is selected, you will need to specify the delimiter and, optionally, the enclosure.

Delimiter

Select the attribute delimiter from the drop-down list. The available delimiters are comma, semicolon, pipe, double pipe, tab, and space.

Enclosure

If the selected delimited file uses enclosures, the enclosure type (single quote or double quote) must be specified. Leave enclosure blank if none is present.

Header Rows

For either delimited or fixed-length data files, if there are header rows present in the file, click the check box and specify the number of rows containing the header. The header rows will be excluded from any further data processing, but will be used by the Attribute Guesser on the Define Attributes tab.

# Rows to Read From File

To read a subset of rows from the beginning of the selected data file, uncheck [All] and specify a value in [# Rows to Read From File]. Leave the [All] box checked to read all rows in the file.
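The interaction between Header Rows and # Rows to Read From File can be pictured with the short Python sketch below. It is an illustration only: the function and parameter names are hypothetical, and it assumes the row limit counts data rows after the header, which the text above does not state.

```python
from itertools import islice

def split_rows(lines, header_rows=0, rows_to_read=None):
    """Separate header rows from data rows, optionally limiting the data rows.

    rows_to_read=None plays the role of the [All] check box.
    """
    it = iter(lines)
    header = list(islice(it, header_rows))
    data = list(islice(it, rows_to_read))  # None reads every remaining row
    return header, data

header, data = split_rows(["id,name", "1,Ann", "2,Bob", "3,Cy"],
                          header_rows=1, rows_to_read=2)
```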

Set Parse Errors to Null

By default, if any type mismatch errors are encountered while parsing the data, the workflow run will fail and the errors will be reported in the Run Summary. To set the offending values to null instead, check [Set Parse Errors to Null]. For example, if the defined attribute is numeric but one row contains a string, the value of that attribute on the affected row will be set to NULL, and a count of parse errors will be displayed in the results.
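The behavior can be sketched in Python as follows. This is an analogy rather than Clario's implementation: the function name and the flag are hypothetical, and Python's float() stands in for whatever numeric parsing Clario performs.

```python
def parse_number(value, set_errors_to_null=False):
    """Parse a numeric attribute value, optionally mapping failures to None."""
    try:
        return float(value)
    except ValueError:
        if set_errors_to_null:
            return None  # corresponds to NULL in the output
        raise            # corresponds to the run failing

values = ["10", "abc", "3.5"]
parsed = [parse_number(v, set_errors_to_null=True) for v in values]
parse_error_count = sum(p is None for p in parsed)
```

With the flag off, the second value would raise an error instead of yielding None.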

Discard Malformed Rows

By default, if any delimiter frequency errors (too many or too few delimiters) are encountered while parsing the data, the workflow run will fail and the errors will be reported in the Run Summary. Checking [Discard Malformed Rows] causes the affected rows to be discarded instead, and a count of malformed rows is returned in the results.
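A minimal Python sketch of a delimiter frequency check is shown below. The function name is hypothetical, and the sketch deliberately ignores enclosures (a delimiter inside quotes would be miscounted), so it illustrates the idea rather than Clario's actual parser.

```python
def split_malformed(lines, expected_attributes, delimiter=","):
    """Separate rows with the expected delimiter count from malformed rows."""
    kept, malformed = [], []
    for line in lines:
        # A row with N attributes should contain exactly N - 1 delimiters.
        if line.count(delimiter) == expected_attributes - 1:
            kept.append(line)
        else:
            malformed.append(line)
    return kept, malformed

kept, malformed = split_malformed(["1,2,a", "45Row4", "1,2,3,4"],
                                  expected_attributes=3)
```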

Define Attributes Tab

../../_images/readFile_defineAttributesTab.png

Attribute Name

Attribute Names are required for all attributes. Attribute names may consist only of the following characters: A-Z, a-z, 0-9, ‘-’, and ‘_’.
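If attribute names are generated outside Clario (for example, before importing definitions), the character rule above can be checked with a regular expression. The Python sketch below is one way to do so; the function name is hypothetical.

```python
import re

# Letters, digits, hyphen, and underscore only; at least one character.
VALID_NAME = re.compile(r"[A-Za-z0-9_-]+")

def is_valid_attribute_name(name):
    """Return True if the name uses only the permitted characters."""
    return VALID_NAME.fullmatch(name) is not None
```

For instance, a name containing a space or a period would be rejected by this check.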

Type

A type must be specified for all attributes. Valid types are String, Number, and Date.

Start

Required if a fixed-length file was specified. Specifies the starting position of the attribute. Each data row starts in position 1.

Length

Required if a fixed-length file was specified. Specifies the length to read from the attribute’s starting position.
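To show how Start and Length work together with 1-based positions, here is a small Python sketch. The layout, attribute names, and sample row are made up for illustration; only the position arithmetic reflects the description above.

```python
def read_fixed(line, layout):
    """Slice a fixed-length row.

    layout is a list of (name, start, length) tuples, where start is
    1-based, matching the convention that each row starts in position 1.
    """
    return {
        name: line[start - 1:start - 1 + length]
        for name, start, length in layout
    }

# Hypothetical layout: id in positions 1-4, name in 5-14, amount in 15-20.
layout = [("id", 1, 4), ("name", 5, 10), ("amount", 15, 6)]
row = read_fixed("0042Jane Doe  001250", layout)
```

Note that padding characters inside a field are kept as part of the value in this sketch.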

Format

Format is only required for Date Type attributes. Valid formats are listed in a drop-down box. If the Date Format is not listed, you can read the attribute in as a string or number and then transform it into the desired format using the Transform node.

New Attribute

../../_images/readFile_newAttribute.png

To create a new attribute, click [New Attribute] in the top right corner. Specify a name and type for the attribute, and a format if the type is Date.

Attribute Guesser

If you specified a delimited file on the Define File tab, there is an Attribute Guesser button in the lower left corner. Clicking this button uses the Define File configuration to guess the Name, Type and, in the case of Date Types, Format of each attribute. If the [Header Rows] box is checked on the Define File tab, the data in the first header row is used to fill in the Attribute Names. If the [Header Rows] box is not checked, the Attribute Names default to attribute1, attribute2, etc. The first 100 data rows (after the specified number of header rows) are examined to determine the attribute Type and Format. If more than one file is read in, Attribute Guesser uses the first file to define the attribute names and types.

If [Attribute Guesser] has been pressed and invalid attribute names appear, they are highlighted in red to indicate an error. To fix the error, double-click the affected attribute to open the Edit Attribute dialog, then rename the attribute using valid characters.
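The general idea of guessing names and types from a sample of rows can be sketched in Python as below. This is not Clario's algorithm: the candidate date formats, the order in which types are tried, and all function names are assumptions chosen for the example. Only the use of the first header row for names, the attribute1, attribute2 defaults, and the 100-row sample come from the description above.

```python
import csv
import io
from datetime import datetime

DATE_FORMATS = ["%Y-%m-%d", "%m/%d/%Y"]  # hypothetical candidate formats

def _is_number(value):
    try:
        float(value)
        return True
    except ValueError:
        return False

def _date_format(values):
    """Return the first candidate format that fits every value, else None."""
    for fmt in DATE_FORMATS:
        try:
            for value in values:
                datetime.strptime(value, fmt)
            return fmt
        except ValueError:
            continue
    return None

def guess_attributes(text, header_rows=0, sample_size=100):
    """Guess (name, type, date_format) for each column of comma-delimited text."""
    rows = list(csv.reader(io.StringIO(text)))
    header = rows[:header_rows]
    data = rows[header_rows:header_rows + sample_size]
    width = len(data[0])
    names = header[0] if header else [f"attribute{i + 1}" for i in range(width)]
    guesses = []
    for i, name in enumerate(names):
        column = [row[i] for row in data]
        if all(_is_number(v) for v in column):
            guesses.append((name, "Number", None))
            continue
        fmt = _date_format(column)
        if fmt is not None:
            guesses.append((name, "Date", fmt))
        else:
            guesses.append((name, "String", None))
    return guesses

sample = "id,joined,city\n1,2021-03-04,Oslo\n2,2021-04-05,Lima\n"
guesses = guess_attributes(sample, header_rows=1)
```

A sample-based guess like this can be wrong when later rows differ from the sampled ones, which is one reason to check the result in the Formatted File Preview.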

Import

The Read File node allows users to import attribute definitions directly from a comma-separated file via the [Import] button. After the file has been set up in the Define File tab, click the Define Attributes tab and then click the [Import] button. In the Import dialog, enter the following values depending on the file type:

  • Fixed Length Format: NAME, TYPE, START_POSITION, LENGTH, DATE_FORMAT
  • Delimited Format: NAME, TYPE, DATE_FORMAT

After clicking [Save] in the dialog, you return to the Define Attributes tab, which is now populated with the imported definitions.
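When preparing definitions for import, it can help to check them against the two column layouts listed above. The Python sketch below does this outside Clario. The function name, the sample attribute names, and the date format string are hypothetical, and the sketch assumes that a trailing DATE_FORMAT may be left off for non-date attributes and that values contain no commas.

```python
FIXED_FIELDS = ["NAME", "TYPE", "START_POSITION", "LENGTH", "DATE_FORMAT"]
DELIMITED_FIELDS = ["NAME", "TYPE", "DATE_FORMAT"]

def parse_definitions(text, fixed_length):
    """Map each comma-separated definition line to a field dictionary."""
    fields = FIXED_FIELDS if fixed_length else DELIMITED_FIELDS
    definitions = []
    for line in text.splitlines():
        if not line.strip():
            continue
        values = [v.strip() for v in line.split(",")]
        if len(values) > len(fields):
            raise ValueError(f"too many values: {line!r}")
        # Assumption: missing trailing values (e.g. DATE_FORMAT) are empty.
        values += [""] * (len(fields) - len(values))
        definitions.append(dict(zip(fields, values)))
    return definitions

# Hypothetical definitions for a delimited file.
defs = parse_definitions("order_id, Number\norder_date, Date, YYYY-MM-DD",
                         fixed_length=False)
```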

Export

Attribute definitions can be exported to a spreadsheet by clicking [Export]. You will be prompted to enter a filename before downloading.

Formatted File Preview

Clicking [Formatted File Preview] displays the first 50 rows of data in a grid using the configuration from both the Define File and Define Attributes tabs. This is useful for checking that you have properly defined all attributes. If more than one file is defined, you can choose which file to preview in the Select a File dialog. Formatted File Preview is also valuable for visualizing two types of formatting errors:

  • <MALFORMED>: displayed when too many or too few delimiters are found on a row
  • <ERROR>: displayed when the Attribute Type is defined incorrectly (e.g. Date or String instead of Number, or Number or String instead of Date)

Modifying an Attribute

To modify an attribute, double-click the row you wish to change and make the desired modifications in the resulting dialog; click [Save] to accept the changes, or [Cancel] to discard them.

Results

Read File node results are produced only if there is a warning. Warnings are triggered when malformed rows or parse errors are encountered and the [Set Parse Errors to Null] or [Discard Malformed Rows] box is checked in the Define File tab. Read File Results display the number of rows processed, header rows, malformed rows, and output rows. Also displayed are the attributes that caused parse errors, along with a parse error frequency count for each attribute.

../../_images/readFile_warningResults.png

Read File Warning Results

Output Streams

Read File has two output streams you can connect to.

Read Rows

The “Read Rows” output stream receives parsed rows with attributes and types displayed in the Define Attributes tab.

Error Rows

The “Error Rows” output stream receives the following attributes for each line of the incoming file that could not be read:

  • Line_Number: Line number of the incoming file where the error occurred.
  • Parse_Errors: Count of attributes that caused a parse error, or zero if the error was a malformed row.
  • Line_Text: Text of the line that caused the error.

Example

With a three-column comma-delimited data file like:

RowNo,LineNo,Description
1,2,Row 1
X,3,Row 2
3,4,Row 3
45Row4
Z,Z,Row 5

The “Error Rows” output will be:

Line_Number  Parse_Errors  Line_Text
NUMBER       NUMBER        STRING
3            1             X,3,Row 2
5            0             45Row4
6            2             Z,Z,Row 5
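The example above can be reproduced with a short Python sketch. It is a simplified model, not Clario's parser: the function names are hypothetical, enclosures are ignored, and it assumes RowNo and LineNo are defined as Number and Description as String, which the example implies but does not state.

```python
def _is_number(value):
    try:
        float(value)
        return True
    except ValueError:
        return False

def error_rows(lines, types, header_rows=1, delimiter=","):
    """Return (Line_Number, Parse_Errors, Line_Text) for each unreadable line."""
    errors = []
    # Line numbers are physical line numbers, so the header counts as line 1.
    for line_number, text in enumerate(lines, start=1):
        if line_number <= header_rows:
            continue
        values = text.split(delimiter)
        if len(values) != len(types):
            errors.append((line_number, 0, text))  # malformed row
            continue
        parse_errors = sum(
            1 for value, type_ in zip(values, types)
            if type_ == "Number" and not _is_number(value)
        )
        if parse_errors:
            errors.append((line_number, parse_errors, text))
    return errors

lines = [
    "RowNo,LineNo,Description",
    "1,2,Row 1",
    "X,3,Row 2",
    "3,4,Row 3",
    "45Row4",
    "Z,Z,Row 5",
]
result = error_rows(lines, ["Number", "Number", "String"])
```

Under these assumptions, result holds the same three rows as the table above.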

Notes

  • Since “Line_Text” is the unparsed text, it will contain delimiters and enclosures from the incoming file. Therefore, if you connect a Write File node, you’ll want to specify a different delimiter and/or enclosure to avoid Write File output errors.
  • Line_Number is the physical line number in the source file, which includes any header rows.