What is a Data File


The ChaosHunter reads text files (sometimes called ASCII files). These are files that contain data in a readable form where numbers are separated by the list separator character designated by the Windows Control Panel (usually a comma in the United States). Often (but not always) the file extensions of .TXT, .CSV, or .PRN denote that those files have been saved in text format. Any word processor or spreadsheet can read such files.

Many programs save files in these formats, including spreadsheets. For example, you can save an Excel spreadsheet by selecting Save As from the File Menu and choosing a .CSV (comma separated) file type from the list of displayed options. Microsoft Word will allow you to save the file as a .TXT (text) file type.

ChaosHunter also reads NeuroShell Trader files saved with the .WSG file extension. These are usually stored in the c:\NeuroShell Trader data directory or in the Servers folder data directory. .WSG files are not readable in a word processor or spreadsheet program.

What Do the Files Look Like?
The file may include a single row of label information on the first line. The data rows of numeric values should begin immediately below the label row on the second line. Each row should contain a set of inputs and the output you are trying to predict. Refer to Select Inputs and an Output for a description of inputs and outputs.

For example, you might have data used to predict the pH of a mixture. The inputs would be the amount of each chemical in the mixture. The output would be the pH of the mixture.

The file in the U.S. would look like the following comma separated file (.CSV):

Caustic soda flow,Organic acid flow,Temperature,pH
56.7,9.4,101,6.2
62.3,3.3,115,9.8
59.6,12.1,143,2.9
60.6,5.4,117,9.5
58.2,7.6,153,8.1
46.3,10.2,89,5.8
70.1,13.7,114,6.5
73.7,6.9,109,9.2
63,11.6,106,6.2
67.4,13.2,119,5.8
64.3,11.6,121,6.1
etc.

In some European countries where the list separator is a semicolon (;) and the decimal symbol is a comma (,), the file would look like this:

Caustic soda flow;Organic acid flow;Temperature;pH
56,7;9,4;101;6,2
62,3;3,3;115;9,8
59,6;12,1;143;2,9
60,6;5,4;117;9,5
58,2;7,6;153;8,1
46,3;10,2;89;5,8
70,1;13,7;114;6,5
73,7;6,9;109;9,2
63;11,6;106;6,2
67,4;13,2;119;5,8
64,3;11,6;121;6,1
etc.
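The two layouts above hold the same numbers; only the list separator and the decimal symbol differ. The following Python sketch illustrates that relationship. It is not part of ChaosHunter: the function name parse_row and its parameters are hypothetical, and the separator and decimal symbol are passed in explicitly rather than read from the Windows Control Panel.

```python
def parse_row(line, list_sep=";", decimal_sym=","):
    """Split one data row and convert each field to a float.

    list_sep and decimal_sym stand in for the Windows regional settings;
    this sketch takes them as arguments instead of querying the system.
    """
    fields = line.strip().split(list_sep)
    # Python's float() expects a period, so swap the decimal symbol first.
    return [float(field.replace(decimal_sym, ".")) for field in fields]


# The same data row written with European and with U.S. settings.
european = parse_row("56,7;9,4;101;6,2")
american = parse_row("56.7,9.4,101,6.2", list_sep=",", decimal_sym=".")
print(european == american)
```

Both calls yield the same list of four numbers, which is why a file must match the regional settings in effect when it is read.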

Your file may contain many more columns of data than you intend to use in your formula, because you can choose which columns will be inputs and which will be the output. Refer to Select Inputs and an Output for more information on how to select columns.

File Specifications
The inputs and output are called fields in the file. Each field should be separated by the list separator character. If the list separator character is not found, the program will test to see if a tab was used as a delimiter character. If not, the program will test to see if the space character was used as a delimiter. No matter which of the above characters was used, the program assumes number formats are as specified in the Windows Control Panel. This means that the decimal symbol and digit-grouping symbol will be used to convert the numbers in the fields. Note that the digit-grouping symbol cannot be used if it is the same as the list separator (e.g., a comma in the United States).

If the program determines that the fields in the file are not delimited by the list separator, the tab, or the space character, and if commas are found in the file, then the file is assumed to be a comma separated file with a period as a decimal symbol. This feature enables users in countries other than the U.S. to read the U.S. example files we provide, even if their Windows Control Panel has non-U.S. settings in it. The program provides a warning message when it makes this assumption.
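The order of the delimiter tests described above can be sketched in Python as follows. This is only an illustration of the documented order, not ChaosHunter's actual code: the function name detect_format is hypothetical, the regional settings are passed in as arguments, and the sketch assumes it is given a single data row (not the label row, which may itself contain spaces).

```python
def detect_format(data_line, list_sep, decimal_sym):
    """Return (delimiter, decimal_symbol, warn) for one data row.

    list_sep and decimal_sym represent the Windows Control Panel settings.
    warn is True only when the U.S. comma/period fallback is assumed.
    """
    # 1. list separator, 2. tab, 3. space: number formats follow the
    #    regional settings in all three cases.
    for delim in (list_sep, "\t", " "):
        if delim in data_line:
            return delim, decimal_sym, False
    # 4. Fallback: commas present, so assume a U.S. style file with a
    #    period as the decimal symbol and flag a warning.
    if "," in data_line:
        return ",", ".", True
    raise ValueError("no recognized delimiter found")


# European settings reading a European file: no warning.
print(detect_format("56,7;9,4;101;6,2", list_sep=";", decimal_sym=","))
# European settings reading a U.S. example file: fallback with warning.
print(detect_format("56.7,9.4,101,6.2", list_sep=";", decimal_sym=","))
```

The sketch checks only one row for brevity; how many rows the program itself inspects is not stated here.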

The fields can be in any order, as long as they are in the same order when you apply the trained formula. It will be possible to choose which columns will be inputs and which will be the output.

The first row of the file may contain unique column names (labels) which describe the input variables and the output. A label row is not required, however. If the file is space separated and a label contains a space, the label should be enclosed in double quotes, e.g., "daily price".
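To see why the quotes matter in a space separated file, the following Python sketch splits a label row with the standard csv module. The labels other than "daily price" are made up for illustration, and the sketch does not claim to reproduce how ChaosHunter itself parses labels.

```python
import csv

# A hypothetical label row from a space separated file.
label_row = '"daily price" volume close'

# Without quote handling, the quoted label is split at its inner space.
naive = label_row.split(" ")

# csv.reader treats the double-quoted text as a single field.
quoted = next(csv.reader([label_row], delimiter=" ", quotechar='"'))

print(naive)
print(quoted)
```

The naive split produces four pieces, while the quote-aware reader returns the three intended labels.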

Any column used as an input or an output must contain numbers in its data rows. The data rows should begin immediately below the label row, if one exists.

Each row should contain a set of input values and the output value you are trying to predict, if the file is to be used to create the formula. If you are applying the formula to a new data file, it only needs to have input values, which should be in the same order as the inputs were in the training data. If output values are included, the program will make comparisons for you of the predicted values and the actual values.

If you want to include a date column and also include the time, the date and time must be in the same field or column.

The number of input columns in the data file is not limited; however, the equation limits the number of inputs when you set the max equation size. If the file contains many data columns that you are not going to use, processing will take much longer.

The number of output columns is limited to 1.

The number of rows is limited only by the operating system. The limit does not depend on how much RAM is installed or on the size of your page file; rather, the operating system limits how much memory each application may use for all of its resources (graphic objects, data arrays, etc.).

The file cannot have any extraneous data in it, e.g., total or summary information at the bottom of the columns. Refer to Extraneous Data for details.
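One way to check a file for extraneous rows before loading it is sketched below in Python. This is a hypothetical pre-check, not a ChaosHunter feature: the function names are invented, and the sketch assumes a comma list separator and a period decimal symbol.

```python
def is_number_or_blank(field):
    """True if the field is empty (missing data) or parses as a number."""
    if not field.strip():
        return True
    try:
        float(field)
        return True
    except ValueError:
        return False


def find_extraneous_rows(lines, delim=",", has_labels=True):
    """Return 1-based line numbers of rows that do not look like data rows."""
    bad = []
    expected = None
    for number, line in enumerate(lines, start=1):
        fields = line.strip().split(delim)
        if expected is None:
            # The first row fixes the expected number of fields.
            expected = len(fields)
            if has_labels:
                continue
        if len(fields) != expected or not all(map(is_number_or_blank, fields)):
            bad.append(number)
    return bad


sample = ["Flow,Temperature,pH", "56.7,101,6.2", "62.3,,9.8", "Total,216,16.0"]
print(find_extraneous_rows(sample))
```

Here the summary row at the bottom is reported, while the row with a missing value (two consecutive separators) is accepted.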

ChaosHunter allows you to work with files that are missing data, either by skipping those rows or by substituting an average value. (See Select Inputs and an Output for details.) Text files should indicate missing data with two consecutive list separator characters, e.g., two commas (,,). Space delimited files cannot have any missing data, since multiple spaces are treated as a single delimiter.
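The two ways of handling missing data can be illustrated with a short Python sketch. It is not ChaosHunter's implementation: the function name fill_or_skip is hypothetical, and the sketch assumes that "an average value" means the mean of the values present in the same column, which may differ from what the program actually computes.

```python
def fill_or_skip(rows, strategy="skip"):
    """Handle missing fields in rows of strings (empty string = missing).

    strategy="skip" drops any row with a missing field;
    strategy="average" replaces each gap with its column mean (assumed).
    """
    parsed = [[float(f) if f.strip() else None for f in row] for row in rows]
    if strategy == "skip":
        return [row for row in parsed if None not in row]
    means = []
    for col in range(len(parsed[0])):
        present = [row[col] for row in parsed if row[col] is not None]
        # Assumes every column has at least one value present.
        means.append(sum(present) / len(present))
    return [[means[c] if v is None else v for c, v in enumerate(row)]
            for row in parsed]


# "3,,9" splits into ["3", "", "9"]: the doubled separator marks the gap.
rows = [line.split(",") for line in ["1,2,7", "3,,9", "5,6,8"]]
print(fill_or_skip(rows, "skip"))
print(fill_or_skip(rows, "average"))
```

Skipping keeps only the complete rows, while averaging keeps every row and fills the gap from the other values in that column.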

If you wish to create your formula using the beginning rows of data in a file and apply the formula to the last rows, which do not include the value you are trying to predict, make sure the last data rows have the correct number of list separator characters. That is, if your final rows do not contain the output values, you need to indicate with two delimiters that the output is missing.

Refer to What is Training Data? for an explanation of what types of data should be included in your training file.

Refer to File Names for details on file naming conventions.

Refer to Exporting Data from NeuroShell Trader for details on how to export data and indicator values from NeuroShell Trader.

Refer to Date/Time Numbering on the X Axis in a Graph for details on displaying the Date and/or Time on the Optimization or Results Screen.