Chapter 12 Frequently Asked Questions
Q. I got better results with the Evolution method than the Swarm method. Why then do you even have both methods?
A. You should always evaluate your formulas on out-of-sample (evaluation) data, not in-sample (training) data. Nevertheless, we don't think either method is always better than the other, which is why we have two. Sometimes one is better; sometimes the other is better.

Q. Do I have to normalize all of my variables before I feed them to ChaosHunter?
A. If by "normalize" you mean scaling all the inputs into the same range, as is often done in statistics (e.g. Z-score), the answer is no, because there is an option in ChaosHunter that can do that for you automatically.

Q. Can you give an example of not coding inputs with monotonic values unless they represent monotonic concepts?
A. Suppose you want an input value that is the code for which of your company’s manufacturing plants made a particular part. The plants are East Coast, Southern, Northern, and West Coast. You might be tempted to make this your input:

Q. In view of the question above about monotonic inputs, can I use a person's zip code as an input? Certainly I can't use one input that is 0 or 1 for each zip area.
A. Of course you can't use thousands of inputs. But you can't use the zip code number either. Think about what you are trying to represent with the zip code. Suppose it is probable economic status, since you know some zip codes cover wealthy areas and others represent poverty areas. Translate your zip codes into a variable such as this:

Q. I get great training results but when I feed in some out-of-sample patterns the results are pretty poor. Have any ideas why? I have 40 inputs and 15 training patterns.
A. Yes, we do. When you use too many inputs and too few training patterns, you are really asking for over-fitting trouble! Mathematically, if you had as few as 15 inputs you could exactly fit the data even with a linear model.
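The exact-fit claim can be checked with a small experiment. The sketch below is plain Python and has nothing to do with ChaosHunter's own algorithms; every name in it (`solve_linear_system`, `mean_abs_error`, and so on) is made up for the illustration. It fits a linear model with 15 inputs to 15 training patterns made of pure random noise, then scores the same weights on 15 fresh patterns.

```python
import random


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augment the matrix so row operations update both sides together.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Use the row with the largest pivot to limit rounding error.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back-substitute from the last row upward.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        known = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - known) / m[r][r]
    return x


def predict(weights, inputs):
    """Linear model without an intercept: a weighted sum of the inputs."""
    return sum(w * v for w, v in zip(weights, inputs))


def mean_abs_error(weights, patterns, targets):
    errors = [abs(predict(weights, p) - t) for p, t in zip(patterns, targets)]
    return sum(errors) / len(errors)


rng = random.Random(0)
n_inputs = n_patterns = 15

# Pure noise: the targets have no relationship to the inputs at all.
train_x = [[rng.uniform(-1, 1) for _ in range(n_inputs)] for _ in range(n_patterns)]
train_y = [rng.uniform(-1, 1) for _ in range(n_patterns)]
test_x = [[rng.uniform(-1, 1) for _ in range(n_inputs)] for _ in range(n_patterns)]
test_y = [rng.uniform(-1, 1) for _ in range(n_patterns)]

# With as many weights as patterns, the fit passes through every training target.
weights = solve_linear_system(train_x, train_y)
train_error = mean_abs_error(weights, train_x, train_y)
test_error = mean_abs_error(weights, test_x, test_y)

print(f"training error:      {train_error:.2e}")
print(f"out-of-sample error: {test_error:.2e}")
```

The training error should come out at the level of floating-point rounding even though there is nothing real to learn, while the out-of-sample error stays large. That is the same symptom described in the question, produced by a model far simpler than a non-linear formula.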
The old statistical rule of thumb ("use AT LEAST 10 times as many patterns as you have inputs") works as a minimum for our non-linear models. More is better, to a point.

Q. What is the point where more is not better?
A. If you use so many patterns that you have clusters of the same or close patterns, then you aren't adding anything, and large clusters may be biasing the formula.

Q. Is there a way to know if over-training has occurred?
A. The only way to know this for sure is to test on out-of-sample data.

Q. What do I do if some of my data fields are missing?
A. On the Select Inputs screen, you may click the appropriate radio button to have ChaosHunter either skip rows in your data file that contain blank or missing values, or include those rows by replacing each missing value with the average value of its column.
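If you would rather prepare the data file yourself, the two choices described in the last answer can be sketched in a few lines of Python. This is only an illustration of the idea, not ChaosHunter's implementation: the function names are hypothetical, missing values are represented by `None`, and the column average is assumed to be taken over the values that are present.

```python
def skip_incomplete_rows(rows):
    """Keep only the rows that have a value in every column."""
    return [row for row in rows if all(value is not None for value in row)]


def fill_with_column_means(rows):
    """Replace each missing value with the mean of the known values in its column.

    Assumes every column has at least one known value.
    """
    n_cols = len(rows[0])
    means = []
    for col in range(n_cols):
        known = [row[col] for row in rows if row[col] is not None]
        means.append(sum(known) / len(known))
    return [
        [means[col] if value is None else value for col, value in enumerate(row)]
        for row in rows
    ]


# Four patterns with two columns; None marks a blank field.
rows = [
    [1.0, 10.0],
    [None, 20.0],
    [3.0, None],
    [5.0, 30.0],
]

skipped = skip_incomplete_rows(rows)
filled = fill_with_column_means(rows)

print(skipped)
print(filled)
```

Skipping rows keeps only real measurements but shrinks the number of patterns, which matters when patterns are already scarce relative to inputs. Filling with averages keeps every row at the cost of inserting values that were never observed.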