What does it mean to not have enough codes?

 

In your statistical analysis adventures with STATISTICA, you may run into some error messages along the way. One such type of error tells you that a variable does not have enough codes. Let’s explore more of what this error is telling us and how to fix the data.

The Error

In statistical modeling tools such as General Linear Models, and in data mining tree algorithms such as C&RT, CHAID, Boosted Trees, and Random Forests, you get an error when a categorical predictor variable has fewer than two distinct levels. The message says: “Not enough codes selected for variable : The required minimum number of codes is 2.”

For Neural Networks and machine learning tools, the message is a bit wordier but tells us the same thing. It says: “STATISTICA has detected an insufficient number of nominal levels in the categorical independent variable . All categorical variables must contain at least two nominal values. Please carefully check your data and case selection conditions and try again.”
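The rule behind both messages can be sketched outside STATISTICA. The Python snippet below is only an illustration of the "at least two distinct levels" check, not STATISTICA's own code. The function names `count_levels` and `has_enough_codes`, the sample columns, and the convention that `None` or an empty string marks a missing cell are all assumptions made for this example.

```python
def count_levels(values):
    """Count distinct non-missing levels in a categorical column.

    Assumption for this sketch: None and "" stand for missing cells.
    """
    return len({v for v in values if v is not None and v != ""})


def has_enough_codes(values, minimum=2):
    """Return True when the column has at least `minimum` distinct levels."""
    return count_levels(values) >= minimum


# Hypothetical columns, invented for illustration.
gender = ["M", "M", "M", None]   # only one level present
site = ["A", "B", "A", "C"]      # three levels present

print(has_enough_codes(gender))  # False
print(has_enough_codes(site))    # True
```

A column like `gender` above is the kind of predictor that would trigger the error, because missing cells do not count as a level.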

The Fix

Here are some possible causes and solutions for this type of error message. I reserve the right to add to my list, if I think of something else.

  1. All records in the sample have the same response for this variable. Say the population you are studying is male subjects. You don’t need a gender variable in the model because all participants in your study are male. This gender variable should not be (and can’t be) used for analysis.
  2. Records are missing from our dataset. Say we want to compare several hospitals on various metrics. Each hospital has its own data set, so any single file contains only one level of the hospital variable. These data files need to be merged before starting the analysis.
  3. Data was recorded as 1 if present and left blank otherwise. When this is the case, those blanks should be filled in with a value. When STATISTICA encounters a blank cell, it treats the cell as unknown (missing). In this case, the value is not unknown; it is a unique group level. So to fix this, you might use the Process Missing Data tool to replace missing cells with a value. Or a spreadsheet function like “=iif(ismd(vcur), 0, vcur)” will leave existing data unchanged and fill in missing cells with 0.
  4. Case selection conditions have excluded some cases, and now there is only 1 unique level to this variable. To solve this issue, either the case selection conditions need to be changed or the predictor variable should not be included in the analysis.
  5. Missing data in another variable has excluded some cases, leaving only one unique level of a grouping factor. Missing data should be dealt with before starting an analysis. Model-building tools can’t work with missing data, so records that contain it are thrown out. Here is a video that discusses dealing with missing data.
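Causes 3 and 5 are easy to miss, so here is a rough Python sketch of both. It is not STATISTICA code. The helper names (`fill_missing`, `drop_incomplete_cases`, `distinct_levels`), the sample records, and the use of `None` for a blank cell are all assumptions for illustration. The sketch imitates dropping every record that contains a missing cell, then counts the levels left in each variable.

```python
def fill_missing(rows, column, value=0):
    """Return copies of rows with missing cells in `column` replaced by `value`."""
    return [
        {**row, column: value if row[column] is None else row[column]}
        for row in rows
    ]


def drop_incomplete_cases(rows):
    """Keep only rows that have no missing cells (None) in any column."""
    return [row for row in rows if all(v is not None for v in row.values())]


def distinct_levels(rows, column):
    """Set of distinct non-missing levels of `column`."""
    return {row[column] for row in rows if row[column] is not None}


# Hypothetical data: `flag` was recorded as 1 if present and left blank otherwise.
rows = [
    {"group": "A", "flag": 1},
    {"group": "A", "flag": 1},
    {"group": "B", "flag": None},
    {"group": "B", "flag": None},
]

# Cause 3: the blanks are missing, so `flag` has a single level.
print(distinct_levels(rows, "flag"))                 # {1}

# Cause 5: the blanks in `flag` throw out every "B" record,
# so `group` is left with a single level as well.
complete = drop_incomplete_cases(rows)
print(distinct_levels(complete, "group"))            # {'A'}

# Filling the blanks with 0 restores two levels in both variables.
fixed = fill_missing(rows, "flag", 0)
print(sorted(distinct_levels(fixed, "flag")))        # [0, 1]
print(sorted(distinct_levels(drop_incomplete_cases(fixed), "group")))  # ['A', 'B']
```

Running a check like this on the records that survive case selection and missing-data handling, rather than on the raw file, shows which variable ended up with a single level.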

Now that you understand the cause of your error message, you are ready to get back to blazing your analytic trail.

About statsoftsa

StatSoft, Inc. was founded in 1984 and is now one of the largest providers of analytic software worldwide. StatSoft is also the largest manufacturer of enterprise-wide quality control and improvement software systems in the world, and the only company capable of supporting its QC products worldwide, with wholly owned subsidiaries in all major markets (StatSoft has 23 full-service offices, on all continents). Its software is available in more than 10 languages.

Posted on May 21, 2013, in Uncategorized.