How to Understand Text Labels

You may have seen text labels mentioned in an analysis warning such as this one:
STATISTICA text labels warning dialog box
Or you may have encountered unexpected results in an analysis or graph such as seen here:
STATISTICA histogram
In STATISTICA, text data can be stored either as text only or with text labels. When text labels are used, each unique text string entered in a spreadsheet is assigned a numeric code, so each value is stored both as a number, which is hidden, and as the text we see. This article discusses what text labels are, some of their benefits, and common questions associated with text data and the use of text labels.
What Are Text Labels?
Create a new spreadsheet and enter some text in the first cell. When you press ENTER, you are prompted to designate how this text should be treated. The last two options are to Enable Text Labels and Convert to a text only variable.
STATISTICA variable text values
Select the Enable Text Labels option button, and click OK.
Select the Data tab, and in the Variables group, click Text Labels to display the Text Labels Editor for variable 1. Here we can view the numeric associations for the text entered.
STATISTICA text labels editor
In the global options of STATISTICA (Options dialog box, Navigation/Defaults tab), you can customize the start point for numeric associations for text labels. By default, 101 is the start point. So, while in the spreadsheet we see the text Apple, this cell is also associated with the number 101.
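The text-to-number association can be pictured as a small lookup table. The Python sketch below only illustrates the idea and is not STATISTICA's internal implementation. The class name TextLabelVariable and its methods are hypothetical, the start code of 101 is the default described above, and the sketch assumes that each new text string simply receives the next integer.

```python
class TextLabelVariable:
    """Toy model of a variable that stores numeric codes and displays text labels."""

    def __init__(self, start_code=101):
        self.next_code = start_code  # code given to the next new text string
        self.labels = {}             # code -> text label
        self.codes = []              # stored cell values (always numeric)

    def enter(self, value):
        """Store one cell. Text is stored as a code; numbers are stored as-is."""
        if isinstance(value, str):
            # Reuse the existing code if this text was entered before.
            for code, text in self.labels.items():
                if text == value:
                    self.codes.append(code)
                    return
            # Assumption: new text strings get consecutive codes.
            code = self.next_code
            self.labels[code] = value
            self.next_code += 1
            self.codes.append(code)
        else:
            self.codes.append(value)

    def display(self):
        """What the spreadsheet shows: the label if one exists, else the number."""
        return [self.labels.get(code, code) for code in self.codes]


fruit = TextLabelVariable()
for cell in ["Apple", "Banana", "Apple"]:
    fruit.enter(cell)

print(fruit.codes)      # [101, 102, 101]
print(fruit.display())  # ['Apple', 'Banana', 'Apple']
```

In this model, the cell that shows Apple holds the number 101, which is the value any numeric analysis would use.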
Some of the benefits to this text-to-number association are:
  1. Ordinal data can be represented by numbers that show their order as well as text values that have more meaning. For example, High, Medium, and Low can be stored as 3, 2, and 1, respectively. Their natural order is preserved, and the more descriptive text is still displayed. The variable with text labels can be analyzed either as categorical or continuous.
    STATISTICA mean plot
  2. Easy data entry. The numeric associations can easily be modified and then serve as a shortcut in data entry. For example, typing 1 into a cell automatically displays Low, the text label associated with that number.
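Both benefits can be illustrated with a short Python sketch. It is not STATISTICA code. The label table and the sample values are made up for illustration, and the sketch assumes the codes 1, 2, and 3 from the example above.

```python
from collections import Counter
from statistics import mean

# Hypothetical label table for an ordinal variable: code -> text label
labels = {1: "Low", 2: "Medium", 3: "High"}

# Data entry shortcut: only the codes are typed in
codes = [1, 3, 2, 3, 3, 3, 1, 2]

# What the spreadsheet would display for those codes
shown = [labels[code] for code in codes]

# Categorical view: count cases per category, listed in their natural order
counts = Counter(shown)
frequency = [(labels[code], counts[labels[code]]) for code in sorted(labels)]

# Continuous view: the codes themselves enter the formula
average = mean(codes)

print(shown)
print(frequency)  # [('Low', 2), ('Medium', 2), ('High', 4)]
print(average)    # 2.25
```

Sorting by code rather than by label text is what keeps Low, Medium, High in their natural order. An alphabetical sort of the text would give High, Low, Medium.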
Common Questions Associated with Text Labels
Following are answers to common questions from STATISTICA users when text labels are employed in their data.
  1. When a variable is selected for analysis as a continuous variable (in basic descriptive statistics for example) and that variable has text labels, the following warning dialog box is displayed.
    STATISTICA text labels warning dialog box
    This does not mean the analysis can’t proceed. It simply brings to your attention the fact that the analysis you are about to perform may be suspect. Consider the previous example where 1 to 3 represent low to high. We can compute a mean, standard deviation, etc., on these data because the numbers 1 to 3 are used in the mathematical formulas. This warning dialog box prompts you to consider whether this analysis makes sense with the data you have selected. If so, select the option to continue. If not, you can further explore the variables containing text labels with the Scan Spreadsheet option.
  2. In numeric data, suppose you inadvertently typed in some text, or on import, perhaps the row of variable names was incorrectly read in as the first row of data. Now, a text label and number combination is used in this column. Deleting the offending case is only one step in fixing this issue. The text label, although no longer used, is still there, and it will cause the warning dialog box above to be displayed in analysis. The software does not know that the text label was a mistake. Using the Text Labels Editor, the unwanted text label can be removed.
  3. Another potential problem stemming from accidental text labels is unexpected text popping up in your numeric data. Because of a data entry or import error, a number is assigned a text label. Now, when that number naturally occurs in the data, the number is hidden by the unwanted text label. The root cause of the issue and the fix are the same, but the symptoms are different.
  4. One final possible symptom is unexpected values in graphs and analyses.
    STATISTICA histogram
    This plot shows what happens when numeric data, on a scale of 0 to 1, are plotted in a histogram, but one case has an unexpected text label. The numeric value associated with the text in this graph is the default 101. The data look skewed because a very extreme outlier is present. This is simply a data entry error that is masked by text labels. Using the Text Labels Editor, you can further explore this error.
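The symptoms described in the answers above can be reproduced outside STATISTICA with a small Python sketch. All values here are hypothetical: a numeric column on a 0-to-1 scale, one cell typed as text by mistake and therefore stored as the default code 101, and a leftover label from a deleted header row. The sketch only mimics the checks you would make by hand in the Text Labels Editor.

```python
from statistics import mean

# Hypothetical label table: code -> text label.
# 101 belongs to a cell typed as text by mistake.
# 102 is a leftover from a header row that has since been deleted.
labels = {101: "n/a", 102: "Score"}

# Stored values of the column (the label code is hidden behind its text)
codes = [0.12, 0.48, 0.33, 101, 0.91, 0.27]

used = set(codes)

# Labels whose code no longer occurs in the data; they still trigger the warning
unused = {code: text for code, text in labels.items() if code not in used}

# Cells that display text instead of a number: (row index, code, label)
labelled_rows = [(i, code, labels[code]) for i, code in enumerate(codes) if code in labels]

# Effect on a continuous analysis: the hidden code is used in the formula
with_code = mean(codes)
without_code = mean(code for code in codes if code not in labels)

print(unused)         # {102: 'Score'}
print(labelled_rows)  # [(3, 101, 'n/a')]
print(with_code, without_code)
```

With the stray code included, the mean lands far outside the 0-to-1 range of the real data, which is the same distortion the histogram shows.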
When properly understood and used correctly, text labels are a good tool for data storage. This understanding of the way text labels work can help all STATISTICA users to improve their data integrity.

About statsoftsa

StatSoft, Inc. was founded in 1984 and is now one of the largest global providers of analytic software worldwide. StatSoft is also the largest manufacturer of enterprise-wide quality control and improvement software systems in the world, and the only company capable of supporting its QC products worldwide, with wholly owned subsidiaries in all major markets (StatSoft has 23 full-service offices, on all continents), and its software is available in more than 10 languages.

Posted on September 3, 2013, in Uncategorized.
