How to Understand Text Labels
You may have seen text labels mentioned in an analysis warning such as this one:
Or you may have encountered unexpected results in an analysis or graph, such as the one shown here:
In STATISTICA, text data can be stored either as text only or with text labels. When text labels are used, each unique text string entered in a spreadsheet variable is assigned a numeric code, so each value is stored both as a hidden number and as the text we see. In this article, we discuss what text labels are, their benefits, and some common questions about text data and the use of text labels.
What Are Text Labels?
Create a new spreadsheet and enter some text in the first cell. When you press ENTER, you are prompted to designate how this text should be treated. The last two options are to Enable Text Labels and Convert to a text only variable.
Select the Enable Text Labels option button, and click OK.
Select the Data tab, and in the Variables group, click Text Labels to display the Text Labels Editor for variable 1. Here we can view the numeric associations for the text entered.
In the global options of STATISTICA (Options dialog box, Navigation/Defaults tab), you can customize the start point for the numeric codes assigned to text labels. By default, the start point is 101. So if the first text entered is Apple, the spreadsheet displays Apple, but the cell is also associated with the number 101.
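The text-to-number association can be modeled with a small lookup table. The sketch below is a hypothetical illustration, not STATISTICA code: the class `TextLabelVariable` is invented here, and it assumes that each new string simply receives the next integer after the start point (101 by default), while repeated strings reuse their existing code.

```python
class TextLabelVariable:
    """Hypothetical model of a spreadsheet variable with text labels enabled.

    Illustrative only; this is not a STATISTICA API. Assumes each new
    string gets the next integer code, beginning at `start`.
    """

    def __init__(self, start=101):
        self.next_code = start
        self.labels = {}   # text -> numeric code
        self.values = []   # the stored (hidden) numbers

    def enter(self, text):
        # Reuse the code of a string seen before; otherwise assign the next code.
        if text not in self.labels:
            self.labels[text] = self.next_code
            self.next_code += 1
        self.values.append(self.labels[text])

    def display(self):
        # Show the text label for each stored number, as the spreadsheet does.
        by_code = {code: text for text, code in self.labels.items()}
        return [by_code.get(v, v) for v in self.values]


fruit = TextLabelVariable()
for item in ["Apple", "Banana", "Apple"]:
    fruit.enter(item)

print(fruit.values)     # [101, 102, 101]
print(fruit.display())  # ['Apple', 'Banana', 'Apple']
```

The stored list holds only numbers; the text appears only when the codes are looked up for display, which is why the same cell can be treated as either text or a number.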
Some of the benefits to this text-to-number association are:
- Ordinal data can be represented by numbers that show their order together with text values that carry more meaning. For example, High, Medium, and Low can be stored as 3, 2, and 1, respectively. Their natural order is preserved, and the more descriptive text is displayed as well. A variable with text labels can be analyzed as either categorical or continuous.
- Easy data entry. The numeric codes can easily be modified and serve as a data-entry shortcut: when typing in the data, you can type 1, and the cell automatically displays Low from the text labels.
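Both benefits can be illustrated with a code-to-label table for the High/Medium/Low example. This is a hypothetical sketch: the names `LEVELS` and `enter_value` are invented for illustration, and the function only mimics the shortcut (typing a code stores the number and displays its label), falling back to the plain number when no label is defined.

```python
# Hypothetical label table for an ordinal variable: numeric code -> text label.
LEVELS = {1: "Low", 2: "Medium", 3: "High"}


def enter_value(typed):
    """Mimic the data-entry shortcut (illustrative sketch, not STATISTICA code):
    the typed number is stored, and its text label is what gets displayed."""
    code = int(typed)
    return code, LEVELS.get(code, str(code))


entries = [enter_value(t) for t in ["1", "3", "2", "3"]]
codes = [code for code, _ in entries]
shown = [label for _, label in entries]

print(shown)  # ['Low', 'High', 'Medium', 'High']
# Because the codes follow the natural order, sorting by code sorts Low < Medium < High.
print([LEVELS[c] for c in sorted(codes)])  # ['Low', 'Medium', 'High', 'High']
```

Sorting by the hidden codes gives the ordinal order, whereas sorting the text alphabetically would put High before Low.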
Common Questions Associated with Text Labels
The following are answers to common questions from STATISTICA users whose data include text labels.
When a variable with text labels is selected for analysis as a continuous variable (in basic descriptive statistics, for example), the following warning dialog box is displayed. This does not mean the analysis cannot proceed; it simply alerts you that the analysis you are about to perform may be suspect. Consider the previous example, where the codes 1 to 3 represent Low to High. We can compute a mean, standard deviation, and so on for these data because the numbers 1 to 3 are used in the mathematical formulas. The warning dialog box prompts you to consider whether this analysis makes sense for the data you have selected. If it does, select the option to continue. If not, you can explore the variables containing text labels further with the Scan Spreadsheet option.
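To see why the analysis can still run, the sketch below computes descriptive statistics directly from the hidden codes of a Low/Medium/High variable. The function `describe` and its warning message are hypothetical stand-ins for the dialog, not STATISTICA's actual behavior or wording; the point is that the formulas operate on the numbers 1 to 3, so whether the result is meaningful is up to you.

```python
import statistics
import warnings

# Hypothetical ordinal data: stored codes with text labels 1=Low, 2=Medium, 3=High.
labels = {1: "Low", 2: "Medium", 3: "High"}
codes = [1, 3, 2, 3, 2, 1, 3]


def describe(values, text_labels=None):
    """Compute mean and sample SD from the stored numbers. If the variable
    carries text labels, emit a warning (an illustrative stand-in for the
    dialog) but still return the results."""
    if text_labels:
        warnings.warn("Variable has text labels; check that treating the codes "
                      "as continuous values makes sense.")
    return statistics.mean(values), statistics.stdev(values)


mean, sd = describe(codes, labels)
print(round(mean, 3), round(sd, 3))
```

Here the mean is 15/7, or roughly 2.14, which can be read as "slightly above Medium" only if the equal spacing of the codes is a reasonable assumption for these data.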
In numeric data, suppose you inadvertently typed in some text, or on import, the row of variable names was incorrectly read in as the first row of data. The column now contains a text label and its numeric code. Deleting the offending case is only one step in fixing this issue: the text label, although no longer used, is still defined. It will cause the warning dialog described above to be displayed in analyses, because the software cannot know that the text label was a mistake. Use the Text Labels Editor to remove the unwanted text label.
Another potential problem stemming from accidental text labels is unexpected text appearing in your numeric data. Because of a data entry or import error, a number is assigned a text label. Now, whenever that number legitimately occurs in the data, it is hidden by the unwanted text label. The root cause and the fix are the same as above; only the symptoms differ.
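The two symptoms above can be reproduced in a small model. In this hypothetical sketch (the data and the `display` helper are invented for illustration), an import error reads the header "Weight" as data, so it receives the default code 101. Deleting that case leaves the label defined, and a genuine value of 101 is then displayed as "Weight" until the label itself is removed, which is the role of the Text Labels Editor.

```python
def display(vals, text_labels):
    # Show a text label wherever a stored number has one; otherwise show the number.
    return [text_labels.get(v, v) for v in vals]


# Hypothetical numeric variable: the header "Weight" was imported as data
# and received the default code 101.
labels = {101: "Weight"}           # code -> text label
values = [101, 98.5, 120.0, 101]   # first case is the bad header row; last is a real 101

# Step 1: delete the offending case. The label is still defined.
values = values[1:]
print(display(values, labels))  # [98.5, 120.0, 'Weight']  (real value masked)

# Step 2: remove the unwanted label, as you would in the Text Labels Editor.
labels.pop(101, None)
print(display(values, labels))  # [98.5, 120.0, 101]
```

Only the second step fixes both symptoms, because the stored numbers were never wrong; the stray label was.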
One final possible symptom is unexpected values in graphs and analyses. This plot shows what happens when numeric data on a scale of 0 to 1 are plotted in a histogram but one case has an unexpected text label. The numeric value associated with that text is the default, 101, so the data look skewed, with a very extreme outlier. This is simply a data entry error masked by a text label; you can explore it further with the Text Labels Editor.
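The effect on summary statistics can be sketched with made-up proportions on a 0 to 1 scale in which one case is stored as the default label code 101. The data and the label text "n/a" are hypothetical. The sketch flags cases whose stored number matches a defined label code, which is one simple way to locate the error before deciding how to correct it.

```python
import statistics

# Hypothetical proportions on a 0-1 scale; one case was entered as text
# (label "n/a", invented for this example), so it is stored as code 101.
labels = {101: "n/a"}
values = [0.12, 0.45, 0.33, 0.87, 101, 0.56]

print(statistics.mean(values))  # dominated by the 101 code
print(max(values))              # 101: the extreme "outlier"

# Flag cases whose stored number is actually a text-label code.
suspect = [i for i, v in enumerate(values) if v in labels]
print(suspect)  # [4]

clean = [v for v in values if v not in labels]
print(round(statistics.mean(clean), 3))  # 0.466
```

Dropping the flagged case is used here only to show the size of the distortion; in practice the right fix is to correct the entry and remove the stray label.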
When properly understood and used, text labels are a useful tool for data storage. Understanding how they work can help all STATISTICA users improve the integrity of their data.