Monthly Archives: September 2013

How to Understand Text Labels

You may have seen text labels mentioned in an analysis warning such as this one:
STATISTICA text labels warning dialog box
Or you may have encountered unexpected results in an analysis or graph such as seen here:
STATISTICA histogram
In STATISTICA, text data can be stored either as text or with text labels. When text data are entered in a spreadsheet, each unique text string can be assigned a numeric code. The data are then stored both as a number, which is hidden, and as the text we see. In this article, we will discuss what text labels are, along with some of the benefits and common questions associated with text data and the use of text labels.
What Are Text Labels?
Create a new spreadsheet and enter some text in the first cell. When you press ENTER, you are prompted to designate how this text should be treated. The last two options are to Enable Text Labels and Convert to a text only variable.
STATISTICA variable text values
Select the Enable Text Labels option button, and click OK.
Select the Data tab, and in the Variables group, click Text Labels to display the Text Labels Editor for variable 1. Here we can view the numeric associations for the text entered.
STATISTICA text labels editor
In the global options of STATISTICA (Options dialog box, Navigation/Defaults tab), you can customize the start point for numeric associations for text labels. By default, 101 is the start point. So, while in the spreadsheet we see the text Apple, this cell is also associated with the number 101.
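The text-to-number association described above can be pictured with a small Python sketch. This is not STATISTICA code or its API; the function name `assign_text_labels` is hypothetical, and the sketch assumes that each new unique string simply receives the next integer after the start point of 101.

```python
def assign_text_labels(values, start=101):
    """Map each unique text string to a numeric code.

    Assumption for illustration: codes are handed out sequentially,
    in order of first appearance, beginning at `start`.
    """
    labels = {}   # text -> numeric code (what a label editor would list)
    codes = []    # the hidden numeric value stored for each cell
    for text in values:
        if text not in labels:
            labels[text] = start + len(labels)
        codes.append(labels[text])
    return labels, codes


# The spreadsheet shows the text; the codes are what is stored underneath.
labels, codes = assign_text_labels(["Apple", "Pear", "Apple", "Plum"])
print(labels)
print(codes)
```

In this sketch, every cell that displays Apple carries the same underlying number, which is the idea behind the association shown in the editor.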
Benefits
Some of the benefits to this text-to-number association are:
  1. Ordinal data can be represented by numbers that show their order as well as text values that have more meaning. For example, enter high, medium, and low stored as 3, 2, 1 respectively. Now their natural order is preserved, but also the more descriptive text is present, too. The variable with text labels can be analyzed either as categorical or continuous.
    STATISTICA mean plot
  2. Easy data entry. The numeric associations can easily be modified and used as a shortcut in data entry. For example, when typing in the data, I can type in 1 and the cell will automatically show Low from my text labels.
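Both benefits can be illustrated with a short Python sketch. It is only a model of the idea, not STATISTICA functionality: the dictionary `text_labels` and the sample values in `stored` are made up for the example.

```python
from collections import Counter
from statistics import mean

# Hypothetical label table: numeric code -> text shown in the spreadsheet.
text_labels = {1: "Low", 2: "Medium", 3: "High"}

# Data entry shortcut: only the numeric codes are typed in.
stored = [1, 3, 2, 3, 3]

# What the user sees in the cells.
displayed = [text_labels[code] for code in stored]

# Categorical view: frequency of each label.
frequencies = Counter(displayed)

# Continuous view: the codes 1 to 3 go straight into the formula.
average = mean(stored)

print(displayed)
print(frequencies)
print(average)
```

Because the codes 1, 2, 3 follow the natural order of the categories, sorting or averaging the stored numbers respects that order, while the displayed text stays descriptive.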
Common Questions Associated with Text Labels
Following are answers to common questions from STATISTICA users when text labels are employed in their data.
  1. When a variable is selected for analysis as a continuous variable (in basic descriptive statistics for example) and that variable has text labels, the following warning dialog box is displayed.
    STATISTICA text labels warning dialog box
    This does not mean the analysis can’t proceed. It simply brings to your attention the fact that the analysis you are about to perform may be suspect. Consider the previous example where 1 to 3 represent low to high. We can compute a mean, standard deviation, etc., on this data because the numbers 1 to 3 are used in the mathematical formulas. This warning dialog box prompts you to examine if this analysis makes sense with the data you have selected. If so, select the option to continue. If not, you can further explore the variables containing text labels with the Scan Spreadsheet option.
  2. In numeric data, suppose you inadvertently typed in some text, or on import, perhaps the row of variable names was incorrectly read in as the first row of data. Now, a text label and number combination is used in this column. Deleting the offending case is only one step in fixing this issue. The text label, although not used, is still there. This will cause the warning dialog box above to be displayed in analysis. The software does not know that the text label was a mistake. Using the Text Labels Editor, the unwanted text label can be removed.
  3. Another potential problem stemming from accidental text labels is unexpected text popping up in your numeric data. Because of a data entry or import error, a number is assigned a text label. Now, when that number naturally occurs in the data, the number is hidden by the unwanted text label. The root cause of the issue and the fix are the same, but the symptoms are different.
  4. One final possible symptom is unexpected values in graphs and analyses.
    STATISTICA histogram
    This plot shows what happens when numeric data, on a scale of 0 to 1, are plotted in a histogram, but one case has an unexpected text label. The numeric value associated with the text in this graph is the default 101. The data look skewed because a very extreme outlier is present. This is simply a data entry error that is masked by text labels. Using the Text Labels Editor, you can further explore this error.
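The effect of an accidental text label can be reproduced with a few lines of Python. This is a sketch under stated assumptions, not a STATISTICA procedure: the column values, the label "n/a", and the helper `cases_with_labels` are all hypothetical, and the stray label is assumed to carry the default code 101.

```python
from statistics import mean

# Hypothetical label table created by a data entry error: code -> text.
text_labels = {101: "n/a"}

# Numeric data on a 0-to-1 scale; the last case was typed as text,
# so the value stored underneath is the label's code, 101.
column = [0.25, 0.5, 0.75, 0.5, 101]


def cases_with_labels(values, labels):
    """Return the positions of cases whose stored value has a text label."""
    return [i for i, value in enumerate(values) if value in labels]


suspect = cases_with_labels(column, text_labels)

# The stray code acts as an extreme outlier in any numeric summary.
mean_with_error = mean(column)

# Deleting the offending case fixes the data ...
clean = [value for i, value in enumerate(column) if i not in suspect]
mean_clean = mean(clean)

# ... but the label itself is still defined until it is removed as well.
unused = [code for code in text_labels if code not in clean]
for code in unused:
    del text_labels[code]

print(suspect, mean_with_error, mean_clean, unused)
```

The sketch mirrors the two-step fix described in the questions above: remove the offending case, then remove the leftover label so that it no longer triggers warnings or hides a legitimate value.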
Conclusion
When properly understood and used correctly, text labels are a good tool for data storage. This understanding of the way text labels work can help all STATISTICA users to improve their data integrity.

Powering the Cloud

by Win Noren on Wednesday, August 21, 2013 1:42 PM

I read an interesting article the other day: The Cloud Begins With Coal: Big Data, Big Networks, Big Infrastructure, and Big Power – an overview of the electricity used by the global digital ecosystem. I must admit that until reading this (lengthy) article I never gave much thought to the electrical consumption of our ever-expanding digital world.

According to this paper, the Information-Communications-Technologies (ICT) ecosystem consumes almost 10% of world electricity generation and 50% more energy than global aviation. Even more surprising to me was that streaming an hour of video content weekly to my smartphone or tablet will consume more electricity in a year than is consumed by two new refrigerators! Beyond noticing that my cell phone needs to be charged, I never thought about the electrical cost to deliver content to my devices.

Soon, hourly Internet traffic will exceed the annual Internet traffic of 2000. This digital traffic is distributed by an electricity-consuming infrastructure. According to the Digital Power Group, coal is the world’s largest source of electricity, currently supplying 40% of global electricity, which is why they state that “the digital universe and Cloud begins with coal.”

The paper goes on to remind us that “digital bits are electrons…[and that] astronomical quantities of data eventually add up to real power in the real world.” In fact, according to Greenpeace [1], “If the Cloud were a country, it would have the fifth largest electricity demand in the world,” coming after only the US, China, Russia, and Japan but before India.

Obviously, data centers are large consumers of electricity, and for many, the cost of buying computer servers is less than the cumulative cost of the electricity consumed by those servers over their four-year life span. Facebook opened a data center in 2012 in North Carolina, where electric rates are 10-30% below the national average, and Facebook projects that it will save $100 million in operating costs because of the lower electrical costs. It is also projected that this Facebook facility will use one million tons of coal over the next decade. Similarly, a huge data center under construction in China advertises cheap power, not cheap labor, as its competitive advantage.

So, of course, the obvious question is: how will the ever-growing need for power be met? I was aware that our growing world has a growing demand for electricity, but I had never considered the role that moving bits of digital data plays in this... Hey, did you see that funny comedy video by Michael Jr?
[1] Greenpeace International, How Clean is your Cloud, April 2012