How to Most Efficiently Store Your Data

When working with large data files, it becomes important to look for ways to make one’s processes more efficient. File size and computation times can both be affected by how data is stored.
Many variables can be stored more efficiently merely by changing a few of the default settings. In this brief article, we will explore several ways to make spreadsheet storage and computations more efficient.
To view and change the storage method of a given variable, click on the variable header in the spreadsheet. Then, select the Data tab and in the Variables group, click Specs to display the variable specification dialog box for the selected variable. You can also double-click on the variable header to display this dialog box. In the drop-down box labeled Type, you will find the data storage options. The default data storage method is double precision. In STATISTICA, it is called simply Double.
[Figure: selecting double precision in STATISTICA]
For variables stored with double precision, values are stored as 64-bit floating-point real numbers with 15-digit precision. The range of values supported by this data type is approximately ±1.7 × 10^308.
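The figures above (8 bytes, 15 digits, about 1.7 × 10^308) match the standard IEEE 754 64-bit format. Assuming STATISTICA's Double follows that usual layout (the article does not name the standard), the following Python sketch inspects a Python float, which uses the same 64-bit format on standard CPython builds, to show the per-value size, the largest finite value, and where precision runs out. It is an analogy outside STATISTICA, not its internal code.

```python
import struct
import sys

# On standard CPython builds, a float is an IEEE 754 64-bit double,
# the same layout the article describes for the Double type (assumed).
print(struct.calcsize("<d"))   # 8: bytes per value
print(sys.float_info.max)      # 1.7976931348623157e+308: largest finite value
print(sys.float_info.dig)      # 15: decimal digits reliably preserved

# Precision limits: some decimals cannot be stored exactly, and integers
# above 2**53 (about 9.0e15) are no longer all representable.
print(0.1 + 0.2 == 0.3)        # False
print(2.0**53 + 1 == 2.0**53)  # True: the +1 is rounded away
```

The last two lines show why Double is a poor fit for data that is really made of small whole numbers: it spends 8 bytes per cell on precision those values never use.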
The next option, Text, is used for storing text data. Set Length to the number of characters the values need. As you would expect, the longer the designated length of a text variable, the more storage space its data takes, so Length should be as small as possible while still capturing the longest value in full.
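To pick the smallest safe Length, you need the length of the longest value in the column. The Python sketch below computes that and estimates storage under a simplifying assumption: that every cell reserves the full Length at one byte per character. That storage model and the helper names `min_text_length` and `text_column_bytes` are hypothetical illustrations, not documented STATISTICA behavior or API.

```python
def min_text_length(values):
    """Smallest Length that holds every value without truncation."""
    return max((len(v) for v in values), default=0)


def text_column_bytes(values, length):
    """Rough fixed-width estimate (hypothetical model): each cell reserves
    `length` characters at one byte per character."""
    return len(values) * length


# Hypothetical sample column.
cities = ["Tulsa", "Oslo", "Johannesburg", "Lima"]

needed = min_text_length(cities)
print(needed)                               # 12
print(text_column_bytes(cities, needed))    # 48
print(text_column_bytes(cities, 255))       # 1020 with an oversized Length
```

Under this model, the oversized Length makes the column more than twenty times larger with no gain in information.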
For some types of numeric data, double precision storage is necessary: any variable whose values have decimals, or are extremely large or small, requires this storage type. But many variables are stored with far greater precision than they need, and this is where we can change the data type and gain efficiency.
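The rule in the paragraph above can be checked mechanically. This Python sketch (a hypothetical helper, not a STATISTICA feature) flags a column as needing Double when any value has a fractional part or falls outside the ±2,147,483,647 Integer range given in this article. It assumes missing values have already been removed from the list.

```python
INT32_LIMIT = 2_147_483_647  # Integer range quoted in this article


def needs_double(values):
    """Return True if the column must stay Double: some value has a
    fractional part, or its magnitude exceeds the Integer range."""
    for v in values:
        if isinstance(v, float) and not v.is_integer():
            return True  # has decimals (this also catches inf and NaN)
        if abs(v) > INT32_LIMIT:
            return True  # too large in magnitude for Integer
    return False


print(needs_double([1, 2, 3]))          # False
print(needs_double([3.0, 4.0]))         # False: whole numbers stored as floats
print(needs_double([1.5, 2.0]))         # True: has decimals
print(needs_double([3_000_000_000]))    # True: exceeds the Integer range
```

The second case is the common one the article targets: whole numbers that were imported or entered as Double by default.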

The Integer data type stores whole numbers between -2,147,483,647 and +2,147,483,647. Variables stored this way are more efficient, using 4 bytes per cell compared with 8 for double precision.
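The 4-versus-8-byte difference is easy to see by packing the same column both ways. This Python sketch uses the standard `struct` module with a standard 32-bit signed integer as a stand-in for STATISTICA's Integer type; note that a standard int32 also allows -2,147,483,648, one more than the range quoted above. The helper name `pack_int32_column` is hypothetical.

```python
import struct


def pack_int32_column(values):
    """Pack integers as little-endian 32-bit signed ints (4 bytes each)."""
    return struct.pack(f"<{len(values)}i", *values)


column = [0, 42, -7, 2_147_483_647]
packed = pack_int32_column(column)
print(len(packed))                                      # 16 = 4 values x 4 bytes
print(len(struct.pack(f"<{len(column)}d", *column)))    # 32 as doubles

# Values outside the 32-bit range cannot be stored as Integer.
try:
    pack_int32_column([2**31])
except struct.error as exc:
    print("out of range:", exc)
```

The same four values take half the space as integers, and anything beyond the 32-bit range is rejected rather than silently truncated.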
The Byte data type stores integer values from 0 to 255 and is the most economical storage option, taking only 1 byte per cell in the spreadsheet. Use it for any variable that needs only small, non-negative integer values.
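Python's built-in `bytes` type has exactly the 0 to 255, one-byte-per-value behavior described for Byte, so it serves as a quick illustration. The sketch below is an analogy only; `pack_byte_column` is a hypothetical helper, not part of STATISTICA.

```python
def pack_byte_column(values):
    """Store small non-negative integers one byte per cell (0-255)."""
    return bytes(values)  # raises ValueError for values outside 0-255


scores = [0, 17, 99, 255]
packed = pack_byte_column(scores)
print(len(packed))    # 4: one byte per value
print(list(packed))   # [0, 17, 99, 255]

# Negative values or values above 255 do not fit in a byte.
for bad in ([256], [-1]):
    try:
        pack_byte_column(bad)
    except ValueError as exc:
        print(bad, "out of range:", exc)
```

Four values fit in 4 bytes here, versus 32 bytes as Double.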
Using the most efficient storage method for your variables makes for smaller spreadsheet files and faster computations.
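Putting the rules together, the choice for a numeric variable is: Byte if every value is a whole number from 0 to 255, Integer if every value is a whole number within ±2,147,483,647, and Double otherwise. The Python sketch below applies that rule to a few made-up sample columns and estimates per-cell storage from the byte counts given in this article. The function name and sample data are hypothetical, and the estimate ignores any file overhead.

```python
# Bytes per cell for each type, as stated in this article.
BYTES_PER_CELL = {"Byte": 1, "Integer": 4, "Double": 8}


def smallest_numeric_type(values):
    """Pick the most compact of Byte, Integer, Double that holds every value."""
    if any(isinstance(v, float) and not v.is_integer() for v in values):
        return "Double"  # decimals (or inf/NaN) need double precision
    if all(0 <= v <= 255 for v in values):
        return "Byte"
    if all(abs(v) <= 2_147_483_647 for v in values):
        return "Integer"
    return "Double"  # whole numbers too large for Integer


# Hypothetical sample columns.
columns = {
    "age": [23, 35, 61, 44],
    "income": [52000, 61000, 48000, 75000],
    "rating": [4.5, 3.0, 5.0, 2.5],
}

for name, values in columns.items():
    chosen = smallest_numeric_type(values)
    size = BYTES_PER_CELL[chosen] * len(values)
    print(f"{name}: {chosen}, {size} bytes instead of {8 * len(values)} as Double")
```

In a real data set you would run such a check once per variable and then change the Type in the variable specification dialog accordingly.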

About statsoftsa

StatSoft, Inc. was founded in 1984 and is now one of the largest providers of analytic software worldwide. StatSoft is also the largest manufacturer of enterprise-wide quality control and improvement software systems in the world, and the only company capable of supporting its QC products worldwide, with wholly owned subsidiaries in all major markets (StatSoft has 23 full-service offices on all continents), and its software is available in more than 10 languages.

Posted on July 3, 2013, in Uncategorized. 1 Comment.

  1. Useful advice for speeding up computations on big data sets, thank you
