Monthly Archives: October 2012
STATISTICA Enterprise can be used for automated analyses and reports as well as interactive analyses. One of the main strengths of the STATISTICA Enterprise tool is the analysis templates, called analysis configurations, which automate and streamline various analysis tasks. Often, an analysis should be performed interactively, with a person guiding the project, as opposed to producing automated results. When this is the case, Enterprise users can take advantage of existing data configurations to perform these interactive analyses with an ad hoc analysis configuration.
Written by: Todd Ellingson
When monitoring a process, it’s critical to know if that process is capable of meeting the required specifications. If process variability is high compared to the range of your customers’ specifications, then you will end up with lots of scrap. That’s bad.
Written by: Jennifer Thompson
Macros and automation can save so much time, ulcers, gray hair, etc. In STATISTICA, creating macros to automate tasks is as easy as hitting record on your DVR. They are fast, and they make sure the analysis is done consistently with the same options and analysis procedures. One drawback, when comparing macros to people, is that macros don’t think for themselves. They just run the script, even if the analysis they are performing is absurd. So they may need to be taught to play nice with their data sets.
Method 1: Delete Variable Reference
By deleting 3-5, leaving empty quotes, the variable position reference is removed. Now running the macro will prompt you to select appropriate variables. As is typical of life, taking the easy option now means more work in the future: each time you run this macro, you will need to select variables for analysis. The plus side is that the macro is now compatible with any data set.
Method 2: Customizing Macro to Reference by Name
- Reference the spreadsheet, S1.
- Create placeholders, v1, v2 and v3, for the 3 variable positions.
- Create an array, VarList, for storing the variable names.
- In a loop, find the spreadsheet position of each of the variables and store them in the placeholders created in step 2.
- Modify the recorded macro, variable selection line, using the variable position variables.
- Delete any unnecessary lines of recorded code for simplicity.*
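The name-to-position lookup in the steps above can be sketched outside STATISTICA. The Python below is only an illustration of the idea, not STATISTICA Visual Basic; the column names and the `find_positions` helper are hypothetical.

```python
def find_positions(column_names, wanted):
    """Return the 1-based spreadsheet position of each wanted variable name."""
    positions = []
    for name in wanted:
        if name not in column_names:
            # Fail loudly instead of silently analyzing the wrong column.
            raise KeyError(f"variable {name!r} not found in spreadsheet")
        positions.append(column_names.index(name) + 1)
    return positions


# Hypothetical spreadsheet header and list of variables to analyze.
columns = ["ID", "Temperature", "Batch", "Pressure", "Yield"]
var_list = ["Temperature", "Pressure", "Yield"]

# v1, v2, v3 play the role of the placeholders for the three positions.
v1, v2, v3 = find_positions(columns, var_list)
print(v1, v2, v3)
```

Because the positions are looked up by name at run time, the same script keeps working when columns are added or reordered, as long as the names are unchanged.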
Picture by: Duane Daws
World Gold Council publishes conflict-free gold standard
The World Gold Council (WGC) on Thursday published the conflict-free gold standard, which aims to curb gold production fuelling conflict and human rights violations. The standard, which would apply to conflict-affected areas globally, was developed in collaboration with the council’s member companies, which comprise the world’s leading gold producers.
Written by: Angela Waner
I define “business intelligence” (BI) as transforming data into actionable information with computer-based tools. I did not realize it until much later, but BI was my first job out of college. And in many ways, BI is my job now. I work every day with my company’s business intelligence solution, STATISTICA Enterprise.
So back to my first job…I was hired as a software developer. Because I was the newest employee, I inherited a thankless task that no one wanted to do. I became the “report guru” and I quickly learned the mainframe language Easytrieve Plus. This language was actually created so that analyses and reports could be quickly generated on mainframes.
I was in charge of 50 scheduled analyses/reports. About 10 ran once a day. About 20 reports were generated every Monday. And the remaining reports were generated once a month. It was my job to make sure the analyses/reports were executed as soon as the data was available.
I also had to read and understand every report. I “validated” the analysis results as being reasonable. If I saw anything unusual, I had to investigate and fix it before I turned the reports over to management.
Every morning I had to summarize the 10 daily reports. I created a “dashboard” of KPIs (key performance indicators). Excel was my best friend.
(I know that some people will not see my activities as BI, but my work met the spirit of the definition. I was using a mainframe and Excel. These are computer-based tools. And I tried to automate as many tasks as possible.)
Occasionally I would see changes ripple through the company from the analyses, reports, and dashboards that I created. I felt powerful. I felt useful.
But many times management would respond to the “dashboard” by asking for an ad-hoc analysis that sliced the data differently. Or they would ask me for my interpretation of the KPIs. And they would want this information yesterday. I felt stressed.
Data moved too slowly into actionable information.
My first employer was focused on answering questions like:
Why did it happen?
But management really wanted answers to questions like:
Why is it happening right now? (monitoring)
What might happen? (predicting)
I did not have the ability to provide this.
Eventually I left programming, became a parent, and changed careers. I have been a project manager at StatSoft since 2005. When I started my employment at StatSoft, I left the “analyze my data with Excel” environment. I joined an environment with enterprise analytics (templates, reporting, monitoring, and dashboards) and predictive analytics.
It has been an interesting adventure. I learned how to use data mining software. I learned how to create templates for my analyses and reports. And I plan on learning more about text mining. I feel empowered.
StatSoft Recognized by Analysts as Industry Leader in Predictive Analytics
Hurwitz’s Victory Index Labels StatSoft as “Double Victor”
Organizations are adopting, integrating and utilizing predictive analytics at an incredible rate. The business value of predictive analytics is clear: it enables organizations to define and attract the most profitable customers, streamline their resourcing and supply chain, improve the quality and targeting of their products, and many other applications. The Victory Index for predictive analytics, developed by Hurwitz & Associates, is designed to help organizations with an analysis of vendors and solutions for predictive analytics software. Hurwitz labeled StatSoft as a “Double Victor” based on its strong presence in the market, a solid vision, impeccable customer service, and great value for lower total cost of ownership.
The Victory Index is a valuable tool that companies can use to better understand predictive analytics and how they can become key players in a highly competitive market. The report shows where each of the leading vendors falls within the designated categories so that companies can capitalize on the experts and their Index rankings in the field of predictive analytics.
Click here to view the full report.
Written by: Jennifer Thompson
When data come from multiple sources, such as database tables, it can become necessary and beneficial to join or merge those tables to get the maximum information from our data. In this article, I will look at ways to bring data sources together easily in STATISTICA.
Specifically, we will show how inner and outer joins in queries can achieve this goal. Then we will show how the merge tool in STATISTICA can do the same tasks, both via interactive dialog boxes and the workspace.
Joins in Queried Data
When data reside in databases, a query is needed to bring the data into STATISTICA for analysis. During the query, functions such as an inner join or outer join can bring the data together. When joining two or more tables, a reference field from each table is needed. An inner join returns records where matches were found on this reference field in both tables. Records are discarded when a match from the other table is not found. (Inner joins can be built in STATISTICA with the GUI Query Builder tool.) For an outer join, records without a match in the joining table are also returned. Which unmatched records are kept depends on the type of outer join used: left, right or full. (Outer joins in STATISTICA queries can be performed in Text Mode in the query tool.) See the simple example below:
Inner join results are shown below. Only complete records are returned, that is, records that were found in both tables.
|ID||Data||First Name||Last Name|
This join was built with the GUI STATISTICA Query tool as seen here.
Full outer join results are shown below. All records are returned from both tables.
|ID||Data||First Name||Last Name|
These results were found with this query statement, seen below:
Merging Data Interactively
The same concepts can be used with data already found in STATISTICA spreadsheets. The Merge tool found on the Data tab in the Manage group can do this as well. In the Merge Options dialog box, the two data spreadsheets are selected with the File 1 and File 2 buttons. Then, change the Mode to Match variables. For an inner join style of merge, select the Unmatched Cases option, Delete cases. For an outer join style of merge, use the default Unmatched Cases option, Fill with MD.
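The effect of the two Unmatched Cases options can be illustrated with a small Python sketch. This is not STATISTICA code; the tables, the `merge` helper, and its `unmatched` argument are hypothetical, and `None` stands in for missing data (MD).

```python
# Hypothetical tables keyed by ID.
table_data = {1: "A", 2: "B", 3: "C"}
table_names = {2: ("Ann", "Lee"), 3: ("Bob", "Ray"), 4: ("Cy", "Poe")}


def merge(left, right, unmatched="fill"):
    """Merge two ID-keyed tables into rows of (ID, Data, First Name, Last Name).

    unmatched="delete": keep only IDs present in both tables (inner-join style).
    unmatched="fill":   keep every ID and fill the gaps with None (full-outer style).
    """
    if unmatched == "delete":
        ids = sorted(left.keys() & right.keys())
    elif unmatched == "fill":
        ids = sorted(left.keys() | right.keys())
    else:
        raise ValueError("unmatched must be 'delete' or 'fill'")
    rows = []
    for key in ids:
        first, last = right.get(key, (None, None))
        rows.append((key, left.get(key), first, last))
    return rows


inner_rows = merge(table_data, table_names, unmatched="delete")
outer_rows = merge(table_data, table_names, unmatched="fill")
print(inner_rows)
print(outer_rows)
```

IDs 1 and 4 each appear in only one table, so they are dropped in the inner-style result and padded with `None` in the outer-style result.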
Merging Data in the Workspace
This merge task can also be performed in the STATISTICA Workspace. First, both data tables should be inserted into the workspace by clicking Data Source. The column to be used for joining the data tables should be selected as the Dependent, continuous variable in each data source.
Then, using the Node Browser, select the Comparing and Merging Multiple Data Sources folder to find the Merge Variables node.
Next, we need to edit parameters of the Merge Variables node. Double click the node to display the Edit Parameters dialog box. Change the Mode to Relational. For an inner-style join, set Unmatched cases to Delete; for an outer-style join, set it to Fill with MD.
When the selections are made, run the workspace to create the joined spreadsheet.
These operations are essential to creating the needed tables for analysis. With STATISTICA, you have several paths to choose from for meeting the end goal.
In statistics, sample data is often used to help find estimates of population parameters. Common parameters that experimenters try to estimate include population means, standard deviations, and proportions. Interval estimates called confidence intervals are commonly used for this purpose.
What Is a Confidence Interval?
The sample statistics (or point estimates) – such as the mean, standard deviation, proportion, etc. – are used to make inferences about a population based on a random sample from that population. The point estimate likely does not equal the population parameter it estimates, but should be close. The confidence interval is a range around the point estimate, constructed so that the procedure captures the population parameter with a specified probability across repeated samples, typically 0.95 for a 95% confidence interval. The confidence interval is more informative than the point estimate alone because it indicates the range in which the population parameter plausibly lies.
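The idea of a confidence level can be checked with a small simulation. The sketch below is an illustration only: it assumes a normal population with a known standard deviation (so a z-based interval for the mean applies), and the population values, sample size, and the `coverage` function are hypothetical.

```python
import random
from statistics import NormalDist, fmean


def coverage(trials=5000, n=30, mu=10.0, sigma=2.0, level=0.95, seed=1):
    """Fraction of simulated samples whose z-interval for the mean contains mu."""
    rng = random.Random(seed)
    # Two-sided critical value, e.g. about 1.96 for a 95% level.
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    half_width = z * sigma / n ** 0.5
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        mean = fmean(sample)
        if mean - half_width <= mu <= mean + half_width:
            hits += 1
    return hits / trials


print(coverage())
```

The printed fraction should be close to the chosen confidence level: about 95 out of every 100 intervals built this way contain the true mean.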
Confidence Intervals for Single Means and Standard Deviations in STATISTICA
In STATISTICA, you can use the Descriptive Statistics analysis available via the Basic Statistics module to find confidence intervals for a single mean or single standard deviation. To access this analysis, first open a data file, and then select the Statistics tab. In the Base group, click Basic Statistics.
In the Basic Statistics and Tables Startup Panel, select Descriptive Statistics and click OK to display the Descriptive Statistics dialog box. The options for the confidence intervals for the mean and standard deviation are on the Advanced tab. You can specify the confidence level for each via the respective Interval edit box.
You would then click the Summary button to get the requested statistics, which would include these confidence intervals.
Using STATISTICA to Find a Confidence Interval for a Single Proportion
The Descriptive Statistics analysis is useful for finding statistics regarding continuous data. Proportions, however, are based on counts rather than continuous measurements. Tools such as Frequency Tables and Tables and Banners can find proportions. You can find a confidence interval for a single proportion using the Power Analysis module. This module is often used to calculate statistical power for a given analysis or to calculate the sample size required to attain a certain power level for a given analysis, but it can also be used to calculate, for a given analysis type, specialized confidence intervals not generally available in the general-purpose statistical packages.
Confidence Interval for a Single Proportion Example
In this example, researchers took a sample of 500 randomly selected subjects who completed four years of college. They found that 75 of them smoked on a regular basis. Thus, the sample proportion (often designated as p̂) of people who smoked and had a four-year college education is 75/500=0.15 (or 15%). If we wanted an estimate of the true proportion (usually designated as p) of people who smoke that have a four-year education, we could construct a confidence interval for the proportion.
The simplest and most commonly used formula for this type of confidence interval relies on approximating the binomial distribution with a normal distribution (the proportion is binomial because the person sampled either smoked or did not smoke). The formula is:
$$\hat{p} - z_{1-\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} < p < \hat{p} + z_{1-\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$
where p̂ is the sample proportion, n is the sample size, and z₁-α⁄2 is the 100(1-α⁄2)th percentile of the standard normal distribution; α is the Type I error rate and is the complement of the confidence level. Thus, for a 95% confidence level, the error α is 5% or 0.05.
This z-score can be calculated within STATISTICA. On the Statistics tab in the Base group, click Basic Statistics to display the Basic Statistics and Tables Startup Panel. Select Probability calculator.
Click OK to display the Probability Distribution Calculator.
In the Distribution field, select Z (Normal). Select the Inverse, Two-tailed, and (1-cumulative p) check boxes. We are using α = 0.05, so enter this value for p. Click the Compute button to calculate the z critical value (which is given in the X edit field). It is found to be 1.959964, which is commonly rounded to 1.96.
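The same critical value can be cross-checked outside STATISTICA. A minimal sketch using Python's standard library, assuming α = 0.05 as in the example:

```python
from statistics import NormalDist

alpha = 0.05
# Two-tailed critical value: the 1 - alpha/2 quantile of the standard normal.
z_crit = NormalDist().inv_cdf(1 - alpha / 2)
print(round(z_crit, 6))  # 1.959964
```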
Thus, the confidence interval for the true proportion is 0.15 - 1.96*sqrt[(0.15)(0.85)/500] < p < 0.15 + 1.96*sqrt[(0.15)(0.85)/500], which gives 0.11870131 < p < 0.18129869.
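The hand calculation can be reproduced with a short Python sketch. The `wald_interval` function name and its arguments are hypothetical; the arithmetic is the normal-approximation formula used above.

```python
from math import sqrt
from statistics import NormalDist


def wald_interval(successes, n, level=0.95, z=None):
    """Normal-approximation confidence interval for a single proportion."""
    p_hat = successes / n
    if z is None:
        # Unrounded critical value for the requested confidence level.
        z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    half_width = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


# Same inputs as the example: 75 smokers out of 500, z rounded to 1.96.
lower, upper = wald_interval(75, 500, z=1.96)
print(f"{lower:.8f} < p < {upper:.8f}")

# Leaving z unset uses the unrounded critical value instead.
lower_exact_z, upper_exact_z = wald_interval(75, 500)
print(f"{lower_exact_z:.8f} < p < {upper_exact_z:.8f}")
```

With z rounded to 1.96 the limits agree with the hand calculation; with the unrounded critical value they differ slightly in the later decimal places.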
Finding the Confidence Interval in STATISTICA
As previously mentioned, we can find this same confidence interval for a single proportion using the Power Analysis module in STATISTICA.
With any data file opened, select the Statistics tab. In the Advanced/Multivariate group, click Power Analysis. In the Power Analysis and Interval Estimation Startup Panel, select Interval Estimation as the analysis category, and then select One Proportion, Z, Chi-Square Test as the analysis type.
In the Single Proportion: Interval Estimation dialog box, enter 0.15 for Observed Proportion p, 500 for Sample Size (N), and 0.95 for Conf. Level.
Click Compute to calculate the confidence interval.
The Pi (Crude) results should match what was calculated earlier by hand as these are the estimates using the normal approximation to the binomial distribution (note that the hand calculations could be off a little due to rounding the z critical value to 1.96; STATISTICA will carry this out to more decimals for better accuracy).
The results in the Interval Estimation spreadsheet also include two other ways to calculate the confidence interval for a proportion – Pi (Exact) (the confidence intervals are the “exact, Clopper-Pearson” confidence intervals) and Pi (Approximate) (the confidence intervals employ a score method with a continuity correction). For more information on how these two methods are computed, see methods 4 and 5 from Robert Newcombe’s paper, Two-Sided Confidence Intervals for the Single Proportion: Comparison of Seven Methods (1998, Statistics in Medicine, 17, 857-872).
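For readers who want to see what a score interval with a continuity correction looks like, the sketch below implements the closed-form expression given as method 4 in Newcombe's paper. Whether STATISTICA's Pi (Approximate) output matches this form digit for digit is an assumption, and the `score_cc_interval` function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def score_cc_interval(p_hat, n, level=0.95):
    """Score interval with continuity correction for a single proportion."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    q_hat = 1 - p_hat
    denom = 2 * (n + z * z)
    # By convention the lower limit is 0 when p_hat is 0,
    # and the upper limit is 1 when p_hat is 1.
    if p_hat == 0:
        lower = 0.0
    else:
        root = sqrt(z * z - 2 - 1 / n + 4 * p_hat * (n * q_hat + 1))
        lower = max(0.0, (2 * n * p_hat + z * z - 1 - z * root) / denom)
    if p_hat == 1:
        upper = 1.0
    else:
        root = sqrt(z * z + 2 - 1 / n + 4 * p_hat * (n * q_hat - 1))
        upper = min(1.0, (2 * n * p_hat + z * z + 1 + z * root) / denom)
    return lower, upper


# Same inputs as the example: observed proportion 0.15, sample size 500.
cc_lower, cc_upper = score_cc_interval(0.15, 500)
print(cc_lower, cc_upper)
```

Unlike the crude interval, this interval is not symmetric around the observed proportion.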
Sometimes a researcher wants to estimate the true proportion of a population of interest by finding the confidence interval for that proportion. In STATISTICA, the Power Analysis module provides the means to find this estimate.
OCTOBER 15, 2012, TULSA, OK USA: StatSoft, Inc., one of the world’s largest providers of analytics software, is taking its corporate motto to a whole new level by offering STATISTICA Enterprise™ solutions (including its cutting-edge Big Data Predictive Analytics Platform) at no charge to companies in countries most affected by the European economic downturn: Greece, Portugal, and Spain.
StatSoft’s motto, “Making the World More Productive™,” reflects its core belief that business
analytics and big data processing are key to the productivity of every growing company. The
sour economies in some countries, however, have made it impossible for struggling businesses
to afford the very software solutions that could help them increase productivity, streamline
operations, and achieve safety, quality, and environmental improvements. So, for a limited
time, StatSoft is offering its powerful Enterprise and Predictive Analytics solutions for free to
“Given StatSoft’s recent growth in other international markets, we are very pleased to be in a
position to help our corporate neighbors in Greece, Portugal, and Spain whose well-known
capabilities are being undermined by regional economic conditions,” notes StatSoft’s CEO, Dr.
“The highly educated work force in those economies is fully capable of taking advantage of the well-demonstrated, tangible productivity improvements and savings that our modern predictive analytics software can offer. Paradoxically, however, the shortage of available credit prevents them from acquiring the crucial technology that would vastly speed up their recovery. Our goal is to make sure these companies succeed, because we firmly believe that analytics can change the world for the better.”
This large scale initiative adds new meaning to the well-known business term ROI (Return on Investment). StatSoft will serve first those companies whose infrastructure development it deems would produce the largest economic return, in terms of social and employment benefits. Lewicki anticipates other companies will follow suit.
“We hope for a snowball effect, where our leadership will prompt other companies to do the same as us,” he explains, “thus helping reduce the current risk for the Euro and speeding up the European recovery.”
As reflected in case studies and success stories at http://www.statsoft.com/customers/successstories/,
STATISTICA Enterprise installations of any scope can result in huge dividends that help business enterprises not only survive but thrive, even while regional economies may be slow to
Those companies in Spain and Portugal interested in this free software opportunity must
contact the StatSoft Iberica office. Companies in Greece are welcome to contact any StatSoft
office in Europe (found at https://www.statsoft.com/contact-us/statsoft-locations-map/) or the
ABOUT STATISTICA AND STATSOFT, INC.
StatSoft was founded in 1984 and is now one of the world’s largest providers of analytics software, with 30 offices around the globe and more than one million users of STATISTICA software. StatSoft’s solutions enjoy an extremely high level of user satisfaction across industries, as demonstrated in the unprecedented record of top ratings in practically all published reviews and large, independent surveys of analytics users worldwide. With its comprehensive suite of STATISTICA solutions for a wide variety of industries, StatSoft is a trusted partner of the world’s largest organizations and businesses (including most of the Fortune 500 companies), providing mission-critical applications that help them increase productivity, control risk, reduce waste, streamline operations, achieve regulatory compliance, and protect the environment.
For more information contact: