Monthly Archives: April 2012

Getting Started – Data Import

Data comes in many formats. For use in STATISTICA, the data may need to be imported and possibly prepared for analysis as well. STATISTICA imports data from a variety of sources, including Microsoft Excel, text files, and statistical software data files. Additionally, data can be queried from a database such as Access, Oracle, SQL and more. This video shows an example import of an Excel data set that then must be rearranged to follow the structural requirements of STATISTICA.
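
The kind of rearrangement involved can be sketched outside STATISTICA. The example below is only an illustration in Python, not the procedure shown in the video: it assumes a hypothetical export with one column per group and reshapes it so that each row is a single case and each column a variable. The column names and values are invented for the example.

```python
import csv
import io

# Hypothetical wide-format export: one column per treatment group.
WIDE_CSV = """subject,control,treatment
1,5.1,6.3
2,4.8,6.0
3,5.5,
"""

def wide_to_long(text, id_column, value_name="value", group_name="group"):
    """Turn one-column-per-group data into one row per measurement."""
    rows = []
    for record in csv.DictReader(io.StringIO(text)):
        for column, cell in record.items():
            # Skip the identifier column and any empty or missing cells.
            if column == id_column or cell is None or cell.strip() == "":
                continue
            rows.append({
                id_column: record[id_column],
                group_name: column,
                value_name: float(cell),
            })
    return rows

long_rows = wide_to_long(WIDE_CSV, "subject")
for row in long_rows:
    print(row)
```

Each output row carries the subject identifier, a grouping variable, and the measured value, which is the cases-in-rows, variables-in-columns layout that most statistical packages expect.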

Part 2 of this video gives a step-by-step example using the query tool to bring in data from a database.

Getting Started – Navigating Analysis Dialogs

STATISTICA Solutions for Heavy Equipment Manufacturing

Capital Equipment Manufacturers utilize STATISTICA throughout the manufacturing process and then analyze the repair and usage data once their products are in use by customers.

Manufacturing / Six Sigma

STATISTICA is an integral part of the quality control and Six Sigma programs at heavy equipment manufacturing organizations. Several of the largest global manufacturing organizations have global site licenses for STATISTICA and use it throughout their manufacturing sites.

Applications range from Web-based monitoring of Quality Control to fairly standard statistical process control techniques to customized STATISTICA-based applications for analyses that are specific to the type of manufacturing being performed.

Warranty Analyses

Capital equipment manufacturers typically provide basic and extended warranties to their customers as a value-added service. The length of warranty to provide and its associated cost for each product are important concerns for these organizations.

It is also helpful from product improvement and repair process improvement perspectives to be able to determine the most frequent repairs by product, the factors that contribute to a failure type, and the correlations between failures (e.g., if the repair technician determines that the water pump needs to be replaced, they may as well replace another component that is also likely to fail).
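
The idea of correlated failures can be illustrated with a simple co-occurrence count. This is a minimal sketch in Python, not STATISTICA's data mining or text mining algorithms; the repair orders and component names are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical repair orders: each set lists the components replaced in one visit.
repair_orders = [
    {"water pump", "thermostat"},
    {"water pump", "thermostat", "fan belt"},
    {"water pump", "fan belt"},
    {"alternator"},
    {"water pump", "thermostat"},
]

def co_occurrence(orders):
    """Count how often each component, and each pair of components, is replaced."""
    singles, pairs = Counter(), Counter()
    for order in orders:
        singles.update(order)
        # Sort so that each pair is always stored under the same key.
        pairs.update(combinations(sorted(order), 2))
    return singles, pairs

def conditional_rate(singles, pairs, given, other):
    """Share of repairs involving `given` that also involved `other`."""
    key = tuple(sorted((given, other)))
    return pairs[key] / singles[given]

singles, pairs = co_occurrence(repair_orders)
rate = conditional_rate(singles, pairs, "water pump", "thermostat")
print(f"thermostat replaced in {rate:.0%} of water pump repairs")
```

A high conditional rate is the kind of signal that could motivate a repair guideline such as replacing both components in one visit, although a real analysis would also account for sample size and cost.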

STATISTICA’s data mining and text mining algorithms are critical components in the successful setting of warranty parameters and the determination of repair guidelines and rules to decrease warranty service costs.

Remote Monitoring

As a value-added service, organizations can offer their customers remote monitoring: data transmission devices deployed on their products feed data to a centralized database. STATISTICA is integrated with those databases and monitors the various data feeds from the customer’s equipment. For example, the STATISTICA application includes predictive models to monitor oil pressure, RPMs, water pressure and various other equipment parameters. STATISTICA provides automated alerting and exception reporting when the latest data predict a problem or a failure for a piece of equipment. The organization proactively notifies the customer before a problem occurs, and a decision is then made about whether a repair technician should be sent out to make adjustments to the machine.

Sales Analysis / CRM

StatSoft’s customers in the Capital Equipment Industry use the broad base of analytic techniques in the platform to determine regional patterns in their sales and to make cross-selling and up-selling recommendations based upon what an individual customer just purchased, what they already own, the business that the customer is in, the region in which the customer is based, etc.

STATISTICA Solutions for Food and Beverage Manufacturing

Companies in the Food industry utilize STATISTICA throughout the product development, manufacturing and sensory testing processes.

Research and Development

STATISTICA provides the integrated platform for analytics empowering research and new product development within organizations in the Food industry. Improvements in the time-consuming and expensive process of research and development translate directly to the organization’s bottom line. Research organizations have experienced the positive impact of the deployment of the STATISTICA Enterprise platform. STATISTICA is the multi-user, server-based analytics platform to empower scientists with analytical tools that are easy to use, relevant, and integrated with their data sources.

The STATISTICA platform results in hard and soft Return on Investment (ROI) by:

  • Empowering scientists with the analytic and exploratory tools to make more sound decisions and gain greater insights from the precious data that they collect
  • Saving the scientists’ time by integrating analytics in their core processes
  • Saving the statisticians’ time to focus on the delivery and packaging of effective analytic tools within the STATISTICA framework
  • Increasing the level of collaboration across the R&D organization by sharing study results, findings, and reports

STATISTICA provides a broad base of integrated statistical and graphical tools including:

  • Tools for basic research such as Exploratory Graphical Analysis, Descriptive Statistics, t-tests, Analysis of Variance, General Linear Models, and Nonlinear Curve Fitting
  • Design of Experiments (DOE), including mixture designs and response optimization
  • Tools for more advanced analyses such as a variety of clustering, predictive modeling, classification and machine learning approaches including Principal Components Analysis

The STATISTICA platform meets the needs of both scientists and statisticians in your R&D organization.

Sensory Testing

STATISTICA provides a comprehensive set of tools for sensory testing. STATISTICA allows the Sensory Testing team to “break down” participant responses by group. The software allows them to perform comparisons across the responses of multiple groups. Integrated graphical analyses provide intuitive summaries of the observed differences for communication to a wider audience. STATISTICA Reports provide an effective way to summarize the data and findings from a sensory study, outputted in PDF, HTML or a Word Processor-compatible (e.g., MS Word) format.

Manufacturing / Six Sigma

STATISTICA is an integral part of the quality control and Six Sigma programs at food manufacturing organizations. STATISTICA performs real-time and offline analyses of product defects, package weights, nutritional components, and many other product attributes critical to quality.
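
One common technique for monitoring an attribute such as package weight is the Shewhart X-bar chart. The sketch below shows how its control limits are typically derived from subgroup means and ranges. It is an illustration in Python under stated assumptions, not STATISTICA output: the weights are hypothetical, subgroups are assumed to hold five packages each, and `A2_N5` is the standard tabulated constant for that subgroup size.

```python
from statistics import mean

# Standard Shewhart constant A2 for subgroups of size 5.
A2_N5 = 0.577

# Hypothetical package weights in grams, five packages per sampling time.
subgroups = [
    [500.2, 499.8, 500.1, 500.4, 499.9],
    [500.0, 500.3, 499.7, 500.1, 500.2],
    [499.6, 500.0, 500.2, 499.9, 500.1],
    [500.3, 500.5, 500.0, 499.8, 500.2],
]

def xbar_limits(groups, a2=A2_N5):
    """Return (LCL, center line, UCL) for an X-bar chart based on the mean range."""
    means = [mean(g) for g in groups]
    ranges = [max(g) - min(g) for g in groups]
    center = mean(means)
    r_bar = mean(ranges)
    return center - a2 * r_bar, center, center + a2 * r_bar

def out_of_control(groups, lcl, ucl):
    """Indices of subgroups whose mean falls outside the control limits."""
    return [i for i, g in enumerate(groups) if not lcl <= mean(g) <= ucl]

lcl, center, ucl = xbar_limits(subgroups)
print(f"LCL={lcl:.3f}  center={center:.3f}  UCL={ucl:.3f}")
print("out-of-control subgroups:", out_of_control(subgroups, lcl, ucl))
```

In practice the limits would be estimated from a longer baseline period and then applied to new subgroups as they arrive.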

9.999… reasons that .999… = 1

STATISTICA Solutions for Semiconductor and Automated Manufacturing

One of the most complex and expensive automated manufacturing environments is that required for the manufacture of semiconductors. The typical process involves the nearly fully automated application of hundreds of processing steps to lots (“stacks”) of silicon wafers, each containing a large number of microchips. Creating a high-yield process, where most (e.g., 90% or more) of all chips pass final acceptance testing, is extremely difficult and time-consuming. At the same time, the cost of failure in this environment is significant, as each wafer can be many times more valuable than even the most precious metal by weight. Moreover, unexpectedly lengthy ramp-up times (to create a reliable production process) may significantly undercut the commercial value of the final product, hence jeopardizing the huge investment in the semiconductor Fab, which may well reach $2 billion or more!

STATISTICA and the Engineering Process

The STATISTICA system provides a large set of tools for engineers to study processes. First, the STATISTICA system will quickly and seamlessly integrate into the existing information infrastructure, directly querying the relevant databases (practically all industry standard database formats are supported). There is no need to laboriously import the data into, for example, a limited spreadsheet format for further analyses; instead STATISTICA connects directly to your data.

Next, STATISTICA interactive graphics are extremely fast and flexible, so meaningful views and graphical summaries of key processes, variables, measurements, outcomes, etc. can be created very quickly. 

Complete Customizability and Programmability

Each process is unique, and the techniques for automated manufacturing are constantly evolving in this highly competitive environment. STATISTICA is fully customizable and programmable, down to all aspects of graphs, data handling, and so on. Hence, in addition to providing an extremely sophisticated and flexible off-the-shelf tool, the system also serves as a toolbox that will enable engineers to develop custom analyses and processes quickly, to support the critical ramp-up of new manufacturing processes, and the specialized analytic tools to support them.

Advanced Data Mining and Predictive Quality Control

STATISTICA Data Miner provides an extremely comprehensive set of knowledge discovery algorithms that can be applied to support the manufacturing process. In addition to commonly used advanced neural network architectures, STATISTICA Data Miner implements the most cutting-edge tools in a single integrated platform. For example, the system includes algorithms such as stochastic gradient boosting, random forests, support vector machines, multivariate adaptive regression splines (MARSplines), and independent components analysis, to name a few. These techniques can be used to build robust and reliable predictive models for quality or failure, even in high-dimensional environments with large numbers of variables (but few “cases” or “rows”) and significant interactions between them (e.g., interactions between tools). All of these methods are implemented in the same efficient and programmable STATISTICA platform, yielding the most advanced set of tools for tackling difficult root-cause analysis and predictive QC problems.

STATISTICA Data Miner and KLA-Tencor

STATISTICA Data Miner and other statistical analysis algorithms are used to provide the core support in KLA-Tencor’s yield analysis and management systems. Because StatSoft is the recognized leader in the application of advanced, cutting-edge data mining techniques, KLA-Tencor has chosen STATISTICA and StatSoft as its partner to provide critical advanced data analysis and data mining support for dedicated yield management solutions for the semiconductor industries. Indeed, the complete programmability and customizability of the STATISTICA system make it the ideal toolkit for these types of custom solution systems.

Electronic Statistics Textbook

The only Internet Resource about Statistics Recommended by Encyclopedia Britannica

StatSoft has freely provided the Electronic Statistics Textbook as a public service for more than 17 years now.

This Textbook offers training in the understanding and application of statistics. The material was developed at the StatSoft R&D department based on many years of teaching undergraduate and graduate statistics courses and covers a wide variety of applications, including laboratory research (biomedical, agricultural, etc.), business statistics, credit scoring, forecasting, social science statistics and survey research, data mining, engineering and quality control applications, and many others.

The Electronic Textbook begins with an overview of the relevant elementary (pivotal) concepts and continues with a more in-depth exploration of specific areas of statistics, organized by “modules” and accessible by buttons, representing classes of analytic techniques. A glossary of statistical terms and a list of references for further study are included.

Proper citation
(Electronic Version): StatSoft, Inc. (2011). Electronic Statistics Textbook. Tulsa, OK: StatSoft. WEB:
(Printed Version): Hill, T. & Lewicki, P. (2007). STATISTICS: Methods and Applications. StatSoft, Tulsa, OK.



Overview of Elementary Concepts in Statistics. In this introduction, we will briefly discuss those elementary statistical concepts that provide the necessary foundations for more specialized expertise in any area of statistical data analysis. The selected topics illustrate the basic assumptions of most statistical methods and/or have been demonstrated in research to be necessary components of one’s general understanding of the “quantitative nature” of reality (Nisbett, et al., 1987). Because of space limitations, we will focus mostly on the functional aspects of the concepts discussed and the presentation will be very short. Further information on each of those concepts can be found in the Introductory Overview and Examples sections of this manual and in statistical textbooks. Recommended introductory textbooks are: Kachigan (1986), and Runyon and Haber (1976); for a more advanced discussion of elementary theory and assumptions of statistics, see the classic books by Hays (1988), and Kendall and Stuart (1979).

Pepsi Hungary uses an SPC system based on STATISTICA Enterprise

As a result of the successful implementation of STATISTICA Enterprise quality monitoring and control systems at the Polish and Czech Pepsi plants, Pepsi Hungary has also started using a new SPC system based on STATISTICA Enterprise, customized according to specific local requirements. The Hungarian Pepsi BU is wholly owned by PepsiCo after the merger of PAS & PBG & PepsiCo. Pepsi Hungary produces various brands of carbonated beverages like Pepsi, 7Up, and Mirinda.
Mr. József Sinkó, quality assurance & regulatory manager at Pepsi Hungary, said: “We have chosen StatSoft’s solution for the effective assurance of the strict quality requirements of the production processes. The main requirements were that the SPC system should help us in monitoring and controlling product quality parameters, recognizing relationships and trends in time, and identifying sources of errors. Pepsi is also monitoring the machine capability and looking to eliminate waste to ensure sustainability and decrease the carbon footprint of the plant. Before introducing the new SPC system, it was a complicated and time-consuming task to perform analyses for longer periods of time or more parameters, because our old SPC system did not have the appropriate tools to find the necessary data easily and quickly. I can say that StatSoft’s user-friendly software makes our life much easier.”
A number of quality parameters are measured during production, including the sugar concentration, carbon dioxide concentration, acidity and volume of the product, as well as the tightness of bottle caps. These are all monitored and controlled in the new SPC system. The quality parameters are calculated using different specifications for the different products.
Unlike the STATISTICA Enterprise solutions at the Polish and Czech Pepsi plants, where StatSoft developed a data collection system, the STATISTICA Enterprise SPC system implemented at Pepsi Hungary is connected to a third-party data collection system that has been used at Pepsi Hungary for many years. The customized SPC system performs the statistical analyses and creates the analysis reports corresponding to the special user requirements, starting from data taken from the SQL database of the third-party data collection system of Pepsi Hungary.
Mr. József Sinkó, quality assurance & regulatory manager, pointed out: “We have had good experiences with the STATISTICA Enterprise SPC system since it was introduced in January 2010. We are satisfied both with the support StatSoft provided to us during the planning stage, and the implemented system itself. The system meets our requirements in providing us with up-to-date, detailed reports that make quality assurance more effective. Using the new STATISTICA Enterprise SPC system, the process capability indices can easily be calculated for a given day, week or any other selected period of time, which provides us with a quick and detailed overview of the quality of production. Based on the quality results obtained for individual production lines and products, it is easy to decide where we should take remedial action. Using the science of SPC gives us the possibility to increase product quality and process capability, while the financial figures also improve.”
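
For readers unfamiliar with process capability indices, the calculation for a selected period can be sketched as follows. This is an illustrative Python example, not the Pepsi Hungary system: the measurements and specification limits are hypothetical, and the indices use the overall sample standard deviation of the selected data (some conventions call the resulting values Pp and Ppk rather than Cp and Cpk).

```python
from datetime import date
from statistics import mean, stdev

# Hypothetical measurements: (sampling date, sugar concentration in degrees Brix).
measurements = [
    (date(2010, 1, 4), 10.9), (date(2010, 1, 4), 11.1),
    (date(2010, 1, 5), 11.0), (date(2010, 1, 5), 11.2),
    (date(2010, 1, 6), 10.8), (date(2010, 1, 6), 11.0),
]

# Hypothetical lower and upper specification limits for this product.
LSL, USL = 10.5, 11.5

def capability(values, lsl, usl):
    """Return (Cp, Cpk) from the sample mean and sample standard deviation."""
    mu = mean(values)
    sigma = stdev(values)
    cp = (usl - lsl) / (6 * sigma)               # spread relative to tolerance
    cpk = min(usl - mu, mu - lsl) / (3 * sigma)  # also penalizes off-center processes
    return cp, cpk

def capability_for_period(data, start, end, lsl, usl):
    """Compute the indices using only measurements dated from start to end inclusive."""
    selected = [value for day, value in data if start <= day <= end]
    return capability(selected, lsl, usl)

cp, cpk = capability_for_period(
    measurements, date(2010, 1, 4), date(2010, 1, 6), LSL, USL
)
print(f"Cp={cp:.2f}  Cpk={cpk:.2f}")
```

Because each product has its own specifications, a real report would look up the limits per product before calling a function like `capability_for_period`.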
The main purpose of the new STATISTICA Enterprise SPC system is to support the daily decisions of the quality managers. The system:
• provides customized reports to obtain a general survey of the quality of production, and to identify problems and areas where development is required,
• provides flexible access to historical production data,
• allows for exhaustive individual analyses of data using a set of comprehensive analysis tools.
Future plans
A planned next step in applying the STATISTICA Enterprise SPC system is to provide management with an automated daily report summarizing the quality results of the Pepsi plant, and to increase the amount of SPC-controlled data supporting the sustainable growth of the company.