Monthly Archives: April 2013

STATISTICA 12 Released! – Bigger! Better! Faster! #Statistica #Statistics #Software

statistica 12

Better. Bigger. Faster.

An example of the new workspace in version 12.

An explosive combination of Big Data growth, digital storage capabilities, and technological advances has forever altered the modern business analytics landscape. The application of analytic tools and decision making is no longer limited to the realm of data scientists, computer programmers, engineers, and the like. Rather, analytics are now being integrated into day-to-day tasks across all departments, utilized by project managers, business analysts, predictive modelers, customer agents, and executive leaders who need access to sensible, actionable information. These are people who need visual user interfaces to create, consume, and share KPIs, graphs, reports, slide presentations, and more.

To meet these changes head on, we made STATISTICA even faster, more flexible, and more functional than ever:

  • We boosted the Big Data performance of the entire product line.
  • We added a visual user interface to write SQL queries with the new Advanced Query Builder in all products.
  • We reinvented the visual analytic workspace in STATISTICA Enterprise and Data Miner for a more intuitive user experience, with greater visual workflow and storage capabilities to help users understand and communicate their findings.
  • We strengthened the predictive/prescriptive capabilities of Decisioning Platform®.
  • We introduced the highly flexible Reporting Tables product that enables users to visually build tables of summary statistics and use them in presentations and other reports.
  • We developed new nodes, such as the practical Data Health Check that facilitates cleanup of a large number of variables.

With the rollout of STATISTICA 12 in April 2013, StatSoft builds on its nearly 30-year legacy of exceeding customer expectations, furnishing this ever-growing business landscape with a host of relevant features and performance improvements that will make our analytic solutions even faster, more accessible, and more effective for business leaders and power users alike.

We fit into your IT world better than any alternative. Whether handling medium data or Big Data, STATISTICA 12 takes greater advantage of existing data warehouses and IT tools than ever before, helping move businesses even closer and faster to meaningful ROI.

Advanced Query Builder

Advanced Query Builder (AQB) makes it possible for even non-technical staff to write complex queries to retrieve data. It has a new visual user interface to build queries (dragging, dropping, nesting, selecting). The application’s parsing engine determines the current context.

the new advanced query builder dialog

Offering features usually found only in specialized applications, AQB can build left, right, and full outer joins graphically; can build queries with aggregate functions; is capable of building complex queries involving unions and minus operations; can graphically represent complex SQL queries and ER diagrams; and can provide the means for SQL dialect to be changed when the universal default is not practical.
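To give a feel for the kind of query described above, here is a minimal sketch of a left outer join combined with aggregate functions, run against an in-memory SQLite database from Python. The `customers` and `orders` tables and their columns are hypothetical, and this is not the SQL that AQB itself generates; it only illustrates the query shape that a visual builder saves you from typing by hand.

```python
import sqlite3

# Hypothetical tables, created in memory purely for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'Acme'), (2, 'Globex');
    INSERT INTO orders VALUES (1, 1, 100.0), (2, 1, 50.0);
""")

# Left outer join keeps customers without orders; COUNT and SUM are aggregates.
query = """
    SELECT c.name, COUNT(o.id) AS n_orders, COALESCE(SUM(o.amount), 0) AS total
    FROM customers AS c
    LEFT OUTER JOIN orders AS o ON o.customer_id = c.id
    GROUP BY c.id, c.name
    ORDER BY c.name
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Because the join is a left outer join, the customer with no orders still appears in the result, with a count and total of zero.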

Spreadsheet Improvements

New File Format for Better Support of Big Data

STATISTICA now features a new data file format that is optimized for Big Data by supporting variable storage length for text variables. When text variables include sparsely populated columns, the space occupied by those values is automatically optimized, reducing spreadsheet sizes sufficiently to produce significant performance improvements.

Spreadsheet “Virtual Variables”

Spreadsheets now use virtual variables that can be specified by formula and evaluated at run time, requiring no real storage.  These virtual variables are added or deleted behind the scenes without needing to rewrite entire spreadsheet data sections, so users will notice only enhanced performance.  New data hides in a separate vector on disk and is reunited with the original spreadsheet when data is saved. This especially adds significant performance improvements to large spreadsheets when you need to add transformed variables.

Increase in Text Labels

Text Label support in spreadsheets has now been increased to millions of distinct labels with significant performance improvements for name/value lookup. This makes Text Labels a good choice for text fields with large numbers of distinct values, inheriting all the performance benefits from a fixed storage size of the numeric value and avoiding duplication of repeated values.
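The idea behind text labels can be sketched in a few lines of Python: each distinct string is stored once and the column itself holds only fixed-size numeric codes. This is an illustration of the general technique (often called dictionary encoding), not the STATISTICA file format; the function name and the choice to start codes at 1 are assumptions.

```python
def encode_labels(values):
    """Replace repeated strings by integer codes plus a single lookup table."""
    label_to_code = {}
    codes = []
    for value in values:
        if value not in label_to_code:
            # Assumption for this sketch: codes are assigned 1, 2, 3, ... in order of first appearance.
            label_to_code[value] = len(label_to_code) + 1
        codes.append(label_to_code[value])
    return codes, label_to_code

codes, lookup = encode_labels(["low", "high", "low", "medium", "high"])
print(codes)   # numeric column stored in the spreadsheet
print(lookup)  # each distinct label stored only once
```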

Aggregate Function in OLE DB Provider for STATISTICA Spreadsheets

The OLE DB provider now allows for the utilization of aggregate functions such as average, count, max, min, or sum.

Importing Text Files Using Auto-Fixed Importing Variable Operations

This enhancement to STATISTICA provides the ability to take blocks of data that contain fixed-length pieces of information and specify the fixed lengths used to import variable-specific information.

the new text file import prompt

STATISTICA now has the option for a Fixed import setting.
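For readers unfamiliar with fixed-width files, the following Python sketch shows what a fixed import does conceptually: each variable occupies a known number of characters in every record. The record layout (a 6-character ID, a 10-character name, and a 5-character value) is hypothetical, and the function is an illustration rather than STATISTICA's importer.

```python
def parse_fixed_width(line, widths):
    """Split one record into fields of the given fixed widths, trimming padding."""
    fields = []
    start = 0
    for width in widths:
        fields.append(line[start:start + width].strip())
        start += width
    return fields

# Hypothetical layout: 6-character ID, 10-character name, 5-character value.
record = "A00017Widget    12.50"
print(parse_fixed_width(record, [6, 10, 5]))
```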

Data Visualization

Several new options have been added to provide additional features and tools for visualizing data.

  • “Orthogonal regression” fit type is now supported in 2D scatterplots
  • Points on graphs can now be annotated
  • New options in compound graphs improve visual appearance by controlling the scaling display
  • A new data file can be created by brushing the points to be included
  • Date and time support was added for “meaningful time intervals” in graph scales
  • Now you can modify the margins of all plots in an original graph (e.g., multi-graph layout)
  • Create Pareto charts more easily
  • We added a new graph type, the parallel coordinate plot, which shows multiple variables, side-by-side, on comparable scales, thus making it easier to compare values across variables (see below).
    an example of the new graph type parallel coordinate plot
    Each Y-axis corresponds to a variable in a STATISTICA spreadsheet and can be defined according to standalone values or two-sided values (e.g., range boundaries, upper and lower limits, etc.)


False Discovery Rate

False Discovery Rate (FDR) and Qvalues were added. FDR performs the Benjamini and Hochberg method, and Qvalues performs the method described in the 2002 Storey paper.
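As a rough illustration of the Benjamini and Hochberg step-up procedure, the Python sketch below computes adjusted p-values: the p-values are sorted, each is scaled by m/rank, and a running minimum from the largest rank downward keeps the adjusted values monotone. The function name is an assumption, and this is a textbook version rather than the STATISTICA implementation.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p-value
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

A test is then declared significant at FDR level q when its adjusted p-value is at most q.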

New Distributions

New distributions were added to the Probability Distribution Calculator, STATISTICA Visual Basic functions, and spreadsheet functions. These are for hypergeometric distributions (inverse, cumulative, prob) and the inverse Poisson and inverse binomial distributions.
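For reference, the hypergeometric probability and cumulative functions can be sketched with Python's standard library. The function and parameter names here are assumptions chosen for the example (population size `N`, number of successes in the population `K`, number of draws `n`); they are not the names of the STATISTICA spreadsheet or Visual Basic functions.

```python
from math import comb

def hypergeom_pmf(k, N, K, n):
    """P(exactly k successes in n draws without replacement); k must be a feasible count."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

def hypergeom_cdf(k, N, K, n):
    """P(at most k successes), summing only over feasible success counts."""
    lowest = max(0, n - (N - K))
    highest = min(k, n, K)
    return sum(hypergeom_pmf(i, N, K, n) for i in range(lowest, highest + 1))

# 10 items, 3 of them defective, 4 drawn: chance of exactly one defective item.
print(hypergeom_pmf(1, 10, 3, 4))
print(hypergeom_cdf(1, 10, 3, 4))
```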

Stepwise Model Builder (STATISTICA Advanced)

Stepwise Model Builder provides control over model building and gives the modeler a “what-if” environment. This is useful when regulation or a company’s standard practices limit which variables can be used to build models. For example, a bank cannot discriminate based on age or gender.

Negative Binomial Distribution (STATISTICA Advanced)

This new option is available within GLZ. It enables you to specify the Negative Binomial as the distribution for the response variable. This specific form is referred to as the Poisson-Gamma mixture form and is the discrete analog to the continuous gamma distribution.

Quality Control Charts (STATISTICA Quality Control)

Quality Control now includes options that can set the background color for in control, out of control, and out of warning lines on quality control graphs.

Full Factorial Analytic Option within the DOE Module (STATISTICA Quality Control)

Enhanced functionality allows for the generation of a full factorial design where the number of factors can vary from 2 to 1,000, and each factor can have up to 1,000 levels. The factor type can be categorical or continuous.
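A full factorial design is simply every combination of every factor level, which the short Python sketch below enumerates. The factor names and levels are made up for the example, and the function is an illustration, not the DOE module's generator.

```python
from itertools import product

def full_factorial(factors):
    """Return one run (a dict of factor -> level) for every combination of levels."""
    names = list(factors)
    return [dict(zip(names, combo)) for combo in product(*(factors[name] for name in names))]

# Hypothetical design: one continuous factor at 2 levels, one categorical factor at 3 levels.
runs = full_factorial({"temperature": [150, 200], "catalyst": ["A", "B", "C"]})
print(len(runs))  # 2 * 3 runs
for run in runs:
    print(run)
```

The number of runs is the product of the level counts, so designs grow very quickly as factors and levels are added.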


Microsoft Office 2010 Style Toolbars

STATISTICA now uses the Office 2010 style toolbars. The Help menu has been moved to the File tab.

Search Facility

Now you can search for modules by name, select a module, and start it. This feature indexes all available ribbon bar options and displays them alphabetically. Typing in the search box will start restricting the list to those entries that match any of the words from the ribbon bar option. Pressing ENTER will open the selected module’s dialog box.

High Resolution DPI 120 Supported

Starting with the release of Microsoft Vista and the greater availability of very high resolution monitors, Microsoft made it much easier to change DPI. And for Windows 7, themes come with a default of DPI 120 for high resolution.  This resolution is now supported with STATISTICA.

Data Miner Workspace Enhancements

The Workspace has been upgraded to include a large number of new features to improve usability and performance, especially with respect to handling very large data sets.

an example of the new Data Miner workspace

A new system of nodes has been introduced with enhancements of the user interface to closely resemble the user interface in the respective modules. The previous nodes are still offered and supported for backwards compatibility.

Enhanced Ability to Import Excel files

STATISTICA now has the ability to import Excel files using the nomenclature of Excel spreadsheets: letters for columns and numbers for cases.

an example of the new import process for excel
This functionality is not only available interactively, but is also translated to the Workspace utilizing the new Import Excel node.

an example of the new import process for excel

You can use this node to import Excel data directly from a spreadsheet into a Workspace.

Analytic Enhancements

Data Health Check

The Data Health Check node is new in STATISTICA 12 and is available to all STATISTICA Data Miner users. This node detects common data issues for each variable, completes basic data cleaning, and generates a report that can be used in deciding how to further clean the data. The Data Health Check node is especially useful for exploring a large number of variables automatically.

Construction of Trees, Sensitivity Analysis

This new “sensitivity” option enables you to learn more detail about a specific node. You can then use this knowledge to redefine the splits of the proposed tree in an expert way.

Ordered Twoing Criterion

This is an option to treat categorical dependent variables in order. It is useful when categories represent levels (low, medium, high).

Predictor Screening

This is a new method for analyzing predictors that was added to Feature Selection. This functionality can be used as a quick, first look at a predictor to provide a basic set of statistics.

Data Access Enhancements

Teradata Code Deployment (STATISTICA Data Miner with Code Generator)

User-defined functions can now be defined for the Teradata database, which allows for in-database scoring.

Enterprise Workspace Enhancements

The Workspace has been upgraded to include a large number of new features to improve usability and performance, especially with respect to handling very large data sets.

an example of the new enterprise workspace

A new system of nodes has been introduced with enhancements of the user interface to closely resemble the user interface in the respective modules. The previous nodes are still offered and supported for backwards compatibility.

Enhanced Ability to Import Excel files

STATISTICA now has the ability to import Excel files using the nomenclature of Excel spreadsheets: letters for columns and numbers for cases.

an example of the new import process for excel
This functionality is not only available interactively, but is also translated to the Workspace utilizing the new Import Excel node.

an example of the new import process for excel

You can use this node to import Excel data directly from a spreadsheet into a Workspace.

Analytic Enhancements

Data Health Check

The Data Health Check node is new in STATISTICA 12 and is available to all STATISTICA Enterprise users. This node detects common data issues for each variable, completes basic data cleaning, and generates a report that can be used in deciding how to further clean the data. The Data Health Check node is especially useful for exploring a large number of variables automatically.


A new enhancement is the selection of spreadsheet cells into dynamic tags, which allows inserting the value of a particular cell into the text of a report and can be used for both text (including paragraph text strings) and numeric values.

Individual workbook items can be specified as dynamic tags, making it possible for these items to be included in reports.

Additionally, STATISTICA now supports an expanded list of keyword tags, including workflow name, SDMS version numbers, and more.

Quality Control Charts

STATISTICA Enterprise now supports full color and pattern control for the elements of QC charts, in the same manner that these options are supported in the interactive usage of STATISTICA. These controls are accessible from inside the Enterprise Manager application.

Data Access Enhancements

SVB Data Configurations

With SVB Data Configurations, you can access nontraditional databases that don’t have an ODBC or OLE DB provider. As an example, a large text file can be thought of as a database if someone wants to obtain its data. As a text file, however, it does not have an ODBC or OLE DB provider. But with an SVB Data Configuration, it is possible to access this text file as a database and make its data available to STATISTICA. If you want to execute different queries based on predetermined conditions, those conditions can also be coded into the SVB Data Configuration.

General Document Store

Files can now be saved/opened within the Enterprise System View, so STATISTICA documents and other document types can be stored within Enterprise Manager and shared among users outside a file share. The Enterprise System View is the default destination for saving reports. Additionally, standard STATISTICA Enterprise permissions and SDMS versioning are supported.

SVB and SVX code can be stored within Enterprise using the General Document store. Now all the places in Enterprise that use SVB can reference the stored code; changing the code in one place can simultaneously implement that change in SVB Analysis Configurations, SVB Data Configurations, Workspace node code, and Secondary SVB Programs within Enterprise.

Browser Support (STATISTICA Enterprise Server)

Support is provided for all mainstream browsers: Internet Explorer, Chrome, Firefox, Safari, and Opera. This makes it possible for you to use STATISTICA Enterprise Server from your iPad or laptop.

Workbook Supported (STATISTICA Enterprise Server)

Workbooks can now be shared easily with others through the STATISTICA Enterprise Server Portal. After a file is published, a Download from Server link (URL) will be provided.

Versioning Support (STATISTICA Enterprise Compliance Edition)

STATISTICA Enterprise Compliance Edition is an integration of STATISTICA Enterprise with a highly scalable document management system that enables you to securely manage documents of any kind, and it is designed to ensure compliance with FDA 21 CFR Part 11 regulations, Sarbanes-Oxley legislation, as well as ISO 9000, 9001, and 14001 documentation requirements. New functionality provides for easy version comparison and opening of previous versions of documents.

Version Comparison

Now when SDMS integration is enabled, you can compare different versions of SDMS objects in Enterprise Manager. Each versionable Enterprise object will have a text representation:

  • Data Configuration – list of query, data types, and OLE DB column properties
  • IQC Analysis Configuration – summary of QC settings/parameters
  • SVB Analysis Configuration – SVB text and properties
  • Rules object – text representation of rules
  • PMML object – PMML representation of model
  • Workflow – text detailing all contained nodes and parameters

Open Previous Version

For those versionable objects that can be opened directly in Enterprise, including Workspaces, PMML, and Rules objects, STATISTICA will allow a specified previous version of the object to be opened as a read-only object.

Labels (STATISTICA Web Data Entry)

Labels are used with the Data Entry product. Labels can now be stored in one or more system folders. Customers will find it easier to manage Labels with this new option.

Calibration Tests

Calibration Tests is a tool that makes it possible to compare the forecast probability of default (PD) with the realized PD that eventually occurs.
A typical use case in financial institutions is to divide customers into segments of like customers, realizing that each separate segment will have a certain number of customers who meet credit obligations and a certain number who will not. Based upon the model the financial institution has agreed upon, each segment has a forecast PD. After the model has been used for a period of time, the accuracy of the model must be tested. Performing such tests is very easy in STATISTICA, which even includes a built-in “traffic light approach” described in a popular reference on guidelines in credit risk management (Oesterreichische Nationalbank, 2004).


STATISTICA Scorecard is now integrated with STATISTICA Decisioning Platform. This tool can now generate rules for batch scoring or live scoring.

Versioning Support

STATISTICA Compliance Edition is an integration of STATISTICA with a highly scalable document management system that enables you to securely manage documents of any kind, and it is designed to ensure compliance with FDA 21 CFR Part 11 regulations, Sarbanes-Oxley legislation, as well as ISO 9000, 9001, and 14001 documentation requirements. New functionality provides for easy version comparison and opening of previous versions of documents.

Version Comparison

Now when SDMS integration is enabled, you can compare different versions of SDMS objects. Each versionable object will have a text representation:

  • Data Configuration – list of query, data types, and OLE DB column properties
  • IQC Analysis Configuration – summary of QC settings/parameters
  • SVB Analysis Configuration – SVB text and properties
  • Rules object – text representation of rules
  • PMML object – PMML representation of model
  • Workflow – text detailing all contained nodes and parameters

Weight of Evidence

This new product is important to anyone engaged in binary prediction (yes/no). This tool automates the time-consuming task of binning predictors.

an example of the new weight of evidence feature

Two methods are used:

  • Optimal
  • Interpreted (e.g., observed risk of prediction probability)
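For readers new to the term, weight of evidence (WoE) is commonly defined per bin as the natural log of the bin's share of "good" outcomes divided by its share of "bad" outcomes. The Python sketch below applies that common definition to hypothetical bin counts; the sign convention (good over bad) and the function name are assumptions, and the sketch does not show how STATISTICA chooses the bins.

```python
from math import log

def weight_of_evidence(bins):
    """bins: list of (n_good, n_bad) counts per bin; returns one WoE value per bin.

    Assumes every bin has at least one good and one bad case; empty cells
    would need smoothing, which this sketch leaves out.
    """
    total_good = sum(good for good, _ in bins)
    total_bad = sum(bad for _, bad in bins)
    return [log((good / total_good) / (bad / total_bad)) for good, bad in bins]

# Hypothetical predictor split into two bins.
print(weight_of_evidence([(80, 20), (20, 80)]))
```

Positive values mark bins where good outcomes are over-represented, and negative values mark bins where bad outcomes dominate.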

Rules Builder

Every organization has rules that govern its behavior. Consistently applying these rules to analytic projects or reports is a common challenge. Rules Builder solves this problem.

Business users, developers, or modelers find it easy to create, maintain, share, and re-use sets of rules. A “rule set” for data transformation could be created and then used by one or thousands of analytic projects. Role-based security controls access to these rules.

an example of the new rules builder dialogue

Rules Builder has the ability to conditionally execute models with pre-scoring segment rules and then apply post-scoring policy rules. Rules can retrieve reason codes for individual predictions, which can be critical for many industries, such as banking or insurance. For example, banks are required to state why a loan application was denied.

The execution of rules can be visually traced with sample data to aid in troubleshooting complex scenarios.

STATISTICA Reporting Tables

Businesses are challenged to:

  • Summarize large amounts of data into formats that are easily understood
  • Easily emphasize particular data segments (e.g., only report on Oklahoma and France)

an example of the new reporting tables

STATISTICA Reporting Tables automatically sorts and summarizes data based on specifications made while developing the table. The tables are generated interactively by visually dragging and dropping variables into the appropriate four sections of the Reporting Tables dialog box (Layers, Column Label, Row Label, and Sigma). As the tables are customized, they can be previewed, and final results can be generated with the click of a button.

an example of the new reporting tables

Options are available for processing Multiple Response Categories, Crosstable Groups, and Conditional Formatting.

Malaria – Bite Back #MalariaStatistics #Statistics #Health

Written by Win Noren

Today, Thursday, April 25, is World Malaria Day. The statistics surrounding malaria are a bit overwhelming, and for those of us who live in the United States it is a problem that doesn’t impact us personally unless we travel to a malaria risk area. So it is easy not even to be aware of the problem this disease causes for millions around the world.

How can I make sense of those numbers? And why isn’t this something that makes our news?

Well, I don’t really have answers to those questions, but I do know that there are simple ways that we can help “bite back” at this terrible disease that are easy and cost so little that I will hardly notice an impact in my life but those who benefit will feel a huge impact. There are dozens of organizations who are working to combat malaria including the World Health Organization and a number of NGOs operating in Africa (where malaria is the most endemic) but also in other at-risk areas.

One of the most cost-effective interventions is education. When caregivers understand what causes malaria, they can “bite back” by removing standing water from around their homes, planting trees that are natural mosquito deterrents, and making sure that everyone sleeps under a treated mosquito net.


A long-lasting treated bed net is a crucial piece in helping families combat malaria. According to Chris Helfrich, director of the United Nations Foundation’s Nothing But Nets, “Bed nets are still one of the simplest, most cost-effective tools in the fight against malaria.”

Using these methods, an NGO operating in Haiti, Compassion International, saw a reduction in malaria infections so that only one case of malaria was reported among their beneficiaries in the last fiscal year (2011-2012). This is remarkable, especially when seen in contrast to the fact that malaria is the third-leading cause of death in children under 5 in Haiti.

According to Compassion International, two mosquito bed nets can be provided for only $20 which is as easy as watching a DVD at home rather than going out with my husband to see a movie this weekend.

Root Cause Analysis

Importance plot for Root Cause Analysis

The term root cause analysis is commonly used in manufacturing to summarize the activities involved in determining the variables or factors that impact the final quality or yield of the respective processes. For example, if a particular pattern of defects emerges in the manufacture of silicon chips, engineers will pursue various methods and strategies for root cause analysis to determine the ultimate causes of those patterns of quality problems.

One method of root cause analysis is variable screening (also called feature selection), where analytic tools are used to find the variables most highly associated with the quality issue.  Interaction terms between these variables can also be part of the root cause. The analysis will yield a list of variables that are the best predictors of the quality issues. These variables can be explored further to gain additional insight.

Design terms

The columns of the design matrix (design terms) for interaction effects are created as follows:
Continuous-by-continuous predictor interactions – A single column is created in the design matrix for each product of the continuous predictor columns.
Continuous-by-categorical predictor interactions – First, the number of unique values (classes) in the categorical predictor is determined. As many columns as there are unique values in the categorical predictors are generated. For each column j of the k columns (unique values), a 1 is generated if the respective observation belongs to class j, and a 0 otherwise. Each column (with the 0/1 indicator codes) is then multiplied by the continuous predictor variable. Hence, for continuous-by-categorical predictor interactions, the program will generate as many columns in the design matrix as there are unique values in the categorical predictor.
Categorical-by-categorical predictor interactions – The unique combinations of groups or classes are enumerated into a single column in the design matrix. For example, the interaction between two categorical predictors with two unique values (classes) each would result in a single column with (2*2 =) 4 values. Note that these coded columns in the design matrix are technically “confounded” with the main effects. In other words, if one of the categorical predictors is strongly related to the dependent variable in the analysis, it is likely that some of the interactions with other categorical predictors will show strong relationships with the dependent variable as well.
Higher-order interactions (e.g., three-way interactions) are created accordingly, i.e., they are generated as the products of continuous and categorical predictors following the rules outlined above. For example, a three-way interaction column would be generated by multiplying a two-way interaction with another effect.
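The three coding rules above can be sketched in plain Python. The variable names and data are hypothetical, and the functions only mirror the description given here (product column, indicator-times-predictor columns, and a single enumerated column); they are not the program's internal code.

```python
def continuous_by_continuous(x1, x2):
    """One column: the elementwise product of two continuous predictors."""
    return [a * b for a, b in zip(x1, x2)]

def continuous_by_categorical(x, cat):
    """One column per class: a 0/1 indicator multiplied by the continuous predictor."""
    classes = sorted(set(cat))
    return {c: [xi * (1 if ci == c else 0) for xi, ci in zip(x, cat)] for c in classes}

def categorical_by_categorical(cat1, cat2):
    """A single column whose values enumerate the combinations of classes."""
    return [(a, b) for a, b in zip(cat1, cat2)]

# Hypothetical process data.
temperature = [1.5, 2.0, 3.0]
pressure = [2.0, 4.0, 1.0]
machine = ["A", "B", "A"]
shift = ["day", "day", "night"]

print(continuous_by_continuous(temperature, pressure))
print(continuous_by_categorical(temperature, machine))
print(categorical_by_categorical(machine, shift))
```

A three-way term would follow the same pattern, for example by passing the output of `continuous_by_continuous` into `continuous_by_categorical`.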

Variable Screening

Options for variable screening enable you to screen predictor variables for regression and classification problems as well as the methods that can be used to find the predictors that are important. In general, predictor statistics can be computed by the respective method, and then predictors can be ranked based on the method-specific measure of predictor importance. The following methods may be appropriate:
Linear model. A linear fit model using stepwise selection of predictors is a simple approach to the regression problem. Predictor importance is computed by ranking the p-values for each predictor effect. For tied p-values, the rankings are based on the ranking of the F-values. For classification tasks, a stepwise linear discriminant function analysis can be used. Predictor importance is computed by ranking the values of the Wilks’ lambda statistics for each predictor.
Classification and regression trees. For classification and regression trees, the standard rankings for predictor importance are used.
Boosted trees. For boosted trees models (stochastic gradient boosting), the standard rankings for predictor importance are used.
MARSplines. For multivariate regression splines (MARSplines), rankings are computed based on the number of times that each predictor was used (referenced) in a basis function. The more frequently a predictor was used (referenced by a basis function), the greater is its importance.
Neural networks. For neural networks, the final importance rankings for the predictors are computed by averaging the importance rankings for each predictor over a set of networks.
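As a small illustration of the linear-model ranking described above, the Python sketch below orders predictors by p-value and breaks ties using the F-values. The predictor names and statistics are invented, and treating a larger F-value as the more important one among tied p-values is an assumption of this sketch.

```python
def rank_predictors(stats):
    """stats: dict of predictor -> (p_value, f_value). Returns names, most important first.

    Smaller p-values rank first; among tied p-values, the larger F-value
    ranks first (an assumption made for this illustration).
    """
    return sorted(stats, key=lambda name: (stats[name][0], -stats[name][1]))

# Hypothetical screening results.
screening = {
    "temperature": (0.001, 25.3),
    "pressure": (0.040, 4.4),
    "humidity": (0.001, 31.0),
}
print(rank_predictors(screening))
```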

STATISTICA Quality Control #QualityControl #QC #Statistica #Statistics #Software

STATISTICA Quality Control Charts features a wide selection of quality control analysis techniques with presentation-quality charts of unmatched versatility and comprehensiveness. It is ideally suited to both automated shop-floor quality control systems of all types and levels of complexity (see also STATISTICA Enterprise-wide Systems) and sophisticated analytic and quality improvement research. A selection of automation options and user-interface shortcuts simplifies routine work, and practically all of the numerous graph layout options and specifications can be permanently modified (saved as system default settings or as reusable templates). Finally, STATISTICA Quality Control Charts includes powerful and easy-to-use facilities to custom design entirely new analytic procedures and add them permanently to the application; these options are particularly useful when quality control analyses need to be integrated into existing data collection/monitoring systems.

It includes the following features:

  • Standard quality control charts
  • Multivariate charts
  • Interactive, analytic brushing and labeling of points
  • Assigning causes and actions
  • Flexible, customizable alarm notification system
  • Supervisor and operator mode; password protection
  • Organization of data
  • Short run charts
  • Chart options and statistics
  • Non-normal control limits and process capability and performance indices
  • Other plots and Spreadsheets
  • Real-time QC systems; external data sources




Standard Quality Control Charts

Quality Control Charting

The program offers flexible implementations of Pareto charts, X-bar charts, R charts, S charts, S-squared (variance) charts, C charts, Np charts (binomial counts), P charts (binomial proportions), U charts, CuSum (cumulative sum) charts, moving range charts, runs charts (for individual observations), regression control charts, MA charts (moving average), and EWMA charts (exponentially-weighted moving average). These charts may be based on user-specified values or on parameters (e.g., means, ranges, proportions, etc.) computed from the data. Most of the variable control charts can be constructed from single observations (e.g., moving range chart) as well as from samples of multiple observations. Control limits can be specified in terms of multiples of sigma (e.g., 3 * sigma), in terms of normal or non-normal (Johnson-curves) probabilities (e.g., p=.01, .99), or as constant values. For unequal sample sizes, control charts can be computed with variable control limits or based on standardized values. For most charts, multiple sets of specifications can be used in the same chart (e.g., control limits for all new samples can be computed based on a subset of previous samples, etc.). Runs tests, such as the Western Electric Run Rules, are easily integrated into the QC chart. As with all STATISTICA graphs, QC charts in STATISTICA Quality Control Charts are highly customizable; you can add titles, comments, draw lines or mark regions dynamically anchored to specific scale values, or label the samples with dates, ID codes, etc.
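To make the "multiples of sigma" idea concrete, here is a minimal Python sketch of X-bar chart limits computed from user-specified values: the limits sit k standard errors (sigma divided by the square root of the sample size) on either side of the center line. The function names and numbers are hypothetical, and the sketch covers only this one way of setting limits, not the probability-based or data-estimated options mentioned above.

```python
from math import sqrt

def xbar_limits(center, sigma, n, k=3.0):
    """Lower and upper control limits for sample means of size n, at k * sigma."""
    half_width = k * sigma / sqrt(n)
    return center - half_width, center + half_width

def out_of_control(sample_means, lcl, ucl):
    """Indices of sample means that fall outside the control limits."""
    return [i for i, mean in enumerate(sample_means) if mean < lcl or mean > ucl]

# Hypothetical process: target 10, process sigma 2, samples of 4 measurements.
lcl, ucl = xbar_limits(center=10.0, sigma=2.0, n=4)
print(lcl, ucl)
print(out_of_control([9.5, 13.4, 10.1, 6.2], lcl, ucl))
```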

Multivariate charts

In addition to the univariate (standard Shewhart) control charts, STATISTICA extends the control charting options with multivariate charts. These multivariate charts are useful for tracking large numbers of parameters (variables) in a single chart. The capability exists to “intelligently” monitor literally hundreds of processes simultaneously. Available charts include:

  • Hotelling T² chart for individual observations and sample means
  • Multivariate Exponentially Weighted Moving Average charts (MEWMA) for observations and sample means
  • Multivariate Cumulative Sum Charts (MCUSUM) for observations
  • Multiple Stream X-Bar and R charts, MR charts, and S charts for observations and sample means
  • Generalized Variance Charts

Similar to the standard charts, many of the same tools exist for their multivariate counterparts.

Interactive, analytic brushing and labeling of points

General “intelligent” and comprehensive analytic brushing facilities are available for interactive removal or labeling of outliers (or what-if analyses) in individual charts or sets of charts. The user can select individual samples or groups of samples based on currently specified chart criteria (control limits, runs rules), and exclude them from the computations for the chart (but still show them in the chart), or drop them from the chart altogether. Multiple charts can be set up to use the same sample inclusion/exclusion criteria; in this manner several charts can be simultaneously brushed (e.g., a point excluded from the X-bar and R chart will simultaneously be excluded from all histograms). The user can also request to plot all individual observations for selected or for all samples.

Assigning causes and actions

The user can assign causes, actions, and/or comments to outliers or any other points in most charts. Labels for causes and actions can be assigned via interactive brushing, or the program can detect and select out-of-control samples.
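Automatic detection of out-of-control samples typically relies on runs rules of the kind mentioned earlier. The following sketch implements two of the commonly cited Western Electric rules: a single point beyond three sigma, and eight consecutive points on the same side of the center line. The function names, data, and the choice of these two rules are assumptions for illustration; the exact set of tests the program applies is not specified here.

```python
def rule_beyond_3_sigma(values, center, sigma):
    """Indices of points farther than 3 sigma from the center line."""
    return [i for i, v in enumerate(values) if abs(v - center) > 3 * sigma]

def rule_run_same_side(values, center, run_length=8):
    """Indices at which a run of run_length or more points on one side is in progress."""
    flagged, run, last_side = [], 0, 0
    for i, v in enumerate(values):
        side = (v > center) - (v < center)  # +1 above, -1 below, 0 on the line
        if side != 0 and side == last_side:
            run += 1
        else:
            run = 1 if side != 0 else 0
        last_side = side
        if run >= run_length:
            flagged.append(i)
    return flagged

# Made-up sample means with an assumed center line of 10.0 and sigma of 1.0.
values = [10.1] * 7 + [9.9] + [10.2] * 8 + [13.5]
beyond = rule_beyond_3_sigma(values, center=10.0, sigma=1.0)
runs = rule_run_same_side(values, center=10.0)
print("Beyond 3 sigma:", beyond)
print("Run of 8 on one side:", runs)
```

The flagged indices are the natural candidates for assigning a cause or an action.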

Flexible, customizable alarm notification system

A comprehensive selection of options is provided for specifying user-defined criteria that define an out-of-control condition or “noteworthy event” (e.g., runs test violation, individual observation outside specification limits, etc.). The alarm notification system can be customized to trigger various types of “responses” to a particular event. For example, you can set up the system so that, in response to an out-of-control sample, STATISTICA Quality Control Charts automatically prompts the operator to enter a cause, then launches a STATISTICA Visual Basic program to compute various other statistics or invoke an external program, and then runs another external program to (for example) call a particular pager number or send an e-mail to the supervising engineers. The alarm notifications setup can be saved in a configuration file (that can be applied to future charts), or used as the default for all future charts.

Supervisor and operator mode; password protection

The chart-editing features for shop-floor control charts (including the assignment of causes, actions, brushing, alarm notification, etc.), chart specifications, as well as the input data file can be password-protected, to create a customized operator mode with only limited access to the charts or data. The charts can be saved (e.g., by the supervising engineer), and loaded by the operator in this limited-access operator mode.

Organization of data

For most charts, the data can be organized to accommodate practically all formats in which data are gathered for quality control applications. Samples can be identified by sample identifiers or code numbers, or you can specify a fixed number of measurements per sample (and part, see below).

Short run control charts

Most standard variable control charts (X-bar, R, S, S-squared, MA, EWMA) and attribute control charts (C, U, P, Np) can be used for short production runs (short run charts for multiple parts or machines). For short run variable control charts, you can specify nominal target values only (nominal chart or target chart), or target values and variability values for standardized short run charts. Options are provided for sorting sample points in the respective charts and for plotting them by sample number, by part, or in the order in which the respective samples were taken. Detailed statistics are computed by parts and samples. The respective sample and part identifiers for each measurement can be specified in the data file, and/or you can choose to assign a fixed number of consecutive cases to consecutive samples and/or parts. Note that all chart options and statistics (e.g., process capability and performance indices, runs rules, etc.) commonly reported for standard charts are also available for short run charts.
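The difference between a nominal (target) chart and a standardized short run chart can be shown in a few lines. In this sketch, which uses hypothetical part names, targets, and variability values, each measurement is either expressed as a deviation from its part's target, or additionally divided by the part's variability so that different parts share one scale with limits at roughly plus or minus 3.

```python
# Hypothetical per-part target and variability values (illustrative only).
PARTS = {
    "A": {"target": 50.0, "sigma": 2.0},
    "B": {"target": 120.0, "sigma": 5.0},
}

def nominal_deviation(part, value):
    """Value plotted on a nominal (target) chart: deviation from the part's target."""
    return value - PARTS[part]["target"]

def standardized_value(part, value):
    """Value plotted on a standardized short run chart: deviation in sigma units."""
    spec = PARTS[part]
    return (value - spec["target"]) / spec["sigma"]

# Measurements in production order, alternating between two parts.
measurements = [("A", 52.0), ("B", 115.0), ("A", 49.0), ("B", 127.5)]
nominal = [nominal_deviation(p, v) for p, v in measurements]
standardized = [standardized_value(p, v) for p, v in measurements]
print("Nominal chart values:     ", nominal)
print("Standardized chart values:", standardized)
```

The nominal chart is only appropriate when the parts have similar variability; the standardized form removes that restriction.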

Chart options and statistics

A wide variety of additional quality control statistics are included. The user can compute the process capability and performance indices (e.g., normal distribution Cpk, Ppk, etc., non-normal distribution Cpk, Ppk, etc.), include histograms of the respective quality characteristics, or automatically perform any or all of seven different runs tests (runs rules). The standard variable control charts can be produced as compound graphic displays; for example, the X-bar and the R (or S, or S-squared) chart will be displayed together with optional corresponding histograms for the respective means, ranges, proportions, etc. also shown in the same chart. Outliers (samples outside the control limits) or sections of data identified via runs tests are automatically highlighted (marked) in the plots. The user can also add to the plot warning lines, moving average or exponentially-weighted moving average lines, or lines indicating specification ranges.
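For readers unfamiliar with the indices named above, the sketch below shows the conventional normal-distribution definitions of Cpk and Ppk. Both compare the distance from the process mean to the nearer specification limit with three standard deviations; Cpk uses a within-sample sigma (estimated here from the average range and the standard d2 constant for samples of five), while Ppk uses the overall standard deviation. The data, specification limits, and function name are made up, and the non-normal variants are not covered.

```python
import statistics

D2_N5 = 2.326  # standard d2 constant for samples of size 5

def capability_indices(samples, lsl, usl):
    """Return (Cpk, Ppk) under the usual normal-distribution definitions."""
    all_values = [x for s in samples for x in s]
    mean = statistics.fmean(all_values)
    mean_range = statistics.fmean([max(s) - min(s) for s in samples])
    sigma_within = mean_range / D2_N5              # short-term variation
    sigma_overall = statistics.stdev(all_values)   # long-term variation
    nearest = min(usl - mean, mean - lsl)
    return nearest / (3 * sigma_within), nearest / (3 * sigma_overall)

# Made-up measurements and specification limits.
samples = [
    [10.1, 9.9, 10.0, 10.2, 9.8],
    [10.0, 10.1, 9.9, 10.0, 10.0],
    [9.8, 10.2, 10.1, 9.9, 10.0],
    [10.3, 10.0, 9.9, 10.1, 9.7],
]
cpk, ppk = capability_indices(samples, lsl=9.4, usl=10.6)
print(f"Cpk = {cpk:.3f}, Ppk = {ppk:.3f}")
```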

Non-normal control limits and process capability and performance indices

For variable control charts, in addition to the customary normal distribution based charts and statistics, the program will also compute charts for measurements that are not normally distributed (e.g., are highly skewed). These options are particularly important for situations where the sample sizes are small and where deviations from normality may lead to greatly inflated or deflated error rates if the customary normal distribution based statistics were used. The program will compute control limits based on the Johnson curves fit to the first four moments of the observed data; user-specified values for the moments can also be supplied. Process capability indices can be computed based on the fitting of Johnson curves as well as Pearson curves. Note that capability indices based on specific distributions can also be computed in STATISTICA Process Analysis.

Other plots and Spreadsheets

For most charts (including the R-chart), the user may compute and plot the respective operating characteristic curve (OC curve). In addition to the charts, the respective values (plotted in the charts) can also be reviewed via Spreadsheets, allowing the user to examine the precise values of plotted lines and points. Customized (blank) charts can be printed that can later be “filled in” by hand by the quality control engineer. Note that as with all other graphs in STATISTICA, the graphs produced by STATISTICA Quality Control Charts can be extensively customized and saved for further analysis and/or customization.
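An operating characteristic curve shows the probability that a chart fails to signal when the process has actually shifted. As a minimal sketch for the X-bar chart only, assuming normally distributed measurements and limits at plus or minus 3 standard errors, that probability for a shift of k process standard deviations with sample size n is Φ(3 − k√n) − Φ(−3 − k√n). The function name and parameters are illustrative; curves for other chart types (such as the R chart) require different formulas.

```python
from math import sqrt
from statistics import NormalDist

def xbar_oc(shift_in_sigmas, n, limit_width=3.0):
    """Probability that one sample mean stays inside the limits after a mean shift."""
    z = NormalDist()  # standard normal distribution
    k = shift_in_sigmas * sqrt(n)
    return z.cdf(limit_width - k) - z.cdf(-limit_width - k)

# Tabulate the curve for shifts of 0 to 3 sigma in steps of 0.5, with n = 5.
oc_curve = [(step / 2, xbar_oc(step / 2, n=5)) for step in range(7)]
for shift, beta in oc_curve:
    print(f"shift = {shift:.1f} sigma  P(no signal) = {beta:.4f}")
```

Larger samples make the curve drop faster, which is the usual argument for increasing the sample size when small shifts matter.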

Real-time QC systems; external data sources

Most graphs and charts in STATISTICA Quality Control Charts can be automatically linked to the data, and updated when the data are updated. To facilitate data transfers, powerful (optional) STATISTICA applications are available (STATISTICA Enterprise/QC and STATISTICA Enterprise).

STATISTICA Enterprise is a groupware version of STATISTICA fully integrated with a powerful central data warehouse that provides an efficient general interface to enterprise-wide repositories of data and a means for collaborative work (extensive groupware functionality).

STATISTICA Enterprise/QC is an integrated multi-user software package that provides complete statistical process control (SPC) functionality for enterprise installations. STATISTICA Enterprise/QC includes a central database and provides all tools necessary to process and manage data from multiple channels and to coordinate the work of multiple operators, QC engineers, and supervisors.

STATISTICA Enterprise/QC and STATISTICA Enterprise provide very flexible facilities to integrate the procedures in STATISTICA Quality Control Charts into your enterprise-wide database, and to design elaborate company-wide quality monitoring systems.


System Requirements



STATISTICA Quality Control is compatible with Windows XP, Windows Vista, and Windows 7.

Minimum System Requirements

  • Operating System: Windows XP or above
  • RAM: 256 MB
  • Processor Speed: 500 MHz

Recommended System Requirements

  • Operating System: Windows XP or above
  • RAM: 1 GB
  • Processor Speed: 2.0 GHz

Native 64-bit versions and highly optimized multiprocessor versions are available.

Contact for more information.

Power Solutions – Improve Efficiency and Performance of Your Equipment #Power #Solutions #Software #Statistics

PPPR Screenshot, Scatterplot, Multiple Variables vs. Time

Improve Efficiency and Performance of Your Equipment

Problem: Optimize performance and reliability of ongoing operations; stabilize and improve flame temperatures of an 85 MW coal-burning multicyclone unit.

Solution: Apply StatSoft’s proprietary data-driven (data mining) methodologies to consistently increase flame temperatures under a variety of loads.

Results: Flame temperatures increased consistently across all cyclone burners, leading to more reliable operations.

Note: Even though the flame temperatures had been within satisfactory limits, StatSoft’s optimized control settings improved temperatures further and beyond historical values.

Power Solutions – Increase Flame Temperatures

Optimize Operations

Increase Flame Temperatures

Problem: Optimization of a coal burning 300 MW multi-cyclone unit for consistent high flame temperatures; increase the flame temperatures to avoid forming slag, burning fuel oil, etc.

Solution: Analyze twelve months of three-minute historical data using StatSoft’s proprietary data-driven (data mining) methodologies; identify optimized control parameter settings for Stoichiometric Ratios (S.R.), Coal flows, Primary Air, Tertiary Air, Split Secondary Air Damper Flows, etc.

Results: After dialing in StatSoft optimized settings, flame temperatures immediately responded (strongly), resulting in more stable and higher flame temperatures (cleaner combustion).

Note: The flame temperature at some of the cyclones had been abnormally and critically low for several days, requiring the burning of fuel oil (at a substantial cost) and intermittent shut-downs; flame temperatures recovered almost immediately after StatSoft’s optimized control settings were applied.

PPPR Screenshot, Line Graph, Time vs. Flame Temperature

PPPR Screenshot, Standard vs. Optimized Comparison Chart, Optimization of Flame Temperature

Metal & Engineering Industry companies needed.


Nelson Mandela Metropolitan Area companies willing to participate in a Research Study on the relationship between Industrial Relations climate, Employee-Supervisor relations and company performance. We are looking for companies in the Nelson Mandela Metropolitan Area that fall under the auspices of the Metal and Engineering Industry Bargaining Council or Chapter 111 Companies in the Motor Industry Bargaining Council. The companies should have NUMSA as the dominant union and a minimum of one trade union representative.



Labour Relations and Human Resources Unit

Nelson Mandela Metropolitan University

27 41 504 2362

Participate in the 2013 Data Miner Survey #RexerAnalytics

2013 Data Miner Survey:

Thank you for your interest in this 2013 survey of the analytic behaviors, views
and preferences of data mining, data science, and analytic professionals.

Your responses are completely confidential; no information you provide on the
survey will be shared with anyone outside of Rexer Analytics in any way that
identifies an individual respondent.  All reporting of the survey findings will be done
in the aggregate, and any quotes taken from open-end responses will be carefully
scrubbed so no individual identifying information is included.

Rexer Analytics has been conducting the Data Miner Survey since 2007.  Over
1300 people from around the globe participated in the 2011 survey.  Summary
reports (PDFs of about 40 pages) from previous surveys are available FREE to
everyone –– simply email us at  Also,
highlights of earlier Data Miner Surveys are available online, including best
practices shared by respondents on analytic success measurement, overcoming
data mining challenges, and other topics.  The FREE Summary Report for this
2013 Data Miner Survey will be available to everyone in the Fall of 2013.

This research is not being conducted on behalf of any third party, but is solely for
Rexer Analytics to disseminate the findings throughout the data mining
analytic community.

To participate, please click on the link below and enter the access code in the
space provided.  The survey should take approximately 15-20 minutes to complete.

Start Survey:   Rexer Analytics 2013 Data Miner Survey
Access Code:   Please use the code provided to you   
                If you haven’t received an access code, please use code R68L3.

Thank you for participating.

— The Rexer Analytics Data Miner Survey Team

StatSoft South America Presents “Road Show” in Manaus

Responding to requests from several clients, StatSoft South America will present the “South America Road Show” in the Manaus Industrial Pole (PIM) next week, April 15-17.
To demonstrate the business value of STATISTICA Enterprise™ solutions, StatSoft South America’s team will conduct a closed-door “Predictive Analytics Workshop” by invitation only. At this April 16 workshop, they will discuss how predictive analytics positively impacts businesses not only by creating an analytical mindset, but also by taking full advantage of existing investments and IT infrastructure. Attendees will learn the success stories of companies like Delphi, Benteler, the Decathlon, Lenovo and Pepsi. If you would like to attend this workshop, please email StatSoft.
And on April 17, StatSoft South America’s Director of Operations, Nuno António Cruz, will be the featured speaker at the launch of the Business Intelligence MBA program on the Pós-Graduação IDAAM campus in Manaus. Titled, “From Business Intelligence to Intelligent Systems,” his lecture will focus on new trends in predictive modeling. Admission is free but the number of seats is limited.
Located in the Brazilian state of Amazonas, PIM is one of the most important industrial areas in South America. The companies located in this 10K-square-kilometer area enjoy high demand for innovative processes from customers seeking new solutions that can raise the profitability of their businesses.

Centrum – the “not so” clear choice

By Win Noren

We are all familiar with the clichés about statistics. One of my favorites is attributed to Mark Twain: “There are three kinds of lies: lies, damned lies, and statistics.” It is very common for students of college statistics classes to claim that they hate the class. In fact in my undergraduate studies it was the one class that I refused to tutor. Of course in a classic twist of life, I then went on to receive a master’s degree in statistics only two years later.

Statistics (or its more popular current phrase “data analysis”) are all around us. A day does not go by when we are not bombarded by facts and figures that people and companies use to prove their point whether that be enticing us to purchase a product, vote for a candidate, agree with a position, or be a fan of a particular team.

When I taught undergraduate business statistics classes one of the goals I had for my students was that they would leave my class with an appreciation for the power of data analysis and would be more apt to question what was going on “behind the numbers” rather than just accept as “fact” whatever number was quoted in an article or advertisement.

Here is a recent example where you benefit from knowing the truth behind the numbers.

Centrum Silver is promoting as “big news” the fact that “Centrum Silver was part of the recently published landmark study evaluating the long-term benefits of multivitamins.” The study was the Physicians Health Study II. Of course the implied message is that all of us should also take Centrum Silver (well, all of us over the age of 50) since it was the vitamin of choice for the Physicians Health Study.

Started in 1997, the Physicians Health Study II was designed to study the impact of vitamin E, vitamin C, and multivitamins on the prevention of cardiovascular disease, cancer, age-related eye disease, and cognitive decline. The multivitamin component concluded in June 2011. The results were published in various peer-reviewed journals, and the conclusion is that while there was a statistically significant, although modest, reduction in cancer, there was no reduction in the rate of cardiovascular disease such as heart attacks, stroke, or death.

So should you take a daily multivitamin? You need to decide for yourself if the scientifically proven benefits are worth the cost, but that decision should be based on more than the Centrum Silver ad touting their use in the Physicians Health Study.