Monthly Archives: December 2012

Out of Office

To all our valued customers & software users,

StatSoft Southern Africa will be closed from the 22nd of December until the 2nd of January 2013. For emergencies, kindly contact Lorraine on 082 5678 330 or Jason on 084 301 7447.

We wish you all a safe & joyful Festive Season.

From the team at StatSoft Southern Africa.

Season's Greetings


Webinar: Save Customers with Churn Analysis

Companies in all industries face stiff competition for both new and existing customers. Of particular concern is the retention of existing customers who have purchased products or services but who may be considering future purchases from competitors. If you could identify which customers may be considering such a move, you could engage them preemptively and encourage them not to take their business elsewhere. This kind of identification can be achieved through churn analysis, which calculates the rate of customer attrition and identifies which customers may be affected.
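As a rough illustration of the attrition rate mentioned above, here is a minimal Python sketch. It assumes the simplest possible definition (customers active at the start of a period who are no longer active at the end, divided by the starting count). The function name `churn_summary` and the customer IDs are made up for this example and are not part of STATISTICA or the webinar material.

```python
def churn_summary(active_at_start: set[str], active_at_end: set[str]) -> tuple[float, list[str]]:
    """Return (churn rate, churned customer IDs) for one period.

    Assumption: a customer has churned if they were active at the start
    of the period and are not active at the end. New customers acquired
    during the period do not affect the rate.
    """
    if not active_at_start:
        raise ValueError("need at least one customer at the start of the period")
    churned = sorted(active_at_start - active_at_end)
    return len(churned) / len(active_at_start), churned


# Hypothetical customer IDs, for illustration only.
start = {"A", "B", "C", "D"}
end = {"A", "C", "E"}  # "E" is a new customer, "B" and "D" have left
rate, churned = churn_summary(start, end)
print(f"Churn rate: {rate:.0%}, churned customers: {churned}")
```

A real churn analysis goes further than this after-the-fact count: it tries to predict which customers are likely to leave before they do.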

Join us as StatSoft Data Mining Consultant, Dr. Danny Stout, covers topics such as:

* Overview of Churn Analysis
* Review Example Data
* Data Transformation
* Temporal Abstraction
* General Approaches for Churn Analysis

Register FREE at

Meet Paul Lewicki, CEO of StatSoft

Our guest on this edition of ST is Dr. Paul Lewicki, the CEO of StatSoft, a Tulsa-based company (established in 1984 as a partnership of a group of university professors and scientists) that makes business-analytical software, and that now has 30 offices worldwide. To read the rest of the article, click here.

Big Data, Too Much of Anything is Bad

There is an old saying: “Too much of anything is bad.” I guess it applies to data analysis as well.

The first time I worked on a real-world data mining problem, I was given a dataset with millions of cases and thousands of variables. I was trying to predict a target variable from the other variables. I also wanted to find out which variables had a significant impact on the target variable and rank them by their significance.

I wondered how to select the input variables (predictors) for predicting the target variable, since using all of the thousands of input variables makes little sense. Variables known to be unnecessary, as well as those that add only a small amount of information, should be excluded from the analysis to mitigate the curse of dimensionality. Pre-screening input variables can improve both model-building speed and the predictive accuracy of data mining models.

Also, every predictor used by a data mining model must be available when that model is deployed to new cases. If good predictive accuracy is attainable with a smaller set of inputs, deployment is made easier.

Thankfully, there is the STATISTICA Feature Selection and Variable Screening (FSL) module!

The STATISTICA FSL module acts as a pre-processor that selects the top predictors most likely to be related to the outcome variable. Furthermore, it ranks the predictors by their significance, for regression as well as classification-type problems. It uses both F statistics and p-values as criteria for determining predictor importance.
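To give a feel for what ranking predictors by an F statistic can look like, here is a small Python sketch for a classification-type problem. It is not the FSL module's actual implementation; it assumes a plain one-way ANOVA F statistic computed per predictor, and the names `anova_f` and `rank_predictors` and the toy data are invented for this example. Turning each F value into a p-value would additionally need the F distribution, which is left out here.

```python
from statistics import mean


def anova_f(values, labels):
    """One-way ANOVA F statistic for one numeric predictor against class labels."""
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(label, []).append(value)
    n, k = len(values), len(groups)
    if k < 2 or n <= k:
        raise ValueError("need at least two classes and more cases than classes")
    grand_mean = mean(values)
    # Variation of the class means around the overall mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups.values())
    # Variation of the cases around their own class mean.
    ss_within = 0.0
    for g in groups.values():
        group_mean = mean(g)
        ss_within += sum((v - group_mean) ** 2 for v in g)
    if ss_within == 0:
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def rank_predictors(predictors, labels, top=10):
    """Return the `top` predictors as (name, F) pairs, largest F first."""
    scores = {name: anova_f(column, labels) for name, column in predictors.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:top]


# Toy data, for illustration only: "signal" separates the classes, "noise" does not.
labels = ["a", "a", "a", "b", "b", "b"]
predictors = {
    "signal": [1, 2, 3, 7, 8, 9],
    "noise": [1, 5, 9, 2, 5, 8],
}
print(rank_predictors(predictors, labels))
```

A larger F means the class means differ more relative to the spread within each class, so the predictor is ranked higher. With thousands of candidate variables, keeping only the first ten or so entries of this ranking is one simple way to pre-screen inputs.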

Figure 1: Feature Selection and Variable Screening Dialog Box

Figure 2: Selecting top ten predictors

Figure 3: Predictor Importance

Figure 4: Predictor Importance Plot

Figure 5: Predictor Importance Report

Header photo courtesy of United Federation of Planets Deep Space Station K-7 (Alpha Quadrant), originally sourced to Paramount Pictures.

STATISTICA Enterprise: Monitoring Calculated Variables

International Year of Statistics (Statistics2013)

It’s almost here, the celebration we’ve all been waiting for: the International Year of Statistics. StatSoft is excited to announce our participation in this worldwide movement. Dedicated to promoting the importance of statistics, the International Year of Statistics (also known as Statistics2013) is designed to reach business and government data users, media, policy makers, employers, students, and the general public.

The goals of Statistics2013 include increasing public awareness of the power and impact of statistics on all aspects of society; nurturing statistics as a profession, especially among young people; and promoting creativity and development in the sciences of probability and statistics.

By participating, StatSoft is indicating that we support the goals of Statistics2013.

Also, you can help spread the word about this movement by passing on this flyer or sending out the link to International Year of Statistics.

Validation Services: Computer Systems Validation for STATISTICA Applications

StatSoft, provider of the STATISTICA product suite, is committed to partnering with our customers to meet our mutual goal: designing and producing products of the highest quality and reliability. Many of our customers in FDA-regulated industries, such as the design and manufacturing of pharmaceutical and medical device products, rely on STATISTICA as an integral software tool within their Research and Development and Quality Control processes.

StatSoft, through its Technical Services group, provides software validation services as part of the deployment of STATISTICA applications. The following sections provide an introductory overview of our standard Validation Package for STATISTICA applications deployed within your environment. These services include requirements gathering and documentation, validation planning, installation qualification, operational qualification, and performance qualification.


StatSoft Compliance Statement (Adobe Acrobat Reader)
Read StatSoft’s Compliance Statement

Relevance to STATISTICA Applications

STATISTICA is used for many applications where computer systems validation is relevant.

For example, STATISTICA is used by organizations:

  • To test the characteristics of new products,
  • To optimize product formulations,
  • To inspect raw materials to be used in the manufacturing of products,
  • To make judgments about the efficacy of multiple product configurations,
  • To make predictions about product reliability,
  • To determine the most important process parameters within a multivariate product manufacturing application,
  • To determine the optimal product packaging for shipment, storage and delivery to consumers, and
  • To certify that particular lots of product conform to product specifications.

For each of these applications, STATISTICA may be used for any combination of the following activities:

  • To store data and documents,
  • To perform data management and cleaning tasks,
  • To produce tabular and graphical output, and
  • To produce summary reports of those analytic results.

Depending upon the application, the data and results used for these purposes may be subject to the rules of the 21 CFR Part 11 regulation.

The STATISTICA Validation Package

StatSoft’s Technical Services Group provides a standard Validation Package that can be customized to suit the relevant STATISTICA application. Our Validation Package is a combination of a StatSoft team, a suite of services, and a set of documentation. Each of these components is covered in more detail below.

  • The StatSoft Team: StatSoft provides a team of highly experienced and skilled professionals to perform the planning, implementation and validation services. The specific number and type of resources depend on the project scope. Our team structure includes the following typical roles: a Project Manager, a STATISTICA Technical Consultant, and a Validation Engineer.
  • Scope of Services: Our Validation Package includes a full suite of installation and qualification services. The detailed scope depends upon the system and the responsibilities of the StatSoft team. Our suite of services may include requirements gathering and documentation, validation planning, system design, system installation and configuration, installation qualification, operational qualification, performance qualification, and documentation preparation.
  • Documentation: StatSoft provides a standard set of Validation deliverables. During the project planning activities, we will customize these deliverables to meet the project requirements. The documentation set includes a User Requirements and Functional Requirements Specification, a Validation Plan, a System Design Specification, a Test Plan and Detailed Test Cases, a Traceability Matrix, an Installation Qualification Summary, and a Validation Summary.


StatSoft provides its Validation Package as an integrated set of services to augment our standard system design and deployment methodology. We standardized the Validation Package to provide an enhanced suite of services to our clients in regulated industries.

Your organization is able to leverage our combination of STATISTICA expertise and validation expertise. What this means to you is a streamlined approach to validation with significant cost and time savings.

For More Information, Contact Us

Please contact StatSoft Technical Services at 0112346148 or by email for more information about our Validation Package to suit your STATISTICA application needs.

Text Mining Video: Analyzing Comments with STATISTICA

STATISTICA Knowledge Base – Output Management & Printing

What output management options are available in STATISTICA?

You can customize the way in which the output is managed in STATISTICA. When you perform an analysis, STATISTICA generates output in the form of spreadsheets and graphs. There are five basic channels to which you can direct all output: workbooks, stand-alone windows, reports, Microsoft Word, and the Web.

The first four output channels are controlled by the options in the Output Manager (accessible by selecting Output Manager from the File menu, or by selecting Options from the Tools menu to display the Options dialog and selecting Output Manager in the tree view). There are a number of ways to output to the Web, depending on the version of STATISTICA you have. These output channels can be used in many combinations (e.g., a workbook and report simultaneously), and each of the output channels can be customized and organized in a variety of ways.

How do I print spreadsheets?

The simplest way to print a spreadsheet is to click the Print button on the toolbar. STATISTICA then sends the current spreadsheet to the printer specified in the Print dialog. No other intermediate option dialogs are displayed. If a block is selected in the spreadsheet, then only that block is sent to the output destination; otherwise, the entire spreadsheet is sent to the output. More options are available when you select Print from the File menu (or CTRL+P) to display the Print dialog, where you can customize various aspects of the printing.

Automatic Reports. Note that you can keep a complete log of all spreadsheets (and/or graphs) that are displayed on the screen without having to remember to individually transfer them to the Report window or to print them. To do this, select Options from the Tools menu to display the Options dialog. Select Analyses/Graphs: Output Manager in the tree view. In the options pane, set the Report Output option to either Send to Multiple Reports (one for each Analysis/Graph) or Single Report (common for all Analyses/Graphs). Note that this is a global option (as are all options in the Options dialog), and it will affect all analyses until the option is changed. To make changes for one particular analysis, use the Options button on the analysis or graph definition dialog.

What are workbooks?

The STATISTICA Workbook (*.stw) is a flexible output management facility based on the powerful ActiveX technology. Technically speaking, workbooks are “ActiveX containers” that enable you to manage all STATISTICA documents (e.g., spreadsheets, graphs), as well as all other ActiveX compatible documents such as Microsoft Excel worksheets or Microsoft Word documents. Each workbook contains two panels: an Explorer-style navigation tree on the left and a document viewer on the right. The navigation tree (workbook tree) can be hierarchically split into various nodes allowing you to organize your files in logical groupings (e.g., all analysis outputs, all macros created for a project, etc.). Tabs at the bottom of the document viewer (workbook viewer) are used to easily navigate the children of the currently selected node.

Workbooks help to organize sets of output files (e.g., spreadsheets, graphs, reports, macros, non-STATISTICA files, etc.) that have been created or used (e.g., reviewed) during the analysis of a data file.

How do I print previously saved results?

There are several options available for printing previously saved results:

1. You can open each spreadsheet (and/or graph) and print it by selecting Print from the File menu (CTRL+P) as described above.

2. You can open each spreadsheet (and/or graph), insert them into a report, and print the report. Note that with this method you can add supplementary text and comments to the analysis results.

3. You can insert all of the spreadsheets and graphs into a workbook and print the entire workbook by selecting Print from the Workbook File menu.

How can I suppress the printing of gridlines in spreadsheets?

To suppress the printing of gridlines in the active spreadsheet, you must make changes in two dialogs. First, change the Style of both the Horizontal and Vertical Data Lines to blank in the Gridlines dialog, accessed by selecting Gridlines from the Spreadsheet View menu. (Note that you can access the Spreadsheet View menu from within a report window or workbook by double-clicking on the spreadsheet. This gives you access to all spreadsheet editing tools.)

Second, clear the Gridline styles and colors check box in the Edit Spreadsheet Layout: Print Filter dialog. (To display this dialog, first select Layout Manager from the Format – Spreadsheet submenu to display the Spreadsheet Layouts dialog. Then on the System tab, select Print Filter and click the Edit button.)

Note that clearing this check box causes STATISTICA to print the gridlines using the styles and colors specified in the Gridlines dialog rather than using a default black.

Can I add custom headers or footers to printed output?

You can create a customized header or footer for a STATISTICA Spreadsheet, Report, Graph, or Workbook that can include information such as the date, time, page number, and name of your company.

To create the header or footer for a STATISTICA Spreadsheet, Report, or Graph, select Header/Footer from the View menu and use the options in the Modify Header/Footer dialog to specify the custom header or footer. Note that custom headers and footers for STATISTICA Workbooks are created in the Workbook Page Setup dialog, accessible via the File menu.

How do I change the printer setup?

Most options for modifying the printing specifications for a given document, including margins and customized headers and footers, can be selected from the Print Preview window. To change the printer setup for a given printer, select Print Setup from the File menu to display the Print Setup dialog. Then click the Properties button to access the Printer Properties dialog.