
## How to Interpret Statistical Analysis Results


Written by: STATISTICA News

Statistical tests examine a variety of relationships in data, but they share some common elements. Typically, a statistical test states a null and an alternative hypothesis, calculates a test statistic, and reports an associated *p*-value; the analyst then draws a conclusion from the result. This process is followed for simple tests as well as complex ones. Once you achieve a basic understanding of the process of statistical hypothesis testing, the concepts can be generalized to all tests.

**Stating the Hypothesis**

Statistical tests start with a null and an alternative hypothesis. These hypotheses are statements about the population from which the sample was drawn. The sample data are used to decide whether there is enough evidence to reject the null hypothesis in favor of the alternative. A given test has one or more standard null and alternative hypotheses. For example, a one-sample *t*-test has three possible sets of hypotheses:

1. $H_0: \mu = \mu_0$ versus $H_a: \mu \neq \mu_0$
2. $H_0: \mu = \mu_0$ versus $H_a: \mu > \mu_0$
3. $H_0: \mu = \mu_0$ versus $H_a: \mu < \mu_0$

where $\mu$ represents the population mean and $\mu_0$ is the hypothesized mean. The first is a two-sided hypothesis where the researcher is looking for a significant difference between the population mean and the hypothesized mean. The second and third are one-sided alternatives where the researcher hypothesizes that the true mean is either greater than (2) or less than (3) the hypothesized mean.

In a test for normality of data, the null hypothesis is $H_0: X \sim N(\mu, \sigma)$ versus the alternative that the data are not normally distributed.

**Calculating the Test Statistic and p-Value**

Test statistics are used to decide between the null and alternative hypotheses. Because a test statistic can follow any one of a variety of statistical distributions, its raw value is hard to interpret on its own: the critical value for deciding between the null and alternative hypotheses varies by test.

A *p*-value is the probability of obtaining sample data at least as extreme as the observed data, given that the null hypothesis is true. While not technically accurate, it is much easier to think of the *p*-value as support for the null hypothesis: the smaller the *p*-value, the less compatible the data are with the null. Before the analysis, a threshold is chosen, called alpha or the level of significance. If the calculated *p*-value is less than the threshold, typically 0.05, then the null hypothesis is rejected in favor of the alternative. Said another way, the test is statistically significant. In *STATISTICA*, statistically significant *p*-values are reported in red.
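The definition of the *p*-value can be illustrated by simulation. The sketch below assumes a one-sample *t*-test on normally distributed data, with a hypothetical sample size of 10 and a hypothetical observed test statistic of 2.5 (neither value comes from this article). It repeatedly draws samples for which the null hypothesis is true and counts how often the simulated statistic is at least as extreme as the observed one; that fraction is an estimate of the two-sided *p*-value.

```python
import math
import random

def t_statistic(sample, mu0):
    """One-sample t statistic: (sample mean - mu0) / (s / sqrt(n))."""
    n = len(sample)
    mean = sum(sample) / n
    var = sum((x - mean) ** 2 for x in sample) / (n - 1)  # sample variance
    return (mean - mu0) / math.sqrt(var / n)

def simulated_p_value(t_observed, n, n_sims=20000, seed=0):
    """Estimate a two-sided p-value by simulating samples under H0."""
    rng = random.Random(seed)
    extreme = 0
    for _ in range(n_sims):
        # Under H0 the true mean equals mu0; here mu0 = 0 and sigma = 1.
        null_sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        if abs(t_statistic(null_sample, 0.0)) >= abs(t_observed):
            extreme += 1
    return extreme / n_sims

alpha = 0.05  # significance level chosen before the analysis
p = simulated_p_value(t_observed=2.5, n=10)  # hypothetical inputs
print(f"estimated p-value: {p:.3f}")
print("reject H0" if p < alpha else "fail to reject H0")
```

In practice the *p*-value is computed from the known distribution of the test statistic rather than by simulation; the simulation is only meant to make the "at least as extreme, given the null is true" wording concrete.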

**Conclusion and Interpretation of Results**

The *p*-value computed by the test leads you to reject or fail to reject the null hypothesis. (When the *p*-value is reported in red, reject the null hypothesis.) This conclusion should then be interpreted in terms of your project. A good interpretation will not mention hypotheses or test statistics. The interpretation will simply state the conclusion in the context of the problem.

**Fail to Reject $H_0$**

When a test fails to reject the null hypothesis, it means that insufficient evidence exists to support the alternative hypothesis. Some examples of this include:

- A significant difference does not exist between the population means of A and B.
- The correlation between A and B is not significantly different from 0.
- The distribution of the data is not significantly different from Normal.
- The regression parameter does not explain a significant amount of the variability in y. (The regression parameter is not significantly different from 0.)

The conclusion is not that the null hypothesis is accepted. The nonsignificant result from the test may be because the null is true. It may also be because either random chance or too small a sample made it impossible to detect a real effect.

**Reject $H_0$**

When a test does reject the null hypothesis, it does so in favor of the alternative hypothesis. The corresponding reject-$H_0$ conclusions for the same tests given above are:

- A significant difference exists between the population means of A and B. (Or, the population mean of A is significantly greater than the population mean of B.)
- The correlation between A and B is significantly different from 0.
- The distribution of the data is significantly different from Normal.
- The regression parameter does explain a significant amount of the variability in y. (The regression parameter is significantly different from 0.)

**Example**

In a pain relief study, researchers are studying the effects of the pain relief medicine, aspirin, compared to a placebo. Pain relief scores were recorded for two groups of people who were given either aspirin or the placebo. Greater pain relief scores indicate better pain relief. The hypothesis to test is that pain relief will be different for patients given the aspirin compared to the placebo. Let’s write these as statistical hypotheses.

$H_0: \mu_{\text{aspirin}} = \mu_{\text{placebo}}$

$H_a: \mu_{\text{aspirin}} \neq \mu_{\text{placebo}}$

The null hypothesis states that average pain relief for patients given aspirin is equal to the average pain relief for patients given a placebo. The alternative hypothesis (which is what the research team believes to be true) states that the average pain relief for patients given aspirin is not equal to relief from the placebo.

Looking at the table of output, the sample mean pain relief for patients given aspirin is 59.1, which is greater than the sample mean of 56.3 for the placebo. These sample statistics are used to compute a test statistic to make inferences about the populations they represent. The test statistic is computed to be 2.09. It follows a Student's *t* distribution with *N* − 2 = 16 degrees of freedom. The *p*-value for this test is 0.0527.

To draw a conclusion, the *p*-value is compared to the alpha level of significance. In this case, *alpha* = 0.05. The *p*-value = 0.0527 > 0.05 = *alpha*, so the test fails to reject the null hypothesis. The test does not show a significant difference between the mean pain relief for patients given aspirin and those given a placebo.

*Conclusion*: A significant difference does not exist between the population average pain relief for patients given aspirin vs. those given a placebo.
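The comparison above can be checked numerically. The following sketch recomputes a two-sided *p*-value from the reported test statistic (2.09) and degrees of freedom (16) by integrating the Student's *t* density with Simpson's rule, using only the Python standard library. The function names are illustrative and are not part of *STATISTICA*. Because the test statistic is rounded to two decimals, the result should land close to, but not exactly on, the reported 0.0527.

```python
import math

def t_density(x, df):
    """Probability density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))

def two_sided_p_value(t_stat, df, steps=2000):
    """Two-sided p-value: 1 minus the area under the density between -|t| and |t|.

    The area from 0 to |t| is computed with Simpson's rule (steps must be even)
    and doubled, since the t distribution is symmetric around zero.
    """
    upper = abs(t_stat)
    h = upper / steps
    total = t_density(0.0, df) + t_density(upper, df)
    for i in range(1, steps):
        weight = 4 if i % 2 else 2
        total += weight * t_density(i * h, df)
    area_zero_to_t = total * h / 3
    return 1.0 - 2.0 * area_zero_to_t

alpha = 0.05
p_value = two_sided_p_value(t_stat=2.09, df=16)  # values reported in the example
decision = "reject H0" if p_value < alpha else "fail to reject H0"
print(f"p-value = {p_value:.4f}, decision: {decision}")
```

A statistics library would normally supply the *t* distribution directly; the hand-rolled integration here only keeps the sketch free of external dependencies.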

The conclusion is not that the means are equal, but that they are not significantly different. It is possible that a difference in average pain relief does exist between the two groups. One possible reason for this is that the experiment did not collect enough data (samples). With additional data points, the statistical power of the test is improved. Another possibility is that random chance led to a sample with greater variability or a different mean than what is typical of the population.
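To make the point about sample size concrete, here is a rough sketch of how power grows with the number of patients per group. It rests on assumptions that are not in the article: a true difference of 2.8 points (borrowed from the observed sample means purely for illustration), a common standard deviation of 3.0, equal group sizes, and a normal approximation in place of the *t*-test.

```python
from math import sqrt
from statistics import NormalDist

def approximate_power(true_diff, sigma, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample test of equal means.

    Uses the normal approximation with a known common standard deviation,
    which is simpler than (and slightly optimistic compared to) the t-test.
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # Standardized size of the true difference for this sample size.
    shift = true_diff / (sigma * sqrt(2 / n_per_group))
    # Probability that the test statistic falls in either rejection region.
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)

# Hypothetical inputs: true difference of 2.8 points, sigma = 3.0.
for n in (9, 20, 40, 80):
    print(f"n per group = {n:3d}  power = {approximate_power(2.8, 3.0, n):.2f}")
```

Under these assumptions the power rises steadily with the group size, which is why a nonsignificant result from a small study does not show that the two treatments are equivalent.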

## Comprehensive Analytic Modules

*STATISTICA Multivariate Exploratory Techniques* offers a broad selection of exploratory techniques, from cluster analysis to advanced classification tree methods, with a comprehensive array of interactive visualization tools for exploring relationships and patterns, and built-in complete Visual Basic scripting.

- Cluster Analysis Techniques
- Factor Analysis and Principal Components
- Canonical Correlation Analysis
- Reliability/Item Analysis
- Classification Trees
- Correspondence Analysis
- Multidimensional Scaling
- Discriminant Analysis
- General Discriminant Analysis Models
- *STATISTICA* Visual Basic Language, and more.

*STATISTICA Advanced Linear/Nonlinear Models* contains a wide array of the most advanced linear and nonlinear modeling tools on the market, supports continuous and categorical predictors, interactions, hierarchical models; automatic model selection facilities; also, includes variance components, time series, and many other methods; all analyses include extensive, interactive graphical support and built-in complete Visual Basic scripting.

- Distribution and Simulation
- Variance Components and Mixed Model ANOVA/ANCOVA
- Survival/Failure Time Analysis
- General Nonlinear Estimation (and Logit/Probit)
- Log-Linear Analysis
- Time Series Analysis, Forecasting
- Structural Equation Modeling/Path Analysis (*SEPATH*)
- General Linear Models (*GLM*)
- General Regression Models (*GRM*)
- Generalized Linear/Nonlinear Models (*GLZ*)
- Partial Least Squares (*PLS*)
- *STATISTICA* Visual Basic Language, and more.

*STATISTICA Power Analysis and Interval Estimation* is an extremely precise and user-friendly research tool for analyzing all aspects of statistical power and sample size calculation.

- Power Calculations
- Sample Size Calculations
- Interval Estimation
- Probability Distribution Calculators, and more.

*STATISTICA Automated Neural Networks*

*STATISTICA Automated Neural Networks* contains a comprehensive array of statistics, charting options, network architectures, and training algorithms; C and PMML (Predictive Model Markup Language) code generators. The C code generator is an add-on.

Fully integrated with the *STATISTICA* system.

- A selection of the most popular network architectures including Multilayer Perceptrons, Radial Basis Function networks, Linear Networks and Self Organizing Feature Maps.
- State-of-the-art training algorithms including: Conjugate Gradient Descent, BFGS, Kohonen training, k-Means Center Assignment
- Forming ensembles of networks for better prediction performance
- Automatic Network Search, a tool for automating neural network architecture and complexity selection
- Best Network Retention, and more.
- Supports various statistical analyses and predictive model building, including regression, classification, time series regression, time series classification, and cluster analysis for dimensionality reduction and visualization.
- Fully supports deployment of multiple models

*STATISTICA Automated Neural Networks Code Generator*

*STATISTICA Automated Neural Networks Code Generator* can generate neural network code in both C and PMML (Predictive Model Markup Language) languages. The Code Generator Add-on enables *STATISTICA Automated Neural Networks* users to generate a C code file to be used for compiling a C program based on the output of a neural networks analysis.

- The C code generator add-on requires *STATISTICA Neural Networks*
- Generates a source code version of a neural network (in C or C++ file) which can be compiled with all C or C++ compilers.
- C code file can then be integrated into external programs.

*STATISTICA Base*

*STATISTICA Base* offers a comprehensive set of essential statistics in a user-friendly package with flexible output management and Web enablement features; it also includes all *STATISTICA* graphics tools and a comprehensive Visual Basic development environment. The program is shipped on CD ROM.

- Descriptive Statistics, Breakdowns, and Exploratory Data Analysis
- Correlations
- Interactive Probability Calculator
- T-Tests (and other tests of group differences)
- Frequency Tables, Crosstabulation Tables, Stub-and-Banner Tables, Multiple Response Analysis
- Multiple Regression Methods
- Nonparametric Statistics
- Distribution Fitting
- Enhanced graphics technology
- Powerful query tools
- Flexible data management
- ANOVA [supports 4 between factors and 1 within (repeated measure) factor]
- *STATISTICA* Visual Basic Language, and more.

*STATISTICA Data Miner*

Includes the functionality of all of the following:

*STATISTICA Automated Neural Networks*

*STATISTICA Data Miner* contains the most comprehensive selection of data mining solutions on the market, with an icon-based, extremely easy-to-use user interface. It features a selection of completely integrated, and automated, ready to deploy “as is” (but also easily customizable) specific data mining solutions for a wide variety of business applications. The product is offered optionally with deployment and on-site training services. The data mining solutions are driven by powerful procedures from five modules, which can also be used interactively and/or used to build, test, and deploy new solutions.

- General Slicer/Dicer Explorer
- General Classifier
- General Modeler/Multivariate Explorer
- General Forecaster
- General Neural Networks Explorer, and more.

Solution Packages to meet specific needs are available.

*STATISTICA Scorecard*

*STATISTICA Scorecard*, a software solution for developing, evaluating, and monitoring scorecard models, includes the following capabilities and workflow:

- Data preparation
- Modelling
- Evaluation and calibration
- Monitoring

*STATISTICA Data Warehouse*

*STATISTICA Data Warehouse* is the ultimate high-performance, scalable system for intelligent management of unlimited amounts of data, distributed across locations worldwide.

*STATISTICA Document Management System*

*STATISTICA Document Management System* is a scalable solution for flexible, productivity-enhancing management of local or Web-based document repositories (FDA/ISO compliant).

*STATISTICA Enterprise*

*STATISTICA Enterprise* is an integrated multi-user software system designed for general purpose data analysis and business intelligence applications in research, marketing, finance, and other industries. *STATISTICA Enterprise* provides an efficient interface to enterprise-wide data repositories and a means for collaborative work as well as all the statistical functionality available in *STATISTICA Base*, *STATISTICA Advanced Models*, and *STATISTICA Exploratory Techniques* (optionally also *STATISTICA Automated Neural Networks* and *STATISTICA Power Analysis and Interval Estimation*).

- An efficient general interface to enterprise-wide repositories of data
- A means for collaborative work (groupware functionality)
- A reporting tool for formatted documents (PDF, HTML, MS Word) and analysis summaries of any of the tabular and graphical results produced by *STATISTICA*.
- Compatible with (and linkable to) industry-standard enterprise-wide database management systems
- Custom configurations including any applications from the *STATISTICA* product line, and more.

*STATISTICA Enterprise / Quality Control (QC)*

*STATISTICA’s* comprehensive array of both routine and high-end statistical analyses, superior graphing technology, and unparalleled record of reviews give *STATISTICA Enterprise/QC* many advantages over competing products. A unique combination of features not found in any other SPC system makes *STATISTICA Enterprise/QC* the most comprehensive SPC System available.

- Real-time analytical tools
- A high performance database
- Groupware functionality for sharing queries, special applications, etc.
- Wizard-driven system administration tools
- A sophisticated reporting tool for web-based output
- One-click access to analyses and reports
- Built-in security system
- User-specific interfaces
- Open-ended alarm notification including cause/action prompts
- Interactive querying facilities
- Integration with external applications (Word, Excel, browsers)
- and much, much more…

*STATISTICA Enterprise Web Viewer*

*STATISTICA Enterprise Web Viewer* provides the ability to view analyses and reports that were generated within *STATISTICA Enterprise* or *STATISTICA Enterprise / QC*. This allows companies to protect their data and reports with the *STATISTICA Enterprise* security model.

*STATISTICA Extract, Transform, and Load (ETL)*

*STATISTICA Extract, Transform, and Load (ETL)* provides options to simplify and facilitate access to, aggregation, and alignment of data from multiple databases, when some of the databases contain process data (using the optional PI Connector), while others contain “static” data (e.g., from Oracle or MS SQL Server). Provides for ad-hoc querying and aligning of data, for subsequent analyses such as ad-hoc charting etc. of data describing a specific time interval.

*STATISTICA Live Score*

*STATISTICA Live Score* is *STATISTICA* Server software within the *STATISTICA* Data Analysis and Data Mining Platform. Data are aggregated and cleaned, and models are trained and validated, using the *STATISTICA Data Miner* software. Once the models are validated, they are deployed to the *STATISTICA Live Score* server. *STATISTICA Live Score* provides multi-threaded, efficient, and platform-independent scoring of data from line-of-business applications.

*STATISTICA Monitoring and Alerting Server (MAS)*

*STATISTICA Monitoring and Alerting Server (MAS)* is a system that enables users to automate the continual monitoring of hundreds or thousands of critical process and product parameters.

*STATISTICA MultiStream™ for Pharmaceutical Industries*

*STATISTICA MultiStream for Pharmaceutical Industries* is a solution package for identifying and implementing effective strategies for advanced multivariate process monitoring and control. *STATISTICA MultiStream* was designed for process industries in general, but is particularly well suited to help pharmaceutical manufacturers leverage the data collected in their existing specialized process databases for multivariate and predictive process control.

*STATISTICA MultiStream™ for Power Industries*

*STATISTICA MultiStream for Power Industries* is a solution package for identifying and implementing effective strategies for advanced multivariate process monitoring and control. *STATISTICA MultiStream* was designed for process industries in general, but is particularly well suited to help power generation facilities leverage the data collected in their existing specialized process databases for multivariate and predictive process control, and for actionable advisory systems.

*STATISTICA Multivariate Statistical Process Control (MSPC)*

*STATISTICA Multivariate Statistical Process Control (MSPC)* is a complete solution for multivariate statistical process control, deployed within a scalable, secure analytics software platform.

*STATISTICA PI Connector*

*STATISTICA* PI Connector is an optional *STATISTICA* add-on component that allows for direct integration to data stored in the PI data historian. The *STATISTICA* PI Connector utilizes the PI user access control and security model, allows for interactive browsing of tags, and takes advantage of dedicated PI functionality for interpolation and snapshot data. *STATISTICA* integrated with the PI system is being used for streamlined and automated analyses for applications such as Process Analytical Technology (PAT) in FDA-regulated industries, Advanced Process Control (APC) systems in Chemical and Petrochemical industries, and advisory systems for process optimization and compliance in the Energy Utility industry.

*STATISTICA Process Optimization*

*STATISTICA Process Optimization*, an optional extension of *STATISTICA Data Miner*, is a powerful software solution designed to monitor processes and identify and anticipate problems related to quality control and improvement with unmatched sensitivity and effectiveness. *STATISTICA Process Optimization* integrates all quality control charts, process capability analyses, experimental design procedures, and Six Sigma methods with a comprehensive library of cutting-edge techniques for exploratory and predictive data mining.

*STATISTICA Process Optimization* enables its users to:

- Predict QC problems with cutting edge data mining methods
- Discover root causes of problem areas
- Monitor and improve ROI (Return On Investment)
- Generate suggestions for improvement
- Monitor processes in real time over the Web
- Create and deploy QC/SPC solutions over the Web
- Use multithreading and distributed processing to rapidly process extremely large streams of data.
- General Optimization

Solution Packages to meet specific needs are available.

*STATISTICA Quality Control (QC)*

Includes the functionality of all of the following:

*STATISTICA* Quality Control Charts offers versatile presentation-quality charts with a selection of automation options, customizable features, and user-interface shortcuts to simplify routine work.

- Quality Control Charts
- Interactive Quality Control Charts including: Real-time updating of charts, automatic alarm notification, shop floor mode, assigning causes and actions, analytic brushing, and dynamic project management
- Multivariate Quality Control Charts including: Hotelling T-Square Charts, Multiple Stream (Group), Multivariate Exponentially Weighted Moving Average (MEWMA) charts, Multivariate Cumulative Sum (MCUSUM) Charts, Generalized Variance Charts
- *STATISTICA* Visual Basic Language, and more.

*STATISTICA Process Analysis* is a comprehensive package for process capability, Gage R&R, and other quality control/improvement applications.

- Process Capability Analysis
- Weibull Analysis
- Gage Repeatability & Reproducibility
- Sampling Plans
- Variance Components, and more.

*STATISTICA Design of Experiments* features the largest selection of DOE, visualization and other analytic techniques including powerful desirability profilers and extensive residual statistics.

- Fractional Factorial Designs
- Mixture Designs
- Latin Squares
- Search for Optimal $2^{k-p}$ Designs
- Residual Analysis and Transformations
- Optimization of Single or Multiple Response Variables
- Central Composite Designs
- Taguchi Designs
- Desirability Profiler
- Minimum Aberration and Maximum Unconfounding $2^{k-p}$ Fractional Factorial Designs with Blocks
- Constrained Surfaces
- D- and A-optimal Designs, and more.

*STATISTICA Power Analysis and Interval Estimation* is an extremely precise and user-friendly research tool for analyzing all aspects of statistical power and sample size calculation.

- Power Calculations
- Sample Size Calculations
- Interval Estimation
- Probability Distribution Calculators, and more.

*STATISTICA Sequence, Association, and Link Analysis (SAL)*

*STATISTICA Sequence, Association and Link Analysis (SAL)* is designed to address the needs of clients in the retailing, banking, insurance, and similar industries by implementing the fastest known highly scalable algorithm with the ability to derive Association and Sequence rules in one single analysis. The program represents a stand-alone module that can be used for both model building and deployment. All tools in *STATISTICA Data Miner* can be quickly and effortlessly leveraged to analyze and “drill into” results generated via *STATISTICA SAL*.

- Uses a Tree-Building technique to extract Association and Sequence rules from data
- Uses efficient and thread-safe local relational Database technology to store Association and Sequence models
- Handles multiple response, multiple dichotomy and continuous variables in one analysis
- Performs Sequence analysis while mining for Association rules in a single analysis
- Simultaneously extracts Association and Sequence rules for more than one dimension
- Given the ability to perform multidimensional Association and Sequence mining and the capacity to extract only rules for specific items, the program can be used for Predictive Data Mining
- Performs Hierarchical Single-Linkage Cluster analysis which can detect the more likely cluster of items that can occur. This has extremely useful, practical real-world applications such as in Retailing.

*STATISTICA Text Miner*

*STATISTICA Text Miner* is an optional extension of *STATISTICA Data Miner*. The program features a large selection of text retrieval, pre-processing, and analytic and interpretive mining procedures for unstructured text data (including Web pages), with numerous options for converting text into numeric information (for mapping, clustering, predictive data mining, etc.) and language-specific stemming algorithms. Because of *STATISTICA*’s flexible data import options, the methods available in *STATISTICA Text Miner* can also be useful for processing other unstructured input (e.g., image files imported as data matrices, etc.).

*STATISTICA Web Based Data Entry*

*STATISTICA Web Data Entry* enables companies to configure data entry scenarios to allow data entry via Web browsers and the analysis of these data using all of the graphical data analysis, statistical analysis, and data mining capabilities of the *STATISTICA Enterprise* software platform.

*STATISTICA Web Data Entry* builds on the configuration objects in *STATISTICA Enterprise*:

- **Characteristics:** Numeric data to be collected for analysis (e.g., pH)
- **Labels:** Text or date data for traceability (e.g., Lot Number)
- **Data Entry Setups:** Groups of Characteristics and Labels configured with specific User/Group permissions to collect the appropriate data for particular scenarios

*STATISTICA Variance Estimation and Precision*

*STATISTICA* Variance Estimation and Precision is a comprehensive set of techniques for analyzing data from experiments that include both fixed and random effects using REML (Restricted Maximum Likelihood Estimation). With Variance Estimation and Precision, users can obtain estimates of variance components and use them to make precision statements while at the same time comparing fixed effects in the presence of multiple sources of variation.

Variance Estimation and Precision includes the following:

- Variability plots
- Multiple plot layouts to allow direct comparison of multiple dependent variables
- Expected mean squares and variance components with confidence intervals
- Flexible handling of multiple dependent variables: analyze several variables with the same or different designs at once
- Graph displays of variance components

*WebSTATISTICA Knowledge Portal*

WebSTATISTICA Knowledge Portal is the ultimate knowledge-sharing tool. It incorporates the latest Internet technology and includes a powerful, flexible report generation tool and a secure system for information delivery.

*WebSTATISTICA Server Applications*

*WebSTATISTICA Server Applications* is the ultimate enterprise system that offers full Web enablement, including the ability to run *STATISTICA* interactively or in batch from a Web browser on any computer (incl. Linux, UNIX), offload time consuming tasks to the servers (using distributed processing), use multi-tier Client-Server architecture, manage projects over the Web, and collaborate “across the hall or across continents.”

- Work collaboratively “across the hall” or “across continents”
- Run *STATISTICA* using any computer in the world (connected to the Internet)
- Offload time-consuming tasks to the servers
- Manage/administer projects over the Web
- Develop highly customized Web applications
- and much, much more…