Blog Archives

Considering Alternatives to SAS?

Do you use SAS for predictive modeling, advanced analytics, business intelligence, insurance or financial applications, or data visualization?

Why Choose STATISTICA?

SAS software is expensive and carries high, unpredictable annual licensing costs. SAS software is difficult to use, requiring specific SAS programming expertise, and it drives users toward dependency on only SAS-specific solutions (e.g., their proprietary data warehouses). Data visualization is integral to analytics, but SAS’s graphics have major shortcomings.

STATISTICA has consistently been ranked highest in ease of use and customer satisfaction in independent surveys of analytics professionals. See the results of the most recent Rexer survey (2010), the largest survey of data mining professionals in the industry.

We offer the breadth of analytics capabilities and performance, including the most comprehensive data mining solution on the market, using more open, modern technologies. StatSoft software is designed to facilitate interfacing with all industry standard components of your computer infrastructure (e.g., ultra-fast integration with Oracle, MS SQL Server, and other databases) instead of locking you into proprietary standards and total dependence on one vendor.

STATISTICA is significantly faster than SAS. StatSoft is an Intel® Software Premiere Elite Partner and has developed technologies that leverage Intel CPU architecture to deliver unmatched parallel processing performance (press release with Intel) and rapidly process terabytes of data. StatSoft’s robust, cutting-edge enterprise system technology drives the analytics and analytic data management at some of the largest computer infrastructures in the world at Fortune 100 and Fortune 500 companies.

Quotes from SAS Customers

“We acquired our SAS license seven years ago and quickly learned that with SAS, you do not pay just an annual renewal and support fee – you practically have to “buy” the software again every year. Our first year renewal fee was already 60% of the initial purchase price, and it increased steadily and every year. Two years ago, our annual fee exceeded the initial purchase price we paid, and it keeps going up much faster than the inflation. This is not sustainable.” – CEO, Technology Company

“It took 8 weeks to install SAS Enterprise Miner. The installer just didn’t work. And we’re a midsize company, so we were a low priority for SAS’s technical support.” – Engineer, Chemical Company

“Early in our evaluation, we eliminated SAS from our consideration of fraud detection solutions primarily due to the exorbitant cost.” – Chief Actuary, Insurance Company

“We had used SAS on-demand for my data mining class. A few days before finals, all of our students’ project files were corrupted. Our SAS technical support representative confirmed there was nothing that could be done to restore the files. We’re switching to STATISTICA.” – University Professor

“Now, all graduate students use R. It is getting more difficult to find SAS programmers.” – Head of Statistics, Pharmaceutical Company

“We used SAS until May 2009 when we converted to WPS. The conversion went remarkably smoothly and was completed on time. Not only did we save a substantial amount in licensing fees, we also regained functionality such as Graphs that we had previously removed because of the cost.” – Survey respondent on KDNuggets.com

How to Proceed

StatSoft makes it easy to transition your current SAS environment to STATISTICA, either gradually or all at once. STATISTICA offers:

Direct import/export to SAS files
Deployment of predictive models to SAS code to score against SAS data sets
Native integration to run R programs


For more information and for specific recommendations to suit your needs, please contact one of our representatives:

lorraine@statsoft.co.za , info@statsoft.co.za

How to Interpret Statistical Analysis Results

Written by: STATISTICA News

Statistical tests examine a variety of relationships in data, but they share some common elements. Typically, a statistical test states a null and an alternative hypothesis, calculates a test statistic, and reports an associated p-value; the analyst then draws a conclusion from the result. This process is followed for simple tests as well as complex ones. Once you achieve a basic understanding of the process of statistical hypothesis testing, the concepts can be generalized to all tests.
 

Stating the Hypothesis

Statistical tests start with a null and an alternative hypothesis. These hypotheses are statements about the population from which the sample was drawn, and the sample data are used to decide between them. A given test has one or more standard pairs of null and alternative hypotheses. For example, a one-sample t-test has three possible pairs of hypotheses:

(1) H0: μ = μ0 versus Ha: μ ≠ μ0
(2) H0: μ = μ0 versus Ha: μ > μ0
(3) H0: μ = μ0 versus Ha: μ < μ0

where μ represents the population mean and μ0 is the hypothesized mean. The first is a two-sided hypothesis where the researcher is looking for a significant difference between the population mean and the hypothesized mean. The second and third are one-sided alternatives where the researcher hypothesizes that the true mean is either greater than (2) or less than (3) the hypothesized mean.

In a test for normality of data, the null hypothesis is H0: the data follow a normal distribution N(μ,σ), versus the alternative that the data are not normally distributed.


Calculating the Test Statistic and p-Value

Test statistics are used to decide between the null and alternative hypotheses. Depending on the test, the statistic follows one of a variety of statistical distributions (for example, t, F, or chi-square), so the critical value for deciding between the hypotheses varies by test. This makes raw test statistics harder to interpret.
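As a concrete illustration of a test statistic, here is a minimal Python sketch of the standard one-sample t statistic, t = (sample mean - μ0) / (s / √n). The function name `one_sample_t` and the sample values are assumptions made up for this illustration; they do not come from STATISTICA or from any data set in this article.

```python
import math
import statistics


def one_sample_t(sample, mu0):
    """Return (t statistic, degrees of freedom) for a one-sample t-test.

    Hypothetical helper for illustration; mu0 is the hypothesized mean.
    """
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = statistics.mean(sample)
    sd = statistics.stdev(sample)  # sample standard deviation (n - 1 divisor)
    if sd == 0:
        raise ValueError("sample has no variability")
    standard_error = sd / math.sqrt(n)
    return (mean - mu0) / standard_error, n - 1


# Invented measurements, used only to show the call.
sample = [5.1, 4.9, 5.6, 5.8, 6.0, 5.3]
t_stat, df = one_sample_t(sample, mu0=5.0)
print(f"t = {t_stat:.3f} with {df} degrees of freedom")
```

The statistic on its own says little until it is compared with the t distribution for the given degrees of freedom, which is why the p-value is usually the number that gets reported.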

A p-value is the probability of obtaining sample data at least as extreme as the observed data, given that the null hypothesis is true. While not technically accurate, it is much easier to think of the p-value as support for the null hypothesis: the smaller the p-value, the weaker that support. Before the analysis, a threshold is chosen, called alpha or the level of significance, typically 0.05. If the calculated p-value is less than alpha, the null hypothesis is rejected in favor of the alternative. Said another way, the test is statistically significant. In STATISTICA, statistically significant p-values are reported in red.
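The decision rule described above can be sketched in a few lines of Python. The function name `decide` and its return strings are hypothetical, chosen only for this illustration; the default alpha of 0.05 is the typical threshold mentioned in the text.

```python
def decide(p_value, alpha=0.05):
    """Apply the usual decision rule: reject H0 only when p < alpha."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("a p-value must lie between 0 and 1")
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    return "reject H0" if p_value < alpha else "fail to reject H0"


print(decide(0.012))              # p below alpha
print(decide(0.30))               # p above alpha
print(decide(0.03, alpha=0.01))   # a stricter threshold changes the outcome
```

Note that alpha is an argument rather than a constant: the threshold is chosen before the analysis, and the same p-value can lead to different conclusions under different thresholds.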


Conclusion and Interpretation of Results

The p-value computed by the test leads you to reject or fail to reject the null hypothesis. (When the p-value is reported in red, reject the null hypothesis.) This conclusion should then be interpreted in terms of your project. A good interpretation will not mention hypotheses or test statistics. The interpretation will simply state the conclusion in the context of the problem.

Fail to Reject H0

When a test fails to reject the null hypothesis, it means that insufficient evidence exists to support the alternative hypothesis. Some examples of this include:

  • A significant difference does not exist between the population means of A and B.
  • The correlation between A and B is not significantly different from 0.
  • The distribution of the data is not significantly different from Normal.
  • The regression parameter does not explain a significant amount of the variability in y.  (The regression parameter is not significantly different from 0.)

The conclusion is not that the null hypothesis is accepted. The nonsignificant result may occur because the null hypothesis is true. It may also occur because random chance or too small a sample made it impossible to detect a real effect.

Reject H0

When a test does reject the null hypothesis, it does so in favor of the alternative hypothesis. The reject H0 conclusions for the same tests given above are:

  • A significant difference exists between the population means of A and B.  (Or, the population mean of A is significantly greater than the population mean of B.)
  • The correlation between A and B is significantly different from 0.
  • The distribution of the data is significantly different from Normal.
  • The regression parameter does explain a significant amount of the variability in y.  (The regression parameter is significantly different from 0.)


Example

In a pain relief study, researchers are studying the effects of the pain relief medicine, aspirin, compared to a placebo. Pain relief scores were recorded for two groups of people who were given either aspirin or the placebo. Greater pain relief scores indicate better pain relief. The hypothesis to test is that pain relief will be different for patients given the aspirin compared to the placebo. Let’s write these as statistical hypotheses.

H0: μaspirin = μplacebo
Ha: μaspirin ≠ μplacebo

The null hypothesis states that average pain relief for patients given aspirin is equal to the average pain relief for patients given a placebo. The alternative hypothesis (which is what the research team believes to be true) states that the average pain relief for patients given aspirin is not equal to the average pain relief for patients given the placebo.

According to the test output, the sample mean pain relief for patients given aspirin is 59.1, which is greater than the sample mean for the placebo, 56.3. These sample statistics are used to compute a test statistic for making inferences about the populations they represent. That test statistic is computed to be 2.09. It follows a Student's t distribution with N-2=16 degrees of freedom. The p-value for this test is 0.0527.
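The reported p-value can be checked approximately from the test statistic and the degrees of freedom alone. The sketch below uses only the Python standard library and integrates the Student's t density numerically with Simpson's rule. The function names `t_pdf` and `two_sided_p_value` are hypothetical, and this is not how STATISTICA computes the value. Because the statistic 2.09 is rounded, expect a result close to, but not exactly, 0.0527.

```python
import math


def t_pdf(x, df):
    """Density of the Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_p_value(t_stat, df, steps=2000):
    """Two-sided p-value: total area in both tails beyond |t_stat|.

    Integrates the density from 0 to |t_stat| with Simpson's rule
    (steps must be even) and uses the symmetry of the t distribution.
    Intended for moderate df; math.gamma overflows for very large df.
    """
    upper = abs(t_stat)
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = total * h / 3  # area between 0 and |t_stat|
    return 1.0 - 2.0 * central_area


p = two_sided_p_value(2.09, 16)
print(f"two-sided p-value for t = 2.09, df = 16: {p:.4f}")
```

A two-sided p-value is used here because the alternative hypothesis in the example is that the two means are not equal.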

To draw a conclusion, the p-value is compared to the alpha level of significance. In this case, alpha = 0.05. The p-value = 0.0527 > 0.05 = alpha. The test fails to reject the null hypothesis. The test does not show a significant difference between the mean pain relief for patients given aspirin and those given a placebo.

Conclusion: A significant difference does not exist between the population average pain relief for patients given aspirin vs. those given a placebo.

The conclusion is not that the means are equal, but that they are not significantly different. It is possible that a difference in average pain relief does exist between the two groups but went undetected. One possible reason is that the experiment did not collect enough data; with additional observations, the statistical power of the test improves. Another possibility is that random chance led to a sample with greater variability, or a different mean, than is typical of the population.