# Blog Archives

## Electronic Statistics Textbook

The only Internet Resource about Statistics Recommended by Encyclopedia Britannica

StatSoft has freely provided the Electronic Statistics Textbook as a public service for more than 17 years now.

This Textbook offers training in the understanding and application of statistics. The material was developed at the StatSoft R&D department based on many years of teaching undergraduate and graduate statistics courses and covers a wide variety of applications, including laboratory research (biomedical, agricultural, etc.), business statistics, credit scoring, forecasting, social science statistics and survey research, data mining, engineering and quality control applications, and many others.

The Electronic Textbook begins with an overview of the relevant elementary (pivotal) concepts and continues with a more in-depth exploration of specific areas of statistics. These areas are organized into “modules” and accessed through buttons that represent classes of analytic techniques. A glossary of statistical terms and a list of references for further study are included.

Proper citation:

- (Electronic Version): StatSoft, Inc. (2011). Electronic Statistics Textbook. Tulsa, OK: StatSoft. WEB: http://www.statsoft.com/textbook/.
- (Printed Version): Hill, T. & Lewicki, P. (2007). STATISTICS: Methods and Applications. StatSoft, Tulsa, OK.

Overview of Elementary Concepts in Statistics. In this introduction, we will briefly discuss those elementary statistical concepts that provide the necessary foundations for more specialized expertise in any area of statistical data analysis. The selected topics illustrate the basic assumptions of most statistical methods and/or have been demonstrated in research to be necessary components of one’s general understanding of the “quantitative nature” of reality (Nisbett, et al., 1987). Because of space limitations, we will focus mostly on the functional aspects of the concepts discussed and the presentation will be very short. Further information on each of those concepts can be found in the Introductory Overview and Examples sections of this manual and in statistical textbooks. Recommended introductory textbooks are: Kachigan (1986), and Runyon and Haber (1976); for a more advanced discussion of elementary theory and assumptions of statistics, see the classic books by Hays (1988), and Kendall and Stuart (1979).

## Save The Rhino – South Africa – Petition

Dear friends,

 The rhino is being hunted to the brink of extinction, driven by growing horn demand in Asia. But EU pressure on China and Vietnam can force international action to save the rhino — sign our petition today to ensure the EU acts!

The rhino is being hunted into extinction and could disappear forever unless we act now. Shocking new statistics show 440 rhinos were brutally killed last year in South Africa alone — a massive increase on five years ago when just 13 had their horns hacked off. European nations could lead the world to a new plan to save these amazing creatures but they need to hear from us first!

Fueling this devastation is a huge spike in demand for rhino horns, used for bogus cancer cures, hangover remedies and good luck charms in China and Vietnam. Protests from South Africa have so far been ignored by the authorities, but Europe has the power to change this by calling for a ban on all rhino trade — from anywhere, to anywhere — when countries meet at the next crucial international wildlife trade summit in July.

The situation is so dire that the threat has even spread to British zoos, which are on red alert for rhino-killing gangs! Let’s raise a giant outcry and urge Europe to push for new protections to save rhinos from extinction. When we reach 100,000 signers, our call will be delivered in Brussels, the decision-making heart of Europe, with a crash of cardboard rhinos. Every 50,000 signatures will add a rhino to the crash, bringing the size of our movement right to the door of EU delegates as they decide their position. Sign the petition below, then forward this email widely:

http://www.avaaz.org/en/save_rhinos/?vl

So far this year one rhino has been killed every day in South Africa, home to at least 80% of the world’s remaining wild rhinos. Horns now have a street value of over $65,000 a kilo, more than gold or platinum. The South African Environment Minister has pledged to deploy 150 extra wardens and even to build an electric fence along the Mozambique border to try to stem the attacks, but the threat is so severe that global action is required.

Unless we act today we may lose this magnificent and ancient animal species permanently. Some Chinese are loudly lobbying for the trade in horn to be relaxed, but banning the trade in all rhinos will silence them. With the EU’s leadership, we can bring these international gangsters to justice, put the poachers in prison, and push for public awareness programmes in key Asian countries — and end this horn horror show for good.

In the next few weeks, the EU will be setting its agenda for the next big global meeting in just a few months — our best chance of turning the tide against the slaughter. We know that rhinos will be on their agenda, but only our pressure can ensure they challenge the problem at its source. Let’s build a giant outcry and deliver it in a spectacular fashion — sign now and together we can stop the slaughter across Africa:

http://www.avaaz.org/en/save_rhinos/?vl

In 2010, Avaaz’s actions helped to stop the elephant ivory trade from exploding. In 2012, we can do the same for the rhino. When we speak out together, we can change the world — last year was the worst year ever for the rhino, but this can be the year when we win.

With hope,

Iain, Sam, Maria Paz, Emma, Ricken and the whole Avaaz team

Few Rhinos Survive Outside Protected Areas (WWF)
http://www.worldwildlife.org/species/finder/rhinoceros/rhinos.html

South Africa record for rhino poaching deaths (BBC)
http://www.bbc.co.uk/news/world-africa-15571678

‘Cure for cancer’ rumour killed off Vietnam’s rhinos (The Guardian)
http://www.guardian.co.uk/environment/2011/nov/25/cure-cancer-rhino-horn-vietnam

British Zoos on Alert as Rhino Poaching Hits the UK (International Business Times)

## Considering Alternatives to SAS?

Do you use SAS for predictive modeling, advanced analytics, business intelligence, insurance or financial applications, or data visualization?

### Why Choose STATISTICA?

SAS software is expensive and carries high, unpredictable annual licensing costs. SAS software is difficult to use, requiring specific SAS programming expertise, and it drives users toward dependency on SAS-specific solutions (e.g., their proprietary data warehouses). Data visualization is integral to analytics, but SAS’s graphics have major shortcomings.

STATISTICA has consistently been ranked highest in ease of use and customer satisfaction in independent surveys of analytics professionals. See the results of the most recent Rexer survey (2010), the largest survey of data mining professionals in the industry.

Lorraine@statsoft.co.za

## How to Summarize Data in STATISTICA Similar to Pivot Tables

To understand our data, it helps to summarize it. Pivot tables, as found in Microsoft Excel and other programs, summarize data and highlight important information, helping us extract meaning from it. Common pivot-table tasks are counting, summing, or averaging values for each class of a grouping factor. For example, we could find the total and average sales in dollars by region, and further group these figures by fiscal quarter. Such summary tables give at-a-glance information from a large database.
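To make the count/sum/average idea concrete, here is a minimal Python sketch of such a summary table outside STATISTICA. The sales records, region names, and the `summarize` helper are all hypothetical, chosen only to illustrate grouping by one factor (region) and then by two (region and fiscal quarter).

```python
from collections import defaultdict

# Hypothetical sales records: (region, fiscal quarter, sale amount in dollars)
sales = [
    ("North", "Q1", 120.0),
    ("North", "Q1", 80.0),
    ("North", "Q2", 200.0),
    ("South", "Q1", 50.0),
    ("South", "Q2", 150.0),
]

def summarize(records, key):
    """Return {group: (count, total, average)} for the given grouping key."""
    groups = defaultdict(list)
    for rec in records:
        groups[key(rec)].append(rec[2])
    return {g: (len(v), sum(v), sum(v) / len(v)) for g, v in groups.items()}

# One grouping factor, then two grouping factors
by_region = summarize(sales, key=lambda r: r[0])
by_region_quarter = summarize(sales, key=lambda r: (r[0], r[1]))

for group, (n, total, avg) in sorted(by_region.items()):
    print(f"{group}: n={n}, total=${total:.2f}, average=${avg:.2f}")
```

Passing the grouping key as a function keeps one helper for both the one-factor and two-factor summaries, much as a pivot table lets you drop an extra field into the row area.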

In this example, we are interested in exploring a database of daily rain totals. The data come from the Australian Bureau of Meteorology (http://www.bom.gov.au/climate/data/).

To start out, we want to summarize the rainfall data by year. To do this, select the Statistics tab. In the Base group, click Basic Statistics to display the Basic Statistics and Tables Startup Panel. Select Breakdown; non-factorial tables.

Click OK to display the Statistics BreakDown (non-factorial) dialog box.

Click the Variables button. In the Select the dependent variables and grouping variables dialog box, select the continuous variable Rainfall amount (millimeters) in the Dependent variables list and Year in the Grouping variables list.

Click the OK button.

In the Statistics BreakDown (non-factorial) dialog box, click the Summary button to create the output. The result is a table listing, for each year, the average rainfall, the number of valid cases (count), and the standard deviation.
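For readers who want to check what this breakdown computes, the Python sketch below reproduces the same three statistics per year on a handful of hypothetical daily records (not the Bureau of Meteorology data). It assumes the standard deviation is the sample standard deviation (n − 1 denominator); the `all_groups` tuple mimics the All groups row that appears at the bottom of the output.

```python
import statistics
from collections import defaultdict

# Hypothetical daily records: (year, month, rainfall in millimeters)
daily = [
    (2010, 1, 0.0), (2010, 1, 12.4), (2010, 2, 3.2),
    (2011, 1, 5.0), (2011, 2, 0.0), (2011, 2, 7.0),
]

by_year = defaultdict(list)
for year, _month, mm in daily:
    by_year[year].append(mm)

# Breakdown table: mean, valid N, and sample standard deviation per year
breakdown = {
    year: (statistics.mean(v), len(v), statistics.stdev(v))
    for year, v in sorted(by_year.items())
}

# "All groups" row: the same statistics computed over every case
all_mm = [mm for _y, _m, mm in daily]
all_groups = (statistics.mean(all_mm), len(all_mm), statistics.stdev(all_mm))

for year, (mean, n, sd) in breakdown.items():
    print(f"{year}: mean={mean:.2f} mm, N={n}, SD={sd:.2f}")
```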

Return to the Statistics BreakDown (non-factorial) dialog box, and select the Descriptives tab to view the statistics that can be computed. The mean is computed by default; other statistics can be added or removed.

Next, we want to find the average rainfall broken down by year and month. Clear the Standard Deviation and Valid N check boxes.

Now, select the Quick tab. Click the Variables button, and add Month to the Grouping Variables. Create the Summary output.
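Adding Month as a second grouping variable simply means the mean is taken within each Year and Month combination. A minimal Python sketch of that step, again on hypothetical daily records, produces the same long layout described next: one row per combination, with Year and Month in separate columns.

```python
import statistics
from collections import defaultdict

# Hypothetical daily records: (year, month, rainfall in millimeters)
daily = [
    (2010, 1, 0.0), (2010, 1, 12.4), (2010, 2, 3.2),
    (2011, 1, 5.0), (2011, 2, 0.0), (2011, 2, 7.0),
]

# Collect the daily values for each (year, month) cell
cells = defaultdict(list)
for year, month, mm in daily:
    cells[(year, month)].append(mm)

# Long-format summary: one row per Year and Month combination
summary = [
    (year, month, statistics.mean(values))
    for (year, month), values in sorted(cells.items())
]
for row in summary:
    print(row)
```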

This output lists Year and Month in separate columns. A typical pivot table would instead list one variable across the columns and the other down the rows. A simple data management step can rearrange this output that way.

Notice that the bottom of the output contains an All groups entry. Because this row summarizes all years and months rather than a single Year and Month combination, it should be removed. In STATISTICA, select the Data tab. In the Cases group, click the Cases arrow, and select Delete to display the Delete Cases dialog box. Select the last case, case number 302.

Click OK to delete.

Now, on the Data tab, in the Transformations group, click Stack to display the Unstacking/Stacking dialog box. Click Variables to display the Select Unstacking Variables dialog box. Select Month in the Code (column) variables list, Rainfall amount (millimeters) in the Unstack (value) variables list, and Year in the Case ID (row) variables list.

Click the OK button.

Accept the default settings in the Unstacking/Stacking dialog box, and click OK to create the new table of output.

This output shows average rainfall amounts by year and month in a compact, easy-to-read table.
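The delete-then-unstack procedure above can be sketched in a few lines of Python. The long-format rows below are hypothetical; Year plays the role of the case ID (row), Month the code (column), and the mean rainfall the unstacked value, mirroring the selections made in the Select Unstacking Variables dialog box.

```python
# Long-format Year and Month means (hypothetical values), as produced by the breakdown
long_rows = [
    (2010, 1, 6.2), (2010, 2, 3.2),
    (2011, 1, 5.0), (2011, 2, 3.5),
    ("All groups", None, 4.6),  # summary row to drop before unstacking
]

# Drop the "All groups" row, as in the Delete Cases step
rows = [r for r in long_rows if r[0] != "All groups"]

# Unstack: Year becomes the row (case ID), Month the column (code), mean the value
months = sorted({month for _year, month, _value in rows})
table = {}
for year, month, value in rows:
    table.setdefault(year, {})[month] = value

print("Year", *months, sep="\t")
for year in sorted(table):
    print(year, *(table[year].get(m, "") for m in months), sep="\t")
```

Leaving the All groups row in would add a spurious "All groups" row (with an empty month column) to the unstacked table, which is why it is deleted first.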

## Life, Disability, Automotive, Health, Property and Casualty, etc.

Companies in the insurance industry are using STATISTICA Data Miner to make more effective and competitive use of historical data, applying the latest predictive modelling and data mining approaches to recognize patterns within terabytes of data. STATISTICA Data Miner allows companies to predict trends in customers’ behaviours and responses, claims, and losses.

Major successes and savings have been achieved by companies using STATISTICA Data Miner for predictive modelling for rate making, fraud detection, and customer segmentation.

## Areas of Application

### Rate making

STATISTICA Data Miner identifies the most important root causes of the frequency and magnitude of historical losses. Predictive models relating these primary factors to loss frequency and magnitude are then used to update rate tables, making insurers’ policy rates more accurate and competitive than those produced by more traditional rate making approaches. In the past, General Linear Models were the industry-standard approach. Now, more effective prediction of losses is achieved with predictive modelling techniques such as recursive partitioning (i.e., “tree methods”).
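Recursive partitioning works by repeatedly splitting policies into two groups on the predictor value that best separates the outcome. The minimal Python sketch below shows one such split step on hypothetical driver-age and claim-count data, using the reduction in squared error as the split criterion. That criterion is an assumption made for illustration; STATISTICA’s tree modules have their own split criteria and stopping rules.

```python
# Hypothetical policies: (driver age, number of claims in the year)
policies = [(19, 2), (22, 1), (24, 1), (35, 0), (41, 0), (52, 1), (60, 0), (67, 0)]

def sse(values):
    """Sum of squared deviations from the mean (0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def best_split(data):
    """Find the age threshold that minimizes the pooled SSE of claim counts.

    This is a single step of recursive partitioning; a full tree would
    apply it again within each resulting group until a stopping rule is met.
    """
    xs = sorted({x for x, _ in data})
    best = None
    for lo, hi in zip(xs, xs[1:]):
        threshold = (lo + hi) / 2  # candidate cut halfway between adjacent ages
        left = [y for x, y in data if x <= threshold]
        right = [y for x, y in data if x > threshold]
        cost = sse(left) + sse(right)
        if best is None or cost < best[1]:
            best = (threshold, cost)
    return best

threshold, cost = best_split(policies)
print(f"split at age {threshold}: pooled SSE = {cost:.3f}")
```

Here the best cut falls between ages 24 and 35, separating the younger, claim-prone drivers from the rest; a rate table would then assign each resulting group its own expected claim frequency.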

### Customer segmentation

STATISTICA Data Miner’s Clustering module can be used for customer segmentation, grouping the entire customer base into clusters on the basis of various demographic and behavioural factors. These clusters can then be used in a variety of predictive modelling applications to determine how well they predict outcomes of interest.
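As an illustration of what grouping customers into clusters involves, here is a minimal Python sketch of k-means (Lloyd’s algorithm), one common clustering technique. It is used here only as an example and is not necessarily the method STATISTICA applies. The customer features and starting centers are hypothetical; in practice features on different scales, such as age and premium, should be standardized first.

```python
import math

# Hypothetical customers: (age in years, annual premium in thousands)
customers = [(23, 1.1), (25, 1.3), (27, 1.0), (55, 3.9), (58, 4.2), (61, 4.0)]

def nearest(point, centers):
    """Index of the center closest to point (Euclidean distance)."""
    return min(range(len(centers)), key=lambda i: math.dist(point, centers[i]))

def kmeans(points, centers, iterations=10):
    """Plain k-means: assign points to the nearest center, then recompute means."""
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for p in points:
            clusters[nearest(p, centers)].append(p)
        # Move each center to the mean of its points (keep it if the cluster is empty)
        centers = [
            tuple(sum(coord) / len(cl) for coord in zip(*cl)) if cl else centers[i]
            for i, cl in enumerate(clusters)
        ]
    labels = [nearest(p, centers) for p in points]
    return centers, labels

centers, labels = kmeans(customers, centers=[customers[0], customers[-1]])
print(centers, labels)
```

The resulting labels could then be fed into a predictive model as a segment variable, which is the use the paragraph above describes.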

### Fraud detection

Claims fraud is a significant concern, costing insurance companies several billion dollars annually. Losses due to fraud have increased dramatically over the past ten years, and despite insurers’ countermeasures, a large amount of fraud remains undetected.

STATISTICA Data Miner helps insurance companies anticipate and quickly detect fraud and take immediate action to minimize costs. With sophisticated data mining tools, millions of claims can be searched to spot patterns and detect even subtle variations in billing practices, by analyzing above-normal payouts across factors such as geographic region, agent, and insured party.
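One simple way to express “above-normal payouts” for a factor such as region is a within-group z-score. The Python sketch below flags claims paying out more than a chosen number of standard deviations above their own region’s mean. The claims, the `flag_high_payouts` helper, and the 1.5 threshold are hypothetical; with small groups an extreme claim inflates the standard deviation and limits how large its z-score can get, so a real threshold would need tuning.

```python
import statistics
from collections import defaultdict

# Hypothetical claims: (claim id, region, payout in dollars)
claims = [
    ("C1", "East", 1000), ("C2", "East", 1100), ("C3", "East", 950),
    ("C4", "East", 1050), ("C5", "East", 4000),
    ("C6", "West", 2000), ("C7", "West", 2100), ("C8", "West", 1900),
]

def flag_high_payouts(claims, threshold=1.5):
    """Flag claims whose payout is more than `threshold` sample standard
    deviations above the mean payout of their own region."""
    by_region = defaultdict(list)
    for _cid, region, payout in claims:
        by_region[region].append(payout)
    stats = {r: (statistics.mean(v), statistics.stdev(v)) for r, v in by_region.items()}
    flagged = []
    for cid, region, payout in claims:
        mean, sd = stats[region]
        if sd > 0 and (payout - mean) / sd > threshold:
            flagged.append(cid)
    return flagged

print(flag_high_payouts(claims))
```

Grouping by agent or insured party instead of region only changes which field is used as the dictionary key.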

Specifically for health insurance, STATISTICA Data Miner’s Association Rules module can be used to analyze claim forms. With it, the payer can find relationships among medical procedures performed together, patterns in diagnoses and procedures across providers, and so on.
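Association rules are usually scored by support (how often items occur together) and confidence (how often the right-hand item appears given the left-hand one). The Python sketch below computes both for pairs of procedures on hypothetical claim forms. It is limited to two-item rules and uses arbitrary thresholds, so it is a simplified illustration rather than a full association rules algorithm such as Apriori.

```python
from collections import Counter
from itertools import combinations

# Hypothetical claim forms, each listing the procedures billed together
claim_forms = [
    {"X-ray", "Cast"},
    {"X-ray", "Cast", "Physio"},
    {"X-ray", "Physio"},
    {"Blood test"},
    {"X-ray", "Cast"},
]

def pair_rules(transactions, min_support=0.4, min_confidence=0.6):
    """Return rules (lhs, rhs, support, confidence) for item pairs meeting both thresholds."""
    n = len(transactions)
    item_counts = Counter(item for t in transactions for item in t)
    pair_counts = Counter(
        pair for t in transactions for pair in combinations(sorted(t), 2)
    )
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n  # share of forms containing both procedures
        if support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]  # P(rhs | lhs)
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return rules

rules = pair_rules(claim_forms)
for lhs, rhs, support, confidence in rules:
    print(f"{lhs} -> {rhs}: support={support:.2f}, confidence={confidence:.2f}")
```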

### Claims analysis

STATISTICA Data Miner helps users understand subtle business trends in claims that would otherwise be difficult to spot.

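The STATISTICA Generalized Linear Models module supports the Tweedie distribution. This distribution is a flexible predictive modelling option because it can model data that combine exact zeros with positive continuous values, such as claim amounts where many policies have no claims at all.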
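To see why a Tweedie model fits such data, the Python sketch below simulates the compound Poisson-gamma case (variance power 1 < p < 2): a Poisson number of claims, each with a gamma-distributed size, so a policy with no claims contributes an exact zero. It uses the standard parameterization with mean μ, dispersion φ, and variance φμ^p, under which the claim rate is λ = μ^(2-p) / (φ(2-p)) and P(Y = 0) = e^(-λ). The parameter values are arbitrary and chosen only for illustration.

```python
import math
import random

def sample_poisson(lam, rng):
    """Knuth's method: multiply uniform draws until the product falls to e^-lam
    or below; the number of extra draws needed is Poisson(lam)."""
    limit = math.exp(-lam)
    k, product = 0, rng.random()
    while product > limit:
        k += 1
        product *= rng.random()
    return k

def sample_tweedie(mu, phi, p, rng):
    """Draw one compound Poisson-gamma (Tweedie, 1 < p < 2) value."""
    lam = mu ** (2 - p) / (phi * (2 - p))   # expected number of claims
    shape = (2 - p) / (p - 1)               # gamma shape of each claim size
    scale = phi * (p - 1) * mu ** (p - 1)   # gamma scale of each claim size
    n = sample_poisson(lam, rng)
    return sum(rng.gammavariate(shape, scale) for _ in range(n))  # 0 when n == 0

rng = random.Random(42)
mu, phi, p = 1.0, 2.0, 1.5
draws = [sample_tweedie(mu, phi, p, rng) for _ in range(20000)]
zero_share = sum(d == 0 for d in draws) / len(draws)
mean = sum(draws) / len(draws)
print(f"share of exact zeros: {zero_share:.3f} (theory: {math.exp(-1):.3f})")
print(f"sample mean: {mean:.3f} (theory: {mu})")
```

With these values λ = 1, so roughly 37% of simulated policies have a total of exactly zero while the rest are positive and continuous, the mix that a Tweedie generalized linear model is designed to handle.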

### Predict which customers will buy new policies

STATISTICA Data Miner provides the insurance firm with reporting, tracking, and analysis tools to identify trends. Its sequential pattern mining functions can detect groups of customers associated with frequent buying patterns, which can inform future sales and marketing campaigns and tactics.
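Sequential pattern mining looks for purchases that tend to happen in a particular order. The Python sketch below counts, on hypothetical purchase histories, the share of customers who bought policy A and later policy B. It only considers two-step sequences and uses an arbitrary 50% support cutoff, so it is a simplified illustration of the idea rather than STATISTICA’s algorithm.

```python
from collections import Counter
from itertools import combinations

# Hypothetical purchase histories, in time order, one list per customer
histories = {
    "cust1": ["Auto", "Home", "Life"],
    "cust2": ["Auto", "Home"],
    "cust3": ["Home", "Auto"],
    "cust4": ["Auto", "Life"],
}

def sequence_support(histories):
    """Fraction of customers whose history contains A followed (later) by B."""
    counts = Counter()
    for purchases in histories.values():
        # combinations() preserves list order, so each pair is (earlier, later);
        # the set counts each ordered pair at most once per customer
        counts.update({(a, b) for a, b in combinations(purchases, 2) if a != b})
    return {pair: c / len(histories) for pair, c in counts.items()}

support = sequence_support(histories)
frequent = {pair: s for pair, s in support.items() if s >= 0.5}
print(frequent)
```

Customers who match the start of a frequent sequence (for example, Auto only) but not its end would be natural targets for the next campaign.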

## How to Save a Microsoft Word Document in STATISTICA

STATISTICA can output your results, including tables and graphs, to a Microsoft Word document. This feature makes creating your final analysis report easy: you no longer need to copy and paste results into Word. Microsoft Word output also makes it easy to share analysis results with colleagues, whether or not they use STATISTICA.

If you use the 64-bit version of STATISTICA with the 32-bit version of Microsoft Word, problems may arise when saving the Microsoft Word document: the Save As dialog box defaults to the Rich Text Format (*.rtf) file type rather than *.docx.

The document can still be saved as a Word document (*.docx), but how to do so may not be obvious. These simple steps let you save a Microsoft Word document created in STATISTICA as a Word document rather than only as a Rich Text File.

1. Change the Save as type to All Files (*.*)
2. Type the desired file name, adding the Microsoft Word file extension (.docx). The file will then be saved as a Microsoft Word document, as expected.

This Microsoft Word document can be opened and edited in Word.