Blog Archives

Electronic Statistics Textbook

The only Internet Resource about Statistics Recommended by Encyclopedia Britannica

StatSoft has freely provided the Electronic Statistics Textbook as a public service for more than 17 years now.

This Textbook offers training in the understanding and application of statistics. The material was developed at the StatSoft R&D department based on many years of teaching undergraduate and graduate statistics courses and covers a wide variety of applications, including laboratory research (biomedical, agricultural, etc.), business statistics, credit scoring, forecasting, social science statistics and survey research, data mining, engineering and quality control applications, and many others.

The Electronic Textbook begins with an overview of the relevant elementary (pivotal) concepts and continues with a more in-depth exploration of specific areas of statistics, organized by “modules” and accessible by buttons representing classes of analytic techniques. A glossary of statistical terms and a list of references for further study are included.

Proper citation
(Electronic Version): StatSoft, Inc. (2011). Electronic Statistics Textbook. Tulsa, OK: StatSoft. WEB: http://www.statsoft.com/textbook/.
(Printed Version): Hill, T. & Lewicki, P. (2007). STATISTICS: Methods and Applications. StatSoft, Tulsa, OK.

 

 


Overview of Elementary Concepts in Statistics. In this introduction, we will briefly discuss those elementary statistical concepts that provide the necessary foundations for more specialized expertise in any area of statistical data analysis. The selected topics illustrate the basic assumptions of most statistical methods and/or have been demonstrated in research to be necessary components of one’s general understanding of the “quantitative nature” of reality (Nisbett, et al., 1987). Because of space limitations, we will focus mostly on the functional aspects of the concepts discussed and the presentation will be very short. Further information on each of those concepts can be found in the Introductory Overview and Examples sections of this manual and in statistical textbooks. Recommended introductory textbooks are: Kachigan (1986), and Runyon and Haber (1976); for a more advanced discussion of elementary theory and assumptions of statistics, see the classic books by Hays (1988), and Kendall and Stuart (1979).


Text Mining Insurance Losses – Video

To watch the video, please click here

 

Save The Rhino – South Africa – Petition

Dear friends,

The rhino is being hunted to the brink of extinction, driven by growing horn demand in Asia. But EU pressure on China and Vietnam can force international action to save the rhino — sign our petition today to ensure the EU acts!

The rhino is being hunted into extinction and could disappear forever unless we act now. Shocking new statistics show 440 rhinos were brutally killed last year in South Africa alone — a massive increase on five years ago when just 13 had their horns hacked off. European nations could lead the world to a new plan to save these amazing creatures but they need to hear from us first!

To sign the petition click here

Fueling this devastation is a huge spike in demand for rhino horns, used for bogus cancer cures, hangover remedies and good luck charms in China and Vietnam. Protests from South Africa have so far been ignored by the authorities, but Europe has the power to change this by calling for a ban on all rhino trade — from anywhere, to anywhere — when countries meet at the next crucial international wildlife trade summit in July.

The situation is so dire that the threat has even spread into British zoos who are on red-alert for rhino killing gangs! Let’s raise a giant outcry and urge Europe to push for new protections to save rhinos from extinction. When we reach 100,000 signers, our call will be delivered in Brussels, the decision-making heart of Europe, with a crash of cardboard rhinos. Every 50,000 signatures will add a rhino to the crash — bringing the size of our movement right to the door of EU delegates as they decide their position. Sign the petition below then forward this email widely:

http://www.avaaz.org/en/save_rhinos/?vl

So far this year one rhino has been killed every day in South Africa, home to at least 80% of the world’s remaining wild rhinos. Horns now have a street value of over $65,000 a kilo — more expensive than gold or platinum. The South African Environment Minister has pledged to take action by putting 150 extra wardens and even an electric fence along the Mozambique border to try and stem the attacks — but the scale of the threat is so severe that global action is required.

Unless we act today we may lose this magnificent and ancient animal species permanently. Some Chinese are loudly lobbying for the trade in horn to be relaxed, but banning the trade in all rhinos will silence them. With the EU’s leadership, we can bring these international gangsters to justice, put the poachers in prison, and push for public awareness programmes in key Asian countries — and end this horn horror show for good.

In the next few weeks, the EU will be setting its agenda for the next big global meeting in just a few months — our best chance of turning the tide against the slaughter. We know that rhinos will be on their agenda, but only our pressure can ensure they challenge the problem at its source. Let’s build a giant outcry and deliver it in a spectacular fashion — sign now and together we can stop the slaughter across Africa:

http://www.avaaz.org/en/save_rhinos/?vl

In 2010, Avaaz’s actions helped to stop the elephant ivory trade from exploding. In 2012, we can do the same for the rhino. When we speak out together, we can change the world — last year was the worst year ever for the rhino, but this can be the year when we win.

With hope,

Iain, Sam, Maria Paz, Emma, Ricken and the whole Avaaz team

More Information:

Few Rhinos Survive Outside Protected Areas (WWF)
http://www.worldwildlife.org/species/finder/rhinoceros/rhinos.html

South Africa record for rhino poaching deaths (BBC)
http://www.bbc.co.uk/news/world-africa-15571678

‘Cure for cancer’ rumour killed off Vietnam’s rhinos (The Guardian)
http://www.guardian.co.uk/environment/2011/nov/25/cure-cancer-rhino-horn-vietnam

British Zoos on Alert as Rhino Poaching Hits the UK (International Business Times)
http://www.ibtimes.co.uk/articles/289792/20120130/british-zoos-uk-alert-rhino-poaching-hits.htm

STATISTICA SAS – Considering Alternatives to SAS?

Do you use SAS for predictive modeling, advanced analytics, business intelligence, insurance or financial applications, or data visualization?

* Why Choose STATISTICA
* Quotes from SAS Customers
* How to Proceed?

Why Choose STATISTICA?

SAS software is expensive and carries high, unpredictable annual licensing costs. SAS software is difficult to use, requiring specific SAS programming expertise, and it drives users toward dependency on only SAS-specific solutions (e.g., their proprietary data warehouses). Data visualization is integral for analytics, but SAS’s graphics have major shortcomings.

STATISTICA has consistently been ranked the highest in ease of use and customer satisfaction in independent surveys of analytics professionals. Click here to see the results of the most recent Rexer survey (2010), the largest survey of data mining professionals in the industry.

We offer the breadth of analytics capabilities and performance, including the most comprehensive data mining solution on the market, using more open, modern technologies. StatSoft software is designed to facilitate interfacing with all industry standard components of your computer infrastructure (e.g., ultra-fast integration with Oracle, MS SQL Server, and other databases) instead of locking you into proprietary standards and total dependence on one vendor.

STATISTICA is significantly faster than SAS. StatSoft is a Software Partner of Intel and has developed technologies that leverage Intel CPU architecture to deliver unmatched parallel processing performance (press release with Intel) and rapidly process terabytes of data. StatSoft’s robust, cutting-edge enterprise system technology drives the analytics and analytic data management at some of the largest computer infrastructures in the world at Fortune 100 and Fortune 500 companies.

Quotes from SAS Customers

“We acquired our SAS license seven years ago and quickly learned that with SAS, you do not pay just an annual renewal and support fee – you practically have to “buy” the software again every year. Our first year renewal fee was already 60% of the initial purchase price, and it increased steadily and every year. Two years ago, our annual fee exceeded the initial purchase price we paid, and it keeps going up much faster than the inflation. This is not sustainable.” – CEO, Technology Company

“It took 8 weeks to install SAS Enterprise Miner. The installer just didn’t work. And we’re a midsize company, so we were a low priority for SAS’s technical support.” – Engineer, Chemical Company

“Early in our evaluation, we eliminated SAS from our consideration of fraud detection solutions primarily due to the exorbitant cost.” – Chief Actuary, Insurance Company

“We had used SAS on-demand for my data mining class. A few days before finals, all of our students’ project files were corrupted. Our SAS technical support representative confirmed there was nothing that could be done to restore the files. We’re switching to STATISTICA.” – University Professor

“Now, all graduate students use R. It is getting more difficult to find SAS programmers.” – Head of Statistics, Pharmaceutical Company

“We used SAS until May 2009 when we converted to WPS. The conversion went remarkably smoothly and was completed on time. Not only did we save a substantial amount in licensing fees, we also regained functionality such as Graphs that we had previously removed because of the cost.” – Survey respondent on KDNuggets.com

How to Proceed?

StatSoft makes it easy to transition your current SAS environment to STATISTICA, either gradually or all at once. STATISTICA offers:

* Direct import/export to SAS files
* Deployment of predictive models to SAS code to score against SAS data sets
* Native integration to run R programs

For a limited time, we offer qualifying customers a special upgrade program, MSP (Migration from SAS Program), in which the initial software acquisition cost is guaranteed to be below your current SAS annual renewal cost (and StatSoft annual fees are guaranteed to remain at only 20% of the initial cost, adjusted for CPI). As part of MSP, we also offer discounted migration service fees if our consultants are engaged to facilitate the migration process.

For more information and for specific recommendations to suit your needs, please contact one of our representatives:

 

Lorraine@statsoft.co.za

How to Summarize Data in STATISTICA Similar to Pivot Tables

Click here to download the data file so you’ll be able to work through the example.

To gain understanding of our data, it is helpful to summarize it.  Pivot tables, as found in Microsoft Excel and other programs, are used to summarize data and highlight important information. These tables can help us to extract meaning from data. Common tasks for pivot tables are to count, sum, or average. This is typically performed for classes of a grouping factor. For example, we could find the total sales in dollars and average sales in dollars grouped by region. These sales figures could further be grouped by fiscal quarter.  We can produce at-a-glance information from a large database with these summary tables.
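The sales example above can be sketched outside any particular program. The following is a minimal illustration in Python, not how Excel or STATISTICA computes its tables; the records and the `summarize` helper are hypothetical and exist only to show counting, summing, and averaging per group.

```python
from collections import defaultdict

# Hypothetical sales records: (region, fiscal quarter, sales in dollars).
sales = [
    ("North", "Q1", 1200.0),
    ("North", "Q2", 1500.0),
    ("South", "Q1", 900.0),
    ("South", "Q2", 1100.0),
    ("South", "Q2", 1000.0),
]

def summarize(records, key):
    """Group records by key(record); return {group: (count, total, average)}."""
    groups = defaultdict(list)
    for record in records:
        groups[key(record)].append(record[2])
    return {
        group: (len(values), sum(values), sum(values) / len(values))
        for group, values in groups.items()
    }

# One grouping factor: region.
by_region = summarize(sales, key=lambda r: r[0])
# Two grouping factors: region, then fiscal quarter.
by_region_quarter = summarize(sales, key=lambda r: (r[0], r[1]))

for group, (count, total, average) in sorted(by_region.items()):
    print(group, count, total, average)
```

Passing a different `key` function is all it takes to add a second grouping factor, which mirrors adding another field to a pivot table.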

In this example, we are interested in exploring a database of daily rain totals. The data come from the Australian Bureau of Meteorology (http://www.bom.gov.au/climate/data/).

To start, we want to summarize the rainfall data by year. To do this, select the Statistics tab. In the Base group, click Basic Statistics to display the Basic Statistics and Tables Startup Panel. Select Breakdown; non-factorial tables.

Click OK to display the Statistics BreakDown (non-factorial) dialog box.

Click the Variables button. In the Select the dependent variables and grouping variables dialog box, select the continuous variable Rainfall amount (millimeters) in the Dependent variables list and Year in the Grouping variables list.

Click the OK button.

In the Statistics BreakDown (non-factorial) dialog box, click the Summary button to create the output. The result is a table with the mean daily rainfall, the number of observations, and the standard deviation for each year.
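For readers who want to see what such a breakdown computes, here is a minimal Python sketch. It is not STATISTICA's implementation: the records are hypothetical, the function names are made up for illustration, and the sample standard deviation (n - 1 in the denominator) is an assumption about which formula is reported.

```python
import statistics
from collections import defaultdict

# Hypothetical daily records: (year, month, rainfall in millimeters).
daily_rain = [
    (2010, 1, 0.0), (2010, 1, 4.0), (2010, 2, 8.0),
    (2011, 1, 2.0), (2011, 2, 0.0), (2011, 2, 10.0),
]

def describe(values):
    """Mean, number of observations, and sample standard deviation."""
    return {
        "mean": statistics.mean(values),
        "n": len(values),
        "sd": statistics.stdev(values),  # assumption: sample SD (n - 1)
    }

def breakdown_by_year(records):
    """One row of descriptive statistics per year, plus an overall row."""
    groups = defaultdict(list)
    for year, _month, rainfall in records:
        groups[year].append(rainfall)
    table = {year: describe(values) for year, values in sorted(groups.items())}
    # Overall row computed from all records, like the "All groups" entry.
    table["All groups"] = describe([rainfall for _y, _m, rainfall in records])
    return table

yearly = breakdown_by_year(daily_rain)
for group, stats in yearly.items():
    print(group, stats["mean"], stats["n"], round(stats["sd"], 3))
```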

Return to the Statistics BreakDown (non-factorial) dialog box, and select the Descriptives tab to view statistics that can be computed.  The mean is computed by default, and other statistics can be added or removed.

Next, we want to find the average rainfall broken down by year and month. Clear the Standard Deviation and Valid N check boxes.

Now, select the Quick tab. Click the Variables button, and add Month to the Grouping Variables. Create the Summary output.

This output lists Year and Month in columns. Most pivot tables would instead list one variable across the top and the other down the side. A simple data management step produces that layout from this output.

Notice that the last row of the output is an entry for All groups. This row should be removed before the table is rearranged. In STATISTICA, select the Data tab. In the Cases group, click the Cases arrow, and select Delete to display the Delete Cases dialog box. Select the last case, case number 302.

Click OK to delete.

Now, on the Data tab, in the Transformations group, click Stack to display the Unstacking/Stacking dialog box. Click Variables to display the Select Unstacking Variables dialog box. Select Month in the Code (column) variables list, Rainfall amount (millimeters) in the Unstack (value) variables list, and Year in the Case ID (row) variables list.

Click the OK button.

Accept the default settings in the Unstacking/Stacking dialog box, and click OK to create the new table of output.

This output shows average rainfall amounts by year and month in a compact, easy-to-read table.
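The unstacking step can be pictured as turning a long list of (Year, Month, mean) rows into a grid with one row per year and one column per month. The Python sketch below illustrates that idea only; the data are hypothetical and the variable names are not STATISTICA's.

```python
import statistics
from collections import defaultdict

# Hypothetical daily records: (year, month, rainfall in millimeters).
daily_rain = [
    (2010, 1, 0.0), (2010, 1, 4.0), (2010, 2, 8.0),
    (2011, 1, 2.0), (2011, 2, 0.0), (2011, 2, 10.0),
]

# Step 1: long format, one mean per (year, month) combination.
cells = defaultdict(list)
for year, month, rainfall in daily_rain:
    cells[(year, month)].append(rainfall)
means = {key: statistics.mean(values) for key, values in cells.items()}

# Step 2: unstack, with Year as the row ID and Month as the column code.
years = sorted({year for year, _month in means})
months = sorted({month for _year, month in means})
# A missing (year, month) combination becomes None rather than an error.
wide = {year: [means.get((year, month)) for month in months] for year in years}

print("Year", *months)
for year in years:
    print(year, *wide[year])
```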

STATISTICA Data Miner Predictive Modelling Solutions for the Insurance Industry

Life, Disability, Automotive, Health, Property and Casualty, etc.

Companies in the insurance industry are using STATISTICA Data Miner to be more effective and competitive in the utilization of historical data, using the latest predictive modelling and data mining approaches to recognize patterns within terabytes of data. STATISTICA Data Miner allows companies to predict trends in customers’ behaviours and responses, claims, and losses.

Major successes and savings have been achieved by companies using STATISTICA Data Miner for predictive modelling for rate making, fraud detection, and customer segmentation.

Areas of Application

Rate making

STATISTICA Data Miner identifies the most important root causes in the frequency and magnitude of historical losses. Predictive Models relating these primary factors to the frequency and magnitude of losses are then used to update rate tables accordingly, making the insurers more accurate and competitive in their policy rates when compared to more traditional rate making approaches. In the past, General Linear Models were the industry standard approach. Now, more effective prediction of losses is achieved through the use of predictive modelling techniques such as recursive partitioning (i.e., “tree methods”).

Customer segmentation

STATISTICA Data Miner’s Clustering module may be used for customer segmentation, by grouping the entire customer base into clusters, identified on the basis of various demographic and behavioural factors. These clusters can then be used for a variety of predictive modelling applications to determine the efficacy of the clusters in predicting outcomes of interest.

Fraud detection

Claims fraud is a significant and costly concern, costing insurance companies several billion dollars annually. Losses due to fraud have increased dramatically in the past ten years. Despite actions by insurance companies, a large amount of fraud remains undetected.

STATISTICA Data Miner helps the insurance company anticipate and quickly detect fraud and take immediate action to minimize costs. Through the use of sophisticated data mining tools, millions of claims can be searched to spot patterns and detect even subtle variations in billing practices, by analyzing above normal payoffs along different factors like geographical region, agent, and insured party. 

Specifically for health insurance, STATISTICA Data Miner’s Association Rules module may be used to analyze claim forms. Using this module, the payer will be able to find relationships among medical procedures performed together, patterns in diagnoses and procedures across providers, etc.
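To make "relationships among procedures performed together" concrete, here is a minimal Python sketch of the two standard measures behind association rules, support and confidence, computed for pairs of items. It is an illustration under stated assumptions, not the algorithm used by STATISTICA Data Miner: the claims and procedure codes are hypothetical, and only two-item rules are considered.

```python
from collections import Counter
from itertools import combinations

# Hypothetical claims, each a set of procedure codes billed together.
claims = [
    {"P1", "P2"},
    {"P1", "P2", "P3"},
    {"P1", "P3"},
    {"P2"},
]

def pair_rules(transactions):
    """Return {(x, y): (support, confidence)} for every rule "x implies y".

    support    = share of transactions containing both x and y
    confidence = share of transactions containing x that also contain y
    """
    n = len(transactions)
    item_counts = Counter()
    pair_counts = Counter()
    for items in transactions:
        item_counts.update(items)
        pair_counts.update(combinations(sorted(items), 2))
    rules = {}
    for (x, y), count in pair_counts.items():
        support = count / n
        rules[(x, y)] = (support, count / item_counts[x])
        rules[(y, x)] = (support, count / item_counts[y])
    return rules

rules = pair_rules(claims)
for (x, y), (support, confidence) in sorted(rules.items()):
    print(f"{x} -> {y}: support={support:.2f}, confidence={confidence:.2f}")
```

In this toy data, every claim containing P3 also contains P1, so the rule "P3 implies P1" has a confidence of 1.0 even though its support is only 0.5.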

PDF Insurance Fraud Detection Case Study

Claims analysis

STATISTICA Data Miner helps users understand subtle business trends in claims, which would have been otherwise difficult to spot.

STATISTICA Generalized Linear Models supports the Tweedie distribution, a flexible option for predictive modelling that can accommodate data containing exact zeros alongside continuous positive values.
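One way to see why a single distribution can mix exact zeros with continuous amounts: Tweedie distributions with a power parameter between 1 and 2 are compound Poisson-gamma distributions, that is, the sum of a Poisson-distributed number of gamma-distributed claim sizes. The Python sketch below simulates that construction. It is not STATISTICA code, and the claim rate, shape, and scale values are hypothetical.

```python
import math
import random

def poisson(rng, rate):
    """Draw a Poisson count with Knuth's multiplication method."""
    threshold = math.exp(-rate)
    count = 0
    product = rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count

def simulate_loss(rng, claim_rate, shape, scale):
    """Total loss for one policy: a Poisson number of gamma-sized claims."""
    n_claims = poisson(rng, claim_rate)
    # With zero claims the sum is exactly 0.0, which gives the point mass at zero.
    return sum((rng.gammavariate(shape, scale) for _ in range(n_claims)), 0.0)

rng = random.Random(42)  # fixed seed so the run is repeatable
losses = [simulate_loss(rng, claim_rate=1.0, shape=2.0, scale=500.0)
          for _ in range(10000)]

zero_share = sum(1 for loss in losses if loss == 0.0) / len(losses)
print("share of policies with no loss:", round(zero_share, 3))
print("theoretical share exp(-rate):", round(math.exp(-1.0), 3))
```

The share of exact zeros should land close to exp(-rate), the Poisson probability of no claims, while every non-zero loss is a positive continuous amount.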

Predict which customers will buy new policies

STATISTICA Data Miner provides the insurance firm with reporting, tracking, and analysis tools to identify trends. Sequential pattern mining functions are powerful and can detect sets of customers associated with frequent buying patterns to inform future sales and marketing campaigns and tactics.

PDF STATISTICA Data Miner in the Insurance Industry, White Paper

How to Save a Microsoft Word Document in STATISTICA?

STATISTICA offers the ability to output your results, tables and graphs, to a Microsoft Word document. This feature makes creating your final analysis report easy. Copying and pasting the results into Word is no longer needed. Additionally, with Microsoft Word output, it is easy to share analysis output with colleagues, whether or not they use STATISTICA.

 

For those using the 64 bit version of STATISTICA and the 32 bit version of Microsoft Word, problems may arise when saving the Microsoft Word document. The dialog to save the file does not default to saving the document as a *.docx, but rather a *.rtf file.


The output can still be saved as a Word document (*.docx), although how to do so may not be obvious. These simple steps save your Microsoft Word document, created in STATISTICA, as a Word document rather than a Rich Text Format file.

  1. Change the Save as type to All Files (*.*)
  2. Type in the desired file name, adding the Microsoft Word file extension, *.docx. Now the file will save as a Microsoft Word document as expected.

This Microsoft Word document can be opened and edited in Word.