Monthly Archives: January 2012

How to Summarize Data in STATISTICA Similar to Pivot Tables

Click here to download the data file so you’ll be able to work through the example.

To gain understanding of our data, it is helpful to summarize it.  Pivot tables, as found in Microsoft Excel and other programs, are used to summarize data and highlight important information. These tables can help us to extract meaning from data. Common tasks for pivot tables are to count, sum, or average. This is typically performed for classes of a grouping factor. For example, we could find the total sales in dollars and average sales in dollars grouped by region. These sales figures could further be grouped by fiscal quarter.  We can produce at-a-glance information from a large database with these summary tables.
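
As a rough illustration of what such a summary computes, the sketch below groups a handful of made-up sales records by region, and then by region and fiscal quarter, reporting the total and the average for each group. The records and the `summarize` helper are hypothetical and are not part of STATISTICA or Excel.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical sales records: (region, fiscal quarter, sales in dollars).
sales = [
    ("East", "Q1", 100.0),
    ("East", "Q1", 300.0),
    ("East", "Q2", 200.0),
    ("West", "Q1", 400.0),
    ("West", "Q2", 100.0),
    ("West", "Q2", 500.0),
]

def summarize(records, key):
    """Group dollar amounts by key(record); return {group: (total, average)}."""
    groups = defaultdict(list)
    for record in records:
        groups[key(record)].append(record[2])
    return {group: (sum(values), mean(values)) for group, values in groups.items()}

# One grouping factor: region.
by_region = summarize(sales, key=lambda r: r[0])

# Two grouping factors: region, then fiscal quarter.
by_region_quarter = summarize(sales, key=lambda r: (r[0], r[1]))

for group, (total, average) in sorted(by_region.items()):
    print(group, total, average)
```

Adding a second grouping factor only changes the key used to form the groups, which is why pivot-style summaries extend so easily from one factor to several.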

In this example, we are interested in exploring a database of daily rain totals. The data come from the Australian Bureau of Meteorology.

To start out, we want to summarize the rainfall data by year. To do this, select the Statistics tab. In the Base group, click Basic Statistics to display the Basic Statistics and Tables Startup Panel. Select Breakdown; non-factorial tables.

Click OK to display the Statistics BreakDown (non-factorial) dialog box.

Click the Variables button. In the Select the dependent variables and grouping variables dialog box, select the continuous variable Rainfall amount (millimeters) in the Dependent variables list and Year in the Grouping variables list.

Click the OK button.

In the Statistics BreakDown (non-factorial) dialog box, click the Summary button to create the output. The result is a table with the yearly average rainfall, count per year and standard deviation.
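
For readers who want to check the numbers outside STATISTICA, the same three statistics can be sketched in a few lines of Python. The daily records below are invented, and the sketch assumes the standard deviation reported is the sample (n - 1) version.

```python
from collections import defaultdict
from statistics import mean, stdev

# Hypothetical daily records: (year, rainfall in millimeters).
daily = [
    (2010, 0.0), (2010, 2.0), (2010, 4.0),
    (2011, 1.0), (2011, 3.0),
]

def breakdown_by_year(records):
    """Return {year: {"mean", "n", "std_dev"}} for the rainfall values."""
    by_year = defaultdict(list)
    for year, rainfall in records:
        by_year[year].append(rainfall)
    return {
        year: {
            "mean": mean(values),
            "n": len(values),
            # Sample standard deviation (assumed); needs at least 2 values.
            "std_dev": stdev(values),
        }
        for year, values in sorted(by_year.items())
    }

summary = breakdown_by_year(daily)
for year, stats in summary.items():
    print(year, stats["mean"], stats["n"], round(stats["std_dev"], 3))
```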

Return to the Statistics BreakDown (non-factorial) dialog box, and select the Descriptives tab to view statistics that can be computed.  The mean is computed by default, and other statistics can be added or removed.

Next, we want to find the average rainfall broken down by year and month. Clear the Standard Deviation and Valid N check boxes.

Now, select the Quick tab. Click the Variables button, and add Month to the Grouping Variables. Create the Summary output.
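
Conceptually, adding Month as a second grouping variable means each mean is computed within a (year, month) cell rather than within a year. A minimal sketch with made-up daily records:

```python
from collections import defaultdict
from statistics import mean

# Hypothetical daily records: (year, month, rainfall in millimeters).
daily = [
    (2010, 1, 0.0), (2010, 1, 4.0), (2010, 2, 6.0),
    (2011, 1, 2.0), (2011, 2, 1.0), (2011, 2, 3.0),
]

def breakdown_by_year_month(records):
    """Return rows of (year, month, mean rainfall), sorted by year and month."""
    cells = defaultdict(list)
    for year, month, rainfall in records:
        cells[(year, month)].append(rainfall)
    return [(year, month, mean(values))
            for (year, month), values in sorted(cells.items())]

rows = breakdown_by_year_month(daily)
for row in rows:
    print(*row)
```

Each output row carries both grouping values, which is the "long" layout the breakdown produces.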

This output lists Year and Month in columns. Most pivot tables would instead arrange the output so that one variable runs across the columns and the other runs down the rows. This layout can be achieved with a simple data management step.

Notice that the bottom row of the output is an entry for All groups. This row should be removed before the table is restructured. In STATISTICA, select the Data tab. In the Cases group, click the Cases arrow, and select Delete to display the Delete Cases dialog box. Select the last case, case number 302.

Click OK to delete.

Now, on the Data tab, in the Transformations group, click Stack to display the Unstacking/Stacking dialog box. Click Variables to display the Select Unstacking Variables dialog box. Select Month in the Code (column) variables list, Rainfall amount (millimeters) in the Unstack (value) variables list, and Year in the Case ID (row) variables list.

Click the OK button.

Accept the default settings in the Unstacking/Stacking dialog box, and click OK to create the new table of output.

This output shows average rainfall amounts by year and month in a compact, easy-to-read table.
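
The two data management steps (dropping the All groups row, then unstacking) can be sketched outside STATISTICA as follows. The summary rows and the `unstack` helper are hypothetical; the idea is only that Year becomes the row identifier, Month supplies the column codes, and the mean rainfall fills the cells.

```python
# Hypothetical long-format summary rows: (year, month, mean rainfall in mm).
long_rows = [
    (2010, "Jan", 2.0), (2010, "Feb", 6.0),
    (2011, "Jan", 2.5), (2011, "Feb", 1.0),
    ("All groups", "", 2.9),  # overall row that must be dropped first
]

def unstack(rows, drop_label="All groups"):
    """Turn (row id, column code, value) rows into a row-by-column table."""
    rows = [r for r in rows if r[0] != drop_label]
    # Column order follows the first appearance of each month code.
    months = []
    for _, month, _ in rows:
        if month not in months:
            months.append(month)
    table = {}
    for year, month, value in rows:
        table.setdefault(year, {})[month] = value
    return months, table

months, table = unstack(long_rows)
print("Year", *months)
for year, cells in table.items():
    # Missing year/month combinations are printed as empty strings.
    print(year, *(cells.get(m, "") for m in months))
```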

STATISTICA Data Miner Predictive Modelling Solutions for the Insurance Industry

Life, Disability, Automotive, Health, Property and Casualty, etc.

Companies in the insurance industry are using STATISTICA Data Miner to be more effective and competitive in the utilization of historical data, using the latest predictive modelling and data mining approaches to recognize patterns within terabytes of data. STATISTICA Data Miner allows companies to predict trends in customers’ behaviours and responses, claims, and losses.

Major successes and savings have been achieved by companies using STATISTICA Data Miner for predictive modelling for rate making, fraud detection, and customer segmentation.

Areas of Application

Rate making

STATISTICA Data Miner identifies the most important root causes in the frequency and magnitude of historical losses. Predictive Models relating these primary factors to the frequency and magnitude of losses are then used to update rate tables accordingly, making the insurers more accurate and competitive in their policy rates when compared to more traditional rate making approaches. In the past, General Linear Models were the industry standard approach. Now, more effective prediction of losses is achieved through the use of predictive modelling techniques such as recursive partitioning (i.e., “tree methods”).
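
To give a feel for what recursive partitioning does, here is a minimal sketch of a regression tree on a single predictor. It is not STATISTICA's algorithm; the data (driver age and loss amount) and all function names are invented, and the split criterion is assumed to be the reduction in the sum of squared errors.

```python
def sse(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)

def best_split(rows):
    """Return (cost, threshold) minimizing SSE of y on both sides, or None."""
    best = None
    xs = sorted({x for x, _ in rows})
    for lo, hi in zip(xs, xs[1:]):
        threshold = (lo + hi) / 2
        left = [y for x, y in rows if x <= threshold]
        right = [y for x, y in rows if x > threshold]
        cost = sse(left) + sse(right)
        if best is None or cost < best[0]:
            best = (cost, threshold)
    return best

def grow_tree(rows, min_size=2):
    """Split recursively until a node is too small or no split reduces SSE."""
    ys = [y for _, y in rows]
    leaf = {"predict": sum(ys) / len(ys)}
    split = best_split(rows)
    if len(rows) < 2 * min_size or split is None or split[0] >= sse(ys):
        return leaf
    threshold = split[1]
    left = [r for r in rows if r[0] <= threshold]
    right = [r for r in rows if r[0] > threshold]
    if len(left) < min_size or len(right) < min_size:
        return leaf
    return {"threshold": threshold,
            "left": grow_tree(left, min_size),
            "right": grow_tree(right, min_size)}

def predict(tree, x):
    """Walk down the tree to the leaf that covers x."""
    while "predict" not in tree:
        tree = tree["left"] if x <= tree["threshold"] else tree["right"]
    return tree["predict"]

# Hypothetical history: (driver age, loss amount in dollars).
history = [(18, 900.0), (21, 1100.0), (25, 1000.0),
           (40, 300.0), (45, 250.0), (60, 350.0)]
tree = grow_tree(history)
print(tree["threshold"], predict(tree, 20), predict(tree, 50))
```

Each leaf's prediction is simply the average historical loss of the policies that fall into it, which is what makes tree output easy to translate into rate-table cells.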

Customer segmentation

STATISTICA Data Miner’s Clustering module may be used for customer segmentation, by grouping the entire customer base into clusters, identified on the basis of various demographic and behavioural factors. These clusters can then be used for a variety of predictive modelling applications to determine the efficacy of the clusters in predicting outcomes of interest.

Fraud detection

Claims fraud is a significant concern, costing insurance companies several billion dollars annually. Losses due to fraud have increased dramatically in the past ten years. Despite actions by insurance companies, a large amount of fraud remains undetected.

STATISTICA Data Miner helps the insurance company anticipate and quickly detect fraud and take immediate action to minimize costs. Through the use of sophisticated data mining tools, millions of claims can be searched to spot patterns and detect even subtle variations in billing practices, by analyzing above-normal payoffs across factors such as geographical region, agent, and insured party.

Specifically for health insurance, STATISTICA Data Miner’s Association Rules module may be used to analyze claim forms. Using the Association Rules module, the payer will be able to find relationships among medical procedures performed together, patterns in diagnoses and procedures across providers, etc.
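
The core quantities behind association rules are support (how often two items occur together) and confidence (how often the second item appears when the first does). The sketch below computes both for pairs of procedures on a few invented claims; the procedure names, thresholds, and the `pair_rules` function are hypothetical and do not reflect the module's actual interface.

```python
from itertools import permutations

# Hypothetical claims: each is the set of procedures billed together.
claims = [
    {"xray", "cast"},
    {"xray", "cast", "followup"},
    {"xray"},
    {"bloodtest", "followup"},
]

def pair_rules(transactions, min_support=0.5, min_confidence=0.6):
    """Return rules (a, b, support, confidence) meaning 'a implies b'."""
    n = len(transactions)
    items = sorted(set().union(*transactions))
    found = []
    for a, b in permutations(items, 2):
        both = sum(1 for t in transactions if a in t and b in t)
        has_a = sum(1 for t in transactions if a in t)
        support = both / n          # share of claims containing a and b
        confidence = both / has_a   # share of claims with a that also have b
        if support >= min_support and confidence >= min_confidence:
            found.append((a, b, support, confidence))
    return found

found = pair_rules(claims)
for a, b, support, confidence in found:
    print(f"{a} -> {b}: support={support:.2f}, confidence={confidence:.2f}")
```

A provider whose claims deviate sharply from rules that hold across other providers would be a natural candidate for closer review.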

PDF Insurance Fraud Detection Case Study

Claims analysis

STATISTICA Data Miner helps users understand subtle business trends in claims, which would have been otherwise difficult to spot.

STATISTICA Generalized Linear Models supports the Tweedie distribution, a flexible option for predictive modelling. It can accommodate data that are exactly zero for many cases and continuous and positive for the rest, as is typical of claim amounts.
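
One way to see why a single distribution can cover both exact zeros and continuous amounts: for power parameters between 1 and 2, a Tweedie variable can be represented as a Poisson number of claims, each with a gamma-distributed size. The simulation below is a sketch of that representation only, not of STATISTICA's fitting procedure; the parameter values and function names are arbitrary assumptions.

```python
import math
import random

def sample_poisson(rng, lam):
    """Draw a Poisson(lam) count with Knuth's method (fine for small lam)."""
    limit = math.exp(-lam)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count

def sample_claim_total(rng, lam=1.0, shape=2.0, scale=500.0):
    """Total payout for one policy: a Poisson count of gamma-sized claims."""
    n_claims = sample_poisson(rng, lam)
    # With zero claims the sum is exactly 0.0.
    return sum((rng.gammavariate(shape, scale) for _ in range(n_claims)), 0.0)

rng = random.Random(0)  # fixed seed so the run is repeatable
totals = [sample_claim_total(rng) for _ in range(1000)]
zero_share = sum(1 for t in totals if t == 0.0) / len(totals)
print(f"share of policies with no payout: {zero_share:.2f}")
```

With an average of one claim per policy, the share of exact zeros should land near exp(-1), roughly 0.37, while the remaining totals are continuous and positive.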

Predict which customers will buy new policies

STATISTICA Data Miner provides the insurance firm with reporting, tracking, and analysis tools to identify trends. Sequential pattern mining functions are powerful and can detect sets of customers associated with frequent buying patterns to inform future sales and marketing campaigns and tactics.

PDF STATISTICA Data Miner in the Insurance Industry, White Paper