Monthly Archives: August 2013

STATISTICA Secret, Competitive Advantage with Analytics

 

I work on very diverse projects, and every day is interesting. But I am challenged to write blogs about these projects. My biggest challenge?

 

Non-disclosure agreements (NDA)…

 

StatSoft lists hundreds of customers on our website. But for some customers, STATISTICA is their “secret sauce”… their competitive advantage for Enterprise data analytics. And they don’t want to lose the advantage, so they have an NDA with us. Some companies are private and many are publicly traded. Some are worth millions and many are worth billions.

 

Even variable names can be covered by the NDA, so I have to be careful with graphs or images that contain the variable names.

 

All this makes it harder to write about “success stories” for my projects. So, I decided to put together a top-3 list of reasons why STATISTICA is the “secret sauce”.

 

  1. Familiar user interface, easy to learn for business users and statisticians

    STATISTICA uses common Windows metaphors, so it is easy to learn. You can use “classic menus” or ribbon bars. (I highly recommend using ribbon bars.)

    * File menu to open and save files
    * Edit menu to edit content
    * Statistics and Data Mining for statistical analyses
    * Data for data management options

  2. Data, data everywhere and not a drop to drink

    Buying analytics software is normally not enough. StatSoft sells enterprise analytics software and services (installation, training, custom development, statistical consulting, predictive analytics consulting). Our focus is to get the customer quickly using Enterprise Analytics software. Frequently this means we need to work with the customer on data issues.

    Sometimes this means we are contracted to build a data mart. Sometimes this means we create or script complex workflows. And sometimes we just mentor the customer on creating workflows. It varies from customer to customer.

    StatSoft customizes services for the customer and the situation. We don’t force round pegs into square holes.

  3. Flexible, customizable software

    Easy to create and modify:

    * workflows
    * rules based decisioning
    * roles based login accounts
    * scripting language
    * API

Property and Casualty Insurance

In the competitive environment of P&C insurance, it is more important than ever to reduce costs, focus acquisition, retention, and expansion tactics on high-value customers, and deliver the right level of service to suit a particular claim. With the STATISTICA Decisioning Platform®, predictive analytics and reporting solutions for P&C insurance applications enable insurers to use data about their claims, policies, customers, and third-party data sources to improve underwriting decisions, detect fraud, identify opportunities for subrogation, update reserve estimates, assign the right service level to claims, and segment customers across all product lines: auto, workers’ compensation, home, commercial property, disability, and any other specialized product lines.

StatSoft’s approach to engaging with a Property and Casualty Insurance Company is very collaborative and results-oriented. StatSoft’s predictive modeling, analytics, and reporting solutions are designed to enable your company’s personnel; the StatSoft team works with your company’s key stakeholders to understand current personnel, processes, and systems. The STATISTICA solution is configured to fit and augment your company’s current capabilities. The collaboration begins with an assessment and agreement on business goals and the way that the results from the use of the STATISTICA solution will be measured. From that starting point, StatSoft and you collaborate to prioritize the incremental investment in order to use the ROI from the first project to pay for the next project.

Predictive Analytics May Drive Down Claim Costs at Many Points Across the Claim Lifecycle

[Figure: Predictive claims score control]

StatSoft has developed the Predictive Claims Flow®, incorporating predictive modeling at each stage of the lifecycle of a claim. From first notice of loss through to closure, claim data is constantly changing.

With every new piece of information about a claim, the Predictive Claims Solution automatically scores the probability of fraud, updates the predicted reserve estimate, predicts claim complexity to determine whether the claim should be assigned to a more senior adjuster, identifies opportunities for subrogation, and determines the right level of servicing for the claim. At each stage, alerts and updates are provided with minimal human intervention to adjusters, SIU managers, claim personnel, and all other key stakeholders in the overall claims adjudication process.

Key Capabilities

Text Mining. Much of the data for the analysis of claims is unstructured text: adjusters’ notes, medical reports, emails, etc. The text mining solution combines this text with predictions from your structured data to supply more accurate and precise predictions.

Real-time Predictions and Integration with Claims Management Systems. The STATISTICA Solution is optimized for performing real-time predictions for supporting instant underwriting decisions or evaluating a claim as new information is made available.

Reporting. Aggregated summary reports and configurable dashboards provide valuable information both to management and for tracking key performance indicators related to each functional area.

Reason Codes. In addition to predictions and recommendations, the STATISTICA solution provides information about the reasons for the decision both for the awareness of key personnel and regulatory reasons, when applicable.

Integration with Data Sources. The STATISTICA solution simplifies access to data from your company’s customer database, policy database, claims database, and third party data sources.

Data Preparation and Management. Data in other databases are rarely ready for analysis. STATISTICA includes all of the necessary recoding, transformation, and data aggregation procedures for preparing your data for analysis and scoring.

Resources Management. The STATISTICA solution provides the capabilities for managers to provide input and direction about the available personnel resources. For example, SIU Managers can decide how many claims the department can handle so that the claims that are the highest probability for fraud are the ones that are reviewed and investigated, making better utilization of available resources and assigning the most complex claims to the more senior personnel.
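The capacity-based review described above amounts to ranking claims by predicted fraud probability and keeping only as many as the department can handle. The Python sketch below illustrates that idea only; it is not STATISTICA code, and the function name, claim IDs, and scores are hypothetical.

```python
def claims_to_review(scored_claims, capacity):
    """Return the IDs of the `capacity` claims with the highest fraud probability."""
    # Sort by the score (second element of each pair), highest first.
    ranked = sorted(scored_claims, key=lambda claim: claim[1], reverse=True)
    return [claim_id for claim_id, _ in ranked[:capacity]]


# Hypothetical (claim ID, predicted fraud probability) pairs.
scored = [("C-101", 0.12), ("C-102", 0.87), ("C-103", 0.45), ("C-104", 0.91)]

# The SIU manager decides the department can investigate two claims.
print(claims_to_review(scored, 2))
```

Changing the capacity argument changes how far down the ranked list the reviewers go; the scores themselves would come from the predictive model.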

Learn and Grow your Advanced Analytics Skillset with Decisioning Platform

Learn new skills and grow your team’s expertise. Shown below is a graphic that illustrates how our customers begin their journey down the road to Predictive Analytics and Data Mining. They start off by replicating and streamlining their current reporting capabilities (as illustrated by the top down flow along the right side of the graphic). The STATISTICA Enterprise Server, part of the Decisioning Platform, enables the adoption of analytic and automated decision support aids and systems throughout the enterprise. All of your users will begin their work at the same point as they extract and prepare their data, by working with the tools provided within StatSoft’s Decisioning Platform to automatically source data, as depicted by the large funnel. (STATISTICA facilitates the acquisition of data with reusable, pre-configured data templates that can include filtering capabilities and access multiple databases.)

Then, based on their job roles and skill sets, users will proceed along the right or the left side of the graphic to complete their work. If their focus is on reporting, they will follow the path on the right. If they need to create a predictive model or perform some data mining activities, incorporating business rules, they will follow the path on the left.

Predictive Analytics Applications in Claims, Marketing, Underwriting, and Sales

Claims

Accelerated detection of claim severity
Claim draft authority optimization
Claims assignment automation by competency
Fast tracking claims
Predict claim complexity
Predict reserves / draft authority optimization
Real-time fraud detection early in FNOL (first notice of loss)
Reduce working capital

Marketing

1:1 marketing
Campaign optimization
Customer segmentation
New product market analysis / pricing
Now market to non-fraudulent prospects
Outbound Predictive Marketing
Optimize leads delivered to your agents
Real-time inbound intelligent cross-sell
Spend less to obtain higher quality business

Underwriting

Automated renewal processing
Automated underwriting / risk selection
Optimized discount/credit recommendation
Predict lifetime customer value
Retain the “better” risk
Underwriting fraud detection

Sales & Service

Agent/Broker performance effectiveness
Commission modeling and optimization
Cross-sell, up-sell, offer optimization
Field sales force optimization
In and outbound Customer Retention offers
Intelligent call routing
Smart / real time recommendations

How to Most Efficiently Store Your Data

When working with large data files, it becomes important to look for ways to make one’s processes more efficient. File size and computation times can both be affected by how data is stored.
Many variables can be stored more efficiently merely by changing a few of the default settings. In this brief article, we will explore the various methods to help make spreadsheet storage and computations more efficient.
To view and change the storage method of a given variable, click on the variable header in the spreadsheet. Then, select the Data tab and in the Variables group, click Specs to display the variable specification dialog box for the selected variable. You can also double-click on the variable header to display this dialog box. In the drop-down box labeled Type, you will find the data storage options. The default data storage method is double precision. In STATISTICA, it is called simply Double.
[Screenshot: Selecting double precision in STATISTICA]
For variables stored with double precision, values are stored as 64-bit floating point real numbers, with 15-digit precision. The range of values supported by this data type is approximately +/-1.7*10^308.
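These figures match the standard IEEE 754 double-precision format. The short Python sketch below is an illustration only (Python is not part of STATISTICA), and it assumes that the Double type follows that standard format.

```python
import struct
import sys

# Size in bytes of a standard 64-bit double ("<" forces the standard size).
double_bytes = struct.calcsize("<d")

# Number of decimal digits a double can always represent faithfully.
double_digits = sys.float_info.dig

# Largest finite double, roughly 1.7 * 10**308.
double_max = sys.float_info.max

print(double_bytes * 8, "bits per value")
print(double_digits, "decimal digits of precision")
print("largest finite value:", double_max)
```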
The next option, Text, is used for storing text data. The Length should be specified to store the number of characters needed. As you would expect, the longer the designated length of the text variable, the more storage space the data takes. So the length parameter should be set as small as possible to capture the full text.
For some types of numeric data, the double precision data storage is necessary. Any variable with values that have decimals or are extremely large or small requires this storage type. But many variables are stored with far greater precision than necessary, and this is where we can change the data type and gain efficiency.

The integer data type takes on integer values between -2,147,483,647 and +2,147,483,647. Variables stored with this method are more efficient, taking 4 bytes per cell compared to 8 with double precision.
The byte data type takes on integer values from 0 to 255 and is the most economical storage option. For variables needing only small integer values, this data type should be used and only takes 1 byte of storage per cell in the spreadsheet.
Using the most efficient storage method for your variables makes for smaller spreadsheet files and faster computing.
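As a rough illustration of the savings, the Python sketch below picks the most compact of the three numeric types for a column and estimates its storage from the bytes-per-cell figures given above. It is a sketch under stated assumptions, not STATISTICA code: the function names and sample data are hypothetical, and missing-data codes and file overhead are ignored.

```python
# Bytes per cell for each numeric storage type, as stated in the article.
BYTES_PER_CELL = {"Byte": 1, "Integer": 4, "Double": 8}


def smallest_type(values):
    """Return the most compact storage type that can hold every value."""
    all_whole = all(float(v).is_integer() for v in values)
    if all_whole and all(0 <= v <= 255 for v in values):
        return "Byte"
    if all_whole and all(-2_147_483_647 <= v <= 2_147_483_647 for v in values):
        return "Integer"
    return "Double"


def column_bytes(values):
    """Approximate storage for one column, ignoring any file overhead."""
    return BYTES_PER_CELL[smallest_type(values)] * len(values)


# Hypothetical columns.
ages = [23, 41, 67, 35]
counts = [120_000, -5, 3_000_000]
ratios = [0.25, 1.5, 3.0]

for name, column in [("ages", ages), ("counts", counts), ("ratios", ratios)]:
    print(name, smallest_type(column), column_bytes(column), "bytes")
```

For a column of small whole numbers, switching from Double to Byte cuts the per-cell storage from 8 bytes to 1.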

How to Estimate a Regression Model Subject to Parameter Constraints

With multiple linear regression, a statistical model is computed to explain the variability in the dependent variable, Y, as a function of one or more independent variables, X. The model parameters are calculated so as to minimize the squared differences between the observed y values and the predicted ones. Model parameters have no other constraints in a typical regression analysis.
What if it is desired or necessary to compute a regression model, while ensuring that one or more parameters conform to a set of constraints? This article will explore one avenue of constraining model parameters during the regression function computation using the custom loss function.
Example
For this example, we’ll use the Baseball.sta example spreadsheet in STATISTICA. Select the Home tab and, in the File group, click the Open arrow. Select Open Examples. In the Open a STATISTICA Data File dialog box, double-click on the Datasets folder, then browse to and open Baseball.sta.
[Screenshot: STATISTICA Baseball.sta example spreadsheet]
In this example, the goal is to model the variable WIN as a function of RUNS, BA, and DP, with no intercept. Using General Linear Models, the regression parameter estimates for the sigma-restricted model are as follows.
[Screenshot: Regression parameter estimates]
These are parameters that are subject only to the constraint of minimizing the squared errors for the estimates from the chosen model.
Adding Constraints
Now, suppose it is necessary to compute the best regression function for the same variables, no intercept, while constraining all parameters in the model to be positive. How is this accomplished in STATISTICA? By adding penalties to the loss function for parameters outside the desired range.
Select the Statistics tab. In the Advanced/Multivariate group, click Advanced Models and select Nonlinear Estimation. In the Nonlinear Estimation Startup Panel, select User-specified regression, custom loss function.
[Screenshot: STATISTICA Nonlinear Estimation Startup Panel]
Click the OK button.
In the User-Specified Regression, Custom Loss dialog box, click the Function to be estimated & loss function button to display the Estimated function and loss function dialog box. Enter the model to estimate. In this case, the model is:
‘WIN’ = B1 * ‘RUNS’ + B2 * ‘BA’ + B3 * ‘DP’
The custom Loss Function is where the constraints of the model are conveyed. The default loss function is the squared difference between observed and predicted Y, WIN in this case. Adding penalty terms to the loss function when parameters go outside of the desired range will effectively constrain those parameters within our desired limits. Here, a penalty should be added if any of the parameters B1 through B3 is negative. This is achieved with the following custom loss function:
(OBS-PRED)**2 + (B1<0) * 1000 + (B2<0) * 1000 + (B3<0) * 1000
The loss function is very heavily penalized when any parameter is negative.
[Screenshot: Custom loss function with a heavy penalty for negative parameters]
Click OK in the Estimated function and loss function dialog box. Accept all other default settings, and click OK in the User-Specified Regression, Custom Loss Startup Panel, and then click OK in the Model Estimation dialog box to advance to the Results dialog box. Output the Summary spreadsheet to view the new parameter estimates in this constrained regression. As expected, all parameters are positive.
[Screenshot: STATISTICA Summary spreadsheet with positive parameters]
Using this same strategy, a regression equation can be computed subject to your desired constraints.
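To see how the penalty terms behave, the custom loss function can be sketched outside STATISTICA. The Python below is a minimal illustration, assuming the loss expression is evaluated for each case and summed over cases; the function name and the data rows are hypothetical and are not taken from Baseball.sta.

```python
def penalized_loss(params, rows, penalty=1000.0):
    """Squared error plus a fixed penalty, per case, for each negative parameter."""
    b1, b2, b3 = params
    total = 0.0
    for runs, ba, dp, win in rows:
        pred = b1 * runs + b2 * ba + b3 * dp  # no-intercept model
        total += (win - pred) ** 2            # (OBS-PRED)**2
        # Each comparison counts as 1 when the parameter is negative, else 0.
        total += penalty * ((b1 < 0) + (b2 < 0) + (b3 < 0))
    return total


# Hypothetical cases in the order (RUNS, BA, DP, WIN).
rows = [(700, 0.260, 150, 0.500), (650, 0.250, 140, 0.450)]

feasible = penalized_loss((0.0005, 0.5, 0.0001), rows)
infeasible = penalized_loss((0.0005, 0.5, -0.0001), rows)

print("loss with all parameters positive:", feasible)
print("loss with one negative parameter: ", infeasible)
```

Because the penalty dwarfs the squared-error term, any minimizer is pushed away from negative parameter values. A parameter of exactly zero is not penalized by this loss function.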

Predictive Quality Control

Quality control is a goal worth pursuing in any process—manufacturing, customer service, accounting, sales, etc.—that can benefit from a proactive methodology. In any manufacturing process, for instance, it is important to make sure your end product meets specifications. However, rather than fix problems only after they have negatively impacted production, it is often more cost-effective to monitor the process continuously from beginning to end, watching for potential problems.

STATISTICA’s Predictive Quality Control solution not only provides a picture of what’s happening right now, it also makes it possible for you to analyze what has gone right (and wrong) in the past. This enables you to predict future problem areas, optimize the workflow, and continuously adapt and improve your process.

STATISTICA Solution

  • Real-Time Capability: Continuous monitoring of any process to see what’s happening now.
  • Cutting-Edge Predictive Analytics: Predictive modeling of historical process data to optimize workflows.
  • Enterprise-Wide Solution: Deployment of predictive models to anticipate potential future problems.
  • Wide Array of Tools: Quality Control Charts, Design of Experiments, Multiple Regression, Analysis of Variance, Non-Parametric Statistics, and more

Pharmaceutical Manufacturing

The STATISTICA Enterprise software is the global standard GxP analytics platform for empowering engineers, analysts, operators, scientists, managers, and customers with role-based access to data, standard analyses, and reports in order to support process understanding, process monitoring, and regulatory compliance. STATISTICA Enterprise is configured as a key component for pharmaceutical/biopharmaceutical research & development and manufacturing companies’ activities so that knowledge workers have immediate and proactive access to the data and reports about the manufacturing and quality of their respective products.

STATISTICA Solution

User Interface: Engineers and analysts at each site are presented with a user interface that matches what they know: the processes! On the STATISTICA Enterprise screen, they see and interact with the respective unit operations and parameters from the processes.

User/Group Security: Personnel at the sites have different roles and responsibilities: Operators enter data. Managers review summary reports. Engineers monitor real-time auto-updating trends of the current batches. In STATISTICA Enterprise, roles and responsibilities are easily managed using the existing infrastructure of domain and directory accounts so that users log on and electronically sign in using their standard network account username and password.

Data Connections: A primary benefit of the STATISTICA Enterprise server platform is its configured connections to data repositories. In pharmaceutical/biopharmaceutical manufacturing, there are many systems and sources of data, but engineers need the relevant data to investigate an issue immediately. STATISTICA Enterprise empowers users by managing the connections to relevant data sources centrally on the server so that engineers and analysts have point-and-click access to the data they need when they need it.

Analysis Templates: In STATISTICA Enterprise, standard analyses are defined and configured as analysis templates and are made available to the large numbers of other personnel who have responsibility for monitoring and decision making for the respective products and processes.

Report Templates: Organized, formatted report summaries (PDF, MS Word, HTML) are integral to keeping people informed. Key stakeholders are empowered with self-service report summaries about respective products and processes via a Web-based portal configured in STATISTICA Enterprise.

Central Configuration and Management across a Site or Site(s): Configurations of the queries, analyses, reports, and dashboards are managed via the STATISTICA Enterprise Manager administration software and are stored in a central STATISTICA database deployed on a standard relational database management system.

Computer Systems Validation: STATISTICA Enterprise software is easy to validate, with built-in features such as user access control and security, audit trails, and PDF reports to streamline validation. StatSoft’s stature has grown dramatically as a major global provider of analytics software: 1) because STATISTICA is COTS software that requires little if any custom coding or scripts, and 2) due to the ease of validation and deployment as well as the restriction of validation tasks to the intended uses of the system.

User Experience and Ease of Use: STATISTICA Enterprise provides personalized portals to the relevant information, making it easy to use and navigate, so knowledge workers focus on making informed decisions based on alerts and exceptions versus a tedious manual review of each and every report generated, resulting in dramatically reduced manpower and advanced early detection of new trends or shifts.

Automated Alerts: Proactive, automated alerts enable knowledge workers to become aware of shifts, trends, and patterns immediately so they can take action and drill down into the details directly from the report or dashboard.

Report Templates and Report Generation: Many stakeholders need to be provided with self-service options for maintaining awareness of the status of processes, such as standard formatted reports. The STATISTICA Enterprise system automates this process.

Customer Portals: For contract manufacturing, STATISTICA Enterprise provides a secure, standard approach for configuring the different views of data required by each customer. Customers can have access to self-service portals and obtain standard reports to minimize the resources and costs involved in supporting customers’ requests for data and status updates.

Real Time Scoring

The application of predictive analytics is imperative to solve many critical business problems. By applying predictive analytics to determine patterns of historical data, a business enterprise can better refine and achieve its objectives for customer acquisition, customer retention, employee performance, decreased risk, and increased profits. Instant decisions are vital to success. As business evolves, so do our needs and the solutions required to meet those needs. Thus, real-time scoring becomes a necessity for ongoing success.

For credit and insurance applicants, real-time scoring means instant answers complete with full terms. For retailers, real-time scoring means relevant product recommendations and coupons before transactions are complete. For service companies, real-time scoring means targeted information provided to personnel so they can better serve customers and maintain customer loyalty. The quick responses from predictive models can help companies across all industries to be more competitive.

STATISTICA Solution

  • Real-Time Scoring for Instant Answers: STATISTICA Live Score® is an efficient, multi-threaded, and platform-independent scoring tool.
  • Predictive and Data Mining Tools: The most comprehensive selection of predictive modeling tools is available in STATISTICA Data Miner.
  • Easy-To-Use Wizard Approach to Data Mining: Data Miner Recipes guide you through the steps of data mining and predictive analytic projects to clean and prepare data, build models, and deploy.
  • Deployment Code for Multiple Applications: No hand-coding needed! STATISTICA automatically generates necessary code as you build custom deployment applications.
  • Organization and Collaboration: STATISTICA Enterprise simplifies the process of maintaining current predictive models for deployment.

Insurance Industry Solutions

In many ways, life insurance has not changed much over decades except for the utilization of blood tests during the underwriting process and recent patterns in underwriting fraud. Things that have changed are the increased competition among life insurance companies and the impact of lower yields from investments. Both factors demand more accurate and aggressive underwriting decisions and better integration between claims payouts and the links to the underwriting decisions for the respective policies.

STATISTICA provides the unique combination of traditional approaches for predictive modeling (e.g., linear modeling) and the latest developments in advanced analytics and data mining to deliver more accurate underwriting models. Utilizing customer and claims data, the models determine the most important factors responsible for historical claims. Importantly, the STATISTICA solution enables the analysis of losses based on their original underwriting decisions.

StatSoft’s approach to engaging with a Life Insurance Company is very collaborative and results-oriented. StatSoft’s predictive modeling, analytics, and reporting solutions are designed to enable your company’s personnel; the StatSoft team works with your company’s key stakeholders to understand current personnel, processes, and systems. The STATISTICA solution is configured to fit and augment your company’s current capabilities. The collaboration begins with an assessment and agreement on business goals and the way that the results from the use of the STATISTICA solution will be measured. From that starting point, StatSoft and your company agree on the prioritization and an approach to incremental investment that matches the expected and achieved payoffs.

StatSoft Solution

  • Predictive Modeling. The STATISTICA solution includes both traditional analytics capabilities (e.g., linear models) as well as the latest data mining and predictive modeling approaches for improved flexibility and accuracy.
  • Real-time Predictions and Integration with Claims Management Systems. The STATISTICA Solution is optimized for performing real-time predictions for supporting instant underwriting decisions or evaluating claims.
  • Reporting. Aggregated summary reports and configurable dashboards provide valuable information both to management and for tracking key performance indicators related to each functional area.
  • Reason Codes. In addition to predictions and recommendations, the STATISTICA solution provides information about the reasons for the decision both for the awareness of key personnel and regulatory reasons, when applicable.
  • Integration with Data Sources. The STATISTICA solution simplifies access to data from your company’s customer database, policy database, claims database, and third party data sources.
  • Data Preparation and Management. Data in databases are rarely ready for analysis. STATISTICA includes all of the necessary recoding, transformation, and data aggregation procedures for preparing these data for analysis and scoring.
  • Resources Management. The STATISTICA solution provides the capabilities for managers to provide input and direction about the available personnel resources. For example, SIU Managers can decide how many claims the department can handle so that the claims that are the highest probability for fraud are the ones that are reviewed and investigated, making better utilization of available resources and assigning the most complex claims to the more senior personnel.

STATISTICA Process Optimization

Process Optimization

Screenshot of EPRI abstract page

Access the EPRI report detailing our technology’s optimization of a coal-fired power plant.

The Oil and Gas industry relies on complex processes at every stage, from well construction to production of the final product. Finding an optimal operating regime at each step is key to cutting operating expenses and improving final product quality. Each step, from geological survey to production, is accompanied by the collection of huge amounts of data. When analytics and optimization methods are applied to this data, one can optimize production volume based on well location, lower operating expenses by tuning well completion parameters, and estimate ultimate production and the risks associated with the process. Greenhouse gas and other environmental regulations also require process optimization at various stages with respect to emissions.

STATISTICA Solution

  • High performance analytic tools: STATISTICA provides users with a set of the most powerful analytical tools well suited to work with large amounts of data.
  • Quick and simple model building and reporting: Advanced modeling tools allow you to build accurate models and identify important factors.
  • Modern optimization methods: Optimization methods will help to identify optimal operating conditions, minimize operational expenses and reduce emissions
  • Increase collaboration between departments: The STATISTICA Enterprise metadata repository stores configurations and permissions for multiple users as relevant to their roles. In this multi-user collaborative environment, you can share data, improve churn models, and benefit from collaborative work whether you are working simultaneously with small or large groups.

INCLUDED Technologies

STATISTICA’s Solution for Process Optimization includes all of the analytics tools you need, including:

  • STATISTICA Decisioning Platform
  • STATISTICA Enterprise
  • STATISTICA Data Miner
  • STATISTICA Process Optimization

Emissions Reduction

The Oil and Gas industry is now challenged with increasing competition, regulatory concerns, and new stringent federal guidelines while trying to meet growing demand. It is imperative that Oil and Gas plants ensure stable, robust operations with minimal downtime and reduced emissions.

StatSoft Power Solutions, based on cutting-edge, proprietary, predictive data mining and analytics, are easy and quick to implement, produce immediate, significant improvement, and are offered at a fraction of the cost of the hardware upgrades necessary to produce similar, but often less effective, outcomes.

Whether you are looking to optimize combustion (OFA), stabilize operations, reduce emissions, or predict problems, StatSoft Power Solutions will assist in leveraging your current data to help increase profits and meet recently mandated federal regulations.

STATISTICA Solution

  • Optimize Combustion: Cyclone, wall-fired, t-fired. Robust flame temperatures. Primary, secondary, and tertiary air values
  • Stabilize Operations: Avoid uncontrolled emissions, excursions, expensive downtime, and generation rollbacks
  • Reduce Emissions: NOx, CO, LOI
  • Predict Problems: Emissions related to combustion optimization, Maintenance issues