First Look – StatSoft STATISTICA
James Taylor on Everything Decision Management
January 31, 2012
StatSoft was founded in 1984 and began building statistical software as soon as it became practical to deliver it on the PC. STATISTICA is an enterprise predictive analytics platform for Windows with role-based access, connections to the various data sources that companies have, and support for everything from data exploration through to deployment. The product has four main pieces:
- Windows-based analytics Workbench for analysts.
- Decision Management to combine models and rules to automate decision-making.
- Enterprise Server to support multiple users in a client/server environment.
- Enterprise Workspaces to capture the data analysis process from end to end and to manage metadata, decision-making workflow and so on.
StatSoft is a long-time Windows platform user and Microsoft partner. As a result it offers a solution that is tightly coupled with Intel multi-core chips and very well integrated with Windows. Everything is available as an API call in .NET, making it easy to integrate into SharePoint or other Windows applications.
The components get combined into various analytic applications such as a warranty analytics solution, credit scoring, collections, cross-sell, insurance fraud detection, subrogation, price optimization, marketing mix optimization and more. These solutions can be completely automated, accessing multiple data sources, running tens or hundreds of predictive analytic models, writing results back into the database and monitoring the performance of the models.
With version 11, StatSoft released the STATISTICA Decisioning Platform that pulls together all the existing product capabilities with new rules management, integrated rules scoring, and other capabilities. The suite now includes:
- Templated data access
- Data pre-processing
- Rules management
- Modeling tools including accelerated logistic regression
- Version control
- Direct deployment
Everything is managed in an enterprise metadata repository deployed on a relational database. Workflows and other components for model creation or business rules are created, managed in the enterprise repository and deployed to a server for execution. Multiple projects and folders can be managed in the repository, and permissions are layered onto these. Data access templates, analysis templates, decisioning flows and rules are all managed in this repository. Decision flows with models and rules are checked in and then used to drive reporting (integration with MS document tools), batch scoring that writes back to the production database, or deployment to the STATISTICA Live Score Server for real-time decisioning using web service calls. There is a Monitoring and Alerting server for dashboards that monitor model performance, and there is an integrated Document Management System for version control and approvals of models.
A decisioning flow involves several steps using the STATISTICA Enterprise Manager product. At each stage elements are retrieved from the repository based on the access defined for users and can be written back to the repository for management and reuse.
- The first step is to retrieve data using data connection and configuration templates. Users may have access to the underlying queries or just to the data. Data from multiple data connections can be used, and a wide range of ETL functions is available in the data manipulation step.
- Data can be prepared and recoded, using Weight of Evidence for instance, and these transformations are then deployed as rules that can be versioned and reused. The rules are sequential and can assign text labels as well as transform the data. The rules are deployed to the enterprise server and can be associated with the data source. They can then be included in the defined workflow.
- Models can then be built using a wide range of modeling techniques and embedded in the flow. The workflow can create multiple models and combine or compare them.
- Additional rules can be added to the workflow. The rules node contains a sequential set of rules built using an editor that has some integration with the data structures being manipulated and has a nice tree structure to allow rules to be collapsed. Temporary variables can be managed and models can be executed by the rules as necessary. Reason codes can be assigned using array handling that lets you build a set of reason codes. Rules can be reused across batch and real-time environments and across multiple workflows. Rules also have access to the full range of mathematical functions.
- The whole workflow can then be deployed to the various deployment options.
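To make the Weight of Evidence recoding mentioned in the data preparation step more concrete, here is a minimal Python sketch of the general technique. It is not STATISTICA code: the function names, the `smoothing` parameter and the sign convention (log of the share of goods over the share of bads) are assumptions chosen for illustration, and conventions vary between tools.

```python
import math
from collections import defaultdict


def weight_of_evidence(categories, outcomes, smoothing=0.5):
    """Return {category: WoE}, with WoE = ln(share of goods / share of bads).

    outcomes: 1 marks a "bad" record (e.g. a default), 0 a "good" one.
    smoothing: assumed small constant added to every count so that a
    category with no goods or no bads does not cause log(0).
    """
    good = defaultdict(float)
    bad = defaultdict(float)
    for cat, outcome in zip(categories, outcomes):
        if outcome == 1:
            bad[cat] += 1
        else:
            good[cat] += 1

    cats = set(good) | set(bad)
    total_good = sum(good[c] + smoothing for c in cats)
    total_bad = sum(bad[c] + smoothing for c in cats)

    return {
        c: math.log(
            ((good[c] + smoothing) / total_good)
            / ((bad[c] + smoothing) / total_bad)
        )
        for c in cats
    }


def recode(categories, woe_table, default=0.0):
    """Replace each category with its WoE value (unseen categories get default)."""
    return [woe_table.get(c, default) for c in categories]


# Hypothetical toy data: housing status and whether the account went bad.
housing = ["own", "own", "own", "rent", "rent", "rent"]
went_bad = [0, 0, 1, 1, 1, 0]

woe = weight_of_evidence(housing, went_bad)
recoded = recode(housing, woe)
```

The resulting lookup table is the kind of transformation that could be stored, versioned and reapplied as a rule: once computed on training data, recoding a new record is just a table lookup.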
A debugger lets users run a set of records through the flow and see which records fired which rules. While the rules do not offer conflict detection, there is some error detection (use of a variable that is not defined, for instance) and there are some tools in the enterprise platform to see which objects refer to which other objects. Users can run multiple paths in a workflow for comparison purposes and can then use analysis tools built into the modeling environment to see what difference a change would make or which approach would be more profitable.
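The sequential rules, reason-code arrays and debugger trace described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the STATISTICA rules editor or its syntax: the `Rule` class, the rule names, the reason codes `R01`/`R02`, the thresholds and the stand-in `risk_model` are all hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple

Record = Dict[str, Any]


@dataclass
class Rule:
    name: str
    condition: Callable[[Record], bool]
    action: Callable[[Record], None]


def risk_model(record: Record) -> float:
    """Stand-in for a deployed predictive model (hypothetical logic)."""
    return 0.9 if record["late_payments"] > 2 else 0.1


# Rules run strictly in order, so later rules can use values set by earlier ones.
RULES: List[Rule] = [
    # A rule can execute a model and keep the score as a temporary variable.
    Rule("score_risk", lambda r: True,
         lambda r: r.update(risk=risk_model(r))),
    Rule("flag_high_risk", lambda r: r["risk"] > 0.7,
         lambda r: r["reason_codes"].append("R01")),
    Rule("flag_low_income", lambda r: r["income"] < 20000,
         lambda r: r["reason_codes"].append("R02")),
    Rule("decline_on_two_reasons", lambda r: len(r["reason_codes"]) >= 2,
         lambda r: r.update(decision="decline")),
]


def run_rules(record: Record, rules: List[Rule] = RULES) -> Tuple[Record, List[str]]:
    """Apply the rules in order to a copy of the record.

    Returns the decided record plus the names of the rules that fired,
    which is the kind of trace a debugger view would show per record.
    """
    result = dict(record)
    result["reason_codes"] = list(record.get("reason_codes", []))
    result.setdefault("decision", "approve")
    fired = []
    for rule in rules:
        if rule.condition(result):
            rule.action(result)
            fired.append(rule.name)
    return result, fired


risky, risky_trace = run_rules({"late_payments": 5, "income": 15000})
safe, safe_trace = run_rules({"late_payments": 0, "income": 50000})
```

Here the first record fires all four rules and ends with the decision `decline` and the reason codes `R01` and `R02`, while the second fires only `score_risk` and keeps the default `approve`. Because the same rule list is just data, it could be applied one record at a time or over a whole batch, which mirrors the reuse across real-time and batch environments described in the post.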
Besides executing the complete decisioning workflows using the STATISTICA products, all of the models can also be deployed as C, C++, PMML, Visual Basic, SAS, Java or C# Stored Procedure. The tool also supports Visual Basic scripting, which can be used to push things into the database programmatically.
StatSoft will be one of the vendors listed in the forthcoming report on Decision Management Systems platform technologies.