Mayato Study: STATISTICA Surpasses Top Competitors in User Friendliness, Modern Interface

This past spring, Mayato, a data mining and business analytics consulting company based in Germany, conducted its annual study of data mining tools.

The 2013 study focused on multi-media analytics solutions and pitted several major software vendors against one another. Once again, STATISTICA scored very highly and earned top ranking for user friendliness.

Out of more than 150 analytics tools on the market, Mayato included STATISTICA among its selection of four data mining suites whose functionality it considers comprehensive:

  • StatSoft: STATISTICA Professional 12
  • IBM SPSS Statistics Professional 21
  • SAS Enterprise Guide 5.1
  • Rapid-I: RapidMiner 5.3 / R (open-source)

Each tool had to prove itself in a test scenario covering all phases of a typical analysis project: from data import through the creation of forecasting models (linear regression) to the interpretation of results. Factors affecting the user experience—stability, speed, documentation, and operation—were also evaluated.

Analyst Peter Neckel at ComputerWoche magazine reviewed the study and the competing tools in a German-language article published April 25, 2013.

Neckel noted that STATISTICA outstripped the competitive field in the area of user friendliness, thanks to its modern and consistent user interface for all tasks and products. He also expressed appreciation for STATISTICA’s abundant variety of functions, especially regarding the number of available regression, data preparation, and parameterization methods.

Mayato conducted its field test on a sample of real data sets from JustBook, a hotel booking apps provider seeking to distribute its marketing budget efficiently across online and offline channels.

Complete study results are available at

How to Make Model Deployment Easier than Ever with New Workspace Nodes

Our previous How-To article, How to Deploy Models Using SVB Nodes, covered a topic that is becoming increasingly important, especially in data mining applications whose graphical user interface works with nodes that represent data mining algorithms. In that article, Rajiv Bhattarai covered deployment using the original STATISTICA Visual Basic (SVB) nodes. As STATISTICA reflects the rapid advances in technology and makes significant investments to remain a leader in predictive analytics, new nodes have been developed. These new nodes have prompted many questions, so this article describes the differences between the scripted SVB nodes and the new STATISTICA Workspace nodes. It also shows how the new nodes make model deployment easier than ever.

As with the previous article, this article assumes that you have a basic understanding of how to navigate through the workspace. If you need a refresher, see How to Navigate the STATISTICA Workspace.
New STATISTICA Workspace Nodes v. Scripted Nodes
As you work with STATISTICA Workspaces, you will see two types of nodes in practice. The first type is the scripted SVB nodes, which were described in the previous article and are not the focus of this one. These are indicated by SVB on the node icons, as you will see below. The second type is the new nodes, which were introduced as enhancements of the workspace user interface so that it closely resembles the interactive user interface in the respective modules. Below you will see a comparison of the Boosted Trees Classification SVB node and the new Boosted Classification Trees node.
Boosted Trees Classification SVB node, STATISTICA screenshot
New Boosted Classification Trees node, STATISTICA 12 screenshot
Describing all the additional features of the new nodes in detail is beyond the scope of this article, but the following highlights will help you distinguish the new nodes from the SVB nodes. A few of the properties of the new nodes are:
  • Before the node is run, it will appear with a yellow background. When the node is run, the background will turn from yellow to clear, an indication that you have completed the analysis.
  • Additional functionality is represented by icons on the node:
    • Nodes are run by clicking the green arrow icon located at the lower-left corner of the analysis node.
    • Parameters can be edited by clicking the gray gear icon at the upper-left corner of the node.
    • Node results can be viewed by clicking the report icon at the upper-right corner of the node.
    • Downstream results are indicated by a document icon at the lower-right corner of the node.
    • Nodes can be connected by clicking the gold diamond icon at the center-right side of the node, holding down, and drawing an arrow to another node where you can release the click, thereby attaching two nodes together.
  • Variable selection can be performed on the analysis node.
  • The functionality of the node closely resembles the functionality of the respective interactive analysis. As you can see with the results options for the Boosted Classification Trees above, in the results alone, you have much more control over what output is provided upon completion of the analysis.
  • Deployment functionality is built into the node.
Deployment Example with New Nodes
This example uses historical data in which each customer is labeled as either Good or Bad credit, representing customers who satisfied or defaulted on their loans, respectively. We will build two models, one using Logistic Regression and one using Boosted Trees, and compare how well they predict Good or Bad credit for future applicants.
Open the data set provided with STATISTICA titled creditscoring.sta.
On the Home tab in the Output group, click Add to Workspace and select Add to New Workspace. In the title bar of the workspace, verify that Beta Procedures is selected.
Selecting Beta Procedures tab in New Workspace, STATISTICA
As new nodes are created for algorithms, and as they are fully tested, they are made available in the All Validated Procedures selection. Boosted Trees Classification is currently available using this option. Logistic Regression is currently in the testing process and is therefore only available within the Beta Procedures area.
Within the data set, a variable titled TrainTest divides the data into a training data set and a testing data set. To split the data into these two groups, do the following:
On the Data tab in the Manage group, click Subset twice to add two subset nodes to the workspace. Verify that the subset nodes are connected to the data node. To keep track of your analyses clearly, it helps to rename nodes according to your selection criteria. Edit the names of the nodes (right-click the name and select Rename) to represent the training and testing data as illustrated below.
Editing names of nodes in STATISTICA Workspace
To edit the parameters of a node, you can either click the gray gear icon at the upper-left corner of the node or double-click the node. In the Include Cases group box, select the Specific, selected by option button. Enter the expression as shown in the next illustration.
Editing Parameters of new workspace nodes, STATISTICA
Complete the same procedure for the subset node that represents the testing data.
Run your Training node by clicking the green arrow icon at the lower-left corner of the node. In the workspace illustration above, you can see that the Training subset node has been run because it no longer has a yellow background. Also, the document icon at the lower-right corner means that data is available for downstream analysis. Clicking that document icon opens the available data; when you scroll to the right of the data file, you can verify that only those cases with TrainTest = “Train” have been selected, indicating that you specified the correct inclusion criteria in the subset node.
Training subset node output, STATISTICA
Close the data set.
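For readers who think in code, the effect of the two subset nodes can be sketched in a few lines of Python. This is only an illustration of the selection logic, not of how STATISTICA implements it. The rows below are invented stand-ins for creditscoring.sta; only the TrainTest variable name comes from the data set described above, and the `subset` helper is hypothetical.

```python
# Hypothetical rows standing in for creditscoring.sta. Only the TrainTest
# variable name comes from the article; every value here is invented.
rows = [
    {"id": 1, "Credit": "good", "TrainTest": "Train"},
    {"id": 2, "Credit": "bad", "TrainTest": "Test"},
    {"id": 3, "Credit": "good", "TrainTest": "Train"},
    {"id": 4, "Credit": "bad", "TrainTest": "Train"},
    {"id": 5, "Credit": "good", "TrainTest": "Test"},
]


def subset(rows, column, value):
    """Return only the rows whose `column` equals `value`."""
    return [row for row in rows if row[column] == value]


# One call per subset node in the workspace.
training = subset(rows, "TrainTest", "Train")
testing = subset(rows, "TrainTest", "Test")

# The same check the article performs by scrolling through the node output.
only_train_selected = all(row["TrainTest"] == "Train" for row in training)
print(len(training), len(testing), only_train_selected)
```

Every case lands in exactly one of the two subsets, which is what keeps the later comparison on the testing data honest.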
On the Data Mining tab in the Trees/Partitioning group, click Boosted Trees and select Boosted Classification Trees. On the Statistics tab, in the Advanced/Multivariate group, click Advanced Models > Generalized Linear/Nonlinear and select GLZ Custom Design (beta). Ensure that both nodes are connected to the Training node.
Connecting new nodes in STATISTICA Workspace
Edit the parameters of the Boosted Classification Trees analysis node and make the variable selections shown below.
Editing parameters, variable selections of analysis node, STATISTICA
In the Boosted Classification Trees dialog box, select the Code Generator tab. Verify that the only selection is for PMML.
Code Generator dialog options, STATISTICA
Leave all other settings at their default values, and click the OK button.
Edit the parameters of the GLZ Custom Design (beta) node. On the Quick tab, select Logit model with a Binomial distribution using the Logit Link function.
Quick tab selections for generalized linear models, STATISTICA
On the Model Specification tab, make the same variable selections as in the Boosted Classification Trees analysis node. On the Code Generator tab, verify that only PMML is selected, and click OK.
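As background on the Logit link chosen above, the following minimal Python sketch shows what the link function does: it maps a probability to the unbounded scale of the linear predictor, and its inverse maps the linear predictor back to a probability. The intercept, coefficient, and predictor name are hypothetical values chosen for illustration; they are not estimates from creditscoring.sta.

```python
import math


def logit(p):
    """Logit link: map a probability in (0, 1) to the real line."""
    return math.log(p / (1.0 - p))


def inverse_logit(eta):
    """Inverse link: map a linear predictor back to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


# Hypothetical fitted values, invented for this sketch.
intercept = -1.0
coefficients = {"duration_months": 0.05}
applicant = {"duration_months": 24}

# Linear predictor: intercept plus the weighted sum of the predictors.
eta = intercept + sum(coefficients[name] * applicant[name] for name in coefficients)

# Predicted probability of the modeled category for this applicant.
p_bad = inverse_logit(eta)
print(round(eta, 3), round(p_bad, 3))
```

A linear predictor of zero corresponds to a probability of exactly 0.5, which is why the sign of the linear predictor decides which class is more likely.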
Run both analysis nodes. A warning will be displayed while the logistic regression is being computed. Ignore it for the purposes of this example, but for more information about zero pivot element messages in the Generalized Linear Model, see:
After the analysis computations complete, the workspace will appear as below.
STATISTICA Workspace after running both analysis nodes
To review the results of the analysis on the training data, you could double-click the Reporting Documents icon. For this example, the focus will be on the performance of these models on the testing data. Two points need to be highlighted here. The PMML that was generated in our analysis was automatically loaded into the PMML Model nodes, which are downstream of the analysis nodes. Edit the parameters of the PMML Model node that is connected to the Boosted Classification Trees analysis node and select the PMML tab.
PMML script included in node, STATISTICA
You can see that the PMML script that represents this Boosted Classification Trees model is included in this node. Close the Deployment using PMML dialog box.
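Because PMML is an XML format, the script shown in the node can also be inspected outside STATISTICA. The Python sketch below parses a hand-written, heavily trimmed PMML fragment and lists the fields in its data dictionary. The fragment, its field names, and the `list_data_fields` helper are assumptions made for illustration; the PMML that STATISTICA generates will contain a full model definition and may use a different PMML version.

```python
import xml.etree.ElementTree as ET

# Hand-written, trimmed stand-in for a PMML document (an assumption).
# It carries only a header and a data dictionary, with invented field names.
PMML_TEXT = """<PMML version="4.2" xmlns="http://www.dmg.org/PMML-4_2">
  <Header description="hypothetical credit scoring model"/>
  <DataDictionary numberOfFields="3">
    <DataField name="Duration" optype="continuous" dataType="double"/>
    <DataField name="Purpose" optype="categorical" dataType="string"/>
    <DataField name="Credit" optype="categorical" dataType="string"/>
  </DataDictionary>
</PMML>"""

# PMML elements live in a versioned XML namespace, so lookups must use it.
NAMESPACE = {"pmml": "http://www.dmg.org/PMML-4_2"}


def list_data_fields(pmml_text):
    """Return (name, optype) pairs for every DataField in the document."""
    root = ET.fromstring(pmml_text)
    fields = root.findall("pmml:DataDictionary/pmml:DataField", NAMESPACE)
    return [(field.get("name"), field.get("optype")) for field in fields]


fields = list_data_fields(PMML_TEXT)
for name, optype in fields:
    print(f"{name}: {optype}")
```

Listing the data dictionary this way is a quick check that a deployed model expects the same variables as the data it will be applied to.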
Connect the Testing subset node to the Rapid Deployment node. The Rapid Deployment node takes the models to which it is connected and applies those models to data to which it is also connected. In this example, it will take the Boosted Classification Trees and Logistic Regression models and apply them to the Testing data.
Run the Testing subset node and verify that you have correctly selected only the Testing data.
Edit the parameters of the Rapid Deployment node. You can review the output options for this node outside of this example, but you will find that a wide range of output is available, from predicted probabilities to ROC curves.
For this example, we will leave all settings at their default values with the exception of the Lift chart settings. On the Lift chart tab, verify that the Lift chart (lift value) check box is selected, with bad as the Category of response.
Rapid Deployment Lift Chart settings, STATISTICA
Run the Rapid Deployment node, which deploys the Boosted Trees and Logistic Regression models onto the Test data. After the node is run, the workspace will appear as below.
Workspace after running Rapid Deployment node on Test data, STATISTICA
To review the results of the Rapid Deployment node, you can either double-click the Reporting Documents nodes or click the report icon at the upper-right corner of the Rapid Deployment node. For this example, review the results by clicking the icon on the Rapid Deployment node; this takes you directly to the Rapid Deployment results. Select the table of results for Summary of Deployment (Error rates) (Testing).
Table of Results from Rapid Deployment node, STATISTICA
From this table, we can see that the Boosted Trees model had an error rate of 30.5% and the Logistic Regression model had an error rate of 26.3%.  This indicates that at the default settings for the algorithms, the Logistic Regression model performs better than the Boosted Trees model.  In the results folder, select the lift chart.
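Before turning to the lift chart, it may help to see how an error rate of this kind is computed: it is the share of testing cases whose predicted class differs from the observed class. The Python sketch below applies that definition to two sets of predictions. All labels and predictions are invented for illustration, so the resulting rates are not the 30.5% and 26.3% reported in the table.

```python
def error_rate(actual, predicted):
    """Fraction of cases whose predicted class differs from the actual class."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    wrong = sum(1 for a, p in zip(actual, predicted) if a != p)
    return wrong / len(actual)


# Invented observed classes for ten testing cases.
actual = ["bad", "good", "good", "bad", "good",
          "good", "bad", "good", "good", "good"]

# Invented predictions from the two models on those same cases.
predictions = {
    "Boosted Trees": ["good", "good", "bad", "bad", "good",
                      "good", "good", "good", "good", "good"],
    "Logistic Regression": ["bad", "good", "bad", "bad", "good",
                            "good", "good", "good", "good", "good"],
}

# One error rate per model, computed on identical testing data.
rates = {model: error_rate(actual, preds) for model, preds in predictions.items()}
best = min(rates, key=rates.get)

for model, rate in rates.items():
    print(f"{model}: {rate:.1%}")
print("Lowest error rate:", best)
```

Scoring both models on the same held-out cases is what makes the two numbers directly comparable.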
From this chart, we can see that if we applied both models to all the testing data and took the top 20 percent of cases ranked by predicted probability of the classification Bad, the Logistic Regression model would have a lift value of approximately 1.9, while the Boosted Trees model would have a lift value of approximately 1.7. This again confirms that, using the default settings, the Logistic Regression model outperforms the Boosted Trees model.
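The lift value read from the chart can be reproduced by hand. Using the usual definition, lift at a given depth is the rate of the target class among the highest-scored cases divided by the rate of the target class in all cases. The Python sketch below computes it for the top 20 percent; the classes and probabilities are invented, and the `lift_at` helper is a hypothetical name, so the result does not match the 1.9 and 1.7 from the article.

```python
def lift_at(actual, prob_target, fraction=0.2, target="bad"):
    """Lift for the top `fraction` of cases ranked by predicted probability."""
    n_top = max(1, int(round(len(actual) * fraction)))
    # Rank cases from highest to lowest predicted probability of the target.
    ranked = sorted(zip(prob_target, actual), key=lambda pair: pair[0], reverse=True)
    top_rate = sum(1 for _, cls in ranked[:n_top] if cls == target) / n_top
    base_rate = sum(1 for cls in actual if cls == target) / len(actual)
    if base_rate == 0:
        raise ValueError("the target class does not occur in the data")
    return top_rate / base_rate


# Invented observed classes and predicted probabilities of "bad".
actual = ["bad", "good", "bad", "good", "good",
          "bad", "good", "good", "bad", "good"]
prob_bad = [0.90, 0.80, 0.70, 0.60, 0.50,
            0.40, 0.30, 0.20, 0.35, 0.10]

lift = lift_at(actual, prob_bad, fraction=0.2)
print(f"Lift at top 20%: {lift:.2f}")
```

A lift of 1.0 means the model ranks cases no better than random selection, so higher values at the same depth indicate a more useful ranking.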

Popular Decision Tree: CHAID Analysis, Automatic Interaction Detection


The primary goal of churn analysis is to identify those customers who are most likely to discontinue using your service or product. In the dynamic financial industry, companies increasingly provide products and services with similar features. Amid this ever-growing competition, the cost of acquiring a new customer typically exceeds the cost of retaining a current customer. Existing customers are a valuable asset. Furthermore, given the nature of the financial services industry, where customers generally tend to stay with a company for a longer term, churning could lead to substantial revenue loss.

With StatSoft’s Churn Analysis Solution, you can identify customers who are likely to churn by making precise predictions, reveal customer segments and reasons for leaving, engage with customers to improve communication and loyalty, calculate attrition rates, and develop effective marketing campaigns to target customers and increase profitability. With STATISTICA’s advanced modeling algorithms and wide array of state-of-the-art tools, you can develop powerful models that aid in accurate prediction of customer behavior and trends and help you avoid losing customers.


  • Batch or Real-Time Processing: Use the models you have built to determine churn and indicate, either by batch or in real-time, the customers who are likely to transfer their business to another company.
  • Cutting-edge Predictive Analytics: STATISTICA provides a wide variety of basic to sophisticated algorithms to build models which provide the most lift and highest accuracy for improved churn analysis.
  • Innovative Data Pre-processing Tools: STATISTICA provides a very comprehensive list of data management and data visualization tools.
  • Integrated Workflow: STATISTICA Decisioning Platform provides a streamlined workflow for powerful, rules-based, predictive analytics where business rules and industry regulations are used in conjunction with advanced analytics to build the best models.
  • Optimized Results: Compare the latest data mining algorithms side-by-side to determine which models provide the most gain. Produce profit charts with ease.
  • Role-Based, Enterprise-Wide Scope: If yours is a multi-user collaborative environment, you can use STATISTICA Enterprise to share data, improve churn models, and benefit from collaborative work with small or large groups.
  • Text Mining Unstructured Data: Improve churn models by using powerful text mining algorithms to incorporate unstructured data currently sitting unused in storage.

Data Death Spiral: Too much categorization stymies decision-making

Perhaps some readers are aware of Sheena Iyengar’s (classic) jam choice study from 1995, in which a grocery market try-before-you-buy display was set up with 24 sample jars of jam, alternated every few hours with a much smaller display of 6 jars. As described in the NY Times, considerably more customers were drawn to the larger display; however, the proportion of those customers who went on to buy was only one-tenth of the proportion who bought from the limited 6-jar display. Professor Iyengar hypothesized that “the presence of choice might be appealing as a theory, but in reality, people might find more and more choice to actually be debilitating.”

Certainly, given that the availability of choices does have some value, data categorization is important. But when I ran across Seth Redmore’s recent post about his musical background and the size and scope of musical genres on the market today, I could not believe what he had discovered: a laughably over-zealous list of electronic music categories. Thousands of them.

I am by no means a music industry expert, but it seems clear that when a musician/composer arbitrarily invents a unique name for his personal “brand” of music, such action does not mean a new genre has officially come into being. After all, we are talking about classification of “unstructured” content here (i.e., music), not a scientific taxonomy. As a practical matter in the real world where decisions are made, the differentiation of these so-called genres and sub-genres exists only in the minds of the (likely self-absorbed) composers who coined their names.

From a data collection standpoint, the more categories assigned, the greater the chance of miscategorization, misinterpretation, and confusion. This would only hinder the “shared understanding” Mr. Redmore says can be achieved with data categorization, even if music providers claim such categorization is intended to help consumers find exactly what they want.

My counter-intuitive point here (and maybe Redmore’s, too) is that the consumer cannot possibly know what he wants when faced with so many non-standardized music choices with ridiculously similar genre names like ritual ambient v. black ambient v. doom ambient v. drone ambient v. deep ambient v. death ambient. Mr. Redmore even mentions Netflix with its nearly 77K movie categories! From a marketing standpoint, that is crazy. There is simply no practical reason to attempt the creation of big data where such breadth is detrimental to decision-making. And this would be true whether in the online music room or in the executive board room.

Collaborative Analytics

STATISTICA Enterprise combines all the products from the STATISTICA family with the latest technologies for enterprise computing. STATISTICA Enterprise is an integrated multi-user software system that merges industry-standard database technologies with all the statistical and data mining analyses in STATISTICA. Reports can be configured in standard formats (HTML, PDF, Word). Access is controlled with user passwords and permissions.

This makes STATISTICA Enterprise a powerful tool for general purpose data analysis and business intelligence applications, as well as applications in manufacturing, research, marketing, and finance.

In business environments, STATISTICA Enterprise can be easily integrated into existing systems. It can also complement other software systems, such as ERP (Enterprise Resource Planning) software.

STATISTICA Enterprise Offers:

  • Knowledge-sharing functionality that encourages collaboration among users
  • State-of-the-art database connectivity options to access existing database management systems
  • Analytic Report Generation
  • Optionally process data from remote data servers “in place” (that is, without having to import data to a local storage device)
  • Data filtering, automatic data monitoring and analysis, error detection and alarming
  • Easy-to-use administration tools

Upgrade to STATISTICA Enterprise/QC if statistical process control/quality control is needed. It is designed for local and global enterprise quality control/improvement and Six Sigma applications. It includes a high-performance database (or an optimized interface to existing databases); real-time and remote monitoring and alarm notification for the production floor; a comprehensive set of analytical tools for engineers (all the functionality of STATISTICA QC Charts, Process Analysis, Design of Experiments, and much more); a sophisticated, Web-enabled user interface and reporting features for management; Six Sigma reporting options; and much more.



Knowledge-sharing functionality that encourages collaboration among users

Standard network versions of application programs typically have no (or very limited) support for the collaborative work of groups of users, and (with the exception of designated multi-user database management applications) usually have no support for central, multi-user repositories of data. The main advantages of standard network versions of application programs are:

  • lower cost “per seat” compared to standalone programs
  • saving disk space (since only one copy of the application files resides on the network)
  • ease of patching or upgrading to new versions (only one copy needs to be reinstalled or patched)

However, no file sharing or any other groupware/multi-user features are supported in such applications, and, for example, two users cannot work on the same file.

STATISTICA Enterprise users can share queries of any degree of complexity, allowing them to retrieve specific subsets of data from central repositories. They can also share scripts of analyses that can be centrally updated, for example, predefined reports that can be centrally modified by supervisors or analysts. The results of their work can be shared either in the local environment (by making them available to other users who have the respective access privileges) or on the global network (by publishing HTML reports on the Internet/Intranet).

Moreover, with the addition of the optional WebSTATISTICA functionality, users can benefit from the power of STATISTICA using virtually any computer in the world that is connected to the Internet.

State-of-the-art database connectivity options to access existing database management systems

Fully integrated with a suite of system administration tools, STATISTICA Enterprise provides an efficient general interface to enterprise-wide repositories of data. Data can be accessed via industry-standard database protocols such as OLE DB and ODBC, or from data historian repositories such as the PI Data Historian from OSI Soft, Inc.

STATISTICA Enterprise is organized around a central STATISTICA Enterprise configuration database that can be installed on any industry standard database management system. This includes all major scalable systems such as Oracle, Microsoft SQL Server, IBM DB2, etc. The installation of the STATISTICA Enterprise warehouse can be set up using a pre-defined database template (schema), so the deployment of the system is relatively simple.

This enterprise data interface function makes data easily accessible, and provides one of the important advantages of STATISTICA Enterprise. In addition, the comprehensive STATISTICA Enterprise security management system allows the administrators to assign specific access privileges to particular categories of users.

Analytic Report Generation

Report generation is an important component of the STATISTICA Enterprise architecture. You can use report configurations and report generation in STATISTICA Enterprise to create formatted documents (PDF, HTML, MS Word) and analysis summaries of any of the tabular and graphical results produced by STATISTICA.

STATISTICA Enterprise provides a graphical user interface for defining the layout of formatted documents, including the placement of graphs and tables, the contents and formatting of headers/footers, static and dynamic text elements, and any additional formatting elements specific to the document type. The resulting report template definition is saved as a STATISTICA Report template document. Importantly, these Report Templates are stored centrally, wrapped with user access control and security, in the STATISTICA Configurations Database, deployed on an industry-standard relational database management system (RDBMS). Reports may be run either on demand or as scheduled batch tasks.

Optionally process data from remote data servers “in place” (without having to import data to a local storage device)

STATISTICA Enterprise offers options to process data from remote databases “in place” without the need to import the data to the local storage device. This technology produces significant performance gains (compared to importing the data subsets before they can be processed). It also allows you to process datasets that are larger than the local storage device’s capacity (e.g., terabytes of data).
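One common way to get this effect is to push the computation into the database so that only summary rows travel back to the client. The sketch below illustrates that general idea with Python's built-in sqlite3 module and an in-memory database standing in for a remote server. The table and column names are hypothetical, and this is not a description of how STATISTICA Enterprise implements the feature.

```python
import sqlite3

# An in-memory SQLite database stands in for a remote data server (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE measurements (line TEXT, value REAL)")
conn.executemany(
    "INSERT INTO measurements VALUES (?, ?)",
    [("A", 1.0), ("A", 3.0), ("B", 10.0), ("B", 14.0)],
)

# The aggregation runs inside the database engine. Only one summary row per
# production line is returned, instead of every raw measurement.
summary = conn.execute(
    "SELECT line, COUNT(*), AVG(value) "
    "FROM measurements GROUP BY line ORDER BY line"
).fetchall()
conn.close()

for line, count, mean in summary:
    print(f"line {line}: n={count}, mean={mean}")
```

The client never holds the raw table, which is why this pattern scales to data sets larger than local storage.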

Automatic data monitoring/analysis; analytic auto-responding; enterprise data broadcasting

In today’s rapidly changing business world, success depends more than ever on a business’s ability to respond quickly to changing conditions. Comprehensive insight into the available data and the ability to respond quickly, either directly or by performing appropriate, predefined analyses, are no longer a luxury but a real necessity. The proactive data broadcasting and automated analysis functions available in STATISTICA Enterprise are the ideal complement to procedures found in ERP software.

STATISTICA Enterprise features powerful facilities to automatically react to user-defined conditions in data. These custom-defined conditions can be of practically any complexity and they can even represent results of on-line analyses performed by STATISTICA Enterprise in real-time on the incoming data stream or on sampled data.

These flexible facilities are built using StatSoft’s real-time data monitoring technologies, and they can be used in countless business or research applications. Facilities are provided to set up any STATISTICA Enterprise workstation as an automatic data monitor and/or processor that will fetch or sample the appropriate data subsets from the STATISTICA Enterprise or other enterprise data warehouse, perform the predefined analyses, and then respond appropriately. For example, when certain conditions are met (e.g., the price of a particular commodity reaches a certain threshold, certain inventory falls below a particular level, the number of complaints or registered defects compared to the moving average exceeds a preset tolerance level), then STATISTICA Enterprise will automatically execute a predefined action (e.g., send e-mail, send a fax, call a pager, or simply broadcast the relevant information or the result of analyses to selected members of the organization or nodes of the STATISTICA Enterprise installation).
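The moving-average condition mentioned above can be made concrete with a short Python sketch. It compares each new value with the average of the preceding window and triggers a predefined action when the value exceeds that average by more than a tolerance factor. The function name, window size, tolerance, and complaint counts are all hypothetical, and the sketch says nothing about how STATISTICA Enterprise evaluates such rules internally.

```python
from collections import deque


def monitor(stream, window=5, tolerance=1.5, on_alarm=print):
    """Flag values that exceed the moving average of the previous `window`
    values by more than the `tolerance` factor, and call `on_alarm` for each."""
    recent = deque(maxlen=window)
    alarms = []
    for index, value in enumerate(stream):
        if len(recent) == window:
            moving_average = sum(recent) / window
            if value > moving_average * tolerance:
                alarms.append((index, value, moving_average))
                # Stand-in for the predefined action (e-mail, broadcast, ...).
                on_alarm(f"ALARM at {index}: {value} vs average {moving_average:.1f}")
        # The newest value enters the window after it has been checked.
        recent.append(value)
    return alarms


# Invented daily complaint counts with one obvious spike.
daily_complaints = [10, 12, 11, 9, 10, 11, 25, 10]
alarms = monitor(daily_complaints, window=5, tolerance=1.5)
```

Passing the action in as `on_alarm` keeps the condition separate from the response, so the same rule could send a message, write a log entry, or start a follow-up analysis.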

Easy-to-use administration tools

Easy-to-use administration tools in STATISTICA Enterprise provide the power to define the specific permissions of users, the queries to external data sources, and the reports to be generated. Flexible tools allow you to customize the view that a user sees, organized by department, report type, etc. STATISTICA Enterprise is also centrally managed so that changes made to the system through the administration tools are immediately available on all workstations. The administration tools in STATISTICA Enterprise are similar to those available in STATISTICA Enterprise/QC.


System Requirements


STATISTICA Enterprise is compatible with Windows XP, Windows Server 2003, Windows Vista, Windows 7, and Windows Server 2008.

This product requires the installation of a database. StatSoft supports the use of ODBC compliant databases such as Access, SQL Server, Oracle, and others.

System Requirements are based on an average size implementation. Server requirements are based on the number of concurrent users simultaneously accessing the system.

Minimum System Requirements

Operating System: Windows XP or above
Processor Speed: 1 GHz

Recommended System Requirements

Operating System: Windows Server 2003 or later
Processor Speed: 2.0 GHz, 64-bit, dual core


  • For the 32-bit version of STATISTICA, a 64-bit processor and operating system are recommended because 64-bit operating systems manage memory better.

Butler Analytics Highlights STATISTICA’s Strengths

Martin Butler of Butler Analytics, a London-based business intelligence consultancy, has earned a reputation as a well-informed, vendor-neutral resource for his clients. Butler regularly speaks and writes publicly about industry trends and analytics solutions, so he recently made time to learn about STATISTICA. His review makes note of STATISTICA’s integration, graphical interface, and flexibility.

“One of the most powerful aspects of the product set is the level of integration, with seamless connections between disparate modes,” observes Butler. “Statistics, machine learning, data mining and text mining are all at the disposal of the user without having to migrate from one environment to another.”

He also appreciates STATISTICA’s updated GUI, “a graphical interface where workflows can be constructed to process data…and used by anyone who has permissions.”

After listing a small sample of our software’s data mining and text mining functionality, Butler notes that larger organizations can scale up with the STATISTICA Enterprise™ platform, which “provides an enterprise working environment for business users as well as analysts.” Butler notes that this broad enterprise appeal is achieved through a wide range of business tools (e.g., “analysis, reports and dashboards they can use, and various forms of monitoring and alerts,” he says) as well as flexibility of model deployment, which he confirms “includes PMML, C, C++, C#, Java, SAS, SQL stored procedures, and Teradata.”

Previously, Butler has referenced STATISTICA as a SAS Alternative and has included STATISTICA Text Miner among his list of Top Text Analytics Platforms.

See Butler’s full write-up here.

STATISTICA Data Warehouse





The STATISTICA Data Warehouse system is a complete, powerful, scalable, and customizable intelligent data warehouse solution, which optionally offers the most complete analytic functionality available on the market, fully integrated into the system.

  • Features and Benefits
  • Architecture and Connectivity
  • Web-Enablement
  • Advanced Security and Authentication
  • Document Control
  • Advanced Analytics
  • Programmability and Customizability

STATISTICA Data Warehouse consists of a suite of powerful, flexible component applications that include the following:

  • STATISTICA Data Warehouse Server Database
  • STATISTICA Data Warehouse Query
  • STATISTICA Data Warehouse Analyzer
  • STATISTICA Data Warehouse Reporter
  • STATISTICA Data Warehouse Document Repository
  • STATISTICA Data Warehouse Scheduler
  • STATISTICA Data Warehouse Real Time Monitor and Reporter

If you are new to data warehousing, StatSoft consultants will guide you step by step through the entire process of designing the optimal data warehouse architecture, from a comprehensive review of your information storage and extraction/analysis needs, to the final training of your employees and support of your daily operations.

Telenor Digital Selects StatSoft’s STATISTICA Decisioning Platform® for Global Credit Risk Management Projects

TULSA, OK, USA [January 28, 2014] – StatSoft announced today that Telenor Digital has selected STATISTICA Decisioning Platform® for its global risk analysis solution.
Driven by its mission to bring modern communication infrastructures to customers worldwide, Telenor Digital has embarked on a new goal to provide flexible credit instruments and opportunities to heretofore under-banked customers. For this purpose, Telenor Digital evaluated leading analytic software platforms available on the market to supply the critical capabilities for modeling, model management, model deployment, and compliance reporting.
Important requirements included the analytics platform’s ability to accommodate diverse regulatory and physical constraints of deployment, great flexibility with respect to interoperability with other systems, and easy deployment to virtual and cloud environments. After a careful review, Telenor chose StatSoft’s STATISTICA Decisioning Platform to meet these challenges, while providing a proven and robust solution capable of supporting a dynamically growing enterprise.
Dr. Thomas Hill, Vice President for Analytic Solutions at StatSoft, Inc., remarked, “We are excited to have this opportunity to deploy our Decisioning Platform to Telenor Digital, and to support Telenor Digital’s continuous drive toward delivering new opportunities to new customers. Government regulated financial services demand the highest quality components, most stringent documentation and validation, and advanced capabilities, not only around analytics but also security, version control, and audit logs. Working for more than a decade in this domain, StatSoft has developed a deep understanding of these requirements, and a mature platform to meet them.”
The initial installation of STATISTICA Decisioning Platform was completed at the end of 2013.
Telenor Group is an international provider of tele, data and media communication services. Telenor Group has mobile operations in 12 markets in the Nordic region, Central and Eastern Europe, and in Asia, as well as a voting stake of 42.95 percent (economic stake 33 percent) in VimpelCom Ltd., operating in 17 markets. Headquartered in Norway, Telenor Group is one of the world’s major mobile operators with more than 160 million mobile subscriptions in its consolidated operations per Q2 2013, revenues in 2012 of NOK 102 billion, and a workforce of nearly 34,000.
StatSoft, Inc., was founded in 1984 and is now one of the world’s largest providers of analytics software, with 30 offices around the globe and more than one million users of its STATISTICA software. StatSoft’s solutions enjoy an extremely high level of user satisfaction across industries, as demonstrated in the unprecedented record of top ratings in practically all published reviews and large, independent surveys of analytics users worldwide. With its comprehensive suite of STATISTICA solutions for a wide variety of industries, StatSoft is a trusted partner of the world’s largest organizations and businesses (including most of the Fortune 500 companies), providing mission-critical applications that help them increase productivity, control risk, reduce waste, streamline operations, achieve regulatory compliance, and protect the environment.

STATISTICA Enterprise Server Knowledge Portal

With STATISTICA Enterprise Server portal tools, you can post up-to-date reports, charts, and tables on the Internet automatically, virtually in real time, and without any knowledge of HTML or Java.

The product is available in two versions:

  • Knowledge Portal is a powerful, Web-based, knowledge-sharing tool that allows your colleagues, employees, and/or customers (with appropriate permissions) to log in and quickly get access to the information they need by reviewing predefined reports.
  • Interactive Knowledge Portal offers portal visitors all the functionality of the Knowledge Portal plus additional options. These options allow the user to define and request new reports, run queries and custom analyses, drill down and up, slice and dice data, and gain insight from all resources made available to them by the portal designers or administrators.

In addition, the STATISTICA Enterprise Server Knowledge Portal can be integrated with the optional STATISTICA Document Management System.

STATISTICA Sequence, Association, and Link Analysis

STATISTICA Sequence, Association, and Link Analysis (SAL) is designed to address the needs of clients in industries such as healthcare, retailing, banking, and insurance. It can be used for model building and deployment. SAL is an implementation of several state-of-the-art techniques specifically designed for extracting rules from datasets (databases) that can be generally characterized as “market-baskets.”

“Market-Basket” Metaphor

The market-basket problem assumes that there are a large number of products that can be purchased by the customer, either in a single transaction, or over time in a sequence of transactions. Such products can be goods displayed in a supermarket, spanning a wide range of items from groceries to electrical appliances, or they can be insurance packages which customers might be willing to purchase, etc. Customers fill their basket with only a fraction of what is on display or on offer.

Association Rules

Association rules can be extracted from a database of transactions to determine which products are frequently purchased together. For example, one might find that purchases of flashlights typically coincide with purchases of batteries in the same basket. Likewise, when transactions are time-stamped, analysts can track purchases over time.
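To make the idea concrete, here is a minimal sketch of pairwise association rules in plain Python. It is not STATISTICA's implementation or API. The function name `pair_rules`, the thresholds, and the sample baskets are all hypothetical, and the usual textbook definitions of support and confidence are assumed.

```python
from collections import Counter
from itertools import combinations


def pair_rules(baskets, min_support=0.3, min_confidence=0.6):
    """Return (antecedent, consequent, support, confidence) tuples for item pairs.

    support    = share of baskets containing both items
    confidence = share of baskets with the antecedent that also hold the consequent
    """
    n = len(baskets)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in baskets:
        items = sorted(set(basket))  # ignore duplicates within one basket
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))

    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue
        # Each frequent pair can yield a rule in either direction.
        for x, y in ((a, b), (b, a)):
            confidence = count / item_counts[x]
            if confidence >= min_confidence:
                rules.append((x, y, support, confidence))
    # Strongest rules first; ties broken alphabetically for stable output.
    return sorted(rules, key=lambda r: (-r[3], -r[2], r[0], r[1]))


# Hypothetical sample data, invented for illustration only.
baskets = [
    ["flashlight", "batteries"],
    ["flashlight", "batteries", "tape"],
    ["batteries", "tape"],
    ["flashlight", "batteries"],
    ["bread"],
]
rules = pair_rules(baskets)
for antecedent, consequent, support, confidence in rules:
    print(f"{antecedent} -> {consequent}: support={support:.2f}, confidence={confidence:.2f}")
```

In this toy data every basket containing a flashlight also contains batteries, so the rule "flashlight -> batteries" has a confidence of 1.0. Production tools typically consider larger item sets and additional measures such as lift.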

Sequence Analysis

Sequence analysis is concerned with the subsequent purchase of a product or products given a previous buy. For instance, buying an extended warranty is more likely to follow (in that specific sequential order) the purchase of a TV or other electrical appliance. Sequence rules, however, are not always that obvious, and sequence analysis helps you to extract such rules no matter how hidden they may be in your market-basket data. There is a wide range of applications for sequence analysis in many areas of industry and science, including customer shopping patterns, phone call patterns, the fluctuation of the stock market, DNA sequences, and web-log streams.
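A minimal sketch of how such ordered rules might be counted from time-stamped purchases follows. It is an illustration only, not the algorithm used by STATISTICA. The name `sequence_rules`, the data layout (customer ID mapped to a list of `(timestamp, item)` pairs), and the sample histories are assumptions.

```python
from collections import Counter


def sequence_rules(histories, min_confidence=0.5):
    """Map (earlier, later) item pairs to a confidence value.

    confidence = customers who bought `later` at some point after `earlier`
                 divided by customers who bought `earlier` at all
    """
    buyers = Counter()     # number of customers who bought each item
    followers = Counter()  # number of customers showing each ordered pair
    for purchases in histories.values():
        ordered = [item for _, item in sorted(purchases)]  # sort by timestamp
        buyers.update(set(ordered))
        pairs = set()
        for i, earlier in enumerate(ordered):
            for later in ordered[i + 1:]:
                if later != earlier:
                    pairs.add((earlier, later))
        followers.update(pairs)  # each customer counts once per ordered pair

    rules = {}
    for (earlier, later), count in followers.items():
        confidence = count / buyers[earlier]
        if confidence >= min_confidence:
            rules[(earlier, later)] = confidence
    return rules


# Hypothetical purchase histories: customer -> [(timestamp, item), ...]
histories = {
    "c1": [(1, "tv"), (2, "warranty")],
    "c2": [(3, "tv"), (5, "warranty")],
    "c3": [(2, "warranty"), (4, "tv")],
    "c4": [(1, "tv")],
}
rules = sequence_rules(histories)
print(rules)
```

Unlike a plain association rule, the pair here is directional: "tv then warranty" and "warranty then tv" are counted separately, so only the order that actually occurs often enough passes the threshold.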

Link Analysis

Once extracted, rules about associations or the sequences of items as they occur in a transaction database can be extremely useful for numerous applications. Obviously, in retailing or marketing, knowledge of purchase “patterns” can help with the direct marketing of special offers to the “right” or “ready” customers (i.e., those that, according to the rules, are most likely to purchase some specific items given their observed past consumption patterns).

However, transaction databases occur in many areas of business, such as banking, and in general customer “intelligence.” In fact, the term “link analysis” is often used when these techniques (for extracting sequential or non-sequential association rules) are applied to organize complex “evidence.”

It is easy to see how the “transactions” or “market-basket” metaphor can be applied to situations where individuals engage in certain actions, open accounts, contact other specific individuals, and so on. Applying the technologies described here to such databases may quickly extract patterns and associations between individuals and actions, and hence, reveal the patterns and structure in datasets.
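As a rough sketch of this idea, the example below treats each recorded event as a "basket" of entities (individuals, accounts), links entities that co-occur repeatedly, and then groups linked entities together. It is a simplified illustration under assumed data, not the STATISTICA method. The names `build_links` and `connected_groups`, the `min_count` threshold, and the sample events are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_links(events, min_count=2):
    """Link two entities when they appear together in at least `min_count` events."""
    counts = Counter()
    for entities in events:
        counts.update(combinations(sorted(set(entities)), 2))
    graph = defaultdict(set)
    for (a, b), count in counts.items():
        if count >= min_count:
            graph[a].add(b)
            graph[b].add(a)
    return graph


def connected_groups(graph):
    """Return groups of entities joined directly or indirectly by links."""
    seen, groups = set(), []
    for start in sorted(graph):
        if start in seen:
            continue
        stack, group = [start], set()
        while stack:  # depth-first walk over the link graph
            node = stack.pop()
            if node in group:
                continue
            group.add(node)
            stack.extend(graph[node] - group)
        seen |= group
        groups.append(sorted(group))
    return groups


# Hypothetical events: each lists the entities involved in one action.
events = [
    ["alice", "acct-1", "bob"],
    ["alice", "acct-1"],
    ["bob", "acct-1"],
    ["carol", "acct-2"],
    ["carol", "acct-2"],
    ["dave", "acct-1"],
]
graph = build_links(events)
groups = connected_groups(graph)
print(groups)
```

Here "dave" touches "acct-1" only once, so he falls below the threshold and is left out, while "alice" and "bob" end up in the same group through their repeated use of the same account.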