This past spring, Mayato, a data mining and business analytics consulting company based in Germany, conducted its annual study of data mining tools.
The 2013 study focused on multi-media analytics solutions and pitted several major software vendors against one another. Once again, STATISTICA scored very highly and earned top ranking for user friendliness.
Of over 150 analytics tools on the market, Mayato included STATISTICA among its selection of four data mining suites whose functionality they consider to be comprehensive:
- StatSoft: STATISTICA Professional 12
- IBM SPSS Statistics Professional 21
- SAS Enterprise Guide 5.1
- Rapid-I: RapidMiner 5.3 / R (open-source)
Each tool had to prove itself in a test scenario covering all phases of a typical analysis project: from data import through the creation of forecasting models (linear regression) to the interpretation of results. Factors affecting the user experience—stability, speed, documentation, and operation—were also evaluated.
Analyst Peter Neckel at ComputerWoche magazine reviewed the study and its competitors in a German-language article published April 25, 2013.
Neckel noted that STATISTICA outstripped the competitive field in the area of user friendliness, thanks to its modern and consistent user interface for all tasks and products. He also expressed appreciation for STATISTICA’s abundant variety of functions, especially regarding the number of available regression, data preparation, and parameterization methods.
Mayato conducted its field test on a sample of real data sets from JustBook, a hotel booking apps provider seeking to distribute its marketing budget efficiently across online and offline channels.
Complete study results are available at http://www.mayato.com.
Our previous How-To article, How to Deploy Models Using SVB Nodes, covered a topic that is becoming increasingly important, especially in data mining applications with a graphical user interface working with nodes that represent data mining algorithms. Rajiv Bhattarai covered the primary topic of deployment using the original STATISTICA Visual Basic (SVB) nodes. As STATISTICA reflects the rapid advances in technology and makes significant investments to remain a leader in predictive analytics, new nodes have been developed. This is a source of many questions, and this article will help to describe the differences between the scripted SVB nodes and the new STATISTICA Workspace nodes. Further, it will be shown how using the new nodes makes model deployment easier than ever.
- Before the node is run, it will appear with a yellow background. When the node is run, the background will turn from yellow to clear, an indication that you have completed the analysis.
- Additional functionality is represented by icons on the node:
- Nodes are run by clicking the green arrow icon located at the lower-left corner of the analysis node.
- Parameters can be edited by clicking the grey gear icon at the upper-left corner of the node.
- Node results can be viewed by clicking the report icon at the upper-right corner of the node.
- Downstream results are indicated by a document icon at the lower-right corner of the node.
- Nodes can be connected by clicking the gold diamond icon at the center-right side of the node, holding down, and drawing an arrow to another node where you can release the click, thereby attaching two nodes together.
- Variable selection can be performed on the analysis node.
- The functionality of the node closely resembles the functionality of the respective interactive analysis. As you can see with the results options for the Boosted Classification Trees above, in the results alone, you have much more control over what output is provided upon completion of the analysis.
- Deployment functionality is built into the node.
The primary goal of churn analysis is to identify those customers that are most likely to discontinue using your service or product. In the dynamic financial industry, companies are increasingly providing products and services with similar features. Amid this ever-growing competition, the cost of acquiring a new customer typically exceeds the cost of retaining a current customer. Existing customers are a valuable asset. Furthermore, given the nature of the financial services industry, where customers generally tend to stay with a company for a longer term, churning could lead to substantial revenue loss.
With StatSoft’s Churn Analysis Solution, you can identify customers who are likely to churn by making precise predictions, reveal customer segments and reasons for leaving, engage with customers to improve communication and loyalty, calculate attrition rates, and develop effective marketing campaigns to target customers and increase profitability. With STATISTICA’s advanced modeling algorithms and wide array of state-of-the-art tools, you can develop powerful models that aid in accurate prediction of customer behavior and trends and help you avoid losing customers.
- Batch or Real-Time Processing: Use the models you have built to determine churn and indicate, either by batch or in real-time, the customers who are likely to transfer their business to another company.
- Cutting-edge Predictive Analytics: STATISTICA provides a wide variety of basic to sophisticated algorithms to build models which provide the most lift and highest accuracy for improved churn analysis.
- Innovative Data Pre-processing Tools: STATISTICA provides a very comprehensive list of data management and data visualization tools.
- Integrated Workflow: STATISTICA Decisioning Platform provides a streamlined workflow for powerful, rules-based, predictive analytics where business rules and industry regulations are used in conjunction with advanced analytics to build the best models.
- Optimized Results: Compare the latest data mining algorithms side-by-side to determine which models provide the most gain. Produce profit charts with ease.
- Role-Based, Enterprise-Wide Scope: If yours is a multi-user collaborative environment, you can use STATISTICA Enterprise to share data, improve churn models, and benefit from collaborative work with small or large groups.
- Text Mining Unstructured Data: Improve churn models by using powerful text mining algorithms to incorporate unstructured data currently sitting unused in storage.
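To make the notion of "lift" mentioned above concrete, here is a minimal Python sketch. It is not STATISTICA code; the function name `lift_at`, the scores, and the churn flags are all hypothetical and chosen only for illustration. Lift is computed here as the churn rate among the highest-scored customers divided by the churn rate in the whole customer base.

```python
def lift_at(scores, churned, fraction=0.2):
    """Lift of a churn model in the top `fraction` of customers by score.

    scores  -- predicted churn scores (higher means more likely to churn)
    churned -- 1 if the customer actually churned, else 0
    """
    if not scores or len(scores) != len(churned):
        raise ValueError("scores and churned must be non-empty and equal in length")
    base_rate = sum(churned) / len(churned)
    if base_rate == 0:
        raise ValueError("lift is undefined when no customer churned")
    # Rank customers from highest to lowest predicted score.
    ranked = sorted(zip(scores, churned), key=lambda pair: pair[0], reverse=True)
    k = max(1, round(len(ranked) * fraction))
    top_rate = sum(flag for _, flag in ranked[:k]) / k
    return top_rate / base_rate


# Hypothetical data: ten customers, three of whom churned.
scores = [0.95, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1]
churned = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
lift = lift_at(scores, churned, fraction=0.2)
```

In this made-up example, both customers in the top 20% churned while the overall churn rate is 30%, so targeting the top-scored group reaches churners roughly 3.3 times as often as contacting customers at random.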
Perhaps some readers are aware of Sheena Iyengar’s (classic) jam choice study from 1995, in which a grocery market try-before-you-buy display was set up with 24 sample jars of jam, alternated every few hours with a much smaller display of 6 jars. As described in the NY Times, considerably more customers were drawn to the larger display; however, the proportion of them who went on to buy was only 1/10 of the proportion who bought from the limited 6-jar display. Professor Iyengar hypothesized that “the presence of choice might be appealing as a theory, but in reality, people might find more and more choice to actually be debilitating.”
Certainly, given that the availability of choices does have some value, data categorization is important. But when I ran across Seth Redmore’s recent post about his musical background and the size and scope of musical genres on the market today, I could not believe what he had discovered: a laughably over-zealous list of electronic music categories. Thousands of them.
I am by no means a music industry expert, but it seems clear that when a musician/composer arbitrarily invents a unique name for his personal “brand” of music, such action does not mean a new genre has officially come into being. After all, we are talking about classification of “unstructured” content here (i.e., music), not a scientific taxonomy. As a practical matter in the real world where decisions are made, the differentiation of these so-called genres and sub-genres exists only in the minds of the (likely self-absorbed) composers who coined their names.
From a data collection standpoint, the more categories assigned, the greater the chance of miscategorization, misinterpretation, and confusion. This would only hinder the “shared understanding” Mr. Redmore says can be achieved with data categorization, even if music providers claim such categorization is intended to help consumers find exactly what they want.
My counter-intuitive point here (and maybe Redmore’s, too) is that the consumer cannot possibly know what he wants when faced with so many non-standardized music choices with ridiculously similar genre names like ritual ambient v. black ambient v. doom ambient v. drone ambient v. deep ambient v. death ambient. Mr. Redmore even mentions Netflix with its nearly 77K movie categories! From a marketing standpoint, that is crazy. There is simply no practical reason to attempt the creation of big data where such breadth is detrimental to decision-making. And this would be true whether in the online music room or in the executive board room.
STATISTICA Enterprise combines all the products from the STATISTICA family with the latest technologies for enterprise computing. STATISTICA Enterprise is an integrated multi-user software system that merges industry-standard database technologies with all the statistical and data mining analyses in STATISTICA. Reports can be configured in standard formats (HTML, PDF, Word). Access is controlled with user passwords and permissions.
This makes STATISTICA Enterprise a powerful tool for general purpose data analysis and business intelligence applications, as well as applications in manufacturing, research, marketing, and finance.
In business environments, STATISTICA Enterprise can be easily integrated into existing systems. And it can complement other software systems, such as ERP (Enterprise Resource Planning) software.
STATISTICA Enterprise Offers:
- Knowledge-sharing functionality that encourages collaboration among users
- State-of-the-art database connectivity options to access existing database management systems
- Analytic Report Generation
- Optionally process data from remote data servers “in place” (that is, without having to import data to a local storage device)
- Data filtering, automatic data monitoring and analysis, error detection and alarming
- Easy-to-use administration tools
Upgrade to STATISTICA Enterprise/QC if statistical process control/quality control is needed. It is designed for local and global enterprise quality control/improvement and Six Sigma applications. It includes a high-performance database (or an optimized interface to existing databases), real-time and remote monitoring and alarm notification for the production floor, a comprehensive set of analytical tools for engineers (all the functionality of STATISTICA QC Charts, Process Analysis, Design of Experiments, and much more), a sophisticated, Web-enabled user interface and reporting features for management, Six Sigma reporting options, and much more.
Knowledge-sharing functionality that encourages collaboration among users
Standard network versions of application programs typically have no (or very limited) support for the collaborative work of groups of users, and (with the exception of designated multi-user database management applications) usually have no support for central, multi-user repositories of data. The main advantages of standard network versions of application programs are:
- lower cost “per seat” compared to stand-alone programs
- saving disk space (since only one copy of the application files resides on the network)
- ease of patching or upgrading to new versions (only one copy needs to be reinstalled or patched)
However, no file sharing or any other groupware/multi-user features are supported in such applications, and, for example, two users cannot work on the same file.
STATISTICA Enterprise users can share queries of any degree of complexity, allowing them to retrieve specific subsets of data from central repositories, and can share scripts of analyses that can be centrally updated. For example, predefined reports can be centrally modified by supervisors or analysts. The results of their work can be shared either in the local environment (by making them available to other users who have the respective access privileges) or on the global network (by publishing HTML reports on the Internet/Intranet).
Moreover, with the addition of the optional WebSTATISTICA functionality, users can benefit from the power of STATISTICA using virtually any computer in the world that is connected to the Internet.
State-of-the-art database connectivity options to access existing database management systems
Fully integrated with a suite of system administration tools, STATISTICA Enterprise provides an efficient general interface to enterprise-wide repositories of data. Data can be accessed via industry-standard database protocols such as OLE DB and ODBC. Alternatively, data historian repositories such as the PI Data Historian from OSI Soft, Inc., can be used.
STATISTICA Enterprise is organized around a central STATISTICA Enterprise configuration database that can be installed on any industry standard database management system. This includes all major scalable systems such as Oracle, Microsoft SQL Server, IBM DB2, etc. The installation of the STATISTICA Enterprise warehouse can be set up using a pre-defined database template (schema), so the deployment of the system is relatively simple.
This enterprise data interface function makes data easily accessible, and provides one of the important advantages of STATISTICA Enterprise. In addition, the comprehensive STATISTICA Enterprise security management system allows the administrators to assign specific access privileges to particular categories of users.
Analytic Report Generation
Report generation is an important component of the STATISTICA Enterprise architecture. You can use report configurations and report generation in STATISTICA Enterprise to create formatted documents (PDF, HTML, MS Word) and analysis summaries of any of the tabular and graphical results produced by STATISTICA.
STATISTICA Enterprise provides a graphical user interface for defining the layout of formatted documents, including the placement of graphs and tables, the contents and formatting of headers/footers, static and dynamic text elements, and any additional formatting elements specific to the document type. The results of the report template definition are saved as a STATISTICA Report template document. Importantly, these Report Templates are stored centrally, wrapped with user access control and security, in the STATISTICA Configurations Database, deployed on an industry-standard relational database management system (RDBMS). Reports may be run either on demand or as scheduled batch tasks.
Optionally process data from remote data servers “in place” (without having to import data to a local storage device)
STATISTICA Enterprise offers options to process data from remote databases “in place” without the need to import the data to the local storage device. This technology produces significant performance gains (compared to importing the data subsets before they can be processed). It also allows you to process datasets that are larger than the local storage device’s capacity (e.g., terabytes of data).
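The general idea of "in place" processing can be illustrated with a small Python sketch. This is not how STATISTICA Enterprise is implemented; it only contrasts importing every row before summarizing with letting the database compute the summary. An in-memory SQLite database stands in for a remote server, and the table name `measurements` and its columns are hypothetical.

```python
import sqlite3

# In-memory SQLite database as a stand-in for a remote database server.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE measurements (line TEXT, value REAL)")
conn.executemany(
    "INSERT INTO measurements VALUES (?, ?)",
    [("A", 1.0), ("A", 3.0), ("B", 10.0), ("B", 20.0), ("B", 30.0)],
)

# Import-first approach: transfer every row to the client, then aggregate locally.
rows = conn.execute("SELECT line, value FROM measurements").fetchall()
grouped = {}
for line, value in rows:
    grouped.setdefault(line, []).append(value)
local_means = {line: sum(vals) / len(vals) for line, vals in grouped.items()}

# In-place approach: the database computes the summary, and only one row
# per group travels back to the client.
in_place_means = dict(
    conn.execute("SELECT line, AVG(value) FROM measurements GROUP BY line")
)
conn.close()
```

Both approaches yield the same means, but the second transfers two rows instead of five. With billions of rows, that difference is what makes data larger than local storage tractable.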
Automatic data monitoring/analysis; analytic auto-responding; enterprise data broadcasting
In today’s rapidly changing business world, success depends more than ever on a business’ ability to quickly respond to the changing conditions. The reliance on comprehensive insight into the available data and the ability to quickly respond either directly or by performing appropriate, predefined analyses are no longer a luxury but a real necessity. The proactive data broadcasting and automated analysis functions available in STATISTICA Enterprise are the ideal complement to procedures found in ERP software.
STATISTICA Enterprise features powerful facilities to automatically react to user-defined conditions in data. These custom-defined conditions can be of practically any complexity and they can even represent results of on-line analyses performed by STATISTICA Enterprise in real-time on the incoming data stream or on sampled data.
These flexible facilities are built using StatSoft’s real-time data monitoring technologies, and they can be used in countless business or research applications. Facilities are provided to set up any STATISTICA Enterprise workstation as an automatic data monitor and/or processor that will fetch or sample the appropriate data subsets from the STATISTICA Enterprise or other enterprise data warehouse, perform the predefined analyses, and then respond appropriately. For example, when certain conditions are met (e.g., the price of a particular commodity reaches a certain threshold, certain inventory falls below a particular level, the number of complaints or registered defects compared to the moving average exceeds a preset tolerance level), then STATISTICA Enterprise will automatically execute a predefined action (e.g., send e-mail, send a fax, call a pager, or simply broadcast the relevant information or the result of analyses to selected members of the organization or nodes of the STATISTICA Enterprise installation).
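As a rough illustration of the complaints example, the following Python sketch checks each incoming value against a moving average and triggers a predefined action when a tolerance is exceeded. It is an assumption-laden toy, not the StatSoft monitoring technology: the function `monitor`, its parameters, the tolerance expressed as a multiple of the moving average, and the sample data are all hypothetical.

```python
from collections import deque
from statistics import mean


def monitor(stream, window=5, tolerance=1.5, on_alarm=print):
    """Trigger on_alarm when a value exceeds tolerance x the moving average.

    The moving average covers the `window` values preceding the current one.
    Returns the indices at which an alarm was raised.
    """
    history = deque(maxlen=window)
    alarms = []
    for index, value in enumerate(stream):
        # Only start checking once a full window of history is available.
        if len(history) == window:
            average = mean(history)
            if value > tolerance * average:
                alarms.append(index)
                on_alarm(f"index {index}: value {value} exceeds "
                         f"{tolerance} x moving average {average:.2f}")
        history.append(value)
    return alarms


# Hypothetical daily complaint counts with one obvious spike.
daily_complaints = [10, 12, 11, 9, 10, 11, 25, 12, 10]
alarm_indices = monitor(daily_complaints)
```

Here `on_alarm` defaults to `print`; in a real deployment it would be replaced by whatever predefined action is appropriate, such as sending an e-mail or broadcasting a report.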
Easy-to-use administration tools
Easy-to-use administration tools in STATISTICA Enterprise provide the power to define the specific permissions of users, the queries to external data sources, and the reports to be generated. Flexible tools allow you to customize the view that a user sees, organized by department, report type, etc. STATISTICA Enterprise is also centrally managed so that changes made to the system through the administration tools are immediately available on all workstations. The administration tools in STATISTICA Enterprise are similar to those available in STATISTICA Enterprise/QC.
STATISTICA Enterprise is compatible with Windows XP, Windows Server 2003, Windows Vista, Windows 7, and Windows Server 2008.
This product requires the installation of a database. StatSoft supports the use of ODBC compliant databases such as Access, SQL Server, Oracle, and others.
System Requirements are based on an average size implementation. Server requirements are based on the number of concurrent users simultaneously accessing the system.
Minimum System Requirements
Operating System: Windows XP or above
RAM: 1 GB
Processor Speed: 1 GHz
Recommended System Requirements
Operating System: Windows Server 2003 or later
RAM: 2 GB
Processor Speed: 2.0 GHz, 64-bit, dual core
- For the 32-bit version of STATISTICA, a 64-bit processor and operating system are recommended because 64-bit operating systems offer better memory management.
Butler Analytics Highlights STATISTICA’s Strengths
Martin Butler of Butler Analytics, a London-based business intelligence consultancy, has earned a reputation as a well-informed, vendor-neutral resource for his clients. Butler regularly speaks and writes publicly about industry trends and analytics solutions, so he recently made time to learn about STATISTICA. His review makes note of STATISTICA’s integration, graphical interface, and flexibility.
“One of the most powerful aspects of the product set is the level of integration, with seamless connections between disparate modes,” observes Butler. “Statistics, machine learning, data mining and text mining are all at the disposal of the user without having to migrate from one environment to another.”
He also appreciates STATISTICA’s updated GUI, “a graphical interface where workflows can be constructed to process data…and used by anyone who has permissions.”
After listing a small sample of our software’s data mining and text mining functionality, Butler notes that larger organizations can scale up with the STATISTICA Enterprise™ platform, which “provides an enterprise working environment for business users as well as analysts.” Butler notes this broad enterprise appeal is achieved through a myriad of business tools (e.g., “analysis, reports and dashboards they can use, and various forms of monitoring and alerts,” he says) as well as flexibility of model deployment, which he confirms “includes PMML, C, C++, C#, Java, SAS, SQL stored procedures, and Teradata.”
See Butler’s full write-up here.
The STATISTICA Data Warehouse system is a complete, powerful, scalable, and customizable intelligent data warehouse solution, which optionally offers the most complete analytic functionality available on the market, fully integrated into the system.
- Features and Benefits
- Architecture and Connectivity
- Advanced Security and Authentication
- Document Control
- Advanced Analytics
- Programmability and Customizability
STATISTICA Data Warehouse consists of a suite of powerful, flexible component applications that include the following:
- STATISTICA Data Warehouse Server Database
- STATISTICA Data Warehouse Query
- STATISTICA Data Warehouse Analyzer
- STATISTICA Data Warehouse Reporter
- STATISTICA Data Warehouse Document Repository
- STATISTICA Data Warehouse Scheduler
- STATISTICA Data Warehouse Real Time Monitor and Reporter
If you are new to data warehousing, StatSoft consultants will guide you step by step through the entire process of designing the optimal data warehouse architecture, from a comprehensive review of your information storage and extraction/analysis needs, to the final training of your employees and support of your daily operations.
With STATISTICA Enterprise Server portal tools, you can post up-to-date reports, charts, and tables on the Internet automatically, virtually in real-time, and without knowledge of HTML or Java programming languages.
The product is available in two versions:
- Knowledge Portal – is a powerful, Web-based, knowledge-sharing tool that allows your colleagues, employees, and/or customers (with appropriate permissions) to log in and quickly and efficiently get access to the information they need, by reviewing predefined reports.
- Interactive Knowledge Portal – offers to the portal visitors all the functionality of the Knowledge Portal and additional options. These options include allowing the user to define and request new reports, run queries and custom analyses, drill down and up, slice/dice data, and gain insight from all resources that are made available to them by the portal designers or administrators.
In addition, the STATISTICA Enterprise Server Knowledge Portal can be integrated with the optional STATISTICA Document Management System.
STATISTICA Sequence, Association and Link Analysis (SAL) is designed to address the needs of clients in industries such as healthcare, retailing, banking, and insurance. It can be used for model building and deployment. SAL is an implementation of several state-of-the-art techniques specifically designed for extracting rules from datasets (databases) that can be generally characterized as “market-baskets.”
The market-basket problem assumes that there are a large number of products that can be purchased by the customer, either in a single transaction, or over time in a sequence of transactions. Such products can be goods displayed in a supermarket, spanning a wide range of items from groceries to electrical appliances, or they can be insurance packages which customers might be willing to purchase, etc. Customers fill their basket with only a fraction of what is on display or on offer.
Association rules can be extracted from a database of transactions to determine which products are frequently purchased together. For example, one might find that purchases of flashlights typically coincide with purchases of batteries in the same basket. Likewise, when transactions are time-stamped, analysts can track purchases over time.
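The flashlight and battery example can be made concrete with the two standard measures of an association rule, support and confidence. The Python sketch below is illustrative only and says nothing about how SAL computes its rules; the function `rule_metrics` and the baskets are hypothetical.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Support and confidence of the rule antecedent -> consequent.

    support    = share of all baskets containing both items
    confidence = share of baskets with the antecedent that also
                 contain the consequent
    """
    total = len(transactions)
    if total == 0:
        return 0.0, 0.0
    with_antecedent = sum(1 for basket in transactions if antecedent in basket)
    with_both = sum(
        1 for basket in transactions
        if antecedent in basket and consequent in basket
    )
    support = with_both / total
    confidence = with_both / with_antecedent if with_antecedent else 0.0
    return support, confidence


# Hypothetical market baskets, one set of items per transaction.
baskets = [
    {"flashlight", "batteries"},
    {"flashlight", "batteries", "tape"},
    {"batteries", "bread"},
    {"flashlight", "bread"},
    {"milk", "bread"},
]
support, confidence = rule_metrics(baskets, "flashlight", "batteries")
```

In this toy data, two of the five baskets contain both items (support 0.4), and two of the three baskets with a flashlight also contain batteries (confidence about 0.67).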
Sequence analysis is concerned with a subsequent purchase of a product or products given a previous buy. For instance, buying an extended warranty is more likely to follow (in that specific sequential order) the purchase of a TV or other electric appliances. Sequence rules, however, are not always that obvious, and sequence analysis helps you to extract such rules no matter how hidden they may be in your market-basket data. There is a wide range of applications for sequence analysis in many areas of industry and science, including customer shopping patterns, phone call patterns, the fluctuation of the stock market, DNA sequences, and web-log streams.
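A sequence rule differs from an association rule in that order matters. The Python sketch below estimates how often one purchase is later followed by another, using time-stamped purchase histories. It is a simplified illustration under stated assumptions, not the SAL algorithm: the function `sequence_confidence`, the integer timestamps, and the customer data are hypothetical.

```python
def sequence_confidence(histories, first, then):
    """Share of customers who bought `first` and later bought `then`.

    histories -- dict mapping a customer id to a list of
                 (timestamp, item) purchases
    """
    bought_first = 0
    followed = 0
    for purchases in histories.values():
        ordered = sorted(purchases)  # order by timestamp
        first_times = [time for time, item in ordered if item == first]
        if not first_times:
            continue
        bought_first += 1
        earliest = first_times[0]
        # Count the customer only if `then` was bought strictly afterwards.
        if any(item == then and time > earliest for time, item in ordered):
            followed += 1
    return followed / bought_first if bought_first else 0.0


# Hypothetical purchase histories with integer timestamps.
histories = {
    "c1": [(1, "TV"), (3, "extended warranty")],
    "c2": [(2, "extended warranty"), (5, "TV")],
    "c3": [(1, "TV"), (2, "cables")],
    "c4": [(4, "radio")],
}
tv_then_warranty = sequence_confidence(histories, "TV", "extended warranty")
warranty_then_tv = sequence_confidence(histories, "extended warranty", "TV")
```

The two directions give different values on the same data (one in three TV buyers later bought a warranty, while one in two warranty buyers later bought a TV), which is exactly the information an unordered association rule would lose.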
Once extracted, rules about associations or the sequences of items as they occur in a transaction database can be extremely useful for numerous applications. Obviously, in retailing or marketing, knowledge of purchase “patterns” can help with the direct marketing of special offers to the “right” or “ready” customers (i.e., those that, according to the rules, are most likely to purchase some specific items given their observed past consumption patterns).
However, transaction databases occur in many areas of business, such as banking, as well as general customer “intelligence.” In fact, the term “link analysis” is often used when these techniques — for extracting sequential or non-sequential association rules — are applied to organize complex “evidence.”
It is easy to see how the “transactions” or “market-basket” metaphor can be applied to situations where individuals engage in certain actions, open accounts, contact other specific individuals, and so on. Applying the technologies described here to such databases may quickly extract patterns and associations between individuals and actions, and hence, reveal the patterns and structure in datasets.