Author Archives: statsoftsa

Completing the value chain: data, insight, action


Thomas Hill, Ph.D. Dell Contributor at Tech Page One

Thomas Hill is Executive Director for Analytics at Dell’s Information Management Group

The value of effective predictive/prescriptive analytics is easily explained: The best and largest storage capabilities, fastest data access and ETL functionality, and most robust hardware infrastructure will not guarantee success in a highly competitive marketplace. If, however, one can predict what will happen next – how consumer sentiments will shift, which large insurance claim provides opportunities for subrogation, or how specific changes in the manufacturing process will drastically reduce warranty claims in the field – critical actions can be taken that yield competitive advantages, and those advantages could pay back the entire investment required to achieve the insights within weeks or even days.

I sometimes like to point out that I have predicted every stock market crash in the past 30 years – after they happened. Obviously, reporting on what happened to gain insight is interesting and perhaps useful, but the value of predicting outcomes and “pre-acting” rather than reacting to those outcomes can be priceless.

I cannot think of a single successful business that is not continuously working to complete the value chain from the collection of data to predictive modeling, and automating mission critical decisions through effective prescriptive decisioning systems, i.e., some (semi-) automated system by which the best pre-actions to anticipated events and outcomes become part of the routine day-to-day operations and SOPs.

There are near infinite numbers of specific examples. I have had the privilege of collaborating with some brilliant visionaries and practitioners on several books around predictive modeling, the analysis of unstructured data, and (in a forthcoming book) on the application of these technologies to optimize healthcare in various ways. These books describe the near-infinite universe of use cases and examples to illustrate what successful businesses and government agencies are doing today.

When good projects go bad

So what are the real challenges to successfully adopting predictive and prescriptive analytics? The biggest challenge in any such project, if these technologies are to be incorporated into mission-critical processes, is to successfully complete every single step of the value chain: data collection, data storage, data preparation, predictive modeling, validated analytic reporting, and the decisioning support and prescriptive tools that realize value.

There are nearly infinite ways in which well-intended and sometimes well-planned projects can go off the rails. But in our experience, it almost always comes down to the difficulty of connecting to the right data at the right time, of delivering the right results to the right stakeholder within the actionable time interval where the right decision can make a difference, or of incorporating the predictions and prescriptions into an effective automated process that implements the right decisions.

Sometimes, it is an overworked IT department dealing with outdated and inadequate hardware and storage technologies, trying to manage the “prevention of IT” given these other challenges. Sometimes there are challenges integrating diverse data sources that span structured data in relational databases on premise, information that needs to be accessed in the cloud or from internet-based services, with unstructured textual information stored in distributed file systems.

For example, many manufacturing customers of StatSoft need to integrate manufacturing data upstream with final product testing data, and then link it to unstructured warranty claim narratives that capture failures in the field stored in diverse systems. In the financial services industry, in particular the established “brick-and-mortar” players are challenged to build the right systems to capture all customer touch points and connect them with the right prediction/prescription models, to deliver superior services when they are most needed.

So in short, the data may be there, the technologies to do useful things with those data exist (and are comprehensively available in StatSoft’s products), but the two cannot readily be connected. It is generally acknowledged that data preparation consumes about 90% or more of the effort in analytic projects.

Completing the value chain

That is why we are excited at StatSoft to be part of Dell, and why our customers almost immediately “get it”: Dell hardware, combined with the cutting edge tools and technologies in Dell’s software stack, combined with Dell’s thought leaders and effective services across different domains, and now combined with StatSoft’s tools and solutions for predictive and prescriptive analytics deliver the only ecosystem of its kind that can integrate very heterogeneous data sources, and connect them to effective predictive and prescriptive analytics. It does not matter if, as is the case in the real world, these data sources are structured or unstructured, involve multiple data storage technologies and vendors, are implemented on-prem or cloud based. We can deliver solutions based on robust hardware with cutting-edge software and effective and efficient services, combined with the right analytics capabilities to drive effective action.

So pausing for a moment to reflect on this, I cannot really think of any other provider of these capabilities that can complete the data-to-insight-and-action value chain for driving competitive advantages to all businesses small or large. StatSoft’s motto was “Making the World More Productive,” which naturally goes with Dell and the Power to do more.

This will be an exciting time going forward for StatSoft and Dell, and our customers.

Credit Scoring at Novum Bank: Data Mining defines success in high risk lending


Original article by Marcel Wiedenbrugge

Imagine you provide (micro) credit and a customer wants to temporarily borrow a few hundred euros from you. How do you determine whether it makes financial sense to do business with this customer? For Joop Bruinzeel, Chief Credit Risk Officer (CCRO) at Novum Bank, this is an everyday question. As a provider of micro-credit, Novum Bank provides relatively small amounts (from €100 to €600) every day to customers in whom traditional banks have no interest because of their high risk profile. Properly set up and tuned credit risk management is essential.
For assessing credit applications, Novum Bank recently started using STATISTICA, the analytical software solution from StatSoft (now a part of Dell). In this interview I speak with Joop Bruinzeel about micro-credit, the importance of credit scoring and the use of analytical software.
What are the activities of Novum Bank and what is your role?
Joop: “Novum Bank is a Malta-based bank with a full European banking license. We specialize in payments, in particular consumer credit and prepaid debit cards. Our cards division offers white-label programs in all European countries. Currently, we are strongly represented in Germany, where you can find our products at almost all petrol stations.
“Regarding credit, we focus primarily on high risk, short-term loans. At the moment we are active in The Netherlands, Poland and Spain. For the past 1.5 years I have been a member of the Executive Board in the position of CCRO. That means that I am responsible for credit risk, both in the area of consumer credit as well as in the field of risk analysis regarding the daily use of prepaid debit cards.”
Can you update us on the high-risk, short-term loan market? What is your differentiator?
Joop: “We are in a special position because no other banks are providing this type of short-term loan. Traditional banks often consider the risk to be high and the reward (risk/margin/revenue ratio) too low. Additionally, traditional banks are dealing with a different cost structure. We have chosen to operate lean and mean and invest in technology. Our partnership with StatSoft is a logical continuation of our strategy. Currently, we are much better able to provide risk assessments, especially with regard to managing portfolios with a more complex risk profile. The combination of lean and mean and state-of-the-art technology ensures that we are able to achieve a significant competitive advantage, even despite challenging market conditions.”
What developments has credit risk management gone through at Novum Bank?
Joop: “Just eighteen months ago we were still dependent on human interpretation and gut feeling. The steep growth that we have achieved and the expansion into several countries made a more professional approach mandatory. New markets and new ways of lending require a high technology approach. Credit scoring plays a crucial role in that approach, which is where StatSoft came into the picture.”
Can you explain the importance of credit scoring?
Joop: “Credit scoring is essential to us. The dividing line between customers who will pay us back and customers who will not is very thin. Expensive errors are easily made.
“Statistically, we are in this market due to the law of large numbers. Modeling and testing of scoring models usually takes several months to complete before results appear. If the modeling process is not in control or not understood, you can find out—potentially six months later—that wrong choices that were made can be very costly. Credit scoring is, per default, important to us because we accept customers with complex risk profiles. Fluctuations, peaks and valleys, may increase sharply. Besides this, the scale of operation in several countries with different characteristics makes it necessary to manage credit scoring methods and techniques. The third point that I want to mention is that (near real-time) credit scoring allows (near real-time) customer feedback, whether a credit application is accepted or not.”
Why did you choose STATISTICA?
Joop: “Before I started working for Novum Bank, I immersed myself in modelling. I got back in touch with another company that was specializing in short-term loans and had (successfully) made use of STATISTICA. As the shareholders of Novum Bank required mature risk management, STATISTICA was perceived as a logical choice to go with.”
What functionality are you using and how does it work in practice?
Joop: “We use STATISTICA primarily to build a profit scoring model. This allows us to assess whether we are dealing with a good or a bad customer. We have cleaned and complemented the historical data of the past few years, after which we continue editing the data in STATISTICA for further processing and analysis.
“Four models are being used to determine which model works best. Although we use statistical logistic regression techniques, our main focus is the usage of data mining algorithms, such as Boosted Trees and Random Forest. During data preparation the software clearly identifies the key parameters that affect the profit score model, which I think is a strong feature of STATISTICA. Once the best model is found, we apply the model to the historical data. The software clearly shows how much return we could have made on the portfolio if we had used the scoring model of STATISTICA.”
What do you like about STATISTICA?
Joop: “The beauty of STATISTICA is that you can build decisioning models which you can test on older portfolios (also called backlog or backtesting). The workbench also offers the possibility to add your own insights to the models. That allows us to refine the models, so that we can achieve better results.”
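As a rough illustration of the backtesting idea described above, the sketch below replays a scoring cutoff against a historical portfolio and reports what that portfolio would have earned. It is written in Python rather than STATISTICA, and everything in it is an assumption made for illustration: the HISTORY records, the score scale, the profit figures, and the helper names backtest and best_cutoff are hypothetical and are not Novum Bank data or STATISTICA functionality.

```python
# Hypothetical historical portfolio: (model_score, realized_profit_eur) per loan.
# Scores and profits are invented for illustration; they are not real lending data.
HISTORY = [
    (0.92, 35.0),
    (0.85, 30.0),
    (0.78, -250.0),
    (0.74, 28.0),
    (0.61, -300.0),
    (0.55, 25.0),
    (0.40, -280.0),
    (0.33, -310.0),
]


def backtest(history, cutoff):
    """Return (accepted_count, total_profit) had only loans scoring >= cutoff been granted."""
    accepted = [profit for score, profit in history if score >= cutoff]
    return len(accepted), sum(accepted)


def best_cutoff(history, candidates):
    """Pick the candidate cutoff with the highest backtested profit."""
    return max(candidates, key=lambda c: backtest(history, c)[1])


if __name__ == "__main__":
    for cutoff in (0.0, 0.5, 0.7, 0.8):
        count, profit = backtest(HISTORY, cutoff)
        print(f"cutoff={cutoff:.1f} accepted={count} profit={profit:.2f}")
```

With this toy data, 0.8 is the only one of the four candidate cutoffs that leaves the replayed portfolio profitable. A real backtest would also hold out data that the model was not trained on.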
Can you share with us your client acceptance process? 
Joop: “We are working with strict customer acceptance requirements, which are fully compliant with the rules set by the Dutch regulator AFM (Autoriteit Financiële Markten). The standard requirements in order to qualify for a loan are: At least 21 years of age, sharing a (recent) payslip, proving a steady income with sufficient funds, plus identification. In addition, there are twenty other parameters that we use to dynamically determine whether or not we accept an applicant. By “dynamic,” I mean that, for example, we may or may not look at the age of the applicant, or whether someone is married or not. Based on historical data analysis, we know that these things can affect the likelihood that someone will pay back a loan (on time). We also look at less obvious things such as the point in time at which a loan is requested. STATISTICA can handle all of these parameters in the profit scoring model. This results in a final score, which serves as a basis to make a financially responsible decision whether we will grant a loan application.”
How are you dealing with fraud?
Joop: “So far we are doing this largely case by case. We check at least the standard documents for completeness, authenticity, and accuracy. We verify the payslips and check the age of the applicant against the identification number. In the future we will automate these kinds of checks, as the verification costs are relatively high.
“When you consider that we reject most of every hundred loan applications, then it will be clear that efficiency improvements and cost reductions have our ongoing attention.”
What was the implementation time of STATISTICA?
Joop: “A number of issues have played a role: First, I knew in advance very well what I wanted. Additionally the two-day training course was well conducted and focused purely on the functionality that we needed to calculate profit scores. We are experiencing a pleasant cooperation with StatSoft. All together we had a good working model in 2.5 months’ time, resulting in a significant improvement of the results in the test market.”
And the return on investment?
Joop: “So far we have applied the scoring model only to a part of the Spanish portfolio. We have invested in software, education, and time. We recovered our investment in only four months. Looking at it from another perspective, we are just at the beginning, as there are plenty of opportunities to refine the models.”
What are your future plans?
Joop: “I want to use STATISTICA not only for scoring, but also for its statistical functionality. There are plenty of opportunities to further improve and refine the models. We are thinking about further modeling the portfolio, especially for marketing purposes. I expect that in the future even more data will be linked. For data analysis, STATISTICA will play a central role.”
[This interview originally appeared in Dutch in “Credit Expo,” April 28, 2014.]

A New Gold Rush Is On. Who Will Strike It Rich?


Original article by Michael Dell of Dell 

Data is arguably the most important natural resource of this century. Top thinkers are no longer the people who can tell you what happened in the past, but those who can predict the future. Welcome to the Data Economy, which holds the promise for significantly advancing society and economic growth on a global scale.

Big data is big news just about everywhere you go these days. Here in Texas, everything’s big, so we just call it data. But we’re all talking about the same thing—the universe of structured data, like transactional information in databases, and also the unstructured data, like social media, that exists in its natural form in the real world.

Organizations of all sizes are trying to figure out how to use all of this data to deliver a better customer experience and build new business models. Consumers are struggling to balance a desire for automated, personalized services with the need for safety. Governments are pressured to use all available data in support of national security, but not at the expense of citizens’ right to privacy. And underlying it all is the realization that data, if managed, secured and leveraged properly, is the pathway to progress and economic success.

So who will strike it rich in this new, data-driven gold rush? It will invariably be those who are willing to accept the new realities of the Data Economy. Business instincts and intuition are being augmented and increasingly replaced by data analysis as the drivers of success. We’ve seen it at Dell. Our marketing team uncovered more than $310 million in additional revenue last year through the use of advanced analytics. This year, we expect that number to exceed half-a-billion.

We believe that’s just the tip of the iceberg, and we’re accelerating our strategy. Recently we announced the acquisition of StatSoft, a leading provider of data mining, predictive analytics and data visualization solutions. It is yet another investment in our enterprise solutions, software and services portfolio specifically designed to help our customers turn data into action.

But contrary to popular opinion, the data economy isn’t just for global enterprises like Dell. A Dell-commissioned study that we will announce later this month found that mid-market companies are increasingly investing in data projects to drive better decision making and better business results. We have also found that startups that use technology more effectively create twice as many jobs on average and are more productive and profitable than companies that don’t.

At their core, entrepreneurs are all about solving problems, and nothing provides a better window into problems than data. Consider the popularity of Global Positioning System (GPS) technologies. The simple act of connecting to and delivering data paved the way for many successful businesses that in turn created an entirely new segment of the economy.

The day is near when the use of data analytics will simply become the price of remaining viable and competitive in the global marketplace. There is still a lot of uncertainty about the Data Economy, but this much is clear: the opportunity for data-driven organizations is golden.

STATISTICA | Video Tutorials

No need to feel lost getting started with STATISTICA! We’ve got you covered with our popular videos on text mining, data mining, and all things analytic.

How to Show Grouping in Scatterplots

A scatterplot shows the relationship between continuous variables. Applying a grouping factor adds yet another dimension that can greatly enhance a plot’s usefulness.
This article explores two ways of showing a grouping variable in a scatterplot. The difference between the two methods is the fit line. One method uses one fit for all levels of a grouping factor, but shows the levels with point marker colors and patterns. The other method fits separate lines for each group.
The data set used in this example, Irisdat.sta, contains measurements for various parts of the flower for three different varieties of iris. To open the data set, select the Home tab and in the File group, click the Open arrow. From the menu, select Open Examples to display the Open a STATISTICA Data File dialog box. Double-click the Datasets folder, and then open Irisdat.sta.
One Fit Line for All Groups
Select the Graphs tab. In the Common group, click Scatterplot to display the 2D Scatterplots Startup Panel. Click the Variables button to display the Select Variables for Scatterplots dialog box, and select SEPALLEN as X and SEPALWID as Y.
STATISTICA - Select variables for Scatterplot
Click the OK button.
On the Advanced tab of the 2D Scatterplots Startup Panel, click the Mark Selected Subsets button.
STATISTICA - 2D Scatterplot startup panel
The Specify Multiple Subsets dialog box will be displayed. Create the three subsets for the grouping factor, IRISTYPE, as shown in the next image.
STATISTICA - Specify multiple subsets dialog
Click OK in the Specify Multiple Subsets dialog box, and click OK in the 2D Scatterplots Startup Panel.
The resulting graph is a scatterplot that contains one fit line for all points, but distinguishes points by the grouping variable IRISTYPE with colors and point markers.
STATISTICA - Scatterplot with one fit line for all points
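For readers who want to see the arithmetic behind this kind of plot outside STATISTICA, here is a minimal Python sketch of the same idea: a single least-squares line is fitted to all points, and the grouping variable only decides which marker each point gets. The ROWS values, the group labels "A" and "B", the marker codes, and the helper name fit_line are all hypothetical; the data are toy numbers, not the Irisdat.sta measurements.

```python
# Toy (x, y, group) rows; invented for illustration, NOT the Irisdat.sta measurements.
ROWS = [
    (1.0, 2.0, "A"), (2.0, 3.0, "A"), (3.0, 4.0, "A"),
    (4.0, 1.0, "B"), (5.0, 2.0, "B"), (6.0, 3.0, "B"),
]


def fit_line(points):
    """Ordinary least-squares fit; returns (intercept, slope) for (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# One fit over every point; the group label plays no part in the fit itself.
intercept, slope = fit_line([(x, y) for x, y, _ in ROWS])

# The group label is used only to choose how each point would be drawn.
MARKERS = {"A": "o", "B": "s"}
styled_points = [(x, y, MARKERS[group]) for x, y, group in ROWS]

if __name__ == "__main__":
    print(f"pooled fit: y = {intercept:.3f} + {slope:.3f} * x")
```

In this toy data each group rises with x, yet the pooled line is almost flat (slope of about -0.03), which is one reason marking the groups on a single-fit plot can be informative.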
Separate Fit Lines for Groups
Alternatively, it may be appropriate to use separate fit lines for the three groups. To do this, create a categorized graph.
Start a new 2D Scatterplots analysis, and select variables as before.
Now, in the 2D Scatterplots Startup Panel, select the Categorized tab. In the X-Categories group box, select the On check box. The options will become active. Click the Change Variable button to display the Select Categorization Variable dialog box. Select IRISTYPE.
STATISTICA - Select Categorization Variable dialog
Click OK to close this dialog box and return to the 2D Scatterplots Startup Panel.
The options Integer mode, Unique values, Categories, etc., give you flexibility with the grouping variable. A categorization variable does not have to be categorical in nature.
In the Layout group box, select the Overlaid option button.
STATISTICA - 2D Scatterplot startup panel again
Click OK to create the graph. Three separate fit lines are shown for the three categories in addition to the groups being designated by colors and point markers.
STATISTICA - Scatterplot with three separate fit lines - categorized
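The categorized, overlaid plot amounts to fitting one line per level of the grouping variable. The following Python sketch shows that calculation under stated assumptions: ROWS is hypothetical toy data (not the Irisdat.sta measurements), and fit_line and fit_by_group are illustrative helper names, not STATISTICA functions.

```python
from collections import defaultdict

# Toy (x, y, group) rows; invented for illustration, NOT the Irisdat.sta measurements.
ROWS = [
    (1.0, 2.0, "A"), (2.0, 3.0, "A"), (3.0, 4.0, "A"),
    (4.0, 1.0, "B"), (5.0, 2.0, "B"), (6.0, 3.0, "B"),
]


def fit_line(points):
    """Ordinary least-squares fit; returns (intercept, slope) for (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def fit_by_group(rows):
    """Split rows by their group label and fit a separate line to each group."""
    groups = defaultdict(list)
    for x, y, group in rows:
        groups[group].append((x, y))
    return {group: fit_line(points) for group, points in groups.items()}


fits = fit_by_group(ROWS)

if __name__ == "__main__":
    for group, (intercept, slope) in sorted(fits.items()):
        print(f"group {group}: y = {intercept:.3f} + {slope:.3f} * x")
```

Here both toy groups have a slope of 1.0 but different intercepts (1.0 and -3.0), a pattern that separate fit lines make visible.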
STATISTICA graphs offer extensive flexibility, which enables you to create the representation of the data that you need.

Upcoming STATISTICA Training

Dear STATISTICA user/analyst/researcher,


Introduction to STATISTICA Training Workshop

Venue: Unit 22 Petervale Centre, Cor Cambridge & Frans Hals Rds, Petervale, Sandton 2191


22 & 23 APRIL 2014

Day 1: 8.30 for 9 a.m. to 4 p.m.

General Conventions:

  • User-interface
  • Customisation options
  • Creating reports, docs
  • Exercises & FAQ’s

Data Management:

  • Creating, modifying & saving data
  • Importing data
  • File structure manipulation
  • Exercises & FAQ’s

Statistics:

  • Overview of elementary concepts
  • Descriptive Statistics
  • T-Tests
  • Correlations
  • Frequency Tables
  • Cross Tabulations

Day 2: 8.30 for 9 a.m. to 4 p.m.

Statistics Continued:

  • Exercises & FAQ’s

Graphs:

  • Overview of Graph types
  • Creating Graphs
  • Customising Graphs
  • Brushing Techniques
  • Curve-fitting
  • Exercises & FAQ’s

Automating & Customising:

  • Introduction to automating & customising
  • Exercises

Overview of Additional Add-on Modules & their Applications

Industry-specific Example Applications

Questions & Answer Session


PRICING:

  • Academic Pricing: R 4,000.00 VAT Excl per delegate
  • Commercial Pricing: R 6,000.00 VAT Excl per delegate


  • Free 30-Day Installation of STATISTICA on your laptops
  • Customised workshops using your own data (optional)
  • Training Material
  • Refreshments & lunches
  • Certificates


For more information or to register for this training please phone Lorraine Edel on 011 234 6148 or 082 5678 330 or mail


Unit 2 Petervale Centre, cor Cambridge & Frans Hals Rds, Bryanston, Johannesburg, South Africa
Phone: +27-11-656-0395; Fax: +27-11-656-0396



The Dell Acquisition: Hot Topic on the Web!

If the sheer volume of this week’s Twitter buzz is any indication, it is clear that Dell Software’s acquisition of Tulsa-based StatSoft (announced three days ago) has surprised, impressed, and befuddled many an observer of the advanced analytics space.

Within hours of Dell’s press release this past Monday morning, plenty of forward-thinking statements and opinions were already being expressed as bloggers and journalists trumpeted the information across social media channels.

For your reading pleasure, here is a short list of just some of the feedback I’ve been able to keep up with. To help you decide what to read, I have taken the liberty of noting what I found to be quick takeaways.

No doubt, we will see more opinions and thoughts published in the coming weeks and months. Naturally, we are pretty excited about the possibilities here at the (former) StatSoft HQ. What are your thoughts on all this?

StatSoft is now part of Dell

StatSoft is proud to announce today that we have joined forces with Dell and Dell’s Information Management Group, one of the largest providers of end-to-end BI and analytic solutions in the market. As of today, StatSoft is part of the Dell organization.

End-to-end advanced analytics solutions.  For StatSoft and Dell customers, this means new opportunities and capabilities to enable leading edge analytics technologies to leverage the accelerating growth of data occurring in every industry, to achieve and retain industry leadership. Turning the torrents of data into actionable information is the fundamental mission of StatSoft as well as Dell’s Information Management Group. StatSoft’s big data predictive modeling and data mining solutions for various industries, combined with Dell’s wide range of data management and software capabilities and affordable, leading-edge, and comprehensively supported x86 server platforms can deliver big data analytics at a Dell price-point for unbeatable ROI.

Dell Software already offers a host of tools to manage data and databases across structured and unstructured data sources, including products such as Toad for Oracle, Toad for SQL Server, and Spotlight on SQL Server Enterprise, as well as tools to integrate data and applications distributed across the organizations, including products such as SharePlex and Dell Boomi, the latter of which was recently positioned by Gartner, Inc. in the “Leaders” Quadrant of the Magic Quadrant for Enterprise Integration Platform as a Service.

Making the World More Productive

We are excited to combine with Dell’s shared resources, which provide myriad opportunities to leverage StatSoft’s analytic solutions in concert with Dell’s hardware solutions and its numerous industry relationships, including those with SAP Hana, Oracle, Microsoft SQL and PDW, and Cloudera. We look forward to continued growth together with our distinguished list of successful customers in practically every industry, and we thank you for your support.

StatSoft Recognized in Magic Quadrant™ Announcement at BI Summit

Information technology research and advisory firm Gartner has unveiled its Magic Quadrant for Advanced Analytics Platforms, publicly highlighting StatSoft’s position with top-tier “ability to execute” advanced solutions.

This particular quadrant report, new to Gartner’s offerings, was released February 24 in a brief session at the Gartner Business Intelligence and Information Management Summit in Sydney, Australia.

Darrel Amarasekera, Managing Director of StatSoft Pacific, was among the audience of vendors and executives, whom he described as “enthusiastic [and] very, very attentive” while Gartner Research Director Lisa Kart skimmed through the report’s contents.

Kart specifically drew the audience’s attention to StatSoft’s status as a new entrant among the top three vendors capable of executing advanced solutions. She shared with the audience some of the strengths of the STATISTICA platform. In the downloadable report (sign-in required), these strengths address STATISTICA’s wide range of functionality with a broad variety of data types; high customer satisfaction with advanced descriptive and predictive analytics; and scalability. In addition, StatSoft was reported with some of the highest evaluations for product reliability and upgrade experience, and STATISTICA was most frequently selected for license cost and speed of model deployment.

Previously, Gartner analysts had combined business intelligence with analytics in their annual Magic Quadrant research reports. However, recent industry changes with big data and predictive analytics have prompted them to develop a standalone “Advanced Analytics” category this year.

Gartner clients can access the complete Advanced Analytics Magic Quadrant report online.

STATISTICA reduces emissions spikes & associated costs with #PredictiveAnalytics at coal coking plant, pays for itself in 6 months

New Success Story: STATISTICA reduces emissions spikes & associated costs with #PredictiveAnalytics at coal coking plant, pays for itself in 6 months. What about your industry?