Monthly Archives: July 2013

Flexible Output Options

When you perform an analysis, STATISTICA generates output in the form of multimedia tables (spreadsheets) and graphs. The Output Manager is used to direct all output to workbooks, reports, stand-alone windows, and/or Microsoft Word documents. STATISTICA Enterprise Server users can output results to the web using STATISTICA Enterprise Server Knowledge Portal.

Each of the STATISTICA output channels has its unique advantages. They can be used in many combinations (e.g., a workbook and report simultaneously) and can be customized in a variety of ways. Also, all output objects (spreadsheets and graphs) placed in each of the output channels can contain other embedded and linked objects and documents, so STATISTICA output can be hierarchically organized in a variety of ways.

The unique advantages of each of the STATISTICA output channels are described in the sections that follow. More comprehensive overviews of each of the document types associated with the respective channels of output can be found in STATISTICA Documents.

Workbooks

Workbooks are the default way of managing output (for more information, see the workbooks section of STATISTICA Documents). Each output document (e.g., a STATISTICA Spreadsheet or Graph, as well as a Microsoft Word or Excel document) is stored as a tab in the workbook.

Documents can be organized into hierarchies of folders or document nodes (by default, one is created for each new analysis) using a tree view, in which individual documents, folders, or entire branches of the tree can be flexibly managed.

For example, selections of documents can be extracted (e.g., drag-copied or drag-moved) to the report window or to the application workspace (i.e., the STATISTICA application “background,” where they are displayed in stand-alone windows). Entire branches can be placed into other workbooks in a variety of ways to build a specific folder organization.

Technically speaking, workbooks are ActiveX document containers (see ActiveX technology). Workbooks are compatible with a variety of foreign file formats (e.g., Microsoft Office documents), which can be easily inserted into workbooks and in-place edited.

User notes and comments in workbooks. Workbooks offer powerful options for efficiently managing even extremely large amounts of output, and they may be the best output-handling solution for novices and advanced users alike. One apparent drawback is that user comments (e.g., notes) and supplementary information cannot be inserted into the “stream” of workbook output as transparently as in traditional, word processor style reports such as STATISTICA Reports (see the next section). However, note that:

  • All STATISTICA documents can easily be annotated, both (a) directly, by typing text into graphs, tables, and reports, and (b) indirectly, by entering notes into the Comments box of the Document Properties dialog (accessed by selecting Properties from the File menu).
  • Formatted documents with notes and comments (text files, STATISTICA Report documents, WordPad or word processor documents, etc.) can easily be inserted anywhere in the hierarchical organization of output in workbooks. Moreover, such a summary note or comment document can be made the node under which its related objects are grouped, further enhancing their organization.

Reports

Reports in STATISTICA (more information about reports is available in STATISTICA Documents) offer a more traditional way of handling output where each object (e.g., a STATISTICA Spreadsheet or Graph, or a Microsoft Excel spreadsheet) is displayed sequentially in a word processor style document.

Although the report looks like a simple word processor document, the technology behind this editor offers very rich functionality. For example, like the workbook, the STATISTICA Report is an ActiveX container in which each object (not only STATISTICA Spreadsheets and Graphs, but also any other ActiveX-compatible document, e.g., a Microsoft Excel spreadsheet) remains active, customizable, and in-place editable. The obvious advantages of this more traditional way of handling output are the ability to insert notes and comments “in between” the objects and support for quickly scrolling through and reviewing the output, as some users may be accustomed to doing.

The obvious drawback, however, of these traditional reports is the inherent flat structure imposed by their word processor style format, although that is what some users or certain applications may favor.

Stand-Alone Windows

STATISTICA output documents can also be directed to a queue of stand-alone windows; the Queue Length can be controlled in the Output Manager.

The clear disadvantage of this output mode is its total lack of organization and its natural tendency to clutter the application workspace (some procedures can generate hundreds of tables or graphs with a single click).

One advantage of this way of handling output is that you can easily arrange these objects within the STATISTICA application workspace as you like (e.g., to create multiple, easy-to-identify “reference documents” to compare with new output). However, to achieve that effect you do not need to configure the output ahead of time and generate a large number of (mostly unwanted) separate windows that clutter the workspace. Instead, specific output objects stored in workbooks and/or reports can easily be dragged out of their respective tree views onto the application workspace as needed.

Microsoft Word

STATISTICA can also route output directly to Microsoft Word via its Office Integration features. When Word is open within STATISTICA, Word’s toolbars and menus are also available through standard ActiveX Document interface technology, so within STATISTICA you can perform any formatting and editing that Word itself supports.

When sending spreadsheet results to Word, STATISTICA takes advantage of Word’s table editing facility and converts the spreadsheet to a Word table. For multi-page spreadsheets, you can control where the rows and columns break. The spreadsheet is split into groups of as many columns as fit within the page width, and all rows for one group of columns are rendered before the next group of columns begins. This approach presents spreadsheets in Word that are natively editable in Word, show the entire contents of the spreadsheet, and print and paginate correctly.

As with standard STATISTICA Reports, Word documents can store and preserve the record of supplementary information (e.g., selected variables, long names, etc.).

Although Word documents do not provide the navigation tree of a STATISTICA Workbook or Report, sending output to Word has many advantages: all of Word’s word processing features are at your fingertips. For example, you can attach templates to create customized documents, add tables of contents and indices, track changes, etc.

When inserting a large spreadsheet into a Word document, STATISTICA automatically detects how many variables can fit on each page and partitions the spreadsheet into several Word tables. If the spreadsheet uses case names, those names will be the first column in each table.
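The column-splitting behavior described above (pack as many columns as fit the page width into each table, repeat case names as the first column, and emit all rows of one column group before the next) can be illustrated with a short Python sketch. It is not STATISTICA’s implementation: the function `split_for_pages`, the abstract column widths, the page width, and the `"Case"` header label are all hypothetical.

```python
from typing import List, Optional


def split_for_pages(
    header: List[str],
    rows: List[List[str]],
    col_widths: List[int],
    page_width: int,
    case_names: Optional[List[str]] = None,
    case_width: int = 0,
) -> List[List[List[str]]]:
    """Split a wide table into column groups that each fit within page_width.

    Widths are abstract units (e.g. characters). If case_names is given, it is
    repeated as the first column of every resulting table. Each returned table
    is a header row followed by all data rows for that group of columns.
    """
    # Width left for data columns once the repeated case-name column is placed.
    available = page_width - (case_width if case_names is not None else 0)

    # Greedily pack consecutive columns into groups that fit the available width.
    groups: List[List[int]] = []
    current: List[int] = []
    used = 0
    for i, width in enumerate(col_widths):
        if current and used + width > available:
            groups.append(current)
            current, used = [], 0
        current.append(i)  # an over-wide column still gets a group of its own
        used += width
    if current:
        groups.append(current)

    # Emit every row for one column group before moving on to the next group.
    tables = []
    for group in groups:
        head = [header[i] for i in group]
        body = [[row[i] for i in group] for row in rows]
        if case_names is not None:
            head = ["Case"] + head
            body = [[name] + cells for name, cells in zip(case_names, body)]
        tables.append([head] + body)
    return tables


tables = split_for_pages(
    header=["A", "B", "C", "D"],
    rows=[["1", "2", "3", "4"], ["5", "6", "7", "8"]],
    col_widths=[10, 10, 10, 10],
    page_width=35,
    case_names=["c1", "c2"],
    case_width=10,
)
for table in tables:
    print(table)
```

The greedy packing keeps the original left-to-right column order, and a column wider than the page is placed in a table of its own rather than being dropped.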

Additional benefits of sending results to a Word document include increased printing functionality (e.g., printing to files, manual duplex) and the ability to save results as Web pages.

Web

HTML reports. You may want to post a STATISTICA Report or Workbook on the Internet for others to review. With STATISTICA, you can save reports and workbooks in HTML format. HTML is an acronym for HyperText Markup Language. HTML uses tags to identify elements of the document, such as text or graphics.

STATISTICA Enterprise Server Knowledge Portal. STATISTICA Enterprise Server provides another way to distribute reports – through the Knowledge Portal. The Knowledge Portal enables you to publish STATISTICA documents (spreadsheets, graphs, reports, or workbooks) to the Internet. Users with limited Knowledge Portal permissions can then view those documents. You control who can access these documents by setting permissions on the documents and directories using standard WebSTATISTICA repository tools.

Risk Management

Financial Services companies face the challenging task of making credit decisions in a complex and uncertain environment. Decisions regarding credit limits and approvals must be compliant with regulatory and policy rules, reflect known segmentation (rules) among products and customers, and incorporate standard as well as advanced and emerging predictive risk modeling techniques, such as text mining.

The STATISTICA Risk Management solution has been proven at some of the largest and most progressive financial institutions in the world. Our solution provides advanced analytical tools that enable financial services companies to gain more profitable customers and to decrease risks.

STATISTICA Solution

  • Full Range of Solutions: Data preparation, attribute building, weight of evidence coding, scorecard building, model selection, model evaluation, cut-off point selection, and population stability are all incorporated into one software package.
  • Streamlined Process:  Scorecard solutions integrate the various tools needed to provide a comprehensive risk modeling package.
  • The Most Powerful Algorithms Available: STATISTICA incorporates not only logistic regression and Cox Proportional Hazards, but also other powerful data mining algorithms such as k-means clustering, decision trees, and neural networks, which are being incorporated into credit risk models.
  • Reflexive Models for Real-Time Needs: Live Score® processes new customers instantly and updates credit risk models with rapid turnaround times made possible only by STATISTICA’s integrated solutions.
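Weight of evidence (WoE) coding and population stability, both listed above, rest on standard credit-scoring formulas. The Python sketch below shows one common convention, WoE = ln(%good / %bad) per bin, together with the information value (IV) and the population stability index (PSI). It assumes bins have already been built and each contains both goods and bads; the bin labels and counts are invented, and STATISTICA’s own sign conventions or smoothing may differ.

```python
import math
from typing import Dict, List, Tuple


def weight_of_evidence(counts: Dict[str, Tuple[int, int]]) -> Tuple[Dict[str, float], float]:
    """Return the WoE of each bin and the total information value (IV).

    counts maps a bin label to (goods, bads). Convention assumed here:
    WoE = ln(share of all goods in the bin / share of all bads in the bin).
    Bins with zero goods or zero bads must be merged or smoothed beforehand.
    """
    total_good = sum(good for good, _ in counts.values())
    total_bad = sum(bad for _, bad in counts.values())
    woe: Dict[str, float] = {}
    iv = 0.0
    for label, (good, bad) in counts.items():
        pct_good = good / total_good
        pct_bad = bad / total_bad
        woe[label] = math.log(pct_good / pct_bad)
        iv += (pct_good - pct_bad) * woe[label]
    return woe, iv


def population_stability_index(expected: List[float], actual: List[float]) -> float:
    """PSI between bin shares at model development time and current bin shares."""
    return sum((a - e) * math.log(a / e) for e, a in zip(expected, actual))


woe, iv = weight_of_evidence({"income < 30k": (100, 100), "income >= 30k": (300, 100)})
print(woe, round(iv, 4))
print(round(population_stability_index([0.5, 0.5], [0.6, 0.4]), 4))
```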

 

Customer Loyalty

In the casino and gaming industry, as in most industries, acquiring new customers is much harder than holding onto existing customers. A major concern is losing a customer to a competitor thereby losing not only the current business of a customer but also forgoing all possible future business from that customer.

The STATISTICA Customer Loyalty solution provides an automated way to identify which customers have a high probability of churn (using advanced predictive analytics) and why they are likely to churn (using root cause analysis), and it suggests the next best action to retain those customers. STATISTICA Extract, Transform, and Load acquires data from multiple data sources, making it possible to find patterns in churn that might not otherwise be apparent.

STATISTICA Solution

  • Powerful Statistical Tools: STATISTICA provides you with an arsenal of the most powerful statistical tools available.
  • Innovative Data Pre-processing Tools: STATISTICA provides a very comprehensive list of data management and data visualization tools.
  • Cutting-edge Predictive Analytics: STATISTICA provides a wide variety of basic to sophisticated algorithms to build models that provide the most lift and highest accuracy for churn detection.
  • Enhanced Text Analytics: STATISTICA provides an advanced text miner tool to better leverage unstructured/textual data.
  • Enterprise-wide solution: A multi-user, role-based, secure STATISTICA Enterprise platform allows for a truly collaborative environment to build, test, and deploy the best possible models for churn reduction.
  • Integrated Workflow: STATISTICA Decisioning platform provides a streamlined workflow for powerful, rules-based, predictive analytics where business rules and industry regulations are used in conjunction with advanced analytics to build the best churn reduction models.

Automotive Manufacturing

Statistical Process Control

Advanced Process Monitoring Solutions for the Automotive Manufacturing Industry

Automotive manufacturers, including suppliers to the automotive industry, benefit from a multitude of STATISTICA products to achieve the most efficient processes in the business. Typical applications include monitoring processes, finding important controllable factors and anticipating issues before they occur. STATISTICA solutions available for these tasks include: STATISTICA Enterprise QC, STATISTICA Monitoring and Alerting Server (MAS), STATISTICA Enterprise Server, and STATISTICA Process Optimization and Root Cause Analysis.

Areas of Application: Monitoring Processes with STATISTICA Enterprise QC and MAS

STATISTICA Enterprise QC monitors the various critical manufacturing processes that are taking place simultaneously at the facility during testing and assembly. Immediately knowing when a process gets off spec saves time and materials. STATISTICA Enterprise QC offers SPC solutions for automotive suppliers to monitor processes and part testing to ensure quality of parts and assemblies.

STATISTICA Monitoring and Alerting Server (MAS) provides automated monitoring and dashboard summaries for highly automated automotive manufacturing and assembly processes.

STATISTICA Monitoring and Alerting Server Dashboard

Collaborating with Suppliers using STATISTICA Enterprise Server QC

STATISTICA Enterprise Server QC enables automotive manufacturers to collaborate with suppliers through its web interface. This allows for the sharing of supplier data and collaborative review of results.

Anticipating Issues before they Occur with STATISTICA Process Optimization and Root Cause Analysis

STATISTICA Process Optimization and Root Cause Analysis is an exceptional tool for monitoring the manufacturing process at each step along the way, even anticipating quality control problems with unmatched sensitivity and effectiveness. By integrating cutting-edge predictive modeling and data mining techniques with the vast array of traditional quality tools including quality control charting, process capability analysis, experimental design procedures and Six Sigma methods, STATISTICA Process Optimization and Root Cause Analysis allows for complete process understanding, root cause analysis, and accurate predictions of quality outcomes during the manufacturing process.

STATISTICA Process Optimization and Root Cause Analysis allows you to take advantage of existing historical data and find patterns in the data that affect the final outcome. Because most automated manufacturing processes involve a large number of steps to get to the end product, and the effects of those steps often interact, a traditional experimental design would require far too many runs. Root cause analysis instead uses your historical data to find the factors and combinations of factors that affect end-product quality.
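A quick count shows why a full experimental design becomes impractical. Assuming, purely for illustration, that each process step is a factor with two levels, a full factorial design needs 2^k runs for k steps, while the number of two-factor interactions grows as k(k-1)/2:

```python
from math import comb


def full_factorial_runs(num_factors: int, levels: int = 2) -> int:
    """Runs needed to try every combination of factor settings once."""
    return levels ** num_factors


def pairwise_interactions(num_factors: int) -> int:
    """Number of distinct two-factor interactions among num_factors factors."""
    return comb(num_factors, 2)


for k in (5, 10, 20, 30):
    print(k, full_factorial_runs(k), pairwise_interactions(k))
# 5 32 10
# 10 1024 45
# 20 1048576 190
# 30 1073741824 435
```

Fractional factorial and screening designs need far fewer runs, but only by confounding some of these interactions, which is the gap that mining historical data is meant to fill.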

STATISTICA Process Optimization and Root Cause Analysis builds predictive models that reflect the relationship between manufacturing inputs and outcomes (e.g., conformance to specifications) of the manufacturing process. The models can then be used to simulate runs, finding optimal settings and improving overall quality of the process.
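The fit-then-simulate loop described above can be sketched in a few lines of Python. A deliberately simple k-nearest-neighbours regressor stands in for the predictive models STATISTICA would actually build, and a grid search plays the role of simulating runs; the function names, the two inputs (temperature and line speed), and the historical data are hypothetical.

```python
import itertools
import math
from typing import List, Sequence, Tuple

# Each historical run: (input settings, observed outcome such as fraction in spec).
Run = Tuple[Sequence[float], float]


def knn_predict(history: List[Run], settings: Sequence[float], k: int = 3) -> float:
    """Predict the outcome at `settings` as the mean outcome of the k nearest past runs.

    Inputs are compared with plain Euclidean distance, so in practice they
    should first be scaled to comparable ranges (not done here).
    """
    nearest = sorted(history, key=lambda run: math.dist(run[0], settings))[:k]
    return sum(outcome for _, outcome in nearest) / len(nearest)


def best_settings(history: List[Run], grid_axes: List[Sequence[float]], k: int = 3) -> Tuple[float, ...]:
    """Simulate every candidate setting on the grid; return the best predicted one."""
    candidates = itertools.product(*grid_axes)
    return max(candidates, key=lambda settings: knn_predict(history, settings, k))


# Hypothetical history: (temperature, line speed) -> fraction of parts in spec.
history = [
    ((180.0, 1.0), 0.80),
    ((190.0, 1.0), 0.90),
    ((200.0, 1.0), 0.98),
    ((210.0, 1.0), 0.91),
    ((220.0, 1.0), 0.82),
]
print(best_settings(history, [(180.0, 190.0, 200.0, 210.0, 220.0), (1.0,)], k=1))
```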

For an overview of the application of predictive modeling to manufacturing processes, read the Quality Digest article “Finding Direction in Chaos: Data mining methods make sense out of millions of seemingly random data points.”

Warranty Cost Sharing

New Challenges Facing OEMs

  • Auto manufacturers will begin submitting chargebacks to their original equipment manufacturers (OEMs) if quality thresholds are “breached”
  • The implications for the OEMs are that (a) an early-warning detection system is needed to identify quality problems much earlier in the process so that critical factors can be corrected before the issue impacts the overall quality scorecard and (b) improved techniques to identify root causes are needed in order to positively determine whether their part is the culprit
  • Many companies use Excel and other manual analysis tools in an attempt to spot emerging complications, but simplistic approaches do not provide the advance notice that a supplier needs to rapidly identify and fix problems
  • The problem isn’t more data. The problem is to better leverage the data to detect patterns earlier in the process and then rapidly identify the root cause and fix the process

Improve Insights into Warranty Claims and Part Failures

In an attempt to reduce warranty costs per vehicle, top automakers are focusing on warranty cost analysis. One result is a warranty “chargeback” system that will become more punitive in 2012. Automotive companies including General Motors, Ford, Chrysler, and others are formalizing their warranty chargeback systems to reduce their expenses by passing the costs back to their suppliers.

Automotive suppliers can prepare by improving product quality and increasing their ability to defend against claims of part failures. A major fear of suppliers is that warranty costs will be charged back without proper evidence and justification. With existing manual processes, OEMs agree that it will be difficult and time-consuming to differentiate between actual part defects and system-related failures.

Manufacturers and suppliers must be able to distinguish “real” part failures from the many other possible problems. They need to review part failure rate data across customers, platforms, and other factors to discover the root cause of failure, but the information relevant to warranty claims is diverse and complex. A systematic approach to aggregating and organizing the relevant data is needed, yet those data are cryptic: numeric and track-and-trace data from the manufacturing process, plus text data from the warranty claims themselves. Mechanics code warranty failures inconsistently and without structure, making it difficult to comprehend “the whole story.”

Using StatSoft’s multivariate solutions, patterns emerge that were not previously apparent. From these patterns, automatic alerts are generated, which indicate much earlier that a problem is developing. Once the alert is generated, the anomaly can be analyzed in real time to determine if it is the component or some other factor in the system that is causing the problem.

Warranty Process Flow: Early Warning Detection and Root Cause Analysis System

ETL: Data Access

Problem: Data resides in disparate databases requiring data connections, aggregation, and alignment across multiple databases and data historians. Getting to the data is a manual process, and engineers spend too much time producing reports.
Solution: Build data access, automated process monitoring, and root cause analysis templates only once and do away with manual data retrieval. Free up engineers’ time for higher-value tasks, resulting in process improvement and reduced warranty claims.

Traditional SPC Analysis

Problem: How to implement effective SPC monitoring that is responsive to small changes in (warranty) trends?
Solution: Use cumulative sum charts, exponentially weighted moving average charts, and runs tests to simultaneously monitor hundreds or thousands of components and subcomponents
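The two chart types named above have standard textbook forms. The Python sketch below implements a tabular CUSUM and an EWMA chart for one monitored series, assuming the in-control mean `mu0` and standard deviation `sigma` are known; the defaults (k = 0.5, h = 4 for CUSUM; λ = 0.1, L = 2.7 for EWMA) are common textbook choices, not STATISTICA settings, and the series is invented. Monitoring hundreds of components amounts to running the same calculation per series.

```python
import math
from typing import List, Optional, Sequence


def cusum_first_alarm(x: Sequence[float], mu0: float, sigma: float,
                      k: float = 0.5, h: float = 4.0) -> Optional[int]:
    """Tabular CUSUM: return the 1-based index of the first alarm, or None.

    K = k*sigma is the allowance (slack) and H = h*sigma the decision interval.
    """
    K, H = k * sigma, h * sigma
    c_plus = c_minus = 0.0
    for t, value in enumerate(x, start=1):
        c_plus = max(0.0, value - (mu0 + K) + c_plus)    # accumulates upward drift
        c_minus = max(0.0, (mu0 - K) - value + c_minus)  # accumulates downward drift
        if c_plus > H or c_minus > H:
            return t
    return None


def ewma_alarms(x: Sequence[float], mu0: float, sigma: float,
                lam: float = 0.1, L: float = 2.7) -> List[int]:
    """EWMA chart: return the 1-based indices of points outside the control limits."""
    z = mu0
    alarms = []
    for t, value in enumerate(x, start=1):
        z = lam * value + (1 - lam) * z
        # Exact (time-varying) limit half-width for an EWMA started at mu0.
        half_width = L * sigma * math.sqrt(lam / (2 - lam) * (1 - (1 - lam) ** (2 * t)))
        if abs(z - mu0) > half_width:
            alarms.append(t)
    return alarms


# Hypothetical series: in control for 10 periods, then a sustained 1-sigma upward shift.
series = [0.0] * 10 + [1.0] * 30
print(cusum_first_alarm(series, mu0=0.0, sigma=1.0))
print(ewma_alarms(series, mu0=0.0, sigma=1.0)[:3])
```

Each point after the shift sits only 1σ above target, so a conventional 3σ Shewhart chart would not flag any single point, whereas both accumulating statistics signal the shift within about ten periods.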

Modeling: Failure Modes + Component Life

Problem: How to link warranty issues to manufacturing parameters and product testing? How to implement successful strategies for driving down warranty repair costs?
Solution: Predictive modeling techniques can identify the key patterns relating manufacturing and product testing data to warranty claims; those predictive models can then be used in “what-if” (scenario) analyses to identify cost-effective solutions to drive down warranty costs.

Root Cause Analysis

Problem: What are the most important variables that impact product quality and reliability in the field, and drive warranty cost? How to quickly diagnose root causes when new failure modes are reported in the field?
Solution: Diagnose complex issues quickly by applying automated root cause analyses. Quickly identify the critical manufacturing parameters and inputs where additional resources are needed to drive warranty cost down.

Effective Multivariate Process Monitoring

Problem: Exceptional component/product reliability is the result of a complete understanding of the interactions among numerous manufacturing parameters, supplier inputs, etc. When reviewing a process sequentially, one parameter at a time, important interactions will be overlooked.
Solution: Find anomalies and patterns in high-dimensional data through multivariate and model-based process monitoring. Find manufacturing quality problems early, before they show up in standard control charts.
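Hotelling’s T² statistic is one standard form of multivariate process monitoring and shows why one-parameter-at-a-time review can miss problems. This Python sketch handles two variables, assuming the in-control mean and covariance are known (so the control limit is a chi-square quantile with 2 degrees of freedom); it is not necessarily the method STATISTICA uses, and the parameters are hypothetical.

```python
import math
from typing import Sequence, Tuple

Matrix2 = Tuple[Tuple[float, float], Tuple[float, float]]


def hotelling_t2_2d(x: Sequence[float], mean: Sequence[float], cov: Matrix2) -> float:
    """Hotelling's T^2 = (x - mean)^T cov^{-1} (x - mean) for two variables."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    inv = ((d / det, -b / det), (-c / det, a / det))  # closed-form 2x2 inverse
    dx = (x[0] - mean[0], x[1] - mean[1])
    return sum(dx[i] * inv[i][j] * dx[j] for i in range(2) for j in range(2))


# With known in-control parameters, T^2 follows a chi-square distribution with
# 2 degrees of freedom, whose quantile has the closed form -2 ln(1 - p).
LIMIT = -2 * math.log(1 - 0.9973)

mean = (0.0, 0.0)
cov = ((1.0, 0.9), (0.9, 1.0))  # two strongly correlated parameters, in sigma units
for point in [(1.5, 1.5), (1.5, -1.5)]:
    t2 = hotelling_t2_2d(point, mean, cov)
    print(point, round(t2, 2), "ALARM" if t2 > LIMIT else "ok")
```

Both points lie 1.5σ from target on each variable, well inside individual 3σ limits, but the second one moves the two correlated parameters in opposite directions, and only the multivariate statistic flags it.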

Ad Hoc Engineering Analytics; Text Mining of Warranty Claims

Problem: How to find emerging problems and new patterns in warranty data? How to avoid the expense of trained engineers reading large numbers of warranty reports in order to classify them, and to detect new problems?
Solution: Apply advanced automatic text mining methods to classify and cluster warranty claim reports; then use ad-hoc drill-down methods to detect emerging trends.
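As a minimal illustration of numericizing and classifying claim text, the Python sketch below turns each claim into word counts and assigns a new claim to the category whose centroid is most similar by cosine similarity. This nearest-centroid approach is a simple stand-in, not the algorithm of STATISTICA Text Miner; the categories and claim texts are invented, and clustering of unlabeled claims is not shown.

```python
import math
import re
from collections import Counter
from typing import Dict, List


def term_counts(text: str) -> Counter:
    """Numericize a claim as a bag of lower-cased word counts."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors (0.0 if either is empty)."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def train_centroids(labeled: Dict[str, List[str]]) -> Dict[str, Counter]:
    """Sum the term counts of each category's example claims.

    Summing points in the same direction as averaging, and cosine similarity
    ignores vector length, so the sum serves as the category centroid.
    """
    centroids: Dict[str, Counter] = {}
    for label, claims in labeled.items():
        total: Counter = Counter()
        for claim in claims:
            total.update(term_counts(claim))
        centroids[label] = total
    return centroids


def classify(claim: str, centroids: Dict[str, Counter]) -> str:
    """Assign the claim to the most similar category centroid."""
    vec = term_counts(claim)
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))


centroids = train_centroids({
    "cooling": ["Coolant leak at water pump", "Engine overheating, water pump replaced"],
    "electrical": ["Battery not charging", "Alternator failure, battery drained"],
})
print(classify("Customer reports coolant leak near pump", centroids))
print(classify("Battery drained overnight", centroids))
```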

 

Media Mix Optimization

The marketing of brands and products has changed dramatically. Messages are disseminated through a variety of channels: print media, radio, TV, blogs and forums, websites, Twitter, and social networks.

The STATISTICA Enterprise solution for Social Media Mix Optimization provides an integrated system that evaluates the market’s response and optimizes the conversion of that response into sales. Viewed from another perspective, it analyzes the performance of different channels and optimizes the related expenses.

Social media response is available in many formats and aggregations: from user counts, numbers of views, friends, or “Likes” that may be available daily, hourly, or even by the minute, to time-stamped customer reviews that may be updated less frequently. Configuring and maintaining all data sources in STATISTICA Enterprise, numericizing text fields with STATISTICA Text Miner, and combining these with STATISTICA ETL (Extract, Transform, Load) functionality helps solve this challenging task efficiently and automatically.

STATISTICA Solution

  • Powerful analytics tools: STATISTICA provides you with an arsenal of the most powerful data and text mining tools, which build accurate predictive models for linking variables from different sources.
  • Enterprise system: This system provides the robust and scalable server backbone for automating the analytics, linking marketing expenditures to consumer sentiment, and linking consumer sentiment to expected demand (and sales). STATISTICA Enterprise also provides the display layer to manage large numbers of channels via efficient and hierarchically nested dashboards that will alert/alarm when undesirable trends are detected.
  • Optimization tools: Powerful “what-if” scenario analysis identifies the optimal combinations of expenditures for different advertising and marketing channels. Predictive models will be built to establish confidence regions around the formula for the optimal mix to empower marketing or product managers to evaluate risk/reward scenarios, and ultimately, turn the buzz into sales.
  • Advanced Extract, Transform and Load functionality: ETL brings all data sources together.
  • Monitoring and alerting server: provides automated and proactive alarms on changes in customer behavior.

Heavy Equipment Manufacturing

Capital equipment manufacturers use STATISTICA throughout the manufacturing process and then analyze repair and usage data once their products are in use by customers.

STATISTICA Solution

  • Manufacturing / Six Sigma: STATISTICA is an integral part of the quality control and Six Sigma programs at heavy equipment manufacturing organizations. Several of the largest global manufacturing organizations have global site licenses for STATISTICA, which is used throughout their manufacturing sites.
    Applications range from Web-based monitoring of Quality Control to fairly standard statistical process control techniques to customized STATISTICA-based applications for analyses that are specific to the type of manufacturing being performed.
  • Warranty Analyses: Capital equipment manufacturers typically provide basic and extended warranties to their customers as a value-added service. The length of warranty to provide and its associated cost for each product are important concerns for these organizations.
    It is also helpful from product improvement and repair process improvement perspectives to be able to determine the most frequent repairs by product, the factors that contribute to a failure type, and the correlations between failures (e.g., if the repair technician determines that the water pump needs to be replaced, they may as well replace another component that is also likely to fail).
    STATISTICA’s data mining and text mining algorithms are critical components in the successful setting of warranty parameters and the determination of repair guidelines and rules to decrease warranty service costs.
  • Remote Monitoring: As a value-added service, organizations can offer remote monitoring to customers whose equipment carries data transmission devices that feed data to a centralized database. STATISTICA is integrated with those databases and monitors the various data feeds from the customer’s equipment. For example, the STATISTICA application includes predictive models to monitor oil pressure, RPMs, water pressure, and various other equipment parameters. STATISTICA provides automated alerting and exception reporting when the latest data predict a problem or a failure for a piece of equipment. The organization can then notify the customer proactively, before a problem occurs, and decide whether a repair technician should be sent out to adjust the machine.
  • Sales Analysis / CRM: StatSoft’s customers in the Capital Equipment Industry use the broad base of analytic techniques in the platform to determine regional patterns in their sales and to make cross-selling and up-selling recommendations based upon what an individual customer just purchased, what they already own, the business that the customer is in, the region in which the customer is based, etc.
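Correlations between failures, such as the water pump example in the warranty item above, are often quantified as association rules. The Python sketch below computes support and confidence for pairs of parts replaced on the same repair order; it illustrates that standard technique under assumed data (the repair orders and part names are hypothetical) and does not describe STATISTICA’s own algorithms.

```python
from collections import Counter
from itertools import combinations
from typing import List, Set, Tuple


def pair_rules(repair_orders: List[Set[str]],
               min_support: float = 0.1) -> List[Tuple[str, str, float, float]]:
    """Return (if_replaced, also_replaced, support, confidence), highest confidence first.

    support    = share of repair orders containing both parts
    confidence = share of orders containing the first part that also contain the second
    """
    n = len(repair_orders)
    single = Counter(part for order in repair_orders for part in order)
    pairs = Counter(pair for order in repair_orders for pair in combinations(sorted(order), 2))
    rules = []
    for (a, b), count in pairs.items():
        support = count / n
        if support < min_support:
            continue
        rules.append((a, b, support, count / single[a]))
        rules.append((b, a, support, count / single[b]))
    return sorted(rules, key=lambda rule: rule[3], reverse=True)


# Hypothetical repair orders (the set of parts replaced on each visit).
orders = [
    {"water pump", "thermostat"},
    {"water pump", "thermostat"},
    {"water pump", "belt"},
    {"thermostat"},
    {"belt"},
]
for rule in pair_rules(orders):
    print(rule)
```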

Churn Analysis

The primary goal of churn analysis is to identify the customers who are most likely to stop using your service or product. In the dynamic financial services industry, companies increasingly offer products and services with similar features. Amid this ever-growing competition, the cost of acquiring a new customer typically exceeds the cost of retaining a current one, so existing customers are a valuable asset. Furthermore, because customers in financial services generally tend to stay with a company for a long time, churn can lead to substantial revenue loss.

With StatSoft’s Churn Analysis Solution, you can identify customers who are likely to churn by making precise predictions, reveal customer segments and reasons for leaving, engage with customers to improve communication and loyalty, calculate attrition rates, develop effective marketing campaigns to target customers and increase profitability. With STATISTICA’s advanced modeling algorithms and wide array of state-of-the-art tools, you can develop powerful models that can aid in accurate prediction of customer behavior and trends and avoid losing customers.

STATISTICA Solution

  • Batch or Real-Time Processing: Use the models you have built to determine churn and indicate, either by batch or in real-time, the customers who are likely to transfer their business to another company.
  • Cutting-edge Predictive Analytics: STATISTICA provides a wide variety of basic to sophisticated algorithms to build models which provide the most lift and highest accuracy for improved churn analysis.
  • Innovative Data Pre-processing Tools: STATISTICA provides a very comprehensive list of data management and data visualization tools.
  • Integrated Workflow: STATISTICA Decisioning Platform provides a streamlined workflow for powerful, rules-based, predictive analytics where business rules and industry regulations are used in conjunction with advanced analytics to build the best models.
  • Optimized Results: Compare the latest data mining algorithms side-by-side to determine which models provide the most gain. Produce profit charts with ease.
  • Role-Based, Enterprise-Wide Scope: If yours is a multi-user collaborative environment, you can use STATISTICA Enterprise to share data, improve churn models, and benefit from collaborative work with small or large groups.
  • Text Mining Unstructured Data: Improve churn models by using powerful text mining algorithms to incorporate unstructured data currently sitting unused in storage.
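Lift and profit charts, mentioned above, show how well a churn model concentrates likely churners at the top of its ranking. This Python sketch computes both for a given targeting fraction; the profit formula assumes a fixed cost per contacted customer and a fixed retention rate among contacted churners, and all function names and numbers are hypothetical.

```python
from typing import List


def _ranked(scores: List[float], churned: List[int]) -> List[int]:
    """Churn flags ordered from the highest to the lowest model score."""
    return [c for _, c in sorted(zip(scores, churned), key=lambda p: p[0], reverse=True)]


def lift_at(scores: List[float], churned: List[int], fraction: float) -> float:
    """Churn rate in the top `fraction` of scored customers divided by the overall churn rate."""
    ranked = _ranked(scores, churned)
    top_n = max(1, int(len(ranked) * fraction))
    return (sum(ranked[:top_n]) / top_n) / (sum(churned) / len(churned))


def profit_at(scores: List[float], churned: List[int], fraction: float,
              value_saved: float, contact_cost: float, retention_rate: float) -> float:
    """Expected profit of a retention campaign aimed at the top `fraction` of customers.

    Assumes every contacted customer costs `contact_cost`, and that a share
    `retention_rate` of contacted would-be churners is kept, each worth `value_saved`.
    """
    ranked = _ranked(scores, churned)
    top_n = max(1, int(len(ranked) * fraction))
    return sum(ranked[:top_n]) * retention_rate * value_saved - top_n * contact_cost


scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
churned = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
for fraction in (0.1, 0.2, 0.3, 0.5):
    print(fraction, round(lift_at(scores, churned, fraction), 2),
          profit_at(scores, churned, fraction, value_saved=100, contact_cost=10, retention_rate=0.5))
```

Plotting the profit column against the targeting fraction gives a simple profit chart; comparing such curves across candidate models is one way to decide which model "provides the most gain."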

Included Technologies

  • STATISTICA Decisioning Platform
  • STATISTICA Extract, Transform, and Load
  • STATISTICA Enterprise
  • STATISTICA Data Miner
  • STATISTICA Text Miner
  • STATISTICA Live Score

Success Stories

Telecommunications Company Implements Customer Retention Strategy with STATISTICA Data Miner

Customer Analytics

Industries including retail, banking, insurance, and marketing, among others, are investing time and money in analytics to understand customer behavior more effectively. Key business decisions become far better informed once customer behavior is properly understood through analyses such as market segmentation and predictive analytics. Implementing Customer Relationship Management (CRM) models will play a vital role in improving sales, marketing, and customer service.

STATISTICA provides advanced analytic tools to build Customer Behavior Scoring Models that help companies understand how customers behave, including their opinions of a product, based on their previous actions. Highly sophisticated and robust tools are also available for Recency, Frequency and Monetary (RFM) analysis, which characterizes customer behavior and defines market segments by how recently, how often, and how much customers have purchased.
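RFM scoring can be sketched as ranking customers on each dimension and assigning scores from 1 (worst) to 5 (best). The Python below follows that common quintile convention; the `Purchase` record, the function names, and the sample purchases are hypothetical, and tie handling is deliberately simplistic.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List


@dataclass
class Purchase:
    customer: str
    day: date
    amount: float


def rank_scores(values: Dict[str, float], higher_is_better: bool = True, bins: int = 5) -> Dict[str, int]:
    """Score customers 1..bins by rank (1 = worst, bins = best).

    Ties are broken by input order, which a production tool would handle more carefully.
    """
    ordered = sorted(values, key=lambda c: values[c], reverse=not higher_is_better)
    n = len(ordered)
    return {customer: 1 + (i * bins) // n for i, customer in enumerate(ordered)}


def rfm_scores(purchases: List[Purchase], as_of: date) -> Dict[str, str]:
    """Return a three-digit R/F/M score string per customer, e.g. "555" for the best."""
    last: Dict[str, date] = {}
    frequency: Dict[str, float] = {}
    monetary: Dict[str, float] = {}
    for p in purchases:
        last[p.customer] = max(last.get(p.customer, p.day), p.day)
        frequency[p.customer] = frequency.get(p.customer, 0) + 1
        monetary[p.customer] = monetary.get(p.customer, 0.0) + p.amount
    recency = {c: float((as_of - d).days) for c, d in last.items()}
    r = rank_scores(recency, higher_is_better=False)  # fewer days since last purchase is better
    f = rank_scores(frequency)
    m = rank_scores(monetary)
    return {c: f"{r[c]}{f[c]}{m[c]}" for c in last}


purchases = [
    Purchase("A", date(2013, 7, 28), 100.0),
    Purchase("A", date(2013, 7, 29), 100.0),
    Purchase("A", date(2013, 7, 30), 100.0),
    Purchase("B", date(2013, 6, 1), 50.0),
    Purchase("B", date(2013, 7, 15), 50.0),
    Purchase("C", date(2013, 5, 1), 80.0),
    Purchase("D", date(2013, 4, 1), 60.0),
    Purchase("E", date(2013, 1, 1), 10.0),
]
print(rfm_scores(purchases, as_of=date(2013, 7, 31)))
```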

STATISTICA Solution

  • Enterprise wide solution: A multi-user, role based, secure STATISTICA Enterprise platform allows for a truly collaborative environment to build, test and deploy the best possible models.
  • Enhanced Text Analytics: STATISTICA provides an advanced text miner tool to better leverage unstructured/textual data.
  • Cutting-edge Predictive Analytics: STATISTICA provides a wide variety of basic to sophisticated algorithms to build models which provide the most lift and highest accuracy for improved customer analytics.
  • Innovative Data Pre-processing Tools: STATISTICA provides a very comprehensive list of data management and data visualization tools.
  • Powerful Statistical Tools: STATISTICA provides you with an arsenal of the most powerful statistical tools available.
  • Reflexive models for real-time needs: Use Live Score® to process new issues as they occur, and update your models in turnaround times made possible only by STATISTICA’s integrated solutions.
  • Integrated Workflow: STATISTICA Decisioning platform provides a streamlined workflow for powerful, rules-based, predictive analytics where business rules and industry regulations are used in conjunction with advanced analytics to build the best models.

Included Technologies

  • STATISTICA Decisioning Platform
  • STATISTICA Enterprise
  • STATISTICA Data Miner
  • STATISTICA Text Miner
  • STATISTICA Live Score
  • STATISTICA Sequence Association and Link Analysis

How to Most Efficiently Store Your Data

When working with large data files, it becomes important to look for ways to make one’s processes more efficient. File size and computation times can both be affected by how data is stored.
Many variables can be stored more efficiently merely by changing a few of the default settings. In this brief article, we will explore the various methods to help make spreadsheet storage and computations more efficient.
To view and change the storage method of a given variable, click on the variable header in the spreadsheet. Then, select the Data tab and in the Variables group, click Specs to display the variable specification dialog box for the selected variable. You can also double-click on the variable header to display this dialog box. In the drop-down box labeled Type, you will find the data storage options. The default data storage method is double precision. In STATISTICA, it is called simply Double.
For variables stored with double precision, values are stored as 64-bit floating-point real numbers with 15-digit precision. The range of values supported by this data type is approximately +/- 1.7 × 10^308.
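To see what this description means in practice, the following Python sketch inspects Python's own `float`, which on mainstream platforms is an IEEE 754 64-bit double. Treating STATISTICA's Double as the same standard format is an assumption based on the 64-bit, 15-digit description above; nothing here calls STATISTICA itself.

```python
import struct
import sys

# Python's float is a C double: on mainstream platforms an IEEE 754 64-bit value.
# (Assumption: STATISTICA's Double uses this same standard format.)
print("bytes per value:", struct.calcsize("d"))
print("largest finite value:", sys.float_info.max)  # 1.7976931348623157e+308
print("decimal digits always preserved:", sys.float_info.dig)

# Whole numbers above 2**53 cannot all be stored exactly:
# 2**53 + 1 rounds to 2**53 when converted to a double.
print(float(2**53 + 1) == float(2**53))
```

The last line shows why "15-digit precision" matters: identifiers or counts with 16 or more significant digits may silently lose their final digits in a Double.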
The next option, Text, is used for storing text data. The Length should be set to the number of characters needed. As you would expect, the longer the designated length of a text variable, the more storage space its data take, so the length should be set as small as possible while still capturing the longest text value in full.
For some types of numeric data, double precision storage is necessary: any variable whose values have decimals or are extremely large or small requires this storage type. But many variables are stored with far greater precision than necessary, and this is where we can change the data type and gain efficiency.

The Integer data type takes on integer values between -2,147,483,647 and +2,147,483,647. Variables stored this way are more compact than doubles, using 4 bytes per cell compared to 8 with double precision.
The Byte data type takes on integer values from 0 to 255 and is the most economical storage option, taking only 1 byte of storage per cell in the spreadsheet. It should be used for any variable that needs only small non-negative integer values in that range.
Using the most efficient storage method for your variables makes for smaller spreadsheet files and faster computing.
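The rule of thumb above (use the smallest type that can hold every value) can be sketched as a small Python helper. `suggest_type` and `column_bytes` are hypothetical names for illustration, not part of STATISTICA; the byte sizes and ranges are taken from the paragraphs above, and the sketch ignores how STATISTICA encodes missing data.

```python
# Bytes per cell for each numeric storage type, as described in the article.
BYTES_PER_CELL = {"Byte": 1, "Integer": 4, "Double": 8}
INTEGER_LIMIT = 2_147_483_647  # Integer range stated above: +/- this value


def suggest_type(values):
    """Return (type_name, text_length) for the most compact type that fits.

    Hypothetical helper: assumes a non-empty column that is either all text
    or all numbers, and ignores how missing values are encoded.
    """
    if not values:
        raise ValueError("need at least one value")
    if all(isinstance(v, str) for v in values):
        # Text: the Length only has to cover the longest value.
        return "Text", max(len(v) for v in values)
    if all(float(v).is_integer() for v in values):
        lo, hi = min(values), max(values)
        if 0 <= lo and hi <= 255:
            return "Byte", None
        if -INTEGER_LIMIT <= lo and hi <= INTEGER_LIMIT:
            return "Integer", None
    # Decimals, or whole numbers outside the Integer range.
    return "Double", None


def column_bytes(type_name, n_rows):
    """Approximate raw cell storage for one numeric column (no file overhead)."""
    return BYTES_PER_CELL[type_name] * n_rows


columns = {
    "age": [23, 41, 67, 35],
    "customer_id": [1001, 250000, 3999999],
    "price": [19.99, 5.0, 120.5],
    "state": ["NY", "CA", "TX"],
}
for name, values in columns.items():
    print(name, suggest_type(values))

# One million rows of ages: 8,000,000 bytes as Double vs 1,000,000 as Byte.
print(column_bytes("Double", 1_000_000), column_bytes("Byte", 1_000_000))
```

Here the ages fit in Byte, the IDs need Integer, the prices need Double, and the state codes need Text of length 2. For a million-row column, choosing Byte over Double is the difference between roughly 8 MB and 1 MB of raw cell storage.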

Big Data Solutions

Big Data

“Big data” is the buzzword that is currently dominating professional conferences around data science, predictive modeling, data mining, and CRM, to name only a few of the domains that have become electrified by the prospect of incorporating qualitatively larger data sizes and more voluminous, high-velocity data streams into business or other organizational processes.  As is usually the case when new technologies begin to transform industries, the technologies also introduce new ways of “thinking about” or conceptualizing problems and approaches for solutions, and indeed can open new horizons for a business altogether.

At the same time, there are domains and situations where the excitement about big data technologies can give way to great disappointment when the investments fail to yield the expected ROI. It is important to use proven tools that not only can access, process, and leverage big data, but also fit in and integrate with other existing data repositories and IT assets, and extract actionable information to optimize processes, solve problems, and lead to “big insights.”

StatSoft has years of experience and expertise in high-velocity and big-data applications, ranging from automated manufacturing and complex high-velocity process monitoring and optimization to real-time risk and fraud scoring. STATISTICA Enterprise, Decisioning Platform, Data Miner, Text Miner, and other STATISTICA solutions support massively parallel processing, are optimized for in-memory processing, and easily connect and interact with modern big data repositories from practically all vendors, as well as with Hadoop and other distributed data storage systems. Moreover, STATISTICA’s solutions scale easily to a large number of servers to support parallel processing and efficient batch as well as rapid real-time scoring.

StatSoft Big Data solutions provide the means for you to build out your big data platform and framework in a way that leverages your existing data repositories and IT without locking you into a specific technology direction. STATISTICA is built with modern software tools and adheres to accepted interface and connectivity standards and best practices. This means the platform stays open and can be extended as specific requirements and use cases emerge.

Thus, StatSoft’s STATISTICA is your safest bet to produce ROI from your big data initiative.

STATISTICA Solution

  • Leading-Edge Predictive Analytics: Sophisticated algorithms to build models based on big data that provide the highest accuracy.
  • Enterprise-Wide Solution: A multi-user, role-based, secure STATISTICA Enterprise platform allows for a truly collaborative environment to build, test, and deploy the best possible models for predictive analytics.
  • Templates and Lifecycle Management: Model, reporting, and general analytic templates come with lifecycle management. The ability to impose version control and lifecycle management on all analytic reporting, modeling, scoring, and other analytic processes translates into a well-managed and successful approach to big-data analytics.
  • Reflexive Models for Real-Time Needs: STATISTICA Live Score® processes new data as they arrive and efficiently updates predictions in real time, based on scoring models centrally managed through the STATISTICA Enterprise platform.
  • Integrated Workflow: STATISTICA Decisioning Platform provides a streamlined workflow for powerful, rules-based predictive analytics where business rules and constraints can be used in conjunction with advanced analytics to build the best models, and then score them efficiently over multiple parallel processors or in real time.

Included Technologies

  • STATISTICA Decisioning Platform
  • STATISTICA Enterprise
  • STATISTICA Data Miner
  • STATISTICA Live Score