Monthly Archives: November 2013

Big Data is Watching You

by Win Noren

Recently, I have heard people express concern about the data security of the US government’s new health care portal. Certainly, it is reasonable to be concerned about the security of this information, as the impact of a breach would be huge. The cost of identity theft to the individual whose identity is stolen cannot be measured in money alone; the frustration and time spent restoring your identity are not trivial.

While you can attend any number of conferences about big data and the benefits that companies can reap from this data, it is much rarer to hear anyone addressing the privacy concerns surrounding the use of big data. Of course, this doesn’t just apply to data that has been provided directly to a business through the transactions that you execute with that business, but it also applies to the data that we as individuals make public through our use of social media and various smart devices.

Rather than my telling you how much our own social media posts reveal, watch this “social media experiment” by Jack Vale, a “man who pranks people for a living.”

Microsoft principal researcher Kate Crawford is warning that data mining of personal data will create a problem of digital discrimination that will be so subtle that one won’t even know that she has been discriminated against. Let’s say that a bank does not want to lend to a certain segment of the population. They could simply analyze customer behavior to determine where to advertise so that they do not even promote themselves to this segment of the population. Crawford states, “It’s not that big data is effectively discriminating — it is, we know that it is. It’s that you will never actually know what those discriminations are.”

So, what do you think? What type of mechanisms do we need to protect ourselves from Big Data?

Churn Analysis

Predictive Analytics in Business Processes

Monika Nielsen, co-manager of StatSoft (Europe) GmbH in Germany, recently wrote an overview of STATISTICA Decisioning Platform® that pays special attention to the value of our Rules Builder. Her article was published in IT Director magazine.

Offering Decisioning Platform as “a user-friendly and fully automated system architecture that can be adapted to the requirements of different industries,” Nielsen notes that its server-based structure enables users to handle “even complex models [that] lead to immediate action.” Of course, Rules Builder makes it possible for “individual cases [to] be considered, evaluated, and classified, taking into account both fixed-defined rules and data-driven insights.”
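
The combination of fixed rules and data-driven insights described above can be pictured with a small sketch. This is not the Rules Builder API; the function name, the `missing_documents` field, and the 0.7 threshold are all assumptions made for illustration only.

```python
def classify_case(case: dict, model_score: float) -> str:
    """Classify one case using a fixed rule first, then a model score.

    Hypothetical example: the rule, the field name, and the threshold
    are illustrative assumptions, not part of any STATISTICA product.
    """
    # Fixed, predefined rule: incomplete cases always go to a person.
    if case.get("missing_documents", False):
        return "manual review"
    # Data-driven part: threshold a score produced by some predictive model.
    return "approve" if model_score >= 0.7 else "decline"
```

The fixed rule takes precedence here, so a high model score cannot override it. That ordering is a design choice of this sketch.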

See complete, original article in German here.

SOURCE: IT Director. PRÄDIKTIVE ANALYTIK IN GESCHÄFTSABLÄUFE INTEGRIERT. Monika Dielsen. September 28, 2012. Excerpts and image retrieved November 30, 2012, from

Document Type Library

Workbooks are the default way of managing output. They store each output document (e.g., a STATISTICA Spreadsheet or Graph, as well as a Microsoft Word or Excel document) as a tab.

Technically speaking, STATISTICA Workbooks are optimized ActiveX containers that can efficiently handle large numbers of documents. The documents can be organized into hierarchies of folders or document nodes (by default, one is created for each new analysis) using a tree view, in which individual documents, folders, or entire branches of the tree can be flexibly managed.

For example, selections of documents can be extracted (e.g., drag-copied or drag-moved) to the report window or to the application workspace (i.e., the STATISTICA application “background” where they are displayed in stand-alone windows). Entire branches can be placed into other workbooks in a variety of ways in order to build a specific folder organization, etc.

Each workbook contains two panels: an Explorer-style navigation tree on the left and a document viewer on the right. The navigation tree (workbook tree) can be split into various nodes that are used to organize files in logical groupings (e.g., all analysis outputs or all macros created for a project). Tabs at the bottom of the document viewer (workbook viewer) are used to easily navigate the children of the currently selected node. You can easily move the tabs to the top, right, or left of the workbook viewer by right-clicking on one of the tabs and selecting a different location from the shortcut menu. One advantage of the side placement of tabs is that multiple rows (rather than one long row) are provided (as shown below). This makes it easy to select the appropriate tab.

Displaying tabs can also be suppressed to save space. Unlike many Explorer-style navigation and organization applications that only allow folders to have children, the STATISTICA Workbook allows any item in the tree to have children. For example, you can add a spreadsheet to your workbook, and then add all the graphs produced using the data in the spreadsheet as children to the spreadsheet. A variety of drag-and-drop features and Clipboard procedures are available to aid you in organizing the workbook tree.
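
The idea that any item, not only a folder, can have children is easiest to see as a data structure. The following is a minimal sketch under that assumption; `WorkbookItem` and its methods are hypothetical names and do not correspond to the STATISTICA object model.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple


@dataclass
class WorkbookItem:
    """One entry in a workbook-style tree (hypothetical model)."""
    name: str
    kind: str  # e.g. "folder", "spreadsheet", "graph", "report", "macro"
    children: List["WorkbookItem"] = field(default_factory=list)

    def add(self, child: "WorkbookItem") -> "WorkbookItem":
        """Attach a child to this item and return the child."""
        self.children.append(child)
        return child

    def walk(self, depth: int = 0) -> Iterator[Tuple[int, "WorkbookItem"]]:
        """Yield (depth, item) pairs in depth-first order."""
        yield depth, self
        for child in self.children:
            yield from child.walk(depth + 1)


# Any item may have children: the graphs hang off the spreadsheet
# whose data produced them, rather than off a folder.
workbook = WorkbookItem("Workbook1", "folder")
data = workbook.add(WorkbookItem("Sales data", "spreadsheet"))
data.add(WorkbookItem("Histogram of revenue", "graph"))
data.add(WorkbookItem("Scatterplot of revenue by region", "graph"))

# An indented outline, similar in spirit to an Explorer-style tree view.
outline = [("  " * depth) + f"{item.name} ({item.kind})"
           for depth, item in workbook.walk()]
```

Because every item carries its own `children` list, the tree has no fixed depth limit.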

The workbook can hold all native STATISTICA documents including spreadsheets, graphs, reports, and macros. It can handle other types of ActiveX documents as well, including Excel spreadsheets, Word documents, and others. If you want to edit these documents, you can do so using the workbook viewer pane. To edit a Microsoft Word document, double-click on the object in the workbook tree. The Word document opens in the viewer, and the workbook menu bar merges with the Microsoft Word menu bar giving you access to all of the editing features you need. Workbooks can also be used to store all output from a particular analysis.

Navigating the Workbook Tree

The workbook tree displays the organization of files and folders in the workbook. The files and folders are displayed in an Explorer-style format. Items with plus signs next to them indicate folders or files that have children associated with them. To expand the tree for a particular folder or file, click the plus sign next to it. The workbook can support an unlimited number of levels, and both individual items from the tree view and entire branches can be flexibly (interactively) managed (e.g., right-click dragging to copy or move between workbooks or reports).

To select a workbook item for review or editing, simply locate the file in the workbook tree and double-click on its associated icon. The document will then open in the workbook viewer pane. Note that you can also navigate through the children of the currently selected node using the navigation tabs available (by default) at the bottom of the workbook viewer. As mentioned previously, you can easily move these navigation tabs to the top, right, or left of the workbook viewer by right-clicking on one of the tabs and selecting a different location from the shortcut menu or selecting the appropriate command from the Workbook – Tab Control submenu. Note that tabs at the top and bottom of the viewer scroll sideways, while multiple rows of tabs are used when tabs are placed to the left or right of the viewer. Items in the tree are identified by the icon next to them. The folder icon represents a folder that can contain a variety of documents and subfolders. The spreadsheet, report, macro, and graph icons represent STATISTICA Spreadsheet, Report, Macro, and Graph documents, respectively.

“There must be limits to the density and variability of data flow, but I’ve not yet discovered them – despite some very demanding work that would make most software packages cry.”
Software Review, Quality Digest, Sept. 2002

All non-STATISTICA documents are represented by their respective document icons. For example, Word documents are represented by the Word icon, and Excel spreadsheet files are represented by the Excel spreadsheet icon.

The workbook tree can be organized and modified using drag-and-drop features as well as Clipboard procedures. More information about Workbook Drag-and-Drop Features and Workbook Clipboard Features can be found in STATISTICA Help. Commands for inserting, extracting, renaming, and removing items from the workbook tree are available from the workbook tree shortcut menu (accessed by right-clicking anywhere in the tree). These commands are also accessible from the Workbook menu.

Spreadsheets (Multimedia Tables)

STATISTICA Spreadsheets are based on StatSoft’s proprietary multimedia table technology and are used to manage both input data and the numeric or text (and optionally any other type of) output. The basic form of the spreadsheet is a simple two-dimensional table that can handle a practically unlimited number of cases (rows) and variables (columns), and each cell can contain a virtually unlimited number of characters. Sound, video, graphs, animations, reports with embedded objects, or any ActiveX compatible documents can also be attached.

Because STATISTICA Spreadsheets can also contain macros and any user-defined user interface, these multimedia tables can be used as a framework for custom applications (e.g., with a list box of options or a series of buttons placed in the upper-left corner), self-running presentations, animations, simulations, etc.

Data file layout in spreadsheets. STATISTICA data are organized into cases and variables. If you are unfamiliar with this notation, you can think of cases as the equivalent of records in a database management program (or rows of a spreadsheet), and variables as the equivalent of fields (or columns of a spreadsheet). Each case consists of a set of values of variables, and the first column in the file can (optionally) contain names of cases.
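
For readers who think in code, the case/variable layout can be sketched as follows. The variable names and values are invented sample data, and the helper functions are hypothetical; nothing here reflects how STATISTICA stores data internally.

```python
# Variables are the columns (fields); each case is one row (record).
variables = ["age", "income", "region"]

# Optional case names play the role of the first column described above.
cases = {
    "Case 1": [34, 52000.0, "North"],
    "Case 2": [45, 61000.0, "South"],
    "Case 3": [29, 48000.0, "North"],
}


def value(case_name: str, variable: str):
    """Look up a single cell by case name and variable name."""
    return cases[case_name][variables.index(variable)]


def column(variable: str) -> list:
    """Return one variable's values across all cases, in case order."""
    idx = variables.index(variable)
    return [row[idx] for row in cases.values()]
```

For example, `value("Case 2", "income")` reads one cell, while `column("age")` reads a whole variable.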

The spreadsheet window comprises several basic components, as seen in this illustration.

Data (and in-cell formatting options). The remainder of the spreadsheet contains data that pertain to the cases and variables and any optional attached or linked objects (multimedia objects, macros, custom user interface).

Text in cells can be of practically unlimited length (in most STATISTICA configurations, it is limited to 1,000 characters to protect against inadvertent pasting of unwanted large amounts of data into one cell). Text in cells can be extensively formatted including different fonts and font attributes.

Reports

Reports in STATISTICA offer a more traditional way of handling output (compared to workbooks) as each object (e.g., a STATISTICA Spreadsheet or Graph, or a Microsoft Excel spreadsheet) is displayed sequentially in a word processor style document.

However, the technology behind this simple report offers you rich functionality. For example, like the workbook, each STATISTICA Report is also an ActiveX container where each of its objects (not only STATISTICA Spreadsheets and Graphs, but also any other ActiveX-compatible documents, e.g., Microsoft Word documents, Excel files and graphics files) is active, customizable, and in-place editable. Reports are stored in the STR file format, which is a StatSoft extension of the Microsoft RTF (Rich Text Format, *.rtf) format. STR files share the RTF formatting information, and additionally they include the tree view information (which cannot be stored in the standard RTF files). Hence, report files are by default saved with the file name extension *.str, but they can also be saved as standard RTF files (in which case the tree information will not be preserved).

The obvious advantages of this way of handling output (more traditional than the workbook) are the ability to insert notes and comments “in between” the objects as well as its support for the more traditional way of quickly scrolling through and reviewing the output to which some users may be accustomed. (Note that the editor supports variable speed scrolling.)

The obvious drawback, however, of these traditional reports is the inherent flat structure imposed by their word processor style format, though that is what some users of certain applications may favor.

The report tree can be organized and modified using drag-and-drop features as well as Clipboard procedures. Commands for inserting, extracting, renaming, and removing items from the report tree are available from the report tree shortcut menu (accessed by right-clicking anywhere in the tree, as shown in the image above).

Graphs

Graph documents represent another distinctive type of STATISTICA documents, and they offer rich functionality both in terms of the variety of ways in which graphs can be created in STATISTICA and in the selection of graph customization tools.

Similar to the other STATISTICA documents, graphs are ActiveX containers, which means that they can contain a variety of compatible documents (e.g., Visio drawings, Adobe illustrations, Excel spreadsheets, etc.). STATISTICA Graphs are also ActiveX objects and, therefore, can be linked to or embedded into other compatible documents (e.g., Word Documents) where they can be in-place edited by simply double-clicking on them.

Macros (STATISTICA Visual Basic Programs)

The industry standard STATISTICA Visual Basic language (integrated into STATISTICA) offers another (alternative) user interface to the functionality of STATISTICA, and it offers incomparably more than just a “supplementary application programming language” that can be used to write custom extensions. STATISTICA Visual Basic takes full advantage of the object model architecture of STATISTICA and is used to access programmatically every aspect and virtually every detail of the functionality of STATISTICA. Even the most complex analyses and graphs can be recorded into Visual Basic macros and later be run repeatedly or edited and used as building blocks of other applications. STATISTICA Visual Basic adds an arsenal of more than 13,000 new functions to the standard comprehensive syntax of Microsoft Visual Basic, thus comprising one of the largest and richest development environments available.

STATISTICA Macros can be saved in several formats, depending on how you intend to use them. You can also copy them to the Clipboard and paste them into other programs as documents.

StatSoft Poland Showcases Successes at Annual Seminar

StatSoft Poland recently concluded its popular, annual series of data mining / data analysis seminars during October. Drawing roughly 650 attendees, this year’s presentations centered on improvement of production processes, covering applications in manufacturing and scientific research, as well as specific use cases by StatSoft customers and demonstrations of STATISTICA’s broad data mining capabilities.

Attendees learned of the collaboration between StatSoft and AMCOR Flexibles Reflex to develop an integrated SPC quality system with STATISTICA. An industrial packaging company, AMCOR’s complex production processes consist of several stages, so AMCOR required a monitoring platform that could meet specs for security, performance, and quality while offering easy access to information that could produce thorough reports on any production aspect necessary. AMCOR’s Justin Sikorska described how his company found that STATISTICA can readily deliver real-time process monitoring; produce summary tables and graphs based on past data; issue alerts for adverse events; execute SPC monitoring to measure regulation and stability; anticipate failure through predictive maintenance models; and track raw product components through all manufacturing stages (i.e., product traceability). Sikorska also described AMCOR’s pleasure with STATISTICA’s “clear and user-friendly environment.”

Also featured was Luke Depczyński of Saint-Gobain ADFORS, an industrial fabric and construction reinforcement company. ADFORS’ manufacturing plant, located in Gorlitz, requires continuous monitoring of changes in process parameters and final product properties due to high production volume and the specific, individualized requirements of its multiple customers. Mr. Depczyński described how Saint-Gobain ADFORS selected StatSoft to develop a suitable quality control system with STATISTICA and presented an overview of their successful integration.

StatSoft Poland’s seminars were conducted in Warsaw, October 22-24, 2013, at Hotel Gromada Warszawa Airport.

STATISTICA Document Management System

The STATISTICA Document Management System (SDMS) is a complete, highly scalable, database solution package for managing electronic documents.

The product enables you to quickly, efficiently, and securely manage documents of any type (e.g., find them, access them, search for content, review, organize, edit [with trail logging and versioning], approve, etc.).

It is specifically designed to ensure compliance with FDA 21 CFR Part 11 regulations and Sarbanes-Oxley legislation, as well as ISO 9000, 9001, and 14001 documentation requirements.

The key features include:

  • Extremely transparent and easy to use
  • Flexible, customizable (can be optionally configured for Web-enabled access) user interface
  • Electronic Signatures
  • Comprehensive Audit Trails, Approvals
  • Optimized Searches
  • Security
  • Satisfy the FDA 21 CFR Part 11 Requirements
  • Satisfy the Sarbanes-Oxley Legislation Requirements
  • Satisfy ISO 9000 (9001, 14001) Documentation Requirements
  • Unlimited scalability (from desktop or network Client-Server versions, to the ultimate size, Web-based worldwide systems)
  • Open Architecture and Compatibility with Industry Standards

Compliance

The STATISTICA Document Management System (SDMS) complies with the following:

FDA

The general requirements put forth in the Code of Federal Regulations (CFR) Title 21 Part 11 specify what a business needs to do in order to maintain electronic records acceptable for submission to the FDA (Food and Drug Administration).

Sarbanes-Oxley Legislation

Sarbanes-Oxley Legislation imposes new, extensive reporting and record keeping requirements on all publicly traded companies and mandates that executives of those companies take personal responsibility for the procedures of collecting data for the company’s financial reports and for the integrity of their contents. In order to comply with the requirements, companies need flexible software systems that facilitate record keeping and document management in a secure and efficient manner.

ISO

Guidelines for manufacturing in general (often collectively known as ISO 9000 standards) have been published by the International Organization for Standardization (e.g., see ISO 9001 4.5: Document and data control; also ISO 14001, Ch. 4.5.5.).

Compatibility

Integrates with all STATISTICA products

STATISTICA Document Management System (SDMS) seamlessly integrates with all STATISTICA products, from Base and Advanced to enterprise-wide installations such as worldwide STATISTICA Enterprise deployments or STATISTICA Enterprise/QC for process analysis and quality control/improvement.

You can easily access all SDMS functionality from within your STATISTICA projects (e.g., all analysis projects, data mining, text mining, reporting, etc.). So directing your reports or data sets to the secure repository of SDMS is as easy as simply saving a file, because your authentication can be based on your initial log-in into STATISTICA. No entry of additional passwords is necessary.

You can also build the functionality of SDMS into your shortcuts, automated STATISTICA applications, and other custom systems to simplify your work and enhance productivity.

Stand-alone, highly compatible application

SDMS can be used as a stand-alone system. But since SDMS uses COM and SOAP-based architecture, and is compatible with the Microsoft WebService interface, it can also be called from other applications, integrated into existing systems, or expanded by adding custom functionality.

Compatibility with other standards

Please also inquire about the compatibility of STATISTICA Document Management System (SDMS) with the Open Document Management API (ODMA) standard, and the interfaces and support for the Web-based Distributed Authoring and Versioning (WebDAV) standard.

Two Versions

The STATISTICA Document Management System (SDMS) is available in an Enterprise Version, or in an Entry Level version (designed for smaller groups of users):

The Enterprise Version can be deployed in one of two ways, depending on whether the user needs to build the SDMS functionality into an existing database system:

SDMS can be configured as a stand-alone complete application driven by a high-performance general database engine based on Microsoft SQL Server.

SDMS can be integrated with an already existing database infrastructure or data warehouse. SDMS is compatible with industry standard database management systems such as Oracle, MS SQL Server, Sybase, Informix, and DB2.

The Entry Level Version is recommended for smaller installations (usually 5 to 10 simultaneous users, depending on the volume of their work). The Entry Level version does not include (or require) a high performance, scalable database engine, because it is based on a fixed database management component built into the product. This makes the Entry Level Version more cost effective, but it is still a fully functional, secure, and large capacity document management system. It can also be easily converted later, as your needs grow, into the fully scalable Enterprise Version described above.

How the STATISTICA Document Management System Works

To satisfy the diverse functionality and security requirements of various types of users, the STATISTICA Document Management System (SDMS) implements several options for managing documents:

  1. SDMS enables you to save documents to a secure repository database from within STATISTICA, WebSTATISTICA, or the stand-alone SDMS application. Its intuitive user interface allows you to easily perform all document management operations from any computer on your network, or even via the Internet.
  2. Most document types can be automatically maintained in both (a) the archival, non-editable “review-only” PDF format, with the appropriate electronic signatures, and (b) the editable “source” format that allows those with the appropriate access privileges to create new, modified versions of the document. None of the edits or changes, however, will ever overwrite the source file of the previous version; they will only add a new file to the repository.
  3. Strict security via electronic signatures (compliant with 21 CFR Part 11 and Sarbanes-Oxley Legislation requirements) is enforced. Different individuals or groups of users can be authorized to create, edit, or review documents in different parts of the archive.
  4. Documents in the archive cannot be deleted by end-users. Every time a document is edited, a new version is created and logged. The log will contain annotations to identify the time and the author of the modifications. SDMS can be configured to include other information in the log as well.
  5. The program is configured so that no information is ever discarded. Previous document versions, document histories, logs, etc. are all preserved.
  6. Documents can be locked to prohibit any further editing.
  7. Approval trail requirements can be established, so that documents must be reviewed, approved, and signed (via electronic signatures) by designated supervisors before they can be placed in designated parts of the repository.
  8. A complete audit trail of all document changes is automatically created. The audit trail can be printed, or saved in electronic form, and then submitted to regulatory bodies or agencies.
  9. To satisfy formatting requirements for electronic submission of records, various options are available for maintaining renditions in PDF and XPORT file formats (see FDA “Guidance for Industry: Providing Regulatory Submissions in Electronic Format – General Considerations”).
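
The "never overwrite, never delete" behavior in items 2, 4, 5, and 6 can be sketched as an append-only store. This is an illustrative model only; `Repository`, `Version`, and the document identifier are hypothetical names, and SDMS itself is backed by a database rather than in-memory lists.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict, List, Set


@dataclass(frozen=True)
class Version:
    """An immutable snapshot of a document (hypothetical model)."""
    number: int
    author: str
    saved_at: datetime
    content: bytes


class Repository:
    """Append-only store: edits add versions; nothing is overwritten."""

    def __init__(self) -> None:
        self._versions: Dict[str, List[Version]] = {}
        self._locked: Set[str] = set()

    def save(self, doc_id: str, author: str, content: bytes) -> Version:
        """Add a new version, logging the author and the time of the edit."""
        if doc_id in self._locked:
            raise PermissionError(f"{doc_id} is locked against further editing")
        history = self._versions.setdefault(doc_id, [])
        version = Version(len(history) + 1, author,
                          datetime.now(timezone.utc), content)
        history.append(version)
        return version

    def lock(self, doc_id: str) -> None:
        """Prohibit any further editing of this document."""
        self._locked.add(doc_id)

    def latest(self, doc_id: str) -> Version:
        return self._versions[doc_id][-1]

    def audit_trail(self, doc_id: str) -> List[str]:
        """One line per version, oldest first."""
        return [f"v{v.number} by {v.author} at {v.saved_at.isoformat()}"
                for v in self._versions[doc_id]]


repo = Repository()
repo.save("sop-001", "alice", b"draft")
repo.save("sop-001", "bob", b"revised")
repo.lock("sop-001")
```

The class deliberately has no delete or overwrite method, so earlier versions stay retrievable after every edit.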

Ensuring Security and Compliance

The STATISTICA Document Management System (SDMS) is not only a flexible, high-performance system that will increase your productivity by facilitating the management of crucial documents. SDMS also ensures compliance with the requirements of regulatory agencies, such as FDA 21 CFR Part 11, Sarbanes-Oxley Legislation, and ISO 9000.

Security, Electronic Signatures

  1. The STATISTICA Document Management System requires that passwords contain more than 6 characters and not be of a “common” type (e.g., “111111” is not allowed).
  2. Passwords can be configured by the administrator to expire, so that users are forced to change passwords at regularly scheduled intervals.
  3. The system applies automatic user lockout, and keeps a record for the administrators, after a certain number of attempts to log in with the wrong password.
  4. The STATISTICA Document Management System allows you to define users, and groups of users, with appropriate privileges. Types of privileges include the permission to create documents, edit documents, review documents, approve documents, and so on.
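
A rough sketch of the password rule and the lockout behavior in items 1 and 3 follows. The lockout threshold of 3 and the "single repeated character" test for a common password are assumptions, since the text does not give exact values, and `LoginGuard` is a hypothetical class. Passwords are compared in plain text only to keep the example short; a real system would store salted hashes.

```python
from collections import Counter

MIN_LENGTH = 7           # "more than 6" characters, per the rule above
MAX_FAILED_ATTEMPTS = 3  # assumed value; the text says only "a certain number"


def is_acceptable_password(password: str) -> bool:
    """Reject short passwords and ones made of a single repeated character."""
    if len(password) < MIN_LENGTH:
        return False
    if len(set(password)) == 1:  # e.g. "1111111"
        return False
    return True


class LoginGuard:
    """Counts failed logins per user and locks the account at the threshold."""

    def __init__(self, passwords: dict) -> None:
        self._passwords = passwords  # illustration only: plain text, not hashed
        self._failures = Counter()
        self.lockout_log = []        # record kept for administrators

    def is_locked(self, user: str) -> bool:
        return self._failures[user] >= MAX_FAILED_ATTEMPTS

    def attempt(self, user: str, password: str) -> bool:
        """Return True on a successful login; locked users always fail."""
        if self.is_locked(user):
            return False
        if self._passwords.get(user) == password:
            self._failures[user] = 0
            return True
        self._failures[user] += 1
        if self.is_locked(user):
            self.lockout_log.append(user)
        return False
```

Once a user is locked out, even the correct password is refused, which reflects the lockout described in item 3.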

Version Control and Audit Trails

  1. In the STATISTICA Document Management System, everything is documented and traceable. For example, documents are never deleted. When a document is edited, then a new version of that document is created, properly authenticated, and annotated with electronic signatures. Authorized and authenticated users can be required to explicitly check out the respective documents from the repository, and check the new versions into the repository with notes and documentation regarding the nature and purpose of the edits.
  2. When a document is checked in, the program can be configured to perform various verification and documentation operations. For example, it may require the user to complete a check-list stating the purpose of the edits, or a brief summary of the edits. The system is fully customizable during installation, so that annotations, signatures, or other requirements associated with the creation or editing of documents can be enforced.
  3. Summarization options allow authorized users to review the complete audit trail for requested documents.
  4. To help ensure compliance with regulatory requirements, different versions of documents will persist indefinitely and cannot be deleted by end users.
  5. Options are available to perform simple or complex searches of the documents, and their various versions.
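
The check-out and check-in cycle in items 1 and 2 might look like the sketch below. The class and field names are hypothetical, and the only requirement modeled is a non-empty note on check-in; an actual installation can be configured with different checks.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class AuditEntry:
    action: str  # "check-out" or "check-in"
    user: str
    note: str = ""


@dataclass
class ControlledDocument:
    """A document that must be checked out before it can be edited."""
    name: str
    versions: List[str] = field(default_factory=list)
    checked_out_by: Optional[str] = None
    audit: List[AuditEntry] = field(default_factory=list)

    def check_out(self, user: str) -> str:
        """Reserve the document for one user and return the latest version."""
        if self.checked_out_by is not None:
            raise RuntimeError(
                f"{self.name} is already checked out by {self.checked_out_by}")
        self.checked_out_by = user
        self.audit.append(AuditEntry("check-out", user))
        return self.versions[-1]

    def check_in(self, user: str, new_text: str, note: str) -> int:
        """Store a new version; a note describing the edits is mandatory."""
        if self.checked_out_by != user:
            raise PermissionError(f"{user} has not checked out {self.name}")
        if not note.strip():
            raise ValueError("a note describing the edits is required")
        self.versions.append(new_text)  # earlier versions are kept
        self.checked_out_by = None
        self.audit.append(AuditEntry("check-in", user, note))
        return len(self.versions)


doc = ControlledDocument("Validation plan", versions=["v1 text"])
doc.check_out("alice")
doc.check_in("alice", "v2 text", "Updated acceptance criteria")
```

Rejecting a check-in without a note is one simple way to enforce the annotation requirement at the point where the new version is created.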

Recommended (and FDA Approved) Archival Document Types

One of the unique strengths of the STATISTICA Document Management System is its ability to store and exchange information in almost any electronic file format, including your proprietary formats. This allows you to share information internally in the ways that are most convenient for your organization. It also makes it possible to share documents externally by using practically all industry standard formats and protocols.

In particular, SDMS allows you to save data and reports as PDF files or XPORT files. These formats are the preferred file formats that are recommended in the FDA “Guidance for Industry: Providing Regulatory Submissions in Electronic Format – General Considerations.”

Open Architecture

Like the entire STATISTICA system, the STATISTICA Document Management System is highly configurable, and its functionality is very compatible with other applications. So the system can be customized to accommodate your specific tasks, and can be integrated seamlessly into existing systems for data and document management.

PAW Recap: Privacy, Big Data, and Standing O

We just wrapped up our exhibit and presentations at Predictive Analytics World in Boston last week. Just a few notes about this event…

The StatSoft presentation, “Addressing Privacy Concerns: Critical Features for Predictive Analytics Platforms,” highlighted the role of model and data governance, a subject often neglected in predictive modeling discussions despite its importance as a driver of software requirements.

StatSoft’s VP of Analytic Solutions, Dr. Thomas Hill, prepared the content in light of recent media coverage of invasion-of-privacy concerns stemming from exhaustive and effective data mining. Taking the stance that enterprise analytics platforms must support features allowing the implementation of security policies and rules, Hill provided an overview of STATISTICA Decisioning Platform®’s key features that have made it a favorite in highly regulated industries.

Senior Statistician Dr. Gary Miner took part in the invitation-only “Big Data Expert Panel,” moderated by PAW Founder Eric Siegel. And Carleton Jones, StatSoft’s Director of Financial Services, was a big hit with the audience and received an unprecedented standing ovation after his brief presentation.

See our short PAW-Boston photo gallery on Facebook.

How to Navigate the STATISTICA Workspace

Overview

Analysis projects can be carried out in a variety of ways with STATISTICA. For Enterprise and Data Miner users, one option is the Workspace, which provides a visual way of performing your analysis tasks that additionally makes the process repeatable. The Workspace shows a symbolic representation of the flow from the input data through any data preparation and cleaning steps to exploratory and analysis tools. The flow continues to the output reports.
Why Use the Workspace
  • Visual – The project is laid out visually to show the workflow from input data to the results.
  • Repeatable – Run the Workspace multiple times as data update or even on new data sets.
  • Reproducible – Project steps are laid out visually and can be explored to see exactly what was done to obtain the results.
  • Flexible – The same analysis options are available in the Workspace that you have in the original interactive analyses.
  • Customizable – Custom nodes can be created for the Workspace and shared with colleagues.
Recent Improvements
The Workspace became easier to use and much more flexible in the most recent versions of STATISTICA, which offer a new type of node in addition to the scripted nodes offered before. Some nodes are still in Beta version. New nodes, Beta nodes, and the previous style of scripted nodes are all available to you and can be used together in a single Workspace project.
Workspace nodes can be accessed not only from the Node Browser, but also from the menu structure you are already accustomed to using. When a Workspace is active, the Statistics, Data Mining, Graphs, and Data tabs change to the Workspace node tabs. Selecting an analysis or tool from the tabs will add the appropriate node to the Workspace. Visually, STATISTICA indicates this by highlighting the tabs as seen here.
[Image: STATISTICA Workspace menu]
The group of nodes available is governed by the selected configuration from the drop-down menu found at the upper-right corner of the Workspace. The configuration can be changed at any time, and both the available nodes on the tabs and in the Node Browser are updated accordingly.
Output management within the Workspace is more flexible. On the Workspace tab of the global Options dialog box, options are available to route output. By default, all results will be sent to one workbook node to organize all output together.
[Image: STATISTICA Workspace showing global options for output]
Scripted Nodes
The scripted nodes have always been part of the Workspace and are still available for use in all Workspace projects. These nodes are indicated in Version 12 with SVB at the top of the node as seen below. If you are familiar with the Workspace from past versions of STATISTICA, these nodes still work the same way. Variable selection is performed in the input data node.
STATISTICA workspace showing scripted SVB node
Working With New and Beta Nodes 
The nodes that are new with Version 12 (as well as those still in Beta) work differently from scripted nodes. They are more flexible and have more options.
Selecting variables. Each new node for data preparation, analysis, and graphing contains options for variable selection, case selection conditions, weighting, etc., all available in the node dialog box. In scripted nodes, variable selection is performed at the node prior to the analysis node. The new functionality gives you greater flexibility within each individual analysis to easily control the settings. Different variable selections, weights, selection conditions, etc., can be used for each analysis. Alternatively, the Select Variables node accessed from the Data tab in the Variables group offers variable selection one time where downstream analyses will inherit the same variable selection. The option that works best for your analysis is easily available to you.
The Select Variables node may be necessary when using a combination of new nodes and scripted nodes in one Workspace. In the example below, the Data Health Check node output is used as input for the scripted regression node, Best-Subset and Stepwise Regression. To use this new node output as input, variable selections must be made before the scripted node as shown below.
STATISTICA workspace showing mix of variable and scripted nodes
Selecting options and output. The available options for these new nodes are the same as those you are accustomed to in the original interactive analyses. The new node dialog boxes have a similar appearance and are easier to navigate than the previous scripted nodes. Both the setup for the analysis and the results options are located in one dialog box. Use the tree pane on the left to navigate through the tabs, and make all desired selections before running the node. Check each result item you want included in the output. Some nodes also offer spreadsheets for downstream analysis that can be selected and customized.
STATISTICA workspace new node options
Helpful Hints
  • Highlighting the input data node to be used for input before selecting the analysis node will automatically connect the input data to the node.
  • Options such as Run to node and Run modified nodes make it possible for you to execute only portions of the Workspace at a time.
  • Right-click on a connection and select Disable to temporarily skip that connection and everything downstream from it.
  • The Workspace node can have up to five icons on it to perform actions such as:
    • View the reporting documents (upper-right)
    • Show the node is available for a new connection for downstream analysis (center-right)
    • View the output spreadsheet for downstream analyses (lower-right)
    • Run the Workspace (lower-left)
    • Edit parameters (upper-left)
If You Need to Reset Your Node Browser
When running STATISTICA 12, if you don’t see the same node configurations as in this article, you likely need to restore the default settings of the Node Browser.
STATISTICA saves option changes and customizations to your software and, if you so choose, transfers them when you upgrade. If you customized the Workspace Node Browser in a previous version of the software, STATISTICA will not automatically override those customizations, so your former configurations, which don’t include Beta Procedures, will be used. To add the Beta Procedures:
  1. On the Workspace toolbar, click Node Browser.
  2. On the Node Browser toolbar, click Options.
  3. In the Browser Options dialog box, click the Restore Defaults button to remove the customizations and show the Beta Procedures and other standard lists for Version 12.
STATISTICA workspace restore node browser

Limited role for big data seen in developing predictive models

Source: http://searchbusinessanalytics.techtarget.com

Many analytics professionals have high hopes for big data, but speakers at the Predictive Analytics World conference struck a decidedly cautious tone when discussing the concept as it relates to building predictive models.

“To me, big data is just a hot-flash term, but it’s nothing new to us,” said Gary Miner, senior statistician and data-mining consultant at StatSoft.

There is still disagreement around what the term big data actually means. The most common definitions talk about high data volume, velocity and variety. But there is no agreed-upon volume at which a data set qualifies as “big.” Miner said some people think several terabytes of data qualifies as big, while others say it takes hundreds of terabytes.

Either way, he feels the importance of big data has been overblown. He said it is possible to find some really telling correlations in rather small data sets. For example, he talked about how some medical breakthroughs have come out of trials involving fewer than 100 patients. This is because smaller, more refined data sets often make it easier to single out the trend in the noise.

The fact that storage space is getting cheaper has led many in the analytics world to ponder the possibilities that may come from analyzing whole data sets, but Miner said you typically get better results more quickly by using randomized samples from data sets.

“If you’re going to make sense of data you need to sort through the noise, and you’re going to end up with a smaller data set,” Miner said.
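Miner’s point about randomized samples can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the synthetic population, the sample size of 2,000, and the helper name sample_vs_full are hypothetical and do not come from the article or from any StatSoft product. It simply compares the mean of a full data set with the mean of a simple random sample drawn from it.

```python
import random
import statistics


def sample_vs_full(population, sample_size, seed=0):
    """Return (mean of the full data set, mean of a simple random sample)."""
    rng = random.Random(seed)
    # Draw sample_size distinct records without replacement.
    sample = rng.sample(population, sample_size)
    return statistics.mean(population), statistics.mean(sample)


# Synthetic stand-in for a "whole" data set (assumed values, not real data).
data_rng = random.Random(42)
population = [data_rng.gauss(100.0, 15.0) for _ in range(200_000)]

full_mean, sample_mean = sample_vs_full(population, sample_size=2_000)
print(f"full data set mean: {full_mean:.2f}")
print(f"1% sample mean:     {sample_mean:.2f}")
```

Under these assumptions, the sample uses 1 percent of the records yet typically lands close to the full-data mean. How close depends on the spread of the data and the sample size, so this is a sketch of the idea rather than a guarantee for any particular data set.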

Michael Berry, analytics director at TripAdvisor for Business, said the current interest in big data comes from a desire on the part of businesses to implement a single piece of technology that solves multiple problems. He said vendors have been glad to play into this desire, promising that their big data software will greatly simplify business analytics projects. But he said this drive for an easy, simple solution is mostly a fantasy.

“While it’s never been true, it makes a good sales pitch,” he said.

Instead of hoping that big data software will solve every analytics problem, Berry recommended working to improve predictive models. The variables that define a predictive model ultimately matter more than the amount of data fed into the model.

And adding more data may simply increase the time it takes to reach new insights, Berry said. When analyzing data sets, patterns often reveal themselves quickly. If a pattern becomes apparent after analyzing 100 data points, there is no need to continue analyzing 100,000 more. The pattern will still be there; all you will have done is lengthen the project. Past that point, more data yields diminishing returns.

But not everyone was quite so bearish about big data. Peter Amstutz, analytics strategist at advertising agency Carmichael Lynch, said it is important, when developing predictive models, to collect data containing as many variables as possible. Sometimes it may be possible to accumulate information on a broad set of variables from a single source of standardized records, but often an organization will need to collect large amounts of less structured data. This is where the idea of big data can be helpful.

Amstutz recently helped Subaru implement an uplift modeling project that allows the car manufacturer to target its ad buys more effectively. Amstutz said he is always looking for new data sources that might contain information on consumer attributes that are relevant to building the profile of a consumer who may be receptive to Subaru’s advertising. By looking at a greater number of variables, the advertising agency can precisely pinpoint the type of consumer who is likely to buy a Subaru.

It’s not so much the amount of data that’s important as it is the quality of the data. Eric Feinberg, senior director of mobile, media and entertainment at analytics vendor ForeSee, said large volumes of data are generally only helpful if they are standardized and accurate.

He added that the benefits of big data analytics vary greatly by industry. In studying sales trends, outliers that become apparent by studying full data sets may just add noise to the model, making it hard to find the true trend. But Feinberg pointed out that the outliers are exactly what analysts are looking for in fraud detection. So sales forecasting may work fine when using small samples, while fraud prevention efforts can benefit from big data analytics.
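Feinberg’s contrast between the two uses of outliers can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the transaction amounts are made up, the helper split_outliers is hypothetical, and the modified z-score rule (median and median absolute deviation, with the commonly used 3.5 cutoff) is a standard technique chosen here for convenience, not a method named in the article.

```python
import statistics


def split_outliers(values, threshold=3.5):
    """Split values into (typical, outliers) using the modified z-score.

    The score is 0.6745 * |x - median| / MAD, where MAD is the median
    absolute deviation. The 3.5 cutoff is a common rule of thumb.
    """
    med = statistics.median(values)
    mad = statistics.median(abs(v - med) for v in values)
    if mad == 0:
        # No spread around the median: nothing can be scored, flag nothing.
        return list(values), []
    typical, outliers = [], []
    for v in values:
        score = 0.6745 * abs(v - med) / mad
        (outliers if score > threshold else typical).append(v)
    return typical, outliers


# Made-up transaction amounts (assumed data for illustration only).
amounts = [42, 38, 45, 40, 41, 39, 44, 43, 37, 950, 40, 42, 1200]
typical, outliers = split_outliers(amounts)

# Sales forecasting: estimate the trend from the typical values.
print("typical mean:", statistics.mean(typical))
# Fraud detection: the flagged values are the ones worth reviewing.
print("flagged for review:", outliers)
```

The same split serves both purposes: a sales forecast would keep the typical values and set the flagged ones aside as noise, while a fraud analyst would do the opposite and start with the flagged list.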

On the other hand, more traditional methods may work even better. Feinberg used the example of a medical device company that wants to build a better profile of its cardiologist customers. It could gather a large data set to find characteristics of likely buyers. Or it could simply pay cardiologists to participate in a focus group.

“That, in many cases, does the same thing,” Feinberg said. “It’s harder, it takes more time, but the outcome is a mature data set.”

Ed Burns is site editor of SearchBusinessAnalytics. Email him at eburns@techtarget.com and follow him on Twitter: @EdBurnsTT.