Monthly Archives: November 2013
by Win Noren
Recently, I have heard people express concern about the data security of the US government’s new health care portal. Certainly, it is reasonable to be concerned about the security of this information, as the impact of a breach would be huge. The cost of identity theft to the individual whose identity is stolen cannot be measured in money alone, as the frustration and time spent restoring that identity are not trivial.
While you can attend any number of conferences about big data and the benefits that companies can reap from this data, it is much rarer to hear anyone addressing the privacy concerns surrounding the use of big data. Of course, this doesn’t just apply to data that has been provided directly to a business through the transactions that you execute with that business, but it also applies to the data that we as individuals make public through our use of social media and various smart devices.
Rather than simply being told how much our own social media posts reveal, watch this “social media experiment” by Jack Vale, a “man who pranks people for a living.”
Microsoft principal researcher Kate Crawford is warning that data mining of personal data will create a problem of digital discrimination that will be so subtle that one won’t even know that she has been discriminated against. Let’s say that a bank does not want to lend to a certain segment of the population. They could simply analyze customer behavior to determine where to advertise so that they do not even promote themselves to this segment of the population. Crawford states, “It’s not that big data is effectively discriminating — it is, we know that it is. It’s that you will never actually know what those discriminations are.”
So, what do you think? What type of mechanisms do we need to protect ourselves from Big Data?
Monika Nielsen, co-manager of StatSoft (Europe) GmbH in Germany, recently wrote an overview of STATISTICA Decisioning Platform® that pays special attention to the value of our Rules Builder. Her article was published in IT Director magazine.
Offering Decisioning Platform as “a user-friendly and fully automated system architecture that can be adapted to the requirements of different industries,” Nielsen notes that its server-based structure enables users to handle “even complex models [that] lead to immediate action.” Of course, Rules Builder makes it possible for “individual cases [to] be considered, evaluated, and classified, taking into account both fixed-defined rules and data-driven insights.”
See complete, original article in German here.
SOURCE: IT Director. PRÄDIKTIVE ANALYTIK IN GESCHÄFTSABLÄUFE INTEGRIERT. Monika Dielsen. September 28, 2012. Excerpts and image retrieved November 30, 2012, from
Technically speaking, STATISTICA Workbooks are optimized ActiveX containers that can efficiently handle large numbers of documents. The documents can be organized into hierarchies of folders or document nodes (by default, one is created for each new analysis) using a tree view, in which individual documents, folders, or entire branches of the tree can be flexibly managed.
For example, selections of documents can be extracted (e.g., drag-copied or drag-moved) to the report window or to the application workspace (i.e., the STATISTICA application “background” where they are displayed in stand-alone windows). Entire branches can be placed into other workbooks in a variety of ways in order to build a specific folder organization, etc.
Each workbook contains two panels: an Explorer-style navigation tree on the left and a document viewer on the right. The navigation tree (workbook tree) can be split into various nodes that are used to organize files in logical groupings (e.g., all analysis outputs or all macros created for a project). Tabs at the bottom of the document viewer (workbook viewer) are used to easily navigate the children of the currently selected node. You can easily move the tabs to the top, right, or left of the workbook viewer by right-clicking on one of the tabs and selecting a different location from the shortcut menu. One advantage of the side placement of tabs is that multiple rows (rather than one long row) are provided (as shown below). This makes it easy to select the appropriate tab.
The tab display can also be suppressed to save space. Unlike many Explorer-style navigation and organization applications that only allow folders to have children, the STATISTICA Workbook allows any item in the tree to have children. For example, you can add a spreadsheet to your workbook, and then add all the graphs produced using the data in the spreadsheet as children to the spreadsheet. A variety of drag-and-drop features and Clipboard procedures are available to aid you in organizing the workbook tree.
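The idea that any item, not only a folder, can have children can be sketched as a small tree structure. The following is a minimal illustration in Python, not the STATISTICA object model: the class name `WorkbookItem`, its fields, and the example document names are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class WorkbookItem:
    """One node in a workbook-style tree (hypothetical model, not the STATISTICA API)."""
    name: str
    kind: str  # e.g. "folder", "spreadsheet", "graph", "report", "macro"
    children: List["WorkbookItem"] = field(default_factory=list)

    def add(self, child: "WorkbookItem") -> "WorkbookItem":
        """Attach a child to this item and return the child for chaining."""
        self.children.append(child)
        return child

    def walk(self, depth: int = 0) -> Iterator[tuple]:
        """Yield (depth, item) pairs in tree-view order (parents before children)."""
        yield depth, self
        for child in self.children:
            yield from child.walk(depth + 1)


def build_example() -> WorkbookItem:
    """Build a tiny tree in which a spreadsheet owns the graphs made from it."""
    root = WorkbookItem("Workbook1", "folder")
    sheet = root.add(WorkbookItem("Sales data", "spreadsheet"))
    # A spreadsheet, not just a folder, can own children.
    sheet.add(WorkbookItem("Histogram of Sales", "graph"))
    sheet.add(WorkbookItem("Scatterplot of Sales by Month", "graph"))
    return root


if __name__ == "__main__":
    # Print the tree with indentation that mirrors the nesting level.
    for depth, item in build_example().walk():
        print("  " * depth + f"{item.name} [{item.kind}]")
```

Because every node carries its own list of children, the same class serves for folders and documents alike, which is what lets graphs hang directly under the spreadsheet that produced them.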
The workbook can hold all native STATISTICA documents including spreadsheets, graphs, reports, and macros. It can handle other types of ActiveX documents as well, including Excel spreadsheets, Word documents, and others. If you want to edit these documents, you can do so using the workbook viewer pane. To edit a Microsoft Word document, double-click on the object in the workbook tree. The Word document opens in the viewer, and the workbook menu bar merges with the Microsoft Word menu bar giving you access to all of the editing features you need. Workbooks can also be used to store all output from a particular analysis.
Navigating the Workbook Tree
The workbook tree displays the organization of files and folders in the workbook. The files and folders are displayed in an Explorer-style format. Items with plus signs next to them indicate folders or files that have children associated with them. To expand the tree for a particular folder or file, click the plus sign next to it. The workbook can support an unlimited number of levels, and both individual items from the tree view and entire branches can be flexibly (interactively) managed (e.g., right-click dragging to copy or move between workbooks or reports).
To select a workbook item for review or editing, simply locate the file in the workbook tree and double-click on its associated icon. The document will then open in the workbook viewer pane. Note that you can also navigate through the children of the currently selected node using the navigation tabs available (by default) at the bottom of the workbook viewer. As mentioned previously, you can easily move these navigation tabs to the top, right, or left of the workbook viewer by right-clicking on one of the tabs and selecting a different location from the shortcut menu or selecting the appropriate command from the Workbook – Tab Control submenu. Note that tabs at the top and bottom of the viewer scroll sideways, while multiple rows of tabs are used when tabs are placed to the left or right of the viewer. Items in the tree are identified by the icon next to them. The folder icon represents a folder that can contain a variety of documents and subfolders. The spreadsheet, report, macro, and graph icons represent STATISTICA Spreadsheet, Report, Macro, and Graph documents, respectively.
All non-STATISTICA documents are represented by their respective document icons. For example, Word documents are represented by the Word icon, and Excel spreadsheet files are represented by the Excel spreadsheet icon.
The workbook tree can be organized and modified using drag-and-drop features as well as Clipboard procedures. More information about Workbook Drag-and-Drop Features and Workbook Clipboard Features can be found in STATISTICA Help. Commands for inserting, extracting, renaming, and removing items from the workbook tree are available from the workbook tree shortcut menu (accessed by right-clicking anywhere in the tree). These commands are also accessible from the Workbook menu.
Spreadsheets (Multimedia Tables)
STATISTICA Spreadsheets are based on StatSoft’s proprietary multimedia table technology and are used to manage both input data and the numeric or text (and optionally any other type of) output. The basic form of the spreadsheet is a simple two-dimensional table that can handle a practically unlimited number of cases (rows) and variables (columns), and each cell can contain a virtually unlimited number of characters. Sound, video, graphs, animations, reports with embedded objects, or any ActiveX compatible documents can also be attached.
Because STATISTICA Spreadsheets can also contain macros and any user-defined user interface, these multimedia tables can be used as a framework for custom applications (e.g., with a list box of options or a series of buttons placed in the upper-left corner), self-running presentations, animations, simulations, etc.
Data file layout in spreadsheets. STATISTICA data are organized into cases and variables. If you are unfamiliar with this notation, you can think of cases as the equivalent of records in a database management program (or rows of a spreadsheet), and variables as the equivalent of fields (or columns of a spreadsheet). Each case consists of a set of values of variables, and the first column in the file can (optionally) contain names of cases.
The spreadsheet window comprises several basic components, as seen in this illustration.
Data (and in-cell formatting options). The remainder of the spreadsheet contains data that pertain to the cases and variables and any optional attached or linked objects (multimedia objects, macros, custom user interface).
Text in cells can be of practically unlimited length (in most STATISTICA configurations, it is limited to 1,000 characters to protect against inadvertent pasting of unwanted large amounts of data into one cell). Text in cells can be extensively formatted including different fonts and font attributes.
Reports in STATISTICA offer a more traditional way of handling output (compared to workbooks) as each object (e.g., a STATISTICA Spreadsheet or Graph, or a Microsoft Excel spreadsheet) is displayed sequentially in a word processor style document.
However, the technology behind this simple report offers you rich functionality. For example, like the workbook, each STATISTICA Report is also an ActiveX container where each of its objects (not only STATISTICA Spreadsheets and Graphs, but also any other ActiveX-compatible documents, e.g., Microsoft Word documents, Excel files and graphics files) is active, customizable, and in-place editable. Reports are stored in the STR file format, which is a StatSoft extension of the Microsoft RTF (Rich Text Format, *.rtf) format. STR files share the RTF formatting information, and additionally they include the tree view information (which cannot be stored in the standard RTF files). Hence, report files are by default saved with the file name extension *.str, but they can also be saved as standard RTF files (in which case the tree information will not be preserved).
The obvious advantages of this way of handling output (more traditional than the workbook) are the ability to insert notes and comments “in between” the objects as well as its support for the more traditional way of quickly scrolling through and reviewing the output to which some users may be accustomed. (Note that the editor supports variable speed scrolling.)
The obvious drawback, however, of these traditional reports is the inherent flat structure imposed by their word processor style format, though that is what some users of certain applications may favor.
The report tree can be organized and modified using drag-and-drop features as well as Clipboard procedures. Commands for inserting, extracting, renaming, and removing items from the report tree are available from the report tree shortcut menu (accessed by right-clicking anywhere in the tree, as shown in the image above).
Graph documents represent another distinctive type of STATISTICA document, and they offer rich functionality both in terms of the variety of ways in which graphs can be created in STATISTICA and in the selection of graph customization tools.
Similar to the other STATISTICA documents, graphs are ActiveX containers, which means that they can contain a variety of compatible documents (e.g., Visio drawings, Adobe illustrations, Excel spreadsheets, etc.). STATISTICA Graphs are also ActiveX objects and, therefore, can be linked to or embedded into other compatible documents (e.g., Word Documents) where they can be in-place edited by simply double-clicking on them.
Macros (STATISTICA Visual Basic Programs)
The industry standard STATISTICA Visual Basic language (integrated into STATISTICA) offers another (alternative) user interface to the functionality of STATISTICA, and it offers incomparably more than just a “supplementary application programming language” that can be used to write custom extensions. STATISTICA Visual Basic takes full advantage of the object model architecture of STATISTICA and is used to access programmatically every aspect and virtually every detail of the functionality of STATISTICA. Even the most complex analyses and graphs can be recorded into Visual Basic macros and later be run repeatedly or edited and used as building blocks of other applications. STATISTICA Visual Basic adds an arsenal of more than 13,000 new functions to the standard comprehensive syntax of Microsoft Visual Basic, thus comprising one of the largest and richest development environments available.
STATISTICA Macros can be saved in several formats, depending on how you intend to use them. You can also copy them to the Clipboard and paste them into other programs as documents.
StatSoft Poland recently concluded its popular, annual series of data mining / data analysis seminars during October. Drawing roughly 650 attendees, this year’s presentations centered on improvement of production processes, covering applications in manufacturing and scientific research, as well as specific use cases by StatSoft customers and demonstrations of STATISTICA’s broad data mining capabilities.
Attendees learned of the collaboration between StatSoft and AMCOR Flexibles Reflex to develop an integrated SPC quality system with STATISTICA. An industrial packaging company, AMCOR’s complex production processes consist of several stages, so AMCOR required a monitoring platform that could meet specs for security, performance, and quality while offering easy access to information that could produce thorough reports on any production aspect necessary. AMCOR’s Justin Sikorska described how his company found that STATISTICA can readily deliver real-time process monitoring; produce summary tables and graphs based on past data; issue alerts for adverse events; execute SPC monitoring to measure regulation and stability; anticipate failure through predictive maintenance models; and track raw product components through all manufacturing stages (i.e. product traceability). Sikorska also described AMCOR’s pleasure with STATISTICA’s “clear and user-friendly environment.”
Also featured was Luke Depczyński of Saint-Gobain ADFORS, an industrial fabric and construction reinforcement company. ADFORS’ manufacturing plant, located in Gorlitz, requires continuous monitoring of changes in process parameters and final product properties due to high production volume and the specific, individualized requirements of its multiple customers. Mr. Depczyński described how Saint-Gobain ADFORS selected StatSoft to develop a suitable quality control system with STATISTICA and presented an overview of their successful integration.
StatSoft Poland’s seminars were conducted in Warsaw, October 22-24, 2013, at Hotel Gromada Warszawa Airport.
The STATISTICA Document Management System (SDMS) is a complete, highly scalable, database solution package for managing electronic documents.
The product enables you to quickly, efficiently, and securely manage documents of any type (e.g., find them, access them, search for content, review, organize, edit [with trail logging and versioning], approve, etc.).
The key features include:
- Extremely transparent and easy to use
- Flexible, customizable (can be optionally configured for Web-enabled access) user interface
- Electronic Signatures
- Comprehensive Audit Trails, Approvals
- Optimized Searches
- Satisfy the FDA 21 CFR Part 11 Requirements
- Satisfy the Sarbanes-Oxley Legislation Requirements
- Satisfy ISO 9000 (9001, 14001) Documentation Requirements
- Unlimited scalability (from desktop or network Client-Server versions, to the ultimate size, Web-based worldwide systems)
- Open Architecture and Compatibility with Industry Standards
The STATISTICA Document Management System (SDMS) complies with the following:
The general requirements put forth in the Code of Federal Regulations (CFR) Title 21 Part 11 specify what a business needs to do in order to maintain electronic records acceptable for submission to the FDA (Food and Drug Administration).
Sarbanes-Oxley Legislation imposes new, extensive reporting and record keeping requirements on all publicly-traded companies and mandates that executives of those companies take personal responsibility for the procedures of collecting data for the company’s financial reports and for the integrity of their contents. In order to comply with the requirements, companies need flexible software systems that facilitate record keeping and document management in a secure and efficient manner.
Guidelines for manufacturing in general (often collectively known as ISO 9000 standards) have been published by the International Organization for Standardization (e.g., see ISO 9001 4.5: Document and data control; also ISO 14001, Ch. 4.5.5.).
Integrates with all STATISTICA products
STATISTICA Document Management System (SDMS) seamlessly integrates with all STATISTICA products, from Base and Advanced to enterprise-wide installations such as STATISTICA Enterprise or STATISTICA Enterprise/QC for process analysis and quality control/improvement.
You can easily access all SDMS functionality from within your STATISTICA projects (e.g., all analysis projects, data mining, text mining, reporting, etc.). So directing your reports or data sets to the secure repository of SDMS is as easy as simply saving a file, because your authentication can be based on your initial log-in into STATISTICA. No entry of additional passwords is necessary.
You can also build the functionality of SDMS into your shortcuts, automated STATISTICA applications, and other custom systems to simplify your work and enhance productivity.
Stand-alone, highly compatible application
SDMS can be used as a stand-alone system. But since SDMS uses COM and SOAP-based architecture, and is compatible with the Microsoft WebService interface, it can also be called from other applications, integrated into existing systems, or expanded by adding custom functionality.
Compatibility with other standards
Please also inquire about the compatibility of STATISTICA Document Management System (SDMS) with the Open Document Management API (ODMA) standard, and the interfaces and support for the Web-based Distributed Authoring and Versioning (WebDAV) standard.
The STATISTICA Document Management System (SDMS) is available in an Enterprise Version, or in an Entry Level version (designed for smaller groups of users):
The Enterprise Version can be deployed in one of two ways, depending on whether the user needs to build the SDMS functionality into an existing database system:
SDMS can be configured as a stand-alone complete application driven by a high-performance general database engine based on Microsoft SQL Server.
SDMS can be integrated with an already existing database infrastructure or data warehouse. SDMS is compatible with industry standard database management systems such as Oracle, MS SQL Server, Sybase, Informix, and DB2.
The Entry Level Version is recommended for smaller installations (usually 5 to 10 simultaneous users, depending on the volume of their work). The Entry Level version does not include (or require) a high performance, scalable database engine, because it is based on a fixed database management component built into the product. This makes the Entry Level Version more cost effective, but it is still a fully functional, secure, and large capacity document management system. It can also be easily converted later, as your needs grow, into the fully scalable Enterprise Version described above.
To satisfy the diverse functionality and security requirements of various types of users, the STATISTICA Document Management System (SDMS) implements several options for managing documents:
- SDMS enables you to save documents to a secure repository database from within STATISTICA, WebSTATISTICA, or the stand-alone SDMS application. Its intuitive user interface allows you to easily perform all document management operations from any computer on your network, or even via the Internet.
- Most document types can be automatically maintained in both (a) the archival, non-editable “review-only” PDF format, with the appropriate electronic signatures, and (b) the editable “source” format that allows those with the appropriate access privileges to create new, modified versions of the document. None of the edits or changes, however, will ever overwrite the source file of the previous version–they will only add a new file to the repository.
- Strict security via electronic signatures (compliant with 21 CFR Part 11 and Sarbanes-Oxley Legislation requirements) is enforced. Different individuals or groups of users can be authorized to create, edit, or review documents in different parts of the archive.
- Documents in the archive cannot be deleted by end-users. Every time a document is edited, a new version is created and logged. The log will contain annotations to identify the time and the author of the modifications. SDMS can be configured to include other information in the log as well.
- The program is configured so that no information is ever discarded. Previous document versions, document histories, logs, etc. are all preserved.
- Documents can be locked to prohibit any further editing.
- Approval trail requirements can be established, so that documents must be reviewed, approved, and signed (via electronic signatures) by designated supervisors before they can be placed in designated parts of the repository.
- A complete audit trail of all document changes is automatically created. The audit trail can be printed, or saved in electronic form, and then submitted to regulatory bodies or agencies.
- To satisfy formatting requirements for electronic submission of records, various options are available for maintaining renditions in PDF and XPORT file formats (see FDA “Guidance for Industry: Providing Regulatory Submissions in Electronic Format – General Considerations”).
The STATISTICA Document Management System (SDMS) is not only a flexible, high-performance system that will increase your productivity by facilitating the management of crucial documents. SDMS also ensures compliance with the requirements of regulatory agencies, such as FDA 21 CFR Part 11, Sarbanes-Oxley Legislation, and ISO 9000.
Security, Electronic Signatures
- The STATISTICA Document Management System requires that passwords contain more than 6 characters and not be of a “common” type, e.g., “111111” is not allowed.
- Passwords can be configured by the administrator to expire, so that users are forced to change passwords at regularly scheduled intervals.
- The system applies automatic user lockout and keeps a record for the administrators when a set number of attempts have been made to log into the system with the wrong password.
- The STATISTICA Document Management System allows you to define users, and groups of users, with appropriate privileges. Types of privileges include the permission to create documents, edit documents, review documents, approve documents, and so on.
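The password and lockout rules in this list can be sketched in a few lines of Python. This is an illustration only, not how SDMS implements them: the names `is_acceptable_password` and `LoginGuard`, the lockout threshold of 3, and the small list of common passwords are assumptions, since the text says only "more than 6" characters and "a certain number" of failed attempts.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Assumed values: the text gives only "more than 6" characters and
# "a certain number" of failed attempts; everything else is illustrative.
MIN_LENGTH_EXCLUSIVE = 6
MAX_FAILED_ATTEMPTS = 3
COMMON_PASSWORDS = {"password", "12345678", "qwertyuiop"}


def is_acceptable_password(candidate: str) -> bool:
    """Reject passwords that are too short, a single repeated character, or common."""
    if len(candidate) <= MIN_LENGTH_EXCLUSIVE:
        return False
    if len(set(candidate)) == 1:  # e.g. "1111111"
        return False
    return candidate.lower() not in COMMON_PASSWORDS


@dataclass
class LoginGuard:
    """Counts failed logins, locks users out, and keeps a record for administrators."""
    # Plain-text passwords keep the sketch short; a real system would store salted hashes.
    passwords: Dict[str, str]
    failures: Dict[str, int] = field(default_factory=dict)
    lockout_log: List[str] = field(default_factory=list)

    def is_locked(self, user: str) -> bool:
        return self.failures.get(user, 0) >= MAX_FAILED_ATTEMPTS

    def login(self, user: str, password: str) -> bool:
        if self.is_locked(user):
            return False  # locked accounts are refused even with the right password
        if self.passwords.get(user) == password:
            self.failures[user] = 0
            return True
        self.failures[user] = self.failures.get(user, 0) + 1
        if self.is_locked(user):
            self.lockout_log.append(user)  # record kept for the administrators
        return False
```

Password expiry and group privileges are left out to keep the sketch short; both would be additional checks layered on top of `login`.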
Version Control and Audit Trails
- In the STATISTICA Document Management System, everything is documented and traceable. For example, documents are never deleted. When a document is edited, then a new version of that document is created, properly authenticated, and annotated with electronic signatures. Authorized and authenticated users can be required to explicitly check out the respective documents from the repository, and check the new versions into the repository with notes and documentation regarding the nature and purpose of the edits.
- When a document is checked in, the program can be configured to perform various verification and documentation operations. For example, it may require the user to complete a check-list stating the purpose of the edits, or a brief summary of the edits. The system is fully customizable during installation, so that annotations, signatures, or other requirements associated with the creation or editing of documents can be enforced.
- Summarization options allow authorized users to review the complete audit trail for requested documents.
- To help ensure compliance with regulatory requirements, different versions of documents will persist indefinitely and cannot be deleted by end users.
- Options are available to perform simple or complex searches of the documents, and their various versions.
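The check-out, check-in, and audit-trail behavior described in this list can be illustrated with an append-only store. The sketch below is a simplified model under stated assumptions, not the SDMS interface: the `Repository` and `Version` classes, their method names, and the rule that a non-empty note is required at check-in are hypothetical stand-ins for the configurable requirements the text mentions.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict, List


@dataclass(frozen=True)
class Version:
    """An immutable snapshot of a document plus who saved it, when, and why."""
    number: int
    content: str
    author: str
    note: str
    timestamp: datetime


class Repository:
    """Append-only store: edits add versions, nothing is overwritten or deleted."""

    def __init__(self) -> None:
        self._versions: Dict[str, List[Version]] = {}
        self._checked_out: Dict[str, str] = {}  # document name -> user

    def check_out(self, name: str, user: str) -> str:
        """Reserve a document for editing and return its latest content."""
        if name in self._checked_out:
            raise RuntimeError(f"{name} is already checked out")
        self._checked_out[name] = user
        history = self._versions.get(name, [])
        return history[-1].content if history else ""

    def check_in(self, name: str, user: str, content: str, note: str) -> Version:
        """Store the edit as a new version; earlier versions stay untouched."""
        if self._checked_out.get(name) != user:
            raise PermissionError(f"{user} has not checked out {name}")
        if not note.strip():
            raise ValueError("a note describing the edit is required")
        history = self._versions.setdefault(name, [])
        version = Version(len(history) + 1, content, user, note,
                          datetime.now(timezone.utc))
        history.append(version)
        del self._checked_out[name]
        return version

    def get(self, name: str, number: int) -> Version:
        """Return a specific (1-based) version of a document."""
        return self._versions[name][number - 1]

    def audit_trail(self, name: str) -> List[str]:
        """One line per version: number, author, UTC time, and the edit note."""
        return [f"v{v.number} by {v.author} at {v.timestamp.isoformat()}: {v.note}"
                for v in self._versions.get(name, [])]
```

The class deliberately offers no delete or overwrite method, so the only way to change a document is to add a version, and the audit trail falls out of the stored history rather than being kept as a separate log.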
Recommended (and FDA Approved) Archival Document types
One of the unique strengths of the STATISTICA Document Management System is its ability to store and exchange information in almost any electronic file format, including your proprietary formats. This allows you to share information internally in the ways that are most convenient for your organization. It also makes it possible to share documents externally by using practically all industry standard formats and protocols.
In particular, SDMS allows you to save data and reports as PDF files or XPORT files. These formats are the preferred file formats that are recommended in the FDA “Guidance for Industry: Providing Regulatory Submissions in Electronic Format – General Considerations.”
Like the entire STATISTICA system, the STATISTICA Document Management System is highly configurable, and its functionality is very compatible with other applications. So the system can be customized to accommodate your specific tasks, and can be integrated seamlessly into existing systems for data and document management.
The StatSoft presentation, “Addressing Privacy Concerns: Critical Features for Predictive Analytics Platforms,” highlighted the role of model and data governance, a subject often neglected in predictive modeling discussions despite its importance as a driver of software requirements.
StatSoft’s VP of Analytic Solutions, Dr. Thomas Hill, prepared the content in light of recent media coverage of invasion-of-privacy concerns stemming from exhaustive and effective data mining. Taking the stance that enterprise analytics platforms must support features allowing the implementation of security policies and rules, Hill provided an overview of STATISTICA Decisioning Platform®’s key features that have made it a favorite in highly regulated industries.
Senior Statistician Dr. Gary Miner took part in the invitation-only “Big Data Expert Panel,” moderated by PAW Founder Eric Siegel. And Carleton Jones, StatSoft’s Director of Financial Services, was a big hit with the audience and received an unprecedented standing ovation after his brief presentation.
See our short PAW-Boston photo gallery on Facebook.
- Visual – The project is laid out visually to show the workflow from input data to the results.
- Repeatable – Run the Workspace multiple times as data update or even on new data sets.
- Reproducible – Project steps are laid out visually and can be explored to see exactly what was done to obtain the results.
- Flexible – The same analysis options are available in the Workspace that you have in the original interactive analyses.
- Customizable – Custom nodes can be created for the Workspace and shared with colleagues.
- Highlighting the input data node to be used for input before selecting the analysis node will automatically connect the input data to the node.
- Options such as Run to node and Run modified nodes make it possible for you to execute only portions of the Workspace at a time.
- Right-click on a connection and select Disable to temporarily avoid an analysis connection and everything downstream from it.
- The Workspace node can have up to five icons on it to perform actions such as:
  - View the reporting documents (upper-right)
  - Show the node is available for a new connection for downstream analysis (center-right)
  - View the output spreadsheet for downstream analyses (lower-right)
  - Run the Workspace (lower-left)
  - Edit parameters (upper-left)
- On the Workspace toolbar, click Node Browser
- On the Node Browser toolbar, click Options
- In the Browser Options dialog box, click the Restore Defaults button to remove the customizations and show the Beta Procedures and other standard lists for Version 12.
Many analytics professionals have high hopes for big data, but speakers at the Predictive Analytics World conference struck a decidedly cautious tone when discussing the concept as it relates to building predictive models.
“To me, big data is just a hot-flash term, but it’s nothing new to us,” said Gary Miner, senior statistician and data-mining consultant at StatSoft.
There is still disagreement around what the term big data actually means. The most common definitions talk about high data volume, velocity and variety. But the volume needed to qualify a data set as “big” has no agreed threshold. Miner said some people think several terabytes of data qualifies as big, while others say it takes hundreds of terabytes.
Either way, he feels the importance of big data has been overblown. He said it is possible to find some really telling correlations in rather small data sets. For example, he talked about how some medical breakthroughs have come out of trials involving fewer than 100 patients. This is because smaller, more refined data sets often make it easier to single out the trend in the noise.
The fact that storage space is getting cheaper has led many in the analytics world to ponder the possibilities that may come from analyzing whole data sets, but Miner said you typically get better results more quickly by using randomized samples from data sets.
“If you’re going to make sense of data you need to sort through the noise, and you’re going to end up with a smaller data set,” Miner said.
Michael Berry, analytics director at TripAdvisor for Business, said the current interest in big data comes from a desire on the part of businesses to implement a single piece of technology that solves multiple problems. He said vendors have been glad to play into this desire, promising that their big data software will greatly simplify business analytics projects. But he said this drive for an easy, simple solution is mostly a fantasy.
“While it’s never been true, it makes a good sales pitch,” he said.
Instead of hoping that big data software will solve every analytics problem, Berry recommended working to improve predictive models. The variables that define a predictive model ultimately matter more than the amount of data fed into the model.
And adding more data may simply increase the time it takes to reach new insights, Berry said. When analyzing data sets, patterns often reveal themselves quickly. If a pattern becomes apparent after analyzing 100 data points, there is no need to continue analyzing 100,000 more data points. The pattern will still be there. All you will have done is lengthen the project. Adding more data may simply lead to diminishing returns.
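The sampling argument made by Miner and Berry can be tried out with a short Python sketch. It is an illustration on synthetic data, not an analysis from the conference: the function name `sample_vs_full`, the normal distribution, and the sample sizes are assumptions chosen for the example.

```python
import random
import statistics


def sample_vs_full(population, sample_size, seed=0):
    """Return (full_mean, sample_mean) for a simple random sample without replacement."""
    rng = random.Random(seed)
    sample = rng.sample(population, sample_size)
    return statistics.fmean(population), statistics.fmean(sample)


if __name__ == "__main__":
    rng = random.Random(42)
    # Synthetic data, generated only for illustration.
    population = [rng.gauss(100.0, 15.0) for _ in range(100_000)]
    for n in (100, 1_000, 10_000):
        full, est = sample_vs_full(population, n)
        print(f"n={n:>6}: sample mean {est:.2f} vs full mean {full:.2f}")
```

For a well-behaved quantity such as a mean, the estimate from a random sample typically settles close to the full-data value well before all the data has been read, which is the diminishing-returns point. The sketch says nothing about rare events, where small samples can easily miss the cases of interest.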
But not everyone was quite so bearish about big data. Peter Amstutz, analytics strategist at advertising agency Carmichael Lynch, said it is important, when developing predictive models, to collect data containing as many variables as possible. Sometimes it may be possible to accumulate information on a broad set of variables from a single source of standardized records, but often an organization will need to collect large amounts of less structured data. This is where the idea of big data can be helpful.
Amstutz recently helped Subaru implement an uplift modeling project that allows the car manufacturer to target its ad buys more effectively. Amstutz said he is always looking for new data sources that might contain information on consumer attributes that are relevant to building the profile of a consumer who may be receptive to Subaru’s advertising. By looking at a greater number of variables, the advertising agency can precisely pinpoint the type of consumer who is likely to buy a Subaru.
It’s not so much the amount of data that’s important as it is the quality of the data. Eric Feinberg, senior director of mobile, media and entertainment at analytics vendor ForeSee, said large volumes of data are generally only helpful if they are standardized and accurate.
He added that the benefits of big data analytics vary greatly by industry. In studying sales trends, outliers that become apparent by studying full data sets may just add noise to the model, making it hard to find the true trend. But Feinberg pointed out that the outliers are exactly what analysts are looking for in fraud detection. So sales forecasting may work fine when using small samples, while fraud prevention efforts can benefit from big data analytics.
On the other hand, more traditional methods may work even better. Feinberg used the example of a medical device company that wants to build a better profile of its cardiologist customers. It could gather a large data set to find characteristics of likely buyers. Or it could simply pay cardiologists to participate in a focus group.
“That, in many cases, does the same thing,” Feinberg said. “It’s harder, it takes more time, but the outcome is a mature data set.”