Purpose and Advantages of In-Place Database Processing (IDP)
In-Place Database Processing (IDP) is a database access technology developed at StatSoft to provide a high-performance, direct interface between external data sets residing on remote servers and the analytic functionality of STATISTICA products. IDP was developed to facilitate access to data in large databases through a one-step process that does not require creating a local copy of the data set. Because the import step is eliminated, IDP significantly increases the performance of STATISTICA, and it is particularly well suited to large data mining and exploratory data analysis tasks. IDP also provides a security advantage: the data remain in the secure database at all times rather than being copied to local files.
The speed gains of IDP over traditional data access come from two sources. First, IDP allows STATISTICA to access data directly in the database, skipping the otherwise necessary step of importing the data and creating a local data file. Second, IDP has a “multitasking” (technically, asynchronous and distributed processing) architecture: it uses the processing resources (multiple CPUs) of the database server computers to execute the query operations, extract the requested records, and send them to the STATISTICA computer, while STATISTICA simultaneously processes those records as they arrive.
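The overlap between record extraction and record processing can be illustrated with a minimal Python sketch. It is not StatSoft code: the names stream_records and running_mean are hypothetical, a background thread stands in for the database server, and a bounded queue stands in for the network connection. The point is only that the consumer starts computing before the producer has finished sending.

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of the record stream


def stream_records(records, out_queue):
    """Stand-in for the database server: push records as they are extracted."""
    for record in records:
        out_queue.put(record)
    out_queue.put(_DONE)


def running_mean(records, buffer_size=100):
    """Consume records while the producer thread is still sending them."""
    q = queue.Queue(maxsize=buffer_size)
    producer = threading.Thread(target=stream_records, args=(records, q))
    producer.start()

    count, total = 0, 0.0
    while True:
        item = q.get()  # blocks until the next record arrives
        if item is _DONE:
            break
        count += 1
        total += item
    producer.join()
    return total / count if count else None
```

Calling running_mean([1, 2, 3, 4]) returns 2.5. No complete local copy of the input is ever assembled on the consumer side; at most buffer_size records are held in the queue at any moment.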
Compatibility with STATISTICA products
IDP can be used with both desktop and enterprise versions of STATISTICA products, and it is fully compatible with the client-server architecture of STATISTICA Enterprise Server: requests can be made over the Web, and the data are processed asynchronously by STATISTICA Enterprise Server computers connected to the next-tier database server computers that execute the queries. IDP is also optimized to integrate seamlessly with STATISTICA Data Miner, which supports multiple IDP data input channels.
Architecture and Programmability
The IDP technology is built around a COM object that wraps an instance of a Microsoft ActiveX Data Objects (ADO) Recordset object and implements a subset of the Spreadsheet COM interface in the STATISTICA Object Library. This works because all STATISTICA Analyses access their source data through the Spreadsheet interface; more precisely, through the InputSpreadsheet interface, which contains a subset of the Spreadsheet interface methods. (The InputSpreadsheet interface is normally hidden in the Object Browser but can be displayed by right-clicking in the Object Browser and selecting “Show Hidden Members”.) To a STATISTICA Analysis, therefore, the IDP looks just like a Spreadsheet. Indeed, advanced users of STATISTICA could wrap an InputSpreadsheet interface around any data source at all and run STATISTICA Analyses on it programmatically via the STATISTICA Object Model.
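The wrapper idea is an instance of the adapter pattern, and a small Python sketch may make it concrete. Everything named here is an assumption made for illustration: RecordsetSpreadsheet, its members, and column_mean are hypothetical and do not reproduce the actual InputSpreadsheet interface, and an in-memory SQLite database stands in for the ADO Recordset. For brevity the sketch fetches all rows eagerly, which a real in-place implementation would avoid.

```python
import sqlite3


class RecordsetSpreadsheet:
    """Hypothetical adapter: exposes a query result through a small
    spreadsheet-style interface (not the real InputSpreadsheet interface)."""

    def __init__(self, connection, query):
        cursor = connection.execute(query)
        self._names = [col[0] for col in cursor.description]
        self._rows = cursor.fetchall()  # eager fetch, for brevity only

    @property
    def number_of_variables(self):
        return len(self._names)

    @property
    def number_of_cases(self):
        return len(self._rows)

    def variable_name(self, var):
        return self._names[var - 1]  # variables are numbered from 1

    def value(self, case, var):
        return self._rows[case - 1][var - 1]  # cases are numbered from 1


def column_mean(sheet, var):
    """A toy 'analysis' that only talks to the spreadsheet-style interface."""
    n = sheet.number_of_cases
    return sum(sheet.value(case, var) for case in range(1, n + 1)) / n


def demo_connection():
    """Build a small in-memory table to query (illustrative data only)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (x REAL, y REAL)")
    conn.executemany(
        "INSERT INTO t VALUES (?, ?)",
        [(1.0, 10.0), (2.0, 20.0), (3.0, 30.0)],
    )
    return conn
```

Because column_mean uses only the four interface members, it neither knows nor cares that the data come from a database query; any other object offering the same members could be passed in its place.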
Behind the scenes, the spreadsheet wrapper object must take certain steps to make Analyses work seamlessly. For instance, if an Analysis requires the number of cases in a Recordset before that information is known, one of two things happens: either a separate “count” query is executed synchronously (i.e., the Analysis must wait until the count query returns before continuing) and its result is returned to the Analysis, or an arbitrary upper bound on the case count is returned immediately. This behavior is configurable on the IDP page of the STATISTICA options dialog. Also, if a forward-only cursor is used (see below) and the Analysis must make multiple passes through the data or access the data in random order, then any request for a previous case (row) forces the IDP to requery the database and advance the cursor forward to the requested case, since the cursor cannot be scrolled backwards. The Analysis simply waits until this process is completed and the requested data are provided to it.
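Both behaviors described above, the case-count strategy and the requery on backward access, can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the IDP implementation: the class ForwardOnlySheet, its parameters exact_count and upper_bound, and the requeries counter are all hypothetical, and a SQLite cursor read one row at a time stands in for a forward-only ADO cursor.

```python
import sqlite3


class ForwardOnlySheet:
    """Hypothetical wrapper over a forward-only cursor (illustration only)."""

    def __init__(self, connection, query, exact_count=True, upper_bound=1_000_000):
        self._conn = connection
        self._query = query
        self._exact_count = exact_count
        self._upper_bound = upper_bound
        self.requeries = 0  # how many times the query had to be re-executed
        self._open()

    def _open(self):
        self._cursor = self._conn.execute(self._query)
        self._position = 0  # number of the last case fetched (1-based)
        self._current = None

    def number_of_cases(self):
        if self._exact_count:
            # Synchronous count query: the caller blocks until it returns.
            sql = f"SELECT COUNT(*) FROM ({self._query})"
            return self._conn.execute(sql).fetchone()[0]
        # Otherwise answer immediately with an arbitrary upper bound.
        return self._upper_bound

    def case(self, n):
        if n < 1:
            raise IndexError("cases are numbered from 1")
        if n < self._position:
            # The cursor cannot scroll backwards: requery and start over.
            self.requeries += 1
            self._open()
        while self._position < n:
            row = self._cursor.fetchone()
            if row is None:
                raise IndexError(f"case {n} is past the end of the data")
            self._current = row
            self._position += 1
        return self._current
```

Reading cases in increasing order never triggers a requery, whereas asking for case 2 after case 5 re-executes the query and skips forward again. This is why an analysis that makes several passes, or that reads cases in random order, can be much slower over a forward-only cursor.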
IDP Type Library – Two Main Interfaces
The IDP type library exposes two main interfaces. The first, DBTable, provides programmatic access to the IDP Document, much as the Macro, Graph, and Spreadsheet interfaces provide access to STATISTICA Macros, Graphs, and Spreadsheets. In addition to the standard document methods and properties (Visible, Activate, Close, etc.), it provides access to all IDP-specific options (cursor type, location, query string, etc.). Its read-only property “Spreadsheet” returns the Spreadsheet wrapper around the ADO Recordset.
The second interface is DBSpreadsheet. This interface is used internally by the IDP to create the Spreadsheet wrapper object, and it can also be used by users writing their own macros or programs, although in most cases the DBTable interface is sufficient and will itself use a DBSpreadsheet object. DBSpreadsheet has two methods, Open and CreateNew. Open executes the supplied query, opens an ADO Recordset, creates a Spreadsheet wrapper object, attaches the Recordset to it, and returns the Spreadsheet object. CreateNew creates a Spreadsheet wrapper object that is not attached to any Recordset and is therefore not usable until you call its “SetRecordset” method to attach an ADO Recordset object of your own creation.
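The difference between the two creation paths can be modeled in Python as follows. This is a conceptual sketch only: the classes SheetWrapper and DBSpreadsheetModel are hypothetical, their method names merely echo Open, CreateNew, and SetRecordset, and the real COM signatures are not shown in this document and are not reproduced here. A SQLite cursor again stands in for the ADO Recordset.

```python
import sqlite3


class SheetWrapper:
    """Hypothetical stand-in for the Spreadsheet wrapper object."""

    def __init__(self):
        self._recordset = None

    def set_recordset(self, recordset):
        """Attach a recordset created by the caller."""
        self._recordset = recordset

    @property
    def usable(self):
        return self._recordset is not None

    def first_case(self):
        if self._recordset is None:
            raise RuntimeError("no recordset attached; call set_recordset first")
        return self._recordset.fetchone()


class DBSpreadsheetModel:
    """Hypothetical model of the two creation paths described above."""

    def __init__(self, connection):
        self._conn = connection

    def open(self, query):
        # One step: run the query, wrap the result, return a ready wrapper.
        sheet = SheetWrapper()
        sheet.set_recordset(self._conn.execute(query))
        return sheet

    def create_new(self):
        # Two steps: return a detached wrapper; the caller attaches later.
        return SheetWrapper()
```

The one-step path suits the common case where a query string is all that is needed. The two-step path is useful when the caller wants full control over how the recordset is created before handing it to the wrapper.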