Update: July 8, 2005
New version includes a Java and C++ utility class library, which can be used to develop pattern discovery and recognition applications using ready-made software modules.
What is Advanced Pattern Search Toolkit for Sequential Data?
Advanced Pattern Search Toolkit for Sequential Data is a group of tools that allow users to search for complex patterns in sequential data such as genomes, protein sequences, text, and time series data. The tools can also be accessed via a Java™ API, integrated into existing solutions, or used to build stand-alone applications.
The toolkit assists in mining large data sequences for specific patterns. The tool identifies an ordered set of marker strings, if present, in the sequence. Features such as substitutive matching and spatial marker distribution-based elimination/inclusion are provided.
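The idea of identifying an ordered set of marker strings can be illustrated with a minimal sketch. This is not the toolkit's API: the class name OrderedMarkerSearch and the method find are hypothetical, and the sketch covers exact matching only, without the substitutive matching or spatial distribution features mentioned above.

```java
import java.util.ArrayList;
import java.util.List;

public class OrderedMarkerSearch {
    /**
     * Returns the start offset of each marker if all markers occur in the
     * given order without overlapping, or an empty list otherwise.
     * Hypothetical helper, not part of the toolkit.
     */
    public static List<Integer> find(String sequence, List<String> markers) {
        List<Integer> positions = new ArrayList<>();
        int from = 0;
        for (String marker : markers) {
            // Taking the earliest occurrence leaves the most room for later markers.
            int idx = sequence.indexOf(marker, from);
            if (idx < 0) {
                return new ArrayList<>(); // a marker is missing or out of order
            }
            positions.add(idx);
            from = idx + marker.length();
        }
        return positions;
    }

    public static void main(String[] args) {
        // prints [0, 4, 9]
        System.out.println(find("ACGTTGACCGTA", List.of("ACG", "TGA", "GTA")));
    }
}
```

Reversing the marker order in the call yields an empty list, because the markers are then no longer present in the requested order.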
The current version of the toolkit also provides a Java and C++ utility class library of ready-made software modules, which can be used to develop pattern discovery and recognition applications.
How does it work?
The tool uses XML to interface with external applications. Input data and search parameters are supplied as valid XML, and search results are returned as XML. The core tool can be used as part of a larger framework: client code can be written to drive the engine to mine for markers and to process the results. The tool has also been packaged as a stand-alone application for demonstration purposes.
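As a rough illustration of an XML-in, XML-out exchange, the sketch below reads search parameters from a request document and writes offsets to a result document using the standard Java DOM parser. The element names (search, sequence, marker, results, match) and the class XmlSearchRequest are assumptions made for this example; the toolkit's actual schemas are defined by its own documentation.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

public class XmlSearchRequest {
    /** Extracts the text of every (hypothetical) marker element, in document order. */
    public static List<String> readMarkers(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList nodes = doc.getElementsByTagName("marker");
            List<String> markers = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                markers.add(nodes.item(i).getTextContent().trim());
            }
            return markers;
        } catch (Exception e) {
            // Malformed XML or parser configuration problems end up here.
            throw new IllegalArgumentException("Could not parse request", e);
        }
    }

    /** Builds a (hypothetical) result document from marker offsets. */
    public static String writeResults(List<Integer> offsets) {
        StringBuilder sb = new StringBuilder("<results>");
        for (int offset : offsets) {
            sb.append("<match offset=\"").append(offset).append("\"/>");
        }
        return sb.append("</results>").toString();
    }

    public static void main(String[] args) {
        String request = "<search><sequence>ACGTTGACCGTA</sequence>"
                + "<marker>ACG</marker><marker>TGA</marker></search>";
        System.out.println(readMarkers(request));
        System.out.println(writeResults(List.of(0, 4)));
    }
}
```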
The toolkit is based on a simple programming model. Developers need only obtain a handle to the required tool and call a method. This process is consistent across all the tools provided. The Core Engine encapsulates the tools that have been provided. The core engine obtains input via an Input Adaptor; this allows the application designer to plug the core engine into a variety of data stores, ranging from flat files to object databases. The core engine also uses an Output Adaptor to populate the results.
The core engine assumes that the adaptor provides input conformant to the XML schema of the tool whose service has been requested. Hence it is the responsibility of the developer of the adaptor to ensure that the input provided to the core engine conforms to the predefined format.
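One way an adaptor developer might check conformance before handing input to the engine is to validate it against an XML schema with the standard javax.xml.validation API. The sketch below assumes a made-up schema (a search element holding one sequence and one or more marker elements) and a hypothetical class InputValidator; the real schema for each tool ships with the toolkit.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;

public class InputValidator {
    // Hypothetical schema, for illustration only.
    static final String XSD =
          "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\">"
        + "<xs:element name=\"search\"><xs:complexType><xs:sequence>"
        + "<xs:element name=\"sequence\" type=\"xs:string\"/>"
        + "<xs:element name=\"marker\" type=\"xs:string\" maxOccurs=\"unbounded\"/>"
        + "</xs:sequence></xs:complexType></xs:element></xs:schema>";

    /** Returns true if the document conforms to the schema above. */
    public static boolean isValid(String xml) {
        try {
            Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(XSD)));
            // With no error handler set, validate() throws on the first schema violation.
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (Exception e) {
            return false; // invalid or unreadable input
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid(
            "<search><sequence>ACGT</sequence><marker>AC</marker></search>"));
        System.out.println(isValid("<search><marker>AC</marker></search>"));
    }
}
```

Running the check inside the adaptor keeps malformed input from ever reaching the core engine, which matches the division of responsibility described above.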
Applications that must interface with the core engine must make a synchronous call to the core engine's instance with handles for the input data source and the output data source adaptors. The core engine then calls the input data source adaptor to obtain the search parameters. The core engine then executes the search and populates the results using the output data source adaptor.
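The call sequence described above can be sketched as follows. Every type here (InputAdaptor, OutputAdaptor, CoreEngine, InMemoryInput) is a hypothetical stand-in rather than the toolkit's actual class library, and for brevity the adaptors exchange plain Java values instead of XML.

```java
import java.util.ArrayList;
import java.util.List;

public class EngineSketch {
    /** Hypothetical input adaptor: hides where the sequence and markers come from. */
    interface InputAdaptor {
        String sequence();
        List<String> markers();
    }

    /** Hypothetical output adaptor: hides where the results go. */
    interface OutputAdaptor {
        void write(List<Integer> offsets);
    }

    /** In-memory input; a flat-file or database adaptor would implement the same interface. */
    static class InMemoryInput implements InputAdaptor {
        private final String sequence;
        private final List<String> markers;

        InMemoryInput(String sequence, List<String> markers) {
            this.sequence = sequence;
            this.markers = markers;
        }

        public String sequence() { return sequence; }
        public List<String> markers() { return markers; }
    }

    static class CoreEngine {
        /** Synchronous: returns only after the output adaptor has been populated. */
        void search(InputAdaptor in, OutputAdaptor out) {
            String sequence = in.sequence();      // step 1: pull parameters from the input adaptor
            List<Integer> offsets = new ArrayList<>();
            int from = 0;
            for (String marker : in.markers()) {  // step 2: run the search
                int idx = sequence.indexOf(marker, from);
                if (idx < 0) {
                    offsets.clear();
                    break;
                }
                offsets.add(idx);
                from = idx + marker.length();
            }
            out.write(offsets);                   // step 3: push results to the output adaptor
        }
    }

    /** Wires the pieces together the way a client application might. */
    public static List<Integer> run(String sequence, List<String> markers) {
        List<Integer> collected = new ArrayList<>();
        new CoreEngine().search(new InMemoryInput(sequence, markers), collected::addAll);
        return collected;
    }

    public static void main(String[] args) {
        System.out.println(run("ACGTTGACCGTA", List.of("ACG", "GTA")));
    }
}
```

Because the engine depends only on the two adaptor interfaces, swapping the in-memory input for a file-backed or database-backed one would not require changes to the engine itself.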
Included with the toolkit are articles and a white paper to aid developers in building solutions using the toolkit.