
Advanced Pattern Search Toolkit for Sequential Data

A group of tools that allow users to search for complex patterns in sequential data such as the genome, protein sequences, text, and time series data.


Date Posted: March 16, 2004

Update: July 8, 2005

New version includes a Java and C++ utility class library, which can be used to develop pattern discovery and recognition applications using ready-made software modules.

What is Advanced Pattern Search Toolkit for Sequential Data?

Advanced Pattern Search Toolkit for Sequential Data is a group of tools that allow users to search for complex patterns in sequential data such as the genome, protein sequences, text, and time series data. The tools can also be accessed via a Java™ API, integrated into existing solutions, or used to build stand-alone applications.

The toolkit assists in mining large data sequences for specific patterns. Given an ordered set of marker strings, the tool identifies that set in the sequence if it is present. It also provides substitutive matching and the inclusion or elimination of matches based on the spatial distribution of the markers.
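The following Java sketch illustrates what searching for an ordered set of marker strings can look like. It is not the toolkit's algorithm or API. It assumes that "substitutive matching" means tolerating a bounded number of substituted characters per marker, and that a spatial constraint can be expressed as a minimum and maximum gap between consecutive markers. The class name `OrderedMarkerSearch` and all of its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

/** Illustrative sketch only; not the toolkit's actual algorithm or API. */
class OrderedMarkerSearch {

    /** True if marker matches seq at offset pos with at most maxSubs substituted characters. */
    static boolean matchesAt(String seq, int pos, String marker, int maxSubs) {
        if (pos < 0 || pos + marker.length() > seq.length()) return false;
        int subs = 0;
        for (int i = 0; i < marker.length(); i++) {
            if (seq.charAt(pos + i) != marker.charAt(i) && ++subs > maxSubs) return false;
        }
        return true;
    }

    /**
     * Finds start offsets for the markers, in order and without overlap, such that the
     * gap between the end of one marker and the start of the next lies in [minGap, maxGap].
     * Assumes minGap >= 0. Returns Optional.empty() if no such placement exists.
     */
    static Optional<List<Integer>> find(String seq, List<String> markers,
                                        int maxSubs, int minGap, int maxGap) {
        List<Integer> starts = new ArrayList<>();
        boolean found = place(seq, markers, 0, 0, seq.length(), maxSubs, minGap, maxGap, starts);
        return found ? Optional.of(starts) : Optional.empty();
    }

    // Backtracking search: try each admissible position for marker idx, then recurse.
    // Offsets are tracked as long so that a very large maxGap cannot overflow.
    private static boolean place(String seq, List<String> markers, int idx, long from, long to,
                                 int maxSubs, int minGap, int maxGap, List<Integer> starts) {
        if (idx == markers.size()) return true;
        String marker = markers.get(idx);
        long last = Math.min(to, (long) seq.length() - marker.length());
        for (long pos = from; pos <= last; pos++) {
            int p = (int) pos;
            if (!matchesAt(seq, p, marker, maxSubs)) continue;
            starts.add(p);
            long end = pos + marker.length();
            if (place(seq, markers, idx + 1, end + minGap, end + maxGap,
                      maxSubs, minGap, maxGap, starts)) {
                return true;
            }
            starts.remove(starts.size() - 1); // backtrack and try a later position
        }
        return false;
    }
}
```

For example, `OrderedMarkerSearch.find("ACGTTTGACCAGT", Arrays.asList("ACG", "GAC"), 0, 0, 10)` yields the start offsets `[0, 6]`. Lowering `maxGap` to 2 yields an empty result, because the two markers are three characters apart.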

The current version of the toolkit also provides a Java and C++ utility class library, which developers can use to build pattern discovery and recognition applications from ready-made software modules.

How does it work?

The tool uses XML to interface with external applications: it takes input data and search parameters as valid XML and presents search results as XML. The core tool can be used as part of a larger framework, with client code written to drive the engine to mine for markers and to process the results. The tool is also packaged as a stand-alone application for demonstration purposes.
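To make the XML-in, XML-out idea concrete, here is a minimal sketch that uses only the standard Java XML APIs. The toolkit's real schema is not described on this page, so every element and attribute name below (`searchRequest`, `sequence`, `marker`, `searchResult`, `hit`, `start`) is a hypothetical placeholder, as is the class `XmlSearchIo`.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Illustrative only: element and attribute names are hypothetical, not the toolkit's schema. */
class XmlSearchIo {

    /** Reads the text of the first <sequence> element of the request. */
    static String readSequence(String requestXml) {
        return parse(requestXml).getElementsByTagName("sequence").item(0).getTextContent().trim();
    }

    /** Reads the text of every <marker> element, in document order. */
    static List<String> readMarkers(String requestXml) {
        NodeList nodes = parse(requestXml).getElementsByTagName("marker");
        List<String> markers = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            markers.add(nodes.item(i).getTextContent().trim());
        }
        return markers;
    }

    /** Serializes one <hit> element per marker, carrying the marker and its start offset. */
    static String writeResult(List<String> markers, List<Integer> starts) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element root = doc.createElement("searchResult");
            doc.appendChild(root);
            for (int i = 0; i < markers.size(); i++) {
                Element hit = doc.createElement("hit");
                hit.setAttribute("marker", markers.get(i));
                hit.setAttribute("start", String.valueOf(starts.get(i)));
                root.appendChild(hit);
            }
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not serialize result", e);
        }
    }

    // Parses a request string into a DOM tree; malformed XML is reported as a bad argument.
    private static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Request is not well-formed XML", e);
        }
    }
}
```

A request under these assumed names might look like `<searchRequest><sequence>ACGTTTGACCAGT</sequence><markers><marker>ACG</marker><marker>GAC</marker></markers></searchRequest>`. A real client would validate the request against the schema shipped with the tool being called, rather than rely on these placeholder names.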

The toolkit is based on a simple programming model: a developer obtains a handle to the required tool and calls a method on it. This process is the same for all the tools provided. The core engine encapsulates the tools. It obtains its input through an Input Adaptor, which allows the application designer to plug the core engine into a variety of data stores, ranging from flat files to object databases. The core engine likewise uses an Output Adaptor to populate the results.

The core engine assumes that the adaptor provides input that conforms to the XML schema of the tool whose service has been requested. The developer of the adaptor is therefore responsible for ensuring that the input provided to the core engine conforms to this predefined format.

An application that interfaces with the core engine makes a synchronous call on the core engine instance, passing handles to the input and output data source adaptors. The core engine calls the input data source adaptor to obtain the search parameters, executes the search, and then populates the results through the output data source adaptor.
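The call sequence described above can be sketched in Java as follows. This is an assumption-laden illustration, not the toolkit's API: the interface names `InputAdaptor` and `OutputAdaptor`, the class `CoreEngine`, and the methods `register` and `run` are all hypothetical, and each tool is modeled simply as a function from request XML to result XML.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical input adaptor: supplies the request XML from some data store. */
interface InputAdaptor {
    String readRequest();
}

/** Hypothetical output adaptor: receives the result XML for storage or display. */
interface OutputAdaptor {
    void writeResult(String resultXml);
}

/** Illustrative engine; the real toolkit's classes and signatures may differ. */
class CoreEngine {
    // Each tool is modeled as a function from request XML to result XML.
    private final Map<String, Function<String, String>> tools = new HashMap<>();

    void register(String toolName, Function<String, String> tool) {
        tools.put(toolName, tool);
    }

    /** Synchronous call: returns only after the results have been handed to the output adaptor. */
    void run(String toolName, InputAdaptor in, OutputAdaptor out) {
        Function<String, String> tool = tools.get(toolName);
        if (tool == null) {
            throw new IllegalArgumentException("Unknown tool: " + toolName);
        }
        String request = in.readRequest();   // 1. obtain the search parameters
        String result = tool.apply(request); // 2. execute the search
        out.writeResult(result);             // 3. populate the results
    }
}
```

Because both adaptors are plain interfaces in this sketch, a flat-file adaptor, a database adaptor, or an in-memory lambda such as `() -> "<request/>"` can be passed to `run` without any change to the engine, which is the pluggability the adaptor design aims for.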

Included with the toolkit are articles and a white paper to aid developers in building solutions using the toolkit.


About the technology author(s):

Jagir Hussan is a member of the Technology Incubation Centre at IBM Software Labs, India. His research interests include algorithm development in pattern recognition, machine learning, and data mining, and the application of natural language processing, operations research, and complex systems in medicine and biology.

Snehit A Prabhu is with IBM Software Labs in India. He joined IBM in early 2003 and has since worked on pattern matching, grid computing, and Web services. He is currently involved in the lab's autonomic computing initiatives. His primary technical interests include machine learning, control systems, and artificial intelligence.

Deepak Srinivasa is a member of the Technology Incubation Center at IBM India Software Labs. He formerly worked on bioinformatics algorithms and still maintains an interest in them. Currently, he is focusing on IBM Workplace Client technology and XML technologies. Mr. Srinivasa is one of the leaders in a joint research project being executed by IBM Software and Research Labs, India, entitled Dynamic Workbenches. Apart from software, he also maintains interest in hybrid hardware-software solutions and hopes to contribute to that field.

Chanchal Kumar is a member of the Technology Incubation Center at IBM India Software Labs. He is currently working in the domain of Health Care and Life Sciences (HCLS). Mr. Kumar's interests are primarily in the area of data integration issues for HCLS, but he is also interested in algorithm development for pattern discovery in biomedical data; computational modeling; systems biology; and scientific computing.

Bobichan P. John is a member of the IBM VisualAge C/C++ Components team at IBM Software Labs, India. His interests include object-oriented software development and compilation technology.

Jojan J. Vazhaeparampil was a member of the IBM VisualAge C/C++ Components team at IBM Software Labs, India.

Subhas Balappanavar was a member of the IBM VisualAge C/C++ Components team at IBM Software Labs, India.


Java and all Java-based trademarks are trademarks of Sun Microsystems, Inc. in the United States, other countries, or both.
Other company, product, or service names may be trademarks or service marks of others.


Related technologies

For platform(s):
Java

For topics:
bioinformatics, life sciences, pattern recognition, analysis

