EMBL/FASTA Wrapper for WebSphere Information Integrator

A tool that uses SQL to access, retrieve, and federate bio-sequences and data stored in specialized flat-file data sources in either EMBL or FASTA format.

Date Posted: February 21, 2008

What is EMBL/FASTA Wrapper for WebSphere Information Integrator?

A huge amount of bioinformatics data is currently stored in flat files in numerous specialized databases, both public and private, set up over past decades by research institutions and laboratories all around the world. Accessing information stored in such databases is not a trivial task because specific bioinformatics tools and algorithms are required. Moreover, these tools often expose non-standard interfaces and query languages, which makes it difficult to integrate these data sources or have them interoperate. Among the dozens of flat-file formats for protein and DNA sequences, EMBL and FASTA are widely used.

EMBL/FASTA Wrapper for WebSphere® Information Integrator enables applications as well as users to retrieve protein and DNA sequences and data stored in flat-file databases in either EMBL or FASTA format by using SQL as the query language. Thanks to key features provided by WebSphere Information Integrator and this non-relational wrapper, users can virtually "bring" EMBL/FASTA data into their federated databases, not only to query a single flat-file database but also to compose "federated queries" that are resolved against both relational and non-relational data sources.

EMBL/FASTA Wrapper for WebSphere Information Integrator allows flat-file data sources to be handled just as relational databases are handled (for example, the SQL LIKE predicate is supported for querying non-sequence parts of EMBL and FASTA entries).
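The idea of treating flat-file entries as rows can be illustrated with a minimal Python sketch. It does not use the wrapper or WebSphere Information Integrator: it parses a few toy FASTA records (invented for this example), loads them into an in-memory SQLite table, and joins them with an ordinary relational table while filtering the description with LIKE. The table names `fasta_entries` and `lab_samples`, the column names, and the data are all assumptions made for illustration.

```python
import sqlite3

# Toy FASTA text invented for this sketch; not real database entries.
FASTA_TEXT = """\
>seq1 toy kinase fragment
ACGTACGTAC
GGTTAA
>seq2 toy insulin fragment A
TTGACA
>seq3 toy insulin fragment B
ACGTACGTAC
GT
"""


def _record(header, chunks):
    # The first word of the header is the identifier, the rest the description.
    identifier, _, description = header.partition(" ")
    return identifier, description, "".join(chunks)


def parse_fasta(text):
    """Yield (identifier, description, sequence) for each FASTA record."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield _record(header, chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield _record(header, chunks)


def run_demo():
    conn = sqlite3.connect(":memory:")
    # Hypothetical table standing in for a nickname over a FASTA file.
    conn.execute(
        "CREATE TABLE fasta_entries "
        "(id TEXT PRIMARY KEY, description TEXT, sequence TEXT)"
    )
    conn.executemany(
        "INSERT INTO fasta_entries VALUES (?, ?, ?)", parse_fasta(FASTA_TEXT)
    )
    # Hypothetical ordinary relational table to join against.
    conn.execute("CREATE TABLE lab_samples (sample TEXT, entry_id TEXT)")
    conn.executemany(
        "INSERT INTO lab_samples VALUES (?, ?)",
        [("S-001", "seq1"), ("S-002", "seq3")],
    )
    rows = conn.execute(
        "SELECT s.sample, f.id, length(f.sequence) "
        "FROM lab_samples AS s JOIN fasta_entries AS f ON f.id = s.entry_id "
        "WHERE f.description LIKE '%insulin%' "
        "ORDER BY s.sample"
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(run_demo())
```

In a federated setup the flat-file rows would not be copied into a table as they are here; the sketch only shows the shape of a query that mixes a LIKE filter on entry descriptions with a join against relational data.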

How does it work?

EMBL/FASTA Wrapper for WebSphere Information Integrator consists of two components: the wrapper itself and a server component that serves the mapped data source files.

When users of the federated database send an SQL query against a nickname that maps an EMBL/FASTA data source, the wrapper communicates with the corresponding server component over TCP/IP to propagate the query, or part of it; the server component replies with the appropriate result set gathered from the mapped data source files. WebSphere Information Integrator then uses this result set to assemble the query results.
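The request and reply flow described above can be sketched in Python as follows. This is only an illustration of the pattern (a client pushes part of a query to a server over TCP/IP and receives the matching rows); the real wire protocol is not described here, so the one-line JSON messages, the `description_contains` field, the class and function names, and the in-memory `RECORDS` list standing in for flat files are all assumptions.

```python
import json
import socket
import socketserver
import threading

# Toy records standing in for entries read from flat files (invented data).
RECORDS = [
    {"id": "seq1", "description": "toy kinase fragment"},
    {"id": "seq2", "description": "toy insulin fragment A"},
    {"id": "seq3", "description": "toy insulin fragment B"},
]


class FlatFileHandler(socketserver.StreamRequestHandler):
    """Hypothetical server side: applies the pushed-down filter to the records."""

    def handle(self):
        request = json.loads(self.rfile.readline().decode("utf-8"))
        needle = request["description_contains"]
        rows = [r for r in RECORDS if needle in r["description"]]
        self.wfile.write((json.dumps(rows) + "\n").encode("utf-8"))


def query_server(address, needle):
    """Hypothetical wrapper side: send the filter, read back the result set."""
    with socket.create_connection(address, timeout=5) as sock:
        with sock.makefile("rwb") as stream:
            message = json.dumps({"description_contains": needle}) + "\n"
            stream.write(message.encode("utf-8"))
            stream.flush()
            return json.loads(stream.readline().decode("utf-8"))


def run_demo():
    # Port 0 asks the operating system for any free local port.
    server = socketserver.TCPServer(("127.0.0.1", 0), FlatFileHandler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        return query_server(server.server_address, "insulin")
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    for row in run_demo():
        print(row["id"], row["description"])
```

Sending only the filter to the server means that only matching rows travel back over the network; any remaining work, such as joins with relational tables, would be left to the federated database.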

About the technology author(s)

Pietro Leo is an executive IT (information technology) architect at IBM Global Business Services (GBS) Innovation Center in Bari, Italy. He is a permanent member of the IBM Italy Technical Expert Council, which is affiliated with the IBM Academy of Technology. His areas of expertise include data, information, and application integration; unstructured information management; mining and semantic/conceptual indexing and search; bioinformatics; healthcare; and wireless solutions. Mr. Leo holds a higher artistic degree in oboe from the Music Conservatory of Lecce (Italy); a computing science degree from the University of Bari (Italy); an advanced computing science degree from the University of Udine (Italy); a Master of Science by Research degree in applied artificial intelligence from the University of Aberdeen (UK); and a master's degree in public funding management for business from a tax consulting firm (Italy). He has been an invited speaker at industrial and scientific conferences as well as a member of scientific committees of conferences. Mr. Leo is the author or co-author of more than 45 scientific or technical publications for journals, has presented at national and international conferences, and is the co-author of two books published by IBM Redbooks.

Gaetano Scioscia is an IT architect at IBM GBS Innovation Centre in Bari, Italy. He graduated in 1995 in physics and earned a Ph.D. in theoretical physics in 1999 in Bari. In 1998, Dr. Scioscia joined IBM, where he worked as an IT specialist and then an IT architect in such fields as data and application integration, and information and knowledge management. During recent years, his work has been focused on bioinformatics.

Graziano Pappadà (contractor) is a bioinformatics scientist in Italy. He graduated in biology and, since 2002, has worked on several bioinformatics projects in collaboration with the University of Bari and IBM Innovation Centre.

Vincenzo Quinto (contractor) is an IT specialist with extensive experience in Java technologies and bioinformatic applications. He worked as an IT consultant for IBM Innovation Centre focusing on concerns related to biological data access and aggregation.
