
Text Analysis Perspective for DB2 Warehouse

A set of Eclipse plug-ins that allows you to configure and test text analysis engines and use them in warehouse and mining flows created by DB2 Warehouse Edition 9.5.


Date Posted: December 18, 2007

What is Text Analysis Perspective for DB2 Warehouse?

Text Analysis Perspective for DB2® Warehouse is an Eclipse perspective that can be integrated into the DB2 Warehouse Design Studio. It allows you to quickly and easily configure the analysis engines that are used in text operators.

DB2 Warehouse Design Studio 9.5 provides text operators that can be included in data flows. These operators use UIMA analysis engines to extract concepts and relations from unstructured text. As a result, unstructured information is transformed into a structure that can be analyzed in the DB2 warehouse together with existing structured information by using business-intelligence tools such as reporting tools, tools for multidimensional analysis, or data mining tools. However, to deliver meaningful results, the analysis engines must be configured for a particular business problem. Text Analysis Perspective for DB2 Warehouse simplifies this task.

Text Analysis Perspective for DB2 Warehouse provides the following main benefits:

  • the ability to test analysis engines on a custom collection of test documents in order to evaluate the quality of dictionaries and regular-expression rules built with the DWE analysis engines, or to evaluate the resources of a third-party UIMA analysis engine; these documents can be extracted from database text columns or taken from text documents in the file system
  • the ability to compare analysis results across test runs to determine the impact of changes in your analysis engine
  • the ability to use text search on test documents to identify suitable terms to be included in a dictionary or to find suitable context terms to be used in regular-expression rules.
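
To make the idea of comparing results across test runs concrete, here is a minimal Java sketch. It is not the tool's own comparison viewer or its result schema, neither of which is shown on this page. It assumes, purely for illustration, that each extracted annotation has been flattened to a string of the form "documentId|type|coveredText"; the class name RunComparison, the method onlyIn, and the sample values are all hypothetical.

```java
import java.util.Set;
import java.util.TreeSet;

public class RunComparison {

    // Returns the annotations present in run "a" but missing from run "b".
    // The string format "documentId|type|coveredText" is an assumption made
    // for this sketch, not the format used by the Text Analysis Perspective.
    static Set<String> onlyIn(Set<String> a, Set<String> b) {
        Set<String> diff = new TreeSet<>(a); // sorted copy, inputs stay untouched
        diff.removeAll(b);
        return diff;
    }

    public static void main(String[] args) {
        // Hypothetical results of two test runs over the same documents.
        Set<String> run1 = Set.of("doc1|Product|DB2", "doc2|Product|Derby");
        Set<String> run2 = Set.of("doc1|Product|DB2", "doc3|Product|Lucene");

        System.out.println("Lost in run 2:   " + onlyIn(run1, run2));
        System.out.println("Gained in run 2: " + onlyIn(run2, run1));
    }
}
```

Annotations that appear only in the earlier run were lost by the configuration change, and annotations that appear only in the later run were gained; both lists are worth reviewing before a changed dictionary or rule is accepted.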

How does it work?

Text Analysis Perspective for DB2 Warehouse is a set of Eclipse plug-ins that allows users of the DB2 Warehouse Design Studio to configure and test UIMA 1.4.5 analysis engines before they are used in a data flow. These plug-ins build on the UIMA (Unstructured Information Management Architecture) Java™ SDK but do not require knowledge of UIMA itself. The Text Analysis Perspective supports users in all steps involved when configuring an annotator to use unstructured information for a business problem:

  • Create a "Text Analysis Project," which contains the structure and the actions tailored to the text analysis configuration task.
  • Import collections of sample text documents or database columns for testing your annotator configuration.
  • Explore these documents using Lucene-based text search and an Eclipse plug-in for frequent terms analysis in order to understand the information present in the documents.
  • Choose the right UIMA analysis engine for the extraction task. Text Analysis Perspective for DB2 Warehouse includes two built-in analysis engines that allow the extraction of information based on regular expressions and word lists. These annotators are packaged as "Text Analysis Plug-ins," which also contain all editors and viewers necessary for working with these annotators without UIMA skills. In addition, you can use UIMA processing engine archive (PEAR) files that contain UIMA 1.4.5-compliant annotators.
  • Run the analysis engine on the document collections in order to analyze the documents and extract information. The results are stored in an embedded Derby database for later evaluation.
  • Understand and compare the results. Text Analysis Perspective for DB2 Warehouse contains Eclipse viewers for viewing the results on the document collection and for comparing results across different runs in order to understand the impact of configuration changes (such as a change to a regular expression rule).
  • Use the configured analysis engine within a DWE warehouse project. By referencing a text analysis project in a warehouse project, all analysis engines and resources of the text analysis project are directly accessible within the warehouse project and can be used in text operators.
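
The two extraction styles mentioned in the steps above, word lists and regular expressions, can be illustrated with a small plain-Java sketch. This is not the built-in analysis engines and does not use the UIMA API; it only shows the general technique under stated assumptions. The class name SimpleExtractor, the output format "type|text|offset", the sample sentence, the dictionary terms, and the ticket pattern are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SimpleExtractor {

    // Finds dictionary terms and regular-expression matches in a document.
    // Each hit is returned as "type|coveredText|beginOffset" (a format chosen
    // for this sketch only).
    static List<String> extract(String doc, List<String> dictionary,
                                Pattern rule, String ruleType) {
        List<String> out = new ArrayList<>();

        // Word-list matching: case-insensitive, whole words only.
        for (String term : dictionary) {
            Pattern p = Pattern.compile("\\b" + Pattern.quote(term) + "\\b",
                                        Pattern.CASE_INSENSITIVE);
            Matcher m = p.matcher(doc);
            while (m.find()) {
                out.add("DictionaryTerm|" + m.group() + "|" + m.start());
            }
        }

        // Regular-expression rule: every match becomes an annotation of ruleType.
        Matcher m = rule.matcher(doc);
        while (m.find()) {
            out.add(ruleType + "|" + m.group() + "|" + m.start());
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical sample document, word list, and rule.
        String doc = "Customer reports brake noise; see ticket PMR-12345.";
        List<String> hits = extract(doc,
                List.of("brake", "engine"),
                Pattern.compile("PMR-\\d{5}"),
                "TicketId");
        hits.forEach(System.out::println);
    }
}
```

Each hit carries a type, the covered text, and an offset, which is the kind of structured output that can then be loaded into a table and analyzed together with existing structured data.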


About the technology author(s):
Text Analysis Perspective for DB2 Warehouse is a joint project between the Business Intelligence (BI) Development team and the Content Discovery team in the IBM® Development Laboratory Boeblingen (Germany). The main contributors are:

  • Alexander Lang, team lead, Content Discovery Solutions
  • Dennis Nienhueser, intern
  • Mathias Rueck, intern
  • Mathias Zapke, user-centered design
  • Andrea Elias, software engineer, Content Discovery
  • Sebastian Nelke, software engineer, Content Discovery
  • Silvia Mesturino, Ph.D., software test engineer, information management
  • Simone Daum, software engineer, BI Development
  • Stefan Abraham, software engineer, BI Development
  • Tong-Haing Fin, software engineer, IBM Research
  • Peter Bendel, architect, BI Development


IBM and DB2 are trademarks of IBM Corporation in the United States, other countries, or both.
Other company, product, or service names may be trademarks or service marks of others.


Related technologies

For platform(s):
Win32, Windows, Windows XP

For topics:
analysis, Data Analysis, data mining, Eclipse, Java technology, Natural Language, semantics, UIMA, utilities


 
