Text Analytics Tools and Runtime for IBM LanguageWare

An Eclipse application for building custom language analysis into IBM LanguageWare resources and their associated UIMA annotators.

Date Posted: December 7, 2006

Update: June 22, 2009
The LanguageWare Resource Workbench 7.1.1.3 contains significant improvements to its performance and memory footprint when annotating large documents and document collections. It also contains a new dictionary merge capability and some bug fixes.

What is Text Analytics Tools and Runtime for IBM LanguageWare?

IBM LanguageWare is a technology which provides a full range of text analysis functions. It is used extensively throughout the IBM product suite and is successfully deployed in solutions which focus on mining facts from large repositories of text. With support for more than 20 languages, LanguageWare is the ideal solution for extracting the value locked up in unstructured text information and exposing it to business applications. With the emerging importance of Business Intelligence and the explosion in text-based information, the need to exploit this “hidden” information has never been so great. LanguageWare technology not only provides the functionality to address this need, it also makes it easier than ever to create, manage and deploy analysis engines and their resources.

It comprises Java libraries with a large set of features and the linguistic resources that supplement them. It also comprises an easy-to-use Eclipse-based development environment for building custom text analysis applications. In a few clicks, it is possible to create and deploy UIMA (Unstructured Information Management Architecture) annotators that perform everything from simple dictionary lookups to more sophisticated syntactic and semantic analysis of texts using dictionaries, rules and ontologies.

The LanguageWare libraries provide the following non-exhaustive list of features: dictionary look-up and fuzzy look-up, lexical analysis, language identification, spelling correction, hyphenation, normalization, part-of-speech disambiguation, syntactic parsing, semantic analysis, facts/entities extraction and relationship extraction. For more details see the documentation.
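
As a rough illustration of what a simple dictionary look-up annotator produces, the following Python sketch marks dictionary terms in a text and records their character offsets. It is not LanguageWare or UIMA code: the `Annotation` class, the `annotate` function, and the sample dictionary are hypothetical names chosen for this example. The LanguageWare libraries also offer fuzzy look-up and normalization, which this sketch does not attempt.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """A labeled span of text (hypothetical structure, not a UIMA type)."""
    begin: int          # offset of the first character of the match
    end: int            # offset just past the last character of the match
    covered_text: str   # the text exactly as it appears in the document
    label: str          # the dictionary category assigned to the term


def annotate(text, dictionary):
    """Return one Annotation per case-insensitive whole-word dictionary match."""
    annotations = []
    for term, label in dictionary.items():
        # \b keeps "Dublin" from matching inside a longer word.
        pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
        for match in pattern.finditer(text):
            annotations.append(
                Annotation(match.start(), match.end(), match.group(0), label)
            )
    # Sort by position so the results follow document order.
    return sorted(annotations, key=lambda a: (a.begin, a.end))


# Illustrative data only; the entries are assumptions made for this example.
SAMPLE_DICTIONARY = {"Lotus Notes": "Product", "Dublin": "City"}
SAMPLE_TEXT = "The team in Dublin tested lotus notes."

if __name__ == "__main__":
    for annotation in annotate(SAMPLE_TEXT, SAMPLE_DICTIONARY):
        print(annotation)
```

Each result carries the matched text, its label and its begin and end offsets, which is the general shape of the information an annotator attaches to a document.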

The LanguageWare Resource Workbench provides a complete development environment for building and customizing dictionaries, rules, ontologies and associated UIMA annotators. This environment removes the need for specialist knowledge of the underlying technologies of natural language processing or UIMA. In doing so, it allows the user to focus on the concepts and relationships of interest, and to develop analyzers that extract them from text without having to write any code. The resulting application code is wrapped as UIMA annotators, which can be plugged into any UIMA-compliant application. Further information about UIMA is available at alphaWorks and on the Apache UIMA site.

LanguageWare is used in a variety of products, including Lotus Notes and Domino and Information Integrator OmniFind Edition (IBM's search technology).

The LanguageWare Resource Workbench technology runs on Windows and Linux. The core LanguageWare libraries support a much broader list of platforms. For more details on platform support please see the product documentation.

How does it work?

The LanguageWare Resource Workbench allows users to easily:

The Workbench contains the following tools:

The LanguageWare Resource Workbench documentation is available online and is also installed with the Windows or Linux installers or the respective .zip files.

What type of application is LanguageWare suitable for?

LanguageWare technology can be used in any application that makes use of text analytics. Good examples are:

For Web-based semantic querying of LanguageWare text analytics, you might want to look at another alphaWorks technology, IBM Data Discovery and Query Builder. When used together, these two technologies can provide a full range of data access services, including UI presentation, security and auditing of users, structured and unstructured data access through semantic concepts, and deep text analytics of unstructured data elements.

About the technology author(s)

LanguageWare is a worldwide organization comprising a highly qualified team of specialists with a diverse combination of backgrounds: linguists, computer scientists, mathematicians, cognitive scientists, physicists, and computational linguists. This team is responsible for developing innovative Natural Language Processing technology for IBM Software Group.

LanguageWare, along with the LanguageWare Resource Workbench, is a collaborative project combining skills, technologies, and ideas gathered from various IBM product teams and the IBM Research division.
