Text Analytics Tools and Runtime for IBM LanguageWare
An Eclipse application for building custom language analysis into IBM LanguageWare resources and their associated UIMA annotators.
Date Posted: December 7, 2006
The best way to get started with LanguageWare is to install LanguageWare Resource Workbench and follow the steps outlined in the Getting Started Guide. This gives a great introduction to the capabilities of our products and helps get users up and running immediately.
You can contact the LanguageWare team by e-mail to ask questions, obtain the latest dictionaries, or request prices for commercial licenses. We appreciate any feedback about our products and will try to help as much as possible. However, because our focus is on supporting commercial users of our products, we have only limited resources to help alphaWorks® users. Please be patient: We try to answer all questions, but it might take some time.
The LanguageWare Runtime, all the code contained within the package, and the generated Annotator code are provided on alphaWorks for evaluation purposes only, as a complimentary download for a 90-day trial period. The purpose of this alphaWorks download is to allow you to evaluate the technology, to get a feeling for how it works and whether it might be useful to you, and to share with us your feedback and suggestions on how we could improve the technology in order to speed up its development.
This version
- integrates state-of-the-art part-of-speech (POS) tagging capabilities for English, Japanese, and Chinese
- provides users with an easy-to-use interface for developing sophisticated shallow parsing rules. These rules enable context-sensitive text analysis, which identifies facts, entities, and relationships in text.
- provides users with an easy-to-use interface for developing dictionaries
- allows dictionaries, the POS tagger, and shallow parsing rules to be packaged and deployed as UIMA annotators
- allows users to generate dictionaries in a new format, which improves both performance and size
- contains an improved format for multi-word unit dictionaries, which provides new capabilities in fuzzy matching.
Please consult the relevant documentation for more detailed information.
The Workbench generates the UIMA annotator code, configuration files, and LanguageWare resources required. The deployment packaging is not yet automated; instructions for extracting the required files are included in the Getting Started Guide. However, the alphaWorks license allows you to use this code for evaluation purposes only.
LanguageWare annotators are compatible with Apache UIMA. The annotators have been tested against Apache UIMA, Version 2.1. They should work with newer versions of Apache UIMA; however, they have not been extensively tested for compatibility.
The LanguageWare annotators are not compatible with versions of UIMA prior to 2.1. Those earlier versions were released by IBM and have a namespace conflict with Apache UIMA.
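The version constraint above can be expressed as a simple guard. The following is a minimal sketch only: the class name `UimaVersionCheck` and its methods are hypothetical and are not part of LanguageWare or UIMA, and the version string is assumed to be supplied by the caller in the usual `major.minor.patch` form. A passing check means only that the version is not older than 2.1; as noted above, newer versions have not been extensively tested.

```java
// Hypothetical helper, not part of LanguageWare or UIMA: checks a UIMA
// version string against the 2.1 minimum stated in the text above.
final class UimaVersionCheck {

    static final int MIN_MAJOR = 2;
    static final int MIN_MINOR = 1;

    // Returns true when the version is 2.1 or newer.
    static boolean isSupported(String version) {
        String[] parts = version.trim().split("\\.");
        int major = leadingInt(parts[0]);
        int minor = parts.length > 1 ? leadingInt(parts[1]) : 0;
        return major > MIN_MAJOR || (major == MIN_MAJOR && minor >= MIN_MINOR);
    }

    // Parses the leading digits of a version component, ignoring any suffix
    // such as "-incubating".
    static int leadingInt(String s) {
        int end = 0;
        while (end < s.length() && Character.isDigit(s.charAt(end))) {
            end++;
        }
        if (end == 0) {
            throw new IllegalArgumentException("No version number in: " + s);
        }
        return Integer.parseInt(s.substring(0, end));
    }

    public static void main(String[] args) {
        // The version string is an assumed input for illustration.
        String version = args.length > 0 ? args[0] : "2.1.0";
        System.out.println(version + " supported: " + isSupported(version));
    }
}
```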
A variety of documentation is embedded as part of the package:
- context-sensitive, online help that describes the features (embedded in the Workbench under Help\Help Contents)
- Getting Started Guide, which walks you through use of LanguageWare Resource Workbench
- Advanced Guide, which covers more advanced topics such as generating dictionaries based on XML files
- LanguageWare UIMA Annotator Guide, which describes the UIMA Annotator that generates both standard LanguageWare annotation and custom annotations based on dictionary content as built by the Workbench.
More detailed information about the underlying APIs will be provided for fully licensed users of the technology.
We built the Workbench on Eclipse because it provides a collaborative framework through which we can share components with other product teams across IBM, with our partners, and with our customers. This version of the Workbench is a complete, stand-alone application. However, users can still get the benefits of the Eclipse IDE by installing Eclipse features into the Workbench. Popular features include the Eclipse CVS feature for managing shared projects and the Eclipse XML feature for full XML editing support. See the Eclipse online help for more information about finding and installing new features.
Because this is an alphaWorks release, you might encounter some problems or limitations. Please refer to the ReleaseNote.htm file (located in the directory where the Workbench was installed) for the full list of known limitations.
The Runtime package includes
- lib folder: all the libraries needed in order to use LanguageWare
- IBM-dictionaries folder: the latest lexical analysis dictionaries for a number of languages. These are just a small sample; LanguageWare supports many other languages.
- doc and JavaDoc™ folders: documentation covering advanced information about using LanguageWare
- samples folder: samples showing how LanguageWare can be used
- license folder: license for using components in the Runtime package
LanguageWare provides many run-time libraries. Although each of these libraries provides discrete functionality, many libraries build on the functionality provided by the core LanguageWare libraries. The following is a non-exhaustive list of the libraries and their functions.
- dlt.jar, rule_dlt.jar, and icu4j.jar: provide core functionality, such as lexical analysis, dictionary look-up, and spelling correction
- tagger_dlt.jar: provides part-of-speech tagging; requires the lexical analysis libraries mentioned above
- dltls.jar: provides support for ontology-based semantic analysis of documents
- an_dlt.jar, an_tagger_dlt.jar: used for running LanguageWare annotators in a UIMA pipeline
- jfst.jar, antlr.jar: used for running the rule-based annotator
- jdemo.jar: used by the sample applications
- DictionaryBuilder.jar: used to build LanguageWare dictionaries from the command line. Building dictionaries using LanguageWare Resource Workbench is recommended instead. Several supporting JAR files are required; they are included in the lib directory.
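As an illustration of how the libraries listed above might be assembled onto a Java classpath, the sketch below builds a classpath string for the core lexical analysis JARs, optionally adding the tagger JAR that depends on them, and reports any that are missing from the lib folder. The JAR names come from the list above; the class name `LwClasspath`, its methods, and the default `lib` location are assumptions made for illustration and are not part of the LanguageWare API.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of LanguageWare: builds a classpath string
// from the JAR names listed above and reports any that are missing.
final class LwClasspath {

    // Core lexical analysis JARs, as named in the list above.
    static final List<String> CORE =
            Arrays.asList("dlt.jar", "rule_dlt.jar", "icu4j.jar");

    // The POS tagger JAR, which requires the core JARs.
    static final List<String> TAGGER = Arrays.asList("tagger_dlt.jar");

    // Returns the core JARs, followed by the tagger JAR when requested.
    static List<String> requiredJars(boolean withTagger) {
        List<String> jars = new ArrayList<>(CORE);
        if (withTagger) {
            jars.addAll(TAGGER);
        }
        return jars;
    }

    // Joins the JAR names into a classpath string rooted at libDir.
    static String classpath(String libDir, List<String> jars) {
        List<String> paths = new ArrayList<>();
        for (String jar : jars) {
            paths.add(new File(libDir, jar).getPath());
        }
        return String.join(File.pathSeparator, paths);
    }

    // Lists the JARs from the given set that are not present in libDir.
    static List<String> missing(String libDir, List<String> jars) {
        List<String> absent = new ArrayList<>();
        for (String jar : jars) {
            if (!new File(libDir, jar).isFile()) {
                absent.add(jar);
            }
        }
        return absent;
    }

    public static void main(String[] args) {
        String libDir = args.length > 0 ? args[0] : "lib"; // assumed location
        List<String> jars = requiredJars(true);
        System.out.println("Classpath: " + classpath(libDir, jars));
        System.out.println("Missing:   " + missing(libDir, jars));
    }
}
```

Checking for missing files before launching can give a clearer message than a class-loading failure at run time, which is the only reason the `missing` method is included here.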
Several dictionaries are included in the run-time environment. The latest official lexical analysis dictionaries are included in the IBM-dictionaries folder. In addition, dictionaries required for running the sample applications are stored in the samples\SampleDictionaries folder. Users can request the most recent dictionaries (if they are not present in the IBM-dictionaries folder) by contacting the LanguageWare team.
The LanguageWare Runtime libraries can be used as part of a Java application. Several sample applications are included in the samples directory. They illustrate some typical uses of the libraries.
The libraries can also be used to create custom annotators based on LanguageWare. The preferred way to generate these annotators is by using LanguageWare Resource Workbench. However, it is possible to manually create LanguageWare-based UIMA annotators. UIMA requires XML descriptors for annotators. If you have downloaded LanguageWare Resource Workbench, you can use it to generate a PEAR file based on the sample rule-based annotator. This PEAR file contains descriptors for the core LanguageWare annotator, the POS Tagger annotator, and the rule-based annotator, as well as a descriptor for running all these together. The PEAR file also contains the libraries and resources required for running the annotators. See the Getting Started Guide for information about generating and installing a PEAR file.
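Because a PEAR file is a zip-format archive, its descriptors and libraries can be inspected with the JDK alone before it is installed. The sketch below is a minimal illustration under stated assumptions: the class name `PearContents` and its methods are hypothetical, and the `desc/` and `lib/` directory names and the `metadata/install.xml` entry follow the usual UIMA PEAR layout. Whether a Workbench-generated PEAR file uses exactly these directories is an assumption, not something confirmed here.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical helper, not part of LanguageWare or UIMA: lists what a PEAR
// archive contains using only the JDK's zip support.
final class PearContents {

    // Returns the names of all file (non-directory) entries in the archive.
    static List<String> entryNames(String pearPath) {
        List<String> names = new ArrayList<>();
        try (ZipFile zip = new ZipFile(pearPath)) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                ZipEntry entry = entries.nextElement();
                if (!entry.isDirectory()) {
                    names.add(entry.getName());
                }
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return names;
    }

    // Keeps entries under the given top-level directory with the given suffix.
    static List<String> select(List<String> names, String dir, String suffix) {
        List<String> matches = new ArrayList<>();
        for (String name : names) {
            if (name.startsWith(dir + "/") && name.toLowerCase().endsWith(suffix)) {
                matches.add(name);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        if (args.length != 1) {
            System.err.println("usage: java PearContents <file.pear>");
            return;
        }
        List<String> names = entryNames(args[0]);
        // desc/ and lib/ are the conventional PEAR directories (assumed here).
        System.out.println("Descriptors: " + select(names, "desc", ".xml"));
        System.out.println("Libraries:   " + select(names, "lib", ".jar"));
        System.out.println("Has metadata/install.xml: "
                + names.contains("metadata/install.xml"));
    }
}
```

This only lists the archive contents; installing and running the annotators is covered by the Getting Started Guide.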
The documentation for the run-time environment can be found in the doc folder of LW70.zip. This file contains the User Guide for LanguageWare in HTML and PDF formats. The Javadoc folder contains documentation on the LanguageWare APIs. The documentation for LanguageWare Resource Workbench might also be of use.
Java and all Java-based trademarks are trademarks of Sun Microsystems, Inc. in the United States, other countries, or both.
IBM, LanguageWare, and alphaWorks are trademarks of IBM Corporation in the United States, other countries, or both.
Other company, product, or service names may be trademarks or service marks of others.
For platform(s):
Java
For topics:
Administration, analysis, business intelligence, data mining, Eclipse, globalization, Java technology, Natural Language, Parsers, Search, semantics, UIMA, XML