
IBM Tool for Interactive Text Classification and Labeling

An interactive interface that enables you to create, validate, train, and refine a text classification system.


Date Posted: September 11, 2007

Platform requirements

Operating systems: Windows® XP (tested)

Software:

Installation instructions

The example data set bundled in this package is the Reuters-21578 text classification benchmark data set, hereafter referred to as reuters.

  1. Unzip the zip file into a directory, such as C:\TICL. The following steps assume you have unzipped the file into C:\TICL. If you unzip to a different location, change the paths in the following steps accordingly.
  2. Review the directory structure: Your C:\TICL directory will contain the following five subdirectories: configs, data, lib, new, and scripts.
    • C:\TICL\configs: This subdirectory contains all the configuration files, under the directory reuters. For each new classification task, create a directory here and put all the required configuration files in it.
    • C:\TICL\data: Inside data, the directory reuters contains the training and test data for this classification task. Both traindata and testdata contain ten directories named after the topics; these directories contain the actual files. These directories also contain a file, classLabels.txt, that lists the class labels. Any new classification task must follow the same structure.
    • C:\TICL\lib: This subdirectory contains required JAR files.
    • C:\TICL\new: This subdirectory contains the files to be deployed in the Web server.
    • C:\TICL\scripts: This subdirectory contains the scripts for running different modules of the system.
  3. Download and configure dependent software, as follows:
    1. Make sure that a Java JDK, Version 1.4 or later, is on the PATH.
    2. Download UIMA and copy the following JAR files from UIMA\lib to C:\TICL\lib:
      • juru.jar
      • siapi.jar
      • uima_adapter_soap.jar
      • uima_adapter_vinci.jar
      • uima_core.jar
      • uima_cpe.jar
      • uima_examples.jar
      • uima_jcas_builtin_types.jar
      • uima_search.jar
      • uima_tools.jar
    3. Download and unpack Xerces (we have tested with Version 2.9). Copy xercesImpl.jar and xml-apis.jar to C:\TICL\lib.
    4. Install and configure Tomcat (or another Web server):
      1. Download and install Tomcat 5.5 (or any other Web server). We assume Tomcat is installed in C:\Tomcat. If it is installed elsewhere, change the following paths accordingly.
      2. Copy C:\TICL\lib\ii.jar to the lib folder of the Web server. In our case, it will be C:\Tomcat\common\lib.
      3. Create a file called root.txt in C:\Tomcat. Type the location of the data directory (in our case, C:\TICL\data) in root.txt; then save and close the file.
      4. Copy the directory C:\TICL\new to C:\Tomcat\webapps\new.
      5. Restart the Web server.
  4. Run the initializing test scripts from the command line, as follows:
    1. Run C:\TICL\scripts\train.bat. Because the data set has ten classes, this script creates ten files (test0.txt to test9.txt) in C:\TICL. It also creates a directory xcas_output in C:\TICL\configs, and dictionary.bin and protrain in C:\TICL\data\reuters. In addition, it creates a directory models in C:\TICL\data\reuters containing 20 files (features.selected0 to features.selected9 and nbm.mdl0 to nbm.mdl9), and then adds ten more files (platt0.txt to platt9.txt) to that models directory.
    2. Run C:\TICL\scripts\test.bat. There are many test files, so it might take a while to complete. (It took 90 seconds on a 2.13-GHz machine with 1.5 GB of RAM.) This script creates two files, profact and lists, in C:\TICL\data\reuters.
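
The files and directories named in the steps above can be checked with a short script. The following Python sketch is not part of the package. It is a minimal illustration that assumes the default C:\TICL location, and its helper names (expected_install_paths, expected_train_outputs, expected_test_outputs, missing) are hypothetical. It only tests whether each path exists; it does not inspect file contents.

```python
from pathlib import Path

# All file and directory names below are taken from the installation steps.
# The helper functions themselves are hypothetical and not part of the tool.
SUBDIRS = ["configs", "data", "lib", "new", "scripts"]
COPIED_JARS = [
    "juru.jar", "siapi.jar", "uima_adapter_soap.jar", "uima_adapter_vinci.jar",
    "uima_core.jar", "uima_cpe.jar", "uima_examples.jar",
    "uima_jcas_builtin_types.jar", "uima_search.jar", "uima_tools.jar",
    "xercesImpl.jar", "xml-apis.jar",
]
N_CLASSES = 10  # the reuters example is a ten-class data set


def expected_install_paths(root):
    """Subdirectories from step 2 plus the JAR files copied in step 3."""
    root = Path(root)
    paths = [root / name for name in SUBDIRS]
    paths += [root / "lib" / jar for jar in COPIED_JARS]
    return paths


def expected_train_outputs(root, n_classes=N_CLASSES):
    """Paths that train.bat is described as creating."""
    root = Path(root)
    task = root / "data" / "reuters"
    models = task / "models"
    paths = [root / f"test{i}.txt" for i in range(n_classes)]
    paths.append(root / "configs" / "xcas_output")
    paths += [task / "dictionary.bin", task / "protrain"]
    for i in range(n_classes):
        paths.append(models / f"features.selected{i}")
        paths.append(models / f"nbm.mdl{i}")
        paths.append(models / f"platt{i}.txt")
    return paths


def expected_test_outputs(root):
    """Paths that test.bat is described as creating."""
    task = Path(root) / "data" / "reuters"
    return [task / "profact", task / "lists"]


def missing(paths):
    """Return the paths that do not exist (as a file or a directory)."""
    return [p for p in paths if not p.exists()]


if __name__ == "__main__":
    ticl_root = r"C:\TICL"  # assumption: change this if you unzipped elsewhere
    checks = [
        ("install layout", expected_install_paths(ticl_root)),
        ("train.bat outputs", expected_train_outputs(ticl_root)),
        ("test.bat outputs", expected_test_outputs(ticl_root)),
    ]
    for label, paths in checks:
        absent = missing(paths)
        status = "OK" if not absent else f"{len(absent)} missing"
        print(f"{label}: {status}")
        for p in absent:
            print(f"  {p}")
```

Run the script after step 3 to confirm the layout and the copied JAR files, and again after step 4 to confirm the outputs of train.bat and test.bat. A missing path suggests that the corresponding step did not complete.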

For usage instructions, please see the included readme.txt file.


Java and all Java-based trademarks are trademarks of Sun Microsystems, Inc. in the United States, other countries, or both.
Microsoft, Windows, Windows NT, and the Windows logo are trademarks of Microsoft Corporation in the United States, other countries, or both.
IBM and alphaWorks are trademarks of IBM Corporation in the United States, other countries, or both.
Other company, product, or service names may be trademarks or service marks of others.


For platform(s):
Windows XP

For topics:
Data mining, Databases and data management, Modeling, Utilities


 
