Date Posted: April 19, 2007
What is IBM Unstructured Information Modeler?
IBM® Unstructured Information Modeler is designed for data analysts working with unstructured data sets. One example of such a data set is a log of problem tickets from a computer help desk. The analyst's task is to identify the most common problems and to write or find solutions that resolve them automatically. IBM Unstructured Information Modeler supports this task by automatically classifying the unstructured data set and providing insight into the resulting categories. The user can then modify the automatically created categorization to incorporate domain knowledge, so that the categories make more sense for the problem at hand. Once the classification is complete, the user can generate reports and create a classification engine for categorizing new problem tickets. In addition, IBM Unstructured Information Modeler can analyze trends by day, week, or month, and can analyze correlations against a user-supplied categorical feature.
This version of the software supports only data sets containing between 1,000 and 10,000 examples, where each example consists of between 1 and 20 sentences of unstructured text. Optionally, each example can include a creation date and one or more categorical values (such as machine type). IBM Unstructured Information Modeler is written in 100% pure Java™, so it can run on many different platforms. For simplicity, this description assumes a Microsoft® Windows® platform. IBM Unstructured Information Modeler runs on the Java 1.4 Runtime Environment.
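Before loading a data set, it can help to check it against the size limits above. The following is a minimal sketch, not part of the product: the class name `DatasetCheck`, its methods, the assumption that both bounds are inclusive, and the punctuation-based sentence counter are all hypothetical illustrations. The tool's own sentence detection is not documented here.

```java
import java.util.List;

// Hypothetical pre-flight check against the stated data-set limits.
// Not an IBM API; bounds are assumed to be inclusive.
class DatasetCheck {
    static final int MIN_EXAMPLES = 1000;
    static final int MAX_EXAMPLES = 10_000;
    static final int MIN_SENTENCES = 1;
    static final int MAX_SENTENCES = 20;

    // Rough heuristic (an assumption): a sentence ends at '.', '!' or '?'
    // followed by whitespace or the end of the text.
    static int countSentences(String text) {
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return 0;
        }
        return trimmed.split("[.!?]+(\\s+|$)").length;
    }

    // Returns true if the number of examples and the sentence count of
    // every example fall within the assumed inclusive limits.
    static boolean fits(List<String> examples) {
        if (examples.size() < MIN_EXAMPLES || examples.size() > MAX_EXAMPLES) {
            return false;
        }
        for (String example : examples) {
            int n = countSentences(example);
            if (n < MIN_SENTENCES || n > MAX_SENTENCES) {
                return false;
            }
        }
        return true;
    }
}
```

For example, a list of 1,000 tickets that each read "Printer jams. Paper stuck!" passes the check, while a list of 999 such tickets does not.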
How does it work?
The technology is based on "mixed-initiative" data-mining techniques that allow the analyst to provide input at every phase of the mining process. The technology is fully described in a forthcoming book from IBM Press: Mining the Talk: Unlocking the Business Value in Unstructured Information.
About the technology author(s)
W. Scott Spangler is a senior technical staff member who has been researching knowledge bases and data mining for the past 20 years; he has been with the IBM Research Lab since 1996. In 1992, he won the prestigious "Boss" Kettering award for technical achievement. He currently works in IBM Almaden Services Research, where he designs and implements new methods for data visualization and text mining. Mr. Spangler holds a B.S. in mathematics from the Massachusetts Institute of Technology and an M.A. in computer science from the University of Texas.
Jeffrey T. Kreulen is Senior Manager of Services-Oriented Technologies and a senior technical staff member at the IBM Almaden Research Center. He holds a B.S. in applied mathematics (computer science) from Carnegie Mellon University, and an M.S. in electrical engineering and a Ph.D. in computer engineering, both from Pennsylvania State University. Since joining IBM in 1992, Dr. Kreulen has worked on multiprocessor systems design and verification, operating systems, systems management, Web-based service delivery, integrated text and data analysis, and the science of services.
