
CliniMiner

A demonstration of an undirected data-mining method for detecting unexpected relationships in large data sets.


Date Posted: May 6, 2004
Overview

What is CliniMiner?

CliniMiner is a demonstration of an undirected data-mining method for detecting unexpected relationships in large data sets. The tool discovers and predicts unexpected qualitative and quantitative phenomena by unsupervised (that is, undirected) data mining in situations where the number of combinations of items to examine explodes. "Explodes" means that exhaustive combinations of statistical tests or directed queries could take millions of years, in some cases far longer. Such difficulties arise particularly in the analysis and use of genomic and clinical data, and they represent a major bottleneck in information-based medicine. An extensive archive of records with a mere 100 parameters or columns could, in the worst case, contain 10 to the power of 29 combinations, any of which could appear as a significant "rule" in data-mining output.
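The scale of that explosion can be checked with a few lines of arithmetic. The page does not say how the figure of 10 to the power of 29 is counted; one reading that gives the same order of magnitude is the number of ways to choose 50 of 100 columns, which is roughly 1.0 x 10^29. The sketch below computes that value and the total number of non-empty column subsets, then estimates the time needed to test them all. The throughput of one billion tests per second is an assumption made only for illustration.

```python
import math

N_COLUMNS = 100  # from the text: 100 parameters or columns

# Largest single class of combinations: choose 50 of the 100 columns.
largest_class = math.comb(N_COLUMNS, N_COLUMNS // 2)

# Every non-empty subset of the columns.
all_subsets = 2 ** N_COLUMNS - 1

# Hypothetical throughput, assumed for illustration only.
TESTS_PER_SECOND = 1_000_000_000
SECONDS_PER_YEAR = 365.25 * 24 * 3600

# Time to test only the largest class exhaustively at the assumed rate.
years_needed = largest_class / TESTS_PER_SECOND / SECONDS_PER_YEAR

print(f"C(100, 50)     = {largest_class:.3e}")
print(f"2**100 - 1     = {all_subsets:.3e}")
print(f"years at 1e9/s = {years_needed:.3e}")
```

Even at the assumed rate, the exhaustive approach would take on the order of 10^12 years, which is why an undirected method needs shortcuts.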

In contrast, if one were simply to test a hypothesis in the classical way (that is, if one suspected in advance what was interesting), it could be tested in seconds by statistical tests or by tools such as DiscoveryLink; CliniMiner therefore complements these kinds of methods. Pruning heuristics and other shortcuts are possible in CliniMiner and continue to be developed. For example, "rules" are not even explored in CliniMiner if the abundance of the component items or events predicts that the "rules" would have insufficient information content. The output data are estimates, and the bias is in favor of not missing a potentially useful discovery, so classical methods must then be applied to verify the discoveries.
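The abundance-based pruning described above can be sketched as follows. This is an illustrative stand-in, not CliniMiner's own criterion, which the page does not specify. The assumption here is that an itemset is skipped when the co-occurrence count expected from the abundances of its items alone (that is, under independence) falls below a threshold. The names `candidate_rules`, `max_size`, and `min_expected`, and the sample records, are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def candidate_rules(records, max_size=3, min_expected=1.0):
    """Yield (itemset, expected_count) for itemsets worth examining.

    Assumed criterion (illustrative only): skip an itemset when the
    co-occurrence count expected under independence is below min_expected.
    Only single-item abundances are needed, so pruned itemsets never
    require a scan of the records.
    """
    n = len(records)
    counts = Counter(item for rec in records for item in set(rec))
    items = sorted(counts)
    for size in range(2, max_size + 1):
        for itemset in combinations(items, size):
            # n * prod(count_i / n), written to keep the arithmetic simple.
            expected = math.prod(counts[i] for i in itemset) / n ** (size - 1)
            if expected >= min_expected:
                yield itemset, expected


# Hypothetical records of unequal length.
records = [
    {"smoker", "hypertension"},
    {"smoker", "hypertension", "diabetes"},
    {"smoker"},
    {"hypertension"},
    {"rare_variant"},
    {"diabetes", "hypertension"},
]

kept = dict(candidate_rules(records))
for itemset, expected in kept.items():
    print(itemset, round(expected, 2))
```

In this sample, every itemset containing `rare_variant` is dropped before any co-occurrence is counted, because a single occurrence cannot support an informative "rule" under the assumed threshold. A cutoff like this trades completeness for speed, so survivors would still need verification by classical methods.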

This tool combines aspects of data mining, information theory, and number theory (and even ultimately quantum theory) that directly address the hot mathematical topic known as "The Theory of Expected Information," more recently referred to as "Zeta Theory" (ZT).

How does it work?

CliniMiner includes the following features:

  • Enables undirected/unsupervised data mining against heterogeneous, wide, multidimensional data (numeric, structured, and unstructured) for combinatorial analyses, which is one of the critical unmet industry needs, especially for pharmacogenomic and clinical genomic research and related areas.
  • Enables studies on negative associations as well as positive associations and covariances against multidimensional, heterogeneous data.
  • Handles high-dimensional, nonrectangular, sparse data such as lists, sequences, sets, and collections, as well as rectangular data such as tables, spreadsheets, and multidimensional arrays.
  • Augments other data-mining tools, such as iMiner, by enabling positive/negative association and covariance studies against multidimensional, structured/unstructured, and sparse data while coping with combinatorial explosion.
  • Feeds its analysis results, produced in various formats (for example, XML or tabular), to other applications, such as Spotfire and SAS, for subsequent analyses.
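To make the idea of positive and negative associations over nonrectangular data concrete, the following minimal sketch scores item pairs in records of unequal length and prints the result as tab-separated rows. It is not CliniMiner's measure, which the page does not give. The score used here, the natural log of observed over expected co-occurrence with add-one smoothing, is an assumption chosen for simplicity, and `pair_associations` and the sample records are hypothetical.

```python
import math
from itertools import combinations


def pair_associations(records):
    """Return rows (item_a, item_b, observed, expected, log_ratio).

    log_ratio > 0 suggests a positive association and log_ratio < 0 a
    negative one. Add-one smoothing (an assumption) keeps the logarithm
    finite when a pair is never observed together.
    """
    n = len(records)
    sets = [set(r) for r in records]
    items = sorted(set().union(*sets))
    count = {i: sum(i in s for s in sets) for i in items}
    rows = []
    for a, b in combinations(items, 2):
        observed = sum(a in s and b in s for s in sets)
        expected = count[a] * count[b] / n  # under independence
        log_ratio = math.log((observed + 1) / (expected + 1))
        rows.append((a, b, observed, expected, log_ratio))
    return rows


# Hypothetical sparse, nonrectangular input: lists of differing lengths.
records = [
    ["A", "B"],
    ["A", "B", "D"],
    ["C"],
    ["C", "D"],
    ["A", "B"],
    ["C"],
]

rows = pair_associations(records)
print("item_a\titem_b\tobserved\texpected\tlog_ratio")
for a, b, observed, expected, log_ratio in rows:
    print(f"{a}\t{b}\t{observed}\t{expected:.2f}\t{log_ratio:+.3f}")
```

In the sample, A and B always occur together and score positive, while A and C never co-occur and score negative. Tab-separated rows are one simple tabular form that downstream tools can typically read.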

Note: The former name for CliniMiner was FANO; this working name still appears in the demo.


About the technology author(s):
Barry Robson, B.Sc. (Hons), Ph.D., D.Sc. (IBM Distinguished Engineer), is Strategic Advisor and PIC Chair to the Computational Biology Center and in matters of medicine at IBM's T. J. Watson Research Center, Yorktown Heights, NY. There he played a key role in proposals leading to IBM's DiscoveryLink, Blue Gene protein science, and Secure Health and Medical Access Network (S.H.A.M.A.N.) projects.


Related technologies

For platform(s):
Windows 95, Linux, UNIX

For topics:
data mining, life sciences

