IBM®
Skip to main content
    United States change      Terms of use
 
 
Select a scope:    
     Home      Products      Services & industry solutions      Support & downloads      My account     
alphaWorks  >  Information management  >  

IBM Frequent Subgraph Miner

A data-mining tool for discovering all frequent substructure patterns from a set of labeled graphs.


Date Posted: February 27, 2004
OverviewRequirements Download FAQs Forum Reviews

Update: March 11, 2004

Updated version fixes bug in extracting vertex and edge label names.

What is IBM Frequent Subgraph Miner?

IBM Frequent Subgraph Miner is a data-mining tool that uses the a priori algorithm for finding related sets of items in transactional data and expands the existing algorithm by graphing structured data. An alternative way to model objects in a data set is to use graphs. In this approach, a labeled graph is used, constructed so that each vertex and edge are independently labeled. When the data set (consisting of a set of labeled graphs and the minimum support) is given as input, this tool derives all frequently-connected subgraphs whose support is more than or equal to the minimum support value in the data set.

IBM Frequent Subgraph Miner can efficiently find connected common substructures that frequently occur in a set of labeled graphs. Graphs are especially useful in databases of chemical structures in chemical compounds. If a set of compounds is given as input, this tool can efficiently find common substructures (subgraphs). The discovered patterns from a set of compounds with a certain activity can then be used to design new drugs.

This technology includes a sample data set, which consists of 340 chemical compounds; a resulting research report, which describes the algorithm used in detail; and complete instructions, which describe how to use this tool.

How does it work?

When the user gives a file consisting of a set of labeled graphs and a minimum support threshold, Frequent Subgraph Miner generates possible, frequently-occuring, connected subgraph patterns in ascending order of size and counts the frequency for each pattern. The tool puts out all connected subgraph patterns that it discovers.

The input to this mining tool is a set of labeled graphs in which each vertex and each edge have labels. Suppose a set of vertices V, a set of edges E, a set of vertex labels LV and a set of edge labels LE are provided respectively as

    V={v1,v2,...c,vk},
    E={(vi,vj)| vi,vj are elements of V, i is not equal to j},
    LV={lb(vi)| for all vi which is an element of V},
    LE={lb((vi,vj))| for all (vi,vj) which is an element of E}

Here, Graph G is expressed as G=(V,E,LV,LE). The number of vertices, |V|, is called the size of graph G. Given an undirected Graph G, if a path exists between any two vertices, G is called a connected graph. Given a set GD of labeled graphs, the support sup(P) of a graph pattern P is defined as a ratio of the number of graph data including P to the total number of graph data in the data set GD. When the data set that consists of graph structured data and the minimum support are given as input, this mining tool efficiently discovers all frequently-connected (induced) subgraphs that have the support greater than or equal to the minimum support value in the data set.


About the technology author(s):
Akihiro Inokuchi works at the IBM Tokyo Research Laboratory in Japan.

Download now Download now

Related technologies

For platform(s):
Linux

For topics:
data mining, frequent pattern


Related resources

Press Articles

 

    About IBM Privacy Contact