8 research outputs found

An Efficient Algorithm for Enumerating Chordless Cycles and Chordless Paths

Author: A. Inokuchi
A.T. Balaban
D. Eppstein
E. Tomita
G.M. Downs
H. Satoh
K. Makino
M. Wild
R.C. Read
S. Kapoor
T. Asai
T. Hanser
T. Uno
Publication date: 01/01/2014

A chordless cycle (induced cycle) C of a graph is a cycle without any chord, meaning that there is no edge outside the cycle connecting two vertices of the cycle. A chordless path is defined similarly. In this paper, we consider the problems of enumerating chordless cycles/paths of a given graph G=(V,E), and propose algorithms taking O(|E|) time for each chordless cycle/path. In existing studies, the problems had not been deeply studied in the theoretical computer science area, and no output-polynomial-time algorithm had been proposed. Our experiments showed that the computation time of our algorithms is constant per chordless cycle/path for non-dense random graphs and real-world graphs. They also showed that the number of chordless cycles is much smaller than the number of cycles. We applied the algorithm to the prediction of NMR (Nuclear Magnetic Resonance) spectra, and increased the accuracy of the prediction.
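
The definition of a chordless cycle in this abstract can be illustrated with a small check. The sketch below is not the paper's enumeration algorithm; it only tests whether one given vertex sequence is an induced cycle. The function name `is_chordless_cycle` and the adjacency-set representation `adj` are assumptions made for illustration.

```python
def is_chordless_cycle(adj, cycle):
    """Return True if `cycle` (distinct vertices in cyclic order) is a
    chordless (induced) cycle of the undirected graph `adj`.

    `adj` maps each vertex to the set of its neighbours (hypothetical
    representation chosen for this sketch).
    """
    n = len(cycle)
    if n < 3 or len(set(cycle)) != n:
        return False
    pos = {v: i for i, v in enumerate(cycle)}
    for i, v in enumerate(cycle):
        # Consecutive vertices on the cycle must be joined by an edge.
        if cycle[(i + 1) % n] not in adj[v]:
            return False
        # Any neighbour of v that lies on the cycle must be its
        # predecessor or successor; otherwise that edge is a chord.
        for w in adj[v]:
            if w in pos and (pos[w] - i) % n not in (1, n - 1):
                return False
    return True


# A 4-cycle 0-1-2-3 with the extra edge 0-2 acting as a chord.
square_with_diagonal = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1, 3}, 3: {0, 2}}
square = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {0, 2}}
```

With these two graphs, the sequence `[0, 1, 2, 3]` is chordless in `square` but not in `square_with_diagonal`, while the triangle `[0, 1, 2]` is chordless in `square_with_diagonal`.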

arXiv.org e-Print Archive

Crossref

Mining substructures in protein data

Author: Chang Elizabeth
Dillon Tharam S.
Hadzic Fedja
Sidhu Amandeep
Tan H.
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2006

In this paper we consider the 'Prions' database that describes protein instances stored for Human Prion Proteins. The Prions database can be viewed as a database of rooted ordered labeled subtrees. Mining frequent substructures from tree databases is an important task, and it has gained a considerable amount of interest in areas such as XML mining, bioinformatics, and Web mining. This has given rise to the development of many tree mining algorithms, which can aid in structural comparisons, association rule discovery, and the general mining of tree-structured knowledge representations. Previously we developed the MB3 tree mining algorithm, which, given a minimum support threshold, efficiently discovers all frequent embedded subtrees from a database of rooted ordered labeled subtrees. In this work we apply the algorithm to the Prions database in order to extract the frequently occurring patterns, which in this case are of the induced subtree type. Obtaining the set of frequent induced subtrees from the Prions database can potentially reveal some useful knowledge. This aspect will be demonstrated by providing an analysis of the extracted frequent subtrees with respect to discovering interesting protein information. Furthermore, the minimum support threshold can be used as the controlling factor for answering specific queries posed on the Prions dataset. This approach is shown to be a viable technique for mining protein data.

CiteSeerX

espace@Curtin

PORSCHE: Performance ORiented SCHEma mediation

Author: Batini
Bernstein
Do
Doan
Ela Hunt
Khalid Saleem
Pluempitiwiriyawej
Rahm
Shvaiko
Zaki
Zohra Bellahsene
Publication venue: 'Elsevier BV'

Crossref

Discovering Frequent Substructures In Large Unordered Trees

Author: Hiroki Arimura
Shin-ichi Nakano
Takeaki Uno
Tatsuya Asai
Publication venue: Springer-Verlag
Publication date: 01/01/2003

In this paper, we study a data mining problem of discovering frequent substructures in a large collection of semi-structured data, where both the patterns and the data are modeled by labeled unordered trees. An unordered tree is a directed acyclic graph with a specified node called the root, and all nodes but the root have at most one parent. Each node is labeled by a symbol drawn from an alphabet. Such unordered trees can be seen as either a generalization of itemsets in relational databases or an efficient specialization of attributed graphs in graph mining. They are also useful in various applications such as the analysis of chemical compounds and the mining of hyperlink structures in the Web. Introducing novel definitions of the support and the canonical form for unordered trees, we present an efficient algorithm called Unot that computes all labeled unordered trees appearing in a collection of data trees with frequency above a user-specified threshold. We prove that the algorithm enumerates each frequent pattern T in O(kb^2n) time per pattern, where k is the size of T, b is the branching factor of the data tree, and n is the total number of occurrences of T in the data trees. The keys of the algorithm are the efficient enumeration of all unordered trees in canonical form and the incremental computation of the occurrences, based on a powerful design technique known as reverse search.
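
The role of a canonical form for unordered trees can be sketched briefly: two trees that differ only in sibling order should map to the same representative, so that each pattern is counted once. The sketch below uses a simple sorted-children encoding, which is a stand-in technique and not the canonical form defined in the paper. The function name `canonical` and the `(label, children)` tuple representation are assumptions made for illustration.

```python
def canonical(tree):
    """Return an order-independent encoding of a labeled rooted tree.

    `tree` is a pair (label, children), where `children` is a list of
    trees in the same format (hypothetical representation). Children
    are encoded recursively and then sorted, so any two trees that are
    equal up to reordering of siblings receive the same encoding.
    """
    label, children = tree
    return (label, tuple(sorted(canonical(child) for child in children)))


# Two drawings of the same unordered tree, with siblings swapped.
t1 = ("a", [("b", []), ("c", [("d", [])])])
t2 = ("a", [("c", [("d", [])]), ("b", [])])
# A genuinely different tree: "d" hangs under "b" instead of "c".
t3 = ("a", [("b", [("d", [])]), ("c", [])])
```

Here `canonical(t1) == canonical(t2)` holds, while `canonical(t3)` differs from both. Sorting requires the labels to be mutually comparable, for example all strings.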

CiteSeerX

Discovering Frequent Substructures in Large Unordered Trees

Author: Arimura Hiroki
Asai Tatsuya
Nakano Shin-ichi
Uno Takeaki
中野眞一
宇野毅明
有村博紀
浅井達哉
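Publication venue: Department of Informatics, Faculty of Information Science and Electrical Engineering, Kyushu University (九州大学大学院システム情報科学研究院情報理学部門)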

O(kb^2n) per pattern, where k is the size of T, b is the branching factor of the data tree, and n is the total number of occurrences of T in the data trees. The keys of the algorithm are the efficient enumeration of all unordered trees in canonical form and the incremental computation of the occurrences, based on a powerful design technique known as reverse search.

Institutional Repositories DataBase (IRDB)

Managing and analyzing phylogenetic databases

Author: Deepak Akshay
Publication venue: Iowa State University Digital Repository
Publication date: 01/01/2013

The ever-growing availability of phylogenomic data makes it increasingly possible to study and analyze phylogenetic relationships across a wide range of species. Indeed, current phylogenetic analyses are now producing enormous collections of trees that vary greatly in size. Our proposed research addresses the challenges posed by storing, querying, and analyzing such phylogenetic databases. Our first contribution is the further development of STBase, a phylogenetic tree database consisting of a billion trees whose leaf sets range from four to 20000. STBase applies techniques from different areas of computer science for efficient tree storage and retrieval. It also introduces new ideas that are specific to tree databases. STBase provides a unique opportunity to explore innovative ways to analyze the results from queries on large sets of phylogenetic trees. We propose new ways of extracting consensus information from a collection of phylogenetic trees. Specifically, this involves extending the maximum agreement subtree problem. We greatly improve upon an existing approach based on frequent subtrees and propose two new approaches based on agreement subtrees and frequent subtrees, respectively. The final part of our proposed work deals with the problem of simplifying multi-labeled trees and handling rogue taxa. We propose a novel technique to extract conflict-free information from multi-labeled trees as a much smaller single-labeled tree. We show that the inherent problem in identifying rogue taxa is NP-hard and give fixed-parameter tractable and integer linear programming solutions.

Digital Repository @ Iowa State University (ISU)

Data-Mining Techniques for Call-Graph-Based Software-Defect Localisation

Author: Eichinger Frank
Publication venue: KIT-Bibliothek, Karlsruhe
Publication date: 01/01/2011

Defect localisation is an important problem in software engineering. This dissertation investigates call-graph-mining-based software defect localisation, which supports software developers by providing hints about where defects might be located. It extends the state of the art by proposing new graph representations and mining techniques for weighted graphs. This leads to a broader range of detectable defects, increased localisation precision, and enhanced scalability.

KITopen