Search CORE

205 research outputs found

EvoMiner: Frequent Subtree Mining in Phylogenetic Databases

Author: Deepak Akshay
Fernández-Baca David
McMahon Michelle M.
Sanderson Michael J.
Tirthapura Srikanta
Publication venue: Iowa State University Digital Repository
Publication date: 01/01/2013

The problem of mining collections of trees to identify common patterns, called frequent subtrees (FSTs), arises often when trying to interpret the results of phylogenetic analysis. FST mining generalizes the well-known maximum agreement subtree problem. Here we present EvoMiner, a new algorithm for mining frequent subtrees in collections of phylogenetic trees. EvoMiner is an Apriori-like level-wise method, which uses a novel phylogeny-specific constant-time candidate generation scheme, an efficient fingerprinting-based technique for downward closure, and a lowest-common-ancestor-based support counting step that requires neither costly subtree operations nor database traversal. Our algorithm achieves speed-ups of up to 100 times or more over Phylominer, the current state-of-the-art algorithm for mining phylogenetic trees. EvoMiner can also work in depth-first enumeration mode, using less memory at the expense of speed. We demonstrate the utility of FST mining as a way to extract meaningful phylogenetic information from collections of trees, in comparison with maximum agreement subtrees and majority rule trees, two commonly used approaches in phylogenetic analysis for extracting consensus information from a collection of trees over a common leaf set.
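The LCA-based support counting step can be illustrated on the smallest informative patterns, rooted triples on three leaves. This is a sketch under stated assumptions, not EvoMiner's implementation: each tree is stored as a hypothetical child-to-parent dictionary, LCAs are found by naively walking parent pointers, and a triple ab|c is counted in a tree exactly when the LCA of a and b lies strictly deeper than the LCA of a and c. The helper names and the three example trees are made up for illustration.

```python
from itertools import combinations

# Hypothetical representation: each tree is a dict mapping node -> parent,
# with the root mapped to None. Leaves carry the taxon labels.

def ancestors(parent, node):
    """Path from node up to the root, node first."""
    path = []
    while node is not None:
        path.append(node)
        node = parent[node]
    return path

def lca_depth(parent, a, b):
    """Depth (root = 0) of the lowest common ancestor of a and b.

    Naive parent-pointer walk; a real miner would precompute a
    constant-time LCA structure instead.
    """
    on_path_a = set(ancestors(parent, a))
    for node in ancestors(parent, b):
        if node in on_path_a:
            return len(ancestors(parent, node)) - 1
    raise ValueError("nodes are not in the same tree")

def displays_triple(parent, a, b, c):
    """True if the tree restricted to {a, b, c} is ((a, b), c)."""
    return lca_depth(parent, a, b) > lca_depth(parent, a, c)

def frequent_triples(trees, leaves, min_support):
    """For every 3-leaf set, count the trees displaying each of its three
    resolved topologies; keep those with support >= min_support."""
    result = {}
    for x, y, z in combinations(sorted(leaves), 3):
        for (a, b), c in (((x, y), z), ((x, z), y), ((y, z), x)):
            support = sum(displays_triple(t, a, b, c) for t in trees)
            if support >= min_support:
                result[((a, b), c)] = support
    return result

# ((A,B),(C,D))
t1 = {"r": None, "u1": "r", "u2": "r", "A": "u1", "B": "u1", "C": "u2", "D": "u2"}
# (((A,B),C),D)
t2 = {"r": None, "v1": "r", "D": "r", "v2": "v1", "C": "v1", "A": "v2", "B": "v2"}
# ((A,C),(B,D))
t3 = {"r": None, "w1": "r", "w2": "r", "A": "w1", "C": "w1", "B": "w2", "D": "w2"}
trees = [t1, t2, t3]

print(frequent_triples(trees, "ABCD", min_support=2))
# {(('A', 'B'), 'C'): 2, (('A', 'B'), 'D'): 2, (('A', 'C'), 'D'): 2}
```

No subtree is ever materialized: each support test reduces to two LCA depth comparisons per tree, which is the property the abstract highlights.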

Digital Repository @ Iowa State University (ISU)

EvoMiner: Frequent Subtree Mining in Phylogenetic Databases

Author: Deepak Akshay
Fernández-Baca David
McMahon Michelle M.
Sanderson Michael J.
Tirthapura Srikanta
Publication venue: Iowa State University Digital Repository
Publication date: 01/01/2011

The problem of mining collections of trees to identify common patterns, called frequent subtrees (FSTs), arises often when trying to make sense of the results of phylogenetic analysis. FST mining generalizes the well-known maximum agreement subtree problem. Here we present EvoMiner, a new algorithm for mining frequent subtrees in collections of phylogenetic trees. EvoMiner is an Apriori-like level-wise method, which uses a novel phylogeny-specific constant-time candidate generation scheme, an efficient fingerprinting-based technique for the downward closure operation, and a lowest-common-ancestor-based support counting step that requires neither costly subtree operations nor database traversal. As a result of these techniques, our algorithm achieves speed-ups of up to 100 times or more over Phylominer, another algorithm for mining phylogenetic trees. EvoMiner can also work in vertical mining mode, using less memory at the expense of speed.
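Downward closure, the Apriori property that every sub-pattern of a frequent pattern must itself be frequent, requires recognizing when two restricted trees are the same pattern. The sketch below assumes rooted trees written as nested tuples with string leaves and uses a sorted-children canonical string as the fingerprint. This is a standard canonical form for unordered trees, not necessarily the fingerprinting scheme EvoMiner uses, and all helper names are hypothetical.

```python
def restrict(tree, keep):
    """Restrict a rooted tree to the leaves in `keep`.

    Trees are nested tuples with string leaves (an assumption of this sketch).
    Dropped leaves vanish, and internal nodes left with one child are
    suppressed so the result is again a proper phylogenetic tree.
    """
    if isinstance(tree, str):
        return tree if tree in keep else None
    kids = [k for k in (restrict(c, keep) for c in tree) if k is not None]
    if not kids:
        return None
    if len(kids) == 1:
        return kids[0]
    return tuple(kids)

def fingerprint(tree):
    """Canonical string for an unordered rooted tree: children sorted.

    Assumes leaf labels contain no '(' ',' or ')' characters.
    """
    if isinstance(tree, str):
        return tree
    return "(" + ",".join(sorted(fingerprint(c) for c in tree)) + ")"

def leaves_of(tree):
    """Set of leaf labels in a nested-tuple tree."""
    if isinstance(tree, str):
        return {tree}
    return set().union(*(leaves_of(c) for c in tree))

def passes_downward_closure(candidate, frequent_fps):
    """Apriori pruning: keep a (k+1)-leaf candidate only if every k-leaf
    restriction (one leaf deleted) has a fingerprint in frequent_fps."""
    leaves = leaves_of(candidate)
    return all(
        fingerprint(restrict(candidate, leaves - {x})) in frequent_fps
        for x in leaves
    )

t = ((("A", "B"), "C"), "D")
frequent3 = {fingerprint(restrict(t, set("ABCD") - {x})) for x in "ABCD"}
print(passes_downward_closure(t, frequent3))                         # True
print(passes_downward_closure((("A", "B"), ("C", "D")), frequent3))  # False
```

Deleting one leaf at a time yields exactly the k-leaf restrictions of a (k+1)-leaf candidate, so a candidate is pruned as soon as one of those fingerprints is missing from the frequent set; a set lookup replaces any tree comparison.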

Digital Repository @ Iowa State University (ISU)

EvoMiner: frequent subtree mining in phylogenetic databases

Author: Akshay Deepak
David Fernández-Baca
Michael J. Sanderson
Michelle M. McMahon
Srikanta Tirthapura
Publication venue: Springer Science and Business Media LLC

Crossref

Mining frequent closed rooted trees

Author: Albert Bifet
Antoni Lozano
José L. Balcázar
Publication venue: Springer Science and Business Media LLC

Crossref

Pattern discovery in structural databases with applications to bioinformatics

Author: Zhang Sen
Publication venue: Digital Commons @ NJIT
Publication date: 31/01/2005

Frequent structure mining (FSM) aims to discover and extract patterns frequently occurring in structural data such as trees and graphs. FSM finds many applications in bioinformatics, XML processing, Web log analysis, and so on. In this thesis, two new FSM techniques are proposed for finding patterns in unordered labeled trees. Such trees can be used to model, among other things, the evolutionary histories of different species. The first FSM technique finds cousin pairs in the trees. A cousin pair is a pair of nodes sharing the same parent, the same grandparent, the same great-grandparent, and so on. Given a tree T, our algorithm finds all interesting cousin pairs of T in O(|T|^2) time, where |T| is the number of nodes in T. Experimental results on synthetic data and phylogenies show the scalability and effectiveness of the proposed technique. This technique has been applied to locating co-occurring patterns in multiple evolutionary trees, evaluating the consensus of equally parsimonious trees, and finding kernel trees of groups of phylogenies. The technique is also extended to undirected acyclic graphs (or free trees). The second FSM technique extends traditional MAST (maximum agreement subtree) algorithms by employing the Apriori data mining technique to find frequent agreement subtrees in multiple phylogenies. The correctness and completeness of the new mining algorithm are presented. The method is also extended to unrooted phylogenetic trees. Both FSM techniques studied in the thesis have been implemented in a toolkit, which is fully operational and accessible on the World Wide Web.
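The cousin-pair definition can be made concrete with a small enumeration. This is a straightforward sketch, not the thesis's O(|T|^2) algorithm: it assumes a hypothetical child-list dictionary for the tree and reports a pair (u, v, k) when u and v lie in different child subtrees of a common ancestor g, both exactly k generations below g (k = 1 for a shared parent, k = 2 for a shared grandparent, and so on). Reporting each pair only at its lowest common ancestor is a design choice of this sketch.

```python
from collections import defaultdict
from itertools import combinations

def nodes_by_depth(children, node, depth, out=None):
    """Group the subtree rooted at `node` by depth, starting at `depth`."""
    if out is None:
        out = defaultdict(list)
    out[depth].append(node)
    for child in children.get(node, []):
        nodes_by_depth(children, child, depth + 1, out)
    return out

def cousin_pairs(children, root, labeled):
    """Yield (u, v, k) for labeled nodes u, v that lie in different child
    subtrees of a common ancestor g, both exactly k generations below g
    (k = 1: same parent, k = 2: same grandparent, ...). Since g is then
    their lowest common ancestor, each pair is reported once."""
    stack = [root]
    while stack:
        g = stack.pop()
        kids = children.get(g, [])
        stack.extend(kids)
        # One depth table per child subtree, depths measured from g.
        groups = [nodes_by_depth(children, c, 1) for c in kids]
        for left, right in combinations(groups, 2):
            for k in left.keys() & right.keys():
                for u in left[k]:
                    for v in right[k]:
                        if u in labeled and v in labeled:
                            yield (u, v, k)

# Hypothetical tree: R -> (X, Y), X -> (a, b), Y -> (c)
children = {"R": ["X", "Y"], "X": ["a", "b"], "Y": ["c"]}
print(sorted(cousin_pairs(children, "R", {"a", "b", "c"})))
# [('a', 'b', 1), ('a', 'c', 2), ('b', 'c', 2)]
```

Rebuilding the depth tables at every ancestor makes this slower than the bound quoted in the abstract on deep trees; it is meant only to make the definition concrete.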

Digital Commons @ New Jersey Institute of Technology (NJIT)

Mining user-generated comments

Author: Gravier Christophe
Laforest Frederique
Subercaze Julien
Publication venue: HAL CCSD
Publication date: 05/12/2015

Social-media websites, such as newspapers, blogs, and forums, are the main places where user-generated comments are created and exchanged. These comments are viable sources for opinion mining, descriptive annotation, and information extraction. User-generated comments are formatted using an HTML template, so they are entwined with the other information in the HTML document. Their unsupervised extraction is therefore a challenging problem, and even more so when nested answers by different users must also be extracted. This paper presents a novel technique (CommentsMiner) for unsupervised extraction of user comments. Our approach uses both the theoretical framework of frequent subtree mining and data extraction techniques. We demonstrate that the comment mining task can be modelled as a constrained closed induced subtree mining problem followed by a learning-to-rank problem. Our experimental evaluations show that CommentsMiner solves the plain and nested comment extraction problems for 84% of a representative and accessible dataset, while outperforming existing baseline techniques.
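The intuition that comments rendered from one HTML template show up as repeated subtrees can be shown with a deliberately simplified sketch. It is not CommentsMiner, which solves a constrained closed induced subtree mining problem followed by learning-to-rank; it only counts identical full subtrees by tag structure, and it assumes well-formed XHTML so that Python's standard `xml.etree.ElementTree` can parse it. The `shape` and `repeated_shapes` helpers and the sample page are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter

def shape(elem):
    """Tag-only signature of an element's full subtree (text and attributes
    ignored, child order kept)."""
    return elem.tag + "(" + ",".join(shape(child) for child in elem) + ")"

def repeated_shapes(markup, min_count=2, min_size=2):
    """Return (signature, count) for full subtrees with at least `min_size`
    elements that occur at least `min_count` times, most frequent first."""
    root = ET.fromstring(markup.strip())
    counts = Counter()
    for elem in root.iter():
        # Skip tiny subtrees such as a lone <p>, which repeat everywhere.
        if sum(1 for _ in elem.iter()) >= min_size:
            counts[shape(elem)] += 1
    return [(sig, n) for sig, n in counts.most_common() if n >= min_count]

page = """
<div>
  <h1>Article</h1>
  <div class="c"><span>alice</span><p>First!</p></div>
  <div class="c"><span>bob</span><p>Nice post.</p></div>
  <div class="c"><span>carol</span><p>Agreed.</p></div>
</div>
"""
print(repeated_shapes(page))
# [('div(span(),p())', 3)]
```

On real pages, navigation menus and other template blocks repeat too, so a raw count like this cannot by itself say which repeated structure holds the comments.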

HAL-UJM

Mining complex structured data: Enhanced methods and applications

Author: Bui Dang Bach
Publication venue: Curtin University
Publication date: 01/01/2015

Conventional approaches to analysing complex business data typically rely on process models, which are difficult to construct and use. This thesis addresses this issue by converting semi-structured event logs to a simpler flat representation without any loss of information, which then enables direct application of classical data mining methods. The thesis also proposes an effective and scalable classification method that can identify distinct characteristics of a business process for further improvement.

espace@Curtin

An Algebraic View of the Relation between Largest Common Subtrees and Smallest Common Supertrees

Author: Rossello Francesc
Valiente Gabriel
Publication date: 27/04/2006

The relationship between two important problems in tree pattern matching, the largest common subtree and the smallest common supertree problems, is established by means of simple constructions, which allow one to obtain a largest common subtree of two trees from a smallest common supertree of them, and vice versa. These constructions are the same for isomorphic, homeomorphic, topological, and minor embeddings, they take only time linear in the size of the trees, and they turn out to have a clear algebraic meaning.
Comment: 32 pages

arXiv.org e-Print Archive

Elsevier - Publisher Connector