
The Weight Function in the Subtree Kernel is Decisive

Author: Azaïs Romain
Ingels Florian
Publication date: 12/04/2019

Tree data are ubiquitous because they model a large variety of situations, e.g., the architecture of plants, the secondary structure of RNA, or the hierarchy of XML files. Nevertheless, the analysis of these non-Euclidean data is difficult per se. In this paper, we focus on the subtree kernel, a convolution kernel for tree data introduced by Vishwanathan and Smola in the early 2000s. More precisely, we investigate the influence of the weight function from a theoretical perspective and in real data applications. We establish on a two-class stochastic model that the performance of the subtree kernel is improved when the weight of leaves vanishes, which motivates the definition of a new weight function, learned from the data rather than fixed by the user as is usually done. To this end, we define a unified framework for computing the subtree kernel from ordered or unordered trees, which is particularly suitable for tuning parameters. We show through eight real data classification problems the high efficiency of our approach, in particular for small datasets, which also demonstrates the importance of the weight function. Finally, a tool for visualizing the significant features is derived. Comment: 36 pages

arXiv.org e-Print Archive

HAL-ENS-LYON

INRIA a CCSD electronic archive server

Hal-Diderot

HAL-Rennes 1
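To make the role of the weight function in the subtree kernel concrete, here is a minimal Python sketch of the classical subtree kernel on ordered trees. It assumes trees are encoded as nested tuples (a leaf is `()`) and that the kernel sums w(tau) * N_tau(T1) * N_tau(T2) over complete subtrees tau, where N_tau(T) counts the occurrences of tau in T. The tuple encoding, the helper names and the weight LAM ** height(tau) are illustrative assumptions; this is not the authors' unified framework, which also handles unordered trees and learns the weights from data.

```python
from collections import Counter
from typing import Callable, Tuple

# Assumed encoding: an ordered tree is the tuple of its children's subtrees,
# so a leaf is () and identical subtrees are equal (hashable) tuples.
Tree = Tuple["Tree", ...]


def height(t: Tree) -> int:
    """Height of a tree; a leaf has height 0."""
    return 0 if not t else 1 + max(height(c) for c in t)


def subtree_counts(t: Tree) -> Counter:
    """N_tau(t): occurrences of each complete subtree (a node plus all descendants)."""
    counts = Counter([t])
    for child in t:
        counts.update(subtree_counts(child))
    return counts


def subtree_kernel(t1: Tree, t2: Tree, weight: Callable[[Tree], float]) -> float:
    """Sum of weight(tau) * N_tau(t1) * N_tau(t2) over subtrees shared by t1 and t2."""
    c1, c2 = subtree_counts(t1), subtree_counts(t2)
    return sum(weight(tau) * c1[tau] * c2[tau] for tau in c1.keys() & c2.keys())


LAM = 0.5  # hypothetical decay parameter


def geometric(tau: Tree) -> float:
    """Illustrative weight LAM ** height(tau); leaves get weight 1."""
    return LAM ** height(tau)


def no_leaves(tau: Tree) -> float:
    """Same weight, except that leaves get weight 0."""
    return 0.0 if height(tau) == 0 else LAM ** height(tau)


leaf = ()
cherry = (leaf, leaf)
t1 = (cherry, leaf)      # root with a cherry and a leaf
t2 = (cherry, cherry)    # root with two cherries
print(subtree_kernel(t1, t2, geometric))   # 13.0
print(subtree_kernel(t1, t2, no_leaves))   # 1.0
```

With LAM = 0.5, the leaf term (3 leaves in `t1` times 4 leaves in `t2`) contributes 12 of the total 13; once the leaf weight is set to zero, only the shared cherry remains. On this toy example, leaves dominate the similarity, which is the effect the abstract studies.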

Random enriched trees with applications to random graphs

Author: Stufler Benedikt
Publication date: 08/12/2016

We establish limit theorems that describe the asymptotic local and global geometric behaviour of random enriched trees considered up to symmetry. We apply these general results to random unlabelled weighted rooted graphs and uniform random unlabelled $k$-trees that are rooted at a $k$-clique of distinguishable vertices. For both models we establish a Gromov–Hausdorff scaling limit, a Benjamini–Schramm limit, and a local weak limit that describes the asymptotic shape near the fixed root.

arXiv.org e-Print Archive

HAL-ENS-LYON

EvoMiner: Frequent Subtree Mining in Phylogenetic Databases

Author: Deepak Akshay
Fernández-Baca David
McMahon Michelle
Sanderson Michael
Tirthapura Srikanta
Publication venue: Iowa State University Digital Repository
Publication date: 01/01/2013

The problem of mining collections of trees to identify common patterns, called frequent subtrees (FSTs), arises often when trying to interpret the results of phylogenetic analysis. FST mining generalizes the well-known maximum agreement subtree problem. Here we present EvoMiner, a new algorithm for mining frequent subtrees in collections of phylogenetic trees. EvoMiner is an Apriori-like level-wise method, which uses a novel phylogeny-specific constant-time candidate generation scheme, an efficient fingerprinting-based technique for downward closure, and a support counting step based on lowest common ancestors that requires neither costly subtree operations nor database traversal. Our algorithm achieves speed-ups of up to 100 times or more over Phylominer, the current state-of-the-art algorithm for mining phylogenetic trees. EvoMiner can also work in depth-first enumeration mode, using less memory at the expense of speed. We demonstrate the utility of FST mining as a way to extract meaningful phylogenetic information from collections of trees, in comparison with maximum agreement subtrees and majority-rule trees, two approaches commonly used in phylogenetic analysis to extract consensus information from a collection of trees over a common leaf set.

Digital Repository @ Iowa State University (ISU)

Crossref

Managing and analyzing phylogenetic databases

Author: DEEPAK AKSHAY
Publication venue: Iowa State University Digital Repository
Publication date: 01/01/2013

The ever-growing availability of phylogenomic data makes it increasingly possible to study and analyze phylogenetic relationships across a wide range of species. Indeed, current phylogenetic analyses are now producing enormous collections of trees that vary greatly in size. Our proposed research addresses the challenges posed by storing, querying, and analyzing such phylogenetic databases. Our first contribution is the further development of STBase, a phylogenetic tree database consisting of a billion trees whose leaf sets range in size from 4 to 20,000. STBase applies techniques from different areas of computer science for efficient tree storage and retrieval, and it introduces new ideas specific to tree databases. STBase provides a unique opportunity to explore innovative ways to analyze the results of queries on large sets of phylogenetic trees. We propose new ways of extracting consensus information from a collection of phylogenetic trees; specifically, this involves extending the maximum agreement subtree problem. We greatly improve upon an existing approach based on frequent subtrees and propose two new approaches, based on agreement subtrees and frequent subtrees respectively. The final part of our proposed work deals with the problem of simplifying multi-labeled trees and handling rogue taxa. We propose a novel technique to extract conflict-free information from multi-labeled trees as a much smaller single-labeled tree. We show that the inherent problem in identifying rogue taxa is NP-hard and give fixed-parameter tractable and integer linear programming solutions.

Digital Repository @ Iowa State University (ISU)

Fixed-parameter tractable canonization and isomorphism test for graphs of bounded treewidth

Author: Lokshtanov Daniel
Pilipczuk Marcin
Pilipczuk Michał
Saurabh Saket
Publication date: 01/10/2014

We give a fixed-parameter tractable algorithm that, given a parameter $k$ and two graphs $G_1, G_2$, either concludes that one of these graphs has treewidth at least $k$, or determines whether $G_1$ and $G_2$ are isomorphic. The running time of the algorithm on an $n$-vertex graph is $2^{O(k^5 \log k)} \cdot n^5$, and this is the first fixed-parameter algorithm for Graph Isomorphism parameterized by treewidth. Our algorithm in fact solves the more general canonization problem. Namely, we design a procedure working in $2^{O(k^5 \log k)} \cdot n^5$ time that, for a given graph $G$ on $n$ vertices, either concludes that the treewidth of $G$ is at least $k$, or:
* finds in an isomorphism-invariant way a graph $\mathfrak{c}(G)$ that is isomorphic to $G$;
* finds an isomorphism-invariant construction term: an algebraic expression that encodes $G$ together with a tree decomposition of $G$ of width $O(k^4)$.
Hence, the isomorphism test reduces to verifying whether the computed isomorphic copies or the construction terms for $G_1$ and $G_2$ are equal. Comment: Full version of a paper presented at FOCS 201

arXiv.org e-Print Archive

Crossref

Warwick Research Archives Portal Repository
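For the special case of rooted trees (treewidth 1), the reduction of isomorphism testing to comparing canonical forms can be illustrated with the classical Aho–Hopcroft–Ullman (AHU) encoding. The Python sketch below covers only that textbook special case, not the bounded-treewidth algorithm of the paper; the adjacency-dict representation and the function names are hypothetical.

```python
from typing import Dict, List

# Hypothetical representation: a rooted tree as a dict mapping each node
# to the list of its children (leaves may be omitted from the dict).
Children = Dict[int, List[int]]


def canonical_form(children: Children, root: int) -> str:
    """AHU encoding: two rooted trees get the same string iff they are isomorphic."""
    # Sorting the children's encodings makes the result independent of child order.
    encoded = sorted(canonical_form(children, c) for c in children.get(root, []))
    return "(" + "".join(encoded) + ")"


def isomorphic(t1: Children, r1: int, t2: Children, r2: int) -> bool:
    """Isomorphism test reduced to an equality check on canonical forms."""
    return canonical_form(t1, r1) == canonical_form(t2, r2)


a = {0: [1, 2], 1: [3]}        # root with two children, one of which has a child
b = {10: [11, 12], 12: [13]}   # same shape, different labels and child order
path = {0: [1], 1: [2], 2: [3]}
print(isomorphic(a, 0, b, 10))     # True
print(isomorphic(a, 0, path, 0))   # False
```

Sorting the children's encodings plays, in this much simpler setting, the role of computing an isomorphism-invariant copy: any relabelling or reordering of children yields the same string.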

A Survey on Graph Kernels

Author: Johansson Fredrik D.
Kriege Nils M.
Morris Christopher
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2020

Graph kernels have become an established and widely used technique for solving classification tasks on graphs. This survey gives a comprehensive overview of techniques for kernel-based graph classification developed in the past 15 years. We describe and categorize graph kernels based on properties inherent to their design, such as the nature of their extracted graph features, their method of computation and their applicability to problems in practice. In an extensive experimental evaluation, we study the classification accuracy of a large suite of graph kernels on established benchmarks as well as new datasets. We compare the performance of popular kernels with several baseline methods and study the effect of applying a Gaussian RBF kernel to the metric induced by a graph kernel. In doing so, we find that simple baselines become competitive after this transformation on some datasets. Moreover, we study the extent to which existing graph kernels agree in their predictions (and prediction errors) and obtain a data-driven categorization of kernels as a result. Finally, based on our experimental results, we derive a practitioner's guide to kernel-based graph classification.

arXiv.org e-Print Archive

DSpace@MIT

Chalmers Research
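The Gaussian RBF transformation mentioned in the graph kernel survey can be sketched as follows, assuming a precomputed, positive semidefinite Gram matrix of some graph kernel k. The induced metric is d(G, H)^2 = k(G, G) + k(H, H) - 2 k(G, H), and the transformed kernel is exp(-d(G, H)^2 / (2 sigma^2)). The bandwidth parametrization by `sigma`, the function name and the toy matrix are assumptions for illustration; the survey's exact setup may differ.

```python
import math
from typing import List

Matrix = List[List[float]]


def rbf_from_gram(gram: Matrix, sigma: float = 1.0) -> Matrix:
    """Gaussian RBF kernel on the metric induced by a PSD kernel's Gram matrix.

    d(G_i, G_j)^2   = k(G_i, G_i) + k(G_j, G_j) - 2 k(G_i, G_j)
    k_rbf(G_i, G_j) = exp(-d^2 / (2 sigma^2))
    """
    n = len(gram)
    out = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            d2 = gram[i][i] + gram[j][j] - 2.0 * gram[i][j]
            d2 = max(d2, 0.0)  # guard against tiny negative values from rounding
            out[i][j] = math.exp(-d2 / (2.0 * sigma ** 2))
    return out


gram = [[2.0, 1.0], [1.0, 2.0]]   # toy Gram matrix, not real data
rbf = rbf_from_gram(gram, sigma=1.0)
print(rbf[0][1])   # equals math.exp(-1)
```

Because the transform only needs the Gram matrix, it can be applied on top of any graph kernel without recomputing graph features, which is what makes it a cheap post-processing step.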

A Survey of Alternating Permutations

Author: Stanley Richard P.
Publication date: 01/05/2009

This survey of alternating permutations and Euler numbers includes refinements of Euler numbers, other occurrences of Euler numbers, longest alternating subsequences, umbral enumeration of classes of alternating permutations, and the cd-index of the symmetric group. Comment: 32 pages, 7 figures

arXiv.org e-Print Archive

DSpace@MIT
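As a small companion to the alternating permutations survey, the Python sketch below counts alternating permutations w1 > w2 < w3 > ... of {1, ..., n} by brute force and checks the counts against the Euler numbers E_n computed with the Seidel–Entringer boustrophedon recurrence, whose intermediate entries (Entringer numbers) are one standard refinement of E_n. The function names are hypothetical, and the brute-force counter takes exponential time, so it is only meant for small n.

```python
from itertools import permutations


def is_alternating(p) -> bool:
    """True if p[0] > p[1] < p[2] > p[3] < ... (down-up convention)."""
    return all((p[i] > p[i + 1]) if i % 2 == 0 else (p[i] < p[i + 1])
               for i in range(len(p) - 1))


def euler_brute_force(n: int) -> int:
    """Count alternating permutations of {1, ..., n} directly (exponential time)."""
    return sum(is_alternating(p) for p in permutations(range(1, n + 1)))


def euler_boustrophedon(n: int) -> int:
    """E_n via the Seidel-Entringer boustrophedon triangle.

    Row m holds E(m, 0..m) with E(0, 0) = 1, E(m, 0) = 0 for m > 0,
    and E(m, k) = E(m, k - 1) + E(m - 1, m - k); then E_n = E(n, n).
    """
    row = [1]
    for m in range(1, n + 1):
        new = [0]
        for k in range(1, m + 1):
            new.append(new[k - 1] + row[m - k])
        row = new
    return row[n]


print([euler_boustrophedon(n) for n in range(8)])   # [1, 1, 1, 2, 5, 16, 61, 272]
print(all(euler_brute_force(n) == euler_boustrophedon(n) for n in range(8)))  # True
```

The recurrence needs only O(n^2) additions, whereas the brute-force check enumerates all n! permutations; the latter is kept here purely as a sanity check of the former.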