83,517 research outputs found

Comparing similar ordered trees in linear-time

Author: Touzet Hélène
Publication venue: Elsevier B.V.
Publication date: 31/12/2007

We describe a linear-time algorithm for comparing two similar ordered rooted trees with node labels. The method for comparing trees is the usual tree edit distance. We show that an optimal mapping that uses at most k insertions or deletions can then be constructed in O(nk^3) time, where n is the size of the trees. The approach is inspired by the Zhang–Shasha algorithm for tree edit distance, combined with an adequate pruning of the search space based on the tree edit graph.
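For readers unfamiliar with the tree edit distance mentioned in this abstract, the following is a minimal sketch of the textbook forest recurrence with unit costs. It is not the pruned O(nk^3) method of the paper, nor the Zhang–Shasha algorithm itself. The tree encoding `(label, children)` and the function name `tree_edit_distance` are assumptions made for illustration.

```python
from functools import lru_cache

# Assumed encoding: a tree is (label, children), children being a tuple of trees.
# A forest is a tuple of trees in left-to-right order.

def tree_edit_distance(t1, t2):
    """Unit-cost edit distance between two ordered labeled trees (naive recurrence)."""

    @lru_cache(maxsize=None)
    def dist(f, g):
        if not f and not g:
            return 0
        if not f:
            # Insert the rightmost root of g; its children join the forest.
            _, kids = g[-1]
            return dist(f, g[:-1] + kids) + 1
        if not g:
            # Delete the rightmost root of f; its children join the forest.
            _, kids = f[-1]
            return dist(f[:-1] + kids, g) + 1
        (lf, kf), (lg, kg) = f[-1], g[-1]
        return min(
            dist(f[:-1] + kf, g) + 1,                        # delete rightmost root of f
            dist(f, g[:-1] + kg) + 1,                        # insert rightmost root of g
            dist(f[:-1], g[:-1]) + dist(kf, kg) + (lf != lg) # map the two roots (rename if needed)
        )

    return dist((t1,), (t2,))
```

This sketch memoizes whole sub-forests and is only practical for small trees; it is meant to make the recurrence concrete, not to reproduce the running time claimed in the abstract.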

Elsevier - Publisher Connector

Rule-based Machine Learning Methods for Functional Prediction

Authors: Indurkhya N.; Weiss S. M.
Publication date: 01/01/1995

We describe a machine learning method for predicting the value of a real-valued function, given the values of multiple input variables. The method induces solutions from samples in the form of ordered disjunctive normal form (DNF) decision rules. A central objective of the method and representation is the induction of compact, easily interpretable solutions. This rule-based decision model can be extended to search efficiently for similar cases prior to approximating function values. Experimental results on real-world data demonstrate that the new techniques are competitive with existing machine learning and statistical methods and can sometimes yield superior regression performance.

Comment: See http://www.jair.org/ for any accompanying files

arXiv.org e-Print Archive

CiteSeerX

Arithmetic for Rooted Trees

Author: Luccio Fabrizio
Publication date: 01/01/2016

We propose a new arithmetic for non-empty rooted unordered trees, simply called trees. After discussing tree representation and enumeration, we define the operations of tree addition, multiplication and stretch, prove their properties, and show that all trees can be generated from a starting tree of one vertex. We then show how a given tree can be obtained as the sum or product of two trees, thus defining prime trees with respect to addition and multiplication. In both cases we show how primality can be decided in time polynomial in the number of vertices, and we prove that factorization is unique. We then define negative trees and suggest dealing with tree equations, giving some preliminary results. Finally, we comment on how our arithmetic might be useful and discuss preceding studies that bear some relation to ours. To the best of our knowledge, our approach and results are completely new, aside from an earlier version of this work submitted as an arXiv manuscript.

Comment: 18 pages, 8 figures

arXiv.org e-Print Archive

Dagstuhl Research Online Publication Server

XML Compression via DAGs

Authors: Bousquet-Melou Mireille; Lohrey Markus; Maneth Sebastian; Noeth Eric
Publication date: 01/01/2013

Unranked trees can be represented using their minimal dag (directed acyclic graph). For XML this achieves high compression ratios due to the repetitive markup. Unranked trees are often represented through first child/next sibling (fcns) encoded binary trees. We study the difference in size (= number of edges) of the minimal dag versus the minimal dag of the fcns encoded binary tree. One main finding is that the size of the dag of the binary tree can never be smaller than the square root of the size of the minimal dag, and that there are examples that match this bound. We introduce a new combined structure, the hybrid dag, which is guaranteed to be smaller than (or equal in size to) both dags. Interestingly, we find through experiments that last child/previous sibling encodings are much better for XML compression via dags than fcns encodings. We determine the average sizes of unranked and binary dags over a given set of labels (under uniform distribution) in terms of their exact generating functions, and in terms of their asymptotic behavior.

Comment: A short version of this paper appeared in the Proceedings of ICDT 201
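As an illustration of the minimal dag idea in this abstract, here is a minimal sketch that shares identical subtrees of an unranked tree and reports the number of distinct nodes and edges. The tree encoding `(label, children)`, the function name `minimal_dag_size`, and the choice to count repeated edges to the same shared child separately are assumptions for this sketch, not details taken from the paper.

```python
def minimal_dag_size(tree):
    """Return (nodes, edges) of the minimal dag of an unranked ordered tree.

    Assumed encoding: a tree is (label, children), children being a tuple of trees.
    Identical subtrees are merged into a single dag node.
    """
    ids = {}   # maps (label, child ids) -> dag node id
    edges = 0

    def intern(node):
        nonlocal edges
        label, kids = node
        key = (label, tuple(intern(k) for k in kids))
        if key not in ids:
            ids[key] = len(ids)
            # Count one edge per child position of each distinct dag node.
            edges += len(kids)
        return ids[key]

    intern(tree)
    return len(ids), edges
```

For example, a root with three identical leaf children collapses to two dag nodes joined by three edges, which shows where the savings on repetitive markup come from.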

arXiv.org e-Print Archive

CiteSeerX

Computing Runs on a General Alphabet

Author: Kosolobov Dmitry
Publication date: 22/11/2015

We describe a RAM algorithm computing all runs (maximal repetitions) of a given string of length $n$ over a general ordered alphabet in $O(n\log^{\frac{2}{3}} n)$ time and linear space. Our algorithm outperforms all known solutions working in $\Theta(n\log\sigma)$ time provided $\sigma = n^{\Omega(1)}$, where $\sigma$ is the alphabet size. We conjecture that there exists a linear time RAM algorithm finding all runs.

Comment: 4 pages, 2 figures
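To make the notion of a run concrete, the following is a naive quadratic-time sketch that enumerates all runs of a string. It is not the algorithm of the paper and makes no attempt at the stated time bound. The function name `naive_runs` and the output format `(start, end, period)` with an exclusive `end` are assumptions for illustration.

```python
def naive_runs(s):
    """List all runs of s as (start, end, period), end exclusive.

    A run is a maximal substring whose length is at least twice its
    smallest period. Periods are tried in increasing order, so an
    interval is recorded with its smallest period only.
    """
    n = len(s)
    seen = set()
    runs = []
    for p in range(1, n // 2 + 1):
        i = 0
        while i + p < n:
            if s[i] != s[i + p]:
                i += 1
                continue
            # Extend the stretch of positions j with s[j] == s[j + p].
            j = i
            while j + p < n and s[j] == s[j + p]:
                j += 1
            # s[i : j + p] has period p and cannot be extended either way.
            if j - i >= p and (i, j + p) not in seen:
                seen.add((i, j + p))
                runs.append((i, j + p, p))
            i = j + 1
    return runs
```

On the string `mississippi` this reports the three squares `ss`, `ss`, `pp` with period 1 and the substring `ississi` with period 3.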

arXiv.org e-Print Archive

Crossref

Institutional repository of Ural Federal University named after the first President of Russia B.N.Yeltsin

Partitioned conditional generalized linear models for categorical data

Authors: Guédon Yann; Peyhardi Jean; Trottier Catherine
Publication date: 01/01/2014

In categorical data analysis, several regression models have been proposed for hierarchically-structured response variables, e.g. the nested logit model. But they have been formally defined for only two or three levels in the hierarchy. Here, we introduce the class of partitioned conditional generalized linear models (PCGLMs), defined for any number of levels. The hierarchical structure of these models is fully specified by a partition tree of categories. Using the genericity of the (r,F,Z) specification, the PCGLM can handle nominal, ordinal, and also partially-ordered response variables.

Comment: 25 pages, 13 figures

arXiv.org e-Print Archive

INRIA a CCSD electronic archive server

HAL Descartes

Agritrop

HAL-CIRAD

Efficient chaining of seeds in ordered trees

Author: A. Lozano
B.A. Shapiro
D. Gusfield
D. Joseph
D.J. Lipman
E. Ohlebusch
J.S. Pedersen
K. Zhang
P.P. Gardner
R. Backofen
S. Heyne
S.F. Altschul
T. Jiang
W.R. Pearson
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2010

We consider here the problem of chaining seeds in ordered trees. Seeds are mappings between two trees Q and T, and a chain is a subset of non-overlapping seeds that is consistent with respect to postfix order and ancestrality. This problem is a natural extension of a similar problem for sequences, and has applications in computational biology, such as mining a database of RNA secondary structures. For the chaining problem with a set of m constant size seeds, we describe an algorithm with complexity O(m^2 log(m)) in time and O(m^2) in space.

arXiv.org e-Print Archive

CiteSeerX

Crossref

Elsevier - Publisher Connector