An O(n^3)-Time Algorithm for Tree Edit Distance
The {\em edit distance} between two ordered trees with vertex labels is the
minimum cost of transforming one tree into the other by a sequence of
elementary operations consisting of deleting and relabeling existing nodes, as
well as inserting new nodes. In this paper, we present a worst-case
$O(n^3)$-time algorithm for this problem, improving the previous best
$O(n^3 \log n)$-time algorithm~\cite{Klein}. Our result requires a novel
adaptive strategy for deciding how a dynamic program divides into subproblems
(which is interesting in its own right), together with a deeper understanding
of the previous algorithms for the problem. We also prove the optimality of our
algorithm among the family of \emph{decomposition strategy} algorithms--which
also includes the previous fastest algorithms--by tightening the known lower
bound of $\Omega(n^2 \log^2 n)$~\cite{Touzet} to $\Omega(n^3)$, matching our
algorithm's running time. Furthermore, we obtain matching upper and lower
bounds of $\Theta(n m^2 (1 + \log \frac{n}{m}))$ when the two trees have
different sizes $m$ and~$n$, where $m < n$.
Comment: 10 pages, 5 figures, 5 .tex files where TED.tex is the main on
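To make the definition of edit distance between ordered labeled trees concrete, here is a minimal Python sketch of the textbook forest recurrence that always decomposes at the rightmost root. It is not the paper's adaptive algorithm and does not achieve its running time. The unit costs, the nested-tuple tree encoding, and the names `size`, `forest_dist` and `tree_edit_distance` are assumptions made for illustration.

```python
from functools import lru_cache

# Assumed encoding: a tree is (label, children), where children is a tuple
# of trees; a forest is a tuple of trees. Every operation costs 1 (assumed).

def size(forest):
    """Number of nodes in a forest."""
    return sum(1 + size(children) for _, children in forest)

@lru_cache(maxsize=None)
def forest_dist(f, g):
    """Edit distance between two ordered forests, memoized on the pair."""
    if not f:
        return size(g)          # insert every node of g
    if not g:
        return size(f)          # delete every node of f
    (f_label, f_kids), (g_label, g_kids) = f[-1], g[-1]
    # Delete the rightmost root of f; its children take its place.
    delete = forest_dist(f[:-1] + f_kids, g) + 1
    # Insert the rightmost root of g.
    insert = forest_dist(f, g[:-1] + g_kids) + 1
    # Match the two rightmost roots, relabeling if the labels differ.
    relabel = 0 if f_label == g_label else 1
    match = forest_dist(f[:-1], g[:-1]) + forest_dist(f_kids, g_kids) + relabel
    return min(delete, insert, match)

def tree_edit_distance(t1, t2):
    return forest_dist((t1,), (t2,))

if __name__ == "__main__":
    chain = ("a", (("b", (("c", ()),)),))
    short = ("a", (("c", ()),))
    print(tree_edit_distance(chain, short))  # deleting "b" suffices
```

Always picking the rightmost root is one fixed decomposition strategy; the abstract's contribution concerns choosing the direction adaptively, which this sketch does not attempt.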
An Introduction to Recursive Partitioning: Rationale, Application and Characteristics of Classification and Regression Trees, Bagging and Random Forests
Recursive partitioning methods have become popular and widely used tools for nonparametric regression and classification in many scientific fields. Random forests in particular, which can deal with large numbers of predictor variables even in the presence of complex interactions, have been applied successfully in genetics, clinical medicine and bioinformatics within the past few years.
High-dimensional problems are common not only in genetics, but also in some areas of psychological research, where only a few subjects can be measured due to time or cost constraints, yet a large amount of data is generated for each subject. Random forests have been shown to achieve high prediction accuracy in such applications, and provide descriptive variable importance measures reflecting the impact of each variable in both main effects and interactions.
The aim of this work is to introduce the principles of the standard recursive partitioning methods as well as recent methodological improvements, to illustrate their usage for low- and high-dimensional data exploration, and to point out limitations of the methods and potential pitfalls in their practical application.
Application of the methods is illustrated using freely available implementations in the R system for statistical computing.
Search and Result Presentation in Scientific Workflow Repositories
We study the problem of searching a repository of complex hierarchical
workflows whose component modules, both composite and atomic, have been
annotated with keywords. Since keyword search does not use the graph structure
of a workflow, we develop a model of workflows using context-free bag grammars.
We then give efficient polynomial-time algorithms that, given a workflow and a
keyword query, determine whether some execution of the workflow matches the
query. Based on these algorithms we develop a search and ranking solution that
efficiently retrieves the top-k grammars from a repository. Finally, we propose
a novel result presentation method for grammars matching a keyword query, based
on representative parse-trees. The effectiveness of our approach is validated
through an extensive experimental evaluation.
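As a rough illustration of the matching question (does some execution of a workflow cover every keyword in a query?), the following Python sketch runs a fixpoint computation over a toy bag grammar. It is only a sketch under stated assumptions and not the authors' algorithm: the dictionary encoding of the grammar, the function name `matches`, and the rule that an execution matches when the annotations of its modules jointly contain all query keywords are all hypothetical. Its cost grows exponentially with the number of query keywords.

```python
from itertools import product

def matches(grammar, keywords, start, query):
    """Return True if some finite execution of `start` covers `query`.

    grammar:  composite module -> list of alternative bodies, each body a
              list (bag) of module names; atomic modules are absent.
    keywords: every module name -> iterable of annotation keywords.
    Both encodings are assumptions made for this sketch.
    """
    query = frozenset(query)
    # covered[m]: subsets of the query that some execution of m can cover.
    covered = {m: set() for m in keywords}
    changed = True
    while changed:                      # iterate to a fixpoint
        changed = False
        for m in keywords:
            own = frozenset(keywords[m]) & query
            bodies = grammar.get(m)
            if not bodies:              # atomic module: only its own keywords
                new = {own}
            else:
                new = set()
                for body in bodies:
                    # Pick one already-known coverage set per child module.
                    options = [covered[child] for child in body]
                    for combo in product(*options):
                        new.add(own.union(*combo))
            if not new <= covered[m]:
                covered[m] |= new
                changed = True
    return query in covered[start]

if __name__ == "__main__":
    grammar = {"W": [["A", "B"], ["C"]], "A": [["C", "C"]]}
    keywords = {"W": [], "A": ["align"], "B": ["blast"], "C": ["cluster"]}
    print(matches(grammar, keywords, "W", ["blast", "cluster"]))
```

Because each `covered` set only grows and is bounded by the subsets of the query, the loop terminates even for recursive grammars.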