Near-optimal labeling schemes for nearest common ancestors
We consider NCA labeling schemes: given a rooted tree $T$, label the nodes of
$T$ with binary strings such that, given the labels of any two nodes, one can
determine, by looking only at the labels, the label of their nearest common
ancestor.
For trees with $n$ nodes we present upper and lower bounds establishing that
labels of size $(2 \pm \varepsilon)\log n$, $\varepsilon < 1$, are both sufficient and
necessary. (All logarithms in this paper are in base 2.)
Alstrup, Bille, and Rauhe (SIDMA'05) showed that ancestor and NCA labeling
schemes have labels of size $\log n + \Omega(\log \log n)$. Our lower bound
increases this to $\log n + \Omega(\log n)$ for NCA labeling schemes. Since
Fraigniaud and Korman (STOC'10) established that labels in ancestor labeling
schemes have size $\log n + \Theta(\log \log n)$, our new lower bound separates
ancestor and NCA labeling schemes. Our upper bound improves the $10 \log n$
upper bound by Alstrup, Gavoille, Kaplan and Rauhe (TOCS'04), and our
theoretical result even outperforms some recent experimental studies by Fischer
(ESA'09) where variants of the same NCA labeling scheme are shown to all have
labels of size approximately $8 \log n$.
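
To make the notion of an NCA labeling scheme concrete, here is a minimal sketch of a naive scheme, not the near-optimal construction from the paper. It assumes each node is labeled with the sequence of child indices on its root-to-node path (tuples of integers stand in for binary strings), so the label of the nearest common ancestor is the longest common prefix of the two labels. The names `assign_labels` and `nca_label` are hypothetical, and the labels produced this way can be far longer than the bounds discussed in the abstract.

```python
from typing import Dict, List, Tuple

Label = Tuple[int, ...]


def assign_labels(children: Dict[str, List[str]], root: str) -> Dict[str, Label]:
    """Label every node with the child indices along its root-to-node path."""
    labels: Dict[str, Label] = {root: ()}
    stack = [root]
    while stack:
        node = stack.pop()
        for i, child in enumerate(children.get(node, [])):
            labels[child] = labels[node] + (i,)
            stack.append(child)
    return labels


def nca_label(a: Label, b: Label) -> Label:
    """Return the label of the nearest common ancestor using only the two labels."""
    k = 0
    while k < min(len(a), len(b)) and a[k] == b[k]:
        k += 1
    return a[:k]


# Toy tree (an assumption for illustration): r has children a, b; a has children c, d.
tree = {"r": ["a", "b"], "a": ["c", "d"]}
labels = assign_labels(tree, "r")
print(nca_label(labels["c"], labels["d"]) == labels["a"])
print(nca_label(labels["c"], labels["b"]) == labels["r"])
```

The query never touches the tree itself, which is the defining property of a labeling scheme; the research question is how short such labels can be made.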
Better, Faster, Stronger Sequence Tagging Constituent Parsers
Sequence tagging models for constituent parsing are faster, but less accurate
than other types of parsers. In this work, we address the following weaknesses
of such constituent parsers: (a) high error rates around closing brackets of
long constituents, (b) large label sets, leading to sparsity, and (c) error
propagation arising from greedy decoding. To effectively close brackets, we
train a model that learns to switch between tagging schemes. To reduce
sparsity, we decompose the label set and use multi-task learning to jointly
learn to predict sublabels. Finally, we mitigate issues from greedy decoding
through auxiliary losses and sentence-level fine-tuning with policy gradient.
Combining these techniques, we clearly surpass the performance of sequence
tagging constituent parsers on the English and Chinese Penn Treebanks, and
reduce their parsing time even further. On the SPMRL datasets, we observe even
greater improvements across the board, including a new state of the art on
Basque, Hebrew, Polish and Swedish.
Comment: NAACL 2019 (long papers). Contains corrigendu
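
The abstract does not spell out how a constituent tree becomes a tag sequence, so the following is only an illustrative sketch under an assumed encoding: each word (except the last) is tagged with the number of ancestors it shares with the next word and the nonterminal of their lowest common ancestor. The tree representation (nested tuples with string leaves) and the function names `leaf_paths` and `encode` are hypothetical, and the paper's actual tagging schemes may differ.

```python
from typing import List, Tuple, Union

Tree = Union[str, tuple]  # a leaf is a word; an inner node is (label, child, ...)


def leaf_paths(node: Tree, idx: int = 0, prefix: tuple = ()) -> List[Tuple[str, tuple]]:
    """Collect (word, ancestor path) pairs; a path is a tuple of (child index, label)."""
    if isinstance(node, str):
        return [(node, prefix)]
    label, *children = node
    here = prefix + ((idx, label),)
    out: List[Tuple[str, tuple]] = []
    for i, child in enumerate(children):
        out.extend(leaf_paths(child, i, here))
    return out


def encode(tree: Tree) -> List[Tuple[str, int, str]]:
    """Tag each word with (shared ancestor count, lowest common nonterminal) w.r.t. the next word."""
    leaves = leaf_paths(tree)
    tags = []
    for (word, p), (_, q) in zip(leaves, leaves[1:]):
        k = 0
        while k < min(len(p), len(q)) and p[k] == q[k]:
            k += 1
        tags.append((word, k, p[k - 1][1]))
    return tags


# Toy tree (an assumption for illustration; part-of-speech nodes are omitted).
tree = ("S", ("NP", "She"), ("VP", "reads", ("NP", "good", "books")))
print(encode(tree))
```

With tags of this shape, the label set grows with both tree depth and the number of nonterminals, which gives a sense of why large label sets and sparsity are a concern for such parsers.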
Labeling Schemes with Queries
We study the question of "how robust are the known lower bounds of labeling
schemes when one increases the number of consulted labels". Let $f$ be a
function on pairs of vertices. An $f$-labeling scheme for a family of graphs
$\mathcal{F}$ labels the vertices of all graphs in $\mathcal{F}$ such that for every graph
$G \in \mathcal{F}$ and every two vertices $u, v \in G$, the value $f(u, v)$ can be inferred
by merely inspecting the labels of $u$ and $v$.
This paper introduces a natural generalization: the notion of $f$-labeling
schemes with queries, in which the value $f(u, v)$ can be inferred by inspecting
not only the labels of $u$ and $v$ but possibly the labels of some additional
vertices. We show that inspecting the label of a single additional vertex (one
query) enables us to reduce the label size of many labeling schemes
significantly
Lower Bounds in the Preprocessing and Query Phases of Routing Algorithms
In the last decade, there has been a substantial amount of research in
finding routing algorithms designed specifically to run on real-world graphs.
In 2010, Abraham et al. showed upper bounds on the query time in terms of a
graph's highway dimension and diameter for the current fastest routing
algorithms, including contraction hierarchies, transit node routing, and hub
labeling. In this paper, we show corresponding lower bounds for the same three
algorithms. We also show how to improve a result by Milosavljevic which lower
bounds the number of shortcuts added in the preprocessing stage for contraction
hierarchies. We relax the assumption of an optimal contraction order (which is
NP-hard to compute), allowing the result to be applicable to real-world
instances. Finally, we give a proof that optimal preprocessing for hub labeling
is NP-hard. Hardness of optimal preprocessing is known for most routing
algorithms, and was suspected to be true for hub labeling
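
For readers unfamiliar with hub labeling, the query phase can be sketched as follows. This assumes the standard formulation in which every vertex stores a label mapping hub vertices to distances, and every shortest path between two vertices passes through some hub present in both labels. The function name `hub_query` and the toy labels are hypothetical; computing good labels (the preprocessing that the abstract shows to be NP-hard to optimize) is not shown.

```python
import math
from typing import Dict

HubLabel = Dict[str, float]  # hub vertex -> distance from the labeled vertex


def hub_query(label_s: HubLabel, label_t: HubLabel) -> float:
    """Return the s-t distance as the minimum over hubs shared by both labels."""
    # Iterate over the smaller label so the work is bounded by min(|L(s)|, |L(t)|).
    if len(label_s) > len(label_t):
        label_s, label_t = label_t, label_s
    return min(
        (d + label_t[h] for h, d in label_s.items() if h in label_t),
        default=math.inf,
    )


# Toy labels for the path a -(1)- b -(2)- c, using b as the common hub.
labels = {
    "a": {"a": 0.0, "b": 1.0},
    "b": {"b": 0.0},
    "c": {"c": 0.0, "b": 2.0},
}
print(hub_query(labels["a"], labels["c"]))
```

Query time depends only on the label sizes, which is why bounds on label size translate directly into bounds on query time.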
Whole-proteome tree of life suggests a deep burst of organism diversity.
An organism tree of life (organism ToL) is a conceptual and metaphorical tree to capture a simplified narrative of the evolutionary course and kinship among the extant organisms. Such a tree cannot be experimentally validated but may be reconstructed based on characteristics associated with the organisms. Since the whole-genome sequence of an organism is, at present, the most comprehensive descriptor of the organism, a whole-genome sequence-based ToL can be an empirically derivable surrogate for the organism ToL. However, experimentally determining the whole-genome sequences of many diverse organisms was practically impossible until recently. We have constructed three types of ToLs for diversely sampled organisms using the sequences of whole genome, of whole transcriptome, and of whole proteome. Of the three, whole-proteome sequence-based ToL (whole-proteome ToL), constructed by applying information theory-based feature frequency profile method, an "alignment-free" method, gave the most topologically stable ToL. Here, we describe the main features of a whole-proteome ToL for 4,023 species with known complete or almost complete genome sequences on grouping and kinship among the groups at deep evolutionary levels. The ToL reveals 1) all extant organisms of this study can be grouped into 2 "Supergroups," 6 "Major Groups," or 35+ "Groups"; 2) the order of emergence of the "founders" of all of the groups may be assigned on an evolutionary progression scale; 3) all of the founders of the groups have emerged in a "deep burst" at the very beginning period near the root of the ToL: an explosive birth of life's diversity
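
As a rough illustration of an alignment-free comparison in the spirit of the feature frequency profile method named above, the sketch below counts overlapping length-k substrings of each sequence, normalizes the counts to frequencies, and compares two profiles with the Jensen-Shannon divergence. The choice of divergence, the feature length, and the names `ffp` and `jensen_shannon` are assumptions for illustration; the abstract does not give these details.

```python
import math
from collections import Counter
from typing import Dict


def ffp(sequence: str, k: int) -> Dict[str, float]:
    """Return the relative frequency of every overlapping length-k substring."""
    counts = Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))
    total = sum(counts.values())
    return {feature: c / total for feature, c in counts.items()}


def jensen_shannon(p: Dict[str, float], q: Dict[str, float]) -> float:
    """Jensen-Shannon divergence in bits between two frequency profiles."""
    divergence = 0.0
    for feature in set(p) | set(q):
        pi, qi = p.get(feature, 0.0), q.get(feature, 0.0)
        mi = 0.5 * (pi + qi)
        if pi > 0:
            divergence += 0.5 * pi * math.log2(pi / mi)
        if qi > 0:
            divergence += 0.5 * qi * math.log2(qi / mi)
    return divergence


# Toy sequences (assumptions for illustration, not real proteome data).
a = ffp("MKVLAAGIVMKVLA", 3)
b = ffp("MKVLSAGIVMKVLA", 3)
print(jensen_shannon(a, b))
```

A matrix of such pairwise divergences can then be passed to a standard distance-based tree-building method; that step is outside the scope of this sketch.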