15,042 research outputs found
Optimal LZ-End Parsing Is Hard
LZ-End is a variant of the well-known Lempel-Ziv parsing family such that each phrase of the parsing has a previous occurrence, with the additional constraint that the previous occurrence must end at the end of a previous phrase. LZ-End was initially proposed as a greedy parsing, where each phrase is determined greedily from left to right, as the longest factor that satisfies the above constraint [Kreft & Navarro, 2010]. In this work, we consider an optimal LZ-End parsing that has the minimum number of phrases in such parsings. We show that a decision version of computing the optimal LZ-End parsing is NP-complete by showing a reduction from the vertex cover problem. Moreover, we give a MAX-SAT formulation for the optimal LZ-End parsing adapting an approach for computing various NP-hard repetitiveness measures recently presented by [Bannai et al., 2022]. We also consider the approximation ratio of the size of greedy LZ-End parsing to the size of the optimal LZ-End parsing, and give a lower bound of the ratio which asymptotically approaches 2
Monadic parser combinators
In functional programming, a popular approach to building recursive descent parsers is to model parsers as functions, and to define higher-order functions (or combinators) that implement grammar constructions such as sequencing, choice, and repetition. Such parsers form an instance of a monad, an algebraic structure from mathematics that has proved useful for addressing a number of computational problems. The purpose of this report is to provide a step-by-step tutorial on the monadic approach to building functional parsers, and to explain some of the benefits that result from exploiting monads. No prior knowledge of parser combinators or of monads is assumed. Indeed, this report can also be viewed as a first introduction to the use of monads in programming
Decompositions of Grammar Constraints
A wide range of constraints can be compactly specified using automata or
formal languages. In a sequence of recent papers, we have shown that an
effective means to reason with such specifications is to decompose them into
primitive constraints. We can then, for instance, use state of the art SAT
solvers and profit from their advanced features like fast unit propagation,
clause learning, and conflict-based search heuristics. This approach holds
promise for solving combinatorial problems in scheduling, rostering, and
configuration, as well as problems in more diverse areas like bioinformatics,
software testing and natural language processing. In addition, decomposition
may be an effective method to propagate other global constraints.Comment: Proceedings of the Twenty-Third AAAI Conference on Artificial
Intelligenc
Learning Language from a Large (Unannotated) Corpus
A novel approach to the fully automated, unsupervised extraction of
dependency grammars and associated syntax-to-semantic-relationship mappings
from large text corpora is described. The suggested approach builds on the
authors' prior work with the Link Grammar, RelEx and OpenCog systems, as well
as on a number of prior papers and approaches from the statistical language
learning literature. If successful, this approach would enable the mining of
all the information needed to power a natural language comprehension and
generation system, directly from a large, unannotated corpus.Comment: 29 pages, 5 figures, research proposa
Self-Adaptive Hierarchical Sentence Model
The ability to accurately model a sentence at varying stages (e.g.,
word-phrase-sentence) plays a central role in natural language processing. As
an effort towards this goal we propose a self-adaptive hierarchical sentence
model (AdaSent). AdaSent effectively forms a hierarchy of representations from
words to phrases and then to sentences through recursive gated local
composition of adjacent segments. We design a competitive mechanism (through
gating networks) to allow the representations of the same sentence to be
engaged in a particular learning task (e.g., classification), therefore
effectively mitigating the gradient vanishing problem persistent in other
recursive models. Both qualitative and quantitative analysis shows that AdaSent
can automatically form and select the representations suitable for the task at
hand during training, yielding superior classification performance over
competitor models on 5 benchmark data sets.Comment: 8 pages, 7 figures, accepted as a full paper at IJCAI 201
Temporal characterization of the requests to Wikipedia
This paper presents an empirical study about the temporal patterns
characterizing the requests submitted by users to Wikipedia.
The study is based on the analysis of the log lines registered by the
Wikimedia Foundation Squid servers after having sent the appropriate
content in response to users' requests. The
analysis has been conducted regarding the ten most visited editions of
Wikipedia and has involved more than 14,000 million log lines
corresponding to the traffic of the entire year 2009. The conducted methodology
has mainly consisted in the parsing and filtering
of users' requests according to the study directives. As a result, relevant information
fields have been finally stored in a database for persistence and further
characterization. In this way, we, first, assessed, whether the traffic to Wikipedia could serve
as a reliable estimator of the overall traffic to all the Wikimedia Foundation
projects. Our subsequent analysis of the temporal evolutions corresponding to
the different types of requests to Wikipedia revealed interesting differences
and similarities among them that can be related to the users' attention to the Encyclopedia.
In addition, we have performed separated characterizations of each Wikipedia edition
to compare their respective evolutions over time
- …