138,598 research outputs found
Improving treebank-based automatic LFG induction for Spanish
We describe several improvements to the method of treebank-based LFG induction for Spanish from the Cast3LB treebank (O’Donovan et al., 2005). We discuss the different categories of problems encountered and present the solutions adopted. Some of the problems involve a simple adoption of existing linguistic analyses, as in our treatment of clitic doubling and null subjects. In other cases there is no standard LFG account for the phenomenon
we wish to model and we adopt a compromise, conservative solution. This is exemplified by our treatment of Spanish periphrastic constructions. In yet another case, the less configurational nature of Spanish means that the LFG annotation algorithm has to rely mostly on Cast3LB function tags, and consequently a reliable method of adding those tags to parse trees had to be developed. This method achieves over 6% improvement over the baseline for the
Cast3LB-function-tag assignment task, and over 3% improvement over the baseline for LFG f-structure construction from function-tag-enriched trees
Fighting with the Sparsity of Synonymy Dictionaries
Graph-based synset induction methods, such as MaxMax and Watset, induce
synsets by performing a global clustering of a synonymy graph. However, such
methods are sensitive to the structure of the input synonymy graph: sparseness
of the input dictionary can substantially reduce the quality of the extracted
synsets. In this paper, we propose two different approaches designed to
alleviate the incompleteness of the input dictionaries. The first one performs
a pre-processing of the graph by adding missing edges, while the second one
performs a post-processing by merging similar synset clusters. We evaluate
these approaches on two datasets for the Russian language and discuss their
impact on the performance of synset induction methods. Finally, we perform an
extensive error analysis of each approach and discuss prominent alternative
methods for coping with the problem of the sparsity of the synonymy
dictionaries.Comment: In Proceedings of the 6th Conference on Analysis of Images, Social
Networks, and Texts (AIST'2017): Springer Lecture Notes in Computer Science
(LNCS
Generating Property-Directed Potential Invariants By Backward Analysis
This paper addresses the issue of lemma generation in a k-induction-based
formal analysis of transition systems, in the linear real/integer arithmetic
fragment. A backward analysis, powered by quantifier elimination, is used to
output preimages of the negation of the proof objective, viewed as unauthorized
states, or gray states. Two heuristics are proposed to take advantage of this
source of information. First, a thorough exploration of the possible
partitionings of the gray state space discovers new relations between state
variables, representing potential invariants. Second, an inexact exploration
regroups and over-approximates disjoint areas of the gray state space, also to
discover new relations between state variables. k-induction is used to isolate
the invariants and check if they strengthen the proof objective. These
heuristics can be used on the first preimage of the backward exploration, and
each time a new one is output, refining the information on the gray states. In
our context of critical avionics embedded systems, we show that our approach is
able to outperform other academic or commercial tools on examples of interest
in our application field. The method is introduced and motivated through two
main examples, one of which was provided by Rockwell Collins, in a
collaborative formal verification framework.Comment: In Proceedings FTSCS 2012, arXiv:1212.657
Rerepresenting and Restructuring Domain Theories: A Constructive Induction Approach
Theory revision integrates inductive learning and background knowledge by
combining training examples with a coarse domain theory to produce a more
accurate theory. There are two challenges that theory revision and other
theory-guided systems face. First, a representation language appropriate for
the initial theory may be inappropriate for an improved theory. While the
original representation may concisely express the initial theory, a more
accurate theory forced to use that same representation may be bulky,
cumbersome, and difficult to reach. Second, a theory structure suitable for a
coarse domain theory may be insufficient for a fine-tuned theory. Systems that
produce only small, local changes to a theory have limited value for
accomplishing complex structural alterations that may be required.
Consequently, advanced theory-guided learning systems require flexible
representation and flexible structure. An analysis of various theory revision
systems and theory-guided learning systems reveals specific strengths and
weaknesses in terms of these two desired properties. Designed to capture the
underlying qualities of each system, a new system uses theory-guided
constructive induction. Experiments in three domains show improvement over
previous theory-guided systems. This leads to a study of the behavior,
limitations, and potential of theory-guided constructive induction.Comment: See http://www.jair.org/ for an online appendix and other files
accompanying this articl
Optimal Algorithm for Bayesian Incentive-Compatible Exploration
We consider a social planner faced with a stream of myopic selfish agents.
The goal of the social planner is to maximize the social welfare, however, it
is limited to using only information asymmetry (regarding previous outcomes)
and cannot use any monetary incentives. The planner recommends actions to
agents, but her recommendations need to be Bayesian Incentive Compatible to be
followed by the agents. Our main result is an optimal algorithm for the
planner, in the case that the actions realizations are deterministic and have
limited support, making significant important progress on this open problem.
Our optimal protocol has two interesting features. First, it always completes
the exploration of a priori more beneficial actions before exploring a priori
less beneficial actions. Second, the randomization in the protocol is
correlated across agents and actions (and not independent at each decision
time).Comment: EC 201
Certified Context-Free Parsing: A formalisation of Valiant's Algorithm in Agda
Valiant (1975) has developed an algorithm for recognition of context free
languages. As of today, it remains the algorithm with the best asymptotic
complexity for this purpose. In this paper, we present an algebraic
specification, implementation, and proof of correctness of a generalisation of
Valiant's algorithm. The generalisation can be used for recognition, parsing or
generic calculation of the transitive closure of upper triangular matrices. The
proof is certified by the Agda proof assistant. The certification is
representative of state-of-the-art methods for specification and proofs in
proof assistants based on type-theory. As such, this paper can be read as a
tutorial for the Agda system
- …