5,345 research outputs found
GumDrop at the DISRPT2019 Shared Task: A Model Stacking Approach to Discourse Unit Segmentation and Connective Detection
In this paper we present GumDrop, Georgetown University's entry at the DISRPT
2019 Shared Task on automatic discourse unit segmentation and connective
detection. Our approach relies on model stacking, creating a heterogeneous
ensemble of classifiers, which feed into a metalearner for each final task. The
system encompasses three trainable component stacks: one for sentence
splitting, one for discourse unit segmentation and one for connective
detection. The flexibility of each ensemble allows the system to generalize
well to datasets of different sizes and with varying levels of homogeneity.Comment: Proceedings of Discourse Relation Parsing and Treebanking
(DISRPT2019
A path following algorithm for the graph matching problem
We propose a convex-concave programming approach for the labeled weighted
graph matching problem. The convex-concave programming formulation is obtained
by rewriting the weighted graph matching problem as a least-square problem on
the set of permutation matrices and relaxing it to two different optimization
problems: a quadratic convex and a quadratic concave optimization problem on
the set of doubly stochastic matrices. The concave relaxation has the same
global minimum as the initial graph matching problem, but the search for its
global minimum is also a hard combinatorial problem. We therefore construct an
approximation of the concave problem solution by following a solution path of a
convex-concave problem obtained by linear interpolation of the convex and
concave formulations, starting from the convex relaxation. This method allows
to easily integrate the information on graph label similarities into the
optimization problem, and therefore to perform labeled weighted graph matching.
The algorithm is compared with some of the best performing graph matching
methods on four datasets: simulated graphs, QAPLib, retina vessel images and
handwritten chinese characters. In all cases, the results are competitive with
the state-of-the-art.Comment: 23 pages, 13 figures,typo correction, new results in sections 4,5,
Learning Better Font Slicing Strategies from Data
Generally, the present disclosure is directed to serving font files in topical subsets. In particular, in some implementations, the systems and methods of the present disclosure can include or otherwise leverage one or more machine-learned models to predict topic labels of characters in a font based on a corpus of characters or glyphs
Supervised Identification of Writer\u27s Native Language Based on Their English Word Usage
In this paper, we investigate the possibility of constructing an automated tool for the writer\u27s first language detection based on a~document written in their second language. Since English is the contemporary lingua franca, commonly used by non-native speakers, we have chosen it to be the second language to study. In this paper, we examine English texts from computer science, a field related to mathematics. More generally, we wanted to study texts from a domain that operates with formal rules. We were able to achieve a high classification rate, about~90\%, using a relatively simple model (n-grams with logistic regression). We trained the model to distinguish twelve nationality groups/first languages based on our dataset. The classification mechanism was implemented using logistic regression with L1~regularisation, which performed well with sparse document-term data table. The experiment proved that we can use vocabulary alone to detect the first language with high accuracy
Object-oriented Neural Programming (OONP) for Document Understanding
We propose Object-oriented Neural Programming (OONP), a framework for
semantically parsing documents in specific domains. Basically, OONP reads a
document and parses it into a predesigned object-oriented data structure
(referred to as ontology in this paper) that reflects the domain-specific
semantics of the document. An OONP parser models semantic parsing as a decision
process: a neural net-based Reader sequentially goes through the document, and
during the process it builds and updates an intermediate ontology to summarize
its partial understanding of the text it covers. OONP supports a rich family of
operations (both symbolic and differentiable) for composing the ontology, and a
big variety of forms (both symbolic and differentiable) for representing the
state and the document. An OONP parser can be trained with supervision of
different forms and strength, including supervised learning (SL) ,
reinforcement learning (RL) and hybrid of the two. Our experiments on both
synthetic and real-world document parsing tasks have shown that OONP can learn
to handle fairly complicated ontology with training data of modest sizes.Comment: accepted by ACL 201
- …