5,345 research outputs found

    GumDrop at the DISRPT2019 Shared Task: A Model Stacking Approach to Discourse Unit Segmentation and Connective Detection

    Full text link
    In this paper we present GumDrop, Georgetown University's entry at the DISRPT 2019 Shared Task on automatic discourse unit segmentation and connective detection. Our approach relies on model stacking, creating a heterogeneous ensemble of classifiers, which feed into a metalearner for each final task. The system encompasses three trainable component stacks: one for sentence splitting, one for discourse unit segmentation and one for connective detection. The flexibility of each ensemble allows the system to generalize well to datasets of different sizes and with varying levels of homogeneity.Comment: Proceedings of Discourse Relation Parsing and Treebanking (DISRPT2019

    A path following algorithm for the graph matching problem

    Get PDF
    We propose a convex-concave programming approach for the labeled weighted graph matching problem. The convex-concave programming formulation is obtained by rewriting the weighted graph matching problem as a least-square problem on the set of permutation matrices and relaxing it to two different optimization problems: a quadratic convex and a quadratic concave optimization problem on the set of doubly stochastic matrices. The concave relaxation has the same global minimum as the initial graph matching problem, but the search for its global minimum is also a hard combinatorial problem. We therefore construct an approximation of the concave problem solution by following a solution path of a convex-concave problem obtained by linear interpolation of the convex and concave formulations, starting from the convex relaxation. This method allows to easily integrate the information on graph label similarities into the optimization problem, and therefore to perform labeled weighted graph matching. The algorithm is compared with some of the best performing graph matching methods on four datasets: simulated graphs, QAPLib, retina vessel images and handwritten chinese characters. In all cases, the results are competitive with the state-of-the-art.Comment: 23 pages, 13 figures,typo correction, new results in sections 4,5,

    Learning Better Font Slicing Strategies from Data

    Get PDF
    Generally, the present disclosure is directed to serving font files in topical subsets. In particular, in some implementations, the systems and methods of the present disclosure can include or otherwise leverage one or more machine-learned models to predict topic labels of characters in a font based on a corpus of characters or glyphs

    Supervised Identification of Writer\u27s Native Language Based on Their English Word Usage

    Get PDF
    In this paper, we investigate the possibility of constructing an automated tool for the writer\u27s first language detection based on a~document written in their second language. Since English is the contemporary lingua franca, commonly used by non-native speakers, we have chosen it to be the second language to study. In this paper, we examine English texts from computer science, a field related to mathematics. More generally, we wanted to study texts from a domain that operates with formal rules. We were able to achieve a high classification rate, about~90\%, using a relatively simple model (n-grams with logistic regression). We trained the model to distinguish twelve nationality groups/first languages based on our dataset. The classification mechanism was implemented using logistic regression with L1~regularisation, which performed well with sparse document-term data table. The experiment proved that we can use vocabulary alone to detect the first language with high accuracy

    Object-oriented Neural Programming (OONP) for Document Understanding

    Full text link
    We propose Object-oriented Neural Programming (OONP), a framework for semantically parsing documents in specific domains. Basically, OONP reads a document and parses it into a predesigned object-oriented data structure (referred to as ontology in this paper) that reflects the domain-specific semantics of the document. An OONP parser models semantic parsing as a decision process: a neural net-based Reader sequentially goes through the document, and during the process it builds and updates an intermediate ontology to summarize its partial understanding of the text it covers. OONP supports a rich family of operations (both symbolic and differentiable) for composing the ontology, and a big variety of forms (both symbolic and differentiable) for representing the state and the document. An OONP parser can be trained with supervision of different forms and strength, including supervised learning (SL) , reinforcement learning (RL) and hybrid of the two. Our experiments on both synthetic and real-world document parsing tasks have shown that OONP can learn to handle fairly complicated ontology with training data of modest sizes.Comment: accepted by ACL 201
    • …
    corecore