    Content Differences in Syntactic and Semantic Representations

    Syntactic analysis plays an important role in semantic parsing, but the nature of this role remains a topic of ongoing debate. The debate has been constrained by the scarcity of empirical comparative studies between syntactic and semantic schemes, which hinders the development of parsing methods informed by the details of target schemes and constructions. We target this gap, and take Universal Dependencies (UD) and UCCA as a test case. After abstracting away from differences of convention or formalism, we find that most content divergences can be ascribed to: (1) UCCA's distinction between a Scene and a non-Scene; (2) UCCA's distinction between primary relations, secondary ones and participants; (3) different treatment of multi-word expressions, and (4) different treatment of inter-clause linkage. We further discuss the long tail of cases where the two schemes take markedly different approaches. Finally, we show that the proposed comparison methodology can be used for fine-grained evaluation of UCCA parsing, highlighting both challenges and potential sources for improvement. The substantial differences between the schemes suggest that semantic parsers are likely to benefit downstream text understanding applications beyond their syntactic counterparts. Comment: NAACL-HLT 2019 camera ready

    One model, two languages: training bilingual parsers with harmonized treebanks

    We introduce an approach to train lexicalized parsers using bilingual corpora obtained by merging harmonized treebanks of different languages, producing parsers that can analyze sentences in either of the learned languages, or even sentences that mix both. We test the approach on the Universal Dependency Treebanks, training with MaltParser and MaltOptimizer. The results show that these bilingual parsers are more than competitive, as most combinations not only preserve accuracy, but some even achieve significant improvements over the corresponding monolingual parsers. Preliminary experiments also show the approach to be promising on texts with code-switching and when more languages are added. Comment: 7 pages, 4 tables, 1 figure

    Creation of a Style Independent Intelligent Autonomous Citation Indexer to Support Academic Research

    This paper describes the current state of RUgle, a system for classifying and indexing papers made available on the World Wide Web in a domain-independent and universal manner. By building RUgle with the most relaxed restrictions possible on the formatting of the documents it can process, we hope to create a system that combines the best features of currently available closed library searches, which are designed to facilitate academic research, with the inclusive nature of general-purpose search engines, which continually crawl the web and add documents to their indexed databases.

    Organizing the Internet

    This paper examines XML and its relationships with SGML (Standard Generalized Markup Language) and HTML (HyperText Markup Language). It examines the importance of metatags and the XML Document Type Definition (DTD) and proposed alternatives. It looks at the differences between the two types of XML data: “valid” and “well-formed” documents.