
    Fusion of Knowledge-Based and Data-Driven Approaches to Grammar Induction

    Georgiladakis S, Unger C, Iosif E, et al. Fusion of Knowledge-Based and Data-Driven Approaches to Grammar Induction. In: Fifteenth Annual Conference of the International Speech Communication Association. 2014. Using different sources of information for grammar induction results in grammars that vary in coverage and precision. Fusing such grammars with a strategy that exploits their strengths while minimizing their weaknesses is expected to produce grammars with superior performance. We focus on the fusion of grammars produced using a knowledge-based approach using lexicalized ontologies and a data-driven approach using semantic similarity clustering. We propose various algorithms for finding the mapping between the (non-terminal) rules generated by each grammar induction algorithm, followed by rule fusion. Three fusion approaches are investigated: early, mid and late fusion. Results show that late fusion provides the best relative F-measure performance improvement by 20%

    Modeling Global Syntactic Variation in English Using Dialect Classification

    This paper evaluates global-scale dialect identification for 14 national varieties of English as a means for studying syntactic variation. The paper makes three main contributions: (i) introducing data-driven language mapping as a method for selecting the inventory of national varieties to include in the task; (ii) producing a large and dynamic set of syntactic features using grammar induction rather than focusing on a few hand-selected features such as function words; and (iii) comparing models across both web corpora and social media corpora in order to measure the robustness of syntactic variation across registers

    An Abstract Machine for Unification Grammars

    This work describes the design and implementation of an abstract machine, Amalia, for the linguistic formalism ALE, which is based on typed feature structures. This formalism is one of the most widely accepted in computational linguistics and has been used for designing grammars in various linguistic theories, most notably HPSG. Amalia is composed of data structures and a set of instructions, augmented by a compiler from the grammatical formalism to the abstract instructions, and a (portable) interpreter of the abstract instructions. The effect of each instruction is defined using a low-level language that can be executed on ordinary hardware. The advantages of the abstract machine approach are twofold. From a theoretical point of view, the abstract machine gives a well-defined operational semantics to the grammatical formalism. This ensures that grammars specified using our system are endowed with well-defined meaning. It makes it possible, for example, to formally verify the correctness of a compiler for HPSG, given an independent definition. From a practical point of view, Amalia is the first system that employs a direct compilation scheme for unification grammars that are based on typed feature structures. The use of Amalia results in much improved performance over existing systems. In order to test the machine on a realistic application, we have developed a small-scale, HPSG-based grammar for a fragment of the Hebrew language, using Amalia as the development platform. This is the first application of HPSG to a Semitic language. Comment: Doctoral Thesis, 96 pages, many postscript figures, uses pstricks, pst-node, psfig, fullname and a macros fil
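    The operation at the heart of unification grammars can be illustrated with a minimal sketch. It is not Amalia's instruction set or ALE's implementation: it assumes untyped feature structures represented as nested Python dicts with string atoms, ignores the type hierarchy and reentrancy (structure sharing), and the name `unify` is hypothetical.

```python
from typing import Optional, Union

# Assumption: a feature structure is either an atomic string
# or a dict mapping feature names to feature structures.
FeatureStructure = Union[str, dict]


def unify(a: FeatureStructure, b: FeatureStructure) -> Optional[FeatureStructure]:
    """Return the unification of a and b, or None if they clash.

    Neither input is modified; a new structure is built.
    """
    if isinstance(a, dict) and isinstance(b, dict):
        result = dict(a)  # shallow copy; nested values are rebuilt on merge
        for feature, b_value in b.items():
            if feature in result:
                merged = unify(result[feature], b_value)
                if merged is None:
                    return None  # conflicting values under the same feature
                result[feature] = merged
            else:
                result[feature] = b_value
        return result
    # Atoms unify only with an identical atom; an atom never unifies with a dict.
    return a if a == b else None


noun = {"cat": "n", "agr": {"num": "sg"}}
third_person = {"agr": {"per": "3"}}
plural = {"agr": {"num": "pl"}}

combined = unify(noun, third_person)  # compatible: information is merged
clash = unify(noun, plural)           # incompatible: num sg vs. pl
```

    A typed formalism such as the one the abstract describes would additionally check that the types of the two structures have a common subtype before merging their features.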

    Universal Dependencies Parsing for Colloquial Singaporean English

    Singlish can be interesting to the ACL community both linguistically as a major creole based on English, and computationally for information extraction and sentiment analysis of regional social media. We investigate dependency parsing of Singlish by constructing a dependency treebank under the Universal Dependencies scheme, and then training a neural network model by integrating English syntactic knowledge into a state-of-the-art parser trained on the Singlish treebank. Results show that English knowledge can lead to 25% relative error reduction, resulting in a parser with 84.47% accuracy. To the best of our knowledge, we are the first to use neural stacking to improve cross-lingual dependency parsing on low-resource languages. We make both our annotation and parser available for further research. Comment: Accepted by ACL 201
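    For readers unfamiliar with the metric, relative error reduction compares error rates rather than accuracies. The sketch below shows the arithmetic. The function names are hypothetical, and the implied baseline is a back-of-envelope figure that assumes the 25% reduction is measured on the same accuracy metric as the 84.47% result; it is not a number reported in the abstract.

```python
def relative_error_reduction(baseline_acc: float, improved_acc: float) -> float:
    """Fraction of the baseline error removed by the improved system.

    Accuracies are percentages in [0, 100).
    """
    baseline_err = 100.0 - baseline_acc
    improved_err = 100.0 - improved_acc
    return (baseline_err - improved_err) / baseline_err


def implied_baseline_accuracy(improved_acc: float, rer: float) -> float:
    """Invert the formula: which baseline accuracy yields this reduction?"""
    improved_err = 100.0 - improved_acc
    return 100.0 - improved_err / (1.0 - rer)


# Assumption: both figures from the abstract refer to the same metric.
baseline = implied_baseline_accuracy(84.47, 0.25)
check = relative_error_reduction(baseline, 84.47)
```

    Under that assumption the remaining error of 15.53 points is three quarters of the baseline error, which puts the implied baseline accuracy at about 79.3%.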

    Natural language processing

    Beginning with the basic issues of NLP, this chapter aims to chart the major research activities in this area since the last ARIST chapter in 1996 (Haas, 1996), including: (i) natural language text processing systems (text summarization, information extraction, information retrieval, etc.), including domain-specific applications; (ii) natural language interfaces; (iii) NLP in the context of the WWW and digital libraries; and (iv) evaluation of NLP systems

    Cognition-based approaches for high-precision text mining

    This research improves the precision of information extraction from free-form text via the use of cognitive-based approaches to natural language processing (NLP). Cognitive-based approaches are an important, and relatively new, area of research in NLP and search, as well as linguistics. Cognitive approaches enable significant improvements in both the breadth and depth of knowledge extracted from text. This research has made contributions in the areas of a cognitive approach to automated concept recognition in. Cognitive approaches to search, also called concept-based search, have been shown to improve search precision. Given the tremendous amount of electronic text generated in our digital and connected world, cognitive approaches enable substantial opportunities in knowledge discovery. The generation and storage of electronic text is ubiquitous, hence opportunities for improved knowledge discovery span virtually all knowledge domains. While cognition-based search offers superior approaches, challenges exist due to the need to mimic, even in the most rudimentary way, the extraordinary powers of human cognition. This research addresses these challenges in the key area of a cognition-based approach to automated concept recognition. In addition, it resulted in a semantic processing system framework for use in applications in any knowledge domain. Confabulation theory was applied to the problem of automated concept recognition. This is a relatively new theory of cognition using a non-Bayesian measure, called cogency, for predicting the results of human cognition. An innovative distance measure, derived from cogent confabulation and called inverse cogency, was used to rank-order candidate concepts during the recognition process. When used with a multilayer perceptron, it improved the precision of concept recognition by 5% over published benchmarks. Additional precision improvements are anticipated.
These research steps build a foundation for cognition-based, high-precision text mining. In the long term, this foundation is expected to enable a cognitive-based approach to automated ontology learning. Such automated ontology learning will mimic human language cognition and will, in turn, enable the practical use of cognitive-based approaches in virtually any knowledge domain.