3,899 research outputs found

    The Composite Nature of Interlanguage as a Developing System

    Get PDF
    This paper explores the nature of interlanguage (IL) as a developing system with a focus on the abstract lexical structure underlying IL construction. The developing system of IL is assumed to be ‘composite’ in that in second language acquisition (SLA) several linguistic systems are in contact, each of which may contribute different amounts to the developing system. The lexical structure is assumed to be ‘abstract’ in that the mental lexicon contains abstract elements called ‘lemmas’, which contain information about individual lexemes, and lemmas in the bilingual mental lexicon are language-specific and are in contact in IL production. Based on the research findings, it concludes that language transfer in IL production should be understood as lemma transfer of the learner's first language (L1) lexical structure at three abstract levels: lexical-conceptual structure, predicate-argument structure, and morphological realization patterns, and IL construction is driven by an incompletely acquired abstract lexical structure of a target language (TL) item

    VerbAtlas: a novel large-scale verbal semantic resource and its application to semantic role labeling

    Get PDF
    We present VerbAtlas, a new, hand-crafted lexical-semantic resource whose goal is to bring together all verbal synsets from WordNet into semantically-coherent frames. The frames define a common, prototypical argument structure while at the same time providing new concept-specific information. In contrast to PropBank, which defines enumerative semantic roles, VerbAtlas comes with an explicit, cross-frame set of semantic roles linked to selectional preferences expressed in terms of WordNet synsets, and is the first resource enriched with semantic information about implicit, shadow, and default arguments. We demonstrate the effectiveness of VerbAtlas in the task of dependency-based Semantic Role Labeling and show how its integration into a high-performance system leads to improvements on both the in-domain and out-of-domain test sets of CoNLL-2009. VerbAtlas is available at http://verbatlas.org

    Treebank-based acquisition of a Chinese lexical-functional grammar

    Get PDF
    Scaling wide-coverage, constraint-based grammars such as Lexical-Functional Grammars (LFG) (Kaplan and Bresnan, 1982; Bresnan, 2001) or Head-Driven Phrase Structure Grammars (HPSG) (Pollard and Sag, 1994) from fragments to naturally occurring unrestricted text is knowledge-intensive, time-consuming and (often prohibitively) expensive. A number of researchers have recently presented methods to automatically acquire wide-coverage, probabilistic constraint-based grammatical resources from treebanks (Cahill et al., 2002, Cahill et al., 2003; Cahill et al., 2004; Miyao et al., 2003; Miyao et al., 2004; Hockenmaier and Steedman, 2002; Hockenmaier, 2003), addressing the knowledge acquisition bottleneck in constraint-based grammar development. Research to date has concentrated on English and German. In this paper we report on an experiment to induce wide-coverage, probabilistic LFG grammatical and lexical resources for Chinese from the Penn Chinese Treebank (CTB) (Xue et al., 2002) based on an automatic f-structure annotation algorithm. Currently 96.751% of the CTB trees receive a single, covering and connected f-structure, 0.112% do not receive an f-structure due to feature clashes, while 3.137% are associated with multiple f-structure fragments. From the f-structure-annotated CTB we extract a total of 12975 lexical entries with 20 distinct subcategorisation frame types. Of these 3436 are verbal entries with a total of 11 different frame types. We extract a number of PCFG-based LFG approximations. Currently our best automatically induced grammars achieve an f-score of 81.57% against the trees in unseen articles 301-325; 86.06% f-score (all grammatical functions) and 73.98% (preds-only) against the dependencies derived from the f-structures automatically generated for the original trees in 301-325 and 82.79% (all grammatical functions) and 67.74% (preds-only) against the dependencies derived from the manually annotated gold-standard f-structures for 50 trees randomly selected from articles 301-325

    Predicate Matrix: an interoperable lexical knowledge base for predicates

    Get PDF
    183 p.La Matriz de Predicados (Predicate Matrix en inglés) es un nuevo recurso léxico-semántico resultado de la integración de múltiples fuentes de conocimiento, entre las cuales se encuentran FrameNet, VerbNet, PropBank y WordNet. La Matriz de Predicados proporciona un léxico extenso y robusto que permite mejorar la interoperabilidad entre los recursos semánticos mencionados anteriormente. La creación de la Matriz de Predicados se basa en la integración de Semlink y nuevos mappings obtenidos utilizando métodos automáticos que enlazan el conocimiento semántico a nivel léxico y de roles. Asimismo, hemos ampliado la Predicate Matrix para cubrir los predicados nominales (inglés, español) y predicados en otros idiomas (castellano, catalán y vasco). Como resultado, la Matriz de predicados proporciona un léxico multilingüe que permite el análisis semántico interoperable en múltiples idiomas

    ANALYTIC CAUSATIVES IN JAVANESE : A LEXICAL-FUNCTIONAL APPROACH

    Get PDF
    This paper is a study of analytic causatives in Javanese from a Lexical Functional Grammar (LFG) perspective. Analytic causative s are the type of causatives where there are separate predicates expressing the cause and the effect, that is, the causing notion is realized by a word separate from the word denoting the caused activity. The question addressed in this paper is whether analytic causative s form a mono - or bi - clausal structure. In addition, this paper aims to explain the mechanism of argument sharing between the verbs involved in analytic causatives. By using a negation marker and modals as the syntactic operators to test mono - or bi - clausa lity of analytic causatives, it was found that analytic causatives in Javanese are biclausal. These constructions have an X - COMP structure, in that the SUBJ of the second verb is controlled by the OBJ of the causative verb (N)gawe ̳make‘. The syntactic str uctures of analytic causatives are described within a constituent structure and a functional structure, two main components of LFG

    Automatic treebank-based acquisition of Arabic LFG dependency structures

    Get PDF
    A number of papers have reported on methods for the automatic acquisition of large-scale, probabilistic LFG-based grammatical resources from treebanks for English (Cahill and al., 2002), (Cahill and al., 2004), German (Cahill and al., 2003), Chinese (Burke, 2004), (Guo and al., 2007), Spanish (O’Donovan, 2004), (Chrupala and van Genabith, 2006) and French (Schluter and van Genabith, 2008). Here, we extend the LFG grammar acquisition approach to Arabic and the Penn Arabic Treebank (ATB) (Maamouri and Bies, 2004), adapting and extending the methodology of (Cahill and al., 2004) originally developed for English. Arabic is challenging because of its morphological richness and syntactic complexity. Currently 98% of ATB trees (without FRAG and X) produce a covering and connected f-structure. We conduct a qualitative evaluation of our annotation against a gold standard and achieve an f-score of 95%

    Measuring Accuracy of Triples in Knowledge Graphs

    Get PDF
    An increasing amount of large-scale knowledge graphs have been constructed in recent years. Those graphs are often created from text-based extraction, which could be very noisy. So far, cleaning knowledge graphs are often carried out by human experts and thus very inefficient. It is necessary to explore automatic methods for identifying and eliminating erroneous information. In order to achieve this, previous approaches primarily rely on internal information i.e. the knowledge graph itself. In this paper, we introduce an automatic approach, Triples Accuracy Assessment (TAA), for validating RDF triples (source triples) in a knowledge graph by finding consensus of matched triples (among target triples) from other knowledge graphs. TAA uses knowledge graph interlinks to find identical resources and apply different matching methods between the predicates of source triples and target triples. Then based on the matched triples, TAA calculates a confidence score to indicate the correctness of a source triple. In addition, we present an evaluation of our approach using the FactBench dataset for fact validation. Our findings show promising results for distinguishing between correct and wrong triples
    corecore