48,424 research outputs found

    Automatic treebank-based acquisition of Arabic LFG dependency structures

    Get PDF
    A number of papers have reported on methods for the automatic acquisition of large-scale, probabilistic LFG-based grammatical resources from treebanks for English (Cahill and al., 2002), (Cahill and al., 2004), German (Cahill and al., 2003), Chinese (Burke, 2004), (Guo and al., 2007), Spanish (O’Donovan, 2004), (Chrupala and van Genabith, 2006) and French (Schluter and van Genabith, 2008). Here, we extend the LFG grammar acquisition approach to Arabic and the Penn Arabic Treebank (ATB) (Maamouri and Bies, 2004), adapting and extending the methodology of (Cahill and al., 2004) originally developed for English. Arabic is challenging because of its morphological richness and syntactic complexity. Currently 98% of ATB trees (without FRAG and X) produce a covering and connected f-structure. We conduct a qualitative evaluation of our annotation against a gold standard and achieve an f-score of 95%

    Adapting a general parser to a sublanguage

    Full text link
    In this paper, we propose a method to adapt a general parser (Link Parser) to sublanguages, focusing on the parsing of texts in biology. Our main proposal is the use of terminology (identication and analysis of terms) in order to reduce the complexity of the text to be parsed. Several other strategies are explored and finally combined among which text normalization, lexicon and morpho-guessing module extensions and grammar rules adaptation. We compare the parsing results before and after these adaptations

    Using distributional similarity to organise biomedical terminology

    Get PDF
    We investigate an application of distributional similarity techniques to the problem of structural organisation of biomedical terminology. Our application domain is the relatively small GENIA corpus. Using terms that have been accurately marked-up by hand within the corpus, we consider the problem of automatically determining semantic proximity. Terminological units are dened for our purposes as normalised classes of individual terms. Syntactic analysis of the corpus data is carried out using the Pro3Gres parser and provides the data required to calculate distributional similarity using a variety of dierent measures. Evaluation is performed against a hand-crafted gold standard for this domain in the form of the GENIA ontology. We show that distributional similarity can be used to predict semantic type with a good degree of accuracy

    Automatic acquisition of LFG resources for German - as good as it gets

    Get PDF
    We present data-driven methods for the acquisition of LFG resources from two German treebanks. We discuss problems specific to semi-free word order languages as well as problems arising fromthe data structures determined by the design of the different treebanks. We compare two ways of encoding semi-free word order, as done in the two German treebanks, and argue that the design of the TiGer treebank is more adequate for the acquisition of LFG resources. Furthermore, we describe an architecture for LFG grammar acquisition for German, based on the two German treebanks, and compare our results with a hand-crafted German LFG grammar

    Automated, objective texture segmentation of multibeam echosounder data - Seafloor survey and substrate maps from James Island to Ozette Lake, Washington Outer Coast

    Get PDF
    Without knowledge of basic seafloor characteristics, the ability to address any number of critical marine and/or coastal management issues is diminished. For example, management and conservation of essential fish habitat (EFH), a requirement mandated by federally guided fishery management plans (FMPs), requires among other things a description of habitats for federally managed species. Although the list of attributes important to habitat are numerous, the ability to efficiently and effectively describe many, and especially at the scales required, does not exist with the tools currently available. However, several characteristics of seafloor morphology are readily obtainable at multiple scales and can serve as useful descriptors of habitat. Recent advancements in acoustic technology, such as multibeam echosounding (MBES), can provide remote indication of surficial sediment properties such as texture, hardness, or roughness, and further permit highly detailed renderings of seafloor morphology. With acoustic-based surveys providing a relatively efficient method for data acquisition, there exists a need for efficient and reproducible automated segmentation routines to process the data. Using MBES data collected by the Olympic Coast National Marine Sanctuary (OCNMS), and through a contracted seafloor survey, we expanded on the techniques of Cutter et al. (2003) to describe an objective repeatable process that uses parameterized local Fourier histogram (LFH) texture features to automate segmentation of surficial sediments from acoustic imagery using a maximum likelihood decision rule. Sonar signatures and classification performance were evaluated using video imagery obtained from a towed camera sled. Segmented raster images were converted to polygon features and attributed using a hierarchical deep-water marine benthic classification scheme (Greene et al. 1999) for use in a geographical information system (GIS). (PDF contains 41 pages.

    Multimodal Grounding for Language Processing

    Get PDF
    This survey discusses how recent developments in multimodal processing facilitate conceptual grounding of language. We categorize the information flow in multimodal processing with respect to cognitive models of human information processing and analyze different methods for combining multimodal representations. Based on this methodological inventory, we discuss the benefit of multimodal grounding for a variety of language processing tasks and the challenges that arise. We particularly focus on multimodal grounding of verbs which play a crucial role for the compositional power of language.Comment: The paper has been published in the Proceedings of the 27 Conference of Computational Linguistics. Please refer to this version for citations: https://www.aclweb.org/anthology/papers/C/C18/C18-1197

    Wide-coverage deep statistical parsing using automatic dependency structure annotation

    Get PDF
    A number of researchers (Lin 1995; Carroll, Briscoe, and Sanfilippo 1998; Carroll et al. 2002; Clark and Hockenmaier 2002; King et al. 2003; Preiss 2003; Kaplan et al. 2004;Miyao and Tsujii 2004) have convincingly argued for the use of dependency (rather than CFG-tree) representations for parser evaluation. Preiss (2003) and Kaplan et al. (2004) conducted a number of experiments comparing “deep” hand-crafted wide-coverage with “shallow” treebank- and machine-learning based parsers at the level of dependencies, using simple and automatic methods to convert tree output generated by the shallow parsers into dependencies. In this article, we revisit the experiments in Preiss (2003) and Kaplan et al. (2004), this time using the sophisticated automatic LFG f-structure annotation methodologies of Cahill et al. (2002b, 2004) and Burke (2006), with surprising results. We compare various PCFG and history-based parsers (based on Collins, 1999; Charniak, 2000; Bikel, 2002) to find a baseline parsing system that fits best into our automatic dependency structure annotation technique. This combined system of syntactic parser and dependency structure annotation is compared to two hand-crafted, deep constraint-based parsers (Carroll and Briscoe 2002; Riezler et al. 2002). We evaluate using dependency-based gold standards (DCU 105, PARC 700, CBS 500 and dependencies for WSJ Section 22) and use the Approximate Randomization Test (Noreen 1989) to test the statistical significance of the results. Our experiments show that machine-learning-based shallow grammars augmented with sophisticated automatic dependency annotation technology outperform hand-crafted, deep, widecoverage constraint grammars. Currently our best system achieves an f-score of 82.73% against the PARC 700 Dependency Bank (King et al. 2003), a statistically significant improvement of 2.18%over the most recent results of 80.55%for the hand-crafted LFG grammar and XLE parsing system of Riezler et al. (2002), and an f-score of 80.23% against the CBS 500 Dependency Bank (Carroll, Briscoe, and Sanfilippo 1998), a statistically significant 3.66% improvement over the 76.57% achieved by the hand-crafted RASP grammar and parsing system of Carroll and Briscoe (2002)

    Using theory to inform capacity-building: Bootstrapping communities of practice in computer science education research

    Get PDF
    In this paper, we describe our efforts in the deliberate creation of a community of practice of researchers in computer science education (CSEd). We understand community of practice in the sense in which Wenger describes it, whereby the community is characterized by mutual engagement in a joint enterprise that gives rise to a shared repertoire of knowledge, artefacts, and practices. We first identify CSEd as a research field in which no shared paradigm exists, and then we describe the Bootstrapping project, its metaphor, structure, rationale, and delivery, as designed to create a community of practice of CSEd researchers. Features of other projects are also outlined that have similar aims of capacity building in disciplinary-specific pedagogic enquiry. A theoretically derived framework for evaluating the success of endeavours of this type is then presented, and we report the results from an empirical study. We conclude with four open questions for our project and others like it: Where is the locus of a community of practice? Who are the core members? Do capacity-building models transfer to other disciplines? Can our theoretically motivated measures of success apply to other projects of the same nature
    corecore