
    Designing and Implementing a Learning Object Repository: Issues of Complexity, Granularity, and User Sense-Making

    4th International Conference on Open Repositories. This presentation was part of the session: DSpace User Group Presentations. Date: 2009-05-20, 03:30 PM–05:00 PM.
    The Texas Center for Digital Knowledge at the University of North Texas is designing and implementing a DSpace/Manakin learning object repository (LOR) for the Texas Higher Education Coordinating Board to store and provide access to redesigned undergraduate courses being created through the Board's Texas Course Redesign Project (TCRP). The content for the THECB LOR differs in significant ways from the content stored in other well-known and evolving LORs, since it takes the form of complete or partial courses. While this content could be represented as a single learning object (i.e., a complete course as one learning object), the THECB LOR is making the complete courses available as learning objects while also providing access to components of the courses' content as discrete learning objects for reuse and repurposing. A number of challenges and issues have emerged in the design, development, and implementation of the LOR, and this paper focuses on three key aspects and the solutions we are pursuing: 1) complexity of the course content and granularity; 2) submission of complex objects and metadata; and 3) user interface design to assist users in making sense of this repository and its contents.
    Texas Higher Education Coordinating Board

    RNeXML: a package for reading and writing richly annotated phylogenetic, character, and trait data in R

    NeXML is a powerful and extensible exchange standard recently proposed to better meet the expanding needs for phylogenetic data and metadata sharing. Here we present the RNeXML package, which provides users of the R programming language with easy-to-use tools for reading and writing NeXML documents, including rich metadata, in a way that interfaces seamlessly with the extensive library of phylogenetic tools already available in the R ecosystem.
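
    RNeXML itself lives in R, so a like-for-like sample is out of scope here. As a rough sketch of what reading and writing a NeXML document looks like programmatically, the following Python fragment uses the DendroPy library, which also supports the NeXML schema; the file names are hypothetical, and this illustrates the exchange format rather than the RNeXML API itself.

```python
# Sketch: round-tripping a NeXML document with DendroPy (not RNeXML).
# "trees.xml" and "roundtrip.xml" are hypothetical file names.
import dendropy

# Parse the trees contained in a NeXML file.
trees = dendropy.TreeList.get(path="trees.xml", schema="nexml")

# Inspect each tree in the more familiar Newick notation.
for tree in trees:
    print(tree.as_string(schema="newick"))

# Write the collection back out as NeXML.
trees.write(path="roundtrip.xml", schema="nexml")
```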

    Wide-coverage deep statistical parsing using automatic dependency structure annotation

    A number of researchers (Lin 1995; Carroll, Briscoe, and Sanfilippo 1998; Carroll et al. 2002; Clark and Hockenmaier 2002; King et al. 2003; Preiss 2003; Kaplan et al. 2004; Miyao and Tsujii 2004) have convincingly argued for the use of dependency (rather than CFG-tree) representations for parser evaluation. Preiss (2003) and Kaplan et al. (2004) conducted a number of experiments comparing "deep" hand-crafted wide-coverage parsers with "shallow" treebank- and machine-learning-based parsers at the level of dependencies, using simple and automatic methods to convert the tree output generated by the shallow parsers into dependencies. In this article, we revisit the experiments in Preiss (2003) and Kaplan et al. (2004), this time using the sophisticated automatic LFG f-structure annotation methodologies of Cahill et al. (2002b, 2004) and Burke (2006), with surprising results. We compare various PCFG and history-based parsers (based on Collins, 1999; Charniak, 2000; Bikel, 2002) to find a baseline parsing system that fits best into our automatic dependency structure annotation technique. This combined system of syntactic parser and dependency structure annotation is compared to two hand-crafted, deep constraint-based parsers (Carroll and Briscoe 2002; Riezler et al. 2002). We evaluate using dependency-based gold standards (DCU 105, PARC 700, CBS 500 and dependencies for WSJ Section 22) and use the Approximate Randomization Test (Noreen 1989) to test the statistical significance of the results. Our experiments show that machine-learning-based shallow grammars augmented with sophisticated automatic dependency annotation technology outperform hand-crafted, deep, wide-coverage constraint grammars. Currently our best system achieves an f-score of 82.73% against the PARC 700 Dependency Bank (King et al. 2003), a statistically significant improvement of 2.18% over the most recent result of 80.55% for the hand-crafted LFG grammar and XLE parsing system of Riezler et al. (2002), and an f-score of 80.23% against the CBS 500 Dependency Bank (Carroll, Briscoe, and Sanfilippo 1998), a statistically significant 3.66% improvement over the 76.57% achieved by the hand-crafted RASP grammar and parsing system of Carroll and Briscoe (2002).
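
    The Approximate Randomization Test (Noreen 1989) cited above is straightforward to implement. Here is a minimal sketch, assuming each parser's output has been reduced to per-sentence (matched, predicted, gold) dependency-triple counts; the function and variable names are illustrative:

```python
# Approximate randomization test (Noreen 1989) for the difference in
# dependency f-score between two parsers. Each per-sentence entry is a
# (matched, predicted, gold) triple count; names are illustrative.
import random

def f_score(counts):
    """Corpus-level dependency f-score from per-sentence counts."""
    matched = sum(m for m, _, _ in counts)
    predicted = sum(p for _, p, _ in counts)
    gold = sum(g for _, _, g in counts)
    precision, recall = matched / predicted, matched / gold
    return 2 * precision * recall / (precision + recall)

def approx_randomization(counts_a, counts_b, trials=10_000, seed=0):
    """p-value for the observed f-score difference between two systems."""
    rng = random.Random(seed)
    observed = abs(f_score(counts_a) - f_score(counts_b))
    hits = 0
    for _ in range(trials):
        shuffled_a, shuffled_b = [], []
        for a, b in zip(counts_a, counts_b):
            if rng.random() < 0.5:  # swap the systems' outputs on this sentence
                a, b = b, a
            shuffled_a.append(a)
            shuffled_b.append(b)
        if abs(f_score(shuffled_a) - f_score(shuffled_b)) >= observed:
            hits += 1
    return (hits + 1) / (trials + 1)  # add-one smoothing avoids p = 0
```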

    Codeco: A Grammar Notation for Controlled Natural Language in Predictive Editors

    Existing grammar frameworks do not work out particularly well for controlled natural languages (CNLs), especially if they are to be used in predictive editors. I introduce in this paper a new grammar notation, called Codeco, which is designed specifically for CNLs and predictive editors. Two different parsers have been implemented and a large subset of Attempto Controlled English (ACE) has been represented in Codeco. The results show that Codeco is practical, adequate and efficient.

    DCU 250 Arabic dependency bank: an LFG gold standard resource for the Arabic Penn treebank

    This paper describes the construction of a dependency bank gold standard for Arabic, the DCU 250 Arabic Dependency Bank (DCU 250), based on the Arabic Penn Treebank Corpus (ATB) (Bies and Maamouri, 2003; Maamouri and Bies, 2004) within the theoretical framework of Lexical Functional Grammar (LFG). For parsing and automatically extracting grammatical and lexical resources from treebanks, it is necessary to evaluate against established gold standard resources. Gold standards for various languages have been developed, but to our knowledge, such a resource has not yet been constructed for Arabic. The construction of the DCU 250 marks the first step towards the creation of an automatic LFG f-structure annotation algorithm for the ATB, and towards the extraction of Arabic grammatical and lexical resources.

    Treebank-based acquisition of LFG parsing resources for French

    Motivated by the expense in time and other resources of producing hand-crafted grammars, there has been increased interest in automatically acquiring wide-coverage grammars from treebanks for natural language processing. In particular, recent years have seen growing interest in automatically acquired deep resources that can represent information absent from simple CFG-type structured treebanks and that are considered to produce more language-neutral linguistic representations, such as dependency syntactic trees. As is often the case in early pioneering work on natural language processing, English provided the focus of the first efforts towards acquiring deep-grammar resources, followed by successful treatments of, for example, German, Japanese, Chinese and Spanish. However, no comparable large-scale automatically acquired deep-grammar resources have been obtained for French to date. The goal of this paper is to present the application of treebank-based language acquisition to the case of French. We show that with modest changes to the established parsing architectures, encouraging results can be obtained for French, with a best dependency structure f-score of 86.73%.
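
    Dependency-based evaluation of the kind reported here reduces to comparing sets of (relation, head, dependent) triples between parser output and a gold standard. A minimal sketch, with hypothetical triples for the French sentence "Jean voit la mer":

```python
# Sketch of triple-based dependency evaluation: both parser output and
# gold standard are reduced to sets of (relation, head, dependent) triples.
# The example triples are hypothetical.
def evaluate(predicted, gold):
    matched = len(predicted & gold)
    precision = matched / len(predicted)
    recall = matched / len(gold)
    f = 2 * precision * recall / (precision + recall)
    return precision, recall, f

pred = {("subj", "voit", "Jean"), ("obj", "voit", "mer")}
gold = {("subj", "voit", "Jean"), ("obj", "voit", "mer"),
        ("det", "mer", "la")}
print(evaluate(pred, gold))  # (1.0, 0.667, 0.8): one gold triple missed
```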

    Treebank-based acquisition of wide-coverage, probabilistic LFG resources: project overview, results and evaluation

    This paper presents an overview of a project to acquire wide-coverage, probabilistic Lexical-Functional Grammar (LFG) resources from treebanks. Our approach is based on an automatic annotation algorithm that annotates "raw" treebank trees with LFG f-structure information approximating to basic predicate-argument/dependency structure. From the f-structure-annotated treebank we extract probabilistic unification grammar resources. We present the annotation algorithm, the extraction of lexical information and the acquisition of wide-coverage and robust PCFG-based LFG approximations including long-distance dependency resolution. We show how the methodology can be applied to multilingual, treebank-based unification grammar acquisition. Finally we show how simple (quasi-)logical forms can be derived automatically from the f-structures generated for the treebank trees.

    A hybrid filtering approach for question answering

    We describe a question answering system that took part in the bilingual CLEF QA task (German-English), where German is the source language and English the target language. We used the Babel Fish online translation system to translate the German questions into English. The system is targeted at Factoid and Definition questions. Our focus in designing the current system is on testing our online methods, which are based on information extraction and linguistic filtering methods. Our system does not make use of precompiled tables or gazetteers but uses Web snippets to rerank candidate answers extracted from the document collections. WordNet is also used as a lexical resource in the system. Our question answering system consists of the following core components: Question Analysis, Passage Retrieval, Sentence Analysis and Answer Selection. These components employ various Natural Language Processing (NLP) and Machine Learning (ML) tools, a set of heuristics and different lexical resources. Seamless integration of the various components is one of the major challenges of QA system development. In order to facilitate our development process, we used the Unstructured Information Management Architecture (UIMA) as our underlying framework.
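
    As an illustration of the snippet-based reranking step described above, here is a minimal sketch; the co-occurrence heuristic and the example data are hypothetical, not the system's actual scoring method:

```python
# Illustrative web-snippet reranking of candidate answers: a candidate
# extracted from the document collection is promoted when it co-occurs
# with question terms in retrieved web snippets. The heuristic and the
# example data are hypothetical.
def rerank(candidates, snippets, question_terms):
    def score(candidate):
        hits = 0
        for snippet in snippets:
            text = snippet.lower()
            if candidate.lower() in text and any(
                term.lower() in text for term in question_terms
            ):
                hits += 1
        return hits
    return sorted(candidates, key=score, reverse=True)

snippets = [
    "Bonn was the seat of government of West Germany until 1990.",
    "Berlin became the capital of reunified Germany in 1990.",
    "Berlin is the capital and largest city of Germany.",
]
print(rerank(["Bonn", "Berlin"], snippets, ["capital", "Germany"]))
# ['Berlin', 'Bonn']
```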

    The ontology of signs as linguistic and non-linguistic entities: a cognitive perspective

    It is argued that the traditional philosophical/linguistic analysis of semiotic phenomena is based on the false epistemological assumption that linguistic and non-linguistic entities possess different ontologies. An attempt is made to show where linguistics as the study of signs went wrong, and an unorthodox account of the nature of semiosis is proposed in the framework of autopoiesis as a new epistemology of the living.