64 research outputs found

    A Modular and Flexible Architecture for an Integrated Corpus Query System

    The paper describes the architecture of an integrated and extensible corpus query system developed at the University of Stuttgart and gives examples of some of the modules realized within this architecture. The modules form the core of a corpus workbench. Within the proposed architecture, information required for the evaluation of queries may be derived from different knowledge sources (the corpus text, databases, on-line thesauri) and by different means: either through direct lookup in a database or by calling external tools which may infer the necessary information at the time of query evaluation. The information available and the method of information access can be stated declaratively and individually for each corpus, leading to a flexible, extensible and modular corpus workbench. Comment: 10 pages, uuencoded gzip'ped PostScript; presented at COMPLEX'9

    IAC: a dynamic corpora access interface

    In this demo we present IAC (Interfaz de Acceso a Corpus, "Corpus Access Interface"), an on-line tool developed by Barcelona Media - Innovation Center and the Pompeu Fabra University for creating dynamic interfaces to search corpora.

    The ASK Corpus – a Language Learner Corpus of Norwegian as a Second Language

    In our paper we present the design and interface of ASK, a language learner corpus of Norwegian as a second language which contains essays collected from language tests at two different proficiency levels as well as personal data from the test takers. The corpus also contains texts and relevant personal data from native Norwegians as control data. The texts as well as the personal data are marked up in XML according to the TEI Guidelines. In order to be able to classify errors in the texts, we have introduced new attributes to the TEI corr and sic tags. For each error tag, a correct form is also given in the text annotation. Finally, we employ an automatic tagger developed for standard Norwegian, the Oslo-Bergen Tagger, together with a facility for manual tag correction. As the corpus query system, we use the Corpus Workbench developed at the University of Stuttgart together with a web search interface developed at Aksis, University of Bergen. The system allows searching for combinations of words, error types, grammatical annotation and personal data.

    The TXM Platform: Building Open-Source Textual Analysis Software Compatible with the TEI Encoding Scheme

    This paper describes the rationale and design of TXM, a text-mining analysis platform compatible with XML-TEI encoded corpora. The design of this platform is based on a synthesis of the best available algorithms in existing textometry software. It also relies on identifying the most relevant open-source technologies for processing textual resources encoded in XML and Unicode, for efficient full-text search on annotated corpora and for statistical data analysis. The architecture is based on a Java toolbox that links a full-text search engine component with a statistical computing environment and with an original import environment able to process a large variety of data sources, including XML-TEI, and to apply embedded NLP tools to them. The platform is distributed as an open-source Eclipse project for developers and in the form of two demonstrator applications for end users: a standard application to install on a workstation and an online web application framework.

    Timber! Issues in treebank building and use


    What can Metaphor Tell us about the Language of Translation?

    This paper presents an exploratory study aimed at devising a methodology for analyzing the language of translations through a comparison of metaphor use in original and translated texts. It uses a pilot monolingual comparable corpus of corporate sustainability reports made up of two sections: a subcorpus of Spanish originals and a subcorpus of translations from English into Spanish. VERB-NOUN metaphors are analyzed to compare collocation variety, typical collocations and the degree of metaphorical conventionality of the VERB-NOUN pairs in original and translated texts. Results suggest that metaphors in translated texts show both a tendency to normalization and a preference for unconventional uses arising from original text expressions “shining through” in the translations.

    Corpus: a parallel corpus of English and Spanish Free Trade Agreements for the study of specialized collocations

    This paper describes the Corpus of Free Trade Agreements (henceforth FTA), a specialized parallel corpus in English and Spanish from Europe and America, with smaller English-Norwegian and Spanish-Norwegian subcorpora, that was prepared and then aligned with Translation Corpus Aligner 2 (Hofland & Johansson, 1998). The data was taken from Free Trade Agreements. These agreements are specialized texts officially signed and ratified by several countries and blocs of countries in the last twenty years. FTAs are thus a rich repository of terminology and phraseology used in different fields of business activity throughout the world. The corpus contains around 1.37 million words in the English section and 1.48 million words in its Spanish counterpart, plus 60,000 words each in the Spanish-Norwegian and English-Norwegian subcorpora. The corpus is being used primarily to study the terms, and the specialized collocations that include them, in this kind of specialized text.