16 research outputs found

    CLARIN: Common language resources and technology infrastructure

    This paper gives an overview of the CLARIN project [1], which aims to create a research infrastructure that makes language resources and technology (LRT) available and readily usable to scholars of all disciplines, in particular the humanities and social sciences (HSS)

    Frequencies of form function correlates in the Dutch verb inflection system

    Scholarly publication, Faculty of Arts

    The CLARIN Research Infrastructure: Resources and Tools for e-Humanities Scholars

    Authors: Erhard Hinrichs and Steven Krauwer. CLARIN is the short name for the Common Language Resources and Technology Infrastructure, which aims to give scholars in the humanities and social sciences easy and sustainable access to digital language data and to advanced tools for discovering, exploring, exploiting, annotating, analysing or combining those data, independent of where they are located. CLARIN is in the process of building a networked federation of European data repositories, service centers and centers of expertise, with single sign-on access for all members of the academic community in all participating countries. Tools and data from different centers will be interoperable, so that data collections can be combined and tools from different sources can be chained to perform complex operations in support of researchers' work. Interoperability of language resources and tools in the federation of CLARIN Centers is ensured by adherence to TEI and ISO standards for text encoding, by the use of persistent identifiers, and by the observance of common protocols. The purpose of the present paper is to give an overview of the language resources, tools, and services that CLARIN presently offers.

    Efficient Disambiguation by means of Stochastic Tree Substitution Grammars

    In Stochastic Tree Substitution Grammars (STSGs), one parse(-tree) of an input sentence can be generated by exponentially many derivations; the probability of a parse is defined as the sum of the probabilities of its derivations. As a result, some methods of Stochastic Context-Free Grammars (SCFGs), e.g. the Viterbi algorithm for finding the most probable parse (MPP) of an input sentence, are not applicable to STSGs. In this paper we study parsing with STSGs and concentrate on the problem of disambiguation. We present polynomial algorithms for computing both the probability of a parse and the probability of an input sentence and its most probable derivation. In addition, we present an optimization technique of search algorithms for the MPP. Keywords: Corpus-based NLP, Statistical NLP, Disambiguation. Motivation Natural language (NL) grammars often assign many syntactic structures to the same sentence. Most of these structures are perceived as implausible by a human language user. At..
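The definition in the abstract above (the probability of a parse is the sum of the probabilities of its derivations) can be illustrated with a small sketch. This is not the algorithm from the paper, whose details are not given in the abstract. It is a toy illustration under assumptions: trees are nested tuples of the form `(label, child, ...)` with words as plain strings, a one-element tuple such as `("NP",)` marks a substitution site in an elementary tree, and the example grammar and its probabilities are invented for illustration. Memoizing on subtrees means each distinct subtree is evaluated once, rather than enumerating every derivation separately.

```python
from functools import lru_cache


def match(frag, tree):
    """Match an elementary tree against the top of `tree`.

    Returns the list of subtrees found at the fragment's substitution
    sites, or None if the fragment does not fit.
    """
    if isinstance(frag, str):              # a word must match exactly
        return [] if frag == tree else None
    if isinstance(tree, str) or frag[0] != tree[0]:
        return None                        # node vs. word, or label mismatch
    if len(frag) == 1:                     # substitution site, e.g. ("NP",)
        return [tree]
    if len(frag) != len(tree):             # different number of children
        return None
    sites = []
    for f_child, t_child in zip(frag[1:], tree[1:]):
        sub = match(f_child, t_child)
        if sub is None:
            return None
        sites.extend(sub)
    return sites


def parse_probability(tree, fragments):
    """Sum the probabilities of all derivations that yield `tree`.

    `fragments` is a sequence of (elementary_tree, probability) pairs.
    """
    @lru_cache(maxsize=None)
    def prob(subtree):
        total = 0.0
        for frag, p in fragments:
            if len(frag) == 1:             # a bare site is not a fragment
                continue
            sites = match(frag, subtree)
            if sites is None:
                continue
            product = p
            for site in sites:             # fill each site recursively
                product *= prob(site)
            total += product
        return total

    return prob(tree)


# Hypothetical toy grammar (probabilities per root label sum to 1).
FRAGMENTS = (
    (("S", ("NP",), ("VP",)), 0.5),
    (("S", ("NP", "John"), ("VP",)), 0.25),
    (("S", ("NP", "John"), ("VP", "sleeps")), 0.25),
    (("NP", "John"), 0.5),
    (("NP", "Mary"), 0.5),
    (("VP", "sleeps"), 1.0),
)

PARSE = ("S", ("NP", "John"), ("VP", "sleeps"))
print(parse_probability(PARSE, FRAGMENTS))  # 0.75
```

In this toy grammar the single parse has three derivations, contributing 0.5 × 0.5 × 1.0, 0.25 × 1.0 and 0.25, which sum to 0.75. No single derivation carries the full probability of the parse, which is why picking the best derivation is not the same as picking the most probable parse.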

    MEDAR – collaboration between European and Mediterranean Arabic partners to support the development of language technology for Arabic

    After the successful completion of the NEMLAR project (2003-2005), the European Commission opened a new project opportunity, and a group of largely the same partners is now executing the MEDAR project. MEDAR will update the existing surveys and the BLARK for Arabic, and will then focus on machine translation (and other translation tools) and information retrieval, with an emphasis on language resources, tools and evaluation for these applications. A very important part of the MEDAR project is to reinforce and extend the NEMLAR network and to create a cooperation roadmap for Human Language Technologies for Arabic. The cooperation roadmap is expected to attract wide attention from other parties and to help create a larger platform for collaborative projects. Finally, the project will disseminate knowledge about existing resources and tools, as well as actors and activities, through a newsletter, a website and an international conference that will follow up on the Cairo conference of 2004. Dissemination to user communities will also be important, e.g. through participation in translators' conferences. The goal of these activities is to create a stronger and lasting collaboration between EU countries and Arabic-speaking countries. 1. Background and Mission The development of language resources and tools for the Arabic language is important for the economy in the Arab countries; but at the same time it is important for th