248 research outputs found

    BAStat : New Statistical Resources at the Bavarian Archive for Speech Signals

    A new type of language resource, 'BAStat', has been released by the Bavarian Archive for Speech Signals. In contrast to primary resources like speech and text corpora, BAStat comprises statistical estimates based on a number of primary resources: first and second order occurrence probabilities of phones, syllables and words, duration statistics, probabilities of pronunciation variants of words and probabilities of context information. Unlike other statistical speech resources, BAStat is based solely on recordings of conversational German and therefore models spoken language. It consists of 7-bit ASCII tables and matrices to maximize interoperability between different platforms and can be downloaded from the BAS website. This paper gives a detailed description of the empirical basis, the contained data types, some interesting interpretations and a brief comparison to the text-based statistical resource CELEX.

    New European Infrastructural and Networking Initiatives

    I'll point at two new infrastructural initiatives, launched by the European Commission, in the area of Language Resources (LR) and Language Technologies (LT), which will influence how we shape the future of the field: CLARIN and FLaReNet.

    Foreword

    My final remark is that, as with any new development, it is important on the one hand to leave room for the free rise of new ideas and methods inside the collaborative paradigm, but it is also important to start organising its future. There must be a bold vision and an international group able to push for it (with both researchers and policy makers involved) and to organise some grand challenge that, via a distribution of efforts and exploiting the sharing trend, involves the collaboration of a substantial portion of our community. Could we envision a large "Language Library" as the beginning of a big Genome project for languages, where the community collectively deposits/creates increasingly rich and multi-layered LRs, enabling a deeper understanding of the complex relations between different annotation layers/language phenomena?

    Approaches towards a Lexical Web: the role of Interoperability

    After highlighting some of the major dimensions that are relevant for Language Resources (LR) and contribute to their infrastructural role, I underline some priority areas of concern today with respect to implementing an open Language Infrastructure, and specifically what we could call a "Lexical Web". My objective is to show that it is imperative to define an underlying global strategy behind the set of initiatives which are/can be launched in Europe and world-wide, and that an all-embracing vision and cooperation among different communities are necessary to achieve more coherent and useful results. I end by mentioning two new European initiatives that go in this direction and promise to be influential in shaping the future of the LR area.

    Language Infrastructures: what happens outside Europe?

    The setup of CLARIN in Europe was the result of a long series of initiatives and attempts from many of us, starting already at the beginning of the 6th Framework Programme. That the time is finally ripe for such an infrastructure is also shown by other initiatives outside Europe that share objectives and ideas with CLARIN. I mention here just a few.

    Planning the Future of Language Resources: The Role of the FLaReNet Network

    In this paper we analyse the role of Language Resources (LR) and Language Technologies (LT) in today's Human Language Technology field and try to speculate on some of the priorities for the coming years, from the particular perspective of the FLaReNet project, which has been asked to act as an observatory to assess the current status of the field of Language Resources and Technology and to indicate priorities of action for the future.

    The EAGLES/ISLE initiative for setting standards: the Computational Lexicon Working Group for Multilingual Lexicons

    ISLE (International Standards for Language Engineering), a transatlantic standards-oriented initiative under the Human Language Technology (HLT) programme, is a continuation of the long-standing EAGLES (Expert Advisory Group for Language Engineering Standards) initiative, carried out by European and American groups within the EU-US International Research Co-operation, supported by NSF and EC. The objective is to support international and national HLT R&D projects, and HLT industry, by developing and promoting widely agreed and urgently demanded HLT standards and guidelines for infrastructural language resources, tools, and HLT products. ISLE targets the areas of multilingual computational lexicons (MCL), natural interaction and multimodality (NIMM), and evaluation. For MCL, ISLE is working to: extend EAGLES work on lexical semantics, necessary to establish inter-language links; design standards for multilingual lexicons; develop a prototype tool to implement lexicon guidelines; create EAGLES-conformant sample lexicons and tag corpora for validation purposes; develop standardised evaluation procedures for lexicons. For NIMM, a rapidly innovating domain urgently requiring early standardisation, ISLE work aims to develop guidelines for: creation of NIMM data resources; interpretative annotation of NIMM data, including spoken dialogue; annotation of discourse phenomena. For evaluation, ISLE is working on: quality models for machine translation systems; maintenance of previous guidelines, in an ISO-based framework. In the paper we concentrate on the Computational Lexicon Working Group, describing in detail the proposals of guidelines for the "Multilingual ISLE Lexical Entry" (MILE). We highlight some methodological principles applied in previous EAGLES work and followed in defining MILE. We also provide a description of the EU SIMPLE semantic lexicons built on the basis of previous EAGLES recommendations. Their importance lies in the fact that these lexicons are now being enlarged to real-size lexicons within National Projects in 8 EU countries, thus building a really large infrastructural platform of harmonised lexicons in Europe. We will also stress the relevance of standardised language resources for humanities applications. Numerous theories, approaches and systems are taken into account in ISLE, as any recommendation for harmonisation must build on the major contemporary approaches. Results will be widely disseminated, after validation in collaboration with EU and US HLT R&D projects, and industry. EAGLES work towards de facto standards has already allowed the field of Language Resources to establish broad consensus on key issues for some well-established areas, and will allow similar consensus to be achieved for other important areas through the ISLE project, thus providing a key opportunity for further consolidation and a basis for technological advance. Previous EAGLES results in many areas have in fact already become widely adopted de facto standards, and EAGLES itself is a well-known trademark and a point of reference for HLT projects. Hosted by the Scholarly Text and Imaging Service (SETIS), the University of Sydney Library, and the Research Institute for Humanities and Social Sciences (RIHSS), the University of Sydney.

    The FLaReNet Thematic Network: a Global Forum for Cooperation

    The aim of this short paper is to present the FLaReNet Thematic Network for Language Resources and Language Technologies to the Asian Language Resources Community. The creation of a wide and committed community and of a shared policy in the field of Language Resources is essential in order to foster a substantial advancement of the field. This paper presents the background, overall objectives and working methodology of the project, as well as a set of preliminary results.

    Interoperability Framework: The FLaReNet action plan proposal

    Standards are fundamental for exchanging, preserving, maintaining and integrating data and language resources, and are an essential basis of any language resource infrastructure. This paper promotes an Interoperability Framework as a dynamic environment of standards and guidelines, also intended to support the provision of language-(web)service interoperability. In the past two decades, the need to define common practices and formats for linguistic resources has been increasingly recognized and sought. Today open, collaborative, shared data is at the core of a sound language strategy, and standardisation is actively on the move. This paper first describes the current landscape of standards and presents the major barriers to their adoption; then, it describes those scenarios that critically involve the use of standards and provide a strong motivation for their adoption; lastly, it proposes a series of actions and steps needed to operationalise standards and achieve full interoperability for Language Resources and Technologies.

    The FLaReNet Databook

    A collection of all the factual material gathered during the activities of the FLaReNet project and a set of innovative initiatives and instruments that will remain in place for the continuous collection of such "facts". Editors: Paola Baroni, Claudia Soria, Nicoletta Calzolari. Contributors: Victoria Arranz, Núria Bel, Gerhard Budin, Tommaso Caselli, Khalid Choukri, Riccardo Del Gratta, Elina Desypri, Gil Francopoulo, Francesca Frontini, Sara Goggi, Olivier Hamon, Erhard Hinrichs, Penny Labropoulou, Lothar Lemnizer, Steven Krauwer, Valerie Mapelli, Joseph Mariani, Monica Monachini, Jan Odijk, Jungyeul Park, Stelios Piperidis, Adam Przepiorkowski, Valeria Quochi, Eva Revilla, Laurent Romary, Francesco Rubino, Irene Russo, Helmut Schmidt, Hans Uszkoreit, Peter Wittenburg.