67 research outputs found

    Comparison of resource discovery methods

    Get PDF

    Foundation of a component-based flexible registry for language resources and technology

    No full text
    Within the CLARIN e-science infrastructure project it is foreseen to develop a component-based registry for metadata for Language Resources and Language Technology. With this registry it is hoped to overcome the problems of the current available systems with respect to inflexible fixed schema, unsuitable terminology and interoperability problems. The registry will address interoperability needs by refering to a shared vocabulary registered in data category registries as they are suggested by ISO

    Foundation of a Component-based Flexible Registry for Language Resources and Technology

    Get PDF
    Within the CLARIN e-science infrastructure project it is foreseen to develop a component-based registry for metadata for Language Resources and Language Technology. With this registry it is hoped to overcome the problems of the current available systems with respect to inflexible fixed schema, unsuitable terminology and interoperability problems. The registry will address interoperability needs by refering to a shared vocabulary registered in data category registries as they are suggested by ISO

    A Federation of Language Archives Enabling Future eHumanities Scenarios

    No full text
    This paper describes the need for new infrastructures for future eScience scenarios in the humanities. Three projects working on different aspects of these infrastructures are examined in detail. The first project is trying to achieve a federation of archives, developing an integration layer at the level of localization, access to and referring to an archive’s raw data objects. The other two try to achieve interoperability at the level of semantic interpretation of linguistic data-types and tagging systems. The project’s different approaches to this problem show the trade-of between flexibility and the user’s workload. All three approaches give an impression about the necessary steps to come to an eHumanities scenario

    A large Metadata Domain for Language Resources

    Get PDF
    Colloque avec actes et comité de lecture. internationale.International audienceThe INTERA and ECHO projects were partly intended to create a critical mass of open and linked metadata descriptions of language resources, helping researchers to understand the benefits of an increased visibility of language resources in the Internet and motivating them to participate. The work was based on the new IMDI version 3.0.3 which is a result of experiences with the earlier versions and new requirements coming from the involved partners. While in INTERA major data centers in Europe are participating, the ECHO project focuses on resources that can be seen as part of cultural heritage. Currently, 27 institutions and projects are active with the goal of having a large browsable and searchable domain by the summer of 2004. Experience shows that the creation of high quality metadata is not trivial and asks for a considerable amount of effort and skills, since manual work alone is too time consuming

    Archiving and accessing language resources

    Get PDF
    Languages are among the most complex systems that evolution has created. With an unforeseen speed many of these unique results of evolution are currently disappearing: every two weeks one of the 6500 still spoken languages is dying and many are subject to extreme changes due to globalization. Experts understood the need to document the languages and preserve the cultural and linguistic treasures embedded in them for future generations. Also linguistic theory will need to consider the variation of the linguistic systems encoded in languages to improve our understanding of how human minds process language material, thus accessibility to all types of resources is increasingly crucial. Deeper insights into human language processing and a higher degree of integration and interoperability between resources will also improve our language processing technology. The DOBES programme is focussing on the documentation and preservation of language material. The Max Planck Institute developed the Language Archiving Technology to help researchers when creating, archiving and accessing language resources. The recently started CLARIN research infrastructure has as main goals to achieve a broad visibility and an easy accessibility of language resources

    Digitizing Intangible Cultural Heritage

    Get PDF
    As part of the UNESCO project "Establishment of a National Inventory and Electronic Database of Lithuanian Intangible Cultural Heritage" the authors, representing the EU-funded project "European Cultural Heritage Online" (ECHO) were invited to give a course in digital archiving called "Digitizing Intangible Cultural Heritage" in Vilnius, Lithuania, March 15 to 20, 2004. The present report summarizes very briefly the sessions given. Thereafter, the analyses of the state of the digitization work of the participating institutes and recommendations for the future are given in a dedicated, stand-alone section

    Naturalistic Emotional Speech Corpora with Large Scale Emotional Dimension Ratings

    Get PDF
    The investigation of the emotional dimensions of speech is dependent on large sets of reliable data. Existing work has been carried out on the creation of emotional speech corpora and the acoustic analysis of emotional speech and this research seeks to buildupon this work while suggesting new methods and areas of potential. A review of the literature determined that a two dimensional emotional model of activation and evaluation was the ideal method for representing the emotional states expressed inspeech. Two case studies were carried out to investigate methods of obtaining naturalunderlying emotional speech in a high quality audio environment, the results of which were used to design a final experimental procedure to elicit natural underlying emotional speech. The speech obtained in this experiment was used in the creation ofa speech corpus that was underpinned by a persistent backend database that incorporated a three-tiered annotation methodology. This methodology was used to comprehensively annotate the metadata, acoustic data and emotional data of the recorded speech. Structuring the three levels of annotation and the assets in a persistent backend database allowed interactive web-based tools to be developed; aweb-based listening tool was developed to obtain a large amount of ratings for the assets that were then written back to the database for analysis. Once a large amount of ratings had been obtained, statistical analysis was used to determine the dimensionalrating for each asset. Acoustic analysis of the underlying emotional speech was then carried out and determined that certain acoustic parameters were correlated with the activation dimension of the dimensional model. This substantiated some of thefindings in the literature review and further determined that spectral energy was strongly correlated with the activation dimension in relation to underlying emotional speech. The lack of a correlation for certain acoustic parameters in relation to the evaluation dimension was also determined, again substantiating some of the findings in the literature.The work contained in this thesis makes a number of contributions to the field: the development of an experimental design to elicit natural underlying emotional speech in a high quality audio environment; the development and implementation of acomprehensive three-tiered corpus annotation methodology; the development and implementation of large scale web based listening tests to rate the emotional dimensions of emotional speech; the determination that certain acoustic parameters are correlated with the activation dimension of a dimensional emotional model inrelation to natural underlying emotional speech and the determination that certain acoustic parameters are not correlated with the evaluation dimension of a twodimensional emotional model in relation to natural underlying emotional speech

    Foundation of a Component-based Flexible Registry for Language Resources and Technolog

    Get PDF
    International audienceWithin the CLARIN e-science infrastructure project it is foreseen to develop a component-based registry for metadata for Language Resources and Language Technology. With this registry it is hoped to overcome the problems of the current available systems with respect to inflexible fixed schema, unsuitable terminology and interoperability problems. The registry will address interoperability needs by refering to a shared vocabulary registered in data category registries as they are suggested by ISO
    • …
    corecore