73 research outputs found

    New Developments in Arbil Metadata Manager

    This talk will introduce Arbil, a tool for managing metadata that describes research data, such as audio or video files, allowing research data files to be easily searched both before and after they are archived. Arbil has been developed at The Language Archive at MPIPL (Author, 2012) and was originally designed for the DOBES community to replace the IMDI Editor. The core needs expressed by this group were viewing and editing metadata in the field and being able to edit more than one metadata file at once. Accordingly, Arbil is fully functional offline, provides tabular editing, and for robustness stores only text metadata files. To move metadata and associated resources into an LAT archive, the structure is exported from Arbil and then uploaded into LAMUS (Broeder et al., 2006). Arbil was originally designed to support IMDI metadata (Broeder and Wittenburg, 2006). This format has been in use for many years and covers most needs with a number of set fields, but the large number of fields to fill in may also confuse researchers and slow down the workflow. This issue has been addressed by CLARIN (Váradi et al., 2008). CLARIN provides flexible metadata fields, allowing a custom profile to be designed for each project: only the relevant metadata fields need to be offered to the end user, greatly simplifying the process of creating metadata. Arbil has now been updated to support both IMDI and CLARIN metadata formats. Because of the flexible design of Arbil, some of its components, such as the metadata table and tree, have been utilised in KinOath Kinship Archiver (Author, 2011). This application builds on the core functions of Arbil, onto which it adds an XML database to provide fast searches. Also, a plugin layer has been introduced in KinOath which has been migrated back into Arbil. Another project that is in the prototype stage is a web based search similar to the search in Arbil. These changes are being combined into a search plugin for Arbil, currently in development, that will make much more powerful searches available without compromising the original design of the application.
    References
    Author. 2012. Metadata Management with Arbil. In Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC 2012) Satellite Workshops, pages 72–75. Istanbul. http://www.lrec-conf.org/proceedings/lrec2012/workshops/11.LREC2012%20Metadata%20Proceedings.pdf
    Author. 2011. KinOath, Kinship Software Beta Stage of Development. Talk presented at Atelier d’initiation au traitement informatique de la parenté. salle 3, RdC, bât. Le France. 2011-12-16.
    D. Broeder and P. Wittenburg. 2006. The IMDI metadata framework, its current application and future direction. International Journal of Metadata, Semantics and Ontologies, 1(2), pages 119–132.
    T. Váradi, S. Krauwer, P. Wittenburg, M. Wynne, and K. Koskenniemi. 2008. Clarin: Common language resources and technology infrastructure. In Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC’08), pages 1244–1248, Marrakech. European Language Resources Association (ELRA). http://www.lrecconf.org/proceedings/lrec2008/pdf/317_paper.pdf.
    D. Broeder, A. Claus, F. Offenga, R. Skiba, P. Trilsbeek, and P. Wittenburg. 2006. LAMUS: the Language Archive Management and Upload System. In Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06), pages 2291–2294, Genoa. European Language Resources Association (ELRA). www.latmpi.eu/papers/papers 2006/lamuspaperfinal2.pd

    Virtual language observatory: The portal to the language resources and technology universe

    Over the years, the field of Language Resources and Technology (LRT) has developed a tremendous amount of resources and tools. However, there is no ready-to-use map that researchers could use to gain a good overview and steadfast orientation when searching for, say corpora or software tools to support their studies. It is rather the case that information is scattered across project- or organisation-specific sites, which makes it hard if not impossible for less-experienced researchers to gather all relevant material. Clearly, the provision of metadata is central to resource and software exploration. However, in the LRT field, metadata comes in many forms, tastes and qualities, and therefore substantial harmonization and curation efforts are required to provide researchers with metadata-based guidance. To address this issue a broad alliance of LRT providers (CLARIN, the Linguist List, DOBES, DELAMAN, DFKI, ELRA) have initiated the Virtual Language Observatory portal to provide a low-barrier, easy-to-follow entry point to language resources and tools; it can be accessed via http://www.clarin.eu/vl

    Archiving and accessing language resources

    Languages are among the most complex systems that evolution has created. With unforeseen speed, many of these unique results of evolution are currently disappearing: every two weeks one of the 6500 languages still spoken dies, and many are subject to extreme changes due to globalization. Experts have understood the need to document these languages and preserve the cultural and linguistic treasures embedded in them for future generations. Linguistic theory will also need to consider the variation of the linguistic systems encoded in languages to improve our understanding of how human minds process language material; thus access to all types of resources is increasingly crucial. Deeper insights into human language processing and a higher degree of integration and interoperability between resources will also improve our language processing technology. The DOBES programme focuses on the documentation and preservation of language material. The Max Planck Institute developed the Language Archiving Technology to help researchers create, archive and access language resources. The recently started CLARIN research infrastructure has as its main goals broad visibility and easy accessibility of language resources.

    Reusable, Interactive, Multilingual Online Avatars

    This paper details a system for delivering reusable, interactive multilingual avatars in online children’s games. The development of these avatars is based on the concept of an intelligent media object that can be repurposed across different productions. The system is both language and character independent, allowing content to be reused in a variety of contexts and locales. In the current implementation, the user is provided with an interactive animated robot character that can be dressed with a range of body parts chosen by the user in real-time. The robot character reacts to each selection of a new part in a different manner, relative to simple narrative constructs that define a number of scripted responses. Once configured, the robot character subsequently appears as a help avatar throughout the rest of the game. At the time of writing, the system is in beta testing on the My Tiny Planets website to fully assess its effectiveness.

    LinguaTag: an Emotional Speech Analysis Application

    The analysis of speech, particularly for emotional content, is an open area of current research. Ongoing work has developed an emotional speech corpus for analysis, and defined a vowel stress method by which this analysis may be performed. This paper documents the development of LinguaTag, an open source speech analysis software application which implements this vowel stress emotional speech analysis method developed as part of research into the acoustic and linguistic correlates of emotional speech. The analysis output is contained within a file format combining SMIL and SSML markup tags, to facilitate search and retrieval methods within an emotional speech corpus database. In this manner, analysis performed using LinguaTag aims to combine acoustic, emotional and linguistic descriptors in a single metadata framework

    Naturalistic Emotional Speech Corpora with Large Scale Emotional Dimension Ratings

    The investigation of the emotional dimensions of speech is dependent on large sets of reliable data. Existing work has been carried out on the creation of emotional speech corpora and the acoustic analysis of emotional speech and this research seeks to build upon this work while suggesting new methods and areas of potential. A review of the literature determined that a two dimensional emotional model of activation and evaluation was the ideal method for representing the emotional states expressed in speech. Two case studies were carried out to investigate methods of obtaining natural underlying emotional speech in a high quality audio environment, the results of which were used to design a final experimental procedure to elicit natural underlying emotional speech. The speech obtained in this experiment was used in the creation of a speech corpus that was underpinned by a persistent backend database that incorporated a three-tiered annotation methodology. This methodology was used to comprehensively annotate the metadata, acoustic data and emotional data of the recorded speech. Structuring the three levels of annotation and the assets in a persistent backend database allowed interactive web-based tools to be developed; a web-based listening tool was developed to obtain a large amount of ratings for the assets that were then written back to the database for analysis. Once a large amount of ratings had been obtained, statistical analysis was used to determine the dimensional rating for each asset. Acoustic analysis of the underlying emotional speech was then carried out and determined that certain acoustic parameters were correlated with the activation dimension of the dimensional model. This substantiated some of the findings in the literature review and further determined that spectral energy was strongly correlated with the activation dimension in relation to underlying emotional speech. The lack of a correlation for certain acoustic parameters in relation to the evaluation dimension was also determined, again substantiating some of the findings in the literature.
    The work contained in this thesis makes a number of contributions to the field: the development of an experimental design to elicit natural underlying emotional speech in a high quality audio environment; the development and implementation of a comprehensive three-tiered corpus annotation methodology; the development and implementation of large scale web based listening tests to rate the emotional dimensions of emotional speech; the determination that certain acoustic parameters are correlated with the activation dimension of a dimensional emotional model in relation to natural underlying emotional speech and the determination that certain acoustic parameters are not correlated with the evaluation dimension of a two dimensional emotional model in relation to natural underlying emotional speech.

    Semantic metadata mapping in practice: The Virtual Language Observatory

    In this paper we present the Virtual Language Observatory (VLO), a metadata-based portal for language resources. It is completely based on the Component Metadata (CMDI) and ISOcat standards. This approach allows for the use of heterogeneous metadata schemas while maintaining the semantic compatibility. We describe the metadata harvesting process, based on OAI-PMH, and the conversion from several formats (OLAC, IMDI and the CLARIN LRT inventory) to their CMDI counterpart profiles. Then we focus on some post-processing steps to polish the harvested records. Next, the ingestion of the CMDI files into the VLO facet browser is described. We also include an overview of the changes since the first version of the VLO, based on user feedback from the CLARIN community. Finally there is an overview of additional ideas and improvements for future versions of the VLO

    CAVA (Human Communication: an Audio-Visual Archive for UCL) Project. Final report

    The objective of the CAVA project was to establish a repository for primary audio-visual data on real-life human communication for spoken and signed languages

    Max-Planck-Institute for Psycholinguistics: Annual Report 2003
