Search CORE

1,138 research outputs found

Querying a dozen corpora and a thousand years with Fintan

Author: Chiarcos Christian
Fäth Christian
Ionov Maxim
Publication date: 24/04/2023

Large-scale diachronic corpus studies covering longer time periods are difficult if more than one corpus is to be consulted and, as a result, different formats and annotation schemas need to be processed and queried in a uniform, comparable and replicable manner. We describe the application of the Flexible Integrated Transformation and Annotation eNgineering (Fintan) platform for studying word order in German using syntactically annotated corpora that represent its entire written history. Focusing on nominal dative and accusative arguments, this study hints at two major phases in the development of scrambling in modern German. Against more recent assumptions, it supports the traditional view that word order flexibility decreased over time, but it also indicates that this was a relatively sharp transition in Early New High German. The successful case study demonstrates the potential of Fintan and the underlying LLOD technology for historical linguistics, linguistic typology and corpus linguistics. The technological contribution of this paper is to demonstrate the applicability of Fintan for querying across heterogeneously annotated corpora, as it had previously been applied only to transformation tasks. With its focus on quantitative analysis, Fintan is a natural complement to existing multi-layer technologies that focus on query and exploration.

OPUS Augsburg

Naturalistic Emotional Speech Corpora with Large Scale Emotional Dimension Ratings

Author: Vaughan Brian
Publication venue: Dublin Institute of Technology
Publication date: 01/01/2011

The investigation of the emotional dimensions of speech is dependent on large sets of reliable data. Existing work has been carried out on the creation of emotional speech corpora and the acoustic analysis of emotional speech, and this research seeks to build upon this work while suggesting new methods and areas of potential. A review of the literature determined that a two-dimensional emotional model of activation and evaluation was the ideal method for representing the emotional states expressed in speech. Two case studies were carried out to investigate methods of obtaining natural underlying emotional speech in a high quality audio environment, the results of which were used to design a final experimental procedure to elicit natural underlying emotional speech. The speech obtained in this experiment was used in the creation of a speech corpus that was underpinned by a persistent backend database that incorporated a three-tiered annotation methodology. This methodology was used to comprehensively annotate the metadata, acoustic data and emotional data of the recorded speech. Structuring the three levels of annotation and the assets in a persistent backend database allowed interactive web-based tools to be developed; a web-based listening tool was developed to obtain a large amount of ratings for the assets that were then written back to the database for analysis. Once a large amount of ratings had been obtained, statistical analysis was used to determine the dimensional rating for each asset. Acoustic analysis of the underlying emotional speech was then carried out and determined that certain acoustic parameters were correlated with the activation dimension of the dimensional model. This substantiated some of the findings in the literature review and further determined that spectral energy was strongly correlated with the activation dimension in relation to underlying emotional speech. The lack of a correlation for certain acoustic parameters in relation to the evaluation dimension was also determined, again substantiating some of the findings in the literature.
The work contained in this thesis makes a number of contributions to the field: the development of an experimental design to elicit natural underlying emotional speech in a high quality audio environment; the development and implementation of a comprehensive three-tiered corpus annotation methodology; the development and implementation of large scale web-based listening tests to rate the emotional dimensions of emotional speech; the determination that certain acoustic parameters are correlated with the activation dimension of a dimensional emotional model in relation to natural underlying emotional speech; and the determination that certain acoustic parameters are not correlated with the evaluation dimension of a two-dimensional emotional model in relation to natural underlying emotional speech.

Arrow@TUDublin

CLARIN. The infrastructure for language resources

Author: Fišer Darja
Witt Andreas
Publication venue: 'Walter de Gruyter GmbH'
Publication date: 17/10/2022

CLARIN, the "Common Language Resources and Technology Infrastructure", has established itself as a major player in the field of research infrastructures for the humanities. This volume provides a comprehensive overview of the organization, its members, its goals and its functioning, as well as of the tools and resources hosted by the infrastructure. The many contributors representing various fields, from computer science to law to psychology, analyse a wide range of topics, such as the technology behind the CLARIN infrastructure, the use of CLARIN resources in diverse research projects, the achievements of selected national CLARIN consortia, and the challenges that CLARIN has faced and will face in the future. The book will be published in 2022, 10 years after the establishment of CLARIN as a European Research Infrastructure Consortium by the European Commission (Decision 2012/136/EU)

Publikationsserver des Instituts für Deutsche Sprache

Chatbot development to assist patients in health care services

Author: Barbosa António Pedro Mesquita
Publication date: 10/12/2020

Integrated master's dissertation in Informatics Engineering.
High-quality data on medical treatments and facility-level information has become accessible, creating new eHealth opportunities for the recuperation of a patient. Machine learning implementation in these solutions has been proven to be essential and effective in building user-centred applications to relieve the burden on the healthcare sector. Nowadays, many patient interactions are handled through healthcare services via phone calls and text message exchange. Conversation agents can provide answers to these queries, promoting fast patient interaction. The underlying aim of this dissertation is to assist patients by providing a reliable source of information to educate themselves and clarify any doubts about procedures and implications of their health issue. This purpose was achieved not only through an intuitive and accessible web platform, with frequently asked questions, but also by integrating an intelligent chatting agent to answer questions. To this end, scientifically, it was necessary to conduct the research, implementation and feasibility of closed-domain conversation agents for healthcare. It is a valuable input for the chatbot development community, which assembles the latest innovations and findings, as well as the current challenges of machine learning, contributing to the awareness of this field.

Universidade do Minho: RepositoriUM

CLARIN

Publication venue: 'Walter de Gruyter GmbH'
Publication date: 30/01/2023

The book provides a comprehensive overview of the Common Language Resources and Technology Infrastructure (CLARIN) for the humanities. It covers a broad range of CLARIN language resources and services, its underlying technological infrastructure, the achievements of national consortia, and challenges that CLARIN will tackle in the future. The book is published 10 years after the establishment of CLARIN as a European Research Infrastructure Consortium.

Directory of Open Access Books (DOAB)

Using Social Media Websites to Support Scenario-Based Design of Assistive Technology

Author: Yu Xing
Publication date: 01/01/2020

Indiana University-Purdue University Indianapolis (IUPUI)
Having representative users, who have the targeted disability, in accessibility studies is vital to the validity of research findings. Although this is a widely accepted tenet in the HCI community, many barriers and difficulties make it very resource-demanding for accessibility researchers to recruit representative users. As a result, researchers recruit non-representative users, who do not have the targeted disability, instead of representative users in accessibility studies. Although such an approach has been widely justified, evidence shows that findings derived from non-representative users can be biased and even misleading. To address this problem, researchers have come up with different solutions, such as building pools of users to recruit from. Still, such data is not widely available and takes a lot of effort and resources to build and maintain. On the other hand, online social media websites have become popular in the last decade. Many online communities have emerged that allow online users to discuss health-related subjects, exchange useful information, or provide emotional support. The large amount of data accumulated in such online communities has gained attention from researchers in the healthcare domain, and many studies have used data from social media websites to better understand health problems and improve people's wellbeing. Despite this increasing popularity, the value of data from social media websites for accessibility research remains untapped. Hence, my work aims to create methods that can extract valuable information from data collected on social media websites for accessibility practitioners to support their design process. First, I investigate methods that enable researchers to effectively collect representative data from social media websites. More specifically, I look into machine learning approaches that could allow researchers to automatically identify online users who have disabilities (representative users). Second, I investigate methods that could extract useful information from user-generated free text using techniques drawn from the information extraction domain. Last, I explore how such information should be visualized and presented for designers to support the scenario-based design process in accessibility studies.

IUPUIScholarWorks

Donate Speech : Collecting and Sharing a Large-Scale Speech Database for Social Sciences, Humanities and Artificial Intelligence Research and Innovation

Author: Jauhiainen Tommi
Kurimo Mikko
Kurki Tommi
Lennes Mietta
Lindén Krister
Pitkänen Olli
Rossi Aleksi
Publication venue: de Gruyter
Publication date: 01/10/2022

The Donate Speech campaign aimed to collect 10 000 hours of ordinary, casual Finnish speech to be used for studying language as well as for developing technology and services that can be readily used in the languages spoken in Finland. In this project, particular attention has been paid to allowing for both academic and commercial use of the material. Even though the ambitious target currently seems to evade us, the Donate Speech campaign has managed to collect an extensive resource of more than 3500 h of Finnish colloquial speech with more than 200 000 speech recordings by roughly 50 000 speakers from all over Finland in just a few months. Peer reviewed

Aaltodoc Publication Archive

Helsingin yliopiston digitaalinen arkisto