121 research outputs found

Articulatory and bottleneck features for speaker-independent ASR of dysarthric speech

Author: Franco Horacio
Mitra Vikramjit
Sivaraman Ganesh
Yılmaz Emre
Publication venue: 'Elsevier BV'
Publication date: 01/01/2019

Rapid population aging has stimulated the development of assistive devices that provide personalized medical support to people in need who suffer from various etiologies. One prominent clinical application is a computer-assisted speech training system, which enables personalized speech therapy in the home environment for patients impaired by communicative disorders. Such a system relies on robust automatic speech recognition (ASR) technology to provide accurate articulation feedback. With the long-term aim of developing off-the-shelf ASR systems that can be incorporated in a clinical context without prior speaker information, we compare the ASR performance of speaker-independent bottleneck and articulatory features on dysarthric speech, used in conjunction with dedicated neural network-based acoustic models that have been shown to be robust against spectrotemporal deviations. We report the ASR performance of these systems on two dysarthric speech datasets with different characteristics to quantify the achieved performance gains. Despite the remaining performance gap between dysarthric and normal speech, significant improvements have been reported on both datasets using speaker-independent ASR architectures.

Comment: to appear in Computer Speech & Language - https://doi.org/10.1016/j.csl.2019.05.002 - arXiv admin note: substantial text overlap with arXiv:1807.1094

arXiv.org e-Print Archive

Radboud Repository

ScholarBank@NUS

The INTERSPEECH 2013 computational paralinguistics challenge: social signals, conflict, emotion, autism

Author: Batliner Anton
Chetouani Mohamed
Eyben Florian
Kim Samuel
Marchi Erik
Mortillaro Marcello
Polychroniou Anna
Ringeval Fabien
Salamin Hugues
Scherer Klaus
Schuller Björn
Steidl Stefan
Valente Fabio
Vinciarelli Alessandro
Weninger Felix
Publication date: 01/01/2013

The INTERSPEECH 2013 Computational Paralinguistics Challenge provides, for the first time, a unified test-bed for Social Signals such as laughter in speech. It further introduces conflict in group discussions as a new task and picks up on autism and its manifestations in speech. Finally, emotion is revisited as a task, albeit with a broader range of twelve emotional states overall. In this paper, we describe these four Sub-Challenges, the Challenge conditions, the baselines, and a new feature set by the openSMILE toolkit, provided to the participants.

CiteSeerX

Hal - Université Grenoble Alpes

Enlighten

Hal-Diderot

Archive ouverte UNIGE

Question Answering using Syntactic Patterns in a Contextual Search Engine

Author: Sand Kim Andre
Publication date: 01/01/2006

Question Answering (QA) systems promise to enhance both usability and accuracy when searching for knowledge. This thesis presents a prototype QA system built to leverage the extraction capabilities of a modern, context-aware search platform, Fast ESP. Questions in plain English are transformed into queries that target specific entities in the text corresponding to the identified answer types. A small set of unified patterns is shown to be adequate for classifying a wide variety of syntactic constructs. To verify the answers, a semantic lexicon is compiled using an automated procedure. The whole solution is based on pattern matching and presents this as a viable alternative to deeper linguistic methods.

NORA - Norwegian Open Research Archives

Champion Solution for the WSDM2023 Toloka VQA Challenge

Author: Chen Guo
Chen Zhe
Gao Shengyi
Lu Tong
Wang Wenhai
Publication date: 21/01/2023

In this report, we present our champion solution to the WSDM2023 Toloka Visual Question Answering (VQA) Challenge. Different from the common VQA and visual grounding (VG) tasks, this challenge involves a more complex scenario, i.e. inferring and locating the object implicitly specified by the given interrogative question. For this task, we leverage ViT-Adapter, a pre-training-free adapter network, to adapt multi-modal pre-trained Uni-Perceiver for better cross-modal localization. Our method ranks first on the leaderboard, achieving 77.5 and 76.347 IoU on public and private test sets, respectively. It shows that ViT-Adapter is also an effective paradigm for adapting the unified perception model to vision-language downstream tasks. Code and models will be released at https://github.com/czczup/ViT-Adapter/tree/main/wsdm2023.

Comment: Technical report in WSDM Cup 202

arXiv.org e-Print Archive

Coping with Alternate Formulations of Questions and Answers

Author: Ferret Olivier
Grau Brigitte
Hurault-Plantet Martine
Jacquemin Christian
Monceaux Laura
Robba Isabelle
Vilnat Anne
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/07/2006

In this chapter we present the QALC system, which has participated in the four TREC QA evaluations. We focus on the problem of linguistic variation, which must be handled in order to relate questions and answers. We first present variation at the term level, which consists in retrieving question terms in document sentences even if morphological, syntactic or semantic variations alter them. Our second subject concerns variation at the sentence level, which we handle as different partial reformulations of questions. Questions are associated with extraction patterns based on the syntactic type of the question and the object under query. We present the whole system, which makes it possible to situate how QALC deals with variation, together with different evaluations.

Analysis of atypical prosodic patterns in the speech of people with Down syndrome

Author: Cardeñoso Payo Valentín
Corrales Astorgano Mario
Escudero Mancebo David
González Ferreras César
Martínez Castilla Pastora
Publication venue: 'Elsevier BV'
Publication date: 01/01/2021

The speech of people with Down syndrome (DS) shows prosodic features which are distinct from those observed in the oral productions of typically developing (TD) speakers. Although a different prosodic realization does not necessarily imply wrong expression of prosodic functions, atypical expression may hinder communication skills. The focus of this work is to ascertain whether this can be the case in individuals with DS. To do so, we analyze the acoustic features that better characterize the utterances of speakers with DS when expressing prosodic functions related to emotion, turn-end and phrasal chunking, comparing them with those used by TD speakers. An oral corpus of speech utterances has been recorded using the PEPS-C prosodic competence evaluation tool. We use automatic classifiers to prove that the prosodic features that better predict prosodic functions in TD speakers are less informative in speakers with DS. Although atypical features are observed in speakers with DS when producing prosodic functions, the intended prosodic function can be identified by listeners and, in most cases, the features correctly discriminate the function with analytical methods. However, a greater difference between the minimal pairs presented in the PEPS-C test is found for TD speakers in comparison with DS speakers. The proposed methodological approach provides, on the one hand, an identification of the set of features that distinguish the prosodic productions of DS and TD speakers and, on the other, a set of target features for therapy with speakers with DS.

Ministerio de Economía, Industria y Competitividad - Fondo Europeo de Desarrollo Regional (grant TIN2017-88858-C2-1-R); Junta de Castilla y León (grant VA050G18)

Repositorio Documental de la Universidad de Valladolid

A Principled Framework for Constructing Natural Language Interfaces To Temporal Databases

Author: Androutsopoulos Ion
Publication date: 01/01/1996

Most existing natural language interfaces to databases (NLIDBs) were designed to be used with "snapshot" database systems that provide very limited facilities for manipulating time-dependent data. Consequently, most NLIDBs also provide very limited support for the notion of time. The database community is becoming increasingly interested in temporal database systems. These are intended to store and manipulate in a principled manner information not only about the present, but also about the past and future. This thesis develops a principled framework for constructing English NLIDBs for temporal databases (NLITDBs), drawing on research in tense and aspect theories, temporal logics, and temporal databases. I first explore temporal linguistic phenomena that are likely to appear in English questions to NLITDBs. Drawing on existing linguistic theories of time, I formulate an account for a large number of these phenomena that is simple enough to be embodied in practical NLITDBs. Exploiting ideas from temporal logics, I then define a temporal meaning representation language, TOP, and I show how the HPSG grammar theory can be modified to incorporate the tense and aspect account of this thesis, and to map a wide range of English questions involving time to appropriate TOP expressions. Finally, I present and prove the correctness of a method to translate from TOP to TSQL2, TSQL2 being a temporal extension of the SQL-92 database language. This way, I establish a sound route from English questions involving time to a general-purpose temporal database language, which can act as a principled framework for building NLITDBs. To demonstrate that this framework is workable, I employ it to develop a prototype NLITDB, implemented using ALE and Prolog.

Comment: PhD thesis; 405 pages; LaTeX2e, uses the packages/macros: amstex, xspace, avm, examples, dvips, varioref, makeidx, epic, eepic, ecltree; postscript figures included

arXiv.org e-Print Archive

CiteSeerX

Edinburgh Research Archive

CERN Document Server

Report on first selection of resources

Author: Ananiadou Sophia
Bel Nùria
Branco Antonio
Cristea Dan
McNaught John
Meinedo Hugo
Mendes Amalia
Moreno Bilbao M. Asunción
Revilla Espí Eva
Rosner Mike
Thompson Paul
Trancoso Isabel
Trandabăț Diana
Tufis Dan
Vivaldi Jorge
Publication date: 01/01/2011

The central objective of the Metanet4u project is to contribute to the establishment of a pan-European digital platform that makes available language resources and services, encompassing both datasets and software tools, for speech and language processing, and supports a new generation of exchange facilities for them.

Peer Reviewed. Preprint

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC