121 research outputs found
Articulatory and bottleneck features for speaker-independent ASR of dysarthric speech
The rapid population aging has stimulated the development of assistive
devices that provide personalized medical support to the needies suffering from
various etiologies. One prominent clinical application is a computer-assisted
speech training system which enables personalized speech therapy to patients
impaired by communicative disorders in the patient's home environment. Such a
system relies on the robust automatic speech recognition (ASR) technology to be
able to provide accurate articulation feedback. With the long-term aim of
developing off-the-shelf ASR systems that can be incorporated in clinical
context without prior speaker information, we compare the ASR performance of
speaker-independent bottleneck and articulatory features on dysarthric speech
used in conjunction with dedicated neural network-based acoustic models that
have been shown to be robust against spectrotemporal deviations. We report ASR
performance of these systems on two dysarthric speech datasets of different
characteristics to quantify the achieved performance gains. Despite the
remaining performance gap between the dysarthric and normal speech, significant
improvements have been reported on both datasets using speaker-independent ASR
architectures.Comment: to appear in Computer Speech & Language -
https://doi.org/10.1016/j.csl.2019.05.002 - arXiv admin note: substantial
text overlap with arXiv:1807.1094
The INTERSPEECH 2013 computational paralinguistics challenge: social signals, conflict, emotion, autism
The INTERSPEECH 2013 Computational Paralinguistics Challenge provides for the first time a unified test-bed for Social Signals such as laughter in speech. It further introduces conflict in group discussions as new tasks and picks up on autism and its manifestations in speech. Finally, emotion is revisited as task, albeit with a broader ranger of overall twelve emotional states. In this paper, we describe these four Sub-Challenges, Challenge conditions, baselines, and a new feature set by the openSMILE toolkit, provided to the participants.
\em Bj\"orn Schuller, Stefan Steidl, Anton Batliner, Alessandro Vinciarelli, Klaus Scherer}\\
{\em Fabien Ringeval, Mohamed Chetouani, Felix Weninger, Florian Eyben, Erik Marchi, }\\
{\em Hugues Salamin, Anna Polychroniou, Fabio Valente, Samuel Kim
Question Answering using Syntactic Patterns in a Contextual Search Engine
Question Answering (QA) systems promise to enhance both usability and accuracy when searching for knowledge. This thesis presents a prototype QA system built to leverage the extraction capabilities of a modern, context-aware search platform; Fast ESP. Questions in plain English are transformed to queries which target specific entities in the text that correspond with the identified answer types. A small set of unified patterns is demonstrated as adequate to classify a wide variety of syntactic constructs. For the purpose of verifying the answers, a semantic lexicon is compiled using an automated procedure. The whole solution is based on pattern matching and presents this as a viable alternative to deeper linguistic methods
Champion Solution for the WSDM2023 Toloka VQA Challenge
In this report, we present our champion solution to the WSDM2023 Toloka
Visual Question Answering (VQA) Challenge. Different from the common VQA and
visual grounding (VG) tasks, this challenge involves a more complex scenario,
i.e. inferring and locating the object implicitly specified by the given
interrogative question. For this task, we leverage ViT-Adapter, a
pre-training-free adapter network, to adapt multi-modal pre-trained
Uni-Perceiver for better cross-modal localization. Our method ranks first on
the leaderboard, achieving 77.5 and 76.347 IoU on public and private test sets,
respectively. It shows that ViT-Adapter is also an effective paradigm for
adapting the unified perception model to vision-language downstream tasks. Code
and models will be released at
https://github.com/czczup/ViT-Adapter/tree/main/wsdm2023.Comment: Technical report in WSDM Cup 202
Coping with Alternate Formulations of Questions and Answers
We present in this chapter the QALC system which has participated in the four TREC QA evaluations. We focus here on the problem of linguistic variation in order to be able to relate questions and answers. We present first, variation at the term level which consists in retrieving questions terms in document sentences even if morphologic, syntactic or semantic variations alter them. Our second subject matter concerns variation at the sentence level that we handle as different partial reformulations of questions. Questions are associated with extraction patterns based on the question syntactic type and the object that is under query. We present the whole system thus allowing situating how QALC deals with variation, and different evaluations
Analysis of atypical prosodic patterns in the speech of people with Down syndrome
Producción CientíficaThe speech of people with Down syndrome (DS) shows prosodic features which are distinct from those observed in the oral productions of typically developing (TD) speakers. Although a different prosodic realization does not necessarily imply wrong expression of prosodic functions, atypical expression may hinder communication skills. The focus of this work is to ascertain whether this can be the case in individuals with DS. To do so, we analyze the acoustic features that better characterize the utterances of speakers with DS when expressing prosodic functions related to emotion, turn-end and phrasal chunking, comparing them with those used by TD speakers. An oral corpus of speech utterances has been recorded using the PEPS-C prosodic competence evaluation tool. We use automatic classifiers to prove that the prosodic features that better predict prosodic functions in TD speakers are less informative in speakers with DS. Although atypical features are observed in speakers with DS when producing prosodic functions, the intended prosodic function can be identified by listeners and, in most cases, the features correctly discriminate the function with analytical methods. However, a greater difference between the minimal pairs presented in the PEPS-C test is found for TD speakers in comparison with DS speakers. The proposed methodological approach provides, on the one hand, an identification of the set of features that distinguish the prosodic productions of DS and TD speakers and, on the other, a set of target features for therapy with speakers with DS.Ministerio de Economía, Industria y Competitividad - Fondo Europeo de Desarrollo Regional (grant TIN2017-88858-C2-1-R)Junta de Castilla y León (grant VA050G18
A Principled Framework for Constructing Natural Language Interfaces To Temporal Databases
Most existing natural language interfaces to databases (NLIDBs) were designed
to be used with ``snapshot'' database systems, that provide very limited
facilities for manipulating time-dependent data. Consequently, most NLIDBs also
provide very limited support for the notion of time. The database community is
becoming increasingly interested in _temporal_ database systems. These are
intended to store and manipulate in a principled manner information not only
about the present, but also about the past and future.
This thesis develops a principled framework for constructing English NLIDBs
for _temporal_ databases (NLITDBs), drawing on research in tense and aspect
theories, temporal logics, and temporal databases. I first explore temporal
linguistic phenomena that are likely to appear in English questions to NLITDBs.
Drawing on existing linguistic theories of time, I formulate an account for a
large number of these phenomena that is simple enough to be embodied in
practical NLITDBs. Exploiting ideas from temporal logics, I then define a
temporal meaning representation language, TOP, and I show how the HPSG grammar
theory can be modified to incorporate the tense and aspect account of this
thesis, and to map a wide range of English questions involving time to
appropriate TOP expressions. Finally, I present and prove the correctness of a
method to translate from TOP to TSQL2, TSQL2 being a temporal extension of the
SQL-92 database language. This way, I establish a sound route from English
questions involving time to a general-purpose temporal database language, that
can act as a principled framework for building NLITDBs. To demonstrate that
this framework is workable, I employ it to develop a prototype NLITDB,
implemented using ALE and Prolog.Comment: PhD thesis; 405 pages; LaTeX2e, uses the packages/macros: amstex,
xspace, avm, examples, dvips, varioref, makeidx, epic, eepic, ecltree;
postscript figures include
Report on first selection of resources
The central objective of the Metanet4u project is to contribute to the establishment of a pan-European digital platform that makes available language resources and services, encompassing both datasets and software tools, for speech and language processing, and supports a new generation of exchange facilities for them.Peer ReviewedPreprin
- …