63 research outputs found

    Fast speaker independent large vocabulary continuous speech recognition [online]

    System-independent ASR error detection and classification using Recurrent Neural Network

    This paper addresses errors in continuous Automatic Speech Recognition (ASR) in two stages: error detection and error type classification. Unlike the majority of research in this field, we propose to handle the recognition errors independently from the ASR decoder. We first establish an effective set of generic features derived exclusively from the recognizer output to compensate for the absence of ASR decoder information. Then, we apply variant Recurrent Neural Network (V-RNN) based models for error detection and error type classification. Such models learn additional information for the recognized word classification using label dependency. As a result, experiments on the Multi-Genre Broadcast Media corpus have shown that the proposed generic feature setup achieves competitive performance compared to state-of-the-art systems in both tasks. Furthermore, we have shown that a V-RNN trained on the proposed feature set appears to be an effective classifier for ASR error detection, with an accuracy of 85.43%.

    Comparison of ALBAYZIN query-by-example spoken term detection 2012 and 2014 evaluations

    Query-by-example spoken term detection (QbE STD) aims at retrieving data from a speech repository given an acoustic query containing the term of interest as input. Nowadays, it is receiving much interest due to the large volume of multimedia information. This paper presents the systems submitted to the ALBAYZIN QbE STD 2014 evaluation held as a part of the ALBAYZIN 2014 Evaluation campaign within the context of the IberSPEECH 2014 conference. This is the second QbE STD evaluation in Spanish, which allows us to evaluate the progress in this technology for this language. The evaluation consists in retrieving the speech files that contain the input queries, indicating the start and end times where the input queries were found, along with a score value that reflects the confidence given to the detection of the query. Evaluation is conducted on a Spanish spontaneous speech database containing a set of talks from workshops, which amount to about 7 h of speech. We present the database, the evaluation metric, the systems submitted to the evaluation, the results, and compare this second evaluation with the first ALBAYZIN QbE STD evaluation held in 2012. Four different research groups took part in the evaluations held in 2012 and 2014. In 2014, new multi-word and foreign queries were added to the single-word and in-language queries used in 2012. Systems submitted to the second evaluation are hybrid systems which integrate letter transcription- and template matching-based systems. 
Despite the significant improvement obtained by the systems submitted to this second evaluation compared to those of the first evaluation, results still show the difficulty of this task and indicate that there is still room for improvement. This research was funded by the Spanish Government ('SpeechTech4All Project' TEC2012 38939 C03 01 and 'CMC-V2 Project' TEC2012 37585 C02 01), the Galician Government through the research contract GRC2014/024 (Modalidade: Grupos de Referencia Competitiva 2014) and 'AtlantTIC Project' CN2012/160, and also by the Spanish Government and the European Regional Development Fund (ERDF) under project TACTICA

    Search on speech from spoken queries: the Multi-domain International ALBAYZIN 2018 Query-by-Example Spoken Term Detection Evaluation

    The huge amount of information stored in audio and video repositories makes search on speech (SoS) a priority area nowadays. Within SoS, Query-by-Example Spoken Term Detection (QbE STD) aims to retrieve data from a speech repository given a spoken query. Research on this area is continuously fostered with the organization of QbE STD evaluations. This paper presents a multi-domain internationally open evaluation for QbE STD in Spanish. The evaluation aims at retrieving the speech files that contain the queries, providing their start and end times, and a score that reflects the confidence given to the detection. Three different Spanish speech databases that encompass different domains have been employed in the evaluation: MAVIR database, which comprises a set of talks from workshops; RTVE database, which includes broadcast television (TV) shows; and COREMAH database, which contains 2-people spontaneous speech conversations about different topics. The evaluation has been designed carefully so that several analyses of the main results can be carried out. We present the evaluation itself, the three databases, the evaluation metrics, the systems submitted to the evaluation, the results, and the detailed post-evaluation analyses based on some query properties (within-vocabulary/out-of-vocabulary queries, single-word/multi-word queries, and native/foreign queries). Fusion results of the primary systems submitted to the evaluation are also presented. Three different teams took part in the evaluation, and ten different systems were submitted. The results suggest that the QbE STD task is still in progress, and the performance of these systems is highly sensitive to changes in the data domain.
Nevertheless, QbE STD strategies are able to outperform text-based STD in unseen data domains. Centro singular de investigación de Galicia; ED431G/04. Universidad del País Vasco; GIU16/68. Ministerio de Economía y Competitividad; TEC2015-68172-C2-1-P. Ministerio de Ciencia, Innovación y Competitividad; RTI2018-098091-B-I00. Xunta de Galicia; ED431G/0

    Google USM: Scaling Automatic Speech Recognition Beyond 100 Languages

    We introduce the Universal Speech Model (USM), a single large model that performs automatic speech recognition (ASR) across 100+ languages. This is achieved by pre-training the encoder of the model on a large unlabeled multilingual dataset of 12 million (M) hours spanning over 300 languages, and fine-tuning on a smaller labeled dataset. We use multilingual pre-training with random-projection quantization and speech-text modality matching to achieve state-of-the-art performance on downstream multilingual ASR and speech-to-text translation tasks. We also demonstrate that despite using a labeled training set 1/7-th the size of that used for the Whisper model, our model exhibits comparable or better performance on both in-domain and out-of-domain speech recognition tasks across many languages. Comment: 20 pages, 7 figures, 8 table

    ALBAYZIN 2018 spoken term detection evaluation: a multi-domain international evaluation in Spanish

    Search on speech (SoS) is a challenging area due to the huge amount of information stored in audio and video repositories. Spoken term detection (STD) is an SoS-related task aiming to retrieve data from a speech repository given a textual representation of a search term (which can include one or more words). This paper presents a multi-domain internationally open evaluation for STD in Spanish. The evaluation has been designed carefully so that several analyses of the main results can be carried out. The evaluation task aims at retrieving the speech files that contain the terms, providing their start and end times, and a score that reflects the confidence given to the detection. Three different Spanish speech databases that encompass different domains have been employed in the evaluation: the MAVIR database, which comprises a set of talks from workshops; the RTVE database, which includes broadcast news programs; and the COREMAH database, which contains 2-people spontaneous speech conversations about different topics. We present the evaluation itself, the three databases, the evaluation metric, the systems submitted to the evaluation, the results, and detailed post-evaluation analyses based on some term properties (within-vocabulary/out-of-vocabulary terms, single-word/multi-word terms, and native/foreign terms). Fusion results of the primary systems submitted to the evaluation are also presented. Three different research groups took part in the evaluation, and 11 different systems were submitted.
The obtained results suggest that the STD task is still in progress and performance is highly sensitive to changes in the data domain. Ministerio de Economía y Competitividad; TIN2015-64282-R. Ministerio de Economía y Competitividad; RTI2018-093336-B-C22. Ministerio de Economía y Competitividad; TEC2015-65345-P. Xunta de Galicia; ED431B 2016/035. Xunta de Galicia; GPC ED431B 2019/003. Xunta de Galicia; GRC 2014/024. Xunta de Galicia; ED431G/01. Xunta de Galicia; ED431G/04. Agrupación estratéxica consolidada; GIU16/68. Ministerio de Economía y Competitividad; TEC2015-68172-C2-1-

    Proceedings of the ACM SIGIR Workshop ''Searching Spontaneous Conversational Speech''

    Youth in Lebanon: Using collaborative and interdisciplinary communication design methods to improve social integration in post-conflict societies

    In 1995, the World Summit for Social Development identified social integration as one of the three overriding objectives for social and economic development. This priority arose following a century that ended with the collapse of many states and the sharpening of strife around the world. Social integration was seen as a pathway to reinforcing common identities, supporting cooperation and lessening the likelihood of violence and conflict. For the past 20 years, governmental, academic and third sector organisations – with the United Nations at the forefront – sought to improve social integration. However, their methods and interventions have commonly been restricted to policymaking and dialogue practices. Peacebuilding and reconciliation are affected by communication within and amongst different groups. Nonetheless, the potential for communication design to contribute towards social integration remains unexplored. This practice-led communication design research focuses on 18- to 30-year-old youth in Lebanon – an extreme case of a politically, religiously, geographically, culturally and linguistically segregated post-conflict generation. The research adopts an innovative, interdisciplinary (1.) and collaborative (2.) approach to explore the contribution of communication design methods towards social integration interventions. The interdisciplinary and collaborative case study process spans seven stages of practice: Discover, Delve, Define, Develop, Deliver, Determine Impact and Diverge. I developed this process with Darren Raven in 2010, and have been testing and refining it over the past five years through the socially-focused design projects of BA Design for Graphic Communication students and staff at the London College of Communication. This process builds on the Design Council’s Double Diamond design process by incorporating stages from the National Social Marketing Centre’s process.
Through these stages, the research developed several innovative communication design methods: Explorations, a cultural probes toolkit exploring young people’s local context; Road Trip, an autoethnographic journey preparing the researcher; Connections, an effective method for recruiting stakeholders; Expressions Corner, a confidential diary room for understanding young people’s experiences, attitudes and behaviours; Imagination Studio, a collaborative workshop series for developing social integration interventions; Imagination Market, an efficient platform for piloting these interventions; and a Social Impact Framework to evaluate the impact of the interventions and research. These methods enhanced candid input from young people, reduced ethical tensions, and improved their engagement with the research. The methods also involved youth and wider stakeholders in understanding and reframing the problem, invited them to generate and deliver solutions, strengthened their sense of ownership and therefore the sustainability of the research outputs, and finally, built their capabilities throughout the process. The social integration interventions developed and piloted through the case study research ranged from a citizen journalism platform reducing media bias, to a youth-led internal tourism service encouraging geographic mobility. The evaluation of the 24-hour pilot interventions demonstrated a positive shift in young people’s willingness to integrate. The social impact and social value assessment suggests that effective social integration interventions – such as the ones developed and piloted in the case study research – have higher chances of delivering positive social and economic outcomes for the communities involved. This practice-led research presents a number of contributions, the most significant of which is a methodology, process and set of methods highly transferable across social integration challenges worldwide.
The research also provides social integration theory and practice with a clear demonstration of the value and potential of communication design to advance interventions from replication to innovation. To communication design theory and practice, the research makes the case for the value of interdisciplinary and collaborative principles in enhancing rigour and social impact. Finally, to the Lebanese context, the research provides in-depth qualitative insights on social group dynamics, segments, and behaviours, which act as an evidence base to underpin future local interventions. Beyond this thesis, the knowledge gained from this research will be disseminated to the various relevant communities of practice – including researchers, designers, policy makers, and community development workers – in the form of Creative Commons licensed design guidelines, as well as presentations, capacity building workshops, and academic publications. This dissemination is intended to inspire and enable these communities to adopt, adapt and build on communication design methods when addressing social segregation challenges within their varying contexts. Notes in the text: (1.) Drawing on disciplines such as social, political, behavioural, and psychological sciences. (2.) Engaging multiple stakeholders including young people, civil society, institutions, topic experts and policy makers.