35 research outputs found

    Extending AuToBI to prominence detection in European Portuguese

    This paper describes our exploratory work in applying the Automatic ToBI annotation system (AuToBI), originally developed for Standard American English, to European Portuguese. This work is motivated by the current availability of large amounts of (highly spontaneous) transcribed data and the need to further enrich those transcripts with prosodic information. Manual prosodic annotation, however, is impractical for extensive data sets. For that reason, automatic systems such as AuToBI stand as an alternative solution. We started by applying the AuToBI prosodic event detection system, using the existing English models, to the prediction of prominent prosodic events (accents) in European Portuguese. This approach achieved an overall accuracy of 74% for prominence detection, similar to state-of-the-art results for other languages. We then trained new models using prepared and spontaneous Portuguese data, achieving a considerable improvement of about 6% accuracy (absolute) over the existing English models. The results are quite encouraging and provide a starting point for automatically predicting prominent events in European Portuguese.

    Automatic Recognition of Prosodic Patterns in Semantic Verbal Fluency Tests - an Animal Naming Task for Edutainment Applications

    This paper addresses the automatic detection of prosodic patterns in the domain of semantic fluency tests. Verbal fluency tests aim at evaluating the spontaneous production of words under constrained conditions. Mostly used for assessing cognitive impairment, they can be used in a plethora of domains, such as edutainment applications or games with educational purposes. This work discriminates between list effects, disfluencies, and other linguistic events in an animal naming task. Recordings from 42 Portuguese speakers were automatically recognized, and AuToBI was applied in order to detect prosodic patterns, using both European Portuguese and English models. Both models made it possible to differentiate list effects from the other events, mostly represented by the tunes L* H/L(-%) (English models) or L*+H H/L(-%) (Portuguese models). However, the English models proved to be more suitable because they rely on substantially more training material.

    SEA_AP: una herramienta de segmentación y etiquetado para el análisis prosódico

    This paper introduces a tool that performs segmentation and labelling of sound chains into phone, syllable and/or word units, starting from a sound signal and its corresponding orthographic transcription. In addition, it integrates acoustic analysis scripts that run on the Praat programme, with the aim of reducing the time spent on tasks related to analysis, correction, smoothing and generation of graphics of the melodic curve. The tool is implemented for Galician, Spanish and Brazilian Portuguese. Our goal is to contribute, by means of this application, to automating some of the tasks of segmentation, labelling and prosodic analysis, since these tasks require a large investment of time and human resources.

    This work would not have been possible without the help of the Spanish Government (Project ‘SpeechTech4All’ TEC2012-38939-C03-01), the European Regional Development Fund (ERDF), the Government of the Autonomous Community of Galicia (GRC2014/024, “Consolidación de Unidades de Investigación: Proyecto AtlantTIC” CN2012/160) and the “Red de Investigación TecAnDAli” from the Council of Culture, Education and University Planning, Xunta de Galicia.

    Speech Prosody

    An automatic labeling system using Sp_ToBI annotation conventions has been applied both to a non-native corpus of Japanese speakers using Spanish and to a reference corpus of Spanish speakers. A set of metrics based on conditional entropy, computed from the output of the automatic labeler, turns out to be highly correlated with the ratings assigned by a team of evaluators. An analysis of the relative frequencies of use of each of the Sp_ToBI symbols makes it possible to identify the recurrent mistakes in the productions of non-native speakers. The results suggest that the majority of the observed prosodic deficits can be explained by prosodic transfer between the Japanese and Spanish systems, as previously reported in the literature.

    MEC-FEDER Grant TIN2014-59852-R and the Junta de Castilla y León Regional Grant VA145U1
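
    The abstract does not say how its entropy-based metrics are defined. As an illustration only, one possible formulation is the conditional entropy of each prosodic label given the label before it, estimated from a labeled sequence. The function name and the example label strings below are assumptions, not taken from the paper.

```python
import math
from collections import Counter


def conditional_entropy(labels):
    """Estimate H(next label | previous label) in bits.

    Hypothetical helper: probabilities are relative frequencies
    of adjacent label pairs in the given sequence.
    """
    pairs = list(zip(labels, labels[1:]))
    if not pairs:
        return 0.0
    pair_counts = Counter(pairs)
    prev_counts = Counter(prev for prev, _ in pairs)
    total = len(pairs)
    entropy = 0.0
    for (prev, _nxt), count in pair_counts.items():
        p_joint = count / total              # p(prev, next)
        p_cond = count / prev_counts[prev]   # p(next | prev)
        entropy -= p_joint * math.log2(p_cond)
    return entropy


# A strictly alternating sequence is fully predictable.
predictable = ["L*", "H%", "L*", "H%", "L*"]
# A sequence where the label after "A" varies is less predictable.
varied = ["A", "A", "A", "B", "A", "B"]
print(conditional_entropy(predictable), conditional_entropy(varied))
```

    Under this formulation, lower values mean that label sequences are more predictable. How the paper relates its metrics to evaluator ratings is not stated in the abstract.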

    Corpora compilation for prosody-informed speech processing

    Research on speech technologies requires spoken data, which is usually obtained through recorded read speech and specifically adapted to the research needs. When the aim is to deal with the prosody involved in speech, the available data must reflect natural and conversational speech, which is usually costly and difficult to obtain. This paper presents a machine learning-oriented toolkit for collecting, handling, and visualizing speech data, using prosodic heuristics. We present two corpora resulting from these methodologies: the PANTED corpus, containing 250 h of English speech from TED Talks, and the Heroes corpus, containing 8 h of parallel English and Spanish movie speech. We demonstrate their use in two deep learning-based applications: punctuation restoration and machine translation. The presented corpora are freely available to the research community.

    Detection of Prosodic Boundaries in Speech Using Wav2Vec 2.0

    Prosodic boundaries in speech are of great relevance to both speech synthesis and audio annotation. In this paper, we apply the wav2vec 2.0 framework to the task of detecting these boundaries in the speech signal, using only acoustic information. We test the approach on a set of recordings of Czech broadcast news, labeled by phonetic experts, and compare it to an existing text-based predictor, which uses the transcripts of the same data. Despite using a relatively small amount of labeled data, the wav2vec 2.0 model achieves an accuracy of 94% and an F1 measure of 83% on within-sentence prosodic boundaries (or 95% and 89% on all prosodic boundaries), outperforming the text-based approach. However, by combining the outputs of the two different models we can improve the results even further.

    Comment: This preprint is a pre-review version of the paper and does not contain any post-submission improvements or corrections. The Version of Record of this contribution is published in the proceedings of the International Conference on Text, Speech, and Dialogue (TSD 2022), LNAI volume 13502, and is available online at https://doi.org/10.1007/978-3-031-16270-1_3
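
    The abstract reports both accuracy and F1. A minimal sketch of how the two measures are computed for binary boundary labels is given below; the function name and the toy data are assumptions for illustration and have no connection to the paper's data or code. Accuracy counts correctly rejected non-boundaries, while F1 does not, so the two can differ noticeably when boundaries are rare.

```python
def accuracy_and_f1(gold, pred):
    """Return (accuracy, F1) for binary labels, with 1 = boundary.

    Hypothetical helper for illustration only.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)
    tn = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 0)
    accuracy = (tp + tn) / len(gold) if gold else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return accuracy, 0.0
    return accuracy, 2 * precision * recall / (precision + recall)


# Toy data: 10 word positions, 2 true boundaries, 1 of them missed.
gold = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
pred = [0, 0, 0, 0, 0, 0, 0, 0, 1, 0]
acc, f1 = accuracy_and_f1(gold, pred)
print(f"accuracy={acc:.2f}, F1={f1:.2f}")
```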

    Multi-Modal Automatic Prosody Annotation with Contrastive Pretraining of SSWP

    In the realm of expressive Text-to-Speech (TTS), explicit prosodic boundaries significantly advance the naturalness and controllability of synthesized speech. While human prosody annotation contributes substantially to performance, it is a labor-intensive and time-consuming process that often yields inconsistent outcomes. Despite the availability of extensive supervised data, the current benchmark model still faces performance setbacks. To address this issue, this paper proposes a two-stage automatic annotation pipeline. In the first stage, we propose contrastive text-speech pretraining of Speech-Silence and Word-Punctuation (SSWP) pairs. The pretraining procedure aims at enhancing the prosodic space extracted from the joint text-speech space. In the second stage, we build a multi-modal prosody annotator, which consists of pretrained encoders, a straightforward yet effective text-speech feature fusion scheme, and a sequence classifier. Extensive experiments demonstrate that our proposed method excels at automatically generating prosody annotation and achieves state-of-the-art (SOTA) performance. Furthermore, our model has exhibited remarkable resilience when tested with varying amounts of data.

    Comment: Submitted to ICASSP 202

    Fonética experimental y tecnologías del habla

    The relationship between Phonetics and Speech Technology has been repeatedly debated in the literature of both disciplines. That debate has highlighted, on the one hand, the undeniable need for linguistic knowledge in the development of these technologies and, on the other, the problems and limitations that the lack of adequate phonetic descriptions and models has at times posed for Speech Technology, which in many cases has led to a search for "alternative paths", mainly in statistics and probability. This article offers a personal reflection on the relationship between the two disciplines, now and in the near future, analysing, on the one hand, to what extent phonetic knowledge can contribute to the development of speech technologies in the coming years and, on the other, how Speech Technology can help improve the methods and results of theoretical work in experimental Phonetics.