34,270 research outputs found

    Robust audio indexing for Dutch spoken-word collections

    Get PDF
    Abstract—Whereas the growth of storage capacity is in accordance with widely acknowledged predictions, the possibilities to index and access the archives created is lagging behind. This is especially the case in the oral history domain and much of the rich content in these collections runs the risk to remain inaccessible for lack of robust search technologies. This paper addresses the history and development of robust audio indexing technology for searching Dutch spoken-word collections and compares Dutch audio indexing in the well-studied broadcast news domain with an oral-history case-study. It is concluded that despite significant advances in Dutch audio indexing technology and demonstrated applicability in several domains, further research is indispensable for successful automatic disclosure of spoken-word collections

    Unravelling the voice of Willem Frederik Hermans: an oral history indexing case study

    Get PDF

    Access to recorded interviews: A research agenda

    Get PDF
    Recorded interviews form a rich basis for scholarly inquiry. Examples include oral histories, community memory projects, and interviews conducted for broadcast media. Emerging technologies offer the potential to radically transform the way in which recorded interviews are made accessible, but this vision will demand substantial investments from a broad range of research communities. This article reviews the present state of practice for making recorded interviews available and the state-of-the-art for key component technologies. A large number of important research issues are identified, and from that set of issues, a coherent research agenda is proposed

    Acomodación fonética durante las interacciones conversacionales: una visión general

    Get PDF
    During conversational interactions such as tutoring, instruction-giving tasks, verbal negotiations, or just talking with friends, interlocutors’ behaviors experience a series of changes due to the characteristics of their counterpart and to the interaction itself. These changes are pervasively present in every social interaction, and most of them occur in the sounds and rhythms of our speech, which is known as acoustic-prosodic accommodation, or simply phonetic accommodation. The consequences, linguistic and social constraints, and underlying cognitive mechanisms of phonetic accommodation have been studied for at least 50 years, due to the importance of the phenomenon to several disciplines such as linguistics, psychology, and sociology. Based on the analysis and synthesis of the existing empirical research literature, in this paper we present a structured and comprehensive review of the qualities, functions, onto- and phylogenetic development, and modalities of phonetic accommodation.Durante las interacciones conversacionales como dar una tutoría, dar instrucciones, las negociaciones verbales, o simplemente hablar con amigos, los comportamientos de las personas experimentan una serie de cambios debido a las características de su interlocutor y a la interacción en sí. Estos cambios están presentes en cada interacción social, y la mayoría de ellos ocurre en los sonidos y ritmos del habla, lo cual se conoce como acomodación acústico-prosódica, o simplemente acomodación fonética. Las consecuencias, las limitaciones lingüísticas y sociales, y los mecanismos cognitivos subyacentes a la acomodación fonética se han estudiado durante al menos 50 años, debido a la importancia del fenómeno para varias disciplinas como la lingüística, la psicología, y la sociología. A partir del análisis y síntesis de la literatura de investigación empírica existente, en este artículo presentamos una revisión estructurada y exhaustiva de las cualidades, funciones, desarrollo onto- y filogenético, y modalidades de la acomodación fonética

    Individual and environment-related acoustic-phonetic strategies for communicating in adverse conditions

    Get PDF
    In many situations it is necessary to produce speech in ‘adverse conditions’: that is, conditions that make speech communication difficult. Research has demonstrated that speaker strategies, as described by a range of acoustic-phonetic measures, can vary both at the individual level and according to the environment, and are argued to facilitate communication. There has been debate as to the environmental specificity of these adaptations, and their effectiveness in overcoming communication difficulty. Furthermore, the manner and extent to which adaptation strategies differ between individuals is not yet well understood. This thesis presents three studies that explore the acoustic-phonetic adaptations of speakers in noisy and degraded communication conditions and their relationship with intelligibility. Study 1 investigated the effects of temporally fluctuating maskers on global acoustic-phonetic measures associated with speech in noise (Lombard speech). The results replicated findings of increased power in the modulation spectrum in Lombard speech, but showed little evidence of adaptation to masker fluctuations via the temporal envelope. Study 2 collected a larger corpus of semi-spontaneous communicative speech in noise and other degradations perturbing specific acoustic dimensions. Speakers showed different adaptations across the environments that were likely suited to overcome noise (steady and temporally fluctuating), restricted spectral and pitch information by a noise-excited vocoder, and a sensorineural hearing loss simulation. Analyses of inter-speaker variation in both studies 1 and 2 showed behaviour was highly variable and some strategy combinations were identified. Study 3 investigated the intelligibility of strategies ‘tailored’ to specific environments and the relationship between intelligibility and speaker acoustics, finding a benefit of tailored speech adaptations and discussing the potential roles of speaker flexibility, adaptation level, and intrinsic intelligibility. The overall results are discussed in relation to models of communication in adverse conditions and a model accounting for individual variability in these conditions is proposed

    Speaker Identification for Swiss German with Spectral and Rhythm Features

    Get PDF
    We present results of speech rhythm analysis for automatic speaker identification. We expand previous experiments using similar methods for language identification. Features describing the rhythmic properties of salient changes in signal components are extracted and used in an speaker identification task to determine to which extent they are descriptive of speaker variability. We also test the performance of state-of-the-art but simple-to-extract frame-based features. The paper focus is the evaluation on one corpus (swiss german, TEVOID) using support vector machines. Results suggest that the general spectral features can provide very good performance on this dataset, whereas the rhythm features are not as successful in the task, indicating either the lack of suitability for this task or the dataset specificity

    Gender Representation in French Broadcast Corpora and Its Impact on ASR Performance

    Full text link
    This paper analyzes the gender representation in four major corpora of French broadcast. These corpora being widely used within the speech processing community, they are a primary material for training automatic speech recognition (ASR) systems. As gender bias has been highlighted in numerous natural language processing (NLP) applications, we study the impact of the gender imbalance in TV and radio broadcast on the performance of an ASR system. This analysis shows that women are under-represented in our data in terms of speakers and speech turns. We introduce the notion of speaker role to refine our analysis and find that women are even fewer within the Anchor category corresponding to prominent speakers. The disparity of available data for both gender causes performance to decrease on women. However this global trend can be counterbalanced for speaker who are used to speak in the media when sufficient amount of data is available.Comment: Accepted to ACM Workshop AI4T

    Does training with amplitude modulated tones affect tone-vocoded speech perception?

    Get PDF
    Temporal-envelope cues are essential for successful speech perception. We asked here whether training on stimuli containing temporal-envelope cues without speech content can improve the perception of spectrally-degraded (vocoded) speech in which the temporal-envelope (but not the temporal fine structure) is mainly preserved. Two groups of listeners were trained on different amplitude-modulation (AM) based tasks, either AM detection or AM-rate discrimination (21 blocks of 60 trials during two days, 1260 trials; frequency range: 4Hz, 8Hz, and 16Hz), while an additional control group did not undertake any training. Consonant identification in vocoded vowel-consonant-vowel stimuli was tested before and after training on the AM tasks (or at an equivalent time interval for the control group). Following training, only the trained groups showed a significant improvement in the perception of vocoded speech, but the improvement did not significantly differ from that observed for controls. Thus, we do not find convincing evidence that this amount of training with temporal-envelope cues without speech content provide significant benefit for vocoded speech intelligibility. Alternative training regimens using vocoded speech along the linguistic hierarchy should be explored
    corecore