386 research outputs found

    Proceedings of the ACM SIGIR Workshop "Searching Spontaneous Conversational Speech"

    Nommage non supervisé des personnes dans les émissions de télévision. Utilisation des noms écrits, des noms prononcés ou des deux ?

    National audience. Identifying people in television broadcasts is a valuable tool for indexing this type of video, but biometric models are not a viable option without prior knowledge of the people present in the videos. Spoken or written names can provide a list of hypothesis names. We compare the potential of these two modalities (spoken or written names) for extracting the names of the people speaking and/or appearing. Spoken names offer a larger number of citation occurrences, but errors in transcribing and detecting these names halve the potential of this modality. Written names benefit from the steadily improving quality of videos and are more easily detected. Moreover, associating written names with speakers/faces remains simpler than for spoken names.

    Deriving and Exploiting Situational Information in Speech: Investigations in a Simulated Search and Rescue Scenario

    The need for automatic recognition and understanding of speech is emerging in tasks involving the processing of large volumes of natural conversations. In application domains such as Search and Rescue, exploiting automated systems for extracting mission-critical information from speech communications has the potential to make a real difference. Spoken language understanding has commonly been approached by identifying units of meaning (such as sentences, named entities, and dialogue acts) for providing a basis for further discourse analysis. However, this fine-grained identification of fundamental units of meaning is sensitive to high error rates in the automatic transcription of noisy speech. This thesis demonstrates that topic segmentation and identification techniques can be employed for information extraction from spoken conversations by being robust to such errors. Two novel topic-based approaches are presented for extracting situational information within the search and rescue context. The first approach shows that identifying the changes in the context and content of first responders' report over time can provide an estimation of their location. The second approach presents a speech-based topological map estimation technique that is inspired, in part, by automatic mapping algorithms commonly used in robotics. The proposed approaches are evaluated on a goal-oriented conversational speech corpus, which has been designed and collected based on an abstract communication model between a first responder and a task leader during a search process. Results have confirmed that a highly imperfect transcription of noisy speech has limited impact on the information extraction performance compared with that obtained on the transcription of clean speech data. This thesis also shows that speech recognition accuracy can benefit from rescoring its initial transcription hypotheses based on the derived high-level location information. 
    A new two-pass speech decoding architecture is presented. In this architecture, the location estimation from a first decoding pass is used to dynamically adapt a general language model which is used for rescoring the initial recognition hypotheses. This decoding strategy has resulted in a statistically significant gain in the recognition accuracy of the spoken conversations in high background noise. It is concluded that the techniques developed in this thesis can be extended to more application domains that deal with large volumes of natural spoken conversations.

    AudioStreamer--leveraging the cocktail party effect for efficient listening

    Thesis (M.S.)--Massachusetts Institute of Technology, Program in Media Arts & Sciences, 1996. Includes bibliographical references (p. 89-94). By Atty Thomas Mullins.

    Sonic Pleasure, Absence and the History of the Self: An Alternative Approach to the Criticism of Sound Art

    Historical, psychoanalytic and cinema criticism have characterised the history of Western modernity and the individual subject as founded upon an affective lack. Pleasure is solicited by the promise of fullness, but this is never fulfilled, fuelling further desire. Sound art, however, is more typically theorised as inherently present and immersive, as a form that offers direct experience, which literally touches the subject. I draw upon the work of Jonathan Sterne, Steven Connor, psychoanalysis and film criticism to rearticulate not just modernist media and subjectivity as characterised by lack and absence, but the perception of aestheticised sound. Starting with an analysis of influential seventeenth-century audiovisual theorist Athanasius Kircher, I sketch a history of the self and media where pleasure is solicited and threatened by subjective absence and lack, in which the aesthetics of Romanticism, absolute music, Alvin Lucier, Noise artists (Justice Yeldham), feminist sound poets (Amanda Stewart), the New Music Ensemble Decibel (director Cat Hope) and others are implicated.

    Spoken Corpora Good Practice Guide 2006

    International audience. There is currently a vast amount of fundamental or applied research, which is based on the exploitation of oral corpora (organized recorded collections of oral and multimodal language productions). Created as a result of linguists becoming aware of the importance of ensuring the durability of sources and a diversified access to the oral documents they produce, this Guide to good practice mainly deals with “oral corpora”, created for and used by linguists. But the questions raised by the creation and documentary exploitation of these corpora can be found in numerous disciplines: ethnology, anthropology, sociology, psychology, demography, oral history notably use oral surveys, testimonies, interviews, life stories. Based on a linguistic approach, this Guide also touches on the preoccupations of other researchers who use oral corpora (for example in the field of speech synthesis and recognition), even if their specific needs aren’t consistently dealt with in the present document.

    Phonological features of Hong Kong English : patterns of variation and effects on local acceptability

    The changing dynamics of international communication in English have led to an intense questioning of the relevance of native-speaker pronunciation models in language teaching and testing. In addition, the World Englishes approach to local varieties has increased their level of recognition. Both of these developments suggest that English pronunciation models need to be reviewed, and Hong Kong represents an interesting case study. Although it has been claimed that Hong Kong English is at the ‘nativization’ stage, the existence of exonormative attitudes towards English is also well known. Two important questions arise from this inherent tension, neither of which has been intensively addressed in previous studies. Firstly, although many of the features of Hong Kong English pronunciation have been described, patterns of inter-speaker variation have not been investigated in detail. Secondly, the attitudes of Hong Kong English users towards the phonological features of their own variety have not been studied in ways that take account of such variation. This dissertation addresses both of these questions by being features-based in approach and using local listeners to evaluate accent samples. After an initial review of the features of Hong Kong English pronunciation, a preliminary study surveys the occurrence of consonantal phonological features within a mini-corpus of speech samples taken from local television programmes. Its findings are presented in the form of an implicational scale, which not only shows the relative frequencies with which different features occurred, but also indicates the existence of implicational patterns of co-occurrence. In the main study, twelve authentic accent samples (eleven Hong Kong speakers and one British speaker) were presented to 52 first-year undergraduate students for evaluation as to their acceptability, defined here as acceptability for pedagogical purposes.
    Multivariate statistical analysis discovered firstly that phonological ‘errors’, as marked by the student listeners, were the most important measured factor in determining the acceptability scores, and secondly that only certain types of ‘error’ or ‘feature’ had significant effects. These features were either related to L1 transfer or involved other salient phenomena such as idiosyncratic alterations to syllable structure. The explanatory part of the study includes acceptability as one of the factors determining feature persistence, in an ‘ecological’ or ‘evolutionary’ model of L2 phonology acquisition and development that combines the findings of the preliminary and main studies. Among the other factors that determine feature persistence or disappearance, salience, intelligibility and markedness are invoked as important influences. The acceptability data also has pedagogical implications, in that local listeners did not give the British accent the highest acceptability rating. This contrasts with the findings of previous studies regarding the pedagogical acceptability of the Hong Kong English accent. However, the features-based approach indicates that only certain types of local accent were acceptable to these listeners, and that these accents were more, rather than less, ‘native-like’. In various ways, the study contributes to an understanding of accent variation and acceptability within a new variety of English.