1,552 research outputs found

    Indexing, browsing and searching of digital video

    Get PDF
    Video is a communications medium that normally brings together moving pictures with a synchronised audio track into a discrete piece or pieces of information. The size of a “piece ” of video can variously be referred to as a frame, a shot, a scene, a clip, a programme or an episode, and these are distinguished by their lengths and by their composition. We shall return to the definition of each of these in section 4 this chapter. In modern society, video is ver

    Automatic rich annotation of large corpus of conversational transcribed speech

    Get PDF
    International audienceThis paper describes the use of the CasSys platform in order to achieve the chunking of conversational speech transcripts by means of cascades of Unitex transducers. Our system is involved in the EPAC project of the French National agency of Research (ANR). The aim of this project is to develop robust methods for the annotation of audio/multimedia document collections which contains conversational speech sequences such as TV or radio programs. At first, this paper presents the EPAC project and the adaptation of a former chunking system (Romus) which was developed in the restricted framework of dedicated spoken man-machine dialogue. Then, it describes the problems that are arising due to 1) spontaneous speech disfluencies and 2) errors for the previous stages of processing (automatic speech recognition and POS tagging)

    Review of Research on Speech Technology: Main Contributions From Spanish Research Groups

    Get PDF
    In the last two decades, there has been an important increase in research on speech technology in Spain, mainly due to a higher level of funding from European, Spanish and local institutions and also due to a growing interest in these technologies for developing new services and applications. This paper provides a review of the main areas of speech technology addressed by research groups in Spain, their main contributions in the recent years and the main focus of interest these days. This description is classified in five main areas: audio processing including speech, speaker characterization, speech and language processing, text to speech conversion and spoken language applications. This paper also introduces the Spanish Network of Speech Technologies (RTTH. Red TemĂĄtica en TecnologĂ­as del Habla) as the research network that includes almost all the researchers working in this area, presenting some figures, its objectives and its main activities developed in the last years

    Accessing spoken interaction through dialogue processing [online]

    Get PDF
    Zusammenfassung Unser Leben, unsere Leistungen und unsere Umgebung, alles wird derzeit durch Schriftsprache dokumentiert. Die rasante Fortentwicklung der technischen Möglichkeiten Audio, Bilder und Video aufzunehmen, abzuspeichern und wiederzugeben kann genutzt werden um die schriftliche Dokumentation von menschlicher Kommunikation, zum Beispiel Meetings, zu unterstĂŒtzen, zu ergĂ€nzen oder gar zu ersetzen. Diese neuen Technologien können uns in die Lage versetzen Information aufzunehmen, die anderweitig verloren gehen, die Kosten der Dokumentation zu senken und hochwertige Dokumente mit audiovisuellem Material anzureichern. Die Indizierung solcher Aufnahmen stellt die Kerntechnologie dar um dieses Potential auszuschöpfen. Diese Arbeit stellt effektive Alternativen zu schlĂŒsselwortbasierten Indizes vor, die SuchraumeinschrĂ€nkungen bewirken und teilweise mit einfachen Mitteln zu berechnen sind. Die Indizierung von Sprachdokumenten kann auf verschiedenen Ebenen erfolgen: Ein Dokument gehört stilistisch einer bestimmten Datenbasis an, welche durch sehr einfache Merkmale bei hoher Genauigkeit automatisch bestimmt werden kann. Durch diese Art von Klassifikation kann eine Reduktion des Suchraumes um einen Faktor der GrĂ¶ĂŸenordnung 4­10 erfolgen. Die Anwendung von thematischen Merkmalen zur Textklassifikation bei einer Nachrichtendatenbank resultiert in einer Reduktion um einen Faktor 18. Da Sprachdokumente sehr lang sein können mĂŒssen sie in thematische Segmente unterteilt werden. Ein neuer probabilistischer Ansatz sowie neue Merkmale (Sprecherinitia­ tive und Stil) liefern vergleichbare oder bessere Resultate als traditionelle schlĂŒsselwortbasierte AnsĂ€tze. Diese thematische Segmente können durch die vorherrschende AktivitĂ€t charakterisiert werden (erzĂ€hlen, diskutieren, planen, ...), die durch ein neuronales Netz detektiert werden kann. Die Detektionsraten sind allerdings begrenzt da auch Menschen diese AktivitĂ€ten nur ungenau bestimmen. Eine maximale Reduktion des Suchraumes um den Faktor 6 ist bei den verwendeten Daten theoretisch möglich. Eine thematische Klassifikation dieser Segmente wurde ebenfalls auf einer Datenbasis durchgefĂŒhrt, die Detektionsraten fĂŒr diesen Index sind jedoch gering. Auf der Ebene der einzelnen Äußerungen können Dialogakte wie Aussagen, Fragen, RĂŒckmeldungen (aha, ach ja, echt?, ...) usw. mit einem diskriminativ trainierten Hidden Markov Model erkannt werden. Dieses Verfahren kann um die Erkennung von kurzen Folgen wie Frage/Antwort­Spielen erweitert werden (Dialogspiele). Dialogakte und ­spiele können eingesetzt werden um Klassifikatoren fĂŒr globale Sprechstile zu bauen. Ebenso könnte ein Benutzer sich an eine bestimmte Dialogaktsequenz erinnern und versuchen, diese in einer grafischen ReprĂ€sentation wiederzufinden. In einer Studie mit sehr pessimistischen Annahmen konnten Benutzer eines aus vier Ă€hnlichen und gleichwahrscheinlichen GesprĂ€chen mit einer Genauigkeit von ~ 43% durch eine graphische ReprĂ€sentation von AktivitĂ€t bestimmt. Dialogakte könnte in diesem Szenario ebenso nĂŒtzlich sein, die Benutzerstudie konnte aufgrund der geringen Datenmenge darĂŒber keinen endgĂŒltigen Aufschluß geben. Die Studie konnte allerdings fĂŒr detailierte Basismerkmale wie FormalitĂ€t und SprecheridentitĂ€t keinen Effekt zeigen. Abstract Written language is one of our primary means for documenting our lives, achievements, and environment. Our capabilities to record, store and retrieve audio, still pictures, and video are undergoing a revolution and may support, supplement or even replace written documentation. This technology enables us to record information that would otherwise be lost, lower the cost of documentation and enhance high­quality documents with original audiovisual material. The indexing of the audio material is the key technology to realize those benefits. This work presents effective alternatives to keyword based indices which restrict the search space and may in part be calculated with very limited resources. Indexing speech documents can be done at a various levels: Stylistically a document belongs to a certain database which can be determined automatically with high accuracy using very simple features. The resulting factor in search space reduction is in the order of 4­10 while topic classification yielded a factor of 18 in a news domain. Since documents can be very long they need to be segmented into topical regions. A new probabilistic segmentation framework as well as new features (speaker initiative and style) prove to be very effective compared to traditional keyword based methods. At the topical segment level activities (storytelling, discussing, planning, ...) can be detected using a machine learning approach with limited accuracy; however even human annotators do not annotate them very reliably. A maximum search space reduction factor of 6 is theoretically possible on the databases used. A topical classification of these regions has been attempted on one database, the detection accuracy for that index, however, was very low. At the utterance level dialogue acts such as statements, questions, backchannels (aha, yeah, ...), etc. are being recognized using a novel discriminatively trained HMM procedure. The procedure can be extended to recognize short sequences such as question/answer pairs, so called dialogue games. Dialog acts and games are useful for building classifiers for speaking style. Similarily a user may remember a certain dialog act sequence and may search for it in a graphical representation. In a study with very pessimistic assumptions users are able to pick one out of four similar and equiprobable meetings correctly with an accuracy ~ 43% using graphical activity information. Dialogue acts may be useful in this situation as well but the sample size did not allow to draw final conclusions. However the user study fails to show any effect for detailed basic features such as formality or speaker identity

    Access to recorded interviews: A research agenda

    Get PDF
    Recorded interviews form a rich basis for scholarly inquiry. Examples include oral histories, community memory projects, and interviews conducted for broadcast media. Emerging technologies offer the potential to radically transform the way in which recorded interviews are made accessible, but this vision will demand substantial investments from a broad range of research communities. This article reviews the present state of practice for making recorded interviews available and the state-of-the-art for key component technologies. A large number of important research issues are identified, and from that set of issues, a coherent research agenda is proposed

    Proceedings of the ACM SIGIR Workshop ''Searching Spontaneous Conversational Speech''

    Get PDF
    • 

    corecore