
4,037 research outputs found

Spoken content retrieval: A survey of techniques and technologies

Author: Ani Nenkova
C A. Nenkova
K. Mckeown
Kathleen Mckeown
Publication venue: 'Now Publishers'
Publication date: 01/01/2012

Speech media, that is, digital audio and video containing spoken content, has blossomed in recent years. Large collections are accruing on the Internet as well as in private and enterprise settings. This growth has motivated extensive research on techniques and technologies that facilitate reliable indexing and retrieval. Spoken content retrieval (SCR) requires the combination of audio and speech processing technologies with methods from information retrieval (IR). SCR research initially investigated planned speech structured in document-like units, but has subsequently shifted focus to more informal spoken content produced spontaneously, outside of the studio and in conversational settings. This survey provides an overview of the field of SCR, encompassing component technologies, the relationship of SCR to text IR and automatic speech recognition, and user interaction issues. It is aimed at researchers with backgrounds in speech technology or IR who are seeking deeper insight into how these fields are integrated to support research and development, thus addressing the core challenges of SCR.

CiteSeerX

Crossref

Irish Universities

DCU Online Research Access Service

Scholarly Journals on the Net: A Reader's Assessment

Author: Bishop Ann Peterson
Publication venue: Graduate School of Library and Information Science. University of Illinois at Urbana-Champaign
Publication date: 01/01/1995

Published or submitted for publication

Illinois Digital Environment for Access to Learning and Scholarship Repository


Design Exposition with Literate Visualization

Author: Dykes J.
Kachkaev A.
Wood J.
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2019

We propose a new approach to the visualization design and communication process, literate visualization, based upon and extending Donald Knuth’s idea of literate programming. It integrates the process of writing data visualization code with description of the design choices that led to the implementation (design exposition). We develop a model of design exposition characterised by four visualization designer archetypes: the evaluator, the autonomist, the didacticist and the rationalist. The model is used to justify the key characteristics of literate visualization: ‘notebook’ documents that integrate live coding input, rendered output and textual narrative; low cost of authoring textual narrative; guidelines to encourage structured visualization design and its documentation. We propose narrative schemas for structuring and validating a wide range of visualization design approaches and models, and branching narratives for capturing alternative designs and design views. We describe a new open source literate visualization environment, litvis, based on a declarative interface to Vega and Vega-Lite through the functional programming language Elm combined with markdown for formatted narrative. We informally assess the approach, its implementation and potential by considering three examples spanning a range of design abstractions: new visualization idioms; validation through visualization algebra; and feminist data visualization. We argue that the rich documentation of the design process provided by literate visualization offers the potential to improve the validity of visualization design and so benefit both academic visualization and visualization practice.

City Research Online

Crossref

The Elusive Simplicity of Container-Level Encoded Archival Description: Some Considerations

Author: Broaddus Leah
Publication venue: DigitalCommons@Kennesaw State University
Publication date: 01/01/2008

Web-managed finding aids require streamlined, efficient intellectual organization of materials. It is not just a question of aesthetics, but of pragmatics. A more consistent, generalizable system of organization aids institutions in adopting, migrating, and building on the structure. The generalizable elements of a solution can be repeated, predicted, explained, taught, and further developed. They also lend the skeletal structure necessary to support unique elements.

DigitalCommons@Kennesaw State University

Digital libraries for creative communities

Author: Bainbridge David
Cantlon Polly
Cunningham Sally Jo
Jones Matt
Witten Ian H.
Publication venue: 'Informa UK Limited'
Publication date: 01/01/2004

Digital library technologies have a great deal to offer to creative, design communities. They can enable large collections of text, images, music, video and other information objects to be organised and accessed in interesting and diverse ways. Ordinary people—people not traditionally viewed as 'creators' or 'designers'—can now conceive, assemble, build, and disseminate new information collections. This paper explores the development rationale behind the Greenstone digital library technology. We also examine three examples of creative new techniques for accessing and presenting information in digital libraries and stress the importance of tailoring information access to support the requirements of the users and application area

Research Commons@Waikato

Real-time indexation of meeting recordings

Author: Schindler E.
Publication date: 01/01/2008

Repository TU/e

Pure OAI Repository

Accessing spoken interaction through dialogue processing [online]

Author: Ries Klaus
Publication venue: KIT-Bibliothek, Karlsruhe
Publication date: 01/01/2002

Written language is one of our primary means of documenting our lives, achievements, and environment. Our capabilities to record, store and retrieve audio, still pictures, and video are undergoing a revolution and may support, supplement or even replace written documentation of human communication such as meetings. This technology enables us to record information that would otherwise be lost, lower the cost of documentation and enhance high-quality documents with original audiovisual material. The indexing of the audio material is the key technology for realizing those benefits. This work presents effective alternatives to keyword-based indices which restrict the search space and may in part be calculated with very limited resources. Speech documents can be indexed at various levels. Stylistically, a document belongs to a certain database, which can be determined automatically with high accuracy using very simple features. The resulting search space reduction factor is on the order of 410, while topic classification yielded a factor of 18 in a news domain. Since documents can be very long, they need to be segmented into topical regions. A new probabilistic segmentation framework as well as new features (speaker initiative and style) yield results comparable to or better than traditional keyword-based methods. At the topical segment level, activities (storytelling, discussing, planning, ...) can be detected by a neural network with limited accuracy; however, even human annotators do not annotate them very reliably. A maximum search space reduction factor of 6 is theoretically possible on the databases used. A topical classification of these regions was attempted on one database, but the detection accuracy for that index was very low. At the utterance level, dialogue acts such as statements, questions, and backchannels (aha, yeah, ...) are recognized using a novel discriminatively trained HMM procedure. The procedure can be extended to recognize short sequences such as question/answer pairs, so-called dialogue games. Dialogue acts and games are useful for building classifiers for speaking style. Similarly, a user may remember a certain dialogue act sequence and search for it in a graphical representation. In a study with very pessimistic assumptions, users were able to pick one out of four similar and equiprobable meetings correctly with an accuracy of about 43% using graphical activity information. Dialogue acts may be useful in this situation as well, but the sample size did not allow final conclusions to be drawn. However, the user study fails to show any effect for detailed basic features such as formality or speaker identity.

KITopen


Creative professional users' musical relevance criteria

Author: A. Gruzd
Andy MacFarlane
C. Inskip
Charlie Inskip
D. Bawden
E. Law
E. Rasmussen
E. Sormunen
E. Voorhees
J. Kim
J.S. Downie
L. Barrington
L. Schamber
M. Mandel
Mirex
Pauline Rafferty
S. Rüger
Spotify
T.D. Anderson
Trec
Publication venue: 'SAGE Publications'
Publication date: 28/06/2010

Although known-item searching for music can be dealt with by searching metadata using existing text search techniques, human subjectivity and variability within the music itself make it very difficult to search for unknown items. This paper examines these problems within the context of text retrieval and music information retrieval. The focus is on ascertaining a relationship between music relevance criteria and those relating to relevance judgements in text retrieval. A data-rich collection of relevance judgements by creative professionals searching for unknown musical items to accompany moving images using real-world queries is analysed. The participants in our observations are found to take a socio-cognitive approach and use a range of content- and context-based criteria. These criteria correlate strongly with those arising from previous text retrieval studies despite the many differences between music and text in their actual content.

City Research Online

Crossref

Sound environment analysis in smart home

Author: Boudy Jérôme
Dorizzi Bernadette
Istrate Dan
Lecouteux Benjamin
Portet François
Sehili Mohamed El Amine
Vacher Michel
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 13/11/2012

This study aims to provide audio-based interaction technology that gives users full control over their home environment, to detect distress situations, and to ease the social inclusion of the elderly and frail population. The paper presents the sound and speech analysis system, evaluated on a corpus of data acquired in a real smart home environment. The four steps of analysis are signal detection, speech/sound discrimination, sound classification and speech recognition. The results are presented for each step and globally. The very first experiments show promising results, both for the modules evaluated independently and for the whole system.

Hal - Université Grenoble Alpes