
    An Illustrated Methodology for Evaluating ASR Systems

    Proceedings of: 9th International Workshop on Adaptive Multimedia Retrieval (AMR 2011), held 18-19 July 2011 in Barcelona, Spain. The event web site is http://stel.ub.edu/amr2011/

    Automatic speech recognition technology can be integrated into an information retrieval process to allow searching of multimedia content. However, to ensure adequate retrieval performance, it is necessary to assess the quality of the recognition phase, especially in speaker-independent and domain-independent environments. This paper introduces a methodology for evaluating different speech recognition systems in several scenarios, also considering the creation of new corpora of different types (broadcast news, interviews, etc.), especially in languages other than English that are not widely addressed by the speech community.

    This work has been partially supported by the Spanish Center for Industry Technological Development (CDTI, Ministry of Industry, Tourism and Trade) through the BUSCAMEDIA Project (CEN-20091026), and by MA2VICMR: Improving the access, analysis and visibility of the multilingual and multimedia information in web for the Region of Madrid (S2009/TIC-1542).
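    ASR evaluations of the kind this methodology targets typically report word error rate (WER): the word-level edit distance between a reference transcript and the recognizer's hypothesis, normalized by reference length. A minimal illustrative sketch (not taken from the paper):

```python
def wer(reference, hypothesis):
    """Word error rate: (substitutions + deletions + insertions) / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # DP table: d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution or match
    return d[len(ref)][len(hyp)] / len(ref)

print(wer("the cat sat on the mat", "the cat sat on mat"))  # one deletion over six words
```

    Lowercasing and punctuation stripping are usually applied before scoring; they are omitted here for brevity.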

    Language-based multimedia information retrieval

    This paper describes various methods and approaches for language-based multimedia information retrieval, which have been developed in the projects POP-EYE and OLIVE and which will be developed further in the MUMIS project. All of these projects aim at supporting automated indexing of video material by use of human language technologies. Thus, in contrast to image- or sound-based retrieval methods, where both the query language and the indexing methods build on non-linguistic data, these methods attempt to exploit advanced text retrieval technologies for the retrieval of non-textual material. While POP-EYE was building on subtitles or captions as the prime language key for disclosing video fragments, OLIVE is making use of speech recognition to automatically derive transcriptions of the sound tracks, generating time-coded linguistic elements which then serve as the basis for text-based retrieval functionality.

    ICMR 2014: 4th ACM International Conference on Multimedia Retrieval

    ICMR was initially started as a workshop on challenges in image retrieval (in Newcastle in 1998) and later transformed into the Conference on Image and Video Retrieval (CIVR) series. In 2011 the CIVR and the ACM Workshop on Multimedia Information Retrieval were combined into a single conference that now forms the ICMR series. The 4th ACM International Conference on Multimedia Retrieval took place in Glasgow, Scotland, from 1–4 April 2014. This was the largest edition of ICMR to date, with approximately 170 attendees from 25 different countries. ICMR is one of the premier scientific conferences for multimedia retrieval held worldwide, with the stated mission “to illuminate the state of the art in multimedia retrieval by bringing together researchers and practitioners in the field of multimedia retrieval.” According to the Chinese Computing Federation Conference Ranking (2013), ACM ICMR is the number one multimedia retrieval conference worldwide and the number four conference in the category of multimedia and graphics. Although ICMR is about multimedia retrieval, in a wider sense it is also about automated multimedia understanding. Much of the work in that area involves the analysis of media at the pixel, voxel, and wavelet level, but it also involves innovative retrieval, visualisation and interaction paradigms utilising the nature of the multimedia, be it video, images, speech, or more abstract (sensor) data. The conference aims to promote intellectual exchanges and interactions among scientists, engineers, students, and multimedia researchers in academia as well as industry through various events, including a keynote talk, oral, special and poster sessions focused on research challenges and solutions, technical and industrial demonstrations of prototypes, tutorials, and research and industrial panels. In the remainder of this report we summarise the events that took place at the 4th ACM ICMR conference.

    Overview of VideoCLEF 2009: New perspectives on speech-based multimedia content enrichment

    VideoCLEF 2009 offered three tasks related to enriching video content for improved multimedia access in a multilingual environment. For each task, video data (Dutch-language television, predominantly documentaries) accompanied by speech recognition transcripts were provided. The Subject Classification Task involved automatic tagging of videos with subject theme labels. The best performance was achieved by approaching subject tagging as an information retrieval task and using both speech recognition transcripts and archival metadata. Alternatively, classifiers were trained using either the training data provided or data collected from Wikipedia or via general Web search. The Affect Task involved detecting narrative peaks, defined as points where viewers perceive heightened dramatic tension. The task was carried out on the “Beeldenstorm” collection containing 45 short-form documentaries on the visual arts. The best runs exploited affective vocabulary and audience-directed speech. Other approaches included using topic changes, elevated speaking pitch, increased speaking intensity and radical visual changes. The Linking Task, also called “Finding Related Resources Across Languages,” involved linking video to material on the same subject in a different language. Participants were provided with a list of multimedia anchors (short video segments) in the Dutch-language “Beeldenstorm” collection and were expected to return target pages drawn from English-language Wikipedia. The best performing methods used the transcript of the speech spoken during the multimedia anchor to build a query to search an index of the Dutch-language Wikipedia. The Dutch Wikipedia pages returned were used to identify related English pages. Participants also experimented with pseudo-relevance feedback, query translation and methods that targeted proper names.

    Multiple Media Correlation: Theory and Applications

    This thesis introduces multiple media correlation, a new technology for the automatic alignment of multiple media objects such as text, audio, and video. This research began with the question: what can be learned when multiple multimedia components are analyzed simultaneously? Most ongoing research in computational multimedia has focused on queries, indexing, and retrieval within a single media type. Video is compressed and searched independently of audio; text is indexed without regard to temporal relationships it may have to other media data. Multiple media correlation provides a framework for locating and exploiting correlations between multiple, potentially heterogeneous, media streams. The goal is computed synchronization, the determination of temporal and spatial alignments that optimize a correlation function and indicate commonality and synchronization between media objects. The model also provides a basis for comparison of media in unrelated domains. There are many real-world applications for this technology, including speaker localization, musical score alignment, and degraded media realignment. Two applications, text-to-speech alignment and parallel text alignment, are described in detail with experimental validation. Text-to-speech alignment computes the alignment between a textual transcript and speech-based audio. The presented solutions are effective for a wide variety of content and are useful not only for retrieval of content, but in support of automatic captioning of movies and video. Parallel text alignment provides a tool for the comparison of alternative translations of the same document that is particularly useful to the classics scholar interested in comparing translation techniques or styles.
The results presented in this thesis include (a) new media models more useful in analysis applications, (b) a theoretical model for multiple media correlation, (c) two practical application solutions that have widespread applicability, and (d) Xtrieve, a multimedia database retrieval system that demonstrates this new technology and its application to information retrieval. This thesis demonstrates that computed alignment of media objects is practical and can provide immediate solutions to many information retrieval and content presentation problems. It also introduces a new area for research in media data analysis.
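    The text-to-speech alignment task described above can be illustrated with a much simpler stand-in: align a clean transcript to time-stamped recognizer output by minimum edit distance, then carry the timestamps back onto the transcript. This is a hedged sketch of the general idea, not the thesis's correlation model; the function name and data format are assumptions:

```python
def align(transcript, asr):
    """Align transcript words to (word, time) ASR output by minimum edit distance,
    then propagate the matched timestamps back onto the transcript."""
    n, m = len(transcript), len(asr)
    # DP over edit distance; back[i][j] records the move that achieved d[i][j]
    d = [[0] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0], back[i][0] = i, "del"
    for j in range(1, m + 1):
        d[0][j], back[0][j] = j, "ins"
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if transcript[i - 1] == asr[j - 1][0] else 1
            moves = [(d[i - 1][j - 1] + cost, "diag"),   # match / substitution
                     (d[i - 1][j] + 1, "del"),           # transcript word unmatched
                     (d[i][j - 1] + 1, "ins")]           # spurious ASR word
            d[i][j], back[i][j] = min(moves)
    # Trace back, attaching a timestamp wherever transcript and ASR words line up
    out, i, j = [], n, m
    while i > 0 or j > 0:
        move = back[i][j]
        if move == "diag":
            out.append((transcript[i - 1], asr[j - 1][1]))
            i, j = i - 1, j - 1
        elif move == "del":
            out.append((transcript[i - 1], None))
            i -= 1
        else:
            j -= 1
    return out[::-1]

# "x" is a recognition error for "b"; the substitution still yields a timestamp.
print(align(["a", "b", "c"], [("a", 0.0), ("x", 1.0), ("c", 2.0)]))
```

    Real systems optimize a richer correlation function over the audio itself, but the dynamic-programming backbone is the same.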

    An Extension of Rhetorical Structure Theory for the Treatment of Retrieval Dialogues

    A unification of a speech-act oriented model for information-seeking dialogues (COR) with a model describing the structure of monological text units (RST) is presented. This paper focuses on the extensions of RST necessary to make it applicable to information-seeking dialogues: new relations must be defined, and basic assumptions of RST have to be relaxed. Our approach is verified by interfacing the dialogue component of an intelligent multimedia retrieval system with a component for natural language generation.

    Multimedia search without visual analysis: the value of linguistic and contextual information

    This paper addresses the focus of this special issue by analyzing the potential contribution of linguistic content and other non-image aspects to the processing of audiovisual data. It summarizes the various ways in which linguistic content analysis contributes to enhancing the semantic annotation of multimedia content, and, as a consequence, to improving the effectiveness of conceptual media access tools. A number of techniques are presented, including the time-alignment of textual resources, audio and speech processing, content reduction and reasoning tools, and the exploitation of surface features.

    Toward an adaptive video retrieval system

    Unlike text retrieval systems, retrieval in digital video libraries faces a challenging problem: the semantic gap. This is the difference between the low-level data representation of videos and the higher-level concepts that a user associates with video. In 2005, the panel members of the International Workshop on Multimedia Information Retrieval identified this gap as one of the main technical problems in multimedia retrieval (Jaimes et al. 2005), carrying the potential to dominate the research efforts in multimedia retrieval for the next few years. Retrievable information such as textual sources of video clips (i.e., speech transcripts) is often not reliable enough to describe the actual content of a clip. Moreover, the approach of using visual features and automatically detecting high-level concepts, which has been the main focus of study within the international video processing and evaluation campaign TRECVID (Smeaton et al. 2006), turned out to be insufficient to bridge the semantic gap.

    Augmenting conversations through context-aware multimedia retrieval based on speech recognition

    Future environments will be sensitive and responsive to the presence of people, supporting them in carrying out their everyday activities, tasks and rituals in an easy and natural way. Such interactive spaces will use information and communication technologies to bring computation into the physical world, in order to enhance the ordinary activities of their users. This paper describes a speech-based multimedia retrieval system that can be used to present relevant video-podcast (vodcast) footage in response to spontaneous speech and conversations during daily life activities. The proposed system allows users to search the spoken content of multimedia files rather than their associated meta-information, and lets them navigate to the right portion where the queried words are spoken by facilitating within-medium searches of multimedia content through a bag-of-words approach. Finally, we have studied the proposed system on different scenarios, using English-language vodcasts from various categories as the target multimedia, and discussed how it could enhance people’s everyday activities in scenarios including education, entertainment, marketing, news and the workplace.
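    The within-medium, bag-of-words search described above amounts to an inverted index from spoken words to (file, timestamp) positions, so a query can jump to the moment a word is uttered. The following is an illustrative sketch, not the paper's implementation; the function names and transcript format are assumptions:

```python
from collections import defaultdict

def build_index(transcripts):
    """Map each spoken word to the (media file, timestamp) positions where it occurs.
    `transcripts` maps a file name to a list of (word, seconds) pairs, as a
    speech recognizer might produce (illustrative format)."""
    index = defaultdict(list)
    for media, words in transcripts.items():
        for word, t in words:
            index[word.lower()].append((media, t))
    return index

def search(index, query):
    """Bag-of-words search: collect hits for any query term, rank media files by
    how many distinct query terms they contain, and return jump-to timestamps."""
    terms = set(query.lower().split())
    matched = defaultdict(set)        # media -> matched query terms
    positions = defaultdict(list)     # media -> timestamps of the matches
    for term in terms:
        for media, t in index.get(term, []):
            matched[media].add(term)
            positions[media].append(t)
    ranked = sorted(matched, key=lambda m: len(matched[m]), reverse=True)
    return [(m, sorted(positions[m])) for m in ranked]

idx = build_index({
    "ep1.mp4": [("speech", 4.2), ("recognition", 4.8), ("news", 30.0)],
    "ep2.mp4": [("weather", 2.0), ("news", 3.5)],
})
print(search(idx, "speech recognition"))  # [('ep1.mp4', [4.2, 4.8])]
```

    A player front-end would then seek to the returned timestamps rather than merely listing matching files.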

    A collaborative web platform for sound archives management and analysis

    In the context of digital sound archives, an innovative web framework for the automatic analysis and manual annotation of audio files has been developed. This web framework is called TimeSide and is available under an open-source license. The TimeSide framework associates an audio processing engine, an audio database, a web API and a client-side multimedia player. The audio processing engine is written in Python and has been designed for speech and audio signal analysis and Music Information Retrieval (MIR) tasks. It includes a set of audio analysis plugins and additionally wraps several state-of-the-art audio feature extraction libraries to provide automatic annotation, segmentation and Music Information Retrieval analysis. It also provides decoding and encoding methods for the most common multimedia formats. The audio database application is handled through Django (Python) and is interfaced with the audio processing engine. The web API component provides these functionalities over the web, enabling web clients to run analyses on the sounds in the audio database. Last but not least, the multimedia player provides a web player associated with several sound and analysis visualizations, together with an annotation editor, through a multi-track display. The TimeSide platform is available as an open-source project at the following address: TimeSide: https://github.com/Parisson/TimeSid
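    The plugin-based processing engine described above can be illustrated in outline: the engine decodes audio into frames and pushes each frame through every registered analysis plugin. The classes below are a hypothetical sketch of that pattern, not TimeSide's actual API:

```python
import abc
import math

class Analyzer(abc.ABC):
    """A processing plugin consumes decoded audio frames and emits a result.
    Hypothetical interface, not TimeSide's real class hierarchy."""
    @abc.abstractmethod
    def process(self, frame): ...
    @abc.abstractmethod
    def result(self): ...

class RMSLevel(Analyzer):
    """Running root-mean-square level of the signal, a typical analysis plugin."""
    def __init__(self):
        self.sum_sq, self.count = 0.0, 0

    def process(self, frame):
        self.sum_sq += sum(x * x for x in frame)
        self.count += len(frame)

    def result(self):
        return math.sqrt(self.sum_sq / self.count) if self.count else 0.0

def run_pipeline(frames, analyzers):
    """The engine pushes each decoded frame through every registered plugin,
    then collects one result per plugin."""
    for frame in frames:
        for a in analyzers:
            a.process(frame)
    return {type(a).__name__: a.result() for a in analyzers}

# RMS of the samples [0, 1, -1, 0] is sqrt(0.5), roughly 0.707
print(run_pipeline([[0.0, 1.0], [-1.0, 0.0]], [RMSLevel()]))
```

    Frame-wise streaming is what lets such an engine analyze archive-scale recordings without loading whole files into memory.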