AXES at TRECVID 2012: KIS, INS, and MED
The AXES project participated in the interactive instance search task (INS), the known-item search task (KIS), and the multimedia event detection task (MED) for TRECVid 2012. As in our TRECVid 2011 system, we used nearly identical search systems and user interfaces for both INS and KIS. Our interactive INS and KIS systems focused this year on using classifiers trained at query time with positive examples collected from external search engines. Participants in our KIS experiments were media professionals from the BBC; our INS experiments were carried out by students and researchers at Dublin City University. We performed comparatively well in both experiments: our best KIS run found 13 of the 25 topics, and our best INS runs outperformed all other submitted runs in terms of P@100. For MED, the system presented was based on a minimal number of low-level descriptors, which we chose to be as large as computationally feasible. These descriptors are aggregated to produce high-dimensional video-level signatures, which are used to train a set of linear classifiers. Our MED system achieved the second-best score of all submitted runs in the main track and the best score in the ad-hoc track, suggesting that a simple system based on state-of-the-art low-level descriptors can give relatively high performance. This paper describes in detail our KIS, INS, and MED systems and the results and findings of our experiments.
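The MED pipeline sketched in the abstract (aggregate per-frame low-level descriptors into a video-level signature, then train linear classifiers) can be illustrated in miniature. This is a hedged sketch, not the paper's system: mean pooling and plain gradient-descent logistic regression stand in for the unspecified aggregation and training details, and all data is synthetic.

```python
import numpy as np

def video_signature(frame_descriptors):
    """Mean-pool per-frame descriptors into one video-level signature
    (the paper's exact aggregation is not specified here)."""
    return np.mean(frame_descriptors, axis=0)

def train_linear_classifier(X, y, lr=0.1, epochs=200):
    """Fit a linear (logistic-regression) classifier on video signatures
    with plain batch gradient descent."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))   # predicted probabilities
        w -= lr * (X.T @ (p - y)) / n
        b -= lr * np.mean(p - y)
    return w, b

# Toy data: 10 positive and 10 negative "videos" of 20 frames each,
# with 8-dimensional frame descriptors drawn from separable clusters.
rng = np.random.default_rng(0)
pos = [rng.normal(+1.0, 0.3, size=(20, 8)) for _ in range(10)]
neg = [rng.normal(-1.0, 0.3, size=(20, 8)) for _ in range(10)]
X = np.array([video_signature(v) for v in pos + neg])
y = np.array([1] * 10 + [0] * 10)
w, b = train_linear_classifier(X, y)
scores = X @ w + b
print((scores[:10] > 0).all(), (scores[10:] < 0).all())  # → True True
```

Pooling first and classifying at the video level keeps training cheap even when the per-frame descriptors are high-dimensional, which matches the abstract's emphasis on simple linear classifiers over video-level signatures.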
Segregating Event Streams and Noise with a Markov Renewal Process Model
DS and MP are supported by EPSRC Leadership Fellowship EP/G007144/1
Convolutional neural networks: a magic bullet for gravitational-wave detection?
In the last few years, machine learning techniques, in particular convolutional neural networks, have been investigated as a method to replace or complement the traditional matched-filtering techniques used to detect the gravitational-wave signature of merging black holes. To date, however, these methods have not been successfully applied to the analysis of long stretches of data recorded by the Advanced LIGO and Virgo gravitational-wave observatories. In this work, we critically examine the use of convolutional neural networks as a tool to search for merging black holes. We identify the strengths and limitations of this approach, highlight common pitfalls in translating between machine learning and gravitational-wave astronomy, and discuss the interdisciplinary challenges. In particular, we explain in detail why convolutional neural networks alone cannot be used to claim a statistically significant gravitational-wave detection. However, we demonstrate how they can still be used to rapidly flag the times of potential signals in the data for a more detailed follow-up. Our convolutional neural network architecture, as well as the proposed performance metrics, are better suited for this task than a standard binary classification scheme. A detailed evaluation of our approach on Advanced LIGO data demonstrates the potential of such systems as trigger generators. Finally, we sound a note of caution by constructing adversarial examples, which showcase interesting "failure modes" of our model, where inputs with no visible resemblance to real gravitational-wave signals are identified as such by the network with high confidence.
Comment: First two authors contributed equally; appeared at Phys. Rev.
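The trigger-generator role described above can be sketched minimally: threshold the network's score time series and merge nearby threshold crossings into single candidate event times. The clustering rule and all numbers below are illustrative assumptions, not the paper's actual post-processing.

```python
import numpy as np

def find_triggers(scores, times, threshold, cluster_window=0.5):
    """Flag candidate event times where the network score exceeds a
    threshold, merging crossings closer than `cluster_window` seconds
    into one trigger that keeps the highest score."""
    above = np.flatnonzero(scores >= threshold)
    triggers = []
    for i in above:
        t = times[i]
        if triggers and t - triggers[-1][0] < cluster_window:
            if scores[i] > triggers[-1][1]:      # louder crossing wins
                triggers[-1] = (t, scores[i])
        else:
            triggers.append((t, scores[i]))
    return triggers

# Toy score time series at 10 Hz with two well-separated peaks.
times = np.arange(0, 10, 0.1)
scores = np.zeros_like(times)
scores[30:33] = [0.7, 0.95, 0.8]   # peak near t = 3.1 s
scores[70:72] = [0.85, 0.9]        # peak near t = 7.1 s
trigs = find_triggers(scores, times, threshold=0.6)
print([(round(t, 3), s) for t, s in trigs])  # → [(3.1, 0.95), (7.1, 0.9)]
```

Handing only these sparse trigger times to a matched-filter follow-up is what lets the network act as a fast pre-filter without itself making the detection claim.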
Strategies for Searching Video Content with Text Queries or Video Examples
The large number of user-generated videos uploaded to the Internet every day has led to many commercial video search engines, which rely mainly on text metadata for search. However, metadata is often lacking for user-generated videos, leaving them unsearchable by current search engines. Content-based video retrieval (CBVR) tackles this metadata-scarcity problem by directly analyzing the visual and audio streams of each video. CBVR encompasses multiple research topics, including low-level feature design, feature fusion, semantic detector training, and video search/reranking. We present novel strategies in these topics to enhance CBVR in both accuracy and speed under different query inputs, including pure textual queries and query by video examples. Our proposed strategies were incorporated into our submission for the TRECVID 2014 Multimedia Event Detection evaluation, where our system outperformed other submissions on both text queries and video example queries, demonstrating the effectiveness of our proposed approaches.
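Late score fusion, one of the CBVR topics listed above, can be sketched as a weighted combination of per-modality retrieval scores followed by reranking. The normalisation scheme, weights, and scores below are illustrative assumptions, not the submission's actual fusion method.

```python
import numpy as np

def late_fusion(score_lists, weights):
    """Weighted late fusion: min-max normalise each modality's scores,
    then combine them linearly."""
    fused = np.zeros(len(score_lists[0]), dtype=float)
    for s, w in zip(score_lists, weights):
        s = np.asarray(s, dtype=float)
        span = s.max() - s.min()
        norm = (s - s.min()) / span if span > 0 else np.zeros_like(s)
        fused += w * norm
    return fused

# Toy example: 4 candidate videos scored by a visual detector and an
# audio detector (scores and weights are invented for illustration).
visual = [0.2, 0.9, 0.4, 0.1]
audio = [0.5, 0.3, 0.8, 0.1]
fused = late_fusion([visual, audio], weights=[0.6, 0.4])
ranking = np.argsort(-fused)   # best candidate first
print(ranking)                 # → [1 2 0 3]
```

Normalising before fusing matters because detectors trained independently produce scores on incompatible scales; without it the modality with the larger raw range silently dominates the ranking.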
Enhanced visualisation of dance performance from automatically synchronised multimodal recordings
The Huawei/3DLife Grand Challenge Dataset provides multimodal recordings of Salsa dancing, consisting of audiovisual streams along with depth maps and inertial measurements. In this paper, we propose a system for augmented reality-based evaluations of Salsa dancer performances. An essential step for such a system is the automatic temporal synchronisation of the multiple modalities captured from different sensors, for which we propose efficient solutions. Furthermore, we contribute modules for the automatic analysis of dance performances and present an original software application, specifically designed for the evaluation scenario considered, which enables an enhanced dance visualisation experience through the augmentation of the original media with the results of our automatic analyses.
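The temporal-synchronisation step can be sketched with a standard cross-correlation offset estimate between two sensors' audio streams. This is a generic textbook approach under synthetic data, not necessarily the paper's solution, which handles multiple heterogeneous modalities.

```python
import numpy as np

def estimate_offset(ref, other, rate):
    """Estimate the lag (in seconds) of a shared event in `other`
    relative to `ref` from the peak of their full cross-correlation.
    Negative means the event appears earlier in `other`."""
    corr = np.correlate(other, ref, mode="full")
    lag = np.argmax(corr) - (len(ref) - 1)
    return lag / rate

# Toy signals at 1 kHz: the same sharp "clap" appears at 0.5 s in the
# reference stream but at 0.25 s in the second sensor's stream.
rate = 1000
t = np.arange(0, 1.0, 1 / rate)
ref = np.exp(-((t - 0.5) ** 2) / 1e-4)
other = np.exp(-((t - 0.25) ** 2) / 1e-4)
offset = estimate_offset(ref, other, rate)
print(offset)  # → -0.25
```

Once such pairwise offsets are known, every stream can be shifted onto a common timeline before the dance-analysis modules run.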
Automated speech and audio analysis for semantic access to multimedia
The deployment and integration of audio processing tools can enhance the semantic annotation of multimedia content and, as a consequence, improve the effectiveness of conceptual access tools. This paper overviews the various ways in which automatic speech and audio analysis can contribute to increased granularity of automatically extracted metadata. A number of techniques are presented, including the alignment of speech and text resources, large-vocabulary speech recognition, keyword spotting, and speaker classification. The applicability of these techniques is discussed from a media-crossing perspective. The added value of the techniques and their potential contribution to the content value chain are illustrated by the description of two complementary demonstrators for browsing broadcast news archives.
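In its simplest form, keyword spotting over a time-aligned recognition output reduces to looking up the start times of matching words, which is what makes the extracted metadata addressable at sub-document granularity. The transcript below is invented for illustration; real systems work on lattices or phone sequences rather than a flat word list.

```python
# A toy time-aligned ASR output: (word, start_sec, end_sec) triples.
transcript = [
    ("the", 0.0, 0.2), ("election", 0.2, 0.8), ("results", 0.8, 1.3),
    ("were", 1.3, 1.5), ("announced", 1.5, 2.1), ("the", 2.1, 2.3),
    ("election", 2.3, 2.9),
]

def spot_keyword(transcript, keyword):
    """Return the start time of every occurrence of `keyword`."""
    return [start for word, start, end in transcript
            if word.lower() == keyword.lower()]

print(spot_keyword(transcript, "election"))  # → [0.2, 2.3]
```

The returned timestamps are exactly the kind of fine-grained entry points into a broadcast archive that the abstract's browsing demonstrators rely on.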