Audio Event Detection using Weakly Labeled Data
Acoustic event detection is essential for content analysis and description of
multimedia recordings. The majority of current literature on the topic learns
the detectors through fully-supervised techniques employing strongly labeled
data. However, the labels available for the majority of multimedia data are
generally weak and do not provide sufficient detail for such methods to be
employed. In this paper we propose a framework for learning acoustic event
detectors using only weakly labeled data. We first show that audio event
detection using weak labels can be formulated as a Multiple Instance Learning
problem. We then suggest two frameworks for solving multiple-instance learning,
one based on support vector machines, and the other on neural networks. The
proposed methods can help in removing the time consuming and expensive process
of manually annotating data to facilitate fully supervised learning. Moreover,
they can not only detect events in a recording but can also provide temporal
locations of events in the recording. This helps in obtaining a complete
description of the recording and is notable because weakly labeled data contain
no temporal information to begin with. Comment: ACM Multimedia 201
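The multiple-instance view described in this abstract can be illustrated with a minimal sketch. It assumes each recording is a "bag" of fixed-length segments, that a bag is positive if at least one segment contains the event, and that a simple linear scorer stands in for the SVM or neural network mentioned in the abstract. All function names, weights, and feature values below are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence

Bag = List[Sequence[float]]  # one recording = list of per-segment feature vectors


def instance_score(features: Sequence[float], weights: Sequence[float], bias: float) -> float:
    """Linear score for one segment (hypothetical stand-in for an SVM or network)."""
    return sum(f * w for f, w in zip(features, weights)) + bias


def bag_score(bag: Bag, weights: Sequence[float], bias: float) -> float:
    """Recording-level score: the maximum over segment scores (standard MIL assumption)."""
    return max(instance_score(seg, weights, bias) for seg in bag)


def localize(bag: Bag, weights: Sequence[float], bias: float, threshold: float = 0.0) -> List[int]:
    """Indices of segments whose score exceeds the threshold, i.e. rough temporal locations."""
    return [i for i, seg in enumerate(bag) if instance_score(seg, weights, bias) > threshold]


# Toy data (assumed values): three segments with two features each.
weights = [1.0, -1.0]
bias = -0.5
recording = [[0.1, 0.2], [0.9, 0.1], [0.2, 0.3]]

print(bag_score(recording, weights, bias) > 0.0)  # is the event present in the recording?
print(localize(recording, weights, bias))         # which segments carry it?
```

Because only the bag-level label is needed for training under this assumption, segment-level scores come for free at inference time, which is how temporal locations can be recovered without ever being annotated.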
Robotic swarm control from spatio-temporal specifications
In this paper, we study the problem of controlling a two-dimensional robotic swarm with the purpose of achieving high-level and complex spatio-temporal patterns. We use a rich spatio-temporal logic that is capable of describing a wide range of time-varying and complex spatial configurations, and develop a method to encode such formal specifications as a set of mixed-integer linear constraints, which are incorporated into a mixed-integer linear programming problem. We plan trajectories for each individual robot such that the whole swarm satisfies the spatio-temporal requirements, while optimizing total robot movement and/or a metric that shows how strongly the swarm trajectory resembles given spatio-temporal behaviors. An illustrative case study is included. This work was partially supported by the National Science Foundation under grants NRI-1426907 and CMMI-1400167.
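As a rough illustration of how a spatial requirement can become mixed-integer linear constraints, the block below uses the standard big-M technique. It is a sketch under stated assumptions, not the encoding used in the paper: the rectangular region, the binary indicator $z_i$, the constant $M$, and the count $n$ are all introduced here for illustration.

```latex
% Assumed setup: robot i has position (x_i, y_i); the target region is the box
% [x_min, x_max] x [y_min, y_max]; M is a constant larger than the workspace size.
\begin{align}
  x_i &\ge x_{\min} - M(1 - z_i), & x_i &\le x_{\max} + M(1 - z_i), \\
  y_i &\ge y_{\min} - M(1 - z_i), & y_i &\le y_{\max} + M(1 - z_i), \\
  z_i &\in \{0, 1\}, & \sum_{i} z_i &\ge n .
\end{align}
```

When $z_i = 1$ the four inequalities force robot $i$ into the box; when $z_i = 0$ they are relaxed. The final constraint then requires that at least $n$ robots occupy the region.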
Unsupervised Learning from Narrated Instruction Videos
We address the problem of automatically learning the main steps to complete a
certain task, such as changing a car tire, from a set of narrated instruction
videos. The contributions of this paper are three-fold. First, we develop a new
unsupervised learning approach that takes advantage of the complementary nature
of the input video and the associated narration. The method solves two
clustering problems, one in text and one in video, applied one after the other
and linked by joint constraints to obtain a single coherent sequence of steps
in both modalities. Second, we collect and annotate a new challenging dataset
of real-world instruction videos from the Internet. The dataset contains about
800,000 frames for five different tasks that include complex interactions
between people and objects, and are captured in a variety of indoor and outdoor
settings. Third, we experimentally demonstrate that the proposed method can
automatically discover, in an unsupervised manner, the main steps to achieve
the task and locate the steps in the input videos. Comment: Appears in: 2016 IEEE Conference on Computer Vision and Pattern
Recognition (CVPR 2016). 21 page
Principal Patterns on Graphs: Discovering Coherent Structures in Datasets
Graphs are now ubiquitous in almost every field of research. Recently, new
research areas devoted to the analysis of graphs and data associated with their
vertices have emerged. Focusing on dynamical processes, we propose a fast,
robust and scalable framework for retrieving and analyzing recurring patterns
of activity on graphs. Our method relies on a novel type of multilayer graph
that encodes the spreading or propagation of events between successive time
steps. We demonstrate the versatility of our method by applying it to three
different real-world examples. Firstly, we study how a rumor spreads on a social
network. Secondly, we reveal congestion patterns of pedestrians in a train
station. Finally, we show how patterns of audio playlists can be used in a
recommender system. In each example, relevant information previously hidden in
the data is extracted in a very efficient manner, emphasizing the scalability
of our method. With a parallel implementation scaling linearly with the size of
the dataset, our framework easily handles millions of nodes on a single
commodity server.
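One way to picture a multilayer graph that encodes propagation between successive time steps is sketched below. The construction is an assumption made for illustration and may differ from the one in the paper: each node is a (vertex, time) pair, an edge links (u, t) to (v, t+1) when u is active at t, v is active at t+1, and v is either u itself or a neighbor of u. Connected components of this layered graph are then treated as candidate activity patterns. All names and the toy data are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Node = Tuple[str, int]  # (vertex, time step)


def build_layered_edges(adjacency: Dict[str, Set[str]],
                        active: List[Set[str]]) -> List[Tuple[Node, Node]]:
    """Link active vertices at step t to active neighbors (or themselves) at step t + 1."""
    edges = []
    for t in range(len(active) - 1):
        for u in sorted(active[t]):
            candidates = adjacency.get(u, set()) | {u}
            for v in sorted(candidates & active[t + 1]):
                edges.append(((u, t), (v, t + 1)))
    return edges


def components(edges: List[Tuple[Node, Node]]) -> List[Set[Node]]:
    """Weakly connected components via union-find; each one is a candidate pattern."""
    parent: Dict[Node, Node] = {}

    def find(x: Node) -> Node:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    groups: Dict[Node, Set[Node]] = defaultdict(set)
    for x in list(parent):
        groups[find(x)].add(x)
    return list(groups.values())


# Toy example (assumed data): a path graph a - b - c plus an isolated vertex d.
adjacency = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}, "d": set()}
active = [{"a"}, {"b"}, {"c", "d"}]  # active vertices at t = 0, 1, 2

edges = build_layered_edges(adjacency, active)
patterns = components(edges)
print(edges)
print(patterns)
```

In this toy case the activity travelling from a to b to c forms a single pattern, while the isolated activation of d at the last step has no incoming link and is left out. Each time step only inspects the neighbors of currently active vertices, which is consistent with the linear scaling the abstract reports, although the sketch itself makes no performance claim.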