LoANs: Weakly Supervised Object Detection with Localizer Assessor Networks
Recently, deep neural networks have achieved remarkable performance on the
task of object detection and recognition. The reason for this success is mainly
grounded in the availability of large scale, fully annotated datasets, but the
creation of such a dataset is a complicated and costly task. In this paper, we
propose a novel method for weakly supervised object detection that simplifies
the process of gathering data for training an object detector. We train an
ensemble of two models that work together in a student-teacher fashion. Our
student (localizer) is a model that learns to localize an object, the teacher
(assessor) assesses the quality of the localization and provides feedback to
the student. The student uses this feedback to learn how to localize objects
and is thus entirely supervised by the teacher, as we are using no labels for
training the localizer. In our experiments, we show that our model is very
robust to noise and reaches competitive performance compared to a
state-of-the-art fully supervised approach. We also show the simplicity of
creating a new dataset, based on a few videos (e.g. downloaded from YouTube)
and artificially generated data.
Comment: To appear in AMV18. Code, datasets and models available at
https://github.com/Bartzi/loan
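The student-teacher loop above can be sketched in miniature. This is a toy illustration only, with hypothetical names and a 1-D "image": in the paper both localizer and assessor are neural networks, whereas here the assessor is a fixed scoring function and the student is simple hill climbing driven solely by the assessor's feedback, with no ground-truth labels.

```python
import math
import random

def assessor(signal, start, width):
    """Toy stand-in for the learned assessor: scores how well the
    proposed window covers the object's 'mass' (higher is better)."""
    return sum(signal[start:start + width])

def train_localizer(signal, width, steps=300, seed=0):
    """The 'student': improves its window position using only the
    assessor's score as supervision -- no localization labels."""
    rng = random.Random(seed)
    pos = rng.randrange(len(signal) - width)
    best = assessor(signal, pos, width)
    for _ in range(steps):
        cand = min(max(pos + rng.choice([-2, -1, 1, 2]), 0),
                   len(signal) - width)
        score = assessor(signal, cand, width)
        if score > best:      # accept only moves the teacher rates higher
            pos, best = cand, score
    return pos

# A 1-D "image" with a smooth object centered at index 45.
signal = [math.exp(-((i - 45) ** 2) / 200.0) for i in range(100)]
print(train_localizer(signal, width=10))  # → 40 or 41 (window centered on the object)
```

Because the bump is unimodal, the score improves monotonically toward the object, so the student converges to a window covering it; the width-10 optimum starts at index 40 or 41.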
Temporal Localization of Fine-Grained Actions in Videos by Domain Transfer from Web Images
We address the problem of fine-grained action localization from temporally
untrimmed web videos. We assume that only weak video-level annotations are
available for training. The goal is to use these weak labels to identify
temporal segments corresponding to the actions, and learn models that
generalize to unconstrained web videos. We find that web images queried by
action names serve as well-localized highlights for many actions, but are
noisily labeled. To solve this problem, we propose a simple yet effective
method that takes weak video labels and noisy image labels as input, and
generates localized action frames as output. This is achieved by cross-domain
transfer between video frames and web images, using pre-trained deep
convolutional neural networks. We then use the localized action frames to train
action recognition models with long short-term memory networks. We collect a
fine-grained sports action data set FGA-240 of more than 130,000 YouTube
videos. It has 240 fine-grained actions under 85 sports activities. Convincing
results are shown on the FGA-240 data set, as well as the THUMOS 2014
localization data set with untrimmed training videos.
Comment: Camera ready version for ACM Multimedia 201
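The cross-domain transfer step can be sketched as follows. All names and the tiny 2-D "features" are hypothetical; in the paper the features come from pre-trained deep convolutional networks shared across the two domains. The idea is the same: rank the frames of an untrimmed video by similarity to the centroid of web images retrieved for the action name, and keep the best-matching frames as localized action frames.

```python
import math

def cosine(a, b):
    """Cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(x * x for x in b)))

def centroid(vectors):
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def localize_action_frames(frame_feats, web_feats, top_k=2):
    """Keep the top-k video frames closest to the web-image centroid;
    these serve as (noisily) localized action frames for training."""
    c = centroid(web_feats)
    ranked = sorted(range(len(frame_feats)),
                    key=lambda i: cosine(frame_feats[i], c),
                    reverse=True)
    return sorted(ranked[:top_k])

web = [[0.9, 0.1], [1.0, 0.0], [0.8, 0.2]]            # action-highlight images
frames = [[0.1, 0.9], [0.95, 0.05], [0.2, 0.8], [0.85, 0.15]]
print(localize_action_frames(frames, web))  # → [1, 3]
```

Frames 1 and 3 resemble the web-image centroid, so they are selected as action frames while the background-like frames 0 and 2 are discarded.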
Neural NILM: Deep Neural Networks Applied to Energy Disaggregation
Energy disaggregation estimates appliance-by-appliance electricity
consumption from a single meter that measures the whole home's electricity
demand. Recently, deep neural networks have driven remarkable improvements in
classification performance in neighbouring machine learning fields such as
image classification and automatic speech recognition. In this paper, we adapt
three deep neural network architectures to energy disaggregation: 1) a form of
recurrent neural network called `long short-term memory' (LSTM); 2) denoising
autoencoders; and 3) a network which regresses the start time, end time and
average power demand of each appliance activation. We use seven metrics to test
the performance of these algorithms on real aggregate power data from five
appliances. Tests are performed against a house not seen during training and
against houses seen during training. We find that all three neural nets achieve
better F1 scores (averaged over all five appliances) than either combinatorial
optimisation or factorial hidden Markov models and that our neural net
algorithms generalise well to an unseen house.
Comment: To appear in ACM BuildSys'15, November 4--5, 2015, Seoul
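The combinatorial optimisation baseline mentioned above is simple enough to sketch (the neural architectures themselves are not shown here, and the appliance ratings below are hypothetical). The baseline searches over on/off states of known appliance power ratings and picks the combination whose total is closest to the aggregate meter reading.

```python
from itertools import product

def disaggregate_co(aggregate, ratings):
    """Combinatorial-optimisation NILM baseline: exhaustively try every
    on/off assignment and return the one minimising the absolute error
    between its total power and the aggregate reading."""
    best_states, best_err = None, float("inf")
    for states in product([0, 1], repeat=len(ratings)):
        total = sum(s * r for s, r in zip(states, ratings))
        err = abs(aggregate - total)
        if err < best_err:
            best_states, best_err = states, err
    return best_states

# Hypothetical nominal power ratings in watts.
ratings = {"kettle": 2000, "fridge": 100, "microwave": 1200}
states = disaggregate_co(2110, list(ratings.values()))
print(dict(zip(ratings, states)))  # → {'kettle': 1, 'fridge': 1, 'microwave': 0}
```

The exhaustive search is exponential in the number of appliances, which is one reason learned models such as the LSTMs and denoising autoencoders in the paper are attractive for realistic appliance counts.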
Neural Networks for Information Retrieval
Machine learning plays a role in many aspects of modern IR systems, and deep
learning is applied in all of them. The fast pace of modern-day research has
given rise to many different approaches for many different IR problems. The
amount of information available can be overwhelming both for junior students
and for experienced researchers looking for new research topics and directions.
Additionally, it is interesting to see what key insights into IR problems the
new technologies are able to give us. The aim of this full-day tutorial is to
give a clear overview of current tried-and-trusted neural methods in IR and how
they benefit IR research. It covers key architectures, as well as the most
promising future directions.
Comment: Overview of full-day tutorial at SIGIR 201
DeepWalk: Online Learning of Social Representations
We present DeepWalk, a novel approach for learning latent representations of
vertices in a network. These latent representations encode social relations in
a continuous vector space, which is easily exploited by statistical models.
DeepWalk generalizes recent advancements in language modeling and unsupervised
feature learning (or deep learning) from sequences of words to graphs. DeepWalk
uses local information obtained from truncated random walks to learn latent
representations by treating walks as the equivalent of sentences. We
demonstrate DeepWalk's latent representations on several multi-label network
classification tasks for social networks such as BlogCatalog, Flickr, and
YouTube. Our results show that DeepWalk outperforms challenging baselines which
are allowed a global view of the network, especially in the presence of missing
information. DeepWalk's representations can provide scores up to 10%
higher than competing methods when labeled data is sparse. In some experiments,
DeepWalk's representations are able to outperform all baseline methods while
using 60% less training data. DeepWalk is also scalable. It is an online
learning algorithm which builds useful incremental results, and is trivially
parallelizable. These qualities make it suitable for a broad class of real
world applications such as network classification and anomaly detection.
Comment: 10 pages, 5 figures, 4 tables
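The corpus-building step of DeepWalk can be sketched directly: each truncated random walk over the graph is treated as a "sentence" of vertex ids, and the resulting corpus is what a word2vec-style skip-gram model is trained on (the skip-gram training itself is omitted here, and the toy graph is hypothetical).

```python
import random

def random_walk(graph, start, length, rng):
    """One truncated random walk: repeatedly hop to a uniformly chosen
    neighbour, producing a 'sentence' of vertex ids."""
    walk = [start]
    for _ in range(length - 1):
        walk.append(rng.choice(graph[walk[-1]]))
    return walk

def build_corpus(graph, walks_per_node=2, walk_length=5, seed=0):
    """Collect several walks per vertex; DeepWalk feeds this corpus to a
    skip-gram language model to learn latent vertex representations."""
    rng = random.Random(seed)
    corpus = []
    for _ in range(walks_per_node):
        for node in graph:
            corpus.append(random_walk(graph, node, walk_length, rng))
    return corpus

# Adjacency lists for a small toy graph.
graph = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
corpus = build_corpus(graph)
print(len(corpus), corpus[0])
```

Because each walk only needs local neighbourhood information, walks can be generated independently and in parallel, which is the source of the scalability and easy parallelism claimed above.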
Evaluating Two-Stream CNN for Video Classification
Videos contain very rich semantic information. Traditional hand-crafted
features are known to be inadequate in analyzing complex video semantics.
Inspired by the huge success of deep learning methods in analyzing image,
audio and text data, significant efforts have recently been devoted to the
design of deep nets for video analytics. Among the many practical needs,
classifying videos (or video clips) based on their major semantic categories
(e.g., "skiing") is useful in many applications. In this paper, we conduct an
in-depth study to investigate important implementation options that may affect
the performance of deep nets on video classification. Our evaluations are
conducted on top of a recent two-stream convolutional neural network (CNN)
pipeline, which uses both static frames and motion optical flows, and has
demonstrated competitive performance against the state-of-the-art methods. In
order to gain insights and to arrive at a practical guideline, many important
options are studied, including network architectures, model fusion, learning
parameters and the final prediction methods. Based on the evaluations, very
competitive results are attained on two popular video classification
benchmarks. We hope that the discussions and conclusions from this work can
help researchers in related fields to quickly set up a good basis for further
investigations along this very promising direction.
Comment: ACM ICMR'1
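One of the "final prediction" options studied for two-stream pipelines is late fusion: averaging the per-class scores of the static-frame (spatial) stream and the optical-flow (temporal) stream. A minimal sketch, with hypothetical class names and scores:

```python
def late_fusion(spatial_probs, temporal_probs, w_spatial=0.5):
    """Weighted average of per-class scores from the two streams;
    w_spatial controls the spatial/temporal trade-off."""
    return [w_spatial * s + (1 - w_spatial) * t
            for s, t in zip(spatial_probs, temporal_probs)]

def predict(spatial_probs, temporal_probs, classes, w_spatial=0.5):
    """Fuse the two streams, then return the highest-scoring class."""
    fused = late_fusion(spatial_probs, temporal_probs, w_spatial)
    return classes[max(range(len(fused)), key=fused.__getitem__)]

classes = ["skiing", "surfing", "cycling"]
print(predict([0.5, 0.3, 0.2], [0.2, 0.7, 0.1], classes))  # → surfing
```

Here the motion stream is confident enough about "surfing" to override the appearance stream's preference for "skiing"; tuning the fusion weight is exactly the kind of implementation option the evaluation investigates.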
ZipNet-GAN: Inferring Fine-grained Mobile Traffic Patterns via a Generative Adversarial Neural Network
Large-scale mobile traffic analytics is becoming essential to digital
infrastructure provisioning, public transportation, events planning, and other
domains. Monitoring city-wide mobile traffic is however a complex and costly
process that relies on dedicated probes. Some of these probes have limited
precision or coverage, others gather tens of gigabytes of logs daily, which
independently offer limited insights. Extracting fine-grained patterns involves
expensive spatial aggregation of measurements, storage, and post-processing. In
this paper, we propose a mobile traffic super-resolution technique that
overcomes these problems by inferring narrowly localised traffic consumption
from coarse measurements. We draw inspiration from image processing and design
a deep-learning architecture tailored to mobile networking, which combines
Zipper Network (ZipNet) and Generative Adversarial neural Network (GAN) models.
This enables us to uniquely capture spatio-temporal relations between traffic
volume snapshots routinely monitored over broad coverage areas
(`low-resolution') and the corresponding consumption at 0.05 km² level
(`high-resolution') usually obtained after intensive computation. Experiments
we conduct with a real-world data set demonstrate that the proposed
ZipNet(-GAN) infers traffic consumption with remarkable accuracy and up to
100× higher granularity as compared to standard probing, while
outperforming existing data interpolation techniques. To our knowledge, this is
the first time super-resolution concepts are applied to large-scale mobile
traffic analysis and our solution is the first to infer fine-grained urban
traffic patterns from coarse aggregates.
Comment: To appear ACM CoNEXT 201
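To make the super-resolution framing concrete, here is the trivial interpolation baseline that any learned model must beat: spreading each coarse cell's traffic volume uniformly across its fine-grained sub-cells. This is a naive sketch with a hypothetical 2×2 grid, not the ZipNet-GAN model, which instead learns the spatial structure of consumption from data.

```python
def uniform_upsample(coarse, factor):
    """Naive traffic super-resolution baseline: each coarse cell's volume
    is split evenly over its factor x factor fine sub-cells, preserving
    the total but inventing no spatial detail."""
    fine = [[0.0] * (len(coarse[0]) * factor)
            for _ in range(len(coarse) * factor)]
    for i, row in enumerate(coarse):
        for j, v in enumerate(row):
            share = v / (factor * factor)
            for di in range(factor):
                for dj in range(factor):
                    fine[i * factor + di][j * factor + dj] = share
    return fine

# Coarse 2x2 traffic-volume snapshot (arbitrary units).
coarse = [[8.0, 4.0], [2.0, 6.0]]
fine = uniform_upsample(coarse, 2)
print(fine[0])  # → [2.0, 2.0, 1.0, 1.0]
```

The baseline conserves total volume but produces blocky, uninformative maps; the gap between this and the true fine-grained pattern is precisely what the GAN-based model is trained to close.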
Geometric deep learning
The goal of these course notes is to describe the main mathematical ideas behind geometric deep learning and to provide implementation details for several applications in shape analysis and synthesis, computer vision and computer graphics. The text in the course materials is primarily based on previously published work. With these notes we gather and provide a clear picture of the key concepts and techniques that fall under the umbrella of geometric deep learning, and illustrate the applications they enable. We also aim to provide practical implementation details for the methods presented in these works, as well as to suggest further readings and extensions of these ideas.