QARC: Video Quality Aware Rate Control for Real-Time Video Streaming via Deep Reinforcement Learning
Due to the fluctuation of throughput under various network conditions,
adaptively choosing a proper bitrate for real-time video streaming has become
an increasingly important problem. Recent work focuses on providing high video
bitrates rather than high video quality. Nevertheless, we notice that there
exists a trade-off between sending bitrate and video quality, which motivates
us to focus on how to strike a balance between them. In this paper, we propose
QARC (video Quality Aware Rate Control), a rate control algorithm that aims to
achieve higher perceptual video quality with a possibly lower sending rate and
transmission latency. Starting from scratch, QARC uses a deep reinforcement
learning (DRL) algorithm to train a neural network to select future bitrates
based on previously observed network status and past video frames, and we
design a neural network that predicts future perceptual video quality as a
vector, taking the place of raw pictures in the DRL inputs. We evaluate QARC
over a trace-driven emulation. As expected, QARC outperforms existing approaches.
Comment: Accepted by ACM Multimedia 201
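The bitrate/quality trade-off the abstract describes can be framed as maximizing predicted perceptual quality minus a cost on the sending rate. The sketch below is illustrative only, not QARC's actual DRL model: the quality scores and penalty weight are made-up placeholders.

```python
# Illustrative sketch (not QARC's actual model): pick the sending bitrate
# that maximizes predicted perceptual quality minus a rate cost.

def select_bitrate(predicted_quality, rate_penalty=0.1):
    """Pick the candidate bitrate (Mbps) maximizing quality minus a rate cost."""
    return max(predicted_quality,
               key=lambda b: predicted_quality[b] - rate_penalty * b)

# Hypothetical quality scores showing diminishing returns at high bitrates.
quality = {0.5: 0.70, 1.0: 0.85, 2.0: 0.90, 4.0: 0.92}
print(select_bitrate(quality))  # -> 1.0 (a mid bitrate wins the trade-off)
```

With such a penalty, the highest bitrate is not automatically chosen, which is exactly the balance the paper targets.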
ThumbNet: One Thumbnail Image Contains All You Need for Recognition
Although deep convolutional neural networks (CNNs) have achieved great
success in computer vision tasks, their real-world application is still
impeded by their voracious demand for computational resources. Current works
mostly seek to compress the network by reducing its parameters or
parameter-incurred computation, neglecting the influence of the input image on
the system complexity. Based on the fact that input images of a CNN contain
substantial redundancy, in this paper we propose a unified framework, dubbed
ThumbNet, to simultaneously accelerate and compress CNN models by enabling
them to infer on a single thumbnail image. We provide three effective
strategies to train ThumbNet. In doing so, ThumbNet learns an inference
network that performs as well on small images as the original-input network
does on large images. With ThumbNet, we obtain not only a thumbnail-input
inference network that drastically reduces computation and memory
requirements, but also an image downscaler that generates thumbnail images for
generic classification tasks. Extensive experiments show the effectiveness of
ThumbNet, and demonstrate that the thumbnail-input inference network learned
by ThumbNet can adequately retain the accuracy of the original-input network
even when the input images are downscaled 16 times.
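The source of the speedup is simple arithmetic: the multiply-accumulate count of a convolutional layer scales with the spatial size of its input, so a thumbnail downscaled 4x per side (16x fewer pixels) cuts that cost roughly 16x. A back-of-envelope sketch, with illustrative layer shapes rather than ThumbNet's actual architecture:

```python
# Back-of-envelope: per-layer multiply-accumulates (MACs) of a convolution
# scale linearly with the number of input pixels. Shapes are illustrative.

def conv_macs(h, w, c_in, c_out, k=3):
    """MACs of a stride-1, 'same'-padded k x k convolution."""
    return h * w * c_in * c_out * k * k

full = conv_macs(224, 224, 3, 64)   # full-resolution input
thumb = conv_macs(56, 56, 3, 64)    # 224 / 4 = 56 per side
print(full // thumb)  # -> 16
```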
Modeling Human Visual Search Performance on Realistic Webpages Using Analytical and Deep Learning Methods
Modeling visual search not only offers an opportunity to predict the
usability of an interface before actually testing it on real users, but also
advances scientific understanding about human behavior. In this work, we first
conduct a set of analyses on a large-scale dataset of visual search tasks on
realistic webpages. We then present a deep neural network that learns to
predict the scannability of webpage content, i.e., how easy it is for a user to
find a specific target. Our model leverages both heuristic-based features such
as target size and unstructured features such as raw image pixels. This
approach allows us to model complex interactions that might be involved in a
realistic visual search task, which cannot be easily achieved by traditional
analytical models. We analyze the model behavior to offer our insights into how
the salience map learned by the model aligns with human intuition and how the
learned semantic representation of each target type relates to its visual
search performance.
Comment: The 2020 CHI Conference on Human Factors in Computing Systems
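The hybrid input the abstract describes, combining heuristic features such as target size with unstructured raw-pixel features, amounts to concatenating the two before feeding a predictor. A minimal sketch, with made-up shapes and values (the paper's actual model is a deep network, not shown here):

```python
# Sketch of a hybrid feature vector: flattened raw pixels plus
# heuristic features (e.g., target width/height). Values are illustrative.

def build_input(heuristic_features, pixels):
    """Flatten the pixel grid and append the heuristic features."""
    flat = [p for row in pixels for p in row]
    return flat + heuristic_features

pixels = [[0.1, 0.2], [0.3, 0.4]]   # tiny 2x2 stand-in for a webpage crop
heuristics = [32.0, 24.0]           # e.g., target width and height in px
x = build_input(heuristics, pixels)
print(len(x))  # -> 6
```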
Adversarial Infidelity Learning for Model Interpretation
Model interpretation is essential in data mining and knowledge discovery. It
can help understand the intrinsic model working mechanism and check if the
model has undesired characteristics. A popular way of performing model
interpretation is Instance-wise Feature Selection (IFS), which provides an
importance score of each feature representing the data samples to explain how
the model generates the specific output. In this paper, we propose a
Model-agnostic Effective Efficient Direct (MEED) IFS framework for model
interpretation, mitigating concerns about sanity, combinatorial shortcuts,
model identifiability, and information transmission. Also, we focus on the
following setting: using selected features to directly predict the output of
the given model, which serves as a primary evaluation metric for
model-interpretation methods. In addition to the features, we take the output
of the given model as an additional input, so that the explainer is learned
from more accurate information. To learn the explainer, besides fidelity, we propose an
Adversarial Infidelity Learning (AIL) mechanism to boost the explanation
learning by screening relatively unimportant features. Through theoretical and
experimental analysis, we show that our AIL mechanism can help learn the
desired conditional distribution between selected features and targets.
Moreover, we extend our framework by integrating efficient interpretation
methods as proper priors to provide a warm start. Comprehensive empirical
evaluation results are provided by quantitative metrics and human evaluation to
demonstrate the effectiveness and superiority of our proposed method. Our code
is publicly available online at https://github.com/langlrsw/MEED.
Comment: 26th ACM SIGKDD Conference on Knowledge Discovery and Data Mining
(KDD '20), August 23--27, 2020, Virtual Event, US
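The evaluation setting the abstract names, using only the selected features to predict the given model's output, can be sketched with a toy example. The linear "black box" and the importance scores below are placeholders, not the MEED/AIL method itself:

```python
# Minimal sketch of instance-wise feature selection (IFS) evaluation:
# select top-k features, mask the rest, and re-query the model. Fidelity
# is how well the masked prediction matches the original output.

def model(x):
    # Toy black box: output depends only on the first two features.
    return 3.0 * x[0] - 2.0 * x[1]

def explain_top_k(x, k):
    # Toy importance scores: magnitude of each feature's contribution.
    weights = [3.0, -2.0, 0.0, 0.0]
    scores = [abs(w * v) for w, v in zip(weights, x)]
    order = sorted(range(len(x)), key=lambda i: scores[i], reverse=True)
    return set(order[:k])

def masked_prediction(x, selected):
    # Zero out unselected features and re-query the model.
    masked = [v if i in selected else 0.0 for i, v in enumerate(x)]
    return model(masked)

x = [1.0, 2.0, 5.0, -4.0]
selected = explain_top_k(x, k=2)
print(masked_prediction(x, selected) == model(x))  # -> True
```

Here the explainer happens to recover exactly the features the model uses, so the masked prediction is perfectly faithful; AIL's adversarial screening targets the realistic case where the explainer must learn this.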
Audiovisual Transformer Architectures for Large-Scale Classification and Synchronization of Weakly Labeled Audio Events
We tackle the task of environmental event classification by drawing
inspiration from the transformer neural network architecture used in machine
translation. We modify this attention-based feedforward structure so that the
resulting model can use audio as well as video to compute sound
event predictions. We perform extensive experiments with these adapted
transformers on an audiovisual data set, obtained by appending relevant visual
information to an existing large-scale weakly labeled audio collection. The
employed multi-label data contains clip-level annotation indicating the
presence or absence of 17 classes of environmental sounds, and does not include
temporal information. We show that the proposed modified transformers strongly
improve upon previously introduced models and in fact achieve state-of-the-art
results. We also make a compelling case for devoting more attention to research
in multimodal audiovisual classification by proving the usefulness of visual
information for the task at hand, namely audio event recognition. In addition,
we visualize internal attention patterns of the audiovisual transformers and in
doing so demonstrate their potential for performing multimodal synchronization.
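The kind of attention that lets one modality consult another can be sketched as a single audio query attending over video key/value vectors. This is the generic scaled dot-product mechanism only; the dimensions and values are illustrative, not the paper's actual architecture:

```python
import math

# Toy cross-modal attention: an audio token attends over video features.

def attention(query, keys, values):
    """Single-query scaled dot-product attention over key/value vectors."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
              for key in keys]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    weights = [e / z for e in exps]
    return [sum(w * v[i] for w, v in zip(weights, values))
            for i in range(len(values[0]))]

audio_query = [1.0, 0.0]
video_keys = [[1.0, 0.0], [0.0, 1.0]]   # first video frame matches the query
video_vals = [[10.0, 0.0], [0.0, 10.0]]
out = attention(audio_query, video_keys, video_vals)
print(out[0] > out[1])  # -> True: output leans toward the matching frame
```

Visualizing the `weights` vector per audio token is, in spirit, what the paper's attention-pattern analysis does to expose audiovisual synchronization.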
A Substellar Companion to Pleiades HII 3441
We find a new substellar companion to the Pleiades member star, Pleiades HII
3441, using the Subaru telescope with adaptive optics. The discovery is made as
part of the high-contrast imaging survey to search for planetary-mass and
substellar companions in the Pleiades and young moving groups. The companion
has a projected separation of 0".49 +/- 0".02 (66 +/- 2 AU) and a mass of 68
+/- 5 M_J based on three observations in the J-, H-, and K_S-band. The spectral
type is estimated to be M7 (~2700 K), and thus no methane absorption is
detected in the H band. Our Pleiades observations result in the detection of
two substellar companions including one previously reported among 20 observed
Pleiades stars, and indicate that the fraction of substellar companions in the
Pleiades is about 10.0 +26.1/-8.8 %. This is consistent with multiplicity
studies of both the Pleiades stars and other open clusters.
Comment: Main text (14 pages, 4 figures, 4 tables), and Supplementary data (8
pages, 3 tables). Accepted for Publications of the Astronomical Society of Japan
QUOTIENT: Two-Party Secure Neural Network Training and Prediction
Recently, there has been a wealth of effort devoted to the design of secure
protocols for machine learning tasks. Much of this is aimed at enabling secure
prediction from highly-accurate Deep Neural Networks (DNNs). However, as DNNs
are trained on data, a key question is how such models can also be trained
securely. The few prior works on secure DNN training have focused either on
designing custom protocols for existing training algorithms, or on developing
tailored training algorithms and then applying generic secure protocols. In
this work, we investigate the advantages of designing training algorithms
alongside a novel secure protocol, incorporating optimizations on both fronts.
We present QUOTIENT, a new method for discretized training of DNNs, along with
a customized secure two-party protocol for it. QUOTIENT incorporates key
components of state-of-the-art DNN training such as layer normalization and
adaptive gradient methods, and improves upon the state-of-the-art in DNN
training in two-party computation. Compared to prior work, we obtain an
improvement of 50X in WAN time and 6% in absolute accuracy.
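The primitive underlying two-party secure computation of this kind is additive secret sharing: each value is split into two random-looking shares, one per party, and linear operations can be done locally on shares. The sketch below shows that generic primitive only, not QUOTIENT's full discretized-training protocol:

```python
import random

# Additive secret sharing over a ring: neither share alone reveals x,
# but the two shares sum to x mod Q.

Q = 2**32  # ring modulus

def share(x):
    """Split x into two random shares that sum to x mod Q."""
    s0 = random.randrange(Q)
    return s0, (x - s0) % Q

def reconstruct(s0, s1):
    return (s0 + s1) % Q

# Each party adds its own shares locally; the sums reconstruct to x + y.
x0, x1 = share(7)
y0, y1 = share(35)
print(reconstruct((x0 + y0) % Q, (x1 + y1) % Q))  # -> 42
```

Multiplications on shares need extra machinery (e.g., precomputed correlated randomness), which is where protocol/algorithm co-design such as QUOTIENT's pays off.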
Declarative Experimentation in Information Retrieval Using PyTerrier
The advent of deep machine learning platforms such as TensorFlow and PyTorch, developed in expressive high-level languages such as Python, has allowed more expressive representations of deep neural network architectures. We argue that such a powerful formalism is missing in information retrieval (IR), and propose a framework called PyTerrier that allows advanced retrieval pipelines to be expressed, and evaluated, in a declarative manner close to their conceptual design. Like the aforementioned frameworks that compile deep learning experiments into primitive GPU operations, our framework targets IR platforms as backends in order to execute and evaluate retrieval pipelines. Further, we can automatically optimise the retrieval pipelines to increase their efficiency to suit a particular IR platform backend. Our experiments, conducted on the TREC Robust and ClueWeb09 test collections, demonstrate the efficiency benefits of these optimisations for retrieval pipelines involving both the Anserini and Terrier IR platforms.
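The declarative style the abstract describes composes retrieval stages into pipelines with an operator, which PyTerrier does via `>>`. The classes below are illustrative stand-ins built from scratch to show how such operator composition works in Python; they are not PyTerrier's actual API:

```python
# Toy illustration of operator-composed retrieval pipelines. A transformer
# wraps a function over results; >> chains two transformers into one.

class Transformer:
    def __init__(self, fn, name):
        self.fn, self.name = fn, name

    def __rshift__(self, other):
        # Compose: feed this stage's output into the next stage.
        return Transformer(lambda x: other.fn(self.fn(x)),
                           f"{self.name} >> {other.name}")

    def __call__(self, x):
        return self.fn(x)

# Hypothetical stages: a retriever producing (query, docid, score) tuples,
# and a re-ranker sorting them by descending score.
retrieve = Transformer(lambda q: [(q, "d1", 1.2), (q, "d2", 0.7)], "BM25")
rerank = Transformer(lambda res: sorted(res, key=lambda r: -r[2]), "rerank")

pipeline = retrieve >> rerank
print(pipeline.name)            # BM25 >> rerank
print(pipeline("query")[0][1])  # -> d1
```

The point of the formalism is that a pipeline built this way is a single inspectable object, which is what lets a framework rewrite or optimise it for a particular backend before execution.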