8 research outputs found
QARC: Video Quality Aware Rate Control for Real-Time Video Streaming via Deep Reinforcement Learning
Due to the fluctuation of throughput under various network conditions,
adaptively choosing a proper bitrate for real-time video streaming has become
an important and interesting problem. Recent work focuses on providing high
video bitrates rather than high video quality. Nevertheless, we notice that there exists
a trade-off between sending bitrate and video quality, which motivates us to
focus on how to get a balance between them. In this paper, we propose QARC
(video Quality Awareness Rate Control), a rate control algorithm that aims to
have a higher perceptual video quality with possibly lower sending rate and
transmission latency. Starting from scratch, QARC uses a deep reinforcement
learning (DRL) algorithm to train a neural network that selects future bitrates
based on previously observed network status and past video frames, and we
design a neural network that predicts future perceptual video quality as a
vector, which takes the place of raw pictures in the DRL inputs. We evaluate
QARC over a trace-driven emulation. As expected, QARC outperforms existing approaches.
Comment: Accepted by ACM Multimedia 201
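No code accompanies the abstract; as a rough, hypothetical sketch of the control loop it describes (a policy mapping past network observations and a predicted-quality signal to a discrete bitrate), with all names, constants, and the greedy rule purely illustrative rather than QARC's actual learned networks:

```python
BITRATES_KBPS = [300, 750, 1200, 1850, 2850]  # hypothetical discrete bitrate actions

def predict_quality(bitrate_kbps, frame_complexity):
    # Stand-in for QARC's learned quality predictor: perceptual quality grows
    # with bitrate but with diminishing returns, scaled by content complexity.
    return min(1.0, (bitrate_kbps / 3000.0) ** 0.5 / max(frame_complexity, 1e-6))

def select_bitrate(past_throughput_kbps, frame_complexity, latency_penalty=0.3):
    # Greedy stand-in for the learned DRL policy: the reward trades predicted
    # perceptual quality against sending above the estimated throughput,
    # mirroring the quality-vs-bitrate balance described in the abstract.
    est = sum(past_throughput_kbps) / len(past_throughput_kbps)

    def reward(b):
        overshoot = max(0.0, (b - est) / est)  # sending faster than the link adds latency
        return predict_quality(b, frame_complexity) - latency_penalty * overshoot

    return max(BITRATES_KBPS, key=reward)
```

With past throughputs around 1 Mbps and unit complexity, this toy rule picks a bitrate slightly above the link estimate, accepting a small latency penalty for higher predicted quality.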
ThumbNet: One Thumbnail Image Contains All You Need for Recognition
Although deep convolutional neural networks (CNNs) have achieved great
success in computer vision tasks, their real-world application is still impeded
by their voracious demand for computational resources. Current works mostly seek
to compress the network by reducing its parameters or parameter-incurred
computation, neglecting the influence of the input image on the system
complexity. Based on the fact that input images of a CNN contain substantial
redundancy, in this paper, we propose a unified framework, dubbed ThumbNet,
to simultaneously accelerate and compress CNN models by enabling them to infer
on one thumbnail image. We provide three effective strategies to train
ThumbNet. In doing so, ThumbNet learns an inference network that performs
equally well on small images as the original-input network on large images.
With ThumbNet, not only do we obtain a thumbnail-input inference network that
can drastically reduce computation and memory requirements, but we also obtain
an image downscaler that can generate thumbnail images for generic
classification tasks. Extensive experiments show the effectiveness of ThumbNet,
and demonstrate that the thumbnail-input inference network learned by ThumbNet
can adequately retain the accuracy of the original-input network even when the
input images are downscaled 16 times.
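The three training strategies themselves are not given here; the core idea of the abstract, inferring on a downscaled thumbnail while training the small-input network to mimic the original-input network, can be sketched as follows, with an average-pooling downscaler and a squared-error distillation loss as illustrative stand-ins for the learned components:

```python
def downscale(img, factor):
    # Average-pool a 2D grayscale image by `factor` (a simple stand-in for
    # ThumbNet's learned image downscaler; assumes dimensions divide evenly).
    h, w = len(img), len(img[0])
    out = []
    for i in range(0, h, factor):
        row = []
        for j in range(0, w, factor):
            block = [img[i + di][j + dj] for di in range(factor) for dj in range(factor)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out

def distillation_loss(teacher_logits, student_logits):
    # Distillation-style objective: push the thumbnail-input network's outputs
    # toward the original-input network's outputs on the same sample.
    return sum((t - s) ** 2 for t, s in zip(teacher_logits, student_logits)) / len(teacher_logits)
```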
Adversarial Infidelity Learning for Model Interpretation
Model interpretation is essential in data mining and knowledge discovery. It
can help understand the intrinsic model working mechanism and check if the
model has undesired characteristics. A popular way of performing model
interpretation is Instance-wise Feature Selection (IFS), which provides an
importance score for each feature of a data sample to explain how the model
generates its specific output. In this paper, we propose a
Model-agnostic Effective Efficient Direct (MEED) IFS framework for model
interpretation, mitigating concerns about sanity, combinatorial shortcuts,
model identifiability, and information transmission. Also, we focus on the
following setting: using selected features to directly predict the output of
the given model, which serves as a primary evaluation metric for
model-interpretation methods. Apart from the features, we involve the output of
the given model as an additional input to learn an explainer based on more
accurate information. To learn the explainer, besides fidelity, we propose an
Adversarial Infidelity Learning (AIL) mechanism to boost the explanation
learning by screening relatively unimportant features. Through theoretical and
experimental analysis, we show that our AIL mechanism can help learn the
desired conditional distribution between selected features and targets.
Moreover, we extend our framework by integrating efficient interpretation
methods as proper priors to provide a warm start. Comprehensive empirical
evaluation results are provided by quantitative metrics and human evaluation to
demonstrate the effectiveness and superiority of our proposed method. Our code
is publicly available online at https://github.com/langlrsw/MEED.
Comment: 26th ACM SIGKDD Conference on Knowledge Discovery and Data Mining
(KDD '20), August 23--27, 2020, Virtual Event, US
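As a minimal illustration of the IFS setting the abstract describes (score features per instance, keep the top-k, and feed the masked instance back to the model to check fidelity), with all helper names hypothetical:

```python
def select_features(scores, k):
    # Instance-wise feature selection: keep the indices of the k
    # highest-scoring features for this particular data sample.
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    return sorted(top)

def mask_instance(x, selected):
    # Zero out unselected features; the masked instance is fed to the model
    # being explained to test whether it still produces the same output
    # (the fidelity criterion used to evaluate interpretation methods).
    return [v if i in selected else 0.0 for i, v in enumerate(x)]
```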
Modeling Human Visual Search Performance on Realistic Webpages Using Analytical and Deep Learning Methods
Modeling visual search not only offers an opportunity to predict the
usability of an interface before actually testing it on real users, but also
advances scientific understanding about human behavior. In this work, we first
conduct a set of analyses on a large-scale dataset of visual search tasks on
realistic webpages. We then present a deep neural network that learns to
predict the scannability of webpage content, i.e., how easy it is for a user to
find a specific target. Our model leverages both heuristic-based features such
as target size and unstructured features such as raw image pixels. This
approach allows us to model complex interactions that might be involved in a
realistic visual search task, which cannot be easily achieved by traditional
analytical models. We analyze the model behavior to offer our insights into how
the salience map learned by the model aligns with human intuition and how the
learned semantic representation of each target type relates to its visual
search performance.
Comment: The 2020 CHI Conference on Human Factors in Computing Systems
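The actual network is not reproduced here; a toy sketch of the general idea, fusing a heuristic feature such as target size with a pixel-derived feature such as salience into a single scannability score, with purely illustrative weights and bias:

```python
import math

def scannability_score(target_size, salience, weights=(0.6, 0.4), bias=-0.5):
    # Hypothetical fusion of a heuristic feature (normalized target size) with
    # a learned pixel-derived feature (salience at the target), squashed to a
    # 0-1 "how easy is this target to find" score. All parameters are
    # illustrative, not the paper's learned values.
    z = weights[0] * target_size + weights[1] * salience + bias
    return 1.0 / (1.0 + math.exp(-z))
```

Larger or more salient targets score as easier to find, matching the intuition the abstract analyzes.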
Audiovisual Transformer Architectures for Large-Scale Classification and Synchronization of Weakly Labeled Audio Events
We tackle the task of environmental event classification by drawing
inspiration from the transformer neural network architecture used in machine
translation. We modify this attention-based feedforward structure so that the
resulting model can use audio as well as video to compute sound
event predictions. We perform extensive experiments with these adapted
transformers on an audiovisual data set, obtained by appending relevant visual
information to an existing large-scale weakly labeled audio collection. The
employed multi-label data contains clip-level annotation indicating the
presence or absence of 17 classes of environmental sounds, and does not include
temporal information. We show that the proposed modified transformers strongly
improve upon previously introduced models and in fact achieve state-of-the-art
results. We also make a compelling case for devoting more attention to research
in multimodal audiovisual classification by proving the usefulness of visual
information for the task at hand, namely audio event recognition. In addition,
we visualize internal attention patterns of the audiovisual transformers and in
doing so demonstrate their potential for performing multimodal synchronization.
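The adapted transformers are built on scaled dot-product attention; a minimal, dependency-free sketch of one such attention step (for instance, an audio-frame query attending over video tokens), not the paper's exact architecture:

```python
import math

def attention(query, keys, values):
    # Scaled dot-product attention for a single query vector: the kind of
    # cross-modal mixing (audio query over video keys/values) these
    # audiovisual transformers perform internally.
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    m = max(scores)                      # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    w = [e / total for e in exps]        # softmax attention weights
    return [sum(wi * v[j] for wi, v in zip(w, values)) for j in range(len(values[0]))]
```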
QUOTIENT: Two-Party Secure Neural Network Training and Prediction
Recently, there has been a wealth of effort devoted to the design of secure
protocols for machine learning tasks. Much of this is aimed at enabling secure
prediction from highly-accurate Deep Neural Networks (DNNs). However, as DNNs
are trained on data, a key question is how such models can also be trained
securely. The few prior works on secure DNN training have focused either on
designing custom protocols for existing training algorithms, or on developing
tailored training algorithms and then applying generic secure protocols. In
this work, we investigate the advantages of designing training algorithms
alongside a novel secure protocol, incorporating optimizations on both fronts.
We present QUOTIENT, a new method for discretized training of DNNs, along with
a customized secure two-party protocol for it. QUOTIENT incorporates key
components of state-of-the-art DNN training such as layer normalization and
adaptive gradient methods, and improves upon the state-of-the-art in DNN
training in two-party computation. Compared to prior work, we obtain an
improvement of 50X in WAN time and 6% in absolute accuracy.
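QUOTIENT's protocol details are beyond an abstract, but the discretization idea, restricting weights to a small set such as {-1, 0, +1} so that the arithmetic stays cheap inside a secure two-party protocol, can be sketched as follows (threshold and scheme illustrative, not QUOTIENT's exact quantizer):

```python
def ternarize(weights, threshold=0.05):
    # Map each real-valued weight to {-1, 0, +1}. Multiplications by such
    # discretized weights reduce to sign flips and zeroing, which is far
    # cheaper to evaluate under secure two-party computation than
    # full-precision arithmetic.
    return [0 if abs(w) < threshold else (1 if w > 0 else -1) for w in weights]
```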
Devil is Virtual: Reversing Virtual Inheritance in C++ Binaries
Complexities that arise from implementation of object-oriented concepts in
C++ such as virtual dispatch and dynamic type casting have attracted the
attention of attackers and defenders alike.
Binary-level defenses depend on full and precise recovery of the class
inheritance tree of a given program.
While current solutions focus on recovering single and multiple inheritance
from the binary, they are oblivious to virtual inheritance. Conventional wisdom
among binary-level defenses is that virtual inheritance is uncommon and/or that
support for single and multiple inheritance provides implicit support for
virtual inheritance. In this paper, we show neither to be true.
Specifically, (1) we present an efficient technique to detect virtual
inheritance in C++ binaries and show through a study that virtual inheritance
can be found in a non-negligible fraction (more than 10\% on Linux and 12.5\% on
Windows) of real-world C++ programs, including Mysql and libstdc++. (2) We show
that failure to handle virtual inheritance introduces both false positives and
false negatives in the hierarchy tree. These false positives and negatives
either introduce attack surface when the recovered hierarchy is used to enforce
CFI policies, or make the hierarchy difficult to understand when it is needed
for program understanding (e.g., during decompilation). (3) We present a
solution to recover virtual inheritance from COTS binaries. We recover a
maximum of 95\% and 95.5\% (GCC -O0) and a minimum of 77.5\% and 73.8\% (Clang
-O2) of virtual and intermediate bases respectively in the virtual inheritance
tree.
Comment: Accepted at CCS20. This is a technical report version.
High-performance unsupervised anomaly detection for cyber-physical system networks
While the ever-increasing connectivity of cyber-physical systems enlarges their attack surface, existing anomaly detection frameworks often do not account for the rising heterogeneity of the systems involved. Existing frameworks focus on a single fieldbus protocol or require detailed knowledge of the cyber-physical system itself. We therefore introduce a uniform method and framework for applying anomaly detection to a variety of fieldbus protocols. We use stacked denoising autoencoders to derive a feature learning and packet classification method in one step. As the approach is based on the raw byte stream of the network traffic, neither specific protocols nor detailed knowledge of the application is needed. Additionally, we pay attention to creating an efficient framework that can handle the increased amount of communication in cyber-physical systems. Our evaluation on a Secure Water Treatment dataset using EtherNet/IP and a Modbus dataset shows that we can acquire network packets up to 100 times faster than packet-parsing-based methods, while still achieving precision and recall of over 99% for longer-lasting attacks.
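As an illustration of the stacked-denoising-autoencoder recipe the abstract outlines (corrupt raw packet bytes, train a network to reconstruct them, and flag packets with high reconstruction error as anomalous), here is a minimal sketch of the corruption and scoring steps; the autoencoder itself is omitted and all names are illustrative:

```python
import random

def add_noise(packet_bytes, p=0.1, rng=None):
    # Denoising-autoencoder input corruption: randomly zero a fraction `p` of
    # the raw bytes; the autoencoder is trained to reconstruct the clean packet.
    rng = rng or random.Random(0)
    return [0 if rng.random() < p else b for b in packet_bytes]

def reconstruction_error(clean, reconstructed):
    # Anomaly score: packets the trained autoencoder reconstructs poorly
    # (high mean squared error) are flagged as anomalous traffic.
    return sum((c - r) ** 2 for c, r in zip(clean, reconstructed)) / len(clean)
```

Because scoring works directly on the byte stream, the same pipeline applies unchanged across fieldbus protocols, which is the uniformity the framework aims for.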