8 research outputs found
QARC: Video Quality Aware Rate Control for Real-Time Video Streaming via Deep Reinforcement Learning
Due to the fluctuation of throughput under various network conditions,
adaptively choosing a proper bitrate for real-time video streaming has become
an important and interesting problem. Recent work focuses on providing high
video bitrates rather than high video quality. Nevertheless, we notice that there exists
a trade-off between sending bitrate and video quality, which motivates us to
focus on how to get a balance between them. In this paper, we propose QARC
(video Quality Awareness Rate Control), a rate control algorithm that aims to
have a higher perceptual video quality with possibly lower sending rate and
transmission latency. Starting from scratch, QARC uses a deep reinforcement
learning (DRL) algorithm to train a neural network that selects future bitrates
based on previously observed network status and past video frames, and we
design a neural network that predicts future perceptual video quality as a
vector, which takes the place of raw pictures in the DRL inputs. We evaluate
QARC over a trace-driven emulation. As expected, QARC outperforms existing approaches.
Comment: Accepted by ACM Multimedia 201
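No code accompanies the abstract; as a rough, hypothetical sketch of the control loop it describes (a policy mapping past network observations and a predicted-quality signal to a discrete bitrate), with all names, constants, and the greedy rule purely illustrative rather than QARC's actual learned networks:

```python
BITRATES_KBPS = [300, 750, 1200, 1850, 2850]  # hypothetical discrete bitrate actions

def predict_quality(bitrate_kbps, frame_complexity):
    # Stand-in for QARC's learned quality predictor: perceptual quality grows
    # with bitrate but with diminishing returns, scaled by content complexity.
    return min(1.0, (bitrate_kbps / 3000.0) ** 0.5 / max(frame_complexity, 1e-6))

def select_bitrate(past_throughput_kbps, frame_complexity, latency_penalty=0.3):
    # Greedy stand-in for the learned DRL policy: the reward trades predicted
    # perceptual quality against sending above the estimated throughput,
    # mirroring the quality-vs-bitrate balance described in the abstract.
    est = sum(past_throughput_kbps) / len(past_throughput_kbps)

    def reward(b):
        overshoot = max(0.0, (b - est) / est)  # sending faster than the link adds latency
        return predict_quality(b, frame_complexity) - latency_penalty * overshoot

    return max(BITRATES_KBPS, key=reward)
```

With past throughputs around 1 Mbps and unit complexity, this toy rule picks a bitrate slightly above the link estimate, accepting a small latency penalty for higher predicted quality.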
ThumbNet: One Thumbnail Image Contains All You Need for Recognition
Although deep convolutional neural networks (CNNs) have achieved great
success in computer vision tasks, their real-world application is still impeded
by their voracious demand for computational resources. Current works mostly seek
to compress the network by reducing its parameters or parameter-incurred
computation, neglecting the influence of the input image on the system
complexity. Based on the fact that input images of a CNN contain substantial
redundancy, in this paper, we propose a unified framework, dubbed ThumbNet,
to simultaneously accelerate and compress CNN models by enabling them to infer
on one thumbnail image. We provide three effective strategies to train
ThumbNet. In doing so, ThumbNet learns an inference network that performs
equally well on small images as the original-input network on large images.
With ThumbNet, not only do we obtain a thumbnail-input inference network that
can drastically reduce computation and memory requirements, but we also obtain
an image downscaler that can generate thumbnail images for generic
classification tasks. Extensive experiments show the effectiveness of ThumbNet,
and demonstrate that the thumbnail-input inference network learned by ThumbNet
can adequately retain the accuracy of the original-input network even when the
input images are downscaled 16 times.
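The three training strategies themselves are not given here; the core idea of the abstract, inferring on a downscaled thumbnail while training the small-input network to mimic the original-input network, can be sketched as follows, with an average-pooling downscaler and a squared-error distillation loss as illustrative stand-ins for the learned components:

```python
def downscale(img, factor):
    # Average-pool a 2D grayscale image by `factor` (a simple stand-in for
    # ThumbNet's learned image downscaler; assumes dimensions divide evenly).
    h, w = len(img), len(img[0])
    out = []
    for i in range(0, h, factor):
        row = []
        for j in range(0, w, factor):
            block = [img[i + di][j + dj] for di in range(factor) for dj in range(factor)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out

def distillation_loss(teacher_logits, student_logits):
    # Distillation-style objective: push the thumbnail-input network's outputs
    # toward the original-input network's outputs on the same sample.
    return sum((t - s) ** 2 for t, s in zip(teacher_logits, student_logits)) / len(teacher_logits)
```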
Adversarial Infidelity Learning for Model Interpretation
Model interpretation is essential in data mining and knowledge discovery. It
can help understand the intrinsic model working mechanism and check if the
model has undesired characteristics. A popular way of performing model
interpretation is Instance-wise Feature Selection (IFS), which provides an
importance score for each feature of a data sample to explain how the model
generates its specific output. In this paper, we propose a
Model-agnostic Effective Efficient Direct (MEED) IFS framework for model
interpretation, mitigating concerns about sanity, combinatorial shortcuts,
model identifiability, and information transmission. Also, we focus on the
following setting: using selected features to directly predict the output of
the given model, which serves as a primary evaluation metric for
model-interpretation methods. Apart from the features, we involve the output of
the given model as an additional input to learn an explainer based on more
accurate information. To learn the explainer, besides fidelity, we propose an
Adversarial Infidelity Learning (AIL) mechanism to boost the explanation
learning by screening relatively unimportant features. Through theoretical and
experimental analysis, we show that our AIL mechanism can help learn the
desired conditional distribution between selected features and targets.
Moreover, we extend our framework by integrating efficient interpretation
methods as proper priors to provide a warm start. Comprehensive empirical
evaluation results are provided by quantitative metrics and human evaluation to
demonstrate the effectiveness and superiority of our proposed method. Our code
is publicly available online at https://github.com/langlrsw/MEED.
Comment: 26th ACM SIGKDD Conference on Knowledge Discovery and Data Mining
(KDD '20), August 23--27, 2020, Virtual Event, US
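As a minimal illustration of the IFS setting the abstract describes (score features per instance, keep the top-k, and feed the masked instance back to the model to check fidelity), with all helper names hypothetical:

```python
def select_features(scores, k):
    # Instance-wise feature selection: keep the indices of the k
    # highest-scoring features for this particular data sample.
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    return sorted(top)

def mask_instance(x, selected):
    # Zero out unselected features; the masked instance is fed to the model
    # being explained to test whether it still produces the same output
    # (the fidelity criterion used to evaluate interpretation methods).
    return [v if i in selected else 0.0 for i, v in enumerate(x)]
```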
Modeling Human Visual Search Performance on Realistic Webpages Using Analytical and Deep Learning Methods
Modeling visual search not only offers an opportunity to predict the
usability of an interface before actually testing it on real users, but also
advances scientific understanding about human behavior. In this work, we first
conduct a set of analyses on a large-scale dataset of visual search tasks on
realistic webpages. We then present a deep neural network that learns to
predict the scannability of webpage content, i.e., how easy it is for a user to
find a specific target. Our model leverages both heuristic-based features such
as target size and unstructured features such as raw image pixels. This
approach allows us to model complex interactions that might be involved in a
realistic visual search task, which cannot be easily achieved by traditional
analytical models. We analyze the model behavior to offer our insights into how
the salience map learned by the model aligns with human intuition and how the
learned semantic representation of each target type relates to its visual
search performance.
Comment: The 2020 CHI Conference on Human Factors in Computing Systems
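The actual network is not reproduced here; a toy sketch of the general idea, fusing a heuristic feature such as target size with a pixel-derived feature such as salience into a single scannability score, with purely illustrative weights and bias:

```python
import math

def scannability_score(target_size, salience, weights=(0.6, 0.4), bias=-0.5):
    # Hypothetical fusion of a heuristic feature (normalized target size) with
    # a learned pixel-derived feature (salience at the target), squashed to a
    # 0-1 "how easy is this target to find" score. All parameters are
    # illustrative, not the paper's learned values.
    z = weights[0] * target_size + weights[1] * salience + bias
    return 1.0 / (1.0 + math.exp(-z))
```

Larger or more salient targets score as easier to find, matching the intuition the abstract analyzes.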
Audiovisual Transformer Architectures for Large-Scale Classification and Synchronization of Weakly Labeled Audio Events
We tackle the task of environmental event classification by drawing
inspiration from the transformer neural network architecture used in machine
translation. We modify this attention-based feedforward structure so that the
resulting model can use audio as well as video to compute sound
event predictions. We perform extensive experiments with these adapted
transformers on an audiovisual data set, obtained by appending relevant visual
information to an existing large-scale weakly labeled audio collection. The
employed multi-label data contains clip-level annotation indicating the
presence or absence of 17 classes of environmental sounds, and does not include
temporal information. We show that the proposed modified transformers strongly
improve upon previously introduced models and in fact achieve state-of-the-art
results. We also make a compelling case for devoting more attention to research
in multimodal audiovisual classification by proving the usefulness of visual
information for the task at hand, namely audio event recognition. In addition,
we visualize internal attention patterns of the audiovisual transformers and in
doing so demonstrate their potential for performing multimodal synchronization.
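The adapted transformers are built on scaled dot-product attention; a minimal, dependency-free sketch of one such attention step (for instance, an audio-frame query attending over video tokens), not the paper's exact architecture:

```python
import math

def attention(query, keys, values):
    # Scaled dot-product attention for a single query vector: the kind of
    # cross-modal mixing (audio query over video keys/values) these
    # audiovisual transformers perform internally.
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    m = max(scores)                      # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    w = [e / total for e in exps]        # softmax attention weights
    return [sum(wi * v[j] for wi, v in zip(w, values)) for j in range(len(values[0]))]
```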
QUOTIENT: Two-Party Secure Neural Network Training and Prediction
Recently, there has been a wealth of effort devoted to the design of secure
protocols for machine learning tasks. Much of this is aimed at enabling secure
prediction from highly-accurate Deep Neural Networks (DNNs). However, as DNNs
are trained on data, a key question is how such models can also be trained
securely. The few prior works on secure DNN training have focused either on
designing custom protocols for existing training algorithms, or on developing
tailored training algorithms and then applying generic secure protocols. In
this work, we investigate the advantages of designing training algorithms
alongside a novel secure protocol, incorporating optimizations on both fronts.
We present QUOTIENT, a new method for discretized training of DNNs, along with
a customized secure two-party protocol for it. QUOTIENT incorporates key
components of state-of-the-art DNN training such as layer normalization and
adaptive gradient methods, and improves upon the state-of-the-art in DNN
training in two-party computation. Compared to prior work, we obtain an
improvement of 50X in WAN time and 6% in absolute accuracy.
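QUOTIENT's protocol details are beyond an abstract, but the discretization idea, restricting weights to a small set such as {-1, 0, +1} so that the arithmetic stays cheap inside a secure two-party protocol, can be sketched as follows (threshold and scheme illustrative, not QUOTIENT's exact quantizer):

```python
def ternarize(weights, threshold=0.05):
    # Map each real-valued weight to {-1, 0, +1}. Multiplications by such
    # discretized weights reduce to sign flips and zeroing, which is far
    # cheaper to evaluate under secure two-party computation than
    # full-precision arithmetic.
    return [0 if abs(w) < threshold else (1 if w > 0 else -1) for w in weights]
```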
Devil is Virtual: Reversing Virtual Inheritance in C++ Binaries
Complexities that arise from implementation of object-oriented concepts in
C++ such as virtual dispatch and dynamic type casting have attracted the
attention of attackers and defenders alike.
Binary-level defenses depend on full and precise recovery of the class
inheritance tree of a given program.
While current solutions focus on recovering single and multiple inheritance
from the binary, they are oblivious to virtual inheritance. Conventional wisdom
among binary-level defenses is that virtual inheritance is uncommon and/or that
support for single and multiple inheritance provides implicit support for
virtual inheritance. In this paper, we show neither to be true.
Specifically, (1) we present an efficient technique to detect virtual
inheritance in C++ binaries and show through a study that virtual inheritance
can be found in a non-negligible fraction (more than 10\% on Linux and 12.5\% on
Windows) of real-world C++ programs, including Mysql and libstdc++. (2) We show
that failure to handle virtual inheritance introduces both false positives and
false negatives in the hierarchy tree. These false positives and negatives
either introduce attack surface when the recovered hierarchy is used to enforce
CFI policies, or make the hierarchy difficult to understand when it is needed
for program understanding (e.g., during decompilation). (3) We present a
solution to recover virtual inheritance from COTS binaries. We recover a
maximum of 95\% and 95.5\% (GCC -O0) and a minimum of 77.5\% and 73.8\% (Clang
-O2) of virtual and intermediate bases respectively in the virtual inheritance
tree.
Comment: Accepted at CCS20. This is a technical report version.
High-performance unsupervised anomaly detection for cyber-physical system networks
While the ever-increasing connectivity of cyber-physical systems enlarges their attack surface, existing anomaly detection frameworks often do not account for the rising heterogeneity of the systems involved. Existing frameworks focus on a single fieldbus protocol or require detailed knowledge of the cyber-physical system itself. We therefore introduce a uniform method and framework for applying anomaly detection to a variety of fieldbus protocols. We use stacked denoising autoencoders to derive a feature learning and packet classification method in one step. As the approach is based on the raw byte stream of the network traffic, neither specific protocols nor detailed knowledge of the application is needed. Additionally, we pay attention to creating an efficient framework that can handle the increased amount of communication in cyber-physical systems. Our evaluation on a Secure Water Treatment dataset using EtherNet/IP and a Modbus dataset shows that we can acquire network packets up to 100 times faster than packet-parsing-based methods, while still achieving precision and recall of over 99% for longer-lasting attacks.
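As an illustration of the stacked-denoising-autoencoder recipe the abstract outlines (corrupt raw packet bytes, train a network to reconstruct them, and flag packets with high reconstruction error as anomalous), here is a minimal sketch of the corruption and scoring steps; the autoencoder itself is omitted and all names are illustrative:

```python
import random

def add_noise(packet_bytes, p=0.1, rng=None):
    # Denoising-autoencoder input corruption: randomly zero a fraction `p` of
    # the raw bytes; the autoencoder is trained to reconstruct the clean packet.
    rng = rng or random.Random(0)
    return [0 if rng.random() < p else b for b in packet_bytes]

def reconstruction_error(clean, reconstructed):
    # Anomaly score: packets the trained autoencoder reconstructs poorly
    # (high mean squared error) are flagged as anomalous traffic.
    return sum((c - r) ** 2 for c, r in zip(clean, reconstructed)) / len(clean)
```

Because scoring works directly on the byte stream, the same pipeline applies unchanged across fieldbus protocols, which is the uniformity the framework aims for.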