QARC: Video Quality Aware Rate Control for Real-Time Video Streaming via Deep Reinforcement Learning
Due to the fluctuation of throughput under various network conditions,
adaptively choosing a proper bitrate for real-time video streaming has become
an increasingly important problem. Recent work focuses on providing high video
bitrates rather than high video quality. Nevertheless, we notice that there
exists a trade-off between sending bitrate and video quality, which motivates
us to focus on how to strike a balance between them. In this paper, we propose
QARC (video Quality Aware Rate Control), a rate control algorithm that aims to
achieve higher perceptual video quality with a possibly lower sending rate and
transmission latency. Starting from scratch, QARC uses a deep reinforcement
learning (DRL) algorithm to train a neural network to select future bitrates
based on previously observed network status and past video frames, and we
design a neural network that predicts future perceptual video quality as a
vector, taking the place of raw pictures in the DRL inputs. We evaluate QARC
over a trace-driven emulation. As expected, QARC outperforms existing approaches.
Comment: Accepted by ACM Multimedia 201
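The bitrate/quality trade-off the abstract describes can be framed as maximizing predicted perceptual quality minus a cost on the sending rate. The sketch below is illustrative only, not QARC's actual DRL model: the quality scores and penalty weight are made-up placeholders.

```python
# Illustrative sketch (not QARC's actual model): pick the sending bitrate
# that maximizes predicted perceptual quality minus a rate cost.

def select_bitrate(predicted_quality, rate_penalty=0.1):
    """Pick the candidate bitrate (Mbps) maximizing quality minus a rate cost."""
    return max(predicted_quality,
               key=lambda b: predicted_quality[b] - rate_penalty * b)

# Hypothetical quality scores showing diminishing returns at high bitrates.
quality = {0.5: 0.70, 1.0: 0.85, 2.0: 0.90, 4.0: 0.92}
print(select_bitrate(quality))  # -> 1.0 (a mid bitrate wins the trade-off)
```

With such a penalty, the highest bitrate is not automatically chosen, which is exactly the balance the paper targets.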
ThumbNet: One Thumbnail Image Contains All You Need for Recognition
Although deep convolutional neural networks (CNNs) have achieved great
success in computer vision tasks, their real-world application is still
impeded by their voracious demand for computational resources. Current works
mostly seek to compress the network by reducing its parameters or
parameter-incurred computation, neglecting the influence of the input image on
the system complexity. Based on the fact that input images of a CNN contain
substantial redundancy, in this paper we propose a unified framework, dubbed
ThumbNet, to simultaneously accelerate and compress CNN models by enabling
them to infer on a single thumbnail image. We provide three effective
strategies to train ThumbNet. In doing so, ThumbNet learns an inference
network that performs as well on small images as the original-input network
does on large images. With ThumbNet, we obtain not only a thumbnail-input
inference network that drastically reduces computation and memory
requirements, but also an image downscaler that generates thumbnail images for
generic classification tasks. Extensive experiments show the effectiveness of
ThumbNet, and demonstrate that the thumbnail-input inference network learned
by ThumbNet can adequately retain the accuracy of the original-input network
even when the input images are downscaled 16 times.
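The source of the speedup is simple arithmetic: the multiply-accumulate count of a convolutional layer scales with the spatial size of its input, so a thumbnail downscaled 4x per side (16x fewer pixels) cuts that cost roughly 16x. A back-of-envelope sketch, with illustrative layer shapes rather than ThumbNet's actual architecture:

```python
# Back-of-envelope: per-layer multiply-accumulates (MACs) of a convolution
# scale linearly with the number of input pixels. Shapes are illustrative.

def conv_macs(h, w, c_in, c_out, k=3):
    """MACs of a stride-1, 'same'-padded k x k convolution."""
    return h * w * c_in * c_out * k * k

full = conv_macs(224, 224, 3, 64)   # full-resolution input
thumb = conv_macs(56, 56, 3, 64)    # 224 / 4 = 56 per side
print(full // thumb)  # -> 16
```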
Modeling Human Visual Search Performance on Realistic Webpages Using Analytical and Deep Learning Methods
Modeling visual search not only offers an opportunity to predict the
usability of an interface before actually testing it on real users, but also
advances scientific understanding about human behavior. In this work, we first
conduct a set of analyses on a large-scale dataset of visual search tasks on
realistic webpages. We then present a deep neural network that learns to
predict the scannability of webpage content, i.e., how easy it is for a user to
find a specific target. Our model leverages both heuristic-based features such
as target size and unstructured features such as raw image pixels. This
approach allows us to model complex interactions that might be involved in a
realistic visual search task, which cannot be easily achieved by traditional
analytical models. We analyze the model behavior to offer our insights into how
the salience map learned by the model aligns with human intuition and how the
learned semantic representation of each target type relates to its visual
search performance.
Comment: The 2020 CHI Conference on Human Factors in Computing Systems
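The hybrid input the abstract describes, combining heuristic features such as target size with unstructured raw-pixel features, amounts to concatenating the two before feeding a predictor. A minimal sketch, with made-up shapes and values (the paper's actual model is a deep network, not shown here):

```python
# Sketch of a hybrid feature vector: flattened raw pixels plus
# heuristic features (e.g., target width/height). Values are illustrative.

def build_input(heuristic_features, pixels):
    """Flatten the pixel grid and append the heuristic features."""
    flat = [p for row in pixels for p in row]
    return flat + heuristic_features

pixels = [[0.1, 0.2], [0.3, 0.4]]   # tiny 2x2 stand-in for a webpage crop
heuristics = [32.0, 24.0]           # e.g., target width and height in px
x = build_input(heuristics, pixels)
print(len(x))  # -> 6
```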
Adversarial Infidelity Learning for Model Interpretation
Model interpretation is essential in data mining and knowledge discovery. It
can help understand the intrinsic model working mechanism and check if the
model has undesired characteristics. A popular way of performing model
interpretation is Instance-wise Feature Selection (IFS), which provides an
importance score of each feature representing the data samples to explain how
the model generates the specific output. In this paper, we propose a
Model-agnostic Effective Efficient Direct (MEED) IFS framework for model
interpretation, mitigating concerns about sanity, combinatorial shortcuts,
model identifiability, and information transmission. Also, we focus on the
following setting: using selected features to directly predict the output of
the given model, which serves as a primary evaluation metric for
model-interpretation methods. In addition to the features, we take the output
of the given model as an additional input, so that the explainer is learned
from more accurate information. To learn the explainer, besides fidelity, we propose an
Adversarial Infidelity Learning (AIL) mechanism to boost the explanation
learning by screening relatively unimportant features. Through theoretical and
experimental analysis, we show that our AIL mechanism can help learn the
desired conditional distribution between selected features and targets.
Moreover, we extend our framework by integrating efficient interpretation
methods as proper priors to provide a warm start. Comprehensive empirical
evaluation results are provided by quantitative metrics and human evaluation to
demonstrate the effectiveness and superiority of our proposed method. Our code
is publicly available online at https://github.com/langlrsw/MEED.
Comment: 26th ACM SIGKDD Conference on Knowledge Discovery and Data Mining
(KDD '20), August 23--27, 2020, Virtual Event, US
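The evaluation setting the abstract names, using only the selected features to predict the given model's output, can be sketched with a toy example. The linear "black box" and the importance scores below are placeholders, not the MEED/AIL method itself:

```python
# Minimal sketch of instance-wise feature selection (IFS) evaluation:
# select top-k features, mask the rest, and re-query the model. Fidelity
# is how well the masked prediction matches the original output.

def model(x):
    # Toy black box: output depends only on the first two features.
    return 3.0 * x[0] - 2.0 * x[1]

def explain_top_k(x, k):
    # Toy importance scores: magnitude of each feature's contribution.
    weights = [3.0, -2.0, 0.0, 0.0]
    scores = [abs(w * v) for w, v in zip(weights, x)]
    order = sorted(range(len(x)), key=lambda i: scores[i], reverse=True)
    return set(order[:k])

def masked_prediction(x, selected):
    # Zero out unselected features and re-query the model.
    masked = [v if i in selected else 0.0 for i, v in enumerate(x)]
    return model(masked)

x = [1.0, 2.0, 5.0, -4.0]
selected = explain_top_k(x, k=2)
print(masked_prediction(x, selected) == model(x))  # -> True
```

Here the explainer happens to recover exactly the features the model uses, so the masked prediction is perfectly faithful; AIL's adversarial screening targets the realistic case where the explainer must learn this.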
Audiovisual Transformer Architectures for Large-Scale Classification and Synchronization of Weakly Labeled Audio Events
We tackle the task of environmental event classification by drawing
inspiration from the transformer neural network architecture used in machine
translation. We modify this attention-based feedforward structure so that the
resulting model can use audio as well as video to compute sound
event predictions. We perform extensive experiments with these adapted
transformers on an audiovisual data set, obtained by appending relevant visual
information to an existing large-scale weakly labeled audio collection. The
employed multi-label data contains clip-level annotation indicating the
presence or absence of 17 classes of environmental sounds, and does not include
temporal information. We show that the proposed modified transformers strongly
improve upon previously introduced models and in fact achieve state-of-the-art
results. We also make a compelling case for devoting more attention to research
in multimodal audiovisual classification by proving the usefulness of visual
information for the task at hand, namely audio event recognition. In addition,
we visualize internal attention patterns of the audiovisual transformers and in
doing so demonstrate their potential for performing multimodal synchronization.
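The kind of attention that lets one modality consult another can be sketched as a single audio query attending over video key/value vectors. This is the generic scaled dot-product mechanism only; the dimensions and values are illustrative, not the paper's actual architecture:

```python
import math

# Toy cross-modal attention: an audio token attends over video features.

def attention(query, keys, values):
    """Single-query scaled dot-product attention over key/value vectors."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
              for key in keys]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    weights = [e / z for e in exps]
    return [sum(w * v[i] for w, v in zip(weights, values))
            for i in range(len(values[0]))]

audio_query = [1.0, 0.0]
video_keys = [[1.0, 0.0], [0.0, 1.0]]   # first video frame matches the query
video_vals = [[10.0, 0.0], [0.0, 10.0]]
out = attention(audio_query, video_keys, video_vals)
print(out[0] > out[1])  # -> True: output leans toward the matching frame
```

Visualizing the `weights` vector per audio token is, in spirit, what the paper's attention-pattern analysis does to expose audiovisual synchronization.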
A Substellar Companion to Pleiades HII 3441
We find a new substellar companion to the Pleiades member star, Pleiades HII
3441, using the Subaru telescope with adaptive optics. The discovery is made as
part of the high-contrast imaging survey to search for planetary-mass and
substellar companions in the Pleiades and young moving groups. The companion
has a projected separation of 0".49 +/- 0".02 (66 +/- 2 AU) and a mass of 68
+/- 5 M_J based on three observations in the J-, H-, and K_S-band. The spectral
type is estimated to be M7 (~2700 K), and thus no methane absorption is
detected in the H band. Our Pleiades observations result in the detection of
two substellar companions including one previously reported among 20 observed
Pleiades stars, and indicate that the fraction of substellar companions in the
Pleiades is about 10.0 +26.1/-8.8 %. This is consistent with multiplicity
studies of both the Pleiades stars and other open clusters.
Comment: Main text (14 pages, 4 figures, 4 tables), and Supplementary data (8
pages, 3 tables). Accepted for Publications of the Astronomical Society of Japan
QUOTIENT: Two-Party Secure Neural Network Training and Prediction
Recently, there has been a wealth of effort devoted to the design of secure
protocols for machine learning tasks. Much of this is aimed at enabling secure
prediction from highly-accurate Deep Neural Networks (DNNs). However, as DNNs
are trained on data, a key question is how such models can also be trained
securely. The few prior works on secure DNN training have focused either on
designing custom protocols for existing training algorithms, or on developing
tailored training algorithms and then applying generic secure protocols. In
this work, we investigate the advantages of designing training algorithms
alongside a novel secure protocol, incorporating optimizations on both fronts.
We present QUOTIENT, a new method for discretized training of DNNs, along with
a customized secure two-party protocol for it. QUOTIENT incorporates key
components of state-of-the-art DNN training such as layer normalization and
adaptive gradient methods, and improves upon the state-of-the-art in DNN
training in two-party computation. Compared to prior work, we obtain an
improvement of 50X in WAN time and 6% in absolute accuracy.
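The primitive underlying two-party secure computation of this kind is additive secret sharing: each value is split into two random-looking shares, one per party, and linear operations can be done locally on shares. The sketch below shows that generic primitive only, not QUOTIENT's full discretized-training protocol:

```python
import random

# Additive secret sharing over a ring: neither share alone reveals x,
# but the two shares sum to x mod Q.

Q = 2**32  # ring modulus

def share(x):
    """Split x into two random shares that sum to x mod Q."""
    s0 = random.randrange(Q)
    return s0, (x - s0) % Q

def reconstruct(s0, s1):
    return (s0 + s1) % Q

# Each party adds its own shares locally; the sums reconstruct to x + y.
x0, x1 = share(7)
y0, y1 = share(35)
print(reconstruct((x0 + y0) % Q, (x1 + y1) % Q))  # -> 42
```

Multiplications on shares need extra machinery (e.g., precomputed correlated randomness), which is where protocol/algorithm co-design such as QUOTIENT's pays off.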
Declarative Experimentation in Information Retrieval Using PyTerrier
The advent of deep machine learning platforms such as TensorFlow and PyTorch, developed in expressive high-level languages such as Python, has allowed more expressive representations of deep neural network architectures. We argue that such a powerful formalism is missing in information retrieval (IR), and propose a framework called PyTerrier that allows advanced retrieval pipelines to be expressed, and evaluated, in a declarative manner close to their conceptual design. Like the aforementioned frameworks that compile deep learning experiments into primitive GPU operations, our framework targets IR platforms as backends in order to execute and evaluate retrieval pipelines. Further, we can automatically optimise the retrieval pipelines to increase their efficiency to suit a particular IR platform backend. Our experiments, conducted on the TREC Robust and ClueWeb09 test collections, demonstrate the efficiency benefits of these optimisations for retrieval pipelines involving both the Anserini and Terrier IR platforms.
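The declarative style the abstract describes composes retrieval stages into pipelines with an operator, which PyTerrier does via `>>`. The classes below are illustrative stand-ins built from scratch to show how such operator composition works in Python; they are not PyTerrier's actual API:

```python
# Toy illustration of operator-composed retrieval pipelines. A transformer
# wraps a function over results; >> chains two transformers into one.

class Transformer:
    def __init__(self, fn, name):
        self.fn, self.name = fn, name

    def __rshift__(self, other):
        # Compose: feed this stage's output into the next stage.
        return Transformer(lambda x: other.fn(self.fn(x)),
                           f"{self.name} >> {other.name}")

    def __call__(self, x):
        return self.fn(x)

# Hypothetical stages: a retriever producing (query, docid, score) tuples,
# and a re-ranker sorting them by descending score.
retrieve = Transformer(lambda q: [(q, "d1", 1.2), (q, "d2", 0.7)], "BM25")
rerank = Transformer(lambda res: sorted(res, key=lambda r: -r[2]), "rerank")

pipeline = retrieve >> rerank
print(pipeline.name)            # BM25 >> rerank
print(pipeline("query")[0][1])  # -> d1
```

The point of the formalism is that a pipeline built this way is a single inspectable object, which is what lets a framework rewrite or optimise it for a particular backend before execution.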