Balanced Quantization: An Effective and Efficient Approach to Quantized Neural Networks

Authors: He Qinyao; Wang Yuzhi; Wen He; Zhou Shuchang; Zou Yuheng
Publication date: 01/01/2017

Quantized Neural Networks (QNNs), which use low-bitwidth numbers to represent parameters and perform computations, have been proposed to reduce computational complexity, storage size and memory usage. In QNNs, parameters and activations are uniformly quantized, so that multiplications and additions can be accelerated by bitwise operations. However, distributions of parameters in neural networks are often imbalanced, so uniform quantization determined from extremal values may underutilize the available bitwidth. In this paper, we propose a novel quantization method that ensures the distribution of quantized values is balanced. Our method first recursively partitions the parameters by percentiles into balanced bins, and then applies uniform quantization. We also introduce computationally cheaper approximations of percentiles to reduce the computational overhead this introduces. Overall, our method improves the prediction accuracy of QNNs without introducing extra computation during inference, has negligible impact on training speed, and is applicable to both Convolutional Neural Networks and Recurrent Neural Networks. Experiments on standard datasets including ImageNet and Penn Treebank confirm the effectiveness of our method. On ImageNet, the top-5 error rate of our 4-bit quantized GoogLeNet model is 12.7%, which is superior to the state of the art for QNNs.
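The partition-then-quantize idea in this abstract can be illustrated with a small sketch. This is one reading of the abstract, not the authors' implementation: the function names `balanced_quantize` and `uniform_codes` are hypothetical, exact medians are used instead of the cheaper percentile approximations the abstract mentions, and the input is assumed to be a non-empty list with `bits >= 1`.

```python
def balanced_quantize(values, bits):
    """Recursively halve the sorted values `bits` times (median splits),
    giving 2**bits bins with near-equal counts, then map each bin index
    to an evenly spaced level between min(values) and max(values)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    codes = [0] * len(values)

    def split(idx, depth, prefix):
        # `prefix` accumulates the bin index one bit per recursion level.
        if depth == 0:
            for i in idx:
                codes[i] = prefix
            return
        mid = len(idx) // 2  # median position of this partition
        split(idx[:mid], depth - 1, 2 * prefix)
        split(idx[mid:], depth - 1, 2 * prefix + 1)

    split(order, bits, 0)
    lo, hi = min(values), max(values)
    step = (hi - lo) / (2 ** bits - 1)
    return codes, [lo + c * step for c in codes]


def uniform_codes(values, bits):
    """Baseline for contrast: uniform quantization set by the extremal values."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / (2 ** bits - 1)
    if step == 0:
        return [0] * len(values)
    return [round((v - lo) / step) for v in values]
```

For the skewed input `[0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.07, 5.0]` with 2 bits, the uniform baseline puts seven of the eight values into code 0, while the balanced version assigns two values to each of the four codes. The trade-off in this toy version is that the reconstructed values no longer track the original magnitudes closely; how the paper handles scaling is not stated in the abstract.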

arXiv.org e-Print Archive

Understanding and Comparing Scalable Gaussian Process Regression for Big Data

Authors: Cai Jianfei; Liu Haitao; Ong Yew-Soon; Wang Yi
Publication date: 01/01/2018

As a non-parametric Bayesian model that produces an informative predictive distribution, the Gaussian process (GP) has been widely used in various fields, such as regression, classification and optimization. The cubic complexity of the standard GP, however, leads to poor scalability, which poses challenges in the era of big data. Hence, various scalable GPs have been developed in the literature to improve scalability while retaining desirable prediction accuracy. This paper investigates the methodological characteristics and performance of representative global and local scalable GPs, including sparse approximations and local aggregations, from four main perspectives: scalability, capability, controllability and robustness. Numerical experiments on two toy examples and five real-world datasets with up to 250K points offer the following findings. In terms of scalability, most of the scalable GPs have a time complexity that is linear in the training size. In terms of capability, the sparse approximations capture long-term spatial correlations, while the local aggregations capture local patterns but suffer from over-fitting in some scenarios. In terms of controllability, the performance of sparse approximations can be improved simply by increasing the inducing size, but this is not the case for local aggregations. In terms of robustness, local aggregations are robust to various initializations of hyperparameters due to the local attention mechanism. Finally, we highlight that a proper hybrid of global and local scalable GPs may be a promising way to improve both model capability and scalability for big data.

Comment: 25 pages, 15 figures, preprint submitted to KB

arXiv.org e-Print Archive

DR-NTU (Digital Repository of NTU)

Deep-Reinforcement Learning Multiple Access for Heterogeneous Wireless Networks

Authors: Liew Soung Chang; Wang Taotao; Yu Yiding
Publication date: 16/07/2018

This paper investigates the use of deep reinforcement learning (DRL) in a MAC protocol for heterogeneous wireless networking, referred to as Deep-reinforcement Learning Multiple Access (DLMA). The thrust of this work is partially inspired by the vision of DARPA SC2, a 3-year competition in which competitors are to come up with a clean-slate design that "best share spectrum with any network(s), in any environment, without prior knowledge, leveraging on machine-learning technique". Specifically, this paper considers the problem of sharing time slots among multiple time-slotted networks that adopt different MAC protocols. One of the MAC protocols is DLMA. The other two are TDMA and ALOHA. The nodes operating DLMA do not know that the other two MAC protocols are TDMA and ALOHA. Yet, through a series of observations of the environment, its own actions, and the resulting rewards, a DLMA node can learn an optimal MAC strategy to coexist harmoniously with the TDMA and ALOHA nodes according to a specified objective (e.g., the objective could be the sum throughput of all networks, or a general alpha-fairness objective).
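The two objectives named at the end of this abstract can be related with a short sketch. It uses the standard textbook alpha-fair utility, which is an assumption here: the abstract does not give the paper's exact formulation, and the function name `alpha_fair_utility` is hypothetical. Throughputs are assumed to be strictly positive.

```python
import math


def alpha_fair_utility(throughputs, alpha):
    """Sum of the standard alpha-fair utilities over per-network throughputs.

    alpha = 0 reduces to sum throughput; alpha = 1 is the sum of logs
    (proportional fairness); larger alpha weights low throughputs more.
    """
    if alpha < 0:
        raise ValueError("alpha must be non-negative")
    total = 0.0
    for x in throughputs:
        if alpha == 1:
            total += math.log(x)
        else:
            total += x ** (1 - alpha) / (1 - alpha)
    return total
```

With `alpha=0` the function returns the plain sum of the throughputs, so the sum-throughput objective is a special case of the alpha-fairness family under this definition.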

arXiv.org e-Print Archive

Crossref

Practical Block-wise Neural Network Architecture Generation

Authors: Liu Cheng-Lin; Shao Jing; Wu Wei; Yan Junjie; Zhong Zhao
Publication date: 14/05/2018

Convolutional neural networks have achieved remarkable success in computer vision. However, most usable network architectures are hand-crafted and usually require expertise and elaborate design. In this paper, we provide a block-wise network generation pipeline called BlockQNN, which automatically builds high-performance networks using the Q-Learning paradigm with an epsilon-greedy exploration strategy. The optimal network block is constructed by the learning agent, which is trained sequentially to choose component layers. We stack the block to construct the whole auto-generated network. To accelerate the generation process, we also propose a distributed asynchronous framework and an early stop strategy. The block-wise generation brings unique advantages: (1) it achieves competitive results in comparison to the hand-crafted state-of-the-art networks on image classification; additionally, the best network generated by BlockQNN achieves a 3.54% top-1 error rate on CIFAR-10, which beats all existing auto-generated networks; (2) meanwhile, it offers a tremendous reduction of the search space in designing networks, spending only 3 days with 32 GPUs; and (3) moreover, it has strong generalizability, in that the network built on CIFAR also performs well on the larger-scale ImageNet dataset.

Comment: Accepted to CVPR 201
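The Q-Learning paradigm with epsilon-greedy exploration mentioned in this abstract can be sketched in tabular form. This is a generic illustration, not BlockQNN itself: the helper names `epsilon_greedy` and `q_update`, the list-of-lists Q-table, and the default learning rate and discount are all assumptions, and the mapping from states and actions to layer choices is left out.

```python
import random


def epsilon_greedy(q_row, epsilon, rng=random):
    """Pick a random action with probability epsilon, else the best-valued one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))
    return max(range(len(q_row)), key=lambda a: q_row[a])


def q_update(q, state, action, reward, next_state, lr=0.1, gamma=1.0):
    """One tabular Q-learning step.

    `next_state=None` marks a terminal transition, where no future value
    is bootstrapped into the target.
    """
    best_next = max(q[next_state]) if next_state is not None else 0.0
    q[state][action] += lr * (reward + gamma * best_next - q[state][action])
```

In an architecture-search setting, a state would stand for the partially built block and an action for the next component layer, with the reward derived from the trained network's accuracy; those details are specific to the paper and are not reproduced here.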

arXiv.org e-Print Archive

Crossref