
A Universal Parallel Two-Pass MDL Context Tree Compression Algorithm

Authors: Dror Baron, Nikhil Krishnan
Publication date: 21/03/2015

Computing problems that handle large amounts of data necessitate the use of lossless data compression for efficient storage and transmission. We present a novel lossless universal data compression algorithm that uses parallel computational units to increase the throughput. The length-$N$ input sequence is partitioned into $B$ blocks. Processing each block independently of the other blocks can accelerate the computation by a factor of $B$, but degrades the compression quality. Instead, our approach is to first estimate the minimum description length (MDL) context tree source underlying the entire input, and then encode each of the $B$ blocks in parallel based on the MDL source. With this two-pass approach, the compression loss incurred by using more parallel units is insignificant. Our algorithm is work-efficient, i.e., its computational complexity is $O(N/B)$. Its redundancy is approximately $B\log(N/B)$ bits above Rissanen's lower bound on universal compression performance, with respect to any context tree source whose maximal depth is at most $\log(N/B)$. We improve the compression by using different quantizers for states of the context tree based on the number of symbols corresponding to those states. Numerical results from a prototype implementation suggest that our algorithm offers a better trade-off between compression and throughput than competing universal data compression algorithms.

Comment: Accepted to Journal of Selected Topics in Signal Processing special issue on Signal Processing for Big Data (expected publication date June 2015). 10 pages double column, 6 figures, and 2 tables. arXiv admin note: substantial text overlap with arXiv:1405.6322. Version: Mar 2015: Corrected a typ
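The two-pass idea in this abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm: it swaps MDL context tree estimation for a fixed-depth context model, uses add-1/2 smoothing, and reports ideal code lengths in bits rather than running an arithmetic coder. All function names, the `depth` parameter, and the use of threads are assumptions made for illustration. Pass 1 runs sequentially here for simplicity, and the first `depth` symbols of each block are left uncoded.

```python
import math
from collections import Counter, defaultdict
from concurrent.futures import ThreadPoolExecutor


def split_blocks(seq, num_blocks):
    """Partition seq into num_blocks contiguous blocks of near-equal length."""
    n = len(seq)
    bounds = [i * n // num_blocks for i in range(num_blocks + 1)]
    return [seq[bounds[i]:bounds[i + 1]] for i in range(num_blocks)]


def context_counts(seq, depth):
    """Pass 1: count next-symbol occurrences for every length-`depth` context."""
    counts = defaultdict(Counter)
    for i in range(depth, len(seq)):
        counts[seq[i - depth:i]][seq[i]] += 1
    return counts


def block_code_length(block, counts, depth, alphabet_size):
    """Pass 2: ideal code length in bits of one block under the shared model."""
    bits = 0.0
    for i in range(depth, len(block)):
        ctx = counts.get(block[i - depth:i], Counter())
        total = sum(ctx.values())
        # Add-1/2 smoothing keeps unseen symbols at non-zero probability.
        p = (ctx[block[i]] + 0.5) / (total + 0.5 * alphabet_size)
        bits -= math.log2(p)
    return bits


def two_pass_code_lengths(seq, num_blocks, depth=1):
    """Estimate one model from the whole input, then score each block with it."""
    counts = context_counts(seq, depth)  # pass 1 over the entire input
    alphabet_size = len(set(seq))
    blocks = split_blocks(seq, num_blocks)
    # Pass 2: blocks only read the shared counts, so they can run concurrently.
    with ThreadPoolExecutor(max_workers=num_blocks) as pool:
        return list(pool.map(
            lambda b: block_code_length(b, counts, depth, alphabet_size),
            blocks))
```

Calling `two_pass_code_lengths("ab" * 50, 4)` returns one code length per block. Because every block is scored against statistics gathered from the full input, increasing the number of blocks does not shrink the data behind each probability estimate, which is the intuition behind the small compression loss claimed in the abstract.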

arXiv.org e-Print Archive

Deep Active Learning for Named Entity Recognition

Authors: Animashree Anandkumar, Yakov Kronrod, Zachary C. Lipton, Yanyao Shen, Hyokun Yun
Publication date: 03/02/2018

Deep learning has yielded state-of-the-art performance on many natural language processing tasks including named entity recognition (NER). However, this typically requires large amounts of labeled data. In this work, we demonstrate that the amount of labeled training data can be drastically reduced when deep learning is combined with active learning. While active learning is sample-efficient, it can be computationally expensive since it requires iterative retraining. To speed this up, we introduce a lightweight architecture for NER, viz., the CNN-CNN-LSTM model consisting of convolutional character and word encoders and a long short-term memory (LSTM) tag decoder. The model achieves nearly state-of-the-art performance on standard datasets for the task while being computationally much more efficient than best performing models. We carry out incremental active learning during the training process, and are able to nearly match state-of-the-art performance with just 25% of the original training data.
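A pool-based active learning loop of the kind described above can be sketched as follows. This is a toy illustration, not the paper's method: the `CentroidModel` class is a hypothetical 1-D stand-in for the CNN-CNN-LSTM tagger, least-confidence sampling is assumed as the selection rule, and the model is refit from scratch each round rather than trained incrementally.

```python
import math


def least_confident(probabilities, k):
    """Return indices of the k items whose top predicted probability is lowest."""
    ranked = sorted(range(len(probabilities)),
                    key=lambda i: max(probabilities[i]))
    return ranked[:k]


class CentroidModel:
    """Hypothetical toy classifier: nearest class centroid on 1-D inputs."""

    def fit(self, xs, ys):
        self.centroids = {}
        for label in set(ys):
            pts = [x for x, y in zip(xs, ys) if y == label]
            self.centroids[label] = sum(pts) / len(pts)
        return self

    def predict_proba(self, x):
        # Softmax over negative distances to each class centroid.
        labels = sorted(self.centroids)
        scores = [math.exp(-abs(x - self.centroids[l])) for l in labels]
        z = sum(scores)
        return [s / z for s in scores]


def active_learning(pool_x, oracle, seed_idx, rounds, batch):
    """Repeatedly label the pool items the current model is least sure about."""
    labeled = list(seed_idx)
    unlabeled = [i for i in range(len(pool_x)) if i not in set(seed_idx)]
    model = CentroidModel()
    for _ in range(rounds):
        model.fit([pool_x[i] for i in labeled],
                  [oracle(pool_x[i]) for i in labeled])
        probs = [model.predict_proba(pool_x[i]) for i in unlabeled]
        picked = [unlabeled[j] for j in least_confident(probs, batch)]
        labeled.extend(picked)  # the oracle is queried for these items next
        unlabeled = [i for i in unlabeled if i not in set(picked)]
    # Final fit on everything labeled so far.
    model.fit([pool_x[i] for i in labeled],
              [oracle(pool_x[i]) for i in labeled])
    return model, labeled
```

With a pool of evenly spaced points and a threshold oracle, the loop first requests labels for the points closest to the decision boundary, which is why such a scheme can reach good accuracy with a fraction of the labels.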

arXiv.org e-Print Archive

A Parallel Two-Pass MDL Context Tree Algorithm for Universal Source Coding

Authors: Dror Baron, Nikhil Krishnan, Mehmet Kıvanç Mıhçak
Publication date: 01/01/2014

We present a novel lossless universal source coding algorithm that uses parallel computational units to increase the throughput. The length-$N$ input sequence is partitioned into $B$ blocks. Processing each block independently of the other blocks can accelerate the computation by a factor of $B$, but degrades the compression quality. Instead, our approach is to first estimate the minimum description length (MDL) source underlying the entire input, and then encode each of the $B$ blocks in parallel based on the MDL source. With this two-pass approach, the compression loss incurred by using more parallel units is insignificant. Our algorithm is work-efficient, i.e., its computational complexity is $O(N/B)$. Its redundancy is approximately $B\log(N/B)$ bits above Rissanen's lower bound on universal coding performance, with respect to any tree source whose maximal depth is at most $\log(N/B)$.

arXiv.org e-Print Archive

CiteSeerX

Crossref

Closed-loop estimation of retinal network sensitivity reveals signature of efficient coding

Authors: Ulisse Ferrari, Christophe Gardella, Olivier Marre, Thierry Mora
Publication venue: Society for Neuroscience
Publication date: 23/01/2017

According to the theory of efficient coding, sensory systems are adapted to represent natural scenes with high fidelity and at minimal metabolic cost. Testing this hypothesis for sensory structures performing non-linear computations on high dimensional stimuli is still an open challenge. Here we develop a method to characterize the sensitivity of the retinal network to perturbations of a stimulus. Using closed-loop experiments, we explore selectively the space of possible perturbations around a given stimulus. We then show that the response of the retinal population to these small perturbations can be described by a local linear model. Using this model, we computed the sensitivity of the neural response to arbitrary temporal perturbations of the stimulus, and found a peak in the sensitivity as a function of the frequency of the perturbations. Based on a minimal theory of sensory processing, we argue that this peak is set to maximize information transmission. Our approach is relevant to testing the efficient coding hypothesis locally in any context where no reliable encoding model is known.
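To see how a linear model can yield a sensitivity peak at an intermediate frequency, the sketch below computes the gain of a single temporal filter at each frequency. It is only a loose illustration: the abstract does not define its sensitivity measure, so the gain of a linear filter is used as an assumed proxy, and the biphasic kernel with its `tau_fast` and `tau_slow` time constants is hypothetical rather than fitted to any retinal data.

```python
import cmath
import math


def gain_spectrum(kernel):
    """Magnitude of the discrete Fourier transform for frequencies 0..n/2."""
    n = len(kernel)
    gains = []
    for k in range(n // 2 + 1):
        coeff = sum(kernel[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n))
        gains.append(abs(coeff))
    return gains


def biphasic_kernel(n, tau_fast=2.0, tau_slow=4.0):
    """Hypothetical filter: a fast positive lobe minus a slower negative lobe."""
    return [math.exp(-t / tau_fast) / tau_fast
            - math.exp(-t / tau_slow) / tau_slow
            for t in range(n)]


def peak_frequency_index(gains):
    """Index of the frequency bin with the largest gain."""
    return max(range(len(gains)), key=gains.__getitem__)
```

For example, `peak_frequency_index(gain_spectrum(biphasic_kernel(64)))` falls strictly between the zero-frequency bin and the highest bin, because the two lobes partly cancel for slow perturbations while fast perturbations are averaged out by the filter.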

arXiv.org e-Print Archive

Hal-Diderot