26 research outputs found
A neural circuit for navigation inspired by C. elegans Chemotaxis
We develop an artificial neural circuit for contour tracking and navigation
inspired by the chemotaxis of the nematode Caenorhabditis elegans. In order to
harness the computational advantages spiking neural networks promise over their
non-spiking counterparts, we develop a network comprising 7 spiking neurons
with non-plastic synapses which we show is extremely robust in tracking a range
of concentrations. Our worm uses information regarding local temporal gradients
in sodium chloride concentration to decide the instantaneous path for foraging,
exploration and tracking. A key neuron pair in the C. elegans chemotaxis
network is the ASEL and ASER pair, which captures the gradient of
concentration sensed by the worm in its graded membrane potentials. The
primary sensory neurons for our network are a pair of artificial spiking
neurons that function as gradient detectors whose design is adapted from a
computational model of the ASE neuron pair in C. elegans. Simulations show that
our worm is able to detect the set-point with approximately four times higher
probability than the optimal memoryless Levy foraging model. We also show that
our spiking neural network is much more efficient and noise-resilient while
navigating and tracking a contour, as compared to an equivalent non-spiking
network. We demonstrate that our model is extremely robust to noise and with
slight modifications can be used for other practical applications such as
obstacle avoidance. Our network model could also be extended for use in
three-dimensional contour tracking or obstacle avoidance.
Generative Compression
Traditional image and video compression algorithms rely on hand-crafted
encoder/decoder pairs (codecs) that lack adaptability and are agnostic to the
data being compressed. Here we describe the concept of generative compression,
the compression of data using generative models, and suggest that it is a
direction worth pursuing to produce more accurate and visually pleasing
reconstructions at much deeper compression levels for both image and video
data. We also demonstrate that generative compression is orders-of-magnitude
more resilient to bit error rates (e.g. from noisy wireless channels) than
traditional variable-length coding schemes.
How Does Batch Normalization Help Optimization?
Batch Normalization (BatchNorm) is a widely adopted technique that enables
faster and more stable training of deep neural networks (DNNs). Despite its
pervasiveness, the exact reasons for BatchNorm's effectiveness are still poorly
understood. The popular belief is that this effectiveness stems from
controlling the change of the layers' input distributions during training to
reduce the so-called "internal covariate shift". In this work, we demonstrate
that such distributional stability of layer inputs has little to do with the
success of BatchNorm. Instead, we uncover a more fundamental impact of
BatchNorm on the training process: it makes the optimization landscape
significantly smoother. This smoothness induces a more predictive and stable
behavior of the gradients, allowing for faster training.
Comment: In NeurIPS'1
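For readers unfamiliar with the operation the abstract discusses, the following is a minimal sketch of the standard BatchNorm forward pass at training time: each feature is normalized by its mean and variance over the batch, then scaled and shifted. It is not code from the paper. The function name `batch_norm` and the defaults for `gamma`, `beta`, and `eps` are illustrative assumptions.

```python
import math


def batch_norm(batch, gamma=None, beta=None, eps=1e-5):
    """Normalize each feature over the batch, then scale and shift.

    batch: list of rows, each row a list of floats (one value per feature).
    gamma, beta: per-feature scale and shift (assumed defaults: 1 and 0).
    eps: small constant for numerical stability (assumed value).
    """
    n = len(batch)
    d = len(batch[0])
    gamma = gamma if gamma is not None else [1.0] * d
    beta = beta if beta is not None else [0.0] * d

    # Per-feature mean and (biased) variance computed over the batch.
    means = [sum(row[j] for row in batch) / n for j in range(d)]
    variances = [
        sum((row[j] - means[j]) ** 2 for row in batch) / n for j in range(d)
    ]

    return [
        [
            gamma[j] * (row[j] - means[j]) / math.sqrt(variances[j] + eps)
            + beta[j]
            for j in range(d)
        ]
        for row in batch
    ]


if __name__ == "__main__":
    out = batch_norm([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
    for row in out:
        print([round(v, 3) for v in row])
```

After this step every feature has roughly zero mean and unit variance within the batch, regardless of its original scale.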
Data Selection for Language Models via Importance Resampling
Selecting a suitable pretraining dataset is crucial for both general-domain
(e.g., GPT-3) and domain-specific (e.g., Codex) language models (LMs). We
formalize this problem as selecting a subset of a large raw unlabeled dataset
to match a desired target distribution given unlabeled target samples. Due to
the scale and dimensionality of the raw text data, existing methods use simple
heuristics or require human experts to manually curate data. Instead, we extend
the classic importance resampling approach used in low dimensions for LM data
selection. We propose Data Selection with Importance Resampling (DSIR), an
efficient and scalable framework that estimates importance weights in a reduced
feature space for tractability and selects data with importance resampling
according to these weights. We instantiate the DSIR framework with hashed
n-gram features for efficiency, enabling the selection of 100M documents from
the full Pile dataset in 4.5 hours. To measure whether hashed n-gram features
preserve the aspects of the data that are relevant to the target, we define KL
reduction, a data metric that measures the proximity between the selected
pretraining data and the target on some feature space. Across 8 data selection
methods (including expert selection), KL reduction on hashed n-gram features
highly correlates with average downstream accuracy (r=0.82). When selecting
data for continued pretraining on a specific domain, DSIR performs comparably
to expert curation across 8 target distributions. When pretraining
general-domain models (target is Wikipedia and books), DSIR improves over
random selection and heuristic filtering baselines by 2-2.5% on the GLUE
benchmark. Code is available at https://github.com/p-lambda/dsir.
Comment: NeurIPS 202
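The pipeline described in the abstract (hashed n-gram features, importance weights estimated in that reduced feature space, then resampling according to the weights) can be sketched as follows. This is a toy illustration under stated assumptions, not the authors' implementation: the function names, the CRC32 hash, the bucket count, the add-one smoothing, and the use of unigrams plus bigrams are all hypothetical choices. Resampling without replacement is done here with the Gumbel top-k trick, one standard way to draw items in proportion to their weights.

```python
import math
import random
import zlib
from collections import Counter

NUM_BUCKETS = 1 << 16  # assumed bucket count, chosen only for illustration


def hashed_ngram_counts(text, num_buckets=NUM_BUCKETS):
    """Count unigrams and bigrams after hashing them into fixed buckets."""
    tokens = text.lower().split()
    ngrams = tokens + [" ".join(pair) for pair in zip(tokens, tokens[1:])]
    counts = Counter()
    for ngram in ngrams:
        counts[zlib.crc32(ngram.encode("utf-8")) % num_buckets] += 1
    return counts


def fit_bucket_log_probs(texts, num_buckets=NUM_BUCKETS, smoothing=1.0):
    """Fit a smoothed bag-of-hashed-n-grams distribution over buckets."""
    totals = [smoothing] * num_buckets
    for text in texts:
        for bucket, count in hashed_ngram_counts(text, num_buckets).items():
            totals[bucket] += count
    norm = sum(totals)
    return [math.log(total / norm) for total in totals]


def log_importance_weight(text, target_log_probs, raw_log_probs):
    """log p_target(features) - log p_raw(features) for one document."""
    counts = hashed_ngram_counts(text, len(target_log_probs))
    return sum(
        count * (target_log_probs[bucket] - raw_log_probs[bucket])
        for bucket, count in counts.items()
    )


def select(raw_texts, target_texts, k, seed=0):
    """Pick k raw documents without replacement via Gumbel top-k."""
    rng = random.Random(seed)
    target_lp = fit_bucket_log_probs(target_texts)
    raw_lp = fit_bucket_log_probs(raw_texts)
    keyed = []
    for text in raw_texts:
        # Clamp u away from 0 and 1 so both logarithms are defined.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        gumbel_noise = -math.log(-math.log(u))
        score = log_importance_weight(text, target_lp, raw_lp) + gumbel_noise
        keyed.append((score, text))
    keyed.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in keyed[:k]]


if __name__ == "__main__":
    target = ["protein binds receptor in cell", "gene expression in cell"]
    raw = [
        "protein binds receptor in cell",
        "football match score tonight",
        "stock prices fell sharply today",
        "gene expression in cell",
    ]
    print(select(raw, target, k=2))
```

Documents whose hashed n-grams are more probable under the target model than under the raw model receive larger log weights, so they are more likely (though not guaranteed) to be selected.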