A Neural Attention Model for Abstractive Sentence Summarization
Summarization based on text extraction is inherently limited, but
generation-style abstractive methods have proven challenging to build. In this
work, we propose a fully data-driven approach to abstractive sentence
summarization. Our method utilizes a local attention-based model that generates
each word of the summary conditioned on the input sentence. While the model is
structurally simple, it can easily be trained end-to-end and scales to a large
amount of training data. The model shows significant performance gains on the
DUC-2004 shared task compared with several strong baselines.
Comment: Proceedings of EMNLP 2015
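
A minimal, hypothetical sketch in PyTorch of the kind of model described above: each summary word is generated from a softmax over the vocabulary, conditioned on an attention-weighted view of the input sentence. The class name, layer sizes, and single-step interface are illustrative assumptions, not the authors' feed-forward NNLM encoder-decoder.

import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentionSummaryStep(nn.Module):
    """One decoding step: attend over the source sentence, predict the next summary word."""
    def __init__(self, vocab_size, emb_dim=128, hid_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.score = nn.Linear(hid_dim, emb_dim)             # compares decoder state to source embeddings
        self.out = nn.Linear(hid_dim + emb_dim, vocab_size)

    def forward(self, src_ids, dec_state):
        # src_ids: (batch, src_len) input sentence; dec_state: (batch, hid_dim) summary-so-far state
        src = self.embed(src_ids)                            # (batch, src_len, emb_dim)
        attn = torch.bmm(src, self.score(dec_state).unsqueeze(2)).squeeze(2)
        alpha = F.softmax(attn, dim=1)                       # local attention over source positions
        context = torch.bmm(alpha.unsqueeze(1), src).squeeze(1)
        return F.log_softmax(self.out(torch.cat([dec_state, context], dim=1)), dim=1)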
Exact and Scaling Form of the Bipartite Fidelity of the Infinite XXZ Chain
We find an exact expression for the bipartite fidelity f = |<vac|vac>'|^2,
where |vac> is the vacuum eigenstate of an infinite-size antiferromagnetic XXZ
chain and |vac>' is the vacuum eigenstate of an infinite-size XXZ chain which
is split in two. We consider the quantity -ln(f) which has been put forward as
a measure of quantum entanglement, and show that the large correlation length
xi behaviour is consistent with a general conjecture -ln(f) ~ c/8 ln(xi), where
c is the central charge of the UV conformal field theory (with c=1 for the XXZ
chain). This behaviour is a natural extension of the existing conformal field
theory prediction of -ln(f) ~ c/8 ln(L) for a length L bipartite system with
0 << L << xi.
Comment: 6 pages
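
Written out in standard notation, the quantities discussed in the abstract above are

    f = \bigl| \langle \mathrm{vac} \,|\, \mathrm{vac} \rangle' \bigr|^{2},
    \qquad
    -\ln f \;\sim\; \frac{c}{8}\,\ln \xi \quad (\xi \to \infty),
    \qquad
    -\ln f \;\sim\; \frac{c}{8}\,\ln L \quad (0 \ll L \ll \xi),

with c = 1 for the XXZ chain.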
FreezeOut: Accelerate Training by Progressively Freezing Layers
The early layers of a deep neural net have the fewest parameters, but take up
the most computation. In this extended abstract, we propose to only train the
hidden layers for a set portion of the training run, freezing them out
one-by-one and excluding them from the backward pass. Through experiments on
CIFAR, we empirically demonstrate that FreezeOut yields savings of up to 20%
wall-clock time during training with 3% loss in accuracy for DenseNets, a 20%
speedup without loss of accuracy for ResNets, and no improvement for VGG
networks. Our code is publicly available at
https://github.com/ajbrock/FreezeOut
Comment: Extended Abstract
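
A minimal sketch of the freezing mechanism described above, in PyTorch: layers are switched to requires_grad=False one by one so that they drop out of the backward pass. The per-layer schedule t_freeze below is a made-up example, not the paper's learning-rate-annealed schedule; see the linked repository for the actual implementation.

import torch.nn as nn

def freeze_out_step(layers, iteration, t_freeze):
    """Freeze layer i once the training iteration passes its scheduled point t_freeze[i]."""
    for i, layer in enumerate(layers):
        if iteration >= t_freeze[i]:
            for p in layer.parameters():
                p.requires_grad = False   # layer no longer receives gradients (excluded from backward)
            layer.eval()                  # also fix any normalization statistics once frozen

# Example schedule (illustrative only): earlier layers freeze sooner, spread over 80% of training.
layers = [nn.Linear(32, 32) for _ in range(10)]
total_iters = 10_000
t_freeze = [int(0.8 * total_iters * ((i + 1) / len(layers)) ** 3) for i in range(len(layers))]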
Discrete holomorphicity and quantized affine algebras
We consider non-local currents in the context of quantized affine algebras,
following the construction introduced by Bernard and Felder. In the cases of
U_q(A_1^(1)) and U_q(A_2^(2)), these currents can be identified with
configurations in the six-vertex and Izergin--Korepin nineteen-vertex models.
Mapping these to their corresponding Temperley--Lieb loop models, we directly
identify non-local currents with discretely holomorphic loop observables. In
particular, we show that the bulk discrete holomorphicity relation and its
recently derived boundary analogue are equivalent to conservation laws for
non-local currents.
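
For context (this form is standard in the loop-model literature and is not quoted from the abstract above), the bulk discrete holomorphicity relation for a loop observable F states that its discrete contour integral around every plaquette with corners z_1, ..., z_4 (indices mod 4) vanishes:

    \sum_{j=1}^{4} \left( z_{j+1} - z_{j} \right) F\!\left( \tfrac{z_{j} + z_{j+1}}{2} \right) \;=\; 0,

a lattice analogue of \oint F(z)\, dz = 0 for a holomorphic function.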
Generative and Discriminative Voxel Modeling with Convolutional Neural Networks
When working with three-dimensional data, choice of representation is key. We
explore voxel-based models, and present evidence for the viability of
voxellated representations in applications including shape modeling and object
classification. Our key contributions are methods for training voxel-based
variational autoencoders, a user interface for exploring the latent space
learned by the autoencoder, and a deep convolutional neural network
architecture for object classification. We address challenges unique to
voxel-based representations, and empirically evaluate our models on the
ModelNet benchmark, where we demonstrate a 51.5% relative improvement in the
state of the art for object classification.
Comment: 9 pages, 5 figures, 2 tables
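
For illustration, a minimal voxel-based VAE encoder along the lines sketched above: a stack of 3D convolutions over an occupancy grid followed by a reparameterized latent sample. The input resolution (32^3), channel widths, and latent size are assumptions, not the paper's architecture; see the authors' code for the real model.

import torch
import torch.nn as nn

class VoxelVAEEncoder(nn.Module):
    def __init__(self, latent_dim=64):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv3d(1, 8, 3, stride=2, padding=1), nn.ELU(),    # 32^3 -> 16^3
            nn.Conv3d(8, 16, 3, stride=2, padding=1), nn.ELU(),   # 16^3 -> 8^3
            nn.Conv3d(16, 32, 3, stride=2, padding=1), nn.ELU(),  # 8^3 -> 4^3
        )
        self.fc_mu = nn.Linear(32 * 4 * 4 * 4, latent_dim)
        self.fc_logvar = nn.Linear(32 * 4 * 4 * 4, latent_dim)

    def forward(self, voxels):                                    # voxels: (batch, 1, 32, 32, 32)
        h = self.conv(voxels).flatten(1)
        mu, logvar = self.fc_mu(h), self.fc_logvar(h)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()      # reparameterization trick
        return z, mu, logvar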
SMASH: One-Shot Model Architecture Search through HyperNetworks
Designing architectures for deep neural networks requires expert knowledge
and substantial computation time. We propose a technique to accelerate
architecture selection by learning an auxiliary HyperNet that generates the
weights of a main model conditioned on that model's architecture. By comparing
the relative validation performance of networks with HyperNet-generated
weights, we can effectively search over a wide range of architectures at the
cost of a single training run. To facilitate this search, we develop a flexible
mechanism based on memory read-writes that allows us to define a wide range of
network connectivity patterns, with ResNet, DenseNet, and FractalNet blocks as
special cases. We validate our method (SMASH) on CIFAR-10 and CIFAR-100,
STL-10, ModelNet10, and Imagenet32x32, achieving competitive performance with
similarly-sized hand-designed networks. Our code is available at
https://github.com/ajbrock/SMASH
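
A toy sketch of the one-shot idea described above: an auxiliary HyperNet maps an architecture encoding to the weights of a small main model, and candidate architectures are then ranked by validation loss under those generated weights. The encoding, the variable-width main model, and all sizes are illustrative assumptions; the paper's memory read-write connectivity scheme is not reproduced here.

import torch
import torch.nn as nn
import torch.nn.functional as F

IN_DIM, HID_MAX, OUT_DIM, ENC_DIM = 16, 32, 10, 8

class HyperNet(nn.Module):
    """Generates weights for a one-hidden-layer main model from an architecture encoding."""
    def __init__(self):
        super().__init__()
        n_weights = HID_MAX * IN_DIM + OUT_DIM * HID_MAX
        self.net = nn.Sequential(nn.Linear(ENC_DIM, 128), nn.ReLU(), nn.Linear(128, n_weights))

    def forward(self, enc):
        w = self.net(enc)
        w1 = w[:HID_MAX * IN_DIM].view(HID_MAX, IN_DIM)
        w2 = w[HID_MAX * IN_DIM:].view(OUT_DIM, HID_MAX)
        return w1, w2

def main_model(x, w1, w2, hidden_units):
    """Main model whose 'architecture' is the number of hidden units actually used."""
    h = F.relu(F.linear(x, w1[:hidden_units]))
    return F.linear(h, w2[:, :hidden_units])

# Rank candidate architectures by validation loss with HyperNet-generated weights.
hyper = HyperNet()
x_val, y_val = torch.randn(64, IN_DIM), torch.randint(0, OUT_DIM, (64,))
for hidden_units in (8, 16, 32):
    enc = torch.zeros(ENC_DIM)
    enc[:hidden_units // 4] = 1.0                      # toy encoding of the candidate architecture
    w1, w2 = hyper(enc)
    loss = F.cross_entropy(main_model(x_val, w1, w2, hidden_units), y_val)
    print(hidden_units, loss.item())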