Fading of collective attention shapes the evolution of linguistic variants
Language change involves the competition between alternative linguistic forms
(1). The spontaneous evolution of these forms typically results in monotonic
growth or decay (2, 3), as in winner-take-all attractor behavior. In the
case of the Spanish past subjunctive, the spontaneous evolution of its two
competing forms (ending in -ra and -se) was perturbed by the appearance of the
Royal Spanish Academy in 1713, which enforced the spelling of both forms as
perfectly interchangeable variants (4), at a moment in which the -ra form was
dominant (5). Time series extracted from a massive corpus of books (6) reveal
that this regulation in fact produced a transient renewed interest in the old
form -se which, once faded, left the -ra again as the dominant form up to the
present day. We show that time series are successfully explained by a
two-dimensional linear model that integrates an imitative and a novelty
component. The model reveals that the temporal scale over which collective
attention fades is in inverse proportion to the verb frequency. The integration
of the two basic mechanisms of imitation and attention to novelty makes it
possible to understand diverse competing objects, with lifetimes that range from hours for
memes and news (7, 8) to decades for verbs, suggesting the existence of a
general mechanism underlying cultural evolution.
Comment: 8 pages, 2 figures, 3 supplementary figures
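The imitation-plus-novelty dynamics described above can be illustrated with a toy two-dimensional linear model (a sketch with invented rate constants, not the paper's fitted model): the variant's share relaxes toward a collective-attention level that itself decays at a rate proportional to verb frequency.

```python
def simulate(freq, x0=0.3, a0=1.0, k_imit=0.05, k_att=0.02, steps=300, dt=1.0):
    """Toy 2-D linear model (illustrative only): x is the share of the -se
    variant, a is the collective-attention level that boosts it.  Attention
    fades at rate k_att * freq, i.e. on a time scale inversely proportional
    to verb frequency, as the abstract describes."""
    x, a = x0, a0
    trajectory = []
    for _ in range(steps):
        dx = k_imit * (a - x)        # imitation pulls the share toward a
        da = -k_att * freq * a       # attention to the novelty decays
        x, a = x + dt * dx, a + dt * da
        trajectory.append(x)
    return trajectory

# A transient revival of the -se share that fades back down, and fades
# faster (peaking lower) for a more frequent verb.
low = simulate(freq=1.0)
high = simulate(freq=4.0)
```

With these hypothetical constants the share rises transiently and then decays back, and the revival is shorter-lived for the more frequent verb, mirroring the reported inverse relation between attention lifetime and frequency.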
Where Graph Topology Matters: The Robust Subgraph Problem
Robustness is a critical measure of the resilience of large networked
systems, such as transportation and communication networks. Most prior works
focus on the global robustness of a given graph at large, e.g., by measuring
its overall vulnerability to external attacks or random failures. In this
paper, we turn attention to local robustness and pose a novel problem along the
lines of subgraph mining: given a large graph, how can we find its most robust
local subgraph (RLS)?
We define a robust subgraph as a subset of nodes with high communicability
among them, and formulate the RLS-PROBLEM of finding a subgraph of given size
with maximum robustness in the host graph. Our formulation is related to the
recently proposed general framework for the densest subgraph problem, but
differs from it substantially in that, besides the number of edges in the
subgraph, robustness also depends on the placement of edges, i.e., the
subgraph topology. We show that the RLS-PROBLEM is NP-hard and propose two
heuristic algorithms based on top-down and bottom-up search strategies.
Further, we present modifications of our algorithms to handle three practical
variants of the RLS-PROBLEM. Experiments on synthetic and real-world graphs
demonstrate that we find subgraphs with larger robustness than the densest
subgraphs even at lower densities, suggesting that the existing approaches are
not suitable for the new problem setting.
Comment: 13 pages, 10 Figures, 3 Tables, to appear at SDM 2015 (9 pages only
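A common robustness measure in this line of work is natural connectivity (the log of the average eigenvalue exponential of the adjacency matrix), which rewards an abundance of closed walks, i.e. communicability. The top-down peeling idea can then be sketched as follows; this is a minimal illustration under that assumed measure, not the paper's algorithm:

```python
import numpy as np
from itertools import combinations

def natural_connectivity(A):
    """log of the mean exponential of adjacency eigenvalues; sensitive to
    edge placement (topology), not just edge count."""
    lam = np.linalg.eigvalsh(A)
    return np.log(np.mean(np.exp(lam)))

def greedy_rls(A, s):
    """Top-down heuristic sketch: repeatedly peel off the node whose removal
    hurts natural connectivity least, until s nodes remain."""
    nodes = list(range(A.shape[0]))
    while len(nodes) > s:
        def robustness_without(v):
            keep = [u for u in nodes if u != v]
            return natural_connectivity(A[np.ix_(keep, keep)])
        nodes.remove(max(nodes, key=robustness_without))
    return nodes

# A 4-clique with a 3-node path attached: the peel should keep the clique,
# whose dense closed-walk structure dominates the eigenvalue exponentials.
A = np.zeros((7, 7))
for i, j in combinations(range(4), 2):
    A[i, j] = A[j, i] = 1.0
for i, j in [(3, 4), (4, 5), (5, 6)]:
    A[i, j] = A[j, i] = 1.0
subgraph = sorted(greedy_rls(A, 4))
```

The example shows why topology matters: the path edges contribute as much edge count per node as a sparse dense-subgraph candidate might, but far fewer closed walks, so the clique wins.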
Evaluating Sequence-to-Sequence Models for Handwritten Text Recognition
Encoder-decoder models have become an effective approach for sequence
learning tasks like machine translation, image captioning and speech
recognition, but have yet to show competitive results for handwritten text
recognition. To address this, we propose an attention-based sequence-to-sequence
model. It combines a convolutional neural network as a generic feature
extractor with a recurrent neural network to encode both the visual
information, as well as the temporal context between characters in the input
image, and uses a separate recurrent neural network to decode the actual
character sequence. We make experimental comparisons between various attention
mechanisms and positional encodings, in order to find an appropriate alignment
between the input and output sequence. The model can be trained end-to-end and
the optional integration of a hybrid loss allows the encoder to retain an
interpretable and usable output, if desired. We achieve competitive results on
the IAM and ICFHR2016 READ data sets compared to the state-of-the-art without
the use of a language model, and we significantly improve over recent
sequence-to-sequence approaches.
Comment: 8 pages, 1 figure, 8 tables
Low-Latency Sequence-to-Sequence Speech Recognition and Translation by Partial Hypothesis Selection
Encoder-decoder models provide a generic architecture for
sequence-to-sequence tasks such as speech recognition and translation. While
offline systems are often evaluated on quality metrics like word error rates
(WER) and BLEU, latency is also a crucial factor in many practical use-cases.
We propose three latency reduction techniques for chunk-based incremental
inference and evaluate their efficiency in terms of accuracy-latency trade-off.
On the 300-hour How2 dataset, we reduce latency by 83% to 0.8 seconds by
sacrificing 1% WER (6% rel.) compared to offline transcription. Although our
experiments use the Transformer, the hypothesis selection strategies are
applicable to other encoder-decoder models. To avoid expensive re-computation,
we use a unidirectionally-attending encoder. After an adaptation procedure to
partial sequences, the unidirectional model performs on-par with the original
model. We further show that our approach is also applicable to low-latency
speech translation. On How2 English-Portuguese speech translation, we reduce
latency to 0.7 seconds (-84% rel.) while incurring a loss of 2.4 BLEU points
(5% rel.) compared to the offline system.
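One simple member of the hypothesis-selection family the abstract refers to (a sketch, not necessarily the paper's exact strategy) is to commit only the prefix on which all current beam hypotheses agree, since tokens the whole beam agrees on are unlikely to be revised when more audio arrives:

```python
def stable_prefix(hypotheses):
    """Return the longest common prefix of the beam hypotheses for the
    current chunk.  Only these tokens are emitted to the user now, trading
    a little latency for stability of the incremental output."""
    if not hypotheses:
        return []
    prefix = list(hypotheses[0])
    for hyp in hypotheses[1:]:
        k = 0
        while k < min(len(prefix), len(hyp)) and prefix[k] == hyp[k]:
            k += 1
        prefix = prefix[:k]
    return prefix

committed = stable_prefix([["the", "cat", "sat", "on"],
                           ["the", "cat", "is"],
                           ["the", "cat", "sat"]])
```

The remaining, disagreeing suffixes are re-decoded when the next chunk of audio is processed.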
Non-Monotonic Sequential Text Generation
Standard sequential generation methods assume a pre-specified generation
order, such as text generation methods which generate words from left to right.
In this work, we propose a framework for training models of text generation
that operate in non-monotonic orders; the model directly learns good orders,
without any additional annotation. Our framework operates by generating a word
at an arbitrary position, and then recursively generating words to its left and
then words to its right, yielding a binary tree. Learning is framed as
imitation learning, including a coaching method which moves from imitating an
oracle to reinforcing the policy's own preferences. Experimental results
demonstrate that using the proposed method, it is possible to learn policies
which generate text without pre-specifying a generation order, while achieving
competitive performance with conventional left-to-right generation.
Comment: ICML 201
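The binary-tree generation order can be made concrete with a toy oracle that always emits the middle word first (in the paper the position choice is a learned policy trained by imitation, not this fixed rule):

```python
def generate(words):
    """Emit one word at an arbitrary position (here: the middle), then
    recursively generate the words to its left and to its right, yielding a
    binary tree.  An empty span corresponds to the end-of-branch token."""
    if not words:
        return None
    mid = len(words) // 2
    return {"word": words[mid],
            "left": generate(words[:mid]),
            "right": generate(words[mid + 1:])}

def inorder(tree):
    """In-order traversal of the generation tree reads off the final text."""
    if tree is None:
        return []
    return inorder(tree["left"]) + [tree["word"]] + inorder(tree["right"])

sentence = "a b c d e".split()
tree = generate(sentence)
```

Any policy that fills positions in some order induces such a tree, and the in-order traversal always recovers the generated sentence, which is what lets the order itself be learned.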
Enhancing Monotonic Multihead Attention for Streaming ASR
We investigate a monotonic multihead attention (MMA) by extending hard
monotonic attention to Transformer-based automatic speech recognition (ASR) for
online streaming applications. For streaming inference, all monotonic attention
(MA) heads should learn proper alignments because the next token is not
generated until all heads detect the corresponding token boundaries. However,
we found that not all MA heads learn alignments with a naïve implementation. To
encourage every head to learn alignments properly, we propose HeadDrop
regularization by masking out a part of heads stochastically during training.
Furthermore, we propose to prune redundant heads to improve consensus among
heads for boundary detection and prevent delayed token generation caused by
such heads. Chunkwise attention on each MA head is extended to the multihead
counterpart. Finally, we propose head-synchronous beam search decoding to
guarantee stable streaming inference.
Comment: Accepted to Interspeech 202
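The HeadDrop idea of stochastically masking heads during training can be sketched as follows (a NumPy stand-in with assumed tensor shapes; the paper applies this inside a Transformer ASR model):

```python
import numpy as np

def head_drop(head_outputs, p=0.5, rng=None, training=True):
    """HeadDrop-style regularization (sketch, not the paper's exact code):
    zero out each attention head independently with probability p during
    training, so no single head can rely on the others to carry the
    alignment.  head_outputs has shape (batch, n_heads, time, dim)."""
    if not training or p == 0.0:
        return head_outputs
    rng = rng or np.random.default_rng()
    keep = (rng.random(head_outputs.shape[1]) >= p).astype(float)
    return head_outputs * keep[None, :, None, None]  # broadcast per-head mask
```

As with dropout, the mask is resampled every step and disabled at inference, when all heads must detect token boundaries on their own.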
SCRAM: Spatially Coherent Randomized Attention Maps
Attention mechanisms and non-local mean operations in general are key
ingredients in many state-of-the-art deep learning techniques. In particular,
the Transformer model based on multi-head self-attention has recently achieved
great success in natural language processing and computer vision. However, the
vanilla algorithm computing the Transformer of an image with n pixels has
O(n^2) complexity, which is often painfully slow and sometimes prohibitively
expensive for large-scale image data. In this paper, we propose a fast
randomized algorithm, SCRAM, that requires only O(n log n) time to
produce an image attention map. Such a dramatic acceleration is attributed to
our insight that attention maps on real-world images usually exhibit (1)
spatial coherence and (2) sparse structure. The central idea of SCRAM is to
employ PatchMatch, a randomized correspondence algorithm, to quickly pinpoint
the most compatible key (argmax) for each query first, and then exploit that
knowledge to design a sparse approximation to non-local mean operations. Using
the argmax (mode) to dynamically construct the sparse approximation
distinguishes our algorithm from all existing sparse approximation methods
and makes it very efficient. Moreover, SCRAM is a broadly applicable
approximation to any non-local mean layer in contrast to some other sparse
approximations that can only approximate self-attention. Our preliminary
experimental results suggest that SCRAM is indeed promising for speeding up or
scaling up the computation of attention maps in the Transformer.
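The core sparse approximation, stripped of the PatchMatch acceleration, is to let each query attend only to its most compatible key; SCRAM's contribution is locating that argmax quickly by exploiting spatial coherence, whereas this sketch computes it exactly in O(n^2) for clarity:

```python
import numpy as np

def top1_attention(Q, K, V):
    """Mode-based sparse attention (sketch): each query copies the value of
    its single most compatible key (the argmax that SCRAM pinpoints with
    PatchMatch).  Full attention would instead softmax over all n keys."""
    scores = Q @ K.T              # (n_queries, n_keys) compatibilities
    best = scores.argmax(axis=1)  # the "mode" per query
    return V[best]

# With orthogonal queries/keys, each query retrieves exactly its own value.
out = top1_attention(np.eye(2), np.eye(2),
                     np.array([[10.0, 0.0], [0.0, 20.0]]))
```

In the full method the argmax seeds a sparse neighborhood of keys rather than a single one, giving a sparse approximation to any non-local mean layer.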
Neutron star collapse and gravitational waves with a non-convex equation of state
The thermodynamical properties of the equation of state (EoS) of high-density
matter (above nuclear saturation density) and the possible existence of exotic
states such as phase transitions from nuclear/hadronic matter into quark-gluon
plasma, or the appearance of hyperons, may critically influence the stability
and dynamics of compact relativistic stars. From a theoretical point of view,
establishing the existence of those states requires the analysis of the
`convexity' of the EoS. We show indications of the existence of regions in the
dense-matter EoS where the thermodynamics may be non-convex as a result of a
non-monotonic dependence of the sound speed on the rest-mass density. When
this happens, non-conventional dynamics may develop. In this paper we
investigate the effects of a phenomenological, non-convex EoS on the
equilibrium structure of stable compact stars and on the dynamics of unstable
neutron stars that collapse gravitationally to black holes, both for
spherically symmetric and uniformly-rotating configurations. We show how the
dynamics of the collapse with a non-convex EoS departs from the convex case,
leaving distinctive imprints on the gravitational waveforms. The astrophysical
significance of these results for microphysical EoSs is discussed.
Comment: 29 pages, 22 figures, Accepted by MNRAS on January 24th, 2019. The
author order has changed with respect to the previous arXiv version.
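For reference, a standard way to formalize the `convexity' of an EoS (not necessarily the exact criterion used in the paper) is the fundamental derivative of gas dynamics, which becomes negative precisely when the sound speed falls steeply enough with density:

```latex
% Fundamental derivative of gas dynamics; the EoS is non-convex in regions
% where \mathcal{G} < 0, i.e. where c_s decreases fast enough as \rho grows.
\mathcal{G} = 1 + \frac{\rho}{c_s}
    \left( \frac{\partial c_s}{\partial \rho} \right)_{\! s}
```

Here $c_s$ is the sound speed, $\rho$ the rest-mass density, and the derivative is taken at constant entropy; $\mathcal{G} < 0$ is what admits the non-conventional wave dynamics mentioned above.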
Globally Optimal Distributed Power Control for Nonconcave Utility Maximization
Transmit power control in wireless networks has long been recognized as an
effective mechanism to mitigate co-channel interference. Due to the highly
non-convex nature of the problem, optimal power control is known to be
difficult to achieve if
a system utility is to be maximized. To date, there does not yet exist a
distributed power control algorithm that maximizes any form of system utility,
despite the importance of distributed implementation for infrastructureless
wireless networks such as ad hoc and sensor networks. This paper
fills this gap by developing a Gibbs Sampling based Asynchronous distributed
power control algorithm (referred to as GLAD). The proposed algorithm quickly
converges to the global optimal solution regardless of the concavity,
continuity, differentiability and monotonicity of the utility function. As with
other existing distributed power control algorithms, GLAD requires extensive
message passing among all users in the network, which leads to high signaling
overhead and high processing complexity. To address this issue, this paper
further proposes a variant of the GLAD algorithm, referred to as I-GLAD, where
the prefix "I" stands for infrequent message passing. The convergence of I-GLAD
can be proved regardless of the reduction in the message passing rate. To
further reduce the processing complexity at each transmitter, we develop an
enhanced version of I-GLAD, referred to as NI-GLAD, where only the control
messages from the neighboring links are processed. Our simulation results show
that I-GLAD approximately converges to the global optimal solution regardless
of the type of the system utility function. Meanwhile, the optimality of the
solution obtained by NI-GLAD depends on the selection of the neighborhood size.
Comment: 30 pages, 3 tables, and 9 figures
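The Gibbs-sampling core of such an algorithm can be sketched as a single asynchronous update (hypothetical function and parameter names; the utility and temperature schedule here are placeholders, not GLAD's specifics): a user resamples its power level from a Boltzmann distribution over its discrete choices, which requires no concavity, continuity, or monotonicity of the utility.

```python
import numpy as np

def gibbs_power_update(i, powers, levels, utility, T, rng):
    """One asynchronous Gibbs update: user i resamples its transmit power
    from a Boltzmann distribution over the discrete levels.  As the
    temperature T is annealed toward 0, the chain concentrates on
    high-utility configurations regardless of the utility's shape."""
    u = np.array([utility(np.concatenate([powers[:i], [p], powers[i + 1:]]))
                  for p in levels])
    probs = np.exp((u - u.max()) / T)   # shift by max for numerical safety
    probs /= probs.sum()
    powers[i] = rng.choice(levels, p=probs)
    return powers

levels = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
powers = np.zeros(3)
utility = lambda p: -np.sum((p - 2.0) ** 2)   # toy placeholder utility
powers = gibbs_power_update(0, powers, levels, utility, T=1e-3,
                            rng=np.random.default_rng(0))
```

At the low temperature used here the update is nearly deterministic, picking the utility-maximizing level; higher temperatures keep enough randomness to escape local optima.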
Background suppressing Gabor energy filtering
In the field of facial emotion recognition, early research advanced with the use of Gabor filters. However, these filters lack generalization and result in an undesirably large feature vector. In recent work, more attention has been given to other local appearance features. Two desired characteristics in a facial appearance feature are generalization capability and compactness of representation. In this paper, we propose a novel texture feature inspired by Gabor energy filters, called background suppressing Gabor energy filtering. The feature has a generalization component that removes background texture. It has a reduced feature vector size due to maximal representation and soft orientation histograms, and it is a white-box representation. We demonstrate improved performance on the non-trivial Audio/Visual Emotion Challenge 2012 grand-challenge dataset by a factor of 7.17 over the Gabor filter on the development set. We also demonstrate the applicability of our approach beyond facial emotion recognition, yielding an improved classification rate over the Gabor filter on four bioimaging datasets by an average of 8.22%.
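The Gabor energy response the feature builds on combines a quadrature pair of filters so the magnitude is insensitive to local phase; a minimal 1-D NumPy sketch (the paper uses 2-D filters over image orientations):

```python
import numpy as np

def gabor_energy_1d(signal, freq, sigma, n=31):
    """Gabor energy: convolve with the even (cosine) and odd (sine) members
    of a quadrature pair under a Gaussian envelope, then take the magnitude,
    giving a phase-invariant response that peaks for structure at the tuned
    frequency."""
    t = np.arange(n) - n // 2
    env = np.exp(-t ** 2 / (2 * sigma ** 2))            # Gaussian envelope
    even = np.convolve(signal, env * np.cos(2 * np.pi * freq * t), mode="same")
    odd = np.convolve(signal, env * np.sin(2 * np.pi * freq * t), mode="same")
    return np.sqrt(even ** 2 + odd ** 2)

x = np.arange(200)
sig = np.cos(2 * np.pi * 0.1 * x)
e_match = gabor_energy_1d(sig, 0.1, 5.0)   # filter tuned to the signal
e_miss = gabor_energy_1d(sig, 0.3, 5.0)    # mistuned filter
```

The proposed feature then adds the background-suppression and soft orientation-histogram steps on top of such energy responses.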