8,124 research outputs found

    Fading of collective attention shapes the evolution of linguistic variants

    Language change involves competition between alternative linguistic forms (1). The spontaneous evolution of these forms typically results in monotonic growth or decay (2, 3), as in winner-take-all attractor behaviors. In the case of the Spanish past subjunctive, the spontaneous evolution of its two competing forms (ending in -ra and -se) was perturbed by the appearance of the Royal Spanish Academy in 1713, which enforced the spelling of both forms as perfectly interchangeable variants (4), at a moment when the -ra form was dominant (5). Time series extracted from a massive corpus of books (6) reveal that this regulation in fact produced a transient renewed interest in the old form -se which, once faded, left -ra as the dominant form again up to the present day. We show that the time series are successfully explained by a two-dimensional linear model that integrates an imitative and a novelty component. The model reveals that the temporal scale over which collective attention fades is inversely proportional to the verb frequency. Integrating the two basic mechanisms of imitation and attention to novelty makes it possible to understand diverse competing objects, with lifetimes that range from hours for memes and news (7, 8) to decades for verbs, suggesting the existence of a general mechanism underlying cultural evolution.
    Comment: 8 pages, 2 figures, 3 supplementary figures
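    The imitation-plus-novelty dynamics described above can be sketched as a two-variable linear system. The variable names, the coupling, and all coefficients below are illustrative assumptions, not the paper's actual equations:

```python
# Hypothetical sketch: x is the usage share of a variant, a is the collective
# attention paid to it. Imitation pulls usage toward attention; attention to
# the novelty decays on its own (per the paper, the fading timescale would
# scale inversely with verb frequency).
def simulate(x0, a0, imitation=0.1, fading=0.05, steps=200, dt=1.0):
    x, a = x0, a0
    traj = [x]
    for _ in range(steps):
        dx = imitation * (a - x) * dt   # usage drawn toward current attention
        da = -fading * a * dt           # attention fades exponentially
        x, a = x + dx, a + da
        traj.append(x)
    return traj
```

    Starting from low usage and high attention (a regulation-induced revival), the trajectory rises transiently and then decays back, the qualitative pattern reported for the -se form.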

    Where Graph Topology Matters: The Robust Subgraph Problem

    Robustness is a critical measure of the resilience of large networked systems, such as transportation and communication networks. Most prior works focus on the global robustness of a given graph at large, e.g., by measuring its overall vulnerability to external attacks or random failures. In this paper, we turn attention to local robustness and pose a novel problem along the lines of subgraph mining: given a large graph, how can we find its most robust local subgraph (RLS)? We define a robust subgraph as a subset of nodes with high communicability among them, and formulate the RLS-PROBLEM of finding a subgraph of given size with maximum robustness in the host graph. Our formulation is related to the recently proposed general framework for the densest subgraph problem, but differs from it substantially: besides the number of edges in the subgraph, robustness is also concerned with the placement of edges, i.e., the subgraph topology. We show that the RLS-PROBLEM is NP-hard and propose two heuristic algorithms based on top-down and bottom-up search strategies. Further, we present modifications of our algorithms to handle three practical variants of the RLS-PROBLEM. Experiments on synthetic and real-world graphs demonstrate that we find subgraphs with larger robustness than the densest subgraphs even at lower densities, suggesting that the existing approaches are not suitable for the new problem setting.
    Comment: 13 pages, 10 figures, 3 tables, to appear at SDM 2015 (9 pages only)
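    A communicability-based robustness score of the kind described above can be sketched with the natural connectivity, log((1/n)·trace(exp(A))), computed here by a truncated power series so the example stays self-contained; this is a generic sketch of the measure's flavor, not the paper's exact implementation:

```python
import math

def natural_connectivity(adj, terms=30):
    """Natural connectivity log((1/n) * trace(exp(A))) of an adjacency
    matrix, via the series trace(exp(A)) = sum_k trace(A^k) / k!."""
    n = len(adj)
    # power holds A^k; start with the identity (k = 0 term has trace n)
    power = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    total = float(n)
    for k in range(1, terms):
        power = [[sum(power[i][m] * adj[m][j] for m in range(n))
                  for j in range(n)] for i in range(n)]
        total += sum(power[i][i] for i in range(n)) / math.factorial(k)
    return math.log(total / n)
```

    The score rewards edge placement, not just edge count: a triangle scores higher than a path on three nodes even though one extra edge separates them, because its closed walks reinforce communicability.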

    Evaluating Sequence-to-Sequence Models for Handwritten Text Recognition

    Encoder-decoder models have become an effective approach for sequence learning tasks like machine translation, image captioning and speech recognition, but have yet to show competitive results for handwritten text recognition. To this end, we propose an attention-based sequence-to-sequence model. It combines a convolutional neural network as a generic feature extractor with a recurrent neural network that encodes both the visual information and the temporal context between characters in the input image, and uses a separate recurrent neural network to decode the actual character sequence. We make experimental comparisons between various attention mechanisms and positional encodings in order to find an appropriate alignment between the input and output sequences. The model can be trained end-to-end, and the optional integration of a hybrid loss allows the encoder to retain an interpretable and usable output, if desired. We achieve competitive results on the IAM and ICFHR2016 READ data sets compared to the state-of-the-art without the use of a language model, and we significantly improve over recent sequence-to-sequence approaches.
    Comment: 8 pages, 1 figure, 8 tables
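    The input-output alignment that such attention mechanisms compute can be sketched in its simplest content-based form, a softmax over query-key dot products followed by a weighted sum; this is a generic textbook sketch, not the paper's specific attention variant:

```python
import math

def attend(query, keys, values):
    """Content-based attention: score each encoder state (key) against the
    decoder state (query), normalize with a softmax, and return both the
    alignment weights and the weighted sum of values (the context vector)."""
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    m = max(scores)                       # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    weights = [e / z for e in exps]
    context = [sum(w * v[i] for w, v in zip(weights, values))
               for i in range(len(values[0]))]
    return weights, context
```

    Comparing attention variants, as the paper does, amounts to swapping the scoring rule (dot product, additive, location-aware, etc.) while keeping this normalize-and-mix structure fixed.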

    Low-Latency Sequence-to-Sequence Speech Recognition and Translation by Partial Hypothesis Selection

    Encoder-decoder models provide a generic architecture for sequence-to-sequence tasks such as speech recognition and translation. While offline systems are often evaluated on quality metrics like word error rate (WER) and BLEU, latency is also a crucial factor in many practical use-cases. We propose three latency reduction techniques for chunk-based incremental inference and evaluate their efficiency in terms of the accuracy-latency trade-off. On the 300-hour How2 dataset, we reduce latency by 83% to 0.8 seconds by sacrificing 1% WER (6% rel.) compared to offline transcription. Although our experiments use the Transformer, the hypothesis selection strategies are applicable to other encoder-decoder models. To avoid expensive re-computation, we use a unidirectionally-attending encoder. After an adaptation procedure to partial sequences, the unidirectional model performs on par with the original model. We further show that our approach is also applicable to low-latency speech translation. On How2 English-Portuguese speech translation, we reduce latency to 0.7 seconds (-84% rel.) while incurring a loss of 2.4 BLEU points (5% rel.) compared to the offline system.
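    One common family of partial-hypothesis selection rules for chunk-based inference is local agreement: commit only the prefix on which the hypotheses of two consecutive chunks agree. The sketch below illustrates that idea in general; it is not claimed to be one of the paper's three specific techniques:

```python
def stable_prefix(prev_hyp, cur_hyp):
    """Local-agreement selection sketch: the committed output is the longest
    common prefix of the hypotheses decoded after consecutive chunks, so
    tokens are emitted early only once they stop changing."""
    commit = []
    for a, b in zip(prev_hyp, cur_hyp):
        if a != b:
            break
        commit.append(a)
    return commit
```

    The accuracy-latency trade-off shows up directly here: stricter agreement rules delay emission (higher latency, fewer retractions), looser ones emit sooner at the risk of committing tokens the full-context model would revise.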

    Non-Monotonic Sequential Text Generation

    Standard sequential generation methods assume a pre-specified generation order, such as text generation methods which generate words from left to right. In this work, we propose a framework for training models of text generation that operate in non-monotonic orders; the model directly learns good orders, without any additional annotation. Our framework operates by generating a word at an arbitrary position, and then recursively generating words to its left and then words to its right, yielding a binary tree. Learning is framed as imitation learning, including a coaching method which moves from imitating an oracle to reinforcing the policy's own preferences. Experimental results demonstrate that using the proposed method, it is possible to learn policies which generate text without pre-specifying a generation order, while achieving competitive performance with conventional left-to-right generation.
    Comment: ICML 2019
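    The recursive generate-then-split procedure above implies that the final text is the in-order traversal of the binary tree the model builds. A minimal sketch of that linearization step (the tree encoding as nested tuples is our own convention):

```python
def linearize(tree):
    """Recover the left-to-right sentence from a binary generation tree.
    Each node is (word, left_subtree, right_subtree); None is an empty child.
    In-order traversal yields the words in their final surface order."""
    if tree is None:
        return []
    word, left, right = tree
    return linearize(left) + [word] + linearize(right)
```

    For example, a policy that first emits the middle word of "a b c" produces the root "b" with "a" in its left subtree and "c" in its right subtree, and in-order traversal restores "a b c".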

    Enhancing Monotonic Multihead Attention for Streaming ASR

    We investigate monotonic multihead attention (MMA), extending hard monotonic attention to Transformer-based automatic speech recognition (ASR) for online streaming applications. For streaming inference, all monotonic attention (MA) heads should learn proper alignments because the next token is not generated until all heads detect the corresponding token boundaries. However, we found that not all MA heads learn alignments with a naïve implementation. To encourage every head to learn alignments properly, we propose HeadDrop regularization, which stochastically masks out a subset of heads during training. Furthermore, we propose to prune redundant heads to improve consensus among heads for boundary detection and to prevent delayed token generation caused by such heads. Chunkwise attention on each MA head is extended to the multihead counterpart. Finally, we propose head-synchronous beam search decoding to guarantee stable streaming inference.
    Comment: Accepted to Interspeech 2020
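    The stochastic head masking can be sketched in the spirit of dropout; the rescaling of surviving heads and the keep-at-least-one rule are our assumptions for a self-contained illustration, not details taken from the paper:

```python
import random

def headdrop(head_outputs, p=0.5, rng=random):
    """HeadDrop-style regularization sketch: during training, zero out each
    attention head's output with probability p, rescaling the surviving heads
    so the expected total contribution is preserved (dropout-style scaling,
    an assumption here). At least one head is always kept active."""
    keep = [rng.random() >= p for _ in head_outputs]
    if not any(keep):
        keep[rng.randrange(len(keep))] = True
    scale = len(keep) / sum(keep)
    return [[x * scale for x in h] if k else [0.0 for x in h]
            for h, k in zip(head_outputs, keep)]
```

    Forcing the model to produce correct boundaries with random subsets of heads discourages it from relying on a few well-aligned heads while others drift, which is the failure mode the abstract describes.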

    SCRAM: Spatially Coherent Randomized Attention Maps

    Attention mechanisms, and non-local mean operations in general, are key ingredients in many state-of-the-art deep learning techniques. In particular, the Transformer model based on multi-head self-attention has recently achieved great success in natural language processing and computer vision. However, the vanilla algorithm computing the Transformer attention of an image with n pixels has O(n^2) complexity, which is often painfully slow and sometimes prohibitively expensive for large-scale image data. In this paper, we propose a fast randomized algorithm, SCRAM, that only requires O(n log(n)) time to produce an image attention map. Such a dramatic acceleration is attributed to our insight that attention maps on real-world images usually exhibit (1) spatial coherence and (2) sparse structure. The central idea of SCRAM is to first employ PatchMatch, a randomized correspondence algorithm, to quickly pinpoint the most compatible key (argmax) for each query, and then exploit that knowledge to design a sparse approximation to non-local mean operations. Using the argmax (mode) to dynamically construct the sparse approximation distinguishes our algorithm from all existing sparse approximation methods and makes it very efficient. Moreover, SCRAM is a broadly applicable approximation to any non-local mean layer, in contrast to some other sparse approximations that can only approximate self-attention. Our preliminary experimental results suggest that SCRAM is indeed promising for speeding up or scaling up the computation of attention maps in the Transformer.
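    The argmax-then-sparsify idea can be sketched as follows. For clarity the randomized PatchMatch search is replaced by an exact (and therefore slow) argmax, and the sparse support is just a 1-D window around the best key; both are simplifications of our own, not SCRAM's actual procedure:

```python
import math

def sparse_attention(queries, keys, values, window=1):
    """Sketch of argmax-guided sparse attention: for each query, locate its
    most compatible key (here by exact argmax over dot-product scores; SCRAM
    would use PatchMatch), then compute a softmax-weighted sum over only a
    small window of keys around that match instead of all n keys."""
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) for k in keys]
        best = max(range(len(keys)), key=lambda i: scores[i])
        lo, hi = max(0, best - window), min(len(keys), best + window + 1)
        local = scores[lo:hi]
        m = max(local)
        exps = [math.exp(s - m) for s in local]
        z = sum(exps)
        ctx = [sum(e / z * values[lo + i][d] for i, e in enumerate(exps))
               for d in range(len(values[0]))]
        out.append(ctx)
    return out
```

    The speedup in the real algorithm comes from the search step: PatchMatch exploits spatial coherence so neighboring queries share candidate matches, avoiding the O(n) scan per query used in this sketch.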

    Neutron star collapse and gravitational waves with a non-convex equation of state

    The thermodynamical properties of the equation of state (EoS) of high-density matter (above nuclear saturation density) and the possible existence of exotic states, such as phase transitions from nuclear/hadronic matter into quark-gluon plasma or the appearance of hyperons, may critically influence the stability and dynamics of compact relativistic stars. From a theoretical point of view, establishing the existence of those states requires the analysis of the `convexity' of the EoS. We show indications of the existence of regions in the dense-matter EoS where the thermodynamics may be non-convex as a result of a non-monotonic dependence of the sound speed on the rest-mass density. When this happens, non-conventional dynamics may develop. In this paper we investigate the effects of a phenomenological, non-convex EoS on the equilibrium structure of stable compact stars and on the dynamics of unstable neutron stars that collapse gravitationally to black holes, both for spherically symmetric and uniformly-rotating configurations. We show how the dynamics of the collapse with a non-convex EoS departs from the convex case, leaving distinctive imprints on the gravitational waveforms. The astrophysical significance of these results for microphysical EoSs is discussed.
    Comment: 29 pages, 22 figures. Accepted by MNRAS on January 24th, 2019. The author order has changed with respect to the previous arXiv version.
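    The convexity condition referenced above is conventionally expressed through the fundamental derivative of (Newtonian) gas dynamics; in one standard form, with sound speed c_s, rest-mass density ρ, and entropy s held fixed,

```latex
\mathcal{G} \;\equiv\; 1 + \frac{\rho}{c_s}\left(\frac{\partial c_s}{\partial \rho}\right)_{s},
```

    so the EoS is convex where 𝒢 > 0 and non-convex where 𝒢 < 0. A non-monotonic dependence of c_s on ρ makes the derivative term negative in some density range and can drive 𝒢 below zero there, which is the mechanism behind the non-conventional dynamics (e.g. rarefaction shocks) mentioned in the abstract. (This is the standard Newtonian definition; the paper's relativistic formulation may differ in detail.)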

    Globally Optimal Distributed Power Control for Nonconcave Utility Maximization

    Transmit power control in wireless networks has long been recognized as an effective mechanism to mitigate co-channel interference. Due to the highly non-convex nature of the problem, optimal power control is known to be difficult to achieve if a system utility is to be maximized. To date, there does not yet exist a distributed power control algorithm that maximizes any form of system utility, despite the importance of distributed implementation for infrastructureless wireless networks such as ad hoc and sensor networks. This paper fills this gap by developing a Gibbs-sampling-based asynchronous distributed power control algorithm, referred to as GLAD. The proposed algorithm quickly converges to the global optimal solution regardless of the concavity, continuity, differentiability and monotonicity of the utility function. Like other existing distributed power control algorithms, GLAD requires extensive message passing among all users in the network, which leads to high signaling overhead and high processing complexity. To address this issue, this paper further proposes a variant of the GLAD algorithm, referred to as I-GLAD, where the prefix "I" stands for infrequent message passing. The convergence of I-GLAD can be proved regardless of the reduction in the message passing rate. To further reduce the processing complexity at each transmitter, we develop an enhanced version of I-GLAD, referred to as NI-GLAD, where only the control messages from the neighboring links are processed. Our simulation results show that I-GLAD approximately converges to the global optimal solution regardless of the type of the system utility function. Meanwhile, the optimality of the solution obtained by NI-GLAD depends on the selection of the neighborhood size.
    Comment: 30 pages, 3 tables, and 9 figures
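    The core move in Gibbs-sampling-based optimization of this kind can be sketched as follows: one user at a time resamples its own power level from a Gibbs distribution that weights levels by the resulting system utility. This is a generic illustration of the principle, with made-up names and a fixed temperature, not the GLAD algorithm itself (which, among other things, anneals and distributes the utility evaluation via message passing):

```python
import math
import random

def gibbs_power_update(utility, powers, user, levels, temperature=0.1,
                       rng=random):
    """One asynchronous Gibbs-sampler step: the given user resamples its own
    transmit power from P(p) proportional to exp(utility / temperature),
    holding all other users' powers fixed. Low temperature concentrates the
    distribution on utility-maximizing levels, escaping local optima of
    non-concave utilities with some probability at higher temperatures."""
    weights = []
    for p in levels:
        trial = list(powers)
        trial[user] = p
        weights.append(math.exp(utility(trial) / temperature))
    z = sum(weights)
    r = rng.random() * z
    acc = 0.0
    for p, w in zip(levels, weights):
        acc += w
        if r <= acc:
            new = list(powers)
            new[user] = p
            return new
    return powers
```

    Because each user only needs the utility of its own candidate levels, updates can run asynchronously; the message-passing cost the abstract discusses comes from evaluating that utility, which in a real network depends on every other link's interference.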