263 research outputs found

    Sound-to-imagination: an exploratory study on cross-modal translation using diverse audiovisual data

    The motivation of our research is to explore the possibilities of automatic sound-to-image (S2I) translation for enabling a human receiver to visually infer occurrences of sound-related events. We expect the computer to ‘imagine’ scenes from captured sounds, generating original images that depict the sound-emitting sources. Previous studies on similar topics opted for simplified approaches using data with low content diversity and/or supervision/self-supervision for training. In contrast, our approach involves performing S2I translation using thousands of distinct and unknown scenes, using sound class annotations solely for data preparation, just enough to ensure aural–visual semantic coherence. To model the translator, we employ an audio encoder and a conditional generative adversarial network (GAN) with a deep densely connected generator. Furthermore, we present a solution using informativity classifiers for quantitatively evaluating the generated images. This allows us to analyze the influence of network-bottleneck variation on the translation process, highlighting a potential trade-off between informativity and pixel space convergence. Despite the complexity of the specified S2I translation task, we were able to generalize the model enough to obtain more than 14%, on average, of interpretable and semantically coherent images translated from unknown sounds.

    The present work was supported in part by the Brazilian National Council for Scientific and Technological Development (CNPq) under PhD grant 200884/2015-8. Also, the work was partly supported by the Spanish State Research Agency (AEI), project PID2019-107579RBI00/AEI/10.13039/501100011033.
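    The quantitative evaluation described above counts how many translated images an informativity classifier judges interpretable and semantically coherent. A minimal sketch of that kind of metric follows; the function name and the match-against-conditioning-class criterion are illustrative assumptions, not the authors' evaluation code.

```python
def informativity_rate(classifier_labels, conditioning_classes):
    """Fraction of generated images whose predicted class matches the
    sound class they were conditioned on (illustrative metric only)."""
    hits = sum(p == t for p, t in zip(classifier_labels, conditioning_classes))
    return hits / len(classifier_labels)
```

    For example, if 3 out of 20 generated images are classified as their conditioning sound class, the rate is 0.15, comparable in spirit to the ~14% figure reported above.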

    Extreme Situations Prediction by Multidimensional Heterogeneous Time Series Using Logical Decision Functions

    A method for the prediction of multidimensional heterogeneous time series using logical decision functions is suggested. The method performs simultaneous prediction of several goal variables. It uses a decision-function construction algorithm that carries out a directed search over partitions of the variable space within the class of logical decision functions. To estimate the quality of a decision function, a realization of an informativity criterion for the conditional distribution in the space of goal variables is offered. As an indicator of extreme states, the occurrence of a transition with small probability is suggested.

    The work is supported by RFBR, grant 04-01-00858-a.

    Turbo: Informativity-Driven Acceleration Plug-In for Vision-Language Models

    Vision-Language Large Models (VLMs) have become a primary backbone of AI due to their impressive performance. However, their expensive computational costs, i.e., throughput and delay, impede their potential in real-world scenarios. To accelerate VLMs, most existing methods focus on the model perspective — pruning, distillation, quantization — but completely overlook data-perspective redundancy. To fill this gap, this paper highlights the severity of data redundancy and designs a plug-and-play Turbo module, guided by an information degree, to prune inefficient tokens from visual or textual data. In pursuit of efficiency-performance trade-offs, the information degree takes two key factors into consideration: mutual redundancy and semantic value. Concretely, the former evaluates data duplication between sequential tokens, while the latter evaluates each token by its contribution to the overall semantics. As a result, tokens with a high information degree carry less redundancy and stronger semantics. For VLM computation, Turbo works as a user-friendly plug-in that sorts data by information degree and utilizes only the top-ranked tokens to save costs. Its advantages are multifaceted: it is generally compatible with various VLMs across understanding and generation, simple to use without retraining, and requires only trivial engineering effort. On multiple public VLM benchmarks, we conduct extensive experiments that reveal the gratifying acceleration achieved by Turbo under negligible performance drop.
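    The scoring idea described above — combining mutual redundancy between sequential tokens with each token's semantic value, then keeping only the top-scoring tokens — can be sketched as follows. This is a toy illustration over token embeddings: the function names, the linear weighting, and the use of the mean embedding as a proxy for "overall semantics" are assumptions, not the paper's implementation.

```python
import numpy as np

def information_degree(tokens, alpha=0.5):
    """Score each token of an (N, D) embedding array by combining
    mutual redundancy and semantic value (illustrative only)."""
    # Normalize embeddings so dot products are cosine similarities.
    normed = tokens / np.linalg.norm(tokens, axis=1, keepdims=True)
    # Mutual redundancy: similarity of each token to its sequential neighbor.
    sim = np.sum(normed[1:] * normed[:-1], axis=1)
    redundancy = np.concatenate([[0.0], sim])  # first token has no predecessor
    # Semantic value: similarity to the mean embedding as a proxy for
    # contribution to the overall semantics.
    global_vec = normed.mean(axis=0)
    global_vec /= np.linalg.norm(global_vec)
    semantic = normed @ global_vec
    # High information degree = low redundancy, high semantic value.
    return alpha * (1.0 - redundancy) + (1.0 - alpha) * semantic

def turbo_prune(tokens, keep_ratio=0.5):
    """Keep only the top-scoring tokens, in their original order."""
    scores = information_degree(tokens)
    k = max(1, int(len(tokens) * keep_ratio))
    keep = np.sort(np.argsort(scores)[-k:])
    return tokens[keep]
```

    With `keep_ratio=0.5`, half of the tokens are dropped before they reach the VLM's attention layers, which is where the throughput savings would come from.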

    Linear Order in Language: an Error-Driven Learning Account

    Learners of German often struggle with learning the grammatical gender of nouns and their correct articles, for example, that it should be “die Gabel” (the fork) and not “der Gabel”. Why is this so hard? And why do gender systems even exist?

    I taught participants differently structured artificial languages and found that it is especially difficult to learn a gender system when gender is marked before the noun (e.g., in German: “die Gabel”, the fork, vs. “der Löffel”, the spoon) as compared to when gender is marked after the noun (e.g., in Albanian: “pirun-i”, the fork, vs. “lug-a”, the spoon). With computational simulations I could show that this effect arises because human learning is sensitive to the order of words.

    However, while gendered articles are hard to learn, they can facilitate communication because they can make following nouns more predictable and therefore easier to process: for example, after the German article “der”, “Löffel” is quite likely to follow, whereas “Gabel” is very unlikely. This is a function that gendered suffixes, as in Albanian, or genderless articles, as in English, cannot fulfill. In a language production study, I observed that speakers produce more articles that can make following nouns predictable, such as German articles, than articles that cannot fulfill this function, such as the English article “the”.

    I conclude that the order in which gender is marked in a language affects language learning as well as communication. This makes German gender hard to learn but useful for communication.

    A guide to learning modules in a dynamic network


    Cost-Aware Inference for IoT Devices

    Networked embedded devices (IoTs) of limited CPU, memory and power resources are revolutionizing data gathering, remote monitoring and planning in many consumer and business applications. Nevertheless, resource limitations place a significant burden on their service life and operation, warranting cost-aware methods that are capable of distributively screening redundancies in device information and transmitting informative data. We propose to train a decentralized gated network that, given an observed instance at test time, allows for activation of select devices to transmit information to a central node, which then performs inference. We analyze our proposed gradient descent algorithm for Gaussian features and establish convergence guarantees under good initialization. We conduct experiments on a number of real-world datasets arising in IoT applications and show that our model results in over 1.5X service life with negligible accuracy degradation relative to the performance achievable by a neural network.

    http://proceedings.mlr.press/v89/zhu19d/zhu19d.pdf (published version)
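    The decentralized gating idea above — each device locally scores its own observation, and only the most informative devices transmit to the central node — can be sketched like this. It is a toy illustration: the linear gate, the top-k activation rule, and the averaging fusion are simplifying assumptions, not the trained model from the paper.

```python
import numpy as np

def gated_inference(device_features, gate_weights, budget):
    """Activate only the `budget` devices with the highest local gate
    scores, then fuse their transmitted features at the central node.
    All parameters here are hypothetical stand-ins for learned ones."""
    # Each device scores its own observation with its gate weights.
    scores = np.array([w @ x for w, x in zip(gate_weights, device_features)])
    # Only the top-`budget` devices transmit, saving energy on the rest.
    active = np.argsort(scores)[-budget:]
    # The central node averages the received features for inference.
    fused = np.mean([device_features[i] for i in active], axis=0)
    return sorted(active.tolist()), fused
```

    Because inactive devices never transmit, their radios stay idle for that instance, which is the mechanism behind the extended service life reported above.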
