Sound-to-imagination: an exploratory study on cross-modal translation using diverse audiovisual data
The motivation of our research is to explore the possibilities of automatic sound-to-image (S2I) translation for enabling a human receiver to visually infer occurrences of sound-related events. We expect the computer to "imagine" scenes from captured sounds, generating original images that depict the sound-emitting sources. Previous studies on similar topics opted for simplified approaches using data with low content diversity and/or supervision/self-supervision for training. In contrast, our approach involves performing S2I translation using thousands of distinct and unknown scenes, using sound class annotations solely for data preparation, just enough to ensure aural-visual semantic coherence. To model the translator, we employ an audio encoder and a conditional generative adversarial network (GAN) with a deep densely connected generator. Furthermore, we present a solution using informativity classifiers for quantitatively evaluating the generated images. This allows us to analyze the influence of network-bottleneck variation on the translation process, highlighting a potential trade-off between informativity and pixel-space convergence. Despite the complexity of the specified S2I translation task, we were able to generalize the model enough to obtain more than 14%, on average, of interpretable and semantically coherent images translated from unknown sounds.

The present work was supported in part by the Brazilian National Council for Scientific and Technological Development (CNPq) under PhD grant 200884/2015-8. The work was also partly supported by the Spanish State Research Agency (AEI), project PID2019-107579RBI00/AEI/10.13039/501100011033.
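The informativity evaluation described above can be sketched as a simple agreement score: what fraction of translated images does a classifier assign to the same class as the conditioning sound? The function below is a minimal illustrative sketch under that assumption; the paper's actual informativity classifiers are trained networks, and the class labels here are toy values.

```python
import numpy as np

def informativity_score(pred_classes, target_classes):
    """Fraction of translated images whose predicted visual class matches
    the class of the conditioning sound (illustrative sketch, not the
    paper's exact metric)."""
    pred = np.asarray(pred_classes)
    target = np.asarray(target_classes)
    return float(np.mean(pred == target))

# Toy example: 5 images translated from sounds of 3 classes.
preds   = [0, 1, 2, 1, 0]
targets = [0, 1, 1, 1, 2]
score = informativity_score(preds, targets)  # 3 of 5 match -> 0.6
```

A sweep of this score over different bottleneck widths would expose the informativity vs. pixel-space-convergence trade-off the abstract mentions.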
Extreme Situations Prediction by Multidimensional Heterogeneous Time Series Using Logical Decision Functions
* The work is supported by RFBR, grant 04-01-00858-a.

A method for predicting multidimensional heterogeneous time series using logical decision functions is suggested. The method performs simultaneous prediction of several goal variables. It uses a decision-function construction algorithm that carries out a directed search over partitionings of the variable space within the class of logical decision functions. To estimate the quality of a decision function, a realization of an informativity criterion for the conditional distribution in the goal variables' space is offered. The occurrence of a low-probability transition is suggested as an indicator of extreme states.
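The low-probability-transition indicator can be sketched on a discretized state sequence: estimate empirical transition probabilities and flag any transition rarer than a threshold. The state labels and threshold below are illustrative assumptions, not values from the paper.

```python
from collections import Counter

def rare_transitions(states, threshold=0.1):
    """Flag indices where the observed state transition has empirical
    conditional probability below `threshold` -- a minimal sketch of
    using low-probability transitions as extreme-state indicators."""
    pairs = list(zip(states[:-1], states[1:]))
    counts = Counter(pairs)                 # joint counts of (from, to)
    from_counts = Counter(states[:-1])      # marginal counts of 'from' states
    probs = {p: counts[p] / from_counts[p[0]] for p in counts}
    return [i for i, p in enumerate(pairs) if probs[p] < threshold]

seq = ['A', 'A', 'A', 'B', 'A', 'A', 'A', 'A', 'A', 'A']
idx = rare_transitions(seq, threshold=0.2)  # -> [2]: the rare A->B jump
```

In the multidimensional heterogeneous setting of the paper, the states would instead come from the learned partitioning of the variable space.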
Turbo: Informativity-Driven Acceleration Plug-In for Vision-Language Models
Vision-Language Large Models (VLMs) have become a primary backbone of AI due to their impressive performance. However, their high computational costs, in terms of throughput and latency, limit their potential in real-world scenarios. To accelerate VLMs, most existing methods focus on the model perspective (pruning, distillation, quantization) but overlook redundancy from the data perspective. To fill this gap, this paper examines the severity of data redundancy and designs a plug-and-play Turbo module, guided by an information degree, to prune inefficient tokens from visual or textual data. In pursuit of efficiency-performance trade-offs, the information degree takes two key factors into consideration: mutual redundancy and semantic value. Concretely, the former evaluates the data duplication between sequential tokens, while the latter evaluates each token by its contribution to the overall semantics. As a result, tokens with a high information degree carry less redundancy and stronger semantics. During VLM computation, Turbo works as a user-friendly plug-in that sorts data by information degree and uses only the top-ranked tokens to save costs. Its advantages are multifaceted: it is broadly compatible with various VLMs across understanding and generation, and it is simple to use, requiring no retraining and only trivial engineering effort. Extensive experiments on multiple public VLM benchmarks show that Turbo delivers gratifying acceleration with negligible performance drop.
Linear Order in Language:an Error-Driven Learning Account
Learners of German often struggle with learning the grammatical gender of nouns and their correct articles, for example, that it should be "die Gabel" (the fork) and not "der Gabel". Why is this so hard? And why do gender systems even exist?

I taught participants differently structured artificial languages and found that a gender system is especially difficult to learn when gender is marked before the noun (e.g., in German: "die Gabel", the fork, vs. "der Löffel", the spoon) as compared to when gender is marked after the noun (e.g., in Albanian: "pirun-i", the fork, vs. "lug-a", the spoon). With computational simulations I could show that this effect arises because human learning is sensitive to the order of words.

However, while gendered articles are hard to learn, they can facilitate communication: they can make following nouns more predictable and therefore easier to process. For example, after the German article "der", "Löffel" is quite likely to follow, whereas "Gabel" is very unlikely. This is a function that gendered suffixes, as in Albanian, or genderless articles, as in English, cannot fulfill. In a language production study, I observed that speakers produce more articles that can make following nouns predictable, such as German articles, than articles that cannot fulfill this function, such as the English article "the".

I conclude that the order in which gender is marked in a language affects language learning as well as communication. This makes German gender hard to learn but useful for communication.
Towards a text-linguistic definition of Qur'anic inimitability : a discourse perspective and problems of translation
Abstract unavailable; please refer to PDF.
Cost aware Inference for IoT Devices
Networked embedded devices (IoTs) with limited CPU, memory, and power resources are revolutionizing data gathering, remote monitoring, and planning in many consumer and business applications. Nevertheless, resource limitations place a significant burden on their service life and operation, warranting cost-aware methods that are capable of distributively screening redundancies in device information and transmitting informative data. We propose to train a decentralized gated network that, given an observed instance at test time, allows for activation of select devices to transmit information to a central node, which then performs inference. We analyze our proposed gradient descent algorithm for Gaussian features and establish convergence guarantees under good initialization. We conduct experiments on a number of real-world datasets arising in IoT applications and show that our model results in over 1.5X service life with negligible accuracy degradation relative to the performance achievable by a neural network.

http://proceedings.mlr.press/v89/zhu19d/zhu19d.pdf
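The gated-transmission scheme can be sketched in a few lines: each device transmits only when its gate fires, and the central node predicts from the transmitted subset. The linear predictor and boolean gates below are illustrative assumptions; the paper's gates are learned jointly with the predictor by gradient descent.

```python
import numpy as np

def gated_inference(readings, gates, weights, bias=0.0):
    """Minimal sketch of cost-aware gated inference: device i transmits
    its reading only when gates[i] is True; the central node applies a
    linear predictor to the transmitted values. The transmission count
    is the per-instance communication cost."""
    x = np.asarray(readings, dtype=float)
    g = np.asarray(gates, dtype=bool)
    transmitted = np.where(g, x, 0.0)   # silent devices contribute nothing
    cost = int(g.sum())                 # number of active transmissions
    prediction = float(transmitted @ np.asarray(weights, dtype=float) + bias)
    return prediction, cost

pred, cost = gated_inference([1.0, 2.0, 3.0], [True, False, True], [1.0, 1.0, 1.0])
# pred -> 4.0 using only 2 of 3 devices
```

Keeping `cost` low on average is what extends service life; the trade-off against accuracy is what the paper's experiments quantify.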
- …