
    Language Models for Image Captioning: The Quirks and What Works

    Two recent approaches have achieved state-of-the-art results in image captioning. The first uses a pipelined process where a set of candidate words is generated by a convolutional neural network (CNN) trained on images, and then a maximum entropy (ME) language model is used to arrange these words into a coherent sentence. The second uses the penultimate activation layer of the CNN as input to a recurrent neural network (RNN) that then generates the caption sequence. In this paper, we compare the merits of these different language modeling approaches for the first time by using the same state-of-the-art CNN as input. We examine issues in the different approaches, including linguistic irregularities, caption repetition, and data set overlap. By combining key aspects of the ME and RNN methods, we achieve a new record performance over previously published results on the benchmark COCO dataset. However, the gains we see in BLEU do not translate to human judgments. Comment: See http://research.microsoft.com/en-us/projects/image_captioning for project information.

    Time fractals and discrete scale invariance with trapped ions

    We show that a one-dimensional chain of trapped ions can be engineered to produce a quantum mechanical system with discrete scale invariance and fractal-like time dependence. By discrete scale invariance we mean a system that replicates itself under a rescaling of distance for some scale factor, and a time fractal is a signal that is invariant under the rescaling of time. These features are reminiscent of the Efimov effect, which has been predicted and observed in bound states of three-body systems. We demonstrate that discrete scale invariance in the trapped ion system can be controlled with two independently tunable parameters. We also discuss the extension to n-body states where the discrete scaling symmetry has an exotic heterogeneous structure. The results we present can be realized using currently available technologies developed for trapped ion quantum systems. Comment: 4 + 5 pages (main + supplemental materials), 2 + 3 figures (main + supplemental materials), version to appear in Physical Review A Rapid Communications.

    Generating Natural Questions About an Image

    There has been an explosion of work in the vision & language community during the past few years from image captioning to video transcription, and answering questions about images. These tasks have focused on literal descriptions of the image. To move beyond the literal, we choose to explore how questions about an image are often directed at commonsense inference and the abstract events evoked by objects in the image. In this paper, we introduce the novel task of Visual Question Generation (VQG), where the system is tasked with asking a natural and engaging question when shown an image. We provide three datasets which cover a variety of images from object-centric to event-centric, with considerably more abstract training data than provided to state-of-the-art captioning systems thus far. We train and test several generative and retrieval models to tackle the task of VQG. Evaluation results show that while such models ask reasonable questions for a variety of images, there is still a wide gap with human performance which motivates further work on connecting images with commonsense knowledge and pragmatics. Our proposed task offers a new challenge to the community which we hope furthers interest in exploring deeper connections between vision & language. Comment: Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics.

    Nanoscale Metamaterial Optical Waveguides with Ultrahigh Refractive Indices

    We propose deep-subwavelength optical waveguides based on metal-dielectric multilayer indefinite metamaterials with ultrahigh effective refractive indices. Waveguide modes with different mode orders are systematically analyzed with numerical simulations based on both metal-dielectric multilayer structures and the effective medium approach. The dependences of waveguide mode indices, propagation lengths and mode areas on different mode orders, free space wavelengths and sizes of waveguide cross sections are studied. Furthermore, waveguide modes are also illustrated with iso-frequency contours in the wave vector space in order to investigate the mechanism of waveguide mode cutoff for high order modes. The deep-subwavelength optical waveguide with a size smaller than $\lambda_0/50$ and a mode area on the order of $10^{-4}\,\lambda_0^2$ is realized, and an ultrahigh effective refractive index up to 62.0 is achieved at the telecommunication wavelength. This new type of metamaterial optical waveguide opens up opportunities for various applications in enhanced light-matter interactions. Comment: 22 pages, 8 figures.

    Stochastic Substitute Training: A Gray-box Approach to Craft Adversarial Examples Against Gradient Obfuscation Defenses

    It has been shown that adversaries can craft example inputs to neural networks which are similar to legitimate inputs but have been created to purposely cause the neural network to misclassify the input. These adversarial examples are crafted, for example, by calculating gradients of a carefully defined loss function with respect to the input. As a countermeasure, some researchers have tried to design robust models by blocking or obfuscating gradients, even in white-box settings. Another line of research proposes introducing a separate detector to attempt to detect adversarial examples. This approach also makes use of gradient obfuscation techniques, for example, to prevent the adversary from trying to fool the detector. In this paper, we introduce stochastic substitute training, a gray-box approach that can craft adversarial examples for defenses which obfuscate gradients. For those defenses that have tried to make models more robust, with our technique, an adversary can craft adversarial examples with no knowledge of the defense. For defenses that attempt to detect the adversarial examples, with our technique, an adversary only needs very limited information about the defense to craft adversarial examples. We demonstrate our technique by applying it against two defenses which make models more robust and two defenses which detect adversarial examples. Comment: Accepted by AISec '18: 11th ACM Workshop on Artificial Intelligence and Security. Source code at https://github.com/S-Mohammad-Hashemi/SS

    Maximally localized Wannier functions in LaMnO3 within PBE+U, hybrid functionals, and partially self-consistent GW: an efficient route to construct ab-initio tight-binding parameters for e_g perovskites

    Using the newly developed VASP2WANNIER90 interface we have constructed maximally localized Wannier functions (MLWFs) for the e_g states of the prototypical Jahn-Teller magnetic perovskite LaMnO3 at different levels of approximation for the exchange-correlation kernel. These include conventional density functional theory (DFT) with and without an additional on-site Hubbard U term, hybrid-DFT, and partially self-consistent GW. By suitably mapping the MLWFs onto an effective e_g tight-binding (TB) Hamiltonian we have computed a complete set of TB parameters which should serve as guidance for more elaborate treatments of correlation effects in effective Hamiltonian-based approaches. The method-dependent changes of the calculated TB parameters and their interplay with the electron-electron (el-el) interaction term are discussed and interpreted. We discuss two alternative model parameterizations: one in which the effects of the el-el interaction are implicitly incorporated in the otherwise "noninteracting" TB parameters, and a second where we include an explicit mean-field el-el interaction term in the TB Hamiltonian. Both models yield a set of tabulated TB parameters which provide the band dispersion in excellent agreement with the underlying ab initio and MLWF bands. Comment: 30 pages, 7 figures.