
    The Role of Syntactic Planning in Compositional Image Captioning

    Image captioning research has focused on generalizing to images drawn from the same distribution as the training set, rather than on the more challenging problem of generalizing to different distributions of images. Recently, Nikolaus et al. (2019) introduced a dataset to assess compositional generalization in image captioning, where models are evaluated on their ability to describe images with unseen adjective-noun and noun-verb compositions. In this work, we investigate different methods to improve compositional generalization by planning the syntactic structure of a caption. Our experiments show that jointly modeling tokens and syntactic tags enhances generalization in both RNN- and Transformer-based models, while also improving performance on standard metrics. Comment: Accepted at EACL 2021.
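
    As a rough illustration of what jointly modeling tokens and syntactic tags can look like, the sketch below builds an interleaved target sequence in which each caption token is preceded by its tag; the tag set, example caption, and function name are illustrative and not taken from the paper.

    # Hypothetical sketch: interleave syntactic tags with caption tokens so one
    # sequence model learns to plan the tag before emitting each word.
    def interleave_tags_and_tokens(tokens, tags):
        """Build a joint target sequence <tag_1>, tok_1, <tag_2>, tok_2, ...

        A standard next-symbol cross-entropy over the merged vocabulary then
        trains the decoder to predict each tag and the token that realizes it.
        """
        assert len(tokens) == len(tags)
        joint = []
        for tag, tok in zip(tags, tokens):
            joint.append(f"<{tag}>")   # syntactic plan symbol
            joint.append(tok)          # surface token
        return joint

    caption = ["a", "black", "dog", "runs", "on", "grass"]
    pos_tags = ["DT", "JJ", "NN", "VBZ", "IN", "NN"]
    print(interleave_tags_and_tokens(caption, pos_tags))
    # ['<DT>', 'a', '<JJ>', 'black', '<NN>', 'dog', '<VBZ>', 'runs', ...]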

    Teaching Compositionality to CNNs

    Convolutional neural networks (CNNs) have shown great success in computer vision, approaching human-level performance when trained for specific tasks via application-specific loss functions. In this paper, we propose a method for augmenting and training CNNs so that their learned features are compositional. It encourages networks to form representations that disentangle objects from their surroundings and from each other, thereby promoting better generalization. Our method is agnostic to the specific details of the underlying CNN to which it is applied and can in principle be used with any CNN. As we show in our experiments, the learned representations lead to feature activations that are more localized and improve performance over non-compositional baselines in object recognition tasks. Comment: Preprint appearing in CVPR 2017.
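
    As a rough illustration of a compositionality objective of this kind (not the paper's exact formulation), the sketch below penalizes the difference between a scene's features inside an object's mask and the features of that object shown in isolation; feature_map here is a toy stand-in for a CNN feature extractor.

    import numpy as np

    def feature_map(image):
        # Placeholder "CNN": any function returning an H x W x C feature map.
        return np.stack([image, image ** 2], axis=-1)

    def compositionality_penalty(scene, objects_alone, masks):
        """Sum of squared differences between masked scene features and the
        features of each object rendered on an empty background."""
        scene_feat = feature_map(scene)
        penalty = 0.0
        for obj_img, mask in zip(objects_alone, masks):
            obj_feat = feature_map(obj_img)
            m = mask[..., None]                  # broadcast mask over channels
            penalty += np.sum(m * (scene_feat - obj_feat) ** 2)
        return penalty

    scene = np.random.rand(8, 8)
    mask_a = np.zeros((8, 8))
    mask_a[:4, :4] = 1.0
    object_a = scene * mask_a                    # object a on an empty background
    print(compositionality_penalty(scene, [object_a], [mask_a]))   # 0.0 in this toy case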

    Semantic Image Retrieval via Active Grounding of Visual Situations

    We describe a novel architecture for semantic image retrieval---in particular, retrieval of instances of visual situations. Visual situations are concepts such as "a boxing match," "walking the dog," "a crowd waiting for a bus," or "a game of ping-pong," whose instantiations in images are linked more by their common spatial and semantic structure than by low-level visual similarity. Given a query situation description, our architecture---called Situate---learns models capturing the visual features of expected objects as well as the expected spatial configuration of relationships among objects. Given a new image, Situate uses these models in an attempt to ground (i.e., to create a bounding box locating) each expected component of the situation in the image via an active search procedure. Situate uses the resulting grounding to compute a score indicating the degree to which the new image is judged to contain an instance of the situation. Such scores can be used to rank images in a collection as part of a retrieval system. In the preliminary study described here, we demonstrate the promise of this system by comparing Situate's performance with that of two baseline methods, as well as with a related semantic image-retrieval system based on "scene graphs."
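
    The scoring step can be illustrated with a toy combination of per-object grounding scores and a spatial-layout term; the functions, weights, and expected offsets below are hypothetical and only meant to show how groundings might be turned into a rankable situation score, not how Situate actually computes it.

    import math

    def box_center(box):
        x0, y0, x1, y1 = box
        return ((x0 + x1) / 2.0, (y0 + y1) / 2.0)

    def spatial_score(boxes, expected_offsets):
        """Penalize deviation of pairwise box offsets from an expected layout."""
        score = 0.0
        for (a, b), (dx_exp, dy_exp) in expected_offsets.items():
            (ax, ay), (bx, by) = box_center(boxes[a]), box_center(boxes[b])
            score -= math.hypot((bx - ax) - dx_exp, (by - ay) - dy_exp)
        return score

    def situation_score(object_scores, boxes, expected_offsets, alpha=0.5):
        appearance = sum(object_scores.values())     # how well each grounding matched
        return appearance + alpha * spatial_score(boxes, expected_offsets)

    boxes = {"dog": (40, 60, 80, 100), "person": (100, 20, 140, 120)}
    scores = {"dog": 0.8, "person": 0.9}
    expected = {("person", "dog"): (-60.0, 10.0)}    # dog expected to the person's left
    print(situation_score(scores, boxes, expected))  # ~1.7 for this toy layout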

    Modulation of taxonomic (versus thematic) similarity judgments and product choices by inducing local and global processing

    Perceived similarity is influenced by both taxonomic and thematic relations. Assessing taxonomic relations requires comparing individual features of objects, whereas assessing thematic relations requires exploring how objects functionally interact. These processes appear to relate to different thinking styles: abstract thinking and a global focus may be required to explore functional interactions, whereas attention to detail and a local focus may be required to compare specific features. In four experiments we explored this idea by assessing whether a preference for taxonomic or thematic relations could be created by inducing a local or global perceptual processing style. Experiments 1–3 primed processing style via a perceptual task and used a choice task to examine preference for taxonomic (versus thematic) relations. Experiment 4 induced processing style and examined the effect on similarity ratings for pairs of taxonomically and thematically related items. In all cases, processing style influenced the preference for taxonomic (versus thematic) relations.

    First use of single-crystal diamonds as fission-fragment detector

    Single-crystal chemical-vapor-deposited (sCVD) diamond was investigated for its ability to act as a fission-fragment detector. In particular, we investigated timing and energy resolution for application in a simultaneous time and energy measurement to determine the mass of the detected fission fragment. Previous tests have shown that polycrystalline chemical-vapor-deposited (pCVD) diamonds provide sufficient timing resolution, but their poor energy resolution did not allow complete separation between very low-energy fission fragments, alpha particles, and noise. Our present investigations show that artificial sCVD diamonds provide a timing resolution similar to that of pCVD diamonds, close to 100 ps. The improved pulse-height resolution allows unequivocal separation of fission fragments, and the detection efficiency reaches 100%, but the pulse-height resolution, at about a few percent, still falls short of the requirements for fragment-mass identification. With high-speed digital electronics, a timing resolution well below 100 ps is possible. However, the strongly varying quality of the presently available diamond material does not allow application on a sufficiently large scale within reasonable investments.
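
    The mass determination that motivates the simultaneous time and energy measurement follows from the kinematic relation m = 2E(t/d)^2; the sketch below applies it with an assumed flight path and illustrative values rather than figures from the experiment.

    # Hedged sketch of time-of-flight plus energy mass determination.
    MEV_TO_J = 1.602176634e-13      # joules per MeV
    AMU_TO_KG = 1.66053906660e-27   # kg per atomic mass unit

    def fragment_mass_u(energy_mev, tof_ns, flight_path_m):
        """Fragment mass in atomic mass units from kinetic energy and time of flight."""
        velocity = flight_path_m / (tof_ns * 1e-9)          # m/s
        mass_kg = 2.0 * energy_mev * MEV_TO_J / velocity**2
        return mass_kg / AMU_TO_KG

    # Example: a 100 MeV fragment covering an assumed 0.5 m flight path in ~36 ns
    print(round(fragment_mass_u(100.0, 36.0, 0.5), 1))      # ~100 u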

    On the Accuracy of Hyper-local Geotagging of Social Media Content

    Social media users share billions of items per year, only a small fraction of which is geotagged. We present a data-driven approach for identifying non-geotagged content items that can be associated with a hyper-local geographic area by modeling the location distributions of hyper-local n-grams that appear in the text. We explore the trade-off between accuracy, precision, and coverage of this method. Further, we explore differences across content received from multiple platforms and devices, and show, for example, that content shared via different sources and applications produces significantly different geographic distributions, and that it is best to model and predict location for items according to their source. Our findings show the potential and the bounds of a data-driven approach to geotagging short social media texts, and offer implications for all applications that use data-driven approaches to locate content. Comment: 10 pages.
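
    A minimal sketch of the general idea (a unigram simplification of the n-gram approach, not the authors' implementation): learn where each term of geotagged posts tends to occur on a hyper-local grid, then locate a new post by voting over its terms. The grid size, tokenization, and example posts are assumptions.

    from collections import Counter, defaultdict

    def grid_cell(lat, lon, cell_deg=0.01):
        # Roughly 1 km "hyper-local" cells; the cell size is an assumption.
        return (round(lat / cell_deg), round(lon / cell_deg))

    def train(geotagged_posts, cell_deg=0.01):
        """Count, for each term, how often it appears in each grid cell."""
        model = defaultdict(Counter)
        for text, lat, lon in geotagged_posts:
            cell = grid_cell(lat, lon, cell_deg)
            for token in text.lower().split():
                model[token][cell] += 1
        return model

    def predict(model, text):
        """Return the grid cell with the most votes from the post's terms."""
        votes = Counter()
        for token in text.lower().split():
            votes.update(model.get(token, Counter()))
        return votes.most_common(1)[0][0] if votes else None

    posts = [("coffee at union square", 40.7359, -73.9911),
             ("union square farmers market", 40.7357, -73.9907)]
    model = train(posts)
    print(predict(model, "meet me at union square"))   # the cell around Union Square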