The Role of Syntactic Planning in Compositional Image Captioning
Image captioning has focused on generalizing to images drawn from the same
distribution as the training set, and not to the more challenging problem of
generalizing to different distributions of images. Recently, Nikolaus et al.
(2019) introduced a dataset to assess compositional generalization in image
captioning, where models are evaluated on their ability to describe images with
unseen adjective-noun and noun-verb compositions. In this work, we investigate
different methods to improve compositional generalization by planning the
syntactic structure of a caption. Our experiments show that jointly modeling
tokens and syntactic tags enhances generalization in both RNN- and
Transformer-based models, while also improving performance on standard metrics.
Comment: Accepted at EACL 202
Teaching Compositionality to CNNs
Convolutional neural networks (CNNs) have shown great success in computer
vision, approaching human-level performance when trained for specific tasks via
application-specific loss functions. In this paper, we propose a method for
augmenting and training CNNs so that their learned features are compositional.
It encourages networks to form representations that disentangle objects from
their surroundings and from each other, thereby promoting better
generalization. Our method is agnostic to the specific details of the
underlying CNN to which it is applied and can in principle be used with any
CNN. As we show in our experiments, the learned representations lead to feature
activations that are more localized and improve performance over
non-compositional baselines in object recognition tasks.
Comment: Preprint appearing in CVPR 201
Semantic Image Retrieval via Active Grounding of Visual Situations
We describe a novel architecture for semantic image retrieval---in
particular, retrieval of instances of visual situations. Visual situations are
concepts such as "a boxing match," "walking the dog," "a crowd waiting for a
bus," or "a game of ping-pong," whose instantiations in images are linked more
by their common spatial and semantic structure than by low-level visual
similarity. Given a query situation description, our architecture---called
Situate---learns models capturing the visual features of expected objects as
well as the expected spatial configuration of relationships among objects. Given a
new image, Situate uses these models in an attempt to ground (i.e., to create a
bounding box locating) each expected component of the situation in the image
via an active search procedure. Situate uses the resulting grounding to compute
a score indicating the degree to which the new image is judged to contain an
instance of the situation. Such scores can be used to rank images in a
collection as part of a retrieval system. In the preliminary study described
here, we demonstrate the promise of this system by comparing Situate's
performance with that of two baseline methods, as well as with a related
semantic image-retrieval system based on "scene graphs."
Modulation of taxonomic (versus thematic) similarity judgments and product choices by inducing local and global processing
Perceived similarity is influenced by both taxonomic and thematic relations. Assessing taxonomic relations requires comparing individual features of objects, whereas assessing thematic relations requires exploring how objects functionally interact. These processes appear to relate to different thinking styles: abstract thinking and a global focus may be required to explore functional interactions, whereas attention to detail and a local focus may be required to compare specific features. In four experiments we explored this idea by assessing whether a preference for taxonomic or thematic relations could be created by inducing a local or global perceptual processing style. Experiments 1–3 primed processing style via a perceptual task and used a choice task to examine preference for taxonomic (versus thematic) relations. Experiment 4 induced processing style and examined the effect on similarity ratings for pairs of taxonomically and thematically related items. In all cases, processing style influenced preference for taxonomic/thematic relations.
First use of single-crystal diamonds as fission-fragment detector
Single-crystal chemical-vapor-deposited (sCVD) diamond was investigated for its ability to act as a fission-fragment detector. In particular, we investigated timing and energy resolution for application in a simultaneous time and energy measurement to determine the mass of the detected fission fragment. Previous tests have shown that polycrystalline chemical-vapor-deposited (pCVD) diamonds provide sufficient timing resolution, but their poor energy resolution did not allow complete separation between very low-energy fission fragments, alpha particles and noise. Our present investigations show that artificial sCVD diamonds achieve a timing resolution similar to that of pCVD diamonds, close to 100 ps. Improved pulse-height resolution allows the unequivocal separation of fission fragments, and the detection efficiency reaches 100%, yet this resolution remains a few percent short of the requirements for fragment mass identification. With high-speed digital electronics, a timing resolution well below 100 ps is possible. However, the strongly varying quality of the presently available diamond material does not allow application on a sufficiently large scale at reasonable cost.
On the Accuracy of Hyper-local Geotagging of Social Media Content
Social media users share billions of items per year, only a small fraction of
which is geotagged. We present a data-driven approach for identifying
non-geotagged content items that can be associated with a hyper-local
geographic area by modeling the location distributions of hyper-local n-grams
that appear in the text. We explore the trade-off between accuracy, precision
and coverage of this method. Further, we explore differences across content
received from multiple platforms and devices, and show, for example, that
content shared via different sources and applications produces significantly
different geographic distributions, and that it is best to model and predict
location for items according to their source. Our findings show the potential
and the bounds of a data-driven approach to geotag short social media texts,
and offer implications for all applications that use data-driven approaches to
locate content.
Comment: 10 page