31 research outputs found

    Improving fine-grained understanding in image-text pre-training

    We introduce SPARse Fine-grained Contrastive Alignment (SPARC), a simple method for pretraining more fine-grained multimodal representations from image-text pairs. Given that multiple image patches often correspond to single words, we propose to learn a grouping of image patches for every token in the caption. To achieve this, we use a sparse similarity metric between image patches and language tokens and compute for each token a language-grouped vision embedding as the weighted average of patches. The token and language-grouped vision embeddings are then contrasted through a fine-grained sequence-wise loss that only depends on individual samples and does not require other batch samples as negatives. This enables more detailed information to be learned in a computationally inexpensive manner. SPARC combines this fine-grained loss with a contrastive loss between global image and text embeddings to learn representations that simultaneously encode global and local information. We thoroughly evaluate our proposed method and show improved performance over competing approaches both on image-level tasks relying on coarse-grained information, e.g. classification, as well as region-level tasks relying on fine-grained information, e.g. retrieval, object detection, and segmentation. Moreover, SPARC improves model faithfulness and captioning in foundational vision-language models.
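    The patch grouping and per-sample loss described in the abstract can be sketched for a single image-text pair. This is a minimal NumPy sketch, not the paper's implementation: the sparsification rule, the threshold, and the exact loss form are assumptions made for illustration.

```python
import numpy as np

def sparc_fine_grained_loss(patches, tokens, threshold=1.0):
    """Sketch of SPARC-style fine-grained alignment for ONE image-text pair.

    patches: (P, d) image patch embeddings; tokens: (L, d) caption token
    embeddings. The threshold rule and loss form are illustrative assumptions.
    """
    # 1. Token-to-patch similarity matrix.
    sim = tokens @ patches.T                            # (L, P)
    # 2. Sparsify: per token, keep only patches whose min-max-normalized
    #    similarity clears a cutoff, so each token attends to few patches.
    lo = sim.min(axis=1, keepdims=True)
    hi = sim.max(axis=1, keepdims=True)
    norm = (sim - lo) / (hi - lo + 1e-8)
    weights = np.where(norm >= threshold / sim.shape[1], norm, 0.0)
    # 3. Normalize the surviving weights and form the language-grouped
    #    vision embedding: a weighted average of patches per caption token.
    weights = weights / (weights.sum(axis=1, keepdims=True) + 1e-8)
    grouped = weights @ patches                         # (L, d)
    # 4. Sequence-wise contrastive loss within this single sample: each
    #    token should match its own grouped embedding, not other tokens'.
    #    No other batch samples are needed as negatives.
    def l2norm(x):
        return x / np.linalg.norm(x, axis=-1, keepdims=True)
    logits = l2norm(tokens) @ l2norm(grouped).T         # (L, L)
    log_softmax = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    labels = np.arange(tokens.shape[0])
    return -log_softmax[labels, labels].mean()
```

    In the full method this loss would be added to a standard global image-text contrastive loss; here only the fine-grained, single-sample term is shown.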

    Abstracts of the 2014 Brains, Minds, and Machines Summer School

    A compilation of abstracts from the student projects of the 2014 Brains, Minds, and Machines Summer School, held at Woods Hole Marine Biological Lab, May 29 - June 12, 2014. This work was supported by the Center for Brains, Minds and Machines (CBMM), funded by NSF STC award CCF-1231216.

    Ulterior Motive Model

    This is the source code for the ulterior-motive attribution model developed as a project in the Brains, Minds, and Machines summer course.

    Transfer of object shape knowledge across visual and haptic modalities

    We investigate the hypothesis that multisensory representations mediate the crossmodal transfer of shape knowledge across visual and haptic modalities. In our experiment, participants rated the similarities of pairs of synthetic 3-D objects in visual, haptic, cross-modal, and multisensory settings. Our results offer two contributions. First, we provide evidence for a single multisensory shape representation common to both visual and haptic modalities. Second, our analyses suggest that these representations are part-based, representing objects as compositions of subparts.

    From Sensory Signals to Modality-Independent Conceptual Representations: A Probabilistic Language of Thought Approach

    People learn modality-independent, conceptual representations from modality-specific sensory signals. Here, we hypothesize that any system that accomplishes this feat will include three components: a representational language for characterizing modality-independent representations, a set of sensory-specific forward models for mapping from modality-independent representations to sensory signals, and an inference algorithm for inverting forward models—that is, an algorithm for using sensory signals to infer modality-independent representations. To evaluate this hypothesis, we instantiate it in the form of a computational model that learns object shape representations from visual and/or haptic signals. The model uses a probabilistic grammar to characterize modality-independent representations of object shape, uses a computer graphics toolkit and a human hand simulator to map from object representations to visual and haptic features, respectively, and uses a Bayesian inference algorithm to infer modality-independent object representations from visual and/or haptic signals. Simulation results show that the model infers identical object representations when an object is viewed, grasped, or both. That is, the model’s percepts are modality invariant. We also report the results of an experiment in which different subjects rated the similarity of pairs of objects in different sensory conditions, and show that the model provides a very accurate account of subjects’ ratings. Conceptually, this research significantly contributes to our understanding of modality invariance, an important type of perceptual constancy, by demonstrating how modality-independent representations can be acquired and used. Methodologically, it provides an important contribution to cognitive modeling, particularly an emerging probabilistic language-of-thought approach, by showing how symbolic and statistical approaches can be combined in order to understand aspects of human perception. United States Air Force Office of Scientific Research (Grant FA9550-12-1-0303); National Science Foundation (U.S.) (Grant BCS-1400784).
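    The three-component architecture described above (a representational language, per-modality forward models, and an inference algorithm that inverts them) can be illustrated in miniature. This is a hypothetical toy, not the paper's model: the discrete hypothesis set, the feature values, and the Gaussian noise assumption all stand in for the probabilistic grammar, graphics/hand simulators, and inference machinery of the actual work.

```python
import numpy as np

# Component 1: a (toy) representational language — here just a discrete
# set of shape hypotheses standing in for a probabilistic grammar.
SHAPES = ["one-part", "two-part", "three-part"]

# Component 2: sensory-specific forward models mapping each
# modality-independent hypothesis to an expected sensory feature.
# The numeric values are purely illustrative.
def forward_visual(shape):
    return {"one-part": 1.0, "two-part": 2.0, "three-part": 3.0}[shape]

def forward_haptic(shape):
    return {"one-part": 0.9, "two-part": 2.1, "three-part": 2.9}[shape]

# Component 3: an inference algorithm inverting the forward models —
# exact Bayesian inversion under Gaussian noise and a uniform prior.
def posterior(visual=None, haptic=None, noise=0.5):
    """P(shape | available sensory signals)."""
    logp = np.zeros(len(SHAPES))
    for i, s in enumerate(SHAPES):
        if visual is not None:
            logp[i] += -0.5 * ((visual - forward_visual(s)) / noise) ** 2
        if haptic is not None:
            logp[i] += -0.5 * ((haptic - forward_haptic(s)) / noise) ** 2
    p = np.exp(logp - logp.max())
    return p / p.sum()
```

    The point of the sketch is modality invariance: because both forward models are inverted onto the same hypothesis space, the same shape hypothesis wins whether the "object" is seen, grasped, or both.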

    Shape perception as Bayesian inference of modality-independent part-based 3D object-centered shape representations

    Thesis (Ph. D.)--University of Rochester. Department of Brain and Cognitive Sciences, 2017. Shape is a fundamental property of physical objects. It provides crucial information for various critical behaviors from object recognition to motor planning. The fundamental question here for cognitive science is to understand object shape perception, i.e., how our brains extract shape information from sensory stimuli and make use of it. In other words, we want to understand the representations and algorithms our brains use to achieve successful shape perception. This thesis reports a computational theory of shape perception that uses modality-independent, part-based, 3D, object-centered shape representations and frames shape perception as Bayesian inference over such representations. In a series of behavioral, neuroimaging and computational studies reported in the following chapters, we test various aspects of this proposed theory and show that it provides a promising approach to understanding shape perception.

    Effects of low back massage on perceived birth pain and satisfaction

    Aim The aim of the study was to evaluate the effect of low back massage on perceived birth pain and delivery. Method This study was designed as a controlled experimental study. The sample consisted of 62 pregnant women (massage group = 31, control group = 31). Massage was applied to the massage group in three phases during the intrapartum period, at the end of the latent, active, and transition phases (at cervical dilatation of 3–4 cm, 5–7 cm, and 8–10 cm, respectively). VAS scores were evaluated three times, once during each phase. Results The first mean VAS score was 5.2 ± 0.9 in the massage group and 7.3 ± 1.3 in the control group. The second mean VAS score was 6.6 ± 1.6 in the massage group and 8.8 ± 1.0 in the control group. At the third evaluation, the VAS score was significantly higher in the control group than in the massage group (9.2 ± 2.4 vs 6.7 ± 2.7; p < 0.05). The mean satisfaction-with-delivery scores were 8.8 ± 0.7 in the massage group and 6.9 ± 0.8 in the control group (p < 0.05). Conclusion The study determined that lower back massage has a significant impact on reducing labor pain and increasing satisfaction with birth. Health professionals who work in the delivery unit can use massage intervention to decrease pain, shorten delivery time, and increase satisfaction with the birth experience. © 2017 Elsevier Ltd.