597 research outputs found

    Open Vocabulary Semantic Scene Sketch Understanding

    We study the underexplored but fundamental vision problem of machine understanding of abstract freehand scene sketches. We introduce a sketch encoder that produces a semantically aware feature space, which we evaluate on a semantic sketch segmentation task. To train our model we rely only on bitmap sketches paired with brief captions and do not require any pixel-level annotations. To generalize to a large set of sketches and categories, we build on a vision transformer encoder pretrained with the CLIP model. We freeze the text encoder and perform visual-prompt tuning of the visual encoder branch while introducing a set of critical modifications. First, we augment the classical key-query (k-q) self-attention blocks with value-value (v-v) self-attention blocks. Central to our model is a two-level hierarchical network design that enables efficient semantic disentanglement: the first level ensures holistic scene sketch encoding, and the second level focuses on individual categories. In the second level of the hierarchy, we introduce cross-attention between the textual and visual branches. Our method outperforms zero-shot CLIP by 37 points in segmentation pixel accuracy, reaching 85.5% on the FS-COCO sketch dataset. Finally, we conduct a user study that identifies further improvements needed to reconcile machine and human understanding of scene sketches.
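
    As a rough illustration of the value-value attention modification described above, the sketch below computes the attention map from value projections instead of the usual query-key pairing. This is a minimal PyTorch sketch under assumed shapes and names; it is not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class VVAttention(nn.Module):
    """Self-attention whose affinity matrix is built from the value projections
    (v-v) rather than the usual query-key (k-q) pairing. Illustrative only."""

    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.num_heads = num_heads
        self.head_dim = dim // num_heads
        self.scale = self.head_dim ** -0.5
        self.v_proj = nn.Linear(dim, dim)
        self.out_proj = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, n, d = x.shape
        # Project to values and split into heads: (b, heads, n, head_dim).
        v = self.v_proj(x).view(b, n, self.num_heads, self.head_dim).transpose(1, 2)
        # Affinities come from the values themselves: softmax(v v^T / sqrt(d_h)).
        attn = F.softmax((v @ v.transpose(-2, -1)) * self.scale, dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(b, n, d)
        return self.out_proj(out)
```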

    Learning Dense Correspondences between Photos and Sketches

    Humans effortlessly grasp the connection between sketches and real-world objects, even when these sketches are far from realistic. Moreover, human sketch understanding goes beyond categorization -- critically, it also entails understanding how individual elements within a sketch correspond to parts of the physical world it represents. What are the computational ingredients needed to support this ability? Towards answering this question, we make two contributions: first, we introduce a new sketch-photo correspondence benchmark, PSC6k, containing 150K annotations of 6250 sketch-photo pairs across 125 object categories, augmenting the existing Sketchy dataset with fine-grained correspondence metadata. Second, we propose a self-supervised method for learning dense correspondences between sketch-photo pairs, building upon recent advances in correspondence learning for pairs of photos. Our model uses a spatial transformer network to estimate the warp flow between latent representations of a sketch and photo extracted by a contrastive learning-based ConvNet backbone. We found that this approach outperformed several strong baselines and produced predictions that were quantitatively consistent with other warp-based methods. However, our benchmark also revealed systematic differences between predictions of the suite of models we tested and those of humans. Taken together, our work suggests a promising path towards developing artificial systems that achieve more human-like understanding of visual images at different levels of abstraction. Accepted to ICML 2023. Project page: https://photo-sketch-correspondence.github.io
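
    The model described above (a spatial transformer estimating warp flow between sketch and photo features from a contrastive backbone) can be sketched roughly as follows. This is an assumed, minimal PyTorch version: the flow head, grid construction, and names are illustrative placeholders, not the authors' code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class WarpEstimator(nn.Module):
    """Predicts a dense warp flow between sketch and photo feature maps and
    warps the photo features accordingly. Illustrative placeholder module."""

    def __init__(self, feat_dim: int = 256):
        super().__init__()
        # Small conv head mapping concatenated features to a 2-channel flow field.
        self.flow_head = nn.Sequential(
            nn.Conv2d(2 * feat_dim, 128, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv2d(128, 2, kernel_size=3, padding=1),
        )

    def forward(self, f_sketch: torch.Tensor, f_photo: torch.Tensor):
        # f_sketch, f_photo: (B, C, H, W) features from a shared backbone.
        flow = self.flow_head(torch.cat([f_sketch, f_photo], dim=1))  # (B, 2, H, W)
        b, _, h, w = flow.shape
        # Base sampling grid in [-1, 1], offset by the predicted flow.
        ys, xs = torch.meshgrid(
            torch.linspace(-1, 1, h, device=flow.device),
            torch.linspace(-1, 1, w, device=flow.device),
            indexing="ij",
        )
        grid = torch.stack([xs, ys], dim=-1).unsqueeze(0).expand(b, -1, -1, -1)
        warped = F.grid_sample(
            f_photo, grid + flow.permute(0, 2, 3, 1), align_corners=True
        )
        return warped, flow
```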

    Drawing as a versatile cognitive tool

    Drawing is a cognitive tool that makes the invisible contents of mental life visible. Humans use this tool to produce a remarkable variety of pictures, from realistic portraits to schematic diagrams. Despite this variety and the prevalence of drawn images, the psychological mechanisms that enable drawings to be so versatile have yet to be fully explored. In this Review, we synthesize contemporary work in multiple areas of psychology, computer science and neuroscience that examines the cognitive processes involved in drawing production and comprehension. This body of findings suggests that the balance of contributions from perception, memory and social inference during drawing production varies depending on the situation, resulting in some drawings that are more realistic and other drawings that are more abstract. We also consider the use of drawings as a research tool for investigating various aspects of cognition, as well as the role that drawing has in facilitating learning and communication. Taken together, information about how drawings are used in different contexts illuminates the central role of visually grounded abstractions in human thought and behaviour.

    Visual scoping operations for physical assembly

    Planning is hard. The use of subgoals can make planning more tractable, but selecting these subgoals is computationally costly. What algorithms might enable us to reap the benefits of planning using subgoals while minimizing the computational overhead of selecting them? We propose visual scoping, a strategy that interleaves planning and acting by alternately defining a spatial region as the next subgoal and selecting actions to achieve it. We evaluated our visual scoping algorithm on a variety of physical assembly problems against two baselines: planning all subgoals in advance and planning without subgoals. We found that visual scoping achieves comparable task performance to the subgoal planner while requiring only a fraction of the total computational cost. Together, these results contribute to our understanding of how humans might make efficient use of cognitive resources to solve complex planning problems.
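
    A schematic of the interleaved scope-plan-act loop described above, written as a Python sketch. All helpers (select_region, plan_to, execute, is_done) are hypothetical placeholders standing in for the paper's components.

```python
def visual_scoping(state, goal, select_region, plan_to, execute, is_done):
    """Alternate between choosing a spatial region as the next subgoal and
    planning/acting only within that region, until the goal is reached.
    All arguments besides `state` and `goal` are hypothetical callables."""
    while not is_done(state, goal):
        region = select_region(state, goal)     # scope: pick the next spatial subgoal
        actions = plan_to(state, goal, region)  # plan only within that region
        for action in actions:                  # act, then re-scope from the new state
            state = execute(state, action)
    return state
```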

    Learning to communicate about shared procedural abstractions

    Many real-world tasks require agents to coordinate their behavior to achieve shared goals. Successful collaboration requires not only adopting the same communicative conventions, but also grounding these conventions in the same task-appropriate conceptual abstractions. We investigate how humans use natural language to collaboratively solve physical assembly problems more effectively over time. Human participants were paired up in an online environment to reconstruct scenes containing two block towers. One participant could see the target towers, and sent assembly instructions for the other participant to reconstruct. Participants provided increasingly concise instructions across repeated attempts on each pair of towers, using higher-level referring expressions that captured each scene's hierarchical structure. To explain these findings, we extend recent probabilistic models of ad-hoc convention formation with an explicit perceptual learning mechanism. These results shed light on the inductive biases that enable intelligent agents to coordinate upon shared procedural abstractions.

    Measuring and Modeling Physical Intrinsic Motivation

    Humans are interactive agents driven to seek out situations with interesting physical dynamics. Here we formalize the functional form of physical intrinsic motivation. We first collect ratings of how interesting humans find a variety of physics scenarios. We then model human interestingness responses by implementing various hypotheses of intrinsic motivation, ranging from models that rely on simple scene features to models that depend on forward physics prediction. We find that the single best predictor of human responses is adversarial reward, a model derived from physical prediction loss. We also find that simple scene feature models do not generalize their prediction of human responses across all scenarios. Finally, linearly combining the adversarial model with the number of collisions in a scene leads to the greatest improvement in predictivity of human responses, suggesting humans are driven towards scenarios that result in high information gain and physical activity. (6 pages, 5 figures; accepted to CogSci 2023 with full paper publication in the proceedings.)
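
    As an illustrative sketch of the final modeling step, the snippet below fits a linear combination of a forward-prediction-loss signal (standing in for the adversarial reward) and a collision count to human interestingness ratings. The forward model, collision counter, and data variables are assumed placeholders, not the authors' pipeline.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def interestingness_features(scenario, forward_model, count_collisions):
    """One feature vector per scenario: forward-prediction loss (a stand-in for
    the adversarial reward) plus the number of collisions in the scene."""
    pred_loss = forward_model.prediction_loss(scenario)  # assumed placeholder API
    n_collisions = count_collisions(scenario)            # assumed placeholder
    return [pred_loss, n_collisions]

def fit_interest_model(scenarios, ratings, forward_model, count_collisions):
    """Fit a linear read-out from the two features to human interestingness ratings."""
    X = np.array([
        interestingness_features(s, forward_model, count_collisions)
        for s in scenarios
    ])
    y = np.array(ratings)
    return LinearRegression().fit(X, y)
```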

    Diversity in rural communities: palliative care for the low German Mennonites

    Multiculturalism is an important concept in Canada, although there has been limited examination of its meaning in rural settings. To expand this knowledge base, this article presents the Low German (LG) Mennonites, one of the Anabaptist groups that practice adult baptism and a literal interpretation of the Bible, as a case example in the context of providing palliative care to this diverse population. Findings from a mixed methods study, and from a graduate thesis conducted within the framework of this study, focus on three main ideas for understanding perspectives on death and dying among this group: faith-based healthcare facilities; family, community and mutual aid; and keeping the dying connected. Implications of these concepts for health and social service providers are included.

    Canada has prided itself on being a multicultural country, but there has been little in-depth examination of cultural or religious diversity in rural settings. This article presents research on understanding death and dying among the Low German (LG) Mennonites as a case example to illustrate the complexity of providing care to a unique religious group in rural Canada. The mixed methods and qualitative research were conducted in both Canada and Mexico. The findings are presented according to the following three ideas: (1) faith-based healthcare facilities; (2) family, community and mutual aid; and (3) keeping the dying connected. The significance of the findings for the provision of care to this group is also included.

    Mental health beliefs and practices among Low German Mennonites: application to practice

    The purpose of this mixed-methods study was to identify the cultural and religious understandings and beliefs related to mental health wellness and illness in general among the LG Mennonites, with the intention of applying what was learned to assist in the development and deployment of more appropriate healthcare services for the LG Mennonite communities.