Open Vocabulary Semantic Scene Sketch Understanding
We study the underexplored but fundamental vision problem of machine
understanding of abstract freehand scene sketches. We introduce a sketch
encoder that yields a semantically aware feature space, which we evaluate by
testing its performance on a semantic sketch segmentation task. To train our
model we rely only on the availability of bitmap sketches with their brief
captions and do not require any pixel-level annotations. To obtain
generalization to a large set of sketches and categories, we build on a vision
transformer encoder pretrained with the CLIP model. We freeze the text encoder
and perform visual-prompt tuning of the visual encoder branch while introducing
a set of critical modifications. Firstly, we augment the classical key-query
(k-q) self-attention blocks with value-value (v-v) self-attention blocks.
Central to our model is a two-level hierarchical network design that enables
efficient semantic disentanglement: The first level ensures holistic scene
sketch encoding, and the second level focuses on individual categories. In the
second level of the hierarchy, we then introduce cross-attention between the
textual and visual branches. Our method outperforms zero-shot CLIP pixel
accuracy of segmentation results by 37 points, reaching an accuracy of
on the FS-COCO sketch dataset. Finally, we conduct a user study that allows us
to identify further improvements needed over our method to reconcile machine
and human understanding of scene sketches.
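The difference between key-query and value-value self-attention mentioned in the abstract above can be illustrated with a minimal single-head sketch. This is not the authors' implementation: it assumes that v-v attention uses the value vectors in place of both queries and keys, that is softmax(V V^T / sqrt(d)) V, and it leaves out the learned projections, multiple heads, visual prompts, and the way the two attention outputs are combined, none of which the abstract specifies. All function names and the toy token values are hypothetical.

```python
import math

def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def transpose(m):
    return [list(col) for col in zip(*m)]

def matmul(a, b):
    """Plain list-of-lists matrix product."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def attention(left, right, values):
    """Scaled dot-product attention: softmax(left @ right^T / sqrt(d)) @ values."""
    d = len(left[0])
    scores = matmul(left, transpose(right))
    weights = [softmax([s / math.sqrt(d) for s in row]) for row in scores]
    return matmul(weights, values)

def kq_attention(q, k, v):
    """Classical self-attention: weights come from query-key similarity."""
    return attention(q, k, v)

def vv_attention(v):
    """Assumed v-v variant: weights come from value-value similarity."""
    return attention(v, v, v)

# Toy example: 3 tokens with 2-dimensional features (made-up numbers).
v = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
q = [[0.5, 0.1], [0.2, 0.9], [0.3, 0.3]]
k = [[0.4, 0.2], [0.1, 0.8], [0.6, 0.6]]

out_kq = kq_attention(q, k, v)
out_vv = vv_attention(v)
```

In the v-v variant each token attends most strongly to tokens whose value vectors resemble its own, so in this toy input the first token's output leans toward its own first feature dimension.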
Learning Dense Correspondences between Photos and Sketches
Humans effortlessly grasp the connection between sketches and real-world
objects, even when these sketches are far from realistic. Moreover, human
sketch understanding goes beyond categorization -- critically, it also entails
understanding how individual elements within a sketch correspond to parts of
the physical world it represents. What are the computational ingredients needed
to support this ability? Towards answering this question, we make two
contributions: first, we introduce a new sketch-photo correspondence benchmark,
, containing 150K annotations of 6250 sketch-photo pairs across
125 object categories, augmenting the existing Sketchy dataset with
fine-grained correspondence metadata. Second, we propose a self-supervised
method for learning dense correspondences between sketch-photo pairs, building
upon recent advances in correspondence learning for pairs of photos. Our model
uses a spatial transformer network to estimate the warp flow between latent
representations of a sketch and photo extracted by a contrastive learning-based
ConvNet backbone. We found that this approach outperformed several strong
baselines and produced predictions that were quantitatively consistent with
other warp-based methods. However, our benchmark also revealed systematic
differences between predictions of the suite of models we tested and those of
humans. Taken together, our work suggests a promising path towards developing
artificial systems that achieve more human-like understanding of visual images
at different levels of abstraction. Project page:
https://photo-sketch-correspondence.github.io
Comment: Accepted to ICML 2023.
Drawing as a versatile cognitive tool
Drawing is a cognitive tool that makes the invisible contents of mental life visible. Humans use this tool to produce a remarkable variety of pictures, from realistic portraits to schematic diagrams. Despite this variety and the prevalence of drawn images, the psychological mechanisms that enable drawings to be so versatile have yet to be fully explored. In this Review, we synthesize contemporary work in multiple areas of psychology, computer science and neuroscience that examines the cognitive processes involved in drawing production and comprehension. This body of findings suggests that the balance of contributions from perception, memory and social inference during drawing production varies depending on the situation, resulting in some drawings that are more realistic and other drawings that are more abstract. We also consider the use of drawings as a research tool for investigating various aspects of cognition, as well as the role that drawing has in facilitating learning and communication. Taken together, information about how drawings are used in different contexts illuminates the central role of visually grounded abstractions in human thought and behaviour.
Visual scoping operations for physical assembly
Planning is hard. The use of subgoals can make planning more tractable, but
selecting these subgoals is computationally costly. What algorithms might
enable us to reap the benefits of planning using subgoals while minimizing the
computational overhead of selecting them? We propose visual scoping, a strategy
that interleaves planning and acting by alternately defining a spatial region
as the next subgoal and selecting actions to achieve it. We evaluated our
visual scoping algorithm on a variety of physical assembly problems against two
baselines: planning all subgoals in advance and planning without subgoals. We
found that visual scoping achieves comparable task performance to the subgoal
planner while requiring only a fraction of the total computational cost.
Together, these results contribute to our understanding of how humans might
make efficient use of cognitive resources to solve complex planning problems.
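The interleaving of scoping and acting described in the abstract above can be sketched as a simple loop. This is a toy illustration under strong assumptions, not the authors' algorithm: the target structure is a set of hypothetical (x, y) grid cells, the "spatial region" chosen as the next subgoal is simply the lowest row that still has unbuilt cells, and the planner for a region just places its cells from left to right. The function names are invented for this example.

```python
def next_scope(target, built):
    """Define the next subgoal region: the lowest row with unbuilt cells
    (an assumed heuristic standing in for the paper's scoping step)."""
    remaining = target - built
    if not remaining:
        return None
    row = min(y for _, y in remaining)
    return {(x, y) for x, y in remaining if y == row}

def plan_for_scope(scope):
    """Plan actions only for the scoped region: place cells left to right."""
    return sorted(scope)

def visual_scoping(target):
    """Alternate between choosing a region and acting to complete it."""
    built = set()
    trace = []  # one list of placement actions per subgoal
    while True:
        scope = next_scope(target, built)
        if scope is None:
            break
        actions = plan_for_scope(scope)
        for cell in actions:
            built.add(cell)  # "act": place one block at this cell
        trace.append(actions)
    return built, trace

# Hypothetical target silhouette with three rows.
target = {(0, 0), (1, 0), (2, 0), (0, 1), (2, 1), (1, 2)}
built, trace = visual_scoping(target)
```

Because each planning call only sees one region, its search space stays small, which is the intuition behind the reduced computational cost reported in the abstract; the toy planner here does no real search, so it does not demonstrate that saving.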
Learning to communicate about shared procedural abstractions
Many real-world tasks require agents to coordinate their behavior to achieve
shared goals. Successful collaboration requires not only adopting the same
communicative conventions, but also grounding these conventions in the same
task-appropriate conceptual abstractions. We investigate how humans use natural
language to collaboratively solve physical assembly problems more effectively
over time. Human participants were paired up in an online environment to
reconstruct scenes containing two block towers. One participant could see the
target towers, and sent assembly instructions for the other participant to
reconstruct. Participants provided increasingly concise instructions across
repeated attempts on each pair of towers, using higher-level referring
expressions that captured each scene's hierarchical structure. To explain these
findings, we extend recent probabilistic models of ad-hoc convention formation
with an explicit perceptual learning mechanism. These results shed light on the
inductive biases that enable intelligent agents to coordinate upon shared
procedural abstractions.
Measuring and Modeling Physical Intrinsic Motivation
Humans are interactive agents driven to seek out situations with interesting
physical dynamics. Here we formalize the functional form of physical intrinsic
motivation. We first collect ratings of how interesting humans find a variety
of physics scenarios. We then model human interestingness responses by
implementing various hypotheses of intrinsic motivation, ranging from models that
rely on simple scene features to models that depend on forward physics
prediction. We find that the single best predictor of human responses is
adversarial reward, a model derived from physical prediction loss. We also find
that simple scene feature models do not generalize their prediction of human
responses across all scenarios. Finally, linearly combining the adversarial
model with the number of collisions in a scene leads to the greatest
improvement in predictivity of human responses, suggesting humans are driven
towards scenarios that result in high information gain and physical activity.
Comment: 6 pages, 5 figures, accepted to CogSci 2023 with full paper
publication in the proceedings.
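The idea of linearly combining two predictors of human ratings, as described in the abstract above, can be sketched with an ordinary least-squares fit. This is only an illustration: the abstract does not say how the combination was fitted or evaluated, so least squares and Pearson correlation are assumptions, and every number below is synthetic (the ratings are generated from a made-up linear rule) and says nothing about the paper's data or results. The variable names are hypothetical.

```python
import math

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def fit_linear(features, targets):
    """Least-squares weights [intercept, w1, w2, ...] via the normal equations."""
    rows = [[1.0] + list(f) for f in features]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(k)]
    return solve(xtx, xty)

def predict(weights, features):
    return [weights[0] + sum(w * x for w, x in zip(weights[1:], f)) for f in features]

def pearson(a, b):
    """Pearson correlation between two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)

# Synthetic per-scenario values, invented for this example only.
adversarial_reward = [0.1, 0.4, 0.35, 0.8, 0.6]
num_collisions = [0, 2, 5, 1, 3]
# Synthetic "ratings" built from an arbitrary rule: 1 + 2 * reward + 0.5 * collisions.
ratings = [1 + 2 * a + 0.5 * c for a, c in zip(adversarial_reward, num_collisions)]

features = list(zip(adversarial_reward, num_collisions))
weights = fit_linear(features, ratings)
combined = predict(weights, features)

r_single = pearson(adversarial_reward, ratings)
r_combined = pearson(combined, ratings)
```

On this synthetic data the combined predictor correlates more strongly with the ratings than the single predictor does, simply because the ratings were constructed from both inputs.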
Diversity in rural communities: palliative care for the Low German Mennonites
Multiculturalism is an important concept in Canada, although there has been limited examination of its meaning in rural settings. To expand this knowledge base, this article presents the Low German (LG) Mennonites as a case example in the context of providing palliative care to this diverse population, one of the Anabaptist groups that practice adult baptism and focus on a literal interpretation of the Bible. Findings from a mixed methods study and a graduate thesis conducted within the framework of this study focus on three main ideas to understand the perspectives related to death and dying among this group: faith-based healthcare facilities; family, community and mutual aid; and keeping the dying connected. Implications of these concepts for health and social service providers are included.
Canada has prided itself on being a multicultural country, but there has been little in-depth examination of cultural or religious diversity in rural settings. This article presents research on understanding death and dying among the Low German (LG) Mennonites as a case example to illustrate the complexity of providing care to a unique religious group in rural Canada. The mixed methods and qualitative research were conducted in both Canada and Mexico. The findings are presented according to the following three ideas: (1) faith-based healthcare facilities; (2) family, community and mutual aid; and (3) keeping the dying connected. The significance of the findings for the provision of care to this group is also included.
Mental health beliefs and practices among Low German Mennonites: application to practice
The purpose of this mixed-methods study was to identify
the cultural and religious understandings and beliefs related
to mental health wellness and illness in general among the
LG Mennonites, with the intention of applying what was
learned to assist in the development and deployment of
more appropriate healthcare services for the LG Mennonite
communities.
- …