Robustness of Object Recognition under Extreme Occlusion in Humans and Computational Models
Most objects in the visual world are partially occluded, yet humans recognize them without difficulty. It remains unknown, however, whether object recognition models such as convolutional neural networks (CNNs) can handle real-world occlusion, and whether efforts to make these models robust to constant mask occlusion carry over to real-world occlusion. We test both humans and these computational models on a challenging task of object recognition under extreme occlusion, where target objects are heavily occluded by irrelevant real objects in real backgrounds. Our results show that human vision is highly robust to extreme occlusion while CNNs are not, even with modifications designed to handle constant mask occlusion. This implies that the ability to handle constant mask occlusion does not entail robustness to real-world occlusion. As a comparison, we propose a computational model that uses object parts and subparts in a compositional manner to build robustness to occlusion. It performs significantly better than CNN-based models on our task, with error patterns similar to those of humans. These findings suggest that testing under extreme occlusion can better reveal the robustness of visual recognition, and that the principle of composition can support such robustness.
Comment: To be presented at the 41st Annual Meeting of the Cognitive Science Society.
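The "extreme occlusion" stimuli described above can be illustrated with a small sketch that overlays occluder patches onto a target image until a chosen fraction of pixels is covered. This is only a toy stand-in: the study occludes targets with irrelevant real objects in real backgrounds, whereas here the image and the occluder patches are synthetic placeholders.

```python
import numpy as np

def occlude(image, occluders, rng, target_fraction=0.7):
    """Overlay patches from `occluders` onto a copy of `image` until at
    least `target_fraction` of its pixels are covered. Illustrative only:
    the paper uses irrelevant real objects, not random patches."""
    out = image.copy()
    covered = np.zeros(image.shape[:2], dtype=bool)
    h, w = covered.shape
    while covered.mean() < target_fraction:
        patch = occluders[rng.integers(len(occluders))]
        ph, pw = patch.shape[:2]
        y = rng.integers(0, h - ph + 1)
        x = rng.integers(0, w - pw + 1)
        out[y:y + ph, x:x + pw] = patch       # paste the occluder
        covered[y:y + ph, x:x + pw] = True    # track occluded pixels
    return out, covered.mean()

rng = np.random.default_rng(0)
target = np.zeros((64, 64, 3), dtype=np.uint8)           # stand-in object image
occluders = [np.full((16, 16, 3), 255, dtype=np.uint8)]  # stand-in occluder patch
occluded, fraction = occlude(target, occluders, rng)
print(fraction)
```

Feeding such heavily occluded images to a recognizer and comparing its errors with human responses is the kind of evaluation the abstract describes.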
Investigating compositional visual knowledge through challenging visual tasks
Human vision manifests remarkable robustness, recognizing objects in a visual world filled with a chaotic, dynamic assortment of information. Computationally, our visual system is challenged by the enormous variability of two-dimensional projected images as a function of viewpoint, lighting, material, and articulation, as well as occlusion. Much past research investigated the underlying representations and computational principles that support the robustness of human vision using controlled and simplified visual stimuli. Nevertheless, the generality of these findings remains unclear until they are tested on more challenging and more naturalistic stimuli.
In this thesis, I study the robustness of human vision with several challenging visual tasks and more naturalistic stimuli, including the recognition of occluded objects and the recognition of non-rigid human bodies in natural scene images. I use psychophysics, functional magnetic resonance imaging, and computational modeling to measure the robustness of human vision and to examine the hierarchical, compositional framework as the underlying principle, in which the representation of the whole is composed of the representations of its parts across different hierarchies. I show that human vision has an impressive ability to recognize heavily occluded natural objects, and that human behavioral performance is better explained by compositional models than by standard deep convolutional neural networks. In addition, I show that human vision can rapidly and robustly extract information about the spatial relationships between human body parts and discriminate three-dimensional non-rigid human poses even from a mere glance. Lastly, I show that a distributed cortical network encodes compositional pose representations with differing view invariance and depth sensitivity, and that the differences in these neural representations might be driven by the diversity of the behavioral tasks they support. Taken together, this thesis demonstrates that human vision remains highly robust even in these challenging visual tasks, and that the hierarchical, compositional framework may be one of the underlying principles supporting such robustness.
DDRel: A New Dataset for Interpersonal Relation Classification in Dyadic Dialogues
Interpersonal language style shifting in dialogues is an interesting and almost instinctive ability of humans. Understanding interpersonal relationships from language content is also a crucial step toward further understanding dialogues. Previous work mainly focuses on relation extraction between named entities in texts. In this paper, we propose the task of relation classification of interlocutors based on their dialogues. We crawled movie scripts from IMSDb and annotated the relation labels for each session according to 13 pre-defined relationships. The annotated dataset, DDRel, consists of 6,300 dyadic dialogue sessions between 694 pairs of speakers, with 53,126 utterances in total. We also construct session-level and pair-level relation classification tasks with widely accepted baselines. The experimental results show that this task is challenging for existing models and that the dataset will be useful for future research.
Comment: This paper has been accepted by AAAI 2021.
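A session-level relation classification task like the one above is easiest to picture against the weakest possible baseline: always predict the most frequent training label. The sketch below uses made-up sessions and hypothetical label names (DDRel defines its own 13 relationship classes, and its actual data format differs).

```python
from collections import Counter

# Toy sessions: (utterances, relation label). Labels are hypothetical
# stand-ins for DDRel's 13 pre-defined relationship classes.
sessions = [
    (["Hi Mom.", "Dinner is ready."], "parent-child"),
    (["Mom, I'm home.", "Wash your hands."], "parent-child"),
    (["Morning, boss.", "Report on my desk by five."], "boss-subordinate"),
    (["Hi Mom!", "How was school?"], "parent-child"),
    (["Any update?", "Yes sir, almost done."], "boss-subordinate"),
]

train, test = sessions[:4], sessions[4:]

# Majority-class baseline: always predict the most frequent training label.
majority = Counter(label for _, label in train).most_common(1)[0][0]
accuracy = sum(majority == label for _, label in test) / len(test)
print(majority, accuracy)
```

Any learned model for the task has to beat this kind of trivial predictor, which is why such baselines are reported alongside the neural ones.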
Pica in a girl with non-suicidal self-injury: a case report
Non-suicidal self-injury (NSSI) is on the rise globally, posing a significant societal challenge. Pica, an eating disorder, presents difficulties in treatment due to the absence of effective medications. In this report, we discuss a complex case involving the co-occurrence of pica and non-suicidal self-injury. A 13-year-old girl was admitted to our hospital after ingesting two batteries. She presented with a persistent, intense appetite along with sudden, compulsive behaviors such as consuming inedible items and self-inflicted cutting. After receiving a combination of pharmacological treatments (quetiapine, lithium, and sertraline), cognitive behavioral therapy (CBT), and modified electroconvulsive therapy (MECT) for 25 days, she was discharged with relief of her clinical symptoms.
An invisibility cloak using silver nanowires
In this paper, we use the parameter retrieval method together with an analytical effective medium approach to design a well-performing invisibility cloak based on an empirically revised version of the reduced cloak. The designed cloak can be implemented with silver nanowires of elliptical cross-section embedded in a polymethyl methacrylate host. Numerical simulations show that the cloak is robust both to the inner hidden object and to the incoming detecting waves, and it is much simpler, and thus easier to manufacture, than the earlier proposed one [Nat. Photon. 1, 224 (2007)].
Comment: 7 pages, 4 figures, 2 tables.
Pre-treatment Resting-State Functional MR Imaging Predicts the Long-Term Clinical Outcome After Short-Term Paroxetine Treatment in Post-traumatic Stress Disorder
Background: The chronic course of post-traumatic stress disorder (PTSD) and the limited effectiveness of existing treatments create the need for potential biomarkers that predict the response to antidepressant medication at an early stage. However, findings to date focus on the acute therapeutic effect without following up the long-term clinical outcome of PTSD. Studies predicting the long-term clinical outcome of short-term treatment based on both pre-treatment and post-treatment functional MRI in PTSD remain limited.
Methods: Twenty-two PTSD patients were scanned using resting-state functional MRI (rs-fMRI) before and after 12 weeks of treatment with paroxetine. Twenty patients were followed up with the same psychopathological assessments 2 years after the second MRI scan. Based on clinical outcome, the followed-up patients were divided into remitted-PTSD and persistent-PTSD groups. Amplitude of low-frequency fluctuations (ALFF) and degree centrality (DC) derived from pre-treatment and post-treatment rs-fMRI were used as classification features in a support vector machine (SVM) classifier.
Results: Predicting long-term clinical outcome from combined ALFF and DC features derived from pre-treatment rs-fMRI yielded an accuracy of 72.5% (p < 0.005). The most informative voxels for outcome prediction were mainly located in the precuneus, superior temporal area, insula, dorsal medial prefrontal cortex, frontal orbital cortex, supplementary motor area, lingual gyrus, and cerebellum. Long-term outcome could not be successfully classified from post-treatment imaging features, whose accuracy rates stayed below 50%.
Conclusions: Combined ALFF and DC information from rs-fMRI data acquired before treatment could predict the long-term clinical outcome of PTSD, which is critical for defining potential biomarkers to customize PTSD treatment and improve prognosis.
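The classification pipeline in the Methods section can be sketched in miniature: a linear SVM evaluated with leave-one-out cross-validation, the usual choice for a sample of only 20 followed-up patients. Everything below is a toy stand-in under stated assumptions: the features are synthetic numbers replacing voxel-wise ALFF/DC values, the Pegasos sub-gradient trainer replaces whatever SVM software the study actually used, and the group structure is invented for illustration.

```python
import numpy as np

def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Minimal linear SVM trained with the Pegasos sub-gradient method.
    Labels must be +/-1. A toy stand-in for the study's SVM classifier."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w, b, t = np.zeros(d), 0.0, 0
    for _ in range(epochs):
        for i in rng.permutation(n):
            t += 1
            eta = 1.0 / (lam * t)
            if y[i] * (X[i] @ w + b) < 1:      # hinge-loss violation
                w = (1 - eta * lam) * w + eta * y[i] * X[i]
                b += eta * y[i]
            else:                              # no violation: shrink only
                w = (1 - eta * lam) * w
    return w, b

def loo_accuracy(X, y):
    """Leave-one-out cross-validation: hold out each patient once."""
    hits = 0
    for i in range(len(y)):
        mask = np.arange(len(y)) != i
        w, b = train_linear_svm(X[mask], y[mask])
        hits += int(np.sign(X[i] @ w + b) == y[i])
    return hits / len(y)

# Synthetic "pre-treatment features": two slightly separated groups
# standing in for remitted (-1) vs. persistent (+1) patients, 10 each.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-1, 1, (10, 5)), rng.normal(1, 1, (10, 5))])
y = np.array([-1] * 10 + [1] * 10)
print(loo_accuracy(X, y))
```

The study's reported 72.5% accuracy would come from exactly this kind of held-out evaluation, with the permutation test supplying the p < 0.005 significance level.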