Diagnosing Rarity in Human-Object Interaction Detection
Human-object interaction (HOI) detection is a core task in computer vision.
The goal is to localize all human-object pairs and recognize their
interactions. An interaction defined by a tuple leads to a
long-tailed visual recognition challenge since many combinations are rarely
represented. The performance of the proposed models is limited especially for
the tail categories, but little has been done to understand the reason. To that
end, in this paper, we propose to diagnose rarity in HOI detection. We propose
a three-step strategy, namely Detection, Identification and Recognition where
we carefully analyse the limiting factors by studying state-of-the-art models.
Our findings indicate that detection and identification steps are altered by
the interaction signals like occlusion and relative location, as a result
limiting the recognition accuracy.
Comment: Accepted at CVPR'20 Workshop on Learning from Limited Label
Diagnosing Human-object Interaction Detectors
Although we have witnessed significant progress in human-object interaction
(HOI) detection with increasingly high mAP (mean Average Precision), a single
mAP score is too concise to obtain an informative summary of a model's
performance and to understand why one approach is better than another. In this
paper, we introduce a diagnosis toolbox for analyzing the error sources of the
existing HOI detection models. We first conduct holistic investigations in the
pipeline of HOI detection, consisting of human-object pair detection and then
interaction classification. We define a set of errors and the oracles to fix
each of them. By measuring the mAP improvement obtained from fixing an error
using its oracle, we can have a detailed analysis of the significance of
different errors. We then delve into the human-object detection and interaction
classification, respectively, and check the model's behavior. For the first
detection task, we investigate both recall and precision, measuring the
coverage of ground-truth human-object pairs as well as the noisiness level in
the detections. For the second classification task, we compute mAP for
interaction classification only, without considering the detection scores. We
also measure the performance of the models in differentiating human-object
pairs with and without actual interactions using the AP (Average Precision)
score. Our toolbox is applicable for different methods across different
datasets and available at https://github.com/neu-vi/Diag-HOI
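The oracle-based analysis described in this abstract can be illustrated with a minimal sketch. It assumes non-interpolated average precision for a single interaction category (mAP would average this over categories) and a hypothetical oracle, `drop_false_positives`, that removes every unmatched detection. The function names and the toy detections are illustrative assumptions and are not taken from the Diag-HOI toolbox.

```python
from typing import Callable, List, Tuple

# One detection: (confidence score, whether it matches a ground-truth pair).
Detection = Tuple[float, bool]


def average_precision(detections: List[Detection], num_gt: int) -> float:
    """Non-interpolated AP: mean precision at the rank of each true positive."""
    if num_gt == 0:
        return 0.0
    ranked = sorted(detections, key=lambda d: d[0], reverse=True)
    true_positives = 0
    precision_sum = 0.0
    for rank, (_, is_match) in enumerate(ranked, start=1):
        if is_match:
            true_positives += 1
            precision_sum += true_positives / rank
    return precision_sum / num_gt


def drop_false_positives(detections: List[Detection]) -> List[Detection]:
    """Hypothetical oracle: discard every detection that matches no ground truth."""
    return [d for d in detections if d[1]]


def oracle_gain(
    detections: List[Detection],
    num_gt: int,
    oracle: Callable[[List[Detection]], List[Detection]],
) -> float:
    """AP improvement obtained when the oracle fixes one error type."""
    before = average_precision(detections, num_gt)
    after = average_precision(oracle(detections), num_gt)
    return after - before


if __name__ == "__main__":
    toy = [(0.9, True), (0.8, False), (0.7, True)]
    print(average_precision(toy, num_gt=2))
    print(oracle_gain(toy, 2, drop_false_positives))
```

A larger gain for one oracle than for another suggests that the corresponding error type costs the model more, which is the comparison the abstract describes.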
Human-Object Interaction Detection: A Quick Survey and Examination of Methods
Human-object interaction detection is a relatively new task in the world of
computer vision and visual semantic information extraction. With the goal of
machines identifying interactions that humans perform on objects, there are
many real-world use cases for the research in this field. To our knowledge,
this is the first general survey of the state-of-the-art and milestone works in
this field. We provide a basic survey of the developments in the field of
human-object interaction detection. Many works in this field use multi-stream
convolutional neural network architectures, which combine features from
multiple sources in the input image. Most commonly these are the humans and
objects in question, as well as the spatial quality of the two. As far as we
are aware, there have not been in-depth studies performed that look into the
performance of each component individually. In order to provide insight to
future researchers, we perform an individualized study that examines the
performance of each component of a multi-stream convolutional neural network
architecture for human-object interaction detection. Specifically, we examine
the HORCNN architecture as it is a foundational work in the field. In addition,
we provide an in-depth look at the HICO-DET dataset, a popular benchmark in the
field of human-object interaction detection. Code and papers can be found at
https://github.com/SHI-Labs/Human-Object-Interaction-Detection.
Comment: Published at The 1st International Workshop On Human-Centric
Multimedia Analysis, at ACM Multimedia Conference 202
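The per-component study described in this abstract can be pictured with a small sketch. It assumes late fusion by summing per-class scores across streams, then re-fuses with a subset of streams to see what each contributes. The stream names (`human`, `object`, `spatial`), the function names, and the scores are hypothetical and are not drawn from the survey's code.

```python
from typing import Dict, Iterable, List


def fuse_streams(stream_scores: Dict[str, List[float]]) -> List[float]:
    """Sum per-class scores across streams (assumed late-fusion rule)."""
    streams = list(stream_scores.values())
    if not streams:
        raise ValueError("at least one stream is required")
    num_classes = len(streams[0])
    if any(len(scores) != num_classes for scores in streams):
        raise ValueError("all streams must score the same classes")
    return [sum(class_scores) for class_scores in zip(*streams)]


def fuse_subset(
    stream_scores: Dict[str, List[float]], keep: Iterable[str]
) -> List[float]:
    """Fuse only the named streams, to examine components individually."""
    return fuse_streams({name: stream_scores[name] for name in keep})


if __name__ == "__main__":
    scores = {
        "human": [0.2, 0.1],
        "object": [0.3, 0.4],
        "spatial": [0.5, 0.0],
    }
    print(fuse_streams(scores))
    print(fuse_subset(scores, ["human"]))
    print(fuse_subset(scores, ["human", "object"]))
```

Comparing an evaluation metric on the full fusion against each subset gives a rough view of how much each stream adds.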
Olfaction-enhanced multimedia: Perspectives and challenges
This is the post-print version of the article. The official published version can be accessed from the link below. Copyright © 2011 Springer Verlag.
Olfaction, or smell, is one of the last challenges which multimedia and multimodal applications have to conquer. Enhancing such applications with olfactory stimuli has the potential to create a more complex and richer user multimedia experience, by heightening the sense of reality and diversifying user interaction modalities. Nonetheless, olfaction-enhanced multimedia remains a challenging research area. More recently, however, there have been initial signs of olfaction-enhanced applications in multimedia, with olfaction being used towards a variety of goals, including notification alerts, enhancing the sense of reality in immersive applications, and branding, to name but a few. However, as the goal of a multimedia application is to inform and/or entertain users, achieving quality olfaction-enhanced multimedia applications from the users' perspective is vital to the success and continuity of these applications. Accordingly, in this paper we have focused on investigating the user-perceived experience of olfaction-enhanced multimedia applications, with the aim of discovering the quality evaluation factors that are important from a user's perspective of these applications, and consequently ensuring the continued advancement and success of olfaction-enhanced multimedia applications.
A mobile augmented reality application for supporting real-time skin lesion analysis based on deep learning
Melanoma is considered the deadliest skin cancer, and it is difficult to treat once it reaches an advanced state. Diagnoses are performed visually by dermatologists, by naked-eye observation. This paper proposes an augmented reality smartphone application for supporting the dermatologist in the real-time analysis of a skin lesion. The app augments the camera view with information related to the lesion features generally measured by the dermatologist when formulating the diagnosis. The lesion is also classified by a deep learning approach for identifying melanoma. The real-time process adopted for generating the augmented content is described. Real-time performance is evaluated and a user study is conducted. Results show that the real-time process can be executed entirely on the smartphone and that the target users judge the support provided favorably.
Spatial Dichotomies: Research into the Development of 3D Representation
Matthew McGuire, for the Masters of Science degree in Professional Media & Media Management, presented on Friday, November 14th 2014, at Southern Illinois University Carbondale.
Chair of Committee: Robert Spahr
This research paper covers how 3D technology has influenced different industries over time. A key finding is that humans consistently gravitate toward increasing their ability to recreate depth, texture and image. Industry leaders in education, film, medicine, television, and the Web will find literature and discussion of how to adjust their modes of production and manufacturing in the future. Each of these industries is reviewed in the report. The report divides 3D technology into two categories: object-based and screen-based technology. Each chapter covers the social and financial elements that assist and react to new technology in the market. Trends and timelines show how individual industries develop. In each industry, there are multiple ways that 3D technology can be effective and ineffective. The research compares transitions in early media with transitions in modern mediums of art and science. After a short history of society's changing relationship with the image, the paper covers public demand for new virtual settings in education and entertainment. Image-based research indicates that users will, on average, continue to desire more depth and interactivity in content.
Preserved Consciousness in the Absence of a Cerebral Cortex, the Legal and Ethical Implications of Redefining Consciousness and Its Neural Correlates: A Case for a Subcortical System Generating Affective Consciousness
Historically, the scientific and medical communities have taken a corticocentric view on consciousness, emphasizing the need for a cortex in producing the conscious experience. The preserved consciousness observed in hydranencephalic children and decorticated rats suggests that some form of consciousness may be produced by a subcortical network. The brainstem, a phylogenetically ancient and conserved brain structure, could serve as the major integrative machinery to produce this form of consciousness, which is called affective consciousness—the evolutionary antecedent to the reflective consciousness that allows humans to reflect on their experiences. The functional convergence of the brainstem with the amygdala, motor system, and other subcortical structures provides the necessary architecture to support an affective state of consciousness by which instinctual-emotional goal-directed behavior is produced. This subcortical system operates by what Merker (2007) calls the selection triangle—an interface between bodily actions (action selection), the world (target selection), and personal motivation—to produce action through integration. By this model, it is possible that consciousness may persist in the absence of a cortex, such as in the persistent vegetative state. Because of this, it is necessary to establish that multiple forms of consciousness exist and to distinguish between affective and reflective consciousness, because such a distinction would have tremendous ethical implications in the conventional medical treatment of those with disorders of consciousness
Symptom profiles and rates of diagnosis in autistic and other atypically developing infants and toddlers
Although there are many assessment scales that aid in the diagnosis of Autism Spectrum Disorders (ASD), very few instruments are designed specifically to identify the condition in the population of infants and toddlers. The primary purpose of this study is to systematically examine the differences between scores on the Baby and Infant Screen for Children with aUtIsm Traits (BISCUIT)-Part 1 in a sample of at risk atypically developing children. Participants are children enrolled in Louisiana’s EarlySteps Program, which provides support services (e.g., speech therapy, occupational therapy, physical therapy, behavior psychology) to infants/toddlers and their families from birth to 36 months of age. All children enrolled in EarlySteps have a medical condition that is likely to result in a developmental delay/atypical development (e.g., premature birth, seizure disorders, Down Syndrome), or are currently diagnosed with developmental delays. Using one-way analysis of variance (ANOVA) and multivariate analysis of variance (MANOVA), three groups (ASD alone, seizure disorder and ASD, and premature and ASD) were compared on BISCUIT-Part 1 total and subscale scores. Implications of these results and directions for future research are discussed