Negative Results in Computer Vision: A Perspective
A negative result occurs when the outcome of an experiment or a model is not
what was expected, or when a hypothesis does not hold. Despite often being
overlooked in the scientific community, negative results are results and they carry value.
While this topic has been extensively discussed in other fields such as social
sciences and biosciences, less attention has been paid to it in the computer
vision community. The unique characteristics of computer vision, particularly
its experimental aspect, call for a special treatment of this matter. In this
paper, I will address what makes negative results important, how they should be
disseminated and incentivized, and what lessons can be learned from cognitive
vision research in this regard. Further, I will discuss issues such as computer
vision and human vision interaction, experimental design and statistical
hypothesis testing, explanatory versus predictive modeling, performance
evaluation, model comparison, as well as computer vision research culture.
An Empirical Study Comparing Unobtrusive Physiological Sensors for Stress Detection in Computer Work.
Several unobtrusive sensors have been tested in studies to capture physiological reactions to stress in workplace settings. Lab studies tend to focus on assessing sensors during a specific computer task, while in situ studies tend to offer a generalized view of sensors' efficacy for workplace stress monitoring, without discriminating between different tasks. Given the variation in workplace computer activities, this study investigates the efficacy of unobtrusive sensors for stress measurement across a variety of tasks. We present a comparison of five physiological measurements obtained in a lab experiment, where participants completed six different computer tasks while we measured their stress levels using a chest band (ECG and respiration), a wristband (PPG and EDA), and an emerging thermal imaging method (perinasal perspiration). We found that thermal imaging can detect increased stress for most participants across all tasks, while wrist and chest sensors were less generalizable across tasks and participants. We summarize the costs and benefits of each sensor stream, and show how some computer use scenarios present usability and reliability challenges for stress monitoring with certain physiological sensors. We provide recommendations for researchers and system builders for measuring stress with physiological sensors during workplace computer use.
Does comorbid anxiety counteract emotion recognition deficits in conduct disorder?
Background: Previous research has reported altered emotion recognition in both conduct disorder (CD) and anxiety disorders (ADs) - but these effects appear to be of different kinds. Adolescents with CD often show a generalised pattern of deficits, while those with ADs show hypersensitivity to specific negative emotions. Although these conditions often co-occur, little is known regarding emotion recognition performance in comorbid CD+ADs. Here, we test the hypothesis that in the comorbid case, anxiety-related emotion hypersensitivity counteracts the emotion recognition deficits typically observed in CD. Method: We compared facial emotion recognition across four groups of adolescents aged 12-18 years: those with CD alone (n = 28), ADs alone (n = 23), co-occurring CD+ADs (n = 20) and typically developing controls (n = 28). The emotion recognition task we used systematically manipulated the emotional intensity of facial expressions as well as fixation location (eye, nose or mouth region). Results: Conduct disorder was associated with a generalised impairment in emotion recognition; however, this may have been modulated by group differences in IQ. ADs were associated with increased sensitivity to low-intensity happiness, disgust and sadness. In general, the comorbid CD+ADs group performed similarly to typically developing controls. Conclusions: Although CD alone was associated with emotion recognition impairments, ADs and comorbid CD+ADs were associated with normal or enhanced emotion recognition performance. The presence of comorbid ADs appeared to counteract the effects of CD, suggesting a potentially protective role, although future research should examine the contribution of IQ and gender to these effects.
What is Holding Back Convnets for Detection?
Convolutional neural networks have recently shown excellent results in
general object detection and many other tasks. Albeit very effective, they
involve many user-defined design choices. In this paper we seek to better
understand these choices by examining two key questions: "what did the network
learn?" and "what can the network learn?". We exploit new annotations
(Pascal3D+) to enable a new empirical analysis of the R-CNN detector. Contrary
to common belief, our results indicate that existing state-of-the-art convnet
architectures are not invariant to various appearance factors. In fact, all
the networks considered share similar weak points, which cannot be mitigated
by simply increasing the training data (architectural changes are needed). We
show that overall performance can improve when image renderings are used for
data augmentation. We report the best known results on the Pascal3D+ detection
and viewpoint estimation tasks.
Infrared face recognition: a comprehensive review of methodologies and databases
Automatic face recognition is an area with immense practical potential which
includes a wide range of commercial and law enforcement applications. Hence it
is unsurprising that it continues to be one of the most active research areas
of computer vision. Even after over three decades of intense research, the
state-of-the-art in face recognition continues to improve, benefitting from
advances in a range of different research fields such as image processing,
pattern recognition, computer graphics, and physiology. Systems based on
visible spectrum images, the most researched face recognition modality, have
reached a significant level of maturity with some practical success. However,
they continue to face challenges in the presence of illumination, pose and
expression changes, as well as facial disguises, all of which can significantly
decrease recognition accuracy. Amongst various approaches which have been
proposed in an attempt to overcome these limitations, the use of infrared (IR)
imaging has emerged as a particularly promising research direction. This paper
presents a comprehensive and timely review of the literature on this subject.
Our key contributions are: (i) a summary of the inherent properties of infrared
imaging which make this modality promising in the context of face recognition,
(ii) a systematic review of the most influential approaches, with a focus on
emerging common trends as well as key differences between alternative
methodologies, (iii) a description of the main databases of infrared facial
images available to the researcher, and lastly (iv) a discussion of the most
promising avenues for future research.
Comment: Pattern Recognition, 2014.
Scene Graph Generation via Conditional Random Fields
Despite the great success object detection and segmentation models have
achieved in recognizing individual objects in images, performance on cognitive
tasks such as image captioning, semantic image retrieval, and visual QA is far
from satisfactory. To achieve better performance on these cognitive tasks,
merely recognizing individual object instances is insufficient. Instead, the
interactions between object instances need to be captured in order to
facilitate reasoning and understanding of the visual scenes in an image. Scene
graph, a graph representation of images that captures object instances and
their relationships, offers a comprehensive understanding of an image. However,
existing techniques for scene graph generation fail to distinguish subjects
from objects in the visual scenes of images and thus do not perform well on
real-world datasets that contain ambiguous object instances. In this work, we
propose a novel scene graph generation model for predicting object instances
and their corresponding relationships in an image. Our model, SG-CRF,
efficiently learns the sequential order of subject and object in a
relationship triplet, as well as the semantic compatibility of object instance
nodes and relationship nodes in a scene graph. Experiments show that SG-CRF
outperforms state-of-the-art methods on three different datasets, i.e., CLEVR,
VRD, and Visual Genome, raising Recall@100 from 24.99% to 49.95%, from 41.92%
to 50.47%, and from 54.69% to 54.77%, respectively.
Fixation prediction with a combined model of bottom-up saliency and vanishing point
By predicting where humans look in natural scenes, we can understand how they
perceive complex natural scenes and prioritize information for further
high-level visual processing. Several models have been proposed for this
purpose, yet there is a gap between the best existing saliency models and
human performance. While many researchers have developed purely computational
models for fixation prediction, fewer attempts have been made to discover the cognitive
factors that guide gaze. Here, we study the effect of a particular type of
scene structural information, known as the vanishing point, and show that human
gaze is attracted to the vanishing point regions. We record eye movements of 10
observers over 532 images, out of which 319 have vanishing points. We then
construct a combined model of traditional saliency and a vanishing point
channel and show that our model outperforms state-of-the-art saliency models
using three scores on our dataset.
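The combination step can be sketched as a pixel-wise weighted sum of a bottom-up saliency map and a vanishing-point channel. In the sketch below, the Gaussian bump centred on the vanishing point and the equal weighting are assumptions for illustration; the paper's exact channel and combination weights may differ.

```python
# Sketch: combine a bottom-up saliency map with a vanishing-point (VP) channel.
# The Gaussian VP channel and the 0.5 weight are illustrative assumptions.
import math

def vp_channel(h, w, vp_row, vp_col, sigma=2.0):
    """Gaussian bump centred on the vanishing point location."""
    return [[math.exp(-((r - vp_row) ** 2 + (c - vp_col) ** 2) / (2 * sigma ** 2))
             for c in range(w)] for r in range(h)]

def combine(saliency, vp, weight=0.5):
    """Pixel-wise weighted sum of the saliency map and the VP channel."""
    h, w = len(saliency), len(saliency[0])
    return [[(1 - weight) * saliency[r][c] + weight * vp[r][c]
             for c in range(w)] for r in range(h)]

# Toy 5x5 uniform saliency map with a vanishing point at the centre.
sal = [[0.2] * 5 for _ in range(5)]
combined = combine(sal, vp_channel(5, 5, 2, 2))
peak = max(max(row) for row in combined)
print(round(peak, 2))  # the combined map peaks at the vanishing point
```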