Benchmark evaluation of object segmentation proposal
Abstract. In this research, we provide an in-depth analysis and evaluation of four recent segmentation proposal algorithms on the PASCAL VOC benchmark. The principal goal of this study is to investigate these object detection proposal methods in an unbiased evaluation framework.
Despite their widespread application, the strengths and weaknesses of different segmentation proposal methods relative to each other have not been made fully clear in previous work. This thesis provides additional insights into segmentation proposal methods. To evaluate the quality of proposals, we plot recall as a function of the average number of regions per image. We also discuss the PASCAL VOC 2012 object categories on which the methods perform well and the instances where they suffer from low recall. Experimental evaluation reveals that, despite differing in how they operate, all segmentation proposal methods generally share similar strengths and weaknesses. The analysis also shows how one could select a proposal generation method based on object attributes.
Finally, we show that recall can be improved by merging the proposals of different algorithms. Experimental evaluation shows that this merging approach outperforms the individual algorithms in terms of both precision and recall.
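The recall measurement and the merging step described in this abstract can be illustrated with a minimal sketch. It assumes masks are stored as sets of (row, column) pixels, that overlap is measured with the Jaccard index, and that a ground-truth object counts as recalled at an overlap of 0.5 or more. The function names, the threshold, and the toy masks are hypothetical and are not taken from the thesis.

```python
def jaccard(mask_a, mask_b):
    """Jaccard overlap of two masks given as sets of (row, col) pixels."""
    union = mask_a | mask_b
    return len(mask_a & mask_b) / len(union) if union else 0.0


def recall(ground_truth, proposals, threshold=0.5):
    """Fraction of ground-truth masks matched by at least one proposal."""
    if not ground_truth:
        return 0.0
    hits = sum(
        any(jaccard(gt, p) >= threshold for p in proposals)
        for gt in ground_truth
    )
    return hits / len(ground_truth)


def block(rows, cols):
    """Helper building a rectangular mask from row and column ranges."""
    return {(r, c) for r in rows for c in cols}


# Two toy ground-truth objects (assumed data, for illustration only).
gt = [block(range(4), range(4)), block(range(10, 14), range(10, 14))]

# Each hypothetical method finds only one of the two objects.
method_a = [block(range(4), range(3))]
method_b = [block(range(10, 14), range(11, 14))]

# Merging is sketched here as simply pooling the proposals of both methods.
merged = method_a + method_b

print(recall(gt, method_a), recall(gt, method_b), recall(gt, merged))
```

Pooling raises recall in this toy case, but it also raises the number of regions per image, which is why recall is plotted against that quantity.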
Object Referring in Visual Scene with Spoken Language
Object referring has important applications, especially for human-machine
interaction. While having received great attention, the task is mainly attacked
with written language (text) as input rather than spoken language (speech),
which is more natural. This paper investigates Object Referring with Spoken
Language (ORSpoken) by presenting two datasets and one novel approach. Objects
are annotated with their locations in images, text descriptions and speech
descriptions. This makes the datasets ideal for multi-modality learning. The
approach is developed by carefully breaking down the ORSpoken problem into three
sub-problems and introducing task-specific vision-language interactions at the
corresponding levels. Experiments show that our method outperforms competing
methods consistently and significantly. The approach is also evaluated in the
presence of audio noise, showing the efficacy of the proposed vision-language
interaction methods in counteracting background noise. Comment: 10 pages, Submitted to WACV 201
Neural Motifs: Scene Graph Parsing with Global Context
We investigate the problem of producing structured graph representations of
visual scenes. Our work analyzes the role of motifs: regularly appearing
substructures in scene graphs. We present new quantitative insights on such
repeated structures in the Visual Genome dataset. Our analysis shows that
object labels are highly predictive of relation labels but not vice-versa. We
also find that there are recurring patterns even in larger subgraphs: more than
50% of graphs contain motifs involving at least two relations. Our analysis
motivates a new baseline: given object detections, predict the most frequent
relation between object pairs with the given labels, as seen in the training
set. This baseline improves on the previous state-of-the-art by an average of
3.6% (relative) across evaluation settings. We then introduce Stacked
Motif Networks, a new architecture designed to capture higher order motifs in
scene graphs that further improves over our strong baseline by an average relative
gain of 7.1%. Our code is available at github.com/rowanz/neural-motifs. Comment: CVPR 2018 camera read
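The frequency baseline described in this abstract can be sketched in a few lines. This is only an illustration under assumptions: training annotations are taken to be (subject label, relation, object label) triples, and the function names and toy data are hypothetical rather than drawn from the authors' repository.

```python
from collections import Counter, defaultdict


def fit_frequency_baseline(triples):
    """Map each (subject, object) label pair to its most frequent relation."""
    counts = defaultdict(Counter)
    for subj, rel, obj in triples:
        counts[(subj, obj)][rel] += 1
    return {pair: c.most_common(1)[0][0] for pair, c in counts.items()}


def predict_relation(table, subj, obj, default=None):
    """Look up the most frequent training relation for a label pair."""
    return table.get((subj, obj), default)


# Toy training triples (assumed data, for illustration only).
train = [
    ("man", "wearing", "shirt"),
    ("man", "wearing", "shirt"),
    ("man", "holding", "shirt"),
    ("dog", "on", "grass"),
]

table = fit_frequency_baseline(train)
print(predict_relation(table, "man", "shirt"))
print(predict_relation(table, "cat", "sofa"))
```

The lookup uses only the object labels, which reflects the abstract's observation that object labels are highly predictive of relation labels. Label pairs never seen in training fall back to the `default` value.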
How good are detection proposals, really?
Current top performing Pascal VOC object detectors employ detection proposals
to guide the search for objects thereby avoiding exhaustive sliding window
search across images. Despite the popularity of detection proposals, it is
unclear which trade-offs are made when using them during object detection. We
provide an in-depth analysis of ten object proposal methods along with four
baselines regarding ground truth annotation recall (on Pascal VOC 2007 and
ImageNet 2013), repeatability, and impact on DPM detector performance. Our
findings show common weaknesses of existing methods, and provide insights for
choosing the most adequate method for different settings.
Repulsion Loss: Detecting Pedestrians in a Crowd
Detecting individual pedestrians in a crowd remains a challenging problem
since the pedestrians often gather together and occlude each other in
real-world scenarios. In this paper, we first explore how a state-of-the-art
pedestrian detector is harmed by crowd occlusion via experimentation, providing
insights into the crowd occlusion problem. Then, we propose a novel bounding
box regression loss specifically designed for crowd scenes, termed repulsion
loss. This loss is driven by two motivations: the attraction by target, and the
repulsion by other surrounding objects. The repulsion term prevents the
proposal from shifting to surrounding objects thus leading to more crowd-robust
localization. Our detector trained by repulsion loss outperforms all the
state-of-the-art methods with a significant improvement in occlusion cases. Comment: Accepted to IEEE Conference on Computer Vision and Pattern
Recognition (CVPR) 201
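The two motivations named in this abstract, attraction by the target and repulsion by surrounding objects, can be illustrated with a toy loss. This sketch does not reproduce the paper's formulation. It assumes boxes given as (x1, y1, x2, y2), an attraction term of one minus IoU with the target, a repulsion term equal to the largest fraction of any other ground-truth box covered by the prediction, and a hypothetical weight `alpha`.

```python
def area(box):
    """Area of a box given as (x1, y1, x2, y2)."""
    return max(0.0, box[2] - box[0]) * max(0.0, box[3] - box[1])


def intersection(a, b):
    """Area of the overlap between two boxes."""
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return max(0.0, w) * max(0.0, h)


def toy_repulsion_loss(pred, target, others, alpha=0.5):
    """Attraction to the target plus a weighted penalty for covering others."""
    inter = intersection(pred, target)
    union = area(pred) + area(target) - inter
    attraction = 1.0 - (inter / union if union > 0 else 0.0)
    # Penalize overlap with the most-covered non-target ground-truth box.
    repulsion = max(
        (intersection(pred, o) / area(o) for o in others if area(o) > 0),
        default=0.0,
    )
    return attraction + alpha * repulsion


# Toy crowd scene (assumed data): a target and an overlapping neighbor.
target = (0, 0, 10, 10)
neighbor = (8, 0, 18, 10)

on_target = toy_repulsion_loss((0, 0, 10, 10), target, [neighbor])
shifted = toy_repulsion_loss((4, 0, 14, 10), target, [neighbor])
print(on_target, shifted)
```

In this toy scene the prediction that drifts toward the neighbor is penalized twice, once for leaving the target and once for covering the neighbor, which is the intuition behind a more crowd-robust localization.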
Reanalyzing Chisholm Paradox. Structural Insights
In this paper I focus on the conditions that have to be met for Chisholm’s
Paradox (CP) to occur. My claim is that identity and structure are notions closely
related to each other. I propose a discussion in which the minimal framework for CP
is set, then analyze the paradox in terms of S5, and suggest that in order to capture the
core of the paradox one should use a dynamic valuation function for the model.
Identity appears, at this point, to be dependent upon a structuralist point of vie