    Benchmark evaluation of object segmentation proposal

    Abstract. In this research, we provide an in-depth analysis and evaluation of four recent segmentation proposal algorithms on the PASCAL VOC benchmark. The principal goal of this study is to investigate these object detection proposal methods in an unbiased evaluation framework. Although these methods are widely used, previous work has largely left their relative strengths and weaknesses unclear. This thesis provides additional insights into segmentation proposal methods. To evaluate the quality of the proposals, we plot recall as a function of the average number of regions per image. We also discuss the PASCAL VOC 2012 object categories on which the methods perform well and the instances in which they suffer from low recall. Experimental evaluation reveals that, despite differing in how they operate, all of the segmentation proposal methods generally share similar strengths and weaknesses. The analysis also shows how one could select a proposal generation method based on object attributes. Finally, we show that recall can be improved by merging the proposals of different algorithms. Experimental evaluation shows that this merging approach outperforms the individual algorithms in terms of both precision and recall.
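The recall-versus-number-of-regions evaluation and the merging step described in this abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the thesis's evaluation code: masks are modelled as sets of pixel coordinates, a ground-truth object counts as recalled when some proposal reaches an IoU of 0.5 (an assumed threshold), and merging is done by round-robin interleaving of ranked lists (an assumed strategy). All function names are hypothetical.

```python
from itertools import zip_longest
from typing import FrozenSet, List, Sequence, Tuple

Pixel = Tuple[int, int]
Mask = FrozenSet[Pixel]  # assumption: a mask is the set of pixels it covers


def rect_mask(x0: int, y0: int, x1: int, y1: int) -> Mask:
    """Hypothetical helper: rectangular mask covering [x0, x1) x [y0, y1)."""
    return frozenset((x, y) for x in range(x0, x1) for y in range(y0, y1))


def mask_iou(a: Mask, b: Mask) -> float:
    """Intersection over union of two pixel sets."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def recall_at_k(gt_masks: Sequence[Mask], proposals: Sequence[Mask],
                k: int, iou_thresh: float = 0.5) -> float:
    """Fraction of ground-truth objects matched by any of the first k proposals."""
    if not gt_masks:
        return 0.0
    top = proposals[:k]
    hits = sum(1 for g in gt_masks
               if any(mask_iou(g, p) >= iou_thresh for p in top))
    return hits / len(gt_masks)


def merge_proposals(*ranked_lists: Sequence[Mask]) -> List[Mask]:
    """Interleave ranked proposal lists round-robin, dropping exact duplicates."""
    merged: List[Mask] = []
    seen = set()
    for group in zip_longest(*ranked_lists):
        for mask in group:
            if mask is not None and mask not in seen:
                seen.add(mask)
                merged.append(mask)
    return merged
```

Evaluating `recall_at_k` for increasing `k` and averaging over images gives one point per budget on a recall curve. Note that merging also increases the number of regions per image, so a fair comparison has to be made at equal budgets.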

    Object Referring in Visual Scene with Spoken Language

    Object referring has important applications, especially for human-machine interaction. Although the task has received great attention, it has mainly been tackled with written language (text) as input rather than spoken language (speech), which is more natural. This paper investigates Object Referring with Spoken Language (ORSpoken) by presenting two datasets and one novel approach. Objects are annotated with their locations in images, text descriptions and speech descriptions. This makes the datasets ideal for multi-modality learning. The approach is developed by carefully breaking the ORSpoken problem down into three sub-problems and introducing task-specific vision-language interactions at the corresponding levels. Experiments show that our method outperforms competing methods consistently and significantly. The approach is also evaluated in the presence of audio noise, showing the efficacy of the proposed vision-language interaction methods in counteracting background noise. Comment: 10 pages, Submitted to WACV 201

    Neural Motifs: Scene Graph Parsing with Global Context

    We investigate the problem of producing structured graph representations of visual scenes. Our work analyzes the role of motifs: regularly appearing substructures in scene graphs. We present new quantitative insights on such repeated structures in the Visual Genome dataset. Our analysis shows that object labels are highly predictive of relation labels but not vice-versa. We also find that there are recurring patterns even in larger subgraphs: more than 50% of graphs contain motifs involving at least two relations. Our analysis motivates a new baseline: given object detections, predict the most frequent relation between object pairs with the given labels, as seen in the training set. This baseline improves on the previous state-of-the-art by an average relative improvement of 3.6% across evaluation settings. We then introduce Stacked Motif Networks, a new architecture designed to capture higher order motifs in scene graphs that further improves over our strong baseline by an average relative gain of 7.1%. Our code is available at github.com/rowanz/neural-motifs. Comment: CVPR 2018 camera ready
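The frequency baseline described in this abstract (predict the most frequent training-set relation for a pair of object labels) can be sketched in a few lines. This is a minimal illustration, not the authors' released code: the function names, the `(subject, relation, object)` triple format, and the `"no_relation"` fallback label are all assumptions made for the example.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple

Triple = Tuple[str, str, str]  # assumed format: (subject label, relation, object label)


def fit_frequency_baseline(
    triples: Iterable[Triple],
) -> Dict[Tuple[str, str], Counter]:
    """Count how often each relation occurs for every ordered label pair."""
    counts: Dict[Tuple[str, str], Counter] = defaultdict(Counter)
    for subj, rel, obj in triples:
        counts[(subj, obj)][rel] += 1
    return counts


def predict_relation(counts: Dict[Tuple[str, str], Counter],
                     subj: str, obj: str,
                     default: str = "no_relation") -> str:
    """Return the most frequent training relation for (subj, obj).

    Falls back to `default` (a hypothetical label) for unseen pairs.
    """
    rel_counts = counts.get((subj, obj))
    if not rel_counts:
        return default
    return rel_counts.most_common(1)[0][0]
```

Because the lookup is keyed on the ordered pair, `("man", "shirt")` and `("shirt", "man")` are treated as different contexts, which matches the directed nature of scene-graph relations.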

    How good are detection proposals, really?

    Current top-performing Pascal VOC object detectors employ detection proposals to guide the search for objects, thereby avoiding exhaustive sliding window search across images. Despite the popularity of detection proposals, it is unclear which trade-offs are made when using them during object detection. We provide an in-depth analysis of ten object proposal methods along with four baselines regarding ground truth annotation recall (on Pascal VOC 2007 and ImageNet 2013), repeatability, and impact on DPM detector performance. Our findings show common weaknesses of existing methods, and provide insights for choosing the most adequate method for different settings.

    Repulsion Loss: Detecting Pedestrians in a Crowd

    Detecting individual pedestrians in a crowd remains a challenging problem since the pedestrians often gather together and occlude each other in real-world scenarios. In this paper, we first explore experimentally how a state-of-the-art pedestrian detector is harmed by crowd occlusion, providing insights into the crowd occlusion problem. Then, we propose a novel bounding box regression loss specifically designed for crowd scenes, termed repulsion loss. This loss is driven by two motivations: the attraction by target, and the repulsion by other surrounding objects. The repulsion term prevents the proposal from shifting to surrounding objects, thus leading to more crowd-robust localization. Our detector trained by repulsion loss outperforms all the state-of-the-art methods with a significant improvement in occlusion cases. Comment: Accepted to IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 201
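The two motivations named in this abstract, attraction toward the target and repulsion from surrounding objects, can be illustrated with a toy loss. This is not the paper's formulation: the L1 attraction term, the overlap-based penalty, the weight `alpha`, and all function names are assumptions chosen only to show how the two terms interact. Boxes are `(x0, y0, x1, y1)` tuples.

```python
from typing import Sequence, Tuple

Box = Tuple[float, float, float, float]  # assumed format: (x0, y0, x1, y1)


def box_area(b: Box) -> float:
    """Area of a box, zero if the box is degenerate."""
    return max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])


def overlap_fraction(pred: Box, other: Box) -> float:
    """Share of `other` that is covered by `pred` (intersection / area of other)."""
    inter: Box = (max(pred[0], other[0]), max(pred[1], other[1]),
                  min(pred[2], other[2]), min(pred[3], other[3]))
    area = box_area(other)
    return box_area(inter) / area if area else 0.0


def toy_repulsion_loss(pred: Box, target: Box, others: Sequence[Box],
                       alpha: float = 0.5) -> float:
    """Toy loss: attraction to the target plus a penalty for covering neighbours.

    attraction: L1 distance between predicted and target coordinates.
    repulsion:  largest overlap fraction with any non-target object.
    """
    attraction = sum(abs(p - t) for p, t in zip(pred, target))
    repulsion = max((overlap_fraction(pred, o) for o in others), default=0.0)
    return attraction + alpha * repulsion
```

With this sketch, a prediction that drifts toward a neighbouring pedestrian is penalized more than one that drifts the same distance into empty space, which is the qualitative behaviour the abstract describes.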

    Reanalyzing Chisholm Paradox. Structural Insights

    In this paper I focus on the conditions that have to be met for Chisholm’s Paradox (CP) to occur. My claim is that identity and structure are notions closely related to each other. I propose a discussion in which the minimal framework for CP is set, then analyze the paradox in terms of S5, and suggest that in order to capture the core of the paradox one should use a dynamic valuation function for the model. Identity appears, at this point, to be dependent upon a structuralist point of view.