MaskLab: Instance Segmentation by Refining Object Detection with Semantic and Direction Features
In this work, we tackle the problem of instance segmentation, the task of
simultaneously solving object detection and semantic segmentation. Towards this
goal, we present a model, called MaskLab, which produces three outputs: box
detection, semantic segmentation, and direction prediction. Building on top of
the Faster-RCNN object detector, the predicted boxes provide accurate
localization of object instances. Within each region of interest, MaskLab
performs foreground/background segmentation by combining semantic and direction
prediction. Semantic segmentation assists the model in distinguishing between
objects of different semantic classes including background, while the direction
prediction, estimating each pixel's direction towards its corresponding center,
allows separating instances of the same semantic class. Moreover, we explore
the effect of incorporating recent successful methods from both segmentation
and detection (i.e. atrous convolution and hypercolumn). Our proposed model is
evaluated on the COCO instance segmentation benchmark and shows comparable
performance with other state-of-the-art models.
Comment: 10 pages including referenc
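
The direction prediction described in the abstract above can be illustrated with a small sketch. It builds per-pixel direction targets for a single instance mask by quantizing the angle from each foreground pixel towards the instance center. The function name `direction_targets`, the use of the mask centroid as the center, the choice of 8 bins, and the rounding scheme are all assumptions made for illustration; the abstract does not specify how MaskLab encodes directions.

```python
import math


def direction_targets(mask, num_bins=8):
    """Quantize each foreground pixel's direction towards the instance center.

    mask: list of rows, 1 for pixels of one instance and 0 elsewhere.
    Returns a grid of the same shape holding a bin index in [0, num_bins)
    for foreground pixels and -1 for background pixels.

    Assumptions (not taken from the paper): the center is the mask centroid,
    and bins are centered on multiples of 2*pi/num_bins.
    """
    coords = [(y, x) for y, row in enumerate(mask) for x, v in enumerate(row) if v]
    out = [[-1] * len(row) for row in mask]
    if not coords:
        return out

    # Centroid of the instance in image coordinates (y grows downwards).
    cy = sum(y for y, _ in coords) / len(coords)
    cx = sum(x for _, x in coords) / len(coords)

    bin_width = 2 * math.pi / num_bins
    for y, x in coords:
        # Angle of the vector pointing from the pixel to the center.
        angle = math.atan2(cy - y, cx - x)
        # Round to the nearest bin center and wrap negative indices.
        out[y][x] = round(angle / bin_width) % num_bins
    return out


if __name__ == "__main__":
    for row in direction_targets([[1, 1, 1], [1, 1, 1], [1, 1, 1]]):
        print(row)
```

Two touching instances of the same class receive different direction patterns under this encoding, because each pixel points to its own instance center. That is the property the abstract relies on to separate instances that semantic segmentation alone cannot tell apart.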
Hotels-50K: A Global Hotel Recognition Dataset
Recognizing a hotel from an image of a hotel room is important for human
trafficking investigations. Images directly link victims to places and can help
verify where victims have been trafficked, and where their traffickers might
move them or others in the future. Recognizing the hotel from images is
challenging because of low image quality, uncommon camera perspectives, large
occlusions (often the victim), and the similarity of objects (e.g., furniture,
art, bedding) across different hotel rooms.
To support efforts towards this hotel recognition task, we have curated a
dataset of over 1 million annotated hotel room images from 50,000 hotels. These
images include professionally captured photographs from travel websites and
crowd-sourced images from a mobile application, which are more similar to the
types of images analyzed in real-world investigations. We present a baseline
approach based on a standard network architecture and a collection of
data-augmentation approaches tuned to this problem domain.