CoupleNet: Coupling Global Structure with Local Parts for Object Detection
The region-based Convolutional Neural Network (CNN) detectors such as Faster
R-CNN or R-FCN have already shown promising results for object detection by
combining the region proposal subnetwork and the classification subnetwork
together. Although R-FCN achieves higher detection speed while maintaining
detection performance, global structure information is ignored by the
position-sensitive score maps. To fully explore the local and global
properties, in this paper, we propose a novel fully convolutional network,
named as CoupleNet, to couple the global structure with local parts for object
detection. Specifically, the object proposals obtained by the Region Proposal
Network (RPN) are fed into the coupling module, which consists of two
branches. One branch adopts the position-sensitive RoI (PSRoI) pooling to
capture the local part information of the object, while the other employs the
RoI pooling to encode global and contextual information. Next, we design
different coupling strategies and normalization schemes to fully exploit the
complementary advantages of the global and local branches. Extensive
experiments demonstrate the effectiveness of our approach. We achieve
state-of-the-art results on all three challenging datasets, i.e., a mAP of
82.7% on VOC07, 80.4% on VOC12, and 34.4% on COCO. Code will be made publicly
available.
Comment: Accepted by ICCV 201
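The two-branch coupling described above can be sketched in NumPy. This is a minimal single-image illustration, not the paper's implementation: the per-class score maps, the 2x2 part grid, and the L2-normalize-then-sum coupling (one of the several strategies the abstract says are explored) are assumptions made for the sketch.

```python
import numpy as np

def global_branch(class_maps, box):
    """Global branch: plain RoI pooling over per-class score maps, so each
    class score summarizes the whole region (global structure and context)."""
    y0, x0, y1, x1 = box
    return class_maps[:, y0:y1, x0:x1].mean(axis=(1, 2))      # (num_classes,)

def local_branch(ps_maps, box, k=2):
    """Local branch: position-sensitive RoI pooling. For each class, bin
    (i, j) reads only its dedicated score map, encoding one local part."""
    y0, x0, y1, x1 = box
    h, w = y1 - y0, x1 - x0
    num_classes = ps_maps.shape[0]
    scores = np.empty(num_classes)
    for c in range(num_classes):
        bins = []
        for i in range(k):
            for j in range(k):
                m = ps_maps[c, i * k + j]                     # map for this bin
                bins.append(m[y0 + i * h // k: y0 + (i + 1) * h // k,
                              x0 + j * w // k: x0 + (j + 1) * w // k].mean())
        scores[c] = np.mean(bins)                             # vote over parts
    return scores

def couple(g, l):
    """One possible coupling strategy: L2-normalize each branch, then sum
    element-wise so that neither branch dominates by magnitude."""
    g = g / (np.linalg.norm(g) + 1e-8)
    l = l / (np.linalg.norm(l) + 1e-8)
    return g + l

# Toy usage: 3 classes, an 8x8 map, one proposal box (y0, x0, y1, x1).
rng = np.random.default_rng(0)
class_maps = rng.random((3, 8, 8))
ps_maps = rng.random((3, 4, 8, 8))    # 3 classes x (2x2) part-sensitive maps
box = (2, 2, 6, 6)
scores = couple(global_branch(class_maps, box), local_branch(ps_maps, box))
```

Normalizing before summing is the point of the sketch: without it, whichever branch happens to produce larger activations would swamp the other, which is exactly the imbalance the abstract's "normalization" step is meant to address.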
Feature Selective Networks for Object Detection
Objects for detection usually have distinct characteristics in different
sub-regions and different aspect ratios. However, in prevalent two-stage object
detection methods, Region-of-Interest (RoI) features are extracted by RoI
pooling with little emphasis on these translation-variant feature components.
We present feature selective networks to reform the feature representations of
RoIs by exploiting their disparities among sub-regions and aspect ratios. Our
network produces the sub-region attention bank and aspect ratio attention bank
for the whole image. The RoI-based sub-region attention map and aspect ratio
attention map are selectively pooled from the banks, and then used to refine
the original RoI features for RoI classification. Equipped with a light-weight
detection subnetwork, our network gets a consistent boost in detection
performance based on general ConvNet backbones (ResNet-101, GoogLeNet and
VGG-16). Without bells and whistles, our detectors equipped with ResNet-101
achieve more than 3% mAP improvement compared to counterparts on PASCAL VOC
2007, PASCAL VOC 2012 and MS COCO datasets.
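The selective pooling of RoI-specific attention from whole-image banks might be sketched as follows. The abstract does not specify the bank contents or the combination rule, so the max-over-bank selection, the softmax normalization, and the element-wise product of the two attention maps are all illustrative assumptions.

```python
import numpy as np

def select_attention(bank, box):
    """Pool an RoI-specific attention map out of a whole-image attention
    bank (one channel per sub-region or aspect-ratio hypothesis), then
    normalize it over the RoI with a softmax."""
    y0, x0, y1, x1 = box
    att = bank[:, y0:y1, x0:x1].max(axis=0)    # strongest hypothesis per pixel
    e = np.exp(att - att.max())
    return e / e.sum()

def refine_roi(feat, sub_bank, ar_bank, box):
    """Refine RoI features with the selectively pooled sub-region and
    aspect-ratio attention maps via element-wise re-weighting."""
    y0, x0, y1, x1 = box
    roi = feat[:, y0:y1, x0:x1]                               # (C, h, w)
    att = select_attention(sub_bank, box) * select_attention(ar_bank, box)
    return roi * att[None]                                    # broadcast over C

# Toy usage: 4-channel features, 3 sub-region and 2 aspect-ratio hypotheses.
feat = np.ones((4, 10, 10))
sub_bank = np.zeros((3, 10, 10))
ar_bank = np.zeros((2, 10, 10))
refined = refine_roi(feat, sub_bank, ar_bank, (2, 2, 7, 7))
```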
Single Shot Temporal Action Detection
Temporal action detection is a very important yet challenging problem, since
videos in real applications are usually long, untrimmed and contain multiple
action instances. This problem requires not only recognizing action categories
but also detecting start time and end time of each action instance. Many
state-of-the-art methods adopt the "detection by classification" framework:
first generate proposals, then classify them. The main drawback of this
framework is that the boundaries of action instance proposals have been fixed
during the classification step. To address this issue, we propose a novel
Single Shot Action Detector (SSAD) network based on 1D temporal convolutional
layers to skip the proposal generation step via directly detecting action
instances in untrimmed video. In pursuit of an SSAD network that works
effectively for temporal action detection, we empirically search for the best
architecture, since no existing models can be directly adopted. Moreover, we
investigate input feature types and
fusion strategies to further improve detection accuracy. We conduct extensive
experiments on two challenging datasets: THUMOS 2014 and MEXaction2. When
setting Intersection-over-Union threshold to 0.5 during evaluation, SSAD
significantly outperforms other state-of-the-art systems by increasing mAP from
19.0% to 24.6% on THUMOS 2014 and from 7.4% to 11.0% on MEXaction2.Comment: ACM Multimedia 201
Deep Learning for Logo Detection: A Survey
As ever more logos are created, logo detection has gradually become a
research hotspot across many domains and tasks. Recent advances in this area
are dominated by deep learning-based solutions, where many datasets, learning
strategies, network architectures, etc. have been employed. This paper reviews
advances in applying deep learning techniques to logo detection. Firstly, we
give a comprehensive account of public datasets designed to facilitate
performance evaluation of logo detection algorithms; these datasets tend to be more
diverse, more challenging, and more reflective of real life. Next, we perform
an in-depth analysis of the existing logo detection strategies and the
strengths and weaknesses of each learning strategy. Subsequently, we summarize
the applications of logo detection in various fields, from intelligent
transportation and brand monitoring to copyright and trademark compliance.
Finally, we analyze the potential challenges and present the future directions
for the development of logo detection, concluding this survey.
A Multi-Level Approach to Waste Object Segmentation
We address the problem of localizing waste objects from a color image and an
optional depth image, which is a key perception component for robotic
interaction with such objects. Specifically, our method integrates the
intensity and depth information at multiple levels of spatial granularity.
Firstly, a scene-level deep network produces an initial coarse segmentation,
based on which we select a few potential object regions to zoom in and perform
fine segmentation. The results of the above steps are further integrated into a
densely connected conditional random field that learns to respect the
appearance, depth, and spatial affinities with pixel-level accuracy. In
addition, we create a new RGBD waste object segmentation dataset, MJU-Waste,
that is made public to facilitate future research in this area. The efficacy of
our method is validated on both MJU-Waste and the Trash Annotation in Context
(TACO) dataset.
Comment: Paper appears in Sensors 2020, 20(14), 381
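The zoom-in step above, selecting potential object regions from the coarse segmentation and pasting fine results back, can be sketched with a simple connected-component search. The 4-connectivity, the minimum-area filter, and the paste-back merge are assumptions of this sketch, and the CRF refinement stage is omitted entirely.

```python
import numpy as np
from collections import deque

def object_regions(coarse, min_area=4):
    """Find connected foreground components in the coarse segmentation mask
    and return their bounding boxes (y0, x0, y1, x1): the regions to zoom
    into for fine segmentation."""
    H, W = coarse.shape
    seen = np.zeros_like(coarse, dtype=bool)
    boxes = []
    for sy in range(H):
        for sx in range(W):
            if coarse[sy, sx] and not seen[sy, sx]:
                q = deque([(sy, sx)])
                seen[sy, sx] = True
                ys, xs = [], []
                while q:                              # BFS flood fill
                    y, x = q.popleft()
                    ys.append(y)
                    xs.append(x)
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if 0 <= ny < H and 0 <= nx < W and coarse[ny, nx] and not seen[ny, nx]:
                            seen[ny, nx] = True
                            q.append((ny, nx))
                if len(ys) >= min_area:
                    boxes.append((min(ys), min(xs), max(ys) + 1, max(xs) + 1))
    return boxes

def merge(coarse, fine_masks, boxes):
    """Paste each fine segmentation back into its box over the coarse mask."""
    out = coarse.copy()
    for mask, (y0, x0, y1, x1) in zip(fine_masks, boxes):
        out[y0:y1, x0:x1] = mask
    return out
```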
Robust Lightweight Object Detection
Object detection is a very challenging problem in computer vision and has been a prominent subject of research for nearly three decades. There has been a promising increase in the accuracy and performance of object detectors ever since deep convolutional neural networks (CNNs) were introduced. CNNs can be trained on large datasets of high-resolution images without flattening them, thereby exploiting the spatial information. Their superior learning ability also makes them ideal for image classification and object detection tasks. Unfortunately, this power comes at a steep cost in compute and memory. For instance, the Faster R-CNN detector requires 180 billion FLOPs for training and has over 100 million parameters.
In this project, we explore the popular state-of-the-art object detectors and present their contributions and shortcomings. We then explore recent lightweight detectors, which address the issue of high resource requirements by building leaner models. Building upon the contributions of the state-of-the-art object detectors and recent developments in CNN training, we propose our own lightweight detector. In particular, we propose a novel CNN block, called the inter-channel dependency block (ICDB), to improve the inter-channel dependency in feature maps. Through experiments on benchmark datasets we demonstrate that our model attains better accuracy than previous methods. Three benchmark datasets, PASCAL VOC 2007, KITTI and COCO, are used to demonstrate that our model scales well to different scenarios.
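The abstract does not describe the ICDB's internal design, so the following is only a hedged sketch of one plausible realization of inter-channel dependency modeling, in the spirit of squeeze-and-excitation channel gating; the bottleneck MLP, the ReLU/sigmoid choices, and all weight shapes are assumptions, not the authors' block.

```python
import numpy as np

def icdb_block(feat, w1, w2):
    """Sketch of an inter-channel dependency block: squeeze the spatial
    dimensions, model channel interactions with a small bottleneck MLP,
    then re-weight each channel with a gate in (0, 1)."""
    z = feat.mean(axis=(1, 2))                    # squeeze: (C,)
    h = np.maximum(w1 @ z, 0.0)                   # bottleneck + ReLU: (C/r,)
    s = 1.0 / (1.0 + np.exp(-(w2 @ h)))           # per-channel gates: (C,)
    return feat * s[:, None, None]                # scale each channel

# Toy usage: 8 channels with a reduction to 4 hidden units.
rng = np.random.default_rng(1)
feat = rng.standard_normal((8, 5, 5))
w1 = rng.standard_normal((4, 8)) * 0.1
w2 = rng.standard_normal((8, 4)) * 0.1
gated = icdb_block(feat, w1, w2)
```

Because the gates are sigmoid outputs, the block can only attenuate channels, never amplify them; a residual connection around the block would be the natural next step if amplification were needed.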