NeRF-RPN: A general framework for object detection in NeRFs
This paper presents the first significant object detection framework,
NeRF-RPN, which directly operates on NeRF. Given a pre-trained NeRF model,
NeRF-RPN aims to detect all bounding boxes of objects in a scene. By exploiting
a novel voxel representation that incorporates multi-scale 3D neural volumetric
features, we demonstrate it is possible to regress the 3D bounding boxes of
objects in NeRF directly without rendering the NeRF at any viewpoint. NeRF-RPN
is a general framework and can be applied to detect objects without class
labels. We evaluated NeRF-RPN with various backbone architectures, RPN
head designs and loss functions. All of them can be trained in an end-to-end
manner to estimate high quality 3D bounding boxes. To facilitate future
research in object detection for NeRF, we built a new benchmark dataset which
consists of both synthetic and real-world data with careful labeling and clean
up. See https://youtu.be/M8_4Ih1CJjE for a visualization of the 3D region
proposals produced by NeRF-RPN. Code and dataset will be made available
Computer Vision-based Accident Detection in Traffic Surveillance
Computer vision-based accident detection through video surveillance has
become a beneficial but daunting task. In this paper, a neoteric framework for
detection of road accidents is proposed. The proposed framework capitalizes on
Mask R-CNN for accurate object detection followed by an efficient centroid
based object tracking algorithm for surveillance footage. The probability of an
accident is determined based on speed and trajectory anomalies in a vehicle
after an overlap with other vehicles. The proposed framework provides a robust
method to achieve a high Detection Rate and a low False Alarm Rate on general
road-traffic CCTV surveillance footage. This framework was evaluated on diverse
conditions such as broad daylight, low visibility, rain, hail, and snow using
the proposed dataset. This framework was found effective and paves the way to
the development of general-purpose vehicular accident detection algorithms in
real time.
Comment: Accepted in 10th ICCCNT 201
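The tracking step described in this abstract can be illustrated with a minimal sketch. It assumes detections arrive as axis-aligned boxes (x1, y1, x2, y2) per frame and matches them to existing tracks by greedy nearest-centroid search. The class name, the `max_distance` threshold, and the matching rule are illustrative assumptions, not the authors' implementation.

```python
import math


def centroid(box):
    """Return the center (cx, cy) of a box given as (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2.0, (y1 + y2) / 2.0)


def boxes_overlap(a, b):
    """True if two axis-aligned boxes share a region of non-zero area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


class CentroidTracker:
    """Greedy nearest-centroid tracker (hypothetical, simplified)."""

    def __init__(self, max_distance=50.0):
        self.max_distance = max_distance  # assumed matching radius in pixels
        self.next_id = 0
        self.tracks = {}  # track id -> last known centroid

    def update(self, boxes):
        """Match each box to the closest unmatched track, else start a new one.

        Tracks that receive no detection in this frame are dropped at once;
        a practical tracker would usually tolerate a few missed frames.
        """
        unmatched = dict(self.tracks)
        assigned = {}
        for box in boxes:
            c = centroid(box)
            best_id, best_d = None, self.max_distance
            for track_id, previous in unmatched.items():
                d = math.dist(c, previous)
                if d <= best_d:
                    best_id, best_d = track_id, d
            if best_id is None:
                best_id = self.next_id
                self.next_id += 1
            else:
                del unmatched[best_id]
            assigned[best_id] = c
        self.tracks = assigned
        return assigned
```

Per-vehicle speed can then be estimated from the displacement of a track's centroid between consecutive frames, and `boxes_overlap` gives a simple trigger for checking speed and trajectory anomalies after two vehicles' boxes intersect.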
Multi-Label Image Classification via Knowledge Distillation from Weakly-Supervised Detection
Multi-label image classification is a fundamental but challenging task
towards general visual understanding. Existing methods have found that
region-level cues (e.g., features from RoIs) can facilitate multi-label classification.
Nevertheless, such methods usually require laborious object-level annotations
(i.e., object labels and bounding boxes) for effective learning of the
object-level visual features. In this paper, we propose a novel and efficient
deep framework to boost multi-label classification by distilling knowledge from
a weakly-supervised detection task without bounding box annotations.
Specifically, given the image-level annotations, (1) we first develop a
weakly-supervised detection (WSD) model, and then (2) construct an end-to-end
multi-label image classification framework augmented by a knowledge
distillation module that guides the classification model by the WSD model
according to the class-level predictions for the whole image and the
object-level visual features for object RoIs. The WSD model is the teacher
model and the classification model is the student model. After this cross-task
knowledge distillation, the performance of the classification model is
significantly improved and the efficiency is maintained since the WSD model can
be safely discarded in the test phase. Extensive experiments on two large-scale
datasets (MS-COCO and NUS-WIDE) show that our framework outperforms
state-of-the-art methods in both accuracy and efficiency.
Comment: accepted by ACM Multimedia 2018, 9 pages, 4 figures, 5 table
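The teacher-student idea in this abstract can be sketched for the class-level predictions only. The example below is a generic distillation objective, assuming per-class sigmoid outputs and a mixing weight `alpha`; the function names and the exact form of the loss are assumptions for illustration and are not taken from the paper, which also distills object-level RoI features.

```python
import math


def sigmoid(z):
    """Map a logit to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def binary_cross_entropy(probs, targets, eps=1e-7):
    """Mean binary cross-entropy; targets may be hard labels or soft probabilities."""
    total = 0.0
    for p, t in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(t * math.log(p) + (1.0 - t) * math.log(1.0 - p))
    return total / len(probs)


def distillation_loss(student_logits, teacher_probs, labels, alpha=0.5):
    """Blend the ground-truth loss with a loss toward the teacher's predictions.

    alpha = 0 trains on image-level labels only; alpha = 1 only imitates
    the teacher (here, the weakly-supervised detection model).
    """
    student_probs = [sigmoid(z) for z in student_logits]
    hard = binary_cross_entropy(student_probs, labels)
    soft = binary_cross_entropy(student_probs, teacher_probs)
    return (1.0 - alpha) * hard + alpha * soft
```

Because the teacher only contributes targets during training, it can be dropped at test time, which is why the student's inference cost is unchanged.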
Fused Text Segmentation Networks for Multi-oriented Scene Text Detection
In this paper, we introduce a novel end-to-end framework for multi-oriented
scene text detection from an instance-aware semantic segmentation perspective.
We present Fused Text Segmentation Networks, which combine multi-level features
during feature extraction, as text instances may rely on finer feature
expression than general objects. The network detects and segments text
instances jointly and simultaneously, leveraging merits of both the semantic
segmentation task and the region-proposal-based object detection task. Not
involving any extra pipelines, our approach surpasses the current state of the
art on multi-oriented scene text detection benchmarks: ICDAR2015 Incidental
Scene Text and MSRA-TD500, reaching Hmean 84.1% and 82.0% respectively. Moreover,
we report a baseline on Total-Text, which contains curved text, suggesting the
effectiveness of the proposed approach.
Comment: Accepted by ICPR201
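Hmean in text detection benchmarks is the harmonic mean of precision and recall (the F-measure). A short sketch of the computation follows; the precision and recall values in the usage note are made up for illustration and are not results from the paper.

```python
def hmean(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)
```

For example, `hmean(0.8, 0.9)` evaluates to about 0.847, which is lower than the arithmetic mean because the harmonic mean penalizes imbalance between the two quantities.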
Efficient object detection via structured learning and local classifiers
Object detection has made great strides recently. However, it still faces two big challenges: detection accuracy and computational efficiency. In this thesis, we present an automatic, efficient object detection framework to detect object instances in images using bounding boxes, which can be trained and tested easily on current personal computers. Our framework is a sliding-window based approach, and consists of two major components: (1) efficient object proposal generation, predicting possible object bounding boxes, and (2) efficient object proposal verification, classifying each bounding box in a multiclass manner.
For object proposal generation, we formulate this problem as a structured learning problem and investigate structural support vector machines (SSVMs) with our proposed scale/aspect-ratio quantization scheme and ranking constraints. A general ranking-order decomposition algorithm is developed for solving the formulation efficiently, and applied to generate proposals using a two-stage cascade. Using image gradients as features, our object proposal generation method achieves state-of-the-art results in terms of object recall at a low cost in computation.
For object proposal verification, we propose two locally linear classifiers and one locally nonlinear classifier to approximate the nonlinear decision boundaries in the feature space efficiently. Inspired by the kernel trick, these classifiers map the original features into another feature space explicitly where linear classifiers are employed for classification, and thus have linear computational complexity in both training and testing, similar to that of linear classifiers. Therefore, in general, our classifiers can achieve accuracy comparable to kernel-based classifiers at a lower computational cost.
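The idea of an explicit feature map followed by a linear classifier can be sketched as follows. This example uses soft assignment to a small set of anchor points, so that a single linear model in the mapped space behaves like a different linear model near each anchor. The anchors, the Gaussian weighting, and the `bandwidth` parameter are assumptions chosen for illustration; they are not the classifiers proposed in the thesis.

```python
import math


def local_weights(x, anchors, bandwidth=1.0):
    """Soft-assign x to each anchor with normalized Gaussian weights.

    Assumes x is close enough to at least one anchor that the weights
    do not all underflow to zero.
    """
    scores = []
    for a in anchors:
        sq_dist = sum((xi - ai) ** 2 for xi, ai in zip(x, a))
        scores.append(math.exp(-sq_dist / (2.0 * bandwidth ** 2)))
    total = sum(scores)
    return [s / total for s in scores]


def explicit_feature_map(x, anchors, bandwidth=1.0):
    """Map x to the concatenation of weight_k * [x, 1] over all anchors."""
    features = []
    for w in local_weights(x, anchors, bandwidth):
        features.extend(w * xi for xi in x)
        features.append(w)  # per-anchor bias term
    return features


def linear_score(weights, features):
    """Score of an ordinary linear classifier in the mapped space."""
    return sum(wi * fi for wi, fi in zip(weights, features))
```

With K anchors and d input dimensions, the mapped vector has K * (d + 1) entries, so training and prediction remain linear in the number of samples, unlike a kernel classifier that compares each input against many support vectors.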
To demonstrate its efficiency and generality, our framework is applied to four different object detection tasks: VOC detection challenges, traffic sign detection, pedestrian detection, and face detection. In each task, it performs reasonably well, with acceptable detection accuracy and good computational efficiency. For instance, on VOC datasets with 20 object classes, our method achieved about 0.1 mean average precision (AP) within 2 hours of training and 0.05 seconds of testing on a 500 x 300 pixel image using a mixture of MATLAB and C++ code on a current personal computer