Oriented Point Sampling for Plane Detection in Unorganized Point Clouds
Plane detection in 3D point clouds is a crucial pre-processing step for
applications such as point cloud segmentation, semantic mapping and SLAM. In
contrast to many recent plane detection methods that are applicable only to
organized point clouds, our work targets unorganized point clouds that
do not permit a 2D parametrization. We compare three methods for detecting
planes in point clouds efficiently. One is a novel method proposed in this
paper that generates plane hypotheses by sampling from a set of points with
estimated normals. We named this method Oriented Point Sampling (OPS) to
contrast with more conventional techniques that require the sampling of three
unoriented points to generate plane hypotheses. We also implemented an
efficient plane detection method based on local sampling of three unoriented
points and compared it with OPS and the 3D-KHT algorithm, which is based on
octrees, on the detection of planes on 10,000 point clouds from the SUN RGB-D
dataset.
Comment: 7 pages, 3 figures, 2019 IEEE International Conference on Robotics and Automation (Accepted).
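As a rough sketch of the OPS idea: because every sampled point carries an estimated normal, a single oriented point already defines a plane hypothesis, which a RANSAC-style loop can score by counting inliers. The Python snippet below illustrates that one-point sampling scheme under our own naming and parameter choices; it is not the authors' implementation.

    import numpy as np

    def detect_plane_ops(points, normals, n_iters=500, dist_thresh=0.02, seed=0):
        # RANSAC-style loop in which each hypothesis comes from ONE oriented
        # point: the point and its estimated unit normal define the plane.
        # points: (N, 3) array; normals: (N, 3) array of unit normals.
        rng = np.random.default_rng(seed)
        best_plane, best_inliers = None, None
        for _ in range(n_iters):
            i = rng.integers(len(points))
            p, n = points[i], normals[i]          # one oriented point = one plane
            dists = np.abs((points - p) @ n)      # point-to-plane distances
            inliers = dists < dist_thresh
            if best_inliers is None or inliers.sum() > best_inliers.sum():
                best_plane, best_inliers = (p, n), inliers
        return best_plane, best_inliers

In contrast, a conventional unoriented scheme must draw three points per hypothesis and fit the plane through them, which typically requires many more iterations to hit an all-inlier sample.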
Local Color Contrastive Descriptor for Image Classification
Image representation and classification are two fundamental tasks towards
multimedia content retrieval and understanding. The idea that shape and texture
information (e.g. edge or orientation) are the key features for visual
representation is deeply ingrained in the current multimedia and computer
vision communities. A number of low-level features have been proposed by
computing local gradients (e.g., SIFT, LBP, and HOG), and have achieved great
success in numerous multimedia applications. In this paper, we present a
simple yet efficient local descriptor for image classification, referred to as
Local Color Contrastive Descriptor (LCCD), by leveraging the neural mechanisms
of color contrast. The idea originates from the observation in neural science
that color and shape information are linked inextricably in visual cortical
processing. The color contrast yields key information for visual color
perception and provides strong linkage between color and shape. We propose a
novel contrastive mechanism to compute the color contrast in both spatial
location and multiple channels. The color contrast is computed by measuring
f-divergence between the color distributions of two regions. Our
descriptor enriches local image representation with both color and contrast
information. We verify experimentally that it strongly complements shape-based
descriptors (e.g., SIFT) while remaining computationally simple.
Extensive experimental results on image classification show that our descriptor
substantially improves the performance of SIFT when the two are combined, and
achieves state-of-the-art performance on three challenging benchmark datasets.
It also improves a recent deep learning model (DeCAF) [1] considerably, raising
accuracy from 40.94% to 49.68% on the large-scale SUN397 database. Code for
LCCD will be made available.
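The abstract describes the contrast computation as an f-divergence between the color distributions of two regions. As a rough sketch, the snippet below measures contrast with the chi-square divergence, one common member of the f-divergence family, between per-channel color histograms; the actual divergence, region layout, and pooling used in LCCD may differ.

    import numpy as np

    def color_contrast(region_a, region_b, bins=16, eps=1e-10):
        # Contrast between two RGB patches as a chi-square divergence (one
        # member of the f-divergence family) between per-channel histograms.
        # region_a, region_b: (H, W, 3) uint8 arrays; returns 3 values.
        contrasts = []
        for c in range(3):
            p, _ = np.histogram(region_a[..., c], bins=bins, range=(0, 256), density=True)
            q, _ = np.histogram(region_b[..., c], bins=bins, range=(0, 256), density=True)
            contrasts.append(0.5 * np.sum((p - q) ** 2 / (p + q + eps)))
        return np.array(contrasts)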
DAP3D-Net: Where, What and How Actions Occur in Videos?
Action parsing in videos with complex scenes is an interesting but
challenging task in computer vision. In this paper, we propose a generic 3D
convolutional neural network in a multi-task learning manner for effective Deep
Action Parsing (DAP3D-Net) in videos. Particularly, in the training phase,
action localization, classification and attributes learning can be jointly
optimized on our appearance-motion data via DAP3D-Net. For an upcoming test
video, we can describe each individual action in the video simultaneously as:
Where the action occurs, What the action is and How the action is performed. To
demonstrate the effectiveness of the proposed DAP3D-Net, we also
contribute a new Numerous-category Aligned Synthetic Action dataset, i.e.,
NASA, which consists of 200,000 action clips of more than 300 categories and
with 33 pre-defined action attributes in two hierarchical levels (i.e.,
low-level attributes of basic body part movements and high-level attributes
related to action motion). We learn DAP3D-Net using the NASA dataset and then
evaluate it on our collected Human Action Understanding (HAU) dataset.
Experimental results show that our approach can accurately localize, categorize
and describe multiple actions in realistic videos.
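Joint optimization of localization, classification, and attribute learning from a shared 3D-convolutional trunk, as described above, can be sketched as a multi-task network with three heads. The PyTorch snippet below is an illustrative stand-in, not the DAP3D-Net architecture; layer sizes and loss weights are placeholders.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class MultiTaskActionNet(nn.Module):
        # Shared 3D-conv trunk with three heads: where / what / how.
        def __init__(self, n_classes=300, n_attrs=33):
            super().__init__()
            self.trunk = nn.Sequential(               # input: (B, 3, T, H, W)
                nn.Conv3d(3, 32, 3, padding=1), nn.ReLU(), nn.MaxPool3d(2),
                nn.Conv3d(32, 64, 3, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool3d(1), nn.Flatten(),
            )
            self.loc_head = nn.Linear(64, 4)          # where: box regression
            self.cls_head = nn.Linear(64, n_classes)  # what: action category
            self.attr_head = nn.Linear(64, n_attrs)   # how: attribute logits

        def forward(self, clip):
            feat = self.trunk(clip)
            return self.loc_head(feat), self.cls_head(feat), self.attr_head(feat)

    def joint_loss(outputs, targets):
        # Sum of the three task losses; equal weights are placeholders.
        loc, cls, attr = outputs
        return (F.smooth_l1_loss(loc, targets["box"])
                + F.cross_entropy(cls, targets["label"])
                + F.binary_cross_entropy_with_logits(attr, targets["attrs"]))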
Large-Scale Mapping of Human Activity using Geo-Tagged Videos
This paper is the first work to perform spatio-temporal mapping of human
activity using the visual content of geo-tagged videos. We utilize a recent
deep-learning based video analysis framework, termed hidden two-stream
networks, to recognize a range of activities in YouTube videos. This framework
is efficient and can run in real time or faster, which is important for
recognizing events as they occur in streaming video or for reducing latency in
analyzing already captured video. This is, in turn, important for using video
in smart-city applications. We perform a series of experiments to show our
approach is able to accurately map activities both spatially and temporally. We
also demonstrate the advantages of using the visual content over the
tags/titles.
Comment: Accepted at ACM SIGSPATIAL 2017.
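As a sketch of the mapping step: once an activity label has been predicted for each geo-tagged video, spatio-temporal mapping reduces to aggregating predictions into spatial cells and time bins. The snippet below illustrates that aggregation under assumed inputs; the paper's actual pipeline and binning scheme are not specified in the abstract.

    from collections import defaultdict

    def map_activities(detections, cell_deg=0.01):
        # Aggregate per-video activity predictions into a spatio-temporal grid.
        # detections: iterable of (lat, lon, hour_of_day, activity_label).
        # Returns {(lat_cell, lon_cell, hour): {activity_label: count}}.
        grid = defaultdict(lambda: defaultdict(int))
        for lat, lon, hour, label in detections:
            cell = (int(lat // cell_deg), int(lon // cell_deg), int(hour))
            grid[cell][label] += 1
        return grid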
Conceptual spatial representations for indoor mobile robots
We present an approach for creating conceptual representations of human-made indoor environments using mobile
robots. The concepts refer to spatial and functional properties of typical indoor environments. Following findings
in cognitive psychology, our model is composed of layers representing maps at different levels of abstraction. The
complete system is integrated in a mobile robot endowed with laser and vision sensors for place and object recognition.
The system also incorporates a linguistic framework that actively supports the map acquisition process, and which
is used for situated dialogue. Finally, we discuss the capabilities of the integrated system.
Cross-Modal Attentional Context Learning for RGB-D Object Detection
Recognizing objects from simultaneously sensed photometric (RGB) and depth
channels is a fundamental yet practical problem in many machine vision
applications such as robot grasping and autonomous driving. In this paper, we
address this problem by developing a Cross-Modal Attentional Context (CMAC)
learning framework, which enables the full exploitation of the context
information from both RGB and depth data. Compared to existing RGB-D object
detection frameworks, our approach has several appealing properties. First, it
consists of an attention-based global context model for exploiting adaptive
contextual information and incorporating this information into a region-based
CNN (e.g., Fast RCNN) framework to achieve improved object detection
performance. Second, our CMAC framework further contains a fine-grained object
part attention module to harness multiple discriminative object parts inside
each possible object region for superior local feature representation. Besides
greatly improving the accuracy of RGB-D object detection, the effective
cross-modal information fusion and attentional context modeling in our
proposed model also provide an interpretable visualization scheme. Experimental
results demonstrate that the proposed method significantly improves upon the
state of the art on all public benchmarks.
Comment: Accepted as a regular paper to IEEE Transactions on Image Processing.
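To give a flavor of attention-based cross-modal fusion, the sketch below gates RGB and depth region features with learned per-channel attention weights. This is a deliberately minimal stand-in: the actual CMAC framework couples an attention-based global context model and a part attention module with a region-based CNN, which the snippet does not attempt to reproduce.

    import torch
    import torch.nn as nn

    class CrossModalAttentionFusion(nn.Module):
        # Gate RGB and depth region features with learned per-channel
        # attention; the fused feature is an attention-weighted blend.
        def __init__(self, dim=256):
            super().__init__()
            self.gate = nn.Sequential(nn.Linear(2 * dim, dim), nn.Sigmoid())

        def forward(self, f_rgb, f_depth):        # both: (B, dim) region features
            a = self.gate(torch.cat([f_rgb, f_depth], dim=-1))
            return a * f_rgb + (1 - a) * f_depth  # attention-weighted fusion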
Mini-Unmanned Aerial Vehicle-Based Remote Sensing: Techniques, Applications, and Prospects
The past few decades have witnessed great progress of unmanned aerial
vehicles (UAVs) in civilian fields, especially in photogrammetry and remote
sensing. In contrast with manned aircraft and satellite platforms, the UAV
platform offers many promising characteristics: flexibility, efficiency,
high-spatial/temporal resolution, low cost, easy operation, etc., which make it
an effective complement to other remote-sensing platforms and a cost-effective
means for remote sensing. Considering the popularity and expansion of UAV-based
remote sensing in recent years, this paper provides a systematic survey on the
recent advances and future prospects of UAVs in the remote-sensing
community. Specifically, the main challenges and key technologies of
remote-sensing data processing based on UAVs are first discussed and
summarized. Then, we provide an overview of the widespread applications of UAVs in
remote sensing. Finally, some prospects for future work are discussed. We hope
this paper will provide remote-sensing researchers with an overall picture of recent
UAV-based remote sensing developments and help guide the further research on
this topic.
Deep Learning for LiDAR Point Clouds in Autonomous Driving: A Review
Recently, the advancement of deep learning in discriminative feature learning
from 3D LiDAR data has led to rapid development in the field of autonomous
driving. However, the automated processing of uneven, unstructured, noisy, and massive
3D point clouds is a challenging and tedious task. In this paper, we provide a
systematic review of existing compelling deep learning architectures applied
to LiDAR point clouds, with a focus on specific tasks in autonomous driving such as
segmentation, detection, and classification. Although several published
research papers focus on specific topics in computer vision for autonomous
vehicles, to date, no general survey on deep learning applied in LiDAR point
clouds for autonomous vehicles exists. Thus, the goal of this paper is to
narrow this gap. More than 140 key contributions from the past
five years are summarized in this survey, including the milestone 3D deep
architectures, the remarkable deep learning applications in 3D semantic
segmentation, object detection, and classification; specific datasets,
evaluation metrics, and state-of-the-art performance. Finally, we discuss the
remaining challenges and future research directions.
Comment: 21 pages, submitted to IEEE Transactions on Neural Networks and
Learning Systems.
Video Object Detection with an Aligned Spatial-Temporal Memory
We introduce Spatial-Temporal Memory Networks for video object detection. At
its core, a novel Spatial-Temporal Memory module (STMM) serves as the recurrent
computation unit to model long-term temporal appearance and motion dynamics.
The STMM's design enables full integration of pretrained backbone CNN weights,
which we find to be critical for accurate detection. Furthermore, in order to
tackle object motion in videos, we propose a novel MatchTrans module to align
the spatial-temporal memory from frame to frame. Our method produces
state-of-the-art results on the benchmark ImageNet VID dataset, and our
ablative studies clearly demonstrate the contribution of our different design
choices. We release our code and models at
http://fanyix.cs.ucdavis.edu/project/stmn/project.html
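The released code above is authoritative; purely as an illustration of a recurrent memory over feature maps, the sketch below implements a ConvGRU-style update that carries appearance information across frames. It omits the MatchTrans alignment and other STMM specifics.

    import torch
    import torch.nn as nn

    class ConvMemoryCell(nn.Module):
        # ConvGRU-style recurrent unit over feature maps: a rough analogue of
        # a spatial-temporal memory carrying appearance across frames.
        def __init__(self, ch):
            super().__init__()
            self.zr = nn.Conv2d(2 * ch, 2 * ch, 3, padding=1)  # update/reset gates
            self.hh = nn.Conv2d(2 * ch, ch, 3, padding=1)      # candidate memory

        def forward(self, feat, mem):             # feat, mem: (B, C, H, W)
            z, r = torch.sigmoid(self.zr(torch.cat([feat, mem], 1))).chunk(2, 1)
            cand = torch.tanh(self.hh(torch.cat([feat, r * mem], 1)))
            return (1 - z) * mem + z * cand       # updated memory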
EdgeLoc: An Edge-IoT Framework for Robust Indoor Localization Using Capsule Networks
With the unprecedented demand for location-based services in indoor
scenarios, wireless indoor localization has become essential for mobile users.
Since GPS is not available in indoor spaces, WiFi RSS fingerprinting has become
popular thanks to its ubiquitous accessibility. However, achieving robust and
efficient indoor localization remains challenging for two major reasons.
First, localization accuracy can be degraded by random signal
fluctuations, which would influence conventional localization algorithms that
simply learn handcrafted features from raw fingerprint data. Second, mobile
users are sensitive to the localization delay, but conventional indoor
localization algorithms are computation-intensive and time-consuming. In this
paper, we propose EdgeLoc, an edge-IoT framework for efficient and robust
indoor localization using capsule networks. We develop a deep learning model
with the CapsNet to efficiently extract hierarchical information from WiFi
fingerprint data, thereby significantly improving the localization accuracy.
Moreover, we implement an edge-computing prototype system to achieve a nearly
real-time localization process by providing mobile users with the deep-learning
model that has been well-trained by the edge server. We conduct a real-world
field experimental study with over 33,600 data points and an extensive
synthetic experiment with the open dataset, and the experimental results
validate the effectiveness of EdgeLoc. The best trade-off of the EdgeLoc system
achieves 98.5% localization accuracy within an average positioning time of only
2.31 ms in the field experiment.
Comment: 11 pages, 12 figures.
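One detail worth illustrating is the input representation: a WiFi RSS fingerprint is naturally a vector with one reading per access point, and arranging it into an image-like tensor lets a CNN or capsule network extract hierarchical structure. The sketch below shows one such (assumed) preprocessing step; EdgeLoc's actual input encoding and CapsNet architecture are described in the paper, not here.

    import numpy as np

    def rss_to_tensor(rss, n_aps, floor_dbm=-100.0):
        # Arrange a WiFi RSS fingerprint (one reading per access point) into a
        # square "image" so a CNN/CapsNet can learn hierarchical structure.
        # rss: dict {ap_index: rss_dbm}; APs not heard default to floor_dbm.
        side = int(np.ceil(np.sqrt(n_aps)))
        vec = np.full(side * side, floor_dbm, dtype=np.float32)
        for ap, val in rss.items():
            vec[ap] = val
        vec = (vec - floor_dbm) / -floor_dbm      # map [floor_dbm, 0] to [0, 1]
        return vec.reshape(side, side)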