CNNs based Viewpoint Estimation for Volume Visualization
Viewpoint estimation from 2D rendered images is helpful in understanding how
users select viewpoints for volume visualization and guiding users to select
better viewpoints based on previous visualizations. In this paper, we propose a
viewpoint estimation method based on Convolutional Neural Networks (CNNs) for
volume visualization. We first design an overfit-resistant image rendering
pipeline to generate the training images with accurate viewpoint annotations,
and then train a category-specific viewpoint classification network to estimate
the viewpoint for the given rendered image. Our method can achieve good
performance on images rendered with different transfer functions and rendering
parameters in several categories. We apply our model to recover the viewpoints
of the rendered images in publications, and show how experts look at volumes.
We also introduce a CNN feature-based image similarity measure for similarity
voting based viewpoint selection, which can suggest semantically meaningful
optimal viewpoints for different volumes and transfer functions.
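To make the classification formulation concrete, here is a minimal sketch of a category-specific viewpoint classifier, assuming viewpoints are discretized into bins on the viewing sphere and a standard backbone is fine-tuned as the classifier; the bin count and backbone are illustrative assumptions, not the authors' exact configuration.

```python
# Minimal sketch: viewpoint estimation as classification over discretized
# viewing directions (bin count and backbone are assumptions).
import torch
import torch.nn as nn
from torchvision import models

NUM_VIEW_BINS = 162  # hypothetical discretization of the viewing sphere

class ViewpointClassifier(nn.Module):
    def __init__(self, num_bins=NUM_VIEW_BINS):
        super().__init__()
        backbone = models.resnet18(weights=None)
        backbone.fc = nn.Linear(backbone.fc.in_features, num_bins)
        self.net = backbone

    def forward(self, rendered_image):
        # rendered_image: (B, 3, H, W) volume-rendered input
        return self.net(rendered_image)  # logits over viewpoint bins

model = ViewpointClassifier()
logits = model(torch.randn(1, 3, 224, 224))
predicted_bin = logits.argmax(dim=1)  # estimated viewpoint bin
```

The same backbone's penultimate features could also serve as a CNN feature-based similarity measure of the kind mentioned above, e.g., by comparing rendered images with cosine distance.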
Selective Distillation of Weakly Annotated GTD for Vision-based Slab Identification System
This paper proposes an algorithm for recognizing slab identification numbers
in factory scenes. In the development of a deep-learning based system, manual
labeling to make ground truth data (GTD) is an important but expensive task.
Furthermore, the quality of GTD is closely related to the performance of a
supervised learning algorithm. To reduce manual work in the labeling process,
we generated weakly annotated GTD by marking only character centroids. Whereas
bounding-boxes for characters require at least a drag-and-drop operation or two
clicks to annotate a character location, the weakly annotated GTD requires a
single click to record a character location. The main contribution of this
paper is a selective distillation scheme that improves the quality of the
weakly annotated GTD. Because manual GTD are usually generated by many people,
they may contain personal bias or human error. To address this problem, the
information
in manual GTD is integrated and refined by selective distillation. In the
process of selective distillation, a fully convolutional network is trained
using the weakly annotated GTD, and its prediction maps are selectively used to
revise locations and boundaries of semantic regions of characters in the
initial GTD. The modified GTD are used in the main training stage, and
post-processing is conducted to retrieve text information. Experiments were
thoroughly conducted on actual industry data collected at a steelmaking factory
to demonstrate the effectiveness of the proposed method.
Comment: 10 pages, 12 figures, submitted to a journal
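A minimal sketch of the selective-distillation refinement step, under the assumption that an FCN prediction is adopted only when its predicted region contains the human centroid click; the agreement rule and threshold are illustrative, not the paper's exact criterion.

```python
# Sketch: an FCN trained on weak centroid labels predicts character regions;
# a prediction revises the initial GTD only when it agrees with the click.
import numpy as np
from scipy import ndimage

def refine_gtd(prediction_map, centroids, score_threshold=0.5):
    """prediction_map: (H, W) FCN character-probability map.
    centroids: list of (row, col) human-clicked character centers."""
    labels, _ = ndimage.label(prediction_map > score_threshold)
    refined = []
    for r, c in centroids:
        region_id = labels[r, c]
        if region_id > 0:
            # FCN region contains the click: adopt its boundary as refined GTD.
            refined.append(labels == region_id)
        else:
            # Prediction disagrees with the click: keep the weak annotation.
            refined.append(None)
    return refined
```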
Automated building image extraction from 360° panoramas for postdisaster evaluation
After a disaster, teams of structural engineers collect vast amounts of
images from damaged buildings to obtain new knowledge and extract lessons from
the event. However, in many cases, the images collected are captured without
sufficient spatial context. When damage is severe, it may be quite difficult to
even recognize the building. Accessing images of the pre-disaster condition of
those buildings is required to accurately identify the cause of the failure or
the actual loss in the building. Here, to address this issue, we develop a
method to automatically extract pre-event building images from 360° panorama
images (panoramas). By providing a geotagged image collected near the target
building as the input, panoramas close to the input image location are
automatically downloaded through street view services (e.g., Google or Bing in
the United States). By computing the geometric relationship between the
panoramas and the target building, the most suitable projection direction for
each panorama is identified to generate high-quality 2D images of the building.
Region-based convolutional neural networks are exploited to recognize the
building within those 2D images. Several panoramas are used so that the
detected building images provide various viewpoints of the building. To
demonstrate the capability of the technique, we consider residential buildings
in Holiday Beach, Texas, United States, which experienced significant
devastation in Hurricane Harvey in 2017. Using geotagged images gathered during
actual post-disaster building reconnaissance missions, we verify the method by
successfully extracting residential building images from Google Street View
images, which were captured before the event.
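The geometric step, choosing a projection direction for each panorama, amounts to computing the compass bearing from the panorama's geotag to the target building and cropping the panorama around that heading. A minimal sketch follows; the field of view and the flat horizontal crop (standing in for a full perspective reprojection) are simplifying assumptions.

```python
# Sketch: pick the projection direction as the bearing from the panorama's
# location to the target building, then crop around that heading.
import math

def bearing_deg(lat1, lon1, lat2, lon2):
    """Initial great-circle bearing from point 1 to point 2, in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon2 - lon1)
    y = math.sin(dlon) * math.cos(phi2)
    x = (math.cos(phi1) * math.sin(phi2)
         - math.sin(phi1) * math.cos(phi2) * math.cos(dlon))
    return (math.degrees(math.atan2(y, x)) + 360.0) % 360.0

def crop_heading(panorama, heading, fov_deg=90):
    """panorama: (H, W, 3) equirectangular numpy image. Returns the slice
    centered on `heading`; a real pipeline would reproject to perspective."""
    h, w = panorama.shape[:2]
    center = int(heading / 360.0 * w)
    half = int(fov_deg / 360.0 * w / 2)
    cols = [(center + dx) % w for dx in range(-half, half)]
    return panorama[:, cols]
```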
A Review of Co-saliency Detection Technique: Fundamentals, Applications, and Challenges
Co-saliency detection is a newly emerging and rapidly growing research area
in the computer vision community. As a novel branch of visual saliency, co-saliency
detection refers to the discovery of common and salient foregrounds from two or
more relevant images, and can be widely used in many computer vision tasks. The
existing co-saliency detection algorithms mainly consist of three components:
extracting effective features to represent the image regions, exploring the
informative cues or factors to characterize co-saliency, and designing
effective computational frameworks to formulate co-saliency. Although numerous
methods have been developed, the literature is still lacking a deep review and
evaluation of co-saliency detection techniques. In this paper, we aim at
providing a comprehensive review of the fundamentals, challenges, and
applications of co-saliency detection. Specifically, we provide an overview of
some related computer vision works, review the history of co-saliency
detection, summarize and categorize the major algorithms in this research area,
discuss some open issues in this area, present the potential applications of
co-saliency detection, and finally point out some unsolved challenges and
promising future works. We expect this review to be beneficial to both fresh
and senior researchers in this field, and give insights to researchers in other
related areas regarding the utility of co-saliency detection algorithms.
Comment: 28 pages, 12 figures, 3 tables
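As a toy illustration of the three-component decomposition above (features, co-saliency cues, computational framework), one generic formulation scores a region as co-salient when it is salient within its own image and its features recur in the other images of the group; this is a deliberately simplified sketch, not any specific method from the survey.

```python
# Toy co-saliency scoring: intra-image saliency weighted by inter-image
# feature repeatedness (assumes at least two images in the group).
import numpy as np

def cosaliency_scores(region_features, region_saliency):
    """region_features: list over images, each (n_i, d) region descriptors.
    region_saliency: list over images, each (n_i,) single-image saliency."""
    scores = []
    for i, feats in enumerate(region_features):
        others = np.vstack([f for j, f in enumerate(region_features) if j != i])
        others = others / np.linalg.norm(others, axis=1, keepdims=True)
        f = feats / np.linalg.norm(feats, axis=1, keepdims=True)
        # Repeatedness: best cosine match of each region in the other images.
        repeat = (f @ others.T).max(axis=1)
        scores.append(region_saliency[i] * repeat)
    return scores
```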
Automated dataset generation for image recognition using the example of taxonomy
This master thesis addresses the subject of automatically generating a
dataset for image recognition, which takes a lot of time when being done
manually. As the thesis was written with motivation from the context of the
biodiversity workgroup at the City University of Applied Sciences Bremen, the
classification of taxonomic entries was chosen as an exemplary use case. In
order to automate the dataset creation, a prototype was conceptualized and
implemented after working out knowledge basics and analyzing requirements for
it. It makes use of a pre-trained abstract artificial intelligence that is
able to sort out images that do not contain the desired content. Subsequent to
the implementation and the automated dataset creation resulting from it, an
evaluation was performed. Other, manually collected datasets were compared to
the one the prototype produced in terms of specifications and accuracy. The
results were more than satisfactory and showed that automatically generating a
dataset for image recognition is not only possible, but also might be a decent
alternative to spending time and money doing this task manually. At the very
end of this work, an idea of how to use the principle of employing abstract
artificial intelligences for step-by-step classification of deeper taxonomic
layers in a productive system is presented and discussed.
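A minimal sketch of the filtering idea, assuming an off-the-shelf pre-trained classifier stands in for the thesis's "abstract artificial intelligence"; the model choice, keyword test, and probability threshold are illustrative assumptions.

```python
# Sketch: score downloaded candidate images with a pre-trained classifier
# and keep only those likely to show the target class.
import torch
from torchvision import models
from PIL import Image

weights = models.ResNet50_Weights.DEFAULT
model = models.resnet50(weights=weights).eval()
preprocess = weights.transforms()
categories = weights.meta["categories"]

def keep_image(path, target_keyword, min_prob=0.2):
    """Return True if the top predicted class matches the target keyword."""
    img = preprocess(Image.open(path).convert("RGB")).unsqueeze(0)
    with torch.no_grad():
        probs = model(img).softmax(dim=1)[0]
    top_prob, top_idx = probs.max(dim=0)
    return (target_keyword in categories[top_idx].lower()
            and top_prob.item() >= min_prob)
```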
Algorithms for Semantic Segmentation of Multispectral Remote Sensing Imagery using Deep Learning
Deep convolutional neural networks (DCNNs) have been used to achieve
state-of-the-art performance on many computer vision tasks (e.g., object
recognition, object detection, semantic segmentation) thanks to a large
repository of annotated image data. Large labeled datasets for other sensor
modalities, e.g., multispectral imagery (MSI), are not available due to the
large cost and manpower required. In this paper, we adapt state-of-the-art DCNN
frameworks in computer vision for semantic segmentation of MSI imagery. To
overcome label scarcity for MSI data, we initialize a DCNN framework with
generated synthetic MSI in place of real MSI. We evaluate our network
initialization scheme on the new RIT-18 dataset that we present in this paper.
This dataset contains very-high resolution MSI collected by an unmanned
aircraft system. The models initialized with synthetic imagery were less prone
to over-fitting and provide a state-of-the-art baseline for future work.
Comment: 45 pages
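A minimal sketch of the initialization scheme, assuming a small segmentation network is pretrained on synthetic MSI and its weights are loaded before fine-tuning on the scarce real labels; the architecture, the 6-band input, and the checkpoint file name are illustrative assumptions.

```python
# Sketch: initialize a segmentation DCNN from synthetic-MSI pretraining,
# then fine-tune on real annotated MSI.
import torch
import torch.nn as nn

class SmallSegNet(nn.Module):
    def __init__(self, in_bands=6, num_classes=18):  # RIT-18 has 18 classes
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(in_bands, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU())
        self.head = nn.Conv2d(64, num_classes, 1)

    def forward(self, x):
        return self.head(self.encoder(x))

model = SmallSegNet()
# Weights saved after training on generated synthetic MSI (hypothetical file).
model.load_state_dict(torch.load("synthetic_msi_pretrain.pt"))
# ...then fine-tune on the small set of real annotated MSI scenes.
```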
A Fully Automated System for Sizing Nasal PAP Masks Using Facial Photographs
We present a fully automated system for sizing nasal Positive Airway Pressure
(PAP) masks. The system is comprised of a mix of HOG object detectors as well
as multiple convolutional neural network stages for facial landmark detection.
The models were trained using samples from the publicly available PUT and MUCT
datasets while transfer learning was also employed to improve the performance
of the models on facial photographs of actual PAP mask users. The fully
automated system demonstrated an overall accuracy of 64.71% in correctly
selecting the appropriate mask size, and 86.1% accuracy when sizing to within
one mask size.
Comment: IEEE EMBS 201
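A minimal sketch of such a pipeline, assuming dlib's HOG face detector and a landmark model locate the nose and a pixel measurement is mapped to a mask size; the landmark model file, the millimetre scale, and the size cutoffs are illustrative assumptions, not the paper's trained models.

```python
# Sketch: HOG face detection -> facial landmarks -> nose width -> mask size.
import dlib

detector = dlib.get_frontal_face_detector()  # HOG-based face detector
predictor = dlib.shape_predictor("landmarks.dat")  # hypothetical model file

def mask_size(image, scale_mm_per_px):
    """image: (H, W, 3) uint8 photo; returns a size label or None."""
    faces = detector(image)
    if not faces:
        return None
    shape = predictor(image, faces[0])
    # 68-point convention: points 31 and 35 flank the nostrils.
    nose_width_px = abs(shape.part(35).x - shape.part(31).x)
    width_mm = nose_width_px * scale_mm_per_px
    for size, limit in [("small", 34.0), ("medium", 40.0)]:  # assumed cutoffs
        if width_mm <= limit:
            return size
    return "large"
```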
Large-Scale Object Discovery and Detector Adaptation from Unlabeled Video
We explore object discovery and detector adaptation based on unlabeled video
sequences captured from a mobile platform. We propose a fully automatic
approach for object mining from video which builds upon a generic object
tracking approach. By applying this method to three large video datasets from
autonomous driving and mobile robotics scenarios, we demonstrate its robustness
and generality. Based on the object mining results, we propose a novel approach
for unsupervised object discovery by appearance-based clustering. We show that
this approach successfully discovers interesting objects relevant to driving
scenarios. In addition, we perform self-supervised detector adaptation in order
to improve detection performance on the KITTI dataset for existing categories.
Our approach has direct relevance for enabling large-scale object learning for
autonomous driving.
Comment: CVPR'18 submission
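A minimal sketch of appearance-based clustering for discovery, assuming mined track crops are embedded with an off-the-shelf CNN and grouped with k-means; the backbone and cluster count are illustrative assumptions.

```python
# Sketch: embed mined object crops, then cluster by appearance; cluster
# membership serves as a pseudo-label for detector adaptation.
import numpy as np
import torch
from torchvision import models
from sklearn.cluster import KMeans

weights = models.ResNet18_Weights.DEFAULT
backbone = models.resnet18(weights=weights)
backbone.fc = torch.nn.Identity()  # use penultimate features as appearance
backbone.eval()

def discover(crops, num_clusters=20):
    """crops: (N, 3, 224, 224) tensor of object crops mined from tracks."""
    with torch.no_grad():
        feats = backbone(crops).numpy()
    feats /= np.linalg.norm(feats, axis=1, keepdims=True)
    return KMeans(n_clusters=num_clusters, n_init=10).fit_predict(feats)
```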
Robobarista: Object Part based Transfer of Manipulation Trajectories from Crowd-sourcing in 3D Pointclouds
There is a large variety of objects and appliances in human environments,
such as stoves, coffee dispensers, juice extractors, and so on. It is
challenging for a roboticist to program a robot for each of these object types
and for each of their instantiations. In this work, we present a novel approach
to manipulation planning based on the idea that many household objects share
similarly-operated object parts. We formulate the manipulation planning as a
structured prediction problem and design a deep learning model that handles
large noise in the manipulation demonstrations and learns features from three
different modalities: point clouds, language, and trajectory. In order to
collect a large number of manipulation demonstrations for different objects, we
developed a new crowd-sourcing platform called Robobarista. We test our model
on our dataset consisting of 116 objects with 249 parts along with 250 language
instructions, for which there are 1225 crowd-sourced manipulation
demonstrations. We further show that our robot can even manipulate objects it
has never seen before.
Comment: In International Symposium on Robotics Research (ISRR) 201
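A minimal sketch of the three-modality idea, assuming each modality is already encoded as a feature vector and a shared embedding scores how well a candidate trajectory fits the part geometry and the instruction; the encoder shapes and scoring rule are simplified assumptions, not the paper's architecture.

```python
# Sketch: embed point cloud, language, and candidate trajectory into a
# shared space and score their compatibility.
import torch
import torch.nn as nn

class PartManipulationScorer(nn.Module):
    def __init__(self, pc_dim=256, lang_dim=128, traj_dim=64, joint=128):
        super().__init__()
        self.pc_enc = nn.Sequential(nn.Linear(pc_dim, joint), nn.ReLU())
        self.lang_enc = nn.Sequential(nn.Linear(lang_dim, joint), nn.ReLU())
        self.traj_enc = nn.Sequential(nn.Linear(traj_dim, joint), nn.ReLU())

    def forward(self, pc_feat, lang_feat, traj_feat):
        # Context (part geometry + instruction) meets candidate trajectory.
        context = self.pc_enc(pc_feat) + self.lang_enc(lang_feat)
        return (context * self.traj_enc(traj_feat)).sum(dim=-1)  # score

# At planning time, the highest-scoring crowd-sourced trajectory would be
# transferred to the new object part.
```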
The ApolloScape Open Dataset for Autonomous Driving and its Application
Autonomous driving has attracted tremendous attention especially in the past
few years. The key techniques for a self-driving car include solving tasks like
3D map construction, self-localization, parsing the driving road and
understanding objects, which enable vehicles to reason and act. However,
large-scale datasets for training and system evaluation are still a bottleneck
for developing robust perception models. In this paper, we present the
ApolloScape
dataset [1] and its applications for autonomous driving. Compared with existing
public datasets from real scenes, e.g. KITTI [2] or Cityscapes [3], ApolloScape
contains much larger and richer labelling, including holistic semantic dense
point clouds for each site, stereo imagery, per-pixel semantic labelling,
lane-mark labelling, instance segmentation, 3D car instances, and highly
accurate pose for every frame in various driving videos from multiple sites,
cities, and times of day. For each task, it contains at least 15x more images
than SOTA datasets. To label such a complete dataset, we develop various tools
and algorithms specialized for each task to accelerate the labelling process,
such as 3D-2D segment labelling tools, active labelling in videos, etc.
Building on ApolloScape, we are able to develop algorithms that jointly
consider the learning and inference of multiple tasks. In this paper, we
provide a sensor fusion
scheme integrating camera videos, consumer-grade motion sensors (GPS/IMU), and
a 3D semantic map in order to achieve robust self-localization and semantic
segmentation for autonomous driving. We show that practically, sensor fusion
and joint learning of multiple tasks are beneficial to achieve a more robust
and accurate system. We expect that our dataset and the proposed algorithms
will support and motivate researchers in the further development of
multi-sensor fusion and multi-task learning in the field of computer vision.
Comment: Version 4: Accepted by TPAMI. Version 3: 17 pages, 10 tables, 11
figures, added the application (DeLS-3D) based on the ApolloScape Dataset.
Version 2: 7 pages, 6 figures, added comparison with the BDD100K dataset
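A toy illustration of the fusion idea, reduced to a variance-weighted (Kalman-style) correction of a noisy GPS/IMU pose by a camera-based localization estimate against the 3D semantic map; this is a conceptual stand-in for intuition, not the paper's actual fusion method.

```python
# Toy sketch: variance-weighted fusion of two pose estimates per dimension.
import numpy as np

def fuse_pose(gps_imu_pose, gps_imu_var, visual_pose, visual_var):
    """Fuse a GPS/IMU pose with a visual-localization pose; the lower-variance
    source dominates, as in a one-step Kalman update."""
    gps_imu_pose = np.asarray(gps_imu_pose, dtype=float)
    visual_pose = np.asarray(visual_pose, dtype=float)
    gain = gps_imu_var / (gps_imu_var + visual_var)
    fused = gps_imu_pose + gain * (visual_pose - gps_imu_pose)
    fused_var = (1.0 - gain) * gps_imu_var
    return fused, fused_var

# Example: noisy GPS/IMU (x, y, heading) corrected by a confident visual fix.
pose, var = fuse_pose([10.0, 4.0, 0.3], 2.0, [10.4, 3.8, 0.25], 0.5)
```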