Learning-based Feedback Controller for Deformable Object Manipulation
In this paper, we present a general learning-based framework to automatically
perform visual servo control of the position and shape of a deformable object
with unknown deformation parameters. The servo-control is accomplished by learning a
feedback controller that determines the robotic end-effector's movement
according to the deformable object's current status. This status encodes the
object's deformation behavior by using a set of observed visual features, which
are either manually designed or automatically extracted from the robot's sensor
stream. A feedback control policy is then optimized to push the object toward a
desired feature status efficiently. The feedback policy can be learned either
online or offline. Our online policy learning is based on the Gaussian Process
Regression (GPR), which can achieve fast and accurate manipulation and is
robust to small perturbations. An offline imitation learning framework is also
proposed to achieve a control policy that is robust to large perturbations in
the human-robot interaction. We validate the performance of our controller on a
set of deformable object manipulation tasks and demonstrate that our method can
achieve effective and accurate servo-control for general deformable objects
with a wide variety of goal settings.

Comment: arXiv admin note: text overlap with arXiv:1709.07218, arXiv:1710.06947, arXiv:1802.0966
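The online GPR policy described in the abstract can be sketched as a regression from the object's observed feature status to an end-effector motion. The mapping, dimensions, and data below are illustrative stand-ins, not the authors' setup:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

# Toy training data: feature-status vectors (object deformation state)
# paired with end-effector velocity commands observed during interaction.
rng = np.random.default_rng(0)
status = rng.normal(size=(50, 4))            # observed visual features
velocity = status @ rng.normal(size=(4, 3))  # unknown mapping to be learned

gpr = GaussianProcessRegressor(kernel=RBF(length_scale=1.0))
gpr.fit(status, velocity)

# Online control step: given the current feature status, predict the
# motion that pushes the object toward the desired status.
current = rng.normal(size=(1, 4))
action = gpr.predict(current)
print(action.shape)  # (1, 3)
```

In an online setting the `(status, velocity)` pairs would be accumulated during servoing and the GPR refit incrementally.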
Joint-ViVo: Selecting and Weighting Visual Words Jointly for Bag-of-Features based Tissue Classification in Medical Images
Automatically classifying the tissue types of a Region of Interest (ROI) in
medical imaging is an important application in Computer-Aided Diagnosis
(CAD), such as classifying breast parenchymal tissue in mammograms or lung
disease patterns in High-Resolution Computed Tomography (HRCT).
Recently, the bag-of-features method has shown its power in this field,
treating each ROI as a set of local features. In this paper, we investigate
using the bag-of-features strategy to classify the tissue types in medical
imaging applications. Two important issues are considered here: the visual
vocabulary learning and weighting. Although plenty of algorithms already deal
with them, they all treat the two steps independently: the vocabulary is
learned first, and the histogram is weighted afterwards. Inspired by
Auto-Context, which learns the features and the classifier jointly, we develop
a novel algorithm that learns the vocabulary and weights jointly. The new
algorithm, called Joint-ViVo, works in an iterative way. In each iteration, we
first learn the weights for each visual word by maximizing the margin of ROI
triplets, and then select the most discriminative visual words based on the
learned weights for the next iteration. We test our algorithm on three tissue
classification tasks: identifying brain tissue type in magnetic resonance
imaging (MRI), classifying lung tissue in HRCT images, and classifying breast
tissue density in mammograms. The results show that Joint-ViVo can perform
effectively for classifying tissues.

Comment: This paper has been withdrawn by the author due to the terrible writing
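The iterative scheme in the abstract (learn word weights on ROI triplets, then prune the vocabulary) can be sketched as follows. The data, margin, step size, and pruning schedule are illustrative assumptions, not the paper's actual algorithm:

```python
import numpy as np

rng = np.random.default_rng(1)
n_rois, n_words = 60, 20
H = rng.random((n_rois, n_words))        # bag-of-features histograms of ROIs
y = rng.integers(0, 2, size=n_rois)      # binary tissue labels

w = np.ones(n_words)                     # per-word weights
active = np.arange(n_words)              # currently selected vocabulary
for _ in range(3):                       # Joint-ViVo-style iterations
    # Step 1: learn weights on (anchor, positive, negative) ROI triplets:
    # an anchor should be closer to a same-class ROI than to a
    # different-class one under the weighted histogram distance.
    for _ in range(200):
        a = rng.integers(n_rois)
        pos = rng.choice(np.where(y == y[a])[0])
        neg = rng.choice(np.where(y != y[a])[0])
        dp = (H[a, active] - H[pos, active]) ** 2
        dn = (H[a, active] - H[neg, active]) ** 2
        if np.dot(w[active], dn - dp) < 1.0:   # margin violated
            w[active] = np.clip(w[active] + 0.01 * (dn - dp), 0, None)
    # Step 2: keep the most discriminative words for the next iteration.
    keep = np.argsort(w[active])[-max(2, len(active) // 2):]
    active = active[keep]

print(len(active))  # the vocabulary shrinks to the selected words
```

Each pass halves the active vocabulary, so words that never help separate the triplets are discarded early.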
Crowded Scene Analysis: A Survey
Automated scene analysis has been a topic of great interest in computer
vision and cognitive science. Recently, with the growth of crowd phenomena in
the real world, crowded scene analysis has attracted much attention. However,
the visual occlusions and ambiguities in crowded scenes, as well as the complex
behaviors and scene semantics, make the analysis a challenging task. In the
past few years, an increasing number of works on crowded scene analysis have
been reported, covering different aspects including crowd motion pattern
learning, crowd behavior and activity analysis, and anomaly detection in
crowds. This paper surveys the state-of-the-art techniques on this topic. We
first provide the background knowledge and the available features related to
crowded scenes. Then, existing models, popular algorithms, evaluation
protocols, and system performance are reviewed for each aspect of crowded
scene analysis. We also outline the available
datasets for performance evaluation. Finally, some research problems and
promising future directions are presented with discussions.

Comment: 20 pages, in IEEE Transactions on Circuits and Systems for Video Technology, 201
Single-Shot Clothing Category Recognition in Free-Configurations with Application to Autonomous Clothes Sorting
This paper proposes a single-shot approach for recognising clothing
categories from 2.5D features. We propose two visual features, BSP (B-Spline
Patch) and TSD (Topology Spatial Distances) for this task. The local BSP
features are encoded by LLC (Locality-constrained Linear Coding) and fused with
three different global features. Our visual feature is robust to deformable
shapes and our approach is able to recognise the category of unknown clothing
in unconstrained and random configurations. We integrated the category
recognition pipeline with a stereo vision system, clothing instance detection,
and dual-arm manipulators to achieve an autonomous sorting system. To verify
the performance of our proposed method, we build a high-resolution RGBD
clothing dataset of 50 clothing items of 5 categories sampled in random
configurations (a total of 2,100 clothing samples). Experimental results show
that our approach is able to reach 83.2% accuracy while classifying clothing
items which were previously unseen during training. This advances beyond the
previous state-of-the-art by 36.2%. Finally, we evaluate the proposed approach
in an autonomous robot sorting system, in which the robot recognises a clothing
item from an unconstrained pile, grasps it, and sorts it into a box according
to its category. Our proposed sorting system achieves reasonable sorting
success rates with single-shot perception.

Comment: 9 pages, accepted by IROS201
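The LLC step used to encode the local BSP features follows the standard approximated locality-constrained linear coding: reconstruct each descriptor from its k nearest codewords. The codebook, descriptor, and parameters below are illustrative, not the paper's:

```python
import numpy as np

def llc_encode(x, codebook, k=5, beta=1e-4):
    """Approximate Locality-constrained Linear Coding: reconstruct a local
    descriptor x from its k nearest codewords with codes summing to one.
    Illustrative sketch, not the paper's implementation."""
    d = np.linalg.norm(codebook - x, axis=1)
    idx = np.argsort(d)[:k]                # k nearest codebook entries
    B = codebook[idx] - x                  # shifted local basis
    G = B @ B.T + beta * np.eye(k)         # regularized Gram matrix
    c = np.linalg.solve(G, np.ones(k))
    c /= c.sum()                           # sum-to-one constraint
    code = np.zeros(len(codebook))
    code[idx] = c
    return code

rng = np.random.default_rng(5)
codebook = rng.normal(size=(64, 16))       # learned visual codebook
x = rng.normal(size=16)                    # one local 2.5D descriptor
code = llc_encode(x, codebook)
print(np.count_nonzero(code))              # at most k = 5 non-zeros
```

Pooling such sparse codes over an image produces the local feature vector that is then fused with the global features.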
Localizing Region-Based Active Contours
©2008 IEEE. DOI: 10.1109/TIP.2008.2004611

In this paper, we propose a natural framework that allows any region-based segmentation energy to be re-formulated in a local way. We consider local rather than global image statistics and evolve a contour based on local information. Localized contours are capable of segmenting objects with heterogeneous feature profiles that would be difficult to capture correctly using a standard global method. The presented technique is versatile enough to be used with any global region-based active contour energy and instill in it the benefits of localization. We describe this framework and demonstrate the localization of three well-known energies in order to illustrate how our framework can be applied to any energy. We then compare each localized energy to its global counterpart to show the improvements that can be achieved. Next, an in-depth study of the behaviors of these energies in response to the degree of localization is given. Finally, we show results on challenging images to illustrate the robust and accurate segmentations that are possible with this new class of active contour models.
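The core of the localization idea is to replace global region statistics with statistics computed in a ball around each contour point. A simplified sketch of the localized interior/exterior means (a Chan-Vese-style stand-in, not the authors' implementation; all names and the test image are illustrative):

```python
import numpy as np

def local_means(image, phi, x, y, radius):
    """Local interior/exterior means of `image` inside a ball of `radius`
    centred at contour point (x, y); `phi` is a signed distance function
    with phi < 0 inside the contour."""
    h, w = image.shape
    ys, xs = np.ogrid[:h, :w]
    ball = (xs - x) ** 2 + (ys - y) ** 2 <= radius ** 2
    inside = ball & (phi < 0)
    outside = ball & (phi >= 0)
    u = image[inside].mean() if inside.any() else 0.0
    v = image[outside].mean() if outside.any() else 0.0
    return u, v

img = np.zeros((32, 32)); img[:, 16:] = 1.0      # two-region test image
phi = np.tile(np.arange(32) - 16.0, (32, 1))     # zero level set at x = 16
u, v = local_means(img, phi, 16, 16, 5)
print(u, v)  # local means on either side of the contour
```

A localized energy evolves the contour at each point so pixels move to the side whose local mean they match better, which is what lets it handle heterogeneous objects.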
Adaptive Scene Category Discovery with Generative Learning and Compositional Sampling
This paper investigates a general framework to discover categories of
unlabeled scene images according to their appearances (i.e., textures and
structures). We jointly solve the two coupled tasks in an unsupervised manner:
(i) classifying images without pre-determining the number of categories, and
(ii) pursuing a generative model for each category. In our method, each image is
represented by two types of image descriptors that are effective to capture
image appearances from different aspects. By treating each image as a graph
vertex, we build up a graph and pose the image categorization as a graph
partition process. Specifically, a partitioned sub-graph can be regarded as a
category of scenes, and we define the probabilistic model of graph partition by
accumulating the generative models of all separated categories. For efficient
inference with the graph, we employ a stochastic cluster sampling algorithm,
which is designed based on the Metropolis-Hastings mechanism. During the
iterations of inference, the model of each category is analytically updated by
a generative learning algorithm. In the experiments, our approach is validated
on several challenging databases, and it outperforms other popular
state-of-the-art methods. The implementation details and empirical analysis are
presented as well.

Comment: 11 pages, 7 figures
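The Metropolis-Hastings step over graph partitions can be illustrated on a toy affinity graph. The `log_prob` score below is a hypothetical stand-in for the paper's accumulated generative models, and the single-vertex relabel proposal is a simplification of cluster sampling:

```python
import numpy as np

rng = np.random.default_rng(2)
# Toy affinity graph: two well-separated groups of five vertices.
n = 10
A = np.zeros((n, n))
A[:5, :5] = 0.9
A[5:, 5:] = 0.9
np.fill_diagonal(A, 0)

def log_prob(labels):
    # Stand-in for the partition probability: reward affinity kept inside
    # clusters and penalize affinity across cut edges.
    same = labels[:, None] == labels[None, :]
    return (A * same).sum() - (A * ~same).sum()

labels = rng.integers(0, 2, size=n)
for _ in range(2000):
    prop = labels.copy()
    prop[rng.integers(n)] = rng.integers(2)        # propose a relabel
    # Metropolis-Hastings acceptance on the probability ratio
    if np.log(rng.random()) < log_prob(prop) - log_prob(labels):
        labels = prop

print(labels)  # vertices of the same group end up sharing a label
```

The actual method proposes whole connected clusters per move (Swendsen-Wang style), which mixes far faster than single-vertex flips on large graphs.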
Efficient Image-Space Extraction and Representation of 3D Surface Topography
Surface topography refers to the geometric micro-structure of a surface and
defines its tactile characteristics (typically in the sub-millimeter range).
High-resolution 3D scanning techniques developed recently enable the 3D
reconstruction of surfaces including their surface topography. In this paper, we
present an efficient image-space technique for the extraction of surface
topography from high-resolution 3D reconstructions. Additionally, we filter
noise and enhance topographic attributes to obtain an improved representation
for subsequent topography classification. Comprehensive experiments show that
our representation captures topographic attributes well and significantly
improves classification performance compared to alternative 2D and 3D
representations.

Comment: Initial version of the paper accepted at the IEEE ICIP Conference 201
Learn to Evaluate Image Perceptual Quality Blindly from Statistics of Self-similarity
Among the various image quality assessment (IQA) tasks, blind IQA (BIQA) is
particularly challenging due to the absence of knowledge about the reference
image and distortion type. Features based on natural scene statistics (NSS)
have been successfully used in BIQA, and the quality relevance of the features
plays an essential role in the quality prediction performance. Motivated by the
fact that the early processing stage in human visual system aims to remove the
signal redundancies for efficient visual coding, we propose a simple but very
effective BIQA method by computing the statistics of self-similarity (SOS) in
an image. Specifically, we calculate the inter-scale similarity and intra-scale
similarity of the distorted image, extract the SOS features from these
similarities, and learn a regression model to map the SOS features to the
subjective quality score. Extensive experiments demonstrate very competitive
quality prediction performance and generalization ability of the proposed SOS
based BIQA method.
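The SOS features (inter-scale and intra-scale similarity) can be sketched with a crude pyramid; the downsampling scheme, similarity measure, and feature layout here are illustrative assumptions, not the authors' definitions:

```python
import numpy as np

def sos_features(img, n_scales=3):
    """Statistics-of-self-similarity sketch: at each scale, correlate the
    image with its upsampled coarser version (inter-scale) and with a
    shifted copy of itself (intra-scale)."""
    feats = []
    prev = img
    for _ in range(n_scales):
        cur = prev[::2, ::2]                 # naive downsampling
        up = np.kron(cur, np.ones((2, 2)))[:prev.shape[0], :prev.shape[1]]
        inter = np.corrcoef(prev.ravel(), up.ravel())[0, 1]
        intra = np.corrcoef(prev[:, :-1].ravel(), prev[:, 1:].ravel())[0, 1]
        feats.extend([inter, intra])
        prev = cur
    return np.array(feats)

rng = np.random.default_rng(3)
img = rng.random((64, 64))
f = sos_features(img)
print(f.shape)  # two SOS statistics per scale
```

A regression model (e.g., support vector regression) would then be trained to map such feature vectors to subjective quality scores.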
Pedestrian Detection with Spatially Pooled Features and Structured Ensemble Learning
Many typical applications of object detection operate within a prescribed
false-positive range. In this situation the performance of a detector should be
assessed on the basis of the area under the ROC curve over that range, rather
than over the full curve, as the performance outside the range is irrelevant.
This measure is labelled as the partial area under the ROC curve (pAUC). We
propose a novel ensemble learning method which achieves a maximal detection
rate at a user-defined range of false positive rates by directly optimizing the
partial AUC using structured learning.
In order to achieve a high object detection performance, we propose a new
approach to extract low-level visual features based on spatial pooling.
Incorporating spatial pooling improves the translational invariance and thus
the robustness of the detection process. Experimental results on both synthetic
and real-world data sets demonstrate the effectiveness of our approach, and we
show that it is possible to train state-of-the-art pedestrian detectors using
the proposed structured ensemble learning method with spatially pooled
features. The result is the current best reported performance on the
Caltech-USA pedestrian detection dataset.

Comment: 19 pages
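The pAUC measure this abstract optimizes can be evaluated directly with scikit-learn's `max_fpr` option, which restricts the ROC area to a prescribed false-positive range (the synthetic scores and the 0.1 cap below are illustrative):

```python
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(4)
y = rng.integers(0, 2, size=200)           # ground-truth labels
scores = y + 0.8 * rng.normal(size=200)    # noisy detector scores

# Standardized partial AUC over FPR in [0, 0.1] vs the full-curve AUC.
pauc = roc_auc_score(y, scores, max_fpr=0.1)
full = roc_auc_score(y, scores)
print(pauc, full)
```

A detector can rank higher on full AUC yet lower on pAUC, which is why evaluating (and training) within the operating range matters.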
From BoW to CNN: Two Decades of Texture Representation for Texture Classification
Texture is a fundamental characteristic of many types of images, and texture
representation is one of the essential and challenging problems in computer
vision and pattern recognition which has attracted extensive research
attention. Since 2000, texture representations based on Bag of Words (BoW) and
on Convolutional Neural Networks (CNNs) have been extensively studied with
impressive performance. Given this period of remarkable evolution, this paper
aims to present a comprehensive survey of advances in texture representation
over the last two decades. More than 200 major publications are cited in this
survey covering different aspects of the research, which includes (i) problem
description; (ii) recent advances in the broad categories of BoW-based,
CNN-based and attribute-based methods; and (iii) evaluation issues,
specifically benchmark datasets and state of the art results. In retrospect of
what has been achieved so far, the survey discusses open challenges and
directions for future research.

Comment: Accepted by IJC