10,453 research outputs found

    Learning-based Feedback Controller for Deformable Object Manipulation

    Full text link
    In this paper, we present a general learning-based framework to automatically visual-servo control the position and shape of a deformable object with unknown deformation parameters. The servo-control is accomplished by learning a feedback controller that determines the robotic end-effector's movement according to the deformable object's current status. This status encodes the object's deformation behavior by using a set of observed visual features, which are either manually designed or automatically extracted from the robot's sensor stream. A feedback control policy is then optimized to push the object toward a desired featured status efficiently. The feedback policy can be learned either online or offline. Our online policy learning is based on the Gaussian Process Regression (GPR), which can achieve fast and accurate manipulation and is robust to small perturbations. An offline imitation learning framework is also proposed to achieve a control policy that is robust to large perturbations in the human-robot interaction. We validate the performance of our controller on a set of deformable object manipulation tasks and demonstrate that our method can achieve effective and accurate servo-control for general deformable objects with a wide variety of goal settings.Comment: arXiv admin note: text overlap with arXiv:1709.07218, arXiv:1710.06947, arXiv:1802.0966

    Joint-ViVo: Selecting and Weighting Visual Words Jointly for Bag-of-Features based Tissue Classification in Medical Images

    Full text link
    Automatically classifying the tissues types of Region of Interest (ROI) in medical imaging has been an important application in Computer-Aided Diagnosis (CAD), such as classification of breast parenchymal tissue in the mammogram, classify lung disease patterns in High-Resolution Computed Tomography (HRCT) etc. Recently, bag-of-features method has shown its power in this field, treating each ROI as a set of local features. In this paper, we investigate using the bag-of-features strategy to classify the tissue types in medical imaging applications. Two important issues are considered here: the visual vocabulary learning and weighting. Although there are already plenty of algorithms to deal with them, all of them treat them independently, namely, the vocabulary learned first and then the histogram weighted. Inspired by Auto-Context who learns the features and classifier jointly, we try to develop a novel algorithm that learns the vocabulary and weights jointly. The new algorithm, called Joint-ViVo, works in an iterative way. In each iteration, we first learn the weights for each visual word by maximizing the margin of ROI triplets, and then select the most discriminate visual words based on the learned weights for the next iteration. We test our algorithm on three tissue classification tasks: identifying brain tissue type in magnetic resonance imaging (MRI), classifying lung tissue in HRCT images, and classifying breast tissue density in mammograms. The results show that Joint-ViVo can perform effectively for classifying tissues.Comment: This paper has been withdrawn by the author due to the terrible writin

    Crowded Scene Analysis: A Survey

    Full text link
    Automated scene analysis has been a topic of great interest in computer vision and cognitive science. Recently, with the growth of crowd phenomena in the real world, crowded scene analysis has attracted much attention. However, the visual occlusions and ambiguities in crowded scenes, as well as the complex behaviors and scene semantics, make the analysis a challenging task. In the past few years, an increasing number of works on crowded scene analysis have been reported, covering different aspects including crowd motion pattern learning, crowd behavior and activity analysis, and anomaly detection in crowds. This paper surveys the state-of-the-art techniques on this topic. We first provide the background knowledge and the available features related to crowded scenes. Then, existing models, popular algorithms, evaluation protocols, as well as system performance are provided corresponding to different aspects of crowded scene analysis. We also outline the available datasets for performance evaluation. Finally, some research problems and promising future directions are presented with discussions.Comment: 20 pages in IEEE Transactions on Circuits and Systems for Video Technology, 201

    Single-Shot Clothing Category Recognition in Free-Configurations with Application to Autonomous Clothes Sorting

    Get PDF
    This paper proposes a single-shot approach for recognising clothing categories from 2.5D features. We propose two visual features, BSP (B-Spline Patch) and TSD (Topology Spatial Distances) for this task. The local BSP features are encoded by LLC (Locality-constrained Linear Coding) and fused with three different global features. Our visual feature is robust to deformable shapes and our approach is able to recognise the category of unknown clothing in unconstrained and random configurations. We integrated the category recognition pipeline with a stereo vision system, clothing instance detection, and dual-arm manipulators to achieve an autonomous sorting system. To verify the performance of our proposed method, we build a high-resolution RGBD clothing dataset of 50 clothing items of 5 categories sampled in random configurations (a total of 2,100 clothing samples). Experimental results show that our approach is able to reach 83.2\% accuracy while classifying clothing items which were previously unseen during training. This advances beyond the previous state-of-the-art by 36.2\%. Finally, we evaluate the proposed approach in an autonomous robot sorting system, in which the robot recognises a clothing item from an unconstrained pile, grasps it, and sorts it into a box according to its category. Our proposed sorting system achieves reasonable sorting success rates with single-shot perception.Comment: 9 pages, accepted by IROS201

    Localizing Region-Based Active Contours

    Get PDF
    ©2008 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or distribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE. This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright. In most cases, these works may not be reposted without the explicit permission of the copyright holder.DOI: 10.1109/TIP.2008.2004611In this paper, we propose a natural framework that allows any region-based segmentation energy to be re-formulated in a local way. We consider local rather than global image statistics and evolve a contour based on local information. Localized contours are capable of segmenting objects with heterogeneous feature profiles that would be difficult to capture correctly using a standard global method. The presented technique is versatile enough to be used with any global region-based active contour energy and instill in it the benefits of localization. We describe this framework and demonstrate the localization of three well-known energies in order to illustrate how our framework can be applied to any energy. We then compare each localized energy to its global counterpart to show the improvements that can be achieved. Next, an in-depth study of the behaviors of these energies in response to the degree of localization is given. Finally, we show results on challenging images to illustrate the robust and accurate segmentations that are possible with this new class of active contour models

    Adaptive Scene Category Discovery with Generative Learning and Compositional Sampling

    Full text link
    This paper investigates a general framework to discover categories of unlabeled scene images according to their appearances (i.e., textures and structures). We jointly solve the two coupled tasks in an unsupervised manner: (i) classifying images without pre-determining the number of categories, and (ii) pursuing generative model for each category. In our method, each image is represented by two types of image descriptors that are effective to capture image appearances from different aspects. By treating each image as a graph vertex, we build up an graph, and pose the image categorization as a graph partition process. Specifically, a partitioned sub-graph can be regarded as a category of scenes, and we define the probabilistic model of graph partition by accumulating the generative models of all separated categories. For efficient inference with the graph, we employ a stochastic cluster sampling algorithm, which is designed based on the Metropolis-Hasting mechanism. During the iterations of inference, the model of each category is analytically updated by a generative learning algorithm. In the experiments, our approach is validated on several challenging databases, and it outperforms other popular state-of-the-art methods. The implementation details and empirical analysis are presented as well.Comment: 11 pages, 7 figure

    Efficient Image-Space Extraction and Representation of 3D Surface Topography

    Full text link
    Surface topography refers to the geometric micro-structure of a surface and defines its tactile characteristics (typically in the sub-millimeter range). High-resolution 3D scanning techniques developed recently enable the 3D reconstruction of surfaces including their surface topography. In his paper, we present an efficient image-space technique for the extraction of surface topography from high-resolution 3D reconstructions. Additionally, we filter noise and enhance topographic attributes to obtain an improved representation for subsequent topography classification. Comprehensive experiments show that the our representation captures well topographic attributes and significantly improves classification performance compared to alternative 2D and 3D representations.Comment: Initial version of the paper accepted at the IEEE ICIP Conference 201

    Learn to Evaluate Image Perceptual Quality Blindly from Statistics of Self-similarity

    Full text link
    Among the various image quality assessment (IQA) tasks, blind IQA (BIQA) is particularly challenging due to the absence of knowledge about the reference image and distortion type. Features based on natural scene statistics (NSS) have been successfully used in BIQA, while the quality relevance of the feature plays an essential role to the quality prediction performance. Motivated by the fact that the early processing stage in human visual system aims to remove the signal redundancies for efficient visual coding, we propose a simple but very effective BIQA method by computing the statistics of self-similarity (SOS) in an image. Specifically, we calculate the inter-scale similarity and intra-scale similarity of the distorted image, extract the SOS features from these similarities, and learn a regression model to map the SOS features to the subjective quality score. Extensive experiments demonstrate very competitive quality prediction performance and generalization ability of the proposed SOS based BIQA method

    Pedestrian Detection with Spatially Pooled Features and Structured Ensemble Learning

    Full text link
    Many typical applications of object detection operate within a prescribed false-positive range. In this situation the performance of a detector should be assessed on the basis of the area under the ROC curve over that range, rather than over the full curve, as the performance outside the range is irrelevant. This measure is labelled as the partial area under the ROC curve (pAUC). We propose a novel ensemble learning method which achieves a maximal detection rate at a user-defined range of false positive rates by directly optimizing the partial AUC using structured learning. In order to achieve a high object detection performance, we propose a new approach to extract low-level visual features based on spatial pooling. Incorporating spatial pooling improves the translational invariance and thus the robustness of the detection process. Experimental results on both synthetic and real-world data sets demonstrate the effectiveness of our approach, and we show that it is possible to train state-of-the-art pedestrian detectors using the proposed structured ensemble learning method with spatially pooled features. The result is the current best reported performance on the Caltech-USA pedestrian detection dataset.Comment: 19 page

    From BoW to CNN: Two Decades of Texture Representation for Texture Classification

    Full text link
    Texture is a fundamental characteristic of many types of images, and texture representation is one of the essential and challenging problems in computer vision and pattern recognition which has attracted extensive research attention. Since 2000, texture representations based on Bag of Words (BoW) and on Convolutional Neural Networks (CNNs) have been extensively studied with impressive performance. Given this period of remarkable evolution, this paper aims to present a comprehensive survey of advances in texture representation over the last two decades. More than 200 major publications are cited in this survey covering different aspects of the research, which includes (i) problem description; (ii) recent advances in the broad categories of BoW-based, CNN-based and attribute-based methods; and (iii) evaluation issues, specifically benchmark datasets and state of the art results. In retrospect of what has been achieved so far, the survey discusses open challenges and directions for future research.Comment: Accepted by IJC
    • …
    corecore