42 research outputs found

    Active recognition through next view planning: a survey

    Active recognition and pose estimation of rigid and deformable objects in 3D space

    Object recognition and pose estimation is a fundamental problem in computer vision and of utmost importance in robotic applications. Object recognition refers to the problem of recognizing certain object instances, or categorizing objects into specific classes. Pose estimation deals with estimating the exact position of the object in 3D space, usually expressed in Euler angles. There are generally two types of objects that require special care when designing solutions to the aforementioned problems: rigid and deformable. Dealing with deformable objects has been a much harder problem, and usually solutions that apply to rigid objects, fail when used for deformable objects due to the inherent assumptions made during the design. In this thesis we deal with object categorization, instance recognition and pose estimation of both rigid and deformable objects. In particular, we are interested in a special type of deformable objects, clothes. We tackle the problem of autonomously recognizing and unfolding articles of clothing using a dual manipulator. This problem consists of grasping an article from a random point, recognizing it and then bringing it into an unfolded state by a dual arm robot. We propose a data-driven method for clothes recognition from depth images using Random Decision Forests. We also propose a method for unfolding an article of clothing after estimating and grasping two key-points, using Hough Forests. Both methods are implemented into a POMDP framework allowing the robot to interact optimally with the garments, taking into account uncertainty in the recognition and point estimation process. This active recognition and unfolding makes our system very robust to noisy observations. Our methods were tested on regular-sized clothes using a dual-arm manipulator. Our systems perform better in both accuracy and speed compared to state-of-the-art approaches. 
In order to take advantage of the robotic manipulator and increase the accuracy of our system, we developed a novel approach to address generic active vision problems, called Active Random Forests. While state of the art focuses on best viewing parameters selection based on single view classifiers, we propose a multi-view classifier where the decision mechanism of optimally changing viewing parameters is inherent to the classification process. This has many advantages: a) the classifier exploits the entire set of captured images and does not simply aggregate probabilistically per view hypotheses; b) actions are based on learnt disambiguating features from all views and are optimally selected using the powerful voting scheme of Random Forests and c) the classifier can take into account the costs of actions. The proposed framework was applied to the same task of autonomously unfolding clothes by a robot, addressing the problem of best viewpoint selection in classification, grasp point and pose estimation of garments. We show great performance improvement compared to state of the art methods and our previous POMDP formulation. Moving from deformable to rigid objects while keeping our interest to domestic robotic applications, we focus on object instance recognition and 3D pose estimation of household objects. We are particularly interested in realistic scenes that are very crowded and objects can be perceived under severe occlusions. Single shot-based 6D pose estimators with manually designed features are still unable to tackle such difficult scenarios for a variety of objects, motivating the research towards unsupervised feature learning and next-best-view estimation. We present a complete framework for both single shot-based 6D object pose estimation and next-best-view prediction based on Hough Forests, the state of the art object pose estimator that performs classification and regression jointly. 
Rather than using manually designed features, we propose an unsupervised feature learnt from depth-invariant patches using a Sparse Autoencoder. Furthermore, taking advantage of the clustering performed in the leaf nodes of Hough Forests, we learn to estimate the reduction of uncertainty in other views, formulating the problem of selecting the next-best-view. To further improve 6D object pose estimation, we propose an improved joint registration and hypotheses verification module as a final refinement step to reject false detections. We provide two additional challenging datasets inspired by realistic scenarios to extensively evaluate the state of the art and our framework. One is related to domestic environments and the other depicts a bin-picking scenario mostly found in industrial settings. We show that our framework significantly outperforms the state of the art both on public datasets and on our own. Unsupervised feature learning, although efficient, might produce sub-optimal features for our particular task. Therefore, in our last work, we leverage the power of Convolutional Neural Networks to tackle the problem of estimating the pose of rigid objects with an end-to-end deep regression network. To improve the moderate performance of the standard regression objective function, we introduce the Siamese Regression Network. For a given image pair, we enforce a similarity measure between the representations of the sample images in the feature space and in the pose space, which is shown to boost regression performance. Furthermore, we argue that our pose-guided feature learning using our Siamese Regression Network generates more discriminative features that outperform the state of the art. Last, our feature learning formulation makes it possible to learn features that perform well under severe occlusions, demonstrating high performance on our novel hand-object dataset. 
In conclusion, this work investigates object detection and pose estimation in 3D space across a variety of object types. Furthermore, we investigate how accuracy can be further improved by applying active vision techniques that optimally move the camera view to minimize the detection error. Open Access
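
The abstract above describes a Siamese Regression Network that ties distances in feature space to distances in pose space for an image pair. The following is a minimal sketch of one way such a pairwise term could be written; the exact loss used in the thesis is not given in the abstract, so the squared-difference form, the function names and the `weight` parameter are all assumptions made for illustration.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def siamese_pair_loss(feat_a, feat_b, pose_a, pose_b):
    """Assumed pairwise term: penalise disagreement between the
    feature-space distance and the pose-space distance of an image pair."""
    gap = squared_distance(feat_a, feat_b) - squared_distance(pose_a, pose_b)
    return gap ** 2


def total_loss(pred_a, pred_b, feat_a, feat_b, pose_a, pose_b, weight=0.5):
    """Standard regression error on both images plus the weighted pair term.
    `weight` is a hypothetical trade-off parameter, not taken from the source."""
    regression = squared_distance(pred_a, pose_a) + squared_distance(pred_b, pose_b)
    return regression + weight * siamese_pair_loss(feat_a, feat_b, pose_a, pose_b)
```

In this form the pair term vanishes only when two images are as far apart in feature space as their poses are in pose space, which is one plausible reading of "pose-guided" feature learning.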

    Information-theoretic environment modeling for mobile robot localization

    To enhance robotic computational efficiency without degenerating accuracy, it is imperative to fit the right and exact amount of information in its simplest form to the investigated task. This thesis conforms to this reasoning in environment model building and robot localization. It puts forth an approach towards building maps and localizing a mobile robot efficiently with respect to unknown, unstructured and moderately dynamic environments. For this, the environment is modeled on an information-theoretic basis, more specifically in terms of its transmission property. Subsequently, the presented environment model, which does not specifically adhere to classical geometric modeling, succeeds in solving the environment disambiguation effectively. The proposed solution lays out a two-level hierarchical structure for localization. The structure makes use of extracted features, which are stored in two different resolutions in a single hybrid feature-map. This enables dual coarse-topological and fine-geometric localization modalities. The first level in the hierarchy describes the environment topologically, where a defined set of places is described by a probabilistic feature representation. A conditional entropy-based criterion is proposed to quantify the transinformation between the feature and the place domains. This criterion provides a double benefit of pruning the large dimensional feature space, and at the same time selecting the best discriminative features that overcome environment aliasing problems. Features with the highest transinformation are filtered and compressed to form a coarse resolution feature-map (codebook). Localization at this level is conducted through place matching. In the second level of the hierarchy, the map is viewed in high-resolution, as consisting of non-compressed entropy-processed features. These features are additionally tagged with their position information. 
Given the identified topological place provided by the first level, fine localization corresponding to the second level is executed using feature triangulation. To enhance the triangulation accuracy, redundant features are used and two metric evaluation criteria are employed: one for detecting dynamic features and mismatches, and another for feature selection. The proposed approach and methods have been tested in realistic indoor environments using a vision sensor and the Scale Invariant Feature Transform for local feature extraction. Through experiments, it is demonstrated that an information-theoretic modeling approach is highly efficient in attaining both accuracy and computational efficiency for localization. It has also been shown that the approach is capable of modeling environments with a high degree of unstructuredness, perceptual aliasing, and dynamic variations (illumination conditions; scene dynamics). The merit of employing this modeling type is that environment features are evaluated quantitatively, while at the same time qualitative conclusions are generated about feature selection and performance in a robot localization task. In this way, the accuracy of localization can be adapted in accordance with the available resources. The experimental results also show that the hybrid topological-metric map provides sufficient information to localize a mobile robot on two scales, independent of the robot motion model. The codebook exhibits fast and accurate topological localization at significant compression ratios. The hierarchical localization framework demonstrates robustness and optimized space and time complexities. This, in turn, provides scalability to large environments and suitability for real-time use.
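
The conditional entropy-based criterion described in this abstract can be illustrated with the standard identity I(F;P) = H(P) - H(P|F), the transinformation between a feature F and the place P. The sketch below is not the thesis's implementation: it assumes discrete (quantised) feature values, and the function names and the toy place labels are hypothetical.

```python
import math
from collections import Counter


def entropy(probs):
    """Shannon entropy in bits of a discrete distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def transinformation(observations):
    """I(F;P) = H(P) - H(P|F) for a list of (feature_value, place) pairs."""
    n = len(observations)
    place_counts = Counter(place for _, place in observations)
    feature_counts = Counter(value for value, _ in observations)
    joint_counts = Counter(observations)
    h_place = entropy(c / n for c in place_counts.values())
    h_place_given_feature = 0.0
    for value, count in feature_counts.items():
        conditional = [joint_counts[(value, place)] / count for place in place_counts]
        h_place_given_feature += (count / n) * entropy(conditional)
    return h_place - h_place_given_feature


def select_features(feature_observations, places, k):
    """Keep the k features that carry the most information about the place."""
    scores = {
        name: transinformation(list(zip(values, places)))
        for name, values in feature_observations.items()
    }
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

A feature that takes a different value in every place scores H(P) bits, while a feature that looks the same everywhere scores zero and is pruned, which is how such a criterion counters perceptual aliasing.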

    Active object recognition for 2D and 3D applications

    Includes bibliographical references. Active object recognition provides a mechanism for selecting informative viewpoints to complete recognition tasks as quickly and accurately as possible. One can manipulate the position of the camera or the object of interest to obtain more useful information. This approach can improve the computational efficiency of the recognition task by only processing viewpoints selected based on the amount of relevant information they contain. Active object recognition methods are built around two questions: how to select the next best viewpoint, and how to integrate the extracted information. Most active recognition methods do not use local interest points, which have been shown to work well in other recognition tasks, and are tested on images containing a single object with no occlusions or clutter. In this thesis we investigate using local interest points (SIFT) in probabilistic and non-probabilistic settings for active single and multiple object and viewpoint/pose recognition. The test images used contain objects that are occluded and occur in significant clutter. Visually similar objects are also included in our dataset. Initially we introduce a non-probabilistic 3D active object recognition system which consists of a mechanism for selecting the next best viewpoint and an integration strategy to provide feedback to the system. A novel approach to weighting the uniqueness of extracted features is presented, using a vocabulary tree data structure. This process is then used to determine the next best viewpoint by selecting the one with the highest number of unique features. A Bayesian framework uses the modified statistics from the vocabulary structure to update the system's confidence in the identity of the object. New test images are only captured when the belief hypothesis is below a predefined threshold. 
This vocabulary tree method is tested against randomly selecting the next viewpoint and against a state-of-the-art active object recognition method by Kootstra et al. Our approach outperforms both methods by correctly recognizing more objects with less computational expense. This vocabulary tree method is extended for use in a probabilistic setting to improve the object recognition accuracy. We introduce Bayesian approaches for object recognition and for joint object and pose recognition. Three likelihood models are introduced which incorporate various parameters and levels of complexity. The occlusion model, which includes geometric information and variables that cater for the background distribution and occlusion, correctly recognizes all objects in our challenging database. This probabilistic approach is further extended to recognize multiple objects and poses in a test image. We show through experiments that this model can recognize multiple objects which occur in close proximity to distractor objects. Our viewpoint selection strategy is also extended to the multiple object application and performs well when compared to randomly selecting the next viewpoint, the activation model and mutual information. We also study the impact of using active vision for shape recognition. Fourier descriptors are used as input to our shape recognition system, with mutual information as the active vision component. We build multinomial and Gaussian distributions using this information, which correctly recognize a sequence of objects. We demonstrate the effectiveness of active vision in object recognition systems. We show that even in different recognition applications using different low-level inputs, incorporating active vision improves the overall accuracy and decreases the computational expense of object recognition systems.
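
The belief update with a capture threshold described in this abstract can be sketched as a plain Bayesian loop: a new view is processed only while the strongest hypothesis is still below the threshold. This is an illustrative sketch, not the thesis's system; the likelihood values would come from the vocabulary tree statistics in the original work, whereas here they are supplied directly, and the function names and the default threshold of 0.9 are assumptions.

```python
def bayes_update(prior, likelihood):
    """Posterior over object identities after one view (Bayes' rule)."""
    unnormalised = {obj: prior[obj] * likelihood[obj] for obj in prior}
    total = sum(unnormalised.values())
    return {obj: value / total for obj, value in unnormalised.items()}


def recognise(prior, view_likelihoods, threshold=0.9):
    """Consume views one at a time until the best hypothesis reaches the threshold.

    Returns the winning object, the final belief and the number of views used.
    """
    belief = dict(prior)
    views_used = 0
    for likelihood in view_likelihoods:
        if max(belief.values()) >= threshold:
            break  # confident enough: no further image is captured
        belief = bayes_update(belief, likelihood)
        views_used += 1
    best = max(belief, key=belief.get)
    return best, belief, views_used
```

Stopping as soon as the belief crosses the threshold is what saves computation: views that would not change the decision are never captured or processed.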

    3D Object Recognition Based On Constrained 2D Views

    The aim of the present work was to build a novel 3D object recognition system capable of classifying man-made and natural objects based on single 2D views. The approach to this problem has been motivated by recent theories on biological vision and multiresolution analysis. The project's objectives were the implementation of a system that is able to deal with simple 3D scenes and constitutes an engineering solution to the problem of 3D object recognition, allowing the proposed recognition system to operate in a practically acceptable time frame. The developed system takes further the work on automatic classification of marine phytoplankton carried out at the Centre for Intelligent Systems, University of Plymouth. The thesis discusses the main theoretical issues that prompted the fundamental system design options. The principles and the implementation of the coarse data channels used in the system are described. A new multiresolution representation of 2D views is presented, which provides the classifier module of the system with coarse-coded descriptions of the scale-space distribution of potentially interesting features. A multiresolution analysis-based mechanism is proposed, which directs the system's attention towards potentially salient features. Unsupervised similarity-based feature grouping is introduced, which is used in coarse data channels to yield feature signatures that are not spatially coherent and provide the classifier module with salient descriptions of object views. A simple texture descriptor is described, which is based on properties of a special wavelet transform. The system has been tested on computer-generated and natural image data sets, in conditions where the inter-object similarity was monitored and quantitatively assessed by human subjects, or the analysed objects were very similar and their discrimination constituted a difficult task even for human experts. The validity of the above described approaches has been proven. 
The studies conducted with various statistical and artificial neural network-based classifiers have shown that the system is able to perform well in all of the above mentioned situations. These investigations also made possible to take further and generalise a number of important conclusions drawn during previous work carried out in the field of 2D shape (plankton) recognition, regarding the behaviour of multiple coarse data channels-based pattern recognition systems and various classifier architectures. The system possesses the ability of dealing with difficult field-collected images of objects and the techniques employed by its component modules make possible its extension to the domain of complex multiple-object 3D scene recognition. The system is expected to find immediate applicability in the field of marine biota classification

    Task-oriented viewpoint planning for free-form objects

    A thesis submitted to the Universitat Politècnica de Catalunya to obtain the degree of Doctor of Philosophy. Doctoral programme: Automatic Control, Robotics and Computer Vision. This thesis was completed at: Institut de Robòtica i Informàtica Industrial, CSIC-UPC. [EN]: This thesis deals with active sensing and its use in real exploration tasks under both scene ambiguities and measurement uncertainties. While object modeling is the implicit objective of most active sensing algorithms, in this work we have explored new strategies to deal with more generic and more complex tasks. Active sensing requires the ability to move the perceptual system to gather new information. Our approach uses a robot manipulator with a 3D Time-of-Flight (ToF) camera attached to the end-effector. As a complex task, we have focused our attention on plant phenotyping. Plants are complex objects, with leaves that change their position and size over time. Valid viewpoints for a certain plant are hardly valid for a different one, even one belonging to the same species. Some instruments, such as chlorophyll meters or disk sampling tools, need to be precisely positioned over a particular location of the leaf. Therefore, their use requires the modeling of specific regions of interest of the plant, including the free space needed for avoiding obstacles and approaching the leaf with the tool. It is easy to observe that predefined camera trajectories are not valid here, and that it is usually very difficult to acquire all the required information from a single view. The overall objective of this thesis is to solve complex active sensing tasks by embedding their exploratory goal into a pre-estimated geometrical model, using information gain as the fundamental guideline for the reward function. 
The main contributions can be divided into two groups: first, the evaluation of ToF cameras and their calibration to assess the uncertainty of the measurements (presented in Part I); and second, the proposal of a framework capable of embedding the task, modeled as free and occupied space, and that takes into account the modeled sensor's uncertainty to improve the action selection algorithm (presented in Part II). This thesis has given rise to 14 publications, including 5 in indexed journals, and its results have been used in the GARNICS European project. The complete framework is based on the Next-Best-View methodology and can be summarized in the following main steps. First, an initial view of the object (e.g., a plant) is acquired. From this initial view and given a set of candidate viewpoints, the expected gain obtained by moving the robot and acquiring the next image is computed. This computation takes into account the uncertainty from all the different pixels of the sensor, the expected information based on a predefined task model, and the possible occlusions. Once the most promising view is selected, the robot moves, takes a new image, integrates this information into the model, and evaluates again the set of remaining views. Finally, the task terminates when enough information is gathered. In our examples, this process enables the robot to perform a measurement on top of a leaf. The key ingredient is to model the complexity of the task in a layered representation of free-occupied occupancy grid maps. This makes it possible to naturally encode the requirements of the task, to maintain and update the belief state with the measurements performed, to simulate and compute the expected gains of all potential viewpoints, and to encode the termination condition. During this work the technology of ToF cameras has evolved remarkably. Nowadays it is very popular and ToF cameras are already embedded in some consumer devices. 
Although the quality of the measurements has been considerably improved, it is still not uniform across the sensor. We believe, as has been demonstrated in various experiments in this work, that a careful modeling of the sensor's uncertainty is highly beneficial and helps to design better decision systems. In our case, it enables a more realistic computation of the information gain measure, and consequently, a better selection criterion. [CA]: This thesis addresses active perception and its use in exploration tasks in real environments, taking into account both ambiguity in the scene and the uncertainty of the perception system. Unlike most active perception algorithms, where object modeling is usually the implicit objective, in this thesis we have explored new strategies for handling generic tasks of greater complexity. Every active perception system requires a sensing apparatus able to vary its parameters in a controlled manner, so that it can gather new information to solve a given task. In exploration tasks, the position and orientation of the sensor are key parameters for solving the task. In our study we used a robot manipulator as the positioning system and a Time-of-Flight (ToF) depth camera, attached to its end-effector, as the perception system. As the final task, we concentrated on acquiring measurements on leaves in the context of plant phenotyping. Plants are very complex objects, with leaves that change texture, position and size over time. This entails several difficulties. On the one hand, before a measurement can be carried out on a leaf, the environment must be explored to find a region that allows it. Moreover, viewpoints that were suitable for one plant will hardly be suitable for another, even when both belong to the same species. 
On the other hand, at measurement time, certain instruments, such as chlorophyll meters or sample extraction tools, need to be positioned very precisely. It is therefore necessary to have a detailed model of these regions of interest that includes not only the occupied space but also the free space. Modeling the free space enables good obstacle avoidance and a good computation of the tool's approach trajectory to the leaf. In this context, it is easy to see that, in general, a single viewpoint is not enough to acquire all the information needed to take a measurement, and that predetermined trajectories do not guarantee success. The overall objective of this thesis is to solve complex active perception tasks by encoding their exploration goal in a previously estimated geometric model, using information gain as the fundamental guide within the cost function. The main contributions of this thesis can be divided into two groups: first, the evaluation of ToF cameras and their calibration in order to assess the uncertainty of their measurements (presented in Part I); and second, the proposal of a system capable of encoding the task by modeling free and occupied space, and that takes the sensor's uncertainty into account to improve action selection (presented in Part II). This thesis has given rise to 14 publications, including 5 in indexed journals, and its results have been used in the European project GARNICS. The functionality of the complete system is based on Next-Best-View methods and can be broken down into the following main steps. First, an initial view of the object (e.g., a plant) is obtained. 
From this initial view and a set of candidate views, the resulting information gain is estimated for each of them, both from moving the camera and from obtaining a new measurement. Notably, this computation takes into account the uncertainty of each of the sensor's pixels, the estimate of the information based on the pre-established task model, and the possible occlusions. Once the most promising view has been selected, the robot moves to the new position, takes a new image, integrates this information into the model and evaluates the set of remaining viewpoints once again. Finally, the task ends when enough information has been gathered. This work has been partially supported by a JAE fellowship of the Spanish Scientific Research Council (CSIC), the Spanish Ministry of Science and Innovation, the Catalan Research Commission and the European Commission under the research projects: DPI2008-06022: PAU: Percepción y acción ante incertidumbre. DPI2011-27510: PAU+: Perception and Action in Robotics Problems with Large State Spaces. 201350E102: MANIPlus: Manipulación robotizada de objetos deformables. 2009-SGR-155: SGR ROBÒTICA: Grup de recerca consolidat - Grup de Robòtica. FP6-2004-IST-4-27657: EU PACO PLUS project. FP7-ICT-2009-4-247947: GARNICS: Gardening with a cognitive system. FP7-ICT-2009-6-269959: IntellAct: Intelligent observation and execution of Actions and manipulations. Peer Reviewed
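
The Next-Best-View loop summarised in this abstract (score candidate views, move to the best one, integrate the measurement, re-evaluate, stop when enough information is gathered) can be sketched on a toy occupancy grid. This is a simplified illustration and not the thesis's framework: it assumes a noise-free sensor, so the expected gain of a view reduces to the summed binary entropy of the cells it sees, whereas the thesis models per-pixel sensor uncertainty and occlusions. All names, including the `measure` callback and `min_gain`, are hypothetical.

```python
import math


def cell_entropy(p):
    """Binary entropy in bits of an occupancy probability."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def expected_gain(grid, visible_cells):
    """Information gained by observing these cells with an ideal sensor."""
    return sum(cell_entropy(grid[cell]) for cell in visible_cells)


def next_best_view(grid, candidates):
    """Name of the candidate view with the highest expected gain."""
    return max(candidates, key=lambda name: expected_gain(grid, candidates[name]))


def explore(grid, candidates, measure, min_gain=0.1):
    """Greedy loop: take the best view, integrate it, re-evaluate the rest.

    `measure(cell)` stands in for the real sensor and returns the observed
    occupancy probability. The loop ends when no remaining view is worth it.
    """
    grid = dict(grid)
    remaining = dict(candidates)
    visited = []
    while remaining:
        view = next_best_view(grid, remaining)
        if expected_gain(grid, remaining[view]) < min_gain:
            break  # termination condition: enough information gathered
        for cell in remaining.pop(view):
            grid[cell] = measure(cell)
        visited.append(view)
    return visited, grid
```

Because the gains are recomputed after every measurement, a view whose cells have already been observed from elsewhere drops to zero gain and is skipped, which is the behaviour a fixed, predefined camera trajectory cannot provide.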

    Aerospace Medicine and Biology: A continuing bibliography with indexes

    This bibliography lists 356 reports, articles and other documents introduced into the NASA scientific and technical information system in June 1982