5 research outputs found

    Efficient and accurate stereo matching for cloth manipulation

    Due to recent developments in robotics, research into robots that can assist with everyday household tasks, and robotic cloth manipulation in particular, has become popular in recent years. Stereo matching forms a crucial part of robotic vision and aims to derive depth information from image pairs captured by stereo cameras. Although stereo robotic vision is widely adopted for cloth manipulation robots in the research community, it remains a challenging research task: robotic vision requires very accurate depth output in a relatively short timespan in order to perform cloth manipulation successfully in real time. In this thesis, we aim to develop a stereo-matching-based robotic vision system that is both efficient and effective for robotic cloth manipulation. Effectiveness refers to the accuracy of the depth map generated by the stereo matching algorithms, which must capture enough detail for the robot to grasp cloth materials and achieve the given task, while efficiency emphasizes the time required for stereo matching to process the images.

    With respect to efficiency, we first explore a variety of hardware architectures, such as multi-core CPUs and graphics processors (GPUs), to accelerate stereo matching, and demonstrate that the parallelised stereo matching algorithm can be significantly accelerated, achieving 12X and 176X speed-ups on a multi-core CPU and a GPU respectively, compared with a SISD (Single Instruction, Single Data) single-threaded CPU. In terms of effectiveness, because there are no cloth-based testbeds with depth-map ground truths for evaluating stereo matching accuracy in this context, we created five different testbeds to facilitate evaluation of stereo matching in the context of cloth manipulation. In addition, we adapted a guided filtering algorithm into a pyramidal stereo matching framework that works directly on unrectified images, and evaluated its accuracy using the created cloth testbeds. We demonstrate that our proposed approach is not only efficient but also accurate, and is well suited to the characteristics of cloth manipulation tasks. This also shows that, rather than relying on image rectification, applying stereo matching directly to unrectified images is both effective and efficient.

    Finally, we explore whether we can further improve efficiency while maintaining reasonable accuracy for robotic cloth manipulation (i.e. trading off accuracy for efficiency). We use a foveated matching algorithm, inspired by biological vision systems, and find that it is effective in trading accuracy for efficiency, achieving almost the same level of accuracy for both cloth grasping and flattening tasks with a two- to three-fold acceleration. We also demonstrate that the robot can use machine learning techniques to predict the optimal foveation level needed to accomplish the cloth manipulation tasks successfully and much more efficiently. To summarize, in this thesis we extensively study stereo matching, contributing to the long-term goal of developing efficient yet accurate robotic stereo matching for cloth manipulation.
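    As a rough illustration of the accuracy-for-efficiency trade-off described in this abstract, the minimal sketch below computes block-matching disparity maps at successively coarser pyramid levels with OpenCV. It uses the standard StereoBM matcher on rectified grayscale inputs, not the thesis's guided-filter pyramid for unrectified images; the file names, disparity range and block size are illustrative placeholders.

```python
# Minimal sketch (not the thesis's method): block-matching stereo at several
# pyramid levels to illustrate trading depth accuracy for matching speed.
# Assumes rectified grayscale images "left.png" / "right.png" exist and are
# large enough for the chosen disparity range at every level.
import time
import cv2

left = cv2.imread("left.png", cv2.IMREAD_GRAYSCALE)
right = cv2.imread("right.png", cv2.IMREAD_GRAYSCALE)

matcher = cv2.StereoBM_create(numDisparities=64, blockSize=15)

for level in range(3):  # level 0 = full resolution, higher = coarser "foveation"
    t0 = time.time()
    disparity = matcher.compute(left, right)   # 16-bit fixed-point disparities
    elapsed = time.time() - t0
    print(f"level {level}: {left.shape[1]}x{left.shape[0]} in {elapsed:.3f}s")
    # Halve the resolution for the next, faster but less detailed, level.
    left, right = cv2.pyrDown(left), cv2.pyrDown(right)
```

    Coarser levels match far fewer pixels, so runtime drops roughly quadratically with each halving, at the cost of spatial detail in the recovered depth, which is the trade-off the foveated approach exploits.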

    Scene understanding by robotic interactive perception

    This thesis presents a novel and generic visual architecture for scene understanding by robotic interactive perception. The proposed visual architecture is fully integrated into autonomous systems performing object perception and manipulation tasks, and uses interaction with the scene to improve scene understanding substantially over non-interactive models. Specifically, this thesis presents two experimental validations of an autonomous system interacting with the scene: firstly, an autonomous gaze control model is investigated, where the vision sensor directs its gaze to satisfy a scene exploration task; secondly, autonomous interactive perception is investigated, where objects in the scene are repositioned by robotic manipulation.

    The proposed visual architecture for scene understanding involving perception and manipulation tasks has four components: 1) a reliable vision system; 2) camera and hand-eye calibration to integrate the vision system into an autonomous robot's kinematic frame chain; 3) a visual model performing perception tasks and providing the knowledge required for interaction with the scene; and 4) a manipulation model which, using knowledge received from the perception model, chooses an appropriate action (from a set of simple actions) to satisfy a manipulation task. This thesis presents contributions for each of these components.

    Firstly, a portable active binocular robot vision architecture that integrates a number of visual behaviours is presented. This active vision architecture has the ability to verge, localise, recognise and simultaneously identify multiple target object instances. The portability and functional accuracy of the proposed vision architecture are demonstrated by carrying out both qualitative and comparative analyses using different robot hardware configurations, feature extraction techniques and scene perspectives.

    Secondly, a camera and hand-eye calibration methodology for integrating an active binocular robot head within a dual-arm robot is described. For this purpose, the forward kinematic model of the active robot head is derived and the methodology for calibrating and integrating the robot head is described in detail. A rigid calibration methodology has been implemented to provide a closed-form hand-to-eye calibration chain, and this has been extended with a mechanism that allows the camera external parameters to be updated dynamically for optimal 3D reconstruction, meeting the requirements of robotic tasks such as grasping and manipulating rigid and deformable objects. Experimental results show that the robot head achieves an overall accuracy of less than 0.3 millimetres while recovering the 3D structure of a scene. In addition, a comparative study between current RGB-D cameras and our active stereo head within two dual-arm robotic testbeds is reported, demonstrating the accuracy and portability of the proposed methodology.

    Thirdly, this thesis proposes a visual perception model for the task of category-wise object sorting, based on Gaussian Process (GP) classification, that is capable of recognising object categories from point cloud data. In this approach, Fast Point Feature Histogram (FPFH) features are extracted from point clouds to describe the local 3D shape of objects, and a Bag-of-Words coding method is used to obtain an object-level vocabulary representation.
Multi-class Gaussian Process classification is employed to provide a probability estimate of the identity of the object, and serves the key role of modelling perception confidence in the interactive perception cycle. The interaction stage is responsible for invoking the appropriate action skills required to confirm the identity of an observed object with high confidence by executing multiple perception-action cycles. The recognition accuracy of the proposed perception model has been validated on simulated input data using both Support Vector Machine (SVM) and GP based multi-class classifiers. The results obtained demonstrate that, using a GP-based classifier, it is possible to obtain true-positive classification rates of up to 80%. Experimental validation of the above semi-autonomous object sorting system shows that the proposed GP-based interactive sorting approach outperforms random sorting by up to 30% when applied to scenes comprising configurations of household objects.

    Finally, a fully autonomous visual architecture is presented that accommodates manipulation skills, allowing an autonomous system to interact with the scene through object manipulation. This architecture consists of two stages: 1) a perception stage, which is a modified version of the aforementioned visual interaction model, and 2) an interaction stage, which performs a set of ad-hoc actions based on the information received from the perception stage. More specifically, the interaction stage reasons over the information received from the perception stage (class label and associated probabilistic confidence score) to choose one of two actions: 1) if an object class has been identified with high confidence, the object is removed from the scene and placed in the basket/bin designated for that class; 2) if an object class has been identified with lower confidence, then, inspired by the human behaviour of inspecting doubtful objects, an action is chosen to investigate that object further and confirm its identity by capturing more images from different views in isolation. The perception stage then processes these views, so that multiple perception-action/interaction cycles take place. From an application perspective, the task of autonomous category-based object sorting is performed and the experimental design for the task is described in detail.
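    The FPFH + Bag-of-Words + GP pipeline described above can be sketched as follows. The code assumes per-object sets of 33-dimensional FPFH descriptors have already been extracted (e.g. with PCL or Open3D) and uses synthetic placeholder data; it builds a k-means visual vocabulary, encodes each object as a word histogram, and trains a multi-class Gaussian Process classifier whose predicted probabilities stand in for the perception-confidence score that drives the interaction cycle. Names, cluster counts and data sizes are illustrative, not taken from the thesis.

```python
# Illustrative sketch of a Bag-of-Words + Gaussian Process category classifier.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.gaussian_process import GaussianProcessClassifier
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(0)
# Placeholder data: 20 objects, each with 200 random 33-dim "FPFH" descriptors,
# and a hypothetical category label per object.
object_descriptors = [rng.random((200, 33)) for _ in range(20)]
labels = rng.integers(0, 4, size=20)

def bow_histograms(descriptor_sets, vocabulary):
    """Encode each object as a normalised histogram of visual-word assignments."""
    hists = []
    for desc in descriptor_sets:
        words = vocabulary.predict(desc)
        h = np.bincount(words, minlength=vocabulary.n_clusters).astype(float)
        hists.append(h / max(h.sum(), 1.0))
    return np.vstack(hists)

# 1) Build the visual vocabulary from all local descriptors.
vocabulary = KMeans(n_clusters=50, n_init=10, random_state=0).fit(
    np.vstack(object_descriptors))

# 2) Object-level Bag-of-Words representation.
X = bow_histograms(object_descriptors, vocabulary)

# 3) Multi-class GP classifier; predict_proba plays the role of the
#    perception confidence: a low maximum probability would trigger a
#    further inspection action in the interactive cycle.
gp = GaussianProcessClassifier(kernel=1.0 * RBF(length_scale=1.0)).fit(X, labels)
confidence = gp.predict_proba(X)
print(confidence.max(axis=1))
```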

    Towards Interactive Photorealistic Rendering


    Reconnaissance perceptuelle des objets d'Intérêt : application à l'interprétation des activités instrumentales de la vie quotidienne pour les études de démence

    The rationale and motivation of this PhD thesis lie in the diagnosis, assessment, maintenance and promotion of self-independence of people with dementia in their Instrumental Activities of Daily Living (IADLs). In this context, a strong focus is placed on the task of automatically recognising IADLs. Egocentric video analysis (from cameras worn by a person) has recently gained much interest with regard to this goal. Indeed, recent studies have demonstrated how crucial the recognition of active objects (manipulated or observed by the person wearing the camera) is for the activity recognition task, and egocentric videos have the advantage of a strong differentiation between active and passive objects (the latter associated with the background). One recent approach to finding active elements in a scene is the incorporation of visual saliency into object recognition paradigms. Modelling the selective process of human perception of visual scenes is an efficient way to drive scene analysis towards particular areas considered of interest, or salient, which, in egocentric videos, strongly correspond to the locations of objects of interest. The objective of this thesis is to design an object recognition system that relies on visual saliency maps to provide more precise object representations that are robust against background clutter and therefore improve the recognition of active objects for the IADL recognition task. This PhD thesis is conducted in the framework of the Dem@care European project.

    Within the vast field of visual saliency modelling, we investigate and propose contributions in both the bottom-up (gaze driven by stimuli) and top-down (gaze driven by semantics) areas, aimed at enhancing the particular task of active object recognition in egocentric video content. Our first contribution, on bottom-up models, originates from the fact that observers are attracted by a central stimulus (the centre of an image). This biological phenomenon is known as central bias. In egocentric videos, however, this hypothesis does not always hold, so we study saliency models with non-central geometric bias cues. The proposed visual saliency models are trained on observers' eye fixations and incorporated into spatio-temporal saliency models. When compared with state-of-the-art visual saliency models, the ones we present show promising results, as they highlight the necessity of a non-centred geometric saliency cue.

    For our top-down contribution, we present a probabilistic visual attention model for manipulated object recognition in egocentric video content. Although arms often occlude objects and are usually considered a burden for many vision systems, they become an asset in our approach, as we extract both global and local features describing their geometric layout and pose, as well as the objects being manipulated. We integrate this information into a probabilistic generative model, provide update equations that automatically compute the model parameters optimising the likelihood of the data, and design a method to generate visual attention maps that are later used in an object recognition framework. This task-driven assessment reveals that the proposed method outperforms the state of the art in object recognition for egocentric video content. [...]
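    As a toy illustration of the non-central geometric bias idea discussed above (not the learned, eye-fixation-trained models of the thesis), the sketch below combines a placeholder bottom-up saliency map with a Gaussian spatial prior whose centre is a free parameter, so the prior need not sit at the image centre as a classical centre-bias model assumes. The map, centre coordinates and sigma are synthetic.

```python
# Toy sketch: weight a bottom-up saliency map with a geometric prior whose
# centre (cy, cx) is a parameter, i.e. a non-central bias instead of the
# classical centre bias. All values here are synthetic placeholders.
import numpy as np

def geometric_prior(shape, centre, sigma):
    """2D Gaussian spatial prior centred at an arbitrary pixel location."""
    h, w = shape
    ys, xs = np.mgrid[0:h, 0:w]
    cy, cx = centre
    d2 = (ys - cy) ** 2 + (xs - cx) ** 2
    return np.exp(-d2 / (2.0 * sigma ** 2))

rng = np.random.default_rng(0)
bottom_up = rng.random((240, 320))   # stand-in for a spatio-temporal saliency map

# Centre-biased prior (image centre) vs non-central prior (e.g. lower image
# half, where manipulated objects tend to appear in egocentric views).
central = geometric_prior(bottom_up.shape, centre=(120, 160), sigma=60)
non_central = geometric_prior(bottom_up.shape, centre=(200, 160), sigma=60)

saliency_central = bottom_up * central
saliency_non_central = bottom_up * non_central
print(np.unravel_index(saliency_non_central.argmax(), bottom_up.shape))
```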