
    Robotic system for garment perception and manipulation

    Garments are a key element of people's daily lives, as many domestic tasks, such as laundry, revolve around them. Performing these generally dull and repetitive tasks implies devoting many hours of unpaid labor that could be freed through automation. Automating them has traditionally been hard, however, due to the deformable nature of garments, which adds challenges to those already present in object perception and manipulation. This thesis presents a robotic system for garment perception and manipulation that addresses these challenges. The laundry pipeline, as defined in this work, consists of four independent but sequential tasks: hanging, unfolding, ironing and folding. The aim of this work is to automate this pipeline through a robotic system able to operate in domestic environments as a robot household companion. Laundry starts by washing the garments, which then need to be dried, frequently by hanging them. As hanging is a complex task requiring bimanipulation skills and dexterity, a simplified version is addressed first: a deep convolutional neural network, trained on a custom synthetic dataset, is used to study whether a robot can predict, from a depth image of the garment in the pose in which it will be dropped, whether the garment will remain hanging or fall when dropped over a hanger. This study serves as a first step towards a more complex controller. After the garment is dry, it has to be unfolded to ease recognition of its category for the next steps. The presented model-less unfolding method uses only color and depth information from the garment to determine the grasp and release points of an unfolding action, which is repeated iteratively until the garment is fully spread. Before storage, wrinkles that arose during washing and drying have to be removed. For that purpose, a novel ironing method is proposed that uses a custom wrinkle descriptor to locate the most prominent wrinkles and generate a suitable ironing plan. The method does not require precise control of the lighting conditions of the scene, and it can iron with unmodified, everyday ironing tools through a controller that uses force feedback to apply constant pressure during ironing. Finally, the last step is to fold the garment for storage. One key aspect when folding is to perform each fold precisely, as errors accumulate when several consecutive folds are required. A neural folding controller is proposed that uses visual feedback of the current garment shape, estimated in 3D from a monocular image by a deep neural network trained with synthetic data, to perform each fold accurately. All the methods presented for the laundry pipeline tasks have been validated experimentally on different robotic platforms, including a full-body humanoid robot.
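    As an illustration of the iterative unfolding step described above, the following sketch organizes it as a perception-action loop: segment the garment, estimate how spread it is (here with a simple area-to-convex-hull heuristic, an assumption of this sketch rather than the criterion used in the thesis), and, while it remains folded, compute a grasp and release point and execute the action. The robot and camera interfaces and the segment_garment and estimate_grasp_release helpers are hypothetical placeholders.

```python
# Minimal sketch of the model-less, iterative unfolding loop described above.
# The spread heuristic and all helper names (capture_rgbd, segment_garment,
# estimate_grasp_release, execute_pick_and_place) are illustrative assumptions,
# not the thesis' actual API.
import cv2
import numpy as np

def spread_ratio(mask: np.ndarray) -> float:
    """Heuristic 'spreadness': garment area divided by its convex-hull area."""
    contours, _ = cv2.findContours(mask.astype(np.uint8), cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return 0.0
    contour = max(contours, key=cv2.contourArea)
    hull_area = cv2.contourArea(cv2.convexHull(contour))
    return float(cv2.contourArea(contour) / hull_area) if hull_area > 0 else 0.0

def unfold(robot, camera, segment_garment, estimate_grasp_release,
           spread_threshold=0.95, max_actions=10) -> bool:
    """Repeat pick-and-place unfolding actions until the garment is fully spread."""
    for _ in range(max_actions):
        rgb, depth = camera.capture_rgbd()          # color + depth observation
        mask = segment_garment(rgb, depth)          # binary garment mask
        if spread_ratio(mask) >= spread_threshold:
            return True                             # garment considered unfolded
        grasp, release = estimate_grasp_release(rgb, depth, mask)
        robot.execute_pick_and_place(grasp, release)
    return False
```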

    Assessment of the potentials and limitations of cortical-based analysis for the integration of structure and function in normal and pathological brains using MRI

    The software package Brainvisa (www.brainvisa.info) offers a wide range of possibilities for cortical analysis through its automatic sulci recognition feature. Automated sulci identification is attractive because manual labelling of the cortical sulci is often challenging even for experienced neuroradiologists. It can also be of interest in fMRI studies of individual subjects, where activated regions of the cortex can be identified directly using sulcal labels without the need for normalization to an atlas. As will be explained later in this thesis, normalization to an atlas can be especially problematic for pathological brains. In addition, Brainvisa allows sulcal morphometry from structural MR images by estimating a wide range of sulcal properties such as size, coordinates, direction, and pattern. Morphometry of abnormal brains has attracted considerable interest and has been widely used to find biomarkers of several neurological diseases and psychiatric disorders. However, mainly because of its complexity, only limited use of sulcal morphometry has been reported so far. With the wide range of possibilities for sulcal morphometry offered by Brainvisa, it is possible to investigate thoroughly the sulcal changes due to an abnormality. However, like any other automated method, Brainvisa is susceptible to limitations associated with image quality: factors such as noise and spatial resolution can affect the detection of the cortical folds and the estimation of their attributes. Hence the robustness of Brainvisa needs to be assessed, which can be done by estimating the reliability and reproducibility of its results and by exploring how the results change under other factors. This thesis investigates the possible benefits of sulci identification and sulcal morphometry for functional and structural MRI studies, as well as the limitations of Brainvisa. In addition, the possibility of improving activation localization in functional MRI studies is investigated further. This investigation was motivated by a review of other cortical-based analysis methods, namely cortical surface-based methods, which are discussed in the literature review chapter of this thesis; their application to functional MRI data analysis and their potential benefits inform this investigation.
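    As a generic illustration of the reproducibility assessment mentioned above, the automatic sulcus labelings obtained from repeated scans of the same subject can be compared with a per-sulcus Dice overlap; this is a minimal sketch of one common metric, not necessarily the evaluation protocol used in this thesis.

```python
# Generic reproducibility check for automatic sulci labelling: Dice overlap of a
# given sulcus label between two label volumes of the same subject (illustrative
# only; not necessarily the metric used in the thesis).
import numpy as np

def sulcus_dice(labels_a: np.ndarray, labels_b: np.ndarray, sulcus_id: int) -> float:
    """Dice coefficient of one labelled sulcus between two label volumes."""
    a = labels_a == sulcus_id
    b = labels_b == sulcus_id
    denom = a.sum() + b.sum()
    return 2.0 * np.logical_and(a, b).sum() / denom if denom else 1.0

def mean_sulcal_dice(labels_a: np.ndarray, labels_b: np.ndarray) -> float:
    """Average Dice over all sulcus labels present in either labelling."""
    ids = np.union1d(np.unique(labels_a), np.unique(labels_b))
    ids = ids[ids != 0]  # assume label 0 encodes background
    return float(np.mean([sulcus_dice(labels_a, labels_b, i) for i in ids]))
```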

    Atlas-Based Automatic Generation of Subject-Specific Finite Element Tongue Meshes

    Generation of subject-specific 3D finite element (FE) models requires processing numerous medical images in order to precisely extract geometrical information about subject-specific anatomy, which remains extremely challenging. To overcome this difficulty, we present an automatic atlas-based method that generates subject-specific FE meshes via a 3D registration guided by magnetic resonance images. The method extracts a 3D transformation by registering the atlas volume image to the subject's, establishing a one-to-one correspondence between the two volumes. The 3D transformation field then deforms the atlas mesh to generate the subject-specific FE mesh. To preserve the quality of the subject-specific mesh, a diffeomorphic non-rigid registration based on B-spline free-form deformations is used, which guarantees a non-folding, one-to-one transformation. Two evaluations of the method are provided. First, a publicly available CT database is used to assess the capability to accurately capture the complexity of each subject's lung geometry. Second, FE tongue meshes are generated for two healthy volunteers and two patients suffering from tongue cancer using MR images. It is shown that the method generates an appropriate representation of the subject-specific geometry while preserving the quality of the FE meshes for subsequent FE analysis. To demonstrate the importance of our method in a clinical context, a subject-specific mesh is used to simulate the tongue's biomechanical response to the activation of an important tongue muscle, before and after cancer surgery.
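    The register-then-warp idea can be sketched with SimpleITK's B-spline free-form registration: registering the atlas volume (fixed) to the subject volume (moving) yields a transform that maps atlas points into subject space, which can then be applied to the atlas FE mesh vertices. This is a hedged sketch under those assumptions; plain FFD registration does not by itself guarantee the diffeomorphic, non-folding property of the paper's method, and the parameter values are illustrative.

```python
# Sketch of atlas-to-subject mesh generation via intensity-based B-spline
# registration (illustrative stand-in for the paper's diffeomorphic scheme).
import SimpleITK as sitk
import numpy as np

def register_atlas_to_subject(atlas_img, subject_img, grid_size=(8, 8, 8)):
    """Return a transform mapping atlas physical points into subject space."""
    fixed = sitk.Cast(atlas_img, sitk.sitkFloat32)
    moving = sitk.Cast(subject_img, sitk.sitkFloat32)
    # With fixed=atlas and moving=subject, the recovered transform maps points
    # from atlas space to subject space, so it can deform the atlas mesh directly.
    tx0 = sitk.BSplineTransformInitializer(fixed, list(grid_size))
    reg = sitk.ImageRegistrationMethod()
    reg.SetInitialTransform(tx0, inPlace=False)
    reg.SetMetricAsMattesMutualInformation(numberOfHistogramBins=50)
    reg.SetInterpolator(sitk.sitkLinear)
    reg.SetOptimizerAsLBFGSB(numberOfIterations=200)
    reg.SetShrinkFactorsPerLevel([4, 2, 1])
    reg.SetSmoothingSigmasPerLevel([2.0, 1.0, 0.0])
    return reg.Execute(fixed, moving)

def warp_mesh_vertices(vertices: np.ndarray, transform) -> np.ndarray:
    """Apply the recovered transform to every FE mesh vertex (N x 3, physical mm)."""
    return np.array([transform.TransformPoint(tuple(p)) for p in vertices])
```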

    PointMCD: Boosting Deep Point Cloud Encoders via Multi-view Cross-modal Distillation for 3D Shape Recognition

    As two fundamental representation modalities of 3D objects, 3D point clouds and multi-view 2D images record shape information from the different domains of geometric structure and visual appearance. In the current deep learning era, remarkable progress in processing these two data modalities has been achieved by customizing compatible 3D and 2D network architectures, respectively. However, unlike multi-view image-based 2D visual modeling paradigms, which have shown leading performance on several common 3D shape recognition benchmarks, point cloud-based 3D geometric modeling paradigms are still highly limited by insufficient learning capacity, owing to the difficulty of extracting discriminative features from irregular geometric signals. In this paper, we explore the possibility of boosting deep 3D point cloud encoders by transferring visual knowledge extracted from deep 2D image encoders under a standard teacher-student distillation workflow. Specifically, we propose PointMCD, a unified multi-view cross-modal distillation architecture that includes a pretrained deep image encoder as the teacher and a deep point encoder as the student. To perform heterogeneous feature alignment between the 2D visual and 3D geometric domains, we further investigate visibility-aware feature projection (VAFP), by which point-wise embeddings are reasonably aggregated into view-specific geometric descriptors. By pair-wisely aligning multi-view visual and geometric descriptors, we can obtain more powerful deep point encoders without exhausting and complicated network modifications. Experiments on 3D shape classification, part segmentation, and unsupervised learning strongly validate the effectiveness of our method. The code and data will be publicly available at https://github.com/keeganhk/PointMCD.
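    A minimal sketch of the distillation idea, assuming a frozen multi-view teacher and per-view point visibility masks, is given below; the released PointMCD implementation (see the repository above) may differ in its projection, pooling, and alignment objectives.

```python
# Hedged sketch of multi-view cross-modal distillation with visibility-aware
# pooling; the visibility masks and cosine alignment loss are assumptions of
# this sketch, not necessarily the released PointMCD code.
import torch
import torch.nn.functional as F

def visibility_aware_projection(point_feats: torch.Tensor,
                                visibility: torch.Tensor) -> torch.Tensor:
    """Aggregate point-wise embeddings into one geometric descriptor per view.

    point_feats: (B, N, C) student point embeddings
    visibility:  (B, V, N) mask, 1 where a point is visible in a view
    returns:     (B, V, C) view-specific geometric descriptors
    """
    weights = visibility / visibility.sum(dim=-1, keepdim=True).clamp(min=1e-6)
    return torch.einsum('bvn,bnc->bvc', weights, point_feats)

def distillation_loss(student_point_feats, visibility, teacher_view_feats, proj):
    """Pair-wise alignment between per-view geometric and visual descriptors."""
    geom = visibility_aware_projection(student_point_feats, visibility)  # (B,V,C)
    geom = F.normalize(proj(geom), dim=-1)            # map to teacher feature dim
    vis = F.normalize(teacher_view_feats, dim=-1)     # (B, V, D), frozen teacher
    return (1.0 - (geom * vis).sum(dim=-1)).mean()    # cosine alignment loss
```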

    Neural Representations of Visual Motion Processing in the Human Brain Using Laminar Imaging at 9.4 Tesla

    During natural behavior, much of the motion signal falling onto our eyes is due to our own movements. Therefore, in order to correctly perceive motion in our environment, it is important to parse visual motion signals into those caused by self-motion, such as eye or head movements, and those caused by external motion. The neural mechanisms underlying this task, which are also required for a stable perception of the world during pursuit eye movements, are not fully understood. Both perceptual stability and the perception of real-world (i.e., objective) motion are the product of integration between motion signals on the retina and efference copies of eye movements. The central aim of this thesis is to examine whether different levels of cortical depth, or distinct columnar structures, of visual motion regions are differentially involved in disentangling signals related to self-motion, objective motion, or object motion. Based on previous studies reporting segregated populations of voxels in high-level visual areas such as V3A, V6, and MST that respond predominantly to either retinal or extra-retinal ('real') motion, we speculated that such voxels reside within laminar or columnar functional units. We used ultra-high-field (9.4T) fMRI along with an experimental paradigm that independently manipulated retinal and extra-retinal motion signals (smooth pursuit), while controlling for effects of eye movements, to investigate whether processing of real-world motion in human V5/MT, putative MST (pMST), and V1 is associated with differential laminar signal intensities. We also examined motion integration across cortical depths in the human motion areas V3A and V6, which have strong objective-motion responses. We found a unique, condition-specific laminar profile in human area V6, showing reduced mid-layer responses for retinal motion only, suggestive of an inhibitory retinal contribution to motion integration in mid layers or, alternatively, an excitatory contribution in deep and superficial layers. We also found evidence indicating that in V5/MT and pMST, processing related to retinal, objective, and pursuit motion is either integrated or colocalized at the scale of our resolution. In contrast, in V1, independent functional processes seem to drive the response to retinal and objective motion on the one hand, and to pursuit signals on the other. The lack of differential signals across depth in these regions suggests either that a columnar rather than laminar segregation governs these functions, or that the methods used were unable to detect differential laminar neural processing. Furthermore, the thesis provides a thorough analysis of the relevant technical modalities used for data acquisition and analysis at ultra-high field in the context of laminar fMRI. Relying on these technical implementations, we were able to conduct two high-resolution fMRI experiments that helped us to further investigate the laminar organization of self-induced and externally induced motion cues in human high-level visual areas and to form speculations about the site and mechanisms of their integration.
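    As a rough illustration of the laminar analysis described above, a condition's response profile across cortical depth can be summarized by binning voxels according to their relative depth and averaging the condition's GLM estimates within each bin; the actual 9.4T analysis pipeline used in the thesis is considerably more involved.

```python
# Minimal sketch of a laminar (depth) response profile: average a condition's
# GLM betas within cortical-depth bins (a common laminar-fMRI summary; only an
# illustration, not the thesis' exact pipeline).
import numpy as np

def laminar_profile(betas: np.ndarray, depths: np.ndarray, n_bins: int = 6) -> np.ndarray:
    """betas: (n_voxels,) condition estimates; depths: (n_voxels,) in [0, 1],
    with 0 at the white-matter boundary and 1 at the pial surface.
    Returns the mean response per depth bin."""
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    bin_idx = np.clip(np.digitize(depths, edges) - 1, 0, n_bins - 1)
    return np.array([betas[bin_idx == b].mean() if np.any(bin_idx == b) else np.nan
                     for b in range(n_bins)])
```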