40 research outputs found

    An intelligent multi-floor mobile robot transportation system in life science laboratories

    In this dissertation, a new intelligent multi-floor transportation system based on a mobile robot is presented to connect distributed laboratories in a multi-floor environment. The system introduces new indoor mapping and localization methods, a hybrid path planning approach, and an automated door management system. In addition, a hybrid strategy with an innovative floor estimation method is implemented to handle elevator operations. Finally, the presented system controls the working processes of the related sub-systems. Experiments demonstrate the efficiency of the presented system.

    Human robot interaction in a crowded environment

    Human Robot Interaction (HRI) is the primary means of establishing natural and affective communication between humans and robots. HRI enables robots to act in a way similar to humans in order to assist in activities that are considered to be laborious, unsafe, or repetitive. Vision based human robot interaction is a major component of HRI, with which visual information is used to interpret how human interaction takes place. Common tasks of HRI include finding pre-trained static or dynamic gestures in an image, which involves localising different key parts of the human body such as the face and hands. This information is subsequently used to extract different gestures. After the initial detection process, the robot is required to comprehend the underlying meaning of these gestures [3]. Thus far, most gesture recognition systems can only detect gestures and identify a person in relatively static environments. This is not realistic for practical applications as difficulties may arise from people's movements and changing illumination conditions. Another issue to consider is that of identifying the commanding person in a crowded scene, which is important for interpreting the navigation commands. To this end, it is necessary to associate the gesture to the correct person and automatic reasoning is required to extract the most probable location of the person who has initiated the gesture. In this thesis, we have proposed a practical framework for addressing the above issues. It attempts to achieve a coarse level understanding about a given environment before engaging in active communication. This includes recognizing human robot interaction, where a person has the intention to communicate with the robot. In this regard, it is necessary to differentiate if people present are engaged with each other or their surrounding environment. The basic task is to detect and reason about the environmental context and different interactions so as to respond accordingly. 
For example, if individuals are engaged in conversation, the robot should realize it is best not to disturb or, if an individual is receptive to the robot's interaction, it may approach the person. Finally, if the user is moving in the environment, it can analyse further to understand if any help can be offered in assisting this user. The method proposed in this thesis combines multiple visual cues in a Bayesian framework to identify people in a scene and determine potential intentions. For improving system performance, contextual feedback is used, which allows the Bayesian network to evolve and adjust itself according to the surrounding environment. The results achieved demonstrate the effectiveness of the technique in dealing with human-robot interaction in a relatively crowded environment [7].
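
The idea of combining several visual cues in a Bayesian framework can be illustrated with a much simpler model than the Bayesian network the abstract describes. The sketch below assumes the cues are conditionally independent (a naive Bayes simplification, not the thesis's method); the function name `fuse_cues` and all probability values are hypothetical.

```python
def fuse_cues(prior, cue_likelihoods):
    """Posterior P(engaged | cues), assuming conditionally independent cues.

    prior           -- P(engaged) before looking at any cue
    cue_likelihoods -- list of (P(cue | engaged), P(cue | not engaged)) pairs
    """
    p_yes = prior
    p_no = 1.0 - prior
    for like_yes, like_no in cue_likelihoods:
        # Each cue multiplies the evidence for both hypotheses.
        p_yes *= like_yes
        p_no *= like_no
    total = p_yes + p_no
    if total == 0:
        raise ValueError("cues are impossible under both hypotheses")
    # Normalise so the two hypotheses sum to one.
    return p_yes / total


# Hypothetical cues: face oriented towards the robot, hand raised.
posterior = fuse_cues(0.5, [(0.8, 0.2), (0.6, 0.4)])
```

With these illustrative numbers both cues favour engagement, so the posterior rises above the prior. Contextual feedback of the kind the abstract mentions would amount to updating the prior or the likelihood tables over time, which this sketch does not attempt.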

    Medical SLAM in an autonomous robotic system

    One of the main challenges for computer-assisted surgery (CAS) is to determine the intra-operative morphology and motion of soft tissues. This information is a prerequisite for the registration of multi-modal patient-specific data, which enhances the surgeon’s navigation capabilities by observing beyond exposed tissue surfaces and provides intelligent control of robotic-assisted instruments. In minimally invasive surgery (MIS), optical techniques are an increasingly attractive approach for in vivo 3D reconstruction of the soft-tissue surface geometry. This thesis addresses the ambitious goal of achieving surgical autonomy through the study of the anatomical environment, starting with the technology available and what is needed to analyze the scene: vision sensors. The first part of the thesis presents a novel endoscope for autonomous surgical task execution, which combines a standard stereo camera with a depth sensor. This solution introduces several key advantages, such as the possibility of 3D reconstruction at a greater distance than traditional endoscopes. The problem of hand-eye calibration, which unites the vision system and the robot in a single reference system, is then tackled, increasing the accuracy of the surgical work plan. The second part of the thesis addresses the problem of 3D reconstruction and the algorithms currently in use. In MIS, simultaneous localization and mapping (SLAM) can be used to localize the pose of the endoscopic camera and build a 3D model of the tissue surface. Another key element for MIS is real-time knowledge of the pose of surgical tools with respect to the surgical camera and the underlying anatomy. Starting from the ORB-SLAM algorithm, we have modified the architecture to make it usable in an anatomical environment by adding the registration of the pre-operative information of the intervention to the map obtained from SLAM. 
Once the SLAM algorithm was shown to be usable in an anatomical environment, it was improved by adding semantic segmentation to distinguish dynamic features from static ones. All the results in this thesis are validated on training setups that mimic some of the challenges of real surgery and on setups that simulate the human body, within the Autonomous Robotic Surgery (ARS) and Smart Autonomous Robotic Assistant Surgeon (SARAS) projects.

    Perception and intelligent localization for autonomous driving

    Master's degree in Computer and Telematics Engineering. Computer vision and sensor fusion are relatively recent subjects, yet they are widely adopted in the development of autonomous robots that must adapt to their surrounding environment. This thesis applies both to achieve perception in the context of autonomous driving. Using cameras for this purpose is a rather complex process. Unlike classic sensors, which always provide the same type of precise information in a deterministic way, the successive images acquired by a camera are full of highly varied information that is ambiguous and extremely difficult to extract. Using cameras for robotic sensing is the closest we have come to the most important element of human perception, the vision system. Computer vision is a scientific discipline that encompasses areas such as signal processing, artificial intelligence, mathematics, control theory, neurobiology and physics. The support platform for the study developed in this thesis is ROTA (RObô Triciclo Autónomo) and all the elements that make up its environment. In this context, the thesis describes the approaches introduced to solve the challenges the robot faces in its environment: detection of lane markings and the resulting perception of the road, and detection of obstacles, traffic lights, the crosswalk and the roadworks area. It also describes a calibration system and the application of image perspective removal, developed to map the perceived elements to real-world distances. Building on the perception system, the thesis also addresses self-localization integrated into a distributed architecture that allows navigation with long-term planning. All the work developed in this thesis is essentially focused on robotic perception in the context of autonomous driving.
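
Removing the image perspective to recover real-world distances is commonly done with a planar homography. A minimal sketch follows, assuming a flat road and a 3x3 matrix `H` obtained from some calibration step. The abstract does not say how the thesis implements this, so the function name `pixel_to_ground` and the values in `H_EXAMPLE` are hypothetical.

```python
def pixel_to_ground(H, u, v):
    """Map pixel (u, v) to ground-plane coordinates with a 3x3 homography H.

    H is a row-major list of three rows; the result is (x, y) in whatever
    real-world units the calibration used.
    """
    x = H[0][0] * u + H[0][1] * v + H[0][2]
    y = H[1][0] * u + H[1][1] * v + H[1][2]
    w = H[2][0] * u + H[2][1] * v + H[2][2]
    if w == 0:
        # Pixels on the horizon line have no finite ground-plane position.
        raise ValueError("point maps to infinity")
    # Divide by the homogeneous coordinate to undo the projection.
    return x / w, y / w


# Hypothetical calibration result, illustrative values only.
H_EXAMPLE = [
    [0.01, 0.0, -3.2],
    [0.0, 0.02, -4.8],
    [0.0, 0.0, 1.0],
]

ground_xy = pixel_to_ground(H_EXAMPLE, 320, 240)
```

Once lane markings or obstacles are detected in the image, passing their pixel coordinates through such a mapping gives positions that can be compared directly in metric units.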

    Real-time object detection using monocular vision for low-cost automotive sensing systems

    This work addresses the problem of real-time object detection in automotive environments using monocular vision. The focus is on real-time feature detection, tracking, depth estimation using monocular vision and finally, object detection by fusing visual saliency and depth information. Firstly, a novel feature detection approach is proposed for extracting stable and dense features even in images with very low signal-to-noise ratio. This methodology is based on image gradients, which are redefined to take account of noise as part of their mathematical model. Each gradient is based on a vector connecting a negative to a positive intensity centroid, where both centroids are symmetric about the centre of the area for which the gradient is calculated. Multiple gradient vectors define a feature with its strength being proportional to the underlying gradient vector magnitude. The evaluation of the Dense Gradient Features (DeGraF) shows superior performance over other contemporary detectors in terms of keypoint density, tracking accuracy, illumination invariance, rotation invariance, noise resistance and detection time. The DeGraF features form the basis for two new approaches that perform dense 3D reconstruction from a single vehicle-mounted camera. The first approach tracks DeGraF features in real-time while performing image stabilisation with minimal computational cost. This means that despite camera vibration the algorithm can accurately predict the real-world coordinates of each image pixel in real-time by comparing each motion-vector to the ego-motion vector of the vehicle. The performance of this approach has been compared to different 3D reconstruction methods in order to determine their accuracy, depth-map density, noise-resistance and computational complexity. The second approach proposes the use of local frequency analysis of gradient features for estimating relative depth. 
This novel method is based on the fact that DeGraF gradients can accurately measure local image variance with subpixel accuracy. It is shown that the local frequency by which the centroid oscillates around the gradient window centre is proportional to the depth of each gradient centroid in the real world. The lower computational complexity of this methodology comes at the expense of depth map accuracy as the camera velocity increases, but it is at least five times faster than the other evaluated approaches. This work also proposes a novel technique for deriving visual saliency maps by using Division of Gaussians (DIVoG). In this context, saliency maps express how different each image pixel is from its surrounding pixels across multiple pyramid levels. This approach is shown to be both fast and accurate when evaluated against other state-of-the-art approaches. Subsequently, the saliency information is combined with depth information to identify salient regions close to the host vehicle. The fused map allows faster detection of high-risk areas where obstacles are likely to exist. As a result, existing object detection algorithms, such as the Histogram of Oriented Gradients (HOG), can execute at least five times faster. In conclusion, through a step-wise approach, computationally expensive algorithms have been optimised or replaced by novel methodologies to produce a fast object detection system that is aligned to the requirements of the automotive domain.
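
The centroid-based gradient described in this abstract can be sketched roughly as follows. The abstract states only that the vector runs from a negative to a positive intensity centroid and that the two are symmetric about the window centre. This sketch therefore assumes the positive centroid is the intensity-weighted centre of mass and the negative centroid is its mirror image. The exact DeGraF formulation may differ, and the function name `centroid_gradient` is hypothetical.

```python
def centroid_gradient(patch):
    """Gradient vector (gx, gy) of a 2D patch of non-negative intensities.

    The vector points from the assumed negative centroid to the positive
    centroid, both expressed relative to the patch centre.
    """
    h = len(patch)
    w = len(patch[0])
    cx, cy = (w - 1) / 2.0, (h - 1) / 2.0
    total = sum(sum(row) for row in patch)
    if total == 0:
        # An all-black patch carries no gradient information.
        return (0.0, 0.0)
    # Intensity-weighted centre of mass, in pixel coordinates.
    mx = sum(x * val for row in patch for x, val in enumerate(row)) / total
    my = sum(y * val for y, row in enumerate(patch) for val in row) / total
    pos = (mx - cx, my - cy)
    # Assumption: the negative centroid mirrors the positive one.
    neg = (-pos[0], -pos[1])
    return (pos[0] - neg[0], pos[1] - neg[1])


# A horizontal intensity ramp: brighter towards the right.
ramp = [[0, 1, 2, 3, 4] for _ in range(5)]
gx, gy = centroid_gradient(ramp)
```

Because every pixel in the window contributes to the centroid, isolated noisy pixels shift the result only slightly, which is in line with the noise resistance the abstract reports.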

    Autonomisten metsäkoneiden koneaistijärjestelmät (Perception systems for autonomous forest machines)

    A prerequisite for increasing the autonomy of forest machinery is to provide robots with digital situational awareness, including a representation of the surrounding environment and the robot's own state in it. Therefore, this article-based dissertation proposes perception systems for autonomous or semi-autonomous forest machinery as a summary of seven publications. The work consists of several perception methods using machine vision, lidar, inertial sensors, and positioning sensors. The sensors are used together by means of probabilistic sensor fusion. Semi-autonomy is interpreted as a useful intermediary step, situated between current mechanized solutions and full autonomy, to assist the operator. In this work, the perception of the robot's self is achieved through estimation of its orientation and position in the world, the posture of its crane, and the pose of the attached tool. The view around the forest machine is produced with a rotating lidar, which provides approximately equal-density 3D measurements in all directions. Furthermore, a machine vision camera is used for detecting young trees among other vegetation, and sensor fusion of an actuated lidar and machine vision camera is utilized for detection and classification of tree species. In addition, in an operator-controlled semi-autonomous system, the operator requires a functional view of the data around the robot. To achieve this, the thesis proposes the use of an augmented reality interface, which requires measuring the pose of the operator's head-mounted display in the forest machine cabin. Here, this work adopts a sensor fusion solution for a head-mounted camera and inertial sensors. In order to increase the level of automation and productivity of forest machines, the work focuses on scientifically novel solutions that are also adaptable for industrial use in forest machinery. Therefore, all the proposed perception methods seek to address a real existing problem within current forest machinery. 
All the proposed solutions are implemented in a prototype forest machine and field tested in a forest. The proposed methods include posture measurement of a forestry crane, positioning of a freely hanging forestry crane attachment, attitude estimation of an all-terrain vehicle, positioning a head-mounted camera in a forest machine cabin, detection of young trees for point cleaning, classification of tree species, and measurement of surrounding tree stems and the ground surface underneath.

    Advancements in multi-view processing for reconstruction, registration and visualization.

    The ever-increasing diffusion of digital cameras and the advancements in computer vision, image processing and storage capabilities have led, in recent years, to the wide diffusion of digital image collections. A set of digital images is usually referred to as a multi-view image set when the pictures cover different views of the same physical object or location. In multi-view datasets, correlations between images are exploited in many different ways to improve our understanding of a scene and the information we can gather about it. For example, a collection can be enhanced using the camera position and orientation, or with information about the 3D structure of the scene. The range of applications of multi-view data is very wide, encompassing diverse fields such as image-based reconstruction, image-based localization, navigation of virtual environments, collective photographic retouching, computational photography and object recognition. For all these reasons, the development of new algorithms to effectively create, process, and visualize this type of data is an active research trend. The thesis presents four advancements related to different aspects of multi-view data processing: - Image-based 3D reconstruction: we present a pre-processing algorithm, a special color-to-gray conversion, developed to improve the accuracy of image-based reconstruction algorithms. In particular, we show how different dense stereo matching results can be enhanced by applying a domain separation approach that pre-computes a single optimized numerical value for each image location. - Image-based appearance reconstruction: we present a multi-view processing algorithm that can enhance the quality of the color transfer from multi-view images to a geo-referenced 3D model of a location of interest. 
The proposed approach computes virtual shadows and automatically segments shadowed regions from the input images, preventing those pixels from being used in subsequent texture synthesis. - 2D to 3D registration: we present an unsupervised localization and registration system. This system can recognize a site that has been framed in a multi-view dataset and calibrate it on a pre-existing 3D representation. The system has very high accuracy and can validate the result in a completely unsupervised manner. Its accuracy is sufficient to view input images seamlessly and correctly superimposed on the 3D location of interest. - Visualization: we present PhotoCloud, a real-time client-server system for interactive exploration of high-resolution 3D models and up to several thousand photographs aligned over this 3D data. PhotoCloud supports any 3D model that can be rendered in a depth-coherent way and arbitrary multi-view image collections. Moreover, it tolerates 2D-to-2D and 2D-to-3D misalignments, and it provides scalable visualization of generic integrated 2D and 3D datasets by exploiting data duality. A set of effective 3D navigation controls, tightly integrated with innovative thumbnail bars, enhances user navigation. These advancements have been developed in tourism and cultural heritage application contexts, but they are not limited to these.

    Fruit Detection and Tree Segmentation for Yield Mapping in Orchards

    Accurate information gathering and processing is critical for precision horticulture, as growers aim to optimise their farm management practices. An accurate inventory of the crop that details its spatial distribution along with health and maturity, can help farmers efficiently target processes such as chemical and fertiliser spraying, crop thinning, harvest management, labour planning and marketing. Growers have traditionally obtained this information by using manual sampling techniques, which tend to be labour intensive, spatially sparse, expensive, inaccurate and prone to subjective biases. Recent advances in sensing and automation for field robotics allow for key measurements to be made for individual plants throughout an orchard in a timely and accurate manner. Farmer operated machines or unmanned robotic platforms can be equipped with a range of sensors to capture a detailed representation over large areas. Robust and accurate data processing techniques are therefore required to extract high level information needed by the grower to support precision farming. This thesis focuses on yield mapping in orchards using image and light detection and ranging (LiDAR) data captured using an unmanned ground vehicle (UGV). The contribution is the framework and algorithmic components for orchard mapping and yield estimation that is applicable to different fruit types and orchard configurations. The framework includes detection of fruits in individual images and tracking them over subsequent frames. The fruit counts are then associated to individual trees, which are segmented from image and LiDAR data, resulting in a structured spatial representation of yield. The first contribution of this thesis is the development of a generic and robust fruit detection algorithm. Images captured in the outdoor environment are susceptible to highly variable external factors that lead to significant appearance variations. 
Specifically in orchards, variability is caused by changes in illumination, target pose, tree types, etc. The proposed techniques address these issues by using state-of-the-art feature learning approaches for image classification, while investigating the utility of orchard domain knowledge for fruit detection. Detection is performed using both pixel-wise classification of images followed by instance segmentation, and bounding-box regression approaches. The experimental results illustrate the versatility of complex deep learning approaches over a multitude of fruit types. The second contribution of this thesis is a tree segmentation approach to detect the individual trees that serve as a standard unit for structured orchard information systems. The work focuses on trellised trees, which present unique challenges for segmentation algorithms due to their intertwined nature. LiDAR data are used to segment the trellis face, and to generate proposals for individual tree trunks. Additional trunk proposals are provided using pixel-wise classification of the image data. The multi-modal observations are fine-tuned by modelling trunk locations using a hidden semi-Markov model (HSMM), within which prior knowledge of tree spacing is incorporated. The final component of this thesis addresses the visual occlusion of fruit within geometrically complex canopies by using a multi-view detection and tracking approach. Single image fruit detections are tracked over a sequence of images, and associated to individual trees or farm rows, with the spatial distribution of the fruit counts forming a yield map over the farm. The results show the advantage of using multi-view imagery (instead of single view analysis) for fruit counting and yield mapping. This thesis includes extensive experimentation in almond, apple and mango orchards, with data captured by a UGV spanning a total of 5 hectares of farm area, over 30 km of vehicle traversal and more than 7,000 trees. 
The validation of the different processes is performed using manual annotations, which includes fruit and tree locations in image and LiDAR data respectively. Additional evaluation of yield mapping is performed by comparison against fruit counts on trees at the farm and counts made by the growers post-harvest. The framework developed in this thesis is demonstrated to be accurate compared to ground truth at all scales of the pipeline, including fruit detection and tree mapping, leading to accurate yield estimation, per tree and per row, for the different crops. Through the multitude of field experiments conducted over multiple seasons and years, the thesis presents key practical insights necessary for commercial development of an information gathering system in orchards