
    Semantic and generic object segmentation for scene analysis using RGB-D Data

    In this thesis, we study RGB-D based segmentation problems from different perspectives in terms of the input data. Apart from the basic photometric and geometric information contained in RGB-D data, semantic and temporal information are also commonly considered in an RGB-D based segmentation system. The first part of this thesis focuses on an RGB-D based semantic segmentation problem, where predefined semantics and annotated training data are available. First, we review how RGB-D data has been exploited in the state of the art to help train classifiers in semantic segmentation tasks. Inspired by these works, we follow a multi-task learning scheme, where semantic segmentation and depth estimation are jointly tackled in a Convolutional Neural Network (CNN). Since semantic segmentation and depth estimation are two highly correlated tasks, approaching them jointly can be mutually beneficial. In this case, the depth information along with the segmentation annotation in the training data helps to better define the target of the classifier's training process, instead of feeding the system blindly with an extra input channel. We design a novel hybrid CNN architecture by investigating the attributes that depth estimation and semantic segmentation share as well as those that distinguish them. The proposed architecture is tested and compared with state-of-the-art approaches on different datasets. Although outstanding results are achieved in semantic segmentation, the limitations of these approaches are also evident: semantic segmentation relies strongly on predefined semantics and a large amount of annotated data, which may not be available in more general applications. On the other hand, classical image segmentation tackles the task in a more general way, but classical approaches rarely achieve object-level segmentation due to the lack of higher-level knowledge.
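The joint multi-task objective described above can be loosely illustrated as a shared feature trunk feeding a semantic head and a depth head whose losses are summed. The NumPy sketch below is purely illustrative: the toy data, shapes, and weights are invented, and the full CNN encoder of the thesis is stood in for by a single linear layer with ReLU.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 8 "pixels" with 5 input features each, a 3-class semantic
# label, and a scalar depth target (all synthetic, for illustration only).
X = rng.normal(size=(8, 5))
y_sem = rng.integers(0, 3, size=8)
y_depth = rng.normal(size=8)

# Shared encoder weights plus one head per task, mirroring the idea of a
# common feature trunk with task-specific branches.
W_shared = rng.normal(size=(5, 4)) * 0.1
W_sem = rng.normal(size=(4, 3)) * 0.1
W_depth = rng.normal(size=(4, 1)) * 0.1

def forward(X):
    h = np.maximum(X @ W_shared, 0.0)   # shared ReLU features
    logits = h @ W_sem                  # semantic-segmentation head
    depth = (h @ W_depth).ravel()       # depth-regression head
    return h, logits, depth

def joint_loss(logits, depth, y_sem, y_depth, alpha=0.5):
    # Cross-entropy for the semantic head plus weighted L2 for depth:
    # minimising the sum trains the shared trunk on both tasks at once.
    z = logits - logits.max(axis=1, keepdims=True)
    log_p = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    ce = -log_p[np.arange(len(y_sem)), y_sem].mean()
    l2 = ((depth - y_depth) ** 2).mean()
    return ce + alpha * l2

_, logits, depth = forward(X)
loss = joint_loss(logits, depth, y_sem, y_depth)
```

The weighting factor `alpha` (an assumption here, not a value from the thesis) is the usual knob for balancing the two task losses.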
Thus, in the second part of this thesis, we focus on an RGB-D based generic instance segmentation problem where temporal information is available from the RGB-D video while no semantic information is provided. We present a novel generic segmentation approach for 3D point cloud video (stream data) that thoroughly exploits the explicit geometry and temporal correspondences in RGB-D. The proposed approach is validated and compared with state-of-the-art generic segmentation approaches on different datasets. Finally, in the third part of this thesis, we present a method that combines the advantages of both semantic segmentation and generic segmentation: we discover object instances using the generic approach and model them by learning from the few discovered examples, applying the semantic segmentation approach. To do so, we employ a one-shot learning technique, which transfers knowledge from a generally trained model to a specific instance model. The learned instance models generate features that are robust for distinguishing different instances, which are fed back to the generic segmentation approach to improve the segmentation. The approach is validated with experiments conducted on a carefully selected dataset.
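The one-shot knowledge transfer used in the third part can be sketched in miniature: a generally pre-trained feature extractor is reused, and an instance model is built from the few discovered examples. In this hedged toy version, the pre-trained backbone is replaced by a fixed random projection and the instance model by a nearest-centroid classifier; every name and data point here is invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-in for a generally pre-trained feature extractor: a fixed
# projection (in the thesis this role is played by a trained network).
W_feat = rng.normal(size=(6, 4))

def features(x):
    return np.maximum(x @ W_feat, 0.0)

# One discovered example per instance: the "one-shot" support set.
instances = {
    "instance_a": rng.normal(loc=+3.0, size=(1, 6)),
    "instance_b": rng.normal(loc=-3.0, size=(1, 6)),
}

# Instance model = centroid of the few support features; no retraining
# of the backbone is needed, only this lightweight per-instance model.
centroids = {k: features(v).mean(axis=0) for k, v in instances.items()}

def classify(x):
    # Assign a query observation to the nearest instance centroid.
    f = features(x[None, :])[0]
    return min(centroids, key=lambda k: np.linalg.norm(f - centroids[k]))

query = rng.normal(loc=+3.0, size=6)
label = classify(query)
```

The robust per-instance features mentioned in the abstract correspond here to the centroids; in the actual system they would come from the learned instance models rather than a random projection.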

    The correlation between vehicle vertical dynamics and deep learning-based visual target state estimation: A sensitivity study

    Automated vehicles will provide greater transport convenience and interconnectivity, increase mobility options for young and elderly people, and reduce traffic congestion and emissions. However, the largest obstacle to the deployment of automated vehicles on public roads is their safety evaluation and validation. Undeniably, the role of cameras and Artificial Intelligence-based (AI) vision is vital in the perception of the driving environment and in road safety. Although a significant number of studies on the detection and tracking of vehicles have been conducted, none of them focused on the role of vertical vehicle dynamics. For the first time, this paper analyzes and discusses the influence of road anomalies and vehicle suspension on the performance of detecting and tracking driving objects. To this end, we conducted an extensive road field study and validated a computational tool for performing the assessment using simulations. A parametric study revealed the cases where AI-based vision underperforms and may significantly degrade the safety performance of AVs.

    Analyzing and controlling large nanosystems with physics-trained neural networks

    This thesis investigates how neural networks can be used to accelerate the evaluation of physical experiments while minimizing the required simulation effort. For the reconstruction of silver nanoclusters from single-shot wide-angle scattering patterns, neural networks can infer general reconstruction rules from small sets of simulated data and, when trained directly on the scattering physics, reach unmatched levels of detail. For giant-dipole states of Rydberg excitons in cuprous oxide, an excitation scheme is derived from simulations by means of deep reinforcement learning.

    Development of a real-time classifier for the identification of the Sit-To-Stand motion pattern

    The Sit-to-Stand (STS) movement has significant importance in clinical practice, since it is an indicator of lower-limb functionality. As an optimal trade-off between cost and accuracy, accelerometers have recently been used to synchronously recognise the STS transition in various Human Activity Recognition-based tasks. However, beyond the mere identification of the entire action, a major challenge remains the recognition of clinically relevant phases inside the STS motion pattern, due to the intrinsic variability of the movement. This work presents the development process of a deep-learning model aimed at recognising specific clinically valid phases of the STS, relying on a pool of 39 young and healthy participants performing the task under self-paced (SP) and controlled-speed (CT) conditions. The movements were recorded using a total of six inertial sensors, and the accelerometric data were labelled into four sequential STS phases according to the ground reaction force profiles acquired through a force plate. The optimised architecture combined convolutional and recurrent neural networks into a hybrid approach and was able to correctly identify the four STS phases, under both SP and CT movements, relying on a single sensor placed on the chest. The overall accuracy estimate (median [95% confidence interval]) for the hybrid architecture was 96.09 [95.37–96.56] in SP trials and 95.74 [95.39–96.21] in CT trials. Moreover, the prediction delays (4533 ms) were compatible with the temporal characteristics of the dataset, sampled at 10 Hz (100 ms). These results support the implementation of the proposed model in the development of digital rehabilitation solutions able to synchronously recognise the STS movement pattern, with the aim of effectively evaluating and correcting its execution.
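A hybrid convolutional-recurrent classifier of the kind described above can be sketched as a temporal convolution front end followed by a recurrent layer that emits per-step phase scores. The NumPy forward pass below is only a toy: all shapes, weights, and the synthetic acceleration window are invented, and no training, real sensor data, or the paper's actual architecture is reproduced.

```python
import numpy as np

rng = np.random.default_rng(2)

T, C, PHASES = 20, 3, 4          # time steps, accelerometer axes, STS phases
x = rng.normal(size=(T, C))      # synthetic tri-axial acceleration window

# Convolutional front end: a small bank of temporal filters of width 5,
# extracting short-range motion features from the raw signal.
K = rng.normal(size=(5, C, 8)) * 0.1

def conv1d(x, K):
    w, _, F = K.shape
    out = np.zeros((x.shape[0] - w + 1, F))
    for t in range(out.shape[0]):
        out[t] = np.einsum("wc,wcf->f", x[t:t + w], K)
    return np.maximum(out, 0.0)  # ReLU

# Recurrent back end: a vanilla RNN over the convolutional features,
# carrying context across the movement so phase boundaries depend on
# what came before, not just the current window.
Wxh = rng.normal(size=(8, 16)) * 0.1
Whh = rng.normal(size=(16, 16)) * 0.1
Why = rng.normal(size=(16, PHASES)) * 0.1

def hybrid_forward(x):
    feats = conv1d(x, K)
    h = np.zeros(16)
    logits = []
    for f in feats:
        h = np.tanh(f @ Wxh + h @ Whh)
        logits.append(h @ Why)   # per-step phase scores
    return np.stack(logits)

phase_per_step = hybrid_forward(x).argmax(axis=1)
```

In a trained model, `phase_per_step` would be the sequence of predicted STS phases; here it is just the argmax of random weights, shown to make the data flow concrete.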

    Gaze-Based Human-Robot Interaction by the Brunswick Model

    We present a new paradigm for human-robot interaction based on social signal processing, and in particular on the Brunswick model. Originally, the Brunswick model addresses face-to-face dyadic interaction, assuming that the interactants communicate through a continuous exchange of non-verbal social signals, in addition to the spoken messages. Social signals have to be interpreted through a proper recognition phase that considers visual and audio information. The Brunswick model makes it possible to quantitatively evaluate the quality of the interaction using statistical tools that measure how effective the recognition phase is. In this paper we cast this theory in the setting where one of the interactants is a robot; in this case, the recognition phases performed by the robot and by the human have to be revised with respect to the original model. The model is applied to Berrick, a recent open-source, low-cost robotic head platform, where gaze is the social signal considered.
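The kind of statistical evaluation the lens-model tradition associates with this framework can be illustrated with correlations between a latent state, an observable cue, and the perceiver's judgment. The sketch below uses entirely synthetic data and invented coefficients; it is not the paper's actual evaluation, only a minimal example of measuring a recognition phase with correlation statistics.

```python
import numpy as np

rng = np.random.default_rng(3)

# Latent state of the human interactant (e.g., an engagement level) and
# a noisy observable cue (e.g., a measured gaze feature) derived from it.
n = 200
state = rng.normal(size=n)
cue = 0.8 * state + 0.6 * rng.normal(size=n)      # ecological side
judgment = 0.7 * cue + 0.7 * rng.normal(size=n)   # robot's recognition

def corr(a, b):
    return float(np.corrcoef(a, b)[0, 1])

# Three lens-model-style statistics:
ecological_validity = corr(state, cue)      # how informative the cue is
cue_utilization = corr(cue, judgment)       # how well the robot uses it
achievement = corr(state, judgment)         # overall recognition quality
```

A poorly performing recognition phase would show up as a low `cue_utilization` even when `ecological_validity` is high, which is the kind of diagnosis such statistics enable.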