76 research outputs found

    Human robot interaction in a crowded environment

    Human Robot Interaction (HRI) is the primary means of establishing natural and affective communication between humans and robots. HRI enables robots to act in a way similar to humans in order to assist in activities that are considered to be laborious, unsafe, or repetitive. Vision-based human robot interaction is a major component of HRI, in which visual information is used to interpret how human interaction takes place. Common tasks of HRI include finding pre-trained static or dynamic gestures in an image, which involves localising different key parts of the human body such as the face and hands. This information is subsequently used to extract different gestures. After the initial detection process, the robot is required to comprehend the underlying meaning of these gestures [3]. Thus far, most gesture recognition systems can only detect gestures and identify a person in relatively static environments. This is not realistic for practical applications, as difficulties may arise from people's movements and changing illumination conditions. Another issue to consider is that of identifying the commanding person in a crowded scene, which is important for interpreting navigation commands. To this end, it is necessary to associate the gesture with the correct person, and automatic reasoning is required to extract the most probable location of the person who has initiated the gesture. In this thesis, we propose a practical framework for addressing the above issues. It attempts to achieve a coarse-level understanding of a given environment before engaging in active communication. This includes recognising human robot interaction, where a person has the intention to communicate with the robot. In this regard, it is necessary to differentiate whether the people present are engaged with each other or with their surrounding environment. The basic task is to detect and reason about the environmental context and different interactions so as to respond accordingly. For example, if individuals are engaged in conversation, the robot should realise it is best not to disturb them; if an individual is receptive to the robot's interaction, it may approach the person. Finally, if the user is moving in the environment, it can analyse further to understand whether any help can be offered in assisting this user. The method proposed in this thesis combines multiple visual cues in a Bayesian framework to identify people in a scene and determine their potential intentions. To improve system performance, contextual feedback is used, which allows the Bayesian network to evolve and adjust itself according to the surrounding environment. The results achieved demonstrate the effectiveness of the technique in dealing with human-robot interaction in a relatively crowded environment [7].
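
    The cue-fusion step lends itself to a short sketch. Below is a minimal naive-Bayes combination of per-person cue scores into a posterior over who issued a command; the cue names, scores and independence assumption are illustrative, not the thesis's actual model:

```python
import numpy as np

def fuse_cues(face_toward, gesture_score, proximity, prior=None):
    """Naive-Bayes fusion of visual cues: returns a posterior over
    which person in the scene is the commanding person."""
    cues = np.vstack([face_toward, gesture_score, proximity])
    n_people = cues.shape[1]
    if prior is None:
        prior = np.full(n_people, 1.0 / n_people)
    # Assume the cues are independent given the commander's identity.
    posterior = prior * cues.prod(axis=0)
    return posterior / posterior.sum()

# Three people in the scene; person 1 faces the robot and gestures.
post = fuse_cues(face_toward=[0.2, 0.9, 0.4],
                 gesture_score=[0.1, 0.8, 0.3],
                 proximity=[0.5, 0.6, 0.7])
print(post.argmax(), post)  # person 1 is the most probable commander
```

    In the thesis's framework, contextual feedback would additionally re-weight such a network over time; this static snapshot omits that step.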

    Statistical and Dynamical Modeling of Riemannian Trajectories with Application to Human Movement Analysis

    The data explosion in the past decade is in part due to the widespread use of rich sensors that measure various physical phenomena: gyroscopes that measure orientation in phones and fitness devices, the Microsoft Kinect, which measures depth information, etc. A typical application requires inferring the underlying physical phenomenon from data, which is done using machine learning. A fundamental assumption in training models is that the data is Euclidean, i.e. the metric is the standard Euclidean distance governed by the L2 norm. However, in many cases this assumption is violated, when the data lies on non-Euclidean spaces such as Riemannian manifolds. While the underlying geometry accounts for the non-linearity, accurate analysis of human activity also requires temporal information to be taken into account. Human movement has a natural interpretation as a trajectory on the underlying feature manifold, as it evolves smoothly in time. A commonly occurring theme in many emerging problems is the need to represent, compare, and manipulate such trajectories in a manner that respects the geometric constraints. This dissertation is a comprehensive treatise on modeling Riemannian trajectories to understand and exploit their statistical and dynamical properties. Such properties allow us to formulate novel representations for Riemannian trajectories. For example, the physical constraints on human movement are rarely considered, which results in an unnecessarily large space of features, making search, classification and other applications more complicated. Exploiting statistical properties can help us understand the true space of such trajectories. In applications such as stroke rehabilitation, where there is a need to differentiate between very similar kinds of movement, dynamical properties can be much more effective. In this regard, we propose a generalization of the Lyapunov exponent to Riemannian manifolds and show its effectiveness for human activity analysis. The theory developed in this thesis naturally leads to several benefits in areas such as data mining, compression, dimensionality reduction, classification, and regression.
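
    The idea behind a manifold-valued Lyapunov exponent can be made concrete with a small sketch: two nearby trajectories on the unit sphere are compared with geodesic (arc-length) distance rather than Euclidean distance, and the divergence rate is read off as the slope of the log-distance over time. This is an illustrative simplification, not the dissertation's formulation:

```python
import numpy as np

def geodesic_dist(x, y):
    """Arc-length distance between two points on the unit sphere."""
    return np.arccos(np.clip(np.dot(x, y), -1.0, 1.0))

def divergence_exponent(traj_a, traj_b, dt):
    """Slope of log geodesic separation over time: a sphere-valued
    analogue of the largest Lyapunov exponent."""
    d = np.array([geodesic_dist(a, b) for a, b in zip(traj_a, traj_b)])
    t = np.arange(len(d)) * dt
    slope, _ = np.polyfit(t, np.log(d + 1e-12), 1)
    return slope

def normalize(v):
    return v / np.linalg.norm(v)

# Two nearby trajectories on S^2 that drift apart as time advances.
s = np.linspace(0.05, 1.0, 50)
traj_a = np.array([normalize([np.cos(u), np.sin(u), 0.1]) for u in s])
traj_b = np.array([normalize([np.cos(1.2 * u), np.sin(1.2 * u), 0.1]) for u in s])
print(divergence_exponent(traj_a, traj_b, dt=s[1] - s[0]))
```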

    Deep learning for time series classification

    Time series analysis is a field of data science concerned with analyzing sequences of numerical values ordered in time. Time series are particularly interesting because they allow us to visualize and understand the evolution of a process over time. Their analysis can reveal trends, relationships and similarities across the data. Numerous fields contain data in the form of time series: health care (electrocardiogram, blood sugar, etc.), activity recognition, remote sensing, finance (stock market prices), industry (sensors), etc. Time series classification consists of constructing algorithms dedicated to automatically labelling time series data. The sequential aspect of time series data requires the development of algorithms that are able to harness this temporal property, making existing off-the-shelf machine learning models for traditional tabular data suboptimal for solving the underlying task. In this context, deep learning has emerged in recent years as one of the most effective methods for tackling the supervised classification task, particularly in the field of computer vision. The main objective of this thesis was to study and develop deep neural networks specifically constructed for the classification of time series data. We thus carried out the first large-scale experimental study allowing us to compare the existing deep methods and to position them relative to other non-deep-learning-based state-of-the-art methods. Subsequently, we made numerous contributions in this area, notably in the context of transfer learning, data augmentation, ensembling and adversarial attacks. Finally, we have also proposed a novel architecture, based on the famous Inception network (Google), which ranks among the most efficient to date.
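
    To make the architectural idea concrete, the following is a minimal PyTorch sketch of an Inception-style module for time series: a bottleneck convolution feeding parallel convolutions of different kernel lengths plus a max-pool branch, concatenated channel-wise. The layer sizes are illustrative; this is not the thesis's exact InceptionTime implementation:

```python
import torch
import torch.nn as nn

class InceptionModule1D(nn.Module):
    """Inception-style block for time series: bottleneck, parallel
    convolutions with different receptive fields, and a max-pool
    branch, concatenated along the channel axis."""
    def __init__(self, in_channels, n_filters=32, kernel_sizes=(9, 19, 39)):
        super().__init__()
        self.bottleneck = nn.Conv1d(in_channels, n_filters, 1, bias=False)
        self.convs = nn.ModuleList([
            nn.Conv1d(n_filters, n_filters, k, padding=k // 2, bias=False)
            for k in kernel_sizes])
        self.pool_branch = nn.Sequential(
            nn.MaxPool1d(3, stride=1, padding=1),
            nn.Conv1d(in_channels, n_filters, 1, bias=False))
        self.bn = nn.BatchNorm1d(n_filters * (len(kernel_sizes) + 1))
        self.relu = nn.ReLU()

    def forward(self, x):  # x: (batch, channels, time)
        z = self.bottleneck(x)
        branches = [conv(z) for conv in self.convs] + [self.pool_branch(x)]
        return self.relu(self.bn(torch.cat(branches, dim=1)))

x = torch.randn(8, 1, 128)             # a batch of univariate series
print(InceptionModule1D(1)(x).shape)   # torch.Size([8, 128, 128])
```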

    Prosody and Kinesics Based Co-analysis Towards Continuous Gesture Recognition

    The aim of this study is to develop a multimodal co-analysis framework for continuous gesture recognition by exploiting the prosodic and kinesic manifestations of natural communication. Using this framework, a co-analysis pattern between correlating components is obtained. The co-analysis pattern is clustered using K-means clustering to determine how well the pattern distinguishes the gestures. Features of the proposed approach that differentiate it from other models are its lower susceptibility to idiosyncrasies, its scalability, and its simplicity. The experiment was performed on the Multimodal Annotated Gesture Corpus (MAGEC) that we created for the research community studying non-verbal communication, particularly gestures.
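
    The clustering step can be sketched briefly. Here, hypothetical co-analysis feature vectors (prosodic cues paired with kinesic cues over the same time window) are grouped with scikit-learn's K-means; the feature choices and synthetic data are illustrative only:

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical co-analysis features: each row pairs prosodic cues
# (e.g. pitch slope, intensity) with kinesic cues (e.g. hand velocity,
# peak amplitude) measured over the same time window.
rng = np.random.default_rng(0)
features = np.vstack([rng.normal(loc=c, scale=0.3, size=(40, 4))
                      for c in (0.0, 1.5, 3.0)])  # three gesture types

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(features)
print(kmeans.labels_[:10])
# Low inertia (within-cluster variance) suggests the co-analysis
# pattern separates the gestures well.
print(kmeans.inertia_)
```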

    Learning of Surgical Gestures for Robotic Minimally Invasive Surgery Using Dynamic Movement Primitives and Latent Variable Models

    Full and partial automation of Robotic Minimally Invasive Surgery holds significant promise to improve patient treatment, reduce recovery time, and reduce the fatigue of surgeons. However, to accomplish this ambitious goal, a mathematical model of the intervention is needed. In this thesis, we propose to use Dynamic Movement Primitives (DMPs) to encode the gestures a surgeon has to perform to achieve a task. DMPs make it possible to learn a trajectory, thus imitating the dexterity of the surgeon, and to execute it while generalizing it both spatially (to new starting and goal positions) and temporally (to different execution speeds). Moreover, they have other desirable properties that make them well suited to surgical applications, such as online adaptability, robustness to perturbations, and the possibility of implementing obstacle avoidance. We propose various modifications to improve the state of the art of the framework, as well as novel methods to handle obstacles. Moreover, we validate the use of DMPs to model gestures by automating a surgery-related task and using DMPs as the low-level trajectory generator. In the second part of the thesis, we introduce the problem of unsupervised segmentation of task executions into gestures. We introduce latent variable models to tackle this problem, proposing further developments to combine such models with DMP theory. We review the Auto-Regressive Hidden Markov Model (AR-HMM) and test it on surgery-related datasets. Then, we propose a generalization of the AR-HMM to general, non-linear dynamics, showing that this results in a more accurate segmentation with less severe over-segmentation. Finally, we propose a further generalization of the AR-HMM that aims at integrating a DMP-like dynamic into the latent variable model.
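
    A minimal one-dimensional DMP rollout illustrates the spatial and temporal generalisation described above; the gains and basis parametrisation below are common textbook choices, not necessarily those used in the thesis:

```python
import numpy as np

def dmp_rollout(x0, g, weights, tau=1.0, alpha=25.0, beta=6.25,
                alpha_s=3.0, dt=0.001, T=1.0):
    """Discrete DMP: a critically damped spring-damper pulled toward
    the goal g, modulated by a learned forcing term so the rollout
    reproduces a demonstrated gesture. New x0/g generalise spatially;
    tau rescales the execution speed."""
    n_basis = len(weights)
    c = np.exp(-alpha_s * np.linspace(0, 1, n_basis))  # basis centres
    h = 1.0 / np.diff(c, append=c[-1] * 0.5) ** 2      # basis widths
    x, v, s = x0, 0.0, 1.0
    traj = []
    for _ in range(int(T / dt)):
        psi = np.exp(-h * (s - c) ** 2)
        f = s * (g - x0) * (psi @ weights) / (psi.sum() + 1e-10)
        dv = (alpha * (beta * (g - x) - v) + f) / tau
        v += dv * dt
        x += (v / tau) * dt
        s += (-alpha_s * s / tau) * dt                 # canonical system
        traj.append(x)
    return np.array(traj)

# Zero forcing term: the rollout simply converges to the goal.
traj = dmp_rollout(x0=0.0, g=1.0, weights=np.zeros(20))
print(traj[-1])  # approximately 1.0
```

    Fitting the weights to a demonstrated surgical gesture (e.g. by locally weighted regression) would make the same rollout reproduce that gesture from new start and goal positions.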

    Determining normal and abnormal lip shapes during movement for use as a surgical outcome measure

    Craniofacial assessment for diagnosis, treatment planning and outcome has traditionally relied on imaging techniques that provide a static image of the facial structure. Objective measures of facial movement are, however, becoming increasingly important for clinical interventions where surgical repositioning of facial structures can influence soft tissue mobility. These applications include the management of patients with cleft lip, facial nerve palsy and orthognathic surgery. Although technological advances in medical imaging have now enabled three-dimensional (3D) motion scanners to become commercially available, their clinical application to date has been limited. Therefore, the aim of this study is to determine normal and abnormal lip shapes during movement for use as a clinical outcome measure using such a scanner. Lip movements were captured from an average population using a 3D motion scanner. Consideration was given to the type of facial movement captured (i.e. verbal or non-verbal) and also to the method of feature extraction (i.e. manual or semi-automatic landmarking). Statistical models of appearance (Active Shape Models) were used to convert the video motion sequences into linear data and identify reproducible facial movements via pattern recognition. Average templates of lip movement were created from the most reproducible lip movements using Geometric Morphometrics (GMM), incorporating Generalised Procrustes Analysis (GPA) and Principal Component Analysis (PCA). Finally, lip movement data from a patient group undergoing orthognathic surgery was incorporated into the model and Discriminant Analysis (DA) employed in an attempt to statistically distinguish abnormal lip movement. The results showed that manual landmarking was the preferred method of feature extraction. Verbal facial gestures (i.e. words) were significantly more reproducible/repeatable over time when compared to non-verbal gestures (i.e. facial expressions). It was possible to create average templates of lip movement from the control group, which acted as an outcome measure, and from which abnormalities in movement could be discriminated pre-surgery. These abnormalities were found to normalise post-surgery. The concepts of this study form the basis of analysing facial movement in the clinical context. The methods are transferable to other patient groups. Specifically, patients undergoing orthognathic surgery have differences in lip shape/movement when compared to an average population. Correcting the position of the basal bones in this group of patients appears to normalise lip mobility.
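
    The GPA-plus-PCA pipeline behind the template construction can be sketched in a few lines: synthetic 2D lip-landmark configurations are aligned to a common frame and the principal modes of shape variation extracted. This is a minimal illustration, not the study's implementation:

```python
import numpy as np

def gpa(shapes, n_iter=10):
    """Minimal Generalised Procrustes Analysis: remove translation and
    scale, then iteratively rotate every landmark configuration
    (n_points x 2) onto the current mean shape."""
    shapes = [s - s.mean(axis=0) for s in shapes]
    shapes = [s / np.linalg.norm(s) for s in shapes]
    mean = shapes[0]
    for _ in range(n_iter):
        rotated = []
        for s in shapes:
            u, _, vt = np.linalg.svd(s.T @ mean)  # optimal rotation
            rotated.append(s @ u @ vt)
        shapes = rotated
        mean = np.mean(shapes, axis=0)
        mean /= np.linalg.norm(mean)
    return np.array(shapes), mean

rng = np.random.default_rng(1)
base = rng.normal(size=(20, 2))  # 20 hypothetical lip landmarks
shapes = [base + rng.normal(scale=0.05, size=base.shape) for _ in range(30)]
aligned, mean = gpa(shapes)

# PCA on aligned shapes: the principal modes of lip-shape variation.
flat = aligned.reshape(len(aligned), -1) - mean.ravel()
_, svals, _ = np.linalg.svd(flat, full_matrices=False)
print(svals[:3] ** 2 / (svals ** 2).sum())  # variance explained
```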

    Hand eye coordination in surgery

    The coordination of the hand in response to visual target selection has always been regarded as an essential quality in a range of professional activities. This quality has thus far been elusive to objective scientific measurement, and is usually engulfed in the overall performance of the individual. Parallels can be drawn to surgery, especially Minimally Invasive Surgery (MIS), where the physical constraints imposed by the arrangement of the instruments and visualisation methods require coordination skills that are unprecedented. With the current paradigm shift towards early specialisation in surgical training and shortened, focused training time, the selection process should identify trainees with the highest potential in certain specific skills. Although significant effort has been made in the objective assessment of surgical skills, it is currently only possible to measure surgeons' abilities at the time of assessment. It has been particularly difficult to quantify specific details of hand-eye coordination and to assess innate ability for future skills development. The purpose of this thesis is to examine hand-eye coordination in laboratory-based simulations, with a particular emphasis on details that are important to MIS. In order to understand the challenges of visuomotor coordination, movement trajectory errors have been used to provide an insight into the innate coordinate mapping of the brain. In MIS, novel spatial transformations, due to a combination of distorted endoscopic image projections and the "fulcrum" effect of the instruments, accentuate movement generation errors. Obvious differences in the quality of movement trajectories have been observed between novices and experts in MIS; however, these are difficult to measure quantitatively. A Hidden Markov Model (HMM) is used in this thesis to reveal the underlying characteristic movement details of a particular MIS manoeuvre and how such features are exaggerated by the introduction of rotation in the endoscopic camera. The proposed method has demonstrated the feasibility of measuring movement trajectory quality by machine learning techniques without prior arbitrary classification of expertise. Experimental results have highlighted these changes in novice laparoscopic surgeons, even after a short period of training. The way the intricate relationship between the hands and the eyes changes when learning a skilled visuomotor task has been studied previously. Reactive eye movement, when visual input is used primarily as a feedback mechanism for error correction, implies difficulties in hand-eye coordination. As the brain learns to adapt to this new coordinate map, eye movements become predictive of the action generated. The concept of measuring this spatiotemporal relationship is introduced as a measure of hand-eye coordination in MIS, by comparing the Target Distance Function (TDF) between the eye fixation and the instrument tip position on the laparoscopic screen. Further validation of this concept using high-fidelity experimental tasks is presented, where higher cognitive influence and multiple target selection increase the complexity of the data analysis. To this end, Granger causality is presented as a measure of the predictability of the instrument movement from the eye fixation pattern. Partial Directed Coherence (PDC), a frequency-domain variation of Granger causality, is used for the first time to measure hand-eye coordination. Experimental results are used to establish the strengths and potential pitfalls of the technique.
To further enhance the accuracy of this measurement, a modified Jensen-Shannon Divergence (JSD) measure has been developed to enhance the signal matching algorithm and trajectory segmentation. The proposed framework incorporates filtering of high-frequency noise, which represents non-purposeful hand and eye movements. The accuracy of the technique has been demonstrated by quantitative measurement of multiple laparoscopic tasks performed by expert and novice surgeons. Experimental results supporting visual search behavioural theory are presented, as this underpins the target selection process immediately prior to visuomotor action generation. The effects of specialisation and experience on visual search patterns are also examined. Finally, pilot results from functional brain imaging are presented, where Posterior Parietal Cortex (PPC) activation is measured using optical spectroscopy techniques. The PPC has been demonstrated to be involved in the calculation of coordinate transformations between the visual and motor systems, which opens up possibilities for exciting future studies of hand-eye coordination.
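
    The Granger-causality idea, of which PDC is the frequency-domain counterpart, can be illustrated with standard tooling. Below, a synthetic eye-fixation signal leads a synthetic instrument-tip signal by a few frames, and the statsmodels Granger test confirms that the eye signal helps predict the tip; the signals and lag are invented for illustration:

```python
import numpy as np
from statsmodels.tsa.stattools import grangercausalitytests

rng = np.random.default_rng(2)
n, lag = 500, 5
eye = np.cumsum(rng.normal(size=n))                      # fixation x-coordinate
tip = np.roll(eye, lag) + rng.normal(scale=0.5, size=n)  # tip follows the eye
tip[:lag] = tip[lag]                                     # patch the wrap-around

# Does the eye signal (second column) Granger-cause tip movement?
data = np.column_stack([tip, eye])
res = grangercausalitytests(data, maxlag=6)  # prints a summary per lag
print(res[lag][0]['ssr_ftest'][1])           # small p-value: eye is predictive
```

    Predictive (rather than reactive) eye movement would show up exactly this way: past fixations significantly improve the prediction of instrument motion.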

    Micro-facial movement detection using spatio-temporal features

    Micro-facial expressions are fast, subtle movements of facial muscles that occur when someone is attempting to conceal their true emotion. Detecting these movements is difficult for a human, as the movement can appear and disappear within half a second. Recently, research into detecting micro-facial movements using computer vision and other techniques has emerged with the aim of outperforming a human. The motivation behind much of this research is the potential applications in security, healthcare and emotion-based training. The research has also raised ethical concerns about whether it is acceptable to detect micro-movements when people do not know they are showing them. The main aim of this thesis is to investigate and develop novel ways of detecting micro-facial movements using features based in the spatial and temporal domains. The contributions towards this aim are: an extended feature descriptor to describe micro-facial movement, namely Local Binary Patterns on Three Orthogonal Planes (LBP-TOP) combined with Gaussian Derivatives (GD); a dataset of spontaneously induced micro-facial movements, namely Spontaneous Activity of Micro-Movements (SAMM); an individualised baseline method for micro-movement detection that forms an Adaptive Baseline Threshold (ABT); and Facial Action Coding System (FACS)-based regions, proposed to focus on the local movement of relevant facial areas. The LBP-TOP with GD feature was developed to improve on an established feature and use the GD to enhance the facial features. Using machine learning, the method performs well, achieving an accuracy of 92.6%. Next, a new dataset, SAMM, was introduced that improved on the limitations of previous sets, including a wider demographic, increased resolution and comprehensive FACS coding. An individualised baseline method was then introduced and tested using the new dataset. Using feature difference instead of machine learning, the performance increased, with a recall of 0.8429 under maximum thresholding and a further increase in recall to 0.9125 when using the ABT. To increase the relevance of what is being processed on the face, FACS-based regions were created. By focusing on local regions and individualised baselines, this method outperformed similar state-of-the-art approaches with an Area Under Curve (AUC) of 0.7513. Research into detecting micro-movements is still in its infancy, and much more can be done to advance this field. While machine learning can find patterns in normal facial expressions, it is the feature difference methods that perform best when detecting the subtle changes of the face. By using this and comparing the movement against a person's baseline, micro-movements can finally be accurately detected.
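
    The feature-difference idea with an individualised baseline can be sketched compactly. Here, chi-square distances between consecutive LBP-like histograms are compared against a per-person threshold learned from a neutral baseline period; the mean-plus-k-sigma rule is an assumed form, not necessarily the thesis's exact ABT:

```python
import numpy as np

def chi_square_dist(h1, h2, eps=1e-10):
    """Chi-square distance between two normalised feature histograms
    (e.g. LBP-TOP descriptors of consecutive frames)."""
    return 0.5 * np.sum((h1 - h2) ** 2 / (h1 + h2 + eps))

def detect_micro_movements(hists, baseline_frames=30, k=3.0):
    """Flag frames whose feature difference exceeds an individualised
    threshold derived from that person's neutral baseline period."""
    diffs = np.array([chi_square_dist(hists[i], hists[i - 1])
                      for i in range(1, len(hists))])
    base = diffs[:baseline_frames]
    threshold = base.mean() + k * base.std()  # assumed adaptive rule
    return np.where(diffs > threshold)[0] + 1, threshold

# Synthetic 59-bin histograms with a brief subtle change at frames 60-65.
rng = np.random.default_rng(3)
hists = rng.dirichlet(np.ones(59) * 50, size=120)
hists[60:66] = rng.dirichlet(np.ones(59) * 5, size=6)
onsets, thr = detect_micro_movements(hists)
print(onsets)  # frames flagged around the induced micro-movement
```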

    On-line Time Warping of Human Motion Sequences

    Some application areas require motions to be time warped on-line as a motion is captured, aligning a partially captured motion to a complete prerecorded motion. For example, movement training applications for dance and medical procedures require on-line time warping for analysing and visually feeding back the accuracy of human motions as they are being performed. Additionally, real-time production techniques such as virtual production, in-camera visual effects and the use of avatars in live stage performances require on-line time warping to align virtual character performances to a live performer. The work in this thesis first addresses a research gap in the measurement of the alignment of two motions, proposing approaches based on rank correlation and evaluating them against existing distance-based approaches to measuring motion similarity. The thesis then goes on to propose and evaluate novel methods for on-line time warping which plot alignments in a forward direction and utilise forecasting and local continuity constraint techniques. Current studies into measuring the similarity of motions focus on distance-based metrics to support motion recognition applications, leaving a research gap regarding the effectiveness of similarity metrics based on correlation and the optimal metrics for measuring the alignment of two motions. This thesis addresses this research gap by comparing the performance of a variety of similarity metrics based on distance and correlation, including novel combinations of joint parameterisation and correlation methods. The ability of each metric to measure both the similarity and the alignment of two motions is independently assessed. This work provides a detailed evaluation of a variety of different approaches to using correlation within a similarity metric, testing their performance to determine which approach is optimal and comparing their performance against established distance-based metrics. The results show that a correlation-based metric, in which joints are parameterised using displacement vectors and correlation is measured using Kendall Tau rank correlation, is the optimal approach for measuring the alignment between two motions. The study also showed that similarity metrics based on correlation are better at measuring the alignment of two motions, which is important in motion blending and style transfer applications as well as in evaluating the performance of time warping algorithms. It also showed that metrics based on distance are better at measuring the similarity of two motions, which is more relevant to motion recognition and classification applications. A number of approaches to on-line time warping have been proposed within existing research that are based on plotting an alignment path backwards from a selected end-point within the complete motion. While these approaches work for discrete applications, such as recognising a motion, their lack of a monotonic constraint between the alignments of successive frames means they do not support applications that require an alignment to be maintained continuously over a number of frames, for example applications involving continuous real-time visualisation, feedback or interaction. To solve this problem, a number of novel on-line time warping algorithms, based on forward plotting, motion forecasting and local continuity constraints, are proposed and evaluated by applying them to human motions.
Two benchmark standards for evaluating the performance of on-line time warping algorithms are established, based on UTW time warping and comparing the resulting alignment path with that produced by DTW. This work also proposes a novel approach to adapting existing local continuity constraints to a forward plotting approach. The studies within this thesis demonstrate that these time warping approaches are able to produce alignments of sufficient quality to support applications that require an alignment to be maintained continuously. The on-line time warping algorithms proposed in this study can align a previously recorded motion to a user in real time, as they perform the same action or an opposing action recorded at the same time as the motion being aligned. This solution has a variety of potential application areas, including: visualisation applications, such as aligning a motion to a live performer to facilitate in-camera visual effects or a live stage performance with a virtual avatar; motion feedback applications, such as dance training or medical rehabilitation; and interaction applications, such as working with cobots.
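
    The forward-plotting idea admits a minimal sketch: each incoming live frame advances a reference index monotonically to the cheapest local match within a bounded step, so an alignment is maintained continuously rather than plotted backwards after the fact. This simplifies away the thesis's forecasting and local continuity constraint techniques, and is for illustration only:

```python
import numpy as np

class OnlineTimeWarper:
    """Forward-plotting on-line alignment: the reference index can only
    move forwards, by at most max_step frames per live frame, which
    enforces a monotonic, continuously maintained alignment."""
    def __init__(self, ref, max_step=3):
        self.ref = ref        # (n_frames, n_features) prerecorded motion
        self.j = 0            # current reference index
        self.max_step = max_step

    def step(self, live_frame):
        end = min(self.j + self.max_step + 1, len(self.ref))
        dists = np.linalg.norm(self.ref[self.j:end] - live_frame, axis=1)
        self.j += int(dists.argmin())
        return self.j         # reference frame aligned to this live frame

# Reference motion, and a live performance of it at roughly 1.5x speed.
t_ref = np.linspace(0, 2 * np.pi, 120)
ref = np.column_stack([np.sin(t_ref), np.cos(t_ref)])
warper = OnlineTimeWarper(ref)
for t in np.linspace(0, 2 * np.pi, 80):  # fewer frames: faster motion
    j = warper.step(np.array([np.sin(t), np.cos(t)]))
print(j)  # ends near the final reference frame (119)
```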

    Safe navigation and human-robot interaction in assistant robotic applications

    The abstract is provided in the attachment.