159 research outputs found

    Detection of bimanual gestures everywhere: why it matters, what we need and what is missing

    Bimanual gestures are of the utmost importance for the study of motor coordination in humans and in everyday activities. Reliable detection of bimanual gestures in unconstrained environments is fundamental for their clinical study and for assessing common activities of daily living. This paper investigates techniques for reliable, unconstrained detection and classification of bimanual gestures. It assumes the availability of inertial data originating from the two hands/arms, builds upon a previously developed technique for gesture modelling based on Gaussian Mixture Modelling (GMM) and Gaussian Mixture Regression (GMR), and compares different modelling and classification techniques, which are based on a number of assumptions inspired by the literature about how bimanual gestures are represented and modelled in the brain. Experiments show results for 5 everyday bimanual activities, which have been selected on the basis of three main parameters: (not) constraining the two hands by a physical tool, (not) requiring a specific sequence of single-hand gestures, and being recursive (or not). In the best performing combination of modelling approach and classification technique, all five activities are recognized with an accuracy of up to 97%, a precision of 82% and a recall of 100%. Comment: Submitted to Robotics and Autonomous Systems (Elsevier).
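
The three figures quoted above (accuracy, precision, recall) are standard confusion-matrix ratios. The sketch below only illustrates how they are computed and how a recall of 100% can coexist with a lower precision. The function name `classification_metrics` and the counts are hypothetical and are not taken from the paper.

```python
def classification_metrics(tp, fp, fn, tn):
    """Return (accuracy, precision, recall) from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    # Guard against empty denominators when a class is never predicted or never present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall


# Hypothetical counts for one gesture class (illustration only, not the paper's data):
# every true gesture is found (fn = 0), but two windows are false alarms (fp = 2).
acc, prec, rec = classification_metrics(tp=9, fp=2, fn=0, tn=89)
print(f"accuracy={acc:.2f} precision={prec:.2f} recall={rec:.2f}")
```

With no missed gestures the recall is 1.0, while the two false alarms pull precision down to 9/11.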

    Review of Wearable Devices and Data Collection Considerations for Connected Health

    Wearable sensor technology has gradually extended its usability into a wide range of well-known applications. Wearable sensors can typically assess and quantify the wearer’s physiology and are commonly employed for human activity detection and quantified self-assessment. Wearable sensors are increasingly utilised to monitor patient health, rapidly assist with disease diagnosis, and help predict and often improve patient outcomes. Clinicians use various self-report questionnaires and well-known tests to report patient symptoms and assess their functional ability. These assessments are time consuming and costly and depend on subjective patient recall. Moreover, measurements may not accurately demonstrate the patient’s functional ability whilst at home. Wearable sensors can be used to detect and quantify specific movements in different applications. The volume of data collected by wearable sensors during long-term assessment of ambulatory movement can become immense in tuple size. This paper discusses current techniques used to track and record various human body movements, as well as techniques used to measure activity and sleep from long-term data collected by wearable technology devices

    Identification, synchronisation and composition of user-generated videos

    Joint supervision (cotutelle): Universitat Politècnica de Catalunya and Queen Mary University of London. The increasing availability of smartphones makes it easier for people to capture videos of their experience when attending events such as concerts, sports competitions and public rallies. Smartphones are equipped with inertial sensors, which could be beneficial for event understanding. The captured User-Generated Videos (UGVs) are made available on media sharing websites. Searching and mining UGVs of the same event is challenging due to inconsistent tags or incorrect timestamps. A UGV recorded from a fixed location contains monotonic content and unintentional camera motions, which may make it less interesting to play back. In this thesis, we propose the following identification, synchronisation and video composition frameworks for UGVs. We propose a framework for the automatic identification and synchronisation of unedited multi-camera UGVs within a database. The proposed framework analyses the sound to match and cluster UGVs that capture the same spatio-temporal event, and estimates their relative time-shift to temporally align them. We design a novel descriptor derived from the pairwise matching of audio chroma features of UGVs. The descriptor facilitates the definition of a classification threshold for automatic query-by-example event identification. We contribute a database of 263 multi-camera UGVs of 48 real-world events. We evaluate the proposed framework on this database and compare it with state-of-the-art methods. Experimental results show the effectiveness of the proposed approach in the presence of audio degradations (channel noise, ambient noise, reverberations). Moreover, we present an automatic audio and visual-based camera selection framework for composing an uninterrupted recording from synchronised multi-camera UGVs of the same event. We design an automatic audio-based cut-point selection method that provides a common reference for audio and video segmentation. To filter low quality video segments, spatial and spatio-temporal assessments are computed. The framework combines segments of UGVs using a rank-based camera selection strategy by considering visual quality scores and view diversity. The proposed framework is validated on a dataset of 13 events (93 UGVs) through subjective tests and compared with state-of-the-art methods. Suitable cut-point selection, specific visual quality assessments and rank-based camera selection contribute to the superiority of the proposed framework over the existing methods. Finally, we contribute a method for Camera Motion Detection using Gyroscope for UGVs captured from smartphones and design a gyro-based quality score for video composition. The gyroscope measures the angular velocity of the smartphone, which can be used for camera motion analysis. We evaluate the proposed camera motion detection method on a dataset of 24 multi-modal UGVs captured by us, and compare it with existing visual and inertial sensor-based methods. By designing a gyro-based score to quantify the goodness of the multi-camera UGVs, we develop a gyro-based video composition framework. A gyro-based score substitutes the spatial and spatio-temporal scores and reduces the computational complexity. We contribute a multi-modal dataset of 3 events (12 UGVs), which is used to validate the proposed gyro-based video composition framework. Postprint (published version).
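
The idea of estimating a relative time-shift between two recordings of the same event can be sketched with a plain cross-correlation over a one-dimensional audio feature. This is a simplified stand-in, not the chroma-based pairwise descriptor designed in the thesis. The function `estimate_shift`, the toy feature values and the lag convention are assumptions made for illustration.

```python
def estimate_shift(ref, query, max_lag):
    """Return the lag (in frames) that maximises the cross-correlation.

    Convention (an assumption of this sketch): a positive lag means that
    query[j] lines up with ref[j + lag], i.e. the query recording started
    `lag` frames after the reference recording.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i, r in enumerate(ref):
            j = i - lag  # index in query aligned with ref[i] for this lag
            if 0 <= j < len(query):
                score += r * query[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


# Toy per-frame audio feature (hypothetical values): the second recording
# starts two frames later, so it misses the first two frames of the reference.
ref = [0, 0, 1, 3, 1, 0, 0, 0, 0, 0]
query = ref[2:]
shift = estimate_shift(ref, query, max_lag=4)
print(shift)
```

Once the shift is known, the two recordings can be placed on a common timeline by offsetting the later one by that many frames.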

    Combining haptics and inertial motion capture to enhance remote control of a dual-arm robot

    High dexterity is required in tasks that involve contact between objects, such as surface conditioning (wiping, polishing, scuffing, sanding, etc.), especially when the location of the objects involved is unknown or highly inaccurate because they are moving, like a car body on an automotive production line. These applications require both human adaptability and robot accuracy. However, sharing the same workspace is not possible in most cases due to safety issues. Hence, a multi-modal teleoperation system combining haptics and an inertial motion capture system is introduced in this work. The human operator gets the sense of touch thanks to haptic feedback, whereas using the motion capture device allows more naturalistic movements. Visual feedback assistance is also introduced to enhance immersion. A Baxter dual-arm robot is used to offer more flexibility and manoeuvrability, allowing two independent operations to be performed simultaneously. Several tests have been carried out to assess the proposed system. As shown by the experimental results, the task duration is reduced and the overall performance improves thanks to the proposed teleoperation method.

    Fused mechanomyography and inertial measurement for human-robot interface

    Human-Machine Interfaces (HMI) are the technology through which we interact with the ever-increasing quantity of smart devices surrounding us. The fundamental goal of an HMI is to facilitate robot control through uniting a human operator as the supervisor with a machine as the task executor. Sensors, actuators, and onboard intelligence have not reached the point where robotic manipulators may function with complete autonomy and therefore some form of HMI is still necessary in unstructured environments. These may include environments where direct human action is undesirable or infeasible, and situations where a robot must assist and/or interface with people. Contemporary literature has introduced concepts such as body-worn mechanical devices, instrumented gloves, inertial or electromagnetic motion tracking sensors on the arms, head, or legs, electroencephalographic (EEG) brain activity sensors, electromyographic (EMG) muscular activity sensors and camera-based (vision) interfaces to recognize hand gestures and/or track arm motions for assessment of operator intent and generation of robotic control signals. While these developments offer a wealth of future potential their utility has been largely restricted to laboratory demonstrations in controlled environments due to issues such as lack of portability and robustness and an inability to extract operator intent for both arm and hand motion. Wearable physiological sensors hold particular promise for capture of human intent/command. EMG-based gesture recognition systems in particular have received significant attention in recent literature. As wearable pervasive devices, they offer benefits over camera or physical input systems in that they neither inhibit the user physically nor constrain the user to a location where the sensors are deployed. Despite these benefits, EMG alone has yet to demonstrate the capacity to recognize both gross movement (e.g. arm motion) and finer grasping (e.g. hand movement). 
As such, many researchers have proposed fusing muscle activity (EMG) and motion tracking (e.g. inertial measurement) to combine arm motion and grasp intent as HMI input for manipulator control. However, such work has arguably reached a plateau since EMG suffers from interference from environmental factors which cause signal degradation over time, demands an electrical connection with the skin, and has not demonstrated the capacity to function outside controlled environments for long periods of time. This thesis proposes a new form of gesture-based interface utilising a novel combination of inertial measurement units (IMUs) and mechanomyography sensors (MMGs). The modular system permits numerous configurations of IMUs to derive body kinematics in real time and uses this to convert arm movements into control signals. Additionally, bands containing six mechanomyography sensors were used to observe muscular contractions in the forearm which are generated by specific hand motions. This combination of continuous and discrete control signals allows a large variety of smart devices to be controlled. Several methods of pattern recognition were implemented to provide accurate decoding of the mechanomyographic information, including Linear Discriminant Analysis and Support Vector Machines. Based on these techniques, accuracies of 94.5% and 94.6% respectively were achieved for 12-gesture classification. In real-time tests, accuracies of 95.6% were achieved in 5-gesture classification. It has previously been noted that MMG sensors are susceptible to motion-induced interference. The thesis established that arm pose also changes the measured signal. This thesis introduces a new method of fusing IMU and MMG data to provide a classification that is robust to both of these sources of interference. Additionally, an improvement in orientation estimation and a new orientation estimation algorithm are proposed. 
These improvements to the robustness of the system provide the first solution that is able to reliably track both motion and muscle activity for extended periods of time for HMI outside a clinical environment. Applications in robot teleoperation in both real-world and virtual environments were explored. With multiple degrees of freedom, robot teleoperation provides an ideal test platform for HMI devices, since it requires a combination of continuous and discrete control signals. The field of prosthetics also represents a unique challenge for HMI applications. In an ideal situation, the sensor suite should be capable of detecting the muscular activity in the residual limb which is naturally indicative of intent to perform a specific hand pose and trigger this pose in the prosthetic device. Dynamic environmental conditions within a socket, such as skin impedance, have delayed the translation of gesture control systems into prosthetic devices; however, mechanomyography sensors are unaffected by such issues. There is huge potential for a system like this to be utilised as a controller as ubiquitous computing systems become more prevalent, and as the desire for a simple, universal interface increases. Such systems have the potential to impact significantly on the quality of life of prosthetic users and others. Open Access.

    Visual / acoustic detection and localisation in embedded systems

    © Cranfield University. The continuous miniaturisation of sensing and processing technologies is increasingly offering a variety of embedded platforms, enabling the accomplishment of a broad range of tasks using such systems. Motivated by these advances, this thesis investigates embedded detection and localisation solutions using vision and acoustic sensors. Focus is particularly placed on surveillance applications using sensor networks. Existing vision-based detection solutions for embedded systems suffer from sensitivity to environmental conditions. In the literature, there seems to be no algorithm able to simultaneously tackle all the challenges inherent to real-world videos. Regarding the acoustic modality, many research works have investigated acoustic source localisation solutions in distributed sensor networks. Nevertheless, it is still a challenging task to develop an efficient algorithm that deals with the experimental issues, approaches the performance required by these systems and performs the data processing in a distributed and robust manner. The movement of scene objects is generally accompanied by sound emissions with features that vary from one environment to another. Therefore, considering the combination of the visual and acoustic modalities would offer a significant opportunity for improving the detection and/or localisation using the described platforms. In the light of the described framework, we investigate in the first part of the thesis the use of a cost-effective vision-based method that can deal robustly with the issue of motion detection in static, dynamic and moving background conditions. For motion detection in static and dynamic backgrounds, we present the development and the performance analysis of a spatio-temporal form of the Gaussian mixture model. On the other hand, the problem of motion detection in moving backgrounds is addressed by accounting for registration errors in the captured images. 
By adopting a robust optimisation technique that takes into account the uncertainty about the visual measurements, we show that high detection accuracy can be achieved. In the second part of this thesis, we investigate solutions to the problem of acoustic source localisation using a trust-region-based optimisation technique. The proposed method shows an overall higher accuracy and convergence improvement compared to a line-search-based method. More importantly, we show that through characterising the errors in measurements, which is a common problem for such platforms, higher accuracy in the localisation can be attained. The last part of this work studies the different possibilities of combining visual and acoustic information in a distributed sensor network. In this context, we first propose to include the acoustic information in the visual model. The resulting augmented model provides promising improvements in the detection and localisation processes. The second investigated solution consists of the fusion of the measurements coming from the different sensors. An evaluation of the accuracy of localisation and tracking using a centralised/decentralised architecture is conducted in various scenarios and experimental conditions. Results have shown the capability of this fusion approach to yield higher accuracy in the localisation and tracking of an active acoustic source than using a single type of data.

    Learning Algorithm Design for Human-Robot Skill Transfer

    In this research, we develop an intelligent learning scheme for performing human-robot skills transfer. Techniques adopted in the scheme include the Dynamic Movement Primitive (DMP) method with Dynamic Time Warping (DTW), Gaussian Mixture Model (GMM) with Gaussian Mixture Regression (GMR) and Radial Basis Function Neural Networks (RBFNNs). A series of experiments are conducted on a Baxter robot, a NAO robot and a KUKA iiwa robot to verify the effectiveness of the proposed design. During the design of the intelligent learning scheme, an online tracking system is developed to control the arm and head movement of the NAO robot using a Kinect sensor. The NAO robot is a humanoid robot with 5 degrees of freedom (DOF) for each arm. The joint motions of the operator’s head and arm are captured by a Kinect V2 sensor, and this information is then transferred into the workspace via the forward and inverse kinematics. In addition, to improve the tracking performance, a Kalman filter is further employed to fuse motion signals from the operator sensed by the Kinect V2 sensor and a pair of MYO armbands, so as to teleoperate the Baxter robot. In this regard, a new strategy is developed using the vector approach to accomplish a specific motion capture task. For instance, the arm motion of the operator is captured by a Kinect sensor and programmed through processing software. Two MYO armbands with embedded inertial measurement units are worn by the operator to aid the robots in detecting and replicating the operator’s arm movements. For this purpose, the armbands help to recognize and calculate the precise velocity of motion of the operator’s arm. Additionally, a neural-network-based adaptive controller is designed and implemented on the Baxter robot to validate the teleoperation of the Baxter robot. Subsequently, an enhanced teaching interface has been developed for the robot using DMP and GMR. 
Motion signals are collected from a human demonstrator via the Kinect V2 sensor, and the data is sent to a remote PC for teleoperating the Baxter robot. At this stage, the DMP is utilized to model and generalize the movements. In order to learn from multiple demonstrations, DTW is used for the preprocessing of the data recorded on the robot platform, and GMM is employed for the evaluation of DMP to generate multiple patterns after the completion of the teaching process. Next, we apply the GMR algorithm to generate a synthesized trajectory to minimize position errors in three-dimensional (3D) space. This approach has been tested by performing tasks on a KUKA iiwa and a Baxter robot, respectively. Finally, an optimized DMP is added to the teaching interface. A character recombination technology based on DMP segmentation that uses verbal commands has also been developed and incorporated in a Baxter robot platform. To imitate the recorded motion signals produced by the demonstrator, the operator trains the Baxter robot by physically guiding it to complete the given task. This is repeated five times, and the generated training data set is utilized via the playback system. Subsequently, DTW is employed to preprocess the experimental data. For modelling and overall movement control, DMP is chosen. The GMM is used to generate multiple patterns after implementing the teaching process. Next, we employ the GMR algorithm to reduce position errors in 3D space after a synthesized trajectory has been generated. The Baxter robot, remotely controlled from a PC via the User Datagram Protocol (UDP), records and reproduces every trajectory. Additionally, Dragon NaturallySpeaking software is adopted to transcribe the voice data. This proposed approach has been verified by enabling the Baxter robot to perform a writing task; the robot has been taught to write only one character.
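
The DTW preprocessing step mentioned above aligns demonstrations that were performed at different speeds. The following is a minimal sketch of the textbook dynamic-programming recurrence on one-dimensional sequences, not the thesis implementation. The function name `dtw_distance`, the absolute-difference local cost and the sample trajectories are assumptions.

```python
def dtw_distance(a, b):
    """Dynamic Time Warping distance between two 1-D sequences.

    cost[i][j] holds the cheapest cumulative cost of aligning a[:i] with b[:j].
    """
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])  # local cost (assumed: absolute difference)
            cost[i][j] = d + min(
                cost[i - 1][j],      # a advances, b repeats its sample
                cost[i][j - 1],      # b advances, a repeats its sample
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


# Two hypothetical demonstrations of the same motion; the second one lingers
# for an extra sample, which DTW absorbs at zero cost.
demo_fast = [0, 1, 2, 1, 0]
demo_slow = [0, 1, 1, 2, 1, 0]
print(dtw_distance(demo_fast, demo_slow))
```

Because the warping path may repeat samples, sequences that differ only in timing get a distance of zero, which is why DTW is a natural preprocessing step before fitting a model to several demonstrations.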

    Machine Learning Methods for Behaviour Analysis and Anomaly Detection in Video


    Earthquake damage analysis and mapping with the use of satellite remote sensing

    After a seismic event, a rapid and accurate evaluation of the damage is extremely important. Such an evaluation may support rescue team operations and identify the actual dimensions of the event and its potential impact on the territory and on the population. The use of Earth Observation (EO) data has increased significantly in recent years, particularly the use of Very High Resolution (VHR) optical images, which are able to provide detailed information at single-building level. However, most of the existing approaches rely mainly on remote sensing data, either optical or SAR (Synthetic Aperture Radar), and perform a classification based on change detection techniques. In this work we aim at creating a flexible tool that is able to perform a damage classification taking into account not only the available EO data, but also additional information that is supposed to be available even before the occurrence of any seismic event (a-priori data). This data includes soil vulnerability, which can play a very important role in local amplification effects, as well as structural information on the individual building. This approach, pursued within the framework of the EC-FP7 funded project APhoRISM (Advanced Procedures for Volcanic and Seismic Monitoring, grant agreement n. 606738), aims at generating maps of damage caused by an earthquake by integrating satellite remote sensing data (SAR and/or optical sensors) with structural and ground data, in order to improve the accuracy and limit the false alarms that derive from the use of EO data only. To do this, we first review the general approach and methods of data fusion and we identify the level of information that is best merged with respect to our goals. 
We also examine how the structural information is evaluated and we then focus on the description of Bayesian approaches and, more specifically, of Bayesian networks. This type of graphical approach is implemented in our data fusion tool to assess post-earthquake building damage. We validate our Bayesian networks against a real test case based on the L'Aquila (Italy) earthquake, which took place on April 6, 2009. In this case, we have a set of data available to build the ground truth validation test set. As regards remote sensing data, both COSMO-SkyMed radar and QuickBird VHR optical data were available for this event, thus allowing a complete remote sensing dataset. The in-situ information, though fragmentary, was built using data coming from different sources, mainly from INGV (Italian Geophysical and Volcano Institute) and the Italian Civil Protection Department. The promising results of different Bayesian networks are presented, showing the step-by-step approach adopted, which aims at generalising the methodology in order to further implement the network in future cases.
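
How combining EO evidence with a-priori data can change a damage estimate may be illustrated with the simplest possible Bayesian network: one binary damage node with conditionally independent evidence nodes. This is only a sketch under that independence assumption, not the networks built in the thesis. The function `posterior_damage` and all probabilities are hypothetical.

```python
def posterior_damage(prior, likelihoods):
    """Posterior probability of damage given several pieces of evidence.

    `likelihoods` is a list of (P(evidence | damaged), P(evidence | intact))
    pairs; the evidence items are assumed conditionally independent given
    the damage state.
    """
    p_damaged, p_intact = prior, 1.0 - prior
    for l_damaged, l_intact in likelihoods:
        p_damaged *= l_damaged
        p_intact *= l_intact
    return p_damaged / (p_damaged + p_intact)


# Hypothetical numbers, for illustration only.
prior = 0.10                 # P(building damaged) before any evidence
eo_change = (0.8, 0.2)       # change detected in the EO image pair
soil_vulnerable = (0.6, 0.3) # building sits on highly vulnerable soil

eo_only = posterior_damage(prior, [eo_change])
fused = posterior_damage(prior, [eo_change, soil_vulnerable])
print(f"EO only: {eo_only:.3f}, EO + soil: {fused:.3f}")
```

With these assumed numbers the fused posterior is higher than the EO-only one, since both pieces of evidence point towards damage; evidence pointing the other way would lower it, which is the mechanism by which extra data can limit false alarms.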