1,000 research outputs found

    Sparse Inertial Poser: Automatic 3D Human Pose Estimation from Sparse IMUs

    Full text link
    We address the problem of making human motion capture in the wild more practical by using a small set of inertial sensors attached to the body. Since the problem is heavily under-constrained, previous methods either use a large number of sensors, which is intrusive, or they require additional video input. We take a different approach and constrain the problem by: (i) making use of a realistic statistical body model that includes anthropometric constraints and (ii) using a joint optimization framework to fit the model to orientation and acceleration measurements over multiple frames. The resulting tracker, Sparse Inertial Poser (SIP), enables 3D human pose estimation using only 6 sensors (attached to the wrists, lower legs, back, and head) and works for arbitrary human motions. Experiments on the recently released TNT15 dataset show that, using the same number of sensors, SIP achieves higher accuracy than the dataset baseline without using any video data. We further demonstrate the effectiveness of SIP on newly recorded challenging motions in outdoor scenarios such as climbing or jumping over a wall. (Comment: 12 pages; accepted at Eurographics 2017.)
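
    SIP's core idea is a single optimization over a window of frames that fits pose parameters so that the body model's predicted sensor orientations and accelerations match the measurements, with a prior acting as the anthropometric constraint. A minimal sketch of that structure follows; the linear forward functions and all dimensions are toy stand-ins, not the SMPL-based model the paper actually fits.

```python
# Toy sketch of SIP-style multi-frame optimization (hypothetical simplification):
# fit pose parameters so that model-predicted sensor orientations and accelerations
# match measured ones over a window of frames. The real SIP fits the SMPL body
# model; forward_orientation/forward_acceleration here are linear stand-ins.
import numpy as np
from scipy.optimize import least_squares

N_FRAMES, N_SENSORS, POSE_DIM = 10, 6, 15   # toy sizes, not SIP's actual dimensions

rng = np.random.default_rng(0)
meas_ori = rng.normal(size=(N_FRAMES, N_SENSORS, 3))  # measured orientations (axis-angle, toy)
meas_acc = rng.normal(size=(N_FRAMES, N_SENSORS, 3))  # measured accelerations (toy)

def forward_orientation(pose):
    # Toy linear "body model": pose parameters -> predicted sensor orientations.
    W = np.ones((POSE_DIM, N_SENSORS * 3)) * 0.1
    return (pose @ W).reshape(N_FRAMES, N_SENSORS, 3)

def forward_acceleration(pose):
    # Predicted accelerations as second finite differences of toy sensor positions.
    P = (pose @ (np.ones((POSE_DIM, N_SENSORS * 3)) * 0.05)).reshape(N_FRAMES, N_SENSORS, 3)
    return np.gradient(np.gradient(P, axis=0), axis=0)

def residuals(x):
    pose = x.reshape(N_FRAMES, POSE_DIM)
    r_ori = (forward_orientation(pose) - meas_ori).ravel()   # orientation term
    r_acc = (forward_acceleration(pose) - meas_acc).ravel()  # acceleration term
    r_prior = 0.1 * pose.ravel()                             # pose prior / regularizer
    return np.concatenate([r_ori, r_acc, r_prior])

sol = least_squares(residuals, np.zeros(N_FRAMES * POSE_DIM))
print("final cost:", sol.cost)
```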

    Spatial-Related Sensors Matters: 3D Human Motion Reconstruction Assisted with Textual Semantics

    Full text link
    Leveraging wearable devices for motion reconstruction has emerged as an economical and viable technique. Certain methodologies employ sparse Inertial Measurement Units (IMUs) on the human body and harness data-driven strategies to model human poses. However, reconstructing motion solely from sparse IMU data is inherently ambiguous, since many identical IMU readings correspond to different poses. In this paper, we explore the spatial importance of multiple sensors, supervised by text that describes specific actions. Specifically, uncertainty is introduced to derive weighted features for each IMU. We also design a Hierarchical Temporal Transformer (HTT) and apply contrastive learning to achieve precise temporal and feature alignment of sensor data with textual semantics. Experimental results demonstrate that our proposed approach achieves significant improvements in multiple metrics compared to existing methods. Notably, with textual supervision, our method not only differentiates between ambiguous actions such as sitting and standing but also produces more precise and natural motion. (Comment: accepted by AAAI 2024.)
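
    The abstract's uncertainty-derived per-IMU weighting can be illustrated with a small sketch: each sensor's feature vector is weighted by an inverse-variance style confidence, so readings the model is unsure about contribute less to the fused representation. The function and shapes below are hypothetical, not the paper's architecture.

```python
# Hedged sketch of uncertainty-based weighting of per-IMU features (hypothetical
# layer, not the paper's exact design): sensors with lower predicted uncertainty
# receive higher weight in the fused feature.
import numpy as np

def weighted_imu_features(feats, log_var):
    """feats: (num_imus, dim) per-sensor features; log_var: (num_imus,) predicted log-variance."""
    w = np.exp(-log_var)       # inverse-variance style confidence
    w = w / w.sum()            # normalize weights across sensors
    return (w[:, None] * feats).sum(axis=0), w

rng = np.random.default_rng(1)
feats = rng.normal(size=(6, 64))                          # 6 IMUs, 64-d features (toy)
log_var = np.array([0.1, 2.0, 0.5, 0.3, 1.5, 0.2])        # per-sensor uncertainty (toy)
fused, weights = weighted_imu_features(feats, log_var)
print(weights.round(3))                                   # noisier sensors contribute less
```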

    DeepMoCap: Deep Optical Motion Capture Using Multiple Depth Sensors and Retro-Reflectors

    Get PDF
    In this paper, a marker-based, single-person optical motion capture method (DeepMoCap) is proposed using multiple spatio-temporally aligned infrared-depth sensors and retro-reflective straps and patches (reflectors). DeepMoCap explores motion capture by automatically localizing and labeling reflectors on depth images and, subsequently, in 3D space. Introducing a non-parametric representation to encode the temporal correlation among pairs of colorized depthmaps and 3D optical flow frames, a multi-stage Fully Convolutional Network (FCN) architecture is proposed to jointly learn reflector locations and their temporal dependency among sequential frames. The extracted 2D reflector locations are spatially mapped into 3D space, resulting in robust 3D optical data extraction. The subject’s motion is efficiently captured by applying a template-based fitting technique to the extracted optical data. Two datasets have been created and made publicly available for evaluation purposes: one comprising multi-view depth and 3D optical flow annotated images (DMC2.5D), and a second consisting of spatio-temporally aligned multi-view depth images along with skeleton, inertial, and ground-truth MoCap data (DMC3D). The FCN model outperforms its competitors on the DMC2.5D dataset using the 2D Percentage of Correct Keypoints (PCK) metric, while the motion capture outcome is evaluated against RGB-D and inertial data fusion approaches on DMC3D, outperforming the next best method by 4.5% in total 3D PCK accuracy.
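
    For reference, the 2D PCK metric used to evaluate the FCN counts a predicted keypoint as correct when it lies within a threshold distance of the ground truth. A minimal sketch, with synthetic data standing in for reflector detections:

```python
# Minimal sketch of the 2D Percentage of Correct Keypoints (PCK) metric: a
# prediction counts as correct if it falls within a threshold distance of the
# ground-truth keypoint; the score is the fraction of correct predictions.
import numpy as np

def pck_2d(pred, gt, threshold):
    """pred, gt: (num_frames, num_keypoints, 2); threshold in pixels (or a fraction of a reference size)."""
    dists = np.linalg.norm(pred - gt, axis=-1)
    return (dists <= threshold).mean()

rng = np.random.default_rng(2)
gt = rng.uniform(0, 480, size=(100, 13, 2))               # synthetic ground-truth keypoints
pred = gt + rng.normal(scale=5.0, size=gt.shape)          # synthetic noisy detections
print(f"PCK@10px: {pck_2d(pred, gt, 10.0):.3f}")
```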

    Fusion of wearable and visual sensors for human motion analysis

    No full text
    Human motion analysis is concerned with the study of human activity recognition, human motion tracking, and the analysis of human biomechanics. It has applications in entertainment, sports, and healthcare. For example, activity recognition, which aims to understand and identify different tasks from motion, can be applied to create records of staff activity in the operating theatre at a hospital; motion tracking is already employed in some games to provide an improved user interaction experience and can be used to study how medical staff interact in the operating theatre; and human biomechanics, the study of the structure and function of the human body, can be used to better understand athlete performance and pathologies in certain patients, and to assess the surgical skill of medical staff.

    As health services strive to improve the quality of patient care and meet the growing demands of expanding populations around the world, solutions that can improve patient care, the diagnosis of pathology, and the monitoring and training of medical staff are necessary. Surgical workflow analysis, for example, aims to assess and optimise surgical protocols in the operating theatre by evaluating the tasks that staff perform and measurable outcomes. Human motion analysis methods can be used to quantify the activities and performance of staff for surgical workflow analysis; however, a number of challenges must be overcome before routine motion capture of staff in an operating theatre becomes feasible. Current commercial human motion capture technologies can acquire human movement with sub-centimetre accuracy; however, the complicated setup procedures, size, and embodiment of current systems make them cumbersome and unsuited for routine deployment within an operating theatre. Recent advances in pervasive sensing have resulted in camera systems that can detect and analyse human motion, and small wearable sensors that can measure a variety of parameters from the human body, such as heart rate, fatigue, balance, and motion.

    The work in this thesis investigates different methods that enable human motion to be more easily, reliably, and accurately captured through ambient and wearable sensor technologies, addressing some of the main challenges that have limited the use of motion capture technologies in certain areas of study. Sensor embodiment and the accuracy of activity recognition are among the challenges that affect the adoption of wearable devices for monitoring human activity. Using a single inertial sensor, which captures the movement of the subject, a variety of motion characteristics can be measured. For patients, wearable inertial sensors can be used in long-term activity monitoring to better understand the condition of the patient and potentially identify deviations from normal activity. For medical staff, inertial sensors can be used to capture the tasks being performed for automated workflow analysis, which is useful for staff training, optimisation of existing processes, and early indication of complications within clinical procedures. Feature extraction and classification methods introduced in this thesis demonstrate motion classification accuracies of over 90% for five different classes of walking motion using a single ear-worn sensor.
    To capture human body posture, current capture systems generally require a large number of sensors or reflective reference markers to be worn on the body, which presents a challenge for many applications, such as monitoring human motion in the operating theatre, as they may restrict natural movements and make setup complex and time consuming. To address this, a regression method is proposed to estimate motion using a smaller subset of wearable inertial sensors. This method is demonstrated using three sensors on the upper body and is shown to achieve mean estimation accuracies as low as 1.6 cm, 1.1 cm, and 1.4 cm for the hand, elbow, and shoulders, respectively, when compared with a gold-standard optical motion capture system. Using a subset of three sensors, mean errors for hand position reach 15.5 cm.

    Unlike human motion capture systems that rely on vision and reflective reference markers, commonly known as marker-based optical motion capture, wearable inertial sensors are prone to drift: an accumulation of measurement errors that becomes increasingly prevalent over time. Two methods are introduced in this thesis that aim to solve this challenge using visual rectification of the assumed state of the subject. Using a ceiling-mounted camera, a human detection and motion tracking method is introduced that improves mean tracking accuracy to within 5.8 cm in a 3 m × 5 m laboratory. To improve the accuracy of capturing body part positions and posture for human biomechanics, a camera is also utilised to track body part movements and provide visual rectification of human pose estimates from inertial sensing. For most subjects, deviations of less than 10% from the ground truth are achieved for hand positions, which exhibit the greatest error, and the occurrence of other common sources of visual and inertial estimation error, such as measurement noise, visual occlusion, and sensor calibration, is shown to be reduced.
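
    The sparse-sensor posture estimation described above is, at its core, a regression from a few IMU-derived features to joint positions, trained against an optical gold standard. A hedged sketch of that idea using ridge regression on synthetic data (the feature and output dimensions are illustrative, not the thesis's exact setup):

```python
# Illustrative sketch (not the thesis's exact regressor): map a reduced set of
# wearable IMU orientation features to upper-body joint positions with ridge
# regression; training pairs would come from a gold-standard optical system.
import numpy as np

def fit_ridge(X, Y, lam=1e-2):
    """X: (n_samples, n_imu_features), Y: (n_samples, n_joint_coords)."""
    d = X.shape[1]
    # Closed-form ridge solution: (X^T X + lam I)^-1 X^T Y
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ Y)

rng = np.random.default_rng(3)
X = rng.normal(size=(500, 9))                  # 3 IMUs x 3 orientation features (toy)
true_W = rng.normal(size=(9, 9))               # hand/elbow/shoulder xyz = 9 outputs (toy)
Y = X @ true_W + rng.normal(scale=0.01, size=(500, 9))
W = fit_ridge(X, Y)
rmse = np.sqrt(((X @ W - Y) ** 2).mean())
print(f"training RMSE: {rmse:.4f} (toy units)")
```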

    Assessing the Utility of a Video-Based Motion Capture Alternative in the Assessment of Lumbar Spine Planar Angular Joint Kinematics

    Get PDF
    Markerless motion capture is a novel technique to measure human movement kinematics. The purpose of this research is to evaluate the markerless algorithm DeepLabCut (DLC) against a 3D motion capture system (Vicon Motion Systems Ltd., Oxford, UK) in the analysis of planar spine and elbow flexion-extension movement. Data were acquired concurrently from DLC and Vicon for all movements. A novel DLC model was trained using data derived from a subset of participants (training group). Accuracy and precision were assessed from data derived from the training group as well as from a new set of participants (testing group). Two-way SPM ANOVAs were used to detect significant differences between the training vs. testing sets, the capture methods (Vicon vs. DLC), and potential higher-order interaction effects between these independent variables in the estimation of flexion-extension angles and variability. No significant differences were observed in any planar angles, nor were any higher-order interactions observed between each motion capture modality and the training vs. testing datasets. Bland-Altman plots were also generated to depict the mean bias and level of agreement between DLC and Vicon for both the training and testing datasets. Supplemental analyses suggest that these results are partially affected by the alignment of each participant’s body segments with respect to each planar reference frame. This research suggests that DLC-derived planar kinematics of both the elbow and lumbar spine are of acceptable accuracy and precision when compared to conventional laboratory gold standards (Vicon).
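
    A Bland-Altman comparison reduces to computing the mean bias of the paired differences and the 95% limits of agreement around it. A small sketch, with synthetic angles standing in for the paired DLC and Vicon measurements:

```python
# Sketch of the Bland-Altman quantities (mean bias and 95% limits of agreement)
# used to compare two measurement systems; arrays are placeholders for paired
# angle measurements from the markerless and marker-based systems.
import numpy as np

def bland_altman(a, b):
    """Return (bias, lower limit of agreement, upper limit of agreement) for paired measures."""
    diff = a - b
    bias = diff.mean()
    loa = 1.96 * diff.std(ddof=1)
    return bias, bias - loa, bias + loa

rng = np.random.default_rng(4)
vicon = rng.uniform(0, 60, size=200)               # flexion-extension angles (deg, synthetic)
dlc = vicon + rng.normal(0.5, 1.5, size=200)       # markerless estimate with a small bias
bias, lo, hi = bland_altman(dlc, vicon)
print(f"bias {bias:.2f} deg, 95% limits of agreement [{lo:.2f}, {hi:.2f}] deg")
```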

    Explaining the Ergonomic Assessment of Human Movement in Industrial Contexts

    Get PDF
    Manufacturing processes are based on human labour and the symbiosis between human operators and machines. The operators are required to follow predefined sequences of movements. The operations carried out at assembly lines are repetitive and have been identified as a risk factor for the onset of musculoskeletal disorders. Ergonomics plays a big role in preventing occupational diseases. Ergonomic risk scores measure the overall risk exposure of operators; however, these methods still present challenges: the scores are often associated with a given workstation, making them agnostic to the variability among operators; observation methods, which are most often employed, require a significant amount of effort, preventing an accurate and continuous ergonomic evaluation of the entire population of operators; and the risk results are rendered as index scores, hindering a more comprehensive interpretation by occupational physicians. This dissertation developed a solution for automatic assessment of operator risk exposure in assembly lines. Three main contributions are presented: (1) an upper-limb and torso motion tracking algorithm that relies on inertial sensors to estimate the orientation of anatomical joints; (2) an adjusted ergonomic risk score; and (3) an ergonomic risk explanation approach based on the analysis of the angular risk factors. Throughout the research, two experimental assessments were conducted: a laboratory validation and a field evaluation. The laboratory tests enabled the creation of a movement dataset and used an optical motion capture system as reference. The field evaluation dataset was acquired on an automotive assembly line and served as the basis for an ergonomic risk evaluation study. The experimental results revealed that the proposed solution has the potential to be applied in a real environment. Through direct measurement, ergonomic feedback is obtained faster and, consequently, the evaluation can be extended to more operators, ultimately preventing work-related injuries in the long term.
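
    The risk-explanation idea rests on mapping the tracked joint angles to risk levels. The sketch below shows one hypothetical angle-to-risk binning in the RULA style; the thresholds are illustrative and are not the adjusted score the dissertation proposes.

```python
# Hypothetical sketch of an angle-driven ergonomic risk score (illustrative
# RULA-style thresholds, not the dissertation's adjusted score): joint angles
# from the IMU-based tracker are binned into coarse risk levels.
import numpy as np

def shoulder_risk(elevation_deg):
    """Map a shoulder elevation angle to a coarse risk level (illustrative bins)."""
    bins = [(20, 1), (45, 2), (90, 3)]   # (upper angle limit in deg, risk score)
    for limit, score in bins:
        if elevation_deg <= limit:
            return score
    return 4                              # above 90 deg: highest risk bin

angles = np.array([12.0, 38.0, 75.0, 110.0])   # sampled shoulder elevations over a task
scores = [shoulder_risk(a) for a in angles]
print("per-sample risk:", scores, "| task-level risk:", max(scores))
```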

    Body sensor network for in-home personal healthcare

    Get PDF
    A body sensor network solution for personal healthcare in an indoor environment is developed. The system is capable of logging the physiological signals of human beings, tracking the orientation of the human body, and monitoring environmental attributes, which covers all the information necessary for personal healthcare in an indoor environment. The three major chapters of this dissertation each describe one of the work’s three subsystems: BioLogger, PAMS, and CosNet. Each chapter covers the background and motivation of the subsystem, the related theory, the hardware/software design, and the evaluation of the prototype’s performance.

    ThirdLight: low-cost and high-speed 3D interaction using photosensor markers

    No full text
    We present a low-cost 3D tracking system for virtual reality, gesture modeling, and robot manipulation applications which require fast and precise localization of headsets, data gloves, props, or controllers. Our system removes the need for cameras or projectors for sensing, and instead uses cheap LEDs and printed masks for illumination, together with low-cost photosensitive markers. The illumination device transmits a spatiotemporal pattern as a series of binary Gray-code patterns. Multiple illumination devices can be combined to localize each marker in 3D at high speed (333 Hz). Compared with conventional techniques, our method offers strengths in accuracy, speed, cost, performance under ambient light, working space (1 m-5 m), and robustness to noise. We compare with a state-of-the-art instrumented glove and with vision-based systems to demonstrate the accuracy, scalability, and robustness of our approach. We propose a fast and accurate method for hand gesture modeling using an inverse kinematics approach with the six photosensitive markers. We additionally propose a passive marker system and demonstrate various interaction scenarios as practical applications.
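
    Gray-code illumination lets each photosensitive marker decode its own position: over a burst of binary frames, the on/off sequence a sensor observes is a Gray-code word whose decoded integer is the marker's index within the projected pattern. A minimal sketch of that decoding step:

```python
# Minimal sketch of Gray-code position decoding as used by structured-light style
# illumination: each photosensor observes a sequence of binary on/off frames, and
# that sequence decodes to its position index within the projected pattern.
def gray_to_binary(gray_bits):
    """Decode a Gray-code bit sequence (MSB first) to an integer position index."""
    value = gray_bits[0]
    out = value
    for bit in gray_bits[1:]:
        value ^= bit              # binary bit = previous binary bit XOR Gray bit
        out = (out << 1) | value
    return out

# A marker that observed this 10-frame on/off pattern resolves to one of 2^10 positions.
observed = [1, 1, 0, 1, 0, 0, 1, 0, 1, 1]
print("decoded position index:", gray_to_binary(observed))
```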

    Context-aware home monitoring system for Parkinson's disease patients: ambient and wearable sensing for freezing of gait detection

    Get PDF
    (Thesis under joint supervision (cotutelle): Universitat Politècnica de Catalunya and Technische Universiteit Eindhoven. This PhD thesis has been developed in the framework of, and according to, the rules of the Erasmus Mundus Joint Doctorate on Interactive and Cognitive Environments EMJD ICE [FPA no. 2010-0012].)

    Freezing of gait (FOG) is a motor symptom of Parkinson’s disease (PD). It is characterized by brief episodes of inability to step, or by extremely short steps that typically occur on gait initiation or on turning while walking. The consequences of FOG are impaired mobility and a higher propensity for falls, which have a direct effect on the quality of life of the individual. No completely effective pharmacological treatment exists for the FOG phenomenon. However, external stimuli, such as lines on the floor or rhythmic sounds, can focus the attention of a person who experiences a FOG episode and help them initiate gait. The optimal effectiveness of this approach, known as cueing, is achieved through timely activation of a cueing device upon the accurate detection of a FOG episode. Therefore, robust and accurate FOG detection is the main problem that needs to be solved when developing a suitable assistive technology solution for this specific user group.

    This thesis proposes the use of the activity and spatial context of a person as the means to improve the detection of FOG episodes during monitoring at home. The thesis describes the design, algorithm implementation, and evaluation of a distributed home system for FOG detection based on multiple cameras and a single inertial gait sensor worn at the waist of the patient. Through detailed observation of home data collected from 17 PD patients, we realized that a novel solution for FOG detection could be achieved by using contextual information about the patient’s position, orientation, basic posture, and movement on a semantically annotated two-dimensional (2D) map of the indoor environment. We envisioned the future context-aware system as a network of Microsoft Kinect cameras placed in the patient’s home that interacts with a wearable inertial sensor on the patient (a smartphone). Since the hardware platform of the system is built from commercial off-the-shelf hardware, the majority of the system development effort involved the production of software modules (for position tracking, orientation tracking, and activity recognition) that run on top of the middleware operating system in the home gateway server.

    The main component of the system that had to be developed is the Kinect application for tracking the position and height of multiple people, based on input in the form of 3D point cloud data. Besides position tracking, this software module also provides mapping and semantic annotation of FOG-specific zones in the scene in front of the Kinect. One instance of the vision tracking application is supposed to run for every Kinect sensor in the system, yielding a potentially high number of simultaneous tracks. At any moment, the system has to track one specific person: the patient. To enable tracking of the patient between different non-overlapping cameras in the distributed system, a new re-identification approach based on appearance model learning with a one-class Support Vector Machine (SVM) was developed. Evaluation of the re-identification method was conducted on a 16-person dataset in a laboratory environment.
    Since the orientation of the patient in the indoor space was recognized as an important part of the context, the system needed the ability to estimate the orientation of the person, expressed in the frame of the 2D scene on which the patient is tracked by the camera. We devised a method to fuse position tracking information from the vision system with inertial data from the smartphone in order to obtain the patient’s 2D pose estimate on the scene map. Additionally, a method for estimating the position of the smartphone on the waist of the patient was proposed. Position and orientation estimation accuracy were evaluated on a 12-person dataset. Finally, with positional, orientation, and height information available, a new seven-class activity classification was realized using a hierarchical classifier that combines a height-based posture classifier with translational and rotational SVM movement classifiers. Each of the SVM movement classifiers and the joint hierarchical classifier were evaluated in a laboratory experiment with 8 healthy persons.

    The final context-based FOG detection algorithm uses activity information and spatial context information to confirm or disprove FOG detected by the current state-of-the-art FOG detection algorithm (which uses only wearable sensor data). A dataset with home data from 3 PD patients was produced using two Kinect cameras and a smartphone in synchronized recording. The new context-based FOG detection algorithm and the wearable-only FOG detection algorithm were both evaluated on the home dataset and their results were compared. The context-based algorithm very positively influences the reduction of false positive detections, which is expressed through higher specificity. In some cases, the context-based algorithm also eliminates true positive detections, reducing sensitivity to a lesser extent. The final comparison of the two algorithms on the basis of their sensitivity and specificity shows the improvement in overall FOG detection achieved with the new context-aware home system.
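
    The context-based confirmation step can be pictured as a gate on the wearable-only detector: a candidate FOG event survives only if the visually derived activity and map context make a freeze plausible. The rules below are an illustrative sketch, not the thesis's actual classifier logic.

```python
# Hedged sketch of context-based confirmation of FOG detections (illustrative
# rules, not the thesis's exact logic): a detection from the wearable-only
# algorithm is accepted only when the visual context makes a freeze plausible,
# trading a small loss in sensitivity for a gain in specificity.
def confirm_fog(wearable_fog, activity, in_fog_zone):
    """wearable_fog: bool from the inertial detector; activity: classified activity label;
    in_fog_zone: whether the patient is inside a semantically annotated trigger zone."""
    if not wearable_fog:
        return False
    if activity in ("sitting", "lying"):   # a freeze interrupts gait; it cannot occur at rest
        return False
    return in_fog_zone or activity == "standing"

print(confirm_fog(True, "standing", in_fog_zone=False))  # True: plausible freeze
print(confirm_fog(True, "sitting", in_fog_zone=True))    # False: rejected by context
```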