
    On the 3D point cloud for human-pose estimation

    This thesis investigates methods for estimating a human pose from a 3D point cloud captured by a static depth sensor. Human-pose estimation (HPE) is important for a range of applications, such as human-robot interaction, healthcare, and surveillance. Yet HPE is challenging because of uncertainty in sensor measurements and the complexity of human poses. This research addresses challenges in two crucial components of the estimation process: human-pose feature extraction and human-pose modeling. In feature extraction, the main challenge is reducing feature ambiguity. We propose a 3D-point-cloud feature called the viewpoint and shape feature histogram (VISH), which reduces feature ambiguity by capturing geometric properties of the 3D point cloud of a human. The VISH extraction process consists of three steps: 3D-point-cloud pre-processing, hierarchical structuring, and feature extraction. In the pre-processing step, the 3D points corresponding to a human are extracted and outliers from the environment are removed. This step is important because keeping only the points that belong to the human body reduces the number of 3D points passed on for further processing. In the hierarchical-structuring step, the pre-processed 3D point cloud is partitioned and replicated into the nodes of a tree structure. Viewpoint feature histogram (VFH) and shape features are then extracted from each node to form a descriptor for that node. Because the features are histogram-based, they highlight coarse-level details in large regions and fine-level details in small regions. The features from the tree therefore capture information from coarse to fine levels, which reduces feature ambiguity.
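The coarse-to-fine idea behind the tree of histograms can be illustrated with a minimal sketch. It is not VISH itself: it assumes a median split along the axis of largest extent to build a binary tree, and it uses a normalised histogram of point distances from each node's centroid as a stand-in for the VFH and shape features. The names `build_tree` and `node_histogram` and the parameters `depth` and `bins` are hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float, float]


def node_histogram(points: List[Point], bins: int = 4) -> List[float]:
    """Normalised histogram of distances from the node centroid.

    A simple stand-in for the VFH/shape descriptors (assumption, not VISH).
    Expects a non-empty list of points.
    """
    n = len(points)
    centroid = (
        sum(p[0] for p in points) / n,
        sum(p[1] for p in points) / n,
        sum(p[2] for p in points) / n,
    )
    dists = [math.dist(p, centroid) for p in points]
    max_d = max(dists) or 1.0  # avoid division by zero if all points coincide
    hist = [0.0] * bins
    for d in dists:
        idx = min(int(d / max_d * bins), bins - 1)
        hist[idx] += 1.0 / n
    return hist


def build_tree(points: List[Point], depth: int, bins: int = 4) -> List[List[float]]:
    """Return one descriptor per tree node in pre-order (root first).

    Points are replicated down the tree: the root sees the whole cloud and
    each child sees only its half, so large nodes describe coarse structure
    and small nodes describe fine structure.
    """
    descriptors = [node_histogram(points, bins)]
    if depth == 0 or len(points) < 2:
        return descriptors
    # Assumed partition rule: median split along the axis of largest extent.
    axis = max(range(3), key=lambda a: max(p[a] for p in points) - min(p[a] for p in points))
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2
    descriptors += build_tree(ordered[:mid], depth - 1, bins)
    descriptors += build_tree(ordered[mid:], depth - 1, bins)
    return descriptors
```

Concatenating the per-node histograms, e.g. `[v for h in build_tree(cloud, 2) for v in h]`, gives one feature vector that mixes coarse and fine information.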
In human-pose modeling, the main challenges are reducing the dimensionality of the human-pose space and designing appropriate factors to represent the underlying probability distributions for estimating human poses. To reduce the dimensionality, we propose a non-parametric action-mixture model (AMM), which represents the high-dimensional human-pose space with low-dimensional manifolds when searching for human poses. In each manifold, a probability distribution is estimated based on feature similarity. The distributions in the manifolds are then redistributed according to the stationary distribution of a Markov chain that models the frequency of human actions. After the redistribution, the manifolds are combined according to a probability distribution determined by action classification. In experiments using VISH features as input to the AMM, the overall error and standard deviation of the AMM were reduced by about 7.9% and 7.1%, respectively, compared with a model without action classification. To design appropriate factors, we treat the AMM as a Bayesian network and propose a mapping that converts it into a neural network called NN-AMM. The mapping consists of two steps: structure identification and parameter learning. For structure identification, we develop a bottom-up approach that builds a neural network while preserving the Bayesian-network structure. For parameter learning, we develop a part-based approach that learns synaptic weights by decomposing the neural network into parts. Based on the concept of distributed representation, the NN-AMM is further modified into a scalable neural network called NND-AMM. A neural-network-based system is then built using VISH features to represent the 3D-point-cloud input and the NND-AMM to estimate 3D human poses. The results showed that the proposed mapping can be used to design AMM factors automatically.
The NND-AMM provides more accurate human-pose estimates than both the AMM and the NN-AMM while using fewer hidden neurons. Both the NN-AMM and the NND-AMM can adapt to different types of input, showing the advantage of using neural networks to design factors.
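The AMM's combination step can be sketched under loudly stated assumptions: each action has a manifold holding a discrete distribution over candidate poses; "redistribution" is modelled as multiplying each manifold's distribution by the action's stationary probability under the action Markov chain; and the classifier's action probabilities then weight the manifolds before a final normalisation. The transition matrix, pose labels, and function names are hypothetical, and the thesis's exact redistribution rule may differ.

```python
from typing import Dict, List


def stationary_distribution(transition: List[List[float]], iters: int = 1000) -> List[float]:
    """Stationary distribution of a row-stochastic matrix by power iteration."""
    n = len(transition)
    pi = [1.0 / n] * n
    for _ in range(iters):
        # pi_{t+1}[j] = sum_i pi_t[i] * P[i][j]
        pi = [sum(pi[i] * transition[i][j] for i in range(n)) for j in range(n)]
    return pi


def combine_manifolds(
    manifold_dists: List[Dict[str, float]],
    action_probs: List[float],
    transition: List[List[float]],
) -> Dict[str, float]:
    """Mix per-action pose distributions (assumed weighting: P(a | x) * pi_a)."""
    pi = stationary_distribution(transition)
    combined: Dict[str, float] = {}
    for a, dist in enumerate(manifold_dists):
        weight = action_probs[a] * pi[a]  # redistribution, then classifier weighting
        for pose, p in dist.items():
            combined[pose] = combined.get(pose, 0.0) + weight * p
    total = sum(combined.values())
    return {pose: p / total for pose, p in combined.items()}
```

For the two-action chain `[[0.9, 0.1], [0.5, 0.5]]` the stationary distribution is (5/6, 1/6), so with an undecided classifier (0.5, 0.5) the first action's poses dominate. Power iteration converges here because the example chain is irreducible and aperiodic; for other chains, solving πP = π directly may be preferable.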

    Computational intelligence approaches to robotics, automation, and control [Volume guest editors]

    No abstract available

    Intelligent Sensors for Human Motion Analysis

    The book "Intelligent Sensors for Human Motion Analysis" contains 17 articles published in a Special Issue of the journal Sensors. These articles address many aspects of human-movement analysis. New techniques and methods for pose estimation, gait recognition, and fall detection are proposed and verified. Some of them will trigger further research, and some may become the backbone of commercial systems.

    Multi-particle reconstruction with dynamic graph neural networks

    The task of identifying the incident particles from the sensor deposits they leave in particle detectors is called event or particle reconstruction. The sensor deposits can be represented generically as a point cloud, in which each point carries the three spatial coordinates of the sensor location, the deposited energy, and occasionally the time of the deposit. As particle detectors become increasingly complex, ever more sophisticated methods are needed to perform particle reconstruction. An example is the ongoing High-Luminosity upgrade of the Large Hadron Collider (HL-LHC). The HL-LHC is the most significant milestone in experimental particle physics and aims to deliver an order of magnitude more data than the current LHC. As part of the upgrade, the endcap calorimeters of the Compact Muon Solenoid (CMS) experiment, one of the two largest general-purpose detectors at the LHC, will be replaced by the radiation-hard High Granularity Calorimeter (HGCAL). The HGCAL will contain about 6 million sensors to achieve the spatial resolution required for reconstructing individual particles under HL-LHC conditions. It has an irregular geometry due to its hexagonal sensors, whose sizes vary along the longitudinal and transverse axes. Further, it generates sparse data, as fewer than 10% of the sensors register a positive energy. Reconstruction in this environment, where particles leave highly irregular patterns of hits, is an unprecedentedly challenging and compute-intensive pattern-recognition problem. This motivates the use of parallelisation-friendly deep learning approaches. More traditional deep learning methods, however, are not feasible for the HGCAL because they assume a regular grid-like structure. In this thesis, a reconstruction algorithm based on a dynamic graph neural network called GravNet is presented.
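The neighbourhood aggregation at the core of a GravNet-style layer can be sketched in plain Python. In this sketch the learned projections are replaced by given inputs: each hit already has a low-dimensional coordinate vector (learned in GravNet) and a feature vector. Neighbours are the `k` nearest hits in that coordinate space, their features are scaled by a distance-dependent potential, assumed here to be the Gaussian exp(-d²), and aggregated with mean and max. The function name and `k` are hypothetical, and the final learned output layer is omitted.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def gravnet_aggregate(coords: List[Vector], feats: List[Vector], k: int) -> List[List[float]]:
    """For each hit, return its features concatenated with the mean- and
    max-aggregated, distance-weighted features of its k nearest neighbours
    in the (normally learned) coordinate space."""
    n = len(coords)
    out: List[List[float]] = []
    for i in range(n):
        # k nearest neighbours in coordinate space, excluding the hit itself.
        others = sorted((math.dist(coords[i], coords[j]), j) for j in range(n) if j != i)
        neighbours = others[:k]
        dim = len(feats[i])
        if not neighbours:  # isolated hit: nothing to aggregate
            out.append(list(feats[i]) + [0.0] * (2 * dim))
            continue
        # Assumed potential: Gaussian in the coordinate-space distance.
        weighted = [[math.exp(-d * d) * x for x in feats[j]] for d, j in neighbours]
        mean_agg = [sum(w[c] for w in weighted) / len(weighted) for c in range(dim)]
        max_agg = [max(w[c] for w in weighted) for c in range(dim)]
        out.append(list(feats[i]) + mean_agg + max_agg)
    return out
```

Because the coordinates are themselves outputs of a learned layer in GravNet, the neighbourhood graph is rebuilt from the data rather than fixed by the detector geometry, which is what lets the approach cope with the HGCAL's irregular sensor layout.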
The network is paired with a segmentation technique, Object Condensation, which first performs point-cloud segmentation on the detector hits. The property-prediction capability of Object Condensation is then used for energy regression of the reconstructed particles. A range of experiments shows that the method works well under the conditions expected at the HGCAL, i.e., with 200 simultaneous proton-proton collisions. Parallel algorithms based on Nvidia CUDA are also presented to address the computational challenges of the graph neural network. With these optimisations, the method performs reconstruction in approximately 2 seconds, which is acceptable given the computational constraints at the LHC. The presented method is the first example of deep-learning-based end-to-end calorimetric reconstruction in high-occupancy environments. This sets the stage for the next era of particle reconstruction, which is expected to be end-to-end. While this thesis focuses on the HGCAL, the method is general and can be extended not only to other calorimeters but also to other tasks, such as track reconstruction.
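At inference time, Object Condensation turns per-hit predictions into clusters. A minimal sketch of the commonly described procedure, assuming each hit has a predicted condensation score β and a clustering-space coordinate: the unassigned hit with the highest β above a threshold `t_beta` becomes a condensation point, every unassigned hit within distance `t_d` of it joins its cluster, and the process repeats. The threshold values and the function name are hypothetical, and the thesis may use a different variant.

```python
import math
from typing import List, Sequence


def condense(beta: Sequence[float], coords: Sequence[Sequence[float]],
             t_beta: float = 0.5, t_d: float = 1.0) -> List[int]:
    """Return a cluster label per hit (-1 = not assigned to any particle)."""
    labels = [-1] * len(beta)
    # Visit candidate condensation points in order of decreasing beta.
    order = sorted(range(len(beta)), key=lambda i: beta[i], reverse=True)
    next_label = 0
    for alpha in order:
        if beta[alpha] < t_beta:
            break  # all remaining hits have even lower beta
        if labels[alpha] != -1:
            continue  # already absorbed by a stronger condensation point
        for i in range(len(beta)):
            if labels[i] == -1 and math.dist(coords[i], coords[alpha]) < t_d:
                labels[i] = next_label  # includes alpha itself (distance 0)
        next_label += 1
    return labels
```

Properties such as the particle energy can then be read from the network's predictions at each condensation point, which is how the property-prediction capability mentioned above attaches a regressed energy to every reconstructed particle.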

    Tracking Extended Objects in Noisy Point Clouds with Application in Telepresence Systems

    We discuss the theory and application of extended object tracking. This task is challenging because sensor noise prevents correct association of the measurements with their sources on the object, the shape itself might be unknown a priori, and, due to occlusion, only parts of the object are visible at a given time. We propose an approach to track the parameters of arbitrary objects that provides new solutions to these challenges and marks a significant advance in the state of the art.
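The association problem can be made concrete with a deliberately simple case. For a circular extended object, a set of boundary measurements can be fitted without deciding which measurement came from which part of the object, and a partial arc (as under occlusion) is enough. The sketch below uses the algebraic (Kåsa) least-squares circle fit, a standard technique swapped in purely for illustration; it is not the approach proposed in this work, which targets arbitrary and possibly unknown shapes. Function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def solve3(m: List[List[float]], v: List[float]) -> List[float]:
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    a = [row[:] + [v[i]] for i, row in enumerate(m)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, 3):
            f = a[r][col] / a[col][col]
            for c in range(col, 4):
                a[r][c] -= f * a[col][c]
    x = [0.0] * 3
    for r in range(2, -1, -1):
        x[r] = (a[r][3] - sum(a[r][c] * x[c] for c in range(r + 1, 3))) / a[r][r]
    return x


def fit_circle(points: Sequence[Tuple[float, float]]) -> Tuple[float, float, float]:
    """Kasa fit: returns (centre_x, centre_y, radius).

    Solves x^2 + y^2 = D*x + E*y + F in the least-squares sense via the
    normal equations; no measurement-to-boundary association is needed.
    """
    ata = [[0.0] * 3 for _ in range(3)]
    atz = [0.0] * 3
    for x, y in points:
        row = (x, y, 1.0)
        z = x * x + y * y
        for i in range(3):
            atz[i] += row[i] * z
            for j in range(3):
                ata[i][j] += row[i] * row[j]
    d, e, f = solve3(ata, atz)
    cx, cy = d / 2.0, e / 2.0  # D = 2*cx, E = 2*cy
    return cx, cy, math.sqrt(f + cx * cx + cy * cy)  # F = r^2 - cx^2 - cy^2
```

A tracker would filter such per-frame shape estimates over time; the fixed circular shape assumed here is exactly the kind of prior knowledge that tracking arbitrary objects cannot rely on.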

    From motion capture to interactive virtual worlds: towards unconstrained motion-capture algorithms for real-time performance-driven character animation

    This dissertation takes performance-driven character animation as a representative application and advances motion-capture algorithms and animation methods to meet its high demands. Existing approaches either have coarse resolution and a restricted capture volume, require expensive and complex multi-camera systems, or use intrusive suits and controllers. For motion capture, set-up time is reduced by using fewer cameras, accuracy is increased despite occlusions and general environments, initialization is automated, and free roaming is enabled by egocentric cameras. For animation, increased robustness enables the use of low-cost sensor input, the definition of custom control gestures is guided to support novice users, and animation expressiveness is increased. The main contributions are: 1) an analytic and differentiable visibility model for pose optimization under strong occlusions, 2) a volumetric contour model for automatic actor initialization in general scenes, 3) a method to annotate and augment image-pose databases automatically, 4) the use of unlabeled examples for character control, and 5) the generalization and disambiguation of cyclical gestures for faithful character animation. In summary, the whole process of human motion capture, processing, and application to animation is advanced. These advances in the state of the art have the potential to improve many interactive applications, within and outside virtual reality.

    Data mining and modelling for sign language

    Sign languages have received significantly less attention than spoken languages in research areas such as corpus analysis, machine translation, recognition, synthesis, and social signal processing. This is mainly because signers are a clear minority and because of a strong prior belief that sign languages are simply arbitrary gestures. To date, this manifests in insufficient sign language resources for computational modelling and analysis, with no agreed standards and relatively stagnant progress compared with spoken language interaction research. Fortunately, the machine learning community has developed methods, such as transfer learning, for dealing with sparse resources, while data mining techniques, such as clustering, can provide insights into the data. The work described here uses such transfer learning techniques to apply neural language models to signed utterances and to compare sign language phonemes, which allows similar signs to be clustered, leading to automated annotation of sign language resources. This thesis promotes the idea that sign language research in computing should rely less on hand-annotated data, thus opening up the prospect of using readily available online data (e.g., signed song videos) through the computational modelling and automated annotation techniques it presents.
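The step from comparing signs to clustering them can be illustrated with a minimal sketch. It assumes each sign has already been mapped to a fixed-length, non-zero embedding vector (for example by a transferred neural language model; the vectors in the usage example are made up), links any two signs whose cosine similarity reaches a threshold, and returns the connected groups (single-linkage clustering via union-find). The threshold, embeddings, and function names are hypothetical and are not the method of the thesis.

```python
import math
from typing import Dict, List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))


def cluster_signs(embeddings: Dict[str, Sequence[float]], threshold: float = 0.9) -> List[List[str]]:
    """Single-linkage clustering: link signs with cosine similarity >= threshold."""
    names = list(embeddings)
    parent = {n: n for n in names}

    def find(n: str) -> str:
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving keeps trees shallow
            n = parent[n]
        return n

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if cosine(embeddings[a], embeddings[b]) >= threshold:
                parent[find(a)] = find(b)  # merge the two clusters
    groups: Dict[str, List[str]] = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    return sorted(groups.values())
```

With made-up embeddings such as `{"HELLO": [1.0, 0.1, 0.0], "HI": [0.9, 0.15, 0.0], "THANKS": [0.0, 1.0, 0.2], "THANK-YOU": [0.05, 0.95, 0.25]}`, the two greetings and the two thanking signs form separate clusters, and each cluster could then serve as a candidate label group for automated annotation.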