885 research outputs found

    3D Hand reconstruction from monocular camera with model-based priors

    As virtual and augmented reality (VR/AR) technology gains popularity, facilitating intuitive digital interactions in 3D is of crucial importance. Tools such as VR controllers exist, but such devices support only a limited range of interactions, mapped onto complex sequences of button presses that can be intimidating to learn. In contrast, users already have an instinctive understanding of manual interactions in the real world, which is readily transferable to the virtual world. This makes hands the ideal mode of interaction for downstream applications such as robotic teleoperation, sign-language translation, and computer-aided design. Existing hand-tracking systems come with several inconvenient limitations. Wearable solutions such as gloves and markers unnaturally limit the range of articulation. Multi-camera systems are not trivial to calibrate and have specialized hardware requirements which make them cumbersome to use. Given these drawbacks, recent research tends to focus on monocular inputs, as these do not constrain articulation and suitable devices are pervasive in everyday life. 3D reconstruction in this setting is severely under-constrained, however, due to occlusions and depth ambiguities. The majority of state-of-the-art works rely on a learning framework to resolve these ambiguities statistically; as a result they have several limitations in common. For example, they require a vast amount of annotated 3D data that is labor-intensive to obtain and prone to systematic error. Additionally, traits that are hard to quantify with annotations, such as the details of individual hand appearance, are difficult to reconstruct in such a framework. Existing methods also make the simplifying assumption that only a single hand is present in the scene. Two-hand interactions, however, introduce additional challenges in the form of inter-hand occlusion, left-right confusion, and collision constraints that single-hand methods cannot address.
To tackle the aforementioned shortcomings of previous methods, this thesis advances the state of the art through the novel use of model-based priors to incorporate hand-specific knowledge. In particular, this thesis presents a training method that reduces the amount of annotations required and is robust to systematic biases; it presents the first tracking method that addresses the challenging two-hand-interaction scenario using monocular RGB video, and also the first probabilistic method to model image ambiguity for two-hand interactions. Additionally, this thesis also contributes the first parametric hand texture model with example applications in hand personalization.

    An original framework for understanding human actions and body language by using deep neural networks

    The evolution of both fields of Computer Vision (CV) and Artificial Neural Networks (ANNs) has allowed the development of efficient automatic systems for the analysis of people's behaviour. By studying hand movements it is possible to recognize gestures, often used by people to communicate information in a non-verbal way. These gestures can also be used to control or interact with devices without physically touching them. In particular, sign language and semaphoric hand gestures are the two foremost areas of interest due to their importance in Human-Human Communication (HHC) and Human-Computer Interaction (HCI), respectively, while the processing of body movements plays a key role in the action recognition and affective computing fields. The former is essential to understand how people act in an environment, while the latter tries to interpret people's emotions based on their poses and movements; both are essential tasks in many computer vision applications, including event recognition and video surveillance. In this Ph.D. thesis, an original framework for understanding actions and body language is presented. The framework is composed of three main modules: in the first one, a method based on Long Short-Term Memory Recurrent Neural Networks (LSTM-RNNs) is proposed for the recognition of sign language and semaphoric hand gestures; the second module presents a solution based on 2D skeletons and two-branch stacked LSTM-RNNs for action recognition in video sequences; finally, in the last module, a solution for basic non-acted emotion recognition using 3D skeletons and Deep Neural Networks (DNNs) is provided. The performance of LSTM-RNNs is explored in depth, due to their ability to model the long-term contextual information of temporal sequences, which makes them suitable for analysing body movements. All the modules were tested on challenging datasets, well known in the state of the art, showing remarkable results compared to current literature methods.
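The LSTM recurrence that underpins all three modules can be illustrated with a single cell step. The gate equations below are the standard LSTM formulation; the dimensions, parameter layout, and toy inputs are illustrative assumptions, not the thesis' configuration.

```python
import numpy as np

def lstm_step(x, h, c, W, U, b):
    """One LSTM time step for input x and previous hidden/cell state (h, c).

    W, U, b hold the stacked parameters for the input, forget, and
    output gates plus the candidate state (4*H rows in total).
    """
    H = h.shape[0]
    z = W @ x + U @ h + b                   # stacked pre-activations
    i = 1 / (1 + np.exp(-z[0:H]))           # input gate
    f = 1 / (1 + np.exp(-z[H:2*H]))         # forget gate
    o = 1 / (1 + np.exp(-z[2*H:3*H]))       # output gate
    g = np.tanh(z[3*H:4*H])                 # candidate cell state
    c_new = f * c + i * g                   # update the cell memory
    h_new = o * np.tanh(c_new)              # emit the new hidden state
    return h_new, c_new

# Toy sequence of 4D "skeleton" feature vectors run through one cell.
rng = np.random.default_rng(0)
D, H = 4, 3
W = rng.normal(size=(4 * H, D))
U = rng.normal(size=(4 * H, H))
b = np.zeros(4 * H)
h, c = np.zeros(H), np.zeros(H)
for t in range(5):
    h, c = lstm_step(rng.normal(size=D), h, c, W, U, b)
```

The thesis' two-branch stacked networks combine several such cells; this sketch shows only the core recurrence that lets the models accumulate long-term temporal context.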

    Real-time 3D hand reconstruction in challenging scenes from a single color or depth camera

    Hands are one of the main enabling factors for performing complex tasks, and humans naturally use them for interactions with their environment. Reconstruction and digitization of 3D hand motion open up many possibilities for important applications. Hand gestures can be directly used for human–computer interaction, which is especially relevant for controlling augmented or virtual reality (AR/VR) devices, where immersion is of utmost importance. In addition, 3D hand motion capture is a precondition for automatic sign-language translation, activity recognition, or teaching robots. Different approaches for 3D hand motion capture have been actively researched in the past. While being accurate, gloves and markers are intrusive and uncomfortable to wear. Hence, markerless hand reconstruction based on cameras is desirable. Multi-camera setups provide rich input; however, they are hard to calibrate and lack the flexibility for mobile use cases. Thus, the majority of more recent methods use a single color or depth camera, which, however, makes the problem harder due to more ambiguities in the input. For interaction purposes, users need continuous control and immediate feedback. This means the algorithms have to run in real time and be robust in uncontrolled scenes. These requirements, achieving 3D hand reconstruction in real time from a single camera in general scenes, make the problem significantly more challenging. While recent research has shown promising results, current state-of-the-art methods still have strong limitations. Most approaches only track the motion of a single hand in isolation and do not take background clutter or interactions with arbitrary objects or the other hand into account. The few methods that can handle more general and natural scenarios run far from real time or use complex multi-camera setups. Such limitations make existing methods unusable for many of the aforementioned applications.
This thesis pushes the state of the art for real-time 3D hand tracking and reconstruction in general scenes from a single RGB or depth camera. The presented approaches explore novel combinations of generative hand models, which have been used successfully in the computer vision and graphics community for decades, and powerful cutting-edge machine learning techniques, which have recently emerged with the advent of deep learning. In particular, this thesis proposes a novel method for hand tracking in the presence of strong occlusions and clutter, the first method for full global 3D hand tracking from in-the-wild RGB video, and a method for simultaneous pose and dense shape reconstruction of two interacting hands that, for the first time, combines a set of desirable properties previously unseen in the literature.

    Analyzing fibrous tissue pattern in fibrous dysplasia bone images using deep R-CNN networks for segmentation

    Predictive health monitoring systems help to detect threats to human health at an early stage, and evolving deep learning techniques in medical image analysis provide efficient feedback in a short time. Fibrous dysplasia (FD) is a genetic disorder triggered by a mutation in the guanine nucleotide-binding protein with alpha-stimulatory activity during human bone genesis. It slowly occupies the bone marrow, converts bone cells into fibrous tissue, weakens the bone structure, and can lead to permanent disability. This paper studies techniques for analyzing FD bone images with deep networks; in addition, a linear regression model is fitted to predict bone abnormality levels from the observed coefficients. Modern image processing begins with various image filters, which describe the edges, shades, and texture values of the receptive field. Different types of segmentation and edge detection mechanisms are applied to locate the tumor, lesion, and fibrous tissues in the bone image, and the fibrous region is extracted using a region-based convolutional neural network (R-CNN) algorithm. The segmented results are compared on their accuracy metrics, with the segmentation loss reduced at each iteration. The overall loss is 0.24% and the overall accuracy is 99%; segmenting the masked region achieves 98% accuracy, and constructing the bounding boxes achieves 99% accuracy.
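The reported segmentation accuracies can be understood in terms of standard overlap metrics between a predicted mask and a ground-truth mask. The sketch below is a generic illustration of those metrics; the function name and toy masks are assumptions, not the paper's evaluation code.

```python
import numpy as np

def mask_metrics(pred, truth):
    """Pixel accuracy, IoU, and Dice for a pair of binary segmentation masks."""
    pred, truth = pred.astype(bool), truth.astype(bool)
    inter = np.logical_and(pred, truth).sum()
    union = np.logical_or(pred, truth).sum()
    total = pred.sum() + truth.sum()
    acc = (pred == truth).mean()                  # fraction of pixels labeled correctly
    iou = inter / union if union else 1.0         # intersection over union
    dice = 2 * inter / total if total else 1.0    # Dice overlap coefficient
    return acc, iou, dice

# Toy example: a 16-pixel square "fibrous region" and a slightly shifted prediction.
truth = np.zeros((8, 8), dtype=int); truth[2:6, 2:6] = 1
pred  = np.zeros((8, 8), dtype=int); pred[2:6, 3:7] = 1
acc, iou, dice = mask_metrics(pred, truth)
```

Accuracy alone can look flattering when the region of interest is small relative to the image, which is why overlap measures like IoU and Dice are usually reported alongside it.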

    Novel Approaches to the Representation and Analysis of 3D Segmented Anatomical Districts

    Nowadays, image processing and 3D shape analysis are an integral part of clinical practice and have the potential to support clinicians with advanced analysis and visualization techniques. Both approaches provide visual and quantitative information to medical practitioners, albeit from different points of view. Indeed, shape analysis is aimed at studying the morphology of anatomical structures, while image processing is focused more on the tissue or functional information provided by the pixel/voxel intensity levels. Despite the progress obtained by research in both fields, a junction between these two complementary worlds is missing. When working with 3D models and analyzing shape features, the information of the volume surrounding the structure is lost, since a segmentation process is needed to obtain the 3D shape model; however, the 3D nature of the anatomical structure is represented explicitly. With volume images, instead, the tissue information related to the imaged volume is the core of the analysis, while the shape and morphology of the structure are only implicitly represented and thus not directly accessible. The aim of this Thesis work is the integration of these two approaches in order to increase the amount of information available for physicians, allowing a more accurate analysis of each patient. An augmented visualization tool able to provide information on both the anatomical structure shape and the surrounding volume through a hybrid representation could reduce the gap between the two approaches and provide a more complete anatomical rendering of the subject. To this end, given a segmented anatomical district, we propose a novel mapping of volumetric data onto the segmented surface. The grey-levels of the image voxels are mapped through a volume-surface correspondence map, which defines a grey-level texture on the segmented surface.
The resulting texture mapping is coherent with the local morphology of the segmented anatomical structure and provides an enhanced visual representation of the anatomical district. The integration of volume-based and surface-based information in a unique 3D representation also supports the identification and characterization of morphological landmarks and pathology evaluations. The main research contributions of the Ph.D. activities and Thesis are:
• the development of a novel integration algorithm that combines surface-based (segmented 3D anatomical structure meshes) and volume-based (MRI volumes) information; the integration supports different criteria for mapping the grey-levels onto the segmented surface;
• the development of methodological approaches for using the grey-level mapping together with morphological analysis, with the final goal of solving problems in real clinical tasks, such as the identification of (patient-specific) ligament insertion sites on bones from segmented MR images, the characterization of the local morphology of bones/tissues, and the early diagnosis, classification, and monitoring of musculoskeletal pathologies;
• the analysis of segmentation procedures, with a focus on the tissue classification process, in order to reduce operator dependency and to overcome the absence of a true gold standard for the evaluation of automatic segmentations;
• the evaluation and comparison of (unsupervised) segmentation methods, aimed at defining a novel segmentation method for low-field MR images and at the local correction/improvement of a given segmentation.
The proposed method is simple but effectively integrates information derived from medical image analysis and 3D shape analysis. Moreover, the algorithm is general enough to be applied to different anatomical districts independently of the segmentation method, imaging technique (such as CT), or image resolution.
The volume information can be easily integrated into different shape analysis applications, taking into consideration not only the morphology of the input shape but also the real context in which it is inserted, in order to solve clinical tasks. The results obtained by this combined analysis have been evaluated through statistical analysis.
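At its simplest, the proposed volume-to-surface mapping amounts to sampling the voxel grid at each mesh vertex and using the sampled grey-levels as a per-vertex texture. The thesis supports several mapping criteria; the nearest-voxel variant below and all names in it are illustrative assumptions.

```python
import numpy as np

def sample_greylevels(vertices, volume, spacing=(1.0, 1.0, 1.0)):
    """Map volume grey-levels onto surface vertices.

    vertices: (N, 3) coordinates in the volume's physical frame.
    volume:   3D array of voxel intensities.
    Nearest-voxel sampling; the per-vertex values define a grey-level
    texture over the segmented surface.
    """
    idx = np.round(vertices / np.asarray(spacing)).astype(int)
    idx = np.clip(idx, 0, np.array(volume.shape) - 1)  # stay inside the grid
    return volume[idx[:, 0], idx[:, 1], idx[:, 2]]

# Toy volume with a known intensity pattern and three "surface" points.
vol = np.arange(4 * 4 * 4).reshape(4, 4, 4).astype(float)
verts = np.array([[0.2, 0.0, 0.0],    # near voxel (0, 0, 0)
                  [1.1, 2.0, 3.0],    # near voxel (1, 2, 3)
                  [3.9, 3.9, 3.9]])   # outside; clipped to (3, 3, 3)
texture = sample_greylevels(verts, vol)
```

In practice the correspondence map would account for voxel anisotropy and could average over a small neighbourhood along the surface normal instead of taking a single nearest voxel.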

    Towards electrodeless EMG linear envelope signal recording for myo-activated prostheses control

    After amputation, the residual muscles of the limb may continue to function normally, enabling the electromyogram (EMG) signals recorded from them to be used to drive a replacement limb. These replacement limbs are called myoelectric prostheses, and prostheses driven by EMG have long been the first choice for both clinicians and engineers. Unfortunately, due to the many drawbacks of EMG (e.g. skin preparation, electromagnetic interference, high sample rate, etc.), researchers have sought suitable alternatives. This work proposes a dry-contact, low-cost sensor based on a force-sensitive resistor (FSR) as a valid alternative which, instead of detecting electrical events, detects the mechanical events of the muscle. The FSR is placed on the skin through a hard, circular base to sense the muscle contraction and acquire the signal. In addition, signal conditioning (a voltage output proportional to force) is implemented to reduce the output resistance drift caused by FSR edge effects (creep) and to maintain the FSR sensitivity over a wide input force range. The signal acquired with the FSR can be used directly to replace the EMG linear envelope (an important control signal in prosthetics applications). To find the best FSR position(s) to replace a single EMG lead, EMG and FSR outputs are recorded simultaneously: three FSRs are placed directly over the EMG electrodes in the middle of the targeted muscle, and the individual sensors (FSR1, FSR2, and FSR3) as well as combinations of sensors (e.g. FSR1+FSR2, FSR2-FSR3) are evaluated. The experiment is performed on a small sample of five volunteer subjects. The results show a high correlation (up to 0.94) between the FSR output and the EMG linear envelope. Consequently, using the best FSR sensor position demonstrates the ability of the electrodeless FSR linear envelope (FSR-LE) to proportionally control a prosthesis (a 3-D claw).
Furthermore, the FSR can be used to develop a universal programmable muscle-signal sensor suitable for controlling myo-activated prostheses.
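The EMG linear envelope against which the FSR output is correlated is conventionally computed by full-wave rectification followed by low-pass filtering. The sketch below uses a simple moving-average filter and a synthetic amplitude-modulated noise burst as a stand-in for a contraction; the window length and signal model are assumptions, not the study's processing pipeline.

```python
import numpy as np

def linear_envelope(emg, win=50):
    """EMG linear envelope: remove the DC offset, full-wave rectify,
    then smooth with a `win`-sample moving-average low-pass filter."""
    rectified = np.abs(emg - emg.mean())
    kernel = np.ones(win) / win
    return np.convolve(rectified, kernel, mode="same")

def pearson_r(a, b):
    """Pearson correlation, as used to compare FSR output and EMG envelope."""
    a, b = a - a.mean(), b - b.mean()
    return float(a @ b / np.sqrt((a @ a) * (b @ b)))

# Synthetic contraction: a bell-shaped activation modulating white noise.
rng = np.random.default_rng(1)
t = np.linspace(0.0, 1.0, 1000)
activation = np.exp(-((t - 0.5) ** 2) / 0.02)   # "muscle activation" profile
emg = activation * rng.normal(size=t.size)      # raw EMG-like signal
env = linear_envelope(emg)
r = pearson_r(env, activation)                  # envelope tracks the activation
```

The same correlation computation applied to a recorded FSR output in place of `activation` is the kind of comparison behind the reported value of up to 0.94.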

    Model-Based High-Dimensional Pose Estimation with Application to Hand Tracking

    This thesis presents novel techniques for computer vision based full-DOF human hand motion estimation. Our main contributions are: a robust skin color estimation approach; a novel resolution-independent and memory-efficient representation of hand pose silhouettes, which allows us to compute area-based similarity measures in near-constant time; a set of new segmentation-based similarity measures; a new class of similarity measures that work for nearly arbitrary input modalities; a novel edge-based similarity measure that avoids any problematic thresholding or discretizations and can be computed very efficiently in Fourier space; a template hierarchy to minimize the number of similarity computations needed for finding the most likely hand pose observed; and finally, a novel image-space search method, which we naturally combine with our hierarchy. Consequently, matching can efficiently be formulated as a simultaneous template tree traversal and function maximization.
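A similarity measure that can be computed very efficiently in Fourier space typically exploits the convolution theorem: a template is correlated against an image at every offset at once via the FFT. The sketch below shows this generic idea with plain intensity patches; it is not the thesis' specific edge-based measure, and the names are illustrative.

```python
import numpy as np

def fft_xcorr_peak(template, image):
    """Score a template against an image by circular cross-correlation,
    evaluated for all shifts simultaneously in Fourier space."""
    F = np.fft.fft2(image)
    T = np.fft.fft2(template, s=image.shape)       # zero-pad to image size
    corr = np.fft.ifft2(F * np.conj(T)).real       # correlation surface
    peak = np.unravel_index(np.argmax(corr), corr.shape)
    return corr, peak

# Plant a small bright patch and recover its location from the peak.
img = np.zeros((32, 32))
img[10:14, 20:24] = 1.0
tmpl = np.ones((4, 4))
corr, peak = fft_xcorr_peak(tmpl, img)             # peak at the patch origin
```

The appeal is the cost: evaluating the template at every offset costs one forward and one inverse FFT, rather than one explicit overlap computation per candidate position, which is what makes exhaustive scoring inside a template hierarchy tractable.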

    Synthetic Data Generation for Deep Learning-based Semantic Segmentation

    The semantic segmentation of a scene is one of the basic components of a robotic perception system and a step towards the total understanding of that scene. Currently, systems based on deep learning, specifically convolutional networks, dominate the state of the art with highly accurate results. However, these systems rely on datasets of unprecedented scale and variability in order to properly generalize to the potentially infinite number of situations in which they can be deployed. Current datasets often have problems in achieving this scale and variability, as they rely on human operators both for the capture of the data itself and for its labelling, which is essential for these supervised learning techniques. The high cost in time and resources of this task makes it difficult to obtain large-scale and highly representative datasets for specific situations. In this work we propose exploring photorealistic synthetic data as a source for training new systems, for improving the generalization capacity of systems already trained with real data, or for facilitating training when only a small amount of real data is available. To do this we resort to Unreal Engine 4 to create UnrealROX, with the objective of generating an extremely photorealistic dataset. We implement a series of tools to generate this data by creating a simulator capable of doing this work.

    Artificial intelligence in musculoskeletal ultrasound imaging

    Ultrasonography (US) is noninvasive and offers real-time, low-cost, and portable imaging that facilitates the rapid and dynamic assessment of musculoskeletal components. Significant technological improvements have contributed to the increasing adoption of US for musculoskeletal assessments, as artificial intelligence (AI)-based computer-aided detection and computer-aided diagnosis are being utilized to improve the quality, efficiency, and cost of US imaging. This review provides an overview of classical machine learning techniques and modern deep learning approaches for musculoskeletal US, with a focus on the key categories of detection and diagnosis of musculoskeletal disorders, predictive analysis with classification and regression, and automated image segmentation. Moreover, we outline challenges and a range of opportunities for AI in musculoskeletal US practice.