10 research outputs found

    A Hybrid Method for 3D Visual Tracking of Personalized Human Body Models

    No full text
    We propose a new hybrid method for 3D human body pose estimation based on RGBD data. We treat this as an optimization problem that is solved using a stochastic optimization technique. The solution to the optimization problem is the set of pose parameters of a human model that register the model to the available observations. Our method can make use of any skinned, articulated human body model. However, we focus on personalized models that can be acquired easily and automatically based on existing human scanning and mesh rigging techniques. 
Observations consist of the 3D structure of the human (measured by the RGBD camera) and the body joint locations (computed by a discriminative, CNN-based component). A series of quantitative and qualitative experiments demonstrates the accuracy and the benefits of the proposed approach. In particular, we show that the proposed approach achieves state-of-the-art results compared to competitive methods and that the use of personalized body models significantly improves the accuracy of 3D human pose estimation.
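The core fitting loop described above (search for pose parameters that register the model to the observations) can be sketched with a toy stochastic optimizer. Everything here is an illustrative assumption: a two-joint kinematic "arm" stands in for the full skinned body model, and a simple greedy random search stands in for the stochastic optimization technique actually used.

```python
import math
import random

def forward_kinematics(pose):
    """Toy forward model mapping pose parameters (two joint angles)
    to 3D joint positions; a real body model has dozens of parameters."""
    a1, a2 = pose
    j1 = (math.cos(a1), math.sin(a1), 0.0)
    j2 = (j1[0] + math.cos(a1 + a2), j1[1] + math.sin(a1 + a2), 0.0)
    return [j1, j2]

def objective(pose, observed):
    # Sum of squared distances between model joints and observed joints.
    return sum((m - o) ** 2
               for mj, oj in zip(forward_kinematics(pose), observed)
               for m, o in zip(mj, oj))

def stochastic_fit(observed, iters=20000, sigma=0.1, seed=0):
    """Greedy random search: perturb the best pose found so far and
    keep the perturbation whenever it lowers the objective."""
    rng = random.Random(seed)
    best = [rng.uniform(-math.pi, math.pi) for _ in range(2)]
    best_err = objective(best, observed)
    for _ in range(iters):
        cand = [p + rng.gauss(0.0, sigma) for p in best]
        err = objective(cand, observed)
        if err < best_err:
            best, best_err = cand, err
    return best, best_err

# Observations generated from a known ground-truth pose.
truth = [0.5, -0.3]
obs = forward_kinematics(truth)
pose, err = stochastic_fit(obs)
```

In the actual method the objective would instead compare the posed body model against the RGBD point cloud and the CNN-detected joint positions.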

    Holistic 3D Human Pose Estimation with MocapNETs

    No full text
    The goal of the presented thesis was to investigate and develop a novel, fast, portable, robust and accurate plug-and-play 3D Human Capture module that receives RGB images captured in-the-wild and regresses the 3D body configuration of any depicted person in the scene. The proposed architecture was built from scratch using first principles and taking advantage of recent advancements in Neural Networks, taking its final form as an ensemble of neural networks. We identified and bridged gaps between state-of-the-art deep learning methods and well-established model-based vision methodologies predating CNNs. Its name, “MocapNET”, was coined to concisely describe it, as it became the first neural network-based method in the literature to directly regress Motion Capture (Mocap) output in an end-to-end fashion. To improve accuracy and address personalization aspects, a novel real-time generative optimization algorithm was also developed, named “Hierarchical Coordinate Descent”, tailored to the conditionally independent encoders of the MocapNET ensemble and complementing their output. The ambition and scope of the retrieved 3D output gradually broadened as the method successfully generalized to more articulated structures during the course of its development. The total 3D capture solution presented includes upper body, lower body, hands, face and gaze. With the term 3D Human Capture we refer not only to positions in 3D space but to the full kinematic solution of the skeleton. The method performs in real-time and its output is natively compatible with 3D editing software due to its BVH container. This makes it globally unique and among the select few methods that can successfully tackle all these sub-problems, which traditionally were sub-fields of the broader computer vision research. 
The 3D human pose estimation solution developed can be used in devices such as mobile phones, AR/VR headsets, self-driving cars, smart devices, home and factory robots, etc., endowing them with the capability to perceive, compare and enumerate human body poses, which would ultimately facilitate the understanding of human behavior. The thesis attempts to carefully document all aspects of the method, including 2D shape descriptors, neural network design, PCA compression to allow usage on mobile devices, and the various attempts that shaped the method into its final version.
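The "Hierarchical Coordinate Descent" refinement mentioned above can be illustrated with a minimal sketch: parameters are organized into groups (e.g., root first, then progressively finer joints), and each group is greedily refined in turn against a generative objective. The toy objective, the group layout, and all names below are assumptions for illustration, not the thesis implementation.

```python
def coordinate_descent(params, groups, objective, step=0.05, sweeps=50):
    """Refine parameter groups in hierarchy order (coarse to fine),
    trying +/- step for each parameter and keeping improvements."""
    best = list(params)
    best_err = objective(best)
    for _ in range(sweeps):
        for group in groups:           # hierarchy: root group first
            for i in group:
                for delta in (step, -step):
                    cand = list(best)
                    cand[i] += delta
                    err = objective(cand)
                    if err < best_err:
                        best, best_err = cand, err
    return best, best_err

# Toy quadratic objective with its optimum at `target`; in the real
# system the objective scores a candidate skeleton against observations.
target = [0.3, -0.7, 1.1, 0.0]
obj = lambda p: sum((a - b) ** 2 for a, b in zip(p, target))
groups = [[0, 1], [2, 3]]              # e.g., root params, then a limb
refined, err = coordinate_descent([0.0, 0.0, 0.0, 0.0], groups, obj)
```

Such a generative refinement step can correct residual errors in the discriminative (neural network) output, which is the role the algorithm plays in the ensemble.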

    Capturing and Reproducing Hand-Object Interactions Through Vision-Based Force Sensing

    No full text
    Capturing and reproducing hand-object interactions would open considerable possibilities in computer vision, human-computer interfaces, robotics, animation and rehabilitation. Recently, we witnessed impressive vision-based hand tracking solutions that can potentially be used for such purposes. Yet, a challenging question is: to what extent can vision also capture haptic interactions? These induce motions and constraints that are key to learning and understanding tasks such as dexterous grasping, manipulation and assembly, as well as to enabling their reproduction by either virtual characters or physical embodiments. Contact forces are traditionally measured by means of haptic technologies such as force transducers, whose major drawback lies in their intrusiveness, both with respect to the manipulated objects (impacting their physical properties) and to the operator's hands (obstructing the human haptic senses). Other drawbacks include their extensive need for calibration, time-varying accuracy, and cost. In this paper, we present the force sensing from vision framework to capture haptic interaction by means of a cheap and simple set-up (e.g., a single RGB-D camera). We then illustrate its use as an implicit force model that improves the reproduction of hand-object manipulation scenarios even under poor visual tracking conditions.
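The physical principle underlying force sensing from vision can be sketched directly: once vision provides the manipulated object's trajectory and its mass is known, Newton's second law yields the net force the hand must be exerting beyond gravity. The finite-difference scheme and all names below are illustrative assumptions; the paper's framework additionally distributes this net force among the individual contact points, which this sketch does not attempt.

```python
G = (0.0, 0.0, -9.81)  # gravity, m/s^2

def net_contact_force(positions, mass, dt):
    """Estimate the net contact force at each interior sample of a
    tracked 3D trajectory, using central finite differences for the
    acceleration: F_contact = m*a - m*g."""
    forces = []
    for t in range(1, len(positions) - 1):
        accel = [(positions[t + 1][k] - 2 * positions[t][k]
                  + positions[t - 1][k]) / dt ** 2
                 for k in range(3)]
        forces.append([mass * (accel[k] - G[k]) for k in range(3)])
    return forces

# Object held still: the contact force must exactly cancel gravity.
still = [(0.0, 0.0, 1.0)] * 5
f = net_contact_force(still, mass=0.2, dt=1 / 30)
```

For a 0.2 kg object held still, each estimated force is purely vertical with magnitude m*g (about 1.96 N), matching the intuition that the hand merely supports the object's weight.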

    Complexity based investigation in collaborative assembly scenarios via non intrusive techniques

    No full text
    Human-robot collaboration in assembly tasks is an integral part of modern manufacturing. Robots provide advantages in both process and productivity through their repeatability and usability in different tasks, while human operators provide flexibility and can act as safeguards. However, process complexity increases, which can lower overall quality. Increased complexity can negatively influence decision making due to the cognitive load on human operators, which can lead to lower quality, be it of the product, the process, or the human work. Moreover, it can lead to safety risks, human-system errors, and accidents. In this work, we present preliminary results of an experiment performed with student participants, based on an assembly task. The experiment was set up to emulate an industrial assembly, and data collection was performed through qualitative and non-intrusive quantitative methods. Questionnaires were used to assess perceived task complexity and cognitive load, while a stereo camera provided recordings for after-task analysis of process errors and human work quality based on a 3D skeleton-based human pose estimation and tracking method. The aim of the study is to investigate causes of errors and their implications for quality. Future directions of the work are discussed.

    Multimodal Narratives for the Presentation of Silk Heritage in the Museum

    No full text
    In this paper, a representation based on digital assets and semantic annotations is established for Traditional Craft instances, in a way that captures their socio-historic context and preserves both their tangible and intangible Cultural Heritage dimensions. These meaningful and documented experiential presentations are delivered to the target audience through narratives that address a range of uses, including personalized storytelling, interactive Augmented Reality (AR), augmented physical artifacts, Mixed Reality (MR) exhibitions, and the Web. The engaging cultural experiences provided have the potential to increase interest and tourism, which can support Traditional Craft communities and institutions. A secondary impact is the attraction of new apprentices through training and demonstrations, which helps guarantee long-term preservation. The proposed approach is demonstrated in the context of textile manufacturing as practiced by the community of the Haus der Seidenkultur, a former silk factory that was turned into a museum where the traditional craft of Jacquard weaving is still practiced.

    Hobbit: Providing Fall Detection and Prevention for the Elderly in the Real World

    No full text
    We present the robot developed within the Hobbit project, a socially assistive service robot addressing the challenge of enabling prolonged independent living of elderly people in their own homes. We present the second prototype (Hobbit PT2) in terms of hardware and functionality improvements following the first user studies. Our main contribution lies in the description of all components developed within the Hobbit project, leading to 371 days of autonomous operation during field trials in Austria, Greece, and Sweden. In these field trials, we studied how 18 elderly users (aged 75 years and older) lived with the autonomously interacting service robot over multiple weeks. To the best of our knowledge, this is the first time a multifunctional, low-cost service robot equipped with a manipulator was studied and evaluated for several weeks under real-world conditions. We show that Hobbit’s adaptive approach towards the user increasingly eased the interaction between the users and Hobbit. We provide lessons learned regarding the need for adaptive behavior coordination, support during emergency situations, and clear communication of robotic actions and their consequences, for fellow researchers who are developing autonomous, low-cost service robots designed to interact with their users in domestic contexts. Our trials show the necessity of moving out into actual user homes, as only there can we encounter issues such as the misinterpretation of actions during unscripted human-robot interaction.

    Results of Field Trials with a Mobile Service Robot for Older Adults in 16 Private Households

    No full text
    In this article, we present results obtained from field trials with the Hobbit robotic platform, an assistive, social service robot aimed at enabling prolonged independent living of older adults in their own homes. Our main contribution lies in the detailed results on perceived safety, usability, and acceptance from field trials with autonomous robots in the real homes of older users. In these field trials, we studied how 16 older adults (aged 75 and older) lived with autonomously interacting service robots over multiple weeks. Robots have previously been employed for periods of months in home environments for older people, and some have been tested with manipulation abilities, but this is the first time a study has tested a robot in private homes that combined manipulation abilities, autonomous navigation, and non-scheduled interaction for an extended period of time. This article aims to explore how older adults interact with such a robot in their private homes. Our results show that all users interacted with Hobbit daily, rated most functions as working well, and reported that they believe Hobbit will be part of future elderly care. We show that Hobbit's adaptive behavior approach towards the user increasingly eased the interaction between the users and the robot. Our trials reveal the necessity of moving into actual users' homes, as only there do we encounter real-world challenges and issues such as the misinterpretation of actions during non-scripted human-robot interaction.

    Representation and Preservation of Heritage Crafts

    No full text
    This work regards the digital representation of the tangible and intangible dimensions of heritage crafts (HCs), towards craft preservation. Approaches for state-of-the-art digital documentation, knowledge representation, and narrative creation are presented. Craft presentation methods are proposed that use the represented content to provide accurate, intuitive, engaging, and educational ways for HC presentation and appreciation. The proposed methods aim to contribute to HC preservation by adding value to the cultural visit, both before and after it.