
    Gait analysis comparison between manual marking, 2D pose estimation algorithms, and 3D marker-based system

    Introduction: Recent advances in Artificial Intelligence (AI) and Computer Vision (CV) have led to automated pose estimation algorithms that operate on simple 2D videos. This has created the potential to perform kinematic measurements without the need for specialized, and often expensive, equipment. Although there is a growing body of literature on the development and validation of such algorithms for practical use, they have not been adopted by health professionals, and manual video annotation tools remain common. Part of the reason is that pose estimation modules can be erratic, producing errors that are difficult to rectify. Consequently, health professionals prefer established methods despite the time and cost savings pose estimation can offer.
    Methods: In this work, the gait cycle of a sample of the elderly population on a split-belt treadmill is examined. The OpenPose (OP) and MediaPipe (MP) AI pose estimation algorithms are compared against joint kinematics from a marker-based 3D motion capture system (Vicon), as well as from a video annotation tool designed for biomechanics (Kinovea). Bland-Altman (B-A) graphs and Statistical Parametric Mapping (SPM) are used to identify regions of statistically significant difference.
    Results: Pose estimation can achieve motion tracking comparable to marker-based systems but struggles to identify joints that exhibit small but crucial motion.
    Discussion: Joints such as the ankle can suffer from misidentification of their anatomical landmarks. Manual tools do not have that problem, but the user introduces a static offset across the measurements. It is proposed that an AI-powered video annotation tool that allows the user to correct errors would bring the benefits of pose estimation to professionals at a low cost.
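
    As a rough illustration of the Bland-Altman analysis the abstract mentions, the minimal Python sketch below computes the bias and 95% limits of agreement between two joint-angle curves. The data here are synthetic stand-ins, not the study's measurements.

```python
import numpy as np

def bland_altman(a, b):
    """Bland-Altman agreement statistics between two measurement series.

    Returns the bias (mean difference) and the 95% limits of agreement,
    i.e. bias +/- 1.96 standard deviations of the differences.
    """
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    diff = a - b                  # per-sample disagreement between methods
    bias = diff.mean()            # systematic offset
    sd = diff.std(ddof=1)         # spread of the disagreement
    return bias, (bias - 1.96 * sd, bias + 1.96 * sd)

# Hypothetical knee-angle curves (degrees) over one gait cycle: one from a
# marker-based system, one from a pose estimator with an offset plus noise.
rng = np.random.default_rng(0)
t = np.linspace(0, 1, 101)
vicon = 30 + 25 * np.sin(2 * np.pi * t)
pose = vicon + rng.normal(1.5, 2.0, t.size)

bias, (lo, hi) = bland_altman(pose, vicon)
print(f"bias={bias:.2f} deg, 95% LoA=[{lo:.2f}, {hi:.2f}] deg")
```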

    Deep representation learning with application to multimodal emotion analysis in robotics

    One of the most prominent attributes of Neural Networks (NNs) is their capability to learn to extract robust and descriptive features from high-dimensional data, such as images. This ability makes their use as feature extractors particularly frequent in an abundance of modern reasoning systems. Their application scope mainly includes complex cascade tasks, such as multi-modal recognition and deep Reinforcement Learning (RL), as well as their use as descriptors in feature learning challenges, a field that has enjoyed apparent popularity over the past few years. Feature or representation learning focuses on the development of effective loss functions that ensure both high feature discrimination among different classes and low geodesic distance between the feature vectors of a given class. The vast majority of contemporary works base their formulation on an empirical assumption (H) about the feature space (F) of a network’s last hidden layer, claiming that the weight vector of a class corresponds to its geometric center in the studied space. However, NNs induce implicit biases that are difficult to avoid or deal with and that are not met in traditional image descriptors. Moreover, the lack of knowledge for describing the intra-layer properties, and thus the layers' general behavior, restricts the further applicability of the extracted features. A research field that can benefit considerably from a robust feature extraction scheme is that of multi-modal emotion recognition. The advancement of Human-Robot Interaction (HRI) drives research into the development of advanced emotion identification architectures that fathom audio-visual (A-V) modalities of human emotion. State-of-the-art approaches exploit unimodal Deep Neural Networks (DNNs) to process sensory inputs and learn a latent representation for each modality. The extracted unimodal vectors are then concatenated and fed into a fusion network that is responsible for extracting a compact emotional representation. However, existing methods in multi-modal emotion recognition mainly focus on the classification of complete video sequences, leading to systems with no online capabilities. Such techniques can predict emotions only once the videos have concluded, thus restricting their applicability in practical scenarios. The motivation of this study can be conceived in a bottom-up manner. Specifically, beginning with the task of incorporating online capabilities into an A-V emotion recognition system, in order to render it suitable for HRI scenarios, the necessity of improving the techniques adopted in the fusion network emerged. Consequently, by investigating possible strategies for fusing the A-V modalities, we arrived at the challenge of understanding the properties of the representations learned by the unimodal feature extractors. The findings of the above analysis led to the conclusion that empirical assumptions such as H cannot arbitrarily be taken for granted. Hence, we were challenged to proceed with several adjustments to existing methods in the field of feature learning.
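
    As a minimal sketch of how assumption H could be probed empirically, the Python snippet below compares each class's weight vector from a final linear layer against the centroid of that class's last-hidden-layer features; cosine similarities near 1 would support H. All tensors here are random stand-ins, not features from a trained network.

```python
import torch

# Hypothetical setup: 'feats' are last-hidden-layer features of a trained
# classifier, 'labels' their classes, and 'W' the final linear layer's
# weight matrix (one row per class). Assumption H claims each row of W
# points at the geometric center of its class in feature space F.
num_classes, feat_dim, n = 10, 64, 1000
feats = torch.randn(n, feat_dim)               # stand-in for real features
labels = torch.randint(0, num_classes, (n,))
W = torch.randn(num_classes, feat_dim)         # stand-in for trained weights

# Geometric center (centroid) of each class in F.
centroids = torch.stack([feats[labels == c].mean(dim=0)
                         for c in range(num_classes)])

# Per-class cosine similarity between weight vector and centroid.
cos = torch.nn.functional.cosine_similarity(W, centroids, dim=1)
print(cos)
```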

    Methane Concentration Forecasting Based on Sentinel-5P Products and Recurrent Neural Networks

    The increase in the concentration of geological gas emissions in the atmosphere, and particularly the increase of methane, is considered by the majority of the scientific community to be the main cause of global climate change. The main reasons that place methane at the center of interest lie in its high global warming potential (GWP) and its lifetime in the atmosphere. Anthropogenic processes, such as engineering geology ones, strongly affect the daily profile of gases in the atmosphere. Should direct measures be taken to reduce methane emissions, immediate global warming mitigation could be achieved. Due to its significance, methane has been monitored by many space missions over the years, and as of 2017 by the Sentinel-5P mission. Considering the above, we conclude that monitoring and predicting future methane concentration based on past data is of vital importance for the course of climate change over the next decades. To that end, we introduce a method exploiting state-of-the-art recurrent neural networks (RNNs), which have proven particularly effective in regression problems such as time-series forecasting. Aligned with the green artificial intelligence (AI) initiative, the paper at hand investigates the ability of different RNN architectures to predict future methane concentration in the most active regions of Texas, Pennsylvania, and West Virginia, using Sentinel-5P methane data and focusing on computational and complexity efficiency. We conduct several empirical studies and utilize the obtained results to determine the most effective architecture for the specific use case, establishing a competitive prediction performance that reaches a mean squared error of 0.7578 on the evaluation set. Taking into consideration the overall efficiency of the investigated models, we conclude that RNN architectures with fewer layers and a restricted number of units, i.e., one recurrent layer with 8 neurons, can deliver competitive prediction performance while sustaining lower computational complexity and execution time. Finally, we compare RNN models against deep neural networks along with the well-established support vector regression, clearly highlighting the supremacy of the recurrent ones, and discuss future extensions of the introduced work.
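
    A minimal sketch of the lightweight configuration the abstract singles out (one recurrent layer with 8 units) might look as follows in Python; the cell type (GRU), window length, optimizer, and data handling are assumptions, since the abstract does not fix them.

```python
import torch
import torch.nn as nn

# One recurrent layer with 8 units, as reported to be the best trade-off
# between prediction performance and computational cost.
class MethaneRNN(nn.Module):
    def __init__(self, hidden=8):
        super().__init__()
        self.rnn = nn.GRU(input_size=1, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)   # next-step CH4 concentration

    def forward(self, x):                  # x: (batch, window, 1)
        out, _ = self.rnn(x)
        return self.head(out[:, -1])       # predict from the last time step

model = MethaneRNN()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()                     # matches the reported MSE metric

# Synthetic stand-in for windowed Sentinel-5P methane series.
x = torch.randn(32, 30, 1)                 # 32 windows of 30 time steps
y = torch.randn(32, 1)                     # next-step targets
loss = loss_fn(model(x), y)
loss.backward()
opt.step()
print(loss.item())
```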

    A Hybrid Spiking Neural Network Reinforcement Learning Agent for Energy-Efficient Object Manipulation

    Due to the widespread adoption of robotics technologies in everyday activities, from industrial automation to domestic assisted-living applications, cutting-edge techniques such as deep reinforcement learning are intensively investigated with the aim of advancing the technological robotics front. The necessary limitation of power consumption remains an open challenge in contemporary robotics, especially in real-world applications. Spiking neural networks (SNNs) constitute an ideal compromise, being a strong computational tool with low-power capabilities. This paper introduces a spiking neural network actor for a baseline robotic manipulation task using a dual-finger gripper. To achieve this, we used a hybrid deep deterministic policy gradient (DDPG) algorithm, designed with a spiking actor and a deep critic network, to train the robotic agent. The agent thus learns the optimal policies for the three main stages of the manipulation task: target-object reach, grasp, and transfer. The proposed method retains one of the main advantages of SNNs, namely their capacity for neuromorphic hardware implementation, which results in energy-efficient deployments. This advantage is clearly demonstrated in the evaluation results of the SNN actor, since the deep critic network was exploited only during training. To further display the capabilities of the introduced approach, we compare our model with the well-established DDPG algorithm.
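
    To make the hybrid actor-critic structure concrete, here is a hedged Python sketch of the core DDPG update with the actor slot that the paper fills with a spiking network; a plain MLP stands in for the SNN actor, and replay buffers and target networks are omitted for brevity. Dimensions and hyperparameters are illustrative assumptions.

```python
import torch
import torch.nn as nn

obs_dim, act_dim = 12, 4

# In the hybrid scheme the actor would be an SNN (e.g. trained with a
# surrogate gradient); an MLP stands in for it here. The deep critic
# Q(s, a) is used only during training, matching the abstract.
actor = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(),
                      nn.Linear(64, act_dim), nn.Tanh())
critic = nn.Sequential(nn.Linear(obs_dim + act_dim, 64), nn.ReLU(),
                       nn.Linear(64, 1))

actor_opt = torch.optim.Adam(actor.parameters(), lr=1e-4)
critic_opt = torch.optim.Adam(critic.parameters(), lr=1e-3)
gamma = 0.99

# One update step on a dummy transition batch (s, a, r, s').
s = torch.randn(64, obs_dim); a = torch.randn(64, act_dim)
r = torch.randn(64, 1); s2 = torch.randn(64, obs_dim)

with torch.no_grad():   # Bellman target (target networks omitted here)
    target = r + gamma * critic(torch.cat([s2, actor(s2)], dim=1))
critic_loss = nn.functional.mse_loss(critic(torch.cat([s, a], dim=1)), target)
critic_opt.zero_grad(); critic_loss.backward(); critic_opt.step()

# Actor ascends the critic's value estimate of its own actions.
actor_loss = -critic(torch.cat([s, actor(s)], dim=1)).mean()
actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()
```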

    Continuous Emotion Recognition for Long-Term Behavior Modeling through Recurrent Neural Networks

    One’s internal state is mainly communicated through nonverbal cues, such as facial expressions, gestures, and tone of voice, which in turn shape the corresponding emotional state. Hence, emotions can be effectively used, in the long term, to form an opinion of an individual’s overall personality. The latter can be capitalized on in many human–robot interaction (HRI) scenarios, such as an assisted-living robotic platform, where a human’s mood may entail the adaptation of the robot’s actions. To that end, we introduce a novel approach that gradually maps and learns the personality of a human by perceiving and tracking the individual’s emotional variations throughout their interaction. The proposed system extracts the facial landmarks of the subject, which are used to train a suitably designed deep recurrent neural network architecture. This architecture is responsible for estimating the two continuous coefficients of emotion, i.e., arousal and valence, following the widely known Russell’s model. Finally, a user-friendly dashboard is created, presenting both the momentary and the long-term fluctuations of a subject’s emotional state. We thus propose a handy tool for HRI scenarios where adaptation of a robot’s activity is needed for enhanced interaction performance and safety.
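
    As a rough illustration of the described pipeline, the sketch below maps a sequence of facial landmarks to continuous arousal and valence values through a recurrent network; the landmark count, layer sizes, and GRU cell are assumptions rather than the paper's exact architecture.

```python
import torch
import torch.nn as nn

# Illustrative regressor: 68 2D facial landmarks per frame (136 values),
# a sequence of frames, and two outputs in [-1, 1] matching the arousal
# and valence coordinates of Russell's circumplex model.
class AVRegressor(nn.Module):
    def __init__(self, n_landmarks=68, hidden=128):
        super().__init__()
        self.rnn = nn.GRU(input_size=2 * n_landmarks, hidden_size=hidden,
                          num_layers=2, batch_first=True)
        self.head = nn.Linear(hidden, 2)   # (arousal, valence)

    def forward(self, x):                  # x: (batch, frames, 136)
        out, _ = self.rnn(x)
        return torch.tanh(self.head(out[:, -1]))  # bound outputs to [-1, 1]

model = AVRegressor()
clip = torch.randn(8, 90, 136)             # e.g. 90 frames of landmarks
print(model(clip).shape)                   # -> torch.Size([8, 2])
```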
