    Human Shape-Motion Analysis In Athletics Videos for Coarse To Fine Action/Activity Recognition Using Transferable Belief Model

    We present an automatic human shape-motion analysis method based on a fusion architecture for human action and activity recognition in athletics videos. Robust shape and motion features are extracted from human detection and tracking. The features are combined within the Transferable Belief Model (TBM) framework for two levels of recognition. The TBM-based modelling of the fusion process makes it possible to take into account the imprecision, uncertainty and conflict inherent to the features. First, in a coarse step, actions are roughly recognized. Then, in a fine step, an action sequence recognition method is used to discriminate activities. Beliefs on actions are smoothed by a Temporal Credal Filter, and action sequences, i.e. activities, are recognized using a TBM-based state machine called the belief scheduler. The belief scheduler is also exploited to extract feedback information that improves the tracking results. The system is tested on real videos of athletics meetings to recognize four types of actions (running, jumping, falling and standing) and four types of activities (high jump, pole vault, triple jump and long jump). Results on actions, activities and feedback demonstrate the relevance of the proposed features as well as the efficiency of the proposed TBM-based recognition approach.
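    The abstract does not give the paper's fusion equations; as a rough illustration of how shape and motion evidence could be pooled in a belief-function setting, the sketch below applies the unnormalised conjunctive rule commonly used with the TBM to two hypothetical mass functions over the four actions. The feature-to-mass mapping and all numbers are invented for illustration only.

```python
from itertools import product

# Frame of discernment for the coarse action level (as in the abstract).
ACTIONS = frozenset({"running", "jumping", "falling", "standing"})

def conjunctive_combination(m1, m2):
    """Unnormalised conjunctive rule: the mass left on the empty set
    quantifies the conflict between the two sources (kept, not renormalised,
    as is usual in the TBM)."""
    combined = {}
    for (focal1, w1), (focal2, w2) in product(m1.items(), m2.items()):
        intersection = focal1 & focal2
        combined[intersection] = combined.get(intersection, 0.0) + w1 * w2
    return combined

# Hypothetical mass functions, e.g. one derived from shape cues and one
# from motion cues; the values are made up.
m_shape = {frozenset({"running", "jumping"}): 0.7, ACTIONS: 0.3}
m_motion = {frozenset({"running"}): 0.6, frozenset({"falling"}): 0.2, ACTIONS: 0.2}

for focal, mass in conjunctive_combination(m_shape, m_motion).items():
    label = "conflict (empty set)" if not focal else "/".join(sorted(focal))
    print(f"m({label}) = {mass:.2f}")
```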

    Deep convolutional and LSTM recurrent neural networks for multimodal wearable activity recognition

    Human activity recognition (HAR) tasks have traditionally been solved using engineered features obtained by heuristic processes. Current research suggests that deep convolutional neural networks are suited to automate feature extraction from raw sensor inputs. However, human activities are made of complex sequences of motor movements, and capturing these temporal dynamics is fundamental for successful HAR. Based on the recent success of recurrent neural networks for time series domains, we propose a generic deep framework for activity recognition based on convolutional and LSTM recurrent units, which: (i) is suitable for multimodal wearable sensors; (ii) can perform sensor fusion naturally; (iii) does not require expert knowledge in designing features; and (iv) explicitly models the temporal dynamics of feature activations. We evaluate our framework on two datasets, one of which has been used in a public activity recognition challenge. Our results show that our framework outperforms competing deep non-recurrent networks on the challenge dataset by 4% on average, and outperforms some of the previously reported results by up to 9%. Our results show that the framework can be applied to homogeneous sensor modalities, but can also fuse multimodal sensors to improve performance. We characterise the influence of key architectural hyperparameters on performance to provide insights about their optimisation.
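    The abstract does not spell out the network configuration; the sketch below is a minimal, hypothetical PyTorch rendition of the convolutional-plus-LSTM pattern it describes: convolutions extract local features from raw sensor windows, a recurrent LSTM models their temporal dynamics, and a linear layer predicts the activity class. All layer sizes and input dimensions are placeholders, not values from the paper.

```python
import torch
import torch.nn as nn

class ConvLSTMHAR(nn.Module):
    """Illustrative convolutional + LSTM classifier for windowed sensor data.

    Expects input of shape (batch, channels, time), where `channels` are the
    sensor axes of a sliding window over the raw multimodal recordings.
    """
    def __init__(self, n_channels: int, n_classes: int, hidden: int = 128):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(n_channels, 64, kernel_size=5, padding=2), nn.ReLU(),
            nn.Conv1d(64, 64, kernel_size=5, padding=2), nn.ReLU(),
        )
        self.lstm = nn.LSTM(input_size=64, hidden_size=hidden,
                            num_layers=2, batch_first=True)
        self.head = nn.Linear(hidden, n_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feats = self.conv(x)            # (batch, 64, time)
        feats = feats.transpose(1, 2)   # (batch, time, 64) for the LSTM
        out, _ = self.lstm(feats)
        return self.head(out[:, -1])    # classify from the last time step

# Example: a batch of 8 windows, 9 sensor channels, 128 samples per window.
logits = ConvLSTMHAR(n_channels=9, n_classes=6)(torch.randn(8, 9, 128))
print(logits.shape)  # torch.Size([8, 6])
```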

    Automatic Scaffolding Productivity Measurement through Deep Learning

    This study developed a method to automatically measure scaffolding productivity by extracting and analysing semantic information from on-site vision data.

    Development of Human-Robot Interaction Based on Multimodal Emotion Recognition

    The electronic version of this thesis does not include the publications. Automatic multimodal emotion recognition is a fundamental subject of interest in affective computing, with its main applications in human-computer interaction. Systems developed for this purpose combine different modalities based on vocal and visual cues. This thesis takes both modalities into account in order to develop an automatic multimodal emotion recognition system; more specifically, it takes advantage of the information extracted from speech and face signals. From speech signals, Mel-frequency cepstral coefficients, filter-bank energies and prosodic features are extracted. Two different strategies are considered for analyzing the facial data. First, geometric relations between facial landmarks, i.e. distances and angles, are computed. Second, each emotional video is summarized into a reduced set of key-frames, which are passed to a convolutional neural network trained to visually discriminate between the emotions. Afterward, the output confidence values of all the classifiers from both modalities are used to define a new feature space, and these values are learned for the final emotion label prediction in a late fusion. The experiments are conducted on the SAVEE, Polish, Serbian, eNTERFACE'05 and RML datasets. The results show significant performance improvements by the proposed system in comparison to the existing alternatives, defining the current state of the art on all the datasets. Additionally, we provide a review of the emotional body gesture recognition systems proposed in the literature. The aim of this review is to help identify possible future research directions for enhancing the performance of the proposed system; in particular, we suggest that incorporating data representing gestures, which constitute another major component of the visual modality, can result in a more effective framework.
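    The abstract does not name the final classifier; as a minimal sketch of the late-fusion step described above, the example below concatenates the per-class confidence outputs of three hypothetical base classifiers (one acoustic, two visual) into a new feature space and fits a final classifier on it. All data, classifier choices and class counts are synthetic placeholders.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical per-sample confidence vectors from three base classifiers
# (1 acoustic, 2 visual), one probability per emotion class; purely synthetic.
rng = np.random.default_rng(0)
n_samples, n_classes = 200, 6
acoustic_conf = rng.dirichlet(np.ones(n_classes), n_samples)
landmark_conf = rng.dirichlet(np.ones(n_classes), n_samples)
keyframe_cnn_conf = rng.dirichlet(np.ones(n_classes), n_samples)
labels = rng.integers(0, n_classes, n_samples)

# Late fusion: stack the confidence outputs into a new feature space and
# train a final classifier on it (a stand-in for the thesis' last stage).
fusion_features = np.hstack([acoustic_conf, landmark_conf, keyframe_cnn_conf])
fusion_clf = LogisticRegression(max_iter=1000).fit(fusion_features, labels)
print(fusion_clf.predict(fusion_features[:5]))
```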

    Human Pose Tracking from Monocular Image Sequences

    This thesis proposes several novel approaches for improving the performance of an automatic 2D human pose tracking system, including a multi-scale strategy, mid-level spatial dependencies that constrain additional relations among multiple body parts, extra constraints between symmetric body parts, and left/right confusion correction using a head orientation estimator. These approaches are combined into a complete human pose tracking system. The experimental results demonstrate that all the proposed approaches yield significant improvements in accuracy and efficiency.
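    The abstract does not describe how the head orientation estimate is applied; the snippet below is only a hypothetical stand-in for the left/right confusion correction idea: when the estimated head yaw indicates the person is facing away from the camera, the labels of symmetric body parts are swapped. The keypoint names, the yaw convention and the 90-degree threshold are assumptions made for illustration.

```python
# Hypothetical keypoints: name -> (x, y) image coordinates.
SYMMETRIC_PAIRS = [
    ("left_shoulder", "right_shoulder"), ("left_elbow", "right_elbow"),
    ("left_wrist", "right_wrist"), ("left_hip", "right_hip"),
    ("left_knee", "right_knee"), ("left_ankle", "right_ankle"),
]

def correct_left_right(keypoints: dict, head_yaw_deg: float) -> dict:
    """Swap left/right limb labels when the head orientation estimator
    suggests the person faces away from the camera (|yaw| > 90 degrees)."""
    corrected = dict(keypoints)
    if abs(head_yaw_deg) <= 90:          # facing the camera: keep labels
        return corrected
    for left, right in SYMMETRIC_PAIRS:
        if left in keypoints and right in keypoints:
            corrected[left], corrected[right] = keypoints[right], keypoints[left]
    return corrected

pose = {"left_wrist": (120, 310), "right_wrist": (200, 305)}
print(correct_left_right(pose, head_yaw_deg=170))  # labels swapped
```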

    Affective Computing

    This book provides an overview of state-of-the-art research in Affective Computing. It presents new ideas, original results and practical experiences in this increasingly important research field. The book consists of 23 chapters categorized into four sections. Since one of the most important means of human communication is facial expression, the first section of the book (Chapters 1 to 7) presents research on the synthesis and recognition of facial expressions. Given that we use not only the face but also body movements to express ourselves, the second section (Chapters 8 to 11) presents research on the perception and generation of emotional expressions using full-body motion. The third section of the book (Chapters 12 to 16) presents computational models of emotion, as well as findings from neuroscience research. The last section of the book (Chapters 17 to 22) presents applications related to affective computing.

    A Survey of Applications and Human Motion Recognition with Microsoft Kinect

    Microsoft Kinect, a low-cost motion sensing device, enables users to interact with computers or game consoles naturally through gestures and spoken commands, without any other peripheral equipment. As such, it has attracted intense interest in research and development on the Kinect technology. In this paper, we present a comprehensive survey of Kinect applications and of the latest research and development on motion recognition using data captured by the Kinect sensor. On the applications front, we review the use of the Kinect technology in a variety of areas, including healthcare, education and performing arts, robotics, sign language recognition, retail services, workplace safety training, as well as 3D reconstruction. On the technology front, we provide an overview of the main features of both versions of the Kinect sensor together with the depth sensing technologies they use, and review the literature on human motion recognition techniques used in Kinect applications. We provide a classification of motion recognition techniques to highlight the different approaches used in human motion recognition. Furthermore, we compile a list of publicly available Kinect datasets. These datasets are valuable resources for researchers investigating better methods for human motion recognition and lower-level computer vision tasks such as segmentation, object detection and human pose estimation.