
    Vision-Based 2D and 3D Human Activity Recognition


    Surgical skill assessment using motion texture analysis

In this thesis, we propose a framework for automated assessment of surgical skills that expedites the manual assessment process and provides unbiased evaluations with possible dexterity feedback. Evaluation of surgical skills is an important aspect of training medical students. Current practice relies on manual evaluations by faculty and residents and is time consuming. Solutions proposed in the literature involve retrospective evaluations, such as watching offline videos, which demand the precious time and attention of expert surgeons and may vary from one surgeon to another. With recent advancements in computer vision and machine learning, retrospective video evaluation is best delegated to computer algorithms. Skill assessment is a challenging task requiring expert domain knowledge that may be difficult to translate into algorithms. To emulate the human observation process, an appropriate data collection mechanism is required to track the motion of the surgeon's hands in an unrestricted manner. In addition, it is essential to identify skill-defining motion dynamics and skill-relevant hand locations. This Ph.D. research aims to address the limitations of manual skill assessment by developing an automated motion analysis framework. Specifically, we propose (1) to design and implement quantitative features that capture fine motion details from surgical video data, (2) to identify and test the efficacy of a core subset of features in classifying surgical students into different expertise levels, (3) to derive absolute skill scores using regression methods, and (4) to perform dexterity analysis using motion data from different hand locations.
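A minimal sketch of the pipeline this abstract outlines: summary motion features computed from a tracked hand trajectory, a classifier for expertise level, and a regressor for an absolute skill score. The feature set, the three-level labels, and the score scale are illustrative assumptions, not the thesis's actual features or data.

```python
# Hypothetical sketch: motion features -> expertise classification + skill-score regression.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import Ridge

def motion_features(trajectory):
    """Summarize a (T, 2) hand trajectory with simple motion statistics."""
    vel = np.diff(trajectory, axis=0)         # frame-to-frame velocity
    acc = np.diff(vel, axis=0)                # acceleration
    jerk = np.diff(acc, axis=0)               # jerk: a common smoothness cue
    speed = np.linalg.norm(vel, axis=1)
    return np.array([
        speed.mean(), speed.std(),            # overall tempo and its variability
        np.linalg.norm(acc, axis=1).mean(),   # mean acceleration magnitude
        np.linalg.norm(jerk, axis=1).mean(),  # mean jerk (lower = smoother motion)
        len(trajectory),                      # task duration in frames
    ])

# Hypothetical data: one tracked hand trajectory and one label/score per trial.
trajectories = [np.cumsum(np.random.randn(200, 2), axis=0) for _ in range(40)]
X = np.stack([motion_features(t) for t in trajectories])
levels = np.random.randint(0, 3, size=40)     # 0=novice, 1=intermediate, 2=expert
scores = np.random.uniform(0, 30, size=40)    # assumed OSATS-style score scale

clf = RandomForestClassifier(n_estimators=100).fit(X, levels)  # expertise level
reg = Ridge().fit(X, scores)                                   # absolute skill score
```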

Time-Slice Analysis of Dyadic Human Activity

Recognizing human activities from video data is routinely leveraged for surveillance and human-computer interaction applications. The main focus has been classifying videos into one of k action classes from fully observed videos. However, intelligent systems must make decisions under uncertainty and from incomplete information. This need motivates us to introduce the problem of analysing the uncertainty associated with human activities and to move to a new level of generality in the action analysis problem. We also present the problem of time-slice activity recognition, which aims to explore human activity at a small temporal granularity. Time-slice recognition is able to infer human behaviours from a short temporal window. It has been shown that temporal slice analysis is helpful for motion characterization and for video content representation in general. These studies motivate us to consider time-slices for analysing the uncertainty associated with human activities. We report to what degree of certainty each activity is occurring throughout the video, from definitely not occurring to definitely occurring. In this research, we propose three frameworks for time-slice analysis of dyadic human activity under uncertainty. i) We present a new family of spatio-temporal descriptors which are optimized for early prediction with time-slice action annotations. Our predictive spatio-temporal interest point (Predict-STIP) representation is based on the intuition of temporal contingency between time-slices. ii) We exploit state-of-the-art techniques to extract interest points in order to represent time-slices. We also present an accumulative uncertainty measure to depict the uncertainty associated with partially observed videos for the task of early activity recognition. iii) We use convolutional neural network-based unary and pairwise relations between human body joints in each time-slice. The unary term captures the local appearance of the joints, while the pairwise term captures the local contextual relations between the parts. We extract these features from each frame in a time-slice and examine different temporal aggregations to generate a descriptor for the whole time-slice. Furthermore, we create a novel dataset which is annotated at multiple short temporal windows, allowing the modelling of the inherent uncertainty in time-slice activity recognition. All three methods have been evaluated on the TAP dataset. Experimental results demonstrate the effectiveness of our framework in the analysis of dyadic activities under uncertainty.
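A minimal sketch of the temporal aggregation step from part (iii): per-frame features are pooled into a single descriptor for the whole time-slice. The feature dimension, slice length, and aggregation modes are assumptions for illustration; the paper's actual networks are not reproduced here.

```python
# Hypothetical sketch: aggregating per-frame features into one time-slice descriptor.
import numpy as np

def aggregate_time_slice(frame_features, mode="mean"):
    """frame_features: (T, D) array, one D-dim feature vector per frame."""
    if mode == "mean":                       # average pooling over the slice
        return frame_features.mean(axis=0)
    if mode == "max":                        # max pooling keeps the strongest response
        return frame_features.max(axis=0)
    if mode == "concat_stats":               # mean and std concatenated
        return np.concatenate([frame_features.mean(axis=0),
                               frame_features.std(axis=0)])
    raise ValueError(f"unknown aggregation mode: {mode}")

# Hypothetical: a 15-frame time-slice with 512-dim per-frame CNN features.
slice_feats = np.random.randn(15, 512)
descriptor = aggregate_time_slice(slice_feats, mode="concat_stats")  # shape (1024,)
```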

    Multi-Modality Human Action Recognition

Human action recognition is useful in applications across many areas, e.g. video surveillance, human-computer interaction (HCI), video retrieval, gaming, and security, and it has recently become an active research topic in computer vision and pattern recognition. A number of action recognition approaches have been proposed. However, most of these approaches are designed for RGB image sequences, where the action data is collected by an RGB/intensity camera; the recognition performance is therefore sensitive to the occlusion, background, and lighting conditions of the image sequences. If more information can be provided along with the image sequences, i.e. data sources beyond RGB video, human actions could be better represented and recognized by the designed computer vision system. In this dissertation, multi-modality human action recognition is studied. On one hand, we introduce the study of multi-spectral action recognition, which involves information from spectra beyond the visible range, e.g. infrared and near infrared. Action recognition in individual spectra is explored and new methods are proposed; cross-spectral action recognition is then also investigated, and novel approaches are proposed in our work. On the other hand, depth imaging technology has made significant progress recently, so that depth information can be captured simultaneously with RGB video; depth-based human action recognition is therefore also investigated. I first propose a method combining different types of depth data to recognize human actions. Then a thorough evaluation is conducted on spatiotemporal interest point (STIP) based features for depth-based action recognition. Finally, I advocate the study of fusing different features for depth-based action analysis. Moreover, human depression recognition is studied by combining a facial appearance model with a facial dynamics model.
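A minimal sketch of the feature-fusion idea for depth-based action analysis the abstract advocates: normalize per-modality features, concatenate them, and train a single classifier. The extractors, dimensions, and class count are placeholder assumptions, not the dissertation's actual setup.

```python
# Hypothetical sketch: early fusion of RGB and depth features for action classification.
import numpy as np
from sklearn.svm import SVC

def fuse(rgb_feat, depth_feat):
    """Early fusion: L2-normalize each modality, then concatenate."""
    rgb_feat = rgb_feat / (np.linalg.norm(rgb_feat) + 1e-8)
    depth_feat = depth_feat / (np.linalg.norm(depth_feat) + 1e-8)
    return np.concatenate([rgb_feat, depth_feat])

# Hypothetical per-video features for 30 training clips (256-dim RGB, 128-dim depth).
X = np.stack([fuse(np.random.randn(256), np.random.randn(128)) for _ in range(30)])
y = np.random.randint(0, 5, size=30)        # 5 hypothetical action classes
model = SVC(kernel="linear").fit(X, y)      # one classifier on the fused vector
```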

    Human Action Recognition with RGB-D Sensors

Human action recognition (HAR) is at the foundation of many applications related to behavioral analysis, surveillance, and safety, and has thus been a very active research area in recent years. The release of inexpensive RGB-D sensors has fostered research in this field, because depth data simplify visual processing that would otherwise be difficult with classic RGB devices. Furthermore, the availability of depth data allows for solutions that are unobtrusive and privacy-preserving compared with classic video-based analysis. In this scenario, the aim of this chapter is to review the most salient techniques for HAR based on depth signal processing, providing some detail on a specific method based on a temporal pyramid of key poses, evaluated on the well-known MSR Action3D dataset.
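A minimal sketch of a temporal pyramid of key poses, the method this chapter highlights: skeleton frames are clustered into key poses, each frame is quantized to its nearest key pose, and normalized key-pose histograms over progressively finer temporal segments are concatenated. The number of key poses, pyramid depth, and skeleton dimensionality are illustrative assumptions, not the chapter's tuned values.

```python
# Hypothetical sketch: temporal pyramid of key poses over skeleton sequences.
import numpy as np
from sklearn.cluster import KMeans

def key_pose_pyramid(sequences, n_poses=20, levels=3):
    """sequences: list of (T_i, D) skeleton sequences. Returns (N, F) descriptors."""
    all_frames = np.vstack(sequences)
    km = KMeans(n_clusters=n_poses, n_init=10).fit(all_frames)  # learn key poses

    descriptors = []
    for seq in sequences:
        labels = km.predict(seq)             # nearest key pose per frame
        parts = []
        for lvl in range(levels):            # 1, 2, 4, ... segments per level
            for seg in np.array_split(labels, 2 ** lvl):
                hist = np.bincount(seg, minlength=n_poses).astype(float)
                hist /= max(hist.sum(), 1)   # normalized key-pose histogram
                parts.append(hist)
        descriptors.append(np.concatenate(parts))
    return np.stack(descriptors)

# Hypothetical data: 10 sequences of 60-dim skeleton vectors (20 joints x 3D).
seqs = [np.random.randn(np.random.randint(40, 80), 60) for _ in range(10)]
X = key_pose_pyramid(seqs)                   # one pyramid descriptor per sequence
```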

Human Action Recognition from Various Data Modalities: A Review

Human Action Recognition (HAR), which aims to understand human behaviors and assign category labels to them, has a wide range of applications and has thus been attracting increasing attention in the field of computer vision. Human actions can be represented using various data modalities, such as RGB, skeleton, depth, infrared sequences, point clouds, event streams, audio, acceleration, radar, and WiFi, which encode different sources of useful yet distinct information and have different advantages and application scenarios. Consequently, many existing works have investigated different types of approaches for HAR using these modalities. In this paper, we give a comprehensive survey of HAR from the perspective of the input data modalities. Specifically, we review both hand-crafted feature-based and deep learning-based methods for single data modalities, and also review methods based on multiple modalities, including fusion-based frameworks and co-learning-based approaches. Current benchmark datasets for HAR are also introduced. Finally, we discuss some potentially important research directions in this area.
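A minimal sketch of the score-level (late) fusion scheme commonly used by the multimodal frameworks such a survey covers: per-modality class probabilities are combined by a weighted average. The modality names, weights, and class count are placeholders, not values from the paper.

```python
# Hypothetical sketch: weighted late fusion of per-modality class scores.
import numpy as np

def late_fusion(prob_per_modality, weights=None):
    """prob_per_modality: dict of name -> (n_classes,) softmax scores."""
    names = list(prob_per_modality)
    w = np.ones(len(names)) if weights is None else np.asarray(weights, dtype=float)
    w = w / w.sum()                                 # normalize fusion weights
    stacked = np.stack([prob_per_modality[n] for n in names])
    return (w[:, None] * stacked).sum(axis=0)       # weighted average of scores

# Hypothetical per-modality predictions over 10 action classes.
probs = {m: np.random.dirichlet(np.ones(10)) for m in ("rgb", "depth", "skeleton")}
fused = late_fusion(probs, weights=[0.5, 0.3, 0.2])
predicted_class = int(fused.argmax())
```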