
    3D Robotic Sensing of People: Human Perception, Representation and Activity Recognition

    The robots are coming. Their presence will eventually bridge the digital-physical divide and dramatically impact human life by taking over tasks where our current society has shortcomings (e.g., search and rescue, elderly care, and child education). Human-centered robotics (HCR) is a vision to address how robots can coexist with humans and help people live safer, simpler and more independent lives. As humans, we have a remarkable ability to perceive the world around us, recognize people, and interpret their behaviors. Endowing robots with these critical capabilities in highly dynamic human social environments is a significant yet challenging problem in practical human-centered robotics applications. This research focuses on robotic sensing of people, that is, how robots can perceive and represent humans and understand their behaviors, primarily through 3D robotic vision. In this dissertation, I begin with a broad perspective on human-centered robotics by discussing its real-world applications and significant challenges. I then introduce a real-time perception system, based on the concept of Depth of Interest, that detects and tracks multiple individuals using a color-depth camera installed on moving robotic platforms. In addition, I discuss human representation approaches based on local spatio-temporal features, including new “CoDe4D” features that incorporate both color and depth information, a new “SOD” descriptor to efficiently quantize 3D visual features, and the novel AdHuC features, which can represent the activities of multiple individuals. Several new algorithms to recognize human activities are also discussed, including the RG-PLSA model, which discovers activity patterns without supervision; the MC-HCRF model, which explicitly investigates certainty in latent temporal patterns; and the FuzzySR model, which segments continuous data into events and probabilistically recognizes human activities. Cognition models based on the recognition results are also implemented for decision making, allowing robotic systems to react to human activities. Finally, I conclude with a discussion of future directions that will accelerate the upcoming technological revolution of human-centered robotics.
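
    The Depth of Interest concept can be made concrete with a short sketch: rather than searching the whole image, a perception system can first find depth bands dense enough to contain a person and restrict detection and tracking to those bands. The following Python sketch is a minimal illustration under that assumption; the function names, thresholds, and the exact banding strategy are illustrative, not taken from the dissertation.

        import numpy as np

        def depths_of_interest(depth_map, min_pixels=500, bin_width_m=0.2, max_range_m=6.0):
            # Histogram the valid depth readings (in meters) and keep bands
            # with enough pixel mass to plausibly contain a person. The
            # thresholds here are illustrative placeholders.
            valid = depth_map[(depth_map > 0) & (depth_map < max_range_m)]
            counts, edges = np.histogram(
                valid, bins=np.arange(0.0, max_range_m + bin_width_m, bin_width_m))
            return [(edges[i], edges[i + 1])
                    for i, c in enumerate(counts) if c >= min_pixels]

        def candidate_masks(depth_map, bands):
            # One binary mask per depth band; detection and tracking then
            # only need to examine the pixels inside these regions.
            return [(depth_map >= lo) & (depth_map < hi) for lo, hi in bands]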

    Action recognition in depth videos using nonparametric probabilistic graphical models

    Action recognition involves automatically labelling videos that contain human motion with action classes. It has applications in diverse areas such as smart surveillance, human-computer interaction and content retrieval. The recent advent of depth sensing technology that produces depth image sequences has offered opportunities to solve the challenging action recognition problem. Depth images facilitate robust estimation of a human skeleton’s 3D joint positions, and a high-level action can be inferred from a sequence of these joint positions. A natural way to model a sequence of joint positions is to use a graphical model that describes probabilistic dependencies between the observed joint positions and some hidden state variables. A problem with these models is that the number of hidden states must be fixed a priori, even though for many applications this number is not known in advance. This thesis proposes nonparametric variants of graphical models in which the number of hidden states is automatically inferred from data. The inference is performed in a fully Bayesian setting by using the Dirichlet Process as a prior over the model’s infinite-dimensional parameter space. This thesis describes three original constructions of nonparametric graphical models that are applied to the classification of actions in depth videos. Firstly, the action classes are represented by a Hidden Markov Model (HMM) with an unbounded number of hidden states. The formulation enables information sharing and discriminative learning of parameters. Secondly, a hierarchical HMM with an unbounded number of actions and poses is used to represent activities. The construction produces a simplified model for activity classification by using logistic regression to capture the relationship between action states and activity labels. Finally, the action classes are modelled by a Hidden Conditional Random Field (HCRF) with the number of intermediate hidden states learned from data. Tractable inference procedures based on Markov Chain Monte Carlo (MCMC) techniques are derived for all these constructions. Experiments with multiple benchmark datasets confirm the efficacy of the proposed approaches for action recognition.
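
    The mechanism that keeps the number of hidden states unbounded is the Dirichlet Process’s stick-breaking construction: each distribution over states places geometrically decaying mass on an infinite state set, so only a data-driven subset of states is ever occupied. The Python sketch below illustrates this prior under a weak-limit truncation; it is a generative illustration only and omits the MCMC inference procedures the thesis derives.

        import numpy as np

        rng = np.random.default_rng(0)

        def stick_breaking(alpha, trunc):
            # Truncated stick-breaking draw of Dirichlet Process weights.
            # Most mass falls on a few sticks, so only a small, data-driven
            # subset of the `trunc` available states is effectively used.
            betas = rng.beta(1.0, alpha, size=trunc)
            remaining = np.concatenate(([1.0], np.cumprod(1.0 - betas[:-1])))
            w = betas * remaining
            return w / w.sum()  # renormalize the truncated tail

        def sample_state_sequence(length, alpha=2.0, trunc=50):
            # Generative sketch of an HMM whose transition rows are DP draws.
            trans = np.vstack([stick_breaking(alpha, trunc) for _ in range(trunc)])
            states = [rng.choice(trunc, p=stick_breaking(alpha, trunc))]
            for _ in range(length - 1):
                states.append(rng.choice(trunc, p=trans[states[-1]]))
            return states

        # The number of occupied states is inferred, not fixed a priori.
        print(len(set(sample_state_sequence(200))))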

    WATCHING PEOPLE: ALGORITHMS TO STUDY HUMAN MOTION AND ACTIVITIES

    Human motion analysis is nowadays one of the most active research topics in Computer Vision, and it is receiving increasing attention from both the industrial and scientific communities. The growing interest in human motion analysis is motivated by an increasing number of promising applications, ranging from surveillance, human-computer interaction and virtual reality to healthcare, sports, computer games and video conferencing, to name a few. The aim of this thesis is to give an overview of the various tasks involved in visual motion analysis of the human body and to present the issues and possible solutions related to them. In this thesis, visual motion analysis is categorized into three major areas related to the interpretation of human motion: tracking of human motion using a virtual pan-tilt-zoom (vPTZ) camera, recognition of human motions, and segmentation of human behaviors. In the field of human motion tracking, a virtual environment for PTZ cameras (vPTZ) is presented to overcome the mechanical limitations of PTZ cameras. The vPTZ is built on equirectangular images acquired by 360° cameras, and it allows not only the development of pedestrian tracking algorithms but also the comparison of their performance. On the basis of this virtual environment, three novel pedestrian tracking algorithms for 360° cameras were developed: two adopt a tracking-by-detection approach, while the third adopts a Bayesian approach. The action recognition problem is addressed by an algorithm that represents actions in terms of multinomial distributions of frequent sequential patterns of different lengths. Frequent sequential patterns are series of data descriptors that occur many times in the data. The proposed method learns a codebook of frequent sequential patterns by means of an apriori-like algorithm; an action is then represented with a Bag-of-Frequent-Sequential-Patterns approach. In the last part of this thesis, a methodology to semi-automatically annotate behavioral data given a small set of manually annotated data is presented. The resulting methodology is not only effective in the semi-automated annotation task but can also be used in the presence of abnormal behaviors, as demonstrated empirically by testing the system on data collected from children affected by neuro-developmental disorders.
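
    The Bag-of-Frequent-Sequential-Patterns representation described above can be sketched in a few lines of Python. The version below mines frequent contiguous patterns of discrete frame descriptors with a simple support threshold and then histograms their occurrences; this is a simplification (a true apriori-like miner also prunes candidates whose sub-patterns are infrequent), and the names and thresholds are illustrative.

        from collections import Counter

        def mine_codebook(sequences, min_support=2, max_len=3):
            # Collect patterns of length 1..max_len occurring at least
            # min_support times across the training sequences. Candidate
            # pruning via infrequent sub-patterns is omitted for brevity.
            codebook = []
            for n in range(1, max_len + 1):
                counts = Counter(tuple(seq[i:i + n])
                                 for seq in sequences
                                 for i in range(len(seq) - n + 1))
                codebook.extend(p for p, c in counts.items() if c >= min_support)
            return codebook

        def bag_of_patterns(seq, codebook):
            # Represent an action as a multinomial distribution over the
            # codebook: normalized occurrence counts of each pattern.
            counts = [sum(tuple(seq[i:i + len(p)]) == p
                          for i in range(len(seq) - len(p) + 1))
                      for p in codebook]
            total = sum(counts) or 1
            return [c / total for c in counts]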

    IST Austria Thesis

    The human ability to recognize objects in complex scenes has driven research in the computer vision field for the past couple of decades. This thesis focuses on the object recognition task in images: given an image, we want the computer system to be able to predict the class of the object that appears in it. A recent successful attempt to bridge the semantic understanding of an image as perceived by humans and by computers uses attribute-based models. Attributes are semantic properties of objects shared across different categories, which both humans and computers can assess. To explore attribute-based models, we take a statistical machine learning approach and address two key learning challenges in view of the object recognition task: learning augmented attributes as a mid-level discriminative feature representation, and learning with attributes as privileged information. Our main contributions are parametric and non-parametric models and algorithms to solve these two problems. In the parametric approach, we explore an autoencoder model combined with the large margin nearest neighbor principle for mid-level feature learning, and linear support vector machines for learning with privileged information. In the non-parametric approach, we propose a supervised Indian Buffet Process for automatic augmentation of semantic attributes, and explore the Gaussian Processes classification framework for learning with privileged information. A thorough experimental analysis shows the effectiveness of the proposed models in both the parametric and non-parametric views.
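
    To make the attribute-based pipeline concrete, here is a minimal direct-attribute-prediction-style sketch in Python: per-attribute probabilistic classifiers are trained on image features, and a test image is assigned the class whose binary attribute signature best explains the predicted attribute probabilities. This is a generic baseline for illustration, not the thesis’s autoencoder, Indian Buffet Process, or privileged-information models, and the use of scikit-learn here is my assumption.

        import numpy as np
        from sklearn.linear_model import LogisticRegression

        def train_attribute_models(X, A):
            # X: (n_samples, n_features) image features.
            # A: (n_samples, n_attributes) binary attribute labels.
            # One probabilistic classifier per semantic attribute.
            return [LogisticRegression(max_iter=1000).fit(X, A[:, j])
                    for j in range(A.shape[1])]

        def predict_class(x, models, class_signatures):
            # class_signatures: (n_classes, n_attributes) binary matrix
            # saying which attributes each class is expected to have.
            p = np.array([m.predict_proba(x.reshape(1, -1))[0, 1] for m in models])
            log_lik = (class_signatures @ np.log(p + 1e-9)
                       + (1 - class_signatures) @ np.log(1 - p + 1e-9))
            return int(np.argmax(log_lik))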

    Classification and fusion methods for multimodal biometric authentication.

    Ouyang, Hua. Thesis (M.Phil.)--Chinese University of Hong Kong, 2007. Includes bibliographical references (leaves 81-89). Abstracts in English and Chinese.

    Table of contents:
    Chapter 1  Introduction (p.1)
        1.1  Biometric Authentication (p.1)
        1.2  Multimodal Biometric Authentication (p.2)
            1.2.1  Combination of Different Biometric Traits (p.3)
            1.2.2  Multimodal Fusion (p.5)
        1.3  Audio-Visual Bi-modal Authentication (p.6)
        1.4  Focus of This Research (p.7)
        1.5  Organization of This Thesis (p.8)
    Chapter 2  Audio-Visual Bi-modal Authentication (p.10)
        2.1  Audio-visual Authentication System (p.10)
            2.1.1  Why Audio and Mouth? (p.10)
            2.1.2  System Overview (p.11)
        2.2  XM2VTS Database (p.12)
        2.3  Visual Feature Extraction (p.14)
            2.3.1  Locating the Mouth (p.14)
            2.3.2  Averaged Mouth Images (p.17)
            2.3.3  Averaged Optical Flow Images (p.21)
        2.4  Audio Features (p.23)
        2.5  Video Stream Classification (p.23)
        2.6  Audio Stream Classification (p.25)
        2.7  Simple Fusion (p.26)
    Chapter 3  Weighted Sum Rules for Multi-modal Fusion (p.27)
        3.1  Measurement-Level Fusion (p.27)
        3.2  Product Rule and Sum Rule (p.28)
            3.2.1  Product Rule (p.28)
            3.2.2  Naive Sum Rule (NS) (p.29)
            3.2.3  Linear Weighted Sum Rule (WS) (p.30)
        3.3  Optimal Weights Selection for WS (p.31)
            3.3.1  Independent Case (p.31)
            3.3.2  Identical Case (p.33)
        3.4  Confidence Measure Based Fusion Weights (p.35)
    Chapter 4  Regularized k-Nearest Neighbor Classifier (p.39)
        4.1  Motivations (p.39)
            4.1.1  Conventional k-NN Classifier (p.39)
            4.1.2  Bayesian Formulation of kNN (p.40)
            4.1.3  Pitfalls and Drawbacks of kNN Classifiers (p.41)
            4.1.4  Metric Learning Methods (p.43)
        4.2  Regularized k-Nearest Neighbor Classifier (p.46)
            4.2.1  Metric or Not Metric? (p.46)
            4.2.2  Proposed Classifier: RkNN (p.47)
            4.2.3  Hyperkernels and Hyper-RKHS (p.49)
            4.2.4  Convex Optimization of RkNN (p.52)
            4.2.5  Hyperkernel Construction (p.53)
            4.2.6  Speeding up RkNN (p.56)
        4.3  Experimental Evaluation (p.57)
            4.3.1  Synthetic Data Sets (p.57)
            4.3.2  Benchmark Data Sets (p.64)
    Chapter 5  Audio-Visual Authentication Experiments (p.68)
        5.1  Effectiveness of Visual Features (p.68)
        5.2  Performance of Simple Sum Rule (p.71)
        5.3  Performances of Individual Modalities (p.73)
        5.4  Identification Tasks Using Confidence-based Weighted Sum Rule (p.74)
            5.4.1  Effectiveness of WS_M_C Rule (p.75)
            5.4.2  WS_M_C vs. WS_M (p.76)
        5.5  Speaker Identification Using RkNN (p.77)
    Chapter 6  Conclusions and Future Work (p.78)
        6.1  Conclusions (p.78)
        6.2  Important Follow-up Works (p.80)
    Bibliography (p.81)
    Appendix A  Proof of Proposition 3.1 (p.90)
    Appendix B  Proof of Proposition 3.2 (p.9)
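
    The fusion rules of Chapter 3 can be sketched compactly: the linear weighted sum rule (WS) combines per-class match scores from the two modalities with a single weight, and a per-probe confidence measure can set that weight. The margin-based confidence below is an illustrative stand-in, not necessarily the measure derived in the thesis.

        import numpy as np

        def weighted_sum_fusion(audio_scores, visual_scores, w):
            # Linear weighted sum rule: fuse per-class match scores from
            # the two modalities with a single weight w in [0, 1].
            return w * np.asarray(audio_scores) + (1.0 - w) * np.asarray(visual_scores)

        def confidence(scores):
            # Illustrative confidence measure: margin between the best and
            # second-best class scores for this probe.
            top2 = np.sort(scores)[-2:]
            return top2[1] - top2[0]

        def fuse_and_decide(audio_scores, visual_scores):
            ca = confidence(np.asarray(audio_scores))
            cv = confidence(np.asarray(visual_scores))
            w = ca / (ca + cv + 1e-9)  # trust the more confident modality more
            fused = weighted_sum_fusion(audio_scores, visual_scores, w)
            return int(np.argmax(fused)), fused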