
    An Analytic Training Approach for Recognition in Still Images and Videos

    This dissertation proposes a general framework to efficiently identify objects of interest (OI) in still images; the framework can be further extended to human action recognition in videos. The frameworks used in this research to process still images and videos are similar in architecture but differ in their content representations. Initially, global-level analysis is employed to extract distinctive feature sets from the input data. For this global analysis, bidirectional two-dimensional principal component analysis (2D-PCA) is employed to preserve the correlation among neighboring pixels. Furthermore, to cope with the inherent limitations of the holistic approach, local information is introduced into the framework. The local information of the OI is identified using the FERNS and affine SIFT (ASIFT) approaches for spatial and temporal datasets, respectively. To obtain supportive local information, feature detection is followed by an effective pruning strategy that divides the features into inliers and outliers. A cluster of inliers represents local features that exhibit stable behavior and geometric consistency. Incremental learning is a significant but often overlooked problem in action recognition. The final part of this dissertation proposes a new action recognition algorithm based on sequential learning and an adaptive representation of the human body using Pyramid of Histogram of Oriented Gradients (PHOG) features. The changing shape and appearance of human body parts are tracked based on the weak appearance constancy assumption. The constantly changing shape of an OI is maximally covered by small blocks to approximate the body contour of a segmented foreground object. In addition, the analytically determined learning phase guarantees a lower computational burden for classification. The use of a minimum number of video frames in a causal way to recognize an action is also explored in this dissertation. 
The use of PHOG features adaptively extracted from individual frames allows the recognition of an incoming action video using a small group of frames which eliminates the need of large look-ahead
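
The bidirectional 2D-PCA step named in the abstract above can be illustrated with a minimal Python sketch. It assumes the commonly used formulation in which one projection basis is learned from the row-direction scatter matrix and another from the column-direction scatter matrix, and each image is projected from both sides. The function names `fit_bidirectional_2dpca` and `project` and the parameters `q` and `d` are hypothetical; nothing here is taken from the dissertation's implementation.

```python
import numpy as np

def fit_bidirectional_2dpca(images, q, d):
    """Learn left (m x q) and right (n x d) projection bases.

    images: array-like of shape (N, m, n). q and d are the numbers of
    components kept in the column and row directions (illustrative
    parameters, not values from the source).
    """
    images = np.asarray(images, dtype=float)
    mean = images.mean(axis=0)
    centered = images - mean
    count = len(images)
    # Row-direction scatter: sum over images of A^T A, shape (n, n).
    g_row = np.einsum("kij,kil->jl", centered, centered) / count
    # Column-direction scatter: sum over images of A A^T, shape (m, m).
    g_col = np.einsum("kij,klj->il", centered, centered) / count
    # eigh returns eigenvalues in ascending order, so reverse the columns
    # to put the leading eigenvectors first.
    _, vec_row = np.linalg.eigh(g_row)
    _, vec_col = np.linalg.eigh(g_col)
    right = vec_row[:, ::-1][:, :d]
    left = vec_col[:, ::-1][:, :q]
    return mean, left, right

def project(image, mean, left, right):
    """Project one (m x n) image to a (q x d) feature matrix."""
    return left.T @ (np.asarray(image, dtype=float) - mean) @ right
```

Because the image is kept as a matrix rather than flattened into a vector, the two scatter matrices stay small (n x n and m x m), which is one reason this family of methods is often described as preserving neighborhood structure at low cost.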

    Understanding human motion : recognition and retrieval of human activities

    Ankara: The Department of Computer Engineering and the Institute of Engineering and Science of Bilkent University, 2008. Thesis (Ph.D.) -- Bilkent University, 2008. Includes bibliographical references (leaves 111-121). Within the ever-growing video archives is a vast amount of interesting information regarding human action/activities. In this thesis, we approach the problem of extracting this information and understanding human motion from a computer vision perspective. We propose solutions for two distinct scenarios, ordered from simple to complex. In the first scenario, we deal with the problem of single action recognition in relatively simple settings. We believe that human pose encapsulates many useful clues for recognizing the ongoing action, and we can represent this shape information for 2D single actions in very compact forms, before going into details of complex modeling. We show that high-accuracy single human action recognition is possible 1) using spatial oriented histograms of rectangular regions when the silhouette is extractable, 2) using the distribution of boundary-fitted lines when the silhouette information is missing. We demonstrate that, inside videos, we can further improve recognition accuracy by means of adding local and global motion information. We also show that within a discriminative framework, shape information is quite useful even in the case of human action recognition in still images. Our second scenario involves recognition and retrieval of complex human activities within more complicated settings, like the presence of changing background and viewpoints. We describe a method of representing human activities in 3D that allows a collection of motions to be queried without examples, using a simple and effective query language. Our approach is based on units of activity at segments of the body, that can be composed across time and across the body to produce complex queries. 
The presence of search units is inferred automatically by tracking the body, lifting the tracks to 3D and comparing to models trained using motion capture data. Our models of short time scale limb behaviour are built using a labelled motion capture set. Our query language makes use of finite state automata and requires simple text encoding and no visual examples. We show results for a large range of queries applied to a collection of complex motion and activity. We compare with discriminative methods applied to tracker data; our method offers significantly improved performance. We show experimental evidence that our method is robust to view direction and is unaffected by some important changes of clothing. İkizler, Nazlı. Ph.D.
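
To make the idea of a text-encoded query backed by a finite state automaton more concrete, here is a small Python sketch. It is only an illustration under stated assumptions: the dash-separated query syntax, the per-frame label lists, and the names `compile_query` and `contains` are all hypothetical and are not the thesis's query language. The sketch also covers composition across time for a single body segment only, not composition across the body.

```python
def compile_query(text):
    """Split a query such as 'walk-carry-stand' into ordered unit labels.

    The dash-separated encoding is an assumption made for this sketch.
    """
    return [unit.strip() for unit in text.split("-") if unit.strip()]

def contains(units, frame_labels):
    """Return True if some run of consecutive frames matches the query.

    The automaton has states 0..k, where state i means "i units entered".
    Each unit must hold for one or more consecutive frames. The set of
    active states is tracked so the automaton may be nondeterministic.
    """
    k = len(units)
    states = {0}
    for label in frame_labels:
        nxt = {0}  # a match may start at any frame
        for s in states:
            if s >= 1 and units[s - 1] == label:
                nxt.add(s)      # self-loop: the current unit continues
            if s < k and units[s] == label:
                nxt.add(s + 1)  # advance to the next unit
        if k in nxt:
            return True         # every unit has been observed in order
        states = nxt
    return k in states

# Hypothetical per-frame labels for one body segment of one clip.
clip = ["stand", "walk", "walk", "carry", "stand", "run"]
query = compile_query("walk-carry-stand")
found = contains(query, clip)
```

In a retrieval setting, `contains` would be evaluated once per clip in the collection, and the clips for which it returns True would form the result set.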

    Investigating the Singing Voice: Quantitative and Qualitative Approaches to Studying Cross-Cultural Vocal Production

    This thesis was motivated by an experiment carried out in the 1960s that studied the relationship between vocal performance practice and society by means of statistical analysis. Using a comprehensive corpus of audio recordings of singing from around the world collected over several decades, the ethnomusicologist Alan Lomax devised the Cantometrics project, the largest comparative study of music, in which 36 performance practice characteristics were rated for each recording. With particular interest in vocal production, we intended to formalise the knowledge of vocal production to enable statistical and computational approaches in the spirit of Cantometrics. Three models of vocal production were investigated: the perceptual model from Cantometrics, a physical model from voice science and a physiological model from singing education. We built on Johan Sundberg's vocal source parameters and Jo Estill's physiological building blocks as the basis to develop an ontology of vocal production. Two approaches to automated characterisation of the ontological descriptors were considered. For the incremental approach a proof-of-concept experiment on automatic labelling of phonation modes was presented, based on reconstructing the vocal source waveform by means of inverse filtering. We created a dataset of sustained sung vowels with annotations on pitch, vowel and phonation mode on which our model was trained. Steps to generalise this experiment to more complex data were outlined, discussing the challenges of such generalisation. The integrated approach addressed the full variance in the data, turning to the methodology of expert knowledge elicitation in order to annotate the original Cantometrics dataset with our descriptors. We performed an investigative mixed-methods study in which 13 vocal physiology experts from different professional backgrounds were interviewed; they used our ontology to analyse vocal production in the Cantometrics dataset. 
The goal of the study was to: a) validate the acceptance of our ontological terms, b) verify the consensus between experts on the values of the descriptors, c) collect reliable annotations. While the acceptance of the ontology was good for most terms, quantitative analysis showed good agreement between experts for only two out of 11 descriptors (larynx height, aryepiglottic sphincter). A detailed qualitative analysis of the interview data (over 33 hours) was followed by a meta-analysis extracting common themes and confounding issues which point to probable reasons for the disagreement. For aryepiglottic sphincter and larynx height we collected the average ratings, which constitute the first set of reliable annotations on vocal production. A strong correlation was found between larynx height and the vocal width parameter from Cantometrics; larynx height was therefore a good candidate to replace vocal width as a more objective descriptor. The current work was based on knowledge from a number of research disciplines, and its results are discussed from the viewpoint of several fields – MIR, vocal pedagogy, Cantometrics – for which they present significant implications. Future research is suggested for each of the fields. Based on the meta-analysis, we account for the reasons for disagreement between experts on the subject of vocal production, from music information retrieval (MIR) and singing education perspectives. We further explain the various kinds of bias that affect raters. We conclude that vocal physiology, though offering a more objective language than perceptual descriptors, is not well-suited as an ontological middle layer for statistical approaches to singing given the current state of knowledge. A mixed perceptual-objective path to ontology building is suggested and ways to collect reliable annotations are outlined. 
In the domain of vocal pedagogy we touch on the issue of communication on vocal physiology between experts, between teacher and student; we consider the future of teaching vocal technique and make suggestions for new experiments in the field. A plan is presented for revising and scaling up Cantometrics as an interdisciplinary collaboration. Possible contributions of MIR, ethnomusicologists and vocal production specialists are specified

    Vision-based representation and recognition of human activities in image sequences

    Magdeburg, Univ., Fak. für Elektrotechnik und Informationstechnik, Diss., 2013, by Samy Sadek Mohamed Bakhee