An Analytic Training Approach for Recognition in Still Images and Videos
This dissertation proposes a general framework for efficiently identifying objects of interest (OI) in still images, and extends its application to human action recognition in videos. The frameworks used to process still images and videos share the same architecture but differ in their content representations. Initially, global-level analysis is employed to extract distinctive feature sets from the input data. For this global analysis, bidirectional two-dimensional principal component analysis (2D-PCA) is employed to preserve the correlation among neighboring pixels. To cope with the inherent limitations of the holistic approach, local information is then introduced into the framework. The local information of the OI is identified using FERNS and affine SIFT (ASIFT) approaches for spatial and temporal datasets, respectively. To obtain supportive local information, feature detection is followed by an effective pruning strategy that divides the features into inliers and outliers. A cluster of inliers represents local features that exhibit stable behavior and geometric consistency. Incremental learning is a significant but often overlooked problem in action recognition. The final part of this dissertation proposes a new action recognition algorithm based on sequential learning and an adaptive representation of the human body using Pyramid of Histogram of Oriented Gradients (PHOG) features. The changing shape and appearance of human body parts are tracked based on the weak appearance constancy assumption. The constantly changing shape of an OI is maximally covered by small blocks that approximate the body contour of a segmented foreground object. In addition, the analytically determined learning phase guarantees a lower computational burden for classification. The use of a minimum number of video frames, processed causally, to recognize an action is also explored in this dissertation.
The use of PHOG features adaptively extracted from individual frames allows an incoming action video to be recognized from a small group of frames, which eliminates the need for a large look-ahead.
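To make the PHOG representation mentioned above more concrete, the following is a minimal sketch of a pyramid of orientation histograms in plain Python. It is an illustration under stated assumptions, not the dissertation's implementation: the function names `orientation_histogram` and `phog`, the use of raw central-difference gradients (rather than an edge-contour map), the 8 orientation bins, and the 2 pyramid levels are all choices made here for brevity.

```python
import math

def orientation_histogram(img, r0, r1, c0, c1, bins=8):
    """Magnitude-weighted histogram of unsigned gradient orientations
    over the cell img[r0:r1][c0:c1] (border pixels are skipped)."""
    hist = [0.0] * bins
    rows, cols = len(img), len(img[0])
    for r in range(max(r0, 1), min(r1, rows - 1)):
        for c in range(max(c0, 1), min(c1, cols - 1)):
            gx = img[r][c + 1] - img[r][c - 1]  # horizontal central difference
            gy = img[r + 1][c] - img[r - 1][c]  # vertical central difference
            mag = math.hypot(gx, gy)
            if mag == 0.0:
                continue
            # Unsigned orientation, nominally in [0, 180) degrees.
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            # The final modulo guards against rounding up to exactly 180.0.
            hist[int(angle / (180.0 / bins)) % bins] += mag
    return hist

def phog(img, levels=2, bins=8):
    """Concatenate cell histograms over a spatial pyramid:
    level 0 is the whole image, level 1 a 2x2 grid, and so on."""
    rows, cols = len(img), len(img[0])
    feature = []
    for level in range(levels):
        n = 2 ** level
        for i in range(n):
            for j in range(n):
                feature.extend(orientation_histogram(
                    img,
                    i * rows // n, (i + 1) * rows // n,
                    j * cols // n, (j + 1) * cols // n,
                    bins))
    total = sum(feature)
    # L1-normalise so the descriptor does not depend on overall contrast.
    return [v / total for v in feature] if total > 0 else feature

# Toy 8x8 image with a vertical step edge (assumed example input).
edge_image = [[0.0] * 4 + [1.0] * 4 for _ in range(8)]
descriptor = phog(edge_image)
```

With two levels and eight bins the descriptor has (1 + 4) × 8 = 40 entries. For the step-edge image all gradient energy falls into the first orientation bin of each cell, which is the behavior one would expect from a shape descriptor of this kind.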
Understanding human motion : recognition and retrieval of human activities
Ankara: The Department of Computer Engineering and the Institute of Engineering and Science of Bilkent University, 2008. Thesis (Ph.D.) -- Bilkent University, 2008. Includes bibliographical references (leaves 111-121).
Within the ever-growing video archives is a vast amount of interesting information
regarding human action/activities. In this thesis, we approach the problem of extracting
this information and understanding human motion from a computer vision perspective.
We propose solutions for two distinct scenarios, ordered from simple to complex. In
the first scenario, we deal with the problem of single action recognition in relatively
simple settings. We believe that human pose encapsulates many useful clues for recognizing
the ongoing action, and we can represent this shape information for 2D single
actions in very compact forms, before going into details of complex modeling. We
show that high-accuracy single human action recognition is possible 1) using spatial
oriented histograms of rectangular regions when the silhouette is extractable, 2) using
the distribution of boundary-fitted lines when the silhouette information is missing.
We demonstrate that, inside videos, we can further improve recognition accuracy by
means of adding local and global motion information. We also show that within a discriminative
framework, shape information is quite useful even in the case of human
action recognition in still images.
Our second scenario involves recognition and retrieval of complex human activities
within more complicated settings, like the presence of changing background and
viewpoints. We describe a method of representing human activities in 3D that allows
a collection of motions to be queried without examples, using a simple and effective
query language. Our approach is based on units of activity at segments of the body,
that can be composed across time and across the body to produce complex queries.
The presence of search units is inferred automatically by tracking the body, lifting the
tracks to 3D and comparing to models trained using motion capture data. Our models
of short time scale limb behaviour are built using a labelled motion capture set. Our query language makes use of finite state automata and requires simple text encoding
and no visual examples. We show results for a large range of queries applied to a
collection of complex motion and activity. We compare with discriminative methods
applied to tracker data; our method offers significantly improved performance. We
show experimental evidence that our method is robust to view direction and is unaffected
by some important changes of clothing.
İkizler, Nazlı. Ph.D.
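The idea of querying activities with a text-encoded finite state automaton can be illustrated with a small sketch. This is not the thesis's query language: the space-separated query syntax, the function names `compile_query` and `matches`, and the per-frame labels for a single body segment are hypothetical, and composition across body segments is left out.

```python
def compile_query(text):
    """Compile a space-separated list of activity units into a simple
    deterministic automaton. State i means "the first i units were seen"."""
    units = text.split()
    # transitions[state] maps a label to the next state; any label not
    # listed leaves the automaton in its current state.
    transitions = [{unit: i + 1} for i, unit in enumerate(units)]
    return transitions, len(units)

def matches(query, frame_labels):
    """Return True if the labelled frames contain the queried units
    in order (repeated and unrelated labels in between are tolerated)."""
    transitions, accept_state = query
    state = 0
    for label in frame_labels:
        if state == accept_state:
            break
        state = transitions[state].get(label, state)
    return state == accept_state

# Hypothetical per-frame labels for the legs, e.g. produced by a tracker.
leg_labels = ["walk", "walk", "carry", "carry", "walk"]
query = compile_query("walk carry walk")
found = matches(query, leg_labels)
```

Because the query is plain text, no visual example is needed to pose it, which is the property the abstract emphasises. A fuller system would run one such automaton per body segment and combine their results.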
Touching is believing: creating illusions and feeling of embodiment with mid-air haptic technology
Over the last two decades, the sense of touch has received new attention from the scientific community. Several haptic devices have been developed to address the complexity of the sense of touch, the latest addition being mid-air (contactless) haptic technology. An interesting series of previous research has suggested an easier way to tackle the complexity of designing convincing tactile sensations by exploiting tactile illusions. Tactile illusions rely on perceptual shortcuts based on the psychophysics of the tactile receptors.
Currently, studies exploring the perceptual space of mid-air haptics and its applicability in the tactile illusions field are still limited in number. This thesis aims to contribute to the field of Human-Computer Interaction (HCI) by investigating the perceptual design space of ultrasonic mid-air haptics technology.
Specifically, in a first set of three studies, we investigate the absolute thresholds (the minimal amount of a property of a stimulus that a user can detect) for control points (CP) at different frequencies on the hand and arm (Study 1). Then we investigate the sampling rate needed to drive the device optimally and its relationship with shape size (Study 2). Next, we apply a new technique to increase users’ performance in a shape discrimination task (Study 3).
In Study 4, we start the exploration of a tactile illusion of movement using contact touch and later, we apply a similar procedure to investigate the feasibility of creating a tactile illusion of movement between the two non-interconnected hands by using mid-air touch (Study 5).
Finally, in Study 6, we explore our sense of touch in VR, while providing an illusion of rain drops through mid-air haptics, to recreate a virtual hand illusion (VHI) to explore the boundaries of our sense of embodiment.
The contribution of this work is therefore threefold: a) we add new knowledge on the psychophysical space for mid-air haptics, b) we test the potential to create realistic tactile sensations by exploiting tactile illusions with mid-air haptic technology, and c) we demonstrate how tactile illusions mediated by mid-air haptics can convey a sense of embodiment in VR environments.
Investigating the Singing Voice: Quantitative and Qualitative Approaches to Studying Cross-Cultural Vocal Production
This thesis was motivated by an experiment carried out in the 1960s that studied the relationship between vocal performance practice and society by means of statistical analysis. Using a comprehensive corpus of audio recordings of singing from around the world collected over several decades, the ethnomusicologist Alan Lomax devised the Cantometrics project, the largest comparative study of music, in which 36 performance practice characteristics were rated for each recording. With particular interest in vocal production, we intended to formalise the knowledge of vocal production to enable statistical and computational approaches in the spirit of Cantometrics.
Three models of vocal production were investigated: the perceptual model from Cantometrics, a physical model from voice science and a physiological model from singing education. We built on Johan Sundberg's vocal source parameters and Jo Estill's physiological building blocks as the basis to develop an ontology of vocal production.
Two approaches to automated characterisation of the ontological descriptors were considered. For the incremental approach a proof-of-concept experiment on automatic labelling of phonation modes was presented, based on reconstructing the vocal source waveform by means of inverse filtering. We created a dataset of sustained sung vowels with annotations on pitch, vowel and phonation mode on which our model was trained. Steps to generalise this experiment to more complex data were outlined, discussing the challenges of such generalisation.
The integrated approach addressed the full variance in the data, turning to the methodology of expert knowledge elicitation in order to annotate the original Cantometrics dataset with our descriptors. We performed an investigative mixed-methods study in which 13 vocal physiology experts from different professional backgrounds were interviewed; they used our ontology to analyse vocal production in the Cantometrics dataset. The goal of the study was to: a) validate the acceptance of our ontological terms, b) verify the consensus between experts on the values of the descriptors, c) collect reliable annotations. While the acceptance of the ontology was good for most terms, quantitative analysis showed good agreement between experts for only two out of 11 descriptors (larynx height, aryepiglottic sphincter). A detailed qualitative analysis of the interview data (over 33 hours) was followed by a meta-analysis extracting common themes and confounding issues which point to probable reasons for the disagreement. For aryepiglottic sphincter and larynx height we collected the average ratings, which constitute the first set of reliable annotations on vocal production. A strong correlation was found between larynx height and the vocal width parameter from Cantometrics; larynx height was therefore a good candidate to replace vocal width as a more objective descriptor.
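The correlation reported above between larynx height and vocal width can be illustrated with a short sketch. The abstract does not say which coefficient was used; Pearson's r is assumed here, and the function name `pearson_r` is hypothetical. No ratings from the thesis are reproduced.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length
    sequences of numeric ratings."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant ratings")
    return cov / math.sqrt(var_x * var_y)
```

In use, `xs` would hold the averaged expert ratings of one descriptor per recording and `ys` the corresponding Cantometrics ratings. For ordinal rating scales a rank-based coefficient such as Spearman's may be the more appropriate choice.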
The current work was based on knowledge from a number of research disciplines, and its results are discussed from the viewpoint of several fields – MIR, vocal pedagogy, Cantometrics – for which they present significant implications. Future research is suggested for each of the fields. Based on the meta-analysis, we account for the reasons for disagreement between experts on the subject of vocal production, from music information retrieval (MIR) and singing education perspectives. We further explain the various kinds of bias that affect raters.
We conclude that vocal physiology, though offering a more objective language than perceptual descriptors, is not well-suited as an ontological middle layer for statistical approaches to singing given the current state of knowledge. A mixed perceptual-objective path to ontology building is suggested and ways to collect reliable annotations are outlined.
In the domain of vocal pedagogy we touch on the issue of communication on vocal physiology between experts, between teacher and student; we consider the future of teaching vocal technique and make suggestions for new experiments in the field.
A plan is presented for revising and scaling up Cantometrics as an interdisciplinary collaboration. Possible contributions of MIR, ethnomusicologists and vocal production specialists are specified.
Vision-based representation and recognition of human activities in image sequences
Magdeburg, Univ., Fak. für Elektrotechnik und Informationstechnik, Diss., 2013. By Samy Sadek Mohamed Bakhee