Action recognition from RGB-D data
In recent years, action recognition based on RGB-D data has attracted increasing attention. Unlike traditional 2D action recognition, RGB-D action recognition can draw on additional depth and skeleton modalities, each with its own characteristics. This thesis presents seven novel methods that exploit the three modalities for action recognition.
First, effective handcrafted features are designed and a frequent pattern mining method is employed to mine the most discriminative, representative and non-redundant features for skeleton-based action recognition. Second, to take advantage of powerful Convolutional Neural Networks (ConvNets), the spatio-temporal information carried in 3D skeleton sequences is represented as three 2D images by encoding the joint trajectories and their dynamics into the color distribution of the images, and ConvNets are adopted to learn discriminative features for human action recognition. Third, for depth-based action recognition, three data augmentation strategies are proposed so that ConvNets can be applied to small training datasets. Fourth, to take full advantage of the 3D structural information offered by the depth modality and its insensitivity to illumination variations, three simple, compact yet effective image-based representations are proposed, and ConvNets are adopted for feature extraction and classification. However, both of the previous two methods are sensitive to noise and cannot differentiate fine-grained actions well. Fifth, to deal with this issue, a depth map sequence is represented as three pairs of structured dynamic images at the body, part and joint levels respectively, through bidirectional rank pooling. The structured dynamic image preserves spatio-temporal information, enhances structural information across both body parts/joints and different temporal scales, and takes advantage of ConvNets for action recognition. Sixth, scene flow is extracted from RGB and depth data and used for action recognition. Last, to exploit the joint information in multi-modal features arising from heterogeneous sources (RGB, depth), a single ConvNet (referred to as c-ConvNet) is trained cooperatively on both RGB features and depth features, deeply aggregating the two modalities to achieve robust action recognition.
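Rank pooling summarises a frame sequence as a single "dynamic image" whose frame weights reflect temporal order. The abstract does not say how its bidirectional rank pooling is computed, so the sketch below assumes the closed-form approximate rank pooling weights known from the dynamic-image literature, alpha_t = 2(T - t + 1) - (T + 1)(H_T - H_{t-1}) with H_t the t-th harmonic number, and models "bidirectional" simply as pooling the sequence forward and in reverse. Frames are flattened depth maps stored as lists of floats; all function names are hypothetical, and the body/part/joint-level structuring is not shown.

```python
from typing import List, Tuple

Frame = List[float]  # hypothetical: one flattened depth map per frame


def approx_rank_pooling_weights(T: int) -> List[float]:
    """Approximate rank pooling coefficients for a sequence of length T.

    alpha_t = 2(T - t + 1) - (T + 1)(H_T - H_{t-1}),  t = 1..T,
    where H_t is the t-th harmonic number and H_0 = 0.
    """
    harmonic = [0.0] * (T + 1)  # harmonic[t] == H_t
    for t in range(1, T + 1):
        harmonic[t] = harmonic[t - 1] + 1.0 / t
    return [
        2.0 * (T - t + 1) - (T + 1) * (harmonic[T] - harmonic[t - 1])
        for t in range(1, T + 1)
    ]


def dynamic_image(frames: List[Frame]) -> Frame:
    """Collapse a sequence into one image: a weighted sum of its frames."""
    if not frames:
        raise ValueError("need at least one frame")
    weights = approx_rank_pooling_weights(len(frames))
    out = [0.0] * len(frames[0])
    for w, frame in zip(weights, frames):
        for i, value in enumerate(frame):
            out[i] += w * value
    return out


def bidirectional_dynamic_images(frames: List[Frame]) -> Tuple[Frame, Frame]:
    """Forward pooling plus pooling of the time-reversed sequence."""
    return dynamic_image(frames), dynamic_image(frames[::-1])


if __name__ == "__main__":
    # One pixel whose depth increases linearly over four frames.
    forward, backward = bidirectional_dynamic_images([[0.0], [1.0], [2.0], [3.0]])
    print(forward, backward)
```

Because the weights sum to zero, a static pixel contributes nothing to either image, and for a pixel whose depth changes linearly the backward image is the negation of the forward one; the two directions only differ in more interesting ways for non-linear motion.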
An Intelligent Robot and Augmented Reality Instruction System
Human-Centered Robotics (HCR) is a research area that focuses on how robots can empower people to live safer, simpler, and more independent lives. In this dissertation, I present a combination of two technologies to deliver human-centric solutions to an important population. The first nascent area that I investigate is the creation of an Intelligent Robot Instructor (IRI) as a learning and instruction tool for human pupils. The second technology is the use of augmented reality (AR) to create an Augmented Reality Instruction (ARI) system to provide instruction via a wearable interface.
To function in an intelligent and context-aware manner, both systems require the ability to reason about their perception of the environment and make appropriate decisions. In this work, I construct a novel formulation of several education methodologies, particularly those known as response prompting, as part of a cognitive framework to create a system for intelligent instruction, and compare these methodologies in the context of intelligent decision making using both technologies.
The IRI system is demonstrated through experiments with a humanoid robot that uses object recognition and localization for perception and interacts with students through speech, gestures, and object interaction. The ARI system uses augmented reality, computer vision, and machine learning methods to create an intelligent, contextually aware instructional system. By using AR to teach prerequisite skills suited to visual, augmented reality instruction before a robot instructor teaches skills suited to embodied interaction, I demonstrate the potential of each system, independently as well as in combination, to facilitate students' learning.
I identify people with intellectual and developmental disabilities (I/DD) as a particularly significant use case and show that IRI and ARI systems can help fulfill the compelling need to develop tools and strategies for people with I/DD.
I present results demonstrating that both systems can be used independently by students with I/DD to quickly and easily acquire the skills required to perform relevant vocational tasks. This is the first successful real-world application of response prompting for decision making in a robotic and augmented reality intelligent instruction system.
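Response prompting includes procedures such as the system of least prompts, in which the instructor escalates from the least to the most intrusive prompt until the learner responds correctly. The abstract does not give the thesis's formulation, so the following is only a minimal sketch of that one procedure, assuming a hypothetical four-level prompt hierarchy and a caller-supplied `respond` callback standing in for the robot's or AR headset's delivery of the prompt and perception of the learner's response.

```python
from enum import IntEnum
from typing import Callable, List, Optional, Tuple


class Prompt(IntEnum):
    """Hypothetical prompt hierarchy, ordered least to most intrusive."""
    NONE = 0      # task cue only: wait for an independent response
    VERBAL = 1    # spoken hint
    GESTURAL = 2  # point at or highlight the relevant object
    MODEL = 3     # full demonstration of the step


def least_to_most_trial(
    respond: Callable[[Prompt], bool],
) -> Tuple[Optional[Prompt], List[Prompt]]:
    """Run one trial of a system-of-least-prompts procedure.

    `respond(level)` delivers the prompt at `level` (speech, gesture, AR
    overlay, ...) and returns True if the learner then performs the step
    correctly within the response interval.

    Returns the level at which the learner succeeded (None if no prompt
    worked) and the list of prompts that were delivered.
    """
    delivered: List[Prompt] = []
    for level in Prompt:  # definition order: least intrusive first
        delivered.append(level)
        if respond(level):
            return level, delivered  # correct response: stop escalating
    return None, delivered


if __name__ == "__main__":
    # Simulated learner who succeeds once a gestural prompt is given.
    level, prompts = least_to_most_trial(lambda p: p >= Prompt.GESTURAL)
    print(level.name, [p.name for p in prompts])
```

A success at `Prompt.NONE` counts as an independent response; recording the success level over repeated trials is one plausible signal an instructor, human or robotic, could use to decide when to fade prompts.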
Exploration of closing-in behaviour in dementia, development and healthy adulthood
Closing-in Behaviour (CIB) is the tendency observed in copying tasks, both
graphic and gestural, in which the copy is made inappropriately close to or on top of
the model. It is classically considered a manifestation of Constructional Apraxia
(CA) and is often observed in patients with dementia. CIB is not only a symptom
of pathology, but is also observed in children's first attempts at graphic copying.
However, CIB shows an inverse pattern in development and dementia: while its
frequency increases in severe dementia, CIB progressively decreases with
development. The cognitive origins of CIB are still unclear. Two main interpretations
dominate CIB literature: the compensation and the attraction hypotheses. The first
hypothesis interprets CIB as a strategy specific to copying tasks that the patient
adopts to overcome visuospatial and working memory deficits. In contrast, the
attraction hypothesis considers CIB as a primitive behaviour, not specific to copying,
and characterized by the default tendency to perform an action toward the focus of
attention. This thesis aimed to study the characteristics and the cognitive origins of
CIB in dementia, development and healthy adulthood. It has three main sections. The
first and second sections explore CIB in patients (with Alzheimer's disease (AD) and
frontotemporal dementia) and in pre-school children, using survey and experimental
studies, to investigate if CIB might have common characteristics and cognitive
substrates in these different populations. The results provided converging evidence
for the similar nature of CIB in development and dementia. For instance, survey
studies in patients with dementia (Chapter 3) and preschool children (Chapter 6)
showed that performance in attentional tasks predicted the appearance of CIB. In a
similar vein, experimental studies showed support for the attraction hypothesis of
CIB in a single patient with AD (Chapter 4) and pre-school children (Chapter 7 and
8). These results were not, however, replicated in a larger cohort of patients with AD,
for practical reasons (Chapter 5). The last section was devoted to modelling CIB
in normal participants, using complex graphic copying (Chapter 9) and dual task
paradigms (Chapter 10). The results showed further support for the attraction
hypothesis of CIB and underlined the difficulties of eliciting this default bias in
normal adults. To conclude, this thesis radically revises the classical view of CIB as a
manifestation of CA and demonstrates that CIB is a general default
tendency, not specific to copying tasks. This work indicates avenues for new studies,
which might consider the possible expression and consequences of this behaviour in
patients' daily lives.
Machine Learning Research Trends in Africa: A 30 Years Overview with Bibliometric Analysis Review
In this paper, a critical bibliometric analysis study is conducted, coupled
with an extensive literature survey on recent developments and associated
applications in machine learning research with a perspective on Africa. The
presented bibliometric analysis study consists of 2761 machine learning-related
documents, of which 98% were articles with at least 482 citations published in
903 journals during the past 30 years. Furthermore, the collated documents were
retrieved from the Science Citation Index EXPANDED, comprising research
publications from 54 African countries between 1993 and 2021. The bibliometric
study visualizes the current landscape and future trends in machine learning
research and its applications, in order to facilitate future collaborative
research and knowledge exchange among authors from research institutions
across the African continent.
De-identification for privacy protection in multimedia content: A survey
This document is the Accepted Manuscript version of the following article: Slobodan Ribaric, Aladdin Ariyaeeinia, and Nikola Pavesic, "De-identification for privacy protection in multimedia content: A survey", Signal Processing: Image Communication, Vol. 47, pp. 131-151, September 2016, doi: https://doi.org/10.1016/j.image.2016.05.020. This manuscript version is distributed under the terms of the Creative Commons Attribution-NonCommercial-NoDerivatives License CC BY NC-ND 4.0 (http://creativecommons.org/licenses/by-nc-nd/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited, and is not altered, transformed, or built upon in any way.
Privacy is one of the most important social and political issues in our information society, which is characterized by a growing range of enabling and supporting technologies and services. Among these are communications, multimedia, biometrics, big data, cloud computing, data mining, the internet, social networks, and audio-video surveillance. Each of these can potentially provide the means for privacy intrusion. De-identification is one of the main approaches to privacy protection in multimedia content (text, still images, audio and video sequences, and their combinations). It is a process for concealing or removing personal identifiers, or replacing them with surrogate personal identifiers, in personal information in order to prevent the disclosure and use of data for purposes unrelated to the purpose for which the information was originally obtained. Based on the proposed taxonomy inspired by the Safe Harbour approach, the personal identifiers, i.e., the personally identifiable information, are classified as non-biometric, physiological and behavioural biometric, and soft biometric identifiers. In order to protect the privacy of an individual, all of the above identifiers will have to be de-identified in multimedia content.
This paper presents a review of the concepts of privacy and of the linkage among privacy, privacy protection, and the methods and technologies designed specifically for privacy protection in multimedia content. The study provides an overview of de-identification approaches for non-biometric identifiers (text, hairstyle, dressing style, license plates), as well as for physiological (face, fingerprint, iris, ear), behavioural (voice, gait, gesture) and soft-biometric (body silhouette, gender, age, race, tattoo) identifiers in multimedia documents.
Peer reviewed
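Pixelation (mosaicking) is one of the simplest naive ways to conceal a physiological identifier such as a face in a still image or video frame. The sketch below is a minimal illustration of that concealment idea, not a method proposed in the survey; it assumes the face bounding box has already been found by a detector (not shown) and represents the image as rows of grayscale values. The function name and parameters are hypothetical, and naive filtering of this kind may not defeat automatic recognition.

```python
from typing import List

Image = List[List[float]]  # grayscale image as rows of pixel intensities


def pixelate_region(img: Image, top: int, left: int, height: int, width: int,
                    block: int) -> Image:
    """Return a copy of `img` with the given rectangle mosaicked.

    Each `block` x `block` tile inside the rectangle (clipped to the image)
    is replaced by its mean intensity; pixels outside are left unchanged.
    """
    if block < 1:
        raise ValueError("block must be at least 1")
    out = [row[:] for row in img]
    if not img:
        return out
    # Clip the rectangle to the image bounds.
    bottom = min(top + height, len(img))
    right = min(left + width, len(img[0]))
    top, left = max(top, 0), max(left, 0)
    for y0 in range(top, bottom, block):
        for x0 in range(left, right, block):
            ys = range(y0, min(y0 + block, bottom))
            xs = range(x0, min(x0 + block, right))
            mean = sum(img[y][x] for y in ys for x in xs) / (len(ys) * len(xs))
            for y in ys:
                for x in xs:
                    out[y][x] = mean
    return out


if __name__ == "__main__":
    # Hypothetical 4x4 frame; pretend a detector put a face box at the top-left.
    frame = [[float(4 * r + c) for c in range(4)] for r in range(4)]
    for row in pixelate_region(frame, top=0, left=0, height=2, width=2, block=2):
        print(row)
```

Larger `block` values remove more detail; the input image is never modified, so the original can be kept under separate access control if needed.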
Understanding egocentric human actions with temporal decision forests
Understanding human actions is a fundamental task in computer vision with a wide range of applications including pervasive health-care, robotics and game control. This thesis focuses on the problem of egocentric action recognition from RGB-D data, wherein the world is viewed through the eyes of the actor whose hands describe the actions.
The main contributions of this work are its findings regarding egocentric actions as described by hands in two application scenarios, and a new technique based on temporal decision forests. The thesis first introduces a novel framework to recognise fingertip writing in mid-air in the context of human-computer interaction. This framework detects whether the user is writing and tracks the fingertip over time to generate spatio-temporal trajectories, which are recognised using a Hough forest variant that encourages temporal consistency in prediction. A problem with using such a forest approach for action recognition is that the learning of temporal dynamics is limited to hand-crafted temporal features and temporal regression, which may break the temporal continuity and lead to inconsistent predictions. To overcome this limitation, the thesis proposes transition forests. Besides any temporal information encoded in the feature space, the forest automatically learns the temporal dynamics during training and exploits them at inference time in an online and efficient manner, achieving state-of-the-art results. The last contribution of this thesis is the first RGB-D benchmark for studying egocentric hand-object actions with both hand and object pose annotations. This study conducts an extensive evaluation of different baselines, state-of-the-art approaches and temporal decision forest models using colour, depth and hand pose features. Furthermore, it extends the transition forest model to incorporate data from different modalities and demonstrates the benefit of using hand pose features to recognise egocentric human actions. The thesis concludes by discussing and analysing the contributions and proposing a few ideas for future work.
Open Access
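The transition forest learns temporal dynamics during training, which is beyond a short example. As a much simpler point of comparison, the sketch below shows one generic way to make online per-frame predictions temporally consistent: exponentially smoothing the per-frame class posteriors (for example, those averaged over a forest's trees) before taking the arg-max. This is an assumed baseline for illustration only, not the thesis's Hough forest variant or transition forest; `alpha` is a hypothetical smoothing parameter.

```python
from typing import List, Sequence


def smooth_posteriors(frame_probs: Sequence[Sequence[float]],
                      alpha: float = 0.7) -> List[int]:
    """Online exponential smoothing of per-frame class posteriors.

    frame_probs[t][c] is the classifier's probability of class c at frame t.
    Each step blends the running estimate with the new frame, so isolated
    flips in the per-frame prediction are damped. Returns the smoothed
    arg-max label for every frame, using only past and current frames.
    """
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must be in [0, 1)")
    labels: List[int] = []
    state: List[float] = []
    for probs in frame_probs:
        if not state:
            state = list(probs)  # first frame initialises the estimate
        else:
            state = [alpha * s + (1.0 - alpha) * p for s, p in zip(state, probs)]
        labels.append(max(range(len(state)), key=state.__getitem__))
    return labels


if __name__ == "__main__":
    # Raw per-frame arg-max would be [0, 0, 1, 0]; the single flip is damped.
    print(smooth_posteriors([[0.9, 0.1], [0.8, 0.2], [0.3, 0.7], [0.9, 0.1]]))
```

The trade-off is latency: a genuine change of action is also confirmed a few frames late, which is one reason to prefer a model that learns the dynamics, as the transition forest does.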
Video content analysis for automated detection and tracking of humans in CCTV surveillance applications
This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University. The problems of achieving a high detection rate with a low false alarm rate for human detection and tracking in video sequences, performance scalability, and improving response time are addressed in this thesis. The underlying causes are the effects of scene complexity, human-to-human interactions, scale changes, and scene background-human interactions. A two-stage processing solution, namely human detection and human tracking with two novel pattern classifiers, is presented. Scale-independent human detection is achieved by processing in the wavelet domain using square wavelet features. These features, used to characterise human silhouettes at different scales, are similar to the rectangular features used in [Viola 2001]. At the detection stage, two detectors are combined to improve the detection rate. The first detector is based on the shape outline of humans extracted from the scene using a reduced-complexity outline extraction algorithm. A shape mismatch measure is used to differentiate between the human and the background class. The second detector uses rectangular features as primitives for silhouette description in the wavelet domain. The marginal distribution of features collocated at a particular position on a candidate human (a patch of the image) is used to describe the silhouette statistically. Two similarity measures are computed between a candidate human and the model histograms of the human and non-human classes, and these are used to discriminate between the two classes. At the tracking stage, a tracker based on the joint probabilistic data association filter (JPDAF) for data association and motion correspondence is presented. Track clustering is used to reduce hypothesis enumeration complexity.
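The abstract does not name the two similarity measures used by the second detector. Purely as an assumption, the sketch below uses two common histogram similarities, the Bhattacharyya coefficient and histogram intersection, to illustrate the decision rule the text describes: compare a candidate's feature histogram against the human and non-human model histograms and assign the more similar class. Histogram contents and function names are hypothetical, and how the thesis combines its two measures is not stated, so either measure can be plugged in on its own here.

```python
import math
from typing import Callable, List, Sequence

Histogram = Sequence[float]


def normalise(h: Histogram) -> List[float]:
    """Scale a histogram so its bins sum to 1."""
    total = float(sum(h))
    if total <= 0.0:
        raise ValueError("histogram must have positive total mass")
    return [v / total for v in h]


def bhattacharyya(h1: Histogram, h2: Histogram) -> float:
    """Bhattacharyya coefficient: 1.0 for identical normalised histograms."""
    return sum(math.sqrt(a * b) for a, b in zip(normalise(h1), normalise(h2)))


def histogram_intersection(h1: Histogram, h2: Histogram) -> float:
    """Sum of bin-wise minima of the normalised histograms (1.0 = identical)."""
    return sum(min(a, b) for a, b in zip(normalise(h1), normalise(h2)))


def classify_candidate(
    candidate: Histogram,
    human_model: Histogram,
    nonhuman_model: Histogram,
    similarity: Callable[[Histogram, Histogram], float] = bhattacharyya,
) -> str:
    """Assign the class whose model histogram is more similar to the candidate."""
    if similarity(candidate, human_model) > similarity(candidate, nonhuman_model):
        return "human"
    return "non-human"  # ties also fall here, a conservative default
```

For example, `classify_candidate([4, 2, 1, 4], [5, 1, 1, 5], [1, 5, 5, 1])` returns `"human"`, because the candidate's mass sits in the same outer bins as the human model.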
To improve response time as frame dimensions, scene complexity, and the number of channels increase, a scalable algorithmic architecture and an operating accuracy prediction technique are presented. A scheduling strategy for improving response time and throughput through parallel processing is also presented.
The Ecology of Cultural Space: Towards an Understanding of the Contemporary Artist-led Collective
The importance of friendship has been under-researched in relation to artistic discourse. This lack of research becomes particularly acute when considering ambiguous formations of collective artistic activity. My thesis draws upon friendship as a socio-cultural phenomenon in order to situate the artist-led collective both historically and within the contemporary art continuum. Tracing a historiography of the personal relationships which blurred the boundaries between art and politics, from the re-imagining of the medieval artisanal guild in the nineteenth century to the development of Futurism in the early twentieth century, I argue that the contemporary artist-led collective is haunted by these "collectivisms past" and the spectre of autonomy. Further, the contradictions located within the ideological notions of individualism, which pervade the neo-liberal capitalist hegemony, both deny collective agency and yet accept collective praxis in the guise of enterprise culture. It is this contradictory character that frames my thesis and provides the context for understanding the complex role which friendship plays in the genesis of the contemporary artist-led collective.
In order to understand the implications of friendship as a vital component of the artist-led collective, I utilise Relational Dialectics Theory (RDT), developed by Leslie Baxter and Barbara Montgomery, as a conceptual framework. I employ in-depth case studies of the artist-led collective duo The Cool Couple and the architecture collective Assemble, in order to explore how friendship informs artist-led collectives throughout their life cycles. I question how and why these social bonds, which constitute relationships and thus shape the collectives, interrelate with a multiplicity of forces in their specific cultural ecology. These interrelations are further explored through a mapping study of artist-led collective activity in Leeds, UK. This study problematises the dualistic perspective of resistance and co-option between artist-led collectives and institutions. I argue that the evolution of the artist-led collective is implicitly interrelated with the institution, and thus the binary opposition of resistance and co-option becomes a dialectical knot of ever-changing relationships. Finally, I situate myself in the research through an auto-ethnographic study of the artist-led collective The Retro Bar at the End of the Universe, of which I am a founding member. This case study enables an internal view of the social bonds which formed The Retro Bar at the End of the Universe and provides an insight that would otherwise be impossible from an external perspective.