
    Gender and gaze gesture recognition for human-computer interaction

    The identification of visual cues in facial images has been widely explored in the broad area of computer vision. However, theoretical analyses are often not transformed into widespread assistive Human-Computer Interaction (HCI) systems, due to factors such as inconsistent robustness, low efficiency, large computational expense or strong dependence on complex hardware. We present a novel gender recognition algorithm, a modular eye centre localisation approach and a gaze gesture recognition method, aiming to increase the intelligence, adaptability and interactivity of HCI systems by combining demographic data (gender) and behavioural data (gaze), enabling the development of a range of real-world assistive-technology applications. The gender recognition algorithm uses Fisher Vectors as facial features, encoded from low-level local features in facial images. We experimented with four types of low-level features: greyscale values, Local Binary Patterns (LBP), LBP histograms and the Scale Invariant Feature Transform (SIFT). The corresponding Fisher Vectors were classified using a linear Support Vector Machine. The algorithm has been tested on the FERET, LFW and FRGCv2 databases, yielding 97.7%, 92.5% and 96.7% accuracy, respectively. The eye centre localisation algorithm takes a modular approach, following a coarse-to-fine, global-to-regional scheme and utilising isophote and gradient features. A Selective Oriented Gradient filter has been specifically designed to detect and remove strong gradients from eyebrows, eye corners and self-shadows, which undermine most eye centre localisation methods. The trajectories of the eye centres are then interpreted as gaze gestures for active HCI. The eye centre localisation algorithm has been compared with 10 other state-of-the-art algorithms with similar functionality and outperforms them in accuracy while maintaining excellent real-time performance. These methods have been used to develop a data recovery system that supports the implementation of advanced assistive-technology tools. The high accuracy, reliability and real-time performance achieved for attention monitoring, gaze gesture control and recovery of demographic data can enable the advanced human-robot interaction needed for systems that assist with everyday actions, thereby improving the quality of life for elderly and/or disabled users.
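    As a concrete illustration of this kind of pipeline, the minimal sketch below pools per-image local descriptors into an improved Fisher Vector over a diagonal-covariance GMM vocabulary and classifies it with a linear SVM. The component count, descriptor dimensionality and random stand-in descriptors are illustrative assumptions, not the paper's settings.

```python
# Sketch: Fisher Vector encoding of local descriptors + linear SVM.
# Hyperparameters and the synthetic data are illustrative assumptions.
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.svm import LinearSVC

def fit_vocabulary(descriptors, k=16, seed=0):
    """Fit a diagonal-covariance GMM as the visual vocabulary."""
    gmm = GaussianMixture(n_components=k, covariance_type="diag",
                          random_state=seed)
    gmm.fit(descriptors)          # descriptors: (N, D) stacked over images
    return gmm

def fisher_vector(x, gmm):
    """Encode one image's descriptors x (n, D) as an improved Fisher Vector."""
    q = gmm.predict_proba(x)                      # (n, K) soft assignments
    mu, var, w = gmm.means_, gmm.covariances_, gmm.weights_
    n = x.shape[0]
    diff = (x[:, None, :] - mu[None]) / np.sqrt(var)[None]    # (n, K, D)
    # Gradients w.r.t. means and variances (standard FV formulation)
    g_mu = (q[..., None] * diff).sum(0) / (n * np.sqrt(w)[:, None])
    g_var = (q[..., None] * (diff**2 - 1)).sum(0) / (n * np.sqrt(2 * w)[:, None])
    fv = np.hstack([g_mu.ravel(), g_var.ravel()])
    fv = np.sign(fv) * np.sqrt(np.abs(fv))        # power normalisation
    return fv / (np.linalg.norm(fv) + 1e-12)      # L2 normalisation

# Hypothetical usage: random arrays stand in for per-image SIFT/LBP descriptors.
rng = np.random.default_rng(0)
images = [rng.normal(size=(200, 64)) for _ in range(40)]
labels = rng.integers(0, 2, size=40)              # stand-in gender labels
gmm = fit_vocabulary(np.vstack(images[:20]))
X = np.array([fisher_vector(im, gmm) for im in images])
clf = LinearSVC().fit(X, labels)
```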

    2D and 3D computer vision analysis of gaze, gender and age

    Human-Computer Interaction (HCI) has been an active research area for over four decades. Research studies and commercial designs in this area have been largely facilitated by the visual modality, which brings diversified functionality and improved usability to HCI interfaces through various computer vision techniques. This thesis explores a number of facial cues, such as gender, age and gaze, through 2D and 3D computer vision analysis. The ultimate aim is to create a natural HCI strategy that can fulfil user expectations, augment user satisfaction and enrich user experience by understanding user characteristics and behaviours. To this end, salient features have been extracted and analysed from 2D and 3D face representations; 3D reconstruction algorithms and their compatible real-world imaging systems have been investigated; and case study HCI systems have been designed to demonstrate the reliability, robustness and applicability of the proposed methods.

    More specifically, an unsupervised approach has been proposed to localise eye centres in images and videos accurately and efficiently. This is achieved by utilising two types of geometric features and eye models, complemented by an iris radius constraint and a selective oriented gradient filter specifically tailored to this modular scheme. The approach resolves challenges such as interfering facial edges, undesirable illumination conditions, head poses, and the presence of facial accessories and makeup. Tested on three publicly available databases (the BioID database, the GI4E database and the Extended Yale Face Database B) and a self-collected database, the method outperforms all compared methods and thus proves to be highly accurate and robust. Based on this approach, a gaze gesture recognition algorithm has been designed to increase the interactivity of HCI systems by encoding eye saccades into a communication channel, similar to the role of hand gestures.

    As well as analysing eye/gaze data that represent user behaviours and reveal user intentions, this thesis also investigates the automatic recognition of user demographics such as gender and age. The Fisher Vector encoding algorithm is employed to construct visual vocabularies as salient features for gender and age classification. Evaluations on three publicly available databases (the FERET database, the LFW database and the FRGCv2 database) demonstrate the superior performance of the proposed method in both laboratory and unconstrained environments. To achieve enhanced robustness, a two-source photometric stereo method has been introduced to recover surface normals, so that more invariant 3D facial features become available to further boost classification accuracy and robustness. A 2D+3D imaging system has been designed for the construction of a self-collected dataset of 2D and 3D facial data. Experiments show that utilising 3D facial features can increase the gender classification rate by up to 6% (on the self-collected dataset) and the age classification rate by up to 12% (on the Photoface database). Finally, two case study HCI systems, a gaze gesture based map browser and a directed advertising billboard, have been designed by adopting all the proposed algorithms together with the fully compatible imaging system. The proposed algorithms naturally give the case study systems high robustness to head pose and illumination variation, as well as excellent real-time performance. Overall, the proposed HCI strategy, enabled by reliably recognised facial cues, can serve to spawn a wide array of innovative systems and to bring HCI to a more natural and intelligent state.
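    To make the surface-normal recovery step concrete, the sketch below shows the classic least-squares photometric stereo formulation with three or more known light directions; the thesis's two-source variant adds further constraints, and the light directions and image data here are purely hypothetical.

```python
# Sketch: least-squares photometric stereo for surface-normal recovery.
# Per pixel, I = L @ (albedo * n), with L the stacked unit light directions.
# Three lights are shown for the unconstrained least-squares case; the
# thesis's two-source variant is more constrained. Inputs are hypothetical.
import numpy as np

def photometric_stereo(images, lights):
    """images: (k, H, W) intensities; lights: (k, 3) unit light directions.
    Returns per-pixel unit normals (H, W, 3) and albedo (H, W)."""
    k, h, w = images.shape
    I = images.reshape(k, -1)                        # (k, H*W)
    # Solve L @ g = I for g = albedo * normal at every pixel at once.
    g, *_ = np.linalg.lstsq(lights, I, rcond=None)   # (3, H*W)
    albedo = np.linalg.norm(g, axis=0)
    n = g / (albedo + 1e-12)
    return n.T.reshape(h, w, 3), albedo.reshape(h, w)

# Hypothetical usage with synthetic data:
rng = np.random.default_rng(1)
L = np.array([[0.0, 0.0, 1.0], [0.5, 0.0, 0.866], [0.0, 0.5, 0.866]])
imgs = rng.random((3, 64, 64))
normals, albedo = photometric_stereo(imgs, L)
```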

    Review of Local Descriptor in RGB-D Object Recognition

    The emergence of RGB-D (Red-Green-Blue-Depth) sensors, which provide both depth and RGB images, has opened new opportunities for the computer vision community. The use of local features has grown over the last few years and has shown impressive results, especially in the field of object recognition. This article provides a survey of recent technical achievements in this area of research. We review the use of local descriptors as feature representations extracted from RGB-D images in instance- and category-level object recognition. We also highlight the role of depth images and how they can be combined with RGB images when constructing a local descriptor. Three different approaches are used to incorporate depth images into compact feature representations: classical distribution-based descriptors, kernel tricks, and feature learning. In this article, we show that the involvement of depth data successfully improves the accuracy of object recognition.
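    As a minimal illustration of the distribution-based route to combining RGB and depth, the sketch below computes local binary pattern histograms on a greyscale channel and on the corresponding depth map, then concatenates them into one descriptor. The window parameters and bin counts are illustrative assumptions, not drawn from any particular surveyed method.

```python
# Sketch: a distribution-based RGB-D descriptor that concatenates LBP
# histograms from the intensity channel and the depth map. Parameters
# (P, R, bin count) and the synthetic inputs are illustrative assumptions.
import numpy as np
from skimage.feature import local_binary_pattern

def lbp_hist(channel, p=8, r=1):
    """Uniform LBP histogram of one channel, L1-normalised."""
    img = (channel * 255).astype("uint8")   # quantise to 8-bit for LBP
    codes = local_binary_pattern(img, p, r, method="uniform")
    hist, _ = np.histogram(codes, bins=p + 2, range=(0, p + 2))
    return hist / (hist.sum() + 1e-12)

def rgbd_descriptor(gray, depth):
    """Concatenate appearance (grey) and geometry (depth) LBP histograms."""
    return np.concatenate([lbp_hist(gray), lbp_hist(depth)])

# Hypothetical usage: synthetic grey and depth maps normalised to [0, 1].
rng = np.random.default_rng(0)
gray = rng.random((120, 160))
depth = rng.random((120, 160))
desc = rgbd_descriptor(gray, depth)       # 20-dimensional descriptor
```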

    A Hybrid Vision-Map Method for Urban Road Detection


    Measuring trustworthiness of image data in the internet of things environment

    Internet of Things (IoT) image sensors generate huge volumes of digital images every day. However, the ready availability and usability of photo editing tools, vulnerabilities in communication channels and malicious software have made forgery attacks on image sensor data effortless, exposing IoT systems to cyberattacks. In IoT applications such as smart cities and surveillance systems, smooth operation depends on sensors sharing data with other sensors of identical or different types. A sensor must therefore be able to rely on the data it receives from other sensors; in other words, the data must be trustworthy. Sensors deployed in IoT applications are usually limited in processing and battery power, which prohibits the use of complex cryptography and security mechanisms and hinders the adoption of universal security standards by IoT device manufacturers. Hence, estimating the trust of image sensor data is a defensive solution, as these data are used for critical decision-making processes. To our knowledge, only one published work has estimated the trustworthiness of digital images in forensic applications. However, that study's method depends on machine learning prediction scores returned by existing forensic models, which limits its usage where the underlying forensic models require different approaches (e.g., machine learning predictions, statistical methods, digital signatures, perceptual image hashes). Multi-type sensor data correlation and context awareness, which can improve trust measurement, are also absent from that study's model. To address these issues, novel techniques are introduced to accurately estimate the trustworthiness of IoT image sensor data with the aid of complementary non-imagery (numeric) data-generating sensors monitoring the same environment. The trust estimation models run on edge devices, relieving sensors of computationally intensive tasks.

    First, to detect local image forgery (splicing and copy-move attacks), an image forgery detection method is proposed based on the Discrete Cosine Transform (DCT), Local Binary Patterns (LBP) and a new feature extraction method using the mean operator. Using a Support Vector Machine (SVM), the proposed method is extensively tested on four well-known publicly available greyscale and colour image forgery datasets and on an IoT-based image forgery dataset that we built. Experimental results reveal the superiority of the proposed method over recent state-of-the-art methods in terms of widely used performance metrics and computational time, and demonstrate robustness against low availability of forged training samples.

    Second, a robust trust estimation framework for IoT image data is proposed, leveraging numeric data-generating sensors deployed in the same area of interest (AoI) in an indoor environment. As low-cost sensors allow many IoT applications to observe the same AoI with multiple sensor types, the complementary numeric data from one sensor can be exploited to measure the trust value of another sensor's image data. A theoretical model is developed using Shannon's entropy to derive the uncertainty associated with an observed event, and Dempster-Shafer theory (DST) for decision fusion. The model's efficacy in estimating the trust score of image sensor data is analysed by observing a fire event using IoT image and temperature sensor data in an indoor residential setup under different scenarios. The proposed model produces highly accurate trust scores in all scenarios with authentic and forged image data.

    Finally, as the outdoor environment varies dynamically due to natural factors (e.g., lighting variations between day and night, and the presence of different objects, smoke, fog, rain or shadow in the scene), a trust framework is proposed that suits outdoor environments with these contextual variations. A transfer learning approach is adopted to derive a decision about an observation from image sensor data, while a statistical approach derives a decision about the same observation from numeric data generated by other sensors deployed in the same AoI. These decisions are then fused using CertainLogic and compared with DST-based fusion. A testbed was set up using a Raspberry Pi single-board computer, an image sensor, a temperature sensor, an edge device, LoRa nodes, a LoRaWAN gateway and servers to evaluate the proposed techniques. The results show that CertainLogic is more suitable for measuring the trustworthiness of image sensor data in an outdoor environment.
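    The DST fusion step can be illustrated compactly. Below is a minimal sketch of Dempster's rule of combination over a frame {trusted, forged}, with mass assigned to the full frame to represent uncertainty; the mass values and the two evidence sources are hypothetical, not taken from the thesis.

```python
# Sketch: Dempster's rule of combination for two evidence sources over
# the frame {T (trusted), F (forged)}; mass on the full set {T, F}
# represents uncertainty. All mass values are hypothetical.
from itertools import product

def combine(m1, m2):
    """Fuse two mass functions keyed by frozenset subsets of {'T', 'F'}."""
    fused, conflict = {}, 0.0
    for (a, x), (b, y) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            fused[inter] = fused.get(inter, 0.0) + x * y
        else:
            conflict += x * y                 # contradictory evidence
    # Normalise by the non-conflicting mass (Dempster's rule)
    return {k: v / (1.0 - conflict) for k, v in fused.items()}

T, F, TF = frozenset("T"), frozenset("F"), frozenset("TF")
# Hypothetical evidence: image-based model vs. correlated numeric sensor.
m_image = {T: 0.6, F: 0.1, TF: 0.3}
m_numeric = {T: 0.7, F: 0.2, TF: 0.1}
print(combine(m_image, m_numeric))            # fused belief masses
```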

    A robust framework for medical image segmentation through adaptable class-specific representation

    Medical image segmentation is an increasingly important component in virtual pathology, diagnostic imaging and computer-assisted surgery. Better hardware for image acquisition and a variety of advanced visualisation methods have paved the way for the development of computer-based tools for medical image analysis and interpretation. The routine use of medical imaging scans of multiple modalities has been growing over recent decades, and data sets such as the Visible Human Project have introduced a new modality in the form of colour cryo section data. These developments have given rise to an increasing need for better automatic and semi-automatic segmentation methods. The work presented in this thesis concerns the development of a new framework for robust semi-automatic segmentation of medical imaging data of multiple modalities. Following the specification of a set of conceptual and technical requirements, the framework, known as ACSR (Adaptable Class-Specific Representation), is developed first for 2D colour cryo section segmentation. This is achieved through a novel algorithm for adaptable class-specific sampling of point neighbourhoods, known as the PGA (Path Growing Algorithm), combined with Learning Vector Quantization. The framework is extended to accommodate 3D volume segmentation of cryo section data and subsequently segmentation of single- and multi-channel greyscale MRI data; for the latter, the issues of inhomogeneity and noise are specifically addressed. Evaluation is based on comparison with previously published results on standard simulated and real data sets, using visual presentation, ground truth comparison and human observer experiments. ACSR provides the user with a simple and intuitive visual initialisation process followed by a fully automatic segmentation. Results on both cryo section and MRI data compare favourably with existing methods, demonstrating robustness both to common artefacts and to multiple user initialisations. Further developments towards specific clinical applications are discussed in the future work section.
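    Since the framework couples the PGA with Learning Vector Quantization, a minimal LVQ1 training loop is sketched below for intuition. The feature vectors, learning rate and prototype counts are illustrative assumptions, and the thesis's class-specific neighbourhood sampling (PGA) is not reproduced here.

```python
# Sketch: LVQ1 prototype training for pixel classification. Each prototype
# carries a class label; the winning prototype is pulled toward samples of
# its own class and pushed away otherwise. All parameters are illustrative.
import numpy as np

def train_lvq1(X, y, n_protos=2, lr=0.05, epochs=20, seed=0):
    rng = np.random.default_rng(seed)
    protos, labels = [], []
    for c in np.unique(y):                    # init prototypes per class
        idx = rng.choice(np.flatnonzero(y == c), n_protos, replace=False)
        protos.append(X[idx]); labels.append(np.full(n_protos, c))
    W, wy = np.vstack(protos), np.concatenate(labels)
    for _ in range(epochs):
        for i in rng.permutation(len(X)):
            j = np.argmin(np.linalg.norm(W - X[i], axis=1))   # winner
            step = lr * (X[i] - W[j])
            W[j] += step if wy[j] == y[i] else -step
    return W, wy

def predict(W, wy, X):
    """Label each sample with its nearest prototype's class."""
    return wy[np.argmin(np.linalg.norm(W[None] - X[:, None], axis=2), axis=1)]

# Hypothetical usage: classify pixel feature vectors into two tissue classes.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (100, 3)), rng.normal(3, 1, (100, 3))])
y = np.repeat([0, 1], 100)
W, wy = train_lvq1(X, y)
```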

    Using Surfaces and Surface Relations in an Early Cognitive Vision System

    The final publication is available from Springer at http://dx.doi.org/10.1007/s00138-015-0705-y. We present a deep hierarchical visual system with two parallel hierarchies for edge and surface information. In the two hierarchies, complementary visual information is represented at different levels of granularity, together with the associated uncertainties and confidences. At all levels, geometric and appearance information is coded explicitly in 2D and 3D, allowing this information to be accessed separately and linked between the different levels. We demonstrate the advantages of such hierarchies in three applications covering grasping, viewpoint-independent object representation, and pose estimation.