Pictures in Your Mind: Using Interactive Gesture-Controlled Reliefs to Explore Art
Tactile reliefs offer many benefits over the more classic raised line drawings or tactile diagrams, as depth, 3D shape, and surface textures are directly perceivable. Although often created for blind and visually impaired (BVI) people, a wider range of people may benefit from such multimodal material. However, some reliefs are still difficult to understand without proper guidance or accompanying verbal descriptions, hindering autonomous exploration.
In this work, we present a gesture-controlled interactive audio guide (IAG) based on recent low-cost depth cameras that can be operated directly with the hands on relief surfaces during tactile exploration. The interactively explorable, location-dependent verbal and captioned descriptions promise rapid tactile accessibility to 2.5D spatial information in a home or education setting, to online resources, or as a kiosk installation in public places.
We present a working prototype, discuss design decisions, and present the results of two evaluation studies: the first with 13 BVI test users and the second follow-up study with 14 test users across a wide range of people with differences and difficulties associated with perception, memory, cognition, and communication. The participant-led research method of this latter study prompted new, significant and innovative developments.
Automatic emotional state detection using facial expression dynamic in videos
In this paper, an automatic emotion detection system is built that lets a computer or machine detect the user's emotional state from facial expressions in human-computer communication. First, dynamic motion features are extracted from facial expression videos; advanced machine learning methods for classification and regression are then used to predict the emotional states.
The system is evaluated on two publicly available datasets, i.e. GEMEP_FERA and AVEC2013, and satisfactory performance is achieved compared with the baseline results provided. With this emotional state detection capability, a machine can read the facial expression of its user automatically. This technique can be integrated into applications such as smart robots, interactive games and smart surveillance systems.
Higher order feature extraction and selection for robust human gesture recognition using CSI of COTS Wi-Fi devices
Device-free human gesture recognition (HGR) using commercial off-the-shelf (COTS) Wi-Fi
devices has gained attention with recent advances in wireless technology. HGR recognizes the human
activity performed by capturing the reflections of Wi-Fi signals from moving humans and storing
them as raw channel state information (CSI) traces. Existing work on HGR applies noise reduction
and transformation to pre-process the raw CSI traces. However, these methods fail to capture
the non-Gaussian information in the raw CSI data because they are limited to linear signal
representations. The proposed higher order statistics-based recognition (HOS-Re) model extracts
higher order statistical (HOS) features from raw CSI traces and selects a robust feature subset for the
recognition task. HOS-Re addresses the limitations of the existing methods by extracting third-order
cumulant features that maximize the recognition accuracy. Subsequently, feature selection methods
derived from information theory construct a robust and highly informative feature subset, fed as
input to the multilevel support vector machine (SVM) classifier in order to measure the performance.
The proposed methodology is validated using a public database SignFi, consisting of 276 gestures
with 8280 gesture instances, out of which 5520 are from the laboratory and 2760 from the home
environment using a 10 × 5 cross-validation. HOS-Re achieved an average recognition accuracy of
97.84%, 98.26% and 96.34% for the lab, home and lab + home environment respectively. The average
recognition accuracy for 150 sign gestures with 7500 instances, collected from five different users, was
96.23% in the laboratory environment.
Taylor's University through its TAYLOR'S PhD SCHOLARSHIP Programme
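The third-order cumulant features mentioned in this abstract can be illustrated with a minimal sketch. This is not the HOS-Re implementation: it assumes a real-valued, one-dimensional trace (for example, a CSI amplitude series), uses a biased sample estimator restricted to non-negative lags, and the function name `third_order_cumulant` is hypothetical. For a mean-removed signal, the third-order cumulant at lags (t1, t2) is the average of x(n)·x(n+t1)·x(n+t2); it vanishes in expectation for Gaussian data, which is why it is used to expose non-Gaussian structure.

```python
def third_order_cumulant(x, max_lag):
    """Biased estimate of C3(t1, t2) = E[x(n) x(n+t1) x(n+t2)] for 0 <= t1, t2 <= max_lag.

    Assumption: x is a real-valued sequence with len(x) > max_lag.
    """
    n = len(x)
    mean = sum(x) / n
    xc = [v - mean for v in x]  # remove the mean so moments equal cumulants at order 3
    c3 = [[0.0] * (max_lag + 1) for _ in range(max_lag + 1)]
    for t1 in range(max_lag + 1):
        for t2 in range(max_lag + 1):
            m = n - max(t1, t2)  # number of samples for which all three indices are valid
            total = sum(xc[i] * xc[i + t1] * xc[i + t2] for i in range(m))
            c3[t1][t2] = total / n  # biased estimator: always divide by n
    return c3


# Toy trace with a single spike, so its distribution is strongly skewed.
trace = [0.0, 0.0, 3.0, 0.0, 0.0, 0.0]
features = third_order_cumulant(trace, max_lag=2)
```

The zero-lag entry `features[0][0]` is the sample third central moment of the trace, and the matrix is symmetric in its two lags, so only one triangle needs to be kept as features.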
Real-Time Inference of Mental States from Facial Expressions and Upper Body Gestures
We present a real-time system for detecting facial action units and inferring emotional states from head and shoulder gestures and facial expressions. The dynamic system uses three levels of inference on progressively longer time scales. First, facial action units and head orientation are identified from 22 feature points and Gabor filters. Second, Hidden Markov Models are used to classify sequences of actions into head and shoulder gestures. Finally, a multi-level Dynamic Bayesian Network is used to model the unfolding emotional state based on the probabilities of different gestures. The most probable state over a given video clip is chosen as the label for that clip. The average F1 score for 12 action units (AUs 1, 2, 4, 6, 7, 10, 12, 15, 17, 18, 25, 26), labelled on a frame-by-frame basis, was 0.461. The average classification rate for five emotional states (anger, fear, joy, relief, sadness) was 0.440. Sadness had the highest rate, 0.64, and anger the lowest, 0.11.
Thales Research and Technology (UK); Bradlow Foundation Trust; Procter & Gamble Company
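For readers unfamiliar with the metric reported in this abstract, the following minimal sketch shows how a frame-by-frame F1 score for one action unit, and its average over several action units, could be computed. The label sequences and the helper names `f1_score` and `mean_f1` are hypothetical illustrations; the abstract does not describe the authors' evaluation code.

```python
def f1_score(true_labels, predicted_labels):
    """F1 = 2*TP / (2*TP + FP + FN) over per-frame binary labels."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t and p)        # AU present and detected
    fp = sum(1 for t, p in pairs if not t and p)    # AU absent but detected
    fn = sum(1 for t, p in pairs if t and not p)    # AU present but missed
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0         # convention: 0 when undefined


def mean_f1(per_au_true, per_au_pred):
    """Unweighted average of per-action-unit F1 scores."""
    scores = [f1_score(per_au_true[au], per_au_pred[au]) for au in per_au_true]
    return sum(scores) / len(scores)


# Hypothetical frame labels for two action units over five frames.
truth = {"AU1": [1, 1, 0, 0, 1], "AU12": [0, 0, 1, 1, 1]}
preds = {"AU1": [1, 0, 0, 1, 1], "AU12": [0, 0, 1, 1, 1]}
average = mean_f1(truth, preds)
```

Averaging per action unit, rather than pooling all frames, keeps rare action units from being swamped by frequent ones.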
Single-picture reconstruction and rendering of trees for plausible vegetation synthesis
State-of-the-art approaches for tree reconstruction either put limiting constraints on the input side (requiring multiple photographs, a scanned point cloud or intensive user input) or provide a representation only suitable for front views of the tree. In this paper we present a complete pipeline for synthesizing and rendering detailed trees from a single photograph with minimal user effort. Since the overall shape and appearance of each tree are recovered from a single photograph of the tree crown, artists can benefit from georeferenced images to populate landscapes with native tree species. A key element of our approach is a compact representation of dense tree crowns through a radial distance map. Our first contribution is an automatic algorithm for generating such representations from a single exemplar image of a tree. We create a rough estimate of the crown shape by solving a thin-plate energy minimization problem, and then add detail through a simplified shape-from-shading approach. The use of seamless texture synthesis results in an image-based representation that can be rendered from arbitrary view directions at different levels of detail. Distant trees benefit from an output-sensitive algorithm inspired by relief mapping. For close-up trees we use a billboard cloud where leaflets are distributed inside the crown shape through a space colonization algorithm. In both cases our representation ensures efficient preservation of the crown shape. Major benefits of our approach include: it recovers the overall shape from a single tree image, requires no tree modeling knowledge and minimal authoring effort, and the associated image-based representation is easy to compress and thus suitable for network streaming.
Peer Reviewed. Postprint (author's final draft)
Fully Automatic Expression-Invariant Face Correspondence
We consider the problem of computing accurate point-to-point correspondences
among a set of human face scans with varying expressions. Our fully automatic
approach does not require any manually placed markers on the scan. Instead, the
approach learns the locations of a set of landmarks present in a database and
uses this knowledge to automatically predict the locations of these landmarks
on a newly available scan. The predicted landmarks are then used to compute
point-to-point correspondences between a template model and the newly available
scan. To accurately fit the expression of the template to the expression of the
scan, we use as template a blendshape model. Our algorithm was tested on a
database of human faces of different ethnic groups with strongly varying
expressions. Experimental results show that the obtained point-to-point
correspondence is both highly accurate and consistent for most of the tested 3D
face models.
Spatio-Temporal Facial Expression Recognition Using Convolutional Neural Networks and Conditional Random Fields
Automated Facial Expression Recognition (FER) has been a challenging task for
decades. Many of the existing works use hand-crafted features such as LBP, HOG,
LPQ, and Histogram of Optical Flow (HOF) combined with classifiers such as
Support Vector Machines for expression recognition. These methods often require
rigorous hyperparameter tuning to achieve good results. Recently, Deep Neural
Networks (DNNs) have been shown to outperform traditional methods in visual object
recognition. In this paper, we propose a two-part network consisting of a
DNN-based architecture followed by a Conditional Random Field (CRF) module for
facial expression recognition in videos. The first part captures the spatial
relation within facial images using convolutional layers followed by three
Inception-ResNet modules and two fully-connected layers. To capture the
temporal relation between the image frames, we use a linear-chain CRF in the
second part of our network. We evaluate our proposed network on three publicly
available databases, viz. CK+, MMI, and FERA. Experiments are performed in
subject-independent and cross-database manners. Our experimental results show
that cascading the deep network architecture with the CRF module considerably
increases the recognition of facial expressions in videos and in particular it
outperforms the state-of-the-art methods in the cross-database experiments and
yields comparable results in the subject-independent experiments.
Comment: To appear in 12th IEEE Conference on Automatic Face and Gesture
Recognition Workshop
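To illustrate why a linear-chain CRF on top of per-frame scores can change the predicted sequence, here is a minimal sketch of Viterbi decoding. It is not the authors' network: the per-frame scores stand in for the output of a deep model, the transition scores are made-up values, and `viterbi_decode` is a hypothetical helper name.

```python
def viterbi_decode(unary, transition):
    """Return the highest-scoring label sequence of a linear-chain model.

    unary[t][k]      : score of label k at frame t (assumed to come from a frame classifier)
    transition[j][k] : score of moving from label j to label k between adjacent frames
    """
    num_labels = len(unary[0])
    best = list(unary[0])   # best score of any path ending in each label at frame 0
    backpointers = []
    for t in range(1, len(unary)):
        new_best, pointers = [], []
        for k in range(num_labels):
            # Best previous label for ending in label k at frame t.
            j_star = max(range(num_labels), key=lambda j: best[j] + transition[j][k])
            new_best.append(best[j_star] + transition[j_star][k] + unary[t][k])
            pointers.append(j_star)
        best = new_best
        backpointers.append(pointers)
    # Trace the best path backwards from the best final label.
    path = [max(range(num_labels), key=lambda k: best[k])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Hypothetical scores: frame 1 weakly prefers label 1, its neighbours strongly prefer label 0.
frame_scores = [[2.0, 0.0], [0.9, 1.0], [2.0, 0.0]]
smooth = [[0.5, -0.5], [-0.5, 0.5]]   # reward staying in the same label
independent = [[0.0, 0.0], [0.0, 0.0]]

smoothed_path = viterbi_decode(frame_scores, smooth)
framewise_path = viterbi_decode(frame_scores, independent)
```

With zero transition scores the decoder reduces to a per-frame argmax; with the smoothing transitions, the weak single-frame preference for label 1 is overruled by its neighbours.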