Infrared face recognition: a comprehensive review of methodologies and databases
Automatic face recognition is an area with immense practical potential which
includes a wide range of commercial and law enforcement applications. Hence it
is unsurprising that it continues to be one of the most active research areas
of computer vision. Even after over three decades of intense research, the
state-of-the-art in face recognition continues to improve, benefitting from
advances in a range of different research fields such as image processing,
pattern recognition, computer graphics, and physiology. Systems based on
visible spectrum images, the most researched face recognition modality, have
reached a significant level of maturity with some practical success. However,
they continue to face challenges in the presence of illumination, pose and
expression changes, as well as facial disguises, all of which can significantly
decrease recognition accuracy. Amongst various approaches which have been
proposed in an attempt to overcome these limitations, the use of infrared (IR)
imaging has emerged as a particularly promising research direction. This paper
presents a comprehensive and timely review of the literature on this subject.
Our key contributions are: (i) a summary of the inherent properties of infrared
imaging which make this modality promising in the context of face recognition,
(ii) a systematic review of the most influential approaches, with a focus on
emerging common trends as well as key differences between alternative
methodologies, (iii) a description of the main databases of infrared facial
images available to the researcher, and lastly (iv) a discussion of the most
promising avenues for future research.
Comment: Pattern Recognition, 2014. arXiv admin note: substantial text overlap
with arXiv:1306.160
Vehicle-Rear: A New Dataset to Explore Feature Fusion for Vehicle Identification Using Convolutional Neural Networks
This work addresses the problem of vehicle identification through
non-overlapping cameras. As our main contribution, we introduce a novel dataset
for vehicle identification, called Vehicle-Rear, that contains more than three
hours of high-resolution videos, with accurate information about the make,
model, color and year of nearly 3,000 vehicles, in addition to the position and
identification of their license plates. To explore our dataset we design a
two-stream CNN that simultaneously uses two of the most distinctive and
persistent features available: the vehicle's appearance and its license plate.
This is an attempt to tackle a major problem: false alarms caused by vehicles
with similar designs or by very close license plate identifiers. In the first
network stream, shape similarities are identified by a Siamese CNN that uses a
pair of low-resolution vehicle patches recorded by two different cameras. In
the second stream, we use a CNN for OCR to extract textual information,
confidence scores, and string similarities from a pair of high-resolution
license plate patches. Then, features from both streams are merged by a
sequence of fully connected layers for decision. In our experiments, we
compared the two-stream network against several well-known CNN architectures
using single or multiple vehicle features. The architectures, trained models,
and dataset are publicly available at https://github.com/icarofua/vehicle-rear
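As a rough illustration of the string-similarity signal from the second stream and the late fusion of both streams described above, here is a minimal Python sketch. It assumes a Levenshtein-based similarity and a fixed weighted average standing in for the learned fully connected layers; the function names, the weight, and the threshold are hypothetical and are not taken from the Vehicle-Rear repository.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def plate_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1]; 1.0 means the two OCR strings are identical."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


def same_vehicle(shape_score: float, plate_a: str, plate_b: str,
                 w_shape: float = 0.5, threshold: float = 0.8) -> bool:
    """Hypothetical fusion: weighted average of the shape and plate scores.

    shape_score is assumed to be a similarity in [0, 1] produced by some
    appearance model; the paper instead learns the fusion with fully
    connected layers.
    """
    fused = w_shape * shape_score + (1.0 - w_shape) * plate_similarity(plate_a, plate_b)
    return fused >= threshold
```

Under these assumptions, two vehicles of similar design are only matched when their plate strings also agree closely, which is the false-alarm case the abstract highlights.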
Robust Digital-Twin Localization via An RGBD-based Transformer Network and A Comprehensive Evaluation on a Mobile Dataset
The potential of digital-twin technology, involving the creation of precise
digital replicas of physical objects, to reshape AR experiences in 3D object
tracking and localization scenarios is significant. However, enabling robust 3D
object tracking in dynamic mobile AR environments remains a formidable
challenge. These scenarios often require a more robust pose estimator capable
of handling the inherent sensor-level measurement noise. In this paper,
recognizing the challenges of comprehensive solutions in existing literature,
we propose a transformer-based 6DoF pose estimator designed to achieve
state-of-the-art accuracy under real-world noisy data. To systematically
validate the new solution's performance against the prior art, we also
introduce a novel RGBD dataset called Digital Twin Tracking Dataset v2 (DTTD2),
which is focused on digital-twin object tracking scenarios. Expanded from an
existing DTTD v1 (DTTD1), the new dataset adds digital-twin data captured using
a cutting-edge mobile RGBD sensor suite on Apple iPhone 14 Pro, expanding the
applicability of our approach to iPhone sensor data. Through extensive
experimentation and in-depth analysis, we illustrate the effectiveness of our
methods under significant depth data errors, surpassing the performance of
existing baselines. Code and dataset are made publicly available at:
https://github.com/augcog/DTTD
Affect Recognition in Ads with Application to Computational Advertising
Advertisements (ads) often include strongly emotional content to leave a
lasting impression on the viewer. This work (i) compiles an affective ad
dataset capable of evoking coherent emotions across users, as determined from
the affective opinions of five experts and 14 annotators; (ii) explores the
efficacy of convolutional neural network (CNN) features for encoding emotions,
and observes that CNN features outperform low-level audio-visual emotion
descriptors upon extensive experimentation; and (iii) demonstrates how enhanced
affect prediction facilitates computational advertising and leads to a better
viewing experience while watching an online video stream embedded with ads,
based on a study involving 17 users. We model ad emotions based on subjective
human opinions as well as objective multimodal features, and show how
effectively modeling ad emotions can positively impact a real-life application.
Comment: Accepted at the ACM International Conference on Multimedia (ACM MM)
201
Left/Right Hand Segmentation in Egocentric Videos
Wearable cameras allow people to record their daily activities from a
user-centered (First Person Vision) perspective. Due to their favorable
location, wearable cameras frequently capture the hands of the user, and may
thus represent a promising user-machine interaction tool for different
applications. Existing First Person Vision methods handle hand segmentation as
a background-foreground problem, ignoring two important facts: i) hands are not
a single "skin-like" moving element, but a pair of interacting cooperative
entities, ii) close hand interactions may lead to hand-to-hand occlusions and,
as a consequence, create a single hand-like segment. These facts complicate a
proper understanding of hand movements and interactions. Our approach extends
traditional background-foreground strategies by including a
hand-identification step (left-right) based on a Maxwell distribution of angle
and position. Hand-to-hand occlusions are addressed by exploiting temporal
superpixels. The experimental results show that, in addition to a reliable
left/right hand-segmentation, our approach considerably improves the
traditional background-foreground hand-segmentation.
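To make the Maxwell-distribution idea concrete, the following Python sketch evaluates the standard Maxwell density and picks the hand whose density is higher for a single nonnegative feature. This is a toy illustration only: the abstract does not say how angle and position are combined, so the scalar feature, the per-hand scale parameters, and the function names are all assumptions.

```python
import math


def maxwell_pdf(x: float, scale: float) -> float:
    """Maxwell density: sqrt(2/pi) * x^2 * exp(-x^2 / (2 a^2)) / a^3 for x >= 0."""
    if scale <= 0:
        raise ValueError("scale must be positive")
    if x < 0:
        return 0.0  # the distribution has no mass below zero
    return math.sqrt(2.0 / math.pi) * x**2 * math.exp(-x**2 / (2.0 * scale**2)) / scale**3


def identify_hand(feature: float, left_scale: float, right_scale: float) -> str:
    """Hypothetical rule: choose the hand whose Maxwell model gives the higher density.

    `feature` stands for some nonnegative measurement of a hand segment
    (for example a normalised angle); the scales would be fitted from data.
    """
    left = maxwell_pdf(feature, left_scale)
    right = maxwell_pdf(feature, right_scale)
    return "left" if left >= right else "right"
```

A real system would presumably fit one model per hand for both angle and position and combine the resulting likelihoods; the single-feature rule here only shows the mechanics.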
Learning models for semantic classification of insufficient plantar pressure images
Establishing a reliable and stable model that predicts a target from insufficient labeled samples is feasible and
effective, particularly for sensor-generated datasets. Inspired by learning algorithms for insufficient datasets,
such as metric-based methods, prototype networks, and meta-learning, we propose
a transfer model learning method for insufficient datasets. First, two basic models for transfer learning are
introduced, followed by a classification system and calculation criteria. Second, a dataset
of plantar pressure for comfort shoe design is acquired and preprocessed through a foot scan system; by
using a pre-trained convolutional neural network (CNN) employing AlexNet and CNN-based
transfer modeling, the classification accuracy of the plantar pressure images exceeds 93.5%. Finally,
the proposed method is compared with the current classifiers VGG, ResNet, AlexNet, and a pre-trained
CNN. Our work is also compared with known-scaling and shifting (SS) and unknown-plain slot (PS) partition
methods on the public test databases SUN, CUB, AWA1, AWA2, and aPY, using indices of precision (tr, ts, H)
and time (training and evaluation). The proposed method for the plantar pressure classification task shows high
performance on most indices compared with the other methods. The transfer learning-based method can be
applied to other insufficient datasets in sensor imaging fields.
Networking Architecture and Key Technologies for Human Digital Twin in Personalized Healthcare: A Comprehensive Survey
Digital twin (DT) refers to a promising technique to digitally and
accurately represent actual physical entities. One typical advantage of DT is
that it can be used to not only virtually replicate a system's detailed
operations but also analyze the current condition, predict future behaviour,
and refine the control optimization. Although DT has been widely implemented in
various fields, such as smart manufacturing and transportation, its
conventional paradigm is limited to embodying non-living entities, e.g., robots
and vehicles. For human-centric systems, a novel concept called
human digital twin (HDT) has thus been proposed. In particular, HDT allows an in
silico representation of an individual human body with the ability to dynamically
reflect molecular status, physiological status, emotional and psychological
status, as well as lifestyle evolutions. These prompt the expected application
of HDT in personalized healthcare (PH), which can facilitate remote monitoring,
diagnosis, prescription, surgery and rehabilitation. However, despite the large
potential, HDT faces substantial research challenges in different aspects, and
has recently become an increasingly popular topic. In this survey, with a specific
focus on the networking architecture and key technologies for HDT in PH
applications, we first discuss the differences between HDT and conventional
DTs, followed by the universal framework and essential functions of HDT. We
then analyze its design requirements and challenges in PH applications. After
that, we provide an overview of the networking architecture of HDT, including
the data acquisition layer, data communication layer, computation layer, data
management layer, and data analysis and decision-making layer. Besides reviewing
the key technologies for implementing such networking architecture in detail,
we conclude this survey by presenting future research directions of HDT.
Assistive technology design and development for acceptable robotics companions for ageing years
© 2013 Farshid Amirabdollahian et al., licensee Versita Sp. z o. o. This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivs license, which means that the text may be used for non-commercial purposes, provided credit is given to the author.
A new stream of research and development responds to changes in life expectancy across the world. It includes technologies which enhance well-being of individuals, specifically for older people. The ACCOMPANY project focuses on home companion technologies and issues surrounding technology development for assistive purposes. The project responds to some overlooked aspects of technology design, divided into multiple areas such as empathic and social human-robot interaction, robot learning and memory visualisation, and monitoring persons’ activities at home. To bring these aspects together, a dedicated task is identified to ensure technological integration of these multiple approaches on an existing robotic platform, Care-O-Bot®3 in the context of a smart-home environment utilising a multitude of sensor arrays. Formative and summative evaluation cycles are then used to assess the emerging prototype towards identifying acceptable behaviours and roles for the robot, for example role as a butler or a trainer, while also comparing user requirements to achieved progress. In a novel approach, the project considers ethical concerns and by highlighting principles such as autonomy, independence, enablement, safety and privacy, it embarks on providing a discussion medium where user views on these principles and the existing tension between some of these principles, for example tension between privacy and autonomy over safety, can be captured and considered in design cycles and throughout project developments.
Peer reviewe