43 research outputs found
Practical and Rich User Digitization
A long-standing vision in computer science has been to evolve computing
devices into proactive assistants that enhance our productivity, health and
wellness, and many other facets of our lives. User digitization is crucial in
achieving this vision as it allows computers to intimately understand their
users, capturing activity, pose, routine, and behavior. Today's consumer
devices - like smartphones and smartwatches provide a glimpse of this
potential, offering coarse digital representations of users with metrics such
as step count, heart rate, and a handful of human activities like running and
biking. Even these very low-dimensional representations are already bringing
value to millions of people's lives, but there is significant potential for
improvement. On the other end, professional, high-fidelity comprehensive user
digitization systems exist. For example, motion capture suits and multi-camera
rigs that digitize our full body and appearance, and scanning machines such as
MRI capture our detailed anatomy. However, these carry significant user
practicality burdens, such as financial, privacy, ergonomic, aesthetic, and
instrumentation considerations, that preclude consumer use. In general, the
higher the fidelity of capture, the lower the user's practicality. Most
conventional approaches strike a balance between user practicality and
digitization fidelity.
My research aims to break this trend, developing sensing systems that
increase user digitization fidelity to create new and powerful computing
experiences while retaining or even improving user practicality and
accessibility, allowing such technologies to have a societal impact. Armed with
such knowledge, our future devices could offer longitudinal health tracking,
more productive work environments, full body avatars in extended reality, and
embodied telepresence experiences, to name just a few domains.Comment: PhD thesi
Multimodal Learning and Its Application to Mobile Active Authentication
Mobile devices are becoming increasingly popular due to their flexibility and convenience in managing personal information such as bank accounts, profiles and passwords. With the increasing use of mobile devices comes the issue of security as the loss of a smartphone would compromise the personal information of the user.
Traditional methods for authenticating users on mobile devices are based on passwords or fingerprints. As long as mobile devices remain active, they do not incorporate any mechanisms for verifying if the user originally authenticated is still the user in control of the mobile device. Thus, unauthorized individuals may improperly obtain access to personal information of the user if a password is compromised or if a user does not exercise adequate vigilance after initial authentication on a device. To deal with this problem, active authentication systems have been proposed in which users are continuously monitored after the initial access to the mobile device. Active authentication systems can capture users' data (facial image data, screen touch data, motion data, etc) through sensors (camera, touch screen, accelerometer, etc), extract features from different sensors' data, build classification models and authenticate users via comparing additional sensor data against the models.
Mobile active authentication can be viewed as one application of the more general problem, namely, multimodal classification. The idea of multimodal classification is to utilize multiple sources (modalities) measuring the same instance to improve the overall performance compared to using a single source (modality). Multimodal classification also arises in many computer vision tasks such as image classification, RGBD object classification and scene recognition.
In this dissertation, we not only present methods and algorithms related to active authentication problems, but also propose multimodal recognition algorithms based on low-rank and joint sparse representations as well as multimodal metric learning algorithm to improve multimodal classification performance. The multimodal learning algorithms proposed in this dissertation make no assumption about the feature type or applications, thus they can be applied to various recognition tasks such as mobile active authentication, image classification and RGBD recognition.
First, we study the mobile active authentication problem by exploiting a dataset consisting of 50 users' face captured by the phone's frontal camera and screen touch data sensed by the screen for evaluating active authentication algorithms developed under this research. The dataset is named as UMD Active Authentication (UMDAA) dataset. Details on data preprocessing and feature extraction for touch data and face data are described respectively.
Second, we present an approach for active user authentication using screen touch gestures by building linear and kernelized dictionaries based on sparse representations and associated classifiers. Experiments using the screen touch data components of UMDAA dataset as well as two other publicly available screen touch datasets show that the dictionary-based classification method compares favorably to those discussed in the literature. Experiments done using screen touch data collected in three different sessions show a drop in performance when the training and test data come from different sessions. This suggests a need for applying domain adaptation methods to further improve the performance of the classifiers.
Third, we propose a domain adaptive sparse representation-based classification method that learns projections of data in a space where the sparsity of data is maintained. We provide an efficient iterative procedure for solving the proposed optimization problem. One of the key features of the proposed method is that it is computationally efficient as learning is done in the lower-dimensional space. Various experiments on UMDAA dataset show that our method is able to capture the meaningful structure of data and can perform significantly better than many competitive domain adaptation algorithms.
Fourth, we propose low-rank and joint sparse representations-based multimodal recognition. Our formulations can be viewed as generalized versions of multivariate low-rank and sparse regression, where sparse and low-rank representations across all the modalities are imposed. One of our methods takes into account coupling information within different modalities simultaneously by enforcing the common low-rank and joint sparse representation among each modality's observations. We also modify our formulations by including an occlusion term that is assumed to be sparse. The alternating direction method of multipliers is proposed to efficiently solve the proposed optimization problems. Extensive experiments on UMDAA dataset, WVU multimodal biometrics dataset and Pascal-Sentence image classification dataset show that that our methods provide better recognition performance than other feature-level fusion methods.
Finally, we propose a hierarchical multimodal metric learning algorithm for multimodal data in order to improve multimodal classification performance. We design metric for each modality as a product of two matrices: one matrix is modality specific, the other is enforced to be shared by all the modalities. The modality specific projection matrices capture the varying characteristics exhibited by multiple modalities and the common projection matrix establishes the relationship of the distance metrics corresponding to multiple modalities. The learned metrics significantly improves classification accuracy and experimental results of tagged image classification problem as well as various RGBD recognition problems show that the proposed algorithm outperforms existing learning algorithms based on multiple metrics as well as other state-of-the-art approaches tested on these datasets. Furthermore, we make the proposed multimodal metric learning algorithm non-linear by using kernel methods
Developing an Autonomous Mobile Robotic Device for Monitoring and Assisting Older People
A progressive increase of the elderly population in the world has required technological solutions capable of improving the life prospects of people suffering from senile dementias such as Alzheimer's. Socially Assistive Robotics (SAR) in the research field of elderly care is a solution that can ensure, through observation and monitoring of behaviors, their safety and improve their physical and cognitive health. A social robot can autonomously and tirelessly monitor a person daily by providing assistive tasks such as remembering to take medication and suggesting activities to keep the assisted active both physically and cognitively. However, many projects in this area have not considered the preferences, needs, personality, and cognitive profiles of older people. Moreover, other projects have developed specific robotic applications making it difficult to reuse and adapt them on other hardware devices and for other different functional contexts. This thesis presents the development of a scalable, modular, multi-tenant robotic application and its testing in real-world environments. This work is part of the UPA4SAR project ``User-centered Profiling and Adaptation for Socially Assistive Robotics''. The UPA4SAR project aimed to develop a low-cost robotic application for faster deployment among the elderly population. The architecture of the proposed robotic system is modular, robust, and scalable due to the development of functionality in microservices with event-based communication. To improve robot acceptance the functionalities, enjoyed through microservices, adapt the robot's behaviors based on the preferences and personality of the assisted person. A key part of the assistance is the monitoring of activities that are recognized through deep neural network models proposed in this work. The final experimentation of the project carried out in the homes of elderly volunteers was performed with complete autonomy of the robotic system. Daily care plans customized to the person's needs and preferences were executed. These included notification tasks to remember when to take medication, tasks to check if basic nutrition activities were accomplished, entertainment and companionship tasks with games, videos, music for cognitive and physical stimulation of the patient
Real-time person re-identification for interactive environments
The work presented in this thesis was motivated by a vision of the future in which intelligent environments in public spaces such as galleries and museums, deliver useful and personalised services to people via natural interaction, that is, without the need for people to provide explicit instructions via tangible interfaces. Delivering the right services to the right people requires a means of biometrically identifying individuals and then re-identifying them as they move freely through the environment. Delivering the service they desire requires sensing their context, for example, sensing their location or proximity to resources.
This thesis presents both a context-aware system and a person re-identification method. A tabletop display was designed and prototyped with an infrared person-sensing context function. In experimental evaluation it exhibited tracking performance comparable to other more complex systems. A real-time, viewpoint invariant, person re-identification method is proposed based on a novel set of Viewpoint Invariant Multi-modal (ViMM) feature descriptors collected from depth-sensing cameras. The method uses colour and a combination of anthropometric properties logged as a function of body orientation. A neural network classifier is used to perform re-identification
Pathway to Future Symbiotic Creativity
This report presents a comprehensive view of our vision on the development
path of the human-machine symbiotic art creation. We propose a classification
of the creative system with a hierarchy of 5 classes, showing the pathway of
creativity evolving from a mimic-human artist (Turing Artists) to a Machine
artist in its own right. We begin with an overview of the limitations of the
Turing Artists then focus on the top two-level systems, Machine Artists,
emphasizing machine-human communication in art creation. In art creation, it is
necessary for machines to understand humans' mental states, including desires,
appreciation, and emotions, humans also need to understand machines' creative
capabilities and limitations. The rapid development of immersive environment
and further evolution into the new concept of metaverse enable symbiotic art
creation through unprecedented flexibility of bi-directional communication
between artists and art manifestation environments. By examining the latest
sensor and XR technologies, we illustrate the novel way for art data collection
to constitute the base of a new form of human-machine bidirectional
communication and understanding in art creation. Based on such communication
and understanding mechanisms, we propose a novel framework for building future
Machine artists, which comes with the philosophy that a human-compatible AI
system should be based on the "human-in-the-loop" principle rather than the
traditional "end-to-end" dogma. By proposing a new form of inverse
reinforcement learning model, we outline the platform design of machine
artists, demonstrate its functions and showcase some examples of technologies
we have developed. We also provide a systematic exposition of the ecosystem for
AI-based symbiotic art form and community with an economic model built on NFT
technology. Ethical issues for the development of machine artists are also
discussed