
    Practical and Rich User Digitization

    A long-standing vision in computer science has been to evolve computing devices into proactive assistants that enhance our productivity, health and wellness, and many other facets of our lives. User digitization is crucial to achieving this vision, as it allows computers to intimately understand their users, capturing activity, pose, routine, and behavior. Today's consumer devices, like smartphones and smartwatches, provide a glimpse of this potential, offering coarse digital representations of users with metrics such as step count, heart rate, and a handful of human activities like running and biking. Even these very low-dimensional representations already bring value to millions of people's lives, but there is significant room for improvement. At the other end, professional, high-fidelity, comprehensive user digitization systems exist: for example, motion capture suits and multi-camera rigs that digitize our full body and appearance, and scanning machines such as MRI that capture our detailed anatomy. However, these carry significant practicality burdens for users, such as financial, privacy, ergonomic, aesthetic, and instrumentation considerations, that preclude consumer use. In general, the higher the fidelity of capture, the lower the practicality for the user. Most conventional approaches strike a balance between user practicality and digitization fidelity. My research aims to break this trend, developing sensing systems that increase user digitization fidelity to create new and powerful computing experiences while retaining or even improving user practicality and accessibility, allowing such technologies to have a societal impact. Armed with such knowledge, our future devices could offer longitudinal health tracking, more productive work environments, full-body avatars in extended reality, and embodied telepresence experiences, to name just a few domains. Comment: PhD thesis

    Multimodal Learning and Its Application to Mobile Active Authentication

    Mobile devices are becoming increasingly popular due to their flexibility and convenience in managing personal information such as bank accounts, profiles, and passwords. With the increasing use of mobile devices comes the issue of security, as the loss of a smartphone would compromise the user's personal information. Traditional methods for authenticating users on mobile devices are based on passwords or fingerprints. Once a device has been unlocked, however, it remains active without any mechanism for verifying whether the originally authenticated user is still the one in control of it. Thus, unauthorized individuals may improperly obtain access to the user's personal information if a password is compromised or if the user does not exercise adequate vigilance after initial authentication on a device. To deal with this problem, active authentication systems have been proposed in which users are continuously monitored after the initial access to the mobile device. Active authentication systems can capture users' data (facial images, screen touch data, motion data, etc.) through sensors (camera, touch screen, accelerometer, etc.), extract features from each sensor's data, build classification models, and authenticate users by comparing additional sensor data against the models. Mobile active authentication can be viewed as one application of a more general problem, namely multimodal classification. The idea of multimodal classification is to use multiple sources (modalities) measuring the same instance to improve overall performance compared with using a single source (modality). Multimodal classification also arises in many computer vision tasks such as image classification, RGBD object classification, and scene recognition.
In this dissertation, we not only present methods and algorithms related to active authentication problems, but also propose multimodal recognition algorithms based on low-rank and joint sparse representations, as well as a multimodal metric learning algorithm, to improve multimodal classification performance. The multimodal learning algorithms proposed in this dissertation make no assumptions about the feature type or application, so they can be applied to various recognition tasks such as mobile active authentication, image classification, and RGBD recognition. First, we study the mobile active authentication problem using a dataset consisting of the faces of 50 users, captured by the phone's front-facing camera, together with their screen touch data, for evaluating the active authentication algorithms developed in this research. The dataset is named the UMD Active Authentication (UMDAA) dataset, and the data preprocessing and feature extraction steps for the touch and face data are described separately. Second, we present an approach for active user authentication using screen touch gestures by building linear and kernelized dictionaries based on sparse representations and associated classifiers. Experiments using the screen touch data components of the UMDAA dataset, as well as two other publicly available screen touch datasets, show that the dictionary-based classification method compares favorably with those discussed in the literature. Experiments on screen touch data collected in three different sessions show a drop in performance when the training and test data come from different sessions. This suggests a need for domain adaptation methods to further improve the performance of the classifiers. Third, we propose a domain adaptive sparse representation-based classification method that learns projections of the data into a space where their sparsity is maintained. We provide an efficient iterative procedure for solving the proposed optimization problem.
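To make the idea of dictionary-based classification concrete, the sketch below implements the classic sparse-representation classifier (SRC): training samples are stacked as dictionary columns, a test sample is sparsely coded over them, and it is assigned to the class whose atoms reconstruct it with the smallest residual. This is a minimal illustration under stated assumptions, not the dissertation's method: it uses raw training samples instead of learned linear or kernelized dictionaries, and orthogonal matching pursuit (OMP) is swapped in as the sparse coder. The function names and the `n_nonzero` sparsity budget are hypothetical.

```python
import numpy as np

def omp(D, y, n_nonzero):
    """Greedy sparse code of y over the columns of D (orthogonal matching pursuit)."""
    residual = y.copy()
    support = []
    x = np.zeros(D.shape[1])
    for _ in range(n_nonzero):
        # Pick the atom most correlated with the current residual.
        idx = int(np.argmax(np.abs(D.T @ residual)))
        if idx not in support:
            support.append(idx)
        # Re-fit all coefficients on the current support by least squares.
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coef
    x[support] = coef
    return x

def src_classify(D, labels, y, n_nonzero=5):
    """Assign y to the class whose atoms alone reconstruct it best (assumed SRC rule)."""
    D = D / np.linalg.norm(D, axis=0)  # unit-norm dictionary atoms
    y = y / np.linalg.norm(y)
    x = omp(D, y, n_nonzero)
    classes = np.unique(labels)
    # Keep only the coefficients belonging to class c and measure the residual.
    residuals = [np.linalg.norm(y - D @ np.where(labels == c, x, 0.0)) for c in classes]
    return classes[int(np.argmin(residuals))]

# Toy example: two classes of 3-D training samples stored as dictionary columns.
D = np.array([[1.0, 0.9, 1.0, 0.0, 0.1, 0.0],
              [0.0, 0.1, 0.0, 1.0, 0.9, 1.0],
              [0.0, 0.0, 0.1, 0.0, 0.0, 0.1]])
labels = np.array([0, 0, 0, 1, 1, 1])
print(src_classify(D, labels, np.array([0.95, 0.05, 0.02]), n_nonzero=2))
```

Classifying by class-wise residual, rather than by the largest single coefficient, lets the decision use all atoms of a class that participate in the code.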
One of the key features of the proposed method is its computational efficiency, since learning is done in the lower-dimensional space. Various experiments on the UMDAA dataset show that our method captures the meaningful structure of the data and performs significantly better than many competitive domain adaptation algorithms. Fourth, we propose low-rank and joint sparse representation-based multimodal recognition. Our formulations can be viewed as generalized versions of multivariate low-rank and sparse regression, in which sparse and low-rank representations are imposed across all the modalities. One of our methods takes into account the coupling information among different modalities simultaneously by enforcing a common low-rank and joint sparse representation across each modality's observations. We also modify our formulations to include an occlusion term that is assumed to be sparse. The alternating direction method of multipliers (ADMM) is used to efficiently solve the proposed optimization problems. Extensive experiments on the UMDAA dataset, the WVU multimodal biometrics dataset, and the Pascal-Sentence image classification dataset show that our methods provide better recognition performance than other feature-level fusion methods. Finally, we propose a hierarchical multimodal metric learning algorithm for multimodal data in order to improve multimodal classification performance. We design the metric for each modality as a product of two matrices: one is modality specific, while the other is shared by all the modalities. The modality-specific projection matrices capture the varying characteristics exhibited by the different modalities, and the common projection matrix establishes the relationship between the distance metrics corresponding to the different modalities.
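ADMM solvers for objectives that combine a nuclear-norm (low-rank) penalty with an ℓ1,2 (joint sparse) penalty typically split the problem so that each penalty is handled by its closed-form proximal step. The sketch below shows those two standard steps, assuming the coefficient matrix has one row per dictionary atom and one column per modality, so that joint sparsity means whole rows are switched on or off together. The abstract does not give the exact formulation, so this illustrates the generic building blocks, not the dissertation's solver; function names are hypothetical.

```python
import numpy as np

def svt(Z, tau):
    """Proximal operator of tau * nuclear norm: soft-threshold the singular values."""
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    # Broadcasting scales column j of U by the shrunken singular value s[j].
    return (U * np.maximum(s - tau, 0.0)) @ Vt

def row_shrink(Z, tau):
    """Proximal operator of tau * l_{1,2} norm: shrink each row toward zero as a group."""
    norms = np.linalg.norm(Z, axis=1, keepdims=True)
    # Rows whose l2 norm falls below tau are zeroed across all modalities at once.
    scale = np.maximum(1.0 - tau / np.maximum(norms, 1e-12), 0.0)
    return Z * scale

print(svt(np.diag([3.0, 1.0]), 2.0))
print(row_shrink(np.array([[3.0, 4.0], [0.3, 0.4]]), 1.0))
```

Inside an ADMM iteration, the auxiliary variables tied to the low-rank and joint-sparse terms would be updated with `svt` and `row_shrink` respectively, followed by the usual dual-variable updates.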
The learned metrics significantly improve classification accuracy, and experimental results on tagged image classification as well as various RGBD recognition problems show that the proposed algorithm outperforms existing learning algorithms based on multiple metrics, as well as other state-of-the-art approaches tested on these datasets. Furthermore, we make the proposed multimodal metric learning algorithm non-linear by using kernel methods.
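One plausible reading of "a product of two matrices" is sketched below: each modality k has its own map `W_k` from its native feature dimension to a common intermediate dimension, followed by a shared map `W0`, so that the squared distance is ||W0 W_k (x - y)||² and the implied Mahalanobis matrix (W0 W_k)ᵀ(W0 W_k) is positive semidefinite by construction. The dimensions, the random matrices, and the modality names are assumptions for illustration; learning `W_k` and `W0` is not shown.

```python
import numpy as np

def projection(W_k, W0):
    """Full projection for modality k: modality-specific map, then the shared map."""
    return W0 @ W_k

def metric_matrix(W_k, W0):
    """Implied Mahalanobis matrix M_k = (W0 W_k)^T (W0 W_k), PSD by construction."""
    P = projection(W_k, W0)
    return P.T @ P

def modality_distance(x, y, W_k, W0):
    """Squared distance d_k(x, y) = ||W0 W_k (x - y)||^2 for a pair from modality k."""
    diff = projection(W_k, W0) @ (x - y)
    return float(diff @ diff)

rng = np.random.default_rng(0)
r, p = 3, 2                        # assumed intermediate and output dimensions
W0 = rng.normal(size=(p, r))       # shared by all modalities
W_face = rng.normal(size=(r, 5))   # hypothetical 5-D face features
W_touch = rng.normal(size=(r, 4))  # hypothetical 4-D touch features

x, y = rng.normal(size=5), rng.normal(size=5)
d = modality_distance(x, y, W_face, W0)
print(d, (x - y) @ metric_matrix(W_face, W0) @ (x - y))  # the two values agree
```

Because `W0` appears in every modality's metric, updating it during learning couples the distances of all modalities, which is one way the shared factor can "establish the relationship" between them.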

    Developing an Autonomous Mobile Robotic Device for Monitoring and Assisting Older People

    A progressive increase in the world's elderly population has called for technological solutions capable of improving the life prospects of people suffering from senile dementias such as Alzheimer's disease. Socially Assistive Robotics (SAR) applied to elderly care is a solution that can, through observation and monitoring of behaviors, ensure older people's safety and improve their physical and cognitive health. A social robot can autonomously and tirelessly monitor a person daily, providing assistive tasks such as reminding them to take medication and suggesting activities that keep the assisted person physically and cognitively active. However, many projects in this area have not considered the preferences, needs, personality, and cognitive profiles of older people. Moreover, other projects have developed specific robotic applications that are difficult to reuse and adapt to other hardware devices and other functional contexts. This thesis presents the development of a scalable, modular, multi-tenant robotic application and its testing in real-world environments. This work is part of the UPA4SAR project, "User-centered Profiling and Adaptation for Socially Assistive Robotics", which aimed to develop a low-cost robotic application for faster deployment among the elderly population. The architecture of the proposed robotic system is modular, robust, and scalable because its functionality is developed as microservices with event-based communication. To improve robot acceptance, the functionalities delivered through the microservices adapt the robot's behaviors to the preferences and personality of the assisted person. A key part of the assistance is the monitoring of activities, which are recognized through deep neural network models proposed in this work. The project's final experiments, carried out in the homes of elderly volunteers, were performed with the robotic system operating fully autonomously.
Daily care plans customized to each person's needs and preferences were executed. These included notification tasks reminding the person when to take medication, tasks checking whether basic nutrition activities had been accomplished, and entertainment and companionship tasks with games, videos, and music for the cognitive and physical stimulation of the patient.
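The event-based microservice style mentioned above can be illustrated with a minimal in-process publish/subscribe sketch. A real deployment would use a message broker between separate services; here a hypothetical `EventBus` stands in for it, and a hypothetical `ReminderService` reacts to a care-plan event and emits a speech request personalised with the person's preferred name. The topic names, event fields, and classes are all assumptions, not the UPA4SAR interfaces.

```python
from collections import defaultdict
from typing import Callable, Dict, List

Handler = Callable[[dict], None]

class EventBus:
    """In-process stand-in for a message broker: topics map to subscriber callbacks."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, event: dict) -> None:
        # Deliver the event to every handler registered for this topic.
        for handler in self._subscribers[topic]:
            handler(event)

class ReminderService:
    """Hypothetical microservice: turns care-plan events into personalised speech."""

    def __init__(self, bus: EventBus, preferred_name: str) -> None:
        self.bus = bus
        self.preferred_name = preferred_name
        bus.subscribe("careplan.medication_due", self.on_medication_due)

    def on_medication_due(self, event: dict) -> None:
        text = f"{self.preferred_name}, it is time to take your {event['drug']}."
        self.bus.publish("robot.say", {"text": text})

bus = EventBus()
bus.subscribe("robot.say", lambda e: print("Robot says:", e["text"]))
ReminderService(bus, "Maria")
bus.publish("careplan.medication_due", {"drug": "blood-pressure tablet"})
```

Services only share topic names and event shapes, so a reminder, nutrition-check, or entertainment service can be added or replaced without touching the others, which is the modularity argument made in the abstract.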

    Real-time person re-identification for interactive environments

    The work presented in this thesis was motivated by a vision of the future in which intelligent environments in public spaces, such as galleries and museums, deliver useful and personalised services to people via natural interaction, that is, without requiring people to provide explicit instructions via tangible interfaces. Delivering the right services to the right people requires a means of biometrically identifying individuals and then re-identifying them as they move freely through the environment. Delivering the services they desire requires sensing their context, for example their location or proximity to resources. This thesis presents both a context-aware system and a person re-identification method. A tabletop display was designed and prototyped with an infrared person-sensing context function; in experimental evaluation, it exhibited tracking performance comparable to other, more complex systems. A real-time, viewpoint-invariant person re-identification method is proposed, based on a novel set of Viewpoint Invariant Multi-modal (ViMM) feature descriptors collected from depth-sensing cameras. The method uses colour and a combination of anthropometric properties logged as a function of body orientation. A neural network classifier is used to perform re-identification.
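One way to make body measurements comparable across viewpoints is to log them per orientation bin, so that a probe is only compared with gallery entries seen from a similar angle. The sketch below assumes eight 45-degree bins, two illustrative anthropometric features (for example height and shoulder width in metres), and nearest-neighbour matching over shared bins. It is a hypothetical illustration of "logged as a function of body orientation", not the ViMM descriptor itself; it omits colour and swaps in a simple distance rule for the thesis's neural network classifier.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

N_BINS = 8  # assumption: 45-degree body-orientation sectors

Descriptor = Dict[int, List[float]]

def orientation_bin(theta_deg: float, n_bins: int = N_BINS) -> int:
    """Map a body orientation in degrees (any sign) to a sector index."""
    return int((theta_deg % 360.0) // (360.0 / n_bins))

def build_descriptor(observations: Sequence[Tuple[float, Sequence[float]]]) -> Descriptor:
    """Average each anthropometric measurement within its orientation bin."""
    sums: Dict[int, List[float]] = {}
    counts: Dict[int, int] = defaultdict(int)
    for theta, feats in observations:
        b = orientation_bin(theta)
        if b not in sums:
            sums[b] = [0.0] * len(feats)
        sums[b] = [s + f for s, f in zip(sums[b], feats)]
        counts[b] += 1
    return {b: [s / counts[b] for s in v] for b, v in sums.items()}

def descriptor_distance(a: Descriptor, b: Descriptor) -> float:
    """Mean Euclidean distance over orientation bins observed in both descriptors."""
    shared = set(a) & set(b)
    if not shared:
        return math.inf  # no common viewpoint, so no basis for comparison
    return sum(math.dist(a[k], b[k]) for k in shared) / len(shared)

def reidentify(probe: Descriptor, gallery: Dict[str, Descriptor]) -> str:
    """Return the gallery identity closest to the probe."""
    return min(gallery, key=lambda pid: descriptor_distance(probe, gallery[pid]))

gallery = {
    "alice": build_descriptor([(0.0, [1.70, 0.45]), (90.0, [1.70, 0.30])]),
    "bob": build_descriptor([(0.0, [1.85, 0.50]), (90.0, [1.85, 0.35])]),
}
print(reidentify(build_descriptor([(10.0, [1.71, 0.44])]), gallery))
```

Binning by orientation matters because apparent widths change as a person turns; comparing only within matching bins avoids penalising a frontal view against a side view of the same person.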

    Pathway to Future Symbiotic Creativity

    This report presents a comprehensive view of our vision for the development path of human-machine symbiotic art creation. We propose a classification of creative systems into a hierarchy of five classes, showing the pathway along which creativity evolves from a human-mimicking artist (Turing Artists) to a Machine Artist in its own right. We begin with an overview of the limitations of Turing Artists, then focus on the top two levels of systems, Machine Artists, emphasizing machine-human communication in art creation. In art creation, machines need to understand humans' mental states, including desires, appreciation, and emotions, and humans also need to understand machines' creative capabilities and limitations. The rapid development of immersive environments, and their further evolution into the new concept of the metaverse, enables symbiotic art creation through unprecedented flexibility of bi-directional communication between artists and art manifestation environments. By examining the latest sensor and XR technologies, we illustrate a novel way of collecting art data that forms the basis of a new form of bidirectional human-machine communication and understanding in art creation. Based on these communication and understanding mechanisms, we propose a novel framework for building future Machine Artists, grounded in the philosophy that a human-compatible AI system should be based on the "human-in-the-loop" principle rather than the traditional "end-to-end" dogma. By proposing a new form of inverse reinforcement learning model, we outline the platform design of Machine Artists, demonstrate its functions, and showcase some examples of technologies we have developed. We also provide a systematic exposition of the ecosystem for AI-based symbiotic art forms and communities, with an economic model built on NFT technology. Ethical issues in the development of Machine Artists are also discussed.