    Improving kNN for human activity recognition with privileged learning using translation models.

    Multiple sensor modalities provide more accurate Human Activity Recognition (HAR) than a single modality, yet consumers prefer a single modality as it is more convenient and less intrusive. This presents a challenge to researchers, as a single modality is likely to pick up movement that is both relevant and extraneous to the activity being tracked, leading to poorer performance. The goal of an optimal HAR solution is therefore to utilise the fewest sensors at deployment while maintaining the performance levels achievable with all available sensors. To this end, we introduce two translation approaches capable of generating missing modalities from available ones. These can be used to generate missing or 'privileged' modalities at deployment to augment case representations and improve HAR. We evaluate the presented translators with k-NN classifiers on two HAR datasets and achieve up to 5% performance improvements using representations augmented with privileged modalities. This suggests that non-intrusive modalities suited for deployment benefit from translation models that generate privileged modalities.
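    A minimal sketch of the translation idea under stated assumptions: a regressor (here ridge regression, standing in for whatever translation model the paper actually trains) learns to map the deployment modality to the privileged modality on training data where both are present, and the translated features augment the k-NN case representation at test time. All arrays and names below are illustrative, not the paper's datasets.

    import numpy as np
    from sklearn.linear_model import Ridge
    from sklearn.neighbors import KNeighborsClassifier

    rng = np.random.default_rng(0)
    # Toy stand-ins: a deployment modality (e.g. a single wrist IMU) and a
    # privileged modality that is only available at training time.
    X_dep = rng.normal(size=(200, 16))
    X_priv = X_dep @ rng.normal(size=(16, 8)) + 0.1 * rng.normal(size=(200, 8))
    y = (X_priv[:, 0] > 0).astype(int)  # toy activity labels

    # Translation model: learn the deployment -> privileged mapping.
    translator = Ridge(alpha=1.0).fit(X_dep, X_priv)

    # Train k-NN on the true augmented case representation.
    knn = KNeighborsClassifier(n_neighbors=5)
    knn.fit(np.hstack([X_dep, X_priv]), y)

    # At deployment only the single modality exists; translate, then classify.
    X_new = rng.normal(size=(50, 16))
    X_new_aug = np.hstack([X_new, translator.predict(X_new)])
    pred = knn.predict(X_new_aug)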

    A Survey on Multi-Resident Activity Recognition in Smart Environments

    Human activity recognition (HAR) is a rapidly growing field that utilizes smart devices, sensors, and algorithms to automatically classify and identify the actions of individuals within a given environment. These systems have a wide range of applications, including assisting with caring tasks, increasing security, and improving energy efficiency. However, several challenges must be addressed in order to effectively utilize HAR systems in multi-resident environments. A key challenge is accurately associating sensor observations with the identities of the individuals involved, which can be particularly difficult when residents are engaging in complex and collaborative activities. This paper provides a brief overview of the design and implementation of HAR systems, including a summary of the various data collection devices and approaches used for human activity identification. It also reviews previous research on the use of these systems in multi-resident environments and offers conclusions on the current state of the art in the field. Comment: 16 pages, to appear in Evolution of Information, Communication and Computing Systems (EICCS) Book Series.

    User-oriented recommender systems in retail

    User satisfaction is considered a key objective for all service provider platforms, regardless of the nature of the service, encompassing domains such as media, entertainment, retail, and information. While the goal of satisfying users is the same across different domains and services, considering domain-specific characteristics is of paramount importance to ensure users have a positive experience with a given system. User interaction data with a system is one of the main sources of data that facilitates achieving this goal. In this thesis, we investigate how to learn from domain-specific user interactions. We focus on recommendation as our main task, and retail as our main domain. We further explore the finance domain and the demand forecasting task as additional directions to understand whether our methodology and findings generalize to other tasks and domains. The research in this thesis is organized around the following dimensions: 1) Characteristics of multi-channel retail: we consider a retail setting where interaction data comes from both digital (i.e., online) and in-store (i.e., offline) shopping; 2) From user behavior to recommendation: we conduct extensive descriptive studies on user interaction log datasets that inform the design of recommender systems in two domains, retail and finance. Our key contributions in characterizing multi-channel retail are twofold. First, we propose a neural model that makes use of sales in multiple shopping channels in order to improve the performance of demand forecasting in a target channel. Second, we provide the first study of user behavior in a multi-channel retail setting, which results in insights about the channel-specific properties of user behavior, and their effects on the performance of recommender systems. We make three main contributions in designing user-oriented recommender systems. First, we provide a large-scale user behavior study in the finance domain, targeted at understanding financial information seeking behavior in user interactions with company filings. We then propose domain-specific user-oriented filing recommender systems that are informed by the findings of the user behavior analysis. Second, we analyze repurchasing behavior in retail, specifically in the grocery shopping domain. We then propose a repeat consumption-aware neural recommender for this domain. Third, we focus on scalable recommendation in retail and propose an efficient recommender system that explicitly models users' personal preferences that are reflected in their purchasing history.
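    A minimal sketch of the repeat-consumption idea from the grocery-shopping contribution, assuming a simple (user, item) interaction log; the thesis's actual neural recommender is not reproduced here. Items a user repurchases often are boosted relative to globally popular but never-repurchased items.

    from collections import Counter

    # Toy (user, item) purchase log.
    log = [("u1", "milk"), ("u1", "milk"), ("u1", "bread"),
           ("u2", "milk"), ("u2", "coffee")]

    user_counts, item_counts = {}, Counter()
    for user, item in log:
        user_counts.setdefault(user, Counter())[item] += 1
        item_counts[item] += 1

    def score(user, item, alpha=0.7):
        # Blend personal repurchase frequency with global popularity.
        personal = user_counts.get(user, Counter())
        p = personal[item] / max(sum(personal.values()), 1)
        g = item_counts[item] / len(log)
        return alpha * p + (1 - alpha) * g

    ranked = sorted(item_counts, key=lambda i: score("u1", i), reverse=True)
    print(ranked)  # milk ranks first for u1: repeat purchases dominate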

    Learning from Audio, Vision and Language Modalities for Affect Recognition Tasks

    The world around us, as well as our responses to worldly events, is multimodal in nature. For intelligent machines to integrate seamlessly into our world, it is imperative that they can process and derive useful information from multimodal signals. Such capabilities can be provided by employing multimodal learning algorithms that consider both the individual characteristics of unimodal signals and the complementarity of multimodal signals. Based on the number of modalities available during the training and testing phases, learning algorithms fall into three categories: unimodal trained and unimodal tested, multimodal trained and multimodal tested, and multimodal trained and unimodal tested algorithms. This thesis provides three contributions, one for each category, and focuses on three modalities that are important for human-human and human-machine communication, namely audio (paralinguistic speech), vision (facial expressions), and language (linguistic speech) signals. For several applications, whether due to hardware limitations or deployment specifications, unimodal trained and tested systems suffice. Our first contribution, for the unimodal trained and unimodal tested category, is an end-to-end deep neural network that uses raw speech signals as input for a computational paralinguistic task, namely verbal conflict intensity estimation. Our model, which uses a convolutional recurrent architecture equipped with an attention mechanism to focus on task-relevant instances of the input speech signal, eliminates the need for task-specific metadata or domain-knowledge-based manual refinement of hand-crafted generic features. The second contribution, for the multimodal trained and multimodal tested category, is a multimodal fusion framework that exploits both cross-modal (inter) and intra-modal interactions for categorical emotion recognition from audiovisual clips. We explore the effectiveness of two types of attention mechanism, namely intra-modal and cross-modal attention, by creating two versions of our fusion framework. In many applications, multimodal signals might be available during the model training phase, yet we cannot expect the availability of all modality signals during the testing phase. Our third contribution addresses this situation: we propose a framework for cross-modal learning in which paired audio-visual instances are used during training to develop test-time stand-alone unimodal models.
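    A minimal sketch of cross-modal attention fusion, the mechanism named in the second contribution, assuming generic per-frame audio and video features; the dimensions, class count, and module names are illustrative rather than the thesis's architecture. Audio frames attend over video frames, and the fused representation is classified.

    import torch
    import torch.nn as nn

    class CrossModalFusion(nn.Module):
        def __init__(self, dim=64, n_classes=7):
            super().__init__()
            self.attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
            self.head = nn.Linear(2 * dim, n_classes)

        def forward(self, audio, video):
            # Cross-modal attention: queries from audio, keys/values from video
            # (a symmetric video-to-audio branch could be added analogously).
            attended, _ = self.attn(query=audio, key=video, value=video)
            fused = torch.cat([audio.mean(1), attended.mean(1)], dim=-1)
            return self.head(fused)

    model = CrossModalFusion()
    audio = torch.randn(8, 50, 64)   # batch, audio frames, feature dim
    video = torch.randn(8, 30, 64)   # batch, video frames, feature dim
    logits = model(audio, video)     # (8, 7) emotion class scores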

    Analysis of sensorimotor rhythms based on lower-limbs motor imagery for brain-computer interface

    Over recent years, significant advancements in the field of assistive technologies have been observed. Beyond congenital or diagnosed chronic disorders, a paramount driver urging researchers to contribute to the field is the rising number of people worldwide affected by accidents, natural calamities (due to climate change), or warfare, resulting in spinal cord injuries (SCI), neural disorders, or amputation of limbs that impede a normal life. In addition, more than ten million people in the world live with some form of handicap due to a central nervous system (CNS) disorder. Biomedical devices for rehabilitation have been a centre of research focus for many years. For people with lost motor control or amputation but intact sensory control, deriving control signals from the source, i.e. electrophysiological signals, is vital for seamless control of assistive biomedical devices. Control signals, i.e. motion intentions, arise in the sensorimotor cortex of the brain and can be detected using invasive or non-invasive modalities. With a non-invasive modality, electroencephalography (EEG) is used to record these motion intentions encoded in the electrical activity of the cortex; they are deciphered to recognize the user's intent for locomotion and transferred to the actuator or end effector of the assistive device for control purposes. This can be executed via brain-computer interface (BCI) technology. BCI is an emerging research field that establishes a real-time bidirectional connection between the human brain and a computer/output device. Amongst its diverse applications, neurorehabilitation to deliver sensory feedback and brain-controlled biomedical devices for rehabilitation are the most popular. While there is substantial literature on BCI control of upper-limb assistive technologies, less is known about lower-limb (LL) control of biomedical devices for navigation or gait assistance via BCI.
    The types of EEG signals compatible with an independent BCI are the oscillatory/sensorimotor rhythms (SMR) and event-related potentials (ERP). These signals have successfully been used in BCIs for navigation control of assistive devices. However, the ERP paradigm requires a voluminous setup for stimulus presentation to the user during operation of a BCI assistive device. By contrast, SMR does not require a large setup for activation of cortical activity; it instead depends on motor imagery (MI) produced synchronously or asynchronously by the user. MI is a covert cognitive process, also termed kinaesthetic motor imagery (KMI), and elicits clearly after rigorous training trials, in the form of event-related desynchronization (ERD) or synchronization (ERS), depending on imagery activity or resting period. It usually comprises limb-movement tasks, but is not limited to them in a BCI paradigm. To produce detectable features that correlate with the user's intent, the selection of the cognitive task is an important aspect of improving BCI performance. MI used in BCI predominantly remains associated with the upper limbs, particularly the hands, due to the somatotopic organization of the motor cortex: the hand representation area is substantially large, in contrast to the anatomical location of the LL representation areas in the human sensorimotor cortex. The LL area is located within the interhemispheric fissure, i.e. between the mesial walls of both hemispheres of the cortex, which makes it arduous to detect EEG features elicited by imagination of the LL. Detailed investigation of the ERD/ERS in the mu and beta oscillatory rhythms during left and right LL KMI tasks is therefore required, as the user's intent to walk is of paramount importance in everyday activity. This is an important area of research, followed by the improvement of existing rehabilitation systems that serve LL affectees. Though challenging, solving these issues is imperative for the development of robust controllers that follow asynchronous BCI paradigms to operate LL assistive devices seamlessly.
    This thesis focusses on the investigation of cortical lateralization of ERD/ERS in the SMR, based on foot dorsiflexion KMI and knee extension KMI separately, and infers the possibility of deploying these features in a real-time BCI by finding the maximum possible classification accuracy from machine learning (ML) models. EEG signals are non-stationary, characterized by individual-to-individual and trial-to-trial variability and a low signal-to-noise ratio (SNR), and high-dimensional, with relatively few samples available for fitting ML models to the data; these factors made ML methods the tool of choice for analysing single-trial EEG data. Hence, the selection of an appropriate ML model for true detection of the class label without overfitting is crucial. The feature extraction part of the thesis consisted of testing the band-power (BP) and common spatial pattern (CSP) methods individually. The study focused on the synchronous BCI paradigm, to ensure the exhibition of SMR for a practically viable BCI control system. For the left vs. right foot KMI, the objective was to distinguish the bilateral tasks in order to use them as unilateral commands in a 2-class BCI for controlling/navigating a robotic/prosthetic LL for rehabilitation; the approach for left vs. right knee KMI was similar. The research was based on four main experimental studies. In addition, it includes a comparison of intra-cognitive tasks within the same limb, i.e. left foot vs. left knee and right foot vs. right knee tasks, respectively (Chapter 4). This adds a further novel contribution based on the comparison of different tasks within the same LL, and provides a basis for increasing the dimensionality of control signals within one BCI paradigm, such as a BCI-controlled LL assistive device with multiple degrees of freedom (DOF) for restoration of locomotion function. This study was based on analysis of the statistically significant mu ERD feature using the BP feature extraction method.
    The first stage of this research comprised the left vs. right foot KMI tasks, wherein the ERD/ERS elicited in the mu and beta rhythms were analysed using the BP feature extraction method (Chapter 5). Three individual features, i.e. mu ERD, beta ERD, and beta ERS, were investigated on EEG topography and time-frequency (TF) maps and the average time course of power percentage, using the common average reference and bipolar reference methods; a comparative study was drawn for both references to infer the optimal method. This was followed by ML, i.e. classification of the three feature vectors (mu ERD, beta ERD, and beta ERS) using linear discriminant analysis (LDA), support vector machine (SVM), and k-nearest neighbour (KNN) algorithms separately. Finally, multiple-comparison statistical tests were performed to identify the maximum possible classification accuracy amongst all paradigms for the most significant feature. All classifier models were supported by k-fold cross-validation and evaluation of the area under the receiver operating characteristic curve (AUC-ROC) for prediction of the true class label. The highest classification accuracy of 83.4% ± 6.72 was obtained with the KNN model for the beta ERS feature.
    The second study aimed at enhancing the classification accuracy obtained in Chapter 5, using similar cognitive tasks but a different feature extraction and classification methodology. ERD/ERS from the mu and beta rhythms were extracted using CSP and filter bank common spatial pattern (FBCSP) algorithms to optimize the individual spatial patterns (Chapter 6), followed by supervised logistic regression (Logreg) and LDA deployed separately. The maximum classification accuracy was 77.5% ± 4.23 with the FBCSP feature vector and the LDA model, with a maximum kappa coefficient of 0.55, in the moderate range of agreement between the two classes. The left vs. right foot discrimination results were nearly the same; however, the BP feature vector performed better than CSP.
    The third stage deployed the novel cognitive task of left vs. right knee extension KMI. Analysis of the ERD/ERS in the mu and beta rhythms was done to verify cortical lateralization via the BP feature vector (Chapter 7). As in Chapter 5, the ERD/ERS features were analysed on EEG topography and TF maps, followed by determination of the average time course and peak latency of feature occurrence. However, only the mu ERD and beta ERS features were considered, and the EEG recording method comprised only the common average reference, owing to the established results of the earlier foot study in Chapter 5, where beta ERD features showed lower average amplitude. The LDA and KNN classification algorithms were employed. Unexpectedly, the left vs. right knee KMI yielded the highest accuracy of 81.04% ± 7.5 and an AUC-ROC of 0.84, using the KNN model for the beta ERS feature, strong enough to be used in a real-time BCI as two independent control features.
    The final study followed the same paradigm as Chapter 6, but for the left vs. right knee KMI cognitive task (Chapter 8). It primarily aimed at enhancing the accuracy obtained in Chapter 7, using the CSP and FBCSP methods with the Logreg and LDA models, respectively. Results were in accordance with those of the established foot KMI study, i.e. the BP feature vector performed better than CSP. The highest classification accuracy of 70.00% ± 2.85, with a kappa score of 0.40, was obtained with Logreg using the FBCSP feature vector. The results support the utilization of ERD/ERS in the mu and beta bands as independent control features for discrimination of the bilateral foot or the novel bilateral knee KMI tasks, and the resulting classification accuracies imply that a 2-class BCI employing unilateral foot or knee KMI is suitable for real-time implementation.
    In conclusion, this thesis demonstrates possible EEG pre-processing, feature extraction, and classification methods with which to instigate a real-time BCI. The critical aspects of latency in information transfer rate, SNR, and the tradeoff between dimensionality and overfitting need to be taken care of during the design of a real-time BCI controller. The thesis also highlights the need for consensus on standardized cognitive tasks for MI-based BCI. Finally, the application of wireless EEG for portable assistance is essential, as it will help lay the foundations of an independent asynchronous BCI based on SMR.
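    A minimal sketch of the band-power (BP) pipeline described above, assuming epoched EEG as a (trials, channels, samples) array; the filter bands, montage, and classifier settings are illustrative. Log band power in the mu and beta bands per trial feeds an LDA classifier evaluated with k-fold cross-validation, mirroring the BP + LDA arm of the studies.

    import numpy as np
    from scipy.signal import butter, filtfilt
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
    from sklearn.model_selection import cross_val_score

    fs = 250                                    # sampling rate (Hz)
    rng = np.random.default_rng(1)
    epochs = rng.normal(size=(120, 8, 2 * fs))  # trials x channels x samples
    labels = rng.integers(0, 2, size=120)       # left vs. right KMI (toy)

    def band_power(x, lo, hi):
        # Band-pass filter, then take log variance as the band-power feature.
        b, a = butter(4, [lo / (fs / 2), hi / (fs / 2)], btype="band")
        filtered = filtfilt(b, a, x, axis=-1)
        return np.log(np.mean(filtered ** 2, axis=-1))

    mu = band_power(epochs, 8, 12)              # mu-band power per channel
    beta = band_power(epochs, 13, 30)           # beta-band power per channel
    features = np.hstack([mu, beta])

    lda = LinearDiscriminantAnalysis()
    acc = cross_val_score(lda, features, labels, cv=10)  # 10-fold CV accuracy
    print(acc.mean(), acc.std())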

    Affective Image Content Analysis: Two Decades Review and New Perspectives

    Images can convey rich semantics and induce various emotions in viewers. Recently, with the rapid advancement of emotional intelligence and the explosive growth of visual data, extensive research efforts have been dedicated to affective image content analysis (AICA). In this survey, we comprehensively review the development of AICA over the last two decades, focusing especially on state-of-the-art methods with respect to three main challenges -- the affective gap, perception subjectivity, and label noise and absence. We begin with an introduction to the key emotion representation models that have been widely employed in AICA and a description of available datasets for performing evaluation, with a quantitative comparison of label noise and dataset bias. We then summarize and compare the representative approaches on (1) emotion feature extraction, including both handcrafted and deep features, (2) learning methods for dominant emotion recognition, personalized emotion prediction, emotion distribution learning, and learning from noisy data or few labels, and (3) AICA-based applications. Finally, we discuss some challenges and promising research directions for the future, such as image content and context understanding, group emotion clustering, and viewer-image interaction. Comment: Accepted by IEEE TPAMI.
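    A minimal sketch of emotion distribution learning, one of the learning settings surveyed above, under illustrative dimensions and data: instead of a single dominant label, a model is trained to match a per-image distribution over emotion categories with a KL-divergence loss.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    n_emotions = 8
    model = nn.Sequential(nn.Linear(512, 128), nn.ReLU(), nn.Linear(128, n_emotions))

    feats = torch.randn(16, 512)  # image features (e.g. from a CNN backbone)
    # Target: annotator-derived distribution over emotion categories per image.
    target = F.softmax(torch.randn(16, n_emotions), dim=-1)

    log_pred = F.log_softmax(model(feats), dim=-1)
    loss = F.kl_div(log_pred, target, reduction="batchmean")  # KL(target || pred)
    loss.backward()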