A novel hybrid deep learning model for human activity recognition based on transitional activities
In recent years, a plethora of algorithms have been devised for efficient human activity recognition. Most of these algorithms consider only basic human activities and neglect postural transitions because of their subsidiary occurrence and short duration. However, postural transitions play a significant role in a complete activity recognition framework and cannot be neglected. This work proposes a hybrid multi-model activity recognition approach that handles both basic and transition activities by utilizing multiple deep learning models simultaneously. For final classification, a dynamic decision fusion module is introduced. The experiments are performed on publicly available datasets. The proposed approach achieved a classification accuracy of 96.11% and 98.38% for the transition and basic activities, respectively. The outcomes show that the proposed method is superior to the state-of-the-art methods in terms of accuracy and precision.
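The abstract does not specify the fusion rule, but a dynamic decision fusion module of the kind described can be illustrated as a confidence-weighted combination of the per-model class distributions. The sketch below is a hypothetical minimal example, not the paper's actual module; the weighting-by-max-probability heuristic is an assumption.

```python
import numpy as np

def dynamic_decision_fusion(probs_basic, probs_transition):
    """Fuse class probabilities from two activity models, weighting each
    by its confidence (its max softmax probability). Illustrative sketch
    only; the paper's fusion rule may differ."""
    probs_basic = np.asarray(probs_basic, dtype=float)
    probs_transition = np.asarray(probs_transition, dtype=float)
    w_basic = probs_basic.max()        # confidence of the basic-activity model
    w_trans = probs_transition.max()   # confidence of the transition model
    fused = w_basic * probs_basic + w_trans * probs_transition
    return fused / fused.sum()         # renormalize to a valid distribution

# Two models disagree; the more confident one dominates the fused decision.
p_basic = [0.7, 0.2, 0.1]
p_trans = [0.3, 0.4, 0.3]
fused = dynamic_decision_fusion(p_basic, p_trans)
```

Because each model's vote is scaled by its own confidence, a model that is unsure on a given window contributes less to the final label.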
Speaker diarization assisted ASR for multi-speaker conversations
In this paper, we propose a novel approach for the transcription of speech
conversations with natural speaker overlap, from single channel recordings. We
propose a combination of a speaker diarization system and a hybrid automatic
speech recognition (ASR) system with speaker activity assisted acoustic model
(AM). An end-to-end neural network system is used for speaker diarization. Two
architectures, (i) input conditioned AM, and (ii) gated features AM, are
explored to incorporate the speaker activity information. The models output
speaker-specific senones. The experiments on Switchboard telephone
conversations show the advantage of incorporating speaker activity information
in the ASR system for recordings with overlapped speech. In particular, an
absolute improvement in word error rate (WER) is observed for the
proposed approach on natural conversational speech with automatic diarization.
Comment: Manuscript submitted to INTERSPEECH 202
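Of the two architectures mentioned, the "gated features" idea can be pictured as masking per-frame acoustic features with the diarization system's speaker-activity estimate. The snippet below is a simplified, hypothetical illustration (a fixed elementwise gate); in the paper the gating would be learned inside the acoustic model.

```python
import numpy as np

def gate_features(frames, speaker_activity):
    """Gate per-frame acoustic features with a speaker-activity mask,
    one simple way to condition an acoustic model on diarization output.
    frames: (T, D) feature matrix; speaker_activity: (T,) values in [0, 1]."""
    frames = np.asarray(frames, dtype=float)
    activity = np.asarray(speaker_activity, dtype=float)
    return frames * activity[:, None]   # suppress frames where the speaker is inactive

T, D = 4, 3
feats = np.ones((T, D))
activity = np.array([1.0, 0.0, 0.5, 1.0])   # e.g. silence at frame 1, overlap at frame 2
gated = gate_features(feats, activity)
```

The effect is that frames attributed to a different speaker contribute little to that speaker's senone posteriors, which is the intuition behind conditioning the AM on speaker activity.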
Attention Mechanism for Adaptive Feature Modelling
This thesis presents groundbreaking contributions in machine learning by exploring and advancing attention mechanisms within deep learning frameworks. We introduce innovative models and techniques that significantly enhance feature recognition and analysis in two key application areas: computer vision recognition and time series modeling. Our primary contributions include the development of a dual attention mechanism for crowd counting and the integration of supervised and unsupervised learning techniques for semi-supervised learning. Furthermore, we propose a novel Dynamic Unary Convolution in Transformer (DUCT) model for generalized visual recognition tasks, and investigate the efficacy of attention mechanisms in human activity recognition using time series data from wearable sensors based on the semi-supervised setting.
The capacity of humans to selectively focus on specific elements within complex scenes has long inspired machine learning research. Attention mechanisms, which dynamically modify weights to emphasize different input elements, are central to replicating this human perceptual ability in deep learning. These mechanisms have proven crucial in achieving significant advancements across various tasks.
In this thesis, we first provide a comprehensive review of the existing literature on attention mechanisms. We then introduce a dual attention mechanism for crowd counting, which employs both second-order and first-order attention to enhance spatial information processing and feature distinction. Additionally, we explore the convergence of supervised and unsupervised learning, focusing on a novel semi-supervised method that synergizes labeled and unlabeled data through an attention-driven recurrent unit and dual loss functions. This method aims to refine crowd counting in practical transportation scenarios.
Moreover, our research extends to a hybrid attention model for broader visual recognition challenges. By merging convolutional and transformer layers, this model adeptly handles multi-level features, where the DUCT modules play a pivotal role. We rigorously evaluate DUCT's performance across critical computer vision tasks. Finally, recognizing the significance of time series data in domains like health surveillance, we apply our proposed attention mechanism to human activity recognition, analyzing correlations between various daily activities to enhance the adaptability of deep learning frameworks to temporal dynamics.
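The mechanism common to all of these contributions — dynamically computed weights that emphasize different input elements — is captured by the textbook scaled dot-product attention operation. The sketch below shows that generic operation only; it is not the thesis's dual attention or DUCT module.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    """weights = softmax(Q K^T / sqrt(d)); output = weights @ V.
    Each output row is a data-dependent weighted average of the rows of V."""
    d = Q.shape[-1]
    weights = softmax(Q @ K.T / np.sqrt(d))
    return weights @ V, weights

Q = np.array([[1.0, 0.0], [0.0, 1.0]])              # 2 queries
K = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])  # 3 keys
V = np.array([[1.0], [2.0], [3.0]])                 # 3 values
out, w = scaled_dot_product_attention(Q, K, V)
```

Each row of `w` is a probability distribution over the inputs, which is what lets attention-based models selectively focus on the most relevant elements of a scene or time series.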
CAVIAR: Context-driven Active and Incremental Activity Recognition
Activity recognition on mobile device sensor data has been an active research area in mobile and pervasive computing for several years. While the majority of the proposed techniques are based on supervised learning, semi-supervised approaches are being considered to reduce the size of the training set required to initialize the model. These approaches usually apply self-training or active learning to incrementally refine the model, but their effectiveness seems to be limited to a restricted set of physical activities. We claim that the context which surrounds the user (e.g., time, location, proximity to transportation routes), combined with common knowledge about the relationship between context and human activities, could significantly increase the set of recognized activities, including those that are difficult to discriminate using inertial sensors alone and the highly context-dependent ones. In this paper, we propose CAVIAR, a novel hybrid semi-supervised and knowledge-based system for real-time activity recognition. Our method applies semantic reasoning on context data to refine the predictions of an incremental classifier. The recognition model is continuously updated using active learning. Results on a real dataset obtained from 26 subjects show the effectiveness of our approach in increasing the recognition rate, extending the number of recognizable activities and, most importantly, reducing the number of queries triggered by active learning. In order to evaluate the impact of context reasoning, we also compare CAVIAR with a purely statistical version, considering features computed on context data as part of the machine learning process.
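The core interaction described — symbolic context knowledge filtering a statistical classifier's output, with implausible predictions triggering an active-learning query — can be sketched as below. The rule format, activity names, and threshold are hypothetical illustrations, not CAVIAR's actual ontology or reasoner.

```python
def refine_with_context(prediction, confidence, context, rules, threshold=0.6):
    """Refine a classifier's prediction using context rules, in the spirit
    of combining statistical and knowledge-based recognition.
    rules[activity] is a predicate over the context dict (hypothetical format).
    Returns (label_or_None, confident_enough_to_skip_query)."""
    plausible = rules.get(prediction, lambda c: True)(context)
    if plausible:
        return prediction, confidence >= threshold
    # Prediction contradicts context knowledge: reject it and
    # let active learning query the user for the true label.
    return None, False

rules = {
    "cycling": lambda c: c["location"] != "indoors",
    "cooking": lambda c: c["location"] == "kitchen",
}
label, confident = refine_with_context("cycling", 0.8,
                                       {"location": "indoors"}, rules)
```

The benefit reported in the abstract — fewer active-learning queries — comes from the complementary case: when context agrees with a confident prediction, no query is needed at all.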
Hybrid Predictive Coding: Inferring, Fast and Slow
Predictive coding is an influential model of cortical neural activity. It
proposes that perceptual beliefs are furnished by sequentially minimising
"prediction errors" - the differences between predicted and observed data.
Implicit in this proposal is the idea that perception requires multiple cycles
of neural activity. This is at odds with evidence that several aspects of
visual perception - including complex forms of object recognition - arise from
an initial "feedforward sweep" that occurs on fast timescales which preclude
substantial recurrent activity. Here, we propose that the feedforward sweep can
be understood as performing amortized inference and recurrent processing can be
understood as performing iterative inference. We propose a hybrid predictive
coding network that combines both iterative and amortized inference in a
principled manner by describing both in terms of a dual optimization of a
single objective function. We show that the resulting scheme can be implemented
in a biologically plausible neural architecture that approximates Bayesian
inference utilising local Hebbian update rules. We demonstrate that our hybrid
predictive coding model combines the benefits of both amortized and iterative
inference -- obtaining rapid and computationally cheap perceptual inference for
familiar data while maintaining the context-sensitivity, precision, and sample
efficiency of iterative inference schemes. Moreover, we show how our model is
inherently sensitive to its uncertainty and adaptively balances iterative and
amortized inference to obtain accurate beliefs using minimum computational
expense. Hybrid predictive coding offers a new perspective on the functional
relevance of the feedforward and recurrent activity observed during visual
perception and offers novel insights into distinct aspects of visual
phenomenology.
Comment: 05/04/22 initial upload. 06/04/22 added acknowledgements section.
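The division of labor the abstract describes — a fast amortized guess refined by iterative minimization of prediction error — can be shown on a toy linear generative model. This is a deliberately simplified sketch (a single linear layer, gradient descent on a squared prediction error); the paper's model is a hierarchical network with Hebbian updates.

```python
import numpy as np

def hybrid_inference(x, W, amortized_net, steps=200, lr=0.1):
    """Toy hybrid predictive coding: an amortized network proposes an
    initial latent estimate z, then iterative inference refines it by
    gradient descent on the prediction error ||x - W z||^2."""
    z = amortized_net(x)                 # fast feedforward sweep: cheap initial guess
    for _ in range(steps):
        err = x - W @ z                  # prediction error (observed - predicted)
        z = z + lr * (W.T @ err)         # recurrent refinement step
    return z

W = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.5, 0.5]])              # toy generative weights
z_true = np.array([1.0, -2.0])
x = W @ z_true                           # observed data

# A deliberately coarse "amortized network": a fixed linear map.
amortized = lambda x: 0.5 * (W.T @ x)
z_hat = hybrid_inference(x, W, amortized)
```

For familiar inputs a well-trained amortized network would land near the answer immediately, so few iterative steps are needed — which is the computational saving the paper attributes to hybrid inference.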
Wearable Sensor Data Based Human Activity Recognition using Machine Learning: A new approach
Recent years have witnessed the rapid development of human activity
recognition (HAR) based on wearable sensor data. One can find many practical
applications in this area, especially in the field of health care. Many machine
learning algorithms such as Decision Trees, Support Vector Machine, Naive
Bayes, K-Nearest Neighbor, and Multilayer Perceptron are successfully used in
HAR. Although these methods are fast and easy for implementation, they still
have some limitations due to poor performance in a number of situations. In
this paper, we propose a novel method based on ensemble learning to boost
the performance of these machine learning methods for HAR.
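The abstract does not say which ensemble scheme is used, but the simplest instance of the idea — combining the base classifiers listed (SVM, k-NN, trees, etc.) so that their errors cancel — is majority voting over per-window labels. The sketch below is that generic scheme, not the paper's method; the activity labels are made up for illustration.

```python
from collections import Counter

def majority_vote(predictions_per_model):
    """Combine per-window activity labels from several base classifiers
    by majority vote. predictions_per_model: list of equal-length label
    sequences, one per classifier."""
    fused = []
    for labels in zip(*predictions_per_model):          # one tuple per window
        fused.append(Counter(labels).most_common(1)[0][0])
    return fused

# Hypothetical per-window outputs from three base classifiers:
svm_preds  = ["walk", "sit", "walk", "stand"]
knn_preds  = ["walk", "sit", "run",  "stand"]
tree_preds = ["run",  "sit", "walk", "sit"]
fused = majority_vote([svm_preds, knn_preds, tree_preds])
# fused → ["walk", "sit", "walk", "stand"]
```

A single base classifier's mistake on a window (e.g. the tree's "run" on window 0) is outvoted as long as the other models are right, which is how the ensemble lifts accuracy above its weakest members.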