1,230 research outputs found

    Towards An Intelligent Fuzzy Based Multimodal Two Stage Speech Enhancement System

    Get PDF
    This thesis presents a novel two-stage multimodal speech enhancement system, making use of both visual and audio information to filter speech, and explores the extension of this system with fuzzy logic to demonstrate proof of concept for an envisaged autonomous, adaptive, and context-aware multimodal system. The proposed cognitively inspired framework is designed to be scalable: the techniques used in individual parts of the system can be upgraded, and there is scope for the initial framework presented here to be expanded. In the proposed system, the concept of single-modality two-stage filtering is extended to include the visual modality. Noisy speech received by a microphone array is first pre-processed by visually derived Wiener filtering, employing a novel use of the Gaussian Mixture Regression (GMR) technique on associated visual speech information extracted with a state-of-the-art Semi Adaptive Appearance Models (SAAM) based lip tracking approach. This pre-processed speech is then enhanced further by audio-only beamforming using a state-of-the-art Transfer Function Generalised Sidelobe Canceller (TFGSC) approach. The resulting system is designed to function in challenging noisy speech environments, evaluated using speech sentences from different speakers in the GRID corpus mixed with a range of noise recordings. Both objective and subjective test results (employing the widely used Perceptual Evaluation of Speech Quality (PESQ) measure, a composite objective measure, and subjective listening tests) show that this initial system delivers very encouraging results when filtering speech mixtures in difficult reverberant environments. Some limitations of this initial framework are identified, and the extension of this multimodal system is explored through the development of a fuzzy logic based framework and a proof-of-concept demonstration.
Results show that this proposed autonomous, adaptive, and context-aware multimodal framework is capable of delivering very positive results in difficult noisy speech environments, making cognitively inspired use of audio and visual information depending on environmental conditions. Finally, some concluding remarks are made along with proposals for future work.
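At its core, the first-stage visually derived Wiener filter applies the classic Wiener gain to the noisy spectrum. A minimal single-channel sketch of that gain, assuming clean-speech and noise power spectra are already available (synthetic placeholders here; the GMR-based visual estimation itself is not reproduced):

```python
import numpy as np

def wiener_gain(speech_psd, noise_psd, floor=1e-3):
    """Classic Wiener filter gain G = S / (S + N), clamped to a spectral floor."""
    gain = speech_psd / (speech_psd + noise_psd)
    return np.maximum(gain, floor)

rng = np.random.default_rng(0)
speech_psd = rng.uniform(0.5, 2.0, size=129)   # hypothetical clean-speech power spectrum
noise_psd = rng.uniform(0.1, 1.0, size=129)    # hypothetical noise power spectrum
noisy_spectrum = np.sqrt(speech_psd + noise_psd)

# Attenuate each frequency bin in proportion to its estimated speech share.
enhanced = wiener_gain(speech_psd, noise_psd) * noisy_spectrum
```

Since the gain never exceeds 1, the filter only ever attenuates bins, which is why an accurate speech-spectrum estimate (here, visually derived) matters so much.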

    Contribution of GJB2 gene mutations to hearing loss in Pakistani population – A Narrative Review

    Get PDF
    Pakistan has a unique population for the study of recessive genetic diseases due to a higher consanguinity rate. Hearing impairment is the loss of the ability to hear normal sounds, and it is a common sensory disorder that affects more than 466 million people worldwide. Immuno-genetic and other environmental factors such as loud noise, drug usage, and viral infections are causes of hearing loss. Hearing loss is categorized into non-syndromic hearing loss (70%) and syndromic hearing loss (30%). GJB2 mutations are one of the main causes of hearing loss in different populations, including Pakistan. The GJB2 gene encodes a gap junction protein involved in the homeostasis of the inner ear through the recycling of potassium ions. The prevalence of GJB2 mutation in the Pakistani population varies from 6.1 to 9.2%. The most common mutations found in the Pakistani population are c.71G>A (p.Trp24*), c.231G>A (p.Trp77*), c.35delG (p.Gly11Leufs*24), c.355G>T (p.Glu119*), c.457G>A (p.Val153Ile), c.598G>A (p.Gly200Arg), c.439G>A (p.Glu147Lys), c.377_378insATGCGGA (p.Arg127Cysfs*85), c.1055C>T (p.Pro352Leu), c.6202A>C (p.Thr2068Pro), and c.2496_2496delC (p.Tyr832*).
    Keywords: Gap junction protein, GJB2 mutations, Hearing loss

    Visual Speech Recognition with Lightweight Psychologically Motivated Gabor Features

    Get PDF
    Extraction of relevant lip features is of continuing interest in the visual speech domain. Using end-to-end feature extraction can produce good results, but at the cost of the results being difficult for humans to comprehend and relate to. We present a new, lightweight feature extraction approach, motivated by human-centric glimpse-based psychological research into facial barcodes, and demonstrate that these simple, easy-to-extract 3D geometric features (produced using Gabor-based image patches) can successfully be used for speech recognition with LSTM-based machine learning. This approach can successfully extract low-dimensionality lip parameters with a minimum of processing. One key difference between using these Gabor-based features and using other features, such as traditional DCT or the current fashion for CNN features, is that these are human-centric features that can be visualised and analysed by humans. This means that it is easier to explain and visualise the results. They can also be used for reliable speech recognition, as demonstrated using the Grid corpus. Results for overlapping speakers using our lightweight system gave a recognition rate of over 82%, which compares well to less explainable features in the literature.
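The Gabor-based image patches mentioned above rest on the standard 2D Gabor kernel: a Gaussian envelope modulating a sinusoidal carrier. A minimal sketch with hypothetical parameter values and a toy edge patch (the paper's actual barcode-style feature pipeline is not reproduced):

```python
import numpy as np

def gabor_kernel(size=15, wavelength=6.0, theta=0.0, sigma=3.0):
    """2D Gabor kernel: Gaussian envelope times a cosine carrier at angle theta."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    x_r = x * np.cos(theta) + y * np.sin(theta)    # rotate coordinates
    y_r = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(x_r**2 + y_r**2) / (2 * sigma**2))
    carrier = np.cos(2 * np.pi * x_r / wavelength)
    return envelope * carrier

def gabor_response(patch, kernel):
    """Correlate a single image patch with the kernel (same-size arrays)."""
    return float(np.sum(patch * kernel))

kernel = gabor_kernel()
# A hypothetical vertical-edge patch: dark left half, bright right half.
patch = np.hstack([np.zeros((15, 8)), np.ones((15, 7))])
response = gabor_response(patch, kernel)
```

Because each kernel responds to oriented structure at a chosen scale, the resulting features remain directly visualisable, which is the explainability advantage the abstract emphasises.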

    Factors Affecting the Accessibility of IT Artifacts : A Systematic Review

    Get PDF
    Accessibility awareness and development have improved in the past two decades, but many users still encounter accessibility barriers when using information technology (IT) artifacts (e.g., user interfaces and websites). Current research in information systems and human-computer interaction disciplines explores methods, techniques, and factors affecting the accessibility of IT artifacts for a particular population and provides solutions to address these barriers. However, design realized in one solution should be used to provide accessibility to the widest range of users, which requires an integration of solutions. To identify the factors that cause accessibility barriers and the solutions for users with different needs, a systematic literature review was conducted. This paper contributes to the existing body of knowledge by revealing (1) management- and development-level factors, and (2) user-perspective factors affecting accessibility that address different accessibility barriers for different groups of the population (based on the International Classification of Functioning by the World Health Organization). Based on these findings, we synthesize and illustrate the factors and solutions that need to be addressed when creating an accessible IT artifact. © 2022 by the Association for Information Systems.


    Sentiment Analysis of Persian Movie Reviews Using Deep Learning

    Get PDF
    Sentiment analysis aims to automatically classify the subject’s sentiment (e.g., positive, negative, or neutral) towards a particular aspect such as a topic, product, movie, or news item. Deep learning has recently emerged as a powerful machine learning technique to tackle the growing demand for accurate sentiment analysis. However, the majority of research efforts are devoted to the English language only, while information of great importance is also available in other languages. This paper presents a novel, context-aware, deep-learning-driven, Persian sentiment analysis approach. Specifically, the proposed deep-learning-driven automated feature-engineering approach classifies Persian movie reviews as having positive or negative sentiments. Two deep learning algorithms, convolutional neural networks (CNN) and long short-term memory (LSTM), are applied and compared with our previously proposed manual-feature-engineering-driven, SVM-based approach. Simulation results demonstrate that LSTM obtained a better performance as compared to multilayer perceptron (MLP), autoencoder, support vector machine (SVM), logistic regression, and CNN algorithms.
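The LSTM referred to above processes a review token by token through gated cell updates. A minimal single-cell NumPy sketch, with toy dimensions and random weights standing in for trained embeddings and parameters (not the paper's actual model):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM cell step; gates i, f, o and candidate g share stacked weights."""
    hidden = h_prev.shape[0]
    z = W @ x + U @ h_prev + b            # pre-activations, shape (4*hidden,)
    i = sigmoid(z[:hidden])               # input gate
    f = sigmoid(z[hidden:2*hidden])       # forget gate
    o = sigmoid(z[2*hidden:3*hidden])     # output gate
    g = np.tanh(z[3*hidden:])             # candidate cell state
    c = f * c_prev + i * g                # gated memory update
    h = o * np.tanh(c)                    # exposed hidden state
    return h, c

rng = np.random.default_rng(1)
emb, hid = 8, 4                                   # toy embedding and hidden sizes
W = rng.normal(scale=0.1, size=(4 * hid, emb))
U = rng.normal(scale=0.1, size=(4 * hid, hid))
b = np.zeros(4 * hid)
h, c = np.zeros(hid), np.zeros(hid)
for x in rng.normal(size=(5, emb)):               # a hypothetical 5-token review
    h, c = lstm_step(x, h, c, W, U, b)
# The final h would feed a small classifier head for positive/negative sentiment.
```

The forget gate is what lets the cell carry sentiment cues (e.g., an early negation) across a long review, which is one plausible reason LSTM edged out the bag-of-features baselines here.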

    Novel Deep Convolutional Neural Network-Based Contextual Recognition of Arabic Handwritten Scripts

    Get PDF
    Offline Arabic Handwriting Recognition (OAHR) has recently become instrumental in the areas of pattern recognition and image processing due to its application in several fields, such as office automation and document processing. However, OAHR continues to face several challenges, including the high variability of the Arabic script and its intrinsic characteristics such as cursiveness, ligatures, and diacritics, the unlimited variation in human handwriting, and the lack of large public databases. In this paper, we introduce a novel context-aware model based on deep neural networks to address the challenges of recognizing offline handwritten Arabic text, including isolated digits, characters, and words. Specifically, we propose a supervised Convolutional Neural Network (CNN) model that contextually extracts optimal features and employs batch normalization and dropout regularization to prevent overfitting and further enhance generalization performance when compared to conventional deep learning models. We employed numerous deep stacked convolutional layers to design the proposed Deep CNN (DCNN) architecture. The proposed model was extensively evaluated, and it was observed to achieve excellent classification accuracy when compared to the existing state-of-the-art OAHR approaches on a diverse set of six benchmark databases, including MADBase (Digits), CMATERDB (Digits), HACDB (Characters), SUST-ALT (Digits), SUST-ALT (Characters), and SUST-ALT (Names). Further comparative experiments were conducted on the respective databases using the pre-trained VGGNet-19 and MobileNet models; additionally, generalization experiments on another language database (i.e., MNIST English Digits) were conducted, which showed the superiority of the proposed DCNN model.
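The batch normalization and dropout regularization mentioned above can be sketched as plain forward passes. These are generic textbook versions in NumPy, not the paper's DCNN configuration; shapes and rates are illustrative:

```python
import numpy as np

def batch_norm(x, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize each feature over the mini-batch, then scale and shift."""
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    return gamma * (x - mean) / np.sqrt(var + eps) + beta

def dropout(x, rate, rng, training=True):
    """Inverted dropout: zero units with probability `rate`, rescale survivors."""
    if not training or rate == 0.0:
        return x
    mask = rng.random(x.shape) >= rate
    return x * mask / (1.0 - rate)

rng = np.random.default_rng(0)
activations = rng.normal(loc=3.0, scale=2.0, size=(64, 10))  # toy mini-batch
normed = batch_norm(activations)                 # per-feature mean ~0, std ~1
regularized = dropout(normed, rate=0.5, rng=rng) # random co-adaptation breaking
```

Inverted dropout rescales at training time so that inference needs no correction, which is the usual reason it is preferred over the original formulation.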

    Alzheimer’s Dementia Recognition Through Spontaneous Speech

    Get PDF

    Convolutional Spiking Neural Networks for Detecting Anticipatory Brain Potentials Using Electroencephalogram

    Full text link
    Spiking neural networks (SNNs) are receiving increased attention as a means to develop "biologically plausible" machine learning models. These networks mimic synaptic connections in the human brain and produce spike trains, which can be approximated by binary values, avoiding the high computational cost of floating-point arithmetic circuits. Recently, convolutional layers have been added to SNNs to combine the feature extraction power of convolutional networks with the computational efficiency of spiking models. In this paper, the feasibility of using a convolutional spiking neural network (CSNN) as a classifier to detect anticipatory slow cortical potentials related to braking intention in human participants was studied using electroencephalogram (EEG) data. The EEG data was collected during an experiment wherein participants operated a remote-controlled vehicle on a testbed designed to simulate an urban environment. Participants were alerted to an incoming braking event via an audio countdown to elicit anticipatory potentials, which were then measured using EEG. The CSNN's performance was compared to a standard convolutional neural network (CNN) and three graph neural networks (GNNs) via 10-fold cross-validation. The results showed that the CSNN outperformed the other neural networks.
    Comment: 14 pages, 6 figures, Scientific Reports submission
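The binary spike trains described above can be illustrated with a leaky integrate-and-fire neuron, the simplest spiking model. This is a generic sketch with hypothetical parameters, not the paper's CSNN:

```python
import numpy as np

def lif_spike_train(current, tau=10.0, threshold=1.0, v_reset=0.0, dt=1.0):
    """Leaky integrate-and-fire: integrate input current, emit a binary spike
    whenever the membrane potential crosses threshold, then reset."""
    v = 0.0
    spikes = np.zeros(len(current), dtype=np.int8)
    for t, i_t in enumerate(current):
        v += dt * (-v / tau + i_t)     # leaky integration of the input
        if v >= threshold:
            spikes[t] = 1              # binary spike event
            v = v_reset                # reset after firing
    return spikes

rng = np.random.default_rng(2)
input_current = rng.uniform(0.1, 0.3, size=100)  # hypothetical stimulus drive
spikes = lif_spike_train(input_current)
```

Because the output is a 0/1 train rather than a real-valued activation, downstream layers can in principle replace multiply-accumulate hardware with simple event-driven additions, which is the efficiency argument the abstract makes.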

    Advancing Perception in Artificial Intelligence through Principles of Cognitive Science

    Full text link
    Although artificial intelligence (AI) has achieved many feats at a rapid pace, there still exist open problems and fundamental shortcomings related to performance and resource efficiency. Since AI researchers benchmark a significant proportion of performance standards against human intelligence, cognitive-science-inspired AI is a promising domain of research. Studying cognitive science can provide a fresh perspective on building fundamental blocks in AI research, which can lead to improved performance and efficiency. In this review paper, we focus on the cognitive function of perception, which is the process of taking signals from one's surroundings as input and processing them to understand the environment. In particular, we study and compare its various processes through the lens of both cognitive science and AI. Through this study, we review all current major theories from various sub-disciplines of cognitive science (specifically neuroscience, psychology, and linguistics), and draw parallels with theories and techniques from current practices in AI. We hence present a detailed collection of methods in AI for researchers to build AI systems inspired by cognitive science. Further, through the process of reviewing the state of cognitive-inspired AI, we point out many gaps in the current state of AI (with respect to the performance of the human brain), and present potential directions for researchers to develop better perception systems.
    Comment: Summary: a detailed review of the current state of perception models through the lens of cognitive AI