131 research outputs found

    pNNCLR: Stochastic Pseudo Neighborhoods for Contrastive Learning based Unsupervised Representation Learning Problems

    Full text link
    Nearest neighbor (NN) sampling provides more semantic variation than pre-defined transformations for self-supervised learning (SSL) based image recognition problems. However, its performance is restricted by the quality of the support set, which holds the positive samples for the contrastive loss. In this work, we show that the quality of the support set plays a crucial role in any nearest neighbor based method for SSL. We then provide a refined baseline (pNNCLR) for the nearest neighbor based SSL approach NNCLR. To this end, we introduce pseudo nearest neighbors (pNN) to control the quality of the support set: rather than sampling the nearest neighbors themselves, we sample in the vicinity of hard nearest neighbors by varying the magnitude of the resultant vector, employing a stochastic sampling strategy to improve performance. Additionally, to stabilize the effects of uncertainty in NN-based learning, we employ a smooth-weight-update approach for training the proposed network. Evaluation of the proposed method on multiple public image recognition and medical image recognition datasets shows that it performs up to 8 percent better than the baseline nearest neighbor method and is comparable to other previously proposed SSL methods. Comment: 15 pages, 5 figures
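
    As a rough illustration of the sampling idea in this abstract, the sketch below perturbs the hard nearest neighbor retrieved from a support set by stochastically rescaling a resultant vector, and includes a smooth (EMA) weight update. The combination rule, `noise_scale`, and the momentum value are assumptions made for illustration, not values taken from the paper.

```python
import numpy as np

def pseudo_nearest_neighbor(z, support, noise_scale=0.1, rng=None):
    """Sketch of pNN sampling: instead of returning the hard nearest
    neighbor itself, draw a point in its vicinity by stochastically
    varying the magnitude of an (assumed) resultant vector."""
    rng = rng or np.random.default_rng()
    # Cosine similarities between the query embedding and the support set.
    sims = support @ z / (np.linalg.norm(support, axis=1) * np.linalg.norm(z) + 1e-8)
    nn = support[np.argmax(sims)]                      # hard nearest neighbor
    resultant = nn + z                                 # assumed query/NN combination
    scale = 1.0 + noise_scale * rng.standard_normal()  # stochastic magnitude variation
    return scale * resultant / (np.linalg.norm(resultant) + 1e-8)

def ema_update(target, online, momentum=0.99):
    """Smooth weight update: the target parameters track the online
    parameters via an exponential moving average (momentum assumed)."""
    return [momentum * t + (1.0 - momentum) * o for t, o in zip(target, online)]

# Toy usage: 128-d embeddings, support set of 4096 vectors.
rng = np.random.default_rng(0)
support = rng.standard_normal((4096, 128))
z = rng.standard_normal(128)
positive = pseudo_nearest_neighbor(z, support, rng=rng)
```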

    Communication training for paid caregivers of people with traumatic brain injury (TBI)

    Get PDF
    Communication impairments are common following traumatic brain injury (TBI) and affect the ability of a person with TBI to have successful conversations. Ylvisaker, Sellers and Edelman (1998) argue that training to improve the communication skills of an everyday support person would impact the functioning of a person with TBI. Paid caregivers are often engaged to support people with TBI; however, little attention has been focused directly on improving their interaction skills. This thesis therefore describes a study conducted to investigate the effect of a communication training program on improving the conversational interactions between paid caregivers and people with TBI. Participants were 10 paid caregivers randomly selected and allocated to either a control or a training group. Treatment comprised a 17-hour program (across 6 weeks) that combined collaboration and elaboration conversational strategies (Ylvisaker et al., 1998) with discourse activities (Togher, McDonald, Code, & Grant, 2004). Two conversational interactions (i.e., structured and casual conversations) were videotaped pretraining, posttraining, and at 6-month follow-up. The conversations were rated by two independent judges. In addition, focused interviews were conducted pre- and posttraining to explore the experiences of trained paid caregivers. Training for paid caregivers improved their structured conversational interactions with people with TBI. Independent raters perceived these interactions to be more appropriate, more rewarding, and more interesting compared to those of a control group. Trained paid caregivers made greater improvements in their ability to acknowledge and reveal the competence of a person with TBI. Improvements were maintained for six months. No significant changes were found for the casual conversational interactions. Findings from the focused interviews revealed that trained paid caregivers reported improvements in their knowledge of effective communicative strategies. This perceived improvement led to interactions reported as more enjoyable and successful and less frustrating. Trained paid caregivers also perceived increased confidence and ability to self-regulate their use of strategies when communicating with people with TBI. Aspects of the training program that were felt to promote or hinder learning were also identified. The findings from this study highlight the importance of training and educating paid caregivers to improve their communication skills. Training can have a significant impact on interactions that involve both paid caregivers and a person with TBI.

    Automatic voice disorder detection using self-supervised representations

    Get PDF
    Many speech features and models, including Deep Neural Networks (DNNs), are used for classification between healthy and pathological speech with the Saarbruecken Voice Database (SVD). However, accuracy values of 80.71% for phrases or 82.8% for the vowels /aiu/ are the highest reported for audio samples in the SVD when the evaluation includes the wide range of pathologies in the database rather than a selection of some pathologies. This paper targets this top performance among state-of-the-art Automatic Voice Disorder Detection (AVDD) systems. Within the framework of a DNN-based AVDD system, we study the capability of Self-Supervised (SS) representation learning to describe discriminative cues between healthy and pathological speech. The system processes the temporal sequence of SS features with a single feed-forward layer and a Class-Token (CT) Transformer to obtain the classification between healthy and pathological speech. Furthermore, a suitable extension of the training set with out-of-domain data is evaluated to deal with the low availability of data for DNN-based models in voice pathology detection. Experimental results using audio samples corresponding to phrases in the SVD dataset, including all available pathologies, show classification accuracy values of up to 93.36%. This means that the proposed AVDD system achieved accuracy improvements of 4.1% without the training data extension, and 15.62% after the training data extension, compared to the baseline system. Beyond the novelty of using SS representations for AVDD, obtaining accuracies above 90% under these conditions, using the whole set of pathologies in the SVD, is a milestone for voice disorder-related research. Furthermore, the study of how the amount of in-domain data in the training set relates to system performance provides guidance for the data preparation stage. Lessons learned in this work suggest guidelines for taking advantage of DNNs to boost performance when developing automatic systems for the diagnosis, treatment, and monitoring of voice pathologies.
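
    A minimal sketch of the classifier shape this abstract describes: one feed-forward projection over a sequence of self-supervised features, a Transformer encoder with a learnable class token, and a binary head. All dimensions and layer counts below are illustrative assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn

class CTTransformerAVDD(nn.Module):
    """Sketch of a Class-Token Transformer AVDD classifier over
    self-supervised (SS) feature sequences; sizes are assumed."""
    def __init__(self, ss_dim=768, d_model=256, nhead=4, num_layers=2):
        super().__init__()
        self.proj = nn.Linear(ss_dim, d_model)          # single feed-forward layer
        self.cls_token = nn.Parameter(torch.zeros(1, 1, d_model))
        layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers)
        self.head = nn.Linear(d_model, 2)               # healthy vs. pathological

    def forward(self, x):                               # x: (batch, frames, ss_dim)
        h = self.proj(x)
        cls = self.cls_token.expand(h.size(0), -1, -1)  # prepend the class token
        h = self.encoder(torch.cat([cls, h], dim=1))
        return self.head(h[:, 0])                       # classify from the CT output

logits = CTTransformerAVDD()(torch.randn(8, 200, 768))  # toy batch of SS features
```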

    The Effects of Visuospatial Sequence Training with Children who are Deaf or Hard of Hearing

    Get PDF
    Despite advances in hearing aid and cochlear implant technologies, many children who are deaf or hard of hearing continue to lag behind typically hearing peers in language and reading abilities. Additionally, there is a high degree of variability in language outcomes among children with a hearing loss. Evidence indicates that auditory input provides a foundation not only for speech and language development but also for cognitive functions such as sequence memory and learning ability. This study investigated a variety of cognitive functions with two major aims in mind: 1) to verify differences between children who are deaf or hard of hearing and typically hearing children on a variety of cognitive tasks, and 2) to determine if visuospatial sequencing practice would result in improvements on nontrained tasks measuring phonological memory, sequencing ability, and executive function. Thirty-two children who were deaf or hard of hearing and 29 children with typical hearing took part in this study. One pretraining and two posttraining sessions assessed cognitive tasks involving visuospatial short-term memory, verbal short-term memory (nonword repetition), inhibition, and visual sequence learning. Pretraining assessments revealed significant differences between the groups on verbal tasks with both auditory and visual stimuli, as well as on tasks of inhibition and visual sequencing. In addition, differences were revealed on visual tasks with nonverbal stimuli. These findings suggest a general difference or delay in performance beyond the anticipated verbal delay related to a deficit in hearing acuity. The training task utilized a touch screen computer monitor that displayed sequences of circles on a 4 x 4 grid, which subjects then replicated. Subjects were age matched and completed ten days of visuospatial sequencing practice in either an adaptive or a control condition. Two posttraining assessment sessions revealed improvement on the nonword repetition task for the adaptive group following the sequencing practice. These findings suggest that visuospatial sequencing practice can lead to improvements in language abilities. Possible applications include utilizing measures of visual sequencing ability to identify deaf or hard of hearing children who may be at risk for poorer language development, and as a component in predicting successful language development following cochlear implantation.

    Learning Representations of Social Media Users

    Get PDF
    User representations are routinely used in recommendation systems by platform developers, in targeted advertisements by marketers, and by public policy researchers to gauge public opinion across demographic groups. Computer scientists consider the problem of inferring user representations more abstractly: how does one extract a stable user representation - effective for many downstream tasks - from a medium as noisy and complicated as social media? The quality of a user representation is ultimately task-dependent (e.g., does it improve classifier performance, or make more accurate recommendations in a recommendation system?), but there are proxies that are less sensitive to the specific task. Is the representation predictive of latent properties such as a person's demographic features, socioeconomic class, or mental health state? Is it predictive of the user's future behavior? In this thesis, we begin by showing how user representations can be learned from multiple types of user behavior on social media. We apply several extensions of generalized canonical correlation analysis to learn these representations and evaluate them at three tasks: predicting future hashtag mentions, friending behavior, and demographic features. We then show how user features can be employed as distant supervision to improve topic model fit. Finally, we show how user features can be integrated into and improve existing classifiers in the multitask learning framework. We treat user representations - ground truth gender and mental health features - as auxiliary tasks to improve mental health state prediction. We also use the distributed user representations learned in the first chapter to improve tweet-level stance classifiers, showing that distant user information can inform classification tasks at the granularity of a single message. Comment: PhD thesis
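
    As a hedged, two-view stand-in for the generalized (multi-view) CCA extensions this abstract mentions, the sketch below uses scikit-learn's plain CCA to fuse two synthetic behavior views into a single user representation; the view names, dimensions, and component count are invented for illustration.

```python
import numpy as np
from sklearn.cross_decomposition import CCA

# Two synthetic behavior "views" per user (names and sizes assumed).
rng = np.random.default_rng(0)
n_users = 1000
text_view = rng.standard_normal((n_users, 300))     # e.g., averaged post embeddings
network_view = rng.standard_normal((n_users, 128))  # e.g., friend-graph features

# Project both views into a shared 50-d space; use the text-side
# scores as the user representation.
cca = CCA(n_components=50)
cca.fit(text_view, network_view)
user_repr, _ = cca.transform(text_view, network_view)  # (n_users, 50)
# user_repr can now feed downstream tasks: hashtag prediction,
# friending behavior, or demographic classification.
```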

    Speech processing with deep learning for voice-based respiratory diagnosis : a thesis presented in partial fulfilment of the requirements for the degree of Doctor of Philosophy in Computer Science at Massey University, Albany, New Zealand

    Get PDF
    Voice-based respiratory diagnosis research aims at automatically screening and diagnosing respiratory-related symptoms (e.g., smoking status, COVID-19 infection) from human-generated sounds (e.g., breath, cough, speech). It has the potential to serve as an objective, simple, reliable, and less time-consuming method than traditional biomedical diagnosis methods. In this thesis, we conduct one comprehensive literature review and propose three novel deep learning methods to enrich voice-based respiratory diagnosis research and improve its performance. Firstly, we conduct a comprehensive investigation of the effects of voice features on the detection of smoking status. Secondly, we propose a novel method that uses a combination of both high-level and low-level acoustic features along with deep neural networks for smoking status identification. Thirdly, we investigate various feature extraction/representation methods and propose a SincNet-based CNN method for feature representation to further improve the performance of smoking status identification. To the best of our knowledge, this is the first systematic study that applies speech processing with deep learning for voice-based smoking status identification. Moreover, we propose a novel transfer learning scheme and a task-driven feature representation method for diagnosing respiratory diseases (e.g., COVID-19) from human-generated sounds. We find that transfer learning methods using VGGish, wav2vec 2.0, and PASE+, as well as our proposed task-driven method Sinc-ResNet, achieve competitive performance compared with other work. The findings of this study provide a new perspective and insights for voice-based respiratory disease diagnosis. The experimental results demonstrate the effectiveness of our proposed methods and show that they achieve better performance than other existing methods.
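
    A minimal sketch of the wav2vec 2.0 transfer-learning route mentioned above, using torchaudio's pretrained bundle as a frozen feature extractor with a small trainable classification head. The head layout and mean-pooling are assumptions for illustration, not the thesis's exact design.

```python
import torch
import torch.nn as nn
import torchaudio

# Frozen pretrained wav2vec 2.0 backbone from torchaudio.
bundle = torchaudio.pipelines.WAV2VEC2_BASE
backbone = bundle.get_model().eval()

# Small trainable head (assumed layout) for the binary decision.
head = nn.Sequential(nn.Linear(768, 128), nn.ReLU(), nn.Linear(128, 2))

waveform = torch.randn(1, bundle.sample_rate)   # stand-in for 1 s of audio
with torch.no_grad():
    feats, _ = backbone.extract_features(waveform)
pooled = feats[-1].mean(dim=1)                  # (batch, 768): mean over frames
logits = head(pooled)                           # healthy vs. respiratory disease
```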

    Representation Learning for Natural Language Processing

    Get PDF
    This open access book provides an overview of recent advances in representation learning theory, algorithms, and applications for natural language processing (NLP). It is divided into three parts. Part I presents representation learning techniques for multiple language entries, including words, phrases, sentences, and documents. Part II then introduces representation techniques for objects closely related to NLP, including entity-based world knowledge, sememe-based linguistic knowledge, networks, and cross-modal entries. Lastly, Part III provides open resource tools for representation learning techniques and discusses the remaining challenges and future research directions. The theories and algorithms of representation learning presented here can also benefit other related domains such as machine learning, social network analysis, the Semantic Web, information retrieval, data mining, and computational biology. This book is intended for advanced undergraduate and graduate students, post-doctoral fellows, researchers, lecturers, and industrial engineers, as well as anyone interested in representation learning and natural language processing.

    Augmented Behavioral Annotation Tools, with Application to Multimodal Datasets and Models: A Systematic Review

    Get PDF
    Annotation tools are an essential component in the creation of datasets for machine learning purposes. Annotation tools have evolved greatly since the turn of the century, and now commonly include collaborative features to divide labor efficiently, as well as automation employed to amplify human efforts. Recent developments in machine learning models, such as Transformers, allow for training on very large and sophisticated multimodal datasets and enable generalization across domains of knowledge. These models also herald an increasing emphasis on prompt engineering to provide qualitative fine-tuning of the model itself, adding a novel emerging layer of direct machine learning annotation. These capabilities enable machine intelligence to recognize, predict, and emulate human behavior with much greater accuracy and nuance, a noted shortfall that has contributed to algorithmic injustice in previous techniques. However, the scale and complexity of the training data required for multimodal models present engineering challenges. Best practices for conducting annotation for large multimodal models in the safest and most ethical, yet efficient, manner have not been established. This paper presents a systematic literature review of crowd and machine-learning-augmented behavioral annotation methods, cross-correlated across disciplines, to distill practices that may have value in multimodal implementations. Research questions were defined to provide an overview of the evolution of augmented behavioral annotation tools in the past, in relation to the present state of the art. (Contains five figures and four tables)