13 research outputs found

    Adversarial Training in Affective Computing and Sentiment Analysis: Recent Advances and Perspectives

    Over the past few years, adversarial training has become a highly active research topic and has been successfully applied to a variety of Artificial Intelligence (AI) domains. Because adversarial training is potentially crucial for developing the next generation of emotional AI systems, we herein provide a comprehensive overview of its application to affective computing and sentiment analysis. We explain and discuss representative adversarial training algorithms aimed at tackling diverse challenges associated with emotional AI systems. Further, we highlight a range of potential future research directions. We expect that this overview will help facilitate the development of adversarial training for affective computing and sentiment analysis in both the academic and industrial communities.
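    One family of methods covered by the term "adversarial training" trains a model on inputs perturbed to increase its loss. The sketch below illustrates that idea with the fast gradient sign method (FGSM) applied to a toy linear sentiment scorer. It is a generic illustration under stated assumptions, not an algorithm taken from the overview: the two-feature model and the names `predict`, `sign`, `fgsm` and `adversarial_train_step` are all hypothetical.

```python
import math


def predict(w, x):
    """Hypothetical linear sentiment scorer: P(positive) = sigmoid(w . x)."""
    z = sum(wi * xi for wi, xi in zip(w, x))
    return 1.0 / (1.0 + math.exp(-z))


def sign(v):
    """Return +1, 0 or -1 according to the sign of v."""
    return (v > 0) - (v < 0)


def fgsm(w, x, y, eps):
    """Fast gradient sign method: move each feature by eps in the direction
    that increases the binary cross-entropy loss for label y."""
    p = predict(w, x)
    grad_x = [(p - y) * wi for wi in w]  # d(loss)/dx for sigmoid + cross-entropy
    return [xi + eps * sign(g) for xi, g in zip(x, grad_x)]


def adversarial_train_step(w, x, y, lr=0.1, eps=0.1):
    """One SGD step on the sum of the clean and adversarial losses."""
    x_adv = fgsm(w, x, y, eps)
    new_w = list(w)
    for inp in (x, x_adv):
        p = predict(w, inp)
        for i, xi in enumerate(inp):
            new_w[i] -= lr * (p - y) * xi  # d(loss)/dw_i = (p - y) * x_i
    return new_w


w = [1.0, -2.0]
x, y = [0.5, 0.5], 1  # a positive example the model currently misclassifies
x_adv = fgsm(w, x, y, eps=0.1)
print(predict(w, x), predict(w, x_adv))  # the adversarial input lowers P(positive)
w = adversarial_train_step(w, x, y)
```

    The perturbation uses only the sign of the input gradient, so every feature moves by exactly `eps`. Training on the clean and perturbed copies together is one simple way to make the scorer less sensitive to such worst-case nudges.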

    Deep learning for speech enhancement: a thesis presented in partial fulfilment of the requirements for the degree of Doctor of Philosophy in Computer Science at Massey University, Albany, New Zealand

    Speech enhancement, which aims to improve the intelligibility and overall perceptual quality of a contaminated speech signal, is an effective way to improve speech communication. In this thesis, we propose three novel deep learning methods to improve speech enhancement performance. First, we propose an adversarial latent representation learning method for exploring the latent space of generative adversarial network (GAN)-based speech enhancement. Based on adversarial feature learning, this method employs an extra encoder to learn an inverse mapping from the generated data distribution to the latent space. The encoder establishes an inner connection with the generator and contributes to latent information learning. Second, we propose an adversarial multi-task learning method with inverse mappings for effective speech representation. This method focuses on enhancing the generator's ability to capture speech information and learn representations. To implement it, two extra networks are developed to learn the inverse mappings from the generated distribution to the input data domains. Third, we propose a self-supervised-learning-based, phone-fortified method to improve the learning of specific speech characteristics for speech enhancement. This method explicitly imports phonetic characteristics into a deep complex convolutional network via a contrastive predictive coding model pre-trained with self-supervised learning. The experimental results demonstrate that the proposed methods outperform previous speech enhancement methods and achieve state-of-the-art performance in terms of speech intelligibility and overall perceptual quality.

    Deep Learning Methods for Classification of Gliomas and Their Molecular Subtypes, From Central Learning to Federated Learning

    Gliomas are the most common type of brain cancer in adults. Under the updated 2016 World Health Organization (WHO) classification of tumors of the central nervous system (CNS), identifying the molecular subtypes of gliomas is important. For low-grade gliomas (LGGs), predicting molecular subtypes from magnetic resonance imaging (MRI) scans may be difficult without a biopsy. With the development of machine learning (ML) methods such as deep learning (DL), molecular-based classification methods have shown promising results on MRI scans and may assist clinicians with prognosis and with deciding on a treatment strategy. However, DL requires large training datasets with tumor class labels and tumor boundary annotations, and manual annotation of tumor boundaries is a time-consuming and expensive process. The thesis is based on the work developed in five papers on gliomas and their molecular subtypes. We propose novel methods that provide improved performance. The proposed methods consist of a multi-stream convolutional autoencoder (CAE)-based classifier, a deep convolutional generative adversarial network (DCGAN) to enlarge the training dataset, a CycleGAN to handle domain shift, a novel federated learning (FL) scheme that allows local client-based training while protecting the datasets, and the use of bounding boxes on MRIs when tumor boundary annotations are not available. Experimental results showed that DCGAN-generated MRIs enlarged the original training dataset and improved classification performance on test sets. CycleGAN showed good domain adaptation across multiple source datasets and improved classification performance. The proposed FL scheme showed slightly degraded performance compared with the central learning (CL) approach while protecting dataset privacy.
Using tumor bounding boxes proved to be an alternative to tumor boundary annotation for tumor classification and segmentation, trading a slight decrease in performance for the time clinicians save on manual marking. The proposed methods may benefit future research in bringing DL tools into clinical practice to assist tumor diagnosis and support the decision-making process.
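    The abstract does not describe how its proposed FL scheme works. For reference only, the sketch below shows standard federated averaging (FedAvg), the common baseline for this kind of privacy-preserving training. Each client trains on its own data and only model weights reach the server, which averages them weighted by local dataset size. The 1-D linear model, the synthetic client datasets and the names `local_update` and `fed_avg` are hypothetical stand-ins for a real classifier and real MRI data.

```python
def local_update(weights, data, lr=0.1, epochs=5):
    """Hypothetical client step: plain SGD on a 1-D linear model y = w*x + b."""
    w, b = weights
    for _ in range(epochs):
        for x, y in data:
            err = (w * x + b) - y  # residual on one local sample
            w -= lr * err * x      # SGD step on 0.5 * err**2 with respect to w
            b -= lr * err          # SGD step on 0.5 * err**2 with respect to b
    return (w, b)


def fed_avg(global_weights, clients, rounds=30):
    """FedAvg: each round, clients train locally and the server averages their
    weights, weighted by local dataset size. Raw data never leaves a client."""
    for _ in range(rounds):
        updates = [(local_update(global_weights, data), len(data)) for data in clients]
        total = sum(n for _, n in updates)
        global_weights = tuple(
            sum(weights[i] * n for weights, n in updates) / total
            for i in range(len(global_weights))
        )
    return global_weights


# Hypothetical, noise-free client datasets drawn from y = 2x + 1.
clients = [
    [(k / 10, 2 * (k / 10) + 1) for k in range(10)],
    [(k / 20, 2 * (k / 20) + 1) for k in range(20)],
    [(k / 5, 2 * (k / 5) + 1) for k in range(5)],
]
final = fed_avg((0.0, 0.0), clients)
print(final)  # approximately (2.0, 1.0)
```

    The size-weighted average lets clients with more data pull the global model harder. On real, heterogeneous clinical sites this averaging is one reason FL can trail centrally trained models slightly, which matches the trade-off the abstract reports.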

    An Information-theoretic analysis of generative adversarial networks for image restoration in physics-based vision

    Image restoration tasks in physics-based vision (such as image denoising, dehazing, and deraining) are fundamental in computer vision and are of great significance both for processing visual data and for subsequent applications in different fields. Existing methods mainly focus on exploring the physical properties and mechanisms of the imaging process, and tend to use a deconstructive idea to describe how visual degradations (such as noise, haze, and rain) are integrated with the background scene. This idea, however, relies heavily on manually engineered features and handcrafted composition models, which may hold only under ideal conditions or rest on hypothetical models that can involve human bias or fail to simulate real situations in practice. With the progress of representation learning, generative methods, especially generative adversarial networks (GANs), are considered a more promising solution for image restoration tasks. They directly learn restoration as an end-to-end generation process from large amounts of data without modeling the physical mechanisms, and they also allow missing details or damaged information to be completed by drawing on external knowledge, generating plausible results with intelligent-level interpretation and semantic-level understanding of the input images. Nevertheless, existing studies that apply GAN models to image restoration tasks do not achieve satisfactory performance compared with traditional deconstructive methods. Moreover, there is scarcely any study or theory explaining how deep generative models work in these tasks. In this study, we analyze the learning dynamics of different deep generative models based on the information bottleneck principle and propose an information-theoretic framework to explain generative methods for image restoration tasks.
Within this framework, we study the information flow in image restoration models and identify three sources of information involved in generating the restoration results: (i) high-level information extracted by the encoder network, (ii) low-level information from the source inputs that is retained or passed directly through the skip connections, and (iii) external information introduced by the learned parameters of the decoder network during the generation process. Based on this theory, we point out that conventional GAN models may not be directly applicable to image restoration tasks, and we identify three key issues leading to their performance gaps: (i) over-invested abstraction processes, (ii) inherent detail loss, and (iii) imbalanced optimization with vanishing gradients. We formulate these problems with corresponding theoretical analyses and provide empirical evidence that verifies our hypotheses and demonstrates the existence of each problem. To address them, we propose solutions and suggestions, including optimizing the network structure, enhancing detail extraction and accumulation with network modules, and replacing the measures used in the training objectives, to improve the performance of GAN models on image restoration tasks. Finally, we verify our solutions on benchmark datasets and achieve significant improvements over the baseline models.
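    The abstract links part of the performance gap to imbalanced optimization with vanishing gradients and suggests replacing the training objective, but it does not say which objective it uses. As a generic illustration only, the sketch below compares two generator losses, with D = sigmoid(a) and the derivative taken with respect to the discriminator logit a. The first is the original minimax loss log(1 - D); the second is the widely used non-saturating loss -log(D). This is not the study's proposed solution, and the helper names are hypothetical.

```python
import math


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def minimax_grad(a):
    """d/da of the generator's minimax loss log(1 - D), with D = sigmoid(a)."""
    return -sigmoid(a)


def non_saturating_grad(a):
    """d/da of the non-saturating generator loss -log(D), with D = sigmoid(a)."""
    return -(1.0 - sigmoid(a))


# A very negative logit means the discriminator confidently rejects the fake.
for a in (-8.0, -4.0, 0.0):
    print(f"D={sigmoid(a):.4f}  minimax={minimax_grad(a):+.4f}  "
          f"non-saturating={non_saturating_grad(a):+.4f}")
```

    Early in training the discriminator easily wins (D close to 0). There, the minimax gradient shrinks toward zero while the non-saturating gradient stays close to -1 in magnitude. This is the standard example of how the choice of objective can starve the generator of learning signal.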

    Automatic characterization and generation of music loops and instrument samples for electronic music production

    Repurposing audio material to create new music, also known as sampling, was a foundation of electronic music and is a fundamental component of this practice. Currently, large-scale audio databases offer vast collections of material for users to work with. Navigation in these databases relies heavily on hierarchical tree directories. Consequently, sound retrieval is tiresome and is often identified as an undesired interruption of the creative process. We address two fundamental methods for navigating sounds: characterization and generation. Characterizing loops and one-shots in terms of instruments or instrumentation allows unstructured collections to be organized and enables faster retrieval for music-making. Generating loops and one-shot sounds enables the creation of new sounds not present in an audio collection through interpolation or modification of the existing material. To achieve this, we employ deep-learning-based, data-driven methodologies for classification and generation.

    Multimedia Forensics

    This book is open access. Media forensics has never been more relevant to societal life. Not only does media content represent an ever-increasing share of the data traveling on the net and the preferred means of communication for most users, it has also become an integral part of most innovative applications in the digital information ecosystem that serves various sectors of society, from entertainment to journalism to politics. Undoubtedly, advances in deep learning and computational imaging contributed significantly to this outcome. The underlying technologies that drive this trend, however, also pose a profound challenge to establishing trust in what we see, hear, and read, and make media content the preferred target of malicious attacks. In this new threat landscape, powered by innovative imaging technologies and sophisticated tools based on autoencoders and generative adversarial networks, this book fills an important gap. It presents a comprehensive review of state-of-the-art forensic capabilities related to media attribution, integrity and authenticity verification, and counter-forensics. Its content is developed to give practitioners, researchers, photo and video enthusiasts, and students a holistic view of the field.