
    Deep learning in mining biological data

    Recent technological advancements in data acquisition tools have allowed life scientists to acquire multimodal data from different biological application domains. Categorised into three broad types (i.e., images, signals, and sequences), these data are huge in amount and complex in nature. Mining such an enormous amount of data for pattern recognition is a big challenge and requires sophisticated, data-intensive machine learning techniques. Artificial neural network-based learning systems are well known for their pattern recognition capabilities, and lately their deep architectures, known as deep learning (DL), have been successfully applied to solve many complex pattern recognition problems. To investigate how DL, and especially its different architectures, has contributed to and been utilised in the mining of biological data of these three types, a meta-analysis has been performed and the resulting resources have been critically analysed. Focusing on the use of DL to analyse patterns in data from diverse biological domains, this work investigates the application of different DL architectures to these data. This is followed by an exploration of available open-access data sources for the three data types, along with popular open-source DL tools applicable to these data. Comparative investigations of these tools from qualitative, quantitative, and benchmarking perspectives are also provided. Finally, some open research challenges in using DL to mine biological data are outlined and a number of possible future perspectives are put forward.

    Developing and Validating Open Source Tools for Advanced Neuroimaging Research

    Almost all scientific research relies on software. This is particularly true for research that uses neuroimaging technologies, such as functional magnetic resonance imaging (fMRI). These technologies generate massive amounts of data per participant, which must be processed and analyzed using specialized software. A large portion of these tools are developed by teams of researchers rather than trained software developers. In an ecosystem where most software creators are scientists rather than trained programmers, community-based development becomes more important than ever, which may explain why most of this software is open source. Much of my graduate training, as reflected in this dissertation, has focused on developing this kind of research-oriented, open source software. One software package I have helped to develop and maintain is tedana, a Python library for denoising multi-echo fMRI data. In chapter 2, I describe this library in a short, published software paper. Another library I maintain as the primary developer is NiMARE, a Python library for performing neuroimaging meta-analyses and derivative analyses, such as automated annotation and functional decoding. In chapter 3, I present NiMARE in a hybrid software paper with embedded tutorial code demonstrating the library's functionality. This paper is currently hosted as a Jupyter Book that combines narrative content with code snippets that can be executed online. In addition to research software development, I have focused my graduate work on performing reproducible, open fMRI research. To that end, chapter 4 is a replication and extension of a recent paper on multi-echo fMRI denoising methods (Power et al., 2018a). This replication was organized as a registered report, in which the introduction and methods were submitted for peer review before the analyses were performed.
Finally, chapter 5 concludes the dissertation, reflecting on the work I have done and the skills I have developed throughout my training.

    Decoding Time-Varying Functional Connectivity Networks via Linear Graph Embedding Methods

    An exciting avenue of neuroscientific research involves quantifying the time-varying properties of functional connectivity networks. As a result, many methods have been proposed to estimate the dynamic properties of such networks. However, one challenge associated with such methods is the interpretation and visualization of high-dimensional, dynamic networks. In this work, we employ graph embedding algorithms to provide low-dimensional vector representations of networks, thus facilitating traditional objectives such as visualization, interpretation, and classification. We focus on linear graph embedding methods based on principal component analysis and regularized linear discriminant analysis. The proposed graph embedding methods are validated through a series of simulations and applied to fMRI data from the Human Connectome Project.
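The PCA-based embedding idea can be illustrated with a small sketch. It assumes (these are our assumptions, not details from the abstract) that each time window's connectivity network is a symmetric correlation matrix, that a network is vectorized by taking its strictly upper triangle, and that plain PCA on those vectors stands in for the authors' embedding; the regularized LDA variant is not shown. The helpers `upper_triangle` and `pca_embed` are hypothetical names, and power iteration with deflation is used only to keep the example dependency-free.

```python
import math
import random


def upper_triangle(matrix):
    """Vectorize the strictly upper triangle of a symmetric connectivity matrix."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def _matvec(m, v):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in m]


def pca_embed(vectors, n_components=2, n_iter=500, seed=0):
    """Project each vectorized network onto the top principal components.

    Components are found by power iteration on the sample covariance
    matrix, with deflation after each component (hypothetical helper).
    """
    rng = random.Random(seed)
    n, d = len(vectors), len(vectors[0])
    mean = [sum(v[k] for v in vectors) / n for k in range(d)]
    centered = [[v[k] - mean[k] for k in range(d)] for v in vectors]
    # Sample covariance matrix (d x d) of the vectorized networks.
    cov = [[sum(x[i] * x[j] for x in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    components = []
    for _ in range(n_components):
        # Random unit-norm starting vector.
        w = [rng.gauss(0.0, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        w = [x / norm for x in w]
        for _ in range(n_iter):
            w_new = _matvec(cov, w)
            norm = math.sqrt(sum(x * x for x in w_new))
            if norm < 1e-12:  # remaining variance is (numerically) zero
                break
            w = [x / norm for x in w_new]
        eigval = sum(a * b for a, b in zip(w, _matvec(cov, w)))
        components.append(w)
        # Deflate: remove the variance explained by this component.
        cov = [[cov[i][j] - eigval * w[i] * w[j] for j in range(d)]
               for i in range(d)]
    # Low-dimensional embedding: one row per window, one column per component.
    return [[sum(a * b for a, b in zip(x, c)) for c in components]
            for x in centered]
```

Each row of the returned embedding is a low-dimensional coordinate for one time window, which can be plotted or passed to a classifier. In practice, a library routine such as scikit-learn's PCA would replace the hand-written power iteration.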

    Deep Interpretability Methods for Neuroimaging

    Brain dynamics are highly complex and yet hold the key to understanding brain function and dysfunction. The dynamics captured by resting-state functional magnetic resonance imaging (fMRI) data are noisy, high-dimensional, and not readily interpretable. The typical approach of reducing these data to low-dimensional features and focusing on the most predictive features comes with strong assumptions and can miss essential aspects of the underlying dynamics. In contrast, introspection of discriminatively trained deep learning models may uncover disorder-relevant elements of the signal at the level of individual time points and spatial locations. Nevertheless, the difficulty of reliably training on high-dimensional but small-sample datasets and the unclear relevance of the resulting predictive markers prevent the widespread use of deep learning in functional neuroimaging. In this dissertation, we address these challenges by proposing a deep learning framework that learns from high-dimensional dynamical data while maintaining stable, ecologically valid interpretations. The developed model is pre-trainable, which alleviates the need to collect an enormous number of neuroimaging samples to achieve optimal training. We also provide a quantitative validation module, Retain and Retrain (RAR), that can objectively verify the higher predictability of the dynamics learned by the model. Results demonstrate that the proposed framework enables learning fMRI dynamics directly from small data and captures compact, stable interpretations of features predictive of function and dysfunction. We also comprehensively review the deep interpretability literature in the neuroimaging domain. Our analysis reveals ongoing trends in interpretability practices in neuroimaging studies and identifies the gaps that should be addressed for effective human-machine collaboration in this domain.
This dissertation also proposes a post hoc interpretability method, Geometrically Guided Integrated Gradients (GGIG), that leverages geometric properties of the functional space as learned by a deep learning model. Through extensive experiments and quantitative validation on the MNIST and ImageNet datasets, we demonstrate that GGIG outperforms integrated gradients (IG), a popular interpretability method in the literature. Because GGIG is able to identify the contours of the discriminative regions in the input space, it may be useful in various medical imaging tasks where fine-grained localization is a beneficial form of explanation.
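Since GGIG is compared against integrated gradients (IG), a minimal sketch of standard IG may help. IG attributes F(x) - F(x') to input features by integrating gradients along the straight path from a baseline x' to the input x: IG_i = (x_i - x'_i) times the integral over a in [0, 1] of dF/dx_i evaluated at x' + a(x - x'). The `logistic_model` below is a hypothetical stand-in for a trained network, the midpoint Riemann sum and step count are illustrative choices, and GGIG itself is not shown because the abstract does not describe its algorithm.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def logistic_model(weights, bias):
    """Hypothetical toy model F(x) = sigmoid(w . x + b) and its analytic gradient."""
    def f(x):
        return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)

    def grad(x):
        s = f(x)
        # dF/dx_i = sigmoid'(z) * w_i, with sigmoid'(z) = s * (1 - s).
        return [s * (1.0 - s) * w for w in weights]

    return f, grad


def integrated_gradients(grad_fn, x, baseline, steps=100):
    """Midpoint Riemann-sum approximation of integrated gradients.

    IG_i = (x_i - x'_i) * integral_0^1 dF/dx_i(x' + a * (x - x')) da
    """
    total = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps  # midpoint of the k-th sub-interval
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        for i, g in enumerate(grad_fn(point)):
            total[i] += g
    # Average gradient along the path, scaled by the input-baseline difference.
    return [(xi - b) * t / steps for xi, b, t in zip(x, baseline, total)]
```

A useful sanity check is the completeness property: the attributions should sum approximately to F(x) - F(x'), with the approximation improving as `steps` grows.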

    Text-mining the NeuroSynth corpus using deep Boltzmann machines

    Large-scale automated meta-analysis of neuroimaging data has recently established itself as an important tool in advancing our understanding of human brain function. This research has been pioneered by NeuroSynth, a database that collects both brain activation coordinates and associated text across a large cohort of neuroimaging research papers. One of the fundamental aspects of such meta-analysis is text mining. To date, word counts and more sophisticated methods such as Latent Dirichlet Allocation have been proposed. In this work, we present an unsupervised study of the NeuroSynth text corpus using Deep Boltzmann Machines (DBMs). The use of DBMs yields several advantages over the aforementioned methods, principal among which is that they produce both word and document embeddings in a high-dimensional vector space. Such embeddings facilitate the use of traditional machine learning techniques on the text corpus. The proposed DBM model is shown to learn embeddings with a clear semantic structure.
    Comment: 4 pages, 1 figure