6,178 research outputs found

    NOVEL APPLICATIONS OF MACHINE LEARNING IN BIOINFORMATICS

    Technological advances in next-generation sequencing and biomedical imaging have led to a rapid increase in biomedical data dimension and acquisition rate, which is challenging the conventional data analysis strategies. Modern machine learning techniques promise to leverage large data sets for finding hidden patterns within them, and for making accurate predictions. This dissertation aims to design novel machine learning-based models to transform biomedical big data into valuable biological insights. The research presented in this dissertation focuses on three bioinformatics domains: splice junction classification, gene regulatory network reconstruction, and lesion detection in mammograms. A critical step in defining gene structures and mRNA transcript variants is to accurately identify splice junctions. In the first work, we built the first deep learning-based splice junction classifier, DeepSplice. It outperforms the state-of-the-art classification tools in terms of both classification accuracy and computational efficiency. To uncover transcription factors governing metabolic reprogramming in non-small-cell lung cancer patients, we developed TFmeta, a machine learning approach to reconstruct relationships between transcription factors and their target genes in the second work. Our approach achieves the best performance on benchmark data sets. In the third work, we designed deep learning-based architectures to perform lesion detection in both 2D and 3D whole mammogram images

    Persistent Homology Tools for Image Analysis

    Topological Data Analysis (TDA) is a new field of mathematics that has emerged rapidly since the first decade of the century from various works in algebraic topology and geometry. The goal of TDA and its main tool, persistent homology (PH), is to provide topological insight into complex and high-dimensional datasets. We take this premise on board to gain more topological insight from digital image analysis and to quantify tiny low-level distortions that are undetectable except possibly by highly trained persons. Such image distortion could be caused intentionally (e.g. by morphing and steganography) or could arise naturally in scan images of abnormal human tissue or organs as a result of the onset of cancer or other diseases. The main objective of this thesis is to design new image analysis tools based on persistent homological invariants representing simplicial complexes on sets of pixel landmarks over a sequence of distance resolutions. We start by proposing innovative automatic techniques to select image pixel landmarks to build a variety of simplicial topologies from a single image. The effectiveness of each image landmark selection is demonstrated by testing on different image tampering problems such as morphed face detection, steganalysis and breast tumour detection. Vietoris-Rips simplicial complexes are constructed from the image landmarks at an increasing distance threshold, and topological (homological) features are computed at each threshold and summarised in a form known as persistent barcodes. We vectorise the space of persistent barcodes using a technique known as persistent binning, whose strength we demonstrate for various image analysis purposes. Different machine learning approaches are adopted to develop automatic detection of tiny texture distortion in many image analysis applications. The homological invariants used in this thesis are the 0- and 1-dimensional Betti numbers.
We developed an innovative approach to design persistent homology (PH) based algorithms for automatic detection of the above-described types of image distortion. In particular, we developed the first PH-detector of morphing attacks on passport face biometric images. We shall demonstrate significant accuracy of two such morph detection algorithms with four types of automatically extracted image landmarks: local binary patterns (LBP), 8-neighbour super-pixels (8NSP), radial LBP (R-LBP) and centre-symmetric LBP (CS-LBP). Using any of these techniques yields several persistent barcodes that summarise persistent topological features and help gain insights into complex hidden structures not amenable to other image analysis methods. We shall also demonstrate significant success of a similarly developed PH-based universal steganalysis tool capable of detecting secret messages hidden inside digital images. We also argue through a pilot study that building PH records from digital mammographic images can differentiate malignant breast tumours from benign ones. The research presented in this thesis creates new opportunities to build real applications based on TDA and identifies many research challenges in a variety of image processing/analysis tasks. For example, we describe a TDA-based exemplar image inpainting technique (TEBI), superior to the existing exemplar algorithm, for the reconstruction of missing image regions
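
The barcode construction described in this abstract can be illustrated with a minimal sketch. It covers only the 0-dimensional case (connected components) of a Vietoris-Rips filtration over a handful of 2D landmark points, using a union-find structure. The function names `h0_barcode` and `betti0_curve`, the toy landmark coordinates, and the thresholds are hypothetical. The abstract does not define persistent binning, so sampling the Betti-0 count at fixed thresholds is an assumption standing in for it, not the thesis's method.

```python
from itertools import combinations
from math import dist


def h0_barcode(points):
    """Death thresholds of the finite 0-dimensional bars of a Vietoris-Rips filtration.

    Every point is born at threshold 0. A component dies when an edge first
    merges it into another one. One component never dies (the infinite bar),
    so n points yield n - 1 finite deaths.
    """
    parent = list(range(len(points)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # All pairwise edges, sorted by Euclidean length (the filtration order).
    edges = sorted(
        (dist(points[i], points[j]), i, j)
        for i, j in combinations(range(len(points)), 2)
    )
    deaths = []
    for length, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:          # this edge merges two components
            parent[ri] = rj
            deaths.append(length)
    return deaths


def betti0_curve(deaths, thresholds):
    """Betti-0 at each threshold: bars still alive, plus the infinite bar."""
    return [1 + sum(d > t for d in deaths) for t in thresholds]


# Hypothetical landmarks: two well-separated clusters of pixels.
landmarks = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10)]
deaths = h0_barcode(landmarks)
features = betti0_curve(deaths, thresholds=[0.5, 2.0, 20.0])
print(features)
```

For these toy landmarks the feature vector records five components at the smallest threshold, two once each cluster has merged internally, and one after the clusters join. Dimension-1 features (loops) require a full simplicial complex and boundary-matrix reduction, which dedicated TDA libraries provide.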

    Machine learning as an online diagnostic tool for proton exchange membrane fuel cells

    Proton exchange membrane fuel cells are considered a promising power supply system with high efficiency and zero emissions. They typically work within a relatively narrow range of temperature and humidity to achieve optimal performance; however, this makes the system difficult to control, leading to faults and accelerated degradation. Two main approaches can be used for diagnosis: limited data input, which provides an unintrusive, rapid but limited analysis, or advanced characterisation, which provides a more accurate diagnosis but often requires invasive or slow measurements. Machine learning methods have shown great potential for providing an accurate diagnosis with rapid data acquisition. However, the diagnostic algorithms and signals used in the field vary widely. This article provides a critical view of the current approaches and suggests recommendations for future methodologies of machine learning in fuel cell diagnostic applications

    Opportunities and obstacles for deep learning in biology and medicine

    Deep learning describes a class of machine learning algorithms that are capable of combining raw inputs into layers of intermediate features. These algorithms have recently shown impressive results across a variety of domains. Biology and medicine are data-rich disciplines, but the data are complex and often ill-understood. Hence, deep learning techniques may be particularly well suited to solve problems of these fields. We examine applications of deep learning to a variety of biomedical problems (patient classification, fundamental biological processes and treatment of patients) and discuss whether deep learning will be able to transform these tasks or if the biomedical sphere poses unique challenges. Following from an extensive literature review, we find that deep learning has yet to revolutionize biomedicine or definitively resolve any of the most pressing challenges in the field, but promising advances have been made on the prior state of the art. Even though improvements over previous baselines have been modest in general, the recent progress indicates that deep learning methods will provide valuable means for speeding up or aiding human investigation. Though progress has been made linking a specific neural network's prediction to input features, understanding how users should interpret these models to make testable hypotheses about the system under study remains an open challenge. Furthermore, the limited amount of labelled data for training presents problems in some domains, as do legal and privacy constraints on work with sensitive health records. Nonetheless, we foresee deep learning enabling changes at both bench and bedside with the potential to transform several areas of biology and medicine

    HnRNP K mislocalisation and dysfunction in neurodegenerative disease and ageing

    Heterogeneous nuclear ribonucleoproteins (hnRNPs) are a diverse, multi-functional family of RNA-binding proteins. Many such proteins, including TDP-43 and FUS, have been strongly implicated in the pathogenesis of frontotemporal lobar degeneration (FTLD) and amyotrophic lateral sclerosis (ALS). By contrast hnRNP K, the focus of this thesis, has been underexplored in the context of neurodegenerative disease. The first work to be described here involves a comprehensive pathological assessment of hnRNP K protein’s neuronal localisation profile in FTLD, ALS and control brain tissue. Following pathological examination, hnRNP K mislocalisation from the nucleus to the cytoplasm within pyramidal neurons of the cortex was identified as a novel neuropathological feature that is associated with both neurodegenerative disease and ageing. Double immunofluorescence was used to confirm these neurons were anatomically distinct from those harbouring the classical TDP-43 or Tau proteinaceous inclusions used in the pathological diagnosis of FTLD. Nuclear loss and mislocalisation of hnRNP K to the cytoplasm was then identified to also occur in two further neuronal cell types within the dentate nucleus of the cerebellum and the CA4 region of the hippocampus. As with pyramidal neurons, similar associations were identified between disease, age and hnRNP K mislocalisation in neurons of the dentate nucleus. Hence, neuronal mislocalisation of hnRNP K across the brain has potentially broad relevance to dementia and the ageing process. Almost all hnRNPs have been found to perform essential homeostatic functions in regulating appropriate target gene splicing activity. Recently, several hnRNPs have been found to have important roles in repressing the inclusion of non-conserved, so-called ‘cryptic exons’ within mature mRNA transcripts. 
Inclusion of cryptic exons following TDP-43 nuclear depletion, and the subsequent reductions in the functional levels of target transcripts and proteins, is an emerging pathogenic theme of several neurodegenerative diseases including FTLD and ALS. To recapitulate the functional implications of the hnRNP K nuclear depletion that is observed in brain tissue, a hnRNP K knockdown neuronal model was developed utilising an iPSC-derived, CRISPR-interference based platform. RNA-seq analysis revealed that nuclear hnRNP K protein depletion within cortical neurons is associated with the robust activation of several cryptic exon events in mRNA targets of hnRNP K as well as the upregulation of other abnormal splicing events termed ‘skiptic exons’. Several of these novel splicing events were validated molecularly using three-primer PCRs. Finally, an in situ hybridisation (ISH) based platform (BaseScope™) was optimised to visualise novel cryptic events in post-mortem brain tissue. The platform was used to detect a recently discovered cryptic exon within the synaptic gene UNC13A and another in the insulin receptor (INSR) gene, two newly described targets of TDP-43. These events were found specifically in FTLD-TDP or ALS brains, validating them as specific markers of TDP-43 proteinopathy. A methodological pipeline was also developed to delineate the spatial relationship between cryptic exons and associated TDP-43 pathology. This provides a platform for the future detection, validation and analysis of novel cryptic exons associated with hnRNP K protein depletion in pyramidal neurons

    The Trypanosoma brucei MitoCarta and its regulation and splicing pattern during development

    It has long been known that trypanosomes regulate mitochondrial biogenesis during the life cycle of the parasite; however, the mitochondrial protein inventory (MitoCarta) and its regulation remain unknown. We present a novel computational method for genome-wide prediction of mitochondrial proteins using a support vector machine-based classifier with ∼90% prediction accuracy. Using this method, we predicted the mitochondrial localization of 468 proteins with high confidence and have experimentally verified the localization of a subset of these proteins. We then applied a recently developed parallel sequencing technology to determine the expression profiles and the splicing patterns of a total of 1065 predicted MitoCarta transcripts during the development of the parasite, and showed that 435 of the transcripts significantly changed their expression while 630 remained unchanged in any of the three life stages analyzed. Furthermore, we identified 298 alternative splicing events, a small subset of which could lead to dual localization of the corresponding proteins
