110 research outputs found

    Meta-coexpression conservation analysis of microarray data: a "subset" approach provides insight into brain-derived neurotrophic factor regulation

    Abstract

    Background: Alterations in brain-derived neurotrophic factor (BDNF) gene expression contribute to serious pathologies such as depression, epilepsy, cancer, and Alzheimer's, Huntington's, and Parkinson's diseases. Exploring the mechanisms of BDNF regulation is therefore of great clinical importance. Studying BDNF expression remains difficult due to its multiple neural activity-dependent and tissue-specific promoters; microarray data could thus provide insight into the regulation of this complex gene. Conventional microarray co-expression analysis is usually carried out by merging datasets or by confirming the re-occurrence of significant correlations across datasets. However, co-expression patterns can differ under the various conditions represented by subsets within a dataset, so assessing co-expression by measuring the correlation coefficient across the merged samples of a dataset, or across merged datasets, might not capture all correlation patterns.

    Results: In our study, we performed meta-coexpression analysis of publicly available microarray data using BDNF as a "guide gene", introducing a "subset" approach. The key steps of the analysis were: dividing datasets into subsets with biologically meaningful sample content (e.g. tissue, gender, or disease-state subsets); analyzing co-expression with the BDNF gene in each subset separately; and confirming co-expression links across subsets. Finally, we analyzed conservation of co-expression with BDNF between human, mouse, and rat, and searched for conserved over-represented transcription factor binding sites (TFBSs) in BDNF and BDNF-correlated genes. The correlated genes discovered in this study regulate nervous system development and are associated with various types of cancer and neurological disorders. Several of the transcription factors identified here have been reported to regulate BDNF expression in vitro and in vivo.

    Conclusion: The study demonstrates the potential of the "subset" approach in co-expression conservation analysis for studying the regulation of single genes and proposes novel regulators of BDNF gene expression.
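    The subset approach described above can be sketched as follows. This is an illustrative toy, not the authors' pipeline: the expression matrix, subset labels, and the 0.6 correlation threshold are all invented for demonstration.

```python
import numpy as np

def guide_gene_links(expr, genes, guide, subsets, r_thresh=0.6):
    """Correlate every gene with a guide gene inside each subset
    separately, then keep only links confirmed in all subsets.

    expr    : (n_samples, n_genes) expression matrix
    genes   : gene names, one per column of expr
    guide   : name of the guide gene (e.g. 'BDNF')
    subsets : dict mapping subset name -> list of sample row indices
    """
    g = genes.index(guide)
    confirmed = None
    for name, rows in subsets.items():
        sub = expr[rows]                     # samples of this subset only
        # Pearson correlation of each gene (column) with the guide gene
        r = np.corrcoef(sub, rowvar=False)[g]
        hits = {genes[j] for j in range(len(genes))
                if j != g and abs(r[j]) >= r_thresh}
        confirmed = hits if confirmed is None else confirmed & hits
    return confirmed or set()

# Toy data: gene A tracks the guide in both subsets, gene B in only one.
rng = np.random.default_rng(0)
bdnf = rng.normal(size=20)
gene_a = bdnf + rng.normal(scale=0.1, size=20)    # correlated everywhere
gene_b = np.r_[bdnf[:10], rng.normal(size=10)]    # correlated in subset 1 only
expr = np.c_[bdnf, gene_a, gene_b]
subsets = {"tissue1": list(range(10)), "tissue2": list(range(10, 20))}
print(guide_gene_links(expr, ["BDNF", "A", "B"], "BDNF", subsets))
```

    Only gene A survives the cross-subset confirmation step, which is the point of the approach: a correlation that appears in a single subset (here, gene B) is not reported as a link.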

    Detecting selective sweeps in natural populations of Drosophila melanogaster

    The goal of this study was to gain a deeper understanding of selective sweep models and of the statistical and computational methods that disentangle selective sweeps from neutrality. In the Introduction of the thesis I review the literature on the main approaches developed in the last decade to separate selective sweeps from neutral demographic scenarios. Methods for complete and incomplete selective sweeps are reviewed, as well as selective sweeps in structured populations. Further, I analyze the effects of past demographic events, especially bottlenecks, on the genealogies of a sample. Finally, I demonstrate that the difficulty of separating selective sweeps from bottlenecks stems from the lack of robust statistics and, most importantly, from the similar genealogies that bottlenecks and selective sweeps may generate locally on a recombining chromosome. In the first chapter I introduce a method that combines statistical tests in a machine learning framework in order to disentangle selective sweeps from neutral demographic scenarios. The approach uses support vector machines to learn examples from neutral scenarios and from scenarios with selection. I demonstrate that the novel approach outperforms previously published approaches for a variety of demographic scenarios. The main reason for the performance difference is the use of the scenarios with selection, which are not exploited by classical statistical methods. In the second chapter of the thesis I present an application of the methods to detecting a selective sweep in the African population of D. melanogaster. Demographic history and ascertainment bias schemes have been taken into account. The results pinpoint the HDAC6 gene as a target of recent positive selection. This study also demonstrates the variable-threshold approach, which remedies the tendency of some neutrality tests to detect selective sweeps at the edges of the region of interest. In the third chapter I present the results of an analysis of selective sweeps in multi-locus models. I assume that a phenotypic trait evolves under stabilizing or directional selection. In contrast to the classical models of selective sweeps, the evolutionary trajectory of an allele that affects the trait may belong to one of three categories: it either fixes, disappears, or remains polymorphic. Thereafter, I analyze the properties of coalescent trees and neutral polymorphism patterns generated under each of the three categories. I show that for the majority of simulated datasets selection cannot be detected unless the trajectory is either fixed or close to fixation.
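    The first chapter's idea of learning a boundary between neutral and sweep scenarios can be sketched with scikit-learn's SVM. This is a generic illustration, not the thesis' actual method: the summary statistics here are drawn from invented toy distributions rather than computed from coalescent simulations.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(1)

def fake_summaries(n, sweep):
    """Stand-in for summary statistics (e.g. nucleotide diversity and
    Tajima's D) computed on simulated regions; real values would come
    from a coalescent simulator, not these toy distributions."""
    # Sweeps reduce diversity and skew the site frequency spectrum.
    diversity = rng.normal(0.4 if sweep else 1.0, 0.15, n)
    tajimas_d = rng.normal(-1.5 if sweep else 0.0, 0.5, n)
    return np.c_[diversity, tajimas_d]

# Feature vectors from both classes; 0 = neutral, 1 = sweep.
X = np.r_[fake_summaries(200, sweep=False), fake_summaries(200, sweep=True)]
y = np.r_[np.zeros(200), np.ones(200)]

clf = SVC(kernel="rbf").fit(X, y)
# Classify a new region whose statistics look sweep-like.
print(clf.predict([[0.35, -1.4]]))
```

    The design point is the one made in the abstract: unlike a classical outlier test, the classifier is trained on scenarios *with* selection, so the decision boundary reflects both classes rather than only the neutral null.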

    SweepNet: A Lightweight CNN Architecture for the Classification of Adaptive Genomic Regions

    The accurate identification of positive selection in genomes represents a challenge in the field of population genomics. Several recent approaches have cast this problem as an image classification task and employed Convolutional Neural Networks (CNNs). However, limited effort has been devoted to discovering a practical CNN architecture that can classify images visualizing raw genomic data in the presence of population bottlenecks, migration, and recombination hotspots, factors that typically confound the identification and localization of adaptive genomic regions. In this work, we present SweepNet, a new CNN architecture that resulted from a thorough hyper-parameter-based architecture exploration process. SweepNet has a higher training efficiency than existing CNNs and requires considerably fewer epochs to achieve high validation accuracy. Furthermore, it performs consistently better in the presence of confounding factors, generating models with higher validation accuracy and lower top-1 error rate for distinguishing between neutrality and a selective sweep. Unlike existing network architectures, the number of trainable parameters of SweepNet remains constant irrespective of the sample size and the number of single-nucleotide polymorphisms (SNPs), which reduces the risk of overfitting and leads to more efficient training for large datasets. Our SweepNet implementation is available for download at: https://github.com/Zhaohq96/SweepNet
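    The abstract's point about a constant number of trainable parameters can be illustrated with a tiny convolution-plus-global-average-pooling stack in plain NumPy. This is a schematic sketch, not SweepNet itself: convolution kernels and the final classifier have fixed shapes, so the parameter count does not depend on the input image's height (samples) or width (SNPs).

```python
import numpy as np

class TinyConvNet:
    """Conv -> ReLU -> global average pooling -> linear classifier.
    Parameter shapes depend only on kernel size and channel counts,
    never on the dimensions of the input image."""

    def __init__(self, n_filters=4, k=3, n_classes=2, seed=0):
        rng = np.random.default_rng(seed)
        self.kernels = rng.normal(size=(n_filters, k, k))  # fixed shape
        self.w = rng.normal(size=(n_filters, n_classes))   # fixed shape

    def n_params(self):
        return self.kernels.size + self.w.size

    def forward(self, img):
        f, k, _ = self.kernels.shape
        h, w = img.shape
        feats = np.empty(f)
        for i in range(f):
            # valid convolution over the whole image ...
            conv = np.array([[(img[r:r+k, c:c+k] * self.kernels[i]).sum()
                              for c in range(w - k + 1)]
                             for r in range(h - k + 1)])
            # ... then ReLU and global average pooling to one scalar,
            # which absorbs any input size into a fixed-length vector.
            feats[i] = np.maximum(conv, 0).mean()
        return feats @ self.w                              # class scores

net = TinyConvNet()
small = np.random.default_rng(1).normal(size=(10, 20))   # 10 samples x 20 SNPs
large = np.random.default_rng(2).normal(size=(50, 500))  # 50 samples x 500 SNPs
print(net.forward(small).shape, net.forward(large).shape, net.n_params())
```

    Both inputs produce a two-class score vector from the same 44 parameters; the global pooling step is what decouples model size from alignment size.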

    Genome-wide scans for selective sweeps using convolutional neural networks

    Motivation: Recent methods for selective sweep detection cast the problem as a classification task and use summary statistics as features to capture region characteristics that are indicative of a selective sweep, thereby being sensitive to confounding factors. Furthermore, they are not designed to perform whole-genome scans or to estimate the extent of the genomic region that was affected by positive selection; both are required for identifying candidate genes and the time and strength of selection. Results: We present ASDEC (https://github.com/pephco/ASDEC), a neural-network-based framework that can scan whole genomes for selective sweeps. ASDEC achieves similar classification performance to other convolutional neural network-based classifiers that rely on summary statistics, but it is trained 10× faster and classifies genomic regions 5× faster by inferring region characteristics from the raw sequence data directly. Deploying ASDEC for genomic scans achieved up to 15.2× higher sensitivity, 19.4× higher success rates, and 4× higher detection accuracy than state-of-the-art methods. We used ASDEC to scan human chromosome 1 of the Yoruba population (1000 Genomes Project), identifying nine known candidate genes.
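    The whole-genome scanning step can be sketched as a fixed-width window sliding along a chromosome, with a score assigned per window. This toy stands in for ASDEC's CNN with a precomputed per-variant score, and the window size, step, and "sweep region" are all invented for illustration.

```python
import numpy as np

def scan_genome(variant_pos, scores, window=10_000, step=5_000):
    """Slide a fixed-width window along the chromosome and report the
    mean per-variant score in each window. In a real scan the score
    would come from a trained classifier; here it is supplied directly."""
    start, end = variant_pos.min(), variant_pos.max()
    regions = []
    left = start
    while left <= end:
        mask = (variant_pos >= left) & (variant_pos < left + window)
        if mask.any():
            regions.append((left, float(scores[mask].mean())))
        left += step
    return regions

# Toy chromosome: a low "sweep score" everywhere except one region.
rng = np.random.default_rng(3)
pos = np.sort(rng.integers(0, 100_000, size=500))
score = rng.uniform(0.0, 0.2, size=500)
hot = (pos > 40_000) & (pos < 50_000)
score[hot] = rng.uniform(0.8, 1.0, hot.sum())
best_left, best_score = max(scan_genome(pos, score), key=lambda t: t[1])
print(best_left, round(best_score, 2))
```

    Reporting the extent of the top-scoring window, rather than a single point, is what lets a scan delimit the genomic region affected by selection.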

    Balancing selection on genomic deletion polymorphisms in humans

    A key question in biology is why genomic variation persists in a population for extended periods. Recent studies have identified examples of genomic deletions that have remained polymorphic in the human lineage for hundreds of millennia, ostensibly owing to balancing selection. Nevertheless, a genome-wide investigation of ancient and possibly adaptive deletions remains imperative. Here, we demonstrate an excess of polymorphisms in present-day humans that predate the modern human-Neanderthal split (ancient polymorphisms), which cannot be explained solely by selectively neutral scenarios. We analyze the adaptive mechanisms that underlie this excess among deletion polymorphisms. Using a previously published measure of balancing selection, we show that this excess of ancient deletions is largely owing to balancing selection. Based on the absence of signatures of overdominance, we conclude that overdominance is a rare mode of balancing selection among ancient deletions. Instead, more complex scenarios involving spatially and temporally variable selective pressures are likely the more common mechanisms. Our results suggest that balancing selection resulted in ancient deletions harboring disproportionately more exonic variants with GWAS associations. We further found that ancient deletions are significantly enriched for traits related to metabolism and immunity. As a by-product of our analysis, we show that deletions are, on average, more deleterious than single-nucleotide variants. We can now argue that not only is a vast majority of common variants shared among human populations, but a considerable portion of biologically relevant variants has been segregating among our ancestors for hundreds of thousands, if not millions, of years.

    Federated Learning for Early Dropout Prediction on Healthy Ageing Applications

    The provision of social care applications is crucial for elderly people to improve their quality of life, and it enables operators to provide early interventions. Accurate predictions of user dropouts in healthy ageing applications are essential since they are directly related to individual health statuses. Machine Learning (ML) algorithms have enabled highly accurate predictions, outperforming traditional statistical methods that struggle to cope with individual patterns. However, ML requires a substantial amount of data for training, which is challenging due to the presence of personally identifiable information (PII) and the data fragmentation imposed by regulations. In this paper, we present a federated machine learning (FML) approach that minimizes privacy concerns and enables distributed training without transferring individual data. We employ collaborative training by considering individuals and organizations under FML, which models both cross-device and cross-silo learning scenarios. Our approach is evaluated on a real-world dataset with non-independent and identically distributed (non-iid) data among clients, class imbalance, and label ambiguity. Our results show that data selection and class imbalance handling techniques significantly improve the predictive accuracy of models trained under FML, demonstrating predictive performance comparable or superior to that of traditional ML models.
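    The federated training loop described above can be sketched with FedAvg-style weighted parameter averaging. This is a generic illustration of the technique, not the paper's system: the local model (a bias-free logistic regression), the toy non-iid clients, and all hyper-parameters are assumptions made for the example, and only model parameters, never raw data, leave a client.

```python
import numpy as np

def local_step(w, X, y, lr=0.1, epochs=50):
    """One client's local training: logistic regression fitted by
    gradient descent on private data that never leaves the client."""
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w)))
        w = w - lr * X.T @ (p - y) / len(y)
    return w

def fed_avg(global_w, clients, rounds=5):
    """FedAvg: each round, clients train locally from the current global
    parameters; the server averages the returned parameters weighted by
    each client's number of samples."""
    for _ in range(rounds):
        sizes, updates = [], []
        for X, y in clients:
            updates.append(local_step(global_w.copy(), X, y))
            sizes.append(len(y))
        global_w = np.average(updates, axis=0,
                              weights=np.array(sizes, dtype=float))
    return global_w

# Toy non-iid clients: each sees a different slice of the feature space,
# so their local label distributions differ, but the true rule is shared.
rng = np.random.default_rng(4)
def make_client(shift, n):
    X = rng.normal(loc=shift, size=(n, 2))
    y = (X.sum(axis=1) > 0).astype(float)     # shared true decision rule
    return X, y

clients = [make_client(-1.0, 80), make_client(1.0, 120)]
w = fed_avg(np.zeros(2), clients)
X_test, y_test = make_client(0.0, 200)
acc = ((X_test @ w > 0) == y_test.astype(bool)).mean()
print(round(acc, 2))
```

    Weighting the average by client size is the standard FedAvg choice; without it, the smaller client's update would be over-represented in the global model.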

    Federated Learning for 5G Base Station Traffic Forecasting

    Mobile traffic prediction is of great importance for enabling 5G mobile networks to perform smart and efficient infrastructure planning and management. However, available data are limited to base station logging information. Hence, training methods that generate high-quality predictions which generalize to new observations on different parties are in demand. Traditional approaches require collecting measurements from different base stations and sending them to a central entity, followed by performing machine learning operations on the received data. The dissemination of local observations raises privacy, confidentiality, and performance concerns, hindering the applicability of machine learning techniques. Various distributed learning methods have been proposed to address this issue, but their application to traffic prediction has yet to be explored. In this work, we study the effectiveness of federated learning applied to raw base station aggregated LTE data for time-series forecasting. We evaluate one-step predictions using five different neural network architectures trained in a federated setting on non-iid data. The presented algorithms have been submitted to the Global Federated Traffic Prediction for 5G and Beyond Challenge. Our results show that the learning architectures adapted to the federated setting achieve prediction error equivalent to the centralized setting, that pre-processing techniques on base stations lead to higher forecasting accuracy, and that state-of-the-art aggregators do not outperform simple approaches.