107 research outputs found

    Meta-coexpression conservation analysis of microarray data: a "subset" approach provides insight into brain-derived neurotrophic factor regulation

    Background: Alterations in brain-derived neurotrophic factor (BDNF) gene expression contribute to serious pathologies such as depression, epilepsy, cancer, Alzheimer's, Huntington's and Parkinson's disease. Exploring the mechanisms of BDNF regulation is therefore of great clinical importance. Studying BDNF expression remains difficult because of its multiple neural activity-dependent and tissue-specific promoters. Thus, microarray data could provide insight into the regulation of this complex gene. Conventional microarray co-expression analysis is usually carried out by merging the datasets or by confirming the re-occurrence of significant correlations across datasets. However, co-expression patterns can differ under the various conditions that are represented by subsets in a dataset. Therefore, assessing co-expression by measuring the correlation coefficient across merged samples of a dataset, or by merging datasets, might not capture all correlation patterns.
    Results: In our study, we performed meta-coexpression analysis of publicly available microarray data using BDNF as a "guide-gene" and introducing a "subset" approach. The key steps of the analysis included: dividing datasets into subsets with biologically meaningful sample content (e.g. tissue, gender or disease state subsets); analyzing co-expression with the BDNF gene in each subset separately; and confirming co-expression links across subsets. Finally, we analyzed conservation in co-expression with BDNF between human, mouse and rat, and searched for conserved over-represented TFBSs in BDNF and BDNF-correlated genes. Correlated genes discovered in this study regulate nervous system development, and are associated with various types of cancer and neurological disorders. Also, several transcription factors identified here have been reported to regulate BDNF expression in vitro and in vivo.
    Conclusion: The study demonstrates the potential of the "subset" approach in co-expression conservation analysis for studying the regulation of single genes and proposes novel regulators of BDNF gene expression.
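    The three key steps named in the abstract (split into subsets, correlate with the guide gene per subset, confirm links across subsets) can be illustrated with a minimal sketch. It is not the authors' pipeline: the Pearson measure, the `r_min` and `min_subsets` thresholds, the use of the absolute correlation, and all gene, sample and subset names other than BDNF are assumptions chosen for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def subset_coexpression(expr, subsets, guide, r_min=0.6, min_subsets=2):
    """Return genes correlated with `guide` in at least `min_subsets` subsets.

    expr:    gene -> {sample id -> expression value}
    subsets: subset name -> list of sample ids
    r_min and min_subsets are hypothetical thresholds, not values from the paper.
    """
    hits = {}
    for samples in subsets.values():
        guide_values = [expr[guide][s] for s in samples]
        for gene, values in expr.items():
            if gene == guide:
                continue
            r = pearson(guide_values, [values[s] for s in samples])
            # Assumption: count both positive and negative correlation as a link.
            if abs(r) >= r_min:
                hits[gene] = hits.get(gene, 0) + 1
    return sorted(g for g, count in hits.items() if count >= min_subsets)


# Toy data: GENE_A tracks BDNF in both subsets, GENE_B in only one.
expr = {
    "BDNF":   {"s1": 1, "s2": 2, "s3": 3, "s4": 2, "s5": 4, "s6": 6},
    "GENE_A": {"s1": 2, "s2": 4, "s3": 6, "s4": 1, "s5": 2, "s6": 3},
    "GENE_B": {"s1": 1, "s2": 2, "s3": 3, "s4": 5, "s5": 1, "s6": 4},
}
subsets = {"cortex": ["s1", "s2", "s3"], "hippocampus": ["s4", "s5", "s6"]}
print(subset_coexpression(expr, subsets, "BDNF"))  # ['GENE_A']
```

    In this toy data, GENE_B correlates with BDNF in only one of the two subsets, so it does not reach the confirmation threshold and is dropped, which is the per-subset filtering the abstract describes.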

    Detecting selective sweeps in natural populations of Drosophila melanogaster

    The goal of this study was to gain a deeper understanding of the selective sweep models and the statistical and computational methods that disentangle selective sweeps from neutrality. In the Introduction of the thesis I review the literature on the main approaches that have been developed in the last decade to separate selective sweeps from neutral demographic scenarios. Methods for complete and incomplete selective sweeps are reviewed, as well as selective sweeps in structured populations. Further, I analyze the effects of past demographic events, especially bottlenecks, on the genealogies of a sample. Finally, I demonstrate that the ineffectiveness of separating selective sweeps from bottlenecks stems from the lack of robust statistics, and most importantly from the similar genealogies that bottlenecks and selective sweeps may generate locally on a recombining chromosome. In the first chapter I introduce a method that combines statistical tests in a machine learning framework, in order to disentangle selective sweeps from neutral demographic scenarios. The approach uses support vector machines to learn from examples of neutral scenarios and scenarios with selection. I demonstrate that the novel approach outperforms previously published approaches for a variety of demographic scenarios. The main reason for the performance difference is the use of scenarios with selection, which are not analyzed by classical statistical methods. In the second chapter of the thesis I present an application of the methods to detecting a selective sweep in the African population of D. melanogaster. Demographic history and ascertainment bias schemes have been taken into account. Results point to the HDAC6 gene as a target of recent positive selection. This study demonstrates the variable threshold approach, which remedies the tendency of some neutrality tests to detect selective sweeps at the edges of the region of interest. In the third chapter I present the results of the analysis of selective sweeps in multi-locus models. I assume that a phenotypic trait evolves under stabilizing or directional selection. In contrast to the classical models of selective sweeps, the evolutionary trajectory of an allele that affects the trait might belong to one of three categories: it either fixes, disappears or remains polymorphic. Thereafter, I analyze the properties of coalescent trees and neutral polymorphism patterns that are generated from each of the three categories. I show that for the majority of simulated datasets selection cannot be detected unless the allele is either fixed or close to fixation.

    Balancing selection on genomic deletion polymorphisms in humans

    A key question in biology is why genomic variation persists in a population for extended periods. Recent studies have identified examples of genomic deletions that have remained polymorphic in the human lineage for hundreds of millennia, ostensibly owing to balancing selection. Nevertheless, genome-wide investigation of ancient and possibly adaptive deletions remains imperative. Here, we demonstrate an excess of polymorphisms in present-day humans that predate the modern human-Neanderthal split (ancient polymorphisms), which cannot be explained solely by selectively neutral scenarios. We analyze the adaptive mechanisms that underlie this excess in deletion polymorphisms. Using a previously published measure of balancing selection, we show that this excess of ancient deletions is largely owing to balancing selection. Based on the absence of signatures of overdominance, we conclude that overdominance is a rare mode of balancing selection among ancient deletions. Instead, more complex scenarios involving spatially and temporally variable selective pressures are likely more common mechanisms. Our results suggest that balancing selection resulted in ancient deletions harboring disproportionately more exonic variants with GWAS associations. We further found that ancient deletions are significantly enriched for traits related to metabolism and immunity. As a by-product of our analysis, we show that deletions are, on average, more deleterious than single-nucleotide variants. We can now argue that not only is a vast majority of common variants shared among human populations, but a considerable portion of biologically relevant variants has been segregating among our ancestors for hundreds of thousands, if not millions, of years.

    Federated Learning for Early Dropout Prediction on Healthy Ageing Applications

    The provision of social care applications is crucial for elderly people to improve their quality of life and enables operators to provide early interventions. Accurate predictions of user dropouts in healthy ageing applications are essential since they are directly related to individual health statuses. Machine Learning (ML) algorithms have enabled highly accurate predictions, outperforming traditional statistical methods that struggle to cope with individual patterns. However, ML requires a substantial amount of data for training, which is challenging to obtain due to the presence of personally identifiable information (PII) and the fragmentation posed by regulations. In this paper, we present a federated machine learning (FML) approach that minimizes privacy concerns and enables distributed training, without transferring individual data. We employ collaborative training by considering individuals and organizations under FML, which models both cross-device and cross-silo learning scenarios. Our approach is evaluated on a real-world dataset with non-independent and identically distributed (non-iid) data among clients, class imbalance and label ambiguity. Our results show that data selection and class imbalance handling techniques significantly improve the predictive accuracy of models trained under FML, yielding predictive performance comparable or superior to that of traditional ML models.

    Federated Learning for 5G Base Station Traffic Forecasting

    Mobile traffic prediction is of great importance on the path of enabling 5G mobile networks to perform smart and efficient infrastructure planning and management. However, available data are limited to base station logging information. Hence, training methods for generating high-quality predictions that can generalize to new observations across different parties are in demand. Traditional approaches require collecting measurements from different base stations and sending them to a central entity, followed by performing machine learning operations using the received data. The dissemination of local observations raises privacy, confidentiality, and performance concerns, hindering the applicability of machine learning techniques. Various distributed learning methods have been proposed to address this issue, but their application to traffic prediction has yet to be explored. In this work, we study the effectiveness of federated learning applied to raw base station aggregated LTE data for time-series forecasting. We evaluate one-step predictions using five different neural network architectures trained in a federated setting on non-iid data. The presented algorithms have been submitted to the Global Federated Traffic Prediction for 5G and Beyond Challenge. Our results show that the learning architectures adapted to the federated setting achieve prediction error equivalent to the centralized setting, that pre-processing techniques on base stations lead to higher forecasting accuracy, and that state-of-the-art aggregators do not outperform simple approaches.
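    To make the notion of a "simple" aggregator concrete, the sketch below shows sample-weighted parameter averaging in the style of federated averaging (FedAvg). The abstract does not say which aggregators were compared, so treating FedAvg as the simple baseline is an assumption, and the function name and flat-list parameter layout are hypothetical.

```python
def federated_average(client_params, client_sizes):
    """Average client model parameters, weighted by local sample counts.

    client_params: one flat list of floats per client (all the same length)
    client_sizes:  number of local training samples held by each client
    """
    if len(client_params) != len(client_sizes) or not client_params:
        raise ValueError("need one sample count per client")
    total = sum(client_sizes)
    n_params = len(client_params[0])
    return [
        sum(params[i] * size for params, size in zip(client_params, client_sizes)) / total
        for i in range(n_params)
    ]


# Two base stations: the second holds three times as much data,
# so its parameters carry three times the weight.
print(federated_average([[1.0, 2.0], [3.0, 4.0]], [1, 3]))  # [2.5, 3.5]
```

    Only the parameter vectors and sample counts leave each client in this scheme; the raw base station measurements stay local, which is the property the abstract relies on.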

    Intelligent Client Selection for Federated Learning using Cellular Automata

    Federated Learning (FL) has emerged as a promising solution for privacy enhancement and latency minimization in various real-world applications, such as transportation, communications, and healthcare. FL endeavors to bring Machine Learning (ML) down to the edge by harnessing data from millions of devices and IoT sensors, thus enabling rapid responses to dynamic environments and yielding highly personalized results. However, the increased number of sensors across diverse applications poses challenges in terms of communication and resource allocation, hindering the participation of all devices in the federated process and prompting the need for effective FL client selection. To address this issue, we propose Cellular Automaton-based Client Selection (CA-CS), a novel client selection algorithm, which leverages Cellular Automata (CA) as models to effectively capture spatio-temporal changes in a fast-evolving environment. CA-CS considers the computational resources and communication capacity of each participating client, while also accounting for inter-client interactions between neighbors during the client selection process, enabling intelligent client selection for online FL processes on data streams that closely resemble real-world scenarios. In this paper, we present a thorough evaluation of the proposed CA-CS algorithm using MNIST and CIFAR-10 datasets, while making a direct comparison against a uniformly random client selection scheme. Our results demonstrate that CA-CS achieves comparable accuracy to the random selection approach, while effectively avoiding high-latency clients. Comment: 18th IEEE International Workshop on Cellular Nanoscale Networks and their Application

    ChromatoGate: A Tool for Detecting Base Mis-Calls in Multiple Sequence Alignments by Semi-Automatic Chromatogram Inspection

    Automated DNA sequencers generate chromatograms that contain raw sequencing data. They also generate data that translates the chromatograms into molecular sequences of A, C, G, T, or N (undetermined) characters. Since chromatogram translation programs frequently introduce errors, a manual inspection of the generated sequence data is required. As sequence numbers and lengths increase, visual inspection and manual correction of chromatograms and corresponding sequences on a per-peak and per-nucleotide basis becomes an error-prone, time-consuming, and tedious process. Here, we introduce ChromatoGate (CG), an open-source software tool that accelerates and partially automates the inspection of chromatograms and the detection of sequencing errors for bidirectional sequencing runs. To provide users full control over the error correction process, a fully automated error correction algorithm has not been implemented. Initially, the program scans a given multiple sequence alignment (MSA) for potential sequencing errors, assuming that each polymorphic site in the alignment may be attributed to a sequencing error with a certain probability. The guided MSA assembly procedure in ChromatoGate detects chromatogram peaks of all characters in an alignment that lead to polymorphic sites, given a user-defined threshold. The threshold value represents the sensitivity of the sequencing error detection mechanism. After this pre-filtering, the user only needs to inspect a small number of peaks in every chromatogram to correct sequencing errors. Finally, we show that correcting sequencing errors is important, because population genetic and phylogenetic inferences can be misled by MSAs with uncorrected mis-calls. Our experiments indicate that estimates of population mutation rates can be affected two- to three-fold by uncorrected errors.
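    The pre-filtering idea (scan the alignment, flag characters that create polymorphic sites, leave the correction to the user) can be sketched as follows. This is not ChromatoGate's implementation: the abstract does not define how the threshold is applied, so interpreting it as a maximum within-column frequency is an assumption, and the function name is hypothetical.

```python
from collections import Counter


def candidate_miscalls(alignment, threshold=0.25):
    """Flag characters that make an alignment column polymorphic.

    alignment: list of equal-length aligned sequences (strings)
    threshold: assumed meaning: a character is flagged when its frequency
               in a polymorphic column is at most this value
    Returns (sequence index, column index, character) tuples for manual review.
    """
    n_seqs = len(alignment)
    flagged = []
    for col in range(len(alignment[0])):
        column = [seq[col] for seq in alignment]
        counts = Counter(column)
        if len(counts) < 2:
            continue  # monomorphic column: nothing to inspect
        for row, char in enumerate(column):
            if counts[char] / n_seqs <= threshold:
                flagged.append((row, col, char))
    return flagged


# One sequence out of four carries a T where the others carry a C.
print(candidate_miscalls(["ACGT", "ACGT", "ACGT", "ATGT"]))  # [(3, 1, 'T')]
```

    A lower threshold flags only the rarest variants, while a higher one sends more peaks to the user for inspection; nothing is corrected automatically, matching the semi-automatic design described above.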