Meta-coexpression conservation analysis of microarray data: a "subset" approach provides insight into brain-derived neurotrophic factor regulation
<p>Abstract</p> <p>Background</p> <p>Alterations in brain-derived neurotrophic factor (<it>BDNF</it>) gene expression contribute to serious pathologies such as depression, epilepsy, cancer, and Alzheimer's, Huntington's and Parkinson's diseases. Exploring the mechanisms of <it>BDNF </it>regulation is therefore of great clinical importance. Studying <it>BDNF </it>expression remains difficult because of its multiple neural activity-dependent and tissue-specific promoters. Thus, microarray data could provide insight into the regulation of this complex gene. Conventional microarray co-expression analysis is usually carried out by merging datasets or by confirming the re-occurrence of significant correlations across datasets. However, co-expression patterns can differ under the various conditions that are represented by subsets within a dataset. Therefore, assessing co-expression by measuring the correlation coefficient across the merged samples of a dataset, or by merging datasets, might not capture all correlation patterns.</p> <p>Results</p> <p>In our study, we performed meta-coexpression analysis of publicly available microarray data using <it>BDNF </it>as a "guide-gene", introducing a "subset" approach. The key steps of the analysis were: dividing datasets into subsets with biologically meaningful sample content (e.g. tissue, gender or disease-state subsets); analyzing co-expression with the <it>BDNF </it>gene in each subset separately; and confirming co-expression links across subsets. Finally, we analyzed conservation of co-expression with <it>BDNF </it>between human, mouse and rat, and searched for conserved over-represented TFBSs in <it>BDNF </it>and BDNF-correlated genes. Correlated genes discovered in this study regulate nervous system development, and are associated with various types of cancer and neurological disorders. 
Also, several transcription factors identified here have been reported to regulate <it>BDNF </it>expression <it>in vitro </it>and <it>in vivo</it>.</p> <p>Conclusion</p> <p>The study demonstrates the potential of the "subset" approach in co-expression conservation analysis for studying the regulation of single genes, and proposes novel regulators of <it>BDNF </it>gene expression.</p>
Detecting selective sweeps in natural populations of Drosophila melanogaster
The goal of this study was to gain a deeper understanding of selective sweep models and of the statistical and computational methods that disentangle selective sweeps from neutrality. In the Introduction of the thesis I review the literature on the main approaches developed over the last decade to separate selective sweeps from neutral demographic scenarios. Methods for complete and incomplete selective sweeps are reviewed, as well as selective sweeps in structured populations. Further, I analyze the effects of past demographic events, especially bottlenecks, on the genealogies of a sample. Finally, I demonstrate that the difficulty of separating selective sweeps from bottlenecks stems from the lack of robust statistics and, most importantly, from the similar genealogies that bottlenecks and selective sweeps may generate locally on a recombining chromosome.
In the first chapter I introduce a method that combines statistical tests in a machine learning framework in order to disentangle selective sweeps from neutral demographic scenarios. The approach uses support vector machines to learn from examples of neutral scenarios and scenarios with selection. I demonstrate that the novel approach outperforms previously published approaches for a variety of demographic scenarios. The main reason for the performance difference is the use of the scenarios with selection, which are not exploited by classical statistical methods.
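As a rough illustration of the framing described above, the sketch below trains a support vector machine to separate simulated "neutral" from "sweep" examples. The two-dimensional feature vectors and their distributions are invented stand-ins, not the thesis's actual summary statistics or simulations.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Invented summary statistics (e.g. a Tajima's-D-like value and a
# diversity-like value) for simulated neutral and sweep scenarios.
neutral = rng.normal(loc=[0.0, 1.0], scale=0.3, size=(200, 2))
sweep = rng.normal(loc=[-1.5, 0.3], scale=0.3, size=(200, 2))

X = np.vstack([neutral, sweep])
y = np.array([0] * 200 + [1] * 200)  # 0 = neutral, 1 = sweep

# Learn from examples of both classes, as in the machine-learning framing.
clf = SVC(kernel="rbf").fit(X, y)

# Classify a new, unseen simulated region.
prediction = clf.predict([[-1.4, 0.35]])[0]
train_accuracy = clf.score(X, y)
```

The key point is that the classifier is shown scenarios *with* selection during training, which fixed-threshold statistical tests never see.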
In the second chapter of the thesis I present an application of the methods to detecting a selective sweep in the African population of D. melanogaster. Demographic history and ascertainment bias schemes have been taken into account. The results pinpoint the HDAC6 gene as a target of recent positive selection. This study also demonstrates the variable-threshold approach, which remedies the tendency of some neutrality tests to detect selective sweeps at the edges of the region of interest.
In the third chapter I present the results of an analysis of selective sweeps in multi-locus models. I assume that a phenotypic trait evolves under stabilizing or directional selection. In contrast to the classical models of selective sweeps, the evolutionary trajectory of an allele that affects the trait may fall into one of three categories: the allele either fixes, disappears, or remains polymorphic. Thereafter, I analyze the properties of the coalescent trees and neutral polymorphism patterns generated under each of the three categories. I show that for the majority of simulated datasets selection cannot be detected unless the trajectory is either fixed or close to fixation.
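The three trajectory outcomes mentioned above (fixation, loss, or persistent polymorphism) can be pictured with a toy Wright-Fisher simulation; the population size, selection coefficient, and generation cutoff below are arbitrary illustrative choices, not the chapter's model.

```python
import numpy as np

def simulate_trajectory(n=500, s=0.01, p0=0.05, max_gen=2000, seed=0):
    """Toy Wright-Fisher trajectory of one allele under weak selection.

    Returns 'fixed', 'lost', or 'polymorphic' (still segregating at
    max_gen), mirroring the three categories described in the text.
    """
    rng = np.random.default_rng(seed)
    p = p0
    for _ in range(max_gen):
        # Selection shifts the expected frequency; drift resamples it
        # binomially from 2n gene copies.
        p_sel = p * (1 + s) / (1 + s * p)
        p = rng.binomial(2 * n, p_sel) / (2 * n)
        if p == 0.0:
            return "lost"
        if p == 1.0:
            return "fixed"
    return "polymorphic"

# Replicate trajectories fall into all three categories.
outcomes = {simulate_trajectory(seed=k) for k in range(50)}
```

Under the multi-locus models of the chapter, the allele's effect on the trait determines which of these outcomes dominates.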
Balancing selection on genomic deletion polymorphisms in humans
A key question in biology is why genomic variation persists in a population for extended periods. Recent studies have identified examples of genomic deletions that have remained polymorphic in the human lineage for hundreds of millennia, ostensibly owing to balancing selection. Nevertheless, a genome-wide investigation of ancient and possibly adaptive deletions remains imperative. Here, we demonstrate an excess of polymorphisms in present-day humans that predate the modern human-Neanderthal split (ancient polymorphisms), which cannot be explained solely by selectively neutral scenarios. We analyze the adaptive mechanisms that underlie this excess in deletion polymorphisms. Using a previously published measure of balancing selection, we show that this excess of ancient deletions is largely owing to balancing selection. Based on the absence of signatures of overdominance, we conclude that overdominance is a rare mode of balancing selection among ancient deletions. Instead, more complex scenarios involving spatially and temporally variable selective pressures are likely the more common mechanisms. Our results suggest that balancing selection has resulted in ancient deletions harboring disproportionately more exonic variants with GWAS associations. We further found that ancient deletions are significantly enriched for traits related to metabolism and immunity. As a by-product of our analysis, we show that deletions are, on average, more deleterious than single-nucleotide variants. We can now argue that not only is a vast majority of common variants shared among human populations, but a considerable portion of biologically relevant variants has been segregating among our ancestors for hundreds of thousands, if not millions, of years.
NEUROPSYCHOLOGICAL ASSESSMENT OF A PATIENT DIAGNOSED WITH MAJOR DEPRESSION AND HUNTINGTON’S DISEASE
Federated Learning for Early Dropout Prediction on Healthy Ageing Applications
The provision of social care applications is crucial for improving elderly
people's quality of life, and it enables operators to provide early
interventions. Accurate predictions of user dropouts in healthy ageing
applications are essential, since they are directly related to individual
health status. Machine Learning (ML) algorithms have enabled highly accurate
predictions, outperforming traditional statistical methods that struggle to
cope with individual patterns. However, ML requires a substantial amount of
data for training, which is challenging due to the presence of personally
identifiable information (PII) and the fragmentation imposed by regulations. In
this paper, we present a federated machine learning (FML) approach that
minimizes privacy concerns and enables distributed training, without
transferring individual data. We employ collaborative training by considering
individuals and organizations under FML, which models both cross-device and
cross-silo learning scenarios. Our approach is evaluated on a real-world
dataset with non-independent and identically distributed (non-iid) data among
clients, class imbalance and label ambiguity. Our results show that data
selection and class imbalance handling techniques significantly improve the
predictive accuracy of models trained under FML, demonstrating predictive
performance comparable or superior to that of traditional ML models.
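The core federated idea described above — each client trains on its own private data and only model weights leave the device — can be sketched with federated averaging over a toy logistic-regression model. The model, client data, and hyperparameters are invented for illustration and are not the paper's actual setup.

```python
import numpy as np

rng = np.random.default_rng(1)

def local_step(w, X, y, lr=0.5, epochs=20):
    """Gradient-descent update of logistic-regression weights on one
    client's private data; only the resulting weights leave the client."""
    for _ in range(epochs):
        p = 1 / (1 + np.exp(-X @ w))
        w = w - lr * X.T @ (p - y) / len(y)
    return w

# Synthetic non-iid clients: same underlying concept (label = sign of
# x0 + x1), but very different feature and label distributions per client.
clients = []
for shift in (-1.0, 0.0, 1.0):
    X = rng.normal(shift, 1.0, size=(60, 2))
    y = (X[:, 0] + X[:, 1] > 0).astype(float)
    clients.append((X, y))

w = np.zeros(2)
for _ in range(10):  # federated rounds
    # Each client trains locally; the server averages the weights (FedAvg).
    w = np.mean([local_step(w, X, y) for X, y in clients], axis=0)

accuracy = float(np.mean([
    ((1 / (1 + np.exp(-X @ w)) > 0.5) == y).mean() for X, y in clients
]))
```

No raw records are ever pooled centrally, which is the property that addresses the PII and regulatory fragmentation concerns raised above.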
Federated Learning for 5G Base Station Traffic Forecasting
Mobile traffic prediction is of great importance for enabling 5G
mobile networks to perform smart and efficient infrastructure planning and
management. However, available data are limited to base station logging
information. Hence, training methods for generating high-quality predictions
that can generalize to new observations on different parties are in demand.
Traditional approaches require collecting measurements from different base
stations and sending them to a central entity, followed by performing machine
learning operations using the received data. The dissemination of local
observations raises privacy, confidentiality, and performance concerns,
hindering the applicability of machine learning techniques. Various distributed
learning methods have been proposed to address this issue, but their
application to traffic prediction has yet to be explored. In this work, we
study the effectiveness of federated learning applied to raw base station
aggregated LTE data for time-series forecasting. We evaluate one-step
predictions using 5 different neural network architectures trained in a
federated setting on non-iid data. The presented algorithms have been submitted
to the Global Federated Traffic Prediction for 5G and Beyond Challenge. Our
results show that the learning architectures adapted to the federated setting
achieve equivalent prediction error to the centralized setting, pre-processing
techniques on base stations lead to higher forecasting accuracy, while
state-of-the-art aggregators do not outperform simple approaches.
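One-step-ahead forecasting as studied here reduces to supervised learning on windows of past traffic, with per-station pre-processing applied locally. The sketch below shows that reduction with made-up traffic values; the window width and the z-score scaler are illustrative assumptions, not the challenge's actual pipeline.

```python
import numpy as np

def make_windows(series, width):
    """Turn a 1-D traffic series into (past-window, next-value) pairs
    for one-step-ahead forecasting."""
    X = np.array([series[i:i + width] for i in range(len(series) - width)])
    y = np.array(series[width:])
    return X, y

def zscore(series):
    """Per-station standardization — the kind of local pre-processing the
    abstract reports as improving forecasting accuracy."""
    s = np.asarray(series, dtype=float)
    return (s - s.mean()) / s.std()

# Made-up hourly traffic measurements from one base station.
traffic = [10, 12, 15, 14, 13, 17, 20, 18]
X, y = make_windows(traffic, width=3)  # X[0] = [10, 12, 15] predicts y[0] = 14
scaled = zscore(traffic)
```

In the federated setting, each base station builds and scales its own windows locally and only model updates are aggregated.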
Intelligent Client Selection for Federated Learning using Cellular Automata
Federated Learning (FL) has emerged as a promising solution for
privacy-enhancement and latency minimization in various real-world
applications, such as transportation, communications, and healthcare. FL
endeavors to bring Machine Learning (ML) down to the edge by harnessing data
from millions of devices and IoT sensors, thus enabling rapid responses to
dynamic environments and yielding highly personalized results. However, the
increased amount of sensors across diverse applications poses challenges in
terms of communication and resource allocation, hindering the participation of
all devices in the federated process and prompting the need for effective FL
client selection. To address this issue, we propose Cellular Automaton-based
Client Selection (CA-CS), a novel client selection algorithm, which leverages
Cellular Automata (CA) as models to effectively capture spatio-temporal changes
in a fast-evolving environment. CA-CS considers the computational resources and
communication capacity of each participating client, while also accounting for
inter-client interactions between neighbors during the client selection
process, enabling intelligent client selection for online FL processes on data
streams that closely resemble real-world scenarios. In this paper, we present a
thorough evaluation of the proposed CA-CS algorithm using MNIST and CIFAR-10
datasets, while making a direct comparison against a uniformly random client
selection scheme. Our results demonstrate that CA-CS achieves comparable
accuracy to the random selection approach, while effectively avoiding
high-latency clients.
Comment: 18th IEEE International Workshop on Cellular Nanoscale Networks and
their Applications
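To give a flavor of the idea behind CA-CS (the paper's actual update rules are not reproduced here, so this is a loose, hypothetical sketch): clients sit on a grid, and a cellular-automaton-style rule decides selection from each client's own latency and resources together with the state of its neighbors.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical 5x5 grid of FL clients with per-client latency (ms) and
# available compute resources (arbitrary units).
latency = rng.uniform(10, 200, size=(5, 5))
resources = rng.uniform(0, 1, size=(5, 5))

def neighbour_mean(grid):
    """Mean of the four von Neumann neighbours, with wrap-around edges."""
    return (np.roll(grid, 1, 0) + np.roll(grid, -1, 0)
            + np.roll(grid, 1, 1) + np.roll(grid, -1, 1)) / 4

# Invented CA-style rule: select a client if its own latency is acceptable
# AND its neighbourhood is resource-rich, capturing the inter-client
# interactions the abstract describes.
selected = (latency < 120) & (neighbour_mean(resources) > 0.4)
```

Because the rule is local, it can be re-evaluated every round on streaming data, which is what makes the approach suitable for online FL client selection.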
ChromatoGate: A Tool for Detecting Base Mis-Calls in Multiple Sequence Alignments by Semi-Automatic Chromatogram Inspection
Automated DNA sequencers generate chromatograms that contain raw sequencing data. They also generate data that translate the chromatograms into molecular sequences of A, C, G, T, or N (undetermined) characters. Since chromatogram translation programs frequently introduce errors, manual inspection of the generated sequence data is required. As sequence numbers and lengths increase, visual inspection and manual correction of chromatograms and corresponding sequences on a per-peak and per-nucleotide basis becomes an error-prone, time-consuming, and tedious process. Here, we introduce ChromatoGate (CG), an open-source software tool that accelerates and partially automates the inspection of chromatograms and the detection of sequencing errors for bidirectional sequencing runs. To give users full control over the error correction process, a fully automated error correction algorithm has not been implemented. Initially, the program scans a given multiple sequence alignment (MSA) for potential sequencing errors, assuming that each polymorphic site in the alignment may be attributed to a sequencing error with a certain probability. The guided MSA assembly procedure in ChromatoGate detects the chromatogram peaks of all characters in an alignment that lead to polymorphic sites, given a user-defined threshold. The threshold value represents the sensitivity of the sequencing error detection mechanism. After this pre-filtering, the user only needs to inspect a small number of peaks in every chromatogram to correct sequencing errors. Finally, we show that correcting sequencing errors is important, because population genetic and phylogenetic inferences can be misled by MSAs with uncorrected mis-calls. Our experiments indicate that estimates of population mutation rates can be affected two- to three-fold by uncorrected errors.
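The pre-filtering step described above can be pictured as scanning alignment columns and flagging those where a rare minority character could plausibly be a mis-call. The frequency-threshold rule below is a simplified stand-in for ChromatoGate's actual detection mechanism, written only to make the idea concrete.

```python
from collections import Counter

def flag_suspect_sites(msa, threshold=0.25):
    """Return indices of alignment columns whose minority characters occur
    at a frequency at or below `threshold` — candidate mis-call sites the
    user would then inspect in the chromatograms.

    Simplified illustration, not ChromatoGate's actual algorithm.
    """
    flagged = []
    n = len(msa)
    for col in range(len(msa[0])):
        counts = Counter(seq[col] for seq in msa)
        if len(counts) > 1:  # polymorphic column
            minority = n - counts.most_common(1)[0][1]
            if minority / n <= threshold:
                flagged.append(col)
    return flagged

msa = [
    "ACGTACGT",
    "ACGTACGT",
    "ACGAACGT",  # column 3: a single 'A' among 'T's — possible mis-call
    "ACGTACGT",
]
suspect = flag_suspect_sites(msa)
```

Raising the threshold corresponds to a more sensitive detector: more polymorphic columns get flagged for manual inspection.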