Identifying targets of multiple co-regulating transcription factors from expression time-series by Bayesian model comparison
Background: Complete transcriptional regulatory network inference is a huge challenge because of the complexity
of the network and sparsity of available data. One approach to make it more manageable is to focus on the inference
of context-specific networks involving a few interacting transcription factors (TFs) and all of their target genes.
Results: We present a computational framework for Bayesian statistical inference of target genes of multiple
interacting TFs from high-throughput gene expression time-series data. We use ordinary differential equation models
that describe transcription of target genes taking into account combinatorial regulation. The method consists of a
training and a prediction phase. During the training phase we infer the unobserved TF protein concentrations on a
subnetwork of approximately known regulatory structure. During the prediction phase we apply Bayesian model
selection on a genome-wide scale and score all alternative regulatory structures for each target gene. We use our
methodology to identify targets of five TFs regulating Drosophila melanogaster mesoderm development. We find that
confident predicted links between TFs and targets are significantly enriched for supporting ChIP-chip binding events
and annotated TF-gene interactions. Our method statistically significantly outperforms existing alternatives.
Conclusions: Our results show that it is possible to infer regulatory links between multiple interacting TFs and their
target genes even from a single relatively short time series and in the presence of unmodelled confounders and
unreliable prior knowledge on training network connectivity. Introducing data from several different experimental
perturbations significantly increases the accuracy.
Differentially private partitioned variational inference
Learning a privacy-preserving model from sensitive data which are distributed
across multiple devices is an increasingly important problem. The problem is
often formulated in the federated learning context, with the aim of learning a
single global model while keeping the data distributed. Moreover, Bayesian
learning is a popular approach for modelling, since it naturally supports
reliable uncertainty estimates. However, Bayesian learning is generally
intractable even with centralised non-private data and so approximation
techniques such as variational inference are a necessity. Variational inference
has recently been extended to the non-private federated learning setting via
the partitioned variational inference algorithm. For privacy protection, the
current gold standard is called differential privacy. Differential privacy
guarantees privacy in a strong, mathematically clearly defined sense.
In this paper, we present differentially private partitioned variational
inference, the first general framework for learning a variational approximation
to a Bayesian posterior distribution in the federated learning setting while
minimising the number of communication rounds and providing differential
privacy guarantees for data subjects.
We propose three alternative implementations in the general framework, one
based on perturbing local optimisation runs done by individual parties, and two
based on perturbing updates to the global model (one using a version of
federated averaging, the second one adding virtual parties to the protocol),
and compare their properties both theoretically and empirically.
Comment: Published in TMLR 04/2023: https://openreview.net/forum?id=55Bcghgic
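As a rough illustration of what "perturbing updates to the global model" can involve, the sketch below clips one party's update to a bounded norm and adds Gaussian noise before it is shared. The abstract does not state which noise mechanism is used, so the Gaussian mechanism, the function names `clip_to_norm` and `privatise_update`, and the parameters `max_norm` and `noise_multiplier` are all assumptions made for illustration. The sketch does not compute the resulting privacy budget.

```python
import math
import random


def clip_to_norm(update, max_norm):
    """Rescale the update so that its Euclidean norm is at most max_norm."""
    norm = math.sqrt(sum(v * v for v in update))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [v * scale for v in update]


def privatise_update(update, max_norm, noise_multiplier, rng):
    """Clip the update, then add independent Gaussian noise to each coordinate.

    The noise standard deviation is noise_multiplier * max_norm, i.e. it is
    scaled to the largest change a single clipped update can make.
    (Hypothetical helper; not taken from the paper.)
    """
    clipped = clip_to_norm(update, max_norm)
    sigma = noise_multiplier * max_norm
    return [v + rng.gauss(0.0, sigma) for v in clipped]


# Example: one party's raw update, clipped to norm 1 and perturbed.
rng = random.Random(0)
raw_update = [3.0, 4.0]
noisy_update = privatise_update(raw_update, max_norm=1.0, noise_multiplier=1.0, rng=rng)
print(noisy_update)
```

Clipping bounds how much any single update can move the shared model, which is what allows the added noise to be calibrated to a fixed scale.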
Creating a Dataset for Multilingual Fine-grained Emotion-detection Using Gamification-based Annotation
This paper introduces a gamified framework for fine-grained sentiment analysis and emotion detection. We present a flexible tool, Sentimentator, that can be used for efficient annotation based on crowdsourcing and a self-perpetuating gold standard. We also present a novel dataset with multi-dimensional annotations of emotions and sentiments in movie subtitles that enables research on sentiment preservation across languages and the creation of robust multilingual emotion detection tools. The tools and datasets are public and open source and can easily be extended and applied for various purposes.
Peer reviewed
Representation transfer for differentially private drug sensitivity prediction
Motivation: Human genomic datasets often contain sensitive information that limits use and sharing of the data. In particular, simple anonymization strategies fail to provide a sufficient level of protection for genomic data, because the data are inherently identifiable. Differentially private machine learning can help by guaranteeing that the published results do not leak too much information about any individual data point. Recent research has achieved promising results on differentially private drug sensitivity prediction using gene expression data. Differentially private learning with genomic data is challenging because it is more difficult to guarantee privacy in high dimensions. Dimensionality reduction can help, but if the dimension reduction mapping is learned from the data, then it needs to be differentially private too, which can carry a significant privacy cost. Furthermore, the selection of any hyperparameters (such as the target dimensionality) must also avoid leaking private information. Results: We study an approach that uses a large public dataset of similar type to learn a compact representation for differentially private learning. We compare three representation learning methods: variational autoencoders, principal component analysis and random projection. We solve two machine learning tasks on gene expression of cancer cell lines: cancer type classification, and drug sensitivity prediction. The experiments demonstrate significant benefit from all representation learning methods, with variational autoencoders providing the most accurate predictions most often. Our results significantly improve over the previous state of the art in accuracy of differentially private drug sensitivity prediction. Availability and implementation: Code used in the experiments is available at https://github.com/DPBayes/dp-representation-transfer.
Peer reviewed
Strong pathogen competition in neonatal gut colonisation
Opportunistic bacterial pathogen species and their strains that colonise the human gut are generally understood to compete against both each other and the commensal species colonising this ecosystem. We currently lack a population-wide quantification of strain-level colonisation dynamics, of the relationship of colonisation potential to prevalence in disease, and of how ecological factors might modulate these. Here, using a combination of the latest high-resolution metagenomics and strain-level genomic epidemiology methods, we characterised the competition and colonisation dynamics for a longitudinal cohort of neonatal gut microbiomes. We found strong inter- and intra-species competition dynamics in the gut colonisation process, but also a number of synergistic relationships among several species belonging to the genus Klebsiella, which includes the prominent human pathogen Klebsiella pneumoniae. No evidence of preferential colonisation by hospital-adapted pathogen lineages in either vaginal or caesarean section birth groups was detected. Our analysis further enabled unbiased assessment of strain-level colonisation potential of extra-intestinal pathogenic Escherichia coli (ExPEC) in comparison with their propensity to cause bloodstream infections. Our study highlights the importance of systematic surveillance of bacterial gut pathogens, not only from disease but also from carriage state, to better inform therapies and preventive medicine in the future.
Peer reviewed
A simple approach to ranking differentially expressed gene expression time courses through Gaussian process regression.
BACKGROUND: The analysis of gene expression from time series underpins many biological studies. Two basic forms of analysis recur for data of this type: removing inactive (quiet) genes from the study and determining which genes are differentially expressed. Often these analysis stages are applied disregarding the fact that the data is drawn from a time series. In this paper we propose a simple model, based on a Gaussian process, that accounts for the underlying temporal nature of the data. RESULTS: We review Gaussian process (GP) regression for estimating the continuous trajectories underlying gene expression time series. We present a simple approach which can be used to filter quiet genes or, for time series in the form of expression ratios, to quantify differential expression. We assess via ROC curves the rankings produced by our regression framework and compare them to a recently proposed hierarchical Bayesian model for the analysis of gene expression time series (BATS). We compare the two on both simulated and experimental data, showing that the proposed approach considerably outperforms the current state of the art. CONCLUSIONS: Gaussian processes offer an attractive trade-off between efficiency and usability for the analysis of microarray time series. The Gaussian process framework offers a natural way of handling biological replicates and missing values and provides confidence intervals along the estimated curves of gene expression. Therefore, we believe Gaussian processes should be a standard tool in the analysis of gene expression time series.
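One way to turn GP regression into a ranking score, sketched here under stated assumptions, is to compare the marginal likelihood of a time course under a smooth GP prior with its marginal likelihood under a pure-noise model of the same total variance. The abstract does not spell out the scoring rule or the kernel, so the RBF kernel, the fixed hyperparameters (`lengthscale`, `signal_var`, `noise_var`), the zero-mean assumption (plausible for log expression ratios), and all function names below are illustrative choices. In practice the hyperparameters would normally be fitted to each gene.

```python
import math


def noisy_rbf_covariance(times, lengthscale, signal_var, noise_var):
    """RBF kernel matrix over the time points plus independent observation noise."""
    n = len(times)
    return [[signal_var * math.exp(-0.5 * ((times[i] - times[j]) / lengthscale) ** 2)
             + (noise_var if i == j else 0.0)
             for j in range(n)] for i in range(n)]


def cholesky(a):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low


def log_marginal_likelihood(y, cov):
    """log N(y | 0, cov), evaluated through the Cholesky factor of cov."""
    n = len(y)
    low = cholesky(cov)
    # Forward substitution solves low z = y, so that y^T cov^{-1} y = z^T z.
    z = []
    for i in range(n):
        z.append((y[i] - sum(low[i][k] * z[k] for k in range(i))) / low[i][i])
    log_det = 2.0 * sum(math.log(low[i][i]) for i in range(n))
    return -0.5 * (sum(v * v for v in z) + log_det + n * math.log(2.0 * math.pi))


def temporal_evidence_score(times, y, lengthscale=2.0, signal_var=1.0, noise_var=0.1):
    """Log marginal likelihood ratio: smooth GP model versus noise-only model.

    Positive values favour a smooth underlying trajectory; negative values
    favour unstructured noise. Hyperparameters are fixed here for brevity.
    """
    temporal = noisy_rbf_covariance(times, lengthscale, signal_var, noise_var)
    flat = noisy_rbf_covariance(times, lengthscale, 0.0, signal_var + noise_var)
    return log_marginal_likelihood(y, temporal) - log_marginal_likelihood(y, flat)


# Two synthetic time courses: a smooth pulse and a jagged, alternating series.
times = list(range(10))
smooth_pulse = [2.0 * math.exp(-0.5 * ((t - 5) / 2.0) ** 2) for t in times]
jagged = [(-1.0) ** t for t in times]
print(temporal_evidence_score(times, smooth_pulse))
print(temporal_evidence_score(times, jagged))
```

Sorting genes by this score in descending order places smoothly varying profiles first and noise-like (quiet) profiles last.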
Global modeling of transcriptional responses in interaction networks
Motivation: Cell-biological processes are regulated through a complex network
of interactions between genes and their products. The processes, their
activating conditions, and the associated transcriptional responses are often
unknown. Organism-wide modeling of network activation can reveal unique and
shared mechanisms between physiological conditions, and potentially as yet
unknown processes. We introduce a novel approach for organism-wide discovery
and analysis of transcriptional responses in interaction networks. The method
searches for local, connected regions in a network that exhibit coordinated
transcriptional response in a subset of conditions. Known interactions between
genes are used to limit the search space and to guide the analysis. Validation
on a human pathway network reveals physiologically coherent responses,
functional relatedness between physiological conditions, and coordinated,
context-specific regulation of the genes. Availability: Implementation is
freely available in R and Matlab at http://netpro.r-forge.r-project.org
Comment: 19 pages, 13 figures