Cancer driver gene detection in transcriptional regulatory networks using the structure analysis of weighted regulatory interactions
Identifying the genes that initiate cell anomalies and cause cancer in
humans is an important area of oncology research. Mutations and anomalies
arising in these genes propagate to other genes in the cell and thereby
disrupt its normal function. These genes are known as cancer driver genes
(CDGs). Various methods have been proposed for predicting CDGs, most of which
are computational and based on genomic data. Therefore, some researchers have
developed novel bioinformatics approaches. In this study, we propose an
algorithm that is
able to calculate the effectiveness and strength of each gene and rank them by
using the gene regulatory networks and the stochastic analysis of regulatory
linking structures between genes. To do so, we first constructed the
regulatory network using gene expression data and the list of regulatory
interactions. Then, using biological and topological features of the network,
we weighted the regulatory interactions. The resulting interaction weights
were then used in the interaction structure analysis. This analysis was
performed using two separate Markov chains on the bipartite graph obtained
from the main graph of the gene network, implementing the stochastic approach
for link-structure analysis. The proposed algorithm categorizes higher-ranked
genes as driver genes. The efficiency of the proposed algorithm, in terms of
F-measure and the number of identified driver genes, was compared with 23
other computational and network-based methods.
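
The two Markov chains on the bipartite graph can be illustrated with a generic weighted SALSA-style random walk. This is a minimal sketch, not the authors' implementation: the function name `salsa_scores`, the toy edge list, and the choice to rank regulators by hub score are all assumptions made for illustration.

```python
from collections import defaultdict


def salsa_scores(weighted_edges, n_iter=200):
    """Return (hub, authority) scores from a weighted SALSA-style walk.

    weighted_edges: iterable of (regulator, target, weight), weight > 0.
    Hypothetical helper; the weighting scheme of the paper is not reproduced.
    """
    out_w = defaultdict(dict)  # regulator -> {target: weight}
    in_w = defaultdict(dict)   # target -> {regulator: weight}
    for src, dst, w in weighted_edges:
        out_w[src][dst] = out_w[src].get(dst, 0.0) + w
        in_w[dst][src] = in_w[dst].get(src, 0.0) + w

    out_sum = {h: sum(t.values()) for h, t in out_w.items()}
    in_sum = {a: sum(s.values()) for a, s in in_w.items()}

    # Both chains start from the uniform distribution on their own side.
    hub = {h: 1.0 / len(out_w) for h in out_w}
    auth = {a: 1.0 / len(in_w) for a in in_w}

    for _ in range(n_iter):
        # Authority chain: step back to a regulator, then forward to a target.
        via_hub = defaultdict(float)
        for a, p in auth.items():
            for h, w in in_w[a].items():
                via_hub[h] += p * w / in_sum[a]
        new_auth = defaultdict(float)
        for h, p in via_hub.items():
            for a, w in out_w[h].items():
                new_auth[a] += p * w / out_sum[h]

        # Hub chain: step forward to a target, then back to a regulator.
        via_auth = defaultdict(float)
        for h, p in hub.items():
            for a, w in out_w[h].items():
                via_auth[a] += p * w / out_sum[h]
        new_hub = defaultdict(float)
        for a, p in via_auth.items():
            for h, w in in_w[a].items():
                new_hub[h] += p * w / in_sum[a]

        auth, hub = dict(new_auth), dict(new_hub)
    return hub, auth


# Toy weighted interactions (hypothetical regulator, target, weight triples).
toy_edges = [("A", "X", 1.0), ("A", "Y", 2.0), ("B", "Y", 1.0)]
hub, auth = salsa_scores(toy_edges)
# Assumed ranking rule: regulators ordered by decreasing hub score.
ranking = sorted(hub, key=hub.get, reverse=True)
```

On a connected toy graph like this one, the hub scores converge to each regulator's share of the total outgoing weight, and the authority scores to each target's share of the total incoming weight.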
Learn to Generate Time Series Conditioned Graphs with Generative Adversarial Nets
Deep-learning-based approaches have recently been used to model and generate
graphs that follow different distributions. However, they are typically
unsupervised and unconditioned generative models, or are conditioned only on
graph-level contexts that are not associated with rich semantic node-level
contexts. In contrast, in this paper we are
interested in a novel problem named Time Series Conditioned Graph Generation:
given an input multivariate time series, we aim to infer a target relation
graph modeling the underlying interrelationships between time series with each
node corresponding to each time series. For example, we can study the
interrelationships between genes in a gene regulatory network of a certain
disease conditioned on their gene expression data recorded as time series. To
achieve this, we propose a novel Time Series conditioned Graph
Generation-Generative Adversarial Networks (TSGG-GAN) to handle the challenges
of conditioning on rich node-level context structures and measuring
similarities directly between graphs and time series. Extensive experiments on
synthetic and real-world gene regulatory network datasets demonstrate the
effectiveness and generalizability of the proposed TSGG-GAN.
Generation of a Compendium of Transcription Factor Cascades and Identification of Potential Therapeutic Targets using Graph Machine Learning
Transcription factors (TFs) play a vital role in the regulation of gene
expression, making them critical to many cellular processes. In this
study, we used graph machine learning methods to create a compendium of TF
cascades using data extracted from the STRING database. A TF cascade is a
sequence of TFs that regulate each other, forming a directed path in the TF
network. We constructed a knowledge graph of 81,488 unique TF cascades, with
the longest cascade consisting of 62 TFs. Our results highlight the complex and
intricate nature of TF interactions, where multiple TFs work together to
regulate gene expression. We also identified 10 TFs with the highest regulatory
influence based on centrality measurements, providing valuable information for
researchers interested in studying specific TFs. Furthermore, our pathway
enrichment analysis revealed significant enrichment of various pathways and
functional categories, including those involved in cancer and other diseases,
as well as those involved in development, differentiation, and cell signaling.
The enriched pathways identified in this study may have potential as targets
for therapeutic intervention in diseases associated with dysregulation of
transcription factors. We have released the dataset, knowledge graph, and
graphML methods for the TF cascades, and created a website to display the
results, which can be accessed by researchers interested in using this dataset.
Our study provides a valuable resource for understanding the complex network of
interactions between TFs and their regulatory roles in cellular processes.
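
The notion of a TF cascade as a directed path can be sketched with a small depth-first enumeration. This is an illustrative sketch only: the function `enumerate_cascades`, the toy network, and the assumption that a cascade never repeats a TF (a simple path) are not taken from the study.

```python
def enumerate_cascades(regulates, min_len=2):
    """Yield every simple directed path with at least min_len nodes.

    regulates: dict mapping a TF to the TFs it regulates (hypothetical input).
    """
    def extend(path, seen):
        if len(path) >= min_len:
            yield tuple(path)
        for nxt in regulates.get(path[-1], ()):
            if nxt not in seen:  # assumption: a cascade never revisits a TF
                seen.add(nxt)
                path.append(nxt)
                yield from extend(path, seen)
                path.pop()
                seen.remove(nxt)

    for start in regulates:
        yield from extend([start], {start})


# Toy TF network with made-up names, for illustration only.
toy_network = {"TF1": ["TF2", "TF3"], "TF2": ["TF3"], "TF3": []}
cascades = list(enumerate_cascades(toy_network))
longest = max(cascades, key=len)
```

Exhaustive enumeration grows quickly with network size, so a sketch like this is only practical for small or sparse graphs.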
Application of new probabilistic graphical models in the genetic regulatory networks studies
This paper introduces two new probabilistic graphical models for
reconstruction of genetic regulatory networks using DNA microarray data. One is
an Independence Graph (IG) model with either a forward or a backward search
algorithm and the other one is a Gaussian Network (GN) model with a novel
greedy search method. The performances of both models were evaluated on four
MAPK pathways in yeast and three simulated data sets. Generally, an IG model
provides a sparse graph but a GN model produces a dense graph where more
information about gene-gene interactions is preserved. Additionally, we found
two key limitations in the prediction of genetic regulatory networks using DNA
microarray data: the first is whether the sample size is sufficient, and the
second is that the complexity of network structures may not be captured
without additional data at the protein level. These limitations are present in
all prediction methods that use only DNA microarray data. Comment: 38 pages, 3 figures
On the inconsistency of ℓ1-penalised sparse precision matrix estimation
Background: Various ℓ1-penalised estimation methods such as graphical lasso and CLIME are widely used for sparse precision matrix estimation and learning of undirected network structure from data. Many of these methods have been shown to be consistent under various quantitative assumptions about the underlying true covariance matrix. Intuitively, these conditions are related to situations where the penalty term will dominate the optimisation. Results: We explore the consistency of ℓ1-based methods for a class of bipartite graphs motivated by the structure of models commonly used for gene regulatory networks. We show that all ℓ1-based methods fail dramatically for models with nearly linear dependencies between the variables. We also study the consistency on models derived from real gene expression data and note that the assumptions needed for consistency never hold even for modest-sized gene networks and ℓ1-based methods also become unreliable in practice for larger networks. Conclusions: Our results demonstrate that ℓ1-penalised undirected network structure learning methods are unable to reliably learn many sparse bipartite graph structures, which arise often in gene expression data. Users of such methods should be aware of the consistency criteria of the methods and check if they are likely to be met in their application of interest. Peer reviewed
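
For orientation, the graphical lasso named in this abstract is commonly written as the penalised Gaussian likelihood problem below. The symbols are assumptions introduced for this sketch and do not appear in the abstract: S is the sample covariance matrix, Θ the precision matrix being estimated, and λ ≥ 0 the penalty weight. Some formulations apply the penalty to off-diagonal entries only.

```latex
\hat{\Theta} \;=\; \arg\max_{\Theta \succ 0}
  \Bigl\{ \log\det\Theta \;-\; \operatorname{tr}(S\,\Theta)
          \;-\; \lambda \,\lVert \Theta \rVert_{1} \Bigr\},
\qquad
\lVert \Theta \rVert_{1} = \sum_{i,j} \lvert \Theta_{ij} \rvert .
```

A zero entry in the estimate at position (i, j) is read as the absence of an edge between variables i and j in the learned undirected network.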
Inferring Regulatory Networks by Combining Perturbation Screens and Steady State Gene Expression Profiles
Reconstructing transcriptional regulatory networks is an important task in
functional genomics. Data obtained from experiments that perturb genes by
knockouts or RNA interference contain useful information for addressing this
reconstruction problem. However, such data can be limited in size and/or are
expensive to acquire. On the other hand, observational data of the organism in
steady state (e.g. wild-type) are more readily available, but their
informational content is inadequate for the task at hand. We develop a
computational approach to appropriately utilize both data sources for
estimating a regulatory network. The proposed approach is based on a three-step
algorithm to estimate the underlying directed but cyclic network, that uses as
input both perturbation screens and steady state gene expression data. In the
first step, the algorithm determines causal orderings of the genes that are
consistent with the perturbation data, by combining an exhaustive search method
with a fast heuristic that in turn couples a Monte Carlo technique with a fast
search algorithm. In the second step, for each obtained causal ordering, a
regulatory network is estimated using a penalized likelihood based method,
while in the third step a consensus network is constructed from the highest
scored ones. Extensive computational experiments show that the algorithm
performs well in reconstructing the underlying network and clearly outperforms
competing approaches that rely only on a single data source. Further, it is
established that the algorithm produces a consistent estimate of the regulatory
network. Comment: 24 pages, 4 figures, 6 tables
Feedbacks from the metabolic network to the genetic network reveal regulatory modules in E. coli and B. subtilis
The genetic regulatory network (GRN) plays a key role in controlling the
response of the cell to changes in the environment. Although the structure of
GRNs has been the subject of many studies, their large scale structure in the
light of feedbacks from the metabolic network (MN) has received relatively
little attention. Here we study the causal structure of the GRNs, namely the
chain of influence of one component on the other, taking into account feedback
from the MN. First we consider the GRNs of E. coli and B. subtilis without
feedback from MN and illustrate their causal structure. Next we augment the
GRNs with feedback from their respective MNs by including (a) links from genes
coding for enzymes to metabolites produced or consumed in reactions catalyzed
by those enzymes and (b) links from metabolites to genes coding for
transcription factors whose transcriptional activity the metabolites alter by
binding to them. We find that the inclusion of feedback from MN into GRN
significantly affects its causal structure, in particular the number of levels
and relative positions of nodes in the hierarchy, and the number and size of
the strongly connected components (SCCs). We then study the functional
significance of the SCCs. For this we identify condition specific feedbacks
from the MN into the GRN by retaining only those enzymes that are essential for
growth in specific environmental conditions simulated via the technique of flux
balance analysis (FBA). We find that the SCCs of the GRN augmented by these
feedbacks can be ascribed specific functional roles in the organism. Our
algorithmic approach thus reveals relatively autonomous subsystems with
specific functionality, or regulatory modules in the organism. This automated
approach could be useful in identifying biologically relevant modules in other
organisms for which network data is available, but whose biology is less well
studied. Comment: 15 figures
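
The effect of metabolic feedback on strongly connected components can be sketched on a toy graph. The code below uses Tarjan's algorithm as a generic SCC routine; the helper names `augment` and `strongly_connected_components`, and the three-node example, are hypothetical and are not data or code from the paper.

```python
def augment(grn, feedback):
    """Merge feedback edges (enzyme -> metabolite -> TF gene) into the GRN."""
    merged = {v: list(succ) for v, succ in grn.items()}
    for v, succ in feedback.items():
        merged.setdefault(v, []).extend(succ)
    return merged


def strongly_connected_components(graph):
    """Tarjan's algorithm; graph maps each node to its successors."""
    index, low, on_stack, stack, sccs = {}, {}, set(), [], []

    def visit(v):
        index[v] = low[v] = len(index)
        stack.append(v)
        on_stack.add(v)
        for w in graph.get(v, ()):
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:  # v is the root of a component
            comp = set()
            while True:
                w = stack.pop()
                on_stack.discard(w)
                comp.add(w)
                if w == v:
                    break
            sccs.append(comp)

    nodes = set(graph) | {w for succ in graph.values() for w in succ}
    for v in sorted(nodes):
        if v not in index:
            visit(v)
    return sccs


# Toy example: a TF gene regulates an enzyme gene; the enzyme's metabolite
# binds the TF, closing a feedback loop (all names are made up).
toy_grn = {"tfA": ["enzB"], "enzB": []}
toy_feedback = {"enzB": ["metM"], "metM": ["tfA"]}

sccs_without = strongly_connected_components(toy_grn)
sccs_with = strongly_connected_components(augment(toy_grn, toy_feedback))
```

In this toy case the GRN alone has only single-node components, while the augmented graph contains one component spanning the TF gene, the enzyme gene, and the metabolite. The recursive form is adequate for small graphs; a genome-scale network would call for an iterative variant to avoid Python's recursion limit.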