
    Detection of regulator genes and eQTLs in gene networks

    Genetic differences between individuals associated with quantitative phenotypic traits, including disease states, are usually found in non-coding genomic regions. These genetic variants are often also associated with differences in expression levels of nearby genes (they are "expression quantitative trait loci", or eQTLs for short) and presumably play a gene regulatory role, affecting the status of molecular networks of interacting genes, proteins and metabolites. Computational systems biology approaches to reconstruct causal gene networks from large-scale omics data have therefore become essential to understand the structure of networks controlled by eQTLs together with other regulatory genes, and to generate detailed hypotheses about the molecular mechanisms that lead from genotype to phenotype. Here we review the main analytical methods and software to identify eQTLs and their associated genes, to reconstruct co-expression networks and modules, to reconstruct causal Bayesian gene and module networks, and to validate predicted networks in silico. Comment: minor revision with typos corrected; review article; 24 pages, 2 figures
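As a rough illustration of the eQTL idea described in this abstract, the sketch below regresses a gene's expression level on genotype dosage (0, 1 or 2 copies of the minor allele) with ordinary least squares. It is not taken from the review or from any of the software it covers: the function name `eqtl_slope` and the toy data are assumptions, and real analyses add covariates, significance testing and multiple-testing correction.

```python
def eqtl_slope(genotypes, expression):
    """Least-squares slope of expression on genotype dosage (hypothetical helper)."""
    n = len(genotypes)
    mean_g = sum(genotypes) / n
    mean_e = sum(expression) / n
    # Covariance and variance terms (unnormalised; the 1/n factors cancel).
    cov = sum((g - mean_g) * (e - mean_e) for g, e in zip(genotypes, expression))
    var = sum((g - mean_g) ** 2 for g in genotypes)
    return cov / var

# Toy data (invented for illustration): six individuals, one variant, one gene.
genotypes = [0, 0, 1, 1, 2, 2]
expression = [1.0, 1.2, 2.1, 1.9, 3.0, 3.2]

slope = eqtl_slope(genotypes, expression)
print(f"estimated effect per allele copy: {slope:.3f}")
```

A slope clearly different from zero suggests that the variant is associated with expression of the gene; whether that association is causal is the harder question the network methods address.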

    On the design of advanced filters for biological networks using graph theoretic properties

    Network modeling of biological systems is a powerful tool for analysis of high-throughput datasets by computational systems biologists. Integration of networks to form a heterogeneous model requires that each network be as noise-free as possible while still containing relevant biological information. In earlier work, we have shown that the graph theoretic properties of gene correlation networks can be used to highlight and maintain important structures such as high degree nodes, clusters, and critical links between sparse network branches while reducing noise. In this paper, we propose the design of advanced network filters using structurally related graph theoretic properties. While spanning trees and chordal subgraphs provide filters with special advantages, we hypothesize that a hybrid subgraph sampling method will allow for the design of a more effective filter preserving key properties in biological networks. The proposed approach allows us to optimize a number of parameters associated with the filtering process, which in turn improves the identification of essential genes in mouse aging networks.
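One of the filters named in this abstract is the spanning tree. The sketch below shows what such a filter could look like on a small weighted correlation network, using Kruskal's algorithm to keep the strongest edges that do not close a cycle. It is an assumption-laden illustration, not the authors' hybrid sampling method: the function name `max_spanning_tree`, the node numbering and the edge weights are all invented.

```python
def max_spanning_tree(n_nodes, weighted_edges):
    """Kruskal's algorithm, keeping the heaviest edges that do not form a cycle."""
    parent = list(range(n_nodes))

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    kept = []
    # Visit edges from strongest to weakest correlation.
    for u, v, w in sorted(weighted_edges, key=lambda e: e[2], reverse=True):
        root_u, root_v = find(u), find(v)
        if root_u != root_v:
            parent[root_u] = root_v
            kept.append((u, v, w))
    return kept

# Toy correlation network (invented): (gene index, gene index, |correlation|).
edges = [(0, 1, 0.95), (1, 2, 0.90), (0, 2, 0.40), (2, 3, 0.85), (1, 3, 0.30)]
tree = max_spanning_tree(4, edges)
print(tree)
```

The filter keeps every node connected while discarding the weakest redundant edges, which is the noise-reduction property the abstract attributes to spanning-tree filters; it also discards all cycles, which is one reason a hybrid approach may preserve clusters better.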

    Evaluation and improvement of the regulatory inference for large co-expression networks with limited sample size

    Background: Co-expression has been widely used to identify novel regulatory relationships using high-throughput measurements, such as microarray and RNA-seq data. Evaluation studies on co-expression network analysis methods mostly focus on networks of small or medium size of up to a few hundred nodes. For large networks, simulated expression data usually consist of hundreds or thousands of profiles with different perturbations or knock-outs, which is uncommon in real experiments due to their cost and the amount of work required. Thus, the performance of co-expression network analysis methods on large co-expression networks consisting of a few thousand nodes, with only a small number of profiles with a single perturbation, which more accurately reflects normal experimental conditions, is generally uncharacterized. Methods: We proposed a novel network inference method based on Relevance Low order Partial Correlation (RLowPC). The RLowPC method uses a two-step approach to select the high-confidence edges: it first reduces the search space by picking only the top-ranked genes from an initial partial correlation analysis, and then computes the partial correlations in the confined search space by removing only the linear dependencies from the shared neighbours, largely ignoring the genes showing lower association. Results: We selected six co-expression-based methods with good performance in evaluation studies from the literature: Partial correlation, PCIT, ARACNE, MRNET, MRNETB and CLR. The evaluation of these methods was carried out on simulated time-series data with various network sizes ranging from 100 to 3000 nodes. Simulation results show low precision and recall for all of the above methods for large networks with a small number of expression profiles. We improved the inference significantly by refinement of the top weighted edges in the pre-inferred partial correlation networks using RLowPC. We found improved performance by partitioning large networks into smaller co-expressed modules when assessing the method performance within these modules. Conclusions: The evaluation results show that current methods suffer from low precision and recall for large co-expression networks where only a small number of profiles are available. The proposed RLowPC method effectively reduces the indirect edges predicted as regulatory relationships and increases the precision of top-ranked predictions. Partitioning large networks into smaller highly co-expressed modules also helps to improve the performance of network inference methods. The RLowPC R package for network construction, refinement and evaluation is available at GitHub: https://github.com/wyguo/RLowPC
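The building block behind low-order partial correlation can be shown with the textbook first-order formula, r_xy.z = (r_xy - r_xz r_yz) / sqrt((1 - r_xz^2)(1 - r_yz^2)). The sketch below is not the RLowPC package (which is written in R) and does not reproduce its two-step edge selection; the function names and the toy profiles are assumptions chosen so that two genes are correlated only through a shared neighbour.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def partial_corr(x, y, z):
    """First-order partial correlation of x and y, controlling for z."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# Toy profiles (invented): z drives both x and y; the added noise terms
# are chosen to be uncorrelated with each other.
z = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x = [zi + e for zi, e in zip(z, [0.1, -0.1, 0.1, -0.1, 0.1, -0.1])]
y = [zi + e for zi, e in zip(z, [0.1, 0.1, -0.1, -0.1, 0.1, 0.1])]

print(f"plain correlation:   {pearson(x, y):.3f}")
print(f"partial correlation: {partial_corr(x, y, z):.3f}")
```

The plain correlation between x and y is high, but it essentially vanishes once the shared neighbour z is controlled for, which is how partial correlation flags an edge as indirect.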

    Evaluation of essential genes in correlation networks using measures of centrality

    Correlation networks are emerging as powerful tools for modeling relationships in high-throughput data such as gene expression. Other types of biological networks, such as protein-protein interaction networks, are popular targets of study in network theory, and previous analysis has revealed that network structures identified using graph theoretic techniques often relate to certain biological functions. Structures such as highly connected nodes and groups of nodes have been found to correspond to essential genes and protein complexes, respectively. The correlation network, which measures the level of co-variation of gene expression levels, shares some structural properties with other types of biological networks. We created several correlation networks using publicly available gene expression data, and identified critical groups of nodes using graph theoretic properties used previously in other biological network studies. We found that some measures of network centrality can reveal genes of impact such as essential genes, suggesting that the correlation network can prove to be a powerful tool for modeling gene expression data. In addition, our method highlights the biological impact of a set of high-centrality nodes identified by combined measures of centrality, to validate the link between structure and function in the notoriously noisy correlation network.
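The simplest centrality measure of the kind discussed in this abstract is degree: the number of strong correlations a gene takes part in. The sketch below thresholds a small table of pairwise correlations and ranks genes by degree. The gene labels, correlation values and the 0.8 cutoff are invented for illustration, and the paper's combined centrality measures are not reproduced here.

```python
from collections import defaultdict

def degree_centrality(correlations, threshold):
    """Count, per gene, the edges whose absolute correlation meets the threshold."""
    degree = defaultdict(int)
    for (gene_a, gene_b), r in correlations.items():
        if abs(r) >= threshold:
            degree[gene_a] += 1
            degree[gene_b] += 1
    return dict(degree)

# Toy pairwise correlations (hypothetical gene labels and values).
correlations = {
    ("g1", "g2"): 0.92,
    ("g1", "g3"): 0.88,
    ("g1", "g4"): -0.90,
    ("g2", "g3"): 0.35,
    ("g3", "g4"): 0.10,
}

degree = degree_centrality(correlations, threshold=0.8)
hubs = sorted(degree, key=degree.get, reverse=True)
print(degree)
print("highest-degree gene:", hubs[0])
```

Genes at the top of such a ranking are the candidate hubs that studies like this one compare against lists of known essential genes.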

    An Always Correlated gene expression landscape for ovine skeletal muscle, lessons learnt from comparison with an “equivalent” bovine landscape

    BACKGROUND: We have recently described a method for the construction of an informative gene expression correlation landscape for a single tissue, longissimus muscle (LM) of cattle, using a small number (less than a hundred) of diverse samples. Does this approach facilitate interspecies comparison of networks? FINDINGS: Using gene expression datasets from LM samples from a single postnatal time point for high and low muscling sheep, and from a developmental time course (prenatal to postnatal) for normal sheep and sheep exhibiting the Callipyge muscling phenotype, gene expression correlations were calculated across subsets of the data comparable to the bovine analysis. An “Always Correlated” gene expression landscape was constructed by integrating the correlations from the subsets of data and was compared to the equivalent landscape for bovine LM muscle. Whilst at the high level apparently equivalent modules were identified in the two species, at the detailed level overlap between genes in the equivalent modules was limited and generally not significant. Indeed, only 395 genes and 18 edges were in common between the two landscapes. CONCLUSIONS: Since it is unlikely that the equivalent muscles of two closely related species are as different as this analysis suggests, within-tissue gene expression correlations appear to be very sensitive to the samples chosen for their construction, compounded by the different platforms used. Thus users need to be very cautious in interpreting the differences. In future experiments, attention will be required to ensure equivalent experimental designs and the use of cross-species gene expression platforms to enable the identification of true differences between species.

    Methods for network generation and spectral feature selection: especially on gene expression data

    2019 Fall. Includes bibliographical references. Feature selection is an essential step in many data analysis pipelines due to its ability to remove unimportant data. We will describe how to realize a data set as a network using correlation, partial correlation, heat kernel and random edge generation methods. Then we lay out how to select features from these networks, mainly leveraging the spectrum of the graph Laplacian, adjacency, and supra-adjacency matrices. We frame this work in the context of gene co-expression network analysis and proceed with a brief analysis of a small set of gene expression data for human subjects infected with the flu virus. We are able to distinguish two sets of 14-15 genes which produce two-fold SSVM classification accuracies at certain times that are at least as high as the classification accuracies obtained with more than 12,000 genes.

    Identifying a Transcription Factor’s Regulatory Targets from its Binding Targets

    ChIP-chip data, which show binding of transcription factors (TFs) to promoter regions in vivo, are widely used by biologists to identify the regulatory targets of TFs. However, the binding of a TF to a gene does not necessarily imply regulation. Thus, it is important to develop computational methods which can extract a TF’s regulatory targets from its binding targets. We developed a method, called REgulatory Targets Extraction Algorithm (RETEA), which uses partial correlation analysis on gene expression data to extract a TF’s regulatory targets from its binding targets inferred from ChIP-chip data. We applied RETEA to yeast cell cycle microarray data and identified the plausible regulatory targets of eleven known cell cycle TFs. We validated our predictions by checking the enrichments for cell cycle-regulated genes, common cellular processes and common molecular functions. Finally, we showed that RETEA performs better than three published methods (MA-Network, TRIA and Garten et al.’s method).