
    Differential Forms on Log Canonical Spaces

    The present paper is concerned with differential forms on log canonical varieties. It is shown that any p-form defined on the smooth locus of a variety with canonical or klt singularities extends regularly to any resolution of singularities. In fact, a much more general theorem for log canonical pairs is established. The proof relies on vanishing theorems for log canonical varieties and on methods of the minimal model program. In addition, a theory of differential forms on dlt pairs is developed. It is shown that many of the fundamental theorems and techniques known for sheaves of logarithmic differentials on smooth varieties also hold in the dlt setting. Immediate applications include the existence of a pull-back map for reflexive differentials, generalisations of Bogomolov-Sommese type vanishing results, and a positive answer to the Lipman-Zariski conjecture for klt spaces. Comment: 72 pages, 6 figures. A shortened version of this paper has appeared in Publications mathématiques de l'IHÉS. The final publication is available at http://www.springerlink.co
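    The klt extension statement can be written compactly in the standard notation for reflexive differentials. This is a restatement under the usual conventions rather than a quotation of the paper's theorem. The notation is assumed: $\Omega^{[p]}_X$ is the reflexive hull of $\Omega^p_X$, whose sections over an open set are exactly the p-forms on its smooth part. Since canonical singularities are klt, the klt case covers both classes named above.

```latex
% X a normal variety, \pi : \widetilde{X} \to X a resolution of singularities,
% \Omega^{[p]}_X := (\Omega^p_X)^{**}, sections = p-forms on X_{\mathrm{reg}}.
% "Every p-form on X_reg extends regularly to \widetilde{X}" is equivalent to:
X \text{ klt} \;\Longrightarrow\; \pi_* \Omega^p_{\widetilde{X}} = \Omega^{[p]}_X
\qquad \text{for all } 0 \le p \le \dim X .
```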

    Differential expression analysis with global network adjustment

    Background: Large-scale chromosomal deletions or other non-specific perturbations of the transcriptome can alter the expression of hundreds or thousands of genes, and it is of biological interest to understand which genes are most profoundly affected. We present a method for predicting a gene’s expression as a function of other genes, thereby accounting for the effect of transcriptional regulation that confounds the identification of genes differentially expressed relative to a regulatory network. The challenge in constructing such models is that the number of possible regulator transcripts within a global network is on the order of thousands, whereas the number of biological samples is typically on the order of 10. Nevertheless, large gene expression databases exist that can be used to construct networks helpful in modeling transcriptional regulation in smaller experiments.

    Results: We demonstrate a type of penalized regression model that can be estimated from large gene expression databases and then applied to smaller experiments. The ridge parameter is selected by minimizing the cross-validation error of the predictions in the independent out-sample. This tends to increase model stability and leads to a much greater degree of parameter shrinkage, but the resulting estimation bias is mitigated by a second round of regression. Overall, the proposed computationally efficient “over-shrinkage” method outperforms previously used LASSO-based techniques. In two independent datasets, we find that the median proportion of explained variability in expression is approximately 25%, which substantially increases the signal-to-noise ratio, allowing more powerful inferences on differential gene expression and leading to biologically intuitive findings. We also show that a large proportion of gene dependencies are conditional on the biological state, which standard differential expression methods cannot detect.

    Conclusions: By adjusting for the effects of the global network on individual genes, both the sensitivity and reliability of differential expression measures are greatly improved.
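    This sketch shows the ridge step: predicting one gene from candidate regulators, with the ridge parameter chosen by minimizing prediction error on held-out samples. It rests on several assumptions. Plain k-fold cross-validation stands in for the paper's independent out-sample. There is no intercept, so the data are assumed centered. The "over-shrinkage" second-round regression is not shown. All function names are hypothetical.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(map(float, row)) + [float(b[i])] for i, row in enumerate(A)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def ridge_fit(X, y, lam):
    """Ridge coefficients from (X^T X + lam I) beta = X^T y (no intercept)."""
    p = len(X[0])
    A = [[sum(row[i] * row[j] for row in X) + (lam if i == j else 0.0)
          for j in range(p)] for i in range(p)]
    b = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return solve(A, b)


def predict(X, beta):
    return [sum(a * w for a, w in zip(row, beta)) for row in X]


def cv_error(X, y, lam, k=5):
    """Mean squared prediction error on held-out folds (fold = index mod k)."""
    n, err = len(y), 0.0
    for f in range(k):
        test = [i for i in range(n) if i % k == f]
        train = [i for i in range(n) if i % k != f]
        beta = ridge_fit([X[i] for i in train], [y[i] for i in train], lam)
        preds = predict([X[i] for i in test], beta)
        err += sum((y[i] - p) ** 2 for i, p in zip(test, preds))
    return err / n


def select_lambda(X, y, grid, k=5):
    """Pick the ridge parameter with the smallest cross-validation error."""
    return min(grid, key=lambda lam: cv_error(X, y, lam, k))


if __name__ == "__main__":
    X = [[1, 0], [0, 1], [1, 1], [2, 1]]   # toy regulator expression values
    y = [2, 3, 5, 7]                       # target gene = 2*x0 + 3*x1
    lam = select_lambda(X, y, [0.1, 1.0, 10.0], k=2)
    print(lam, ridge_fit(X, y, lam))
```

    With real data, each row of `X` would hold regulator expression values for one sample from the large reference database, and the fitted coefficients would then be applied to the smaller experiment.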

    Network 'small-world-ness': a quantitative method for determining canonical network equivalence

    Background: Many technological, biological, social, and information networks fall into the broad class of 'small-world' networks: they have tightly interconnected clusters of nodes and a mean shortest path length similar to that of a matched random graph (same number of nodes and edges). This semi-quantitative definition leads to a categorical distinction ('small/not-small') rather than a quantitative, continuous grading of networks, and can lead to uncertainty about a network's small-world status. Moreover, systems described by small-world networks are often studied using an equivalent canonical network model, the Watts-Strogatz (WS) model. However, the process of establishing an equivalent WS model is imprecise, and there is a pressing need for ways to quantify this equivalence. Methodology/Principal Findings: We defined a precise measure of 'small-world-ness', S, based on the trade-off between high local clustering and short path length. A network is now deemed a 'small-world' if S > 1, an assertion which may be tested statistically. We then examined the behavior of S on a large data set of real-world systems and found that all of these systems were linked by a linear relationship between their S values and the network size n. Moreover, we show a method for assigning a unique WS model to any real-world network, and show analytically that the WS models associated with our sample of networks also show linearity between S and n. Linearity between S and n is not, however, inevitable, and neither is S maximal for an arbitrary network of given size. Linearity may, however, be explained by a common limiting growth process. Conclusions/Significance: We have shown how the notion of a small-world network may be quantified. Several key properties of the metric are described, and the use of WS canonical models is placed on a more secure footing.
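    A small-world measure of this kind is commonly written S = (C / C_rand) / (L / L_rand). Here C is the clustering coefficient and L the mean shortest path length, and the "rand" values come from a random graph with the same n and mean degree k. The sketch below computes S on that reading. It makes three assumptions. Clustering is taken as the mean local coefficient; other definitions exist. C_rand and L_rand use the analytic Erdős–Rényi approximations k/n and ln n / ln k rather than sampled random graphs. The graph is assumed connected. The helper `ring_lattice` is hypothetical and is included only for a test input.

```python
from collections import deque
from math import log


def mean_clustering(adj):
    """Mean local clustering coefficient; nodes with degree < 2 contribute 0."""
    total = 0.0
    for v, nbrs in adj.items():
        nb = list(nbrs)
        k = len(nb)
        if k < 2:
            continue
        links = sum(1 for i in range(k) for j in range(i + 1, k) if nb[j] in adj[nb[i]])
        total += links / (k * (k - 1) / 2)
    return total / len(adj)


def mean_path_length(adj):
    """Mean shortest-path length over reachable ordered pairs (BFS from every node)."""
    total, pairs = 0, 0
    for s in adj:
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs


def small_world_ness(adj):
    """S = (C / C_rand) / (L / L_rand) with Erdos-Renyi approximations for C_rand, L_rand."""
    n = len(adj)
    k = sum(len(nbrs) for nbrs in adj.values()) / n  # mean degree
    c_rand = k / n                # expected clustering of a random graph
    l_rand = log(n) / log(k)      # approximate mean path length of a random graph
    gamma = mean_clustering(adj) / c_rand
    lam = mean_path_length(adj) / l_rand
    return gamma / lam


def ring_lattice(n, k):
    """Undirected ring where each node links to k // 2 neighbours on each side (k even)."""
    half = k // 2
    return {i: {(i + d) % n for d in range(-half, half + 1) if d != 0} for i in range(n)}


if __name__ == "__main__":
    lattice = ring_lattice(100, 4)
    print(mean_clustering(lattice), small_world_ness(lattice))
```

    The graph is a dict mapping each node to its set of neighbours. That makes the neighbour-pair test in `mean_clustering` a constant-time set lookup.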

    Differential Dynamic Properties of Scleroderma Fibroblasts in Response to Perturbation of Environmental Stimuli

    Diseases are believed to arise from dysregulation of biological systems (pathways) perturbed by environmental triggers. Biological systems as a whole are not just the sum of their components but ever-changing, complex, and dynamic systems that evolve over time in response to internal and external perturbations. In the past, biologists have mainly focused on studying either the functions of isolated genes or the steady states of small biological pathways. However, it is system dynamics that play an essential role in giving rise to cellular functions such as growth, differentiation, division, and apoptosis, and to the dysfunctions that cause disease. Biological phenomena of the entire organism are determined not only by the steady-state characteristics of biological systems but also by their intrinsic dynamic properties, including stability, transient response, and controllability, which determine how the systems maintain their functions and performance under a broad range of random internal and external perturbations. As a proof of principle, we examine signal transduction pathways and genetic regulatory pathways as biological systems. We employ state-space equations, widely used in systems science, to model biological systems, and use the expectation-maximization (EM) algorithm and the Kalman filter to estimate the model parameters. We apply the resulting state-space models to human fibroblasts obtained from patients with the autoimmune fibrosing disease scleroderma, and then perform a dynamic analysis of part of the TGF-β pathway in both normal and scleroderma fibroblasts stimulated by silica. We find that, under perturbation by silica, the TGF-β pathway shows significant differences in dynamic properties between normal and scleroderma fibroblasts. Our findings may open a new avenue for exploring cell function and the mechanisms operating in disease development.
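    The Kalman filter named above estimates hidden states in a linear state-space model from noisy measurements. The sketch below is a minimal scalar version, assuming the model x_t = a·x_{t-1} + w_t (process noise variance q) and y_t = x_t + v_t (measurement noise variance r). The multivariate pathway models and the EM loop that re-estimates a, q, and r are not shown. The parameter names and defaults are illustrative assumptions.

```python
def kalman_filter_1d(observations, a=1.0, q=0.01, r=1.0, x0=0.0, p0=1.0):
    """Scalar Kalman filter for x_t = a*x_{t-1} + w_t, y_t = x_t + v_t,
    with w_t ~ N(0, q) and v_t ~ N(0, r). Returns the filtered state means."""
    x, p = x0, p0
    estimates = []
    for y in observations:
        # Predict: propagate the state mean and variance through the dynamics.
        x_pred = a * x
        p_pred = a * a * p + q
        # Update: blend prediction and measurement using the Kalman gain.
        k = p_pred / (p_pred + r)
        x = x_pred + k * (y - x_pred)
        p = (1 - k) * p_pred
        estimates.append(x)
    return estimates


if __name__ == "__main__":
    # A constant signal of 5.0 observed repeatedly; estimates move from x0 toward 5.
    print(kalman_filter_1d([5.0] * 10))
```

    In an EM scheme, the E-step would run this filter (plus a backward smoother) under the current parameters, and the M-step would update a, q, and r from the smoothed states.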

    Phenotype Prediction Using Regularized Regression on Genetic Data in the DREAM5 Systems Genetics B Challenge

    A major goal of large-scale genomics projects is to enable the use of data from high-throughput experimental methods to predict complex phenotypes such as disease susceptibility. The DREAM5 Systems Genetics B Challenge solicited algorithms to predict soybean plant resistance to the pathogen Phytophthora sojae from training sets including phenotype, genotype, and gene expression data. The challenge test set was divided into three subcategories: one requiring prediction based only on genotype data, another based only on gene expression data, and the third based on both genotype and gene expression data. Here we present our approach, primarily using regularized regression, which received the best-performer award for subchallenge B2 (gene expression only). We found that despite the availability of 941 genotype markers and 28,395 gene expression features, optimal models determined by cross-validation experiments typically used fewer than ten predictors, underscoring the importance of strong regularization in noisy datasets with far more features than samples. We also present substantial analysis of the training and test setup of the challenge, identifying high variance in performance on the gold standard test sets. National Science Foundation (U.S.) Graduate Research Fellowship Program; National Defense Science and Engineering Graduate Fellowship.
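    Strong regularization that keeps only a handful of predictors is characteristic of L1-penalized (lasso) regression. The sketch below assumes a lasso fitted by cyclic coordinate descent with soft-thresholding. The abstract does not say which regularized model the team used, so the lasso is a swapped-in example. In practice the penalty `alpha` would be chosen by cross-validation, as the abstract describes; here it is fixed. There is no intercept, so the data are assumed centered.

```python
def soft_threshold(z, t):
    """Proximal operator of t*|b|: shrink z toward zero by t."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_cd(X, y, alpha, n_iter=100):
    """Cyclic coordinate descent for (1/(2n))||y - X b||^2 + alpha * ||b||_1 (no intercept)."""
    n, p = len(y), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residual y - X beta for the current beta (all zeros)
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # an all-zero feature never enters the model
            # Correlation of feature j with the residual that excludes feature j itself.
            rho = sum(X[i][j] * (resid[i] + X[i][j] * beta[j]) for i in range(n)) / n
            new = soft_threshold(rho, alpha) / col_sq[j]
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
            beta[j] = new
    return beta


if __name__ == "__main__":
    # Three orthogonal features; the response depends only on the first.
    X = [[1, 1, 1], [-1, 1, -1], [1, -1, -1], [-1, -1, 1]]
    y = [2, -2, 2, -2]
    print(lasso_cd(X, y, alpha=0.5))  # -> [1.5, 0.0, 0.0]
```

    Irrelevant features are set exactly to zero, not merely shrunk. That sparsity is why lasso-type models can end up using very few of the thousands of available features.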

    Hsp90 orchestrates transcriptional regulation by Hsf1 and cell wall remodelling by MAPK signalling during thermal adaptation in a pathogenic yeast

    Acknowledgments: We thank Rebecca Shapiro for creating CaLC1819, CaLC1855 and CaLC1875, Gillian Milne for help with EM, Aaron Mitchell for generously providing the transposon insertion mutant library, Jesus Pla for generously providing the hog1 hst7 mutant, and Cathy Collins for technical assistance.

    Identifying Biological Network Structure, Predicting Network Behavior, and Classifying Network State With High Dimensional Model Representation (HDMR)

    This work presents an adapted Random Sampling High Dimensional Model Representation (RS-HDMR) algorithm for synergistically addressing three key problems in network biology: (1) identifying the structure of biological networks from multivariate data, (2) predicting network response under previously unsampled conditions, and (3) inferring experimental perturbations based on the observed network state. RS-HDMR is a multivariate regression method that decomposes network interactions into a hierarchy of non-linear component functions. Sensitivity analysis based on these functions provides a clear physical and statistical interpretation of the underlying network structure. The advantages of RS-HDMR include efficient extraction of nonlinear and cooperative network relationships without resorting to discretization, prediction of network behavior without mechanistic modeling, robustness to data noise, and favorable scaling of the sampling requirement with network size. As a proof-of-principle study, RS-HDMR was applied to experimental data measuring the single-cell response of a protein-protein signaling network to various experimental perturbations. A comparison with the network structure identified in the literature and by other inference methods, including Bayesian and mutual-information-based algorithms, suggests that RS-HDMR can reveal network structure with a low false positive rate while still capturing non-linear and cooperative interactions. RS-HDMR identified several higher-order network interactions that correspond to known feedback regulations among multiple network species and that were missed by other network inference methods. Furthermore, in this application RS-HDMR predicted network response under unsampled conditions better than the best statistical inference algorithm presented in the recent DREAM3 signaling-prediction competition.
RS-HDMR can discern and predict differences in network state that arise from sources ranging from intrinsic cell-cell variability to altered experimental conditions, such as when drug perturbations are introduced. This ability ultimately allows RS-HDMR to accurately classify the experimental conditions of a given sample based on its observed network state
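    The "hierarchy of non-linear component functions" refers to the general HDMR expansion, written below in its standard form. This is the generic decomposition, not the adapted algorithm of this work. The variance-based index $S_i$ is the usual first-order sensitivity measure. In the random-sampling variant, the component functions are mutually orthogonal under the input distribution and are typically approximated by orthonormal polynomials, so the output variance splits into per-component terms.

```latex
f(\mathbf{x}) = f_0 + \sum_{i=1}^{n} f_i(x_i)
  + \sum_{1 \le i < j \le n} f_{ij}(x_i, x_j) + \cdots
  + f_{12\cdots n}(x_1, \ldots, x_n),
\qquad f_0 = \mathbb{E}[f(\mathbf{x})],

S_i = \frac{\operatorname{Var}[f_i(x_i)]}{\operatorname{Var}[f(\mathbf{x})]},
\qquad
S_{ij} = \frac{\operatorname{Var}[f_{ij}(x_i, x_j)]}{\operatorname{Var}[f(\mathbf{x})]}.
```

    Truncating after the second-order terms gives a model with single-input and pairwise-cooperative effects. A large $S_{ij}$ is one way such a model can flag a non-linear interaction between two network species.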

    Time lagged information theoretic approaches to the reverse engineering of gene regulatory networks

    Background: A number of models and algorithms have been proposed for gene regulatory network (GRN) inference; however, none of them address the effect of the size of time-series microarray expression data, measured as the number of time points. In this paper, we study this problem by analyzing the behaviour of three algorithms based on information theory and dynamic Bayesian network (DBN) models. These algorithms were run on data sets of different sizes generated from synthetic networks. Experiments show that the inference accuracy of these algorithms saturates beyond a specific data size, because the pair-wise mutual information (MI) metric itself saturates; hence there is a theoretical limit on the inference accuracy of information-theory-based schemes that depends on the number of time points of microarray data used to infer GRNs. This suggests that MI may not be the best metric for GRN inference algorithms. To circumvent the limitations of the MI metric, we introduce a new method of computing time lags between any pair of genes and present the pair-wise time-lagged Mutual Information (TLMI) and time-lagged Conditional Mutual Information (TLCMI) metrics. We then use these metrics to propose novel GRN inference schemes that provide higher inference accuracy in terms of precision and recall. Results: Beyond a certain number of time points (i.e., a specific size) of microarray data, the performance of the algorithms, measured by the recall-to-precision ratio, saturated because the calculated pair-wise MI metric saturated with increasing data size. The proposed algorithms were compared with existing approaches on four different biological networks. The resulting networks were evaluated with the benchmark precision and recall metrics, and the results favour our approach.
Conclusions: To alleviate the effects of data size on information theory based GRN inference algorithms, novel time lag based information theoretic approaches to infer gene regulatory networks have been proposed. The results show that the time lags of regulatory effects between any pair of genes play an important role in GRN inference schemes
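    A time-lagged MI of this kind pairs regulator values at time t with target values at time t + lag. The lag that maximizes the MI can then serve as the estimated regulatory delay. The sketch below makes that idea concrete under simplifying assumptions. Expression series are assumed already discretized. MI uses the plug-in (count-based) estimator in bits. Choosing the lag by maximizing MI is an illustrative rule, not necessarily the paper's exact procedure, and TLCMI is not shown. All function names are hypothetical.

```python
import random
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Plug-in estimate (in bits) of I(X;Y) for two equal-length discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    # p(a,b) * log2(p(a,b) / (p(a) p(b))), probabilities estimated from counts
    return sum((c / n) * log2(c * n / (px[a] * py[b])) for (a, b), c in joint.items())


def time_lagged_mi(x, y, lag):
    """MI between regulator x at time t and target y at time t + lag."""
    if lag == 0:
        return mutual_information(x, y)
    return mutual_information(x[:-lag], y[lag:])


def best_lag(x, y, max_lag):
    """Lag in 1..max_lag at which x is most informative about later values of y."""
    return max(range(1, max_lag + 1), key=lambda k: time_lagged_mi(x, y, k))


if __name__ == "__main__":
    rng = random.Random(1)
    x = [rng.randint(0, 1) for _ in range(200)]  # synthetic discretized regulator
    y = [0, 0] + x[:-2]                          # target copies x two steps later
    print(best_lag(x, y, max_lag=5))
```

    Each extra lag shortens the aligned series by one sample, so with few time points the MI estimates at large lags rest on less data. This is the same data-size issue the abstract raises.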

    A Relative Variation-Based Method to Unraveling Gene Regulatory Networks

    Gene regulatory network (GRN) reconstruction is essential for understanding the functioning and pathology of a biological system. Many models and algorithms have been developed to unravel a GRN. The DREAM project aims to clarify the advantages and disadvantages of these methods from an application viewpoint. An interesting yet surprising observation is that, compared with complicated methods such as those based on nonlinear differential equations, methods based on simple statistics, such as the so-called Z-score, usually perform better. A fundamental problem with the Z-score, however, is that direct and indirect regulations cannot be easily distinguished. To overcome this drawback, this paper proposes a GRN inference algorithm based on relative expression level variation (RELV), which consists of three major steps. First, the magnitude of the RELV of each gene is estimated from wild-type and single-gene knockout/knockdown experimental data. Second, the probability that a direct regulation exists from a perturbed gene to a measured gene is estimated, and this is further used to estimate whether a gene can be regulated by other genes. Finally, the normalized RELVs are modified so that genes with an estimated in-degree of zero have smaller RELV magnitudes than the other genes; the modified values are then used to rank candidate direct regulations among genes, which yields an estimate of the GRN topology. In principle, this method can avoid so-called cascade errors in certain situations. Computational results on the Size 100 sub-challenges of DREAM3 and DREAM4 show that, compared with the Z-score-based method, prediction performance can be substantially improved, especially in terms of AUPR (area under the precision-recall curve). Moreover, the method can even outperform the best-performing teams of both DREAM3 and DREAM4.
Furthermore, the high precision of the obtained most reliable predictions shows that the suggested algorithm may be very helpful in guiding biological experiment designs
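    The sketch below contrasts the two kinds of knockout statistic in simplified form. It assumes `ko[i][j]` is the expression of gene j when gene i is knocked out and `wt[j]` is the wild-type expression of gene j, which must be nonzero. The Z-score here standardizes gene j across all knockout experiments. The relative variation is (ko[i][j] - wt[j]) / wt[j]. These are illustrative forms only. The paper's RELV normalization, its direct-regulation probabilities, and its zero-in-degree adjustment are not reproduced, and all function names are hypothetical.

```python
from statistics import mean, pstdev


def z_scores(ko):
    """z[i][j]: deviation of gene j in knockout i from gene j's mean across knockouts."""
    n, g = len(ko), len(ko[0])
    cols = [[ko[i][j] for i in range(n)] for j in range(g)]
    mu = [mean(c) for c in cols]
    sd = [pstdev(c) for c in cols]
    return [[(ko[i][j] - mu[j]) / sd[j] if sd[j] > 0 else 0.0 for j in range(g)]
            for i in range(n)]


def relative_variation(wt, ko):
    """r[i][j] = (ko[i][j] - wt[j]) / wt[j]: relative change of gene j when gene i is knocked out."""
    return [[(row[j] - wt[j]) / wt[j] for j in range(len(wt))] for row in ko]


def rank_edges(wt, ko):
    """Candidate edges i -> j (i != j) sorted by decreasing |relative variation|."""
    r = relative_variation(wt, ko)
    edges = [(i, j) for i in range(len(ko)) for j in range(len(wt)) if i != j]
    return sorted(edges, key=lambda e: -abs(r[e[0]][e[1]]))


if __name__ == "__main__":
    wt = [1.0, 2.0, 4.0]
    ko = [[0.0, 1.0, 4.0],   # knocking out gene 0 halves gene 1
          [1.0, 0.0, 4.0],
          [1.0, 2.0, 0.0]]
    print(rank_edges(wt, ko)[:3])
```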

    Differential Gene Expression Regulated by Oscillatory Transcription Factors

    Cells respond to changes in the internal and external environment through a complex regulatory system whose end point is the activation of transcription factors controlling the expression of a specific set of target genes. Recent experiments have shown that certain stimuli may trigger oscillations in the concentration of transcription factors such as NF-κB and p53, influencing the final outcome of the genetic response. In this study we investigate the role of oscillations in three well-known gene regulatory mechanisms using mathematical models based on ordinary differential equations and numerical simulations. We considered direct regulation, two-step regulation, and feed-forward loops, and characterized their responses to oscillatory input signals both analytically and numerically. We show that with indirect two-step regulation, gene expression can be turned on or off in a frequency-dependent manner, and that feed-forward loops can also respond selectively to the temporal profile of oscillating transcription factors.
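    Two-step regulation can be simulated numerically to compare the downstream response at different input frequencies. The sketch below assumes a hypothetical model: an oscillating transcription factor X(t) = (1 + sin(2πt/T))/2 activates Y through a Hill function, and Y activates Z the same way. Both follow dY/dt = β·h(X) − α·Y with identical, illustrative parameters, integrated by forward Euler. It is not the paper's model or parameter set.

```python
from math import pi, sin


def hill(u, k=0.5, n=2):
    """Activating Hill function u^n / (k^n + u^n), with values in [0, 1)."""
    return u ** n / (k ** n + u ** n)


def two_step_response(period, t_end=200.0, dt=0.01, beta=1.0, alpha=0.5):
    """Euler integration of X -> Y -> Z with an oscillating input X(t).
    Returns the mean of Z over the second half of the run (after transients)."""
    y = z = 0.0
    z_sum, count = 0.0, 0
    for s in range(int(t_end / dt)):
        t = s * dt
        x = 0.5 * (1 + sin(2 * pi * t / period))  # oscillating TF level in [0, 1]
        dy = beta * hill(x) - alpha * y            # Y: production minus decay
        dz = beta * hill(y) - alpha * z            # Z: activated by Y, not X
        y += dt * dy
        z += dt * dz
        if t >= t_end / 2:
            z_sum += z
            count += 1
    return z_sum / count


if __name__ == "__main__":
    for period in (1.0, 10.0, 50.0):
        print(period, round(two_step_response(period), 4))
```

    Because Y low-pass filters the oscillating input before the second nonlinear step acts on it, the mean level of Z can depend on the input period even though the mean of X does not. Whether Z rises or falls with frequency depends on the chosen parameters.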