203 research outputs found

    Priors on network structures. Biasing the search for Bayesian networks

    In this paper we show how a user can influence the recovery of Bayesian networks from a database by specifying prior knowledge. The main novelty of our approach is that the user only has to provide partial prior knowledge, which is then completed to a full prior over all possible network structures. This partial prior knowledge is expressed among variables in an intuitive pairwise way, which embodies the uncertainty of the user about his/her own prior knowledge. Thus, the uncertainty of the model is updated in the normal Bayesian way.

    Path Weights in Concentration Graphs

    A graphical model provides a compact and efficient representation of the association structure of a multivariate distribution by means of a graph. Relevant features of the distribution are represented by vertices, edges and other higher-order graphical structures, such as cliques or paths. Typically, paths play a central role in these models because they determine the independence relationships among variables. However, while a theory of path coefficients is available in models for directed graphs, little has been investigated about the strength of the association represented by a path in an undirected graph. Essentially, it has been shown that the covariance between two variables can be decomposed into a sum of weights associated with each of the paths connecting the two variables in the corresponding concentration graph. In this context, we consider concentration graph models and provide an extensive analysis of the properties of path weights and their interpretation. More specifically, we give an interpretation of covariance weights through their factorisation into a partial covariance and an inflation factor. We then extend the covariance decomposition over the paths of an undirected graph to other measures of association, such as the marginal correlation coefficient and a quantity that we call the inflated correlation. The usefulness of these findings is illustrated with an application to the analysis of dietary intake networks. Comment: 19 pages, 2 figures, 2 tables; revised manuscript after peer review; added DO

    Cancer stem cells in prostate cancer: implications for targeted therapy

    Prostate cancer (PCa) is the most frequently diagnosed cancer in men and the second most common cause of cancer-related mortality among men in the developed world. Conventional anti-PCa therapies include surgery, radiation, hormonal ablation, and chemotherapy. Despite increasing efforts, these therapies are not effective for patients with advanced and/or metastatic disease. In most cases, cancer therapies fail due to an incomplete depletion of tumor cells, resulting in tumor relapse. The cancer stem cell (CSC) hypothesis is an emerging model that explains many of the molecular characteristics of oncological disease as well as the tendency of cancers to relapse, metastasize, and develop resistance to conventional therapies. CSCs are a reservoir of cancer cells that exhibit properties of self-renewal and the ability to reestablish the heterogeneous tumor cell population. The existence of PCa stem cells offers a theoretical explanation for many uncertainties regarding PCa and also for treatment resistance and disease progression once clinical cure is achieved. Therapies targeting CSCs might therefore lead to more effective cancer treatments, divergent from a traditional anti-proliferative approach, based on tumor bulk reduction accompanied by CSC-specific inhibition. Here, we focus on reviewing the historical perspective as well as concepts regarding stem cells and CSCs in PCa. In addition, we will report possible strategies and new clinical approaches that address the CSC-based concept of tumorigenesis in PCa. (C) 2017 S. Karger AG, Base

    Mapping eQTL networks with mixed graphical Markov models

    Expression quantitative trait loci (eQTL) mapping constitutes a challenging problem due to, among other reasons, the high-dimensional multivariate nature of gene-expression traits. Next to the expression heterogeneity produced by confounding factors and other sources of unwanted variation, indirect effects spread throughout genes as a result of genetic, molecular and environmental perturbations. From a multivariate perspective one would like to adjust for the effect of all of these factors to end up with a network of direct associations connecting the path from genotype to phenotype. In this paper we approach this challenge with mixed graphical Markov models, higher-order conditional independences and q-order correlation graphs. These models show that additive genetic effects propagate through the network as a function of gene-gene correlations. Our estimation of the eQTL network underlying a well-studied yeast data set leads to a sparse structure with more direct genetic and regulatory associations that enable a straightforward comparison of the genetic control of gene expression across chromosomes. Interestingly, it also reveals that eQTLs explain most of the expression variability of network hub genes. Comment: 48 pages, 8 figures, 2 supplementary figures; fixed problems with embedded fonts; figure 7 sideways for improving display; minor fixes; major revision of the paper after journal review; fixed missing .bbl file; 36 pages, 6 figures, 2 table

    Comparison of splice sites in mammals and chicken.

    We have carried out an initial analysis of the dynamics of the recent evolution of splice-site sequences on a large collection of human, rodent (mouse and rat), and chicken introns. Our results indicate that the sequences of splice sites are largely homogeneous within Tetrapoda. We have also found that orthologous splice signals between human and rodents and within rodents are more conserved than unrelated splice sites, but the additional conservation can be explained mostly by background intron conservation. In contrast, additional conservation over background is detectable in orthologous mammalian and chicken splice sites. Our results also indicate that the U2 and U12 intron classes seem to have evolved independently since the split of mammals and birds; we have not been able to find a convincing case of interconversion between these two classes in our collections of orthologous introns. Similarly, we have not found a single case of switching between AT-AC and GT-AG subtypes within U12 introns, suggesting that this event has been a rare occurrence in recent evolutionary times. Switching between GT-AG and the noncanonical GC-AG U2 subtypes, on the contrary, does not appear to be unusual; in particular, T to C mutations appear to be relatively well tolerated in GT-AG introns with very strong donor sites.

    A flexible count data model to fit the wide diversity of expression profiles arising from extensively replicated RNA-seq experiments

    Background: High-throughput RNA sequencing (RNA-seq) offers unprecedented power to capture the real dynamics of gene expression. Experimental designs with extensive biological replication present a unique opportunity to exploit this feature and distinguish expression profiles with higher resolution. RNA-seq data analysis methods so far have been mostly applied to data sets with few replicates and their default settings try to provide the best performance under this constraint. These methods are based on two well-known count data distributions: the Poisson and the negative binomial. The way to properly calibrate them with large RNA-seq data sets is not trivial for the non-expert bioinformatics user. Results: Here we show that expression profiles produced by extensively-replicated RNA-seq experiments lead to a rich diversity of count data distributions beyond the Poisson and the negative binomial, such as Poisson-Inverse Gaussian or Pólya-Aeppli, which can be captured by a more general family of count data distributions called the Poisson-Tweedie. The flexibility of the Poisson-Tweedie family enables a direct fitting of emerging features of large expression profiles, such as heavy-tails or zero-inflation, without the need to alter a single configuration parameter. We provide a software package for R called tweeDEseq implementing a new test for differential expression based on the Poisson-Tweedie family. Using simulations on synthetic and real RNA-seq data we show that tweeDEseq yields P-values that are equally or more accurate than competing methods under different configuration parameters. By surveying the tiny fraction of sex-specific gene expression changes in human lymphoblastoid cell lines, we also show that tweeDEseq accurately detects differentially expressed genes in a real large RNA-seq data set with improved performance and reproducibility over the previously compared methodologies. Finally, we compared the results with those obtained from microarrays in order to check for reproducibility. Conclusions: RNA-seq data with many replicates leads to a handful of count data distributions which can be accurately estimated with the statistical model illustrated in this paper. This method provides a better fit to the underlying biological variability; this may be critical when comparing groups of RNA-seq samples with markedly different count data distributions. The tweeDEseq package forms part of the Bioconductor project and it is available for download at http://www.bioconductor.or

    Global analysis of alternative splicing regulation by insulin and wingless signaling in Drosophila cells

    A genome-wide analysis of the response to insulin and wingless activation using splicing-sensitive microarrays shows distinct but overlapping programs of transcriptional and posttranscriptional regulation.

    Comparative gene finding in chicken indicates that we are closing in on the set of multi-exonic widely expressed human genes

    The recent availability of the chicken genome sequence poses the question of whether there are human protein-coding genes conserved in chicken that are currently not included in the human gene catalog. Here, we show, using comparative gene finding followed by experimental verification of exon pairs by RT-PCR, that the addition to the multi-exonic subset of this catalog could be as little as 0.2%, suggesting that we may be closing in on the human gene set. Our protocol, however, has two shortcomings: (i) the bioinformatic screening of the predicted genes, applied to filter out false positives, cannot handle intronless genes; and (ii) the experimental verification could fail to identify expression at a specific developmental time. This highlights the importance of developing methods that could provide a reliable estimate of the number of these two types of gene