731 research outputs found

    Mining Pure, Strict Epistatic Interactions from High-Dimensional Datasets: Ameliorating the Curse of Dimensionality

    Background: The interaction between loci to affect phenotype is called epistasis. It is strict epistasis if no proper subset of the interacting loci exhibits a marginal effect. For many diseases, it is likely that unknown epistatic interactions affect disease susceptibility. A difficulty when mining epistatic interactions from high-dimensional datasets concerns the curse of dimensionality. There are too many combinations of SNPs to perform an exhaustive search. A method that could locate strict epistasis without an exhaustive search can be considered the brass ring of methods for analyzing high-dimensional datasets. Methodology/Findings: A SNP pattern is a Bayesian network representing SNP-disease relationships. The Bayesian score for a SNP pattern is the probability of the data given the pattern, and has been used to learn SNP patterns. We identified a bound for the score of a SNP pattern. The bound provides an upper limit on the Bayesian score of any pattern that could be obtained by expanding a given pattern. We felt that the bound might enable the data to say something about the promise of expanding a 1-SNP pattern even when there are no marginal effects. We tested the bound using simulated datasets and semi-synthetic high-dimensional datasets obtained from GWAS datasets. We found that the bound was able to dramatically reduce the search time for strict epistasis. Using an Alzheimer's dataset, we showed that it is possible to discover an interaction involving the APOE gene based on its score because of its large marginal effect, but that the bound is most effective at discovering interactions without marginal effects. Conclusions/Significance: We conclude that the bound appears to ameliorate the curse of dimensionality in high-dimensional datasets. This is a very consequential result and could be pivotal in our efforts to reveal the dark matter of genetic disease risk from high-dimensional datasets. © 2012 Jiang, Neapolitan
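The definition of strict epistasis given in this abstract can be illustrated with a toy example. The Python sketch below is not taken from the paper: it assumes a hypothetical XOR-style penetrance table for two binary-coded loci, each equally likely to be 0 or 1, and checks that neither locus alone changes disease risk even though the pair does.

```python
import math

# Hypothetical penetrance table: P(disease | genotype at locus A, genotype at locus B).
# Genotypes are coded 0/1 for simplicity; the values are illustrative, not from the paper.
PENETRANCE = {(0, 0): 0.1, (0, 1): 0.9, (1, 0): 0.9, (1, 1): 0.1}

# Assumption: each locus takes value 0 or 1 with probability 0.5, independently.
FREQ = 0.5


def marginal_risk(locus, value):
    """P(disease | one locus fixed), averaged over the other locus."""
    total = 0.0
    for other in (0, 1):
        genotype = (value, other) if locus == 0 else (other, value)
        total += FREQ * PENETRANCE[genotype]
    return total


def joint_risk(a, b):
    """P(disease | both loci fixed)."""
    return PENETRANCE[(a, b)]


# Marginal risk for every (locus, value) pair.
marginals = {(locus, v): marginal_risk(locus, v) for locus in (0, 1) for v in (0, 1)}

# No single locus shows a marginal effect: every marginal risk equals the baseline.
no_marginal_effect = all(math.isclose(m, 0.5) for m in marginals.values())

# The pair of loci together does change the risk.
has_joint_effect = joint_risk(0, 1) != joint_risk(0, 0)
```

Under these assumptions a one-locus-at-a-time scan would see nothing, which is why such interactions are hard to find without searching over combinations.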

    Micro-analysis of seriation skills

    A note of caution on maximizing entropy

    The Principle of Maximum Entropy is often used to update probabilities in light of evidence instead of performing Bayesian updating using Bayes' Theorem, and its use often has efficacious results. However, in some circumstances the results seem unacceptable and unintuitive. This paper discusses some of these cases and how to identify some of the situations in which this principle should not be used. The paper starts by reviewing three approaches to probability, namely the classical approach, the limiting frequency approach, and the Bayesian approach. It then introduces maximum entropy and shows its relationship to the three approaches. Next, through examples, it shows that maximizing entropy can sometimes stand in direct opposition to Bayesian updating based on reasonable prior beliefs. The paper concludes that if we take the Bayesian approach that probability is about reasonable belief based on all available information, then we can resolve the conflict between the maximum entropy approach and the Bayesian approach that is demonstrated in the examples.
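To make "updating by maximizing entropy" concrete, here is a minimal Python sketch of the classic constrained-die problem. It is not one of the paper's own examples. It assumes the only evidence is a target mean for a six-sided die, and uses the standard result that the maximum entropy distribution under a mean constraint has the form p_i proportional to exp(lam * i). The function name `maxent_die` and the bisection bounds are illustrative choices.

```python
import math


def maxent_die(target_mean, faces=range(1, 7), tol=1e-12):
    """Maximum entropy distribution over `faces` subject to a given mean.

    Solves for lam in p_i ~ exp(lam * i) by bisection; the mean is an
    increasing function of lam, so bisection is sufficient.
    """
    faces = list(faces)

    def dist(lam):
        weights = [math.exp(lam * f) for f in faces]
        z = sum(weights)
        return [w / z for w in weights]

    def mean(lam):
        return sum(f * p for f, p in zip(faces, dist(lam)))

    lo, hi = -20.0, 20.0  # assumed wide enough for any mean strictly inside (1, 6)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if mean(mid) < target_mean:
            lo = mid
        else:
            hi = mid
    return dist((lo + hi) / 2)


# Evidence: the long-run average roll is 4.5 rather than the fair value 3.5.
probs = maxent_die(4.5)
```

With a target mean of 3.5 the result is the uniform distribution, and with 4.5 the probabilities tilt toward the higher faces. A Bayesian analysis would instead start from a prior over possible dice, which is where the two approaches can disagree.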

    Study of integrated heterogeneous data reveals prognostic power of gene expression for breast cancer survival

    Background: Studies show that thousands of genes are associated with prognosis of breast cancer. Towards utilizing available genetic data, efforts have been made to predict outcomes using gene expression data, and a number of commercial products have been developed. These products have the following shortcomings: 1) They use the Cox model for prediction. However, the random survival forest (RSF) model has been shown to significantly outperform the Cox model. 2) Testing was not done to see if a complete set of clinical predictors could predict as well as the gene expression signatures. Methodology/Findings: We address these shortcomings. The METABRIC data set concerns 1981 breast cancer tumors. Features include 21 clinical features, expression levels for 16,384 genes, and survival. We compare the survival prediction performance of the Cox model and the RSF model using the clinical data and the gene expression data to their performance using only the clinical data. We obtain significantly better results when we use both clinical data and gene expression data for 5-year, 10-year, and 15-year survival prediction. When we replace the gene expression data by PAM50 subtype, our results are significant only for 5-year and 15-year prediction. We obtain significantly better results using the RSF model over the Cox model. Finally, our results indicate that gene expression data alone may predict long-term survival. Conclusions/Significance: Our results indicate that we can obtain improved survival prediction using clinical data and gene expression data compared to prediction using only clinical data. We further conclude that we can obtain improved survival prediction using the RSF model instead of the Cox model. These results are significant because by incorporating more gene expression data with clinical features and using the RSF model, we could develop decision support systems that better utilize heterogeneous information to improve outcome prediction and decision making.

    AN EMPIRICAL EXAMINATION OF THE EXISTENCE OF ART, ART/CRAFT, AND CRAFT SEGMENT AMONG CRAFT MEDIA WORKERS

    In the last twenty years there has been a dramatic resurgence in the creation, sales, and use of hand-crafted objects in the United States. However, the craft media workers of today no longer serve their local community creating utilitarian objects, but work in diverse styles according to diverse standards. Becker (1978) has proposed that three largely distinct segments exist among craft media workers: an art segment, an art/craft segment, and a craft segment. These segments can be distinguished from each other by their differing conventions and orientations. These conventions and orientations then serve as the basis for cooperative activity and result in the segments not only creating different styles of objects but also having different institutional links and audiences. This study, utilizing data from a national survey of craft media workers conducted for the National Endowment for the Arts, tests Becker's propositions by examining whether craft media workers who have different conventions and orientations constitute different segments having different training, involvements, markets, goals, satisfactions, and problems.