
    Non-linear mapping for exploratory data analysis in functional genomics

    BACKGROUND: Several supervised and unsupervised learning tools are available to classify functional genomics data. However, relatively little attention has been given to exploratory, visualisation-driven approaches. Such approaches should satisfy the following requirements: support for intuitive cluster visualisation, user-friendly and robust application, computational efficiency and generation of biologically meaningful outcomes. This research assesses a relaxation method for non-linear mapping that addresses these concerns. Its applications to gene expression and protein-protein interaction data analyses are investigated. RESULTS: Publicly available expression data originating from leukaemia, round blue-cell tumours and Parkinson disease studies were analysed. The method distinguished relevant clusters and critical analysis areas. The system does not require assumptions about the inherent class structure of the data, its mapping process is controlled by only one parameter, and the resulting transformations offer intuitive, meaningful visual displays. Comparisons with traditional mapping models are presented. To promote potential alternative applications of the methodology, an example of exploratory data analysis of interactome networks is illustrated. Data from the C. elegans interactome were analysed. Results suggest that this method might represent an effective solution for detecting key network hubs and for clustering biologically meaningful groups of proteins. CONCLUSION: A relaxation method for non-linear mapping provided the basis for visualisation-driven analyses using different types of data. This study indicates that such a system may represent a user-friendly and robust approach to exploratory data analysis. It may allow users to gain better insights into the underlying data structure, detect potential outliers and assess assumptions about the cluster composition of the data.

    VizRank: Data Visualization Guided by Machine Learning

    Data visualization plays a crucial role in identifying interesting patterns in exploratory data analysis. Its use is, however, made difficult by the large number of possible data projections showing different attribute subsets that must be evaluated by the data analyst. In this paper, we introduce a method called VizRank, which is applied to classified data to automatically select the most useful data projections. VizRank can be used with any visualization method that maps attribute values to points in a two-dimensional visualization space. It assesses possible data projections and ranks them by their ability to visually discriminate between classes. The quality of class separation is estimated by computing the predictive accuracy of a k-nearest neighbor classifier on the data set consisting of the x and y positions of the projected data points and their class information. The paper introduces the method and presents experimental results which show that VizRank's ranking of projections agrees closely with subjective rankings by data analysts. The practical use of VizRank is also demonstrated by an application in the field of functional genomics.
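The ranking idea in the VizRank abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: it assumes that each projection is a plain scatterplot of two attributes, that accuracy is estimated by leave-one-out evaluation, and that k = 3. The function names `knn_loo_accuracy` and `rank_projections` and the toy data are hypothetical, introduced only for illustration.

```python
from collections import Counter
from itertools import combinations
from math import dist


def knn_loo_accuracy(points, labels, k=3):
    """Leave-one-out accuracy of a k-NN classifier on 2-D points (assumed scheme)."""
    correct = 0
    for i, p in enumerate(points):
        # Indices of the k points closest to p, excluding p itself.
        neighbours = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: dist(p, points[j]),
        )[:k]
        votes = Counter(labels[j] for j in neighbours)
        predicted = votes.most_common(1)[0][0]
        correct += predicted == labels[i]
    return correct / len(points)


def rank_projections(rows, labels, k=3):
    """Score every attribute pair as a scatterplot projection, best first."""
    scored = []
    for a, b in combinations(range(len(rows[0])), 2):
        points = [(row[a], row[b]) for row in rows]
        scored.append((knn_loo_accuracy(points, labels, k), (a, b)))
    return sorted(scored, reverse=True)


# Toy data (made up): attribute 0 separates the classes, attributes 1 and 2 do not.
rows = [
    (0.0, 5.0, 1.0), (0.1, 1.0, 4.0), (0.2, 3.0, 2.0), (0.3, 4.0, 5.0),
    (5.0, 5.1, 1.1), (5.1, 1.1, 4.1), (5.2, 3.1, 2.1), (5.3, 4.1, 5.1),
]
labels = ["A"] * 4 + ["B"] * 4

ranking = rank_projections(rows, labels)
for accuracy, (a, b) in ranking:
    print(f"attributes ({a}, {b}): k-NN accuracy {accuracy:.2f}")
```

On this toy data, the projections that include attribute 0 rank above the projection built from attributes 1 and 2 alone, which is the behaviour the abstract describes: projections that separate the classes visually also let a nearest-neighbor classifier predict the class from position.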

    Association of the IL-10 gene family locus on chromosome 1 with juvenile idiopathic arthritis (JIA)

    The cytokine IL-10 and its family members have been implicated in autoimmune diseases, and we have previously reported that genetic variants in IL-10 were associated with a rare group of diseases called juvenile idiopathic arthritis (JIA). The aim of this study was to fine map genetic variants within the IL-10 cytokine family cluster on chromosome 1 using a linkage disequilibrium (LD)-tagging single nucleotide polymorphism (tSNP) approach, with imputation and conditional analysis to test for disease associations.

    Machine Learning and Integrative Analysis of Biomedical Big Data

    Recent developments in high-throughput technologies have accelerated the accumulation of massive amounts of omics data from multiple sources: genome, epigenome, transcriptome, proteome, metabolome, etc. Traditionally, data from each source (e.g., genome) are analyzed in isolation using statistical and machine learning (ML) methods. Integrative analysis of multi-omics and clinical data is key to new biomedical discoveries and advancements in precision medicine. However, data integration poses new computational challenges and exacerbates those associated with single-omics studies. Specialized computational approaches are required to effectively and efficiently perform integrative analysis of biomedical data acquired from diverse modalities. In this review, we discuss state-of-the-art ML-based approaches for tackling five specific computational challenges associated with integrative analysis: curse of dimensionality, data heterogeneity, missing data, class imbalance and scalability issues.

    Of mice and men: Sparse statistical modeling in cardiovascular genomics

    In high-throughput genomics, large-scale designed experiments are becoming common, and analysis approaches based on highly multivariate regression and ANOVA concepts are key tools. Shrinkage models of one form or another can provide comprehensive approaches to the problems of simultaneous inference that involve implicit multiple comparisons over the many, many parameters representing effects of design factors and covariates. We use such approaches here in a study of cardiovascular genomics. The primary experimental context concerns a carefully designed, and rich, gene expression study focused on gene-environment interactions, with the goals of identifying genes implicated in connection with disease states and known risk factors, and of generating expression signatures as proxies for such risk factors. A coupled exploratory analysis investigates cross-species extrapolation of gene expression signatures, that is, how these mouse-model signatures translate to humans. The latter involves exploration of sparse latent factor analysis of human observational data and of how it relates to projected risk signatures derived in the animal models. The study also highlights a range of applied statistical and genomic data analysis issues, including model specification, computational questions and model-based correction of experimental artifacts in DNA microarray data. Comment: Published at http://dx.doi.org/10.1214/07-AOAS110 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org)

    Wolf outside, dog inside? The genomic make-up of the Czechoslovakian Wolfdog

    BACKGROUND: Genomic methods can provide extraordinary tools to explore the genetic background of wild species and domestic breeds, optimize breeding practices, monitor and limit the spread of recessive diseases, and discourage illegal crossings. In this study we analysed a panel of 170k Single Nucleotide Polymorphisms with a combination of multivariate, Bayesian and outlier gene approaches to examine the genome-wide diversity and inbreeding levels in a recent wolf x dog cross-breed, the Czechoslovakian Wolfdog, which is becoming increasingly popular across Europe. RESULTS: Pairwise FST values, multivariate and assignment procedures indicated that the Czechoslovakian Wolfdog was significantly differentiated from all the other analysed breeds and also well-distinguished from both parental populations (Carpathian wolves and German Shepherds). Consistent with the low number of founders involved in the breed selection, the individual inbreeding levels calculated from homozygosity regions were relatively high and comparable with those derived from the pedigree data. In contrast, the coefficient of relatedness between individuals estimated from the pedigrees often underestimated the identity-by-descent scores determined using genetic profiles. The timing of the admixture and the effective population size trends estimated from the LD patterns reflected the documented history of the breed. Ancestry reconstruction methods identified more than 300 genes with an excess of wolf ancestry compared to random expectations, mainly related to key morphological features, and more than 2000 genes with an excess of dog ancestry, playing important roles in lipid metabolism, in the regulation of circadian rhythms, in learning and memory processes, and in sociability, such as the COMT gene, which has been described as a candidate gene for the latter trait in dogs. CONCLUSIONS: In this study we successfully applied genome-wide procedures to reconstruct the history of the Czechoslovakian Wolfdog, assess individual wolf ancestry proportions and, thanks to the availability of a well-annotated reference genome, identify possible candidate genes for wolf-like and dog-like phenotypic traits typical of this breed, including commonly inherited disorders. Moreover, through the identification of ancestry-informative markers, these genomic approaches could provide tools for forensic applications to unmask illegal crossings with wolves and uncontrolled trades of recent and undeclared wolfdog hybrids.