927 research outputs found

    Simple integrative preprocessing preserves what is shared in data sources

    Background: The bioinformatics data analysis toolbox needs general-purpose, fast and easily interpretable preprocessing tools that perform data integration during exploratory data analysis. Our focus is on vector-valued data sources, each consisting of measurements of the same entity but on different variables, and on tasks where source-specific variation is considered noisy or not interesting. Principal components analysis of all sources combined is an obvious choice if it is not important to distinguish between source-specific and shared variation. Canonical Correlation Analysis (CCA) focuses on mutual dependencies and discards source-specific "noise", but it produces a separate set of components for each source. Results: It turns out that the components given by CCA can be combined easily to produce a linear, and hence fast and easily interpretable, feature extraction method. The method fuses several sources such that the properties they share are preserved. Source-specific variation is discarded as uninteresting. We give the details and implement them in a software tool. The method is demonstrated on gene expression measurements in three case studies: classification of cell cycle regulated genes in yeast, identification of differentially expressed genes in leukemia, and defining stress response in yeast. The software package is available at http://www.cis.hut.fi/projects/mi/software/drCCA/. Conclusion: We introduced a method for data fusion in exploratory data analysis, for cases where statistical dependencies between the sources, and not within a source, are interesting. The method uses canonical correlation analysis in a new way for dimensionality reduction, and inherits its good properties of being simple, fast, and easily interpretable as a linear projection.
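The idea in the abstract above, combining per-source CCA components into a single linear projection, can be illustrated with a minimal NumPy sketch. This is not the drCCA package's implementation: the function names `whiten` and `cca_fuse`, the restriction to two sources, the small regularisation term, and the choice to fuse by summing paired canonical variates are all assumptions made for illustration.

```python
import numpy as np


def whiten(X, reg=1e-8):
    """Center X and return (Xc, W) so that Xc @ W has (near-)identity covariance.

    `reg` is a small ridge term (an assumption) that keeps the inverse
    square root stable when the covariance is close to singular.
    """
    Xc = X - X.mean(axis=0)
    cov = Xc.T @ Xc / (len(X) - 1)
    evals, evecs = np.linalg.eigh(cov + reg * np.eye(cov.shape[1]))
    W = evecs @ np.diag(evals ** -0.5) @ evecs.T
    return Xc, W


def cca_fuse(X, Y, n_components):
    """Two-source CCA followed by a simple linear fusion (hypothetical helper).

    Rows of X and Y must describe the same entities. Returns the fused
    representation and the corresponding canonical correlations.
    """
    Xc, Wx = whiten(X)
    Yc, Wy = whiten(Y)
    Xw, Yw = Xc @ Wx, Yc @ Wy

    # The SVD of the cross-covariance of the whitened sources yields the
    # canonical directions (U, V) and canonical correlations (s).
    C = Xw.T @ Yw / (len(X) - 1)
    U, s, Vt = np.linalg.svd(C, full_matrices=False)
    A = U[:, :n_components]
    B = Vt.T[:, :n_components]

    # Assumed fusion rule: add each pair of canonical variates, so the
    # result is still a linear projection of the concatenated sources.
    fused = Xw @ A + Yw @ B
    return fused, s[:n_components]
```

Calling `cca_fuse(X, Y, n_components=2)` on two matrices with matching rows returns an `(n_samples, 2)` array. Because every step is linear, the fused features remain interpretable as a projection of the original variables, which is the property the abstract emphasises.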

    What I talk about when I talk about integration of single-cell data

    Over the past decade, single-cell technologies have evolved from profiling hundreds of cells to millions of cells, and have expanded from a single data modality to cover multiple views at single-cell resolution, including the genome, epigenome, transcriptome, and so on. With the advance of these single-cell technologies, the boom in multimodal single-cell data has created a valuable resource for understanding cellular heterogeneity and molecular mechanisms at a comprehensive level. However, large-scale multimodal single-cell data also present a huge computational challenge for insightful integrative analysis. Here, I will lay out the data integration problems that the single-cell research community is interested in and introduce computational principles for solving them. In the following chapters, I will present four computational methods for data integration under different scenarios. Finally, I will discuss some future directions and potential applications of single-cell data integration.

    GSuite HyperBrowser: integrative analysis of dataset collections across the genome and epigenome

    Background: Recent large-scale undertakings such as ENCODE and Roadmap Epigenomics have generated experimental data mapped to the human reference genome (as genomic tracks) representing a variety of functional elements across a large number of cell types. Despite the high potential value of these publicly available data for a broad variety of investigations, little attention has been given to the analytical methodology necessary for their widespread utilisation. Findings: We here present a first principled treatment of the analysis of collections of genomic tracks. We have developed novel computational and statistical methodology to permit comparative and confirmatory analyses across multiple and disparate data sources. We delineate a set of generic questions that are useful across a broad range of investigations and discuss the implications of choosing different statistical measures and null models. Examples include contrasting analyses across different tissues or diseases. The methodology has been implemented in a comprehensive open-source software system, the GSuite HyperBrowser. To make the functionality accessible to biologists, and to facilitate reproducible analysis, we have also developed a web-based interface providing an expertly guided and customizable way of utilizing the methodology. With this system, many novel biological questions can flexibly be posed and rapidly answered. Conclusions: Through a combination of streamlined data acquisition, interoperable representation of dataset collections, and customizable statistical analysis with guided setup and interpretation, the GSuite HyperBrowser represents a first comprehensive solution for integrative analysis of track collections across the genome and epigenome. 
    The software is available at: https://hyperbrowser.uio.no. This work was supported by the Research Council of Norway (under grant agreements 221580, 218241, and 231217/F20), by the Norwegian Cancer Society (under grant agreements 71220’PR-2006-0433 and 3485238-2013), and by the South-Eastern Norway Regional Health Authority (under grant agreement 2014041). Peer Reviewed

    The role of decision confidence in advice-taking and trust formation

    In a world where ideas flow freely between people across multiple platforms, we often find ourselves relying on others' information without an objective standard to judge whether those opinions are accurate. The present study tests an agreement-in-confidence hypothesis of advice perception, which holds that internal metacognitive evaluations of decision confidence play an important functional role in the perception and use of social information, such as peers' advice. We propose that confidence can be used, computationally, to estimate advisors' trustworthiness and advice reliability. Specifically, these processes are hypothesized to be particularly important in situations where objective feedback is absent or difficult to acquire. Here, we use a judge-advisor system paradigm to precisely manipulate the profiles of virtual advisors whose opinions are provided to participants performing a perceptual decision-making task. We find that when advisors' and participants' judgments are independent, people are able to discriminate subtle advice features, like confidence calibration, whether or not objective feedback is available. However, when observers' judgments (and judgment errors) are correlated, as is the case in many social contexts, predictable distortions can be observed between feedback and feedback-free scenarios. A simple model of advice reliability estimation, endowed with metacognitive insight, is able to explain key patterns of results observed in the human data. We use agent-based modeling to explore implications of these individual-level decision strategies for network-level patterns of trust and belief formation.

    Metagenomic analysis of dental calculus in ancient Egyptian baboons

    Dental calculus, or mineralized plaque, represents a record of ancient biomolecules and food residues. Recently, ancient metagenomics made it possible to unlock the wealth of microbial and dietary information in dental calculus to reconstruct the oral microbiomes and lifestyles of humans from the past. Although most studies have so far focused on ancient humans, dental calculus is known to form in a wide range of animals, potentially informing on how human-animal interactions changed the animals' oral ecology. Here, we characterise the oral microbiome of six ancient Egyptian baboons held in captivity during the late Pharaonic era (9th-6th centuries BC) and of two historical baboons from a zoo via shotgun metagenomics. We demonstrate that these captive baboons possessed a distinctive oral microbiome when compared to ancient and modern humans, Neanderthals and a wild chimpanzee. These results may reflect the omnivorous dietary behaviour of baboons, even though health, food provisioning and other factors associated with human management may have changed the baboons' oral microbiome. We anticipate our study to be a starting point for more extensive studies on ancient animal oral microbiomes to examine the extent to which domestication and human management in the past affected the diet, health and lifestyle of target animals.

    Multiclass microarray gene expression classification based on fusion of correlation features

    In this paper, we propose novel algorithmic models based on the fusion of independent and correlated gene features for multiclass microarray gene expression classification. Genes can be co-expressed via different pathways. Moreover, a gene may or may not be co-active for all samples. We approach this problem with an optimal feature selection technique that uses statistical analysis to model the complex interactions between genes. Two types of correlation modelling techniques, based on cross-modal factor analysis (CFA) and canonical correlation analysis (CCA), were examined. The subsequent fusion of CCA/CFA features with principal component analysis (PCA) features, at feature level and at score level, results in a significant enhancement in classification accuracy for different data sets corresponding to multiclass microarray gene expression data.