chroGPS, a global chromatin positioning system for the functional analysis and visualization of the epigenome
Development of tools to jointly visualize the genome and the epigenome remains a challenge. chroGPS is a computational approach that addresses it. chroGPS uses multidimensional scaling techniques to represent similarity between epigenetic factors, or between genetic elements on the basis of their epigenetic state, in 2D/3D reference maps. We emphasize biological interpretability, statistical robustness, integration of genetic and epigenetic data from heterogeneous sources, and computational feasibility. Although chroGPS is a general methodology to create reference maps and study the epigenetic state of any class of genetic element or genomic region, we focus on two specific kinds of maps: chroGPSfactors, which visualizes functional similarities between epigenetic factors, and chroGPSgenes, which describes the epigenetic state of genes and integrates gene expression and other functional data. We use data from the modENCODE project on the genomic distribution of a large collection of epigenetic factors in Drosophila, a model system extensively used to study genome organization and function. Our results show that the maps allow straightforward visualization of relationships between factors and elements, capturing relevant information about their functional properties that helps to interpret epigenetic information in a functional context and derive testable hypotheses.
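As a rough illustration of the multidimensional scaling step mentioned in this abstract, the sketch below embeds items in 2D from a pairwise distance matrix using classical (Torgerson) MDS. It is not chroGPS code: the function name `classical_mds`, the power-iteration eigen-solver, and the toy distance matrix are all assumptions made for illustration, and the abstract does not say which MDS variant or distance the authors use.

```python
import math
import random


def classical_mds(dist, k=2, iters=500, seed=0):
    """Embed n items in k dimensions from an n x n distance matrix.

    Classical (Torgerson) MDS: double-center the squared distances and
    use the top-k eigenpairs of the resulting Gram matrix. Eigenpairs
    are found by power iteration with deflation, which is adequate only
    for small illustrative inputs.
    """
    n = len(dist)
    sq = [[dist[i][j] ** 2 for j in range(n)] for i in range(n)]
    row = [sum(r) / n for r in sq]
    total = sum(row) / n
    # Gram matrix B = -1/2 * J * D^2 * J, with J the centering matrix.
    b = [[-0.5 * (sq[i][j] - row[i] - row[j] + total) for j in range(n)]
         for i in range(n)]

    rng = random.Random(seed)
    coords = [[0.0] * k for _ in range(n)]
    for d in range(k):
        v = [rng.uniform(-1.0, 1.0) for _ in range(n)]
        for _ in range(iters):
            w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:
                break  # remaining spectrum is numerically zero
            v = [x / norm for x in w]
        # Rayleigh quotient gives the eigenvalue estimate for v.
        bv = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
        vv = sum(x * x for x in v)
        lam = sum(v[i] * bv[i] for i in range(n)) / vv
        if lam <= 1e-12:
            break  # no further positive eigenvalues; leave zeros
        scale = math.sqrt(vv)
        v = [x / scale for x in v]
        for i in range(n):
            coords[i][d] = math.sqrt(lam) * v[i]
        # Deflate so the next pass finds the next eigenpair.
        for i in range(n):
            for j in range(n):
                b[i][j] -= lam * v[i] * v[j]
    return coords


# Toy input (hypothetical): corners of a 3 x 4 rectangle.
toy = [[0, 3, 4, 5], [3, 0, 5, 4], [4, 5, 0, 3], [5, 4, 3, 0]]
layout = classical_mds(toy, k=2)
```

The returned coordinates are determined only up to rotation and reflection, so it is the pairwise distances in the map that carry meaning, not the absolute positions.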
A Distance-Based Test of Association Between Paired Heterogeneous Genomic Data
Due to rapid technological advances, a wide range of different measurements
can be obtained from a given biological sample including single nucleotide
polymorphisms, copy number variation, gene expression levels, DNA methylation
and proteomic profiles. Each of these distinct measurements provides the means
to characterize a certain aspect of biological diversity, and a fundamental
problem of broad interest concerns the discovery of shared patterns of
variation across different data types. Such data types are heterogeneous in the
sense that they represent measurements taken at very different scales or
described by very different data structures. We propose a distance-based
statistical test, the generalized RV (GRV) test, to assess whether there is a
common and non-random pattern of variability between paired biological
measurements obtained from the same random sample. The measurements enter the
test through distance measures which can be chosen to capture particular
aspects of the data. An approximate null distribution is proposed to compute
p-values in closed form, without the need to perform costly Monte Carlo
permutation procedures. Compared to the classical Mantel test for association
between distance matrices, the GRV test has been found to be more powerful in a
number of simulation settings. We also report on an application of the GRV test
to detect biological pathways in which genetic variability is associated with
variation in gene expression levels in ovarian cancer samples, and present
results obtained from two independent cohorts.
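For context, the classical Mantel test that this abstract uses as its baseline can be sketched in a few lines. This is the permutation-based comparison method, not the GRV test itself, whose statistic and closed-form null distribution are not given in the abstract. The function names and the default of 999 permutations are illustrative assumptions.

```python
import random


def upper_triangle(m, order):
    """Flatten the strict upper triangle of m after reordering its items."""
    n = len(order)
    return [m[order[i]][order[j]] for i in range(n) for j in range(i + 1, n)]


def pearson(x, y):
    """Pearson correlation of two equal-length sequences.

    Assumes neither sequence is constant (otherwise this divides by zero).
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def mantel_test(d1, d2, n_perm=999, seed=0):
    """One-sided Mantel test between two n x n distance matrices.

    Returns the observed correlation and a permutation p-value obtained
    by jointly permuting the rows and columns of d2.
    """
    n = len(d1)
    identity = list(range(n))
    x = upper_triangle(d1, identity)
    r_obs = pearson(x, upper_triangle(d2, identity))
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_perm):
        perm = identity[:]
        rng.shuffle(perm)
        if pearson(x, upper_triangle(d2, perm)) >= r_obs:
            hits += 1
    # Add one to numerator and denominator so the p-value is never zero.
    return r_obs, (hits + 1) / (n_perm + 1)
```

The loop over permutations is the cost the abstract refers to: every p-value requires recomputing the statistic `n_perm` times, which is what an approximate closed-form null distribution avoids.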
A comparative evaluation of dimensionality reduction methods on large-scale gene expression datasets
Accelerated similarity searching and clustering of large compound sets by geometric embedding and locality sensitive hashing
Motivation: Similarity searching and clustering of chemical compounds by structural similarity are important computational approaches for identifying drug-like small molecules. Most algorithms available for these tasks are limited by their speed and scalability, and cannot handle today's large compound databases with several million entries.
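The title of this entry names locality sensitive hashing (LSH) over a geometric embedding, but the excerpt does not say which hash family is used. As a minimal sketch under that caveat, the code below uses random-hyperplane (sign) hashing, one common LSH scheme for vectors; all function names and parameters are hypothetical and are not taken from the paper.

```python
import random


def make_hyperplanes(dim, n_bits, seed=0):
    """Draw n_bits random hyperplane normals in dim dimensions."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def signature(vec, planes):
    """Bit signature: which side of each hyperplane the vector lies on."""
    return tuple(
        int(sum(p * x for p, x in zip(plane, vec)) >= 0.0)
        for plane in planes
    )


def build_index(vectors, planes):
    """Group item keys into buckets keyed by their signature."""
    index = {}
    for key, vec in vectors.items():
        index.setdefault(signature(vec, planes), []).append(key)
    return index


def query(vec, index, planes):
    """Return candidate neighbours: items sharing the query's bucket."""
    return index.get(signature(vec, planes), [])
```

Vectors separated by a small angle tend to fall in the same bucket, so a query inspects only a handful of candidates instead of the whole database. A practical system would typically use several hash tables and re-rank candidates by exact similarity.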
A visual and curatorial approach to clinical variant prioritization and disease gene discovery in genome-wide diagnostics
Background: Genome-wide data are increasingly important in the clinical evaluation of human disease. However, the large number of variants observed in individual patients challenges the efficiency and accuracy of diagnostic review. Recent work has shown that systematic integration of clinical phenotype data with genotype information can improve diagnostic workflows and prioritization of filtered rare variants. We have developed visually interactive, analytically transparent analysis software that leverages existing disease catalogs, such as the Online Mendelian Inheritance in Man database (OMIM) and the Human Phenotype Ontology (HPO), to integrate patient phenotype and variant data into ranked diagnostic alternatives. Methods: Our tool, “OMIM Explorer” (http://www.omimexplorer.com), extends the biomedical application of semantic similarity methods beyond those reported in previous studies. The tool also provides a simple interface for translating free-text clinical notes into HPO terms, enabling clinical providers and geneticists to contribute phenotypes to the diagnostic process. The visual approach uses semantic similarity with multidimensional scaling to collapse high-dimensional phenotype and genotype data from an individual into a graphical format that contextualizes the patient within a low-dimensional disease map. The map proposes a differential diagnosis and algorithmically suggests potential alternatives for phenotype queries—in essence, generating a computationally assisted differential diagnosis informed by the individual’s personal genome. Visual interactivity allows the user to filter and update variant rankings by interacting with intermediate results. The tool also implements an adaptive approach for disease gene discovery based on patient phenotypes. Results: We retrospectively analyzed pilot cohort data from the Baylor Miraca Genetics Laboratory, demonstrating performance of the tool and workflow in the re-analysis of clinical exomes. 
Our tool assigned to clinically reported variants a median rank of 2, placing causal variants in the top 1% of filtered candidates across the 47 cohort cases with reported molecular diagnoses of exome variants in OMIM Morbidmap genes. Our tool outperformed Phen-Gen, eXtasy, PhenIX, PHIVE, and hiPHIVE in the prioritization of these clinically reported variants. Conclusions: Our integrative paradigm can improve efficiency and, potentially, the quality of genomic medicine by more effectively utilizing available phenotype information, catalog data, and genomic knowledge.
Molecular Predictors of 3D Morphogenesis by Breast Cancer Cell Lines in 3D Culture
Correlative analysis of molecular markers with phenotypic signatures is the simplest model for hypothesis generation. In this paper, a panel of 24 breast cell lines was grown in 3D culture, their morphology was imaged through phase contrast microscopy, and computational methods were developed to segment and represent each colony at multiple dimensions. Subsequently, subpopulations from these morphological responses were identified through consensus clustering to reveal three clusters of round, grape-like, and stellate phenotypes. In some cases, cell lines with particular pathobiological phenotypes clustered together (e.g., ERBB2 amplified cell lines sharing the same morphometric properties as the grape-like phenotype). Next, associations with molecular features were realized through (i) differential analysis within each morphological cluster, and (ii) regression analysis across the entire panel of cell lines. In both cases, the dominant genes that are predictive of the morphological signatures were identified. Specifically, PPARγ has been associated with the invasive stellate morphological phenotype, which corresponds to triple-negative pathobiology. PPARγ has been validated through two supporting biological assays.
IRIS-EDA: An Integrated RNA-Seq Interpretation System for Gene Expression Data Analysis
Next-Generation Sequencing has made available substantial amounts of large-scale omics data, providing unprecedented opportunities to understand complex biological systems. Specifically, the value of RNA-Sequencing (RNA-Seq) data has been confirmed in inferring how gene regulatory systems will respond under various conditions (bulk data) or cell types (single-cell data). RNA-Seq can generate genome-scale gene expression profiles that can be further analyzed using correlation analysis, co-expression analysis, clustering, differential gene expression (DGE), and many other analyses. While these analyses can provide invaluable information related to gene expression, integration and interpretation of the results can prove challenging. Here we present a tool called IRIS-EDA, which is a Shiny web server for expression data analysis. It provides a straightforward and user-friendly platform for performing numerous computational analyses on user-provided RNA-Seq or Single-cell RNA-Seq (scRNA-Seq) data. Specifically, three commonly used R packages (edgeR, DESeq2, and limma) are implemented in the DGE analysis with seven unique experimental design functionalities, including a user-specified design matrix option. Seven discovery-driven methods and tools (correlation analysis, heatmap, clustering, biclustering, Principal Component Analysis (PCA), Multidimensional Scaling (MDS), and t-distributed Stochastic Neighbor Embedding (t-SNE)) are provided for gene expression exploration, which is useful for designing experimental hypotheses and determining key factors for comprehensive DGE analysis. Furthermore, this platform integrates seven visualization tools in a highly interactive manner, for improved interpretation of the analyses. It is noteworthy that, for the first time, IRIS-EDA provides a framework to expedite submission of data and results to NCBI’s Gene Expression Omnibus following the FAIR (Findable, Accessible, Interoperable and Reusable) Data Principles.
IRIS-EDA is freely available at http://bmbl.sdstate.edu/IRIS/
Structures in magnetohydrodynamic turbulence: detection and scaling
We present a systematic analysis of statistical properties of turbulent
current and vorticity structures at a given time using cluster analysis. The
data stem from numerical simulations of decaying three-dimensional (3D)
magnetohydrodynamic turbulence in the absence of an imposed uniform magnetic
field; the magnetic Prandtl number is taken equal to unity, and we use a
periodic box with grids of up to 1536^3 points, and with Taylor Reynolds
numbers up to 1100. The initial conditions are either an X-point configuration
embedded in 3D, the so-called Orszag-Tang vortex, or an
Arnol'd-Beltrami-Childress configuration with a fully helical velocity and
magnetic field. In each case two snapshots are analyzed, separated by one
turn-over time, starting just after the peak of dissipation. We show that the
algorithm is able to select a large number of structures (in excess of 8,000)
for each snapshot and that the statistical properties of these clusters are
remarkably similar for the two snapshots as well as for the two flows under
study in terms of scaling laws for the cluster characteristics, with the
structures in the vorticity and in the current behaving in the same way. We
also study the effect of Reynolds number on cluster statistics, and we finally
analyze the properties of these clusters in terms of their velocity-magnetic
field correlation. Self-organized criticality features have been identified in
the dissipative range of scales. A different scaling arises in the inertial
range, which cannot be identified for the moment with a known self-organized
criticality class consistent with MHD. We suggest that this range can be
governed by turbulence dynamics as opposed to criticality, and propose an
interpretation of intermittency in terms of propagation of local instabilities.
Comment: 17 pages, 9 figures, 5 tables