Comparative analysis and visualization of multiple collinear genomes
Background: Genome browsers are a common tool used by biologists to visualize genomic features including genes, polymorphisms, and many others. However, existing genome browsers and visualization tools are not well-suited to perform meaningful comparative analysis among a large number of genomes. With the increasing quantity and availability of genomic data, there is an increased burden to provide useful visualization and analysis tools for comparison of multiple collinear genomes such as the large panels of model organisms which are the basis for much of the current genetic research. Results: We have developed a novel web-based tool for visualizing and analyzing multiple collinear genomes. Our tool illustrates genome-sequence similarity through a mosaic of intervals representing local phylogeny, subspecific origin, and haplotype identity. Comparative analysis is facilitated through reordering and clustering of tracks, which can vary throughout the genome. In addition, we provide local phylogenetic trees as an alternate visualization to assess local variations. Conclusions: Unlike previous genome browsers and viewers, ours allows for simultaneous and comparative analysis. Our browser provides intuitive selection and interactive navigation about features of interest. Dynamic visualizations adjust to scale and data content, making analysis at variable resolutions and of multiple data sets more informative. We demonstrate our genome browser for an extensive set of genomic data sets composed of almost 200 distinct mouse laboratory strains.
Visualization methods for statistical analysis of microarray clusters
BACKGROUND: The most common method of identifying groups of functionally related genes in microarray data is to apply a clustering algorithm. However, it is impossible to determine which clustering algorithm is most appropriate to apply, and it is difficult to verify the results of any algorithm due to the lack of a gold-standard. Appropriate data visualization tools can aid this analysis process, but existing visualization methods do not specifically address this issue. RESULTS: We present several visualization techniques that incorporate meaningful statistics that are noise-robust for the purpose of analyzing the results of clustering algorithms on microarray data. This includes a rank-based visualization method that is more robust to noise, a difference display method to aid assessments of cluster quality and detection of outliers, and a projection of high dimensional data into a three dimensional space in order to examine relationships between clusters. Our methods are interactive and are dynamically linked together for comprehensive analysis. Further, our approach applies to both protein and gene expression microarrays, and our architecture is scalable for use on both desktop/laptop screens and large-scale display devices. This methodology is implemented in GeneVAnD (Genomic Visual ANalysis of Datasets) and is available at . CONCLUSION: Incorporating relevant statistical information into data visualizations is key for analysis of large biological datasets, particularly because of high levels of noise and the lack of a gold-standard for comparisons. We developed several new visualization techniques and demonstrated their effectiveness for evaluating cluster quality and relationships between clusters.
Kerfuffle: a web tool for multi-species gene colocalization analysis
The evolutionary pressures that underlie the large-scale functional
organization of the genome are not well understood in eukaryotes. Recent
evidence suggests that functionally similar genes may colocalize (cluster) in
the eukaryotic genome, suggesting the role of chromatin-level gene regulation
in shaping the physical distribution of coordinated genes. However, few of the
bioinformatic tools currently available allow for a systematic study of gene
colocalization across several, evolutionarily distant species. Kerfuffle is a
web tool designed to help discover, visualize, and quantify the physical
organization of genomes by identifying significant gene colocalization and
conservation across the assembled genomes of available species (currently up to
47, from humans to worms). Kerfuffle only requires the user to specify a list
of human genes and the names of other species of interest. Without further
input from the user, the software queries the e!Ensembl BioMart server to
obtain positional information and discovers homology relations in all genes and
species specified. Using this information, Kerfuffle performs a multi-species
clustering analysis, presents downloadable lists of clustered genes, performs
Monte Carlo statistical significance calculations, estimates how conserved gene
clusters are across species, plots histograms and interactive graphs, allows
users to save their queries, and generates a downloadable visualization of the
clusters using the Circos software. These analyses may be used to further
explore the functional roles of gene clusters by interrogating the enriched
molecular pathways associated with each cluster. Comment: BMC Bioinformatics, In press
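The Monte Carlo significance step mentioned in this abstract can be illustrated with a minimal sketch. The abstract does not describe Kerfuffle's actual null model or clustering statistic, so everything here is an assumption made for illustration: genes are reduced to single positions on one chromosome, "colocalization" is counted as the number of neighbouring genes within a fixed window, and the null model places the same number of genes uniformly at random. The names `count_close_pairs` and `monte_carlo_p_value` are hypothetical.

```python
import random


def count_close_pairs(positions, window):
    """Count neighbouring genes (after sorting) that lie within `window` bp."""
    ordered = sorted(positions)
    return sum(1 for a, b in zip(ordered, ordered[1:]) if b - a <= window)


def monte_carlo_p_value(positions, chrom_length, window, n_iter=1000, seed=0):
    """Estimate how often random placement clusters at least as much as observed.

    Assumed null model (not taken from the abstract): each gene is placed
    independently and uniformly along a chromosome of `chrom_length` bp.
    """
    rng = random.Random(seed)
    observed = count_close_pairs(positions, window)
    hits = 0
    for _ in range(n_iter):
        sample = [rng.randrange(chrom_length) for _ in positions]
        if count_close_pairs(sample, window) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_iter + 1)


if __name__ == "__main__":
    clustered = [1000, 1010, 1020, 1030, 1040]
    print(monte_carlo_p_value(clustered, chrom_length=10_000_000, window=100))
```

A small p-value under this toy null suggests the genes sit closer together than random placement would explain; a real analysis would also need to account for gene density and chromosome structure.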
Detection of Epigenomic Network Community Oncomarkers
In this paper we propose network methodology to infer prognostic cancer
biomarkers based on the epigenetic pattern DNA methylation. Epigenetic
processes such as DNA methylation reflect environmental risk factors, and are
increasingly recognised for their fundamental role in diseases such as cancer.
DNA methylation is a gene-regulatory pattern, and hence provides a means by
which to assess genomic regulatory interactions. Network models are a natural
way to represent and analyse groups of such interactions. The utility of
network models also increases as the quantity of data and number of variables
increase, making them increasingly relevant to large-scale genomic studies. We
propose methodology to infer prognostic genomic networks from a DNA
methylation-based measure of genomic interaction and association. We then show
how to identify prognostic biomarkers from such networks, which we term
`network community oncomarkers'. We illustrate the power of our proposed
methodology in the context of a large publicly available breast cancer dataset.
Genomic, Pathway Network, and Immunologic Features Distinguishing Squamous Carcinomas
This integrated, multiplatform PanCancer Atlas study co-mapped and identified distinguishing
molecular features of squamous cell carcinomas (SCCs) from five sites associated with smokin
An investigation into inter- and intragenomic variations of graphic genomic signatures
We provide, on an extensive dataset and using several different distances,
confirmation of the hypothesis that CGR patterns are preserved along a genomic
DNA sequence, and are different for DNA sequences originating from genomes of
different species. This finding lends support to the theory that CGRs of
genomic sequences can act as graphic genomic signatures. In particular, we
compare the CGR patterns of over five hundred different 150,000 bp genomic
sequences originating from the genomes of six organisms, each belonging to one
of the kingdoms of life: H. sapiens, S. cerevisiae, A. thaliana, P. falciparum,
E. coli, and P. furiosus. We also provide preliminary evidence of this method's
applicability to closely related species by comparing H. sapiens (chromosome
21) sequences and over one hundred and fifty genomic sequences, also 150,000 bp
long, from P. troglodytes (Animalia; chromosome Y), for a total length of more
than 101 million basepairs analyzed. We compute pairwise distances between CGRs
of these genomic sequences using six different distances, and construct
Molecular Distance Maps that visualize all sequences as points in a
two-dimensional or three-dimensional space, to simultaneously display their
interrelationships. Our analysis confirms that CGR patterns of DNA sequences
from the same genome are in general quantitatively similar, while being
different for DNA sequences from genomes of different species. Our analysis of
the performance of the assessed distances uses three different quality measures
and suggests that several distances outperform the Euclidean distance, which
has so far been almost exclusively used for such studies. In particular we show
that, for this dataset, DSSIM (Structural Dissimilarity Index) and the
descriptor distance (introduced here) are best able to classify genomic
sequences. Comment: 14 pages, 6 figures, 5 tables
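For readers unfamiliar with CGR (Chaos Game Representation), the following is a minimal sketch of how a DNA sequence can be turned into a CGR count grid and how two such grids can be compared with the Euclidean distance that the abstract uses as a baseline. The corner assignment, the grid resolution `k`, and the function names are assumptions for illustration; the paper's own conventions, and the DSSIM and descriptor distances, are not reproduced here.

```python
import math

# Assumed corner convention; other assignments are also in use.
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def cgr_points(seq):
    """Return CGR points: each point is halfway between the previous point
    and the corner of the current base, starting from the square's centre."""
    x, y = 0.5, 0.5
    points = []
    for base in seq.upper():
        corner = CORNERS.get(base)
        if corner is None:
            continue  # skip ambiguous symbols such as N
        x = (x + corner[0]) / 2
        y = (y + corner[1]) / 2
        points.append((x, y))
    return points


def cgr_matrix(seq, k=3):
    """Bin CGR points into a 2^k x 2^k grid of counts."""
    n = 2 ** k
    grid = [[0] * n for _ in range(n)]
    for x, y in cgr_points(seq):
        col = min(int(x * n), n - 1)
        row = min(int(y * n), n - 1)
        grid[row][col] += 1
    return grid


def euclidean_distance(a, b):
    """Euclidean distance between two equally sized count grids."""
    return math.sqrt(
        sum((u - v) ** 2 for ra, rb in zip(a, b) for u, v in zip(ra, rb))
    )


if __name__ == "__main__":
    m1 = cgr_matrix("ACGTACGTGGCC", k=2)
    m2 = cgr_matrix("TTTTAAAATTTT", k=2)
    print(euclidean_distance(m1, m2))
```

Raw counts are only comparable between sequences of equal length, which matches the fixed 150,000 bp fragments described in the abstract; for sequences of differing length the grids would need to be normalized first.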
Principles of meiotic chromosome assembly revealed in S. cerevisiae
During meiotic prophase, chromosomes organise into a series of chromatin loops emanating from a proteinaceous axis, but the mechanisms of assembly remain unclear. Here we use Saccharomyces cerevisiae to explore how this elaborate three-dimensional chromosome organisation is linked to genomic sequence. As cells enter meiosis, we observe that strong cohesin-dependent grid-like Hi-C interaction patterns emerge, reminiscent of mammalian interphase organisation, but with distinct regulation. Meiotic patterns agree with simulations of loop extrusion with growth limited by barriers, in which a heterogeneous population of expanding loops develop along the chromosome. Importantly, CTCF, the factor that imposes similar features in mammalian interphase, is absent in S. cerevisiae, suggesting alternative mechanisms of barrier formation. While grid-like interactions emerge independently of meiotic chromosome synapsis, synapsis itself generates additional compaction that matures differentially according to telomere proximity and chromosome size. Collectively, our results elucidate fundamental principles of chromosome assembly and demonstrate the essential role of cohesin within this evolutionarily conserved process.