GIVE: portable genome browsers for personal websites.
Growing popularity and diversity of genomic data demand portable and versatile genome browsers. Here, we present an open source programming library called GIVE that facilitates the creation of personalized genome browsers without requiring a system administrator. By inserting HTML tags, one can add to a personal webpage interactive visualization of multiple types of genomics data, including genome annotation, "linear" quantitative data, and genome interaction data. GIVE includes a graphical interface called HUG (HTML Universal Generator) that automatically generates HTML code for displaying user-chosen data, which can be copy-pasted into a user's personal website or saved and shared with collaborators. GIVE is available at: https://www.givengine.org/
Revealing mammalian evolutionary relationships by comparative analysis of gene clusters
Many software tools for comparative analysis of genomic sequence data have been released in recent decades. Despite this, it remains challenging to determine evolutionary relationships in gene clusters due to their complex histories involving duplications, deletions, inversions, and conversions. One concept describing these relationships is orthology. Orthologs derive from a common ancestor by speciation, in contrast to paralogs, which derive from duplication. Discriminating orthologs from paralogs is a necessary step in most multispecies sequence analyses, but doing so accurately is impeded by the occurrence of gene conversion events. We propose a refined method of orthology assignment based on two paradigms for interpreting its definition: by genomic context or by sequence content. X-orthology (based on context) traces orthology resulting from speciation and duplication only, while N-orthology (based on content) includes the influence of conversion events
The Escherichia coli transcriptome mostly consists of independently regulated modules
Underlying cellular responses is a transcriptional regulatory network (TRN) that modulates gene expression. A useful description of the TRN would decompose the transcriptome into targeted effects of individual transcriptional regulators. Here, we apply unsupervised machine learning to a diverse compendium of over 250 high-quality Escherichia coli RNA-seq datasets to identify 92 statistically independent signals that modulate the expression of specific gene sets. We show that 61 of these transcriptomic signals represent the effects of currently characterized transcriptional regulators. Condition-specific activation of signals is validated by exposure of E. coli to new environmental conditions. The resulting decomposition of the transcriptome provides: a mechanistic, systems-level, network-based explanation of responses to environmental and genetic perturbations; a guide to gene and regulator function discovery; and a basis for characterizing transcriptomic differences in multiple strains. Taken together, our results show that signal summation describes the composition of a model prokaryotic transcriptome
Law of Genome Evolution Direction: Coding Information Quantity Grows
The problem of the directionality of genome evolution is studied. Based on the analysis of the C-value paradox and the evolution of genome size, we propose that the function-coding information quantity of a genome always grows in the course of evolution through sequence duplication, expansion of code, and gene transfer from outside. The function-coding information quantity of a genome consists of two parts: p-coding information quantity, which encodes functional proteins, and n-coding information quantity, which encodes functional elements other than amino acid sequences. The evidence for this evolutionary law of function-coding information quantity is listed. The need for function is the driving force for the expansion of coding information quantity, and this expansion is the way a species achieves functional innovation and extension. The increase of the coding information quantity of a genome is therefore a measure of newly acquired function, and it determines the directionality of genome evolution. Comment: 16 pages
Genetic determinants of co-accessible chromatin regions in activated T cells across humans.
Over 90% of genetic variants associated with complex human traits map to non-coding regions, but little is understood about how they modulate gene regulation in health and disease. One possible mechanism is that genetic variants affect the activity of one or more cis-regulatory elements leading to gene expression variation in specific cell types. To identify such cases, we analyzed ATAC-seq and RNA-seq profiles from stimulated primary CD4+ T cells in up to 105 healthy donors. We found that regions of accessible chromatin (ATAC-peaks) are co-accessible at kilobase and megabase resolution, consistent with the three-dimensional chromatin organization measured by in situ Hi-C in T cells. Fifteen percent of genetic variants located within ATAC-peaks affected the accessibility of the corresponding peak (local-ATAC-QTLs). Local-ATAC-QTLs have the largest effects on co-accessible peaks, are associated with gene expression and are enriched for autoimmune disease variants. Our results provide insights into how natural genetic variants modulate cis-regulatory elements, in isolation or in concert, to influence gene expression
Modeling associations between genetic markers using Bayesian networks
Motivation: Understanding the patterns of association between polymorphisms at different loci in a population (linkage disequilibrium, LD) is of fundamental importance in various genetic studies. Many coefficients have been proposed for measuring the degree of LD, but they provide only a static view of the current LD structure. Generative models (GMs) were proposed to go beyond these measures, giving not only a description of the actual LD structure but also a tool to help understand the process that generated such structure. GMs based on coalescent theory have been the most appealing because they link LD to evolutionary factors. Nevertheless, inference and parameter estimation for such models are still computationally challenging
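As an illustration of the kind of static LD coefficient the abstract refers to, the sketch below computes two standard pairwise measures, D and r², from a list of two-locus haplotypes. It is not taken from the paper: the function name `ld_coefficients` and the encoding of haplotypes as two-character strings (`A`/`a` at the first locus, `B`/`b` at the second) are assumptions made for this example.

```python
from collections import Counter


def ld_coefficients(haplotypes):
    """Return (D, r^2) for two biallelic loci.

    `haplotypes` is a list of 2-character strings such as "AB" or "ab";
    this encoding is a hypothetical convention chosen for the sketch.
    """
    n = len(haplotypes)
    counts = Counter(haplotypes)

    # Frequency of the AB haplotype and of each allele on its own.
    p_ab = counts["AB"] / n
    p_a = sum(1 for h in haplotypes if h[0] == "A") / n
    p_b = sum(1 for h in haplotypes if h[1] == "B") / n

    # D measures the deviation from independence between the two loci.
    d = p_ab - p_a * p_b

    # r^2 normalises D^2 by the allele-frequency variances.
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else float("nan")
    return d, r2


# Perfectly associated loci: only AB and ab haplotypes occur.
d, r2 = ld_coefficients(["AB"] * 5 + ["ab"] * 5)
print(d, r2)
```

Such coefficients summarise each pair of loci with a single number, which is why the abstract describes them as giving only a static view of the LD structure.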
A systematic, large-scale comparison of transcription factor binding site models
Background: The modelling of gene regulation is a major challenge in biomedical research. This process is dominated by transcription factors (TFs), and mutations in their binding sites (TFBSs) may cause the misregulation of genes, eventually leading to disease. The consequences of DNA variants on TF binding are modelled in silico using binding matrices, but it remains unclear whether these are capable of accurately representing in vivo binding. In this study, we present a systematic comparison of binding models for 82 human TFs from three freely available sources: JASPAR matrices, HT-SELEX-generated models and matrices derived from protein binding microarrays (PBMs). We determined their ability to detect experimentally verified “real” in vivo TFBSs derived from ENCODE ChIP-seq data. As negative controls we chose random downstream exonic sequences, which are unlikely to harbour TFBSs. All models were assessed by receiver operating characteristic (ROC) analysis. Results: While the area under the curve was low for most of the tested models, with only 47% reaching a score of 0.7 or higher, we noticed strong differences between the various position-specific scoring matrices, with JASPAR and HT-SELEX models showing higher success rates than PBM-derived models. In addition, we found that while TFBS sequences showed a higher degree of conservation than randomly chosen sequences, there was a high variability between individual TFBSs. Conclusions: Our results show that only a few of the matrix-based models used to predict potential TFBSs are able to reliably detect experimentally confirmed TFBSs. We compiled our findings in a freely accessible web application called ePOSSUM (http://mutationtaster.charite.de/ePOSSUM/) which uses a Bayes classifier to assess the impact of genetic alterations on TF binding in user-defined sequences. Additionally, ePOSSUM provides information on the reliability of the prediction using our test set of experimentally confirmed binding sites
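The area under the ROC curve mentioned in this abstract can be read as the probability that a randomly chosen positive sequence scores higher than a randomly chosen negative one. The sketch below computes it that way, by pairwise comparison. It is an illustration under stated assumptions, not the authors' pipeline: the function name `roc_auc` and the example scores are invented for this example.

```python
def roc_auc(pos_scores, neg_scores):
    """Area under the ROC curve via pairwise comparison.

    Counts the fraction of (positive, negative) pairs in which the
    positive scores higher; ties count as half a win.
    """
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical matrix scores for verified binding sites (positives)
# and control sequences (negatives).
positives = [0.9, 0.8, 0.4]
negatives = [0.7, 0.3, 0.2]
print(roc_auc(positives, negatives))  # 8 of 9 pairs are ranked correctly
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which gives context for the 0.7 threshold quoted in the abstract.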
Chromatin loop anchors are associated with genome instability in cancer and recombination hotspots in the germline
Background: Chromatin loops form a basic unit of interphase nuclear organization, with chromatin loop anchor points providing contacts between regulatory regions and promoters. However, the mutational landscape at these anchor points remains under-studied. Here, we describe the unusual patterns of somatic mutations and germline variation associated with loop anchor points and explore the underlying features influencing these patterns. Results: Analyses of whole genome sequencing datasets reveal that anchor points are strongly depleted for single nucleotide variants (SNVs) in tumours. Despite low SNV rates in their genomic neighbourhood, anchor points emerge as sites of evolutionary innovation, showing enrichment for structural variant (SV) breakpoints and a peak of SNVs at focal CTCF sites within the anchor points. Both CTCF-bound and non-CTCF anchor points harbour an excess of SV breakpoints in multiple tumour types and are prone to double-strand breaks in cell lines. Common fragile sites, which are hotspots for genome instability, also show elevated numbers of intersecting loop anchor points. Recurrently disrupted anchor points are enriched for genes with functions in cell cycle transitions and regions associated with predisposition to cancer. We also discover a novel class of CTCF-bound anchor points which overlap meiotic recombination hotspots and are enriched for the core PRDM9 binding motif, suggesting that the anchor points have been foci for diversity generated during recent human evolution. Conclusions: We suggest that the unusual chromatin environment at loop anchor points underlies the elevated rates of variation observed, marking them as sites of regulatory importance but also genomic fragility
