Application of motif scoring algorithms for enhancer prediction in distantly related species
Although many studies proposed methods for the identification of enhancers, reliable prediction on a genome-wide scale is still an unsolved problem. One of the reasons for this is the highly flexible regulatory logic underlying a detectable enhancer activity. In each cell type or tissue and at any given time, a mostly unknown set of transcription factors activates specific regulatory elements by coordinated binding to the corresponding genomic region. Position, spacing, and orientation of the individual bound factors can thereby vary between different enhancers yet result in a highly similar spatio-temporal activity. Due to this inner flexibility, so-called “alignment-free” methods have been proposed for enhancer prediction, as they are able to cope with rearrangements by comparison of word profiles rather than linear sequence. However, the problems caused by allowing for permutation in sequence comparison have not been investigated so far. In this study I implemented several published alignment-free metrics and analysed which parameters affect their ability to successfully predict regulatory regions. As the results show, single point mutations and the increasing amount of spurious matches with decreasing word size pose the biggest challenge to alignment-free techniques, especially when applied on a genome-wide scale. Alignment algorithms usually solve these problems quite efficiently but cannot handle permutation. I therefore implemented a new technique for enhancer prediction that combines the advantages of both algorithm types and used it for the identification of regulatory regions in the teleost fish Oryzias latipes (Medaka) based on a set of known and validated human enhancers. Predicted medaka regions and human enhancers were subsequently used in an in vivo enhancer assay and analysed for their activity. In total, 12 predicted regions corresponding to 9 human enhancers showed clear enhancing activity in the fish.
This shows that the principle implemented here is able to predict active enhancers at a high rate on a genome-wide scale even in species as diverged as human and fish. Furthermore, evidence for motif-level conservation between some of the human and medaka enhancers could be found that was invisible to most of the alignment algorithms used for comparison.
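To illustrate what "comparison of word profiles rather than linear sequence" can look like, here is a minimal sketch of a generic alignment-free score. It is not the method of the thesis: the function names, the cosine measure, the word size and the toy sequences are all assumptions chosen for illustration.

```python
from collections import Counter
from math import sqrt


def word_profile(seq, k):
    """Count every overlapping word (k-mer) of length k in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def cosine_similarity(a, b, k):
    """Cosine of the angle between the word-count vectors of a and b.

    A generic alignment-free score (an assumption for this sketch),
    not one of the specific metrics evaluated in the study.
    """
    pa, pb = word_profile(a, k), word_profile(b, k)
    dot = sum(n * pb[w] for w, n in pa.items())
    norm_a = sqrt(sum(n * n for n in pa.values()))
    norm_b = sqrt(sum(n * n for n in pb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical toy "enhancers": the same two sites in swapped order.
site_a, site_b, spacer = "TGACGTCA", "CACGTGGC", "AAAA"
original = site_a + spacer + site_b
rearranged = site_b + spacer + site_a

# The score stays high despite the permutation, because words inside
# each site are unchanged; only words spanning the junctions differ.
score = cosine_similarity(original, rearranged, k=4)
```

Shrinking `k` makes unrelated sequences share more words by chance, which is the spurious-match problem the abstract describes, and a single point mutation changes up to `k` overlapping words at once.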
Using reference-free compressed data structures to analyze sequencing reads from thousands of human genomes.
We are rapidly approaching the point where we have sequenced millions of human genomes. There is a pressing need for new data structures to store raw sequencing data and efficient algorithms for population scale analysis. Current reference-based data formats do not fully exploit the redundancy in population sequencing nor take advantage of shared genetic variation. In recent years, the Burrows-Wheeler transform (BWT) and FM-index have been widely employed as a full-text searchable index for read alignment and de novo assembly. We introduce the concept of a population BWT and use it to store and index the sequencing reads of 2705 samples from the 1000 Genomes Project. A key feature is that, as more genomes are added, identical read sequences are increasingly observed, and compression becomes more efficient. We assess the support in the 1000 Genomes read data for every base position of two human reference assembly versions, finding that 3.2 Mbp with population support was lost in the transition from GRCh37, while 13.7 Mbp was added in GRCh38. We show that the vast majority of variant alleles can be uniquely described by overlapping 31-mers and show how rapid and accurate SNP and indel genotyping can be carried out across the genomes in the population BWT. We use the population BWT to carry out nonreference queries to search for the presence of all known viral genomes and discover human T-lymphotropic virus 1 integrations in six samples in a recognized epidemiological distribution.
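The way a BWT serves as a full-text searchable index can be sketched with a toy backward search. This is a textbook illustration on a single short string, not the population BWT construction of the paper: the function names are hypothetical, the transform is built by naively sorting rotations, and ranks are computed by rescanning the string instead of using sampled rank tables.

```python
from collections import Counter


def bwt(text):
    """Burrows-Wheeler transform via sorted rotations.

    Appends "$" as a terminator that sorts before A, C, G and T.
    Fine for toy inputs only; real indexes use suffix-array methods.
    """
    text += "$"
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(rotation[-1] for rotation in rotations)


def count_occurrences(bwt_str, pattern):
    """Count exact matches of pattern using FM-index backward search."""
    # first_row[c]: number of characters in the text smaller than c.
    counts = Counter(bwt_str)
    first_row, total = {}, 0
    for ch in sorted(counts):
        first_row[ch] = total
        total += counts[ch]

    lo, hi = 0, len(bwt_str)  # half-open range of matching rotations
    for ch in reversed(pattern):
        if ch not in first_row:
            return 0
        # Naive rank: occurrences of ch in the BWT before each bound.
        lo = first_row[ch] + bwt_str[:lo].count(ch)
        hi = first_row[ch] + bwt_str[:hi].count(ch)
        if lo >= hi:
            return 0
    return hi - lo


index = bwt("ACGTACGTTACG")          # hypothetical toy sequence
hits = count_occurrences(index, "ACG")
```

Because the query walks the pattern from its last character to its first, the cost depends on the pattern length rather than on how many reads are indexed, which is one reason such indexes suit k-mer lookups like the 31-mer queries mentioned above.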
Genomic and Phenotypic Characterization of a Wild Medaka Population : Towards the Establishment of an Isogenic Population Genetic Resource in Fish
Oryzias latipes (medaka) has been established as a vertebrate genetic model for more than a century and recently has been rediscovered outside its native Japan. The power of new sequencing methods now makes it possible to reinvigorate medaka genetics, in particular by establishing a near-isogenic panel derived from a single wild population. Here we characterize the genomes of wild medaka catches obtained from a single Southern Japanese population in Kiyosu as a precursor for the establishment of a near-isogenic panel of wild lines. The population is free of significant detrimental population structure and has advantageous linkage disequilibrium properties suitable for the establishment of the proposed panel. Analysis of morphometric traits in five representative inbred strains suggests phenotypic mapping will be feasible in the panel. In addition, high-throughput genome sequencing of these medaka strains confirms their evolutionary relationships on lines of geographic separation and provides further evidence that there has been little significant interbreeding between the Southern and Northern medaka population since the Southern/Northern population split. The sequence data suggest that the Southern Japanese medaka existed as a larger older population that went through a relatively recent bottleneck approximately 10,000 years ago. In addition, we detect patterns of recent positive selection in the Southern population. These data indicate that the genetic structure of the Kiyosu medaka samples is suitable for the establishment of a vertebrate near-isogenic panel and therefore inbreeding of 200 lines based on this population has commenced. Progress of this project can be tracked at http://www.ebi.ac.uk/birney-srv/medaka-ref-panel
The Light Responsive Transcriptome of the Zebrafish: Function and Regulation
Most organisms possess circadian clocks that are able to anticipate the day/night cycle and are reset or “entrained” by the ambient light. In the zebrafish, many organs and even cultured cell lines are directly light responsive, allowing for direct entrainment of the clock by light. Here, we have characterized light induced gene transcription in the zebrafish at several organizational levels. Larvae, heart organ cultures and cell cultures were exposed to 1- or 3-hour light pulses, and changes in gene expression were compared with controls kept in the dark. We identified 117 light regulated genes, with the majority being induced and some repressed by light. Cluster analysis groups the genes into five major classes that show regulation at all levels of organization or in different subset combinations. The regulated genes cover a variety of functions, and the analysis of gene ontology categories reveals an enrichment of genes involved in circadian rhythms, stress response and DNA repair, consistent with the exposure to visible wavelengths of light priming cells for UV-induced damage repair. Promoter analysis of the induced genes shows an enrichment of various short sequence motifs, including E- and D-box enhancers that have previously been implicated in light regulation of the zebrafish period2 gene. Heterologous reporter constructs with sequences matching these motifs reveal light regulation of D-box elements in both cells and larvae. Morpholino-mediated knock-down studies of two homologues of the D-box binding factor Tef indicate that these are differentially involved in the cell autonomous light induction in a gene-specific manner. These findings suggest that the mechanisms involved in period2 regulation might represent a more general pathway leading to light induced gene expression
Sixteen diverse laboratory mouse reference genomes define strain-specific haplotypes and novel functional loci.
We report full-length draft de novo genome assemblies for 16 widely used inbred mouse strains and find extensive strain-specific haplotype variation. We identify and characterize 2,567 regions on the current mouse reference genome exhibiting the greatest sequence diversity. These regions are enriched for genes involved in pathogen defence and immunity and exhibit enrichment of transposable elements and signatures of recent retrotransposition events. Combinations of alleles and genes unique to an individual strain are commonly observed at these loci, reflecting distinct strain phenotypes. We used these genomes to improve the mouse reference genome, resulting in the completion of 10 new gene structures. Also, 62 new coding loci were added to the reference genome annotation. These genomes identified a large, previously unannotated, gene (Efcab3-like) encoding 5,874 amino acids. Mutant Efcab3-like mice display anomalies in multiple brain regions, suggesting a possible role for this gene in the regulation of brain development.
Can Images Tell Stories Too? Visuality and Narrativity in Comics
Comics are an extremely popular genre, perhaps more so than ever. Manga, and graphic novels too, now have their own shelves in every bookshop. But what are they, really: images supplemented with text, or vice versa? Do we read comics or look at them, and why is it worth studying this hybrid genre? Dirk Frank discussed these questions with Bernd Dolle-Weinkauff, literary scholar and comics expert at the Institut für Jugendbuchforschung.
Correction: Handling Permutation in Sequence Comparison: Genome-Wide Enhancer Prediction in Vertebrates by a Novel Non-Linear Alignment Scoring Principle.
[This corrects the article DOI: 10.1371/journal.pone.0141487.]