59 research outputs found

GC skew is a conserved property of unmethylated CpG island promoters across vertebrates.

Author: Chédin Frédéric
Hartono Stella R
Korf Ian F
Publication venue: eScholarship, University of California
Publication date: 07/08/2015

GC skew is a measure of the strand asymmetry in the distribution of guanines and cytosines. GC skew favors R-loops, a type of three-stranded nucleic acid structure that forms upon annealing of an RNA strand to one strand of DNA, creating a persistent RNA:DNA hybrid. Previous studies show that GC skew is prevalent at thousands of human CpG island (CGI) promoters and transcription termination regions, which correspond to hotspots of R-loop formation. Here, we investigated the conservation of GC skew patterns in 60 sequenced chordate genomes. We report that GC skew is a conserved sequence characteristic of the CGI promoter class in vertebrates. Furthermore, we reveal that promoter GC skew peaks at the exon 1/intron 1 junction and that it is highly correlated with gene age and CGI promoter strength. Our data also show that GC skew is predictive of unmethylated CGI promoters in a range of vertebrate species and that it imparts significant DNA hypomethylation for promoters with intermediate CpG densities. Finally, we observed that terminal GC skew is conserved for a subset of vertebrate genes that tend to be located significantly closer to their downstream neighbors, consistent with a role for R-loop formation in transcription termination.

Crossref

PubMed Central

eScholarship - University of California

Comparative Analysis of Tandem Repeats from Hundreds of Species Reveals Unique Insights into Centromere Evolution

Author: Bradnam Keith R.
Chan Simon W. -L.
DeRisi Joseph L.
Eid John
Garcia José Fernando
Korf Ian F.
May Michael R.
Melters Daniël P.
Peluso Paul
Rank David
Ross-Ibarra Jeffrey
Ruby J. Graham
Sebra Robert
Smith Timothy
Telis Natalie
Tobias Christian
Young Hugh A.
Publication venue: Springer Science and Business Media LLC
Publication date: 22/09/2012

Centromeres are essential for chromosome segregation, yet their DNA sequences evolve rapidly. In most animals and plants that have been studied, centromeres contain megabase-scale arrays of tandem repeats. Despite their importance, very little is known about the degree to which centromere tandem repeats share common properties between different species across different phyla. We used bioinformatic methods to identify high-copy tandem repeats from 282 species using publicly available genomic sequence and our own data. The assumption that the most abundant tandem repeat is the centromere DNA was true for most species whose centromeres have been previously characterized, suggesting this is a general property of genomes. Our methods are compatible with all current sequencing technologies. Long Pacific Biosciences sequence reads allowed us to find tandem repeat monomers up to 1,419 bp. High-copy centromere tandem repeats were found in almost all animal and plant genomes, but repeat monomers were highly variable in sequence composition and in length. Furthermore, phylogenetic analysis of sequence homology showed little evidence of sequence conservation beyond ~50 million years of divergence. We find that despite an overall lack of sequence conservation, centromere tandem repeats from diverse species showed similar modes of evolution, including the appearance of higher order repeat structures in which several polymorphic monomers make up a larger repeating unit. While centromere position in most eukaryotes is epigenetically determined, our results indicate that tandem repeats are highly prevalent at centromeres of both animals and plants. This suggests a functional role for such repeats, perhaps in promoting concerted evolution of centromere DNA across chromosomes

arXiv.org e-Print Archive

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Crossref

Springer - Publisher Connector

PubMed Central

eScholarship - University of California

Sequencing the transcriptome of milk production: milk trumps mammary tissue

Author: Barry Peter A
German J Bruce
Hartono Stella R
Hinde Katie
Hovey Russell C
Islas-Trejo Alma
Korf Ian
Lee Joyce WS
Lemay Danielle G
Medrano Juan F
Schmidt Kimberli A
Silva Pedro Ivo
Smilowitz Jennifer T
Ventimiglia Frank
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2013

Background: Studies of normal human mammary gland development and function have mostly relied on cell culture, limited surgical specimens, and rodent models. Although RNA extracted from human milk has been used to assay the mammary transcriptome non-invasively, this assay has not been adequately validated in primates. Thus, the objectives of the current study were to assess the suitability of lactating rhesus macaques as a model for lactating humans and to determine whether RNA extracted from milk fractions is representative of RNA extracted from mammary tissue for the purpose of studying the transcriptome of milk-producing cells. Results: We confirmed that macaque milk contains cytoplasmic crescents and that ample high-quality RNA can be obtained for sequencing. Using RNA sequencing, RNA extracted from macaque milk fat and milk cell fractions more accurately represented RNA from mammary epithelial cells (cells that produce milk) than did RNA from whole mammary tissue. Mammary epithelium-specific transcripts were more abundant in macaque milk fat, whereas adipose or stroma-specific transcripts were more abundant in mammary tissue. Functional analyses confirmed the validity of milk as a source of RNA from milk-producing mammary epithelial cells. Conclusions: RNA extracted from the milk fat during lactation accurately portrayed the RNA profile of milk-producing mammary epithelial cells in a non-human primate. However, this sample type clearly requires protocols that minimize RNA degradation. Overall, we validated the use of RNA extracted from human and macaque milk and provided evidence to support the use of lactating macaques as a model for human lactation

Crossref

Harvard University - DASH

PubMed Central

eScholarship - University of California

Sorghum Genome Sequencing by Methylation Filtration

Sorghum bicolor is a close relative of maize and is a staple crop in Africa and much of the developing world because of its superior tolerance of arid growth conditions. We have generated sequence from the hypomethylated portion of the sorghum genome by applying methylation filtration (MF) technology. The evidence suggests that 96% of the genes have been sequence tagged, with an average coverage of 65% across their length. Remarkably, this level of gene discovery was accomplished after generating a raw coverage of less than 300 megabases of the 735-megabase genome. MF preferentially captures exons and introns, promoters, microRNAs, and simple sequence repeats, and minimizes interspersed repeats, thus providing a robust view of the functional parts of the genome. The sorghum MF sequence set is beneficial to research on sorghum and is also a powerful resource for comparative genomics among the grasses and across the entire plant kingdom. Thousands of hypothetical gene predictions in rice and Arabidopsis are supported by the sorghum dataset, and genomic similarities highlight evolutionarily conserved regions that will lead to a better understanding of rice and Arabidopsis

CiteSeerX

Public Library of Science (PLOS)

Crossref

Cold Spring Harbor Laboratory Institutional Repository

Directory of Open Access Journals

PubMed Central

eScholarship - University of California

SHAREOK repository

The Francis Crick Institute

Characterization of the Contradictory Chromatin Signatures at the 3′ Exons of Zinc Finger Genes

Author: A Rabinovich
A Verdel
AH Lund
Anton Wutz
AP Bracken
AV Ivanov
BE Bernstein
CR Vakoc
D Crews
DR Grayson
Erica Sanchez
F Miao
G Robertson
GJ Dennis
H O'Geen
Henriette O'Geen
Ian Korf
JH Martens
JK Wiencke
JM Lin
Joseph F. Costello
JW Edmunds
K Iwamoto
Kimberly R. Blahnik
KR Blahnik
L Ellis
Lei Dou
Lorigail Echipare
M Hampsey
M Zaratiegui
Marco A. Marra
Martin Hirst
MM McCarthy
PA Jones
Peggy J. Farnham
RA Harris
S Frietze
SL Squazzo
SP Sripathy
Sushma Iyengar
TI Lee
TS Mikkelsen
Y Jiang
Yongjun Zhao
Publication venue: Public Library of Science
Publication date: 01/02/2011

The H3K9me3 histone modification is often found at promoter regions, where it functions to repress transcription. However, we have previously shown that 3′ exons of zinc finger genes (ZNFs) are marked by high levels of H3K9me3. We have now further investigated this unusual location for H3K9me3 in ZNF genes. Neither bioinformatic nor experimental approaches support the hypothesis that the 3′ exons of ZNFs are promoters. We further characterized the histone modifications at the 3′ ZNF exons and found that these regions also contain H3K36me3, a mark of transcriptional elongation. A genome-wide analysis of ChIP-seq data revealed that ZNFs constitute the majority of genes that have high levels of both H3K9me3 and H3K36me3. These results suggested the possibility that the ZNF genes may be imprinted, with one allele transcribed and one allele repressed. To test the hypothesis that the contradictory modifications are due to imprinting, we used a SNP analysis of RNA-seq data to demonstrate that both alleles of certain ZNF genes having H3K9me3 and H3K36me3 are transcribed. We next analyzed isolated ZNF 3′ exons using stably integrated episomes. We found that although the H3K36me3 mark was lost when the 3′ ZNF exon was removed from its natural genomic location, the isolated ZNF 3′ exons retained the H3K9me3 mark. Thus, the H3K9me3 mark at ZNF 3′ exons does not impede transcription and it is regulated independently of the H3K36me3 mark. Finally, we demonstrate a strong relationship between the number of tandemly repeated domains in the 3′ exons and the H3K9me3 mark. We suggest that the H3K9me3 at ZNF 3′ exons may function to protect the genome from inappropriate recombination rather than to regulate transcription

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

eScholarship - University of California

Exploring concordance and discordance for return of incidental findings from clinical sequencing

Author: Berg Jonathan S.
Berry Gerard T.
Biesecker Leslie G.
Dimmock David P.
Evans James P.
Green Robert C.
Grody Wayne W.
Hegde Madhuri R.
Jacob Howard J.
Kalia Sarah
Korf Bruce R.
Krantz Ian
McGuire Amy L.
Miller David T.
Murray Michael F.
Nussbaum Robert L.
Plon Sharon E.
Rehm Heidi L.
Publication date: 01/01/2012

To explore specific conditions and types of genetic variants that specialists in genetics recommend should be returned as incidental findings in clinical sequencing.

PubMed Central

Carolina Digital Repository

Guiding the Design of Synthetic DNA-Binding Molecules with Massively Parallel Sequencing

Genomic applications of DNA-binding molecules require an unbiased knowledge of their high affinity sites. We report the high-throughput analysis of pyrrole-imidazole polyamide DNA-binding specificity in a 10^12-member DNA sequence library using affinity purification coupled with massively parallel sequencing. We find that even within this broad context, the canonical pairing rules are remarkably predictive of polyamide DNA-binding specificity. However, this approach also allows identification of unanticipated high affinity DNA-binding sites in the reverse orientation for polyamides containing β/Im pairs. These insights allow the redesign of hairpin polyamides with different turn units capable of distinguishing 5′-WCGCGW-3′ from 5′-WGCGCW-3′. Overall, this study displays the power of high-throughput methods to aid the optimal targeting of sequence-specific minor groove binding molecules, an essential underpinning for biological and nanotechnological applications.

Crossref

PubMed Central

eScholarship - University of California

Caltech Authors

Assemblathon 2: evaluating de novo methods of genome assembly in three vertebrate species

Author: Élénie Godzaridis
Adam M. Phillippy
Alexey Sergushichev
Anton Alexandrov
Benedict Paten
Binghang Liu
Bruno M. Vieira
Carson Qu
Daniel S. Rokhsar
Dariusz Przybylski
David B. Jaffe
David C. Schwartz
David Haussler
Cristian Del Fabbro
Delphine Naquin
Dent Earl
Dominique Lavenier
Erich D. Jarvis
Fedor Tsarev
Filipe J. Ribeiro
François Laviolette
Francisco Pina Martins
Ganeshkumar Ganapathy
Giles Hall
Guillaume Chapuis
Guojie Zhang
Hamidreza Chitsaz
Hao Zhang
Henry Song
Huaiyang Jiang
Iain Maccallum
Ian F. Korf
Inanç Birol
Isaac Y. Ho
J. Ruby
Jacob O. Kitzman
Jacques Corbeil
James R. Knight
Jared T. Simpson
Jarrod A. Chapman
Jason Howard
Jay Shendure
Jianying Yuan
Joseph B. Hiatt
Joseph N. Fass
Jun Wang
Keith R. Bradnam
Kim C. Worley
Martin Hunt
Matthew D. Macmanes
Matthias Haimel
Michael C. Schatz
Michael Bechner
Michael Place
Nicolas Maillet
Nuno A. Fonseca
Octávio S. Paulo
Paul J. Kersey
Paul Baranay
Pavel Fedotov
Rayan Chikhi
Richard A. Gibbs
Richard Durbin
Ruibang Luo
Sébastien Boisvert
Sante Gnerre
Simone Scalabrin
Scott Emrich
Sergey Kazakov
Sergey Koren
Sergey Melnikov
Shaun D. Jackman
Shiguo Zhou
Shuangye Yin
Siu Ming Yiu
Stephen Richards
Steve Goldstein
T. Docking
Tak Wah Lam
Ted Sharpe
Thomas D. Otto
Timothy I. Shaw
Francesco Vezzi
Riccardo Vicedomini
Wen Chi Chou
Xiang Qin
Yingrui Li
Yue Liu
Yujian Shi
Zemin Ning
Zhenyu Li
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2013

Background: The process of generating raw genome sequence data continues to become cheaper, faster, and more accurate. However, assembly of such data into high-quality, finished genome sequences remains challenging. Many genome assembly tools are available, but they differ greatly in terms of their performance (speed, scalability, hardware requirements, acceptance of newer read technologies) and in their final output (composition of assembled sequence). More importantly, it remains largely unclear how to best assess the quality of assembled genome sequences. The Assemblathon competitions are intended to assess current state-of-the-art methods in genome assembly. Results: In Assemblathon 2, we provided a variety of sequence data to be assembled for three vertebrate species (a bird, a fish, and a snake). This resulted in a total of 43 submitted assemblies from 21 participating teams. We evaluated these assemblies using a combination of optical map data, Fosmid sequences, and several statistical methods. From over 100 different metrics, we chose ten key measures by which to assess the overall quality of the assemblies. Conclusions: Many current genome assemblers produced useful assemblies, containing a significant representation of their genes and overall genome structure. However, the high degree of variability between the entries suggests that there is still much room for improvement in the field of genome assembly and that approaches which work well in assembling the genome of one species may not necessarily work well for another.

Archivio istituzionale della ricerca - Università degli Studi di Udine