243 research outputs found

Mobile elements: Drivers of genome evolution.

Author: Haig H Kazazian Jr
Publication date: 01/01/2004

Establishing the baseline level of repetitive element expression in the human cortex

Author: B Conrad
B Langmead
BT Wilhelm
C Ladd-Acosta
C Nellaker
D Karolchik
DE Montoya-Durango
E Balada
GF Richard
GJ Faulkner
HH Kazazian
HH Kazazian
JA Armour
Jennifer Parla
JL Weber
JR Landry
JR Landry
JR Landry
M Barak
M Lafon
Melissa Kramer
O Frank
P Jern
PA Callinan
R Cordaux
R Lower
Robert H Yolken
RS Harris
S Mi
S Weis
Sarah J Wheelan
Sarven Sabunciyan
SJ Wheelan
Svitlana Tyekucheva
W Richard McCombie
WA Schulz
YHY Benjamini
Publication venue: BioMed Central
Publication date: 01/01/2011

Background: Although nearly half of the human genome is composed of repetitive sequences, the expression profile of these elements remains largely uncharacterized. Recently developed high-throughput sequencing technologies provide us with a powerful new set of tools to study repeat elements. Hence, we performed whole-transcriptome sequencing to investigate the expression of repetitive elements in human frontal cortex using postmortem tissue obtained from the Stanley Medical Research Institute. Results: We found that a significant number of reads from the human frontal cortex originate from repeat elements. We also noticed that Alu elements were expressed at levels higher than expected by random or background transcription. In contrast, L1 elements were expressed at lower than expected levels. Conclusions: Repetitive elements are expressed abundantly in the human brain. This expression pattern appears to be element-specific and cannot be explained by random or background transcription. These results demonstrate that our knowledge about repetitive elements is far from complete. Further characterization is required to determine the mechanism, the control, and the effects of repeat element expression.

Crossref

Cold Spring Harbor Laboratory Institutional Repository

Springer - Publisher Connector

PubMed Central

Enrichment analysis of Alu elements with different spatial chromatin proximity in the human genome

Author: A Antonaki
A Huda
A Nekrutenko
A Smallwood
AF Smit
AM Deaton
C Esnault
C Feschotte
CB Lowe
CT Ong
D Grover
D Grover
D Schmidt
D Xie
E Berezikov
E Lieberman-Aiden
E Wit de
E Yaffe
EP Nora
ES Lander
ES Lander
F Cui
G Bourque
G Kunarso
G Li
G Li
GA Maston
GJ Faulkner
GN Gallus
H Santos-Rosa
H Santos-Rosa
H Xie
HH Kazazian Jr
IK Jordan
J Banerji
J Dekker
J Dostie
J Jurka
J Jurka
J Ule
JA Yoder
JE Hambor
JF Brookfield
JF Brookfield
JM Chen
JR Dixon
JR Korenberg
K Ahn
K Kaer
KC Wang
L Lin
L Teng
M Hackenberg
M Simonis
M Weber
MA Batzer
MG Kidwell
MH Kagey
MJ Fullwood
MM Suzuki
ND Heintzman
NR Smalheiser
P Jin
P Medstrand
P Polak
R Cordaux
R Eskeland
R Lister
R Schneider
R Sorek
RD Hawkins
S Shen
S Winkler
SD Gillies
SL Oei
T Pastor
T Wicker
V Kapitonov
VJ Lynch
WD Gifford
Y Lu
Y Quentin
Y Quentin
Y Zhang
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2016

Transposable elements (TEs) have for some time no longer been regarded simply as “junk DNA”, owing to continual discoveries of their multifunctional roles in eukaryotic genomes. As one of the most important and abundant TEs that are still active in the human genome, Alu, a SINE family, has demonstrated indispensable regulatory functions at the sequence level, but its spatial roles are still unclear. Technologies based on 3C (chromosome conformation capture) have revealed the three-dimensional structure of chromatin and made it possible to study distal chromatin interactions in the genome. To find the role TEs play in distal regulation in the human genome, we compiled newly released Hi-C data, TE annotation, histone marker annotations, and genome-wide methylation data to perform correlation analysis. We found that the density of Alu elements showed a strong positive correlation with the level of chromatin interactions (hESC: r = 0.9, P < 2.2 × 10^-16; IMR90 fibroblasts: r = 0.94, P < 2.2 × 10^-16) and also a significant positive correlation with some remote functional DNA elements such as enhancers and promoters (Enhancer: hESC: r = 0.997, P = 2.3 × 10^-4; IMR90: r = 0.934, P = 2 × 10^-2; Promoter: hESC: r = 0.995, P = 3.8 × 10^-4; IMR90: r = 0.996, P = 3.2 × 10^-4). Further investigation involving GC content and methylation status showed that the GC content of Alu-covered sequences shared a similar pattern with that of the overall sequence, suggesting that Alu elements also function as a provider of GC nucleotides and CpG sites. In all, our results suggest that Alu elements may act as an alternative parameter to evaluate Hi-C data, which is confirmed by the correlation analysis of Alu elements and histone markers. Moreover, the GC-rich Alu sequence can bring high GC content and methylation flexibility to the regions with more distal chromatin contact, regulating the transcription of tissue-specific genes.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

University of Bedfordshire Repository

Transposon Excision from an Atypical Site: A Mechanism of Evolution of Novel Transposable Elements

Author: A Ray
Animesh Ray
B McClintock
Brian Dilkes
CF Weil
CM Rommens
CMW Fradkin
DB Roth
DR Page
EA Van der Biezen
EE Eichler
ES Coen
F McBlane
F Ros
GB Gloor
H Fu
H Saedler
H Yao
HA Becker
HH Kazazian Jr
HK Dooner
HK Dooner
J Yu
JW Szostak
L Bai
L Scott
L Zhou
LJ Conrad
Lynn F. Sniderhan
Marybeth Langer
NV Fedoroff
SE Fischer
SI Wright
T Golden
TA Rinehart
U Grossniklaus
Ueli Grossniklaus
V Gorbunova
V Sundaresan
WR Engels
X Yan
YG Liu
YL Xiao
YL Xiao
Publication venue: Public Library of Science
Publication date: 01/10/2007

The role of transposable elements in sculpting the genome is well appreciated but remains poorly understood. Some organisms, such as humans, do not have active transposons; however, transposable elements were presumably active in their ancestral genomes. Of specific interest is whether the DNA surrounding the sites of transposon excision becomes recombinogenic, thus bringing about homologous recombination. Previous studies in maize and Drosophila have provided conflicting evidence on whether transposon excision is correlated with homologous recombination. Here we take advantage of an atypical Dissociation (Ds) element, a maize transposon that can be mobilized by the Ac transposase gene in Arabidopsis thaliana, to address questions on the mechanism of Ds excision. This atypical Ds element contains an adjacent 598 base pair (bp) inverted repeat; the element was allowed to excise by the introduction of an unlinked Ac transposase source through mating. Footprints at the excision site suggest a micro-homology mediated non-homologous end joining reminiscent of V(D)J recombination involving the formation of intra-helix 3′ to 5′ trans-esterification as an intermediate, a mechanism consistent with previous observations in maize, Antirrhinum and in certain insects. The proposed mechanism suggests that the broken chromosome at the excision site should not allow recombinational interaction with the homologous chromosome, and that the linked inverted repeat should also be mobilizable. To test the first prediction, we measured recombination of flanking chromosomal arms selected for the excision of Ds. In congruence with the model, Ds excision did not influence crossover recombination. Furthermore, evidence for correlated movement of the adjacent inverted repeat sequence is presented; its origin and movement suggest a novel mechanism for the evolution of repeated elements.
Taken together, these results suggest that the movement of transposable elements themselves may not directly influence linkage. The possibility remains, however, that novel repeated DNA sequences produced as a consequence of transposon movement influence crossover in subsequent generations.

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

Nucleoside Analogue Reverse Transcriptase Inhibitors Differentially Inhibit Human LINE-1 Retrotransposition

Author: AR Muotri
B Barret
B Brouha
C Meischl
CL Cherry
DM Sassaman
Douglas F. Nixon
EM Ostertag
Erick H. Duan
ES Lander
Etienne Joly
GJ Cost
GlaxoSmithKline
GlaxoSmithKline
H Chappuy
H Hohjoh
H Mitsuya
H Xie
HH Kazazian Jr
J Jurka
J Roman-Gomez
JA van den Hurk
Jessica C. Wong
JV Moran
Keith E. Garrison
L Mandelbrot
LM Prescott
Mario A. Ostrowski
N Narita
Q Feng
R Mangiacasale
R. Brad Jones
RA Spence
RE Bawdon
RH Drew
S Basame
S Kubo
Sciences Gilead
SJ Smerdon
SL Bloom
SL Martin
SL Martin
SL Martin
SL Martin
SL Martin
Squibb Bristol-Myers
ST Szak
T Heidmann
TA Patterson
VO Kolosha
Y Miki
Z Musova
Publication venue: Public Library of Science
Publication date: 01/01/2008

Intact LINE-1 elements are the only retrotransposons encoded by the human genome known to be capable of autonomous replication. Numerous cases of genetic disease have been traced to gene disruptions caused by LINE-1 retrotransposition events in germ-line cells. In addition, genomic instability resulting from LINE-1 retrotransposition in somatic cells has been proposed as a contributing factor to oncogenesis and to cancer progression. LINE-1 element activity may also play a role in normal physiology. Using a LINE-1 retrotransposition reporter assay, we evaluated the abilities of several antiretroviral compounds to inhibit LINE-1 retrotransposition. The nucleoside analogue reverse transcriptase inhibitors (nRTIs) stavudine, zidovudine, tenofovir disoproxil fumarate, and lamivudine all inhibited LINE-1 retrotransposition with varying degrees of potency, while the non-nucleoside HIV-1 reverse transcriptase inhibitor nevirapine showed no effect. Our data demonstrate the ability of nRTIs to suppress LINE-1 retrotransposition. This is immediately applicable to studies aimed at examining potential roles for LINE-1 retrotransposition in physiological processes. In addition, our data raise novel safety considerations for nRTIs based on their potential to disrupt physiological processes involving LINE-1 retrotransposition.

Public Library of Science (PLOS)

Saint Mary's College of California

CiteSeerX

Crossref

Directory of Open Access Journals

PubMed Central

Heritable L1 retrotransposition in the mouse primordial germline and early embryo

Author: Bodea Gabriela-Oana
Carreira Patricia E
Ewing Adam D
Faulkner Geoffrey J
Garcia-Perez Jose L
Gerdes Patricia
Gerhardt Daniel J
Jeddeloh Jeffrey A
Jesuadian J Samuel
Kazazian Jr Haig H
Kempen Marie-Jeanne H C
Munoz-Lopez Martin
Richardson Sandra R
Sanchez-Luque Francisco J
Publication venue: Cold Spring Harbor Laboratory
Publication date: 08/05/2017

LINE-1 (L1) retrotransposons are a noted source of genetic diversity and disease in mammals. To expand its genomic footprint, L1 must mobilize in cells that will contribute their genetic material to subsequent generations. Heritable L1 insertions may therefore arise in germ cells and in pluripotent embryonic cells, prior to germline specification, yet the frequency and predominant developmental timing of such events remain unclear. Here, we applied mouse retrotransposon capture sequencing (mRC-seq) and whole-genome sequencing (WGS) to pedigrees of C57BL/6J animals, and uncovered an L1 insertion rate of ≥1 event per eight births. We traced heritable L1 insertions to pluripotent embryonic cells and, strikingly, to early primordial germ cells (PGCs). New L1 insertions bore structural hallmarks of target-site primed reverse transcription (TPRT) and mobilized efficiently in a cultured cell retrotransposition assay. Together, our results highlight the rate and evolutionary impact of heritable L1 retrotransposition and reveal retrotransposition-mediated genomic diversification as a fundamental property of pluripotent embryonic cells in vivo.

Crossref

Edinburgh Research Explorer

Fondo Bibliográfico Digital Institucional

University of Queensland eSpace

RISCI - Repeat Induced Sequence Changes Identifier: a comprehensive, comparative genomics-based, in silico subtractive hybridization pipeline to identify repeat induced sequence changes in closely related genomes

Author: A Buzdin
A Buzdin
A Buzdin
A Hua-Van
AA Buzdin
AF Smit
AM Roy-Engel
AM Smith
BG Thornburg
BJ Vincent
C Feschotte
C Feschotte
CM Bergman
DJ Hedges
EA Bennett
EM Ostertag
EM Ostertag
ES Lander
FM Sheen
G Bejerano
H Quesneville
H Tiedge
HH Kazazian Jr
HH Kazazian Jr
HJ Ho
I Mamedov
IK Jordan
IZ Mamedov
J Jurka
J Piriyapongsa
J Wang
J Wang
J Xing
J Xing
JC Venter
JD Boeke
JL Goodier
JN Volff
JV Moran
K Han
K Han
K Han
L Marino-Ramirez
LE Orgel
LN van de Lagemaat
M Speek
MJ Curcio
MJ Gardner
MK Konkel
N Gilbert
NJ Bowen
NV Tomilin
OK Pickeral
P Medstrand
PA Callinan
PA Callinan
R Cordaux
Rakesh K Mishra
RE Mills
RE Mills
RT Poulter
S Boissinot
S De
S Levy
SF Altschul
SK Sen
SK Sen
SR von
T Hayakawa
T Wang
T Wicker
The Chimpanzee Sequencing and analysis consortium
The Chimpanzee Sequencing and analysis consortium
TJ Goodwin
Vipin Singh
VP Belancio
VV Kapitonov
VV Kapitonov
VV Lunyak
WF Doolittle
WJ Miller
WJ Miller
Z Szabo
Publication venue: BioMed Central
Publication date: 01/12/2010

Background - The availability of multiple whole genome sequences has facilitated in silico identification of fixed and polymorphic transposable elements (TE). Whereas polymorphic loci serve as markers for phylogenetic and forensic analysis, fixed species-specific transposon insertions, when compared to orthologous loci in other closely related species, may give insights into their evolutionary significance. Besides, TE insertions are not isolated events and are frequently associated with subtle sequence changes concurrent with insertion or post insertion. These include duplication of the target site, 3' and 5' flank transduction, deletion of the target locus, 5' truncation or partial deletion and inversion of the transposon, and post-insertion changes like inter- or intra-element recombination, disruption, etc. Although such changes have been studied independently, no automated platform to identify differential transposon insertions and the associated array of sequence changes in genomes of the same or closely related species is available to date. To this end, we have designed RISCI - 'Repeat Induced Sequence Changes Identifier' - a comprehensive, comparative genomics-based, in silico subtractive hybridization pipeline to identify differential transposon insertions and associated sequence changes using specific alignment signatures, which may then be examined for their downstream effects. Results - We showcase the utility of RISCI by comparing full-length and truncated L1HS and AluYa5 retrotransposons in the reference human genome with the chimpanzee genome and the alternate human assemblies (Celera and HuRef). Comparison of the reference human genome with alternate human assemblies using RISCI predicts 14 novel polymorphisms in full-length L1HS, 24 in truncated L1HS and 140 novel polymorphisms in AluYa5 insertions, besides several insertion and post-insertion changes.
We present a comparison with two previous studies to show that RISCI predictions are broadly in agreement with earlier reports. We also demonstrate its versatility by comparing various strains of Mycobacterium tuberculosis for IS 6100 insertion polymorphism. Conclusions - RISCI combines comparative genomics with subtractive hybridization, inferring changes only when exclusive to one of the two genomes being compared. The pipeline is generic and may be applied to most transposons and to any two or more genomes sharing high sequence similarity. Such comparisons, when performed on a larger scale, may pull out a few critical events, which may have seeded the divergence between the two species under comparison.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

The association of Alu repeats with the generation of potential AU-rich elements (ARE) at 3' untranslated regions.

Author: AD Bailely
AM Roy-Engel
AM Zubiaga
AV Hoof
CA Beelman
CY Chen
CY Chen
CY Chen
D Caput
D Muhlrad
D Mukherjee
D Wingett
DD Kim
EM Ostertag
G Brewer
G Pages
GM Wilson
H Ambro
HH Kazazian Jr
IG Yulug
J Rogers
JD Boeke
JS Jacobs Anderson
K Peppel
M Akashi
M Gorospe
M Shen
M Tucker
MA Batzer
OI Sirenko
P Gillis
PL Deininger
R Reeves
RJ Britten
RJ Britten
S Park-Lee
S Vasudevan
T Bakheet
V Kapitonov
Z Wang
Publication venue: BioMed Central
Publication date: 01/12/2004

BACKGROUND: A significant portion (about 8% in the human genome) of mammalian mRNA sequences contains AU (Adenine and Uracil) rich elements or AREs at their 3' untranslated regions (UTR). These mRNA sequences are usually stable. However, an increasing number of observations have been made of unstable species, possibly depending on certain elements such as Alu repeats. ARE motifs are repeats of the tetramer AUUU and a monomer A at the end of the repeats ((AUUU)(n)A). The importance of AREs in biology is that they make certain mRNAs unstable. Proto-oncogenes, such as c-fos, c-myc, and c-jun in humans, are associated with AREs. Although it is known that an increased number of ARE motifs decreases the half-life of mRNA containing ARE repeats, the exact mechanism is as yet unknown. We analyzed the occurrences of AREs and Alu and propose a possible mechanism for how human mRNA could acquire and keep AREs at its 3' UTR originating from Alu repeats. RESULTS: Interspersed in the human genome, Alu repeats occupy 5% of the 3' UTR of mRNA sequences. Alu has poly-adenine (poly-A) regions at its end, which lead to poly-thymine (poly-T) regions at the end of its complementary Alu. It has been found that AREs are present at the poly-T regions. From the 3' UTR of the NCBI's reference mRNA sequence database, we found nearly 40% (38.5%) of ARE (Class I) were associated with Alu sequences (Table 1) within one mismatch allowance in ARE sequences. Other ARE classes had statistically significant associations as well. This is far from a random occurrence given their limited quantity. At each ARE class, random distribution was simulated 1,000 times, and it was shown that there is a special relationship between ARE patterns and the Alu repeats. CONCLUSION: AREs are mediating sequence elements affecting the stabilization or degradation of mRNA at the 3' untranslated regions. However, AREs' mechanism and origins are unknown. We report that Alu is a source of ARE.
We found that half of the longest AREs were derived from the poly-T regions of the complementary Alu.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

ScholarWorks@UNIST

Chromosomal-level assembly of the Asian Seabass genome using long sequence reads and multi-layered scaffolding

Author: A Bairoch
A Christoffels
A Gurevich
A Kozomara
A McKenna
A Mitchell
A Morgulis
A Morgulis
A Pradhan
A Reiner
A Rodriguez-Mari
A Stamatakis
A Yates
AI Makunin
AJ Enright
AL Price
AL Price
Alan Christoffels
Aleksey Komissarov
Alexey Tupikin
Amy Hin Yan Tong
Andrey A. Yurchenko
AR Quinlan
B Langmead
B Star
C Berthelot
C Camacho
C Holt
C Wang
Chen-Shan Chin
CS Chin
D Brawand
D Ellinghaus
DA Benson
Darrell Green
DC Hardie
Dean R. Jerry
DH Alexander
Doreen Lau
DR Kelley
DRS-K C. Jerry
E Casacuberta
E. TG Staristina
EW Myers
F Abascal
F Chen
F Yang
FC Jones
FJ Krsticevic
Fritz J. Sedlazeck
G Abrusan
G Benson
G Lin
G Marcais
G Parra
G Parra
G Tamazian
GH Yue
GH Yue
Gopikrishna Gopalapillai
Gregory W. Vurture
GS Slater
GT Valente
H Li
H Saiga
Heiner Kuhl
HH Kazazian Jr.
I Braasch
Inna S. Kuznetsova
IS Kuznetsova
J Castresana
J Eid
J Huerta-Cepas
J Jurka
J Lin
James P. Drake
JG Ruby
JN Volff
JN Volff
Jolly M. Saju
Jonas Korlach
JS Chew
Junhui Jiang
K Howe
K Katoh
K Prufer
Kathiresan Purushothaman
KD Pruitt
KJ Hoff
KP Koepfli
KW Tzung
Lawrence S. Hon
László Orbán
M Blanchette
M Kanehisa
M Kasahara
M Kolmogorov
M Krzywinski
M Martin
M Schartl
M Tarailo-Graovac
M Tine
MA Larkin
Mario Jonas
Marsel Kabilov
Matthew Boitano
MB Stocks
MG Grabherr
Michael C. Schatz
MJ Chaisson
MR Friedlander
N Siegel
Natascha M. Thevasagayam
NM Thevasagayam
O Jaillon
O Otero
P Cingolani
P Ravi
P Schattner
P Shannon
P Xu
Paul M. Richardson
PE Warburton
Peter Van Heusden
R Kajitani
R Lorenz
R Luo
R Moore
R Pethiyagoda
R Poulter
R She
R Sreenivasan
Ramkumar Lachumanan
RD Ward
RD Ward
Richard Hall
RJ Roberts
S Chen
S Guindon
S Hoegg
S Hoegg
S Koren
S Vij
S Zhou
Sai Rama Sridatta Prakki
Sarah Mwangi
SF Altschul
Shubha Vij
Si Lok
Si Yan Ngoh
Siddharth Singh
Simon Moxon
SM Kielbasa
Sridhar Sivasubbu
Stanley Kimbung Mbandi
Stephen J. O'Brien
Stephen W. Turner
T Anantharaman
Tamás Dalmay
Tansyn H. Noble
TD Wu
TF DeLuca
TH O'Hare
TLO Davis
TS Anantharaman
Tyler Garvin
U Consortium
U Grimholt
V Douard
V Ravi
Vinaya Kumar Katneni
Vinod Scaria
Vladimir Trifonov
W Xue
WC Liew
Woei Chang Liew
WS Davidson
X Huang
X Zheng
XG Wang
XG Wang
Xueyan Shen
Y Guiguen
Y Han
Y Hashiguchi
Y Moriya
Y Sato
Y Sato
Y Sato
Z Lai
Ø Hammer
Publication venue: Public Library of Science (PLoS)
Publication date: 01/01/2016

We report here the ~670 Mb genome assembly of the Asian seabass (Lates calcarifer), a tropical marine teleost. We used long-read sequencing augmented by transcriptomics, optical and genetic mapping, along with shared synteny from closely related fish species, to derive a chromosome-level assembly with a contig N50 size over 1 Mb and scaffold N50 size over 25 Mb that spans ~90% of the genome. The population structure of the L. calcarifer species complex was analyzed by re-sequencing 61 individuals representing various regions across the species' native range. SNP analyses identified high levels of genetic diversity and confirmed earlier indications of a population stratification comprising three clades, with signs of admixture apparent in the South-East Asian population. The quality of the Asian seabass genome assembly far exceeds that of any other fish species, and will serve as a new standard for fish genomics.

Public Library of Science (PLOS)

ResearchOnline@JCU

Crossref

Cold Spring Harbor Laboratory Institutional Repository

Directory of Open Access Journals

ResearchOnline at James Cook University

PubMed Central

Research Repository

Repository of the Academy's Library

University of East Anglia digital repository

NSU Works

MPG.PuRe

Repetitive Elements May Comprise Over Two-Thirds of the Human Genome

Author: A Nekrutenko
A. P. Jason de Koning
AFA Smit
AL Price
AR Quinlan
C Feschotte
DA Ray
David D. Pollock
E Lerat
EE Eichler
EF Kirkness
G Achaz
G Benson
G Lunter
Gregory P. Copenhaver
H Quesneville
HH Kazazian Jr
J Brosius
J Jurka
J Jurka
J Jurka
JS Mattick
JU Pontius
K Lindblad-Toh
M Pheasant
MA Batzer
Mark A. Batzer
MC Frith
R Li
RC Edgar
RM Kuhn
S Karlin
S Kurtz
SF Altschul
TA Castoe
Todd A. Castoe
TS Mikkelsen
W Gu
Wanjun Gu
WC Warren
Z Bao
Publication venue: Public Library of Science
Publication date: 01/12/2011

Transposable elements (TEs) are conventionally identified in eukaryotic genomes by alignment to consensus element sequences. Using this approach, about half of the human genome has been previously identified as TEs and low-complexity repeats. We recently developed a highly sensitive alternative de novo strategy, P-clouds, that instead searches for clusters of high-abundance oligonucleotides that are related in sequence space (oligo “clouds”). We show here that P-clouds predicts >840 Mbp of additional repetitive sequences in the human genome, thus suggesting that 66%–69% of the human genome is repetitive or repeat-derived. To investigate this remarkable difference, we conducted detailed analyses of the ability of both P-clouds and a commonly used conventional approach, RepeatMasker (RM), to detect different-sized fragments of the highly abundant human Alu and MIR SINEs. RM can have surprisingly low sensitivity for even moderately long fragments, in contrast to P-clouds, which has good sensitivity down to small fragment sizes (∼25 bp). Although short fragments have a high intrinsic probability of being false positives, we performed a probabilistic annotation that reflects this fact. We further developed “element-specific” P-clouds (ESPs) to identify novel Alu and MIR SINE elements, and using them we identified ∼100 Mb of previously unannotated human elements. ESP estimates of new MIR sequences are in good agreement with RM-based predictions of the amount that RM missed. These results highlight the need for combined, probabilistic genome annotation approaches and suggest that the human genome consists of substantially more repetitive sequence than previously believed.

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

LSU Scholarly Repository (Louisiana State Univ.)