
23,064 research outputs found

Analysis Of DNA Motifs In The Human Genome

Author: Liang Yupu
Publication venue: CUNY Academic Works
Publication date: 01/02/2014

DNA motifs include repeat elements, promoter elements and gene regulator elements, and play a critical role in the human genome. This thesis describes a genome-wide computational study of two groups of motifs: tandem repeats and core promoter elements. Tandem repeats in DNA sequences are highly relevant to biological phenomena and diagnostic tools. Computational programs that discover tandem repeats generate a huge volume of data, which can be difficult to decipher without further organization. A new method is presented here to organize and rank detected tandem repeats through clustering and classification. Our work presents multiple ways of expressing tandem repeats using the n-gram model with different clustering distance measures. Analysis of the clusters for the tandem repeats in the human genome shows that the method yields a well-defined grouping in which similarity among repeats is apparent. Our new, alignment-free method facilitates the analysis of the myriad tandem repeats in the human genome. We believe that this work will lead to new discoveries on the roles, origins, and significance of tandem repeats. As with tandem repeats, promoter sequences of genes contain binding sites for proteins that play critical roles in mediating expression levels. Promoter region binding proteins and their co-factors influence the timing and context of transcription. Despite the critical regulatory role of these non-coding sequences, computational methods to identify and predict DNA binding sites are extremely limited. The work reported here analyzes the relative occurrence of core promoter elements (CPEs) in and around transcription start sites. We found that, across all the data sets, 49%-63% of upstream regions contain either TATA box or DPE elements. Our results suggest the possibility of predicting transcription start sites by combining CPE signals with other promoter signals such as CpG islands and clusters of specific transcription binding sites.
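
The n-gram representation named in the abstract above can be illustrated with a minimal sketch. It is not the thesis's implementation: the function names, the choice of n = 3, and the cosine distance are assumptions made for illustration, since the abstract states only that n-gram models and several clustering distance measures were used.

```python
from collections import Counter
from math import sqrt


def ngram_profile(seq: str, n: int = 3) -> Counter:
    """Count the overlapping n-grams of a DNA sequence (hypothetical helper)."""
    seq = seq.upper()
    return Counter(seq[i:i + n] for i in range(len(seq) - n + 1))


def cosine_distance(a: Counter, b: Counter) -> float:
    """Return 1 - cosine similarity between two n-gram count profiles."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        # An empty profile shares nothing with any other profile.
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


# Toy repeat units, invented for this example only.
p1 = ngram_profile("ACGACGACG")
p2 = ngram_profile("ACGACGACT")
p3 = ngram_profile("TTTTTTTTT")
d_similar = cosine_distance(p1, p2)
d_different = cosine_distance(p1, p3)
```

Because the profiles are built without aligning the sequences, a pairwise distance matrix computed this way could be passed to any standard clustering routine; which distance measure suits tandem repeats best is the question the thesis itself examines.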

City University of New York

Tandemly repeated DNA families in the mouse genome

Author: AE Vinogradov
AF Smit
AJ Therkelsen
AK Wong
Aleksey S Komissarov
Alexander M Ishov
AR Quinlan
AV Probst
B Vissel
C Alkan
C Camacho
C Lee
C Maison
C Mayer
C Muchardt
C Stocking
CA Morris
D Ames
D Broccoli
D Kipling
E Falconer
EH Ford
Ekaterina V Gavrilova
EM Southern
G Benson
G-F Richard
GE Parris
HJ Cooke
I Alexandrov
I Kobliakova
I Kuznetsova
I Tagarro
IS Kuznetsova
J Giordano
J Jurka
J Lu
J Prosser
JA Blake
JJ Yunis
JM Kidd
JR Gosden
KH Choo
M Alleman
M Guenatri
M Plohl
MA Abdurashitov
MD Pertile
MG Schueler
MJ Higgins
MK Rudd
MM Mahtani
N Kireeva
NI Enukashvily
O Podgornaya
OI Podgornaya
Olga I Podgornaya
P Kalitsis
PA Biro
PE Warburton
RA Martienssen
RH Waterston
RJ Mural
RK Moyzis
S Demin
S Mamaeva
Sergey Ju Demin
SH Namekawa
SIS Grewal
T Beridze
T Hayashi
T Palomeque
T Ushiki
V Paar
W Hörz
X She
Publication venue: BioMed Central
Publication date: 01/10/2011

Background: Functional and morphological studies of tandem DNA repeats, which make up a large portion of most genomes, are limited mainly by the incomplete characterization of these genome elements. We report here a genome-wide analysis of the large tandem repeats (TR) found in the mouse genome assemblies. Results: Using a bioinformatics approach, we identified large TR with array size greater than 3 kb in two mouse whole genome shotgun (WGS) assemblies. Large TR were classified based on sequence similarity, chromosome position, monomer length, array variability, and GC content; we identified four superfamilies, eight families, and 62 subfamilies, including 60 not previously described. 1) The superfamily of centromeric minor satellite is found only in the unassembled part of the reference genome. 2) The pericentromeric major satellite is the most abundant superfamily and reveals high order repeat structure. 3) The transposable element-related superfamily contains two families. 4) The superfamily of heterogeneous tandem repeats includes four families. One family is found only in the WGS, while two families represent tandem repeats with either single or multi locus location. Despite its multi locus location, TRPC-21A-MM is placed into a separate family due to its abundance, strictly pericentromeric location, and resemblance to big human satellites. To confirm our data, we next performed in situ hybridization with three repeats from distinct families. The TRPC-21A-MM probe hybridized to chromosomes 3 and 17, the multi locus TR-22A-MM probe hybridized to ten chromosomes, and the single locus TR-54B-MM probe hybridized with the long loops that emerge from chromosome ends. In addition to those predicted in silico, several extra chromosomes were positive for TR by in situ analysis, potentially indicating inaccurate genome assembly of the heterochromatic genome regions.
Conclusions: Chromosome-specific TR had been predicted for mouse, but no reliable cytogenetic probes were available before. We report a new analysis that identified in silico and confirmed in situ the 3/17 chromosome-specific probe TRPC-21-MM. Thus, the new classification has proven to be a useful tool for continued genome study, while annotated TR can be a valuable source of cytogenetic probes for chromosome recognition.

Crossref

Directory of Open Access Journals

PubMed Central

A new census of protein tandem repeats and their relationship with intrinsic disorder

Author: Anisimova Maria
Delucchi Matteo
Elofsson Arne
Sachenkova Oxana
Schaper Elke
Publication venue: MDPI AG
Publication date: 09/04/2020

Protein tandem repeats (TRs) are often associated with immunity-related functions and diseases. Since the last census of protein TRs in 1999, the number of curated proteins has increased more than seven-fold and new TR prediction methods have been published. TRs appear to be enriched with intrinsic disorder and vice versa. The significance and the biological reasons for this association are unknown. Here, we characterize protein TRs across all kingdoms of life and their overlap with intrinsic disorder in unprecedented detail. Using state-of-the-art prediction methods, we estimate that 50.9% of proteins contain at least one TR, often located at the sequence flanks. A positive linear correlation between the proportion of TRs and the protein length was observed universally, with Eukaryotes in general having more TRs, but when the difference in length is taken into account the difference is quite small. TRs were enriched with disorder-promoting amino acids and were inside intrinsically disordered regions. Many such TRs were homorepeats. Our results support the view that TRs mostly originate by duplication and are involved in essential functions such as transcription processes, structural organization, electron transport and iron-binding. In viruses, TRs are found in proteins essential for virulence.

Multidisciplinary Digital Publishing Institute

ZHAW digitalcollection

Chromosomal-level assembly of the Asian Seabass genome using long sequence reads and multi-layered scaffolding

Author: A Bairoch
A Christoffels
A Gurevich
A Kozomara
A McKenna
A Mitchell
A Morgulis
A Pradhan
A Reiner
A Rodriguez-Mari
A Stamatakis
A Yates
AI Makunin
AJ Enright
AL Price
Alan Christoffels
Aleksey Komissarov
Alexey Tupikin
Amy Hin Yan Tong
Andrey A. Yurchenko
AR Quinlan
B Langmead
B Star
C Berthelot
C Camacho
C Holt
C Wang
Chen-Shan Chin
CS Chin
D Brawand
D Ellinghaus
DA Benson
Darrell Green
DC Hardie
Dean R. Jerry
DH Alexander
Doreen Lau
DR Kelley
DRS-K C. Jerry
E Casacuberta
E. TG Staristina
EW Myers
F Abascal
F Chen
F Yang
FC Jones
FJ Krsticevic
Fritz J. Sedlazeck
G Abrusan
G Benson
G Lin
G Marcais
G Parra
G Tamazian
GH Yue
Gopikrishna Gopalapillai
Gregory W. Vurture
GS Slater
GT Valente
H Li
H Saiga
Heiner Kuhl
HH Kazazian Jr.
I Braasch
Inna S. Kuznetsova
IS Kuznetsova
J Castresana
J Eid
J Huerta-Cepas
J Jurka
J Lin
James P. Drake
JG Ruby
JN Volff
Jolly M. Saju
Jonas Korlach
JS Chew
Junhui Jiang
K Howe
K Katoh
K Prufer
Kathiresan Purushothaman
KD Pruitt
KJ Hoff
KP Koepfli
KW Tzung
Lawrence S. Hon
László Orbán
M Blanchette
M Kanehisa
M Kasahara
M Kolmogorov
M Krzywinski
M Martin
M Schartl
M Tarailo-Graovac
M Tine
MA Larkin
Mario Jonas
Marsel Kabilov
Matthew Boitano
MB Stocks
MG Grabherr
Michael C. Schatz
MJ Chaisson
MR Friedlander
N Siegel
Natascha M. Thevasagayam
NM Thevasagayam
O Jaillon
O Otero
P Cingolani
P Ravi
P Schattner
P Shannon
P Xu
Paul M. Richardson
PE Warburton
Peter Van Heusden
R Kajitani
R Lorenz
R Luo
R Moore
R Pethiyagoda
R Poulter
R She
R Sreenivasan
Ramkumar Lachumanan
RD Ward
Richard Hall
RJ Roberts
S Chen
S Guindon
S Hoegg
S Koren
S Vij
S Zhou
Sai Rama Sridatta Prakki
Sarah Mwangi
SF Altschul
Shubha Vij
Si Lok
Si Yan Ngoh
Siddharth Singh
Simon Moxon
SM Kielbasa
Sridhar Sivasubbu
Stanley Kimbung Mbandi
Stephen J. O'Brien
Stephen W. Turner
T Anantharaman
Tamás Dalmay
Tansyn H. Noble
TD Wu
TF DeLuca
TH O'Hare
TLO Davis
TS Anantharaman
Tyler Garvin
U Consortium
U Grimholt
V Douard
V Ravi
Vinaya Kumar Katneni
Vinod Scaria
Vladimir Trifonov
W Xue
WC Liew
Woei Chang Liew
WS Davidson
X Huang
X Zheng
XG Wang
Xueyan Shen
Y Guiguen
Y Han
Y Hashiguchi
Y Moriya
Y Sato
Z Lai
Ø Hammer
Publication venue: Public Library of Science (PLoS)
Publication date: 01/01/2016

We report here the ~670 Mb genome assembly of the Asian seabass (Lates calcarifer), a tropical marine teleost. We used long-read sequencing augmented by transcriptomics, optical and genetic mapping along with shared synteny from closely related fish species to derive a chromosome-level assembly with a contig N50 size over 1 Mb and scaffold N50 size over 25 Mb that span ~90% of the genome. The population structure of the L. calcarifer species complex was analyzed by re-sequencing 61 individuals representing various regions across the species' native range. SNP analyses identified high levels of genetic diversity and confirmed earlier indications of a population stratification comprising three clades with signs of admixture apparent in the South-East Asian population. The quality of the Asian seabass genome assembly far exceeds that of any other fish species, and will serve as a new standard for fish genomics.
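
For readers unfamiliar with the N50 statistic quoted in the abstract above, the following minimal sketch shows the usual definition: sort the contig (or scaffold) lengths from longest to shortest and report the length at which the running total first reaches half of the assembly size. The function name and the toy lengths are hypothetical and are not taken from the paper.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first length at which the cumulative sum covers half
        # of the total assembly size is the N50.
        if running >= half_total:
            return length
    raise ValueError("n50() requires at least one length")


# Toy lengths for illustration only; real assemblies have many more contigs.
toy_contigs = [2, 3, 4, 5, 6, 7, 8, 9, 10]
toy_n50 = n50(toy_contigs)
```

A higher N50 means that half of the assembled sequence sits in longer pieces, which is why the abstract reports it separately for contigs and for scaffolds.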

Public Library of Science (PLOS)

ResearchOnline@JCU

Crossref

Cold Spring Harbor Laboratory Institutional Repository

Directory of Open Access Journals

ResearchOnline at James Cook University

PubMed Central

Research Repository

Repository of the Academy's Library

University of East Anglia digital repository

NSU Works

MPG.PuRe

Measuring microsatellite conservation in mammalian evolution with a phylogenetic birth-death model.

Author: Buschiazzo Emmanuel
Gemmell Neil
Lennon Dustin
Minin Vladimir N
Sawaya Sterling M
Publication venue: eScholarship, University of California
Publication date: 01/01/2012

Microsatellites make up ∼3% of the human genome, and there is increasing evidence that some microsatellites can have important functions and can be conserved by selection. To investigate this conservation, we performed a genome-wide analysis of human microsatellites and measured their conservation using a binary character birth-death model on a mammalian phylogeny. Using a maximum likelihood method to estimate birth and death rates for different types of microsatellites, we show that the rates at which microsatellites are gained and lost in mammals depend on their sequence composition, length, and position in the genome. Additionally, we use a mixture model to account for unequal death rates among microsatellites across the human genome. We use this model to assign a probability-based conservation score to each microsatellite. We found that microsatellites near the transcription start sites of genes are often highly conserved, and that distance from a microsatellite to the nearest transcription start site is a good predictor of the microsatellite conservation score. An analysis of gene ontology terms for genes that contain microsatellites near their transcription start site reveals that regulatory genes involved in growth and development are highly enriched with conserved microsatellites.

PubMed Central

eScholarship - University of California

Genome maps across 26 human populations reveal population-specific patterns of structural variation.

Author: Cao Han
Chan Ting-Fung
Chow Eugene YC
Chu Catherine
Chung Claire YL
Hastie Alex R
Jin Nana
Kwok Pui-Yan
Lam Ernest T
Leung Alden KY
Levy-Sakin Michal
Li Le
Lin Chin
Ma Walfred
McCaffrey Jennifer
Mostovoy Yulia
Naguib Ahmed
Pastor Steven
Poon Annie
Rajagopalan Ramakrishnan
Sibert Justin
Wang Wei-Ping
Wong Karen HY
Xiao Ming
Yip Kevin Y
Young Eleanor
Publication venue: eScholarship, University of California
Publication date: 01/01/2019

Large structural variants (SVs) in the human genome are difficult to detect and study by conventional sequencing technologies. With long-range genome analysis platforms, such as optical mapping, one can identify large SVs (>2 kb) across the genome in one experiment. Analyzing optical genome maps of 154 individuals from the 26 populations sequenced in the 1000 Genomes Project, we find that phylogenetic population patterns of large SVs are similar to those of single nucleotide variations in 86% of the human genome, while ~2% of the genome has high structural complexity. We are able to characterize SVs in many intractable regions of the genome, including segmental duplications and subtelomeric, pericentromeric, and acrocentric areas. In addition, we discover ~60 Mb of non-redundant genome content missing in the reference genome sequence assembly. Our results highlight the need for a comprehensive set of alternate haplotypes from different populations to represent SV patterns in the genome.

Directory of Open Access Journals

eScholarship - University of California

Detection and diversity of a putative novel heterogeneous polymorphic proline-glycine repeat (Pgr) protein in the footrot pathogen Dichelobacter nodosus

Author: Atiya Ul-Hassan
Buller
Bzymek
Cheetham
Claire L. Russell
Claxton
Depiazzi
Elizabeth M.H. Wellington
Girard
Graham F. Medley
Gravekamp
Heise
Hindmarsh
Jasmeet Kaler
Jelinek
Jordan
Julian I. Rood
Katz
Kay
Kennan
La Fontaine
Ladefoged
Laura E. Green
Leo A. Calvo-Bado
Lukomski
Madoff
Moore
Myers
Nallapareddy
Nicky Buller
O’Dushlaine
Palmer
Paterson
Pringle
Puopolo
Rasmussen
Romero
Rood
Rose Grogono-Thomas
Ruth M. Kennan
Vanhoof
Verstrepen
Williamson
Yong
Zhang
Publication venue: Elsevier BV
Publication date: 03/07/2010

Dichelobacter nodosus, a Gram-negative anaerobic bacterium, is the essential causative agent of footrot in sheep. Currently, depending on the clinical presentation in the field, footrot is described as benign or virulent; D. nodosus strains have also been classified as benign or virulent, but this designation is not always consistent with clinical disease. The aim of this study was to determine the diversity of the pgr gene, which encodes a putative proline-glycine repeat protein (Pgr). The pgr gene was present in all 100 isolates of D. nodosus that were examined and, based on sequence analysis, had two variants, pgrA and pgrB. In pgrA, there were two coding tandem repeat regions, R1 and R2; different strains had variable numbers of repeats within these regions. R1 and R2 were absent from pgrB. Both variants were present in strains from Australia, Sweden and the UK; however, only pgrB was detected in isolates from Western Australia. The pgrA gene was detected in D. nodosus from tissue samples from two flocks in the UK with virulent footrot, and only pgrB was detected in a flock with no virulent or benign footrot for >10 years. Bioinformatic analysis of the putative PgrA protein indicated that it contained a collagen-like cell surface anchor motif. These results suggest that the pgr gene may be a useful molecular marker for epidemiological studies.

Crossref

University of Birmingham Research Portal

HAL Descartes

Warwick Research Archives Portal Repository

Explore Bristol Research