Analysis of high-identity segmental duplications in the grapevine genome

Background: Segmental duplications (SDs) are blocks of genomic sequence of 1-200 kb that map to different loci in a genome and share a sequence identity > 90%. SDs show at the sequence level the same characteristics as other regions of the human genome: they contain both high-copy repeats and gene sequences. SDs play an important role in genome plasticity by creating new genes and modeling genome structure. Although data is plentiful for mammals, not much was known about the representation of SDs in plant genomes. In this regard, we performed a genome-wide analysis of high-identity SDs on the sequenced grapevine (Vitis vinifera) genome (PN40024). Results: We demonstrate that recent SDs (> 94% identity and >= 10 kb in size) are a relevant component of the grapevine genome (85 Mb, 17% of the genome sequence). We detected mitochondrial and plastid DNA and genes (10% of gene annotation) in segmentally duplicated regions of the nuclear genome. In particular, the nine highest copy number genes have a copy in either or both organelle genomes. Further, we showed that several duplicated genes take part in the biosynthesis of compounds involved in plant response to environmental stress. Conclusions: These data show the great influence of SDs and organelle DNA transfers in modeling the Vitis vinifera nuclear DNA structure as well as the impact of SDs in contributing to the adaptive capacity of grapevine and the nutritional content of grape products through genome variation. This study represents a step forward in the full characterization of duplicated genes important for grapevine cultural needs and human health.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Archivio istituzionale della ricerca - Università di Bari

Predicting genome-wide redundancy using machine learning

Author: Bandyopadhyay Sunayan
Birnbaum Kenneth D
Chen Huang-Wen
Shasha Dennis E
Publication venue: BioMed Central
Publication date: 01/01/2010

Background: Gene duplication can lead to genetic redundancy, which masks the function of mutated genes in genetic analyses. Methods to increase sensitivity in identifying genetic redundancy can improve the efficiency of reverse genetics and lend insights into the evolutionary outcomes of gene duplication. Machine learning techniques are well suited to classifying gene family members into redundant and non-redundant gene pairs in model species where sufficient genetic and genomic data is available, such as Arabidopsis thaliana, the test case used here. Results: Machine learning techniques that combine multiple attributes led to a dramatic improvement in predicting genetic redundancy over single trait classifiers alone, such as BLAST E-values or expression correlation. In withholding analysis, one of the methods used here, Support Vector Machines, was two-fold more precise than single attribute classifiers, reaching a level where the majority of redundant calls were correctly labeled. Using this higher confidence in identifying redundancy, machine learning predicts that about half of all genes in Arabidopsis showed the signature of predicted redundancy with at least one but typically less than three other family members. Interestingly, a large proportion of predicted redundant gene pairs were relatively old duplications (e.g., Ks > 1), suggesting that redundancy is stable over long evolutionary periods. Conclusions: Machine learning predicts that most genes will have a functionally redundant paralog but will exhibit redundancy with relatively few genes within a family. The predictions and gene pair attributes for Arabidopsis provide a new resource for research in genetics and genome evolution. These techniques can now be applied to other organisms.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Distinct Gene Number-Genome Size Relationships for Eukaryotes and Non-Eukaryotes: Gene Content Estimation for Dinoflagellate Genomes

Author: AC Ivens
AG Hinnebusch
AR Loeblich III
CH Slamovits
CH Slamovits
D Lee
DC Sigee
DL Spector
DM Anderson
DW Coats
FM Van Dolah
H Moreau
H Zhang
H Zhang
H Zhang
J Archibald
J Lukes
J Ramsey
J Reichman
JD Hackett
JD Hackett
JM Aury
JR Allen
KH Wolfe
KT Konstantinidis
L Pfiester
L Xu
LY Liu
M Berriman
M Lynch
M Lynch
M McEwan
MJW Veldhuis
NJ Patron
NJ Patron
O Holm-Hansen
P Salois
PJ Rizzo
PJ Rizzo
QH Le
RE Steel
RJ Blank
Rosemary Jeanne Redfield
S Lin
S Lin
Senjie Lin
SR Santos
T Bertomeu
TC LaJeunesse
TM Roberts
TR Bachvaroff
TR Gregory
TR Gregory
TR Gregory
Y Bhaud
Y Bouligand
YH Chan
Yubo Hou
Publication venue: Public Library of Science
Publication date: 01/09/2009

The ability to predict gene content is highly desirable for characterization of not-yet sequenced genomes like those of dinoflagellates. Using data from completely sequenced and annotated genomes from phylogenetically diverse lineages, we investigated the relationship between gene content and genome size using regression analyses. Distinct relationships between log10-transformed protein-coding gene number (Y′) versus log10-transformed genome size (X′, genome size in kbp) were found for eukaryotes and non-eukaryotes. Eukaryotes best fit a logarithmic model, Y′ = ln(−46.200 + 22.678X′), whereas non-eukaryotes a linear model, Y′ = 0.045 + 0.977X′, both with high significance (p < 0.001, R² > 0.91). Total gene number shows similar trends in both groups to their respective protein coding regressions. The distinct correlations reflect lower and decreasing gene-coding percentages as genome size increases in eukaryotes (82%–1%) compared to higher and relatively stable percentages in prokaryotes and viruses (97%–47%). The eukaryotic regression models project that the smallest dinoflagellate genome (3×10⁶ kbp) contains 38,188 protein-coding (40,086 total) genes and the largest (245×10⁶ kbp) 87,688 protein-coding (92,013 total) genes, corresponding to 1.8% and 0.05% gene-coding percentages. These estimates do not likely represent extraordinarily high functional diversity of the encoded proteome but rather highly redundant genomes as evidenced by high gene copy numbers documented for various dinoflagellate species.
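
The two regression models quoted in this abstract can be evaluated directly. The sketch below is a minimal illustration, not the authors' code: it assumes that "ln" denotes the natural logarithm, that X′ is log10 of genome size in kbp, and that the predicted gene count is recovered as 10 raised to Y′. The function names are hypothetical. Because the coefficients are printed in rounded form, the outputs land near the abstract's projections rather than exactly on them.

```python
import math

def eukaryote_protein_genes(genome_kbp: float) -> float:
    """Logarithmic model from the abstract: Y' = ln(-46.200 + 22.678 X')."""
    x = math.log10(genome_kbp)           # X' = log10(genome size in kbp)
    inner = -46.200 + 22.678 * x
    if inner <= 0:
        # The logarithm is undefined for very small genomes under this model.
        raise ValueError("genome size is outside the model's domain")
    y = math.log(inner)                  # natural log (assumed reading of "ln")
    return 10 ** y                       # undo the log10 transform of gene number

def non_eukaryote_protein_genes(genome_kbp: float) -> float:
    """Linear model from the abstract: Y' = 0.045 + 0.977 X'."""
    x = math.log10(genome_kbp)
    y = 0.045 + 0.977 * x
    return 10 ** y

if __name__ == "__main__":
    # Smallest and largest dinoflagellate genome sizes quoted in the abstract.
    for size_kbp in (3e6, 245e6):
        genes = eukaryote_protein_genes(size_kbp)
        print(f"{size_kbp:.3g} kbp -> about {genes:,.0f} protein-coding genes")
```

The domain check matters because the logarithmic model only accepts genomes with X′ above roughly 2.04 (about 109 kbp); below that the argument of the logarithm is not positive.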

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

Genomic copy number variation in Mus musculus.

Author: A Agam
A Baross
A Buj-Bello
A Fritsch
A Koike
A Moisan
A Smahi
A Valsesia
AD Ewing
AHMM Huq
Andrea E Wishart
AR Quinlan
AR Quinlan
B Franco
B Jiang
B Yalcin
B Yalcin
C Faust
C Nellåker
C Standfuss
CA Castellani
CM Egan
CN Henrichsen
D Atasoy
D Brown
D Karolchik
D Ng
D O’Carroll
D Pasini
D Restrepo
D Wang
DE Watkins-Chow
DF Conrad
DF Conrad
DM Altshuler
DW Huang
DW Huang
E Axelsson
E Ferretti
EH Wynn
EL Hassemer
F Staubach
G Cutler
G Cutler
G Faury
GH Perry
H Yang
H Yang
I Rubio-Aliaga
J Berglund
J Bryk
J Gatesy
J Sebat
J Wang
J Xing
JE Eckel-Passow
JE Levy
JL Butler
JL Peirce
JP Didion
JR Lupski
K Wang
K Wong
Kathleen A Hill
KW Seo
L Dal Zotto
L Jiang
L Longo
L Rachdi
L Winchester
LM Boyden
M Bastian
M Brosch
M Elizabeth O Locke
M Hall
M Hara-Chikuma
M Ito
M Ito
M Sugimoto
M Tudor
Maja Milojevic
Mark Daley
MB Shannon
MI Ferrante
MJ McConnell
MM Simon
N Braverman
Nisha Patel
O Gascuel
O Nakajima
P Cahan
P Krebs
P Liu
PP Rocha
R Desper
R Redon
R Rexhepaj
RC Iskow
RJ Kinsella
RS Sellers
RZ Chen
S Biechele
S Kim
S Mohan
SA Yukl
SJ Diskin
Susan T Eitutis
T Westerling
TA Graubert
TM Keane
TM Keane
W Li
W Liu
W Liu
X Miró
X She
XY Liu
Y Feng
Y Hou
Y Ueda
Publication venue: Scholarship@Western
Publication date: 01/01/2015

BACKGROUND: Copy number variation is an important dimension of genetic diversity and has implications in development and disease. As an important model organism, the mouse is a prime candidate for copy number variant (CNV) characterization, but this has yet to be completed for a large sample size. Here we report CNV analysis of publicly available, high-density microarray data files for 351 mouse tail samples, including 290 mice that had not been characterized for CNVs previously. RESULTS: We found 9634 putative autosomal CNVs across the samples affecting 6.87% of the mouse reference genome. We find significant differences in the degree of CNV uniqueness (single sample occurrence) and the nature of CNV-gene overlap between wild-caught mice and classical laboratory strains. CNV-gene overlap was associated with lipid metabolism, pheromone response and olfaction compared to immunity, carbohydrate metabolism and amino-acid metabolism for wild-caught mice and classical laboratory strains, respectively. Using two subspecies of wild-caught Mus musculus, we identified putative CNVs unique to those subspecies and show this diversity is better captured by wild-derived laboratory strains than by the classical laboratory strains. A total of 9 genic copy number variable regions (CNVRs) were selected for experimental confirmation by droplet digital PCR (ddPCR). CONCLUSION: The analysis we present is a comprehensive, genome-wide analysis of CNVs in Mus musculus, which increases the number of known variants in the species and will accelerate the identification of novel variants in future studies

Scholarship@Western

Crossref

Springer - Publisher Connector

PubMed Central

Genomic comparisons and genome architecture of divergent Trypanosoma species

Author: Bradwell Katie
Publication venue: VCU Scholars Compass
Publication date: 01/01/2016

Virulent Trypanosoma cruzi, and the non-pathogenic Trypanosoma conorhini and Trypanosoma rangeli are protozoan parasites with divergent lifestyles. T. cruzi and T. rangeli are endemic to Latin America, whereas T. conorhini is tropicopolitan. Reduviid bug vectors spread these parasites to mammalian hosts, within which T. rangeli and T. conorhini replicate extracellularly, while T. cruzi has intracellular stages. Firstly, this work compares the genomes of these parasites to understand their differing phenotypes. Secondly, genome architecture of T. cruzi is examined to address the effect of a complex hybridization history, polycistronic transcription, and genome plasticity on this organism, and study its highly repetitive nature and cryptic genome organization. Whole genome sequencing, assembly and comparison, as well as chromosome-scale genome mapping were employed. This study presents the first comprehensive whole-genome maps of Trypanosoma, and the first T. conorhini strain ever sequenced. Original contributions to knowledge include the ~21-25 Mbp assembled genomes of the less virulent T. cruzi G, T. rangeli AM80, and T. conorhini 025E, containing ~10,000 to 13,000 genes, and the ~36 Mbp genome assembly of highly virulent T. cruzi CL with ~24,000 genes. The T. cruzi strains exhibited ~74% identity to proteins of T. rangeli or T. conorhini. T. rangeli and T. conorhini displayed greater complex carbohydrate metabolic capabilities, and contained fewer retrotransposons and multigene family copies, e.g. mucins, DGF-1, and MASP, compared to T. cruzi. Although all four genomes appear highly syntenic, T. rangeli and T. conorhini exhibited greater karyotype conservation. T. cruzi genome architecture studies revealed 66 maps varying from 0.13 to 2.4 Mbp. At least 2.6% of the genome comprises highly repetitive regions, and 7.4% exhibits repetitive regions barren of labels. The 66 putative chromosomes identified are likely diploid.
However, 20 of these maps contained regions of up to 1.25 Mbp of homology to at least one other map, suggestive of widespread segmental duplication or an ancient hybridization event that resulted in a genome with significant redundancy. Assembled genomes of these parasites closely reflect their phylogenetic relationships and give a greater context for understanding their divergent lifestyles. Genome mapping provides insight on the genomic evolution of these parasites

VCU Scholars Compass

Gene expansion shapes genome architecture in the human pathogen Lichtheimia corymbifera: an evolutionary genomics analysis in the ancient terrestrial mucorales (Mucoromycotina)

Author: Axel A Brakhage
Ekaterina Shelest
Fabian Horn
Ilse D Jacobsen
Jörg Linde
Kerstin Kaerger
Kerstin Voigt
Konstantin Riege
Manja Marz
Marina Marcet-Houben
Michael Sammeth
Minou Nowrousian
Sascha Winter
Sebastian Böcker
Stefanie Wehner
Toni Gabaldón
Vito Valiante
Volker U Schwartze
Publication venue: Public Library of Science (PLoS)
Publication date: 14/08/2014

Lichtheimia species are the second most important cause of mucormycosis in Europe. To provide broader insights into the molecular basis of the pathogenicity-associated traits of the basal Mucorales, we report the full genome sequence of L. corymbifera and compare it to the genome of Rhizopus oryzae, the most common cause of mucormycosis worldwide. The genome assembly encompasses 33.6 Mb and 12,379 protein-coding genes. This study reveals four major differences between the L. corymbifera genome and that of R. oryzae: (i) the presence of a highly elevated number of gene duplications which, unlike in R. oryzae, are not due to whole genome duplication (WGD), (ii) despite the relatively high incidence of introns, alternative splicing (AS) is not frequently observed for the generation of paralogs and in response to stress, (iii) the content of repetitive elements is strikingly low (<5%), (iv) L. corymbifera is typically haploid. Novel virulence factors were identified which may be involved in the regulation of the adaptation to iron limitation, e.g. LCor01340.1 encoding a putative siderophore transporter and LCor00410.1 involved in siderophore metabolism. Genes encoding the transcription factors LCor08192.1 and LCor01236.1, which are similar to GATA type regulators and to calcineurin-regulated CRZ1, respectively, indicate an involvement of the calcineurin pathway in the adaptation to iron limitation. Genes encoding MADS-box transcription factors are elevated up to 11 copies compared to the 1–4 copies usually found in other fungi. Further findings are: (i) a lower content of tRNAs, but unique codons in L. corymbifera, (ii) over 25% of the proteins are apparently specific to L. corymbifera, (iii) L. corymbifera contains only 2/3 of the proteases (known to be essential virulence factors) in comparison to R. oryzae. The number of secreted proteases, however, is roughly twice as high as in R. oryzae.

Stirling Online Research Repository (RIOXX)

Directory of Open Access Journals

PubMed Central

Portsmouth University Research Portal (Pure)

Stirling Online Research Repository

FigShare

A Genome-Wide Characterization of MicroRNA Genes in Maize

Author: A Adai
A Boualem
A Carra
A Li
AA Millar
AC Mallory
AH Paterson
AJ Vilella
Apurva Narechania
B Zhao
BA Chapman
BH Zhang
BH Zhang
BJ Haas
BJ Reinhart
BS Gaut
C Llave
C Lu
C Maher
C Maher
C Pina
C Seoighe
C Soderlund
CA Kidner
CA Kidner
CA Whitelaw
CG Tian
Christopher A. Maher
D Ding
D Swarbreck
DH Chitwood
Doreen Ware
DR Bentley
E Bonnet
E Mica
E van der Knaap
F Vazquez
FL Xie
G Blanc
G Chuck
G Chuck
G Chuck
GE Crooks
H Fujii
H Sanchez-Villeda
H Vaucheret
IM Ehrenreich
J de Meaux
J Lai
J Yu
JE Bowers
Jer-Ming Chia
JF Golz
JF Wang
JH Kim
JJ Doyle
Joseph R. Ecker
Joshua C. Stein
JW Brown
JW Wang
JW Wang
Katherine Guill
L Alves-Junior
L Xu
LE Palmer
Lifang Zhang
LJ Xue
M Ashburner
M Lynch
M Semon
M Yamasaki
M Zuker
MF Wu
Michael D. McMullen
MJ Aukerman
MJ Axtell
MR Willmann
MT Juarez
MW Jones-Rhoades
MW Jones-Rhoades
MW Wright
N Baumberger
N Carraro
N Lauter
NJ Mulder
NN Alexandrov
NN Alexandrov
O Voinnet
P Jaiswal
P Shannon
PS Schnable
Q Liu
R Bruggmann
R Edgar
R Rajagopalan
R Schwab
R Sunkar
R Sunkar
R Sunkar
R Sunkar
RJ Langham
S Gazzani
S Griffiths-Jones
S Guddeti
S Lu
S Maere
S Ohno
S Ouyang
S Rozen
S Schwarz
S Subramanian
S Wang
SI Wright
SQ Huang
Sunita Kumari
T Dezulian
T Tanaka
TD Schneider
TJ Chiou
TJ Hubbard
W Jin
WB Barbazuk
X Chen
X Zhou
Y Benjamini
Y Yao
YC Luo
YF Ding
Z Swigonova
Z Swigonova
Z Xie
Z Xie
Zhijie Liu
Publication venue: Public Library of Science
Publication date: 01/11/2009

MicroRNAs (miRNAs) are small, non-coding RNAs that play essential roles in plant growth, development, and stress response. We conducted a genome-wide survey of maize miRNA genes, characterizing their structure, expression, and evolution. Computational approaches based on homology and secondary structure modeling identified 150 high-confidence genes within 26 miRNA families. For 25 families, expression was verified by deep-sequencing of small RNA libraries that were prepared from an assortment of maize tissues. PCR–RACE amplification of 68 miRNA transcript precursors, representing 18 families conserved across several plant species, showed that splice variation and the use of alternative transcriptional start and stop sites are common within this class of genes. Comparison of sequence variation data from diverse maize inbred lines versus teosinte accessions suggests that the mature miRNAs are under strong purifying selection while the flanking sequences evolve equivalently to other genes. Since maize is derived from an ancient tetraploid, the effect of whole-genome duplication on miRNA evolution was examined. We found that, like protein-coding genes, duplicated miRNA genes underwent extensive gene loss, with ∼35% of ancestral sites retained as duplicate homoeologous miRNA genes. This number is higher than that observed with protein-coding genes. A search for putative miRNA targets indicated bias towards genes in regulatory and metabolic pathways. As maize is one of the principal models for plant growth and development, this study will serve as a foundation for future research into the functional roles of miRNA genes.

Public Library of Science (PLOS)

Crossref

Cold Spring Harbor Laboratory Institutional Repository

Directory of Open Access Journals

PubMed Central

Examination of the structure, force resistance, and elasticity of muscle proteins

Author: Caldwell Tracy A
Publication venue: JMU Scholarly Commons
Publication date: 09/05/2015

Obscurin and titin are made up of independently folded domains that can be studied individually. Both are composed mostly of Ig (immunoglobulin) or FnIII (fibronectin type III)-like domains, which are made of two beta sheets held together by a hydrophobic core. High-resolution structures of a limited number of both titin and obscurin domains have been determined using both nuclear magnetic resonance (NMR) and X-ray crystallography. These structures have been complemented by low-resolution methods such as small angle X-ray scattering (SAXS) and cryo-electron microscopy (cryo-EM). Here, other high- and low-resolution structures not previously published will be presented in order to investigate how their response to force, elasticity, flexibility, and orientation of domains aids in their function.

James Madison University

Human spermatogenic failure purges deleterious mutation load from the autosomes and both sex chromosomes, including the gene DMRT1

Author: Amorim Antonio
Aston Kenneth I.
Barros Alberto
Carracedo A.
Carrell D.T.
Carvalho Filipa
Conrad D.F.
Dai Juncheng
Downie Jonathan
Fernandes Susana
Gonçalves João
Guo Xuejiang
Hu Z.
Huang N.
Hurles M.E.
Lopes Alexandra
Matthiesen Rune
Moskovtsev S.
Noordam Michiel J.
Ober C.
Paduch D.A.
Quintela Ines
Ramu Avinash
Schiffman J.D.
Schlegel P.N.
Seabra Catarina
Shah Jiahao
Sousa M.
Thompson Emma E
Wilfert Amy B.
Publication venue: Public Library of Science (PLoS)
Publication date: 01/03/2013

Gonadal failure, along with early pregnancy loss and perinatal death, may be an important filter that limits the propagation of harmful mutations in the human population. We hypothesized that men with spermatogenic impairment, a disease with unknown genetic architecture and a common cause of male infertility, are enriched for rare deleterious mutations compared to men with normal spermatogenesis. After assaying genomewide SNPs and CNVs in 323 Caucasian men with idiopathic spermatogenic impairment and more than 1,100 controls, we estimate that each rare autosomal deletion detected in our study multiplicatively changes a man’s risk of disease by 10% (OR 1.10 [1.04–1.16], p < 2×10⁻³), rare X-linked CNVs by 29% (OR 1.29 [1.11–1.50], p < 1×10⁻³), and rare Y-linked duplications by 88% (OR 1.88 [1.13–3.13], p < 0.03). By contrasting the properties of our case-specific CNVs with those of CNV callsets from cases of autism, schizophrenia, bipolar disorder, and intellectual disability, we propose that the CNV burden in spermatogenic impairment is distinct from the burden of large, dominant mutations described for neurodevelopmental disorders. We identified two patients with deletions of DMRT1, a gene on chromosome 9p24.3 orthologous to the putative sex determination locus of the avian ZW chromosome system. In an independent sample of Han Chinese men, we identified 3 more DMRT1 deletions in 979 cases of idiopathic azoospermia and none in 1,734 controls, and found none in an additional 4,519 controls from public databases. The combined results indicate that DMRT1 loss-of-function mutations are a risk factor and potential genetic cause of human spermatogenic failure (frequency of 0.38% in 1306 cases and 0% in 7,754 controls, p = 6.2×10⁻⁵).
Our study identifies other recurrent CNVs as potential causes of idiopathic azoospermia and generates hypotheses for directing future studies on the genetic basis of male infertility and IVF outcomes. This work was partially funded by the Portuguese Foundation for Science and Technology FCT/MCTES (PIDDAC) and co-financed by European funds (FEDER) through the COMPETE program, research grant PTDC/SAU-GMG/101229/2008. IPATIMUP is an Associate Laboratory of the Portuguese Ministry of Science, Technology, and Higher Education and is partially supported by FCT. AML is the recipient of a postdoctoral fellowship from FCT (SFRH/BPD/73366/2010). CO is supported by a grant from the United States National Institutes of Health (R01 HD21244). JDS is supported by the Damon Runyon Clinical Investigator Award, the Alex's Lemonade Stand Foundation Epidemiology Award, and the Eunice Kennedy Shriver Children's Health Research Career Development Award NICHD 5K12HD001410. Support for human studies and specimens was provided by the NIH/NIDDK George M. O'Brien Center for Kidney Disease Kidney Translational Research Core (P30DK079333) grant to Washington University. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Directory of Open Access Journals

PubMed Central

Repositório Científico do Instituto Nacional de Saúde

FigShare

Statistical and functional convergence of common and rare genetic influences on autism at chromosome 16p

Author: ADHD Working Group of the Psychiatric Genomics Consortium
ASD Working Group of the Psychiatric Genomics Consortium
Daly Mark
iPSYCH Consortium
Palotie Aarno
Weiner Daniel J.
Publication date: 01/11/2022

Publisher Copyright: © 2022, The Author(s). The canonical paradigm for converting genetic association to mechanism involves iteratively mapping individual associations to the proximal genes through which they act. In contrast, in the present study we demonstrate the feasibility of extracting biological insights from a very large region of the genome and leverage this strategy to study the genetic influences on autism. Using a new statistical approach, we identified the 33-Mb p-arm of chromosome 16 (16p) as harboring the greatest excess of autism’s common polygenic influences. The region also includes the mechanistically cryptic and autism-associated 16p11.2 copy number variant. Analysis of RNA-sequencing data revealed that both the common polygenic influences within 16p and the 16p11.2 deletion were associated with decreased average gene expression across 16p. The transcriptional effects of the rare deletion and diffuse common variation were correlated at the level of individual genes and analysis of Hi-C data revealed patterns of chromatin contact that may explain this transcriptional convergence. These results reflect a new approach for extracting biological insight from genetic association data and suggest convergence of common and rare genetic influences on autism at 16p. Peer reviewed.

Helsingin yliopiston digitaalinen arkisto