
    Analysis of high-identity segmental duplications in the grapevine genome

    Background: Segmental duplications (SDs) are blocks of genomic sequence 1–200 kb long that map to different loci in a genome and share more than 90% sequence identity. At the sequence level, SDs show the same characteristics as other regions of the human genome: they contain both high-copy repeats and gene sequences. SDs play an important role in genome plasticity by creating new genes and shaping genome structure. Although data are plentiful for mammals, little was known about the representation of SDs in plant genomes. We therefore performed a genome-wide analysis of high-identity SDs in the sequenced grapevine (Vitis vinifera) genome (PN40024). Results: We demonstrate that recent SDs (>94% identity, ≥10 kb) are a relevant component of the grapevine genome (85 Mb, 17% of the genome sequence). We detected mitochondrial and plastid DNA, as well as genes (10% of the gene annotation), in segmentally duplicated regions of the nuclear genome. In particular, each of the nine highest-copy-number genes has a copy in one or both organelle genomes. Furthermore, we showed that several duplicated genes take part in the biosynthesis of compounds involved in the plant's response to environmental stress. Conclusions: These data show the strong influence of SDs and organelle DNA transfers in shaping the nuclear DNA structure of Vitis vinifera, as well as the contribution of SDs, through genome variation, to the adaptive capacity of grapevine and the nutritional content of grape products. This study represents a step forward in the full characterization of duplicated genes important for grapevine cultivation needs and human health.
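
    The identity and length thresholds above can be restated as simple filters. A minimal Python sketch, assuming pairwise alignments are already available as hypothetical `DuplicatedBlock` records (aligned length and fractional identity); it only restates the stated cut-offs and is not the detection pipeline used in the study:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DuplicatedBlock:
    """Hypothetical record for one pairwise alignment between two genomic loci."""
    length_bp: int   # aligned length in base pairs
    identity: float  # fraction of identical aligned bases, 0.0-1.0


def is_segmental_duplication(block: DuplicatedBlock) -> bool:
    """General SD definition quoted in the abstract: 1-200 kb and > 90% identity."""
    return 1_000 <= block.length_bp <= 200_000 and block.identity > 0.90


def is_recent_sd(block: DuplicatedBlock) -> bool:
    """Stricter 'recent' class: > 94% identity and >= 10 kb.
    The abstract states no upper length bound for this class, so none is applied."""
    return block.identity > 0.94 and block.length_bp >= 10_000


if __name__ == "__main__":
    blocks = [
        DuplicatedBlock(5_000, 0.92),   # SD, but too short and too divergent to be "recent"
        DuplicatedBlock(15_000, 0.97),  # both an SD and a recent SD
        DuplicatedBlock(800, 0.99),     # below the 1 kb minimum, so neither
    ]
    for b in blocks:
        print(b, is_segmental_duplication(b), is_recent_sd(b))
```

    Because the "recent" class is defined here only by a lower length bound, a 250 kb block at 96% identity would pass `is_recent_sd` but fail the general 1–200 kb definition; whether the study applied both bounds together is not stated in the abstract.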

    Predicting genome-wide redundancy using machine learning

    Background: Gene duplication can lead to genetic redundancy, which masks the function of mutated genes in genetic analyses. Methods that increase sensitivity in identifying genetic redundancy can improve the efficiency of reverse genetics and lend insight into the evolutionary outcomes of gene duplication. Machine learning techniques are well suited to classifying gene family members into redundant and non-redundant gene pairs in model species where sufficient genetic and genomic data are available, such as Arabidopsis thaliana, the test case used here. Results: Machine learning techniques that combine multiple attributes led to a dramatic improvement in predicting genetic redundancy over single-trait classifiers such as BLAST E-values or expression correlation. In withholding analysis, one of the methods used here, Support Vector Machines, was twice as precise as single-attribute classifiers, reaching a level where the majority of redundant calls were correctly labeled. With this higher confidence in identifying redundancy, machine learning predicts that about half of all Arabidopsis genes show the signature of redundancy with at least one, but typically fewer than three, other family members. Interestingly, a large proportion of predicted redundant gene pairs were relatively old duplications (e.g., Ks > 1), suggesting that redundancy is stable over long evolutionary periods. Conclusions: Machine learning predicts that most genes will have a functionally redundant paralog but will exhibit redundancy with relatively few genes within a family. The predictions and gene pair attributes for Arabidopsis provide a new resource for research in genetics and genome evolution. These techniques can now be applied to other organisms.
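
    Precision here is the share of gene pairs called redundant that actually are redundant, so "the majority of redundant calls were correctly labeled" corresponds to a precision above 0.5. A minimal Python sketch of the metric; the labels below are hypothetical and do not come from the study:

```python
from typing import Sequence


def precision(called_redundant: Sequence[bool], truly_redundant: Sequence[bool]) -> float:
    """Precision = TP / (TP + FP): the share of pairs called redundant that really are."""
    if len(called_redundant) != len(truly_redundant):
        raise ValueError("label sequences must have the same length")
    n_calls = sum(called_redundant)
    if n_calls == 0:
        raise ValueError("no pairs were called redundant; precision is undefined")
    true_positives = sum(c and t for c, t in zip(called_redundant, truly_redundant))
    return true_positives / n_calls


# Hypothetical labels for six gene pairs (not data from the study).
truth = [True, True, False, False, True, False]
calls = [True, True, True, False, False, False]
print(precision(calls, truth))  # 2 of 3 calls are correct
```

    Precision ignores missed redundant pairs (false negatives), which is why it suits the question of how trustworthy each "redundant" call is rather than how many redundant pairs are found.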

    Distinct Gene Number-Genome Size Relationships for Eukaryotes and Non-Eukaryotes: Gene Content Estimation for Dinoflagellate Genomes

    The ability to predict gene content is highly desirable for characterizing not-yet-sequenced genomes such as those of dinoflagellates. Using data from completely sequenced and annotated genomes from phylogenetically diverse lineages, we investigated the relationship between gene content and genome size using regression analyses. Distinct relationships between log10-transformed protein-coding gene number (Y′) and log10-transformed genome size (X′, genome size in kbp) were found for eukaryotes and non-eukaryotes. Eukaryotes best fit a logarithmic model, Y′ = ln(−46.200 + 22.678X′), whereas non-eukaryotes best fit a linear model, Y′ = 0.045 + 0.977X′, both with high significance (p < 0.001, R² > 0.91). In both groups, total gene number shows trends similar to the respective protein-coding regressions. The distinct correlations reflect lower and decreasing gene-coding percentages as genome size increases in eukaryotes (82%–1%), compared to higher and relatively stable percentages in prokaryotes and viruses (97%–47%). The eukaryotic regression models project that the smallest dinoflagellate genome (3×10⁶ kbp) contains 38,188 protein-coding (40,086 total) genes and the largest (245×10⁶ kbp) 87,688 protein-coding (92,013 total) genes, corresponding to gene-coding percentages of 1.8% and 0.05%. These estimates likely do not represent an extraordinarily high functional diversity of the encoded proteome, but rather highly redundant genomes, as evidenced by the high gene copy numbers documented for various dinoflagellate species.
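
    To make the two fitted models concrete, the sketch below evaluates them exactly as printed (Y′ and X′ are log10 values; the ln in the eukaryotic model is taken as the natural logarithm). With these rounded coefficients the smallest dinoflagellate genome comes out at roughly 41,000 protein-coding genes rather than the reported 38,188, so the paper's projections presumably rest on more precise coefficients or a slightly different evaluation; treat the output as order-of-magnitude only. A minimal Python sketch:

```python
import math


def eukaryote_protein_genes(genome_kbp: float) -> float:
    """Eukaryotic logarithmic model: Y' = ln(-46.200 + 22.678 X'),
    where Y' = log10(protein-coding genes) and X' = log10(genome size in kbp)."""
    x = math.log10(genome_kbp)
    arg = -46.200 + 22.678 * x
    if arg <= 0:
        # ln() is undefined here: X' must exceed 46.200 / 22.678 (about 109 kbp).
        raise ValueError("model undefined for genomes below about 109 kbp")
    return 10 ** math.log(arg)


def non_eukaryote_protein_genes(genome_kbp: float) -> float:
    """Non-eukaryotic linear model on the log10 scale: Y' = 0.045 + 0.977 X'."""
    return 10 ** (0.045 + 0.977 * math.log10(genome_kbp))


if __name__ == "__main__":
    for kbp in (3e6, 245e6):  # smallest and largest dinoflagellate genomes
        print(f"{kbp:.0e} kbp -> ~{eukaryote_protein_genes(kbp):,.0f} protein-coding genes")
```

    The guard reflects a property of the logarithmic form itself: the argument of ln must be positive, so the eukaryotic model is only defined for X′ > 46.200/22.678 ≈ 2.037, i.e. genomes larger than about 109 kbp.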

    Genomic copy number variation in Mus musculus.

    BACKGROUND: Copy number variation is an important dimension of genetic diversity and has implications for development and disease. As an important model organism, the mouse is a prime candidate for copy number variant (CNV) characterization, but this has yet to be completed for a large sample size. Here we report a CNV analysis of publicly available, high-density microarray data files for 351 mouse tail samples, including 290 mice that had not previously been characterized for CNVs. RESULTS: We found 9634 putative autosomal CNVs across the samples, affecting 6.87% of the mouse reference genome. We find significant differences between wild-caught mice and classical laboratory strains in the degree of CNV uniqueness (single-sample occurrence) and in the nature of CNV-gene overlap. In wild-caught mice, CNV-gene overlap was associated with lipid metabolism, pheromone response and olfaction, whereas in classical laboratory strains it was associated with immunity, carbohydrate metabolism and amino-acid metabolism. Using two subspecies of wild-caught Mus musculus, we identified putative CNVs unique to those subspecies and show that this diversity is better captured by wild-derived laboratory strains than by classical laboratory strains. A total of 9 genic copy number variable regions (CNVRs) were selected for experimental confirmation by droplet digital PCR (ddPCR). CONCLUSION: We present a comprehensive, genome-wide analysis of CNVs in Mus musculus, which increases the number of known variants in the species and will accelerate the identification of novel variants in future studies.
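
    A genome-fraction figure such as "6.87% of the mouse reference genome" is naturally computed over the union of CNV intervals: overlapping calls from different samples have to be merged before their lengths are summed, or shared regions would be counted repeatedly. A minimal Python sketch of that bookkeeping, assuming CNV calls are given as hypothetical (chromosome, start, end) half-open intervals; the study's own pipeline is not described in the abstract:

```python
from typing import Iterable, Tuple

# Hypothetical CNV call: (chromosome, start, end), half-open [start, end) in bp.
CnvCall = Tuple[str, int, int]


def covered_bp(calls: Iterable[CnvCall]) -> int:
    """Total base pairs covered by the union of CNV calls across all samples."""
    total = 0
    cur_chrom, cur_start, cur_end = None, 0, 0
    for chrom, start, end in sorted(calls):
        if chrom == cur_chrom and start <= cur_end:
            cur_end = max(cur_end, end)  # overlaps or abuts the open run: extend it
        else:
            total += cur_end - cur_start  # close the previous run (adds 0 initially)
            cur_chrom, cur_start, cur_end = chrom, start, end
    return total + (cur_end - cur_start)


def genome_fraction(calls: Iterable[CnvCall], genome_bp: int) -> float:
    """Fraction of a genome of length genome_bp covered by at least one call."""
    return covered_bp(calls) / genome_bp


if __name__ == "__main__":
    calls = [
        ("chr1", 100, 600),  # sample A
        ("chr1", 400, 900),  # sample B, overlaps sample A's call
        ("chr2", 0, 250),    # sample A
    ]
    print(genome_fraction(calls, 10_000))  # (800 + 250) / 10,000
```

    Sorting by (chromosome, start) lets a single pass merge overlaps, so the cost is dominated by the O(n log n) sort.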

    Genomic comparisons and genome architecture of divergent Trypanosoma species

    Virulent Trypanosoma cruzi and the non-pathogenic Trypanosoma conorhini and Trypanosoma rangeli are protozoan parasites with divergent lifestyles. T. cruzi and T. rangeli are endemic to Latin America, whereas T. conorhini is tropicopolitan. Reduviid bug vectors spread these parasites to mammalian hosts, within which T. rangeli and T. conorhini replicate extracellularly, while T. cruzi has intracellular stages. First, this work compares the genomes of these parasites to understand their differing phenotypes. Second, it examines the genome architecture of T. cruzi to address the effect of a complex hybridization history, polycistronic transcription, and genome plasticity on this organism, and to study its highly repetitive nature and cryptic genome organization. Whole-genome sequencing, assembly and comparison, as well as chromosome-scale genome mapping, were employed. This study presents the first comprehensive whole-genome maps of Trypanosoma and the first T. conorhini strain ever sequenced. Original contributions to knowledge include the ~21–25 Mbp assembled genomes of the less virulent T. cruzi G, T. rangeli AM80, and T. conorhini 025E, containing ~10,000 to 13,000 genes, and the ~36 Mbp genome assembly of the highly virulent T. cruzi CL, with ~24,000 genes. The T. cruzi strains exhibited ~74% identity to proteins of T. rangeli or T. conorhini. T. rangeli and T. conorhini displayed greater complex-carbohydrate metabolic capabilities and contained fewer retrotransposons and multigene family copies (e.g., mucins, DGF-1, and MASP) than T. cruzi. Although all four genomes appear highly syntenic, T. rangeli and T. conorhini exhibited greater karyotype conservation. Studies of T. cruzi genome architecture revealed 66 maps ranging from 0.13 to 2.4 Mbp. At least 2.6% of the genome comprises highly repetitive regions, and 7.4% consists of repetitive regions barren of labels. The 66 putative chromosomes identified are likely diploid. However, 20 of these maps contained regions of up to 1.25 Mbp with homology to at least one other map, suggestive of widespread segmental duplication or of an ancient hybridization event that resulted in a genome with significant redundancy. The assembled genomes of these parasites closely reflect their phylogenetic relationships and provide greater context for understanding their divergent lifestyles. Genome mapping provides insight into the genomic evolution of these parasites.

    Gene expansion shapes genome architecture in the human pathogen Lichtheimia corymbifera: an evolutionary genomics analysis in the ancient terrestrial Mucorales (Mucoromycotina)

    Lichtheimia species are the second most important cause of mucormycosis in Europe. To provide broader insights into the molecular basis of the pathogenicity-associated traits of the basal Mucorales, we report the full genome sequence of L. corymbifera and compare it to the genome of Rhizopus oryzae, the most common cause of mucormycosis worldwide. The genome assembly encompasses 33.6 Mb and 12,379 protein-coding genes. This study reveals four major differences between the L. corymbifera genome and that of R. oryzae: (i) a highly elevated number of gene duplications which, unlike in R. oryzae, are not due to whole-genome duplication (WGD); (ii) despite the relatively high incidence of introns, alternative splicing (AS) is not frequently observed, either for generating paralogs or in response to stress; (iii) the content of repetitive elements is strikingly low (<5%); (iv) L. corymbifera is typically haploid. Novel virulence factors were identified that may be involved in regulating adaptation to iron limitation, e.g. LCor01340.1, encoding a putative siderophore transporter, and LCor00410.1, involved in siderophore metabolism. The genes encoding the transcription factors LCor08192.1 and LCor01236.1, which are similar to GATA-type regulators and to the calcineurin-regulated CRZ1, respectively, indicate an involvement of the calcineurin pathway in adaptation to iron limitation. Genes encoding MADS-box transcription factors are expanded to up to 11 copies, compared to the 1–4 copies usually found in other fungi. Further findings: (i) a lower content of tRNAs, but unique codons in L. corymbifera; (ii) over 25% of the proteins are apparently specific to L. corymbifera; (iii) L. corymbifera contains only two-thirds as many proteases (known to be essential virulence factors) as R. oryzae. However, the number of secreted proteases is roughly twice as high as in R. oryzae.

    A Genome-Wide Characterization of MicroRNA Genes in Maize

    MicroRNAs (miRNAs) are small, non-coding RNAs that play essential roles in plant growth, development, and stress response. We conducted a genome-wide survey of maize miRNA genes, characterizing their structure, expression, and evolution. Computational approaches based on homology and secondary-structure modeling identified 150 high-confidence genes within 26 miRNA families. For 25 families, expression was verified by deep sequencing of small RNA libraries prepared from an assortment of maize tissues. PCR–RACE amplification of 68 miRNA transcript precursors, representing 18 families conserved across several plant species, showed that splice variation and the use of alternative transcriptional start and stop sites are common within this class of genes. Comparison of sequence variation data from diverse maize inbred lines versus teosinte accessions suggests that the mature miRNAs are under strong purifying selection, while the flanking sequences evolve similarly to other genes. Since maize is derived from an ancient tetraploid, the effect of whole-genome duplication on miRNA evolution was examined. We found that, like protein-coding genes, duplicated miRNA genes underwent extensive gene loss, with ∼35% of ancestral sites retained as duplicate homoeologous miRNA genes. This proportion is higher than that observed for protein-coding genes. A search for putative miRNA targets indicated a bias toward genes in regulatory and metabolic pathways. As maize is one of the principal models for plant growth and development, this study will serve as a foundation for future research into the functional roles of miRNA genes.

    Examination of the structure, force resistance, and elasticity of muscle proteins

    Obscurin and titin are made up of independently folded domains that can be studied individually. Both consist mostly of Ig (immunoglobulin) or FnIII (fibronectin type III)-like domains, which are made of two β-sheets held together by a hydrophobic core. High-resolution structures of a limited number of titin and obscurin domains have been determined by nuclear magnetic resonance (NMR) and X-ray crystallography. These structures have been complemented by low-resolution methods such as small-angle X-ray scattering (SAXS) and cryo-electron microscopy (cryo-EM). Here, additional, previously unpublished high- and low-resolution structures are presented to investigate how the domains' response to force, elasticity, flexibility, and relative orientation contribute to their function.

    Human spermatogenic failure purges deleterious mutation load from the autosomes and both sex chromosomes, including the gene DMRT1

    Gonadal failure, along with early pregnancy loss and perinatal death, may be an important filter that limits the propagation of harmful mutations in the human population. We hypothesized that men with spermatogenic impairment, a disease with unknown genetic architecture and a common cause of male infertility, are enriched for rare deleterious mutations compared to men with normal spermatogenesis. After assaying genome-wide SNPs and CNVs in 323 Caucasian men with idiopathic spermatogenic impairment and more than 1,100 controls, we estimate that each rare autosomal deletion detected in our study multiplicatively changes a man’s risk of disease by 10% (OR 1.10 [1.04–1.16], p < 2×10⁻³), each rare X-linked CNV by 29% (OR 1.29 [1.11–1.50], p < 1×10⁻³), and each rare Y-linked duplication by 88% (OR 1.88 [1.13–3.13], p < 0.03). By contrasting the properties of our case-specific CNVs with those of CNV callsets from cases of autism, schizophrenia, bipolar disorder, and intellectual disability, we propose that the CNV burden in spermatogenic impairment is distinct from the burden of large, dominant mutations described for neurodevelopmental disorders. We identified two patients with deletions of DMRT1, a gene on chromosome 9p24.3 orthologous to the putative sex-determination locus of the avian ZW chromosome system. In an independent sample of Han Chinese men, we identified 3 more DMRT1 deletions in 979 cases of idiopathic azoospermia and none in 1,734 controls, and found none in an additional 4,519 controls from public databases. The combined results indicate that DMRT1 loss-of-function mutations are a risk factor and potential genetic cause of human spermatogenic failure (frequency of 0.38% in 1,306 cases and 0% in 7,754 controls, p = 6.2×10⁻⁵). Our study identifies other recurrent CNVs as potential causes of idiopathic azoospermia and generates hypotheses for directing future studies on the genetic basis of male infertility and IVF outcomes.
    This work was partially funded by the Portuguese Foundation for Science and Technology FCT/MCTES (PIDDAC) and co-financed by European funds (FEDER) through the COMPETE program, research grant PTDC/SAU-GMG/101229/2008. IPATIMUP is an Associate Laboratory of the Portuguese Ministry of Science, Technology, and Higher Education and is partially supported by FCT. AML is the recipient of a postdoctoral fellowship from FCT (SFRH/BPD/73366/2010). CO is supported by a grant from the United States National Institutes of Health (R01 HD21244). JDS is supported by a Damon Runyon Clinical Investigator Award, an Alex's Lemonade Stand Foundation Epidemiology Award, and the Eunice Kennedy Shriver Children's Health Research Career Development Award NICHD 5K12HD001410. Support for human studies and specimens was provided by the NIH/NIDDK George M. O'Brien Center for Kidney Disease Kidney Translational Research Core (P30DK079333) grant to Washington University. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
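
    Two numbers in this abstract can be checked with short calculations. Under the stated multiplicative model, a man carrying k rare autosomal deletions would have his odds scaled by 1.10^k. For the DMRT1 result the abstract does not name its test; assuming a one-sided Fisher's exact test on 5 carriers (2 + 3 deletions) among 1,306 cases and 0 carriers among 7,754 controls, the hypergeometric tail comes to about 6.2×10⁻⁵, consistent with the reported p-value. A minimal Python sketch of both (the function names are illustrative, not from the study):

```python
from math import comb


def combined_odds_ratio(per_event_or: float, n_events: int) -> float:
    """Multiplicative model: each additional event scales the odds by per_event_or."""
    return per_event_or ** n_events


def fisher_one_sided_p(case_carriers: int, n_cases: int,
                       control_carriers: int, n_controls: int) -> float:
    """One-sided Fisher's exact test (hypergeometric upper tail): probability of at
    least `case_carriers` carriers falling among cases, given the table margins."""
    total = n_cases + n_controls
    carriers = case_carriers + control_carriers
    tail = sum(
        comb(n_cases, k) * comb(n_controls, carriers - k)
        for k in range(case_carriers, min(carriers, n_cases) + 1)
    )
    return tail / comb(total, carriers)


print(round(combined_odds_ratio(1.10, 3), 3))          # 1.331 for three deletions
print(round(100 * 5 / 1306, 2))                        # 0.38 (% of cases carrying a deletion)
print(f"{fisher_one_sided_p(5, 1306, 0, 7754):.1e}")   # 6.2e-05
```

    When no controls carry the variant, the tail contains a single table, so the p-value reduces to C(1306, 5) / C(9060, 5): the chance that all five carriers would land among the cases by random assignment.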

    Statistical and functional convergence of common and rare genetic influences on autism at chromosome 16p

    Publisher Copyright: © 2022, The Author(s). The canonical paradigm for converting genetic association to mechanism involves iteratively mapping individual associations to the proximal genes through which they act. In contrast, in the present study we demonstrate the feasibility of extracting biological insights from a very large region of the genome and leverage this strategy to study the genetic influences on autism. Using a new statistical approach, we identified the 33-Mb p-arm of chromosome 16 (16p) as harboring the greatest excess of autism’s common polygenic influences. The region also includes the mechanistically cryptic, autism-associated 16p11.2 copy number variant. Analysis of RNA-sequencing data revealed that both the common polygenic influences within 16p and the 16p11.2 deletion were associated with decreased average gene expression across 16p. The transcriptional effects of the rare deletion and of diffuse common variation were correlated at the level of individual genes, and analysis of Hi-C data revealed patterns of chromatin contact that may explain this transcriptional convergence. These results reflect a new approach for extracting biological insight from genetic association data and suggest convergence of common and rare genetic influences on autism at 16p. Peer reviewed.