    Early History of Mammals Is Elucidated with the ENCODE Multiple Species Sequencing Data

    Understanding the early evolution of placental mammals is one of the most challenging issues in mammalian phylogeny. Here, we addressed this question by using the sequence data of the ENCODE consortium, which include 1% of mammalian genomes in 18 species belonging to all main mammalian lineages. Phylogenetic reconstructions based on an unprecedented amount of coding sequence taken from 218 genes resulted in a highly supported tree placing the root of Placentalia between Afrotheria and Exafroplacentalia (Afrotheria hypothesis). This topology was validated by the phylogenetic analysis of a new class of genomic phylogenetic markers, the conserved noncoding sequences. Tests of alternative topologies applied to the coding sequence dataset rejected the Atlantogenata hypothesis (Xenarthra grouping with Afrotheria), whereas the same tests applied to the noncoding sequence dataset rejected the second alternative scenario, the Epitheria hypothesis (Xenarthra at the base). Thus, both datasets support the Afrotheria hypothesis; however, neither can reject both of the remaining topological alternatives.

    Non-alignment comparison of human and high primate genomes

    Compositional spectra (CS) analysis based on k-mer scoring of DNA sequences was employed in this study for dot-plot comparison of human and primate genomes. The detection of extended conserved synteny regions was based on continuous fuzzy similarity rather than on chains of discrete anchors (genes or highly conserved noncoding elements). In addition to the high correspondence found in comparisons of whole-genome sequences, good similarity was also found after masking gene sequences, indicating that CS analysis reveals phylogenetic signal in the organization of the noncoding part of genome sequences, including repetitive DNA and the genome "dark matter". The ability to reveal parallel ordering depends on the signal of common ancestral sequence organization, which varies locally along the corresponding segments of the compared genomes. We explored two sources contributing to this signal: sequence composition (GC content) and sequence organization (abundances of k-mers in the usual A, T, G, C alphabet or in the purine-pyrimidine alphabet). Whole-genome comparisons based on GC distribution along the analyzed sequences give reasonable results, but combining GC distribution with k-mer abundances dramatically improves the ordering quality, indicating that compositional and organizational heterogeneity are complementary sources of information on evolutionarily conserved similarity of genome sequences.
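
The window-by-window comparison described above can be illustrated with a minimal sketch. It is not the authors' CS scoring. The window size, the value of k, the use of non-overlapping windows, and Pearson correlation as the similarity measure are all assumptions chosen for illustration. The sketch computes k-mer frequency spectra in the A, C, G, T alphabet or, after recoding, in the purine-pyrimidine (R/Y) alphabet, along with GC content. It then fills a dot-plot-style matrix with the spectrum correlation for every pair of windows.

```python
from collections import Counter
from itertools import product
import math


def kmer_spectrum(seq, k=3, alphabet="ACGT"):
    """Relative frequencies of every k-mer over `alphabet`, in a fixed order."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    total = sum(counts[km] for km in kmers) or 1  # k-mers with other symbols (e.g. N) are ignored
    return [counts[km] / total for km in kmers]


def to_purine_pyrimidine(seq):
    """Recode A/G as R (purine) and C/T as Y (pyrimidine); anything else becomes N."""
    table = {"A": "R", "G": "R", "C": "Y", "T": "Y"}
    return "".join(table.get(base, "N") for base in seq.upper())


def gc_content(seq):
    """Fraction of G+C among unambiguous bases."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    return (seq.count("G") + seq.count("C")) / acgt if acgt else 0.0


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy) if sxx > 0 and syy > 0 else 0.0


def window_similarity(seq_a, seq_b, window=1000, k=3):
    """Dot-plot-style matrix: correlation of k-mer spectra for every pair of
    non-overlapping windows taken from the two sequences."""
    def windows(seq):
        seq = seq.upper()
        return [seq[i:i + window] for i in range(0, len(seq) - window + 1, window)]
    spec_a = [kmer_spectrum(w, k) for w in windows(seq_a)]
    spec_b = [kmer_spectrum(w, k) for w in windows(seq_b)]
    return [[pearson(sa, sb) for sb in spec_b] for sa in spec_a]
```

High values along a diagonal of the matrix mark segments with similar local k-mer organization. The R/Y variant of a spectrum is obtained with `kmer_spectrum(to_purine_pyrimidine(seq), k, alphabet="RY")`. A GC-only comparison would correlate per-window `gc_content` profiles instead.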

    A custom capture sequence approach for oculocutaneous albinism identifies structural variant alleles at the OCA2 locus

    Oculocutaneous albinism (OCA) is a heritable disorder of pigment production that manifests as hypopigmentation and altered eye development. Exon sequencing of known OCA genes fails to produce a complete molecular diagnosis for a significant number of affected individuals. We sequenced the DNA of individuals with OCA using short-read custom capture sequencing that targeted coding, intronic and non-coding regulatory regions of known OCA genes and GWAS-associated pigmentation loci. We identified an OCA2 complex structural variant (CxSV), defined by a 143kb inverted segment reintroduced in intron 1, upstream of its native location. The corresponding CxSV junctions were observed in 11/390 probands screened. In one family, the 143kb CxSV presents as a copy number variant (CNV) duplication of the 143kb region. In the remaining 10/11 families, the 143kb CxSV acquired an additional 184kb deletion across the same region, restoring exons 3–19 of OCA2 to a copy-number neutral state. Allele-associated haplotype analysis found that the rare SNVs rs374519281 and rs139696407 are linked with the 143kb CxSV in both OCA2 alleles. For individuals in whom customary molecular evaluation does not reveal a biallelic OCA diagnosis, we recommend preliminary screening for these haplotype-associated rare variants, followed by junction-specific validation for the OCA2 143kb CxSV.

    Gene-Specific Substitution Profiles Describe the Types and Frequencies of Amino Acid Changes during Antibody Somatic Hypermutation

    Somatic hypermutation (SHM) plays a critical role in the maturation of antibodies, optimizing recognition initiated by recombination of V(D)J genes. Previous studies have shown that the propensity to mutate is modulated by the context of surrounding nucleotides and that the SHM machinery generates biased substitutions. To investigate the intrinsic mutation frequency and substitution bias of SHM at the amino acid level, we analyzed functional human antibody repertoires and developed mGSSP (method for gene-specific substitution profile), a method to construct amino acid substitution profiles from B cell transcripts determined by next-generation sequencing. We demonstrated that these gene-specific substitution profiles (GSSPs) are unique to each V gene and highly consistent between donors. We also showed that GSSPs constructed from functional antibody repertoires are highly similar to those constructed from antibody sequences amplified from non-productively rearranged passenger alleles, which do not undergo functional selection. This suggests that GSSPs capture well the types and frequencies, or mutational space, of the majority of amino acid changes sampled by the SHM machinery. We further observed that the rates of mutational exchange between some amino acids are both asymmetric and context dependent and correlate weakly with their biochemical properties. GSSPs provide an improved, position-dependent alternative to standard substitution matrices and can be used to develop software for accurately modeling the SHM process. GSSPs can also be used to predict the amino acid mutational space available for antigen-driven selection and to understand factors modulating the maturation pathways of antibody lineages in a gene-specific context.
The mGSSP method can be used to build, compare, and plot GSSPs; we report the GSSPs constructed for 69 common human V genes (DOI: 10.6084/m9.figshare.3511083) and provide high-resolution logo plots for each (DOI: 10.6084/m9.figshare.3511085).
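
The core idea of a gene-specific substitution profile is per-position frequencies of non-germline amino acids. A minimal sketch of that idea follows. It is not the mGSSP implementation. It assumes the sequences are already aligned to their germline V gene without indels, and it normalizes each count by the total number of sequences. Both choices are illustrative assumptions, and the toy sequences are hypothetical.

```python
from collections import Counter


def substitution_profile(germline, sequences, ignore="-X"):
    """Per-position amino acid substitution frequencies relative to a germline V gene.

    Assumes every sequence is already aligned to `germline` without indels
    (same length). Returns {position: {amino_acid: frequency}}, where frequency
    is the number of sequences carrying that non-germline residue divided by
    the total number of sequences. Gap/unknown symbols in `ignore` are skipped.
    """
    if any(len(s) != len(germline) for s in sequences):
        raise ValueError("all sequences must be aligned to the germline length")
    n = len(sequences)
    profile = {}
    for pos, germ_aa in enumerate(germline):
        counts = Counter(s[pos] for s in sequences
                         if s[pos] != germ_aa and s[pos] not in ignore)
        if counts:
            profile[pos] = {aa: c / n for aa, c in counts.items()}
    return profile


def positional_mutability(profile, length):
    """Total substitution frequency at each position (0.0 where nothing changed)."""
    return [sum(profile.get(pos, {}).values(), 0.0) for pos in range(length)]
```

For example, with germline `"QVQL"` and sequences `["QVQL", "EVQL", "EVKL", "QVQL"]`, position 0 carries E in half of the sequences and position 2 carries K in a quarter of them. Profiles built this way for the same V gene in different donors could then be compared position by position.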

    Developmental Pathway of the MPER-Directed HIV-1-Neutralizing Antibody 10E8

    Antibody 10E8 targets the membrane-proximal external region (MPER) of HIV-1 gp41, neutralizes >97% of HIV-1 isolates, and lacks the autoreactivity often associated with MPER-directed antibodies. The developmental pathway of 10E8 might therefore serve as a promising template for vaccine design, but samples from the time of infection, which are often used to infer the B cell record, are unavailable. In this study, we used crystallography, next-generation sequencing (NGS), and functional assessments to infer the 10E8 developmental pathway from a single time point. Mutational analysis indicated that somatic hypermutation of the second heavy-chain complementarity-determining region (CDR H2) is critical for neutralization. Structures of 10E8 variants with V-gene regions reverted to genomic origin, for both heavy and light chains or for the heavy chain only, showed structural differences >2 Å relative to mature 10E8 in CDR H2 and H3. To understand these developmental changes, we used bioinformatic sieving, maximum likelihood, and parsimony analyses of immunoglobulin transcripts to identify 10E8-lineage members, to infer the 10E8 unmutated common ancestor (UCA), and to calculate 10E8 developmental intermediates. We were assisted in this analysis by the preservation of a critical D-gene segment, which was unmutated in most 10E8-lineage sequences. The UCA and early intermediates weakly bound a 26-residue MPER peptide, whereas HIV-1 neutralization and epitope recognition in liposomes were observed only with late intermediates. Antibody 10E8 thus develops from a UCA with weak MPER affinity and substantial differences in CDR H2 and H3 from mature 10E8; only after extensive somatic hypermutation do 10E8-lineage members gain recognition in the context of membrane and HIV-1 neutralization.

    Genetic effects on liver chromatin accessibility identify disease regulatory variants

    Identifying the molecular mechanisms by which genome-wide association study (GWAS) loci influence traits remains challenging. Chromatin accessibility quantitative trait loci (caQTLs) help identify GWAS loci that may alter GWAS traits by modulating chromatin structure, but caQTLs have been identified in only a limited set of human tissues. Here we mapped caQTLs in 20 human liver tissue samples and identified 3,123 caQTLs. The caQTL variants are enriched in liver tissue promoter and enhancer states and frequently disrupt binding motifs of transcription factors expressed in liver. We predicted target genes for 861 caQTL peaks using proximity, chromatin interactions, correlation with promoter accessibility or gene expression, and colocalization with expression QTLs. Using GWAS signals for 19 liver function and/or cardiometabolic traits, we identified 110 colocalized caQTLs and GWAS signals, 56 of which contained a predicted caPeak target gene. At the LITAF LDL-cholesterol GWAS locus, we validated that a caQTL variant showed allelic differences in protein binding and transcriptional activity. These caQTLs contribute to the epigenomic characterization of human liver and help identify molecular mechanisms and genes at GWAS loci.
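
At its simplest, a caQTL test asks whether accessibility at a peak changes with the number of alternate alleles a sample carries at a nearby variant. The following sketch shows that bare association test. It is an assumption-laden illustration, not the authors' mapping pipeline. Real caQTL analyses typically add normalization, covariates, and multiple-testing correction, none of which are modeled here.

```python
import math


def qtl_test(dosages, accessibility):
    """Bare per-variant QTL test: ordinary least-squares regression of chromatin
    accessibility on genotype dosage (0, 1 or 2 alternate alleles per sample).

    Returns (slope, t_statistic). No covariates or normalization: a sketch only.
    """
    n = len(dosages)
    if n != len(accessibility) or n < 3:
        raise ValueError("need at least 3 paired observations")
    mx = sum(dosages) / n
    my = sum(accessibility) / n
    sxx = sum((x - mx) ** 2 for x in dosages)
    if sxx == 0:
        raise ValueError("variant is monomorphic in these samples")
    sxy = sum((x - mx) * (y - my) for x, y in zip(dosages, accessibility))
    slope = sxy / sxx
    intercept = my - slope * mx
    # Residual sum of squares -> standard error of the slope with n - 2 degrees of freedom
    rss = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(dosages, accessibility))
    se = math.sqrt(rss / (n - 2) / sxx)
    if se > 0:
        t = slope / se
    else:
        t = math.copysign(math.inf, slope) if slope != 0 else 0.0
    return slope, t
```

In a genome-wide scan, this test would be repeated for each variant-peak pair within some distance window. The resulting t statistics would then be converted to p-values and corrected for the number of tests.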

    Initial Sequence and Comparative Analysis of the Cat Genome

    The genome sequence (1.9-fold coverage) of an inbred Abyssinian domestic cat was assembled, mapped, and annotated with a comparative approach that involved cross-reference to annotated genome assemblies of six mammals (human, chimpanzee, mouse, rat, dog, and cow). The results resolved chromosomal positions for 663,480 contigs, 20,285 putative feline gene orthologs, and 133,499 conserved sequence blocks (CSBs). Additional annotated features include repetitive elements, endogenous retroviral sequences, nuclear mitochondrial (numt) sequences, micro-RNAs, and evolutionary breakpoints that suggest historic balancing of translocation and inversion incidences in distinct mammalian lineages. Large numbers of single nucleotide polymorphisms (SNPs), deletion insertion polymorphisms (DIPs), and short tandem repeats (STRs), suitable for linkage or association studies, were characterized in the context of long stretches of chromosome homozygosity. Despite the light coverage, which captures ∼65% of the euchromatin sequence of the cat genome, these comparative insights shed new light on the tempo and mode of gene/genome evolution in mammals and promise several research applications for the cat. They also illustrate that a comparative approach using more deeply covered mammals provides an informative, preliminary annotation of a light (1.9-fold) coverage mammal genome sequence.

    Revealing mammalian evolutionary relationships by comparative analysis of gene clusters

    Many software tools for comparative analysis of genomic sequence data have been released in recent decades. Despite this, it remains challenging to determine evolutionary relationships in gene clusters because of their complex histories involving duplications, deletions, inversions, and conversions. One concept describing these relationships is orthology. Orthologs derive from a common ancestor by speciation, in contrast to paralogs, which derive from duplication. Discriminating orthologs from paralogs is a necessary step in most multispecies sequence analyses, but doing so accurately is impeded by the occurrence of gene conversion events. We propose a refined method of orthology assignment based on two paradigms for interpreting its definition: by genomic context or by sequence content. X-orthology (based on context) traces orthology resulting from speciation and duplication only, while N-orthology (based on content) includes the influence of conversion events.
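
The speciation/duplication definition quoted above can be made concrete with a minimal sketch. It assumes a gene tree whose internal nodes are already labeled as speciation ("S") or duplication ("D") events. Two genes are then orthologs when their lowest common ancestor is a speciation node. This classic event-based rule is closest in spirit to the context-based view, because it ignores conversion entirely. It is not the authors' refined X-/N-orthology method, and the gene names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    name: str = ""                  # gene name (leaves only)
    event: Optional[str] = None     # "S" (speciation) or "D" (duplication) for internal nodes
    children: List["Node"] = field(default_factory=list)


def _path_to(node, target):
    """Nodes from `node` down to the leaf named `target`, or None if absent."""
    if not node.children:
        return [node] if node.name == target else None
    for child in node.children:
        path = _path_to(child, target)
        if path is not None:
            return [node] + path
    return None


def lca(root, a, b):
    """Lowest common ancestor of leaves `a` and `b` (last shared node on their root paths)."""
    pa, pb = _path_to(root, a), _path_to(root, b)
    if pa is None or pb is None:
        raise KeyError("gene not found in tree")
    last = None
    for x, y in zip(pa, pb):
        if x is not y:
            break
        last = x
    return last


def relationship(root, a, b):
    """Event-based rule: orthologs if their LCA is a speciation node, else paralogs."""
    return "ortholog" if lca(root, a, b).event == "S" else "paralog"
```

In a tree where a duplication produced geneA and geneB before the human/mouse split, geneA_human and geneA_mouse come out as orthologs. geneA_human and geneB_mouse come out as paralogs. A conversion event that overwrote geneB_mouse with geneA sequence would not change this context-based answer, which is the situation a content-based definition is meant to capture.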

    Exome sequencing of child–parent trios with bladder exstrophy: Findings in 26 children

    Bladder exstrophy (BE) is a rare lower ventral midline defect in which the bladder and part of the urethra are exposed. The etiology of BE is unknown but is thought to be influenced by genetic variation, with more recent studies suggesting a role for rare variants. We therefore conducted paired-end exome sequencing in 26 child/mother/father trios. Three children had rare (allele frequency ≤ 0.0001 in several public databases) inherited variants in TSPAN4, one with a loss-of-function variant and two with missense variants. Two children had loss-of-function variants in TUBE1. Four children had rare missense or nonsense variants (one per child) in WNT3, CRKL, MYH9, or LZTR1, genes previously associated with BE. We detected 17 de novo missense variants in 13 children and three de novo loss-of-function variants (AKR1C2, PRRX1, PPM1D) in three children (one per child). We also detected rare compound heterozygous loss-of-function variants in PLCH2 and CLEC4M, and rare inherited missense or loss-of-function variants in additional genes under autosomal recessive (three genes) and X-linked recessive (13 genes) inheritance models. Variants in two of the identified genes may implicate disruption of cell migration (TUBE1) and adhesion (TSPAN4), mechanisms proposed for BE, and provide additional evidence for a role of rare variants in the development of this defect.
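
Two filtering steps mentioned above can be sketched in a few lines. The first is the rarity criterion (allele frequency ≤ 0.0001 across several databases). The second is the trio-based split into de novo and inherited variants. In this sketch, a variant counts as rare only if it passes the threshold in every database, and a database that does not report the variant is treated as frequency 0. Both rules, and the database names in the usage example, are assumptions for illustration rather than the study's exact pipeline.

```python
def is_rare(variant_afs, threshold=1e-4):
    """True if the allele frequency is <= threshold in every database.

    `variant_afs` maps database name -> allele frequency. A value of None
    (variant not reported) is treated as 0.0, an assumption of this sketch.
    """
    return all((af or 0.0) <= threshold for af in variant_afs.values())


def trio_origin(child, mother, father):
    """Classify the origin of a child's alternate allele from parental genotypes.

    Genotypes are alternate-allele counts (0, 1 or 2). Returns "absent",
    "de novo", "maternal", "paternal", or "ambiguous" (both parents carry it).
    """
    if child == 0:
        return "absent"
    carriers = (mother > 0, father > 0)
    if carriers == (False, False):
        return "de novo"
    if carriers == (True, False):
        return "maternal"
    if carriers == (False, True):
        return "paternal"
    return "ambiguous"
```

For example, a heterozygous child variant with frequencies `{"dbA": 5e-5, "dbB": None}` (hypothetical databases) passes the rarity filter. With both parents homozygous reference, `trio_origin(1, 0, 0)` labels it de novo.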

    Exome sequencing identifies variants in infants with sacral agenesis

    Background: Sacral agenesis (SA) consists of partial or complete absence of the caudal end of the spine and often presents with additional birth defects. Several studies have examined gene variants for syndromic forms of SA, but only one has examined exomes of children with non-syndromic SA. Methods: Using buccal cell specimens from families of children with non-syndromic SA, we sequenced the exomes of 28 child–parent trios (eight with and 20 without a maternal diagnosis of pregestational diabetes) and two child–father duos (neither with a maternal diagnosis of pregestational diabetes). Results: Three children had heterozygous missense variants in ID1 (Inhibitor of DNA Binding 1) with CADD scores >20 (top 1% of deleterious variants in the genome); two children inherited the variant from their fathers and one from their mother. Rare missense variants were also detected in PDZD2 (PDZ Domain Containing 2; N = 1) and SPTBN5 (Spectrin Beta, Non-erythrocytic 5; N = 2), two genes previously suggested to be associated with SA etiology. Examination of variants under autosomal recessive and X-linked recessive inheritance models identified five and two missense variants, respectively. Compound heterozygous variants were identified in several genes. In addition, 12 de novo variants were identified, all in different genes in different children. Conclusions: To our knowledge, this is the first study reporting a possible association between ID1 and non-syndromic SA. Although maternal pregestational diabetes has been strongly associated with SA, the ID1 missense variants were paternally inherited in two of the three children. These findings add to the knowledge of gene variants associated with non-syndromic SA and provide data for future studies.
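
The parenthetical "CADD scores >20 (top 1%)" follows from how CADD's scaled (PHRED-like) score is defined. As we understand that scaling, a variant's raw score is ranked among the scores of all N possible single-nucleotide variants, where rank r = 1 is the most deleterious. The rank is then log-transformed. The symbols r and N are introduced here for illustration.

```latex
\mathrm{CADD}_{\mathrm{PHRED}} = -10\,\log_{10}\!\left(\frac{r}{N}\right)
\qquad\Longrightarrow\qquad
\mathrm{CADD}_{\mathrm{PHRED}} > 20 \iff \frac{r}{N} < 10^{-2}
```

By the same arithmetic, a scaled score of 10 corresponds to the top 10% and a score of 30 to the top 0.1%.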