23 research outputs found

    An Exon-Based Comparative Variant Analysis Pipeline to Study the Scale and Role of Frameshift and Nonsense Mutation in the Human-Chimpanzee Divergence

    Chimpanzees and humans are closely related but differ with respect to many deadly human diseases and other characteristics of physiology, anatomy, and pathology. In spite of decades of extensive research, crucial questions about the molecular mechanisms behind these differences remain unanswered. Here I report ExonVar, a novel computational pipeline for Exon-based human-chimpanzee comparative Variant analysis. The objective is to comparatively analyze mutations, specifically frameshift and nonsense mutations, and to assess their scale and potential impact on human-chimpanzee divergence. Genome-wide analysis of human and chimpanzee exons with ExonVar identified a number of species-specific, exon-disrupting mutations in chimpanzees but far fewer in humans. Many were found in genes involved in important biological processes such as T cell lineage development, the pathogenesis of inflammatory diseases, and antigen-induced cell death. A “less-is-more” model was previously established to illustrate the role of gene inactivation and disruption during human evolution. The present analysis suggests a different model, in which chimpanzee-specific exon-disrupting mutations may act as an additional evolutionary force that drove the human-chimpanzee divergence. Finally, the analysis revealed a number of sequencing errors in the chimpanzee and human genome sequences and further illustrated that they could be corrected without resequencing.
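
    The two mutation classes named in this abstract can be illustrated with a minimal sketch. This is not ExonVar's code: the function names and the simplifying assumptions (each exon starts in reading frame 0, and only the three standard stop codons are checked) are hypothetical and chosen for illustration only.

```python
# Illustrative only: a toy classifier, not the ExonVar pipeline.
# Assumption: both sequences begin in reading frame 0.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def first_stop(cds):
    """Return the 0-based codon index of the first in-frame stop, or None."""
    cds = cds.upper()
    for i in range(0, len(cds) - 2, 3):
        if cds[i:i + 3] in STOP_CODONS:
            return i // 3
    return None


def classify(ref_exon, alt_exon):
    """Label alt_exon relative to ref_exon as frameshift, nonsense, or in-frame."""
    # A length change that is not a multiple of 3 shifts the reading frame.
    if (len(alt_exon) - len(ref_exon)) % 3 != 0:
        return "frameshift"
    ref_stop = first_stop(ref_exon)
    alt_stop = first_stop(alt_exon)
    # A stop codon appearing earlier than in the reference truncates the protein.
    if alt_stop is not None and (ref_stop is None or alt_stop < ref_stop):
        return "nonsense"
    return "in-frame"
```

    For instance, changing the codon TGG to TGA in an otherwise identical exon is reported as "nonsense", while deleting a single base is reported as "frameshift".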

    Pathogenic Bacillus anthracis in the Progressive Gene Losses and Gains in Adaptive Evolution

    Background: Sequence mutations represent a driving force of adaptive evolution in bacterial pathogens. This is especially evident in reductive genome evolution, where bacteria shifted from a free-living lifestyle to a strictly intracellular or host-dependent one. The shift resulted in loss-of-function mutations and/or the acquisition of virulence gene clusters. Bacillus anthracis shares a common soil bacterial ancestor with its closely related Bacillus species but is the only obligate, causative agent of inhalation anthrax within the genus Bacillus. The anthrax-causing Bacillus anthracis experienced similar lifestyle changes. We thus hypothesized that this bacterial pathogen would follow a comparable evolutionary path. Results: In this study, a cluster-based evolution scheme was devised to analyze genes that were gained by or lost from B. anthracis. The study detected gene losses and gains at two separate evolutionary stages. Stage I is when B. anthracis and its sister species within the Bacillus cereus group diverged from other species in the genus Bacillus. Stage II is when B. anthracis differentiated from its two closest relatives, B. cereus and B. thuringiensis. Many genes gained at these stages are homologues of known pathogenic factors, such as those for internalin, B. anthracis-specific toxins, and large groups of surface proteins and lipoproteins. Conclusion: The analysis presented here allowed us to portray a progressive evolutionary process during the lifestyle shift of B. anthracis, thus providing new insights into how B. anthracis evolved and holding promise for finding drug and vaccine targets for this strategically important pathogen.

    PATRIC: The VBI PathoSystems Resource Integration Center

    The PathoSystems Resource Integration Center (PATRIC) is one of eight Bioinformatics Resource Centers (BRCs) funded by the National Institute of Allergy and Infectious Diseases (NIAID) to create a data and analysis resource for selected NIAID priority pathogens, specifically proteobacteria of the genera Brucella, Rickettsia and Coxiella, and corona-, calici- and lyssaviruses and viruses associated with hepatitis A and E. The goal of the project is to provide a comprehensive bioinformatics resource for these pathogens, including consistently annotated genome, proteome and metabolic pathway data to facilitate research into counter-measures, including drugs, vaccines and diagnostics. The project’s curation strategy has three prongs: ‘breadth first’ beginning with whole-genome and proteome curation using standardized protocols, a ‘targeted’ approach addressing the specific needs of researchers and an integrative strategy to leverage high-throughput experimental data (e.g. microarrays, proteomics) and literature. The PATRIC infrastructure consists of a relational database, analytical pipelines and a website which supports browsing, querying, data visualization and the ability to download raw and curated data in standard formats. At present, the site warehouses complete sequences for 17 bacterial and 332 viral genomes. The PATRIC website (https://patric.vbi.vt.edu) will continually grow with the addition of data, analysis and functionality over the course of the project.

    GenHtr: A Tool for Comparative Assessment of Genetic Heterogeneity in Microbial Genomes Generated by Massive Short-Read Sequencing

    Background: Microevolution is the study of short-term changes of alleles within a population and their effects on the phenotype of organisms. The result of this below-species-level evolution is heterogeneity, where populations consist of subpopulations with a large number of structural variations. Heterogeneity analysis is thus essential to our understanding of how selective and neutral forces shape bacterial populations over a short period of time. The Solexa Genome Analyzer, a next-generation sequencing platform, allows millions of short sequencing reads to be obtained with great accuracy, making it possible to study the dynamics of a bacterial population at the whole-genome level. The tool referred to as GenHtr was developed for genome-wide heterogeneity analysis. Results: For particular bacterial strains, GenHtr relies on a set of Solexa short reads from given bacterial pathogens and their isogenic reference genome to identify heterogeneity sites, the chromosomal positions with multiple variants of genes in the bacterial population, and variations that occur in large gene families. GenHtr accomplishes this by building and comparatively analyzing genome-wide heterogeneity genotypes for both the newly sequenced genomes (using massive short-read sequencing) and their isogenic reference (using simulated data). As proof of concept, this approach was applied to SRX007711, the Solexa sequencing data for a newly sequenced Staphylococcus aureus subsp. USA300 cell line, and demonstrated that it could predict such multiple variants. They include multiple variants of genes critical in pathogenesis, e.g. genes encoding a LysR family transcriptional regulator, 23S ribosomal RNA, and DNA mismatch repair protein MutS. The heterogeneity results in nonsynonymous and nonsense mutations, leading to truncated proteins for both LysR and MutS. Conclusion: GenHtr was developed for genome-wide heterogeneity analysis. Although it is much more time-consuming than Maq, a popular tool for SNP analysis, GenHtr is able to predict potential multiple variants that pre-exist in the bacterial population as well as SNPs that occur in highly duplicated gene families. It is expected that, with the proper experimental design, this analysis can improve our understanding of the molecular mechanism underlying the dynamics and the evolution of drug-resistant bacterial pathogens.

    CFI2P: Coarse-to-Fine Cross-Modal Correspondence Learning for Image-to-Point Cloud Registration

    In the context of image-to-point cloud registration, acquiring point-to-pixel correspondences is a challenging task, since the similarity between individual points and pixels is ambiguous due to the visual differences between the data modalities. Nevertheless, the same object present in the two data formats can be readily identified from the local perspective of point sets and pixel patches. Motivated by this intuition, we propose a coarse-to-fine framework that emphasizes the establishment of correspondences between local point sets and pixel patches, followed by the refinement of results at both the point and pixel levels. On a coarse scale, we mimic the classic Visual Transformer to translate both image and point cloud into two sequences of local representations, namely point and pixel proxies, and employ attention to capture global and cross-modal contexts. To supervise the coarse matching, we propose a novel projected point proportion loss, which guides the matching of point sets to the pixel patches into which more of their points can be projected. On a finer scale, point-to-pixel correspondences are then refined from a smaller search space (i.e., the coarsely matched sets and patches) via well-designed sampling, attentional learning and fine matching, where sampling masks are embedded in the last two steps to mitigate the negative effect of sampling. With the high-quality correspondences, the registration problem is then resolved by the EPnP algorithm within RANSAC. Experimental results on large-scale outdoor benchmarks demonstrate our superiority over existing methods.
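
    The quantity that the coarse matching is described as favoring, the share of a point set's points that project into each pixel patch, can be sketched as follows. The abstract does not give the loss itself, so this only computes a plausible supervision target under stated assumptions: a pinhole camera with hypothetical intrinsics fx, fy, cx, cy, points already expressed in the camera frame, and square patches of side patch. The function name is hypothetical as well.

```python
# Illustrative only: computes per-patch projection proportions for one point set.
# Assumptions (not from the source): pinhole model, points in the camera frame,
# square image patches indexed as (row, col).
def projected_point_proportion(points, fx, fy, cx, cy, width, height, patch):
    """Return {(row, col): fraction of the point set projecting into that patch}."""
    counts = {}
    for x, y, z in points:
        if z <= 0:
            continue  # points behind the camera cannot be projected
        u = fx * x / z + cx  # pixel column
        v = fy * y / z + cy  # pixel row
        if 0 <= u < width and 0 <= v < height:
            key = (int(v // patch), int(u // patch))
            counts[key] = counts.get(key, 0) + 1
    n = len(points)
    return {key: c / n for key, c in counts.items()} if n else {}
```

    Under this reading, the patch with the largest fraction would be the preferred coarse match for the point set; points that fall outside the image or behind the camera lower every patch's share.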

    Deep Domain Adversarial Adaptation for Photon-efficient Imaging

    Photon-efficient imaging with single-photon light detection and ranging (LiDAR) captures the three-dimensional (3D) structure of a scene from only a few detected signal photons per pixel. However, the existing computational methods for photon-efficient imaging are pre-tuned on a restricted scenario or trained on simulated datasets. When applied to realistic scenarios whose signal-to-background ratios (SBR) and other hardware-specific properties differ from those of the original task, model performance often deteriorates significantly. In this paper, we present a domain adversarial adaptation design to alleviate this domain shift problem by exploiting unlabeled real-world data, with significant resource savings. This method demonstrates superior performance in simulated and real-world experiments using our home-built up-conversion single-photon imaging system, providing an efficient way to bypass the lack of ground-truth depth information when implementing computational imaging algorithms for realistic applications.

    A Versatile Computational Pipeline for Bacterial Genome Annotation Improvement and Comparative Analysis, with Brucella as a Use Case

    We present a bacterial genome computational analysis pipeline, called GenVar. The pipeline, based on the program GeneWise, is designed to analyze an annotated genome and automatically identify missed gene calls and sequence variants such as genes with disrupted reading frames (split genes) and those with insertions and deletions (indels). For a given genome to be analyzed, GenVar relies on a database containing closely related genomes (such as other species or strains) as well as a few additional reference genomes. GenVar also helps identify gene disruptions probably caused by sequencing errors. We exemplify GenVar’s capabilities by presenting results from the analysis of four Brucella genomes. Brucella is an important human pathogen and zoonotic agent. The analysis revealed hundreds of missed gene calls, new split genes and indels, several of which are species-specific and hence provide valuable clues to understanding the genomic basis of Brucella pathogenicity and host specificity.

    Genome Sequence of Brucella abortus Vaccine Strain S19 Compared to Virulent Strains Yields Candidate Virulence Genes

    The Brucella abortus strain S19, a spontaneously attenuated strain, has been used as a vaccine strain in vaccination of cattle against brucellosis for six decades. Despite many studies, the physiological and molecular mechanisms causing the attenuation are not known. We have applied pyrosequencing technology together with conventional sequencing to rapidly and comprehensively determine the complete genome sequence of the attenuated Brucella abortus vaccine strain S19. The main goal of this study is to identify candidate virulence genes by systematic comparative analysis of the attenuated strain with the published genome sequences of two virulent and closely related strains of B. abortus, 9–941 and 2308. The two S19 chromosomes are 2,122,487 and 1,161,449 bp in length. A total of 3062 genes were identified and annotated. Pairwise and reciprocal genome comparisons resulted in a total of 263 genes that were non-identical between the S19 genome and either of the two virulent strains. Amongst these, 45 genes were consistently different between the attenuated strain and the two virulent strains but were identical amongst the virulent strains; they included only two of the 236 genes that have been implicated as virulence factors in the literature. The functional analyses of the differences have revealed a total of 24 genes that may be associated with the loss of virulence in S19. Of particular relevance are four genes with a consistent difference of more than 60 bp in S19 compared to both virulent strains, which, in the virulent strains, encode an outer membrane protein and three proteins involved in erythritol uptake or metabolism.

    New Implications on Genomic Adaptation Derived from the Helicobacter pylori Genome Comparison

    BACKGROUND: Helicobacter pylori has a reduced genome and persists long-term in a harsh environment. It evolved particular characteristics for biological adaptation. Because several H. pylori genome sequences are available, comparative analysis could help to better understand the genomic adaptation of this particular bacterium. PRINCIPAL FINDINGS: We analyzed nine H. pylori genomes with emphasis on microevolution from a different perspective. Inversion was an important factor in shaping the genome structure. Illegitimate recombination led not only to genomic inversion but also to inverted fragment duplication, both of which contributed to the creation of new genes and gene families; in addition, homologous recombination contributed to inversion events. Based on the information on genomic rearrangement, the first genome scaffold structure of the H. pylori last common ancestor was produced. The core genome consists of 1186 genes, of which 22 genes could be particularly adapted to the human stomach niche. H. pylori contains a high proportion of pseudogenes whose genesis was principally caused by homopolynucleotide (HPN) mutations. Such mutations are reversible and facilitate the control of gene expression through changes in DNA structure. The reversible mutations and a quasi-panmictic feature could allow such genes or gene fragments to be transferred frequently within or between populations. Hence, pseudogenes could be a reservoir of adaptation materials, and the HPN mutations could be favorable to H. pylori adaptation, leading to HPN accumulation in the genomes, which corresponds to a special feature of Helicobacter species: an extremely high HPN composition of the genome. CONCLUSION: Our research demonstrated that both the genome content and the structure of H. pylori are highly adapted to its particular lifestyle.
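
    Homopolynucleotide (HPN) tracts of the kind this abstract discusses can be located with a short scan. The sketch below is illustrative only and is not the authors' method: the function names are hypothetical, and the minimum run length of 6 is an assumed threshold, since the abstract does not state one.

```python
import re


# Illustrative only: min_len=6 is an assumed threshold, not taken from the source.
def hpn_runs(seq, min_len=6):
    """Return (start, base, length) for each run of one base at least min_len long."""
    pattern = re.compile(r"([ACGT])\1{%d,}" % (min_len - 1))
    return [(m.start(), m.group(1), m.end() - m.start())
            for m in pattern.finditer(seq.upper())]


def hpn_fraction(seq, min_len=6):
    """Fraction of the sequence covered by such runs (0.0 for an empty sequence)."""
    if not seq:
        return 0.0
    return sum(length for _, _, length in hpn_runs(seq, min_len)) / len(seq)
```

    Applied to each genome, hpn_fraction gives one rough way to compare HPN composition across species; a slipped-strand change in the length of any reported run is the sort of reversible event that could switch a reading frame off and on.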