
    PathSys: integrating molecular interaction graphs for systems biology

    BACKGROUND: The goal of information integration in systems biology is to combine information from a number of databases and data sets, which are obtained from both high- and low-throughput experiments, under one data management scheme such that the cumulative information provides greater biological insight than is possible with individual information sources considered separately. RESULTS: Here we present PathSys, a graph-based system for creating a combined database of interaction networks for generating an integrated view of biological mechanisms. We used PathSys to integrate over 14 curated and publicly contributed data sources for the budding yeast (S. cerevisiae) and Gene Ontology. A number of exploratory questions were formulated as a combination of relational and graph-based queries to the integrated database. Thus, PathSys is a general-purpose, scalable, graph-data warehouse of biological information, complete with a graph manipulation and query language, a storage mechanism and a generic data-importing mechanism through schema-mapping. CONCLUSION: Results from several test studies demonstrate the effectiveness of the approach in retrieving biologically interesting relations between genes and proteins and the networks connecting them, and the utility of PathSys as a scalable graph-based warehouse for interaction-network integration and as a hypothesis generator system. The PathSys client software, named BiologicalNetworks, developed for navigation and analyses of molecular networks, is available as a Java Web Start application at

    BioHealthBase: informatics support in the elucidation of influenza virus host–pathogen interactions and virulence

    The BioHealthBase Bioinformatics Resource Center (BRC) (http://www.biohealthbase.org) is a public bioinformatics database and analysis resource for the study of specific biodefense and public health pathogens—Influenza virus, Francisella tularensis, Mycobacterium tuberculosis, Microsporidia species and ricin toxin. The BioHealthBase serves as an extensive integrated repository of data imported from public databases, data derived from various computational algorithms and information curated from the scientific literature. The goal of the BioHealthBase is to facilitate the development of therapeutics, diagnostics and vaccines by integrating all available data in the context of host–pathogen interactions, thus allowing researchers to understand the root causes of virulence and pathogenicity. Genome and protein annotations can be viewed either as formatted text or graphically through a genome browser. 3D visualization capabilities allow researchers to view proteins with key structural and functional features highlighted. Influenza virus host–pathogen interactions at the molecular/cellular and systemic levels are represented. Host immune response to influenza infection is conveyed through the display of experimentally determined antibody and T-cell epitopes curated from the scientific literature or as derived from computational predictions. At the molecular/cellular level, the BioHealthBase BRC has developed biological pathway representations relevant to influenza virus host–pathogen interaction in collaboration with the Reactome database (http://www.reactome.org)

    Complete Genomic Characterization of a Pathogenic A.II Strain of Francisella tularensis Subspecies tularensis

    Francisella tularensis is the causative agent of tularemia, which is a highly lethal disease from nature and potentially from a biological weapon. This species contains four recognized subspecies including the North American endemic F. tularensis subsp. tularensis (type A), whose genetic diversity is correlated with its geographic distribution including a major population subdivision referred to as A.I and A.II. The biological significance of the A.I – A.II genetic differentiation is unknown, though there are suggestive ecological and epidemiological correlations. In order to understand the differentiation at the genomic level, we have determined the complete sequence of an A.II strain (WY96-3418) and compared it to the genome of Schu S4 from the A.I population. We find that this A.II genome is 1,898,476 bp in size with 1,820 genes, 1,303 of which code for proteins. While extensive genomic variation exists between “WY96” and Schu S4, there is only one whole gene difference. This one gene difference is a hypothetical protein of unknown function. In contrast, there are numerous SNPs (3,367), small indels (1,015), IS element differences (7) and large chromosomal rearrangements (31), including both inversions and translocations. The rearrangement borders are frequently associated with IS elements, which would facilitate intragenomic recombination events. The pathogenicity island duplicated regions (DR1 and DR2) are essentially identical in WY96 but vary relative to Schu S4 at 60 nucleotide positions. Other potential virulence-associated genes (231) varied at 559 nucleotide positions, including 357 non-synonymous changes. Molecular clock estimates for the divergence time between A.I and A.II genomes for different chromosomal regions ranged from 866 to 2131 years before present. This paper is the first complete genomic characterization of a member of the A.II clade of Francisella tularensis subsp. tularensis

    Complete Genome Sequence of Francisella tularensis Subspecies holarctica FTNF002-00

    The Francisella tularensis subspecies holarctica strain FTNF002-00 was originally obtained from the first known clinical case of bacteremic F. tularensis pneumonia in Southern Europe, isolated from an immunocompetent individual. The FTNF002-00 complete genome contains the RD23 deletion and represents a type strain for a clonal population from the first epidemic tularemia outbreak in Spain, between 1997 and 1998. Here, we present the complete sequence analysis of the FTNF002-00 genome. The complete genome sequence of FTNF002-00 revealed several large as well as small genomic differences with respect to two other published complete genome sequences of F. tularensis subsp. holarctica strains, LVS and OSU18. The FTNF002-00 genome shares >99.9% sequence similarity with LVS and OSU18, and is also ∼5 MB smaller by comparison. The overall organization of the FTNF002-00 genome is nearly identical to those of LVS and OSU18, except for a single 3.9 kb inversion in FTNF002-00. Twelve regions of difference ranging from 0.1 to 1.5 kb and forty-two small insertions and deletions were identified in a comparative analysis of the FTNF002-00, LVS, and OSU18 genomes. Two small deletions appear to inactivate two genes in FTNF002-00, causing them to become pseudogenes; the intact genes encode a protein of unknown function and a drug:H+ antiporter. In addition, we identified ninety-nine proteins in FTNF002-00 containing amino acid mutations compared to LVS and OSU18. Several non-conserved amino acid replacements were identified, one of which occurs in the virulence-associated intracellular growth locus subunit D protein. Many of these changes in FTNF002-00 are likely the consequence of direct selection that increases the fitness of this subsp. holarctica clone within its endemic population. Our complete genome sequence analyses lay the foundation for experimental testing of these possibilities

    List of 99 FTNF002-00 proteins with amino acid mutations compared to LVS and OSU18.

    103 amino acid mutations affect 99 proteins in FTNF002-00. The four proteins that contain two mutations each are FTA_0793, FTA_1126, FTA_1828, and FTA_2061.
    The list is sorted in the ascending order of the score in column 3. The score for amino acid change is based on the BLOSUM62 scoring matrix. A lower score indicates a non-conserved change, while a higher score indicates a conserved change.
    The amino acid change in column 2 represents a change from an amino acid that occurs in LVS and OSU18 protein to the amino acid in the FTNF002-00 ortholog.
    The COG category in the last column represents the single letter code for the functional COG categories (http://www.ncbi.nlm.nih.gov/COG/grace/fiew.cgi).
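The ordering described in this caption can be illustrated with a minimal sketch. It assumes a hand-entered excerpt of BLOSUM62 scores covering only the residue pairs used here; the locus names and residue changes are hypothetical placeholders, not entries from the actual table, and the function names are illustrative.

```python
# Excerpt of BLOSUM62 substitution scores, keyed by alphabetically sorted
# residue pairs. Only the pairs needed for this illustration are included.
BLOSUM62_EXCERPT = {
    ("D", "E"): 2,   # acidic to acidic: conserved
    ("I", "V"): 3,   # aliphatic to aliphatic: conserved
    ("D", "L"): -4,  # acidic to hydrophobic: non-conserved
    ("G", "W"): -2,  # small to bulky aromatic: non-conserved
}


def blosum62_score(ref, alt):
    """Look up the score for a substitution; the matrix is symmetric."""
    return BLOSUM62_EXCERPT[tuple(sorted((ref, alt)))]


def rank_by_conservation(records):
    """Return (locus, change, score) rows sorted by ascending score,
    so the least conserved changes come first."""
    scored = [
        (locus, f"{ref}->{alt}", blosum62_score(ref, alt))
        for locus, ref, alt in records
    ]
    return sorted(scored, key=lambda row: row[2])


# Hypothetical records: (locus, residue in LVS/OSU18, residue in FTNF002-00).
mutations = [
    ("locus_A", "I", "V"),
    ("locus_B", "L", "D"),
    ("locus_C", "G", "W"),
    ("locus_D", "E", "D"),
]

ranked = rank_by_conservation(mutations)
for locus, change, score in ranked:
    print(locus, change, score)
```

Sorting in ascending order places the substitutions most likely to alter protein function at the top of the list, which matches the reading order the caption describes.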

    A 10 kb hypervariable genomic region carrying three regions of difference between the three F. t. holarctica genomes.

    Arrows indicate the direction of gene transcription. Genes and intergenic regions are not to scale. Homologous genes in the three genomes have been assigned the same color. Solid arrows indicate putative functional genes and hatched or grey arrows represent pseudogenes. The horizontal box on the FTNF002-00 chromosomal fragment indicates the 3,868 bp fragment (positioned at 247478–251345) that is inverted with respect to the other two genomes. A vertical box over the three genomes indicates the location of a 962 bp region that is different in OSU18 (see Table 2, http://www.plosone.org/article/info:doi/10.1371/journal.pone.0007041#pone-0007041-t002). The two equal sized horizontal boxes on the OSU18 genomic fragment indicate tandem duplications of a 1933 bp region (see Table 2).

    Overview of general genome features of fully sequenced F. tularensis strains [1].

    [1] Information presented here was obtained from the GenBank database.
    [2] Organisms are grouped by subspecies in order of ascending genome sizes. The first three are Type B strains; organisms 4, 5, and 6 are Type A strains. While 4 and 5 are subtype A.I strains, organism 6 is a subtype A.II strain.
    [3] In all seven organisms, the rRNA genes are organized in 3 operons of 16S, 23S, and 5S rRNAs and an additional orphan 5S rRNA. In F. tularensis subsp. novicida U112, one of the 5S rRNAs that forms an operon with the 16S (FTN_1308) and the 23S (FTN_1305) rRNAs is not annotated and was found by BLAST search.
    [4] Each organism encodes a single molecule each of the RNA component of ribonuclease P, a tmRNA, and an RFN element. Two copies of the RNA component of the signal recognition particle (SRP) are encoded in each genome. Only 14 of the 35 structural RNAs are annotated in the NCBI GenBank database. Many of the missing structural RNAs could be identified in the genome using BLAST.

    Small insertions and deletions (indels) in the holarctica genomes [a].

    [a] Insertions and deletions smaller than 100 bp and larger than 10 bp in the genomes of FTNF002-00, LVS, and OSU18 are listed.
    [b] The fragments are listed in ascending order of their location in the FTNF002-00 genome. The length of the fragment is given in parentheses if the sequence occurs in FTNF002-00, while a “^” indicates the position of a sequence that is absent from the FTNF002-00 genome.
    [c] Locus tags of genes contained in or overlapping the fragment are listed; FTA => FTNF002-00, FTL => LVS, and FTH => OSU18 genomes.