11 research outputs found

A pairwise genic comparison of 12 NTHi strains of Haemophilus influenzae and the reference strain Rd KW20

Author: Benjamin Janto (34106)
Fen Z Hu (34099)
Garth D Ehrlich (34110)
J Christopher Post (48105)
Jay Hayes (48103)
Justin S Hogg (48102)
Randy Keefe (48104)
Robert Boissy (34109)

The comparison of two strains is found at the intersection of the row and column corresponding to the respective strains. Strains are compared based on the number of genes shared between the pair, the number of genes found in one strain but not the other, and the number of shared genes that are unique to that pair of strains. A typical pair of strains differs by 395 genes. Similar pairs of strains are shaded in yellow, while divergent strains are shaded orange.

Copyright information: Taken from "Characterization and modeling of the core and supragenomes based on the complete genomic sequences of Rd and 12 clinical nontypeable strains", http://genomebiology.com/2007/8/6/R103. Genome Biology 2007;8(6):R103. Published online 5 Jun 2007. PMCID: PMC2394751.
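The three quantities in this pairwise comparison can be illustrated with plain set operations. The following is a minimal Python sketch, assuming each strain is represented as a set of gene-cluster identifiers. The function name, strain names, and gene IDs are hypothetical placeholders, not data or code from the paper, and "differs by" is taken here to mean the symmetric difference of the two gene sets.

```python
from itertools import combinations


def pairwise_genic_comparison(strain_genes):
    """Compare every pair of strains by gene content.

    strain_genes maps a strain name to the set of gene-cluster IDs it carries.
    Returns {(a, b): {"shared": int, "differences": int, "pair_unique": int}}.
    """
    results = {}
    for a, b in combinations(sorted(strain_genes), 2):
        genes_a, genes_b = strain_genes[a], strain_genes[b]
        # Genes present in both strains of the pair.
        shared = genes_a & genes_b
        # Genes present in exactly one strain of the pair.
        differences = genes_a ^ genes_b
        # Union of genes carried by every strain outside the pair.
        others = set()
        for name, genes in strain_genes.items():
            if name not in (a, b):
                others |= genes
        results[(a, b)] = {
            "shared": len(shared),
            "differences": len(differences),
            # Shared genes that no other strain carries.
            "pair_unique": len(shared - others),
        }
    return results


# Hypothetical toy data: three strains, six gene clusters.
toy = {
    "strainA": {"g1", "g2", "g3", "g4"},
    "strainB": {"g1", "g2", "g4", "g5"},
    "strainC": {"g1", "g3", "g6"},
}
table = pairwise_genic_comparison(toy)
print(table[("strainA", "strainB")])
```

Each entry of `table` corresponds to one cell of the row-by-column comparison; averaging the `differences` values over all pairs would give the kind of "typical pair" figure quoted in the caption.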

FigShare

A plot of the total number of clusters as a function of clustering parameters shows an inflection point near 0.65 identity and 0.70 match length

Author: Benjamin Janto (34106)
Fen Z Hu (34099)
Garth D Ehrlich (34110)
J Christopher Post (48105)
Jay Hayes (48103)
Justin S Hogg (48102)
Randy Keefe (48104)
Robert Boissy (34109)

The inflection, which minimizes the rate of change in the number of clusters per change in parameters, suggests a set of parameters that optimally segregates orthologs and paralogs.

Copyright information: Taken from "Characterization and modeling of the core and supragenomes based on the complete genomic sequences of Rd and 12 clinical nontypeable strains", http://genomebiology.com/2007/8/6/R103. Genome Biology 2007;8(6):R103. Published online 5 Jun 2007. PMCID: PMC2394751.
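The idea of choosing the parameter value where the cluster count changes least can be sketched with a finite difference. This is an illustrative Python sketch under stated assumptions: it scans a single parameter (identity threshold) rather than the two parameters varied in the figure, the function name is hypothetical, and the threshold and cluster-count values are invented for the example.

```python
def flattest_point(params, counts):
    """Return (parameter, |slope|) at the interior point where counts change least.

    The rate of change is estimated with a central finite difference.
    """
    if len(params) != len(counts) or len(params) < 3:
        raise ValueError("need at least three matched points")
    best_param, best_slope = None, None
    for i in range(1, len(params) - 1):
        slope = abs(
            (counts[i + 1] - counts[i - 1]) / (params[i + 1] - params[i - 1])
        )
        if best_slope is None or slope < best_slope:
            best_param, best_slope = params[i], slope
    return best_param, best_slope


# Hypothetical identity thresholds and resulting cluster counts.
identity = [0.45, 0.55, 0.65, 0.75, 0.85]
clusters = [2500, 2700, 2760, 2800, 3100]
best, slope = flattest_point(identity, clusters)
print(best, slope)
```

Extending the same scan to a grid over identity and match length would only require comparing the gradient magnitude at each grid point instead of a single slope.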

FigShare

A 40 kb region present in Rd KW20 shows two blocks of genomic variation among other strains

Author: Benjamin Janto (34106)
Fen Z Hu (34099)
Garth D Ehrlich (34110)
J Christopher Post (48105)
Jay Hayes (48103)
Justin S Hogg (48102)
Randy Keefe (48104)
Robert Boissy (34109)

The upstream block is bounded on the right by a frame-shifted insertion sequence (IS) element (HI1018). The downstream block (HI1024-HI1032) includes genes with likely roles in sugar transport and metabolism. Rd is used as a reference for the alignment, and sequence present in other strains without homology to Rd is not shown.

Copyright information: Taken from "Characterization and modeling of the core and supragenomes based on the complete genomic sequences of Rd and 12 clinical nontypeable strains", http://genomebiology.com/2007/8/6/R103. Genome Biology 2007;8(6):R103. Published online 5 Jun 2007. PMCID: PMC2394751.

FigShare

A 20 kb region that demonstrates strain diversity at the level of an individual gene (lic2C), a pair of genes (NTHi0683/4), and a group of seven functionally related genes (urease system)

Author: Benjamin Janto (34106)
Fen Z Hu (34099)
Garth D Ehrlich (34110)
J Christopher Post (48105)
Jay Hayes (48103)
Justin S Hogg (48102)
Randy Keefe (48104)
Robert Boissy (34109)

86-028NP is used as a reference for the alignment, and sequence present in other strains without homology to 86-028NP is not shown.

Copyright information: Taken from "Characterization and modeling of the core and supragenomes based on the complete genomic sequences of Rd and 12 clinical nontypeable strains", http://genomebiology.com/2007/8/6/R103. Genome Biology 2007;8(6):R103. Published online 5 Jun 2007. PMCID: PMC2394751.

FigShare

A multi-sequence alignment using 86-028NP as a reference shows varying degrees of homology among 6 strains to a 50 kb region homologous to the plasmid ICEhin1056

Author: Benjamin Janto (34106)
Fen Z Hu (34099)
Garth D Ehrlich (34110)
J Christopher Post (48105)
Jay Hayes (48103)
Justin S Hogg (48102)
Randy Keefe (48104)
Robert Boissy (34109)

The plasmid is integrated in 86-028NP and is partially present in R2866, but absent from the other strains in the alignment. Sequences present in other strains without homology to 86-028NP are not shown.

Copyright information: Taken from "Characterization and modeling of the core and supragenomes based on the complete genomic sequences of Rd and 12 clinical nontypeable strains", http://genomebiology.com/2007/8/6/R103. Genome Biology 2007;8(6):R103. Published online 5 Jun 2007. PMCID: PMC2394751.

FigShare

The distribution of genes among gene classes in the supragenome model trained on 8 or 13 strains

Author: Benjamin Janto (34106)
Fen Z Hu (34099)
Garth D Ehrlich (34110)
J Christopher Post (48105)
Jay Hayes (48103)
Justin S Hogg (48102)
Randy Keefe (48104)
Robert Boissy (34109)

The only significant difference occurs in the rare gene categories with frequency 0.01 and 0.10. A small sample of eight strains is not expected to generate accurate predictions for these categories.

Copyright information: Taken from "Characterization and modeling of the core and supragenomes based on the complete genomic sequences of Rd and 12 clinical nontypeable strains", http://genomebiology.com/2007/8/6/R103. Genome Biology 2007;8(6):R103. Published online 5 Jun 2007. PMCID: PMC2394751.

FigShare

A theoretical plot of the number of new genes expected to be found in the Nth genome for future sequencing projects

Author: Benjamin Janto (34106)
Fen Z Hu (34099)
Garth D Ehrlich (34110)
J Christopher Post (48105)
Jay Hayes (48103)
Justin S Hogg (48102)
Randy Keefe (48104)
Robert Boissy (34109)

The plot was generated using strains isolated in North America, and the extrapolation may not hold for isolates from other geographic locales if some distributed genes are geographically isolated. The model predicts that the number of new genes found in a strain will diminish 20 after sequencing 30 strains, and the number will trend toward 0 as the number of sequences becomes large.

Copyright information: Taken from "Characterization and modeling of the core and supragenomes based on the complete genomic sequences of Rd and 12 clinical nontypeable strains", http://genomebiology.com/2007/8/6/R103. Genome Biology 2007;8(6):R103. Published online 5 Jun 2007. PMCID: PMC2394751.
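The shape of such a curve can be illustrated with a simple frequency-class calculation. This Python sketch is not the paper's fitted model: it assumes genes fall into classes, that each gene in a class is present in any given strain independently with that class's population frequency, and that a gene is "new" in the Nth genome if it is present there and absent from the previous N-1. The class frequencies and gene counts are hypothetical.

```python
def expected_new_genes(classes, n):
    """Expected number of genes first seen in the n-th sequenced genome.

    classes is a list of (population_frequency, number_of_genes) pairs.
    A gene with frequency mu is new in genome n with probability
    mu * (1 - mu) ** (n - 1): present in genome n, absent from the first n - 1.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    return sum(count * mu * (1.0 - mu) ** (n - 1) for mu, count in classes)


# Hypothetical gene classes: (frequency, gene count). Not values from the paper.
toy_classes = [(0.01, 1000), (0.10, 500), (0.50, 300), (1.00, 1400)]

curve = [expected_new_genes(toy_classes, n) for n in range(1, 41)]
for n in (1, 2, 10, 30, 40):
    print(n, round(curve[n - 1], 2))
```

Under these assumptions the curve falls steeply once the common classes are exhausted, and the long tail is driven almost entirely by the rarest class, which is why the expected count trends toward 0 only slowly.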

FigShare

Global alignment of R2866 and PittEE shows a large inversion and several regions unique to each strain

Author: Benjamin Janto (34106)
Fen Z Hu (34099)
Garth D Ehrlich (34110)
J Christopher Post (48105)
Jay Hayes (48103)
Justin S Hogg (48102)
Randy Keefe (48104)
Robert Boissy (34109)

The strains are similar across the majority of the genome; however, there is one large inversion as well as several regions unique to each strain.

Copyright information: Taken from "Characterization and modeling of the core and supragenomes based on the complete genomic sequences of Rd and 12 clinical nontypeable strains", http://genomebiology.com/2007/8/6/R103. Genome Biology 2007;8(6):R103. Published online 5 Jun 2007. PMCID: PMC2394751.

FigShare

The expected number of total gene clusters and core gene clusters identified at the addition of each genome to the clustering dataset

Author: Benjamin Janto (34106)
Fen Z Hu (34099)
Garth D Ehrlich (34110)
J Christopher Post (48105)
Jay Hayes (48103)
Justin S Hogg (48102)
Randy Keefe (48104)
Robert Boissy (34109)

Modeling predictions are based on the eight strain training set (see 'Mathematical development of a finite supragenome model'). The number of genes observed in all strains levels off to an asymptote that corresponds to a core set of genes. The rate of increase in total genes decreases, but does not level off due to the discovery of rare genes.

Copyright information: Taken from "Characterization and modeling of the core and supragenomes based on the complete genomic sequences of Rd and 12 clinical nontypeable strains", http://genomebiology.com/2007/8/6/R103. Genome Biology 2007;8(6):R103. Published online 5 Jun 2007. PMCID: PMC2394751.
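The contrast between a core count that reaches an asymptote and a total count that keeps growing can be reproduced with a small frequency-class calculation. This is a hedged Python sketch, not the paper's 'finite supragenome model' as fitted: it assumes each gene is present in a strain independently with its class frequency, and the class frequencies and gene counts below are invented for illustration.

```python
def expected_total_genes(classes, n):
    """Expected number of distinct genes seen in at least one of n genomes."""
    return sum(count * (1.0 - (1.0 - mu) ** n) for mu, count in classes)


def expected_core_genes(classes, n):
    """Expected number of genes seen in all n genomes."""
    return sum(count * mu ** n for mu, count in classes)


# Hypothetical gene classes: (population frequency, gene count).
toy_classes = [(0.01, 1000), (0.10, 500), (0.50, 300), (1.00, 1400)]

for n in (1, 2, 4, 8, 13, 30):
    total = expected_total_genes(toy_classes, n)
    core = expected_core_genes(toy_classes, n)
    print(n, round(total, 1), round(core, 1))
```

With these toy numbers the core count approaches the size of the frequency-1.00 class, while the total count continues to rise as genes from the 0.01 class are gradually discovered.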

FigShare

Plotting of relationships among the sequenced NTHi strains by gene sharing and multi-locus sequence typing

Author: Benjamin Janto (34106)
Fen Z Hu (34099)
Garth D Ehrlich (34110)
J Christopher Post (48105)
Jay Hayes (48103)
Justin S Hogg (48102)
Randy Keefe (48104)
Robert Boissy (34109)

The first dendrogram is based on genic differences among the 13 strains of Haemophilus influenzae. While several pairs of strains appear to be closely related, there is not a well-defined clade structure. The dendrogram was generated using the unweighted pair group method with arithmetic mean (UPGMA) [44-46]. The number on each branch corresponds to the number of genic differences from the previous branch point.

The second dendrogram is based on sequence alignments of the seven MLST loci. The tree was built using the maximum likelihood method implemented in fastDNAml. The number on each branch corresponds to the number of point mutations per kilobase from the previous branch point.

The topologies of the genic and MLST based trees are different. Most notably, strains PittEE and R2846 are closely related in the genic dendrogram, but are separated in the MLST dendrogram. In other instances, such as PittII and R2866, the strains are closely related in both trees.

Copyright information: Taken from "Characterization and modeling of the core and supragenomes based on the complete genomic sequences of Rd and 12 clinical nontypeable strains", http://genomebiology.com/2007/8/6/R103. Genome Biology 2007;8(6):R103. Published online 5 Jun 2007. PMCID: PMC2394751.

FigShare