40 research outputs found
Transcriptome sequencing of the Microarray Quality Control (MAQC) RNA reference samples using next generation sequencing
BACKGROUND: Transcriptome sequencing using next-generation sequencing platforms will soon be competing with DNA microarray technologies for global gene expression analysis. As a preliminary evaluation of these promising technologies, we performed deep sequencing of cDNA synthesized from the Microarray Quality Control (MAQC) reference RNA samples using Roche's 454 Genome Sequencer FLX. RESULTS: We generated more than 3.6 million sequence reads of average length 250 bp for the MAQC A and B samples and introduced a data analysis pipeline for translating cDNA read counts into gene expression levels. Using BLAST, 90% of the reads mapped to the human genome and 64% of the reads mapped to the RefSeq database of well-annotated genes with e-values ≤ 10⁻²⁰. We measured gene expression levels in the A and B samples by counting the number of reads that mapped to individual RefSeq genes in multiple sequencing runs to evaluate the MAQC quality metrics for reproducibility, sensitivity, specificity, and accuracy, and compared the results with DNA microarrays and quantitative RT-PCR (QRTPCR) from the MAQC studies. In addition, 88% of the reads were aligned directly to the human genome using the AceView alignment programs, with an average sequence similarity of 90%, identifying 137,899 unique exon junctions, including 22,193 new exon junctions not yet contained in the RefSeq database. CONCLUSION: Using the MAQC metrics for evaluating the performance of gene expression platforms, the ExpressSeq results for gene expression levels showed excellent reproducibility, sensitivity, and specificity that improved systematically with increasing shotgun sequencing depth, and quantitative accuracy comparable to DNA microarrays and QRTPCR. In addition, careful mapping of the reads to the genome using the AceView alignment programs shed new light on the complexity of the human transcriptome, including the discovery of thousands of new splice variants.
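The read-counting step summarized above can be illustrated with a minimal Python sketch. It assumes hypothetical BLAST-style hit tuples `(read_id, gene, e_value)`, assigns each read to its single best hit that passes the e-value cutoff of 10⁻²⁰ quoted in the abstract, and scales the counts to reads per million. The gene names, the best-hit rule and the per-million normalization are illustrative assumptions, not the published ExpressSeq pipeline.

```python
from collections import Counter

# Cutoff quoted in the abstract; everything else here is an illustrative assumption.
EVALUE_CUTOFF = 1e-20

def count_reads_per_gene(hits, cutoff=EVALUE_CUTOFF):
    """Assign each read to its lowest-e-value gene at or below the cutoff, then count reads per gene."""
    best = {}  # read_id -> (gene, e_value) of the best hit seen so far
    for read_id, gene, e_value in hits:
        if e_value > cutoff:
            continue  # weak alignment: not counted
        if read_id not in best or e_value < best[read_id][1]:
            best[read_id] = (gene, e_value)
    return Counter(gene for gene, _ in best.values())

def reads_per_million(counts, total_reads):
    """Scale raw counts by sequencing depth (hypothetical normalization helper)."""
    return {gene: n * 1e6 / total_reads for gene, n in counts.items()}

# Hypothetical hits; GENE_A and GENE_B are placeholder names.
hits = [
    ("r1", "GENE_A", 1e-80),
    ("r1", "GENE_B", 1e-40),  # weaker secondary hit for r1, superseded by GENE_A
    ("r2", "GENE_A", 1e-60),
    ("r3", "GENE_B", 1e-50),
    ("r4", "GENE_B", 1e-5),   # fails the e-value cutoff
]
counts = count_reads_per_gene(hits)
expression = reads_per_million(counts, total_reads=len({h[0] for h in hits}))
print(counts, expression)
```

Counting each read once, at its best hit, avoids inflating genes that share sequence with paralogs or pseudogenes; a real pipeline would also have to decide how to treat reads with tied best hits.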
The statistics of identifying differentially expressed genes in Expresso and TM4: a comparison
BACKGROUND: Analysis of DNA microarray data takes as input spot intensity measurements from scanner software and returns differential expression of genes between two conditions, together with a statistical significance assessment. This process typically consists of two steps: data normalization and identification of differentially expressed genes through statistical analysis. The Expresso microarray experiment management system implements these steps with a two-stage, log-linear ANOVA mixed model technique, tailored to individual experimental designs. The complement of tools in TM4, on the other hand, is based on a number of preset design choices that limit its flexibility. In the TM4 microarray analysis suite, normalization, filter, and analysis methods form an analysis pipeline. TM4 computes integrated intensity values (IIV) from the average intensities and spot pixel counts returned by the scanner software as input to its normalization steps. By contrast, Expresso can use either IIV data or median intensity values (MIV). Here, we compare Expresso and TM4 analyses of two experiments and assess the results against qRT-PCR data. RESULTS: Expresso analysis using MIV data consistently identifies more genes as differentially expressed than Expresso analysis using IIV data. The typical TM4 normalization and filtering pipeline corrects systematic intensity-specific bias on a per-microarray basis. Subsequent statistical analysis with Expresso or a TM4 t-test can effectively identify differentially expressed genes. The best agreement with qRT-PCR data is obtained through the use of Expresso analysis and MIV data. CONCLUSION: The results of this research are of practical value to biologists who analyze microarray data sets. The TM4 normalization and filtering pipeline corrects microarray-specific systematic bias and complements the normalization stage in Expresso analysis. The results of Expresso using MIV data have the best agreement with qRT-PCR results.
In one experiment, MIV is a better choice than IIV as input to data normalization and statistical analysis methods, as it yields a greater number of statistically significant differentially expressed genes; TM4 does not support MIV as input data. Overall, the more flexible and extensive statistical models of Expresso achieve more accurate analytical results, when judged by the yardstick of qRT-PCR data, in the context of an experimental design of modest complexity.
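The difference between the two input types, and a simple per-gene test, can be sketched in Python. Following the abstract, IIV is taken here as average spot intensity times spot pixel count and MIV as the median pixel intensity; Welch's t statistic on log2 intensities stands in for a TM4-style t-test. The function names are hypothetical, and neither Expresso's two-stage log-linear ANOVA mixed model nor TM4's normalization and filtering pipeline is reproduced.

```python
import math
import statistics

def integrated_intensity(mean_intensity, pixel_count):
    """IIV (as assumed here): average spot intensity multiplied by the number of spot pixels."""
    return mean_intensity * pixel_count

def median_intensity(pixels):
    """MIV: median pixel intensity, less sensitive to a few very bright pixels."""
    return statistics.median(pixels)

def welch_t(group_a, group_b):
    """Welch's two-sample t statistic on log2-transformed intensities (>= 2 values per group)."""
    log_a = [math.log2(x) for x in group_a]
    log_b = [math.log2(x) for x in group_b]
    se = math.sqrt(statistics.variance(log_a) / len(log_a)
                   + statistics.variance(log_b) / len(log_b))
    return (statistics.mean(log_a) - statistics.mean(log_b)) / se

# One hypothetical spot containing a single saturated pixel.
spot = [100, 110, 120, 4000]
iiv = integrated_intensity(statistics.mean(spot), len(spot))
miv = median_intensity(spot)
print(iiv, miv)
print(welch_t([4, 8, 16], [1, 2, 4]))
```

The toy spot shows why the choice of input matters: one outlying pixel dominates the integrated value but barely moves the median.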
Contrasting patterns of evolution following whole genome versus tandem duplication events in Populus
Comparative analysis of multiple angiosperm genomes has implicated gene duplication in the expansion and diversification of many gene families. However, empirical data and theory suggest that whole-genome and small-scale duplication events differ with respect to the types of genes preserved as duplicate pairs. We compared gene duplicates resulting from a recent whole genome duplication to a set of tandemly duplicated genes in the model forest tree Populus trichocarpa. We used a combination of microarray expression analyses of a diverse set of tissues and functional annotation to assess factors related to the preservation of duplicate genes of both types. Whole genome duplicates are 700 bp longer and are expressed in 20% more tissues than tandem duplicates. Furthermore, certain functional categories are over-represented in each class of duplicates. In particular, disease resistance genes and receptor-like kinases commonly occur in tandem but are significantly under-retained following whole genome duplication, while whole genome duplicate pairs are enriched for members of signal transduction cascades and transcription factors. The shape of the distribution of expression divergence for duplicated pairs suggests that nearly half of the whole genome duplicates have diverged in expression by a random degeneration process. The remaining pairs have more conserved gene expression than expected by chance, consistent with a role for selection under the constraints of gene balance. We hypothesize that duplicate gene preservation in Populus is driven by a combination of subfunctionalization of duplicate pairs and purifying selection favoring retention of genes encoding proteins with large numbers of interactions.
Phylogeographic evidence of cognate recognition site patterns and transformation efficiency differences in H. pylori: theory of strain dominance
BACKGROUND: Helicobacter pylori has diverged in parallel with its human host, leading to distinct phylogeographic populations. Recent evidence suggests that, in the current mixing of human populations in Latin America, European H. pylori (hpEurope) strains are increasingly dominant at the expense of Amerindian haplotypes (hspAmerind). This phenomenon might occur via DNA recombination modulated by restriction-modification systems (RMS), in which differences in cognate recognition sites (CRS) and in active methylases determine the direction and frequency of gene flow. We hypothesized that genomes from hspAmerind strains that evolved from a small founder population have lost CRS for RMS and active methylases, promoting hpEurope’s DNA invasion. We determined the observed and expected frequencies of CRS for RMS in DNA from 7 H. pylori whole genomes and 110 multilocus sequences. We also measured the number of active methylases by the resistance of genomic DNA from 9 hpEurope and 9 hspAmerind strains to in vitro digestion by 16 restriction enzymes, and determined the direction of DNA uptake in co-culture experiments of hspAmerind and hpEurope strains. RESULTS: Most of the CRS were underrepresented, consistently across whole genomes and multilocus sequences. Although neither the frequency of CRS nor the number of active methylases differed among the bacterial populations (average 8.6 ± 2.6), hspAmerind strains had a restriction profile distinct from that of hpEurope strains, with 15 recognition sites accounting for the differences. Amerindian strains also exhibited higher transformation rates than European strains and were more susceptible to being subverted by larger hpEurope DNA fragments than vice versa. CONCLUSIONS: The geographical variation in the pattern of CRS provides evidence for ancestral differences in RMS representation and function, and the transformation findings support the hypothesis of Europeanization of the Amerindian strains in Latin America via DNA recombination.
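The observed-versus-expected comparison of recognition sites can be illustrated with a short Python sketch. The abstract does not say how expected frequencies were derived; this sketch assumes the simplest (zero-order) model, in which bases occur independently at the sequence's own base composition, whereas higher-order Markov-chain models are also common. Scanning a single strand suffices for palindromic sites such as GATC, used here only as an example; the function names and toy sequence are hypothetical.

```python
from collections import Counter
from math import prod

def observed_count(seq, site):
    """Occurrences of a recognition site on one strand (overlaps allowed)."""
    k = len(site)
    return sum(1 for i in range(len(seq) - k + 1) if seq[i:i + k] == site)

def expected_count(seq, site):
    """Expected occurrences if bases were independent draws from the sequence's base composition."""
    n = len(seq)
    base_freq = Counter(seq)
    p_site = prod(base_freq[base] / n for base in site)
    return (n - len(site) + 1) * p_site

def obs_exp_ratio(seq, site):
    """Values below 1 indicate the site is under-represented relative to base composition."""
    return observed_count(seq, site) / expected_count(seq, site)

# Toy sequence only; real analyses would scan whole genomes or multilocus sequences.
toy = "GATCGATCAATTGGCC"
print(observed_count(toy, "GATC"), expected_count(toy, "GATC"), obs_exp_ratio(toy, "GATC"))
```

On a toy sequence this short the ratio is meaningless; the comparison only becomes informative over genome-scale sequence, where an observed/expected ratio well below 1 is the kind of under-representation the abstract reports.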
A Versatile Computational Pipeline for Bacterial Genome Annotation Improvement and Comparative Analysis, with Brucella as a Use Case
We present a bacterial genome computational analysis pipeline called GenVar. The pipeline, based on the program GeneWise, is designed to analyze an annotated genome and automatically identify missed gene calls and sequence variants such as genes with disrupted reading frames (split genes) and those with insertions and deletions (indels). For a given genome, GenVar relies on a database containing closely related genomes (such as other species or strains) as well as a few additional reference genomes. GenVar also helps identify gene disruptions that are probably caused by sequencing errors. We exemplify GenVar’s capabilities by presenting results from the analysis of four Brucella genomes. Brucella is an important human pathogen and zoonotic agent. The analysis revealed hundreds of missed gene calls, new split genes, and indels, several of which are species-specific and hence provide valuable clues to the genomic basis of Brucella pathogenicity and host specificity.
Recommended from our members
Contrasting patterns of evolution following whole genome versus tandem duplication events in Populus
Keywords: Family, Disease resistance genes, NBS, Angiosperms, Preservation, Expression, Balance hypothesis, Trichocarpa, Arabidopsis thaliana, Mechanism
Targeted Development of Registries of Biological Parts
BACKGROUND: The design and construction of novel biological systems by combining basic building blocks represents a dominant paradigm in synthetic biology. Creating and maintaining a database of these building blocks is a way to streamline the fabrication of complex constructs. The Registry of Standard Biological Parts (Registry) is the most advanced implementation of this idea. METHODS/PRINCIPAL FINDINGS: By analyzing inclusion relationships between the sequences of the Registry entries, we build a network that can be related to the Registry abstraction hierarchy. The distributions of entry reuse and complexity were extracted from this network. The collection of clones associated with the database entries was also analyzed: the plasmid inserts were amplified and sequenced. The sequences of 162 inserts could be confirmed experimentally, but unexpected discrepancies were also identified. CONCLUSIONS/SIGNIFICANCE: Organizational guidelines are proposed to help design and manage this new type of scientific resource. In particular, it appears necessary to weigh the cost of ensuring the integrity of database entries and associated biological samples against their value to users. The initial strategy, which permits the inclusion of any combination of parts irrespective of its potential value, leads to exponential and economically unsustainable growth that may be detrimental to the quality and long-term value of the resource to its users.
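One way to derive such a network from sequences is to link an entry to every longer entry whose sequence contains it; counting those links then gives a simple reuse measure. The Python sketch below is a naive quadratic version under that assumption. The paper's exact inclusion criteria (for example, whether reverse complements are considered) are not given in the abstract, and all names and sequences are hypothetical.

```python
from collections import defaultdict

def build_inclusion_network(parts):
    """Map each entry to the set of longer entries whose sequence contains it."""
    containers = defaultdict(set)
    for child, child_seq in parts.items():
        for parent, parent_seq in parts.items():
            # Only strictly longer containers, so an entry never links to itself.
            if len(child_seq) < len(parent_seq) and child_seq in parent_seq:
                containers[child].add(parent)
    return containers

def reuse_counts(parts, containers):
    """How many other entries include each entry (its reuse)."""
    return {name: len(containers.get(name, ())) for name in parts}

# Hypothetical entries; names and sequences are placeholders, not real Registry parts.
parts = {
    "promoter": "TTGACA",
    "rbs": "AGGAGG",
    "device_1": "TTGACAAGGAGG",
    "device_2": "TTGACAAGGAGGATG",
}
print(reuse_counts(parts, build_inclusion_network(parts)))
```

For a registry of many thousands of entries, a substring index (such as a suffix array) would replace the pairwise scan, but the resulting network is the same.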
Genome Sequence of Brucella abortus Vaccine Strain S19 Compared to Virulent Strains Yields Candidate Virulence Genes
The Brucella abortus strain S19, a spontaneously attenuated strain, has been used to vaccinate cattle against brucellosis for six decades. Despite many studies, the physiological and molecular mechanisms causing the attenuation are not known. We have applied pyrosequencing technology together with conventional sequencing to rapidly and comprehensively determine the complete genome sequence of the attenuated Brucella abortus vaccine strain S19. The main goal of this study is to identify candidate virulence genes by systematic comparative analysis of the attenuated strain with the published genome sequences of two virulent and closely related strains of B. abortus, 9–941 and 2308. The two S19 chromosomes are 2,122,487 and 1,161,449 bp in length. A total of 3062 genes were identified and annotated. Pairwise and reciprocal genome comparisons resulted in a total of 263 genes that were non-identical between the S19 genome and either of the two virulent strains. Amongst these, 45 genes were consistently different between the attenuated strain and the two virulent strains but identical between the two virulent strains; these included only two of the 236 genes that have been implicated as virulence factors in the literature. The functional analyses of the differences revealed a total of 24 genes that may be associated with the loss of virulence in S19. Of particular relevance are four genes with a consistent difference of more than 60 bp in S19 compared with both virulent strains; in the virulent strains, these genes encode an outer membrane protein and three proteins involved in erythritol uptake or metabolism.
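The distinction between the 263 non-identical genes and the 45 consistently different genes reduces to a simple set comparison once orthologous genes have been matched. A Python sketch, assuming each genome is a dictionary from a shared (hypothetical) gene identifier to its sequence; the study's actual pairwise and reciprocal comparison procedure is not reproduced.

```python
def non_identical_genes(s19, virulent_a, virulent_b):
    """Genes whose S19 sequence differs from at least one of the two virulent strains."""
    shared = s19.keys() & virulent_a.keys() & virulent_b.keys()
    return sorted(g for g in shared if s19[g] != virulent_a[g] or s19[g] != virulent_b[g])

def consistently_different_genes(s19, virulent_a, virulent_b):
    """Genes identical between the two virulent strains but different in S19."""
    shared = s19.keys() & virulent_a.keys() & virulent_b.keys()
    return sorted(g for g in shared if virulent_a[g] == virulent_b[g] and s19[g] != virulent_a[g])

# Hypothetical sequences keyed by placeholder gene identifiers.
s19 = {"gene1": "ATGAAA", "gene2": "ATGCCC", "gene3": "ATGGGG"}
strain_9941 = {"gene1": "ATGAAA", "gene2": "ATGCCT", "gene3": "ATGGGA"}
strain_2308 = {"gene1": "ATGAAA", "gene2": "ATGCCT", "gene3": "ATGGGT"}
print(non_identical_genes(s19, strain_9941, strain_2308))
print(consistently_different_genes(s19, strain_9941, strain_2308))
```

Requiring the two virulent strains to agree filters out differences that merely reflect variation among virulent strains, which is why the second set is much smaller than the first.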
A deletion in GDF7 is associated with a heritable forebrain commissural malformation concurrent with ventriculomegaly and interhemispheric cysts in cats
Publisher Copyright: © 2020 by the authors.
An inherited neurologic syndrome in a family of mixed-breed Oriental cats has been characterized as forebrain commissural malformation concurrent with ventriculomegaly and interhemispheric cysts. However, the genetic basis for this autosomal recessive syndrome in cats is unknown. Forty-three cats were genotyped on the Illumina Infinium Feline 63K iSelect DNA Array and used for analyses. Genome-wide association studies, including a sib-transmission disequilibrium test and a case-control association analysis, together with homozygosity mapping, identified a critical region on cat chromosome A3. Short-read whole genome sequencing was completed for a cat trio segregating with the syndrome. A homozygous 7 bp deletion in growth differentiation factor 7 (GDF7) (c.221_227delGCCGCGC [p.Arg74Profs]) was identified in affected cats by comparison to the 99 Lives Cat variant dataset, validated using Sanger sequencing, and genotyped by fragment analyses. This variant was not identified in 192 unaffected cats in the 99 Lives dataset. The variant segregated concordantly in an extended pedigree. In mice, GDF7 mRNA is expressed within the roof plate when commissural axons initiate ventrally directed growth. This finding emphasizes the importance of GDF7 in the neurodevelopmental process in the mammalian brain. A genetic test can be developed for use by cat breeders to eradicate this variant.
Peer reviewed
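The arithmetic linking the DNA-level and protein-level descriptions of the GDF7 variant can be checked in a few lines of Python: a 7 bp deletion is not a multiple of 3 and therefore shifts the reading frame, and coding position 221 falls in codon 74, consistent with p.Arg74Profs. This is a minimal sketch that handles only the simple `c.START_ENDdel` form with a hypothetical helper name; it is not a general HGVS parser, and in general the first changed residue reported at the protein level can differ from the codon containing the first deleted base.

```python
import re

def deletion_effect(hgvs_c):
    """Length, first affected codon and frameshift status of a deletion such as c.221_227delGCCGCGC."""
    match = re.fullmatch(r"c\.(\d+)_(\d+)del([ACGT]*)", hgvs_c)
    if match is None:
        raise ValueError(f"unsupported notation: {hgvs_c}")
    start, end = int(match.group(1)), int(match.group(2))
    length = end - start + 1
    deleted = match.group(3)
    if deleted and len(deleted) != length:
        raise ValueError("listed bases do not match the deleted range")
    codon = (start - 1) // 3 + 1  # c.1 is the A of the ATG start codon
    return length, codon, length % 3 != 0

print(deletion_effect("c.221_227delGCCGCGC"))
```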
Werewolf, there wolf: Variants in hairless associated with hypotrichia and roaning in the lykoi cat breed
Publisher Copyright: © 2020 by the authors. Licensee MDPI, Basel, Switzerland.
A variety of cat breeds have been developed via novelty selection on aesthetic, dermatological traits, such as coat colors and fur types. A recently developed breed, the lykoi (a.k.a. werewolf cat), was bred from cats with a sparse hair coat with roaning, that is, a mixture of fully colored and all-white hairs. The lykoi phenotype is a form of hypotrichia, presenting as a significant reduction in the average number of follicles per hair follicle group compared with domestic shorthair cats, a mild to severe perifollicular to mural lymphocytic infiltration in 77% of observed hair follicle groups, and follicles that are often miniaturized, dilated, and dysplastic. Whole genome sequencing was conducted on a single lykoi cat that was a cross between two independently ascertained lineages. Comparison to the 99 Lives dataset of 194 non-lykoi cats suggested two variants in the cat homolog of Hairless (HR) (HR lysine demethylase and nuclear receptor corepressor) as candidate causal variants. The lykoi cat was a compound heterozygote for two loss-of-function variants in HR: an exon 3 c.1255_1256dupGT (chrB1:36040783), which should produce a stop codon at amino acid 420 (p.Gln420Serfs*100), and an exon 18 c.3389insGACA (chrB1:36051555), which should produce a stop codon at amino acid position 1130 (p.Ser1130Argfs*29). Ascertainment of 14 additional cats from founder lineages in Canada, France and different areas of the USA identified four additional loss-of-function HR variants likely causing the highly similar hair coat phenotype across these diverse cats. With these novel HR variants for cat hypotrichia identified, associations with minor differences in phenotypic presentation can now be established.
Peer reviewed