    Analysis of High-Throughput Flow Cytometry Data Using plateCore

    Flow cytometry (FCM) software packages from R/Bioconductor, such as flowCore and flowViz, serve as an open platform for developing new analysis tools and methods. We created plateCore, a new package that extends the functionality of these core packages to enable automated negative control-based gating and to make the processing and analysis of plate-based data sets from high-throughput FCM screening experiments easier. plateCore was used to analyze data from a BD FACS CAP screening experiment in which five peripheral blood mononuclear cell (PBMC) samples were assayed for 189 different human cell surface markers. The same data set was also analyzed manually by a cytometry expert using the FlowJo data analysis software package (TreeStar, USA). We show that the expression values for markers characterized using the automated approach in plateCore are in good agreement with those from FlowJo, and that plateCore allows more reproducible analyses of FCM screening data.
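
    Negative control-based gating can be illustrated with a short Python sketch. It assumes a single fluorescence channel per well, a gate placed at a hypothetical 99th percentile of the negative-control well, and the percentage of test-well events above that gate as the expression readout. This is not plateCore's R API, the function names are illustrative, and the package's actual gating rule and defaults may differ.

```python
import math
from typing import Sequence, Tuple


def percentile(values: Sequence[float], q: float) -> float:
    """Linear-interpolation percentile (q in [0, 100]) of a non-empty sequence."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("values must be non-empty")
    pos = (len(ordered) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def negative_control_gate(
    control: Sequence[float], test: Sequence[float], q: float = 99.0
) -> Tuple[float, float]:
    """Gate a test well against its negative control (assumed rule).

    The threshold is the q-th percentile of the control events; the readout is
    the percentage of test events strictly above that threshold.
    """
    if not test:
        raise ValueError("test well has no events")
    threshold = percentile(control, q)
    positive = sum(1 for x in test if x > threshold)
    return threshold, 100.0 * positive / len(test)


control_well = [float(i) for i in range(100)]  # toy control intensities
test_well = [50.0, 98.5, 150.0, 200.0]         # toy test intensities
threshold, pct_positive = negative_control_gate(control_well, test_well)
```

    Because the gate is derived from each well's matched control rather than set by hand, the same rule can be applied unchanged across every marker on a plate, which is what makes this kind of analysis reproducible.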

    TreeToReads - a pipeline for simulating raw reads from phylogenies.

    Background: Using phylogenomic analysis tools to track pathogens has become standard practice in academia, public health agencies, and large industries. Starting from the same raw genomic read data, several different approaches are used to infer phylogenetic trees. These include many different SNP pipelines, wgMLST approaches, k-mer algorithms, whole genome alignment and others; each has advantages and disadvantages: some have been extensively validated, some are faster, and some have higher resolution. A few of these approaches are well integrated into the regulatory process of US Federal agencies (e.g. the FDA's SNP pipeline for tracking foodborne pathogens). However, despite extensive validation on benchmark datasets and comparison with other pipelines, we lack methods to fully explore how the values of the many parameters in each pipeline affect whether the correct phylogenetic tree is recovered.

    Results: To resolve this problem, we offer a program, TreeToReads, which generates raw read data from mutated genomes simulated under a known phylogeny. This simulation pipeline allows direct comparisons of simulated and observed data in a controlled environment. At each step of these simulations, researchers can vary parameters of interest (e.g., input tree topology, amount of sequence divergence, rate of indels, read coverage, distance of the reference genome, etc.) to assess how different parameter values affect correct SNP calling and accurate tree reconstruction.

    Conclusions: Such critical assessments of the accuracy and robustness of analytical pipelines are essential to progress in both research and applied settings.
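
    The simulate-then-sequence idea can be sketched in Python under strong simplifying assumptions: a tree written as nested (label, SNPs on branch, children) tuples, uniform random substitutions with no indels, and error-free reads sampled uniformly at a chosen coverage. It does not reproduce TreeToReads' own inputs, substitution models or read simulator; `mutate`, `evolve` and `simulate_reads` are hypothetical helpers.

```python
import random

BASES = "ACGT"


def mutate(seq, n_snps, rng):
    """Return a copy of seq with n_snps substitutions at distinct random sites."""
    chars = list(seq)
    for pos in rng.sample(range(len(chars)), n_snps):
        chars[pos] = rng.choice([b for b in BASES if b != chars[pos]])
    return "".join(chars)


def evolve(node, parent_seq, rng, tips=None):
    """Evolve sequences down a tree of (label, snps_on_branch, children) tuples.

    Returns a dict mapping each tip label to its simulated genome.
    """
    if tips is None:
        tips = {}
    label, branch_snps, children = node
    seq = mutate(parent_seq, branch_snps, rng)
    if not children:
        tips[label] = seq
    for child in children:
        evolve(child, seq, rng, tips)
    return tips


def simulate_reads(genome, coverage, read_len, rng):
    """Sample error-free reads uniformly so total bases ~= coverage * genome length."""
    n_reads = round(coverage * len(genome) / read_len)
    starts = (rng.randrange(len(genome) - read_len + 1) for _ in range(n_reads))
    return [genome[s:s + read_len] for s in starts]


rng = random.Random(42)
root = "".join(rng.choice(BASES) for _ in range(1000))  # toy 1 kb reference
tree = ("root", 0, [("A", 5, []), ("BC", 3, [("B", 2, []), ("C", 4, [])])])
tips = evolve(tree, root, rng)
reads = {name: simulate_reads(g, 20, 100, rng) for name, g in tips.items()}
```

    Because the true tree and every introduced SNP are known, SNP calls and trees produced by a pipeline from such reads can be scored directly against the truth, and parameters such as coverage or branch lengths can be varied one at a time.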

    flowCore: a Bioconductor package for high throughput flow cytometry

    Background: Recent advances in automation technologies have enabled the use of flow cytometry for high throughput screening, generating large, complex data sets, often in clinical trial or drug discovery settings. However, data management and data analysis methods have not advanced sufficiently beyond the initial small-scale studies to support modeling in the presence of multiple covariates.

    Results: We developed a set of flexible open source computational tools in the R package flowCore to facilitate the analysis of these complex data. A key component is a set of data structures that support applying the same operations to a collection of samples or a clinical cohort. In addition, our software constitutes a shared and extensible research platform that enables collaboration between bioinformaticians, computer scientists, statisticians, biologists and clinicians. This platform will foster the development of novel analytic methods for flow cytometry.

    Conclusion: The software has been applied in the analysis of various data sets, and its data structures have proven highly efficient in capturing and organizing the analytic workflow. Finally, a number of additional Bioconductor packages build on the infrastructure provided by flowCore and open new avenues for flow data analysis.

    High resolution clustering of Salmonella enterica serovar Montevideo strains using a next-generation sequencing approach

    Background: Next-generation sequencing (NGS) is increasingly being used as a molecular epidemiologic tool for discerning ancestry and traceback of the most complicated, difficult-to-resolve bacterial pathogens. Linking possible food sources to clinical isolates requires distinguishing the suspected pathogen from an environmental background and placing the observed variation into the wider context of variation occurring within a serovar and among other closely related foodborne pathogens. Equally important is the need to validate these high resolution molecular tools for use in molecular epidemiologic traceback. Such efforts include examining strain cluster stability as well as the cumulative genetic effects of sub-culturing on these clusters. Numerous S. Montevideo isolates, including representatives of diverse lineages and numerous replicate clones, were shotgun sequenced to determine how much variability is due to bias, sequencing error, and/or the culturing of isolates. All new draft genomes were compared to 34 S. Montevideo isolates previously published in an NGS-based molecular epidemiological case study.

    Results: Intraserovar lineages of S. Montevideo differ by thousands of SNPs, only slightly fewer than the number of SNPs observed between S. Montevideo and other distinct serovars. Much less variability was found within an individual S. Montevideo clade implicated in a recent foodborne outbreak, as well as among individual NGS replicates. These findings were similar to previous reports documenting homopolymer and deletion error rates with the Roche 454 GS Titanium technology. In no case, however, did variability associated with sequencing methods or sample preparation create inconsistencies with our current phylogenetic results or with the subsequent molecular epidemiological evidence gleaned from these data.

    Conclusions: Implementation of a validated pipeline for NGS data acquisition and analysis provides highly reproducible results that are stable and predictable for molecular epidemiological applications. When draft genomes, including sub-passaged replicates distinguished by only a few SNPs, are collected at 15× to 20× coverage and passed through a quality filter as part of a data analysis pipeline, they can be accurately placed in a phylogenetic context. This reproducibility applies to all levels within and between serovars of Salmonella, suggesting that investigators using these methods can have confidence in their conclusions.
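
    Comparisons between lineages, outbreak clades and sequencing replicates rest on pairwise SNP counts. A minimal Python sketch, assuming the draft genomes have already been aligned to equal length (for example as reference-based consensus sequences) and that ambiguous bases (`N`) and gaps (`-`) are excluded from the count; the study's own pipeline and quality filter are not described here, and the function names and toy sequences are illustrative.

```python
from itertools import combinations


def snp_distance(a: str, b: str, ignore: str = "N-") -> int:
    """Count aligned sites where both bases are called and they differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    return sum(1 for x, y in zip(a, b) if x != y and x not in ignore and y not in ignore)


def snp_matrix(aln: dict[str, str]) -> dict[tuple[str, str], int]:
    """Pairwise SNP distances for every unordered pair of isolates."""
    return {(i, j): snp_distance(aln[i], aln[j]) for i, j in combinations(sorted(aln), 2)}


# Toy alignment: two replicates of one isolate plus a more distant lineage.
alignment = {"rep1": "ACGTACGT", "rep2": "ACGTACGA", "lineageX": "TCGAACNA"}
distances = snp_matrix(alignment)
```

    Replicates of the same isolate should differ by few or no SNPs, whereas distinct lineages differ by many; masking ambiguous sites keeps low-coverage positions from inflating the counts.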

    Core Genome Multilocus Sequence Typing for Food Animal Source Attribution of Human Campylobacter jejuni Infections

    Campylobacter jejuni is a major foodborne pathogen and a common cause of bacterial enteritis worldwide. A total of 622 C. jejuni isolates recovered from food animals and retail meats in the United States through the National Antimicrobial Resistance Monitoring System between 2013 and 2017 were sequenced using an Illumina MiSeq. Sequences were combined with WGS data of 222 human isolates downloaded from NCBI and analyzed by core genome multilocus sequence typing (cgMLST) and traditional MLST. cgMLST allelic difference (AD) thresholds of 0, 5, 10, 25, 50, 100 and 200 identified 828, 734, 652, 543, 422, 298 and 197 cgMLST types, respectively, among the 844 isolates, and traditional MLST identified 174 sequence types (STs). The cgMLST scheme allowing an AD of 200 (cgMLST200) correlated strongly with MLST. cgMLST200 showed that 40.5% of retail chicken isolates, 56.5% of swine isolates, 77.4% of dairy cattle isolates and 78.9% of beef cattle isolates shared a cgMLST type with human isolates. All ST-8 isolates had the same cgMLST200 type (cgMLST200-12), and 74.3% of ST-8 isolates and 75% of cgMLST200-12 isolates were confirmed as sheep abortion virulence clones by PorA analysis. Twenty-nine acquired resistance genes (including 21 blaOXA alleles, tetO, aph(3′)-IIIa, ant(6)-Ia, aadE, aad9, aph(2′)-Ig, aph(2′)-Ih and sat4), as well as mutations in gyrA, 23S rRNA and L22, were identified. Resistance genotypes were strongly linked with cgMLST200 type for certain groups, including 12/12 cgMLST200-510 isolates with the A103V substitution in L22 and 10/11 cgMLST200-608 isolates with the T86I GyrA substitution, associated with macrolide and quinolone resistance, respectively. In summary, the cgMLST200 threshold scheme combined with resistance genotype information could provide an excellent subtyping scheme for source attribution of human C. jejuni infections.
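
    How the AD threshold controls the number of cgMLST types can be illustrated by clustering allele profiles in Python. The sketch assumes single-linkage clustering (two isolates share a type if a chain of pairwise differences at or below the threshold connects them), integer allele numbers per core locus, and `0` for a missing allele call that is skipped when counting differences. The abstract does not state which clustering method or missing-data rule the study used, so these choices, the function names and the toy profiles are assumptions.

```python
from itertools import combinations


def allelic_difference(p, q):
    """Count loci where both profiles have an allele call (non-zero) and they differ."""
    return sum(1 for a, b in zip(p, q) if a and b and a != b)


def cluster_at_threshold(profiles, threshold):
    """Single-linkage types: join isolates whose AD is <= threshold (union-find)."""
    parent = {k: k for k in profiles}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in combinations(profiles, 2):
        if allelic_difference(profiles[a], profiles[b]) <= threshold:
            parent[find(a)] = find(b)
    groups = {}
    for k in profiles:
        groups.setdefault(find(k), set()).add(k)
    return list(groups.values())


# Toy 4-locus profiles: one human isolate, two chicken isolates, one swine isolate.
profiles = {"h1": [1, 2, 3, 4], "c1": [1, 2, 3, 5], "c2": [1, 2, 7, 5], "s1": [9, 8, 7, 6]}
types_by_threshold = {t: cluster_at_threshold(profiles, t) for t in (0, 1, 3)}
```

    Under single linkage, raising the threshold can only merge types, never split them, which is consistent with the steady decrease from 828 types at AD 0 to 197 at AD 200 reported above. Source attribution then asks which food animal isolates share a type with human isolates at the chosen threshold.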

    Proficiency testing for bacterial whole genome sequencing: an end-user survey of current capabilities, requirements and priorities

    The advent of next-generation sequencing (NGS) has revolutionised public health microbiology. Given the potential impact of NGS, it is paramount to ensure standardisation of ‘wet’ laboratory and bioinformatic protocols and to promote comparability of the methods employed by different laboratories and of their outputs. Therefore, one of the ambitious goals of the Global Microbial Identifier (GMI) initiative (http://www.globalmicrobialidentifier.org/) has been to establish a mechanism for inter-laboratory NGS proficiency testing (PT). This report presents findings from a survey recently conducted by Working Group 4 among GMI members to ascertain NGS end-user requirements and attitudes towards NGS PT. The survey identified the high professional diversity of laboratories engaged in NGS-based public health projects and the wide range of capabilities within institutions, at widely varying costs. The priority pathogens reported by respondents reflected the key drivers for NGS use (high burden disease and ‘high profile’ pathogens). Most respondents perceived the performance of and participation in PT as important. The wide range of sequencing and bioinformatics practices reported by end-users highlights the importance of standardising and harmonising NGS in public health and supports the use of PT as a means of assuring quality. The findings of this survey will guide the design of the GMI PT program with respect to the spectrum of pathogens included, testing frequency and volume, and technical requirements. The PT program for external quality assurance will evolve and inform the introduction of NGS into clinical and public health microbiology practice in the post-genomic era. Electronic supplementary material: The online version of this article (doi:10.1186/s12879-015-0902-3) contains supplementary material, which is available to authorized users.

    Comparative evaluation of direct plating and most probable number for enumeration of low levels of Listeria monocytogenes in naturally contaminated ice cream products

    A precise and accurate method for enumerating low levels of Listeria monocytogenes in foods is critical to a variety of studies. In this study, a paired comparison of most probable number (MPN) and direct plating enumeration of L. monocytogenes was conducted on a total of 1730 outbreak-associated ice cream samples that were naturally contaminated with low levels of L. monocytogenes. MPN was performed on all 1730 samples. Direct plating was performed on all samples using either RAPID'L.mono (RLM) agar (1600 samples) or Agar Listeria according to Ottaviani and Agosti (ALOA; 130 samples). A probabilistic analysis with a Bayesian inference model was used to compare paired direct plating and MPN estimates of L. monocytogenes in the ice cream samples, because the assumptions implicit in ordinary least squares (OLS) linear regression analysis were not met for such a comparison. The probabilistic analysis revealed good agreement between the MPN and direct plating estimates, showing that the MPN schemes and the direct plating schemes using ALOA or RLM evaluated in the present study were suitable for enumerating low levels of L. monocytogenes in these ice cream samples. The statistical analysis further revealed that OLS linear regression analysis of direct plating and MPN data introduced bias that incorrectly characterized systematic differences between the estimates from the two methods.
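
    For readers less familiar with MPN: the estimate is the concentration that maximizes the likelihood of the observed pattern of positive tubes, assuming organisms are Poisson-distributed so that a tube receiving v grams of sample is positive with probability 1 - exp(-λv). The Python sketch below solves the resulting score equation by bisection on a log scale. The function name and the dilution pattern in the usage example are hypothetical and are not the MPN scheme used in the study.

```python
import math


def mpn_per_gram(tubes):
    """Maximum-likelihood MPN (organisms per gram) under a Poisson model.

    tubes: list of (grams_per_tube, n_tubes, n_positive), one entry per dilution.
    Solves sum(p*v / (1 - exp(-lam*v))) == sum(n*v) for lam.
    """
    if all(p == 0 for _, _, p in tubes):
        return 0.0  # no positive tubes: the MLE is zero
    if all(p == n for _, n, p in tubes):
        raise ValueError("all tubes positive: the MPN has no finite upper bound")
    total = sum(v * n for v, n, _ in tubes)

    def score(lam):
        # Decreasing in lam: positive below the MLE, negative above it.
        return sum(p * v / -math.expm1(-lam * v) for v, _, p in tubes if p > 0) - total

    lo, hi = 1e-12, 1.0
    while score(hi) > 0:  # widen the bracket until it contains the root
        hi *= 2.0
    for _ in range(200):
        mid = math.sqrt(lo * hi)  # geometric midpoint: bisect on a log scale
        if score(mid) > 0:
            lo = mid
        else:
            hi = mid
    return math.sqrt(lo * hi)


# Hypothetical 3-tube design: 0.1, 0.01 and 0.001 g per tube, 3-1-0 positives.
estimate = mpn_per_gram([(0.1, 3, 3), (0.01, 3, 1), (0.001, 3, 0)])
```

    For the 3-1-0 pattern above, the sketch returns roughly 42.7 MPN per gram. Because each estimate comes from a discrete set of tube outcomes, MPN values and direct plate counts have different error structures, which is one reason a paired comparison of the two is not well suited to OLS regression.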