3 research outputs found

    Identifying a few foot-and-mouth disease virus signature nucleotide strings for computational genotyping

    Abstract

    Background: Serotypes of foot-and-mouth disease viruses (FMDVs) have generally been determined by biological experiments. Computational genotyping is not well studied, even though whole viral genomes are available, because genes evolve unevenly and genetic recombination is frequent. Naive sequence comparison achieves only limited success at genotyping.

    Results: We used 129 FMDV strains with known serotype as training strains to select as many as 140 of the most serotype-specific nucleotide strings. We then constructed a linear-kernel Support Vector Machine classifier using these 140 strings. Under the leave-one-out cross-validation scheme, this classifier assigned the correct serotype to 127 of these 129 strains, achieving 98.45% accuracy. It also assigned the serotype correctly to an independent test set of 83 other FMDV strains downloaded separately from NCBI GenBank.

    Conclusion: Computational genotyping is much faster and much cheaper than wet-lab biological experiments once detailed molecular sequences are available. The high accuracy of our proposed method suggests the potential of using a few signature nucleotide strings, instead of whole genomes, to determine the serotypes of novel FMDV strains.
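
The signature-string idea in the Results above can be illustrated with a minimal sketch. The abstract does not state the string length or the selection criterion, so this example assumes the strictest possible rule (a string must occur in every training strain of one serotype and in no strain of any other) and uses made-up toy sequences. The function names `kmers`, `signature_strings`, and `features` are hypothetical. The resulting presence/absence vectors are the kind of input a linear-kernel SVM could be trained on, for instance with scikit-learn's `SVC(kernel="linear")`; that training step is left out here.

```python
from collections import defaultdict


def kmers(seq, k):
    """Return the set of distinct length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def signature_strings(strains, k):
    """Map each serotype to the k-mers shared by all of its strains
    and absent from every strain of every other serotype.

    This strict rule is an assumption made for illustration only.
    """
    by_type = defaultdict(list)
    for seq, serotype in strains:
        by_type[serotype].append(kmers(seq, k))

    signatures = {}
    for serotype, kmer_sets in by_type.items():
        shared = set.intersection(*kmer_sets)
        others = set().union(
            *(s for t, sets in by_type.items() if t != serotype for s in sets)
        )
        signatures[serotype] = shared - others
    return signatures


def features(seq, strings, k):
    """Encode seq as a 0/1 vector: 1 where the signature string occurs."""
    present = kmers(seq, k)
    return [1 if s in present else 0 for s in strings]


# Toy training set (invented sequences, not real FMDV data).
strains = [
    ("ACGTACGGA", "O"),
    ("TTACGGAGC", "O"),
    ("GGCTTAGCA", "A"),
    ("CATTAGCAT", "A"),
]
sigs = signature_strings(strains, k=4)

# Fixed feature order: serotype O strings first, then serotype A strings.
ordered = sorted(sigs["O"]) + sorted(sigs["A"])
vector = features("GGTACGGAT", ordered, k=4)
```

In this toy case the unseen sequence contains all three serotype O strings and none of the serotype A strings, so its feature vector is nonzero only in the first three positions.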

    Nephele: genotyping via complete composition vectors and MapReduce

    Abstract

    Background: Current sequencing technology makes it practical to sequence many samples of a given organism, raising new challenges for the processing and interpretation of large genomics data sets with associated metadata. Traditional computational phylogenetic methods are ideal for studying the evolution of gene/protein families and using those to infer the evolution of an organism, but are less than ideal for the study of the whole organism, mainly due to the presence of insertions/deletions/rearrangements. These methods provide the researcher with the ability to group a set of samples into distinct genotypic groups based on sequence similarity, which can then be associated with metadata, such as host information, pathogenicity, and time or location of occurrence. Genotyping is critical to understanding, at a genomic level, the origin and spread of infectious diseases. Increasingly, genotyping is coming into use for disease surveillance activities, as well as for microbial forensics. The classic genotyping approach has been based on phylogenetic analysis, starting with a multiple sequence alignment. Genotypes are then established by expert examination of phylogenetic trees. However, these traditional single-processor methods are suboptimal for the rapidly growing sequence datasets generated by next-generation DNA sequencing machines, because their computational complexity increases quickly with the number of sequences.

    Results: Nephele is a suite of tools that uses the complete composition vector algorithm to represent each sequence in the dataset as a vector derived from its constituent k-mers, bypassing the need for multiple sequence alignment, and uses affinity propagation clustering to group the sequences into genotypes based on a distance measure over the vectors. Our methods produce results that correlate well with expert-defined clades or genotypes, at a fraction of the computational cost of traditional phylogenetic methods run on traditional hardware. Nephele can use the open-source Hadoop implementation of MapReduce to parallelize execution using multiple compute nodes. We were able to generate a neighbour-joined tree of over 10,000 16S samples in less than 2 hours.

    Conclusions: We conclude that using Nephele can substantially decrease the processing time required for generating genotype trees of tens to hundreds of organisms at genome-scale sequence coverage.
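
The alignment-free representation described in the Results above can be sketched in simplified form. This is not the complete composition vector algorithm and not Nephele's code: it assumes a single k-mer length, plain relative frequencies, and cosine distance, and it leaves out the clustering step. The function names `composition_vector` and `cosine_distance` are hypothetical.

```python
from collections import Counter
from math import sqrt


def composition_vector(seq, k):
    """Relative frequency of each length-k substring of seq.

    Simplified stand-in for a composition vector; assumes len(seq) >= k.
    """
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    return {kmer: n / total for kmer, n in counts.items()}


def cosine_distance(u, v):
    """1 - cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(u[key] * v.get(key, 0.0) for key in u)
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    return 1.0 - dot / (norm_u * norm_v)


# Toy sequences (invented); no alignment is needed to compare them.
a = composition_vector("ACGTACGT", k=3)
b = composition_vector("ACGTACGA", k=3)
c = composition_vector("TTTTGGGG", k=3)

near = cosine_distance(a, b)  # sequences sharing most k-mers
far = cosine_distance(a, c)   # sequences sharing no k-mers
```

A pairwise distance matrix built this way is the kind of input a clustering method takes. Each sequence's vector is computed independently of the others, which is what makes the vectorization step straightforward to spread across compute nodes.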

    Detection and serotyping of foot-and-mouth disease virus with laboratory and in silico methods

    Foot-and-mouth disease virus (FMDV) is a highly contagious animal pathogen with a variable genome and high antigenic variation. There are seven known serotypes of this virus: A, O, C, Asia1, SAT1, SAT2, and SAT3. Rapid detection and serotype characterization of the virus are instrumental for a prompt response by animal health authorities. This thesis presents the design and development of the first electronic microarray assay for the simultaneous detection and subtyping of FMDV. The assay was evaluated in silico and tested with 19 synthetic DNA constructs representing all 7 serotypes, followed by testing with 23 viral RNA samples representing all 7 serotypes. Various in silico methods were also compared for the classification of FMDV sequences using complete genomes and next-generation sequencing (NGS) data. Finally, highly specific and highly sensitive single-nucleotide variant signatures that distinguish the seven FMDV serotypes were discovered.
    Chemical, Biological, Radiological-Nuclear, and Explosives Research and Technology Initiative (CRTI) Project 09-403T