
    Highly Scalable Algorithms for Robust String Barcoding

    String barcoding is a recently introduced technique for genomic-based identification of microorganisms. In this paper we describe the engineering of highly scalable algorithms for robust string barcoding. Our methods enable distinguisher selection based on whole genomic sequences of hundreds of microorganisms of up to bacterial size on a well-equipped workstation, and can be easily parallelized to extend the applicability range to thousands of bacterial-size genomes. Experimental results on both randomly generated and NCBI genomic data show that whole-genome-based selection yields a number of distinguishers nearly matching the information-theoretic lower bounds for the problem.
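
To make the idea of distinguisher selection concrete, here is a minimal sketch, assuming a plain greedy set-cover heuristic over fixed-length substrings. It is not the scalable algorithm engineered in the paper; the function names (`greedy_barcode`, `barcode`) and the fixed length `k` are illustrative assumptions.

```python
from itertools import combinations


def candidate_substrings(seqs, k):
    """Collect every length-k substring that occurs in at least one sequence."""
    return {s[i:i + k] for s in seqs for i in range(len(s) - k + 1)}


def greedy_barcode(seqs, k):
    """Greedily pick k-mers until every pair of sequences is told apart.

    A k-mer distinguishes a pair when it occurs in exactly one of the two.
    """
    candidates = sorted(candidate_substrings(seqs, k))
    unresolved = set(combinations(range(len(seqs)), 2))
    chosen = []
    while unresolved:
        def gain(kmer):
            # Number of still-unresolved pairs this k-mer would separate.
            return sum((kmer in seqs[a]) != (kmer in seqs[b])
                       for a, b in unresolved)

        best = max(candidates, key=gain)
        if gain(best) == 0:
            raise ValueError("remaining pairs cannot be distinguished with this k")
        chosen.append(best)
        unresolved = {(a, b) for a, b in unresolved
                      if (best in seqs[a]) == (best in seqs[b])}
    return chosen


def barcode(seq, distinguishers):
    """Presence/absence string of the chosen distinguishers in one sequence."""
    return "".join("1" if d in seq else "0" for d in distinguishers)


if __name__ == "__main__":
    genomes = ["ACGTAC", "ACGGAC", "TTGTAC", "ACGTTT"]  # toy data, not real genomes
    picked = greedy_barcode(genomes, k=3)
    for g in genomes:
        print(g, barcode(g, picked))
```

Because each chosen substring separates at least one unresolved pair, the resulting presence/absence strings are unique per sequence. With n sequences at least ⌈log2 n⌉ distinguishers are needed, which is the kind of information-theoretic lower bound the abstract refers to.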

    My-Forensic-Loci-queries (MyFLq) framework for analysis of forensic STR data generated by massive parallel sequencing

    Forensic scientists are currently investigating how to transition from capillary electrophoresis (CE) to massive parallel sequencing (MPS) for analysis of forensic DNA profiles. MPS offers several advantages over CE, such as virtually unlimited multiplexing of loci, the combination of short tandem repeat (STR) and single nucleotide polymorphism (SNP) loci, small amplicons without the constraints of size separation, more discrimination power, deep mixture resolution and sample multiplexing. We present our bioinformatic framework My-Forensic-Loci-queries (MyFLq) for analysis of MPS forensic data. For allele calling, the framework uses a MySQL reference allele database with regions of interest (ROIs) determined automatically by a generic maximal flanking algorithm, which makes it possible to use any STR or SNP forensic locus. Python scripts were designed to make allele calls automatically, starting from raw MPS data. We also present a method to assess the usefulness and overall performance of a forensic locus with respect to MPS, as well as methods to estimate whether an unknown allele, whose sequence is not present in the MySQL database, is in fact a new allele or a sequencing error. The MyFLq framework was applied to an Illumina MiSeq dataset of a forensic Illumina amplicon library, generated from multilocus STR polymerase chain reaction (PCR) on both single-contributor samples and multiple-person DNA mixtures. Although the multilocus PCR was not yet optimized for MPS in terms of amplicon length or locus selection, the results are excellent for most loci: a high signal-to-noise ratio, correct allele calls, and a low limit of detection for minor DNA contributors in mixed DNA samples. Technically, forensic MPS holds great promise for routine implementation in forensic genomics. The method is also applicable to adjacent disciplines such as mitochondrial DNA research.

    Multiple Locus Variable number of tandem repeat Analysis : a molecular genotyping tool for Paenibacillus larvae

    American Foulbrood, caused by Paenibacillus larvae, is the most severe bacterial disease of honey bees (Apis mellifera). To genotype P. larvae in an epidemiological context, there is a need for a fast, cheap, high-resolution method. Here, we propose Multiple Locus Variable number of tandem repeat Analysis (MLVA). MLVA was used to type a collection of 209 P. larvae strains, from which 23 different MLVA types could be identified. Moreover, the developed methodology not only permits the identification of the four Enterobacterial Repetitive Intergenic Consensus (ERIC) genotypes, but also allows a discriminatory subdivision of the most dominant genotypes, ERIC type I and ERIC type II. A biogeographical study showed a significant correlation between MLVA genotype and the geographical region where the strain was isolated.

    High-Throughput SNP Genotyping by SBE/SBH

    Despite much progress over the past decade, current Single Nucleotide Polymorphism (SNP) genotyping technologies still offer an insufficient degree of multiplexing when required to handle user-selected sets of SNPs. In this paper we propose a new genotyping assay architecture combining multiplexed solution-phase single-base extension (SBE) reactions with sequencing by hybridization (SBH) using universal DNA arrays such as all k-mer arrays. In addition to PCR amplification of genomic DNA, SNP genotyping using SBE/SBH assays involves the following steps: (1) Synthesizing primers complementing the genomic sequence immediately preceding SNPs of interest; (2) Hybridizing these primers with the genomic DNA; (3) Extending each primer by a single base using polymerase enzyme and dideoxynucleotides labeled with 4 different fluorescent dyes; and finally (4) Hybridizing extended primers to a universal DNA array and determining the identity of the bases that extend each primer by hybridization pattern analysis. Our contributions include a study of multiplexing algorithms for SBE/SBH genotyping assays and preliminary experimental results showing the achievable tradeoffs between the number of array probes and primer length on one hand and the number of SNPs that can be assayed simultaneously on the other. Simulation results on datasets both randomly generated and extracted from the NCBI dbSNP database suggest that the SBE/SBH architecture provides a flexible and cost-effective alternative to genotyping assays currently used in the industry, enabling genotyping of up to hundreds of thousands of user-specified SNPs per assay.

    Java web tools for PCR, in silico PCR, and oligonucleotide assembly and analysis

    The polymerase chain reaction is fundamental to molecular biology and is the most important practical molecular technique for the research laboratory. We have developed and tested efficient tools for PCR primer and probe design, which also predict oligonucleotide properties based on experimental studies of PCR efficiency. The tools provide comprehensive facilities for designing primers for most PCR applications and their combinations, including standard, multiplex, long-distance, inverse, real-time, unique, group-specific and bisulphite modification assays and Overlap-Extension PCR Multi-Fragment Assembly, as well as a programme to design oligonucleotide sets for long sequence assembly by ligase chain reaction. The in silico PCR primer or probe search includes comprehensive analyses of individual primers and primer pairs. It calculates the melting temperature for standard and degenerate oligonucleotides, including LNA and other modifications; analyses sets of primers with prediction of oligonucleotide properties, dimer and G-quadruplex detection, and linguistic complexity; and provides a dilution and resuspension calculator.
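
As a rough illustration of what a melting-temperature calculation involves, the sketch below applies the Wallace rule, Tm ≈ 2(A+T) + 4(G+C) °C, a rule of thumb for short oligonucleotides (roughly 14 to 20 nt). This is an assumption chosen for brevity: the abstract does not say which model the tools use, and the sketch does not handle degenerate bases, LNA or other modifications. The function names are hypothetical.

```python
def gc_content(oligo):
    """Fraction of G and C bases in the oligonucleotide."""
    seq = oligo.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(oligo):
    """Approximate melting temperature in degrees Celsius (Wallace rule).

    Only unambiguous A/C/G/T bases are accepted; degenerate codes are
    rejected rather than silently miscounted.
    """
    seq = oligo.upper()
    if set(seq) - set("ACGT"):
        raise ValueError("only unambiguous A/C/G/T bases are supported")
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


if __name__ == "__main__":
    primer = "ACGTACGTACGTACGT"  # toy 16-mer, not a real primer
    print(gc_content(primer), wallace_tm(primer))
```

For longer or modified oligonucleotides, nearest-neighbour thermodynamic models with salt correction are generally preferred over this rule of thumb.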

    Application of an oligonucleotide hybridization model to the optimization of PCR and microarrays (Oligonukleotiidide hübridisatsioonimudeli rakendamine PCR-i ja mikrokiipide optimeerimiseks)

    The electronic version of the dissertation does not contain the publications. Nucleic acids are unique among organic macromolecules in their ability to encode, decode and transmit digital information. This property is used in emerging technologies as diverse as medical diagnosis, nanoscale engineering and information storage. It is nevertheless important to understand that this digital information processing rests on the chemical properties of nucleic acids, the most important being hybridization: the spontaneous formation of a double-stranded helix between complementary or partially complementary single-stranded molecules. Taking into account the thermodynamics of nucleic acid hybridization allows researchers to model the process with great accuracy and thus improve many associated technologies. In this thesis the hybridization model is used to optimize multiplex PCR and detection microarrays. We developed an efficient algorithm to distribute PCR primer pairs into multiplex groups based on their compatibility with each other. The algorithm is implemented both as a standalone program and as a web application. We analyzed the probable causes of multiplex PCR failure and demonstrated that a large number of nonspecific hybridization sites in the template DNA is detrimental to PCR quality. Primer pairs with too many nonspecific hybridization sites not only worked poorly themselves but also reduced the success rate of the other primer pairs amplified with them. We developed a computer program to generate an exhaustive list of all possible hybridization probes for the detection of bacterial tmRNA that are capable of distinguishing between two groups of source RNA. The probes were evaluated on microarrays, and we showed that keeping the hybridization energy cutoff between target and non-target groups above 4 kcal/mol eliminated all false-positive signals. We analyzed the possibility of increasing the hybridization speed of bacterial tmRNA at low temperatures by adding short specific oligonucleotides that selectively hybridize with the template molecules and disrupt their secondary structure. Using this method, the hybridization speed was increased fourfold at 37 °C.
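
The grouping of primer pairs into multiplex groups can be pictured with the sketch below, which assumes a simple first-fit greedy strategy and takes the incompatibilities as given input. The thesis abstract does not specify its algorithm or how compatibility is scored, so `group_primer_pairs`, `incompatible` and `max_group_size` are all hypothetical names.

```python
def group_primer_pairs(names, incompatible, max_group_size):
    """Assign primer pairs to multiplex groups with a first-fit greedy pass.

    names:          primer-pair identifiers, in the order they are placed
    incompatible:   set of frozensets, each holding two identifiers that
                    must not share a group (assumed to be precomputed)
    max_group_size: upper limit on primer pairs per multiplex reaction
    """
    groups = []
    for name in names:
        for group in groups:
            fits = len(group) < max_group_size and all(
                frozenset((name, other)) not in incompatible
                for other in group
            )
            if fits:
                group.append(name)
                break
        else:
            # No existing group can take this primer pair: open a new one.
            groups.append([name])
    return groups


if __name__ == "__main__":
    clashes = {frozenset(("p1", "p2")), frozenset(("p3", "p4"))}
    print(group_primer_pairs(["p1", "p2", "p3", "p4"], clashes, max_group_size=2))
```

First-fit is fast but not guaranteed to use the fewest groups; finding the minimum number of groups is a graph-colouring problem, so heuristics of this kind are a common compromise.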

    Development of a multiplex PCR assay for simultaneous detection of Theileria annulata, Babesia bovis and Anaplasma marginale in cattle

    Tropical theileriosis, bovine babesiosis and anaplasmosis are tick-borne protozoan diseases that impose serious constraints on the health and productivity of domestic cattle in tropical and sub-tropical regions of the world. A common feature of these diseases is that, following recovery from primary infection, animals become persistent carriers of the pathogen and continue to play a critical role in disease epidemiology, acting as reservoirs of infection. This study describes the development and evaluation of multiplex and single PCR assays for simultaneous detection of Theileria annulata, Babesia bovis and Anaplasma marginale in cattle. Following in silico screening for candidate target genes representing each of the pathogens, an optimised multiplex PCR assay was established using three primer sets, cytob1, MAR1bB2 and bovar2A, for amplification of genomic DNA of T. annulata, A. marginale and B. bovis respectively. The designed primer sets were found to be species-specific, generating amplicons of 312, 265 and 166 base pairs, respectively, and were deemed suitable for the development of a multiplex assay. The sensitivity of each primer pair was evaluated using serial dilutions of parasite DNA, while specificity was confirmed by testing for amplification from DNA of different stocks of each pathogen and of other Theileria, Babesia and Anaplasma species. Additionally, DNA preparations derived from field samples were used to evaluate the utility of the single and multiplex PCRs for determination of infection status. The multiplex PCR was found to detect each pathogen species with the same level of sensitivity, irrespective of whether its DNA was amplified in isolation or together with DNA representing the other pathogens. Moreover, single and multiplex PCRs were able to detect each species with equal sensitivity in serially diluted DNA representing mixtures of T. annulata, B. bovis and A. marginale, and no evidence of non-specific amplification from non-target species was observed. The multiplex PCR was also shown to detect single and mixed infections efficiently in field samples. The developed assay represents a simple and efficient diagnostic for co-detection of tropical theileriosis, bovine babesiosis and anaplasmosis, and may be a valuable tool for epidemiological studies aimed at assessing the burden of multiple infection with tick-borne pathogens and improving control of the associated diseases in endemic regions.

    Genetic diversity and core subset selection in ex situ seed collections of the banana crop wild relative Musa balbisiana

    Crop wild relatives (CWRs) play a key role in crop breeding by providing beneficial traits for the improvement of related crops. CWRs are used more efficiently in breeding if the plant material is genetically characterized, but the diversity in CWR genetic resources has often been poorly assessed. Seven seed collections of Musa balbisiana, an important CWR of dessert and cooking bananas, originating from three natural populations, two feral populations and two ex situ field collections, were retrieved, and their genetic diversity was quantified using 18 microsatellite markers to select core subsets that conserve the maximum genetic diversity. The highest genetic diversity was observed in the seed collections from natural populations of Yunnan, a region that is part of M. balbisiana's centre of origin. The seeds from the ex situ field collections were less genetically diverse, but contained unique variation with regard to the diversity in all seed collections. Seeds from feral populations displayed low genetic diversity. Core subsets that maximized genetic distance incorporated almost no seeds from the ex situ field collections. In contrast, core subsets that maximized allelic richness contained seeds from the ex situ field collections. We recommend the conservation and additional collection of seeds from natural populations, preferentially originating from the species' region of origin, and from multiple individuals in one population. We also suggest that the number of seeds used for ex situ seed bank regeneration must be much higher for the seed collections from natural populations.

    Multiplex primer prediction software for divergent targets

    We describe a Multiplex Primer Prediction (MPP) algorithm to build multiplex-compatible primer sets that amplify all members of large, diverse and unalignable sets of target sequences. The MPP algorithm is scalable to larger target sets than other available software, and it does not require a multiple sequence alignment. We applied it to questions in viral detection, and demonstrated that there are no universally conserved priming sequences among viruses and that it could require an unfeasibly large number of primers (∼3700 18-mers or ∼2000 10-mers) to generate amplicons from all sequenced viruses. We then designed primer sets separately for each viral family, and for several diverse species such as foot-and-mouth disease virus (FMDV), hemagglutinin (HA) and neuraminidase (NA) segments of influenza A virus, Norwalk virus, and HIV-1. We empirically demonstrated the application of the software with a multiplex set of 16 short (10 nt) primers designed to amplify the Poxviridae family to produce a specific amplicon from vaccinia virus.
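
One way to see why no alignment is needed is the alignment-free sketch below, which assumes a greedy cover: repeatedly take the k-mer found in the most targets that still lack a primer. This is only an illustration of the general idea, not the published MPP algorithm. It ignores primer pairing, amplicon length, melting temperature and primer-primer interactions, and `greedy_kmer_cover` is a hypothetical name.

```python
from collections import Counter


def greedy_kmer_cover(targets, k):
    """Pick k-mers greedily until every target contains at least one of them."""
    uncovered = set(range(len(targets)))
    primers = []
    while uncovered:
        counts = Counter()
        for i in uncovered:
            seq = targets[i]
            # Count each k-mer once per target, however often it repeats.
            counts.update({seq[j:j + k] for j in range(len(seq) - k + 1)})
        if not counts:
            raise ValueError("some targets are shorter than k")
        # Sorting first makes ties resolve deterministically.
        best = max(sorted(counts), key=counts.__getitem__)
        primers.append(best)
        uncovered = {i for i in uncovered if best not in targets[i]}
    return primers


if __name__ == "__main__":
    toy_targets = ["AAACCC", "TTACCC", "GGGTTT"]  # toy data, not viral genomes
    print(greedy_kmer_cover(toy_targets, k=3))
```

When the targets share few k-mers, the greedy cover needs roughly one primer per target, which matches the abstract's observation that very divergent sets can require thousands of primers.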