33 research outputs found
High-Throughput SNP Genotyping by SBE/SBH
Despite much progress over the past decade, current Single Nucleotide
Polymorphism (SNP) genotyping technologies still offer an insufficient degree
of multiplexing when required to handle user-selected sets of SNPs. In this
paper we propose a new genotyping assay architecture combining multiplexed
solution-phase single-base extension (SBE) reactions with sequencing by
hybridization (SBH) using universal DNA arrays such as all k-mer arrays. In
addition to PCR amplification of genomic DNA, SNP genotyping using SBE/SBH
assays involves the following steps: (1) Synthesizing primers complementing the
genomic sequence immediately preceding SNPs of interest; (2) Hybridizing these
primers with the genomic DNA; (3) Extending each primer by a single base using
polymerase enzyme and dideoxynucleotides labeled with 4 different fluorescent
dyes; and finally (4) Hybridizing extended primers to a universal DNA array and
determining the identity of the bases that extend each primer by hybridization
pattern analysis. Our contributions include a study of multiplexing algorithms
for SBE/SBH genotyping assays and preliminary experimental results showing the
achievable tradeoffs between the number of array probes and primer length on
one hand and the number of SNPs that can be assayed simultaneously on the
other. Simulation results on datasets both randomly generated and extracted
from the NCBI dbSNP database suggest that the SBE/SBH architecture provides a
flexible and cost-effective alternative to genotyping assays currently used in
the industry, enabling genotyping of up to hundreds of thousands of
user-specified SNPs per assay.
Highly Scalable Algorithms for Robust String Barcoding
String barcoding is a recently introduced technique for genomic-based
identification of microorganisms. In this paper we describe the engineering of
highly scalable algorithms for robust string barcoding. Our methods enable
distinguisher selection based on whole genomic sequences of hundreds of
microorganisms of up to bacterial size on a well-equipped workstation, and can
be easily parallelized to further extend the applicability range to thousands
of bacterial size genomes. Experimental results on both randomly generated and
NCBI genomic data show that whole-genome based selection results in a number of
distinguishers nearly matching the information theoretic lower bounds for the
problem.
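The distinguisher-selection idea in the abstract above can be illustrated with a small greedy sketch. It assumes that a distinguisher is a substring present in some sequences and absent from others, and that a valid barcode must separate every pair of sequences. The function names, the `max_len` cap, and the plain greedy rule are illustrative assumptions, not the engineered algorithms the paper describes.

```python
from itertools import combinations
from math import ceil, log2


def candidate_substrings(seqs, max_len):
    """Collect every substring of length 1..max_len occurring in any sequence."""
    subs = set()
    for s in seqs:
        for k in range(1, max_len + 1):
            for i in range(len(s) - k + 1):
                subs.add(s[i:i + k])
    return sorted(subs)


def greedy_barcode(seqs, max_len=3):
    """Greedily pick substrings until every pair of sequences is separated."""
    pairs = set(combinations(range(len(seqs)), 2))
    cands = candidate_substrings(seqs, max_len)
    chosen = []
    while pairs:
        best, best_cov = None, set()
        for c in cands:
            present = [c in s for s in seqs]
            # A candidate separates a pair if it occurs in exactly one member.
            cov = {(i, j) for (i, j) in pairs if present[i] != present[j]}
            if len(cov) > len(best_cov):
                best, best_cov = c, cov
        if best is None:
            raise ValueError("some sequences cannot be distinguished")
        chosen.append(best)
        pairs -= best_cov
    return chosen


def barcodes(seqs, chosen):
    """Presence/absence vector of the chosen distinguishers for each sequence."""
    return [tuple(c in s for c in chosen) for s in seqs]


def lower_bound(n):
    """Information-theoretic minimum number of binary distinguishers for n sequences."""
    return ceil(log2(n)) if n > 1 else 0
```

Each chosen substring contributes one bit per sequence, so at least `lower_bound(n)` distinguishers are needed for `n` sequences; comparing `len(greedy_barcode(seqs))` against that bound mirrors the comparison the abstract reports.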
Identification of mammalian orthologs using local synteny
Background: Accurate determination of orthology is central to comparative genomics. For vertebrates in particular, very large gene families, high rates of gene duplication and loss, multiple mechanisms of gene duplication, and high rates of retrotransposition all combine to make inference of orthology between genes difficult. Many methods have been developed to identify orthologous genes, mostly based on analysis of the inferred protein sequence of the genes. More recently, methods have been proposed that use genomic context in addition to protein sequence to improve orthology assignment in vertebrates. Such methods have been most successfully implemented in fungal genomes and have long been used in prokaryotic genomes, where gene order is far less variable than in vertebrates. However, to our knowledge, no explicit comparison of synteny-based and sequence-based definitions of orthology has been reported in vertebrates or, more specifically, in mammals.
Results: We test a simple method for measuring and using gene order (local synteny) to identify mammalian orthologs by investigating the agreement between coding-sequence-based orthology (Inparanoid) and local-synteny-based orthology. In the 5 mammalian genomes studied, 93% of the sampled inter-species pairs were concordant between the two orthology methods, illustrating that local synteny is a robust substitute for coding sequence in identifying orthologs. However, 7% of pairs were discordant between local synteny and Inparanoid. These cases of discordance result from evolutionary events including retrotransposition and genome rearrangements.
Conclusions: By analyzing cases of discordance between local synteny and Inparanoid, we show that local synteny can distinguish between true orthologs and recent retrogenes, can resolve ambiguous many-to-many orthology relationships into one-to-one ortholog pairs, and might be used to identify cases of non-orthologous gene displacement by retroduplicated paralogs.
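One way to picture a local-synteny comparison is to count how many neighbors of a gene in one genome have a homolog among the neighbors of a candidate ortholog in the other genome. The sketch below makes several assumptions that are not taken from the paper: each genome is a single ordered list of gene names, `homologs` is a precomputed mapping, and the window size is an arbitrary illustrative parameter.

```python
def neighbors(order, gene, window):
    """Return up to `window` genes on each side of `gene` in an ordered gene list."""
    i = order.index(gene)
    return order[max(0, i - window):i] + order[i + 1:i + 1 + window]


def synteny_score(order_a, order_b, gene_a, gene_b, homologs, window=3):
    """Count neighbors of gene_a that have a homolog among the neighbors of gene_b.

    `homologs` maps a gene in genome A to a set of homologous genes in genome B
    (a hypothetical, precomputed input).
    """
    near_a = neighbors(order_a, gene_a, window)
    near_b = set(neighbors(order_b, gene_b, window))
    return sum(1 for g in near_a if homologs.get(g, set()) & near_b)


# Toy data: a3 has two sequence-level candidates, b3 (in the conserved
# neighborhood) and b9 (a copy inserted into an unrelated neighborhood).
order_a = ["a1", "a2", "a3", "a4", "a5"]
order_b = ["b1", "b2", "b3", "b4", "b5", "x1", "b9", "x2"]
homologs = {"a1": {"b1"}, "a2": {"b2"}, "a3": {"b3", "b9"}, "a4": {"b4"}}

score_conserved = synteny_score(order_a, order_b, "a3", "b3", homologs, window=2)
score_inserted = synteny_score(order_a, order_b, "a3", "b9", homologs, window=2)
```

In this toy case the candidate in the conserved neighborhood scores higher than the inserted copy, which is the kind of signal the abstract describes for separating true orthologs from retrogenes.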
Optimum Extensions of Prefix Codes
An algorithm is given for finding the minimum weight extension of a prefix code. The algorithm runs in O(n³) time, where n is the number of codewords to be added, and works for arbitrary alphabets. For binary alphabets the running time is reduced to O(n²) by using the fact that a certain cost matrix satisfies the quadrangle inequality. The quadrangle inequality is shown not to hold for alphabets of size larger than two. Similar algorithms are presented for finding alphabetic and length-limited code extensions.
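As background for what extending a prefix code involves, the sketch below checks the prefix-free property and computes the Kraft slack left by an existing code. It is not the paper's algorithm: the Kraft bound gives only a necessary condition for adding codewords of given lengths, because the free space in the code tree may be fragmented. All function names are illustrative.

```python
from fractions import Fraction


def is_prefix_free(code):
    """True if no codeword is a prefix of (or equal to) another codeword."""
    words = sorted(code)
    # In sorted order, a word that is a prefix of any other word is also a
    # prefix of its immediate successor, so adjacent checks suffice.
    return all(not b.startswith(a) for a, b in zip(words, words[1:]))


def kraft_slack(code, r):
    """Return 1 - sum(r**-len(w)): the unused Kraft budget for alphabet size r."""
    return 1 - sum(Fraction(1, r ** len(w)) for w in code)


def may_fit(code, new_lengths, r):
    """Necessary (not sufficient) test for adding codewords of the given lengths."""
    needed = sum(Fraction(1, r ** length) for length in new_lengths)
    return needed <= kraft_slack(code, r)
```

For example, the binary code {"00", "10"} has slack 1/2, so `may_fit` accepts a single new codeword of length 1, yet neither "0" nor "1" can actually be added without breaking the prefix property. An actual extension algorithm has to work with the free nodes of the code tree rather than the slack alone.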
A note on the MST heuristic for bounded edge-length Steiner Trees with minimum number of Steiner Points
We give a tight analysis of the MST heuristic recently introduced by G.-H. Lin and G. Xue for approximating the Steiner tree with minimum number of Steiner points and bounded edge-lengths. The approximation factor of the heuristic is shown to be one less than the MST number of the underlying space, defined as the maximum possible degree of a minimum-degree MST spanning points from the space. In particular, on instances drawn from the Euclidean (resp. rectilinear) plane, the MST heuristic is shown to have a tight approximation factor of 4 (resp. 3).
Keywords: Approximation algorithms, Steiner trees, MST heuristic, fixed wireless network design, VLSI CAD.
1 Introduction
The classical Steiner tree problem is that of finding a shortest tree spanning a given set of terminal points. The tree may use additional points besides the terminals; these points are commonly referred to as Steiner points. In the Minimum number of Steiner Points Tree (MSPT) problem [7,5] one also seeks a tree ..
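A minimal sketch of how an MST-based heuristic of this kind can be evaluated in the Euclidean plane: build a minimum spanning tree over the terminals, then subdivide every edge longer than the bound R with evenly spaced Steiner points. The subdivision count ceil(d / R) - 1 per edge of length d, the simple O(n²) Prim implementation, and the names used are assumptions made for illustration; the excerpt itself does not spell out the heuristic.

```python
from math import ceil, dist


def mst_edges(points):
    """Prim's algorithm on the complete Euclidean graph; returns (parent, child) index pairs."""
    n = len(points)
    if n == 0:
        return []
    in_tree = [False] * n
    best = [float("inf")] * n
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        # Pick the closest vertex not yet in the tree.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((parent[u], u))
        for v in range(n):
            if not in_tree[v]:
                d = dist(points[u], points[v])
                if d < best[v]:
                    best[v] = d
                    parent[v] = u
    return edges


def steiner_points_needed(points, max_edge):
    """Steiner points used when each MST edge is cut into pieces of length <= max_edge."""
    total = 0
    for u, v in mst_edges(points):
        d = dist(points[u], points[v])
        total += max(0, ceil(d / max_edge) - 1)
    return total
```

For the terminals (0, 0), (3, 0), (3, 4) and R = 1, the MST uses the edges of length 3 and 4, which need 2 and 3 Steiner points respectively, so the sketch reports 5.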
Complete mitochondrial genome of the water vole, Microtus richardsoni (Cricetidae, Rodentia)
Water voles (Microtus richardsoni) are a sensitive species distributed in the mountains of Canada (Alberta, British Columbia) and the United States of America (Idaho, Montana, Oregon, Utah, Washington, and Wyoming). We assembled the complete circular M. richardsoni mitogenome, which is 16,285 bp in length and encodes 13 protein-coding genes, 22 tRNA genes, and two rRNA genes. We estimated the phylogenetic tree of M. richardsoni and 24 related arvicoline species with two outgroup species: Phodopus roborovskii and Cricetus cricetus.