
    An approach to comparing tiling array and high throughput sequencing technologies for genomic transcript mapping

    BACKGROUND: There are two main technologies for transcriptome profiling, namely, tiling microarrays and high-throughput sequencing. Recently there has been a tremendous amount of excitement about the latter because of the advent of next-generation sequencing technologies and their promise. Consequently, the question of the moment is how these two technologies compare. Here we attempt to develop an approach for a fair comparison of transcripts identified from tiling microarray and MPSS sequencing data. FINDINGS: This comparison is a challenging task because the sequencing data is discrete while the tiling array data is continuous. We use the published rice and Arabidopsis datasets, which provide the best-matched sets of arrays and sequencing experiments currently available, using a slightly earlier generation of sequencing, the MPSS tag sequencing technology. After scoring the arrays consistently in both organisms, a first-pass comparison reveals a surprisingly small overlap in transcripts of 22% and 66% in rice and Arabidopsis, respectively. However, when we do the analysis in detail, we find that this is an underestimate. In particular, when we map the probe intensities onto the sequencing tags and then look at their intensity distribution, we see that they are very similar to exons. Furthermore, restricting our comparison to only protein-coding gene loci revealed a very good overlap between the two technologies. CONCLUSION: Our approach to comparing genome tiling microarray and MPSS sequencing data suggests that there is actually a reasonable overlap in transcripts identified by the two technologies. This overlap is distorted by the scoring and thresholding in the tiling array scoring procedure.

    Next generation sequencing has lower sequence coverage and poorer SNP-detection capability in the regulatory regions

    The rapid development of next generation sequencing (NGS) technology provides a new opportunity to extend the scale and resolution of genomic research. How to efficiently map millions of short reads to the reference genome and how to make accurate SNP calls are two major challenges in taking full advantage of NGS. In this article, we reviewed the current software tools for mapping and SNP calling, and evaluated their performance on samples from The Cancer Genome Atlas (TCGA) project. We found that BWA and Bowtie outperform the other alignment tools in overall performance on the Illumina platform, while NovoalignCS showed the best overall performance for SOLiD. Furthermore, we showed that next-generation sequencing platforms have significantly lower coverage and poorer SNP-calling performance in the CpG islands, promoter and 5′-UTR regions of the genome. NGS experiments targeting these regions should use a higher sequencing depth than is used for typical genomic regions.

    Empirical Evaluation of Oligonucleotide Probe Selection for DNA Microarrays

    DNA-based microarrays are increasingly central to biomedical research. Selecting oligonucleotide sequences that will behave consistently across experiments is essential to the design, production and performance of DNA microarrays. Here our aim was to improve on probe design parameters by empirically and systematically evaluating probe performance in a multivariate context. We used experimental data from 19 array CGH hybridizations to assess the probe performance of 385,474 probes tiled in the Duchenne muscular dystrophy (DMD) region of the X chromosome. Our results demonstrate that probe melting temperature, single nucleotide polymorphisms (SNPs), and homocytosine motifs all have a strong effect on probe behavior. These findings, when incorporated into future microarray probe selection algorithms, may improve microarray performance for a wide variety of applications

    New methods to analyse microarray data that partially lack a reference signal

    BACKGROUND: Microarray-based Comparative Genomic Hybridisation (CGH) has been used to assess genetic variability between bacterial strains. Crucial for interpretation of microarray data is the availability of a reference against which to compare signal intensities, so that the presence or divergence of each DNA fragment can be determined reliably. However, the production of a good reference becomes unfeasible when microarrays are based on pan-genomes. When only a single strain is used as a reference for a multistrain array, the accessory gene pool will be only partially represented by reference DNA, although these genes represent the genomic repertoire that can explain differences in virulence, pathogenicity or transmissibility between strains. The lack of a reference makes interpretation of the data for these genes difficult and, if the test signal is low, they are often deleted from the analysis. We aimed to develop novel methods to determine the presence or divergence of genes in a Staphylococcus aureus multistrain PCR product microarray-based CGH approach for which reference DNA was not available for some probes. RESULTS: In this study we have developed 6 new methods to predict divergence and presence of all genes spotted on a multistrain Staphylococcus aureus DNA microarray, published previously, including those gene spots that lack reference signals. When considering specificity and PPV (i.e. the false-positive rate) as the most important criteria for evaluating these methods, the method that defined gene presence based on a signal at least twice as high as the background and higher than the reference signal (method 4) had the best test characteristics. For this method specificity was 100% and 82% for MRSA252 (compared to the GACK method) and all spots (compared to sequence data), respectively, and PPV was 100% and 76% for MRSA252 (compared to the GACK method) and all spots (compared to sequence data), respectively.
CONCLUSION: A definition of gene presence based on signal at least twice as high as the background and higher than the reference signal (method 4) had the best test characteristics, allowing the analysis of 6-17% more of the genes not present in the reference strain. This method is recommended to analyse microarray data that partially lack a reference signal
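The presence rule described for method 4 can be written as a two-part threshold test. The following is a minimal sketch, assuming each spot provides a test signal, a background estimate and a reference-channel signal on the same intensity scale; the function name and argument names are hypothetical and are not taken from the paper.

```python
def is_present_method4(test_signal, background, reference_signal):
    """Call a gene present if the test signal is at least twice the
    background AND higher than the reference signal.

    Illustrative only: the argument names and the assumption that all
    three values share one intensity scale are not from the paper.
    """
    above_background = test_signal >= 2 * background
    above_reference = test_signal > reference_signal
    return above_background and above_reference


# Hypothetical spot intensities, for illustration only.
spots = [
    {"test": 500.0, "background": 200.0, "reference": 300.0},
    {"test": 350.0, "background": 200.0, "reference": 300.0},
    {"test": 500.0, "background": 200.0, "reference": 600.0},
]
calls = [
    is_present_method4(s["test"], s["background"], s["reference"])
    for s in spots
]
```

Only the first hypothetical spot passes both conditions; the second fails the background test and the third fails the reference test.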

    Teolenn: an efficient and customizable workflow to design high-quality probes for microarray experiments

    Despite the development of new high-throughput sequencing techniques, microarrays are still attractive tools to study small genome organisms, thanks to sample multiplexing and high-feature densities. However, the oligonucleotide design remains a delicate step for most users. A vast array of software is available to deal with this problem, but each program is developed with its own strategy, which makes the choice of the best solution difficult. Here we describe Teolenn, a universal probe design workflow developed with a flexible and customizable module organization allowing fixed or variable length oligonucleotide generation. In addition, our software is able to supply quality scores for each of the designed probes. In order to assess the relevance of these scores, we performed a real hybridization using a tiling array designed against the Trichoderma reesei fungus genome. We show that our scoring pipeline correlates with signal quality for 97.2% of all the designed probes, allowing for a posteriori comparisons between quality scores and signal intensities. This result is useful in discarding any bad scoring probes during the design step in order to get high-quality microarrays. Teolenn is available at http://transcriptome.ens.fr/teolenn/

    Probabilistic base calling of Solexa sequencing data

    BACKGROUND: Solexa/Illumina short-read ultra-high throughput DNA sequencing technology produces millions of short tags (up to 36 bases) by parallel sequencing-by-synthesis of DNA colonies. The processing and statistical analysis of such high-throughput data poses new challenges; currently a fair proportion of the tags are routinely discarded due to an inability to match them to a reference sequence, thereby reducing the effective throughput of the technology. RESULTS: We propose a novel base calling algorithm using model-based clustering and probability theory to identify ambiguous bases and code them with IUPAC symbols. We also select optimal sub-tags using a score based on information content to remove uncertain bases towards the ends of the reads. CONCLUSION: We show that the method improves genome coverage and number of usable tags as compared with Solexa's data processing pipeline by an average of 15%. An R package is provided which allows fast and accurate base calling of Solexa's fluorescence intensity files and the production of informative diagnostic plots
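Coding an ambiguous base with an IUPAC symbol can be illustrated as follows. This is a rough sketch, not the paper's algorithm: it assumes per-base probabilities for A, C, G and T are already available (the abstract obtains them by model-based clustering), and it uses a simple cumulative-probability cutoff, `min_mass`, which is an assumption made here for illustration. Only the IUPAC symbol table itself is standard.

```python
# Standard IUPAC nucleotide ambiguity codes, keyed by the set of bases.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C",
    frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y",
    frozenset("CG"): "S", frozenset("AT"): "W",
    frozenset("GT"): "K", frozenset("AC"): "M",
    frozenset("CGT"): "B", frozenset("AGT"): "D",
    frozenset("ACT"): "H", frozenset("ACG"): "V",
    frozenset("ACGT"): "N",
}


def iupac_call(probs, min_mass=0.9):
    """Return the IUPAC symbol for the smallest set of most-probable
    bases whose summed probability reaches min_mass.

    probs maps each of "A", "C", "G", "T" to a probability.
    The cutoff rule is an illustrative assumption, not the paper's.
    """
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    chosen, mass = [], 0.0
    for base, p in ranked:
        chosen.append(base)
        mass += p
        if mass >= min_mass:
            break
    return IUPAC[frozenset(chosen)]


confident = iupac_call({"A": 0.95, "C": 0.02, "G": 0.02, "T": 0.01})
purine = iupac_call({"A": 0.50, "C": 0.03, "G": 0.45, "T": 0.02})
unknown = iupac_call({"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25})
```

With these hypothetical probabilities, a confident call stays a plain base, an A-or-G call becomes `R`, and a uniform distribution becomes `N`.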

    Ultrafast and memory-efficient alignment of short DNA sequences to the human genome

    Bowtie: a new ultrafast memory-efficient tool for the alignment of short DNA sequence reads to large genomes

    Dynamic probe selection for studying microbial transcriptome with high-density genomic tiling microarrays

    BACKGROUND: Current commercial high-density oligonucleotide microarrays can hold millions of probe spots on a single microscopic glass slide and are ideal for studying the transcriptome of microbial genomes using a tiling probe design. This paper describes a comprehensive computational pipeline implemented specifically for designing tiling probe sets to study microbial transcriptome profiles. RESULTS: The pipeline identifies every possible probe sequence from both forward and reverse-complement strands of all DNA sequences in the target genome, including circular or linear chromosomes and plasmids. Final probe sequence lengths are adjusted based on the maximal oligonucleotide synthesis cycles and the best isothermality allowed. Optimal probes are then selected in two stages: sequential and gap-filling. In the sequential stage, probes are selected from sequence windows tiled along the genome. In the gap-filling stage, additional probes are selected from the largest gaps between adjacent probes that have already been selected, until a predefined number of probes is reached. Selection of the highest quality probe within each window and gap is based on five criteria: sequence uniqueness, probe self-annealing, melting temperature, oligonucleotide length, and probe position. CONCLUSIONS: The probe selection pipeline evaluates global and local probe sequence properties and selects a set of probes dynamically and evenly distributed along the target genome. Unlike other similar methods, an exact number of non-redundant probes can be designed to utilize all the available probe spots on any chosen microarray platform. The pipeline can be applied to microbial genomes when designing high-density tiling arrays for comparative genomics, ChIP-chip, gene expression and comprehensive transcriptome studies.
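The two-stage selection described in this abstract can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' pipeline: each candidate probe is reduced to a `(position, score)` pair, where the single `score` is a hypothetical stand-in for the five quality criteria; the genome is treated as linear; and the function name and parameters are invented for this example.

```python
def select_probes(candidates, genome_length, window, target):
    """Pick probe positions in two stages (illustrative sketch).

    candidates: list of (position, score); a higher score is better.
    Stage 1 keeps the best candidate per fixed-size window.
    Stage 2 repeatedly fills the largest remaining gap until `target`
    probes are selected or no gap contains an unused candidate.
    """
    # Stage 1 (sequential): best-scoring candidate in each window.
    best = {}
    for pos, score in candidates:
        w = pos // window
        if w not in best or score > best[w][1]:
            best[w] = (pos, score)
    selected = {pos for pos, _ in best.values()}

    # Stage 2 (gap-filling): genome ends act as gap boundaries.
    exhausted = set()
    while len(selected) < target:
        bounds = [-1] + sorted(selected) + [genome_length]
        gaps = [
            (b - a, a, b)
            for a, b in zip(bounds, bounds[1:])
            if (a, b) not in exhausted
        ]
        if not gaps:
            break  # no gap has any unused candidate left
        _, a, b = max(gaps)
        inside = [(s, p) for p, s in candidates if a < p < b]
        if not inside:
            exhausted.add((a, b))
            continue
        selected.add(max(inside)[1])
    return sorted(selected)


# Hypothetical candidates on a 100-base toy genome.
cands = [(0, 0.9), (5, 0.5), (12, 0.8), (40, 0.7),
         (45, 0.95), (60, 0.4), (70, 0.3), (90, 0.6)]
picked = select_probes(cands, genome_length=100, window=50, target=4)
```

The sketch does not trim the selection if the sequential stage alone already exceeds `target`, and it ignores circular chromosomes, both of which a real design tool would need to handle.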

    Custom Design and Analysis of High-Density Oligonucleotide Bacterial Tiling Microarrays

    Not until recently have custom made high-density oligonucleotide microarrays been available at an affordable price. The aim of this thesis was to design microarrays and analysis algorithms for DNA repair and DNA damage detection, and to apply the methods in real experiments. Thomassen et al. have used their custom designed whole genome-tiling microarrays for detection of transcriptional changes in Escherichia coli after exposure to DNA damaging reagents. The transcriptional changes in E. coli treated with UV light or the methylating reagent MNNG were shown to be larger and to include far more genes than previously reported. To optimize the data analysis for the custom made arrays, Thomassen and coworkers designed their own normalization and analysis algorithms, and showed these to be more suitable than established methods that are currently applied to custom tiling arrays. Among other findings, several novel stress-induced transcripts were detected, of which one is predicted to be a UV-induced short transmembrane protein. Additionally, no upregulation of the previously described UV-inducible aidB is shown. In the MNNG study several genes are shown to be downregulated in response to DNA damage although they have upstream regulatory sequences similar to the established LexA box A and B. This indicates that the LexA regulon might also control gene repression and that the box A and B sequence cannot alone account for the LexA controlled gene regulation. Thomassen et al. have also custom designed a microarray for oncogenic fusion gene detection. Cancer specific fusion genes are often used to subgroup cancers and to define the optimal treatment, but currently the laboratory detection procedure is both laborious and tedious. In a blinded study on six cancer cell lines, proof of principle was shown by detection of six out of six positive controls. The design and analysis methods for this microarray are now being refined to make a diagnostic fusion gene detection tool.