Linked optical and gene expression profiling of single cells at high-throughput.
Single-cell RNA sequencing has emerged as a powerful tool for characterizing cells, but not all phenotypes of interest can be observed through changes in gene expression. Linking sequencing with optical analysis has provided insight into the molecular basis of cellular function, but current approaches have limited throughput. Here, we present a high-throughput platform for linked optical and gene expression profiling of single cells. We demonstrate accurate fluorescence and gene expression measurements on thousands of cells in a single experiment. We use the platform to characterize DNA and RNA changes through the cell cycle and correlate antibody fluorescence with gene expression. The platform's ability to isolate rare cell subsets and perform multiple measurements, including fluorescence and sequencing-based analysis, holds potential for scalable multi-modal single-cell analysis.
Indexing arbitrary-length k-mers in sequencing reads
We propose a lightweight data structure for indexing and querying collections
of NGS read data in main memory. The data structure supports the interface
proposed in the pioneering work by Philippe et al. for counting and locating
k-mers in sequencing reads. Our solution, PgSA (pseudogenome suffix array),
based on finding overlapping reads, is competitive with the existing algorithms
in space use, query times, or both. The main applications of our index
include variant calling, error correction and analysis of reads from RNA-seq
experiments.
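The count and locate queries mentioned in this abstract can be illustrated with a minimal sketch. It is not PgSA: it builds a plain suffix array over the reads joined by a `$` separator, without the overlap-based pseudogenome the abstract describes, and the names `build_index`, `locate` and `count` are hypothetical. The construction is deliberately naive and only suitable for toy input.

```python
from bisect import bisect_left, bisect_right


def build_index(reads):
    """Join reads with '$' and build a plain suffix array (toy construction)."""
    text = "$".join(reads) + "$"
    # Naive construction by sorting all suffixes; fine for tiny inputs only.
    suffix_array = sorted(range(len(text)), key=lambda i: text[i:])
    # Map every text position to (read index, offset inside that read).
    owner = []
    for read_id, read in enumerate(reads):
        owner.extend((read_id, offset) for offset in range(len(read) + 1))
    return text, suffix_array, owner


def locate(index, kmer):
    """Return sorted (read index, offset) pairs where kmer occurs."""
    text, suffix_array, owner = index
    k = len(kmer)
    # Length-k prefixes of sorted suffixes are themselves in sorted order,
    # so the matching suffixes form one contiguous range.
    prefixes = [text[i:i + k] for i in suffix_array]
    lo = bisect_left(prefixes, kmer)
    hi = bisect_right(prefixes, kmer)
    return sorted(owner[suffix_array[i]] for i in range(lo, hi))


def count(index, kmer):
    """Return the number of occurrences of kmer across all reads."""
    return len(locate(index, kmer))
```

Because the query length is not fixed at build time, the same index answers queries for any k, which is the sense of "arbitrary-length" in the title. The query k-mer is assumed not to contain the `$` separator.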
Indexing large genome collections on a PC
Motivation: The availability of thousands of individual genomes of one species
should boost rapid progress in personalized medicine or understanding of the
interaction between genotype and phenotype, to name a few applications. A key
operation useful in such analyses is aligning sequencing reads against a
collection of genomes, which is costly with the use of existing algorithms due
to their large memory requirements.
Results: We present MuGI, Multiple Genome Index, which reports all
occurrences of a given pattern, under exact and approximate matching models,
against a collection of thousand(s) of genomes. Its unique feature is the small
index size, fitting in a standard computer with 16-32 GB, or even 8 GB, of
RAM, for the 1000GP collection of 1092 diploid human genomes. The solution is
also fast. For example, exact matching queries are handled in an average time
of 39 µs, and queries with up to 3 mismatches in 373 µs, on the test PC with
an index size of 13.4 GB. For a smaller index, occupying 7.4 GB in memory,
the respective times grow to 76 µs and 917 µs.
Availability: Software and Supplementary material:
http://sun.aei.polsl.pl/mugi
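To make the query semantics in this abstract concrete, here is a naive baseline sketch, assuming that "approximate matching" means a bounded number of mismatches (Hamming distance), as the "up to 3 mismatches" figure suggests. It scans every genome directly and uses no index, so it says nothing about how MuGI itself works; `hamming_within` and `find_occurrences` are hypothetical names.

```python
def hamming_within(window, pattern, max_mismatches):
    """True if window and pattern differ in at most max_mismatches positions."""
    mismatches = 0
    for x, y in zip(window, pattern):
        if x != y:
            mismatches += 1
            if mismatches > max_mismatches:
                return False
    return True


def find_occurrences(genomes, pattern, max_mismatches=0):
    """Report (genome index, position) for every match in the collection."""
    hits = []
    m = len(pattern)
    for genome_id, genome in enumerate(genomes):
        # Slide a window of the pattern's length over each genome.
        for pos in range(len(genome) - m + 1):
            if hamming_within(genome[pos:pos + m], pattern, max_mismatches):
                hits.append((genome_id, pos))
    return hits
```

The cost of this scan grows with the total length of the collection for every query, which is the kind of overhead an index over the whole collection is meant to avoid.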
Large-scale compression of genomic sequence databases with the Burrows-Wheeler transform
Motivation
The Burrows-Wheeler transform (BWT) is the foundation of many algorithms for
compression and indexing of text data, but the cost of computing the BWT of
very large string collections has prevented these techniques from being widely
applied to the large sets of sequences often encountered as the outcome of DNA
sequencing experiments. In previous work, we presented a novel algorithm that
allows the BWT of human genome scale data to be computed on very moderate
hardware, thus enabling us to investigate the BWT as a tool for the compression
of such datasets.
Results
We first used simulated reads to explore the relationship between the level
of compression and the error rate, the length of the reads and the level of
sampling of the underlying genome and compare choices of second-stage
compression algorithm.
We demonstrate that compression may be greatly improved by a particular
reordering of the sequences in the collection and give a novel 'implicit
sorting' strategy that enables these benefits to be realised without the
overhead of sorting the reads. With these techniques, a 45x coverage of real
human genome sequence data compresses losslessly to under 0.5 bits per base,
allowing the 135.3 Gbp of sequence to fit into only 8.2 Gbytes of space (trimming
a small proportion of low-quality bases from the reads improves the compression
still further).
This is more than 4 times smaller than the size achieved by a standard
BWT-based compressor (bzip2) on the untrimmed reads, but an important further
advantage of our approach is that it facilitates the building of compressed
full text indexes such as the FM-index on large-scale DNA sequence collections.
Comment: Version here is as submitted to Bioinformatics and is the same as the
previously archived version. This submission registers the fact that the
advance access version is now available at
http://bioinformatics.oxfordjournals.org/content/early/2012/05/02/bioinformatics.bts173.abstract.
Bioinformatics should be considered the original place of publication of
this article; please cite accordingly.
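Two points in the BWT abstract above can be checked with a small sketch. The first function computes the transform of a single string by sorting its rotations, a textbook construction and not the large-scale algorithm the authors refer to; the second counts runs of equal symbols, the property that second-stage compressors exploit; the third redoes the bits-per-base arithmetic, assuming "Gbytes" means 10^9 bytes. All function names are hypothetical.

```python
def bwt(text, terminator="$"):
    """Burrows-Wheeler transform via sorted rotations (toy construction)."""
    s = text + terminator
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    # The transform is the last column of the sorted rotation matrix.
    return "".join(rotation[-1] for rotation in rotations)


def count_runs(s):
    """Number of maximal runs of identical symbols; fewer runs compress better."""
    if not s:
        return 0
    return 1 + sum(1 for a, b in zip(s, s[1:]) if a != b)


def bits_per_base(compressed_bytes, bases):
    """Average number of bits used to store each base."""
    return compressed_bytes * 8 / bases
```

Under the decimal-gigabyte assumption, `bits_per_base(8.2e9, 135.3e9)` comes to roughly 0.485, consistent with the "under 0.5 bits per base" figure quoted in the abstract.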