SEED: efficient clustering of next-generation sequences.

Author: Bao Ergude
Girke Thomas
Jiang Tao
Kaloshian Isgouhi
Publication venue: eScholarship, University of California
Publication date: 02/08/2011

Motivation: Similarity clustering of next-generation sequences (NGS) is an important computational problem to study the population sizes of DNA/RNA molecules and to reduce the redundancies in NGS data. Currently, most sequence clustering algorithms are limited by their speed and scalability, and thus cannot handle data with tens of millions of reads.

Results: Here, we introduce SEED, an efficient algorithm for clustering very large NGS sets. It joins sequences into clusters that can differ by up to three mismatches and three overhanging residues from their virtual center. It is based on a modified spaced seed method, called block spaced seeds. Its clustering component operates on the hash tables by first identifying virtual center sequences and then finding all their neighboring sequences that meet the similarity parameters. SEED can cluster 100 million short read sequences in <4 h with linear time and memory performance. When SEED was used as a preprocessing tool on genome/transcriptome assembly data, it reduced the time and memory requirements of the Velvet/Oasis assembler for the datasets used in this study by 60-85% and 21-41%, respectively. In addition, the assemblies contained longer contigs than those from non-preprocessed data, as indicated by 12-27% larger N50 values. Compared with other clustering tools, SEED showed the best performance in generating clusters of NGS data similar to true cluster results, with a 2- to 10-fold better time performance. While most of SEED's utilities fall into the preprocessing area of NGS data, our tests also demonstrate its efficiency as a stand-alone tool for discovering clusters of small RNA sequences in NGS data from unsequenced organisms.

Availability: The SEED software can be downloaded for free from this site: http://manuals.bioinformatics.ucr.edu/home/[email protected]

Supplementary information: Supplementary data are available at Bioinformatics online.
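The center-then-neighbors idea in the abstract can be illustrated with a much simpler stand-in. The sketch below is not SEED's block spaced seed method: it assumes equal-length reads, ignores overhanging residues, and uses plain contiguous blocks with a pigeonhole argument (two reads that differ in at most three positions must agree exactly on at least one of four blocks). The names `blocks`, `hamming`, and `cluster_reads` are hypothetical.

```python
from collections import defaultdict


def hamming(a, b):
    """Number of mismatching positions between two equal-length reads."""
    return sum(x != y for x, y in zip(a, b))


def blocks(read, n_blocks):
    """Split a read into n_blocks contiguous pieces, tagged by block index."""
    size = len(read) // n_blocks
    pieces = []
    for i in range(n_blocks):
        start = i * size
        # The last block absorbs any remainder.
        end = (i + 1) * size if i < n_blocks - 1 else len(read)
        pieces.append((i, read[start:end]))
    return pieces


def cluster_reads(reads, max_mismatches=3):
    """Greedy clustering: each unassigned read becomes a center and
    collects unassigned reads within max_mismatches of it."""
    n_blocks = max_mismatches + 1  # pigeonhole: one block must match exactly
    index = defaultdict(list)
    for rid, read in enumerate(reads):
        for i, piece in blocks(read, n_blocks):
            # Read length is part of the key so that blocks stay aligned.
            index[(len(read), i, piece)].append(rid)

    assigned = [False] * len(reads)
    clusters = []
    for rid, read in enumerate(reads):
        if assigned[rid]:
            continue
        assigned[rid] = True
        members = [rid]
        for i, piece in blocks(read, n_blocks):
            for other in index[(len(read), i, piece)]:
                # Hash hits are only candidates; verify the real distance.
                if not assigned[other] and hamming(read, reads[other]) <= max_mismatches:
                    assigned[other] = True
                    members.append(other)
        clusters.append(members)
    return clusters
```

The hash lookup only proposes candidates, so the number of full comparisons depends on how many reads share a block rather than on all pairs of reads.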

PubMed Central

eScholarship - University of California

Generation of key in cryptographic system for secure communications

Author: Perlman M.
Publication date: 01/10/1975

The report discusses key generation for the transmission of confidential data. A number of feedback functions are discussed for the generation of long key sequences.
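The abstract does not say which feedback functions the report covers. As one textbook illustration of how a feedback function stretches a short seed into a long sequence, here is a minimal sketch of a linear feedback shift register. The linear feedback, the 4-bit register, and the tap positions are all assumptions made for the example, and the names `lfsr_step`, `keystream`, and `period` are hypothetical. A plain linear register is not secure on its own and is shown only to make the mechanism concrete.

```python
def lfsr_step(state, shifts, nbits):
    """Advance the register by one step.

    The feedback bit is the XOR of the state bits at the given shift
    positions; it enters at the top while the register shifts right.
    """
    bit = 0
    for s in shifts:
        bit ^= (state >> s) & 1
    return (state >> 1) | (bit << (nbits - 1))


def keystream(seed, shifts, nbits, count):
    """Return the first `count` output bits (lowest bit of each state)."""
    state = seed
    out = []
    for _ in range(count):
        out.append(state & 1)
        state = lfsr_step(state, shifts, nbits)
    return out


def period(seed, shifts, nbits):
    """Count steps until the register returns to the seed state.

    Assumes shift 0 is among the taps, which makes each step invertible,
    so the state sequence is a pure cycle and the loop terminates.
    """
    state = lfsr_step(seed, shifts, nbits)
    steps = 1
    while state != seed:
        state = lfsr_step(state, shifts, nbits)
        steps += 1
    return steps
```

With a 4-bit register and taps at shifts 0 and 1, `period(0b0001, (0, 1), 4)` visits every non-zero state before repeating, which is the longest cycle a 4-bit linear register can have.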

NASA Technical Reports Server

Learning and generation of long-range correlated sequences

Author: A. Priel
H.A. Makse
I. Kanter
L. Ein-Dor
M. Opper
M. Schroder
P. Riegler
W. Tarkowski
Publication venue: 'American Physical Society (APS)'
Publication date: 01/01/2000

We study the capability to learn and to generate long-range, power-law correlated sequences by a fully connected asymmetric network. The focus is set on the ability of neural networks to extract statistical features from a sequence. We demonstrate that the average power-law behavior is learnable, namely, the sequence generated by the trained network obeys the same statistical behavior. The interplay between a correlated weight matrix and the sequence generated by such a network is explored. A weight matrix with a power-law correlation function along the vertical direction gives rise to a sequence with a similar statistical behavior.

Comment: 5 pages, 3 figures, accepted for publication in Physical Review

arXiv.org e-Print Archive

CiteSeerX

Crossref

Capturing the ‘ome’: the expanding molecular toolbox for RNA and DNA library construction

Author: Boone Morgane
Callewaert Nico
De Koker Andries
Publication venue: 'Oxford University Press (OUP)'
Publication date: 01/01/2018

All sequencing experiments and most functional genomics screens rely on the generation of libraries to comprehensively capture pools of targeted sequences. In the past decade especially, driven by the progress in the field of massively parallel sequencing, numerous studies have comprehensively assessed the impact of particular manipulations on library complexity and quality, and characterized the activities and specificities of several key enzymes used in library construction. Fortunately, careful protocol design and reagent choice can substantially mitigate many of these biases, and enable reliable representation of sequences in libraries. This review aims to guide the reader through the vast expanse of literature on the subject to promote informed library generation, independent of the application.

Crossref

Ghent University Academic Bibliography