    SEED: efficient clustering of next-generation sequences.

    Motivation: Similarity clustering of next-generation sequences (NGS) is an important computational problem to study the population sizes of DNA/RNA molecules and to reduce the redundancies in NGS data. Currently, most sequence clustering algorithms are limited by their speed and scalability, and thus cannot handle data with tens of millions of reads. Results: Here, we introduce SEED, an efficient algorithm for clustering very large NGS sets. It joins sequences into clusters that can differ by up to three mismatches and three overhanging residues from their virtual center. It is based on a modified spaced seed method, called block spaced seeds. Its clustering component operates on the hash tables by first identifying virtual center sequences and then finding all their neighboring sequences that meet the similarity parameters. SEED can cluster 100 million short read sequences in <4 h with linear time and memory performance. When SEED was used as a preprocessing tool on genome/transcriptome assembly data, it reduced the time and memory requirements of the Velvet/Oasis assembler for the datasets used in this study by 60-85% and 21-41%, respectively. In addition, the assemblies contained longer contigs than those from non-preprocessed data, as indicated by 12-27% larger N50 values. Compared with other clustering tools, SEED showed the best performance in generating clusters of NGS data similar to true cluster results, with a 2- to 10-fold better time performance. While most of SEED's utilities fall into the preprocessing area of NGS data, our tests also demonstrate its efficiency as a stand-alone tool for discovering clusters of small RNA sequences in NGS data from unsequenced organisms. Availability: The SEED software can be downloaded for free from this site: http://manuals.bioinformatics.ucr.edu/home/[email protected] Supplementary information: Supplementary data are available at Bioinformatics online.
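The hash-table clustering idea described in the abstract can be illustrated with a small sketch. This is not SEED's implementation: the abstract does not give the block spaced seed layout, so the code below swaps in a simpler pigeonhole index (a read split into `max_mismatches + 1` contiguous blocks must share at least one exact block with any read within `max_mismatches` substitutions). It also treats the first unassigned read as a cluster center instead of computing a virtual center, ignores overhanging residues, and only compares reads of equal length. The names `hamming`, `block_keys`, and `cluster_reads` are hypothetical.

```python
from collections import defaultdict


def hamming(a, b):
    """Count mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def block_keys(read, n_blocks):
    """Return (block_index, block_substring) keys for a read cut into n_blocks pieces."""
    size = len(read) // n_blocks
    keys = []
    for i in range(n_blocks):
        # The last block absorbs any remainder so the whole read is covered.
        end = (i + 1) * size if i < n_blocks - 1 else len(read)
        keys.append((i, read[i * size:end]))
    return keys


def cluster_reads(reads, max_mismatches=3):
    """Greedy clustering: assign each read to the first center within max_mismatches.

    Assumption (not from the source): centers are simply the first read of each
    cluster, and candidate centers are found through exact block matches.
    """
    n_blocks = max_mismatches + 1
    index = defaultdict(list)  # (block_index, block) -> ids of centers containing it
    centers = []               # center sequence per cluster id
    clusters = []              # member reads per cluster id

    for read in reads:
        keys = block_keys(read, n_blocks)
        assigned = None
        for key in keys:
            for cid in index.get(key, ()):
                center = centers[cid]
                if len(center) == len(read) and hamming(center, read) <= max_mismatches:
                    assigned = cid
                    break
            if assigned is not None:
                break
        if assigned is None:
            # No center is close enough: this read starts a new cluster.
            assigned = len(centers)
            centers.append(read)
            clusters.append([])
            for key in keys:
                index[key].append(assigned)
        clusters[assigned].append(read)
    return centers, clusters


if __name__ == "__main__":
    demo = ["ACGTACGTACGT", "ACGTACGTACGA", "TTTTTTTTTTTT", "TTTTTTTTTTTA"]
    demo_centers, demo_clusters = cluster_reads(demo)
    for center, members in zip(demo_centers, demo_clusters):
        print(center, len(members))
```

Because each read is only compared against centers that share an exact block, the number of Hamming comparisons stays small when clusters are well separated, which is the general motivation for seed-based indexing.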

    Steps toward a classifier for the Virtual Observatory. I. Classifying the SDSS photometric archive

    Modern photometric multiband digital surveys produce large amounts of data that, in order to be exploited effectively, need automatic tools capable of extracting an objective classification from photometric data. We present here a new method for classifying objects in large multi-parametric photometric databases, consisting of a combination of a clustering algorithm and a cluster agglomeration tool. The generalization capabilities and the potential of this approach are tested against the complexity of the Sloan Digital Sky Survey archive, for which an example of application is reported. Comment: To appear in the Proceedings of the "1st Workshop of Astronomy and Astrophysics for Students" - Naples, 19-20 April 200

    Peer to Peer Information Retrieval: An Overview

    Peer-to-peer technology is widely used for file sharing. In the past decade a number of prototype peer-to-peer information retrieval systems have been developed. Unfortunately, none of these has seen widespread real-world adoption and thus, in contrast with file sharing, information retrieval is still dominated by centralised solutions. In this paper we provide an overview of the key challenges for peer-to-peer information retrieval and the work done so far. We want to stimulate and inspire further research to overcome these challenges. This will open the door to the development and large-scale deployment of real-world peer-to-peer information retrieval systems that rival existing centralised client-server solutions in terms of scalability, performance, user satisfaction and freedom.

    The Role of Social and Institutional Contexts in Social Innovations of Spanish Academic Spinoffs

    Social innovations developed by academic spinoffs (ASOs) are acquiring ever-increasing relevance in the literature on academic entrepreneurship. Previous studies have considered the importance of the social and institutional contexts of entrepreneurial ecosystems for the development of these innovations, although a greater depth of analysis is required in this field of study. This research analyzes how the frequency of contact with agents in the social and institutional contexts of the entrepreneurial ecosystem influences the social innovations of ASOs. From a sample of 173 Spanish ASOs, the results indicate that frequent contact with government and academic support units improves this type of innovation in ASOs. Regarding the social context, an increase in the frequency of contact with customers, suppliers, and competitors favors the development of social innovation. However, frequent contact with venture capital firms inhibits the development of this type of innovation.