199 research outputs found

    MEGAHIT: An ultra-fast single-node solution for large and complex metagenomics assembly via succinct de Bruijn graph

    MEGAHIT is an NGS de novo assembler for assembling large and complex metagenomics data in a time- and cost-efficient manner. It finished assembling a soil metagenomics dataset of 252 Gbps in 44.1 hours and 99.6 hours on a single computing node with and without a GPU, respectively. MEGAHIT assembles the data as a whole, i.e., it avoids pre-processing such as partitioning and normalization, which might compromise result integrity. MEGAHIT generates an assembly 3 times larger than the previous assembly, with a longer contig N50 and average contig length. 55.8% of the reads were aligned to the assembly, which is 4 times higher than for the previous assembly. The source code of MEGAHIT is freely available at https://github.com/voutcn/megahit under the GPLv3 license. Comment: 2 pages, 2 tables, 1 figure, submitted to Oxford Bioinformatics as an Application Not
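The contig N50 cited in the abstract above is a standard assembly-contiguity metric: the length L such that contigs of length L or longer together cover at least half of the total assembled bases. A minimal sketch of that definition follows; the function name `contig_n50` is an illustrative assumption and is not part of MEGAHIT.

```python
def contig_n50(lengths):
    """Return the N50 of a collection of contig lengths (0 if empty).

    Illustrative helper only; not taken from MEGAHIT's code base.
    """
    # Walk the contigs from longest to shortest.
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first contig that pushes the running sum to at least
        # half of the total assembled bases defines the N50.
        if running >= half_total:
            return length
    return 0
```

Because N50 depends on the total assembly size, comparing N50 values across assemblies is most meaningful alongside the total assembled length, which is why the abstract reports both.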

    Potential Uses of Wild Germplasms of Grain Legumes for Crop Improvement

    Challenged by population increase, climatic change, and soil deterioration, crop improvement is always a priority in securing food supplies. Although the production of grain legumes is in general lower than that of cereals, the nutritional value of grain legumes makes them important components of food security. Nevertheless, limited by severe genetic bottlenecks during domestication and human selection, grain legumes, like other crops, have suffered from a loss of the genetic diversity that is essential for providing genetic materials for crop improvement programs. Whole-genome sequencing has shown that wild relatives of crops adapted to various environments maintain high genetic diversity. In this review, we focus on nine important grain legumes (soybean, peanut, pea, chickpea, common bean, lentil, cowpea, lupin, and pigeonpea) to discuss the potential uses of their wild relatives as genetic resources for crop breeding and improvement, and summarize the various genetic/genomic approaches adopted for these purposes.
    Instituto de Fisiología y Recursos Genéticos Vegetales
    Affiliation: Muñoz, Nacira Belen. Chinese University of Hong Kong. Centre for Soybean Research of the Partner State Key Laboratory of Agrobiotechnology and School of Life Sciences; China. Instituto Nacional de Tecnología Agropecuaria (INTA). Instituto de Fisiología y Recursos Genéticos Vegetales; Argentina. Universidad Nacional de Córdoba. Facultad de Ciencias Exactas Físicas y Naturales. Cátedra de Fisiología Vegetal; Argentina
    Affiliation: Ailin, Liu. Chinese University of Hong Kong. Centre for Soybean Research of the Partner State Key Laboratory of Agrobiotechnology and School of Life Sciences; China
    Affiliation: Leo, Kan. Chinese University of Hong Kong. Centre for Soybean Research of the Partner State Key Laboratory of Agrobiotechnology and School of Life Sciences; China
    Affiliation: Man-Wah, Li. Chinese University of Hong Kong. Centre for Soybean Research of the Partner State Key Laboratory of Agrobiotechnology and School of Life Sciences; China
    Affiliation: Hon-Ming, Lam. Chinese University of Hong Kong. Centre for Soybean Research of the Partner State Key Laboratory of Agrobiotechnology and School of Life Sciences; Chin

    BASE: a practical de novo assembler for large genomes using long NGS reads

    © 2016 The Author(s). Background: De novo genome assembly using NGS data remains a computation-intensive task, especially for large genomes. In practice, efficiency is often a primary concern and favors using a more efficient assembler like SOAPdenovo2. Yet SOAPdenovo2, based on the de Bruijn graph, fails to take full advantage of longer NGS reads (say, 150 bp to 250 bp from Illumina HiSeq and MiSeq). Assemblers that are based on string graphs (e.g., SGA), though less popular and also very slow, are more favorable for longer reads. Methods: This paper presents a new de novo assembler called BASE. It enhances the classic seed-extension approach by indexing the reads efficiently to generate adaptive seeds that have a high probability of appearing uniquely in the genome. Such seeds form the basis for BASE to build extension trees and then to use reverse validation to remove branches based on read coverage and paired-end information, resulting in high-quality consensus sequences of reads sharing the seeds. Such consensus sequences are then extended to contigs. Results: Experiments on two bacterial and four human datasets show the advantage of BASE in both contig quality and speed when dealing with longer reads. In the experiment on bacteria, two datasets with read lengths of 100 bp and 250 bp were used. For the 250 bp dataset in particular, BASE gives much better quality than SOAPdenovo2 and SGA and is similar to SPAdes. Regarding speed, BASE is consistently a few times faster than SPAdes and SGA, but still slower than SOAPdenovo2. BASE and SOAPdenovo2 are further compared using human datasets with read lengths of 100 bp, 150 bp and 250 bp. BASE shows a higher N50 for all datasets, and the improvement becomes more significant when the read length reaches 250 bp. In addition, BASE is more memory-efficient than SOAPdenovo2 when the sequencing data contain errors. Conclusions: BASE is a practically efficient tool for constructing contigs, with significant improvement in quality for long NGS reads. It is relatively easy to extend BASE to include scaffolding.

    Rice Hypersensitive Induced Reaction Protein 1 (OsHIR1) associates with plasma membrane and triggers hypersensitive cell death

    Background: In plants, HIR (Hypersensitive Induced Reaction) proteins, members of the PID (Proliferation, Ion and Death) superfamily, have been shown to play a part in the development of spontaneous hypersensitive response lesions in leaves, in reaction to pathogen attacks. The levels of HIR proteins were shown to correlate with localized host cell deaths and defense responses in maize and barley. However, not much was known about the HIR proteins in rice. Since rice is an important cereal crop consumed by more than 50% of the populations in Asia and Africa, it is crucial to understand the mechanisms of disease responses in this plant. We previously identified the rice HIR1 (OsHIR1) as an interacting partner of the OsLRR1 (rice Leucine-Rich Repeat protein 1). Here we show that OsHIR1 triggers hypersensitive cell death and its localization to the plasma membrane is enhanced by OsLRR1. Results: Through electron microscopy studies using wild type rice plants, OsHIR1 was found to mainly localize to the plasma membrane, with a minor portion localized to the tonoplast. Moreover, the plasma membrane localization of OsHIR1 was enhanced in transgenic rice plants overexpressing its interacting protein partner, OsLRR1. Co-localization of OsHIR1 and OsLRR1 to the plasma membrane was confirmed by double-labeling electron microscopy. Pathogen inoculation studies using transgenic Arabidopsis thaliana expressing either OsHIR1 or OsLRR1 showed that both transgenic lines exhibited increased resistance toward the bacterial pathogen Pseudomonas syringae pv. tomato DC3000. However, OsHIR1 transgenic plants produced more extensive spontaneous hypersensitive response lesions and contained lower titers of the invading pathogen, when compared to OsLRR1 transgenic plants. Conclusion: The OsHIR1 protein is mainly localized to the plasma membrane, and its subcellular localization in that compartment is enhanced by OsLRR1. The expression of OsHIR1 may sensitize the plant so that it is more prone to HR and hence can react more promptly to limit the invading pathogens' spread from the infection sites.

    SOAP3-dp: Fast, Accurate and Sensitive GPU-based Short Read Aligner

    To tackle the exponentially increasing throughput of Next-Generation Sequencing (NGS), most of the existing short-read aligners can be configured to favor speed at the expense of accuracy and sensitivity. SOAP3-dp, by leveraging the computational power of both CPU and GPU with optimized algorithms, delivers high speed and sensitivity simultaneously. Compared with widely adopted aligners including BWA, Bowtie2, SeqAlto and GEM, and GPU-based aligners including BarraCUDA and CUSHAW, SOAP3-dp is two to tens of times faster, while maintaining the highest sensitivity and lowest false discovery rate (FDR) on Illumina reads of different lengths. Unlike its predecessor SOAP3, which does not allow gapped alignment, SOAP3-dp by default tolerates alignment similarity as low as 60 percent. Evaluation on real human genome data demonstrates SOAP3-dp's power to enable more authentic variants and longer Indels to be discovered. Fosmid sequencing shows a 9.1 percent FDR on newly discovered deletions. SOAP3-dp natively supports the BAM file format and provides the same scoring scheme as BWA, which enables it to be integrated into existing analysis pipelines. SOAP3-dp has been deployed on Amazon-EC2, NIH-Biowulf and Tianhe-1A. Comment: 21 pages, 6 figures, submitted to PLoS ONE, additional files available at "https://www.dropbox.com/sh/bhclhxpoiubh371/O5CO_CkXQE". Comments most welcom

    Laboratório de indicadores de Governança Pública: uma proposta para mensurar a efetividade dos gastos na Segurança Pública Municipal

    Proceedings of the 35º Seminário de Extensão Universitária da Região Sul - Thematic area: Education. Pressure for greater transparency and accountability has been the driver of many changes in the public sector. However, there seems to be difficulty in putting these concepts into practice in the area of public security. This work presents some initiatives of the Laboratório de Indicadores de Governança Pública, of CESFI-UDESC, in creating indicators of the effectiveness of public security spending by the municipalities of the State of Santa Catarina. The work presents what has been done so far and the challenges in measuring public policy actions for this area

    MICA: A fast short-read aligner that takes full advantage of Many Integrated Core Architecture (MIC)

    Background: Short-read aligners have recently gained a lot of speed by exploiting the massive parallelism of GPUs. An emerging alternative to the GPU is Intel MIC; supercomputers like Tianhe-2, currently at the top of the TOP500, are built with 48,000 MIC boards to offer ~55 PFLOPS. The CPU-like architecture of MIC allows CPU-based software to be parallelized easily; however, the performance is often inferior to that of GPU counterparts, as an MIC card contains only ~60 cores (while a GPU card typically has over a thousand cores). Results: To better utilize MIC-enabled computers for NGS data analysis, we developed a new short-read aligner, MICA, that is optimized in view of MIC's limitations and the extra parallelism inside each MIC core. By utilizing the 512-bit vector units in the MIC and implementing a new seeding strategy, experiments on aligning 150 bp paired-end reads show that MICA using one MIC card is 4.9 times faster than BWA-MEM (using 6 cores of a top-end CPU), and slightly faster than SOAP3-dp (using a GPU). Furthermore, MICA's simplicity allows very efficient scale-up when multiple MIC cards are used in a node (3 cards give a 14.1-fold speedup over BWA-MEM). Summary: MICA can be readily used by MIC-enabled supercomputers for production purposes. We have tested MICA on Tianhe-2 with 90 WGS samples (17.47 Tera-bases), which can be aligned in an hour using 400 nodes. MICA has impressive performance even though MIC is only in its initial stage of development. Availability and implementation: MICA's source code is freely available at http://sourceforge.net/projects/mica-aligner under GPL v3. Supplementary information: Supplementary information is available as "Additional File 1". Datasets are available at www.bio8.cs.hku.hk/dataset/mica.