286 research outputs found

    NGS QC Toolkit: A Toolkit for Quality Control of Next Generation Sequencing Data

    Next generation sequencing (NGS) technologies provide a high-throughput means to generate large amounts of sequence data. However, quality control (QC) of sequence data generated from these technologies is extremely important for meaningful downstream analysis. Further, highly efficient and fast processing tools are required to handle the large volume of datasets. Here, we have developed an application, NGS QC Toolkit, for quality check and filtering of high-quality data. This toolkit is a standalone and open source application freely available at http://www.nipgr.res.in/ngsqctoolkit.html. All the tools in the application have been implemented in the Perl programming language. The toolkit comprises user-friendly tools for QC of sequencing data generated using Roche 454 and Illumina platforms, and additional tools to aid QC (sequence format converter and trimming tools) and analysis (statistics tools). A variety of options have been provided to facilitate QC at user-defined parameters. The toolkit is expected to be very useful for the QC of NGS data to facilitate better downstream analysis.
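
The kind of read-level quality filtering described in this abstract can be illustrated with a minimal Python sketch. It is not the toolkit's code (the toolkit itself is written in Perl). The function names, the Phred+33 encoding, and the thresholds (quality cutoff 20, at least 70% high-quality bases) are assumptions chosen for illustration only.

```python
from typing import Iterable, Iterator, List, Tuple

# A read is (header, sequence, quality string); names here are hypothetical.
Read = Tuple[str, str, str]


def parse_fastq(lines: List[str]) -> Iterator[Read]:
    """Yield (header, sequence, quality) from 4-line FASTQ records."""
    for i in range(0, len(lines) - 3, 4):
        header, seq, _plus, qual = (line.rstrip("\n") for line in lines[i:i + 4])
        yield header, seq, qual


def high_quality_fraction(qual: str, cutoff: int = 20, offset: int = 33) -> float:
    """Fraction of bases whose Phred score is at least `cutoff`.

    Assumes the quality string uses the given ASCII offset (33 by default).
    """
    if not qual:
        return 0.0
    good = sum(1 for ch in qual if ord(ch) - offset >= cutoff)
    return good / len(qual)


def filter_reads(reads: Iterable[Read], cutoff: int = 20,
                 min_fraction: float = 0.7, offset: int = 33) -> List[Read]:
    """Keep reads in which at least `min_fraction` of bases pass `cutoff`."""
    return [r for r in reads
            if high_quality_fraction(r[2], cutoff, offset) >= min_fraction]


if __name__ == "__main__":
    example = [
        "@read1", "ACGT", "+", "IIII",   # all bases at Phred 40
        "@read2", "ACGT", "+", "I###",   # three bases at Phred 2
    ]
    for header, _seq, _qual in filter_reads(parse_fastq(example)):
        print(header)
```

With the two example reads, only the first passes the filter, because three of the four bases in the second read fall below the assumed cutoff.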

    Distances and ages of globular clusters using Hipparcos parallaxes of local subdwarfs

    We discuss the impact of Population II and Globular Cluster (GC) stars on the derivation of the age of the Universe, and on the study of the formation and early evolution of galaxies, our own in particular. The long-standing problem of the actual distance scale to Population II stars and GCs is addressed, and a variety of different methods commonly used to derive distances to Population II stars are briefly reviewed. Emphasis is given to the discussion of distances and ages for GCs derived using Hipparcos parallaxes of local subdwarfs. Results obtained by different authors differ slightly, depending on the assumptions made about metallicity scale, reddenings, and corrections for undetected binaries. These and other uncertainties present in the method are discussed. Finally, we outline progress expected in the near future. Comment: Invited review article to appear in: `Post-Hipparcos Cosmic Candles', A. Heck & F. Caputo (Eds), Kluwer Academic Publ., Dordrecht, in press. 22 pages including 3 tables and 2 postscript figures, uses Kluwer's crckapb.sty LaTeX style file, enclose

    Rapidity and Centrality Dependence of Proton and Anti-proton Production from Au+Au Collisions at sqrt(sNN) = 130 GeV

    We report on the rapidity and centrality dependence of proton and anti-proton transverse mass distributions from Au+Au collisions at sqrt(sNN) = 130 GeV as measured by the STAR experiment at RHIC. Our results are from the rapidity and transverse momentum range of |y| < 0.5 and 0.35 < p_t < 1.00 GeV/c. For both protons and anti-protons, transverse mass distributions become more convex from peripheral to central collisions, demonstrating characteristics of collective expansion. The measured rapidity distributions and the mean transverse momenta versus rapidity are flat within |y| < 0.5. Comparisons of our data with results from model calculations indicate that, in order to obtain a consistent picture of the proton (anti-proton) yields and transverse mass distributions, the possibility of pre-hadronic collective expansion may have to be taken into account. Comment: 4 pages, 3 figures, 1 table, submitted to PR

    CLOTU: An online pipeline for processing and clustering of 454 amplicon reads into OTUs followed by taxonomic annotation

    Background: The implementation of high throughput sequencing for exploring biodiversity poses high demands on bioinformatics applications for automated data processing. Here we introduce CLOTU, an online and open access pipeline for processing 454 amplicon reads. CLOTU has been constructed to be highly user-friendly and flexible, since different types of analyses are needed for different datasets. Results: In CLOTU, the user can filter out low quality sequences; trim tags, primers and adaptors; perform clustering of sequence reads; and run BLAST against NCBInr or a customized database in a high performance computing environment. The resulting data may be browsed in a user-friendly manner and easily forwarded to downstream analyses. Although CLOTU is specifically designed for analyzing 454 amplicon reads, other types of DNA sequence data can also be processed. A fungal ITS sequence dataset generated by 454 sequencing of environmental samples is used to demonstrate the utility of CLOTU. Conclusions: CLOTU is a flexible and easy-to-use bioinformatics pipeline that includes different options for filtering, trimming, clustering and taxonomic annotation of high throughput sequence reads. Some of these options are not included in comparable pipelines. CLOTU is implemented in a Linux computer cluster and is freely accessible to academic users through the Bioportal web-based bioinformatics service (http://www.bioportal.uio.no).

    Prevalence of visual impairment, cataract surgery and awareness of cataract and glaucoma in Bhaktapur district of Nepal: The Bhaktapur Glaucoma Study

    Background: Cataract and glaucoma are the major causes of blindness in Nepal. Bhaktapur is one of the three districts of Kathmandu valley which represents a metropolitan city with a predominantly agrarian rural periphery. This study was undertaken to determine the prevalence of visual impairment, cataract surgery and awareness of cataract and glaucoma among subjects residing in this district of Nepal. Methods: Subjects aged 40 years and above were selected using a cluster sampling methodology, and a door to door enumeration was conducted for a population based cross sectional study. During the community field work, 11499 subjects underwent a structured interview regarding awareness (heard of) and knowledge (understanding of the disease) of cataract and glaucoma. At the base hospital, 4003 out of 4800 (83.39%) subjects underwent a detailed ocular examination including log MAR visual acuity, refraction, applanation tonometry, cataract grading (LOCS II), retinal examination and SITA standard perimetry when indicated. Results: The age-sex adjusted prevalence of blindness (best corrected <3/60) and low vision (best corrected <6/18 and ≥3/60) was 0.43% (95% C.I. 0.25 - 0.68) and 3.97% (95% C.I. 3.40 - 4.60) respectively. Cataract (53.3%) was the principal cause of blindness. The leading causes of low vision were cataract (60.8%) followed by refractive error (12%). The cataract surgical coverage was 90.36% and was higher in the younger age group, females and illiterate subjects. Pseudophakia was seen in 94%. Awareness of cataract (6.7%) and glaucoma (2.4%) was very low. Among subjects who were aware, 70.4% had knowledge of cataract and 45.5% of glaucoma. Cataract was commonly known to be a 'pearl like dot' white opacity in the eye while glaucoma was known to cause blindness. Awareness remained unchanged in different age groups for cataract, while for glaucoma there was an increase in awareness with age. Women were significantly less aware of both cataract (odds ratio (OR): 0.63; 95% confidence interval (CI): 0.54 - 0.74) and glaucoma (OR: 0.64; 95% CI: 0.50 - 0.81). Literacy was also correlated with awareness. Conclusion: The low prevalence of visual impairment and the high cataract surgical coverage suggest that cataract intervention programs have been successful in Bhaktapur. Awareness and knowledge of cataract and glaucoma were very poor among this population. Eye care programs need to be directed towards preventing visual impairment from refractive errors, screening for incurable chronic eye diseases and promoting health education in order to raise awareness of cataract and glaucoma among this population.

    HSRA: Hadoop-based spliced read aligner for RNA sequencing data

    The analysis of transcriptome sequencing (RNA-seq) data has become the standard method for quantifying the levels of gene expression. In RNA-seq experiments, the mapping of short reads to a reference genome or transcriptome is considered a crucial step that remains one of the most time-consuming. With the steady development of Next Generation Sequencing (NGS) technologies, unprecedented amounts of genomic data introduce significant challenges in terms of storage, processing and downstream analysis. As cost and throughput continue to improve, there is a growing need for new software solutions that minimize the impact of increasing data volume on RNA read alignment. In this work we introduce HSRA, a Big Data tool that takes advantage of the MapReduce programming model to extend the multithreading capabilities of a state-of-the-art spliced read aligner for RNA-seq data (HISAT2) to distributed memory systems such as multi-core clusters or cloud platforms. HSRA has been built upon the Hadoop MapReduce framework and supports both single- and paired-end reads from FASTQ/FASTA datasets, providing output alignments in SAM format. The design of HSRA has been carefully optimized to avoid the main limitations and major causes of inefficiency found in previous Big Data mapping tools, which cannot fully exploit the raw performance of the underlying aligner. On a 16-node multi-core cluster, HSRA is on average 2.3 times faster than previous Hadoop-based tools. Source code in Java as well as a user’s guide are publicly available for download at http://hsra.dec.udc.es. Ministerio de Economía, Industria y Competitividad; TIN2016-75845-P. Xunta de Galicia; ED431G/0
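
Distributing reads over map tasks presupposes that the input is split on record boundaries, so that no FASTQ record is cut in half. The sketch below illustrates only that prerequisite in plain Python. It is not HSRA's code and does not use Hadoop; the function name and the even-split policy are assumptions made for illustration.

```python
from typing import List


def split_fastq_records(lines: List[str], n_chunks: int) -> List[List[str]]:
    """Split FASTQ lines into `n_chunks` chunks without breaking a record.

    Assumes every record occupies exactly four lines. Records are spread as
    evenly as possible; earlier chunks receive the remainder.
    """
    if n_chunks < 1:
        raise ValueError("n_chunks must be at least 1")
    if len(lines) % 4 != 0:
        raise ValueError("input is not a whole number of 4-line records")

    n_records = len(lines) // 4
    base, extra = divmod(n_records, n_chunks)

    chunks: List[List[str]] = []
    start = 0
    for i in range(n_chunks):
        size = base + (1 if i < extra else 0)   # records in this chunk
        end = start + size
        chunks.append(lines[start * 4:end * 4])
        start = end
    return chunks
```

Each chunk could then be handed to an independent aligner process. For paired-end data, the two mate files would need to be split at the same record indices so that mates stay together.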

    Sequencing of Pooled DNA Samples (Pool-Seq) Uncovers Complex Dynamics of Transposable Element Insertions in Drosophila melanogaster

    Transposable elements (TEs) are mobile genetic elements that parasitize genomes by semi-autonomously increasing their own copy number within the host genome. While TEs are important for genome evolution, appropriate methods for performing unbiased genome-wide surveys of TE variation in natural populations have been lacking. Here, we describe a novel and cost-effective approach for estimating population frequencies of TE insertions using paired-end Illumina reads from a pooled population sample. Importantly, the method treats insertions present in and absent from the reference genome identically, allowing unbiased TE population frequency estimates. We apply this method to data from a natural Drosophila melanogaster population from Portugal. Consistent with previous reports, we show that low-recombining genomic regions harbor more TE insertions and maintain insertions at higher frequencies than do high-recombining regions. We conservatively estimate that there are almost twice as many “novel” TE insertion sites as sites known from the reference sequence in our population sample (6,824 novel versus 3,639 reference sites, with on average a 31-fold coverage per insertion site). Different families of transposable elements show large differences in their insertion densities and population frequencies. Our analyses suggest that the history of TE activity significantly contributes to this pattern, with recently active families segregating at lower frequencies than those active in the more distant past. Finally, using our high-resolution TE abundance measurements, we identified 13 candidate positively selected TE insertions based on their high population frequencies and on low Tajima's D values in their neighborhoods.
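
The abstract does not state the estimator it uses. One simple way to picture a pooled frequency estimate is the share of reads supporting the insertion among all informative reads at a site. The Python sketch below rests on that assumption, and the function name and read counts are hypothetical. It also checks the "almost twice as many" comparison using the two site counts quoted above.

```python
def insertion_frequency(presence_reads: int, absence_reads: int) -> float:
    """Estimate the population frequency of a TE insertion at one site.

    Assumed estimator (not taken from the abstract): reads supporting the
    insertion divided by all informative reads covering the site.
    """
    if presence_reads < 0 or absence_reads < 0:
        raise ValueError("read counts must be non-negative")
    total = presence_reads + absence_reads
    if total == 0:
        raise ValueError("no informative reads at this site")
    return presence_reads / total


# Site counts quoted in the abstract.
NOVEL_SITES = 6824
REFERENCE_SITES = 3639
novel_to_reference_ratio = NOVEL_SITES / REFERENCE_SITES  # about 1.88

if __name__ == "__main__":
    # Hypothetical site with 31 informative reads, 8 supporting the insertion.
    print(round(insertion_frequency(8, 23), 3))
    print(round(novel_to_reference_ratio, 2))
```

Counting presence and absence reads in the same way at every site, whether or not the insertion is in the reference genome, is what keeps such an estimate from being biased toward reference insertions.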