23 research outputs found

Artificial and natural duplicates in pyrosequencing reads of metagenomic data

Author: Fu Limin
Li Weizhong
Niu Beifang
Sun Shulei
Publication venue: BioMed Central
Publication date: 01/01/2010

Abstract
Background: Artificial duplicates from pyrosequencing reads may lead to incorrect interpretation of the abundance of species and genes in metagenomic studies. Duplicated reads were filtered out in many metagenomic projects. However, since the duplicated reads observed in a pyrosequencing run also include natural (non-artificial) duplicates, simply removing all duplicates may also cause underestimation of abundance associated with natural duplicates.
Results: We implemented a method for identification of exact and nearly identical duplicates from pyrosequencing reads. This method performs an all-against-all sequence comparison and clusters the duplicates into groups using an algorithm modified from our previous sequence clustering method cd-hit. This method can process a typical dataset in ~10 minutes; it also provides a consensus sequence for each group of duplicates. We applied this method to the underlying raw reads of 39 genomic projects and 10 metagenomic projects that used the pyrosequencing technique. We compared the occurrences of the duplicates identified by our method and the natural duplicates made by independent simulations. We observed that the duplicates, including both artificial and natural duplicates, make up 4-44% of reads. The number of natural duplicates highly correlates with the samples' read density (number of reads divided by genome size). For high-complexity metagenomic samples lacking dominant species, natural duplicates make up only <1% of all duplicates. But for some other samples, such as transcriptomic samples, the majority of the observed duplicates might be natural duplicates.
Conclusions: Our method is available from http://cd-hit.org as a downloadable program and a web server. It is important not only to identify the duplicates from metagenomic datasets but also to distinguish whether they are artificial or natural duplicates. We provide a tool to estimate the number of natural duplicates according to user-defined sample types, so users can decide whether to retain or remove duplicates in their projects.
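The abstract above ties the number of natural duplicates to read density (number of reads divided by genome size). The following is a minimal sketch of that idea, not the authors' simulation: it assumes a single genome, a single strand, and uniformly random read start positions, and it counts a read as a natural duplicate when an earlier read already starts at the same position. The function names and parameters are hypothetical.

```python
import random


def simulate_natural_duplicate_fraction(num_reads, genome_size, seed=0):
    """Simulate uniform read starts and return the fraction of reads whose
    start position was already used by an earlier read (assumed definition
    of a natural duplicate for this sketch)."""
    rng = random.Random(seed)
    seen_starts = set()
    duplicates = 0
    for _ in range(num_reads):
        start = rng.randrange(genome_size)
        if start in seen_starts:
            duplicates += 1
        else:
            seen_starts.add(start)
    return duplicates / num_reads


def expected_natural_duplicate_fraction(num_reads, genome_size):
    """Closed-form expectation under the same uniform-start assumption:
    expected distinct starts = G * (1 - (1 - 1/G)^N)."""
    expected_unique = genome_size * (1.0 - (1.0 - 1.0 / genome_size) ** num_reads)
    return 1.0 - expected_unique / num_reads


if __name__ == "__main__":
    # Read density = num_reads / genome_size; higher density gives more
    # natural duplicates under these assumptions.
    for reads, genome in [(10_000, 1_000_000), (100_000, 1_000_000)]:
        density = reads / genome
        simulated = simulate_natural_duplicate_fraction(reads, genome)
        expected = expected_natural_duplicate_fraction(reads, genome)
        print(f"density={density:.2f} simulated={simulated:.4f} expected={expected:.4f}")
```

Real samples deviate from this model (multiple species with uneven abundance, both strands, transcript-level coverage), which is consistent with the abstract's point that the share of natural duplicates depends on sample type.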

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

CD-HIT Suite: a web server for clustering and comparing biological sequences

Author: Beifang Niu
Limin Fu
Weizhong Li
Ying Gao
Ying Huang
Publication venue: Oxford University Press

Summary: CD-HIT is a widely used program for clustering and comparing large biological sequence datasets. To further assist CD-HIT users, we significantly improved this program with more functions and better accuracy, scalability and flexibility. Most importantly, we developed a new web server, CD-HIT Suite, for clustering a user-uploaded sequence dataset or comparing it to another dataset at different identity levels. Users can now interactively explore the clusters within web browsers. We also provide downloadable clusters for several public databases (NCBI NR, Swissprot and PDB) at different identity levels.

Crossref

PubMed Central

SIMULATION OF URBAN RAIL VEHICLE CRASH AND FACTORS INFLUENCING ANTI-CLIMBING ABILITY OF ITS ANTI-CLIMBER

Author: CHEN ShuQin
NIU WeiZhong
WEN Tao
Publication venue: Editorial Office of Journal of Mechanical Strength
Publication date: 01/01/2015

Using the finite element software Hypermesh and LS-DYNA, head-on impacts of an urban rail vehicle head car against a fixed rigid wall were simulated at speeds of 12.25 km/h and 18 km/h, with and without an anti-climbing energy absorption device. Based on the obtained data, the crashworthiness of the head-car body and the performance of its energy absorption device were evaluated. Using the response surface methodology, the factors influencing the anti-climbing ability of the anti-climber were also studied. The results show that at crash speeds of 12.25 km/h and 18 km/h, the energy absorption device absorbs impact energy by plastic deformation before the car body structure does, leaving the car body with no plastic deformation and with only slight plastic deformation, respectively. In addition, when the total height and tooth thickness of the anti-climber are fixed, its anti-climbing ability decreases as the tooth height and angle increase, and the tooth height has more influence than the angle.

Directory of Open Access Journals

WebMGA: a customizable web server for fast metagenomic sequence analysis

Author: Fu Liming
Li Weizhong
Niu Beifang
Wu Sitao
Zhu Zhengwei
Publication venue: Springer Science and Business Media LLC
Publication date: 01/09/2011

Abstract
Background: The new field of metagenomics studies microorganism communities by culture-independent sequencing. With the advances in next-generation sequencing techniques, researchers are facing tremendous challenges in metagenomic data analysis due to the huge quantity and high complexity of sequence data. Analyzing large datasets is extremely time-consuming; metagenomic annotation also involves a wide range of computational tools, which are difficult for ordinary users to install and maintain. The tools provided by the few available web servers are also limited and have various constraints such as login requirements, long waiting times, and the inability to configure pipelines.
Results: We developed WebMGA, a customizable web server for fast metagenomic analysis. WebMGA includes over 20 commonly used tools for tasks such as ORF calling, sequence clustering, quality control of raw reads, removal of sequencing artifacts and contaminations, taxonomic analysis, and functional annotation. WebMGA provides users with rapid metagenomic data analysis using fast and effective tools, which have been implemented to run in parallel on our local computer cluster. Users can access WebMGA through web browsers or programming scripts to perform individual analysis or to configure and run customized pipelines. WebMGA is freely available at http://weizhongli-lab.org/metagenomic-analysis.
Conclusions: WebMGA offers researchers many fast and unique tools and great flexibility for complex metagenomic data analysis.

Directory of Open Access Journals

Facilitating software refactoring with appropriate resolution order of bad smells

Author: Liu Hui
Ma Zhiyi
Niu Zhendong
Shao Weizhong
Yang Limei
Publication venue: Association for Computing Machinery (ACM)
Publication date: 01/01/2009

Bad smell is a key concept in software refactoring. We have a bunch of bad smells, refactoring rules, and refactoring tools, but we do not know which kinds of bad smells should be resolved first. The resolution of one kind of bad smell may have an impact on the resolution of other bad smells. Consequently, different resolution orders of the same set of bad smells may require different effort and/or lead to different quality improvement. In order to ease the work and maximize the effect of refactoring, we try to analyze the relationships among different kinds of bad smells, and their impact on the resolution orders of these bad smells. With the analysis, we recommend a resolution order of common bad smells. The main contribution of this paper is to motivate the necessity of arranging resolution orders of bad smells, and to recommend a resolution order of common bad smells. Copyright 2009 ACM.

CiteSeerX

Crossref

WebMGA: a Customizable Web Server for Fast Metagenomic Sequence Analysis

Author: Fu Liming
Li Weizhong
Niu Beifang
Wu Sitao
Zhu Zhengwei
Publication venue: eScholarship, University of California
Publication date: 07/09/2011

Springer - Publisher Connector

PubMed Central

eScholarship - University of California

CD-HIT: accelerated for clustering the next-generation sequencing data

Author: Beifang Niu
Limin Fu
Sitao Wu
Weizhong Li
Zhengwei Zhu
Publication venue: Oxford University Press (OUP)

Crossref

Gclust: A Parallel Clustering Tool for Microbial Genomic Data

Author: Beifang Niu
Chuangchuang Dai
Dan Zhao
Haidong Zhu
Rong He
Rongqiang Cao
Ruilin Li
Tie Niu
Wei Chen
Weizhong Li
Xianyu Lang
Xiaodong Li
Xiaoyu He
Xinyin Han
Xuebin Chi
Yi Zhao
Yu Zhang
Zhonghua Lu
Publication venue: Elsevier
Publication date: 01/10/2019

The accelerating growth of public microbial genomic data imposes a substantial burden on the research community that uses such resources. Building databases of non-redundant reference sequences from massive microbial genomic data based on clustering analysis is essential. However, existing clustering algorithms perform poorly on long genomic sequences. In this article, we present Gclust, a parallel program for clustering complete or draft genomic sequences, where clustering is accelerated with a novel parallelization strategy and a fast sequence comparison algorithm using sparse suffix arrays (SSAs). Moreover, genome identity measures between two sequences are calculated based on their maximal exact matches (MEMs). In this paper, we demonstrate the high speed and clustering quality of Gclust by examining four genome sequence datasets. Gclust is freely available for non-commercial use at https://github.com/niu-lab/gclust. We also introduce a web server for clustering user-uploaded genomes at http://niulab.scgrid.cn/gclust. Keywords: Microbial genome clustering, Parallelization, Sparse suffix array, Maximal exact match, Segment extension
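To illustrate what an identity measure based on maximal exact matches (MEMs) can look like, here is a small sketch. It is not Gclust's implementation: it swaps the sparse suffix array for a naive quadratic dynamic-programming scan, and the identity definition (fraction of the first sequence covered by MEMs of at least `min_len` bases) is an assumption made for this example. All function names are hypothetical.

```python
def maximal_exact_matches(a, b, min_len):
    """Return (start_a, start_b, length) for every exact match between a and b
    that cannot be extended to the left or right and is at least min_len long.
    Naive O(len(a) * len(b)) scan, used here instead of a sparse suffix array."""
    mems = []
    prev = [0] * (len(b) + 1)  # common-suffix lengths for the previous row
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                # cur[j] is the full common-suffix length, so the match is
                # already left-maximal.
                cur[j] = prev[j - 1] + 1
                right_maximal = i == len(a) or j == len(b) or a[i] != b[j]
                if right_maximal and cur[j] >= min_len:
                    mems.append((i - cur[j], j - cur[j], cur[j]))
        prev = cur
    return mems


def mem_identity(a, b, min_len=4):
    """Assumed identity measure: fraction of positions in a that are covered
    by at least one MEM shared with b."""
    if not a:
        return 0.0
    covered = [False] * len(a)
    for start_a, _start_b, length in maximal_exact_matches(a, b, min_len):
        for k in range(start_a, start_a + length):
            covered[k] = True
    return sum(covered) / len(a)


if __name__ == "__main__":
    query = "ACGTACGTTT"
    target = "ACGTACGAAA"
    print(maximal_exact_matches(query, target, min_len=4))
    print(mem_identity(query, target, min_len=4))
```

A clustering tool would compare such a score against a user-chosen identity threshold; the quadratic scan here is only practical for short sequences, which is the gap that suffix-array indexing is meant to close.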

Directory of Open Access Journals