Benchmarking database systems for Genomic Selection implementation

Author: Guignon Valentin
Jones Elizabeth
Larmande Pierre
Matthews Dave
Nti-Addae Yaw
Petel Adrien
Renner Jon
Robbins Kelly
Sempere Guilhem
Syed Raza
Ulat Victor Jun
Publication venue: 'Oxford University Press (OUP)'
Publication date: 01/01/2019

Motivation: With high-throughput genotyping systems now available, it has become feasible to fully integrate genotyping information into breeding programs. Using this information effectively requires DNA extraction and marker production facilities that can efficiently deploy the desired set of markers across samples, with a turnaround time rapid enough to allow selection before crosses need to be made. In reality, breeders often have only a short window in which to make decisions once they have collected all their phenotyping data and received the corresponding genotyping data. This makes it challenging to organize the information and use it in downstream analyses that support breeders' decisions. Implementing genomic selection routinely as part of breeding programs therefore requires an efficient genotyping data storage system. We selected and benchmarked six popular open-source data storage systems, including relational database management and columnar storage systems. Results: We found that data extraction times are greatly influenced by the orientation in which genotype data are stored in a system. HDF5 consistently performed best, in part because it can work more efficiently with both orientations of the allele matrix.
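
The point about matrix orientation can be illustrated with a minimal sketch. It is not the benchmark from the paper: the toy allele matrix, the function names, and the "rows touched" count used as a stand-in for read cost are all assumptions made for illustration.

```python
# Illustrative sketch only: a toy allele matrix held in two orientations.
# All names (build_matrix, transpose, extract_marker_*) are hypothetical,
# and "rows touched" is a crude proxy for how much storage must be read.

def build_matrix(n_samples, n_markers):
    """Sample-major layout: one row per sample, one column per marker."""
    return [[(s + m) % 3 for m in range(n_markers)] for s in range(n_samples)]

def transpose(matrix):
    """Marker-major layout: one row per marker, one column per sample."""
    return [list(col) for col in zip(*matrix)]

def extract_marker_sample_major(matrix, marker):
    """Read one marker across all samples; every stored row is visited."""
    values = [row[marker] for row in matrix]
    return values, len(matrix)

def extract_marker_marker_major(matrix_t, marker):
    """Read one marker across all samples; a single stored row is visited."""
    return list(matrix_t[marker]), 1

sample_major = build_matrix(n_samples=4, n_markers=6)
marker_major = transpose(sample_major)

vals_a, rows_a = extract_marker_sample_major(sample_major, 2)
vals_b, rows_b = extract_marker_marker_major(marker_major, 2)
print(vals_a == vals_b, rows_a, rows_b)
```

Both calls return the same genotypes, but the sample-major layout visits every row to answer a per-marker query, while the marker-major layout visits one. A per-sample query reverses the situation, which is why a store that handles both orientations well could have an advantage.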

Agritrop

CGSpace

Horizon / Pleins textes

Detailed evaluation of data analysis tools for subtyping of bacterial isolates based on whole genome sequencing : Neisseria meningitidis as a proof of concept

Author: Bertrand Sophie
De Keersmaecker Sigrid CJ
Marchal Kathleen
Mattheus Wesley
Roosens Nancy HC
Saltykova Assia
Publication venue: 'Frontiers Media SA'
Publication date: 01/01/2019

Whole genome sequencing is increasingly recognized as the most informative approach for characterization of bacterial isolates. Success of the routine use of this technology in public health laboratories depends on the availability of well-characterized and verified data analysis methods. However, multiple subtyping workflows are now often used for a single organism, and the differences between them are not always well described. Moreover, methodologies for comparing subtyping workflows and assessing their performance are only beginning to emerge. The current work focuses on the detailed comparison of WGS-based subtyping workflows and evaluation of their suitability for the organism and the research context in question. We evaluated the performance of pipelines used for subtyping of Neisseria meningitidis, including the currently widely applied cgMLST approach and different SNP-based methods. In addition, the impact of using different tools for detection and filtering of recombinant regions, and of using different reference genomes, was tested. Our benchmarking analysis included both assessment of the technical performance of the pipelines and functional comparison of the generated genetic distance matrices and phylogenetic trees. It was carried out using replicate sequencing datasets of high and low coverage, consisting mainly of isolates belonging to the clonal complex 269. We demonstrated that cgMLST and some of the SNP-based subtyping workflows showed very good performance characteristics and highly similar genetic distance matrices and phylogenetic trees with isolates belonging to the same clonal complex. However, only two of the tested workflows demonstrated reproducible results for a group of more closely related isolates. Additionally, results of the SNP-based subtyping workflows were to some extent dependent on the reference genome used. Interestingly, the use of recombination-filtering software generally reduced the similarity between the gene-by-gene and SNP-based methodologies for subtyping of N. meningitidis. Our study, where N. meningitidis was taken as an example, clearly highlights the need for more comparative benchmarking studies to eventually contribute to a justified use of a specific WGS data analysis workflow within an international public health laboratory context.

Sciensano Publications Repository

Ghent University Academic Bibliography

Optimizing sequencing protocols for leaderboard metagenomics by combining long and short reads.

Author: Arthur Timothy D
Bankevich Anton
Boland Brigid S
Brennan Caitriona
Chang John T
Chen Feng
Conrad Douglas J
Dang Jason W
Dorrestein Pieter C
Fedarko Marcus
Gaffney James
Green Cliff
Humphrey Greg C
Jepsen Kristen
Khosroheidari Mahdieh
Knight Rob
Liyanage Marlon
Martino Cameron
Minich Jeremiah
Nurk Sergey
Pevzner Pavel A
Phelan Vanessa V
Quinn Robert A
Rana Tariq M
Salido Rodolfo A
Sandborn William J
Sanders Jon G
Sanders Karenina
Smarr Larry
Xu Zhenjiang Z
Zhu Qiyun
Publication venue: eScholarship, University of California
Publication date: 01/10/2019

As metagenomic studies move to increasing numbers of samples, communities like the human gut may benefit more from the assembly of abundant microbes in many samples than from the exhaustive assembly of fewer samples. We term this approach leaderboard metagenome sequencing. To explore protocol optimization for leaderboard metagenomics in real samples, we introduce a benchmark of library prep and sequencing using internal references generated by synthetic long-read technology, allowing us to evaluate high-throughput library preparation methods against gold-standard reference genomes derived from the samples themselves. We introduce a low-cost protocol for high-throughput library preparation and sequencing.

eScholarship - University of California

PLIT: An alignment-free computational tool for identification of long non-coding RNAs in plant transcriptomic datasets

Author: Deshpande Sumukh
England Matthew
Shuttleworth James
Taramonli Sandy
Yang Jianhua
Publication venue: 'Elsevier BV'
Publication date: 01/02/2019

Long non-coding RNAs (lncRNAs) are a class of non-coding RNAs which play a significant role in several biological processes. RNA-seq based transcriptome sequencing has been extensively used for identification of lncRNAs. Accurate identification of lncRNAs in RNA-seq datasets is crucial for exploring their characteristic functions in the genome, yet most coding potential computation (CPC) tools fail to identify them accurately in transcriptomic data. Well-known CPC tools such as CPC2, lncScore, and CPAT are primarily designed for prediction of lncRNAs based on the GENCODE, NONCODE and CANTATAdb databases. The prediction accuracy of these tools often drops when tested on transcriptomic datasets. This leads to more false positives and inaccuracy in the functional annotation process. In this study, we present a novel tool, PLIT, for the identification of lncRNAs in plant RNA-seq datasets. PLIT implements a feature selection method based on L1 regularization and iterative Random Forests (iRF) classification for selection of optimal features. Based on sequence and codon-bias features, it classifies the RNA-seq derived FASTA sequences into coding or long non-coding transcripts. Using L1 regularization, 31 optimal features were obtained based on lncRNA and protein-coding transcripts from 8 plant species. The performance of the tool was evaluated on 7 plant RNA-seq datasets using 10-fold cross-validation. The analysis exhibited superior accuracy when evaluated against currently available state-of-the-art CPC tools.

arXiv.org e-Print Archive

Online Research @ Cardiff

Coventry University Pure Portal

Essential guidelines for computational method benchmarking

Author: Boulesteix Anne-Laure
Cannoodt Robrecht
Gardner Paul P.
Hapfelmeier Alexander
Robinson Mark D.
Saelens Wouter
Saeys Yvan
Soneson Charlotte
Weber Lukas M.
Publication date: 01/01/2019

In computational biology and other sciences, researchers are frequently faced with a choice between several computational methods for performing data analyses. Benchmarking studies aim to rigorously compare the performance of different methods using well-characterized benchmark datasets, to determine the strengths of each method or to provide recommendations regarding suitable choices of methods for an analysis. However, benchmarking studies must be carefully designed and implemented to provide accurate, unbiased, and informative results. Here, we summarize key practical guidelines and recommendations for performing high-quality benchmarking analyses, based on our experiences in computational biology.

arXiv.org e-Print Archive

Ghent University Academic Bibliography

Open Access LMU

ZORA

Global Functional Atlas of Escherichia coli Encompassing Previously Uncharacterized Proteins

Author: Ali Mehrab
Babu Mohan
Butland Gareth
Chandran Shamanta
Christopolous Constantine
Emili Andrew
Eroukova Veronika
Golshani Ashkan
Greenblatt Jack F.
Guao Xinghua
Hu Pingzhao
Janga Sarah Chandra
Moreno-Hagelsieb Gabriel
Musso Gabriela
Nazarians-Armavil Anaies
Nazemof Nazila
Paccanaro Alberto
Phanse Sadhna
Pogoutse Oxana
Wong Peter
Yang Wenhong
Publication venue: Scholars Commons @ Laurier
Publication date: 01/04/2009

One-third of the 4,225 protein-coding genes of Escherichia coli K-12 remain functionally unannotated (orphans). Many map to distant clades such as Archaea, suggesting involvement in basic prokaryotic traits, whereas others appear restricted to E. coli, including pathogenic strains. To elucidate the orphans’ biological roles, we performed an extensive proteomic survey using affinity-tagged E. coli strains and generated comprehensive genomic context inferences to derive a high-confidence compendium for virtually the entire proteome, consisting of 5,993 putative physical interactions and 74,776 putative functional associations, most of which are novel. Clustering of the respective probabilistic networks revealed putative orphan membership in discrete multiprotein complexes and functional modules together with annotated gene products, whereas a machine-learning strategy based on network integration implicated the orphans in specific biological processes. We provide additional experimental evidence supporting orphan participation in protein synthesis, amino acid metabolism, biofilm formation, motility, and assembly of the bacterial cell envelope. This resource provides a “systems-wide” functional blueprint of a model microbe, with insights into the biological and evolutionary significance of previously uncharacterized proteins.

Wilfrid Laurier University

Essential guidelines for computational method benchmarking

Author: Boulesteix Anne-Laure
Cannoodt Robrecht
Gardner Paul P
Hapfelmeier Alexander
Robinson Mark D
Saelens Wouter
Saeys Yvan
Soneson Charlotte
Weber Lukas M
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2019

Ghent University Academic Bibliography

Open Access LMU

ZORA