MarDRe: efficient MapReduce-based removal of duplicate DNA reads in the cloud

Author: Expósito Roberto R.
González-Domínguez Jorge
Touriño Juan
Veiga Jorge
Publication venue: 'Oxford University Press (OUP)'
Publication date: 01/01/2017

This is a pre-copyedited, author-produced version of an article accepted for publication in Bioinformatics following peer review. The version of record Roberto R. Expósito, Jorge Veiga, Jorge González-Domínguez, Juan Touriño; MarDRe: efficient MapReduce-based removal of duplicate DNA reads in the cloud, Bioinformatics, Volume 33, Issue 17, 1 September 2017, Pages 2762–2764 is available online at: https://doi.org/10.1093/bioinformatics/btx307

[Abstract] This article presents MarDRe, a de novo cloud-ready duplicate and near-duplicate removal tool that can process single- and paired-end reads from FASTQ/FASTA datasets. MarDRe takes advantage of the widely adopted MapReduce programming model to fully exploit Big Data technologies on cloud-based infrastructures. Written in Java to maximize cross-platform compatibility, MarDRe is built upon the open-source Apache Hadoop project, the most popular distributed computing framework for scalable Big Data processing. On a 16-node cluster deployed on the Amazon EC2 cloud platform, MarDRe is up to 8.52 times faster than a representative state-of-the-art tool.

Ministerio de Economia y Competitividad; TIN2016-75845-P
Ministerio de Educación; FPU014/0280
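As a rough illustration of how duplicate and near-duplicate removal can be phrased in MapReduce terms, the sketch below groups reads by a shared prefix in a map step and filters each group in a reduce step. It is a single-process Python toy, not MarDRe's Java/Hadoop implementation; the function names, the `prefix_len` and `max_mismatches` parameters, and the prefix-grouping strategy are all assumptions made for this example.

```python
from collections import defaultdict


def map_phase(reads, prefix_len=4):
    """Emit (key, read) pairs; reads sharing a prefix fall into one group."""
    for read in reads:
        yield read[:prefix_len], read


def hamming(a, b):
    """Number of mismatching positions, or None if lengths differ."""
    if len(a) != len(b):
        return None
    return sum(x != y for x, y in zip(a, b))


def reduce_phase(group, max_mismatches=0):
    """Keep the first read of every cluster of (near-)identical reads."""
    kept = []
    for read in group:
        duplicate = False
        for other in kept:
            dist = hamming(read, other)
            if dist is not None and dist <= max_mismatches:
                duplicate = True
                break
        if not duplicate:
            kept.append(read)
    return kept


def dedupe(reads, prefix_len=4, max_mismatches=0):
    """Run the map step, shuffle by key, then reduce each group."""
    groups = defaultdict(list)
    for key, read in map_phase(reads, prefix_len):
        groups[key].append(read)
    result = []
    for key in sorted(groups):  # sorted only to make the output deterministic
        result.extend(reduce_phase(groups[key], max_mismatches))
    return result
```

With `max_mismatches=0` only exact duplicates are dropped; raising it also drops reads that differ from an already kept read in at most that many positions. In a real cluster each key's group would be handled by a separate reducer, which is what makes the approach scale out.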

Repositorio da Universidade da Coruña

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

BioGUID: resolving, discovering, and minting identifiers for biodiversity informatics

Author: Page R.D.M.
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2009

Background: Linking together the data of interest to biodiversity researchers (including specimen records, images, taxonomic names, and DNA sequences) requires services that can mint, resolve, and discover globally unique identifiers (including, but not limited to, DOIs, HTTP URIs, and LSIDs). Results: BioGUID implements a range of services, the core ones being an OpenURL resolver for bibliographic resources, and a LSID resolver. The LSID resolver supports Linked Data-friendly resolution using HTTP 303 redirects and content negotiation. Additional services include journal ISSN look-up, author name matching, and a tool to monitor the status of biodiversity data providers. Conclusion: BioGUID is available at http://bioguid.info/. Source code is available from http://code.google.com/p/bioguid/
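The Linked Data-friendly resolution mentioned above (HTTP 303 redirects plus content negotiation) can be sketched as a pure function that picks a representation from the `Accept` header and answers with a redirect. This is only an illustration of the general pattern: the `/doc/...` URL layout, the function names, and the two supported media types are assumptions for this example and are not taken from BioGUID.

```python
SUPPORTED = ("application/rdf+xml", "text/html")  # assumed representations


def negotiate(accept_header, supported=SUPPORTED):
    """Pick the client's most preferred supported media type, or None."""
    prefs = []
    for position, part in enumerate(accept_header.split(",")):
        fields = part.strip().split(";")
        mime = fields[0].strip()
        quality = 1.0
        for field in fields[1:]:
            name, _, value = field.strip().partition("=")
            if name == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0
        # Higher q first; ties keep the order given in the header.
        prefs.append((-quality, position, mime))
    for _, _, mime in sorted(prefs):
        if mime in supported:
            return mime
        if mime == "*/*":
            return supported[0]
    return None


def resolve(lsid, accept_header):
    """Return (status, location): 303 to a document URL, or 406 if no match."""
    mime = negotiate(accept_header)
    if mime is None:
        return 406, None
    suffix = {"application/rdf+xml": "rdf", "text/html": "html"}[mime]
    # Hypothetical URL layout, used only for this sketch.
    return 303, f"/doc/{lsid}.{suffix}"
```

The identifier itself names a thing (for example a taxonomic name), so the server does not answer with 200 directly; it redirects with 303 to a document about that thing, choosing RDF for machines and HTML for browsers.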

Springer - Publisher Connector

PubMed Central

Enlighten

Nature Precedings

mockrobiota: a Public Resource for Microbiome Bioinformatics Benchmarking.

Author: Arron Shiffer
Benjamin Wolfe
Corinne F. Maurice
J. Gregory Caporaso
Jai Ram Rideout
Josh D. Neufeld
Nicholas A. Bokulich
Peter J. Turnbaugh
Rachel J. Dutton
Rob Knight
William G. Mercurio
Publication venue: eScholarship, University of California
Publication date: 01/01/2016

Mock communities are an important tool for validating, optimizing, and comparing bioinformatics methods for microbial community analysis. We present mockrobiota, a public resource for sharing, validating, and documenting mock community data resources, available at http://caporaso-lab.github.io/mockrobiota/. The materials contained in mockrobiota include data set and sample metadata, expected composition data (taxonomy or gene annotations or reference sequences for mock community members), and links to raw data (e.g., raw sequence data) for each mock community data set. mockrobiota does not supply physical sample materials directly, but the data set metadata included for each mock community indicate whether physical sample materials are available. At the time of this writing, mockrobiota contains 11 mock community data sets with known species compositions, including bacterial, archaeal, and eukaryotic mock communities, analyzed by high-throughput marker gene sequencing. IMPORTANCE The availability of standard and public mock community data will facilitate ongoing method optimizations, comparisons across studies that share source data, and greater transparency and access and eliminate redundancy. These are also valuable resources for bioinformatics teaching and training. This dynamic resource is intended to expand and evolve to meet the changing needs of the omics community

Repository for Publications and Research Data

Directory of Open Access Journals

PubMed Central

eScholarship - University of California

Evaluating the Relationship Between Running Times and DNA Sequence Sizes using a Generic-Based Filtering Program.

Author: Oluwagbemi O. O.
Omonhinmin Conrad A.
Publication date: 01/11/2008

Generic programming depends on the decomposition of programs into simpler components which may be developed separately and combined arbitrarily, subject only to well-defined interfaces. Bioinformatics deals with the application of computational techniques to data present in the biological sciences. A genetic sequence is a succession of letters which represents the basic structure of a hypothetical DNA molecule, with the capacity to carry information. This research article studied the relationship between the running times of a generic-based filtering program and different samples of genetic sequences in an increasing order of magnitude. A graphical result was obtained to adequately depict this relationship. It was also discovered that the complexity of the generic tree program was O(log2 N). This research article provided one of the systematic approaches of generic programming to Bioinformatics, which could be instrumental in elucidating major discoveries in Bioinformatics, as regards efficient data management and analysis
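To give a feel for what O(log2 N) growth means in practice, the sketch below counts the probes made by a binary search over a sorted list. Binary search is used here purely as a familiar stand-in for a balanced tree lookup; it is not the generic tree program studied in the article, and the function name is an assumption for this example.

```python
def search_steps(sorted_items, target):
    """Binary search that returns how many elements were probed."""
    lo, hi = 0, len(sorted_items) - 1
    steps = 0
    while lo <= hi:
        steps += 1
        mid = (lo + hi) // 2
        if sorted_items[mid] == target:
            return steps
        if sorted_items[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return steps


def worst_case_steps(n):
    """Largest probe count over all successful searches in range(n)."""
    items = list(range(n))
    return max(search_steps(items, t) for t in items)
```

For a list of N elements the worst case is floor(log2 N) + 1 probes, so growing the input from 16 to 1024 elements (a factor of 64) raises the worst case only from 5 to 11 probes.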

Covenant University Repository

Differential Functional Constraints Cause Strain-Level Endemism in Polynucleobacter Populations.

Author: Geeta Rijal
Herbert Ssegane
Iratxe Zarraonaindia
Jack A. Gilbert
Janet K. Jansson
Jarrad T. Hampton-Marcell
M. Cristina Negri
Naseer Sangwan
Tifani W. Eshoo
Publication venue: eScholarship, University of California
Publication date: 01/05/2016

The adaptation of bacterial lineages to local environmental conditions creates the potential for broader genotypic diversity within a species, which can enable a species to dominate across ecological gradients because of niche flexibility. The genus Polynucleobacter maintains both free-living and symbiotic ecotypes and maintains an apparently ubiquitous distribution in freshwater ecosystems. Subspecies-level resolution supplemented with metagenome-derived genotype analysis revealed that differential functional constraints, not geographic distance, produce and maintain strain-level genetic conservation in Polynucleobacter populations across three geographically proximal riverine environments. Genes associated with cofactor biosynthesis and one-carbon metabolism showed habitat specificity, and protein-coding genes of unknown function and membrane transport proteins were under positive selection across each habitat. Characterized by different median ratios of nonsynonymous to synonymous evolutionary changes (dN/dS ratios) and a limited but statistically significant negative correlation between the dN/dS ratio and codon usage bias between habitats, the free-living and core genotypes were observed to be evolving under strong purifying selection pressure. Highlighting the potential role of genetic adaptation to the local environment, the two-component system protein-coding genes were highly stable (dN/dS ratio, < 0.03). These results suggest that despite the impact of the habitat on genetic diversity, and hence niche partition, strong environmental selection pressure maintains a conserved core genome for Polynucleobacter populations. IMPORTANCE Understanding the biological factors influencing habitat-wide genetic endemism is important for explaining observed biogeographic patterns. 
Polynucleobacter is a genus of bacteria that seems to have found a way to colonize myriad freshwater ecosystems and by doing so has become one of the most abundant bacteria in these environments. We sequenced metagenomes from locations across the Chicago River system and assembled Polynucleobacter genomes from different sites and compared how the nucleotide composition, gene codon usage, and the ratio of synonymous (codes for the same amino acid) to nonsynonymous (codes for a different amino acid) mutations varied across these population genomes at each site. The environmental pressures at each site drove purifying selection for functional traits that maintained a streamlined core genome across the Chicago River Polynucleobacter population while allowing for site-specific genomic adaptation. These adaptations enable Polynucleobacter to become dominant across different riverine environmental gradients
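The dN/dS ratio referred to in this abstract compares the rate of nonsynonymous substitutions per nonsynonymous site with the rate of synonymous substitutions per synonymous site. The sketch below computes the uncorrected ratio from raw counts; it deliberately omits the multiple-substitution corrections that real tools apply, and the function name, argument names, and example counts are assumptions for illustration, not values from the study.

```python
def dn_ds(nonsyn_subs, nonsyn_sites, syn_subs, syn_sites):
    """Uncorrected dN/dS: (nonsyn_subs / nonsyn_sites) / (syn_subs / syn_sites)."""
    if nonsyn_sites <= 0 or syn_sites <= 0:
        raise ValueError("site counts must be positive")
    p_n = nonsyn_subs / nonsyn_sites  # proportion of nonsynonymous differences
    p_s = syn_subs / syn_sites        # proportion of synonymous differences
    if p_s == 0:
        raise ValueError("ratio undefined without synonymous substitutions")
    return p_n / p_s


# Hypothetical counts: 3 nonsynonymous changes over 300 sites,
# 10 synonymous changes over 100 sites.
example_ratio = dn_ds(3, 300, 10, 100)
```

A ratio well below 1, such as the hypothetical value of about 0.1 here, is conventionally read as purifying selection, which is the sense in which the abstract describes genes with dN/dS below 0.03 as highly stable.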

Directory of Open Access Journals

PubMed Central

eScholarship - University of California

A Robust and Universal Metaproteomics Workflow for Research Studies and Routine Diagnostics Within 24 h Using Phenol Extraction, FASP Digest, and the MetaProteomeAnalyzer

Author: Behne A.
Benndorf D.
Büdel A.
Dorl S.
Heyer R.
Kohrs F.
Muth T.
Püttker S.
Reichl U.
Saake G.
Schallert K.
Siewert C.
Zoun R.
Publication venue: 'Frontiers Media SA'
Publication date: 01/01/2019

The investigation of microbial proteins by mass spectrometry (metaproteomics) is a key technology for simultaneously assessing the taxonomic composition and the functionality of microbial communities in medical, environmental, and biotechnological applications. We present an improved metaproteomics workflow using an updated sample preparation and a new version of the MetaProteomeAnalyzer software for data analysis. High resolution by multidimensional separation (GeLC, MudPIT) was sacrificed to aim at fast analysis of a broad range of different samples in less than 24 h. The improved workflow generated at least two times as many protein identifications as our previous workflow, and a drastic increase of taxonomic and functional annotations. Improvements of all aspects of the workflow, particularly the speed, are first steps toward potential routine clinical diagnostics (i.e., fecal samples) and analysis of technical and environmental samples. The MetaProteomeAnalyzer is provided to the scientific community as a central remote server solution at www.mpa.ovgu.de. Peer Reviewed

MPG.PuRe

Publikationsserver des Robert Koch-Instituts

Improved Reference Genome Sequence of Coccidioides immitis Strain WA_211, Isolated in Washington State.

Author: Barker Bridget Marie
Stajich Jason E
Teixeira Marcus de Melo
Publication venue: eScholarship, University of California
Publication date: 01/08/2019

Coccidioides fungi are widely distributed in the American continents, with an expanding western range documented by a recently discovered cryptic population of Coccidioides immitis in Washington State. The assembled and annotated reference genome sequence of the soil-derived C. immitis strain WA_211 will support population and functional genomics studies

eScholarship - University of California