48 research outputs found

    Metagenomics : tools and insights for analyzing next-generation sequencing data derived from biodiversity studies

    Advances in next-generation sequencing (NGS) have allowed significant breakthroughs in microbial ecology studies. This has led to the rapid expansion of research in the field and the establishment of “metagenomics”, often defined as the analysis of DNA from microbial communities in environmental samples without prior need for culturing. Many metagenomics statistical/computational tools and databases have been developed in order to allow the exploitation of the huge influx of data. In this review article, we provide an overview of the sequencing technologies and how they are uniquely suited to various types of metagenomic studies. We focus on the currently available bioinformatics techniques, tools, and methodologies for performing each individual step of a typical metagenomic dataset analysis. We also provide future trends in the field with respect to tools and technologies currently under development. Moreover, we discuss data management, distribution, and integration tools that are capable of performing comparative metagenomic analyses of multiple datasets using well-established databases, as well as commonly used annotation standards

    Optimized R functions for analysis of ecological community data using the R virtual laboratory (RvLab)

    Background: Parallel data manipulation using R has previously been addressed by members of the R community, however most of these studies produce ad hoc solutions that are not readily available to the average R user. Our targeted users, ranging from the expert ecologist/microbiologists to computational biologists, often experience difficulties in finding optimal ways to exploit the full capacity of their computational resources. In addition, improving performance of commonly used R scripts becomes increasingly difficult especially with large datasets. Furthermore, the implementations described here can be of significant interest to expert bioinformaticians or R developers. Therefore, our goals can be summarized as: (i) description of a complete methodology for the analysis of large datasets by combining capabilities of diverse R packages, (ii) presentation of their application through a virtual R laboratory (RvLab) that makes execution of complex functions and visualization of results easy and readily available to the end-user. New information: In this paper, the novelty stems from implementations of parallel methodologies which rely on the processing of data on different levels of abstraction and the availability of these processes through an integrated portal. Parallel implementation R packages, such as the pbdMPI (Programming with Big Data – Interface to MPI) package, are used to implement Single Program Multiple Data (SPMD) parallelization on primitive mathematical operations, allowing for interplay with functions of the vegan package. The dplyr and RPostgreSQL R packages are further integrated offering connections to dataframe like objects (databases) as secondary storage solutions whenever memory demands exceed available RAM resources. 
    The RvLab is running on a PC cluster, using R version 3.1.2 (2014-10-31) on an x86_64-pc-linux-gnu (64-bit) platform, and offers an intuitive virtual environment interface enabling users to perform analysis of ecological and microbial communities based on optimized vegan functions. A beta version of the RvLab is available after registration at: https://portal.lifewatchgreece.eu

    Seqenv : linking sequences to environments through text mining

    Understanding the distribution of taxa and associated traits across different environments is one of the central questions in microbial ecology. High-throughput sequencing (HTS) studies are presently generating huge volumes of data to address this biogeographical topic. However, these studies are often focused on specific environment types or processes, leading to the production of individual, unconnected datasets. The large amounts of legacy sequence data with associated metadata that exist can be harnessed to better place the genetic information found in these surveys into a wider environmental context. Here we introduce a software program, seqenv, to carry out precisely such a task. It automatically performs similarity searches of short sequences against the "nt" nucleotide database provided by NCBI and, out of every hit, extracts, if it is available, the textual metadata field. After collecting all the isolation sources from all the search results, we run a text mining algorithm to identify and parse words that are associated with the Environmental Ontology (EnvO) controlled vocabulary. This, in turn, enables us to determine both in which environments individual sequences or taxa have previously been observed and, by weighted summation of those results, to summarize complete samples. We present two demonstrative applications of seqenv to a survey of ammonia oxidizing archaea as well as to a plankton paleome dataset from the Black Sea. These demonstrate the ability of the tool to reveal novel patterns in HTS and its utility in the fields of environmental source tracking, paleontology, and studies of microbial biogeography. How to cite this article: Sinclair et al. (2016), Seqenv: linking sequences to environments through text mining. PeerJ 4:e2690; DOI 10.7717/peerj.2690
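
The "weighted summation" step mentioned in the abstract can be illustrated with a minimal sketch. This is not seqenv's own code: the function name, the input layout, and the choice to weight each sequence by its relative abundance in the sample are all assumptions made for illustration, and the numbers are invented.

```python
from collections import defaultdict


def summarize_sample(term_freqs, abundances):
    """Combine per-sequence environment-term frequencies into one sample profile.

    term_freqs: {sequence_id: {environment_term: frequency}}  (hypothetical layout)
    abundances: {sequence_id: count of that sequence in the sample}
    Each sequence is weighted by its relative abundance (an assumption).
    """
    total = sum(abundances.values())
    if total == 0:
        return {}
    profile = defaultdict(float)
    for seq_id, count in abundances.items():
        weight = count / total
        for term, freq in term_freqs.get(seq_id, {}).items():
            profile[term] += weight * freq
    return dict(profile)


# Invented example: two sequences with previously observed environments.
term_freqs = {
    "otu1": {"marine sediment": 0.75, "soil": 0.25},
    "otu2": {"soil": 1.0},
}
abundances = {"otu1": 30, "otu2": 10}
profile = summarize_sample(term_freqs, abundances)
```

With these invented inputs the more abundant sequence dominates the profile, so "marine sediment" ends up with the larger share even though both sequences have been seen in soil.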

    Putative antimicrobial peptides within bacterial proteomes affect bacterial predominance: a network analysis perspective

    The predominance of bacterial taxa in the gut was examined in view of the putative antimicrobial peptide sequences (AMPs) within their proteomes. The working assumption was that compatible bacteria would share homology and thus immunity to their putative AMPs, while competing taxa would have dissimilarities in their proteome-hidden AMPs. A network-based method (“Bacterial Wars”) was developed to handle sequence similarities of predicted AMPs among UniProt-derived protein sequences from different bacterial taxa, while a resulting parameter (“Die” score) suggested which taxa would prevail in a defined microbiome. The working hypothesis was examined by correlating the calculated Die scores with the abundance of bacterial taxa from gut microbiomes in different states of health and disease. Eleven publicly available 16S rRNA datasets and one full shotgun metagenomics dataset served for the analysis. The overall conclusion was that AMPs encrypted within bacterial proteomes affected the predominance of bacterial taxa in chemospheres

    Metagenomic investigation of the geologically unique Hellenic Volcanic Arc reveals a distinctive ecosystem with unexpected physiology

    Hydrothermal vents represent a deep, hot, aphotic biosphere where chemosynthetic primary producers, fuelled by chemicals from Earth's subsurface, form the basis of life. In this study, we examined microbial mats from two distinct volcanic sites within the Hellenic Volcanic Arc (HVA). The HVA is geologically and ecologically unique, with reported emissions of CO2-saturated fluids at temperatures up to 220°C and a notable absence of macrofauna. Metagenomic data reveal highly complex prokaryotic communities composed of chemolithoautotrophs, some methanotrophs, and, to our surprise, heterotrophs capable of anaerobic degradation of aromatic hydrocarbons. Our data suggest that aromatic hydrocarbons may indeed be a significant source of carbon in these sites, and call for additional research into the nature and origin of these compounds in the HVA. Novel physiology was assigned to several uncultured prokaryotic lineages; most notably, a SAR406 representative is attributed a role in anaerobic hydrocarbon degradation. This dataset, the largest to date from submarine volcanic ecosystems, constitutes a significant resource of novel genes and pathways with potential biotechnological applications

    Prediction of novel microRNA genes in cancer-associated genomic regions—a combined computational and experimental approach

    The majority of existing computational tools rely on sequence homology and/or structural similarity to identify novel microRNA (miRNA) genes. Recently, supervised algorithms have been used to address this problem, taking into account sequence, structure and comparative genomics information. In most of these studies, miRNA gene predictions are rarely supported by experimental evidence and prediction accuracy remains uncertain. In this work we present a new computational tool (SSCprofiler) utilizing a probabilistic method based on Profile Hidden Markov Models to predict novel miRNA precursors. Via the simultaneous integration of biological features such as sequence, structure and conservation, SSCprofiler achieves 88.95% sensitivity and 84.16% specificity on a large set of human miRNA genes. The trained classifier is used to identify novel miRNA gene candidates located within cancer-associated genomic regions and rank the resulting predictions using expression information from a full genome tiling array. Finally, four of the top scoring predictions are verified experimentally using northern blot analysis. Our work combines both analytical and experimental techniques to show that SSCprofiler is a highly accurate tool which can be used to identify novel miRNA gene candidates in the human genome. SSCprofiler is freely available as a web service at http://www.imbb.forth.gr/SSCprofiler.html
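
For readers unfamiliar with the two performance measures quoted above, the following minimal sketch shows how sensitivity and specificity are computed from classification counts. The counts are invented for illustration and are not the SSCprofiler evaluation data; the function names are likewise hypothetical.

```python
def sensitivity(true_pos, false_neg):
    """Fraction of real positives (e.g. true miRNA precursors) that were detected."""
    if true_pos + false_neg == 0:
        raise ValueError("no positive examples")
    return true_pos / (true_pos + false_neg)


def specificity(true_neg, false_pos):
    """Fraction of real negatives (e.g. non-miRNA hairpins) that were rejected."""
    if true_neg + false_pos == 0:
        raise ValueError("no negative examples")
    return true_neg / (true_neg + false_pos)


# Invented counts: 100 known positives and 100 known negatives.
sens = sensitivity(true_pos=90, false_neg=10)
spec = specificity(true_neg=80, false_pos=20)
```

Reporting both values matters because a classifier can trivially maximize one at the expense of the other, for instance by labelling every candidate as a miRNA.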

    Positioning Europe for the EPITRANSCRIPTOMICS challenge

    The genetic alphabet consists of the four letters: C, A, G, and T in DNA and C, A, G, and U in RNA. Triplets of these four letters jointly encode 20 different amino acids out of which proteins of all organisms are built. This system is universal and is found in all kingdoms of life. However, bases in DNA and RNA can be chemically modified. In DNA, around 10 different modifications are known, and those have been studied intensively over the past 20 years. Scientific studies on DNA modifications and proteins that recognize them gave rise to the large field of epigenetic and epigenomic research. The outcome of this intense research field is the discovery that development, ageing, and stem-cell dependent regeneration but also several diseases including cancer are largely controlled by the epigenetic state of cells. Consequently, this research has already led to the first FDA approved drugs that exploit the gained knowledge to combat disease. In recent years, the ~150 modifications found in RNA have come to the focus of intense research. Here we provide a perspective on necessary and expected developments in the fast expanding area of RNA modifications, termed epitranscriptomics

    Metagenomic 16S rRNA investigation of microbial communities in the Black Sea estuaries in South-West of Ukraine

    The Black Sea estuaries represent interfaces of the sea and river environments. Microorganisms that inhabit estuarine water play an integral role in all biochemical processes that occur there and form unique ecosystems. There are many estuaries located in the south-western part of Ukraine, and some of them are already separated from the sea. The aim of this research was to determine the composition of microbial communities in the Khadzhibey, Dniester and Sukhyi estuaries by metagenomic 16S rDNA analysis. This study is the first complex analysis of estuarine microbiota based on isolation of total DNA from a biome that was further subjected to sequencing. DNA was extracted from water samples and sequenced on the Illumina MiSeq platform using primers to the V4 variable region of the 16S rRNA gene. Computer analysis of the obtained raw sequences was done with QIIME (Quantitative Insights Into Microbial Ecology) software. As the outcome, 57970 nucleotide sequences were retrieved. Bioinformatic analysis of the bacterial community in the studied samples demonstrated a high taxonomic diversity of prokaryotes above the genus level. The majority of 16S rDNA bacterial sequences detected in the estuarine samples belonged to the phyla Cyanobacteria, Proteobacteria, Bacteroidetes, Actinobacteria, Verrucomicrobia and Planctomycetes. The Khadzhibey estuary was dominated by the Proteobacteria phylum, while the Dniester and Sukhyi estuaries were characterized by dominance of Cyanobacteria. The differences in bacterial populations between the Khadzhibey, Dniester and Sukhyi estuaries were demonstrated through beta-diversity analysis, which showed that the Khadzhibey estuary's microbial community differs significantly from those of the Sukhyi and Dniester estuaries. The majority of identified bacterial species are known as typical inhabitants of marine environments; however, for 2.5% of microbial population members in the studied estuaries no relatives were determined
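
As an illustration of what a beta-diversity comparison measures, the sketch below computes the Bray-Curtis dissimilarity between two samples. The abstract does not state which metric was used in QIIME, so Bray-Curtis is an assumption chosen because it is a common option; the phylum counts are invented and are not the study's data.

```python
def bray_curtis(sample_a, sample_b):
    """Bray-Curtis dissimilarity between two {taxon: count} profiles.

    Returns 0.0 for identical profiles and 1.0 for profiles sharing no taxa.
    """
    taxa = set(sample_a) | set(sample_b)
    diff = sum(abs(sample_a.get(t, 0) - sample_b.get(t, 0)) for t in taxa)
    total = sum(sample_a.get(t, 0) + sample_b.get(t, 0) for t in taxa)
    if total == 0:
        raise ValueError("both samples are empty")
    return diff / total


# Invented read counts per phylum for two hypothetical water samples.
sample_x = {"Proteobacteria": 60, "Cyanobacteria": 20, "Bacteroidetes": 20}
sample_y = {"Proteobacteria": 20, "Cyanobacteria": 60, "Bacteroidetes": 20}
distance = bray_curtis(sample_x, sample_y)
```

A pairwise matrix of such distances across all samples is what ordination or clustering methods then use to show whether one site separates from the others.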