46 research outputs found

    COMPARATIVE METAGENOMICS ANALYSIS OF PALM OIL MILL EFFLUENT (POME) USING THREE DIFFERENT BIOINFORMATICS PIPELINES

    ABSTRACT: The substantial cost reduction and massive output of next-generation sequencing (NGS) have driven the rapid growth of metagenomics. However, the sheer volume of NGS data has exposed the challenges of handling it with existing metagenomics bioinformatics tools. In this research, we therefore analysed the same set of DNA metagenomics data from a palm oil mill effluent (POME) sample using three free web-based bioinformatics pipelines, the metagenomics RAST server (MG-RAST), Integrated Microbial Genomes with Microbiome Samples (IMG/M) and European Bioinformatics Institute (EBI) Metagenomics, and compared them in terms of taxonomic assignment and functional analysis. We found that MG-RAST is the quickest of the three pipelines. In terms of results, however, IMG/M reports a greater variety of phyla with a wider range of percent identities for taxonomic assignment, and it also gives the highest readings for carbohydrate, amino acid, lipid, and coenzyme transport and metabolism functional annotation, as well as the highest total number of glycoside hydrolase enzymes. For identifying the conserved domains and families involved, EBI Metagenomics is more appropriate. Each of the three pipelines has its own strengths, and they can be used alternately or together, depending on the user's functional preference.

    nsroot: Minimalist Process Isolation Tool Implemented With Linux Namespaces

    Data analyses in the life sciences are moving from tools run on a personal computer to services run on large computing platforms. This creates a need to package tools and dependencies for easy installation, configuration and deployment on distributed platforms. In addition, secure execution requires process isolation on a shared platform. Existing virtual machine and container technologies are often more complex than traditional Unix utilities, like chroot, and often require root privileges to set up or use. This is especially challenging on HPC systems, where users typically do not have root access. We therefore present nsroot, a lightweight process isolation tool based on Linux namespaces. It allows restricting the runtime environment of data analysis tools that may not have been designed with security as a top priority, in order to reduce the risk and consequences of security breaches, without requiring any special privileges. The codebase of nsroot is small, and it provides a command line interface similar to chroot. It can be used on all Linux kernels that implement user namespaces. In addition, we propose combining nsroot with the AppImage format for secure execution of packaged applications. nsroot is open sourced and available at: https://github.com/uit-no/nsroot

    Value, but high costs in post-deposition data curation

    Discoverability of sequence data in primary data archives is proportional to the richness of contextual information associated with the data. Here, we describe an exercise in improving the contextual information surrounding sample records associated with metagenomics sequence reads available in the European Nucleotide Archive. We outline the annotation process and summarize the findings of this effort, which was aimed at increasing the usability of publicly available environmental data. Furthermore, we emphasize the benefits of such an exercise and detail its costs. We conclude that such a third-party annotation approach is expensive and has value as an element of curation, but should form only part of a more sustainable submitter-driven approach.

    Value, but high costs in post-deposition data Curation

    © The Author(s) 2016. Published by Oxford University Press. Discoverability of sequence data in primary data archives is proportional to the richness of contextual information associated with the data. Here, we describe an exercise in improving the contextual information surrounding sample records associated with metagenomics sequence reads available in the European Nucleotide Archive. We outline the annotation process and summarize the findings of this effort, which was aimed at increasing the usability of publicly available environmental data. Furthermore, we emphasize the benefits of such an exercise and detail its costs. We conclude that such a third-party annotation approach is expensive and has value as an element of curation, but should form only part of a more sustainable submitter-driven approach.

    Introducing BASE: the Biomes of Australian Soil Environments soil microbial diversity database

    Background: Microbial inhabitants of soils are important to ecosystem and planetary functions, yet there are large gaps in our knowledge of their diversity and ecology. The 'Biomes of Australian Soil Environments' (BASE) project has generated a database of microbial diversity with associated metadata across extensive environmental gradients at continental scale. As the characterisation of microbes rapidly expands, the BASE database provides an evolving platform for interrogating and integrating microbial diversity and function. Findings: BASE currently provides amplicon sequences and associated contextual data for over 900 sites encompassing all Australian states and territories and a wide variety of bioregions, vegetation and land-use types. Amplicons target bacteria, archaea and general and fungal-specific eukaryotes. The growing database will soon include metagenomics data. Data are provided in both raw sequence (FASTQ) and analysed OTU table formats and are accessed via the project's data portal, which provides a user-friendly search tool to quickly identify samples of interest. Processed data can be visually interrogated and intersected with other Australian diversity and environmental data using tools developed by the 'Atlas of Living Australia'. Conclusions: Developed within an open data framework, the BASE project is the first Australian soil microbial diversity database. The database will grow and link to other global efforts to explore microbial, plant, animal, and marine biodiversity. Its design and open access nature ensure that BASE will evolve as a valuable tool for documenting an often overlooked component of biodiversity and the many microbe-driven processes that are essential to sustain soil function and ecosystem services.

    nsroot: Minimalist process isolation tool implemented with Linux namespaces

    Data analyses in the life sciences are moving from tools run on a personal computer to services run on large computing platforms. This creates a need to package tools and dependencies for easy installation, configuration and deployment on distributed platforms. In addition, secure execution requires process isolation on a shared platform. Existing virtual machine and container technologies are often more complex than traditional Unix utilities, like chroot, and often require root privileges to set up or use. This is especially challenging on HPC systems, where users typically do not have root access. We therefore present nsroot, a lightweight process isolation tool based on Linux namespaces. It allows restricting the runtime environment of data analysis tools that may not have been designed with security as a top priority, in order to reduce the risk and consequences of security breaches, without requiring any special privileges. The codebase of nsroot is small, and it provides a command line interface similar to chroot. It can be used on all Linux kernels that implement user namespaces. In addition, we propose combining nsroot with the AppImage format for secure execution of packaged applications. nsroot is open sourced and available at: https://github.com/uit-no/nsroot

    The all-intracellular order Legionellales is unexpectedly diverse, globally distributed and lowly abundant

    Legionellales is an order of the Gammaproteobacteria composed only of host-adapted, intracellular bacteria, including the accidental human pathogens Legionella pneumophila and Coxiella burnetii. Although lifestyles vary widely across the order, only a few genera have been sequenced, owing to the difficulty of growing intracellular bacteria in pure culture. In particular, we know little about their global distribution and abundance. Here, we analyze 16/18S rDNA amplicons both from tens of thousands of published studies and from two separate sampling campaigns in and around ponds and in a silver mine. We demonstrate that the diversity of the order is much larger than previously thought, with over 450 uncultured genera. We show that Legionellales are found in about half of the samples from freshwater, soil and marine environments and are quasi-ubiquitous in man-made environments. Their abundance is low, typically 0.1%, with a few samples reaching up to 1%. Most Legionellales OTUs are globally distributed, while many do not belong to a previously identified species. This study sheds new light on the ubiquity and diversity of one major group of host-adapted bacteria. It also emphasizes the need to use metagenomics to better understand the role of host-adapted bacteria in all environments. The all-intracellular bacterial order of Legionellales is much more diverse, prevalent and globally distributed than previously thought.

    Introducing BASE: the Biomes of Australian Soil Environments soil microbial diversity database

    Microbial inhabitants of soils are important to ecosystem and planetary functions, yet there are large gaps in our knowledge of their diversity and ecology. The ‘Biomes of Australian Soil Environments’ (BASE) project has generated a database of microbial diversity with associated metadata across extensive environmental gradients at continental scale. As the characterisation of microbes rapidly expands, the BASE database provides an evolving platform for interrogating and integrating microbial diversity and function

    A comprehensive survey of integron-associated genes present in metagenomes

    Background: Integrons are genomic elements that mediate horizontal gene transfer by inserting and removing genetic material using site-specific recombination. Integrons are commonly found in bacterial genomes, where they maintain a large and diverse set of genes that plays an important role in adaptation and evolution. Previous studies have started to characterize the wide range of biological functions present in integrons. However, the efforts have so far mainly been limited to genomes from cultivable bacteria and amplicons generated by PCR, thus targeting only a small part of the total integron diversity. Metagenomic data, generated by direct sequencing of environmental and clinical samples, provides a more holistic and unbiased analysis of integron-associated genes. However, the fragmented nature of metagenomic data has previously made such analysis highly challenging. Results: Here, we present a systematic survey of integron-associated genes in metagenomic data. The analysis was based on a newly developed computational method where integron-associated genes were identified by detecting their associated recombination sites. By processing contiguous sequences assembled from more than 10 terabases of metagenomic data, we were able to identify 13,397 unique integron-associated genes. Metagenomes from marine microbial communities had the highest occurrence of integron-associated genes with levels more than 100-fold higher than in the human microbiome. The identified genes had a large functional diversity spanning over several functional classes. Genes associated with defense mechanisms and mobility facilitators were most overrepresented and more than five times as common in integrons compared to other bacterial genes. As many as two thirds of the genes were found to encode proteins of unknown function. Less than 1% of the genes were associated with antibiotic resistance, of which several were novel, previously undescribed, resistance gene variants. 
Conclusions: Our results highlight the large functional diversity maintained by integrons present in unculturable bacteria and significantly expand the number of described integron-associated genes.