66 research outputs found

    Extending Seqenv: a taxa-centric approach to environmental annotations of 16S rDNA sequences

    Understanding how the environment selects a given taxon, and the diversity patterns that emerge from environmental filtering, can dramatically improve our ability to analyse any environment in depth and advance our knowledge of how the responses of different taxa can affect each other and ecosystem functions. Most work on microbial biogeography has been site-specific, and logical environmental factors, rather than geographical location, may be more influential on microbial diversity. SEQenv, a novel pipeline for the environmental annotation of sequences, emerged to provide a consistent description of environmental niches using the ENVO ontology. Because the pipeline derives its list of environmental terms from sample datasets, the annotations it produces are at the dataset level, and it lacks a taxa-centric approach to environmental annotation. The work here describes an extension developed to enhance the SEQenv pipeline, which provides the means to directly generate environmental annotations for taxa under different contexts. 16S rDNA amplicon datasets belonging to distinct biomes were selected to illustrate the applicability of the extended SEQenv pipeline. A literature survey of the results demonstrates the importance of sequence-level environmental annotations by illustrating both the distribution of taxa across environments and the various environmental sources of a specific taxon. This information, which significantly enhances the SEQenv pipeline, would be valuable to any biologist seeking to understand the taxa present in a habitat and the environments they originated from, enabling a more thorough analysis of which lineages are abundant in certain habitats and the recovery of patterns in taxon distribution across different habitats and environmental gradients.

    Taxonomic and environmental annotation of bacterial 16S rRNA gene sequences via Shannon entropy and database metadata terms

    Microbial ecology seeks to describe the diversity and distribution of microorganisms in various habitats within the context of environmental variables. High-throughput sequencing has greatly boosted the number and scope of projects aiming to study and analyse these organisms, with ever-increasing amounts of data being generated. Amplicon-based taxonomic analysis, which determines the presence of microbial taxa in different environments on the basis of marker gene annotations, often uses percentage identity as the main metric to determine sequence similarity against databases. These data are then used to study the distribution of biodiversity as well as the response of microbial communities to stressors. However, the 16S rRNA gene displays varying degrees of sequence conservation along its length and can therefore give different results depending on which part of the gene is used for sequencing and analysis. Furthermore, sequence alignment is primarily performed using the popular BLAST sequence alignment tool, which incurs a large computational cost, although newer, more efficient tools are being developed. A faster and more accurate approach is critically needed to process the avalanche of data. Additionally, repositories of environmental metadata can provide contextual information to sequence annotations, potentially enhancing analysis if they can be incorporated into bioinformatics pipelines. The overarching aim of this work was to enhance the taxonomic annotation of bacterial sequences by developing a weighted scheme that utilizes inherent evolutionary conservation in the bacterial 16S rRNA gene sequences and by adding contextual, environmental information pertaining to these sequences in a systematic fashion.
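
    The idea of weighting positions by their conservation can be illustrated with a minimal sketch. It is not the scheme developed in the work above, which the abstract does not specify. It assumes pre-aligned sequences of equal length, a four-letter nucleotide alphabet, and a hypothetical weighting in which conserved columns (low Shannon entropy) count for more; the function names `column_entropy`, `conservation_weights` and `weighted_identity` are invented for illustration.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (in bits) of one alignment column; non-ACGT symbols are ignored."""
    residues = [c for c in column.upper() if c in "ACGT"]
    if not residues:
        return 0.0
    total = len(residues)
    counts = Counter(residues)
    return sum(-(n / total) * math.log2(n / total) for n in counts.values())


def conservation_weights(aligned_seqs):
    """Per-position weights in [0, 1]: 1 = fully conserved, 0 = maximally variable.

    Assumption: weight = 1 - H / log2(4). Other weightings are equally plausible.
    """
    max_entropy = 2.0  # log2(4) for a four-letter nucleotide alphabet
    return [1.0 - column_entropy("".join(col)) / max_entropy
            for col in zip(*aligned_seqs)]


def weighted_identity(query, reference, weights):
    """Identity between two aligned sequences, with each position scaled by its weight."""
    matched = sum(w for q, r, w in zip(query, reference, weights) if q == r)
    total = sum(weights)
    return matched / total if total else 0.0


if __name__ == "__main__":
    alignment = ["ACGT", "ACGA", "ACCA", "AGCA"]  # toy alignment, not real 16S data
    weights = conservation_weights(alignment)
    print(weights)
    print(weighted_identity("ACGT", "ACGA", weights))
```

    Plain percentage identity corresponds to setting every weight to 1, so the sketch reduces to the unweighted metric in that case.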

    Data Visualization in Enlightenment Literature and Culture

    This chapter presents the findings of an ongoing digital project of the Helsinki Computational History Group at Helsinki Centre for Digital Humanities (HELDIG) focused on the history of eighteenth-century book publication. The authors have created a historical-biographical database based on The English Short-Title Catalogue (ESTC), a standard source for analytical bibliographic research, and extracted a data-driven canon which considers changes over time, subject-topics, top-works, authors, publishers, publication place, and materiality. This chapter provides both methodological and historical insights into the development of print and demonstrates the huge analytical potential of harmonized metadata catalogs. While quantitative analyses of the book trade were attempted before, they did not engage with the complex process of canon formation at such a large scale. The authors’ work highlights the formative role played by publishers in this process and the epistemological shift started at the end of the seventeenth century, when religious works were increasingly replaced by literary works. As the authors argue, this shift in the production and consumption of print allowed for a reinvention of the canon during the eighteenth century.

    Seqenv : linking sequences to environments through text mining

    Understanding the distribution of taxa and associated traits across different environments is one of the central questions in microbial ecology. High-throughput sequencing (HTS) studies are presently generating huge volumes of data to address this biogeographical topic. However, these studies are often focused on specific environment types or processes leading to the production of individual, unconnected datasets. The large amounts of legacy sequence data with associated metadata that exist can be harnessed to better place the genetic information found in these surveys into a wider environmental context. Here we introduce a software program, seqenv, to carry out precisely such a task. It automatically performs similarity searches of short sequences against the "nt" nucleotide database provided by NCBI and, out of every hit, extracts, if it is available, the textual metadata field. After collecting all the isolation sources from all the search results, we run a text mining algorithm to identify and parse words that are associated with the Environmental Ontology (EnvO) controlled vocabulary. This, in turn, enables us to determine both in which environments individual sequences or taxa have previously been observed and, by weighted summation of those results, to summarize complete samples. We present two demonstrative applications of seqenv to a survey of ammonia oxidizing archaea as well as to a plankton paleome dataset from the Black Sea. These demonstrate the ability of the tool to reveal novel patterns in HTS and its utility in the fields of environmental source tracking, paleontology, and studies of microbial biogeography. How to cite this article: Sinclair et al. (2016), Seqenv: linking sequences to environments through text mining. PeerJ 4:e2690; DOI 10.7717/peerj.2690
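
    The steps described in this abstract (collect isolation-source text per sequence, match it against a controlled vocabulary, then combine per-sequence results by weighted summation) can be sketched roughly as follows. This is not seqenv's code or API. The four-term vocabulary stands in for EnvO, substring matching stands in for the text-mining step, and using sequence abundance as the weight is an assumption; all function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical miniature vocabulary; the real EnvO ontology is far larger.
TOY_VOCABULARY = {"soil", "sediment", "sea water", "lake"}


def terms_in_text(text, vocabulary=TOY_VOCABULARY):
    """Return the vocabulary terms that occur in a free-text isolation source."""
    lowered = text.lower()
    return {term for term in vocabulary if term in lowered}


def sequence_profile(isolation_sources):
    """Normalised term frequencies for one sequence, from the isolation sources of its hits."""
    counts = defaultdict(float)
    for source in isolation_sources:
        for term in terms_in_text(source):
            counts[term] += 1.0
    total = sum(counts.values())
    return {term: c / total for term, c in counts.items()} if total else {}


def sample_profile(seq_profiles, abundances):
    """Abundance-weighted sum of per-sequence profiles (assumed weighting)."""
    total_abundance = sum(abundances.values())
    combined = defaultdict(float)
    if not total_abundance:
        return {}
    for seq_id, profile in seq_profiles.items():
        weight = abundances.get(seq_id, 0) / total_abundance
        for term, freq in profile.items():
            combined[term] += weight * freq
    return dict(combined)


if __name__ == "__main__":
    # Invented example hits, for illustration only.
    hits = {
        "otu1": ["forest soil", "agricultural soil"],
        "otu2": ["Black Sea sediment", "lake sediment"],
    }
    profiles = {seq_id: sequence_profile(src) for seq_id, src in hits.items()}
    print(sample_profile(profiles, {"otu1": 30, "otu2": 10}))
```

    Keeping the per-sequence profiles separate from the sample summary mirrors the two outputs the abstract mentions: where individual sequences or taxa have been observed, and a summary of the complete sample.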

    Machine learning approach to predict quality parameters for bacterial consortium-treated hospital wastewater and phytotoxicity assessment on radish, cauliflower, hot pepper, rice and wheat crops

    Raw hospital wastewater is a source of excessive heavy metals and pharmaceutical pollutants. In water-stressed countries such as Pakistan, the practice of unsafe reuse by local farmers for crop irrigation is of major concern. In our previous work, we developed a low-cost bacterial consortium wastewater treatment method. Here, in a two-part study, we first aimed to find which physico-chemical parameters were the most important for differentiating consortium-treated and untreated wastewater for its safe reuse. This was achieved using a Kruskal–Wallis test on a suite of physico-chemical measurements to find those parameters which were differentially abundant between consortium-treated and untreated wastewater. The differentially abundant parameters were then input to a Random Forest classifier. The classifier showed that ‘turbidity’ was the most influential parameter for predicting biotreatment. In the second part of our study, we wanted to know if the consortium-treated wastewater was safe for crop irrigation. We therefore carried out a plant growth experiment using a range of popular crop plants in Pakistan (Radish, Cauliflower, Hot pepper, Rice and Wheat), which were grown using irrigation from consortium-treated and untreated hospital wastewater at a range of dilutions (turbidity levels) and performed a phytotoxicity assessment. Our results showed an increasing trend in germination indices and a decreasing one in phytotoxicity indices in plants after irrigation with consortium-treated hospital wastewater (at each dilution/turbidity measure). The comparative study of growth between plants showed the following trend: Cauliflower > Radish > Wheat > Rice > Hot pepper. Cauliflower was the most adaptive plant (PI: −0.28, −0.13, −0.16, −0.06) for the treated hospital wastewater, while hot pepper was susceptible; hence, we conclude that bacterial consortium-treated hospital wastewater is safe for reuse for the irrigation of cauliflower, radish, wheat and rice. We further conclude that turbidity is the most influential parameter for predicting bio-treatment efficiency prior to water reuse. This method, therefore, could represent a low-cost, low-tech and safe means for farmers to grow crops in water-stressed areas.
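
    The first screening step named above, a Kruskal–Wallis test comparing a parameter between treated and untreated water, can be sketched in plain Python. This is a minimal illustration and not the authors' analysis: it computes only the H statistic using average ranks for ties, omits the tie correction factor and the p-value, and the turbidity readings in the usage example are invented placeholders. The Random Forest step is not shown.

```python
def average_ranks(values):
    """1-based ranks of `values`, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def kruskal_wallis_h(*groups):
    """Kruskal-Wallis H statistic (no tie correction) for two or more groups."""
    pooled = [v for group in groups for v in group]
    n = len(pooled)
    ranks = average_ranks(pooled)
    weighted_rank_sums = 0.0
    start = 0
    for group in groups:
        rank_sum = sum(ranks[start:start + len(group)])
        weighted_rank_sums += rank_sum ** 2 / len(group)
        start += len(group)
    return 12.0 / (n * (n + 1)) * weighted_rank_sums - 3 * (n + 1)


if __name__ == "__main__":
    # Hypothetical turbidity readings, for illustration only.
    treated = [4.1, 3.8, 5.0, 4.4]
    untreated = [9.7, 11.2, 8.9, 10.5]
    print(kruskal_wallis_h(treated, untreated))
```

    A larger H indicates that the rank distributions of the groups differ more; parameters passing such a screen would then be candidates for the classifier stage.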

    MKEM: a Multi-level Knowledge Emergence Model for mining undiscovered public knowledge

    Background: Since Swanson proposed the Undiscovered Public Knowledge (UPK) model, there have been many approaches to uncover UPK by mining the biomedical literature. These earlier works, however, required substantial manual intervention to reduce the number of possible connections and are mainly applied to disease-effect relation. With the advancement in biomedical science, it has become imperative to extract and combine information from multiple disjoint researches, studies and articles to infer new hypotheses and expand knowledge. Methods: We propose MKEM, a Multi-level Knowledge Emergence Model, to discover implicit relationships using Natural Language Processing techniques such as Link Grammar and Ontologies such as Unified Medical Language System (UMLS) MetaMap. The contribution of MKEM is as follows: First, we propose a flexible knowledge emergence model to extract implicit relationships across different levels such as molecular level for gene and protein and Phenomic level for disease and treatment. Second, we employ MetaMap for tagging biological concepts. Third, we provide an empirical and systematic approach to discover novel relationships. Results: We applied our system on 5000 abstracts downloaded from PubMed database. We performed the performance evaluation as a gold standard is not yet available. Our system performed with a good precision and recall and we generated 24 hypotheses. Conclusions: Our experiments show that MKEM is a powerful tool to discover hidden relationships residing in extracted entities that were represented by our Substance-Effect-Process-Disease-Body Part (SEPDB) model.

    Genetic mechanisms of critical illness in COVID-19.

    Host-mediated lung inflammation is present, and drives mortality, in the critical illness caused by coronavirus disease 2019 (COVID-19). Host genetic variants associated with critical illness may identify mechanistic targets for therapeutic development. Here we report the results of the GenOMICC (Genetics Of Mortality In Critical Care) genome-wide association study in 2,244 critically ill patients with COVID-19 from 208 UK intensive care units. We have identified and replicated the following new genome-wide significant associations: on chromosome 12q24.13 (rs10735079, P = 1.65 × 10⁻⁸) in a gene cluster that encodes antiviral restriction enzyme activators (OAS1, OAS2 and OAS3); on chromosome 19p13.2 (rs74956615, P = 2.3 × 10⁻⁸) near the gene that encodes tyrosine kinase 2 (TYK2); on chromosome 19p13.3 (rs2109069, P = 3.98 × 10⁻¹²) within the gene that encodes dipeptidyl peptidase 9 (DPP9); and on chromosome 21q22.1 (rs2236757, P = 4.99 × 10⁻⁸) in the interferon receptor gene IFNAR2. We identified potential targets for repurposing of licensed medications: using Mendelian randomization, we found evidence that low expression of IFNAR2, or high expression of TYK2, are associated with life-threatening disease; and transcriptome-wide association in lung tissue revealed that high expression of the monocyte-macrophage chemotactic receptor CCR2 is associated with severe COVID-19. Our results identify robust genetic signals relating to key host antiviral defence mechanisms and mediators of inflammatory organ damage in COVID-19. Both mechanisms may be amenable to targeted treatment with existing drugs. However, large-scale randomized clinical trials will be essential before any change to clinical practice.

    Atrasentan and renal events in patients with type 2 diabetes and chronic kidney disease (SONAR): a double-blind, randomised, placebo-controlled trial

    Background: Short-term treatment for people with type 2 diabetes using a low dose of the selective endothelin A receptor antagonist atrasentan reduces albuminuria without causing significant sodium retention. We report the long-term effects of treatment with atrasentan on major renal outcomes. Methods: We did this double-blind, randomised, placebo-controlled trial at 689 sites in 41 countries. We enrolled adults aged 18–85 years with type 2 diabetes, estimated glomerular filtration rate (eGFR) 25–75 mL/min per 1·73 m² of body surface area, and a urine albumin-to-creatinine ratio (UACR) of 300–5000 mg/g who had received maximum labelled or tolerated renin–angiotensin system inhibition for at least 4 weeks. Participants were given atrasentan 0·75 mg orally daily during an enrichment period before random group assignment. Those with a UACR decrease of at least 30% with no substantial fluid retention during the enrichment period (responders) were included in the double-blind treatment period. Responders were randomly assigned to receive either atrasentan 0·75 mg orally daily or placebo. All patients and investigators were masked to treatment assignment. The primary endpoint was a composite of doubling of serum creatinine (sustained for ≥30 days) or end-stage kidney disease (eGFR <15 mL/min per 1·73 m² sustained for ≥90 days, chronic dialysis for ≥90 days, kidney transplantation, or death from kidney failure) in the intention-to-treat population of all responders. Safety was assessed in all patients who received at least one dose of their assigned study treatment. The study is registered with ClinicalTrials.gov, number NCT01858532. Findings: Between May 17, 2013, and July 13, 2017, 11 087 patients were screened; 5117 entered the enrichment period, and 4711 completed the enrichment period. Of these, 2648 patients were responders and were randomly assigned to the atrasentan group (n=1325) or placebo group (n=1323). Median follow-up was 2·2 years (IQR 1·4–2·9). 79 (6·0%) of 1325 patients in the atrasentan group and 105 (7·9%) of 1323 in the placebo group had a primary composite renal endpoint event (hazard ratio [HR] 0·65 [95% CI 0·49–0·88]; p=0·0047). Fluid retention and anaemia adverse events, which have been previously attributed to endothelin receptor antagonists, were more frequent in the atrasentan group than in the placebo group. Hospital admission for heart failure occurred in 47 (3·5%) of 1325 patients in the atrasentan group and 34 (2·6%) of 1323 patients in the placebo group (HR 1·33 [95% CI 0·85–2·07]; p=0·208). 58 (4·4%) patients in the atrasentan group and 52 (3·9%) in the placebo group died (HR 1·09 [95% CI 0·75–1·59]; p=0·65). Interpretation: Atrasentan reduced the risk of renal events in patients with diabetes and chronic kidney disease who were selected to optimise efficacy and safety. These data support a potential role for selective endothelin receptor antagonists in protecting renal function in patients with type 2 diabetes at high risk of developing end-stage kidney disease. Funding: AbbVie.