155 research outputs found

    Identifying subtypes of chronic kidney disease with machine learning: development, internal validation and prognostic validation using linked electronic health records in 350,067 individuals

    BACKGROUND: Although chronic kidney disease (CKD) is associated with high multimorbidity, polypharmacy, morbidity and mortality, existing classification systems (mild to severe, usually based on estimated glomerular filtration rate, proteinuria or urine albumin-creatinine ratio) and risk prediction models largely ignore the complexity of CKD, its risk factors and its outcomes. Improved subtype definition could improve prediction of outcomes and inform effective interventions. METHODS: We analysed individuals ≥18 years with incident and prevalent CKD (n = 350,067 and 195,422 respectively) from a population-based electronic health record resource (2006-2020; Clinical Practice Research Datalink, CPRD). We included factors (n = 264 with 2670 derived variables), e.g. demography, history, examination, blood laboratory values and medications. Using a published framework, we identified subtypes through seven unsupervised machine learning (ML) methods (K-means, Diana, HC, Fanny, PAM, Clara, Model-based) with 66 (of 2670) variables in each dataset. We evaluated subtypes for: (i) internal validity (within dataset, across methods); (ii) prognostic validity (predictive accuracy for 5-year all-cause mortality and admissions); and (iii) medications (new and existing by British National Formulary chapter). FINDINGS: After identifying five clusters across seven approaches, we labelled CKD subtypes: 1. Early-onset, 2. Late-onset, 3. Cancer, 4. Metabolic, and 5. Cardiometabolic. Internal validity: We trained a high-performing model (using XGBoost) that could predict disease subtypes with 95% accuracy for incident and prevalent CKD (sensitivity: 0.81-0.98, F1 score: 0.84-0.97). Prognostic validity: 5-year all-cause mortality, hospital admissions, and incidence of new chronic diseases differed across CKD subtypes.
The 5-year risks of mortality and admissions in the overall incident CKD population were highest in the cardiometabolic subtype: 43.3% (42.3-42.8%) and 29.5% (29.1-30.0%), respectively, and lowest in the early-onset subtype: 5.7% (5.5-5.9%) and 18.7% (18.4-19.1%). MEDICATIONS: Across CKD subtypes, the distribution of prescription medication classes at baseline varied, with the highest medication burden in the cardiometabolic and metabolic subtypes, and a higher burden in prevalent than in incident CKD. INTERPRETATION: In the largest CKD study using ML to date, we identified five distinct subtypes in individuals with incident and prevalent CKD. These subtypes have relevance to the study of aetiology, therapeutics and risk prediction. FUNDING: AstraZeneca UK Ltd, Health Data Research UK.

    A retrospective cohort study measured predicting and validating the impact of the COVID-19 pandemic in individuals with chronic kidney disease

    Chronic kidney disease (CKD) is associated with increased risk of baseline mortality and severe COVID-19, but analyses across CKD stages and comorbidities are lacking. In prevalent and incident CKD, we investigated comorbidities, baseline risk, COVID-19 incidence, and predicted versus observed one-year excess death. In a national dataset for England (NHS Digital Trusted Research Environment (NHSD TRE)) encompassing 56 million individuals, we conducted a retrospective cohort study (March 2020 to March 2021) for prevalence of comorbidities by incident and prevalent CKD, SARS-CoV-2 infection and mortality. Baseline mortality risk, incidence and outcome of infection by comorbidities, controlling for age, sex and vaccination, were assessed. Observed versus predicted one-year mortality at varying population infection rates and pandemic-related relative risks using our published model in pre-pandemic CKD cohorts (NHSD TRE and Clinical Practice Research Datalink (CPRD)) were compared. Among individuals with CKD (prevalent: 1,934,585, incident: 144,969), comorbidities were common (73.5% and 71.2% with one or more condition(s) in respective data sets, and 13.2% and 11.2% with three or more conditions, in prevalent and incident CKD), and associated with SARS-CoV-2 infection, particularly dialysis/transplantation (odds ratio 2.08, 95% confidence interval 2.04-2.13) and heart failure (1.73, 1.71-1.76), but not cancer (1.01, 1.01-1.04). One-year all-cause mortality varied by age, sex, multi-morbidity and CKD stage. Compared with 34,265 observed excess deaths, we predicted 28,746 and 24,546 deaths in the NHSD TRE and CPRD databases, respectively (infection rates 10% and relative risks 3.0), and 23,754 and 20,283 deaths (observed infection rates 6.7% and relative risks 3.7). Thus, in this largest, national-level study, individuals with CKD have a high burden of comorbidities and multi-morbidity, and high risk of pre-pandemic and pandemic mortality.
Hence, treatment of comorbidities, non-pharmaceutical measures, and vaccination are priorities for people with CKD, and management of long-term conditions is important during and beyond the pandemic.

    High throughput screening of hydrolytic enzymes from termites using a natural substrate derived from sugarcane bagasse

    Background: The description of new hydrolytic enzymes is an important step in the development of techniques which use lignocellulosic materials as a starting point for fuel production. Sugarcane bagasse, which is subjected to pre-treatment, hydrolysis and fermentation for the production of ethanol in several test refineries, is the most promising source of raw material for the production of second generation renewable fuels in Brazil. One problem when screening hydrolytic activities is that the activity against commercial substrates, such as carboxymethylcellulose, does not always correspond to the activity against the natural lignocellulosic material. Besides that, the macroscopic characteristics of the raw material, such as insolubility and heterogeneity, hinder its use for high throughput screenings. Results: In this paper, we present the preparation of a colloidal suspension of particles obtained from sugarcane bagasse, with minimal chemical change in the lignocellulosic material, and demonstrate its use for high throughput assays of hydrolases using Brazilian termites as the screened organisms. Conclusions: Important differences between the use of the natural substrate and commercial cellulase substrates, such as carboxymethylcellulose or crystalline cellulose, were observed. This suggests that wood feeding termites, in contrast to litter feeding termites, might not be the best source for enzymes that degrade sugarcane biomass.

    CodingQuarry: Highly accurate hidden Markov model gene prediction in fungal genomes using RNA-seq transcripts

    Background: The impact of gene annotation quality on functional and comparative genomics makes gene prediction an important process, particularly in non-model species, including many fungi. Sets of homologous protein sequences are rarely complete with respect to the fungal species of interest and are often small or unreliable, especially when closely related species have not been sequenced or annotated in detail. In these cases, protein homology-based evidence fails to correctly annotate many genes, or significantly improve ab initio predictions. Generalised hidden Markov models (GHMM) have proven to be invaluable tools in gene annotation and, recently, RNA-seq has emerged as a cost-effective means to significantly improve the quality of automated gene annotation. As these methods do not require sets of homologous proteins, improving gene prediction from these resources is of benefit to fungal researchers. While many pipelines now incorporate RNA-seq data in training GHMMs, there has been relatively little investigation into additionally combining RNA-seq data at the point of prediction, and room for improvement in this area motivates this study. Results: CodingQuarry is a highly accurate, self-training GHMM fungal gene predictor designed to work with assembled, aligned RNA-seq transcripts. RNA-seq data informs annotations both during gene-model training and in prediction. Our approach capitalises on the high quality of fungal transcript assemblies by incorporating predictions made directly from transcript sequences. Correct predictions are made despite transcript assembly problems, including those caused by overlap between the transcripts of adjacent gene loci. Stringent benchmarking against high-confidence annotation subsets showed CodingQuarry predicted 91.3% of Schizosaccharomyces pombe genes and 90.4% of Saccharomyces cerevisiae genes perfectly. 
These results are 4-5% better than those of AUGUSTUS, the next best performing RNA-seq driven gene predictor tested. Comparisons against whole genome Sc. pombe and S. cerevisiae annotations further substantiate a 4-5% improvement in the number of correctly predicted genes. Conclusions: We demonstrate the success of a novel method of incorporating RNA-seq data into GHMM fungal gene prediction. This shows that a high quality annotation can be achieved without relying on protein homology or a training set of genes. CodingQuarry is freely available (https://sourceforge.net/projects/codingquarry/), and suitable for incorporation into genome annotation pipelines

    Biology and biotechnology of Trichoderma

    Fungi of the genus Trichoderma are soilborne, green-spored ascomycetes that can be found all over the world. They have been studied with respect to various characteristics and applications and are known as successful colonizers of their habitats, efficiently fighting their competitors. Once established, they launch their potent degradative machinery for decomposition of the often heterogeneous substrate at hand. Therefore, distribution and phylogeny, defense mechanisms, beneficial as well as deleterious interaction with hosts, enzyme production and secretion, sexual development, and response to environmental conditions such as nutrients and light have been studied in great detail with many species of this genus, thus rendering Trichoderma one of the best studied fungi with the genome of three species currently available. Efficient biocontrol strains of the genus are being developed as promising biological fungicides, and their weaponry for this function also includes secondary metabolites with potential applications as novel antibiotics. The cellulases produced by Trichoderma reesei, the biotechnological workhorse of the genus, are important industrial products, especially with respect to production of second generation biofuels from cellulosic waste. Genetic engineering not only led to significant improvements in industrial processes but also to intriguing insights into the biology of these fungi and is now complemented by the availability of a sexual cycle in T. reesei/Hypocrea jecorina, which significantly facilitates both industrial and basic research. This review aims to give a broad overview on the qualities and versatility of the best studied Trichoderma species and to highlight intriguing findings as well as promising applications

    Hybridization and adaptive evolution of diverse Saccharomyces species for cellulosic biofuel production

    Additional file 15. Summary of whole genome sequencing statistics