110 research outputs found

    Data management challenges for artificial intelligence in plant and agricultural research [version 2; peer review: 2 approved]

    Artificial Intelligence (AI) is increasingly used within plant science, yet it is far from being routinely and effectively implemented in this domain. Particularly relevant to the development of novel food and agricultural technologies is the development of validated, meaningful and usable ways to integrate, compare and visualise large, multi-dimensional datasets from different sources and scientific approaches. After a brief summary of the reasons for the interest in data science and AI within plant science, the paper identifies and discusses eight key challenges in data management that must be addressed to further unlock the potential of AI in crop and agronomic research, and particularly the application of Machine Learning (ML), which holds much promise for this domain.

    Ecology and Transmission of Buruli Ulcer Disease: A Systematic Review

    Buruli ulcer is a neglected emerging disease that has recently been reported in some countries as the second most frequent mycobacterial disease in humans after tuberculosis. Cases have been reported from at least 32 countries in Africa (mainly west), Australia, Southeast Asia, China, Central and South America, and the Western Pacific. Large lesions often result in scarring, contractural deformities, amputations, and disabilities, and in Africa, most cases of the disease occur in children between the ages of 4–15 years. This environmental mycobacterium, Mycobacterium ulcerans, is found in communities associated with rivers, swamps, wetlands, and human-linked changes in the aquatic environment, particularly those created as a result of environmental disturbance such as deforestation, dam construction, and agriculture. Buruli ulcer disease is often referred to as the “mysterious disease” because the mode of transmission remains unclear, although several hypotheses have been proposed. The above review reveals that various routes of transmission may occur, varying amongst epidemiological settings and geographic regions, and that there may be some role for living agents as reservoirs and as vectors of M. ulcerans, in particular aquatic insects, adult mosquitoes or other biting arthropods. We discuss traditional and non-traditional methods for indicting the roles of living agents as biologically significant reservoirs and/or vectors of pathogens, and suggest an intellectual framework for establishing criteria for transmission. The application of these criteria to the transmission of M. ulcerans presents a significant challenge.

    The First Post-Kepler Brightness Dips of KIC 8462852

    We present a photometric detection of the first brightness dips of the unique variable star KIC 8462852 since the end of the Kepler space mission in 2013 May. Our regular photometric surveillance started in October 2015, and a sequence of dipping began in 2017 May continuing on through the end of 2017, when the star was no longer visible from Earth. We distinguish four main 1-2.5% dips, named "Elsie," "Celeste," "Skara Brae," and "Angkor", which persist on timescales from several days to weeks. Our main results so far are: (i) there are no apparent changes of the stellar spectrum or polarization during the dips; (ii) the multiband photometry of the dips shows differential reddening favoring non-grey extinction. Therefore, our data are inconsistent with dip models that invoke optically thick material, but rather they are in line with predictions for an occulter consisting primarily of ordinary dust, where much of the material must be optically thin with a size scale ≪1 μm, and may also be consistent with models invoking variations intrinsic to the stellar photosphere. Notably, our data do not place constraints on the color of the longer-term "secular" dimming, which may be caused by independent processes, or probe different regimes of a single process.

    Phenotypic Characterization of EIF2AK4 Mutation Carriers in a Large Cohort of Patients Diagnosed Clinically With Pulmonary Arterial Hypertension.

    BACKGROUND: Pulmonary arterial hypertension (PAH) is a rare disease with an emerging genetic basis. Heterozygous mutations in the gene encoding the bone morphogenetic protein receptor type 2 (BMPR2) are the commonest genetic cause of PAH, whereas biallelic mutations in the eukaryotic translation initiation factor 2 alpha kinase 4 gene (EIF2AK4) are described in pulmonary veno-occlusive disease/pulmonary capillary hemangiomatosis. Here, we determine the frequency of these mutations and define the genotype-phenotype characteristics in a large cohort of patients diagnosed clinically with PAH. METHODS: Whole-genome sequencing was performed on DNA from patients with idiopathic and heritable PAH and with pulmonary veno-occlusive disease/pulmonary capillary hemangiomatosis recruited to the National Institute of Health Research BioResource-Rare Diseases study. Heterozygous variants in BMPR2 and biallelic EIF2AK4 variants with a minor allele frequency of <1:10 000 in control data sets and predicted to be deleterious (by combined annotation-dependent depletion, PolyPhen-2, and sorting intolerant from tolerant predictions) were identified as potentially causal. Phenotype data from the time of diagnosis were also captured. RESULTS: Eight hundred sixty-four patients with idiopathic or heritable PAH and 16 with pulmonary veno-occlusive disease/pulmonary capillary hemangiomatosis were recruited. Mutations in BMPR2 were identified in 130 patients (14.8%). Biallelic mutations in EIF2AK4 were identified in 5 patients with a clinical diagnosis of pulmonary veno-occlusive disease/pulmonary capillary hemangiomatosis. Furthermore, 9 patients with a clinical diagnosis of PAH carried biallelic EIF2AK4 mutations. These patients had a reduced transfer coefficient for carbon monoxide (Kco; 33% [interquartile range, 30%-35%] predicted), a younger age at diagnosis (29 years; interquartile range, 23-38 years), and more interlobular septal thickening and mediastinal lymphadenopathy on computed tomography of the chest compared with patients with PAH without EIF2AK4 mutations. However, radiological assessment alone could not accurately identify biallelic EIF2AK4 mutation carriers. Patients with PAH with biallelic EIF2AK4 mutations had a shorter survival. CONCLUSIONS: Biallelic EIF2AK4 mutations are found in patients classified clinically as having idiopathic and heritable PAH. These patients cannot be identified reliably by computed tomography, but a low Kco and a young age at diagnosis suggest the underlying molecular diagnosis. Genetic testing can identify these misclassified patients, allowing appropriate management and early referral for lung transplantation.

    Comprehensive Rare Variant Analysis via Whole-Genome Sequencing to Determine the Molecular Pathology of Inherited Retinal Disease

    Inherited retinal disease is a common cause of visual impairment and represents a highly heterogeneous group of conditions. Here, we present findings from a cohort of 722 individuals with inherited retinal disease, who have had whole-genome sequencing (n = 605), whole-exome sequencing (n = 72), or both (n = 45) performed, as part of the NIHR-BioResource Rare Diseases research study. We identified pathogenic variants (single-nucleotide variants, indels, or structural variants) for 404/722 (56%) individuals. Whole-genome sequencing gives unprecedented power to detect three categories of pathogenic variants in particular: structural variants, variants in GC-rich regions, which have significantly improved coverage compared to whole-exome sequencing, and variants in non-coding regulatory regions. In addition to previously reported pathogenic regulatory variants, we have identified a previously unreported pathogenic intronic variant in CHM in two males with choroideremia. We have also identified 19 genes not previously known to be associated with inherited retinal disease, which harbor biallelic predicted protein-truncating variants in unsolved cases. Whole-genome sequencing is an increasingly important comprehensive method with which to investigate the genetic causes of inherited retinal disease. This work was supported by The National Institute for Health Research England (NIHR) for the NIHR BioResource – Rare Diseases project (grant number RG65966). The Moorfields Eye Hospital cohort of patients and clinical and imaging data were ascertained and collected with the support of grants from the National Institute for Health Research Biomedical Research Centre at Moorfields Eye Hospital, National Health Service Foundation Trust, and UCL Institute of Ophthalmology, Moorfields Eye Hospital Special Trustees, Moorfields Eye Charity, the Foundation Fighting Blindness (USA), and Retinitis Pigmentosa Fighting Blindness. M.M. is a recipient of an FFB Career Development Award. E.M. is supported by UCLH/UCL NIHR Biomedical Research Centre. F.L.R. and D.G. are supported by Cambridge NIHR Biomedical Research Centre.

    Improved imputation of low-frequency and rare variants using the UK10K haplotype reference panel

    Imputing genotypes from reference panels created by whole-genome sequencing (WGS) provides a cost-effective strategy for augmenting the single-nucleotide polymorphism (SNP) content of genome-wide arrays. The UK10K Cohorts project has generated a data set of 3,781 whole genomes sequenced at low depth (average 7x), aiming to exhaustively characterize genetic variation down to 0.1% minor allele frequency in the British population. Here we demonstrate the value of this resource for improving imputation accuracy at rare and low-frequency variants in both a UK and an Italian population. We show that large increases in imputation accuracy can be achieved by re-phasing WGS reference panels after initial genotype calling. We also present a method for combining WGS panels to improve variant coverage and downstream imputation accuracy, which we illustrate by integrating 7,562 WGS haplotypes from the UK10K project with 2,184 haplotypes from the 1000 Genomes Project. Finally, we introduce a novel approximation that maintains speed without sacrificing imputation accuracy for rare variants.

    Comprehensive Cancer-Predisposition Gene Testing in an Adult Multiple Primary Tumor Series Shows a Broad Range of Deleterious Variants and Atypical Tumor Phenotypes.

    Multiple primary tumors (MPTs) affect a substantial proportion of cancer survivors and can result from various causes, including inherited predisposition. Currently, germline genetic testing of MPT-affected individuals for variants in cancer-predisposition genes (CPGs) is mostly targeted by tumor type. We ascertained pre-assessed MPT individuals (with at least two primary tumors by age 60 years or at least three by 70 years) from genetics centers and performed whole-genome sequencing (WGS) on 460 individuals from 440 families. Despite previous negative genetic assessment and molecular investigations, pathogenic variants in moderate- and high-risk CPGs were detected in 67/440 (15.2%) probands. WGS detected variants that would not be (or were not) detected by targeted resequencing strategies, including low-frequency structural variants (6/440 [1.4%] probands). In most individuals with a germline variant assessed as pathogenic or likely pathogenic (P/LP), at least one of their tumor types was characteristic of variants in the relevant CPG. However, in 29 probands (42.2% of those with a P/LP variant), the tumor phenotype appeared discordant. The frequency of individuals with truncating or splice-site CPG variants and at least one discordant tumor type was significantly higher than in a control population (χ2 = 43.642; p ≤ 0.0001). 2/67 (3%) probands with P/LP variants had evidence of multiple inherited neoplasia allele syndrome (MINAS) with deleterious variants in two CPGs. Together with variant detection rates from a previous series of similarly ascertained MPT-affected individuals, the present results suggest that first-line comprehensive CPG analysis in an MPT cohort referred to clinical genetics services would detect a deleterious variant in about a third of individuals. JW is supported by a Cancer Research UK Cambridge Cancer Centre Clinical Research Training Fellowship. Funding for the NIHR BioResource – Rare diseases project was provided by the National Institute for Health Research (NIHR, grant number RG65966). ERM acknowledges support from the European Research Council (Advanced Researcher Award), NIHR (Senior Investigator Award and Cambridge NIHR Biomedical Research Centre), Cancer Research UK Cambridge Cancer Centre and Medical Research Council Infrastructure Award. The University of Cambridge has received salary support in respect of EM from the NHS in the East of England through the Clinical Academic Reserve. The views expressed are those of the authors and not necessarily those of the NHS or Department of Health. DGE is an NIHR Senior Investigator and is supported by the all Manchester NIHR Biomedical Research Centre.

    Telomerecat: A ploidy-agnostic method for estimating telomere length from whole genome sequencing data.

    Telomere length is a risk factor in disease and the dynamics of telomere length are crucial to our understanding of cell replication and vitality. The proliferation of whole genome sequencing represents an unprecedented opportunity to glean new insights into telomere biology on a previously unimaginable scale. To this end, a number of approaches for estimating telomere length from whole-genome sequencing data have been proposed. Here we present Telomerecat, a novel approach to the estimation of telomere length. Previous methods have been dependent on the number of telomeres present in a cell being known, which may be problematic when analysing aneuploid cancer data and non-human samples. Telomerecat is designed to be agnostic to the number of telomeres present, making it suited for the purpose of estimating telomere length in cancer studies. Telomerecat also accounts for interstitial telomeric reads and presents a novel approach to dealing with sequencing errors. We show that Telomerecat performs well at telomere length estimation when compared to leading experimental and computational methods. Furthermore, we show that it detects expected patterns in longitudinal data, repeated measurements, and cross-species comparisons. We also apply the method to cancer cell data, uncovering an interesting relationship with the underlying telomerase genotype.