132 research outputs found

    Streaming histogram sketching for rapid microbiome analytics

    Background: The growth in publicly available microbiome data in recent years has yielded an invaluable resource for genomic research, allowing for the design of new studies, augmentation of novel datasets and reanalysis of published works. This vast amount of microbiome data, as well as the widespread proliferation of microbiome research and the looming era of clinical metagenomics, means there is an urgent need to develop analytics that can process huge amounts of data in a short amount of time. To address this need, we propose a new method for the compact representation of microbiome sequencing data using similarity-preserving sketches of streaming k-mer spectra. These sketches allow for dissimilarity estimation, rapid microbiome catalogue searching and classification of microbiome samples in near real time. Results: We apply streaming histogram sketching to microbiome samples as a form of dimensionality reduction, creating a compressed ‘histosketch’ that can efficiently represent microbiome k-mer spectra. Using public microbiome datasets, we show that histosketches can be clustered by sample type using the pairwise Jaccard similarity estimation, consequently allowing for rapid microbiome similarity searches via a locality sensitive hashing indexing scheme. Furthermore, we use a ‘real life’ example to show that histosketches can train machine learning classifiers to accurately label microbiome samples. Specifically, using a collection of 108 novel microbiome samples from a cohort of premature neonates, we trained and tested a random forest classifier that could accurately predict whether the neonate had received antibiotic treatment (97% accuracy, 96% precision) and could subsequently be used to classify microbiome data streams in less than 3 s. Conclusions: Our method offers a new approach to rapidly process microbiome data streams, allowing samples to be rapidly clustered, indexed and classified. We also provide our implementation, Histosketching Using Little K-mers (HULK), which can histosketch a typical 2 GB microbiome in 50 s on a standard laptop using four cores, with the sketch occupying 3000 bytes of disk space.
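
    To make the terms "k-mer spectrum" and "Jaccard similarity" in the abstract above concrete, here is a minimal sketch in Python. It assumes a weighted (min/max) Jaccard computed exactly over raw k-mer counts. The function names are hypothetical, and this is not the HULK implementation, which according to the abstract works on compact sketches rather than full spectra.

```python
from collections import Counter


def kmer_spectrum(seq: str, k: int) -> Counter:
    """Count every overlapping k-mer in a sequence (hypothetical helper)."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def weighted_jaccard(a: Counter, b: Counter) -> float:
    """Sum of per-k-mer minimum counts divided by sum of maximum counts.

    This is the exact value computed on full spectra; a sketching method
    would estimate a similarity of this kind from far less data.
    """
    keys = a.keys() | b.keys()
    numerator = sum(min(a[x], b[x]) for x in keys)    # missing k-mers count as 0
    denominator = sum(max(a[x], b[x]) for x in keys)
    return numerator / denominator if denominator else 1.0


if __name__ == "__main__":
    sample_a = kmer_spectrum("ACGTACGTGG", 4)
    sample_b = kmer_spectrum("ACGTACGTCC", 4)
    print(weighted_jaccard(sample_a, sample_b))
```

    Computing the exact value needs the full spectrum of every sample in memory, which is the cost that a fixed-size sketch is meant to avoid.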

    Recurrent SARS-CoV-2 mutations in immunodeficient patients

    Long-term severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) infections in immunodeficient patients are an important source of variation for the virus but are understudied. Many case studies have been published which describe one or a small number of long-term infected individuals, but no study has combined these sequences into a cohesive dataset. This work aims to rectify this and study the genomics of this patient group through a combination of literature searches as well as identifying new case series directly from the COVID-19 Genomics UK (COG-UK) dataset. The spike gene receptor-binding domain and N-terminal domain (NTD) were identified as mutation hotspots. Numerous mutations associated with variants of concern were observed to emerge recurrently. Additionally, a mutation in the envelope gene, T30I, was determined to be the second most frequent recurrently occurring mutation arising in persistent infections. A high proportion of recurrent mutations in immunodeficient individuals are associated with ACE2 affinity, immune escape, or viral packaging optimisation. There is an apparent selective pressure for mutations that aid cell–cell transmission within the host or persistence, which are often different from mutations that aid inter-host transmission, although the fact that multiple recurrent de novo mutations are considered defining for variants of concern strongly indicates that this potential source of novel variants should not be discounted.

    The impact of viral mutations on recognition by SARS-CoV-2 specific T cells.

    We identify amino acid variants within dominant SARS-CoV-2 T cell epitopes by interrogating global sequence data. Several variants within nucleocapsid and ORF3a epitopes have arisen independently in multiple lineages and result in loss of recognition by epitope-specific T cells assessed by IFN-γ and cytotoxic killing assays. Complete loss of T cell responsiveness was seen due to Q213K in the A*01:01-restricted CD8+ ORF3a epitope FTSDYYQLY207-215; due to P13L, P13S, and P13T in the B*27:05-restricted CD8+ nucleocapsid epitope QRNAPRITF9-17; and due to T362I and P365S in the A*03:01/A*11:01-restricted CD8+ nucleocapsid epitope KTFPPTEPK361-369. CD8+ T cell lines unable to recognize variant epitopes have diverse T cell receptor repertoires. These data demonstrate the potential for T cell evasion and highlight the need for ongoing surveillance for variants capable of escaping T cell as well as humoral immunity.
    This work is supported by the UK Medical Research Council (MRC); Chinese Academy of Medical Sciences (CAMS) Innovation Fund for Medical Sciences (CIFMS), China; National Institute for Health Research (NIHR) Oxford Biomedical Research Centre, and UK Research and Innovation (UKRI)/NIHR through the UK Coronavirus Immunology Consortium (UK-CIC). Sequencing of SARS-CoV-2 samples and collation of data was undertaken by the COG-UK CONSORTIUM. COG-UK is supported by funding from the Medical Research Council (MRC) part of UK Research & Innovation (UKRI), the National Institute of Health Research (NIHR), and Genome Research Limited, operating as the Wellcome Sanger Institute. T.I.d.S. is supported by a Wellcome Trust Intermediate Clinical Fellowship (110058/Z/15/Z). L.T. is supported by the Wellcome Trust (grant number 205228/Z/16/Z) and by the University of Liverpool Centre for Excellence in Infectious Disease Research (CEIDR). S.D. is funded by an NIHR Global Research Professorship (NIHR300791). L.T. and S.C.M. are also supported by the U.S. Food and Drug Administration Medical Countermeasures Initiative contract 75F40120C00085 and the National Institute for Health Research Health Protection Research Unit (HPRU) in Emerging and Zoonotic Infections (NIHR200907) at University of Liverpool in partnership with Public Health England (PHE), in collaboration with Liverpool School of Tropical Medicine and the University of Oxford. L.T. is based at the University of Liverpool. M.D.P. is funded by the NIHR Sheffield Biomedical Research Centre (BRC – IS-BRC-1215-20017). ISARIC4C is supported by the MRC (grant no MC_PC_19059). J.C.K. is a Wellcome Investigator (WT204969/Z/16/Z) and supported by NIHR Oxford Biomedical Research Centre and CIFMS. The views expressed are those of the authors and not necessarily those of the NIHR or MRC.

    Monitoring and data quality assessment of the ATLAS liquid argon calorimeter

    The liquid argon calorimeter is a key component of the ATLAS detector installed at the CERN Large Hadron Collider. The primary purpose of this calorimeter is the measurement of electron and photon kinematic properties. It also provides a crucial input for measuring jets and missing transverse momentum. An advanced data monitoring procedure was designed to quickly identify issues that would affect detector performance and ensure that only the best quality data are used for physics analysis. This article presents the validation procedure developed during the 2011 and 2012 LHC data-taking periods, in which more than 98% of the proton-proton luminosity recorded by ATLAS at a centre-of-mass energy of 7-8 TeV had calorimeter data quality suitable for physics analysis.

    Spatial and temporal patterns of root distribution in developing stands of four woody crop species grown with drip irrigation and fertilization.

    In forest trees, roots mediate such significant carbon fluxes as primary production and soil CO2 efflux. Despite the central role of roots in these critical processes, information on root distribution during stand establishment is limited, yet must be described to accurately predict how various forest types, which are growing with a range of resource limitations, might respond to environmental change. This study reports root length density (RLD) and biomass development in young stands of eastern cottonwood (Populus deltoides Bartr.) and American sycamore (Platanus occidentalis L.) that have narrow, high resource site requirements, and compares them with sweetgum (Liquidambar styraciflua L.) and loblolly pine (Pinus taeda L.), which have more robust site requirements. Fine roots (5 mm) were sampled to determine spatial distribution in response to fertilizer and irrigation treatments delivered through drip irrigation tubes. Root length density and biomass were predominately controlled by stand development, depth and proximity to drip tubes. After accounting for this spatial and temporal variation, there was a significant increase in RLD with fertilization and irrigation for all genotypes. The response to fertilization was greater than that of irrigation. Both fine and coarse roots responded positively to resources delivered through the drip tube, indicating a whole-root-system response to resource enrichment and not just a feeder root response. The plastic response to drip tube water and nutrient enrichment demonstrates the capability of root systems to respond to supply heterogeneity by increasing acquisition surface. Fine-root biomass, root density and specific root length were greater for broadleaved species than pine. Roots of all genotypes explored the rooting volume within 2 years, but this occurred faster and to higher root length densities in broadleaved species, indicating they had greater initial opportunity for resource acquisition than pine. Sweetgum's root characteristics and its response to resource availability were similar to the other broadleaved species, despite its functional resemblance to pine regarding robust site requirements. It was concluded that genotypes, irrigation and fertilization significantly influenced tree root system development, which varied spatially in response to resource-supply heterogeneity created by drip tubes. Knowledge of spatial and temporal patterns of root distribution in these stands will be used to interpret nutrient acquisition and soil respiration measurements.