FraGNNet: A Deep Probabilistic Model for Mass Spectrum Prediction
The process of identifying a compound from its mass spectrum is a critical
step in the analysis of complex mixtures. Typical solutions for the mass
spectrum to compound (MS2C) problem involve matching the unknown spectrum
against a library of known spectrum-molecule pairs, an approach that is limited
by incomplete library coverage. Compound to mass spectrum (C2MS) models can
improve retrieval rates by augmenting real libraries with predicted spectra.
Unfortunately, many existing C2MS models suffer from problems with prediction
resolution, scalability, or interpretability. We develop a new probabilistic
method for C2MS prediction, FraGNNet, that can efficiently and accurately
predict high-resolution spectra. FraGNNet uses a structured latent space to
provide insight into the underlying processes that define the spectrum. Our
model achieves state-of-the-art performance in terms of prediction error, and
surpasses existing C2MS models as a tool for retrieval-based MS2C. Comment: 21 pages, 4 figures, 9 tables
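The library-matching approach to MS2C mentioned in this abstract can be illustrated with a minimal sketch. It assumes spectra are given as lists of (m/z, intensity) peaks, bins them at a fixed width, and ranks library entries by cosine similarity. The binning scheme, the similarity measure, the function names, and the toy compounds are all assumptions made for illustration; none of this is FraGNNet itself or the scoring used in the paper.

```python
import math


def bin_spectrum(peaks, bin_width=1.0):
    """Sum peak intensities into fixed-width m/z bins (sparse dict: bin -> intensity)."""
    binned = {}
    for mz, intensity in peaks:
        key = int(mz // bin_width)
        binned[key] = binned.get(key, 0.0) + intensity
    return binned


def cosine_similarity(a, b):
    """Cosine similarity between two sparse binned spectra."""
    dot = sum(value * b.get(key, 0.0) for key, value in a.items())
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_candidates(query_peaks, library, bin_width=1.0):
    """Return (name, score) pairs sorted from best to worst match."""
    query = bin_spectrum(query_peaks, bin_width)
    scored = [
        (name, cosine_similarity(query, bin_spectrum(peaks, bin_width)))
        for name, peaks in library.items()
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical toy library; in the augmented setting, some entries would be
# predicted by a C2MS model rather than measured.
library = {
    "compound_A": [(50.2, 10.0), (77.1, 100.0), (105.0, 40.0)],
    "compound_B": [(43.0, 100.0), (58.1, 60.0)],
}
query = [(50.3, 12.0), (77.0, 95.0), (105.4, 35.0)]
ranking = rank_candidates(query, library)
```

Adding predicted spectra to `library` is what lets a retrieval system of this kind return candidates that have no measured reference spectrum.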
Exploring the association between Alzheimer’s disease, oral health, microbial endocrinology and nutrition
Longitudinal monitoring of patients suggests a causal link between chronic periodontitis and the development of Alzheimer’s disease (AD). However, the explanation of how periodontitis can lead to dementia remains unclear. A working hypothesis identifies extrinsic inflammation as a secondary cause of AD. This hypothesis suggests that compromised oral hygiene leads to a dysbiotic oral microbiome whereby Porphyromonas gingivalis, a keystone periodontal pathogen, with its companion species, orchestrates immune subversion in the host. Brushing and chewing on teeth supported by already injured soft tissues lead to bacteraemias. As a result, a persistent systemic inflammatory response develops to periodontal pathogens. The pathogens, and the host’s inflammatory response, subsequently lead to the initiation and progression of multiple metabolic and inflammatory co-morbidities, including AD. Insufficient levels of essential micronutrients can lead to microbial dysbiosis through the growth of periodontal pathogens, as demonstrated for P. gingivalis under low hemin bioavailability. An individual’s diet also defines the consortium of microbial communities that take up residency in the oral and gastrointestinal (GI) tract microbiomes. Their imbalance can lead to behavioural changes. For example, probiotics enriched in the Lactobacillus genus of bacteria, when ingested, exert some anti-inflammatory influence through common host/bacterial neurochemicals, both locally and through sensory signalling back to the brain. Early life dietary behaviours may cause an imbalance in the host/microbial endocrinology through a dietary intake incompatible with a healthy GI tract microbiome later in life. This imbalance in host/microbial endocrinology may have a lasting impact on mental health. This observation opens up an opportunity to explore the mechanisms that may underlie the previously detected relationship between diet, oral/GI microbial communities, and anxiety, cognition and sleep patterns.
This review suggests healthy diet-based interventions that, together with improved lifestyle/behavioural changes, may reduce and/or delay the incidence of AD.
BASys: a web server for automated bacterial genome annotation
BASys (Bacterial Annotation System) is a web server that supports automated, in-depth annotation of bacterial genomic (chromosomal and plasmid) sequences. It accepts raw DNA sequence data and an optional list of gene identification information and provides extensive textual annotation and hyperlinked image output. BASys uses >30 programs to determine ∼60 annotation subfields for each gene, including gene/protein name, GO function, COG function, possible paralogues and orthologues, molecular weight, isoelectric point, operon structure, subcellular localization, signal peptides, transmembrane regions, secondary structure, 3D structure, reactions and pathways. The depth and detail of a BASys annotation matches or exceeds that found in a standard SwissProt entry. BASys also generates colorful, clickable and fully zoomable maps of each query chromosome to permit rapid navigation and detailed visual analysis of all resulting gene annotations. The textual annotations and images that are provided by BASys can be generated in ∼24 h for an average bacterial chromosome (5 Mb). BASys annotations may be viewed and downloaded anonymously or through a password protected access system. The BASys server and databases can also be downloaded and run locally. BASys is accessible at
Multiplicity distribution and spectra of negatively charged hadrons in Au+Au collisions at sqrt(s_nn) = 130 GeV
The minimum bias multiplicity distribution and the transverse momentum and
pseudorapidity distributions for central collisions have been measured for
negative hadrons (h-) in Au+Au interactions at sqrt(s_nn) = 130 GeV. The
multiplicity density at midrapidity for the 5% most central interactions is
dNh-/deta|_{eta = 0} = 280 +- 1(stat)+- 20(syst), an increase per participant
of 38% relative to ppbar collisions at the same energy. The mean transverse
momentum is 0.508 +- 0.012 GeV/c and is larger than in central Pb+Pb collisions
at lower energies. The scaling of the h- yield per participant is a strong
function of pt. The pseudorapidity distribution is almost constant within
|eta|<1. Comment: 6 pages, 3 figures
Explaining Naive Bayes Classifications
Technical report TR03-09. Naive Bayes classifiers, a popular tool for predicting the labels of query instances, are typically learned from a training set. However, since many training sets contain noisy data, a classifier user may be reluctant to blindly trust a predicted label. We present a novel graphical explanation facility for Naive Bayes classifiers that serves three purposes. First, it transparently explains the reasoning used by the classifier to foster user confidence in the prediction. Second, it enhances the user's understanding of the complex relationships between the features and the labels. Third, it can help the user to identify suspicious training data. We demonstrate these ideas in the context of our implemented web-based system, which uses examples from molecular biology.
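One common way to expose the reasoning of a Naive Bayes classifier is to split its log-odds into a prior term plus one additive term per feature. The sketch below shows that decomposition for categorical features with Laplace smoothing. It is an illustrative assumption, not the report's explanation facility: the function names, the smoothing choice, and the toy feature values and labels are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def train(rows, labels, alpha=1.0):
    """Count label and per-feature value frequencies for a categorical Naive Bayes model."""
    label_counts = Counter(labels)
    feature_counts = defaultdict(Counter)  # (label, feature index) -> value counts
    values = defaultdict(set)              # feature index -> set of observed values
    for row, label in zip(rows, labels):
        for index, value in enumerate(row):
            feature_counts[(label, index)][value] += 1
            values[index].add(value)
    return {
        "label_counts": label_counts,
        "feature_counts": feature_counts,
        "values": values,
        "alpha": alpha,
    }


def log_likelihood(model, label, index, value):
    """Laplace-smoothed log P(feature value | label)."""
    alpha = model["alpha"]
    count = model["feature_counts"][(label, index)][value]
    total = model["label_counts"][label]
    n_values = len(model["values"][index])
    return math.log((count + alpha) / (total + alpha * n_values))


def explain(model, row, positive, negative):
    """Return the prior log-odds and each feature's additive log-odds contribution."""
    n = sum(model["label_counts"].values())
    prior = (math.log(model["label_counts"][positive] / n)
             - math.log(model["label_counts"][negative] / n))
    contributions = [
        log_likelihood(model, positive, index, value)
        - log_likelihood(model, negative, index, value)
        for index, value in enumerate(row)
    ]
    return prior, contributions


# Hypothetical toy data: (hydrophobicity, has_signal_peptide) -> localization.
rows = [("high", "yes"), ("high", "yes"), ("high", "no"),
        ("low", "no"), ("low", "no"), ("low", "yes")]
labels = ["membrane"] * 3 + ["cytosol"] * 3
model = train(rows, labels)
prior, contributions = explain(model, ("high", "yes"), "membrane", "cytosol")
total_log_odds = prior + sum(contributions)
```

A positive entry in `contributions` is evidence for the first label and a negative entry is evidence for the second, so the terms can be shown to a user as a bar per feature. An unexpectedly large term is also a natural place to start looking for suspicious training data.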
Prediction of overall survival for patients with metastatic castration-resistant prostate cancer: development of a prognostic model through a crowdsourced challenge with open clinical trial data
Background: Improvements to prognostic models in metastatic castration-resistant prostate cancer have the potential to augment clinical trial design and guide treatment strategies. In partnership with Project Data Sphere, a not-for-profit initiative allowing data from cancer clinical trials to be shared broadly with researchers, we designed an open-data, crowdsourced, DREAM (Dialogue for Reverse Engineering Assessments and Methods) challenge to not only identify a better prognostic model for prediction of survival in patients with metastatic castration-resistant prostate cancer but also engage a community of international data scientists to study this disease. Methods: Data from the comparator arms of four phase 3 clinical trials in first-line metastatic castration-resistant prostate cancer were obtained from Project Data Sphere, comprising 476 patients treated with docetaxel and prednisone from the ASCENT2 trial, 526 patients treated with docetaxel, prednisone, and placebo in the MAINSAIL trial, 598 patients treated with docetaxel, prednisone or prednisolone, and placebo in the VENICE trial, and 470 patients treated with docetaxel and placebo in the ENTHUSE 33 trial. Datasets consisting of more than 150 clinical variables were curated centrally, including demographics, laboratory values, medical history, lesion sites, and previous treatments. Data from ASCENT2, MAINSAIL, and VENICE were released publicly to be used as training data to predict the outcome of interest, namely overall survival. Clinical data were also released for ENTHUSE 33, but data for outcome variables (overall survival and event status) were hidden from the challenge participants so that ENTHUSE 33 could be used for independent validation. Methods were evaluated using the integrated time-dependent area under the curve (iAUC). The reference model, based on eight clinical variables and a penalised Cox proportional-hazards model, was used to compare method performance.
Further validation was done using data from a fifth trial, ENTHUSE M1, in which 266 patients with metastatic castration-resistant prostate cancer were treated with placebo alone. Findings: 50 independent methods were developed to predict overall survival and were evaluated through the DREAM challenge. The top performer was based on an ensemble of penalised Cox regression models (ePCR), which uniquely identified predictive interaction effects with immune biomarkers and markers of hepatic and renal function. Overall, ePCR outperformed all other methods (iAUC 0.791; Bayes factor >5) and surpassed the reference model (iAUC 0.743; Bayes factor >20). Both the ePCR model and reference models stratified patients in the ENTHUSE 33 trial into high-risk and low-risk groups with significantly different overall survival (ePCR: hazard ratio 3.32, 95% CI 2.39-4.62, p Interpretation: Novel prognostic factors were delineated, and the assessment of 50 methods developed by independent international teams establishes a benchmark for development of methods in the future. The results of this effort show that data-sharing, when combined with a crowdsourced challenge, is a robust and powerful framework to develop new prognostic models in advanced prostate cancer. Peer reviewed
HMDB: the Human Metabolome Database
The Human Metabolome Database (HMDB) is currently the most complete and comprehensive curated collection of human metabolite and human metabolism data in the world. It contains records for more than 2180 endogenous metabolites with information gathered from thousands of books, journal articles and electronic databases. In addition to its comprehensive literature-derived data, the HMDB also contains an extensive collection of experimental metabolite concentration data compiled from hundreds of mass spectrometry (MS) and nuclear magnetic resonance (NMR) metabolomic analyses performed on urine, blood and cerebrospinal fluid samples. This is further supplemented with thousands of NMR and MS spectra collected on purified, reference metabolites. Each metabolite entry in the HMDB contains an average of 90 separate data fields including a comprehensive compound description, names and synonyms, structural information, physico-chemical data, reference NMR and MS spectra, biofluid concentrations, disease associations, pathway information, enzyme data, gene sequence data, SNP and mutation data as well as extensive links to images, references and other public databases. Extensive searching, relational querying and data browsing tools are also provided. The HMDB is designed to address the broad needs of biochemists, clinical chemists, physicians, medical geneticists, nutritionists and members of the metabolomics community. The HMDB is available at
iPTF14yb: THE FIRST DISCOVERY OF A GAMMA-RAY BURST AFTERGLOW INDEPENDENT OF A HIGH-ENERGY TRIGGER
We report here the discovery by the Intermediate Palomar Transient Factory (iPTF) of iPTF14yb, a luminous (M_r ≈ -27.8 mag), cosmological (redshift 1.9733), rapidly fading optical transient. We demonstrate, based on probabilistic arguments and a comparison with the broader population, that iPTF14yb is the optical afterglow of the long-duration gamma-ray burst GRB 140226A. This marks the first unambiguous discovery of a GRB afterglow prior to (and thus entirely independent of) an associated high-energy trigger. We estimate the rate of iPTF14yb-like sources (i.e., cosmologically distant relativistic explosions) based on iPTF observations, inferring an all-sky value of R_rel = 610 yr⁻¹ (68% confidence interval of 110-2000 yr⁻¹). Our derived rate is consistent (within the large uncertainty) with the all-sky rate of on-axis GRBs derived by the Swift satellite. Finally, we briefly discuss the implications of the nondetection to date of bona fide orphan afterglows (i.e., those lacking detectable high-energy emission) on GRB beaming and the degree of baryon loading in these relativistic jets.
