Uncertainty Quantification Using Neural Networks for Molecular Property Prediction
Uncertainty quantification (UQ) is an important component of molecular
property prediction, particularly for drug discovery applications where model
predictions direct experimental design and where unanticipated imprecision
wastes valuable time and resources. The need for UQ is especially acute for
neural models, which are becoming increasingly standard yet are challenging to
interpret. While several approaches to UQ have been proposed in the literature,
there is no clear consensus on the comparative performance of these models. In
this paper, we study this question in the context of regression tasks. We
systematically evaluate several methods on five benchmark datasets using
multiple complementary performance metrics. Our experiments show that none of
the methods we tested is unequivocally superior to all others, and none
produces a particularly reliable ranking of errors across multiple datasets.
While we believe these results show that existing UQ methods are not sufficient
for all common use-cases and demonstrate the benefits of further research, we
conclude with a practical recommendation as to which existing techniques seem
to perform well relative to others.
Circulating antigen tests and urine reagent strips for diagnosis of active schistosomiasis in endemic areas
Background:
Point-of-care (POC) tests for diagnosing schistosomiasis include tests based on circulating antigen detection and urine reagent strip tests. If they had sufficient diagnostic accuracy they could replace conventional microscopy as they provide a quicker answer and are easier to use.
Objectives:
To summarise the diagnostic accuracy of: a) urine reagent strip tests in detecting active Schistosoma haematobium infection, with microscopy as the reference standard; and b) circulating antigen tests for detecting active Schistosoma infection in geographical regions endemic for Schistosoma mansoni or S. haematobium or both, with microscopy as the reference standard.
Search methods:
We searched the electronic databases MEDLINE, EMBASE, BIOSIS, MEDION, and Health Technology Assessment (HTA) without language restriction up to 30 June 2014.
Selection criteria:
We included studies that used microscopy as the reference standard: for S. haematobium, microscopy of urine prepared by filtration, centrifugation, or sedimentation methods; and for S. mansoni, microscopy of stool by Kato-Katz thick smear. We included studies on participants residing in endemic areas only.
Data collection and analysis:
Two review authors independently extracted data, assessed quality of the data using QUADAS-2, and performed meta-analysis where appropriate. Given the variability of test thresholds, we used the hierarchical summary receiver operating characteristic (HSROC) model for all eligible tests (except the circulating cathodic antigen (CCA) POC for S. mansoni, where the bivariate random-effects model was more appropriate). We investigated heterogeneity, and carried out indirect comparisons where data were sufficient. Results for sensitivity and specificity are presented as percentages with 95% confidence intervals (CI).
Main results:
We included 90 studies; 88 from field settings in Africa. The median S. haematobium infection prevalence was 41% (range 1% to 89%) and 36% for S. mansoni (range 8% to 95%). Study design and conduct were poorly reported against current standards.
Tests for S. haematobium
Urine reagent test strips versus microscopy
Compared to microscopy, the detection of microhaematuria on test strips had the highest sensitivity and specificity (sensitivity 75%, 95% CI 71% to 79%; specificity 87%, 95% CI 84% to 90%; 74 studies, 102,447 participants). For proteinuria, sensitivity was 61% and specificity was 82% (82,113 participants); and for leukocyturia, sensitivity was 58% and specificity 61% (1532 participants). However, we found no difference in overall test accuracy between the urine reagent strips for microhaematuria and proteinuria, either when we compared separate populations (P = 0.25) or when direct comparisons within the same individuals were performed (paired studies; P = 0.21).
When tests were evaluated against the higher quality reference standard (when multiple samples were analysed), sensitivity was marginally lower for microhaematuria (71% vs 75%) and for proteinuria (49% vs 61%). The specificity of these tests was comparable.
Antigen assay
Compared to microscopy, the CCA test showed considerable heterogeneity; meta-analytic sensitivity estimate was 39%, 95% CI 6% to 73%; specificity 78%, 95% CI 55% to 100% (four studies, 901 participants).
Tests for S. mansoni
Compared to microscopy, the CCA test meta-analytic estimates for detecting S. mansoni at a single threshold of trace positive were: sensitivity 89% (95% CI 86% to 92%); and specificity 55% (95% CI 46% to 65%; 15 studies, 6091 participants). Against a higher quality reference standard, the sensitivity results were comparable (89% vs 88%) but specificity was higher (66% vs 55%). For the CAA test, sensitivity ranged from 47% to 94%, and specificity from 8% to 100% (four studies, 1583 participants).
Authors' conclusions:
Among the evaluated tests for S. haematobium infection, microhaematuria correctly detected the largest proportions of infections and non-infections identified by microscopy.
The CCA POC test for S. mansoni detects a very large proportion of infections identified by microscopy, but it misclassifies a large proportion of microscopy negatives as positives in endemic areas with a moderate to high prevalence of infection, possibly because the test is more sensitive than microscopy.
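The sensitivity and specificity figures quoted in this review are proportions computed against the microscopy reference standard. A minimal sketch of that calculation for a single 2x2 table follows. The counts are invented for illustration, and the Wilson score interval used here is an assumption chosen for simplicity; the review's own intervals come from HSROC and bivariate meta-analytic models, which this sketch does not reproduce.

```python
import math

def wilson_interval(successes, total, z=1.96):
    """Approximate 95% Wilson score interval for a single proportion."""
    p = successes / total
    denom = 1 + z ** 2 / total
    centre = (p + z ** 2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z ** 2 / (4 * total ** 2)) / denom
    return centre - half, centre + half

def diagnostic_accuracy(tp, fp, fn, tn):
    """Sensitivity and specificity of an index test against a reference test.

    tp: index positive, reference positive
    fp: index positive, reference negative
    fn: index negative, reference positive
    tn: index negative, reference negative
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity

# Hypothetical counts for one study (not data from the review).
tp, fp, fn, tn = 80, 10, 20, 90
sens, spec = diagnostic_accuracy(tp, fp, fn, tn)
sens_ci = wilson_interval(tp, tp + fn)
spec_ci = wilson_interval(tn, tn + fp)
```

Because specificity is measured against microscopy, any true infection that microscopy misses but the index test detects is counted in `fp`, which is one way an index test more sensitive than its reference can appear to have low specificity.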
Reconstructing dynamical networks via feature ranking
Empirical data on real complex systems are becoming increasingly available.
Parallel to this is the need for new methods of reconstructing (inferring) the
topology of networks from time-resolved observations of their node-dynamics.
The methods based on physical insights often rely on strong assumptions about
the properties and dynamics of the scrutinized network. Here, we use the
insights from machine learning to design a new method of network reconstruction
that essentially makes no such assumptions. Specifically, we interpret the
available trajectories (data) as features, and use two independent feature
ranking approaches -- Random forest and RReliefF -- to rank the importance of
each node for predicting the value of each other node, which yields the
reconstructed adjacency matrix. We show that our method is fairly robust to
coupling strength, system size, trajectory length and noise. We also find that
the reconstruction quality strongly depends on the dynamical regime.
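The idea of treating trajectories as features can be sketched with a random forest's built-in importance scores. The code below is only an illustration under several assumptions that the abstract does not state: each node's next value is predicted from the other nodes' current values, scikit-learn's `RandomForestRegressor.feature_importances_` stands in for the ranking step, and the function names and toy system are hypothetical. It does not cover the RReliefF variant.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def reconstruct_adjacency(trajectories, n_trees=50, seed=0):
    """Score every ordered node pair from time series of shape (T, N).

    scores[i, j] is the importance of node i for predicting node j,
    so large entries are candidate links i -> j.
    """
    n_steps, n_nodes = trajectories.shape
    scores = np.zeros((n_nodes, n_nodes))
    past, future = trajectories[:-1], trajectories[1:]
    for target in range(n_nodes):
        sources = [i for i in range(n_nodes) if i != target]
        model = RandomForestRegressor(n_estimators=n_trees, random_state=seed)
        # Assumption: predict the target's next value from the others' current values.
        model.fit(past[:, sources], future[:, target])
        scores[sources, target] = model.feature_importances_
    return scores

def make_toy_trajectories(n_steps=300, seed=0):
    """Three noisy nodes in which node 0 drives node 1 with a one-step lag."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=(n_steps, 3))
    x[1:, 1] = 0.9 * x[:-1, 0] + 0.1 * x[1:, 1]
    return x

scores = reconstruct_adjacency(make_toy_trajectories())
```

In this toy system the entry `scores[0, 1]` should dominate its column, since node 0 is the only real driver of node 1. Turning the score matrix into a binary adjacency matrix requires a threshold, which is left open here.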
Can Who-Edits-What Predict Edit Survival?
As the number of contributors to online peer-production systems grows, it
becomes increasingly important to predict whether the edits that users make
will eventually be beneficial to the project. Existing solutions either rely on
a user reputation system or consist of a highly specialized predictor that is
tailored to a specific peer-production system. In this work, we explore a
different point in the solution space that goes beyond user reputation but does
not involve any content-based feature of the edits. We view each edit as a game
between the editor and the component of the project. We posit that the
probability that an edit is accepted is a function of the editor's skill, of
the difficulty of editing the component and of a user-component interaction
term. Our model is broadly applicable, as it only requires observing data about
who makes an edit, what the edit affects and whether the edit survives or not.
We apply our model on Wikipedia and the Linux kernel, two examples of
large-scale peer-production systems, and we seek to understand whether it can
effectively predict edit survival: in both cases, we provide a positive answer.
Our approach significantly outperforms those based solely on user reputation
and bridges the gap with specialized predictors that use content-based
features. It is simple to implement, computationally inexpensive, and in
addition it enables us to discover interesting structure in the data.
Comment: Accepted at KDD 201
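The model described in this abstract combines three ingredients: editor skill, component difficulty, and a user-component interaction term. A minimal sketch of one way to combine them is shown below. The logistic link, the dot product of latent vectors as the interaction term, the bias parameter, and all names and numbers are assumptions for illustration; the abstract itself only says that the acceptance probability is a function of these quantities.

```python
import math

def survival_probability(skill, difficulty, user_vec, item_vec, bias=0.0):
    """Probability that an edit survives, under an assumed logistic form.

    skill:      scalar skill of the editor
    difficulty: scalar difficulty of editing the component
    user_vec, item_vec: latent vectors whose dot product is the
                        user-component interaction term (an assumption)
    """
    interaction = sum(u * v for u, v in zip(user_vec, item_vec))
    logit = skill - difficulty + interaction + bias
    return 1.0 / (1.0 + math.exp(-logit))

# Made-up parameter values, chosen only to show the direction of each effect.
p_neutral = survival_probability(0.0, 0.0, [0.0], [0.0])
p_skilled = survival_probability(1.5, 0.0, [0.0], [0.0])
p_hard = survival_probability(0.0, 1.5, [0.0], [0.0])
p_affinity = survival_probability(0.0, 1.5, [1.0, 0.5], [2.0, 1.0])
```

The inputs match what the abstract says the model needs: who made the edit, which component it touched, and whether it survived. In practice the parameters would be fitted to such observations rather than set by hand.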