Evaluation of Intelligent Medical Systems

Author: Tilbury Julian Bernard
Publication venue: 'University of Plymouth'
Publication date: 01/01/2002

This thesis presents novel, robust, analytic and algorithmic methods for calculating Bayesian posterior intervals of receiver operating characteristic (ROC) curves and confusion matrices used for the evaluation of intelligent medical systems tested with small amounts of data. Intelligent medical systems are potentially important in encapsulating rare and valuable medical expertise and making it more widely available. The evaluation of intelligent medical systems must make sure that such systems are safe and cost-effective. To ensure systems are safe and perform at expert level, they must be tested against human experts. Human experts are rare and busy, which often severely restricts the number of test cases that may be used for comparison. The performance of an expert, human or machine, can be represented objectively by ROC curves or confusion matrices. ROC curves and confusion matrices are complex representations and it is sometimes convenient to summarise them as a single value. In the case of ROC curves, this is given as the Area Under the Curve (AUC), and for confusion matrices by kappa or weighted kappa statistics. While there is extensive literature on the statistics of ROC curves and confusion matrices, it is not applicable to the measurement of intelligent systems when tested with small data samples, particularly when the AUC or kappa statistic is high. A fundamental Bayesian study has been carried out, and new methods devised, to provide better statistical measures for ROC curves and confusion matrices at low sample sizes. They enable exact Bayesian posterior intervals to be produced for: (1) the individual points on a ROC curve; (2) comparison between matching points on two uncorrelated curves; (3) the AUC of a ROC curve, using both parametric and nonparametric assumptions; (4) the parameters of a parametric ROC curve; and (5) the weight of a weighted confusion matrix.
These new methods have been implemented in software to provide a powerful and accurate tool for developers and evaluators of intelligent medical systems in particular, and to a much wider audience using ROC curves and confusion matrices in general. This should enhance the ability to prove intelligent medical systems safe and effective and should lead to their widespread deployment. The mathematical and computational methods developed in this thesis should also provide the basis for future research into determination of posterior intervals for other statistics at small sample sizes.
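
As a rough illustration of item (1) above, a posterior interval for a single ROC operating point can be sketched with the textbook Beta-Binomial model. This is not the thesis's own method: the uniform Beta(1, 1) prior, the equal-tailed interval, the function names, and the counts below are all assumptions made for the example.

```python
from math import comb

def beta_cdf(x, a, b):
    """CDF of Beta(a, b) for positive integers a, b, via the binomial identity
    I_x(a, b) = P(Binomial(a + b - 1, x) >= a)."""
    n = a + b - 1
    return sum(comb(n, j) * x**j * (1 - x) ** (n - j) for j in range(a, n + 1))

def beta_quantile(p, a, b, tol=1e-10):
    """Invert the Beta CDF by bisection on [0, 1]."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if beta_cdf(mid, a, b) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def posterior_interval(successes, failures, level=0.95):
    """Equal-tailed posterior interval for a proportion.
    Assumption: uniform Beta(1, 1) prior, so the posterior is
    Beta(successes + 1, failures + 1)."""
    a, b = successes + 1, failures + 1
    tail = (1 - level) / 2
    return beta_quantile(tail, a, b), beta_quantile(1 - tail, a, b)

# Hypothetical small test set: 10 diseased and 10 healthy cases.
sens_interval = posterior_interval(successes=9, failures=1)   # 9 of 10 detected
spec_interval = posterior_interval(successes=10, failures=0)  # 10 of 10 cleared
print("sensitivity:", sens_interval)
print("specificity:", spec_interval)
```

With only ten cases per class the intervals are wide even when observed performance is perfect, which is the small-sample situation the abstract describes.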

Plymouth Electronic Archive and Research Library

Confidence Bands for Roc Curves

Author: Macskassy Sofus
Provost Foster
Publication venue: Stern School of Business, New York University
Publication date: 01/01/2003

In this paper we study techniques for generating and evaluating confidence bands on ROC curves. ROC curve evaluation is rapidly becoming a commonly used evaluation metric in machine learning, although evaluating ROC curves has thus far been limited to studying the area under the curve (AUC) or generation of one-dimensional confidence intervals by freezing one variable: the false-positive rate, or threshold on the classification scoring function. Researchers in the medical field have long been using ROC curves and have many well-studied methods for analyzing such curves, including generating confidence intervals as well as simultaneous confidence bands. In this paper we introduce these techniques to the machine learning community and show their empirical fitness on the Covertype data set, a standard machine learning benchmark from the UCI repository. We show how some of these methods work remarkably well, others are too loose, and that existing machine learning methods for generation of 1-dimensional confidence intervals do not translate well to generation of simultaneous bands: their bands are too tight.
Information Systems Working Papers Serie

CiteSeerX

New York University Faculty Digital Archive

Confidence Bands for ROC Curves: Methods and an Empirical Study

Author: Macskassy Sofus
Provost Foster
Publication venue: Proceedings of the First Workshop on ROC Analysis in AI. August 2004.
Publication date: 01/08/2004

In this paper we study techniques for generating and evaluating confidence bands on ROC curves. ROC curve evaluation is rapidly becoming a commonly used evaluation metric in machine learning, although evaluating ROC curves has thus far been limited to studying the area under the curve (AUC) or generation of one-dimensional confidence intervals by freezing one variable: the false-positive rate, or threshold on the classification scoring function. Researchers in the medical field have long been using ROC curves and have many well-studied methods for analyzing such curves, including generating confidence intervals as well as simultaneous confidence bands. In this paper we introduce these techniques to the machine learning community and show their empirical fitness on the Covertype data set, a standard machine learning benchmark from the UCI repository. We show how some of these methods work remarkably well, others are too loose, and that existing machine learning methods for generation of 1-dimensional confidence intervals do not translate well to generation of simultaneous bands: their bands are too tight.
NYU, Stern School of Business, IOMS Department, Center for Digital Economy Researc
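
The one-dimensional interval obtained by freezing the false-positive rate can be sketched as a percentile bootstrap of the true-positive rate at that rate. This is a generic illustration, not necessarily one of the methods the paper evaluates; the function names, scores, resampling count, and seed are assumptions for the example.

```python
import random

def tpr_at_fpr(pos_scores, neg_scores, fpr):
    """TPR when the threshold admits at most `fpr` of the negatives above it."""
    ranked_neg = sorted(neg_scores, reverse=True)
    k = int(fpr * len(ranked_neg))  # number of negatives allowed above the threshold
    threshold = ranked_neg[k] if k < len(ranked_neg) else float("-inf")
    return sum(s > threshold for s in pos_scores) / len(pos_scores)

def bootstrap_tpr_interval(pos_scores, neg_scores, fpr,
                           level=0.95, n_boot=2000, seed=0):
    """Percentile bootstrap interval for the TPR at one fixed FPR."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    samples = []
    for _ in range(n_boot):
        pos = rng.choices(pos_scores, k=len(pos_scores))  # resample with replacement
        neg = rng.choices(neg_scores, k=len(neg_scores))
        samples.append(tpr_at_fpr(pos, neg, fpr))
    samples.sort()
    tail = (1 - level) / 2
    lo = samples[int(tail * n_boot)]
    hi = samples[min(n_boot - 1, int((1 - tail) * n_boot))]
    return lo, hi

# Hypothetical classifier scores for positive and negative examples.
pos = [0.9, 0.8, 0.75, 0.7, 0.6, 0.55, 0.4, 0.35]
neg = [0.65, 0.5, 0.45, 0.3, 0.25, 0.2, 0.15, 0.1]

point_estimate = tpr_at_fpr(pos, neg, fpr=0.25)
interval = bootstrap_tpr_interval(pos, neg, fpr=0.25)
print(point_estimate, interval)
```

An interval like this covers a single point on the curve. Stacking such intervals across many false-positive rates does not yield a simultaneous band, which matches the abstract's finding that bands built this way are too tight.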

New York University Faculty Digital Archive

Cross-Modal Data Programming Enables Rapid Medical Machine Learning

Author: Dunnmon Jared
Goldman Roger
Khandwala Nishith
Lee-Messer Christopher
Lungren Matthew
Markert Matthew
Ratner Alexander
Rubin Daniel
Ré Christopher
Saab Khaled
Sagreiya Hersh
Publication date: 26/03/2019

Labeling training datasets has become a key barrier to building medical machine learning models. One strategy is to generate training labels programmatically, for example by applying natural language processing pipelines to text reports associated with imaging studies. We propose cross-modal data programming, which generalizes this intuitive strategy in a theoretically-grounded way that enables simpler, clinician-driven input, reduces required labeling time, and improves with additional unlabeled data. In this approach, clinicians generate training labels for models defined over a target modality (e.g. images or time series) by writing rules over an auxiliary modality (e.g. text reports). The resulting technical challenge consists of estimating the accuracies and correlations of these rules; we extend a recent unsupervised generative modeling technique to handle this cross-modal setting in a provably consistent way. Across four applications in radiography, computed tomography, and electroencephalography, and using only several hours of clinician time, our approach matches or exceeds the efficacy of physician-months of hand-labeling with statistical significance, demonstrating a fundamentally faster and more flexible way of building machine learning models in medicine.
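
The idea of writing rules over text reports to label the paired images can be sketched as follows. The rules, report texts, and label names are hypothetical, and the unweighted majority vote is a deliberate simplification: the paper instead estimates rule accuracies and correlations with a generative model.

```python
ABSTAIN, NORMAL, ABNORMAL = -1, 0, 1

# Hypothetical clinician-written rules over the report text (auxiliary modality).
def lf_mentions_fracture(report):
    return ABNORMAL if "fracture" in report.lower() else ABSTAIN

def lf_no_acute(report):
    return NORMAL if "no acute" in report.lower() else ABSTAIN

def lf_unremarkable(report):
    return NORMAL if "unremarkable" in report.lower() else ABSTAIN

RULES = [lf_mentions_fracture, lf_no_acute, lf_unremarkable]

def majority_label(report, rules=RULES):
    """Combine rule votes by unweighted majority; abstain on ties or no votes."""
    votes = [v for v in (rule(report) for rule in rules) if v != ABSTAIN]
    abnormal = sum(v == ABNORMAL for v in votes)
    normal = len(votes) - abnormal
    if abnormal == normal:
        return ABSTAIN
    return ABNORMAL if abnormal > normal else NORMAL

reports = [
    "Displaced fracture of the distal radius.",
    "No acute cardiopulmonary abnormality. Lungs unremarkable.",
    "No acute fracture.",
    "Follow-up recommended.",
]
# Each label would be attached to the image paired with the report (target modality).
labels = [majority_label(r) for r in reports]
print(labels)
```

The third report shows why a plain vote is not enough: a naive keyword rule fires on a negated finding and conflicts with another rule, so how much to trust each rule has to be estimated rather than assumed.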

arXiv.org e-Print Archive

eScholarship - University of California

Understanding metric-related pitfalls in image analysis validation

Author: Acion Laura
Antonelli Michela
Arbel Tal
Bakas Spyridon
Baumgartner Michael
Benis Arriel
Blaschko Matthew
Büttner Florian
Calster Ben Van
Cardoso M. Jorge
Chen Jianxu
Cheplygina Veronika
Christodoulou Evangelia
Cimini Beth A.
Collins Gary S.
Eisenmann Matthias
Farahani Keyvan
Ferrer Luciana
Galdran Adrian
Ginneken Bram van
Glocker Ben
Godau Patrick
Haase Robert
Hashimoto Daniel A.
Heckmann-Nötzel Doreen
Hoffman Michael M.
Huisman Merel
Isensee Fabian
Jannin Pierre
Jäger Paul F.
Kahn Charles E.
Kainmueller Dagmar
Kainz Bernhard
Karargyris Alexandros
Karthikesalingam Alan
Kavur A. Emre
Kenngott Hannes
Kleesiek Jens
Kofler Florian
Kooi Thijs
Kopp-Schneider Annette
Kozubek Michal
Kreshuk Anna
Kurc Tahsin
Landman Bennett A.
Litjens Geert
Madani Amin
Maier-Hein Klaus
Maier-Hein Lena
Martel Anne L.
Mattson Peter
Meijering Erik
Menze Bjoern
Moons Karel G. M.
Müller Henning
Nichyporuk Brennan
Nickel Felix
Petersen Jens
Rafelski Susanne M.
Rajpoot Nasir
Reinke Annika
Reyes Mauricio
Riegler Michael A.
Rieke Nicola
Rädsch Tim
Saez-Rodriguez Julio
Shetty Shravya
Smeden Maarten van
Sudre Carole H.
Summers Ronald M.
Sánchez Clara I.
Taha Abdel A.
Tiulpin Aleksei
Tizabi Minu D.
Tsaftaris Sotirios A.
Varoquaux Gaël
Wiesenfarth Manuel
Yaniv Ziv R.
Publication date: 01/01/2023

Validation metrics are key for the reliable tracking of scientific progress and for bridging the current chasm between artificial intelligence (AI) research and its translation into practice. However, increasing evidence shows that particularly in image analysis, metrics are often chosen inadequately in relation to the underlying research problem. This could be attributed to a lack of accessibility of metric-related knowledge: While taking into account the individual strengths, weaknesses, and limitations of validation metrics is a critical prerequisite to making educated choices, the relevant knowledge is currently scattered and poorly accessible to individual researchers. Based on a multi-stage Delphi process conducted by a multidisciplinary expert consortium as well as extensive community feedback, the present work provides the first reliable and comprehensive common point of access to information on pitfalls related to validation metrics in image analysis. Focusing on biomedical image analysis but with the potential of transfer to other fields, the addressed pitfalls generalize across application domains and are categorized according to a newly created, domain-agnostic taxonomy. To facilitate comprehension, illustrations and specific examples accompany each pitfall. As a structured body of information accessible to researchers of all levels of expertise, this work enhances global comprehension of a key topic in image analysis validation.
Comment: Shared first authors: Annika Reinke, Minu D. Tizabi; shared senior authors: Paul F. Jäger, Lena Maier-Hei

arXiv.org e-Print Archive

IUPUIScholarWorks

INRIA a CCSD electronic archive server

Edinburgh Research Explorer

Warwick Research Archives Portal Repository

HAL-CEA

Bern Open Repository and Information System (BORIS)

Utrecht University Repository

HAL-Rennes 1

Early symptoms and sensations as predictors of lung cancer: a machine learning multivariate model.

Author: Bernhardson B-M.
Eriksson L. E.
Forshed J.
Henriksson R.
Kölbeck K.
Lehtiö J.
Levitsky A.
Olin M.
Pernemalm M.
Tishelman C.
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2019

The aim of this study was to identify a combination of early predictive symptoms/sensations attributable to primary lung cancer (LC). An interactive e-questionnaire comprising pre-diagnostic descriptors of first symptoms/sensations was administered to patients referred for suspected LC. Respondents were included in the present analysis only if they later received a primary LC diagnosis or had no cancer; and inclusion of each descriptor required ≥4 observations. Fully-completed data from 506/670 individuals later diagnosed with primary LC (n = 311) or no cancer (n = 195) were modelled with orthogonal projections to latent structures (OPLS). After analysing 145/285 descriptors, meeting inclusion criteria, through randomised seven-fold cross-validation (six-fold training set: n = 433; test set: n = 73), 63 provided best LC prediction. The most-significant LC-positive descriptors included a cough that varied over the day, back pain/aches/discomfort, early satiety, appetite loss, and having less strength. Upon combining the descriptors with the background variables current smoking, a cold/flu or pneumonia within the past two years, female sex, older age, a history of COPD (positive LC-association); antibiotics within the past two years, and a history of pneumonia (negative LC-association); the resulting 70-variable model had accurate cross-validated test set performance: area under the ROC curve = 0.767 (descriptors only: 0.736/background predictors only: 0.652), sensitivity = 84.8% (73.9/76.1%, respectively), specificity = 55.6% (66.7/51.9%, respectively). In conclusion, accurate prediction of LC was found through 63 early symptoms/sensations and seven background factors. Further research and precision in this model may lead to a tool for referral and LC diagnostic decision-making.
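
The three reported measures (area under the ROC curve, sensitivity, specificity) can be computed from model scores roughly as sketched here. The scores and the 0.5 threshold are made up for illustration and are not the study's data; the AUC uses the standard pairwise (Mann-Whitney) formulation.

```python
def auc(pos_scores, neg_scores):
    """AUC as the fraction of positive/negative pairs ranked correctly,
    counting ties as half (Mann-Whitney formulation)."""
    wins = sum((p > n) + 0.5 * (p == n) for p in pos_scores for n in neg_scores)
    return wins / (len(pos_scores) * len(neg_scores))

def sensitivity_specificity(pos_scores, neg_scores, threshold):
    """Sensitivity and specificity when scores >= threshold are called positive."""
    sens = sum(s >= threshold for s in pos_scores) / len(pos_scores)
    spec = sum(s < threshold for s in neg_scores) / len(neg_scores)
    return sens, spec

# Hypothetical test-set scores: patients with cancer vs. patients without.
cancer = [0.9, 0.7, 0.6, 0.4]
no_cancer = [0.8, 0.5, 0.3, 0.2]

area = auc(cancer, no_cancer)
sens, spec = sensitivity_specificity(cancer, no_cancer, threshold=0.5)
print(area, sens, spec)
```

The AUC does not depend on any threshold, while sensitivity and specificity trade off against each other as the threshold moves, which is why the abstract reports all three.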

City Research Online

Publikationer från Umeå universitet

Digitala Vetenskapliga Arkivet - Academic Archive On-line

Investigating the detection of adverse drug events in a UK general practice electronic health-care database

Author: Aickelin Uwe
Feyereisl Jan
Garibaldi Jonathan M.
Gibson Jack E.
Hubbard Richard B.
Reps Jenna
Publication date: 01/01/2011

Data-mining techniques have frequently been developed for spontaneous reporting databases. These techniques aim to find adverse drug events accurately and efficiently. Spontaneous reporting databases are prone to missing information, under-reporting and incorrect entries. This often results in a detection lag or prevents the detection of some adverse drug events. These limitations do not occur in electronic health-care databases. In this paper, existing methods developed for spontaneous reporting databases are implemented on both a spontaneous reporting database and a general practice electronic health-care database and compared. The results suggest that the application of existing methods to the general practice database may help find signals that have gone undetected when using the spontaneous reporting system database. In addition, the general practice database provides far more supplementary information that, if incorporated in analysis, could provide a wealth of information for identifying adverse events more accurately.

Nottingham eTheses

AUC confidence bounds for performance evaluation of Brain-Computer Interface

Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'

Crossref