21 research outputs found

Unachievable Region in Precision-Recall Space and Its Effect on Empirical Evaluation

Author: Boyd Kendrick
Costa Vitor Santos
Davis Jesse
Page David
Publication date: 30/05/2012

Precision-recall (PR) curves and the areas under them are widely used to summarize machine learning results, especially for data sets exhibiting class skew. They are often used analogously to ROC curves and the area under ROC curves. It is known that PR curves vary as class skew changes. What was not recognized before this paper is that there is a region of PR space that is completely unachievable, and the size of this region depends only on the skew. This paper precisely characterizes the size of that region and discusses its implications for empirical evaluation methodology in machine learning. Comment: ICML2012, fixed citations to use correct tech report numbe
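The claim that the unachievable region depends only on the skew can be illustrated with a minimal sketch. It assumes the skew is the fraction of positive examples and that the worst case at a given recall is a classifier that also labels every negative as positive; the closed-form area is the integral of that lower bound over recall. The function names are hypothetical and are not taken from the paper.

```python
import math


def min_precision(recall: float, skew: float) -> float:
    """Lowest precision reachable at `recall` when a fraction `skew` of examples is positive.

    Worst case: all negatives are predicted positive, so
    precision = skew * recall / (skew * recall + (1 - skew)).
    """
    if not 0.0 < skew < 1.0:
        raise ValueError("skew must lie strictly between 0 and 1")
    if not 0.0 <= recall <= 1.0:
        raise ValueError("recall must lie in [0, 1]")
    return skew * recall / (skew * recall + (1.0 - skew))


def unachievable_area(skew: float) -> float:
    """Area under the minimum-precision curve, i.e. the part of PR space no classifier can reach.

    Obtained by integrating min_precision over recall from 0 to 1.
    """
    if not 0.0 < skew < 1.0:
        raise ValueError("skew must lie strictly between 0 and 1")
    return 1.0 + (1.0 - skew) * math.log(1.0 - skew) / skew


if __name__ == "__main__":
    for skew in (0.01, 0.1, 0.5):
        print(f"skew={skew}: unachievable area = {unachievable_area(skew):.4f}")
```

Under these assumptions the area grows with the proportion of positives, which suggests that areas under PR curves from data sets with different skews are not directly comparable.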

arXiv.org e-Print Archive

Minds@University of Wisconsin

PubMed Central

Precision-Recall-Gain Curves: PR Analysis Done Right

Author: Meelis Kull
Peter A Flach
Publication date: 05/03/2020

Precision-Recall analysis abounds in applications of binary classification where true negatives do not add value and hence should not affect assessment of the classifier's performance. Perhaps inspired by the many advantages of receiver operating characteristic (ROC) curves and the area under such curves for accuracy-based performance assessment, many researchers have taken to report Precision-Recall (PR) curves and associated areas as performance metric. We demonstrate in this paper that this practice is fraught with difficulties, mainly because of incoherent scale assumptions, e.g., the area under a PR curve takes the arithmetic mean of precision values whereas the Fβ score applies the harmonic mean. We show how to fix this by plotting PR curves in a different coordinate system, and demonstrate that the new Precision-Recall-Gain curves inherit all key advantages of ROC curves. In particular, the area under Precision-Recall-Gain curves conveys an expected F1 score on a harmonic scale, and the convex hull of a Precision-Recall-Gain curve allows us to calibrate the classifier's scores so as to determine, for each operating point on the convex hull, the interval of β values for which the point optimises Fβ. We demonstrate experimentally that the area under traditional PR curves can easily favour models with lower expected F1 score than others, and so the use of Precision-Recall-Gain curves will result in better model selection.
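The change of coordinate system can be sketched as follows. The abstract does not give the transformation, so this example assumes the gain mapping usually associated with this work, gain(x) = (x - π) / ((1 - π) x), where π is the proportion of positives. The function names and the confusion-matrix counts are hypothetical.

```python
def gain(value: float, pi: float) -> float:
    """Map precision, recall or F1 onto the assumed gain scale.

    The always-positive baseline (value == pi) maps to 0 and a perfect score maps to 1.
    """
    if not 0.0 < pi < 1.0:
        raise ValueError("pi must lie strictly between 0 and 1")
    if value <= 0.0:
        raise ValueError("value must be positive")
    return (value - pi) / ((1.0 - pi) * value)


def precision_recall_gain(tp: int, fp: int, fn: int, tn: int) -> tuple[float, float]:
    """Return (precision gain, recall gain) for one confusion matrix."""
    pi = (tp + fn) / (tp + fp + fn + tn)  # proportion of positives
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return gain(precision, pi), gain(recall, pi)


if __name__ == "__main__":
    # Hypothetical counts, chosen only for illustration.
    prec_gain, rec_gain = precision_recall_gain(tp=30, fp=10, fn=20, tn=40)
    print(prec_gain, rec_gain)
```

On this scale the harmonic mean that defines F1 becomes an arithmetic mean: the gain of the F1 score equals the average of precision gain and recall gain, which is consistent with the abstract's remark about a harmonic scale.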

CiteSeerX

Identification of long non-coding transcripts with feature selection: a comparative study

Author: Antonietta Spagnuolo
Giovanna M. M. Ventola
Luigi Cerulo
Michele Ceccarelli
Salvatore D’Aniello
Teresa M. R. Noviello
Publication venue: Springer Nature
Publication date: 01/01/2017

Table S4. List of features ranked by each algorithm in each species. (XLS 63 kb)

Springer - Publisher Connector

FigShare

Doublet identification in single-cell sequencing data using scDblFinder

Author: Garcia Meixide Carlos
Germain Pierre-Luc
Lun Aaron
Macnair Will
Robinson Mark D
Publication venue: F1000 Research Ltd
Publication date: 01/01/2021

Doublets are prevalent in single-cell sequencing data and can lead to artifactual findings. A number of strategies have therefore been proposed to detect them. Building on the strengths of existing approaches, we developed scDblFinder, a fast, flexible and accurate Bioconductor-based doublet detection method. Here we present the method, justify its design choices, demonstrate its performance on both single-cell RNA and accessibility (ATAC) sequencing data, and provide some observations on doublet formation, detection, and enrichment analysis. Even in complex datasets, scDblFinder can accurately identify most heterotypic doublets, and was already found by an independent benchmark to outcompete alternatives.

ZORA