177 research outputs found

    Significance Testing Against the Random Model for Scoring Models on Top k Predictions

    Get PDF
    Performance at top k predictions, where instances are ranked by a (learned) scoring model, has been used as an evaluation metric in machine learning for various reasons, such as where the entire corpus is unknown (e.g., the web) or where the results are to be used by a person with limited time or resources (e.g., ranking financial news stories where the investor only has time to look at relatively few stories per day). This evaluation metric is primarily used to report whether the performance of a given method is significantly better than that of other (baseline) methods. It has not, however, been used to show whether the result is significant when compared to the simplest of baselines: the random model. If no models outperform the random model at a given confidence level, then the results may not be worth reporting. This paper introduces a technique to analyze the expected performance of the top k predictions from the random model given k and a p-value on an evaluation dataset D. The technique is based on the realization that the number of positives seen in the top k predictions follows a hypergeometric distribution, which has well-defined statistical density functions. As this distribution is discrete, we show that parametric estimations based on a binomial distribution are almost always in complete agreement with the discrete distribution and that, if they differ, an interpolation of the discrete bounds gets very close to the parametric estimations. The technique is demonstrated on results from three prior published works, where it clearly shows that even though performance is greatly increased (sometimes over 100%) with respect to the expected performance of the random model (at p = 0.5), these results, although qualitatively impressive, are not always as significant (at p = 0.1) as the qualitative improvements might suggest.
The technique is used to show, given k, both how many positive instances are needed to achieve a specific significance threshold and how significant a given top k performance is. Used in a more global setting, the technique identifies the crossover points, with respect to k, at which a method becomes significant for a given p. Lastly, the technique is used to generate a complete confidence curve, which shows a general trend over all k and visually shows where a method is significantly better than the random model over all values of k.
Information Systems Working Papers Series
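The hypergeometric tail test this abstract describes can be sketched in a few lines; the function names and the example counts below are illustrative, not taken from the paper:

```python
from scipy.stats import hypergeom

def top_k_p_value(n_total, n_pos, k, hits):
    """P(X >= hits) when k items are drawn at random from a pool of
    n_total items containing n_pos positives (hypergeometric)."""
    # sf(hits - 1) = P(X >= hits); argument order is (x, M, n, N)
    return hypergeom.sf(hits - 1, n_total, n_pos, k)

def positives_needed(n_total, n_pos, k, p=0.1):
    """Smallest number of hits in the top k that is significant at level p."""
    for hits in range(k + 1):
        if top_k_p_value(n_total, n_pos, k, hits) <= p:
            return hits
    return None

# With 100 positives among 1000 instances, a random top-20 list is
# expected to contain 2 positives; 8 hits is far into the tail.
print(top_k_p_value(1000, 100, 20, 8))
```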

    Confidence Bands for ROC Curves: Methods and an Empirical Study

    Get PDF
    In this paper we study techniques for generating and evaluating confidence bands on ROC curves. ROC curve evaluation is rapidly becoming a commonly used evaluation metric in machine learning, although evaluating ROC curves has thus far been limited to studying the area under the curve (AUC) or generation of one-dimensional confidence intervals by freezing one variable—the false-positive rate, or threshold on the classification scoring function. Researchers in the medical field have long been using ROC curves and have many well-studied methods for analyzing such curves, including generating confidence intervals as well as simultaneous confidence bands. In this paper we introduce these techniques to the machine learning community and show their empirical fitness on the Covertype data set—a standard machine learning benchmark from the UCI repository. We show how some of these methods work remarkably well, others are too loose, and that existing machine learning methods for generation of 1-dimensional confidence intervals do not translate well to generation of simultaneous bands—their bands are too tight.
    NYU, Stern School of Business, IOMS Department, Center for Digital Economy Research
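A minimal sketch of the 1-D (pointwise) interval style discussed above, here using a naive bootstrap at fixed false-positive rates; the function names and the bootstrap approach are assumptions for illustration, and simultaneous bands would additionally require a multiplicity adjustment:

```python
import numpy as np

def roc_tpr_at_fpr(scores_pos, scores_neg, fpr):
    """TPR achieved when the threshold is set to yield the requested FPR."""
    thresh = np.quantile(scores_neg, 1.0 - fpr)  # fraction fpr of negatives exceed it
    return np.mean(scores_pos > thresh)

def vertical_band(scores_pos, scores_neg, fprs, n_boot=2000, alpha=0.05, seed=0):
    """Pointwise bootstrap confidence interval on TPR at each fixed FPR."""
    rng = np.random.default_rng(seed)
    tprs = np.empty((n_boot, len(fprs)))
    for b in range(n_boot):
        p = rng.choice(scores_pos, size=len(scores_pos), replace=True)
        n = rng.choice(scores_neg, size=len(scores_neg), replace=True)
        tprs[b] = [roc_tpr_at_fpr(p, n, f) for f in fprs]
    lo = np.quantile(tprs, alpha / 2, axis=0)
    hi = np.quantile(tprs, 1 - alpha / 2, axis=0)
    return lo, hi
```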

    A Simple Relational Classifier

    Get PDF
    We analyze a Relational Neighbor (RN) classifier, a simple relational predictive model that predicts based only on the class labels of related neighbors, using no learning and no inherent attributes. We show that it performs surprisingly well by comparing it to more complex models such as Probabilistic Relational Models and Relational Probability Trees on three data sets from published work. We argue that a simple model such as this should be used as a baseline to assess the performance of relational learners.
    NYU, Stern School of Business, IOMS Department, Center for Digital Economy Research
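A weighted-vote version of such a relational-neighbor baseline can be sketched as follows; the data layout and function name are assumptions for illustration, not the paper's code:

```python
from collections import defaultdict

def rn_classify(node, edges, labels):
    """Relational Neighbor baseline: predict the class that is most
    common (by edge weight) among a node's labeled neighbors.
    edges: dict node -> list of (neighbor, weight); labels: known labels."""
    votes = defaultdict(float)
    for neighbor, weight in edges.get(node, []):
        if neighbor in labels:  # only labeled neighbors vote
            votes[labels[neighbor]] += weight
    if not votes:
        return None  # no labeled neighbors: abstain (or fall back to a class prior)
    return max(votes, key=votes.get)

# 'a' has one 'spam' neighbor (weight 1.0) and 'ham' neighbors totaling 3.0
edges = {"a": [("b", 1.0), ("c", 1.0), ("d", 2.0)]}
labels = {"b": "spam", "c": "ham", "d": "ham"}
print(rn_classify("a", edges, labels))  # -> ham
```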

    Confidence Bands for ROC Curves

    Get PDF
    In this paper we study techniques for generating and evaluating confidence bands on ROC curves. ROC curve evaluation is rapidly becoming a commonly used evaluation metric in machine learning, although evaluating ROC curves has thus far been limited to studying the area under the curve (AUC) or generation of one-dimensional confidence intervals by freezing one variable—the false-positive rate, or threshold on the classification scoring function. Researchers in the medical field have long been using ROC curves and have many well-studied methods for analyzing such curves, including generating confidence intervals as well as simultaneous confidence bands. In this paper we introduce these techniques to the machine learning community and show their empirical fitness on the Covertype data set—a standard machine learning benchmark from the UCI repository. We show how some of these methods work remarkably well, others are too loose, and that existing machine learning methods for generation of 1-dimensional confidence intervals do not translate well to generation of simultaneous bands—their bands are too tight.
    Information Systems Working Papers Series

    Predicting citation rates for physics papers: Constructing features for an ordered probit model

    Get PDF
    Gehrke et al. introduce the citation prediction task in their paper "Overview of the KDD Cup 2003" (in this issue). The objective was to predict the change in the number of citations a paper will receive, not the absolute number of citations. There are obvious factors affecting the number of citations, including the quality and the topic of the paper and the reputation of the authors. However, it is not clear which factors might influence the change in citations between quarters, rendering the construction of predictive features a challenging task. A high-quality and timely paper will be cited more often than a lower-quality paper, but that does not indicate how its citation count will change. The selection of training data was critical, as the evaluation would only be on papers that received more than 5 citations in the quarter following the submission of results. After considering several modeling approaches, we used a modified version of an ordered probit model. We describe each of these steps in turn.
    NYU, Stern School of Business, IOMS Department, Center for Digital Economy Research
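An ordered probit can be fit by maximum likelihood directly; the sketch below is a generic version (not the authors' modified one), with the cutpoints kept ordered by parameterizing them as a first cutpoint plus positive (exponentiated) increments:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

def ordered_probit_nll(params, X, y, n_classes):
    """Negative log-likelihood of an ordered probit model.
    params = [beta (len = X.shape[1]), first cutpoint, log-increments...]."""
    k = X.shape[1]
    beta = params[:k]
    # ordered cutpoints c_1 < c_2 < ... via positive increments
    cuts = np.concatenate(([params[k]],
                           params[k] + np.cumsum(np.exp(params[k + 1:]))))
    eta = X @ beta
    # P(y = j) = Phi(c_{j+1} - eta) - Phi(c_j - eta), with c_0 = -inf, c_J = +inf
    upper = np.concatenate((cuts, [np.inf]))[y]
    lower = np.concatenate(([-np.inf], cuts))[y]
    p = norm.cdf(upper - eta) - norm.cdf(lower - eta)
    return -np.sum(np.log(np.clip(p, 1e-12, None)))

def fit_ordered_probit(X, y, n_classes):
    x0 = np.zeros(X.shape[1] + n_classes - 1)  # cuts start at 0, 1, 2, ...
    res = minimize(ordered_probit_nll, x0, args=(X, y, n_classes), method="BFGS")
    return res.x
```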

    Excellent diagnostic characteristics for ultrafast gene profiling of DEFA1-IL1B-LTF in detection of prosthetic joint infections

    Get PDF
    The timely and exact diagnosis of prosthetic joint infection (PJI) is crucial for surgical decision-making. Intraoperatively, delivery of the result within an hour is required. Alpha-defensin lateral immunoassay of joint fluid (JF) is precise for the intraoperative exclusion of PJI; however, for patients with a limited amount of JF and/or in cases where the JF is bloody, this test is unhelpful. Important information is hidden in periprosthetic tissues that may reflect the current status of implant pathology much better. We therefore investigated the utility of the gene expression patterns of 12 candidate genes (TLR1, -2, -4, -6, and -10, DEFA1, LTF, IL1B, BPI, CRP, IFNG, and DEFB4A) previously associated with infection for detection of PJI in periprosthetic tissues of patients with total joint arthroplasty (TJA) (n = 76) reoperated for PJI (n = 38) or aseptic failure (n = 38), using the ultrafast quantitative reverse transcription-PCR (RT-PCR) Xxpress system (BJS Biotechnologies Ltd.). Advanced data-mining algorithms were applied for data analysis. For PJI, we detected elevated mRNA expression levels of DEFA1 (P < 0.0001), IL1B (P < 0.0001), LTF (P < 0.0001), TLR1 (P = 0.02), and BPI (P = 0.01) in comparison to those in tissues from aseptic cases. A feature selection algorithm revealed that the DEFA1-IL1B-LTF pattern was the most appropriate for detection/exclusion of PJI, achieving 94.5% sensitivity and 95.7% specificity, with likelihood ratios (LRs) for positive and negative results of 16.3 and 0.06, respectively. Taken together, the results show that DEFA1-IL1B-LTF gene expression detection by use of ultrafast qRT-PCR linked to an electronic calculator allows detection of patients with a high probability of PJI within 45 min after sampling. Further testing on a larger cohort of patients is needed.
    Web of Science
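The likelihood ratios reported above follow mechanically from sensitivity and specificity; a small helper makes the relationship explicit (the 2x2 counts in the example are illustrative, not the study's data):

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity, and likelihood ratios from a 2x2 table.
    LR+ = sens / (1 - spec); LR- = (1 - sens) / spec."""
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    lr_pos = sens / (1 - spec)
    lr_neg = (1 - sens) / spec
    return sens, spec, lr_pos, lr_neg

# Hypothetical cohort: 38 infected (36 detected), 38 aseptic (36 excluded)
sens, spec, lr_pos, lr_neg = diagnostic_metrics(tp=36, fn=2, tn=36, fp=2)
```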

    DeepWalk: Online Learning of Social Representations

    Full text link
    We present DeepWalk, a novel approach for learning latent representations of vertices in a network. These latent representations encode social relations in a continuous vector space, which is easily exploited by statistical models. DeepWalk generalizes recent advancements in language modeling and unsupervised feature learning (or deep learning) from sequences of words to graphs. DeepWalk uses local information obtained from truncated random walks to learn latent representations by treating walks as the equivalent of sentences. We demonstrate DeepWalk's latent representations on several multi-label network classification tasks for social networks such as BlogCatalog, Flickr, and YouTube. Our results show that DeepWalk outperforms challenging baselines which are allowed a global view of the network, especially in the presence of missing information. DeepWalk's representations can provide F1 scores up to 10% higher than competing methods when labeled data is sparse. In some experiments, DeepWalk's representations are able to outperform all baseline methods while using 60% less training data. DeepWalk is also scalable. It is an online learning algorithm which builds useful incremental results, and is trivially parallelizable. These qualities make it suitable for a broad class of real-world applications such as network classification and anomaly detection.
    Comment: 10 pages, 5 figures, 4 tables
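The walk-generation stage described above can be sketched as follows; the resulting walks would then be fed as "sentences" to a skip-gram model such as word2vec (the adjacency format and function name here are assumptions, not the authors' code):

```python
import random

def truncated_random_walks(adj, num_walks, walk_length, seed=0):
    """Generate the corpus of truncated random walks that DeepWalk-style
    methods treat as sentences of vertices.
    adj: dict vertex -> list of neighbors (every vertex needs an entry)."""
    rng = random.Random(seed)
    walks = []
    for _ in range(num_walks):
        vertices = list(adj)
        rng.shuffle(vertices)  # one pass over all vertices per "epoch"
        for start in vertices:
            walk = [start]
            while len(walk) < walk_length:
                neighbors = adj.get(walk[-1], [])
                if not neighbors:
                    break  # dead end: truncate the walk early
                walk.append(rng.choice(neighbors))
            walks.append(walk)
    return walks
```

Each walk is a short sequence of vertex IDs; training a skip-gram model on this corpus yields the continuous vertex embeddings.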