Recommended from our members
Susceptibility Ranking of Electrical Feeders: A Case Study
Ranking problems arise in a wide range of real-world applications where an ordering on a set of examples is preferred to a classification model. These applications include collaborative filtering, information retrieval, and ranking the components of a system by susceptibility to failure. In this paper, we present an ongoing project to rank the feeder cables of a major metropolitan area's electrical grid according to their susceptibility to outages. We describe our framework and the application of machine learning ranking methods, using scores from Support Vector Machines (SVM), RankBoost and Martingale Boosting. Finally, we present our experimental results and the lessons learned from this challenging real-world application.
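Ranking by score can be illustrated with a minimal sketch. It orders items by a model score, highest first, and measures the quality of the ordering. The abstract does not name its evaluation metric, so AUC (the fraction of outage/non-outage pairs ranked correctly) is an assumption here. The feeder IDs, scores and labels are hypothetical.

```python
from typing import List, Sequence

def rank_by_score(ids: Sequence[str], scores: Sequence[float]) -> List[str]:
    """Order items from most to least susceptible (highest score first).
    Ties keep their input order because Python's sort is stable."""
    return [i for i, _ in sorted(zip(ids, scores), key=lambda p: p[1], reverse=True)]

def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Probability that a randomly chosen positive (label 1) outranks a
    randomly chosen negative (label 0); ties count as half a win."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))

# Hypothetical example: four feeders, SVM-style scores, observed outages.
feeders = ["F1", "F2", "F3", "F4"]
scores = [0.9, 0.2, 0.7, 0.4]
outages = [1, 0, 0, 1]
print(rank_by_score(feeders, scores))  # ['F1', 'F3', 'F4', 'F2']
print(auc(scores, outages))            # 0.75
```

The same two functions apply unchanged to scores from RankBoost or any other ranker. Only the ordering matters, not the scale of the scores.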
Toward Actionable Support Vector Machines: A Ranking-based Approach
During the last decade, Support Vector Machines (SVMs) have attracted a great deal of attention and achieved great success, mainly as powerful classifiers. However, one of the main drawbacks of this learning method is the lack of intelligibility of its results. SVMs are "black box" systems that provide no insight into the reasons for a classification and no explanations; the results they produce must be taken on faith. We are concerned with intelligibility because, in our practical experience, domain experts strongly prefer Machine Learning with explanations to a black box, even when the black box achieves high predictive performance. In that context, we have developed a new approach that provides explanations and makes SVM results more actionable. The underlying idea is to produce explanations by applying symbolic Machine Learning models to SVM-produced rankings. More precisely, we contrast examples from the top and bottom of the SVM ranking to detect the main properties that discriminate between the classes, which can help practitioners direct their actions and understand the system. We applied our approach to several datasets. Our empirical results are promising and show the utility of our methodology for the intelligibility and actionability of SVM output.
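The top-versus-bottom contrast can be sketched in a few lines. The abstract does not say which symbolic learners are used, so a one-level decision stump stands in for them here. It labels the k highest-scored examples "top" and the k lowest "bottom", then searches for the single feature threshold that best separates the two groups. `X`, `scores` and `k` are hypothetical inputs; in practice `scores` would be SVM decision values.

```python
from typing import Optional, Sequence, Tuple

# (feature_index, threshold, direction, accuracy)
Stump = Tuple[Optional[int], Optional[float], Optional[str], float]

def contrast_top_bottom(X: Sequence[Sequence[float]], scores: Sequence[float], k: int) -> Stump:
    """Label the k highest-scored rows 1 ("top") and the k lowest-scored rows 0
    ("bottom"), then return the single-feature threshold rule that best
    separates the two groups."""
    if k < 1 or 2 * k > len(scores):
        raise ValueError("need 1 <= k and non-overlapping top/bottom groups")
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    rows = [(X[i], 1) for i in order[:k]] + [(X[i], 0) for i in order[-k:]]
    best: Stump = (None, None, None, -1.0)
    for f in range(len(X[0])):
        values = sorted({x[f] for x, _ in rows})
        # Candidate thresholds: midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            correct = sum((x[f] > t) == (y == 1) for x, y in rows)
            acc_gt = correct / len(rows)
            # Rule "x[f] > t means top" and its mirror "x[f] <= t means top".
            for direction, acc in ((">", acc_gt), ("<=", 1 - acc_gt)):
                if acc > best[3]:
                    best = (f, t, direction, acc)
    return best

# Hypothetical data: 6 examples, 2 features, with SVM-style decision values.
X = [[0.1, 5], [0.9, 1], [0.2, 6], [0.8, 2], [0.5, 3], [0.4, 4]]
scores = [0.3, 2.0, -1.5, 1.2, 0.0, -0.8]
print(contrast_top_bottom(X, scores, k=2))
```

A richer symbolic learner, such as a rule inducer or a decision tree, would replace the stump. The labelling step that contrasts the two ends of the ranking would stay the same.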
Application of Sentiment and Topic Analysis to Teacher Evaluation Policy in the U.S.
We examine the potential value of Internet text for understanding education policy related to teacher evaluation. We discuss the use of sentiment analysis and topic modeling, applied to articles from the New York Times and Time Magazine, to explore media portrayal of these policies. Findings indicate that sentiment analysis and topic modeling are promising methods for analyzing Internet data in ways that can inform policy decision-making, but there are limitations to account for when interpreting patterns over time.
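The sentiment half of this analysis can be illustrated with a minimal lexicon-based sketch that tracks mean article sentiment per year. The abstract does not specify the sentiment method. The tiny word lists below are hypothetical placeholders, and a real study would use an established lexicon or trained model. Topic modeling is not shown.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical toy lexicon, for illustration only.
POSITIVE = {"improve", "support", "fair", "effective", "praise"}
NEGATIVE = {"fail", "unfair", "protest", "flawed", "criticize"}

def sentiment(text: str) -> float:
    """Net sentiment: (positive hits - negative hits) / number of tokens."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    return (pos - neg) / len(tokens)

def mean_sentiment_by_year(articles: Iterable[Tuple[int, str]]) -> Dict[int, float]:
    """articles: (year, text) pairs -> {year: mean sentiment}, years in ascending order."""
    by_year: Dict[int, List[float]] = defaultdict(list)
    for year, text in articles:
        by_year[year].append(sentiment(text))
    return {year: sum(v) / len(v) for year, v in sorted(by_year.items())}

# Hypothetical headlines standing in for newspaper articles.
articles = [
    (2010, "Teachers protest flawed evaluations"),
    (2010, "Unions praise fair review"),
    (2012, "New rules may improve support for teachers"),
]
print(mean_sentiment_by_year(articles))
```

Averaging per year yields the kind of time series whose interpretation the abstract cautions about. Shifts in vocabulary or coverage volume can move the curve independently of actual opinion.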
From Classification Rules to Action Recommendations
Rule induction has attracted a great deal of attention in Machine Learning and Data Mining. However, generating rules is not an end in itself, because applying them is not straightforward, especially when the number of rules is large. Ideally, the user would like to use these rules to decide which actions to take. In the literature, this notion is usually referred to as actionability. The contribution of this paper is twofold. First, we present a survey of the main approaches developed to address actionability, a topic that has received growing attention in recent years. We classify the main research in this area and compare the different approaches. Second, we propose a new framework to address actionability. Our goal is to lighten the burden of analyzing a large set of classification rules when the user is confronted with an "unsatisfactory situation" and needs help deciding what actions to take to remedy it. The method consists of comparing the situation to a set of classification rules, using a suitable distance that makes it possible to recommend actions requiring minimal changes to improve the situation. We propose the algorithm DAKAR for learning action recommendations and present an application to environment protection. Our experiment shows the usefulness of our contribution for action recommendation, but it also raises concerns about the impact of redundancy in a rule set on learning good-quality action recommendations.
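The core idea of comparing a situation to classification rules can be sketched as follows. This is not DAKAR, whose learning procedure and distance are not described in the abstract. Here each rule is assumed to be a set of attribute intervals plus a predicted class. For every rule predicting the desired class, the sketch computes the smallest moves that bring the situation inside all of the rule's intervals and recommends the cheapest rule. The total absolute change used as the cost is a hypothetical stand-in for the paper's "suitable distance". The attribute names are also hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Interval = Tuple[float, float]
Rule = Tuple[Dict[str, Interval], str]  # (attribute conditions, predicted class)

def changes_to_satisfy(situation: Dict[str, float], conditions: Dict[str, Interval]) -> Dict[str, float]:
    """Smallest per-attribute moves that put the situation inside every interval."""
    changes: Dict[str, float] = {}
    for attr, (low, high) in conditions.items():
        value = situation[attr]
        if value < low:
            changes[attr] = low
        elif value > high:
            changes[attr] = high
    return changes

def recommend(situation: Dict[str, float], rules: List[Rule], target: str) -> Optional[Tuple[Dict[str, float], float]]:
    """Return (new attribute values, cost) for the cheapest rule predicting `target`.
    Cost = total absolute change (hypothetical; real attributes would need normalizing)."""
    best: Optional[Tuple[Dict[str, float], float]] = None
    for conditions, label in rules:
        if label != target:
            continue
        changes = changes_to_satisfy(situation, conditions)
        cost = sum(abs(v - situation[a]) for a, v in changes.items())
        if best is None or cost < best[1]:
            best = (changes, cost)
    return best

# Hypothetical water-quality situation and rules.
situation = {"nitrate": 12.0, "ph": 6.0}
rules: List[Rule] = [
    ({"nitrate": (0, 10)}, "good"),
    ({"nitrate": (0, 5), "ph": (6.5, 8.5)}, "good"),
    ({"nitrate": (10, 50)}, "bad"),
]
print(recommend(situation, rules, "good"))  # ({'nitrate': 10}, 2.0)
```

Redundant rules, which the abstract flags as a concern, show up here as near-duplicate candidates that compete for the minimum cost.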
Sequential Event Prediction with Association Rules
We consider a supervised learning problem in which data are revealed sequentially and the goal is to determine what will be revealed next. In the context of this problem, algorithms based on association rules have a distinct advantage over classical statistical and machine learning methods; however, no theoretical foundation has previously been established for using association rules in supervised learning. We present two simple algorithms that incorporate association rules and provide generalization guarantees for these algorithms based on algorithmic stability analysis from statistical learning theory. We include a discussion of the strict minimum support threshold often used in association rule mining and introduce an "adjusted confidence" measure that imposes a weaker minimum support condition, which has advantages over the strict minimum support. The paper brings together ideas from statistical learning theory, association rule mining and Bayesian analysis.
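The effect of an adjusted confidence can be shown with a minimal sketch. It assumes the form $\mathrm{Conf}_K(a \to b) = \#(a \cup b) / (\#a + K)$ with $K \ge 0$, where $\#x$ counts the past sequences containing every item of $x$. With $K = 0$ this is ordinary confidence. A larger $K$ penalizes rules backed by few sequences, acting as a soft minimum support. As further simplifications, past sequences are treated as unordered sets and only empty or single-item antecedents are considered. The function names and shopping-basket data are hypothetical.

```python
from itertools import chain
from typing import FrozenSet, List, Optional, Set, Tuple

def count(itemset: FrozenSet[str], history: List[Set[str]]) -> int:
    """Number of past sequences (treated as sets) containing every item in itemset."""
    return sum(itemset <= seq for seq in history)

def predict_next(observed: Set[str], history: List[Set[str]], K: float = 1.0) -> Tuple[Optional[str], float]:
    """Predict the unseen item b maximizing #(a | {b}) / (#a + K) over antecedents a
    drawn from the empty set and the single items already observed."""
    candidates = set(chain.from_iterable(history)) - set(observed)
    antecedents = [frozenset()] + [frozenset({x}) for x in observed]
    best: Tuple[Optional[str], float] = (None, -1.0)
    for b in sorted(candidates):  # sorted for deterministic tie-breaking
        for a in antecedents:
            denom = count(a, history) + K
            if denom == 0:  # antecedent never seen and K == 0
                continue
            score = count(a | {b}, history) / denom
            if score > best[1]:
                best = (b, score)
    return best

history = [{"milk", "bread"}, {"milk", "bread", "eggs"}, {"milk", "eggs"}, {"beer", "chips"}]
print(predict_next({"beer"}, history, K=0))  # ('chips', 1.0): rule seen in one sequence
print(predict_next({"beer"}, history, K=5))  # ('milk', 0.333...): weak rule now penalized
```

The two calls show the trade-off the abstract describes. With no adjustment, a rule supported by a single sequence wins with confidence 1.0. With $K = 5$, the frequently observed item is preferred instead, without imposing a hard support cutoff.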
Interpretable prediction of necrotizing enterocolitis from machine learning analysis of premature infant stool microbiota
Background
Necrotizing enterocolitis (NEC) is a common, potentially catastrophic intestinal disease among very low birthweight premature infants. Affecting up to 15% of neonates born weighing less than 1500 g, NEC causes sudden-onset, progressive intestinal inflammation and necrosis, which can lead to significant bowel loss, multi-organ injury, or death. No unifying cause of NEC has been identified, nor is there any reliable biomarker that indicates an individual patient’s risk of the disease. Without a way to predict NEC in advance, the current medical strategy involves close clinical monitoring in an effort to treat babies with NEC as quickly as possible before irrecoverable intestinal damage occurs. In this report, we describe a novel machine learning application for generating dynamic, individualized NEC risk scores based on intestinal microbiota data, which can be determined from sequencing bacterial DNA from otherwise discarded infant stool. A central insight that differentiates our work from past efforts was the recognition that disease prediction from stool microbiota represents a specific subtype of machine learning problem known as multiple instance learning (MIL).
Results
We used a neural network-based MIL architecture, which we tested on independent datasets from two cohorts encompassing 3595 stool samples from 261 at-risk infants. Our report also introduces a new concept called the “growing bag” analysis, which applies MIL over time, allowing incorporation of past data into each new risk calculation. This approach allowed early, accurate NEC prediction, with a mean sensitivity of 86% and specificity of 90%. True-positive NEC predictions occurred an average of 8 days before disease onset. We also demonstrate that an attention-gated mechanism incorporated into our MIL algorithm permits interpretation of NEC risk, identifying several bacterial taxa that past work has associated with NEC, and potentially pointing the way toward new hypotheses about NEC pathogenesis. Our system is flexible, accepting microbiota data generated from targeted 16S or “shotgun” whole-genome DNA sequencing. It performs well in the setting of common, potentially confounding preterm neonatal clinical events such as perinatal cardiopulmonary depression, antibiotic administration, feeding disruptions, or transitions between breast feeding and formula.
Conclusions
We have developed and validated a robust MIL-based system for NEC prediction from noninvasively collected premature infant stool. Although this system was developed for NEC prediction, our MIL approach may also be applicable to other diseases characterized by changes in the human microbiota.
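The attention-gated MIL pooling and the "growing bag" can be sketched as a forward pass only. It assumes the gated-attention form $a_k = \mathrm{softmax}_k\big(w^\top(\tanh(V h_k) \odot \sigma(U h_k))\big)$, pooling $z = \sum_k a_k h_k$ and risk $\sigma(c^\top z + b)$. Each stool sample is one instance $h_k$. The growing bag at day $t$ contains every sample collected up to $t$. The weights are random and untrained, the instances are raw feature vectors rather than learned embeddings, and the taxa abundances are hypothetical. The authors' network architecture is not described in the abstract.

```python
import math
import random
from typing import List, Tuple

Vector = List[float]
Matrix = List[Vector]

def matvec(M: Matrix, x: Vector) -> Vector:
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]

def sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))

class GatedAttentionMIL:
    """Untrained toy model with random weights; illustrates the forward pass only."""

    def __init__(self, n_features: int, n_hidden: int = 4, seed: int = 0):
        rng = random.Random(seed)

        def rand(rows: int, cols: int) -> Matrix:
            return [[rng.gauss(0.0, 0.5) for _ in range(cols)] for _ in range(rows)]

        self.V = rand(n_hidden, n_features)  # tanh branch
        self.U = rand(n_hidden, n_features)  # sigmoid gate branch
        self.w = rand(1, n_hidden)[0]        # attention scoring vector
        self.c = rand(1, n_features)[0]      # bag-level classifier
        self.bias = 0.0

    def attention(self, bag: List[Vector]) -> Vector:
        logits = []
        for h in bag:
            gated = [math.tanh(a) * sigmoid(g) for a, g in zip(matvec(self.V, h), matvec(self.U, h))]
            logits.append(sum(wi * gi for wi, gi in zip(self.w, gated)))
        m = max(logits)  # subtract the max for a numerically stable softmax
        exps = [math.exp(l - m) for l in logits]
        total = sum(exps)
        return [e / total for e in exps]

    def risk(self, bag: List[Vector]) -> Tuple[float, Vector]:
        """Return (bag-level risk in (0, 1), attention weight per sample)."""
        weights = self.attention(bag)
        z = [sum(a * h[j] for a, h in zip(weights, bag)) for j in range(len(bag[0]))]
        return sigmoid(sum(ci * zi for ci, zi in zip(self.c, z)) + self.bias), weights

def growing_bag_scores(model: GatedAttentionMIL, samples: List[Vector]) -> List[float]:
    """Risk recomputed after each new sample, using all samples so far as the bag."""
    return [model.risk(samples[: t + 1])[0] for t in range(len(samples))]

# Hypothetical relative abundances of three taxa in successive stool samples.
samples = [[0.7, 0.2, 0.1], [0.5, 0.3, 0.2], [0.2, 0.2, 0.6]]
model = GatedAttentionMIL(n_features=3)
print(growing_bag_scores(model, samples))
```

The attention weights returned by `risk` are what make the model interpretable. After training, samples (and, through the features, taxa) with large weights indicate what drove a given risk score.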
Data pre-processing for the preterm prediction study MFMU dataset
Preterm birth (PTB) is a major public health problem with profound implications for society. Being able to identify women at risk of preterm birth during the course of their pregnancy would be extremely valuable. Previous research has largely focused on individual risk factors correlated with preterm birth (e.g. prior preterm birth, race, and infection) and less on combining these factors to understand the complex etiologies of preterm birth. We attempt to address this gap by conducting a deeper analysis of the preterm prediction study data collected by the NICHD Maternal Fetal Medicine Units (MFMU) Network, a high-quality dataset covering over 3,000 singleton pregnancies with detailed study visits and biospecimen collection at 24, 26, 28 and 30 weeks' gestation. Reports from this dataset used relatively straightforward biostatistical methodologies, such as relative risk assessments, to measure associations between risk factors and PTB (Maternal Fetal Medicine Units Network. Biostatistical Coordinating Center NICHD Networks, 1995). These methods include descriptive statistics, Pearson correlation, Fisher's exact tests and linear/logistic regression, in which risk factors are studied independently of each other. To perform detailed experiments on this data using non-linear Support Vector Machines and other machine learning (ML) methodologies, several pre-processing steps must first be completed; we describe them in this report.
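The abstract does not list the pre-processing steps. The minimal sketch below covers steps typically needed before training a non-linear SVM on mixed clinical data. It assumes median imputation and min-max scaling for numeric fields, and one-hot encoding for categorical fields with "missing" kept as its own level. These choices are hypothetical and not the report's procedure. The field names `age` and `prior_ptb` are illustrative, not actual MFMU variables.

```python
from statistics import median
from typing import Dict, List, Optional, Sequence, Tuple

Record = Dict[str, Optional[object]]

def preprocess(records: Sequence[Record], numeric: Sequence[str],
               categorical: Sequence[str]) -> Tuple[List[str], List[List[float]]]:
    """Return (feature_names, rows) suitable for an SVM.
    - numeric: fill missing values with the column median, then min-max scale to [0, 1]
    - categorical: one-hot encode, treating a missing value as the level 'missing'
    Numeric columns must have at least one observed value (median() needs data)."""
    names: List[str] = []
    columns: List[List[float]] = []
    for col in numeric:
        observed = [r[col] for r in records if r.get(col) is not None]
        fill = median(observed)
        values = [r[col] if r.get(col) is not None else fill for r in records]
        lo, hi = min(values), max(values)
        span = (hi - lo) or 1.0  # constant column: avoid division by zero
        names.append(col)
        columns.append([(v - lo) / span for v in values])
    for col in categorical:
        raw = [r.get(col) if r.get(col) is not None else "missing" for r in records]
        for level in sorted(set(raw)):
            names.append(f"{col}={level}")
            columns.append([1.0 if v == level else 0.0 for v in raw])
    rows = [list(row) for row in zip(*columns)]  # transpose columns into rows
    return names, rows

# Hypothetical records with missing values.
records = [
    {"age": 25, "prior_ptb": "yes"},
    {"age": None, "prior_ptb": "no"},
    {"age": 35, "prior_ptb": None},
]
print(preprocess(records, numeric=["age"], categorical=["prior_ptb"]))
```

Scaling matters here because RBF-kernel SVMs are sensitive to feature ranges. In a real experiment, the medians and min/max values would be computed on the training split only and then reused on the test split.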