2,570 research outputs found

Learning Theory Analysis for Association Rules and Sequential Event Prediction

Authors: Letham Benjamin; Madigan David B.; Rudin Cynthia
Publication venue: 'Columbia University Libraries/Information Services'
Publication date: 01/01/2013
Field of study

We present a theoretical analysis for prediction algorithms based on association rules. As part of this analysis, we introduce a problem for which rules are particularly natural, called “sequential event prediction.” In sequential event prediction, events in a sequence are revealed one by one, and the goal is to determine which event will next be revealed. The training set is a collection of past sequences of events. An example application is to predict which item will next be placed into a customer's online shopping cart, given their past purchases. In the context of this problem, algorithms based on association rules have distinct advantages over classical statistical and machine learning methods: they look at correlations based on subsets of co-occurring past events (items a and b imply item c), they can be applied to the sequential event prediction problem in a natural way, they can potentially handle the “cold start” problem where the training set is small, and they yield interpretable predictions. In this work, we present two algorithms that incorporate association rules. These algorithms can be used both for sequential event prediction and for supervised classification, and they are simple enough that they can possibly be understood by users, customers, patients, managers, etc. We provide generalization guarantees on these algorithms based on algorithmic stability analysis from statistical learning theory. We include a discussion of the strict minimum support threshold often used in association rule mining, and introduce an “adjusted confidence” measure that provides a weaker minimum support condition that has advantages over the strict minimum support. The paper brings together ideas from statistical learning theory, association rule mining and Bayesian analysis.

CiteSeerX

DSpace@MIT

Columbia University Academic Commons

Recommended from our members

Learning Theory Analysis for Association Rules and Sequential Event Prediction

Authors: Rudin Cynthia; Letham Benjamin; Madigan David B.
Publication venue: 'Columbia University Libraries/Information Services'
Publication date: 09/01/2001
Columbia University Academic Commons

TamPub Julkaisuarkisto - TamPub Institutional Repository

Trepo - Institutional Repository of Tampere University

Algorithms for Sparse Linear Classifiers in the Massive Data Setting

Authors: Balakrishnan Suhrid; Bartlett Peter; Madigan David B.
Publication venue: 'Columbia University Libraries/Information Services'
Publication date: 01/01/2008
Classifiers favoring sparse solutions, such as support vector machines, relevance vector machines, LASSO-regression based classifiers, etc., provide competitive methods for classification problems in high dimensions. However, current algorithms for training sparse classifiers typically scale quite unfavorably with respect to the number of training examples. This paper proposes online and multi-pass algorithms for training sparse linear classifiers for high-dimensional data. These algorithms have computational complexity and memory requirements that make learning on massive data sets feasible. The central idea that makes this possible is a straightforward quadratic approximation to the likelihood function.

Columbia University Academic Commons

A Note on Equivalence Classes of Directed Acyclic Independence Graphs

Author: Madigan David B.
Publication venue: 'Columbia University Libraries/Information Services'
Publication date: 01/01/1993
Directed acyclic independence graphs (DAIGs) play an important role in recent developments in probabilistic expert systems and influence diagrams (Chyu [1]). The purpose of this note is to show that DAIGs can usefully be grouped into equivalence classes where the members of a single class share identical Markov properties. These equivalence classes can be identified via a simple graphical criterion. This result is particularly relevant to model selection procedures for DAIGs (see, e.g., Cooper and Herskovits [2] and Madigan and Raftery [4]) because it reduces the problem of searching among possible orientations of a given graph to that of searching among the equivalence classes.

Columbia University Academic Commons

Correction: Separation and completeness properties for AMP chain graph Markov models

Authors: Levitz Michael; Madigan David B.; Perlman Michael D.
Publication venue: 'Columbia University Libraries/Information Services'
Publication date: 01/01/2003
Correction of table 2 on page 1757 of 'Separation and completeness properties for AMP chain graph Markov models', Annals of Statistics, volume 29 (2001)

Columbia University Academic Commons

A Flexible Bayesian Generalized Linear Model for Dichotomous Response Data with an Application to Text Categorization

Authors: Eyheramendy Susana; Madigan David B.
Publication venue: 'Columbia University Libraries/Information Services'
Publication date: 01/01/2007
We present a class of sparse generalized linear models that include probit and logistic regression as special cases and offer some extra flexibility. We provide an EM algorithm for learning the parameters of these models from data. We apply our method to text classification and to simulated data and show that it outperforms the logistic and probit models and also the elastic net, in general by a substantial margin.

arXiv.org e-Print Archive

Crossref

Columbia University Academic Commons

[Least Angle Regression]: Discussion

Authors: Madigan David B.; Ridgeway Greg
Publication venue: 'Columbia University Libraries/Information Services'
Publication date: 01/01/2004
Algorithms for simultaneous shrinkage and selection in regression and classification provide attractive solutions to knotty old statistical challenges. Nevertheless, as far as we can tell, Tibshirani's Lasso algorithm has had little impact on statistical practice. Two particular reasons for this may be the relative inefficiency of the original Lasso algorithm and the relative complexity of more recent Lasso algorithms [e.g., Osborne, Presnell and Turlach (2000)]. Efron, Hastie, Johnstone and Tibshirani have provided an efficient, simple algorithm for the Lasso as well as algorithms for stagewise regression and the new least angle regression. As such, this paper is an important contribution to statistical computing.

Columbia University Academic Commons

[A Report on the Future of Statistics]: Comment

Authors: Madigan David B.; Stuetzle Werner
Publication venue: 'Columbia University Libraries/Information Services'
Publication date: 01/01/2004
"Extraordinary opportunities for statistical ideas and for statisticians now present themselves. However, to take advantage of the opportunities, statistics has to change the way in which it recruits and trains students. Statistics has primarily focused on squeezing the maximum amount of information out of limited data. This paradigm is rapidly diminishing in importance and statistics education finds itself out of step with reality. The problems begin at the high school and undergraduate levels, where the standard course includes a narrow set of pre-computing-era topics. At the graduate level, the typical statistics program suffers from the same problem..." -- page 40

Columbia University Academic Commons

A Hierarchical Model for Association Rule Mining of Sequential Events: An Approach to Automated Medical Symptom Prediction

Authors: Madigan David B.; McCormick Tyler H.; Rudin Cynthia
Publication venue: 'Columbia University Libraries/Information Services'
Publication date: 01/01/2011
In many healthcare settings, patients visit healthcare professionals periodically and report multiple medical conditions, or symptoms, at each encounter. We propose a statistical modeling technique, called the Hierarchical Association Rule Model (HARM), that predicts a patient’s possible future symptoms given the patient’s current and past history of reported symptoms. The core of our technique is a Bayesian hierarchical model for selecting predictive association rules (such as “symptom 1 and symptom 2 → symptom 3”) from a large set of candidate rules. Because this method “borrows strength” using the symptoms of many similar patients, it is able to provide predictions specialized to any given patient, even when little information about the patient’s history of symptoms is available.

Crossref

Columbia University Academic Commons