    Margins and combined classifiers

    Integrating E-Commerce and Data Mining: Architecture and Challenges

    We show that the e-commerce domain can provide all the right ingredients for successful data mining and claim that it is a killer domain for data mining. We describe an integrated architecture, based on our experience at Blue Martini Software, for supporting this integration. The architecture can dramatically reduce the pre-processing, cleaning, and data understanding effort often documented to take 80% of the time in knowledge discovery projects. We emphasize the need for data collection at the application server layer (not the web server) in order to support logging of data and metadata that is essential to the discovery process. We describe the data transformation bridges required from the transaction processing systems and customer event streams (e.g., clickstreams) to the data warehouse. We detail the mining workbench, which needs to provide multiple views of the data through reporting, data mining algorithms, visualization, and OLAP. We conclude with a set of challenges.
    Comment: KDD workshop: WebKDD 200
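
    The emphasis on application-server logging is easiest to see with a concrete record. The sketch below is purely illustrative (the ClickstreamEvent fields and the log_event helper are hypothetical, not Blue Martini's actual schema); it shows the kind of business-level event and metadata that a raw web-server log of URLs and status codes cannot reliably reconstruct.

```python
# Hypothetical sketch of application-server-layer event logging.
# Field names are illustrative only, not Blue Martini's schema.
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional

@dataclass
class ClickstreamEvent:
    session_id: str             # ties the event to a customer session
    customer_id: Optional[str]  # known only for logged-in customers
    event_type: str             # e.g. "product_view", "add_to_cart", "purchase"
    product_sku: Optional[str]  # business metadata absent from a web-server log
    timestamp: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

def log_event(event: ClickstreamEvent, sink: List[ClickstreamEvent]) -> None:
    """Append the event to an in-memory sink standing in for the warehouse bridge."""
    sink.append(event)

# Usage: the application server records semantically rich events as they happen.
events: List[ClickstreamEvent] = []
log_event(ClickstreamEvent("sess-42", None, "product_view", "SKU-1001"), events)
```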

    Generalization Error of Combined Classifiers

    We derive an upper bound on the generalization error of classifiers which can be represented as thresholded convex combinations of thresholded convex combinations of functions. Such classifiers include single hidden-layer threshold networks and voted combinations of decision trees (such as those produced by boosting algorithms). The derived bound depends on the proportion of training examples with margin less than some threshold and the average complexity of the combined functions (where the average is over the weights assigned to each function in the convex combination). The complexity of the individual functions in the combination depends on their closeness to threshold. By representing a decision tree as a thresholded convex combination of weighted leaf functions, we apply this result to bound the generalization error of combinations of decision trees. Previous bounds depend on the margin of the combined classifier and the average complexity of the decision trees in the combination, where the complexity of each decision tree depends on the total number of leaves. Our bound also depends on the margin of the combined classifier and the average complexity of the decision trees, but our measure of complexity for an individual decision tree is based on the distribution of training examples over leaves and can be significantly smaller than the total number of leaves.
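
    For orientation, classical margin bounds of this family (the Schapire-Freund-Bartlett-Lee style of result) have roughly the following shape; the exact constants and log factors vary by statement, and this is quoted only to show where the complexity term sits:

    \[
      \Pr_{D}\bigl[y f(x) \le 0\bigr] \;\le\; \Pr_{S}\bigl[y f(x) \le \theta\bigr]
      + O\!\left(\sqrt{\frac{d\,\log^2(m/d)}{m\,\theta^2} + \frac{\log(1/\delta)}{m}}\right),
    \]

    where m is the number of training examples, θ is the margin threshold, and d measures the complexity of the base hypotheses. Per the abstract, earlier bounds for voted decision trees tie that complexity term to the total number of leaves, whereas the bound here measures each tree by the distribution of training examples over its leaves, which can be significantly smaller.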

    Real World Performance of Association Rule Algorithms

    This study compares five well-known association rule algorithms using three real-world datasets and an artificial dataset. The experimental results confirm the performance improvements previously claimed by the authors on the artificial data, but some of these gains do not carry over to the real datasets, indicating overfitting of the algorithms to the IBM artificial dataset. More importantly, we found that the choice of algorithm only matters at support levels that generate more rules than would be useful in practice. For support levels that generate fewer than 1,000,000 rules, which is far more than humans can handle and is sufficient for prediction purposes where data is loaded into RAM, Apriori finishes processing in less than 10 minutes. On our datasets, we observed super-exponential growth in the number of rules. On one of our datasets, a 0.02% change in the support increased the number of rules from less than a million to over a billion, implying that outside a very narrow range of support values, the choice of algorithm is irrelevant.
    Categories and Subject Descriptors: H.2.8 [Database Management]: Applications -- Data Mining.
    Keywords: Data Mining, Association Rules, Benchmark, Comparisons, Frequent Itemsets, Market Basket Analysis, Affinity Analysis
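
    Since the comparison centres on Apriori-style frequent-itemset mining, a minimal sketch of the algorithm's core may help. It is illustrative only, not one of the benchmarked implementations; real implementations use far more careful candidate generation and counting.

```python
# Minimal, illustrative Apriori-style frequent-itemset miner.
# Core idea: an itemset can only be frequent if all of its subsets are frequent.
from itertools import combinations

def apriori(transactions, min_support):
    """Return {itemset: support} for itemsets meeting min_support (a fraction)."""
    n = len(transactions)
    transactions = [frozenset(t) for t in transactions]

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t) / n

    # Frequent 1-itemsets
    items = {i for t in transactions for i in t}
    frequent = {frozenset([i]): s for i in items
                if (s := support(frozenset([i]))) >= min_support}
    result = dict(frequent)

    k = 2
    while frequent:
        # Candidate generation: join frequent (k-1)-itemsets, keep size-k unions
        prev = list(frequent)
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune candidates with an infrequent (k-1)-subset, then count support
        frequent = {c: s for c in candidates
                    if all(frozenset(sub) in result for sub in combinations(c, k - 1))
                    and (s := support(c)) >= min_support}
        result.update(frequent)
        k += 1
    return result

# Toy usage: lowering min_support makes the candidate sets and rule counts explode,
# which is the regime where the paper observes super-exponential growth.
baskets = [{"milk", "bread"}, {"milk", "diapers", "beer"},
           {"bread", "diapers"}, {"milk", "bread", "diapers"}]
print(apriori(baskets, min_support=0.5))
```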

    Direct Optimization of Margins Improves Generalization in Combined Classifiers

    Figure: cumulative training margin distributions on the Sonar dataset for AdaBoost versus our "Direct Optimization Of Margins" (DOOM) algorithm.

    Improved Generalization Through Explicit Optimization of Margins

    Recent theoretical results have shown that the generalization performance of thresholded convex combinations of base classifiers is greatly improved if the underlying convex combination has large margins on the training data (i.e., correct examples are classified well away from the decision boundary). Neural network algorithms and AdaBoost have been shown to implicitly maximize margins, thus providing some theoretical justification for their remarkably good generalization performance. In this paper we are concerned with maximizing the margin explicitly. In particular, we prove a theorem bounding the generalization performance of convex combinations in terms of general cost functions of the margin, in contrast to previous results, which were stated in terms of the particular cost function sgn(θ - margin). We then present a new algorithm, DOOM, for directly optimizing a piecewise-linear family of cost functions satisfying the conditions of the theorem. Experiments on several of the datasets in the UC Irvine database are presented in which AdaBoost was used to generate a set of base classifiers and then DOOM was used to find the optimal convex combination of those classifiers. In all but one case the convex combination generated by DOOM had lower test error than AdaBoost's combination. In many cases DOOM achieves these lower test errors by sacrificing training error, in the interest of reducing the new cost function. In our experiments the margin plots suggest that the size of the minimum margin is not the critical factor in determining generalization performance.
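
    To make "directly optimizing a cost function of the margin" concrete, here is a hedged sketch of the general idea: given the base classifiers AdaBoost produced, choose convex-combination weights that reduce an average margin cost on the training set. The sigmoidal cost and the plain projected-gradient loop below are stand-ins chosen for brevity; they are not the paper's piecewise-linear cost family or the actual DOOM optimizer.

```python
# Illustrative sketch of direct margin-cost optimization over a convex
# combination of base classifiers. Cost and optimizer are simple stand-ins,
# not the DOOM algorithm itself.
import numpy as np

def optimize_margin_cost(H, y, steps=500, lr=0.1, lam=5.0):
    """
    H : (m, T) array of base-classifier outputs in {-1, +1}
        (e.g. the classifiers produced by AdaBoost).
    y : (m,) array of labels in {-1, +1}.
    Returns convex-combination weights w (non-negative, summing to 1)
    chosen to reduce the average cost of the margins y * (H @ w).
    """
    m, T = H.shape
    w = np.full(T, 1.0 / T)

    def cost_grad(margins):
        # Derivative w.r.t. the margin of the sigmoidal cost 1 / (1 + exp(lam * margin))
        s = 1.0 / (1.0 + np.exp(lam * margins))
        return -lam * s * (1.0 - s)

    for _ in range(steps):
        margins = y * (H @ w)
        grad = H.T @ (y * cost_grad(margins)) / m   # chain rule through the margins
        w = w - lr * grad
        # Project back onto the simplex: clip negatives, then renormalize
        w = np.clip(w, 0.0, None)
        w = w / w.sum() if w.sum() > 0 else np.full(T, 1.0 / T)
    return w
```

    Minimizing such a cost can push training margins up even at the price of some training error, which is the trade-off the abstract describes.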