Search CORE

4,300 research outputs found

Improved Use of Continuous Attributes in C4.5

Author: Quinlan J. R.
Publication venue
Publication date: 01/01/1996
Field of study

A reported weakness of C4.5 in domains with continuous attributes is addressed by modifying the formation and evaluation of tests on continuous attributes. An MDL-inspired penalty is applied to such tests, eliminating some of them from consideration and altering the relative desirability of all tests. Empirical trials show that the modifications lead to smaller decision trees with higher predictive accuracies. Results also confirm that a new version of C4.5 incorporating these changes is superior to recent approaches that use global discretization and that construct small trees with multi-interval splits.Comment: See http://www.jair.org/ for any accompanying file

arXiv.org e-Print Archive

CiteSeerX

The role of circumstance monitoring on the diagnostic interpretation of condition monitoring data

Author: Bahadoorsingh S.
Catterson V.M.
McArthur S.D.J.
Rowland S.M.
Rudd S.E.
Publication venue
Publication date: 01/01/2010
Field of study

Circumstance monitoring, a recently coined termed defines the collection of data reflecting the real network working environment of in-service equipment. This ideally complete data set should reflect the elements of the electrical, mechanical, thermal, chemical and environmental stress factors present on the network. This must be distinguished from condition monitoring, which is the collection of data reflecting the status of in-service equipment. This contribution investigates the significance of considering circumstance monitoring on diagnostic interpretation of condition monitoring data. Electrical treeing partial discharge activity from various harmonic polluted waveforms have been recorded and subjected to a series of machine learning techniques. The outcome provides a platform for improved interpretation of the harmonic influenced partial discharge patterns. The main conclusion of this exercise suggests that any diagnostic interpretation is dependent on the immunity of condition monitoring measurements to the stress factors influencing the operational conditions. This enables the asset manager to have an improved holistic view of an asset's health

Crossref

University of Strathclyde Institutional Repository

The University of Manchester - Institutional Repository

Porting Decision Tree Algorithms to Multicore using FastFlow

Author: A.C. Sodan
I. Park
J.E. Gehrke
J.R. Quinlan
K. Asanovic
M. Aldinucci
M. Cole
M. Coppola
M. Joshi
M. Vanneschi
M. Zaki
M.K. Sreenivas
R. Jin
R.D. Blumofe
S. Ruggieri
S. Ruggieri
T. Lim
W. Thies
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2010
Field of study

The whole computer hardware industry embraced multicores. For these machines, the extreme optimisation of sequential algorithms is no longer sufficient to squeeze the real machine power, which can be only exploited via thread-level parallelism. Decision tree algorithms exhibit natural concurrency that makes them suitable to be parallelised. This paper presents an approach for easy-yet-efficient porting of an implementation of the C4.5 algorithm on multicores. The parallel porting requires minimal changes to the original sequential code, and it is able to exploit up to 7X speedup on an Intel dual-quad core machine.Comment: 18 pages + cove

arXiv.org e-Print Archive

CiteSeerX

Crossref

Archivio della Ricerca - Università di Pisa

UnipiEprints

Inducing safer oblique trees without costs

Author: Althoff K.
Bennett K.P.
Bennett K.P.
Berry M.
Blake C.
Bradford J.
Breiman L.
Cohen R.
Domingos P.
Elkan C.
Elomaa T.
Fan W.
Grefenstette J.
Knoll U.
Kolodner J.
Morrison D.
Norusis M.
Nunez M.
Pazzani M.
Provost F.J.
Provost F.J.
Quinlan J.R.
Quinlan J.R.
Sunil Vadera
Tan M.
Ting K.
Turney P.
Vadera S.
Publication venue: 'Wiley'
Publication date: 01/09/2005
Field of study

Decision tree induction has been widely studied and applied. In safety applications, such as determining whether a chemical process is safe or whether a person has a medical condition, the cost of misclassification in one of the classes is significantly higher than in the other class. Several authors have tackled this problem by developing cost-sensitive decision tree learning algorithms or have suggested ways of changing the distribution of training examples to bias the decision tree learning process so as to take account of costs. A prerequisite for applying such algorithms is the availability of costs of misclassification. Although this may be possible for some applications, obtaining reasonable estimates of costs of misclassification is not easy in the area of safety. This paper presents a new algorithm for applications where the cost of misclassifications cannot be quantified, although the cost of misclassification in one class is known to be significantly higher than in another class. The algorithm utilizes linear discriminant analysis to identify oblique relationships between continuous attributes and then carries out an appropriate modification to ensure that the resulting tree errs on the side of safety. The algorithm is evaluated with respect to one of the best known cost-sensitive algorithms (ICET), a well-known oblique decision tree algorithm (OC1) and an algorithm that utilizes robust linear programming

University of Salford Institutional Repository

Crossref

Towards Automated Performance Bug Identification in Python

Author: Mazzawi Elie
Miranskyy Andriy
Tsakiltsidis Sokratis
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 28/07/2016
Field of study

Context: Software performance is a critical non-functional requirement, appearing in many fields such as mission critical applications, financial, and real time systems. In this work we focused on early detection of performance bugs; our software under study was a real time system used in the advertisement/marketing domain. Goal: Find a simple and easy to implement solution, predicting performance bugs. Method: We built several models using four machine learning methods, commonly used for defect prediction: C4.5 Decision Trees, Na\"{\i}ve Bayes, Bayesian Networks, and Logistic Regression. Results: Our empirical results show that a C4.5 model, using lines of code changed, file's age and size as explanatory variables, can be used to predict performance bugs (recall=0.73, accuracy=0.85, and precision=0.96). We show that reducing the number of changes delivered on a commit, can decrease the chance of performance bug injection. Conclusions: We believe that our approach can help practitioners to eliminate performance bugs early in the development cycle. Our results are also of interest to theoreticians, establishing a link between functional bugs and (non-functional) performance bugs, and explicitly showing that attributes used for prediction of functional bugs can be used for prediction of performance bugs

arXiv.org e-Print Archive

Crossref