TESTING DECISION TREE ALGORITHM USING TANAGRA APPLICATIONS
The decision tree is a predictive algorithm used for classification, segmentation, and grouping. The decision tree algorithm has several advantages: it processes both numerical (continuous) and discrete data, handles missing attribute values, produces rules that are easily interpreted, and is among the fastest of comparable algorithms. Prediction accuracy is the model's ability to correctly predict class labels for new or previously unseen data; speed refers to the computational time needed to build and use the model. The Tanagra application is used here because it provides a decision tree implementation.
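The workflow described above — fit a decision tree, then measure prediction accuracy on held-out data — can be sketched as follows. This is a minimal illustration using scikit-learn in place of Tanagra; the dataset and parameters are assumptions for demonstration, not those of the study.

```python
# Minimal sketch: train a decision tree and measure its prediction accuracy
# on held-out data. scikit-learn stands in for Tanagra; dataset and
# hyperparameters are illustrative.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

clf = DecisionTreeClassifier(max_depth=3, random_state=0)
clf.fit(X_train, y_train)

# prediction accuracy: fraction of previously unseen examples labelled correctly
acc = accuracy_score(y_test, clf.predict(X_test))
print(f"test accuracy: {acc:.2f}")
```

The shallow `max_depth` also keeps the resulting rules easy to read, which is the interpretability advantage the abstract mentions.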
Effective techniques for handling incomplete data using decision trees
Decision Trees (DTs) have been recognized as one of the most successful formalisms for knowledge representation and reasoning, and are currently applied to a variety of data mining and knowledge discovery applications, particularly classification problems. There are several efficient methods for learning a DT from data. However, these methods are often limited by the assumption that the data are complete.
In this thesis, some contributions to the field of machine learning and statistics that solve the problem of extracting DTs for learning and classification tasks from incomplete databases are presented. The methodology underlying the thesis blends together well-established statistical theories with the most advanced techniques for machine learning and automated reasoning with uncertainty.
The first contribution is an extensive set of simulations studying the impact of missing data on the predictive accuracy of existing DTs that can cope with missing values, when missing values occur in both the training and test sets or in either set alone. All simulations are performed under the missing completely at random (MCAR), missing at random (MAR), and informatively missing (IM) mechanisms, and for different missing data patterns and proportions.
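Of the three mechanisms, MCAR is the simplest to simulate, since missingness is independent of the data. A minimal sketch of injecting MCAR missingness at a given proportion (using NumPy; the data and proportion are illustrative) looks like this:

```python
# Sketch: inject missing values under an MCAR mechanism at a fixed proportion.
# Under MAR or IM, the mask below would instead be conditioned on observed or
# unobserved values of X; here it is fully independent of the data (MCAR).
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))           # complete data matrix

prop = 0.2                              # target proportion of missing entries
mask = rng.random(X.shape) < prop       # MCAR: missingness independent of X
X_missing = X.copy()
X_missing[mask] = np.nan

print(np.isnan(X_missing).mean())       # observed missing proportion, near 0.2
```

Repeating this for several proportions and patterns, then training and testing trees on the result, mirrors the simulation design the abstract describes.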
The next contribution is a simple, novel, and effective procedure for training and testing decision trees in the presence of missing data. Original yet simple splitting criteria for attribute selection during tree building are put forward. The proposed technique is evaluated and validated in empirical tests over many real-world application domains. The proposed algorithm maintains, and sometimes exceeds, the outstanding accuracy of multiple imputation, especially on datasets containing mixed attributes and purely nominal attributes, and it greatly improves accuracy on IM data. Another major advantage of this method over multiple imputation is the substantial saving in computational resources due to its simplicity.
The next contribution is the proposal of three versions of simple probabilistic techniques for classifying incomplete vectors using decision trees built from complete data. The proposed procedure is superficially similar to that of fractional cases but more effective. The experimental results demonstrate that these approaches can achieve quality comparable to sophisticated algorithms such as multiple imputation, and are therefore applicable to all kinds of datasets.
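The fractional-cases idea that these techniques build on can be sketched briefly: when the attribute tested at a node is missing, the example is sent down every branch and the branches' class distributions are combined, weighted by each branch's share of the training data. The tree structure, names, and numbers below are illustrative, not the thesis's algorithm.

```python
# Sketch of fractional-cases-style classification of an incomplete vector.
# When the split attribute is missing, blend both children's class
# distributions, weighted by the fraction of training data in each branch.
tree = {                                 # hypothetical fitted stump on attribute 0
    "attr": 0, "threshold": 0.5,
    "left":  {"weight": 0.7, "dist": {"A": 0.9, "B": 0.1}},
    "right": {"weight": 0.3, "dist": {"A": 0.2, "B": 0.8}},
}

def classify(tree, x):
    v = x[tree["attr"]]
    if v is None:                        # attribute missing: blend both branches
        dist = {}
        for child in (tree["left"], tree["right"]):
            for cls, p in child["dist"].items():
                dist[cls] = dist.get(cls, 0.0) + child["weight"] * p
        return dist
    child = tree["left"] if v <= tree["threshold"] else tree["right"]
    return dict(child["dist"])

print(classify(tree, [None]))            # blended class distribution
print(classify(tree, [0.2]))             # complete vector: single branch
```

With the attribute missing, the class "A" probability is 0.7 × 0.9 + 0.3 × 0.2 = 0.69; a complete vector simply follows one branch.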
Finally, two novel ensemble procedures for handling incomplete training and test data are proposed and discussed. The algorithms combine the two best approaches either with resampling (REMIMIA) or without resampling (EMIMIA) of the training data before growing the decision trees. Empirical tests are used to evaluate and validate the success of the proposed ensemble methods relative to individual missing-data techniques. EMIMIA attains the highest overall level of prediction accuracy.
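The "with resampling" variant rests on the generic bagging scheme: resample the training data before growing each tree, then aggregate the trees' votes. A minimal sketch of that generic scheme (scikit-learn; dataset and ensemble size are illustrative, and this is not the REMIMIA algorithm itself):

```python
# Sketch of an ensemble built with resampling (bagging) of the training data
# before growing each decision tree - the generic scheme underlying a
# resampling-based ensemble. Dataset and parameters are illustrative.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# bootstrap=True resamples the training set before each tree is grown
ens = BaggingClassifier(DecisionTreeClassifier(), n_estimators=25,
                        bootstrap=True, random_state=0)
ens.fit(X_tr, y_tr)
print(f"ensemble accuracy: {ens.score(X_te, y_te):.2f}")
```

Setting `bootstrap=False` would use the full training set for every tree, analogous to the without-resampling variant.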
Predictive Capacity of Meteorological Data - Will it rain tomorrow?
With the availability of high-precision digital sensors and cheap storage media, it is not uncommon to find large amounts of data collected on almost all measurable attributes, both in nature and man-made habitats. Weather in particular has been an area of keen interest for researchers seeking to develop more accurate and reliable prediction models. This paper presents a set of experiments which involve the use of prevalent machine learning techniques to build models that predict the day of the week given the weather data for that particular day, i.e. temperature, wind, rain, etc., and to test their reliability across four cities in Australia {Brisbane, Adelaide, Perth, Hobart}. The results provide a comparison of the accuracy of these machine learning techniques and their reliability in predicting the day of the week by analysing the weather data. We then apply the models to predict weather conditions based on the available data.
Comment: 7 pages, 2 Result Set
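The experimental setup described above — fit a classifier mapping daily weather features to the day of the week, then check accuracy — can be sketched as follows. The data here is synthetic and illustrative, not the Australian city data used in the paper, and a decision tree stands in for whichever techniques the paper compares.

```python
# Sketch of the experiment: predict the day of the week from daily weather
# features. Synthetic data only - not the Australian city records the paper
# uses, and a single decision tree stands in for the compared techniques.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
n = 700
X = np.column_stack([
    rng.normal(20, 5, n),    # temperature (degrees C)
    rng.normal(15, 4, n),    # wind speed (km/h)
    rng.exponential(2, n),   # rainfall (mm)
])
y = rng.integers(0, 7, n)    # day of week, 0=Monday .. 6=Sunday

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
clf = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)

# with purely random labels, accuracy should hover near chance (1/7 = 0.14) -
# the paper's underlying question is how much real signal weather carries
print(f"accuracy: {clf.score(X_te, y_te):.2f}")
```

On real data, accuracy meaningfully above the 1/7 chance baseline would indicate that weather carries day-of-week signal.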
A review of associative classification mining
Associative classification mining is a promising approach in data mining that utilizes association rule discovery techniques to construct classification systems, also known as associative classifiers. In the last few years, a number of associative classification algorithms have been proposed, e.g. CPAR, CMAR, MCAR, MMAC, and others. These algorithms employ several different rule discovery, rule ranking, rule pruning, rule prediction, and rule evaluation methods. This paper focuses on surveying and comparing the state-of-the-art associative classification techniques with regard to the above criteria. Finally, future directions in associative classification, such as incremental learning and mining low-quality data sets, are also highlighted in this paper.
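The rule ranking and rule prediction steps that the surveyed algorithms vary on can be sketched with the classic CBA-style scheme: rank class association rules by confidence, break ties by support, and predict with the first matching rule. The rules and items below are illustrative, not drawn from any of the surveyed algorithms.

```python
# Sketch of two associative-classification steps the survey compares:
# rule ranking (confidence first, then support) and rule prediction
# (first matching rule wins, CBA-style). Rules and data are illustrative.
from dataclasses import dataclass

@dataclass
class Rule:
    items: frozenset      # antecedent item set
    label: str            # consequent class
    support: float
    confidence: float

rules = [
    Rule(frozenset({"outlook=sunny", "humidity=high"}), "no",  0.20, 0.95),
    Rule(frozenset({"outlook=overcast"}),               "yes", 0.25, 0.90),
    Rule(frozenset({"wind=weak"}),                      "yes", 0.40, 0.70),
]

# rule ranking: higher confidence first, ties broken by higher support
ranked = sorted(rules, key=lambda r: (-r.confidence, -r.support))

def predict(transaction, default="yes"):
    for r in ranked:
        if r.items <= transaction:        # antecedent satisfied
            return r.label
    return default                        # no rule fires: fall back to default

print(predict({"outlook=sunny", "humidity=high", "wind=weak"}))  # -> no
```

The algorithms the survey compares differ mainly in how this ranking is computed, which rules are pruned before prediction, and whether one rule or several vote on the final class.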