HANDLING MISSING ATTRIBUTE VALUES IN DECISION TABLES USING VALUED TOLERANCE APPROACH

Author: Vasudevan Supriya
Publication venue: 'Paleontological Institute at The University of Kansas'
Publication date: 01/01/2008

Rule induction is one of the key areas in data mining, as it is applied to a large number of real-life data sets. However, in such real-life data the information is incompletely specified most of the time, and inducing rules from incomplete data requires more powerful algorithms. This research work focuses on a probabilistic approach based on the valued tolerance relation. The thesis is divided into two parts. The first part describes the implementation of the valued tolerance relation; the induced rules are then evaluated by the error rate due to incorrectly classified and unclassified examples. The second part compares the rules induced by the previously implemented MLEM2 algorithm with the rules induced by the valued tolerance based approach implemented as part of this research. The thesis thus compares the error rates of the MLEM2 algorithm and the valued tolerance based approach and documents the results.

KU ScholarWorks
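
The abstract above does not spell out how the valued tolerance relation is computed. As a rough illustration only, the sketch below assumes one simple probabilistic reading: a missing value is treated as uniformly distributed over its attribute's domain, attributes are independent, and the degree to which two cases are "tolerant" is the probability that they agree on every attribute. The names `MISSING`, `attribute_agreement` and `valued_tolerance` are hypothetical and are not taken from the thesis.

```python
from math import prod

MISSING = None  # hypothetical marker for an unknown attribute value


def attribute_agreement(a, b, domain_size):
    """Probability that two attribute values are equal.

    Assumption: a missing value is uniformly distributed over the
    attribute's domain, so it matches any other value (known or
    missing) with probability 1 / domain_size.
    """
    if a is MISSING or b is MISSING:
        return 1.0 / domain_size
    return 1.0 if a == b else 0.0


def valued_tolerance(x, y, domain_sizes):
    """Product of per-attribute agreement probabilities for two cases.

    Assumption: attributes are independent, so the probabilities multiply.
    """
    return prod(
        attribute_agreement(a, b, n) for a, b, n in zip(x, y, domain_sizes)
    )
```

For example, with a three-valued first attribute and a two-valued second attribute, `valued_tolerance(("high", None), ("high", "yes"), (3, 2))` multiplies 1.0 for the matching known values by 1/2 for the missing one.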

MRDTL: a multi-relational decision tree learning algorithm

Author: Leiva Héctor Ariel
Publication venue: Iowa State University Digital Repository
Publication date: 01/01/2002

Many real-world data sets are organized in relational databases consisting of multiple tables and associations. Other types of data, such as those in bioinformatics and computational biology, and HTML and XML documents, require reasoning about the structure of the objects. However, most existing approaches to machine learning assume that the data are stored in a single table and use a propositional (as opposed to relational) language for discovering predictive models. Hence, there is a need for data mining algorithms that discover a priori unknown relationships from multi-relational data. This thesis explores a new framework for multi-relational data mining. It describes experiments with an implementation of a Multi-Relational Decision Tree Learning (MRDTL) algorithm for inducing decision trees from relational databases, based on an approach suggested by Knobbe et al., 1999. Our experiments with widely used benchmark data sets (e.g., the carcinogenesis data) show that the performance of MRDTL is competitive with that of other algorithms for learning classifiers from multiple relations, including Progol (Muggleton, 1995), FOIL (Quinlan, 1993) and Tilde (Blockeel, 1998). Preliminary results indicate that MRDTL, when augmented with principled methods for handling missing attribute values, is likely to be competitive with state-of-the-art algorithms for learning classifiers from multiple relations on real-world data sets drawn from bioinformatics applications (prediction of gene localization and gene function) used in the KDD Cup 2001 data mining competition (Cheng et al., 2002).

Digital Repository @ Iowa State University (ISU)

Rough Fuzzy Subspace Clustering for Data with Missing Values

Author: Simiński Krzysztof
Publication venue: Institute of Informatics, Slovak Academy of Sciences
Publication date: 03/06/2014

The paper presents a rough fuzzy subspace clustering algorithm and experimental clustering results. The algorithm uses three approaches for handling missing values: marginalisation, imputation and rough sets. It also assigns weights to attributes in each cluster, which leads to subspace clustering. The cluster parameters are computed in an iterative procedure that minimises a criterion function. The crucial parameter of the proposed algorithm controls the sharpness of the resulting subspace clusters: lower values lead to selection of only the most important attribute, while higher values create clusters in the global space rather than in subspaces. The paper is accompanied by clustering results for synthetic and real-life data sets.

Computing and Informatics (E-Journal - Institute of Informatics, SAS, Bratislava)

Ensemble missing data techniques for software effort prediction

Author: Cartwright Michelle
Twala Bhekisipho
Publication venue: Intelligent Data Analysis, IOS Press
Publication date: 01/01/2010

Constructing an accurate effort prediction model is a challenge in software engineering. The development and validation of models used for prediction tasks require good quality data. Unfortunately, software engineering datasets tend to be incomplete, which can result in inaccurate decision making, project management and implementation. Recently, machine learning algorithms, including ensembles of (combined) classifiers, have proven to be of great practical value in solving a variety of software engineering problems, among them software prediction. Research indicates that combining individual classifiers into an ensemble, in which they vote for the most popular class, leads to a significant improvement in classification performance. This paper proposes a method for improving the software effort prediction accuracy of a decision tree learning algorithm by generating the ensemble with two imputation methods as its elements. Benchmarking results on ten industrial datasets show that the proposed ensemble strategy has the potential to improve prediction accuracy compared to an individual imputation method, especially if multiple imputation is a component of the ensemble.

University of Johannesburg Institutional Repository
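
The voting step mentioned in the abstract above can be sketched as follows. This is a minimal illustration, not the paper's method: it assumes each ensemble member (for instance, a decision tree trained on a differently imputed copy of the data) has already produced one class label per example, and it combines those labels by plurality vote. The function name `majority_vote` and the tie-breaking behaviour are assumptions made here.

```python
from collections import Counter


def majority_vote(predictions_per_model):
    """Combine per-model label lists into one label per example.

    predictions_per_model: one list of labels per ensemble member,
    all of the same length (one label per example).
    """
    combined = []
    for votes in zip(*predictions_per_model):
        # most_common(1) returns the label with the highest count;
        # among equal counts, the label encountered first wins.
        combined.append(Counter(votes).most_common(1)[0][0])
    return combined
```

With three hypothetical members, `majority_vote([["low", "high"], ["low", "low"], ["high", "low"]])` yields one label per example. With only two members, disagreements are ties, so a real ensemble would need an explicit tie-breaking rule, such as weighting members by confidence.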

Research on data mining technology and its application in teaching management in Colleges and Universities

Author: 王杰
Publication date: 17/07/2015

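In recent years, China's education sector has entered a period of rapid development. Universities are growing in scale, enrolment and teaching staff, and their modes of operation are becoming increasingly diversified and individualized, which makes teaching management more difficult. The traditional teaching management model can no longer meet the needs of university development, so there is an urgent need to improve the quality and efficiency of teaching management. As information technology has developed and spread, the informatization of universities has advanced steadily and achieved notable results. As a consequence, universities have accumulated large amounts of relevant data, and only by fully mining and analysing the value contained in these data can teaching management be further improved. Data mining is an effective method for this purpose: it can uncover and analyse the information hidden behind the data and provide decision support for teaching management. This thesis first ...

Degree: Master of Engineering. Department and major: School of Software, Software Engineering. Student ID: X201223029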

Xiamen University Institutional Repository

Comparison of Cart and Naive Bayesian Algorithm Performance to Diagnose Diabetes Mellitus

Author: Santiko Irfan
Subarkah Pungkas
Publication venue: Bright Publisher
Publication date: 01/09/2019

Based on Indonesia's 2008 health profile, Diabetes Mellitus ranked sixth among causes of death for all ages in Indonesia, accounting for 5.7% of deaths, behind stroke, TB, hypertension, injury and perinatal conditions. According to WHO (2003), Diabetes Mellitus affected 194 million people, or 5.1 percent of the world's adult population, and the number is expected to increase to 333 million by 2025. In Indonesia in particular, the number of people with Diabetes Mellitus is increasing: in 2000 it had reached 8.4 million, and the figure is estimated to reach 21.3 million by 2030. This motivates researchers and practitioners to focus on detecting and diagnosing diabetes mellitus and on preventing it, because the disease can cause complications. The research method consisted of problem identification, data collection, pre-processing, classification, validation and evaluation, and conclusion. The algorithms used were CART and Naïve Bayes, applied to the Pima Indians dataset from the UCI repository, which consists of clinical data of patients who tested positive or negative for diabetes mellitus. Validation and evaluation used 10-fold cross-validation and a confusion matrix to assess precision, recall and F-Measure. The CART algorithm achieved an accuracy of 76.9337% with precision 0.764, recall 0.769 and F-Measure 0.765. The Naïve Bayes algorithm achieved an accuracy of 73.7569% with precision 0.732, recall 0.738 and F-Measure 0.734. From these results, the authors conclude that the CART algorithm is the suggested choice for diagnosing diabetes mellitus.

Directory of Open Access Journals

IJIIS - International Journal of Informatics and Information Systems
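
The evaluation measures named in the abstract above (accuracy, precision, recall and F-Measure, all derived from a confusion matrix) can be sketched for the two-class case as follows. This is a minimal illustration using the standard textbook definitions; the function name `binary_metrics` and the example counts are hypothetical, and the paper's reported figures may have been averaged across classes in a way this sketch does not reproduce.

```python
def binary_metrics(tp, fp, fn, tn):
    """Compute standard metrics from binary confusion-matrix counts.

    tp/fp/fn/tn: true positives, false positives, false negatives,
    true negatives. Assumes the denominators below are non-zero.
    """
    precision = tp / (tp + fp)   # share of predicted positives that are correct
    recall = tp / (tp + fn)      # share of actual positives that are found
    f_measure = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
        "accuracy": accuracy,
    }
```

For instance, `binary_metrics(tp=40, fp=10, fn=20, tn=30)` uses made-up counts for 100 patients; in a 10-fold cross-validation the counts would typically be accumulated over the ten held-out folds before the metrics are computed.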

CLASSIFICATION MODEL FOR LEARNING DISABILITIES IN ELEMENTARY SCHOOL PUPILS

Author: AGBOOLA I. A.
AWOYELU I. O.
Publication venue: Federal University of Agriculture, Abeokuta (FUNAAB)
Publication date: 06/11/2019

Learning disability is a general term that describes specific kinds of learning problems. Although learning disability cannot be cured medically, several methods exist for detecting learning disabilities in a child. Existing methods classify learning disabilities in children in a binary fashion: a child is either normal or learning disabled. The focus of this paper is to extend the binary classification to multi-label classification of learning disabilities. This paper formulated and simulated a classification model for learning disabilities in primary school pupils. Information on the symptoms of learning disabilities in pupils was elicited by administering five hundred (500) questionnaires to teachers of Primary One to Four pupils in fifteen government-owned elementary schools within Ife Central Local Government Area, Ile-Ife, Osun State. The classification model was formulated using Principal Component Analysis, a rule-based system and the back propagation algorithm. The formulated model was simulated using the Waikato Environment for Knowledge Analysis (WEKA) version 3.7.2. The performance of the model was evaluated using precision and accuracy. The classification models for Primary One, Primary Two, Primary Three and Primary Four yielded precision rates of 95%, 91.18%, 93.10% and 93.60% respectively, while the accuracy results were 95.00%, 91.18%, 93.10% and 93.60% respectively. The results showed that the developed model was accurate and precise in classifying pupils with learning disabilities in primary schools. The model can be adopted for the management of pupils with learning disabilities.

Federal University of Agriculture, Abeokuta: FUNAAB Journal