95 research outputs found
Log Mining Using Generalized Association Rules
Explosive growth in the size and usage of the World Wide Web has made it necessary for Web site administrators to track and analyze the navigation patterns of Web site visitors. To achieve this goal, a web mining tool is necessary. Web mining can be defined as the use of data mining techniques to automatically discover and extract information from web documents. Because data mining is primarily concerned with the discovery of knowledge and aims to provide answers to questions that people do not know how to ask, it is not an automatic process. Rather, one has to exhaustively explore very large volumes of data to determine otherwise hidden relationships. The process extracts high-quality information that can be used to draw conclusions based on relationships or patterns within the data. However, data mining techniques are not easily applicable to Web data because of problems related both to the technology underlying the Web and to the lack of standards in the design and implementation of Web pages. Information collected by Web servers is kept in the server log, which is the main source of data for analyzing user navigation patterns. Once the logs have been pre-processed and sessions have been obtained, several kinds of access pattern mining can be performed, depending on the needs of the analyst. Because the method used in this study relies on relatively simple techniques, the noise in the data has to be tackled first before the information gathered is adequate for real user profile data. In this study, a data mining technique known as generalized association rules was used to gain some insight into website usage patterns. For the purpose of this study, server logs from the tutor.com portal were retrieved, pre-processed and analyzed. An important finding from this study is that the Mathematics subject is generally popular at the UPSR and PMR levels. In contrast, arts subjects are not popular with Tutor.com users. The system administrator may consider evaluating the content and the links for such subjects, so that the real problem can be identified.
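To make the idea of generalized association rules concrete, the following is a minimal Python sketch, not the study's implementation. It assumes that "generalized" means each session is extended with the ancestor categories of the pages it contains before support and confidence are counted. The taxonomy, page names and sessions are hypothetical and do not come from the tutor.com logs.

```python
# Hypothetical taxonomy (page -> parent category); not taken from the study.
TAXONOMY = {
    "math_upsr": "mathematics",
    "math_pmr": "mathematics",
    "art_pmr": "arts",
}


def extend_session(session, taxonomy):
    """Return the session's pages plus every ancestor category."""
    items = set(session)
    for page in session:
        parent = taxonomy.get(page)
        while parent is not None:
            items.add(parent)
            parent = taxonomy.get(parent)
    return frozenset(items)


def support(itemset, sessions):
    """Fraction of sessions that contain every item in itemset."""
    hits = sum(1 for s in sessions if itemset <= s)
    return hits / len(sessions)


def confidence(antecedent, consequent, sessions):
    """support(antecedent and consequent together) / support(antecedent)."""
    base = support(antecedent, sessions)
    if base == 0:
        return 0.0
    return support(antecedent | consequent, sessions) / base


# Made-up sessions, for illustration only.
raw_sessions = [
    ["home", "math_upsr"],
    ["home", "math_pmr"],
    ["home", "art_pmr"],
    ["math_pmr"],
]
sessions = [extend_session(s, TAXONOMY) for s in raw_sessions]

math_support = support(frozenset({"mathematics"}), sessions)
home_to_math = confidence(frozenset({"home"}), frozenset({"mathematics"}), sessions)
```

In this toy data no single Mathematics page appears in more than half of the sessions, but the category "mathematics" does, which is the kind of higher-level pattern that the taxonomy is meant to expose.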
Two-class classification: comparative experiments for chronic kidney disease
Over two million people worldwide currently depend on dialysis
treatment or a kidney transplant to survive kidney disease.
Therefore, it is imperative for health agencies such as hospitals
or insurance companies to predict the probability that a patient
suffers from chronic kidney disease and hence requires medical
attention. This study performs a comparative experiment on the
prediction of chronic kidney disease via a classification
methodology. Two supervised classification algorithms are used to
build the classification model: Two-Class Decision Forest and
Two-Class Neural Network. Experimental results showed that the
Neural Network performed better when all features were used, but
the Decision Forest produced the best overall performance, with
higher accuracy and precision than the Neural Network and other
algorithms from the literature such as K-Nearest Neighbor, Support
Vector Machine, and Rule Induction.
Elderly care monitoring system with IoT application
Falls among the elderly can have serious consequences, such as injury or even death. Therefore, it is essential that falls are detected early, and one way to do that is by using an IoT platform. The authors have been developing a wearable device for an elderly monitoring system that uses an accelerometer. The accelerometer data is sent to an Internet-of-Things (IoT) platform called ThingSpeak™. With the IoT platform, elderly patients can be monitored remotely as long as the care providers have good internet access. The paper presents the experimental results of determining the sensitivity and specificity of the accelerometer used in the proposed system. This is the first step towards accurate data acquisition for monitoring purposes. Based on the experimental results, the average sensitivity obtained for this device is 73.3%, while the average specificity obtained is 89.3%. Both the sensitivity and specificity tests show promising results, which indicate that the device has a fail rate of only 26.7% and an error rate of 10.7%.
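The relationship between the reported figures can be sketched in a few lines of Python. This assumes the standard definitions of sensitivity and specificity, and that the abstract's "fail rate" and "error rate" are their complements, which matches the numbers given. The confusion counts below are illustrative only and are not data from the paper.

```python
def sensitivity(true_pos, false_neg):
    """Share of real falls that the device detected."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg, false_pos):
    """Share of non-fall activities correctly left unflagged."""
    return true_neg / (true_neg + false_pos)


# Illustrative counts only (not from the paper).
example_sensitivity = sensitivity(true_pos=8, false_neg=2)
example_specificity = specificity(true_neg=9, false_pos=1)

# Complements of the percentages reported in the abstract.
fail_rate = round(100 - 73.3, 1)   # missed falls, in percent
error_rate = round(100 - 89.3, 1)  # false alarms, in percent
```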
Data pre-processing on web server logs for generalized association rules mining algorithm
Web log file analysis began as a way for IT administrators to ensure adequate bandwidth and server capacity on their organization's website. Log file data can offer valuable insight into web site usage. It reflects actual usage in natural working conditions, compared to the artificial setting of a usability lab. It represents the activity of many users over a potentially long period of time, compared to a limited number of users for an hour or two each. This paper describes the pre-processing techniques applied to IIS Web Server logs, from the raw log file up to the point where the mining process can be performed. Pre-processing is a tedious process, and the steps required depend on the algorithm and the purpose of the application.
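A pre-processing pipeline of this kind might look like the Python sketch below. It is a sketch under stated assumptions rather than the paper's procedure: it assumes logs in the W3C extended format that IIS writes (a "#Fields:" directive followed by space-separated records), and it assumes three common cleaning steps, namely dropping failed requests, dropping embedded resources such as images, and splitting each visitor's requests into sessions after 30 minutes of inactivity. The sample log, the ignored suffixes and the timeout value are all hypothetical.

```python
from datetime import datetime, timedelta

# Hypothetical sample in W3C extended log format.
SAMPLE_LOG = """\
#Software: Microsoft Internet Information Services
#Fields: date time c-ip cs-method cs-uri-stem sc-status
2005-01-10 08:00:01 10.0.0.1 GET /index.asp 200
2005-01-10 08:00:02 10.0.0.1 GET /images/logo.gif 200
2005-01-10 08:05:00 10.0.0.1 GET /math/upsr.asp 200
2005-01-10 08:06:00 10.0.0.2 GET /missing.asp 404
2005-01-10 09:30:00 10.0.0.1 GET /index.asp 200
"""

IGNORED_SUFFIXES = (".gif", ".jpg", ".png", ".css", ".js")  # assumed list
SESSION_TIMEOUT = timedelta(minutes=30)  # assumed heuristic


def parse_log(text):
    """Yield one dict per request, keyed by the names in the #Fields line."""
    fields = []
    for line in text.splitlines():
        if line.startswith("#Fields:"):
            fields = line.split()[1:]
        elif line and not line.startswith("#"):
            yield dict(zip(fields, line.split()))


def clean(records):
    """Keep successful requests for pages, dropping images and similar files."""
    for record in records:
        if record["sc-status"] != "200":
            continue
        if record["cs-uri-stem"].lower().endswith(IGNORED_SUFFIXES):
            continue
        yield record


def sessionize(records):
    """Group requests per client IP, starting a new session after a long gap."""
    sessions = []
    last_seen = {}  # ip -> (timestamp of last request, index into sessions)
    for record in records:
        stamp = datetime.strptime(
            f"{record['date']} {record['time']}", "%Y-%m-%d %H:%M:%S"
        )
        ip = record["c-ip"]
        previous = last_seen.get(ip)
        if previous is None or stamp - previous[0] > SESSION_TIMEOUT:
            sessions.append([])
            index = len(sessions) - 1
        else:
            index = previous[1]
        sessions[index].append(record["cs-uri-stem"])
        last_seen[ip] = (stamp, index)
    return sessions


sessions = sessionize(clean(parse_log(SAMPLE_LOG)))
```

Identifying visitors by IP address alone is a simplification; proxies and shared machines can merge or split real users, so a production pipeline would likely combine it with other fields such as the user agent.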
Comparing the knowledge quality in rough classifier and decision tree classifier
This paper presents a comparative study of two rule-based classifiers: rough set (Rc) and decision tree (DTc). The two techniques take different approaches to classification but produce the same structure of output with comparable results. In theory, different classifiers generate different sets of rules (knowledge) even when they are applied to the same classification problem. Hence, the aim of this paper is to investigate the quality of the knowledge produced by Rc and DTc when similar problems are presented to them. Four important performance metrics are used for the comparison: classification accuracy, number of rules, rule length and rule coverage. Five datasets from the UCI Machine Learning repository were chosen and mined using the Rc toolkit ROSETTA, while the C4.5 algorithm in the WEKA application was chosen as the DTc rule generator. The experimental results show that both Rc and DTc are capable of generating quality knowledge, since most of the results are comparable. Rc is the more accurate classifier and produces shorter, simpler rules with higher coverage, while DTc generates significantly fewer rules.
Discovering usage patterns from web server logs
As the amount of information available on the World Wide Web (WWW) increases rapidly, the number of sites that hold particular information also increases. To gain some insight into site usage, a system administrator needs tools that can aid in analyzing the site's usage. To achieve this goal, a web mining tool is necessary to discover the usage pattern of a particular site. For the purpose of this study, server logs from an educational portal were retrieved, pre-processed and analyzed. Information collected by the Web servers is kept in the server logs and used as the main source of data for analyzing users' navigation patterns. Once the server logs have been pre-processed and sessions have been obtained, several kinds of access pattern mining can be performed, depending on the needs of the analyst. In this study, a data mining technique known as Generalized Association Rule was used to gain some insight into website usage patterns. The findings from this study provide an overview of the usage pattern of a particular educational portal. The study also demonstrates how Generalized Association Rule can be used in site usage analysis. Such a technique enables the discovery of hidden information within the web server logs.
The preferable test documentation using IEEE 829
During software development, testing is one of the processes used to
find errors and to evaluate whether a program meets its required
results. The testing phase involves several activities, including user
acceptance testing, test procedures and others. If the testing phase is
not documented, difficulties that arise during testing may go
unresolved, because there is no reference that testers can consult to
overcome a recurring problem. IEEE 829 is one of the standards that
addresses these requirements. The standard specifies several documents
to be produced during testing, covering test preparation, test
execution and test completion. In this paper, we used this standard as
a guideline to analyze which documents companies prefer the most. Our
analytical study found that most companies in Malaysia prepare the
Test Plan and Test Summary documents.
Pattern extraction for programming performance evaluation using directed apriori
Computer programming is taught as a core subject in Information Technology related studies. It is one of the most essential skills that each student has to acquire. However, there is still a small number of students who are unable to write a program well. Several studies have indicated that many factors can affect student programming performance. Thus, the objective of this paper is to investigate the significant factors that may influence students' programming performance, using information from previous student performance. Because data mining can discover hidden knowledge in databases, a programming dataset comprising the performance profiles of Bachelor of Information Technology students of the Faculty of IT, Universiti Utara Malaysia, in the year 2004-2005 was explored using data mining techniques. The dataset, which consists of 421 records with 70 attributes of mixed types, was pre-processed and then mined using a directed association rule (AR) mining algorithm, namely Apriori. The results indicated that having programming experience before starting to learn programming at university, and scoring well in Mathematics and English at SPM level, were among the factors that contribute to good programming grades.
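The following Python sketch illustrates one way such a directed search could work. It is not the paper's algorithm: it assumes that "directed" means only rules whose consequent is a chosen target item (here a good grade) are kept, on top of a plain level-wise Apriori search. The attribute names and the four records are invented for illustration and are unrelated to the 421-record dataset.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Level-wise Apriori: return {itemset: support} for all frequent itemsets."""
    total = len(transactions)
    items = {item for t in transactions for item in t}
    level = {frozenset([item]) for item in items}
    frequent = {}
    while level:
        # Count support for the current candidates and keep the frequent ones.
        kept = {}
        for candidate in level:
            share = sum(1 for t in transactions if candidate <= t) / total
            if share >= min_support:
                kept[candidate] = share
        frequent.update(kept)
        if not kept:
            break
        size = len(next(iter(kept))) + 1
        # Join step: merge frequent itemsets into candidates one item larger.
        joined = {a | b for a in kept for b in kept if len(a | b) == size}
        # Prune step: every subset one item smaller must itself be frequent.
        level = {
            c for c in joined
            if all(frozenset(sub) in kept for sub in combinations(c, size - 1))
        }
    return frequent


def directed_rules(transactions, target, min_support, min_confidence):
    """Return (antecedent, support, confidence) for rules 'antecedent -> target'."""
    frequent = frequent_itemsets(transactions, min_support)
    rules = []
    for itemset, share in frequent.items():
        if target in itemset and len(itemset) > 1:
            antecedent = itemset - {target}
            conf = share / frequent[antecedent]
            if conf >= min_confidence:
                rules.append((antecedent, share, conf))
    return rules


# Invented records, for illustration only.
records = [
    frozenset({"prior_programming=yes", "math=good", "grade=good"}),
    frozenset({"prior_programming=yes", "math=good", "grade=good"}),
    frozenset({"prior_programming=no", "math=good", "grade=poor"}),
    frozenset({"prior_programming=no", "math=poor", "grade=poor"}),
]
rules = directed_rules(records, "grade=good", min_support=0.5, min_confidence=0.8)
```

Fixing the consequent keeps the rule list focused on the question being asked (which factors go with a good grade) instead of listing every association among the attributes.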
- …