3 research outputs found

Comparison of Data Mining and Statistical Techniques for Prediction Model

Author: AMJAD A. M. HARB
امجد عبد المنعم محمود حرب
Publication venue: Al-Quds University (جامعة القدس)
Publication date: 10/05/2012

This research compares statistical and data mining modeling techniques for building classification and prediction models. The techniques studied are statistical Logistic Regression and the data mining techniques Decision Tree and Neural Network. Their performance was measured and compared on a common metric, the overall prediction accuracy (percentage agreement) of each technique, and the models were trained on eight training samples drawn using two different sampling techniques. The study also examined how the distribution of the dependent variable's values in the training dataset affects both the overall prediction accuracy and the prediction accuracy for the individual values “0” and “1”. For the given dataset, the results show that the three techniques performed comparably in general, with the Neural Network slightly ahead. The factor that made prediction accuracy vary was the distribution of “0” and “1” values of the dependent variable in the training dataset. For all three techniques, the overall prediction accuracy was high when the ratio of the dependent variable's values in the training data was greater than 1:1, but the techniques then failed to predict the individual values of the dependent variable successfully or at an acceptable rate. If the individual values of the dependent variable need to be predicted with comparable accuracy, the ratio of those values in the training data should be exactly 1:1.
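
The gap this abstract describes between overall accuracy and per-value accuracy can be illustrated with a small sketch. This is not the author's experiment: the labels below are invented, the "model" is a stand-in that always predicts the majority value, and random undersampling is only one assumed way of reaching a 1:1 ratio (the abstract does not name its two sampling techniques). The function names `accuracy_report` and `undersample_to_balance` are hypothetical.

```python
import random


def accuracy_report(y_true, y_pred):
    """Return overall accuracy and the accuracy on each class (0 and 1)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")
    overall = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    per_class = {}
    for label in (0, 1):
        # Positions whose true value is this label.
        idx = [i for i, t in enumerate(y_true) if t == label]
        hits = sum(y_pred[i] == label for i in idx)
        per_class[label] = hits / len(idx) if idx else float("nan")
    return overall, per_class


def undersample_to_balance(rows, labels, seed=0):
    """Randomly drop majority-class rows so classes 0 and 1 end up 1:1.

    Assumption: undersampling is used here only for illustration.
    """
    rng = random.Random(seed)
    by_class = {0: [], 1: []}
    for row, label in zip(rows, labels):
        by_class[label].append(row)
    n = min(len(by_class[0]), len(by_class[1]))
    balanced = [(row, label) for label in (0, 1)
                for row in rng.sample(by_class[label], n)]
    rng.shuffle(balanced)
    return [r for r, _ in balanced], [l for _, l in balanced]


# Invented data: 90 zeros and 10 ones, and a model that always predicts 0.
y_true = [0] * 90 + [1] * 10
y_pred = [0] * 100
overall, per_class = accuracy_report(y_true, y_pred)
print(overall)    # 0.9
print(per_class)  # {0: 1.0, 1: 0.0}
```

With a 9:1 ratio the stand-in model reaches 90% overall accuracy while never predicting a “1” correctly, which is why reporting the two per-value accuracies alongside the overall figure matters.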

Al-Quds University Digital Repository

Multi-dimensional clustering in user profiling

Author: Cufoglu A.
Publication date: 01/01/2012

User profiling has attracted an enormous number of technological methods and applications. With the increasing number of products and services, user profiling has created opportunities to catch the user's attention and to achieve high user satisfaction. Providing users with what they want, when and how they want it, depends largely on understanding them. The user profile is the representation of the user and holds information about the user; these profiles are the outcome of user profiling. Personalization is the adaptation of services to meet the user’s needs and expectations, so knowledge about the user leads to a personalized user experience. In user profiling applications, the major challenge is to build and handle user profiles. The literature describes two main user profiling methods: collaborative and content-based. Apart from these traditional methods, a number of classification and clustering algorithms have been used to classify user-related information and create user profiles. However, the profiling achieved in these works lacks accuracy, because all information within the profile has the same influence during profiling even though some of it is irrelevant. A primary aim of this thesis is to provide insight into the concept of user profiling. For this purpose, a comprehensive background study of the literature was conducted and is summarized in the thesis. Furthermore, existing user profiling methods as well as classification and clustering algorithms were investigated. As one of the objectives of this study, the use of these algorithms for user profiling was examined. A number of classification and clustering algorithms, such as Bayesian Networks (BN) and Decision Trees (DTs), were simulated using user profiles, and their classification accuracy was evaluated.
Additionally, a novel clustering algorithm for user profiling, namely Multi-Dimensional Clustering (MDC), is proposed. MDC is a modified version of the Instance Based Learner (IBL) algorithm. In IBL, every feature has an equal effect on the classification regardless of its relevance. MDC differs from IBL by assigning weights to feature values to distinguish the effect of the features on clustering. Existing feature weighting methods, for instance Cross Category Feature (CCF), have also been investigated. In this thesis, three feature value weighting methods are proposed for MDC: MDC weight method by Cross Clustering (MDC-CC), MDC weight method by Balanced Clustering (MDC-BC) and MDC weight method by changing the Lower-limit to Zero (MDC-LZ). All of these weighted MDC algorithms have been tested and evaluated. Additional simulations were carried out with existing weighted and non-weighted IBL algorithms (i.e. K-Star and Locally Weighted Learning (LWL)) in order to demonstrate the performance of the proposed methods. Furthermore, a real-life scenario is implemented to show how MDC can be used for user profiling to improve personalized service provisioning in mobile environments. The experiments presented in this thesis were conducted using user profile datasets that reflect the user’s personal information, preferences and interests. The simulations with existing classification and clustering algorithms (e.g. Bayesian Networks (BN), Naïve Bayesian (NB), Lazy learning of Bayesian Rules (LBR), Iterative Dichotomiser 3 (Id3)) were performed on the WEKA (version 3.5.7) machine learning platform. WEKA serves as a workbench for working with a collection of popular learning schemes implemented in Java. In addition, MDC-CC, MDC-BC and MDC-LZ have been implemented as a Java application on NetBeans IDE 6.1 Beta and in MATLAB.
Finally, the real-life scenario is implemented as a Java Mobile Application (Java ME) on NetBeans IDE 7.1. All simulation results were evaluated based on the error rate and accuracy.
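
The contrast this abstract draws between an unweighted instance-based learner and a feature-weighted one can be sketched as follows. This is a generic weighted nearest-neighbour illustration, not the MDC algorithm or any of its MDC-CC, MDC-BC or MDC-LZ weighting methods, which the abstract does not specify. As a simplification it weights whole features rather than feature values, and the profiles, labels, weights and function names are all hypothetical.

```python
def weighted_mismatch(a, b, weights):
    """Sum the weights of the features on which two profiles disagree."""
    return sum(w for x, y, w in zip(a, b, weights) if x != y)


def nearest_label(query, profiles, labels, weights):
    """Return the label of the stored profile closest to the query (1-NN)."""
    best = min(range(len(profiles)),
               key=lambda i: weighted_mismatch(query, profiles[i], weights))
    return labels[best]


# Invented categorical profiles: (age group, city, favourite genre).
profiles = [("18-25", "London", "rock"), ("40-60", "Paris", "jazz")]
labels = ["A", "B"]
query = ("18-25", "Paris", "jazz")

# Equal weights: every feature counts the same, as the abstract says of IBL.
print(nearest_label(query, profiles, labels, (1, 1, 1)))  # B

# Assumed weights that treat the age group as far more relevant.
print(nearest_label(query, profiles, labels, (5, 1, 1)))  # A
```

With equal weights the query matches the second profile on two of three features and is labelled B. Once the age group is weighted heavily, the single age mismatch outweighs the other two features and the label changes to A, which is the kind of effect feature weighting is meant to expose.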

WestminsterResearch

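National Seminar on Technology Innovation and Computer Science (2021), Theme: “Prospects of Becoming a Technopreneur During the Pandemic” [Seminar Nasional Inovasi Teknologi dan Ilmu Komputer (2021), Tema: “Prospek Menjadi Technopreneur Dimasa Pandemi”]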

Author: SNITIK 2021
Publication venue: PUBLISH BUKU UNPRI PRESS ISBN
Publication date: 05/03/2022

The National Seminar on Technology Innovation and Computer Science (Seminar Nasional Inovasi Teknologi dan Ilmu Komputer, SNITIK 2021) is an event held regularly by the Faculty of Technology and Computer Science, Universitas Prima Indonesia (FTIK UNPRI). The seminar was originally named Semnas FTIK and ran under that name for four years, after which it was renamed SNITIK with a broader scope. In its seventh year, the seminar adopted the theme “Prospects of Becoming a Technopreneur During the Pandemic” (“Prospek Menjadi Technopreneur Dimasa Pandemi”). The Covid-19 pandemic has strongly affected several global industrial and business sectors. During the pandemic, most customers have shopped online more often because it is considered easier and more practical. This shows that business today is closely tied to technology, so technology needs to be used to develop new business models that create business opportunities. These conditions encourage industry to employ university graduates who are competent and have a technopreneurial spirit.

Universitas Prima Indonesia: Open Journal Systems