Comparison of Data Mining and Statistical Techniques for Prediction Model
The aim of this research is to compare statistical and data mining modeling techniques. The techniques studied are statistical logistic regression and two data mining techniques, decision tree and neural network. The performance of these prediction techniques was measured and compared using the overall percentage of prediction accuracy (percentage agreement) for each technique. The models were trained on eight different training dataset samples drawn with two different sampling techniques. The study also examined experimentally how the distribution of the dependent variable's values in the training dataset affects the overall prediction accuracy and the prediction accuracy for the individual "0" and "1" values of the dependent variable. For the given dataset, the results show that the performance of the three techniques was generally comparable, with the neural network slightly outperforming the others. One factor that causes the percentage of prediction accuracy to vary is the distribution of the dependent variable's values ("0" and "1") in the training dataset. For all three techniques, the overall prediction accuracy was high when the ratio of the dependent variable's values in the training data was greater than 1:1, but at the same time the techniques failed to predict the individual dependent variable values successfully or with an acceptable accuracy. If the individual dependent variable values need to be predicted with comparable accuracy, the ratio of the dependent variable's values in the training data should be exactly 1:1.
The aim of this study is to compare the efficiency and effectiveness of statistical methods and data mining techniques for building classification and scientific prediction models. The algorithms and techniques whose performance was studied and compared are statistical logistic regression and two data mining techniques, decision tree and neural network. The performance of these techniques was measured and compared using a common measure, namely the overall percentage of prediction accuracy for each technique. The models were trained using eight training data samples drawn with two statistical sampling techniques. The study also examined the effect of the distribution of the dependent variable's values in the training data, both on the overall percentage of prediction accuracy for each technique and on the percentage of prediction accuracy for the individual dependent variable values "0" and "1" for each technique. The results showed that the performance of the three techniques was generally close and comparable, with a slight advantage for the neural network algorithm. One factor was identified that affects the variation in the percentage of prediction accuracy: the distribution of the dependent variable's values in the model training data, that is, the distribution of "0" and "1". The results also showed that the overall percentage of prediction accuracy of the three techniques was high when the distribution ratio of the dependent variable's values in the training data was greater than 1:1, but at the same time the algorithms and techniques under study failed to predict the individual values of the dependent variable successfully or with an acceptable prediction percentage. In applications using these techniques, if the goal is a high percentage of prediction for the individual values of the dependent variable, with close prediction percentages for the two values, then the distribution ratio of the dependent variable's values in the training data must be exactly equal, 1:1.
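The gap between overall accuracy and per-value accuracy, and the 1:1 training ratio the study recommends, can be illustrated with a minimal Python sketch. The function names `accuracy_report` and `balanced_sample` are hypothetical, the data are invented, and random undersampling of the majority value is assumed here as one way to obtain a 1:1 sample; the abstract does not say which two sampling techniques were actually used.

```python
import random
from collections import Counter


def accuracy_report(y_true, y_pred):
    """Overall accuracy plus per-class accuracy (share of each true class predicted correctly)."""
    correct = Counter()
    total = Counter()
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    overall = sum(correct.values()) / len(y_true)
    per_class = {c: correct[c] / total[c] for c in sorted(total)}
    return overall, per_class


def balanced_sample(rows, label_of, seed=0):
    """Assumed approach: randomly undersample the majority class so labels 0 and 1 are 1:1."""
    rng = random.Random(seed)
    zeros = [r for r in rows if label_of(r) == 0]
    ones = [r for r in rows if label_of(r) == 1]
    n = min(len(zeros), len(ones))
    sample = rng.sample(zeros, n) + rng.sample(ones, n)
    rng.shuffle(sample)
    return sample


if __name__ == "__main__":
    # Invented 9:1 test labels and a degenerate model that always predicts 0.
    y_true = [0] * 90 + [1] * 10
    y_pred = [0] * 100
    overall, per_class = accuracy_report(y_true, y_pred)
    print(overall, per_class)  # 0.9 {0: 1.0, 1: 0.0}

    # Invented training rows as (features, label) pairs in a 3:1 ratio.
    rows = [({"x": i}, 0) for i in range(30)] + [({"x": i}, 1) for i in range(10)]
    train = balanced_sample(rows, label_of=lambda r: r[1])
    print(sorted(Counter(label for _, label in train).items()))  # [(0, 10), (1, 10)]
```

The first result mirrors the pattern the study reports: high overall accuracy alongside a complete failure on the minority value. Undersampling is only one way to reach a 1:1 ratio; oversampling the minority value is another.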
Multi-dimensional clustering in user profiling
User profiling has attracted an enormous number of technological methods and
applications. With the increasing number of products and services, user
profiling has created opportunities to catch the user's attention as well as to
achieve high user satisfaction. Providing users with what they want, when and
how they want it, depends largely on understanding them. The user profile is
the representation of the user and holds information about the user; user
profiles are the outcome of user profiling.
Personalization is the adaptation of services to meet the user's needs and
expectations; therefore, knowledge about the user leads to a personalized user
experience. In user profiling applications, the major challenge is to build and
handle user profiles. The literature describes two main user profiling methods:
collaborative and content-based. Apart from these traditional profiling
methods, a number of classification and clustering algorithms have been used
to classify user-related information to create user profiles. However, the
profiles produced by these approaches lack accuracy, because all information
within a profile has the same influence during profiling, even though some of
that information is irrelevant.
A primary aim of this thesis is to provide insight into the concept of user
profiling. For this purpose, a comprehensive background study of the literature
was conducted and is summarized in this thesis. Furthermore, existing user
profiling methods, as well as classification and clustering algorithms, were
investigated. As one of the objectives of this study, the use of these
algorithms for user profiling was examined. A number of classification and
clustering algorithms, such as Bayesian Networks (BN) and Decision Trees (DTs),
were simulated using user profiles, and their classification accuracy was
evaluated. Additionally, a novel clustering algorithm for user profiling,
namely Multi-Dimensional Clustering (MDC), is proposed. MDC is a modified
version of the Instance Based Learner (IBL) algorithm. In IBL, every feature
has an equal effect on the classification regardless of its relevance. MDC
differs from IBL by assigning weights to feature values to distinguish the
effect of the features on clustering. Existing feature weighting methods, for
instance Cross Category Feature (CCF), were also investigated. In this thesis,
three feature value weighting methods are proposed for MDC: MDC weight method
by Cross Clustering (MDC-CC), MDC weight method by Balanced Clustering (MDC-BC)
and MDC weight method by changing the Lower-limit to Zero (MDC-LZ). All of
these weighted MDC algorithms were tested and evaluated. Additional
simulations were carried out with existing weighted and non-weighted IBL
algorithms (i.e. K-Star and Locally Weighted Learning (LWL)) to demonstrate
the performance of the proposed methods. Furthermore, a real-life scenario was
implemented to show how MDC can be used for user profiling to improve
personalized service provisioning in mobile environments.
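To make the difference between plain IBL and a feature-value-weighted variant concrete, here is a minimal Python sketch of a 1-nearest-neighbour learner over nominal features. It is not MDC: the abstract does not give the MDC-CC, MDC-BC or MDC-LZ formulas. The weights follow a CCF-style scheme, assumed here to be the sum over classes of the squared conditional class probability for each feature value; the function names and data are hypothetical.

```python
from collections import Counter, defaultdict


def ccf_value_weights(rows, labels):
    """Assumed CCF-style weight per (feature index, value): sum over classes of P(class | value)^2.

    Illustrative only; this is not the thesis's MDC weighting.
    """
    counts = defaultdict(Counter)  # (feature index, value) -> class counts
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            counts[(i, value)][label] += 1
    weights = {}
    for key, by_class in counts.items():
        n = sum(by_class.values())
        weights[key] = sum((c / n) ** 2 for c in by_class.values())
    return weights


def distance(query, instance, weights=None):
    """Overlap distance on nominal features.

    With weights=None every mismatch costs 1, as in plain IBL. Otherwise a
    mismatch on feature i costs the weight of the query's value for that
    feature (unseen values default to 1.0).
    """
    total = 0.0
    for i, (q, x) in enumerate(zip(query, instance)):
        if q != x:
            total += 1.0 if weights is None else weights.get((i, q), 1.0)
    return total


def nearest_label(query, rows, labels, weights=None):
    """1-nearest-neighbour prediction; ties go to the earliest row."""
    best = min(range(len(rows)), key=lambda j: distance(query, rows[j], weights))
    return labels[best]


if __name__ == "__main__":
    # Invented data: feature 0 determines the class, feature 1 is noise.
    rows = [("a", "x"), ("a", "y"), ("b", "x"), ("b", "y")]
    labels = [1, 1, 0, 0]
    w = ccf_value_weights(rows, labels)
    q = ("a", "x")
    print(distance(q, ("a", "y")), distance(q, ("b", "x")))        # 1.0 1.0
    print(distance(q, ("a", "y"), w), distance(q, ("b", "x"), w))  # 0.5 1.0
    candidates, cand_labels = [("b", "x"), ("a", "y")], [0, 1]
    print(nearest_label(q, candidates, cand_labels))     # 0 (tie broken by row order)
    print(nearest_label(q, candidates, cand_labels, w))  # 1
```

With equal weights the two candidates tie, so the unweighted learner falls back on row order; with value weights, a mismatch on the noisy feature costs less than one on the class-determining feature, which is the kind of effect the abstract attributes to weighting feature values.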
The experiments presented in this thesis were conducted using user profile
datasets that reflect users' personal information, preferences and interests.
The simulations with existing classification and clustering algorithms (e.g.
Bayesian Networks (BN), Naïve Bayesian (NB), Lazy learning of Bayesian Rules
(LBR) and Iterative Dichotomiser 3 (ID3)) were performed on the WEKA (version
3.5.7) machine learning platform. WEKA serves as a workbench for a collection
of popular learning schemes implemented in Java. In addition, MDC-CC, MDC-BC
and MDC-LZ were implemented as a Java application in NetBeans IDE 6.1 Beta and
in MATLAB. Finally, the real-life scenario was implemented as a Java Mobile
Application (Java ME) in NetBeans IDE 7.1. All simulation results were
evaluated based on error rate and accuracy.
SEMINAR NASIONAL INOVASI TEKNOLOGI DAN ILMU KOMPUTER (National Seminar on Technology Innovation and Computer Science) (2021). THEME: “Prospects of Becoming a Technopreneur during the Pandemic”
The Seminar Nasional Inovasi Teknologi dan Ilmu Komputer (SNITIK 2021) is an event held regularly by the Faculty of Technology and Computer Science, Universitas Prima Indonesia (FTIK UNPRI). The seminar was originally called Semnas FTIK and ran under that name for four years; it was then renamed SNITIK and given a broader scope. In its seventh year, the seminar took up the theme “Prospects of Becoming a Technopreneur during the Pandemic.” The Covid-19 pandemic has strongly affected several global industrial and business sectors. During the pandemic, most customers shopped online more often because it was considered easier and more practical. This shows that business today is closely tied to technology, so technology needs to be used to develop new business models that create business opportunities. These conditions are pushing industry to employ competent university graduates with a technopreneurial spirit.