18 research outputs found

    Generic Framework for Gaining Insight Into Data

    Data analysis is easier when datasets have their columns in a horizontal tabular layout. Aggregations in standard SQL return one column per aggregated group, which limits their use for preparing such datasets. This paper proposes a framework for building datasets with a new class of functions called horizontal aggregations. To speed up dataset preparation, vertical aggregations are partitioned on the grouping column and the SPJ method is optimized. The paper also proposes integrating the summary dataset obtained from horizontal aggregation into homogeneous clusters using the K-means algorithm. DOI: 10.17762/ijritcc2321-8169.150616
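As a rough illustration of what a horizontal aggregation produces, the sketch below pivots grouped rows into one row per group with one summed column per pivot value. It is a minimal pure-Python sketch, not the framework from the paper; the function name `horizontal_sum` and the sales data are hypothetical.

```python
from collections import defaultdict

def horizontal_sum(rows, group_key, pivot_key, value_key):
    """Return one row per group, with one summed column per distinct pivot value."""
    # Every distinct pivot value becomes a column in the horizontal layout.
    pivot_values = sorted({row[pivot_key] for row in rows})
    # Each group starts with all columns at 0, so missing combinations stay 0.
    totals = defaultdict(lambda: dict.fromkeys(pivot_values, 0))
    for row in rows:
        totals[row[group_key]][row[pivot_key]] += row[value_key]
    return {group: dict(columns) for group, columns in totals.items()}

# Hypothetical input in the usual vertical layout: one row per (store, quarter) sale.
sales = [
    {"store": "A", "quarter": "Q1", "amount": 10},
    {"store": "A", "quarter": "Q2", "amount": 5},
    {"store": "A", "quarter": "Q1", "amount": 7},
    {"store": "B", "quarter": "Q2", "amount": 3},
]
table = horizontal_sum(sales, "store", "quarter", "amount")
```

Each key of `table` is a group and each inner dictionary is that group's row in the horizontal layout, which is the shape most clustering routines expect as input.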

    ON A RULE-BASED CLINICAL EXPERT SYSTEM BUILT ON DATA MINING TECHNOLOGY

    This work considers the software implementation of a rule induction method based on the sequential covering algorithm. This approach makes it possible to develop a clinical decision support system. The project is implemented in the NetBeans IDE using Java classes.

    Bayesian Classifiers Programmed In SQL Using PCA

    The Bayesian classifier is a fundamental classification technique. We also consider different concepts regarding dimensionality reduction techniques for retrieving lossless data. In this paper we propose a new architecture for pre-processing the data. We improve our Bayesian classifier to produce more accurate models for data sets with skewed distributions, data sets with missing information, and subsets of points that overlap significantly with each other, all of which are known issues for clustering algorithms. We are therefore interested in combining a dimensionality reduction technique such as PCA with Bayesian classifiers to accelerate computations and evaluate complex mathematical equations. The proposed architecture contains the following stages: pre-processing of input data, Naive Bayesian classifier, Bayesian classifier, principal component analysis, and database. Principal component analysis (PCA) is the process of reducing components by calculating eigenvalues and eigenvectors. We consider two algorithms in this paper: the Bayesian classifier based on K-means (BKM) and the Naive Bayesian Classifier Algorithm N

    DEVELOPMENT OF A CLINICAL DIAGNOSTIC SYSTEM BASED ON RULES BUILT BY THE SEQUENTIAL COVERING METHOD

    This work deals with the computational complexity of the rule induction algorithm based on sequential covering in the development of a clinical diagnostic system. The established estimates were confirmed experimentally by varying both the number of attributes and the volume of the training data sets.

    INFORMATION SUPPORT SYSTEM OF MEDICAL SYSTEM RESEARCH

    Background. Medical system research requires an information support system that implements data mining algorithms producing decision trees or IF-THEN rules. In addition, this system should be object-oriented and web-integrated. Objective. The aim of this study was to develop an information support system based on data mining algorithms applied to the system analysis method for medical system research. Methods. System analysis methods are used for qualitative analysis of mathematical models of diseases. Algorithms such as decision tree induction and the sequential covering algorithm are applied for data mining from the learning data set. Results. Given the complexity of the mathematical equations (nonlinear systems with delays), the scientific community needs new, powerful methods of exact parameter identification and qualitative analysis. From the point of view of theoretical medicine, uncertainties arising in models of diseases require treatment schemes that are effective, take toxicity constraints into account, enable better quality of life, and are cost-beneficial. A multivariate method of qualitative analysis of mathematical models can be used for classifying forms of pathological processes. Conclusions. Complex qualitative behavior of disease models, depending on parameters and controllers, was observed in our investigation even without considering the probabilistic nature of the majority of quantities and parameters of information models. KEY WORDS: data mining, system analysis, medical research, decision making

    ALGORITHM FOR CLASSIFYING POLYTRAUMA BY THE DECISION TREE INDUCTION METHOD

    In this work, a decision tree induction method for classifying polytrauma from a number of biochemical parameters is developed and implemented in software. The attribute selection algorithm uses the information gain of each attribute. The project is implemented in the NetBeans environment using Java classes.
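Attribute selection by information gain can be sketched as follows. This is a minimal illustration of the standard entropy-based definition, not the authors' Java implementation; the attribute names (`lactate`, `glucose`), the class label `severity`, and the records are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def information_gain(rows, attribute, target):
    """Entropy of the target minus the weighted entropy after splitting on attribute."""
    labels = [row[target] for row in rows]
    partitions = {}
    for row in rows:
        partitions.setdefault(row[attribute], []).append(row[target])
    remainder = sum(len(part) / len(rows) * entropy(part) for part in partitions.values())
    return entropy(labels) - remainder

def best_attribute(rows, attributes, target):
    """Pick the attribute with the highest information gain for the next tree node."""
    return max(attributes, key=lambda a: information_gain(rows, a, target))

# Hypothetical discretized records; real biochemical parameters would differ.
records = [
    {"lactate": "high", "glucose": "high", "severity": "severe"},
    {"lactate": "high", "glucose": "normal", "severity": "severe"},
    {"lactate": "normal", "glucose": "high", "severity": "mild"},
    {"lactate": "normal", "glucose": "normal", "severity": "mild"},
]
chosen = best_attribute(records, ["lactate", "glucose"], "severity")
```

In this toy data `lactate` separates the classes perfectly while `glucose` carries no information about the class, so a tree builder would split on `lactate` first.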

    Decision Table Algorithm Using Conditional Inner Join for Classifying Engineer Credit Score Assessment Results

    One of the requirements for promotion in the rank and position of an engineer (perekayasa) is a credit score determination letter (PAK). To obtain it, the engineer submits a list of proposed credit scores (DUPAK), which describes the activities carried out during a given period, to the secretariat of the Assessment Team for evaluation. The DUPAK assessment is carried out manually and is therefore often problematic, with errors in recording engineer data and in summarizing assessment results. To overcome these problems, the secretariat needs a supporting application. This paper discusses the design of the Engineer PAK Assessment Administration System (Sistem Administrasi Penilaian PAK Perekayasa, SAPPP), with emphasis on the use of Ajax technology in the application interface for ease of user interaction. The design results show that the SAPPP application with Ajax support can help the assessment process run smoothly and overcome the problems that previously occurred. Keywords: engineer, assessment, DUPAK, decision table, classification, decision making
    Abstract. One of the requirements for promotion in the rank and position of an engineer is a letter of determination of credit score. To obtain it, the engineer submits a list of proposed credit scores to the Appraiser Secretariat for assessment. Manual appraisal processes are often problematic, with errors in recording data, assessing, and assigning promotion recommendations. To overcome these problems, the Administration System for Assessment and Designation of Engineer Credit Rate (SAPPP) was designed and developed. The development emphasizes the implementation of the Decision Table (DT) algorithm using a conditional inner join; this method transforms the composition of the credit code into decision rules to obtain the classification of the assessment results used in decision making. The development results show that the SAPPP application, with the support of a classification and visualization system, can make the appraisal and determination of engineers' credit scores more effective and efficient. Keywords: engineer, assessment, proposed credit scores, decision table, classification, decision making

    Integration of the K-Means Algorithm with SQL for Clustering Student GPAs (Case Study: Faculty of Computer Science, Universitas Brawijaya)

    Abstract. In general, clustering applications are implemented outside the DBMS: data is first retrieved from the database and stored temporarily in a program variable (e.g., an array), and only then is clustering performed. Time and security concerns in retrieving data from the DBMS, together with the size of the data to be clustered, motivate an alternative in which clustering is performed directly where the data is stored, by integrating the clustering algorithm into the DBMS using SQL. This study focuses on the design and implementation of the K-means clustering algorithm on a relational DBMS using SQL. Clustering was carried out as a case study on student academic data from the Faculty of Computer Science, Universitas Brawijaya, using the features GPA, credits taken, credits passed, and semester. Tests on the academic dataset with varying numbers of dimensions, numbers of clusters, and distance calculation methods produced correct clustering results. A time-complexity calculation for each stage of the K-means implementation, with and without SQL, shows the same asymptotic time complexity, with the Euclidean distance computation stage having the highest time complexity. Keywords: clustering, K-means, SQL, GPA (Grade Point Average)
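The idea of running K-means inside the database can be sketched with Python's built-in `sqlite3` module, where both the assignment step and the centroid update are plain SQL statements. This is a minimal sketch under stated assumptions, not the implementation from the study: the table names (`points`, `centroids`, `assignment`), the two-dimensional data, and the function `kmeans_sql` are hypothetical, and the assignment query relies on SQLite-specific behavior in which bare columns alongside `MIN()` take their values from the row holding the minimum.

```python
import sqlite3

def kmeans_sql(points, centroids, iterations=10):
    """Run K-means on 2-D points with SQL statements; return (point id, cluster id) pairs."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE points (id INTEGER PRIMARY KEY, x REAL, y REAL)")
    conn.execute("CREATE TABLE centroids (cid INTEGER PRIMARY KEY, x REAL, y REAL)")
    conn.execute("CREATE TABLE assignment (id INTEGER PRIMARY KEY, cid INTEGER, d2 REAL)")
    conn.executemany("INSERT INTO points (x, y) VALUES (?, ?)", points)
    conn.executemany("INSERT INTO centroids (x, y) VALUES (?, ?)", centroids)
    for _ in range(iterations):
        # Assignment step: the cross join computes every point-to-centroid
        # squared Euclidean distance (same argmin as the true distance).
        # SQLite-specific: c.cid comes from the row that attains MIN(...).
        conn.execute("DELETE FROM assignment")
        conn.execute("""
            INSERT INTO assignment (id, cid, d2)
            SELECT p.id, c.cid,
                   MIN((p.x - c.x) * (p.x - c.x) + (p.y - c.y) * (p.y - c.y))
            FROM points p CROSS JOIN centroids c
            GROUP BY p.id
        """)
        # Update step: each centroid becomes the mean of its assigned points.
        # Simplification: a cluster that loses all its points is dropped.
        conn.execute("DELETE FROM centroids")
        conn.execute("""
            INSERT INTO centroids (cid, x, y)
            SELECT a.cid, AVG(p.x), AVG(p.y)
            FROM assignment a JOIN points p ON p.id = a.id
            GROUP BY a.cid
        """)
    result = conn.execute("SELECT id, cid FROM assignment ORDER BY id").fetchall()
    conn.close()
    return result

# Hypothetical data: two well-separated groups and one initial centroid in each.
clusters = kmeans_sql([(0, 0), (0, 1), (10, 10), (10, 11)], [(0, 0), (10, 10)])
```

The cross join in the assignment step touches every point-centroid pair on each iteration, which is consistent with the abstract's observation that the distance computation is the most expensive stage.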