175,402 research outputs found

    Survey: Data Mining Techniques in Medical Data Field

    Get PDF
    Nowadays, most research in this area applies data mining techniques to medical data. Knowledge discovery and data mining have found numerous applications in business and scientific domains, and valuable knowledge can be discovered by applying data mining techniques to healthcare systems. In this study, we briefly examine the potential use of classification-based data mining techniques such as rule-based methods, decision trees, machine learning algorithms like Support Vector Machines and Principal Component Analysis, Rough Set Theory, and fuzzy logic. In particular, we consider a case study applying classification techniques to a medical data set of diabetic patients.
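
    As a rough, hedged illustration of the classification techniques the survey covers, the sketch below trains a decision tree and an SVM on a hypothetical diabetes.csv with a binary "outcome" column; scikit-learn and the file layout are assumptions, since the survey does not prescribe a toolchain.

        # Minimal sketch: classification-based mining on a medical data set.
        # Assumes scikit-learn and a hypothetical diabetes.csv whose numeric
        # columns describe patients and whose "outcome" column marks diabetic
        # status; this is not the survey's own code.
        import pandas as pd
        from sklearn.model_selection import train_test_split
        from sklearn.tree import DecisionTreeClassifier
        from sklearn.svm import SVC
        from sklearn.metrics import accuracy_score

        df = pd.read_csv("diabetes.csv")                   # hypothetical patient records
        X, y = df.drop(columns=["outcome"]), df["outcome"]
        X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

        for name, clf in [("decision tree", DecisionTreeClassifier(max_depth=5)),
                          ("SVM", SVC(kernel="rbf"))]:
            clf.fit(X_tr, y_tr)
            print(name, accuracy_score(y_te, clf.predict(X_te)))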

    Predicting Accuracy of Income a Year Using Rough Set Theory

    Get PDF
    The main objective of the experiments is to predict, on the Adult dataset, whether income exceeds $50K per year or falls at or below $50K. Specifically, the objectives are to determine the best discretization method, split factor, reduction method, and classifier, and to build the classification model. In the experiments, the prediction accuracy on the Adult dataset is obtained using rough set theory and the Rosetta software, with Knowledge Discovery in Databases (KDD) as the methodology. The Adult dataset comprises 48,842 instances, but only 24,999 instances were used in the experiments. The data was then randomly split into training and testing data using nine split factors: 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8 and 0.9. The results showed that the best discretization method is the Naive Algorithm, the best split factor is 0.6, the best reduction method is Johnson's Algorithm, and the best classifier is Standard Voting. The highest accuracy achieved by the classification model built with rough set theory is 87.12%. The experiments showed that rough set theory is a useful approach for analyzing the Adult dataset, because the accuracy achieved exceeds that of previously used methods.
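
    The split-factor protocol is easy to reproduce in outline. The sketch below only varies the training fraction over the nine factors and records test accuracy; the study's actual pipeline (Rosetta with Naive discretization, Johnson's reducts, and standard voting) is not reproduced, and the stand-in classifier and adult.csv layout are assumptions.

        # Sketch of the split-factor experiment only: vary the train fraction
        # over the nine factors and record test accuracy. A stand-in
        # scikit-learn tree replaces the Rosetta rough-set pipeline.
        import pandas as pd
        from sklearn.model_selection import train_test_split
        from sklearn.tree import DecisionTreeClassifier

        df = pd.read_csv("adult.csv")                      # hypothetical CSV of the Adult data
        X = pd.get_dummies(df.drop(columns=["income"]))    # one-hot encode categoricals
        y = df["income"]

        for factor in [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]:
            X_tr, X_te, y_tr, y_te = train_test_split(X, y, train_size=factor,
                                                      random_state=0)
            acc = DecisionTreeClassifier().fit(X_tr, y_tr).score(X_te, y_te)
            print(f"split factor {factor}: accuracy {acc:.4f}")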

    Credibility coefficients based on frequent sets

    Get PDF
    Credibility coefficients are heuristic measures applied to the objects of an information system. They were introduced to assess the similarity of objects with respect to the other data in information systems or decision tables. By applying knowledge discovery methods, it is possible to derive rules and dependencies between data. However, the knowledge obtained from the data can be corrupted or incomplete due to improper data, so the importance of identifying these exceptions cannot be overestimated. It is assumed that the majority of the data is correct and only a minor part may be improper. The credibility coefficient of an object should indicate to which group that object probably belongs. The main focus of the paper is an algorithm for calculating credibility coefficients. This algorithm is based on frequent sets, which are produced during data analysis based on rough set theory. Some background on rough set theory is supplied to enable the credibility coefficient formulas to be expressed. The implementation and applications of credibility coefficients are presented, along with a discussion of practical results of identifying improper data by credibility coefficients.
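
    The rough-set machinery underneath is straightforward to sketch: group objects into elementary sets by their condition attributes, then score each object against its peers. The coefficient below, the share of an object's elementary set that agrees with its decision, is an illustrative heuristic only, not the paper's frequent-set-based formula.

        # Elementary sets of a toy decision table, plus an illustrative
        # per-object credibility score; the paper's frequent-set-based
        # coefficient is more elaborate than this agreement ratio.
        from collections import defaultdict

        table = [  # (condition attribute values, decision)
            (("high", "yes"), "sick"),
            (("high", "yes"), "sick"),
            (("high", "yes"), "healthy"),   # suspicious: disagrees with its peers
            (("low",  "no"),  "healthy"),
        ]

        elementary = defaultdict(list)      # indiscernibility classes
        for cond, dec in table:
            elementary[cond].append(dec)

        for i, (cond, dec) in enumerate(table):
            peers = elementary[cond]
            credibility = peers.count(dec) / len(peers)
            print(f"object {i}: credibility {credibility:.2f}")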

    Matroidal and Lattices Structures of Rough Sets and Some of Their Topological Characterizations

    Get PDF
    Matroids, rough set theory, and lattices are efficient tools for knowledge discovery. Lattices and matroids are studied on preapproximation spaces. Li et al. proved that a lattice is Boolean if it is a clopen set lattice for matroids. In our study, a lattice is Boolean if it is closed for matroids. Moreover, a topological lattice is discussed using its matroidal structure. Atoms in a complete atomic Boolean lattice are completely determined through its topological structure. Finally, a necessary and sufficient condition for a predefinable set is proved in preapproximation spaces. The value k for a predefinable set in the lattice of matroidal closed sets is determined.
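
    For orientation only, the classical Pawlak setting that these lattice results generalize can be recalled; the paper itself works in the more general preapproximation spaces.

        % Classical Pawlak approximations, recalled for orientation; the paper
        % works in the more general preapproximation spaces.
        \underline{R}(X) = \{\, x \in U : [x]_R \subseteq X \,\}, \qquad
        \overline{R}(X)  = \{\, x \in U : [x]_R \cap X \neq \emptyset \,\}.
        % X is definable (exact) when \underline{R}(X) = \overline{R}(X); the
        % definable sets form a complete atomic Boolean lattice whose atoms
        % are the equivalence classes [x]_R.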

    Knowledge structure, knowledge granulation and knowledge distance in a knowledge base

    Get PDF
    One of the strengths of rough set theory is that an unknown target concept can be approximately characterized by existing knowledge structures in a knowledge base. Knowledge structures in knowledge bases fall into two categories: complete and incomplete. In this paper, by expressing these two kinds of knowledge structures uniformly, we first introduce four operators on a knowledge base, which are adequate for generating new knowledge structures from known ones. Then, an axiomatic definition of knowledge granulation in knowledge bases is presented, under which several existing knowledge granulations become special cases. Finally, we introduce the concept of a knowledge distance for calculating the difference between two knowledge structures in the same knowledge base. Notably, the knowledge distance satisfies the three properties of a distance space on all knowledge structures induced by a given universe. These results are helpful for knowledge discovery from knowledge bases and significant for establishing a framework of granular computing in knowledge bases.
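
    As a concrete anchor for the axiomatic definition, one widely used granulation that satisfies such axioms, together with a symmetric-difference style of knowledge distance, can be written as below; the paper's exact normalization may differ, so treat this as a hedged sketch rather than the paper's definitions.

        % One common knowledge granulation satisfying granulation axioms: for
        % a knowledge structure K = {X_1, ..., X_m} partitioning the universe U,
        GK(K) = \frac{1}{|U|^2} \sum_{i=1}^{m} |X_i|^2 .
        % A knowledge distance in the symmetric-difference style (a hedged
        % variant; the paper's normalization may differ), where K(u) denotes
        % the granule of K containing u:
        D(K, K') = \frac{1}{|U|^2} \sum_{u \in U} \bigl| K(u) \,\triangle\, K'(u) \bigr| .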

    A Scalable and Effective Rough Set Theory based Approach for Big Data Pre-processing

    Get PDF
    A big challenge in the knowledge discovery process is to perform data pre-processing, specifically feature selection, on a large amount of data with a high dimensional attribute set. A variety of techniques have been proposed in the literature to deal with this challenge, with different degrees of success, as most of these techniques need further information about the given input data for thresholding, need noise levels to be specified, or use feature ranking procedures. To overcome these limitations, rough set theory (RST) can be used to discover the dependency within the data and reduce the number of attributes in an input data set using the data alone, requiring no supplementary information. However, when it comes to massive data sets, RST reaches its limits, as it is highly computationally expensive. In this paper, we propose a scalable and effective rough set theory-based approach for large-scale data pre-processing, specifically for feature selection, under the Spark framework. In our detailed experiments, data sets with up to 10,000 attributes were considered, revealing that our proposed solution achieves a good speedup and performs its feature selection task well without sacrificing performance, making it relevant to big data.
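
    The dependency idea the approach scales up can be sketched on a single machine. Below is a classic QuickReduct-style greedy loop over the rough-set dependency degree, assuming a pandas DataFrame with a "decision" column; the paper's distributed Spark implementation is not reproduced here.

        # Single-machine sketch of rough-set dependency-based feature selection
        # (classic QuickReduct scheme), not the paper's Spark implementation.
        import pandas as pd

        def dependency(df, attrs, decision="decision"):
            # gamma_B(D) = |POS_B(D)| / |U|: fraction of objects whose
            # indiscernibility class under attrs has a unique decision value.
            if not attrs:
                return 0.0
            pure = df.groupby(list(attrs))[decision].transform("nunique") == 1
            return pure.sum() / len(df)

        def quick_reduct(df, decision="decision"):
            conds = [c for c in df.columns if c != decision]
            target, reduct = dependency(df, conds), set()
            while dependency(df, reduct) < target:
                remaining = [a for a in conds if a not in reduct]
                best = max(remaining, key=lambda a: dependency(df, reduct | {a}))
                reduct.add(best)
            return reduct

        # usage: quick_reduct(pd.read_csv("data.csv"))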

    Analysis and Implementation of Feature Selection Combining the Rough Sets, MLRelevance Criterion, and PRelevance Criterion Methods

    Get PDF
    Advances in technology have led to increasing data diversity, and data is a very important source of information. To process such data, the technique now implemented is Knowledge Discovery in Databases (KDD). Within KDD, data mining processes aim to mine information from the available data; one of these processes is classification. However, data diversity gives no guarantee that the data is ready to be processed: very high data dimensionality, for example, makes the classification task difficult, so preprocessing must be done first. Preprocessing is the step that prepares the data to be as efficient as possible and free of noise, missing values, irrelevant features, redundant features, and so on, so that classification yields more optimal results. One common preprocessing technique is feature selection, which reduces data dimensionality by removing features considered less relevant to class formation. This thesis discusses and implements feature selection using Rough Set Theory combined with the MLRelevance Criterion and PRelevance Criterion. Feature selection with this method is able to identify the most relevant features, so that the precision, recall, and accuracy obtained match those achieved before feature selection. Keywords: data mining, preprocessing, classification, Rough Set, feature selection, variable selection

    A rough set based multi-criteria decision making method and its application

    Get PDF
    The multi-criteria decision-making (MCDM) problem is one that today's managers frequently face. When the data are uncertain or incomplete, existing multi-criteria decision-making methods fall short; the proposed rough set based multi-criteria decision-making algorithm addresses exactly this gap. Moreover, with rapidly increasing data traffic, the efficient use of existing data becomes an important concern. The rough set concept, first proposed by Pawlak in 1982 [1], is used as an important tool for discovering the necessary information in large databases. For multi-criteria decision-making problems, the rough set concept was derived from the fuzzy logic approach for the analysis of imprecise structures. With its rule reduction and classification features, rough set theory can be used for multi-criteria decision-making problems as well as for the analysis of large data. Rough set theory was developed as a branch of fuzzy set theory. In evaluating incomplete and uncertain data, the data are analyzed using lower and upper approximations, and, like fuzzy sets, the theory does not impose crisp boundaries. Using incomplete information analysis and knowledge base reduction methods, the uncertainty in the data is minimized. Rough set theory may therefore become an increasingly preferred method for rule extraction and classification from inconsistent, incomplete data. In this study, the basic concepts of rough set theory are presented, an algorithm for solving the multi-criteria decision-making problem is developed based on rough set based knowledge discovery and the rough set concept, and the algorithm is compared with other MCDM algorithms. Keywords: Rough Set Theory, Multi-Criteria Decision Making, Entropy

    Making implicit knowledge of distance protective relay operations and fault characteristics explicit via rough set based discernibility relationship

    Get PDF
    This paper discusses a novel application of the discernibility concept inherent in rough set theory to making explicit the implicit knowledge of distance protective relay operations and fault characteristics hidden in recorded relay event reports. A rough-set-based data mining strategy is formulated to analyze relay trip assertion, impedance element activation, and fault characteristics in the distance relay decision system. Using rough set theory, the uncertainty and vagueness in the relay event report can be resolved using the concepts of discernibility, elementary sets, and set approximations. Nowadays, protection engineers face very complex protection system analysis due to massive quantities of data coming from diverse intelligent electronic devices (IEDs such as digital protective relays, digital fault recorders, SCADA remote terminal units, sequence-of-event recorders, circuit breakers, fault locators, and IEDs used for a variety of monitoring and control applications). To help protection engineers realize the necessity and benefit of protection system analysis without the arduous handling of overwhelming data, an automated approach known as knowledge discovery in databases (KDD), using only the data recorded by digital protective relays, is of immense help in their protection operation analysis tasks. The digital protective relay, rather than a host of other intelligent electronic devices, is the only device analyzed in this work because it provides virtually all the attributes needed for the data mining process in KDD. Unlike some artificial intelligence approaches, such as artificial neural networks and decision trees, in which the data mining analysis is "population-based" and yields a single model common to the entire population of training data, the rough set approach adopts an "individually-event-based" paradigm in which detailed time-tracking analysis of relay operation is successfully performed.
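
    The discernibility relation itself is simple to state in code: for every pair of recorded events with different relay decisions, collect the attributes that tell them apart. The attribute names below are illustrative assumptions, not fields from the paper's event reports.

        # Discernibility matrix over toy relay event records: entry (i, j)
        # holds the attributes distinguishing events i and j when their
        # relay decisions differ. Attribute names are illustrative only.
        events = [  # (attribute values, relay decision)
            ({"zone": 1, "fault": "AG",  "trip_time_ms": 20},  "trip"),
            ({"zone": 2, "fault": "AG",  "trip_time_ms": 450}, "trip"),
            ({"zone": 3, "fault": "ABC", "trip_time_ms": 0},   "restrain"),
        ]

        def discernibility(events):
            matrix = {}
            for i, (ei, di) in enumerate(events):
                for j, (ej, dj) in enumerate(events):
                    if i < j and di != dj:          # pairs with different decisions
                        matrix[(i, j)] = {a for a in ei if ei[a] != ej[a]}
            return matrix

        print(discernibility(events))   # {(0, 2): {...}, (1, 2): {...}}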