    Semantics-Preserving Dimensionality Reduction: Rough and Fuzzy-Rough-Based Approaches

    Abstract: Semantics-preserving dimensionality reduction refers to the problem of selecting those input features that are most predictive of a given outcome, a problem encountered in many areas such as machine learning, pattern recognition, and signal processing. This has found successful application in tasks that involve data sets containing huge numbers of features (in the order of tens of thousands), which would otherwise be impossible to process further. Recent examples include text processing and Web content classification. One of the many successful applications of rough set theory has been to this feature selection area. This paper reviews those techniques that preserve the underlying semantics of the data, using crisp and fuzzy rough set-based methodologies. Several approaches to feature selection based on rough set theory are experimentally compared. Additionally, a new area in feature selection, feature grouping, is highlighted and a rough set-based feature grouping technique is detailed. Index Terms: dimensionality reduction, feature selection, feature transformation, rough selection, fuzzy-rough selection.

    Analysis and Implementation of Feature Selection Combining the Rough Sets Method, MLRelevance Criterion, and PRelevance Criterion

    Abstract: Advances in technology have led to increasingly diverse data, and data is an important source of information. One established way to process such data is Knowledge Discovery in Databases (KDD), which includes a data mining step that extracts information from the data, for example through classification. However, diverse data is not guaranteed to be ready for processing. High data dimensionality, for instance, makes classification difficult, so preprocessing must be done first. Preprocessing prepares the data so that it is as efficient as possible and free of noise, missing values, irrelevant features, redundant features, and so on, with the expectation of better classification results. One preprocessing technique is feature selection, which reduces the dimensionality of the data by removing features considered less relevant to class formation. This thesis discusses and implements feature selection using Rough Sets Theory combined with the MLRelevance Criterion and the PRelevance Criterion. The method is able to identify the most relevant features, so that the precision, recall, and accuracy obtained after feature selection are comparable to those obtained before it. Keywords: data mining, preprocessing, classification, Rough Set, feature selection, variable selection.

    An improved moth flame optimization algorithm based on rough sets for tomato diseases detection

    Plant diseases are one of the major bottlenecks in agricultural production and harm the economy of any country. Automatic detection of such diseases could minimize these effects. Feature selection is a common pre-processing step in automatic disease detection systems. It is an important process for detecting and eliminating noisy, irrelevant, and redundant data, and can therefore improve detection performance. In this paper, an improved moth-flame approach to automatically detect tomato diseases is proposed. The moth-flame fitness function depends on the rough set dependency degree and takes into consideration the number of selected features. The proposed algorithm, MFORSFS, combines the exploration power of moth-flame optimization with the high performance of rough sets for feature selection in order to find the set of features that maximizes classification accuracy, which was evaluated using a support vector machine (SVM). The performance of MFORSFS was evaluated on many benchmark datasets taken from the UCI machine learning data repository and compared with feature selection approaches that combine rough sets with Particle Swarm Optimization (PSO) and Genetic Algorithms (GA). The proposed algorithm was then applied to a real-life problem, detecting tomato diseases (powdery mildew and early blight): a real dataset of tomato diseases was built manually, and a tomato disease detection approach was proposed and evaluated on this dataset. The experimental results showed that the proposed algorithm was efficient in terms of recall, precision, accuracy, and F-score, as well as feature size reduction and execution time.
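
    The abstract states only that the fitness depends on the dependency degree and on the number of selected features; it does not give the formula. A minimal sketch of one commonly used form, a weighted sum, is shown below. The function name `fitness`, the weight `alpha`, and its default value are assumptions for illustration, not the authors' definition.

```python
def fitness(dependency, n_selected, n_total, alpha=0.9):
    """Score a feature subset: higher dependency is better, fewer features is better.

    dependency : rough-set dependency degree of the subset, in [0, 1]
    n_selected : number of features in the subset
    n_total    : number of features in the full data set
    alpha      : hypothetical trade-off weight (assumed, not from the paper)
    """
    if n_total <= 0 or not 0 <= n_selected <= n_total:
        raise ValueError("need 0 <= n_selected <= n_total and n_total > 0")
    # Fraction of features that were dropped; 1.0 means nothing was selected
    size_term = 1.0 - n_selected / n_total
    return alpha * dependency + (1.0 - alpha) * size_term


# Two subsets with the same dependency: the smaller one scores higher
small = fitness(dependency=1.0, n_selected=2, n_total=10)
large = fitness(dependency=1.0, n_selected=5, n_total=10)
print(small > large)
```

    With `alpha` close to 1 the search favors subsets that preserve the dependency degree and uses subset size mainly as a tie-breaker.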

    FEATURE SELECTION APPLIED TO THE TIME-FREQUENCY REPRESENTATION OF MUSCLE NEAR-INFRARED SPECTROSCOPY (NIRS) SIGNALS: CHARACTERIZATION OF DIABETIC OXYGENATION PATTERNS

    Diabetic patients might present peripheral microcirculation impairment and might benefit from physical training. Thirty-nine diabetic patients underwent monitoring of tibialis anterior muscle oxygenation by near-infrared spectroscopy (NIRS) during a series of voluntary ankle flexo-extensions. NIRS signals were acquired before and after training protocols. Sixteen control subjects were tested with the same protocol. Time-frequency distributions of Cohen's class were used to process the NIRS signals relative to the concentration changes of oxygenated and reduced hemoglobin. A total of 24 variables were measured for each subject, and the most discriminative were selected using four feature selection algorithms: QuickReduct, Genetic Rough-Set Attribute Reduction, Ant Rough-Set Attribute Reduction, and traditional ANOVA. Artificial neural networks were used to validate the discriminative power of the selected features. Results showed that different algorithms extracted different sets of variables, but all the combinations were discriminative. The best classification accuracy was about 70%. The oxygenation variables were selected when comparing controls to diabetic patients or diabetic patients before and after training. This preliminary study showed the importance of feature selection techniques in NIRS assessment of diabetic peripheral vascular impairment.
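
    Of the four algorithms named above, QuickReduct is the simplest to illustrate. The sketch below shows the usual greedy form on a crisp (discrete-valued) decision table: attributes are added one at a time, always choosing the one that most increases the rough-set dependency degree, until the subset reaches the dependency of the full attribute set. The helper names `gamma` and `quickreduct` and the toy table `ROWS` are hypothetical; this is not the study's implementation or data.

```python
from collections import defaultdict


def gamma(rows, attrs, decision):
    """Crisp dependency degree: share of objects whose values on `attrs`
    determine the decision unambiguously."""
    decisions = defaultdict(set)
    sizes = defaultdict(int)
    for row in rows:
        key = tuple(row[a] for a in attrs)
        decisions[key].add(row[decision])
        sizes[key] += 1
    consistent = sum(sizes[k] for k, ds in decisions.items() if len(ds) == 1)
    return consistent / len(rows)


def quickreduct(rows, cond_attrs, decision):
    """Greedy forward selection driven by the dependency degree."""
    target = gamma(rows, cond_attrs, decision)
    reduct = []
    while gamma(rows, reduct, decision) < target:
        remaining = [a for a in cond_attrs if a not in reduct]
        # Pick the attribute whose addition gives the largest dependency
        best = max(remaining, key=lambda a: gamma(rows, reduct + [a], decision))
        reduct.append(best)
    return reduct


# Hypothetical toy table: the decision "d" is fully determined by "b"
ROWS = [
    {"a": 0, "b": 0, "c": 1, "d": "no"},
    {"a": 0, "b": 1, "c": 0, "d": "yes"},
    {"a": 1, "b": 0, "c": 0, "d": "no"},
    {"a": 1, "b": 1, "c": 1, "d": "yes"},
]

print(quickreduct(ROWS, ["a", "b", "c"], "d"))
```

    Because the search is greedy, the result preserves the dependency degree of the full attribute set but is not guaranteed to be a smallest possible subset.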

    A Scalable and Effective Rough Set Theory based Approach for Big Data Pre-processing

    A big challenge in the knowledge discovery process is to perform data pre-processing, specifically feature selection, on a large amount of data with a high-dimensional attribute set. A variety of techniques have been proposed in the literature to deal with this challenge, with different degrees of success, as most of these techniques need further information about the input data for thresholding, need noise levels to be specified, or rely on some feature ranking procedure. To overcome these limitations, rough set theory (RST) can be used to discover the dependencies within the data and reduce the number of attributes in an input data set, using the data alone and requiring no supplementary information. However, when it comes to massive data sets, RST reaches its limits, as it is highly computationally expensive. In this paper, we propose a scalable and effective rough set theory-based approach for large-scale data pre-processing, specifically for feature selection, under the Spark framework. In our detailed experiments, data sets with up to 10,000 attributes have been considered, revealing that our proposed solution achieves a good speedup and performs its feature selection task well without sacrificing performance, making it relevant to big data.

    Significant Feature Selection Method for Health Domain using Computational Intelligence- A Case Study for Heart Disease

    In the medical field, diagnosing cardiovascular disease is among the most troublesome tasks, because the decision relies on combining large amounts of clinical and pathological data. Because of this complexity, researchers and clinical professionals have become increasingly interested in efficient and accurate heart disease prediction. Correct diagnosis at an early stage is important, as time is a critical factor: heart disease is a leading cause of death worldwide, so predicting it early is significant. In recent years, machine learning has become an evolving, reliable supporting tool in the medical domain and has supported disease prediction when trained and tested properly. The main idea behind this work is to find the relevant heart disease features among a large number of features using a rough computational intelligence approach. The proposed feature selection approach performs better than traditional feature selection approaches. Its performance is tested on different heart disease data sets and validated on real-time data sets.

    Approximation-based feature selection and application for algae population estimation

    This paper presents a data-driven approach for feature selection to address the common problem of dealing with high-dimensional data. This approach is able to handle the real-valued nature of the domain features, unlike many existing approaches. This is accomplished through the use of fuzzy-rough approximations. The paper demonstrates the effectiveness of this research by proposing an estimator of algae populations, a system that approximates, given certain water characteristics, the size of algae populations. This estimator significantly reduces computer time and space requirements, decreases the cost of obtaining measurements, and increases runtime efficiency, making it more economically viable. By retaining only information required for the estimation task, the system offers higher accuracy than conventional estimators. Finally, the system does not alter the domain semantics, making any distilled knowledge human-readable. The paper describes the problem domain, architecture, and operation of the system, and provides and discusses detailed experimentation. The results show that algae estimators using a fuzzy-rough feature selection step produce more accurate predictions of algae populations in general. Keywords: feature evaluation and selection; data-driven knowledge acquisition; classification; fuzzy-rough sets; algae population estimation.

    A Noise-tolerant Approach to Fuzzy-Rough Feature Selection

    In rough set based feature selection, the goal is to omit attributes (features) from decision systems such that objects in different decision classes can still be discerned. A popular way to evaluate attribute subsets with respect to this criterion is based on the notion of dependency degree. In the standard approach, attributes are expected to be qualitative; in the presence of quantitative attributes, the methodology can be generalized using fuzzy rough sets, to handle gradual (in)discernibility between attribute values more naturally. However, both the extended approach, as well as its crisp counterpart, exhibit a strong sensitivity to noise: a change in a single object may significantly influence the outcome of the reduction procedure. Therefore, in this paper, we consider a more flexible methodology based on the recently introduced Vaguely Quantified Rough Set (VQRS) model. The method can handle both crisp (discrete-valued) and fuzzy (real-valued) data, and encapsulates the existing noise-tolerant data reduction approach using Variable Precision Rough Sets (VPRS), as well as the traditional rough set model, as special cases
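
    The noise sensitivity described above can be seen on a small crisp example. The sketch below computes a dependency degree with a tolerance threshold in the spirit of Variable Precision Rough Sets: an equivalence class counts toward the positive region when at least a fraction `beta` of its objects share one decision, and `beta = 1.0` gives the traditional rough set dependency degree. This is a simplified illustration, not the VQRS model of the paper; the name `vprs_dependency`, the parameter `beta`, and the toy table `TABLE` are assumptions.

```python
from collections import Counter, defaultdict


def vprs_dependency(rows, cond_attrs, decision_attr, beta=1.0):
    """Share of objects lying in equivalence classes whose majority decision
    covers at least a fraction `beta` of the class."""
    classes = defaultdict(list)
    for row in rows:
        key = tuple(row[a] for a in cond_attrs)
        classes[key].append(row[decision_attr])
    positive = 0
    for decisions in classes.values():
        majority = Counter(decisions).most_common(1)[0][1]
        if majority / len(decisions) >= beta:
            positive += len(decisions)
    return positive / len(rows)


# Hypothetical table: "b" determines "d" except for the last, noisy object
TABLE = [
    {"a": 0, "b": 0, "d": "no"},
    {"a": 0, "b": 1, "d": "yes"},
    {"a": 1, "b": 0, "d": "no"},
    {"a": 1, "b": 1, "d": "yes"},
    {"a": 1, "b": 1, "d": "no"},
]

print(vprs_dependency(TABLE[:4], ["b"], "d"))        # without the noisy object
print(vprs_dependency(TABLE, ["b"], "d"))            # crisp model, with noise
print(vprs_dependency(TABLE, ["b"], "d", beta=0.6))  # tolerant model, with noise
```

    In the crisp model, the single noisy object removes its whole equivalence class from the positive region, while the tolerant setting still accepts that class because two of its three objects agree.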

    An Intelligent Agent Based Intrusion Detection System Using Fuzzy Rough Set Based Outlier Detection

    Existing Intrusion Detection Systems (IDS), including misuse detection and anomaly detection systems, are generally incapable of detecting new types of attacks, and they detect intruders only with a high false alarm rate. There is an urgent need to develop an IDS with a very high detection rate and a low false alarm rate. To satisfy this need, we propose a new intelligent agent based IDS using fuzzy rough set based outlier detection and a fuzzy rough set based SVM. In the proposed model we introduce two intelligent agents: a feature selection agent, which selects the required feature set using fuzzy rough sets, and a decision making agent manager, which makes the final decision. Moreover, we introduce a fuzzy rough set based outlier detection algorithm to detect outliers, and we adopt a fuzzy rough set based SVM to classify and detect anomalies efficiently. Finally, we use the KDD Cup 99 data set for our experiments. The experimental results show that the proposed intelligent agent based model improves the overall accuracy and reduces the false alarm rate.

    The rough sets feature selection for trees recognition in color aerial images using genetic algorithms

    Selecting a set of features that is optimal for a given task is a problem that plays an important role in a wide variety of contexts, including pattern recognition, image understanding, and machine learning. The rough set concept of reducing a decision table is very useful for feature selection. In this paper, a genetic algorithm based approach is presented to search for a relative reduct of the rough set decision table. This approach can accommodate multiple criteria, such as accuracy and cost of classification, in the feature selection process, and finds an effective feature subset for texture classification. On the basis of the selected feature subset, this paper presents a method to extract objects that are higher than their surroundings, such as trees or forest, from color aerial images. The experimental results show that the selected feature subset and the object extraction method presented in this paper are practical and effective.