10 research outputs found

    Fuzzy-Rough Attribute Reduction with Application to Web Categorization

    Due to the explosive growth of electronically stored information, automatic methods must be developed to aid users in maintaining and using this abundance of information effectively. In particular, the sheer volume of redundancy present must be dealt with, leaving only the information-rich data to be processed. This paper presents a novel approach, based on an integrated use of fuzzy and rough set theories, to greatly reduce this data redundancy. Formal concepts of fuzzy-rough attribute reduction are introduced and illustrated with a simple example. The work is applied to the problem of web categorization, considerably reducing dimensionality with minimal loss of information. Experimental results show that fuzzy-rough reduction is more powerful than the conventional rough set-based approach. Classifiers that use the lower-dimensional set of attributes retained by fuzzy-rough reduction outperform those that employ the larger attribute sets returned by the existing crisp rough reduction method.
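
The fuzzy-rough dependency degree that drives this kind of reduction can be sketched as follows. This is a minimal illustration, not the paper's exact formulation: it assumes a min t-norm over per-attribute similarity relations and the Kleene-Dienes implicator for the lower approximation, and the function names are invented for the example.

```python
import numpy as np

def fuzzy_similarity(col):
    """Fuzzy similarity on one attribute: 1 - normalised absolute difference."""
    col = np.asarray(col, dtype=float)
    rng = col.max() - col.min()
    rng = rng if rng > 0 else 1.0
    return 1.0 - np.abs(col[:, None] - col[None, :]) / rng

def dependency_degree(X, y, attrs):
    """Fuzzy-rough dependency of the decision y on the attribute subset attrs."""
    # Combine per-attribute similarity relations with the min t-norm.
    R = np.min([fuzzy_similarity(X[:, a]) for a in attrs], axis=0)
    pos = np.zeros(len(y))
    for c in np.unique(y):
        member = (y == c).astype(float)          # crisp decision class
        # Lower approximation via the Kleene-Dienes implicator:
        #   mu_lower(x) = inf_j max(1 - R(x, j), member(j))
        lower = np.min(np.maximum(1.0 - R, member[None, :]), axis=1)
        pos = np.maximum(pos, lower)             # fuzzy positive region
    return pos.mean()                            # gamma in [0, 1]
```

An attribute subset with dependency close to 1 preserves (almost) all of the decision information, which is what a reduct search maximises.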

    Combining rough and fuzzy sets for feature selection


    Fuzzy Entropy-Assisted Fuzzy-Rough Feature Selection

    Feature Selection (FS) is a dimensionality reduction technique that aims to select a subset of the original features of a dataset which offer the most useful information. The benefits of feature selection include improved data visualisation, transparency, reduction in training and utilisation times and improved prediction performance. Methods based on fuzzy-rough set theory (FRFS) have employed the dependency function to guide the process with much success. This paper presents a novel fuzzy-rough FS technique which is guided by fuzzy entropy. The use of this measure in fuzzy-rough feature selection can result in smaller subset sizes than those obtained through FRFS alone, with little loss or even an increase in overall classification accuracy.
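
An entropy-guided search of this general shape might be sketched as below. The conditional-entropy formulation (similarity-weighted class distributions scored with Shannon entropy) and the greedy forward loop are illustrative assumptions, not necessarily the paper's exact measure.

```python
import numpy as np

def fuzzy_similarity(col):
    """Fuzzy similarity on one attribute: 1 - normalised absolute difference."""
    col = np.asarray(col, dtype=float)
    rng = col.max() - col.min()
    rng = rng if rng > 0 else 1.0
    return 1.0 - np.abs(col[:, None] - col[None, :]) / rng

def fuzzy_entropy(X, y, attrs):
    """Fuzzy conditional entropy of the decision given the attribute subset:
    each object's similarity neighbourhood is treated as a weighted sample
    and the class distribution inside it is scored with Shannon entropy."""
    R = np.min([fuzzy_similarity(X[:, a]) for a in attrs], axis=0)
    H = 0.0
    for i in range(len(y)):
        w = R[i] / R[i].sum()                    # similarity weights
        for c in np.unique(y):
            p = w[y == c].sum()
            if p > 0:
                H -= p * np.log2(p)
    return H / len(y)

def greedy_select(X, y):
    """Forward selection: repeatedly add the attribute lowering entropy most."""
    remaining, chosen = set(range(X.shape[1])), []
    best = np.inf
    while remaining:
        scored = {a: fuzzy_entropy(X, y, chosen + [a]) for a in remaining}
        a, h = min(scored.items(), key=lambda kv: kv[1])
        if h >= best:                            # no further improvement
            break
        chosen.append(a); remaining.remove(a); best = h
    return chosen
```

Because the entropy measure rewards class-pure neighbourhoods, the search can stop earlier than a dependency-only search, which is how smaller subsets can arise.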

    From fuzzy-rough to crisp feature selection

    A central problem in machine learning and pattern recognition is the process of recognizing the most important features in a dataset. This process plays a decisive role in big data processing by reducing the size of datasets. One major drawback of existing feature selection methods is the high chance of redundant features appearing in the final subset, where in most cases, finding and removing them can greatly improve the resulting classification accuracy. To tackle this problem on two different fronts, we employed fuzzy-rough sets and perturbation theory. On one side, we used three strategies to improve the performance of fuzzy-rough set-based feature selection methods. The first strategy was to code both features and samples in one binary vector and use a shuffled frog leaping algorithm to choose the best combination, using fuzzy dependency degree as the fitness function. In the second strategy, we designed a measure to evaluate features based on fuzzy-rough dependency degree in a fashion where redundant features are given less priority for selection. In the last strategy, we designed a new binary version of the shuffled frog leaping algorithm that employs a fuzzy positive region as its similarity measure, to work in complete harmony with the fitness function (i.e. fuzzy-rough dependency degree). To extend the applicability of fuzzy-rough set-based feature selection to multi-party medical datasets, we designed a privacy-preserving version of the original method. In addition, we studied the feasibility and applicability of perturbation theory to feature selection, which to the best of our knowledge had never been researched. We introduced a new feature selection method based on perturbation theory that is not only capable of detecting and discarding redundant features but is also very fast and flexible in accommodating the special needs of the application. It employs a clustering algorithm to group similarly behaving features based on the sensitivity of each feature to perturbation, the angle of each feature to the outcome and the effect of removing each feature on the outcome; it then chooses the feature closest to the centre of each cluster and returns those features as the final subset. To assess the effectiveness of the proposed methods, we compared the results of each method with well-known feature selection methods against a series of artificially generated datasets, and biological, medical and cancer datasets adopted from the University of California Irvine machine learning repository, the Arizona State University repository and the Gene Expression Omnibus repository.
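
The clustering-based selection step could be sketched roughly as follows, under loudly stated assumptions: the "sensitivity" and "angle" profiles below are simple stand-ins (least-squares coefficient shift under a small perturbation, and cosine with the outcome) for the thesis's actual measures, and a tiny hand-rolled k-means replaces whatever clustering algorithm was really used.

```python
import numpy as np

def select_by_perturbation(X, y, k=2, eps=1e-2, seed=0):
    """Cluster per-feature behaviour profiles; keep the feature nearest
    each cluster centre as the representative of that group (so near-
    duplicate features collapse to one representative)."""
    n, d = X.shape
    Xc = X - X.mean(0)
    yc = y - y.mean()
    base, *_ = np.linalg.lstsq(Xc, yc, rcond=None)   # reference fit
    profiles = []
    for j in range(d):
        Xp = Xc.copy()
        Xp[:, j] += eps * Xc[:, j].std()             # perturb feature j
        w, *_ = np.linalg.lstsq(Xp, yc, rcond=None)
        sens = np.linalg.norm(w - base)              # sensitivity profile
        ang = abs(Xc[:, j] @ yc) / (np.linalg.norm(Xc[:, j])
                                    * np.linalg.norm(yc) + 1e-12)  # angle
        profiles.append([sens, ang])
    P = np.asarray(profiles)
    # Tiny k-means over the 2-D feature profiles.
    rng = np.random.default_rng(seed)
    centres = P[rng.choice(d, size=k, replace=False)]
    for _ in range(50):
        lab = np.argmin(((P[:, None] - centres[None]) ** 2).sum(-1), axis=1)
        centres = np.array([P[lab == c].mean(0) if (lab == c).any()
                            else centres[c] for c in range(k)])
    # The feature closest to each centre forms the final subset.
    return sorted({int(np.argmin(((P - centres[c]) ** 2).sum(-1)))
                   for c in range(k)})
```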

    Computational Optimizations for Machine Learning

    The present book contains the 10 articles finally accepted for publication in the Special Issue “Computational Optimizations for Machine Learning” of the MDPI journal Mathematics, which cover a wide range of topics connected to the theory and applications of machine learning, neural networks and artificial intelligence. These topics include, among others, various classes of machine learning, such as supervised, unsupervised and reinforcement learning, deep neural networks, convolutional neural networks, GANs, decision trees, linear regression, SVM, K-means clustering, Q-learning, temporal difference, deep adversarial networks and more. It is hoped that the book will be interesting and useful to those developing mathematical algorithms and applications in the domain of artificial intelligence and machine learning, as well as to those with the appropriate mathematical background who wish to become familiar with recent advances in the computational optimization mathematics of machine learning, which has nowadays permeated almost all sectors of human life and activity.

    Variable precision rough set theory decision support system: With an application to bank rating prediction

    This dissertation considers the Variable Precision Rough Sets (VPRS) model and its development within a comprehensive software package (decision support system) incorporating methods of re-sampling and classifier aggregation. The concept of β-reduct aggregation is introduced as a novel approach to classifier aggregation within the VPRS framework. The software is applied to the credit rating prediction problem; in particular, a full exposition of the prediction and classification of Fitch's Individual Bank Strength Ratings (FIBRs) for a number of banks from around the world is presented. The ethos of the developed software was to rely heavily on a simple 'point and click' interface, designed to make a VPRS analysis accessible to an analyst who is not necessarily an expert in the field of VPRS or decision rule based systems. The development of the software has also benefited from consultations with managers from one of Europe's leading hedge funds, who gave valuable insight, advice and recommendations on what they considered pertinent issues with regard to data mining, and what they would like to see from a modern data mining system. The elements within the developed software reflect each stage of the knowledge discovery process, namely pre-processing, feature selection, data mining, interpretation and evaluation. The developed software encompasses three software packages: a pre-processing package incorporating some of the latest pre-processing and feature selection methods; a VPRS data mining package, based on a novel "vein graph" interface, which presents the analyst with selectable β-reducts over the domain of β; and a third, more advanced VPRS data mining package, which essentially automates the vein graph interface for incorporation into a re-sampling environment and also implements the introduced aggregated β-reduct, developed to optimise and stabilise the predictive accuracy of a set of decision rules induced from the aggregated β-reduct.
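
At the core of VPRS is the β-parameterised positive region: an object is classified positively when its condition-attribute equivalence class predicts some decision with relative frequency at least β (β = 1 recovers the classical rough set model). A minimal sketch, with hypothetical helper names:

```python
from collections import defaultdict

def vprs_beta_positive_region(objects, condition, decision, beta=0.8):
    """Return the objects whose condition-attribute equivalence class
    predicts a decision with relative frequency >= beta (VPRS model)."""
    classes = defaultdict(list)
    for obj in objects:
        classes[condition(obj)].append(obj)      # partition by condition
    pos = []
    for eq in classes.values():
        labels = [decision(o) for o in eq]
        for d in set(labels):
            # Pr(decision d | equivalence class) >= beta?
            if labels.count(d) / len(eq) >= beta:
                pos.extend(eq)
                break
    return pos
```

Lowering β admits noisier equivalence classes into the positive region, which is the tolerance-to-misclassification trade-off that the software exposes over the domain of β.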

    A decentralised secure and privacy-preserving e-government system

    Electronic Government (e-Government) digitises and innovates public services to businesses, citizens, agencies, employees and other stakeholders by utilising Information and Communication Technologies. E-Government systems inevitably involve financial, personal, security-related and other sensitive information, and therefore become the target of cyber attacks through various means, such as malware, spyware, viruses, denial of service (DoS) attacks and distributed DoS (DDoS) attacks. Despite protection measures such as authentication, authorisation, encryption and firewalls, existing e-Government systems such as websites and electronic identity management systems (eIDs) often face potential privacy issues and security vulnerabilities, and suffer from a single point of failure due to centralised services. This is becoming more challenging along with the dramatically increasing number of users and growing usage of e-Government systems, driven by the proliferation of technologies such as smart cities, the internet of things (IoT), cloud computing and interconnected networks. Thus, there is a need to develop a decentralised, secure e-Government system equipped with anomaly detection to enforce system reliability, security and privacy. This PhD work develops a decentralised, secure and privacy-preserving e-Government system by innovatively using blockchain technology. Blockchain technology enables the implementation of highly secure and privacy-preserving decentralised applications where information is not under the control of any centralised third party. The developed secure and decentralised e-Government system is based on the consortium type of blockchain technology, a semi-public and decentralised blockchain system consisting of a group of pre-selected entities or organisations in charge of consensus and decision making for the benefit of the whole network of peers.
An Ethereum blockchain solution was used in this project to simulate and validate the proposed system, since it is open source and supports off-chain storage of data such as images, PDFs, DOCs, contracts and other files that are too large to be stored in the blockchain, or that may need to be deleted or changed in the future, all of which are an essential part of e-Government systems. This PhD work also develops an intrusion detection system (IDS) based on the dendritic cell algorithm (DCA) for detecting unwanted internal and external traffic to support the proposed blockchain-based e-Government system, because the blockchain database is append-only and immutable. The IDS effectively prevents unwanted transactions, such as those carrying viruses, malware or spyware, from being added to the blockchain-based e-Government network. Briefly, the DCA is a class of artificial immune systems (AIS) which was introduced for anomaly detection in computer networks and has beneficial properties such as self-organisation, scalability, decentralised control and adaptability. Three significant improvements have been implemented for the DCA-based IDS. Firstly, a new parameter optimisation approach for the DCA is implemented using the genetic algorithm (GA). Secondly, a fuzzy inference systems approach is developed to handle the nonlinear relationships that exist between features during the pre-processing stage of the DCA, so as to further enhance its anomaly detection performance in e-Government systems. In addition, a multiclass DCA capable of detecting multiple attacks is developed in this project, given that the original DCA is a binary classifier and many practical classification problems, including computer network intrusion detection datasets, are often associated with multiple classes.
The effectiveness of the proposed approaches in enforcing security and privacy in e-Government systems is demonstrated through three real-world applications: privacy and integrity protection of information in e-Government systems, internal threat detection, and external threat detection. Privacy and integrity protection of information in the proposed e-Government system is provided by using the encryption and validation mechanisms offered by blockchain technology. Experiments demonstrated the performance of the proposed system, and thus its suitability for enhancing the security and privacy of information in e-Government systems. The applicability and performance of the DCA-based IDS in e-Government systems were examined using publicly accessible insider and external threat datasets with real-world attacks. The results show that the proposed system can mitigate insider and external threats in e-Government systems whilst simultaneously preserving information security and privacy. The proposed system could also potentially increase the trust and accountability of public sectors due to the transparency and efficiency offered by blockchain applications.
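
The DCA's central signal-to-context idea can be sketched in a deterministic, heavily simplified form. This sketch assumes just two signal categories (danger, safe) per observed item and sliding-window "cells"; the real algorithm uses more signal categories (e.g. PAMP and inflammation), weighted accumulation and per-cell migration thresholds.

```python
import numpy as np

def dca_classify(signals, window=4, threshold=0.5):
    """Simplified dendritic cell algorithm sketch.  Each item carries a
    (danger, safe) signal pair.  Each 'cell' samples one sliding window,
    accumulates the signals of the items it saw, and presents those items
    in a mature (anomalous) context when accumulated danger outweighs safe.
    MCAV = mature presentations / total presentations per item."""
    n = len(signals)
    mature = np.zeros(n)
    presented = np.zeros(n)
    for start in range(n - window + 1):
        idx = range(start, start + window)
        danger = sum(signals[i][0] for i in idx)
        safe = sum(signals[i][1] for i in idx)
        for i in idx:
            presented[i] += 1
            mature[i] += danger > safe           # mature-context vote
    mcav = mature / np.maximum(presented, 1)     # anomaly coefficient
    return mcav > threshold
```

Because each item is judged by every cell that sampled it, isolated noisy signals are averaged away, which is the property that makes the DCA attractive for network traffic monitoring.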