    Dealing with imbalanced and weakly labelled data in machine learning using fuzzy and rough set methods

    Extension of the fuzzy dominance-based rough set approach using ordered weighted average operators

    In this article we first review known results on fuzzy versions of the dominance-based rough set approach (DRSA) and extend the theory by considering additional properties. We then apply Ordered Weighted Average (OWA) operators in fuzzy DRSA. OWA operators have shown considerable potential for handling outliers and noisy data in decision tables when combined with the indiscernibility-based rough set approach (IRSA). We examine theoretical properties of the proposed combination with fuzzy DRSA.
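
To make the OWA idea concrete, here is a minimal sketch of an ordered weighted average in plain Python. The function name `owa` and the example weight vectors are illustrative assumptions, not taken from the article. The point it shows is that the weights attach to ranks (largest value first) rather than to positions, so a strict minimum or maximum can be softened into something less sensitive to a single outlier.

```python
def owa(values, weights):
    """Ordered weighted average: weights[i] multiplies the i-th largest value."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    ranked = sorted(values, reverse=True)  # largest value first
    return sum(w * v for w, v in zip(weights, ranked))


# Hypothetical membership degrees for three objects.
memberships = [0.2, 0.9, 0.5]

strict_max = owa(memberships, [1.0, 0.0, 0.0])   # all weight on the largest value
strict_min = owa(memberships, [0.0, 0.0, 1.0])   # all weight on the smallest value
soft_min = owa(memberships, [1 / 6, 2 / 6, 3 / 6])  # most weight on small values
```

With all weight on the last rank the operator reduces to the minimum, and with all weight on the first rank to the maximum; the intermediate weight vector is one possible "soft minimum" chosen only for illustration.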

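    Perbandingan Performa Teknik Sampling Data untuk Klasifikasi Pasien Terinfeksi Covid-19 Menggunakan Rontgen Dada (Comparison of Data Sampling Technique Performance for Classifying COVID-19-Infected Patients Using Chest X-Rays)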

    COVID-19 is a deadly virus that shocked the world, and one of its consequences is respiratory infection. The solution proposed here is to predict COVID-19 infection by classifying chest X-ray data. A challenging issue in this field is the imbalance between the number of infected and uninfected chest X-rays: a classifier trained on imbalanced data tends to ignore the class with fewer samples. Data sampling techniques address this by balancing the data, so this study evaluates several of them: Random Undersampling (RUS), Random Oversampling (ROS), Combination of Over-Undersampling (COUS), Synthetic Minority Over-sampling Technique (SMOTE), and Tomek Link (T-Link). Support Vector Machines (SVM) are used for classification because of their high accuracy. The techniques are compared by accuracy and Area Under Curve (AUC). The best sampling technique found was SMOTE, with an accuracy of 99% and an AUC of 99.32%, making it the best of the evaluated techniques for classifying COVID-19 chest X-ray data.
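
The core interpolation step behind SMOTE can be sketched in a few lines of plain Python. This is a simplified illustration under stated assumptions, not the study's pipeline and not the implementation found in any library: the function name `smote_sketch`, the toy points, and the parameter defaults are all hypothetical. Each synthetic sample is placed at a random position on the line segment between a minority-class point and one of its `k` nearest minority-class neighbours.

```python
import random


def smote_sketch(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between minority neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)

    def sqdist(a, b):
        # Squared Euclidean distance is enough for ranking neighbours.
        return sum((x - y) ** 2 for x, y in zip(a, b))

    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the chosen point, excluding itself.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sqdist(base, p),
        )[:k]
        mate = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> mate
        synthetic.append([b + gap * (m - b) for b, m in zip(base, mate)])
    return synthetic


# Toy minority class: the four corners of the unit square.
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
synthetic = smote_sketch(minority, n_new=6)
```

Because every synthetic point lies between two existing minority points, the new samples stay inside the region the minority class already occupies, which is the main difference from random oversampling, where existing samples are simply duplicated.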

    Neural Network Models for Assessing the Financial Condition of Enterprises for Supply Chain

    The paper deals with the task of assessing the financial condition of enterprises. To solve it, we justify the need to build a neural network model for the supply chain. A set of financial ratios is defined as the input parameters of the model: the current liquidity ratio of the enterprise, the equity ratio, the equity turnover ratio, and the return on equity ratio. The output parameters are the types of financial condition: an unstable state (regression), a normal state (stable) and an absolutely stable state (progression). The input data for building the models amounted to 210 records. The models were constructed and evaluated on the analytical platform Deductor. Thirty-two modifications of neural network models were built, with different architectures and trained on different samples drawn randomly from the source data. To assess the effectiveness of the models, a technique was developed that includes testing the neural networks and evaluating their accuracy and average classification error, taking into account weighting factors assigned by an expert. The errors of the first and second type for each financial condition, as well as the average total classification error, are presented. The best model, chosen for its minimum average classification error, is a single-layer perceptron with 10 hidden neurons. Its classification accuracy was about 98%. The neural network model is adequate and can be used effectively to assess the financial condition of enterprises.
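
As an illustration of how errors of the first and second type can be computed per class, the sketch below treats each financial condition one-versus-rest: a type I error is a record wrongly assigned to the class, a type II error is a record of the class that was missed. The function name `per_class_errors`, the label strings, and the toy predictions are assumptions made for this example; the expert-assigned weighting factors mentioned in the abstract are not specified there and are not modelled here.

```python
def per_class_errors(y_true, y_pred, labels):
    """Return type I (false positive) and type II (false negative) rates per class."""
    pairs = list(zip(y_true, y_pred))
    result = {}
    for c in labels:
        false_pos = sum(1 for t, p in pairs if p == c and t != c)
        false_neg = sum(1 for t, p in pairs if p != c and t == c)
        negatives = sum(1 for t in y_true if t != c)
        positives = sum(1 for t in y_true if t == c)
        result[c] = {
            "type_I": false_pos / negatives if negatives else 0.0,
            "type_II": false_neg / positives if positives else 0.0,
        }
    return result


# Hypothetical labels for the three conditions and a toy set of predictions.
labels = ["unstable", "normal", "absolutely_stable"]
y_true = ["unstable", "normal", "absolutely_stable", "normal"]
y_pred = ["unstable", "normal", "normal", "normal"]
errors = per_class_errors(y_true, y_pred, labels)
```

In this toy case the one absolutely stable enterprise is predicted as normal, so it appears as a type II error for its own class and as a type I error for the normal class.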

    Fractal feature selection model for enhancing high-dimensional biological problems

    The integration of biology, computer science, and statistics has given rise to the interdisciplinary field of bioinformatics, which aims to decode biological intricacies. It produces extensive and diverse features, presenting an enormous challenge in classifying bioinformatic problems. Therefore, an intelligent bioinformatics classification system must select the most relevant features to enhance machine learning performance. This paper proposes a feature selection model based on the fractal concept to improve the performance of intelligent systems in classifying high-dimensional biological problems. The proposed fractal feature selection (FFS) model divides features into blocks, measures the similarity between blocks using root mean square error (RMSE), and determines the importance of features based on low RMSE. The proposed FFS is tested and evaluated over ten high-dimensional bioinformatics datasets. The experiment results showed that the model significantly improved machine learning accuracy. The average accuracy rate was 79% with full features in machine learning algorithms, while FFS delivered promising results with an accuracy rate of 94%
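
The abstract gives only the outline of FFS (split features into blocks, compare blocks by RMSE, rank by low RMSE), so the following is a rough sketch of a block-wise RMSE comparison and not the authors' algorithm. The block size, the choice to compare every block with every other block, and the use of the mean RMSE as a block score are all assumptions introduced for illustration; the names `rmse` and `block_scores` are hypothetical.

```python
import math


def rmse(a, b):
    """Root mean square error between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


def block_scores(columns, block_size):
    """Mean RMSE of each feature block against all other blocks.

    columns: list of feature columns, each a list of values over the samples.
    Trailing columns that do not fill a whole block are ignored in this sketch.
    """
    blocks = [
        columns[i:i + block_size]
        for i in range(0, len(columns) - block_size + 1, block_size)
    ]
    if len(blocks) < 2:
        raise ValueError("need at least two blocks to compare")
    # Flatten each block into one vector so blocks can be compared directly.
    flat = [[v for col in block for v in col] for block in blocks]
    scores = []
    for i, current in enumerate(flat):
        others = [rmse(current, other) for j, other in enumerate(flat) if j != i]
        scores.append(sum(others) / len(others))
    return scores


# Six toy feature columns over two samples, grouped into three blocks of two.
columns = [[0, 0], [0, 0], [1, 1], [1, 1], [0, 0], [0, 0]]
scores = block_scores(columns, block_size=2)
```

Here the first and third blocks are identical and receive the lowest score, while the middle block differs from both. How such scores translate into the final feature ranking is left open, since the abstract does not say.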

    Autoencoder for clinical data analysis and classification : data imputation, dimensional reduction, and pattern recognition

    Over the last decade, research has focused on machine learning and data mining to develop frameworks that improve data analysis and output performance and to build accurate decision support systems that benefit from real-life datasets. This has led to the field of clinical data analysis, which has attracted significant interest in the computing, information systems, and medical fields. Machine learning algorithms need data of a particular form to build an efficient model. Clinical datasets pose several issues that can affect classification: missing values, high dimensionality, and class imbalance. To build a framework for mining the data, it is first necessary to preprocess it by eliminating patients’ records that have too many missing values, imputing missing values, addressing high dimensionality, and classifying the data for decision support. This thesis investigates a real clinical dataset to address these challenges. An autoencoder is employed as a tool that can condense the data mining methodology by extracting features and classifying data in one model. The first step in the methodology is to impute missing values, so several imputation methods are analysed and employed. High dimensionality is then addressed by discarding irrelevant and redundant features, in order to improve prediction accuracy and reduce computational complexity. Class imbalance is manipulated to investigate its effect on feature selection and classification algorithms. The first stage of analysis investigates the role of the missing values. The results show that techniques based on class separation outperform other techniques in predictive ability. The next stage investigates high dimensionality and class imbalance. 
A small set of features was found that can improve classification performance; however, balancing the classes does not affect performance as much as class imbalance does.

    Computer vision based classification of fruits and vegetables for self-checkout at supermarkets

    The field of machine learning, and, in particular, methods to improve the capability of machines to perform a wider variety of generalised tasks are among the most rapidly growing research areas in today’s world. The current applications of machine learning and artificial intelligence can be divided into many significant fields namely computer vision, data sciences, real time analytics and Natural Language Processing (NLP). All these applications are being used to help computer based systems to operate more usefully in everyday contexts. Computer vision research is currently active in a wide range of areas such as the development of autonomous vehicles, object recognition, Content Based Image Retrieval (CBIR), image segmentation and terrestrial analysis from space (i.e. crop estimation). Despite significant prior research, the area of object recognition still has many topics to be explored. This PhD thesis focuses on using advanced machine learning approaches to enable the automated recognition of fresh produce (i.e. fruits and vegetables) at supermarket self-checkouts. This type of complex classification task is one of the most recently emerging applications of advanced computer vision approaches and is a productive research topic in this field due to the limited means of representing the features and machine learning techniques for classification. Fruits and vegetables offer significant inter and intra class variance in weight, shape, size, colour and texture which makes the classification challenging. The applications of effective fruit and vegetable classification have significant importance in daily life e.g. crop estimation, fruit classification, robotic harvesting, fruit quality assessment, etc. One potential application for this fruit and vegetable classification capability is for supermarket self-checkouts. Increasingly, supermarkets are introducing self-checkouts in stores to make the checkout process easier and faster. 
However, there are a number of challenges with this, as not all goods can readily be sold with packaging and barcodes, for instance loose fresh items (e.g. fruits and vegetables). Adding barcodes to these types of items individually is impractical, and pre-packaging limits the freedom of choice when selecting fruits and vegetables and creates additional waste, hence reducing customer satisfaction. The current situation, which relies on customers correctly identifying produce themselves, leaves open the potential for incorrect billing, either due to inadvertent error or due to intentional fraudulent misclassification, resulting in financial losses for the store. To address this problem, the main goals of this PhD work are: (a) exploring the types of visual and non-visual sensors that could be incorporated into a self-checkout system for classification of fruits and vegetables, (b) determining a suitable feature representation method for fresh produce items available at supermarkets, (c) identifying optimal machine learning techniques for classification within this context and (d) evaluating our work relative to the state-of-the-art object classification results presented in the literature. An in-depth analysis of related computer vision literature and techniques is performed to identify and implement possible solutions. A progressive process distribution approach is used for this project, where the task of computer vision based fruit and vegetable classification is divided into pre-processing and classification techniques. Different classification techniques have been implemented and evaluated as possible solutions to this problem. Both visual and non-visual features of fruit and vegetables are exploited to perform the classification. Novel classification techniques have been carefully developed to deal with the complex and highly variant physical features of fruit and vegetables while taking advantage of both visual and non-visual features. 
The capability of the classification techniques is tested individually and in ensembles to achieve higher effectiveness. The results lead to the conclusion that fruit and vegetable classification is a complex task with many challenges involved. It is also observed that a larger dataset can better capture the complex variant features of fruit and vegetables. Complex multidimensional features can be extracted from larger datasets to generalise to a higher number of classes. However, developing a larger multiclass dataset is an expensive and time-consuming process. The effectiveness of classification techniques can be significantly improved by subtracting background occlusions and complexities. It is also worth mentioning that an ensemble of simple classification techniques can achieve effective results even when applied to fewer features for a smaller number of classes. The combination of visual and non-visual features can make it easier for a classification technique to deal with a higher number of classes with similar physical features. Classification of fruit and vegetables with similar physical features (i.e. colour and texture) needs careful estimation and hyper-dimensional embedding of visual features. Implementing rigorous classification penalties as the loss function can achieve this goal at the cost of time and computational requirements. There is a significant need to develop larger datasets for different fruit and vegetable related computer vision applications. Considering more sophisticated loss function penalties and discriminative hyper-dimensional feature embedding techniques can significantly improve the effectiveness of classification techniques for fruit and vegetable applications.

    Data mining in computational finance

    Computational finance is a relatively new discipline whose birth can be traced back to the early 1950s. Its major objective is to develop and study practical models focusing on techniques that apply directly to financial analyses. The large number of decisions and computationally intensive problems involved in this discipline make data mining and machine learning models integral to improving, automating, and expanding the current processes. One of the objectives of this research is to present the state of the art in data mining and machine learning techniques applied in the core areas of computational finance. Next, a detailed analysis of public and private finance datasets is performed in an attempt to find interesting facts in the data and draw conclusions regarding the usefulness of features within the datasets. Credit risk evaluation is one of the crucial modern concerns in this field. Credit scoring is essentially a classification problem where models are built using information about past applicants to categorise new applicants as ‘creditworthy’ or ‘non-creditworthy’. We appraise the performance of a few classical machine learning algorithms for the problem of credit scoring. Typically, credit scoring databases are large and characterised by redundant and irrelevant features, making the classification task more computationally demanding. Feature selection is the process of selecting an optimal subset of relevant features. We propose an improved information-gain directed wrapper feature selection method using genetic algorithms and successfully evaluate its effectiveness against baseline and generic wrapper methods using three benchmark datasets. One of the tasks of financial analysts is to estimate a company’s worth. In the last piece of work, this study predicts the growth rate for earnings of companies using three machine learning techniques. 
We employed the technique of lagged features, which allowed varying amounts of recent history to be brought into the prediction task, and transformed the time series forecasting problem into a supervised learning problem. This work was applied on a private time series dataset
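
The lagged-feature transformation described above can be sketched as follows. This is a generic illustration in plain Python, not the thesis's code: the function name `make_lagged` and the toy series are assumptions. Each training row holds the previous `n_lags` observations, and the target is the observation that follows them, which turns a forecasting problem into an ordinary supervised one.

```python
def make_lagged(series, n_lags):
    """Turn a sequence into (X, y) where each row of X holds n_lags past values."""
    if n_lags < 1 or n_lags >= len(series):
        raise ValueError("n_lags must be between 1 and len(series) - 1")
    n_rows = len(series) - n_lags
    X = [list(series[i:i + n_lags]) for i in range(n_rows)]  # recent history
    y = [series[i + n_lags] for i in range(n_rows)]          # next value
    return X, y


# Toy earnings-growth series (hypothetical values).
X, y = make_lagged([1, 2, 3, 4, 5], n_lags=2)
```

Changing `n_lags` controls how much recent history each row carries, which is the "varying amounts of recent history" mentioned in the abstract; longer windows give richer inputs but fewer training rows.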