    Heartbeat Anomaly Detection using Adversarial Oversampling

    Cardiovascular diseases are among the most common causes of death worldwide. Prevention, knowledge of family history, and early detection are the best strategies for reducing this toll. Different machine learning approaches to automatic diagnosis have been proposed for this task. As in most health problems, class imbalance is predominant in this problem and affects the performance of the automated solution. In this paper, we address the classification of heartbeat images into different cardiovascular diseases. We propose a two-dimensional Convolutional Neural Network for classification after using an InfoGAN architecture to generate synthetic images for the underrepresented classes. We call this proposal Adversarial Oversampling and compare it with classical oversampling methods such as SMOTE, ADASYN, and RandomOversampling. The results show that the proposed approach improves classifier performance for the minority classes without harming performance on the balanced classes.
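
To make the classical baseline concrete, the following is a minimal sketch of SMOTE-style interpolation, one of the methods the abstract compares against. It is not the paper's InfoGAN approach; the function name `smote_like`, the parameter `k`, and the toy points are assumptions made for illustration only.

```python
import math
import random


def smote_like(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours (SMOTE-style)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Nearest minority neighbours of the base point, excluding itself.
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        u = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + u * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical 2-D minority-class feature vectors.
minority = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3]]
new_points = smote_like(minority, n_new=5)
print(new_points)
```

Each synthetic point lies on the segment between two existing minority samples, which is why interpolation methods cannot produce the kind of novel images a generative model aims for.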

    Model stability of COVID-19 mortality prediction with biomarkers

    Coronavirus disease 2019 (COVID-19) is an unprecedented and fast-evolving pandemic, which has caused a large number of critically ill patients and deaths globally. It is an acute public health crisis leading to overloaded critical care capacity. Timely prediction of the clinical outcome (death/survival) of hospital-admitted COVID-19 patients can provide early warnings to clinicians, allowing improved allocation of medical resources. In a recently published paper, an interpretable machine learning model was presented to predict the mortality of COVID-19 patients from blood biomarkers, where the model was trained and tested on relatively small data sets. However, the stability of the model and of its performance was not explored or assessed. By re-analyzing the data, we reveal that the reported mortality prediction performance was likely over-optimistic and its uncertainty was underestimated or overlooked, with a large variability in predicting deaths.
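
One common way to expose this kind of uncertainty on a small test set is a percentile bootstrap of the metric. The sketch below is an illustration under stated assumptions, not the re-analysis performed in the paper: the labels and predictions are hypothetical toy data, and `recall` and `bootstrap_interval` are names introduced here.

```python
import random


def recall(y_true, y_pred, positive=1):
    """Fraction of positive cases (here: deaths) that were predicted positive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return tp / (tp + fn) if (tp + fn) else float("nan")


def bootstrap_interval(y_true, y_pred, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for recall, resampling test cases with replacement."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        r = recall([y_true[i] for i in idx], [y_pred[i] for i in idx])
        if r == r:  # skip resamples with no positive case (NaN)
            stats.append(r)
    stats.sort()
    lo = stats[int((alpha / 2) * len(stats))]
    hi = stats[int((1 - alpha / 2) * len(stats)) - 1]
    return lo, hi


# Hypothetical toy test set: 1 = death, 0 = survival.
y_true = [1] * 10 + [0] * 30
y_pred = [1] * 8 + [0] * 2 + [0] * 28 + [1] * 2
point = recall(y_true, y_pred)
lo, hi = bootstrap_interval(y_true, y_pred)
print(f"recall = {point:.2f}, 95% bootstrap interval = [{lo:.2f}, {hi:.2f}]")
```

With only ten positive cases, the interval is wide, which illustrates why a single point estimate from a small test set can look more certain than it is.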

    An in-depth performance analysis of the oversampling techniques for high-class imbalanced dataset

    Class imbalance occurs when the distribution of samples between the majority and minority classes is not the same. The degree of imbalance may vary from mild to severe. High class imbalance may affect overall classification accuracy, since the model is likely to assign most of the data to the majority class. Such a model will give biased results, and the minority class often has little impact on the model. Oversampling is one way to deal with high class imbalance, but only a few oversampling techniques are commonly used to address it. This study provides an in-depth performance analysis of oversampling techniques for the high class imbalance problem. Adding an oversampling technique balances the data of each class to provide unbiased evaluation results in modeling. We compared the performance of Random Oversampling (ROS), ADASYN, SMOTE, and Borderline-SMOTE. Each oversampling technique was combined with machine learning methods such as Random Forest, Logistic Regression, and k-Nearest Neighbor (KNN). The test results show that Random Forest with Borderline-SMOTE gives the best result among all oversampling techniques, with an accuracy of 0.9997, a precision of 0.9474, a recall of 0.8571, an F1-score of 0.9000, a ROC-AUC of 0.9388, and a PRAUC of 0.8581.
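
As an illustration of the simplest technique in this comparison, the following is a minimal sketch of Random Oversampling (ROS), assuming that balancing means duplicating minority samples until every class matches the largest one. The function name `random_oversample` and the toy data are hypothetical and do not come from the study.

```python
import random
from collections import Counter


def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen samples of smaller classes until every
    class has as many samples as the largest class."""
    rng = random.Random(seed)
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)
    target = max(len(xs) for xs in by_class.values())
    out_x, out_y = [], []
    for y, xs in by_class.items():
        # Draw the missing samples with replacement from the same class.
        extra = [rng.choice(xs) for _ in range(target - len(xs))]
        for x in xs + extra:
            out_x.append(x)
            out_y.append(y)
    return out_x, out_y


# Toy data: 6 majority samples (label 0) and 2 minority samples (label 1).
X = [[0.1], [0.2], [0.3], [0.4], [0.5], [0.6], [2.0], [2.1]]
y = [0, 0, 0, 0, 0, 0, 1, 1]
X_bal, y_bal = random_oversample(X, y)
counts = Counter(y_bal)
print(counts)
```

Oversampling of this kind is normally applied to the training split only; duplicating samples before splitting would leak copies of the same point into the test data.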

    Data Analytics Application in Fashion Retail SMEs (A Case Study in Caracas Fashion Store)

    Data analytics plays a paramount role in maximizing productivity and profitability for businesses by deriving insights from existing data to predict market trends and client habits and so support better business decisions. In line with Industrial Revolution 4.0, most SMEs have begun to implement an e-commerce business model, so customer data is generated at an exponential rate, allowing SMEs to further develop their services for greater user satisfaction. However, the abundance of unsorted and ambiguous data leads to issues such as server overload and inefficient tracking of the customer sales cycle. This paper explains the application of data analytics techniques and architectures to overcome these issues in a fashion retail SME, as well as the benefits and drawbacks of these solutions.

    An Approach to Automatically Detect and Visualize Bias in Data Analytics

    Data Analytics and Artificial Intelligence (AI) are increasingly driving key business decisions and business processes. Any flaws in the interpretation of analytic results or AI outputs can lead to significant economic losses and reputational damage. Among existing flaws, one of the most often overlooked is the use of biased data and imbalanced datasets. When unnoticed, data bias warps the meaning of data and has a devastating effect on AI results. Existing approaches deal with data bias by constraining the data model, altering its composition until the data is no longer biased. Unfortunately, studies have shown that crucial information about the nature of data may be lost during this process. Therefore, in this paper we propose an alternative process, one that detects data biases and presents biased data in a visual way so that users can comprehend how data is structured and decide whether or not constraining approaches are applicable in their context. Our approach detects the existence of biases in datasets through our proposed algorithm and generates a series of visualizations in a way that is understandable for users, including non-expert ones. In this way, users become aware not only of the existence of biases in the data, but also of how they may impact their analytics and AI algorithms, thus avoiding undesired results. This work has been co-funded by the ECLIPSE-UA (RTI2018-094283-B-C32) project funded by the Spanish Ministry of Science, Innovation, and Universities. Ana Lavalle holds an Industrial PhD Grant (I-PI 03-18) co-funded by the University of Alicante and the Lucentia Lab Spin-off Company.

    A Survey of Methods for Handling Disk Data Imbalance

    Class imbalance exists in many classification problems, and since classifiers are typically designed to maximize accuracy, imbalance in data classes can lead to classification challenges, with a few classes having higher misclassification costs. The Backblaze dataset, a widely used dataset related to hard disks, has a small amount of failure data and a large amount of healthy data, and therefore exhibits serious class imbalance. This paper provides a comprehensive overview of research in the field of imbalanced data classification. The discussion is organized into three main aspects: data-level methods, algorithm-level methods, and hybrid methods. For each type of method, we summarize and analyze the existing problems, algorithmic ideas, strengths, and weaknesses. Additionally, the challenges of imbalanced data classification are discussed, along with strategies to address them. This helps researchers choose the appropriate method according to their needs.

    Feature extraction comparison for facial expression recognition using adaptive extreme learning machine

    Facial expression recognition is an important part of the field of affective computing. Automatic analysis of human facial expressions is a challenging problem with many applications. Most existing automated systems for facial expression analysis attempt to recognize a few prototypical emotional expressions such as anger, contempt, disgust, fear, happiness, neutral, sadness, and surprise. This paper compares feature extraction methods used to detect human facial expressions. The study compares the gray level co-occurrence matrix, local binary pattern, and facial landmark (FL) methods on two facial expression datasets, namely Japanese female facial expression (JFFE) and extended Cohn-Kanade (CK+). In addition, we propose an enhancement of the extreme learning machine (ELM) method, adaptive ELM (aELM), which adaptively selects the best number of hidden neurons to reach its maximum performance. The results show that our proposed method slightly improves on the performance of the basic ELM method with the feature extraction methods mentioned above. Our proposed method obtains a maximum mean accuracy of 88.07% on the CK+ dataset and 83.12% on the JFFE dataset with FL feature extraction.

    VALUE CREATION THROUGH DATA TRACKING TECHNOLOGIES IN THE FOOTBALL INDUSTRY OBSTACLES AND DYSFUNCTIONAL EFFECTS

    The use of Big Data has become an essential part of today’s business. Data is present wherever you turn your head, whether looking at new innovative business opportunities, optimization and automation of existing business models, or getting rid of old habits. Recently, different types of tracking technologies have been introduced in the professional football industry, which offer coaches full insight into how far players run, where they run, their directional shifts, pace, accelerations, and how often and how long the players stand still. This technology offers an opportunity to optimize the sporting conditions of the teams through digital transformation. By applying the framework ‘Multidimensional Value Categories’, this paper contributes to practice by suggesting how tracking technologies can contribute to business value in professional football organizations, and to theory by identifying obstacles and dysfunctional effects related to this value creation.