    Absent Data Generating Classifier for Imbalanced Class Sizes

    We propose an algorithm for two-class classification problems in which the training data are imbalanced, meaning that one class has so few training instances that conventional classification algorithms become ineffective at detecting that minority class. We present a modification of kernel Fisher discriminant analysis whose formulation explicitly addresses the imbalanced nature of the problem. The new algorithm exploits the properties of the existing minority points to learn the effects of other minority data points, had they actually existed. It proceeds iteratively, using the learned properties and conditional sampling to generate sufficient artificial data points for the minority set, thus enhancing the detection probability of the minority class. Implementing the proposed method on a number …
    ©2015 Arash Pourhabib, Bani K. Mallick and Yu Ding. Article preprint, accepted for publication Feb 201
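    The abstract builds on kernel Fisher discriminant analysis. As background only, a minimal sketch of the standard regularized two-class kernel Fisher discriminant (not the authors' absent-data-generating modification) might look like this. It assumes an RBF kernel; the function names and the `gamma` and `mu` values are hypothetical illustration choices, and the minority class is labelled 1.

```python
import math

# Assumption: RBF kernel, gamma and mu are illustrative choices, not taken from the paper.


def rbf(a, b, gamma=1.0):
    """Gaussian (RBF) kernel k(a, b) = exp(-gamma * ||a - b||^2)."""
    return math.exp(-gamma * sum((p - q) ** 2 for p, q in zip(a, b)))


def solve(A, rhs):
    """Solve A x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    M = [list(A[i]) + [rhs[i]] for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def fit_kfd(X, y, gamma=1.0, mu=1e-3):
    """Fit a regularized two-class kernel Fisher discriminant (labels 0/1).

    Returns a predict(x) function mapping a point to 0 or 1.
    """
    n = len(X)
    K = [[rbf(X[i], X[j], gamma) for j in range(n)] for i in range(n)]
    members = {c: [j for j in range(n) if y[j] == c] for c in (0, 1)}
    # Kernelized class means: m_c[i] = average of k(x_i, x_j) over j in class c
    m = {c: [sum(K[i][j] for j in members[c]) / len(members[c]) for i in range(n)]
         for c in (0, 1)}
    # Within-class scatter N = sum over classes of sum_j (K[:, j] - m_c)(K[:, j] - m_c)^T,
    # plus mu * I for regularization
    N = [[mu if i == k else 0.0 for k in range(n)] for i in range(n)]
    for c in (0, 1):
        for j in members[c]:
            d = [K[i][j] - m[c][i] for i in range(n)]
            for i in range(n):
                for k in range(n):
                    N[i][k] += d[i] * d[k]
    # Discriminant coefficients: alpha = N^{-1} (m_1 - m_0)
    alpha = solve(N, [m[1][i] - m[0][i] for i in range(n)])
    # Threshold halfway between the projected class means alpha . m_c
    proj = {c: sum(a * v for a, v in zip(alpha, m[c])) for c in (0, 1)}
    threshold = (proj[0] + proj[1]) / 2

    def predict(x):
        score = sum(a * rbf(xi, x, gamma) for a, xi in zip(alpha, X))
        return 1 if score > threshold else 0

    return predict
```

    For example, `fit_kfd([(0, 0), (0, 1), (1, 0), (3, 3), (3, 4)], [0, 0, 0, 1, 1])` returns a classifier that separates the two clusters. The paper's method changes this formulation and iteratively adds generated minority points; that part is not shown here.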

    Sentiment classification with concept drift and imbalanced class distributions

    Document-level sentiment classification aims to automate the task of classifying a textual review, written about a single topic, as expressing a positive or negative sentiment. In general, people express opinions about an entity based on its characteristics, which may change over time, so users' opinions shift as the target entities evolve. However, existing sentiment classification approaches did not consider this evolution of users' opinions. They assumed that instances are independent, identically distributed, and drawn from a stationary distribution, whereas reviews are in fact generated by a stream whose distribution changes over time. They used a static classification model that builds a classifier from a training set without considering when the reviews were posted, even though posting time may be a useful feature for the classification task. In this paper, a stream sentiment classification framework is proposed to deal with concept drift and imbalanced data distributions using ensemble learning and instance selection methods. The experimental results show the effectiveness of the proposed method compared with static sentiment classification.
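    The abstract does not specify the ensemble scheme or the instance selection method. A minimal sketch of the general idea, assuming a chunk-based sliding-window ensemble, random undersampling of the majority class as a stand-in for instance selection, and a nearest-centroid base learner on already-vectorized reviews (all hypothetical choices, with hypothetical names):

```python
import random
from collections import deque

# Assumption: reviews are already feature vectors; feature extraction is not shown.
# Assumption: random undersampling stands in for the paper's instance selection.


class NearestCentroid:
    """Tiny base learner: assigns the label of the closest class centroid."""

    def fit(self, X, y):
        self.centroids = {}
        for label in set(y):
            pts = [x for x, t in zip(X, y) if t == label]
            self.centroids[label] = [sum(col) / len(pts) for col in zip(*pts)]
        return self

    def predict(self, x):
        return min(self.centroids,
                   key=lambda c: sum((a - b) ** 2 for a, b in zip(x, self.centroids[c])))


def undersample(X, y, rng):
    """Randomly drop majority-class instances so every class has equal size."""
    by_label = {}
    for x, t in zip(X, y):
        by_label.setdefault(t, []).append(x)
    n_min = min(len(v) for v in by_label.values())
    Xb, yb = [], []
    for label, pts in by_label.items():
        for x in rng.sample(pts, n_min):
            Xb.append(x)
            yb.append(label)
    return Xb, yb


class ChunkEnsemble:
    """Sliding-window ensemble: one balanced model per chunk, newest k kept."""

    def __init__(self, max_members=3, seed=0):
        self.members = deque(maxlen=max_members)  # oldest model is dropped automatically
        self.rng = random.Random(seed)

    def update(self, X_chunk, y_chunk):
        Xb, yb = undersample(X_chunk, y_chunk, self.rng)
        self.members.append(NearestCentroid().fit(Xb, yb))

    def predict(self, x):
        # Majority vote over the current members
        votes = [m.predict(x) for m in self.members]
        return max(set(votes), key=votes.count)
```

    Because only the newest `max_members` models vote, models trained before a drift are gradually discarded, and balancing each chunk keeps the minority sentiment from being ignored.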

    Breaking the Spurious Causality of Conditional Generation via Fairness Intervention with Corrective Sampling

    In capturing the relationship between samples and labels, conditional generative models often inherit spurious correlations from the training dataset. This can result in label-conditional distributions that are imbalanced with respect to another latent attribute. To mitigate this issue, which we call spurious causality of conditional generation, we propose a general two-step strategy. (a) Fairness Intervention (FI): emphasize the minority samples that are hard to generate because of the spurious correlation in the training dataset. (b) Corrective Sampling (CS): explicitly filter the generated samples to ensure that they follow the desired latent attribute distribution. We have designed the fairness intervention to work under various degrees of supervision on the spurious attribute, including unsupervised, weakly-supervised, and semi-supervised scenarios. Our experimental results demonstrate that FICS, the combination of FI and CS, can effectively resolve spurious causality of conditional generation across various datasets.
    Comment: TMLR 202
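    Corrective Sampling is described only as filtering the generated samples so that they follow the desired latent attribute distribution. A minimal sketch of such a filter, assuming each generated sample already carries an attribute label (for example from a separate attribute classifier) and that filtering is done by random per-attribute subsampling to a target distribution; `corrective_sample` is a hypothetical name and this is not the FICS authors' procedure:

```python
import random
from collections import defaultdict


def corrective_sample(samples, attrs, target, seed=0):
    """Subsample generated samples so attribute frequencies follow `target`.

    samples: generated outputs; attrs: attribute label assigned to each sample
    (assumed to come from some attribute classifier); target: dict mapping
    attribute -> desired proportion, summing to 1.
    Returns a shuffled list of (sample, attribute) pairs.
    """
    groups = defaultdict(list)
    for s, a in zip(samples, attrs):
        groups[a].append(s)
    # Largest total size reachable without needing more samples than a group has
    total = min(len(groups[a]) / p for a, p in target.items() if p > 0)
    rng = random.Random(seed)
    kept = []
    for a, p in target.items():
        k = int(total * p)  # floor, so k never exceeds len(groups[a])
        kept.extend((s, a) for s in rng.sample(groups[a], k))
    rng.shuffle(kept)
    return kept
```

    For instance, with 90 generated samples labelled "A", 10 labelled "B", and a uniform target, the filter keeps 10 of each, so the rarer attribute limits how many samples survive.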

    Automating the decision making process of Todd’s age estimation method from the pubic symphysis with explainable machine learning

    Age estimation is a fundamental task in forensic anthropology for both the living and the dead. The procedure consists of analyzing properties such as appearance, ossification patterns, and morphology in different skeletonized remains. The pubic symphysis is extensively used to assess adults’ age-at-death because of its reliability. Nevertheless, most methods currently used for skeleton-based age estimation are carried out manually, even though automating them could considerably improve cost, effectiveness, and execution time. In particular, explainable machine learning is a promising means of addressing this challenge, because it lets forensic experts refine and audit the extracted knowledge and discover unknown patterns hidden in complex and uncertain data. In this contribution we address the automation of the decision-making process of Todd’s pioneering age assessment method to assist forensic practitioners in applying it. To do so, we use the pubic bone database available at the Physical Anthropology lab of the University of Granada. The machine learning task is highly complex: it is an imbalanced ordinal classification problem with a small sample size and high dimensionality. We tackle it by combining an ordinal classification method with oversampling techniques in an extensive experimental setup. Two forensic anthropologists refine and validate the derived rule base according to their own expertise and the knowledge available in the area. The resulting automatic system, finally composed of 34 interpretable rules, outperforms the state of the art in accuracy. In addition, and more importantly, it allows forensic experts to uncover novel and interesting insights into how Todd’s method works in particular, and into the guidelines for estimating age-at-death from pubic symphysis characteristics in general.
    Funding: Ministry of Science and Innovation, Spain (MICINN); Spanish Government; Agencia Estatal de Investigacion (AEI) PID2021-122916NB-I00; Spanish Government PGC2018-101216-B-I00; Junta de Andalucia; University of Granada P18-FR-4262, B-TIC-456-UGR20; European Commission; Universidad de Granada/CBU
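    The abstract does not name the oversampling technique it combines with ordinal classification. A minimal sketch of one common option, assuming SMOTE-style interpolation applied within each ordinal phase before an ordinal classifier (not shown) is trained; `smote_like` and its parameter `k` are hypothetical:

```python
import random

# Assumption: SMOTE-style interpolation is an illustrative choice; the paper's
# actual oversampling techniques are not specified in the abstract.


def smote_like(X, y, k=3, seed=0):
    """Oversample every class (ordinal phase) to the size of the largest one.

    Each new point is interpolated between a class member and one of its k
    nearest same-class neighbours, the core idea of SMOTE.
    """
    rng = random.Random(seed)
    by_class = {}
    for x, t in zip(X, y):
        by_class.setdefault(t, []).append(list(x))
    target = max(len(v) for v in by_class.values())
    X_out, y_out = [list(x) for x in X], list(y)
    for label, pts in by_class.items():
        for _ in range(target - len(pts)):
            base = rng.choice(pts)
            if len(pts) == 1:
                new = list(base)  # no neighbour available: duplicate the point
            else:
                neighbours = sorted(
                    (p for p in pts if p is not base),
                    key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
                )[:k]
                nb = rng.choice(neighbours)
                gap = rng.random()  # position along the segment base -> nb
                new = [a + gap * (b - a) for a, b in zip(base, nb)]
            X_out.append(new)
            y_out.append(label)
    return X_out, y_out
```

    Interpolating only between members of the same phase means every synthetic point keeps an existing phase label, so the ordinal structure of the labels is left unchanged.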