1,256 research outputs found
Absent Data Generating Classifier for Imbalanced Class Sizes
We propose an algorithm for two-class classification problems when the training data are imbalanced, that is, when the number of training instances in one of the classes is so low that conventional classification algorithms become ineffective in detecting the minority class. We present a modification of kernel Fisher discriminant analysis such that the imbalanced nature of the problem is explicitly addressed in the new algorithm formulation. The new algorithm exploits the properties of the existing minority points to learn the effects of other minority data points, had they actually existed. The algorithm proceeds iteratively, employing the learned properties and conditional sampling to generate sufficient artificial data points for the minority set, thus enhancing the detection probability of the minority class. Implementing the proposed method on a number
© 2015 Arash Pourhabib, Bani K. Mallick and Yu Ding. Article preprint--accepted for publication, Feb 201
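The idea of generating artificial minority points can be illustrated with a minimal sketch. This is not the kernel Fisher discriminant procedure described in the abstract; it swaps in a plain interpolation between existing minority points (a SMOTE-like stand-in), and the function name `oversample_minority` and the toy data are assumptions made for illustration only.

```python
import random

def oversample_minority(minority, n_new, seed=0):
    """Create n_new synthetic points by interpolating between random pairs
    of existing minority points (illustrative stand-in, not the paper's method)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority points to interpolate")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        a, b = rng.sample(minority, 2)  # two distinct existing minority points
        t = rng.random()                # interpolation weight in [0, 1)
        synthetic.append(tuple(x + t * (y - x) for x, y in zip(a, b)))
    return synthetic

# Hypothetical toy data: 20 majority points, 3 minority points.
majority = [(float(i), float(i % 3)) for i in range(20)]
minority = [(0.0, 5.0), (1.0, 6.0), (2.0, 5.5)]

# Generate enough artificial points to match the majority class size.
new_points = oversample_minority(minority, len(majority) - len(minority))
balanced_minority = minority + new_points
```

Each synthetic point lies on a segment between two real minority points, so the augmented set stays inside the region the minority class already occupies.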
Sentiment classification with concept drift and imbalanced class distributions
Document-level sentiment classification aims to automate the task of classifying a textual review, given on a single topic, as expressing a positive or negative sentiment. In general, people express their opinions towards an entity based on its characteristics, which may change over time. Users' opinions change as the target entities evolve. However, existing sentiment classification approaches do not consider the evolution of users' opinions. They assume that instances are independent, identically distributed and generated from a stationary distribution, whereas they are in fact generated from a stream distribution. They use a static classification model that builds a classifier from a training set without considering the time at which reviews are posted. However, time may be very useful as a feature for the classification task. In this paper, a stream sentiment classification framework is proposed to deal with concept drift and imbalanced data distribution using ensemble learning and instance selection methods. The experimental results show the effectiveness of the proposed method compared with static sentiment classification.
Breaking the Spurious Causality of Conditional Generation via Fairness Intervention with Corrective Sampling
To capture the relationship between samples and labels, conditional
generative models often inherit spurious correlations from the training
dataset. This can result in label-conditional distributions that are imbalanced
with respect to another latent attribute. To mitigate this issue, which we call
spurious causality of conditional generation, we propose a general two-step
strategy. (a) Fairness Intervention (FI): emphasize the minority samples that
are hard to generate due to the spurious correlation in the training dataset.
(b) Corrective Sampling (CS): explicitly filter the generated samples and
ensure that they follow the desired latent attribute distribution. We have
designed the fairness intervention to work for various degrees of supervision
on the spurious attribute, including unsupervised, weakly-supervised, and
semi-supervised scenarios. Our experimental results demonstrate that FICS can
effectively resolve spurious causality of conditional generation across various
datasets. Comment: TMLR 202
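The corrective sampling step, filtering generated samples so that they follow a desired latent attribute distribution, can be sketched as a stratified subsample. This is only a sketch under assumptions: the abstract does not say how attributes are obtained or how the filter is implemented, so the attribute labels here are taken as given (for example from some attribute classifier), and the name `corrective_sample` is hypothetical.

```python
import random
from collections import defaultdict

def corrective_sample(samples, attributes, target, n_keep, seed=0):
    """Keep about n_keep samples whose attribute proportions follow `target`.

    attributes[i] is the (assumed, e.g. predicted) latent attribute of samples[i];
    target maps each attribute value to its desired fraction.
    """
    rng = random.Random(seed)
    by_attr = defaultdict(list)
    for sample, attr in zip(samples, attributes):
        by_attr[attr].append(sample)
    kept = []
    for attr, frac in target.items():
        quota = round(frac * n_keep)  # how many samples this attribute should get
        pool = by_attr.get(attr, [])
        if len(pool) < quota:
            raise ValueError(f"not enough generated samples with attribute {attr!r}")
        kept.extend(rng.sample(pool, quota))
    return kept

# Hypothetical generator output: 90 samples with attribute "a", 10 with "b".
samples = list(range(100))
attributes = ["a"] * 90 + ["b"] * 10

# Ask for a balanced 50/50 attribute split among 20 kept samples.
kept = corrective_sample(samples, attributes, {"a": 0.5, "b": 0.5}, n_keep=20)
```

The filter can only keep what the generator produced, which is why a step that first encourages the generator to produce the rare attribute is useful: in the toy data above, a request for more than 10 samples with attribute "b" would fail.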
Automating the decision making process of Todd's age estimation method from the pubic symphysis with explainable machine learning
Age estimation is a fundamental task in forensic anthropology for both the living and the
dead. The procedure consists of analyzing properties such as appearance, ossification patterns,
and morphology in different skeletonized remains. The pubic symphysis is extensively
used to assess adults' age-at-death due to its reliability. Nevertheless, most
methods currently used for skeleton-based age estimation are carried out manually, even
though their automation has the potential to lead to a considerable improvement in terms
of economic resources, effectiveness, and execution time. In particular, explainable
machine learning emerges as a promising means of addressing this challenge by engaging
forensic experts to refine and audit the extracted knowledge and discover unknown patterns
hidden in the complex and uncertain available data. In this contribution we address
the automation of the decision making process of Todd's pioneering age assessment
method to assist the forensic practitioner in its application. To do so, we make use of the
pubic bone database available at the Physical Anthropology lab of the University of
Granada. The machine learning task is significantly complex as it becomes an imbalanced
ordinal classification problem with a small sample size and a high dimension. We tackle it
with the combination of an ordinal classification method and oversampling techniques
through an extensive experimental setup. Two forensic anthropologists refine and validate
the derived rule base according to their own expertise and the knowledge available in the
area. The resulting automatic system, finally composed of 34 interpretable rules, outperforms
the state-of-the-art accuracy. In addition, and more importantly, it allows the forensic
experts to uncover novel and interesting insights about how Todd's method works, in
particular, and the guidelines to estimate age-at-death from pubic symphysis characteristics,
generally.
Ministry of Science and Innovation, Spain (MICINN); Spanish Government; Agencia Estatal de Investigacion (AEI) PID2021-122916NB-I00;
Spanish Government PGC2018-101216-B-I00; Junta de Andalucia; University of Granada P18-FR-4262,
B-TIC-456-UGR20; European Commission; Universidad de Granada/CBU