292 research outputs found
The Rich Get Richer: Disparate Impact of Semi-Supervised Learning
Semi-supervised learning (SSL) has demonstrated its potential to improve
model accuracy on a variety of learning tasks when high-quality labeled
data are severely limited. Although it is often established that the average
accuracy over the entire data population improves, it is unclear how SSL
fares with different sub-populations. Understanding the above question has
substantial fairness implications when different sub-populations are defined by
the demographic groups that we aim to treat fairly. In this paper, we reveal
the disparate impacts of deploying SSL: the sub-population who has a higher
baseline accuracy without using SSL (the "rich" one) tends to benefit more from
SSL; while the sub-population who suffers from a low baseline accuracy (the
"poor" one) might even observe a performance drop after adding the SSL module.
We theoretically and empirically establish the above observation for a broad
family of SSL algorithms, which either explicitly or implicitly use an
auxiliary "pseudo-label". Experiments on a set of image and text classification
tasks confirm our claims. We introduce a new metric, Benefit Ratio, and promote
the evaluation of the fairness of SSL (Equalized Benefit Ratio). We further
discuss how the disparate impact can be mitigated. We hope our paper alerts
readers to the potential pitfalls of using SSL and encourages a multifaceted
evaluation of future SSL algorithms.
Comment: Published as a conference paper at ICLR 202
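The Benefit Ratio above can be sketched as follows. The exact normalization is an assumption of this illustration (SSL's accuracy gain relative to the gain an idealized fully supervised model would achieve over the same baseline), and all names are illustrative rather than the paper's API:

```python
def benefit_ratio(acc_base: float, acc_ssl: float, acc_full: float) -> float:
    """Fraction of the achievable accuracy gain that SSL actually delivers.

    acc_base: accuracy of a model trained on the limited labeled data only
    acc_ssl:  accuracy after adding the SSL module
    acc_full: accuracy of an idealized fully supervised model (the ceiling)
    """
    if acc_full == acc_base:
        raise ValueError("baseline already matches the fully supervised ceiling")
    return (acc_ssl - acc_base) / (acc_full - acc_base)

# A "rich" sub-population captures most of the achievable gain...
rich = benefit_ratio(acc_base=0.80, acc_ssl=0.88, acc_full=0.90)  # ~0.8
# ...while a "poor" sub-population can even regress (negative ratio).
poor = benefit_ratio(acc_base=0.60, acc_ssl=0.58, acc_full=0.90)  # < 0
```

Computing this ratio per demographic group, rather than only on the whole population, is what exposes the disparate impact; equal ratios across groups would correspond to the Equalized Benefit Ratio criterion.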
Measuring Value Understanding in Language Models through Discriminator-Critique Gap
Recent advancements in Large Language Models (LLMs) have heightened concerns
about their potential misalignment with human values. However, evaluating their
grasp of these values is complex due to their intricate and adaptable nature.
We argue that truly understanding values in LLMs requires considering both
"know what" and "know why". To this end, we present the Value Understanding
Measurement (VUM) framework that quantitatively assesses both "know what" and
"know why" by measuring the discriminator-critique gap related to human values.
Using the Schwartz Value Survey, we specify our evaluation values and develop a
thousand-level dialogue dataset with GPT-4. Our assessment considers both the
value alignment of LLMs' outputs against baseline answers and how well LLMs'
stated reasons for value recognition align with GPT-4's annotations.
We evaluate five representative LLMs and provide strong evidence that the
scaling law significantly impacts "know what" but has little effect on "know
why", which consistently remains at a high level. This may further suggest
that LLMs craft plausible explanations based on the provided context without
truly understanding the underlying values, indicating potential risks.
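A minimal sketch of the gap measured above, assuming the simplified form of a mean-score difference between the two capabilities; the function name and scoring inputs are hypothetical, not the framework's actual interface:

```python
def discriminator_critique_gap(discriminator_scores, critique_scores):
    """Gap between value recognition ("know what") and explanation ("know why").

    discriminator_scores: per-item scores for selecting the value-aligned answer
    critique_scores:      per-item scores for the quality of the stated reasons,
                          e.g. judged against reference annotations
    A large positive gap suggests the model recognizes values without being
    able to explain why they apply.
    """
    if len(discriminator_scores) != len(critique_scores):
        raise ValueError("score lists must be paired per item")
    mean = lambda xs: sum(xs) / len(xs)
    return mean(discriminator_scores) - mean(critique_scores)
```

Under this reading, the finding that scaling improves "know what" but not "know why" would appear as the gap widening with model size.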
Ferroptosis in head and neck squamous cell carcinoma: from pathogenesis to treatment
Head and neck squamous cell carcinoma (HNSCC) is the sixth most common malignant tumor worldwide, with high morbidity and mortality. Surgery and postoperative chemoradiotherapy have largely reduced the recurrence and fatality rates for most HNSCCs. Nonetheless, these therapeutic approaches can result in poor prognoses owing to severe adverse reactions and the development of drug resistance. Ferroptosis is a form of non-apoptotic programmed cell death, and ferroptosis of tumor cells can inhibit tumor development. Ferroptosis involves various biomolecules and signaling pathways, whose expression can be adjusted to modulate the sensitivity of cells to ferroptosis. As a tool in the fight against cancer, the activation of ferroptosis has received much attention in recent years. Understanding the molecular mechanism of ferroptosis in HNSCC is therefore an essential strategy with therapeutic potential, and choosing the appropriate treatment method is central to treating HNSCC. In this review, we discuss the molecular and defense mechanisms of ferroptosis, analyze the role and mechanism of ferroptosis in the inhibition of and immunity against HNSCC, and explore therapeutic strategies for inducing ferroptosis in HNSCC, including drug therapy, radiation therapy, immunotherapy, nanotherapy, and comprehensive treatment. We find that ferroptosis provides a new target for HNSCC treatment.
Evaluating Fairness Without Sensitive Attributes: A Framework Using Only Auxiliary Models
Although the volume of literature and public attention on machine learning
fairness has grown significantly, in practice even a task as basic as
measuring fairness, the first step in studying and promoting fairness, can be
challenging because sensitive attributes are often unavailable due to privacy
regulations. The straightforward solution is to use auxiliary
models to predict the missing sensitive attributes. However, our theoretical
analyses show that the estimation error of the directly measured fairness
metrics is proportional to the error rates of auxiliary models' predictions.
Existing works that attempt to reduce the estimation error often require strong
assumptions, e.g. access to the ground-truth sensitive attributes or some form
of conditional independence. In this paper, we drop those assumptions and
propose a framework that uses only off-the-shelf auxiliary models. The main
challenge is how to reduce the negative impact of imperfectly predicted
sensitive attributes on the fairness metrics without knowing the ground-truth
sensitive attributes. Inspired by the noisy label learning literature, we first
derive a closed-form relationship between the directly measured fairness
metrics and their corresponding ground-truth metrics. We then estimate some
key statistics (most importantly, the transition matrix from the noisy-label
literature), which we use together with the derived relationship to calibrate
the fairness metrics. In addition, we theoretically prove an upper bound on
the estimation error of our calibrated metrics and show that our method can
substantially decrease the estimation error, especially when auxiliary models
are inaccurate or the target model is highly biased. Experiments on COMPAS and
CelebA validate our theoretical analyses and show that our method can measure
fairness significantly more accurately than baselines under favorable
circumstances.
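The calibration step can be sketched for two groups as follows. The Bayes-rule construction and the simplifying assumption that the target model's output is independent of the attribute prediction given the true group are assumptions of this illustration, not necessarily the paper's exact formulation:

```python
def calibrate_group_rates(noisy_rates, T, priors):
    """Recover true-group positive rates from rates measured under predicted groups.

    noisy_rates[j]: positive rate among samples the auxiliary model assigns to group j
    T[i][j]:        P(predicted group j | true group i), the attribute transition matrix
    priors[i]:      P(true group i)
    Simplifying assumption: the target model's output is independent of the
    attribute prediction given the true group.
    """
    # M[j][i] = P(true group i | predicted group j), via Bayes' rule
    M = [[T[i][j] * priors[i] for i in range(2)] for j in range(2)]
    for j in range(2):
        z = sum(M[j])
        M[j] = [m / z for m in M[j]]
    # noisy = M @ true  =>  true = M^{-1} @ noisy  (explicit 2x2 inverse)
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    inv = [[ M[1][1] / det, -M[0][1] / det],
           [-M[1][0] / det,  M[0][0] / det]]
    return [inv[i][0] * noisy_rates[0] + inv[i][1] * noisy_rates[1]
            for i in range(2)]
```

The calibrated per-group rates can then be plugged into any group-rate-based fairness metric (e.g. a demographic-parity gap) in place of the rates measured under the predicted attributes.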
UA-DETRAC: A New Benchmark and Protocol for Multi-Object Detection and Tracking
In recent years, numerous effective multi-object tracking (MOT) methods have
been developed for a wide range of applications. Existing performance
evaluations of MOT methods usually separate the object tracking step from the
object detection step by using the same fixed object detection results for
comparison. In this work, we perform a comprehensive quantitative study of the
effects of object detection accuracy on overall MOT performance, using the
new large-scale University at Albany DETection and tRACking (UA-DETRAC)
benchmark dataset. The UA-DETRAC benchmark dataset consists of 100 challenging
video sequences captured from real-world traffic scenes (over 140,000 frames
with rich annotations, including occlusion, weather, vehicle category,
truncation, and vehicle bounding boxes) for object detection, object tracking,
and MOT system evaluation. We evaluate complete MOT systems constructed from combinations
of state-of-the-art object detection and object tracking methods. Our analysis
shows the complex effects of object detection accuracy on MOT system
performance. Based on these observations, we propose new evaluation tools and
metrics for MOT systems that consider both object detection and object tracking
for comprehensive analysis.
Comment: 18 pages, 11 figures, accepted by CVI