Quantifying Social Biases Using Templates is Unreliable
Recently, there has been an increase in efforts to understand how large
language models (LLMs) propagate and amplify social biases. Several works have
utilized templates for fairness evaluation, which allow researchers to quantify
social biases in the absence of test sets with protected attribute labels.
While template evaluation can be a convenient and helpful diagnostic tool to
understand model deficiencies, it often uses a simplistic and limited set of
templates. In this paper, we study whether bias measurements are sensitive to
the choice of templates used for benchmarking. Specifically, we investigate the
instability of bias measurements by manually modifying templates proposed in
previous works in a semantically preserving manner and measuring bias across
these modifications. We find that bias values and resulting conclusions vary
considerably across template modifications on four tasks, ranging from an 81%
reduction (NLI) to a 162% increase (MLM) in (task-specific) bias measurements.
Our results indicate that quantifying fairness in LLMs, as done in current
practice, can be brittle and needs to be approached with more care and caution.
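As a rough, self-contained illustration of the kind of sensitivity the paper measures (the model, templates, and attribute words below are illustrative placeholders, not the paper's actual benchmark), one can compare masked-token probabilities across two paraphrases of the same probe template:

# Minimal sketch of template sensitivity in MLM bias probes.
# Assumes the HuggingFace `transformers` library; templates and
# attribute words are illustrative, not those from the paper.
from transformers import pipeline

fill = pipeline("fill-mask", model="bert-base-uncased")

# Two semantically equivalent paraphrases of one probe template.
templates = [
    "[MASK] is a nurse.",
    "[MASK] works as a nurse.",
]
attributes = ["he", "she"]

for t in templates:
    probs = {a: 0.0 for a in attributes}
    for cand in fill(t, targets=attributes):
        probs[cand["token_str"].strip()] = cand["score"]
    # A simple (hypothetical) bias score: the probability gap
    # between the two gendered fillers for this template.
    gap = probs["she"] - probs["he"]
    print(f"{t!r}: P(she) - P(he) = {gap:+.4f}")

If the gap shifts substantially in magnitude (or sign) between the two paraphrases, the template-level bias estimate is unstable in the sense the paper describes.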
Distilling Large Language Models using Skill-Occupation Graph Context for HR-Related Tasks
Numerous HR applications are centered around resumes and job descriptions.
While they can benefit from advancements in NLP, particularly large language
models, their real-world adoption faces challenges due to the absence of
comprehensive benchmarks for various HR tasks and the lack of smaller models with
competitive capabilities. In this paper, we aim to bridge this gap by
introducing the Resume-Job Description Benchmark (RJDB). We meticulously craft
this benchmark to cater to a wide array of HR tasks, including matching and
explaining resumes to job descriptions, extracting skills and experiences from
resumes, and editing resumes. To create this benchmark, we propose to distill
domain-specific knowledge from a large language model (LLM). We rely on a
curated skill-occupation graph to ensure diversity and provide context for LLM
generation. Our benchmark includes over 50 thousand triples of job
descriptions, matched resumes and unmatched resumes. Using RJDB, we train
multiple smaller student models. Our experiments reveal that the student models
match or exceed the performance of the teacher model (GPT-4), affirming the
effectiveness of the benchmark. Additionally, we explore the utility of RJDB on
out-of-distribution data for skill extraction and resume-job description
matching, in both zero-shot and weakly supervised settings. We release our
datasets and code to foster further research and industry applications.
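A minimal sketch of how a skill-occupation graph can ground generation prompts for distillation (the toy graph, prompt wording, and build_triple_prompt helper are hypothetical, not the paper's actual pipeline):

# Sketch of graph-grounded prompt construction for distillation data.
# The graph, occupations, and prompt wording are illustrative only.
import random

# Toy skill-occupation graph: occupation -> related skills.
SKILL_GRAPH = {
    "data analyst": ["SQL", "Excel", "statistics", "Tableau"],
    "backend engineer": ["Python", "REST APIs", "PostgreSQL", "Docker"],
    "recruiter": ["sourcing", "interviewing", "ATS tools"],
}

def build_triple_prompt(occupation, graph, rng=random):
    """Compose an LLM prompt asking for a (job description,
    matched resume, unmatched resume) triple, grounded in the
    occupation's skills to keep generations diverse and on-topic."""
    skills = graph[occupation]
    # Skills from a different occupation seed the unmatched resume.
    other = rng.choice([o for o in graph if o != occupation])
    return (
        f"Write a job description for a {occupation} requiring "
        f"{', '.join(skills)}. Then write one resume that matches it "
        f"and one resume of a {other} (skills: {', '.join(graph[other])}) "
        f"that does not match."
    )

# Plug in any LLM client here; a stub keeps the sketch runnable.
llm = lambda prompt: f"<completion for: {prompt[:60]}...>"
print(llm(build_triple_prompt("data analyst", SKILL_GRAPH)))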
Compact Factorization of Matrices Using Generalized Round-Rank
Matrix factorization is a popular machine learning technique, with applications in a variety of domains, such as recommendation systems [16, 28], natural language processing [26], and computer vision [10]. Due to the widespread use of these models, there has been considerable theoretical analysis of the properties of low-rank approximations of real-valued matrices, including approximation rank [1, 5] and sample complexity [2].

Rather than assume real-valued data, a number of studies (particularly ones on practical applications) focus on more specific data types, such as binary data [23], integer data [17], and ordinal data [12, 30]. For such matrices, existing approaches have used different link functions, applied in an element-wise manner to the low-rank representation [21], i.e. the output Y is ψ(U^T V) instead of the conventional U^T V. These link functions have been justified from a probabilistic point of view [4, 27] and have provided considerable success in empirical settings. However, theoretical results for linear factorization do not apply here, so the expressive power of factorization models with non-linear link functions is not clear, and neither is the relation of the rank of a matrix to the link function used.

In this work, we first define a generalized notion of rank based on the link function ψ, as the rank of the latent matrix before the link function is applied. We focus on a link function suited to factorization of integer-valued matrices, the generalized round function (GRF), and define the corresponding generalized round-rank (GRR). After providing background on GRR, we show that there are many low-GRR matrices that are full rank [1]. Moreover, we study the approximation limitations of linear rank, showing, for example, that low-GRR matrices often cannot be approximated by low-rank linear matrices. We define uniqueness for GRR-based matrix completion and derive its necessary and sufficient conditions. These properties demonstrate that many full-linear-rank matrices can be factorized using low-rank matrices if an appropriate link function is used.

We also present an empirical evaluation of factorization with different link functions for matrix reconstruction and completion. We show that using link functions is efficient compared to linear rank, in that a gradient-based optimization approach learns more accurate reconstructions using a lower-rank representation and fewer training samples. We also perform matrix-completion experiments on two recommendation datasets and demonstrate that an appropriate link function outperforms linear factorization and thus can play a crucial role in accurate matrix completion.
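The central claim that full linear-rank matrices can have low GRR can be checked numerically. A minimal sketch, assuming a single-threshold round function; the construction is a standard illustrative example, not necessarily the one used in the paper:

# Numerical check: a rank-1 latent matrix, passed through a
# single-threshold round function psi, yields a full-rank output.
import numpy as np

n = 8
u = 1.0 / np.arange(1, n + 1)      # latent factors, rank 1
v = 0.5 * np.arange(1, n + 1)
X = np.outer(u, v)                 # latent matrix U^T V; X[i, j] = 0.5 * (j+1) / (i+1)

# Generalized round function with one threshold at 0.5:
# psi(x) = 1 if x >= 0.5 else 0, applied element-wise.
Y = (X >= 0.5).astype(int)         # upper-triangular matrix of ones

print(np.linalg.matrix_rank(X))    # 1  -> GRR(Y) <= 1
print(np.linalg.matrix_rank(Y))    # 8  -> full linear rank

Here Y[i, j] = 1 exactly when j >= i, so Y is triangular with ones on the diagonal and has full linear rank n, yet its latent representation before the link function is rank 1.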