
    Quantifying Social Biases Using Templates is Unreliable

    Recently, there has been an increase in efforts to understand how large language models (LLMs) propagate and amplify social biases. Several works have utilized templates for fairness evaluation, which allow researchers to quantify social biases in the absence of test sets with protected attribute labels. While template evaluation can be a convenient and helpful diagnostic tool for understanding model deficiencies, it often uses a simplistic and limited set of templates. In this paper, we study whether bias measurements are sensitive to the choice of templates used for benchmarking. Specifically, we investigate the instability of bias measurements by manually modifying templates proposed in previous works in a semantically preserving manner and measuring bias across these modifications. We find that bias values and resulting conclusions vary considerably across template modifications on four tasks, ranging from an 81% reduction (NLI) to a 162% increase (MLM) in (task-specific) bias measurements. Our results indicate that quantifying fairness in LLMs, as done in current practice, can be brittle and needs to be approached with more care and caution.
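
    A minimal sketch of the kind of template-based MLM bias probe the abstract refers to, and of how the measurement can shift when the template is reworded. The templates, group terms, target word, and the probability-gap metric below are illustrative assumptions, not the paper's exact protocol.

    ```python
    # Sketch: score the same protected-attribute contrast under two paraphrased
    # templates and compare the resulting "bias gap" between groups.
    from transformers import pipeline

    fill = pipeline("fill-mask", model="bert-base-uncased")

    # Two semantically equivalent phrasings of the same probe.
    templates = [
        "The {group} person worked as a [MASK].",
        "As a {group} person, they worked as a [MASK].",
    ]
    groups = ("black", "white")
    target = "doctor"  # attribute word whose probability is compared across groups

    for template in templates:
        probs = {}
        for group in groups:
            # Restrict scoring to the single target token and read its probability.
            out = fill(template.format(group=group), targets=[target])
            probs[group] = out[0]["score"]
        gap = probs[groups[0]] - probs[groups[1]]
        print(f"{template!r}: gap in P({target}) between groups = {gap:+.4f}")
    ```

    If the gap changes sign or magnitude substantially between the two paraphrases, the bias conclusion depends on the template rather than the model alone, which is the instability the paper measures.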

    Distilling Large Language Models using Skill-Occupation Graph Context for HR-Related Tasks

    Numerous HR applications are centered around resumes and job descriptions. While they can benefit from advancements in NLP, particularly large language models, their real-world adoption faces challenges due to the absence of comprehensive benchmarks for various HR tasks and the lack of smaller models with competitive capabilities. In this paper, we aim to bridge this gap by introducing the Resume-Job Description Benchmark (RJDB). We meticulously craft this benchmark to cater to a wide array of HR tasks, including matching and explaining resumes to job descriptions, extracting skills and experiences from resumes, and editing resumes. To create this benchmark, we propose to distill domain-specific knowledge from a large language model (LLM). We rely on a curated skill-occupation graph to ensure diversity and to provide context for LLM generation. Our benchmark includes over 50,000 triples of job descriptions, matched resumes, and unmatched resumes. Using RJDB, we train multiple smaller student models. Our experiments reveal that the student models achieve performance near or better than the teacher model (GPT-4), affirming the effectiveness of the benchmark. Additionally, we explore the utility of RJDB on out-of-distribution data for skill extraction and resume-job description matching, in zero-shot and weakly supervised settings. We release our datasets and code to foster further research and industry applications.
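
    A minimal sketch of grounding LLM data generation in a skill-occupation graph, as the abstract describes: sample skills adjacent to an occupation and use them as context when prompting the teacher model. The toy graph, prompt wording, and the call_teacher_llm() helper are illustrative placeholders, not the released RJDB pipeline.

    ```python
    # Sketch: build a graph-grounded prompt for generating one benchmark triple.
    import random
    import networkx as nx

    # Toy skill-occupation graph: occupations connect to the skills they require.
    G = nx.Graph()
    G.add_edges_from([
        ("data analyst", "sql"), ("data analyst", "python"),
        ("data analyst", "dashboards"),
        ("nurse", "patient care"), ("nurse", "triage"),
    ])

    def build_prompt(occupation: str, k: int = 3) -> str:
        """Sample skills adjacent to the occupation to give the teacher LLM context."""
        skills = random.sample(list(G.neighbors(occupation)),
                               k=min(k, G.degree(occupation)))
        return (
            f"Occupation: {occupation}\n"
            f"Required skills: {', '.join(skills)}\n"
            "Write a short job description, one matching resume, and one "
            "non-matching resume for this occupation."
        )

    prompt = build_prompt("data analyst")
    print(prompt)
    # triple = call_teacher_llm(prompt)  # hypothetical teacher call (e.g., GPT-4)
    ```

    Sampling different graph neighborhoods per prompt is one way such a graph can keep the generated triples diverse before they are used to train the smaller student models.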