LIPIcs, Volume 251, ITCS 2023, Complete Volume
Perspectives on Large Language Models for Relevance Judgment
When asked, current large language models (LLMs) like ChatGPT claim that they
can assist us with relevance judgments. Many researchers think this would not
lead to credible IR research. In this perspective paper, we discuss possible
ways for LLMs to assist human experts along with concerns and issues that
arise. We devise a human-machine collaboration spectrum that allows
categorizing different relevance judgment strategies, based on how much the
human relies on the machine. For the extreme point of "fully automated
assessment", we further include a pilot experiment on whether LLM-based
relevance judgments correlate with judgments from trained human assessors. We
conclude the paper by providing two opposing perspectives - for and against the
use of LLMs for automatic relevance judgments - and a compromise perspective,
informed by our analyses of the literature, our preliminary experimental
evidence, and our experience as IR researchers.
We hope to start a constructive discussion within the community to avoid a
stalemate during review, where work is damned if it uses LLMs for evaluation
and damned if it doesn't.
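A pilot comparison of the kind mentioned in the abstract above could, for instance, be summarized with a chance-corrected agreement statistic. The following is only a minimal sketch under stated assumptions: the abstract does not say which measure the authors used, Cohen's kappa is chosen here as one common option, and the two label lists are hypothetical toy data, not results from the paper.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two lists of categorical labels."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items on which both assessors agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement if both assessors labelled independently,
    # each according to their own label frequencies.
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    categories = count_a.keys() | count_b.keys()
    expected = sum(count_a[c] * count_b[c] for c in categories) / (n * n)
    if expected == 1.0:
        return 1.0  # both assessors used a single identical label throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical binary relevance labels (1 = relevant, 0 = not relevant).
human_labels = [1, 0, 1, 1, 0, 0, 1, 0]
llm_labels = [1, 0, 1, 0, 0, 0, 1, 1]
kappa = cohens_kappa(human_labels, llm_labels)
print(f"kappa = {kappa:.2f}")
```

Kappa discounts the agreement two assessors would reach by chance, which raw accuracy does not; whether that is the right lens for LLM-based judgments is part of the debate the paper raises.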
Budget-Feasible Mechanism Design for Non-monotone Submodular Objectives: Offline and Online
The framework of budget-feasible mechanism design studies procurement auctions where the auctioneer (buyer) aims to maximize his valuation function subject to a hard budget constraint. We study the problem of designing truthful mechanisms that have good approximation guarantees and never pay the participating agents (sellers) more than the budget. We focus on the case of general (non-monotone) submodular valuation functions and derive the first truthful, budget-feasible, and O(1)-approximation mechanisms that run in polynomial time in the value query model, for both offline and online auctions. Prior to our work, the only O(1)-approximation mechanism known for non-monotone submodular objectives required an exponential number of value queries. At the heart of our approach lies a novel greedy algorithm for non-monotone submodular maximization under a knapsack constraint. Our algorithm builds two candidate solutions simultaneously (to achieve a good approximation), yet ensures that agents cannot jump from one solution to the other (to implicitly enforce truthfulness). The fact that in our mechanism the agents are not ordered according to their marginal value per cost allows us to appropriately adapt these ideas to the online setting as well. To further illustrate the applicability of our approach, we also consider the case where additional feasibility constraints are present, for example, at most k agents can be selected. We obtain O(p)-approximation mechanisms for both monotone and non-monotone submodular objectives, when the feasible solutions are independent sets of a p-system. With the exception of additive valuation functions, no mechanisms were known for this setting prior to our work. Finally, we provide lower bounds suggesting that, when one cares about nontrivial approximation guarantees in polynomial time, our results are, asymptotically, the best possible
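To make the optimization problem underlying this abstract concrete, here is a small sketch of a non-monotone submodular objective under a knapsack (budget) constraint. It is not the authors' mechanism: it ignores strategic behaviour and payments entirely and solves a toy instance by brute force. The graph, the costs, the budget, and all function names are hypothetical; a graph cut function is used because it is a standard example of a non-monotone submodular function.

```python
from itertools import combinations

# Hypothetical toy instance: agents are graph vertices, value is the cut size.
EDGES = [(0, 1), (0, 2), (1, 2), (2, 3)]
COSTS = {0: 2.0, 1: 1.0, 2: 3.0, 3: 1.0}
BUDGET = 4.0


def cut_value(selected):
    """Number of edges with exactly one endpoint in the selected set."""
    chosen = set(selected)
    return sum(1 for u, v in EDGES if (u in chosen) != (v in chosen))


def all_subsets(items):
    """Yield every subset of items as a frozenset (exponential; toy sizes only)."""
    items = list(items)
    for size in range(len(items) + 1):
        for combo in combinations(items, size):
            yield frozenset(combo)


def is_submodular(value, ground_set):
    """Check v(A) + v(B) >= v(A | B) + v(A & B) for all pairs of subsets."""
    subsets = list(all_subsets(ground_set))
    return all(
        value(a) + value(b) >= value(a | b) + value(a & b)
        for a in subsets
        for b in subsets
    )


def best_within_budget(value, costs, budget):
    """Brute-force the highest-value subset whose total cost fits the budget."""
    best = frozenset()
    for subset in all_subsets(costs):
        if sum(costs[i] for i in subset) <= budget and value(subset) > value(best):
            best = subset
    return best


best = best_within_budget(cut_value, COSTS, BUDGET)
print(sorted(best), cut_value(best))
```

Selecting every vertex gives a cut of size 0 while a single vertex can give a positive cut, so adding agents can lower the value; this is the non-monotonicity that the abstract identifies as the hard case.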
Budget-feasible mechanism design for non-monotone submodular objectives: Offline and online
The framework of budget-feasible mechanism design studies procurement auctions where the auctioneer (buyer) aims to maximize his valuation function subject to a hard budget constraint. We study the problem of designing truthful mechanisms that have good approximation guarantees and never pay the participating agents (sellers) more than the budget. We focus on the case of general (non-monotone) submodular valuation functions and derive the first truthful, budget-feasible and O(1)-approximation mechanisms that run in polynomial time in the value query model, for both offline and online auctions. Since the introduction of the problem by Singer [40], obtaining efficient mechanisms for objectives that go beyond the class of monotone submodular functions has been elusive. Prior to our work, the only O(1)-approximation mechanism known for non-monotone submodular objectives required an exponential number of value queries. At the heart of our approach lies a novel greedy algorithm for non-monotone submodular maximization under a knapsack constraint. Our algorithm builds two candidate solutions simultaneously (to achieve a good approximation), yet ensures that agents cannot jump from one solution to the other (to implicitly enforce truthfulness). Ours is the first mechanism for the problem where, crucially, the agents are not ordered according to their marginal value per cost. This allows us to appropriately adapt these ideas to the online setting as well. To further illustrate the applicability of our approach, we also consider the case where additional feasibility constraints are present, e.g., at most k agents can be selected. We obtain O(p)-approximation mechanisms for both monotone and non-monotone submodular objectives, when the feasible solutions are independent sets of a p-system. With the exception of additive valuation functions, no mechanisms were known for this setting prior to our work.
Finally, we provide lower bounds suggesting that, when one cares about non-trivial approximation guaran
Budget-Feasible Mechanism Design for Non-Monotone Submodular Objectives: Offline and Online
The framework of budget-feasible mechanism design studies procurement
auctions where the auctioneer (buyer) aims to maximize his valuation function
subject to a hard budget constraint. We study the problem of designing truthful
mechanisms that have good approximation guarantees and never pay the
participating agents (sellers) more than the budget. We focus on the case of
general (non-monotone) submodular valuation functions and derive the first
truthful, budget-feasible and O(1)-approximate mechanisms that run in
polynomial time in the value query model, for both offline and online auctions.
Prior to our work, the only O(1)-approximation mechanism known for
non-monotone submodular objectives required an exponential number of value
queries.
At the heart of our approach lies a novel greedy algorithm for non-monotone
submodular maximization under a knapsack constraint. Our algorithm builds two
candidate solutions simultaneously (to achieve a good approximation), yet
ensures that agents cannot jump from one solution to the other (to implicitly
enforce truthfulness). Ours is the first mechanism for the problem
where, crucially, the agents are not ordered with respect to their marginal
value per cost. This allows us to appropriately adapt these ideas to the online
setting as well.
To further illustrate the applicability of our approach, we also consider the
case where additional feasibility constraints are present. We obtain
O(p)-approximation mechanisms for both monotone and non-monotone submodular
objectives, when the feasible solutions are independent sets of a p-system.
With the exception of additive valuation functions, no mechanisms were known
for this setting prior to our work. Finally, we provide lower bounds suggesting
that, when one cares about non-trivial approximation guarantees in polynomial
time, our results are asymptotically best possible.
Comment: Accepted to EC 201
Plattformbasierte Erwerbsarbeit: Stand der empirischen Forschung (Platform-Based Work: The State of Empirical Research)
This study summarizes the current state of empirical research in economics and social sciences on contract work mediated or provided by online platforms (online contract work). Based on a systematic literature review, this study discusses results on the diffusion of online platforms, the characteristics of workers as well as the motives for labor supply and the working conditions. The study considers services which can be provided from anywhere via the internet (online labor markets), as well as services which are mediated by online platforms but are provided at a predefined location (mobile labor markets). Besides a summary of existing research findings on the topic, this study also evaluates the quality of the empirical methods. The focus lies on the applied methods for data collection as well as the statistical analyses of the data. As a result, the current state of knowledge on online contract work can be regarded as fragmented. While for the United States several studies already exist on the diffusion of online contract work, there is a paucity of corresponding studies in Europe. A considerably higher number of studies deals with other aspects of online contract work, out of which, however, only a few focus on mobile labor markets. Administrative statistics and large-scale representative surveys do not yet contain information on online contract work. Existing research on the topic is therefore based on a variety of data sources and methodological approaches, which makes it difficult to compare empirical findings.
Algorithms for assessing the quality and difficulty of multiple choice exam questions
Multiple Choice Questions (MCQs) have long been the backbone of standardized
testing in academia and industry. Correspondingly, there is a constant need for the
authors of MCQs to write and refine new questions for new versions of standardized
tests as well as to support measuring performance in the emerging massive open online
courses (MOOCs). Research that explores what makes a question difficult, or what
questions distinguish higher-performing students from lower-performing students can
aid in the creation of the next generation of teaching and evaluation tools.
In the automated MCQ answering component of this thesis, algorithms query for
definitions of scientific terms, process the returned web results, and compare the returned
definitions to the original definition in the MCQ. This automated method for
answering questions is then augmented with a model, based on human performance
data from crowdsourced question sets, for analysis of question difficulty as well as
the discrimination power of the non-answer alternatives. The crowdsourced question
sets come from PeerWise, an open source online college-level question authoring and
answering environment.
The goal of this research is to create an automated method to both answer and
assess the difficulty of multiple choice inverse definition questions in the domain of
introductory biology. The results of this work suggest that human-authored question
banks provide useful data for building gold standard human performance models. The
methodology for building these performance models has value in other domains that
test the difficulty of questions and the quality of the exam takers.
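The two quantities this abstract keeps returning to, question difficulty and discrimination between stronger and weaker students, have simple classical item-analysis counterparts. The sketch below is an illustration only, not the thesis's model: it computes the proportion of correct answers as a difficulty index and an upper-minus-lower group difference as a discrimination index. The 27% group fraction is a conventional choice, and the response data and function names are hypothetical.

```python
def item_difficulty(item_correct):
    """Proportion of students who answered the item correctly (higher = easier)."""
    return sum(item_correct) / len(item_correct)


def discrimination_index(item_correct, total_scores, fraction=0.27):
    """Correct-rate in the top-scoring group minus that in the bottom-scoring group.

    item_correct: 1/0 per student for the item under analysis.
    total_scores: each student's overall test score, in the same order.
    fraction: share of students placed in each extreme group (assumed 27%).
    """
    if len(item_correct) != len(total_scores) or not total_scores:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(total_scores)
    group_size = max(1, round(n * fraction))
    # Student indices ordered from lowest to highest overall score.
    order = sorted(range(n), key=lambda i: total_scores[i])
    low_group = order[:group_size]
    high_group = order[-group_size:]
    p_high = sum(item_correct[i] for i in high_group) / group_size
    p_low = sum(item_correct[i] for i in low_group) / group_size
    return p_high - p_low


# Hypothetical data for ten students on one question.
total_scores = [2, 3, 4, 5, 5, 6, 7, 8, 9, 10]
item_correct = [0, 0, 1, 0, 1, 1, 0, 1, 1, 1]

difficulty = item_difficulty(item_correct)
discrimination = discrimination_index(item_correct, total_scores)
print(f"difficulty = {difficulty:.2f}, discrimination = {discrimination:.2f}")
```

A discrimination value near zero or below suggests that a question does not separate higher-performing from lower-performing students, which is the kind of signal the abstract says can guide question authors.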