3,403 research outputs found
Better representation learning for TPMS
With the increase in popularity of AI and Machine learning, participation numbers have
exploded at AI/ML conferences. The large number of submitted papers and the evolving nature of topics pose additional challenges for the peer-review systems that are crucial to our scientific communities. Some conferences have moved toward automating reviewer assignment for submissions, TPMS [1] being one such existing system. Currently, TPMS prepares content-based profiles of researchers and submitted papers to model the suitability of reviewer-submission pairs.
In this work, we explore different approaches to self-supervised fine-tuning of BERT transformers on conference-paper data. We demonstrate several new approaches to augmentation views for self-supervision in natural language processing, which until now has been more focused on problems in computer vision. We then use these individual paper representations to build an expertise model that learns to combine the representations of a reviewer's different published works and predict their relevance for reviewing a submitted paper. In the end, we show that better individual paper representations and better expertise modeling lead to better performance on the reviewer-suitability prediction task.
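The abstract does not give TPMS's scoring formula, but the idea of content-based reviewer-submission matching can be sketched as follows: pool a reviewer's past papers into one bag-of-words profile and score a submission by cosine similarity. All function names here are hypothetical; real systems use TF-IDF weighting and richer representations.

```python
import math
from collections import Counter

def bow(text):
    """Lowercased bag-of-words counts for a document."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b.get(t, 0) for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def reviewer_suitability(reviewer_papers, submission):
    """Score a reviewer by pooling their past papers into one profile
    and comparing that profile against the submission."""
    profile = Counter()
    for paper in reviewer_papers:
        profile.update(bow(paper))
    return cosine(profile, bow(submission))

score = reviewer_suitability(
    ["self supervised learning for text", "bert fine tuning"],
    "fine tuning bert with self supervision",
)
```

The work described above replaces such fixed bag-of-words profiles with learned BERT representations, but the reviewer-submission scoring structure stays the same.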
Online Deception Detection Refueled by Real World Data Collection
The lack of large realistic datasets presents a bottleneck in online
deception detection studies. In this paper, we apply a data collection method
based on social network analysis to quickly identify high-quality deceptive and
truthful online reviews from Amazon. The dataset contains more than 10,000
deceptive reviews and is diverse in product domains and reviewers. Using this
dataset, we explore effective general features for online deception detection
that perform well across domains. We demonstrate that, with generalized features (advertising speak and writing-complexity scores), deception detection performance can be further improved by adding additional deceptive reviews from assorted domains in training. Finally, reviewer-level evaluation gives an interesting insight into different deceptive reviewers' writing styles.
Comment: 10 pages, Accepted to Recent Advances in Natural Language Processing (RANLP) 201
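The abstract names writing-complexity scores among its generalized features without defining them; a minimal sketch of two such features, assuming simple surface statistics (average sentence length and type-token ratio), might look like this. The exact features used in the paper may differ.

```python
import re

def complexity_features(review):
    """Crude writing-complexity cues of the kind the abstract alludes to:
    average sentence length in words and type-token ratio (lexical variety)."""
    sentences = [s for s in re.split(r"[.!?]+", review) if s.strip()]
    words = re.findall(r"[a-z']+", review.lower())
    avg_sentence_len = len(words) / len(sentences) if sentences else 0.0
    type_token_ratio = len(set(words)) / len(words) if words else 0.0
    return {"avg_sentence_len": avg_sentence_len,
            "type_token_ratio": type_token_ratio}

feats = complexity_features("Great product. Great price. Great great great.")
```

A low type-token ratio, as in the repetitive example above, is the kind of stylistic signal such features are meant to surface.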
Controlling Linguistic Style Aspects in Neural Language Generation
Most work on neural natural language generation (NNLG) focuses on controlling the content of the generated text. We experiment with controlling several stylistic aspects of the generated text in addition to its content. The method is based on a conditioned RNN language model, where the desired content as well as the stylistic parameters serve as conditioning contexts. We demonstrate the approach on the movie-reviews domain and show that it is successful in generating coherent sentences corresponding to the required linguistic style and content.
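The conditioning-context idea can be sketched without the RNN itself: content features are concatenated with one-hot encodings of each requested style aspect, and the resulting vector is what the language model would be conditioned on at every step. The aspect names and values below are hypothetical illustrations, not the paper's actual style parameters.

```python
def conditioning_context(content_vec, style_params, style_vocab):
    """Build the conditioning vector fed to the language model:
    content features concatenated with a one-hot encoding of each
    requested style aspect."""
    ctx = list(content_vec)
    for aspect, value in style_params.items():
        one_hot = [0.0] * len(style_vocab[aspect])
        one_hot[style_vocab[aspect].index(value)] = 1.0
        ctx.extend(one_hot)
    return ctx

# Hypothetical style aspects with their possible values.
style_vocab = {"sentiment": ["negative", "positive"],
               "length": ["short", "long"]}
ctx = conditioning_context([0.2, 0.7],
                           {"sentiment": "positive", "length": "short"},
                           style_vocab)
```

Changing a single style slot in this vector, while keeping the content features fixed, is what lets the same model generate the same content in different styles.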
Event knowledge in large language models: the gap between the impossible and the unlikely
Word co-occurrence patterns in language corpora contain a surprising amount
of conceptual knowledge. Large language models (LLMs), trained to predict words
in context, leverage these patterns to achieve impressive performance on
diverse semantic tasks requiring world knowledge. An important but understudied
question about LLMs' semantic abilities is whether they acquire generalized
knowledge of common events. Here, we test whether five pre-trained LLMs (from
2018's BERT to 2023's MPT) assign higher likelihood to plausible descriptions
of agent-patient interactions than to minimally different implausible versions
of the same event. Using three curated sets of minimal sentence pairs (total
n=1,215), we found that pre-trained LLMs possess substantial event knowledge,
outperforming other distributional language models. In particular, they almost
always assign higher likelihood to possible vs. impossible events (The teacher
bought the laptop vs. The laptop bought the teacher). However, LLMs show less
consistent preferences for likely vs. unlikely events (The nanny tutored the
boy vs. The boy tutored the nanny). In follow-up analyses, we show that (i) LLM
scores are driven by both plausibility and surface-level sentence features,
(ii) LLM scores generalize well across syntactic variants (active vs. passive
constructions) but less well across semantic variants (synonymous sentences),
(iii) some LLM errors mirror human judgment ambiguity, and (iv) sentence
plausibility serves as an organizing dimension in internal LLM representations.
Overall, our results show that important aspects of event knowledge naturally emerge from distributional linguistic patterns, but also highlight a gap between representations of possible/impossible and likely/unlikely events.
Comment: The two lead authors have contributed equally to this work
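The study's core test, comparing the likelihood a model assigns to a plausible sentence versus its minimally different implausible counterpart, can be illustrated with a toy bigram model in place of the pre-trained LLMs; the class and corpus below are illustrative stand-ins, not the paper's models or data.

```python
import math
from collections import Counter

class BigramLM:
    """Toy bigram language model with add-one smoothing, standing in for
    the pre-trained LLMs the study scores sentences with."""
    def __init__(self, corpus):
        self.unigrams = Counter()
        self.bigrams = Counter()
        self.vocab = set()
        for sent in corpus:
            toks = ["<s>"] + sent.lower().split()
            self.vocab.update(toks)
            for a, b in zip(toks, toks[1:]):
                self.unigrams[a] += 1
                self.bigrams[(a, b)] += 1

    def log_prob(self, sentence):
        """Smoothed log-likelihood of a sentence under the bigram model."""
        toks = ["<s>"] + sentence.lower().split()
        lp = 0.0
        for a, b in zip(toks, toks[1:]):
            lp += math.log((self.bigrams[(a, b)] + 1)
                           / (self.unigrams[a] + len(self.vocab)))
        return lp

corpus = ["the teacher bought the laptop",
          "the teacher bought the book",
          "the student bought the laptop"]
lm = BigramLM(corpus)
plausible = lm.log_prob("the teacher bought the laptop")
implausible = lm.log_prob("the laptop bought the teacher")
```

Even this crude distributional model prefers the possible event over its role-reversed counterpart; the paper's point is that LLMs do so far more reliably than they distinguish likely from merely unlikely events.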
A Gold Standard Dataset for the Reviewer Assignment Problem
Many peer-review venues are either using or looking to use algorithms to
assign submissions to reviewers. The crux of such automated approaches is the
notion of the "similarity score"--a numerical estimate of the expertise of a
reviewer in reviewing a paper--and many algorithms have been proposed to
compute these scores. However, these algorithms have not been subjected to a
principled comparison, making it difficult for stakeholders to choose the
algorithm in an evidence-based manner. The key challenge in comparing existing
algorithms and developing better algorithms is the lack of the publicly
available gold-standard data that would be needed to perform reproducible
research. We address this challenge by collecting a novel dataset of similarity
scores that we release to the research community. Our dataset consists of 477
self-reported expertise scores provided by 58 researchers who evaluated their
expertise in reviewing papers they have read previously.
We use this data to compare several popular algorithms employed in computer
science conferences and come up with recommendations for stakeholders. Our main
findings are as follows. First, all algorithms make a non-trivial amount of
error. For the task of ordering two papers in terms of their relevance for a
reviewer, the error rates range from 12%-30% in easy cases to 36%-43% in hard
cases, highlighting the vital need for more research on the
similarity-computation problem. Second, most existing algorithms are designed
to work with titles and abstracts of papers, and in this regime the Specter+MFR
algorithm performs best. Third, to improve performance, it may be important to
develop modern deep-learning based algorithms that can make use of the full
texts of papers: the classical TF-IDF algorithm enhanced with full texts of papers is on par with the deep-learning based Specter+MFR that cannot make use of this information.
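The pairwise-ordering evaluation the error rates above refer to can be sketched as follows: for every pair of papers a reviewer's self-reported expertise orders, count how often the algorithm's similarity scores order them the other way. The function and data below are an illustrative sketch, not the paper's released evaluation code.

```python
def pairwise_error_rate(gold, predicted):
    """Fraction of paper pairs whose relative order under the algorithm's
    similarity scores disagrees with a reviewer's self-reported expertise.
    `gold` and `predicted` map paper ids to scores."""
    papers = list(gold)
    pairs = [(a, b) for i, a in enumerate(papers) for b in papers[i + 1:]
             if gold[a] != gold[b]]  # only pairs the gold data actually orders
    errors = sum(
        1 for a, b in pairs
        if (gold[a] > gold[b]) != (predicted[a] > predicted[b])
    )
    return errors / len(pairs) if pairs else 0.0

# Hypothetical reviewer: self-reported expertise vs. algorithm scores.
gold = {"p1": 5, "p2": 3, "p3": 1}
predicted = {"p1": 0.9, "p2": 0.2, "p3": 0.4}
err = pairwise_error_rate(gold, predicted)
```

Here the algorithm misorders one of the three comparable pairs, giving an error rate of 1/3; the dataset enables exactly this kind of comparison across algorithms.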
On Correcting Inputs: Inverse Optimization for Online Structured Prediction
Algorithm designers typically assume that the input data is correct, and then
proceed to find "optimal" or "sub-optimal" solutions using this input data.
However, this assumption of correct data does not always hold in practice,
especially in the context of online learning systems where the objective is to
learn appropriate feature weights given some training samples. Such scenarios
necessitate the study of inverse optimization problems where one is given an
input instance as well as a desired output and the task is to adjust the input
data so that the given output is indeed optimal. Motivated by learning
structured prediction models, in this paper we consider inverse optimization
with a margin, i.e., we require the given output to be better than all other
feasible outputs by a desired margin. We consider such inverse optimization
problems for maximum weight matroid basis, matroid intersection, perfect
matchings, minimum cost maximum flows, and shortest paths and derive the first
known results for such problems with a non-zero margin. The effectiveness of these algorithmic approaches to online learning for structured prediction is also discussed.
Comment: Conference version to appear in FSTTCS, 201
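The margin requirement can be illustrated in the simplest possible setting, where the "structure" is a single chosen element rather than a matroid basis, matching, or path: given a desired output, minimally adjust the input weights so it beats every other feasible output by the required margin. This is a heavily simplified sketch; the paper's contribution is algorithms for the far richer combinatorial structures listed above.

```python
def enforce_margin(weights, desired, margin):
    """Minimally raise the desired item's weight so it beats every other
    feasible output by at least `margin` (inverse optimization for the
    trivial structure of selecting one maximum-weight element)."""
    w = dict(weights)
    best_other = max(v for k, v in w.items() if k != desired)
    if w[desired] < best_other + margin:
        w[desired] = best_other + margin
    return w

w = enforce_margin({"a": 1.0, "b": 4.0, "c": 2.0}, desired="a", margin=0.5)
```

The correction is minimal in that the desired item's weight is raised only as far as the margin requires, mirroring how inverse optimization seeks the smallest adjustment that makes the given output optimal.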