Does My Rebuttal Matter? Insights from a Major NLP Conference
Peer review is a core element of the scientific process, particularly in
conference-centered fields such as ML and NLP. However, only a few studies have
evaluated its properties empirically. Aiming to fill this gap, we present a
corpus that contains over 4k reviews and 1.2k author responses from ACL-2018.
We quantitatively and qualitatively assess the corpus. This includes a pilot
study on paper weaknesses given by reviewers and on quality of author
responses. We then focus on the role of the rebuttal phase, and propose a novel
task to predict after-rebuttal (i.e., final) scores from initial reviews and
author responses. Although author responses do have a marginal (and
statistically significant) influence on the final scores, especially for
borderline papers, our results suggest that a reviewer's final score is largely
determined by her initial score and the distance to the other reviewers'
initial scores. In this context, we discuss the conformity bias inherent to
peer reviewing, a bias that has largely been overlooked in previous research.
We hope our analyses will help better assess the usefulness of the rebuttal
phase in NLP conferences.
Comment: Accepted to NAACL-HLT 2019. Main paper plus supplementary material.
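To make the proposed prediction task concrete, here is a minimal sketch in Python: a linear regression of a reviewer's final score on her initial score and her distance to the other reviewers' initial scores. The synthetic data and the two-feature setup are assumptions for illustration, not the paper's actual model or data.

```python
# Minimal sketch of the after-rebuttal score-prediction task: regress a
# reviewer's final score on her initial score and the distance to the
# other reviewers' initial scores. Synthetic data; illustrative only.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# Hypothetical panel: 500 papers, 3 reviewers each, scores on a 1-6 scale.
initial = rng.integers(1, 7, size=(500, 3)).astype(float)

rows, targets = [], []
for paper in initial:
    for i, score in enumerate(paper):
        others = np.delete(paper, i)
        dist = others.mean() - score          # signed distance to peers
        # Toy "conformity" dynamic: final scores drift toward the peer mean.
        final = score + 0.3 * dist + rng.normal(scale=0.3)
        rows.append([score, dist])
        targets.append(final)

X, y = np.array(rows), np.array(targets)
model = LinearRegression().fit(X, y)
print("R^2:", round(model.score(X, y), 3))
print("weights (initial score, peer distance):", model.coef_.round(3))
```

Under this toy dynamic, both features carry nonzero weight, mirroring the abstract's claim that the initial score and the distance to the other reviewers dominate the final score.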
The Open Review-Based (ORB) dataset: Towards Automatic Assessment of Scientific Papers and Experiment Proposals in High-Energy Physics
As the Open Science approach becomes important for research, the evolution
towards open scientific-paper reviews is making an impact on the scientific
community. However, publicly available resources for research on this subject
remain scarce, as only a limited number of journals and conferences currently
grant interested parties access to their review process. In this paper, we
introduce the new
comprehensive Open Review-Based dataset (ORB); it includes a curated list of
more than 36,000 scientific papers, together with more than 89,000 reviews and
final decisions. We gather this information from two sources: the
OpenReview.net and SciPost.org websites. However, given the volatile nature of
this domain, the software infrastructure that we introduce to supplement the
ORB dataset is designed to accommodate additional resources in the future. The
ORB deliverables include (1) Python code (interfaces and implementations) to
translate document data and metadata into a structured and high-level
representation, (2) an ETL process (Extract, Transform, Load) to facilitate the
automatic updates from defined sources and (3) data files representing the
structured data. The paper presents our data architecture and an overview of
the collected data along with relevant statistics. For illustration purposes,
we also discuss preliminary Natural-Language-Processing-based experiments that
aim to predict (1) papers' acceptance based on their textual embeddings, and
(2) grading statistics, likewise inferred from embeddings. We believe ORB
provides a valuable resource for researchers interested in open science and
review, with our implementation easing the use of this data for further
analysis and experimentation. We plan to update ORB as the field matures as
well as introduce new resources even more fitted to dedicated scientific
domains such as High-Energy Physics.
Comment: 13 pages, supplementary material included, dataset available.
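As an illustration of the first NLP experiment mentioned above, the following sketch predicts acceptance from a textual embedding of a paper's abstract. The toy records and the TF-IDF embedding are assumptions for demonstration, not ORB's actual schema or the authors' pipeline.

```python
# Illustrative sketch of predicting acceptance from textual embeddings.
# The toy abstracts and labels stand in for ORB's structured data files;
# TF-IDF is used here as a simple stand-in embedding.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = [
    "We propose a novel transformer variant with strong results ...",
    "A replication study of earlier results with mixed outcomes ...",
    "State-of-the-art performance on three standard benchmarks ...",
    "Preliminary experiments on a small dataset ...",
]
labels = [1, 0, 1, 0]  # 1 = accepted, 0 = rejected (hypothetical)

clf = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
clf.fit(texts, labels)

# Probability of acceptance for an unseen abstract.
print(clf.predict_proba(["A new attention mechanism with strong results"])[0])
```

In practice the texts and labels would be loaded from the ORB data files rather than hard-coded, and the embedding would come from a trained language model.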
Aspect-based Sentiment Analysis of Scientific Reviews
Scientific papers are complex, and understanding their usefulness requires
prior knowledge. Peer reviews are comments on a paper provided by designated
experts in that field; they hold a substantial amount of information, not only
for the editors and chairs making the final decision, but also for judging the
potential impact of the paper. In this paper, we propose to use aspect-based
sentiment analysis of scientific reviews to extract useful information that
correlates well with the accept/reject decision.
Working with a dataset of close to 8k reviews from ICLR, one of the top
conferences in machine learning, we use an active learning framework to build
a training set for aspect prediction, which we then use to obtain aspects and
sentiments for the entire dataset. We show that
the distribution of aspect-based sentiments obtained from a review is
significantly different for accepted and rejected papers. We use the aspect
sentiments from these reviews to make an intriguing observation: certain
aspects present in a paper and discussed in the review strongly determine the
final recommendation. As a second objective, we quantify the extent of
disagreement among the reviewers refereeing a paper. We also investigate the
extent of disagreement between the reviewers and the chair, and find that
inter-reviewer disagreement may be linked to disagreement with the chair. One
of the most interesting observations from this study is that reviews in which
the reviewer's score is consistent with the aspect sentiments extracted from
the review text are also more likely to concur with the chair's decision.
Comment: Accepted in JCDL'20
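For concreteness, here is a minimal sketch of two quantities discussed above: a simple inter-reviewer disagreement measure and a score/sentiment consistency check. The data structures are hypothetical, and the paper's actual aspect-sentiment extractor is a trained model not shown here.

```python
# Minimal sketch under assumed data structures: per-review
# aspect-sentiment profiles, a disagreement measure, and a check of
# whether a review's sentiments point the same way as its score.
import numpy as np

# Hypothetical tagger output for one paper: aspect -> sentiment in [-1, 1].
reviews = [
    {"score": 7, "aspects": {"novelty": 0.6, "clarity": 0.4, "soundness": 0.5}},
    {"score": 4, "aspects": {"novelty": -0.2, "clarity": 0.3, "soundness": -0.4}},
    {"score": 6, "aspects": {"novelty": 0.5, "clarity": -0.1, "soundness": 0.2}},
]

scores = np.array([r["score"] for r in reviews], dtype=float)
print("inter-reviewer disagreement (score std):", scores.std().round(2))

# Consistency: does the mean aspect sentiment point the same way as the
# review's score relative to the panel mean?
for r in reviews:
    mean_sent = np.mean(list(r["aspects"].values()))
    consistent = (r["score"] - scores.mean()) * mean_sent >= 0
    print(f"score={r['score']} mean_sentiment={mean_sent:+.2f} consistent={consistent}")
```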
ReviewFlow: Intelligent Scaffolding to Support Academic Peer Reviewing
Peer review is a cornerstone of science. Research communities conduct peer
reviews to assess contributions and to improve the overall quality of
scientific work. Every year, new community members are recruited as peer
reviewers for the
first time. How could technology help novices adhere to their community's
practices and standards for peer reviewing? To better understand peer review
practices and challenges, we conducted a formative study with 10 novices and 10
experts. We found that many experts adopt a workflow of annotating,
note-taking, and synthesizing notes into well-justified reviews that align with
community standards. Novices lack timely guidance on how to read and assess
submissions and how to structure paper reviews. To support the peer review
process, we developed ReviewFlow -- an AI-driven workflow that scaffolds
novices with contextual reflections to critique and annotate submissions,
in-situ knowledge support to assess novelty, and notes-to-outline synthesis to
help align peer reviews with community expectations. In a within-subjects
experiment, 16 inexperienced reviewers wrote reviews in two conditions: using
ReviewFlow and using a baseline environment with minimal guidance. With
ReviewFlow, participants produced more comprehensive reviews, identifying more
pros and cons. While participants appreciated the streamlined process support
from ReviewFlow, they also expressed concerns about using AI as part of the
scientific review process. We discuss the implications of using AI to scaffold
the peer review process on scientific work and beyond.
Comment: 19 pages, accepted at the 29th ACM Conference on Intelligent User
Interfaces (IUI 2024).
Topic Modeling in Theory and Practice
Topic models can decompose a large corpus of text into a relatively small set of interpretable themes or topics, potentially enabling a domain expert to explore and analyze a corpus more efficiently. However, in my work, I have found that theories put forth by topic modeling research are not always borne out in practice. In this dissertation, I use case studies to explore four theories of topic modeling. While these theories are not explicitly stated, I show that they are communicated implicitly, some within an individual study and others more diffusely. I show that this implicit knowledge fails to hold in practice in the settings I consider. While my work is confined to topic modeling research and moreover concentrated on the latent Dirichlet allocation topic model, I argue that these kinds of gaps may pervade scientific research and present an obstacle to improving the diversity of the research community.
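For readers unfamiliar with the technique, a minimal LDA example follows, using scikit-learn's implementation; the toy corpus and settings are illustrative only, not the dissertation's experimental setup.

```python
# Minimal latent Dirichlet allocation example of the kind of topic
# decomposition described above, on a toy corpus.
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

docs = [
    "reviewers scores rebuttal conference acceptance decision",
    "dataset reviews papers embeddings prediction model",
    "topic model corpus latent dirichlet allocation themes",
    "peer review process chairs reviewers disagreement",
    "topics interpretability corpus exploration domain expert",
]

vec = CountVectorizer()
X = vec.fit_transform(docs)  # document-term count matrix

lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)
terms = vec.get_feature_names_out()
for k, weights in enumerate(lda.components_):
    top = [terms[i] for i in weights.argsort()[::-1][:5]]
    print(f"topic {k}: {' '.join(top)}")
```

Each topic is a distribution over the vocabulary; printing its top-weighted terms is the usual way such topics are presented for human interpretation, which is exactly the practice the dissertation scrutinizes.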