MegaWika: Millions of reports and their sources across 50 diverse languages
To foster the development of new models for collaborative AI-assisted report
generation, we introduce MegaWika, consisting of 13 million Wikipedia articles
in 50 diverse languages, along with their 71 million referenced source
materials. We process this dataset for a myriad of applications, going beyond
the initial Wikipedia citation extraction and web scraping of content,
including translating non-English articles for cross-lingual applications and
providing FrameNet parses for automated semantic analysis. MegaWika is the
largest resource for sentence-level report generation and the only report
generation dataset that is multilingual. We manually analyze the quality of
this resource through a semantically stratified sample. Finally, we provide
baseline results and trained models for crucial steps in automated report
generation: cross-lingual question answering and citation retrieval.
Comment: Submitted to ACL, 202
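Citation retrieval of the kind the abstract describes, matching a report sentence to its supporting source passage, can be illustrated with a minimal lexical baseline. The sketch below is not MegaWika's baseline system; it is a generic TF-IDF cosine ranker over hypothetical source passages, shown only to make the task concrete.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Compute simple TF-IDF vectors for a list of tokenized documents."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))
    idf = {t: math.log(n / df[t]) + 1.0 for t in df}
    vecs = []
    for doc in docs:
        tf = Counter(doc)
        vecs.append({t: tf[t] * idf[t] for t in tf})
    return vecs, idf

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def retrieve_citation(sentence, sources):
    """Return the index of the source passage most similar to the sentence."""
    docs = [s.lower().split() for s in sources]
    vecs, idf = tfidf_vectors(docs)
    q_tf = Counter(sentence.lower().split())
    query = {t: q_tf[t] * idf.get(t, 1.0) for t in q_tf}
    scores = [cosine(query, v) for v in vecs]
    return max(range(len(sources)), key=scores.__getitem__)

# Toy example: pick the passage that best supports the report sentence.
sources = [
    "the eiffel tower was completed in 1889 for the world fair",
    "paris is the capital and most populous city of france",
    "the tower is 330 metres tall and made of wrought iron",
]
best = retrieve_citation("the eiffel tower opened in 1889", sources)  # 0
```

A real system would work at MegaWika's scale with learned multilingual encoders rather than word overlap, but the retrieval interface, scoring a sentence against candidate sources, is the same.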
Improving Semantic Parsing Using Statistical Word Sense Disambiguation (Student Abstract)
A semantic parser generates a logical form graph from an utterance, where the edges are semantic roles and the nodes are word senses in an ontology that supports reasoning. The generated representation attempts to capture the full meaning of the utterance. While the parsing process works to resolve lexical ambiguity, a number of errors in the logical forms arise from incorrectly assigned word senses. This is especially true in logical and rule-based semantic parsers. Although statistical word sense disambiguation methods outperform the word sense output of semantic parsers, these systems do not produce the rich role structure or detailed semantic representation of the sentence content. In this work, we use decisions from a statistical WSD system to inform a logical semantic parser and greatly improve semantic type assignments in the resulting logical forms.
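The core WSD step the abstract relies on, choosing among candidate senses given sentence context, can be sketched with a Lesk-style gloss-overlap heuristic. This is an illustrative toy, not the paper's statistical system; the sense inventory and glosses below are invented for the example.

```python
# Hypothetical two-sense inventory for "bank"; real systems draw on an
# ontology such as WordNet and a trained statistical disambiguator.
SENSES = {
    "bank": {
        "bank.n.01": "financial institution that accepts deposits and lends money",
        "bank.n.02": "sloping land beside a body of water such as a river",
    }
}

def disambiguate(word, sentence):
    """Pick the sense whose gloss shares the most tokens with the context."""
    context = set(sentence.lower().split())
    best_sense, best_overlap = None, -1
    for sense, gloss in SENSES[word].items():
        overlap = len(context & set(gloss.split()))
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense

sense = disambiguate("bank", "she sat on the bank of the river and watched the water")
# -> "bank.n.02": the gloss shares "of", "water", and "river" with the context
```

In the paper's setting, a sense decision like this would override or constrain the parser's own lexical choice, leaving the parser's role structure intact while correcting the node labels.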
On Event Individuation for Document-Level Information Extraction
As information extraction (IE) systems have grown more adept at processing
whole documents, the classic task of template filling has seen renewed interest
as a benchmark for document-level IE. In this position paper, we call into
question the suitability of template filling for this purpose. We argue that
the task demands definitive answers to thorny questions of event individuation
-- the problem of distinguishing distinct events -- about which even human
experts disagree. Through an annotation study and error analysis, we show that
this raises concerns about the usefulness of template filling metrics, the
quality of datasets for the task, and the ability of models to learn it.
Finally, we consider possible solutions.