34 research outputs found

    Causal schema induction for knowledge discovery

    Making sense of familiar yet new situations typically involves making generalizations about causal schemas, stories that help humans reason about event sequences. Reasoning about events includes identifying cause and effect relations shared across event instances, a process we refer to as causal schema induction. Statistical schema induction systems may leverage structural knowledge encoded in discourse or in the causal graphs associated with event meaning; however, resources to study such causal structure are few in number and limited in size. In this work, we investigate how to apply schema induction models to the task of knowledge discovery for enhanced search of English-language news texts. To tackle the problem of data scarcity, we present Torquestra, a manually curated dataset of text-graph-schema units integrating temporal, event, and causal structures. We benchmark our dataset on three knowledge discovery tasks, building and evaluating models for each. Results show that systems that harness causal structure are effective at identifying texts sharing similar causal meaning components, rather than relying on lexical cues alone. We make our dataset and models available for research purposes. Comment: 8 pages, appendix
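
    To make the abstract's contrast concrete, here is a minimal sketch of scoring document similarity by causal structure versus surface wording. The edge-set representation of a causal graph and the example data are illustrative assumptions, not Torquestra's actual format.

```python
# Minimal sketch: scoring document similarity by causal structure versus
# surface wording. The (cause, effect) edge sets and the example data are
# illustrative assumptions, not Torquestra's actual representation.

def jaccard(a: set, b: set) -> float:
    """Jaccard similarity of two sets; 0.0 when both are empty."""
    return len(a & b) / len(a | b) if a | b else 0.0

def lexical_sim(text_a: str, text_b: str) -> float:
    """Bag-of-words overlap: rewards shared wording, not shared meaning."""
    return jaccard(set(text_a.lower().split()), set(text_b.lower().split()))

def causal_sim(edges_a: set, edges_b: set) -> float:
    """Overlap of (cause, effect) edges: rewards shared causal components."""
    return jaccard(edges_a, edges_b)

# Two reports of the same causal schema with almost no shared vocabulary.
doc_a = "Heavy rainfall flooded the valley and residents evacuated"
doc_b = "After the storm surge inundated the town people fled their homes"
edges_a = {("rainfall", "flooding"), ("flooding", "evacuation")}
edges_b = {("storm_surge", "flooding"), ("flooding", "evacuation")}

print(f"lexical: {lexical_sim(doc_a, doc_b):.2f}")    # near zero
print(f"causal:  {causal_sim(edges_a, edges_b):.2f}")  # 0.33: one shared edge
```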

    A road map for interoperable language resource metadata

    LRs remain expensive to create and thus rare relative to demand across languages and technology types. The accidental re-creation of an LR that already exists is a nearly unforgivable waste of scarce resources, yet one that is unfortunately not so easy to avoid. The number of catalogs the HLT researcher must search, each with its own format, makes it possible to overlook an existing resource. This paper sketches the sources of this problem and outlines a proposal to rectify it, along with a new vision of LR cataloging that will facilitate the documentation and exploitation of a much wider range of LRs than previously considered.
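
    As a rough illustration of what interoperable cataloging could enable, the sketch below maps one hypothetical catalog's record format onto a shared record type, so a researcher can search one harmonized index instead of many incompatible ones. All field names here are assumptions for illustration, not the schema the paper actually proposes.

```python
# Sketch of a harmonized language-resource (LR) metadata record. Field names
# are illustrative assumptions only; the paper's proposal may differ.
from dataclasses import dataclass, field

@dataclass
class LRRecord:
    title: str
    languages: list[str]           # e.g. ISO 639-3 codes like ["eng", "deu"]
    resource_type: str             # e.g. "corpus", "lexicon", "tool"
    license: str
    source_catalog: str            # which catalog this record came from
    identifiers: dict[str, str] = field(default_factory=dict)

def from_catalog_x(raw: dict) -> LRRecord:
    """Map one hypothetical catalog's fields onto the shared record."""
    return LRRecord(
        title=raw["name"],
        languages=raw.get("langs", []),
        resource_type=raw.get("kind", "unknown"),
        license=raw.get("licence", "unknown"),
        source_catalog="catalog-x",
        identifiers={"catalog-x": raw["id"]},
    )

record = from_catalog_x({"name": "Example Treebank", "langs": ["eng"],
                         "kind": "corpus", "id": "x-0042"})
print(record.title, record.languages, record.resource_type)
```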

    Biomedical term mapping databases

    Longer words and phrases are frequently mapped onto a shorter form such as abbreviations or acronyms for efficiency of communication. These abbreviations are pervasive in all aspects of biology and medicine, and as the amount of biomedical literature grows, so does the number of abbreviations and the average number of definitions per abbreviation. Even more confusingly, different authors will often abbreviate the same word/phrase differently. This ambiguity impedes our ability to retrieve information, integrate databases and mine textual databases for content. Efforts to standardize nomenclature, especially those doing so retrospectively, need to be aware of different abbreviatory mappings and spelling variations. To address this problem, there have been several efforts to develop computer algorithms to identify the mapping of terms between short and long form within a large body of literature. To date, four such algorithms have been applied to create online databases that comprehensively map biomedical terms and abbreviations within MEDLINE: ARGH (http://lethargy.swmed.edu/ARGH/argh.asp), the Stanford Biomedical Abbreviation Server (http://bionlp.stanford.edu/abbreviation/), AcroMed (http://medstract.med.tufts.edu/acro1.1/index.htm) and SaRAD (http://www.hpl.hp.com/research/idl/projects/abbrev.html). In addition to serving as useful computational tools, these databases serve as valuable references that help biologists keep up with an ever-expanding vocabulary of terms.
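
    The core of such mapping algorithms is checking whether a short form's letters can be matched, in order, against an adjacent long form. The sketch below implements one simplified heuristic of that kind; it is an illustration in the spirit of these systems, not a reimplementation of ARGH, AcroMed, SaRAD, or the Stanford server's actual method.

```python
# Simplified short-form/long-form matcher for 'long form (SF)' patterns.
# An illustrative heuristic, not any specific database's algorithm.
import re

def letters_match(short: str, long_form: str) -> bool:
    """True if the short form's letters appear, in order, in the long form."""
    pos = 0
    text = long_form.lower()
    for ch in short.lower():
        pos = text.find(ch, pos)
        if pos == -1:
            return False
        pos += 1
    return True

def find_abbreviations(text: str) -> list[tuple[str, str]]:
    """Extract (short form, long form) candidate pairs from running text."""
    pairs = []
    for match in re.finditer(r"([^()]+?)\s*\(([A-Za-z]{2,10})\)", text):
        words = match.group(1).split()
        short = match.group(2)
        # Consider at most as many words as the short form has letters.
        long_form = " ".join(words[-len(short):])
        if letters_match(short, long_form):
            pairs.append((short, long_form))
    return pairs

print(find_abbreviations(
    "Magnetic resonance imaging (MRI) outperformed computed tomography (CT)."
))
# [('MRI', 'Magnetic resonance imaging'), ('CT', 'computed tomography')]
```

    Real systems must also resolve the ambiguity the abstract highlights: the same short form mapping to many long forms, and the same long form abbreviated differently by different authors, which is why these databases track multiple definitions per abbreviation.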

    A computational theory of prose style for natural language generation

    In this paper we report on initial research we have conducted on a computational theory of prose style. Our theory speaks to the following major points: 1. Where in the generation process style is taken into account

    Natural Language Generation

    We report here on a significant new set of capabilities that we have incorporated into our language generation system MUMBLE. Their impact will be to greatly simplify the work of any text planner that uses MUMBLE as its linguistics component, since MUMBLE can now take on many of the planner's text organization and decision-making problems with markedly less hand-tailoring of algorithms in either component. Briefly, these new capabilities are the following: (a) ATTACHMENT, a new processing stage within MUMBLE that allows us to readily implement the conventions that go into defining a text's intended prose style, e.g. whether the text should have complex sentences or simple ones, compounds or embeddings, reduced or full relative clauses, etc. Stylistic conventions are given as independently stated rules that can be changed according to the situation. (b) REALIZATION CLASSES, a mechanism for organizing both the transformational and lexical choices for linguistically realizing a conceptual object. The mechanism highlights the intentional criteria which control selection decisions. These criteria effectively constitute an 'interlingua' between planner and linguistic component, describing the rhetorical uses to which a text choice can be put while allowing its linguistic details to be encapsulated. The first part of our paper (sections 2 and 3) describes our general approach to generation; the rest illustrates the new capabilities through examples from the UMass COUNSELOR Project. This project is a large new effort to develop a natural language discourse system based on the HYPO system [Rissland & Ashley 1984], which acts as a legal advisor suggesting relevant dimensions and case references for arguing hypothetical legal cases in trade-secret law. At various relevant points we briefly contrast our work with that of Appelt, Danlos, Gabriel, Jacobs, and Mann.
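
    As a schematic of what "independently stated stylistic rules" might look like, the sketch below swaps rule sets to change how new clauses attach to a text under construction. All names and data structures are hypothetical illustrations of the idea and do not reflect MUMBLE's actual implementation.

```python
# Schematic of stylistic conventions as independently stated, swappable
# rules, in the spirit of MUMBLE's ATTACHMENT stage. All names here are
# hypothetical; MUMBLE itself was not implemented this way.

# Swapping rule sets changes the prose style without touching the planner.
TERSE_STYLE = {
    "max_clauses_per_sentence": 1,
    "reduce_relative_clauses": True,   # "the man running" vs "who is running"
}
ELABORATE_STYLE = {
    "max_clauses_per_sentence": 3,
    "reduce_relative_clauses": False,
}

def attach(sentence: list[str], new_clause: str, style: dict) -> list[list[str]]:
    """Attach new_clause to the current sentence if the style permits
    embedding or compounding; otherwise start a new sentence."""
    if len(sentence) < style["max_clauses_per_sentence"]:
        return [sentence + [new_clause]]       # embed / compound
    return [sentence, [new_clause]]            # start a new sentence

print(attach(["the court ruled"], "the secret was disclosed", TERSE_STYLE))
# [['the court ruled'], ['the secret was disclosed']]
print(attach(["the court ruled"], "the secret was disclosed", ELABORATE_STYLE))
# [['the court ruled', 'the secret was disclosed']]
```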