71 research outputs found

A Unified Account of Locality Constraints for Clitic Climbing and Long Scrambling

Author: Kulick Seth
Publication venue: ScholarlyCommons
Publication date: 01/01/2000

Process Algebra, CCS, and Bisimulation Decidability

Author: Kulick Seth
Publication venue: ScholarlyCommons
Publication date: 01/01/1994

Over the past fifteen years, there has been intensive study of formal systems that can model concurrency and communication. Two such systems are the Calculus of Communicating Systems and the Algebra of Communicating Processes. This paper has two objectives: (1) to study the characteristics and features of these two systems, and (2) to investigate two interesting formal proofs concerning the decidability of bisimulation equivalence in these systems. An examination of the processes that generate context-free languages as a trace set shows that their bisimulation equivalence is decidable, in contrast to the undecidability of their trace set equivalence. Recent results have also shown that the bisimulation equivalence problem for processes with a limited amount of concurrency is decidable.

CiteSeerX

ScholarlyCommons@Penn

Exceptional Case Marking in the Xtag System

Author: Kulick Seth
Publication venue: ScholarlyCommons
Publication date: 01/01/1997

ScholarlyCommons@Penn

Parsing Early Modern English for Linguistic Search

Author: Kulick Seth
Ryant Neville
Santorini Beatrice
Publication venue: ScholarWorks@UMass Amherst
Publication date: 01/02/2022

This work addresses the question of whether the output of a state-of-the-art parser is accurate enough to support research in theoretical linguistics. In order to build reliable models of syntactic change, we aim to eventually parse the 1.5-billion-word Early English Books Online (EEBO) corpus. But since EEBO is not yet parsed, we begin by constructing and testing a parser on the 1.7-million-word Penn-Helsinki Parsed Corpus of Early Modern English (PPCEME). In order to obtain robust results, we define an 8-fold split on PPCEME. We then evaluate the parser with evalb and, more relevantly for us, with a task-specific metric, namely its accuracy in parsing 6 sentence types necessary to track the rise of auxiliary do (as in They did not come vs. its historical precursor They came not). Retrieving the relevant sentences from the gold and test versions with CorpusSearch queries, we find that the parser's accuracy promises to be sufficient for our purposes. A remaining concern is the variability of the output, which we plan to address with three pieces of future work sketched in the conclusion.

ScholarWorks@UMass Amherst

Parsing Early English Books Online for Linguistic Search

Author: Kulick Seth
Ryant Neville
Santorini Beatrice
Publication venue: ScholarWorks@UMass Amherst
Publication date: 01/06/2023

This work addresses the question of how to evaluate a state-of-the-art parser on Early English Books Online (EEBO), a 1.5-billion-word collection of unannotated text, for utility in linguistic research. Earlier work has trained and evaluated a parser on the 1.7-million-word Penn-Helsinki Parsed Corpus of Early Modern English (PPCEME) and defined a query-based evaluation to score the retrieval of 6 specific sentence types of interest. However, significant differences between EEBO and the manually annotated PPCEME make it inappropriate to assume that these results will generalize to EEBO. Fortunately, an overlap of source material in PPCEME and EEBO allows us to establish a token alignment between them and to score the POS-tagging on EEBO. We use this alignment together with a more principled version of the query-based evaluation to score the recovery of sentence types on this subset of EEBO, thus allowing us to estimate the increase in error rate on EEBO compared to PPCEME. The increase is largely due to differences in sentence segmentation between the two corpora, pointing the way to further improvements.

ScholarWorks@UMass Amherst

A Part-of-Speech Tagger for Yiddish: First Steps in Tagging the Yiddish Book Center Corpus

Author: Kulick Seth
Ryant Neville
Santorini Beatrice
Wallenberg Joel
Publication date: 03/04/2022

We describe the construction and evaluation of a part-of-speech tagger for Yiddish (the first one, to the best of our knowledge). This is the first step in a larger project of automatically assigning part-of-speech tags and syntactic structure to Yiddish text for purposes of linguistic research. We combine two resources for the current work: an 80K-word subset of the Penn Parsed Corpus of Historical Yiddish (PPCHY) (Santorini, 2021) and 650 million words of OCR'd Yiddish text from the Yiddish Book Center (YBC). We compute word embeddings on the YBC corpus, and these embeddings are used with a tagger model trained and evaluated on the PPCHY. Yiddish orthography in the YBC corpus has many spelling inconsistencies, and we present some evidence that even simple non-contextualized embeddings are able to capture the relationships among spelling variants without the need to first "standardize" the corpus. We evaluate the tagger performance on a 10-fold cross-validation split, with and without the embeddings, showing that the embeddings improve tagger performance. However, a great deal of work remains to be done, and we conclude by discussing some next steps, including the need for additional annotated training and test data.

arXiv.org e-Print Archive

Nominalization and Alternations in Biomedical Language

Author: Adam Meyers
BarbaraH Partee
Ben Goertzel
Beth Levin
Carol Friedman
CharlesJ Fillmore
Christiane Fellbaum
DeborahA Dahl
Douglas Biber
George Dunham
George Hripcsak
Gondy Leroy
James Pustejovsky
Jin-Dong Kim
JM Ko
John Lehrberger
Jonathan Schuman
K. Bretonnel Cohen
Karin Verspoor
KBretonnel Cohen
Laurie Bauer
Lawrence Hunter
Leroy Gondy
Lynette Hirschman
M Narayanaswamy
Malka Rappaport-Hovav
Maria Koptjevskaja-Tamm
Martha Palmer
MartinF Porter
Michael Johnston
Naomi Sager
ParantuK Shah
PhilipV Ogren
Pierre Zweigenbaum
Ralph Grishman
Randolph Quirk
Richard Kittredge
Richard Tzong-Han Tsai
Robert P. Futrelle
RobertB Lees
Ron Artstein
Sameer Pradhan
Seth Kulick
T Ono
Thomas Herbst
Thomas Roeper
ThomasC Rindflesch
TimothyW Finin
Tony McEnery
Tuangthong Wattarujeekrit
Wen-Chi Chou
X Yuan
Yacov Kogan
Yuka Tateisi
Zellig Harris
Zheng Ping Jiang
ZZ Hu
Publication venue: Public Library of Science
Publication date: 01/01/2008

Background: This paper presents data on alternations in the argument structure of common domain-specific verbs and their associated verbal nominalizations in the PennBioIE corpus. Alternation is the term in theoretical linguistics for variations in the surface syntactic form of verbs, e.g. the different forms of stimulate in FSH stimulates follicular development and follicular development is stimulated by FSH. The data is used to assess the implications of alternations for biomedical text mining systems and to test the fit of the sublanguage model to biomedical texts. Methodology/Principal Findings: We examined 1,872 tokens of the ten most common domain-specific verbs or their zero-related nouns in the PennBioIE corpus and labelled them for the presence or absence of three alternations. We then annotated the arguments of 746 tokens of the nominalizations related to these verbs and counted alternations related to the presence or absence of arguments and to the syntactic position of non-absent arguments. We found that alternations are quite common both for verbs and for nominalizations. We also found a previously undescribed alternation involving an adjectival present participle. Conclusions/Significance: We found that even in this semantically restricted domain, alternations are quite common, and alternations involving nominalizations are exceptionally diverse. Nonetheless, the sublanguage model applies to biomedica

CiteSeerX

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

Using Higher-Order Logic Programming for Semantic Interpretation of Coordinate Constructs

Author: Seth Kulick
Publication date: 01/01/1995

Many theories of semantic interpretation use λ-term manipulation to compositionally compute the meaning of a sentence. These theorie

CiteSeerX

Crossref