Process Algebra, CCS, and Bisimulation Decidability
Over the past fifteen years, there has been intensive study of formal systems that can model concurrency and communication. Two such systems are the Calculus of Communicating Systems (CCS) and the Algebra of Communicating Processes. This paper has two objectives: (1) to study the characteristics and features of these two systems, and (2) to investigate two interesting formal proofs concerning the decidability of bisimulation equivalence in these systems. An examination of the processes that generate context-free languages as their trace sets shows that their bisimulation equivalence is decidable, in contrast to the undecidability of their trace set equivalence. Recent results have also shown that the bisimulation equivalence problem for processes with a limited amount of concurrency is decidable.
Parsing Early Modern English for Linguistic Search
This work addresses the question of whether the output of a state-of-the-art parser is accurate enough to support research in theoretical linguistics. In order to build reliable models of syntactic change, we aim to eventually parse the 1.5-billion-word Early English Books Online (EEBO) corpus. But since EEBO is not yet parsed, we begin by constructing and testing a parser on the 1.7-million-word Penn-Helsinki Parsed Corpus of Early Modern English (PPCEME). In order to obtain robust results, we define an 8-fold split on PPCEME. We then evaluate the parser with evalb and, more relevantly for us, with a task-specific metric: its accuracy in parsing 6 sentence types necessary to track the rise of auxiliary do (as in They did not come vs. its historical precursor They came not). Retrieving the relevant sentences from the gold and test versions with CorpusSearch queries, we find that the parser's accuracy promises to be sufficient for our purposes. A remaining concern is the variability of the output, which we plan to address with three pieces of future work sketched in the conclusion.
Parsing Early English Books Online for Linguistic Search
This work addresses the question of how to evaluate a state-of-the-art parser on Early English Books Online (EEBO), a 1.5-billion-word collection of unannotated text, for utility in linguistic research. Earlier work has trained and evaluated a parser on the 1.7-million-word Penn-Helsinki Parsed Corpus of Early Modern English (PPCEME) and defined a query-based evaluation to score the retrieval of 6 specific sentence types of interest. However, significant differences between EEBO and the manually annotated PPCEME make it inappropriate to assume that these results will generalize to EEBO. Fortunately, an overlap of source material in PPCEME and EEBO allows us to establish a token alignment between them and to score the POS-tagging on EEBO. We use this alignment together with a more principled version of the query-based evaluation to score the recovery of sentence types on this subset of EEBO, thus allowing us to estimate the increase in error rate on EEBO compared to PPCEME. The increase is largely due to differences in sentence segmentation between the two corpora, pointing the way to further improvements.
A Part-of-Speech Tagger for Yiddish: First Steps in Tagging the Yiddish Book Center Corpus
We describe the construction and evaluation of a part-of-speech tagger for
Yiddish (the first one, to the best of our knowledge). This is the first step
in a larger project of automatically assigning part-of-speech tags and
syntactic structure to Yiddish text for purposes of linguistic research. We
combine two resources for the current work: an 80K-word subset of the Penn
Parsed Corpus of Historical Yiddish (PPCHY) (Santorini, 2021) and 650 million
words of OCR'd Yiddish text from the Yiddish Book Center (YBC). We compute word
embeddings on the YBC corpus, and these embeddings are used with a tagger model
trained and evaluated on the PPCHY. Yiddish orthography in the YBC corpus has
many spelling inconsistencies, and we present some evidence that even simple
non-contextualized embeddings are able to capture the relationships among
spelling variants without the need to first "standardize" the corpus. We
evaluate the tagger performance on a 10-fold cross-validation split, with and
without the embeddings, showing that the embeddings improve tagger performance.
However, a great deal of work remains to be done, and we conclude by discussing
some next steps, including the need for additional annotated training and test
data.
Nominalization and Alternations in Biomedical Language
Background: This paper presents data on alternations in the argument structure of common domain-specific verbs and their associated verbal nominalizations in the PennBioIE corpus. Alternation is the term in theoretical linguistics for variations in the surface syntactic form of verbs, e.g. the different forms of stimulate in FSH stimulates follicular development and follicular development is stimulated by FSH. The data is used to assess the implications of alternations for biomedical text mining systems and to test the fit of the sublanguage model to biomedical texts. Methodology/Principal Findings: We examined 1,872 tokens of the ten most common domain-specific verbs or their zero-related nouns in the PennBioIE corpus and labelled them for the presence or absence of three alternations. We then annotated the arguments of 746 tokens of the nominalizations related to these verbs and counted alternations related to the presence or absence of arguments and to the syntactic position of non-absent arguments. We found that alternations are quite common both for verbs and for nominalizations. We also found a previously undescribed alternation involving an adjectival present participle. Conclusions/Significance: We found that even in this semantically restricted domain, alternations are quite common, and alternations involving nominalizations are exceptionally diverse. Nonetheless, the sublanguage model applies to biomedica
Using Higher-Order Logic Programming for Semantic Interpretation of Coordinate Constructs
Many theories of semantic interpretation use λ-term manipulation to compositionally compute the meaning of a sentence. These theorie