17 research outputs found
Finite-state morphological analysis of Persian
This paper describes a two-level morphological analyzer for Persian using a system based on the Xerox finite-state tools. The Persian language presents certain challenges to computational analysis: there is a complex verbal conjugation paradigm which includes long-distance morphological dependencies; phonological alternations apply at morpheme boundaries; and word and noun phrase boundaries are difficult to define, since morphemes may be detached from their stems and distinct words can appear without an intervening space. In this work, we elaborate on these problems and provide solutions in a finite-state morphology system.
Human language reveals a universal positivity bias
Using human evaluation of 100,000 words spread across 24 corpora in 10 languages diverse in origin and culture, we present evidence of a deep imprint of human sociality in language, observing that (i) the words of natural human language possess a universal positivity bias, (ii) the estimated emotional content of words is consistent between languages under translation, and (iii) this positivity bias is strongly independent of frequency of word use. Alongside these general regularities, we describe interlanguage variations in the emotional spectrum of languages that allow us to rank corpora. We also show how our word evaluations can be used to construct physical-like instruments for both real-time and offline measurement of the emotional content of large-scale texts.
Reply to Garcia et al.: Common mistakes in measuring frequency-dependent word characteristics
We demonstrate that the concerns expressed by Garcia et al. are misplaced,
due to (1) a misreading of our findings in [1]; (2) a widespread failure to
examine and present words in support of asserted summary quantities based on
word usage frequencies; and (3) a range of misconceptions about word usage
frequency, word rank, and expert-constructed word lists. In particular, we show
that the English component of our study compares well statistically with two
related surveys, that no survey design influence is apparent, and that
estimates of measurement error do not explain the positivity biases reported in
our work and that of others. We further demonstrate that for the frequency
dependence of positivity (whose nuances we explored in great detail in
[1]), Garcia et al. did not perform a reanalysis of our data; they instead
carried out an analysis of a different, statistically improper data set and
introduced a nonlinearity before performing linear regression.
Comment: 5 pages, 2 figures, 1 table. Expanded version of reply appearing in
PNAS 201
Unification-Based Persian Morphology
In this paper, we describe the implementation of an inflectional morphological analyzer for Persian, which is based on finite-state transducers and typed feature structures with unification. The analyzer was designed to provide an interface to the syntactic parser in the Shiraz Persian-English machine translation system (http://crl.nmsu.edu/shiraz) and was tested on online newspaper articles. The system includes a dictionary with 50,000 entries, which is used for lookup after morphological analysis has been performed.