A Data-Oriented Model of Literary Language
We consider the task of predicting how literary a text is, with a gold
standard from human ratings. Aside from a standard bigram baseline, we apply
rich syntactic tree fragments, mined from the training set, and a series of
hand-picked features. Our model is the first to distinguish degrees of highly
and less literary novels using a variety of lexical and syntactic features, and
explains 76.0% of the variation in literary ratings.
Comment: To be published in EACL 2017, 11 pages
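A figure such as "explains 76.0% of the variation" is commonly reported as a coefficient of determination (R²). The sketch below shows how that quantity is computed, assuming this is the measure meant; the function name `r_squared` and the toy ratings are illustrative and do not come from the paper.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - (residual SS / total SS)."""
    mean = sum(y_true) / len(y_true)
    # Squared error left over after the model's predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Squared deviation of the ratings from their own mean.
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical ratings, for illustration only.
human_ratings = [1.0, 2.0, 3.0]
perfect_model = [1.0, 2.0, 3.0]
mean_only_model = [2.0, 2.0, 2.0]
```

Under this definition a model that reproduces every rating scores 1.0, while one that always predicts the mean rating scores 0.0.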
Examining Scientific Writing Styles from the Perspective of Linguistic Complexity
Publishing articles in high-impact English journals is difficult for scholars
around the world, especially for non-native English-speaking scholars (NNESs),
most of whom struggle with proficiency in English. In order to uncover the
differences in English scientific writing between native English-speaking
scholars (NESs) and NNESs, we collected a large-scale data set containing more
than 150,000 full-text articles published in PLoS between 2006 and 2015. We
divided these articles into three groups according to the ethnic backgrounds of
the first and corresponding authors, obtained by Ethnea, and examined the
scientific writing styles in English from a two-fold perspective of linguistic
complexity: (1) syntactic complexity, including measurements of sentence length
and sentence complexity; and (2) lexical complexity, including measurements of
lexical diversity, lexical density, and lexical sophistication. The
observations suggest marginal differences between groups in syntactical and
lexical complexity.
Comment: 6 figures
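Two of the simpler measures named above, sentence length and lexical diversity, can be sketched as follows. This is a minimal illustration, not the study's pipeline: the regex-based sentence splitter and tokenizer, the function names, and the use of the type-token ratio as the diversity measure are all assumptions made here.

```python
import re


def split_sentences(text):
    # Naive split: break on whitespace that follows ., ! or ?
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def tokenize(text):
    # Lowercased alphabetic tokens (apostrophes kept inside words).
    return re.findall(r"[a-z']+", text.lower())


def mean_sentence_length(text):
    """Average number of tokens per sentence."""
    sents = split_sentences(text)
    if not sents:
        return 0.0
    return sum(len(tokenize(s)) for s in sents) / len(sents)


def type_token_ratio(text):
    """Distinct tokens divided by total tokens."""
    toks = tokenize(text)
    if not toks:
        return 0.0
    return len(set(toks)) / len(toks)


sample = "The cat sat. The dog sat down."
```

The type-token ratio falls as texts get longer, so comparing articles of different lengths would call for a length-normalized variant; lexical density and lexical sophistication additionally need part-of-speech tags and word-frequency lists, which are omitted here.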