8,967 research outputs found
Examining Scientific Writing Styles from the Perspective of Linguistic Complexity
Publishing articles in high-impact English journals is difficult for scholars
around the world, especially for non-native English-speaking scholars (NNESs),
most of whom struggle with proficiency in English. In order to uncover the
differences in English scientific writing between native English-speaking
scholars (NESs) and NNESs, we collected a large-scale data set containing more
than 150,000 full-text articles published in PLoS between 2006 and 2015. We
divided these articles into three groups according to the ethnic backgrounds of
the first and corresponding authors, obtained by Ethnea, and examined the
scientific writing styles in English from a two-fold perspective of linguistic
complexity: (1) syntactic complexity, including measurements of sentence length
and sentence complexity; and (2) lexical complexity, including measurements of
lexical diversity, lexical density, and lexical sophistication. The
observations suggest marginal differences between groups in syntactical and
lexical complexity.Comment: 6 figure
A Simple Post-Processing Technique for Improving Readability Assessment of Texts using Word Mover's Distance
Assessing the proper difficulty levels of reading materials or texts in
general is the first step towards effective comprehension and learning. In this
study, we improve the conventional methodology of automatic readability
assessment by incorporating the Word Mover's Distance (WMD) of ranked texts as
an additional post-processing technique to further ground the difficulty level
given by a model. Results of our experiments on three multilingual datasets in
Filipino, German, and English show that the post-processing technique
outperforms previous vanilla and ranking-based models using SVM
Machine Learning for Readability Assessment and Text Simplification in Crisis Communication: A Systematic Review
In times of social media, crisis managers can interact with the citizens in a variety of ways. Since machine learning has already been used to classify messages from the population, the question is, whether such technologies can play a role in the creation of messages from crisis managers to the population. This paper focuses on an explorative research revolving around selected machine learning solutions for crisis communication. We present systematic literature reviews of readability assessment and text simplification. Our research suggests that readability assessment has the potential for an effective use in crisis communication, but there is a lack of sufficient training data. This also applies to text simplification, where an exact assessment is only partly possible due to unreliable or non-existent training data and validation measures
Exploring Linguistic Features for Turkish Text Readability
This paper presents the first comprehensive study on automatic readability
assessment of Turkish texts. We combine state-of-the-art neural network models
with linguistic features at lexical, morphosyntactic, syntactic and discourse
levels to develop an advanced readability tool. We evaluate the effectiveness
of traditional readability formulas compared to modern automated methods and
identify key linguistic features that determine the readability of Turkish
texts
LFTK: Handcrafted Features in Computational Linguistics
Past research has identified a rich set of handcrafted linguistic features
that can potentially assist various tasks. However, their extensive number
makes it difficult to effectively select and utilize existing handcrafted
features. Coupled with the problem of inconsistent implementation across
research works, there has been no categorization scheme or generally-accepted
feature names. This creates unwanted confusion. Also, most existing handcrafted
feature extraction libraries are not open-source or not actively maintained. As
a result, a researcher often has to build such an extraction system from the
ground up.
We collect and categorize more than 220 popular handcrafted features grounded
on past literature. Then, we conduct a correlation analysis study on several
task-specific datasets and report the potential use cases of each feature.
Lastly, we devise a multilingual handcrafted linguistic feature extraction
system in a systematically expandable manner. We open-source our system for
public access to a rich set of pre-implemented handcrafted features. Our system
is coined LFTK and is the largest of its kind. Find it at
github.com/brucewlee/lftk.Comment: BEA @ ACL 202
- …