ReadNet: A Hierarchical Transformer Framework for Web Article Readability Analysis
Analyzing the readability of articles has been an important sociolinguistic
task. Addressing this task is necessary to the automatic recommendation of
appropriate articles to readers with different comprehension abilities, and it
further benefits education systems, web information systems, and digital
libraries. Current methods for assessing readability employ empirical measures
or statistical learning techniques that are limited by their ability to
characterize complex patterns such as article structures and semantic meanings
of sentences. In this paper, we propose a new and comprehensive framework which
uses a hierarchical self-attention model to analyze document readability. In
this model, measurements of sentence-level difficulty are captured along with
the semantic meanings of each sentence. Additionally, the sentence-level
features are incorporated to characterize the overall readability of an article
with consideration of article structures. We evaluate our proposed approach on
three widely-used benchmark datasets against several strong baseline
approaches. Experimental results show that our proposed method achieves the
state-of-the-art performance on estimating the readability for various web
articles and literature. Comment: ECIR 202
Tapping the Potential of Coherence and Syntactic Features in Neural Models for Automatic Essay Scoring
In the prompt-specific holistic score prediction task for Automatic Essay
Scoring, the general approaches include pre-trained neural models, coherence
models, and hybrid models that incorporate syntactic features with a neural model.
In this paper, we propose a novel approach to extract and represent essay
coherence features with prompt-learning NSP that is shown to match the
state-of-the-art AES coherence model, and achieves the best performance for
long essays. We apply syntactic feature dense embedding to augment a BERT-based
model and achieve the best performance for hybrid methodology for AES. In
addition, we explore various ideas to combine coherence, syntactic information
and semantic embeddings, which no previous study has done before. Our combined
model also performs better than the SOTA available for combined model, even
though it does not outperform our syntactic enhanced neural model. We further
offer analyses that can be useful for future study. Comment: Accepted to "2022 International Conference on Asian Language
Processing (IALP)
Man vs machine – Detecting deception in online reviews
This study focused on three main research objectives: analyzing the methods used to identify deceptive online consumer reviews, evaluating insights provided by multi-method automated approaches based on individual and aggregated review data, and formulating a review interpretation framework for identifying deception. The theoretical framework is based on two critical deception-related models, information manipulation theory and self-presentation theory. The findings confirm the interchangeable characteristics of the various automated text analysis methods in drawing insights about review characteristics and underline their significant complementary aspects. An integrative multi-method model that approaches the data at the individual and aggregate level provides more complex insights regarding the quantity and quality of review information, sentiment, cues about its relevance and contextual information, perceptual aspects, and cognitive material
Cohesion features in ESL reading: Comparing beginning, intermediate and advanced textbooks
This study of English as a second language (ESL) reading textbooks investigates cohesion in reading passages from 27 textbooks. The guiding research questions were whether and how cohesion differs across textbooks written for beginning, intermediate, and advanced second language readers. Using a computational tool called Coh-Metrix, textual features were compared across the three levels using Multivariate Analysis of Variance (MANOVA). The results indicated that some features of cohesion yielded significant variation, but with small effect sizes. The majority of cohesion features considered were not different across the textbook levels. Larger effect sizes were found with factors like length, readability and lexical or syntactic complexity
Design Principles for Robust Fraud Detection: The Case of Stock Market Manipulations
We address the challenge of building an automated fraud detection system with robust classifiers that mitigate countermeasures from fraudsters in the field of information-based securities fraud. Our work involves developing design principles for robust fraud detection systems and presenting corresponding design features. We adopt an instrumentalist perspective that relies on theory-based linguistic features and ensemble learning concepts as justificatory knowledge for building robust classifiers. We perform a naive evaluation that assesses the classifiers' performance to identify suspicious stock recommendations, and a robustness evaluation with a simulation that demonstrates a response to fraudster countermeasures. The results indicate that the use of theory-based linguistic features and ensemble learning can significantly increase the robustness of classifiers and contribute to the effectiveness of robust fraud detection. We discuss implications for supervisory authorities, industry, and individual users
Only Words Count; the Rest Is Mere Chattering: A Cross-Disciplinary Approach to the Verbal Expression of Emotional Experience
The analysis of sequences of words and prosody, meter, and rhythm provided in an interview addressing the capacity to identify and describe emotions represents a powerful tool to reveal emotional processing. The ability to express and identify emotions was analyzed by means of the Toronto Structured Interview for Alexithymia (TSIA), and TSIA transcripts were analyzed by Natural Language Processing to shed light on verbal features. The brain correlates of the capacity to translate emotional experience into words were determined through cortical thickness measures. A machine learning methodology proved that individuals with deficits in identifying and describing emotions (n = 7) produced language distortions, frequently used the present tense of auxiliary verbs, and few possessive determiners, as well as scarcely connected the speech, in comparison to individuals without deficits (n = 7). Interestingly, they showed high cortical thickness at left temporal pole and low at isthmus of the right cingulate cortex. Overall, we identified the neuro-linguistic pattern of the expression of emotional experience