4,427 research outputs found

    ReadNet: A Hierarchical Transformer Framework for Web Article Readability Analysis

    Analyzing the readability of articles has been an important sociolinguistic task. Addressing this task is necessary for the automatic recommendation of appropriate articles to readers with different comprehension abilities, and it further benefits education systems, web information systems, and digital libraries. Current methods for assessing readability employ empirical measures or statistical learning techniques that are limited in their ability to characterize complex patterns such as article structures and semantic meanings of sentences. In this paper, we propose a new and comprehensive framework that uses a hierarchical self-attention model to analyze document readability. In this model, measurements of sentence-level difficulty are captured along with the semantic meanings of each sentence. Additionally, the sentence-level features are incorporated to characterize the overall readability of an article with consideration of article structures. We evaluate our proposed approach on three widely used benchmark datasets against several strong baseline approaches. Experimental results show that our proposed method achieves state-of-the-art performance in estimating the readability of various web articles and literature. Comment: ECIR 202

    Tapping the Potential of Coherence and Syntactic Features in Neural Models for Automatic Essay Scoring

    In the prompt-specific holistic score prediction task for Automatic Essay Scoring (AES), the general approaches include pre-trained neural models, coherence models, and hybrid models that incorporate syntactic features with a neural model. In this paper, we propose a novel approach to extract and represent essay coherence features with prompt-learning NSP, which is shown to match the state-of-the-art AES coherence model and achieves the best performance for long essays. We apply syntactic feature dense embedding to augment a BERT-based model and achieve the best performance among hybrid methods for AES. In addition, we explore various ideas to combine coherence, syntactic information, and semantic embeddings, which no previous study has done. Our combined model also performs better than the available SOTA combined model, even though it does not outperform our syntactically enhanced neural model. We further offer analyses that can be useful for future study. Comment: Accepted to "2022 International Conference on Asian Language Processing (IALP)"

    Man vs machine – Detecting deception in online reviews

    This study focused on three main research objectives: analyzing the methods used to identify deceptive online consumer reviews, evaluating insights provided by multi-method automated approaches based on individual and aggregated review data, and formulating a review interpretation framework for identifying deception. The theoretical framework is based on two critical deception-related models, information manipulation theory and self-presentation theory. The findings confirm the interchangeable characteristics of the various automated text analysis methods in drawing insights about review characteristics and underline their significant complementary aspects. An integrative multi-method model that approaches the data at the individual and aggregate level provides more complex insights regarding the quantity and quality of review information, sentiment, cues about its relevance and contextual information, perceptual aspects, and cognitive material.

    Cohesion features in ESL reading: Comparing beginning, intermediate and advanced textbooks

    This study of English as a second language (ESL) reading textbooks investigates cohesion in reading passages from 27 textbooks. The guiding research questions were whether and how cohesion differs across textbooks written for beginning, intermediate, and advanced second language readers. Using a computational tool called Coh-Metrix, textual features were compared across the three levels with Multivariate Analysis of Variance (MANOVA). The results indicated that some features of cohesion yielded significant variation, but with small effect sizes. The majority of cohesion features considered did not differ across the textbook levels. Larger effect sizes were found for factors such as length, readability, and lexical or syntactic complexity.

    Design Principles for Robust Fraud Detection: The Case of Stock Market Manipulations

    We address the challenge of building an automated fraud detection system with robust classifiers that mitigate countermeasures from fraudsters in the field of information-based securities fraud. Our work involves developing design principles for robust fraud detection systems and presenting corresponding design features. We adopt an instrumentalist perspective that relies on theory-based linguistic features and ensemble learning concepts as justificatory knowledge for building robust classifiers. We perform a naive evaluation that assesses the classifiers' performance to identify suspicious stock recommendations, and a robustness evaluation with a simulation that demonstrates a response to fraudster countermeasures. The results indicate that the use of theory-based linguistic features and ensemble learning can significantly increase the robustness of classifiers and contribute to the effectiveness of robust fraud detection. We discuss implications for supervisory authorities, industry, and individual users.

    Only Words Count; the Rest Is Mere Chattering: A Cross-Disciplinary Approach to the Verbal Expression of Emotional Experience

    The analysis of sequences of words and prosody, meter, and rhythm provided in an interview addressing the capacity to identify and describe emotions represents a powerful tool to reveal emotional processing. The ability to express and identify emotions was analyzed by means of the Toronto Structured Interview for Alexithymia (TSIA), and TSIA transcripts were analyzed by Natural Language Processing to shed light on verbal features. The brain correlates of the capacity to translate emotional experience into words were determined through cortical thickness measures. A machine learning methodology proved that individuals with deficits in identifying and describing emotions (n = 7) produced language distortions, frequently used the present tense of auxiliary verbs, used few possessive determiners, and produced poorly connected speech, in comparison to individuals without deficits (n = 7). Interestingly, they showed high cortical thickness at the left temporal pole and low cortical thickness at the isthmus of the right cingulate cortex. Overall, we identified the neuro-linguistic pattern of the expression of emotional experience.