5,749 research outputs found

    Creating and testing specialized dictionaries for text analysis

    Practitioners in many domains (e.g., clinical psychologists, college instructors, researchers) collect written responses from clients. A well-developed method that has been applied to texts from sources like these is the computer application Linguistic Inquiry and Word Count (LIWC). LIWC uses the words in texts as cues to a person's thought processes, emotional states, intentions, and motivations. In the present study, we adopt analytic principles from LIWC and develop and test an alternative method of text analysis using naïve Bayes methods. We further show how output from the naïve Bayes analysis can be used to mark up student work in order to provide immediate, constructive feedback to students and instructors.
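    As a rough illustration of the kind of naïve Bayes text analysis the abstract above mentions, here is a minimal sketch of a multinomial naïve Bayes classifier with add-one smoothing. It is not the authors' implementation; the toy documents, the labels, and the function names `train_naive_bayes` and `classify` are assumptions made for this example.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy training data: (text, label) pairs.
TOY_DOCS = [
    ("i feel happy and hopeful", "positive"),
    ("this is a wonderful happy day", "positive"),
    ("i feel sad and tired", "negative"),
    ("this is a terrible sad day", "negative"),
]

def train_naive_bayes(labeled_docs):
    """Count words per label and documents per label."""
    word_counts = defaultdict(Counter)
    doc_counts = Counter()
    vocab = set()
    for text, label in labeled_docs:
        tokens = text.lower().split()
        word_counts[label].update(tokens)
        doc_counts[label] += 1
        vocab.update(tokens)
    return word_counts, doc_counts, vocab

def classify(text, model):
    """Return the label with the highest log posterior score."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    scores = {}
    for label in doc_counts:
        # Log prior from the share of training documents with this label.
        score = math.log(doc_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for token in text.lower().split():
            if token in vocab:  # words never seen in training are ignored
                # Add-one (Laplace) smoothing avoids zero probabilities.
                score += math.log(
                    (word_counts[label][token] + 1) / (total_words + len(vocab))
                )
        scores[label] = score
    return max(scores, key=scores.get)

model = train_naive_bayes(TOY_DOCS)
predicted = classify("happy hopeful day", model)
```

    The per-word log probabilities computed inside `classify` are the sort of quantity that could, in principle, drive word-level markup of a text, though how the study actually does this is not stated in the abstract.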

    Analysis of differences in topics, writing quality, and rhetorical context in college students' essays using the computer program Linguistic Inquiry and Word Count (LIWC)

    Machine methods for automatically analyzing text have been investigated for decades. Yet the availability and usability of these methods for classifying and scoring specialized essays in small samples, as is typical for ordinary coursework, remains unclear. In this paper we analyzed 156 essays submitted by students in a first-year college rhetoric course. Using cognitive and affective measures within Linguistic Inquiry and Word Count (LIWC), we tested whether machine analyses could i) distinguish among essay topics, ii) distinguish between high and low writing quality, and iii) identify differences due to changes in rhetorical context across writing assignments. All three tests yielded positive results. We consider ways that LIWC may benefit college instructors in assessing student compositions and in monitoring the effectiveness of the course curriculum. We also consider extensions of machine assessments for instructional applications.
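    To make the idea of dictionary-based measures concrete, the following sketch counts how many words of a text fall into each category of a small word list and reports the result as a percentage of all words. The categories and word lists in `TOY_CATEGORIES` are invented for this example and are not the LIWC dictionary; the function name `category_percentages` is likewise an assumption.

```python
import re
from collections import Counter

# Hypothetical miniature dictionary; the real LIWC dictionary is far larger.
TOY_CATEGORIES = {
    "positive_emotion": {"happy", "good", "love"},
    "negative_emotion": {"sad", "bad", "hate"},
    "cognitive": {"think", "because", "know"},
}

def category_percentages(text, categories=TOY_CATEGORIES):
    """Return, per category, the percentage of tokens found in its word list."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for token in tokens:
        for name, words in categories.items():
            if token in words:
                counts[name] += 1
    total = len(tokens) or 1  # guard against empty input
    return {name: 100.0 * counts[name] / total for name in categories}

scores = category_percentages("I think I love this because it is very good")
```

    Category percentages of this kind could then serve as features for comparing essays by topic or quality, which is the general style of analysis the abstract describes.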

    Two-layer classification and distinguished representations of users and documents for grouping and authorship identification

    Most studies on authorship identification report a drop in identification performance when the number of authors exceeds 20-25. In this paper, we introduce a new user representation to address this problem and split classification across two layers. The paper makes at least three novel contributions. First, the two-layer approach allows authorship identification to be applied to a larger number of authors (tested on 100 authors), and it is extendable. The authors are divided into groups, each containing a smaller number of authors. Given an anonymous document, the primary layer detects the group to which the document belongs. Then, the secondary layer determines the particular author inside the selected group. To extract groups that link similar authors, clustering is applied over users rather than documents. Hence, the second novelty of this paper is a new user representation that differs from the document representation. Without the proposed user representation, clustering over documents would scatter an author's documents across several clusters instead of assigning each author to a single cluster. Third, the extracted clusters are descriptive of their users and meaningful, because the dimensions have psychological grounding. For authorship identification, the documents are labelled with the extracted groups and fed into machine learning to build classification models that predict the group and author of a given document. The results show that the documents are highly correlated with their corresponding extracted groups, and that the proposed model can be accurately trained to determine the group and the author identity.
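    The two-layer idea described above (first pick a group, then pick an author inside it) can be sketched as follows. This is only an illustration under simplifying assumptions: the groups are fixed by hand rather than obtained by clustering users, a nearest-centroid rule stands in for the trained classifiers, and the feature vectors, author names, and group names are all hypothetical.

```python
import math

def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def nearest(vec, centroids):
    """Key of the centroid closest to vec in Euclidean distance."""
    return min(centroids, key=lambda key: math.dist(vec, centroids[key]))

# Hypothetical data: each author has a few 2-dimensional document vectors.
AUTHOR_DOCS = {
    "author_a": [[0.9, 0.1], [1.0, 0.2]],
    "author_b": [[0.7, 0.4], [0.6, 0.5]],
    "author_c": [[0.1, 0.9], [0.2, 1.0]],
    "author_d": [[0.4, 0.7], [0.5, 0.6]],
}
# Hypothetical grouping; the paper derives groups by clustering users.
GROUPS = {
    "group_1": ["author_a", "author_b"],
    "group_2": ["author_c", "author_d"],
}

# One vector per author (a simple stand-in for a user representation).
author_centroids = {a: centroid(docs) for a, docs in AUTHOR_DOCS.items()}
group_centroids = {
    g: centroid([author_centroids[a] for a in members])
    for g, members in GROUPS.items()
}

def identify(doc_vec):
    """Primary layer selects the group; secondary layer selects the author."""
    group = nearest(doc_vec, group_centroids)
    candidates = {a: author_centroids[a] for a in GROUPS[group]}
    return group, nearest(doc_vec, candidates)

result = identify([0.95, 0.1])
```

    Because the secondary layer only compares the document against authors in the selected group, each decision involves few candidates even when the total number of authors is large.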

    Using Linguistic Features to Estimate Suicide Probability of Chinese Microblog Users

    If people at high risk of suicide can be identified through social media such as microblogs, it becomes possible to implement an active intervention system to save their lives. With this motivation, the current study administered the Suicide Probability Scale (SPS) to 1041 users of Sina Weibo, a leading microblog service provider in China. Two NLP (Natural Language Processing) methods, the Chinese edition of the Linguistic Inquiry and Word Count (LIWC) lexicon and Latent Dirichlet Allocation (LDA), were used to extract linguistic features from the Sina Weibo data. We trained prediction models with machine learning algorithms on these two types of features to estimate suicide probability. The experimental results indicate that LDA can find topics that relate to suicide probability and improve prediction performance. Our study contributes to predicting the suicide probability of social network users from their behavior.

    Self-presentation and emotional contagion on Facebook: new experimental measures of profiles' emotional coherence

    Social networks allow users to present themselves by sharing personal content with others, who may add comments. Recent studies have highlighted how the emotions expressed in a post affect others' posts, eliciting a congruent emotion. So far, no studies have investigated the emotional coherence between wall posts and their comments. This research evaluated the mood of posts and comments on Facebook profiles by analyzing their linguistic features, and introduced a measure to assess excessive self-presentation. Two new experimental measures were built to describe the emotional loading (positive and negative) of posts and comments, and the mood correspondence between them was evaluated. A profile's "empathy", the mood coherence between posts and comments, was used to investigate the relation between excessive self-presentation and the emotional coherence of a profile. Participants publish a higher average number of posts with positive mood. Publishing an emotional post corresponds to getting more likes and comments and to receiving comments with a coherent mood, confirming the emotional contagion effect reported in the literature. Finally, the more empathetic profiles are characterized by excessive self-presentation, having more posts and receiving more comments and likes. Publishing emotional content appears to be functional to receiving more comments and likes, fulfilling attention-seeking needs. Comment: Submitted to Complexit

    Biased Embeddings from Wild Data: Measuring, Understanding and Removing

    Many modern Artificial Intelligence (AI) systems make use of data embeddings, particularly in the domain of Natural Language Processing (NLP). These embeddings are learnt from data that has been gathered "from the wild" and have been found to contain unwanted biases. In this paper we make three contributions towards measuring, understanding and removing this problem. We present a rigorous way to measure some of these biases, based on word lists created for social psychology applications; we observe how gender bias in occupations, as captured by the embeddings, reflects actual gender bias in the same occupations in the real world; and finally we demonstrate how a simple projection can significantly reduce the effects of embedding bias. All this is part of an ongoing effort to understand how trust can be built into AI systems. Comment: Author's original versio
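    One common way to realize "a simple projection" on embeddings is to subtract from each vector its component along an estimated bias direction. The sketch below shows that operation; whether it matches the paper's exact procedure is an assumption, and the three-dimensional vectors and the way `bias_direction` is built are invented for illustration.

```python
import math

def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def normalize(v):
    """Scale v to unit length."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]

def remove_direction(vec, direction):
    """Project vec onto the subspace orthogonal to direction."""
    unit = normalize(direction)
    scale = dot(vec, unit)
    return [x - scale * u for x, u in zip(vec, unit)]

# Hypothetical toy embeddings (real embeddings have hundreds of dimensions).
he = [0.8, 0.1, 0.3]
she = [0.2, 0.1, 0.3]
engineer = [0.6, 0.5, 0.2]

# Assumed bias direction: the difference between two gendered word vectors.
bias_direction = [a - b for a, b in zip(he, she)]
debiased = remove_direction(engineer, bias_direction)
residual = dot(debiased, normalize(bias_direction))
```

    After the projection, the component of `debiased` along the bias direction is zero up to floating-point error, while the components orthogonal to that direction are left unchanged.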