5,749 research outputs found
Creating and Testing Specialized Dictionaries for Text Analysis
Practitioners in many domains (e.g., clinical psychologists, college instructors,
researchers) collect written responses from clients. A well-developed method that has been applied
to texts from sources like these is the computer application Linguistic Inquiry and Word Count
(LIWC). LIWC uses the words in texts as cues to a person’s thought processes, emotional states,
intentions, and motivations. In the present study, we adopt analytic principles from LIWC and
develop and test an alternative method of text analysis using naïve Bayes methods. We further
show how output from the naïve Bayes analysis can be used for mark up of student work in order
to provide immediate, constructive feedback to students and instructors.
The work of practitioners in many fields, for example clinical psychologists,
college instructors, and researchers, involves collecting written responses from their
clients or students. A well-developed method now applied to texts of this kind is the
computer application Linguistic Inquiry and Word Count (LIWC). LIWC treats the words in
texts as indicators of a person's mental processes, emotional states, intentions, and
motives. The article adopts the analytic principles of LIWC and develops and tests an
alternative method of text analysis based on naïve Bayes classification. The authors
demonstrate how the results of the naïve Bayes analysis can be used to analyze student
work in order to provide immediate, constructive feedback to both students and
instructors.
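The abstract above does not specify its naïve Bayes procedure, so the following is only a minimal sketch of a multinomial naïve Bayes text classifier with Laplace smoothing. The training sentences, the labels, and the helper `word_log_odds` are hypothetical; the per-word log-odds are one possible way, assumed here, to decide which words to highlight when marking up a text.

```python
import math
from collections import Counter, defaultdict

PUNCT = ".,;:!?\"'()"

def tokenize(text):
    """Lowercase, split on whitespace, and strip surrounding punctuation."""
    tokens = (w.strip(PUNCT).lower() for w in text.split())
    return [t for t in tokens if t]

def train_naive_bayes(labeled_texts):
    """Collect per-class document counts, per-class word counts, and the vocabulary."""
    class_docs = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in labeled_texts:
        class_docs[label] += 1
        for word in tokenize(text):
            word_counts[label][word] += 1
            vocab.add(word)
    return class_docs, word_counts, vocab

def log_likelihood(word, label, model):
    """Laplace-smoothed log P(word | label)."""
    _, word_counts, vocab = model
    total = sum(word_counts[label].values())
    return math.log((word_counts[label][word] + 1) / (total + len(vocab)))

def classify(text, model):
    """Return the label with the highest posterior log-score."""
    class_docs, _, vocab = model
    n_docs = sum(class_docs.values())
    scores = {}
    for label in class_docs:
        score = math.log(class_docs[label] / n_docs)
        for word in tokenize(text):
            if word in vocab:  # words never seen in training are ignored
                score += log_likelihood(word, label, model)
        scores[label] = score
    return max(scores, key=scores.get)

def word_log_odds(word, model, pos, neg):
    """Hypothetical markup cue: how strongly a word points to pos rather than neg."""
    return log_likelihood(word, pos, model) - log_likelihood(word, neg, model)

# Hypothetical training data, for illustration only.
training = [
    ("I feel sad and alone", "negative"),
    ("I am happy and hopeful today", "positive"),
    ("sad lonely tired", "negative"),
    ("happy grateful calm", "positive"),
]
model = train_naive_bayes(training)
prediction = classify("Happy and calm.", model)
```

Words with a large positive or negative `word_log_odds` value could be the ones flagged for a reader, while words near zero carry little evidence either way.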
Analyzing Differences in Topic, Writing Quality, and Rhetorical Context in College Student Essays with the Linguistic Inquiry and Word Count (LIWC) Program
Machine methods for automatically analyzing text have been investigated for
decades. Yet the availability and usability of these methods for classifying and scoring specialized
essays in small samples, as is typical for ordinary coursework, remain unclear. In this paper we
analyzed 156 essays submitted by students in a first-year college rhetoric course. Using cognitive
and affective measures within Linguistic Inquiry and Word Count (LIWC), we tested whether
machine analyses could i) distinguish among essay topics, ii) distinguish between high and low
writing quality, and iii) identify differences due to changes in rhetorical context across writing
assignments. The results were positive for all three tests. We consider ways that LIWC
may benefit college instructors in assessing student compositions and in monitoring the
effectiveness of the course curriculum. We also consider extensions of machine assessments for
instructional applications.
Machine methods for automatic text analysis and their capabilities have been studied
for decades. However, the availability and usability of these methods for classifying
and scoring specialized essays in small samples, such as coursework, remain little
studied. The article analyzes 139 essays from a rhetoric course written by first-year
students. Using the cognitive and affective categories of Linguistic Inquiry and
Word Count (LIWC), the study tested whether machine analysis could a) distinguish among
essay topics, b) distinguish between high and low writing quality, and c) identify
differences due to changes in the rhetorical context of the writing assignments. The
study showed positive results for all three tests. The authors focus on how LIWC can
ease the work of university instructors in assessing student compositions and in
monitoring the effectiveness of the course curriculum. The article also considers the
prospects for machine assessment in instructional applications.
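To make the idea of dictionary-based cognitive and affective measures concrete, here is a minimal sketch of LIWC-style scoring: each category is reported as the percentage of words in a text that match the category's word list. The two word lists below are hypothetical stand-ins (the actual LIWC lexicon is proprietary and much larger), and the function names are assumptions for illustration.

```python
import re

# Hypothetical mini-dictionary; entries ending in "*" are stems that
# match any word beginning with the stem.
CATEGORIES = {
    "cognitive": ["think", "because", "know", "reason*"],
    "affective": ["happy", "sad", "love", "worr*"],
}

def matches(word, entry):
    """Return True if word matches a dictionary entry (exact word or stem*)."""
    if entry.endswith("*"):
        return word.startswith(entry[:-1])
    return word == entry

def category_percentages(text, categories=CATEGORIES):
    """Percentage of words in text that fall into each category."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return {name: 0.0 for name in categories}
    result = {}
    for name, entries in categories.items():
        hits = sum(1 for w in words if any(matches(w, e) for e in entries))
        result[name] = 100.0 * hits / len(words)
    return result

scores = category_percentages("I think she worries because I love reasoning")
```

Per-essay percentages of this kind could then be compared across topics, quality levels, or assignments with ordinary statistical tests.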
Two-layer classification and distinguished representations of users and documents for grouping and authorship identification
Most studies on authorship identification report a drop in identification performance when the number of authors exceeds 20-25. In this paper, we introduce a new user representation to address this problem and split classification across two layers. The paper makes at least three novel contributions. First, the two-layer approach allows authorship identification to be applied over a larger number of authors (tested over 100 authors), and it is extendable. The authors are divided into groups that each contain a smaller number of authors. Given an anonymous document, the primary layer detects the group to which the document belongs. Then, the secondary layer determines the particular author inside the selected group. To extract groups that link similar authors, clustering is applied over users rather than documents. Hence, the second novelty is a new user representation that differs from the document representation. Without the proposed user representation, clustering over documents would scatter each author's documents across several clusters instead of assigning each author to a single cluster. Third, the extracted clusters are descriptive of their users and meaningful, because the dimensions have psychological backgrounds. For authorship identification, the documents are labelled with the extracted groups and fed into machine learning to build classification models that predict the group and author of a given document. The results show that the documents are highly correlated with their corresponding extracted groups, and that the proposed model can be accurately trained to determine the group and the author identity.
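The two-layer routing described above can be sketched as follows. This is not the paper's model: a nearest-centroid rule stands in for the unspecified machine learning classifiers, the author-to-group assignment is assumed to have been produced already by clustering over users, and the feature vectors and names are hypothetical.

```python
import math

def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def nearest(vector, centroids):
    """Key of the centroid closest to vector in Euclidean distance."""
    return min(centroids, key=lambda k: math.dist(vector, centroids[k]))

def train_two_layer(docs_by_author, group_of_author):
    """Build one centroid per group (layer 1) and one per author (layer 2)."""
    author_centroids = {a: centroid(d) for a, d in docs_by_author.items()}
    group_docs = {}
    for author, docs in docs_by_author.items():
        group_docs.setdefault(group_of_author[author], []).extend(docs)
    group_centroids = {g: centroid(d) for g, d in group_docs.items()}
    return group_centroids, author_centroids

def predict(vector, model, group_of_author):
    """Layer 1 picks the group; layer 2 picks an author inside that group."""
    group_centroids, author_centroids = model
    group = nearest(vector, group_centroids)
    candidates = {a: c for a, c in author_centroids.items()
                  if group_of_author[a] == group}
    return group, nearest(vector, candidates)

# Hypothetical 2-d document features and a precomputed grouping of authors.
docs_by_author = {
    "ann": [[0.9, 0.1], [1.0, 0.2]],
    "bob": [[0.7, 0.3], [0.6, 0.2]],
    "cat": [[0.1, 0.9], [0.2, 1.0]],
    "dan": [[0.3, 0.7], [0.2, 0.6]],
}
group_of_author = {"ann": "g1", "bob": "g1", "cat": "g2", "dan": "g2"}

two_layer_model = train_two_layer(docs_by_author, group_of_author)
result = predict([0.95, 0.1], two_layer_model, group_of_author)
```

The second layer only compares the document against authors in the selected group, which is what keeps each individual decision small as the total number of authors grows.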
Using Linguistic Features to Estimate Suicide Probability of Chinese Microblog Users
If people with high risk of suicide can be identified through social media
like microblog, it is possible to implement an active intervention system to
save their lives. Based on this motivation, the current study administered the
Suicide Probability Scale (SPS) to 1041 users of Sina Weibo, which is a
leading microblog service provider in China. Two NLP (Natural Language
Processing) methods, the Chinese edition of Linguistic Inquiry and Word Count
(LIWC) lexicon and Latent Dirichlet Allocation (LDA), are used to extract
linguistic features from the Sina Weibo data. We trained predictive models with
machine learning algorithms on these two types of features to estimate
suicide probability. The experimental results
indicate that LDA can find topics that relate to suicide probability and
improve prediction performance. Our study adds value to the prediction of
suicide probability for social network users from their behavior.
Self-presentation and emotional contagion on Facebook: new experimental measures of profiles' emotional coherence
Social networks allow users to present themselves by sharing personal content with
others, who may add comments. Recent studies have highlighted how the emotions
expressed in a post affect others' posts, eliciting a congruent emotion. So
far, no studies have investigated the emotional coherence between wall
posts and their comments. This research evaluated the mood of posts and comments on
Facebook profiles by analyzing their linguistic features, and introduced a measure to
assess excessive self-presentation. Two new experimental measures
were built to describe the emotional loading (positive and negative) of posts
and comments, and the mood correspondence between them was evaluated. A
profile's "empathy", defined as the mood coherence between posts and comments, was
used to investigate the relation between excessive self-presentation and the
emotional coherence of a profile. Participants publish a higher average number
of posts with positive mood. Publishing an emotional post corresponds to getting
more likes and comments and to receiving comments of coherent mood, confirming the
emotional contagion effect reported in the literature. Finally, the more empathetic
profiles are characterized by excessive self-presentation: they have more
posts and receive more comments and likes. Publishing emotional content
appears to be functional to receiving more comments and likes, fulfilling
attention-seeking needs.
Comment: Submitted to Complexit
Biased Embeddings from Wild Data: Measuring, Understanding and Removing
Many modern Artificial Intelligence (AI) systems make use of data embeddings,
particularly in the domain of Natural Language Processing (NLP). These
embeddings are learnt from data that has been gathered "from the wild" and have
been found to contain unwanted biases. In this paper we make three
contributions towards measuring, understanding and removing this problem. We
present a rigorous way to measure some of these biases, based on the use of
word lists created for social psychology applications; we observe how gender
bias in occupations reflects actual gender bias in the same occupations in the
real world; and finally we demonstrate how a simple projection can
significantly reduce the effects of embedding bias. All this is part of an
ongoing effort to understand how trust can be built into AI systems.
Comment: Author's original versio
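The abstract does not say which projection it uses, so the following is only a sketch of one common choice, assumed here: estimate a bias direction as the difference between two word vectors and subtract each embedding's component along that direction. The three-dimensional vectors and the word labels are hypothetical.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def remove_direction(vector, direction):
    """Subtract from vector its orthogonal projection onto direction."""
    scale = dot(vector, direction) / dot(direction, direction)
    return [x - scale * d for x, d in zip(vector, direction)]

# Hypothetical 3-d embeddings, for illustration only.
he = [1.0, 0.0, 1.0]
she = [-1.0, 0.0, 1.0]
nurse = [-0.5, 2.0, 1.0]

# Assumed bias direction: the difference between the two gendered words.
bias_direction = [a - b for a, b in zip(he, she)]

debiased = remove_direction(nurse, bias_direction)
```

After the projection, `debiased` has no component along `bias_direction`, so its similarity to `he` and to `she` differs only through the parts of those vectors that lie outside that direction.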