171 research outputs found
Experimental Analysis of the Relevance of Features and Effects on Gender Classification Models for Social Media Author Profiling
[Abstract] Automatic user profiling from social networks has become a popular task due to its commercial applications
(targeted advertising, market studies...). Automatic profiling models infer demographic characteristics
of social network users from their generated content or interactions. Users’ demographic information is also
valuable for more socially sensitive tasks such as automatic early detection of mental disorders. For this type
of user analysis task, it has been shown that the way users use language is an important indicator which
contributes to the effectiveness of the models. Therefore, we also consider that for identifying aspects such as
gender, age or user’s origin, it is interesting to consider the use of the language both from psycho-linguistic
and semantic features. A good selection of features will be vital for the performance of retrieval, classification,
and decision-making software systems. In this paper, we will address gender classification as a part of the automatic
profiling task. We show an experimental analysis of the performance of existing gender classification
models based on an external corpus and baselines for automatic profiling. We analyse in depth the influence of
the linguistic features on the classification accuracy of the model. After that analysis, we have put together a
feature set for gender classification models in social networks with an accuracy performance above existing
baselines. This work was supported by projects RTI2018-093336-B-C21 and RTI2018-093336-B-C22 (Ministerio de Ciencia e Innovacion & ERDF) and by the financial support supplied by the Conselleria de Educacion, Universidade e Formacion Profesional (accreditation 2019-2022 ED431G/01, ED431B 2019/03) and the European Regional Development Fund, which acknowledges the CITIC Research Center in ICT of the University of A Coruna as a Research Center of the Galician University System.
Designing Women: Essentializing Femininity in AI Linguistics
Since the eighties, feminists have considered technology a force capable of subverting sexism because of technology’s ability to produce unbiased logic. Most famously, Donna Haraway’s “A Cyborg Manifesto” posits that the cyborg has the inherent capability to transcend gender because of its removal from social construct and lack of loyalty to the natural world. But while humanoids and artificial intelligence have been imagined as inherently subversive to gender, current artificial intelligence perpetuates gender divides in labor and language as its programmers imbue it with traits considered “feminine.” A majority of 21st-century AI and humanoids are programmed to fit female stereotypes as they fulfill emotional labor and perform pink-collar tasks, whether through roles as therapists, query-fillers, or companions. This paper examines four specific chat-based AI (ELIZA, XiaoIce, Sophia, and Erica) and how their feminine linguistic patterns are used to maintain the illusion of emotional understanding with regard to the tasks that they perform. Overall, chat-based AI fails to subvert gender roles, as feminine AI are relegated to the realm of emotional intelligence and labor.
Stylistic variation on the Donald Trump Twitter account: a linguistic analysis of tweets posted between 2009 and 2018
Twitter was an integral part of Donald Trump's communication platform during his 2016 campaign. Although its topical content has been examined by researchers and the media, we know relatively little about the style of the language used on the account or how this style changed over time. In this study, we present the first detailed description of stylistic variation on the Trump Twitter account based on a multivariate analysis of grammatical co-occurrence patterns in tweets posted between 2009 and 2018. We identify four general patterns of stylistic variation, which we interpret as representing the degree of conversational, campaigning, engaged, and advisory discourse. We then track how the use of these four styles changed over time, focusing on the period around the campaign, showing that the style of tweets shifts systematically depending on the communicative goals of Trump and his team. Based on these results, we propose a series of hypotheses about how the Trump campaign used social media during the 2016 elections
Effects of training datasets on both the extreme learning machine and support vector machine for target audience identification on Twitter
The ability to identify or predict a target audience from the increasingly crowded social space will provide a company some competitive advantage over other companies. In this paper, we analyze various training datasets, which include Twitter contents of an account owner and its list of followers, using features generated in different ways for two machine learning approaches - the Extreme Learning Machine (ELM) and Support Vector Machine (SVM). Various configurations of the ELM and SVM have been evaluated. The results indicate that training datasets using features generated from the owner tweets achieve the best performance, relative to other feature sets. This finding is important and may aid researchers in developing a classifier that is capable of identifying a specific group of target audience members. This will assist the account owner to spend resources more effectively, by sending offers to the right audience, and hence maximize marketing efficiency and improve the return on investment
Detecting Portuguese and English Twitter users’ gender
Existing social networking services provide means for people to communicate and express
their feelings in an easy way. Such user-generated content contains clues to users’ behaviors and
preferences, as well as other metadata information that is now available for scientific research.
Twitter, in particular, has become a relevant source for social networking studies, mainly because:
it provides a simple way for users to express their feelings, ideas, and opinions; makes
the user generated content and associated metadata available to the community; and furthermore
provides easy-to-use web interfaces and application programming interfaces (API) to access
data. For many studies, the available information about a user is relevant. However, the gender
attribute is not provided when creating a Twitter account.
The main focus of this study is to infer the users’ gender from other available information.
We propose a methodology for gender detection of Twitter users, using unstructured information
found in the Twitter profile, in user-generated content, and later using the user’s profile picture.
In previous studies, one of the challenges presented was the labor-intensive task of manually
labelling datasets. In this study, we propose a method for creating extended labelled datasets in
a semi-automatic fashion, using a set of attributes based on unstructured profile information. With
the extended labelled datasets, we associate the users’ textual content with their gender and create
gender models based on the users’ generated content and profile information. We explore supervised
and unsupervised classifiers and evaluate the results on both Portuguese and English Twitter user
datasets. We obtained an accuracy of 93.2% with English users and an accuracy of 96.9% with
Portuguese users. The proposed methodology is language independent, but our focus was given to
Portuguese and English users.
Using support vector machine ensembles for target audience classification on Twitter
The vast amount and diversity of the content shared on social media can pose a challenge for any business wanting to use it to identify potential customers. In this paper, our aim is to investigate the use of both unsupervised and supervised learning methods for target audience classification on Twitter with minimal annotation efforts. Topic domains were automatically discovered from contents shared by followers of an account owner using Twitter Latent Dirichlet Allocation (LDA). A Support Vector Machine (SVM) ensemble was then trained using contents from different account owners of the various topic domains identified by Twitter LDA. Experimental results show that the methods presented are able to successfully identify a target audience with high accuracy. In addition, we show that using a statistical inference approach such as bootstrapping in over-sampling, instead of using random sampling, to construct training datasets can achieve a better classifier in an SVM ensemble. We conclude that such an ensemble system can take advantage of data diversity, which enables real-world applications for differentiating prospective customers from the general audience, leading to business advantage in the crowded social media space
“What, a Black man can’t have a TV?”: Vine Racial Comedy as a Sociopolitical Discourse Genre
This thesis analyzes the generic features and social significance of Vine racial comedy, a genre of sociopolitical humor on the video-sharing social media platform Vine. Comedy is the most popular category of videos on the platform, and for the majority of Vine’s existence since its launch in 2013, comedy has been dominated by King Bach (pronounced “batch”). Andrew Bachelor, the actor and producer behind the King Bach persona, is a 28-year-old Black comedian with more than 16 million Vine followers (as of October 2016), making him the most followed comedy Viner and the most followed Viner overall. King Bach has created a dominant form of Vine racial comedy, a unique style of audio-visual comedy that incorporates features of both face-to-face and online discourse genres and adapts them to the affordances of the Vine platform. Multimodal discourse analysis on a data set of 30 vines in which King Bach performs racial comedy demonstrates that King Bach’s Vine racial comedy draws on the traditions of Black stand-up and sketch comedy and the online discourse genres of reaction GIFs and hashtag activism. Black comedians have used comedy to celebrate Black culture as well as bring attention to the negative racial ideologies and racial inequality that permeate the lived experiences of Black Americans. As a genre of online discourse, Vine racial comedy is also heavily influenced by the visual/textual medium of reaction GIFs, in which a moving image is used to represent the poster’s embodied reaction to an event or situation. By utilizing the affordances of Vine to highlight social inequality, Vine racial comedy is also generically similar to hashtag activism, which emerged when social media users began using Twitter for grassroots social activism. Vine’s affordance of a six-second length limit has resulted in semiotically dense videos that rely on both audio and visual semiotic features to convey their message efficiently.
As in other subgenres of racial comedy, racial, cultural, and linguistic stereotypes are often employed for this purpose. In Vine racial comedy, King Bach constructs stereotype-based characters through the stylistic use of language, particularly African American Language. Visually, characters’ identities and social roles are constructed by embodied behavior (facial expressions, gesture, and other forms of body movement) and highly indexical attire (e.g., a pastel polo shirt to index preppiness). The affordance of a title/caption for each vine allows King Bach to direct the audience’s interpretation of his comedy, which is particularly important for comedy like his that addresses complex and contentious racial ideologies, stereotypes, and forms of discrimination in the U.S. In the two examples analyzed in this study, audiences are confronted with racial profiling, police officers’ targeting of Black men, the idea of “playing the race card,” “white fragility” in racially motivated interaction, stereotypes of Black speakers, and derogatory representations of African American Language. As a genre, King Bach’s innovative racial comedy uses performance and technology to challenge colorblind ideology, which asserts that acknowledging race only increases racial discord, and discourses of a racial digital divide that warn of Black Americans being left behind in a rapidly advancing technological society. With multiracial casts, his vines demonstrate that addressing race can actually bring people of different ethnoracial backgrounds together and that race is not solely the concern of people of color. By centering race in his comedy and using social media as the medium of expression, King Bach shows that, rather than being oppositional, race and technology can be complementary, and Black people are taking advantage of the affordances of social media to address racial issues in their own lives.
Text analytics and natural language processing applied to nursing notes in Spanish
58 pages. This exploratory study took a sample of nursing notes provided by the Clínica Universidad de La Sabana. The notes were loaded into a database through extraction, transformation, and loading processes, and the text was cleaned, corrected, and transformed using cleaning techniques. NLP algorithms were used to discover patterns in the nurses’ writing style, the recorded observations were quantified with the term-frequency algorithm, and topics embedded in the set of notes were detected. Together with the categories previously reviewed with the professionals, these topics were used to group the notes by content in order to compute a second quantification based on keywords. This work is divided into three sections: the first (Cleaning) illustrates the pre-processing and processing of the data; the second (Analytics) uses the clean information to compute the score of the notes, the profiling of the nurses, and the categorization of the words used in the records; the third and final section (Visualization) takes the results of the previous steps and makes them available in a visualization tool that presents the newly generated information in order to facilitate analysis and decision-making. Maestría en Analítica Aplicada. Magíster en Analítica Aplicada.