171 research outputs found

    Experimental Analysis of the Relevance of Features and Effects on Gender Classification Models for Social Media Author Profiling

    [Abstract] Automatic user profiling from social networks has become a popular task due to its commercial applications (targeted advertising, market studies...). Automatic profiling models infer demographic characteristics of social network users from their generated content or interactions. Users’ demographic information is also valuable for more socially sensitive tasks such as the automatic early detection of mental disorders. For this type of user analysis task, it has been shown that the way users employ language is an important indicator that contributes to the effectiveness of the models. We therefore consider that, for identifying aspects such as gender, age or a user’s origin, it is worth examining language use through both psycho-linguistic and semantic features. A good selection of features is vital for the performance of retrieval, classification, and decision-making software systems. In this paper, we address gender classification as part of the automatic profiling task. We present an experimental analysis of the performance of existing gender classification models, based on an external corpus and baselines for automatic profiling. We analyse in depth the influence of the linguistic features on the classification accuracy of the model. Following that analysis, we have put together a feature set for gender classification models in social networks whose accuracy is above existing baselines. This work was supported by projects RTI2018-093336-B-C21, RTI2018-093336-B-C22 (Ministerio de Ciencia e Innovacion & ERDF) and the financial support supplied by the Conselleria de Educacion, Universidade e Formacion Profesional (accreditation 2019-2022 ED431G/01, ED431B 2019/03) and the European Regional Development Fund, which acknowledges the CITIC Research Center in ICT of the University of A Coruna as a Research Center of the Galician University System. Xunta de Galicia; ED431G/01. Xunta de Galicia; ED431B 2019/0

    Designing Women: Essentializing Femininity in AI Linguistics

    Since the eighties, feminists have considered technology a force capable of subverting sexism because of technology’s ability to produce unbiased logic. Most famously, Donna Haraway’s “A Cyborg Manifesto” posits that the cyborg has the inherent capability to transcend gender because of its removal from social construct and lack of loyalty to the natural world. But while humanoids and artificial intelligence have been imagined as inherently subversive to gender, current artificial intelligence perpetuates gender divides in labor and language as its programmers imbue it with traits considered “feminine.” A majority of 21st-century AI and humanoids are programmed to fit female stereotypes as they fulfill emotional labor and perform pink-collar tasks, whether through roles as therapists, query-fillers, or companions. This paper examines four specific chat-based AI (ELIZA, XiaoIce, Sophia, and Erica) and how their feminine linguistic patterns are used to maintain the illusion of emotional understanding with regard to the tasks that they perform. Overall, chat-based AI fails to subvert gender roles, as feminine AI are relegated to the realm of emotional intelligence and labor.

    Stylistic variation on the Donald Trump Twitter account: a linguistic analysis of tweets posted between 2009 and 2018

    Twitter was an integral part of Donald Trump's communication platform during his 2016 campaign. Although its topical content has been examined by researchers and the media, we know relatively little about the style of the language used on the account or how this style changed over time. In this study, we present the first detailed description of stylistic variation on the Trump Twitter account, based on a multivariate analysis of grammatical co-occurrence patterns in tweets posted between 2009 and 2018. We identify four general patterns of stylistic variation, which we interpret as representing the degree of conversational, campaigning, engaged, and advisory discourse. We then track how the use of these four styles changed over time, focusing on the period around the campaign, and show that the style of tweets shifts systematically depending on the communicative goals of Trump and his team. Based on these results, we propose a series of hypotheses about how the Trump campaign used social media during the 2016 elections.

    Effects of training datasets on both the extreme learning machine and support vector machine for target audience identification on Twitter

    The ability to identify or predict a target audience in the increasingly crowded social space gives a company a competitive advantage over other companies. In this paper, we analyze various training datasets, which include the Twitter content of an account owner and its list of followers, using features generated in different ways for two machine learning approaches: the Extreme Learning Machine (ELM) and the Support Vector Machine (SVM). Various configurations of the ELM and SVM have been evaluated. The results indicate that training datasets using features generated from the owner's tweets achieve the best performance relative to other feature sets. This finding is important and may aid researchers in developing a classifier that is capable of identifying a specific group of target audience members. This will help the account owner spend resources more effectively by sending offers to the right audience, and hence maximize marketing efficiency and improve the return on investment.

    Detecting Portuguese and English Twitter users’ gender

    Existing social networking services provide means for people to communicate and express their feelings in an easy way. Such user-generated content contains clues about users’ behaviors and preferences, as well as other metadata that is now available for scientific research. Twitter, in particular, has become a relevant source for social networking studies, mainly because it provides a simple way for users to express their feelings, ideas, and opinions; makes the user-generated content and associated metadata available to the community; and provides easy-to-use web interfaces and application programming interfaces (APIs) to access data. For many studies, the available information about a user is relevant. However, the gender attribute is not provided when creating a Twitter account. The main focus of this study is to infer users’ gender from other available information. We propose a methodology for gender detection of Twitter users, using unstructured information found in the Twitter profile, user-generated content, and later the user’s profile picture. In previous studies, one of the challenges presented was the labor-intensive task of manually labelling datasets. In this study, we propose a method for creating extended labelled datasets in a semi-automatic fashion, using a set of attributes based on unstructured profile information. With the extended labelled datasets, we associate the users’ textual content with their gender and create gender models based on the users’ generated content and profile information. We explore supervised and unsupervised classifiers and evaluate the results on both Portuguese and English Twitter user datasets. We obtained an accuracy of 93.2% with English users and an accuracy of 96.9% with Portuguese users. The proposed methodology is language independent, but our focus was on Portuguese and English users.

    Using support vector machine ensembles for target audience classification on Twitter

    The vast amount and diversity of the content shared on social media can pose a challenge for any business wanting to use it to identify potential customers. In this paper, our aim is to investigate the use of both unsupervised and supervised learning methods for target audience classification on Twitter with minimal annotation effort. Topic domains were automatically discovered from content shared by followers of an account owner using Twitter Latent Dirichlet Allocation (LDA). A Support Vector Machine (SVM) ensemble was then trained using content from different account owners of the various topic domains identified by Twitter LDA. Experimental results show that the methods presented are able to identify a target audience with high accuracy. In addition, we show that using a statistical inference approach such as bootstrapping in over-sampling, instead of random sampling, to construct training datasets can yield a better classifier in an SVM ensemble. We conclude that such an ensemble system can take advantage of data diversity, which enables real-world applications for differentiating prospective customers from the general audience, leading to business advantage in the crowded social media space.
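The bootstrapped over-sampling mentioned in this abstract can be illustrated with a minimal sketch. The abstract does not give the exact procedure, so everything here is an assumption: the function name `bootstrap_training_sets`, the toy tweets and labels, and the choice to resample every class with replacement up to the size of the largest class, once per ensemble member. Training the SVMs themselves is left out.

```python
import random
from collections import Counter


def bootstrap_training_sets(samples, labels, n_members=3, seed=0):
    """Build one class-balanced bootstrap training set per ensemble member.

    Hypothetical helper: each class is resampled WITH replacement up to the
    size of the largest class, so minority classes are over-sampled.
    """
    rng = random.Random(seed)

    # Group the samples by their class label.
    by_class = {}
    for sample, label in zip(samples, labels):
        by_class.setdefault(label, []).append(sample)

    # Every class is drawn up to the size of the largest class.
    target = max(len(items) for items in by_class.values())

    training_sets = []
    for _ in range(n_members):
        xs, ys = [], []
        for label, items in by_class.items():
            # random.choices draws with replacement (a bootstrap resample).
            xs.extend(rng.choices(items, k=target))
            ys.extend([label] * target)
        training_sets.append((xs, ys))
    return training_sets


# Toy data (invented for illustration): two "target" tweets, four "general".
tweets = [
    "great deal on shoes", "new sneaker drop",
    "match tonight", "rain again", "traffic jam", "lunch time",
]
labels = ["target", "target", "general", "general", "general", "general"]

sets = bootstrap_training_sets(tweets, labels, n_members=3, seed=42)
for xs, ys in sets:
    print(Counter(ys))
```

Each ensemble member would then be trained on its own resampled set, which is one way the data diversity described in the abstract could arise.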

    “What, a Black man can’t have a TV?”: Vine Racial Comedy as a Sociopolitical Discourse Genre

    This thesis analyzes the generic features and social significance of Vine racial comedy, a genre of sociopolitical humor on the video-sharing social media platform Vine. Comedy is the most popular category of videos on the platform, and for the majority of Vine’s existence since its launch in 2013, comedy has been dominated by King Bach (pronounced “batch”). Andrew Bachelor, the actor and producer behind the King Bach persona, is a 28-year-old Black comedian with more than 16 million Vine followers (as of October 2016), making him the most followed comedy Viner and the most followed Viner overall. King Bach has created a dominant form of Vine racial comedy, a unique style of audio-visual comedy that incorporates features of both face-to-face and online discourse genres and adapts them to the affordances of the Vine platform. Multimodal discourse analysis of a data set of 30 vines in which King Bach performs racial comedy demonstrates that his Vine racial comedy draws on the traditions of Black stand-up and sketch comedy and the online discourse genres of reaction GIFs and hashtag activism. Black comedians have used comedy to celebrate Black culture as well as to bring attention to the negative racial ideologies and racial inequality that permeate the lived experiences of Black Americans. As a genre of online discourse, Vine racial comedy is also heavily influenced by the visual/textual medium of reaction GIFs, in which a moving image is used to represent the poster’s embodied reaction to an event or situation. By utilizing the affordances of Vine to highlight social inequality, Vine racial comedy is also generically similar to hashtag activism, which emerged when social media users began using Twitter for grassroots social activism. Vine’s affordance of a six-second length limit has resulted in semiotically dense videos that rely on both audio and visual semiotic features to convey their message efficiently. As in other subgenres of racial comedy, racial, cultural, and linguistic stereotypes are often employed for this purpose. In Vine racial comedy, King Bach constructs stereotype-based characters through the stylistic use of language, particularly African American Language. Visually, characters’ identities and social roles are constructed by embodied behavior (facial expressions, gesture, and other forms of body movement) and highly indexical attire (e.g., a pastel polo shirt to index preppiness). The affordance of a title/caption for each vine allows King Bach to direct the audience’s interpretation of his comedy, which is particularly important for comedy like his that addresses complex and contentious racial ideologies, stereotypes, and forms of discrimination in the U.S. In the two examples analyzed in this study, audiences are confronted with racial profiling, police officers’ targeting of Black men, the idea of “playing the race card,” “white fragility” in racially motivated interaction, stereotypes of Black speakers, and derogatory representations of African American Language. As a genre, King Bach’s innovative racial comedy uses performance and technology to challenge colorblind ideology, which asserts that acknowledging race only increases racial discord, and discourses of a racial digital divide that warn of Black Americans being left behind in a rapidly advancing technological society. With multiracial casts, his vines demonstrate that addressing race can actually bring people of different ethnoracial backgrounds together and that race is not solely the concern of people of color. By centering race in his comedy and using social media as the medium of expression, King Bach shows that, rather than being oppositional, race and technology can be complementary, and that Black people are taking advantage of the affordances of social media to address racial issues in their own lives.

    Analítica de texto y procesamiento de lenguaje natural aplicado a notas de enfermería en español

    58 pages. This exploratory study took a sample of nursing notes provided by the Clínica Universidad de La Sabana. The notes were loaded into a database through extraction, transformation, and loading (ETL) processes, and the text was cleaned, corrected, and transformed using data-cleaning techniques. NLP algorithms were used to uncover patterns in the nurses' writing style, the recorded observations were quantified with a term-frequency algorithm, and topics embedded in the set of notes were detected. These topics, together with categories previously reviewed with the practitioners, were used to group the notes by content and to compute a second quantification based on keywords. The work is divided into three sections. The first (cleaning) illustrates the preprocessing and processing of the data. The second (analytics) uses the cleaned data to compute a score for the notes, profile the nurses, and categorize the words used in the records. The third and final section (visualization) takes the results of the previous steps and makes them available in a visualization tool, presenting the newly generated information to support analysis and decision-making. Program: Maestría en Analítica Aplicada (Master's in Applied Analytics). Degree: Magíster en Analítica Aplicada.
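The term-frequency and keyword quantification steps this abstract mentions could look roughly like the sketch below. It is an illustration only: the tokenizer, the sample notes, the `pain_keywords` set, and the function names are all hypothetical and are not taken from the thesis.

```python
import re
from collections import Counter


def tokenize(note):
    """Lowercase a note and keep runs of Spanish letters (assumed tokenizer)."""
    return re.findall(r"[a-záéíóúüñ]+", note.lower())


def term_frequencies(notes):
    """Count how often each term appears across all notes."""
    counts = Counter()
    for note in notes:
        counts.update(tokenize(note))
    return counts


def keyword_score(note, keywords):
    """Count the tokens of one note that belong to a keyword category."""
    return sum(1 for token in tokenize(note) if token in keywords)


# Invented sample notes, for illustration only.
notes = [
    "Paciente estable, sin dolor.",
    "Paciente refiere dolor abdominal; se administra analgesia.",
]

tf = term_frequencies(notes)

# Hypothetical keyword category used for the second quantification.
pain_keywords = {"dolor", "analgesia"}
scores = [keyword_score(note, pain_keywords) for note in notes]

print(tf.most_common(3))
print(scores)
```

A real pipeline would add the cleaning and spelling-correction steps the abstract describes before counting, and would draw the keyword categories from the ones reviewed with the nursing staff.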

    Tuning in to Terrorist Signals
