Detecting Portuguese and English Twitter users’ gender
Existing social networking services provide means for people to communicate and express
their feelings in an easy way. Such user-generated content contains clues to users’ behaviors and
preferences, as well as other metadata information that is now available for scientific research.
Twitter, in particular, has become a relevant source for social networking studies, mainly because:
it provides a simple way for users to express their feelings, ideas, and opinions; it makes
the user-generated content and associated metadata available to the community; and it
provides easy-to-use web interfaces and application programming interfaces (APIs) to access
data. For many studies, the available information about a user is relevant. However, the gender
attribute is not provided when creating a Twitter account.
The main focus of this study is to infer the users’ gender from other available information.
We propose a methodology for gender detection of Twitter users, using unstructured information
found in the Twitter profile and in user-generated content, and later using the user’s profile picture.
In previous studies, one of the challenges presented was the labor-intensive task of manually
labelling datasets. In this study, we propose a method for creating extended labelled datasets in
a semi-automatic fashion. With the extended labelled datasets, we associate the users’ textual
content with their gender and create gender models based on the users’ generated content and
profile information. We explore supervised and unsupervised classifiers and evaluate the results
in both Portuguese and English Twitter user datasets. We obtained an accuracy of 93.2% with
English users and an accuracy of 96.9% with Portuguese users. The proposed methodology of
our research is language independent, but we focused on Portuguese and English users. The semi-automatic labelling uses a set of attributes based on unstructured profile information.
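The semi-automatic labelling idea in this abstract can be illustrated with a minimal sketch. It assumes, purely for illustration, that a label is derived by matching tokens of the user name and screen name against first-name lists; the abstract does not say which attributes the authors use, and the name sets and the `label_from_profile` function below are hypothetical.

```python
import re

# Hypothetical, tiny name lists for illustration only. A real system would
# need large per-language name dictionaries and accent handling.
FEMALE_NAMES = {"maria", "ana", "emily", "sarah"}
MALE_NAMES = {"joao", "pedro", "james", "michael"}


def label_from_profile(user_name, screen_name):
    """Return 'female', 'male', or None when the profile gives no clear clue."""
    # Split both profile fields into lowercase ASCII alphabetic tokens.
    tokens = re.findall(r"[a-z]+", f"{user_name} {screen_name}".lower())
    female_hits = sum(token in FEMALE_NAMES for token in tokens)
    male_hits = sum(token in MALE_NAMES for token in tokens)
    if female_hits > male_hits:
        return "female"
    if male_hits > female_hits:
        return "male"
    return None  # ambiguous or no match: leave the user unlabelled
```

Users for whom the function returns `None` would simply stay out of the labelled set, which is what makes such a procedure semi-automatic rather than fully automatic.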
Analyzing the Language of Food on Social Media
We investigate the predictive power behind the language of food on social
media. We collect a corpus of over three million food-related posts from
Twitter and demonstrate that many latent population characteristics can be
directly predicted from this data: overweight rate, diabetes rate, political
leaning, and home geographical location of authors. For all tasks, our
language-based models significantly outperform the majority-class baselines.
Performance is further improved with more complex natural language processing,
such as topic modeling. We analyze which textual features have most predictive
power for these datasets, providing insight into the connections between the
language of food, geographic locale, and community characteristics. Lastly, we
design and implement an online system for real-time query and visualization of
the dataset. Visualization tools, such as geo-referenced heatmaps,
semantics-preserving wordclouds and temporal histograms, allow us to discover
more complex, global patterns mirrored in the language of food.
Comment: An extended abstract of this paper will appear in IEEE Big Data 201
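For readers unfamiliar with the majority-class baseline mentioned in this abstract, the following is a minimal sketch of how such a baseline is usually scored. The function name and the example labels are illustrative assumptions, not taken from the paper.

```python
from collections import Counter


def majority_baseline_accuracy(train_labels, test_labels):
    """Accuracy of always predicting the most frequent training label."""
    # Find the most common label in the training data.
    majority_label, _ = Counter(train_labels).most_common(1)[0]
    # Count how many test items happen to carry that label.
    correct = sum(label == majority_label for label in test_labels)
    return correct / len(test_labels)
```

For example, `majority_baseline_accuracy(["high", "high", "low"], ["high", "low", "high", "high"])` evaluates to 0.75. A language-based model is only informative if it beats this number.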
Gender detection of Twitter users based on multiple information sources
Twitter provides a simple way for users to express feelings, ideas and opinions, makes the user-generated content and associated metadata available to the community, and provides easy-to-use web and application programming interfaces to access data. The user profile information is important for many studies, but essential information, such as gender and age, is not provided when accessing a Twitter account. However, clues about the user profile, such as age and gender, behaviors, and preferences, can be extracted from other content provided by the user. The main focus of this paper is to infer the gender of the user from unstructured information, including the user name, screen name, description and picture, or from the user-generated content. We performed experiments using an English labelled dataset containing 6.5 M tweets from 65 K users, and a Portuguese labelled dataset containing 5.8 M tweets from 58 K users. We created four distinct classifiers, trained using a supervised approach, each one considering a group of features extracted from one of four different sources: user name and screen name, user description, content of the tweets, and profile picture. Activity-related features, such as the number of following and the number of followers, were discarded, since they were found not to be indicative of gender. A final classifier that combines the predictions of the four individual classifiers achieves the best performance, corresponding to 93.2% accuracy for English and 96.9% accuracy for Portuguese data.
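The combination step described in this abstract can be sketched as follows. The abstract does not say how the four predictions are combined, so this example assumes simple averaging of per-source probabilities (soft voting); the function name, the source keys, and the 0.5 threshold are all illustrative assumptions.

```python
def combine_predictions(probabilities, threshold=0.5):
    """Combine per-source P(female) estimates by averaging (an assumption).

    `probabilities` maps a source name to P(female), or to None when that
    source is unavailable for the user (for example, an empty description).
    """
    # Keep only the sources that actually produced an estimate.
    available = [p for p in probabilities.values() if p is not None]
    if not available:
        return None  # no classifier could say anything about this user
    mean = sum(available) / len(available)
    return "female" if mean >= threshold else "male"
```

A call such as `combine_predictions({"name": 0.9, "description": 0.6, "tweets": 0.4, "picture": None})` averages the three available estimates and returns `"female"`. A trained meta-classifier over the four outputs would be another plausible design.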
Scraping the Social? Issues in live social research
What makes scraping methodologically interesting for social and cultural research? This paper seeks to contribute to debates about digital social research by exploring how a ‘medium-specific’ technique for online data capture may be rendered analytically productive for social research. As a device that is currently being imported into social research, scraping has the capacity to re-structure social research in at least two ways. Firstly, as a technique that is not native to social research, scraping risks introducing ‘alien’ methodological assumptions into social research (such as a preoccupation with freshness). Secondly, to scrape is to risk importing into our inquiry categories that are prevalent in the social practices enabled by the media: scraping makes available already formatted data for social research. Scraped data, and online social data more generally, tend to come with ‘external’ analytics already built in. This circumstance is often approached as a ‘problem’ with online data capture, but we propose it may be turned into a virtue, insofar as data formats that have currency in the areas under scrutiny may serve as a source of social data themselves. Scraping, we propose, makes it possible to render traffic between the object and process of social research analytically productive. It enables a form of ‘real-time’ social research, in which the formats and life cycles of online data may lend structure to the analytic objects and findings of social research. By way of a conclusion, we demonstrate this point in an exercise of online issue profiling, and more particularly, by relying on Twitter to profile the issue of ‘austerity’. Here we distinguish between two forms of real-time research, those dedicated to monitoring live content (which terms are current?) and those concerned with analysing the liveliness of issues (which topics are happening?).
Wearing Many (Social) Hats: How Different are Your Different Social Network Personae?
This paper investigates whether, when users create profiles in different social
networks, the profiles are redundant expressions of the same persona or
are adapted to each platform. Using the personal webpages of 116,998 users on
About.me, we identify and extract matched user profiles on several major social
networks including Facebook, Twitter, LinkedIn, and Instagram. We find evidence
for distinct site-specific norms, such as differences in the language used in
the text of the profile self-description, and the kind of picture used as
profile image. By learning a model that robustly identifies the platform given
a user's profile image (0.657--0.829 AUC) or self-description (0.608--0.847
AUC), we confirm that users do adapt their behaviour to individual platforms in
an identifiable and learnable manner. However, different genders and age groups
adapt their behaviour differently from each other, and these differences are,
in general, consistent across different platforms. We show that differences in
social profile construction correspond to differences in how formal or informal
the platform is.
Comment: Accepted at the 11th International AAAI Conference on Web and Social Media (ICWSM17
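The AUC figures quoted in this abstract can be read with the help of a minimal sketch of the metric. It assumes a binary setting (one platform versus the rest) and uses the pairwise definition of AUC; the function below is illustrative and is not the authors' evaluation code.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive example receives
    a higher score than a randomly chosen negative one (ties count half).
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))
```

Under this definition, 0.5 corresponds to random guessing and 1.0 to a perfect ranking, so the reported ranges indicate a signal that is learnable but far from perfect.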