4 research outputs found
Applying Deep Machine Learning for psycho-demographic profiling of Internet users using O.C.E.A.N. model of personality
In the modern era, each Internet user leaves behind an enormous amount of auxiliary
digital residue (footprints) by using a variety of online services. All of this
data has been collected and stored for many years. In recent works, it was
demonstrated that it's possible to apply simple machine learning methods to
analyze collected digital footprints and to create psycho-demographic profiles
of individuals. However, while these works clearly demonstrated the
applicability of machine learning methods to such analysis, the resulting simple
prediction models still lack the accuracy needed for
practical use. We hypothesized that using advanced deep machine learning
methods may considerably increase the accuracy of predictions. We started with
simple machine learning methods to estimate basic prediction performance and
moved further by applying advanced methods based on shallow and deep neural
networks. Then we compared the predictive power of the studied models and drew
conclusions about their performance. Finally, we proposed hypotheses on how prediction
accuracy can be further improved. As a result of this work, we provide the full
source code used in the experiments to all interested researchers and
practitioners in the corresponding GitHub repository. We believe that applying deep
machine learning to psycho-demographic profiling may have an enormous impact
on society (for better or worse) and provides a means for Artificial
Intelligence (AI) systems to better understand humans by creating their
psychological profiles. Thus AI agents may achieve the human-like ability to
participate in the flow of a conversation (communication) by anticipating human
opponents' reactions, expectations, and behavior.
Applying data science and machine learning for psycho-demographic profiling of internet users
Master's dissertation in Informatics Engineering
There has always been great interest in working with public data from online social
media users, and with the exponential growth of social media usage, this interest and the research in the area keep increasing.
This thesis aims to address prediction and classification tasks on online social network data. The goal is to predict psycho-demographic - personality and demographic -
traits by doing text emotion analysis on social networks such as Twitter and Facebook. Our
main motivation was to raise awareness of what can be done with users' social media
or network information or usual behaviours on the web (for example, from text analysis
we can trace their personality, know their tastes, how they behave and so on), and to
spread awareness of the emotion-text relationship on social networks, because it has only recently started to
be studied and there is so much data and information available for it.
To perform the tasks mentioned above, we carried out an extensive review of the literature
on previous works to define the state of the art of the project and to learn and identify work
strategies. Almost all past research based its results on a vast sample of
users and data, but because some frameworks and APIs were shut down in recent years,
such as MyPersonality from Facebook, and because some frameworks are paid,
we had only a small sample of users' data to analyze in our thesis, which can harm
the results.
We started by gathering data from Twitter and Facebook with the users' consent. On Twitter we focused on tweets and retweets; on Facebook we focused on everything the
user typed, using the DataSelfie plugin, which stored all that data on a server from which it
can be retrieved later. Our next step was to find emotions in their text data with the
help of a lexicon that categorizes words by eight different emotions; two of them were
set aside because we focused only on the six major emotions (this is explained later).
We had to remove stopwords, apply stemming to all of the text, and
match every word of our data against every word from the lexicon. After
this, we asked our participants to fill in a "Big-Five" personality questionnaire and to
provide us with their age, and we added the Big-Five traits and age to each user's individual
dataset. This gave us the final versions, ready for applying machine learning algorithms to
find correlations between emotions and personality or demographic attributes. We
focused on practical and methodological aspects of the user attribute prediction task.
We used many techniques and algorithms that we thought were the best fit for the data
we had and for the goal that we had to achieve.
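The emotion-extraction step described above (stopword removal, stemming, then word-matching against an emotion lexicon) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy lexicon entries, the stopword list, and the naive suffix-stripping `stem` function are hypothetical stand-ins, not the lexicon, stopword list, or stemmer used in the thesis.

```python
import re
from collections import Counter

# Hypothetical toy lexicon mapping a stemmed word to one emotion.
# The thesis used a lexicon with eight emotions and kept six of them;
# these entries are invented purely for illustration.
LEXICON = {
    "happi": "joy",
    "love": "joy",
    "angri": "anger",
    "hate": "anger",
    "afraid": "fear",
    "cri": "sadness",
}

# Hypothetical, very short stopword list.
STOPWORDS = {"i", "am", "the", "a", "of", "and", "so", "to", "my"}


def stem(word):
    """Naive suffix stripper standing in for a real stemming algorithm."""
    for suffix in ("ing", "ed", "s"):
        # Strip at most one suffix, and only if at least 3 letters remain.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            word = word[: -len(suffix)]
            break
    # Normalise a trailing "y" to "i" so "happy" and "happi" match.
    if word.endswith("y") and len(word) >= 3:
        word = word[:-1] + "i"
    return word


def emotion_counts(text):
    """Count lexicon emotions in a text after stopword removal and stemming."""
    tokens = re.findall(r"[a-z']+", text.lower())
    stems = [stem(t) for t in tokens if t not in STOPWORDS]
    return Counter(LEXICON[s] for s in stems if s in LEXICON)


counts = emotion_counts("I am happy and she loves me, but he hates the rain and cried")
print(counts)
```

Per-text counts of this kind could then be joined with each participant's questionnaire scores and age to form the feature table that a classifier is trained on; how the thesis actually built its features is not specified beyond the description above.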
We gathered the data into two datasets that we tested. One of them, which we called the "Mixed Language Dataset", contains all text entries from each user, and the other, the "User Dataset",
contains one entry per user, produced after analyzing every text entry for all users in order to
have a more general view of each one. For the first dataset we achieved the
best results with decision tree algorithms, from 58% on the agreeableness trait
to 68% on the neuroticism trait. This dataset had a problem with the way the data was
distributed, so it was impossible to predict age and gender effectively. As for the latter, regarding demographic characteristics, all of the classifiers achieved a good classification
percentage, from K-nearest's 73% to Naive Bayes' 95%. The most solid classifier for
personality traits was the one using the CART decision tree algorithm, which ranged from
50% on the openness trait to 76% on the agreeableness one. Some classifiers had
terrible results, others were somewhat dull, and some stood
out, as stated above. We had a small sample, and that was a problem, as it was not
consistent or solid in terms of data values, which may have affected our results. We believe
that our results would be considerably better if we applied the same mechanisms to a much
bigger sample.
In conclusion, we demonstrate how we can predict personality or demographic traits
- Big-Five traits, age or gender - from studying emotions in text. As stated above, we
hope this thesis will alert people to what can be done with their online information;
we focus only on psycho-demographic profiling, but there are many other things that
can be done.
There has always been enormous interest in working with the public data of online social
network users; with the exponential growth of social network use, this
interest and the research in the area continue to grow considerably.
This thesis aims to address prediction and classification tasks on online social
network data. The goal is to predict psycho-demographic traits - personality and demographic - by analyzing the emotions present in text on social
networks such as Twitter and Facebook. Our main motivation was to make
users aware of what can be done with their information or with
their behaviour on the web (for example, through text analysis we can trace
their personality, learn their tastes, know how they behave and so
on), and to spread awareness of the text-emotion relationship on social networks, because it has only
recently begun to be studied and there is a great deal of data and information available for it.
To carry out the tasks mentioned above, we conducted an extensive review of the
literature on previous works to define the state of the art of the project and to learn
and identify work strategies. Almost all previous studies based
their results on a large sample of users and data, but because some
frameworks and APIs have been shut down in recent years, such as MyPersonality from
Facebook, and some other frameworks are paid, we ended up with
a small sample of user data to analyze in our thesis, which
may harm the results.
We began by collecting data from Twitter and Facebook with the users'
consent. On Twitter, we focused on tweets and retweets; on Facebook,
we focused on everything the user typed, using the DataSelfie plugin, which
stores all the data on a server from which it can be retrieved later. Our
next step was to find emotions in the text typed by each user with
the help of a lexicon that categorizes words by eight different emotions; two of these
emotions were discarded, as we concentrated only on the six main emotions
(the process is explained later). We had to remove stopwords, apply
stemming to all the text, and match each word of our data against each word of the lexicon. After this, we asked our participants to
fill in a "Big-Five" personality questionnaire and to tell us
their age. We added the five "Big-Five" traits and the age to each user's individual dataset and obtained the final versions, ready for applying machine learning algorithms to find correlations between emotions and personality or demographic attributes. We focused on the practical and methodological aspects of the user attribute prediction and classification task. Many techniques and
algorithms were used, namely those we considered best suited to the data
we had and the goal we had to achieve.
We gathered data for two different datasets, which we tested at the end. One of them, called
the "Mixed Language Dataset", contains all text entries from each user,
and the other, the "User Dataset", contains one entry per user, produced after analyzing all
of the text entries of all users in order to have more concise, general information about each
one. For the first dataset, the best results were obtained
with decision tree algorithms, from 58% on the agreeableness trait to 68% on the neuroticism trait. This dataset had a
problem with the way the data was composed, so it was impossible to predict age and gender effectively. As for the latter dataset, regarding
demographic characteristics, all classifiers achieved a good classification
percentage, from 73% with K-nearest to 95% with Naive Bayes. The most
solid classifier for personality traits was the one using the CART decision tree
algorithm, which ranged only between 50% on the "openness to experience" trait
and 76% on agreeableness. We had classifiers with terrible results,
others that were somewhat "dull", and some that stood out, as
stated above. Our sample was considerably small and that was a problem for us, since it was not consistent or solid in terms of data values, and
this probably altered some of our results; with a much larger,
deeper sample, we believe that applying the same processes and mechanisms would have given more solid and more consistent results.
In conclusion, we demonstrate how it is possible to predict personality or demographic traits - Big-Five traits, age or gender - from the study of emotions present
in text. As stated above, we hope this thesis will make users
more aware of the importance of their data and of what we can achieve
with it.
Personalized political communication in the era of media abundance: a comparative study of practices in the United States, United Kingdom and Nigeria
This thesis is a multi-method qualitative comparative study of modern campaign practices in the United States, United Kingdom and Nigeria. Designed to contribute to the gap in knowledge on the technological dimension and features of modern electioneering, the thesis focuses on the 2008 and 2012 Obama campaigns as a technologically innovative exemplar to explore changes and emerging practices in campaigning across three democracies.
Findings indicate that in the two advanced democracies, campaigning has entered a historically new era where data-driven practices and new technology now form the ingredients and infrastructure for voter identification, mobilization, persuasion and de-mobilization.
Three key contributions are notable in the thesis. First, the comparative methodological design of the study allowed the development of a typology that captures the technological state and dimension(s) of modern campaign practices. In this way, the work builds comparative theory and rescues the field from comparative knowledge stagnation on the technological features of modern campaigns.
Second, using empirical evidence from the three case studies, the thesis contributes to theory by reducing and strengthening the explanatory scope of Swanson and Mancini’s (1996) Americanization and modernization theses respectively.
Third, the thesis also adds contemporary understanding of the dynamics of contextual factors and conditions that shape innovation and the uptake of technologically innovative approach(es) to campaigning in the United States, United Kingdom and Nigeria.
Canada and its strategic positioning in digital trade negotiations: the case of the Canada-United States-Mexico Agreement
This thesis aims to introduce an analytical framework for assessing the level of legalization of Chapter 19 of the Canada-United States-Mexico Agreement, which deals with digital trade. This framework is complemented by a descriptive analysis, built around the concept of the weak state, detailing the importance of the level of legalization for Canada's capacity to define its own trade strategy in the Canadian digital economy.