4 research outputs found

    Applying Deep Machine Learning for psycho-demographic profiling of Internet users using O.C.E.A.N. model of personality

    In the modern era, each Internet user leaves enormous amounts of auxiliary digital residuals (footprints) by using a variety of on-line services. All this data has already been collected and stored for many years. Recent works demonstrated that it is possible to apply simple machine learning methods to analyze collected digital footprints and to create psycho-demographic profiles of individuals. However, while these works clearly demonstrated the applicability of machine learning methods for such an analysis, the resulting simple prediction models still lack the accuracy necessary to be successfully applied for practical needs. We assumed that using advanced deep machine learning methods may considerably increase the accuracy of predictions. We started with simple machine learning methods to estimate baseline prediction performance and moved further by applying advanced methods based on shallow and deep neural networks. Then we compared the prediction power of the studied models and drew conclusions about their performance. Finally, we formulated hypotheses on how prediction accuracy can be further improved. As a result of this work, we provide the full source code used in the experiments for all interested researchers and practitioners in the corresponding GitHub repository. We believe that applying deep machine learning for psycho-demographic profiling may have an enormous impact on society (for good or worse) and provides means for Artificial Intelligence (AI) systems to better understand humans by creating their psychological profiles. Thus AI agents may achieve the human-like ability to participate in conversation (communication) flow by anticipating human opponents' reactions, expectations, and behavior.

    Applying data science and machine learning for psycho-demographic profiling of internet users

    Master's dissertation in Informatics Engineering. There has always been great interest in working with public data from online social media users, and with the exponential growth of social media usage, this interest and the research in the area keep increasing. This thesis addresses prediction and classification tasks on online social network data. The goal is to predict psycho-demographic traits (personality and demographic) by performing text emotion analysis on social networks such as Twitter and Facebook. Our main motivation was to raise awareness of what can be done with users’ social media or network information and their usual behaviours on the web (for example, from text analysis we can trace their personality, learn their tastes, how they behave and so on), and to draw attention to the relation between emotion and text on social networks, a subject that has only recently begun to be studied despite the large amount of data available. To perform these tasks we carried out an extensive review of the literature to define the state of the art and to learn and identify work strategies. Almost all past research based its results on a vast sample of users and data, but because some frameworks and APIs were shut down in recent years, such as MyPersonality from Facebook, and others became paid services, we had only a small sample of users’ data to analyze in our thesis, which can harm the results. We started by gathering data from Twitter and Facebook with the users’ consent. On Twitter we focused on tweets and retweets; on Facebook we focused on everything the user typed, using the DataSelfie plugin, which stored all that data on a server from which it can be retrieved later.
Our next step was to find emotions in their text data with the help of a lexicon that categorizes words by eight different emotions; two of them were set aside because we focused only on the six major emotions (this is explained later). We had to remove stopwords, apply stemming to all of the text, and match every word of our data against every word in the lexicon. After this, we asked our participants to fill in a "Big-Five" personality questionnaire and to provide their age, and we added the Big-Five traits and age to each user’s individual dataset. This gave us the final versions of the datasets, ready for machine-learning algorithms to find correlations between emotions and personality or demographic attributes. We focused on practical and methodological aspects of the user attribute prediction task. We used the techniques and algorithms that we thought best fit the data we had and the goal we had to achieve. We gathered the data into two datasets that we tested: one, which we called the "Mixed Language Dataset", contains all text entries from each user; the other, the "User Dataset", contains one entry per user, obtained after analyzing every text entry for all users in order to have a more general view of each one. For the first dataset we achieved the best results with decision tree algorithms, from 58% on the agreeableness trait to 68% on the neuroticism trait. This dataset had a problem with the way the data was distributed, so it was impossible to predict age and gender effectively. As for the latter, regarding demographic characteristics all of the classifiers had a good classification percentage, from K-nearest’s 73% to Naive Bayes’ 95%. The most solid classifier for personality traits was the one using the CART decision tree algorithm, which ranged from 50% on the openness trait to 76% on the agreeableness trait.
Some classifiers had terrible results, others were unremarkable, and some stood out, as stated above. We had a small sample, and that was a problem because it was not consistent or solid in terms of data values, which may have affected our results; we believe that our results would be much better if we applied the same mechanisms to a much bigger sample. In conclusion, we demonstrate how personality or demographic traits (Big-Five traits, age or gender) can be predicted from studying emotions in text. As stated above, we hope this thesis will alert people to what can be done with their online information; we focus only on psycho-demographic profiling, but there are many other things that can be done.
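
The preprocessing described in this abstract (stopword removal, stemming, then word-matching against an emotion lexicon) could be sketched roughly as follows. This is only an illustrative sketch, not the thesis's code: the lexicon entries, the stopword list, and the `crude_stem` and `emotion_counts` helpers are all hypothetical, and the toy suffix stripper merely stands in for whatever real stemmer the authors used.

```python
from collections import Counter
import re

# Hypothetical toy lexicon: stemmed word -> emotion. A real study would load
# a published emotion lexicon; these entries are made up for illustration.
LEXICON = {
    "happi": "joy",
    "love": "joy",
    "angri": "anger",
    "hate": "anger",
    "fear": "fear",
    "tear": "sadness",
}

# Hypothetical minimal stopword list (assumption, not from the thesis).
STOPWORDS = {"i", "am", "the", "a", "and", "so", "of", "my", "is"}


def crude_stem(word):
    """Very rough suffix stripper, standing in for a real stemmer."""
    for suffix in ("ing", "ed", "s"):
        # Strip at most one suffix, and only if at least 3 letters remain.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            word = word[: -len(suffix)]
            break
    # Map a trailing "y" to "i" so that "happy" and "happi" coincide.
    if word.endswith("y") and len(word) > 3:
        word = word[:-1] + "i"
    return word


def emotion_counts(text):
    """Count lexicon emotions in one text entry."""
    tokens = re.findall(r"[a-z']+", text.lower())
    stems = [crude_stem(t) for t in tokens if t not in STOPWORDS]
    return Counter(LEXICON[s] for s in stems if s in LEXICON)


print(emotion_counts("I am so happy and I love my dog"))
```

Per-entry counts like these could then be aggregated per user to form the feature vectors on which classifiers are trained; how the thesis actually aggregates them is not stated in the abstract.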

    Personalized political communication in the era of media abundance: a comparative study of practices in the United States, United Kingdom and Nigeria

    This thesis is a multi-method qualitative comparative study of modern campaign practices in the United States, United Kingdom and Nigeria. Designed to address the gap in knowledge on the technological dimension and features of modern electioneering, the thesis focuses on the 2008 and 2012 Obama campaigns as a technologically innovative exemplar to explore changes and emerging practices in campaigning across three democracies. Findings indicate that in the two advanced democracies, campaigning has entered a historically new era where data-driven practices and new technology now form the ingredients and infrastructure for voter identification, mobilization, persuasion and de-mobilization. Three key contributions are notable in the thesis. First, the comparative methodological design of the study allowed for the development of a typology that captures the technological state and dimension(s) of modern campaign practices. In this way, the work builds comparative theory and rescues the field from comparative knowledge stagnation on the technological features of modern campaigns. Second, using empirical evidence from the three case studies, the thesis contributes to theory by reducing and strengthening the explanatory scope of Swanson and Mancini’s (1996) Americanization and modernization theses respectively. Third, the thesis also adds contemporary understanding of the dynamics of contextual factors and conditions that shape innovation and the uptake of technologically innovative approach(es) to campaigning in the United States, United Kingdom and Nigeria.

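    Le Canada et son positionnement stratégique dans les négociations sur le commerce numérique : le cas de l’accord Canada-États-Unis-Mexique (Canada and its strategic positioning in digital trade negotiations: the case of the Canada-United States-Mexico Agreement)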

    This thesis aims to introduce an analytical framework for assessing the level of legalization of Chapter 19 of the Canada-United States-Mexico Agreement, which deals with digital trade. The framework is complemented by a descriptive analysis, built around the concept of the weak state, detailing the importance of the level of legalization for Canada's capacity to define its own trade strategy in the Canadian digital economy.