Breadth analysis of Online Social Networks

Abstract

This thesis is motivated by the analysis, understanding, and prediction of human behaviour through the study of people's digital fingerprints. Unlike a classical PhD thesis, which selects a single topic and analyzes it in depth, we carried out a breadth analysis of complex networks, in particular those that humans create through their relationships and interactions. Digital communities where humans interact and form relationships are commonly called Online Social Networks. First, (i) we collected user interactions, in the form of the text messages users share with each other, in order to analyze the sentiment and topic of those messages. We applied state-of-the-art Natural Language Processing techniques, which have been widely developed and tested on English texts, to a collection of Spanish Tweets and compared the results. Next, (ii) we focused on Topic Detection, creating our own classifier and applying it to the same Tweets dataset. The contributions are twofold: our classifier relies on text graphs built from the input text, and it achieves 70% accuracy, outperforming previous results. After that, (iii) we analyzed the network structure (or topology) and its data values to detect outliers. We hypothesize that in social networks a large mass of users behaves similarly, while a small set of users behaves differently. Within this second group, we try to separate users with high activity, low activity, or any other parameter or feature that places them in different kinds of outlier sets. We aim to detect influential users in one of these outlier sets. We propose a new unsupervised method, Massive Unsupervised Outlier Detection (MUOD), which labels the detected outliers as shape, magnitude, or amplitude outliers, or a combination of those.
We applied this method to a subset of roughly 400 million Google+ users, automatically identifying and discriminating sets of outlier users. Finally, (iv) we address the monitoring of real complex networks. We created a framework that dynamically adapts the temporality of large-scale dynamic networks, reducing compute overhead by at least 76%, data volume by 60%, and overall cloud costs by at least 54%, while always maintaining accuracy above 88%.

Published
Doctoral Program in Mathematical Engineering, Universidad Carlos III de Madrid
Chair: Rosa María Benito Zafrilla. Secretary: Ángel Cuevas Rumín. Member: José Ernesto Jiménez Merin
