88 research outputs found

    Tensor Learning for Recovering Missing Information: Algorithms and Applications on Social Media

    Real-time social systems like Facebook, Twitter, and Snapchat have been growing rapidly, producing exabytes of data in different views or aspects. Coupled with the growing GPS-enabled sharing of videos, images, blogs, and tweets that provide valuable information regarding “who”, “where”, “when” and “what”, these real-time human sensor data promise new research opportunities to uncover models of user behavior, mobility, and information sharing. These real-time dynamics in social systems usually come in multiple aspects, which help to better understand the social interactions of the underlying network. However, these multi-aspect datasets are often raw and incomplete owing to various unpredictable or unavoidable reasons; for instance, API limitations and data sampling policies can lead to an incomplete (and often biased) perspective on these multi-aspect datasets. This missing data could raise serious concerns such as biased estimates of structural properties of the network and of properties of information cascades in social networks. In order to recover missing values or information in social systems, we identify “4S” challenges: extreme sparsity of the observed multi-aspect datasets, adoption of rich side information that is able to describe the similarities of entities, generation of robust models rather than limiting them to specific applications, and scalability of models to handle real large-scale datasets (billions of observed entries). With these challenges in mind, this dissertation aims to develop scalable and interpretable tensor-based frameworks, algorithms and methods for recovering missing information on social media. In particular, this dissertation makes four contributions:
    (1) The first contribution is a scalable framework based on low-rank tensor learning in the presence of incomplete information. Concretely, we formally define the problem of recovering the spatio-temporal dynamics of online memes and tackle it with a novel tensor-based factorization approach based on the alternating direction method of multipliers (ADMM), integrating the latent relationships derived from contextual information among locations, memes, and times.
    (2) The second contribution is to evaluate the generalization of the proposed tensor learning framework and extend it to the recommendation problem. In particular, we develop a novel tensor-based approach to personalized expert recommendation that integrates both the latent relationships between homogeneous entities (e.g., users and users, experts and experts) and the relationships between heterogeneous entities (e.g., users and experts, topics and experts) from the geo-spatial, topical, and social contexts.
    (3) The third contribution is to extend the proposed tensor learning framework to the user topical profiling problem. Specifically, we propose a tensor-based contextual regularization model embedded into a matrix factorization framework, which leverages the social, textual, and behavioral contexts across users in order to overcome the identified challenges.
    (4) The fourth contribution is to scale up the proposed tensor learning framework to handle real large-scale datasets that are too big to fit in the main memory of a single machine. In particular, we propose a novel distributed tensor completion algorithm with trace-based regularization of the auxiliary information, based on ADMM under the proposed tensor learning framework, which is designed to scale up to real large-scale tensors (e.g., billions of entries) by efficiently computing auxiliary variables, minimizing intermediate data, and reducing the workload of updating new tensors.

    Modeling adoption dynamics in social networks


    Cascading Behaviour in Complex Socio-Technical Networks

    Most human interactions today take place with the mediation of information and communications technology. This is extending the boundaries of interdependence: the group of reference, ideas and behaviour to which people are exposed is larger and less restricted to old geographical and cultural boundaries; but it is also providing more and better data with which to build more informative models on the effects of social interactions, amongst them, the way in which contagion and cascades diffuse in social networks. Online data are not only helping us gain deeper insights into the structural complexity of social systems, they are also illuminating the consequences of that complexity, especially around collective and temporal dynamics. This paper offers an overview of the models and applications that have been developed in what is still a nascent area of research, as well as an outline of immediate lines of work that promise to open new vistas in our understanding of cascading behaviour in social networks

    Twitter permeability to financial events: an experiment towards a model for sensing irregularities

    There is a general consensus on the good sensing and novelty characteristics of Twitter as an information medium for the complex financial market. This paper investigates the permeability of the Twitter sphere, the total universe of Twitter users and their habits, towards relevant events in the financial market. Analysis shows that a general-purpose social medium is permeable to financial-specific events and establishes Twitter as a relevant feeder for taking decisions regarding the financial market and even fraudulent activities in that market. However, the provenance of contributions, their different levels of credibility and quality, and even the purpose or intention behind them should be considered and carefully contemplated if Twitter is used as a single source for decision taking. With the overall aim of this research being to deploy an architecture for real-time monitoring of irregularities in the financial market, this paper conducts a series of experiments on the level of permeability and the permeable features of Twitter in the event of one of these irregularities. To be precise, Twitter data is collected concerning an event comprising a specific financial action on 27 January 2017: the announcement of the merger of two companies, Tesco PLC and Booker Group PLC, listed in the main market of the London Stock Exchange (LSE), to create the UK's Leading Food Business. The experiment attempts to answer two research questions which aim to characterize the features of Twitter permeability to the financial market. The experimental results confirm that a far-impacting financial event, such as the merger considered, caused apparent disturbances in all the features considered, that is, information volume, content and sentiment as well as geographical provenance. Analysis shows that, despite Twitter not being a specifically financial forum, it is permeable to financial events.

    Quantitative intersectional data (QUINTA): a #metoo case study

    This research began as an investigation of the #metoo movement, with the initial impetus to illuminate the voices located on the margins, those who often go unheard or are never recognized. This work aimed to understand the intersectional aspects of how variations of the hashtag #metoo (e.g., #metoomosque, #churchtoo, #metoodisable, #metooqueer, #metoochina) reveal the inequities of the #metoo movement on Twitter. The proliferation of these hashtag variations has often been ignored by scholars, and therefore absorbed into the larger #metoo movement conversation on Twitter. Therefore, the term ‘hashtag derivative’ was created to describe a variation on the theme of its original hashtag, strongly reflecting its composition. Moreover, a critical theory such as Intersectionality is well-equipped to explore how overlapping identities encounter structure social reality relationship to power. Amid a pandemic and racial unrest, the true capabilities of Intersectionality to describe inequities and injustices beyond the singular social position of race and gender are not widely understood. Data science is not absolved of its role in inequities and injustices merely by dint of being a quantitative field that claims “objectivity”. Social scientists have illuminated the racism, sexism, ableism, transphobia, homophobia, prejudice, bigotry, and bias embedded in data science's technology, tools, and algorithms. This has grave consequences, direct and indirect, for the community as a whole as well as for marginalized communities. The application of Intersectionality to a quantitative field can provide researchers with a formal structure to be more conscientious about how to critique, develop, and design their data science processes, while also reckoning with their own positioning in relationship to the data. In this way, Intersectionality is inclusive in terms of data equity yet adds an additional layer of accountability to the researcher.
    This research leads to the three critical contributions of this work: (1) creating a more concise terminology to describe the phenomenon of hashtag variation, known as hashtag derivatives, (2) defining the historical context of Intersectionality and building a formal case for this to be properly contextualized in the Computer Science field (in particular Data Science), and (3) developing the Quantitative Intersectional Data (QUINTA) Framework, which data scientists and scholars can use to be more equitable, inclusive and accountable for their role in the data science process.


    Analysis and Application of Language Models to Human-Generated Textual Content

    Social networks are enormous sources of human-generated content. Users continuously create information that is useful but hard to detect, extract, and categorize. Language Models (LMs) have always been among the most useful and widely used approaches to processing textual data. First designed as simple unigram models, they improved through the years until the recent release of BERT, a pre-trained Transformer-based model reaching state-of-the-art performance in many heterogeneous benchmark tasks, such as text classification and tagging. In this thesis, I apply LMs to textual content publicly shared on social media. I selected Twitter as the principal source of data for the experiments since its users mainly share short and noisy texts. My goal is to build models that generate meaningful representations of users, encoding their syntactic and semantic features. Once appropriate embeddings are defined, I compute similarities between users to perform higher-level analyses. The tasks tested include the extraction of emerging knowledge (represented by users similar to a given set of well-known accounts); controversy detection (obtaining controversy scores for topics discussed online); community detection and characterization (clustering similar users and detecting outliers); and stance classification of users and tweets (e.g., political inclination, position on COVID-19 vaccines). The results suggest that publicly available data contains sensitive information about users, and that Language Models can now extract it, threatening users' privacy.

    User Behavior Mining in Microblogging
