
    All liaisons are dangerous when all your friends are known to us

    Online Social Networks (OSNs) are used by millions of users worldwide. Academically speaking, there is little doubt about the usefulness of demographic studies conducted on OSNs; hence, methods to label unknown users from small labeled samples are very useful. From the general public's point of view, however, this can be a serious privacy concern. Both topics are tackled in this paper: first, a new algorithm to perform user profiling in social networks is described, and its performance is reported and discussed. Second, the experiments, conducted on information usually considered sensitive, reveal that merely publicizing one's contacts puts privacy at risk; thus, measures to minimize privacy leaks due to social graph data mining are outlined.
    Comment: 10 pages, 5 tables
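    The core idea that one's contacts leak sensitive attributes can be illustrated with a minimal sketch: infer an unknown user's label by majority vote over the known labels of their contacts. All names and data below are invented, and this is not the paper's actual profiling algorithm.

```python
# Hypothetical sketch: inferring a hidden attribute from a public contact
# list by majority vote over labeled neighbors. Data is invented; this is
# an illustration of the privacy leak, not the paper's algorithm.
from collections import Counter

def infer_label(user, contacts, labels):
    """Guess `user`'s label from the known labels of their contacts."""
    known = [labels[c] for c in contacts.get(user, []) if c in labels]
    if not known:
        return None  # no labeled contacts, nothing to infer
    return Counter(known).most_common(1)[0][0]

contacts = {"alice": ["bob", "carol", "dave"]}
labels = {"bob": "party_A", "carol": "party_A", "dave": "party_B"}
print(infer_label("alice", contacts, labels))  # -> party_A
```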

    Nepotistic relationships in Twitter and their impact on rank prestige algorithms

    Micro-blogging services such as Twitter allow anyone to publish anything, anytime. Needless to say, much of the available content can be dismissed as babble or spam. However, given the number and diversity of users, some valuable pieces of information should arise from the stream of tweets. Thus, such services can develop into valuable sources of up-to-date information (the so-called real-time web), provided a way to find the most relevant, trustworthy, or authoritative users is available. Finding such users is therefore a highly pertinent question, one to which graph centrality methods can provide an answer. In this paper the author offers a comprehensive survey of feasible algorithms for ranking users in social networks, examines their vulnerabilities to linking malpractice in such networks, and suggests an objective criterion against which to compare such algorithms. Additionally, he suggests a first step towards "desensitizing" prestige algorithms against cheating by spammers and other abusive users.
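    The canonical member of the rank prestige family is PageRank; as a point of reference for the algorithms surveyed, a minimal self-contained power-iteration version over an assumed "who follows whom" graph might look like the sketch below (the graph and damping factor are illustrative, and dangling nodes are ignored for brevity).

```python
# Minimal power-iteration PageRank over a directed "who follows whom"
# graph. A simplified reference point, not any specific surveyed method.
def pagerank(links, d=0.85, iters=50):
    """links[u] = list of users that u links to (e.g., follows)."""
    nodes = set(links) | {v for vs in links.values() for v in vs}
    n = len(nodes)
    rank = {u: 1.0 / n for u in nodes}
    for _ in range(iters):
        new = {u: (1 - d) / n for u in nodes}  # teleportation share
        for u, outs in links.items():
            for v in outs:
                new[v] += d * rank[u] / len(outs)  # u passes prestige to v
        rank = new
    return rank

links = {"a": ["b"], "b": ["c"], "c": ["a", "b"]}
print(sorted(pagerank(links).items(), key=lambda kv: -kv[1]))
```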

    Leveraging Wikidata's edit history in knowledge graph refinement tasks

    Knowledge graphs have been adopted in many diverse fields for a variety of purposes. Most of those applications rely on valid and complete data to deliver their results, pressing the need to improve the quality of knowledge graphs. A number of solutions have been proposed to that end, ranging from rule-based approaches to the use of probabilistic methods, but one element has not been considered yet: the edit history of the graph. In the case of collaborative knowledge graphs (e.g., Wikidata), those edits represent the process by which the community reaches some kind of fuzzy and distributed consensus over the information that best represents each entity, and they can hold potentially interesting information for knowledge graph refinement methods. In this paper, we explore the use of edit history information from Wikidata to improve the performance of type prediction methods. To do that, we first built a JSON dataset containing the edit history of every instance of the 100 most important classes in Wikidata. This edit history information is then explored and analyzed, with a focus on its potential applicability in knowledge graph refinement tasks. Finally, we propose and evaluate two new methods to leverage this edit history information in knowledge graph embedding models for type prediction tasks. Our results show an improvement of one of the proposed methods over current approaches, showing the potential of using edit information in knowledge graph refinement tasks and opening promising new research lines within the field.
    Comment: 18 pages, 7 figures. Submitted to the Journal of Web Semantics
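    For a feel of the raw material involved, an item's edit history can be retrieved through the standard MediaWiki revisions API. The hedged sketch below pulls recent revision metadata for one example entity; the entity (Q42), the parameter choices, and the revision limit are illustrative assumptions, not the dataset-building pipeline described above.

```python
# Hedged sketch: fetch recent revision metadata for a Wikidata item via
# the MediaWiki API. Entity and parameters are illustrative assumptions.
import requests

def edit_history(entity_id, limit=50):
    params = {
        "action": "query",
        "prop": "revisions",
        "titles": entity_id,
        "rvprop": "timestamp|user|comment",
        "rvlimit": limit,
        "format": "json",
    }
    r = requests.get("https://www.wikidata.org/w/api.php", params=params)
    r.raise_for_status()
    pages = r.json()["query"]["pages"]
    return next(iter(pages.values())).get("revisions", [])

revs = edit_history("Q42")  # Q42 = Douglas Adams, a classic example item
print(len(revs), "recent revisions fetched")
```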

    On predictability of rare events leveraging social media: a machine learning perspective

    Information extracted from social media streams has been leveraged to forecast the outcome of a large number of real-world events, from political elections to stock market fluctuations. A growing number of studies demonstrate how the analysis of social media conversations provides cheap access to the wisdom of the crowd. However, the extent and contexts in which such forecasting power can be effectively leveraged have yet to be verified systematically. It is also unclear how social-media-based predictions compare to those based on alternative information sources. To address these issues, here we develop a machine learning framework that leverages social media streams to automatically identify and predict the outcomes of soccer matches. We focus in particular on matches in which at least one of the possible outcomes is deemed highly unlikely by professional bookmakers. We argue that sport events offer a systematic approach for testing the predictive power of social media, and allow us to compare such power against the rigorous baselines set by external sources. Despite such strict baselines, our framework yields above 8% marginal profit when used to inform simple betting strategies. The system is based on real-time sentiment analysis and exploits data collected immediately before the games, allowing for informed bets. We discuss the rationale behind our approach, describe the learning framework, its prediction performance, and the return it provides as compared to a set of betting strategies. To test our framework we use both historical Twitter data from the 2014 FIFA World Cup games, and real-time Twitter data collected by monitoring the conversations about all soccer matches of four major European tournaments (FA Premier League, Serie A, La Liga, and Bundesliga) and the 2014 UEFA Champions League during the period between Oct. 25th 2014 and Nov. 26th 2014.
    Comment: 10 pages, 10 tables, 8 figures
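    The betting logic can be illustrated in a few lines: bet only when the sentiment-derived win probability sufficiently exceeds the probability implied by the bookmaker's odds. All numbers and the margin below are invented assumptions; the actual system relies on a trained model over real-time Twitter sentiment.

```python
# Illustrative sketch of "bet when the crowd disagrees with the bookie".
# Probabilities, odds, and margin are made-up assumptions.
def expected_profit(p_win, decimal_odds, stake=1.0):
    """Expected return of a unit bet given our estimated win probability."""
    return p_win * (decimal_odds - 1) * stake - (1 - p_win) * stake

def should_bet(p_win, decimal_odds, margin=0.05):
    """Bet only if our estimate beats the odds-implied probability."""
    return p_win > 1.0 / decimal_odds + margin

p = 0.30     # sentiment-derived probability of an "unlikely" outcome
odds = 5.0   # bookmaker pays 5x on this outcome (implied prob. 0.20)
if should_bet(p, odds):
    print("bet; expected profit per unit stake:", expected_profit(p, odds))
```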

    On the influence of social bots in online protests. Preliminary findings of a Mexican case study

    Social bots can affect online communication among humans. We study this phenomenon by focusing on #YaMeCanse, the most active protest hashtag in the history of Twitter in Mexico. Accounts using the hashtag are classified using the BotOrNot bot detection tool. Our preliminary analysis suggests that bots played a critical role in disrupting online communication about the protest movement.
    Comment: 10 pages
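    A toy sketch of the classification step, assuming per-account bot scores have already been obtained from a detector such as BotOrNot (the account names, scores, and threshold below are fabricated):

```python
# Toy sketch: flag accounts whose (already computed) bot score exceeds a
# threshold. Scores and threshold are fabricated stand-ins.
bot_scores = {"@user1": 0.91, "@user2": 0.12, "@user3": 0.78}
THRESHOLD = 0.7

likely_bots = sorted(a for a, s in bot_scores.items() if s >= THRESHOLD)
print("likely bots:", likely_bots)
```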

    Survey and evaluation of query intent detection methods

    Second ACM International Conference on Web Search and Data Mining, Barcelona (Spain)
    User interactions with search engines reveal three main underlying intents, namely navigational, informational, and transactional. By providing more accurate results depending on such query intents, the performance of search engines can be greatly improved. Therefore, query classification has been an active research topic in recent years. However, while query topic classification has been the subject of a dedicated bakeoff, no evaluation campaign has been devoted to the study of automatic query intent detection. In this paper some of the available query intent detection techniques are reviewed, an evaluation framework is proposed, and it is used to compare those methods in order to shed light on their relative performance and drawbacks. As will be shown, manually prepared gold-standard files are much needed, and traditional pooling is not the most feasible evaluation method. In addition, future lines of work in both query intent detection and its evaluation are proposed.
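    To give a flavor of the simplest end of the techniques reviewed, a rule-based baseline over the three intents might look like the sketch below; the cue lists are illustrative assumptions, and the surveyed methods use richer features and learned models.

```python
# Hedged sketch: rule-based query intent baseline. Cue lists are
# illustrative; real methods use richer features and learned models.
NAV_CUES = ("www.", ".com", "login", "homepage")
TRANS_CUES = ("buy", "download", "price", "free", "order")

def intent(query):
    q = query.lower()
    if any(c in q for c in NAV_CUES):
        return "navigational"
    if any(c in q for c in TRANS_CUES):
        return "transactional"
    return "informational"

for q in ("facebook login", "buy cheap flights", "causes of inflation"):
    print(q, "->", intent(q))
```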