12 research outputs found

    Large scale homophily analysis in twitter using a twixonomy

    Get PDF
    In this paper we perform a large-scale homophily analysis on Twitter using a hierarchical representation of users' interests which we call a Twixonomy. In order to build a population, community, or single-user Twixonomy we first associate "topical" friends in users' friendship lists (i.e. friends representing an interest rather than a social relation between peers) with Wikipedia categories. A wordsense disambiguation algorithm is used to select the appropriate wikipage for each topical friend. Starting from the set of wikipages representing "primitive" interests, we extract all paths connecting these pages with topmost Wikipedia category nodes, and we then prune the resulting graph G efficiently so as to induce a direct acyclic graph. This graph is the Twixonomy. Then, to analyze homophily, we compare different methods to detect communities in a peer friends Twitter network, and then for each community we compute the degree of homophily on the basis of a measure of pairwise semantic similarity. We show that the Twixonomy provides a means for describing users' interests in a compact and readable way and allows for a fine-grained homophily analysis. Furthermore, we show that midlow level categories in the Twixonomy represent the best balance between informativeness and compactness of the representation

    DPM: A novel distributed large-scale social graph processing framework for link prediction algorithms

    Get PDF
    Large-scale graphs have become ubiquitous in social media. Computer-based recommendations in these huge graphs pose challenges in terms of algorithm design and resource usage efficiency when processing recommendations in distributed computing environments. Moreover, recommendation algorithms for graphs, particularly link prediction algorithms, have different requirements depending of the way the underlying graph is traversed. Path-based algorithms usually perform traversals in different directions to build a large ranking of vertices to recommend, whereas random walk-based algorithms build an initial subgraph and perform several iterations on those vertices to compute the final ranking. In this work, we propose a distributed graph processing framework called Distributed Partitioned Merge (DPM), which supports both types of algorithms and we compare its performance and resource usage w.r.t. two relevant frameworks, namely Fork-Join and Pregel. In our experiments, we show that in most tests DPM outperforms both Pregel and Fork-Join in terms of recommendation time, with a minor penalization in network usage in some scenarios.Fil: Corbellini, Alejandro. Consejo Nacional de Investigaciones CientĂ­ficas y TĂ©cnicas. Centro CientĂ­fico TecnolĂłgico Conicet - Tandil. Instituto Superior de IngenierĂ­a del Software. Universidad Nacional del Centro de la Provincia de Buenos Aires. Instituto Superior de IngenierĂ­a del Software; ArgentinaFil: Godoy, Daniela Lis. Consejo Nacional de Investigaciones CientĂ­ficas y TĂ©cnicas. Centro CientĂ­fico TecnolĂłgico Conicet - Tandil. Instituto Superior de IngenierĂ­a del Software. Universidad Nacional del Centro de la Provincia de Buenos Aires. Instituto Superior de IngenierĂ­a del Software; ArgentinaFil: Mateos Diaz, Cristian Maximiliano. Consejo Nacional de Investigaciones CientĂ­ficas y TĂ©cnicas. Centro CientĂ­fico TecnolĂłgico Conicet - Tandil. Instituto Superior de IngenierĂ­a del Software. Universidad Nacional del Centro de la Provincia de Buenos Aires. Instituto Superior de IngenierĂ­a del Software; ArgentinaFil: Schiaffino, Silvia Noemi. Consejo Nacional de Investigaciones CientĂ­ficas y TĂ©cnicas. Centro CientĂ­fico TecnolĂłgico Conicet - Tandil. Instituto Superior de IngenierĂ­a del Software. Universidad Nacional del Centro de la Provincia de Buenos Aires. Instituto Superior de IngenierĂ­a del Software; ArgentinaFil: Zunino Suarez, Alejandro Octavio. Consejo Nacional de Investigaciones CientĂ­ficas y TĂ©cnicas. Centro CientĂ­fico TecnolĂłgico Conicet - Tandil. Instituto Superior de IngenierĂ­a del Software. Universidad Nacional del Centro de la Provincia de Buenos Aires. Instituto Superior de IngenierĂ­a del Software; Argentin

    Efficient pruning of large knowledge graphs

    Get PDF
    In this paper we present an efficient and highly accurate algorithm to prune noisy or over-ambiguous knowledge graphs given as input an extensional definition of a domain of interest, namely as a set of instances or concepts. Our method climbs the graph in a bottom-up fashion, iteratively layering the graph and pruning nodes and edges in each layer while not compromising the connectivity of the set of input nodes. Iterative layering and protection of pre-defined nodes allow to extract semantically coherent DAG structures from noisy or over-ambiguous cyclic graphs, without loss of information and without incurring in computational bottlenecks, which are the main problem of stateof- the-art methods for cleaning large, i.e., Webscale, knowledge graphs. We apply our algorithm to the tasks of pruning automatically acquired taxonomies using benchmarking data from a SemEval evaluation exercise, as well as the extraction of a domain-adapted taxonomy from theWikipedia category hierarchy. The results show the superiority of our approach over state-of-art algorithms in terms of both output quality and computational efficiency

    Stepping Out of the Ivory Tower for Ocean Literacy

    Get PDF
    The Ocean Literacy movement is predominantly driven forward by scientists and educators working in subject areas associated with ocean science. While some in the scientific community have heeded the responsibility to communicate with the general public to increase scientific literacy, reaching and engaging with diverse audiences remains a challenge. Many academic institutions, research centers, and individual scientists use social network sites (SNS) like Twitter to not only promote conferences, journal publications, and scientific reports, but to disseminate resources and information that have the potential to increase the scientific literacy of diverse audiences. As more people turn to social media for news and information, SNSs like Twitter have a great potential to increase ocean literacy, so long as disseminators understand the best practices and limitations of SNS communication. This study analyzed the Twitter account of MaREI – Ireland’s Centre for Marine and Renewable Energy – coordinated by University College Cork Ireland, as a case study. We looked specifically at posts related to ocean literacy to determine what types of audiences are being engaged and what factors need to be considered to increase engagement with intended audiences. Two main findings are presented in this paper. First, we present overall user retweet frequency as a function of post characteristics, highlighting features significant in influencing users’ retweet behavior. Second, we separate users into two types – INREACH and OUTREACH – and identify post characteristics that are statistically relevant in increasing the probability of engaging with an OUTREACH user. The results of this study provide novel insight into the ways in which science-based Twitter users can better use the platform as a vector for science communication and outreach

    Efficient pruning of large knowledge graphs

    Get PDF
    In this paper we present an efficient and highly accurate algorithm to prune noisy or over-ambiguous knowledge graphs given as input an extensional definition of a domain of interest, namely as a set of instances or concepts. Our method climbs the graph in a bottom-up fashion, iteratively layering the graph and pruning nodes and edges in each layer while not compromising the connectivity of the set of input nodes. Iterative layering and protection of pre-defined nodes allow to extract semantically coherent DAG structures from noisy or over-ambiguous cyclic graphs, without loss of information and without incurring in computational bottlenecks, which are the main problem of stateof- the-art methods for cleaning large, i.e., Webscale, knowledge graphs. We apply our algorithm to the tasks of pruning automatically acquired taxonomies using benchmarking data from a SemEval evaluation exercise, as well as the extraction of a domain-adapted taxonomy from theWikipedia category hierarchy. The results show the superiority of our approach over state-of-art algorithms in terms of both output quality and computational efficiency

    End-to-End Reinforcement Learning for Automatic Taxonomy Induction

    Get PDF
    We present a novel end-to-end reinforcement learning approach to automatic taxonomy induction from a set of terms. While prior methods treat the problem as a two-phase task (i.e., detecting hypernymy pairs followed by organizing these pairs into a tree-structured hierarchy), we argue that such two-phase methods may suffer from error propagation, and cannot effectively optimize metrics that capture the holistic structure of a taxonomy. In our approach, the representations of term pairs are learned using multiple sources of information and used to determine \textit{which} term to select and \textit{where} to place it on the taxonomy via a policy network. All components are trained in an end-to-end manner with cumulative rewards, measured by a holistic tree metric over the training taxonomies. Experiments on two public datasets of different domains show that our approach outperforms prior state-of-the-art taxonomy induction methods up to 19.6\% on ancestor F1.Comment: 11 Pages. ACL 2018 Camera Read

    Inferring user interests in microblogging social networks: a survey

    Get PDF
    With the growing popularity of microblogging services such as Twitter in recent years, an increasing number of users are using these services in their daily lives. The huge volume of information generated by users raises new opportunities in various applications and areas. Inferring user interests plays a significant role in providing personalized recommendations on microblogging services, and also on third-party applications providing social logins via these services, especially in cold-start situations. In this survey, we review user modeling strategies with respect to inferring user interests from previous studies. To this end, we focus on four dimensions of inferring user interest profiles: (1) data collection, (2) representation of user interest profiles, (3) construction and enhancement of user interest profiles, and (4) the evaluation of the constructed profiles. Through this survey, we aim to provide an overview of state-of-the-art user modeling strategies for inferring user interest profiles on microblogging social networks with respect to the four dimensions. For each dimension, we review and summarize previous studies based on specified criteria. Finally, we discuss some challenges and opportunities for future work in this research domain
    corecore