1,444 research outputs found

    AuthorRank: A New Scheme for Identifying Field-Specific Key Researchers

    When entering a new research field, it is important to identify the papers with the greatest impact and the prominent authors we can refer to. This work is motivated by the need to identify key authors in research fields. Traditional indices such as the h-index only show the overall performance of an author. However, researchers generally contribute to more than one field of research in their careers, which makes it impractical to use the h-index for identifying a key researcher in a specific research field. In this paper we propose a new PageRank-based scheme named “AuthorRank” for identifying key researchers in a specific field. We show that the proposed ranking system performs better than the h-index does.

    Identifying the topic-specific influential users in Twitter

    Social influence can be described as the ability to have an effect on the thoughts or actions of others. Influential members of online communities are becoming the new media for marketing products and swaying opinions. Their guidance and recommendations can also save people search time and assist their decision making. The objective of this research is to detect the influential users in a specific topic on Twitter. In more detail, from a collection of tweets matching a specified query, we want to detect the influential users in an online fashion. To address this objective, we first want to focus our search on the individuals who write in their personal accounts, so we investigate how to differentiate between personal and non-personal accounts. Secondly, we investigate which set of features can best lead us to the topic-specific influential users, and how these features can be expressed in a model to produce a ranked list of influential users. Finally, we look into the use of language and whether it can be used as a supporting feature for detecting the author's influence. To decide how to differentiate between personal and non-personal accounts, we compared the effectiveness of using an SVM with that of using a manually assembled list of non-personal accounts. To decide on the features that can best lead us to the influential users, we ran a few experiments on a set of features inspired by the literature. Two ranking methods were then developed, using feature combinations, to identify the candidate influential users. For evaluation we manually examined the users, looking at their tweets and profile pages in order to decide on their influence. To address our final objective, we ran a few experiments to investigate whether the SLM could be used to identify the influential users' tweets.
For user account classification into personal and non-personal accounts, the SVM was found to be domain independent, reliable and consistent, with a precision of over 0.9. The results showed that the list's performance deteriorates over time and that, when the domain of the test data was changed, the SVM performed better than the list, with higher precision and specificity values. We extracted eight independent features from a set of 12, ran experiments on these eight, and found the best features for identifying influential users to be the Followers count, the Average Retweets count, the Average Retweets Frequency and the Age_Activity combination. Two ranking methods were developed and tested on a set of tweets retrieved using a specific query. In the first method, these best four features were combined in different ways. The best combination was the one that took the average of the Followers count and the Average Retweets count, producing a precision at 10 value of 0.9. In the second method, the users were ranked according to each of the eight independent features and the top 50 users for each feature were included in separate lists. The users were then ranked according to their appearance frequency in these lists. The best result was obtained when we considered the users who appeared in six or more of the lists, which resulted in a precision of 1.0. Both ranking methods were then applied to 20 different collections of retrieved tweets to verify their effectiveness in detecting influential users and to compare their performance. The best result was obtained by the second method, for the set of users who appeared in six or more of the lists, with the highest mean precision of 0.692. Finally, for the SLM, we found a correlation between the users' average Retweets counts and their tweets' perplexity values, which supports the hypothesis that an SLM can be trained to detect highly retweeted tweets.
However, the use of perplexity for identifying influential users resulted in very low precision values. The contributions of this thesis can be summarized as follows. A method to classify personal accounts was proposed. The features that help detect influential users were identified to be the Followers count, the Average Retweets count, the Average Retweets Frequency and the Age_Activity combination. Two methods for identifying the influential users were proposed. Finally, the simplistic approach using the SLM did not produce good results, and there is still a lot of work to be done before the SLM can be used for identifying influential users.
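
The second ranking method described in the abstract above (rank users by each feature, keep the top 50 per feature, then rank users by how many of those lists they appear in) can be sketched roughly as follows. This is a minimal illustration, not the thesis author's implementation: the function names, the input layout (a dict mapping each user to a dict of feature values), the assumption that larger feature values rank higher, and the tie-breaking by user name are all assumptions made for the example. The `precision_at_k` helper uses the standard definition (relevant items in the top k, divided by k).

```python
from collections import Counter


def rank_by_list_appearances(users, features, top_n=50, min_lists=6):
    """Rank users by how many per-feature top-N lists they appear in.

    users: dict mapping user name -> dict of feature name -> numeric value
           (hypothetical layout, chosen for this sketch).
    Returns (user, appearance_count) pairs for users appearing in at least
    min_lists lists, sorted by count descending, then by name.
    """
    counts = Counter()
    for feature in features:
        # Assumption: a larger feature value means a higher rank.
        ranked = sorted(users, key=lambda u: users[u][feature], reverse=True)
        counts.update(ranked[:top_n])
    selected = [(u, c) for u, c in counts.items() if c >= min_lists]
    selected.sort(key=lambda pair: (-pair[1], pair[0]))
    return selected


def precision_at_k(ranked_users, influential, k=10):
    """Fraction of the first k ranked users that are judged influential."""
    hits = sum(1 for u in ranked_users[:k] if u in influential)
    return hits / k


if __name__ == "__main__":
    # Toy data with two made-up features instead of the thesis's eight.
    toy_users = {
        "a": {"f1": 3, "f2": 3},
        "b": {"f1": 2, "f2": 1},
        "c": {"f1": 1, "f2": 2},
    }
    print(rank_by_list_appearances(toy_users, ["f1", "f2"], top_n=2, min_lists=2))
```

With the defaults (`top_n=50`, `min_lists=6`) and eight features, the function mirrors the "six or more of the lists" setting mentioned in the abstract; the manual judgment of which users are influential still has to be supplied separately as the `influential` set.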

    The Usage of Personal Data as Content in Integrated Marketing Communications

    Personal user data has proven extremely valuable for firms in the digital age. The wealth of data available to firms has provided unprecedented access into the world of the consumer. Companies hoping to capitalize on their users' data have turned to several interesting outlets. This research addresses the repurposing of user data as content in marketing. By analyzing four cases of data presented as marketing communications across two companies, this research provides new insights into the public release of private user data for marketing purposes. The four cases were chosen specifically for their proximity in time, the characteristics of the sending firms, and their disparate outcomes. These instances of marketing communications, two by Spotify and two by Netflix, were released during November and December of 2017, and each resulted in a diverse range of public opinion. An analysis of these cases was conducted using the comprehensive framework of integrated marketing communications (Tafesse & Kitchen, 2017). There is a significant difference in the perceptual outcomes of integrated marketing communication campaigns which display user data as content. This analysis provides insights into the characteristics of marketing communications and how their outcomes fit into broader marketing strategies. These case studies provide opportunities for marketers to improve their campaigns in line with their desired audience outcome. Patterns of scope, strategy, mode, and outcome do not suggest success or failure in the context of marketing communications, but rather a set of insights marketers should keep in mind when pursuing communication strategies which harness personal user data.

    Predicting Influencer Virality on Twitter

    The ability to successfully predict virality on Twitter holds great potential as a resource for Twitter influencers, enabling the development of more sophisticated strategies for audience engagement, audience monetization, and information sharing. To our knowledge, focusing exclusively on tweets posted by influencers is a novel context for studying Twitter virality. We find, among feature categories traditionally considered in the literature, that models combining categories covering a range of information perform better than models incorporating only individual feature categories. Moreover, our general predictive model, encompassing a range of feature categories, achieves a prediction accuracy of 68% for influencer virality. We also investigate the role of influencer audiences in predicting virality, a topic we believe to be understudied in the literature. We suspect that incorporating audience information will allow us to better discriminate between virality classes, thus leading to better predictions. We pursue two different approaches, resulting in 10 different predictive models that leverage influencer audience information in addition to traditional feature categories. Both of our attempts to incorporate audience information plateau at an accuracy of approximately 61%, roughly 7 percentage points below our general predictive model. We conclude that we are unable to find experimental evidence to support our claim that incorporating influencer audience information will improve virality predictions. Nonetheless, the performance of our general model holds promise for the deployment of a tool that allows influencers to reap the benefits of virality prediction. As stronger performance from the underlying model would make this tool more useful to influencers in practice, improving the predictive performance of our general model is a cornerstone of future work.

    Data Mining Algorithms for Internet Data: from Transport to Application Layer

    Nowadays we live in a data-driven world. Advances in data generation, collection and storage technology have enabled organizations to gather data sets of massive size. Data mining is a discipline that blends traditional data analysis methods with sophisticated algorithms to handle the challenges posed by these new types of data sets. The Internet is a complex and dynamic system in which new protocols and applications arise at a constant pace. These characteristics make the Internet a valuable and challenging data source and application domain for research, both at the Transport layer, analyzing network traffic flows, and at the Application layer, focusing on the ever-growing next generation web services: blogs, micro-blogs, on-line social networks, photo sharing services and many other applications (e.g., Twitter, Facebook, Flickr, etc.). In this thesis work we focus on the study, design and development of novel algorithms and frameworks to support large scale data mining activities over huge and heterogeneous data volumes, with a particular focus on Internet data as the data source, targeting network traffic classification, on-line social network analysis, recommendation systems, cloud services and big data.

    IDENTIFYING INFLUENCERS FOR PSYOP

    Social media has become one of the primary modes of communication throughout the world, especially in developed countries. Nearly every user of social media in its various forms or applications has an audience he or she can influence and a set of influencers from which he or she receives information. U.S. Psychological Operations (PSYOP) personnel focus on influencing foreign target audiences in their audience’s own language but have been slow to adapt to the use of social media as a means of influence. Drawing from principles used in influencer marketing, we ask, How can U.S. PSYOP forces and their partners best identify social media influencers with whom they can partner in their effort to change the behavior of foreign target audiences? Through this study, we identified the main factors for influence on social media using both quantitative and qualitative analysis and developed a decision-making tool to identify the key communicators, in particular social media influencers, who can elicit the desired behavioral change in a target audience. The seven-category influencer scorecard we created provides a low-tech, situationally adaptable method for identifying influencers with whom U.S. PSYOP can partner to execute a PSYOP series. Major, United States Army. Major, United States Army. Approved for public release. Distribution is unlimited.

    Bootstrapping Web Archive Collections From Micro-Collections in Social Media

    In a Web plagued by disappearing resources, Web archive collections provide a valuable means of preserving Web resources important to the study of past events. These archived collections start with seed URIs (Uniform Resource Identifiers) hand-selected by curators. Curators produce high quality seeds by removing non-relevant URIs and adding URIs from credible and authoritative sources, but this ability comes at a cost: it is time consuming to collect these seeds. The result is a shortage of curators, a lack of Web archive collections for various important news events, and a need for an automatic system for generating seeds. We investigate the problem of generating seed URIs automatically, and explore the state of the art in collection building and seed selection. Attempts toward generating seeds automatically have mostly relied on scraping Web or social media Search Engine Result Pages (SERPs). In this work, we introduce a novel source for generating seeds: URIs in the threaded conversations of social media posts created by single or multiple users. Users on social media sites routinely create and share narratives about news events consisting of hand-selected URIs of news stories, tweets, videos, etc. We call these posts Micro-collections, whether shared on Reddit or Twitter, and we consider them an important source for seeds, because the effort taken to create Micro-collections is an indication of editorial activity and a demonstration of domain expertise. Therefore, we propose a model for generating seeds from Micro-collections. We begin by introducing a simple vocabulary, called post class, for describing social media posts across different platforms, and extract seeds from the Micro-collections post class.
We further propose Quality Proxies for seeds by extending the idea of collection comparison to evaluation, and present our Micro-collection/Quality Proxy (MCQP) framework for bootstrapping Web archive collections from Micro-collections in social media.