5,837 research outputs found

    Wikipedia mining of hidden links between political leaders

    Get PDF
    We describe a new method of reduced Google matrix which allows to establish direct and hidden links between a subset of nodes of a large directed network. This approach uses parallels with quantum scattering theory, developed for processes in nuclear and mesoscopic physics and quantum chaos. The method is applied to the Wikipedia networks in different language editions analyzing several groups of political leaders of USA, UK, Germany, France, Russia and G20. We demonstrate that this approach allows to recover reliably direct and hidden links among political leaders. We argue that the reduced Google matrix method can form the mathematical basis for studies in social and political sciences analyzing Leader-Members eXchang

    Capturing the influence of geopolitical ties from Wikipedia with reduced Google matrix

    Get PDF
    Interactions between countries originate from diverse aspects such as geographic proximity, trade, socio-cultural habits, language, religions, etc. Geopolitics studies the influence of a country’s geographic space on its political power and its relationships with other countries. This work reveals the potential of Wikipedia mining for geopolitical study. Actually, Wikipedia offers solid knowledge and strong correlations among countries by linking web pages together for different types of information (e.g. economical, historical, political, and many others). The major finding of this paper is to show that meaningful results on the influence of country ties can be extracted from the hyperlinked structure of Wikipedia. We leverage a novel stochastic matrix representation of Markov chains of complex directed networks called the reduced Google matrix theory. For a selected small size set of nodes, the reduced Google matrix concentrates direct and indirect links of the million-node sized Wikipedia network into a small Perron-Frobenius matrix keeping the PageRank probabilities of the global Wikipedia network. We perform a novel sensitivity analysis that leverages this reduced Google matrix to characterize the influence of relationships between countries from the global network. We apply this analysis to two chosen sets of countries (i.e. the set of 27 European Union countries and a set of 40 top worldwide countries). We show that with our sensitivity analysis we can exhibit easily very meaningful information on geopolitics from five different Wikipedia editions (English, Arabic, Russian, French and German)

    Data science methods for the analysis of controversial social dedia discussions

    Get PDF
    Social media communities like Reddit and Twitter allow users to express their views on topics of their interest, and to engage with other users who may share or oppose these views. This can lead to productive discussions towards a consensus, or to contended debates, where disagreements frequently arise. Prior work on such settings has primarily focused on identifying notable instances of antisocial behavior such as hate-speech and “trolling”, which represent possible threats to the health of a community. These, however, are exceptionally severe phenomena, and do not encompass controversies stemming from user debates, differences of opinions, and off-topic content, all of which can naturally come up in a discussion without going so far as to compromise its development. This dissertation proposes a framework for the systematic analysis of social media discussions that take place in the presence of controversial themes, disagreements, and mixed opinions from participating users. For this, we develop a feature-based model to describe key elements of a discussion, such as its salient topics, the level of activity from users, the sentiments it expresses, and the user feedback it receives. Initially, we build our feature model to characterize adversarial discussions surrounding political campaigns on Twitter, with a focus on the factual and sentimental nature of their topics and the role played by different users involved. We then extend our approach to Reddit discussions, leveraging community feedback signals to define a new notion of controversy and to highlight conversational archetypes that arise from frequent and interesting interaction patterns. We use our feature model to build logistic regression classifiers that can predict future instances of controversy in Reddit communities centered on politics, world news, sports, and personal relationships. Finally, our model also provides the basis for a comparison of different communities in the health domain, where topics and activity vary considerably despite their shared overall focus. In each of these cases, our framework provides insight into how user behavior can shape a community’s individual definition of controversy and its overall identity.Social-Media Communities wie Reddit und Twitter ermöglichen es Nutzern, ihre Ansichten zu eigenen Themen zu Ă€ußern und mit anderen Nutzern in Kontakt zu treten, die diese Ansichten teilen oder ablehnen. Dies kann zu produktiven Diskussionen mit einer Konsensbildung fĂŒhren oder zu strittigen Auseinandersetzungen ĂŒber auftretende Meinungsverschiedenheiten. FrĂŒhere Arbeiten zu diesem Komplex konzentrierten sich in erster Linie darauf, besondere FĂ€lle von asozialem Verhalten wie Hassrede und "Trolling" zu identifizieren, da diese eine Gefahr fĂŒr die GesprĂ€chskultur und den Wert einer Community darstellen. Die sind jedoch außergewöhnlich schwerwiegende PhĂ€nomene, die keinesfalls bei jeder Kontroverse auftreten die sich aus einfachen Diskussionen, Meinungsverschiedenheiten und themenfremden Inhalten ergeben. All diese Reibungspunkte können auch ganz natĂŒrlich in einer Diskussion auftauchen, ohne dass diese gleich den ganzen GesprĂ€chsverlauf gefĂ€hrden. Diese Dissertation stellt ein Framework fĂŒr die systematische Analyse von Social-Media Diskussionen vor, die vornehmlich von kontroversen Themen, strittigen Standpunkten und Meinungsverschiedenheiten der teilnehmenden Nutzer geprĂ€gt sind. Dazu entwickeln wir ein Feature-Modell, um SchlĂŒsselelemente einer Diskussion zu beschreiben. Dazu zĂ€hlen der AktivitĂ€tsgrad der Benutzer, die Wichtigkeit der einzelnen Aspekte, die Stimmung, die sie ausdrĂŒckt, und das Benutzerfeedback. ZunĂ€chst bauen wir unser Feature-Modell so auf, um bei Diskussionen gegensĂ€tzlicher politischer Kampagnen auf Twitter die oben genannten SchlĂŒsselelemente zu bestimmen. Der Schwerpunkt liegt dabei auf den sachlichen und emotionalen Aspekten der Themen im Bezug auf die Rollen verschiedener Nutzer. Anschließend erweitern wir unseren Ansatz auf Reddit-Diskussionen und nutzen das Community-Feedback, um einen neuen Begriff der Kontroverse zu definieren und Konversationsarchetypen hervorzuheben, die sich aus Interaktionsmustern ergeben. Wir nutzen unser Feature-Modell, um ein Logistischer Regression Verfahren zu entwickeln, das zukĂŒnftige Kontroversen in Reddit-Communities in den Themenbereichen Politik, Weltnachrichten, Sport und persönliche Beziehungen vorhersagen kann. Schlussendlich bietet unser Modell auch die Grundlage fĂŒr eine Vergleichbarkeit verschiedener Communities im Gesundheitsbereich, auch wenn dort die Themen und die NutzeraktivitĂ€t, trotz des gemeinsamen Gesamtfokus, erheblich variieren. In jedem der genannten Themenbereiche gibt unser Framework Erkenntnisgewinne, wie das Verhalten der Nutzer die spezifisch Definition von Kontroversen der Community prĂ€gt

    Measuring internet activity: a (selective) review of methods and metrics

    Get PDF
    Two Decades after the birth of the World Wide Web, more than two billion people around the world are Internet users. The digital landscape is littered with hints that the affordances of digital communications are being leveraged to transform life in profound and important ways. The reach and influence of digitally mediated activity grow by the day and touch upon all aspects of life, from health, education, and commerce to religion and governance. This trend demands that we seek answers to the biggest questions about how digitally mediated communication changes society and the role of different policies in helping or hindering the beneficial aspects of these changes. Yet despite the profusion of data the digital age has brought upon us—we now have access to a flood of information about the movements, relationships, purchasing decisions, interests, and intimate thoughts of people around the world—the distance between the great questions of the digital age and our understanding of the impact of digital communications on society remains large. A number of ongoing policy questions have emerged that beg for better empirical data and analyses upon which to base wider and more insightful perspectives on the mechanics of social, economic, and political life online. This paper seeks to describe the conceptual and practical impediments to measuring and understanding digital activity and highlights a sample of the many efforts to fill the gap between our incomplete understanding of digital life and the formidable policy questions related to developing a vibrant and healthy Internet that serves the public interest and contributes to human wellbeing. Our primary focus is on efforts to measure Internet activity, as we believe obtaining robust, accurate data is a necessary and valuable first step that will lead us closer to answering the vitally important questions of the digital realm. Even this step is challenging: the Internet is difficult to measure and monitor, and there is no simple aggregate measure of Internet activity—no GDP, no HDI. In the following section we present a framework for assessing efforts to document digital activity. The next three sections offer a summary and description of many of the ongoing projects that document digital activity, with two final sections devoted to discussion and conclusions

    On the Role of Social Identity and Cohesion in Characterizing Online Social Communities

    Get PDF
    Two prevailing theories for explaining social group or community structure are cohesion and identity. The social cohesion approach posits that social groups arise out of an aggregation of individuals that have mutual interpersonal attraction as they share common characteristics. These characteristics can range from common interests to kinship ties and from social values to ethnic backgrounds. In contrast, the social identity approach posits that an individual is likely to join a group based on an intrinsic self-evaluation at a cognitive or perceptual level. In other words group members typically share an awareness of a common category membership. In this work we seek to understand the role of these two contrasting theories in explaining the behavior and stability of social communities in Twitter. A specific focal point of our work is to understand the role of these theories in disparate contexts ranging from disaster response to socio-political activism. We extract social identity and social cohesion features-of-interest for large scale datasets of five real-world events and examine the effectiveness of such features in capturing behavioral characteristics and the stability of groups. We also propose a novel measure of social group sustainability based on the divergence in group discussion. Our main findings are: 1) Sharing of social identities (especially physical location) among group members has a positive impact on group sustainability, 2) Structural cohesion (represented by high group density and low average shortest path length) is a strong indicator of group sustainability, and 3) Event characteristics play a role in shaping group sustainability, as social groups in transient events behave differently from groups in events that last longer

    Dirichlet belief networks for topic structure learning

    Full text link
    Recently, considerable research effort has been devoted to developing deep architectures for topic models to learn topic structures. Although several deep models have been proposed to learn better topic proportions of documents, how to leverage the benefits of deep structures for learning word distributions of topics has not yet been rigorously studied. Here we propose a new multi-layer generative process on word distributions of topics, where each layer consists of a set of topics and each topic is drawn from a mixture of the topics of the layer above. As the topics in all layers can be directly interpreted by words, the proposed model is able to discover interpretable topic hierarchies. As a self-contained module, our model can be flexibly adapted to different kinds of topic models to improve their modelling accuracy and interpretability. Extensive experiments on text corpora demonstrate the advantages of the proposed model.Comment: accepted in NIPS 201

    Social media analytics: a survey of techniques, tools and platforms

    Get PDF
    This paper is written for (social science) researchers seeking to analyze the wealth of social media now available. It presents a comprehensive review of software tools for social networking media, wikis, really simple syndication feeds, blogs, newsgroups, chat and news feeds. For completeness, it also includes introductions to social media scraping, storage, data cleaning and sentiment analysis. Although principally a review, the paper also provides a methodology and a critique of social media tools. Analyzing social media, in particular Twitter feeds for sentiment analysis, has become a major research and business activity due to the availability of web-based application programming interfaces (APIs) provided by Twitter, Facebook and News services. This has led to an ‘explosion’ of data services, software tools for scraping and analysis and social media analytics platforms. It is also a research area undergoing rapid change and evolution due to commercial pressures and the potential for using social media data for computational (social science) research. Using a simple taxonomy, this paper provides a review of leading software tools and how to use them to scrape, cleanse and analyze the spectrum of social media. In addition, it discussed the requirement of an experimental computational environment for social media research and presents as an illustration the system architecture of a social media (analytics) platform built by University College London. The principal contribution of this paper is to provide an overview (including code fragments) for scientists seeking to utilize social media scraping and analytics either in their research or business. The data retrieval techniques that are presented in this paper are valid at the time of writing this paper (June 2014), but they are subject to change since social media data scraping APIs are rapidly changing

    Event-based Access to Historical Italian War Memoirs

    Full text link
    The progressive digitization of historical archives provides new, often domain specific, textual resources that report on facts and events which have happened in the past; among these, memoirs are a very common type of primary source. In this paper, we present an approach for extracting information from Italian historical war memoirs and turning it into structured knowledge. This is based on the semantic notions of events, participants and roles. We evaluate quantitatively each of the key-steps of our approach and provide a graph-based representation of the extracted knowledge, which allows to move between a Close and a Distant Reading of the collection.Comment: 23 pages, 6 figure
    • 

    corecore