35 research outputs found

    Reimagining Our World at Planetary Scale: The Big Data Future of Our Libraries


    Can we forecast conflict? A framework for forecasting global human societal behavior using latent narrative indicators

    The ability to successfully forecast impending societal unrest, from riots and protests to assassinations and coups, would fundamentally transform the ability of nations to proactively address instability around the world, intervening before unrest accelerates to conflict or prepositioning assets to enhance preventive activity. It would also enhance the ability of social scientists to quantitatively study the underpinnings of how and why grievances transition from agitated individuals to population-scale physical unrest. Recognizing this potential, the US government has funded research on “conflict early warning” and conflict forecasting for more than 40 years, and current unclassified approaches incorporate nearly every imaginable type of data, from telephone call records to traffic signals, tribal and cultural linkages to satellite imagery. Yet current approaches have yielded poor outcomes: one recent study showed that the top models of civil war onset miss 90% of the cases they supposedly explain. At the same time, emerging work in the economics disciplines is finding that new approaches, especially those based on latent linguistic indicators, can offer significant predictive power for future physical behavior. The information environment around us records not just factual information, but also a rich array of cultural and contextual influences that offer a window into national consciousness. A growing body of literature has shown that measuring the linguistic dimensions of this real-time consciousness can accurately forecast many broad social behaviors, ranging from box office sales to the stock market itself.
In fact, the United States intelligence community believes so strongly in the ability of surface-level indicators to forecast future physical unrest more successfully than current approaches that it now has an entire program devoted to such “Open Source Indicators.” Yet few studies have explored the application of these methods to forecasting non-economic human societal behavior, and those that have focus primarily on large-bore events such as militarized disputes, epidemics, and regime change. One reason for this is the lack of high-resolution cross-national longitudinal data on societal conflict equivalent to the daily indicators available in economics research. This dissertation therefore presents a novel framework for evaluating these new classes of latent-based forecasting measures on high-resolution geographically-enriched quantitative databases of human behavior. To demonstrate this framework, an archive of 4.7 million news articles totaling 1.3 billion words, consisting of the entirety of international news coverage from Agence France Presse, the Associated Press, and Xinhua over the last 30 years, is used to construct a database of more than 29 million global events in over 300 categories using the TABARI coding system and CAMEO event taxonomy, resulting in the largest event database created in the academic literature. The framework is then applied to examine the hypothesis of latent forecasting as a classification problem, demonstrating the ability of a simple example-based classifier not only to return potentially actionable forecasts from latent discourse indicators, but to quantitatively model the topical traces of the metanarratives that underlie them. The results of this dissertation demonstrate that this new framework provides a powerful new evaluative environment for exploring the emerging class of latent indicators and modeling approaches, and that even rudimentary classification-based models may have significant forecasting potential.
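
The "example-based classifier" mentioned above can be illustrated with a minimal nearest-neighbor sketch. This is not the dissertation's actual code; the feature representation (a window of daily event counts) and the labels are illustrative assumptions, standing in for the latent discourse indicators the framework actually evaluates.

```python
# Minimal sketch of an example-based (nearest-neighbor) classifier:
# a new window of daily event counts is labeled with the label of
# the most similar historical example window.
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length count vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def classify(window, examples):
    """examples: list of (event_count_vector, label) pairs."""
    best_label, best_dist = None, float("inf")
    for vector, label in examples:
        d = euclidean(window, vector)
        if d < best_dist:
            best_dist, best_label = d, label
    return best_label

# Hypothetical labeled history: rising protest counts preceded unrest.
examples = [
    ([2, 3, 5, 9, 14], "unrest"),  # escalating event counts
    ([4, 4, 3, 4, 4], "calm"),     # flat baseline
]

print(classify([3, 4, 6, 10, 12], examples))  # → "unrest"
```

A production variant would use many more examples, normalized features, and k > 1 neighbors, but the core idea, forecasting by analogy to labeled historical traces, is the same.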

    The sound of revolution: BBC monitoring and the Hungarian uprising, 1956

    Radio played a vitally important role during the 1956 Hungarian uprising: as an information service, diplomatic interlocutor, and cultural mediator. Broadcasters and the authorities that stood behind them on both sides of the Iron Curtain mapped, interpreted and, at times, appeared to influence the course of events on the ground. The BBC Monitoring Service Transcription Collection offers an essential and unexplored perspective on the mediated experience of the Hungarian uprising, in the context of the wider political warfare battle of the Cold War.

    State Control and the Effects of Foreign Relations on Bilateral Trade

    Do states use trade to reward and punish partners? WTO rules and the pressures of globalization restrict states’ capacity to manipulate trade policies, but we argue that governments can link political goals with economic outcomes using less direct avenues of influence over firm behavior. Where governments intervene in markets, politicization of trade is likely to occur. In this paper, we examine one important form of government control: state ownership of firms. Taking China and India as examples, we use bilateral trade data by firm ownership type, as well as measures of bilateral political relations based on diplomatic events and UN voting, to estimate the effect of political relations on import and export flows. Our results support the hypothesis that imports controlled by state-owned enterprises (SOEs) exhibit stronger responsiveness to political relations than imports controlled by private enterprises. A more nuanced picture emerges for exports; while India’s exports through SOEs are more responsive to political tensions than its flows through private entities, the opposite is true for China. This research holds broader implications for how we should think about the relationship between political and economic relations going forward, especially as a number of countries with partially state-controlled economies gain strength in the global economy.

    Morning Keynote Address

    What happens when massive computing power brings together an ever-growing cross-section of the world’s information in realtime, from news media to social media, books to academic literature, the world’s libraries to the web itself, machine translates all of that material as it arrives, and applies a vast array of algorithms to identify the events and emotions, actors and narratives and their myriad connections that define the planet to create a living silicon replica of global society? The GDELT Project (http://gdeltproject.org/), supported by Alphabet’s Jigsaw (formerly Google Ideas), is the largest open data initiative in the world focusing on cataloging and modeling global human society, offering a first glimpse at what this emerging “big data” understanding of society looks like. Operating the world’s largest open deployments of streaming machine translation, sentiment analysis, geocoding, image analysis and event identification, coupled with perhaps the world’s largest program to catalog local media, the GDELT Project monitors worldwide news media, emphasizing small local outlets, live machine translating all coverage it monitors in 65 languages, flagging mentions of people and organizations, cataloging relevant imagery, video, and social posts, converting textual mentions of location to mappable geographic coordinates, identifying millions of themes and thousands of emotions, extracting over 300 categories of physical events, collaborating with the Internet Archive to preserve online news and making all of this available in a free open data firehose of human society. This is coupled with a massive socio-cultural contextualization dataset codified from more than 21 billion words of academic literature spanning most unclassified US Government publications, the open web, and more than 2,200 journals representing the majority of humanities and social sciences research on Africa and the Middle East over the last half century. 
The world’s largest open deep learning image cataloging initiative, totaling more than a quarter billion images, inventories the world’s news imagery in realtime, identifying the objects, activities, locations, words and emotions defining the world’s myriad visual narratives and allowing them for the first time to be explored alongside traditional textual narratives. Used by governments, NGOs, scholars, journalists, and ordinary citizens across the world to identify breaking situations, map evolving conflicts, model the undercurrents of unrest, explore the flow of ideas and narratives across borders, and even forecast future unrest, the GDELT Project constructs a realtime global catalog of behavior and beliefs across every country, connecting the world’s information into a single massive ever-evolving realtime network capturing what’s happening around the world, what its context is and who’s involved, and how the world is feeling about it, every single day. Here’s what it looks like to conduct data analytics at a truly planetary scale, the incredible new insights we gain about the daily heartbeat of our global world, and what we can learn about the role of libraries in our big data future.

    World of Wikipedia

    See history unfold through Wikipedia across space and time. This animation sequence shows the view of world history 1800-2012 captured by the English-language Wikipedia. Every mention of a location or date anywhere in any article across all four million Wikipedia articles was extracted, and each location was connected to the closest date to place it on a map. All locations mentioned in an article together with the same year are connected together. Thus, what you see is what Wikipedia had to say about each year from 1800 to the present: which locations were mentioned the most and which locations were mentioned alongside each other, showing our evolving, ever-connected world. Learn more about the project at http://www.sgi.com/go/wikipedia/
    Intensity: This sequence renders the intensity of global connections. Every connection between every pair of cities each year is displayed, using transparency so that areas with few links are dim, while areas with many links are bright.
    Tone: This sequence colors each city and link by the average tone of articles mentioning it, from bright red (highly negative) to bright green (highly positive). To make the map clearer, only major links are displayed.
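
The red-to-green tone rendering described above can be sketched as a simple linear color map. This is an illustrative reconstruction, not the project's actual code; the tone range of [-10, 10] is an assumption chosen for the example.

```python
# Sketch of the "Tone" coloring: map an average-tone score to an RGB
# color, bright red for highly negative through bright green for
# highly positive. The [-10, 10] tone range is an assumed scale.
def tone_to_rgb(tone, lo=-10.0, hi=10.0):
    # Normalize tone into [0, 1], clamping values outside the range.
    t = max(0.0, min(1.0, (tone - lo) / (hi - lo)))
    red = int(round(255 * (1 - t)))   # negative tone -> more red
    green = int(round(255 * t))       # positive tone -> more green
    return (red, green, 0)

print(tone_to_rgb(-10))  # (255, 0, 0)   bright red
print(tone_to_rgb(0))    # (128, 128, 0) neutral
print(tone_to_rgb(10))   # (0, 255, 0)   bright green
```

A real renderer would also scale link transparency by connection count, as in the "Intensity" sequence, but the per-tone color lookup is the core of the effect.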

    Mass book digitization: The deeper story of Google Books and the Open Content Alliance

    The Google Books and Open Content Alliance (OCA) initiatives have become the poster children of the access digitization revolution. With their sights firmly set on creating digital copies of millions upon millions of books and making them available to the world for free, the two projects have captured the popular imagination. Yet such scale comes at a price, and certain sacrifices must be made to achieve this volume. Given its greater visibility, most studies have focused on Google Books, addressing limitations of its image and metadata quality. Yet there has been surprisingly little comparative work on the two endeavors, exploring the relationship between these two peers and their deeper similarities, rather than their obvious surface differences. While the academic community has lauded OCA's "open" model and condemned the proprietary Google, all is not always as it seems. Upon delving deeper into the underpinnings of both projects, we find Google achieves greater transparency in many regards, while OCA's operational reality is more proprietary than often thought. Further, significant concerns are raised about the long-term sustainability of the OCA rights model, its metadata management, and its transparency that must be addressed as OCA moves forward.