51 research outputs found

    Human-agent collectives

    No full text
    We live in a world where a host of computer systems, distributed throughout our physical and information environments, are increasingly implicated in our everyday actions. Computer technologies impact all aspects of our lives, and our relationship with the digital has been fundamentally altered as computers have moved out of the workplace and away from the desktop. Networked computers, tablets, phones and personal devices are now commonplace, as are an increasingly diverse set of digital devices built into the world around us. Data and information are generated at unprecedented speeds and volumes from an increasingly diverse range of sources, and are then combined in unforeseen ways, limited only by human imagination. People’s activities and collaborations are becoming ever more dependent upon, and intertwined with, this ubiquitous information substrate. As these trends continue apace, it is becoming apparent that many endeavours involve the symbiotic interleaving of humans and computers. Moreover, the emergence of these close-knit partnerships is inducing profound change. Rather than issuing instructions to passive machines that wait until they are asked before doing anything, we will work in tandem with highly interconnected computational components that act autonomously and intelligently (i.e. agents). As a consequence, greater attention needs to be given to the balance of control between people and machines. In many situations, humans will be in charge and agents will predominantly act in a supporting role; in other cases, the agents will be in control and humans will play the supporting role. We term this emerging class of systems human-agent collectives (HACs) to reflect the close partnership and the flexible social interactions between the humans and the computers. As well as exhibiting increased autonomy, such systems will be inherently open and social. This means the participants will need to continually and flexibly establish and manage a range of social relationships. Thus, depending on the task at hand, different constellations of people, resources, and information will need to come together, operate in a coordinated fashion, and then disband. The openness and the presence of many distinct stakeholders mean that participation will be motivated by a broad range of incentives rather than by diktat. This article outlines the key research challenges involved in developing a comprehensive understanding of HACs. To illuminate this agenda, a nascent application in the domain of disaster response is presented.

    Web-scale provenance reconstruction of implicit information diffusion on social media

    Get PDF
    Fast, massive, and viral data diffused on social media affects a large share of the online population, and thus the (prospective) information diffusion mechanisms behind it are of great interest to researchers. The (retrospective) provenance of such data is equally important, because it contributes to the understanding of the relevance and trustworthiness of the information. Furthermore, computing provenance in a timely way is crucial for particular use cases and practitioners, such as online journalists who need to promptly assess specific pieces of information. Social media currently provide insufficient mechanisms for provenance tracking, publication and generation, while the state of the art in social media research focuses mainly on explicit diffusion mechanisms (like retweets on Twitter or reshares on Facebook). The implicit diffusion mechanisms remain understudied because they are difficult to capture and properly understand. On the technical side, the state of the art for provenance reconstruction evaluates small datasets after the fact, sidestepping the scale and speed requirements of current social media data. In this paper, we investigate the mechanisms of implicit information diffusion by computing its fine-grained provenance. We show that explicit mechanisms are insufficient to capture influence, and our analysis unravels a significant part of the implicit interactions and influence in social media. Our approach works incrementally and can be scaled up to cover truly Web-scale scenarios such as major events. We can process datasets consisting of up to several million messages on a single machine, at rates that cover bursty behaviour, without compromising result quality. In doing so, we provide online journalists, and social media users in general, with fine-grained provenance reconstruction that sheds light on implicit interactions not captured by social media providers. These results are provided in an online fashion, which also allows for fast relevance and trustworthiness assessment.
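
    The following is a minimal sketch of the general idea of incremental provenance reconstruction for implicit diffusion: each incoming message is linked to earlier messages it plausibly derives from, without relying on explicit retweet or reshare metadata. The shingle-overlap heuristic, the threshold, and the message identifiers are illustrative assumptions, not the method actually used in the paper.

```python
from collections import defaultdict

def shingles(text, k=3):
    """Crude similarity signal: the set of word k-grams in a message."""
    words = text.lower().split()
    return {tuple(words[i:i + k]) for i in range(max(len(words) - k + 1, 1))}

class IncrementalProvenance:
    """Link each incoming message to earlier messages it plausibly derives from
    (implicit diffusion), updating the provenance edges one message at a time."""

    def __init__(self, threshold=0.4):
        self.threshold = threshold
        self.index = defaultdict(set)   # shingle -> ids of earlier messages containing it
        self.signatures = {}            # message id -> shingle set
        self.edges = []                 # (ancestor_id, derived_id) provenance edges

    def process(self, msg_id, text):
        sig = shingles(text)
        # Candidate ancestors: earlier messages sharing at least one shingle.
        candidates = set()
        for s in sig:
            candidates |= self.index.get(s, set())
        for cand in candidates:
            overlap = len(sig & self.signatures[cand]) / len(sig | self.signatures[cand])
            if overlap >= self.threshold:
                self.edges.append((cand, msg_id))
        # Index the new message so later arrivals can be linked back to it.
        self.signatures[msg_id] = sig
        for s in sig:
            self.index[s].add(msg_id)

prov = IncrementalProvenance()
prov.process("m1", "major flooding reported along the river after heavy overnight rain")
prov.process("m2", "heavy overnight rain causes major flooding reported along the river")
print(prov.edges)   # [('m1', 'm2')]: m2 is likely an implicit derivation of m1
```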

    Enabling automatic provenance-based trust assessment of web content

    Get PDF

    Provenance: from long-term preservation to query federation and grid reasoning

    Get PDF

    High-Fidelity Provenance: Exploring the Intersection of Provenance and Security

    Get PDF
    In the past 25 years, the World Wide Web has disrupted the way news is disseminated and consumed. However, the euphoria for the democratization of news publishing was soon followed by scepticism, as a new phenomenon emerged: fake news. With no gatekeepers to vouch for it, the veracity of the information served over the World Wide Web became a major public concern. The Reuters Digital News Report 2020 cites that in at least half of the EU member countries, 50% or more of the population is concerned about online fake news. To help address the problem of trust in information communicated over the World Wide Web, it has been proposed to also make available the provenance metadata of the information. Similar to artwork provenance, this would include a detailed track of how the information was created, updated and propagated to produce the result we read, as well as what agents, human or software, were involved in the process. However, keeping track of provenance information is a non-trivial task. Current approaches are often of limited scope and may require modifying existing applications to also generate provenance information along with their regular output. This thesis explores how provenance can be automatically tracked in an application-agnostic manner, without having to modify the individual applications. We frame provenance capture as a data flow analysis problem and explore the use of dynamic taint analysis in this context. Our work shows that this approach improves on the quality of provenance captured compared to traditional approaches, yielding what we term high-fidelity provenance. We explore the performance cost of this approach and use deterministic record and replay to bring it down to a more practical level. Furthermore, we create and present the tooling necessary for expanding the use of deterministic record and replay for provenance analysis. The thesis concludes with an application of high-fidelity provenance as a tool for state-of-the-art offensive security analysis, based on the intuition that software, too, can be misguided by "fake news". This demonstrates that the potential uses of high-fidelity provenance for security extend beyond traditional forensic analysis.
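
    As a rough illustration of the underlying idea of taint-based provenance capture, rather than of the thesis's actual application-agnostic instrumentation, the sketch below shows how labels attached to values can be propagated through operations so that any derived value carries the set of sources it was computed from; the class, labels and example strings are hypothetical.

```python
class Tainted:
    """A value paired with the set of source labels it was derived from."""

    def __init__(self, value, labels):
        self.value = value
        self.labels = frozenset(labels)

    def __add__(self, other):
        # A value derived from tainted inputs inherits the union of their labels.
        other_value = other.value if isinstance(other, Tainted) else other
        other_labels = other.labels if isinstance(other, Tainted) else frozenset()
        return Tainted(self.value + other_value, self.labels | other_labels)

    def __repr__(self):
        return f"Tainted({self.value!r}, sources={sorted(self.labels)})"

# Two inputs from distinct sources; the derived article records both of them.
headline = Tainted("Quake hits city. ", {"feed:reuters"})
comment  = Tainted("Magnitude revised to 6.1.", {"user:alice"})
article  = headline + comment
print(article)
# Tainted('Quake hits city. Magnitude revised to 6.1.', sources=['feed:reuters', 'user:alice'])
```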

    Abstracting PROV provenance graphs: A validity-preserving approach

    Get PDF
    Data provenance is a structured form of metadata designed to record the activities and datasets involved in data production, as well as their dependency relationships. The PROV data model, released by the W3C in 2013, defines a schema and constraints that together provide a structural and semantic foundation for provenance. This enables the interoperable exchange of provenance between data producers and consumers. When the provenance content is sensitive and subject to disclosure restrictions, however, a principled way of hiding parts of the provenance before communicating it to certain parties is required. In this paper we present a provenance abstraction operator that achieves this goal. It maps a graphical representation of a PROV document, PG1, to a new abstract version, PG2, ensuring that (i) PG2 is a valid PROV graph, and (ii) the dependencies that appear in PG2 are justified by those that appear in PG1. These two properties ensure that further abstraction of abstract PROV graphs is possible. A guiding principle of the work is that of minimum damage: the resultant graph is altered as little as possible, while ensuring that the two properties are maintained. The operator is implemented as part of a user tool, described in a separate paper, that lets owners of sensitive provenance information control the abstraction by specifying an abstraction policy.
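
    A toy sketch of the grouping-and-redirection idea behind such an abstraction operator is shown below: a set of sensitive nodes is collapsed into one abstract node, and only boundary-crossing dependencies are kept, so every edge in the abstract graph is backed by an edge in the original. The actual operator additionally enforces PROV's typing and validity constraints, which this illustration (with made-up node names) ignores.

```python
def abstract_nodes(edges, group, new_node):
    """Collapse the nodes in `group` into a single abstract node.
    An edge (a, b) means that b was produced from a. Edges crossing the group
    boundary are redirected to `new_node`; edges entirely inside the group vanish."""
    abstracted = set()
    for a, b in edges:
        a2 = new_node if a in group else a
        b2 = new_node if b in group else b
        if a2 != b2:
            abstracted.add((a2, b2))
    return abstracted

original = {("raw_data", "clean_step"), ("clean_step", "interim"),
            ("interim", "analysis"), ("analysis", "report")}
# Hide the sensitive middle of the pipeline behind one abstract node.
print(abstract_nodes(original, {"clean_step", "interim", "analysis"}, "processing"))
# {('raw_data', 'processing'), ('processing', 'report')}  (set order may vary)
```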

    Big Data Analytics in Static and Streaming Provenance

    Get PDF
    Thesis (Ph.D.), Indiana University, Informatics and Computing, 2016. With recent technological and computational advances, scientists increasingly integrate sensors and model simulations to understand spatial, temporal, social, and ecological relationships at unprecedented scale. Data provenance traces the relationships of entities over time, thus providing a unique view on the over-time behavior under study. However, provenance can be overwhelming in both volume and complexity, and the forecasting potential of provenance now creates additional demands. This dissertation focuses on Big Data analytics of static and streaming provenance. It develops filters and a non-preprocessing slicing technique for in-situ querying of static provenance. It presents a stream processing framework for online processing of provenance data at high receiving rates. While the former is sufficient for answering queries that are given prior to the application start (forward queries), the latter deals with queries whose targets are unknown beforehand (backward queries). Finally, it explores data mining on large collections of provenance and proposes a temporal representation of provenance that can reduce the high dimensionality while effectively supporting mining tasks like clustering, classification and association rule mining; the temporal representation can be further applied to streaming provenance as well. The proposed techniques are verified through software prototypes applied to Big Data provenance captured from computer network data, weather models, ocean models, remote (satellite) imagery data, and agent-based simulations of agricultural decision making.
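
    To make the forward-query case concrete, here is a small, hypothetical sketch of answering a forward provenance query ("what was derived from X?") over a stream of dependency edges, updating the answer as edges arrive rather than materializing the whole graph; it is not the framework developed in the dissertation, and the entity names are invented.

```python
from collections import defaultdict

class StreamingForwardQuery:
    """Maintain the answer to a forward provenance query online: everything
    (transitively) derived from a known source, updated edge by edge."""

    def __init__(self, source):
        self.descendants = {source}       # nodes known to derive from the source
        self.pending = defaultdict(set)   # ancestor -> derived nodes seen out of order

    def on_edge(self, ancestor, derived):
        """Called for each streamed edge: `derived` was generated using `ancestor`."""
        if ancestor in self.descendants:
            self._mark(derived)
        else:
            self.pending[ancestor].add(derived)

    def _mark(self, node):
        if node in self.descendants:
            return
        self.descendants.add(node)
        # Flush edges that arrived before this node was known to be a descendant.
        for child in self.pending.pop(node, ()):
            self._mark(child)

q = StreamingForwardQuery("sensor_reading_42")
for edge in [("sensor_reading_42", "calibrated_42"), ("calibrated_42", "daily_mean"),
             ("other_input", "daily_mean"), ("daily_mean", "forecast_plot")]:
    q.on_edge(*edge)
print(q.descendants)   # includes calibrated_42, daily_mean and forecast_plot
```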

    Automating interpretations of trustworthiness

    Get PDF

    Yavaa: supporting data workflows from discovery to visualization

    Get PDF
    Recent years have witnessed an increasing number of data silos being opened up, both within organizations and to the general public: scientists publish their raw data as supplements to articles, or even as standalone artifacts, to enable others to verify and extend their work. Governments pass laws to open up formerly protected data treasures to improve accountability and transparency, as well as to enable new business ideas based on this public good. Even companies share structured information about their products and services to advertise their use and thus increase revenue. Exploiting this wealth of information holds many challenges for users, though. Oftentimes data is provided as tables whose seemingly endless rows of daunting numbers are barely accessible. Information visualization (InfoVis) can mitigate this gap. However, the visualization options offered are generally very limited, and next to no support is given in applying any of them. The same holds true for data wrangling: only very few options exist to adjust the data to the current needs, and barely any safeguards are in place to prevent even the most obvious mistakes. When it comes to data from multiple providers, the situation gets even bleaker. Only recently have tools emerged that reasonably support searching for datasets across institutional borders; easy-to-use ways to combine these datasets are still missing, though. Finally, results generally lack proper documentation of their provenance, so even the most compelling visualizations can be called into question when how they came about remains unclear. The foundations for a vivid exchange and exploitation of open data are set, but the barrier to entry remains relatively high, especially for non-expert users. This thesis aims to lower that barrier by providing tools and assistance that reduce the amount of prior experience and skill required. It covers the whole workflow, ranging from identifying suitable datasets, over possible transformations, up to the export of the result in the form of suitable visualizations.
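
    The sketch below illustrates the kind of step-by-step provenance documentation such a workflow tool could attach to its results, so that an exported visualization carries a machine-readable account of the datasets and transformations behind it; the class and the dataset and field names are illustrative and do not reflect Yavaa's actual data model.

```python
import json
from datetime import datetime, timezone

class Workflow:
    """Record every step of a data workflow so the final chart ships with
    a machine-readable account of how it came about."""

    def __init__(self):
        self.steps = []

    def step(self, action, **details):
        # Each step is timestamped and keeps its parameters for later inspection.
        self.steps.append({"action": action,
                           "at": datetime.now(timezone.utc).isoformat(),
                           **details})

    def export_provenance(self):
        return json.dumps(self.steps, indent=2)

wf = Workflow()
wf.step("discover", dataset="eu_unemployment_2010_2020", source="open-data portal")
wf.step("transform", operation="filter", column="country", value="DE")
wf.step("join", other="eu_gdp_2010_2020", on="year")
wf.step("visualize", chart="line", x="year", y=["unemployment", "gdp"])
print(wf.export_provenance())
```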