Dynamic Influence Networks for Rule-based Models
We introduce the Dynamic Influence Network (DIN), a novel visual analytics
technique for representing and analyzing rule-based models of protein-protein
interaction networks. Rule-based modeling has proved instrumental in developing
biological models that are concise, comprehensible, easily extensible, and that
mitigate the combinatorial complexity of multi-state and multi-component
biological molecules. Our technique visualizes the dynamics of these rules as
they evolve over time. Using the data produced by KaSim, an open source
stochastic simulator of rule-based models written in the Kappa language, DINs
provide a node-link diagram that represents the influence that each rule has on
the other rules. That is, rather than representing individual biological
components or types, we instead represent the rules about them (as nodes) and
the current influence of these rules (as links). Using our interactive DIN-Viz
software tool, researchers are able to query this dynamic network to find
meaningful patterns about biological processes, and to identify salient aspects
of complex rule-based models. To evaluate the effectiveness of our approach, we
investigate a simulation of a circadian clock model that illustrates the
oscillatory behavior of the KaiC protein phosphorylation cycle. Comment: Accepted to TVCG, in press.
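The core idea of a DIN, rules as nodes and time-varying influence as weighted links, can be sketched in a few lines. The class below is an illustrative simplification (the names, the event-recording interface, and the update scheme are our assumptions, not KaSim's actual output format):

```python
from collections import defaultdict

class DynamicInfluenceNetwork:
    """Sketch of a DIN: nodes are rules; directed, weighted edges record
    how strongly one rule's firing changes another rule's activity.
    Illustrative only -- not KaSim's data format."""

    def __init__(self, rules):
        self.rules = list(rules)
        # influence[a][b] = accumulated influence of rule a on rule b
        self.influence = defaultdict(lambda: defaultdict(float))

    def record_event(self, fired_rule, activity_deltas):
        # activity_deltas: {rule: change in that rule's activity caused by the firing}
        for rule, delta in activity_deltas.items():
            self.influence[fired_rule][rule] += delta

    def snapshot(self, threshold=0.0):
        # node-link view at the current time: keep only salient links
        return [(a, b, w)
                for a, links in self.influence.items()
                for b, w in links.items()
                if abs(w) > threshold]

# hypothetical rules from a phosphorylation model
din = DynamicInfluenceNetwork(["phosphorylate", "dephosphorylate", "bind"])
din.record_event("phosphorylate", {"dephosphorylate": 0.4, "bind": -0.1})
din.record_event("phosphorylate", {"dephosphorylate": 0.2})
print(din.snapshot(threshold=0.15))
```

Thresholding the snapshot mirrors how a tool like DIN-Viz might surface only the salient links of a complex model at a given moment.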
Driver-centric Risk Object Identification
A massive number of traffic fatalities are due to driver errors. To reduce
fatalities, there is an urgent need for intelligent driving systems that
assist drivers in identifying potential risks. Risky situations are generally
defined based on collision prediction in existing research. However, collisions
are only one type of risk in traffic scenarios. We believe a more generic
definition is required. In this work, we propose a novel driver-centric
definition of risk, i.e., risky objects influence driver behavior. Based on
this definition, a new task called risk object identification is introduced. We
formulate the task as a cause-effect problem and present a novel two-stage risk
object identification framework, taking inspiration from models of situation
awareness and causal inference. A driver-centric Risk Object Identification
(ROI) dataset is curated to evaluate the proposed system. We demonstrate
state-of-the-art risk object identification performance compared with strong
baselines on the ROI dataset. In addition, we conduct extensive ablative
studies to justify our design choices. Comment: Submitted to TPAMI.
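The cause-effect formulation can be illustrated with a simple intervention: remove each object from the scene in turn and see which removal most changes the predicted driver behavior. The function and toy model below are a hypothetical sketch of that idea, not the paper's actual two-stage architecture:

```python
def identify_risk_object(scene_objects, driver_model, full_scene_score):
    """Causal-intervention sketch (hypothetical interface): driver_model maps
    a list of objects to a predicted 'go' confidence in [0, 1]. The risk
    object is the one whose removal most increases that confidence."""
    best_obj, best_gain = None, 0.0
    for obj in scene_objects:
        counterfactual = [o for o in scene_objects if o != obj]
        gain = driver_model(counterfactual) - full_scene_score
        if gain > best_gain:
            best_obj, best_gain = obj, gain
    return best_obj, best_gain

def toy_model(objects):
    # stand-in for a learned model: each object suppresses 'go' confidence
    penalty = {"pedestrian": 0.6, "parked_car": 0.1, "sign": 0.05}
    return max(0.0, 1.0 - sum(penalty.get(o, 0.0) for o in objects))

scene = ["pedestrian", "parked_car", "sign"]
obj, gain = identify_risk_object(scene, toy_model, toy_model(scene))
print(obj)  # the object whose removal most restores 'go' confidence
```

The object whose removal most restores the driver's predicted behavior is, under this definition, the one that most influenced the driver, i.e., the risk object.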
Estimating Emotion Contagion on Social Media via Localized Diffusion in Dynamic Graphs
We present a computational approach for estimating emotion contagion on
social media networks. Built on a foundation of psychology literature, our
approach estimates the degree to which the perceivers' emotional states
(positive or negative) start to match those of the expressors, based on the
latter's content. We use a combination of deep learning and social network
analysis to model emotion contagion as a diffusion process in dynamic social
network graphs, taking into consideration key aspects like causality,
homophily, and interference. We evaluate our approach on user behavior data
obtained from a popular social media platform for sharing short videos. We
analyze the behavior of 48 users over a span of 8 weeks (over 200k audio-visual
short posts analyzed) and estimate how contagious the users they engage with
are on social media. As per the theory of diffusion, we account for
the videos a user watches during this time (inflow) and the daily engagements,
i.e., liking, sharing, downloading, or creating new videos (outflow), to estimate
contagion. To validate our approach and analysis, we obtain human feedback on
these 48 social media platform users through an online study, collecting
responses from about 150 participants. We report that users who interact with
a greater number of creators on the platform are 12% less prone to contagion,
and those who consume more content of `negative' sentiment are 23% more prone
to contagion. We will publicly release our code upon acceptance.
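The inflow/outflow accounting can be made concrete with a toy estimator. The sketch below is our simplification, not the paper's learned diffusion model: it scores contagion as the fraction of days on which a user's expressed sentiment (outflow) shares the sign of the sentiment they consumed (inflow):

```python
def estimate_contagion(inflow_sentiments, outflow_sentiments):
    """Toy contagion estimate (illustrative, not the paper's model):
    the fraction of days on which the perceiver's expressed sentiment
    matches the sign of the consumed (expressor) sentiment."""
    matches = 0
    days = list(zip(inflow_sentiments, outflow_sentiments))
    for inflow, outflow in days:
        # a state match: expression and consumption share the same sign
        if inflow * outflow > 0:
            matches += 1
    return matches / len(days)

# daily mean sentiment in [-1, 1]: inflow = watched content, outflow = posts/engagements
inflow  = [0.5, -0.3, -0.6, 0.2]
outflow = [0.4, -0.1,  0.2, 0.3]
print(estimate_contagion(inflow, outflow))  # 0.75
```

A real diffusion model would additionally weight each edge of the dynamic social graph and control for causality, homophily, and interference, as the abstract describes.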
The Origin of Data: Enabling the Determination of Provenance in Multi-institutional Scientific Systems through the Documentation of Processes
The Oxford English Dictionary defines provenance as (i) the fact of coming from some particular source or quarter; origin, derivation. (ii) the history or pedigree of a work of art, manuscript, rare book, etc.; concr., a record of the ultimate derivation and passage of an item through its various owners. In art, knowing the provenance of an artwork lends weight and authority to it while providing a context for curators and the public to understand and appreciate the work’s value. Without such a documented history, the work may be misunderstood, unappreciated, or undervalued.

In computer systems, knowing the provenance of digital objects would provide them with greater weight, authority, and context just as it does for works of art. Specifically, if the provenance of digital objects could be determined, then users could understand how documents were produced, how simulation results were generated, and why decisions were made. Provenance is of particular importance in science, where experimental results are reused, reproduced, and verified. However, science is increasingly being done through large-scale collaborations that span multiple institutions, which makes the problem of determining the provenance of scientific results significantly harder. Current approaches to this problem are not designed specifically for multi-institutional scientific systems and their evolution towards more dynamic and peer-to-peer topologies. Therefore, this thesis advocates a new approach, namely, that through the autonomous creation, scalable recording, and principled organisation of documentation of systems’ processes, the determination of the provenance of results produced by complex multi-institutional scientific systems is enabled.

The dissertation makes four contributions to the state of the art. First is the idea that provenance is a query performed over documentation of a system’s past process. Thus, the problem is one of how to collect and collate documentation from multiple distributed sources and organise it in a manner that enables the provenance of a digital object to be determined. Second is an open, generic, shared, principled data model for documentation of processes, which enables its collation so that it provides high-quality evidence that a system’s processes occurred. Once documentation has been created, it is recorded into specialised repositories called provenance stores using a formally specified protocol, which ensures documentation has high-quality characteristics. Furthermore, patterns and techniques are given to permit the distributed deployment of provenance stores. The protocol and patterns are the third contribution. The fourth contribution is a characterisation of the use of documentation of process to answer questions related to the provenance of digital objects and the impact recording has on application performance. Specifically, in the context of a bioinformatics case study, it is shown that six different provenance use cases are answered given an overhead of 13% on experiment run-time. Beyond the case study, the solution has been applied to other applications, including fault tolerance in service-oriented systems, aerospace engineering, and organ transplant management.
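The thesis's first contribution, provenance as a query over documentation of past process, can be sketched minimally: record one assertion per process step, then answer a provenance query by walking the documentation backwards from a result to its ultimate sources. The class, record fields, and example pipeline below are illustrative assumptions, not the thesis's actual data model or protocol:

```python
class ProvenanceStore:
    """Minimal sketch of the idea that provenance is a query over recorded
    documentation of process. Record names and fields are illustrative."""

    def __init__(self):
        # each assertion documents one process step: output <- process(inputs)
        self.assertions = []

    def record(self, process, inputs, output):
        self.assertions.append({"process": process,
                                "inputs": list(inputs),
                                "output": output})

    def provenance(self, item):
        """Walk documentation backwards from an item to its ultimate sources."""
        history, frontier = [], [item]
        while frontier:
            current = frontier.pop()
            for a in self.assertions:
                if a["output"] == current:
                    history.append(a)
                    frontier.extend(a["inputs"])
        return history

# toy bioinformatics-style pipeline (hypothetical step names)
store = ProvenanceStore()
store.record("sequence_alignment", ["raw_reads"], "alignment")
store.record("variant_calling", ["alignment", "reference"], "variants")
print([a["process"] for a in store.provenance("variants")])
```

In a multi-institutional setting, the assertions would come from many distributed provenance stores, which is why the thesis's shared data model and recording protocol matter: the backward walk only works if documentation from different institutions collates cleanly.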
Investigating system intrusions with data provenance analytics
To aid threat detection and investigation, enterprises are increasingly relying on commercially available security solutions, such as Intrusion Detection Systems (IDS) and Endpoint Detection and Response (EDR) tools. These security solutions first collect and analyze audit logs throughout the enterprise and then generate threat alerts when suspicious activities occur. Later, security analysts investigate those threat alerts to separate false alarms from true attacks by extracting contextual history from the audit logs, i.e., the trail of events that caused the threat alert.
Unfortunately, investigating threats in enterprises is a notoriously difficult task, even for expert analysts, due to two main challenges. First, existing enterprise security solutions are optimized to miss as few threats as possible – as a result, they generate an overwhelming volume of false alerts, creating a backlog of investigation tasks. Second, modern computing systems are operationally complex and produce an enormous volume of audit logs per day, making it difficult to correlate events for threats that span multiple processes, applications, and hosts.
In this dissertation, I propose leveraging data provenance analytics to address the challenges mentioned above. I present five provenance-based techniques that enable system defenders to effectively and efficiently investigate malicious behaviors in enterprise settings. First, I present NoDoze, an alert triage system that automatically prioritizes generated alerts based on their anomalous contextual history. Following that, RapSheet brings the benefits of data provenance to commercial EDR tools and provides compact visualization of multi-stage attacks to system defenders. Swift then realizes a provenance graph database that generates contextual history around generated alerts in real time, even when analyzing audit logs containing tens of millions of events. Finally, OmegaLog and Zeek Agent introduce the vision of universal provenance analysis, which unifies all forensically relevant provenance information on the system regardless of its layer of origin, improving investigation capabilities.
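The common primitive behind these systems, extracting an alert's contextual history from audit logs, is a backward trace over a provenance graph. The sketch below is a generic illustration of that primitive (the triple-based event format and the example log are our assumptions, not the format used by NoDoze or RapSheet):

```python
from collections import defaultdict

def contextual_history(events, alert_entity):
    """Backward provenance tracing sketch: from the entity that raised an
    alert, collect the trail of audit events that causally led to it.
    Event format is hypothetical: (source, action, target)."""
    parents = defaultdict(list)  # target -> events that produced/affected it
    for src, action, dst in events:
        parents[dst].append((src, action, dst))

    trail, frontier, seen = [], [alert_entity], set()
    while frontier:
        entity = frontier.pop()
        if entity in seen:
            continue  # avoid re-expanding shared ancestors
        seen.add(entity)
        for src, action, dst in parents[entity]:
            trail.append((src, action, dst))
            frontier.append(src)
    return trail

# toy audit log of a multi-step intrusion (hypothetical names)
audit_log = [
    ("firefox.exe", "wrote", "payload.bin"),
    ("payload.bin", "spawned", "malware.exe"),
    ("malware.exe", "connected", "evil.example.com"),
    ("explorer.exe", "spawned", "firefox.exe"),
]
# suppose an alert fired on the suspicious outbound connection
print(contextual_history(audit_log, "evil.example.com"))
```

A triage system in the spirit of NoDoze would then score how anomalous each event in this trail is, so that alerts with mundane contextual histories can be deprioritized.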