9,484 research outputs found
Snapshot Processing in Streaming Environments
Computational issues related to streaming data, and in particular the monitoring and rapid correlation of multiple sources of streaming data, are becoming increasingly important in contexts ranging from business processes to crisis detection. For example, a government system to detect bioterror attacks must correlate multiple streams of possibly low-confidence data from sensors and local and national public health information networks with cues from indicators such as news and government sources indicating geographical locations, tactics and timing of possible attacks. The results of this correlation trigger appropriate responses, such as flagging information for more in-depth analysis or sending alerts to public health officials. Monitoring and correlation applications of this type are ideal for deployment on distributed computing grids, because they have high transaction throughput, require low latency, and can be partitioned into sets of small communicating computations with regular communication patterns. An important consideration in these applications is the need to ensure that, at any given time, computations are carried out on an accurate - or at least close to accurate - picture of the environment being monitored. One way of doing this, which we call snapshot processing, is to treat collections of events that occur at approximately the same time as representing a global snapshot - a valid state - of the environment. Computation on the resulting series of snapshots is much like computation on a real-time video of the entire environment. We briefly describe our model for these stream processing computations and introduce the concept of snapshot processing
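The idea of treating events that occur at approximately the same time as one snapshot can be illustrated with a minimal sketch. The abstract does not give the authors' model, so everything here is an assumption made for illustration: the `Event` record, the fixed-width `window`, the stream names, and the rule that the latest reading per source wins within a window.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    source: str       # hypothetical stream name
    timestamp: float  # seconds since some shared epoch
    value: float      # reading carried by the event


def build_snapshots(events, window=1.0):
    """Group events into fixed-width time windows (an assumed policy).

    Each window becomes one snapshot: a mapping from source to the
    latest value that source reported inside the window.
    """
    buckets = defaultdict(dict)
    for ev in sorted(events, key=lambda e: e.timestamp):
        index = int(ev.timestamp // window)
        buckets[index][ev.source] = ev.value  # later events overwrite earlier ones
    return [(index * window, buckets[index]) for index in sorted(buckets)]


events = [
    Event("sensor_a", 0.10, 3.0),
    Event("health_feed", 0.40, 1.0),
    Event("sensor_a", 0.90, 4.0),
    Event("sensor_a", 1.20, 5.0),
    Event("health_feed", 1.70, 2.0),
]
snapshots = build_snapshots(events, window=1.0)
for start, state in snapshots:
    print(start, state)
```

Each element of `snapshots` pairs a window start time with the state observed in that window, so downstream correlation logic can run on one consistent picture at a time rather than on individual events.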
Lightweight Asynchronous Snapshots for Distributed Dataflows
Distributed stateful stream processing enables the deployment and execution
of large scale continuous computations in the cloud, targeting both low latency
and high throughput. One of the most fundamental challenges of this paradigm is
providing processing guarantees under potential failures. Existing approaches
rely on periodic global state snapshots that can be used for failure recovery.
Those approaches suffer from two main drawbacks. First, they often stall the
overall computation which impacts ingestion. Second, they eagerly persist all
records in transit along with the operation states which results in larger
snapshots than required. In this work we propose Asynchronous Barrier
Snapshotting (ABS), a lightweight algorithm suited for modern dataflow
execution engines that minimises space requirements. ABS persists only operator
states on acyclic execution topologies while keeping a minimal record log on
cyclic dataflows. We implemented ABS on Apache Flink, a distributed analytics
engine that supports stateful stream processing. Our evaluation shows that our
algorithm does not have a heavy impact on the execution, maintaining linear
scalability and performing well with frequent snapshots. Comment: 8 pages, 7 figures
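A toy sketch of barrier-based snapshotting on an acyclic topology may help make the idea concrete. It is not the Apache Flink implementation, and the abstract does not spell out the algorithm's steps. The sketch assumes the general barrier-alignment scheme: an operator blocks each input once a barrier arrives on it, snapshots only its own state after barriers have arrived on all inputs, then forwards the barrier and unblocks. `BARRIER`, `SumOperator`, and the channel names are hypothetical.

```python
from collections import deque

BARRIER = "BARRIER"  # hypothetical marker value, not a Flink identifier


class SumOperator:
    """Toy stateful operator that keeps a running sum of its records."""

    def __init__(self, inputs):
        self.queues = {name: deque() for name in inputs}
        self.blocked = set()  # inputs whose barrier has already arrived
        self.state = 0
        self.snapshots = []   # persisted operator states, one per barrier round
        self.output = []      # records and barriers emitted downstream

    def step(self):
        """Process one item from a non-blocked input; return False when idle."""
        for name, queue in self.queues.items():
            if name in self.blocked or not queue:
                continue
            item = queue.popleft()
            if item == BARRIER:
                self.blocked.add(name)
                if self.blocked == set(self.queues):
                    # Barriers seen on every input: snapshot state only,
                    # forward the barrier, and resume all inputs.
                    self.snapshots.append(self.state)
                    self.output.append(BARRIER)
                    self.blocked.clear()
            else:
                self.state += item
                self.output.append(self.state)
            return True
        return False


op = SumOperator(["left", "right"])
op.queues["left"].extend([1, 2, BARRIER, 10])
op.queues["right"].extend([5, BARRIER, 20])
while op.step():
    pass
print(op.snapshots, op.state)
```

In this run the snapshot captures only the records that preceded the barrier on both inputs (1 + 2 + 5), while records 10 and 20 are processed afterwards. No in-flight records are stored, which is the space saving the abstract attributes to acyclic topologies.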
Exploring sensor data management
The increasing availability of cheap, small, low-power sensor hardware and the ubiquity of wired and wireless networks have led to the prediction that `smart environments' will emerge in the near future. The sensors in these environments collect detailed information about the situation people are in, which is used to enhance information-processing applications that are present on their mobile and `ambient' devices.

Bridging the gap between sensor data and application information places new requirements on data management. This report discusses what these requirements are and documents ongoing research that explores ways of thinking about data management suited to these new requirements: a more sophisticated control flow model, data models that incorporate time, and ways to deal with the uncertainty in sensor data.
Publishing LO(D)D: Linked Open (Dynamic) Data for Smart Sensing and Measuring Environments
The paper proposes a distributed framework that provides a systematic way to publish environment data that is updated continuously; such updates might be issued at specific time intervals or bound to some environment-specific event. The framework targets smart environments in which networks of devices and sensors interact with each other and with their respective environments to gather and generate data, and are willing to publish this data. This paper addresses the issues of supporting data publishers in maintaining up-to-date, machine-understandable representations; separating views (static or dynamic data); delivering up-to-date information to data consumers in real time; helping data consumers keep track of changes triggered from diverse environments; and keeping track of the evolution of the smart environment. The paper also describes a prototype implementation of the proposed architecture. A preliminary use case implementation over a real energy metering infrastructure is also provided to demonstrate the feasibility of the architecture.
Desktop Sharing Portal
Desktop sharing technologies have existed since the late 80s. They are often used in scenarios where participants in a shared environment benefit from letting the more knowledgeable party take control. But the steps required to establish a session are often cumbersome to many. Selecting a sharing method, obtaining the sharing target's network address, configuring the sharing tool's desired ports, and dealing with firewall issues are major hurdles for a typical non-IT user. In this project, I have constructed a web portal that helps collaborators easily locate each other and initialize sharing sessions. The portal that I developed enables collaborative sessions to start as easily as browsing to a URL of the sharing service provider, with no need to download or follow installation instructions on either party's end. In addition, I have added video conferencing and audio streaming capability to bring a better collaborative and multimedia experience.
Analyzing and Modeling the Performance of the HemeLB Lattice-Boltzmann Simulation Environment
We investigate the performance of the HemeLB lattice-Boltzmann simulator for
cerebrovascular blood flow, aimed at providing timely and clinically relevant
assistance to neurosurgeons. HemeLB is optimised for sparse geometries,
supports interactive use, and scales well to 32,768 cores for problems with ~81
million lattice sites. We obtain a maximum performance of 29.5 billion site
updates per second, with only an 11% slowdown for highly sparse problems (5%
fluid fraction). We present steering and visualisation performance measurements
and provide a model which allows users to predict the performance, thereby
determining how to run simulations with maximum accuracy within time
constraints. Comment: Accepted by the Journal of Computational Science. 33 pages, 16
figures, 7 tables