BriskStream: Scaling Data Stream Processing on Shared-Memory Multicore Architectures
We introduce BriskStream, an in-memory data stream processing system (DSPS)
specifically designed for modern shared-memory multicore architectures.
BriskStream's key contribution is an execution plan optimization paradigm,
namely RLAS, which takes relative-location (i.e., NUMA distance) of each pair
of producer-consumer operators into consideration. We propose a branch and
bound based approach with three heuristics to resolve the resulting nontrivial
optimization problem. The experimental evaluations demonstrate that BriskStream
yields much higher throughput and better scalability than existing DSPSs on
multi-core architectures when processing different types of workloads. Comment: To appear in SIGMOD'1
Continuous Workflows: From Model to Enactment System
Workflows are actively being used in both business and scientific domains to automate processes and facilitate collaboration. A workflow management (or enactment) system (WfMS) defines, creates and manages the execution of workflows on one or more workflow engines, which are able to interpret workflow definitions, allocate resources, interact with workflow participants and, where required, invoke the needed tools (e.g., databases, job schedulers, etc.) and applications. Traditional WfMSs and workflow design processes view the workflow as a one-time interaction with the various data sources, i.e., when a workflow is invoked, its steps are executed once and in order. The fundamental underlying assumption has been that data sources are passive and all interactions are structured along the request/reply (query) model. Hence, traditional WfMSs cannot effectively support business or scientific monitoring applications that require the processing of data streams such as those generated by sensing devices as well as mobile and web applications.
It is the hypothesis of this dissertation that Workflow Management Systems can be extended to support data stream semantics to enable monitoring applications. This includes the ability to apply flexible bounds on unbounded data streams and the ability to facilitate on-the-fly processing of bounded bundles of data (window semantics). To support this hypothesis, this dissertation has produced new specifications, a design, an implementation and a thorough evaluation of a novel Continuous Workflows (CWf) model, which is backwards compatible with currently available workflow models. The CWf model was implemented in a CONtinuous workFLow ExeCution Engine, CONFLuEnCE, as an extension of Kepler, which is a popular scientific WfMS. The applicability of the CWf model in both scientific and business applications was demonstrated by utilizing CONFLuEnCE in Astroshelf to support live annotations (i.e., monitoring of astronomical data), and to support supply chain monitoring and management. The implementation of CONFLuEnCE led to the realization that different applications have different performance requirements and hence an integrated workflow scheduling framework is essential. Towards meeting this need, STAFiLOS, a Stream FLOw Scheduling framework for Continuous Workflows, was designed and implemented within CONFLuEnCE. The performance of STAFiLOS was evaluated using the Linear Road Benchmark for continuous workflows.
Scalable and fault-tolerant data stream processing on multi-core architectures
With increasing data volumes and velocity, many applications are shifting from the classical “process-after-store” paradigm to a stream processing model: data is produced and consumed as continuous streams. Stream processing captures latency-sensitive applications as diverse as credit card fraud detection and high-frequency trading. These applications are expressed as queries of algebraic operations (e.g., aggregation) over the most recent data using windows, i.e., finite evolving views over the input streams. To guarantee correct results, streaming applications require precise window semantics (e.g., temporal ordering) for operations that maintain state.
While high processing throughput and low latency are performance desiderata for stateful streaming applications, achieving both poses challenges. Computing the state of overlapping windows causes redundant aggregation operations: incremental execution (i.e., reusing previous results) reduces latency but prevents parallelization; at the same time, parallelizing window execution for stateful operations with precise semantics demands ordering guarantees and state access coordination. Finally, streams and state must be recovered to produce consistent and repeatable results in the event of failures.
Given the rise of shared-memory multi-core CPU architectures and high-speed networking, we argue that it is possible to address these challenges in a single node without compromising window semantics, performance, or fault-tolerance. In this thesis, we analyze, design, and implement stream processing engines (SPEs) that achieve high performance on multi-core architectures. To this end, we introduce new approaches for in-memory processing that address the previous challenges: (i) for overlapping windows, we provide a family of window aggregation techniques that enable computation sharing based on the algebraic properties of aggregation functions; (ii) for parallel window execution, we balance parallelism and incremental execution by developing abstractions for both and combining them in a novel design; and (iii) for reliable single-node execution, we enable strong fault-tolerance guarantees without sacrificing performance by reducing the required disk I/O bandwidth using a novel persistence model. We combine the above to implement an SPE that processes hundreds of millions of tuples per second with sub-second latencies. These results reveal the opportunity to reduce resource and maintenance footprint by replacing cluster-based SPEs with single-node deployments.
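As an illustration of computation sharing across overlapping windows (point (i) above), here is a minimal Python sketch of generic pane-based aggregation. It is not taken from the thesis: the function name `sliding_sums`, the count-based windows, and the restriction that the window size is a multiple of the slide are all assumptions made for this example. It relies on sum being invertible, so an expired pane can be removed by subtraction.

```python
from collections import deque


def sliding_sums(values, window, slide):
    """Sum over count-based sliding windows, sharing work via panes.

    A pane covers `slide` consecutive tuples, and each window is the
    combination of window // slide panes, so every tuple is aggregated
    only once. Assumes (for this sketch) that window % slide == 0.
    """
    if slide <= 0 or window % slide != 0:
        raise ValueError("window must be a positive multiple of slide")
    panes_per_window = window // slide
    panes = deque()   # partial sums of the panes in the current window
    running = 0       # sum of all panes currently held in `panes`
    results = []
    for start in range(0, len(values) - slide + 1, slide):
        pane = sum(values[start:start + slide])
        panes.append(pane)
        running += pane
        if len(panes) > panes_per_window:
            # Sum is invertible, so the expired pane is simply subtracted.
            running -= panes.popleft()
        if len(panes) == panes_per_window:
            results.append(running)
    return results


if __name__ == "__main__":
    print(sliding_sums([1, 2, 3, 4, 5, 6, 7, 8], window=4, slide=2))
```

Non-invertible functions such as max cannot use the subtraction step and would need a different structure. This is one way in which the algebraic properties of the aggregation function determine how much sharing is possible.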
Stateful data-parallel processing
Democratisation of data means that more people than ever are involved in the data analysis process. This is beneficial, since it brings domain-specific knowledge from broad fields, but data scientists do not have adequate tools to write algorithms and execute them at scale. Processing models of current data-parallel processing systems, designed for scalability and fault tolerance, are stateless. Stateless processing facilitates capturing parallelisation opportunities and hides fault tolerance. However, data scientists want to write stateful programs (with explicit state that they can update, such as matrices in machine learning algorithms) and are used to imperative-style languages. These programs struggle to execute with high performance in stateless data-parallel systems.
Representing state explicitly makes data-parallel processing at scale challenging. To achieve scalability, state must be distributed and coordinated across machines. In the event of failures, state must be recovered to provide correct results. We introduce stateful data-parallel processing that addresses the previous challenges by: (i) representing state as a first-class citizen so that a system can manipulate it; (ii) introducing two distributed mutable state abstractions for scalability; and (iii) providing an integrated approach to scale out and fault tolerance that recovers large state spanning the memory of multiple machines. To support imperative-style programs, a static analysis tool analyses Java programs that manipulate state and translates them to a representation that can execute on SEEP, an implementation of a stateful data-parallel processing model. SEEP is evaluated with stateful Big Data applications and shows comparable or better performance than state-of-the-art stateless systems.
Task Scheduling in Data Stream Processing Systems
In the era of big data, with streaming applications such as social media, surveillance monitoring and real-time search generating large volumes of data, efficient Data Stream Processing Systems (DSPSs) have become essential. When designing an efficient DSPS, a number of challenges need to be considered including task allocation, scalability, fault tolerance, QoS, parallelism degree, and state management, among others.
In our research, we focus on task allocation as it has a significant impact on performance metrics such as data processing latency and system throughput. An application processed by DSPSs is represented as a Directed Acyclic Graph (DAG), where each vertex represents a task and the edges show the dataflow between the tasks. Task allocation can be defined as the assignment of the vertices in the DAG to the physical compute nodes such that the data movement between the nodes is minimised. Finding an optimal task placement for stream processing systems is NP-hard. Thus, approximate scheduling approaches are required to improve the performance of DSPSs.
In this thesis, we present our three proposed schedulers, each having a different heuristic partitioning approach to minimise inter-node communication for either homogeneous or heterogeneous clusters. We demonstrate how each scheduler can efficiently assign groups of highly communicating tasks to compute nodes. Our schedulers are able to outperform two state-of-the-art schedulers for three micro-benchmarks and two real-world applications, increasing throughput and reducing data processing latency as a result of a better task placement.
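To make the task-allocation objective concrete, the following Python sketch places communicating tasks on compute nodes with a simple greedy heuristic. It is not one of the three schedulers from the thesis. The names `greedy_placement` and `inter_node_traffic`, the per-node task `capacity`, and the traffic figures in the example are hypothetical and exist only for illustration.

```python
def greedy_placement(tasks, traffic, num_nodes, capacity):
    """Assign tasks to nodes, co-locating heavily communicating pairs.

    traffic maps a DAG edge (task_a, task_b) to its data rate.
    capacity is the (assumed) maximum number of tasks per node.
    """
    if len(tasks) > num_nodes * capacity:
        raise ValueError("not enough capacity for all tasks")
    placement = {}
    load = [0] * num_nodes

    def assign(task, node):
        placement[task] = node
        load[node] += 1

    def least_loaded():
        return min(range(num_nodes), key=lambda n: load[n])

    # Visit edges from heaviest to lightest so the most expensive
    # communication is the first to be kept on a single node.
    for (a, b), _rate in sorted(traffic.items(), key=lambda kv: -kv[1]):
        for task, peer in ((a, b), (b, a)):
            if task in placement:
                continue
            node = placement.get(peer)
            if node is None or load[node] >= capacity:
                node = least_loaded()
            assign(task, node)
    # Tasks that appear in no edge still need a node.
    for task in tasks:
        if task not in placement:
            assign(task, least_loaded())
    return placement


def inter_node_traffic(placement, traffic):
    """Total rate on edges whose endpoints sit on different nodes."""
    return sum(rate for (a, b), rate in traffic.items()
               if placement[a] != placement[b])


if __name__ == "__main__":
    tasks = ["src", "parse", "count", "sink"]
    traffic = {("src", "parse"): 100, ("parse", "count"): 80,
               ("count", "sink"): 5}
    plan = greedy_placement(tasks, traffic, num_nodes=2, capacity=2)
    print(plan, inter_node_traffic(plan, traffic))
```

A greedy pass like this gives no optimality guarantee in general, which is consistent with the NP-hardness noted above. It only shows the quantity (cross-node traffic) that heuristic schedulers try to reduce.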
Automated Negotiation for Complex Multi-Agent Resource Allocation
The problem of constructing and analyzing systems of intelligent, autonomous agents is becoming more and more important. These agents may include people, physical robots, virtual humans, software programs acting on behalf of human beings, or sensors. In a large class of multi-agent scenarios, agents may have different capabilities, preferences, objectives, and constraints. Therefore, efficient allocation of resources among multiple agents is often difficult to achieve. Automated negotiation (bargaining) is the most widely used approach for multi-agent resource allocation and it has received increasing attention in recent years. However, information uncertainty, existence of multiple contracting partners and competitors, agents' incentive to maximize individual utilities, and market dynamics make it difficult to calculate agents' rational equilibrium negotiation strategies and develop successful negotiation agents behaving well in practice. To this end, this thesis is concerned with analyzing agents' rational behavior and developing negotiation strategies for a range of complex negotiation contexts. First, we consider the problem of finding agents' rational strategies in bargaining with incomplete information. We focus on the principal alternating-offers finite horizon bargaining protocol with one-sided uncertainty regarding agents' reserve prices. We provide an algorithm based on the combination of game theoretic analysis and search techniques which finds agents' equilibria in pure strategies when they exist. Our approach is sound, complete and, in principle, can be applied to other uncertainty settings. Simulation results show that there is at least one pure strategy sequential equilibrium in 99.7% of various scenarios. In addition, agents with equilibrium strategies achieved higher utilities than agents with heuristic strategies.
Next, we extend the alternating-offers protocol to handle concurrent negotiations in which each agent has multiple trading opportunities and faces market competition. We provide an algorithm based on backward induction to compute the subgame perfect equilibrium of concurrent negotiation. We observe that agents' bargaining power is affected by the proposing ordering and market competition and that, for a large subset of the space of the parameters, agents' equilibrium strategies depend on the values of a small number of parameters. We also extend our algorithm to find a pure strategy sequential equilibrium in concurrent negotiations where there is one-sided uncertainty regarding the reserve price of one agent. Third, we present the design and implementation of agents that concurrently negotiate with other entities for acquiring multiple resources. Negotiation agents are designed to adjust 1) the number of tentative agreements and 2) the amount of concession they are willing to make in response to changing market conditions and negotiation situations. In our approach, agents utilize a time-dependent negotiation strategy in which the reserve price of each resource is dynamically determined by 1) the likelihood that negotiation will not be successfully completed, 2) the expected agreement price of the resource, and 3) the expected number of final agreements. The negotiation deadline of each resource is determined by its relative scarcity. Since agents are permitted to decommit from agreements, a buyer may make more than one tentative agreement for each resource and the maximum number of tentative agreements is constrained by the market situation. Experimental results show that our negotiation strategy achieved significantly higher utilities than simpler strategies. Finally, we consider the problem of allocating networked resources in dynamic environments, such as cloud computing platforms, where providers strategically price resources to maximize their utility.
While numerous auction-based approaches have been proposed in the literature, our work explores an alternative approach where providers and consumers negotiate resource leasing contracts. We propose a distributed negotiation mechanism where agents negotiate over both a contract price and a decommitment penalty, which allows agents to decommit from contracts at a cost. We compare our approach experimentally, using representative scenarios and workloads, to both combinatorial auctions and the fixed-price model, and show that the negotiation model achieves a higher social welfare.
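For readers unfamiliar with time-dependent negotiation strategies, the Python sketch below shows one common textbook form of a time-dependent concession function for a buyer. It is an assumption-laden illustration, not the strategy from the thesis: the function name `buyer_offer`, the polynomial concession curve, and the parameter `beta` are chosen for this example, and the reserve price is held fixed here, whereas the thesis describes determining it dynamically.

```python
def buyer_offer(t, deadline, initial_price, reserve_price, beta=1.0):
    """Price a buyer proposes at time t under a time-dependent tactic.

    The offer moves from initial_price (t = 0) to reserve_price
    (t = deadline). beta shapes the curve in this sketch:
    beta < 1 concedes slowly until late, beta > 1 concedes early.
    """
    if deadline <= 0 or beta <= 0:
        raise ValueError("deadline and beta must be positive")
    progress = min(max(t / deadline, 0.0), 1.0)  # clamp to [0, 1]
    concession = progress ** (1.0 / beta)
    return initial_price + (reserve_price - initial_price) * concession


if __name__ == "__main__":
    for step in range(0, 11, 5):
        print(step, buyer_offer(step, deadline=10, initial_price=50,
                                reserve_price=100, beta=0.5))
```

With `beta=0.5` the buyer is halfway to the deadline at `t=5` but has conceded only a quarter of the price range, which illustrates how the curve shape trades early agreement against a better final price.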
The psychophysiology primer : A guide to methods and a broad review with a focus on human-computer interaction
Publisher Copyright: © 2016 B. Cowley, M. Filetti, K. Lukander. Peer reviewed
Policy research working papers : catalog of numbers 801-1200
This paper contains a numerical listing of working papers produced by the Central Vicepresidencies. Each citation contains a brief abstract, and the contact point for the paper. Environmental Economics & Policies, Economic Theory & Research, Banks & Banking Reform, Poverty Assessment, Health Economics & Finance