Efficient Large-scale Trace Checking Using MapReduce
The problem of checking a logged event trace against a temporal logic
specification arises in many practical cases. Unfortunately, known algorithms
for an expressive logic like MTL (Metric Temporal Logic) do not scale with
respect to two crucial dimensions: the length of the trace and the size of the
time interval for which logged events must be buffered to check satisfaction of
the specification. The former issue can be addressed by distributed and
parallel trace checking algorithms that can take advantage of modern cloud
computing and programming frameworks like MapReduce. Still, the latter issue
remains open with current state-of-the-art approaches.
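To make the buffering problem concrete, here is a minimal sketch that checks the bounded eventually formula F_[a,b] p at every position of a timestamped trace under point-based semantics. It is not the authors' algorithm. The `Event` type, the function names, and the time-sliced split in `check_partitioned` are illustrative assumptions. Each position waits in a buffer until its window [t_i + a, t_i + b] closes, so the buffer grows with the number of events that fit in b time units. Splitting the trace into slices makes the work parallel, but each slice still has to carry b time units of overlap.

```python
from collections import deque
from typing import Deque, List, Tuple

Event = Tuple[int, bool]  # (timestamp, does p hold at this event?)


def check_eventually(trace: List[Event], a: int, b: int) -> Tuple[List[bool], int]:
    """Point-based check of F_[a,b] p at every position of a trace whose
    timestamps are nondecreasing. Returns the verdicts and the peak number
    of positions buffered while waiting for a verdict."""
    if not 0 <= a <= b:
        raise ValueError("need 0 <= a <= b")
    verdicts = [False] * len(trace)
    pending: Deque[int] = deque()  # positions whose window [t_i + a, t_i + b] is still open
    peak = 0
    for j, (tj, pj) in enumerate(trace):
        # Windows that ended before t_j can no longer be satisfied (verdict stays False).
        while pending and tj - trace[pending[0]][0] > b:
            pending.popleft()
        pending.append(j)
        peak = max(peak, len(pending))
        if pj:
            # Oldest positions first: t_j has reached t_i + a and is still <= t_i + b.
            while pending and tj - trace[pending[0]][0] >= a:
                verdicts[pending.popleft()] = True
    # Positions still pending when the trace ends never saw a witness: False.
    return verdicts, peak


def check_partitioned(trace: List[Event], a: int, b: int, width: int) -> List[bool]:
    """Hypothetical time-sliced split: each slice of `width` time units is checked
    independently (a 'map' task) on its own events plus b time units of overlap;
    concatenating the per-slice verdicts plays the role of the 'reduce' step."""
    if width <= 0:
        raise ValueError("width must be positive")
    verdicts: List[bool] = []
    if not trace:
        return verdicts
    lo = trace[0][0]
    while lo <= trace[-1][0]:
        hi = lo + width
        own = [e for e in trace if lo <= e[0] < hi]
        extended = [e for e in trace if lo <= e[0] < hi + b]  # own events come first
        part, _ = check_eventually(extended, a, b)
        verdicts.extend(part[: len(own)])
        lo = hi
    return verdicts


trace = [(0, False), (1, False), (3, True), (10, False)]
print(check_eventually(trace, 1, 3))
print(check_partitioned(trace, 1, 3, width=2))
```

For a trace with one event per time unit, the peak buffer is b + 1 positions (11 for b = 10). This is the dependence on interval size described above.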
In this paper we address this memory scalability issue by proposing a new
semantics for MTL, called lazy semantics. This semantics can evaluate temporal
formulae and Boolean combinations of temporal-only formulae at arbitrary time
instants. We prove that lazy semantics is more expressive than standard
point-based semantics and that it can be used as a basis for a correct
parametric decomposition of any MTL formula into an equivalent one with
smaller, bounded time intervals. We use lazy semantics to extend our previous
distributed trace checking algorithm for MTL. We evaluate the proposed
algorithm in terms of memory scalability and time/memory trade-offs.
Comment: 13 pages, 8 figures
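A small example shows why evaluating subformulae at instants that do not appear in the trace matters when intervals are decomposed. The sketch below compares F_[0,2k] p with the nested form F_[0,k] F_[0,k] p. The helper names are hypothetical, and it assumes integer timestamps that strictly increase. Under point-based semantics the nested form can fail because no logged event sits at the intermediate instant. If the inner formula may be evaluated at any instant, the equivalence holds again. This illustrates the motivation only. It is not the paper's definition of lazy semantics or its decomposition.

```python
from typing import List, Tuple

Event = Tuple[int, bool]  # (timestamp, does p hold?); timestamps assumed strictly increasing


def eventually_at(trace: List[Event], t: int, lo: int, hi: int) -> bool:
    """F_[lo,hi] p at instant t: some event in [t + lo, t + hi] satisfies p."""
    return any(p and lo <= tj - t <= hi for tj, p in trace)


def nested_pointwise(trace: List[Event], t: int, k: int) -> bool:
    """F_[0,k] F_[0,k] p at t under point-based semantics: the intermediate
    instant must be a timestamp that actually occurs in the trace."""
    return any(0 <= tm - t <= k and eventually_at(trace, tm, 0, k) for tm, _ in trace)


def nested_any_instant(trace: List[Event], t: int, k: int) -> bool:
    """Same formula, but the inner F_[0,k] p may be evaluated at any instant u
    in [t, t + k]. It holds at u iff some p-event lies in [u, u + k], i.e. u is
    in [tj - k, tj], so we test whether [tj - k, tj] meets [t, t + k]."""
    return any(p and max(t, tj - k) <= min(t + k, tj) for tj, p in trace)


trace = [(0, False), (2, True)]
print(eventually_at(trace, 0, 0, 2))    # True: p holds at time 2
print(nested_pointwise(trace, 0, 1))    # False: no event at time 1 to act as the midpoint
print(nested_any_instant(trace, 0, 1))  # True: the midpoint u = 1 is allowed
```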
Scalable discovery of hybrid process models in a cloud computing environment
Process descriptions are used to create products and deliver services. To achieve better processes and services, the first step
is to learn a process model. Process discovery is a technique that automatically extracts process models from event logs.
Although various discovery techniques have been proposed, they focus either on constructing formal models, which are very powerful
but complex, or on creating informal models, which are intuitive but lack semantics. In this work, we introduce a novel method that returns
hybrid process models to bridge this gap. Moreover, to cope with today’s big event logs, we propose an efficient method, called f-HMD,
that aims at scalable hybrid model discovery in a cloud computing environment. We present the detailed implementation of our approach
over the Spark framework, and our experimental results demonstrate that the proposed method is efficient and scalable.
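The abstract does not spell out f-HMD's algorithm. The following is only a sketch of two common ingredients of process discovery. First, it counts the directly-follows relation in a map/reduce style, which is the shape of computation Spark distributes, written here in plain Python. Second, it splits relations into "strong" and "weak" ones using the Heuristics Miner dependency measure. Treating strong relations as formal constraints and weak ones as informal annotations is one plausible reading of "hybrid", and it is an assumption here. The two thresholds are hypothetical.

```python
from collections import Counter
from functools import reduce
from typing import List, Set, Tuple

Trace = List[str]  # one case: the sequence of activity names
Edge = Tuple[str, str]


def directly_follows(trace: Trace) -> Counter:
    """'Map' step: count each pair (x, y) where y immediately follows x in one case."""
    return Counter(zip(trace, trace[1:]))


def discover(log: List[Trace], strong: float = 0.6, weak: float = 0.4) -> Tuple[Set[Edge], Set[Edge]]:
    """Classify directly-follows edges as strong or weak (hypothetical thresholds)."""
    # 'Reduce' step: merge the per-case counts into one directly-follows graph.
    dfg = reduce(lambda x, y: x + y, map(directly_follows, log), Counter())
    strong_edges: Set[Edge] = set()
    weak_edges: Set[Edge] = set()
    for (x, y), xy in dfg.items():
        yx = dfg[(y, x)]  # Counter returns 0 for missing pairs without inserting them
        dependency = (xy - yx) / (xy + yx + 1)  # Heuristics Miner dependency measure
        if dependency >= strong:
            strong_edges.add((x, y))
        elif dependency >= weak:
            weak_edges.add((x, y))
    return strong_edges, weak_edges


log = [["a", "b", "c"], ["a", "c"], ["a", "b", "c"]]
print(discover(log))
```

On Spark, the per-case pairs could be emitted with `flatMap` and summed with `reduceByKey`. Only the merge step needs to see data from more than one case.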
CPL: A Core Language for Cloud Computing -- Technical Report
Running distributed applications in the cloud involves deployment, that is,
the distribution and configuration of application services and middleware
infrastructure. The considerable complexity of these tasks resulted in the
emergence of declarative JSON-based domain-specific deployment languages to
develop deployment programs. However, existing deployment programs unsafely
compose artifacts written in different languages, leading to bugs that are hard
to detect before run time. Furthermore, deployment languages do not provide
extension points for custom implementations of existing cloud services such as
application-specific load balancing policies.
To address these shortcomings, we propose CPL (Cloud Platform Language), a
statically-typed core language for programming both distributed applications
and their deployment on a cloud platform. In CPL, application services and
deployment programs interact through statically typed, extensible interfaces,
and an application can trigger further deployment at run time. We provide a
formal semantics of CPL and demonstrate that it enables type-safe, composable
and extensible libraries of service combinators, such as load balancing and
fault tolerance.
Comment: Technical report accompanying the MODULARITY '16 submission
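This abstract does not show CPL's own syntax. As a rough analogy only, the Python sketch below models a service as a typed function from request to response. It writes load balancing and fault tolerance as combinators that return services, so they compose like any other service. The names `Service`, `round_robin`, `failover`, and `worker` are hypothetical. Python's annotations are checked only by an external tool such as mypy, not at run time, so this falls well short of CPL's static guarantees.

```python
from itertools import cycle
from typing import Callable, List

Service = Callable[[str], str]  # hypothetical: a service maps a request to a response


def round_robin(backends: List[Service]) -> Service:
    """Load-balancing combinator: send each request to the next backend in turn."""
    if not backends:
        raise ValueError("need at least one backend")
    ring = cycle(backends)

    def balanced(request: str) -> str:
        return next(ring)(request)

    return balanced


def failover(primary: Service, backup: Service) -> Service:
    """Fault-tolerance combinator: if the primary raises, retry on the backup."""

    def tolerant(request: str) -> str:
        try:
            return primary(request)
        except Exception:
            return backup(request)

    return tolerant


def worker(name: str) -> Service:
    """A toy backend that just labels the requests it handles."""
    return lambda request: f"{name} handled {request}"


def crashed(request: str) -> str:
    raise RuntimeError("node down")


# Combinators return ordinary services, so they nest freely.
app = failover(crashed, round_robin([worker("w1"), worker("w2")]))
print(app("r1"))  # prints "w1 handled r1"
print(app("r2"))  # prints "w2 handled r2"
```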
Efficient Task Replication for Fast Response Times in Parallel Computation
One typical use case of large-scale distributed computing in data centers is
to decompose a computation job into many independent tasks and run them in
parallel on different machines, sometimes known as the "embarrassingly
parallel" computation. For this type of computation, one challenge is that the
time to execute a task for each machine is inherently variable, and the overall
response time is constrained by the execution time of the slowest machine. To
address this issue, system designers introduce task replication, which sends
the same task to multiple machines and uses the result from the machine that
finishes first. While task replication reduces response time, it usually
increases resource usage. In this work, we propose a theoretical framework to
analyze the trade-off between response time and resource usage. We show that,
while in general there is a tension between response time and resource usage,
there exist scenarios where replicating tasks judiciously reduces completion
time and resource usage simultaneously. Given the execution time distribution
for machines, we investigate the conditions for a scheduling policy to achieve
optimal performance trade-off, and propose efficient algorithms to search for
optimal or near-optimal scheduling policies. Our analysis gives insights on
when and why replication helps, which can be used to guide scheduler design in
large-scale distributed computing systems.
Comment: Extended version of the 2-page paper accepted to ACM SIGMETRICS 201
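The trade-off can be computed exactly for the simplest policy. That policy is an assumption here, not the paper's general model: launch r copies of a task at once, keep the first result, and cancel the rest. Every copy then occupies a machine until the fastest one finishes, so resource usage is r times the completion time. The sketch uses a discrete execution-time distribution with made-up values and probabilities. The expected minimum follows from P(min = v) = P(X >= v)^r - P(X > v)^r.

```python
from fractions import Fraction
from typing import Dict, Tuple


def replicate(dist: Dict[int, Fraction], r: int) -> Tuple[Fraction, Fraction]:
    """Return (expected completion time, expected machine-time) when r i.i.d.
    copies start together and the rest are cancelled once the first finishes."""
    if r < 1 or sum(dist.values()) != 1:
        raise ValueError("need r >= 1 and probabilities summing to 1")
    expected_min = Fraction(0)
    at_least = Fraction(1)  # P(X >= v) for the current value v
    for v in sorted(dist):
        above = at_least - dist[v]  # P(X > v)
        # P(min of r copies == v) = P(X >= v)^r - P(X > v)^r
        expected_min += v * (at_least ** r - above ** r)
        at_least = above
    # All r machines stay busy until the first copy finishes.
    return expected_min, r * expected_min


heavy = {1: Fraction(9, 10), 100: Fraction(1, 10)}  # made-up: rare 100-unit straggler
light = {1: Fraction(9, 10), 10: Fraction(1, 10)}   # made-up: rare 10-unit straggler
for name, dist in (("heavy", heavy), ("light", light)):
    for r in (1, 2):
        time, usage = replicate(dist, r)
        print(name, r, float(time), float(usage))
```

Take the heavy case, with a 10% chance of a 100-unit straggler. Two replicas cut expected completion time from 10.9 to 1.99 and expected machine-time from 10.9 to 3.98. Now take the light case, with a 10-unit straggler. The second replica still cuts completion time from 1.9 to 1.09, but it raises machine-time from 1.9 to 2.18. This is the usual tension between the two.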