621 research outputs found
Efficient Large-scale Trace Checking Using MapReduce
The problem of checking a logged event trace against a temporal logic
specification arises in many practical cases. Unfortunately, known algorithms
for an expressive logic like MTL (Metric Temporal Logic) do not scale with
respect to two crucial dimensions: the length of the trace and the size of the
time interval for which logged events must be buffered to check satisfaction of
the specification. The former issue can be addressed by distributed and
parallel trace checking algorithms that can take advantage of modern cloud
computing and programming frameworks like MapReduce. Still, the latter issue
remains open with current state-of-the-art approaches.
In this paper we address this memory scalability issue by proposing a new
semantics for MTL, called lazy semantics. This semantics can evaluate temporal
formulae and boolean combinations of temporal-only formulae at any arbitrary
time instant. We prove that lazy semantics is more expressive than standard
point-based semantics and that it can be used as a basis for a correct
parametric decomposition of any MTL formula into an equivalent one with
smaller, bounded time intervals. We use lazy semantics to extend our previous
distributed trace checking algorithm for MTL. We evaluate the proposed
algorithm in terms of memory scalability and time/memory tradeoffs.Comment: 13 pages, 8 figure
Computing Web-scale Topic Models using an Asynchronous Parameter Server
Topic models such as Latent Dirichlet Allocation (LDA) have been widely used
in information retrieval for tasks ranging from smoothing and feedback methods
to tools for exploratory search and discovery. However, classical methods for
inferring topic models do not scale up to the massive size of today's publicly
available Web-scale data sets. The state-of-the-art approaches rely on custom
strategies, implementations and hardware to facilitate their asynchronous,
communication-intensive workloads.
We present APS-LDA, which integrates state-of-the-art topic modeling with
cluster computing frameworks such as Spark using a novel asynchronous parameter
server. Advantages of this integration include convenient usage of existing
data processing pipelines and eliminating the need for disk writes as data can
be kept in memory from start to finish. Our goal is not to outperform highly
customized implementations, but to propose a general high-performance topic
modeling framework that can easily be used in today's data processing
pipelines. We compare APS-LDA to the existing Spark LDA implementations and
show that our system can, on a 480-core cluster, process up to 135 times more
data and 10 times more topics without sacrificing model quality.Comment: To appear in SIGIR 201
Window-based Streaming Graph Partitioning Algorithm
In the recent years, the scale of graph datasets has increased to such a
degree that a single machine is not capable of efficiently processing large
graphs. Thereby, efficient graph partitioning is necessary for those large
graph applications. Traditional graph partitioning generally loads the whole
graph data into the memory before performing partitioning; this is not only a
time consuming task but it also creates memory bottlenecks. These issues of
memory limitation and enormous time complexity can be resolved using
stream-based graph partitioning. A streaming graph partitioning algorithm reads
vertices once and assigns that vertex to a partition accordingly. This is also
called an one-pass algorithm. This paper proposes an efficient window-based
streaming graph partitioning algorithm called WStream. The WStream algorithm is
an edge-cut partitioning algorithm, which distributes a vertex among the
partitions. Our results suggest that the WStream algorithm is able to partition
large graph data efficiently while keeping the load balanced across different
partitions, and communication to a minimum. Evaluation results with real
workloads also prove the effectiveness of our proposed algorithm, and it
achieves a significant reduction in load imbalance and edge-cut with different
ranges of dataset
U.S. Sport Management Programs in Business Schools: Trends and Key Issues
The growth of sport management programs housed in (or with formal curriculum-based ties to) a school of business indicates more academic institutions are reconsidering sport management as a business-oriented field. Thus, research is necessary regarding benchmarking information on the state of these academic programs. The purpose of this study is to explore trends on administration, housing, accreditation, faculty performance indicators and research requirements, as well as salaries for faculty and alumni of such programs. Data were submitted by 74 department chairs and program directors employed in U.S. business schools featuring sport management programs. Results indicate that the majority of sport business programs are part of an interdisciplinary department; COSMA accreditation is largely viewed as redundant; and, depending on business schools’ accreditation, variability exists concerning faculty performance measures and research impact, as well as faculty and alumni salaries. These findings suggest considerable progress of sport management programs within business schools
GraphSE: An Encrypted Graph Database for Privacy-Preserving Social Search
In this paper, we propose GraphSE, an encrypted graph database for online
social network services to address massive data breaches. GraphSE preserves
the functionality of social search, a key enabler for quality social network
services, where social search queries are conducted on a large-scale social
graph and meanwhile perform set and computational operations on user-generated
contents. To enable efficient privacy-preserving social search, GraphSE
provides an encrypted structural data model to facilitate parallel and
encrypted graph data access. It is also designed to decompose complex social
search queries into atomic operations and realise them via interchangeable
protocols in a fast and scalable manner. We build GraphSE with various
queries supported in the Facebook graph search engine and implement a
full-fledged prototype. Extensive evaluations on Azure Cloud demonstrate that
GraphSE is practical for querying a social graph with a million of users.Comment: This is the full version of our AsiaCCS paper "GraphSE: An
Encrypted Graph Database for Privacy-Preserving Social Search". It includes
the security proof of the proposed scheme. If you want to cite our work,
please cite the conference version of i
Metallurgical industry in Romania in the context of the economic crisis
The magnitude of the economic crisis and the influence on the developments of industrial branches was different.Although European economies are strongly interconnected both internally and externally, the way in which an economic branch has crossed and is trying to overcome the economic crisis has some peculiarities arising from its specificity on the one hand, and on the other hand, from the policies applied in the field. Based on these considerations,the paper examines how Romanian metallurgical industry passes through the economic crisis as compared with other industries. Also based on quantitative analyses performed and taking into account the specific phenomenon of seasonality are presented models of evolution of this industry with horizon in February 2015
Low latency via redundancy
Low latency is critical for interactive networked applications. But while we
know how to scale systems to increase capacity, reducing latency --- especially
the tail of the latency distribution --- can be much more difficult. In this
paper, we argue that the use of redundancy is an effective way to convert extra
capacity into reduced latency. By initiating redundant operations across
diverse resources and using the first result which completes, redundancy
improves a system's latency even under exceptional conditions. We study the
tradeoff with added system utilization, characterizing the situations in which
replicating all tasks reduces mean latency. We then demonstrate empirically
that replicating all operations can result in significant mean and tail latency
reduction in real-world systems including DNS queries, database servers, and
packet forwarding within networks
Low Latency Geo-distributed Data Analytics
Low latency analytics on geographically distributed dat-asets (across datacenters, edge clusters) is an upcoming and increasingly important challenge. The dominant approach of aggregating all the data to a single data-center significantly inflates the timeliness of analytics. At the same time, running queries over geo-distributed inputs using the current intra-DC analytics frameworks also leads to high query response times because these frameworks cannot cope with the relatively low and variable capacity of WAN links. We present Iridium, a system for low latency geo-distri-buted analytics. Iridium achieves low query response times by optimizing placement of both data and tasks of the queries. The joint data and task placement op-timization, however, is intractable. Therefore, Iridium uses an online heuristic to redistribute datasets among the sites prior to queries ’ arrivals, and places the tasks to reduce network bottlenecks during the query’s ex-ecution. Finally, it also contains a knob to budget WAN usage. Evaluation across eight worldwide EC2 re-gions using production queries show that Iridium speeds up queries by 3 × − 19 × and lowers WAN usage by 15% − 64 % compared to existing baselines
- …