621 research outputs found

Efficient Large-scale Trace Checking Using MapReduce

Author: Barre B.
Bartocci E.
Basin D.
Bianculli D.
Coen-Porisini A.
Ho H.-M.
Mrad A.
Zaharia M.
Publication date: 26/08/2015

The problem of checking a logged event trace against a temporal logic specification arises in many practical cases. Unfortunately, known algorithms for an expressive logic like MTL (Metric Temporal Logic) do not scale with respect to two crucial dimensions: the length of the trace and the size of the time interval for which logged events must be buffered to check satisfaction of the specification. The former issue can be addressed by distributed and parallel trace checking algorithms that can take advantage of modern cloud computing and programming frameworks like MapReduce. Still, the latter issue remains open with current state-of-the-art approaches. In this paper we address this memory scalability issue by proposing a new semantics for MTL, called lazy semantics. This semantics can evaluate temporal formulae and boolean combinations of temporal-only formulae at any arbitrary time instant. We prove that lazy semantics is more expressive than standard point-based semantics and that it can be used as a basis for a correct parametric decomposition of any MTL formula into an equivalent one with smaller, bounded time intervals. We use lazy semantics to extend our previous distributed trace checking algorithm for MTL. We evaluate the proposed algorithm in terms of memory scalability and time/memory tradeoffs. Comment: 13 pages, 8 figures
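As an illustration of why the interval size matters for buffering, the following minimal sketch checks a bounded "eventually" operator over a timestamped trace under the standard point-based semantics. It is not the paper's algorithm and does not implement lazy semantics; the function name `eventually_within`, the trace layout, and the non-strict reading of "future" (an event may witness itself) are assumptions made for this example.

```python
from bisect import bisect_left, bisect_right


def eventually_within(trace, prop, lo, hi):
    """Evaluate F_[lo,hi] prop at every position of a timed trace.

    trace: list of (timestamp, set_of_atoms), sorted by timestamp.
    Position i satisfies the formula if some position j >= i with
    lo <= t_j - t_i <= hi has prop among its atoms (assumed non-strict
    point-based semantics; hypothetical helper, not from the paper).
    """
    times = [t for t, _ in trace]
    # prefix[k] = number of positions < k where prop holds
    prefix = [0]
    for _, atoms in trace:
        prefix.append(prefix[-1] + (1 if prop in atoms else 0))

    result = []
    for i, t in enumerate(times):
        # All events with timestamp in [t + lo, t + hi] must be visible,
        # which is why a larger interval means more buffered events.
        start = max(i, bisect_left(times, t + lo))
        end = bisect_right(times, t + hi)
        result.append(end > start and prefix[end] - prefix[start] > 0)
    return result


trace = [(0, {"req"}), (3, {"ack"}), (10, {"req"}), (20, {"ack"})]
print(eventually_within(trace, "ack", 0, 5))
```

Every position needs all events up to `hi` time units ahead, so the memory needed per position grows with the interval bound. This is the dimension that the abstract says the decomposition into smaller, bounded intervals is meant to address.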

arXiv.org e-Print Archive

Archivio istituzionale della ricerca - Politecnico di Milano

Crossref

Open Repository and Bibliography - Luxembourg

Computing Web-scale Topic Models using an Asynchronous Parameter Server

Author: Asuncion A.
Hofmann T.
Yu H.-F.
Zaharia M.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2017

Topic models such as Latent Dirichlet Allocation (LDA) have been widely used in information retrieval for tasks ranging from smoothing and feedback methods to tools for exploratory search and discovery. However, classical methods for inferring topic models do not scale up to the massive size of today's publicly available Web-scale data sets. The state-of-the-art approaches rely on custom strategies, implementations and hardware to facilitate their asynchronous, communication-intensive workloads. We present APS-LDA, which integrates state-of-the-art topic modeling with cluster computing frameworks such as Spark using a novel asynchronous parameter server. Advantages of this integration include convenient usage of existing data processing pipelines and eliminating the need for disk writes as data can be kept in memory from start to finish. Our goal is not to outperform highly customized implementations, but to propose a general high-performance topic modeling framework that can easily be used in today's data processing pipelines. We compare APS-LDA to the existing Spark LDA implementations and show that our system can, on a 480-core cluster, process up to 135 times more data and 10 times more topics without sacrificing model quality. Comment: To appear in SIGIR 201

arXiv.org e-Print Archive

Crossref

International Migration, Integration and Social Cohesion online publications

Window-based Streaming Graph Partitioning Algorithm

Author: Abdolrashidi A.
Bader D. A.
Gonzalez Joseph E.
Sajjad H. P.
Wang R.
Zaharia Matei
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2019

In recent years, the scale of graph datasets has increased to such a degree that a single machine is not capable of efficiently processing large graphs. Therefore, efficient graph partitioning is necessary for those large graph applications. Traditional graph partitioning generally loads the whole graph into memory before partitioning; this is not only time-consuming but also creates memory bottlenecks. These issues of memory limitation and enormous time complexity can be resolved using stream-based graph partitioning. A streaming graph partitioning algorithm reads each vertex once and assigns it to a partition accordingly. This is also called a one-pass algorithm. This paper proposes an efficient window-based streaming graph partitioning algorithm called WStream. The WStream algorithm is an edge-cut partitioning algorithm, which distributes vertices among the partitions. Our results suggest that the WStream algorithm is able to partition large graph data efficiently while keeping the load balanced across different partitions, and communication to a minimum. Evaluation results with real workloads also demonstrate the effectiveness of our proposed algorithm, and it achieves a significant reduction in load imbalance and edge-cut with different ranges of dataset
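The one-pass idea can be sketched with a simple greedy streaming partitioner. This is not WStream, whose window mechanism the abstract does not describe; the sketch swaps in a linear-weighted greedy heuristic that prefers the partition already holding most of a vertex's neighbours, discounted by how full that partition is. The function name `stream_partition`, the `(vertex, neighbours)` stream layout, and the `capacity` parameter are assumptions for illustration.

```python
def stream_partition(vertex_stream, num_parts, capacity):
    """Assign each vertex to a partition in a single pass over the stream.

    vertex_stream: iterable of (vertex, list_of_neighbours).
    Each vertex is seen once and never moved afterwards.
    Greedy score (an assumed heuristic, not WStream):
        already-placed neighbours in p * (1 - size(p) / capacity)
    Ties go to the emptier partition, then to the lower index.
    """
    assignment = {}
    sizes = [0] * num_parts
    for vertex, neighbours in vertex_stream:
        best, best_key = None, None
        for p in range(num_parts):
            if sizes[p] >= capacity:
                continue  # partition is full, keep the load balanced
            placed = sum(1 for n in neighbours if assignment.get(n) == p)
            score = placed * (1 - sizes[p] / capacity)
            key = (score, -sizes[p])
            if best_key is None or key > best_key:
                best, best_key = p, key
        if best is None:
            raise ValueError("all partitions are full")
        assignment[vertex] = best
        sizes[best] += 1
    return assignment


# Two triangles joined by the single edge c-d.
stream = [
    ("a", ["b", "c"]), ("b", ["a", "c"]), ("c", ["a", "b", "d"]),
    ("d", ["c", "e", "f"]), ("e", ["d", "f"]), ("f", ["d", "e"]),
]
print(stream_partition(stream, num_parts=2, capacity=3))
```

On this small input each triangle ends up in its own partition, so only the edge c-d is cut. A window-based variant would presumably buffer several vertices before deciding, but that detail is not recoverable from the abstract.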

arXiv.org e-Print Archive

Crossref

University of Tasmania Open Access Repository

Bozeş sedimentary unit (Apuseni Mts., Romania) : geochemical constraints on provenance and tectonic setting

Author: Bǎlc R.
Socaciu A.
Zaharia Luminiţa
Publication date: 01/01/2012

University of Szeged

U.S. Sport Management Programs in Business Schools: Trends and Key Issues

Author: Kaburakis Anastasios
Pierce David A.
Zaharia Noni
Publication venue: 'Human Kinetics'
Publication date: 01/02/2016

The growth of sport management programs housed in (or with formal curriculum-based ties to) a school of business indicates more academic institutions are reconsidering sport management as a business-oriented field. Thus, research is necessary regarding benchmarking information on the state of these academic programs. The purpose of this study is to explore trends on administration, housing, accreditation, faculty performance indicators and research requirements, as well as salaries for faculty and alumni of such programs. Data were submitted by 74 department chairs and program directors employed in U.S. business schools featuring sport management programs. Results indicate that the majority of sport business programs are part of an interdisciplinary department; COSMA accreditation is largely viewed as redundant; and, depending on business schools’ accreditation, variability exists concerning faculty performance measures and research impact, as well as faculty and alumni salaries. These findings suggest considerable progress of sport management programs within business schools

IUPUIScholarWorks

GraphSE$^2$: An Encrypted Graph Database for Privacy-Preserving Social Search

Author: Beaver D.
Chi Y.
Papadimitriou A.
Poddar R.
Slee M.
Xie D.
Yao A.C.
Zaharia M.
Zhang Y.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 15/05/2019

In this paper, we propose GraphSE$^2$, an encrypted graph database for online social network services to address massive data breaches. GraphSE$^2$ preserves the functionality of social search, a key enabler for quality social network services, where social search queries are conducted on a large-scale social graph and meanwhile perform set and computational operations on user-generated contents. To enable efficient privacy-preserving social search, GraphSE$^2$ provides an encrypted structural data model to facilitate parallel and encrypted graph data access. It is also designed to decompose complex social search queries into atomic operations and realise them via interchangeable protocols in a fast and scalable manner. We build GraphSE$^2$ with various queries supported in the Facebook graph search engine and implement a full-fledged prototype. Extensive evaluations on Azure Cloud demonstrate that GraphSE$^2$ is practical for querying a social graph with a million users. Comment: This is the full version of our AsiaCCS paper "GraphSE$^2$: An Encrypted Graph Database for Privacy-Preserving Social Search". It includes the security proof of the proposed scheme. If you want to cite our work, please cite the conference version of i

arXiv.org e-Print Archive

Crossref

Metallurgical industry in Romania in the context of the economic crisis

Author: A. Bălăcescu
A. G. Babucea
C. I. Răbonţu
M. Zaharia
Publication venue: Croatian Metallurgical Society (CMS)
Publication date: 01/01/2015

The magnitude of the economic crisis and its influence on the development of industrial branches have varied. Although European economies are strongly interconnected both internally and externally, the way in which an economic branch has gone through and is trying to overcome the economic crisis has some peculiarities arising, on the one hand, from its specificity and, on the other hand, from the policies applied in the field. Based on these considerations, the paper examines how the Romanian metallurgical industry is passing through the economic crisis as compared with other industries. In addition, based on the quantitative analyses performed and taking seasonality into account, models of the evolution of this industry are presented, with a horizon of February 2015

Directory of Open Access Journals

HRČAK - Portal of Croatian Scientific and Professional Journals

Hrčak - Portal of scientific journals of Croatia

Low latency via redundancy

Author: Al-Fares M.
Ananthanarayanan G.
Andersen D. G.
Asmussen S.
Beaver D.
Han D.
Li J.
Zaharia M.
Zwart A. P.
Publication date: 16/06/2013

Low latency is critical for interactive networked applications. But while we know how to scale systems to increase capacity, reducing latency, especially the tail of the latency distribution, can be much more difficult. In this paper, we argue that the use of redundancy is an effective way to convert extra capacity into reduced latency. By initiating redundant operations across diverse resources and using the first result which completes, redundancy improves a system's latency even under exceptional conditions. We study the tradeoff with added system utilization, characterizing the situations in which replicating all tasks reduces mean latency. We then demonstrate empirically that replicating all operations can result in significant mean and tail latency reduction in real-world systems including DNS queries, database servers, and packet forwarding within networks
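The core mechanism, issuing the same operation against several resources and taking whichever result completes first, can be sketched with Python's standard `concurrent.futures` module. This is an illustrative sketch rather than the authors' implementation; `first_result` and the `delayed` stand-in for a replica are hypothetical names, and the list of operations is assumed to be non-empty.

```python
import concurrent.futures as cf
import time


def first_result(operations):
    """Run all operations concurrently and return the first successful result.

    operations: non-empty list of zero-argument callables, one per replica.
    Slower replicas keep running in the background; their results are ignored.
    """
    pool = cf.ThreadPoolExecutor(max_workers=len(operations))
    futures = [pool.submit(op) for op in operations]
    try:
        for fut in cf.as_completed(futures):
            if fut.exception() is None:
                return fut.result()
        raise RuntimeError("all redundant operations failed")
    finally:
        # Do not block on the stragglers.
        pool.shutdown(wait=False)


def delayed(delay, value):
    """Hypothetical stand-in for a replica that answers after `delay` seconds."""
    def op():
        time.sleep(delay)
        return value
    return op


print(first_result([delayed(0.3, "slow replica"), delayed(0.01, "fast replica")]))
```

The caller pays for the extra work done by the slower replicas, which is the utilization tradeoff the abstract says the paper characterizes.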

arXiv.org e-Print Archive

CiteSeerX

Crossref

Low Latency Geo-distributed Data Analytics

Author: Agarwal S.
Ananthanarayanan G.
Boutin E.
Corbett J. C.
Rabkin A.
Sitaraman R.
Venkataraman S.
Vulimiri A.
Zaharia M.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 03/12/2015

Low latency analytics on geographically distributed datasets (across datacenters, edge clusters) is an upcoming and increasingly important challenge. The dominant approach of aggregating all the data to a single datacenter significantly inflates the timeliness of analytics. At the same time, running queries over geo-distributed inputs using the current intra-DC analytics frameworks also leads to high query response times because these frameworks cannot cope with the relatively low and variable capacity of WAN links. We present Iridium, a system for low latency geo-distributed analytics. Iridium achieves low query response times by optimizing placement of both data and tasks of the queries. The joint data and task placement optimization, however, is intractable. Therefore, Iridium uses an online heuristic to redistribute datasets among the sites prior to queries' arrivals, and places the tasks to reduce network bottlenecks during the query's execution. Finally, it also contains a knob to budget WAN usage. Evaluation across eight worldwide EC2 regions using production queries shows that Iridium speeds up queries by 3× to 19× and lowers WAN usage by 15% to 64% compared to existing baselines

CiteSeerX

Crossref