
    Efficient Large-scale Trace Checking Using MapReduce

    The problem of checking a logged event trace against a temporal logic specification arises in many practical cases. Unfortunately, known algorithms for an expressive logic like MTL (Metric Temporal Logic) do not scale with respect to two crucial dimensions: the length of the trace and the size of the time interval for which logged events must be buffered to check satisfaction of the specification. The former issue can be addressed by distributed and parallel trace checking algorithms that can take advantage of modern cloud computing and programming frameworks like MapReduce. Still, the latter issue remains open with current state-of-the-art approaches. In this paper we address this memory scalability issue by proposing a new semantics for MTL, called lazy semantics. This semantics can evaluate temporal formulae and boolean combinations of temporal-only formulae at any arbitrary time instant. We prove that lazy semantics is more expressive than standard point-based semantics and that it can be used as a basis for a correct parametric decomposition of any MTL formula into an equivalent one with smaller, bounded time intervals. We use lazy semantics to extend our previous distributed trace checking algorithm for MTL. We evaluate the proposed algorithm in terms of memory scalability and time/memory tradeoffs. Comment: 13 pages, 8 figures
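    To make the memory issue concrete, here is a minimal sketch of checking a single MTL formula F_[a,b] p ("p holds at some position whose timestamp is between a and b time units later") under standard point-based semantics over a timestamped trace. This is not the paper's lazy-semantics algorithm; the function name, the (timestamp, p_holds) trace encoding, and the single-atom formula are assumptions for illustration. It reports the peak number of positions it had to buffer, which grows with the interval bound b.

```python
from collections import deque
from typing import List, Tuple


def check_eventually(trace: List[Tuple[float, bool]], a: float, b: float):
    """Point-based semantics of F_[a,b] p at every trace position (illustrative).

    trace: (timestamp, p_holds) pairs with non-decreasing timestamps.
    Returns (verdicts, peak) where peak is the largest number of buffered
    positions whose verdict was still open at once.
    """
    verdicts = [False] * len(trace)
    pending = deque()  # open positions, oldest timestamp first
    peak = 0
    for j, (t, p) in enumerate(trace):
        pending.append(j)  # j may witness itself when a == 0
        peak = max(peak, len(pending))
        # Positions whose window [t_i + a, t_i + b] has passed stay False.
        while pending and t - trace[pending[0]][0] > b:
            pending.popleft()
        if p:
            # Position j satisfies every open position i with t - t_i >= a.
            still_open = deque()
            for i in pending:
                if t - trace[i][0] >= a:
                    verdicts[i] = True
                else:
                    still_open.append(i)
            pending = still_open
    return verdicts, peak
```

    For example, `check_eventually([(0, False), (1, False), (3, True), (10, False)], 1, 3)` marks the first two positions as satisfied by the event at time 3. A larger b keeps more positions open at once, which is the buffering cost the abstract identifies.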

    Computing Web-scale Topic Models using an Asynchronous Parameter Server

    Topic models such as Latent Dirichlet Allocation (LDA) have been widely used in information retrieval for tasks ranging from smoothing and feedback methods to tools for exploratory search and discovery. However, classical methods for inferring topic models do not scale up to the massive size of today's publicly available Web-scale data sets. The state-of-the-art approaches rely on custom strategies, implementations and hardware to facilitate their asynchronous, communication-intensive workloads. We present APS-LDA, which integrates state-of-the-art topic modeling with cluster computing frameworks such as Spark using a novel asynchronous parameter server. Advantages of this integration include convenient usage of existing data processing pipelines and eliminating the need for disk writes as data can be kept in memory from start to finish. Our goal is not to outperform highly customized implementations, but to propose a general high-performance topic modeling framework that can easily be used in today's data processing pipelines. We compare APS-LDA to the existing Spark LDA implementations and show that our system can, on a 480-core cluster, process up to 135 times more data and 10 times more topics without sacrificing model quality. Comment: To appear in SIGIR 201
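    The core idea of an asynchronous parameter server (workers push sparse updates without waiting, and pull current values on demand) can be sketched in a single process. This toy is not APS-LDA's API or its Spark integration; the class name, the (word_id, topic_id) count keys, and the thread-pool "network" are assumptions made only to show sharding, non-blocking pushes, and pulls.

```python
import threading
from collections import defaultdict
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Dict, Iterable, List, Tuple

# Hypothetical key: (word_id, topic_id) -> word-topic count, as in LDA.
Key = Tuple[int, int]


class ToyParameterServer:
    """In-process stand-in for a sharded, asynchronous parameter server."""

    def __init__(self, num_shards: int = 4):
        self.shards = [defaultdict(int) for _ in range(num_shards)]
        self.locks = [threading.Lock() for _ in range(num_shards)]
        # The pool plays the role of the network: pushes return immediately.
        self.pool = ThreadPoolExecutor(max_workers=num_shards)

    def _shard(self, key: Key) -> int:
        return hash(key) % len(self.shards)

    def push_async(self, deltas: Dict[Key, int]) -> List[Future]:
        """Group sparse count deltas by shard and apply them in the background."""
        by_shard: Dict[int, Dict[Key, int]] = defaultdict(dict)
        for key, delta in deltas.items():
            by_shard[self._shard(key)][key] = delta
        return [self.pool.submit(self._apply, s, part) for s, part in by_shard.items()]

    def _apply(self, s: int, part: Dict[Key, int]) -> None:
        with self.locks[s]:
            for key, delta in part.items():
                self.shards[s][key] += delta

    def pull(self, keys: Iterable[Key]) -> Dict[Key, int]:
        """Read current values; may not yet reflect pushes still in flight."""
        out = {}
        for key in keys:
            s = self._shard(key)
            with self.locks[s]:
                out[key] = self.shards[s].get(key, 0)
        return out

    def close(self) -> None:
        self.pool.shutdown(wait=True)
```

    A worker would call `push_async` after sampling a batch and keep going; it only waits on the returned futures (or not at all) when it needs fresher counts. Accepting slightly stale reads in exchange for not blocking is the usual trade-off behind asynchronous designs of this kind.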

    Window-based Streaming Graph Partitioning Algorithm

    In recent years, graph datasets have grown to such a scale that a single machine can no longer process large graphs efficiently. Efficient graph partitioning is therefore necessary for large graph applications. Traditional graph partitioning generally loads the whole graph into memory before partitioning; this is not only time-consuming but also creates memory bottlenecks. Stream-based graph partitioning can resolve these issues of memory limitation and high time complexity. A streaming graph partitioning algorithm reads each vertex once and assigns it to a partition as it arrives; such an algorithm is also called a one-pass algorithm. This paper proposes an efficient window-based streaming graph partitioning algorithm called WStream. WStream is an edge-cut partitioning algorithm, which distributes vertices among the partitions. Our results suggest that WStream partitions large graph data efficiently while keeping the load balanced across partitions and communication to a minimum. Evaluation results with real workloads also demonstrate the effectiveness of the proposed algorithm: it achieves a significant reduction in load imbalance and edge-cut across a range of datasets.
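    As a reference point for what a one-pass, edge-cut streaming partitioner does, the sketch below implements the well-known Linear Deterministic Greedy (LDG) heuristic, not WStream: the abstract does not describe WStream's windowing or scoring, so none of that is modeled here. Each arriving vertex goes to the partition holding most of its already-placed neighbors, discounted by how full that partition is. The stream format (vertex, neighbor list) and the `capacity` parameter are assumptions.

```python
from typing import Dict, List, Tuple


def ldg_partition(stream: List[Tuple[int, List[int]]], k: int,
                  capacity: float) -> Dict[int, int]:
    """One-pass LDG: each vertex is read once and assigned immediately."""
    assignment: Dict[int, int] = {}
    sizes = [0] * k
    for v, neighbors in stream:
        best, best_score = 0, float("-inf")
        for p in range(k):
            # Neighbors of v already placed in partition p.
            placed = sum(1 for u in neighbors if assignment.get(u) == p)
            # Penalize partitions that are close to capacity.
            score = placed * (1 - sizes[p] / capacity)
            # Higher score wins; ties go to the less loaded partition.
            if score > best_score or (score == best_score and sizes[p] < sizes[best]):
                best, best_score = p, score
        assignment[v] = best
        sizes[best] += 1
    return assignment


def edge_cut(edges: List[Tuple[int, int]], assignment: Dict[int, int]) -> int:
    """Number of edges whose endpoints fall in different partitions."""
    return sum(1 for u, v in edges if assignment[u] != assignment[v])
```

    On two triangles joined by a single edge, streamed in vertex order with k=2 and capacity 3, this places each triangle in its own partition and cuts only the bridging edge. A window-based variant would presumably buffer several vertices before committing them, trading a little memory for better decisions; that design is not shown here.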

    An analysis of the turnover index evolution in metallurgy during 2000-2012, the case of Romania

    The economic crisis has left its mark on metallurgical activities, which have declined significantly with lasting implications. The magnitude of the economic crisis and its influence on long-term trends differ from one industry to another. The research shows that, for the evolution of metallurgy from January 2000 to November 2012, no valid model could be identified to describe the turnover indices over the entire period, because of the discontinuity caused by the economic crisis. However, models can be generated for the evolution of annual turnover indices in the period preceding the economic crisis, and for the evolution of turnover value indices in the period after it.

    U.S. Sport Management Programs in Business Schools: Trends and Key Issues

    The growth of sport management programs housed in (or with formal curriculum-based ties to) a school of business indicates that more academic institutions are reconsidering sport management as a business-oriented field. Research providing benchmarking information on the state of these academic programs is therefore needed. The purpose of this study is to explore trends in administration, housing, accreditation, faculty performance indicators and research requirements, as well as salaries for faculty and alumni of such programs. Data were submitted by 74 department chairs and program directors employed in U.S. business schools featuring sport management programs. Results indicate that the majority of sport business programs are part of an interdisciplinary department; COSMA accreditation is largely viewed as redundant; and, depending on the business school's accreditation, faculty performance measures, research impact expectations, and faculty and alumni salaries vary. These findings suggest considerable progress of sport management programs within business schools.

    GraphSE²: An Encrypted Graph Database for Privacy-Preserving Social Search

    In this paper, we propose GraphSE², an encrypted graph database for online social network services that addresses the problem of massive data breaches. GraphSE² preserves the functionality of social search, a key enabler for quality social network services, in which queries run over a large-scale social graph while also performing set and computational operations on user-generated content. To enable efficient privacy-preserving social search, GraphSE² provides an encrypted structural data model that facilitates parallel, encrypted access to graph data. It is also designed to decompose complex social search queries into atomic operations and to realise them via interchangeable protocols in a fast and scalable manner. We build GraphSE² to support various queries supported by the Facebook graph search engine and implement a full-fledged prototype. Extensive evaluations on Azure Cloud demonstrate that GraphSE² is practical for querying a social graph with a million users. Comment: This is the full version of our AsiaCCS paper "GraphSE²: An Encrypted Graph Database for Privacy-Preserving Social Search". It includes the security proof of the proposed scheme. If you want to cite our work, please cite the conference version of it.

    Metallurgical industry in Romania in the context of the economic crisis

    The magnitude of the economic crisis and its influence on the development of industrial branches varied. Although European economies are strongly interconnected both internally and externally, the way in which an economic branch has weathered and is trying to overcome the economic crisis has some peculiarities arising, on the one hand, from its specificity and, on the other hand, from the policies applied in the field. Based on these considerations, the paper examines how the Romanian metallurgical industry is passing through the economic crisis compared with other industries. Also, based on the quantitative analyses performed and taking into account the specific phenomenon of seasonality, the paper presents models of the evolution of this industry with a horizon of February 2015.

    Low latency via redundancy

    Low latency is critical for interactive networked applications. But while we know how to scale systems to increase capacity, reducing latency, especially the tail of the latency distribution, can be much more difficult. In this paper, we argue that redundancy is an effective way to convert extra capacity into reduced latency. By initiating redundant operations across diverse resources and using the first result that completes, redundancy improves a system's latency even under exceptional conditions. We study the tradeoff with added system utilization, characterizing the situations in which replicating all tasks reduces mean latency. We then demonstrate empirically that replicating all operations can result in significant mean and tail latency reduction in real-world systems including DNS queries, database servers, and packet forwarding within networks.
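    The "use the first result that completes" idea can be illustrated with a small Monte Carlo sketch. The latency model (exponential service times with a 5% chance of a 10x slow path) is a hypothetical stand-in, and the sketch assumes independent copies and ignores the extra load that replication adds, which is exactly the utilization tradeoff the paper studies.

```python
import random
import statistics
from typing import List, Tuple


def simulate(n_requests: int, copies: int, seed: int = 0) -> Tuple[List[float], List[float]]:
    """Latency of one copy vs. the fastest of `copies` independent copies."""
    rng = random.Random(seed)
    single, replicated = [], []
    for _ in range(n_requests):
        # Hypothetical latency model: exponential, occasionally 10x slower.
        samples = [rng.expovariate(1.0) * (10 if rng.random() < 0.05 else 1)
                   for _ in range(copies)]
        single.append(samples[0])
        replicated.append(min(samples))  # take whichever copy finishes first
    return single, replicated


def p99(xs: List[float]) -> float:
    """99th percentile (the 99th of 99 cut points from statistics.quantiles)."""
    return statistics.quantiles(xs, n=100)[98]
```

    Comparing `statistics.mean` and `p99` of the two lists shows both mean and tail shrinking with two copies. Because replication also raises server utilization, a faithful model would add queueing so that each extra copy slows every request a little; only under that model does the "when does replication pay off" question from the abstract appear.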

    Low Latency Geo-distributed Data Analytics

    Low-latency analytics on geographically distributed datasets (across datacenters, edge clusters) is an upcoming and increasingly important challenge. The dominant approach of aggregating all the data into a single datacenter significantly hurts the timeliness of analytics. At the same time, running queries over geo-distributed inputs using current intra-DC analytics frameworks also leads to high query response times, because these frameworks cannot cope with the relatively low and variable capacity of WAN links. We present Iridium, a system for low-latency geo-distributed analytics. Iridium achieves low query response times by optimizing the placement of both the data and the tasks of queries. The joint data and task placement optimization, however, is intractable. Therefore, Iridium uses an online heuristic to redistribute datasets among the sites before queries arrive, and places tasks to reduce network bottlenecks during query execution. Finally, it also contains a knob to budget WAN usage. Evaluation across eight worldwide EC2 regions using production queries shows that Iridium speeds up queries by 3× to 19× and lowers WAN usage by 15% to 64% compared to existing baselines.
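    Task placement that "reduces network bottlenecks" can be illustrated with a simplified model, which is our assumption rather than Iridium's published formulation: site i holds S[i] bytes of intermediate data, has uplink U[i] and downlink D[i], and receives a fraction r[i] of the reduce tasks. Site i then uploads (1 - r[i]) * S[i] bytes and downloads r[i] * (total - S[i]) bytes, and the shuffle finishes when the slowest link does. The sketch finds the fractions that minimize that finishing time by binary search on the time T, since for a fixed T each site's feasible r[i] is an interval.

```python
from typing import List, Tuple


def min_shuffle_time(S: List[float], U: List[float], D: List[float],
                     iters: int = 100) -> Tuple[float, List[float]]:
    """Minimize max_i(upload_i / U[i], download_i / D[i]) over reduce fractions r."""
    total = sum(S)

    def bounds(T: float) -> Tuple[List[float], List[float]]:
        lo, hi = [], []
        for s, u, d in zip(S, U, D):
            # Uploading (1 - r) * s bytes within T forces r >= 1 - T*u/s.
            lo.append(max(0.0, 1 - T * u / s) if s > 0 else 0.0)
            # Downloading r * (total - s) bytes within T forces r <= T*d/(total - s).
            hi.append(min(1.0, T * d / (total - s)) if total - s > 0 else 1.0)
        return lo, hi

    def feasible(T: float) -> bool:
        lo, hi = bounds(T)
        return all(l <= h for l, h in zip(lo, hi)) and sum(lo) <= 1 <= sum(hi)

    lo_T, hi_T = 0.0, 1.0
    while not feasible(hi_T):  # grow an upper bound on the optimal time
        hi_T *= 2
    for _ in range(iters):  # feasibility is monotone in T, so bisect
        mid = (lo_T + hi_T) / 2
        if feasible(mid):
            hi_T = mid
        else:
            lo_T = mid
    lo, hi = bounds(hi_T)
    # Start from the lower bounds and spread the remaining share up to the upper bounds.
    r = lo[:]
    slack = 1 - sum(r)
    for i in range(len(r)):
        add = min(hi[i] - r[i], slack)
        r[i] += add
        slack -= add
    return hi_T, r
```

    For two identical sites with S = [100, 100] and all link capacities 10, the search converges to a time of about 5 with an even split of reduce tasks. With skewed data or bandwidth, the fractions shift toward sites with fast links, which is the kind of bottleneck-aware placement the abstract describes; Iridium's data-redistribution heuristic and WAN budget knob are not modeled.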