    Window-based Streaming Graph Partitioning Algorithm

    In recent years, graph datasets have grown to such a scale that a single machine can no longer process them efficiently, so efficient graph partitioning has become necessary for large-graph applications. Traditional graph partitioning loads the whole graph into memory before partitioning it, which is not only time-consuming but also creates memory bottlenecks. Streaming graph partitioning avoids both the memory limitation and the high time complexity: a streaming partitioner reads each vertex once and assigns it to a partition as it arrives, which is why it is also called a one-pass algorithm. This paper proposes WStream, an efficient window-based streaming graph partitioning algorithm. WStream is an edge-cut partitioning algorithm, i.e., it distributes vertices among the partitions. Our results suggest that WStream partitions large graphs efficiently while keeping the load balanced across partitions and communication to a minimum. Evaluation results with real workloads further demonstrate the effectiveness of the proposed algorithm, which achieves a significant reduction in load imbalance and edge-cut across datasets of different sizes.
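
    The one-pass assignment described above can be sketched with a generic greedy streaming heuristic. This is not WStream: it uses Linear Deterministic Greedy (LDG) scoring, swapped in as a well-known one-pass edge-cut baseline, and it has no window. The function names, the (vertex, neighbour list) stream format and the `capacity` parameter are assumptions for illustration.

```python
from typing import Dict, Iterable, List, Tuple


def ldg_partition(stream: Iterable[Tuple[int, List[int]]], k: int,
                  capacity: float) -> Dict[int, int]:
    """Assign each streamed vertex to one of k partitions in a single pass.

    LDG scoring: prefer the partition holding the most already-placed
    neighbours, discounted by how full that partition already is.
    """
    assignment: Dict[int, int] = {}
    sizes = [0] * k
    for v, neighbours in stream:
        best, best_score = 0, float("-inf")
        for p in range(k):
            placed = sum(1 for u in neighbours if assignment.get(u) == p)
            score = placed * (1.0 - sizes[p] / capacity)
            # On ties, favour the less loaded partition to keep load balanced.
            if score > best_score or (score == best_score and sizes[p] < sizes[best]):
                best, best_score = p, score
        assignment[v] = best  # the decision is final: no second pass
        sizes[best] += 1
    return assignment


def edge_cut(edges: Iterable[Tuple[int, int]], assignment: Dict[int, int]) -> int:
    """Count edges whose endpoints ended up in different partitions."""
    return sum(1 for u, v in edges if assignment[u] != assignment[v])
```

    Each vertex is placed using only the vertices seen so far, so the partitioner keeps an assignment table rather than the whole graph in memory.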

    Asymptotically Optimal Approximation Algorithms for Coflow Scheduling

    Many modern datacenter applications involve large-scale computations composed of multiple data flows that must be completed over a shared set of distributed resources. Such a computation completes only when all of its flows complete. A useful abstraction for modeling such scenarios is a coflow, which is a collection of flows (e.g., tasks, packets, data transmissions) that all share the same performance goal. In this paper, we present the first approximation algorithms for scheduling coflows over general network topologies with the objective of minimizing total weighted completion time. We consider two different models for coflows based on the nature of individual flows: circuits and packets. We design constant-factor polynomial-time approximation algorithms for scheduling packet-based coflows with or without given flow paths, and for circuit-based coflows with given flow paths. Furthermore, we give a polynomial-time $O(\log n/\log\log n)$-approximation algorithm for scheduling circuit-based coflows where flow paths are not given (here $n$ is the number of network edges). We obtain our results by developing a general framework for coflow schedules, based on interval-indexed linear programs, which may extend to other coflow models and objective functions and may also yield improved approximation bounds for specific network scenarios. We also present an experimental evaluation of our approach for circuit-based coflows that shows a performance improvement of at least 22% on average over competing heuristics.
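
    Because a coflow finishes only when its last flow finishes, the objective above is a weighted sum of per-coflow maxima. The sketch below evaluates that objective for a given schedule and maps a completion time to a geometrically growing time interval, the kind of discretization interval-indexed LPs typically use. The base-2 boundaries and all function names are assumptions; the paper's LP and its rounding are not reproduced here.

```python
from typing import Dict, List


def coflow_completion_times(flow_finish: Dict[str, float],
                            coflows: Dict[str, List[str]]) -> Dict[str, float]:
    """A coflow completes when the last of its flows completes."""
    return {c: max(flow_finish[f] for f in flows) for c, flows in coflows.items()}


def total_weighted_completion(flow_finish: Dict[str, float],
                              coflows: Dict[str, List[str]],
                              weights: Dict[str, float]) -> float:
    """Objective: sum over coflows c of w_c * C_c."""
    done = coflow_completion_times(flow_finish, coflows)
    return sum(weights[c] * done[c] for c in coflows)


def interval_index(t: float, base: float = 2.0) -> int:
    """Smallest l with t <= base**l (assumes base > 1 and finite t).

    For t > 1 this is the geometric interval (base**(l-1), base**l] holding t.
    """
    l = 0
    while base ** l < t:
        l += 1
    return l
```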

    GraphSE²: An Encrypted Graph Database for Privacy-Preserving Social Search

    In this paper, we propose GraphSE², an encrypted graph database for online social network services that addresses the risk of massive data breaches. GraphSE² preserves the functionality of social search, a key enabler of quality social network services, in which queries traverse a large-scale social graph while performing set and computational operations on user-generated content. To enable efficient privacy-preserving social search, GraphSE² provides an encrypted structural data model that supports parallel, encrypted access to graph data. It is also designed to decompose complex social search queries into atomic operations and to realise them via interchangeable protocols in a fast and scalable manner. We build GraphSE² to support various queries from the Facebook graph search engine and implement a full-fledged prototype. Extensive evaluations on Azure Cloud demonstrate that GraphSE² is practical for querying a social graph with a million users. Comment: This is the full version of our AsiaCCS paper "GraphSE²: An Encrypted Graph Database for Privacy-Preserving Social Search". It includes the security proof of the proposed scheme. If you want to cite our work, please cite the conference version of it.
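
    The idea of answering a compound query through atomic operations over protected data can be illustrated with a toy keyed-token index. This is not GraphSE²'s scheme: HMAC-SHA256 tokens stand in for encryption, the relation labels (e.g. `friend_of:alice`) and function names are hypothetical, and a server holding such an index still learns access patterns. It only shows a conjunctive query decomposed into per-label lookups plus a set intersection.

```python
import hashlib
import hmac
from typing import Dict, Iterable, Optional, Set


def token(key: bytes, label: str) -> str:
    """Deterministic keyed token (HMAC-SHA256); stands in for encryption here."""
    return hmac.new(key, label.encode("utf-8"), hashlib.sha256).hexdigest()


def build_index(key: bytes, relations: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Client side: replace every relation label and member id with a token."""
    return {token(key, label): {token(key, m) for m in members}
            for label, members in relations.items()}


def lookup(index: Dict[str, Set[str]], tok: str) -> Set[str]:
    """Server-side atomic operation: fetch the member set stored under one token."""
    return set(index.get(tok, set()))


def conjunctive_query(index: Dict[str, Set[str]], key: bytes,
                      labels: Iterable[str]) -> Set[str]:
    """Decompose an AND query into atomic lookups, then intersect the results."""
    result: Optional[Set[str]] = None
    for label in labels:
        members = lookup(index, token(key, label))
        result = members if result is None else result & members
    return result if result is not None else set()
```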

    Efficient and Fast Gaussian Beam-Tracking Approach for Indoor-Propagation Modeling

    International audience. A Gaussian beam-tracking technique is proposed for physical indoor-propagation modeling. Its efficiency stems from the collective treatment of rays, realized by using Gaussian beams to propagate fields. The formulation of the method is outlined, its computation-time efficiency is discussed, and the simulation results are compared with those obtained using commercial ray-tracing software (XSiradif).
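
    For reference, the field of a fundamental paraxial Gaussian beam spreads as $w(z) = w_0\sqrt{1 + (z/z_R)^2}$ with Rayleigh range $z_R = \pi w_0^2/\lambda$, and its magnitude falls off as $(w_0/w)\,e^{-r^2/w^2}$ across the beam. The helpers below evaluate these textbook expressions; the abstract does not give the beam formulation used in the paper, and the example parameters (a 2.4 GHz carrier, a 0.5 m waist) are assumptions.

```python
import math


def rayleigh_range(w0: float, wavelength: float) -> float:
    """z_R = pi * w0**2 / wavelength for a fundamental Gaussian beam."""
    return math.pi * w0 ** 2 / wavelength


def beam_radius(z: float, w0: float, wavelength: float) -> float:
    """1/e field radius w(z) at distance z from the beam waist."""
    return w0 * math.sqrt(1.0 + (z / rayleigh_range(w0, wavelength)) ** 2)


def field_magnitude(r: float, z: float, w0: float, wavelength: float,
                    e0: float = 1.0) -> float:
    """|E(r, z)| = e0 * (w0 / w(z)) * exp(-r**2 / w(z)**2); phase terms omitted."""
    w = beam_radius(z, w0, wavelength)
    return e0 * (w0 / w) * math.exp(-(r / w) ** 2)


# Assumed example values: 2.4 GHz carrier (wavelength 0.125 m), 0.5 m waist.
WAVELENGTH = 3.0e8 / 2.4e9
W0 = 0.5
```

    Tracking one such beam replaces a bundle of individual rays, which is the collective treatment the abstract credits for the method's efficiency.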

    Mineral waters in Brașov County. Characteristics and use

    Mineral water sources are spread all over Brașov County, but most of them have been identified and mapped in the central part of the county, namely at the contact between the Transylvanian Depression and the western part of the Eastern Carpathians. Based on the analysis of the mineral water sources identified in the field during the 2011–2016 period, three major types of mineral water have been identified: chlorosodic, carbonated and hypothermal waters. Chlorosodic waters occur within or close to areas of salt massifs (the eastern and south-eastern edge of the Transylvanian Depression), some of them having a high salt concentration (more than 70 g/l at Mercheașa and Racoș). Carbonated mineral waters appear in the southern part of the Neogene eruptive range and in the internal Carpathian flysch unit, along the Zizin–Tărlungeni–Săcele line. Hypothermal waters emerge along the Măieruș–Codlea line, with a constant temperature (23.4 °C at Măieruș and 18.4 °C at Codlea). Some locations with mineral water sources in Brașov County used to be permanent or seasonal resorts of regional or local importance; many of them are currently abandoned or in an advanced state of degradation (e.g. Băile Homorod, Băile Zizin, Băile Veneția de Jos), except for Băile Rodbav and Băile Perșani, which are still active.

    Application and testing of the L* neural network with the self-consistent magnetic field model of RAM-SCB

    We expanded our previous work on L* neural networks, which used empirical magnetic field models as the underlying models, by applying and extending our technique to drift shells calculated from a physics-based magnetic field model. While empirical magnetic field models represent an average, statistical magnetospheric state, the RAM-SCB model, a first-principles, magnetically self-consistent code, computes magnetic fields from the fundamental equations of plasma physics. Unlike the previous L* neural networks, which include McIlwain L and the mirror-point magnetic field among their inputs, the new L* neural network requires only solar wind conditions and the Dst index, which makes the input parameters easier to prepare. The new neural network is compared against the previously trained networks and validated against the tracing method in the International Radiation Belt Environment Modeling (IRBEM) library. The accuracy of all L* neural networks with different underlying magnetic field models is evaluated by applying the electron phase space density (PSD)-matching technique, derived from Liouville's theorem, to Van Allen Probes observations. Results indicate that the uncertainty in the predicted L* is below 0.7 in 75% of cases, with a median value mostly below 0.2 and a median absolute deviation of around 0.15, regardless of the underlying magnetic field model. We found that such an uncertainty in the calculated L* can shift the peak of the electron PSD profile radially by 0.2 R_E (Earth radii) while nearly preserving its shape. Key Points: (1) an L* neural network based on the RAM-SCB model is developed; (2) L* calculation accuracy is estimated by PSD matching using RBSP data; (3) L* uncertainty causes a radial shift in the electron phase space density profile.
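
    The three summary statistics quoted above (share of errors below a threshold, median, and median absolute deviation) can be computed from a set of per-point L* errors as sketched below. The function name is hypothetical, errors are assumed to be summarized by magnitude, and the MAD is left unscaled (no 1.4826 normal-consistency factor); the abstract does not say which convention the study used.

```python
import statistics
from typing import Dict, Sequence


def error_summary(errors: Sequence[float], threshold: float = 0.7) -> Dict[str, float]:
    """Summarize L* error magnitudes: share below threshold, median, and MAD."""
    abs_err = [abs(e) for e in errors]
    med = statistics.median(abs_err)
    # Median absolute deviation from the median (unscaled).
    mad = statistics.median([abs(e - med) for e in abs_err])
    share = sum(1 for e in abs_err if e < threshold) / len(abs_err)
    return {"share_below": share, "median": med, "mad": mad}
```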

    Low Latency Geo-distributed Data Analytics

    Low-latency analytics on geographically distributed datasets (across datacenters and edge clusters) is an emerging and increasingly important challenge. The dominant approach of aggregating all the data into a single datacenter significantly hurts the timeliness of analytics. At the same time, running queries over geo-distributed inputs with current intra-DC analytics frameworks also leads to high query response times, because these frameworks cannot cope with the relatively low and variable capacity of WAN links. We present Iridium, a system for low-latency geo-distributed analytics. Iridium achieves low query response times by optimizing the placement of both the data and the tasks of queries. Joint data and task placement optimization, however, is intractable. Therefore, Iridium uses an online heuristic to redistribute datasets among the sites before queries arrive, and places tasks to reduce network bottlenecks during query execution. Finally, it also provides a knob to budget WAN usage. Evaluation across eight worldwide EC2 regions using production queries shows that Iridium speeds up queries by 3×–19× and lowers WAN usage by 15%–64% compared to existing baselines.
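
    The task-placement half of the problem can be illustrated with a simplified single-stage shuffle model, assumed here rather than taken from the abstract: site i holds I_i of intermediate data and has uplink bandwidth U_i and downlink bandwidth D_i; placing a fraction r_i of the reduce tasks at site i costs an upload time I_i(1 - r_i)/U_i and a download time r_i(S - I_i)/D_i, where S is the total data; and the stage finishes when its slowest link does. Minimizing that bottleneck is a small linear program; the sketch below solves it by bisection on the finish time t instead of calling an LP solver. All names are hypothetical, and compute time and data redistribution are ignored.

```python
from typing import List, Sequence, Tuple


def _bounds(t: float, inputs: Sequence[float], up: Sequence[float],
            down: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Per-site range [lo_i, hi_i] of reduce fractions r_i that finish within t."""
    total = sum(inputs)
    lo: List[float] = []
    hi: List[float] = []
    for size, u, d in zip(inputs, up, down):
        # Upload: size * (1 - r) / u <= t  =>  r >= 1 - t * u / size
        lo.append(max(0.0, 1.0 - t * u / size) if size > 0 else 0.0)
        # Download: r * (total - size) / d <= t  =>  r <= t * d / (total - size)
        rest = total - size
        hi.append(min(1.0, t * d / rest) if rest > 0 else 1.0)
    return lo, hi


def _feasible(t: float, inputs: Sequence[float], up: Sequence[float],
              down: Sequence[float]) -> bool:
    lo, hi = _bounds(t, inputs, up, down)
    return all(l <= h for l, h in zip(lo, hi)) and sum(lo) <= 1.0 <= sum(hi)


def place_reduce_tasks(inputs: Sequence[float], up: Sequence[float],
                       down: Sequence[float], iters: int = 100) -> Tuple[float, List[float]]:
    """Minimize the shuffle bottleneck time by bisection on t; return (t, r)."""
    lo_t = 0.0
    # Every link finishes within this bound under any placement, so it is feasible.
    hi_t = sum(inputs) / min(min(up), min(down))
    for _ in range(iters):
        mid = (lo_t + hi_t) / 2
        if _feasible(mid, inputs, up, down):
            hi_t = mid
        else:
            lo_t = mid
    # Start every site at its lower bound, then hand out the remaining share.
    lo, hi = _bounds(hi_t, inputs, up, down)
    r = list(lo)
    spare = 1.0 - sum(r)
    for i in range(len(r)):
        extra = min(hi[i] - r[i], spare)
        r[i] += extra
        spare -= extra
    return hi_t, r
```

    In this model, a site holding most of the data behind a slow uplink attracts most of the reduce tasks, which keeps that data off the WAN; this is the intuition behind placing tasks to relieve network bottlenecks.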

    Automatic improvement of Apache Spark queries using semantics-preserving program reduction

    © 2016 ACM. Apache Spark is a popular framework for large-scale data analytics. Unfortunately, Spark's performance can be difficult to optimise, since queries freely expressed in source code are not amenable to traditional optimisation techniques. This article describes Hylas, a tool for automatically optimising Spark queries embedded in source code via the application of semantics-preserving transformations. The transformation method is inspired by the functional programming technique of "deforestation", which eliminates intermediate data structures from a computation. This contrasts with approaches defined entirely within structured query formats such as Spark SQL. Hylas can identify certain computationally expensive operations and ensure that performing them creates no superfluous data structures. This optimisation leads to significant improvements in execution time, with speedups of over 10,000 times observed in some cases.
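
    Deforestation can be illustrated outside Spark: a map, a filter and a fold chained over lists materialize two intermediate lists, while the fused loop computes the same result in a single pass with none. The sketch uses plain Python collections as a stand-in for Spark datasets and is not Hylas's transformation; the function names are hypothetical.

```python
from typing import Callable, Iterable, TypeVar

A = TypeVar("A")
B = TypeVar("B")
C = TypeVar("C")


def map_filter_fold_naive(f: Callable[[A], B], keep: Callable[[B], bool],
                          step: Callable[[C, B], C], init: C, xs: Iterable[A]) -> C:
    mapped = [f(x) for x in xs]             # intermediate structure 1
    kept = [y for y in mapped if keep(y)]   # intermediate structure 2
    acc = init
    for y in kept:
        acc = step(acc, y)
    return acc


def map_filter_fold_fused(f: Callable[[A], B], keep: Callable[[B], bool],
                          step: Callable[[C, B], C], init: C, xs: Iterable[A]) -> C:
    acc = init
    for x in xs:  # one pass, no intermediate collections
        y = f(x)
        if keep(y):
            acc = step(acc, y)
    return acc
```

    Both functions return the same value for any input; the fused one simply never allocates the lists that the chained version builds and discards.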