7 research outputs found

    Theoretical Analysis for Scale-down-Aware Service Allocation in Cloud Storage Systems

    Service allocation algorithms have been gaining popularity in the cloud computing research community. There has been a lot of research on improving service allocation schemes for high utilization, latency reduction, and efficient VM migration, but little work focuses on the energy consumption affected by instance placement in data centers. In this paper we propose an algorithm that maximizes the number of freed-up machines in data centers, that is, machines that host purely scale-down instances, which are required to be shut down for energy saving at certain points in time. We intuitively employ a probability partitioning mechanism to schedule services such that this maximization goal can be achieved. Furthermore, we perform a set of experiments to test the partitioning rules, which show that the proposed algorithms can dynamically increase the number of freed-up machines substantially. DOI: http://dx.doi.org/10.11591/ijece.v3i1.179

    DPM: A novel distributed large-scale social graph processing framework for link prediction algorithms

    Large-scale graphs have become ubiquitous in social media. Computer-based recommendations in these huge graphs pose challenges in terms of algorithm design and resource usage efficiency when processing recommendations in distributed computing environments. Moreover, recommendation algorithms for graphs, particularly link prediction algorithms, have different requirements depending on the way the underlying graph is traversed. Path-based algorithms usually perform traversals in different directions to build a large ranking of vertices to recommend, whereas random walk-based algorithms build an initial subgraph and perform several iterations on those vertices to compute the final ranking. In this work, we propose a distributed graph processing framework called Distributed Partitioned Merge (DPM), which supports both types of algorithms, and we compare its performance and resource usage with two relevant frameworks, namely Fork-Join and Pregel. In our experiments, we show that in most tests DPM outperforms both Pregel and Fork-Join in terms of recommendation time, with a minor penalty in network usage in some scenarios.
    Fil: Corbellini, Alejandro. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Tandil. Instituto Superior de Ingeniería del Software. Universidad Nacional del Centro de la Provincia de Buenos Aires. Instituto Superior de Ingeniería del Software; Argentina
    Fil: Godoy, Daniela Lis. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Tandil. Instituto Superior de Ingeniería del Software. Universidad Nacional del Centro de la Provincia de Buenos Aires. Instituto Superior de Ingeniería del Software; Argentina
    Fil: Mateos Diaz, Cristian Maximiliano. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Tandil. Instituto Superior de Ingeniería del Software. Universidad Nacional del Centro de la Provincia de Buenos Aires. Instituto Superior de Ingeniería del Software; Argentina
    Fil: Schiaffino, Silvia Noemi. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Tandil. Instituto Superior de Ingeniería del Software. Universidad Nacional del Centro de la Provincia de Buenos Aires. Instituto Superior de Ingeniería del Software; Argentina
    Fil: Zunino Suarez, Alejandro Octavio. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Tandil. Instituto Superior de Ingeniería del Software. Universidad Nacional del Centro de la Provincia de Buenos Aires. Instituto Superior de Ingeniería del Software; Argentin

    Maiter: An Asynchronous Graph Processing Framework for Delta-based Accumulative Iterative Computation

    A myriad of graph-based algorithms in machine learning and data mining require parsing relational data iteratively. These algorithms are implemented in a large-scale distributed environment in order to scale to massive data sets. To accelerate these large-scale graph-based iterative computations, we propose delta-based accumulative iterative computation (DAIC). Unlike traditional iterative computations, which iteratively update the result based on the result from the previous iteration, DAIC updates the result by accumulating the "changes" between iterations. With DAIC, we can process only the "changes" and so avoid negligible updates. Furthermore, we can perform DAIC asynchronously to bypass the high-cost synchronous barriers in heterogeneous distributed environments. Based on the DAIC model, we design and implement an asynchronous graph processing framework, Maiter. We evaluate Maiter on a local cluster as well as on the Amazon EC2 Cloud. The results show that Maiter achieves as much as 60x speedup over Hadoop and outperforms other state-of-the-art frameworks.
    Comment: ScienceCloud 2012, TKDE 201
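The contrast between recomputing from the previous iteration and accumulating "changes" can be illustrated with PageRank. The following is a minimal single-machine sketch, not Maiter's implementation: the function names, the damping factor of 0.8, the tolerance, and the worklist scheduling are all assumptions made for illustration, and every vertex is assumed to appear as a key with at least one out-edge.

```python
from collections import deque


def classic_pagerank(out_edges, damping=0.8, iterations=200):
    """Traditional iteration: every rank is recomputed from the previous ranks."""
    rank = {v: 1.0 for v in out_edges}
    for _ in range(iterations):
        new_rank = {v: 1.0 - damping for v in out_edges}
        for v, targets in out_edges.items():
            share = damping * rank[v] / len(targets)
            for u in targets:
                new_rank[u] += share
        rank = new_rank
    return rank


def delta_pagerank(out_edges, damping=0.8, tol=1e-12):
    """Delta-based accumulation: only pending changes are processed.

    Vertices are handled one at a time from a worklist, so there is no
    global barrier between "iterations" (an illustrative stand-in for
    asynchronous execution).
    """
    rank = {v: 0.0 for v in out_edges}
    delta = {v: 1.0 - damping for v in out_edges}  # initial change per vertex
    work = deque(out_edges)
    queued = set(out_edges)
    while work:
        v = work.popleft()
        queued.discard(v)
        change = delta[v]
        delta[v] = 0.0
        rank[v] += change          # accumulate the change into the result
        if change <= tol:
            continue               # negligible update: do not propagate it
        targets = out_edges[v]
        share = damping * change / len(targets)
        for u in targets:
            delta[u] += share      # pass the change on to the neighbours
            if u not in queued:
                work.append(u)
                queued.add(u)
    return rank


graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
reference = classic_pagerank(graph)
accumulated = delta_pagerank(graph)
```

Under these assumptions both functions should converge to the same ranks up to the chosen tolerance; the delta version skips work for vertices whose pending change is negligible.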

    Detecting Crowdsourced Spam Reviews in Social Media

    User-submitted reviews are used by potential buyers to evaluate products before their purchase. In this work we study cases of deceptive reviews on Amazon.com which rate the products favorably. These were paid for through a number of crowdsourcing websites. The behavior of the review spammers as a group has distinguishable characteristics, which are used in our proposed method. We use a probabilistic model of pairwise spammer collaboration, which is used to cluster reviewers. The introduced model is verified on a set of synthetic data and outperforms a baseline classifier which treats reviews on their own, without their social context. The performance of the proposed method for detecting clusters of spammers is also compared to an alternative approach. Finally we demonstrate some of the detected clusters of review spammers on the data set which was crawled from Amazon

    An Efficient Similarity Search Framework for SimRank over Large Dynamic Graphs

    SimRank is an important measure of vertex-pair similarity according to the structure of graphs. Similarity search based on SimRank is an important operation for identifying similar vertices in a graph and has been employed in many data analysis applications. Nowadays, graphs in the real world have become much larger and more dynamic. The existing solutions for similarity search are expensive in terms of time and space cost. None of them can efficiently support similarity search over large dynamic graphs. In this paper, we propose a novel two-stage random-walk sampling framework (TSF) for SimRank-based similarity search (e.g., top-k search). In the preprocessing stage, TSF samples a set of one-way graphs to index raw random walks in a novel manner within O(N Rg) time and space, where N is the number of vertices and Rg is the number of one-way graphs. The one-way graph can be efficiently updated in accordance with the graph modification, thus TSF is well suited to dynamic graphs. During the query stage, TSF can search similar vertices fast by naturally pruning unqualified vertices based on the connectivity of one-way graphs. Furthermore, with additional Rq samples, TSF can estimate the SimRank score with probability [...] (1−c)^2 if the error of approximation is bounded by 1 − ε. Finally, to guarantee the scalability of TSF, the one-way graphs can also be compactly stored on the disk when the memory is limited. Extensive experiments have demonstrated that TSF can handle dynamic billion-edge graphs with high performance
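For readers unfamiliar with the measure itself, the sketch below computes SimRank with the standard naive fixed-point iteration. It is not the TSF framework described in the abstract, only the baseline definition that sampling approaches approximate; the function name, the decay factor c = 0.6, and the iteration count are assumptions chosen for illustration.

```python
def simrank(in_neighbors, c=0.6, iterations=10):
    """Naive all-pairs SimRank.

    in_neighbors maps every vertex to the list of vertices pointing to it.
    s(a, a) = 1; s(a, b) = 0 if either vertex has no in-neighbours;
    otherwise s(a, b) is c times the average similarity over all pairs
    of in-neighbours of a and b.
    """
    nodes = list(in_neighbors)
    sim = {(a, b): 1.0 if a == b else 0.0 for a in nodes for b in nodes}
    for _ in range(iterations):
        new_sim = {}
        for a in nodes:
            for b in nodes:
                if a == b:
                    new_sim[(a, b)] = 1.0
                    continue
                in_a, in_b = in_neighbors[a], in_neighbors[b]
                if not in_a or not in_b:
                    new_sim[(a, b)] = 0.0
                    continue
                total = sum(sim[(i, j)] for i in in_a for j in in_b)
                new_sim[(a, b)] = c * total / (len(in_a) * len(in_b))
        sim = new_sim
    return sim


# Hypothetical toy graph: vertex "a" points to both "b" and "c".
toy_in_neighbors = {"a": [], "b": ["a"], "c": ["a"]}
scores = simrank(toy_in_neighbors)
```

In this toy graph "b" and "c" share their only in-neighbour, so their score equals the decay factor c. The quadratic number of vertex pairs is what makes this naive form impractical on large graphs.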