14 research outputs found
Scalable Facility Location for Massive Graphs on Pregel-like Systems
We propose a new scalable algorithm for facility location. Facility location
is a classic problem, where the goal is to select a subset of facilities to
open, from a set of candidate facilities F , in order to serve a set of clients
C. The objective is to minimize the total cost of opening facilities plus the
cost of serving each client from the facility it is assigned to. In this work,
we are interested in the graph setting, where the cost of serving a client from
a facility is represented by the shortest-path distance on the graph. This
setting allows to model natural problems arising in the Web and in social media
applications. It also allows to leverage the inherent sparsity of such graphs,
as the input is much smaller than the full pairwise distances between all
vertices.
To obtain truly scalable performance, we design a parallel algorithm that
operates on clusters of shared-nothing machines. In particular, we target
modern Pregel-like architectures, and we implement our algorithm on Apache
Giraph. Our solution makes use of a recent result to build sketches for massive
graphs, and of a fast parallel algorithm to find maximal independent sets, as
building blocks. In so doing, we show how these problems can be solved on a
Pregel-like architecture, and we investigate the properties of these
algorithms. Extensive experimental results show that our algorithm scales
gracefully to graphs with billions of edges, while obtaining values of the
objective function that are competitive with a state-of-the-art sequential
algorithm
Show Me the Money: Dynamic Recommendations for Revenue Maximization
Recommender Systems (RS) play a vital role in applications such as e-commerce
and on-demand content streaming. Research on RS has mainly focused on the
customer perspective, i.e., accurate prediction of user preferences and
maximization of user utilities. As a result, most existing techniques are not
explicitly built for revenue maximization, the primary business goal of
enterprises. In this work, we explore and exploit a novel connection between RS
and the profitability of a business. As recommendations can be seen as an
information channel between a business and its customers, it is interesting and
important to investigate how to make strategic dynamic recommendations leading
to maximum possible revenue. To this end, we propose a novel \model that takes
into account a variety of factors including prices, valuations, saturation
effects, and competition amongst products. Under this model, we study the
problem of finding revenue-maximizing recommendation strategies over a finite
time horizon. We show that this problem is NP-hard, but approximation
guarantees can be obtained for a slightly relaxed version, by establishing an
elegant connection to matroid theory. Given the prohibitively high complexity
of the approximation algorithm, we also design intelligent heuristics for the
original problem. Finally, we conduct extensive experiments on two real and
synthetic datasets and demonstrate the efficiency, scalability, and
effectiveness our algorithms, and that they significantly outperform several
intuitive baselines.Comment: Conference version published in PVLDB 7(14). To be presented in the
VLDB Conference 2015, in Hawaii. This version gives a detailed submodularity
proo
The Family of MapReduce and Large Scale Data Processing Systems
In the last two decades, the continuous increase of computational power has
produced an overwhelming flow of data which has called for a paradigm shift in
the computing architecture and large scale data processing mechanisms.
MapReduce is a simple and powerful programming model that enables easy
development of scalable parallel applications to process vast amounts of data
on large clusters of commodity machines. It isolates the application from the
details of running a distributed program such as issues on data distribution,
scheduling and fault tolerance. However, the original implementation of the
MapReduce framework had some limitations that have been tackled by many
research efforts in several followup works after its introduction. This article
provides a comprehensive survey for a family of approaches and mechanisms of
large scale data processing mechanisms that have been implemented based on the
original idea of the MapReduce framework and are currently gaining a lot of
momentum in both research and industrial communities. We also cover a set of
introduced systems that have been implemented to provide declarative
programming interfaces on top of the MapReduce framework. In addition, we
review several large scale data processing systems that resemble some of the
ideas of the MapReduce framework for different purposes and application
scenarios. Finally, we discuss some of the future research directions for
implementing the next generation of MapReduce-like solutions.Comment: arXiv admin note: text overlap with arXiv:1105.4252 by other author
Finite Automata Algorithms in Map-Reduce
In this thesis the intersection of several large nondeterministic finite automata (NFA's) as well as minimization of a large deterministic finite automaton (DFA) in map-reduce are studied. We have derived a lower bound on replication rate for computing NFA intersections and provided three concrete algorithms for the problem. Our investigation of the replication rate for each of all three algorithms shows where each algorithm could be applied through detailed experiments on large datasets of finite automata. Denoting n the number of states in DFA A, we propose an algorithm to minimize A in n map-reduce rounds in the worst-case. Our experiments, however, indicate that the number of rounds, in practice, is much smaller than n for all DFA's we examined. In other words, this algorithm converges in d iterations by computing the equivalence classes of each state, where d is the diameter of the input DFA
Book of Abstracts of the Sixth SIAM Workshop on Combinatorial Scientific Computing
Book of Abstracts of CSC14 edited by Bora UçarInternational audienceThe Sixth SIAM Workshop on Combinatorial Scientific Computing, CSC14, was organized at the Ecole Normale Supérieure de Lyon, France on 21st to 23rd July, 2014. This two and a half day event marked the sixth in a series that started ten years ago in San Francisco, USA. The CSC14 Workshop's focus was on combinatorial mathematics and algorithms in high performance computing, broadly interpreted. The workshop featured three invited talks, 27 contributed talks and eight poster presentations. All three invited talks were focused on two interesting fields of research specifically: randomized algorithms for numerical linear algebra and network analysis. The contributed talks and the posters targeted modeling, analysis, bisection, clustering, and partitioning of graphs, applied in the context of networks, sparse matrix factorizations, iterative solvers, fast multi-pole methods, automatic differentiation, high-performance computing, and linear programming. The workshop was held at the premises of the LIP laboratory of ENS Lyon and was generously supported by the LABEX MILYON (ANR-10-LABX-0070, Université de Lyon, within the program ''Investissements d'Avenir'' ANR-11-IDEX-0007 operated by the French National Research Agency), and by SIAM
QoE on media deliveriy in 5G environments
231 p.5G expandirá las redes móviles con un mayor ancho de banda, menor latencia y la capacidad de proveer conectividad de forma masiva y sin fallos. Los usuarios de servicios multimedia esperan una experiencia de reproducción multimedia fluida que se adapte de forma dinámica a los intereses del usuario y a su contexto de movilidad. Sin embargo, la red, adoptando una posición neutral, no ayuda a fortalecer los parámetros que inciden en la calidad de experiencia. En consecuencia, las soluciones diseñadas para realizar un envío de tráfico multimedia de forma dinámica y eficiente cobran un especial interés. Para mejorar la calidad de la experiencia de servicios multimedia en entornos 5G la investigación llevada a cabo en esta tesis ha diseñado un sistema múltiple, basado en cuatro contribuciones.El primer mecanismo, SaW, crea una granja elástica de recursos de computación que ejecutan tareas de análisis multimedia. Los resultados confirman la competitividad de este enfoque respecto a granjas de servidores. El segundo mecanismo, LAMB-DASH, elige la calidad en el reproductor multimedia con un diseño que requiere una baja complejidad de procesamiento. Las pruebas concluyen su habilidad para mejorar la estabilidad, consistencia y uniformidad de la calidad de experiencia entre los clientes que comparten una celda de red. El tercer mecanismo, MEC4FAIR, explota las capacidades 5G de analizar métricas del envío de los diferentes flujos. Los resultados muestran cómo habilita al servicio a coordinar a los diferentes clientes en la celda para mejorar la calidad del servicio. El cuarto mecanismo, CogNet, sirve para provisionar recursos de red y configurar una topología capaz de conmutar una demanda estimada y garantizar unas cotas de calidad del servicio. En este caso, los resultados arrojan una mayor precisión cuando la demanda de un servicio es mayor
On Improving Distributed Pregel-like Graph Processing Systems
The considerable interest in distributed systems that can execute algorithms to process large graphs has led to the creation of many graph processing systems. However, existing systems suffer from two major issues: (1) poor performance due to frequent global synchronization barriers and limited scalability; and (2) lack of support for graph algorithms that require serializability, the guarantee that parallel executions of an algorithm produce the same results as some serial execution of that algorithm.
Many graph processing systems use the bulk synchronous parallel (BSP) model, which allows graph algorithms to be easily implemented and reasoned about. However, BSP suffers from poor performance due to stale messages and frequent global synchronization barriers. While asynchronous models have been proposed to alleviate these overheads, existing systems that implement such models have limited scalability or retain frequent global barriers and do not always support graph mutations or algorithms with multiple computation phases. We propose barrierless asynchronous parallel (BAP), a new computation model that overcomes the limitations of existing asynchronous models by reducing both message staleness and global synchronization while retaining support for graph mutations and algorithms with multiple computation phases. We present GiraphUC, which implements our BAP model in the open source distributed graph processing system Giraph, and evaluate it at scale to demonstrate that BAP provides efficient and transparent asynchronous execution of algorithms that are programmed synchronously.
Secondly, very few systems provide serializability, despite the fact that many graph algorithms require it for accuracy, correctness, or termination. To address this deficiency, we provide a complete solution that can be implemented on top of existing graph processing systems to provide serializability. Our solution formalizes the notion of serializability and the conditions under which it can be provided for graph processing systems. We propose a partition-based synchronization technique that enforces these conditions efficiently to provide serializability. We implement this technique into Giraph and GiraphUC to demonstrate that it is configurable, transparent to algorithm developers, and more performant than existing techniques.4 month