    Scaling up publish/subscribe overlays using interest correlation for link sharing

    Topic-based publish/subscribe is at the core of many distributed systems, ranging from application integration middleware to news dissemination. Therefore, much research has been dedicated to publish/subscribe architectures and protocols, and in particular to the design of overlay networks for decentralized topic-based routing and efficient message dissemination. Nonetheless, existing systems either fail to take full advantage of shared interests when disseminating information, and hence suffer from high maintenance and traffic costs, or construct overlays that cope poorly with the scale and dynamism of large networks. In this paper we present StaN, a decentralized protocol that optimizes the properties of gossip-based overlay networks for topic-based publish/subscribe by sharing a large number of physical connections without disrupting their logical properties. StaN relies only on local knowledge and operates by leveraging common interests among participants to improve global resource usage and promote topic and event scalability. The experimental evaluation under two real workloads, both via a real deployment and through simulation, shows that StaN provides an attractive infrastructure for scalable topic-based publish/subscribe.

    Minimum maximum-degree publish-subscribe overlay network design

    Designing an overlay network for publish/subscribe communication in a system where nodes may subscribe to many different topics of interest is of fundamental importance. For scalability and efficiency, it is important to keep the degree of the nodes in the publish/subscribe system low. It is only natural then to formalize the following problem: given a collection of nodes and their topic subscriptions, connect the nodes into a graph that has the least possible maximum degree in such a way that, for each topic t, the graph induced by the nodes interested in t is connected. We present the first polynomial-time logarithmic approximation algorithm for this problem and prove an almost tight lower bound on the approximation ratio. Our experimental results show that our algorithm drastically improves the maximum degree of publish/subscribe overlay systems. We also propose a variation of the problem that requires each topic-connected overlay network to be of constant diameter while keeping the average degree low. We present three heuristics for this problem that guarantee that each topic-connected overlay network will be of diameter 2 and that aim at keeping the overall average node degree low. Our experimental results validate these heuristics, showing that they achieve very low diameter without increasing the average degree by much. © 2011 IEEE
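
    The problem statement in this abstract can be made concrete with a small sketch. The Python below is not the paper's approximation algorithm; it only checks the two quantities the problem is defined over: whether a candidate overlay is topic-connected, and what its maximum degree is. The function names (`is_topic_connected`, `max_degree`) and the toy subscription table are illustrative assumptions, not taken from the paper.

```python
from collections import defaultdict, deque


def max_degree(nodes, edges):
    """Return the largest number of overlay links incident to any node."""
    degree = {v: 0 for v in nodes}
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return max(degree.values(), default=0)


def is_topic_connected(subscriptions, edges):
    """Check that, for every topic, its subscribers induce a connected subgraph.

    subscriptions: dict mapping node -> set of topics (hypothetical input format)
    edges: iterable of undirected (u, v) overlay links
    """
    adjacency = defaultdict(set)
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)

    # Group subscribers by topic.
    members_of = defaultdict(set)
    for node, topics in subscriptions.items():
        for topic in topics:
            members_of[topic].add(node)

    for members in members_of.values():
        # Breadth-first search restricted to the subscribers of this topic.
        start = next(iter(members))
        seen = {start}
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for w in adjacency[u]:
                if w in members and w not in seen:
                    seen.add(w)
                    queue.append(w)
        if seen != members:
            return False
    return True


# Toy instance (assumed for illustration): four nodes, two topics.
subs = {"a": {"x", "y"}, "b": {"x"}, "c": {"y"}, "d": {"x", "y"}}
star = [("a", "b"), ("a", "c"), ("a", "d")]   # topic-connected, max degree 3
path = [("b", "a"), ("a", "d"), ("d", "c")]   # topic-connected, max degree 2
split = [("a", "b"), ("c", "d")]              # topic x is disconnected
```

    On this toy instance both the star and the path are valid topic-connected overlays, but the path has the lower maximum degree, which is the quantity the problem asks to minimize.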

    Scaling Construction of Low Fan-out Overlays for Topic-Based Publish/Subscribe Systems

    Confidentiality-Preserving Publish/Subscribe: A Survey

    Publish/subscribe (pub/sub) is an attractive communication paradigm for large-scale distributed applications running across multiple administrative domains. Pub/sub allows event-based information dissemination based on constraints on the nature of the data rather than on pre-established communication channels. It is a natural fit for deployment in untrusted environments such as public clouds linking applications across multiple sites. However, pub/sub in untrusted environments leads to major confidentiality concerns stemming from the content-centric nature of the communications. This survey classifies and analyzes different approaches to confidentiality preservation for pub/sub, from applications of trust and access control models to novel encryption techniques. It provides an overview of the current challenges posed by confidentiality concerns and points to future research directions in this promising field.

    Discreet - Pub/Sub for Edge Systems

    The number of devices connected to the Internet has been growing exponentially over the last few years. Today, the amount of information available to users has reached a point that makes it impossible to consume it all, showing that we need better ways to filter the information that is sent our way. At the same time, while users are online and accessing all this information, their actions are also being collected, scrutinized and commercialized with little regard for privacy. This thesis addresses those issues in the context of a decentralized publish/subscribe solution for edge systems. Working at the edge of the Internet aims to prevent centralized control by a single entity and to lessen the chance of abuse. Our goal was to devise a solution that achieves efficient message delivery, with good load-balancing properties, without revealing its participants' subscription interests, so as to preserve user privacy. Our solution uses cryptography and probabilistic data structures to obfuscate event topics and user subscriptions. We modeled a cooperative solution, where publisher and subscriber nodes work in concert to route events among themselves by leveraging a one-hop structured overlay. Through an experimental evaluation, we assess the scalability and general performance of the proposed algorithms, including latency, false negative and false positive rates, and other useful metrics.

    Demand-Aware Network Designs of Bounded Degree

    Traditionally, networks such as datacenter interconnects are designed to optimize worst-case performance under arbitrary traffic patterns. Such network designs can, however, be far from optimal when considering the actual workloads and traffic patterns which they serve. This insight led to the development of demand-aware datacenter interconnects which can be reconfigured depending on the workload. Motivated by these trends, this paper initiates the algorithmic study of demand-aware networks (DANs), and in particular the design of bounded-degree networks. The inputs to the network design problem are a discrete communication request distribution, D, defined over communicating pairs from the node set V, and a bound, d, on the maximum degree. In turn, our objective is to design an (undirected) demand-aware network N = (V,E) of bounded degree d, which provides short routing paths between frequently communicating nodes distributed across N. In particular, the designed network should minimize the expected path length on N (with respect to D), which is a basic measure of the efficiency of the network. We show that this fundamental network design problem exhibits interesting connections to several classic combinatorial problems and to information theory. We derive a general lower bound based on the entropy of the communication pattern D, and present asymptotically optimal network-aware design algorithms for important distribution families, such as sparse distributions and distributions of locally bounded doubling dimensions.
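
    The objective named in this abstract, the expected path length of a network N with respect to a demand distribution D under a degree bound d, can be evaluated with a few lines of Python. This is only a sketch of the objective function, not one of the paper's design algorithms; the function names, the dictionary encoding of D, and the four-node example are assumptions made for illustration.

```python
from collections import deque


def bfs_distances(adjacency, source):
    """Hop distances from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adjacency.get(u, ()):
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def expected_path_length(edges, demand, degree_bound):
    """Sum of p(u, v) * dist_N(u, v) over all communicating pairs.

    edges: undirected links of the candidate network N
    demand: dict mapping (u, v) -> probability (hypothetical encoding of D)
    degree_bound: the maximum degree d that N may not exceed
    """
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)

    if any(len(nbrs) > degree_bound for nbrs in adjacency.values()):
        raise ValueError("network exceeds the degree bound")

    total = 0.0
    cache = {}  # one BFS per distinct source node
    for (u, v), probability in demand.items():
        if u not in cache:
            cache[u] = bfs_distances(adjacency, u)
        if v not in cache[u]:
            raise ValueError(f"no path between {u} and {v}")
        total += probability * cache[u][v]
    return total


# Toy demand (assumed): pair (1, 2) carries half of all requests.
demand = {(1, 2): 0.5, (3, 4): 0.25, (1, 4): 0.25}
line = [(1, 2), (2, 3), (3, 4)]
ring = [(1, 2), (2, 3), (3, 4), (4, 1)]
```

    With degree bound 2, the line network gives an expected path length of 1.5 on this demand, while the ring brings nodes 1 and 4 together and reaches 1.0, illustrating why the topology should follow the demand.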

    Distributed AOP middleware for large-scale scenarios

    In this PhD dissertation we present a distributed middleware proposal for large-scale application development. Our main aim is to allow the distributed concerns of these applications, such as replication, to be integrated independently and transparently. Our approach is based on the implementation of these concerns using the paradigm of distributed aspects. In addition, our proposal benefits from the peer-to-peer (P2P) network and aspect-oriented programming (AOP) substrates to provide these concerns in a decentralized, decoupled, efficient, and transparent way. Our middleware architecture is divided into two layers: a composition model and a scalable deployment platform for distributed aspects. Finally, we demonstrate the viability and applicability of our model via implementation and experimentation of prototypes in real large-scale networks.

    Algorithms for continuous queries: A geometric approach

    There has been an unprecedented growth in both the amount of data and the number of users interested in different types of data. Users often want to keep track of the data that match their interests over a period of time. A continuous query, once issued by a user, maintains the matching results for the user as new data (as well as updates to the existing data) continue to arrive in a stream. However, supporting potentially millions of continuous queries is a huge challenge. This dissertation addresses the problem of scalably processing a large number of continuous queries over a wide-area network. Conceptually, the task of supporting distributed continuous queries can be divided into two components: event processing (computing the set of affected users for each data update) and notification dissemination (notifying the set of affected users). The first part of this dissertation focuses on event processing. Since interacting with large-scale data can easily frustrate and overwhelm users, top-k queries have attracted considerable interest from the database community, as they allow users to focus on the top-ranked results only. However, it is nearly impossible to find a set of common top-ranked data that everyone is interested in; therefore, users are allowed to specify their interests in different forms of preferences, such as personalized ranking functions and range selections. This dissertation presents geometric frameworks, data structures, and algorithms for answering several types of preference queries efficiently. Experimental evaluations show that our approaches outperform the previous ones by orders of magnitude. The second part of the dissertation presents comprehensive solutions to the problem of processing and notifying a large number of continuous range top-k queries across a wide-area network. Simple solutions include using a content-driven network to notify all continuous queries whose ranges contain the update (ignoring top-k), or using a server to compute only the affected continuous queries and notifying them individually. The former solution generates too much network traffic, while the latter overwhelms the server. This dissertation presents a geometric framework which allows the set of affected continuous queries to be described succinctly with messages that can be efficiently disseminated using content-driven networks. Fast algorithms are also developed to reformulate each update into a set of messages whose number is provably optimal, with or without knowing all continuous queries. The final component of this dissertation is the design of a wide-area dissemination network for continuous range queries. In particular, this dissertation addresses the problem of assigning users to servers in a wide-area content-based publish/subscribe system. A good assignment should consider both users' interests and locations, and balance multiple performance criteria including bandwidth, delay, and load balance. This dissertation presents a Monte Carlo approximation algorithm as well as a simple greedy algorithm. The Monte Carlo algorithm jointly considers multiple performance criteria to find a broker-subscriber assignment and provides theoretical performance guarantees. Using this algorithm as a yardstick, we also conclude that the greedy algorithm works well across a wide range of workloads.