
175 research outputs found

The essence of P2P: A reference architecture for overlay networks

Author: Aberer Karl
Ghodsi Ali
Girdzijauskas Sarunas
Haridi Seif
Hauswirth Manfred
Onana Alima Luc
Publication date: 01/01/2005

The success of the P2P idea has created a huge diversity of approaches, among which overlay networks, for example, Gnutella, Kazaa, Chord, Pastry, Tapestry, P-Grid, or DKS, have received specific attention from both developers and researchers. A wide variety of algorithms, data structures, and architectures have been proposed. The terminologies and abstractions used, however, have become quite inconsistent since the P2P paradigm has attracted people from many different communities, e.g., networking, databases, distributed systems, graph theory, complexity theory, biology, etc. In this paper we propose a reference model for overlay networks which is capable of modeling different approaches in this domain in a generic manner. It is intended to allow researchers and users to assess the properties of concrete systems, to establish a common vocabulary for scientific discussion, to facilitate the qualitative comparison of the systems, and to serve as the basis for defining a standardized API to make overlay networks interoperable.

Infoscience - École polytechnique fédérale de Lausanne

CiteSeerX

RISE – Research Institutes of Sweden

Digitala Vetenskapliga Arkivet - Academic Archive On-line

Swedish Institute of Computer Science Publications Database

Software institutes' Online Digital Archive

Peer to peer multidimensional overlays: Approximating complex structures

Author: Beaumont Olivier
Kermarrec Anne-Marie
Rivière Etienne
Publication venue: HAL CCSD
Publication date: 01/01/2007

Peer-to-peer overlay networks have proven to be a good support for storing and retrieving data in a fully decentralized way. A sound approach is to structure them so that they reflect the structure of the application: peers represent objects of the application, so that neighbours in the peer-to-peer network are objects having similar characteristics from the application's point of view. Such structured peer-to-peer overlay networks provide natural support for range queries. While some complex structures, such as a Voronoï tessellation, where each peer is associated with a cell in the space, are clearly relevant to structure the objects, the cost of computing and maintaining these structures is usually extremely high for dimensions larger than 2. We argue that an approximation of a complex structure is enough to provide native support for range queries. This stems from the fact that neighbours are important, while the exact space partitioning associated with a given peer is not as crucial. In this paper we present the design, analysis and evaluation of RayNet, a loosely structured Voronoï-based overlay network. RayNet organizes peers in an approximation of a Voronoï tessellation in a fully decentralized way. It relies on a Monte-Carlo algorithm to estimate the size of a cell and on an epidemic protocol to discover neighbours. In order to ensure efficient (polylogarithmic) routing, RayNet is inspired by Kleinberg's small-world model, where each peer gets connected to close neighbours (its approximate Voronoï neighbours in RayNet) and to shortcuts, long-range neighbours implemented using an existing Kleinberg-like peer sampling
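The idea of estimating a cell's size by Monte-Carlo sampling can be illustrated with a minimal sketch, assuming peer coordinates live in the unit hypercube and that a peer compares sample points only against the peers it currently knows. The function `estimate_cell_volume` and its parameters are hypothetical; this is a generic Monte-Carlo volume estimate, not RayNet's actual algorithm.

```python
import math
import random
from typing import Sequence

def estimate_cell_volume(peer: Sequence[float],
                         known_peers: Sequence[Sequence[float]],
                         samples: int = 20000,
                         seed: int = 0) -> float:
    """Hypothetical helper: Monte-Carlo estimate of the volume of `peer`'s
    Voronoi cell inside the unit hypercube [0, 1]^d.

    A uniformly drawn point lies in the cell when `peer` is at least as close
    to it as every other known peer. With only a partial view of the network,
    the region being estimated can only be larger than the true cell.
    """
    rng = random.Random(seed)
    dim = len(peer)
    hits = 0
    for _ in range(samples):
        point = [rng.random() for _ in range(dim)]
        own = math.dist(point, peer)
        if all(own <= math.dist(point, other) for other in known_peers):
            hits += 1
    return hits / samples

# Two peers split the unit square in half, so each cell is roughly 0.5.
print(estimate_cell_volume((0.25, 0.5), [(0.75, 0.5)]))
```

The accuracy grows with the number of samples (the standard error shrinks as one over the square root of `samples`), which is the usual trade-off that makes such estimates cheap compared to computing an exact tessellation in high dimensions.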

HAL-CentraleSupelec

INRIA a CCSD electronic archive server

HAL-Rennes 1

Enabling autoscaling for in-memory storage in cluster computing framework

Author: Shrestha Bibek Raj
Publication venue: Colorado State University. Libraries
Publication date: 01/01/2019

2019 Spring. Includes bibliographical references. IoT-enabled devices and observational instruments continuously generate voluminous data. A large portion of these datasets is delivered with associated geospatial locations. The increased volumes of geospatial data, alongside emerging geospatial services, pose computational challenges for large-scale geospatial analytics. We have designed and implemented STRETCH, an in-memory distributed geospatial storage system that preserves spatial proximity and enables proactive autoscaling for frequently accessed data. STRETCH stores data with a delayed data dispersion scheme that incrementally adds data nodes to the storage system. We have devised an autoscaling feature that proactively repartitions data to alleviate computational hotspots before they occur. We compared the performance of STRETCH with Apache Ignite, and the results show that STRETCH provides up to 3 times the throughput when the system encounters hotspots. STRETCH is built on Apache Spark and Ignite and interacts with them at runtime.

Mountain Scholar (Digital Collections of Colorado and Wyoming)

Resource discovery for distributed computing systems: A comprehensive survey

Author: Abdullah
Aberer
Abraham
Aguiar
Aguilera
Ahmed
Akay
Alam
Albrecht
Albrecht
Anderson
Antonopoulos
Aspnes
Atif
Awerbuch
Awerbuch
Baldoni
Ballani
Bandara
Banerjee
Bangyong
Baranwal
Barjini
Basu
Battre
Berman
Bharambe
Bharambe
Bimson
Birman
Bisnik
Bisnik
Bo
Brocco
Brocco
Brogi
Brown
Brunner
Buccafurri
Burstein
Butt
Buyya
Byrom
Byrom
Cai
Caminero
Campo
Candan
Cao
Carra
Carzaniga
Castro
Chang
Chang-Yen
Chatziantoniou
Chaudhuri
Chawathe
Chen
Chen
Chen
Chen
Cheng
Chien
Chung
Cidon
Costa
Crainiceanu
Crainiceanu
Crespo
Czajkowski
Datta
Datta
Davtyan
Deng
Deng
Dhurandher
Di
Di
Di
Diaz
Dimakopoulos
Dimakopoulos
Dissanayaka
Di Martino
Dorigo
Dorigo
Duarte
D’Angelo
Elijorde
Erdil
Erdil
Falchi
Fensel
Ferretti
Forestiero
Forestiero
Foster
Foster
Foster
Foster
Foster
Frey
Fugkeaw
Gaeta
Ganesan
Ganesan
Ganesh
Ganguly
Gao
Gentzsch
Georgiou
Germain
Ghafarian
Ghamri-Doudane
Ghamri-Doudane
Gill
Glover
Goel
González-Beltrán
Guo
Hameurlain
Hameurlain
Harchol-Balter
Harvey
Haykin
Henderson
Hidalgo
Horrocks
Horrocks
Hussin
Iamnitchi
Ionescu
Javad Zarrin
Jelasity
Jesi
Jin
Joung
Joung
Joung
João Paulo Barraca
Kalogeraki
Kannan
Ke
Keller
Kermarrec
Keung
Khanli
Khoobkar
Kim
Klusch
Kniesburges
Ko
Korf
Korf
Kostoulas
Krauter
Krynicki
Kumar
Kutten
Kutten
Kutten
Lazaro
Lee
Lee
Li
Li
Li
Li
Li
Li
Liben-Nowell
Lima
Lu
Ludwig
Lv
Makki
Manvi
March
Martino
Massie
Mastroianni
Mateescu
McGuinness
Medrano-Chávez
Melliar-Smith
Meng
Meshkova
Michlmayr
Milojicic
Montebello
Murugan
Nagarajan
Naseer
Navimipour
Newcomer
Nurmi
Oikonomou
Pan
Pande
Passarella
Pastore
Pathan
Pipan
Pittaras
Prajapati
Raack
Raicu
Raman
Ratnasamy
Reed
Reynolds
Rhea
Rhea
Rhee
Risson
Rochwerger
Rochwerger
Rowstron
Rui L. Aguiar
Russell
Sander
Sathish
Schopf
Schubert
Schubert
Seo
Shaikh
Shaikh
Shang
Shen
Shenvi
Siddiqui
Sotiriadis
Sotomayor
Staples
Steiner
Stevens
Stevens
Stoica
Stützle
Sun
Sun
Sun
Taheri
Talia
Talia
Talia
Talia
Tang
Tang
Tannenbaum
Tao
Tate
Tereshko
Tigelaar
Torkestani
Trunfio
Valdez
Vanthournout
Vanthournout
Van Renesse
Ververidis
Wang
Watkins
Welch
Wolinsky
Wright
Xiao
Xu
Xu
Xu
Yang
Yao
Yin
Ying
Yoo
Yousefipour
Yu
Yusta
Zaharia
Zarrin
Zarrin
Zarrin
Zarrin
Zhang
Zhang
Zhang
Zhang
Zhao
Zhao
Zhou
Zhou
Zhu
Publication venue: Elsevier BV
Publication date: 01/03/2018

Large-scale distributed computing environments provide a vast amount of heterogeneous computing resources from different sources for resource sharing and distributed computing. Discovering appropriate resources in such environments is a challenge that involves several different subjects. In this paper, we investigate the current state of resource discovery protocols, mechanisms, and platforms for large-scale distributed environments, focusing on their design aspects. We classify all related aspects, general steps, and requirements for constructing a novel resource discovery solution into three categories: structures, methods, and issues. Accordingly, we review the literature, analyzing various aspects of each category.

Crossref

Repositório Institucional da Universidade de Aveiro

Anglia Ruskin Research

Optimal Customized Content Dissemination for Rich Content Format in Pub/Sub Framework

Author: Banait Nikhil
Patil Sonali
Publication venue: Auricle Technologies, Pvt., Ltd.
Publication date: 31/07/2015

A publish-subscribe system is a message-passing system of one of two types: topic-based or content-based. The publisher is the sender and is responsible for deciding the classes or topics of published messages to which subscribers can subscribe. A subscriber is a receiver that receives all messages published to the classes to which it subscribes. A content-based publish-subscribe framework matches published content against subscribers' constraints and delivers it to them in their required format. Such a framework enables the publish-subscribe system to accommodate richer content formats, including larger text files containing a huge number of events to be published with different properties, and other content. In Customized Content Dissemination, users, i.e., consumers, specify not only their information needs but also their profile information, which includes the characteristics of the device used to obtain the content. Our pub/sub system, besides being responsible for matching and distributing the published content, is also responsible for converting the content into the format desired by each subscriber. DOI: 10.17762/ijritcc2321-8169.15073
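The difference between topic-based and content-based matching, together with per-subscriber format conversion, can be sketched as a toy in-process broker. This is a minimal illustration under our own assumptions: `Broker`, `Subscriber`, and the `render` hook (standing in for device-specific content conversion) are hypothetical names, not the authors' framework.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set

Event = Dict[str, object]

@dataclass
class Subscriber:
    name: str
    topics: Set[str] = field(default_factory=set)            # topic-based interests
    predicate: Callable[[Event], bool] = lambda e: False     # content-based constraint
    render: Callable[[Event], str] = str                     # hypothetical device-specific formatter
    inbox: List[str] = field(default_factory=list)

class Broker:
    """Toy broker: matches each published event and converts it per subscriber."""

    def __init__(self) -> None:
        self.subscribers: List[Subscriber] = []

    def subscribe(self, sub: Subscriber) -> None:
        self.subscribers.append(sub)

    def publish(self, topic: str, event: Event) -> None:
        for sub in self.subscribers:
            # Topic-based match on the class, or content-based match on the event's attributes.
            if topic in sub.topics or sub.predicate(event):
                # Convert the content into the format this subscriber's profile asks for.
                sub.inbox.append(sub.render(event))

broker = Broker()
phone = Subscriber("phone", topics={"weather"},
                   render=lambda e: f"{e['city']}: {e['temp']}C")
analyst = Subscriber("analyst", predicate=lambda e: e.get("temp", 0) > 30,
                     render=lambda e: str(e["temp"]))
broker.subscribe(phone)
broker.subscribe(analyst)
broker.publish("weather", {"city": "Pune", "temp": 34})
broker.publish("sports", {"city": "Pune", "temp": 31})
print(phone.inbox)    # ['Pune: 34C']
print(analyst.inbox)  # ['34', '31']
```

The topic-based subscriber only sees the "weather" event, while the content-based subscriber receives both events because its constraint looks at the attributes rather than the topic.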

International Journal on Recent and Innovation Trends in Computing and Communication

Fast Scalable Peer-to-Peer Lookup Services for Multi-Hop Wireless Networks

Author: Shin Min-Ho
Publication date: 28/01/2008

Recent years have seen the growing popularity of multi-hop wireless networks such as wireless mesh networks and sensor networks. Such systems require efficient lookup services for reliable system operation, such as packet routing, key discovery, and object lookup. The lack of infrastructure, however, prevents centralized lookup from scaling in multi-hop wireless networks. For example, consider a citywide wireless mesh network that provides wireless connectivity to a number of mobile users. Due to a high volume of user accesses and the inherent vulnerability of wireless links, centralized authentication methods fail to scale. The decentralization of user authentication, however, faces the challenge of key discovery: how to find the location of user keys. Motivated by the user authentication problem in wireless mesh networks, this dissertation aims to provide efficient and scalable distributed lookup services for multi-hop wireless networks. Employing the notion of peer-to-peer lookup, where each node can both query and respond, I present two different methods: Valley-Walk and Rigs. Valley-Walk, a loosely structured scheme, strategically places object copies and locates them efficiently with only a minimal local structure. Valley-Walk finds target objects in near-optimal hop counts with a moderate number of copies (e.g., 10% of the network size) stored in the network. Without a global structure, however, Valley-Walk cannot guarantee low-cost search with a small number of copies. Rigs (Ring Interval Graph Search), a tightly structured scheme, realizes a Distributed Hash Table (DHT) in multi-hop wireless networks. An experimental study shows the limitations of existing DHTs in multi-hop wireless networks due to their independence from the underlying topology. Unlike existing DHTs, Rigs constructs a search structure, the Ring Interval Graph, such that queries are forwarded only to local neighbors. Rigs guarantees successful object lookup with near-optimal performance.

Digital Repository at the University of Maryland

M-Grid : A distributed framework for multidimensional indexing and querying of location based big data

Author: Kumar Shashank
Publication venue: Scholars' Mine
Publication date: 01/01/2014

The widespread use of mobile devices and the real-time availability of user-location information are facilitating the development of new personalized, location-based applications and services (LBSs). Such applications require multi-attribute query processing, high access scalability, support for millions of users, real-time querying capability, and analysis of large volumes of data. Cloud computing has fostered a new generation of distributed databases commonly known as key-value stores. Key-value stores were designed to extract value from very large volumes of data while being highly available, fault-tolerant and scalable, hence providing much-needed features to support LBSs. However, complex queries on multidimensional data cannot be processed efficiently, as key-value stores do not provide means to access multiple attributes. In this thesis we present MGrid, a unifying indexing framework that enables key-value stores to support multidimensional queries. We organize a set of nodes in a P-Grid overlay network, which provides fault tolerance and efficient query processing. We use a linearization technique based on the Hilbert space-filling curve, which preserves data locality, to efficiently manage multidimensional data in a key-value store. We propose algorithms to dynamically process range and k-nearest-neighbor (kNN) queries on linearized values. This removes the overhead of maintaining a separate index table. Our approach is completely independent of the underlying storage layer and can be implemented on any cloud infrastructure. Experiments on Amazon EC2 show that MGrid achieves a performance improvement of three orders of magnitude compared to MapReduce and of four times compared to the MDHBase scheme. --Abstract, pages iii-iv
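The Hilbert-curve linearization step can be illustrated with the classic iterative mapping from a cell (x, y) of an n x n grid (n a power of two) to its position along the curve. This is a minimal 2-D sketch, not MGrid's code: the function names are hypothetical, and a real deployment would first quantize attributes such as latitude and longitude onto the grid.

```python
def _rotate(n: int, x: int, y: int, rx: int, ry: int) -> tuple:
    """Rotate/flip a quadrant so the sub-curve has the canonical orientation."""
    if ry == 0:
        if rx == 1:
            x = n - 1 - x
            y = n - 1 - y
        x, y = y, x
    return x, y

def hilbert_index(n: int, x: int, y: int) -> int:
    """Map cell (x, y) of an n x n grid (n a power of two) to its Hilbert-curve index."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)   # skip the quadrants visited before this one
        x, y = _rotate(n, x, y, rx, ry)
        s //= 2
    return d

# Consecutive positions on the curve are always adjacent cells, so sorting
# records by this index keeps many spatially close records close together.
for cell in [(0, 0), (1, 0), (1, 1), (0, 1), (0, 2)]:
    print(cell, hilbert_index(4, *cell))
```

On a 4 x 4 grid the five cells above receive indices 0 through 4 in order. A range query over a rectangle then becomes a small set of index intervals that can be answered with ordered scans on the key-value store.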

Missouri University of Science and Technology (Missouri S&T): Scholars' Mine

A framework for multidimensional indexes on distributed and highly-available data stores

Author: Cugnasco Cesare
Publication venue: Universitat Politècnica de Catalunya
Publication date: 01/01/2019

Spatial Big Data is considered an essential trend in future scientific and business applications. Indeed, research instruments, medical devices, and social networks generate hundreds of petabytes of spatial data per year. However, as many authors have pointed out, the lack of specialized frameworks for dealing with such data is limiting possible applications and is probably precluding many scientific breakthroughs. In this thesis, we describe three HPC scientific applications, spanning molecular dynamics, neuroscience analysis, and physics simulations, in which we experienced first-hand the limits of existing technologies. Drawing on this experience, we define the desirable missing functionalities, and we focus on two features that, when combined, significantly improve the way scientific data is analyzed. On one side, scientific simulations generate complex datasets in which each item is described by multiple correlated characteristics. For instance, a particle might have a spatial position (x, y, z) at a given time (t). If we want to find all elements within the same area and period, we either have to scan the whole dataset, or we must organize the data so that all items in the same space and time are stored together. The second approach is called Multidimensional Indexing (MI), and it uses different techniques to cluster and organize similar data together. On the other side, approximate analytics has often been indicated as a smart and flexible way to explore large datasets in a short time. Approximate analytics includes a broad family of algorithms that aim to speed up analytical workloads by relaxing the precision of the results within a specific confidence interval. For instance, if we want to know the average age in a group with 1-year precision, we can consider just a random fraction of all the people, thus reducing the amount of computation.
But if we also want fewer I/O operations, we need efficient data sampling, which means organizing data so that we do not need to scan the whole dataset to generate a random sample of it. According to our analysis, combining Multidimensional Indexing with efficient data Sampling (MIS) is a vital feature missing from current distributed data management solutions. This thesis aims to remedy this shortcoming by providing novel scalable solutions. First, we describe the existing data management alternatives and motivate our preference for NoSQL key-value databases. Second, we propose an analytical model to study the influence of data models on the scalability and performance of this kind of distributed database. Third, we use the analytical model to design two novel multidimensional indexes with efficient data sampling: the D8tree and the AOTree. Our first solution, the D8tree, improves on the state of the art for approximate spatial queries on static, read-mostly datasets. Later, we enhanced the data ingestion capability of our approach by introducing the AOTree, an algorithm that preserves the query performance of the D8tree even for write-intensive HPC applications. We compared our solution with PostgreSQL and plain storage, and we demonstrate that our proposal has better performance and scalability. Finally, we describe Qbeast, the novel distributed system that implements the D8tree and the AOTree using NoSQL technologies, and we illustrate how Qbeast simplifies the workflow of scientists in various HPC applications by providing a scalable and integrated solution for data analysis and management.
Postprint (published version)
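The approximate-analytics example in this abstract (an average age with 1-year precision computed from a random fraction of the people) can be made concrete with a simple uniform random sample and a normal-approximation confidence interval. This sketch assumes the data fits in memory and is sampled directly; it does not reproduce the D8tree or AOTree, whose purpose is to obtain such samples without scanning the whole dataset. `approximate_mean` and the synthetic ages are hypothetical.

```python
import math
import random
import statistics
from typing import Optional, Sequence, Tuple

def approximate_mean(values: Sequence[float], fraction: float,
                     z: float = 1.96, seed: Optional[int] = None) -> Tuple[float, float]:
    """Estimate the mean from a random fraction of `values` (at least 2 values).

    Returns (estimate, half_width) of an approximate 95% confidence interval
    (z = 1.96), applying the finite-population correction because the sample
    is drawn without replacement.
    """
    population = len(values)
    k = min(population, max(2, int(population * fraction)))
    sample = random.Random(seed).sample(values, k)
    estimate = statistics.fmean(sample)
    fpc = math.sqrt((population - k) / (population - 1))
    half_width = z * statistics.stdev(sample) / math.sqrt(k) * fpc
    return estimate, half_width

# Synthetic population of 100,000 ages; only 2% of it is read.
rng = random.Random(42)
ages = [rng.randint(18, 80) for _ in range(100_000)]
est, hw = approximate_mean(ages, fraction=0.02, seed=7)
print(f"approximate mean age: {est:.2f} +/- {hw:.2f} (exact: {statistics.fmean(ages):.2f})")
```

With 2% of the rows the interval half-width stays below one year for this synthetic data, which matches the kind of precision/cost trade-off the abstract describes; the remaining cost is I/O, since a naive sample still touches the whole dataset.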

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

Tesis Doctorals en Xarxa

Architecture-independent distributed query processing

Author: Mühleisen H.F. (Hannes)
Publication date: 07/12/2012

CWI's Institutional Repository