Search CORE

317 research outputs found

Technical alignment

Author: Rauber A.
Rusbridge A.
Schrimpf S.
Schultz M.
Seadle M.
Publication venue: Educopia Institute Publications
Publication date: 01/01/2012
Field of study

This essay discusses the importance of the areas of infrastructure and testing to help digital preservation services demonstrate reliability, transparency, and accountability. It encourages practitioners to build a strong culture in which transparency and collaborations between technical frameworks are valued highly. It also argues for devising and applying agreed-upon metrics that will enable the systematic analysis of preservation infrastructure. The essay begins by defining technical infrastructure and testing in the digital preservation context, provides case studies that exemplify both progress and challenges for technical alignment in both areas, and concludes with suggestions for achieving greater degrees of technical alignment going forward

Enlighten

Distributed top-k aggregation queries at large

Author: A. Marian
Gerhard Weikum
H. David
I.F. Ilyas
K. Church
K. Schnaitter
Matthias Bender
N. Bruno
Peter Triantafillou
R. Akbarinia
R. Fagin
Ralf Schenkel
S. Chaudhuri
S. Madden
Sebastian Michel
T. Cormen
Thomas Neumann
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2009
Field of study

Top-k query processing is a fundamental building block for efficient ranking in a large number of applications. Efficiency is a central issue, especially for distributed settings, when the data is spread across different nodes in a network. This paper introduces novel optimization methods for top-k aggregation queries in such distributed environments. The optimizations can be applied to all algorithms that fall into the frameworks of the prior TPUT and KLEE methods. The optimizations address three degrees of freedom: 1) hierarchically grouping input lists into top-k operator trees and optimizing the tree structure, 2) computing data-adaptive scan depths for different input sources, and 3) data-adaptive sampling of a small subset of input sources in scenarios with hundreds or thousands of query-relevant network nodes. All optimizations are based on a statistical cost model that utilizes local synopses, e.g., in the form of histograms, efficiently computed convolutions, and estimators based on order statistics. The paper presents comprehensive experiments, with three different real-life datasets and using the ns-2 network simulator for a packet-level simulation of a large Internet-style network

Infoscience - École polytechnique fédérale de Lausanne

Crossref

Springer - Publisher Connector

Enlighten

MPG.PuRe

A Medium-Scale Distributed System for Computer Science Research: Infrastructure for the Long Term

Author: Bal H.
de Laat C.
Epema D.
Romein J.
Seinstra F.
Snoek C.
van Nieuwpoort R.
Wijshoff H.
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/05/2016
Field of study

International Migration, Integration and Social Cohesion online publications

Experimenting with Gnutella Communities

Author: Gilbert Babin
Jean Vaucher
Peter Kropf
Thierry Jouve
Publication venue
Publication date
Field of study

Computer networks and distributed systems in general may be regarded as communities where the individual components, be they entire systems, application software or users, interact in a shared environment. Such communities dynamically evolve with components or nodes joning and leaving the system. Their own individual activities affect the community's behaviour and vice-versa. This paper discusses various experiments undertaken to investigate the behaviour of a real system, the Gnutella network, which represents such a community. Gnutella is a distributed Peer-to-Peer data-sharing system without any central control. It turns out that most interactions between nodes do not last long and much of their activity is devoted to finding appropriate partners in the network. Good connections lasting longer appear only as rare events. For example, out of 42,000 connections only 57 hosts were found to available on a regular basis. This means that, in contrast to the common belief that this kind of peer-to-peer networks or sub-communities are always large, they are actually quite small. However, those sub-communities examplify very dynamic behaviour because their actual composition can change very quickly. The experimental results presented have been obtained from a Java implementation of Gnutella running in the open Internet environment, and thus in unknown and quickly changing network structures heavily dependent on chance. Les réseaux informatique ainsi que les systèmes distribués peuvent être considérés comme des communautés où les composantes - que ce soit des systèmes complets, des programmes ou des usagers - interagissent dans un environnement partagé. Ces communautés sont dynamiques car des éléments peuvent s'y joindre ou quitter en tout temps. L'article présente les résultats d'une suite d'expériences et de mesures faites sur Gnutella, un système peer-to-peer à grande échelle qui opère sans aucun contrôle centralisé. Nous avons remarqué qu'une grande partie des messages échangés sont erronés ou redondants et que les interactions entre n?uds ne durent pas très longtemps. En particulier, des connexions durant plus d'une minute sont des phénomènes rares. Les n?uds passent donc la majorité de leur temps à remplacer les partenaires perdus et, contrairement à l'idée répandue que les réseaux peer-to-peer sont immenses, nous avons noté que les communautés effectives étaient assez limitées. Gnutella est un environnement très dynamique avec peu de stabilité. Par exemple, de 42,000 sites avec lesquels nous avons établi une connexion, il a seulement été possible de re-communiquer de façon régulière avec 57. Dans un tel environnement, la chance joue un rôle important dans la performance observée; mais nous avons élaboré un protocole expérimental permettant de comparer diverses options.Gnutella, peer-to-peer networks, Internet communities, distributed systems, protocols, Gnutella, réseaux peer-to-peer, communautés virtuelles, internet, systèmes distribués, protocoles de télécommunication

Research Papers in Economics

Top-k aggregation queries in large-scale distributed systems

Author: Michel Sebastian
Publication venue: Fakultät 6 - Naturwissenschaftlich-Technische Fakultät I. Fachrichtung 6.2 - Informatik
Publication date: 01/01/2007
Field of study

Distributed top-k query processing has recently become an essential functionality in a large number of emerging application classes like Internet traffic monitoring and Peer-to-Peer Web search. This work addresses efficient algorithms for distributed top-k queries in wide-area networks where the index lists for the attribute values (or text terms) of a query are distributed across a number of data peers. More precisely, in this thesis, we make the following distributions: We present the family of KLEE algorithms that are a fundamental building-block towards efficient top-k query processing in distributed systems. We present means to model score distributions and show how these score models can be used to reason about parameter values that play an important role in the overall performance of KLEE. We present GRASS, a family of novel algorithms based on three optimization techniques significantly increased overall performance of KLEE and related algorithms. We present probabilistic guarantees for the result quality. Moreover, we present Minerva1, a distributed search engine. Minerva offers a highly distributed (in both the data dimension and the computational dimension), scalable, and efficient solution toward the development of internet-scale search engines.Top-k Anfragen spielen eine große Rolle in einer Vielzahl von Anwendungen, insbesondere im Bereich von Informationssystemen, bei denen eine kleine, sorgfältig ausgewählte Teilmenge der Ergebnisse den Benutzern präsentiert werden soll. Beispiele hierfür sind Suchmaschinen wie Google, Yahoo oder MSN. Obwohl die Forschung in diesem Bereich in den letzten Jahren große Fortschritte gemacht hat, haben Top-k-Anfragen in verteilten Systemen, bei denen die Daten auf verschiedenen Rechnern verteilt sind, vergleichsweise wenig Aufmerksamkeit erlangt. In dieser Arbeit beschäftigen wir uns mit der effizienten Verarbeitung eben dieser Anfragen. Die Hauptbeiträge gliedern sich wie folgt. Wir präsentieren KLEE, eine Familie neuartiger Top-k-Algorithmen. Wir entwickeln Modelle mit denen Datenverteilungen beschrieben werden können. Diese Modelle sind die Grundlage für eine Schätzung diverser Parameter, die einen großen Einfluss auf die Performanz von KLEE und anderen ähnlichen Algorithmen haben. Wir präsentieren GRASS, eine Familie von Algorithmen, basierend auf drei neuartigen Optimierungstechniken, mit denen die Performanz von KLEE und ähnlichen Algorithmen verbessert wird. Wir präsentieren probabilistische Garantien für die Ergebnisgüte. Wir präsentieren Minerva, eine neuartige verteilte Peer-to-Peer-Suchmaschine

Universaar

MPG.PuRe

Acronym

Neighborhood-based Tag Prediction

Author: Aberer Karl
Budura Adriana
Cudré-Mauroux Philippe
Michel Sebastian
Publication venue
Publication date: 17/03/2009
Field of study

We consider the problem of tag prediction in collaborative tagging systems where users share and annotate resources on the Web. We put forward HAMLET, a novel approach to automatically propagate tags along the edges of a graph which relates similar documents. We identify the core principles underlying tag propagation for which we derive suitable scoring models combined in one overall ranking formula. Leveraging these scores, we present an effcient top-k tag selection algorithm that infers additional tags by carefully inspecting neighbors in the document graph. Experiments using real-world data demonstrate the viability of our approach in large-scale environments where tags are scarce

Infoscience - École polytechnique fédérale de Lausanne

BIGhybrid: A Simulator for MapReduce Applications in Hybrid Distributed Infrastructures Validated with the Grid5000 Experimental Platform

Author: Anjos Julio,
Fedak Gilles
Geyer Claudio
Publication venue: 'Wiley'
Publication date: 01/01/2015
Field of study

International audienceSUMMARY Cloud computing has increasingly been used as a platform for running large business and data processing applications. Conversely, Desktop Grids have been successfully employed in a wide range of projects, because they are able to take advantage of a large number of resources provided free of charge by volunteers. A hybrid infrastructure created from the combination of Cloud and Desktop Grids infrastructures can provide a low-cost and scalable solution for Big Data analysis. Although frameworks like MapReduce have been designed to exploit commodity hardware, their ability to take advantage of a hybrid infrastructure poses significant challenges due to their large resource heterogeneity and high churn rate. In this paper is proposed BIGhybrid, a simulator for two existing classes of MapReduce runtime environments: BitDew-MapReduce designed for Desktop Grids and BlobSeer-Hadoop designed for Cloud computing, where the goal is to carry out accurate simulations of MapReduce executions in a hybrid infrastructure composed of Cloud computing and Desktop Grid resources. This work describes the principles of the simulator and describes the validation of BigHybrid with the Grid5000 experimental platform. Owing to BigHybrid, developers can investigate and evaluate new algorithms to enable MapReduce to be executed in hybrid infrastructures. This includes topics such as resource allocation and data splitting. Concurrency and Computation: Practice and Experienc

HAL-ENS-LYON

INRIA a CCSD electronic archive server

Hal-Diderot

Knowledge-Driven Selection of Market Mechanisms in E-Procurement

Author: Block C.
Karabulut Y.
Neumann Dirk
Weinhardt Christof
Publication venue: AIS Electronic Library (AISeL)
Publication date: 01/01/2007
Field of study

AIS Electronic Library (AISeL)

Towards Measuring and Understanding Performance in Infrastructure- and Function-as-a-Service Clouds

Author: Scheuner Joel
Publication venue
Publication date: 01/01/2020
Field of study

Context. Cloud computing has become the de facto standard for deploying modern software systems, which makes its performance crucial to the efficient functioning of many applications. However, the unabated growth of established cloud services, such as Infrastructure-as-a-Service (IaaS), and the emergence of new services, such as Function-as-a-Service (FaaS), has led to an unprecedented diversity of cloud services with different performance characteristics.Objective. The goal of this licentiate thesis is to measure and understand performance in IaaS and FaaS clouds. My PhD thesis will extend and leverage this understanding to propose solutions for building performance-optimized FaaS cloud applications.Method.\ua0To achieve this goal, quantitative and qualitative research methods are used, including experimental research, artifact analysis, and literature review.Findings.\ua0The thesis proposes a cloud benchmarking methodology to estimate application performance in IaaS clouds, characterizes typical FaaS applications, identifies gaps in literature on FaaS performance evaluations, and examines the reproducibility of reported FaaS performance experiments. The evaluation of the benchmarking methodology yielded promising results for benchmark-based application performance estimation under selected conditions. Characterizing 89 FaaS applications revealed that they are most commonly used for short-running tasks with low data volume and bursty workloads. The review of 112 FaaS performance studies from academic and industrial sources found a strong focus on a single cloud platform using artificial micro-benchmarks and discovered that the majority of studies do not follow reproducibility principles on cloud experimentation.Future Work. Future work will propose a suite of application performance benchmarks for FaaS, which is instrumental for evaluating candidate solutions towards building performance-optimized FaaS applications

Chalmers Research