14 research outputs found
Measuring the dynamical state of the Internet: Large-scale network tomography via the ETOMIC infrastructure
In this paper we show how to go beyond the study of the topological properties of the Internet by measuring its dynamical state, using special active probing techniques and the methods of network tomography. We demonstrate this approach by measuring the key state parameters of Internet paths, the characteristics of queuing delay, in a part of the European Internet. We describe in detail the ETOMIC measurement platform that was used to conduct the experiments and the applied method of queuing delay tomography. The main results of the paper are maps showing various spatial structures in the characteristics of queuing delay for the resolved part of the European Internet. These maps reveal that the average queuing delay of network segments spans more than two orders of magnitude, and that the distribution of this quantity is very well fitted by a log-normal distribution.
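As a hedged illustration of the kind of fit reported above, the following sketch fits a log-normal distribution to per-segment average queuing delays and checks the fit quality; the delay values and the use of SciPy are illustrative assumptions, not data or tooling from the paper.

```python
import numpy as np
from scipy import stats

# Hypothetical per-segment average queuing delays (seconds); real values
# would come from the tomography measurements described in the paper.
delays = np.array([0.0004, 0.0011, 0.0032, 0.0180, 0.0450, 0.0021])

# Fit a log-normal with the location fixed at zero.
shape, loc, scale = stats.lognorm.fit(delays, floc=0)
print(f"sigma={shape:.3f}, median={scale * 1e3:.2f} ms")

# Quantify goodness of fit with a Kolmogorov-Smirnov test.
ks_stat, p_value = stats.kstest(delays, "lognorm", args=(shape, loc, scale))
print(f"KS statistic={ks_stat:.3f}, p={p_value:.3f}")
```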
Implementation of multi-layer techniques using FEDERICA, PASITO and OneLab network infrastructures
V. López, J. L. Añamuro, V. Moreno, J. E. L. De Vergara, J. Aracil, C. García, J. P. Fernández-Palacios, and M. Izal, "Implementation of multi-layer techniques using FEDERICA, PASITO and OneLab network infrastructures", in 17th IEEE International Conference on Networks, ICON 2011, pp. 89-94.
This paper describes an implementation of multi-layer techniques using the network infrastructure provided by the FEDERICA, PASITO and OneLab projects. The FEDERICA project provides a network infrastructure, based on virtualization of both network and computing resources, that creates custom-made virtual environments. PASITO is a layer-2 network that connects universities and research centers in Spain. OneLab measurement tools allow carrying out high-accuracy active network measurements. Thanks to FEDERICA and PASITO, we have a multi-layer architecture where traffic is routed based on the measurements of the OneLab equipment. To carry out this experiment, we have developed a Multi-layer Traffic Engineering manager and an implementation of the Path Computation Element Protocol to address the lack of a control plane in IP-oriented networks. This work shows the feasibility of multi-layer techniques as a convenient solution for network operators and validates our Path Computation Element implementation.
This work has been partially funded by the Spanish Ministry of Education and Science under project ANFORA (TEC2009-13385), by the Spanish Ministry of Industry, Tourism and Trade under the PASITO project, and by the European Union under project OneLab2 (FP7-224263). The authors would like to thank Mauro Campanella (GARR, the project coordinator of FEDERICA) and Miguel Angel Sotos (RedIris) for their support in carrying out this work.
End-to-End Available Bandwidth Estimation Tools, An Experimental Comparison
The available bandwidth of a network path impacts the performance of many applications, such as VoIP calls, video streaming, and P2P content distribution systems. Several tools for bandwidth estimation have been proposed in recent years, but there is still uncertainty about their accuracy and efficiency under different network conditions. Although a number of experimental evaluations have been carried out to compare some of these methods, a comprehensive evaluation of all the existing active tools for available bandwidth estimation is still missing. This article introduces an empirical comparison of most of the active estimation tools currently implemented and freely available. Abing, ASSOLO, DietTopp, IGI, pathChirp, Pathload, PTR, Spruce and Yaz have been compared in a controlled environment and in the presence of different sources of cross-traffic. The performance of each tool has been investigated in terms of accuracy, time, and traffic injected into the network to perform an estimation.
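The three comparison criteria named above can be made concrete with a small sketch; the metric definitions below are common choices for this kind of evaluation, not necessarily the exact ones used in the article, and the pathChirp numbers are hypothetical.

```python
# Accuracy, measurement time, and injected probe traffic for one tool run.
def compare_run(estimate_mbps, truth_mbps, t_start, t_end, probe_bytes):
    rel_error = abs(estimate_mbps - truth_mbps) / truth_mbps  # accuracy
    duration = t_end - t_start                                # seconds per estimate
    overhead = probe_bytes * 8 / duration / 1e6               # probe load, Mbit/s
    return rel_error, duration, overhead

# Example: a hypothetical pathChirp run on a path with 40 Mbit/s available.
print(compare_run(35.0, 40.0, t_start=0.0, t_end=4.2, probe_bytes=600_000))
```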
Harnessing low-level tuning in modern architectures for high-performance network monitoring in physical and virtual platforms
Unpublished doctoral thesis, defended at the Universidad Autónoma de Madrid, Escuela Politécnica Superior, Departamento de Tecnología Electrónica y de las Comunicaciones. Date of defense: 02-07-201
Proactive measurement techniques for network monitoring in heterogeneous environments
Unpublished doctoral thesis. Universidad Autónoma de Madrid, Escuela Politécnica Superior, Departamento de Tecnología Electrónica y de las Comunicaciones, 201
Machine learning-based available bandwidth estimation
Today's Internet Protocol (IP), the Internet's network-layer protocol, provides a best-effort service to all users without any guaranteed bandwidth. However, for certain applications with stringent bandwidth requirements, it is important to provide Quality of Service (QoS) guarantees in IP networks. The end-to-end available bandwidth of a network path, i.e., the residual capacity that is left over by other traffic, is determined by its tight link, that is, the link with the minimal available bandwidth. The tight link may differ from the bottleneck link, i.e., the link with the minimal capacity.
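The distinction between tight and bottleneck link follows directly from the definitions; a minimal sketch, with made-up link parameters, makes it concrete:

```python
# Each link i has capacity C_i and cross traffic lam_i; its available
# bandwidth is C_i - lam_i. All numbers below are hypothetical.
links = [
    {"name": "L1", "capacity": 1000.0, "cross_traffic": 200.0},  # Mbit/s
    {"name": "L2", "capacity": 100.0,  "cross_traffic": 20.0},
    {"name": "L3", "capacity": 500.0,  "cross_traffic": 450.0},
]

bottleneck = min(links, key=lambda l: l["capacity"])                 # minimal capacity
tight = min(links, key=lambda l: l["capacity"] - l["cross_traffic"]) # minimal residual

print("bottleneck link:", bottleneck["name"])   # L2 (100 Mbit/s capacity)
print("tight link:     ", tight["name"])        # L3 (50 Mbit/s available)
print("path available bandwidth:",
      min(l["capacity"] - l["cross_traffic"] for l in links), "Mbit/s")
```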
Passive and active measurements are the two fundamental approaches used to estimate the available bandwidth in IP networks. Unlike passive measurement tools, which are based on non-intrusive monitoring of traffic, active tools are based on the concept of self-induced congestion. The dispersion that arises when packets traverse a network carries information that can reveal relevant network characteristics. Using a fluid-flow probe gap model of a tight link with First-In, First-Out (FIFO) multiplexing, accepted probing tools measure the packet dispersion to estimate the available bandwidth. Difficulties arise, however, if the dispersion is distorted compared to the model, e.g., by non-fluid traffic, multiple tight links, clustering of packets due to interrupt coalescing, and inaccurate time-stamping in general. It is recognized that modeling these effects is cumbersome, if not intractable.
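A minimal sketch of the single-tight-link fluid-flow probe gap model with FIFO multiplexing shows the self-induced congestion effect; the variable names and numbers are my own, not the thesis' notation:

```python
def output_gap(g_in, probe_size, cross_rate, capacity):
    """Gap between two probe packets after a FIFO tight link.

    g_in:       input gap between the probes (s)
    probe_size: probe packet size (bit)
    cross_rate: fluid cross-traffic rate (bit/s)
    capacity:   tight link capacity (bit/s)
    """
    # If probe rate plus cross traffic exceeds the capacity, the link is
    # congested and the gap expands; otherwise it passes through unchanged.
    return max(g_in, (probe_size + cross_rate * g_in) / capacity)

# The output rate starts falling below the input rate once the probe rate
# exceeds the available bandwidth C - lam (here 40 Mbit/s).
C, lam, L = 100e6, 60e6, 12000.0   # 100 Mbit/s link, 60 Mbit/s cross, 1500 B probes
for rate in (20e6, 40e6, 60e6, 80e6):
    g_in = L / rate
    g_out = output_gap(g_in, L, lam, C)
    print(f"probe rate {rate / 1e6:3.0f} Mbit/s -> output rate {L / g_out / 1e6:5.1f} Mbit/s")
```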
To alleviate the variability of noise-afflicted packet gaps, state-of-the-art bandwidth estimation techniques post-process the measurement results, e.g., by averaging over several packet pairs or packet trains, linear regression, or a Kalman filter. These techniques, however, do not overcome the basic assumptions of the deterministic fluid model. While packet trains and statistical post-processing help to reduce the variability of available bandwidth estimates, they cannot resolve systematic deviations such as the underestimation bias in the case of random cross traffic and multiple tight links. The limitations of the state-of-the-art methods motivate us to explore the use of machine learning in end-to-end active and passive available bandwidth estimation.
We investigate how to benefit from machine learning while using standard packet train probes for active available bandwidth estimation. To reduce the amount of required training data, we propose a regression-based scale-invariant method that is applicable without prior calibration to networks of arbitrary capacity. To reduce the amount of probe traffic further, we implement a neural network that acts as a recommender and effectively selects the probe rates that reduce the estimation error most quickly. We also evaluate our method against other regression-based supervised machine learning techniques.
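One way to see why a regression method can be scale-invariant is to feed it dimensionless features; the gap-ratio encoding below is my assumption for illustration, and the thesis' exact feature design may differ:

```python
def gap_ratio_features(in_gaps, out_gaps):
    """Dimensionless features: output/input gap ratio per probe rate.

    Scaling all capacities, cross rates, and probe rates by a common
    factor c scales every gap by 1/c, so these ratios are unchanged and a
    model trained at one capacity transfers to another.
    """
    return [go / gi for gi, go in zip(in_gaps, out_gaps)]

# Two paths that differ only by a factor of 10 in speed yield identical
# features, so a single trained regressor covers both.
print(gap_ratio_features([2e-4, 1e-4], [2.4e-4, 2.1e-4]))  # 100 Mbit/s path
print(gap_ratio_features([2e-5, 1e-5], [2.4e-5, 2.1e-5]))  # 1 Gbit/s path
```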
Furthermore, we propose two different multi-class classification-based methods for available bandwidth estimation. The first method employs reinforcement learning, which learns from observations of the network path without a training phase. We formulate available bandwidth estimation as a single-state Markov Decision Process (MDP) multi-armed bandit problem and implement the ε-greedy algorithm to find the available bandwidth, where ε is a parameter that controls the exploration vs. exploitation trade-off.
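A minimal sketch of this bandit formulation follows: each arm is a candidate probe rate and ε-greedy balances exploring new rates against exploiting the best one so far. The candidate rates and the reward function are illustrative assumptions; the thesis defines its own reward.

```python
import random

rates = [10, 20, 30, 40, 50, 60, 70, 80]   # candidate probe rates (Mbit/s)
q = [0.0] * len(rates)                     # running value estimate per arm
n = [0] * len(rates)                       # pull counts
epsilon = 0.1                              # exploration probability

def reward(rate, avail_bw=42.0):
    """Hypothetical reward: highest when the probe rate is closest to the
    available bandwidth (stand-in for the thesis' reward signal)."""
    return -abs(rate - avail_bw)

for step in range(500):
    if random.random() < epsilon:          # explore a random arm
        a = random.randrange(len(rates))
    else:                                  # exploit the best arm so far
        a = max(range(len(rates)), key=lambda i: q[i])
    r = reward(rates[a])
    n[a] += 1
    q[a] += (r - q[a]) / n[a]              # incremental mean update

best = max(range(len(rates)), key=lambda i: q[i])
print("estimated available bandwidth ~", rates[best], "Mbit/s")
```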
We propose another supervised learning-based classification method to obtain reliable available bandwidth estimates with a reduced amount of network overhead in networks where the available bandwidth changes very frequently. In such networks, the reinforcement learning-based method may take longer to converge, as it has no training phase and learns in an online manner. We also evaluate our method with different classification-based supervised machine learning techniques. Furthermore, considering the correlated changes in a network's traffic over time, we apply filtering techniques to the estimation results in order to track changes in the available bandwidth.
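One such filtering technique is a scalar Kalman filter applied purely as post-processing on the sequence of per-round estimates; the sketch below assumes a random-walk model for the available bandwidth, and all tuning constants are assumptions rather than values from the thesis.

```python
class ScalarKalman:
    def __init__(self, x0, p0=100.0, q=4.0, r=25.0):
        self.x, self.p = x0, p0   # state estimate and its variance
        self.q, self.r = q, r     # process and measurement noise variances

    def update(self, z):
        self.p += self.q                 # predict (random-walk model)
        k = self.p / (self.p + self.r)   # Kalman gain
        self.x += k * (z - self.x)       # correct with measurement z
        self.p *= (1.0 - k)
        return self.x

kf = ScalarKalman(x0=50.0)
for z in [48.0, 55.0, 51.0, 30.0, 28.0, 31.0]:   # noisy estimates (Mbit/s)
    print(f"raw {z:5.1f} -> filtered {kf.update(z):5.1f}")
```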
Active probing techniques provide flexibility in designing the input structure. In contrast, the vast majority of Internet traffic consists of Transmission Control Protocol (TCP) flows, which exhibit a rather chaotic traffic pattern. We investigate how the theory of active probing can be used to extract relevant information from passive TCP measurements. We extend our method to perform the estimation using only sender-side measurements of TCP data and acknowledgment packets. However, non-fluid cross traffic, multiple tight links, and packet loss on the reverse path may alter the spacing of acknowledgments and hence increase the measurement noise. To obtain reliable available bandwidth estimates from noise-afflicted acknowledgment gaps, we propose a neural network-based method.
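A hypothetical sketch of the sender-side measurement step: acknowledgment inter-arrival gaps play the role that probe gaps play in active probing. The field names and feature choice are assumptions, not the thesis' exact pipeline.

```python
def ack_gaps(ack_times):
    """Inter-arrival gaps of acknowledgments seen at the sender (s)."""
    return [t2 - t1 for t1, t2 in zip(ack_times, ack_times[1:])]

# Each ACK gap reflects the dispersion of the acknowledged data packets,
# but delayed ACKs, reverse-path queuing, and loss add noise; that is why
# a neural network is trained on these gaps instead of inverting the
# fluid model directly.
acks = [0.1000, 0.1012, 0.1025, 0.1041, 0.1060]   # hypothetical timestamps
print(ack_gaps(acks))   # noisy dispersion samples fed to the estimator
```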
We conduct a comprehensive measurement study in a controlled network testbed at Leibniz University Hannover. We evaluate our proposed methods under a variety of notoriously difficult network conditions that were not included in the training, such as randomly generated networks with multiple tight links, heavy cross-traffic burstiness, delays, and packet loss. Our results reveal that the proposed machine learning-based techniques are able to identify the available bandwidth with high precision from both active and passive measurements. Furthermore, our reinforcement learning-based method, which requires no training phase, shows accurate and fast convergence to available bandwidth estimates.
Trade-off among timeliness, messages and accuracy for large-scale information management
The increasing amount of data and the number of nodes in large-scale environments require new techniques for information management. Examples of such environments are the decentralized infrastructures of Computational Grid and Computational Cloud applications. These large-scale applications need different kinds of aggregated information, such as resource monitoring, resource discovery, or economic information. The challenge of providing timely and accurate information in large-scale environments arises from the distribution of the information. Reasons for delays in a distributed information system are long transmission times due to the distribution, churn, and failures.
A problem of large applications such as peer-to-peer (P2P) systems is the increasing retrieval time of information due to the decentralization of the data and the proneness to failures. However, many applications need timely information provision. Another problem is the increasing network consumption when the application scales to millions of users and data items. Approximation techniques allow reducing the retrieval time and the network consumption; however, their usage decreases the accuracy of the results. Thus, the remaining problem is to offer a trade-off that resolves the conflicting requirements of fast information retrieval, accurate results, and low messaging cost.
Our goal is a self-adaptive decision mechanism that offers a trade-off among the retrieval time, the network consumption, and the accuracy of the result. Self-adaptation enables distributed software to modify its behavior based on changes in the operating environment. In large-scale information systems that use hierarchical data aggregation, we apply self-adaptation to control the approximation used for information retrieval, reducing the network consumption and the retrieval time. The hypothesis of the thesis is that approximation techniques can reduce the retrieval time and the network consumption while guaranteeing the accuracy of the results, taking user-defined priorities into account.
First, this research addresses the problem of a trade-off among timely information retrieval, accurate results, and low messaging cost by proposing a summarization algorithm for resource discovery in P2P content networks. After identifying how summarization can improve the discovery process, we propose an algorithm that uses a precision-recall metric to compare the accuracy and to offer a user-driven trade-off. Second, we propose an algorithm that applies self-adaptive decision making on each node. The decision concerns pruning the query and returning the result instead of continuing the query. Pruning reduces the retrieval time and the network consumption at the cost of a lower accuracy compared to continuing the query. The algorithm uses an analytic hierarchy process to assess the user's priorities and to propose a trade-off that satisfies the accuracy requirements with a low message cost and a short delay.
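The precision-recall comparison mentioned above can be illustrated with a short sketch: the summarized (approximate) discovery result is scored against the exact result set. The resource names and sets are hypothetical.

```python
def precision_recall(retrieved: set, relevant: set):
    tp = len(retrieved & relevant)   # correctly returned resources
    precision = tp / len(retrieved) if retrieved else 0.0
    recall = tp / len(relevant) if relevant else 0.0
    return precision, recall

exact_result = {"res1", "res2", "res3", "res4"}   # full query answer
summarized_result = {"res1", "res2", "res5"}      # approximate answer

p, r = precision_recall(summarized_result, exact_result)
print(f"precision={p:.2f} recall={r:.2f}")        # 0.67, 0.50
```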
A quantitative analysis evaluates the presented algorithms with a simulator, which is fed with real data of a network topology and the nodes' attributes. Using a simulator instead of the prototype allows an evaluation at a large scale of several thousand nodes. The algorithm for content summarization is evaluated with half a million resources and with different query types. The self-adaptive algorithm is evaluated with a simulator of several thousand nodes that are created from real data. A qualitative analysis addresses the integration of the simulator's components into existing market frameworks for Computational Grid and Cloud applications.
The proposed content summarization algorithm reduces the information retrieval time from a logarithmic increase to a constant factor. Furthermore, the message size is reduced significantly by applying the summarization technique. For the user, a precision-recall metric allows defining the relation between the retrieval time and the accuracy. The self-adaptive algorithm reduces the number of messages needed from an exponential increase to a constant factor. At the same time, the retrieval time is reduced to a constant factor under an increasing number of nodes. Finally, the algorithm delivers the data with the required accuracy, adjusting the depth of the query according to the network conditions.