11 research outputs found
A critical look at power law modelling of the Internet
This paper takes a critical look at the usefulness of power law models of the
Internet. The twin focuses of the paper are Internet traffic and topology
generation. The aim of the paper is twofold. Firstly it summarises the state of
the art in power law modelling particularly giving attention to existing open
research questions. Secondly it provides insight into the failings of such
models and where progress needs to be made for power law research to feed
through to actual improvements in network performance.Comment: To appear Computer Communication
An Overview of Internet Measurements:Fundamentals, Techniques, and Trends
The Internet presents great challenges to the characterization of its structure and behavior. Different reasons contribute to this situation, including a huge user community, a large range of applications, equipment heterogeneity, distributed administration, vast geographic coverage, and the dynamism that are typical of the current Internet. In order to deal with these challenges, several measurement-based approaches have been recently proposed to estimate and better understand the behavior, dynamics, and properties of the Internet. The set of these measurement-based techniques composes the Internet Measurements area of research. This overview paper covers the Internet Measurements area by presenting measurement-based tools and methods that directly influence other conventional areas, such as network design and planning, traffic engineering, quality of service, and network management
Investigating self-similarity and heavy tailed distributions on a large scale experimental facility
After seminal work by Taqqu et al. relating self-similarity to heavy tail distributions, a number of research articles verified that aggregated Internet traffic time series show self-similarity and that Internet attributes, like WEB file sizes and flow lengths, were heavy tailed. However, the validation of the theoretical prediction relating self-similarity and heavy tails remains unsatisfactorily addressed, being investigated either using numerical or network simulations, or from uncontrolled web traffic data. Notably, this prediction has never been conclusively verified on real networks using controlled and stationary scenarii, prescribing specific heavy-tail distributions, and estimating confidence intervals. In the present work, we use the potential and facilities offered by the large-scale, deeply reconfigurable and fully controllable experimental Grid5000 instrument, to investigate the prediction observability on real networks. To this end we organize a large number of controlled traffic circulation sessions on a nation-wide real network involving two hundred independent hosts. We use a FPGA-based measurement system, to collect the corresponding traffic at packet level. We then estimate both the self-similarity exponent of the aggregated time series and the heavy-tail index of flow size distributions, independently. Comparison of these two estimated parameters, enables us to discuss the practical applicability conditions of the theoretical prediction
Investigating self-similarity and heavy-tailed distributions on a large scale experimental facility
International audienceAfter the seminal work by Taqqu et al. relating selfsimilarity to heavy-tailed distributions, a number of research articles verified that aggregated Internet traffic time series show self-similarity and that Internet attributes, like Web file sizes and flow lengths, were heavy-tailed. However, the validation of the theoretical prediction relating self-similarity and heavy tails remains unsatisfactorily addressed, being investigated either using numerical or network simulations, or from uncontrolled Web traffic data. Notably, this prediction has never been conclusively verified on real networks using controlled and stationary scenarii, prescribing specific heavy-tailed distributions, and estimating confidence intervals. With this goal in mind, we use the potential and facilities offered by the large-scale, deeply reconfigurable and fully controllable experimental Grid5000 instrument, to investigate the prediction observability on real networks. To this end we organize a large number of controlled traffic circulation sessions on a nation-wide real network involving two hundred independent hosts. We use a FPGA-based measurement system, to collect the corresponding traffic at packet level. We then estimate both the self-similarity exponent of the aggregated time series and the heavy-tail index of flow size distributions, independently. On the one hand, our results complement and validate with a striking accuracy some conclusions drawn from a series of pioneer studies. On the other hand, they bring in new insights on the controversial role of certain components of real networks
The Dynamics of Internet Traffic: Self-Similarity, Self-Organization, and Complex Phenomena
The Internet is the most complex system ever created in human history.
Therefore, its dynamics and traffic unsurprisingly take on a rich variety of
complex dynamics, self-organization, and other phenomena that have been
researched for years. This paper is a review of the complex dynamics of
Internet traffic. Departing from normal treatises, we will take a view from
both the network engineering and physics perspectives showing the strengths and
weaknesses as well as insights of both. In addition, many less covered
phenomena such as traffic oscillations, large-scale effects of worm traffic,
and comparisons of the Internet and biological models will be covered.Comment: 63 pages, 7 figures, 7 tables, submitted to Advances in Complex
System
Recommended from our members
Spatial stochastic models for network analysis
This thesis proposes new stochastic interacting particle models for networks, and studies some fundamental properties of these models. This thesis considers two application areas of networking - engineering design questions in future wireless systems and algorithmic tasks in large scale graph structured data. The key innovation introduced in this thesis is to bring tools and ideas from stochastic geometry to bear on the problems in both these application domains. We identify certain fundamental questions in design and engineering both wireless systems and large scale graph structured data processing systems. Subsequently, we identify novel stochastic geometric models, that captures the fundamental properties of these networks, which forms the first research contribution. We then rigorously study these models, by bringing to bear new tools from stochastic geometry, random graphs, percolation and Markov processes to establish structural results and fundamental phase transitions in these models. Using our developed mathematical methodology, we then identify design insights and develop algorithms, which we demonstrate are instructive in many practical settings. In the setting of wireless systems, this thesis studies both ad-hoc and cellular networks. In the ad-hoc network setting, we aim to understand fundamental limits of the simplest possible protocol to access the spectrum, namely a link transmits whenever it has data to send by treating all interference as noise. Surprisingly this basic question itself was not understood, as the system dynamics is coupled spatially due to the interference links cause one another and temporally due to randomness in traffic arrivals. We propose a novel interacting particle model called the spatial birth-death wireless network model to understand the stability properties of the simple spectrum access protocol. Using tools from Palm calculus and fluid limit theory, we establish a tight characterization of when this model is stable. Furthermore, we show that whenever stable, the links in steady-state exhibit a form of clustering. Leveraging these structural results, we propose two mean field heuristics to obtain formulas for key performance metrics such as average delay experienced by a link. We empirically find that the proposed formulas for delay predicts accurately the system behavior. We subsequently study scalability properties of this model by introducing an appropriate infinite dimensional version of the model we call the Interference Queueing Networks model. The model consists of a queue located at each grid point of an infinite regular integer lattice, with the queues interacting with each other in a translation invariant fashion. We then prove several structural properties of the model namely, tight conditions for existence of stationary solutions and some sufficient conditions for uniqueness of stationary solutions. Remarkably, we obtain exact formula for mean delay in this model, unlike the continuum model where we relied on mean-field type heuristics to obtain insights. In the setting of cellular networks, we study optimal association schemes by mobile phones in the case when there are several possible base station technologies operating on orthogonal bands. We show that this choice leads to a performance gain we term technology diversity. Interestingly, we show that the performance gain relies on the amount of instantaneous information a user has on the various base station technologies that it can leverage to make the association decision. We outline optimal association schemes under various information settings that a user may have on the network. Moreover, we propose simple heuristics for association that relies on a user obtaining minimal instantaneous information and are thus practical to implement. We prove that in certain natural asymptotic regime of parameters, our proposed heuristic policy is also optimal, and thus quantifying the value of having fine grained information at a user for association. We empirically observe that the asymptotic result is valid even at finite parameter regimes that are typical in todays networks. In the application of analyzing large scale graph structured data, we consider the graph clustering problem with side information. Graph clustering is a standard and widely used task which consists in partitioning the set of nodes of a graph into underlying clusters where nodes in the same cluster are similar to each other and nodes across different clusters are different. Motivated by applications in social and biological networks, we consider the task of clustering nodes of a graph, when there is side information on the nodes, other than that contained in the graph. For instance in social networks, one has access to meta data about a person (node in a social graph) such as age, location, income etc, along with the combinatorial data of who are his friends on the social graph. Similarly, in biological networks, there is often meta-data about an experiment that provides additional contextual data about a node, in addition to the combinatorial data. In this thesis, we propose a generative model for such graph structured data with side information, which is inspired by random graph models in stochastic geometry such as the random connection model and the generative models for networks with clusters without contexts, such as the stochastic block model or the planted partition model. We propose a novel graph model called the planted partition random connection model. Roughly speaking, in this model, each node has two labels - an observable R [superscript d] valued (for some fixed d) feature label and an unobservable binary valued community label. Conditional on the node labels, edges are drawn at random in this graph depending on both the feature and community labels of the two end points. The clustering task consists in recovering the underlying partition of nodes corresponding to the respective community labels better than a random assignment, when given an observation of the graph generated and the features of all nodes. We show that if the 'density of nodes', i.e., average number of nodes having features in an unit volume of space of R [superscript d] is small, then no algorithm can cluster the graph that can asymptotically beat a random assignment of community labels. On the contrary, if the density of nodes is sufficiently high, we give a simple algorithm that recovers the true underlying partition strictly better a random assignment. We then apply the proposed algorithm to a problem in computational biology called Haplotype Phasing and observe empirically, that it obtains state of art results. This demonstrates, both the validity of our generative model, as well as our new algorithm.Electrical and Computer Engineerin