Characterizing the spatial and temporal diversity of the Internet Traffic: a capacity planning application to the RedIRIS network
Unpublished doctoral thesis. Universidad Autónoma de Madrid, Escuela Politécnica Superior, March 201
Selective capping of packet payloads at multi-Gb/s rates
Nowadays, network traces are an important tool to characterize network traffic, detect anomalies, and evaluate performance forensically, among other tasks. However, the write speed and storage required for traffic traces have grown considerably in today's multi-Gb/s networks. To reduce these write-speed and storage requirements on hard drives, and to further reduce the computational burden of packet analysis, we propose selective capping of packet payloads. Our proposal takes advantage of the fact that most packet payloads are useless for analysis purposes, because they are either encrypted or in a proprietary, non-readable application format; such payloads can be capped. Conversely, non-ASCII packets from well-known protocols, as well as packets from protocols carrying some ASCII data, are fully captured, as they may be potentially useful for network analysis. We have implemented and integrated this proposal into a high-speed network driver and into a software module at user level, to make its operation faster and more transparent to upper-layer applications. In addition, a detailed cost analysis of the base algorithm and of its optimizations is presented throughout this work. The results are promising: selective capping achieves multi-Gb/s rates by exploiting low-level hardware and software techniques to keep up with the fastest network rates. Moreover, storage savings of between two- and three-fold are achieved by capping nearly 60% of packet payloads in multiple realistic scenarios.
Nowadays, network traces are an essential tool in the work of network analysts and administrators, as they allow them to understand traffic behavior, detect anomalies and attacks, and evaluate network performance, among other tasks. However, the high speeds of today's networks greatly hinder trace capture, since a large amount of storage is required to save even a short monitoring period. The storage problem adds to that of the high disk-write rate that must be sustained to avoid losing information. This work presents a selective packet-level filter. The proposed method exploits the fact that a large share of the packets traversing the network are not useful to analysts (encrypted or proprietary protocols with unreadable content), and so the payload of such packets need not be stored, saving the corresponding space and reducing the disk-write rate needed to monitor the network. In contrast, packets from well-known protocols, or from protocols of unknown structure but readable content (for example, ASCII-encoded), are captured in full. In addition, two implementations of the filter are described, integrated into a high-performance network driver and, alternatively, as a user-level application, and a performance analysis of both architectures is detailed. The results presented here are promising: the tests performed reach monitoring rates close to 10 Gb/s and show that it is possible to apply a selective filter when monitoring high-performance networks
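The selective-capping heuristic described above can be sketched in a few lines: keep a payload in full only when it looks analyzable (mostly printable ASCII), and truncate it otherwise. This is a minimal illustration; the 0.75 readability threshold and the 64-byte cap are assumptions, not values from the thesis.

```python
# Bytes considered "readable": printable ASCII plus common whitespace.
PRINTABLE = set(range(0x20, 0x7F)) | {0x09, 0x0A, 0x0D}

def ascii_ratio(payload: bytes) -> float:
    """Fraction of bytes that are printable ASCII (or whitespace)."""
    if not payload:
        return 0.0
    return sum(b in PRINTABLE for b in payload) / len(payload)

def cap_payload(payload: bytes, threshold: float = 0.75, cap: int = 64) -> bytes:
    """Keep the payload whole if it seems analyst-readable; otherwise cap it."""
    if ascii_ratio(payload) >= threshold:
        return payload          # readable: store in full
    return payload[:cap]        # encrypted/opaque: cap it

# An opaque (random-looking) payload gets capped; an HTTP-like one is kept.
opaque = bytes(range(256)) * 6
text = b"GET /index.html HTTP/1.1\r\nHost: example.org\r\n\r\n"
assert len(cap_payload(opaque)) == 64
assert cap_payload(text) == text
```

In a real capture pipeline this decision would run per packet in the driver or user-level module, so the readability test must be branch-light and single-pass, as here.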
Spatial stochastic models for network analysis
This thesis proposes new stochastic interacting particle models for networks and studies some fundamental properties of these models. It considers two application areas of networking: engineering design questions in future wireless systems, and algorithmic tasks on large-scale graph-structured data. The key innovation introduced in this thesis is to bring tools and ideas from stochastic geometry to bear on problems in both application domains. We identify certain fundamental questions in the design and engineering of both wireless systems and large-scale graph-structured data processing systems. Subsequently, we identify novel stochastic geometric models that capture the fundamental properties of these networks, which forms the first research contribution. We then rigorously study these models, bringing to bear new tools from stochastic geometry, random graphs, percolation, and Markov processes to establish structural results and fundamental phase transitions. Using the developed mathematical methodology, we identify design insights and develop algorithms, which we demonstrate to be instructive in many practical settings. In the setting of wireless systems, this thesis studies both ad-hoc and cellular networks. In the ad-hoc network setting, we aim to understand the fundamental limits of the simplest possible protocol to access the spectrum, namely that a link transmits whenever it has data to send, treating all interference as noise. Surprisingly, this basic question itself was not understood, as the system dynamics are coupled spatially, through the interference links cause one another, and temporally, through randomness in traffic arrivals. We propose a novel interacting particle model, the spatial birth-death wireless network model, to understand the stability properties of this simple spectrum access protocol. Using tools from Palm calculus and fluid limit theory, we establish a tight characterization of when this model is stable.
Furthermore, we show that whenever the model is stable, the links in steady state exhibit a form of clustering. Leveraging these structural results, we propose two mean-field heuristics to obtain formulas for key performance metrics such as the average delay experienced by a link. We empirically find that the proposed delay formulas predict the system behavior accurately. We subsequently study scalability properties of this model by introducing an appropriate infinite-dimensional version, which we call the Interference Queueing Networks model. This model consists of a queue located at each grid point of an infinite regular integer lattice, with the queues interacting with each other in a translation-invariant fashion. We then prove several structural properties of the model, namely tight conditions for the existence of stationary solutions and sufficient conditions for their uniqueness. Remarkably, we obtain an exact formula for the mean delay in this model, unlike in the continuum model, where we relied on mean-field-type heuristics to obtain insights. In the setting of cellular networks, we study optimal association schemes for mobile phones when several base station technologies operate on orthogonal bands. We show that this choice leads to a performance gain we term technology diversity. Interestingly, we show that the performance gain depends on the amount of instantaneous information a user has about the various base station technologies, which it can leverage to make the association decision. We outline optimal association schemes under the various information settings a user may have about the network. Moreover, we propose simple association heuristics that rely on a user obtaining only minimal instantaneous information and are thus practical to implement.
We prove that in a certain natural asymptotic regime of parameters, our proposed heuristic policy is also optimal, thus quantifying the value of fine-grained information at a user for association. We empirically observe that the asymptotic result remains valid even at finite parameter regimes typical of today's networks. In the application of analyzing large-scale graph-structured data, we consider the graph clustering problem with side information. Graph clustering is a standard and widely used task that consists in partitioning the set of nodes of a graph into underlying clusters so that nodes in the same cluster are similar to each other and nodes in different clusters are different. Motivated by applications in social and biological networks, we consider the task of clustering the nodes of a graph when there is side information on the nodes beyond that contained in the graph. For instance, in social networks one has access to metadata about a person (a node in the social graph), such as age, location, and income, along with the combinatorial data of who their friends on the social graph are. Similarly, in biological networks there is often metadata about an experiment that provides additional contextual data about a node, in addition to the combinatorial data. In this thesis, we propose a generative model for such graph-structured data with side information, inspired by random graph models in stochastic geometry, such as the random connection model, and by generative models for networks with clusters but without contexts, such as the stochastic block model and the planted partition model. We call this novel graph model the planted partition random connection model. Roughly speaking, in this model each node has two labels: an observable R^d-valued (for some fixed d) feature label and an unobservable binary-valued community label.
Conditional on the node labels, edges are drawn at random depending on both the feature and community labels of the two endpoints. The clustering task consists in recovering the underlying partition of nodes corresponding to the community labels, better than a random assignment, given an observation of the generated graph and the features of all nodes. We show that if the density of nodes, i.e., the average number of nodes with features in a unit volume of R^d, is small, then no algorithm can cluster the graph asymptotically better than a random assignment of community labels. On the contrary, if the density of nodes is sufficiently high, we give a simple algorithm that recovers the true underlying partition strictly better than a random assignment. We then apply the proposed algorithm to a problem in computational biology called haplotype phasing and observe empirically that it obtains state-of-the-art results. This demonstrates both the validity of our generative model and our new algorithm.
Electrical and Computer Engineering
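The generative step of the planted partition random connection model can be sketched as follows: each node gets a feature in R^2 and a hidden ±1 community label, and an edge appears with a probability that decays with feature distance and is larger within a community. The exponential decay profile and all parameter values are illustrative assumptions, not the thesis's exact model.

```python
import math
import random

random.seed(0)

def sample_pprcm(n=200, side=10.0, a_in=0.9, a_out=0.2, scale=1.0):
    """Sample a finite-window sketch of a planted partition random
    connection model: observable features, hidden communities, and
    distance- and community-dependent random edges."""
    feats = [(random.uniform(0, side), random.uniform(0, side))
             for _ in range(n)]                    # observable feature labels
    comms = [random.choice((-1, 1)) for _ in range(n)]  # hidden communities
    edges = []
    for i in range(n):
        for j in range(i + 1, n):
            r = math.dist(feats[i], feats[j])
            base = a_in if comms[i] == comms[j] else a_out
            if random.random() < base * math.exp(-r / scale):
                edges.append((i, j))
    return feats, comms, edges

feats, comms, edges = sample_pprcm()
# With a_in > a_out, within-community edges should dominate.
same = sum(comms[i] == comms[j] for i, j in edges)
print(f"{len(edges)} edges, {same} within-community")
```

The clustering task is then the inverse problem: given `feats` and `edges` but not `comms`, recover the partition better than a random guess.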
Automatic Decomposition of Self-Triggering Kernels of Hawkes Processes
Department of Computer Engineering
Hawkes Processes (HPs) capture self- and mutual excitation between events: the arrival of one event makes future ones more likely to happen in time-series data. Identifying the temporal covariance kernel can reveal the underlying structure and improve the prediction of future events.
In this work, we present a new framework to represent time-series events with a composition of self-triggering kernels of Hawkes Processes. Our automatic decomposition procedure comprises three main steps: (1) discretized kernel estimation through the frequency-domain inversion equation associated with the covariance density, (2) greedy kernel decomposition through four base kernels and their combinations (addition and multiplication), and (3) automated report generation. In addition, we report the first multiplicative kernel compositions, along with stationarity conditions for Hawkes Processes. We demonstrate that the new automatic kernel decomposition procedure predicts future events better than the existing framework on real-world data.
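The kernel-composition idea above can be illustrated with kernels as plain functions, combined by addition or multiplication, and a numerical stationarity check: a self-exciting Hawkes process is stationary only if the triggering kernel integrates to less than 1 (the branching ratio). The particular base kernels and parameters here are illustrative assumptions; the thesis's four base kernels are not specified.

```python
import math

# Two illustrative base self-triggering kernels.
def exponential(alpha, beta):
    return lambda t: alpha * math.exp(-beta * t)

def power_law(alpha, c, p):
    return lambda t: alpha / (t + c) ** p

# Combinations used in greedy decomposition.
def add(k1, k2):
    return lambda t: k1(t) + k2(t)

def mul(k1, k2):
    return lambda t: k1(t) * k2(t)

def branching_ratio(kernel, horizon=200.0, dt=1e-3):
    """Riemann-sum approximation of the kernel's integral; the Hawkes
    process with this kernel is stationary only if the value is < 1."""
    return sum(kernel(i * dt) * dt for i in range(int(horizon / dt)))

# A multiplicative composition: 0.8*e^{-t} * 1.0*e^{-0.5t} = 0.8*e^{-1.5t},
# whose integral is 0.8/1.5 ~= 0.533 < 1, so the process is stationary.
k = mul(exponential(0.8, 1.0), exponential(1.0, 0.5))
print(branching_ratio(k))
```

Note that the product of two stationary exponential kernels stays exponential with summed decay rates, which is why multiplicative compositions admit closed-form stationarity conditions in this family.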
Deployment algorithms for multi-agent exploration and patrolling
Thesis (S.M.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2013. Cataloged from PDF version of thesis. Includes bibliographical references (p. 80-85).
Exploration and patrolling are central themes in distributed robotics. These deployment scenarios have deep fundamental importance in robotics, beyond the most obvious direct applications, as they can be used to model a wider range of seemingly unrelated deployment objectives. Deploying a group of robots, or any type of agent in general, to explore or patrol dynamic or unknown environments presents us with some fundamental conceptual steps. Regardless of the problem domain or application, we are required to (a) understand the environment in which the agents are being deployed; (b) encode the task as a set of constraints and guarantees; and (c) derive an effective deployment strategy for the operation of the agents. This thesis presents a coherent treatment of these steps at the theoretical and practical level. First, we address the problem of obtaining a concise description of a physical environment for robotic exploration. Specifically, we aim to determine the number of robots required to clear an environment using non-recontaminating exploration. We introduce the medial axis as a configuration space and derive a mathematical representation of a continuous environment that captures its underlying topology and geometry. We show that this representation provides a concise description of arbitrary environments, and that reasoning about points in this representation is equivalent to reasoning about robots in physical space. We leverage this to derive a lower bound on the number of required pursuers. We provide a transformation from this continuous representation into a symbolic representation. We then present a Markov-based model that captures a pickup and delivery problem (PDP) on a general graph.
We present a mechanism by which a group of robots can be deployed to patrol the graph in order to fulfill specific service tasks. In particular, we examine the problem in the context of urban transportation and establish a model that captures the operation of a fleet of taxis responding to customer arrivals throughout the city. We consider three different evaluation criteria: minimizing the number of transportation resources for urban planning; minimizing fuel consumption for the drivers; and minimizing customer waiting time to increase the overall quality of service. Finally, we present two deployment algorithms for multi-robot exploration and patrolling. The first is a generalized pursuit-evasion algorithm: given an environment, we can compute how many pursuers are needed and generate an optimal pursuit strategy that guarantees the evaders are detected with the minimum number of pursuers. We then present a practical patrolling policy for a general graph. We evaluate our policy using real-world data, comparing against the actual observed redistribution of taxi drivers in Singapore. Through large-scale simulations we show that our proposed deployment strategy is stable and improves substantially upon the default unmanaged redistribution of taxi drivers in Singapore.
by Mikhail Volkov. S.M.
Longitudinal analysis of network measurements
This project is devoted to the study, automation, and longitudinal presentation of network measurements. The longitudinal study of network measurements aims to explain and characterize the behavior of a network in the medium and long term, that is, months or years, providing answers and helping network managers in a way complementary to the study of network measurements in the range of seconds, minutes, or hours. For example, this kind of measurement is more useful for dimensioning a network that aims to maintain its quality of service over time, identifying patterns of anomalous behavior, identifying irregular behaviors, and individually identifying their potential causes. More specifically, this work first consisted of understanding the monitoring system of the Spanish academic network RedIRIS, which provided us with real, flow-based network measurements over a three-year period. Work then focused on automating the preprocessing of these measurements and, next, on computing significant statistics of the behavior of a communication network. These statistics include the evolution of consumed bandwidth, the number of IP addresses, the number of heavy hitters, and the most/least loaded hours. Fourth, scripts were implemented to ease the visualization and study of the variation of these statistics over time. Finally, all of this development was applied to RedIRIS, and the results are shown as a significant case study.
This project is dedicated to the study, automation, and longitudinal presentation of network measurements. This study seeks to explain and describe the behavior of a network in the medium and long term, i.e., months and years, providing answers and helping network managers in a way complementary to the analysis of measurements in the range of seconds, minutes, and hours. For example, such long-term studies are more useful for dimensioning a network so that it maintains its quality of service over time, identifying patterns of both anomalous and irregular behavior, and identifying the potential causes. More specifically, this work first consisted of understanding the monitoring system of the Spanish academic network (RedIRIS), which provided us with real network measurements, specifically network flows, over three years. Afterwards, we worked on automating the preprocessing of these measurements and then computed several significant statistics describing the behavior of a communication network. These statistics include the evolution of bandwidth, the number of IP addresses, the number of heavy hitters, and the most/least busy hours. Fourth, a number of scripts were implemented to ease both the visualization and inspection of the increases and decreases of these statistics over time. Finally, all this development was applied to the RedIRIS network, and the results are shown in this work as a significant case study
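The per-flow statistics described above (heavy hitters, busiest hours) reduce to simple aggregations over flow records. The sketch below assumes a minimal record layout of (timestamp, source IP, byte count); the field names, the toy values, and the top-2 cutoff are illustrative, not taken from the RedIRIS data.

```python
from collections import Counter
from datetime import datetime

# Toy flow records: (timestamp, src_ip, bytes transferred).
flows = [
    ("2012-03-01 10:05", "10.0.0.1", 5_000_000),
    ("2012-03-01 10:20", "10.0.0.2", 120_000),
    ("2012-03-01 23:10", "10.0.0.1", 9_000_000),
    ("2012-03-02 10:02", "10.0.0.3", 80_000),
]

bytes_per_ip = Counter()    # for heavy-hitter detection
bytes_per_hour = Counter()  # for most/least busy hours
for ts, ip, nbytes in flows:
    when = datetime.strptime(ts, "%Y-%m-%d %H:%M")
    bytes_per_ip[ip] += nbytes
    bytes_per_hour[when.hour] += nbytes

top_talkers = bytes_per_ip.most_common(2)               # heavy hitters
busiest_hour = max(bytes_per_hour, key=bytes_per_hour.get)
print(top_talkers, busiest_hour)
```

Over a three-year archive the same aggregation would run per day or per week, and the longitudinal view is just the time series of these per-period counters.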
Human Mobility and Application Usage Prediction Algorithms for Mobile Devices
Mobile devices such as smartphones and smart watches are ubiquitous companions in humans' daily life. Since 2014, there have been more mobile devices on Earth than humans. Mobile applications utilize the sensors and actuators of these devices to support individuals in their daily life. In particular, 24% of Android applications leverage users' mobility data. For instance, this data allows applications to understand which places an individual typically visits, and thus to provide her with transportation information or location-based advertisements, or to enable smart home heating systems. These and similar scenarios require the possibility to access the Internet from everywhere and at any time. Accordingly, 83% of the applications available in the Android Play Store require the Internet to operate properly.
Mobile applications such as Google Now or Apple Siri utilize human mobility data to anticipate where a user will go next or which information she is likely to access en route to her destination. However, predicting human mobility is a challenging task. Existing mobility prediction solutions are typically optimized a priori for a particular application scenario and mobility prediction task. There is no approach that allows for automatically composing a mobility prediction solution depending on the underlying prediction task and other parameters. Such an approach is required to allow mobile devices to support the plethora of mobile applications running on them, where each application supports its users by leveraging mobility predictions in a distinct application scenario.
Mobile applications rely strongly on the availability of the Internet to work properly. However, mobile cellular network providers are struggling to provide necessary cellular resources. Mobile applications generate a monthly average mobile traffic volume that ranged between 1 GB in Asia and 3.7 GB in North America in 2015. The Ericsson Mobility Report Q1 2016 predicts that by the end of 2021 this mobile traffic volume will experience a 12-fold increase. The consequences are higher costs for both providers and consumers and a reduced quality of service due to congested mobile cellular networks. Several countermeasures can be applied to cope with these problems. For instance, mobile applications apply caching strategies to prefetch application content by predicting which applications will be used next. However, existing solutions suffer from two major shortcomings. They either (1) do not incorporate traffic volume information into their prefetching decisions and thus generate a substantial amount of cellular traffic or (2) require a modification of mobile application code.
In this thesis, we present novel human mobility and application usage prediction algorithms for mobile devices. These two major contributions address the aforementioned problems of (1) selecting a human mobility prediction model and (2) prefetching of mobile application content to reduce cellular traffic.
First, we address the selection of human mobility prediction models. We report on an extensive analysis of the influence of temporal, spatial, and phone context data on the performance of mobility prediction algorithms. Building upon our analysis results, we present (1) SELECTOR – a novel algorithm for selecting individual human mobility prediction models and (2) MAJOR – an ensemble learning approach for human mobility prediction. Furthermore, we introduce population mobility models and demonstrate their practical applicability. In particular, we analyze techniques that focus on detection of wrong human mobility predictions. Among these techniques, an ensemble learning algorithm, called LOTUS, is designed and evaluated.
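A common baseline among the individual mobility prediction models discussed above is a first-order Markov model over visited places: predict the most frequent successor of the current place. This is a generic illustration of that baseline, not an implementation of SELECTOR, MAJOR, or LOTUS.

```python
from collections import Counter, defaultdict

def train(visits):
    """Count place-to-place transitions in a visit history."""
    trans = defaultdict(Counter)
    for prev, nxt in zip(visits, visits[1:]):
        trans[prev][nxt] += 1
    return trans

def predict_next(trans, current):
    """Most frequent successor of the current place, or None if unseen."""
    if current not in trans:
        return None
    return trans[current].most_common(1)[0][0]

history = ["home", "work", "gym", "home", "work", "cafe",
           "home", "work", "gym"]
model = train(history)
print(predict_next(model, "work"))  # "gym": seen twice after work, "cafe" once
```

Model selection in the SELECTOR sense would then choose among such candidate predictors (and richer ones using temporal or phone-context features) based on the prediction task at hand.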
Second, we present EBC – a novel algorithm for prefetching mobile application content. EBC's goal is to reduce cellular traffic consumption while improving application content freshness. With respect to existing solutions, EBC introduces novel techniques (1) to apply different prefetching strategies for mobile applications depending on the available network type and (2) to incorporate application traffic volume predictions into the prefetching decisions. EBC also achieves a reduction in application launch time at the cost of a negligible increase in energy consumption.
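The kind of traffic-aware prefetching decision described above can be sketched as a simple rule: prefetch an application's content only if the application is likely to be used next and its predicted download volume fits the budget of the current network type. This is a hedged illustration in the spirit of EBC, not the thesis's actual algorithm; all thresholds are assumed values.

```python
def should_prefetch(p_next_use, predicted_mb, network,
                    wifi_budget_mb=50.0, cell_budget_mb=2.0, min_prob=0.6):
    """Prefetch only when usage is likely and the predicted traffic
    volume is affordable on the current network type."""
    budget = wifi_budget_mb if network == "wifi" else cell_budget_mb
    return p_next_use >= min_prob and predicted_mb <= budget

print(should_prefetch(0.8, 1.5, "cellular"))   # small download, likely use
print(should_prefetch(0.8, 30.0, "cellular"))  # too costly on cellular
print(should_prefetch(0.8, 30.0, "wifi"))      # affordable on Wi-Fi
```

Incorporating the volume prediction (rather than usage probability alone) is what prevents a naive prefetcher from inflating exactly the cellular traffic it is meant to reduce.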
Developing human mobility and application usage prediction algorithms requires access to human mobility and application usage data. To this end, we leverage three publicly available data sets in this thesis. Furthermore, we address the shortcomings of these data sets, namely (1) the lack of ground-truth mobility data and (2) the lack of human mobility data at short-term events like conferences. With JK2013 and the UbiComp Data Collection Campaign (UbiDCC), we contribute two human mobility data sets that address these shortcomings. We also develop and make publicly available a mobile application called LOCATOR, which was used to collect our data sets.
In summary, the contributions of this thesis provide a step further towards supporting mobile applications and their users. With SELECTOR, we contribute an algorithm that optimizes the quality of human mobility predictions by appropriately selecting parameters. To reduce the cellular traffic footprint of mobile applications, we contribute with EBC a novel approach for prefetching mobile application content by leveraging application usage predictions. Furthermore, we provide insights into how, and to what extent, wrong and uncertain human mobility predictions can be detected. Lastly, with our mobile application LOCATOR and two human mobility data sets, we contribute practical tools for researchers in the human mobility prediction domain.
BIG DATA and High-Level Analysis: Conference Proceedings
These proceedings present the results of scientific research and development in BIG DATA and Advanced Analytics for optimizing IT and business solutions, together with case studies in medicine, education, and ecology.
Understanding Complex Human Behaviour in Images and Videos.
Understanding human motions and activities in images and videos is an important problem in many application domains, including surveillance, robotics, video indexing, and sports analysis. Although much progress has been made in classifying a single person's activities in simple videos, little effort has been made toward the interpretation of the behavior of multiple people in natural videos. In this thesis, I present my research toward understanding the behavior of multiple people in natural images and videos. I identify four major challenges in this problem: i) identifying individual properties of people in videos, ii) modeling and recognizing the behavior of multiple people, iii) understanding human activities at multiple levels of resolution, and iv) learning characteristic patterns of interactions between people, or between people and the surrounding environment. I discuss how we solve these challenging problems using various computer vision and machine learning technologies. I conclude with final remarks, observations, and possible future research directions.
PhD, Electrical Engineering: Systems, University of Michigan, Horace H. Rackham School of Graduate Studies
http://deepblue.lib.umich.edu/bitstream/2027.42/99956/1/wgchoi_1.pd