Application of the Allan Variance to Time Series Analysis in Astrometry and Geodesy: A Review
The Allan variance (AVAR) was introduced 50 years ago as a statistical tool
for assessing the stability of frequency standards. Over the past decades, AVAR
has increasingly been used in geodesy and astrometry to assess the noise
characteristics of geodetic and astrometric time series. A specific feature of
astrometric and geodetic measurements, as compared with clock measurements,
is that they are generally associated with uncertainties; thus, appropriate
weighting should be applied during data analysis. Besides, some physically
connected scalar time series naturally form series of multi-dimensional
vectors. For example, the three station coordinate time series
can be combined to analyze 3D station position variations. The classical AVAR
is not intended for processing unevenly weighted and/or multi-dimensional data.
Therefore, AVAR modifications, namely the weighted AVAR (WAVAR), the multi-dimensional
AVAR (MAVAR), and the weighted multi-dimensional AVAR (WMAVAR), were introduced to
overcome these deficiencies. This paper gives a brief review of the
experience of using AVAR and its modifications in processing astro-geodetic
time series.
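The classical AVAR the review starts from can be sketched in a few lines; the function name and the white-noise test signal below are illustrative, not taken from the paper.

```python
import numpy as np

def allan_variance(y, m):
    """Classical (non-overlapping) Allan variance of an evenly spaced
    series y at averaging factor m:

        AVAR(m) = 0.5 * <(ybar_{k+1} - ybar_k)^2>,

    where ybar_k are the means of consecutive, non-overlapping
    clusters of m samples.
    """
    n_clusters = len(y) // m
    # means of consecutive clusters of length m
    means = y[: n_clusters * m].reshape(n_clusters, m).mean(axis=1)
    # first differences of consecutive cluster means
    d = np.diff(means)
    return 0.5 * np.mean(d ** 2)

# Sanity check: for unit-variance white noise, AVAR(m) ~ 1/m.
rng = np.random.default_rng(0)
y = rng.standard_normal(200_000)
```

The weighted variant (WAVAR) discussed in the review replaces the plain cluster means with uncertainty-weighted means; the structure of the computation stays the same.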
Evaluating the impact of traffic sampling in network analysis
Master's dissertation in Informatics Engineering. The sampling of network traffic is a very effective method for understanding the
behaviour and flow of a network, and it is essential for building network management tools that control
Service Level Agreements (SLAs), Quality of Service (QoS), traffic engineering, and the
planning of both the capacity and the security of the network.
With the exponential rise in the amount of traffic caused by the growing number of devices
connected to the Internet, it becomes increasingly harder and more expensive to understand the
behaviour of a network through the analysis of the total volume of traffic. The use of
sampling techniques, or selective analysis, which consists of selecting a small number of
packets in order to estimate the expected behaviour of a network, therefore becomes essential.
Even though these techniques drastically reduce the amount of data to be analyzed, the fact
that the sampling tasks have to be performed on the network equipment can cause a
significant impact on the performance of these devices, and a reduction in the
accuracy of the estimation of the network state.
In this dissertation, the impact of the selective analysis of network
traffic is evaluated, in terms of both the performance in estimating the network state and the statistical
properties, such as self-similarity and Long-Range Dependence (LRD), that exist in the original
network traffic, allowing a better understanding of the behaviour of sampled network traffic.
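The count-based, 1-in-N systematic sampling idea described above can be sketched as follows; the function name, the synthetic packet sizes, and the sampling rate are illustrative, not taken from the dissertation.

```python
import random

def systematic_sample(packets, n, offset=0):
    """Systematic count-based sampling: keep every n-th packet,
    starting at `offset` (classic 1-in-n sampling)."""
    return packets[offset::n]

random.seed(1)
# Synthetic packet sizes in bytes, a stand-in for captured traffic.
sizes = [random.randint(64, 1500) for _ in range(100_000)]

sampled = systematic_sample(sizes, 100)
# Estimate the total traffic volume by scaling the sampled volume back up.
estimated_volume = sum(sampled) * 100
actual_volume = sum(sizes)
```

The trade-off discussed in the abstract is visible here: the sampled series is 100 times smaller, so the equipment does far less work, but any network-state quantity derived from it (such as the total volume) is only an estimate.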
ASYMPTOTIC NORMALITY OF A HURST PARAMETER ESTIMATOR BASED ON THE MODIFIED ALLAN VARIANCE
ABSTRACT. In order to estimate the memory parameter of Internet traffic data, a log-regression estimator based on the so-called modified Allan variance (MAVAR) has recently been proposed. Simulations have shown that this estimator achieves higher accuracy and better confidence than other methods. In this paper we present a rigorous study of the log-regression MAVAR estimator. In particular, under the assumption that the signal process is a fractional Brownian motion, we prove that it is asymptotically normally distributed and consistent. Finally, we discuss its connection with wavelet estimators.
ASYMPTOTIC NORMALITY OF A HURST PARAMETER ESTIMATOR BASED ON THE MODIFIED ALLAN VARIANCE
ABSTRACT. It has been observed that in many situations network traffic is characterized by self-similarity and long-range correlations on various time-scales. The memory parameter of a related time series is thus a key quantity for predicting and controlling the traffic flow. In the present paper we analyze the performance of a memory parameter estimator, α̂, defined by the log-regression on the so-called modified Allan variance. Under the assumption that the signal process is a fractional Brownian motion with Hurst parameter H, we study the rate of convergence of the empirical modified Allan variance, and then prove that the log-regression estimator α̂ converges to the memory parameter α = 2H − 2 of the process. In particular, we show that the deviation α̂ − α, when suitably normalized, converges in distribution to a normal random variable, and we compute its asymptotic variance explicitly.
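Under the paper's assumptions, the estimator can be sketched as follows: compute the modified Allan variance at several averaging factors, regress log-MAVAR on log-τ, and read off the slope as an estimate of α = 2H − 2. The implementation details below (averaging factors, test signal) are illustrative, not taken from the paper.

```python
import numpy as np

def mavar(x, m, tau0=1.0):
    """Modified Allan variance of phase-like data x at averaging
    factor m (sample spacing tau0)."""
    # second differences at lag m: d_i = x[i+2m] - 2*x[i+m] + x[i]
    d = x[2 * m:] - 2 * x[m:-m] + x[:-2 * m]
    # inner sums over sliding windows of length m
    s = np.convolve(d, np.ones(m), mode="valid")
    return np.mean(s ** 2) / (2.0 * m ** 4 * tau0 ** 2)

def log_regression_alpha(x, ms):
    """Slope of log MAVAR vs log tau -- the estimator of alpha = 2H - 2."""
    taus = np.asarray(ms, dtype=float)
    mavars = np.array([mavar(x, m) for m in ms])
    slope, _ = np.polyfit(np.log(taus), np.log(mavars), 1)
    return slope

# Ordinary Brownian motion is the H = 1/2 case of fBm,
# so the log-regression slope should come out near alpha = -1.
rng = np.random.default_rng(0)
x = np.cumsum(rng.standard_normal(50_000))
alpha = log_regression_alpha(x, [4, 8, 16, 32])
```

The estimated Hurst parameter then follows as H = (α̂ + 2)/2, which is how the memory parameter of a measured traffic trace would be recovered in practice.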
Improvement of TestH: a C Library for Generating Self-Similar Series and for Estimating the Hurst Parameter
The discovery of consistent dependencies between values in certain data series paved the way
for the development of algorithms that could, somehow, classify the degree of self-similarity
between values and draw conclusions about the behavior of these series. This self-similarity
metric is typically known as the Hurst Parameter, and it allows the behavior of
a data series to be classified as persistent, anti-persistent, or purely random. This discovery was highly relevant
in the field of computer networks, even helping companies to develop equipment and
infrastructure that suit their needs more efficiently. The Hurst Parameter is relevant in many
other fields, and it has, for example, been applied in the study of geologic phenomena [KTC07]
and in areas related to the health sciences [VAJ08, HPS+12].
There are several algorithms for estimating the Hurst Parameter [Hur51, Hig88, RPGC06], and
each of them has its strengths and weaknesses. These algorithms are sometimes
difficult to use, motivating the creation of tools or libraries that provide them in a more user-friendly
manner. Unfortunately, and despite being an area that has been studied for decades, the
available tools have limitations and do not implement all the algorithms available in the literature.
The work presented in this dissertation consists of the improvement of TestH, a library written in
ANSI C for the study of self-similarity in time series, which was initially developed by Fernandes
et al. [FNS+14]. These improvements take the form of additional algorithms for estimating
the Hurst Parameter and for generating self-similar sequences. Additionally, auxiliary functions
were implemented, along with code refactoring, documentation of the application programming
interface, and the creation of a website for the project.
This dissertation focuses mostly on the algorithms that were introduced in TestH, namely
the Periodogram, the Higuchi method, the Hurst Exponent by Autocorrelation Function and the
Detrended Fluctuation Analysis estimators, and the Davies and Harte method for generating self-similar
sequences. In order to turn TestH into a robust and trustworthy library, several tests were
performed comparing the results of these implementations with the values provided by similar
tools. The overall results obtained in these tests are in line with expectations, and the algorithms
that are implemented both in TestH and in the other tools analyzed (for example, the
Periodogram) returned very similar results, corroborating the belief that the methods were well
implemented.
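Of the estimators named above, the periodogram method is among the simplest to sketch: for a long-range dependent series the spectral density behaves like λ^(1−2H) near the origin, so regressing the log-periodogram on log-frequency over the lowest frequencies gives a slope β with H = (1 − β)/2. This is not the TestH code itself; the function name and the low-frequency cutoff are illustrative.

```python
import numpy as np

def hurst_periodogram(series, cutoff=0.1):
    """Estimate H from the slope of log I(lambda) vs log lambda over
    the lowest `cutoff` fraction of the Fourier frequencies."""
    x = np.asarray(series, dtype=float)
    x = x - x.mean()
    n = len(x)
    # periodogram at the Fourier frequencies lambda_k = 2*pi*k/n, k >= 1
    fft = np.fft.rfft(x)
    lam = 2.0 * np.pi * np.arange(1, len(fft)) / n
    per = np.abs(fft[1:]) ** 2 / (2.0 * np.pi * n)
    # keep the low-frequency part, where I(lambda) ~ lambda^(1 - 2H)
    keep = lam <= cutoff * np.pi
    beta, _ = np.polyfit(np.log(lam[keep]), np.log(per[keep]), 1)
    return (1.0 - beta) / 2.0

# White noise is the standard sanity check: a flat spectrum
# (beta ~ 0) should give H ~ 0.5, i.e. a purely random series.
rng = np.random.default_rng(0)
h = hurst_periodogram(rng.standard_normal(16_384))
```

Cross-checking such an implementation against another tool's periodogram output, as the dissertation describes, is exactly the kind of test that catches errors in the frequency grid or in the low-frequency cutoff.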
Skype traffic detection and characterization
Skype is a very popular VoIP application that has recently attracted the attention of the research community and of network operators; furthermore, Skype uses a proprietary signalling design and its source code is unavailable. This makes its analysis particularly important, since the classification of IP flows is becoming increasingly crucial in modern network management platforms and traditional classification systems based on packet headers are rapidly becoming ineffective. In this work, after a general analysis of the Skype protocol and of its traffic in both the time and frequency domains, a new classification method is presented. It is based on the statistical classification of a flow, using only three basic properties of IP packets: their size, their inter-arrival time, and their order of arrival. The whole process is based on a new quantity called the Protocol Fingerprint, whose aim is to express these quantities in an efficient way. An important part in the classification process is played by a Gaussian filter that smooths the protocol fingerprints, avoiding misclassifications caused by any kind of noise generated in the network. Even if this technique is at an early stage of development and requires more work, it is quite promising.
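The Gaussian smoothing step mentioned above can be sketched as follows, treating the fingerprint as a normalized histogram (here, hypothetically, of packet sizes) convolved with a normalized Gaussian kernel so that isolated noise spikes are spread out before flows are compared. The function name and parameters are illustrative, not taken from the paper.

```python
import numpy as np

def gaussian_smooth(fingerprint, sigma=2.0):
    """Smooth a 1-D fingerprint (e.g. a packet-size histogram) with a
    normalized Gaussian kernel of standard deviation sigma (in bins)."""
    radius = int(3 * sigma)
    xs = np.arange(-radius, radius + 1)
    kernel = np.exp(-xs ** 2 / (2.0 * sigma ** 2))
    kernel /= kernel.sum()  # normalize so the total mass is preserved
    return np.convolve(fingerprint, kernel, mode="same")

# A single spike -- the worst case for measurement noise -- becomes a
# smooth bump centred on the same bin, with its total mass preserved.
spike = np.zeros(64)
spike[32] = 1.0
smoothed = gaussian_smooth(spike, sigma=2.0)
```

The design point is that two fingerprints of the same protocol that differ only by a few noisy bins end up close to each other after smoothing, which is what makes the subsequent comparison robust.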