Quotient correlation: A sample based alternative to Pearson's correlation
The quotient correlation is defined here as an alternative to Pearson's
correlation that is more intuitive and flexible in cases where the tail
behavior of data is important. It measures nonlinear dependence where the
regular correlation coefficient is generally not applicable. One of its most
useful features is a test statistic that has high power when testing nonlinear
dependence in cases where Fisher's z-transformation test may fail to reach
the right conclusion. Unlike most asymptotic test statistics, which are
either normal or chi-squared, this test statistic has a limiting gamma
distribution (henceforth, the gamma test statistic). Beyond the common uses
of correlation, the quotient correlation can easily and intuitively be
adjusted to values in the tails. This adjustment generates two new concepts:
the tail quotient correlation and the tail independence test statistics,
which are also gamma statistics. Because there is no analogue of the
correlation coefficient in extreme value theory, and no efficient tail
independence test statistic exists, these two new concepts may open up
a new field of study. In addition, an alternative to Spearman's rank
correlation, a rank-based quotient correlation, is also defined. The advantages
of using these new concepts are illustrated with simulated data and a real data
analysis of internet traffic.

Published in the Annals of Statistics (http://www.imstat.org/aos/) by the
Institute of Mathematical Statistics (http://www.imstat.org),
http://dx.doi.org/10.1214/009053607000000866.
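The failure mode that motivates the quotient correlation can be seen in a few lines. The sketch below is not the quotient correlation itself (see the paper for its definition); it is a minimal, simulated-data illustration of Pearson's coefficient missing a perfectly deterministic nonlinear dependence that a tail-oriented view catches:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, 10_000)
y = x**2  # y is fully (nonlinearly) determined by x

# Pearson's correlation is essentially zero despite full dependence,
# because x and x**2 are uncorrelated on a symmetric interval.
r = np.corrcoef(x, y)[0, 1]

# A crude tail-oriented check: joint exceedance of the 95th percentiles.
# Under independence this would happen about 0.25% of the time; here the
# tails move together, so it happens roughly 5% of the time.
joint_tail = np.mean(
    (np.abs(x) > np.quantile(np.abs(x), 0.95))
    & (y > np.quantile(y, 0.95))
)
```

The point is only that dependence concentrated in (or visible through) the tails is invisible to the product-moment correlation, which is the gap the quotient correlation and the gamma test statistic address.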
Automated software quality visualisation using fuzzy logic techniques
In the past decade there has been a concerted effort by the software industry to improve the quality of its products. This has led to the inception of various techniques with which to control and measure the process involved in software development. Methods like the Capability Maturity Model have introduced processes and strategies that require measurement in the form of software metrics. With the ever increasing number of software metrics being introduced by capability based processes, software development organisations are finding it more difficult to understand and interpret metric scores. This is particularly problematic for senior management and project managers, for whom analysis of the raw data is not feasible. This paper proposes a method with which to visually represent metric scores so that managers can easily see how their organisation is performing relative to quality goals set for each type of metric. Acting primarily as a proof of concept and prototype, we suggest ways in which real customer needs can be translated into a feasible technical solution. The solution itself visualises metric scores in the form of a tree structure and utilises Fuzzy Logic techniques, XGMML, Web Services and the .NET Framework. Future work is proposed to extend the system beyond the prototype stage and to overcome a problem with the masking of poor scores.
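As a rough illustration of the fuzzification step such a system might use, the sketch below maps a normalized metric score onto linguistic quality labels via triangular membership functions. The label names and cutoffs here are hypothetical; the paper does not publish its actual membership functions.

```python
def triangular(x, a, b, c):
    """Triangular membership: 0 at a, rising to 1 at peak b, back to 0 at c."""
    if x <= a or x >= c:
        return 0.0
    if x == b:
        return 1.0
    if x < b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)

# Hypothetical fuzzy sets over a metric score normalized to [0, 1].
LABELS = {
    "poor":       lambda s: triangular(s, -0.01, 0.0, 0.5),
    "acceptable": lambda s: triangular(s, 0.25, 0.5, 0.75),
    "good":       lambda s: triangular(s, 0.5, 1.0, 1.01),
}

def fuzzify(score):
    """Degree of membership of a metric score in each linguistic label."""
    return {label: round(mf(score), 3) for label, mf in LABELS.items()}
```

A score can then belong partially to two labels at once (e.g. somewhat "acceptable" and somewhat "good"), which is what lets a tree visualisation blend colours or sizes instead of snapping to hard pass/fail categories.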
Analysis of dependence among size, rate and duration in internet flows
In this paper we examine rigorously the evidence for dependence among data
size, transfer rate and duration in Internet flows. We emphasize two
statistical approaches for studying dependence: Pearson's correlation
coefficient and the extremal dependence analysis method. We apply these methods
to large data sets of packet traces from three networks. Our major results show
that Pearson's correlation coefficients between size and duration are much
smaller than one might expect. We also find that correlation coefficients
between size and rate are generally small and can be strongly affected by
applying thresholds to size or duration. Based on Transmission Control Protocol
connection startup mechanisms, we argue that thresholds on size should be more
useful than thresholds on duration in the analysis of correlations. Using
extremal dependence analysis, we draw a similar conclusion, finding remarkable
independence for extremal values of size and rate.

Published in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by
the Institute of Mathematical Statistics (http://www.imstat.org),
http://dx.doi.org/10.1214/09-AOAS268.
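The thresholding effect described in the abstract can be reproduced with a toy model. The flow model below is an assumption for illustration only (independent lognormal rate and duration, with size = rate × duration by definition), not the paper's packet-trace data; it shows how conditioning on large sizes mechanically weakens the size-rate correlation.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50_000

# Toy flow model: log-rate and log-duration independent standard normals,
# and log-size is their sum (size = rate * duration).
log_rate = rng.normal(0.0, 1.0, n)
log_duration = rng.normal(0.0, 1.0, n)
log_size = log_rate + log_duration

# Over all flows, size and rate are substantially correlated (~0.71 here).
corr_all = np.corrcoef(log_size, log_rate)[0, 1]

# Threshold on size: keep only the largest 5% of flows. Restricting the
# range of log-size shrinks its variance, so the correlation drops sharply.
tail = log_size > np.quantile(log_size, 0.95)
corr_tail = np.corrcoef(log_size[tail], log_rate[tail])[0, 1]
```

This selection effect is one reason the abstract's warning matters: a correlation computed after thresholding on size (or duration) reflects the threshold as much as the underlying dependence.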
Generalized Multivariate Extreme Value Models for Explicit Route Choice Sets
This paper analyses a class of route choice models with closed-form
probability expressions, namely, Generalized Multivariate Extreme Value (GMEV)
models. A large group of these models emerge from different utility formulas
that combine systematic utility and random error terms. Twelve models are
captured in a single discrete choice framework. The additive utility formula
leads to the well-known logit family: multinomial, path-size, paired
combinatorial and link-nested. For the multiplicative formulation only the
multinomial and path-size weibit models have been identified; this study also
identifies the paired combinatorial and link-nested variations, and generalizes
the path-size variant. Furthermore, a new traveller's decision rule based on
the multiplicative utility formula with a reference route is presented. Here
the traveller chooses exclusively based on the differences between routes. This
leads to four new GMEV models. We assess the models qualitatively based on a
generic structure of route utility with random foreseen travel times, for which
we empirically find that the variance of utility should differ from what has
thus far been assumed for multinomial probit and logit-kernel models. The
expected traveller behaviour and model behaviour under simple network changes are
analysed. Furthermore, all models are estimated and validated on an
illustrative network example with long distance and short distance
origin-destination pairs. The new multiplicative models based on differences
outperform the additive models in both tests.
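For the two multinomial base cases, the contrast between the additive (logit) and multiplicative (weibit) formulations can be sketched directly. The parameter values below are hypothetical, and this covers only two of the twelve models in the paper's GMEV framework:

```python
import math

def mnl_logit(costs, theta=1.0):
    """Additive (multinomial logit): P_i proportional to exp(-theta * c_i)."""
    w = [math.exp(-theta * c) for c in costs]
    z = sum(w)
    return [x / z for x in w]

def mnl_weibit(costs, beta=2.0):
    """Multiplicative (multinomial weibit): P_i proportional to c_i ** (-beta).
    Scale-free: multiplying all costs by a constant leaves P unchanged."""
    w = [c ** (-beta) for c in costs]
    z = sum(w)
    return [x / z for x in w]
```

The behavioural difference is visible immediately: logit probabilities depend only on absolute cost differences, so routes of 5 and 10 minutes are split exactly like routes of 120 and 125 minutes, while weibit probabilities depend only on cost ratios, so doubling all costs changes nothing.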
Characterization of Vehicle Behavior with Information Theory
This work proposes the use of Information Theory for the characterization of
vehicle behavior through vehicle velocities. Three public data sets were used:
(i) the Mobile Century data set, collected on Highway I-880 near Union City,
California; (ii) the Borlänge GPS data set, collected in the Swedish city of
Borlänge; and (iii) the Beijing taxicab data set, collected in Beijing, China.
In each, every vehicle's speed is stored as a time series. The Bandt-Pompe
methodology, combined with the complexity-entropy plane, was used to identify
different regimes and behaviors. The global velocity is compatible with
correlated noise with an f^{-k} power spectrum, k >= 0. With this we identify
traffic behaviors such as random velocities (k ≈ 0) when there is congestion,
and more correlated velocities (k ≈ 3) in the presence of free traffic flow.