129,237 research outputs found

    Quotient correlation: A sample based alternative to Pearson's correlation

    The quotient correlation is defined here as an alternative to Pearson's correlation that is more intuitive and flexible in cases where the tail behavior of data is important. It measures nonlinear dependence where the regular correlation coefficient is generally not applicable. One of its most useful features is a test statistic that has high power when testing nonlinear dependence in cases where Fisher's Z-transformation test may fail to reach the right conclusion. Unlike most asymptotic test statistics, which are either normal or chi-squared, this test statistic has a limiting gamma distribution (henceforth, the gamma test statistic). Beyond the common uses of correlation, the quotient correlation can easily and intuitively be adjusted to values at the tails. This adjustment generates two new concepts: the tail quotient correlation and the tail independence test statistics, which are also gamma statistics. Because there is no analogue of the correlation coefficient in extreme value theory, and no efficient tail independence test statistic exists, these two new concepts may open up a new field of study. In addition, an alternative to Spearman's rank correlation, a rank-based quotient correlation, is also defined. The advantages of using these new concepts are illustrated with simulated data and a real data analysis of internet traffic.
    Comment: Published at http://dx.doi.org/10.1214/009053607000000866 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org).
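    The sample quotient correlation described above can be sketched in a few lines. This is a minimal illustration, assuming the definition q = (max Y/X + max X/Y - 2) / (max Y/X · max X/Y - 1) on margins rank-transformed to unit Fréchet; it is not the paper's full methodology, and the seed and sample sizes are arbitrary.

```python
import numpy as np

def quotient_correlation(x, y):
    """Sample quotient correlation of two samples (hedged sketch).

    Margins are rank-transformed to (0, 1) and then to unit Frechet,
    -1/log(U), so that the ratio maxima are well behaved.
    """
    n = len(x)
    u = (np.argsort(np.argsort(x)) + 1) / (n + 1)  # ranks mapped into (0, 1)
    v = (np.argsort(np.argsort(y)) + 1) / (n + 1)
    fx = -1.0 / np.log(u)                          # unit-Frechet transform
    fy = -1.0 / np.log(v)
    a = np.max(fy / fx)                            # max_i Y_i / X_i
    b = np.max(fx / fy)                            # max_i X_i / Y_i
    return (a + b - 2.0) / (a * b - 1.0)

rng = np.random.default_rng(0)
x = rng.uniform(size=2000)
y = rng.uniform(size=2000)                         # independent of x
q_indep = quotient_correlation(x, y)               # small under independence
q_dep = quotient_correlation(x, x + 0.01 * rng.uniform(size=2000))  # larger
```

    Under independence both ratio maxima are large, driving q toward 0; under strong dependence the ratios stay near 1 and q approaches 1.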

    Automated software quality visualisation using fuzzy logic techniques

    In the past decade there has been a concerted effort by the software industry to improve the quality of its products. This has led to the inception of various techniques with which to control and measure the process involved in software development. Methods like the Capability Maturity Model have introduced processes and strategies that require measurement in the form of software metrics. With the ever-increasing number of software metrics being introduced by capability-based processes, software development organisations are finding it more difficult to understand and interpret metric scores. This is particularly problematic for senior management and project managers, for whom analysis of the raw data is not feasible. This paper proposes a method with which to visually represent metric scores so that managers can easily see how their organisation is performing relative to quality goals set for each type of metric. Acting primarily as a proof of concept and prototype, we suggest ways in which real customer needs can be translated into a feasible technical solution. The solution itself visualises metric scores in the form of a tree structure and utilises Fuzzy Logic techniques, XGMML, Web Services and the .NET Framework. Future work is proposed to extend the system from the prototype stage and to overcome a problem with the masking of poor scores.
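    The core fuzzy-logic step, mapping a raw metric score to linguistic quality labels, can be sketched as follows. The labels, breakpoints and triangular membership shapes here are illustrative assumptions, not the paper's actual rule base (which is not reproduced in the abstract).

```python
def triangular(x, a, b, c):
    """Triangular membership function: 0 outside [a, c], peaking at b."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)

def fuzzify_metric(score):
    """Map a normalised metric score in [0, 1] to fuzzy quality labels.

    Hypothetical labels/breakpoints for illustration only.
    """
    return {
        "poor":       triangular(score, -0.01, 0.0, 0.5),
        "acceptable": triangular(score, 0.25, 0.5, 0.75),
        "good":       triangular(score, 0.5, 1.0, 1.01),
    }

memberships = fuzzify_metric(0.6)
label = max(memberships, key=memberships.get)  # dominant quality label
```

    A visualisation layer such as the paper's tree view could then colour each node by its dominant label while retaining the full membership vector, which is one way the "masking of poor scores" problem can arise when only the dominant label is shown.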

    Analysis of dependence among size, rate and duration in internet flows

    In this paper we examine rigorously the evidence for dependence among data size, transfer rate and duration in Internet flows. We emphasize two statistical approaches for studying dependence: Pearson's correlation coefficient and the extremal dependence analysis method. We apply these methods to large data sets of packet traces from three networks. Our major results show that Pearson's correlation coefficients between size and duration are much smaller than one might expect. We also find that correlation coefficients between size and rate are generally small and can be strongly affected by applying thresholds to size or duration. Based on Transmission Control Protocol connection startup mechanisms, we argue that thresholds on size should be more useful than thresholds on duration in the analysis of correlations. Using extremal dependence analysis, we draw a similar conclusion, finding remarkable independence for extremal values of size and rate.
    Comment: Published at http://dx.doi.org/10.1214/09-AOAS268 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org).
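    The thresholding effect discussed above can be demonstrated on synthetic flows. This sketch is an illustrative assumption, not the paper's packet traces: log-size and log-rate are drawn independently, duration follows as size/rate, and conditioning on long durations induces a spurious size-rate correlation among the surviving flows.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50_000
# Synthetic flows: log-size and log-rate independent by construction.
log_size = rng.normal(10.0, 2.0, n)   # log bytes
log_rate = rng.normal(12.0, 1.0, n)   # log bytes/sec
log_dur = log_size - log_rate         # duration = size / rate, in log space

def pearson(a, b):
    a = a - a.mean()
    b = b - b.mean()
    return float(a @ b / np.sqrt((a @ a) * (b @ b)))

r_size_rate = pearson(log_size, log_rate)  # near 0: independent by design
r_size_dur = pearson(log_size, log_dur)    # strong: duration inherits size

# Threshold on duration: keep only the longest 10% of flows. Long flows
# tend to be either large or slow, so size and rate become correlated
# among the survivors even though they were independent overall.
keep = log_dur > np.quantile(log_dur, 0.9)
r_thresh = pearson(log_size[keep], log_rate[keep])
```

    This is one concrete reason a threshold on duration can distort size-rate correlations, consistent with the paper's argument for preferring thresholds on size.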

    Generalized Multivariate Extreme Value Models for Explicit Route Choice Sets

    This paper analyses a class of route choice models with closed-form probability expressions, namely Generalized Multivariate Extreme Value (GMEV) models. A large group of these models emerge from different utility formulas that combine systematic utility and random error terms. Twelve models are captured in a single discrete choice framework. The additive utility formula leads to the known logit family: multinomial, path-size, paired combinatorial and link-nested. For the multiplicative formulation only the multinomial and path-size weibit models have been identified; this study also identifies the paired combinatorial and link-nested variations, and generalizes the path-size variant. Furthermore, a new traveller's decision rule based on the multiplicative utility formula with a reference route is presented, in which the traveller chooses exclusively based on the differences between routes. This leads to four new GMEV models. We assess the models qualitatively based on a generic structure of route utility with random foreseen travel times, for which we empirically identify that the variance of utility should differ from what has thus far been assumed for multinomial probit and logit-kernel models. The expected travellers' behaviour and model behaviour under simple network changes are analysed. Furthermore, all models are estimated and validated on an illustrative network example with long-distance and short-distance origin-destination pairs. The new multiplicative models based on differences outperform the additive models in both tests.
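    The additive-versus-multiplicative distinction can be made concrete with the two simplest family members. This is a minimal sketch, assuming the standard multinomial logit form P_i ∝ exp(-θ·t_i) and the multinomial weibit form P_i ∝ t_i^(-β); the parameter values are arbitrary, and the path-size, paired combinatorial and link-nested variants are not shown.

```python
import math

def logit_probs(times, theta=1.0):
    """Additive (multinomial logit): sensitive to absolute time differences."""
    w = [math.exp(-theta * t) for t in times]
    s = sum(w)
    return [x / s for x in w]

def weibit_probs(times, beta=4.0):
    """Multiplicative (multinomial weibit): sensitive to time ratios."""
    w = [t ** -beta for t in times]
    s = sum(w)
    return [x / s for x in w]

# Same 5-minute gap on a short and a long origin-destination pair:
short = logit_probs([10.0, 15.0]), weibit_probs([10.0, 15.0])
long_ = logit_probs([110.0, 115.0]), weibit_probs([110.0, 115.0])
```

    The logit split is identical for both pairs (it depends only on the difference), while the weibit split is near 50/50 on the long pair, which is one intuition behind testing the models on both long-distance and short-distance origin-destination pairs.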

    Characterization of Vehicle Behavior with Information Theory

    This work proposes the use of Information Theory for the characterization of vehicle behavior through vehicle velocities. Three public data sets were used: (i) the Mobile Century data set, collected on Highway I-880 near Union City, California; (ii) the Borlänge GPS data set, collected in the Swedish city of Borlänge; and (iii) the Beijing taxicab data set, collected in Beijing, China. In each, a vehicle's speed is stored as a time series. The Bandt-Pompe methodology combined with the Complexity-Entropy plane was used to identify different regimes and behaviors. The global velocity is compatible with correlated noise having an f^{-k} power spectrum with k >= 0. With this we identify traffic behaviors such as random velocities (k ≈ 0) when there is congestion, and more correlated velocities (k ≈ 3) in the presence of free traffic flow.
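    The Bandt-Pompe step underlying the Complexity-Entropy plane can be sketched as a normalised permutation entropy. This is a minimal illustration of the ordinal-pattern idea on toy series, not the paper's full analysis (which also computes a statistical complexity measure for the second plane axis).

```python
import math
import random
from collections import Counter

def permutation_entropy(series, d=3):
    """Normalised Bandt-Pompe permutation entropy, embedding dimension d.

    Counts the ordinal pattern (rank order) of each length-d window and
    returns Shannon entropy of the pattern distribution, normalised by
    log(d!) so the result lies in [0, 1]: ~1 for white noise, 0 for a
    perfectly ordered (monotone) series.
    """
    patterns = Counter(
        tuple(sorted(range(d), key=lambda k: series[i + k]))
        for i in range(len(series) - d + 1)
    )
    total = sum(patterns.values())
    h = -sum((c / total) * math.log(c / total) for c in patterns.values())
    return h / math.log(math.factorial(d))

random.seed(0)
noise = [random.random() for _ in range(10_000)]  # congestion-like randomness
ramp = [0.001 * i for i in range(10_000)]         # perfectly ordered trend
h_noise = permutation_entropy(noise)              # near 1
h_ramp = permutation_entropy(ramp)                # exactly 0
```

    A velocity series behaving like k ≈ 0 noise would land near the high-entropy end of the plane, while strongly correlated (k ≈ 3) free-flow velocities would sit at lower entropy.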