Phase transition and hysteresis in scale-free network traffic
We model information traffic on scale-free networks by introducing the node
queue length L proportional to the node degree and its delivering ability C
proportional to L. The simulation gives the overall capacity of the traffic
system, which is quantified by a phase transition from free flow to congestion.
It is found that the maximal capacity of the system is achieved when the
local routing coefficient \phi is slightly larger than zero, and we provide an
analysis for the optimal value of \phi. In addition, we report for the first
time the fundamental diagram of flow against density, in which hysteresis is
found, and thus we can classify the traffic flow into four states: free flow,
saturated flow, bistable, and jammed.
Comment: 5 pages, 4 figures
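The routing rule in this class of models can be sketched in a few lines. The abstract does not spell out the exact forwarding rule, so the sketch below assumes the common convention that a node forwards a packet to neighbor i with probability proportional to k_i^\phi (k_i being the neighbor's degree); `routing_probs`, `node_capacity`, and the `alpha`/`beta` proportionality constants are illustrative names, not the authors' notation:

```python
def routing_probs(neighbor_degrees, phi):
    """Probability of forwarding a packet to each neighbor.

    Assumes the common local-routing rule P(i) proportional to k_i**phi:
    phi = 0 is a purely random walk, phi > 0 prefers hubs, phi < 0
    avoids them.
    """
    weights = [k ** phi for k in neighbor_degrees]
    total = sum(weights)
    return [w / total for w in weights]


def node_capacity(degree, alpha=1.0, beta=1.0):
    """Queue length L = alpha * k and delivering ability C = beta * L,
    mirroring the degree-proportional assumptions in the abstract."""
    L = alpha * degree
    C = beta * L
    return L, C
```

For example, `routing_probs([1, 2, 4], phi=0)` gives a uniform choice over the three neighbors, while `phi = 1` weights them 1/7, 2/7, 4/7; the abstract's finding is that overall capacity peaks for \phi slightly above zero.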
Equitability, mutual information, and the maximal information coefficient
Reshef et al. recently proposed a new statistical measure, the "maximal
information coefficient" (MIC), for quantifying arbitrary dependencies between
pairs of stochastic quantities. MIC is based on mutual information, a
fundamental quantity in information theory that is widely understood to serve
this need. MIC, however, is not an estimate of mutual information. Indeed, it
was claimed that MIC possesses a desirable mathematical property called
"equitability" that mutual information lacks. This was not proven; instead it
was argued solely through the analysis of simulated data. Here we show that
this claim, in fact, is incorrect. First we offer mathematical proof that no
(non-trivial) dependence measure satisfies the definition of equitability
proposed by Reshef et al. We then propose a self-consistent and more general
definition of equitability that follows naturally from the Data Processing
Inequality. Mutual information satisfies this new definition of equitability
while MIC does not. Finally, we show that the simulation evidence offered by
Reshef et al. was artifactual. We conclude that estimating mutual information
is not only practical for many real-world applications, but also provides a
natural solution to the problem of quantifying associations in large data sets.
A Framework to Adjust Dependency Measure Estimates for Chance
Estimating the strength of dependency between two variables is fundamental
for exploratory analysis and many other applications in data mining. For
example: non-linear dependencies between two continuous variables can be
explored with the Maximal Information Coefficient (MIC); and categorical
variables that are dependent on the target class are selected using Gini gain
in random forests. Nonetheless, because dependency measures are estimated on
finite samples, the interpretability of their quantification and the accuracy
when ranking dependencies become challenging. Dependency estimates are not
equal to 0 when variables are independent, cannot be compared when computed on
different sample sizes, and are inflated by chance for variables with more
categories. In this paper, we propose a framework to adjust dependency measure
estimates on finite samples. Our adjustments, which are simple and applicable
to any dependency measure, are helpful in improving interpretability when
quantifying dependency and in improving accuracy on the task of ranking
dependencies. In particular, we demonstrate that our approach enhances the
interpretability of MIC when used as a proxy for the amount of noise between
variables, and improves accuracy when ranking variables during the splitting
procedure in random forests.
Comment: In Proceedings of the 2016 SIAM International Conference on Data Mining
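The chance-adjustment idea can be sketched in the spirit of the adjusted Rand index: estimate the measure's expected value E0 under the permutation null and rescale as adjusted = (raw − E0) / (max − E0), so independent variables score near 0 while perfect dependency still scores 1. The measure `r2` (squared Pearson correlation, bounded by 1), the permutation count, and the function names below are illustrative choices for this sketch, not the paper's exact framework:

```python
import random
from statistics import mean


def r2(x, y):
    """Squared Pearson correlation: a simple dependency measure bounded by 1."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov * cov / (vx * vy)


def adjust_for_chance(measure, x, y, n_perm=200, seed=0):
    """Adjust a dependency estimate for chance via the permutation null:
        adjusted = (raw - E0) / (max - E0), with max = 1 for bounded measures,
    where E0 is the mean of the measure over random permutations of y."""
    rng = random.Random(seed)
    raw = measure(x, y)
    yp = list(y)
    null = []
    for _ in range(n_perm):
        rng.shuffle(yp)          # break any real dependence, keep marginals
        null.append(measure(x, yp))
    e0 = sum(null) / n_perm
    return (raw - e0) / (1.0 - e0) if e0 < 1.0 else 0.0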