Defining and identifying communities in networks
The investigation of community structures in networks is an important issue
in many domains and disciplines. This problem is relevant for social tasks
(objective analysis of relationships on the web), biological inquiries
(functional studies in metabolic, cellular or protein networks) or
technological problems (optimization of large infrastructures). Several types
of algorithm exist for revealing the community structure in networks, but a
general and quantitative definition of community is still lacking, leading to
an intrinsic difficulty in the interpretation of the results of the algorithms
without any additional non-topological information. In this paper we address this
problem by introducing two quantitative definitions of community and by showing
how they are implemented in practice in the existing algorithms. In this way
the algorithms for the identification of the community structure become fully
self-contained. Furthermore, we propose a new local algorithm to detect
communities which outperforms the existing algorithms with respect to the
computational cost, keeping the same level of reliability. The new algorithm is
tested on artificial and real-world graphs. In particular we show the
application of the new algorithm to a network of scientific collaborations,
which, because of its size, cannot be attacked with the usual methods. This new
class of local algorithms could open the way to large-scale technological and
biological applications. Comment: Revtex, final form, 14 pages, 6 figures
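The abstract above does not spell out its two quantitative definitions. As a minimal sketch, assume degree-based criteria: a "strong" community is one where every node has more neighbours inside the group than outside, and a "weak" community is one where this holds only for the totals. The function names and the toy graph are hypothetical illustrations, not the paper's code.

```python
def in_out_degrees(adj, group):
    """Return {node: (k_in, k_out)} for each node of `group`.

    adj   -- dict mapping node -> list of neighbours (undirected graph)
    group -- iterable of nodes forming the candidate community
    """
    members = set(group)
    result = {}
    for v in members:
        k_in = sum(1 for u in adj[v] if u in members)
        k_out = sum(1 for u in adj[v] if u not in members)
        result[v] = (k_in, k_out)
    return result


def is_strong_community(adj, group):
    # Assumed "strong" criterion: every node has k_in > k_out.
    return all(k_in > k_out for k_in, k_out in in_out_degrees(adj, group).values())


def is_weak_community(adj, group):
    # Assumed "weak" criterion: summed k_in exceeds summed k_out.
    degrees = in_out_degrees(adj, group).values()
    return sum(k_in for k_in, _ in degrees) > sum(k_out for _, k_out in degrees)


# Toy graph: two triangles {0,1,2} and {3,4,5} joined by the edge 2-3.
toy_graph = {
    0: [1, 2], 1: [0, 2], 2: [0, 1, 3],
    3: [2, 4, 5], 4: [3, 5], 5: [3, 4],
}
```

Under these assumed criteria, the triangle {0, 1, 2} passes both tests, while {0, 1, 2, 3} passes only the weak one because node 3 has more neighbours outside the group than inside.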
An Oracle Approach for Interaction Neighborhood Estimation in Random Fields
We consider the problem of interaction neighborhood estimation from the
partial observation of a finite number of realizations of a random field. We
introduce a model selection rule to choose estimators of conditional
probabilities among natural candidates. Our main result is an oracle inequality
satisfied by the resulting estimator. We then use this selection rule in a
two-step procedure to evaluate the interacting neighborhoods. The selection
rule selects a small prior set of possible interacting points and a cutting
step removes the irrelevant points from this prior set. We also prove that the
Ising models satisfy the assumptions of the main theorems, without restrictions
on the temperature, on the structure of the interacting graph or on the range
of the interactions. This therefore provides a large class of applications for
our results. We give a computationally efficient procedure in these models. We
finally show the practical efficiency of our approach in a simulation study.
Comment: 36 pages, 10 figures
A structural Markov property for decomposable graph laws that allows control of clique intersections
We present a new kind of structural Markov property for probabilistic laws on
decomposable graphs, which allows the explicit control of interactions between
cliques, so is capable of encoding some interesting structure. We prove the
equivalence of this property to an exponential family assumption, and discuss
identifiability, modelling, inferential and computational implications.
Comment: 10 pages, 3 figures; updated from V1 following journal review, new
more explicit title and added section on inference
Negative association in uniform forests and connected graphs
We consider three probability measures on subsets of edges of a given finite
graph, namely those which govern, respectively, a uniform forest, a uniform
spanning tree, and a uniform connected subgraph. A conjecture concerning the
negative association of two edges is reviewed for a uniform forest, and a
related conjecture is posed for a uniform connected subgraph. The former
conjecture is verified numerically for all graphs having eight or fewer
vertices, or having nine vertices and no more than eighteen edges, using a
certain computer algorithm which is summarised in this paper. Negative
association is known already to be valid for a uniform spanning tree. The three
cases of uniform forest, uniform spanning tree, and uniform connected subgraph
are special cases of a more general conjecture arising from the random-cluster
model of statistical mechanics. Comment: With minor corrections
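The two-edge property described above can be checked by brute force on a very small graph. The sketch below reads "negative association of two edges e and f" as P(e and f both present) ≤ P(e present) · P(f present) under the uniform measure. It enumerates every edge subset, so it is only feasible for tiny graphs and is not the computer algorithm summarised in the paper. All function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations


def component_info(n, edges):
    """Return (number of components, is_acyclic) for the graph on
    vertices 0..n-1 with the given edges, using union-find."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    acyclic = True
    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru == rv:
            acyclic = False  # this edge closes a cycle
        else:
            parent[ru] = rv
    return len({find(x) for x in range(n)}), acyclic


def edge_family(n, edges, kind):
    """All edge subsets that are forests, spanning trees, or connected
    spanning subgraphs, depending on `kind`."""
    family = []
    for size in range(len(edges) + 1):
        for subset in combinations(edges, size):
            count, acyclic = component_info(n, subset)
            keep = {
                "forest": acyclic,
                "tree": acyclic and count == 1,
                "connected": count == 1,
            }[kind]
            if keep:
                family.append(set(subset))
    return family


def max_pair_covariance(n, edges, kind):
    """Largest value of P(e, f) - P(e) P(f) over all pairs of distinct
    edges, computed exactly under the uniform measure on the family."""
    family = edge_family(n, edges, kind)
    total = len(family)
    worst = None
    for e, f in combinations(edges, 2):
        p_e = Fraction(sum(e in s for s in family), total)
        p_f = Fraction(sum(f in s for s in family), total)
        p_ef = Fraction(sum(e in s and f in s for s in family), total)
        cov = p_ef - p_e * p_f
        worst = cov if worst is None else max(worst, cov)
    return worst


# Complete graph on four vertices as a small test case.
k4_edges = list(combinations(range(4), 2))
```

A non-positive value of `max_pair_covariance(4, k4_edges, "tree")` or `max_pair_covariance(4, k4_edges, "forest")` means no pair of edges is positively correlated on this graph. The same call with `"connected"` explores the third measure.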
Optimal model-free prediction from multivariate time series
Forecasting a time series from multivariate predictors constitutes a
challenging problem, especially using model-free approaches. Most techniques,
such as nearest-neighbor prediction, quickly suffer from the curse of
dimensionality and overfitting for more than a few predictors, which has limited
their application mostly to the univariate case. Therefore, selection
strategies are needed that harness the available information as efficiently as
possible. Since often the right combination of predictors matters, ideally all
subsets of possible predictors should be tested for their predictive power, but
the exponentially growing number of combinations makes such an approach
computationally prohibitive. Here a prediction scheme that overcomes this
strong limitation is introduced utilizing a causal pre-selection step which
drastically reduces the number of possible predictors to the most predictive
set of causal drivers making a globally optimal search scheme tractable. The
information-theoretic optimality is derived and practical selection criteria
are discussed. As demonstrated for multivariate nonlinear stochastic delay
processes, the optimal scheme can even be less computationally expensive than
commonly used sub-optimal schemes like forward selection. The method suggests a
general framework to apply the optimal model-free approach to select variables
and subsequently fit a model to further improve a prediction or learn
statistical dependencies. The performance of this framework is illustrated on a
climatological index of El Niño Southern Oscillation. Comment: 14 pages, 9 figures
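The remark that the right combination of predictors matters can be illustrated with a toy exhaustive search. In the sketch below the target is the product of two predictors, so neither is informative alone, and each subset is scored by leave-one-out 1-nearest-neighbour error. The data, the function names and the 1-nearest-neighbour predictor are assumptions made for illustration. The causal pre-selection step is not implemented: `candidates` merely stands in for whatever reduced set such a step would return.

```python
from itertools import combinations, product


def loo_nn_error(rows, targets, subset):
    """Leave-one-out mean squared error of 1-nearest-neighbour
    prediction using only the predictor columns listed in `subset`."""
    total = 0.0
    for i, row in enumerate(rows):
        best_j, best_d = None, None
        for j, other in enumerate(rows):
            if j == i:
                continue
            d = sum((row[c] - other[c]) ** 2 for c in subset)
            if best_d is None or d < best_d:  # ties keep the first index
                best_j, best_d = j, d
        total += (targets[i] - targets[best_j]) ** 2
    return total / len(rows)


def exhaustive_search(rows, targets, candidates):
    """Score every non-empty subset of `candidates` (2^N - 1 of them)
    and return the first subset with the lowest error, smallest first."""
    best_subset, best_err = None, None
    for size in range(1, len(candidates) + 1):
        for subset in combinations(candidates, size):
            err = loo_nn_error(rows, targets, subset)
            if best_err is None or err < best_err:
                best_subset, best_err = subset, err
    return best_subset, best_err


# Toy data: three binary predictors, each configuration observed twice.
# The target depends on columns 0 and 1 jointly; column 2 is irrelevant.
rows = list(product([-1, 1], repeat=3)) * 2
targets = [r[0] * r[1] for r in rows]

best_subset, best_err = exhaustive_search(rows, targets, (0, 1, 2))
```

On this toy data every single-predictor subset has non-zero error, so a greedy forward step has no informative first choice, while the exhaustive search recovers the pair of columns 0 and 1. The cost is one evaluation per subset, which is why reducing the candidate set first matters.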