LiveRank: How to Refresh Old Crawls
This paper considers the problem of refreshing a crawl. More precisely, given a collection of Web pages (with hyperlinks) gathered at some time, we want to identify a significant fraction of these pages that still exist at present time. The liveness of an old page can be tested through an online query at present time. We call LiveRank a ranking of the old pages so that active nodes are more likely to appear first. The quality of a LiveRank is measured by the number of queries necessary to identify a given fraction of the alive pages when using the LiveRank order. We study different scenarios from a static setting where the LiveRank is computed before any query is made, to dynamic settings where the LiveRank can be updated as queries are processed. Our results show that building on the PageRank can lead to efficient LiveRanks for Web graphs
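The quality measure described in this abstract (queries needed to reach a given fraction of alive pages) can be illustrated with a small sketch. The function name `queries_needed`, the toy ranking, and the rounding of the target with `ceil` are assumptions made for illustration, not details taken from the paper.

```python
import math

def queries_needed(ranking, alive, fraction):
    """Count liveness queries, issued in `ranking` order, needed to
    identify `fraction` of the alive pages (hypothetical helper)."""
    total_alive = sum(1 for page in ranking if page in alive)
    # Assumption: the target count is rounded up to a whole page.
    target = math.ceil(fraction * total_alive)
    if target == 0:
        return 0
    found = 0
    for queries, page in enumerate(ranking, start=1):
        if page in alive:  # one online query per tested page
            found += 1
            if found >= target:
                return queries
    return len(ranking)

# Toy data: six old pages, three of which are still alive.
ranking = ["a", "b", "c", "d", "e", "f"]
alive = {"a", "c", "f"}
cost_half = queries_needed(ranking, alive, 0.5)
cost_all = queries_needed(ranking, alive, 1.0)
```

Under this measure, an ordering that places alive pages first has a lower cost, which is the sense in which one LiveRank is better than another.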
Spectral centrality measures in complex networks
Complex networks are characterized by heterogeneous distributions of the
degree of nodes, which produce a large diversification of the roles of the
nodes within the network. Several centrality measures have been introduced to
rank nodes based on their topological importance within a graph. Here we review
and compare centrality measures based on spectral properties of graph matrices.
We shall focus on PageRank, eigenvector centrality and the hub/authority scores
of HITS. We derive simple relations between the measures and the (in)degree of
the nodes, in some limits. We also compare the rankings obtained with different
centrality measures. Comment: 11 pages, 10 figures, 5 tables. Final version published in Physical
Review
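As a rough illustration of one of the spectral measures named in this abstract, the following sketch computes PageRank by power iteration and compares it with in-degree on a toy directed graph. The damping factor of 0.85, the uniform handling of dangling nodes, and the example graph are assumptions chosen for illustration, not values taken from the paper.

```python
def pagerank(out_links, damping=0.85, iterations=100):
    """Power-iteration PageRank on a dict {node: [successors]}.
    Assumes every successor also appears as a key."""
    nodes = sorted(out_links)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Mass held by nodes without out-links is spread uniformly.
        dangling = sum(rank[v] for v in nodes if not out_links[v])
        new_rank = {v: (1.0 - damping) / n + damping * dangling / n
                    for v in nodes}
        for v in nodes:
            for w in out_links[v]:
                new_rank[w] += damping * rank[v] / len(out_links[v])
        rank = new_rank
    return rank

def in_degree(out_links):
    """Number of incoming links per node."""
    degree = {v: 0 for v in out_links}
    for targets in out_links.values():
        for w in targets:
            degree[w] += 1
    return degree

graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(graph)
degrees = in_degree(graph)
top_by_rank = max(ranks, key=ranks.get)
top_by_degree = max(degrees, key=degrees.get)
```

On this small graph the two rankings agree on the top node, although they need not agree in general.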
The Number of Convex Permutominoes
Permutominoes are polyominoes defined by suitable pairs of permutations. In this paper we provide a formula to count the number of convex permutominoes of given perimeter. To this aim we define the transform of a generic pair of permutations, we characterize the transform of any pair defining a convex permutomino, and we solve the counting problem in the transformed space
Ranking and clustering of nodes in networks with smart teleportation
Random teleportation is a necessary evil for ranking and clustering directed
networks based on random walks. Teleportation enables ergodic solutions, but
the solutions must necessarily depend on the exact implementation and
parametrization of the teleportation. For example, in the commonly used
PageRank algorithm, the teleportation rate must trade off a heavily biased
solution with a uniform solution. Here we show that teleportation to links
rather than nodes enables a much smoother trade-off and effectively more robust
results. We also show that, by not recording the teleportation steps of the
random walker, we can further reduce the effect of teleportation with dramatic
effects on clustering. Comment: 10 pages, 7 figures
Efficiently Clustering Very Large Attributed Graphs
Attributed graphs model real networks by enriching their nodes with
attributes accounting for properties. Several techniques have been proposed for
partitioning these graphs into clusters that are homogeneous with respect to
both semantic attributes and to the structure of the graph. However, time and
space complexities of state of the art algorithms limit their scalability to
medium-sized graphs. We propose SToC (for Semantic-Topological Clustering), a
fast and scalable algorithm for partitioning large attributed graphs. The
approach is robust, being compatible both with categorical and with
quantitative attributes, and it is tailorable, allowing the user to weight the
semantic and topological components. Further, the approach does not require the
user to guess in advance the number of clusters. SToC relies on well known
approximation techniques such as bottom-k sketches, traditional graph-theoretic
concepts, and a new perspective on the composition of heterogeneous distance
measures. Experimental results demonstrate its ability to efficiently compute
high-quality partitions of large scale attributed graphs. Comment: This work has been published in ASONAM 2017. This version includes an
appendix with validation of our attribute model and distance function,
omitted in the conference version for lack of space. Please refer to the
published version
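The abstract above mentions bottom-k sketches as one of the approximation techniques SToC relies on. The sketch below shows the generic technique only: estimating the Jaccard similarity of two sets from their k smallest hash values. How SToC actually applies such sketches is not stated in the abstract; the helper names and the SHA-1 based hash are assumptions for illustration.

```python
import hashlib

def _hash(item):
    """Deterministic 64-bit hash of an item (illustrative choice)."""
    digest = hashlib.sha1(str(item).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")

def bottom_k_sketch(items, k):
    """Keep the k smallest hash values of the set."""
    return sorted({_hash(x) for x in items})[:k]

def estimate_jaccard(sketch_a, sketch_b, k):
    """Estimate |A ∩ B| / |A ∪ B| from two bottom-k sketches."""
    # The k smallest values of the merged sketches form the
    # bottom-k sketch of the union A ∪ B.
    union_sketch = sorted(set(sketch_a) | set(sketch_b))[:k]
    if not union_sketch:
        return 0.0
    common = set(sketch_a) & set(sketch_b)
    shared = sum(1 for h in union_sketch if h in common)
    return shared / len(union_sketch)

k = 64
neighbors_u = set(range(1, 11))   # {1, ..., 10}
neighbors_v = set(range(6, 16))   # {6, ..., 15}
estimate = estimate_jaccard(bottom_k_sketch(neighbors_u, k),
                            bottom_k_sketch(neighbors_v, k), k)
```

When both sets have fewer than k elements, as here, the sketches hold every hash value and the estimate equals the exact Jaccard similarity; for larger sets it is an approximation that uses only k values per set.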
A 5G mobile network architecture to support vertical industries
The telecom industry is moving from a "horizontal" service delivery model, where services are defined independent of their consumers, toward a "vertical" delivery model, where the provided services are tailored to specific industry sectors and verticals. In order to enable this transition, an end-to-end comprehensive 5G architecture is needed, with capabilities to support the use cases of the different vertical industries. A key feature of this architecture is the implementation of network slicing over a single infrastructure to provision highly heterogeneous vertical services, as well as a network slicing management system capable of handling simultaneous slices. On top of the network slicing technology, functionality needs to be devised to deploy the slices required by the different vertical players and provide them with a suitable interface to manage their slice. In this article, we design a 5G mobile network architecture to support vertical industries. The proposed architecture builds on ongoing standardization efforts at 3GPP and ETSI, and incorporates additional modules to provide enhanced MANO and control functionality as well as artificial-intelligence-based data analytics. On top of these modules, a service layer is provided to offer vertical players an easy-to-use interface to manage their services. This work was supported by the H2020 5G-TOURS European project (Grant Agreement No. 856950)
Evaluating the impact of topological protein features on the negative examples selection
Supervised machine learning methods, when applied to the problem of automated protein-function prediction (AFP), require the availability of both positive examples (i.e., proteins which are known to possess a given protein function) and negative examples (corresponding to proteins not associated with that function). Unfortunately, publicly available proteome and genome data sources such as the Gene Ontology rarely store the functions not possessed by a protein. Thus negative selection, which consists in identifying informative negative examples, is currently a central and challenging problem in AFP. Several heuristics have been proposed through the years to solve this problem; nevertheless, despite their effectiveness, to the best of our knowledge no previous work has studied which protein features are more relevant to this task, that is, which protein features help more in discriminating between reliable and unreliable negatives
Estimation of urban sensible heat flux using a dense wireless network of observations
The determination of the sensible heat flux over urban terrain is challenging due to irregular surface geometry and surface types. To address this, in 2006-07, a major field campaign (LUCE) took place at the École Polytechnique Fédérale de Lausanne campus, a moderately occupied urban site. A distributed network of 92 wireless weather stations was combined with routine atmospheric profiling, offering high temporal and spatial resolution meteorological measurements. The objective of this study is to estimate the sensible heat flux over the built environment under convective conditions. Calculations were based on Monin-Obukhov similarity for temperature in the surface layer. The results illustrate a good agreement between the sensible heat flux inferred from the thermal roughness length approach and independent calibrated measurements from a scintillometer located inside the urban canopy. They also show that using only one well-selected station can provide a good estimate of the sensible heat flux over the campus for convective conditions. Overall, this study illustrates how an extensive network of meteorological measurements can be a useful tool to estimate the sensible heat flux in complex urban environments
Risk-Averse Matchings over Uncertain Graph Databases
A large number of applications, such as querying sensor networks and
analyzing protein-protein interaction (PPI) networks, rely on mining uncertain
graph and hypergraph databases. In this work we study the following problem:
given an uncertain, weighted (hyper)graph, how can we efficiently find a
(hyper)matching with high expected reward and low risk?
This problem naturally arises in the context of several important
applications, such as online dating, kidney exchanges, and team formation. We
introduce a novel formulation for finding matchings with maximum expected
reward and bounded risk under a general model of uncertain weighted
(hyper)graphs that we introduce in this work. Our model generalizes
probabilistic models used in prior work, and captures both continuous and
discrete probability distributions, thus allowing us to handle privacy-related
applications that inject appropriately distributed noise to (hyper)edge
weights. Given that our optimization problem is NP-hard, we turn our attention
to designing efficient approximation algorithms. For the case of uncertain
weighted graphs, we provide an approximation algorithm, and a second
approximation algorithm with near optimal run time. For the case
of uncertain weighted hypergraphs, we provide an approximation algorithm
whose guarantee is stated in terms of the rank of the
hypergraph (i.e., the maximum number of nodes in any hyperedge), and that runs in
almost (modulo log factors) linear time.
We complement our theoretical results by testing our approximation algorithms
on a wide variety of synthetic experiments, where we observe in a controlled
setting interesting findings on the trade-off between reward and risk. We also
provide an application of our formulation for providing recommendations of
teams that are likely to collaborate and have high impact. Comment: 25 pages