
    LiveRank: How to Refresh Old Crawls

    This paper considers the problem of refreshing a crawl. More precisely, given a collection of Web pages (with hyperlinks) gathered at some past time, we want to identify a significant fraction of these pages that still exist at present. The liveness of an old page can be tested through an online query at present time. We call LiveRank a ranking of the old pages such that active nodes are more likely to appear first. The quality of a LiveRank is measured by the number of queries necessary to identify a given fraction of the alive pages when testing pages in LiveRank order. We study different scenarios, ranging from a static setting, where the LiveRank is computed before any query is made, to dynamic settings, where the LiveRank can be updated as queries are processed. Our results show that building on PageRank can lead to efficient LiveRanks for Web graphs.
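    The quality measure above can be made concrete with a small sketch. It assumes a static LiveRank obtained by sorting old pages by some precomputed score (a PageRank value, say) and simulates the online liveness query with membership in a known set `alive`. The function names, the toy scores and the tie-breaking rule are illustrative assumptions, not taken from the paper.

```python
import math


def static_liverank(scores):
    """Hypothetical static LiveRank: order old pages by a precomputed score
    (e.g. a PageRank value), highest first; ties are broken by page id."""
    return sorted(scores, key=lambda page: (-scores[page], page))


def queries_needed(order, alive, fraction):
    """Number of liveness queries issued, testing pages in `order`, until
    ceil(fraction * len(alive)) alive pages have been identified."""
    target = math.ceil(fraction * len(alive))
    if target == 0:
        return 0
    found = 0
    for queries, page in enumerate(order, start=1):
        if page in alive:  # stands in for an online liveness query
            found += 1
            if found == target:
                return queries
    raise ValueError("order does not reach the requested fraction of alive pages")


# Toy crawl: the scores are made-up numbers, not data from the paper.
scores = {"a": 0.40, "b": 0.10, "c": 0.30, "d": 0.20}
alive = {"a", "c"}
order = static_liverank(scores)
print(order)                              # ['a', 'c', 'd', 'b']
print(queries_needed(order, alive, 1.0))  # 2
```

    Testing the same pages in the reverse order, ['b', 'd', 'c', 'a'], needs 4 queries instead of 2, which is the gap a good LiveRank is meant to close.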

    Spectral centrality measures in complex networks

    Complex networks are characterized by heterogeneous distributions of the degree of nodes, which produce a large diversification of the roles of the nodes within the network. Several centrality measures have been introduced to rank nodes based on their topological importance within a graph. Here we review and compare centrality measures based on spectral properties of graph matrices. We shall focus on PageRank, eigenvector centrality and the hub/authority scores of HITS. We derive simple relations between the measures and the (in)degree of the nodes, in some limits. We also compare the rankings obtained with different centrality measures. Comment: 11 pages, 10 figures, 5 tables. Final version published in Physical Review
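    For readers unfamiliar with the spectral measures named above, here is a minimal power-iteration sketch of eigenvector centrality (undirected graph) and HITS hub/authority scores (directed graph), with graphs stored as adjacency dicts in which every neighbour is also a key. It is a textbook illustration under these assumptions, not code from the review; PageRank follows the same iterate-and-normalize pattern with a teleportation term added.

```python
import math


def _normalize(vec):
    """Scale a {node: value} dict to unit Euclidean norm."""
    norm = math.sqrt(sum(x * x for x in vec.values())) or 1.0
    return {v: x / norm for v, x in vec.items()}


def eigenvector_centrality(adj, tol=1e-12, max_iter=10_000):
    """Power iteration for an undirected graph {node: neighbours}.
    Iterating with (A + I) instead of A keeps the same leading eigenvector
    but avoids oscillation on bipartite graphs."""
    x = _normalize({v: 1.0 for v in adj})
    for _ in range(max_iter):
        new = _normalize({v: x[v] + sum(x[w] for w in adj[v]) for v in adj})
        if sum(abs(new[v] - x[v]) for v in adj) < tol:
            return new
        x = new
    return x


def hits(adj, tol=1e-12, max_iter=10_000):
    """HITS hub/authority scores for a directed graph {node: out-neighbours}."""
    hub = _normalize({v: 1.0 for v in adj})
    auth = dict(hub)
    for _ in range(max_iter):
        # Authority: sum of the hub scores of the nodes pointing to you.
        new_auth = {v: 0.0 for v in adj}
        for v in adj:
            for w in adj[v]:
                new_auth[w] += hub[v]
        new_auth = _normalize(new_auth)
        # Hub: sum of the authority scores of the nodes you point to.
        new_hub = _normalize({v: sum(new_auth[w] for w in adj[v]) for v in adj})
        delta = sum(abs(new_hub[v] - hub[v]) + abs(new_auth[v] - auth[v]) for v in adj)
        hub, auth = new_hub, new_auth
        if delta < tol:
            break
    return hub, auth


# Triangle 0-1-2 with a pendant node 3 attached to node 2.
tri = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
ec = eigenvector_centrality(tri)
print(max(ec, key=ec.get))  # 2

# Star: leaves 1, 2, 3 all point to the centre 0.
star = {0: [], 1: [0], 2: [0], 3: [0]}
hub, auth = hits(star)
print(max(auth, key=auth.get))  # 0
```

    On the star, the centre, the only node with incoming links, receives all of the authority score, while the three leaves share the hub score equally.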

    The Number of Convex Permutominoes

    Permutominoes are polyominoes defined by suitable pairs of permutations. In this paper we provide a formula to count the number of convex permutominoes of given perimeter. To this aim we define the transform of a generic pair of permutations, we characterize the transform of any pair defining a convex permutomino, and we solve the counting problem in the transformed space.

    Ranking and clustering of nodes in networks with smart teleportation

    Random teleportation is a necessary evil for ranking and clustering directed networks based on random walks. Teleportation enables ergodic solutions, but the solutions must necessarily depend on the exact implementation and parametrization of the teleportation. For example, in the commonly used PageRank algorithm, the teleportation rate must trade off a heavily biased solution with a uniform solution. Here we show that teleportation to links rather than nodes enables a much smoother trade-off and effectively more robust results. We also show that, by not recording the teleportation steps of the random walker, we can further reduce the effect of teleportation, with dramatic effects on clustering. Comment: 10 pages, 7 figures
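    The difference between the two teleportation schemes can be sketched as PageRank with a custom teleportation distribution. The sketch assumes that teleporting "to a link" means picking a link uniformly at random and landing on its target, which amounts to teleporting to nodes in proportion to their in-degree; the paper's exact definition, and its unrecorded-teleportation variant (not implemented here), may differ. The function names are hypothetical.

```python
def pagerank(adj, teleport, alpha=0.85, tol=1e-12, max_iter=10_000):
    """PageRank by power iteration on a directed graph {node: out-neighbours}.
    `teleport` is a {node: probability} distribution summing to 1; with
    probability 1 - alpha (and always from dangling nodes) the walker jumps
    to a node drawn from it."""
    nodes = list(adj)
    rank = {v: 1.0 / len(nodes) for v in nodes}
    for _ in range(max_iter):
        dangling = sum(rank[v] for v in nodes if not adj[v])
        jump = (1 - alpha) + alpha * dangling  # total probability mass that teleports
        new = {v: jump * teleport[v] for v in nodes}
        for v in nodes:
            for w in adj[v]:
                new[w] += alpha * rank[v] / len(adj[v])
        if sum(abs(new[v] - rank[v]) for v in nodes) < tol:
            return new
        rank = new
    return rank


def node_teleport(adj):
    """Standard choice: teleport to a uniformly random node."""
    return {v: 1.0 / len(adj) for v in adj}


def link_teleport(adj):
    """Assumed reading of link teleportation: pick a uniformly random link
    and land on its target, i.e. teleport proportionally to in-degree."""
    indeg = {v: 0 for v in adj}
    for v in adj:
        for w in adj[v]:
            indeg[w] += 1
    total = sum(indeg.values())
    return {v: indeg[v] / total for v in adj}


g = {0: [1, 2], 1: [2], 2: [0], 3: [2]}
pr_node = pagerank(g, node_teleport(g))
pr_link = pagerank(g, link_teleport(g))
print(round(pr_node[3], 4), pr_link[3])  # 0.0375 0.0
```

    Node 3 has no incoming links: uniform node teleportation still gives it a score of (1 - alpha)/4 = 0.0375, whereas link teleportation gives it none, so scores are driven by the link structure rather than by the teleportation rule.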

    Efficiently Clustering Very Large Attributed Graphs

    Attributed graphs model real networks by enriching their nodes with attributes accounting for their properties. Several techniques have been proposed for partitioning these graphs into clusters that are homogeneous with respect to both semantic attributes and the structure of the graph. However, the time and space complexities of state-of-the-art algorithms limit their scalability to medium-sized graphs. We propose SToC (for Semantic-Topological Clustering), a fast and scalable algorithm for partitioning large attributed graphs. The approach is robust, being compatible with both categorical and quantitative attributes, and it is tailorable, allowing the user to weight the semantic and topological components. Further, the approach does not require the user to guess the number of clusters in advance. SToC relies on well-known approximation techniques such as bottom-k sketches, traditional graph-theoretic concepts, and a new perspective on the composition of heterogeneous distance measures. Experimental results demonstrate its ability to efficiently compute high-quality partitions of large-scale attributed graphs. Comment: This work has been published in ASONAM 2017. This version includes an appendix with validation of our attribute model and distance function, omitted in the conference version for lack of space. Please refer to the published version
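    SToC is described as relying on bottom-k sketches. As a generic illustration of that technique (not of how SToC applies it), the sketch below keeps the k smallest hash values of each set and estimates the Jaccard similarity of two sets from their sketches alone. The hash function (BLAKE2b truncated to 64 bits) and the function names are assumptions made for the example.

```python
import hashlib
import heapq


def _hash(item):
    """Deterministic 64-bit hash of an item's string form."""
    digest = hashlib.blake2b(str(item).encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def bottom_k_sketch(items, k):
    """Keep the k smallest distinct hash values of a set."""
    return set(heapq.nsmallest(k, {_hash(x) for x in items}))


def estimate_jaccard(sketch_a, sketch_b, k):
    """Bottom-k estimator of |A & B| / |A | B|: take the k smallest hashes of
    the union of the two sketches and count how many appear in both."""
    union_bottom = heapq.nsmallest(k, sketch_a | sketch_b)
    if not union_bottom:
        return 0.0
    shared = sum(1 for h in union_bottom if h in sketch_a and h in sketch_b)
    return shared / len(union_bottom)


a = set(range(1000))
b = set(range(500, 1500))
k = 256
est = estimate_jaccard(bottom_k_sketch(a, k), bottom_k_sketch(b, k), k)
print(est)  # an estimate of the true Jaccard index 500 / 1500
```

    Each sketch holds at most k values regardless of set size, so comparing two nodes' attribute or neighbourhood sets costs O(k) instead of time proportional to the sets themselves.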

    A 5G mobile network architecture to support vertical industries

    The telecom industry is moving from a "horizontal" service delivery model, where services are defined independently of their consumers, toward a "vertical" delivery model, where the provided services are tailored to specific industry sectors and verticals. In order to enable this transition, an end-to-end comprehensive 5G architecture is needed, with capabilities to support the use cases of the different vertical industries. A key feature of this architecture is the implementation of network slicing over a single infrastructure to provision highly heterogeneous vertical services, as well as a network slicing management system capable of handling simultaneous slices. On top of the network slicing technology, functionality needs to be devised to deploy the slices required by the different vertical players and provide them with a suitable interface to manage their slices. In this article, we design a 5G mobile network architecture to support vertical industries. The proposed architecture builds on ongoing standardization efforts at 3GPP and ETSI, and incorporates additional modules to provide enhanced MANO and control functionality as well as artificial-intelligence-based data analytics. On top of these modules, a service layer is provided to offer vertical players an easy-to-use interface to manage their services. This work was supported by the H2020 5G-TOURS European project (Grant Agreement No. 856950).

    Evaluating the impact of topological protein features on the negative examples selection

    Supervised machine learning methods, when applied to the problem of automated protein-function prediction (AFP), require the availability of both positive examples (i.e., proteins which are known to possess a given protein function) and negative examples (corresponding to proteins not associated with that function). Unfortunately, publicly available proteome and genome data sources such as the Gene Ontology rarely store the functions not possessed by a protein. Thus negative selection, which consists of identifying informative negative examples, is currently a central and challenging problem in AFP. Several heuristics have been proposed over the years to solve this problem; nevertheless, despite their effectiveness, to the best of our knowledge no previous work has studied which protein features are most relevant to this task, that is, which protein features help most in discriminating reliable from unreliable negatives.

    Estimation of urban sensible heat flux using a dense wireless network of observations

    The determination of the sensible heat flux over urban terrain is challenging due to irregular surface geometry and surface types. To address this, in 2006-07, a major field campaign (LUCE) took place at the École Polytechnique Fédérale de Lausanne campus, a moderately occupied urban site. A distributed network of 92 wireless weather stations was combined with routine atmospheric profiling, offering meteorological measurements at high temporal and spatial resolution. The objective of this study is to estimate the sensible heat flux over the built environment under convective conditions. Calculations were based on Monin-Obukhov similarity for temperature in the surface layer. The results show good agreement between the sensible heat flux inferred from the thermal roughness length approach and independent calibrated measurements from a scintillometer located inside the urban canopy. They also show that using only one well-selected station can provide a good estimate of the sensible heat flux over the campus under convective conditions. Overall, this study illustrates how an extensive network of meteorological measurements can be a useful tool to estimate the sensible heat flux in complex urban environments.
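    The abstract does not spell out its equations. For orientation, the standard Monin-Obukhov flux-profile relation on which a thermal-roughness-length approach is usually built can be written as below, where ρ is air density, c_p the specific heat of air at constant pressure, κ the von Kármán constant, u_* the friction velocity, θ_s the surface temperature, θ(z) the air potential temperature at height z, z_0h the thermal roughness length, L the Obukhov length and ψ_h the integrated stability function for heat. Whether the study uses exactly this form (for instance with a displacement height subtracted from z) is an assumption.

```latex
H = \rho \, c_p \, \kappa \, u_* \,
    \frac{\theta_s - \theta(z)}
         {\ln\!\left(\frac{z}{z_{0h}}\right) - \psi_h\!\left(\frac{z}{L}\right) + \psi_h\!\left(\frac{z_{0h}}{L}\right)}
```

    Under convective conditions the surface is warmer than the air, θ_s > θ(z), so H > 0 (an upward flux). Because L itself depends on H and u_*, the relation is in practice solved iteratively.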

    Risk-Averse Matchings over Uncertain Graph Databases

    A large number of applications, such as querying sensor networks and analyzing protein-protein interaction (PPI) networks, rely on mining uncertain graph and hypergraph databases. In this work we study the following problem: given an uncertain, weighted (hyper)graph, how can we efficiently find a (hyper)matching with high expected reward and low risk? This problem naturally arises in the context of several important applications, such as online dating, kidney exchanges, and team formation. We introduce a novel formulation for finding matchings with maximum expected reward and bounded risk under a general model of uncertain weighted (hyper)graphs that we introduce in this work. Our model generalizes probabilistic models used in prior work, and captures both continuous and discrete probability distributions, thus allowing it to handle privacy-related applications that inject appropriately distributed noise into (hyper)edge weights. Given that our optimization problem is NP-hard, we turn our attention to designing efficient approximation algorithms. For the case of uncertain weighted graphs, we provide a 1/3-approximation algorithm, and a 1/5-approximation algorithm with near-optimal running time. For the case of uncertain weighted hypergraphs, we provide an Ω(1/k)-approximation algorithm, where k is the rank of the hypergraph (i.e., any hyperedge includes at most k nodes), that runs in almost linear time (modulo log factors). We complement our theoretical results by testing our approximation algorithms on a wide variety of synthetic experiments, where we observe in a controlled setting interesting findings on the trade-off between reward and risk. We also provide an application of our formulation for recommending teams that are likely to collaborate and have high impact. Comment: 25 pages
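    To make "high expected reward and low risk" concrete, the sketch below assumes each edge weight is an independent random variable known only by its mean and variance, and measures risk as the variance of the matching's total weight. The greedy rule is a naive heuristic chosen purely for illustration: it is not one of the paper's approximation algorithms, it carries no approximation guarantee, and the function names and the (u, v, mean, variance) tuple layout are hypothetical.

```python
def expected_reward_and_risk(matching):
    """Mean and variance of a matching's total weight, assuming independent
    edge weights given as (u, v, mean, variance) tuples."""
    reward = sum(mean for _, _, mean, _ in matching)
    risk = sum(var for _, _, _, var in matching)
    return reward, risk


def greedy_risk_bounded_matching(edges, risk_budget):
    """Naive heuristic (NOT the paper's algorithm): scan edges by decreasing
    expected weight and keep an edge if both endpoints are still free and
    the total variance stays within `risk_budget`."""
    matched = set()
    chosen = []
    risk = 0.0
    for u, v, mean, var in sorted(edges, key=lambda e: e[2], reverse=True):
        if u in matched or v in matched or risk + var > risk_budget:
            continue
        chosen.append((u, v, mean, var))
        matched.update((u, v))
        risk += var
    return chosen


edges = [("a", "b", 5.0, 10.0), ("b", "c", 4.0, 1.0), ("c", "d", 3.0, 1.0)]
safe = greedy_risk_bounded_matching(edges, risk_budget=5.0)
bold = greedy_risk_bounded_matching(edges, risk_budget=100.0)
print(expected_reward_and_risk(safe))  # (4.0, 1.0)
print(expected_reward_and_risk(bold))  # (8.0, 11.0)
```

    With a budget of 5 the highest-mean edge ("a", "b") is rejected for its variance of 10, and the safer ("b", "c") edge is taken instead; with a loose budget the greedy takes ("a", "b") and ("c", "d") for twice the expected reward at eleven times the variance, which is the reward/risk trade-off the abstract refers to.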