    VoG: Summarizing and Understanding Large Graphs

    How can we succinctly describe a million-node graph with a few simple sentences? How can we measure the "importance" of a set of discovered subgraphs in a large graph? These are exactly the problems we focus on. Our main ideas are to construct a "vocabulary" of subgraph-types that often occur in real graphs (e.g., stars, cliques, chains), and, from a set of subgraphs, to find the most succinct description of a graph in terms of this vocabulary. We measure success in a well-founded way by means of the Minimum Description Length (MDL) principle: a subgraph is included in the summary if it decreases the total description length of the graph. Our contributions are three-fold: (a) formulation: we provide a principled encoding scheme to choose vocabulary subgraphs; (b) algorithm: we develop VoG, an efficient method to minimize the description cost; and (c) applicability: we report experimental results on multi-million-edge real graphs, including Flickr and the Notre Dame web graph.

    Comment: SIAM International Conference on Data Mining (SDM) 201
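
    The MDL selection rule stated above admits a compact sketch. The following greedy loop is an illustration, not the authors' implementation; description_length is a hypothetical function returning the two-part MDL cost (bits for the summary plus bits for the remaining error):

        def build_summary(candidates, graph, description_length):
            # Greedily keep a candidate subgraph only if it lowers the total
            # description length of the graph under the current summary.
            summary = []
            best = description_length(summary, graph)   # cost of the empty summary
            for subgraph in candidates:                 # e.g. ordered by quality
                cost = description_length(summary + [subgraph], graph)
                if cost < best:
                    summary.append(subgraph)
                    best = cost
            return summary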

    Clustering for Different Scales of Measurement - the Gap-Ratio Weighted K-means Algorithm

    This paper describes a method for clustering data that are spread out over large regions and whose dimensions are on different scales of measurement. The algorithm was developed for a robotics application consisting of sorting and storing objects in an unsupervised way. The toy dataset used to validate the application consists of Lego bricks of different shapes and colors. The uncontrolled lighting conditions and the use of RGB color features respectively yield data with a large spread and with different scales of measurement across dimensions. To handle the combination of these two characteristics in the data, we have developed a new weighted K-means algorithm, called gap-ratio K-means, which weights each dimension of the feature space before running the K-means algorithm. The weight associated with a feature is proportional to the ratio between the biggest gap separating two consecutive data points and the average of all the other gaps. This method is compared with two other variants of K-means on the Lego bricks clustering problem as well as on two other common classification datasets.

    Comment: 13 pages, 6 figures, 2 tables. This paper is under the review process for AIAP 201
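
    The weighting rule translates directly into code. The sketch below is one reading of the abstract's description, not the authors' implementation; the lack of weight normalization and the choice of the number of clusters are assumptions:

        import numpy as np
        from sklearn.cluster import KMeans

        def gap_ratio_weights(X):
            # For each feature: sort its values, take the gaps between consecutive
            # points, and weight the feature by (largest gap) / (mean of the others).
            weights = np.ones(X.shape[1])
            for j in range(X.shape[1]):
                gaps = np.diff(np.sort(X[:, j]))
                if gaps.size < 2:
                    continue
                k = gaps.argmax()
                rest = np.delete(gaps, k)
                if rest.mean() > 0:
                    weights[j] = gaps[k] / rest.mean()
            return weights

        X = np.random.rand(200, 3)       # stand-in for RGB color features
        labels = KMeans(n_clusters=4, n_init=10).fit_predict(X * gap_ratio_weights(X))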

    Structural Equation Modeling and simultaneous clustering through the Partial Least Squares algorithm

    The identification of different homogeneous groups of observations and their appropriate analysis in PLS-SEM has become a critical issue in many application fields. Usually, both SEM and PLS-SEM assume the homogeneity of all units on which the model is estimated, and the segmentation approaches present in the literature consist in estimating separate models for each segment of statistical units, where the units have been assigned to a priori defined segments. However, these approaches are not fully acceptable because no causal structure among the variables is postulated. In other words, a modeling approach should be used in which the obtained clusters are homogeneous with respect to the structural causal relationships. In this paper, a new methodology for simultaneous non-hierarchical clustering and PLS-SEM is proposed. It is motivated by the fact that the sequential approach of first applying SEM or PLS-SEM and then running a clustering algorithm such as K-means on the latent scores may fail to find the correct clustering structure existing in the data. A simulation study and an application on real data are included to evaluate the performance of the proposed methodology.
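
    The sequential baseline the authors argue against is easy to state in code. The sketch below uses scikit-learn's PLSRegression as a stand-in for a full PLS-SEM estimation (an assumption: PLS-SEM fits a structural model among latent variables, which PLSRegression does not capture) and then clusters the latent scores with K-means:

        import numpy as np
        from sklearn.cross_decomposition import PLSRegression
        from sklearn.cluster import KMeans

        rng = np.random.default_rng(0)
        X = rng.normal(size=(200, 6))    # toy manifest indicators (exogenous side)
        Y = X[:, :2] @ rng.normal(size=(2, 3)) + 0.1 * rng.normal(size=(200, 3))

        pls = PLSRegression(n_components=2).fit(X, Y)
        scores = pls.transform(X)        # latent scores of the statistical units
        labels = KMeans(n_clusters=3, n_init=10).fit_predict(scores)
        # The paper's point: clusters found this way need not be homogeneous with
        # respect to the structural relations; the proposed method estimates the
        # clusters and the PLS model simultaneously instead.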

    The quantum correlation between the selection of the problem and that of the solution sheds light on the mechanism of the quantum speed up

    In classical problem solving, there is of course correlation between the selection of the problem on the part of Bob (the problem setter) and that of the solution on the part of Alice (the problem solver). In quantum problem solving, this correlation becomes quantum. This means that Alice contributes to selecting 50% of the information that specifies the problem. As the solution is a function of the problem, this gives Alice advance knowledge of 50% of the information that specifies the solution. Both the quadratic and the exponential speed-ups are explained by the fact that quantum algorithms start from this advance knowledge.

    Comment: Earlier version submitted to QIP 2011. Further clarified section 1, "Outline of the argument"; submitted to Phys Rev A, 16 pages
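
    A back-of-envelope check of the quadratic case (an illustration added here, not taken from the paper): write the size of an unstructured search space as $N = 2^n$. A classical solver needs $O(N)$ oracle queries, whereas a solver that starts out knowing half of the $n$ bits specifying the solution only has to search the residual space of size

        \[
        N' = 2^{n/2} = \sqrt{2^{n}} = \sqrt{N},
        \]

    which matches the $O(\sqrt{N})$ query count of Grover's algorithm.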

    A novel ensemble method for electric vehicle power consumption forecasting: Application to the Spanish system

    The use of electric vehicles across the world has become one of the most challenging issues for environmental policies. Accelerating climate change and the expected depletion of fossil fuels turn the use of such non-polluting cars into a priority for most developed countries. However, this use has raised major concerns for power companies, since they must adapt their generation to a new scenario in which electric vehicles will dramatically modify the generation curve. In this paper, a novel approach based on ensemble learning is proposed. In particular, the ARIMA, GARCH and PSF algorithms are combined to forecast the electric vehicle power consumption in Spain. It is worth noting that the studied consumption time series is non-stationary, which adds difficulty to the forecasting process. Thus, an ensemble is proposed that dynamically weights all algorithms over time. The proposal has been implemented for a real case, namely the Spanish Control Centre for the Electric Vehicle. The performance of the approach is assessed by means of the WAPE (weighted absolute percentage error), showing robust and promising results for this research field.

    Ministerio de Economía y Competitividad, projects ENE2016-77650-R, PCIN-2015-04 and TIN2017-88209-C2-R
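
    The abstract does not spell out the dynamic weighting rule, so the sketch below makes a common assumption: at each time step, every model's weight is inversely proportional to its WAPE over a sliding window of recent observations. The forecasts mapping and the window length are hypothetical:

        import numpy as np

        def wape(actual, predicted):
            # Weighted absolute percentage error: sum(|error|) / sum(|actual|).
            return np.abs(actual - predicted).sum() / np.abs(actual).sum()

        def ensemble_forecast(forecasts, actual, t, window=24):
            # forecasts: dict mapping a model name (e.g. "ARIMA", "GARCH", "PSF")
            # to its array of per-step predictions; actual: observed consumption.
            # Assumes t >= 1 so that the error window is non-empty.
            lo = max(0, t - window)
            inv_err = {m: 1.0 / max(wape(actual[lo:t], p[lo:t]), 1e-9)
                       for m, p in forecasts.items()}
            total = sum(inv_err.values())
            return sum(inv_err[m] / total * forecasts[m][t] for m in forecasts)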

    Extending CKKW-merging to One-Loop Matrix Elements

    We extend earlier schemes for merging tree-level matrix elements with parton showers to also include merging with one-loop matrix elements. In this paper we make a first study of how to include one-loop corrections, not only for events with a given jet multiplicity, but simultaneously for several different jet multiplicities. Results are presented for the simplest non-trivial case of hadronic events at LEP, as a proof of concept.