2,776 research outputs found

    Link Prediction by De-anonymization: How We Won the Kaggle Social Network Challenge

    Full text link
    This paper describes the winning entry to the IJCNN 2011 Social Network Challenge run by Kaggle.com. The goal of the contest was to promote research on real-world link prediction, and the dataset was a graph obtained by crawling the popular Flickr social photo sharing website, with user identities scrubbed. By de-anonymizing much of the competition test set using our own Flickr crawl, we were able to effectively game the competition. Our attack represents a new application of de-anonymization to gaming machine learning contests, suggesting changes in how future competitions should be run. We introduce a new simulated annealing-based weighted graph matching algorithm for the seeding step of de-anonymization. We also show how to combine de-anonymization with link prediction---the latter is required to achieve good performance on the portion of the test set not de-anonymized---for example by training the predictor on the de-anonymized portion of the test set, and combining probabilistic predictions from de-anonymization and link prediction.Comment: 11 pages, 13 figures; submitted to IJCNN'201

    Recent Advances in Graph Partitioning

    Full text link
    We survey recent trends in practical algorithms for balanced graph partitioning together with applications and future research directions

    Reinforcement learning based local search for grouping problems: A case study on graph coloring

    Get PDF
    Grouping problems aim to partition a set of items into multiple mutually disjoint subsets according to some specific criterion and constraints. Grouping problems cover a large class of important combinatorial optimization problems that are generally computationally difficult. In this paper, we propose a general solution approach for grouping problems, i.e., reinforcement learning based local search (RLS), which combines reinforcement learning techniques with descent-based local search. The viability of the proposed approach is verified on a well-known representative grouping problem (graph coloring) where a very simple descent-based coloring algorithm is applied. Experimental studies on popular DIMACS and COLOR02 benchmark graphs indicate that RLS achieves competitive performances compared to a number of well-known coloring algorithms

    Auto-tuning similarity search algorithms on multi-core architectures

    Get PDF
    Cataloged from PDF version of article.In recent times, large high-dimensional datasets have become ubiquitous. Video and image repositories, financial, and sensor data are just a few examples of such datasets in practice. Many applications that use such datasets require the retrieval of data items similar to a given query item, or the nearest neighbors (NN or k-NN) of a given item. Another common query is the retrieval of multiple sets of nearest neighbors, i.e., multi k-NN, for different query items on the same data. With commodity multi-core CPUs becoming more and more widespread at lower costs, developing parallel algorithms for these search problems has become increasingly important. While the core nearest neighbor search problem is relatively easy to parallelize, it is challenging to tune it for optimality. This is due to the fact that the various performance-specific algorithmic parameters, or “tuning knobs”, are inter-related and also depend on the data and query workloads. In this paper, we present (1) a detailed study of the various tuning knobs and their contributions on increasing the query throughput for parallelized versions of the two most common classes of high-dimensional multi-NN search algorithms: linear scan and tree traversal, and (2) an offline auto-tuner for setting these knobs by iteratively measuring actual query execution times for a given workload and dataset. We show experimentally that our auto-tuner reaches near-optimal performance and significantly outperforms un-tuned versions of parallel multi-NN algorithms for real video repository data on a variety of multi-core platforms. © Springer Science+Business Media New York 201

    Multiple-retrieval case-based reasoning for course timetabling problems

    Get PDF
    The structured representation of cases by attribute graphs in a Case-Based Reasoning (CBR) system for course timetabling has been the subject of previous research by the authors. In that system, the case base is organised as a decision tree and the retrieval process chooses those cases which are sub attribute graph isomorphic to the new case. The drawback of that approach is that it is not suitable for solving large problems. This paper presents a multiple-retrieval approach that partitions a large problem into small solvable sub-problems by recursively inputting the unsolved part of the graph into the decision tree for retrieval. The adaptation combines the retrieved partial solutions of all the partitioned sub-problems and employs a graph heuristic method to construct the whole solution for the new case. We present a methodology which is not dependant upon problem specific information and which, as such, represents an approach which underpins the goal of building more general timetabling systems. We also explore the question of whether this multiple-retrieval CBR could be an effective initialisation method for local search methods such as Hill Climbing, Tabu Search and Simulated Annealing. Significant results are obtained from a wide range of experiments. An evaluation of the CBR system is presented and the impact of the approach on timetabling research is discussed. We see that the approach does indeed represent an effective initialisation method for these approaches

    Multiple-Retrieval Case-Based Reasoning for Course Timetabling Problems

    Get PDF
    The structured representation of cases by attribute graphs in a Case-Based Reasoning (CBR) system for course timetabling has been the subject of previous research by the authors. In that system, the case base is organised as a decision tree and the retrieval process chooses those cases which are sub attribute graph isomorphic to the new case. The drawback of that approach is that it is not suitable for solving large problems. This paper presents a multiple-retrieval approach that partitions a large problem into small solvable sub-problems by recursively inputting the unsolved part of the graph into the decision tree for retrieval. The adaptation combines the retrieved partial solutions of all the partitioned sub-problems and employs a graph heuristic method to construct the whole solution for the new case. We present a methodology which is not dependant upon problem specific information and which, as such, represents an approach which underpins the goal of building more general timetabling systems. We also explore the question of whether this multiple-retrieval CBR could be an effective initialisation method for local search methods such as Hill Climbing, Tabu Search and Simulated Annealing. Significant results are obtained from a wide range of experiments. An evaluation of the CBR system is presented and the impact of the approach on timetabling research is discussed. We see that the approach does indeed represent an effective initialisation method for these approaches

    Multiple-Retrieval Case-Based Reasoning for Course Timetabling Problems

    Get PDF
    The structured representation of cases by attribute graphs in a Case-Based Reasoning (CBR) system for course timetabling has been the subject of previous research by the authors. In that system, the case base is organised as a decision tree and the retrieval process chooses those cases which are sub attribute graph isomorphic to the new case. The drawback of that approach is that it is not suitable for solving large problems. This paper presents a multiple-retrieval approach that partitions a large problem into small solvable sub-problems by recursively inputting the unsolved part of the graph into the decision tree for retrieval. The adaptation combines the retrieved partial solutions of all the partitioned sub-problems and employs a graph heuristic method to construct the whole solution for the new case. We present a methodology which is not dependant upon problem specific information and which, as such, represents an approach which underpins the goal of building more general timetabling systems. We also explore the question of whether this multiple-retrieval CBR could be an effective initialisation method for local search methods such as Hill Climbing, Tabu Search and Simulated Annealing. Significant results are obtained from a wide range of experiments. An evaluation of the CBR system is presented and the impact of the approach on timetabling research is discussed. We see that the approach does indeed represent an effective initialisation method for these approaches
    • …
    corecore