
    On Cognitive Preferences and the Plausibility of Rule-based Models

    It is conventional wisdom in machine learning and data mining that logical models such as rule sets are more interpretable than other models, and that among such rule-based models, simpler models are more interpretable than more complex ones. In this position paper, we question this latter assumption by focusing on one particular aspect of interpretability, namely the plausibility of models. Roughly speaking, we equate the plausibility of a model with the likelihood that a user accepts it as an explanation for a prediction. In particular, we argue that, all other things being equal, longer explanations may be more convincing than shorter ones, and that the predominant bias for shorter models, which is typically necessary for learning powerful discriminative models, may not be suitable when it comes to user acceptance of the learned models. To that end, we first recapitulate evidence for and against this postulate, and then report the results of an evaluation in a crowd-sourcing study based on about 3,000 judgments. The results do not reveal a strong preference for simple rules, whereas we can observe a weak preference for longer rules in some domains. We then relate these results to well-known cognitive biases such as the conjunction fallacy, the representativeness heuristic, and the recognition heuristic, and investigate their relation to rule length and plausibility.
    Comment: V4: Another rewrite of the section on interpretability to clarify the focus on plausibility and its relation to interpretability, comprehensibility, and justifiability
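    The conjunction fallacy mentioned above has a direct analogue for rules: adding a condition to a rule can only shrink the set of cases it covers, so judging a longer rule as more probable is a bias. A minimal sketch with made-up data (not from the study) illustrates this:

    ```python
    # Hypothetical illustration: a rule's support can only shrink as conditions
    # are added, yet users may judge the longer rule as more plausible (the
    # conjunction fallacy). The records below are invented for illustration.
    records = [
        {"age": 30, "smoker": True,  "risk": "high"},
        {"age": 55, "smoker": True,  "risk": "high"},
        {"age": 40, "smoker": False, "risk": "low"},
        {"age": 62, "smoker": True,  "risk": "high"},
    ]

    short_rule = lambda r: r["smoker"]                    # IF smoker THEN risk = high
    long_rule = lambda r: r["smoker"] and r["age"] > 50   # one extra condition

    support = lambda rule: sum(rule(r) for r in records)
    print(support(short_rule), support(long_rule))        # 3 2: coverage only shrinks
    ```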

    GraphCombEx: A Software Tool for Exploration of Combinatorial Optimisation Properties of Large Graphs

    We present a prototype of a software tool for exploration of multiple combinatorial optimisation problems in large real-world and synthetic complex networks. Our tool, called GraphCombEx (an acronym of Graph Combinatorial Explorer), provides a unified framework for scalable computation and presentation of high-quality suboptimal solutions and bounds for a number of widely studied combinatorial optimisation problems. Its design particularly emphasises efficient graph representation and applicability to large-scale graphs and complex networks. The problems currently supported include maximum clique, graph colouring, maximum independent set, minimum vertex clique covering, minimum dominating set, and the longest simple cycle problem. Suboptimal solutions and intervals for optimal objective values are estimated using scalable heuristics. The tool is designed with extensibility in mind, with a view to adding further problems as well as new fast and high-performance heuristics in the future. GraphCombEx has already been successfully used as a support tool in a number of recent research studies using combinatorial optimisation to analyse complex networks, indicating its promise as a research software tool.
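    The tool's own heuristics are not shown in the abstract, but the kind of output it produces, suboptimal solutions plus an interval bracketing the optimal objective value, can be sketched with networkx heuristics (the graph and library calls here are illustrative stand-ins, not GraphCombEx's implementation):

    ```python
    # Illustrative sketch only: uses networkx approximations to produce the same
    # kind of output as described above -- a suboptimal solution and an interval
    # for an optimal objective value (here, the chromatic number).
    import networkx as nx
    from networkx.algorithms import approximation as approx

    G = nx.erdos_renyi_graph(n=500, p=0.05, seed=42)  # synthetic stand-in network

    # Heuristic clique: a lower bound on the chromatic number.
    clique = approx.max_clique(G)

    # Greedy colouring: an upper bound on the chromatic number.
    colouring = nx.greedy_color(G, strategy="largest_first")
    n_colours = len(set(colouring.values()))

    print(f"chromatic number lies in [{len(clique)}, {n_colours}]")
    ```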

    A Case Study in Matching Service Descriptions to Implementations in an Existing System

    A number of companies are trying to migrate large monolithic software systems to Service Oriented Architectures. A common approach is to first identify and describe the desired services (i.e., create a model), and then to locate portions of code within the existing system that implement the described services. In this paper we describe a detailed case study we undertook to match a model to an open-source business application. We describe the systematic methodology we used, the results of the exercise, as well as several observations that throw light on the nature of this problem. We also suggest and validate heuristics that are likely to be useful in partially automating the process of matching service descriptions to implementations.
    Comment: 20 pages, 19 PDF figures
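    The abstract does not spell out the heuristics; one plausible ingredient of such partial automation is lexical matching between description words and code identifiers. A minimal sketch (the function names and scoring choice are hypothetical, not the paper's method):

    ```python
    # Hypothetical token-overlap heuristic for ranking candidate implementation
    # classes against a service description; not the paper's actual method.
    import re

    def tokens(identifier: str) -> set[str]:
        """Split camelCase / snake_case identifiers into lowercase tokens."""
        parts = re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])|\d+", identifier)
        return {p.lower() for p in parts}

    def score_match(service_desc: str, class_name: str) -> float:
        """Jaccard overlap between description words and class-name tokens."""
        desc = {w.lower() for w in re.findall(r"[A-Za-z]+", service_desc)}
        code = tokens(class_name)
        return len(desc & code) / len(desc | code) if desc | code else 0.0

    print(score_match("create purchase order", "PurchaseOrderFactory"))  # 0.5
    ```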

    Technology diffusion in communication networks

    The deployment of new technologies in the Internet is notoriously difficult, as evidenced by the myriad of well-developed networking technologies that still have not seen widespread adoption (e.g., secure routing, IPv6). A key hurdle is the fact that the Internet lacks a centralized authority that can mandate the deployment of a new technology. Instead, the Internet consists of thousands of nodes, each controlled by an autonomous, profit-seeking firm, that will deploy a new networking technology only if it obtains sufficient local utility by doing so. For the technologies we study here, local utility depends on the set of nodes that can be reached by traversing paths consisting only of nodes that have already deployed the new technology. To understand technology diffusion in the Internet, we propose a new model inspired by work on the spread of influence in social networks. Unlike traditional models, where a node's utility depends only on its immediate neighbors, in our model a node can be influenced by the actions of remote nodes. Specifically, we assume node v activates (i.e., deploys the new technology) when it is adjacent to a sufficiently large connected component in the subgraph induced by the set of active nodes; namely, of size exceeding node v's threshold value \theta(v). We are interested in the problem of choosing the right seedset of nodes to activate initially, so that the rest of the nodes in the network have sufficient local utility to follow suit. We take the graph and threshold values as input to our problem. We show that our problem is both NP-hard and does not admit a (1-o(1)) \ln|V| approximation on general graphs. We then restrict our study to technology diffusion problems where (a) the maximum distance between any pair of nodes in the graph is r, and (b) there are at most \ell possible threshold values. This set of restrictions is quite natural, given that (a) the Internet graph has constant diameter, and (b) limiting the granularity of the threshold values is sensible given the difficulty of obtaining empirical data that parameterizes deployment costs and benefits. We present an algorithm that obtains a solution with a guaranteed approximation ratio of O(r^2 \ell \log|V|), which is asymptotically optimal given our hardness results. Our approximation algorithm is a linear-programming relaxation of a 0-1 integer program along with a novel randomized rounding scheme.
    National Science Foundation (S-1017907, CCF-0915922)
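    The activation rule above is easy to simulate directly. A minimal sketch of the dynamics (the graph, thresholds, and seedset are illustrative toy inputs; the paper's contribution is choosing the seedset, which this sketch takes as given):

    ```python
    # Simulates the diffusion process described above: node v activates once it
    # is adjacent to a connected component of active nodes of size exceeding
    # theta(v). Graph, thresholds, and seeds are toy inputs, not from the paper.
    import networkx as nx

    def diffuse(G: nx.Graph, theta: dict, seeds: set) -> set:
        active = set(seeds)
        changed = True
        while changed:
            changed = False
            comps = list(nx.connected_components(G.subgraph(active)))
            for v in G:
                if v in active:
                    continue
                # Size of the largest active component adjacent to v.
                best = max((len(c) for c in comps if any(u in c for u in G[v])),
                           default=0)
                if best > theta[v]:
                    active.add(v)
                    changed = True
        return active

    G = nx.path_graph(6)                      # nodes 0..5 in a line
    theta = {v: 1 for v in G}                 # uniform threshold of 1
    print(sorted(diffuse(G, theta, {0, 1})))  # the whole path activates
    ```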

    Towards trajectory anonymization: a generalization-based approach

    Trajectory datasets are becoming popular due to the massive usage of GPS and location-based services. In this paper, we address privacy issues regarding the identification of individuals in static trajectory datasets. We first adapt the notion of k-anonymity to trajectories and propose a novel generalization-based approach for the anonymization of trajectories. We further show that releasing anonymized trajectories may still lead to privacy leaks. We therefore propose a randomization-based reconstruction algorithm for releasing anonymized trajectory data, and also show how the underlying techniques can be adapted to other anonymity standards. The experimental results on real and synthetic trajectory datasets show the effectiveness of the proposed techniques.
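    The generalization idea can be pictured as coarsening locations until at least k individuals share each generalized trajectory. A toy sketch (the grid-based generalization and data are illustrative, not the paper's algorithm):

    ```python
    # Toy illustration of generalization for trajectory k-anonymity: snap each
    # (x, y, t) point to a coarse grid cell so that distinct trajectories
    # collapse onto the same generalized form. Grid size and data are made up;
    # the paper's actual generalization algorithm is more sophisticated.
    from collections import Counter

    def generalize(traj, cell=10.0):
        """Replace each point by the id of the grid cell containing it."""
        return tuple((int(x // cell), int(y // cell), t) for x, y, t in traj)

    trajs = [
        [(1.2, 3.4, 0), (12.9, 14.1, 1)],
        [(2.8, 5.0, 0), (11.0, 18.7, 1)],
        [(55.0, 60.2, 0), (70.3, 81.9, 1)],
    ]

    groups = Counter(generalize(t) for t in trajs)
    k = 2
    for g, count in groups.items():
        status = "releasable" if count >= k else "suppress or generalize further"
        print(g, count, status)
    ```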