On Cognitive Preferences and the Plausibility of Rule-based Models
It is conventional wisdom in machine learning and data mining that logical
models such as rule sets are more interpretable than other models, and that
among such rule-based models, simpler models are more interpretable than more
complex ones. In this position paper, we question this latter assumption by
focusing on one particular aspect of interpretability, namely the plausibility
of models. Roughly speaking, we equate the plausibility of a model with the
likelihood that a user accepts it as an explanation for a prediction. In
particular, we argue that, all other things being equal, longer explanations
may be more convincing than shorter ones, and that the predominant bias for
shorter models, which is typically necessary for learning powerful
discriminative models, may not be suitable when it comes to user acceptance of
the learned models. To that end, we first recapitulate evidence for and against
this postulate, and then report the results of an evaluation in a
crowd-sourcing study based on about 3,000 judgments. The results do not reveal
a strong preference for simple rules, whereas we can observe a weak preference
for longer rules in some domains. We then relate these results to well-known
cognitive biases such as the conjunction fallacy, the representativeness heuristic,
or the recognition heuristic, and investigate their relation to rule length and
plausibility.
Comment: V4: Another rewrite of section on interpretability to clarify focus
on plausibility and relation to interpretability, comprehensibility, and
justifiability.
GraphCombEx: A Software Tool for Exploration of Combinatorial Optimisation Properties of Large Graphs
We present a prototype of a software tool for exploration of multiple
combinatorial optimisation problems in large real-world and synthetic complex
networks. Our tool, called GraphCombEx (an acronym of Graph Combinatorial
Explorer), provides a unified framework for scalable computation and
presentation of high-quality suboptimal solutions and bounds for a number of
widely studied combinatorial optimisation problems. Efficient representation
and applicability to large-scale graphs and complex networks are particularly
considered in its design. The problems currently supported include maximum
clique, graph colouring, maximum independent set, minimum vertex clique
covering, minimum dominating set, as well as the longest simple cycle problem.
Suboptimal solutions and intervals for optimal objective values are estimated
using scalable heuristics. The tool is designed with extensibility in mind,
with the view of further problems and both new fast and high-performance
heuristics to be added in the future. GraphCombEx has already been successfully
used as a support tool in a number of recent research studies using
combinatorial optimisation to analyse complex networks, indicating its promise
as a research software tool.
A Case Study in Matching Service Descriptions to Implementations in an Existing System
A number of companies are trying to migrate large monolithic software systems
to Service Oriented Architectures. A common approach to do this is to first
identify and describe desired services (i.e., create a model), and then to
locate portions of code within the existing system that implement the described
services. In this paper we describe a detailed case study we undertook to match
a model to an open-source business application. We describe the systematic
methodology we used, the results of the exercise, as well as several
observations that throw light on the nature of this problem. We also suggest
and validate heuristics that are likely to be useful in partially automating
the process of matching service descriptions to implementations.
Comment: 20 pages, 19 pdf figures
Technology diffusion in communication networks
The deployment of new technologies in the Internet is notoriously difficult, as evidenced by the myriad of well-developed networking technologies that still have not seen widespread adoption (e.g., secure routing, IPv6, etc.). A key hurdle is the fact that the Internet lacks a centralized authority that can mandate the deployment of a new technology. Instead, the Internet consists of thousands of nodes, each controlled by an autonomous, profit-seeking firm that will deploy a new networking technology only if it obtains sufficient local utility by doing so. For the technologies we study here, local utility depends on the set of nodes that can be reached by traversing paths consisting only of nodes that have already deployed the new technology.
To understand technology diffusion in the Internet, we propose a new model inspired by work on the spread of influence in social networks. Unlike traditional models, where a node's utility depends only on its immediate neighbors, in our model a node can be influenced by the actions of remote nodes. Specifically, we assume node v activates (i.e., deploys the new technology) when it is adjacent to a sufficiently large connected component in the subgraph induced by the set of active nodes; namely, one of size exceeding node v's threshold value \theta(v). We are interested in the problem of choosing the right seed set of nodes to activate initially, so that the rest of the nodes in the network have sufficient local utility to follow suit.
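The activation rule described above can be illustrated with a small simulation. This is only a minimal sketch, not the authors' implementation: the function names `active_component_sizes` and `simulate_diffusion`, the adjacency-dictionary representation, and the reading of "size exceeding \theta(v)" as a strict inequality are all assumptions made for illustration. Because activations never reverse, the process is monotone and the order in which nodes are processed does not change the final active set.

```python
from collections import deque


def active_component_sizes(adj, active):
    """Map each active node to the size of its connected component
    in the subgraph induced by the active nodes."""
    size_of = {}
    seen = set()
    for start in active:
        if start in seen:
            continue
        component = [start]
        seen.add(start)
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w in active and w not in seen:
                    seen.add(w)
                    component.append(w)
                    queue.append(w)
        for u in component:
            size_of[u] = len(component)
    return size_of


def simulate_diffusion(adj, theta, seeds):
    """Run the diffusion to a fixed point and return the final active set.

    adj   : dict mapping node -> set of neighbouring nodes (undirected graph)
    theta : dict mapping node -> threshold value
    seeds : iterable of initially active nodes (hypothetical seed set)
    """
    active = set(seeds)
    while True:
        size_of = active_component_sizes(adj, active)
        # Assumption: "exceeding" is read as strictly greater than theta[v].
        newly_active = {
            v
            for v in adj
            if v not in active
            and any(size_of[u] > theta[v] for u in adj[v] if u in active)
        }
        if not newly_active:
            return active
        active |= newly_active


# Toy example: a path graph 0 - 1 - 2 - 3 - 4.
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
thresholds = {0: 1, 1: 1, 2: 1, 3: 2, 4: 10}
final_active = simulate_diffusion(path, thresholds, seeds={0, 1})
```

In this toy graph, node 4 has a threshold larger than the whole network, so it stays inactive regardless of the seed set; this is the kind of case that makes the choice of seed set matter.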
We take the graph and threshold values as input to our problem. We show that our problem is NP-hard and does not admit a (1-o(1)) ln|V| approximation on general graphs. Then, we restrict our study to technology diffusion problems where (a) the maximum distance between any pair of nodes in the graph is r, and (b) there are at most \ell possible threshold values. Our set of restrictions is quite natural, given that (a) the Internet graph has constant diameter, and (b) limiting the granularity of the threshold values makes sense given the difficulty in obtaining empirical data that parameterizes deployment costs and benefits.
We present an algorithm that obtains a solution with a guaranteed approximation ratio of O(r^2 \ell \log|V|), which is asymptotically optimal given our hardness results. Our approximation algorithm is based on a linear-programming relaxation of a 0-1 integer program, combined with a novel randomized rounding scheme.
National Science Foundation (S-1017907, CCF-0915922
Towards trajectory anonymization: a generalization-based approach
Trajectory datasets are becoming popular due to the massive usage of GPS and location-based services. In this paper, we address privacy issues regarding the identification of individuals in static trajectory datasets. We first adapt the notion of k-anonymity to trajectories and propose a novel generalization-based approach for anonymization of trajectories. We further show that releasing anonymized trajectories may still have some privacy leaks. Therefore, we propose a randomization-based reconstruction algorithm for releasing anonymized trajectory data and also present how the underlying techniques can be adapted to other anonymity standards. The experimental results on real and synthetic trajectory datasets show the effectiveness of the proposed techniques.