    Using unknowns to prevent discovery of association rules

    Data mining technology has given us new capabilities to identify correlations in large data sets. This introduces risks when the data is to be made public but the correlations must remain private. We introduce a method for selectively removing individual values from a database to prevent the discovery of a set of rules, while preserving the data for other applications. The efficacy and complexity of this method are discussed. We also present an experiment showing an example of this methodology.

    Criminal Adjudication, Error Correction, and Hindsight Blind Spots

    Concerns about hindsight in the law typically arise with regard to the bias that outcome knowledge can produce. But a more difficult problem than the clear view that hindsight appears to provide is the blind spot that it actually has. Because of the conventional wisdom about error review, there is a missed opportunity to ensure meaningful scrutiny. Beyond the confirmation biases that make convictions seem inevitable lies the question whether courts can see what they are meant to assess when they do look closely for error. Standards that require a retrospective showing of materiality, prejudice, or harm turn on what a judge imagines would have happened at trial under different circumstances. The interactive nature of the fact-finding process, however, means that the effect of error can rarely be assessed with confidence. Moreover, changing paradigms in criminal procedure scholarship make accuracy and error correction newly paramount. The empirical evidence of known innocents found guilty in the criminal justice system is mounting, and many of those wrongful convictions endured because errors were reviewed under hindsight standards. New insights about the cognitive psychology of decision-making, taken together with this heightened awareness of error, suggest that it is time to reevaluate some thresholds for reversal. The problem of hindsight blindness is particularly evident in the rules concerning the discovery of exculpatory evidence, the adequacy of defense counsel, and the harmfulness of erroneous rulings at trial. The standards applied in each of those contexts share a common flaw: a barrier between the mechanism for evaluation and the source of error. This essay concludes that reviewing courts should consider the trial that actually occurred rather than what “might have been” in a different proceeding and proposes some new vocabulary for weighing error.

    Learning Language from a Large (Unannotated) Corpus

    A novel approach to the fully automated, unsupervised extraction of dependency grammars and associated syntax-to-semantic-relationship mappings from large text corpora is described. The suggested approach builds on the authors' prior work with the Link Grammar, RelEx and OpenCog systems, as well as on a number of prior papers and approaches from the statistical language learning literature. If successful, this approach would enable the mining of all the information needed to power a natural language comprehension and generation system, directly from a large, unannotated corpus. Comment: 29 pages, 5 figures, research proposal

    Double Whammy - How ICT Projects are Fooled by Randomness and Screwed by Political Intent

    The cost-benefit analysis formulates the holy trinity of objectives of project management: cost, schedule, and benefits. As our previous research has shown, ICT projects deviate from their initial cost estimate by more than 10% in 8 out of 10 cases. Academic research has argued that Optimism Bias and Black Swan Blindness cause forecasts to fall short of actual costs. Firstly, optimism bias has been linked to effects of deception and delusion, which are caused by taking the inside view and ignoring distributional information when making decisions. Secondly, we argued before that Black Swan Blindness makes decision-makers ignore outlying events even if decisions and judgements are based on the outside view. Using a sample of 1,471 ICT projects with a total value of USD 241 billion, we answer the question: Can we show the different effects of Normal Performance, Delusion, and Deception? We calculated the cumulative distribution function (CDF) of (actual-forecast)/forecast. Our results show that the CDF changes at two tipping points. The first one transforms an exponential function into a Gaussian bell curve. The second tipping point transforms the bell curve into a power law distribution with the power of 2. We argue that these results show that project performance up to the first tipping point is politically motivated and project performance above the second tipping point indicates that project managers and decision-makers are fooled by random outliers, because they are blind to thick tails. We then show that Black Swan ICT projects are a significant source of uncertainty to an organisation and that management needs to be aware of
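
    The quantity analysed in the abstract above, the CDF of (actual-forecast)/forecast, can be computed empirically as in the following minimal sketch. The function names and the four (forecast, actual) pairs are hypothetical and are not taken from the 1,471-project sample; the sketch does not attempt to locate the tipping points.

```python
def relative_overrun(actual, forecast):
    """Cost deviation relative to the forecast: (actual - forecast) / forecast."""
    return (actual - forecast) / forecast


def empirical_cdf(values):
    """Return sorted (value, cumulative fraction) pairs.

    With tied values each copy gets its own step; the last copy carries
    the usual CDF value for that point.
    """
    ordered = sorted(values)
    n = len(ordered)
    return [(x, (rank + 1) / n) for rank, x in enumerate(ordered)]


# Hypothetical (forecast, actual) cost pairs, in arbitrary currency units.
projects = [(100.0, 90.0), (100.0, 110.0), (200.0, 300.0), (50.0, 200.0)]
overruns = [relative_overrun(actual, forecast) for forecast, actual in projects]
cdf = empirical_cdf(overruns)

for value, fraction in cdf:
    print(f"overrun <= {value:+.2f}: {fraction:.2f} of projects")
```

    Plotting such pairs on linear and log-log axes is one common way to see where the shape of a distribution changes, for example where a tail starts to look like a power law.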

    A Framework for High-Accuracy Privacy-Preserving Mining

    To preserve client privacy in the data mining process, a variety of techniques based on random perturbation of data records have been proposed recently. In this paper, we present a generalized matrix-theoretic model of random perturbation, which facilitates a systematic approach to the design of perturbation mechanisms for privacy-preserving mining. Specifically, we demonstrate that (a) the prior techniques differ only in their settings for the model parameters, and (b) through appropriate choice of parameter settings, we can derive new perturbation techniques that provide highly accurate mining results even under strict privacy guarantees. We also propose a novel perturbation mechanism wherein the model parameters are themselves characterized as random variables, and demonstrate that this feature provides significant improvements in privacy at a very marginal cost in accuracy. While our model is valid for random-perturbation-based privacy-preserving mining in general, we specifically evaluate its utility here with regard to frequent-itemset mining on a variety of real datasets. The experimental results indicate that our mechanisms incur substantially lower identity and support errors as compared to the prior techniques.
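
    A matrix view of random perturbation can be sketched as follows: entry A[v][u] is the probability that a record with true value u is reported as v, the expected observed distribution is Y = A X, and the miner estimates X by inverting A. The particular matrix below (a constant off-diagonal entry x and a diagonal entry gamma * x) is an assumed illustrative choice; the abstract does not state which matrices the paper uses. All names and numbers are hypothetical.

```python
def perturbation_matrix(n, gamma):
    """n x n column-stochastic matrix: diagonal gamma * x, off-diagonal x."""
    x = 1.0 / (gamma + n - 1)
    return [[gamma * x if i == j else x for j in range(n)] for i in range(n)]


def expected_observed(matrix, true_dist):
    """Expected distribution of perturbed values: Y = A X."""
    size = len(true_dist)
    return [sum(row[j] * true_dist[j] for j in range(size)) for row in matrix]


def reconstruct(n, gamma, observed):
    """Invert Y = A X in closed form for the matrix above (needs gamma != 1).

    Since Y_i = x * ((gamma - 1) * X_i + sum(X)) and sum(X) == sum(Y),
    X_i = (Y_i / x - sum(Y)) / (gamma - 1).
    """
    x = 1.0 / (gamma + n - 1)
    total = sum(observed)
    return [(y / x - total) / (gamma - 1) for y in observed]


# Hypothetical true distribution over three categories.
true_dist = [0.5, 0.3, 0.2]
A = perturbation_matrix(3, gamma=4.0)
observed = expected_observed(A, true_dist)
estimate = reconstruct(3, 4.0, observed)
print(observed)
print(estimate)
```

    Here the reconstruction is applied to the exact expected distribution, so it recovers the input up to rounding. With real perturbed samples the observed frequencies are noisy and the estimate carries sampling error, which grows as gamma approaches 1 (stronger perturbation).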

    Efficient Privacy Preserving Distributed Clustering Based on Secret Sharing

    In this paper, we propose a privacy-preserving distributed clustering protocol for horizontally partitioned data based on a very efficient homomorphic additive secret sharing scheme. The model we use for the protocol is novel in the sense that it utilizes two non-colluding third parties. We provide a brief security analysis of our protocol from an information-theoretic point of view, which is a stronger security model. We present a communication and computation complexity analysis of our protocol along with another protocol previously proposed for the same problem. We also include experimental results for computation and communication overhead of these two protocols. Our protocol not only outperforms the others in execution time and communication overhead on data holders, but also uses a more efficient model for many data mining applications.
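
    The additive homomorphism that such a protocol relies on can be shown with a minimal sketch of additive secret sharing over integers modulo a prime. This is not the paper's protocol: the modulus, the function names, and the example values are assumptions, and the clustering steps themselves are omitted. It only shows how two third parties can add shares from several data holders without either one seeing an individual input.

```python
import secrets

PRIME = 2**61 - 1  # assumed modulus; any prime larger than the summed values works


def share(value, n_shares=2, modulus=PRIME):
    """Split value into n_shares random-looking shares that sum to it (mod modulus)."""
    shares = [secrets.randbelow(modulus) for _ in range(n_shares - 1)]
    shares.append((value - sum(shares)) % modulus)
    return shares


def add_shares(left, right, modulus=PRIME):
    """Add two share vectors position by position (the homomorphic step)."""
    return [(a + b) % modulus for a, b in zip(left, right)]


def reconstruct(shares, modulus=PRIME):
    """Recombine all shares into the underlying value."""
    return sum(shares) % modulus


# Two data holders with hypothetical local sums for one cluster coordinate.
holder_a = share(120)
holder_b = share(45)

# Third party i only ever receives the i-th share from each holder.
combined = add_shares(holder_a, holder_b)
print(reconstruct(combined))  # 165, the global sum
```

    Each single share is uniformly distributed and reveals nothing about the input on its own, which is why the two third parties must not collude.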