Using unknowns to prevent discovery of association rules
Data mining technology has given us new capabilities to identify correlations in large data sets. This introduces risks when the data is to be made public, but the correlations are private. We introduce a method for selectively removing individual values from a database to prevent the discovery of a set of rules, while preserving the data for other applications. The efficacy and complexity of this method are discussed. We also present an experiment showing an example of this methodology.
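The abstract's core idea can be sketched in a few lines. This is a hedged illustration, not the paper's algorithm: the `UNKNOWN` marker, the toy market-basket database, and the greedy one-item-per-transaction strategy are all assumptions made for the example.

```python
# Sketch: replace selected item values with an "unknown" marker so that a
# target association rule's support drops below the mining threshold,
# while the rest of the data stays intact for other applications.
UNKNOWN = "?"

def support(db, itemset):
    """Fraction of transactions that still certainly contain the itemset."""
    return sum(itemset <= t for t in db) / len(db)

def hide_rule(db, antecedent, consequent, min_support):
    """Blank out one consequent item per supporting transaction until the
    rule antecedent -> consequent can no longer be discovered."""
    target = antecedent | consequent
    for t in db:
        if support(db, target) < min_support:
            break                              # rule is already hidden
        if target <= t:
            item = next(iter(consequent & t))  # pick any consequent item
            t.discard(item)
            t.add(UNKNOWN)                     # its value becomes unknown
    return db

db = [{"bread", "butter"}, {"bread", "butter"}, {"bread", "milk"}, {"milk"}]
hide_rule(db, {"bread"}, {"butter"}, min_support=0.5)
```

After the call, the support of {bread, butter} falls below the threshold, so a standard miner no longer reports the rule, while unrelated itemsets such as {bread} are largely preserved.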
Criminal Adjudication, Error Correction, and Hindsight Blind Spots
Concerns about hindsight in the law typically arise with regard to the bias that outcome knowledge can produce. But a more difficult problem than the clear view that hindsight appears to provide is the blind spot that it actually has. Because of the conventional wisdom about error review, there is a missed opportunity to ensure meaningful scrutiny. Beyond the confirmation biases that make convictions seem inevitable lies the question whether courts can see what they are meant to assess when they do look closely for error. Standards that require a retrospective showing of materiality, prejudice, or harm turn on what a judge imagines would have happened at trial under different circumstances. The interactive nature of the fact-finding process, however, means that the effect of error can rarely be assessed with confidence. Moreover, changing paradigms in criminal procedure scholarship make accuracy and error correction newly paramount. The empirical evidence of known innocents found guilty in the criminal justice system is mounting, and many of those wrongful convictions endured because errors were reviewed under hindsight standards. New insights about the cognitive psychology of decision-making, taken together with this heightened awareness of error, suggest that it is time to reevaluate some thresholds for reversal. The problem of hindsight blindness is particularly evident in the rules concerning the discovery of exculpatory evidence, the adequacy of defense counsel, and the harmfulness of erroneous rulings at trial. The standards applied in each of those contexts share a common flaw: a barrier between the mechanism for evaluation and the source of error. This essay concludes that reviewing courts should consider the trial that actually occurred rather than what “might have been” in a different proceeding and proposes some new vocabulary for weighing error.
Learning Language from a Large (Unannotated) Corpus
A novel approach to the fully automated, unsupervised extraction of
dependency grammars and associated syntax-to-semantic-relationship mappings
from large text corpora is described. The suggested approach builds on the
authors' prior work with the Link Grammar, RelEx and OpenCog systems, as well
as on a number of prior papers and approaches from the statistical language
learning literature. If successful, this approach would enable the mining of
all the information needed to power a natural language comprehension and
generation system, directly from a large, unannotated corpus. (Comment: 29 pages, 5 figures, research proposal)
Double Whammy - How ICT Projects are Fooled by Randomness and Screwed by Political Intent
The cost-benefit analysis formulates the holy trinity of objectives of
project management - cost, schedule, and benefits. As our previous research has
shown, ICT projects deviate from their initial cost estimate by more than 10%
in 8 out of 10 cases. Academic research has argued that Optimism Bias and Black
Swan Blindness cause forecasts to fall short of actual costs. Firstly, optimism
bias has been linked to the effects of deception and delusion, which are caused by
taking the inside view and ignoring distributional information when making
decisions. Secondly, we argued before that Black Swan Blindness makes
decision-makers ignore outlying events even if decisions and judgements are
based on the outside view. Using a sample of 1,471 ICT projects with a total
value of USD 241 billion - we answer the question: Can we show the different
effects of Normal Performance, Delusion, and Deception? We calculated the
cumulative distribution function (CDF) of (actual-forecast)/forecast. Our
results show that the CDF changes at two tipping points - the first one
transforms an exponential function into a Gaussian bell curve. The second
tipping point transforms the bell curve into a power-law distribution with an
exponent of 2. We argue that these results show that project performance up to the
first tipping point is politically motivated, while project performance above the
second tipping point indicates that project managers and decision-makers are
fooled by random outliers because they are blind to thick tails. We then show
that Black Swan ICT projects are a significant source of uncertainty to an
organisation, and that management needs to be aware of this risk.
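The headline statistic of this abstract is the empirical CDF of relative cost overrun, (actual − forecast) / forecast. A minimal sketch of that computation follows; the project figures are made up for illustration, and the tipping-point analysis itself is not reproduced.

```python
# Sketch: empirical CDF of relative cost overrun, (actual - forecast) / forecast.
# The five projects below are invented illustrative data, not the paper's sample.
import numpy as np

forecast = np.array([100.0, 80.0, 120.0, 50.0, 200.0])
actual   = np.array([105.0, 60.0, 300.0, 55.0, 190.0])

overrun = (actual - forecast) / forecast    # relative deviation per project
x = np.sort(overrun)                        # CDF support points
cdf = np.arange(1, len(x) + 1) / len(x)     # P(overrun <= x)

# Share of projects deviating from the estimate by more than 10%
# in either direction (the "8 out of 10 cases" style of statistic).
deviating = np.mean(np.abs(overrun) > 0.10)
```

Plotting `cdf` against `x` on the full 1,471-project sample is what reveals the two tipping points the abstract describes.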
A Framework for High-Accuracy Privacy-Preserving Mining
To preserve client privacy in the data mining process, a variety of
techniques based on random perturbation of data records have been proposed
recently. In this paper, we present a generalized matrix-theoretic model of
random perturbation, which facilitates a systematic approach to the design of
perturbation mechanisms for privacy-preserving mining. Specifically, we
demonstrate that (a) the prior techniques differ only in their settings for the
model parameters, and (b) through appropriate choice of parameter settings, we
can derive new perturbation techniques that provide highly accurate mining
results even under strict privacy guarantees. We also propose a novel
perturbation mechanism wherein the model parameters are themselves
characterized as random variables, and demonstrate that this feature provides
significant improvements in privacy at a very marginal cost in accuracy.
While our model is valid for random-perturbation-based privacy-preserving
mining in general, we specifically evaluate its utility here with regard to
frequent-itemset mining on a variety of real datasets. The experimental results
indicate that our mechanisms incur substantially lower identity and support
errors than the prior techniques.
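To make the idea of random-perturbation-based mining concrete, here is a hedged sketch of a simple randomized-response scheme for 0/1 market-basket data. It is not the paper's matrix-theoretic model; the flip probability, function names, and toy data are assumptions chosen for illustration.

```python
# Sketch: each 0/1 entry is kept with probability p and flipped with
# probability 1 - p before release. The miner then inverts the
# perturbation in expectation to estimate the true support of an item.
import random

def perturb(bits, p, rng=random.Random(0)):
    """Flip each bit with probability 1 - p."""
    return [b if rng.random() < p else 1 - b for b in bits]

def estimate_support(perturbed, p):
    """E[observed] = (2p - 1)*s + (1 - p), so solve for the true support s."""
    observed = sum(perturbed) / len(perturbed)
    return (observed - (1 - p)) / (2 * p - 1)

true_bits = [1] * 700 + [0] * 300          # true support = 0.7
noisy = perturb(true_bits, p=0.9)
est = estimate_support(noisy, p=0.9)       # should be close to 0.7
```

The trade-off the paper studies lives in the choice of parameters like `p`: smaller `p` gives stronger privacy but noisier support estimates.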
Efficient Privacy Preserving Distributed Clustering Based on Secret Sharing
In this paper, we propose a privacy preserving distributed
clustering protocol for horizontally partitioned data based on a very efficient
homomorphic additive secret sharing scheme. The model we use
for the protocol is novel in the sense that it utilizes two non-colluding
third parties. We provide a brief security analysis of our protocol from an
information-theoretic point of view, which constitutes a stronger security model.
We give a communication and computation complexity analysis of our
protocol alongside another protocol previously proposed for the same
problem. We also include experimental results for the computation and communication
overhead of these two protocols. Our protocol not only outperforms the
other in execution time and communication overhead on the
data holders, but also uses a more efficient model for many data mining
applications.
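The building block this protocol rests on, homomorphic additive secret sharing, can be sketched briefly. This is a generic illustration under assumed parameters (the modulus and share counts are arbitrary), not the protocol itself.

```python
# Sketch: additive secret sharing over the integers modulo a public prime.
# Shares sum to the secret, but any proper subset of shares is uniformly
# random and reveals nothing about it; summing shares pointwise sums the
# secrets, which lets non-colluding parties add private values.
import random

P = 2**31 - 1  # public prime modulus (illustrative choice)

def share(secret, n, rng=random.Random(42)):
    """Split `secret` into n additive shares modulo P."""
    shares = [rng.randrange(P) for _ in range(n - 1)]
    shares.append((secret - sum(shares)) % P)
    return shares

def reconstruct(shares):
    return sum(shares) % P

# Two data holders' private values are added without either being revealed.
a, b = share(12, 3), share(30, 3)
total = reconstruct([(x + y) % P for x, y in zip(a, b)])  # 42
```

This additive homomorphism is what lets the two third parties aggregate cluster statistics from the data holders' shares without learning any individual record.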
Electronic Discovery/Disclosure: From Litigation to International Commercial Arbitration