A Framework for High-Accuracy Privacy-Preserving Mining
To preserve client privacy in the data mining process, a variety of
techniques based on random perturbation of data records have been proposed
recently. In this paper, we present a generalized matrix-theoretic model of
random perturbation, which facilitates a systematic approach to the design of
perturbation mechanisms for privacy-preserving mining. Specifically, we
demonstrate that (a) the prior techniques differ only in their settings for the
model parameters, and (b) through appropriate choice of parameter settings, we
can derive new perturbation techniques that provide highly accurate mining
results even under strict privacy guarantees. We also propose a novel
perturbation mechanism wherein the model parameters are themselves
characterized as random variables, and demonstrate that this feature provides
significant improvements in privacy at a very marginal cost in accuracy.
While our model is valid for random-perturbation-based privacy-preserving
mining in general, we specifically evaluate its utility here with regard to
frequent-itemset mining on a variety of real datasets. The experimental results
indicate that our mechanisms incur substantially lower identity and support
errors than the prior techniques.
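As a rough illustration of the matrix view of random perturbation described in this abstract, the sketch below perturbs categorical values with a transition matrix and then estimates the original counts from the perturbed ones. The specific matrix (a constant diagonal entry gamma*x and a constant off-diagonal entry x), the function names `perturb` and `reconstruct`, and the parameter `gamma` are illustrative assumptions, not necessarily the settings evaluated in the paper.

```python
import random
from collections import Counter

def perturb(value, domain, gamma, rng):
    """Randomly replace `value` according to an assumed perturbation matrix.

    Assumed matrix: A[v][u] = gamma * x if v == u, else x,
    with x = 1 / (gamma + n - 1), so every column sums to 1.
    """
    n = len(domain)
    x = 1.0 / (gamma + n - 1)
    if rng.random() < gamma * x:
        return value  # keep the true value
    # Otherwise pick uniformly among the other n - 1 values (probability x each).
    return rng.choice([d for d in domain if d != value])

def reconstruct(perturbed_counts, domain, gamma):
    """Estimate the original counts X from the perturbed counts Y.

    Since E[Y_v] = x * ((gamma - 1) * X_v + N), with N the total count,
    X_v is estimated as (Y_v / x - N) / (gamma - 1). Requires gamma != 1.
    """
    n = len(domain)
    x = 1.0 / (gamma + n - 1)
    total = sum(perturbed_counts.get(d, 0) for d in domain)
    return {
        d: (perturbed_counts.get(d, 0) / x - total) / (gamma - 1)
        for d in domain
    }

if __name__ == "__main__":
    rng = random.Random(0)
    domain = ["a", "b", "c"]
    data = ["a"] * 600 + ["b"] * 300 + ["c"] * 100
    noisy = Counter(perturb(v, domain, 4.0, rng) for v in data)
    # The estimates are noisy but should be close to 600 / 300 / 100.
    print(reconstruct(noisy, domain, 4.0))
```

A larger `gamma` keeps more values unchanged, which tends to improve accuracy and weaken privacy; the miner only ever sees the perturbed values and the matrix.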
Using association rule mining to enrich semantic concepts for video retrieval
In order to achieve true content-based information retrieval on video we should analyse and index video with
high-level semantic concepts in addition to using user-generated tags and structured metadata like title, date,
etc. However, the range of such high-level semantic concepts, detected either manually or automatically, is
usually limited compared to the richness of information content in video and the potential vocabulary of
available concepts for indexing. Even though there is work to improve the performance of individual concept
classifiers, we should strive to make the best use of whatever partial sets of semantic concept occurrences
are available to us. We describe in this paper our method for using association rule mining to automatically
enrich the representation of video content through a set of semantic concepts based on concept co-occurrence
patterns. We describe our experiments on the TRECVid 2005 video corpus annotated with the 449 concepts
of the LSCOM ontology. The evaluation of our results shows the usefulness of our approach.
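The idea of enriching a partial concept annotation from co-occurrence patterns can be sketched as follows. This is a minimal illustration restricted to single-antecedent rules; the function names `mine_pair_rules` and `enrich`, the thresholds, and the toy concept labels are assumptions and do not come from the paper or the LSCOM ontology.

```python
from collections import Counter
from itertools import permutations

def mine_pair_rules(shots, min_support, min_confidence):
    """Mine rules a -> b from concept co-occurrence in annotated shots.

    `shots` is a list of sets of concept labels. A rule a -> b is kept when
    support(a, b) >= min_support and confidence(a -> b) >= min_confidence.
    """
    n = len(shots)
    single = Counter(c for shot in shots for c in shot)
    pairs = Counter(p for shot in shots for p in permutations(shot, 2))
    rules = {}
    for (a, b), count in pairs.items():
        if count / n >= min_support and count / single[a] >= min_confidence:
            rules.setdefault(a, set()).add(b)
    return rules

def enrich(shot, rules):
    """Add every concept implied by a rule whose antecedent is already present."""
    enriched = set(shot)
    for concept in shot:
        enriched |= rules.get(concept, set())
    return enriched

if __name__ == "__main__":
    annotated = [
        {"sky", "outdoor"},
        {"sky", "outdoor", "building"},
        {"outdoor", "building"},
        {"indoor", "face"},
    ]
    rules = mine_pair_rules(annotated, min_support=0.5, min_confidence=0.9)
    print(enrich({"sky"}, rules))
```

Here `enrich` applies the rules in a single pass and does not chain them; whether chaining helps would depend on how reliable the mined rules are.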
Deriving query suggestions for site search
Modern search engines have been moving away from simplistic interfaces that aimed at satisfying a user's need with a single-shot query. Interactive features are now integral parts of web search engines. However, generating good query modification suggestions remains a challenging issue. Query log analysis is one of the major strands of work in this direction. Although much research has been performed on query logs collected on the web as a whole, query log analysis to enhance search on smaller and more focused collections has attracted less attention, despite its increasing practical importance. In this article, we report on a systematic study of different query modification methods applied to a substantial query log collected on a local website that already uses an interactive search engine. We conducted experiments in which we asked users to assess the relevance of potential query modification suggestions that have been constructed using a range of log analysis methods and different baseline approaches. The experimental results demonstrate the usefulness of log analysis to extract query modification suggestions. Furthermore, our experiments demonstrate that a more fine-grained approach than grouping search requests into sessions allows for extraction of better refinement terms from query log files. © 2013 ASIS&T
FixMiner: Mining Relevant Fix Patterns for Automated Program Repair
Patching is a common activity in software development. It is generally
performed on a source code base to address bugs or add new functionalities. In
this context, given the recurrence of bugs across projects, the associated
similar patches can be leveraged to extract generic fix actions. While the
literature includes various approaches leveraging similarity among patches to
guide program repair, these approaches often do not yield fix patterns that are
tractable and reusable as actionable input to APR systems. In this paper, we
propose a systematic and automated approach to mining relevant and actionable
fix patterns based on an iterative clustering strategy applied to atomic
changes within patches. The goal of FixMiner is thus to infer separate and
reusable fix patterns that can be leveraged in other patch generation systems.
Our technique, FixMiner, leverages Rich Edit Script which is a specialized tree
structure of the edit scripts that captures the AST-level context of the code
changes. FixMiner uses different tree representations of Rich Edit Scripts for
each round of clustering to identify similar changes. These are abstract syntax
trees, edit actions trees, and code context trees. We have evaluated FixMiner
on thousands of software patches collected from open source projects.
Preliminary results show that we are able to mine accurate patterns,
efficiently exploiting change information in Rich Edit Scripts. We further
integrated the mined patterns to an automated program repair prototype,
PARFixMiner, with which we are able to correctly fix 26 bugs of the Defects4J
benchmark. Beyond this quantitative performance, we show that the mined fix
patterns are sufficiently relevant to produce patches with a high probability
of correctness: 81% of PARFixMiner's generated plausible patches are correct.
Comment: 31 pages, 11 figures
Secure and Distributed Approach for Mining Association Rules
Data mining is the process of extracting trends from data sources. Domain experts can use these trends to derive business intelligence. Large organizations store data on multiple servers, and the data is often horizontally distributed. Mining such databases provides useful and actionable knowledge that supports well-informed decisions. Secure mining of association rules, in particular, can provide interesting information that helps enterprises make expert decisions. In this paper, we propose an algorithm and a secure mechanism for mining association rules to derive knowledge. We also incorporate auditing of data into the proposed system. We built a prototype application that demonstrates secure mining of association rules with support and confidence, statistical measures that indicate the usefulness of the rules. The empirical results are encouraging.
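The abstract does not spell out its secure mechanism, so the sketch below only illustrates the general setting: support and confidence of a rule computed over horizontally partitioned data, with the per-site counts combined through a simple masked sum. The masking scheme is a common textbook technique swapped in for illustration, not the paper's algorithm, and all names (`local_count`, `secure_sum`, `global_support`, `global_confidence`) are hypothetical.

```python
import random

def local_count(transactions, itemset):
    """Number of local transactions that contain every item in `itemset`."""
    items = set(itemset)
    return sum(1 for t in transactions if items <= set(t))

def secure_sum(values, modulus, rng):
    """Masked ring sum (assumed technique, not taken from the paper).

    The initiator starts from a random mask, each site adds its private value
    modulo `modulus`, and the initiator removes the mask at the end, so no
    site sees another site's raw count. Assumes the true sum is below `modulus`.
    """
    mask = rng.randrange(modulus)
    running = mask
    for v in values:  # in a real deployment each step runs at a different site
        running = (running + v) % modulus
    return (running - mask) % modulus

def global_support(sites, itemset, rng, modulus=2**32):
    """Fraction of all transactions, across sites, that contain `itemset`."""
    hits = secure_sum([local_count(s, itemset) for s in sites], modulus, rng)
    total = secure_sum([len(s) for s in sites], modulus, rng)
    return hits / total

def global_confidence(sites, antecedent, consequent, rng):
    """Confidence of antecedent -> consequent over the combined data."""
    both = global_support(sites, set(antecedent) | set(consequent), rng)
    return both / global_support(sites, antecedent, rng)

if __name__ == "__main__":
    rng = random.Random(0)
    sites = [
        [{"a", "b"}, {"a"}],
        [{"a", "b"}, {"b"}, {"a", "b", "c"}],
    ]
    print(global_support(sites, {"a", "b"}, rng))       # 3 of 5 transactions
    print(global_confidence(sites, {"a"}, {"b"}, rng))  # 3 of the 4 containing "a"
```

This simulation runs in a single process; it shows the arithmetic only and makes no claim about the auditing or security properties of the proposed system.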
Sound mining in the North: a guide to environmental regulation and best practices supporting social sustainability
Published version