7,924 research outputs found
State of the Art in Privacy Preserving Data Mining
Privacy is one of the most important properties an information system must satisfy. A relatively new trend shows that classical
access control techniques are not sufficient to guarantee privacy when Data Mining techniques are used. Such a trend, especially in the context of public databases, or in the context of sensible information related to critical infrastructures, represents, nowadays a not negligible thread. Privacy Preserving Data Mining (PPDM) algorithms have been recently introduced with the aim of modifying the database in such a way to prevent the discovery of sensible information. This is a very complex task and there exist in the scientific literature some different approaches to the problem. In this work we present a "Survey" of the current PPDM methodologies which seem promising for the future.JRC.G.6-Sensors, radar technologies and cybersecurit
Market Basket Analysis in the Financial Sector – A Customer Centric Approach
Organizations often struggle with their efforts to implement data mining projects successfully. This is often due to the fact that they are influenced by success stories of others that glamorize the outcome of successful initiatives, while understating the persistent rigour and diligence required. Although process models exist for the knowledge discovery process their focus is often on outlining the activities that must be done and not on describing how they should be done. While there is some research in addressing how to carry out the various tasks in the phases, the data preparation phase is thought to be the most challenging and is often described as an art rather than a science. In this study we apply a multi-phased integrated knowledge discovery and data mining process model (IKDDM) to a data set from the financial sector and a present a new approach to data preparation for Sequential Patterns (SP) that facilitated the identification of customer focused patterns rather than products focussed patterns in the modelling phase
XML content warehousing: Improving sociological studies of mailing lists and web data
In this paper, we present the guidelines for an XML-based approach for the
sociological study of Web data such as the analysis of mailing lists or
databases available online. The use of an XML warehouse is a flexible solution
for storing and processing this kind of data. We propose an implemented
solution and show possible applications with our case study of profiles of
experts involved in W3C standard-setting activity. We illustrate the
sociological use of semi-structured databases by presenting our XML Schema for
mailing-list warehousing. An XML Schema allows many adjunctions or crossings of
data sources, without modifying existing data sets, while allowing possible
structural evolution. We also show that the existence of hidden data implies
increased complexity for traditional SQL users. XML content warehousing allows
altogether exhaustive warehousing and recursive queries through contents, with
far less dependence on the initial storage. We finally present the possibility
of exporting the data stored in the warehouse to commonly-used advanced
software devoted to sociological analysis
New Fundamental Technologies in Data Mining
The progress of data mining technology and large public popularity establish a need for a comprehensive text on the subject. The series of books entitled by "Data Mining" address the need by presenting in-depth description of novel mining algorithms and many useful applications. In addition to understanding each section deeply, the two books present useful hints and strategies to solving problems in the following chapters. The contributing authors have highlighted many future research directions that will foster multi-disciplinary collaborations and hence will lead to significant development in the field of data mining
An Evaluation of the Use of Diversity to Improve the Accuracy of Predicted Ratings in Recommender Systems
The diversity; versus accuracy trade off, has become an important area of research within recommender systems as online retailers attempt to better serve their customers and gain a competitive advantage through an improved customer experience. This dissertation attempted to evaluate the use of diversity measures in predictive models as a means of improving predicted ratings. Research literature outlines a number of influencing factors such as personality, taste, mood and social networks in addition to approaches to the diversity challenge post recommendation. A number of models were applied included DecisionStump, Linear Regression, J48 Decision Tree and Naive Bayes. Various evaluation metrics such as precision, recall, ROC area, mean squared error and correlation coefficient were used to evaluate the model types. The results were below a benchmark selected during the literature review. The experiment did not demonstrate that diversity measures as inputs improve the accuracy of predicted ratings. However, the evaluation results for the model without diversity measures were low also and comparable to those with diversity indicating that further research in this area may be worthwhile. While the experiment conducted did not clearly demonstrate that the inclusion of diversity measures as inputs improve the accuracy of predicted ratings, approaches to data extraction, pre-processing, and model selection could inform further research. Areas of further research identified within this paper may also add value for those interested in this topic
- …