22 research outputs found

    Min Max Normalization Based Data Perturbation Method for Privacy Protection

    Data mining systems contain large amounts of private and sensitive data, such as healthcare, financial, and criminal records. These data cannot be shared with everyone, so data mining systems need privacy protection to avoid leaking them. Data perturbation is one of the best methods for privacy preservation. We use a data perturbation method to preserve both privacy and accuracy: individual data values are distorted before the data mining application runs. In this paper we present a data perturbation method based on the min-max normalization transformation. Privacy parameters measure the degree of privacy protection, and a utility measure shows the performance of the data mining technique after data distortion. We performed experiments on a real-life dataset, and the results show that the min-max normalization based data perturbation method effectively protects confidential information while maintaining the performance of the data mining technique after data distortion.
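
    The abstract does not spell out the transformation, so the following is only a minimal sketch of the textbook min-max normalization it refers to, assuming each attribute is rescaled independently into a chosen range. The function name `min_max_perturb`, the target range, and the sample values are illustrative assumptions, not taken from the paper.

```python
def min_max_perturb(values, new_min=0.0, new_max=1.0):
    """Rescale one attribute into [new_min, new_max] (standard min-max formula).

    Hypothetical helper for illustration; the paper's exact procedure and
    parameter choices are not given in the abstract.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no spread; map every value to new_min.
        return [new_min for _ in values]
    scale = (new_max - new_min) / (hi - lo)
    return [new_min + (v - lo) * scale for v in values]


# Made-up sample attribute (e.g. ages); the original values are hidden,
# but their relative order and spacing are preserved.
ages = [23, 35, 47, 59]
distorted = min_max_perturb(ages, new_min=10.0, new_max=20.0)
print(distorted)
```

    Because the mapping is linear and order-preserving, distance-based mining techniques tend to behave similarly on the distorted values, which is one plausible reading of how the method keeps utility while hiding the raw data.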

    Updating/downdating the NonNegative Matrix Factorization

    This is the author’s version of a work that was accepted for publication in Journal of Computational and Applied Mathematics. Changes resulting from the publishing process, such as peer review, editing, corrections, structural formatting, and other quality control mechanisms may not be reflected in this document. Changes may have been made to this work since it was submitted for publication. A definitive version was subsequently published in Journal of Computational and Applied Mathematics 318 (2017) 59–68. DOI 10.1016/j.cam.2016.11.048.
    The Non-Negative Matrix Factorization (NNMF) is a recent numerical tool that, given a nonnegative data matrix, approximates it as the product of two nonnegative matrices. This factorization is now used in many scientific fields, some of which require real-time computation of the NNMF. In some scenarios, not all data is initially available, and when new data (as new rows or columns) becomes available the NNMF must be recomputed. Recomputing the whole factorization every time is very costly and not suitable for real-time applications. In this paper we propose several algorithms that update the NNMF by taking advantage of the previously computed factorizations, with similar error and lower computational cost.
    © 2016 Elsevier B.V. All rights reserved.
    This work has been partially supported by EU together with Spanish Government through TEC2015-67387-C4-1-R (MINECO/FEDER), by Generalitat Valenciana through PROMETEOII/2014/003 and by Programa de FPU del Ministerio de Educacion, Cultura y Deporte FPU13/03828 (Spain). We want to thank Dr. Pedro Vera and his team (University of Jaen) for providing us with their music analysis software.
    San Juan Sebastián, P.; Vidal Maciá, AM.; García Mollá, VM. (2016). Updating/downdating the NonNegative Matrix Factorization. Journal of Computational and Applied Mathematics. 318:59-68. https://doi.org/10.1016/j.cam.2016.11.048
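
    To illustrate the idea of reusing an earlier factorization when new columns arrive, here is a rough sketch using the classic multiplicative update rules with a warm start. It is not one of the paper's algorithms, which the abstract does not describe; the function name `nmf`, the matrix sizes, the iteration counts, and the random data are all assumptions made for the example.

```python
import numpy as np


def nmf(V, W, H, iters=200, eps=1e-9):
    """Multiplicative-update NMF (Frobenius norm), starting from W and H.

    Generic textbook scheme used as a stand-in; eps guards against
    division by zero. Inputs are copied so the caller's arrays are untouched.
    """
    W, H = W.copy(), H.copy()
    for _ in range(iters):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


rng = np.random.default_rng(0)
k = 3                                   # assumed factorization rank
V = rng.random((6, 8))                  # initial nonnegative data
W, H = nmf(V, rng.random((6, k)), rng.random((k, 8)))

# Two new columns arrive: keep W and the old part of H as the starting
# point, and initialize only the new columns of H randomly.
V_ext = np.hstack([V, rng.random((6, 2))])
H_ext = np.hstack([H, rng.random((k, 2))])
W2, H2 = nmf(V_ext, W, H_ext, iters=20)  # far fewer iterations than a cold start

print(np.linalg.norm(V_ext - W2 @ H2))   # reconstruction error after the update
```

    The warm start is the point of the sketch: most of the factorization is already close to a good solution, so a short refinement can stand in for a full recomputation.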

    Optimized Data Aggregation Method for Time, Privacy and Effort Reduction in Wireless Sensor Network

    Wireless sensor networks (WSNs) have gained wide application in recent years, for example in intelligent transportation systems, medical care, disaster rescue, and structural health monitoring. Because WSNs are multi-hop networks and their sink nodes need to gather every sensor node’s data, data aggregation is emerging as a critical function. Reducing the latency of data aggregation attracts much research because many of these applications are urgent and event-driven. This paper considers a data aggregation method for WSNs that optimizes the required time and maintains privacy while keeping the aggregation effort low, and proposes a method to solve this problem.

    Towards Name Disambiguation: Relational, Streaming, and Privacy-Preserving Text Data

    In the real world, our DNA is unique but many people share names. This phenomenon often causes erroneous aggregation of documents of multiple persons who are namesakes of one another. Such mistakes deteriorate the performance of document retrieval and web search and, more seriously, cause improper attribution of credit or blame in digital forensics. To resolve this issue, the name disambiguation task is designed to partition the documents associated with a name reference such that each partition contains documents pertaining to a unique real-life person. Existing algorithms for this task mainly suffer from the following drawbacks. First, the majority of existing solutions rely substantially on feature engineering, such as biographical feature extraction or construction of auxiliary features from Wikipedia. However, in many scenarios such features may be costly to obtain or unavailable in privacy-sensitive domains. Instead, we solve the name disambiguation task in a restricted setting by leveraging only the relational data in the form of anonymized graphs. Second, most existing works for this task operate in a batch mode, where all records to be disambiguated are initially available to the algorithm. However, more realistic settings require that the name disambiguation task be performed in an online streaming fashion in order to identify records of new ambiguous entities having no preexisting records. Finally, we investigate the potential disclosure risk of textual features used in name disambiguation and propose several algorithms to tackle the task in a privacy-aware scenario. In summary, in this dissertation, we present a number of novel approaches to address name disambiguation tasks from the above three aspects independently, namely relational, streaming, and privacy-preserving textual data.

    MATRIX DECOMPOSITION FOR DATA DISCLOSURE CONTROL AND DATA MINING APPLICATIONS

    Access to huge amounts of various data with private information brings out a dual demand for preservation of data privacy and correctness of knowledge discovery, which are two apparently contradictory tasks. Low-rank approximations generated by matrix decompositions are a fundamental element in this dissertation for the privacy preserving data mining (PPDM) applications. Two categories of PPDM are studied: data value hiding (DVH) and data pattern hiding (DPH). A matrix-decomposition-based framework is designed to incorporate matrix decomposition techniques into data preprocessing to distort original data sets. With respect to the challenge in the DVH, how to protect sensitive/confidential attribute values without jeopardizing underlying data patterns, we propose singular value decomposition (SVD)-based and nonnegative matrix factorization (NMF)-based models. Some discussion on data distortion and data utility metrics is presented. Our experimental results on benchmark data sets demonstrate that our proposed models have potential for outperforming standard data perturbation models regarding the balance between data privacy and data utility. Based on an equivalence between the NMF and K-means clustering, a simultaneous data value and pattern hiding strategy is developed for data mining activities using K-means clustering. Three schemes are designed to make a slight alteration on submatrices such that user-specified cluster properties of data subjects are hidden. Performance evaluation demonstrates the efficacy of the proposed strategy since some optimal solutions can be computed with zero side effects on nonconfidential memberships. Accordingly, the protection of privacy is simplified by one modified data set with enhanced performance by this dual privacy protection. In addition, an improved incremental SVD-updating algorithm is applied to speed up the real-time performance of the SVD-based model for frequent data updates. 
    The performance and effectiveness of the improved algorithm have been examined on synthetic and real data sets. Experimental results indicate that the introduction of the incremental matrix decomposition produces a significant speedup. It also provides potential support for the use of the SVD technique in the On-Line Analytical Processing for business data analysis.
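
    The SVD-based data value hiding mentioned in this abstract rests on low-rank approximation. The snippet below is only a bare-bones sketch of that building block, assuming the distorted data set is simply the rank-k truncation of the original matrix; the dissertation's actual models may include further steps. The function name `svd_distort`, the chosen rank, and the random data are illustrative assumptions.

```python
import numpy as np


def svd_distort(A, rank):
    """Return the best rank-`rank` approximation of A (truncated SVD).

    Hypothetical helper: small singular components are dropped, which
    perturbs individual values while keeping the dominant structure.
    """
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    # Scale the first `rank` left singular vectors by their singular values,
    # then project back with the matching right singular vectors.
    return (U[:, :rank] * s[:rank]) @ Vt[:rank, :]


rng = np.random.default_rng(1)
A = rng.random((5, 4))        # made-up data: 5 records, 4 attributes
D = svd_distort(A, rank=2)    # distorted data released for mining

# Relative size of the distortion introduced by the truncation.
print(np.linalg.norm(A - D) / np.linalg.norm(A))
```

    Choosing the rank is the privacy and utility trade-off in this sketch: a lower rank distorts individual values more but discards more of the underlying pattern.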

    A Methodology for Sensitive Attribute Discrimination Prevention in Data Mining

    Today, data mining is an increasingly important technology. It is the process of extracting useful knowledge from large collections of data. There are some negative views of data mining, among them potential privacy invasion and potential discrimination. Discrimination means treating people unequally or unfairly on the basis of their belonging to a specific group. If data sets are divided on the basis of sensitive attributes such as gender, race, or religion, discriminatory decisions may ensue. For this reason, antidiscrimination laws for discrimination prevention have been introduced for data mining. Discrimination can be either direct or indirect. Direct discrimination occurs when decisions are made based on sensitive attributes. It consists of rules or procedures that explicitly mention minority or disadvantaged groups based on sensitive discriminatory attributes related to group membership. Indirect discrimination occurs when decisions are made based on nonsensitive attributes that are strongly correlated with biased sensitive ones. It consists of rules or procedures that, while not explicitly mentioning discriminatory attributes, could intentionally or unintentionally generate discriminatory decisions.