Min Max Normalization Based Data Perturbation Method for Privacy Protection
Data mining systems contain large amounts of private and sensitive data, such as healthcare, financial, and criminal records. Because these data cannot be shared with everyone, privacy protection is required in data mining systems to avoid privacy leakage. Data perturbation is one of the best-known methods for privacy preservation. We use data perturbation to preserve privacy as well as accuracy: individual data values are distorted before the data mining application is run. In this paper we present a data perturbation method based on the min-max normalization transformation. Privacy parameters measure the level of privacy protection, and a utility measure shows the performance of the data mining technique after data distortion. We performed experiments on a real-life dataset, and the results show that the min-max normalization based data perturbation method effectively protects confidential information while maintaining the performance of the data mining technique after distortion.
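As a rough illustration of the idea (not the authors' exact procedure), min-max normalization maps each attribute into a chosen target range, hiding raw magnitudes while preserving each attribute's ordering, which is what keeps mining utility intact:

```python
import numpy as np

def min_max_perturb(X, new_min=0.0, new_max=1.0):
    """Map each column of X into [new_min, new_max].

    Hypothetical sketch of min-max normalization based perturbation:
    raw values are distorted, but rank order within each attribute
    is preserved, so many mining tasks are unaffected.
    """
    X = np.asarray(X, dtype=float)
    col_min = X.min(axis=0)
    col_max = X.max(axis=0)
    span = np.where(col_max > col_min, col_max - col_min, 1.0)
    return (X - col_min) / span * (new_max - new_min) + new_min

# Toy data: age and salary columns (values invented for illustration).
X = np.array([[30.0, 64000.0],
              [45.0, 98000.0],
              [60.0, 52000.0]])
X_pert = min_max_perturb(X, new_min=0.0, new_max=1.0)
```

After the transformation the published values no longer reveal the original magnitudes, yet the relative ordering of records per attribute is unchanged.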
Updating/downdating the NonNegative Matrix Factorization
This is the author’s accepted version of a work published in Journal of Computational and Applied Mathematics 318 (2017) 59–68, DOI 10.1016/j.cam.2016.11.048; changes resulting from the publishing process (peer review, editing, corrections, formatting) may not be reflected in this document.

The Non-Negative Matrix Factorization (NNMF) is a recent numerical tool that, given a nonnegative
data matrix, tries to obtain its factorization as the approximate product of two
nonnegative matrices. Nowadays, this factorization is being used in many science fields;
in some of these fields, real-time computation of the NNMF is required. In some scenarios,
all data is not initially available and when new data (as new rows or columns) becomes
available the NNMF must be recomputed. Recomputing the whole factorization every time
is very costly and not suitable for real time applications. In this paper we propose several
algorithms to update the NNMF factorization taking advantage of the previously computed
factorizations, with similar error and lower computational cost.
© 2016 Elsevier B.V. All rights reserved.

This work has been partially supported by the EU together with the Spanish Government through TEC2015-67387-C4-1-R (MINECO/FEDER), by Generalitat Valenciana through PROMETEOII/2014/003, and by the Programa de FPU del Ministerio de Educación, Cultura y Deporte, FPU13/03828 (Spain). We thank Dr. Pedro Vera and his team (University of Jaén) for providing us with their music analysis software.

San Juan Sebastián, P.; Vidal Maciá, A. M.; García Mollá, V. M. (2016). Updating/downdating the NonNegative Matrix Factorization. Journal of Computational and Applied Mathematics, 318:59–68. https://doi.org/10.1016/j.cam.2016.11.048
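The updating idea can be sketched with standard Lee–Seung multiplicative updates: when new rows arrive, warm-start the stacked problem from the previously computed factors instead of refactorizing from scratch. This is only a minimal illustration of the warm-start principle, not the paper's algorithms:

```python
import numpy as np

rng = np.random.default_rng(0)

def nmf_multiplicative(A, r, iters=200, W=None, H=None):
    """Basic multiplicative-update NMF (Lee-Seung); warm starts allowed.

    The objective ||A - W @ H||_F is non-increasing under these updates.
    """
    m, n = A.shape
    W = rng.random((m, r)) if W is None else W
    H = rng.random((r, n)) if H is None else H
    eps = 1e-9  # avoid division by zero
    for _ in range(iters):
        H *= (W.T @ A) / (W.T @ W @ H + eps)
        W *= (A @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Initial factorization of the available data A.
A = np.abs(rng.random((40, 20)))
W, H = nmf_multiplicative(A, r=5)

# New rows B arrive: reuse the old factors for the stacked matrix and only
# initialize the new rows of W randomly, then refine with fewer iterations.
B = np.abs(rng.random((8, 20)))
A2 = np.vstack([A, B])
W2, H2 = nmf_multiplicative(A2, r=5,
                            W=np.vstack([W, rng.random((8, 5))]),
                            H=H.copy(), iters=50)
```

Because the old factors are already close to a good solution, far fewer refinement iterations are needed than for a cold restart, which is the cost advantage the abstract describes.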
Optimized Data Aggregation Method for Time, Privacy and Effort Reduction in Wireless Sensor Network
Wireless sensor networks (WSNs) have gained wide application in recent years, for example in intelligent transportation systems, medical care, disaster rescue, and structural health monitoring. Since WSNs are multi-hop networks whose sink nodes must gather every sensor node’s data, data aggregation is emerging as a critical function for WSNs, and reducing its latency attracts much research because many applications are time-critical. This paper considers a data aggregation method that optimizes the required time and maintains privacy while keeping the aggregation effort low, and proposes a method to solve this problem.
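A minimal sketch of in-network aggregation (topology and readings are invented for illustration): each node combines its children's aggregates with its own reading before forwarding, so every link carries a single value instead of all raw readings, which is what reduces both traffic and latency:

```python
def aggregate(tree, readings, node, op=max):
    """Recursively combine a node's reading with its children's aggregates.

    `tree` maps a node id to its child ids; `readings` maps a node id to
    its sensed value; `op` is an associative combiner (max, sum, ...).
    Each link in the routing tree carries exactly one combined value.
    """
    acc = readings[node]
    for child in tree.get(node, []):
        acc = op(acc, aggregate(tree, readings, child, op))
    return acc

# Sink 0 with two relay nodes, each serving two leaf sensors.
tree = {0: [1, 2], 1: [3, 4], 2: [5, 6]}
readings = {0: 21, 1: 19, 2: 25, 3: 30, 4: 18, 5: 22, 6: 27}
print(aggregate(tree, readings, 0))  # maximum over all sensors: 30
```

With seven sensors, the sink receives two combined values instead of six raw forwarded readings; privacy-aware variants additionally perturb or encrypt each partial aggregate before forwarding.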
Towards Name Disambiguation: Relational, Streaming, and Privacy-Preserving Text Data
In the real world, our DNA is unique but many people share names. This phenomenon often causes erroneous aggregation of documents of multiple persons who are namesakes of one another. Such mistakes deteriorate the performance of document retrieval and web search and, more seriously, cause improper attribution of credit or blame in digital forensics. To resolve this issue, the name disambiguation task is designed to partition the documents associated with a name reference such that each partition contains documents pertaining to a unique real-life person. Existing algorithms for this task mainly suffer from the following drawbacks. First, the majority of existing solutions rely substantially on feature engineering, such as biographical feature extraction or construction of auxiliary features from Wikipedia. However, in many scenarios such features may be costly to obtain or unavailable in privacy-sensitive domains. Instead, we solve the name disambiguation task in a restricted setting by leveraging only the relational data in the form of anonymized graphs. Second, most of the existing works for this task operate in a batch mode, where all records to be disambiguated are initially available to the algorithm. However, more realistic settings require that the name disambiguation task be performed in an online streaming fashion in order to identify records of new ambiguous entities having no preexisting records. Finally, we investigate the potential disclosure risk of textual features used in name disambiguation and propose several algorithms to tackle the task in a privacy-aware scenario. In summary, in this dissertation we present a number of novel approaches to address name disambiguation tasks from the above three aspects independently, namely relational, streaming, and privacy-preserving textual data.
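A toy sketch of the relational setting: treat the documents of one ambiguous name as nodes of an anonymized graph and partition them by connectivity (union-find). This illustrates only the input format and the partitioning goal, not the dissertation's actual method:

```python
from collections import defaultdict

def partition_records(n_docs, edges):
    """Partition documents of one ambiguous name into clusters.

    Uses only anonymized relational links: an edge (a, b) means documents
    a and b share enough anonymized neighbors (e.g. collaborators) to be
    linked. Union-find groups connected documents into one real person.
    """
    parent = list(range(n_docs))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for a, b in edges:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb

    clusters = defaultdict(list)
    for d in range(n_docs):
        clusters[find(d)].append(d)
    return sorted(clusters.values())

# Docs 0-2 share anonymized collaborators; docs 3-4 share different ones.
print(partition_records(5, [(0, 1), (1, 2), (3, 4)]))  # [[0, 1, 2], [3, 4]]
```

Nothing in the input identifies any person; the partition is recovered purely from graph structure, which is what makes the restricted relational setting privacy-friendly.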
MATRIX DECOMPOSITION FOR DATA DISCLOSURE CONTROL AND DATA MINING APPLICATIONS
Access to huge amounts of various data with private information brings out a dual demand for preservation of data privacy and correctness of knowledge discovery, which are two apparently contradictory tasks. Low-rank approximations generated by matrix decompositions are a fundamental element in this dissertation for privacy-preserving data mining (PPDM) applications. Two categories of PPDM are studied: data value hiding (DVH) and data pattern hiding (DPH). A matrix-decomposition-based framework is designed to incorporate matrix decomposition techniques into data preprocessing to distort original data sets. With respect to the central challenge in DVH, how to protect sensitive/confidential attribute values without jeopardizing underlying data patterns, we propose singular value decomposition (SVD)-based and nonnegative matrix factorization (NMF)-based models. Some discussion on data distortion and data utility metrics is presented. Our experimental results on benchmark data sets demonstrate that our proposed models have the potential to outperform standard data perturbation models regarding the balance between data privacy and data utility.
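The SVD-based distortion idea can be sketched as releasing a low-rank approximation of the data: dominant patterns survive while individual values are perturbed. This is a minimal illustration of the principle, not the proposed model itself:

```python
import numpy as np

def svd_distort(X, rank):
    """Release a rank-k approximation instead of the raw data.

    Truncating the SVD keeps the dominant structure (data patterns, which
    preserves mining utility) while every individual entry is distorted
    (which provides value hiding).
    """
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U[:, :rank] * s[:rank]) @ Vt[:rank, :]

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 8))   # toy data matrix: 50 records, 8 attributes
X_k = svd_distort(X, rank=3)   # published, distorted version of X
```

The choice of rank trades off the two demands: a smaller rank distorts values more (stronger privacy) but discards more of the pattern structure (lower utility).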
Based on an equivalence between the NMF and K-means clustering, a simultaneous data value and pattern hiding strategy is developed for data mining activities using K-means clustering. Three schemes are designed to make a slight alteration on submatrices such that user-specified cluster properties of data subjects are hidden. Performance evaluation demonstrates the efficacy of the proposed strategy since some optimal solutions can be computed with zero side effects on nonconfidential memberships. Accordingly, the protection of privacy is simplified by one modified data set with enhanced performance by this dual privacy protection.
In addition, an improved incremental SVD-updating algorithm is applied to speed up the real-time performance of the SVD-based model under frequent data updates. The performance and effectiveness of the improved algorithm have been examined on synthetic and real data sets. Experimental results indicate that the introduction of the incremental matrix decomposition produces a significant speedup. It also provides potential support for the use of the SVD technique in On-Line Analytical Processing (OLAP) for business data analysis.
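The incremental-SVD idea can be sketched as follows: given a thin SVD of the current data, the SVD after appending rows is obtained by decomposing only a small matrix built from the old factors and the new rows. This is the standard textbook row update, not necessarily the dissertation's improved algorithm:

```python
import numpy as np

def svd_append_rows(U, s, Vt, B):
    """Update a thin SVD A = U @ diag(s) @ Vt after appending rows B.

    Only the small (r + k) x n matrix K is decomposed, instead of the full
    (m + k) x n stacked matrix, which is the source of the speedup when
    the kept rank r and the update size k are much smaller than m.
    """
    r, k = s.size, B.shape[0]
    K = np.vstack([np.diag(s) @ Vt, B])
    Uk, s_new, Vt_new = np.linalg.svd(K, full_matrices=False)
    # Rotate the small left factor back into the big row space.
    U_new = np.block([[U, np.zeros((U.shape[0], k))],
                      [np.zeros((k, r)), np.eye(k)]]) @ Uk
    return U_new, s_new, Vt_new

rng = np.random.default_rng(2)
A = rng.normal(size=(30, 6))
U, s, Vt = np.linalg.svd(A, full_matrices=False)

B = rng.normal(size=(5, 6))          # newly arrived records
U2, s2, Vt2 = svd_append_rows(U, s, Vt, B)
# Because the full thin SVD of A was kept here, U2 @ diag(s2) @ Vt2
# reconstructs the stacked matrix [A; B] exactly.
```

In the PPDM setting, the truncated version of the updated SVD can then be republished as the distorted data set without recomputing the decomposition from scratch.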
A Non-negative Matrix Factorization Framework for Privacy-preserving and Federated Learning
The uncontrolled growth of data in domains such as surveillance systems, health care services, and finance produces large amounts of potentially sensitive data that can become public if they are not appropriately sanitized.
Motivated by this issue, we introduce a privacy filter (PF), a novel non-negative matrix factorization (NMF) framework aiming to preserve the privacy of data before publishing based on an alternating non-negative least squares (ANLS) approach.
More specifically, this framework enables data holders to choose the data dimension that protects user privacy, even without knowing where privacy leakage might occur.
We also consider the problem of privately learning a PF across multiple sensitive datasets, leading to a federated learning algorithm that guarantees private data protection and high accuracy classification for non-private information.
Finally, the experiments conducted illustrate the superior performance of the proposed algorithms under the premise of protecting users’ private data.

Keywords: Data privacy, distributed data privacy, privacy-preserving machine learning, adversarial learning, non-negative matrix factorization
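A minimal ANLS sketch of the privacy-filter idea (all names and details here are illustrative, not the paper's framework): alternately solve a nonnegative least-squares problem for each factor, then publish the low-rank product in place of the raw data:

```python
import numpy as np

def nnls_pg(A, B, iters=300):
    """Solve min ||A @ X - B||_F subject to X >= 0 by projected gradient.

    Step size 1/L with L = ||A||_2^2, the Lipschitz constant of the
    gradient A.T @ (A @ X - B).
    """
    L = np.linalg.norm(A, 2) ** 2 + 1e-12
    X = np.zeros((A.shape[1], B.shape[1]))
    for _ in range(iters):
        X = np.maximum(0.0, X - (A.T @ (A @ X - B)) / L)
    return X

def privacy_filter(X, rank, sweeps=20):
    """Alternating nonnegative least squares (ANLS) sketch of a privacy
    filter: fix one factor, solve a nonnegative LS problem for the other.

    Publishing the low-rank product W @ H hides exact entries of X while
    keeping its dominant nonnegative structure.
    """
    rng = np.random.default_rng(0)
    W = rng.random((X.shape[0], rank))
    H = rng.random((rank, X.shape[1]))
    for _ in range(sweeps):
        H = nnls_pg(W, X)          # update H with W fixed
        W = nnls_pg(H.T, X.T).T    # update W with H fixed
    return W @ H

X = np.abs(np.random.default_rng(3).normal(size=(25, 10)))  # toy sensitive data
X_pub = privacy_filter(X, rank=3)                            # sanitized release
```

In a federated variant, each data holder would run such updates locally and share only factor information rather than raw records; the sketch above shows only the single-site ANLS core.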
A Methodology for Sensitive Attribute Discrimination Prevention in Data Mining
ABSTRACT: Today, data mining is an increasingly important technology: a process of extracting useful knowledge from large collections of data. There are, however, some negative views about data mining, among them potential privacy violations and potential discrimination. Discrimination means treating people unequally or unfairly on the basis of their belonging to a specific group. If data sets are divided on the basis of sensitive attributes like gender, race, or religion, discriminatory decisions may ensue. For this reason, antidiscrimination laws have motivated discrimination prevention techniques in data mining. Discrimination can be either direct or indirect. Direct discrimination occurs when decisions are made based on sensitive attributes; it consists of rules or procedures that explicitly mention minority or disadvantaged groups based on sensitive discriminatory attributes related to group membership. Indirect discrimination occurs when decisions are made based on non-sensitive attributes that are strongly correlated with biased sensitive ones; it consists of rules or procedures that, without explicitly mentioning discriminatory attributes, could intentionally or unintentionally generate discriminatory decisions.