
    Privacy and Confidentiality in an e-Commerce World: Data Mining, Data Warehousing, Matching and Disclosure Limitation

    The growing expanse of e-commerce and the widespread availability of online databases raise many fears regarding loss of privacy and pose many statistical challenges. Even with encryption and other nominal forms of protection for individual databases, we still need to protect against the violation of privacy through linkages across multiple databases. These issues parallel those that have arisen and received some attention in the context of homeland security. Following the events of September 11, 2001, there has been heightened attention in the United States and elsewhere to the use of multiple government and private databases for the identification of possible perpetrators of future attacks, as well as an unprecedented expansion of federal government data mining activities, many involving databases containing personal information. We present an overview of some proposals that have surfaced for searching multiple databases that supposedly do not compromise possible pledges of confidentiality to the individuals whose data are included. We also explore their link to the related literature on privacy-preserving data mining. In particular, we focus on the matching problem across databases, the concept of "selective revelation", and their confidentiality implications. Comment: Published at http://dx.doi.org/10.1214/088342306000000240 in Statistical Science (http://www.imstat.org/sts/) by the Institute of Mathematical Statistics (http://www.imstat.org
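The abstract does not describe the specific matching proposals it reviews. As a hedged illustration of the general "matching without exchanging raw identifiers" idea, here is a minimal sketch of one commonly discussed building block, keyed hashing (HMAC-SHA256) of normalized identifiers, so that a third-party matcher only ever sees opaque tokens. This is not presented as the paper's method or as "selective revelation"; the functions `normalize`, `tokenize`, and `match`, the key, and the records are all hypothetical.

```python
import hashlib
import hmac


def normalize(identifier: str) -> str:
    """Canonicalize an identifier so trivial formatting differences still match."""
    return " ".join(identifier.lower().split())


def tokenize(identifiers, key: bytes) -> dict:
    """Return {opaque_token: original_identifier}, kept locally by each custodian.

    Each token is an HMAC-SHA256 of the normalized identifier under a shared
    secret key, so only tokens (never raw identifiers) need to leave the custodian.
    """
    return {
        hmac.new(key, normalize(i).encode("utf-8"), hashlib.sha256).hexdigest(): i
        for i in identifiers
    }


def match(tokens_a, tokens_b) -> set:
    """A third-party matcher intersects token sets without seeing raw identifiers."""
    return set(tokens_a) & set(tokens_b)


# Hypothetical key and records, for illustration only.
KEY = b"hypothetical-shared-secret"
database_a = tokenize(["Alice Smith", "Bob Jones"], KEY)
database_b = tokenize(["alice  smith", "Carol White"], KEY)

linked = match(database_a.keys(), database_b.keys())
# Each custodian maps matched tokens back to its own records.
print(sorted(database_a[t] for t in linked))  # ['Alice Smith']
```

Two limits of this sketch bear directly on the confidentiality concerns above: it only finds exact matches after normalization (a typo breaks the link), and anyone holding the key can hash candidate names and reverse the tokens by dictionary attack.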

    Illegal Intrusion Detection of Internet of Things Based on Deep Mining Algorithm

    In this study, to reduce the influence of illegal intrusion into the Internet of Things (IoT) on transmission performance and to ensure safe IoT operation, an illegal intrusion detection method for the IoT based on a deep mining algorithm was designed to accurately detect IoT illegal intrusion. The study collected IoT data as data packets, carried out data attribute mapping on the collected data, transformed character information into numerical information, applied standardization and normalization to the numerical information, and optimized the processed data with a regional adaptive oversampling algorithm to obtain an IoT data training set. This training set was used as the input of an improved sparse auto-encoder neural network. A greedy layer-wise (hierarchical) training strategy was used to extract feature vectors from the sparse IoT illegal intrusion data; these vectors then served as the inputs of an extreme learning machine classifier that classifies and detects IoT illegal intrusion features. The experimental results indicate that feature extraction reduces the feature dimension of the IoT illegal intrusion data to fewer than 30, below the dimension of the original data. The recall, precision, and F1 value of the IoT intrusion detection are 98.3%, 98.7%, and 98.6%, respectively, showing that the method can accurately detect IoT intrusion attacks. The authors conclude that IoT intrusion detection based on the deep mining algorithm can accurately detect IoT illegal intrusion and reduce its influence on transmission performance.
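The preprocessing and evaluation steps named in this abstract can be sketched briefly. The snippet below is a minimal illustration under stated assumptions: "attribute mapping" is shown as plain label encoding (the abstract does not say which encoding was used), "normalization" as min-max scaling to [0, 1], and the reported recall, precision, and F1 as the standard binary definitions for the intrusion class. The oversampling step, the sparse auto-encoder, and the extreme learning machine are not reproduced. All function names and data are hypothetical.

```python
def map_attribute(values):
    """Attribute mapping: give each distinct symbolic value an integer code.

    Codes follow first appearance; this is plain label encoding, an assumed
    stand-in for whatever mapping the authors used.
    """
    codes = {}
    for v in values:
        codes.setdefault(v, len(codes))
    return [codes[v] for v in values], codes


def min_max(column):
    """Rescale a numeric column to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0] * len(column)
    return [(x - lo) / (hi - lo) for x in column]


def precision_recall_f1(y_true, y_pred, positive=1):
    """Precision, recall, and F1 for the class labelled `positive` (e.g. intrusion)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical packet attribute and labels, for illustration only.
protocol_codes, protocol_map = map_attribute(["tcp", "udp", "tcp", "icmp"])
print(protocol_codes, min_max(protocol_codes))  # [0, 1, 0, 2] [0.0, 0.5, 0.0, 1.0]
print(precision_recall_f1([1, 1, 0, 0, 1], [1, 0, 0, 1, 1]))
```

Label encoding imposes an artificial order on categories (here "icmp" > "udp" > "tcp"); one-hot encoding avoids that at the cost of more dimensions, which is one reason a dimension-reducing feature extractor is useful downstream.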

    Evaluate Various Techniques of Data Warehouse and Data Mining with Web Based Tool

    Every enterprise must operate efficiently and productively to survive in the market and increase its share of profits. This challenge grows more complicated with advances in information technology and the increasing volume and complexity of information. Today, an enterprise's success depends not only on the efforts of its resources but also on its ability to mine the data it has stored. Data warehousing is a decision-support process for integrating and managing large, varied data efficiently and systematically. Data mining helps organizations scrutinize their data more effectively and efficiently to extract valuable information that supports intelligent, strategic decision-making. Data mining comprises several techniques and mathematical algorithms used to mine large data sets to improve organizational performance and strategic decision-making. Clustering is a powerful and widely accepted data mining method that segregates large data sets into groups of similar objects and gives the end user a summarized view of the database. This study discusses the basic concept of clustering, its meaning, and its applications, especially in business for segmenting and selecting target markets. The technique is useful on the marketing or sales side, for example, to send a promotion for a product or service to the right target audience. Association is another well-known data mining technique, also referred to as the relation technique: a pattern is inferred from associations between items in the same business transaction. Large enterprises rely on this technique to research customers' buying preferences. For instance, by tracking buying behavior, a retailer might find that a customer always buys sambar onion when buying dal, and therefore suggest that the next time the customer buys dal, they might also want to buy onion.
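The dal and sambar onion example can be made concrete with the two standard measures behind association rules: support (how often an itemset appears across all transactions) and confidence (how often the consequent appears in transactions that contain the antecedent). The sketch below assumes hypothetical baskets and brute-forces every single-item rule a -> b; it is not the Apriori algorithm, which scales this up by pruning any candidate itemset that has an infrequent subset.

```python
from itertools import permutations


def count(transactions, itemset):
    """Number of transactions containing every item in `itemset`."""
    itemset = frozenset(itemset)
    return sum(1 for t in transactions if itemset <= t)


def support(transactions, itemset):
    """Fraction of transactions containing `itemset`."""
    return count(transactions, itemset) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimated P(consequent in basket | antecedent in basket)."""
    base = count(transactions, antecedent)
    both = count(transactions, set(antecedent) | set(consequent))
    return both / base if base else 0.0


def pair_rules(transactions, min_support, min_confidence):
    """Brute-force single-item rules a -> b that clear both thresholds."""
    items = sorted(set().union(*transactions))
    rules = []
    for a, b in permutations(items, 2):
        s = support(transactions, {a, b})
        c = confidence(transactions, {a}, {b})
        if s >= min_support and c >= min_confidence:
            rules.append((a, b, s, c))
    return rules


# Hypothetical baskets, for illustration only.
baskets = [frozenset(b) for b in (
    {"dal", "sambar onion", "rice"},
    {"dal", "sambar onion"},
    {"dal", "ghee"},
    {"rice", "ghee"},
    {"dal", "sambar onion", "ghee"},
)]

print(confidence(baskets, {"dal"}, {"sambar onion"}))  # 0.75
for rule in pair_rules(baskets, min_support=0.5, min_confidence=0.7):
    print(rule)
```

Here dal appears in four baskets and dal with sambar onion in three, so the rule dal -> sambar onion has confidence 3/4; the reverse rule has confidence 1.0 because every basket with sambar onion also contains dal.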
Classification – this data mining concept differs from the above in that it is based on machine learning and uses mathematical techniques such as linear programming, decision trees, and neural networks. In classification, enterprises try to build tools that learn how to classify data items into groups. For instance, a company can define a classification task in an application: "given all records of employees who offered to resign from the company, predict the number of individuals who are likely to resign from the company in the future." In such a scenario, the company can divide the employee records into two groups, namely "separate" and "retain", and then use its data mining software to classify employees into these groups. Fuzzy logic closely resembles human reasoning in handling imperfect information and can be used as a flexible tool to soften the boundaries in classification so that it fits real problems better. The present study discusses the meaning of fuzzy logic, its applications, and its features. A tool is to be built to check data mining algorithms and the algorithm behind each model, applying a clustering method as a sample in the tool to select training data from the large database and to reduce complexity and computation time. The k-nearest neighbor method can be used in many applications, from general to specific, to find the requested data within huge data sets. Decision trees – a decision tree is a structure that includes a root node, branches, and leaf nodes. Each interior node represents a test on an attribute, each branch denotes an outcome of the test, and each leaf node holds a class label. The topmost node in the tree is the root node. In a decision tree, we start with a simple question that has multiple answers.
Each answer leads to a further question that helps classify or identify the data so that it can be categorized, or so that a prediction can be made based on each answer. Regression analysis is the data mining method of identifying and analyzing the relationship between variables. It is used to estimate the likelihood of a specific variable given the values of other variables. Outlier detection refers to identifying data items in a dataset that do not match an expected pattern or expected behaviour. This technique can be used in a variety of domains, such as intrusion detection, fraud detection, or fault detection. Outlier detection is also called outlier analysis or outlier mining. The sequential patterns technique helps find similar patterns or trends in transaction data over a definite period.
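The abstract does not say how outliers are scored. One widely used, robust option is the modified z-score of Iglewicz and Hoaglin, which measures distance from the median in units of the median absolute deviation (MAD); the sketch below assumes that technique, its usual 0.6745 scaling constant, and its commonly recommended cutoff of 3.5. The function name and readings are hypothetical.

```python
from statistics import median


def mad_outliers(values, threshold=3.5):
    """Indices of outliers by the modified z-score (Iglewicz and Hoaglin).

    M_i = 0.6745 * (x_i - median) / MAD, where MAD is the median absolute
    deviation; |M_i| > threshold flags x_i as an outlier.
    """
    med = median(values)
    mad = median([abs(x - med) for x in values])
    if mad == 0:
        # More than half the values equal the median, so M_i is undefined;
        # fall back to flagging every value that differs from the median.
        return [i for i, x in enumerate(values) if x != med]
    return [i for i, x in enumerate(values)
            if abs(0.6745 * (x - med) / mad) > threshold]


# Hypothetical sensor readings with one injected anomaly.
readings = [10, 11, 9, 10, 12, 10, 11, 9, 10, 50]
print(mad_outliers(readings))  # [9]
```

The median and MAD are used instead of the mean and standard deviation because a single extreme value inflates the standard deviation and can mask itself: for these readings, a plain z-score gives the value 50 a score just under 3, so a cutoff of 3 would miss it, while its modified z-score is about 27.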