
    PRIVACY PRESERVING DATA MINING FOR NUMERICAL MATRICES, SOCIAL NETWORKS, AND BIG DATA

    Motivated by increasing public awareness of possible abuse of confidential information, widely considered a significant hindrance to the development of the e-society and of the medical and financial sectors, a privacy-preserving data mining framework is presented so that data owners can carefully process data to protect confidential information while keeping its utility within an acceptable boundary. First, among privacy-preserving methodologies, a popular class of data perturbation methods balances data utility against information privacy by adding a noise signal, drawn from a statistical distribution, to an original numerical matrix. Through analysis in the eigenspace of the perturbed data, the privacy vulnerability of a popular data perturbation method is examined in the presence of very small amounts of information leakage from privacy-preserving databases; this vulnerability is proved theoretically and illustrated experimentally. Second, beyond numerical matrices, social networks play a critical role in the modern e-society, and their security and privacy have received much attention following recent security scandals among popular social network service providers. The need to protect confidential information from disclosure therefore motivates the development of multiple privacy-preserving techniques for social networks. Affinities (or weights) attached to edges are private and can leak personal information. To protect the privacy of social networks, several algorithms are proposed, including Gaussian perturbation, a greedy algorithm, and a probabilistic random-walk algorithm; all of them can quickly modify the original data at large scale to satisfy different privacy requirements. Third, the era of big data is arriving in both industry and academia, as the quantity of collected data grows exponentially.
Three issues are studied in the age of big data with privacy preservation: obtaining high confidence in the accuracy of specific differentially private queries, speedily and accurately updating a private summary of a binary stream with I/O-awareness, and launching a mutually private information retrieval scheme for big data. All three issues are handled by two core tools, differential privacy and the Chernoff bound.
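The perturbation scheme analysed in the first part (additive noise drawn from a statistical distribution, examined in the eigenspace of the perturbed data) can be sketched as follows. This is a minimal illustration assuming plain additive Gaussian noise; the matrix dimensions, the sigma value, and the function names are invented for the example and are not taken from the dissertation:

```python
import numpy as np

def perturb(X, sigma, seed=0):
    """Additive Gaussian perturbation: release X + N(0, sigma^2) noise."""
    rng = np.random.default_rng(seed)
    return X + rng.normal(0.0, sigma, size=X.shape)

# A toy numerical matrix (rows = records, columns = attributes).
rng = np.random.default_rng(42)
X = rng.normal(0.0, 1.0, size=(200, 5))
Y = perturb(X, sigma=0.1)

# The eigenspace view used in perturbation analysis: for small sigma, the
# covariance spectrum of the perturbed data stays close to that of the
# original, which is what an attacker with a little leaked data can exploit.
ev_X = np.linalg.eigvalsh(np.cov(X, rowvar=False))
ev_Y = np.linalg.eigvalsh(np.cov(Y, rowvar=False))
print(np.abs(ev_X - ev_Y).max())  # small when sigma is small
```

The small spectral shift is exactly the tension the abstract describes: weak noise keeps the data useful but also keeps its eigenstructure recoverable.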

    Efficient estimation of statistical functions while preserving client-side privacy

    Aggregating service users’ personal data for analytical purposes is a common practice in today’s Internet economy. However, distrust in the data aggregator, data breaches, and the risk of subpoenas pose significant challenges to the availability of data. The framework of differential privacy enjoys wide attention due to its scalability and the rigour of the privacy protection it provides, and it has become a de facto standard for privacy-preserving information extraction. In this dissertation, we design and implement resource-efficient algorithms for three fundamental data analysis primitives, marginal, range, and count queries, while providing strong differential privacy guarantees. The first two queries are studied in the strict scenario of untrusted aggregation (the local model), in which the data collector may access only noisy/perturbed versions of users’ data, never their true data. To the best of our knowledge, marginal and range queries had not been studied in detail in the local setting before our work. We show that simple data transformation techniques achieve good accuracy in practice and can be used for more interesting analyses. Finally, we revisit the problem of count queries under trusted aggregation. This setting can also be viewed as a relaxation of the local model called limited-precision local differential privacy. We first identify a weakness in a well-known optimization framework that leads to solutions exhibiting pathological behaviour, and we then add constraints to the framework that remove this weakness without compromising much utility.
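For the untrusted-aggregation (local) setting described above, the classic randomized response mechanism gives a concrete, if simplified, picture of how a count query can be answered without the collector ever seeing a true bit. This sketch is standard textbook material, not the dissertation’s algorithm; the epsilon value and counts are arbitrary:

```python
import math
import random

def randomize(bit, epsilon, rng):
    """Local DP via randomized response: report the true bit with
    probability p = e^eps / (1 + e^eps), otherwise flip it."""
    p = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    return bit if rng.random() < p else 1 - bit

def estimate_count(reports, epsilon):
    """Debiased estimate of the true number of 1-bits.
    E[sum] = T*p + (n - T)*(1 - p), solved for T."""
    p = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    n = len(reports)
    return (sum(reports) - n * (1.0 - p)) / (2.0 * p - 1.0)

rng = random.Random(0)
epsilon = math.log(3)             # keep-probability p = 0.75
truth = [1] * 3000 + [0] * 7000   # true count = 3000
reports = [randomize(b, epsilon, rng) for b in truth]
print(round(estimate_count(reports, epsilon)))  # close to 3000
```

The collector stores only `reports`; the debiasing step recovers the aggregate count while each individual bit remains deniable.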

    Pattern Recognition

    Pattern recognition is a very wide research field. It involves factors as diverse as sensors, feature extraction, pattern classification, decision fusion, applications, and others. The signals processed are commonly one-, two-, or three-dimensional; the processing is done in real time or takes hours and days; some systems look for one narrow object class, while others search huge databases for entries with at least a small degree of similarity. No single person can claim expertise across the whole field, which develops rapidly, updates its paradigms, and encompasses several philosophical approaches. This book reflects this diversity by presenting a selection of recent developments within the area of pattern recognition and related fields. It covers theoretical advances in classification and feature extraction as well as application-oriented work. The authors of these 25 works present and advocate recent achievements of their research in the field of pattern recognition.

    Investigation of late time response analysis for security applications

    The risk of armed attack by individuals intent on causing mass casualties at soft targets, such as transport hubs, continues. This has led to an increased need for robust, reliable, and accurate detection of concealed threat items. Any new system must improve upon existing detection systems, including portal-based scanners, x-ray scanners, and hand-held metal detectors, all of which suffer from limited detection range and relatively long scanning times. A literature appraisal was completed to assess the work being undertaken in the field of Concealed Threat Detection (CTD), from which Ultra-Wide Band (UWB) radar was selected as the most promising technology currently available for CTD. The UWB radar signal is provided by Frequency Modulated Continuous Wave (FMCW) laboratory test equipment over a multi-gigahertz bandwidth, giving the radar the ability to detect both metallic and dielectric objects. Published results have shown that the Late Time Response (LTR) technique can detect and discriminate both single objects isolated in air and multiple objects present within the same environment. A Vector Network Analyser (VNA) was used to provide the UWB FMCW radar signal required for the LTR technique. This thesis presents the application of the Generalized Pencil-of-Function (GPOF) method, the Dual Tree Wavelet Transform (DTWT), and the Continuous Wavelet Transform (CWT), both real- and complex-valued, to LTR security analysis to produce a viable detection algorithm. Supervised and unsupervised Artificial Neural Networks (ANNs) were applied to develop a successful classification scheme for CTD in on-body security screening. Signal deconvolution and other post-processing techniques were applied to extract the LTR signal from the scattered return.
Data vectorization was applied to the extracted LTR signal using an unsupervised-learning-based ANN to prepare the data for classification. Classification results for both binary threat/non-threat classifiers and a group classifier are presented. The GPOF method achieved true positive classification rates approaching 72%, with wavelet-based methods achieving between 98% and 100%.
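As a rough illustration of the wavelet side of this pipeline, the sketch below computes a continuous wavelet transform of a damped sinusoid (a stand-in for a late-time resonance) by direct convolution with a real Morlet wavelet: resonance energy concentrates at the scale matching the resonant frequency, which is what makes CWT features discriminative. This is a generic CWT sketch, not the thesis's implementation; the wavelet choice, scales, and test signal are all invented for the example:

```python
import numpy as np

def morlet(t, w0=5.0):
    """Real Morlet wavelet: a cosine under a Gaussian envelope."""
    return np.exp(-0.5 * t ** 2) * np.cos(w0 * t)

def cwt(signal, scales):
    """Continuous wavelet transform by direct convolution, one row per scale."""
    out = np.empty((len(scales), len(signal)))
    for i, a in enumerate(scales):
        n = np.arange(-5 * a, 5 * a + 1)      # support wide enough to decay
        w = morlet(n / a) / np.sqrt(a)        # L2-normalised scaled wavelet
        out[i] = np.convolve(signal, w, mode="same")
    return out

# A damped sinusoid at 0.05 cycles/sample, mimicking a late-time resonance.
n = np.arange(512)
signal = np.exp(-n / 400.0) * np.sin(2 * np.pi * 0.05 * n)

scales = np.arange(1, 33)
coeffs = cwt(signal, scales)
energy = (coeffs ** 2).sum(axis=1)

# Morlet centre frequency is w0 / (2*pi*a) cycles/sample, so the energy
# should peak near scale a = 5 / (2*pi*0.05), i.e. around 16.
best = scales[np.argmax(energy)]
print(best)
```

The per-scale energy vector (or the full coefficient map) is the kind of feature one would then vectorize and feed to an ANN classifier.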

    Predictive Maintenance of an External Gear Pump using Machine Learning Algorithms

    Predictive Maintenance is critical for engineering industries such as manufacturing, aerospace, and energy. Unexpected failures cause unpredictable downtime, which can be disruptive and costly due to reduced productivity. This forces industries to ensure the reliability of their equipment. To increase reliability, maintenance actions such as repairs, replacements, equipment updates, and corrective actions are employed. These actions affect flexibility, quality of operation, and manufacturing time, so it is essential to plan maintenance before failure occurs. Traditional maintenance techniques rely on checks conducted routinely based on the running hours of the machine; the drawback of this approach is that maintenance is sometimes performed before it is required. Conducting maintenance based on the actual condition of the equipment is therefore the optimal solution. This requires collecting real-time data on the condition of the equipment, using sensors to detect events and send information to a processor. Predictive Maintenance uses these techniques and analytics to report the current and future state of the equipment. In the last decade, with the introduction of the Internet of Things (IoT), Machine Learning (ML), cloud computing, and Big Data analytics, the manufacturing industry has moved towards implementing Predictive Maintenance, resulting in increased uptime and quality control, optimisation of maintenance routes, improved worker safety, and greater productivity. The present thesis describes a novel computational strategy for Predictive Maintenance (fault diagnosis and fault prognosis) with ML and Deep Learning applications for an FG304 series external gear pump, also known as a domino pump.
In the absence of a comprehensive set of experimental data, synthetic data generation techniques are implemented for Predictive Maintenance by perturbing the frequency content of time series generated using high-fidelity computational techniques. In addition, various feature extraction methods are considered to extract the most discriminatory information from the data. For fault diagnosis, three ML classification algorithms are employed, namely Multilayer Perceptron (MLP), Support Vector Machine (SVM), and Naive Bayes (NB). For prognosis, ML regression algorithms such as MLP and SVM are utilised. Although significant work has been reported by previous authors, it remains difficult to optimise the choice of hyper-parameters (parameters whose values control the learning process) for each specific ML algorithm, for instance the type of SVM kernel function, or the selection of the MLP activation function and the optimum number of hidden layers (and neurons). It is widely understood that the reliability of ML algorithms depends strongly on the existence of a sufficiently large quantity of high-quality training data. In the present thesis, due to the unavailability of experimental data, a novel high-fidelity in-silico dataset is generated via a Computational Fluid Dynamics (CFD) model and used to train the underlying ML metamodel. A large number of scenarios are recreated, ranging from healthy to faulty (e.g. clogging, radial gap variations, axial gap variations, viscosity variations, speed variations). Furthermore, the high-fidelity dataset is re-enacted using degradation functions to predict the remaining useful life (fault prognosis) of an external gear pump. The thesis explores and compares the performance of the MLP, SVM, and NB algorithms for fault diagnosis, and of MLP and SVM for fault prognosis.
To enable fast training and reliable testing of the MLP algorithm, predefined network architectures, such as 2n neurons per hidden layer, are used to speed up the identification of the precise number of neurons (shown to be useful when the sample data set is sufficiently large). Finally, a series of benchmark tests is presented, from which it is concluded that for fault diagnosis the combination of wavelet features and an MLP algorithm provides the best accuracy, and that the MLP algorithm provides the best prediction results for fault prognosis. In addition, benchmark examples are simulated to demonstrate mesh convergence for the CFD model, while quantification analysis and the influence of noise on training data are examined for the ML algorithms.
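The synthetic data generation step described above, perturbing the frequency content of a high-fidelity time series, can be sketched as follows. This is a guess at the general idea, not the thesis's actual procedure; the signal, the magnitude-rescaling rule, and the perturbation strength are invented for the example:

```python
import numpy as np

def perturb_spectrum(x, strength, seed=0):
    """Create a synthetic variant of a time series by randomly rescaling
    the magnitude of each frequency bin while keeping the phases."""
    rng = np.random.default_rng(seed)
    spectrum = np.fft.rfft(x)
    gains = 1.0 + strength * rng.standard_normal(spectrum.shape)
    return np.fft.irfft(spectrum * gains, n=len(x))

# A toy pump-like signal: a fundamental plus one harmonic.
t = np.linspace(0.0, 1.0, 1000, endpoint=False)
x = np.sin(2 * np.pi * 50 * t) + 0.3 * np.sin(2 * np.pi * 150 * t)

# A slightly different, physically plausible variant of the same signal;
# strength = 0 would reproduce the original exactly (up to FFT round-off).
synthetic = perturb_spectrum(x, strength=0.05)
```

Repeating this with different seeds and strengths turns one simulated run into a whole training set, which is the point of the technique.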

    Automatic counting of mounds on UAV images using computer vision and machine learning

    Site preparation by mounding is a commonly used silvicultural treatment that improves tree growth conditions by mechanically creating planting microsites called mounds. Following site preparation, an important planning step is to count the mounds, which provides forest managers with an estimate of the number of seedlings required for a given plantation block. In the forest industry, mound counting is generally conducted through manual field surveys by forestry workers, which is costly and prone to error, especially over large areas. To address this issue, we present a novel framework exploiting advances in Unmanned Aerial Vehicle (UAV) imaging and computer vision to accurately estimate the number of mounds on a planting block. The proposed framework comprises two main components. First, we exploit a visual recognition method based on a deep learning algorithm for multiple object detection by pixel-based segmentation. This yields a preliminary count of visible mounds, together with other frequently seen objects on the forest floor (e.g., trees, debris, accumulations of water), used to characterize the planting block. Second, since visual recognition can be limited by several perturbation factors (e.g., mound erosion, occlusion), we employ a machine learning estimation function that predicts the final number of mounds from the local block properties extracted in the first stage. We evaluate the proposed framework on a new UAV dataset representing numerous planting blocks with varying features. The proposed method outperformed manual counting in terms of relative counting precision, indicating that it can be advantageous and efficient in challenging situations.
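The two-stage idea, a preliminary detector count corrected by a learned estimation function over block properties, can be sketched with a simple least-squares corrector. Everything below is synthetic and illustrative: the block features, the occlusion rule used to fabricate ground truth, and the linear model are assumptions for the sketch, not the paper's method:

```python
import numpy as np

rng = np.random.default_rng(7)

# Stage 1 (assumed already done): for each block, a preliminary count of
# visible mounds plus simple block descriptors, e.g. fractions of the area
# covered by water and by woody debris (all names here are illustrative).
n_blocks = 200
visible = rng.integers(100, 1000, size=n_blocks).astype(float)
water = rng.uniform(0.0, 0.3, size=n_blocks)
debris = rng.uniform(0.0, 0.2, size=n_blocks)

# Fabricated ground truth for training: occlusion hides mounds roughly in
# proportion to the water and debris fractions (a synthetic rule).
true_count = visible / (1.0 - 0.8 * water - 0.5 * debris)

# Stage 2: a linear estimator of the true count from the block properties,
# fitted by ordinary least squares.
A = np.column_stack([visible, visible * water, visible * debris,
                     np.ones(n_blocks)])
coef, *_ = np.linalg.lstsq(A, true_count, rcond=None)

pred = A @ coef
rel_err = np.abs(pred - true_count) / true_count
print(rel_err.mean())  # mean relative error of the fitted corrector
```

In the real framework the second stage would be trained on field-surveyed blocks; the sketch only shows why a learned correction can recover mounds the detector cannot see.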