399 research outputs found

    Numerical Problem Encryption for High-Performance Computing Applications

    Get PDF
    Recent years witnessed the diffusion of cloud-based services. Cloud services have the interesting advantage that they can provide resources (CPU, disk space, etc.) that would be too expensive to deploy and maintain in-house. A major drawback of cloud-based services is the problem of handling private data and—possibly—intellectual property to a third party. With some service (e.g., data storage), cryptography can provide a solution; however, there are some services that are more difficult to protect. An example of such services is the renting of CPU to carry out numerical computation such as differential equation solving. In this chapter, we discuss the problem of encrypting numerical problems so that their solution can be safely outsourced. The idea is to transform (encrypt) a given numerical problem into a different one whose solution can be mapped back to the solution of the original problem if the key used at the encryption stage is known

    Privacy-Preserving Cloud-Assisted Data Analytics

    Get PDF
    Nowadays industries are collecting a massive and exponentially growing amount of data that can be utilized to extract useful insights for improving various aspects of our life. Data analytics (e.g., via the use of machine learning) has been extensively applied to make important decisions in various real world applications. However, it is challenging for resource-limited clients to analyze their data in an efficient way when its scale is large. Additionally, the data resources are increasingly distributed among different owners. Nonetheless, users\u27 data may contain private information that needs to be protected. Cloud computing has become more and more popular in both academia and industry communities. By pooling infrastructure and servers together, it can offer virtually unlimited resources easily accessible via the Internet. Various services could be provided by cloud platforms including machine learning and data analytics. The goal of this dissertation is to develop privacy-preserving cloud-assisted data analytics solutions to address the aforementioned challenges, leveraging the powerful and easy-to-access cloud. In particular, we propose the following systems. To address the problem of limited computation power at user and the need of privacy protection in data analytics, we consider geometric programming (GP) in data analytics, and design a secure, efficient, and verifiable outsourcing protocol for GP. Our protocol consists of a transform scheme that converts GP to DGP, a transform scheme with computationally indistinguishability, and an efficient scheme to solve the transformed DGP at the cloud side with result verification. Evaluation results show that the proposed secure outsourcing protocol can achieve significant time savings for users. To address the problem of limited data at individual users, we propose two distributed learning systems such that users can collaboratively train machine learning models without losing privacy. The first one is a differentially private framework to train logistic regression models with distributed data sources. We employ the relevance between input data features and the model output to significantly improve the learning accuracy. Moreover, we adopt an evaluation data set at the cloud side to suppress low-quality data sources and propose a differentially private mechanism to protect user\u27s data quality privacy. Experimental results show that the proposed framework can achieve high utility with low quality data, and strong privacy guarantee. The second one is an efficient privacy-preserving federated learning system that enables multiple edge users to collaboratively train their models without revealing dataset. To reduce the communication overhead, we select well-aligned and large-enough magnitude gradients for uploading which leads to quick convergence. To minimize the noise added and improve model utility, each user only adds a small amount of noise to his selected gradients, encrypts the noise gradients before uploading, and the cloud server will only get the aggregate gradients that contain enough noise to achieve differential privacy. Evaluation results show that the proposed system can achieve high accuracy, low communication overhead, and strong privacy guarantee. In future work, we plan to design a privacy-preserving data analytics with fair exchange, which ensures the payment fairness. We will also consider designing distributed learning systems with heterogeneous architectures

    LINEAR REGRESSION FOR PREDICTION OF EXCESSIVE PERMISSIONS DATABASE ACCOUNT TRAFFIC

    Get PDF
    Today, the security of information and data is an important asset for everyone in protecting data. Data and information become critical when weaknesses and threats come. In this study, an estimation of the observed variables will be carried out. The application of data mining, especially with the estimation method by using linear regression techniques. The next stage is data preparation by referring to the dataset recorded in the user activity log. Data preparation takes a lot of time because you have to make sure the data fits the needs of data mining analysis. The analysis technique with linear regression involves three independent variables as Type Permissions, Type of User Account, Status, and the dependent variable, namely User Actions. The strongest effect was found in type_permissions and state when together on user_actions. The type_permissions variable keeps increasing when the state on the user is active. The status attribute also suffers from the same condition. Accrording to the results, our findings in root mean squared error is 37.614 and absolute error is 31.058, and mean absolute percentage about 23%. Furthermore, User_action as an estimated variable gives two data opportunities whether it is allowed or not. Therefore, in future research, it is necessary to map users of the database system still in the context of data mining when digging for information on excessive permissions

    Privacy-Preserving Ridge Regression with only Linearly-Homomorphic Encryption

    Get PDF
    Linear regression with 2-norm regularization (i.e., ridge regression) is an important statistical technique that models the relationship between some explanatory values and an outcome value using a linear function. In many applications (e.g., predictive modelling in personalised health care), these values represent sensitive data owned by several different parties who are unwilling to share them. In this setting, training a linear regression model becomes challenging and needs specific cryptographic solutions. This problem was elegantly addressed by Nikolaenko et al. in S&P (Oakland) 2013. They suggested a two-server system that uses linearly-homomorphic encryption (LHE) and Yao’s two-party protocol (garbled circuits). In this work, we propose a novel system that can train a ridge linear regression model using only LHE (i.e., without using Yao’s protocol). This greatly improves the overall performance (both in computation and communication) as Yao’s protocol was the main bottleneck in the previous solution. The efficiency of the proposed system is validated both on synthetically-generated and real-world datasets

    Computing Competencies for Undergraduate Data Science Curricula: ACM Data Science Task Force

    Get PDF
    At the August 2017 ACM Education Council meeting, a task force was formed to explore a process to add to the broad, interdisciplinary conversation on data science, with an articulation of the role of computing discipline-specific contributions to this emerging field. Specifically, the task force would seek to define what the computing/computational contributions are to this new field, and provide guidance on computing-specific competencies in data science for departments offering such programs of study at the undergraduate level. There are many stakeholders in the discussion of data science – these include colleges and universities that (hope to) offer data science programs, employers who hope to hire a workforce with knowledge and experience in data science, as well as individuals and professional societies representing the fields of computing, statistics, machine learning, computational biology, computational social sciences, digital humanities, and others. There is a shared desire to form a broad interdisciplinary definition of data science and to develop curriculum guidance for degree programs in data science. This volume builds upon the important work of other groups who have published guidelines for data science education. There is a need to acknowledge the definition and description of the individual contributions to this interdisciplinary field. For instance, those interested in the business context for these concepts generally use the term “analytics”; in some cases, the abbreviation DSA appears, meaning Data Science and Analytics. This volume is the third draft articulation of computing-focused competencies for data science. It recognizes the inherent interdisciplinarity of data science and situates computing-specific competencies within the broader interdisciplinary space

    The 11th Conference of PhD Students in Computer Science

    Get PDF

    Effective and Secure Healthcare Machine Learning System with Explanations Based on High Quality Crowdsourcing Data

    Get PDF
    Affordable cloud computing technologies allow users to efficiently outsource, store, and manage their Personal Health Records (PHRs) and share with their caregivers or physicians. With this exponential growth of the stored large scale clinical data and the growing need for personalized care, researchers are keen on developing data mining methodologies to learn efficient hidden patterns in such data. While studies have shown that those progresses can significantly improve the performance of various healthcare applications for clinical decision making and personalized medicine, the collected medical datasets are highly ambiguous and noisy. Thus, it is essential to develop a better tool for disease progression and survival rate predictions, where dataset needs to be cleaned before it is used for predictions and useful feature selection techniques need to be employed before prediction models can be constructed. In addition, having predictions without explanations prevent medical personnel and patients from adopting such healthcare deep learning models. Thus, any prediction models must come with some explanations. Finally, despite the efficiency of machine learning systems and their outstanding prediction performance, it is still a risk to reuse pre-trained models since most machine learning modules that are contributed and maintained by third parties lack proper checking to ensure that they are robust to various adversarial attacks. We need to design mechanisms for detection such attacks. In this thesis, we focus on addressing all the above issues: (i) Privacy Preserving Disease Treatment & Complication Prediction System (PDTCPS): A privacy-preserving disease treatment, complication prediction scheme (PDTCPS) is proposed, which allows authorized users to conduct searches for disease diagnosis, personalized treatments, and prediction of potential complications. (ii) Incentivizing High Quality Crowdsourcing Data For Disease Prediction: A new incentive model with individual rationality and platform profitability features is developed to encourage different hospitals to share high quality data so that better prediction models can be constructed. We also explore how data cleaning and feature selection techniques affect the performance of the prediction models. (iii) Explainable Deep Learning Based Medical Diagnostic System: A deep learning based medical diagnosis system (DL-MDS) is present which integrates heterogeneous medical data sources to produce better disease diagnosis with explanations for authorized users who submit their personalized health related queries. (iv) Attacks on RNN based Healthcare Learning Systems and Their Detection & Defense Mechanisms: Potential attacks on Recurrent Neural Network (RNN) based ML systems are identified and low-cost detection & defense schemes are designed to prevent such adversarial attacks. Finally, we conduct extensive experiments using both synthetic and real-world datasets to validate the feasibility and practicality of our proposed systems
    • …
    corecore