16 research outputs found

    Security and Privacy Preservation in Mobile Crowdsensing

    Get PDF
    Mobile crowdsensing (MCS) is a compelling paradigm that enables a crowd of individuals to cooperatively collect and share data to measure phenomena or record events of common interest using their mobile devices. Pairing with inherent mobility and intelligence, mobile users can collect, produce and upload large amounts of data to service providers based on crowdsensing tasks released by customers, ranging from general information, such as temperature, air quality and traffic condition, to more specialized data, such as recommended places, health condition and voting intentions. Compared with traditional sensor networks, MCS can support large-scale sensing applications, improve sensing data trustworthiness and reduce the cost on deploying expensive hardware or software to acquire high-quality data. Despite the appealing benefits, however, MCS is also confronted with a variety of security and privacy threats, which would impede its rapid development. Due to their own incentives and vulnerabilities of service providers, data security and user privacy are being put at risk. The corruption of sensing reports may directly affect crowdsensing results, and thereby mislead customers to make irrational decisions. Moreover, the content of crowdsensing tasks may expose the intention of customers, and the sensing reports might inadvertently reveal sensitive information about mobile users. Data encryption and anonymization techniques can provide straightforward solutions for data security and user privacy, but there are several issues, which are of significantly importance to make MCS practical. First of all, to enhance data trustworthiness, service providers need to recruit mobile users based on their personal information, such as preferences, mobility pattern and reputation, resulting in the privacy exposure to service providers. Secondly, it is inevitable to have replicate data in crowdsensing reports, which may possess large communication bandwidth, but traditional data encryption makes replicate data detection and deletion challenging. Thirdly, crowdsensed data analysis is essential to generate crowdsensing reports in MCS, but the correctness of crowdsensing results in the absence of malicious mobile users and service providers become a huge concern for customers. Finally yet importantly, even if user privacy is preserved during task allocation and data collection, it may still be exposed during reward distribution. It further discourage mobile users from task participation. In this thesis, we explore the approaches to resolve these challenges in MCS. Based on the architecture of MCS, we conduct our research with the focus on security and privacy protection without sacrificing data quality and users' enthusiasm. Specifically, the main contributions are, i) to enable privacy preservation and task allocation, we propose SPOON, a strong privacy-preserving mobile crowdsensing scheme supporting accurate task allocation. In SPOON, the service provider recruits mobile users based on their locations, and selects proper sensing reports according to their trust levels without invading user privacy. By utilizing the blind signature, sensing tasks are protected and reports are anonymized. In addition, a privacy-preserving credit management mechanism is introduced to achieve decentralized trust management and secure credit proof for mobile users; ii) to improve communication efficiency while guaranteeing data confidentiality, we propose a fog-assisted secure data deduplication scheme, in which a BLS-oblivious pseudo-random function is developed to enable fog nodes to detect and delete replicate data in sensing reports without exposing the content of reports. Considering the privacy leakages of mobile users who report the same data, the blind signature is utilized to hide users' identities, and chameleon hash function is leveraged to achieve contribution claim and reward retrieval for anonymous greedy mobile users; iii) to achieve data statistics with privacy preservation, we propose a privacy-preserving data statistics scheme to achieve end-to-end security and integrity protection, while enabling the aggregation of the collected data from multiple sources. The correctness verification is supported to prevent the corruption of the aggregate results during data transmission based on the homomorphic authenticator and the proxy re-signature. A privacy-preserving verifiable linear statistics mechanism is developed to realize the linear aggregation of multiple crowdsensed data from a same device and the verification on the correctness of aggregate results; and iv) to encourage mobile users to participating in sensing tasks, we propose a dual-anonymous reward distribution scheme to offer the incentive for mobile users and privacy protection for both customers and mobile users in MCS. Based on the dividable cash, a new reward sharing incentive mechanism is developed to encourage mobile users to participating in sensing tasks, and the randomization technique is leveraged to protect the identities of customers and mobile users during reward claim, distribution and deposit

    Active Learning for Reducing Labeling Effort in Text Classification Tasks

    Get PDF
    Labeling data can be an expensive task as it is usually performed manually by domain experts. This is cumbersome for deep learning, as it is dependent on large labeled datasets. Active learning (AL) is a paradigm that aims to reduce labeling effort by only using the data which the used model deems most informative. Little research has been done on AL in a text classification setting and next to none has involved the more recent, state-of-the-art Natural Language Processing (NLP) models. Here, we present an empirical study that compares different uncertainty-based algorithms with BERTbase_{base} as the used classifier. We evaluate the algorithms on two NLP classification datasets: Stanford Sentiment Treebank and KvK-Frontpages. Additionally, we explore heuristics that aim to solve presupposed problems of uncertainty-based AL; namely, that it is unscalable and that it is prone to selecting outliers. Furthermore, we explore the influence of the query-pool size on the performance of AL. Whereas it was found that the proposed heuristics for AL did not improve performance of AL; our results show that using uncertainty-based AL with BERTbase_{base} outperforms random sampling of data. This difference in performance can decrease as the query-pool size gets larger.Comment: Accepted as a conference paper at the joint 33rd Benelux Conference on Artificial Intelligence and the 30th Belgian Dutch Conference on Machine Learning (BNAIC/BENELEARN 2021). This camera-ready version submitted to BNAIC/BENELEARN, adds several improvements including a more thorough discussion of related work plus an extended discussion section. 28 pages including references and appendice

    Structured Prediction on Dirty Datasets

    Get PDF
    Many errors cannot be detected or repaired without taking into account the underlying structure and dependencies in the dataset. One way of modeling the structure of the data is graphical models. Graphical models combine probability theory and graph theory in order to address one of the key objectives in designing and fitting probabilistic models, which is to capture dependencies among relevant random variables. Structure representation helps to understand the side effect of the errors or it reveals correct interrelationships between data points. Hence, principled representation of structure in prediction and cleaning tasks of dirty data is essential for the quality of downstream analytical results. Existing structured prediction research considers limited structures and configurations, with little attention to the performance limitations and how well the problem can be solved in more general settings where the structure is complex and rich. In this dissertation, I present the following thesis: By leveraging the underlying dependency and structure in machine learning models, we can effectively detect and clean errors via pragmatic structured predictions techniques. To highlight the main contributions: I investigate prediction algorithms and systems on dirty data with a more realistic structure and dependencies to help deploy this type of learning in more pragmatic settings. Specifically, We introduce a few-shot learning framework for error detection that uses structure-based features of data such as denial constraints violations and Bayesian network as co-occurrence feature. I have studied the problem of recovering the latent ground truth labeling of a structured instance. Then, I consider the problem of mining integrity constraints from data and specifically using the sampling methods for extracting approximate denial constraints. Finally, I have introduced an ML framework that uses solitary and structured data features to solve the problem of record fusion

    Tools and Algorithms for the Construction and Analysis of Systems

    Get PDF
    This open access two-volume set constitutes the proceedings of the 26th International Conference on Tools and Algorithms for the Construction and Analysis of Systems, TACAS 2020, which took place in Dublin, Ireland, in April 2020, and was held as Part of the European Joint Conferences on Theory and Practice of Software, ETAPS 2020. The total of 60 regular papers presented in these volumes was carefully reviewed and selected from 155 submissions. The papers are organized in topical sections as follows: Part I: Program verification; SAT and SMT; Timed and Dynamical Systems; Verifying Concurrent Systems; Probabilistic Systems; Model Checking and Reachability; and Timed and Probabilistic Systems. Part II: Bisimulation; Verification and Efficiency; Logic and Proof; Tools and Case Studies; Games and Automata; and SV-COMP 2020

    Goal Reasoning: Papers from the ACS Workshop

    Get PDF
    This technical report contains the 14 accepted papers presented at the Workshop on Goal Reasoning, which was held as part of the 2015 Conference on Advances in Cognitive Systems (ACS-15) in Atlanta, Georgia on 28 May 2015. This is the fourth in a series of workshops related to this topic, the first of which was the AAAI-10 Workshop on Goal-Directed Autonomy; the second was the Self-Motivated Agents (SeMoA) Workshop, held at Lehigh University in November 2012; and the third was the Goal Reasoning Workshop at ACS-13 in Baltimore, Maryland in December 2013

    Securing the Next Generation Web

    Get PDF
    With the ever-increasing digitalization of society, the need for secure systems is growing. While some security features, like HTTPS, are popular, securing web applications, and the clients we use to interact with them remains difficult.To secure web applications we focus on both the client-side and server-side. For the client-side, mainly web browsers, we analyze how new security features might solve a problem but introduce new ones. We show this by performing a systematic analysis of the new Content Security Policy (CSP)\ua0 directive navigate-to. In our research, we find that it does introduce new vulnerabilities, to which we recommend countermeasures. We also create AutoNav, a tool capable of automatically suggesting navigation policies for this directive. Finding server-side vulnerabilities in a black-box setting where\ua0 there is no access to the source code is challenging. To improve this, we develop novel black-box methods for automatically finding vulnerabilities. We\ua0 accomplish this by identifying key challenges in web scanning and combining the best of previous methods. Additionally, we leverage SMT solvers to\ua0 further improve the coverage and vulnerability detection rate of scanners.In addition to browsers, browser extensions also play an important role in the web ecosystem. These small programs, e.g. AdBlockers and password\ua0 managers, have powerful APIs and access to sensitive user data like browsing history. By systematically analyzing the extension ecosystem we find new\ua0 static and dynamic methods for detecting both malicious and vulnerable extensions. In addition, we develop a method for detecting malicious extensions\ua0 solely based on the meta-data of downloads over time. We analyze new attack vectors introduced by Google’s new vehicle OS, Android Automotive. This\ua0 is based on Android with the addition of vehicle APIs. Our analysis results in new attacks pertaining to safety, privacy, and availability. Furthermore, we\ua0 create AutoTame, which is designed to analyze third-party apps for vehicles for the vulnerabilities we found

    Advances in knowledge discovery and data mining Part II

    Get PDF
    19th Pacific-Asia Conference, PAKDD 2015, Ho Chi Minh City, Vietnam, May 19-22, 2015, Proceedings, Part II</p
    corecore