310 research outputs found

    Privacy-Aware Adversarial Network in Human Mobility Prediction

    As mobile devices and location-based services are increasingly developed in different smart city scenarios and applications, many unexpected privacy leakages have arisen due to geolocated data collection and sharing. User re-identification and other sensitive inferences are major privacy threats when geolocated data are shared with cloud-assisted applications. Significantly, four spatio-temporal points are enough to uniquely identify 95% of the individuals, which exacerbates personal information leakages. To tackle malicious purposes such as user re-identification, we propose an LSTM-based adversarial mechanism with representation learning to attain a privacy-preserving feature representation of the original geolocated data (i.e., mobility data) for a sharing purpose. These representations aim to maximally reduce the chance of user re-identification and full data reconstruction with a minimal utility budget (i.e., loss). We train the mechanism by quantifying the privacy-utility trade-off of mobility datasets in terms of trajectory reconstruction risk, user re-identification risk, and mobility predictability. We report an exploratory analysis that enables the user to assess this trade-off with a specific loss function and its weight parameters. The extensive comparison results on four representative mobility datasets demonstrate the superiority of our proposed architecture in mobility privacy protection and the efficiency of the proposed privacy-preserving feature extractor. We show that the privacy of mobility traces attains decent protection at the cost of marginal mobility utility. Our results also show that by exploring the Pareto optimal setting, we can simultaneously increase both privacy (45%) and utility (32%).

    Detecting Events and Patterns in Large-Scale User Generated Textual Streams with Statistical Learning Methods

    A vast amount of textual web streams is influenced by events or phenomena emerging in the real world. The social web forms an excellent modern paradigm, where unstructured user generated content is published on a regular basis and on most occasions is freely distributed. The present Ph.D. Thesis deals with the problem of inferring information - or patterns in general - about events emerging in real life based on the contents of this textual stream. We show that it is possible to extract valuable information about social phenomena, such as an epidemic or even rainfall rates, by automatic analysis of the content published in Social Media, and in particular Twitter, using Statistical Machine Learning methods. An important intermediate task regards the formation and identification of features which characterise a target event; we select and use those textual features in several linear, non-linear and hybrid inference approaches, achieving a significantly good performance in terms of the applied loss function. By further examining this rich data set, we also propose methods for extracting various types of mood signals revealing how affective norms - at least within the social web's population - evolve during the day and how significant events emerging in the real world influence them. Lastly, we present some preliminary findings showing several spatiotemporal characteristics of this textual information as well as the potential of using it to tackle tasks such as the prediction of voting intentions. Comment: PhD thesis, 238 pages, 9 chapters, 2 appendices, 58 figures, 49 tables

    Driving cybersecurity policy insights from information on the Internet

    National Research Foundation (NRF) Singapore

    An overview of City Analytics

    We introduce the fourteen articles in the Royal Society Open Science themed issue on City Analytics. To provide a high-level, strategic overview, we summarize the topics addressed and the analytical tools deployed. We then give a more detailed account of the individual contributions. Our overall aims are (a) to highlight exciting advances in this emerging, interdisciplinary field, (b) to encourage further activity, and (c) to emphasize the variety of new, public domain data sets that are available to researchers.

    Privacy protection in location based services

    This thesis takes a multidisciplinary approach to understanding the characteristics of Location Based Services (LBS) and the protection of location information in these transactions. This thesis reviews the state of the art and theoretical approaches in Regulations, Geographic Information Science, and Computer Science. Motivated by the importance of location privacy in the current age of mobile devices, this thesis argues that failure to ensure privacy protection in this context is a violation of human rights and is detrimental to the freedom of users as individuals. Since location information has unique characteristics, existing methods for protecting other types of information are not suitable for geographical transactions. This thesis demonstrates methods that safeguard location information in location based services and that enable geospatial analysis. Through a taxonomy, the characteristics of LBS and privacy techniques are examined and contrasted. Moreover, mechanisms for privacy protection in LBS are presented, and the resulting data is tested with different geospatial analysis tools to verify that these analyses can be conducted even with protected location information. By discussing the results and conclusions of these studies, this thesis provides an agenda for understanding the usability of obfuscated geospatial data and the feasibility of implementing the proposed mechanisms in privacy-sensitive LBS, as well as for releasing crowdsourced geographic information to third parties.

    Reshaping the African Internet: From scattered islands to a connected continent

    There is an increasing awareness amongst developing regions of the importance of localizing Internet traffic in the quest for fast, affordable, and available Internet access. In this paper, we focus on Africa, where 37 IXPs are currently interconnecting local ISPs, but mostly at the country level. An option to enrich connectivity on the continent and incentivize content providers to establish presence in the region is to interconnect ISPs present at isolated IXPs by creating a distributed IXP layout spanning the continent. The goal of this paper is to investigate whether such IXP interconnection would be possible, and if successful, to estimate the best-case benefits that could be realized in terms of traffic localization and performance. Our hope is that quantitatively demonstrating the benefits will provide incentives for ISPs to intensify their peering relationships in the region. However, it is challenging to estimate this best-case scenario, due to numerous economic, political, and geographical factors influencing the region. Towards this end, we begin with a thorough analysis of the environment in Africa. We then investigate a naive approach to IXP interconnection, which shows that a theoretically optimal solution would be infeasible in practice due to the prevailing socio-economic conditions in the region. We therefore provide an innovative, realistic four-step interconnection scheme to achieve the distributed IXP layout that considers and parameterizes external socio-economic factors using publicly available datasets. We demonstrate that our constrained solution doubles the percentage of continental intra-African paths, reduces their lengths, and drastically decreases the median of their RTTs as well as RTTs to ASes hosting the top 10 global and top 10 regional Alexa websites. Our approach highlights how, given real-world constraints, a solution requires careful consideration in order to be practically realizable. Rodérick Fanou was partially supported by IMDEA Networks Institute, US NSF grant CNS-1414177, and the project BRADE (P2013/ICE-2958) from the Directorate General of Universities and Research, Board of Education, Madrid Regional Government. Francisco Valera was partially funded by the European Commission under FP7 project LEONE (FP7-317647). Amogh Dhamdhere was partially funded by US NSF grants CNS-1414177 and CNS-1513847.

    Misusability Measure Based Sanitization of Big Data for Privacy Preserving MapReduce Programming

    Leakage and misuse of sensitive data is a challenging problem for enterprises. It has become a more serious problem with the advent of cloud and big data, because more data is being outsourced to public clouds and published for wider visibility. Therefore Privacy Preserving Data Publishing (PPDP), Privacy Preserving Data Mining (PPDM) and Privacy Preserving Distributed Data Mining (PPDDM) are crucial in the contemporary era. PPDP and PPDM can protect privacy at the data and process levels respectively. With big data, data privacy has become indispensable because data is stored and processed in semi-trusted environments. In this paper we propose a comprehensive methodology for effective sanitization of data, based on a misusability measure, to preserve privacy and prevent data leakage and misuse. We follow a hybrid approach that caters to the needs of privacy preserving MapReduce programming. We propose an algorithm known as the Misusability Measure-Based Privacy Preserving Algorithm (MMPP), which considers the level of misusability before choosing and applying an appropriate sanitization to big data. Our empirical study with Amazon EC2 and EMR revealed that the proposed methodology is useful in realizing privacy preserving MapReduce programming.