426 research outputs found

    A location privacy-preserving system based on query range cover-up for location-based services

    Data Mining and Machine Learning in Astronomy

    We review the current state of data mining and machine learning in astronomy. 'Data Mining' can have a somewhat mixed connotation from the point of view of a researcher in this field. If used correctly, it can be a powerful approach, holding the potential to fully exploit the exponentially increasing amount of available data, promising great scientific advance. However, if misused, it can be little more than the black-box application of complex computing algorithms that may give little physical insight, and provide questionable results. Here, we give an overview of the entire data mining process, from data collection through to the interpretation of results. We cover common machine learning algorithms, such as artificial neural networks and support vector machines, applications from a broad range of astronomy, emphasizing those where data mining techniques directly resulted in improved science, and important current and future directions, including probability density functions, parallel algorithms, petascale computing, and the time domain. We conclude that, so long as one carefully selects an appropriate algorithm, and is guided by the astronomical problem at hand, data mining can be very much the powerful tool, and not the questionable black box.
    Comment: Published in IJMPD. 61 pages, uses ws-ijmpd.cls. Several extra figures, some minor additions to the text.
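
    As a minimal illustration of the supervised-learning techniques the review covers, the sketch below trains an RBF-kernel support vector machine on synthetic two-colour photometric data; the colour indices, class labels, and data are invented stand-ins for illustration, not anything drawn from the paper.

```python
# Minimal sketch: SVM classification of (synthetic) photometric colours.
# The features, labels, and data here are invented for illustration only.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Fake "star" vs "quasar" samples in two colour indices (u-g, g-r).
stars = rng.normal(loc=[1.2, 0.5], scale=0.2, size=(500, 2))
quasars = rng.normal(loc=[0.3, 0.1], scale=0.3, size=(500, 2))
X = np.vstack([stars, quasars])
y = np.concatenate([np.zeros(500), np.ones(500)])

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Standardise features, then fit an RBF-kernel SVM.
scaler = StandardScaler().fit(X_train)
clf = SVC(kernel="rbf", C=1.0, gamma="scale")
clf.fit(scaler.transform(X_train), y_train)

print("held-out accuracy:", clf.score(scaler.transform(X_test), y_test))
```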

    Privacy protection in location based services

    This thesis takes a multidisciplinary approach to understanding the characteristics of Location Based Services (LBS) and the protection of location information in these transactions. This thesis reviews the state of the art and theoretical approaches in Regulations, Geographic Information Science, and Computer Science. Motivated by the importance of location privacy in the current age of mobile devices, this thesis argues that failure to ensure privacy protection in this context is a violation of human rights and a detriment to the freedom of users as individuals. Since location information has unique characteristics, existing methods for protecting other types of information are not suitable for geographical transactions. This thesis demonstrates methods that safeguard location information in location based services and that still enable geospatial analysis. Through a taxonomy, the characteristics of LBS and privacy techniques are examined and contrasted. Moreover, mechanisms for privacy protection in LBS are presented, and the resulting data is tested with different geospatial analysis tools to verify the possibility of conducting these analyses even with protected location information. By discussing the results and conclusions of these studies, this thesis provides an agenda for understanding the usability of obfuscated geospatial data and the feasibility of implementing the proposed mechanisms in LBS where privacy is a concern, as well as for releasing crowdsourced geographic information to third parties.
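
    One widely used family of location-protection mechanisms is obfuscation by spatial coarsening. The sketch below is a minimal illustration of that general idea, snapping coordinates to a grid-cell centre and adding bounded noise; the cell size and noise radius are arbitrary assumptions, and this is not the thesis's actual mechanism.

```python
# Minimal sketch of location obfuscation: snap a coordinate to a coarse grid
# and perturb it with bounded uniform noise. The cell size and noise radius
# are illustrative assumptions, not the mechanism proposed in the thesis.
import math
import random

def obfuscate(lat, lon, cell_deg=0.01, noise_deg=0.005):
    """Return an obfuscated (lat, lon): grid-cell centre plus bounded noise."""
    snapped_lat = (math.floor(lat / cell_deg) + 0.5) * cell_deg
    snapped_lon = (math.floor(lon / cell_deg) + 0.5) * cell_deg
    return (snapped_lat + random.uniform(-noise_deg, noise_deg),
            snapped_lon + random.uniform(-noise_deg, noise_deg))

print(obfuscate(51.5007, -0.1246))  # an obfuscated point near the true location
```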

    Private and Flexible Proximity Detection in Mobile Social Networks

    System Abstractions for Scalable Application Development at the Edge

    Recent years have witnessed an explosive growth of Internet of Things (IoT) devices, which collect or generate huge amounts of data. Given diverse device capabilities and application requirements, data processing takes place across a range of settings, from on-device to a nearby edge server/cloud and remote cloud. Consequently, edge-cloud coordination has been studied extensively from the perspectives of job placement, scheduling and joint optimization. Typical approaches focus on performance optimization for individual applications. This often requires domain knowledge of the applications, but also leads to application-specific solutions. Application development and deployment over diverse scenarios thus incur repetitive manual effort. There are two overarching challenges to providing system-level support for application development at the edge. First, there is inherent heterogeneity at the device hardware level. The execution settings may range from a small cluster as an edge cloud to on-device inference on embedded devices, differing in hardware capability and programming environments. Further, application performance requirements vary significantly, making it even more difficult to map different applications to already heterogeneous hardware. Second, there are trends towards incorporating edge and cloud and multi-modal data. Together, these add further dimensions to the design space and increase the complexity significantly. In this thesis, we propose a novel framework to simplify application development and deployment over a continuum of edge to cloud. Our framework provides key connections between different dimensions of design considerations, corresponding to the application abstraction, data abstraction and resource management abstraction, respectively. First, our framework masks hardware heterogeneity with abstract resource types through containerization, and abstracts away the application processing pipelines into generic flow graphs. Our framework further supports a notion of degradable computing for application scenarios at the edge that are driven by multimodal sensory input. Next, as video analytics is the killer app of edge computing, we include a generic data management service between video query systems and a video store to organize video data at the edge. We propose a video data unit abstraction based on a notion of distance between objects in the video, quantifying the semantic similarity among video data. Last, considering concurrent application execution, our framework supports multi-application offloading with device-centric control, with a userspace scheduler service that wraps around the operating system scheduler.
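
    To make the generic flow-graph idea concrete, here is a toy sketch in which pipeline stages are plain functions wired into a directed graph and driven by a simple dispatcher; the API, stage names, and the toy "video analytics" pipeline are hypothetical, not the framework's actual interface.

```python
# Toy sketch of an application pipeline as a generic flow graph: stages are
# plain functions, edges define data flow, and a dispatcher pushes items
# through. The API and stage names are hypothetical, not the thesis framework.
from collections import defaultdict, deque

class FlowGraph:
    def __init__(self):
        self.stages = {}                  # name -> callable
        self.edges = defaultdict(list)    # upstream -> [downstream, ...]

    def add_stage(self, name, fn):
        self.stages[name] = fn

    def connect(self, src, dst):
        self.edges[src].append(dst)

    def run(self, source_name, item):
        """Push one input item through the graph starting at source_name."""
        queue = deque([(source_name, item)])
        while queue:
            name, data = queue.popleft()
            out = self.stages[name](data)
            for nxt in self.edges[name]:
                queue.append((nxt, out))

# Hypothetical pipeline: decode -> detect -> aggregate.
g = FlowGraph()
g.add_stage("decode", lambda frame: frame.lower())
g.add_stage("detect", lambda frame: [w for w in frame.split() if w == "car"])
g.add_stage("aggregate", lambda objs: print("detected objects:", len(objs)))
g.connect("decode", "detect")
g.connect("detect", "aggregate")
g.run("decode", "Car bus CAR bicycle")
```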

    Private Web Search with Tiptoe

    Tiptoe is a private web search engine that allows clients to search over hundreds of millions of documents, while revealing no information about their search query to the search engine’s servers. Tiptoe’s privacy guarantee is based on cryptography alone; it does not require hardware enclaves or non-colluding servers. Tiptoe uses semantic embeddings to reduce the problem of private full-text search to private nearest-neighbor search. Then, Tiptoe implements private nearest-neighbor search with a new, high-throughput protocol based on linearly homomorphic encryption. Running on a 45-server cluster, Tiptoe can privately search over 360 million web pages with 145 core-seconds of server compute, 56.9 MiB of client-server communication (74% of which occurs before the client enters its search query), and 2.7 seconds of end-to-end latency. Tiptoe’s search works best on conceptual queries (“knee pain”) and less well on exact string matches (“123 Main Street, New York”). On the MS MARCO search-quality benchmark, Tiptoe ranks the best-matching result in position 7.7 on average. This is worse than a state-of-the-art, non-private neural search algorithm (average rank: 2.3), but is close to the classical tf-idf algorithm (average rank: 6.7). Finally, Tiptoe is extensible: it also supports private text-to-image search and, with minor modifications, it can search over audio, code, and more.
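
    The plaintext core of Tiptoe's reduction, embedding documents and queries into one vector space and ranking by inner product, can be sketched as follows; the random vectors are stand-ins for real semantic embeddings, and the homomorphic-encryption layer that makes the search private is omitted entirely.

```python
# Plaintext sketch of the reduction Tiptoe builds on: embed documents and the
# query into one vector space and rank by inner product (nearest neighbour).
# The random vectors are stand-ins for real semantic embeddings, and the
# homomorphic-encryption layer that makes the search private is omitted.
import numpy as np

rng = np.random.default_rng(1)
num_docs, dim = 10_000, 64

doc_embeddings = rng.normal(size=(num_docs, dim)).astype(np.float32)
doc_embeddings /= np.linalg.norm(doc_embeddings, axis=1, keepdims=True)

def search(query_vec, k=5):
    """Return indices of the k documents with the highest inner-product score."""
    q = query_vec / np.linalg.norm(query_vec)
    scores = doc_embeddings @ q
    return np.argsort(-scores)[:k]

query = rng.normal(size=dim).astype(np.float32)
print("top results:", search(query))
```

    In Tiptoe itself, per the abstract, this scoring step is carried out under linearly homomorphic encryption, so the servers never learn the client's query embedding.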

    A privacy preserving framework for cyber-physical systems and its integration in real world applications

    A cyber-physical system (CPS) comprises a network of processing- and communication-capable sensors and actuators that are pervasively embedded in the physical world. These intelligent computing elements achieve a tight combination and coordination between logic processing and physical resources. It is envisioned that CPS will have great economic and societal impact, and alter the quality of life much as the Internet has done. This dissertation focuses on the privacy issues in current and future CPS applications. As thousands of intelligent devices are deeply embedded in human societies, the system operations may potentially disclose sensitive information if no privacy preserving mechanism is designed. This dissertation identifies data privacy and location privacy as representative cases for investigating the privacy problems in CPS. Data content privacy infringement occurs if the adversary can determine or partially determine the meaning of the transmitted data or the data stored in the storage. Location privacy, on the other hand, is the secrecy that a certain sensed object is associated with a specific location, the disclosure of which may endanger the sensed object. Location privacy may be compromised by the adversary through hop-by-hop traceback along the reverse direction of the message routing path. This dissertation proposes a public key based access control scheme to protect data content privacy. Recent advances in efficient public key schemes, such as ECC, have already shown the feasibility of using public key schemes on low power devices including sensor motes. In this dissertation, an efficient public key security primitive, WM-ECC, has been implemented for TelosB and MICAz, the two major hardware platforms in current sensor networks. WM-ECC achieves the best performance among the academic implementations. Based on WM-ECC, this dissertation has designed various security schemes, including pairwise key establishment, user access control and false data filtering mechanisms, to protect the data content privacy. The experiments presented in this dissertation have shown that the proposed schemes are practical for real world applications. To protect location privacy, this dissertation has considered two adversary models. For the first model, in which an adversary has limited radio detection capability, privacy-aware routing schemes are designed to slow down the adversary's traceback progress. Through theoretical analysis, this dissertation shows how to maximize the adversary's traceback time given a power consumption budget for message routing. Based on the theoretical results, this dissertation also proposes a simple and practical weighted random stride (WRS) routing scheme. The second model assumes a more powerful adversary that is able to monitor all radio communications in the network. This dissertation proposes a random schedule scheme in which each node transmits at a certain time slot in a period so that the adversary would not be able to profile the difference in communication patterns among all the nodes. Finally, this dissertation integrates the proposed privacy preserving framework into Snoogle, a sensor-node-based search engine for the physical world. Snoogle allows people to search for the physical objects in their vicinity. The previously proposed privacy preserving schemes are applied in this application to achieve flexible and resilient privacy preserving capabilities. In addition to security and privacy, Snoogle also incorporates a number of energy saving and communication compression techniques that are carefully designed for systems composed of low-cost, low-power embedded devices. The evaluation study comprises real world experiments on a prototype Snoogle system and scalability simulations.
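
    As a toy illustration of the random-schedule idea described above, the sketch below has every node transmit exactly once per period in a randomly drawn slot, padding with a dummy message when it has no real traffic, so that a global eavesdropper sees the same pattern at every node; the slot count and message handling are illustrative assumptions, not the dissertation's exact scheme.

```python
# Toy sketch of a random transmission schedule: each node transmits exactly
# once per period in a randomly drawn slot, padding with a dummy message when
# it has no real traffic, so every node looks alike to a global eavesdropper.
# Slot count and message handling are illustrative assumptions only.
import random

SLOTS_PER_PERIOD = 16

def run_period(nodes, pending):
    """nodes: list of node ids; pending: dict node_id -> queued real messages."""
    schedule = {node: random.randrange(SLOTS_PER_PERIOD) for node in nodes}
    for slot in range(SLOTS_PER_PERIOD):
        for node in nodes:
            if schedule[node] == slot:
                msg = pending[node].pop(0) if pending[node] else "<dummy>"
                print(f"slot {slot:2d}: node {node} sends {msg}")

run_period(nodes=[1, 2, 3], pending={1: ["reading-42"], 2: [], 3: ["reading-7"]})
```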

    Survey of Vector Database Management Systems

    There are now over 20 commercial vector database management systems (VDBMSs), all produced within the past five years. But embedding-based retrieval has been studied for over ten years, and similarity search for a staggering half century and more. Driving this shift from algorithms to systems are new data intensive applications, notably large language models, that demand vast stores of unstructured data coupled with reliable, secure, fast, and scalable query processing capability. A variety of new data management techniques now exist for addressing these needs; however, there is no comprehensive survey that thoroughly reviews these techniques and systems. We start by identifying five main obstacles to vector data management, namely vagueness of semantic similarity, large size of vectors, high cost of similarity comparison, lack of natural partitioning that can be used for indexing, and difficulty of efficiently answering hybrid queries that require both attributes and vectors. Overcoming these obstacles has led to new approaches to query processing, storage and indexing, and query optimization and execution. For query processing, a variety of similarity scores and query types are now well understood; for storage and indexing, techniques include vector compression, namely quantization, and partitioning based on randomization, learned partitioning, and navigable partitioning; for query optimization and execution, we describe new operators for hybrid queries, as well as techniques for plan enumeration, plan selection, and hardware accelerated execution. These techniques lead to a variety of VDBMSs across a spectrum of design and runtime characteristics, including native systems specialized for vectors and extended systems that incorporate vector capabilities into existing systems. We then discuss benchmarks, and finally we outline research challenges and point the direction for future work.
    Comment: 25 pages.
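
    As a small, generic illustration of the quantization theme surveyed here, the sketch below compresses float vectors to 8-bit codes with per-dimension min/max scaling and answers queries with a brute-force scan over the compressed store; it is not modeled on any particular VDBMS.

```python
# Minimal sketch of scalar quantization for vector storage: compress float32
# vectors to uint8 codes with per-dimension min/max scaling, then brute-force
# scan the compressed store. A generic illustration, not a specific VDBMS.
import numpy as np

rng = np.random.default_rng(2)
vectors = rng.normal(size=(100_000, 32)).astype(np.float32)

lo, hi = vectors.min(axis=0), vectors.max(axis=0)
codes = np.round((vectors - lo) / (hi - lo) * 255).astype(np.uint8)  # 4x smaller

def search(query, k=10):
    """Decompress approximately and rank by squared L2 distance to the query."""
    approx = codes.astype(np.float32) / 255 * (hi - lo) + lo
    dists = ((approx - query) ** 2).sum(axis=1)
    return np.argsort(dists)[:k]

print(search(rng.normal(size=32).astype(np.float32)))
```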

    Privacy preserving linkage and sharing of sensitive data

    Sensitive data, such as personal and business information, is collected by many service providers nowadays. This data is considered a rich source of information for research purposes that could benefit individuals, researchers and service providers. However, because of the sensitivity of such data, privacy concerns, legislation, and conflicts of interest, data holders are reluctant to share their data with others. Data holders typically filter out or obliterate privacy-related sensitive information from their data before sharing it, which limits the utility of this data and affects the accuracy of research. Such practice will protect individuals' privacy; however, it prevents researchers from linking records belonging to the same individual across different sources. This is commonly referred to as the record linkage problem by the healthcare industry. In this dissertation, our main focus is on designing and implementing efficient privacy preserving methods that will encourage sensitive information sources to share their data with researchers without compromising the privacy of the clients or affecting the quality of the research data. The proposed solution should be scalable and efficient for real-world deployments and provide good privacy assurance. While this problem has been investigated before, most of the proposed solutions were either partial solutions, not accurate, or impractical, and therefore subject to further improvements. We have identified several issues and limitations in the state-of-the-art solutions and provided a number of contributions that improve upon existing solutions. Our first contribution is the design of a privacy preserving record linkage protocol using a semi-trusted third party. The protocol allows a set of data publishers (data holders), who compete with each other, to share sensitive information with subscribers (researchers) while preserving the privacy of their clients and without sharing encryption keys. Our second contribution is the design and implementation of a probabilistic privacy preserving record linkage protocol that accommodates discrepancies and errors in the data such as typos. This work builds upon the previous work by linking the records that are similar, where the similarity range is formally defined. Our third contribution is a protocol that performs information integration and sharing without third party services. We use garbled circuits secure computation to design and build a system to perform the record linkages between two parties without sharing their data. Our design uses Bloom filters as inputs to the garbled circuits and performs a probabilistic record linkage using the Dice coefficient similarity measure. As garbled circuits are known for their expensive computations, we propose new approaches that reduce the computation overhead needed to achieve a given level of privacy. We built a scalable record linkage system using garbled circuits that could be deployed in a distributed computation environment like the cloud, and evaluated its security and performance. One of the performance issues for linking large datasets is the amount of secure computation needed to compare every pair of records across the linked datasets to find all possible record matches. To reduce the amount of computation, a method known as blocking is used to filter out as many as possible of the record pairs that will not match, and limit the comparison to a subset of the record pairs (called candidate pairs) that possibly match. Most of the current blocking methods either require the parties to share blocking keys (called block identifiers), extracted from the domain of some record attributes (termed blocking variables), or share reference data points to group their records around these points using some similarity measures. Though these methods reduce the computation substantially, they leak too much information about the records within each block. Toward this end, we propose a novel privacy preserving approximate blocking scheme that allows parties to generate the list of candidate pairs with high accuracy, while protecting the privacy of the records in each block. Our scheme is configurable such that the desired level of performance and accuracy can be achieved according to the required level of privacy. We analyzed the accuracy and privacy of our scheme, implemented a prototype of the scheme, and experimentally evaluated its accuracy and performance against different levels of privacy.
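
    To make the Bloom-filter and Dice-coefficient building blocks concrete, here is a plaintext sketch that encodes a string's character bigrams into a Bloom filter and compares two encodings with the Dice coefficient; the filter length, hash count, and absence of any garbled-circuit protection are simplifying assumptions for illustration only.

```python
# Plaintext sketch of Bloom-filter record linkage: encode a string's bigrams
# into a fixed-length Bloom filter and compare two filters with the Dice
# coefficient. Filter length, hash count, and the lack of garbled-circuit
# protection are simplifying assumptions for illustration.
import hashlib

FILTER_BITS = 256
NUM_HASHES = 4

def bigrams(s):
    s = f"_{s.lower()}_"
    return {s[i:i + 2] for i in range(len(s) - 1)}

def bloom_encode(s):
    bits = set()
    for gram in bigrams(s):
        for seed in range(NUM_HASHES):
            digest = hashlib.sha256(f"{seed}:{gram}".encode()).hexdigest()
            bits.add(int(digest, 16) % FILTER_BITS)
    return bits

def dice(a, b):
    return 2 * len(a & b) / (len(a) + len(b)) if a or b else 0.0

enc = bloom_encode
print(dice(enc("Katherine"), enc("Catherine")))  # similar names -> high score
print(dice(enc("Katherine"), enc("Robert")))     # dissimilar names -> low score
```

    In the protocol described above, such encodings would be compared inside garbled circuits rather than in the clear, so neither party reveals its filters to the other.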

    Detecting outlier patterns with query-based artificially generated searching conditions

    In the age of social computing, finding interesting network patterns or motifs is significant and critical for various areas, such as decision intelligence, intrusion detection, medical diagnosis, social network analysis, fake news identification, and national security. However, subgraph matching remains a computationally challenging problem, let alone identifying special motifs among them. This is especially the case in large heterogeneous real-world networks. In this article, we propose an efficient solution for discovering and ranking human behavior patterns based on network motifs by exploring a user's query in an intelligent way. Our method takes advantage of the semantics provided by a user's query, which in turn provides the mathematical constraint that is crucial for faster detection. We propose an approach to generate query conditions based on the user's query. In particular, we use meta paths between the nodes to define target patterns as well as their similarities, leading to efficient motif discovery and ranking at the same time. The proposed method is examined in a real-world academic network using different similarity measures between the nodes. The experimental results demonstrate that our method can identify interesting motifs and is robust to the choice of similarity measures. © 2014 IEEE
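
    A toy sketch of meta-path-based similarity in an academic network follows: it counts author-paper-author path instances and normalises them PathSim-style; the tiny co-authorship graph and the choice of PathSim are illustrative assumptions, not the paper's exact formulation.

```python
# Toy sketch of meta-path similarity in an academic network: count
# author-paper-author (APA) path instances and normalise them PathSim-style.
# The tiny graph and the PathSim choice are illustrative assumptions only.

# author -> set of papers (a hypothetical co-authorship network)
writes = {
    "alice": {"p1", "p2", "p3"},
    "bob":   {"p2", "p3"},
    "carol": {"p4"},
}

def apa_count(a, b):
    """Number of author-paper-author path instances between a and b."""
    return len(writes[a] & writes[b])

def pathsim(a, b):
    denom = apa_count(a, a) + apa_count(b, b)
    return 2 * apa_count(a, b) / denom if denom else 0.0

print(pathsim("alice", "bob"))    # share two papers -> high similarity
print(pathsim("alice", "carol"))  # share no papers -> 0.0
```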