42,601 research outputs found

    Multiple imputation for sharing precise geographies in public use data

    Full text link
    When releasing data to the public, data stewards are ethically and often legally obligated to protect the confidentiality of data subjects' identities and sensitive attributes. They also strive to release data that are informative for a wide range of secondary analyses. Achieving both objectives is particularly challenging when data stewards seek to release highly resolved geographical information. We present an approach for protecting the confidentiality of data with geographic identifiers based on multiple imputation. The basic idea is to convert geography to latitude and longitude, estimate a bivariate response model conditional on attributes, and simulate new latitude and longitude values from these models. We illustrate the proposed methods using data describing causes of death in Durham, North Carolina. In the context of the application, we present a straightforward tool for generating simulated geographies and attributes based on regression trees, and we present methods for assessing disclosure risks with such simulated data.Comment: Published in at http://dx.doi.org/10.1214/11-AOAS506 the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org

    A Taxonomy for Management and Optimization of Multiple Resources in Edge Computing

    Full text link
    Edge computing is promoted to meet increasing performance needs of data-driven services using computational and storage resources close to the end devices, at the edge of the current network. To achieve higher performance in this new paradigm one has to consider how to combine the efficiency of resource usage at all three layers of architecture: end devices, edge devices, and the cloud. While cloud capacity is elastically extendable, end devices and edge devices are to various degrees resource-constrained. Hence, an efficient resource management is essential to make edge computing a reality. In this work, we first present terminology and architectures to characterize current works within the field of edge computing. Then, we review a wide range of recent articles and categorize relevant aspects in terms of 4 perspectives: resource type, resource management objective, resource location, and resource use. This taxonomy and the ensuing analysis is used to identify some gaps in the existing research. Among several research gaps, we found that research is less prevalent on data, storage, and energy as a resource, and less extensive towards the estimation, discovery and sharing objectives. As for resource types, the most well-studied resources are computation and communication resources. Our analysis shows that resource management at the edge requires a deeper understanding of how methods applied at different levels and geared towards different resource types interact. Specifically, the impact of mobility and collaboration schemes requiring incentives are expected to be different in edge architectures compared to the classic cloud solutions. Finally, we find that fewer works are dedicated to the study of non-functional properties or to quantifying the footprint of resource management techniques, including edge-specific means of migrating data and services.Comment: Accepted in the Special Issue Mobile Edge Computing of the Wireless Communications and Mobile Computing journa

    Multitask Learning for Network Traffic Classification

    Full text link
    Traffic classification has various applications in today's Internet, from resource allocation, billing and QoS purposes in ISPs to firewall and malware detection in clients. Classical machine learning algorithms and deep learning models have been widely used to solve the traffic classification task. However, training such models requires a large amount of labeled data. Labeling data is often the most difficult and time-consuming process in building a classifier. To solve this challenge, we reformulate the traffic classification into a multi-task learning framework where bandwidth requirement and duration of a flow are predicted along with the traffic class. The motivation of this approach is twofold: First, bandwidth requirement and duration are useful in many applications, including routing, resource allocation, and QoS provisioning. Second, these two values can be obtained from each flow easily without the need for human labeling or capturing flows in a controlled and isolated environment. We show that with a large amount of easily obtainable data samples for bandwidth and duration prediction tasks, and only a few data samples for the traffic classification task, one can achieve high accuracy. We conduct two experiment with ISCX and QUIC public datasets and show the efficacy of our approach

    Internet-based 'social sharing' as a new form of global production: The case of SETI@home

    Get PDF
    Benkler ('Sharing Nicely', Yale Law Journal, 2004, Vol. 114, pp. 273-358) has argued that 'social sharing' via Internet-based distributed computing is a new, so far under-appreciated modality of economic production. This paper presents results from an empirical study of SETI@home (the Search for Extraterrestrial Intelligence), which is the classic example of such a computing project. The aim is to explain SETI@home participation and its intensity in a cross-country setting. The data are for a sample of 172 developed and developing countries for the years 2002-2004. The results indicate that SETI@home participation and its intensity can be explained largely by the degree of ICT access (proxied by the International Telecommunication Union's 'Digital Access Index'), as well as GDP per capita and dummy variables for major country groups. Some other variables, such as the Human Development Index, perform less well. Although SETI@home is a global phenomenon, it is never-the-less mostly concentrated in rich countries. However, there are indications of a slowly narrowing global SETI@home digital divide
    • …
    corecore