1,303 research outputs found

An Economic Analysis of Privacy Protection and Statistical Accuracy as Social Choices

Author: Abowd John M
Schmutte Ian M
Publication venue: DigitalCommons@ILR
Publication date: 15/08/2018

Statistical agencies face a dual mandate to publish accurate statistics while protecting respondent privacy. Increasing privacy protection requires decreased accuracy. Recognizing this as a resource allocation problem, we propose an economic solution: operate where the marginal cost of increasing privacy equals the marginal benefit. Our model of production, from computer science, assumes data are published using an efficient differentially private algorithm. Optimal choice weighs the demand for accurate statistics against the demand for privacy. Examples from U.S. statistical programs show how our framework can guide decision-making. Further progress requires a better understanding of willingness-to-pay for privacy and statistical accuracy.

DigitalCommons@ILR

From Word to Sense Embeddings: A Survey on Vector Representations of Meaning

Author: Camacho-Collados Jose
Pilehvar Mohammad Taher
Publication date: 26/10/2018

Over the past years, distributed semantic representations have proved to be effective and flexible keepers of prior knowledge to be integrated into downstream applications. This survey focuses on the representation of meaning. We start from the theoretical background behind word vector space models and highlight one of their major limitations: the meaning conflation deficiency, which arises from representing a word with all its possible meanings as a single vector. Then, we explain how this deficiency can be addressed through a transition from the word level to the more fine-grained level of word senses (in its broader acceptation) as a method for modelling unambiguous lexical meaning. We present a comprehensive overview of the wide range of techniques in the two main branches of sense representation, i.e., unsupervised and knowledge-based. Finally, this survey covers the main evaluation procedures and applications for this type of representation, and provides an analysis of four of its important aspects: interpretability, sense granularity, adaptability to different domains and compositionality.

Comment: 46 pages, 8 figures. Published in Journal of Artificial Intelligence Research

arXiv.org e-Print Archive

Online Research @ Cardiff

Metaphysics of Internal Controls

Author: Gal Graham
Publication date: 01/01/2022

A quality internal control system has been seen as a remedy for various corporate governance issues. Two pieces of legislation, the Foreign Corrupt Practices Act (FCPA) and the Sarbanes-Oxley Act (SOX), deal with very different corporate governance issues, but each argues for a similar remedy. Both the FCPA and the SOX legislation argue that improved (or proper) internal controls are necessary: in the case of the FCPA, to root out bribery of foreign officials, and in the case of SOX, to support the accurate preparation of financial statements. An issue that has yet to be resolved is that the quality of internal control systems is subject to subjective assessments of the internal control deficiencies and their impact. This paper presents a mathematical model of internal controls based on the Gödel numbering of axioms. This results in the representation of quality internal controls in terms of an integer. This approach also allows for inferences about financial statements and various auditing judgements.

ScholarSpace at University of Hawai'i at Manoa

Enhancing In-Memory Spatial Indexing with Learned Search

Author: Ding Jialin
Kemper Alfons
Kipf Andreas
Markl Volker
Pandey Varun
Sabek Ibrahim
Van Renen Alexander
Zacharatou Eleni Tzirita
Publication date: 01/01/2023

Spatial data is ubiquitous. Massive amounts of data are generated every day from a plethora of sources such as billions of GPS-enabled devices (e.g., cell phones, cars, and sensors), consumer-based applications (e.g., Uber and Strava), and social media platforms (e.g., location-tagged posts on Facebook, Twitter, and Instagram). This exponential growth in spatial data has led the research community to build systems and applications for efficient spatial data processing. In this study, we apply a recently developed machine-learned search technique for single-dimensional sorted data to spatial indexing. Specifically, we partition spatial data using six traditional spatial partitioning techniques and employ machine-learned search within each partition to support point, range, distance, and spatial join queries. Adhering to the latest research trends, we tune the partitioning techniques to be instance-optimized. By tuning each partitioning technique for optimal performance, we demonstrate that: (i) grid-based index structures outperform tree-based index structures (from 1.23× to 2.47×), (ii) learning-enhanced variants of commonly used spatial index structures outperform their original counterparts (from 1.44× to 53.34× faster), (iii) machine-learned search within a partition is faster than binary search by 11.79% - 39.51% when filtering on one dimension, (iv) the benefit of machine-learned search diminishes in the presence of other compute-intensive operations (e.g., scan costs in higher selectivity queries, Haversine distance computation, and point-in-polygon tests), and (v) index lookup is the bottleneck for tree-based structures, which could potentially be reduced by linearizing the indexed partitions.

Additional Key Words and Phrases: spatial data, indexing, machine-learning, spatial queries, geospatial
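As a rough illustration of what "machine-learned search" over one-dimensional sorted data can look like, the sketch below fits a single linear model from key to position and then runs a bounded binary search around the prediction. The abstract does not say which learned technique the authors use, so this is only a minimal sketch under stated assumptions: the function names `fit_linear` and `learned_lower_bound` are hypothetical, the model is a plain least-squares line, and the key list is assumed to be sorted and non-empty.

```python
import bisect


def fit_linear(keys):
    """Fit position ~ slope * key + intercept over sorted, non-empty keys.

    Returns (slope, intercept, err), where err is an integer bound on how far
    the model's prediction can be from the true position of any stored key.
    This is an illustrative stand-in, not the technique from the paper.
    """
    n = len(keys)
    mean_key = sum(keys) / n
    mean_pos = (n - 1) / 2
    var = sum((k - mean_key) ** 2 for k in keys)
    if var == 0:
        slope = 0.0  # all keys are equal; fall back to a constant prediction
    else:
        cov = sum((k - mean_key) * (i - mean_pos) for i, k in enumerate(keys))
        slope = cov / var
    intercept = mean_pos - slope * mean_key
    max_err = max(abs(i - (slope * k + intercept)) for i, k in enumerate(keys))
    return slope, intercept, int(max_err) + 1


def learned_lower_bound(keys, model, target):
    """Return the first index i with keys[i] >= target (len(keys) if none)."""
    slope, intercept, err = model
    guess = int(round(slope * target + intercept))
    n = len(keys)
    # Clamp the search window to the array; the window is wide enough to
    # contain the answer because the fitted line is non-decreasing in the key.
    lo = min(n, max(0, guess - err))
    hi = max(lo, min(n, guess + err + 1))
    return bisect.bisect_left(keys, target, lo, hi)


# Example: one partition's keys, sorted along a single dimension.
partition_keys = [x * x for x in range(200)]
model = fit_linear(partition_keys)
index = learned_lower_bound(partition_keys, model, 2500)
```

The model only narrows the range that binary search has to cover; correctness still comes from the final `bisect_left` call inside the error window.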

The IT University of Copenhagen's Repository

DALiuGE: A Graph Execution Framework for Harnessing the Astronomical Data Deluge

Author: An Tao
Boulton Mark
Cooper Ian
Dodson Richard
Dolensky Markus
Lao Baoqiang
Mei Ying
Pallot Dave
Tobar Rodrigo
Vinsen Kevin
Wang Feng
Wang Ruonan
Wicenec Andreas
Wu Chen
Publication date: 01/01/2017

The Data Activated Liu Graph Engine - DALiuGE - is an execution framework for processing large astronomical datasets at a scale required by the Square Kilometre Array Phase 1 (SKA1). It includes an interface for expressing complex data reduction pipelines consisting of both data sets and algorithmic components and an implementation run-time to execute such pipelines on distributed resources. By mapping the logical view of a pipeline to its physical realisation, DALiuGE separates the concerns of multiple stakeholders, allowing them to collectively optimise large-scale data processing solutions in a coherent manner. The execution in DALiuGE is data-activated, where each individual data item autonomously triggers the processing on itself. Such decentralisation also makes the execution framework very scalable and flexible, supporting pipeline sizes ranging from fewer than ten tasks running on a laptop to tens of millions of concurrent tasks on the second fastest supercomputer in the world. DALiuGE has been used in production for reducing interferometry data sets from the Karl E. Jansky Very Large Array and the Mingantu Ultrawide Spectral Radioheliograph; and is being developed as the execution framework prototype for the Science Data Processor (SDP) consortium of the Square Kilometre Array (SKA) telescope. This paper presents a technical overview of DALiuGE and discusses case studies from the CHILES and MUSER projects that use DALiuGE to execute production pipelines. In a companion paper, we provide in-depth analysis of DALiuGE's scalability to very large numbers of tasks on two supercomputing facilities.

Comment: 31 pages, 12 figures, currently under review by Astronomy and Computing

arXiv.org e-Print Archive

Shanghai Astronomical Observatory, Chinese Academy of Sciences

Supporting the long‐term curation and migration of natural history museum collections databases

Author: Thomer Andrea K.
Twidale Michael B.
Weber Nicholas M.
Publication venue: Wiley
Publication date: 01/01/2018

Migration of data collections from one platform to another is an important component of data curation – yet, there is surprisingly little guidance for information professionals faced with this task. Data migration may be particularly challenging when these data collections are housed in relational databases, due to the complex ways that data, data schemas, and relational database management software become intertwined over time. Here we present results from a study of the maintenance, evolution and migration of research databases housed in Natural History Museums. We find that database migration is an on‐going – rather than occasional – process for many Collection managers, and that they creatively appropriate and innovate on many existing technologies in their migration work. This paper contributes descriptions of a preliminary set of common adaptations and “migration patterns” in the practices of database curators. It also outlines the strategies they use when facing collection‐level data migration and describes the limitations of existing tools in supporting LAM and “small science” research database migration. We conclude by outlining future research directions for the maintenance and migration of collections and complex digital objects.

Peer Reviewed

https://deepblue.lib.umich.edu/bitstream/2027.42/147782/1/pra214505501055.pd

Deep Blue Documents at the University of Michigan