Linear Regression from Strategic Data Sources
Linear regression is a fundamental building block of statistical data
analysis. It amounts to estimating the parameters of a linear model that maps
input features to corresponding outputs. In the classical setting where the
precision of each data point is fixed, the famous Aitken/Gauss-Markov theorem
in statistics states that generalized least squares (GLS) is a so-called "Best
Linear Unbiased Estimator" (BLUE). In modern data science, however, one often
faces strategic data sources, namely, individuals who incur a cost for
providing high-precision data.
In this paper, we study a setting in which features are public but
individuals choose the precision of the outputs they reveal to an analyst. We
assume that the analyst performs linear regression on this dataset, and
individuals benefit from the outcome of this estimation. We model this scenario
as a game where individuals minimize a cost comprising two components: (a) an
(agent-specific) disclosure cost for providing high-precision data; and (b) a
(global) estimation cost representing the inaccuracy in the linear model
estimate. In this game, the linear model estimate is a public good that
benefits all individuals. We establish that this game has a unique non-trivial
Nash equilibrium. We study the efficiency of this equilibrium and we prove
tight bounds on the price of stability for a large class of disclosure and
estimation costs. Finally, we study the estimator accuracy achieved at
equilibrium. We show that, in general, Aitken's theorem does not hold under
strategic data sources, though it does hold if individuals have identical
disclosure costs (up to a multiplicative factor). When individuals have
non-identical costs, we derive a bound on the improvement of the equilibrium
estimation cost that can be achieved by deviating from GLS, under mild
assumptions on the disclosure cost functions.
Comment: This version (v3) extends the results on the sub-optimality of GLS
(Section 6) and improves writing in multiple places compared to v2. Compared
to the initial version v1, it also fixes an error in Theorem 6 (now Theorem
5), and extends many of the results.
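As a rough illustration of the GLS estimator mentioned in the abstract above, the following sketch covers the simplest case: a single feature, no intercept, and independent noise whose precision (inverse variance) differs per data point. The function name `gls_scalar` and the example numbers are assumptions made for illustration; they do not come from the paper, which treats the general multivariate setting.

```python
def gls_scalar(xs, ys, precisions):
    """GLS estimate of beta in the model y_i = beta * x_i + noise_i.

    precisions[i] is 1 / Var(noise_i); the noise terms are assumed
    independent, so the noise covariance matrix is diagonal.
    Returns (beta_hat, variance_of_beta_hat).
    """
    if not (len(xs) == len(ys) == len(precisions)):
        raise ValueError("xs, ys and precisions must have the same length")
    # Total information about beta: sum of precision-weighted squared features.
    information = sum(p * x * x for x, p in zip(xs, precisions))
    if information <= 0:
        raise ValueError("no information: all precisions or features are zero")
    # Precision-weighted least squares: high-precision points count more.
    weighted_xy = sum(p * x * y for x, y, p in zip(xs, ys, precisions))
    return weighted_xy / information, 1.0 / information


# Hypothetical data: three individuals, the third reveals nothing (precision 0).
beta_hat, variance = gls_scalar([1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [1.0, 0.5, 0.0])
```

In this simplified model the estimator variance is the inverse of the summed weighted information, so every individual who raises their precision lowers the estimation cost for everyone, which is the public-good aspect the abstract describes.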
THE IMPORTANCE OF POSITIVE LANGUAGE ATTITUDE IN MAINTAINING JAVANESE LANGUAGE
This library study examines how language attitude plays an important role in
maintaining a particular language, taking the Javanese language as a case. Javanese is
considered endangered, not because of its size but because it has lost its public
function, is no longer effectively taught to the next generation, and has undergone
shrinkage in both variety and properties. Language contact, language policy, and
language attitude are seen as the reasons for the shift from Javanese to Bahasa
Indonesia. A positive language attitude among Javanese speakers is suggested as the key
factor in maintaining the Javanese language because the first two factors are inevitable
and difficult to change. A positive language attitude is important because once the
attitude is enriched with the cognitive and affective aspects of considering the
Javanese language as part of the speakers' identity (and of their culture in the bigger
sense), the attitude toward the Javanese language will be positive, and this will lead
to positive behavior toward the language as well.
A secure data outsourcing scheme based on Asmuth–Bloom secret sharing
The file attached to this record is the author's final peer-reviewed version. The Publisher's final version can be found by following the DOI link.
Data outsourcing is an emerging paradigm for data management in which a database is provided as a service by third-party service providers. One of the major benefits of offering database as a service is to provide organisations, which are unable to purchase expensive hardware and software to host their databases, with efficient data storage accessible online at a cheap rate. Despite that, several issues of data confidentiality, integrity, availability and efficient indexing of users’ queries at the server side have to be addressed in the data outsourcing paradigm. Service providers have to guarantee that their clients’ data are secured against internal (insider) and external attacks. This paper briefly analyses the existing indexing schemes in data outsourcing and highlights their advantages and disadvantages. Then, this paper proposes a secure data outsourcing scheme based on Asmuth–Bloom secret sharing which tries to address the issues in data outsourcing such as data confidentiality, availability and order preservation for efficient indexing.
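For readers unfamiliar with the underlying primitive, here is a minimal sketch of textbook Asmuth–Bloom (t, n) threshold secret sharing, which is built on the Chinese Remainder Theorem. It is not the outsourcing scheme proposed in the paper and does not implement its order-preserving indexing. The function names, the moduli, and the secret value are all assumptions chosen for illustration, and the parameters are far too small for real use. It requires Python 3.8 or later for `math.prod` and modular inverses via `pow`.

```python
import secrets
from math import gcd, prod


def make_shares(secret, m0, moduli, t):
    """Split `secret` (0 <= secret < m0) into len(moduli) shares, any t of which suffice."""
    moduli = sorted(moduli)
    everything = [m0] + moduli
    for i in range(len(everything)):
        for j in range(i + 1, len(everything)):
            if gcd(everything[i], everything[j]) != 1:
                raise ValueError("m0 and the moduli must be pairwise coprime")
    smallest_t = prod(moduli[:t])                      # product of the t smallest moduli
    largest_t_minus_1 = prod(moduli[len(moduli) - (t - 1):])
    # Asmuth-Bloom condition: t - 1 shares must leave the secret undetermined.
    if m0 * largest_t_minus_1 >= smallest_t:
        raise ValueError("moduli do not satisfy the Asmuth-Bloom condition")
    if not 0 <= secret < m0:
        raise ValueError("secret must lie in [0, m0)")
    # Blind the secret: y = secret + alpha * m0, with y < smallest_t.
    max_alpha = (smallest_t - 1 - secret) // m0
    y = secret + secrets.randbelow(max_alpha + 1) * m0
    return [(m, y % m) for m in moduli]


def recover(shares, m0):
    """Rebuild the secret from at least t (modulus, residue) shares via the CRT."""
    x, modulus = 0, 1
    for m, r in shares:
        # Pick k so that x + modulus * k is congruent to r modulo m.
        k = ((r - x) * pow(modulus, -1, m)) % m
        x += modulus * k
        modulus *= m
    return x % m0


# Toy parameters (assumed): threshold 3 out of 4 shares, secret 3 in [0, 5).
shares = make_shares(3, 5, [11, 13, 17, 19], 3)
recovered = recover(shares[:3], 5)
```

Any three of the four shares determine the blinded value uniquely because the product of any three moduli exceeds it, while two shares leave it ambiguous modulo `m0`.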
A Game-Theoretic Study on Non-Monetary Incentives in Data Analytics Projects with Privacy Implications
The amount of personal information contributed by individuals to digital
repositories such as social network sites has grown substantially. The
existence of this data offers unprecedented opportunities for data analytics
research in various domains of societal importance including medicine and
public policy. The results of these analyses can be considered a public good
which benefits data contributors as well as individuals who are not making
their data available. At the same time, the release of personal information
carries perceived and actual privacy risks to the contributors. Our research
addresses this problem area. In our work, we study a game-theoretic model in
which individuals take control over participation in data analytics projects in
two ways: 1) individuals can contribute data at a self-chosen level of
precision, and 2) individuals can decide whether they want to contribute at all
(or not). From the analyst's perspective, we investigate to which degree the
research analyst has flexibility to set requirements for data precision, so
that individuals are still willing to contribute to the project, and the
quality of the estimation improves. We study this tradeoff scenario for
populations of homogeneous and heterogeneous individuals, and determine Nash
equilibria that reflect the optimal level of participation and precision of
contributions. We further prove that the analyst can substantially increase the
accuracy of the analysis by imposing a lower bound on the precision of the data
that users can reveal.
Approximate Two-Party Privacy-Preserving String Matching with Linear Complexity
Consider two parties who want to compare their strings, e.g., genomes, but do
not want to reveal them to each other. We present a system for
privacy-preserving matching of strings, which differs from existing systems by
providing a deterministic approximation instead of an exact distance. It is
efficient (linear complexity), non-interactive and does not involve a third
party which makes it particularly suitable for cloud computing. We extend our
protocol, such that it mitigates iterated differential attacks proposed by
Goodrich. Further, an implementation of the system is evaluated and compared
against current privacy-preserving string matching algorithms.
Comment: 6 pages, 4 figures