7,848 research outputs found
Recommended from our members
Data standardization
With data rapidly becoming the lifeblood of the global economy, the ability to improve its use significantly affects both social and private welfare. Data standardization is key to facilitating and improving the use of data when data portability and interoperability are needed. Absent data standardization, a “Tower of Babel” of different databases may be created, limiting synergetic knowledge production. Based on interviews with data scientists, this Article identifies three main technological obstacles to data portability and interoperability: metadata uncertainties, data transfer obstacles, and missing data. It then explains how data standardization can remove at least some of these obstacles and lead to smoother data flows and better machine learning. The Article then identifies and analyzes additional effects of data standardization. As shown, data standardization has the potential to support a competitive and distributed data collection ecosystem and lead to easier policing in cases where rights are infringed or unjustified harms are created by data-fed algorithms. At the same time, increasing the scale and scope of data analysis can create negative externalities in the form of better profiling, increased harms to privacy, and cybersecurity harms. Standardization also has implications for investment and innovation, especially if lock-in to an inefficient standard occurs. The Article then explores whether market-led standardization initiatives can be relied upon to increase welfare, and the role governmental-facilitated data standardization should play, if at all
Middleware-based Database Replication: The Gaps between Theory and Practice
The need for high availability and performance in data management systems has
been fueling a long running interest in database replication from both academia
and industry. However, academic groups often attack replication problems in
isolation, overlooking the need for completeness in their solutions, while
commercial teams take a holistic approach that often misses opportunities for
fundamental innovation. This has created over time a gap between academic
research and industrial practice.
This paper aims to characterize the gap along three axes: performance,
availability, and administration. We build on our own experience developing and
deploying replication systems in commercial and academic settings, as well as
on a large body of prior related work. We sift through representative examples
from the last decade of open-source, academic, and commercial database
replication systems and combine this material with case studies from real
systems deployed at Fortune 500 customers. We propose two agendas, one for
academic research and one for industrial R&D, which we believe can bridge the
gap within 5-10 years. This way, we hope to both motivate and help researchers
in making the theory and practice of middleware-based database replication more
relevant to each other.Comment: 14 pages. Appears in Proc. ACM SIGMOD International Conference on
Management of Data, Vancouver, Canada, June 200
- …