Optimistic Concurrency Control for Distributed Unsupervised Learning
Research on distributed machine learning algorithms has focused primarily on
one of two extremes - algorithms that obey strict concurrency constraints or
algorithms that obey few or no such constraints. We consider an intermediate
alternative in which algorithms optimistically assume that conflicts are
unlikely and if conflicts do arise a conflict-resolution protocol is invoked.
We view this "optimistic concurrency control" paradigm as particularly
appropriate for large-scale machine learning algorithms, especially in the
unsupervised setting. We demonstrate our approach in three problem areas:
clustering, feature learning and online facility location. We evaluate our
methods via large-scale experiments in a cluster computing environment.
Comment: 25 pages, 5 figures
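The optimistic pattern described in this abstract can be illustrated with a minimal sketch for threshold-based clustering. This is an illustration of the general idea, not the authors' algorithm: the function name `occ_cluster_epoch`, the `threshold` parameter, the round-robin split of points across simulated workers, and the "reject the later proposal" resolution rule are all assumptions made for the example. Workers optimistically propose new cluster centers against a stale snapshot, and a serial validation step resolves conflicts between proposals.

```python
import math


def occ_cluster_epoch(points, centers, threshold, n_workers=2):
    """Run one optimistic epoch and return the updated list of centers.

    Hypothetical sketch: workers are simulated sequentially, each seeing
    the same snapshot of `centers` taken at the start of the epoch.
    """
    snapshot = list(centers)
    proposals = []

    # Optimistic phase: every worker assumes no other worker will create
    # a nearby center, so it proposes any point that is far from all
    # centers in its (possibly stale) snapshot.
    for w in range(n_workers):
        for p in points[w::n_workers]:
            if not snapshot or min(math.dist(p, c) for c in snapshot) > threshold:
                proposals.append(p)

    # Validation phase (serial): a proposal conflicts if a center accepted
    # earlier in this epoch already covers it. The conflict is resolved
    # here by rejecting the later proposal.
    accepted = []
    for p in proposals:
        if all(math.dist(p, c) > threshold for c in accepted):
            accepted.append(p)

    return snapshot + accepted


pts = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0)]
new_centers = occ_cluster_epoch(pts, [], threshold=1.0)
```

In this toy run all four points are proposed as centers because the snapshot is empty, and validation keeps only one center per group of nearby points. Conflicts cost work only when they actually occur, which is the trade-off the optimistic approach relies on.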
GeoGauss: Strongly Consistent and Light-Coordinated OLTP for Geo-Replicated SQL Database
Multinational enterprises conduct global business that has a demand for
geo-distributed transactional databases. Existing state-of-the-art databases
adopt a sharded master-follower replication architecture. However, the
single-master serving mode incurs massive cross-region writes from clients, and
the sharded architecture requires multiple round-trip acknowledgments (e.g.,
2PC) to ensure atomicity for cross-shard transactions. These limitations drive
us to seek yet another design choice. In this paper, we propose GeoGauss, a
strongly consistent OLTP database with a full-replica multi-master architecture.
To efficiently merge the updates from different master nodes, we propose a
multi-master OCC that unifies data replication and concurrent transaction
processing. By leveraging an epoch-based delta state merge rule and
optimistic asynchronous execution, GeoGauss ensures strong consistency with a
light-coordinated protocol and allows more concurrency with weak isolation,
which is sufficient to meet our needs. Our geo-distributed experimental
results show that GeoGauss achieves 7.06X higher throughput and 17.41X lower
latency than the state-of-the-art geo-distributed database CockroachDB on the
TPC-C benchmark.
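The abstract does not spell out the epoch-based delta state merge rule, so the following is only a rough sketch of how such a rule could work in principle. Everything in it is an assumption for illustration: the `Txn` type, the `merge_epoch` function, and the "earliest (timestamp, node) writer wins, later conflicting transactions abort" policy are hypothetical and are not taken from GeoGauss. The point it shows is that if every replica applies the same deterministic rule to the same set of per-epoch write sets, all replicas reach the same state without per-transaction coordination.

```python
from dataclasses import dataclass, field


@dataclass
class Txn:
    node: str                 # originating master (hypothetical field)
    ts: int                   # commit timestamp within the epoch (hypothetical)
    writes: dict = field(default_factory=dict)  # key -> new value (the delta)


def merge_epoch(state, txns):
    """Merge one epoch of write sets into `state` deterministically.

    Assumed rule: order transactions by (ts, node); a transaction aborts
    as a whole if any key it writes was already written in this epoch.
    Returns the new state and the (node, ts) ids of aborted transactions.
    """
    new_state = dict(state)
    claimed = set()
    aborted = []
    for txn in sorted(txns, key=lambda t: (t.ts, t.node)):
        if claimed & txn.writes.keys():
            aborted.append((txn.node, txn.ts))   # write-write conflict
            continue
        claimed.update(txn.writes.keys())
        new_state.update(txn.writes)
    return new_state, aborted


t1 = Txn("A", 1, {"x": 1})
t2 = Txn("B", 2, {"x": 2, "y": 5})
t3 = Txn("C", 3, {"z": 9})

# Two replicas receive the same epoch's deltas in different arrival orders.
replica_1 = merge_epoch({"x": 0}, [t1, t2, t3])
replica_2 = merge_epoch({"x": 0}, [t3, t2, t1])
```

Because the merge sorts its input before applying it, arrival order does not matter, and both replicas end up with identical state and identical abort decisions.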