Well-Structured Futures and Cache Locality
Authors
Maurice Herlihy
Zhiyu Liu
Publication date
16 August 2016
Publisher
DOI
View on arXiv
Abstract
In fork-join parallelism, a sequential program is split into a directed acyclic graph of tasks linked by directed dependency edges, and the tasks are executed, possibly in parallel, in an order consistent with their dependencies. A popular and effective way to extend fork-join parallelism is to allow threads to create futures. A thread creates a future to hold the results of a computation, which may or may not be executed in parallel. That result is returned when some thread touches that future, blocking if necessary until the result is ready. Recent research has shown that while futures can, of course, enhance parallelism in a structured way, they can have a deleterious effect on cache locality. In the worst case, futures can incur $\Omega(P T_\infty + t T_\infty)$ deviations, which implies $\Omega(C P T_\infty + C t T_\infty)$ additional cache misses, where $C$ is the number of cache lines, $P$ is the number of processors, $t$ is the number of touches, and $T_\infty$ is the computation span. Since cache locality has a large impact on software performance on modern multicores, this result is troubling. In this paper, however, we show that if futures are used in a simple, disciplined way, then the situation is much better: if each future is touched only once, either by the thread that created it, or by a thread to which the future has been passed from the thread that created it, then parallel executions with work stealing can incur at most $O(C P T^2_\infty)$ additional cache misses, a substantial improvement. This structured use of futures is characteristic of many (but not all) parallel applications.
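The create/touch pattern the abstract describes can be sketched in a few lines. This is a minimal illustration (not code from the paper, and `expensive` is a hypothetical stand-in computation) using Python's `concurrent.futures`: `submit()` plays the role of creating a future, and `result()` is the "touch", blocking until the value is ready. Note that the single-touch discipline the paper analyzes — each future touched exactly once, by its creator or by a thread the future was passed to — is followed here by construction.

```python
# Minimal sketch of the future pattern described above, assuming a
# hypothetical stand-in computation `expensive`. submit() creates the
# future; result() is the single "touch", blocking until ready.
from concurrent.futures import ThreadPoolExecutor

def expensive(n):
    # Stand-in for a computation that may run in parallel.
    return sum(i * i for i in range(n))

with ThreadPoolExecutor() as pool:
    fut = pool.submit(expensive, 1000)   # create the future
    # ... the creating thread may do other work here ...
    value = fut.result()                 # single touch: block until ready
```

Here the future is touched only once, by the thread that created it — the disciplined usage under which the paper proves the $O(C P T^2_\infty)$ bound.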