Well-Structured Futures and Cache Locality

Authors: Herlihy Maurice, Liu Zhiyu
Publication date: 16/08/2016

In fork-join parallelism, a sequential program is split into a directed acyclic graph of tasks linked by directed dependency edges, and the tasks are executed, possibly in parallel, in an order consistent with their dependencies. A popular and effective way to extend fork-join parallelism is to allow threads to create futures. A thread creates a future to hold the results of a computation, which may or may not be executed in parallel. That result is returned when some thread touches that future, blocking if necessary until the result is ready. Recent research has shown that while futures can, of course, enhance parallelism in a structured way, they can have a deleterious effect on cache locality. In the worst case, futures can incur $\Omega(P T_\infty + t T_\infty)$ deviations, which implies $\Omega(C P T_\infty + C t T_\infty)$ additional cache misses, where $C$ is the number of cache lines, $P$ is the number of processors, $t$ is the number of touches, and $T_\infty$ is the computation span. Since cache locality has a large impact on software performance on modern multicores, this result is troubling. In this paper, however, we show that if futures are used in a simple, disciplined way, then the situation is much better: if each future is touched only once, either by the thread that created it, or by a thread to which the future has been passed from the thread that created it, then parallel executions with work stealing can incur at most $O(C P T^2_\infty)$ additional cache misses, a substantial improvement. This structured use of futures is characteristic of many (but not all) parallel applications.
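The single-touch discipline described in this abstract can be sketched in a few lines of Python. This is only an illustration of the usage pattern, not the authors' system: `concurrent.futures.ThreadPoolExecutor` is not a work-stealing scheduler, and the helper names `fib` and `parallel_sum_of_fibs` are hypothetical. The point is that the creating thread touches its future exactly once.

```python
from concurrent.futures import ThreadPoolExecutor


def fib(n):
    """Plain sequential Fibonacci, standing in for the forked computation."""
    return n if n < 2 else fib(n - 1) + fib(n - 2)


def parallel_sum_of_fibs(executor, a, b):
    # Create a future: fib(a) may run in parallel with the caller.
    future = executor.submit(fib, a)
    # The creating thread keeps working on its own task in the meantime.
    local = fib(b)
    # Single touch: result() blocks if necessary until the value is ready.
    # The future is not touched again after this line.
    return future.result() + local


with ThreadPoolExecutor(max_workers=2) as pool:
    total = parallel_sum_of_fibs(pool, 15, 10)
```

Here the future is touched by the thread that created it. Passing `future` to exactly one other thread, which then calls `result()` once, would be the other case the abstract allows.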

arXiv.org e-Print Archive

CiteSeerX

Cutting sequence and Sturmian sequence in billiard

Author: Liu Zhiyu
Publication date: 09/10/2022

The object of billiards is to drive the billiard ball on the table into the designated holes. We study the trajectory of the billiard ball so that we can predict the direction of the ball. For rational slopes, we obtain the cutting sequence by setting up the square torus. We simplify the cutting sequence using shearing and flipping, and we obtain the transformation between trajectory slope and cutting sequence. For irrational slopes, we examine some properties of Sturmian sequences, which help us distinguish between cutting sequences and Sturmian sequences. In conclusion, we use different sequences to study trajectories of different slopes.

Comment: 29 pages, 11 figures. arXiv admin note: text overlap with arXiv:1507.02571 by other author
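As a rough illustration of the rational-slope case, the sketch below computes one period of a cutting sequence on the square torus. It is not the paper's method: the function name `cutting_sequence`, the labels (`A` for crossing a horizontal edge, `B` for crossing a vertical edge), and the starting offset are all assumptions chosen for the example. The line is unfolded onto the integer grid, and exact fractions are used so that crossings are ordered without rounding error.

```python
from fractions import Fraction
from math import gcd


def cutting_sequence(p, q):
    """One period of the cutting sequence for a line of slope p/q (p, q > 0).

    Assumed convention: 'A' = crossing a horizontal edge (y is an integer),
    'B' = crossing a vertical edge (x is an integer).
    """
    if p <= 0 or q <= 0:
        raise ValueError("p and q must be positive")
    g = gcd(p, q)
    p, q = p // g, q // g  # reduce the slope to lowest terms

    # Unfold the torus: follow y = (p/q) x + c for 0 < x <= q.
    # The offset c = 1/(2q) keeps the line away from every lattice point.
    c = Fraction(1, 2 * q)

    crossings = []
    for k in range(1, q + 1):          # vertical grid lines x = k
        crossings.append((Fraction(k), "B"))
    for j in range(1, p + 1):          # horizontal grid lines y = j
        crossings.append(((j - c) * q / p, "A"))

    crossings.sort()                   # order the crossings by x-coordinate
    return "".join(label for _, label in crossings)


print(cutting_sequence(1, 1))  # AB
print(cutting_sequence(1, 2))  # BAB
```

Under this convention, one period contains p letters `A` and q letters `B`, so the slope can be read off from the letter counts.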

arXiv.org e-Print Archive

Root Cause Analysis of Refinement Engineering for Automobile M

Author: Liu Zhiyu
Publication venue: The Repository at St. Cloud State
Publication date: 01/05/2016

This project focused on finding solutions for tiny flaws that appeared on auto M as a result of the production process. The core problem was discovering the root cause of the tiny flaws and proposing how to solve them. Brainstorming, data collection and analysis, measurement and material tests, a Gage R&R project, and other methods were used to look for the root causes. During the project, the production defect was confirmed. With the root cause defined, the quality of auto M improved and the problem was observably solved.

St. Cloud State University

Abortable Reader-Writer Locks are No More Complex Than Abortable Mutex Locks

Author: Liu Zhiyu
Publication venue: Dartmouth Digital Commons
Publication date: 01/06/2012

When a process attempts to acquire a mutex lock, it may be forced to wait if another process currently holds the lock. In certain applications, such as real-time operating systems and databases, indefinite waiting can cause a process to miss an important deadline. Hence, there has been research on designing abortable mutual exclusion locks, and fairly efficient algorithms of O(log n) RMR complexity have been discovered (n denotes the number of processes for which the algorithm is designed). The abort feature is just as important for a reader-writer lock as it is for a mutual exclusion lock, but to the best of our knowledge there are currently no abortable reader-writer locks that are starvation-free. We show the surprising result that any abortable, starvation-free mutual exclusion algorithm of RMR complexity t(n) can be transformed into an abortable, starvation-free reader-writer exclusion algorithm of RMR complexity O(t(n)). Thus, we obtain the first abortable, starvation-free reader-writer exclusion algorithm of O(log n) RMR complexity. Our results apply to the Cache-Coherent (CC) model of multiprocessors.
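To make the notion of an abortable lock attempt concrete, here is a minimal sketch using the timeout parameter of Python's standard `threading.Lock.acquire`. It shows only the interface idea, a waiter that gives up instead of blocking indefinitely. It is not the algorithm from this work and says nothing about RMR complexity, starvation freedom, or reader-writer exclusion. The helper `try_critical_section` and the 0.05 second deadline are hypothetical.

```python
import threading


def try_critical_section(lock, deadline_seconds, work):
    """Try to enter the critical section; abort if the wait exceeds the deadline.

    Returns (True, result) if the lock was acquired and work() ran,
    or (False, None) if the attempt was aborted.
    """
    if not lock.acquire(timeout=deadline_seconds):
        return (False, None)   # aborted: the caller can go meet its deadline
    try:
        return (True, work())
    finally:
        lock.release()


lock = threading.Lock()

lock.acquire()                 # simulate another process holding the lock
aborted = try_critical_section(lock, 0.05, lambda: "done")
lock.release()

completed = try_critical_section(lock, 0.05, lambda: "done")
```

In the first call the lock is held, so the attempt times out and is abandoned. In the second call the lock is free and the work runs.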

Dartmouth Digital Commons (Dartmouth College)