218 research outputs found
Performance Optimization Strategies for Transactional Memory Applications
This thesis presents tools for Transactional Memory (TM) applications that cover multiple TM systems (Software, Hardware, and hybrid TM) and use information of all different layers of the TM software stack. Therefore, this thesis addresses a number of challenges to extract static information, information about the run time behavior, and expert-level knowledge to develop these new methods and strategies for the optimization of TM applications
QuakeTM: Parallelizing a complex serial application using transactional memory
'Is transactional memory useful?' is the question that cannot be answered until we provide substantial applications that can evaluate its capabilities. While existing TM applications can partially answer the above question, and are useful in the sense that they provide a first-order TM experimentation framework, they serve only as a proof of concept and fail to make a conclusive case for wide adoption by the general computing community.
This work presents QuakeTM, a multiplayer game server; a complex real life TM application that was parallelized from the serial version with TM-specific considerations in mind. QuakeTM consists of 27,600 lines of code spread among 49 files and exhibits irregular parallelism and coarse-grain transactions with large read and write sets. In spite of its complexity, we show that QuakeTM does scale, however more effort is needed to decrease the overhead and the abort rate of current software transactional memory systems. We give insights into development challenges, suggest techniques to solve them and provide extensive analysis of transactional behavior of QuakeTM, with an emphasis and discussion of the TM promise of making parallel programming easy.Postprint (published version
Recommended from our members
Software lock elision for x86 machine code
More than a decade after becoming a topic of intense research there is no
transactional memory hardware nor any examples of software transactional memory
use outside the research community. Using software transactional memory in large
pieces of software needs copious source code annotations and often means
that standard compilers and debuggers can no longer be used. At the same time,
overheads associated with software transactional memory fail to motivate
programmers to expend the needed effort to use software transactional
memory. The only way around the overheads in the case of general unmanaged code
is the anticipated availability of hardware support. On the other hand, architects
are unwilling to devote power and area budgets in mainstream microprocessors to
hardware transactional memory, pointing to transactional memory being a
"niche" programming construct. A deadlock has thus ensued that is blocking
transactional memory use and experimentation in the mainstream.
This dissertation covers the design and construction of a software transactional
memory runtime system called SLE_x86 that can potentially break this
deadlock by decoupling transactional memory from programs using it. Unlike most
other STM designs, the core design principle is transparency rather than
performance. SLE_x86 operates at the level of x86 machine code, thereby
becoming immediately applicable to binaries for the popular x86
architecture. The only requirement is that the binary synchronise using known
locking constructs or calls such as those in Pthreads or OpenMP
libraries. SLE_x86 provides speculative lock elision (SLE) entirely in
software, executing critical sections in the binary using transactional
memory. Optionally, the critical sections can also be executed without using
transactions by acquiring the protecting lock.
The dissertation makes a careful analysis of the impact on performance due to
the demands of the x86 memory consistency model and the need to transparently
instrument x86 machine code. It shows that both of these problems can be
overcome to reach a reasonable level of performance, where transparent
software transactional memory can perform better than a lock. SLE_x86 can
ensure that programs are ready for transactional memory in any form, without
being explicitly written for it
Secured Client Portal
This project is aimed at developing an online search Portal for the Placement Department of the college. The system is an online application that can be accessed throughout the organization and outside as well with proper login provided. This system can be used as an Online Job Portal for the Placement Department of the college to manage the student information with regards to placement. Students logging should be able to upload their information in the form of a CV. Visitors/Company representatives logging in may also access/search any information put up by Students
Accelerating Transactional Memory by Exploiting Platform Specificity
Transactional Memory (TM) is one of the most promising alternatives to lock-based concurrency, but there still remain obstacles that keep TM from being utilized in the real world. Performance, in terms of high scalability and low latency, is always one of the most important keys to general purpose usage. While most of the research in this area focuses on improving a specific single TM implementation and some default platform (a certain operating system, compiler and/or processor), little has been conducted on improving performance more generally, and across platforms.We found that by utilizing platform specificity, we could gain tremendous performance improvement and avoid unnecessary costs due to false assumptions of platform properties, on not only a single TM implementation, but many. In this dissertation, we will present our findings in four sections: 1) we discover and quantify hidden costs from inappropriate compiler instrumentations, and provide sug- gestions and solutions; 2) we boost a set of mainstream timestamp-based TM implementations with the x86-specific hardware cycle counter; 3) we explore compiler opportunities to reduce the transaction abort rate, by reordering read-modify-write operations — the whole technique can be applied to all TM implementations, and could be more effective with some help from compilers; and 4) we coordinate the state-of-the-art Intel Haswell TSX hardware TM with a software TM “Cohorts”, and develop a safe and flexible Hybrid TM, “HyCo”, to be our final performance boost in this dissertation.The impact of our research extends beyond Transactional Memory, to broad areas of concurrent programming. Some of our solutions and discussions, such as the synchronization between accesses of the hardware cycle counter and memory loads and stores, can be utilized to boost concurrent data structures and many timestamp-based systems and applications. Others, such as discussions of compiler instrumentation costs and reordering opportunities, provide additional insights to compiler designers. Our findings show that platform specificity must be taken into consideration to achieve peak performance
Enhancing the efficiency and practicality of software transactional memory on massively multithreaded systems
Chip Multithreading (CMT) processors promise to deliver higher performance by running more than one stream of instructions in parallel. To exploit CMT's capabilities, programmers have to parallelize their applications, which is not a trivial task. Transactional Memory (TM) is one of parallel programming models that aims at simplifying synchronization by raising the level of abstraction between semantic atomicity and the means by which that atomicity is achieved. TM is a promising programming model but there are still important challenges that must be addressed to make it more practical and efficient in mainstream parallel programming.
The first challenge addressed in this dissertation is that of making the evaluation of TM proposals more solid with realistic TM benchmarks and being able to run the same benchmarks on different STM systems. We first introduce a benchmark suite, RMS-TM, a comprehensive benchmark suite to evaluate HTMs and STMs. RMS-TM consists of seven applications from the Recognition, Mining and Synthesis (RMS) domain that are representative of future workloads. RMS-TM features current TM research issues such as nesting and I/O inside transactions, while also providing various TM characteristics. Most STM systems are implemented as user-level libraries: the programmer is expected to manually instrument not only transaction boundaries, but also individual loads and stores within transactions. This library-based approach is increasingly tedious and error prone and also makes it difficult to make reliable performance comparisons. To enable an "apples-to-apples" performance comparison, we then develop a software layer that allows researchers to test the same applications with interchangeable STM back ends.
The second challenge addressed is that of enhancing performance and scalability of TM applications running on aggressive multi-core/multi-threaded processors. Performance and scalability of current TM designs, in particular STM desings, do not always meet the programmer's expectation, especially at scale. To overcome this limitation, we propose a new STM design, STM2, based on an assisted execution model in which time-consuming TM operations are offloaded to auxiliary threads while application threads optimistically perform computation. Surprisingly, our results show that STM2 provides, on average, speedups between 1.8x and 5.2x over state-of-the-art STM systems. On the other hand, we notice that assisted-execution systems may show low processor utilization. To alleviate this problem and to increase the efficiency of STM2, we enriched STM2 with a runtime mechanism that automatically and adaptively detects application and auxiliary threads' computing demands and dynamically partition hardware resources between the pair through the hardware thread prioritization mechanism implemented in POWER machines.
The third challenge is to define a notion of what it means for a TM program to be correctly synchronized. The current definition of transactional data race requires all transactions to be totally ordered "as if'' serialized by a global lock, which limits the scalability of TM designs. To remove this constraint, we first propose to relax the current definition of transactional data race to allow a higher level of concurrency. Based on this definition we propose the first practical race detection algorithm for C/C++ applications (TRADE) and implement the corresponding race detection tool. Then, we introduce a new definition of transactional data race that is more intuitive, transparent to the underlying TM implementation, can be used for a broad set of C/C++ TM programs. Based on this new definition, we proposed T-Rex, an efficient and scalable race detection tool for C/C++ TM applications. Using TRADE and T-Rex, we have discovered subtle transactional data races in widely-used STAMP applications which have not been reported in the past
Efficient query processing in managed runtimes
This thesis presents strategies to improve the query evaluation performance over
huge volumes of relational-like data that is stored in the memory space of managed
applications. Storing and processing application data in the memory space of managed
applications is motivated by the convergence of two recent trends in data management.
First, dropping DRAM prices have led to memory capacities that allow the entire working
set of an application to fit into main memory and to the emergence of in-memory
database systems (IMDBs). Second, language-integrated query transparently integrates
query processing syntax into programming languages and, therefore, allows complex
queries to be composed in the application. IMDBs typically serve as data stores to applications
written in an object-oriented language running on a managed runtime. In
this thesis, we propose a deeper integration of the two by storing all application data in
the memory space of the application and using language-integrated query, combined
with query compilation techniques, to provide fast query processing.
As a starting point, we look into storing data as runtime-managed objects in collection
types provided by the programming language. Queries are formulated using
language-integrated query and dynamically compiled to specialized functions that produce
the result of the query in a more efficient way by leveraging query compilation
techniques similar to those used in modern database systems. We show that the generated
query functions significantly improve query processing performance compared to
the default execution model for language-integrated query. However, we also identify
additional inefficiencies that can only be addressed by processing queries using low-level
techniques which cannot be applied to runtime-managed objects. To address this,
we introduce a staging phase in the generated code that makes query-relevant managed
data accessible to low-level query code. Our experiments in .NET show an improvement
in query evaluation performance of up to an order of magnitude over the default
language-integrated query implementation.
Motivated by additional inefficiencies caused by automatic garbage collection, we
introduce a new collection type, the black-box collection. Black-box collections integrate
the in-memory storage layer of a relational database system to store data and hide
the internal storage layout from the application by employing existing object-relational
mapping techniques (hence, the name black-box). Our experiments show that black-box
collections provide better query performance than runtime-managed collections
by allowing the generated query code to directly access the underlying relational in-memory
data store using low-level techniques. Black-box collections also outperform
a modern commercial database system. By removing huge volumes of collection data
from the managed heap, black-box collections further improve the overall performance
and response time of the application and improve the application’s scalability when
facing huge volumes of collection data.
To enable a deeper integration of the data store with the application, we introduce
self-managed collections. Self-managed collections are a new type of collection for
managed applications that, in contrast to black-box collections, store objects. As the
data elements stored in the collection are objects, they are directly accessible from the
application using references which allows for better integration of the data store with
the application. Self-managed collections manually manage the memory of objects
stored within them in a private heap that is excluded from garbage collection. We introduce
a special collection syntax and a novel type-safe manual memory management
system for this purpose. As was the case for black-box collections, self-managed collections
improve query performance by utilizing a database-inspired data layout and
allowing the use of low-level techniques. By also supporting references between collection
objects, they outperform black-box collections
- …