
    Vector-thread architecture and implementation

    Thesis (Ph.D.), Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2007. This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections. Includes bibliographical references (p. 181-186).
This thesis proposes vector-thread (VT) architectures as a performance-efficient solution for all-purpose computing. The VT architectural paradigm unifies the vector and multithreaded compute models. VT provides the programmer with a control processor and a vector of virtual processors (VPs). The control processor can use vector-fetch commands to broadcast instructions to all the VPs, or each VP can use thread-fetches to direct its own control flow. A seamless intermixing of the vector and threaded control mechanisms allows a VT architecture to flexibly and compactly encode application parallelism and locality. VT architectures can efficiently exploit a wide variety of loop-level parallelism, including non-vectorizable loops with cross-iteration dependencies or internal control flow. The Scale VT architecture is an instantiation of the vector-thread paradigm designed for low-power and high-performance embedded systems. Scale includes a scalar RISC control processor and a four-lane vector-thread unit that can execute 16 operations per cycle and supports up to 128 simultaneously active virtual processor threads. Scale provides unit-stride and strided-segment vector loads and stores, and it implements cache refill/access decoupling. The Scale memory system includes a four-port, non-blocking, 32-way set-associative, 32 KB cache. A prototype Scale VT processor was implemented in 180 nm technology using an ASIC-style design flow. The chip has 7.1 million transistors and a core area of 16.6 mm², and it runs at 260 MHz while consuming 0.4-1.1 W.
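To make the two fetch mechanisms concrete, here is a toy, sequential Python model of a VT unit running a loop whose trip count differs per VP, the kind of internal control flow the abstract says VT handles. The control processor vector-fetches an `init` block to every VP; each VP then thread-fetches a `loop` block for itself until its own counter reaches zero. All names (`VPState`, `init_block`, `loop_block`, `vector_fetch`) are hypothetical, and the model ignores lanes, timing, and how Scale actually encodes blocks and fetches.

```python
from typing import Callable, Dict, List, Optional

# Registers of one virtual processor (hypothetical layout).
VPState = Dict[str, int]
# A code block updates one VP's registers and returns the name of the next
# block to thread-fetch, or None to stop and wait for the next vector-fetch.
Block = Callable[[VPState], Optional[str]]


def init_block(vp: VPState) -> Optional[str]:
    vp["acc"] = 0
    return "loop" if vp["n"] > 0 else None


def loop_block(vp: VPState) -> Optional[str]:
    vp["acc"] += vp["n"]
    vp["n"] -= 1
    return "loop" if vp["n"] > 0 else None  # thread-fetch decided per VP


BLOCKS: Dict[str, Block] = {"init": init_block, "loop": loop_block}


def vector_fetch(vps: List[VPState], name: str) -> int:
    """Broadcast block `name` to every VP, then let each VP follow its own
    thread-fetches until it stops. Returns the number of blocks executed."""
    executed = 0
    for vp in vps:  # a real VT unit would run VPs in parallel across lanes
        nxt: Optional[str] = name
        while nxt is not None:
            nxt = BLOCKS[nxt](vp)
            executed += 1
    return executed


vps: List[VPState] = [{"n": n} for n in (3, 0, 5, 1)]
count = vector_fetch(vps, "init")
print([vp["acc"] for vp in vps], count)
```

Running it prints `[6, 0, 15, 1] 13`: each VP sums 1..n for its own n, and the number of executed blocks depends on the data rather than on a single shared vector length.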
This thesis evaluates Scale using a diverse selection of embedded benchmarks, including example kernels for image processing, audio processing, text and data processing, cryptography, network processing, and wireless communication. The benchmarks also include larger applications: a JPEG image encoder and an IEEE 802.11a wireless transmitter. Scale achieves high performance on a range of different types of code, generally executing 3-11 compute operations per cycle. Unlike other architectures, which improve performance at the expense of increased energy consumption, Scale is generally even more energy-efficient than a scalar RISC processor.
by Ronny Meir Krashinsky, Ph.D.
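As a quick sanity check on the Scale figures, the per-cycle numbers can be converted into operations per second by multiplying by the clock rate. This sketch assumes every benchmark runs at the prototype's 260 MHz clock; the constant and function names are illustrative.

```python
CLOCK_HZ = 260e6            # prototype clock frequency stated in the abstract
PEAK_OPS_PER_CYCLE = 16     # peak of the four-lane vector-thread unit
TYPICAL_RANGE = (3, 11)     # compute ops/cycle generally achieved on benchmarks


def gops(ops_per_cycle: float, clock_hz: float = CLOCK_HZ) -> float:
    """Convert operations per cycle into billions of operations per second."""
    return ops_per_cycle * clock_hz / 1e9


print(f"peak: {gops(PEAK_OPS_PER_CYCLE):.2f} GOPS")
low, high = (gops(x) for x in TYPICAL_RANGE)
print(f"typical: {low:.2f}-{high:.2f} GOPS")
```

This gives a peak of 4.16 GOPS and a typical range of 0.78-2.86 GOPS at 260 MHz.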

    Understanding and Leveraging Virtualization Technology in Commodity Computing Systems

    Commodity computing platforms are imperfect and require various enhancements for performance and security. In the past decade, virtualization technology has emerged as a promising trend for commodity computing platforms, ushering in many opportunities to optimize the allocation of hardware resources. However, many abstractions offered by virtualization not only make enhancements more challenging but also complicate the proper understanding of virtualized systems. The current understanding and analysis of these abstractions are far from satisfactory. This dissertation tackles the problem from a holistic view by systematically studying system behavior, focusing on the performance implications and security vulnerabilities of a virtualized system.
We start with the first abstraction, intensive memory multiplexing for the I/O of virtual machines (VMs), and present a new technique, called Batmem, that effectively reduces the memory multiplexing overhead of VMs and emulated devices by optimizing the operations of conventional emulated memory-mapped I/O (MMIO) in hypervisors. We then analyze another abstraction, the nested file system, and attempt both to quantify and to understand the crucial aspects of its performance in a variety of settings. Our investigation demonstrates that the choice of file system at both the guest and hypervisor levels has a significant impact on I/O performance.
Finally, leveraging utilities that manage VM disk images, we present a new patch management framework, called Shadow Patching, to achieve effective software updates. This framework lets system administrators still take the offline patching approach while retaining most of the benefits of live patching, using commonly available virtualization techniques. To demonstrate the effectiveness of the approach, we conduct a series of experiments applying a wide variety of software patches.
Our results show that our framework incurs only a small overhead on running systems but can significantly reduce the maintenance window.
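The abstract says Shadow Patching relies on utilities for managing VM disk images and on commonly available virtualization techniques, but it does not name them. One such technique is the copy-on-write overlay image (for example, a QEMU qcow2 image with a backing file): writes land in the overlay while the base image stays untouched. The toy Python model below illustrates only that property; the `DiskImage` class and block contents are hypothetical, and this is not a description of how Shadow Patching is implemented.

```python
from typing import Dict, Optional


class DiskImage:
    """Toy block device mapping block numbers to bytes (hypothetical).

    If `backing` is set, this image is a copy-on-write overlay: reads of
    blocks it has not written fall through to the backing image, and
    writes are stored only in the overlay.
    """

    def __init__(self, backing: Optional["DiskImage"] = None) -> None:
        self.blocks: Dict[int, bytes] = {}
        self.backing = backing

    def read(self, block: int) -> bytes:
        if block in self.blocks:
            return self.blocks[block]
        if self.backing is not None:
            return self.backing.read(block)
        return b"\x00"  # unwritten block of a base image reads as zero

    def write(self, block: int, data: bytes) -> None:
        self.blocks[block] = data  # never touches the backing image


# The running VM keeps using `base`; the patch is applied to an overlay.
base = DiskImage()
base.write(0, b"libfoo-1.0")
base.write(1, b"config")

shadow = DiskImage(backing=base)
shadow.write(0, b"libfoo-1.1")  # hypothetical patch applied offline

print(base.read(0), shadow.read(0), shadow.read(1))
```

The model ignores writes that the running VM makes to the base image while the overlay is being patched; a real framework would have to reconcile them.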

    Toward Reliable and Efficient Message Passing Software for HPC Systems: Fault Tolerance and Vector Extension

    As the scale of high-performance computing (HPC) systems continues to grow, researchers are devoting themselves to achieving the best performance for long-running computing jobs on these systems. My research focuses on the reliability and efficiency of HPC software. First, as systems become larger, the mean time to failure (MTTF) of HPC systems tends to decrease, so handling system failures becomes a prime challenge. My research presents a general design and implementation of an efficient runtime-level failure detection and propagation strategy that targets large-scale, dynamic systems and can detect both node and process failures. It uses multiple overlapping topologies to optimize detection and propagation, minimizing the incurred overhead and guaranteeing the scalability of the entire framework. Results on different machines and benchmarks, compared with related work, show that the design and implementation significantly outperform non-HPC solutions and are competitive with specialized HPC solutions that can manage only MPI applications.
Second, I exploit instruction-level parallelism to achieve optimal performance. Novel processors support long vector extensions, which enable researchers to exploit the potential peak performance of target architectures. Intel introduced the Advanced Vector Extensions (AVX2 and AVX-512) for the x86 instruction set architecture (ISA), and Arm introduced the Scalable Vector Extension (SVE) with a new set of A64 instructions. Both enable greater parallelism. My research uses long-vector reduction instructions to improve the performance of MPI reduction operations, and uses gather and scatter instructions to speed up packing and unpacking in MPI. The evaluation of the resulting software stack under different scenarios demonstrates that the approach is not only efficient but also generalizable to many vector architectures.
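Gather and scatter help with MPI packing because a non-contiguous datatype reduces to a list of element offsets. The plain-Python sketch below computes those offsets for a strided layout described by the same (count, blocklength, stride) parameters as `MPI_Type_vector`, then packs by gathering and unpacks by scattering. The element-by-element loops stand in for what a vectorized implementation would do with gather and scatter instructions, one vector's worth of indices at a time; the function names and the `start` offset are hypothetical, and nothing here calls an MPI library.

```python
from typing import List, Sequence


def vector_indices(count: int, blocklength: int, stride: int,
                   start: int = 0) -> List[int]:
    """Element offsets of an MPI_Type_vector-style layout: `count` blocks of
    `blocklength` contiguous elements whose starts are `stride` elements
    apart, beginning at element `start` of the buffer."""
    return [start + b * stride + j
            for b in range(count) for j in range(blocklength)]


def pack(src: Sequence[float], idx: Sequence[int]) -> List[float]:
    # Gather: copy the scattered elements into one contiguous send buffer.
    return [src[i] for i in idx]


def unpack(buf: Sequence[float], idx: Sequence[int], dst: List[float]) -> None:
    # Scatter: write contiguous received elements back to their offsets.
    for value, i in zip(buf, idx):
        dst[i] = value


# Column 1 of a 3x4 row-major matrix: 3 blocks of 1 element, stride 4.
matrix = [float(v) for v in range(12)]
idx = vector_indices(count=3, blocklength=1, stride=4, start=1)
packed = pack(matrix, idx)
out = [0.0] * 12
unpack(packed, idx, out)
print(idx, packed)  # [1, 5, 9] [1.0, 5.0, 9.0]
```

Because the offsets are computed once per datatype, the same index list can drive both the send-side gather and the receive-side scatter.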

    Annotated Interactive Non-linear Videos: Software Suite, Download and Cache Management

    Modern Web technology makes the dream of fully interactive and enriched video come true. Nowadays it is possible to organize videos non-linearly, so that they play in a sequence not known in advance. Furthermore, additional information can be added to the video, ranging from short descriptions to animated images and further videos. This requires an authoring tool that is easy and efficient to use, that can manage the individual media objects, and that presents the links between the parts clearly. Tools of this kind are rare and mostly do not provide the full range of required functions. While the Web player provides an interactive experience to the viewer, parallel plot sequences and additional information increase the download volume. This may cause pauses during playback while elements that are displayed with the video are being downloaded. A good quality of experience for these videos, with short waiting times and uninterrupted playback, is desired.
This work presents the SIVA Suite for creating the annotated interactive non-linear videos described above. We propose a video model for interactivity, non-linearity, and annotations, which is implemented in an XML format, an authoring tool, and a player. Video is the main medium, and the individual scenes are linked into a scene graph. Time-controlled additional content, called annotations, such as text, images, audio files, or videos, is added to the scenes. The user navigates the scene graph by selecting a button on a button panel. Other navigational elements, such as a table of contents and a keyword search, are also provided.
Besides the SIVA Suite, this thesis presents algorithms and strategies for download and cache management that provide a good quality of experience while watching annotated interactive non-linear videos. To this end, we implemented a player framework that is independent of specific Web standards.
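The video model above (scenes linked into a graph, time-controlled annotations, button-panel choices) suggests one simple way to pick prefetch candidates: the scenes a viewer could reach next, nearest first. The Python sketch below is a hypothetical in-memory model, not the SIVA XML format or player API, and breadth-first selection up to a fixed depth is an illustrative assumption rather than the strategy evaluated in the thesis.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, Dict, List, Tuple


@dataclass
class Annotation:
    kind: str       # e.g. "text", "image", "audio", "video"
    start_s: float  # shown from this scene-relative time...
    end_s: float    # ...until this one


@dataclass
class Scene:
    video: str
    annotations: List[Annotation] = field(default_factory=list)
    successors: List[str] = field(default_factory=list)  # button-panel choices


def prefetch_candidates(graph: Dict[str, Scene], current: str,
                        depth: int) -> List[str]:
    """Scenes reachable from `current` within `depth` user choices,
    nearest first (breadth-first), excluding `current` itself."""
    seen = {current}
    order: List[str] = []
    queue: Deque[Tuple[str, int]] = deque([(current, 0)])
    while queue:
        name, d = queue.popleft()
        if d == depth:
            continue  # do not look further than `depth` choices ahead
        for nxt in graph[name].successors:
            if nxt not in seen:
                seen.add(nxt)
                order.append(nxt)
                queue.append((nxt, d + 1))
    return order


graph = {
    "intro": Scene("intro.mp4", [Annotation("text", 0.0, 5.0)], ["a", "b"]),
    "a": Scene("a.mp4", successors=["end"]),
    "b": Scene("b.mp4", successors=["end"]),
    "end": Scene("end.mp4"),
}
print(prefetch_candidates(graph, "intro", depth=1))  # ['a', 'b']
print(prefetch_candidates(graph, "intro", depth=2))  # ['a', 'b', 'end']
```

A larger depth prepares for more of the viewer's possible paths but also increases the download volume, which is the tension the abstract describes.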
Integrated into a simulation environment, the framework allows the evaluation of algorithms and strategies for calculating start-up times and for selecting the elements to prefetch into and delete from the cache, as well as analysis of how these interact during the playback of non-linear video content. The algorithms and strategies can be used to minimize interruptions in the video flow after user interactions. Our extensive evaluation showed that our techniques result in faster start-up times and fewer interruptions in the video flow than those of other players. Knowledge of the structure of an interactive non-linear video can be used to minimize the start-up time at the beginning of a video while keeping any increase in the overall download volume to a minimum.
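A start-up time calculation can be illustrated with a deliberately simple model. For one element of S bits that plays for D seconds at a constant bitrate S/D over a link of constant bandwidth B bits/s, playback started at time t0 never overtakes the download if B·t ≥ (S/D)·(t − t0) for all t in [t0, t0 + D]. Both sides are linear in t, so checking the endpoints suffices; the binding one is the end, which gives t0 ≥ S/B − D. The sketch below applies this and also splits the bandwidth evenly among several prefetched elements. The constant-rate model, the even split, and the function names are assumptions for illustration, not the thesis's algorithms.

```python
def startup_delay(size_bytes: float, duration_s: float,
                  bandwidth_bps: float) -> float:
    """Smallest delay before playback of one element starts such that playback
    never overtakes the download, assuming constant bitrate, constant
    bandwidth (bits/s), and a download that starts at t = 0."""
    download_s = size_bytes * 8 / bandwidth_bps
    return max(0.0, download_s - duration_s)


def shared_delay(size_bytes: float, duration_s: float,
                 bandwidth_bps: float, parallel_downloads: int) -> float:
    """Same, when the link is split evenly among several prefetched elements."""
    return startup_delay(size_bytes, duration_s,
                         bandwidth_bps / parallel_downloads)


# A 10 s scene of 1.25 MB (10 Mbit) on a 2 Mbit/s link:
print(startup_delay(1.25e6, 10.0, 2e6))     # 0.0  (downloads in 5 s)
print(shared_delay(1.25e6, 10.0, 2e6, 4))   # 10.0 (downloads in 20 s)
```

In this model, prefetching four branches at once turns a stall-free start into a 10 s wait, one reason prefetch selection and start-up time calculation interact.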