109 research outputs found

    Mitigating the Effect of Misspeculations in Superscalar Processors

    Get PDF
    Modern superscalar processors highly rely on the speculative execution which speculatively executes instructions and then verifies. If the prediction is different from the execution result, a misspeculation recovery is performed. Misspeculation recovery penalties still account for a substantial amount of performance reduction. This work focuses on the techniques to mitigate the effect of recovery penalties and proposes practical mechanisms which are thoroughly implemented and analyzed. In general, we can divide the misspeculation penalty into four parts: misspeculation detection delay; stale instruction elimination delay; state restoration delay and pipeline fill delay. This dissertation does not consider the detection delay, instead, we design four innovative mechanisms. Some of these mechanisms target a specific recovery delay whereas others target multiple types of delay in a unified algorithm. Mower was designed to address the stale instruction elimination delay and the state restoration delay by using a special walker. When a misprediction is detected, the walker will scan and repair the instructions which are younger than the mispredicted instruction. During the walking procedure, the correct state is restored and the stale instructions are eliminated. Based on Mower, we further simplify the design and develop a Two-Phase recovery mechanism. This mechanism uses only a basic recovery mechanism except for the case in which the retire stage was stalled by a long latency instruction. When the retire stage is stalled, the second phase is launched and the instructions in the pipeline are re-fetched. Two-Phase mechanism recovers from an earlier point in the program and overlaps the recovery penalty with the long latency penalty. In reality, some of the instructions on the wrong path can be reused during the recovery. However, such reuse of misprediction results is not easy and most of the time involves significant complexity. We design Passing Loop to reduce the pipeline fill delay. We applied our mechanism only for short forward branches which eliminates a substantial amount of complexity. In terms of memory dependence speculation and associated delays due to memory ordering violations, we develop a mechanism that optimizes store-queue-free architectures. A store-queue-free architecture experiences more memory dependence mispredictions due to its aggressive approach to speculations. A common solution is to delay the execution of an instruction which is more likely to be mispredicted. We propose a mechanism to dynamically insert predicates for comparing the address of memory instructions, which is called “Dynamic Memory Dependence Predication” (DMDP). This mechanism boosts the instruction execution to its earliest point and reduces the number of mispredictions

    Plasmonic Metamaterials: Physical Background and Some Technological Applications

    Get PDF
    New technological frontiers appear every year, and few are as intriguing as the field of plasmonic metamaterials (PMMs). These uniquely designed materials use coherent electron oscillations to accomplish an astonishing array of tasks, and they present diverse opportunities in many scientific fields. This paper consists of an explanation of the scientific background of PMMs and some technological applications of these fascinating materials. The physics section addresses the foundational concepts necessary to understand the operation of PMMs, while the technology section addresses various applications, like precise biological and chemical sensors, cloaking devices for several frequency ranges, nanoscale photovoltaics, experimental optical computing components, and superlenses that can surpass the diffraction limit of conventional optics

    Improving processor efficiency by exploiting common-case behaviors of memory instructions

    Get PDF
    Processor efficiency can be described with the help of a number of  desirable effects or metrics, for example, performance, power, area, design complexity and access latency. These metrics serve as valuable tools used in designing new processors and they also act as  effective standards for comparing current processors. Various factors impact the efficiency of modern out-of-order processors and one important factor is the manner in which instructions are processed through the processor pipeline. In this dissertation research, we study the impact of load and store instructions (collectively known as memory instructions) on processor efficiency,  and show how to improve efficiency by exploiting common-case or  predictable patterns in the behavior of memory instructions. The memory behavior patterns that we focus on in our research are the predictability of memory dependences, the predictability in data forwarding patterns,   predictability in instruction criticality and conservativeness in resource allocation and deallocation policies. We first design a scalable  and high-performance memory dependence predictor and then apply accurate memory dependence prediction to improve the efficiency of the fetch engine of a simultaneous multi-threaded processor. We then use predictable data forwarding patterns to eliminate power-hungry  hardware in the processor with no loss in performance.  We then move to  studying instruction criticality to improve  processor efficiency. We study the behavior of critical load instructions  and propose applications that can be optimized using  predictable, load-criticality  information. Finally, we explore conventional techniques for allocation and deallocation  of critical structures that process memory instructions and propose new techniques to optimize the same.  Our new designs have the potential to reduce  the power and the area required by processors significantly without losing  performance, which lead to efficient designs of processors.Ph.D.Committee Chair: Loh, Gabriel H.; Committee Member: Clark, Nathan; Committee Member: Jaleel, Aamer; Committee Member: Kim, Hyesoon; Committee Member: Lee, Hsien-Hsin S.; Committee Member: Prvulovic, Milo

    An evaluation of the TRIPS computer system

    Get PDF
    The TRIPS system employs a new instruction set architecture (ISA) called Explicit Data Graph Execution (EDGE) that renegotiates the boundary between hardware and software to expose and exploit concurrency. EDGE ISAs use a block-atomic execution model in which blocks are composed of dataflow instructions. The goal of the TRIPS design is to mine concurrency for high performance while tolerating emerging technology scaling challenges, such as increasing wire delays and power consumption. This paper evaluates how well TRIPS meets this goal through a detailed ISA and performance analysis. We compare performance, using cycles counts, to commercial processors. On SPEC CPU2000, the Intel Core 2 outperforms compiled TRIPS code in most cases, although TRIPS matches a Pentium 4. On simple benchmarks, compiled TRIPS code outperforms the Core 2 by 10% and hand-optimized TRIPS code outperforms it by factor of 3. Compared to conventional ISAs, the block-atomic model provides a larger instruction window, increases concurrency at a cost of more instructions executed, and replaces register and memory accesses with more efficient direct instruction-to-instruction communication. Our analysis suggests ISA, microarchitecture, and compiler enhancements for addressing weaknesses in TRIPS and indicates that EDGE architectures have the potential to exploit greater concurrency in future technologies

    高効率なメモリ順序違反検出機構に関する研究

    Get PDF
    学位の種別: 課程博士審査委員会委員 : (主査)東京大学教授 浅見 徹, 東京大学教授 坂井 修一, 東京大学准教授 田浦 健次朗, 東京大学准教授 豊田 正史, 国立情報学研究所教授 五島 正裕University of Tokyo(東京大学

    高効率なメモリ順序違反検出機構に関する研究

    Get PDF
    学位の種別: 課程博士審査委員会委員 : (主査)東京大学教授 浅見 徹, 東京大学教授 坂井 修一, 東京大学准教授 田浦 健次朗, 東京大学准教授 豊田 正史, 国立情報学研究所教授 五島 正裕University of Tokyo(東京大学

    Code optimizations for narrow bitwidth architectures

    Get PDF
    This thesis takes a HW/SW collaborative approach to tackle the problem of computational inefficiency in a holistic manner. The hardware is redesigned by restraining the datapath to merely 16-bit datawidth (integer datapath only) to provide an extremely simple, low-cost, low-complexity execution core which is best at executing the most common case efficiently. This redesign, referred to as the Narrow Bitwidth Architecture, is unique in that although the datapath is squeezed to 16-bits, it continues to offer the advantage of higher memory addressability like the contemporary wider datapath architectures. Its interface to the outside (software) world is termed as the Narrow ISA. The software is responsible for efficiently mapping the current stack of 64-bit applications onto the 16-bit hardware. However, this HW/SW approach introduces a non-negligible penalty both in dynamic code-size and performance-impact even with a reasonably smart code-translator that maps the 64- bit applications on to the 16-bit processor. The goal of this thesis is to design a software layer that harnesses the power of compiler optimizations to assuage this negative performance penalty of the Narrow ISA. More specifically, this thesis focuses on compiler optimizations targeting the problem of how to compile a 64-bit program to a 16-bit datapath machine from the perspective of Minimum Required Computations (MRC). Given a program, the notion of MRC aims to infer how much computation is really required to generate the same (correct) output as the original program. Approaching perfect MRC is an intrinsically ambitious goal and it requires oracle predictions of program behavior. Towards this end, the thesis proposes three heuristic-based optimizations to closely infer the MRC. The perspective of MRC unfolds into a definition of productiveness - if a computation does not alter the storage location, it is non-productive and hence, not necessary to be performed. In this research, the definition of productiveness has been applied to different granularities of the data-flow as well as control-flow of the programs. Three profile-based, code optimization techniques have been proposed : 1. Global Productiveness Propagation (GPP) which applies the concept of productiveness at the granularity of a function. 2. Local Productiveness Pruning (LPP) applies the same concept but at a much finer granularity of a single instruction. 3. Minimal Branch Computation (MBC) is an profile-based, code-reordering optimization technique which applies the principles of MRC for conditional branches. The primary aim of all these techniques is to reduce the dynamic code footprint of the Narrow ISA. The first two optimizations (GPP and LPP) perform the task of speculatively pruning the non-productive (useless) computations using profiles. Further, these two optimization techniques perform backward traversal of the optimization regions to embed checks into the nonspeculative slices, hence, making them self-sufficient to detect mis-speculation dynamically. The MBC optimization is a use case of a broader concept of a lazy computation model. The idea behind MBC is to reorder the backslices containing narrow computations such that the minimal necessary computations to generate the same (correct) output are performed in the most-frequent case; the rest of the computations are performed only when necessary. With the proposed optimizations, it can be concluded that there do exist ways to smartly compile a 64-bit application to a 16- bit ISA such that the overheads are considerably reduced.Esta tesis deriva su motivación en la inherente ineficiencia computacional de los procesadores actuales: a pesar de que muchas aplicaciones contemporáneas tienen unos requisitos de ancho de bits estrechos (aplicaciones de enteros, de red y multimedia), el hardware acaba utilizando el camino de datos completo, utilizando más recursos de los necesarios y consumiendo más energía. Esta tesis utiliza una aproximación HW/SW para atacar, de forma íntegra, el problema de la ineficiencia computacional. El hardware se ha rediseñado para restringir el ancho de bits del camino de datos a sólo 16 bits (únicamente el de enteros) y ofrecer así un núcleo de ejecución simple, de bajo consumo y baja complejidad, el cual está diseñado para ejecutar de forma eficiente el caso común. El rediseño, llamado en esta tesis Arquitectura de Ancho de Bits Estrecho (narrow bitwidth en inglés), es único en el sentido que aunque el camino de datos se ha estrechado a 16 bits, el sistema continúa ofreciendo las ventajas de direccionar grandes cantidades de memoria tal como procesadores con caminos de datos más anchos (64 bits actualmente). Su interface con el mundo exterior se denomina ISA estrecho. En nuestra propuesta el software es responsable de mapear eficientemente la actual pila software de las aplicaciones de 64 bits en el hardware de 16 bits. Sin embargo, esta aproximación HW/SW introduce penalizaciones no despreciables tanto en el tamaño del código dinámico como en el rendimiento, incluso con un traductor de código inteligente que mapea las aplicaciones de 64 bits en el procesador de 16 bits. El objetivo de esta tesis es el de diseñar una capa software que aproveche la capacidad de las optimizaciones para reducir el efecto negativo en el rendimiento del ISA estrecho. Concretamente, esta tesis se centra en optimizaciones que tratan el problema de como compilar programas de 64 bits para una máquina de 16 bits desde la perspectiva de las Mínimas Computaciones Requeridas (MRC en inglés). Dado un programa, la noción de MRC intenta deducir la cantidad de cómputo que realmente se necesita para generar la misma (correcta) salida que el programa original. Aproximarse al MRC perfecto es una meta intrínsecamente ambiciosa y que requiere predicciones perfectas de comportamiento del programa. Con este fin, la tesis propone tres heurísticas basadas en optimizaciones que tratan de inferir el MRC. La utilización de MRC se desarrolla en la definición de productividad: si un cálculo no altera el dato que ya había almacenado, entonces no es productivo y por lo tanto, no es necesario llevarlo a cabo. Se han propuesto tres optimizaciones del código basadas en profile: 1. Propagación Global de la Productividad (GPP en inglés) aplica el concepto de productividad a la granularidad de función. 2. Poda Local de Productividad (LPP en inglés) aplica el mismo concepto pero a una granularidad mucho más fina, la de una única instrucción. 3. Computación Mínima del Salto (MBC en inglés) es una técnica de reordenación de código que aplica los principios de MRC a los saltos condicionales. El objetivo principal de todas esta técnicas es el de reducir el tamaño dinámico del código estrecho. Las primeras dos optimizaciones (GPP y LPP) realizan la tarea de podar especulativamente las computaciones no productivas (innecesarias) utilizando profiles. Además, estas dos optimizaciones realizan un recorrido hacia atrás de las regiones a optimizar para añadir chequeos en el código no especulativo, haciendo de esta forma la técnica autosuficiente para detectar, dinámicamente, los casos de fallo en la especulación. La idea de la optimización MBC es reordenar las instrucciones que generan el salto condicional tal que las mínimas computaciones que general la misma (correcta) salida se ejecuten en la mayoría de los casos; el resto de las computaciones se ejecutarán sólo cuando sea necesario

    Benjamin Britten’s compositions for children and amateurs: cloaking simplicity behind the veil of sophistication.

    Get PDF
    The principal aim of this thesis is to examine a portion of Britten’s oeuvre which was intended for performance by children and amateurs and thereby to assess the extent of the composer’s success in achieving music which betrayed all the hallmarks of a sophisticated composition, yet providing a score which was accessible to its performers. Indeed, ‘because of his refusal to ‘write down’ to the child performer, although drawing on his considerable technical expertise to tailor his music to the appropriate level of skill, there can be little doubt that this body of work contributes significantly towards what Humphrey Carpenter termed Britten’s “‘huge achievement.’

    Benjamin Britten’s compositions for children and amateurs: cloaking simplicity behind the veil of sophistication.

    Get PDF
    The principal aim of this thesis is to examine a portion of Britten’s oeuvre which was intended for performance by children and amateurs and thereby to assess the extent of the composer’s success in achieving music which betrayed all the hallmarks of a sophisticated composition, yet providing a score which was accessible to its performers. Indeed, ‘because of his refusal to ‘write down’ to the child performer, although drawing on his considerable technical expertise to tailor his music to the appropriate level of skill, there can be little doubt that this body of work contributes significantly towards what Humphrey Carpenter termed Britten’s “‘huge achievement.’

    Night moves: A mise-en-scene of a luminous economy

    Get PDF
    Since the general electrification of ambient urban lighting in the late nineteenth century, complex arrangements of functional and aesthetic lighting have become increasingly deployed to intensify the capitalization of the city at night. Contemporary solid-state lighting integrated with networked control systems means that scenic effects once contained within theatrical and cinematic production, have infiltrated the built spaces that we occupy. As digital imaging technologies converge with the built environment, the city at night can be considered as a moving image. This research considers the implications of the nocturnal city when it is understood as a manufactured atmosphere, where the distinctions between media interfaces and the construction of urban space are no longer distinguishable as distinct zones of experience. By employing Bertolt Brecht’s and Antonin Artaud’s concepts of a mise-en-scene of light as a critical and transformational tool, the thesis develops connections between current theories of atmosphere and post-cinematic urbanism. The thesis proposes a practice-based analytical and critical mise-en-sce ne that draws on embodied empirical methods for creating lens, light and sound-based artworks within installation art and the urban environment. This research explores the effects of light and digital projection on urban subjectivity and its representations. Recent formulations of atmosphere in Gernot Böhme’s phenomenological conception of architectural atmospheres and Andreas Philippopoulos-Mihalopoulos’ theorisation of lawscape are integrated into a broader corpus of analysis and theory through empirical, theoretical and historical modes of enquiry. Together, the written thesis and body of practice provide the framework phototropia. This aims to establish a transversal platform for critical thought and practice from which to think and remake the city at night. From the perspective of a material practice this method offers ways of understanding the changing relations between imaging technologies and contemporary urban subjectivity
    corecore