RAPPID: an asynchronous instruction length decoder
Journal Article
This paper describes an investigation of the potential advantages and risks of applying an aggressive asynchronous design methodology to the Intel Architecture. RAPPID ("Revolving Asynchronous Pentium® Processor Instruction Decoder"), a prototype IA32 instruction length decoding and steering unit, was implemented using self-timed techniques. The RAPPID chip was fabricated on a 0.25-μm CMOS process and tested successfully. Results show significant advantages, in particular a performance of 2.5-4.5 instructions/ns, with manageable risks using this design technology. RAPPID achieves three times the throughput and half the latency of an existing 400-MHz clocked circuit, while dissipating only half the power and requiring about the same area.
An asynchronous instruction length decoder
Journal Article
Abstract: This paper describes an investigation of the potential advantages and pitfalls of applying an asynchronous design methodology to an advanced microprocessor architecture. A prototype complex-instruction-set length decoding and steering unit was implemented using self-timed circuits. [The Revolving Asynchronous Pentium® Processor Instruction Decoder (RAPPID) design implemented the complete Pentium II® 32-bit MMX instruction set.] The prototype chip was fabricated on a 0.25-μm CMOS process and tested successfully. Results show significant advantages, in particular a performance of 2.5-4.5 instructions per nanosecond, with manageable risks using this design technology. The prototype achieves three times the throughput and half the latency of the fastest commercial 400-MHz clocked circuit fabricated on the same process, while dissipating only half the power and requiring about the same area.
Recommended from our members
Differences in partnership and marital status at first birth by women's and partners' education: Evidence from Britain 1991-2012
Non-marital childbearing, especially within cohabitation, has become increasingly common in Britain as in other Western countries. Nonetheless, births outside marriage occur more frequently among the relatively disadvantaged in terms of income potential. Building upon previous research on family formation patterns, we examine differences by education and employment status in the proportion of marital and non-marital first births among British women and couples over the past two decades. In particular, we explore trends in educational differences in non-marital first births among women and the role of partners' joint educational attainment in relation to childbearing within cohabitation or within marriage. We find a steady increase in cohabiting first births among all educational groups, without significant change in unpartnered births. However, differences by education in non-marital first births have not increased significantly during the observed period. The male partner's education is negatively associated with childbearing within cohabitation, although this relationship varies according to women's educational attainment. This work was supported by the Philomathia Social Sciences Research Programme, University of Cambridge.
Parallel Access of Out-Of-Core Dense Extendible Arrays
Datasets used in scientific and engineering applications are often modeled as dense multi-dimensional arrays. For very large datasets, the corresponding array models are typically stored out-of-core as array files. The array elements are mapped onto linear consecutive locations that correspond to the linear ordering of the multi-dimensional indices. Two conventional mappings used are the row-major order and the column-major order of multi-dimensional arrays. Such conventional mappings of dense array files severely limit the performance of applications and the extendibility of the dataset. Firstly, an array file that is organized in, say, row-major order causes applications that subsequently access the data in column-major order to have abysmal performance. Secondly, any subsequent expansion of the array file is limited to only one dimension; expansions of such out-of-core conventional arrays along arbitrary dimensions require storage reorganization that can be very expensive. We present a solution for storing out-of-core dense extendible arrays that resolves both limitations. The method uses a mapping function F*(), together with information maintained in axial vectors, to compute the linear address of an extendible array element from its k-dimensional index. We also give the inverse function, F*⁻¹(), for deriving the k-dimensional index from a given linear address. We show how the mapping function, in combination with MPI-IO and a parallel file system, allows the extendible array to grow without reorganization and with no significant performance degradation for applications accessing elements in any desired order. We give methods for reading and writing sub-arrays into and out of parallel applications that run on a cluster of workstations. The axial vectors are replicated and maintained in each node that accesses sub-array elements.
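The key idea above — append-only growth along any dimension, with per-expansion bookkeeping that lets a k-dimensional index be turned into a stable linear address — can be illustrated with a minimal Python sketch. This is a simplified stand-in, not the paper's exact F*() or axial-vector layout: each expansion is recorded as a segment (dimension grown, index range added, base address, and a snapshot of the other extents), and addressing searches segments from newest to oldest.

```python
from math import prod

class ExtendibleArray:
    """Illustrative sketch of extendible-array addressing (hypothetical,
    simplified; not the paper's exact F*() mapping)."""

    def __init__(self, shape):
        # The initial allocation is recorded as one expansion along dim 0.
        self.shape = list(shape)
        self.segments = [(0, 0, shape[0], 0, tuple(shape))]
        self.size = prod(shape)

    def extend(self, dim, k):
        """Grow the array by k slabs along `dim` without moving old data."""
        lo = self.shape[dim]
        snap = list(self.shape)
        snap[dim] = k                      # extent of the new slabs only
        self.segments.append((dim, lo, lo + k, self.size, tuple(snap)))
        self.size += prod(snap)
        self.shape[dim] += k

    def address(self, idx):
        """Linear file offset of the element at k-dimensional index `idx`."""
        for dim, lo, hi, base, snap in reversed(self.segments):
            if lo <= idx[dim] < hi:        # newest segment covering idx
                coord = list(idx)
                coord[dim] -= lo
                off = 0
                for c, e in zip(coord, snap):  # row-major within the segment
                    off = off * e + c
                return base + off
        raise IndexError(idx)
```

Because every expansion appends a fresh segment at the end of the file, previously computed addresses never change — which is exactly the "growth without reorganization" property the abstract describes.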
Temperature measurement in the Intel® Core™ Duo processor
Modern CPUs with increasing core frequency and power are rapidly reaching the point where CPU frequency and performance are limited by the amount of heat the cooling technology can extract. In the mobile environment this issue is becoming more apparent as form factors become thinner and lighter, and mobile platforms often trade CPU performance to reduce power and manage box thermals. Most of today's high-performance CPUs provide an on-die thermal sensor to allow thermal management, typically in the form of an analog thermal diode. Operating-system algorithms and platform embedded controllers read the temperature and control the processor power. In addition to full temperature reading, some products implement digital sensors with a fixed temperature threshold, intended for fail-safe operation. Temperature measurements using the diode suffer from several inherent inaccuracies:
- Measurement accuracy: an external device connects to the diode and performs the A/D conversion. The combination of diode behavior, electrical noise, and conversion accuracy results in measurement error.
- Distance to the die hot spot: due to routing restrictions, the diode is not placed at the hottest spot on the die. The temperature difference between the diode and the hot spot varies with the workload, so the reported temperature does not accurately represent the die's maximum temperature. This offset grows as the power density of the CPU increases, and multi-core CPUs pose an even harder problem because the thermal distribution changes with the set of active cores.
- Manufacturing temperature accuracy: inaccuracies in the test environment induce an additional offset between the measured temperature and the actual temperature.
As a result of these effects, the thermal control algorithm must add a temperature guard band to account for the control feedback errors, which impacts the performance and reliability of the silicon.
To address these thermal control issues, the Intel® Core™ Duo introduced a new on-die digital temperature reading capability. Multiple thermal sensors are distributed across the die at likely hot spots. A/D logic built around these sensors translates the temperature into a digital value accessible to operating-system thermal control software or to driver-based control mechanisms. Providing a highly accurate temperature reading requires a calibration process: during high-volume manufacturing, each sensor is calibrated for good accuracy and linearity. The die specification and reliability limits are defined by the hottest spot on the die, and the sensors are calibrated under the same test conditions as the specification testing. Any test control inaccuracy is eliminated because the part is guaranteed to meet specifications at maximum temperature as measured by the digital thermometer. As a result, the integrated thermal sensors enable improved reliability and performance at high workloads while meeting specifications at any time. In this paper we present the implementation and calibration details of the digital thermometer, show studies of the on-die temperature distribution, and compare traditional diode-based measurement to the digital sensor implementation.
Utility of human life cost in anaesthesiology cost-benefit decisions
The United States (US) aviation industry provides a potentially useful mental model for dealing with certain cost-benefit decisions in anaesthesiology. The Federal Aviation Administration (FAA), the national aviation authority of the United States, quantifies a price for the value of a human life based on the U.S. Department of Transportation's (DOT) value of a statistical life (VSL) unit. The current VSL is around $9.4 million [1]. To illustrate the concept: if the FAA estimates that 100 people are likely to die in the future under current practice standards, then the monetary cost of this loss is $940 million. If a proposed regulation that would prevent these deaths exceeds this cost, then the FAA will not adopt the proposed regulation and hence will not require the industry to undertake this cost.
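The decision rule illustrated above is simple arithmetic: monetise the projected fatalities at the VSL and compare against the regulation's cost. A minimal sketch, with hypothetical function names and the $9.4M figure taken from the abstract:

```python
# DOT value of a statistical life in USD, as cited in the abstract [1].
VSL_USD = 9.4e6

def expected_loss(expected_deaths, vsl=VSL_USD):
    """Monetised cost of the projected fatalities."""
    return expected_deaths * vsl

def faa_adopts(regulation_cost, expected_deaths, vsl=VSL_USD):
    """Cost-benefit rule described above: adopt a safety regulation only
    if its cost does not exceed the value of the lives it would save."""
    return regulation_cost <= expected_loss(expected_deaths, vsl)
```

For the abstract's example of 100 projected deaths, the threshold is 100 × $9.4M = $940M: an $800M regulation passes the test, a $1.2B one does not.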
Optimal Chunking of Large Multidimensional Arrays for Data Warehousing
Very large multidimensional arrays are commonly used in data-intensive scientific computations as well as in on-line analytical processing applications, referred to as MOLAP. The storage organization of such arrays on disks partitions the large global array into fixed-size sub-arrays called chunks or tiles, which form the units of data transfer between disk and memory. Typical queries involve the retrieval of sub-arrays in a manner that accesses all chunks overlapping the query result. An important metric of storage efficiency is the expected number of chunks retrieved over all such queries. The question that immediately arises is: "what shape of array chunks gives the minimum expected number of chunks over a query workload?" The problem of optimal chunking was first introduced by Sarawagi and Stonebraker, who gave an approximate solution. In this paper we develop exact mathematical models of the problem and provide exact solutions using steepest-descent and geometric programming methods. Experimental results, using synthetic and real-life workloads, show that our solutions are consistently within 2.0 percent of the true number of chunks retrieved for any number of dimensions. In contrast, the approximate solution of Sarawagi and Stonebraker can deviate considerably from the true result as the number of dimensions increases, and may lead to suboptimal chunk shapes.
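The cost metric behind this problem is easy to state: for a range query of length q against chunks of size c along one dimension, a query whose start is uniformly random relative to chunk boundaries touches 1 + (q-1)/c chunks on average, and the expected total is the product over dimensions. The sketch below computes that metric and finds the best integer chunk shape by brute force — a toy stand-in for the paper's steepest-descent and geometric-programming solutions, with hypothetical names and a single fixed query rather than a workload:

```python
from itertools import product
from math import prod

def expected_chunks(query, chunk):
    """Expected number of chunks a range query touches, assuming the
    query's start is uniformly random relative to chunk boundaries
    (per dimension: 1 + (q - 1) / c)."""
    return prod(1 + (q - 1) / c for q, c in zip(query, chunk))

def best_chunk_shape(query, capacity, max_side=64):
    """Exhaustive search (not the paper's method) over integer chunk
    shapes whose volume fits within `capacity` elements."""
    best = None
    for shape in product(range(1, max_side + 1), repeat=len(query)):
        if prod(shape) > capacity:
            continue
        cost = expected_chunks(query, shape)
        if best is None or cost < best[0]:
            best = (cost, shape)
    return best
```

For example, a 100×10 query with 64-element chunks favors chunks elongated along the longer query side: `best_chunk_shape((100, 10), 64)` yields shape (32, 2), noticeably better than square-ish or degenerate shapes such as (8, 8) or (64, 1).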
Comparative transcriptomics of pathogenic and non-pathogenic Listeria species
Comparative RNA-seq analysis of two related pathogenic and non-pathogenic bacterial strains reveals a hidden layer of divergence in the non-coding genome, as well as conserved, widespread regulatory structures called 'excludons', which mediate regulation through long non-coding antisense RNAs.
- …