929 research outputs found

    Microarchitectural Low-Power Design Techniques for Embedded Microprocessors

    With the omnipresence of embedded processing in all forms of electronics today, there is a strong trend towards wireless, battery-powered, portable embedded systems which have to operate under stringent energy constraints. Consequently, low power consumption and high energy efficiency have emerged as the two key criteria for embedded microprocessor design. In this thesis we present a range of microarchitectural low-power design techniques which enable the increase of performance for embedded microprocessors and/or the reduction of energy consumption, e.g., through voltage scaling. In the context of cryptographic applications, we explore the effectiveness of instruction set extensions (ISEs) for a range of different cryptographic hash functions (SHA-3 candidates) on a 16-bit microcontroller architecture (PIC24). Specifically, we demonstrate the effectiveness of light-weight ISEs based on lookup table integration and microcoded instructions using finite state machines for operand and address generation. On-node processing in autonomous wireless sensor node devices requires deeply embedded cores with extremely low power consumption. To address this need, we present TamaRISC, a custom-designed ISA with a corresponding ultra-low-power microarchitecture implementation. The TamaRISC architecture is employed in conjunction with an ISE and standard cell memories to design a sub-threshold capable processor system targeted at compressed sensing applications. We furthermore employ TamaRISC in a hybrid SIMD/MIMD multi-core architecture targeted at moderate to high processing requirements (> 1 MOPS). A range of different microarchitectural techniques for efficient memory organization are presented. Specifically, we introduce a configurable data memory mapping technique for private and shared access, as well as instruction broadcast together with synchronized code execution based on checkpointing. 
We then study an inherent suboptimality due to the worst-case design principle in synchronous circuits, and introduce the concept of dynamic timing margins. We show that dynamic timing margins exist in microprocessor circuits, that these margins are to a large extent state-dependent, and that they are correlated with the sequences of instruction types which are executed within the processor pipeline. To perform this analysis we propose a circuit/processor characterization flow and tool called dynamic timing analysis. Moreover, this flow is employed in order to devise a high-level instruction set simulation environment for impact-evaluation of timing errors on application performance. The presented approach improves the state of the art significantly in terms of simulation accuracy through the use of statistical fault injection. The dynamic timing margins in microprocessors are then systematically exploited for throughput improvements or energy reductions via our proposed instruction-based dynamic clock adjustment (DCA) technique. To this end, we introduce a 6-stage 32-bit microprocessor with cycle-by-cycle DCA. Besides a comprehensive design flow and simulation environment for evaluation of the DCA approach, we additionally present a silicon prototype of a DCA-enabled OpenRISC microarchitecture fabricated in 28 nm FD-SOI CMOS. The test chip includes a suitable clock generation unit which allows for cycle-by-cycle DCA over a wide range with fine granularity at frequencies exceeding 1 GHz. Measurement results of speedups and power reductions are provided.

    The Camassa-Holm Equation: A Loop Group Approach

    A map is presented that associates with each element of a loop group a solution of an equation related by a simple change of coordinates to the Camassa-Holm (CH) Equation. Certain simple automorphisms of the loop group give rise to Bäcklund transformations of the equation. These are used to find 2-soliton solutions of the CH equation, as well as some novel singular solutions. Comment: 19 pages, 7 figures; LaTeX with psfi

    A Timing-Monitoring Sequential for Forward and Backward Error-Detection in 28 nm FD-SOI

    The increasing impact of variability on near-threshold nanometer circuits calls for a tighter online monitoring and control of the available timing margins. Error-detection sequentials are widely used together with error-correction techniques to operate digital designs with such carefully controlled far-below-worst-case margins, ensuring their correct operation even in the presence of uncertainties and variations. However, these registers are often designed only to either detect setup timing violations or to measure the available positive timing slack for a small detection window. In this paper we propose a timing-monitoring sequential that provides both timing-monitoring modes, which can be selected at run-time depending on the desired timing-monitoring strategy. As the detection window of the presented circuit depends on the duty cycle of the clock, either slow paths or fast paths can be monitored and measured with wide timing windows. The performance of this timing-monitoring sequential is evaluated in a 28 nm FD-SOI process with post-layout simulations which show that the circuit is able to monitor a positive timing slack as small as 140 ps or to measure a path delay as fast as 50 ps. The proposed circuit is applied to a digital multiplier that was fabricated in a test chip, and measurements show that the timing-monitoring sequentials are able to measure the critical path of the multiplier with 1% accuracy and without incurring any timing violation.

    Investigating the Potential of Custom Instruction Set Extensions for SHA-3 Candidates on a 16-bit Microcontroller Architecture

    In this paper, we investigate the benefit of instruction set extensions for software implementations of all five SHA-3 candidates. To this end, we start from optimized assembly code for a common 16-bit microcontroller instruction set architecture. By themselves, these implementations provide a reference for the complexity of the algorithms on 16-bit architectures, which are commonly used in embedded systems. For each algorithm, we then propose suitable instruction set extensions and implement the modified processor core. We assess the gains in throughput, memory consumption, and the area overhead. Our results show that with less than 10% additional area, it is possible to increase the execution speed on average by almost 40%, while reducing memory requirements on average by more than 40%. In particular, the Grøstl algorithm, which was one of the slowest algorithms in previous reference implementations, ends up being the fastest implementation by some margin, once minor (but dedicated) instruction set extensions are taken into account.

    A Wireless Body Sensor Network For Activity Monitoring With Low Transmission Overhead

    Activity recognition has been a research field of high interest in recent years, and it finds application in the medical domain, as well as in personal healthcare monitoring during daily home and sports activities. With the aim of producing minimum discomfort while performing supervision of subjects, miniaturized networks of low-power wireless nodes are typically deployed on the body to gather and transmit physiological data, thus forming a Wireless Body Sensor Network (WBSN). In this work, we propose a WBSN for online activity monitoring, which combines the sensing capabilities of wearable nodes and the high computational resources of modern smartphones. The proposed solution provides different tradeoffs between classification accuracy and energy consumption, thanks to different workloads assigned to the nodes and to the mobile phone in different network configurations. In particular, our WBSN is able to achieve very high activity recognition accuracies (up to 97.2%) on multiple subjects, while significantly reducing the sampling frequency and the volume of transmitted data with respect to other state-of-the-art solutions.

    DynOR: A 32-bit Microprocessor in 28 nm FD-SOI with Cycle-By-Cycle Dynamic Clock Adjustment

    This paper presents DynOR, a 32-bit 6-stage OpenRISC microprocessor with dynamic clock adjustment. To alleviate the issue of unused dynamic timing margins, the clock period of the processor is adjusted on a cycle-by-cycle level, based on the instruction types currently in flight in the pipeline. To this end, we employ a custom-designed clock generation unit, capable of immediate glitch-free adjustments of the clock period over a wide range with fine granularity. Our chip measurements in 28 nm FD-SOI technology show that DynOR provides an average speedup of 19% in program execution over a wide range of operating conditions, with a peak speedup for certain applications of up to 41%. Furthermore, this speedup can be traded off against energy, to reduce the chip power consumption for a typical die by up to 15%, compared to a static clocking scheme based on worst-case excitation.
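
The cycle-by-cycle adjustment described in this abstract can be illustrated with a rough sketch. It assumes a strongly simplified model in which every instruction type has a single worst-case delay regardless of pipeline stage; the delay table, the type names, and the functions `clock_periods` and `speedup` are hypothetical and are not taken from the paper.

```python
# Hypothetical worst-case delays per instruction type, in picoseconds.
# Illustrative values only; they are not measurements from the paper.
TYPE_DELAY_PS = {"nop": 600, "alu": 700, "load": 850, "mul": 1000}

# A static clocking scheme must cover the slowest type in every cycle.
STATIC_PERIOD_PS = max(TYPE_DELAY_PS.values())

PIPELINE_DEPTH = 6  # the abstract describes a 6-stage pipeline


def clock_periods(instr_stream, depth=PIPELINE_DEPTH):
    """Return the clock period chosen in each cycle.

    Each cycle, one instruction enters the pipeline and the oldest one
    leaves. The period is the largest delay among the types in flight.
    """
    pipeline = ["nop"] * depth  # assume the pipeline starts empty
    periods = []
    for instr in instr_stream:
        pipeline = [instr] + pipeline[:-1]
        periods.append(max(TYPE_DELAY_PS[t] for t in pipeline))
    return periods


def speedup(instr_stream):
    """Ratio of static execution time to dynamically clocked time."""
    periods = clock_periods(instr_stream)
    return STATIC_PERIOD_PS * len(periods) / sum(periods)


if __name__ == "__main__":
    program = ["alu", "alu", "mul", "alu"]
    print(clock_periods(program))
    print(round(speedup(program), 3))
```

In this toy model a single slow instruction keeps the period long until it has left the pipeline, which is why the achievable gain depends on the instruction mix of the application.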

    Quasilinear Schrödinger equations I: Small data and quadratic interactions

    In this article we prove local well-posedness in low-regularity Sobolev spaces for general quasilinear Schrödinger equations. These results represent improvements of the pioneering works by Kenig-Ponce-Vega and Kenig-Ponce-Rolvung-Vega, where viscosity methods were used to prove existence of solutions in very high regularity spaces. Our arguments here are purely dispersive. The function spaces in which we show existence are constructed in ways motivated by the results of Mizohata, Ichinose, Doi, and others, including the authors. Comment: 25 pages, 0 figures, References Updated, Typos Fixe

    Item response theory assumptions were adequately met by the Oxford hip and knee scores

    Objectives: To develop item response theory (IRT) models for the Oxford hip and knee scores which convert patient responses into continuous scores with quantifiable precision and provide these as web applications for efficient score conversion. Study Design and Setting: Data from the National Health Service patient-reported outcome measures program were used to test the assumptions of IRT (unidimensionality, monotonicity, local independence, and measurement invariance) before fitting models to preoperative response patterns obtained from patients undergoing primary elective hip or knee arthroplasty. The hip and knee datasets contained 321,147 and 355,249 patients, respectively. Results: Scree plots, Kaiser criterion analyses, and confirmatory factor analyses confirmed unidimensionality and Mokken analysis confirmed monotonicity of both scales. In each scale, all item pairs shared a residual correlation of ≤ 0.20. At the test level, both scales showed measurement invariance by age and gender. Both scales provide precise measurement in preoperative settings but demonstrate poorer precision and ceiling effects in postoperative settings. Conclusion: We provide IRT parameters and web applications that can convert Oxford Hip Score or Oxford Knee Score response sets into continuous measurements and quantify individual measurement error. These can be used in sensitivity analyses or to administer truncated and individualized computerized adaptive tests.

    TamaRISC-CS: An Ultra-Low-Power Application-Specific Processor for Compressed Sensing

    Compressed sensing (CS) is a universal technique for the compression of sparse signals. CS has been widely used in sensing platforms where portable, autonomous devices have to operate for long periods of time with limited energy resources. Therefore, an ultra-low-power (ULP) CS implementation is vital for these kinds of energy-limited systems. Sub-threshold (sub-VT) operation is commonly used for ULP computing, and can also be combined with CS. However, most established CS implementations can achieve either no or very limited benefit from sub-VT operation. Therefore, we propose a sub-VT application-specific instruction-set processor (ASIP), exploiting the specific operations of CS. Our results show that the proposed ASIP achieves a 62x speed-up and 11.6x power savings with respect to an established CS implementation running on the baseline low-power processor.
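
As a minimal sketch of the compression step that CS performs on a sensor node, the snippet below projects a length-n signal onto m < n measurements using a pseudo-random matrix. The ±1 matrix, the seed handling, and the names `sensing_matrix` and `compress` are assumptions made for illustration; the abstract does not specify the sensing matrix or the ASIP operations, and signal reconstruction (done on the receiver side) is not shown.

```python
import random


def sensing_matrix(m, n, seed=0):
    """Build an m-by-n matrix with pseudo-random entries in {-1, +1}.

    A fixed seed lets the sender and the receiver regenerate the same
    matrix instead of storing or transmitting it.
    """
    rng = random.Random(seed)
    return [[rng.choice((-1, 1)) for _ in range(n)] for _ in range(m)]


def compress(x, phi):
    """Compute the measurement vector y = phi * x.

    With entries in {-1, +1}, each measurement only needs additions and
    subtractions of the input samples.
    """
    return [sum(p * v for p, v in zip(row, x)) for row in phi]


if __name__ == "__main__":
    n, m = 16, 4                       # 16 samples compressed to 4 values
    x = [0] * n
    x[3], x[11] = 5, -2                # a sparse example signal
    phi = sensing_matrix(m, n, seed=42)
    print(compress(x, phi))
```

The node only transmits the m measurements, so the compression ratio is n/m; a sparse signal can later be recovered from them by a suitable reconstruction algorithm.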

    Item response theory may account for unequal item weighting and individual-level measurement error in trials that use PROMs: a psychometric sensitivity analysis of the TOPKAT trial

    To apply item response theory as a framework for studying measurement error in superiority trials which use patient-reported outcome measures (PROMs). We reanalyzed data from the TOPKAT trial, which compared the Oxford Knee Score (OKS) responses of patients undergoing partial or total knee replacement, using traditional sum-scoring, after accounting for OKS item characteristics with expected a posteriori (EAP) scoring, and after accounting for individual-level measurement error with plausible value imputation (PVI). We compared the marginalized mean scores of each group at baseline, 2 months, and yearly for 5 years. We used registry data to estimate the minimal important difference (MID) of OKS scores with sum-scoring and EAP scoring. With sum-scoring, we found statistically significant differences in mean OKS score at 2 months (p=0.030) and 1 year (p=0.030). EAP scores produced slightly different results, with statistically significant differences at 1 year (p=0.041) and 3 years (p=0.043). With PVI, there were no statistically significant differences. Psychometric sensitivity analyses can be readily performed for superiority trials using PROMs and may aid the interpretation of results. [Abstract copyright: Copyright © 2023 The Author(s). Published by Elsevier Inc. All rights reserved.]
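
To make the idea of EAP scoring more concrete, here is a minimal sketch under stated assumptions: a graded response model, a standard normal prior, and numerical integration on a fixed grid. The model choice, the item parameters in the usage example, and the function names `category_probs` and `eap_score` are hypothetical; the abstract does not state which IRT model or parameters were used.

```python
import math


def category_probs(theta, a, thresholds):
    """Category probabilities of one item under a graded response model.

    `a` is the item discrimination and `thresholds` are increasing
    category boundaries; K-1 thresholds give K response categories.
    """
    # Cumulative probabilities P(X >= k), padded with 1 and 0.
    cum = [1.0]
    cum += [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds]
    cum += [0.0]
    return [cum[k] - cum[k + 1] for k in range(len(cum) - 1)]


def eap_score(responses, items, grid_min=-4.0, grid_max=4.0, steps=81):
    """Return the posterior mean (EAP) and posterior SD of the trait.

    `responses` holds one category index per item, and `items` holds the
    matching (discrimination, thresholds) pairs.
    """
    step = (grid_max - grid_min) / (steps - 1)
    grid = [grid_min + i * step for i in range(steps)]
    weights = []
    for theta in grid:
        w = math.exp(-0.5 * theta * theta)  # unnormalised N(0, 1) prior
        for resp, (a, thresholds) in zip(responses, items):
            w *= category_probs(theta, a, thresholds)[resp]
        weights.append(w)
    total = sum(weights)
    mean = sum(t * w for t, w in zip(grid, weights)) / total
    var = sum((t - mean) ** 2 * w for t, w in zip(grid, weights)) / total
    return mean, math.sqrt(var)


if __name__ == "__main__":
    # Three hypothetical five-category items with identical parameters.
    items = [(1.5, [-1.5, -0.5, 0.5, 1.5])] * 3
    print(eap_score([4, 3, 4], items))
```

The posterior SD returned alongside the score is the individual-level measurement error that sum-scoring does not provide; it shrinks as more items are answered.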