5 research outputs found

    Architectural and Software Optimizations for Next-Generation Heterogeneous Low-Power Mobile Application Processors

    State-of-the-art smartphones and tablets have evolved to the level of having feature-rich applications comparable to those of interactive desktop programs, providing high-quality visual and auditory experiences. Furthermore, mobile processors are becoming increasingly complex in order to respond to this more diverse and demanding application base. Many mobile processors, such as the Qualcomm Snapdragon 800, have begun to include features such as multi-level data caches, complex branch prediction, and multi-core architectures. The high-performance mobile processor domain is unique in a number of ways. The mobile software ecosystem provides a central repository of robust applications that rely on device-specific framework libraries. These devices contain numerous sensors, such as accelerometers, GPS, and proximity detectors. They are always-on and always-connected, continuously communicating and updating information in the background, while also being used for periods of intensive computational tasks like playing video games or providing interactive navigation. The peak performance demanded of these devices rivals that of a high-performance desktop, while most of the time a much lower level of performance is required. Given this, heterogeneous processor topologies have been introduced to handle these large swings in performance demand. Additionally, these devices need to be compact and easy to carry on a person, so challenges exist in terms of area and heat dissipation. As a result, many of the microarchitectural hardware structures found in these mobile devices are smaller or less complex than their desktop equivalents. This thesis develops a novel three-pronged optimization framework. First, the compiler-device interface is enhanced to allow more high-level application information to be relayed to the device and underlying microarchitecture. Second, application-specific information is gleaned and used to optimize program execution. Lastly, the microarchitecture itself is augmented to dynamically detect and respond to changes in program execution patterns. The high-level goal of these three approaches is to extend the continuum of the heterogeneous processor topology and provide additional granularity to help deliver the necessary performance for the least amount of power during execution. The proposed optimization framework is shown to improve a broad range of structures, including branch prediction, instruction and data caches, and the instruction pipeline.
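    The goal of delivering "the necessary performance for the least amount of power" on a heterogeneous topology can be pictured as a core-selection policy. The sketch below is purely illustrative and is not the thesis's mechanism; the core descriptors, their numbers, and the scalar demand metric are all invented for the example.

    ```c
    #include <stddef.h>

    /* Hypothetical core descriptor for a heterogeneous (big.LITTLE-style)
     * topology: perf is peak throughput, power is draw at that peak.
     * The fields and selection rule are illustrative only. */
    typedef struct {
        const char *name;
        double perf;
        double power;
    } Core;

    /* Pick the lowest-power core whose peak performance still meets the
     * current demand; returns NULL if even the biggest core cannot keep up. */
    const Core *select_core(const Core *cores, int n, double demand)
    {
        const Core *best = NULL;
        for (int i = 0; i < n; i++) {
            if (cores[i].perf >= demand &&
                (best == NULL || cores[i].power < best->power))
                best = &cores[i];
        }
        return best;
    }
    ```

    The point of the sketch is the shape of the trade-off: a finer-grained continuum of core choices (the thesis's stated aim) gives this policy more options between the extremes.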

    Miss reduction in embedded processors through dynamic, power-friendly cache design

    Today, embedded processors are expected to be able to run complex, algorithm-heavy applications that were originally designed and coded for general-purpose processors. As a result, traditional methods for addressing performance and determinism become inadequate. This paper explores a new data cache design for use in modern high-performance embedded processors that dynamically improves execution time, power efficiency, and determinism within the system. The simulation results show a significant improvement in cache miss ratios and a reduction in power consumption of approximately 30% and 15%, respectively.
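    For context on the reported miss-ratio figure: a cache's miss ratio is simply misses divided by accesses over an address trace. A minimal direct-mapped model makes the measurement concrete; the set count and block size below are arbitrary sketch parameters, not the paper's design.

    ```c
    #include <string.h>

    #define NUM_SETS 64     /* direct-mapped: one line per set (arbitrary) */
    #define BLOCK_BITS 5    /* 32-byte cache lines (arbitrary) */

    /* Count misses for an address trace against an initially empty,
     * direct-mapped cache; the miss ratio is then misses / n. */
    int count_misses(const unsigned long *addrs, int n)
    {
        unsigned long tags[NUM_SETS];
        int valid[NUM_SETS];
        int misses = 0;
        memset(valid, 0, sizeof valid);
        for (int i = 0; i < n; i++) {
            unsigned long block = addrs[i] >> BLOCK_BITS;
            int set = (int)(block % NUM_SETS);
            unsigned long tag = block / NUM_SETS;
            if (!valid[set] || tags[set] != tag) {
                misses++;          /* cold or conflict miss: fill the line */
                valid[set] = 1;
                tags[set] = tag;
            }
        }
        return misses;
    }
    ```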

    Reducing impact of cache miss stalls in embedded systems by extracting guaranteed independent instructions

    Today, embedded processors are expected to be able to run algorithmically complex, memory-intensive applications that were originally designed and coded for general-purpose processors. As such, the impact of memory latencies on execution time becomes increasingly evident. All the while, embedded processors are also expected to be power-conscious and to have minimal area impact, as they are often used in mobile devices such as smartphones and portable MP3 players. As a result, traditional methods for addressing performance and memory latencies, such as multiple issue, out-of-order execution, and large associative caches, are not well suited to the mobile embedded domain due to their significant area and power overhead. This paper explores a novel approach to mitigating execution delays caused by memory latencies that would otherwise not be possible in a regular in-order, single-issue embedded processor without large, power-hungry constructs like a Reorder Buffer (ROB). The concept relies on efficiently leveraging both compile-time and run-time information to safely allow non-data-dependent instructions to continue executing in the event of a memory stall. The simulation results show a significant improvement in overall execution throughput of approximately 11%, with minimal impact on area and power overhead.
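    In its simplest form, the idea of letting only guaranteed-independent instructions proceed past a stalled load reduces to a register-hazard check against the pending load's destination register. The encoding below is a hypothetical sketch for illustration, not the paper's actual compile-time/run-time scheme.

    ```c
    /* Hypothetical three-operand instruction: fields hold register
     * numbers, or -1 when an operand is unused. Not the paper's ISA. */
    typedef struct {
        int src1, src2, dst;
    } Insn;

    /* An instruction is guaranteed independent of a load stalled on a
     * miss iff it neither reads the load's destination register (RAW
     * hazard) nor writes it (WAW hazard); only then may it continue
     * executing in order while the miss is outstanding. */
    int independent_of_load(const Insn *in, int load_dst)
    {
        return in->src1 != load_dst &&
               in->src2 != load_dst &&
               in->dst  != load_dst;
    }
    ```

    A real design would also track memory-carried dependences (a later store or load aliasing the missed line), which is where the compile-time guarantees described in the abstract come in.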