6 research outputs found

    An Adaptive and Integrated Low-Power Framework for Multicore Mobile Computing

    Get PDF

    System Support For Energy Efficient Mobile Computing

    Get PDF
    Mobile devices have developed rapidly and have become an integral part of our daily life. With the blooming of the Internet of Things, mobile computing will become more and more important. However, battery drain is a critical issue that hurts user experience: high-performance devices require more power, while battery capacity increases only about 5% per year on average. Researchers are working on many kinds of energy-saving approaches. For example, hardware components provide different power states to save idle power, and operating systems provide power management APIs to better control power dissipation. However, system energy efficiency is still too low to meet users’ expectations. To improve energy efficiency, we studied how to provide system support for mobile computing in four different aspects. First, we focused on the influence of user behavior on system energy consumption. We monitored and analyzed users’ application usage information. From the results, we built a battery prediction model to estimate battery time based on user behavior and hardware component usage. By adjusting user behavior, we can at most double the battery time. To understand why different applications cause such a huge energy difference, we built a power profiler, Bugu, to figure out where the power goes. Bugu analyzes power and event information for applications with high accuracy and low overhead. We analyzed the power behavior of almost 100 mobile applications and derived several implications for saving energy in applications and systems. In addition, to understand the energy behavior of modern hardware architectures, we analyzed the energy consumption and performance of heterogeneous platforms and compared them with homogeneous platforms. The results show that heterogeneous platforms indeed have great potential for energy saving, which mostly comes from idle and low-workload situations.
However, a wrong scheduling decision may cause up to 30% more energy consumption, so scheduling becomes the key to energy-efficient computing. Last, as increased power density leads to high device temperature, we investigated the thermal management system and developed Falcon, an ambient-temperature-aware thermal control policy. It saves 4.85% of total system power and is more adaptive in various environments than the default approach. Finally, we discuss several potential directions for future research in this field.
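A component-level battery prediction model of the kind described above can be sketched as a simple linear power model: estimated draw is a baseline plus per-component coefficients weighted by utilization. This is a minimal illustrative sketch; the coefficients, component names, and capacity below are assumptions, not values from the dissertation.

```python
# Minimal sketch of a component-level battery time predictor.
# Power coefficients (watts at full utilization) are illustrative
# assumptions, not measured values from the dissertation.

POWER_MODEL_W = {
    "cpu":    1.2,   # W at full CPU utilization
    "screen": 0.9,   # W at full brightness
    "radio":  0.7,   # W while the cellular radio is fully active
}
BASE_W = 0.3         # idle/baseline draw

def predicted_battery_hours(capacity_wh, usage):
    """Estimate battery time from per-component utilization (0.0-1.0)."""
    draw_w = BASE_W + sum(POWER_MODEL_W[c] * u for c, u in usage.items())
    return capacity_wh / draw_w

heavy = predicted_battery_hours(12.0, {"cpu": 0.8, "screen": 1.0, "radio": 0.5})
light = predicted_battery_hours(12.0, {"cpu": 0.2, "screen": 0.4, "radio": 0.1})
print(f"heavy use: {heavy:.1f} h, light use: {light:.1f} h")
```

Even with these toy coefficients, moving from the heavy to the light usage profile more than doubles the predicted battery time, consistent with the abstract's claim that adjusting user behavior can at most double battery life.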

    Effective memory management for mobile environments

    Get PDF
    Smartphones, tablets, and other mobile devices exhibit vastly different constraints compared to classic computing environments such as desktops, laptops, or servers. Mobile devices run dozens of so-called “apps” hosted by independent virtual machines (VMs). All these VMs run concurrently, and each VM deploys purely local heuristics to manage resources such as memory, performance, and power. Such a design causes conflicts across all layers of the software stack, calling for the evaluation of VMs and optimization techniques specific to mobile frameworks. In this dissertation, we study the design of managed runtime systems for mobile platforms. More specifically, we deepen the understanding of interactions between garbage collection (GC) and system layers. We develop tools to monitor the memory behavior of Android-based apps and to characterize GC performance, leading to new memory-management techniques that address energy constraints, time performance, and responsiveness. We implement a GC-aware frequency scaling governor for Android devices. We also explore the tradeoffs of power and performance in vivo for a range of realistic GC variants, with established benchmarks and real applications running on Android virtual machines. We control for variation due to dynamic voltage and frequency scaling (DVFS) and just-in-time (JIT) compilation, and across the established dimensions of heap size and concurrency. Finally, we provision GC as a global service that collects statistics from all running VMs and then makes an informed decision that optimizes across all of them (not just locally), and across all layers of the stack. Our evaluation illustrates the power of such a central coordination service and garbage collection mechanism in improving memory utilization, throughput, and adaptability to user activities.
In fact, our techniques hit a sweet spot where total on-chip energy is reduced (20–30%) with minimal impact on throughput and responsiveness (5–10%). The simplicity and efficacy of our approach reach well beyond the usual optimization techniques.
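The core intuition behind a GC-aware frequency scaling governor is that collection phases are typically memory-bound, so a lower clock frequency costs little time but saves energy. A minimal sketch of that decision logic follows; the frequency steps, phase names, and load thresholds are illustrative assumptions, not the governor described in the dissertation.

```python
# Sketch of GC-aware frequency selection: run GC phases at a low
# frequency, and fall back to an ondemand-style heuristic for
# mutator (application) phases. All constants are illustrative.

FREQS_KHZ = [600_000, 1_200_000, 1_800_000]  # low, mid, high

def pick_frequency(phase, load):
    """phase: 'gc' or 'mutator'; load: recent CPU utilization (0.0-1.0)."""
    if phase == "gc":
        # GC is memory-bound: lowering the clock barely slows it
        # down but reduces CPU power draw.
        return FREQS_KHZ[0]
    # Simple ondemand-style choice for application code.
    if load > 0.8:
        return FREQS_KHZ[2]
    if load > 0.4:
        return FREQS_KHZ[1]
    return FREQS_KHZ[0]

print(pick_frequency("gc", 0.9))       # 600000: GC phase, despite high load
print(pick_frequency("mutator", 0.9))  # 1800000: busy application code
```

On a real Android/Linux device this policy would be applied by writing to the cpufreq sysfs interface (e.g. `scaling_setspeed`); here the sketch only shows the selection logic.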

    Energy Aware Runtime Systems for Elastic Stream Processing Platforms

    Get PDF
    Following steady growth in the computational performance required of processors, the multicore revolution started around 20 years ago. This revolution was mainly an answer to the power dissipation constraints restricting the increase of clock frequency in single-core processors. The multicore revolution brought not only the challenge of parallel programming, i.e. being able to develop software that exploits the entire capability of many-core architectures, but also the challenge of programming heterogeneous platforms. The question “on which processing element should a specific computational unit be mapped?” is well known in the embedded community. With the introduction of general-purpose graphics processing units (GPGPUs) and digital signal processors (DSPs) along with many-core processors on different system-on-chip platforms, heterogeneous parallel platforms are nowadays widespread over several domains, from consumer devices to media processing platforms for telecom operators. Finding a mapping together with a suitable hardware architecture is a process called design-space exploration. This process is very challenging in heterogeneous many-core architectures, which promise benefits in terms of energy efficiency. The main problem is the exponential explosion of the design space. With the recent trend of increasing heterogeneity on the chip, selecting the parameters to take into account when mapping software to hardware is still an open research topic in the embedded area. For example, the current Linux scheduler performs poorly when mapping tasks to the computing elements available in hardware. The only metric considered is CPU workload, which, as shown in recent work, does not match the true performance demands of applications. Relying on it may produce an incorrect allocation of resources, resulting in wasted energy.
The origin of this research work lies in the observation that these approaches do not fully support the dynamic behavior of stream processing applications, especially when this behavior is established only at runtime. This research contributes to the general goal of developing energy-efficient solutions for designing streaming applications on heterogeneous and parallel hardware platforms. Streaming applications are nowadays widespread in the software domain. Their distinctive characteristic is the retrieval of multiple streams of data and the need to process them in real time. The proposed work develops new approaches to the challenging problem of efficient runtime coordination of dynamic applications, focusing on energy and performance management.
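The scheduling problem criticized above, mapping tasks by CPU workload alone, can be contrasted with an energy-aware mapping that weighs both a task's deadline and per-core energy cost. Below is a minimal sketch for a two-type heterogeneous (big.LITTLE-style) platform; the performance and power numbers are illustrative assumptions, not measurements from this work.

```python
# Sketch of energy-aware task mapping on a heterogeneous platform:
# pick the most frugal core type that still meets the task's deadline.
# Core parameters are illustrative assumptions.

CORES = {
    "little": {"perf": 1.0, "power_w": 0.5},  # slower but frugal
    "big":    {"perf": 2.5, "power_w": 2.0},  # faster but power-hungry
}

def energy_joules(core, work_units):
    """Energy = power * time, where time = work / performance."""
    c = CORES[core]
    return c["power_w"] * (work_units / c["perf"])

def map_task(work_units, deadline_s):
    """Choose the core type minimizing energy among deadline-feasible ones."""
    feasible = [
        name for name, c in CORES.items()
        if work_units / c["perf"] <= deadline_s
    ]
    if not feasible:
        return "big"  # best effort: fastest core
    return min(feasible, key=lambda n: energy_joules(n, work_units))

print(map_task(1.0, 2.0))  # relaxed deadline -> "little"
print(map_task(2.0, 1.0))  # tight deadline -> "big"
```

A load-only policy would send the first task to whichever core is idler; the energy-aware policy instead keeps light, slack-rich work on the frugal core, which is exactly the kind of runtime decision the dissertation targets.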

    [Korean-language title; characters lost in encoding]

    Get PDF
    Department of Computer Science and Engineering. Hardware with advanced functionalities and/or improved performance and efficiency has been introduced into modern computer systems. However, such emerging hardware poses several challenges. First, the characteristics of emerging hardware are unknown, and deriving useful properties through characterization studies is hard because emerging hardware affects applications with different characteristics in different ways. Second, using emerging hardware alone is suboptimal, but coordinating emerging hardware with other techniques is hard due to the large and complex system state space. To address these problems, we first conduct in-depth characterization studies of emerging hardware with applications of various characteristics. Guided by the observations from our characterization studies, we propose a set of system software techniques to effectively leverage emerging hardware. These techniques combine emerging hardware with other techniques to improve the performance, efficiency, and fairness of computer systems, based on efficient optimization algorithms. First, we investigate system software techniques to effectively manage hardware-based last-level cache (LLC) and memory bandwidth partitioning functionalities. For effective memory bandwidth partitioning on commodity servers, we propose HyPart, a hybrid technique for practical memory bandwidth partitioning. HyPart combines three widely used memory bandwidth partitioning techniques (i.e., thread packing, clock modulation, and Intel MBA) in a coordinated manner, considering the characteristics of the target applications. We demonstrate the effectiveness of HyPart through quantitative evaluation. We also propose CoPart, coordinated partitioning of LLC and memory bandwidth for fairness-aware workload consolidation on commodity servers.
We first characterize the impact of LLC and memory bandwidth partitioning on the performance and fairness of consolidated workloads. Guided by the characterization, we design and implement CoPart, which dynamically profiles the characteristics of the consolidated workloads and partitions LLC and memory bandwidth in a coordinated manner to maximize fairness. Through quantitative evaluation with various workloads and system configurations, we demonstrate that CoPart significantly improves the overall fairness of the consolidated workloads. Second, we investigate a system software technique to effectively leverage hardware-based power capping functionality. We first characterize the performance impact of the two key system knobs for power capping (i.e., the concurrency level of the target applications and cross-component power allocation). Guided by the characterization results, we design and implement RPPC, a holistic runtime system for maximizing performance under power capping. RPPC dynamically controls the key system knobs in a cooperative manner, considering the characteristics (e.g., scalability and memory intensity) of the target applications. Our evaluation shows that RPPC significantly improves performance under power capping across various application and system configurations. Third, we investigate system software techniques for effective dynamic concurrency control on many-core systems and heterogeneous multiprocessing systems. We propose RMC, an integrated runtime system for adaptive many-core computing. RMC combines the two widely used dynamic concurrency control techniques (i.e., thread packing and dynamic threading) in a coordinated manner to exploit the advantages of both. RMC quickly controls the concurrency level of the target applications through thread packing to improve performance and efficiency.
RMC further improves performance and efficiency by determining the optimal thread count through dynamic threading. Our quantitative experiments show that RMC outperforms existing dynamic concurrency control techniques in terms of performance and energy efficiency. In addition, we propose PALM, progress- and locality-aware adaptive task migration for efficient thread packing. We first conduct an in-depth performance analysis of thread packing with various synchronization-intensive benchmarks and system configurations and identify the root causes of its performance pathologies. Based on the characterization results, we design and implement PALM, which supports both symmetric multiprocessing and heterogeneous multiprocessing systems. For efficient thread packing, PALM solves three key problems: progress-aware task migration, locality-aware task migration, and scheduling period control. Our quantitative evaluation shows that PALM achieves substantially higher performance and energy efficiency than conventional thread packing. We also present case studies in which PALM considerably improves the efficiency of dynamic server consolidation and performance under power capping.
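Thread packing, which several of the techniques above build on, keeps the thread count fixed but confines all threads to a subset of CPUs so the rest can enter deep idle states. A minimal sketch follows; the helper only computes the packed CPU set, and applying it via `os.sched_setaffinity()` (shown as a comment) is a Linux-specific step, not part of any system named in the abstract.

```python
# Sketch of thread packing: instead of changing how many threads an
# application creates, restrict the whole process to the first N CPUs.
# Idle CPUs can then enter low-power states.

import os

def packed_cpu_set(concurrency_level, total_cpus=None):
    """Return the set of CPU ids a packed process should run on."""
    if total_cpus is None:
        total_cpus = os.cpu_count()
    n = max(1, min(concurrency_level, total_cpus))  # clamp to [1, total]
    return set(range(n))

# Pack onto 2 of 8 CPUs; the remaining 6 are freed for idle states.
cpus = packed_cpu_set(2, total_cpus=8)
print(cpus)  # {0, 1}

# On Linux, one would then apply the set to the calling process:
# os.sched_setaffinity(0, cpus)
```

Because packing changes only affinity, the runtime can adjust the effective concurrency level quickly, without the thread creation/destruction cost of dynamic threading; that asymmetry is what a combined scheme like RMC exploits.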

    Unifying DVFS and offlining in mobile multicores

    No full text