A Survey of Performance Optimization for Mobile Applications
Nowadays there is a mobile application for almost everything a user may think of, ranging from paying bills and gathering information to playing games and watching movies. To ensure user satisfaction and the success of applications, it is important to provide high-performance applications. This is particularly important for resource-constrained systems such as mobile devices, where non-functional performance characteristics, such as energy and memory consumption, play an important role in user satisfaction. This paper provides a comprehensive survey of non-functional performance optimization for Android applications. We collected 155 unique publications, published between 2008 and 2020, that focus on the optimization of non-functional performance of mobile applications. We target our search at four performance characteristics: responsiveness, launch time, memory consumption, and energy consumption. For each performance characteristic, we categorize optimization approaches based on the method used in the corresponding publications. Furthermore, we identify research gaps in the literature for future work.
Service Abstractions for Scalable Deep Learning Inference at the Edge
The deep-learning-driven intelligent edge has already become a reality, where millions of mobile, wearable, and IoT devices analyze real-time data and transform it into actionable insights on-device. Typical approaches for optimizing deep learning inference mostly focus on accelerating the execution of individual inference tasks, without considering the contextual correlation unique to edge environments and the statistical nature of learning-based computation. Specifically, they treat inference workloads as individual black boxes and apply canonical system optimization techniques, developed over the last few decades, to handle them as yet another type of computation-intensive application. As a result, deep learning inference on edge devices still faces the ever-increasing challenges of customization to edge device heterogeneity, fuzzy computation redundancy between inference tasks, and end-to-end deployment at scale. In this thesis, we propose the first framework that automates and scales the end-to-end process of deploying efficient deep learning inference from the cloud to heterogeneous edge devices. The framework consists of a series of service abstractions that handle DNN model tailoring, model indexing and query, and computation reuse for runtime inference, respectively. Together, these services bridge the gap between deep learning training and inference, eliminate computation redundancy during inference execution, and further lower the barrier for deep learning algorithm and system co-optimization. To build efficient and scalable services, we take a unique algorithmic approach of harnessing the semantic correlation between learning-based computations.
Rather than viewing individual tasks as isolated black boxes, we optimize them collectively in a white-box approach, proposing primitives to formulate the semantics of the deep learning workloads and algorithms to assess their hidden correlation (in terms of the input data, the neural network models, and the deployment trials) and to merge common processing steps to minimize redundancy.
Reduced complexity multicast beamforming and group assignment schemes for multi-antenna coded caching
In spite of recent advancements in wireless communication technologies and data delivery networks, it is unlikely that the speeds supported by these networks will be able to keep up with the exponentially increasing demand caused by the widespread adoption of high-speed and large-data applications. One appealing idea proposed to address this issue is coded caching, an innovative data delivery technique that makes use of the network’s aggregate cache rather than the individual memory available to each user. Coded caching helps boost data rates by distributing cache content throughout the network and delivering independent content to many users at a time. Despite the original theoretical promise of large caching gains, in reality coded caching suffers from severe bottlenecks that dramatically limit these gains. These bottlenecks include the need for complex successive interference cancellation (SIC) at the receiver, an exponential increase in subpacketization, applicability to only a limited range of input parameters, and performance losses in the low- and mid-SNR (signal-to-noise ratio) regimes. In this study, we present a novel coded caching scheme based on user grouping for cache-aided multi-input single-output (MISO) networks. One special property of this new scheme is its applicability to every set of input values for the user count (), transmitter-side antenna count (), and the global coded caching gain (). Moreover, for a fixed , this scheme can achieve theoretical sum-DoF optimality with no limitations. This strategy yields superior performance in terms of subpacketization when input parameters satisfy . This performance boost is enabled by the underlying user grouping structure during data delivery.
However, when input parameters do not comply with , in order to guarantee symmetry of the scheme and optimal DoF, multicast and unicast messages need to be constructed using a tree diagram, resulting in excess subpacketization and transmission count. Nevertheless, the simple receiver structure without the SIC requirement not only reduces implementation complexity but also enables us to use state-of-the-art methods to readily design optimized transmit beamformers that maximize the achievable symmetric rate. Finally, we use numerical analysis to compare our newly proposed scheme with well-known coded caching schemes in the literature.
Designing Efficient and Accurate Behavior-Aware Mobile Systems
The proliferation of sensors on smartphones, tablets and wearables has led to a plethora of behavior classification algorithms designed to sense various aspects of individual user's behavior such as daily habits, activity, physiology, mobility, sleep, emotional and social contexts. This ability to sense and understand behaviors of mobile users will drive the next generation of mobile applications providing services based on the users' behavioral patterns. In this thesis, we investigate ways in which we can enhance and utilize the understanding of user behaviors in such applications. In particular, we focus on identifying the key challenges in the following three aspects of behavior-aware applications: detection, understanding, and prediction of user behaviors; and present systems and techniques developed to address these challenges. In this thesis, we first demonstrate the utility of wristbands equipped with inertial sensors in real-time detection of health-related behaviors such as smoking and eating. Our approach detects these behaviors in a passive manner without any explicit user interaction and does not require use of any cumbersome device. Our results show that we can detect smoking with 95% accuracy, 91% precision and 81% recall in the natural environment. Second, we design a context-query engine for sensing multiple user contexts continuously, accurately and efficiently on mobile devices; the key necessity for understanding and analyzing behaviors. Our context-query engine performs information fusion of contexts for an individual user to enable optimizations like i) energy-efficient sensing, and ii) accurate context inference. Our results show that we can improve accuracy of a context classifier by up to 42% and reduce the number of classifiers required to observe the user state by 33%.
Finally, we demonstrate the utility of predicting app usage behavior in improving the freshness of mobile apps, such as Facebook, that present users with the latest content fetched from remote servers. We present an app prediction algorithm that utilizes user contexts to predict the app a user is likely to use and pre-fetches the data over the network for the predicted app. We show that our proposed algorithm delivers application content to the user that is, on average, fresh to within 3 minutes.
Measuring and Mitigating Potential Risks of Third-party Resource Inclusions
In today's computer services, developers commonly use third-party resources like libraries, hosting infrastructure and advertisements. Using third-party components improves the efficiency and enhances the quality of developing custom applications. However, a service that adopts the benefits of third-party resources adopts their vulnerabilities as well. Unfortunately, developers are often uninformed about these risks, and as a result their services are susceptible to various attacks. There has been a lot of work on how to develop secure first-party services. The key focus of my thesis is quantifying the risks in the inclusion of third-party resources and looking into possible ways of mitigating them. Based on the fundamental ways that risks arise, we broadly classify them into Direct and Indirect Risks. Direct risk is the risk that comes with invoking the third-party resource incorrectly, even if the third party is otherwise trustworthy, whereas indirect risk is the risk that comes with the third-party resource potentially acting in an untrustworthy manner, even if it is invoked correctly.
To understand the security related direct risks in third-party inclusions, we study cryptographic frameworks. Developers often use these frameworks incorrectly and introduce security vulnerabilities. This is because current cryptographic frameworks erode abstraction boundaries, as they do not encapsulate all the framework-specific knowledge and expect developers to understand security attacks and defenses. Starting from the documented misuse cases of cryptographic APIs, we infer five developer needs and we show that a good API design would address these needs only partially. Building on this observation, we propose APIs that are semantically meaningful for developers. We show how these interfaces can be implemented consistently on top of existing frameworks using novel and known design patterns, and we propose build management hooks for isolating security workarounds needed during the development and test phases.
To understand the performance-related direct risks in third-party inclusions, we study resource hints in webpage HTML. Today's websites load a large number of resources, which means a considerable amount of time is spent issuing DNS requests, requesting resources, and waiting for responses. As an optimization for these time sinks, websites may include resource hints, such as DNS prefetch, preconnect, preload, pre-render, and prefetch tags, in their HTML files to cause clients to initiate DNS queries and resource fetches early in the page download, before encountering the precise resource to download. We explore whether websites are making effective use of resource hints, using techniques based on a tool we developed to obtain a complete snapshot of a webpage at a given point in time. We find that many popular websites are highly ineffective in their use of resource hints, causing clients to query and connect to extraneous domains and download unnecessary data, and that some may even use resource hints to bypass ad blockers.
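As a rough illustration of the kind of check described here, the following minimal Python sketch collects resource hints from a page's `<link>` tags and reports hinted hosts that the page never actually contacted. It is not the thesis's tool: the names `HintCollector` and `unused_hint_hosts`, and the assumption that the set of contacted hosts is already known from a separate page-load trace, are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

# rel values of the hint types named in the abstract
HINT_RELS = {"dns-prefetch", "preconnect", "preload", "prerender", "prefetch"}


class HintCollector(HTMLParser):
    """Collects (rel, href) pairs from <link> tags that carry a resource hint."""

    def __init__(self):
        super().__init__()
        self.hints = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attributes = dict(attrs)
        href = attributes.get("href")
        # rel may hold several space-separated tokens
        rels = (attributes.get("rel") or "").lower().split()
        for rel in rels:
            if rel in HINT_RELS and href:
                self.hints.append((rel, href))


def unused_hint_hosts(html, requested_hosts):
    """Return hinted hosts that never appear among the hosts actually requested.

    requested_hosts is assumed to come from a separate page-load trace.
    """
    collector = HintCollector()
    collector.feed(html)
    hinted = {urlparse(href).hostname for _, href in collector.hints}
    hinted.discard(None)  # relative hrefs have no host of their own
    return sorted(hinted - set(requested_hosts))


page = (
    '<link rel="dns-prefetch" href="//cdn.example.com">'
    '<link rel="preconnect" href="https://fonts.example.net">'
    '<link rel="stylesheet" href="/a.css">'
)
print(unused_hint_hosts(page, {"cdn.example.com"}))
```

A hinted host that is never used costs the client a DNS lookup or connection for nothing, which is the kind of extraneous work the abstract reports.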
To evaluate the indirect risks, we study the web topology. Users who visit benign, popular websites are unfortunately bombarded with malicious popups, malware-loading sites, and phishing sites. The questions we want to address here are: Which domains are responsible for such malicious activity? At what point in the process of loading a popular, trusted website does the trust break down, leading to the loading of dangerous content? To answer these questions, we first need to understand what third-party resources websites load (both directly and indirectly). I present a tool that constructs the most complete map of a website’s resource-level topology to date. This is surprisingly nontrivial; most prior work used only a single run of a single tool (e.g., Puppeteer or Selenium), but I show that this misses a significant fraction of resources. I then apply my tool to collect the resource topology graphs of 20,000 websites from the Alexa ranking, and analyze them to understand which third-party resource inclusions lead to malicious resources. I believe that these third-party inclusions are not always constant or blocked by existing ad blockers. We argue that greater accountability of these third parties can lead to a safer web.
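The question of where trust breaks down can be pictured as a path query on a resource inclusion graph. The sketch below is only an illustration under stated assumptions: the graph is a plain dictionary mapping each resource to the resources it loads, the function name `inclusion_chain` is hypothetical, and the example hosts are invented. It uses breadth-first search to return the shortest chain of inclusions from a site to a flagged resource.

```python
from collections import deque


def inclusion_chain(graph, root, target):
    """Return the shortest inclusion chain from root to target, or None.

    graph maps a resource to the list of resources it loads directly.
    """
    parent = {root: None}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        if node == target:
            # Walk the parent links back to the root, then reverse.
            chain = []
            while node is not None:
                chain.append(node)
                node = parent[node]
            return chain[::-1]
        for child in graph.get(node, []):
            if child not in parent:
                parent[child] = node
                queue.append(child)
    return None


# Invented example: the site never loads the flagged script directly.
topology = {
    "news.example": ["cdn.example/app.js", "ads.example/tag.js"],
    "ads.example/tag.js": ["bidder.example/b.js"],
    "bidder.example/b.js": ["bad.example/popup.js"],
}
print(inclusion_chain(topology, "news.example", "bad.example/popup.js"))
```

Each hop in the returned chain is a party that chose to include the next one, which is where accountability could be assigned.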
Minimally Invasive Solutions to Challenges Posed by Mobility Changes
Today, things have changed radically. As network technologies have proliferated and evolved, the components of, and participants in, computerized systems have become increasingly decoupled. Users travel and commute while connecting to their office computer or home media server. Hardware devices may be carried by users, move on their own, or reside in data centers, never to be seen or touched by end-users. Even operating systems (OSes) and applications may now migrate across the network while executing, thanks to advances in virtualization that are only just beginning to remake the computing landscape. The decoupling of users, devices, and software has invalidated properties that enabled desired functionality, resulting in compromised function. Power interfaces utilize physical user interactions to determine when transitions between high and lower power states should occur; what happens when users are no longer physically present? Operating system execution often relies on components such as CPU and local disk responding with tightly bounded delays; what should be done when the OS itself is in the process of migrating between two separate physical machines? The fundamental question explored by this dissertation is: Can we find highly adoptable solutions to restore desired functionality that has been lost because of changed mobility characteristics? Our emphasis on adoptability stems from pragmatic concerns: if a solution is difficult to adopt, it is highly unlikely to be used. Consequently, while many potential approaches may involve changes to the network itself, our work focuses on modifying end-point behavior. We show that practical solutions implemented solely in software and deployed only on network endpoints can be developed for a wide range of problems. We consider concrete challenges arising from user, device, and software mobility changes, affecting sub-disciplines spanning cloud computing, green computing, and wireless networks.
Cloud Computing: Users increasingly utilize virtual machine (VM) technology to migrate and replicate OS and software amongst networked hosts. Traditional execution required one VM image copy on each host's local storage. By transitioning to networked execution, dozens, if not hundreds, of VM replicas may now be distributed from a single networked storage location to a commensurately large set of physical machines. As these systems expand, they have come to be plagued by boot storms (and similar problems) caused when networked access to storage becomes a major bottleneck, drastically delaying VM distribution and execution. Can we develop techniques that resolve this network bottleneck without the need for expensive hardware over-provisioning?
Green Computing: Remote access technologies have enabled users to travel while still interacting with computational machinery left in the office or home. Yet, energy savings mechanisms have traditionally relied on the activity of attached peripherals to determine power usage. The shift to remote interaction, which bypasses physically attached peripherals, has effectively broken these energy savings mechanisms. Can we build an economic and practical system that accommodates energy efficiency without compromising the fluid remote interactions users have now come to expect?
Wireless Computing: Increasingly advanced mobile devices have provoked a shift towards heavy use of 3G and 4G bandwidth. Accordingly, the capacity of infrastructure wireless networks becomes increasingly strained. Can we find a way of supplementing this relatively low-latency infrastructure with high-latency, high-bandwidth opportunistic content exchange?
In each scenario, we design a solution that aims to strike the proper balance between adoptability and technical efficiency, producing what we believe are rigorous, practical, and adoptable solutions.
Memory Subsystem Optimization Techniques for Modern High-Performance General-Purpose Processors
General-purpose processors propel the advances and innovations that are the subject of humanity’s many endeavors. Catering to this demand, chip-multiprocessors (CMPs) and general-purpose graphics processing units (GPGPUs) have seen many high-performance innovations in their architectures. With these advances, the memory subsystem has become the performance- and energy-limiting aspect of CMPs and GPGPUs alike. This dissertation identifies and mitigates the key performance and energy-efficiency bottlenecks in the memory subsystem of general-purpose processors via novel, practical, microarchitecture and system-architecture solutions.
Addressing the important Last Level Cache (LLC) management problem in CMPs, I observe that LLC management decisions made in isolation, as in prior proposals, often lead to sub-optimal system performance. I demonstrate that in order to maximize system performance, it is essential to manage the LLC while being cognizant of its interaction with the system main memory. I propose ReMAP, which reduces the net memory access cost by evicting cache lines that either have no reuse or have low memory access cost. ReMAP improves the performance of the CMP system by as much as 13%, and by an average of 6.5%.
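The eviction rule stated above (prefer lines with no reuse, otherwise lines that are cheap to fetch again from memory) can be sketched in a few lines of Python. This is only an illustration of that stated rule, not ReMAP's actual mechanism: the `Line` record, its `predicted_reuse` flag, and the `refetch_cost` value are hypothetical inputs that a real design would have to obtain from hardware predictors.

```python
from dataclasses import dataclass


@dataclass
class Line:
    tag: int
    predicted_reuse: bool  # hypothetical reuse prediction
    refetch_cost: int      # hypothetical cost (e.g. cycles) to fetch again from memory


def pick_victim(cache_set):
    """Choose a victim: a line with no predicted reuse if any exists,
    otherwise the line that is cheapest to bring back from memory."""
    no_reuse = [line for line in cache_set if not line.predicted_reuse]
    candidates = no_reuse or cache_set
    return min(candidates, key=lambda line: line.refetch_cost)


cache_set = [
    Line(tag=0xA, predicted_reuse=True, refetch_cost=120),
    Line(tag=0xB, predicted_reuse=True, refetch_cost=40),
    Line(tag=0xC, predicted_reuse=False, refetch_cost=200),
]
victim = pick_victim(cache_set)
print(hex(victim.tag))
```

In this toy policy a line with no predicted reuse is evicted even when it is expensive to refetch, since it is never expected to be refetched at all.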
Rather than the LLC, the L1 data cache has a pronounced impact on GPGPU performance by acting as the bandwidth filter for the rest of the memory subsystem. Prior work has shown that the severely constrained data cache capacity in GPGPUs leads to sub-optimal performance. In this thesis, I propose two novel techniques that address the GPGPU data cache capacity problem. I propose ID-Cache that performs effective cache bypassing and cache line size selection to improve cache capacity utilization. Next, I propose LATTE-CC that considers the GPU’s latency tolerance feature and adaptively compresses the data stored in the data cache, thereby increasing its effective capacity. ID-Cache and LATTE-CC are shown to achieve 71% and 19.2% speedup, respectively, over a wide variety of GPGPU applications.
Complementing the aforementioned microarchitecture techniques, I identify the need for system architecture innovations to sustain performance scalability of GPGPUs in the face of slowing Moore’s Law. I propose a novel GPU architecture called the Multi-Chip-Module GPU (MCM-GPU) that integrates multiple GPU modules to form a single logical GPU. With intelligent memory subsystem optimizations tailored for MCM-GPUs, it can achieve within 7% of the performance of a similar but hypothetical monolithic die GPU. Taking a step further, I present an in-depth study of the energy-efficiency characteristics of future MCM-GPUs. I demonstrate that the inherent non-uniform memory access side-effects form the key energy-efficiency bottleneck in the future.
In summary, this thesis offers key insights into the performance and energy-efficiency bottlenecks in CMPs and GPGPUs, which can guide future architects towards developing high-performance and energy-efficient general-purpose processors.
Dissertation/Thesis; Doctoral Dissertation; Computer Science; 201