35 research outputs found

    Balancing Interactive Performance and Budgeted Resources in Mobile Computing.

    In this dissertation, we explore the various limited resources involved in mobile applications --- battery energy, cellular data usage, and, critically, user attention --- and we devise principled methods for managing the tradeoffs involved in creating a good user experience. Building quality mobile applications requires developers to understand complex interactions between network usage, performance, and resource consumption. Because of this difficulty, developers commonly choose simple but suboptimal approaches that strictly prioritize performance or resource conservation. These extremes are symptoms of a lack of system-provided abstractions for managing the complexity inherent in managing performance/resource tradeoffs. By providing abstractions that help applications manage these tradeoffs, mobile systems can significantly improve user-visible performance without exhausting resource budgets. This dissertation explores three such abstractions in detail. We first present Intentional Networking, a system that provides synchronization primitives and intelligent scheduling for multi-network traffic. Next, we present Informed Mobile Prefetching, a system that helps applications decide when to prefetch data and how aggressively to spend limited battery energy and cellular data resources toward that end. Finally, we present Meatballs, a library that helps applications consider the cloudy nature of predictions when making decisions, selectively employing redundancy to mitigate uncertainty and provide more reliable performance. Overall, experiments show that these abstractions can significantly reduce interactive delay without overspending the available energy and data resources.
    PhD, Computer Science and Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/108956/1/brettdh_1.pd

    Smart PIN: performance and cost-oriented context-aware personal information network

    The next generation of networks will involve the interconnection of heterogeneous individual networks such as WPAN, WLAN, WMAN and cellular networks, adopting IP as the common infrastructure protocol and providing a virtually always-connected network. Furthermore, many devices now enable easy acquisition and storage of information such as pictures, movies, emails, etc. The resulting information overload and the divergent characteristics of the content make it difficult for users to handle their data manually. Consequently, there is a need for personalised automatic services which would enable data exchange across heterogeneous networks and devices. To support these personalised services, user-centric approaches for data delivery across the heterogeneous network are also required. In this context, this thesis proposes Smart PIN, a novel performance and cost-oriented context-aware Personal Information Network. Smart PIN's architecture is detailed, including its network, service and management components. Within the service component, two novel schemes for efficient delivery of context and content data are proposed: the Multimedia Data Replication Scheme (MDRS) and the Quality-oriented Algorithm for Multiple-source Multimedia Delivery (QAMMD). MDRS supports efficient data accessibility among distributed devices using data replication based on a utility function and a minimum data set. QAMMD employs a buffer underflow avoidance scheme for streaming, which achieves high multimedia quality without content adaptation to network conditions. Simulation models for MDRS and QAMMD were built based on various heterogeneous network scenarios. Additionally, a multiple-source streaming system based on QAMMD was implemented as a prototype and tested in an emulated network environment. Comparative tests show that MDRS and QAMMD perform significantly better than other approaches.

    Virtual Machine Image Management for Elastic Resource Usage in Grid Computing

    Grid Computing has evolved from an academic concept to a powerful paradigm in the area of high performance computing (HPC). Over the last few years, powerful Grid computing solutions were developed that allow the execution of computational tasks on distributed computing resources. Grid computing has recently attracted many commercial customers. To enable commercial customers to process sensitive data in the Grid, strong security mechanisms must be put in place to secure the customers' data. In contrast, the development of Cloud Computing, which entered the scene in 2006, was driven by industry: it was designed with respect to security from the beginning. Virtualization technology is used to separate users, e.g., by putting the different users of a system inside virtual machines, which prevents them from accessing other users' data. The use of virtualization in the context of Grid computing was examined early and found to be a promising approach to counter the security threats that have appeared with commercial customers. One main part of the work presented in this thesis is the Image Creation Station (ICS), a component which allows users to administer their virtual execution environments (virtual machines) themselves and which is responsible for managing and distributing the virtual machines in the entire system. In contrast to Cloud computing, which was designed to allow even inexperienced users to execute their computational tasks in the Cloud easily, Grid computing is much more complex to use. The ICS makes it easier to use the Grid by overcoming traditional limitations such as the need to install required software on the compute nodes that users use to execute their computational tasks. This allows users to bring commercial software to the Grid for the first time, without the need for local administrators to install the software on computing nodes that are accessible by all users.
Moreover, the administrative burden is shifted from the local Grid site's administrator to the users or to experienced software providers, who can provide individually tailored virtual machines to each user. The ICS is not only responsible for enabling users to manage their virtual machines themselves; it also ensures that the virtual machines are available on every site that is part of the distributed Grid system. A second aspect of the presented solution focuses on the elasticity of the system by automatically acquiring free external resources depending on the system's current workload. In contrast to existing systems, the presented approach allows the system's administrator to add or remove resource sets during runtime without needing to restart the entire system. Moreover, the presented solution allows users not only to use existing Grid resources but also to scale out to Cloud resources and use these resources on demand. By ensuring that unused resources are shut down as soon as possible, the computational costs of a given task are minimized. In addition, the presented solution allows each user to specify which resources can be used to execute a particular job. This is useful when a job processes sensitive data that, e.g., is not allowed to leave the company. To obtain a comparable function in today's systems, a user must submit her computational task to a particular resource set, losing the ability to schedule automatically if more than one set of resources can be used. In addition, the proposed solution prioritizes each set of resources by taking different metrics into account (e.g., the level of trust or computational costs) and tries to schedule the job to resources with the highest priority first.
It is notable that the priority often mimics the physical distance from the resources to the user: a locally available cluster usually has a higher priority due to its high level of trust and its computational costs, which are usually lower than the costs of using Cloud resources. Therefore, this scheduling strategy minimizes the costs of job execution while improving security at the same time, since data is not necessarily transferred to remote resources and the probability of attacks by malicious external users is minimized. Bringing both components together results in a system that adapts automatically to the current workload by using external (e.g., Cloud) resources together with existing locally available resources or Grid sites and provides individually tailored virtual execution environments to the system's users.
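
The priority-based scheduling described in this abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: the `ResourceSet` fields, the linear `priority` formula, its weights, and the example resource sets are all hypothetical, chosen only to show how a user-specified allow-list and a trust/cost priority might combine.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ResourceSet:
    """Hypothetical description of one set of resources (cluster, Grid site, Cloud)."""
    name: str
    trust: float          # assumed scale: 0.0 (untrusted) to 1.0 (fully trusted)
    cost_per_hour: float  # assumed cost metric in arbitrary units
    free_slots: int       # number of jobs the set can still accept


def priority(rs, trust_weight=1.0, cost_weight=0.1):
    """Assumed linear priority: more trust raises it, higher cost lowers it."""
    return trust_weight * rs.trust - cost_weight * rs.cost_per_hour


def schedule(allowed, resource_sets):
    """Return the name of the highest-priority allowed set with free capacity.

    `allowed` is the set of resource-set names the user permits for this job.
    Returns None when no permitted set can currently take the job.
    """
    candidates = [
        rs for rs in resource_sets
        if rs.name in allowed and rs.free_slots > 0
    ]
    if not candidates:
        return None
    return max(candidates, key=priority).name


# Hypothetical example: a full local cluster, a Grid site, and a Cloud provider.
EXAMPLE_SETS = [
    ResourceSet("local-cluster", trust=1.0, cost_per_hour=0.0, free_slots=0),
    ResourceSet("grid-site", trust=0.8, cost_per_hour=1.0, free_slots=4),
    ResourceSet("cloud", trust=0.4, cost_per_hour=3.0, free_slots=100),
]
```

With these assumed values, a job allowed on all three sets goes to `grid-site` because the local cluster has no free slots, while a job restricted to `local-cluster` (for example, one handling data that must not leave the company) is not scheduled until capacity frees up.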

    Goddard Conference on Mass Storage Systems and Technologies, Volume 1

    Copies of nearly all of the technical papers and viewgraphs presented at the Goddard Conference on Mass Storage Systems and Technologies held in Sep. 1992 are included. The conference served as an informational exchange forum for topics primarily relating to the ingestion and management of massive amounts of data and the attendant problems (data ingestion rates now approach the order of terabytes per day). Discussion topics include the IEEE Mass Storage System Reference Model, data archiving standards, high-performance storage devices, magnetic and magneto-optic storage systems, magnetic and optical recording technologies, high-performance helical scan recording systems, and low-end helical scan tape drives. Additional topics addressed the evolution of the identifiable unit for processing purposes as data ingestion rates increase dramatically, and the present state of the art in mass storage technology.

    Personal Data Management in the Internet of Things

    Due to a sharp decrease in hardware costs and shrinking form factors, networked sensors have become ubiquitous. Today, a variety of sensors are embedded into smartphones, tablets, and personal wearable devices, and are commonly installed in homes and buildings. Sensors are used to collect data about people in their proximity, referred to as users. The collection of such networked sensors is commonly referred to as the Internet of Things. Although sensor data enables a wide range of applications from security, to efficiency, to healthcare, this data can be used to reveal unwarranted private information about users. Thus it is imperative to preserve data privacy while providing users with a wide variety of applications to process their personal data. Unfortunately, most existing systems do not meet these goals. Users are either forced to release their data to third parties, such as application developers, thus giving up data privacy in exchange for using data-driven applications, or are limited to using a fixed set of applications, such as those provided by the sensor manufacturer. To avoid this trade-off, users may choose to host their data and applications on their personal devices, but this requires them to maintain data backups and ensure application performance. What is needed, therefore, is a system that gives users flexibility in their choice of data-driven applications while preserving their data privacy, without burdening users with the need to back up their data and provide computational resources for their applications. We propose a software architecture that leverages a user's personal virtual execution environment (VEE) to host data-driven applications. This dissertation describes key software techniques and mechanisms that are necessary to enable this architecture.
First, we provide a proof-of-concept implementation of our proposed architecture and demonstrate a privacy-preserving ecosystem of applications that process users' energy data as a case study. Second, we present a data management system (called Bolt) that provides applications with efficient storage and retrieval of time-series data, and guarantees the confidentiality and integrity of stored data. We then present a methodology to provision large numbers of personal VEEs on a single physical machine, and demonstrate its use with Linux Containers (LXC). We conclude by outlining the design of an abstract framework to allow users to balance data privacy and application utility.

    Contribution à la convergence d'infrastructure entre le calcul haute performance et le traitement de données à large échelle

    The amount of produced data, either in the scientific community or the commercial world, is constantly growing. The field of Big Data has emerged to handle large amounts of data on distributed computing infrastructures. High-Performance Computing (HPC) infrastructures are traditionally used for the execution of compute-intensive workloads. However, the HPC community is also facing an increasing need to process large amounts of data derived from high-definition sensors and large physics apparatus. The convergence of the two fields, HPC and Big Data, is currently taking place. In fact, the HPC community already uses Big Data tools, which are not always integrated correctly, especially at the level of the file system and the Resource and Job Management System (RJMS). In order to understand how we can leverage HPC clusters for Big Data usage, and what the challenges are for HPC infrastructures, we have studied multiple aspects of the convergence. We first provide a survey of software provisioning methods, with a focus on data-intensive applications. We contribute a new RJMS collaboration technique called BeBiDa, which is based on 50 lines of code whereas similar solutions use at least 1000 times more. We evaluate this mechanism under real conditions and in a simulated environment with our simulator Batsim. Furthermore, we provide extensions to Batsim to support I/O, and showcase the development of a generic file system model along with a Big Data application model. This allows us to complement the BeBiDa real-conditions experiments with simulations while enabling us to study file system dimensioning and trade-offs. All the experiments and analysis of this work have been done with reproducibility in mind. Based on this experience, we propose to integrate the development workflow and data analysis in the reproducibility mindset, and give feedback on our experiences with a list of best practices.