35 research outputs found
Minimally Invasive Solutions to Challenges Posed by Mobility Changes
Today, things have changed radically. As network technologies have proliferated and evolved, the components of, and participants in, computerized systems have become increasingly decoupled. Users travel and commute while connecting to their office computer or home media server. Hardware devices may be carried by users, move on their own, or reside in data centers, never to be seen or touched by end-users. Even operating systems (OSes) and applications may now migrate across the network while executing, thanks to advances in virtualization that are only just beginning to remake the computing landscape. This decoupling of users, devices, and software has invalidated properties that once enabled desired functionality, compromising that functionality as a result. Power interfaces rely on physical user interactions to determine when transitions between high and lower power states should occur; what happens when users are no longer physically present? Operating system execution often relies on components such as the CPU and local disk responding with tightly bounded delays; what should be done when the OS itself is in the process of migrating between two separate physical machines? The fundamental question explored by this dissertation is: can we find highly adoptable solutions to restore desired functionality that has been lost because of changed mobility characteristics? Our emphasis on adoptability stems from pragmatic concerns: if a solution is difficult to adopt, it is highly unlikely to be used. Consequently, while many potential approaches may involve changes to the network itself, our work focuses on modifying end-point behavior. We show that practical solutions implemented solely in software and deployed only on network endpoints can be developed for a wide range of problems. We consider concrete challenges arising from user, device, and software mobility changes, affecting sub-disciplines spanning cloud computing, green computing, and wireless networks.
Cloud Computing: Users increasingly utilize virtual machine (VM) technology to migrate and replicate OSes and software among networked hosts. Traditional execution required one VM image copy on each host's local storage. With the transition to networked execution, dozens, if not hundreds, of VM replicas may now be distributed from a single networked storage location to a commensurately large set of physical machines. As these systems expand, they have come to be plagued by boot storms (and similar problems) caused when networked access to storage becomes a major bottleneck, drastically delaying VM distribution and execution. Can we develop techniques that resolve this network bottleneck without the need for expensive hardware over-provisioning?
Green Computing: Remote access technologies have enabled users to travel while still interacting with computational machinery left in the office or home. Yet energy savings mechanisms have traditionally relied on the activity of attached peripherals to determine power usage. The shift to remote interaction, which bypasses physically attached peripherals, has effectively broken these energy savings mechanisms. Can we build an economical and practical system that provides energy efficiency without compromising the fluid remote interactions users have now come to expect?
Wireless Computing: Increasingly advanced mobile devices have provoked a shift towards heavy use of 3G and 4G bandwidth. Accordingly, the capacity of infrastructure wireless networks has become increasingly strained. Can we find a way of supplementing this relatively low-latency infrastructure with high-latency, high-bandwidth opportunistic content exchange?
In each scenario, we design a solution that aims to strike the proper balance between adoptability and technical efficiency, producing what we believe are rigorous, practical, and adoptable solutions.
Balancing Interactive Performance and Budgeted Resources in Mobile Computing.
In this dissertation, we explore the various limited resources involved in mobile applications --- battery energy, cellular data usage, and, critically, user attention --- and we devise principled methods for managing the tradeoffs involved in creating a good user experience. Building quality mobile applications requires developers to understand complex interactions between network usage, performance, and resource consumption. Because of this
difficulty, developers commonly choose simple but suboptimal approaches that strictly prioritize performance or resource conservation.
These extremes are symptoms of a lack of system-provided abstractions for managing the complexity inherent in managing performance/resource tradeoffs. By providing abstractions that help applications manage these tradeoffs, mobile systems can significantly improve user-visible performance without exhausting resource budgets. This dissertation explores three such abstractions in detail. We first present Intentional
Networking, a system that provides synchronization primitives and intelligent scheduling for multi-network traffic. Next, we present Informed Mobile Prefetching, a system that helps applications decide when to prefetch data and how aggressively to spend limited battery energy and cellular data resources toward that end. Finally, we present Meatballs, a library that helps applications consider the cloudy nature of predictions when making decisions, selectively employing redundancy to mitigate uncertainty and provide more
reliable performance. Overall, experiments show that these abstractions can significantly reduce interactive delay without overspending the available energy and data resources.
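The idea behind the third abstraction, deciding whether prediction uncertainty justifies redundant transfers, can be sketched as follows. This is an illustrative approximation, not the actual Meatballs API: the function name, the sample-pairing estimate of the "race" latency, and the cost/value parameters are all assumptions for the sketch.

```python
# Hypothetical sketch: issue a request on both networks only when the
# predicted latency savings of racing them outweighs the redundancy cost.
from statistics import mean

def choose_strategy(wifi_samples, cell_samples, redundancy_cost, delay_value):
    """Pick 'wifi', 'cellular', or 'redundant' from past latency samples (seconds)."""
    wifi_est, cell_est = mean(wifi_samples), mean(cell_samples)
    best_single = min(wifi_est, cell_est)
    # Rough estimate of racing both networks and taking the first response:
    # pair up sorted samples and average the per-pair minimum.
    paired = zip(sorted(wifi_samples), sorted(cell_samples))
    race_est = mean(min(w, c) for w, c in paired)
    savings = best_single - race_est
    if savings * delay_value > redundancy_cost:
        return "redundant"          # uncertainty is high enough to hedge
    return "wifi" if wifi_est <= cell_est else "cellular"
```

With highly variable Wi-Fi samples, raising the value the user places on low delay tips the decision from the single cheapest network toward redundancy, which is the trade-off the abstract describes.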
Smart PIN: performance and cost-oriented context-aware personal information network
The next generation of networks will involve the interconnection of heterogeneous individual
networks such as WPANs, WLANs, WMANs and cellular networks, adopting IP as the common infrastructure protocol and providing a virtually always-connected network. Furthermore,
many devices enable easy acquisition and storage of information such as pictures, movies, and emails. The resulting information overload and divergent content
characteristics make it difficult for users to handle their data manually. Consequently, there is a need for personalised automatic services that enable data exchange across heterogeneous networks and devices. To support these personalised services, user-centric approaches
for data delivery across the heterogeneous network are also required.
In this context, this thesis proposes Smart PIN - a novel performance and cost-oriented context-aware Personal Information Network. Smart PIN's architecture is detailed including its network, service and management components. Within the service component, two novel schemes for efficient delivery of context and content data are proposed:
Multimedia Data Replication Scheme (MDRS) and Quality-oriented Algorithm for Multiple-source Multimedia Delivery (QAMMD).
MDRS supports efficient data accessibility among distributed devices using data replication which is based on a utility function and a minimum data set. QAMMD employs a buffer underflow avoidance scheme for streaming, which achieves high multimedia quality without content adaptation to network conditions. Simulation models for MDRS and
QAMMD were built based on various heterogeneous network scenarios. Additionally, a multiple-source streaming prototype based on QAMMD was implemented and tested in an emulated network environment. Comparative tests show that MDRS and QAMMD perform significantly better than alternative approaches.
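The replication idea behind MDRS, scoring items with a utility function and always keeping a minimum data set, might be sketched as below. The utility formula and function names here are assumptions for illustration, not the thesis's actual scheme.

```python
# Illustrative utility-driven replication sketch: replicate the
# highest-utility items onto a device until capacity runs out, while the
# designated minimum data set is stored unconditionally.

def plan_replicas(items, capacity, min_set):
    """items: list of (name, size, access_freq, remote_cost); returns names stored locally."""
    def utility(item):
        name, size, freq, cost = item
        return freq * cost / size   # remote-access cost avoided per unit of storage
    chosen, used = [], 0
    for item in items:              # minimum data set first, regardless of score
        if item[0] in min_set:
            chosen.append(item[0])
            used += item[1]
    for item in sorted(items, key=utility, reverse=True):
        if item[0] not in min_set and used + item[1] <= capacity:
            chosen.append(item[0])
            used += item[1]
    return chosen
```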
Virtual Machine Image Management for Elastic Resource Usage in Grid Computing
Grid Computing has evolved from an academic concept to a powerful paradigm in the area of high performance computing (HPC). Over the last few years, powerful Grid computing solutions were developed that allow the execution of computational tasks on distributed computing resources. Grid computing has recently attracted many commercial customers. To enable commercial customers to process sensitive data in the Grid, strong security mechanisms must be put in place to protect their data.
In contrast, the development of Cloud Computing, which entered the scene in 2006, was driven by industry: it was designed with security in mind from the beginning. Virtualization technology is used to separate users, e.g., by putting each user of a system inside a virtual machine, which prevents them from accessing other users' data.
The use of virtualization in the context of Grid computing was examined early on and was found to be a promising approach to countering the security threats that have appeared with commercial customers.
One main part of the work presented in this thesis is the Image Creation Station (ICS), a component which allows users to administer their virtual execution environments (virtual machines) themselves and which is responsible for managing and distributing the virtual machines in the entire system.
In contrast to Cloud computing, which was designed to allow even inexperienced users to execute their computational tasks easily, Grid computing is much more complex to use. The ICS makes the Grid easier to use by overcoming traditional limitations such as the need to install required software on the compute nodes used to execute computational tasks. This allows users to bring commercial software to the Grid for the first time, without the need for local administrators to install that software on computing nodes accessible by all users. Moreover, the administrative burden is shifted from the local Grid site's administrator to the users or to experienced software providers, which allows individually tailored virtual machines to be provided to each user. The ICS is not only responsible for enabling users to manage their virtual machines themselves; it also ensures that the virtual machines are available on every site that is part of the distributed Grid system.
A second aspect of the presented solution focuses on the elasticity of the system by automatically acquiring free external resources depending on the system's current workload. In contrast to existing systems, the presented approach allows the system's administrator to add or remove resource sets during runtime without needing to restart the entire system. Moreover, the presented solution allows users not only to use existing Grid resources but also to scale out to Cloud resources and use them on demand. By ensuring that unused resources are shut down as soon as possible, the computational costs of a given task are minimized. In addition, the presented solution allows each user to specify which resources can be used to execute a particular job. This is useful when a job processes sensitive data, e.g., data that is not allowed to leave the company. To obtain comparable functionality in today's systems, a user must submit her computational task to a particular resource set, losing the ability to schedule automatically when more than one set of resources could be used.
In addition, the proposed solution prioritizes each set of resources by taking different metrics into account (e.g., the level of trust or computational costs) and tries to schedule the job to the resources with the highest priority first. Notably, the priority often mirrors the physical distance from the resources to the user: a locally available cluster usually has a higher priority due to its high level of trust and computational costs that are usually lower than those of Cloud resources. This scheduling strategy therefore minimizes the costs of job execution while improving security at the same time, since data is not necessarily transferred to remote resources and the probability of attacks by malicious external users is minimized.
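The prioritized selection among permitted resource sets described above can be sketched roughly as follows. The field names and the exact priority ordering (trust first, then cost) are illustrative assumptions, not the system's actual implementation.

```python
# Minimal sketch of prioritized resource-set scheduling: a job goes to the
# most-trusted, cheapest set the user permits, falling back to lower-priority
# sets (e.g., Cloud) only when higher-priority ones are busy.
from dataclasses import dataclass

@dataclass
class ResourceSet:
    name: str
    trust: int        # higher = more trusted; a local cluster ranks above a remote Cloud
    cost: float       # per-job cost; Cloud resources are usually more expensive
    free_slots: int

def schedule(allowed, sets):
    """Return the name of the chosen resource set, or None if all permitted sets are busy."""
    candidates = [s for s in sets if s.name in allowed and s.free_slots > 0]
    candidates.sort(key=lambda s: (-s.trust, s.cost))   # trust first, then cheapness
    if not candidates:
        return None
    candidates[0].free_slots -= 1
    return candidates[0].name
```

Because the local cluster sorts first, jobs spill over to Cloud resources only once local slots are exhausted, which matches the cost- and security-minimizing behavior the abstract describes.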
Bringing both components together results in a system that adapts automatically to the current workload by using external (e.g., Cloud) resources together with existing locally available resources or Grid sites and provides individually tailored virtual execution environments to the system's users
Goddard Conference on Mass Storage Systems and Technologies, Volume 1
Copies of nearly all of the technical papers and viewgraphs presented at the Goddard Conference on Mass Storage Systems and Technologies, held in September 1992, are included. The conference served as an information exchange forum on topics primarily relating to the ingestion and management of massive amounts of data and the attendant problems (data ingestion rates now approach the order of terabytes per day). Discussion topics include the IEEE Mass Storage System Reference Model, data archiving standards, high-performance storage devices, magnetic and magneto-optic storage systems, magnetic and optical recording technologies, high-performance helical scan recording systems, and low-end helical scan tape drives. Additional topics addressed the evolution of the identifiable unit for processing purposes as data ingestion rates increase dramatically, and the present state of the art in mass storage technology.
Personal Data Management in the Internet of Things
Due to a sharp decrease in hardware costs and shrinking form factors,
networked sensors have become ubiquitous.
Today, a variety of sensors are embedded
into smartphones, tablets, and personal wearable devices,
and are commonly installed in homes and buildings.
Sensors are used to collect data about people in their proximity, referred to as users.
The collection of such networked sensors is commonly referred to as the Internet of Things.
Although sensor data enables a wide range of
applications from security, to efficiency, to healthcare, this data can be used to reveal unwarranted private information about users.
Thus it is imperative to preserve data privacy while
providing users with a wide variety of applications to process their personal data.
Unfortunately, most existing systems do not meet these goals.
Users are either forced to release their data to third parties,
such as application developers, thus giving up data privacy in exchange for using data-driven applications,
or are limited to using a fixed set of applications, such as those provided by the sensor manufacturer.
To avoid this trade-off, users may choose to host their data and applications on their personal devices, but this
requires them to maintain data backups and ensure application performance.
What is needed, therefore, is a system that gives users flexibility in their choice of
data-driven applications while preserving their data privacy,
without burdening users with the need to back up their data or to provide
computational resources for their applications.
We propose a software architecture that leverages a user's personal
virtual execution environment (VEE) to host data-driven applications.
This dissertation describes key software techniques and mechanisms that are
necessary to enable this architecture.
First, we provide a proof-of-concept implementation of our proposed architecture
and demonstrate a privacy-preserving ecosystem of applications that process
users' energy data as a case study.
Second, we present a data management system (called Bolt) that provides
applications with efficient storage and retrieval of time-series data,
and guarantees the confidentiality and integrity of stored data.
We then present a methodology to provision large numbers of
personal VEEs on a single physical machine, and demonstrate its use with LinuX Containers (LXC).
We conclude by outlining the design of an abstract framework that allows users to balance data privacy and application utility.
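The guarantee Bolt provides for stored chunks, that data cannot be read back if it was tampered with, can be sketched as below. This is a simplified illustration, not Bolt's actual design: the chunk layout, key handling, and use of an HMAC tag are assumptions, and confidentiality (encryption) is omitted so only the integrity side is shown.

```python
# Simplified sketch of integrity-protected time-series chunk storage:
# each chunk of records is serialized and tagged with an HMAC under a
# (hypothetical) per-user key; the tag is checked before any read.
import hashlib, hmac, json

OWNER_KEY = b"hypothetical-owner-key"   # illustrative; key management is out of scope here

def seal_chunk(records):
    """Serialize a chunk of (timestamp, value) records and tag it with an HMAC."""
    payload = json.dumps(records).encode()
    tag = hmac.new(OWNER_KEY, payload, hashlib.sha256).hexdigest()
    return {"payload": payload, "tag": tag}

def open_chunk(chunk):
    """Verify the tag before trusting the data; raise if the chunk was modified."""
    expected = hmac.new(OWNER_KEY, chunk["payload"], hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, chunk["tag"]):
        raise ValueError("chunk integrity check failed")
    return json.loads(chunk["payload"])
```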
Contribution to the convergence of infrastructure between high-performance computing and large-scale data processing
The amount of data produced, whether in the scientific community or the commercial world, is constantly growing. The field of Big Data has emerged to handle large amounts of data on distributed computing infrastructures. High-Performance Computing (HPC) infrastructures are traditionally used for the execution of compute-intensive workloads. However, the HPC community is also facing an increasing need to process large amounts of data derived from high-definition sensors and large physics apparatus. The convergence of the two fields, HPC and Big Data, is currently taking place. In fact, the HPC community already uses Big Data tools, which are not always integrated correctly, especially at the level of the file system and the Resource and Job Management System (RJMS). In order to understand how we can leverage HPC clusters for Big Data usage, and what the challenges are for HPC infrastructures, we have studied multiple aspects of the convergence. We initially provide a survey of software provisioning methods, with a focus on data-intensive applications. We contribute a new RJMS collaboration technique called BeBiDa, which is based on 50 lines of code whereas similar solutions use at least 1000 times more. We evaluate this mechanism under real conditions and in a simulated environment with our simulator Batsim. Furthermore, we provide extensions to Batsim to support I/O, and showcase the development of a generic file system model along with a Big Data application model. This allows us to complement the BeBiDa real-conditions experiments with simulations while enabling us to study file system dimensioning and trade-offs. All the experiments and analysis in this work have been done with reproducibility in mind.
Based on this experience, we propose to integrate the development workflow and data analysis into the reproducibility mindset, and give feedback on our experiences with a list of best practices.