NVSwap: Latency-Aware Paging Using Non-Volatile Main Memory
Page relocation (paging) from DRAM to swap devices is an important task of the virtual memory system in an operating system. Existing Linux paging mechanisms have two main deficiencies: (1) they may incur high I/O latency due to write interference on solid-state disks and an aggressive page-reclaiming rate under high memory pressure, and (2) they do not provide a predictable latency bound for latency-sensitive applications because they cannot control the allocation of system resources among concurrent processes sharing swap devices. In this thesis, we present the design and implementation of a latency-aware paging mechanism called NVSwap. It supports a hybrid swap space using both regular secondary storage devices (e.g., solid-state disks) and non-volatile main memory (NVMM). This design is more cost-effective than using only NVMM as swap space. Furthermore, NVSwap uses NVMM as a persistent paging buffer to serve page-out requests and hide the latency of paging between the regular swap device and DRAM. It supports in-situ paging for pages in the persistent paging buffer, avoiding the slow I/O path. Finally, NVSwap allows users to specify latency bounds for individual processes or groups of related processes, and enforces the bounds by dynamically controlling the allocation of NVMM and the page-reclaiming rate among scheduling units. We have implemented a prototype of NVSwap in the Linux kernel 3.16.74. Our results demonstrate that NVSwap reduces paging latency by up to 99% and provides performance guarantees and isolation among concurrent applications sharing swap devices.
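To make the placement idea concrete, here is a minimal sketch in C of a latency-aware page-out decision between an NVMM paging buffer and SSD swap. All names, struct fields, and numbers are hypothetical illustrations of the mechanism described above, not NVSwap's actual kernel code.

```c
/* Hypothetical sketch of NVSwap-style page-out placement: a page-out is
 * absorbed by the fast NVMM paging buffer when its scheduling unit is
 * missing its latency bound and still has NVMM quota; otherwise it takes
 * the regular SSD swap path. */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

struct swap_unit {                 /* a process or group sharing swap */
    uint64_t latency_bound_us;     /* user-specified paging latency bound */
    uint64_t observed_avg_lat_us;  /* measured average page-out latency */
    uint64_t nvmm_quota_pages;     /* NVMM buffer pages granted to unit */
    uint64_t nvmm_used_pages;      /* NVMM buffer pages currently held */
};

/* Decide whether this page-out should go to the NVMM paging buffer. */
static bool page_out_to_nvmm(const struct swap_unit *u)
{
    bool over_bound = u->observed_avg_lat_us > u->latency_bound_us;
    bool has_quota  = u->nvmm_used_pages < u->nvmm_quota_pages;
    /* Units missing their bound get NVMM service first; the rest fall
     * back to the slower SSD path so NVMM capacity is not exhausted. */
    return over_bound && has_quota;
}

int main(void)
{
    struct swap_unit u = {
        .latency_bound_us    = 50,
        .observed_avg_lat_us = 120,   /* currently missing its bound */
        .nvmm_quota_pages    = 4096,
        .nvmm_used_pages     = 100,
    };
    puts(page_out_to_nvmm(&u) ? "page out to NVMM buffer"
                              : "page out to SSD swap");
    return 0;
}
```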
A comprehensive approach to MPSoC security: achieving network-on-chip security: a hierarchical, multi-agent approach
Multiprocessor Systems-on-Chip (MPSoCs) are pervading our lives, acquiring ever-increasing relevance in a large number of applications, including even safety-critical ones. MPSoCs are becoming increasingly complex and heterogeneous; the Network-on-Chip (NoC) paradigm has been introduced to support scalable on-chip communication, in some cases even with reconfigurability support. The increased complexity, as well as the networking approach, in turn makes security aspects more critical. In this work we propose and implement a hierarchical multi-agent approach providing solutions to secure NoC-based MPSoCs at different levels of design. We develop a flexible, scalable, and modular structure that integrates protection of different elements in the MPSoC (e.g., memory, processors) from different attack scenarios. Rather than focusing on protection strategies specifically devised for an individual attack or a particular core, this work aims at providing a comprehensive, system-level protection strategy: this constitutes its main methodological contribution. We prove the feasibility of the concepts via a prototype realization in FPGA technology.
A storage architecture for data-intensive computing
The assimilation of computing into our daily lives is enabling the generation of data at unprecedented rates. In 2008, IDC estimated that the "digital universe" contained 486 exabytes of data [9]. The computing industry is being challenged to develop methods for the cost-effective processing of data at these large scales. The MapReduce programming model has emerged as a scalable way to perform data-intensive computations on commodity cluster computers. Hadoop is a popular open-source implementation of MapReduce. To manage storage resources across the cluster, Hadoop uses a distributed user-level filesystem. This filesystem --- HDFS --- is written in Java and designed for portability across heterogeneous hardware and software platforms. The efficiency of a Hadoop cluster depends heavily on the performance of this underlying storage system.
This thesis is the first to analyze the interactions between Hadoop and storage. It describes how the user-level Hadoop filesystem, instead of efficiently capturing the full performance potential of the underlying cluster hardware, actually degrades application performance significantly. Architectural bottlenecks in the Hadoop implementation result in inefficient HDFS usage due to delays in scheduling new MapReduce tasks. Further, HDFS implicitly makes assumptions about how the underlying native platform manages storage resources, even though native filesystems and I/O schedulers vary widely in design and behavior. Methods to eliminate these bottlenecks in HDFS are proposed and evaluated both in terms of their application performance improvement and impact on the portability of the Hadoop framework.
In addition to improving the performance and efficiency of the Hadoop storage system, this thesis also focuses on improving its flexibility. The goal is to allow Hadoop to coexist in cluster computers shared with a variety of other applications through the use of virtualization technology. The introduction of virtualization breaks the traditional Hadoop storage architecture, where persistent HDFS data is stored on local disks installed directly in the computation nodes. To overcome this challenge, a new flexible network-based storage architecture is proposed, along with changes to the HDFS framework. Network-based storage enables Hadoop to operate efficiently in a dynamic virtualized environment and furthers the spread of the MapReduce parallel programming model to new applications.
Proceedings of the Second International Workshop on HyperTransport Research and Applications (WHTRA2011)
Proceedings of the Second International Workshop on HyperTransport Research and Applications (WHTRA2011), held February 9th, 2011 in Mannheim, Germany. The Second International Workshop for Research on HyperTransport is an international high-quality forum for scientists, researchers, and developers working in the area of HyperTransport. This includes not only developments and research in HyperTransport itself, but also work that is based on or enabled by HyperTransport. HyperTransport (HT) is an interconnection technology typically used as the system interconnect in modern computer systems, connecting the CPUs among each other and with the I/O bridges. Primarily designed as an interconnect between high-performance CPUs, it provides extremely low latency, high bandwidth, and excellent scalability. The definition of the HTX connector allows the use of HT even for add-in cards. Unlike other peripheral interconnect technologies such as PCI-Express, no protocol conversion or intermediate bridging is necessary: HT is a direct connection between device and CPU with minimal latency. Another advantage is the possibility of cache-coherent devices. Because of these properties, HT is of high interest for high-performance I/O such as networking and storage, but also for co-processing and acceleration based on ASIC or FPGA technologies. Acceleration in particular is seeing a resurgence of interest today; one reason is the possibility of reducing power consumption through the use of accelerators. In the area of parallel computing, the low-latency communication allows for fine-grained communication schemes and is perfectly suited for scalable systems. In sum, HT technology offers key advantages and great performance for any research related to or based on interconnects. For more information, please consult the workshop website (http://whtra.uni-hd.de).
FluidMem: Open Source Full Memory Disaggregation
To satisfy the performance demands of memory-intensive applications facing DRAM shortages, previous work has focused on incorporating remote memory to expand capacity. However, the emergence of resource balancing as a priority for cloud computing requires the capability to dynamically size virtual machine memory up and down. Furthermore, hardware-based or kernel-space implementations hamper flexibility with respect to making customizations or integrating continuing open-source advancements in datacenter software infrastructure. This thesis presents an architecture to meet the performance, bi-directional sizing, and flexibility challenges of memory disaggregation in the cloud. The implementation, called FluidMem, is open-source software that integrates with the Linux kernel, the KVM hypervisor, and multiple key-value stores. With FluidMem, a virtual machine's local memory can be transparently extended or entirely transferred to a remote key-value store. By fully implementing the dynamic aspect of data center memory disaggregation, FluidMem allows a VM's footprint to be precisely sized and expandable to meet application demands, while leaving cloud operators a non-intrusive recourse if memory becomes scarce. Full memory disaggregation in FluidMem enables the local memory footprint of a Linux VM to be scaled down to 180 pages (720 KB) while still accepting SSH logins. A user-space page fault handler in FluidMem uses the userfaultfd mechanism in the Linux kernel to relocate any page of the VM to remote memory, and it outperforms swap-based paging. Page fault latencies via FluidMem to RAMCloud are 40% faster than the RDMA remote-memory swap device in the Linux kernel, and 77% faster than SSD swap. FluidMem's remote memory expansion performance with three key-value backends is evaluated against swap-based alternatives for a MongoDB workload and the Graph500 benchmark.
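The userfaultfd pattern at the heart of this design is worth seeing in miniature. The sketch below registers a region, catches a missing-page fault, and resolves it with UFFDIO_COPY; where it fills the page from a local buffer, FluidMem would instead fetch the page from a remote key-value store. This is the generic Linux API pattern, not FluidMem's own code; error handling is omitted for brevity.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <linux/userfaultfd.h>
#include <poll.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

static char *region;

static void *toucher(void *arg)
{
    (void)arg;
    /* First read faults: the page is "missing" until the handler acts. */
    printf("faulting thread read: %c\n", region[0]);
    return NULL;
}

int main(void)
{
    long page = sysconf(_SC_PAGESIZE);

    /* Create the userfaultfd object and handshake on the API version. */
    int uffd = (int)syscall(SYS_userfaultfd, O_CLOEXEC | O_NONBLOCK);
    struct uffdio_api api = { .api = UFFD_API };
    ioctl(uffd, UFFDIO_API, &api);

    /* Anonymous region whose missing-page faults we service ourselves. */
    region = mmap(NULL, page, PROT_READ | PROT_WRITE,
                  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    struct uffdio_register reg = {
        .range = { .start = (unsigned long)region, .len = (unsigned long)page },
        .mode  = UFFDIO_REGISTER_MODE_MISSING,
    };
    ioctl(uffd, UFFDIO_REGISTER, &reg);

    pthread_t t;
    pthread_create(&t, NULL, toucher, NULL);

    /* Wait for the fault event; this is the point where FluidMem would
     * look up the page in RAMCloud or another backend. */
    struct pollfd pfd = { .fd = uffd, .events = POLLIN };
    poll(&pfd, 1, -1);

    struct uffd_msg msg;
    read(uffd, &msg, sizeof(msg));
    if (msg.event == UFFD_EVENT_PAGEFAULT) {
        char *src = mmap(NULL, page, PROT_READ | PROT_WRITE,
                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        memset(src, 'A', (size_t)page);
        struct uffdio_copy copy = {
            .dst = msg.arg.pagefault.address & ~((unsigned long long)page - 1),
            .src = (unsigned long long)(unsigned long)src,
            .len = (unsigned long long)page,
            .mode = 0,
        };
        /* Installs the page and wakes the faulting thread. */
        ioctl(uffd, UFFDIO_COPY, &copy);
    }
    pthread_join(t, NULL);
    return 0;
}
```

Build with `gcc -pthread`; it assumes a kernel with userfaultfd available (unprivileged use may additionally require the vm.unprivileged_userfaultfd sysctl on newer kernels).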
Lessons learnt on the analysis of large sequence data in animal genomics
The 'omics revolution has made a large amount of sequence data available to researchers and the industry. This has had a profound impact on the field of bioinformatics, stimulating unprecedented advancements in this discipline. This is usually looked at from the perspective of human 'omics, in particular human genomics. Plant and animal genomics, however, have also been deeply influenced by next-generation sequencing technologies, with several genomics applications now popular among researchers and the breeding industry. Genomics tends to generate huge amounts of data, and genomic sequence data account for an increasing proportion of big data in biological sciences, due largely to decreasing sequencing and genotyping costs and to large-scale sequencing and resequencing projects. The analysis of big data poses a challenge to scientists, as data gathering currently takes place at a faster pace than data processing and analysis, and the associated computational burden is increasingly taxing, making even simple manipulation, visualization, and transfer of data cumbersome operations. The time consumed by processing and analysing huge data sets may come at the expense of data quality assessment and critical interpretation. Additionally, when analysing large amounts of data, something is likely to go awry (the software may crash or stop) and it can be very frustrating to track down the error. We herein review the most relevant issues related to tackling these challenges and problems, from the perspective of animal genomics, and provide researchers who lack extensive computing experience with guidelines that will help when processing large genomic data sets.
Architecting Efficient Data Centers.
Data center power consumption has become a key constraint in continuing to scale Internet services. As our society's reliance on "the Cloud" continues to grow, companies require an ever-increasing amount of computational capacity to support their customers. Massive warehouse-scale data centers have emerged, requiring 30 MW or more of total power capacity. Over the lifetime of a typical high-scale data center, power-related costs make up 50% of the total cost of ownership (TCO). Furthermore, the aggregate effect of data center power consumption across the country cannot be ignored. In total, data center energy usage has reached approximately 2% of aggregate consumption in the United States and continues to grow.
This thesis addresses the need to increase computational efficiency to address this growing problem. It proposes a new class of power management techniques: coordinated full-system idle low-power modes that increase the energy proportionality of modern servers. First, we introduce the PowerNap server architecture, a coordinated full-system idle low-power mode that transitions in and out of an ultra-low-power nap state to save power during brief idle periods. While effective for uniprocessor systems, PowerNap relies on full-system idleness, and we show that such idleness disappears as the number of cores per processor continues to increase. We expose this problem in a case study of Google Web search, in which we demonstrate that coordinated full-system active power modes are necessary to reach energy proportionality and that PowerNap is ineffective because of a lack of idleness. To recover full-system idleness, we introduce DreamWeaver, architectural support for deep sleep. DreamWeaver allows a server to exchange latency for full-system idleness, making PowerNap-enabled servers effective again and providing a better latency-power savings tradeoff than existing approaches. Finally, this thesis investigates workloads that achieve efficiency through methodical cluster provisioning techniques. Using the popular memcached workload, it provides examples of provisioning clusters for cost-efficiency given latency, throughput, and data set size targets.
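A back-of-the-envelope model shows why full-system napping matters for energy proportionality. The sketch below compares average power with and without a PowerNap-style nap state across utilization levels, treating transitions as free; all wattages are illustrative assumptions, not measurements from the thesis.

```c
/* Toy energy-proportionality model: a server draws full power while busy,
 * substantial power while idle without napping, and near-zero power in a
 * PowerNap-style nap state. */
#include <stdio.h>

int main(void)
{
    const double p_active = 300.0; /* W while serving requests (assumed) */
    const double p_idle   = 180.0; /* W idle, no PowerNap (assumed) */
    const double p_nap    = 10.0;  /* W in the nap state (assumed) */

    for (int u10 = 1; u10 <= 9; u10 += 2) {
        double util     = u10 / 10.0;  /* fraction of time busy */
        double baseline = util * p_active + (1.0 - util) * p_idle;
        double powernap = util * p_active + (1.0 - util) * p_nap;
        printf("util %2.0f%%: baseline %3.0f W, PowerNap %3.0f W (%2.0f%% saved)\n",
               util * 100.0, baseline, powernap,
               100.0 * (1.0 - powernap / baseline));
    }
    return 0;
}
```

The model also hints at the multicore problem the thesis identifies: the nap term applies only when the whole system is idle, and with many cores the probability of simultaneous idleness shrinks toward zero, which is the gap DreamWeaver is designed to close.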
Doctor of Philosophy dissertation
We develop a novel framework for friend-to-friend (f2f) distributed services (F3DS) by which applications can easily offer peer-to-peer (p2p) services among social peers, with resource sharing governed by approximated levels of social altruism. Our framework differs significantly from typical p2p collaboration in that it provides a foundation for distributed applications to cooperate based on pre-existing trust and altruism among social peers. With the goal of facilitating the approximation of relative levels of altruism among social peers within F3DS, we introduce a new metric: SocialDistance. SocialDistance is a synthetic metric that combines direct levels of altruism between peers with an altruism decay for each hop to approximate indirect levels of altruism. The resulting multihop altruism levels are used by F3DS applications to proportion and prioritize the sharing of resources with other social peers. We use SocialDistance to implement a novel flash file/patch distribution method, SocialSwarm. SocialSwarm uses the SocialDistance metric as part of its resource allocation to overcome the necessity of (and inefficiency created by) resource bartering among friends participating in a BitTorrent swarm. We find that SocialSwarm achieves an average file download time reduction of 25% to 35% in comparison with standard BitTorrent under a variety of configurations and conditions, including file sizes, maximum SocialDistance, and leech and seed counts. The most socially connected peers yield up to a 47% decrease in download completion time in comparison with average non-social BitTorrent swarms. We also use the F3DS framework to implement a novel malware detection application, F3DS Antivirus (F3AV), and evaluate it on the Amazon cloud. We show that with f2f sharing of resources, F3AV achieves a 65% increase in the detection rate of 0- to 1-day-old malware among social peers as compared to the average of individual scanners. Furthermore, we show that F3AV provides the greatest diversity of malware scanners (and thus malware protection) to social hubs, those nodes that are positioned to provide strategic defense against socially aware malware.
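A toy computation makes the multihop altruism idea concrete: direct altruism weights on friendship edges are attenuated by a constant decay on every hop beyond the first, and a peer's effective altruism toward the source is the best score over all paths. The graph, weights, and decay constant below are illustrative assumptions, not values from the dissertation.

```c
/* SocialDistance-style multihop altruism via Bellman-Ford-style
 * relaxation with max: extending a path multiplies its score by the edge
 * weight, plus DECAY if it is not the first hop. All factors are <= 1,
 * so N-1 rounds suffice. */
#include <stdio.h>

#define N      5
#define DECAY  0.5   /* assumed per-hop altruism decay */

int main(void)
{
    /* altruism[i][j] in [0,1]; 0 means no direct friendship. */
    const double altruism[N][N] = {
        { 0.0, 0.9, 0.4, 0.0, 0.0 },
        { 0.9, 0.0, 0.0, 0.8, 0.0 },
        { 0.4, 0.0, 0.0, 0.0, 0.6 },
        { 0.0, 0.8, 0.0, 0.0, 0.7 },
        { 0.0, 0.0, 0.6, 0.7, 0.0 },
    };

    double best[N] = { 1.0 };      /* peer 0 is the source */

    for (int round = 0; round < N - 1; round++)
        for (int k = 0; k < N; k++)
            for (int j = 0; j < N; j++)
                if (altruism[k][j] > 0.0) {
                    double score = best[k] * altruism[k][j];
                    if (k != 0)
                        score *= DECAY;   /* indirect hop is attenuated */
                    if (score > best[j])
                        best[j] = score;
                }

    for (int j = 1; j < N; j++)
        printf("peer %d: effective altruism %.3f\n", j, best[j]);
    return 0;
}
```

An F3DS application such as SocialSwarm would then weight resource allocation (upload bandwidth, scanner sharing) by these scores, so closer friends receive proportionally more resources.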
Introductory Computer Forensics
INTERPOL (International Police) built cybercrime programs to keep up with emerging cyber threats, and aims to coordinate and assist international operations for fighting crimes involving computers. Although significant international efforts are being made in dealing with cybercrime and cyber-terrorism, finding effective, cooperative, and collaborative ways to deal with complicated cases that span multiple jurisdictions has proven difficult in practice.