455 research outputs found

    A Study of Linux Perf and Slab Allocation Sub-Systems

    Get PDF
    Today, modern processors are equipped with a special unit named PMU that enables software developers to gain access to micro-architectural level information such as CPU cycles count and executed instructions count. The PMU provides a set of programmable registers called hardware performance counters that can be programmed to count the specific hardware events. In the Linux operating system, many low-level interfaces are designed to provide access to the hardware counters facilities. One of these interfaces is perf_event, which was merged as a sub-system to the kernel mainline in 2009, and became a widely used interface for hardware counters. Firstly, we investigate the perf_event Linux sub-system in the kernel-level by exploring the kernel source code to identify the potential sources of overhead and counting error. We also study the Perf tool as one of the end-user interfaces that was built on top of the perf_event sub-system to provide an easy-to-use measurement and profiling tool in the Linux operating system. Moreover, we conduct some experiments on a variety of processors to analyze the overhead, determinism, and accuracy of the Perf tool and the underlying perf_event sub-system in counting hardware events. Although our results show 47% error in counting the number of taken branches as well as 5.92% relative overhead on the Intel Pentium 4 processors, we do not observe a significant overhead or defect on the modern x86 and ARM processors. Secondly, we explore a memory management sub-system of Linux kernel called slab allocator, that plays a crucial role in the overall performance of the system. We study three different implementations of the slab allocator that are currently available in the Linux kernel mainline and enumerate the advantages and disadvantages of each implementation. We also investigate the binning effect of the slab allocator on the Linux system calls execution time variation. Moreover, we introduce a new metric called Slab Metric that is assigned to each system call to represent the interaction level with the slab allocator. The results show a correlation coefficient of 0.78 between the dynamic slab metric and the execution time variation of the Linux system calls

    Automated Virtual Machine Introspection for Host-Based Intrusion Detection

    Get PDF
    This thesis examines techniques to automate configuration of an intrusion detection system utilizing hardware-assisted virtualization. These techniques are used to detect the version of a running guest operating system, automatically configure version-specific operating system information needed by the introspection library, and to locate and monitor important operating system data structures. This research simplifies introspection library configuration and is a step toward operating system independent introspection. An operating system detection algorithm and Windows virtual machine system service dispatch table monitor are implemented using the Xen hypervisor and a modified version of the XenAccess library. All detection and monitoring is implemented from the Xen management domain. Results of the operating system detection are used to initialize the XenAccess library. Library initialization time and kernel symbol retrieval are compared to the standard library. The algorithm is evaluated using nine versions of the Windows operating system. The system service dispatch table monitor is evaluated using the Agony and ProAgent rootkits. The automation techniques successfully detect the operating system and system service dispatch table hooks for the nine Windows versions tested. The modified XenAccess library exhibits an average initialization speedup of 1.9. Kernel symbol lookup is 10 times faster, on average. The hook detector is able to detect all hooks used by both rookits

    Efficient, transparent, and comprehensive runtime code manipulation

    Get PDF
    Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2004.Includes bibliographical references (p. 293-306).This thesis addresses the challenges of building a software system for general-purpose runtime code manipulation. Modern applications, with dynamically-loaded modules and dynamically-generated code, are assembled at runtime. While it was once feasible at compile time to observe and manipulate every instruction--which is critical for program analysis, instrumentation, trace gathering, optimization, and similar tools--it can now only be done at runtime. Existing runtime tools are successful at inserting instrumentation calls, but no general framework has been developed for fine-grained and comprehensive code observation and modification without high overheads. This thesis demonstrates the feasibility of building such a system in software. We present DynamoRIO, a fully-implemented runtime code manipulation system that supports code transformations on any part of a program, while it executes. DynamoRIO uses code caching technology to provide efficient, transparent, and comprehensive manipulation of an unmodified application running on a stock operating system and commodity hardware. DynamoRIO executes large, complex, modern applications with dynamically-loaded, generated, or even modified code. Despite the formidable obstacles inherent in the IA-32 architecture, DynamoRIO provides these capabilities efficiently, with zero to thirty percent time and memory overhead on both Windows and Linux. DynamoRIO exports an interface for building custom runtime code manipulation tools of all types. It has been used by many researchers, with several hundred downloads of our public release, and is being commercialized in a product for protection against remote security exploits, one of numerous applications of runtime code manipulation.by Derek L. Bruening.Ph.D

    Understanding the thermal implications of multicore architectures

    Get PDF
    Multicore architectures are becoming the main design paradigm for current and future processors. The main reason is that multicore designs provide an effective way of overcoming instruction-level parallelism (ILP) limitations by exploiting thread-level parallelism (TLP). In addition, it is a power and complexity-effective way of taking advantage of the huge number of transistors that can be integrated on a chip. On the other hand, today's higher than ever power densities have made temperature one of the main limitations of microprocessor evolution. Thermal management in multicore architectures is a fairly new area. Some works have addressed dynamic thermal management in bi/quad-core architectures. This work provides insight and explores different alternatives for thermal management in multicore architectures with 16 cores. Schemes employing both energy reduction and activity migration are explored and improvements for thread migration schemes are proposed.Peer ReviewedPostprint (published version

    Passive available bandwidth: Applying self -induced congestion analysis of application-generated traffic

    Get PDF
    Monitoring end-to-end available bandwidth is critical in helping applications and users efficiently use network resources. Because the performance of distributed systems is intrinsically linked to the performance of the network, applications that have knowledge of the available bandwidth can adapt to changing network conditions and optimize their performance. A well-designed available bandwidth tool should be easily deployable and non-intrusive. While several tools have been created to actively measure the end-to-end available bandwidth of a network path, they require instrumentation at both ends of the path, and the traffic injected by these tools may affect the performance of other applications on the path.;We propose a new passive monitoring system that accurately measures available bandwidth by applying self-induced congestion analysis to traces of application-generated traffic. The Watching Resources from the Edge of the Network (Wren) system transparently provides available bandwidth information to applications without having to modify the applications to make the measurements and with negligible impact on the performance of applications. Wren produces a series of real-time available bandwidth measurements that can be used by applications to adapt their runtime behavior to optimize performance or that can be sent to a central monitoring system for use by other or future applications.;Most active bandwidth tools rely on adjustments to the sending rate of packets to infer the available bandwidth. The major obstacle with using passive kernel-level traces of TCP traffic is that we have no control over the traffic pattern. We demonstrate that there is enough natural variability in the sending rates of TCP traffic that techniques used by active tools can be applied to traces of application-generated traffic to yield accurate available bandwidth measurements.;Wren uses kernel-level instrumentation to collect traces of application traffic and analyzes the traces in the user-level to achieve the necessary accuracy and avoid intrusiveness. We introduce new passive bandwidth algorithms based on the principles of the active tools to measure available bandwidth, investigate the effectiveness of these new algorithms, implement a real-time system capable of efficiently monitoring available bandwidth, and demonstrate that applications can use Wren measurements to adapt their runtime decisions

    From Map to Dist: the Evolution of a Large-Scale Wlan Monitoring System

    Get PDF
    The edge of the Internet is increasingly becoming wireless. Therefore, monitoring the wireless edge is important to understanding the security and performance aspects of the Internet experience. We have designed and implemented a large-scale WLAN monitoring system, the Distributed Internet Security Testbed (DIST), at Dartmouth College. It is equipped with distributed arrays of “sniffers” that cover 210 diverse campus locations and more than 5,000 users. In this paper, we describe our approach, designs and solutions for addressing the technical challenges that have resulted from efficiency, scalability, security, and management perspectives. We also present extensive evaluation results on a production network, and summarize the lessons learned

    Cross-core Microarchitectural Attacks and Countermeasures

    Get PDF
    In the last decade, multi-threaded systems and resource sharing have brought a number of technologies that facilitate our daily tasks in a way we never imagined. Among others, cloud computing has emerged to offer us powerful computational resources without having to physically acquire and install them, while smartphones have almost acquired the same importance desktop computers had a decade ago. This has only been possible thanks to the ever evolving performance optimization improvements made to modern microarchitectures that efficiently manage concurrent usage of hardware resources. One of the aforementioned optimizations is the usage of shared Last Level Caches (LLCs) to balance different CPU core loads and to maintain coherency between shared memory blocks utilized by different cores. The latter for instance has enabled concurrent execution of several processes in low RAM devices such as smartphones. Although efficient hardware resource sharing has become the de-facto model for several modern technologies, it also poses a major concern with respect to security. Some of the concurrently executed co-resident processes might in fact be malicious and try to take advantage of hardware proximity. New technologies usually claim to be secure by implementing sandboxing techniques and executing processes in isolated software environments, called Virtual Machines (VMs). However, the design of these isolated environments aims at preventing pure software- based attacks and usually does not consider hardware leakages. In fact, the malicious utilization of hardware resources as covert channels might have severe consequences to the privacy of the customers. Our work demonstrates that malicious customers of such technologies can utilize the LLC as the covert channel to obtain sensitive information from a co-resident victim. We show that the LLC is an attractive resource to be targeted by attackers, as it offers high resolution and, unlike previous microarchitectural attacks, does not require core-colocation. Particularly concerning are the cases in which cryptography is compromised, as it is the main component of every security solution. In this sense, the presented work does not only introduce three attack variants that can be applicable in different scenarios, but also demonstrates the ability to recover cryptographic keys (e.g. AES and RSA) and TLS session messages across VMs, bypassing sandboxing techniques. Finally, two countermeasures to prevent microarchitectural attacks in general and LLC attacks in particular from retrieving fine- grain information are presented. Unlike previously proposed countermeasures, ours do not add permanent overheads in the system but can be utilized as preemptive defenses. The first identifies leakages in cryptographic software that can potentially lead to key extraction, and thus, can be utilized by cryptographic code designers to ensure the sanity of their libraries before deployment. The second detects microarchitectural attacks embedded into innocent-looking binaries, preventing them from being posted in official application repositories that usually have the full trust of the customer
    • …
    corecore