1,099 research outputs found
The Mercury System: Exploiting Truly Fast Hardware in Data Mining
In many data mining applications, the size of the database is not only extremely large, it is also growing rapidly. Even for relatively simple searches, the time required to move the data off magnetic media, cross the system bus into main memory, copy into processor cache, and then execute code to perform a search is prohibitive. We are building a system in which a significant portion of the data mining task (i.e., the portion that examines the bulk of the raw data) is implemented in fast hardware, close to the magnetic media on which it is stored. Furthermore, this hardware can be replicated allowing mining tasks to be performed in parallel, thus providing further speedup for the overall mining application. In this paper, we describe a general framework under which this can be accomplished and provide initial performance results for a set of applications
Technology Directions for the 21st Century, volume 1
For several decades, semiconductor device density and performance have been doubling about every 18 months (Moore's Law). With present photolithography techniques, this rate can continue for only about another 10 years. Continued improvement will need to rely on newer technologies. Transition from the current micron range for transistor size to the nanometer range will permit Moore's Law to operate well beyond 10 years. The technologies that will enable this extension include: single-electron transistors; quantum well devices; spin transistors; and nanotechnology and molecular engineering. Continuation of Moore's Law will rely on huge capital investments for manufacture as well as on new technologies. Much will depend on the fortunes of Intel, the premier chip manufacturer, which, in turn, depend on the development of mass-market applications and volume sales for chips of higher and higher density. The technology drivers are seen by different forecasters to include video/multimedia applications, digital signal processing, and business automation. Moore's Law will affect NASA in the areas of communications and space technology by reducing size and power requirements for data processing and data fusion functions to be performed onboard spacecraft. In addition, NASA will have the opportunity to be a pioneering contributor to nanotechnology research without incurring huge expenses
Recommended from our members
High-Performance Integrated Window and Façade Solutions for California
The researchers developed a new generation of high-performance façade systems and supporting design and management tools to support industry in meeting California’s greenhouse gas reduction targets, reduce energy consumption, and enable an adaptable response to minimize real-time demands on the electricity grid. The project resulted in five outcomes: (1) The research team developed an R-5, 1-inch thick, triplepane, insulating glass unit with a novel low-conductance aluminum frame. This technology can help significantly reduce residential cooling and heating loads, particularly during the evening. (2) The team developed a prototype of a windowintegrated local ventilation and energy recovery device that provides clean, dry fresh air through the façade with minimal energy requirements. (3) A daylight-redirecting louver system was prototyped to redirect sunlight 15–40 feet from the window. Simulations estimated that lighting energy use could be reduced by 35–54 percent without glare. (4) A control system incorporating physics-based equations and a mathematical solver was prototyped and field tested to demonstrate feasibility. Simulations estimated that total electricity costs could be reduced by 9-28 percent on sunny summer days through adaptive control of operable shading and daylighting components and the thermostat compared to state-of-the-art automatic façade controls in commercial building perimeter zones. (5) Supporting models and tools needed by industry for technology R&D and market transformation activities were validated. Attaining California’s clean energy goals require making a fundamental shift from today’s ad-hoc assemblages of static components to turnkey, intelligent, responsive, integrated building façade systems. These systems offered significant reductions in energy use, peak demand, and operating cost in California
Recommended from our members
Scalable Emulation of Heterogeneous Systems
The breakdown of Dennard's transistor scaling has driven computing systems toward application-specific accelerators, which can provide orders-of-magnitude improvements in performance and energy efficiency over general-purpose processors.
To enable the radical departures from conventional approaches that heterogeneous systems entail, research infrastructure must be able to model processors, memory and accelerators, as well as system-level changes---such as operating system or instruction set architecture (ISA) innovations---that might be needed to realize the accelerators' potential. Unfortunately, existing simulation tools that can support such system-level research are limited by the lack of fast, scalable machine emulators to drive execution.
To fill this need, in this dissertation we first present a novel machine emulator design based on dynamic binary translation that makes the following improvements over the state of the art: it scales on multicore hosts while remaining memory efficient, correctly handles cross-ISA differences in atomic instruction semantics, leverages the host floating point (FP) unit to speed up FP emulation without sacrificing correctness, and can be efficiently instrumented to---among other possible uses---drive the execution of a full-system, cross-ISA simulator with support for accelerators.
We then demonstrate the utility of machine emulation for studying heterogeneous systems by leveraging it to make two additional contributions. First, we quantify the trade-offs in different coupling models for on-chip accelerators. Second, we present a technique to reuse the private memories of on-chip accelerators when they are otherwise inactive to expand the system's last-level cache, thereby reducing the opportunity cost of the accelerators' integration
Recommended from our members
Multi-core processors and the future of parallelism in software
The purpose of this thesis is to examine multi-core technology. Multi-core architecture provides benefits such as less power consumption, scalability, and improved application performance enabled by thread-level parallelism
The testing, analysis, and correction of the update operation of a parallel, multi-backend supercomputer.
The Multi-Backend Database Supercomputer (MBDS) is designed to provide high-performance
database management parallely for applications with very large and growing databases. This thesis
is a testing, analysis of and correction of the primary database operation UPDATE of MBDS. We
provide an overview of the entire MBDS system and then focus on the parallel UPDATE operation
in an attempt to discover and correct the deficiencies of the original UPDATE algorithm.http://archive.org/details/testinganalysisc00willLieutenant, United States NavyApproved for public release; distribution is unlimited
Memory Systems and Interconnects for Scale-Out Servers
The information revolution of the last decade has been fueled by the digitization of almost all human activities through a wide range of Internet services. The backbone of this information age are scale-out datacenters that need to collect, store, and process massive amounts of data. These datacenters distribute vast datasets across a large number of servers, typically into memory-resident shards so as to maintain strict quality-of-service guarantees. While data is driving the skyrocketing demands for scale-out servers, processor and memory manufacturers have reached fundamental efficiency limits, no longer able to increase server energy efficiency at a sufficient pace. As a result, energy has emerged as the main obstacle to the scalability of information technology (IT) with huge economic implications. Delivering sustainable IT calls for a paradigm shift in computer system design. As memory has taken a central role in IT infrastructure, memory-centric architectures are required to fully utilize the IT's costly memory investment. In response, processor architects are resorting to manycore architectures to leverage the abundant request-level parallelism found in data-centric applications. Manycore processors fully utilize available memory resources, thereby increasing IT efficiency by almost an order of magnitude. Because manycore server chips execute a large number of concurrent requests, they exhibit high incidence of accesses to the last-level-cache for fetching instructions (due to large instruction footprints), and off-chip memory (due to lack of temporal reuse in on-chip caches) for accessing dataset objects. As a result, on-chip interconnects and the memory system are emerging as major performance and energy-efficiency bottlenecks in servers. This thesis seeks to architect on-chip interconnects and memory systems that are tuned for the requirements of memory-centric scale-out servers. By studying a wide range of data-centric applications, we uncover application phenomena common in data-centric applications, and examine their implications on on-chip network and off-chip memory traffic. Finally, we propose specialized on-chip interconnects and memory systems that leverage common traffic characteristics, thereby improving server throughput and energy efficiency
Doctor of Philosophy
dissertationWith the explosion of chip transistor counts, the semiconductor industry has struggled with ways to continue scaling computing performance in line with historical trends. In recent years, the de facto solution to utilize excess transistors has been to increase the size of the on-chip data cache, allowing fast access to an increased portion of main memory. These large caches allowed the continued scaling of single thread performance, which had not yet reached the limit of instruction level parallelism (ILP). As we approach the potential limits of parallelism within a single threaded application, new approaches such as chip multiprocessors (CMP) have become popular for scaling performance utilizing thread level parallelism (TLP). This dissertation identifies the operating system as a ubiquitous area where single threaded performance and multithreaded performance have often been ignored by computer architects. We propose that novel hardware and OS co-design has the potential to significantly improve current chip multiprocessor designs, enabling increased performance and improved power efficiency. We show that the operating system contributes a nontrivial overhead to even the most computationally intense workloads and that this OS contribution grows to a significant fraction of total instructions when executing several common applications found in the datacenter. We demonstrate that architectural improvements have had little to no effect on the performance of the OS over the last 15 years, leaving ample room for improvements. We specifically consider three potential solutions to improve OS execution on modern processors. First, we consider the potential of a separate operating system processor (OSP) operating concurrently with general purpose processors (GPP) in a chip multiprocessor organization, with several specialized structures acting as efficient conduits between these processors. Second, we consider the potential of segregating existing caching structures to decrease cache interference between the OS and application. Third, we propose that there are components within the OS itself that should be refactored to be both multithreaded and cache topology aware, which in turn, improves the performance and scalability of many-threaded applications
- …