
    Large-Eddy Simulations of Flow and Heat Transfer in Complex Three-Dimensional Multilouvered Fins

    The paper describes the computational procedure and results from large-eddy simulations in a complex three-dimensional louver geometry. The three-dimensionality in the louver geometry occurs along the height of the fin, where the angled louver transitions to the flat landing and joins with the tube surface. The transition region is characterized by a swept leading edge and a decreasing flow area between louvers. Preliminary results show a high-energy, compact vortex jet forming in this region. The jet forms in the vicinity of the louver junction with the flat landing and is drawn under the louver in the transition region. Its interaction with the surface of the louver produces vorticity of the opposite sign, which aids in augmenting heat transfer on the louver surface. The top surface of the louver in the transition region experiences large velocities in the vicinity of the surface and exhibits higher heat transfer coefficients than the bottom surface.

    H-SIMD machine: configurable parallel computing for data-intensive applications

    This dissertation presents a hierarchical single-instruction multiple-data (H-SIMD) configurable computing architecture to facilitate the efficient execution of data-intensive applications on field-programmable gate arrays (FPGAs). H-SIMD targets data-intensive applications for FPGA-based system designs. The H-SIMD machine is associated with a hierarchical instruction set architecture (HISA) developed for each application. The main objectives of this work are to facilitate ease of program development and high performance through easier scheduling of operations and the overlapping of communications with computations. The H-SIMD machine is composed of the host, FPGA, and nano-processor layers. They execute host SIMD instructions (HSIs), FPGA SIMD instructions (FSIs), and nano-processor instructions (NPIs), respectively. A distinction between communication and computation instructions is maintained across all the HISA layers. The H-SIMD machine also employs a memory switching scheme to bridge the large bandwidth gaps omnipresent in configurable systems. To showcase the proposed high-performance approach, the conditions to fully overlap communications with computations are investigated for important applications. The building blocks of the H-SIMD machine, such as high-performance and area-efficient register files, are presented in detail. The H-SIMD machine hierarchy is implemented on a host Dell workstation and the Annapolis Wildstar II FPGA board. Significant speedups have been achieved for matrix multiplication (MM), the 2-dimensional discrete cosine transform (2D DCT), and the 2-dimensional fast Fourier transform (2D FFT), which are used widely in science and engineering. In another FPGA-based programming paradigm, a high-level language (here ANSI C) can be used to program the FPGAs in a mode similar to that of the H-SIMD machine in terms of minimizing the effect of overheads. More specifically, a multi-threaded overlapping scheme is proposed to reduce as much as possible, or even completely hide, runtime FPGA reconfiguration overheads. Nevertheless, although the HLL-enabled reconfigurable machine allows software developers to customize FPGA functions easily, special architectural techniques are needed to achieve high performance without a significant penalty on area and clock frequency. Two important high-performance applications, matrix multiplication and image edge detection, are tested on the SRC-6 reconfigurable machine. The implemented algorithms are able to exploit the available data parallelism with independent functional units and application-specific cache support. Relevant performance and design tradeoffs are analyzed.
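
    For readers unfamiliar with the overlapping idea, the sketch below illustrates double-buffered overlap of host-to-device transfers with on-board computation, in the spirit of the memory switching scheme described above. The transfer_to_fpga and fpga_compute callables are hypothetical stand-ins, not the H-SIMD API; this is a minimal host-side sketch under those assumptions, not the dissertation's implementation.

        import threading

        def process_tiles(tiles, transfer_to_fpga, fpga_compute):
            """Double-buffered pipeline: while the device computes on one
            on-board buffer, the host streams the next tile into the other,
            hiding transfer latency behind computation (the 'memory
            switching' idea). Both callables are hypothetical stand-ins."""
            buffers = [0, 1]                        # two on-board buffer IDs
            transfer_to_fpga(tiles[0], buffers[0])  # prime the pipeline
            results = []
            for i in range(len(tiles)):
                cur = buffers[i % 2]
                nxt = buffers[(i + 1) % 2]
                # Start the next transfer concurrently with the current computation.
                t = None
                if i + 1 < len(tiles):
                    t = threading.Thread(target=transfer_to_fpga,
                                         args=(tiles[i + 1], nxt))
                    t.start()
                results.append(fpga_compute(cur))   # compute on the current buffer
                if t is not None:
                    t.join()                        # next tile is now resident
            return results

    Full overlap is achieved exactly when each transfer finishes no later than the computation it hides behind, which is the condition the dissertation investigates per application.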

    Performance Modeling and Prediction for the Scalable Solution of Partial Differential Equations on Unstructured Grids

    This dissertation studies the sources of poor performance in scientific computing codes based on partial differential equations (PDEs), which typically perform at a computational rate well below other scientific simulations (e.g., those with dense linear algebra or N-body kernels) on modern architectures with deep memory hierarchies. We identify the primary factors responsible for this relatively poor performance as: insufficient available memory bandwidth, a low ratio of work to data size (a consequence of good algorithmic efficiency), and the nonscaling cost of synchronization and gather/scatter operations (in fixed-problem-size scaling). This dissertation also illustrates how to reuse legacy scientific and engineering software within a library framework. Specifically, a three-dimensional unstructured grid incompressible Euler code from NASA has been parallelized with the Portable Extensible Toolkit for Scientific Computing (PETSc) library for distributed memory architectures. Using this newly instrumented code (called PETSc-FUN3D) as an example of a typical PDE solver, we demonstrate some strategies that are effective in tolerating the latencies arising from the hierarchical memory system and the network. Even on a single processor from each of the major contemporary architectural families, the PETSc-FUN3D code runs from 2.5 to 7.5 times faster than the legacy code on a medium-sized data set (with approximately 10^5 degrees of freedom). The major source of performance improvement is the increased locality in data reference patterns achieved through blocking, interlacing, and edge reordering. To explain these performance gains, we provide simple performance models based on memory bandwidth and instruction issue rates. Experimental evidence, in terms of translation lookaside buffer (TLB) and data cache miss rates, achieved memory bandwidth, and graduated floating-point instructions per memory reference, is provided through accurate measurements with hardware counters. The performance models and experimental results motivate algorithmic and software practices that lead to improvements in both parallel scalability and per-node performance. We identify the bottlenecks to scalability (algorithmic as well as implementation-related) for a fixed-size problem when the number of processors grows to several thousand (the expected level of concurrency on terascale architectures). We also evaluate the hybrid (mixed distributed/shared memory) programming model from a performance standpoint.
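
    The flavor of performance model referred to above can be made concrete with a small sketch: a lower bound on kernel time as the maximum of a memory-bandwidth limit, a floating-point limit, and an instruction-issue limit. The machine parameters and kernel counts below are illustrative assumptions, not figures from the dissertation.

        def predicted_time(flops, bytes_moved, instrs,
                           peak_flops, mem_bw, issue_rate):
            """Lower-bound execution time as the max of three resource limits:
            floating-point throughput, memory bandwidth, and instruction issue.
            The kernel cannot run faster than its slowest resource allows."""
            t_flops = flops / peak_flops    # compute-bound limit (s)
            t_mem   = bytes_moved / mem_bw  # bandwidth-bound limit (s)
            t_issue = instrs / issue_rate   # issue-bound limit (s)
            return max(t_flops, t_mem, t_issue)

        # Invented numbers for illustration: a sparse PDE kernel moving far
        # more bytes than it computes flops ends up bandwidth-bound.
        t = predicted_time(flops=1e9, bytes_moved=8e9, instrs=1e9,
                           peak_flops=10e9, mem_bw=20e9, issue_rate=4e9)
        print(f"predicted lower bound: {t:.2f} s")  # memory term (0.40 s) dominates

    A model of this shape explains why locality optimizations such as blocking and edge reordering pay off: they shrink bytes_moved, which is the binding term for typical PDE kernels.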

    ADAPTIVE POWER MANAGEMENT FOR COMPUTERS AND MOBILE DEVICES

    Power consumption has become a major concern in the design of computing systems today. High power consumption increases cooling cost, degrades system reliability, and reduces battery life in portable devices. Modern computing/communication devices support multiple power modes, which enable a power/performance tradeoff. Dynamic power management (DPM), dynamic voltage and frequency scaling (DVFS), and dynamic task migration for workload consolidation are system-level power reduction techniques widely used at runtime. In the first part of the dissertation, we concentrate on dynamic power management for personal computer and server platforms, where the DPM, DVFS, and task migration techniques have proved highly effective. A hierarchical energy management framework is assumed, where task migration is applied at the upper level to improve server utilization and energy efficiency, and DPM/DVFS is applied at the lower level to manage the power mode of each individual processor. This work focuses on estimating the performance impact of workload consolidation and searching for an optimal DPM/DVFS policy that adapts to the changing workload. Machine learning based modeling and reinforcement learning based policy optimization techniques are investigated. Mobile computing has been woven into everyday life to a great extent in recent years. Compared to the traditional personal computer and server environment, the mobile computing environment is considerably more context-rich, and the usage of a mobile computing device is clearly imprinted with the user's personal signature. The ability to learn such a signature enables immense potential in workload prediction and energy or battery life management. In the second part of the dissertation, we present two mobile device power management techniques which take advantage of the context-rich characteristics of the mobile platform and make adaptive energy management decisions based on different user behaviors. We first investigate user battery usage behavior modeling and apply the model directly to battery energy management. The first technique aims at maximizing quality of service (QoS) while keeping the risk of battery depletion below a given threshold. The second technique is a user-aware streaming strategy for energy-efficient smartphone video playback applications (e.g., YouTube) that minimizes the sleep and wake penalty of the cellular module while avoiding the energy waste of excessive downloading. Runtime power and thermal management has attracted substantial interest in multi-core distributed embedded systems. Fast performance evaluation is an essential step in research on distributed power and thermal management. In the last part of the dissertation, we present an FPGA-based emulator of a multi-core distributed embedded system designed to support research in runtime power/thermal management. Hardware and software support is provided to carry out basic power/thermal management actions, including inter-core or inter-FPGA communication, runtime temperature monitoring, and dynamic frequency scaling.
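
    To make the reinforcement-learning angle concrete, here is a toy sketch of tabular Q-learning over discrete frequency levels. The state encoding, frequency list, and reward weights are all invented for illustration; the dissertation's actual state features, workload models, and rewards are richer than this.

        import random
        from collections import defaultdict

        ALPHA, GAMMA, EPS = 0.1, 0.9, 0.1
        FREQS = [0.8, 1.6, 2.4]            # assumed frequency levels (GHz)
        Q = defaultdict(float)             # Q[(state, action)] table

        def choose_freq(state):
            """Epsilon-greedy selection of a frequency-level index."""
            if random.random() < EPS:
                return random.randrange(len(FREQS))
            return max(range(len(FREQS)), key=lambda a: Q[(state, a)])

        def update(state, action, reward, next_state):
            """One-step Q-learning update toward the observed reward."""
            best_next = max(Q[(next_state, a)] for a in range(len(FREQS)))
            Q[(state, action)] += ALPHA * (reward + GAMMA * best_next
                                           - Q[(state, action)])

        def reward(perf_penalty, power):
            """Trade performance loss against power; the 0.5 weight is arbitrary."""
            return -(perf_penalty + 0.5 * power)

        # One illustrative interaction: observe a workload state, act, learn.
        s = "high_load"
        a = choose_freq(s)
        update(s, a, reward(perf_penalty=0.2, power=FREQS[a] ** 2), s)

    Over many such interactions the table converges toward a DVFS policy that picks low frequencies when the performance penalty is small and high frequencies when it is not, which is the adaptive behavior the dissertation targets.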

    A review of data mining applications in semiconductor manufacturing

    The authors acknowledge Fundacao para a Ciencia e a Tecnologia (FCT-MCTES) for its financial support via the project UIDB/00667/2020 (UNIDEMI). For decades, industrial companies have been collecting and storing large amounts of data with the aim of better controlling and managing their processes. However, this vast amount of information, and the hidden knowledge implicit in it, could be utilized more efficiently. With the help of data mining techniques, unknown relationships can be systematically discovered. The production of semiconductors is a highly complex process, which entails several subprocesses that employ a diverse array of equipment. The small size of semiconductors means a high number of units can be produced, which requires huge amounts of data in order to control and improve the semiconductor manufacturing process. Therefore, in this paper a structured review is made of a sample of 137 papers published in the scientific community regarding data mining applications in semiconductor manufacturing. A detailed bibliometric analysis is also made. All data mining applications are classified by application area. The results are then analyzed and conclusions are drawn.

    A Review of Bayesian Methods in Electronic Design Automation

    The utilization of Bayesian methods has been widely acknowledged as a viable solution for tackling various challenges in electronic integrated circuit (IC) design under stochastic process variation, including circuit performance modeling, yield/failure rate estimation, and circuit optimization. As the post-Moore era brings about new technologies (such as silicon photonics and quantum circuits), many of the associated issues are similar to those encountered in electronic IC design and can be addressed using Bayesian methods. Motivated by this observation, we present a comprehensive review of Bayesian methods in electronic design automation (EDA). By doing so, we hope to equip researchers and designers with the ability to apply Bayesian methods to solving stochastic problems in electronic circuits and beyond.
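
    As one concrete instance of the problems listed above, failure-rate estimation under process variation has a compact Bayesian treatment: each Monte Carlo run is a Bernoulli pass/fail trial, and a Beta posterior over the failure probability is updated after each trial. The sketch below assumes a stand-in simulate_pass predicate; it illustrates the general idea, not a specific method from the review.

        import random

        def bayesian_fail_rate(simulate_pass, n_samples, a=1.0, b=1.0):
            """Beta-Bernoulli estimate of a circuit failure rate.
            Starts from a Beta(a, b) prior over the failure probability
            (a counts failures, b counts passes) and updates with each
            Monte Carlo outcome; returns the posterior mean."""
            for _ in range(n_samples):
                if simulate_pass():
                    b += 1.0      # one more observed pass
                else:
                    a += 1.0      # one more observed failure
            return a / (a + b)    # posterior mean failure probability

        # Stand-in for a real circuit simulation under process variation:
        # a sample 'fails' when a Gaussian parameter shift exceeds a margin.
        est = bayesian_fail_rate(lambda: abs(random.gauss(0, 1)) < 2.5, 10_000)
        print(f"estimated failure rate: {est:.4f}")

    Unlike a raw Monte Carlo ratio, the posterior also quantifies uncertainty in the estimate, which matters when failures are rare; that is one reason Bayesian formulations are attractive for yield analysis.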

    On the design of architecture-aware algorithms for emerging applications

    This dissertation maps various kernels and applications to a spectrum of programming models and architectures and also presents architecture-aware algorithms for different systems. The kernels and applications discussed in this dissertation have widely varying computational characteristics. For example, we consider both dense numerical computations and sparse graph algorithms. This dissertation also covers emerging applications from image processing, complex network analysis, and computational biology. We map these problems to diverse multicore processors and manycore accelerators. We also use new programming models (such as Transactional Memory, MapReduce, and Intel TBB) to address the performance and productivity challenges in these problems. Our experiences highlight the importance of mapping applications to appropriate programming models and architectures. We also identify several limitations of current system software and architectures, and suggest directions for improving them. The discussion focuses on system software and architectural support for nested irregular parallelism, Transactional Memory, and hybrid data transfer mechanisms. We believe that the complexity of parallel programming can be significantly reduced via collaborative efforts among researchers and practitioners from different domains. This dissertation contributes to these efforts by providing benchmarks and suggestions to improve system software and architectures.

    Bio-Inspired Computer Vision: Towards a Synergistic Approach of Artificial and Biological Vision

    Studies in biological vision have always been a great source of inspiration for the design of computer vision algorithms. In the past, several successful methods were designed with varying degrees of correspondence with biological vision studies, ranging from purely functional inspiration to methods that utilise models primarily developed for explaining biological observations. Even though it seems well recognised that computational models of biological vision can help in the design of computer vision algorithms, it is a non-trivial exercise for a computer vision researcher to mine relevant information from the biological vision literature, as very few studies in biology are organised at a task level. In this paper we aim to bridge this gap by providing a computer vision task-centric presentation of models primarily originating in biological vision studies. Not only do we revisit some of the main features of biological vision and discuss the foundations of existing computational studies modelling biological vision, but we also consider three classical computer vision tasks from a biological perspective: image sensing, segmentation, and optical flow. Using this task-centric approach, we discuss well-known biological functional principles and compare them with approaches taken by computer vision. Based on this comparative analysis of computer and biological vision, we present some recent models in biological vision and highlight a few models that we think are promising for future investigations in computer vision. To this end, this paper provides new insights and a starting point for investigators interested in the design of biology-based computer vision algorithms, and paves the way for much-needed interaction between the two communities, leading to the development of synergistic models of artificial and biological vision.
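
    As a concrete example of the kind of biological functional principle such a review compares against computer vision practice, the sketch below implements a difference-of-Gaussians center-surround filter, a standard rough model of retinal ganglion-cell receptive fields. It is a textbook bio-inspired operation written with NumPy/SciPy, not a model drawn from the paper itself.

        import numpy as np
        from scipy.signal import convolve2d

        def gaussian_kernel(size, sigma):
            """2-D Gaussian kernel, normalized to sum to 1."""
            ax = np.arange(size) - size // 2
            xx, yy = np.meshgrid(ax, ax)
            k = np.exp(-(xx**2 + yy**2) / (2 * sigma**2))
            return k / k.sum()

        def center_surround(image, sigma_c=1.0, sigma_s=3.0, size=15):
            """Difference-of-Gaussians: a narrow 'center' minus a wide
            'surround'. Responds to local contrast and suppresses uniform
            regions, roughly mimicking retinal center-surround processing."""
            dog = gaussian_kernel(size, sigma_c) - gaussian_kernel(size, sigma_s)
            return convolve2d(image, dog, mode="same", boundary="symm")

    The same filter appears in computer vision as a band-pass preprocessing step (e.g., in blob detection), which is exactly the kind of functional convergence between the two fields that the paper's task-centric comparison highlights.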