
    Distributed Finite Element Analysis Using a Transputer Network

    The principal objective of this research effort was to demonstrate the extraordinarily cost-effective acceleration of finite element structural analysis problems using a transputer-based parallel processing network. This objective was accomplished in the form of a commercially viable parallel processing workstation. The workstation is a desktop-size, low-maintenance computing unit capable of supercomputer performance yet costs two orders of magnitude less. To achieve the principal research objective, a transputer-based structural analysis workstation termed XPFEM was implemented with linear static structural analysis capabilities resembling commercially available NASTRAN. Finite element model files, generated using the on-line preprocessing module or external preprocessing packages, are downloaded to a network of 32 transputers for accelerated solution. The system currently executes at about one third Cray X-MP24 speed, but additional acceleration appears likely. For the NASA-selected demonstration problem of a Space Shuttle main engine turbine blade model with about 1500 nodes and 4500 independent degrees of freedom, the Cray X-MP24 required 23.9 seconds to obtain a solution while the transputer network, operated from an IBM PC-AT compatible host computer, required 71.7 seconds. Consequently, the $80,000 transputer network demonstrated a cost-performance ratio about 60 times better than the $15,000,000 Cray X-MP24 system.
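    As a quick sanity check on the quoted figure, the cost-performance ratio follows directly from the prices and timings stated above (a back-of-the-envelope calculation, using only those numbers):

```latex
% Price ratio divided by slowdown factor, using only the figures quoted in the abstract.
\[
\frac{\$15{,}000{,}000 \,/\, \$80{,}000}{71.7\,\mathrm{s} \,/\, 23.9\,\mathrm{s}}
\;=\; \frac{187.5}{3.0} \;\approx\; 62,
\]
```

    which is consistent with the reported factor of about 60.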

    Sparse component separation for accurate CMB map estimation

    The Cosmic Microwave Background (CMB) is of premier importance for cosmologists studying the birth of our universe. Unfortunately, most CMB experiments such as COBE, WMAP or Planck do not provide a direct measure of the cosmological signal: the CMB is mixed up with galactic foregrounds and point sources. For the sake of scientific exploitation, measuring the CMB requires extracting several different astrophysical components (CMB, Sunyaev-Zel'dovich clusters, galactic dust) from multi-wavelength observations. Mathematically speaking, the problem of disentangling the CMB map from the galactic foregrounds amounts to a component or source separation problem. In the field of CMB studies, a very large range of source separation methods have been applied, which all differ from each other in the way they model the data and the criteria they rely on to separate components. Two main difficulties are that i) the instrument's beam varies across frequencies and ii) the emission laws of most astrophysical components vary across pixels. This paper introduces a very accurate modeling of CMB data, based on sparsity, accounting for beam variability across frequencies as well as spatial variations of the components' spectral characteristics. Based on this new sparse modeling of the data, a sparsity-based component separation method coined Local-Generalized Morphological Component Analysis (L-GMCA) is described. Extensive numerical experiments have been carried out with simulated Planck data. These experiments show the high efficiency of the proposed component separation method at estimating a clean CMB map with very low foreground contamination, which makes L-GMCA of prime interest for CMB studies.
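    To make the blind source separation setting concrete, the sketch below runs a deliberately simplified GMCA-style loop on synthetic data: observations are modeled as X = A S with sparse sources S, and the mixing matrix and sources are estimated by alternating least squares with soft-thresholding. This illustrates only the general GMCA idea, not the L-GMCA method of the paper (which additionally handles channel-dependent beams and spatially varying spectra); all names and parameter values are ad hoc.

```python
import numpy as np

def gmca_toy(X, n_sources, n_iter=50, thresh=0.1):
    """Toy GMCA-style separation: model X (n_channels x n_samples) as X = A @ S
    with S sparse; alternate least-squares updates of S and A, soft-thresholding
    S at each step. Illustration only, not L-GMCA."""
    n_chan, _ = X.shape
    rng = np.random.default_rng(0)
    A = rng.standard_normal((n_chan, n_sources))
    for _ in range(n_iter):
        # Update sources by least squares, then enforce sparsity (soft threshold).
        S = np.linalg.lstsq(A, X, rcond=None)[0]
        S = np.sign(S) * np.maximum(np.abs(S) - thresh, 0.0)
        # Update mixing matrix by least squares and renormalize its columns.
        A = np.linalg.lstsq(S.T, X.T, rcond=None)[0].T
        A /= np.linalg.norm(A, axis=0, keepdims=True) + 1e-12
    return A, S

# Synthetic test: two sparse sources mixed into four observation channels.
rng = np.random.default_rng(1)
S_true = rng.standard_normal((2, 1000)) * (rng.random((2, 1000)) < 0.05)
A_true = rng.standard_normal((4, 2))
X = A_true @ S_true
A_est, S_est = gmca_toy(X, n_sources=2)
```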

    A framework for developing finite element codes for multi-disciplinary applications

    The world of computing simulation has experienced great progress in recent years and faces increasingly demanding multi-disciplinary challenges to satisfy new upcoming demands. The growing importance of solving multi-disciplinary problems makes developers pay more attention to them and deal with the difficulties involved in developing software in this area. Conventional finite element codes have several difficulties in dealing with multi-disciplinary problems. Many of these codes are designed and implemented for solving a certain type of problem, generally involving a single field. Extending such a code to deal with another field of analysis usually raises several problems and requires large amounts of modification and implementation. Some typical difficulties are: a predefined set of degrees of freedom per node, a data structure with a fixed set of defined variables, a global list of variables for all entities, domain-based interfaces, IO restrictions in reading new data and writing new results, and algorithm definitions inside the code. A common approach is to connect different solvers via a master program which implements the interaction algorithms and also transfers data from one solver to another. This approach has been used successfully in practice, but it results in duplicated implementation and a redundant overhead of data storage and transfer, which may be significant depending on the solvers' data structures. The objective of this work is to design and implement a framework for building multi-disciplinary finite element programs. Generality, reusability, extendibility, good performance and memory efficiency are considered to be the main points in the design and implementation of this framework. Preparing the structure for team development is another objective, because usually a team of experts in different fields is involved in the development of a multi-disciplinary code. Kratos, the framework created in this work, provides several tools for easy implementation of finite element applications and also provides a common platform for natural interaction of its applications in different ways. This is done not only by a number of innovations but also by collecting and reusing several existing works. In this work an innovative variable-based interface is designed and implemented, which is used at different levels of abstraction and has proved to be very clear and extendible. Another innovation is a very efficient and flexible data structure which can be used to store any type of data in a type-safe manner. An extendible IO is also created to overcome another bottleneck in dealing with multi-disciplinary problems. Collecting different concepts from existing works and adapting them to coupled problems is considered to be another innovation in this work; examples are the use of an interpreter, different data organizations, and a variable number of dofs per node. The kernel-and-application approach is used to reduce possible conflicts between developers of different fields, and layers are designed to reflect the working space of different developers, also considering their programming knowledge. Finally, several technical details are applied in order to increase the performance and efficiency of Kratos, which makes it practically usable. This work is completed by demonstrating the framework's functionality in practice. First, some classical single-field applications such as thermal, fluid and structural applications are implemented and used as benchmarks to prove its performance. These applications are then used to solve coupled problems in order to demonstrate the natural interaction facility provided by the framework. Finally, some less classical coupled finite element algorithms are implemented to show its high flexibility and extendibility.
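    The sketch below illustrates, in Python pseudocode rather than Kratos's actual C++ API, what a variable-based interface with a variable number of dofs per node can look like: nodal data is keyed by typed Variable objects, so each application adds its own degrees of freedom without modifying the node definition. All class and variable names here are illustrative assumptions, not Kratos code.

```python
class Variable:
    """A typed key identifying a nodal quantity (e.g. TEMPERATURE, VELOCITY_X)."""
    def __init__(self, name, value_type):
        self.name = name
        self.value_type = value_type

TEMPERATURE = Variable("TEMPERATURE", float)
VELOCITY_X = Variable("VELOCITY_X", float)

class Node:
    """Stores solution data keyed by Variable, so each application can add its
    own degrees of freedom without changing the node's definition."""
    def __init__(self, node_id, x, y, z=0.0):
        self.id = node_id
        self.coords = (x, y, z)
        self._data = {}
        self._dofs = []

    def add_dof(self, variable):
        self._dofs.append(variable)            # the dof set can vary node by node
        self._data.setdefault(variable, variable.value_type())

    def set(self, variable, value):
        if not isinstance(value, variable.value_type):
            raise TypeError(f"{variable.name} expects {variable.value_type}")
        self._data[variable] = value           # type-safe by construction

    def get(self, variable):
        return self._data[variable]

# A thermal application and a fluid application can share the same node:
n = Node(1, 0.0, 0.0)
n.add_dof(TEMPERATURE)
n.add_dof(VELOCITY_X)
n.set(TEMPERATURE, 293.15)
print(n.get(TEMPERATURE))
```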

    Fast, Sparse Matrix Factorization and Matrix Algebra via Random Sampling for Integral Equation Formulations in Electromagnetics

    Many systems designed by electrical & computer engineers rely on electromagnetic (EM) signals to transmit, receive, and extract either information or energy. In many cases, these systems are large and complex. Their accurate, cost-effective design requires high-fidelity computer modeling of the underlying EM field/material interaction problem in order to find a design with acceptable system performance. This modeling is accomplished by projecting the governing Maxwell equations onto finite dimensional subspaces, which results in a large matrix equation representation (Zx = b) of the EM problem. In the case of integral equation-based formulations of EM problems, the M-by-N system matrix, Z, is generally dense. For this reason, when treating large problems, it is necessary to use compression methods to store and manipulate Z. One such sparse representation is provided by so-called H^2 matrices. At low-to-moderate frequencies, H^2 matrices provide a controllably accurate data-sparse representation of Z. The scale at which problems in EM are considered "large" is continuously being redefined to be larger. This growth of problem scale is not only happening in EM, but across all other sub-fields of computational science as well. The pursuit of increasingly large problems is unwavering in all these sub-fields, and this drive has long outpaced the rate of advancements in processing and storage capabilities in computing. Computational science communities now face the computational limitations of the standard linear algebraic methods that have been relied upon for decades to run quickly and efficiently on modern computing hardware. This common set of algorithms can only produce reliable results quickly and efficiently for small to mid-sized matrices that fit into the memory of the host computer. The drive to pursue larger problems has therefore begun to outpace the reasonable capabilities of these common numerical algorithms; the deterministic numerical linear algebra algorithms that have gotten matrix computation this far have proven to be inadequate for many problems of current interest. This has computational science communities focusing on improvements in their mathematical and software approaches in order to push further advancement. Randomized numerical linear algebra (RandNLA) is an emerging area that both academia and industry believe to be a strong candidate to assist in overcoming the limitations faced when solving massive and computationally expensive problems. This thesis presents results of recent work that uses a random sampling method (RSM) to implement algebraic operations involving multiple H^2 matrices. Significantly, this work is done in a manner that is non-invasive to an existing H^2 code base for filling and factoring H^2 matrices. The work presented thus expands the existing code's capabilities with minimal impact on existing (and well-tested) applications. In addition to this work with randomized H^2 algebra, improvements in sparse factorization methods for the compressed H^2 data structure are also presented. The reported developments in filling and factoring H^2 data structures assist in, and allow for, the further pursuit of large and complex problems in computational EM (CEM) within simulation code bases that utilize the H^2 data structure.
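    The thesis's H^2-specific random sampling method is not reproduced here, but the flavor of RandNLA can be conveyed with the standard randomized range finder: multiply the matrix by a Gaussian test matrix, orthonormalize the sampled columns, and project. The sketch below is a generic illustration of that idea on dense NumPy arrays, not the compressed H^2 algebra described above.

```python
import numpy as np

def randomized_low_rank(A, rank, oversample=10, seed=None):
    """Generic randomized range finder: sample the range of A with a Gaussian
    test matrix, orthonormalize, and project, giving A ~= Q @ B with a small
    number of columns in Q. Illustrates the RandNLA idea only."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    Omega = rng.standard_normal((n, rank + oversample))
    Y = A @ Omega                      # random samples of the column space of A
    Q, _ = np.linalg.qr(Y)             # orthonormal basis for those samples
    B = Q.T @ A                        # small projected matrix
    return Q, B

# Quick check on a synthetic numerically low-rank matrix.
rng = np.random.default_rng(0)
A = rng.standard_normal((500, 20)) @ rng.standard_normal((20, 500))
Q, B = randomized_low_rank(A, rank=20)
print(np.linalg.norm(A - Q @ B) / np.linalg.norm(A))   # tiny relative error
```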

    On the Practice and Application of Context-Free Language Reachability

    The Context-Free Language Reachability (CFL-R) formalism relates to some of the most important computational problems facing researchers and industry practitioners. CFL-R is a generalisation of graph reachability and language recognition, such that pairs in a labelled graph are reachable if and only if there is a path between them whose labels, joined together in the order they were encountered, spell a word in a given context-free language. The formalism finds particular use as a vehicle for phrasing and reasoning about program analysis, since complex relationships within the data, logic or structure of computer programs are easily expressed and discovered in CFL-R. Unfortunately, the potential of CFL-R cannot be met by state-of-the-art solvers. Current algorithms have scalability and expressibility issues that prevent them from being used on large graph instances or complex grammars. This work outlines our efforts in understanding the practical concerns surrounding CFL-R, and applying this knowledge to improve the performance of CFL-R applications. We examine the major difficulties with solving CFL-R-based analyses at scale, via a case study of points-to analysis as a CFL-R problem. Points-to analysis is fundamentally important to many modern research and industry efforts, and is relevant to optimisation, bug-checking and security technologies. Our understanding of the scalability challenge motivates work in developing practical CFL-R techniques. We present improved evaluation algorithms and declarative optimisation techniques for CFL-R, capitalising on the simplicity of CFL-R to create fully automatic methodologies. The culmination of our work is a general-purpose and high-performance tool called Cauliflower, a solver-generator for CFL-R problems. We describe Cauliflower and evaluate its performance experimentally, showing significant improvement over alternative general techniques.
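    As a concrete illustration of the formalism (not of Cauliflower itself), the sketch below is a naive worklist solver for CFL-R over a labelled graph, with the grammar given in near-Chomsky normal form and applied to a balanced-parentheses language; the names and grammar encoding are illustrative assumptions.

```python
from collections import deque

def cfl_reachability(edges, unary, binary):
    """Naive worklist CFL-reachability solver.
    edges:  set of (u, label, v) base edges of the labelled graph
    unary:  rules {A: {B, ...}}       meaning A -> B
    binary: rules {A: {(B, C), ...}}  meaning A -> B C
    Returns the set of derived (u, symbol, v) facts."""
    facts = set(edges)
    work = deque(facts)
    while work:
        (u, X, v) = work.popleft()
        new = set()
        for A, rhs in unary.items():                      # A -> X
            if X in rhs and (u, A, v) not in facts:
                new.add((u, A, v))
        for A, pairs in binary.items():
            for (B, C) in pairs:
                if B == X:                                # A -> X C
                    for (p, Y, w) in list(facts):
                        if p == v and Y == C and (u, A, w) not in facts:
                            new.add((u, A, w))
                if C == X:                                # A -> B X
                    for (t, Y, q) in list(facts):
                        if q == u and Y == B and (t, A, v) not in facts:
                            new.add((t, A, v))
        for f in new:
            facts.add(f)
            work.append(f)
    return facts

# Balanced parentheses S -> "(" ")" | "(" S ")", encoded over labels O="(" C=")"
# as S -> O C, S -> O SC, SC -> S C, on the path 0 -(- 1 -(- 2 -)- 3 -)- 4.
edges = {(0, "O", 1), (1, "O", 2), (2, "C", 3), (3, "C", 4)}
facts = cfl_reachability(
    edges,
    unary={},
    binary={"S": {("O", "C"), ("O", "SC")}, "SC": {("S", "C")}},
)
print((1, "S", 3) in facts, (0, "S", 4) in facts)   # True True
```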

    Hierarchical Variance Reduction Techniques for Monte Carlo Rendering

    Ever since the first three-dimensional computer graphics appeared half a century ago, the goal has been to model and simulate how light interacts with materials and objects to form an image. The ultimate goal is photorealistic rendering, where the created images reach a level of accuracy that makes them indistinguishable from photographs of the real world. There are many applications: visualization of products and architectural designs yet to be built, special effects, computer-generated films, virtual reality, and video games, to name a few. However, the problem has proven tremendously complex; the illumination at any point is described by a recursive integral to which a closed-form solution seldom exists. Instead, computer simulation and Monte Carlo methods are commonly used to statistically estimate the result. This introduces undesirable noise, or variance, and a large body of research has been devoted to finding ways to reduce the variance. I continue along this line of research, and present several novel techniques for variance reduction in Monte Carlo rendering, as well as a few related tools. The research in this dissertation focuses on using importance sampling to pick a small set of well-distributed point samples. As the primary contribution, I have developed the first methods to explicitly draw samples from the product of distant high-frequency lighting and complex reflectance functions. By sampling the product, low noise results can be achieved using a very small number of samples, which is important to minimize the rendering times. Several different hierarchical representations are explored to allow efficient product sampling. In the first publication, the key idea is to work in a compressed wavelet basis, which allows fast evaluation of the product. Many of the initial restrictions of this technique were removed in follow-up work, allowing higher-resolution uncompressed lighting and avoiding precomputation of reflectance functions. My second main contribution is to present one of the first techniques to take the triple product of lighting, visibility and reflectance into account to further reduce the variance in Monte Carlo rendering. For this purpose, control variates are combined with importance sampling to solve the problem in a novel way. A large part of the technique also focuses on analysis and approximation of the visibility function. To further refine the above techniques, several useful tools are introduced. These include a fast, low-distortion map to represent (hemi)spherical functions, a method to create high-quality quasi-random points, and an optimizing compiler for analyzing shaders using interval arithmetic. The latter automatically extracts bounds for importance sampling of arbitrary shaders, as opposed to using a priori known reflectance functions. In summary, the work presented here takes the field of computer graphics one step further towards making photorealistic rendering practical for a wide range of uses. By introducing several novel Monte Carlo methods, more sophisticated lighting and materials can be used without increasing the computation times. The research is aimed at domain-specific solutions to the rendering problem, but I believe that much of the new theory is applicable in other parts of computer graphics, as well as in other fields.
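    A one-dimensional toy example conveys why sampling proportionally to part of the integrand reduces Monte Carlo variance. It is not the dissertation's wavelet product sampling or triple-product control-variate scheme, just the underlying principle, with made-up functions standing in for lighting and reflectance.

```python
import numpy as np

# Toy 1D analogue of product importance sampling: estimate I = integral of
# L(x) * f(x) over [0, 1], where L plays the role of high-frequency lighting
# and f the role of a smooth reflectance lobe (f integrates to 1).
rng = np.random.default_rng(0)
L = lambda x: 1.0 + 0.5 * np.sin(20.0 * x)      # "lighting"
f = lambda x: 3.0 * x**2                         # "reflectance", a valid pdf on [0, 1]
N = 1024

# (a) Uniform sampling on [0, 1].
x = rng.random(N)
est_uniform = np.mean(L(x) * f(x))

# (b) Importance sampling proportional to f: pdf p(x) = 3x^2, drawn by inverse
# transform x = u^(1/3); the weighted estimator L*f/p simplifies to L alone.
u = rng.random(N)
x = u ** (1.0 / 3.0)
est_importance = np.mean(L(x))

print(est_uniform, est_importance)   # both near the true value ~0.976, (b) with lower variance
```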

    Three-Dimensional Segmentation and Visualization of Magnetic Resonance Imaging Data

    In this thesis, I shall study and compare various methods for manipulating two- and three-dimensional image data produced with a nuclear magnetic resonance scanner. In particular, I will examine ways of focusing upon specific structures internal to the object under study (segmentation), and will explore means of rendering realistic images of these structures on a computer screen using depth-cueing, shading, and ray-casting techniques. The 3DHEAD volumetric dataset used for this project was created with the Siemens Magnetom and was provided courtesy of Siemens Medical Systems, Inc., Iselin, NJ. This dataset consists of 109 slices of a human head, with each slice stored consecutively as a 256 x 256 array. Each pixel is represented by two consecutive bytes, which make one binary integer. (A similar dataset of a human knee is also available.) The 3DHEAD dataset requires about 14 Mb of disk space uncompressed. The programs which manipulate this data are MS-DOS-based and were written and compiled using Microsoft QuickC version 2.51. The 2-D programs were executed on a CompuAdd 486DX2-50 with 8 Mb of RAM, running MS-DOS version 6.22; the 3-D programs were executed on a 133 MHz Pentium clone with 48 Mb of RAM, running the DOS shell of Microsoft Windows 95. Our immediate objectives are to produce pleasing and informative 2-D and 3-D pictures of the internal structure of some component of the human head: for example, the brain. We need to remove from the original dataset all of the data which do not represent the brain. Then, for the 3-D images, we need to render the remaining data in such a way that it possesses depth and realism.
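    The segmentation-then-rendering pipeline described above can be sketched on a synthetic volume (the 3DHEAD slices are not reproduced here): threshold the voxel intensities to isolate a structure, then cast rays along one axis and shade the first segmented voxel by its depth. The sizes and thresholds below are illustrative assumptions only.

```python
import numpy as np

# Synthetic stand-in for a volumetric scan: a bright blob plus scanner-like noise.
nz, ny, nx = 64, 64, 64
z, y, x = np.mgrid[0:nz, 0:ny, 0:nx]
volume = np.exp(-(((x - 32)**2 + (y - 32)**2 + (z - 32)**2) / (2 * 12.0**2)))
volume += 0.05 * np.random.default_rng(0).random(volume.shape)

# Segmentation: keep voxels whose intensity exceeds a threshold.
mask = volume > 0.5

# Simple "ray casting" along the z axis: for each (y, x) ray, find the first
# segmented voxel and shade it by depth, so nearer surfaces appear brighter.
hit = mask.argmax(axis=0)                       # index of first True along z (0 if none)
has_hit = mask.any(axis=0)
depth_image = np.where(has_hit, 1.0 - hit / nz, 0.0)
print(depth_image.shape, float(depth_image.max()))   # a (64, 64) depth-cued image in [0, 1]
```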

    Scalable Graph Algorithms in a High-Level Language Using Primitives Inspired by Linear Algebra

    This dissertation advances the state of the art for scalable high-performance graph analytics and data mining using the language of linear algebra. Many graph computations suffer poor scalability due to their irregular nature and low operational intensity. A small but powerful set of linear algebra primitives that specifically target graph and data mining applications can expose sufficient coarse-grained parallelism to scale to thousands of processors. In this dissertation we advance existing distributed memory approaches in two important ways. First, we observe that data scientists and domain experts know their analysis and mining problems well, but have little HPC experience. We describe a system that presents the user with a clean API in a high-level language that scales from a laptop to a supercomputer with thousands of cores. We utilize a Domain-Specific Embedded Language with Selective Just-In-Time Specialization to ensure a negligible performance impact over the original distributed memory low-level code. The high-level language enables ease of use, rapid prototyping, and additional features such as on-the-fly filtering, runtime-defined objects, and exposure to a large set of third-party visualization packages. The second important advance is a new sparse matrix data structure and set of algorithms. We note that shared memory machines are dominant both in stand-alone form and as nodes in distributed memory clusters. This thesis offers the design of a new sparse-matrix data structure and set of parallel algorithms, a reusable implementation in shared memory, and a performance evaluation that shows significant speed and memory usage improvements over competing packages. Our method also offers features such as in-memory compression, a low-cost transpose, and chained primitives that do not materialize the entire intermediate result at any one time. We focus on a scalable, generalized, sparse matrix-matrix multiplication algorithm. This primitive is used extensively in many graph algorithms such as betweenness centrality, graph clustering, graph contraction, and subgraph extraction.
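    As a small illustration of expressing a graph computation in the language of linear algebra (using SciPy's stock CSR matrices, not the new data structure or parallel SpGEMM developed in the dissertation), the triangles of an undirected graph can be counted from the trace of A^3, which takes two sparse matrix-matrix products:

```python
import numpy as np
from scipy.sparse import csr_matrix

# Small undirected example graph with one triangle (0, 1, 2) and a pendant edge.
edges = [(0, 1), (1, 2), (0, 2), (2, 3)]
n = 4
rows = [u for u, v in edges] + [v for u, v in edges]   # symmetrize
cols = [v for u, v in edges] + [u for u, v in edges]
A = csr_matrix((np.ones(len(rows)), (rows, cols)), shape=(n, n))

# Number of triangles = trace(A^3) / 6 for a simple undirected graph.
A3 = A @ A @ A                                   # two sparse matrix-matrix products
triangles = A3.diagonal().sum() / 6
print(int(triangles))                            # prints 1
```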