38 research outputs found

    Parallel volume rendering using binary-swap compositing

    Get PDF
    Journal ArticleExisting volume rendering methods, though capable of very effective visualizations, are computationally intensive and therefore fail to achieve interactive rendering rates for large data sets. Although computing technology continues to advance, computer processing power never seems to catch up to the increases in data size

    Adaptive and hybrid schemes for efficient parallel squaring and cubing units

    Get PDF
    Squaring (X2) and cubing (X3) units are special operations of multiplication used in many applications, such as image compression, equalization, decoding and demodulation, 3D graphics, scientific computing, artificial neural networks, logarithmic number system, and multimedia application. They can also be an efficient way to compute other basic functions. Therefore, improving their performances is a goal for many researchers. This dissertation will discuss modification to algorithms to compute parallel squaring and cubing units in both signed and unsigned representation. After that, truncated technique is applied to improve their performance. Each unit is modeled and estimated to obtain its area, delay by using linear evaluation model. A C program was written to generate Hardware Description Language files for each unit. These units are simulated and verified in simulation. Moreover, area, delay, and power consumption are calculated for each unit and compared with those ones in previous approaches for both Virtex 5 Xilinx FPGA and IBM 65nm ASIC technologies

    Cubing Units Using Carry-Save Array Implementations

    Get PDF
    This study is aimed at designing a specialized functional unit to perform the operation of Cubing. The design has been implemented based on the concept of carry-save array multipliers. The carry save concept aims at accelerating the process of addition by delaying the carry propagation operation till the last step. The motivation behind using array structures is simple. Array structures are regular and easy to design. This paper first looks at a simple cubing unit that can accept only unsigned inputs. This design is then modified based on mathematical derivations and architecture for signed cubing unit or two's complement cubing unit is derived. The designs have been tested, synthesized and compared with the traditional multiplication techniques for area and delay. When the different designs for various bit sizes of operands were tested and synthesized, some really interesting results were arrived at. The algorithmic analysis showed that the cubing unit implementation would require more delay and area as compared to two passes for the same operation through corresponding traditional CSAMs. After synthesis, while the results agreed with the area requirement, for the 6-bit version, it was seen that the cubing design is actually faster as compared to the traditional implementation. The interesting thing about the cubing unit design is that its area requirement and delay increases almost exponentially with the length of the input operands. Careful floorplanning and layout techniques must be employed in order to arrive at a good design.School of Electrical & Computer Engineerin

    Scalable Data Analysis on MapReduce-based Systems

    Get PDF
    Ph.DDOCTOR OF PHILOSOPH

    Optimized linear, quadratic and cubic interpolators for elementary function hardware implementations

    Get PDF
    This paper presents a method for designing linear, quadratic and cubic interpolators that compute elementary functions using truncated multipliers, squarers and cubers. Initial coefficient values are obtained using a Chebyshev series approximation. A direct search algorithm is then used to optimize the quantized coefficient values to meet a user-specified error constraint. The algorithm minimizes coefficient lengths to reduce lookup table requirements, maximizes the number of truncated columns to reduce the area, delay and power of the arithmetic units, and minimizes the maximum absolute error of the interpolator output. The method can be used to design interpolators to approximate any function to a user-specified accuracy, up to and beyond 53-bits of precision (e.g., IEEE double precision significand). Linear, quadratic and cubic interpolator designs that approximate reciprocal, square root, reciprocal square root and sine are presented and analyzed. Area, delay and power estimates are given for 16, 24 and 32-bit interpolators that compute the reciprocal function, targeting a 65 nm CMOS technology from IBM. Results indicate the proposed method uses smaller arithmetic units and has reduced lookup table sizes compared to previously proposed methods. The method can be used to optimize coefficients in other systems while accounting for coefficient quantization as well as truncation and rounding effects of multiple arithmetic units.Peer reviewedElectrical and Computer Engineerin

    Multi-Level Trace Abstraction, Linking and Display

    Get PDF
    RÉSUMÉ Certains types de problèmes logiciels et de bogues ne peuvent être identifiées et résolus que lors de l'analyse de l'exécution des applications. L'analyse d'exécution (et le débogage) des systèmes parallèles et distribués est très difficile en utilisant uniquement le code source et les autres artefacts logiciels statiques. L'analyse dynamique par trace d'exécution est de plus en plus utilisée pour étudier le comportement d'un système. Les traces d'exécution contiennent généralement une grande quantité d'information sur l'exécution du système, par exemple quel processus/module interagit avec quels autres processus/modules, ou encore quel fichier est touché par celui-ci, et ainsi de suite. Les traces au niveau du système d'exploitation sont un des types de données des plus utiles et efficaces qui peuvent être utilisés pour détecter des problèmes d'exécution complexes. En effet, ils contiennent généralement des informations détaillées sur la communication inter-processus, sur l'utilisation de la mémoire, le système de fichiers, les appels système, la couche réseau, les blocs de disque, etc. Cette information peut être utilisée pour raisonner sur le comportement d'exécution du système et investiguer les bogues ainsi que les problèmes d'exécution. D'un autre côté, les traces d'exécution peuvent rapidement avoir une très grande taille en peu de temps à cause de la grande quantité d'information qu'elles contiennent. De plus, les traces contiennent généralement des données de bas niveau (appels système, interruptions, etc ) pour lesquelles l'analyse et la compréhension du contexte requièrent des connaissances poussées dans le domaine des systèmes d'exploitation. Très souvent, les administrateurs système et analystes préfèrent des données de plus haut niveau pour avoir une idée plus générale du comportement du système, contrairement aux traces noyau dont le niveau d'abstraction est très bas. Pour pouvoir générer efficacement des analyses de plus haut niveau, il est nécessaire de développer des algorithmes et des outils efficaces pour analyser les traces noyau et mettre en évidence les événements les plus pertinents. Le caractère expressif des événements de trace permet aux analystes de raisonner sur l'exécution du système à des niveaux plus élevés, pour découvrir le sens de l'exécution différents endroits, et détecter les comportements problématiques et inattendus. Toutefois, pour permettre une telle analyse, un outil de visualisation supplémentaire est nécessaire pour afficher les événements abstraits à de plus hauts niveaux d'abstraction. Cet outil peut permettre aux utilisateurs de voir une liste des problèmes détectés, de les suivre dans les couches de plus bas niveau (ex. directement dans la trace détaillée) et éventuellement de découvrir les raisons des problèmes détectés. Dans cette thèse, un cadre d'application est présenté pour relever ces défis : réduire la taille d'une trace ainsi que sa complexité, générer plusieurs niveaux d'événements abstraits pour les organiser d'une facon hiérarchique et les visualiser à de multiples niveaux, et enfin permettre aux utilisateurs d'effectuer une analyse verticale et à plusieurs niveaux d'abstraction, conduisant à une meilleure connaissance et compréhension de l'exécution du système. Le cadre d'application proposé est étudié en deux grandes parties : d'abord plusieurs niveaux d'abstraction de trace, et ensuite l'organisation de la trace à plusieurs niveaux et sa visualisation. La première partie traite des techniques utilisées pour les données de trace abstraites en utilisant soit les traces des événements contenus, un ensemble prédéfini de paramètres et de mesures, ou une structure basée l'abstraction pour extraire les ressources impliquées dans le système. La deuxième partie, en revanche, indique l'organisation hiérarchique des événements abstraits générés, l'établissement de liens entre les événements connexes, et enfin la visualisation en utilisant une vue de la chronologie avec échelle ajustable. Cette vue affiche les événements à différents niveaux de granularité, et permet une navigation hiérarchique à travers différentes couches d'événements de trace, en soutenant la mise a l'échelle sémantique. Grâce à cet outil, les utilisateurs peuvent tout d'abord avoir un aperçu de l'exécution, contenant un ensemble de comportements de haut niveau, puis peuvent se déplacer dans la vue et se concentrer sur une zone d'intérêt pour obtenir plus de détails sur celle-ci. L'outil proposé synchronise et coordonne les différents niveaux de la vue en établissant des liens entre les données, structurellement ou sémantiquement. La liaison structurelle utilise la délimitation par estampilles de temps des événements pour lier les données, tandis que le second utilise une pré-analyse des événements de trace pour trouver la pertinence entre eux ainsi que pour les lier. Lier les événements relie les informations de différentes couches qui appartiennent théoriquement à la même procédure ou spécifient le même comportement. Avec l'utilisation de la liaison de la correspondance, des événements dans une couche peuvent être analysés par rapport aux événements et aux informations disponibles dans d'autres couches, ce qui conduit à une analyse à plusieurs niveaux et la compréhension de l'exécution du système sous-jacent. Les exemples et les résultats expérimentaux des techniques d'abstraction et de visualisation proposés sont présentes dans cette thèse qui prouve l'efficacité de l'approche. Dans ce projet, toutes les évaluations et les expériences ont été menées sur la base des événements de trace au niveau du système d'exploitation recueillies par le traceur Linux Trace Toolkit Next Generation ( LTTng ). LTTng est un outil libre, léger et à faible impact qui fournit des informations d'exécution détaillées à partir des différents modules du système sous-jacent, tant au niveau du noyau Linux que de l'espace utilisateur. LTTng fonctionne en instrumentant le noyau et les applications utilisateur en insérant quelques points de traces à des endroits différents.----------ABSTRACT Some problems and bugs can only be identified and resolved using runtime application behavior analysis. Runtime analysis of multi-threaded and distributed systems is very difficult, almost impossible, by only analyzing the source code and other static software artifacts. Therefore, dynamic analysis through execution traces is increasingly used to study system runtime behavior. Execution traces usually contain large amounts of valuable information about the system execution, e.g., which process/module interacts with which other processes/modules, which file is touched by which process/module, which function is called by which process/module/function and so on. Operating system level traces are among the most useful and effective information sources that can be used to detect complex bugs and problems. Indeed, they contain detailed information about inter-process communication, memory usage, file system, system calls, networking, disk blocks, etc. This information can be used to understand the system runtime behavior, and to identify a large class of bugs, problems, and misbehavior. However, execution traces may become large, even within a few seconds or minutes of execution, making the analysis difficult. Moreover, traces are often filled with low-level data (system calls, interrupts, etc.) so that people need a complete understanding of the domain knowledge to analyze these data. It is often preferable for analysts to look at relatively abstract and high-level events, which are more readable and representative than the original trace data, and reveal the same behavior but at higher levels of granularity. However, to achieve such high-level data, effective algorithms and tools must be developed to process trace events, remove less important ones, highlight only necessary data, generalize trace events, and finally aggregate and group similar and related events. The expressive nature of the synthetic events allows analysts to reason about system execution at higher levels, to uncover execution behavior in different areas, and detect its problematic and unexpected aspects. However, to allow such analysis, an additional visualization tool may be required to display abstract events at different levels and explore them easily. This tool may enable users to see a list of detected problems, follow the problems in the detailed levels (e.g., within the raw trace events), analyze, and possibly discover the reasons for the detected problems. In this thesis, a framework is presented to address those challenges: to reduce the execution trace size and complexity, to generate several levels of abstract events, to organize the data in a hierarchy, to visualize them at multiple levels, and finally to enable users to perform a top-down and multiscale analysis over trace data, leading to a better understanding and comprehension of underlying system execution. The proposed framework is studied in two major parts: multi-level trace abstraction, and multi-level trace organization and visualization. The first part discusses the techniques used to abstract out trace data using either the trace events content, a predefined set of metrics and measures, or structure-based abstraction to extract the resources involved in the system execution. The second part determines the hierarchical organization of the generated abstract events, establishes links between the related events, and finally visualizes events using a zoomable timeline view. This view displays the events at different granularity levels, and enables a hierarchical navigation through different layers of trace events by supporting the semantic zooming. Using this tool, users can first see an overview of the execution, and then can pan around the view, and focus and zoom on any area of interest for more details and insight. The proposed view synchronizes and coordinates the different view levels by establishing links between data, structurally or semantically. The structural linking uses bounding times-tamps of the events to link the data, while the latter uses a pre-analysis of the trace events to find their relevance, and to link them together. Establishing Links connects the information that are conceptually related together ( e.g., events belong to the same process or specify the same behavior), so that events in one layer can be analyzed with respect to events and information in other layers, leading to a multi-level analysis and a better comprehension of the underlying system execution. In this project, all evaluations and experiments were conducted on operating system level traces obtained with the Linux Trace Toolkit Next Generation (LTTng). LTTng is a lightweight and low-impact open source tracing tool that provides valuable runtime information from the various modules of the underlying system at both the Linux kernel and user-space levels. LTTng works by instrumenting the (kernel and user-space) applications, with statically inserted tracepoints, and by generating log entries at runtime each time a tracepoint is hit

    Sparse array representations and some selected array operations on GPUs

    Get PDF
    A dissertation submitted to the Faculty of Science, University of the Witwatersrand, Johannesburg, in fulfilment of the requirements for the degree of Master of Science. Johannesburg, 2014.A multi-dimensional data model provides a good conceptual view of the data in data warehousing and On-Line Analytical Processing (OLAP). A typical representation of such a data model is as a multi-dimensional array which is well suited when the array is dense. If the array is sparse, i.e., has a few number of non-zero elements relative to the product of the cardinalities of the dimensions, using a multi-dimensional array to represent the data set requires extremely large memory space while the actual data elements occupy a relatively small fraction of the space. Existing storage schemes for Multi-Dimensional Sparse Arrays (MDSAs) of higher dimensions k (k > 2), focus on optimizing the storage utilization, and offer little flexibility in data access efficiency. Most efficient storage schemes for sparse arrays are limited to matrices that are arrays in 2 dimensions. In this dissertation, we introduce four storage schemes for MDSAs that handle the sparsity of the array with two primary goals; reducing the storage overhead and maintaining efficient data element access. These schemes, including a well known method referred to as the Bit Encoded Sparse Storage (BESS), were evaluated and compared on four basic array operations, namely construction of a scheme, large scale random element access, sub-array retrieval and multi-dimensional aggregation. The four storage schemes being proposed, together with the evaluation results are: i.) The extended compressed row storage (xCRS) which extends CRS method for sparse matrix storage to sparse arrays of higher dimensions and achieves the best data element access efficiency among the methods compared; ii.) The bit encoded xCRS (BxCRS) which optimizes the storage utilization of xCRS by applying data compression methods with run length encoding, while maintaining its data access efficiency; iii.) A hybrid approach (Hybrid) that provides the best control of the balance between the storage utilization and data manipulation efficiency by combining xCRS and BESS. iv.) The PATRICIA trie compressed storage (PTCS) which uses PATRICIA trie to store the valid non-zero array elements. PTCS supports efficient data access, and has a unique property of supporting update operations conveniently. v.) BESS performs the best for the multi-dimensional aggregation, closely followed by the other schemes. We also addressed the problem of accelerating some selected array operations using General Purpose Computing on Graphics Processing Unit (GPGPU). The experimental results showed different levels of speed up, ranging from 2 to over 20 times, on large scale random element access and sub-array retrieval. In particular, we utilized GPUs on the computation of the cube operator, a special case of multi-dimensional aggregation, using BESS. This resulted in a 5 to 8 times of speed up compared with our CPU only implementation. The main contributions of this dissertation include the developments, implementations and evaluations of four efficient schemes to store multi-dimensional sparse arrays, as well as utilizing massive parallelism of GPUs for some data warehousing operations

    Non-photorealistic rendering: a critical examination and proposed system.

    Get PDF
    In the first part of the program the emergent field of Non-Photorealistic Rendering is explored from a cultural perspective. This is to establish a clear understanding of what Non-Photorealistic Rendering (NPR) ought to be in its mature form in order to provide goals and an overall infrastructure for future development. This thesis claims that unless we understand and clarify NPR's relationship with other media (photography, photorealistic computer graphics and traditional media) we will continue to manufacture "new solutions" to computer based imaging which are confused and naive in their goals. Such solutions will be rejected by the art and design community, generally condemned as novelties of little cultural worth ( i.e. they will not sell). This is achieved by critically reviewing published systems that are naively described as Non-photorealistic or "painterly" systems. Current practices and techniques are criticised in terms of their low ability to articulate meaning in images; solutions to this problem are given. A further argument claims that NPR, while being similar to traditional "natural media" techniques in certain aspects, is fundamentally different in other ways. This similarity has lead NPR to be sometimes proposed as "painting simulation" — something it can never be. Methods for avoiding this position are proposed. The similarities and differences to painting and drawing are presented and NPR's relationship to its other counterpart, Photorealistic Rendering (PR), is then delineated. It is shown that NPR is paradigmatically different to other forms of representation — i.e. it is not an "effect", but rather something basically different. The benefits of NPR in its mature form are discussed in the context of Architectural Representation and Design in general. This is done in conjunction with consultations with designers and architects. From this consultation a "wish-list" of capabilities is compiled by way of a requirements capture for a proposed system. A series of computer-based experiments resulting in the systems "Expressive Marks" and 'Magic Painter" are carried out; these practical experiments add further understanding to the problems of NPR. The exploration concludes with a prototype system "Piranesi" which is submitted as a good overall solution to the problem of NPR. In support of this written thesis are : - • The Expressive Marks system • Magic Painter system • The Piranesi system (which includes the EPixel and Sketcher systems) • A large portfolio of images generated throughout the exploration

    The Development of a Numerical Solver for the Phase Field Crystal Model with Select Applications to Materials Science

    Get PDF
    The phase field crystal (PFC) model is a conserved continuum model which isused to investigate the phase behavior of materials near the melting point. The simplestPFC model, used in the present work, produces solid, liquid, and lamellar phases. Thesolid phase features a spatially periodic triangular lattice structure much like the crystalstructure of some real materials. The present work focuses on the development of computer codes to solvenumerically the differential equations of the PFC model using a Fourier spaceformulation. These programs are then used in studies of dendritic growth and Ostwaldripening. Results of the Ostwald ripening study are compared to experimental data ofdiffusion-limited growth of solid Sn domains in a liquid solution of Pb and Sn inmicrogravity. Possible next steps for this project are discussed

    A Field Guide to Genetic Programming

    Get PDF
    xiv, 233 p. : il. ; 23 cm.Libro ElectrónicoA Field Guide to Genetic Programming (ISBN 978-1-4092-0073-4) is an introduction to genetic programming (GP). GP is a systematic, domain-independent method for getting computers to solve problems automatically starting from a high-level statement of what needs to be done. Using ideas from natural evolution, GP starts from an ooze of random computer programs, and progressively refines them through processes of mutation and sexual recombination, until solutions emerge. All this without the user having to know or specify the form or structure of solutions in advance. GP has generated a plethora of human-competitive results and applications, including novel scientific discoveries and patentable inventions. The authorsIntroduction -- Representation, initialisation and operators in Tree-based GP -- Getting ready to run genetic programming -- Example genetic programming run -- Alternative initialisations and operators in Tree-based GP -- Modular, grammatical and developmental Tree-based GP -- Linear and graph genetic programming -- Probalistic genetic programming -- Multi-objective genetic programming -- Fast and distributed genetic programming -- GP theory and its applications -- Applications -- Troubleshooting GP -- Conclusions.Contents xi 1 Introduction 1.1 Genetic Programming in a Nutshell 1.2 Getting Started 1.3 Prerequisites 1.4 Overview of this Field Guide I Basics 2 Representation, Initialisation and GP 2.1 Representation 2.2 Initialising the Population 2.3 Selection 2.4 Recombination and Mutation Operators in Tree-based 3 Getting Ready to Run Genetic Programming 19 3.1 Step 1: Terminal Set 19 3.2 Step 2: Function Set 20 3.2.1 Closure 21 3.2.2 Sufficiency 23 3.2.3 Evolving Structures other than Programs 23 3.3 Step 3: Fitness Function 24 3.4 Step 4: GP Parameters 26 3.5 Step 5: Termination and solution designation 27 4 Example Genetic Programming Run 4.1 Preparatory Steps 29 4.2 Step-by-Step Sample Run 31 4.2.1 Initialisation 31 4.2.2 Fitness Evaluation Selection, Crossover and Mutation Termination and Solution Designation Advanced Genetic Programming 5 Alternative Initialisations and Operators in 5.1 Constructing the Initial Population 5.1.1 Uniform Initialisation 5.1.2 Initialisation may Affect Bloat 5.1.3 Seeding 5.2 GP Mutation 5.2.1 Is Mutation Necessary? 5.2.2 Mutation Cookbook 5.3 GP Crossover 5.4 Other Techniques 32 5.5 Tree-based GP 39 6 Modular, Grammatical and Developmental Tree-based GP 47 6.1 Evolving Modular and Hierarchical Structures 47 6.1.1 Automatically Defined Functions 48 6.1.2 Program Architecture and Architecture-Altering 50 6.2 Constraining Structures 51 6.2.1 Enforcing Particular Structures 52 6.2.2 Strongly Typed GP 52 6.2.3 Grammar-based Constraints 53 6.2.4 Constraints and Bias 55 6.3 Developmental Genetic Programming 57 6.4 Strongly Typed Autoconstructive GP with PushGP 59 7 Linear and Graph Genetic Programming 61 7.1 Linear Genetic Programming 61 7.1.1 Motivations 61 7.1.2 Linear GP Representations 62 7.1.3 Linear GP Operators 64 7.2 Graph-Based Genetic Programming 65 7.2.1 Parallel Distributed GP (PDGP) 65 7.2.2 PADO 67 7.2.3 Cartesian GP 67 7.2.4 Evolving Parallel Programs using Indirect Encodings 68 8 Probabilistic Genetic Programming 8.1 Estimation of Distribution Algorithms 69 8.2 Pure EDA GP 71 8.3 Mixing Grammars and Probabilities 74 9 Multi-objective Genetic Programming 75 9.1 Combining Multiple Objectives into a Scalar Fitness Function 75 9.2 Keeping the Objectives Separate 76 9.2.1 Multi-objective Bloat and Complexity Control 77 9.2.2 Other Objectives 78 9.2.3 Non-Pareto Criteria 80 9.3 Multiple Objectives via Dynamic and Staged Fitness Functions 80 9.4 Multi-objective Optimisation via Operator Bias 81 10 Fast and Distributed Genetic Programming 83 10.1 Reducing Fitness Evaluations/Increasing their Effectiveness 83 10.2 Reducing Cost of Fitness with Caches 86 10.3 Parallel and Distributed GP are Not Equivalent 88 10.4 Running GP on Parallel Hardware 89 10.4.1 Master–slave GP 89 10.4.2 GP Running on GPUs 90 10.4.3 GP on FPGAs 92 10.4.4 Sub-machine-code GP 93 10.5 Geographically Distributed GP 93 11 GP Theory and its Applications 97 11.1 Mathematical Models 98 11.2 Search Spaces 99 11.3 Bloat 101 11.3.1 Bloat in Theory 101 11.3.2 Bloat Control in Practice 104 III Practical Genetic Programming 12 Applications 12.1 Where GP has Done Well 12.2 Curve Fitting, Data Modelling and Symbolic Regression 12.3 Human Competitive Results – the Humies 12.4 Image and Signal Processing 12.5 Financial Trading, Time Series, and Economic Modelling 12.6 Industrial Process Control 12.7 Medicine, Biology and Bioinformatics 12.8 GP to Create Searchers and Solvers – Hyper-heuristics xiii 12.9 Entertainment and Computer Games 127 12.10The Arts 127 12.11Compression 128 13 Troubleshooting GP 13.1 Is there a Bug in the Code? 13.2 Can you Trust your Results? 13.3 There are No Silver Bullets 13.4 Small Changes can have Big Effects 13.5 Big Changes can have No Effect 13.6 Study your Populations 13.7 Encourage Diversity 13.8 Embrace Approximation 13.9 Control Bloat 13.10 Checkpoint Results 13.11 Report Well 13.12 Convince your Customers 14 Conclusions Tricks of the Trade A Resources A.1 Key Books A.2 Key Journals A.3 Key International Meetings A.4 GP Implementations A.5 On-Line Resources 145 B TinyGP 151 B.1 Overview of TinyGP 151 B.2 Input Data Files for TinyGP 153 B.3 Source Code 154 B.4 Compiling and Running TinyGP 162 Bibliography 167 Inde
    corecore