3,102 research outputs found

    Effective Monte Carlo simulation on System-V massively parallel associative string processing architecture

    Get PDF
    We show that the latest version of massively parallel processing associative string processing architecture (System-V) is applicable for fast Monte Carlo simulation if an effective on-processor random number generator is implemented. Our lagged Fibonacci generator can produce 10810^8 random numbers on a processor string of 12K PE-s. The time dependent Monte Carlo algorithm of the one-dimensional non-equilibrium kinetic Ising model performs 80 faster than the corresponding serial algorithm on a 300 MHz UltraSparc.Comment: 8 pages, 9 color ps figures embedde

    Computer vision libraries for trailer truck testbed using open source computer vision libraries

    Get PDF
    Computer Vision is a field that aims at understanding and analyzing images from the real world to produce numerical and symbolical data. It is a first step at duplicating the capabilities of human vision by electronically understanding the image and perceiving its features. This work aims at providing some of the features of a human eye to a trailer truck. These features include getting a 3D wireframe from continuous images and prediction of the next position of the objects in view, while the truck is moving. The thesis has been divided into 3 sections. First section is acquiring images in real time. Second section is the preprocessing of images to achieve edges and points in the image, and the third section is converting the edges and the points into 3D wireframe and predicting of the next position of the image. OpenCV library and Point cloud libraries are used in this process to facilitate operations on 2D and 3D images --Abstract, page iii

    Content addressable memory project

    Get PDF
    A parameterized version of the tree processor was designed and tested (by simulation). The leaf processor design is 90 percent complete. We expect to complete and test a combination of tree and leaf cell designs in the next period. Work is proceeding on algorithms for the computer aided manufacturing (CAM), and once the design is complete we will begin simulating algorithms for large problems. The following topics are covered: (1) the practical implementation of content addressable memory; (2) design of a LEAF cell for the Rutgers CAM architecture; (3) a circuit design tool user's manual; and (4) design and analysis of efficient hierarchical interconnection networks

    On benchmarking of deep learning systems: software engineering issues and reproducibility challenges

    Get PDF
    Since AlexNet won the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) in 2012, Deep Learning (and Machine Learning/AI in general) gained an exponential interest. Nowadays, their adoption spreads over numerous sectors, like automotive, robotics, healthcare and finance. The ML advancement goes in pair with the quality improvement delivered by those solutions. However, those ameliorations are not for free: ML algorithms always require an increasing computational power, which pushes computer engineers to develop new devices capable of coping with this demand for performance. To foster the evolution of DSAs, and thus ML research, it is key to make it easy to experiment and compare them. This may be challenging since, even if the software built around these devices simplifies their usage, obtaining the best performance is not always straightforward. The situation gets even worse when the experiments are not conducted in a reproducible way. Even though the importance of reproducibility for the research is evident, it does not directly translate into reproducible experiments. In fact, as already shown by previous studies regarding other research fields, also ML is facing a reproducibility crisis. Our work addresses the topic of reproducibility of ML applications. Reproducibility in this context has two aspects: results reproducibility and performance reproducibility. While the reproducibility of the results is mandatory, performance reproducibility cannot be neglected because high-performance device usage causes cost. To understand how the ML situation is regarding reproducibility of performance, we reproduce results published for the MLPerf suite, which seems to be the most used machine learning benchmark. Because of the wide range of devices and frameworks used in different benchmark submissions, we focus on a subset of accuracy and performance results submitted to the MLPerf Inference benchmark, presenting a detailed analysis of the difficulties a scientist may find when trying to reproduce such a benchmark and a possible solution using our workflow tool for experiment reproducibility: PROVA!. We designed PROVA! to support the reproducibility in traditional HPC experiments, but we will show how we extended it to be used as a 'driver' for MLPerf benchmark applications. The PROVA! driver mode allows us to experiment with different versions of the MLPerf Inference benchmark switching among different hardware and software combinations and compare them in a reproducible way. In the last part, we will present the results of our reproducibility study, demonstrating the importance of having a support tool to reproduce and extend original experiments getting deeper knowledge about performance behaviours

    Lempel Ziv Welch data compression using associative processing as an enabling technology for real time application

    Get PDF
    Data compression is a term that refers to the reduction of data representation requirements either in storage and/or in transmission. A commonly used algorithm for compression is the Lempel-Ziv-Welch (LZW) method proposed by Terry A. Welch[l]. LZW is an adaptive, dictionary based, lossless algorithm. This provides for a general compression mechanism that is applicable to a broad range of inputs. Furthermore, the lossless nature of LZW implies that it is a reversible process which results in the original file/message being fully recoverable from compression. A variant of this algorithm is currently the foundation of the UNIX compress program. Additionally, LZW is one of the compression schemes defined in the TIFF standard[12], as well as in the CCITT V.42bis standard. One of the challenges in designing an efficient compression mechanism, such as LZW, which can be used in real time applications, is the speed of the search into the data dictionary. In this paper an Associative Processing implementation of the LZW algorithm is presented. This approach provides an efficient solution to this requirement. Additionally, it is shown that Associative Processing (ASP) allows for rapid and elegant development of the LZW algorithm that will generally outperform standard approaches in complexity, readability, and performance

    Java on Networks of Workstations (JavaNOW): A Parallel Computing Framework Inspired by Linda and the Message Passing Interface (MPI)

    Get PDF
    Networks of workstations are a dominant force in the distributed computing arena, due primarily to the excellent price/performance ratio of such systems when compared to traditionally massively parallel architectures. It is therefore critical to develop programming languages and environments that can help harness the raw computational power available on these systems. In this article, we present JavaNOW (Java on Networks of Workstations), a Java‐based framework for parallel programming on networks of workstations. It creates a virtual parallel machine similar to the MPI (Message Passing Interface) model, and provides distributed associative shared memory similar to the Linda memory model but with a richer set of primitive operations. JavaNOW provides a simple yet powerful framework for performing computation on networks of workstations. In addition to the Linda memory model, it provides for shared objects, implicit multithreading, implicit synchronization, object dataflow, and collective communications similar to those defined in MPI. JavaNOW is also a component of the Computational Neighborhood, a Java‐enabled suite of services for desktop computational sharing. The intent of JavaNOW is to present an environment for parallel computing that is both expressive and reliable and ultimately can deliver good to excellent performance. As JavaNOW is a work in progress, this article emphasizes the expressive potential of the JavaNOW environment and presents preliminary performance results only
    corecore