89,501 research outputs found

    The Realization of Rapid Median Filter Algorithm on FPGA

    Get PDF
    Abstract. Traditional median filter algorithm has the long processing time, which goes against the real-time image processing. According to its shortcomings, this paper puts forward the rapid median filter algorithm, and uses DE2 board of the company called Altera to do the realization on FPGA (CycloneII 2C35). The experimental results show that the image pre-processing system is able to complete a variety of high-level image algorithms in milliseconds, and FPGA's parallel processing capability and pipeline operations can dramatically improve the speed of image processing, so the FPGA-based image processing system has broad prospects for development

    High volume colour image processing with massively parallel embedded processors

    Get PDF
    Currently Oc´e uses FPGA technology for implementing colour image processing for their high volume colour printers. Although FPGA technology provides enough performance it, however, has a rather tedious development process. This paper describes the research conducted on an alternative implementation technology: software defined massively parallel processing. It is shown that this technology not only leads to a reduction in development time but also adds flexibility to the design

    Characterizing Deep-Learning I/O Workloads in TensorFlow

    Full text link
    The performance of Deep-Learning (DL) computing frameworks rely on the performance of data ingestion and checkpointing. In fact, during the training, a considerable high number of relatively small files are first loaded and pre-processed on CPUs and then moved to accelerator for computation. In addition, checkpointing and restart operations are carried out to allow DL computing frameworks to restart quickly from a checkpoint. Because of this, I/O affects the performance of DL applications. In this work, we characterize the I/O performance and scaling of TensorFlow, an open-source programming framework developed by Google and specifically designed for solving DL problems. To measure TensorFlow I/O performance, we first design a micro-benchmark to measure TensorFlow reads, and then use a TensorFlow mini-application based on AlexNet to measure the performance cost of I/O and checkpointing in TensorFlow. To improve the checkpointing performance, we design and implement a burst buffer. We find that increasing the number of threads increases TensorFlow bandwidth by a maximum of 2.3x and 7.8x on our benchmark environments. The use of the tensorFlow prefetcher results in a complete overlap of computation on accelerator and input pipeline on CPU eliminating the effective cost of I/O on the overall performance. The use of a burst buffer to checkpoint to a fast small capacity storage and copy asynchronously the checkpoints to a slower large capacity storage resulted in a performance improvement of 2.6x with respect to checkpointing directly to slower storage on our benchmark environment.Comment: Accepted for publication at pdsw-DISCS 201

    Processing Images from the Zwicky Transient Facility

    Get PDF
    The Zwicky Transient Facility is a new robotic-observing program, in which a newly engineered 600-MP digital camera with a pioneeringly large field of view, 47~square degrees, will be installed into the 48-inch Samuel Oschin Telescope at the Palomar Observatory. The camera will generate ∼1\sim 1~petabyte of raw image data over three years of operations. In parallel related work, new hardware and software systems are being developed to process these data in real time and build a long-term archive for the processed products. The first public release of archived products is planned for early 2019, which will include processed images and astronomical-source catalogs of the northern sky in the gg and rr bands. Source catalogs based on two different methods will be generated for the archive: aperture photometry and point-spread-function fitting.Comment: 6 pages, 4 figures, submitted to RTSRE Proceedings (www.rtsre.org

    Neuroimaging study designs, computational analyses and data provenance using the LONI pipeline.

    Get PDF
    Modern computational neuroscience employs diverse software tools and multidisciplinary expertise to analyze heterogeneous brain data. The classical problems of gathering meaningful data, fitting specific models, and discovering appropriate analysis and visualization tools give way to a new class of computational challenges--management of large and incongruous data, integration and interoperability of computational resources, and data provenance. We designed, implemented and validated a new paradigm for addressing these challenges in the neuroimaging field. Our solution is based on the LONI Pipeline environment [3], [4], a graphical workflow environment for constructing and executing complex data processing protocols. We developed study-design, database and visual language programming functionalities within the LONI Pipeline that enable the construction of complete, elaborate and robust graphical workflows for analyzing neuroimaging and other data. These workflows facilitate open sharing and communication of data and metadata, concrete processing protocols, result validation, and study replication among different investigators and research groups. The LONI Pipeline features include distributed grid-enabled infrastructure, virtualized execution environment, efficient integration, data provenance, validation and distribution of new computational tools, automated data format conversion, and an intuitive graphical user interface. We demonstrate the new LONI Pipeline features using large scale neuroimaging studies based on data from the International Consortium for Brain Mapping [5] and the Alzheimer's Disease Neuroimaging Initiative [6]. User guides, forums, instructions and downloads of the LONI Pipeline environment are available at http://pipeline.loni.ucla.edu
    • …
    corecore