89,501 research outputs found
The Realization of Rapid Median Filter Algorithm on FPGA
Abstract. Traditional median filter algorithm has the long processing time, which goes against the real-time image processing. According to its shortcomings, this paper puts forward the rapid median filter algorithm, and uses DE2 board of the company called Altera to do the realization on FPGA (CycloneII 2C35). The experimental results show that the image pre-processing system is able to complete a variety of high-level image algorithms in milliseconds, and FPGA's parallel processing capability and pipeline operations can dramatically improve the speed of image processing, so the FPGA-based image processing system has broad prospects for development
High volume colour image processing with massively parallel embedded processors
Currently Oc´e uses FPGA technology for implementing colour image processing for their high volume colour printers. Although FPGA technology provides enough performance it, however, has a rather tedious development process. This paper describes the research conducted on an alternative implementation technology: software defined massively parallel processing. It is shown that this technology not only leads to a reduction in development time but also adds flexibility to the design
Characterizing Deep-Learning I/O Workloads in TensorFlow
The performance of Deep-Learning (DL) computing frameworks rely on the
performance of data ingestion and checkpointing. In fact, during the training,
a considerable high number of relatively small files are first loaded and
pre-processed on CPUs and then moved to accelerator for computation. In
addition, checkpointing and restart operations are carried out to allow DL
computing frameworks to restart quickly from a checkpoint. Because of this, I/O
affects the performance of DL applications. In this work, we characterize the
I/O performance and scaling of TensorFlow, an open-source programming framework
developed by Google and specifically designed for solving DL problems. To
measure TensorFlow I/O performance, we first design a micro-benchmark to
measure TensorFlow reads, and then use a TensorFlow mini-application based on
AlexNet to measure the performance cost of I/O and checkpointing in TensorFlow.
To improve the checkpointing performance, we design and implement a burst
buffer. We find that increasing the number of threads increases TensorFlow
bandwidth by a maximum of 2.3x and 7.8x on our benchmark environments. The use
of the tensorFlow prefetcher results in a complete overlap of computation on
accelerator and input pipeline on CPU eliminating the effective cost of I/O on
the overall performance. The use of a burst buffer to checkpoint to a fast
small capacity storage and copy asynchronously the checkpoints to a slower
large capacity storage resulted in a performance improvement of 2.6x with
respect to checkpointing directly to slower storage on our benchmark
environment.Comment: Accepted for publication at pdsw-DISCS 201
Processing Images from the Zwicky Transient Facility
The Zwicky Transient Facility is a new robotic-observing program, in which a
newly engineered 600-MP digital camera with a pioneeringly large field of view,
47~square degrees, will be installed into the 48-inch Samuel Oschin Telescope
at the Palomar Observatory. The camera will generate ~petabyte of raw
image data over three years of operations. In parallel related work, new
hardware and software systems are being developed to process these data in real
time and build a long-term archive for the processed products. The first public
release of archived products is planned for early 2019, which will include
processed images and astronomical-source catalogs of the northern sky in the
and bands. Source catalogs based on two different methods will be
generated for the archive: aperture photometry and point-spread-function
fitting.Comment: 6 pages, 4 figures, submitted to RTSRE Proceedings (www.rtsre.org
Neuroimaging study designs, computational analyses and data provenance using the LONI pipeline.
Modern computational neuroscience employs diverse software tools and multidisciplinary expertise to analyze heterogeneous brain data. The classical problems of gathering meaningful data, fitting specific models, and discovering appropriate analysis and visualization tools give way to a new class of computational challenges--management of large and incongruous data, integration and interoperability of computational resources, and data provenance. We designed, implemented and validated a new paradigm for addressing these challenges in the neuroimaging field. Our solution is based on the LONI Pipeline environment [3], [4], a graphical workflow environment for constructing and executing complex data processing protocols. We developed study-design, database and visual language programming functionalities within the LONI Pipeline that enable the construction of complete, elaborate and robust graphical workflows for analyzing neuroimaging and other data. These workflows facilitate open sharing and communication of data and metadata, concrete processing protocols, result validation, and study replication among different investigators and research groups. The LONI Pipeline features include distributed grid-enabled infrastructure, virtualized execution environment, efficient integration, data provenance, validation and distribution of new computational tools, automated data format conversion, and an intuitive graphical user interface. We demonstrate the new LONI Pipeline features using large scale neuroimaging studies based on data from the International Consortium for Brain Mapping [5] and the Alzheimer's Disease Neuroimaging Initiative [6]. User guides, forums, instructions and downloads of the LONI Pipeline environment are available at http://pipeline.loni.ucla.edu
- …