
    Evaluation of Single-Chip, Real-Time Tomographic Data Processing on FPGA - SoC Devices

    A novel approach to tomographic data processing has been developed and evaluated using the Jagiellonian PET (J-PET) scanner as an example. We propose a system in which there is no need for a powerful processing facility, local to the scanner, capable of reconstructing images on the fly. Instead, we introduce a Field Programmable Gate Array (FPGA) System-on-Chip (SoC) platform connected directly to the data streams coming from the scanner, which can perform event building, filtering, coincidence search and Region-Of-Response (ROR) reconstruction in the programmable logic, and visualization on the integrated processors. The platform significantly reduces the data volume by converting raw data to a list-mode representation, while generating visualization on the fly. Comment: IEEE Transactions on Medical Imaging, 17 May 201
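
    The coincidence search stage pairs detector hits that arrive close together in time, so that each pair defines a response line through the scanned object. The sketch below only illustrates that idea in Python; the paper performs this step in FPGA programmable logic, and the event fields, time unit and window value here are assumptions rather than J-PET parameters.

        # Illustrative sketch of a coincidence window search over a time-sorted stream
        # of single events. Field names, the picosecond time unit and the window value
        # are assumptions for the example, not J-PET parameters.

        from dataclasses import dataclass
        from typing import List, Tuple

        @dataclass
        class SingleEvent:
            time_ps: int      # hit time (assumed picoseconds)
            detector_id: int  # which detector strip registered the hit

        def find_coincidences(events: List[SingleEvent],
                              window_ps: int = 3000) -> List[Tuple[SingleEvent, SingleEvent]]:
            """Pair neighbouring events whose time difference lies within the window and
            which hit different detectors, so each pair can define a response line."""
            events = sorted(events, key=lambda e: e.time_ps)
            pairs = []
            i = 0
            while i + 1 < len(events):
                a, b = events[i], events[i + 1]
                if b.time_ps - a.time_ps <= window_ps and a.detector_id != b.detector_id:
                    pairs.append((a, b))
                    i += 2  # both singles are consumed by this coincidence
                else:
                    i += 1  # discard the earlier single and keep scanning
            return pairs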

    Parallel VLSI architecture emulation and the organization of APSA/MPP

    The Applicative Programming System Architecture (APSA) combines an applicative language interpreter with a novel parallel computer architecture that is well suited for Very Large Scale Integration (VLSI) implementation. The Massively Parallel Processor (MPP) can simulate VLSI circuits by allocating one processing element in its square array to an area on a square VLSI chip. As long as there are not too many long data paths, the MPP can simulate a VLSI clock cycle very rapidly. The APSA circuit contains a binary tree with a few long paths and many short ones. A skewed H-tree layout allows every processing element to simulate a leaf cell and up to four tree nodes, with no loss in parallelism. Emulation of a key APSA algorithm on the MPP resulted in performance 16,000 times faster than a VAX. This speed will make it possible for the APSA language interpreter to run fast enough to support research in parallel list processing algorithms.
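
    The mapping idea can be sketched with a plain (non-skewed) H-tree: the root of the binary tree sits at the centre of the processing-element grid, and each level places its children along an axis that alternates between horizontal and vertical. The Python sketch below uses invented grid sizes and does not model the paper's skewed layout, which additionally packs up to four tree nodes onto one processing element.

        # Illustrative sketch of a plain H-tree placement of a complete binary tree on
        # a 2-D grid of processing elements. Sizes are invented for the example.

        def place_h_tree(depth, x, y, step, horizontal=True, placement=None, node_id=1):
            """Place node_id at (x, y) and recurse: children are offset along the
            current axis, the axis alternates each level, and the offset shrinks,
            giving the characteristic H-shaped footprint."""
            if placement is None:
                placement = {}
            placement[node_id] = (x, y)
            if depth > 0:
                dx, dy = (step, 0) if horizontal else (0, step)
                place_h_tree(depth - 1, x - dx, y - dy, step // 2, not horizontal, placement, 2 * node_id)
                place_h_tree(depth - 1, x + dx, y + dy, step // 2, not horizontal, placement, 2 * node_id + 1)
            return placement

        # A 15-node tree laid out around the centre of a 17 x 17 grid:
        layout = place_h_tree(depth=3, x=8, y=8, step=4)
        print(len(layout), layout[1], layout[2], layout[3])  # 15 nodes; root and its two children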

    STS-41 Space Shuttle mission report

    The STS-41 Space Shuttle Program Mission Report contains a summary of the vehicle subsystem activities on this thirty-sixth flight of the Space Shuttle and the eleventh flight of the Orbiter vehicle, Discovery (OV-103). In addition to the Discovery vehicle, the flight vehicle consisted of an External Tank (ET) (designated as ET-39/LWT-32), three Space Shuttle main engines (SSME's) (serial numbers 2011, 2031, and 2107), and two Solid Rocket Boosters (SRB's), designated as BI-040. The primary objective of the STS-41 mission was to successfully deploy the Ulysses/inertial upper stage (IUS)/payload assist module (PAM-S) spacecraft. The secondary objectives were to perform all operations necessary to support the requirements of the Shuttle Backscatter Ultraviolet (SSBUV) Spectrometer, Solid Surface Combustion Experiment (SSCE), Space Life Sciences Training Program Chromosome and Plant Cell Division in Space (CHROMEX), Voice Command System (VCS), Physiological Systems Experiment (PSE), Radiation Monitoring Experiment - 3 (RME-3), Investigations into Polymer Membrane Processing (IPMP), Air Force Maui Optical Calibration Test (AMOS), and Intelsat Solar Array Coupon (ISAC) payloads. The sequence of events for this mission is shown in tabular form. Summarized are the significant problems that occurred in the Orbiter subsystems during the mission. The official problem tracking list is presented. In addition, each Orbiter problem is cited in the subsystem discussion.

    Methods of forming an expert assessment of the criteria of an information system for managing projects and programs

    The article presents a method for determining and ranking the significance of the criteria of an information system for managing projects and programs (hereinafter, PMIS), based on the concept of subjective probability and expert assessments. The method of expert assessments processes the opinions of experienced specialists on the possible values of losses and (or) the probability of their occurrence. It is also used in non-formalizable problem situations, when the lack of a sufficient body of information, or its unreliability, does not allow purely formal mathematical methods to be applied. When analyzing the choice of a PMIS, expert assessments can be used, first, to form a subjective assessment of a particular PMIS, with the subsequent use of this information to quantify it using statistical methods, and second, for a qualitative assessment of the PMIS choice in terms of determining the rank significance and priority of criteria in an ordered list. The proposed methodology consists of the following main stages: 1) development of a list of assessed PMIS criteria and formation of a list of experts; 2) a survey of the experts to obtain a set of individual expert assessments of the PMIS criteria; 3) calculation of the average assessment for each PMIS criterion; 4) checking the consistency of expert opinions on the rank significance of the assessed PMIS criteria using the Kendall coefficient of concordance; 5) summing up the results of the expert assessment of the PMIS criteria. The practical aspects of the expert assessment are considered: calculation tables, the method of filling them in, and the processing and analysis of the results. The method of expert assessment of the PMIS criteria was further developed; as a result, a set of effective and functional criteria was determined that will be taken into account when developing technical requirements for this system.
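
    Stage 4 relies on Kendall's coefficient of concordance, which measures how consistently the experts rank the criteria. A minimal sketch of that check is given below; the rank values are invented for illustration, and the basic formula shown assumes complete rankings with no ties.

        # Kendall's coefficient of concordance W computed from a matrix of expert
        # ranks. Example ranks are invented; the basic formula assumes no tied ranks.

        def kendall_w(rank_matrix):
            """rank_matrix[i][j] is the rank expert i gave to criterion j (1 = highest).
            W ranges from 0 (no agreement among experts) to 1 (complete agreement)."""
            m = len(rank_matrix)          # number of experts
            n = len(rank_matrix[0])       # number of criteria
            totals = [sum(expert[j] for expert in rank_matrix) for j in range(n)]
            mean_total = sum(totals) / n
            s = sum((t - mean_total) ** 2 for t in totals)
            return 12 * s / (m ** 2 * (n ** 3 - n))

        # Three experts ranking four PMIS criteria (illustrative values only):
        ranks = [
            [1, 2, 3, 4],
            [1, 3, 2, 4],
            [2, 1, 3, 4],
        ]
        print(round(kendall_w(ranks), 3))  # values near 1 indicate consistent expert opinions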

    Method of up-front load balancing for local memory parallel processors

    In a parallel processing computer system with multiple processing units and shared memory, a method is disclosed for uniformly balancing the aggregate computational load in, and utilizing minimal memory by, a network having identical computations to be executed at each connection therein. Read-only and read-write memory are subdivided into a plurality of process sets, which function like artificial processing units. Said plurality of process sets is iteratively merged and reduced to the number of processing units without exceeding the balanced load. Said merger is based upon the value of a partition threshold, which is a measure of the memory utilization. The turnaround time and memory savings of the instant method are functions of the number of processing units available and the number of partitions into which the memory is subdivided. Typical results of the preferred embodiment yielded memory savings of from sixty to seventy-five percent.
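
    The core idea of merging many small process sets down to the available processing units while keeping the computational load balanced can be illustrated with a simple greedy merge: assign each process-set load to the currently least-loaded processor. The Python sketch below is only that simplification, with invented loads, and it does not model the disclosed partition-threshold test that ties the merge to memory utilization.

        # Greedy sketch: merge process-set loads down to a fixed number of processors.
        # Loads and processor count are invented; the partition threshold is not modeled.

        import heapq

        def merge_process_sets(loads, num_processors):
            """Return one (total_load, member_indices) entry per processor."""
            heap = [(0.0, p, []) for p in range(num_processors)]  # (load, processor id, members)
            heapq.heapify(heap)
            for idx in sorted(range(len(loads)), key=lambda i: loads[i], reverse=True):
                load, proc, members = heapq.heappop(heap)          # least-loaded processor so far
                heapq.heappush(heap, (load + loads[idx], proc, members + [idx]))
            return sorted((load, members) for load, _, members in heap)

        # Eight process sets merged onto three processors (illustrative loads):
        print(merge_process_sets([5, 3, 8, 2, 7, 4, 6, 1], 3))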

    Breadth First Search Vectorization on the Intel Xeon Phi

    Breadth First Search (BFS) is a building block for graph algorithms and has recently been used for large-scale analysis of information in a variety of applications, including social networks, graph databases, and web searching. Due to its importance, a number of different parallel programming models and architectures have been exploited to optimize BFS. However, due to the irregular memory access patterns and the unstructured nature of large graphs, its efficient parallelization is a challenge. The Xeon Phi is a massively parallel architecture available as an off-the-shelf accelerator, which includes a powerful 512-bit vector unit with optimized scatter and gather functions. Given its potential benefits, work related to graph traversal on this architecture is an active area of research. We present a set of experiments in which we explore architectural features of the Xeon Phi and how best to exploit them in a top-down BFS algorithm, although the techniques can also be applied to the current state-of-the-art hybrid (top-down plus bottom-up) algorithms. We focus on the exploitation of the vector unit by developing an improved, highly vectorized OpenMP parallel algorithm using vector intrinsics, and on understanding the use of data alignment and prefetching. In addition, we investigate the impact of hyperthreading and thread affinity on performance, a topic that appears under-researched in the literature. As a result, we achieve what we believe is the fastest published top-down BFS algorithm on the version of the Xeon Phi used in our experiments. The vectorized top-down BFS source code presented in this paper is available on request as free-to-use software.
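
    The top-down algorithm being vectorized expands a frontier level by level, visiting every unvisited neighbour of the current frontier. The plain Python sketch below shows only that algorithmic skeleton; the paper's contribution lies in vectorizing the inner loop with 512-bit intrinsics, OpenMP, data alignment and prefetching, none of which appear here.

        # Level-synchronous top-down BFS over an adjacency list (algorithmic skeleton only).

        def bfs_top_down(adjacency, source):
            """Return the BFS parent of every vertex reachable from source (-1 if unreached)."""
            n = len(adjacency)
            parent = [-1] * n
            parent[source] = source
            frontier = [source]
            while frontier:
                next_frontier = []
                for u in frontier:
                    for v in adjacency[u]:
                        if parent[v] == -1:        # first visit claims the vertex
                            parent[v] = u
                            next_frontier.append(v)
                frontier = next_frontier
            return parent

        graph = [[1, 2], [0, 3], [0, 3], [1, 2]]   # small example graph
        print(bfs_top_down(graph, 0))              # -> [0, 0, 0, 1]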

    Self-organizing lists on the Xnet

    The first parallel designs for implementing self-organizing lists on the Xnet interconnection network are presented. Self-organizing lists permute the order of list entries after an entry is accessed, according to some update heuristic. The heuristic attempts to place frequently requested entries closer to the front of the list. This paper outlines Xnet systems for self-organizing lists under the move-to-front and transpose update heuristics. Our novel designs can be used to achieve high-speed lossless text compression.
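
    For reference, the two update heuristics behave as follows in a sequential setting: move-to-front relocates an accessed entry to the head of the list, while transpose swaps it with its immediate predecessor. The Python sketch below illustrates just these sequential rules; the paper's parallel Xnet designs are not reproduced here.

        # Sequential illustrations of the two update heuristics; example data is invented.

        def access_move_to_front(lst, key):
            """Access key and move it to the front of the list (move-to-front heuristic)."""
            i = lst.index(key)
            lst.insert(0, lst.pop(i))
            return i  # position at which the key was found (the search cost)

        def access_transpose(lst, key):
            """Access key and swap it with its predecessor (transpose heuristic)."""
            i = lst.index(key)
            if i > 0:
                lst[i - 1], lst[i] = lst[i], lst[i - 1]
            return i

        symbols = ['a', 'b', 'c', 'd']
        access_move_to_front(symbols, 'c')   # -> ['c', 'a', 'b', 'd']
        access_transpose(symbols, 'd')       # -> ['c', 'a', 'd', 'b']
        print(symbols)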

    GraphH: High Performance Big Graph Analytics in Small Clusters

    It is common for real-world applications to analyze big graphs using distributed graph processing systems. Popular in-memory systems require an enormous amount of resources to handle big graphs. While several out-of-core approaches have been proposed for processing big graphs on disk, their high disk I/O overhead can significantly reduce performance. In this paper, we propose GraphH to enable high-performance big graph analytics in small clusters. Specifically, we design a two-stage graph partition scheme to evenly divide the input graph into partitions, and propose a GAB (Gather-Apply-Broadcast) computation model to make each worker process a partition in memory at a time. We use an edge cache mechanism to reduce the disk I/O overhead, and design a hybrid strategy to improve the communication performance. GraphH can efficiently process big graphs in small clusters or even on a single commodity server. Extensive evaluations have shown that GraphH can be up to 7.8x faster than popular in-memory systems, such as Pregel+ and PowerGraph, when processing generic graphs, and more than 100x faster than recently proposed out-of-core systems, such as GraphD and Chaos, when processing big graphs.
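
    As the abstract describes it, each worker holds one partition in memory at a time, gathers incoming messages, applies them to update its vertices, and broadcasts new values toward neighbouring partitions. The sketch below is a hedged skeleton of one such superstep; the data layout, function names and PageRank-style update rule are illustrative assumptions, not GraphH's actual interfaces.

        # Hedged skeleton of one Gather-Apply-Broadcast superstep on a single
        # in-memory partition. Layout and update rule are illustrative only.

        def gab_superstep(partition, incoming, damping=0.85):
            """partition: {vertex: {'value': float, 'out_edges': [dst, ...]}}
            incoming:  {vertex: [contribution, ...]} gathered from other partitions.
            Returns the updated partition and the outgoing messages per destination."""
            outgoing = {}
            for v, data in partition.items():
                gathered = sum(incoming.get(v, []))                  # Gather
                data['value'] = (1 - damping) + damping * gathered   # Apply
                share = data['value'] / max(len(data['out_edges']), 1)
                for dst in data['out_edges']:                        # Broadcast
                    outgoing.setdefault(dst, []).append(share)
            return partition, outgoing

        # Two vertices held by one worker, with one message arriving for vertex 0:
        part = {0: {'value': 1.0, 'out_edges': [1]}, 1: {'value': 1.0, 'out_edges': [0]}}
        print(gab_superstep(part, {0: [0.5]}))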

    Processing Posting Lists Using OpenCL

    One of the main requirements of internet search engines is the ability to retrieve relevant results with fast response times. Yioop is an open-source search engine designed and developed in PHP by Dr. Chris Pollett. The goal of this project is to explore the possibilities of enhancing the performance of Yioop by substituting resource-intensive existing PHP functions with C-based native PHP extensions and the parallel data processing technology OpenCL. OpenCL leverages the Graphics Processing Unit (GPU) of a computer system for performance improvements. Some of the critical functions in search engines are resource-intensive in terms of processing power, memory, and I/O usage, and their processing times vary with the complexity and magnitude of the data involved. This project involves different phases: identifying critical resource-intensive functions, initially replacing such methods with PHP extensions, and eventually experimenting with OpenCL code. We also ran performance tests to measure the reduction in processing times. From our results, we concluded that PHP extensions and OpenCL processing resulted in performance improvements.
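
    A posting list records, for one index term, the documents that contain it, and intersecting the sorted lists of several terms is the kind of per-query hot loop such a project would offload. The Python sketch below shows the basic linear merge being accelerated; it is illustrative only and does not reproduce Yioop's PHP functions, the C extension interface, or the OpenCL kernels.

        # Intersect two sorted posting lists (document id lists) with a linear merge.
        # Function name and document ids are invented for the example.

        def intersect_postings(list_a, list_b):
            """Return the document ids present in both sorted posting lists."""
            result, i, j = [], 0, 0
            while i < len(list_a) and j < len(list_b):
                if list_a[i] == list_b[j]:
                    result.append(list_a[i])
                    i += 1
                    j += 1
                elif list_a[i] < list_b[j]:
                    i += 1
                else:
                    j += 1
            return result

        print(intersect_postings([1, 4, 7, 9, 12], [2, 4, 9, 15]))  # -> [4, 9]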