Evaluation of Single-Chip, Real-Time Tomographic Data Processing on FPGA - SoC Devices
A novel approach to tomographic data processing has been developed and evaluated using the Jagiellonian PET (J-PET) scanner as an example. We propose a system that removes the need for a powerful processing facility, local to the scanner, capable of reconstructing images on the fly. Instead, we introduce a Field Programmable Gate Array (FPGA) System-on-Chip (SoC) platform connected directly to the data streams coming from the scanner, which performs event building, filtering, coincidence search, and Region-Of-Response (ROR) reconstruction in the programmable logic, and visualization on the integrated processors. The platform significantly reduces data volume by converting raw data to a list-mode representation while generating visualization on the fly.
Comment: IEEE Transactions on Medical Imaging, 17 May 201
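The coincidence search the abstract mentions can be illustrated in software. A minimal sketch (plain Python rather than FPGA logic, with hypothetical event tuples and a hypothetical coincidence window) might pair hits whose timestamps fall within the window:

```python
# Illustrative software coincidence search: pair detector hits whose
# timestamps fall within a coincidence window. The paper implements the
# equivalent logic in FPGA fabric; values here are hypothetical.

COINCIDENCE_WINDOW = 5.0  # ns, assumed value for illustration

def find_coincidences(hits, window=COINCIDENCE_WINDOW):
    """hits: list of (timestamp_ns, detector_id) tuples.
    Returns pairs of hits on different detectors within `window`."""
    hits = sorted(hits)                      # sort by timestamp
    pairs = []
    for i, (t1, d1) in enumerate(hits):
        for t2, d2 in hits[i + 1:]:
            if t2 - t1 > window:             # later hits are even farther
                break
            if d1 != d2:                     # require different detectors
                pairs.append(((t1, d1), (t2, d2)))
    return pairs

events = [(0.0, 'A'), (2.1, 'B'), (40.0, 'A'), (43.0, 'C'), (100.0, 'B')]
print(find_coincidences(events))   # → [((0.0, 'A'), (2.1, 'B')), ((40.0, 'A'), (43.0, 'C'))]
```

Sorting first lets the inner loop stop at the first hit outside the window, which keeps the search close to linear for sparse data.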
Parallel VLSI architecture emulation and the organization of APSA/MPP
The Applicative Programming System Architecture (APSA) combines an applicative language interpreter with a novel parallel computer architecture that is well suited for Very Large Scale Integration (VLSI) implementation. The Massively Parallel Processor (MPP) can simulate VLSI circuits by allocating one processing element in its square array to an area on a square VLSI chip. As long as there are not too many long data paths, the MPP can simulate a VLSI clock cycle very rapidly. The APSA circuit contains a binary tree with a few long paths and many short ones. A skewed H-tree layout allows every processing element to simulate a leaf cell and up to four tree nodes, with no loss in parallelism. Emulation of a key APSA algorithm on the MPP resulted in performance 16,000 times faster than a VAX. This speed will make it possible for the APSA language interpreter to run fast enough to support research in parallel list processing algorithms.
STS-41 Space Shuttle mission report
The STS-41 Space Shuttle Program Mission Report contains a summary of the vehicle subsystem activities on this thirty-sixth flight of the Space Shuttle and the eleventh flight of the Orbiter vehicle, Discovery (OV-103). In addition to the Discovery vehicle, the flight vehicle consisted of an External Tank (ET) (designated as ET-39/LWT-32), three Space Shuttle main engines (SSME's) (serial numbers 2011, 2031, and 2107), and two Solid Rocket Boosters (SRB's), designated as BI-040. The primary objective of the STS-41 mission was to successfully deploy the Ulysses/inertial upper stage (IUS)/payload assist module (PAM-S) spacecraft. The secondary objectives were to perform all operations necessary to support the requirements of the Shuttle Backscatter Ultraviolet (SSBUV) Spectrometer, Solid Surface Combustion Experiment (SSCE), Space Life Sciences Training Program Chromosome and Plant Cell Division in Space (CHROMEX), Voice Command System (VCS), Physiological Systems Experiment (PSE), Radiation Monitoring Experiment - 3 (RME-3), Investigations into Polymer Membrane Processing (IPMP), Air Force Maui Optical Calibration Test (AMOS), and Intelsat Solar Array Coupon (ISAC) payloads. The sequence of events for this mission is shown in tabular form. Summarized are the significant problems that occurred in the Orbiter subsystems during the mission. The official problem tracking list is presented. In addition, each Orbiter problem is cited in the subsystem discussion.
Methods of forming an expert assessment of the criteria of an information system for managing projects and programs
The article presents a method for determining and ranking the significance of the criteria of an information system for managing projects and programs (hereinafter, PMIS) based on the concept of subjective probability with the help of expert assessments. The method of expert assessments is implemented by processing the opinions of experienced specialists on the possible values of losses and (or) the probability of their occurrence. It is also used in non-formalizable problem situations, when the lack of a sufficient array of information, or its unreliability, does not allow the use of purely formal mathematical methods. When analyzing the PMIS choice, expert assessments can be used, firstly, to form a subjective assessment of one or another PMIS, with the subsequent use of this information in order to quantify it using statistical methods. Secondly, they can be used for a qualitative assessment of the PMIS choice in terms of determining the rank significance and priority of criteria in an ordered list of PMIS criteria.
The proposed methodology consists of the following main stages:
1) development of a list of assessed PMIS criteria and formation of a list of experts;
2) conducting a survey of experts in order to obtain a set of individual expert assessments according to the PMIS criteria;
3) calculation of the average assessment criteria of the PMIS;
4) checking the consistency of expert opinions on the rank significance of the assessed PMIS criteria based on the Kendall coefficient of concordance;
5) summing up the results of expert assessment of the PMIS criteria.
The practical aspects of the expert assessment are considered: calculation tables, the method of filling them, and the processing and analysis of the results. The method of expert assessment of the PMIS criteria was further developed, thanks to which a set of effective and functional criteria was determined, which will be taken into account when developing technical requirements for this system.
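Step 4 of the methodology above checks expert agreement with Kendall's coefficient of concordance. A minimal sketch of that calculation (the criteria and the expert rankings below are hypothetical, not from the article):

```python
# Kendall's coefficient of concordance W over m experts ranking n criteria.
# W = 12*S / (m^2 * (n^3 - n)), where S is the sum of squared deviations
# of the per-criterion rank sums from their mean. W near 1 means the
# experts agree; W near 0 means their rankings are essentially random.

def kendall_w(rankings):
    """rankings: one rank list per expert, ranks 1..n over n criteria."""
    m = len(rankings)            # number of experts
    n = len(rankings[0])         # number of criteria
    totals = [sum(r[j] for r in rankings) for j in range(n)]  # rank sums
    mean = sum(totals) / n
    s = sum((t - mean) ** 2 for t in totals)   # squared deviations
    return 12 * s / (m ** 2 * (n ** 3 - n))

# Three hypothetical experts ranking four PMIS criteria (1 = most significant)
experts = [
    [1, 2, 3, 4],
    [1, 3, 2, 4],
    [2, 1, 3, 4],
]
print(round(kendall_w(experts), 3))   # → 0.778
```

A value this high would normally be read as sufficient consistency to proceed to step 5 and average the assessments; the formula above assumes no tied ranks.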
Method of up-front load balancing for local memory parallel processors
In a parallel processing computer system with multiple processing units and shared memory, a method is disclosed for uniformly balancing the aggregate computational load in, and utilizing minimal memory by, a network having identical computations to be executed at each connection therein. Read-only and read-write memory are subdivided into a plurality of process sets, which function like artificial processing units. Said plurality of process sets is iteratively merged and reduced to the number of processing units without exceeding the balance load. Said merger is based upon the value of a partition threshold, which is a measure of the memory utilization. The turnaround time and memory savings of the instant method are functions of the number of processing units available and the number of partitions into which the memory is subdivided. Typical results of the preferred embodiment yielded memory savings of from sixty to seventy-five percent.
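The merge phase described above can be illustrated with a simple sketch: many small "process sets" are combined until only as many sets remain as there are physical processors. The greedy smallest-two merge below is an illustration under assumed inputs, not the patented threshold-based procedure:

```python
# Illustrative merge phase: repeatedly combine the two lightest process
# sets until the set count equals the number of physical processors.
# This greedy rule is a stand-in for the patent's partition-threshold test.

import heapq

def merge_process_sets(loads, num_processors):
    """loads: computational load of each initial process set.
    Returns the per-processor aggregate loads after merging."""
    heap = list(loads)
    heapq.heapify(heap)
    while len(heap) > num_processors:
        a = heapq.heappop(heap)          # two lightest sets...
        b = heapq.heappop(heap)
        heapq.heappush(heap, a + b)      # ...are merged into one
    return sorted(heap)

print(merge_process_sets([5, 3, 8, 2, 7, 4, 6, 1], 3))   # → [9, 12, 15]
```

Merging the lightest sets first keeps the final aggregates close to the ideal of total load divided by processor count (here 36/3 = 12).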
Parallel data compression
Data compression schemes remove data redundancy in communicated and stored data and increase the effective capacities of communication and storage devices. Parallel algorithms and implementations for textual data compression are surveyed. Related concepts from parallel computation and information theory are briefly discussed. Static and dynamic methods for codeword construction and transmission on various models of parallel computation are described. Included are parallel methods which boost system speed by coding data concurrently, and approaches which employ multiple compression techniques to improve compression ratios. Theoretical and empirical comparisons are reported and areas for future research are suggested.
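One of the approaches the survey covers, coding data concurrently, can be sketched by splitting the input into blocks and compressing each block in parallel. The sketch below uses zlib and a thread pool purely for illustration; the surveyed methods target dedicated parallel hardware:

```python
# Block-parallel textual compression sketch: split the input into blocks,
# compress each block concurrently, and decompress by concatenation.
# zlib and ThreadPoolExecutor stand in for the surveyed parallel models.

import zlib
from concurrent.futures import ThreadPoolExecutor

def parallel_compress(data: bytes, block_size: int = 1 << 16):
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    with ThreadPoolExecutor() as pool:
        return list(pool.map(zlib.compress, blocks))   # one codestream per block

def parallel_decompress(blocks):
    return b''.join(zlib.decompress(b) for b in blocks)

text = b'abracadabra ' * 10_000
compressed = parallel_compress(text)
assert parallel_decompress(compressed) == text
print(len(text), sum(len(b) for b in compressed))
```

Block independence is what buys the speedup, at a small cost in compression ratio, since redundancy spanning block boundaries goes unexploited.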
Breadth First Search Vectorization on the Intel Xeon Phi
Breadth First Search (BFS) is a building block for graph algorithms and has
recently been used for large scale analysis of information in a variety of
applications including social networks, graph databases and web searching. Due
to its importance, a number of different parallel programming models and
architectures have been exploited to optimize the BFS. However, due to the
irregular memory access patterns and the unstructured nature of the large
graphs, its efficient parallelization is a challenge. The Xeon Phi is a
massively parallel architecture available as an off-the-shelf accelerator,
which includes a powerful 512 bit vector unit with optimized scatter and gather
functions. Given its potential benefits, work related to graph traversing on
this architecture is an active area of research.
We present a set of experiments in which we explore architectural features of the Xeon Phi and how best to exploit them in a top-down BFS algorithm; the techniques, however, can also be applied to the current state-of-the-art hybrid (top-down plus bottom-up) algorithms.
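For reference, the level-synchronous top-down BFS that the paper vectorizes can be sketched as follows; plain Python stands in for the OpenMP-plus-intrinsics implementation, and the graph shown is a made-up example:

```python
# Level-synchronous top-down BFS: each iteration expands the current
# frontier and assigns a level to every previously unvisited neighbour.
# The paper vectorizes the inner neighbour loop with 512-bit intrinsics.

def bfs_top_down(adj, source):
    """adj: adjacency list {vertex: [neighbours]}. Returns BFS levels."""
    level = {source: 0}
    frontier = [source]
    depth = 0
    while frontier:
        depth += 1
        next_frontier = []
        for u in frontier:                  # expand every frontier vertex
            for v in adj[u]:
                if v not in level:          # first visit sets the level
                    level[v] = depth
                    next_frontier.append(v)
        frontier = next_frontier
    return level

graph = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2, 4], 4: [3]}
print(bfs_top_down(graph, 0))   # → {0: 0, 1: 1, 2: 1, 3: 2, 4: 3}
```

The irregular, data-dependent access pattern of the inner loop is exactly what makes scatter/gather support and careful data alignment matter on the Xeon Phi.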
We focus on the exploitation of the vector unit by developing an improved, highly vectorized OpenMP parallel algorithm using vector intrinsics, and by examining the use of data alignment and prefetching. In addition, we investigate the impact of hyperthreading and thread affinity on performance, a topic that appears under-researched in the literature. As a result, we achieve what we believe is the fastest published top-down BFS algorithm on the version of the Xeon Phi used in our experiments. The vectorized top-down BFS source code presented in this paper is available on request as free-to-use software.
Self-organizing lists on the Xnet
The first parallel designs for implementing self-organizing lists on the Xnet interconnection network are presented. Self-organizing lists permute the order of list entries after an entry is accessed, according to some update heuristic. The heuristic attempts to place frequently requested entries closer to the front of the list. This paper outlines Xnet systems for self-organizing lists under the move-to-front and transpose update heuristics. Our novel designs can be used to achieve high-speed lossless text compression.
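The two update heuristics named above are simple to state in sequential form; the sketch below illustrates them (the Xnet designs in the paper parallelize these permutations, which this plain-Python version does not attempt):

```python
# The two self-organizing-list update heuristics: move-to-front promotes
# an accessed entry to the head of the list; transpose swaps it with its
# immediate predecessor. Sequential illustrations only.

def access_move_to_front(lst, key):
    i = lst.index(key)
    lst.insert(0, lst.pop(i))        # promote accessed entry to the front

def access_transpose(lst, key):
    i = lst.index(key)
    if i > 0:
        lst[i - 1], lst[i] = lst[i], lst[i - 1]  # swap with predecessor

a = ['a', 'b', 'c', 'd']
access_move_to_front(a, 'c')
print(a)                             # → ['c', 'a', 'b', 'd']

b = ['a', 'b', 'c', 'd']
access_transpose(b, 'c')
print(b)                             # → ['a', 'c', 'b', 'd']
```

Move-to-front adapts quickly to shifting access patterns, while transpose is more conservative; move-to-front is also the basis of the MTF transform used in lossless text compression, which is the application the paper targets.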
GraphH: High Performance Big Graph Analytics in Small Clusters
It is common for real-world applications to analyze big graphs using
distributed graph processing systems. Popular in-memory systems require an
enormous amount of resources to handle big graphs. While several out-of-core
approaches have been proposed for processing big graphs on disk, the high disk
I/O overhead could significantly reduce performance. In this paper, we propose
GraphH to enable high-performance big graph analytics in small clusters.
Specifically, we design a two-stage graph partition scheme to evenly divide the
input graph into partitions, and propose a GAB (Gather-Apply-Broadcast)
computation model to make each worker process a partition in memory at a time.
We use an edge cache mechanism to reduce the disk I/O overhead, and design a
hybrid strategy to improve the communication performance. GraphH can
efficiently process big graphs in small clusters or even a single commodity
server. Extensive evaluations have shown that GraphH can be up to 7.8x faster than popular in-memory systems, such as Pregel+ and PowerGraph, when processing generic graphs, and more than 100x faster than recently proposed out-of-core systems, such as GraphD and Chaos, when processing big graphs.
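The GAB (Gather-Apply-Broadcast) model described above can be illustrated with one superstep of a PageRank-style update; the partitioning and values below are made-up, and the per-worker scheduling and edge cache are omitted:

```python
# Hedged sketch of one GAB (Gather-Apply-Broadcast) superstep: each
# worker gathers neighbour contributions from its edge partition, an
# apply step combines them into new vertex values, and the broadcast
# makes the new values visible to all workers. PageRank-style update
# shown for illustration; GraphH's disk/caching machinery is omitted.

def gab_superstep(partitions, values, out_degree, damping=0.85):
    n = len(values)
    gathered = {v: 0.0 for v in values}
    for edges in partitions:                 # gather: one partition per worker
        for src, dst in edges:
            gathered[dst] += values[src] / out_degree[src]
    # apply: combine gathered sums into new vertex values
    new_values = {v: (1 - damping) / n + damping * g
                  for v, g in gathered.items()}
    return new_values                        # broadcast: shared with all workers

edges = [(0, 1), (0, 2), (1, 2), (2, 0)]
partitions = [edges[:2], edges[2:]]          # two evenly divided partitions
deg = {0: 2, 1: 1, 2: 1}
values = {0: 1 / 3, 1: 1 / 3, 2: 1 / 3}
print(gab_superstep(partitions, values, deg))
```

Because each worker only ever holds one partition's edges in memory at a time, the working set stays small, which is the property that lets GraphH run big graphs on small clusters.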
Processing Posting Lists Using OpenCL
One of the main requirements of internet search engines is the ability to retrieve relevant results with fast response times. Yioop is an open source search engine designed and developed in PHP by Dr. Chris Pollett. The goal of this project is to explore the possibilities of enhancing the performance of Yioop by substituting resource-intensive existing PHP functions with C-based native PHP extensions and the parallel data processing technology OpenCL. OpenCL leverages the Graphical Processing Unit (GPU) of a computer system for performance improvements.
Some of the critical functions in search engines are resource-intensive in terms of processing power, memory, and I/O usage. The processing times vary based on the complexity and magnitude of the data involved. This project involves different phases, such as identifying critical resource-intensive functions, initially replacing such methods with PHP extensions, and eventually experimenting with OpenCL code. We also ran performance tests to measure the reduction in processing times. From our results, we concluded that PHP extensions and OpenCL processing resulted in performance improvements.
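A core posting-list operation in any search engine is intersecting the sorted document-ID lists of two query terms; it is the kind of tight loop a project like this would offload to native extensions or the GPU. The Python below only shows the logic, with made-up document IDs:

```python
# Posting-list intersection sketch: find documents containing both query
# terms by merging two ascending lists of document IDs. In a search
# engine this loop is a natural candidate for native/GPU offloading.

def intersect_postings(p1, p2):
    """p1, p2: ascending lists of document IDs."""
    i = j = 0
    result = []
    while i < len(p1) and j < len(p2):
        if p1[i] == p2[j]:
            result.append(p1[i])             # document matches both terms
            i += 1
            j += 1
        elif p1[i] < p2[j]:
            i += 1                           # advance the smaller pointer
        else:
            j += 1
    return result

print(intersect_postings([1, 4, 7, 9, 12], [2, 4, 9, 13]))   # → [4, 9]
```

The two-pointer merge runs in time linear in the combined list lengths, and its data-parallel variants map well onto OpenCL work-items.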