Pauli Tomography: complete characterization of a single qubit device
The marriage of Quantum Physics and Information Technology, originally
motivated by the need for miniaturization, has recently opened the way to the
realization of radically new information-processing devices, with the
possibility of guaranteed secure cryptographic communications, and tremendous
speedups of some complex computational tasks. Among the many problems posed by
the new information technology there is the need of characterizing the new
quantum devices, making a complete identification and characterization of their
functioning. As we will see, quantum mechanics provides us with a powerful tool
to achieve the task easily and efficiently: this tool is the so-called quantum
entanglement, the basis of the quantum parallelism of future computers. We
present here the first full experimental quantum characterization of a
single-qubit device. The new method, which we may refer to as "quantum
radiography", uses Pauli quantum tomography at the output of the device and
needs only a single entangled state at the input, which acts on the test
channel as all possible input states in quantum parallel. The method can be
easily extended to any n-qubit device.
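As a rough illustration of the Pauli tomography idea mentioned in this abstract, the sketch below reconstructs a single-qubit density matrix from the three Pauli expectation values using the standard identity rho = (I + <X> X + <Y> Y + <Z> Z) / 2. It covers only the single-qubit building block, not the entangled two-qubit measurement described in the abstract. The function names `reconstruct_density_matrix` and `expectation` are hypothetical, and the expectation values are assumed to be already estimated from measurement counts.

```python
# Single-qubit state reconstruction from Pauli expectation values.
# Matrices are plain 2x2 nested lists, so only the standard library is needed.

I2 = [[1, 0], [0, 1]]
X = [[0, 1], [1, 0]]
Y = [[0, -1j], [1j, 0]]
Z = [[1, 0], [0, -1]]


def reconstruct_density_matrix(ex, ey, ez):
    """Return rho = (I + ex*X + ey*Y + ez*Z) / 2 as a 2x2 nested list."""
    return [
        [(I2[r][c] + ex * X[r][c] + ey * Y[r][c] + ez * Z[r][c]) / 2
         for c in range(2)]
        for r in range(2)
    ]


def expectation(rho, pauli):
    """Return the real part of Tr(rho * pauli)."""
    trace = sum(rho[r][k] * pauli[k][r] for r in range(2) for k in range(2))
    return trace.real


# Usage: illustrative (assumed) expectation values, not measured data.
rho = reconstruct_density_matrix(0.6, 0.0, 0.8)
print(expectation(rho, X), expectation(rho, Z))
```

Feeding the reconstructed matrix back into `expectation` returns the input values, which is a quick consistency check on the reconstruction.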
Characterization of message-passing overhead on the AP3000 multicomputer
This is a post-peer-review, pre-copyedit version. The final authenticated version is available online at: http://dx.doi.org/10.1109/ICPP.2001.952077
[Abstract] The performance of the communication primitives of parallel computers is critical for the overall system performance. The characterization of the communication overhead is very important to estimate the global performance of parallel applications and to detect possible bottlenecks. In this paper, we evaluate, model and compare the performance of the message-passing libraries provided by the Fujitsu AP3000 multicomputer: MPI/AP, PVM/AP and APlib. Our aim is to fairly characterize the communication primitives using general models and performance metrics.
Ministerio de Ciencia y Tecnología; 1FD97-0118-C02
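The abstract does not say which models the authors use. One common way to characterize point-to-point message-passing overhead is the linear model T(n) = t0 + n / B, where t0 is the startup latency and B the asymptotic bandwidth. The sketch below fits that model by ordinary least squares, assuming this model form. The function name `fit_latency_bandwidth` and the sample timings are hypothetical, not measurements from the AP3000.

```python
# Least-squares fit of the linear communication model T(n) = t0 + n / B.

def fit_latency_bandwidth(sizes, times):
    """Return (latency_seconds, bandwidth_bytes_per_second).

    sizes: message sizes in bytes; times: one-way transfer times in seconds.
    """
    count = len(sizes)
    mean_size = sum(sizes) / count
    mean_time = sum(times) / count
    sxx = sum((s - mean_size) ** 2 for s in sizes)
    sxy = sum((s - mean_size) * (t - mean_time) for s, t in zip(sizes, times))
    slope = sxy / sxx                       # seconds per byte, i.e. 1 / B
    latency = mean_time - slope * mean_size  # intercept t0
    return latency, 1.0 / slope


# Usage with synthetic (assumed) timings generated from t0 = 50 us, B = 100 MB/s.
sizes = [1024, 4096, 16384, 65536, 262144]
times = [50e-6 + n / 100e6 for n in sizes]
latency, bandwidth = fit_latency_bandwidth(sizes, times)
print(latency, bandwidth)
```

With real ping-pong measurements the fit is usually done separately for short and long messages, since libraries often switch protocols at a size threshold.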
Parallel linear algebra on clusters
Parallel performance optimization is being applied and further improvements are studied for parallel linear algebra on clusters. Several parallelization guidelines have been defined and are being used on single clusters and local area networks used for parallel computing. In this context, some linear algebra parallel algorithms have been implemented following the parallelization guidelines, and experimentation has shown very good performance. The parallel algorithms also outperform the corresponding parallel algorithms implemented in ScaLAPACK (Scalable LAPACK), which is considered to have highly optimized parallel algorithms for distributed memory parallel computers. Using more than a single cluster or local area network for parallel linear algebra computing seems to be a natural approach, taking into account the high availability of such computing platforms in academic/research environments. In this context of multiple clusters, there are many interesting challenges, and many of them are still to be exactly defined and/or characterized. Intercluster communication performance characterization seems to be the first factor to be precisely quantified, and it is expected that communication performance quantification will give a starting point from which to analyze current and future approaches for parallel performance using more than one cluster or local area network for parallel cooperating processing.
Track: Other
High-performance cluster computing, algorithms, implementations and performance evaluation for computation-intensive applications to promote complex scientific research on turbulent flows
Large-scale high-performance computing is a rapidly growing field of research that plays a vital role in the advance of science, engineering, and modern industrial technology. Increasing sophistication in research has led to a need for bigger and faster computers or computer clusters, and high-performance computer systems are themselves stimulating the redevelopment of the methods of computation. Computing is fast becoming the most frequently used technique to explore new questions. We have developed a high-performance computer simulation modeling software system for turbulent flows. Five papers are selected for presentation here from the dozens published in the course of our work on complex software system development and knowledge discovery through computer simulations. The first paper describes the end-to-end computer simulation system development and simulation results that help understand the nature of complex shelterbelt turbulent flows. The second paper deals specifically with high-performance algorithm design and implementation on a cluster of computers. The third paper discusses the twelve design processes of parallel algorithms and software systems as well as theoretical performance modeling and characterization of cluster computing. The fourth paper is about the computing framework of drag and pressure coefficients. The fifth paper is about simulated evapotranspiration and energy partition of inhomogeneous ecosystems. We discuss the end-to-end computer simulation system software development, distributed parallel computing performance modeling and system performance characterization. We design and compare several parallel implementations of our computer simulation system and show that the performance depends on algorithm design, communication channel pattern, and coding strategies that significantly impact load balancing, speedup, and computing efficiency.
For given cluster communication characteristics and a given problem complexity, there exists an optimal number of nodes. With this computer simulation system, we resolved many historically controversial issues and a number of important problems.
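The claim that an optimal number of nodes exists can be illustrated with a deliberately simple cost model, which is an assumption here and not the model from the papers: run time T(p) = W / p + c * p, where W is the single-node compute time and c a per-node communication cost. Compute time shrinks with p while communication grows, so T(p) has a minimum near sqrt(W / c). The names `run_time` and `best_node_count` are hypothetical.

```python
# Toy model of run time versus node count: T(p) = work / p + comm_cost * p.

def run_time(nodes, work, comm_cost):
    """Modelled run time on `nodes` nodes (same time unit as `work`)."""
    return work / nodes + comm_cost * nodes


def best_node_count(work, comm_cost, max_nodes):
    """Return the node count in 1..max_nodes with the smallest modelled run time."""
    return min(range(1, max_nodes + 1),
               key=lambda p: run_time(p, work, comm_cost))


# Usage with assumed values: 100 s of work, 1 s of communication cost per node.
print(best_node_count(100.0, 1.0, 64))
```

Under this model, adding nodes beyond the optimum makes the job slower, because the communication term outweighs the saved compute time.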
Parallel linear algebra on clusters
Parallel performance optimization is being applied and further improvements are studied for parallel linear algebra on clusters. Several parallelization guidelines have been defined and are being used on single clusters and local area networks used for parallel computing. In this context, some linear algebra parallel algorithms have been implemented following the parallelization guidelines, and experimentation has shown very good performance. The parallel algorithms also outperform the corresponding parallel algorithms implemented in ScaLAPACK (Scalable LAPACK), which is considered to have highly optimized parallel algorithms for distributed memory parallel computers. Using more than a single cluster or local area network for parallel linear algebra computing seems to be a natural approach, taking into account the high availability of such computing platforms in academic/research environments. In this context of multiple clusters, there are many interesting challenges, and many of them are still to be exactly defined and/or characterized. Intercluster communication performance characterization seems to be the first factor to be precisely quantified, and it is expected that communication performance quantification will give a starting point from which to analyze current and future approaches for parallel performance using more than one cluster or local area network for parallel cooperating processing.
Track: Others
Red de Universidades con Carreras en Informática (RedUNCI)
FPGA Acceleration of Communication-Bound Streaming Applications: Architecture Modeling and a 3D Image Compositing Case Study
Reconfigurable computers usually provide a limited
number of different memory resources, such as host
memory, external memory, and on-chip memory with different
capacities and communication characteristics. A key
challenge for achieving high performance with reconfigurable
accelerators is the efficient utilization of the available memory
resources. A detailed knowledge of the memories' parameters
is key for generating an optimized communication layout. In this paper, we discuss a benchmarking environment
for generating such a characterization. The environment is
built on IMORC, our architectural template and on-chip
network for creating reconfigurable accelerators. We provide
a characterization of the memory resources available on the
XtremeData XD1000 reconfigurable computer. Based on this
data, we present as a case study the implementation of a 3D
image compositing accelerator that is able to double the frame rate
of a parallel renderer
Performance Evaluation of Automatically Generated Data Parallel Programs
In this paper, the problem of evaluating the performance of parallel programs generated by data parallel compilers is studied. These compilers take as input an application written in a sequential language augmented with data distribution directives and produce a parallel version based on the specified partitioning of data. A methodology for evaluating the relationships existing among the program characteristics, the data distribution adopted, and the performance indices measured during the program execution is described. It consists of three phases: a "static" description of the program under study, a "dynamic" description, based on the measurement and the analysis of its execution on a real system, and the construction of a workload model, by using workload characterization techniques. Following such a methodology, decisions related to the selection of the data distribution to be adopted can be facilitated. The approach is illustrated through the use of the Pandore environment, designed for the execution of sequential programs on distributed memory parallel computers. It is composed of a compiler, a runtime system and tools for trace and profile generation. The results of an experiment explaining the methodology are presented.
Paralellized ensemble Kalman filter for hydraulic conductivity characterization
The ensemble Kalman filter (EnKF) is nowadays recognized as an excellent inverse method for hydraulic conductivity characterization using transient piezometric head data. Its implementation is well suited for a parallel computing environment. A parallel code has been designed that uses parallelization both in the forecast step and in the analysis step. In the forecast step, each member of the ensemble is sent to a different processor, while in the analysis step, the computations of the covariances are distributed between the different processors. An important aspect of the parallelization is to limit as much as possible the communication between the processors in order to maximize execution time reduction.
Four tests are carried out to evaluate the performance of the parallelization with different ensemble and model sizes. The results show the savings provided by the parallel EnKF, especially for a large number of ensemble realizations. (c) 2012 Elsevier Ltd. All rights reserved.
The first author acknowledges the financial support from China Scholarship Council (CSC). Financial support to carry out this work was also received from the Spanish Ministry of Science and Innovation through project CGL2011-23295, and from the Universitat Politecnica de Valencia through project PERFORA.
Xu, T.; Gómez-Hernández, JJ.; Li, L.; Zhou, H. (2013). Paralellized ensemble Kalman filter for hydraulic conductivity characterization. Computers and Geosciences. 52:42-49. https://doi.org/10.1016/j.cageo.2012.10.007
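The forecast and analysis steps described in this abstract can be sketched for a scalar state that is observed directly, which is a strong simplification of the conductivity/head problem. The forward model `forecast_member` is a hypothetical stand-in for the groundwater flow simulator, a thread pool stands in for the separate processors, and the perturbed-observation update with gain K = C / (C + R) is the textbook EnKF form, assumed here and not taken from the paper's code.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def forecast_member(state):
    """Hypothetical forward model standing in for the flow simulator."""
    return 0.9 * state


def enkf_step(ensemble, observation, obs_var, rng):
    """One forecast + analysis cycle for a directly observed scalar state."""
    # Forecast step: members are independent, so each can go to its own worker.
    with ThreadPoolExecutor() as pool:
        forecast = list(pool.map(forecast_member, ensemble))

    # Analysis step: sample variance of the forecast ensemble (the covariance
    # in the scalar case), then the Kalman gain K = C / (C + R).
    size = len(forecast)
    mean = sum(forecast) / size
    variance = sum((x - mean) ** 2 for x in forecast) / (size - 1)
    gain = variance / (variance + obs_var)

    # Each member is nudged toward its own perturbed copy of the observation.
    obs_std = obs_var ** 0.5
    return [x + gain * (observation + rng.gauss(0.0, obs_std) - x)
            for x in forecast]


# Usage with assumed values: three members, observation 5.0, variance 0.1.
rng = random.Random(0)
print(enkf_step([1.0, 2.0, 3.0], 5.0, 0.1, rng))
```

In this sketch the only data gathered across workers is the forecast ensemble itself, which mirrors the abstract's point about keeping inter-processor communication small.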
Functional requirements for gas characterization system computer software
This document provides the Functional Requirements for the Computer Software operating the Gas Characterization System (GCS), which monitors the combustible gasses in the vapor space of selected tanks. Necessary computer functions are defined to support design, testing, operation, and change control. The GCS requires several individual computers to address the control and data acquisition functions of instruments and sensors. These computers are networked for communication, and must multi-task to accommodate operation in parallel
Workload characterization of the shared/buy-in computing cluster at Boston University
Computing clusters provide a complete environment
for computational research, including bio-informatics, machine
learning, and image processing. The Shared Computing Cluster
(SCC) at Boston University is based on a shared/buy-in architecture
that combines shared computers, which are free to be
used by all users, and buy-in computers, which are computers
purchased by users for semi-exclusive use. Although there exists
significant work on characterizing the performance of computing
clusters, little is known about shared/buy-in architectures. Using
data traces, we statistically analyze the performance of the SCC.
Our results show that the average waiting time of a buy-in job
is 16.1% shorter than that of a shared job. Furthermore, we
identify parameters that have a major impact on the performance
experienced by shared and buy-in jobs. These parameters include
the type of parallel environment and the run time limit (i.e., the
maximum time during which a job can use a resource). Finally,
we show that the semi-exclusive paradigm, which allows any SCC
user to use idle buy-in resources for a limited time, increases
the utilization of buy-in resources by 17.4%, thus significantly
improving the performance of the system as a whole.
http://people.bu.edu/staro/MIT_Conference_Yoni.pdf
Accepted manuscript