
    Pauli Tomography: complete characterization of a single qubit device

    The marriage of quantum physics and information technology, originally motivated by the need for miniaturization, has recently opened the way to the realization of radically new information-processing devices, with the possibility of guaranteed secure cryptographic communications and tremendous speedups of some complex computational tasks. Among the many problems posed by the new information technology is the need to characterize the new quantum devices, completely identifying and characterizing their functioning. As we will see, quantum mechanics provides us with a powerful tool to achieve this task easily and efficiently: this tool is the so-called quantum entanglement, the basis of the quantum parallelism of future computers. We present here the first full experimental quantum characterization of a single-qubit device. The new method, which we may refer to as "quantum radiography", uses Pauli quantum tomography at the output of the device and needs only a single entangled state at the input, which acts on the test channel as all possible input states in quantum parallel. The method can be easily extended to any n-qubit device.
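    The core of the scheme is linear-algebraic: acting with the unknown channel on one half of a maximally entangled pair yields a two-qubit output state whose Pauli correlations determine the channel completely. A minimal numpy sketch follows; the amplitude-damping channel, its parameter gamma, and all variable names are illustrative assumptions, not the paper's experimental setup.

```python
import numpy as np
from itertools import product

# Single-qubit Pauli basis
I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
paulis = [I2, X, Y, Z]

# Maximally entangled input |Phi+> = (|00> + |11>)/sqrt(2)
phi = np.array([1, 0, 0, 1], dtype=complex) / np.sqrt(2)
rho_in = np.outer(phi, phi.conj())

# Hypothetical device under test: an amplitude-damping channel (Kraus form)
gamma = 0.3
K0 = np.array([[1, 0], [0, np.sqrt(1 - gamma)]], dtype=complex)
K1 = np.array([[0, np.sqrt(gamma)], [0, 0]], dtype=complex)

# The channel acts on the first qubit only; the second is left untouched,
# so the single entangled input probes all input states "in quantum parallel".
rho_out = sum(np.kron(K, I2) @ rho_in @ np.kron(K, I2).conj().T
              for K in (K0, K1))

# Pauli tomography: measure every two-qubit Pauli correlation <s_i x s_j>
c = {(i, j): np.trace(rho_out @ np.kron(paulis[i], paulis[j])).real
     for i, j in product(range(4), repeat=2)}

# The 16 correlations determine the output state (hence the channel) exactly
rho_rec = sum(c[i, j] * np.kron(paulis[i], paulis[j])
              for i, j in product(range(4), repeat=2)) / 4
print("reconstruction error:", np.linalg.norm(rho_rec - rho_out))  # ~1e-16
```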

    Characterization of message-passing overhead on the AP3000 multicomputer

    This is a post-peer-review, pre-copyedit version; the final authenticated version is available online at http://dx.doi.org/10.1109/ICPP.2001.952077. The performance of the communication primitives of parallel computers is critical for overall system performance. Characterizing the communication overhead is very important for estimating the global performance of parallel applications and for detecting possible bottlenecks. In this paper, we evaluate, model and compare the performance of the message-passing libraries provided by the Fujitsu AP3000 multicomputer: MPI/AP, PVM/AP and APlib. Our aim is to fairly characterize the communication primitives using general models and performance metrics. Ministerio de Ciencia y Tecnología; 1FD97-0118-C02.
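    The abstract does not reproduce the models themselves; a common way to characterize point-to-point primitives, consistent with the kind of modeling described, is to fit a linear latency/bandwidth (Hockney-style) model to ping-pong timings. A minimal sketch, with invented timing numbers standing in for real measurements:

```python
import numpy as np

# Hypothetical ping-pong measurements for one library (e.g. MPI/AP):
# message size in bytes vs. one-way time in microseconds. Invented numbers.
sizes = np.array([64.0, 256, 1024, 4096, 16384, 65536])
times = np.array([26.0, 31.0, 49.0, 121.0, 410.0, 1590.0])

# Linear latency/bandwidth model: t(m) = t_s + m / B, with startup latency
# t_s and asymptotic bandwidth B, fitted by least squares.
A = np.vstack([np.ones_like(sizes), sizes]).T
(t_s, inv_B), *_ = np.linalg.lstsq(A, times, rcond=None)

print(f"t_s ~ {t_s:.1f} us, B ~ {1.0 / inv_B:.1f} bytes/us")
```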

    Parallel linear algebra on clusters

    Parallel performance optimization is being applied, and further improvements are being studied, for parallel linear algebra on clusters. Several parallelization guidelines have been defined and are being used on single clusters and on local area networks used for parallel computing. In this context, some parallel linear algebra algorithms have been implemented following these guidelines, and experimentation has shown very good performance. The parallel algorithms also outperform the corresponding algorithms implemented in ScaLAPACK (Scalable LAPACK), which is considered to provide highly optimized parallel algorithms for distributed-memory parallel computers. Furthermore, using more than a single cluster or local area network for parallel linear algebra computing seems a natural approach, given the high availability of such computing platforms in academic/research environments. In this multiple-cluster context there are many interesting challenges, many of which are still to be precisely defined and/or characterized. Intercluster communication performance appears to be the first factor to quantify precisely, and this quantification is expected to give a starting point from which to analyze current and future approaches to parallel performance using more than one cluster or local area network for parallel cooperating processing.
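    As a concrete example of the intercluster quantification step mentioned above, here is a minimal ping-pong microbenchmark sketch. It assumes the mpi4py binding, which the paper does not prescribe; run with, e.g., mpiexec -n 2, placing one rank on each cluster.

```python
import time
import numpy as np
from mpi4py import MPI  # assumed binding; not prescribed by the paper

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
REPS = 100

# Ping-pong between ranks 0 and 1 (e.g. one process on each cluster) to
# quantify intercluster latency and effective bandwidth per message size.
for size in (1 << k for k in range(0, 21, 4)):
    buf = np.zeros(size, dtype=np.uint8)
    comm.Barrier()
    start = time.perf_counter()
    for _ in range(REPS):
        if rank == 0:
            comm.Send(buf, dest=1)
            comm.Recv(buf, source=1)
        elif rank == 1:
            comm.Recv(buf, source=0)
            comm.Send(buf, dest=0)
    if rank == 0:
        one_way = (time.perf_counter() - start) / (2 * REPS)
        print(f"{size:8d} B  {one_way * 1e6:9.1f} us  "
              f"{size / one_way / 1e6:7.2f} MB/s")
```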

    High-performance cluster computing, algorithms, implementations and performance evaluation for computation-intensive applications to promote complex scientific research on turbulent flows

    Large-scale high-performance computing is a very rapidly growing field of research that plays a vital role in the advance of science, engineering, and modern industrial technology. Increasing sophistication in research has led to a need for bigger and faster computers or computer clusters, and high-performance computer systems are themselves stimulating the redevelopment of the methods of computation. Computing is fast becoming the most frequently used technique to explore new questions. We have developed a high-performance computer simulation modeling software system for turbulent flows. Five papers, selected from the dozens published in the course of this effort on complex software system development and knowledge discovery through computer simulations, are presented here. The first paper describes the end-to-end computer simulation system development and simulation results that help understand the nature of complex shelterbelt turbulent flows. The second paper deals specifically with high-performance algorithm design and implementation on a cluster of computers. The third paper discusses the twelve design processes of parallel algorithms and software systems, as well as theoretical performance modeling and characterization of cluster computing. The fourth paper is about the computing framework of drag and pressure coefficients. The fifth paper is about simulated evapotranspiration and energy partition of inhomogeneous ecosystems. We discuss the end-to-end computer simulation system software development, distributed parallel computing performance modeling, and system performance characterization. We design and compare several parallel implementations of our computer simulation system and show that performance depends on algorithm design, communication channel pattern, and coding strategies, which significantly impact load balancing, speedup, and computing efficiency. For given cluster communication characteristics and a given problem complexity, there exists an optimal number of nodes. With this computer simulation system, we resolved many historically controversial issues and many important problems.
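    The claim that an optimal number of nodes exists follows from any model in which per-node compute time shrinks with p while communication overhead grows with p. A toy sketch of such a model (the constants and the linear overhead term are assumptions, not the paper's fitted model):

```python
import numpy as np

# Toy model: T(p) = W/p + c*(p - 1), compute time plus a communication
# overhead that grows with the number of nodes p. W and c are assumed.
W = 1000.0   # total sequential work (arbitrary time units)
c = 2.5      # per-node communication cost

p = np.arange(1, 65)
T = W / p + c * (p - 1)
speedup = T[0] / T

p_opt = p[np.argmin(T)]
print(f"optimal p = {p_opt}, max speedup = {speedup.max():.1f}x")
# Analytically, dT/dp = -W/p**2 + c = 0 gives p* = sqrt(W/c) = 20 here.
```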

    FPGA Acceleration of Communication-Bound Streaming Applications: Architecture Modeling and a 3D Image Compositing Case Study

    Reconfigurable computers usually provide a limited number of different memory resources, such as host memory, external memory, and on-chip memory, with different capacities and communication characteristics. A key challenge for achieving high performance with reconfigurable accelerators is the efficient utilization of the available memory resources, and a detailed knowledge of the memories' parameters is key for generating an optimized communication layout. In this paper, we discuss a benchmarking environment for generating such a characterization. The environment is built on IMORC, our architectural template and on-chip network for creating reconfigurable accelerators. We provide a characterization of the memory resources available on the XtremeData XD1000 reconfigurable computer. Based on this data, we present as a case study the implementation of a 3D image compositing accelerator that is able to double the frame rate of a parallel renderer.
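    A full IMORC-based characterization runs on the FPGA itself; as a rough software analogue, the following sketch sweeps buffer sizes and measures effective copy bandwidth, the same size-versus-bandwidth trade-off a communication-layout generator would consume. The sizes and the host-side methodology are illustrative assumptions, not the paper's benchmark.

```python
import time
import numpy as np

# Sweep buffer sizes and measure effective copy bandwidth (read + write).
# A layout generator would consume exactly this kind of size/bandwidth table.
for size_kb in (4, 64, 1024, 16384, 65536):
    n = size_kb * 1024 // 8                    # number of float64 elements
    src = np.random.rand(n)
    dst = np.empty_like(src)
    reps = max(1, (256 * 1024 * 1024) // (size_kb * 1024))
    start = time.perf_counter()
    for _ in range(reps):
        np.copyto(dst, src)
    elapsed = time.perf_counter() - start
    traffic_gb = reps * size_kb * 1024 * 2 / 1e9   # bytes read + written
    print(f"{size_kb:8d} KB  {traffic_gb / elapsed:6.2f} GB/s")
```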

    Performance Evaluation of Automatically Generated Data Parallel Programs

    In this paper, the problem of evaluating the performance of parallel programs generated by data-parallel compilers is studied. These compilers take as input an application written in a sequential language augmented with data-distribution directives and produce a parallel version based on the specified partitioning of data. A methodology for evaluating the relationships among the program characteristics, the data distribution adopted, and the performance indices measured during program execution is described. It consists of three phases: a "static" description of the program under study; a "dynamic" description, based on the measurement and analysis of its execution on a real system; and the construction of a workload model using workload characterization techniques. Following such a methodology, decisions related to the selection of the data distribution to be adopted can be facilitated. The approach is demonstrated through the use of the Pandore environment, designed for the execution of sequential programs on distributed-memory parallel computers; it is composed of a compiler, a runtime system, and tools for trace and profile generation. The results of an experiment illustrating the methodology are presented.
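    For the workload-model phase, one common workload characterization technique is to cluster per-execution metric vectors. A self-contained sketch with invented metrics (the metric set, the data, and the use of k-means are assumptions; the Pandore tooling is not modeled here):

```python
import numpy as np

rng = np.random.default_rng(0)

# Invented per-execution metric vectors: (compute time, comm volume, idle time)
metrics = np.vstack([
    rng.normal([10.0, 2.0, 0.5], 0.5, size=(20, 3)),  # compute-bound runs
    rng.normal([4.0, 8.0, 3.0], 0.5, size=(20, 3)),   # communication-bound runs
])

def kmeans(x, k, iters=50):
    """Plain k-means: alternate nearest-center assignment and centroid update."""
    centers = x[rng.choice(len(x), size=k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((x[:, None, :] - centers) ** 2).sum(-1), axis=1)
        centers = np.array([x[labels == j].mean(axis=0) for j in range(k)])
    return labels, centers

labels, centers = kmeans(metrics, k=2)
print("workload classes (compute, comm, idle):")
print(centers.round(2))
```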

    Parallelized ensemble Kalman filter for hydraulic conductivity characterization

    The ensemble Kalman filter (EnKF) is nowadays recognized as an excellent inverse method for hydraulic conductivity characterization using transient piezometric head data, and its implementation is well suited to a parallel computing environment. A parallel code has been designed that uses parallelization in both the forecast step and the analysis step. In the forecast step, each member of the ensemble is sent to a different processor, while in the analysis step the computation of the covariances is distributed among the processors. An important aspect of the parallelization is to limit as much as possible the communication between processors in order to maximize the reduction in execution time. Four tests are carried out to evaluate the performance of the parallelization with different ensemble and model sizes. The results show the savings provided by the parallel EnKF, especially for large numbers of ensemble realizations. (c) 2012 Elsevier Ltd. All rights reserved. The first author acknowledges financial support from the China Scholarship Council (CSC). Financial support to carry out this work was also received from the Spanish Ministry of Science and Innovation through project CGL2011-23295, and from the Universitat Politecnica de Valencia through project PERFORA. Xu, T.; Gómez-Hernández, J. J.; Li, L.; Zhou, H. (2013). Parallelized ensemble Kalman filter for hydraulic conductivity characterization. Computers and Geosciences 52:42-49. https://doi.org/10.1016/j.cageo.2012.10.007
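    To make the parallelized pieces concrete, here is a minimal stochastic-EnKF analysis step in numpy. The forecast ensemble X (one column per member, one member per processor in the paper's forecast step) and the covariance products PHT and S (whose computation the paper distributes across processors) are the parallelizable parts; all sizes and the observation setup below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
n_state, n_obs, n_ens = 500, 20, 100   # illustrative sizes

# Forecast ensemble: one column per member (each member is advanced on its
# own processor in the paper's forecast step).
X = rng.normal(size=(n_state, n_ens))

# Observe the piezometric head at n_obs cells (linear observation operator H)
H = np.zeros((n_obs, n_state))
H[np.arange(n_obs), np.arange(n_obs) * (n_state // n_obs)] = 1.0
R = 0.01 * np.eye(n_obs)               # observation-error covariance
y = rng.normal(size=n_obs)             # synthetic observations

# Analysis step: the covariance products below are what gets distributed
# across processors (e.g. row blocks of PHT).
A = X - X.mean(axis=1, keepdims=True)          # ensemble anomalies
PHT = A @ (H @ A).T / (n_ens - 1)              # cross-covariance P H^T
S = H @ PHT + R                                # innovation covariance

# Perturbed-observation update of every ensemble member
Y = y[:, None] + rng.multivariate_normal(np.zeros(n_obs), R, size=n_ens).T
X_a = X + PHT @ np.linalg.solve(S, Y - H @ X)
print("analysis ensemble:", X_a.shape)
```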

    Workload characterization of the shared/buy-in computing cluster at Boston University

    Computing clusters provide a complete environment for computational research, including bioinformatics, machine learning, and image processing. The Shared Computing Cluster (SCC) at Boston University is based on a shared/buy-in architecture that combines shared computers, which are free to be used by all users, and buy-in computers, which are purchased by individual users for semi-exclusive use. Although there exists significant work on characterizing the performance of computing clusters, little is known about shared/buy-in architectures. Using data traces, we statistically analyze the performance of the SCC. Our results show that the average waiting time of a buy-in job is 16.1% shorter than that of a shared job. Furthermore, we identify parameters that have a major impact on the performance experienced by shared and buy-in jobs, including the type of parallel environment and the run-time limit (i.e., the maximum time during which a job can use a resource). Finally, we show that the semi-exclusive paradigm, which allows any SCC user to use idle buy-in resources for a limited time, increases the utilization of buy-in resources by 17.4%, thus significantly improving the performance of the system as a whole. http://people.bu.edu/staro/MIT_Conference_Yoni.pdf Accepted manuscript
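    The per-class waiting-time comparison above can be reproduced from a job trace with a few lines of pandas. The schema and the numbers below are invented for illustration; the actual SCC trace format is not given in the abstract.

```python
import pandas as pd

# Invented five-job trace; a real SCC trace would have thousands of rows.
trace = pd.DataFrame({
    "job_class": ["shared", "buy-in", "shared", "buy-in", "shared"],
    "submit":    [0.0, 5.0, 10.0, 12.0, 20.0],   # submission times (s)
    "start":     [3.0, 7.0, 18.0, 13.5, 27.0],   # dispatch times (s)
})
trace["wait"] = trace["start"] - trace["submit"]

mean_wait = trace.groupby("job_class")["wait"].mean()
print(mean_wait)
reduction = 100 * (1 - mean_wait["buy-in"] / mean_wait["shared"])
print(f"buy-in jobs wait {reduction:.1f}% less than shared jobs (toy data)")
```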