
    Qubit-Qutrit Separability-Probability Ratios

    Paralleling our recent computationally-intensive (quasi-Monte Carlo) work for the case N=4 (quant-ph/0308037), we undertake the task for N=6 of computing to high numerical accuracy, the formulas of Sommers and Zyczkowski (quant-ph/0304041) for the (N^2-1)-dimensional volume and (N^2-2)-dimensional hyperarea of the (separable and nonseparable) N x N density matrices, based on the Bures (minimal monotone) metric -- and also their analogous formulas (quant-ph/0302197) for the (non-monotone) Hilbert-Schmidt metric. With the same seven billion well-distributed ("low-discrepancy") sample points, we estimate the unknown volumes and hyperareas based on five additional (monotone) metrics of interest, including the Kubo-Mori and Wigner-Yanase. Further, we estimate all of these seven volume and seven hyperarea (unknown) quantities when restricted to the separable density matrices. The ratios of separable volumes (hyperareas) to separable plus nonseparable volumes (hyperareas) yield estimates of the separability probabilities of generically rank-six (rank-five) density matrices. The (rank-six) separability probabilities obtained based on the 35-dimensional volumes appear to be -- independently of the metric (each of the seven inducing Haar measure) employed -- twice as large as those (rank-five ones) based on the 34-dimensional hyperareas. Accepting such a relationship, we fit exact formulas to the estimates of the Bures and Hilbert-Schmidt separable volumes and hyperareas. (An additional estimate -- 33.9982 -- of the ratio of the rank-6 Hilbert-Schmidt separability probability to the rank-4 one is quite clearly close to integral too.) The doubling relationship also appears to hold for the N=4 case for the Hilbert-Schmidt metric, but not the others. We fit exact formulas for the Hilbert-Schmidt separable volumes and hyperareas. Comment: 36 pages, 15 figures, 11 tables, final PRA version, new last paragraph presenting qubit-qutrit probability ratios disaggregated by the two distinct forms of partial transposition.

    OPENMENDEL: A Cooperative Programming Project for Statistical Genetics

    Statistical methods for genomewide association studies (GWAS) continue to improve. However, the increasing volume and variety of genetic and genomic data make computational speed and ease of data manipulation mandatory in future software. In our view, a collaborative effort of statistical geneticists is required to develop open source software targeted to genetic epidemiology. Our attempt to meet this need is called the OPENMENDEL project (https://openmendel.github.io). It aims to (1) enable interactive and reproducible analyses with informative intermediate results, (2) scale to big data analytics, (3) embrace parallel and distributed computing, (4) adapt to rapid hardware evolution, (5) allow cloud computing, (6) allow integration of varied genetic data types, and (7) foster easy communication between clinicians, geneticists, statisticians, and computer scientists. This article reviews and makes recommendations to the genetic epidemiology community in the context of the OPENMENDEL project. Comment: 16 pages, 2 figures, 2 tables.

    Novel Monte Carlo Methods for Large-Scale Linear Algebra Operations

    Linear algebra operations play an important role in scientific computing and data analysis. With increasing data volume and complexity in the Big Data era, linear algebra operations are important tools for processing massive datasets. On one hand, the advent of modern high-performance computing architectures with increasing computing power has greatly enhanced our capability to deal with a large volume of data. On the other hand, many classical, deterministic numerical linear algebra algorithms have difficulty scaling to large data sets. Monte Carlo methods, which are based on statistical sampling, exhibit many attractive properties in dealing with large volumes of data, including fast approximate results, memory efficiency, reduced data accesses, natural parallelism, and inherent fault tolerance. In this dissertation, we present new Monte Carlo methods to accommodate a set of fundamental and ubiquitous large-scale linear algebra operations, including solving large-scale linear systems, constructing low-rank matrix approximations, and approximating the extreme eigenvalues/eigenvectors, across modern distributed and parallel computing architectures. First, we revisit the classical Ulam-von Neumann Monte Carlo algorithm and derive the necessary and sufficient condition for its convergence. To support a broad family of linear systems, we develop Krylov subspace Monte Carlo solvers that go beyond the use of Neumann series. New algorithms used in the Krylov subspace Monte Carlo solvers include (1) a Breakdown-Free Block Conjugate Gradient algorithm to address the potential rank deficiency problem that occurs in block Krylov subspace methods; (2) a Block Conjugate Gradient for Least Squares (BCGLS) algorithm to stably approximate the least squares solutions of general linear systems; (3) a BCGLS algorithm with deflation to accelerate convergence; and (4) a Monte Carlo Generalized Minimal Residual algorithm based on sampling matrix-vector products to provide fast approximation of solutions. Secondly, we design a rank-revealing randomized Singular Value Decomposition (R3SVD) algorithm for adaptively constructing low-rank matrix approximations to satisfy application-specific accuracy. Thirdly, we study the block power method on Markov Chain Monte Carlo transition matrices and find that convergence actually depends on the number of independent vectors in the block. Correspondingly, we develop a sliding window power method to find the stationary distribution, which has proven successful in modeling a stochastic luminal calcium release site. Fourthly, we take advantage of hybrid CPU-GPU computing platforms to accelerate the performance of the Breakdown-Free Block Conjugate Gradient algorithm and the randomized Singular Value Decomposition algorithm. Finally, we design a Gaussian variant of Freivalds’ algorithm to efficiently verify the correctness of matrix-matrix multiplication while avoiding undetectable fault patterns encountered in deterministic algorithms.
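The abstract does not spell out the Gaussian variant of Freivalds’ algorithm, so the following is only a minimal sketch of the general idea, assuming the check is "draw a random Gaussian vector x and compare A(Bx) with Cx up to a floating-point tolerance". The function names, the number of trials, and the tolerance are illustrative assumptions, not the dissertation's implementation.

```python
import random


def matvec(M, x):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(m_ij * x_j for m_ij, x_j in zip(row, x)) for row in M]


def gaussian_freivalds(A, B, C, trials=3, rel_tol=1e-9, seed=None):
    """Probabilistically check whether A @ B equals C.

    Each trial costs three matrix-vector products instead of a full
    matrix-matrix product. The trial count and tolerance are assumed
    values chosen for illustration.
    """
    rng = random.Random(seed)
    n = len(B[0])  # number of columns of B (and of C)
    for _ in range(trials):
        # Standard normal entries rather than the classical 0/1 entries.
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        lhs = matvec(A, matvec(B, x))  # A(Bx)
        rhs = matvec(C, x)             # Cx
        scale = max(1.0, max(abs(v) for v in lhs), max(abs(v) for v in rhs))
        if any(abs(l - r) > rel_tol * scale for l, r in zip(lhs, rhs)):
            return False  # a mismatch proves that A @ B != C
    return True  # no mismatch found; C is accepted as the product


A = [[1.0, 2.0], [3.0, 4.0]]
B = [[5.0, 6.0], [7.0, 8.0]]
C_good = [[19.0, 22.0], [43.0, 50.0]]
C_bad = [[19.0, 22.0], [43.0, 51.0]]  # one corrupted entry
```

With continuous Gaussian entries, a nonzero error matrix C - AB maps the random vector to zero only on a set of probability zero, which is the intuition behind avoiding fault patterns that a fixed deterministic check vector could miss.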

    Accelerating science: The usage of commercial clouds in ATLAS Distributed Computing

    The ATLAS experiment at CERN is one of the largest scientific machines built to date and will have ever growing computing needs as the Large Hadron Collider collects an increasingly large volume of data over the next 20 years. ATLAS is conducting R&D projects on Amazon Web Services and Google Cloud as complementary resources for distributed computing, focusing on some of the key features of commercial clouds: lightweight operation, elasticity and availability of multiple chip architectures. The proof of concept phases have concluded with the cloud-native, vendor-agnostic integration with the experiment’s data and workload management frameworks. Google Cloud has been used to evaluate elastic batch computing, ramping up ephemeral clusters of up to O(100k) cores to process tasks requiring quick turnaround. Amazon Web Services has been exploited for the successful physics validation of the Athena simulation software on ARM processors. We have also set up an interactive facility for physics analysis allowing end users to spin up private, on-demand clusters for parallel computing with up to 4 000 cores, or run GPU-enabled notebooks and jobs for machine learning applications. The success of the proof of concept phases has led to the extension of the Google Cloud project, where ATLAS will study the total cost of ownership of a production cloud site over 15 months with 10k cores on average, fully integrated with distributed grid computing resources, and continue the R&D projects.

    Scaling associative classification for very large datasets

    Supervised learning algorithms are nowadays successfully scaling up to datasets that are very large in volume, leveraging the potential of in-memory cluster-computing Big Data frameworks. Still, massive datasets with a number of large-domain categorical features are a difficult challenge for any classifier. Most off-the-shelf solutions cannot cope with this problem. In this work we introduce DAC, a Distributed Associative Classifier. DAC exploits ensemble learning to distribute the training of an associative classifier among parallel workers and improve the final quality of the model. Furthermore, it adopts several novel techniques to reach high scalability without sacrificing quality, including a preventive pruning of classification rules in the extraction phase based on Gini impurity. We ran experiments on Apache Spark, on a real large-scale dataset with more than 4 billion records and 800 million distinct categories. The results showed that DAC improves on a state-of-the-art solution in both prediction quality and execution time. Since the generated model is human-readable, it can not only classify new records, but also allow understanding both the logic behind the prediction and the properties of the model, becoming a useful aid for decision makers.
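For readers unfamiliar with the pruning criterion named above, the standard definition of Gini impurity for a set of class labels is 1 minus the sum of squared class proportions. The sketch below shows only that textbook formula; how DAC applies it to classification rules is not described in the abstract, and the function name is an illustrative assumption.

```python
from collections import Counter


def gini_impurity(labels):
    """Return 1 - sum(p_k^2) over the class proportions p_k in `labels`."""
    n = len(labels)
    if n == 0:
        return 0.0  # convention assumed here for an empty set
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


pure = gini_impurity(["a", "a", "a", "a"])   # a single class
mixed = gini_impurity(["a", "a", "b", "b"])  # two equally frequent classes
```

A value of 0 means every record carries the same label, and larger values indicate a more mixed set of labels.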