
    Query-driven learning for predictive analytics of data subspace cardinality

    Fundamental to many predictive analytics tasks is the ability to estimate the cardinality (number of data items) of multi-dimensional data subspaces defined by query selections over datasets. This is crucial for data analysts dealing with, e.g., interactive data subspace exploration, data subspace visualization, and query processing optimization. However, in many modern data systems, predictive analytics may be (i) too costly money-wise, e.g., in clouds, (ii) unreliable, e.g., in modern Big Data query engines, where accurate statistics are difficult to obtain and maintain, or (iii) infeasible, e.g., for privacy reasons. We contribute a novel, query-driven, function estimation model of analyst-defined data subspace cardinality. The proposed estimation model is highly accurate and accommodates the well-known selection query types: multi-dimensional range queries and distance-nearest-neighbors (radius) queries. Our function estimation model: (i) quantizes the vectorial query space by learning the analysts' access patterns over a data space, (ii) associates query vectors with the cardinalities of their corresponding analyst-defined data subspaces, (iii) abstracts and employs query vectorial similarity to predict the cardinality of an unseen/unexplored data subspace, and (iv) identifies and adapts to possible changes of the query subspaces based on the theory of optimal stopping. The proposed model is decentralized, facilitating the scaling-out of such predictive analytics queries. The research significance of the model lies in that (i) it is an attractive solution when data-driven statistical techniques are undesirable or infeasible, (ii) it offers a scale-out, decentralized training solution, (iii) it is applicable to different selection query types, and (iv) it offers performance superior to that of data-driven approaches.
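    To make step (iii) concrete, here is a minimal, hypothetical sketch of predicting the cardinality of an unseen query from the cardinalities of similar past queries. The query encoding and the similarity-weighted average are simplifications of the paper's learned quantization model; all names and numbers below are illustrative assumptions, not the paper's actual algorithm.

```python
import numpy as np

def predict_cardinality(train_queries, train_cards, query, k=3):
    """Predict the cardinality of an unseen query from similar past queries.

    Simplified sketch of the query-driven idea: past query vectors act as
    prototypes, and the prediction is a similarity-weighted average of the
    cardinalities of the k nearest prototypes. (The paper's model learns
    quantized prototypes and adapts them; this is illustrative only.)
    """
    dists = np.linalg.norm(train_queries - query, axis=1)
    nearest = np.argsort(dists)[:k]
    weights = 1.0 / (dists[nearest] + 1e-9)   # closer queries weigh more
    return float(np.dot(weights, train_cards[nearest]) / weights.sum())

# Toy example: 2-D radius queries encoded as [center_x, center_y, radius]
rng = np.random.default_rng(0)
queries = rng.random((100, 3))
cards = (1000 * queries[:, 2]).astype(float)  # synthetic cardinalities
est = predict_cardinality(queries, cards, np.array([0.5, 0.5, 0.2]))
```

    A prediction is thus a single vector-space lookup over past query/cardinality pairs, requiring no access to the underlying data or its statistics.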

    Results of the SDHCAL technological prototype

    The SDHCAL technological prototype, completed in 2012, was exposed to beams of pions and electrons of different energies at the CERN SPS for a total period of 5 weeks. The data have been analyzed within the CALICE collaboration. Preliminary results indicate that a highly granular hadronic calorimeter conceived for PFA applications is also a powerful tool to separate pions from electrons. The SDHCAL also provides very good energy resolution for hadronic showers. The multi-threshold readout mode shows a clear improvement in resolution at energies exceeding 30 GeV with respect to the binary readout mode. Simulations of pion interactions in the SDHCAL are presented, and new ideas to improve the energy resolution using the topology of hadronic showers are mentioned.
    Comment: Talk presented at the International Workshop on Future Linear Colliders (LCWS13), Tokyo, Japan, 11-15 November 2013.
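    The gain from multi-threshold readout can be illustrated with a toy sketch: instead of a single hit count, the semi-digital readout counts hits above three thresholds and combines them with different weights, so dense shower regions (where several particles cross one pad) contribute more. The weight values below are illustrative placeholders, not the calibrated CALICE coefficients.

```python
def sdhcal_energy(n1, n2, n3, alpha=0.04, beta=0.1, gamma=0.3):
    """Toy semi-digital energy estimate (GeV) from hit counts above the
    three readout thresholds; higher thresholds get larger weights."""
    return alpha * n1 + beta * n2 + gamma * n3

def binary_energy(n_hits, w=0.05):
    """Binary-readout counterpart: one threshold, one weight."""
    return w * n_hits

# A shower with 200 hits, 40 above the medium and 10 above the high threshold
e_semi = sdhcal_energy(200, 40, 10)
e_bin = binary_energy(200)
```

    At high energies, where pads saturate in the binary scheme, the extra threshold counts restore sensitivity to the local particle density, which is the qualitative origin of the resolution improvement above 30 GeV.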

    First results of the SDHCAL technological prototype

    The CALICE semi-digital hadronic calorimeter, built in 2011, was installed and tested during two periods of two weeks each in 2012 at the CERN SPS facilities. The detector has more than 450,000 channels with a semi-digital readout, distributed over 48 layers, with an efficiency exceeding 95%. It has been run in trigger-less and power-pulsing modes. Data have been collected with muon, electron, and hadron beams in the energy range between 5 and 80 GeV. This contribution focuses on the detector performance, the shower selection methods, and the first results of the calibration using pions.
    Comment: Proceedings of the CHEF 2013 (Calorimetry for the High Energy Frontier) international conference, Eds. J.-C. Brient, R. Salerno, and Y. Sirois, ISBN 978-2-7302-1624-1, 2013, pp. 1-488.

    Accelerating scientific codes by performance and accuracy modeling

    Scientific software is often driven by multiple parameters that affect both accuracy and performance. Since finding the optimal configuration of these parameters is a highly complex task, it is extremely common that the software is used suboptimally. In a typical scenario, accuracy requirements are imposed and attained through suboptimal performance. In this paper, we present a methodology for the automatic selection of parameters for simulation codes, and a corresponding prototype tool. To be amenable to our methodology, the target code must expose the parameters affecting accuracy and performance, and formulas must be available for the error bounds and computational complexity of the underlying methods. As a case study, we consider the particle-particle particle-mesh (PPPM) method from the LAMMPS molecular dynamics suite and use our tool to identify configurations of the input parameters that achieve a given accuracy in the shortest execution time. Compared with the configurations suggested by expert users, the parameters selected by our tool yield reductions in time-to-solution ranging between 10% and 60%. In other words, for the typical scenario in which a fixed number of core-hours is granted and simulations of a fixed number of timesteps are to be run, our tool may allow up to twice as many simulations. While we develop our ideas using LAMMPS as the computational framework and the PPPM method for dispersion as the case study, the methodology is general and valid for a range of software tools and methods.
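    The selection step described above can be sketched as a constrained search: enumerate candidate configurations, discard those whose predicted error exceeds the requested tolerance, and keep the cheapest of the rest. The error and cost formulas below are hypothetical stand-ins for the method-specific bounds the tool requires; the real PPPM bounds are considerably more involved.

```python
import itertools

def select_parameters(error_model, cost_model, grids, tol):
    """Pick the parameter configuration with the lowest predicted cost
    among those whose predicted error meets the accuracy requirement.

    Assumes the target code exposes closed-form error and cost formulas;
    error_model and cost_model are hypothetical callables standing in
    for those formulas.
    """
    best, best_cost = None, float("inf")
    for cfg in itertools.product(*grids.values()):
        params = dict(zip(grids.keys(), cfg))
        if error_model(**params) > tol:
            continue                      # violates the accuracy requirement
        cost = cost_model(**params)
        if cost < best_cost:
            best, best_cost = params, cost
    return best, best_cost

# Toy stand-ins: a finer grid spacing h lowers error but raises cost
err = lambda h, order: h ** order
cost = lambda h, order: order / h
cfg, c = select_parameters(err, cost,
                           {"h": [0.1, 0.2, 0.5], "order": [2, 4, 6]},
                           tol=1e-3)
```

    Exhaustive enumeration suffices here because the candidate grid is tiny; the formulas make each candidate's evaluation essentially free, which is what makes this kind of offline search practical.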

    Techniques for clustering gene expression data

    Many clustering techniques have been proposed for the analysis of gene expression data obtained from microarray experiments. However, choosing suitable method(s) for a given experimental dataset is not straightforward. Common approaches do not translate well and fail to take account of the data profile. This review surveys state-of-the-art applications that recognise these limitations and implement procedures to overcome them. It provides a framework for the evaluation of clustering in gene expression analyses. The nature of microarray data is discussed briefly, and selected examples are presented for the clustering methods considered.
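    As a minimal illustration of one of the techniques surveyed, the sketch below runs a bare-bones k-means on a synthetic expression matrix (genes x conditions). The data and the fixed initialisation are contrived for the example; real microarray analyses need the normalisation and cluster-validation steps the review discusses.

```python
import numpy as np

def kmeans(X, k, iters=20):
    """Bare-bones k-means: assign each row to its nearest center, then
    recompute each center as the mean of its assigned rows. Initialised
    deterministically from the first k rows for reproducibility."""
    centers = X[:k].copy()
    for _ in range(iters):
        labels = np.argmin(((X[:, None] - centers) ** 2).sum(-1), axis=1)
        centers = np.array([X[labels == j].mean(0) for j in range(k)])
    return labels

# Two synthetic "expression profiles": up-regulated vs down-regulated genes,
# 10 genes each, measured under 4 conditions
X = np.vstack([np.ones((10, 4)) + 0.01 * np.arange(10)[:, None],
               -np.ones((10, 4)) + 0.01 * np.arange(10)[:, None]])
labels = kmeans(X, 2)
```

    Even this toy case shows why method choice matters: k-means presumes roughly spherical, equally sized groups in Euclidean space, an assumption the review's data-profile checks are designed to test before trusting the partition.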