Query-driven learning for predictive analytics of data subspace cardinality
Fundamental to many predictive analytics tasks is the ability to estimate the cardinality (number of data items) of multi-dimensional data subspaces, defined by query selections over datasets. This is crucial for data analysts dealing with, e.g., interactive data subspace explorations, data subspace visualizations, and query processing optimization. However, in many modern data systems, predictive analytics may be (i) too costly in monetary terms, e.g., in clouds, (ii) unreliable, e.g., in modern Big Data query engines, where accurate statistics are difficult to obtain/maintain, or (iii) infeasible, e.g., due to privacy issues. We contribute a novel, query-driven, function estimation model of analyst-defined data subspace cardinality. The proposed estimation model is highly accurate in terms of prediction and accommodates the well-known selection query types: multi-dimensional range and distance-nearest-neighbors (radius) queries. Our function estimation model: (i) quantizes the vectorial query space, by learning the analysts' access patterns over a data space, (ii) associates query vectors with the corresponding cardinalities of the analyst-defined data subspaces, (iii) abstracts and employs query vectorial similarity to predict the cardinality of an unseen/unexplored data subspace, and (iv) identifies and adapts to possible changes of the query subspaces based on the theory of optimal stopping. The proposed model is decentralized, facilitating the scaling-out of such predictive analytics queries. The research significance of the model lies in that (i) it is an attractive solution when data-driven statistical techniques are undesirable or infeasible, (ii) it offers a scale-out, decentralized training solution, (iii) it is applicable to different selection query types, and (iv) it offers performance superior to that of data-driven approaches.
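The abstract's steps (i)-(iii) can be illustrated with a minimal sketch: quantize the query space with simple online prototype updates (a vector-quantization stand-in for the paper's learning scheme), smooth each prototype's associated cardinality toward observed values, and predict an unseen query's cardinality from its nearest prototype. All class and parameter names here are illustrative assumptions, not the paper's actual model.

```python
import numpy as np

class QueryDrivenCardinalityEstimator:
    """Hypothetical sketch: learn (query vector -> cardinality) pairs
    via online vector quantization of the query space."""

    def __init__(self, n_prototypes=8, lr=0.1, dim=2, seed=0):
        rng = np.random.default_rng(seed)
        self.prototypes = rng.random((n_prototypes, dim))  # quantized query space
        self.cardinalities = np.zeros(n_prototypes)        # one estimate per prototype
        self.lr = lr

    def _nearest(self, query):
        # index of the prototype closest to the query vector
        return int(np.argmin(np.linalg.norm(self.prototypes - query, axis=1)))

    def train(self, query, cardinality):
        j = self._nearest(query)
        # step (i): move the winning prototype toward the observed query
        self.prototypes[j] += self.lr * (query - self.prototypes[j])
        # step (ii): smooth the associated cardinality toward the observed value
        self.cardinalities[j] += self.lr * (cardinality - self.cardinalities[j])

    def predict(self, query):
        # step (iii): predict via similarity to the learned prototypes
        return self.cardinalities[self._nearest(query)]
```

After training on observed (query, cardinality) pairs, `predict` returns the cardinality attached to the most similar learned query prototype; the paper's adaptivity via optimal stopping (step iv) is not modeled here.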
Results of the SDHCAL technological prototype
The SDHCAL technological prototype, completed in 2012, was
exposed to beams of pions and electrons of different energies at the CERN SPS
for a total time period of 5 weeks. The data has been analyzed within the
CALICE collaboration. Preliminary results indicate that a highly granular
hadronic calorimeter conceived for PFA application is also a powerful tool to
separate pions from electrons. The SDHCAL also provides very good energy
resolution for hadronic shower measurements. The use of the multi-threshold readout mode
shows a clear improvement of the resolution at energies exceeding 30 GeV with
respect to the binary readout mode. Simulations of the pion interactions in the
SDHCAL are presented and new ideas to improve on the energy resolution using
the topology of hadronic showers are mentioned.
Comment: Talk presented at the International Workshop on Future Linear Colliders (LCWS13), Tokyo, Japan, 11-15 November 2013.
First results of the SDHCAL technological prototype
The CALICE semi-digital hadronic calorimeter, built in 2011, was installed and
tested during two periods of two weeks each in 2012 at CERN SPS facilities. The
detector has more than 450000 channels with a semi-digital readout distributed
on 48 layers with efficiency exceeding 95%. It has been run using the
trigger-less and power pulsing modes. Data have been collected with muon,
electron and hadron beams in the energy range between 5 and 80 GeV. This
contribution focuses on the detector performance, the shower selection methods, and the first results on the calibration using pions.
Comment: Proceedings of the CHEF 2013 (Calorimetry for the High Energy Frontier) International Conference, Eds. J.-C. Brient, R. Salerno, and Y. Sirois, ISBN 978-2-7302-1624-1, 2013, pages 1-488.
Accelerating scientific codes by performance and accuracy modeling
Scientific software is often driven by multiple parameters that affect both
accuracy and performance. Since finding the optimal configuration of these
parameters is a highly complex task, it is extremely common that the software is
used suboptimally. In a typical scenario, accuracy requirements are imposed,
and attained through suboptimal performance. In this paper, we present a
methodology for the automatic selection of parameters for simulation codes, and
a corresponding prototype tool. To be amenable to our methodology, the target
code must expose the parameters affecting accuracy and performance, and there
must be formulas available for error bounds and computational complexity of the
underlying methods. As a case study, we consider the particle-particle
particle-mesh method (PPPM) from the LAMMPS suite for molecular dynamics, and
use our tool to identify configurations of the input parameters that achieve a
given accuracy in the shortest execution time. When compared with the
configurations suggested by expert users, the parameters selected by our tool
yield reductions in the time-to-solution ranging between 10% and 60%. In other
words, for the typical scenario where a fixed number of core-hours are granted
and simulations of a fixed number of timesteps are to be run, usage of our tool
may allow up to twice as many simulations. While we develop our ideas using
LAMMPS as computational framework and use the PPPM method for dispersion as
case study, the methodology is general and valid for a range of software tools
and methods
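The methodology described above can be sketched in miniature: given analytic formulas for a method's error bound and computational cost over its exposed parameters, enumerate candidate configurations and pick the cheapest one that meets the accuracy requirement. The formulas below are illustrative toy stand-ins, not PPPM's actual error and complexity models, and the function names are assumptions.

```python
import itertools

def error_bound(grid_size, cutoff):
    # toy stand-in: error shrinks with a finer grid and a larger cutoff
    return 1.0 / (grid_size * cutoff)

def cost(grid_size, cutoff):
    # toy stand-in: cost grows with grid resolution and cutoff radius
    return grid_size ** 3 + 50 * cutoff ** 3

def select_parameters(target_error, grid_sizes, cutoffs):
    """Return the (grid_size, cutoff) pair with minimal modeled cost
    among configurations whose error bound meets the target, or None."""
    feasible = [
        (cost(g, c), g, c)
        for g, c in itertools.product(grid_sizes, cutoffs)
        if error_bound(g, c) <= target_error
    ]
    if not feasible:
        return None
    _, g, c = min(feasible)
    return g, c
```

This mirrors the typical scenario in the abstract: the accuracy requirement is imposed up front, and the tool searches for the configuration attaining it in the shortest modeled execution time, rather than leaving the user to guess parameters.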
Techniques for clustering gene expression data
Many clustering techniques have been proposed for the analysis of gene expression data obtained from microarray experiments. However, the choice of suitable method(s) for a given experimental dataset is not straightforward. Common approaches do not translate well between datasets and fail to take account of the data profile. This review paper surveys state-of-the-art applications that recognise these limitations and implement procedures to overcome them. It provides a framework for the evaluation of clustering in gene expression analyses. The nature of microarray data is discussed briefly, and selected examples are presented for the clustering methods considered.