
    Efficient HTTP based I/O on very large datasets for high performance computing with the libdavix library

    Remote data access for data analysis in high performance computing is commonly done with specialized data access protocols and storage systems. These protocols are highly optimized for high throughput on very large datasets, multi-streams, high availability, low latency and efficient parallel I/O. The purpose of this paper is to describe how we have adapted a generic protocol, the Hypertext Transfer Protocol (HTTP), to make it a competitive alternative for high performance I/O and data analysis applications in a global computing grid: the Worldwide LHC Computing Grid. In this work, we first analyze the design differences between HTTP and the most common high performance I/O protocols, pointing out the main performance weaknesses of HTTP. Then, we describe in detail how we solved these issues. Our solutions have been implemented in a toolkit called davix, available through several recent Linux distributions. Finally, we describe the results of our benchmarks, where we compare the performance of davix against an HPC-specific protocol for a data analysis use case. Comment: Presented at: Very Large Data Bases (VLDB) 2014, Hangzho

    Plotting the Differences Between Data and Expectation

    This article proposes a way to improve the presentation of histograms where data are compared to expectation. Sometimes it is difficult to judge by eye whether the difference between the bin content and the theoretical expectation (provided by either a fitting function or another histogram) is just due to statistical fluctuations. More importantly, there could be statistically significant deviations which are completely invisible in the plot. We propose to add a small inset at the bottom of the plot, in which the statistical significance of the deviation observed in each bin is shown. Even though the numerical routines which we developed serve only for illustration, it turns out that they are based on formulae which could be used to perform statistical inference in a proper way. An implementation of our computation is available at https://github.com/dcasadei/psde. Comment: 10 pages, 7 figures. CODE: https://github.com/dcasadei/psd
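The per-bin significance described in the abstract above can be illustrated with a small sketch. This is not the authors' code from the linked repository: it assumes Poisson-distributed bin counts with an exactly known expectation, and the function names (`poisson_pmf`, `bin_significance`, and so on) are hypothetical. The tail probability is converted into a signed Gaussian z-value, and bins whose p-value is 0.5 or larger are reported as 0 (the threshold is likewise an assumption of this sketch).

```python
import math
from statistics import NormalDist


def poisson_pmf(k, mean):
    """Poisson probability of exactly k counts, computed in log space."""
    return math.exp(k * math.log(mean) - mean - math.lgamma(k + 1))


def poisson_lower_tail(k, mean):
    """P(n <= k) for a Poisson distribution with the given mean."""
    return min(1.0, sum(poisson_pmf(i, mean) for i in range(k + 1)))


def poisson_upper_tail(k, mean):
    """P(n >= k), summed term by term until further terms are negligible."""
    total = 0.0
    i = k
    while True:
        term = poisson_pmf(i, mean)
        total += term
        if i > mean and term <= 1e-16 * total:
            break
        i += 1
    return min(total, 1.0)


def bin_significance(observed, expected):
    """Signed z-value of one bin: positive for an excess, negative for a deficit.

    Returns 0.0 when the p-value is >= 0.5 (no noteworthy deviation).
    """
    if expected <= 0:
        raise ValueError("expected count must be positive")
    if observed > expected:
        p, sign = poisson_upper_tail(observed, expected), 1.0
    else:
        p, sign = poisson_lower_tail(observed, expected), -1.0
    if p >= 0.5:
        return 0.0
    if p <= 0.0:
        return sign * math.inf  # p-value underflowed to zero
    # inv_cdf(p) is negative for p < 0.5, so -inv_cdf(p) is the z-value.
    return sign * -NormalDist().inv_cdf(p)


if __name__ == "__main__":
    data = [12, 25, 2, 10]
    expectation = [10.0, 10.0, 10.0, 10.0]
    for d, b in zip(data, expectation):
        print(d, b, round(bin_significance(d, b), 2))
```

The values returned for each bin are what an inset panel below the histogram would display.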

    Sqrt{shat}_{min} resurrected

    We discuss the use of the variable sqrt{shat}_{min}, which has been proposed in order to measure the hard scale of a multi-parton final state event using inclusive quantities only, on a SUSY data sample for a 14 TeV LHC. In its original version, where this variable was proposed at calorimeter level, the direct correlation to the hard scattering scale does not survive when effects from soft physics are taken into account. We show here that when using reconstructed objects instead of calorimeter energy and momenta as input, we manage to recover this correlation for the parameter point considered. We furthermore discuss the effect of including W + jets and t tbar + jets background in our analysis and the use of sqrt{shat}_{min} for the suppression of SM induced background in new physics searches. Comment: 23 pages, 9 figures; v2: 1 figure, several subsections and references as well as new author affiliation added. Corresponds to published versio
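For orientation, the variable discussed in the abstract above is commonly defined as sqrt{shat}_{min}(M_inv) = sqrt(E^2 - P_z^2) + sqrt(MET^2 + M_inv^2), where E and P_z are the total visible energy and longitudinal momentum and M_inv is the assumed total mass of the invisible particles. The sketch below evaluates it from a list of reconstructed objects. The input layout `(E, px, py, pz)`, the function name, and the choice to take the missing transverse momentum as the negative vector sum of the listed objects are all assumptions of this example, not details taken from the paper.

```python
import math


def sqrt_shat_min(objects, m_invisible=0.0):
    """Evaluate sqrt{shat}_{min} from visible objects given as (E, px, py, pz).

    m_invisible is the hypothesised total mass of the invisible particles.
    Missing transverse momentum is approximated here as the magnitude of the
    vector sum of the visible transverse momenta (an assumption of this sketch).
    """
    e_tot = sum(o[0] for o in objects)
    px_tot = sum(o[1] for o in objects)
    py_tot = sum(o[2] for o in objects)
    pz_tot = sum(o[3] for o in objects)

    met = math.hypot(px_tot, py_tot)
    # Guard against small negative values caused by rounding.
    visible_term = math.sqrt(max(e_tot * e_tot - pz_tot * pz_tot, 0.0))
    invisible_term = math.hypot(met, m_invisible)
    return visible_term + invisible_term


if __name__ == "__main__":
    # Two massless objects, back to back in the transverse plane (GeV).
    event = [(50.0, 50.0, 0.0, 0.0), (50.0, -50.0, 0.0, 0.0)]
    print(sqrt_shat_min(event))
```

Passing reconstructed jets and leptons instead of calorimeter cells changes only the contents of `objects`, which is the substitution the abstract describes.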

    Type Ia supernova parameter estimation: a comparison of two approaches using current datasets

    By using the Sloan Digital Sky Survey (SDSS) first year type Ia supernova (SN Ia) compilation, we compare two different approaches (traditional \chi^2 and complete likelihood) to determine parameter constraints when the magnitude dispersion is to be estimated as well. We consider cosmological constant + Cold Dark Matter (\Lambda CDM) and spatially flat, constant w Dark Energy + Cold Dark Matter (FwCDM) cosmological models and show that, for current data, there is a small difference in the best fit values and a \sim 30% difference in confidence contour areas when the MLCS2k2 light-curve fitter is adopted. For the SALT2 light-curve fitter the differences are less significant (\lesssim 13% difference in areas). In both cases the likelihood approach gives more restrictive constraints. We argue for the importance of using the complete likelihood instead of the \chi^2 approach when dealing with parameters in the expression for the variance. Comment: 16 pages, 5 figures. More complete analysis by including peculiar velocities and correlations among SALT2 parameters. Use of 2D contours instead of 1D intervals for comparison. There can now be a significant difference between the approaches, around 30% in contour area for MLCS2k2 and up to 13% for SALT2. Generic streamlining of text and suppression of section on model selectio
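The distinction drawn in the abstract above can be made concrete with a generic worked form. The expressions below assume independent Gaussian errors and use illustrative symbols that are not taken from the paper: \mu_i^{obs} is the observed distance modulus, \mu(z_i;\theta) the model prediction, \sigma_i the measurement error and \sigma_{int} the dispersion to be estimated. The paper's own expressions may differ, for example by including correlations.

```latex
\chi^2(\theta,\sigma_{\rm int})
  = \sum_i \frac{\left[\mu_i^{\rm obs} - \mu(z_i;\theta)\right]^2}
                {\sigma_i^2 + \sigma_{\rm int}^2},
\qquad
-2\ln\mathcal{L}(\theta,\sigma_{\rm int})
  = \sum_i \left\{
      \frac{\left[\mu_i^{\rm obs} - \mu(z_i;\theta)\right]^2}
           {\sigma_i^2 + \sigma_{\rm int}^2}
      + \ln\!\left[2\pi\left(\sigma_i^2 + \sigma_{\rm int}^2\right)\right]
    \right\}
```

When \sigma_{int} is held fixed, the logarithmic term is a constant and the two expressions give the same constraints. When \sigma_{int} is a free parameter, the logarithmic term depends on it and can no longer be dropped, which is the situation in which the two approaches diverge.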

    Analysis of high-identity segmental duplications in the grapevine genome

    Background: Segmental duplications (SDs) are blocks of genomic sequence of 1-200 kb that map to different loci in a genome and share a sequence identity > 90%. SDs show at the sequence level the same characteristics as other regions of the human genome: they contain both high-copy repeats and gene sequences. SDs play an important role in genome plasticity by creating new genes and modeling genome structure. Although data is plentiful for mammals, not much was known about the representation of SDs in plant genomes. In this regard, we performed a genome-wide analysis of high-identity SDs on the sequenced grapevine (Vitis vinifera) genome (PN40024). Results: We demonstrate that recent SDs (> 94% identity and >= 10 kb in size) are a relevant component of the grapevine genome (85 Mb, 17% of the genome sequence). We detected mitochondrial and plastid DNA and genes (10% of gene annotation) in segmentally duplicated regions of the nuclear genome. In particular, the nine highest copy number genes have a copy in either or both organelle genomes. Further, we showed that several duplicated genes take part in the biosynthesis of compounds involved in plant response to environmental stress. Conclusions: These data show the great influence of SDs and organelle DNA transfers in modeling the Vitis vinifera nuclear DNA structure as well as the impact of SDs in contributing to the adaptive capacity of grapevine and the nutritional content of grape products through genome variation. This study represents a step forward in the full characterization of duplicated genes important for grapevine cultural needs and human health.

    gSeaGen: The KM3NeT GENIE-based code for neutrino telescopes

    Program summary
    Program Title: gSeaGen
    CPC Library link to program files: http://dx.doi.org/10.17632/ymgxvy2br4.1
    Licensing provisions: GPLv3
    Programming language: C++
    External routines/libraries: GENIE [1] and its external dependencies. Linkable to MUSIC [2] and PROPOSAL [3].
    Nature of problem: Development of a code to generate detectable events in neutrino telescopes, using modern and maintained neutrino interaction simulation libraries which include the state-of-the-art physics models. The default application is the simulation of neutrino interactions within KM3NeT [4].
    Solution method: Neutrino interactions are simulated using GENIE, a modern framework for Monte Carlo event generators. The GENIE framework, used by nearly all modern neutrino experiments, is considered as a reference code within the neutrino community.
    Additional comments including restrictions and unusual features: The code was tested with GENIE version 2.12.10 and it is linkable with release series 3. Presently valid up to 5 TeV. This limitation is not intrinsic to the code but due to the present GENIE valid energy range.
    References: [1] C. Andreopoulos et al., Nucl. Instrum. Meth. A614 (2010) 87. [2] P. Antonioli et al., Astropart. Phys. 7 (1997) 357. [3] J. H. Koehne et al., Comput. Phys. Commun. 184 (2013) 2070. [4] S. Adrián-Martínez et al., J. Phys. G: Nucl. Part. Phys. 43 (2016) 084001.
    The gSeaGen code is a GENIE-based application developed to efficiently generate high statistics samples of events, induced by neutrino interactions, detectable in a neutrino telescope. The gSeaGen code is able to generate events induced by all neutrino flavours, considering topological differences between track-type and shower-like events. Neutrino interactions are simulated taking into account the density and the composition of the media surrounding the detector. The main features of gSeaGen are presented together with some examples of its application within the KM3NeT project.
    Funding: French National Research Agency (ANR) ANR-15-CE31-0020; Centre National de la Recherche Scientifique (CNRS); European Union (EU); Institut Universitaire de France (IUF), France; IdEx program, France; UnivEarthS Labex program at Sorbonne Paris Cite ANR-10-LABX-0023 ANR-11-IDEX-000502; Paris Ile-de-France Region, France; Shota Rustaveli National Science Foundation of Georgia (SRNSFG), Georgia FR-18-1268; German Research Foundation (DFG); Greek Ministry of Development-GSRT; Istituto Nazionale di Fisica Nucleare (INFN); Ministry of Education, Universities and Research (MIUR); PRIN 2017 program Italy NAT-NET 2017W4HA7S; Ministry of Higher Education, Scientific Research and Professional Training, Morocco; Netherlands Organization for Scientific Research (NWO) Netherlands Government; National Science Centre, Poland 2015/18/E/ST2/00758; National Authority for Scientific Research (ANCS), Romania; Ministerio de Ciencia, Innovacion, Investigacion y Universidades (MCIU): Programa Estatal de Generacion de Conocimiento, Spain (MCIU/FEDER) PGC2018-096663-B-C41 PGC2018-096663-A-C42 PGC2018-096663-BC43 PGC2018-096663-B-C44; Severo Ochoa Centre of Excellence and MultiDark Consolider (MCIU), Junta de Andalucia, Spain SOMM17/6104/UGR; Generalitat Valenciana: Grisolia, Spain GRISOLIA/2018/119; GenT, Spain CIDEGENT/2018/034; La Caixa Foundation LCF/BQ/IN17/11620019; EU: MSC program, Spain 71367

    ROOT — A C++ framework for petabyte data storage, statistical analysis and visualization

    This program has been imported from the CPC Program Library held at Queen's University Belfast (1969-2018).
    Abstract: ROOT is an object-oriented C++ framework conceived in the high-energy physics (HEP) community, designed for storing and analyzing petabytes of data in an efficient way. Any instance of a C++ class can be stored into a ROOT file in a machine-independent compressed binary format. In ROOT the TTree object container is optimized for statistical data analysis over very large data sets by using vertical data storage techniques. These containers can span a large number of files on local disks, the w...
    Title of program: ROOT
    Catalogue Id: AEFA_v1_0
    Nature of problem: Storage, analysis and visualization of scientific data
    Versions of this program held in the CPC repository in Mendeley Data: AEFA_v1_0; ROOT; 10.1016/j.cpc.2009.08.005. AEFA_v2_0; ROOT; 10.1016/j.cpc.2011.02.00