
    High dimensional biological data retrieval optimization with NoSQL technology.

    Background: High-throughput transcriptomic data generated by microarray experiments is the most abundant and most frequently stored kind of data currently used in translational medicine studies. Although microarray data is supported in data warehouses such as tranSMART, relational-database queries for hundreds of different patient gene expression records are slow. Non-relational data models, such as the key-value model implemented in NoSQL databases, promise to perform better. Our motivation is to improve the performance of the tranSMART data warehouse with a view to supporting Next Generation Sequencing data. Results: In this paper we introduce a new data model better suited to storing and querying high-dimensional data, optimized for database scalability and performance. We have designed a key-value pair data model to support faster queries over large-scale microarray data and implemented the model using HBase, an implementation of Google's BigTable storage system. An experimental performance comparison was carried out against the traditional relational data model implemented in both MySQL Cluster and MongoDB, using a large publicly available transcriptomic data set on Multiple Myeloma taken from NCBI GEO. Our new key-value data model implemented on HBase exhibits an average 5.24-fold increase in high-dimensional biological data query performance compared to the relational model implemented on MySQL Cluster, and an average 6.47-fold increase over MongoDB. Conclusions: The performance evaluation found that the new key-value data model, in particular its implementation in HBase, outperforms the relational model currently implemented in tranSMART. We propose that NoSQL technology holds great promise for large-scale data management, in particular for high-dimensional biological data such as that used in the performance evaluation described in this paper. We aim to use this new data model as a basis for migrating tranSMART's implementation to a more scalable solution for Big Data.
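    The key-value idea can be sketched in miniature: a composite row key combining patient and probe identifiers co-locates all of one patient's expression values, so a "hundreds of records per patient" query becomes a single key-range scan instead of a relational join. The sketch below is illustrative only, with a plain Python dict standing in for an HBase table; the actual tranSMART row-key design is not given in the abstract.

    ```python
    # Toy key-value layout for microarray expression data, in the spirit of
    # the HBase model described above. A dict stands in for an HBase table;
    # identifiers and values are invented for illustration.

    def make_row_key(patient_id, probe_id):
        """Composite row key: all probes for one patient share a key prefix,
        so a per-patient expression query becomes one key-range scan."""
        return f"{patient_id}:{probe_id}"

    store = {}  # stand-in for an HBase table

    def put_expression(patient_id, probe_id, value):
        store[make_row_key(patient_id, probe_id)] = value

    def scan_patient(patient_id):
        """Range-scan-style retrieval of all expression values for a patient."""
        prefix = f"{patient_id}:"
        return {k.split(":", 1)[1]: v
                for k, v in store.items() if k.startswith(prefix)}

    put_expression("P001", "GENE_A", 7.2)
    put_expression("P001", "GENE_B", 3.9)
    put_expression("P002", "GENE_A", 6.1)

    assert scan_patient("P001") == {"GENE_A": 7.2, "GENE_B": 3.9}
    ```

    In HBase itself the same effect comes from lexicographically sorted row keys: a prefix scan on `patient_id` reads one contiguous region rather than scattering the query across a relational schema.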

    A Protocol for the Secure Linking of Registries for HPV Surveillance

    In order to monitor the effectiveness of HPV vaccination in Canada, the linkage of multiple data registries may be required. These registries may not always be managed by the same organization and, furthermore, privacy legislation or practices may restrict the data linkages that can actually be performed among registries. The objective of this study was to develop a secure protocol for linking data from different registries, allowing on-going monitoring of HPV vaccine effectiveness. A secure linking protocol, using commutative hash functions and secure multi-party computation techniques, was developed. This protocol allows for the exact matching of records among registries and the computation of statistics on the linked data while meeting five practical requirements to ensure patient confidentiality and privacy. The statistics considered were: the odds ratio and its confidence interval, the chi-square test, and the relative risk and its confidence interval. Additional statistics on contingency tables, such as other measures of association, can be added using the same principles. The computation time of this protocol was evaluated. The protocol has acceptable computation time and scales linearly with the size of the data set and the size of the contingency table. The worst-case computation time for up to 100,000 patients returned by each query and a 16-cell contingency table is less than 4 hours for basic statistics, and the best case is under 3 hours. A computationally practical protocol for the secure linking of data from multiple registries has been demonstrated in the context of assessing the impact of an HPV vaccine initiative. The basic protocol can be generalized to the surveillance of other conditions, diseases, or vaccination programs
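    The commutative-hash idea behind such linkage can be illustrated with modular exponentiation: since (x^a)^b = (x^b)^a mod p, two registries that each apply a private exponent to a hashed identifier obtain the same doubly hashed value regardless of order, so records can be matched exactly without either side revealing raw identifiers. This is a minimal sketch, not the paper's exact protocol; the modulus, exponents, and linking keys below are toy values.

    ```python
    import hashlib

    # Commutative hashing sketch for privacy-preserving record linkage.
    # All parameters here are toy values chosen for readability, not security.

    P = 2**127 - 1              # public prime modulus (Mersenne prime)
    SECRET_A, SECRET_B = 5, 11  # the two registries' private exponents

    def h(identifier: str, secret_exp: int) -> int:
        """Map the identifier into the group via SHA-256, then exponentiate."""
        x = int.from_bytes(hashlib.sha256(identifier.encode()).digest(), "big") % P
        return pow(x, secret_exp, P)

    # Registry A sends h(id, SECRET_A) to B, which raises it to SECRET_B
    # (and vice versa). Commutativity means both paths yield the same value,
    # so matching happens on doubly hashed values alone.
    link_key = "1980-05-12|ON|F"  # hypothetical linking key from quasi-identifiers

    assert pow(h(link_key, SECRET_A), SECRET_B, P) == \
           pow(h(link_key, SECRET_B), SECRET_A, P)
    assert pow(h(link_key, SECRET_A), SECRET_B, P) != \
           pow(h("1981-01-01|BC|M", SECRET_B), SECRET_A, P)
    ```

    A real deployment would use large random exponents coprime to P-1 and layer the secure multi-party computation of the contingency-table statistics on top; this sketch only shows the commutativity property that makes exact matching possible.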

    Routes for breaching and protecting genetic privacy

    We are entering the era of ubiquitous genetic information for research, clinical care, and personal curiosity. Sharing these datasets is vital for rapid progress in understanding the genetic basis of human diseases. However, one growing concern is the ability to protect the genetic privacy of the data originators. Here, we technically map threats to genetic privacy and discuss potential mitigation strategies for privacy-preserving dissemination of genetic data. Comment: Draft for comment

    Publishing data from electronic health records while preserving privacy: a survey of algorithms

    The dissemination of Electronic Health Records (EHRs) can be highly beneficial for a range of medical studies, spanning from clinical trials to epidemic control studies, but it must be performed in a way that preserves patients’ privacy. This is not straightforward, because the disseminated data need to be protected against several privacy threats, while remaining useful for subsequent analysis tasks. In this work, we present a survey of algorithms that have been proposed for publishing structured patient data, in a privacy-preserving way. We review more than 45 algorithms, derive insights on their operation, and highlight their advantages and disadvantages. We also provide a discussion of some promising directions for future research in this area
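    One of the most common families among the surveyed algorithms is generalization-based k-anonymity: quasi-identifiers are coarsened until every combination of them is shared by at least k records. The sketch below is illustrative, with invented attribute names, records, and generalization rules (10-year age bands, 3-digit ZIP prefixes), not an algorithm taken from the survey itself.

    ```python
    from collections import Counter

    # Minimal k-anonymity sketch: coarsen quasi-identifiers, then check that
    # every generalized combination occurs at least k times. Records and
    # attributes are invented for illustration.

    records = [
        {"age": 23, "zip": "14850", "dx": "flu"},
        {"age": 27, "zip": "14853", "dx": "asthma"},
        {"age": 41, "zip": "90210", "dx": "flu"},
        {"age": 45, "zip": "90212", "dx": "diabetes"},
    ]

    def generalize(rec):
        """Coarsen quasi-identifiers: 10-year age band, 3-digit ZIP prefix."""
        lo = (rec["age"] // 10) * 10
        return (f"{lo}-{lo + 9}", rec["zip"][:3] + "**")

    def is_k_anonymous(recs, k):
        """True iff every quasi-identifier combination occurs >= k times."""
        counts = Counter(generalize(r) for r in recs)
        return all(c >= k for c in counts.values())

    assert is_k_anonymous(records, 2)      # two records per generalized group
    assert not is_k_anonymous(records, 3)  # no group reaches size 3
    ```

    Real publishing algorithms search over a hierarchy of such generalizations to find the least-distorting one that satisfies k (and often stronger models, since k-anonymity alone does not protect the sensitive attribute within a group).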

    Energy and exergy analysis of chemical looping combustion technology and comparison with pre-combustion and oxy-fuel combustion technologies for CO2 capture

    Carbon dioxide (CO2) emitted from conventional coal-based power plants is a growing concern for the environment. Chemical looping combustion (CLC), pre-combustion and oxy-fuel combustion are promising CO2 capture technologies which allow clean electricity generation from coal in an integrated gasification combined cycle (IGCC) power plant. This work compares the characteristics of the above three capture technologies to those of a conventional IGCC plant without CO2 capture. CLC technology is also investigated for two different process configurations—(i) an integrated gasification combined cycle coupled with chemical looping combustion (IGCC–CLC), and (ii) coal direct chemical looping combustion (CDCLC)—using exergy analysis to exploit the complete potential of CLC. Power output, net electrical efficiency and CO2 capture efficiency are the key parameters investigated for the assessment. Flowsheet models of five different types of IGCC power plants (four with and one without CO2 capture) were developed in the Aspen Plus simulation package. The results indicate that, relative to the conventional IGCC power plant, IGCC–CLC exhibited an energy penalty of 4.5%, compared with 7.1% and 9.1% for pre-combustion and oxy-fuel combustion technologies, respectively. IGCC–CLC and oxy-fuel combustion technologies achieved an overall CO2 capture rate of ∼100%, whereas pre-combustion technology could capture ∼94.8%. Modification of IGCC–CLC into CDCLC increases the net electrical efficiency by 4.7% while maintaining the 100% CO2 capture rate. A detailed exergy analysis performed on the two CLC process configurations (IGCC–CLC and CDCLC) and the conventional IGCC process demonstrates that CLC technology can be thermodynamically as efficient as a conventional IGCC process

    Dilepton mass spectra in p+p collisions at sqrt(s)= 200 GeV and the contribution from open charm

    The PHENIX experiment has measured the electron-positron pair mass spectrum from 0 to 8 GeV/c^2 in p+p collisions at sqrt(s)=200 GeV. The contributions from light meson decays to e^+e^- pairs have been determined based on measurements of hadron production cross sections by PHENIX. They account for nearly all e^+e^- pairs in the mass region below 1 GeV/c^2. The e^+e^- pair yield remaining after subtracting these contributions is dominated by semileptonic decays of charmed hadrons correlated through flavor conservation. Using the spectral shape predicted by PYTHIA, we estimate the charm production cross section to be 544 +/- 39(stat) +/- 142(syst) +/- 200(model) \mu b, which is consistent with QCD calculations and measurements of single leptons by PHENIX. Comment: 375 authors from 57 institutions, 18 pages, 4 figures, 2 tables. Submitted to Physics Letters B. v2 fixes technical errors in matching authors to institutions. Plain text data tables for the points plotted in figures for this and previous PHENIX publications are (or will be) publicly available at http://www.phenix.bnl.gov/papers.htm

    Inclusive cross section and double helicity asymmetry for \pi^0 production in p+p collisions at sqrt(s)=200 GeV: Implications for the polarized gluon distribution in the proton

    The PHENIX experiment presents results from the RHIC 2005 run with polarized proton collisions at sqrt(s)=200 GeV, for inclusive \pi^0 production at mid-rapidity. Unpolarized cross section results are given for transverse momenta p_T=0.5 to 20 GeV/c, extending the range of published data to both lower and higher p_T. The cross section is described well for p_T < 1 GeV/c by an exponential in p_T, and, for p_T > 2 GeV/c, by perturbative QCD. Double helicity asymmetries A_LL are presented based on a factor of five improvement in uncertainties as compared to previously published results, due to both an improved beam polarization of 50%, and to higher integrated luminosity. These measurements are sensitive to the gluon polarization in the proton, and exclude maximal values for the gluon polarization. Comment: 375 authors, 7 pages, 3 figures. Submitted to Phys. Rev. D, Rapid Communications. Plain text data tables for the points plotted in figures for this and previous PHENIX publications are (or will be) publicly available at http://www.phenix.bnl.gov/papers.htm

    Measurement of high-p_T Single Electrons from Heavy-Flavor Decays in p+p Collisions at sqrt(s) = 200 GeV

    The momentum distribution of electrons from decays of heavy flavor (charm and beauty) for midrapidity |y| < 0.35 in p+p collisions at sqrt(s) = 200 GeV has been measured by the PHENIX experiment at the Relativistic Heavy Ion Collider (RHIC) over the transverse momentum range 0.3 < p_T < 9 GeV/c. Two independent methods have been used to determine the heavy flavor yields, and the results are in good agreement with each other. A fixed-order-plus-next-to-leading-log pQCD calculation agrees with the data within the theoretical and experimental uncertainties, with a data/theory ratio of 1.72 +/- 0.02^stat +/- 0.19^sys for 0.3 < p_T < 9 GeV/c. The total charm production cross section at this energy has also been deduced to be sigma_(c c^bar) = 567 +/- 57^stat +/- 224^sys micro barns. Comment: 375 authors from 57 institutions, 6 pages, 3 figures. Submitted to Physical Review Letters. Plain text data tables for the points plotted in figures for this and previous PHENIX publications are (or will be) publicly available at http://www.phenix.bnl.gov/papers.htm
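    The abstract quotes statistical and systematic uncertainties separately, as is standard. When a single error bar is wanted, a common convention (not used in the paper itself, so purely illustrative here) is to add the two components in quadrature:

    ```python
    import math

    # Combining separately quoted statistical and systematic uncertainties
    # in quadrature, using the charm cross-section values from the abstract.
    sigma_cc = 567.0          # central value, micro barns
    stat, syst = 57.0, 224.0  # quoted uncertainties, micro barns

    total = math.hypot(stat, syst)  # sqrt(stat**2 + syst**2)
    print(f"sigma_cc = {sigma_cc:.0f} +/- {total:.0f} micro barns (stat (+) syst)")
    ```

    Quadrature addition assumes the two components are independent; the systematic term clearly dominates here, so the combined uncertainty is only slightly larger than the systematic one alone.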

    System Size and Energy Dependence of Jet-Induced Hadron Pair Correlation Shapes in Cu+Cu and Au+Au Collisions at sqrt(s_NN) = 200 and 62.4 GeV

    We present azimuthal angle correlations of intermediate transverse momentum (1-4 GeV/c) hadrons from dijets in Cu+Cu and Au+Au collisions at sqrt(s_NN) = 62.4 and 200 GeV. The away-side dijet induced azimuthal correlation is broadened, non-Gaussian, and peaked away from \Delta\phi=\pi in central and semi-central collisions in all the systems. The broadening and peak location are found to depend upon the number of participants in the collision, but not on the collision energy or beam nuclei. These results are consistent with sound or shock wave models, but pose challenges to Cherenkov gluon radiation models. Comment: 464 authors from 60 institutions, 6 pages, 3 figures, 2 tables. Submitted to Physical Review Letters. Plain text data tables for the points plotted in figures for this and previous PHENIX publications are (or will be) publicly available at http://www.phenix.bnl.gov/papers.htm