73 research outputs found

    Accelerating Large-Scale Data Analysis by Offloading to High-Performance Computing Libraries using Alchemist

    Apache Spark is a popular system aimed at the analysis of large data sets, but recent studies have shown that certain computations (in particular, many linear algebra computations that are the basis for solving common machine learning problems) are significantly slower in Spark than when done using libraries written in a high-performance computing framework such as the Message-Passing Interface (MPI). To remedy this, we introduce Alchemist, a system designed to call MPI-based libraries from Apache Spark. Using Alchemist with Spark helps accelerate linear algebra, machine learning, and related computations, while still retaining the benefits of working within the Spark environment. We discuss the motivation behind the development of Alchemist, and we provide a brief overview of its design and implementation. We also compare the performance of pure Spark implementations with that of Spark implementations that leverage MPI-based codes via Alchemist. To do so, we use data science case studies: a large-scale application of the conjugate gradient method to solve very large linear systems arising in a speech classification problem, where we see an improvement of an order of magnitude; and the truncated singular value decomposition (SVD) of a 400 GB three-dimensional ocean temperature data set, where we see a speedup of up to 7.9x. We also illustrate that the truncated SVD computation is easily scalable to terabyte-sized data by applying it to data sets of sizes up to 17.6 TB. Comment: Accepted for publication in Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, London, UK, 201

    Emission-Line Galaxies from the Hubble Space Telescope Probing Evolution and Reionization Spectroscopically (PEARS) Grism Survey. II: The Complete Sample

    We present a full analysis of the Probing Evolution And Reionization Spectroscopically (PEARS) slitless grism spectroscopic data obtained with the Advanced Camera for Surveys on HST. PEARS covers fields within both the Great Observatories Origins Deep Survey (GOODS) North and South fields, making it ideal as a random survey of galaxies and providing a wide variety of ancillary observations to support the spectroscopic results. Using the PEARS data we are able to identify star forming galaxies within the redshift volume 0 < z < 1.5. Star forming regions in the PEARS survey are pinpointed independently of the host galaxy. This method allows us to detect the presence of multiple emission line regions (ELRs) within a single galaxy. 1162 Hα, [OIII] and/or [OII] emission lines have been identified in the PEARS sample of ~906 galaxies down to a limiting flux of ~1e-18 erg/s/cm^2. The ELRs have also been compared to the properties of the host galaxy, including morphology, luminosity, and mass. From this analysis we find three key results: 1) The computed line luminosities show evidence of a flattening in the luminosity function with increasing redshift; 2) The star forming systems show evidence of disturbed morphologies, with star formation occurring predominantly within one effective (half-light) radius. However, the morphologies show no correlation with host stellar mass; and 3) The number density of star forming galaxies with M_* > 1e9 M_sun decreases by an order of magnitude at z < 0.5 relative to the number at 0.5 < z < 0.9, in support of the argument for galaxy downsizing. Comment: Submitted. 48 pages. 19 figures. Accepted to Ap

    Chronic thromboembolic pulmonary hypertension and impairment after pulmonary embolism: the FOCUS study

    AIMS: To systematically assess late outcomes of acute pulmonary embolism (PE) and to investigate the clinical implications of post-PE impairment (PPEI) fulfilling prospectively defined criteria. METHODS AND RESULTS: A prospective multicentre observational cohort study was conducted in 17 large-volume centres across Germany. Consecutive adult patients with confirmed acute symptomatic PE were followed with a standardized assessment plan and pre-defined visits at 3, 12, and 24 months. The co-primary outcomes were (i) diagnosis of chronic thromboembolic pulmonary hypertension (CTEPH), and (ii) PPEI, a combination of persistent or worsening clinical, functional, biochemical, and imaging parameters during follow-up. A total of 1017 patients (45% women, median age 64 years) were included in the primary analysis. They were followed for a median duration of 732 days after PE diagnosis. CTEPH was diagnosed in 16 (1.6%) patients, after a median of 129 days; the estimated 2-year cumulative incidence was 2.3% (1.2-4.4%). Overall, 880 patients were evaluable for PPEI; the 2-year cumulative incidence was 16.0% (95% confidence interval 12.8-20.8%). PPEI helped to identify 15 of the 16 patients diagnosed with CTEPH during follow-up (hazard ratio for CTEPH vs. no CTEPH 393; 95% confidence interval 73-2119). Patients with PPEI had a higher risk of re-hospitalization and death as well as worse quality of life compared with those without PPEI. CONCLUSION: In this prospective study, the cumulative 2-year incidence of CTEPH was 2.3%, but PPEI diagnosed by standardized criteria was frequent. Our findings support systematic follow-up of patients after acute PE and may help to optimize guideline recommendations and algorithms for post-PE care.

    Observation of High-Energy Astrophysical Neutrinos in Three Years of IceCube Data

    A search for high-energy neutrinos interacting within the IceCube detector between 2010 and 2012 provided the first evidence for a high-energy neutrino flux of extraterrestrial origin. Results from an analysis using the same methods with a third year (2012-2013) of data from the complete IceCube detector are consistent with the previously reported astrophysical flux in the 100 TeV - PeV range at the level of $10^{-8}\, \mathrm{GeV}\, \mathrm{cm}^{-2}\, \mathrm{s}^{-1}\, \mathrm{sr}^{-1}$ per flavor and reject a purely atmospheric explanation for the combined 3-year data at $5.7 \sigma$. The data are consistent with expectations for equal fluxes of all three neutrino flavors and with isotropic arrival directions, suggesting either numerous or spatially extended sources. The three-year dataset, with a livetime of 988 days, contains a total of 37 neutrino candidate events with deposited energies ranging from 30 to 2000 TeV. The 2000 TeV event is the highest-energy neutrino interaction ever observed. Comment: 8 pages, 5 figures. Accepted by PRL. The event catalog, event displays, and other data tables are included after the final page of the article. Changed from the initial submission to reflect referee comments, expanding the section on atmospheric backgrounds and fixing offsets of up to 0.9 seconds in reported event times. Address correspondence to: J. Feintzeig, C. Kopper, N. Whitehor

    Measurement of atmospheric neutrino oscillations with IceCube

    We present the first statistically significant detection of neutrino oscillations in the high-energy regime (>20 GeV) from an analysis of IceCube Neutrino Observatory data collected in 2010 and 2011. This measurement is made possible by the low-energy threshold of the DeepCore detector (~20 GeV) and benefits from the use of the IceCube detector as a veto against cosmic-ray-induced muon background. The oscillation signal was detected within a low-energy muon neutrino sample (20-100 GeV) extracted from data collected by DeepCore. A high-energy muon neutrino sample (100 GeV-10 TeV) was extracted from IceCube data to constrain systematic uncertainties. The disappearance of low-energy upward-going muon neutrinos was observed, and the nonoscillation hypothesis is rejected with more than 5σ significance. In a two-neutrino flavor formalism, our data are best described by the atmospheric neutrino oscillation parameters $|\Delta m^2_{32}| = 2.3^{+0.6}_{-0.5} \times 10^{-3}\, \mathrm{eV}^2$ and $\sin^2(2\theta_{23}) > 0.93$, and maximum mixing is favored. M. G. Aartsen ... G. C. Hill ... et al. (IceCube Collaboration)

    Measurement of the cosmic ray energy spectrum with IceTop-73

    We report on the measurement of the all-particle cosmic ray energy spectrum with the IceTop air shower array in the energy range from 1.58 PeV to 1.26 EeV. The IceTop air shower array is the surface component of the IceCube Neutrino Observatory at the geographical South Pole. The analysis was performed using only information from IceTop. The data used in this work were taken from June 1, 2010 to May 13, 2011. During that period the IceTop array consisted of 73 stations, compared to 81 in its final configuration. The measured spectrum exhibits a clear deviation from a single power law above the knee around 4 PeV and below 1 EeV. We observe spectral hardening around 18 PeV and steepening around 130 PeV. M. G. Aartsen ... G. C. Hill ... et al. (IceCube Collaboration)

    Riociguat treatment in patients with chronic thromboembolic pulmonary hypertension: Final safety data from the EXPERT registry

    Objective: The soluble guanylate cyclase stimulator riociguat is approved for the treatment of adult patients with pulmonary arterial hypertension (PAH) and inoperable or persistent/recurrent chronic thromboembolic pulmonary hypertension (CTEPH) following Phase