73 research outputs found
Accelerating Large-Scale Data Analysis by Offloading to High-Performance Computing Libraries using Alchemist
Apache Spark is a popular system aimed at the analysis of large data sets,
but recent studies have shown that certain computations---in particular, many
linear algebra computations that are the basis for solving common machine
learning problems---are significantly slower in Spark than when done using
libraries written in a high-performance computing framework such as the
Message-Passing Interface (MPI).
To remedy this, we introduce Alchemist, a system designed to call MPI-based
libraries from Apache Spark. Using Alchemist with Spark helps accelerate linear
algebra, machine learning, and related computations, while still retaining the
benefits of working within the Spark environment. We discuss the motivation
behind the development of Alchemist, and we provide a brief overview of its
design and implementation.
We also compare the performances of pure Spark implementations with those of
Spark implementations that leverage MPI-based codes via Alchemist. To do so, we
use data science case studies: a large-scale application of the conjugate
gradient method to solve very large linear systems arising in a speech
classification problem, where we see an improvement of an order of magnitude;
and the truncated singular value decomposition (SVD) of a 400GB
three-dimensional ocean temperature data set, where we see a speedup of up to
7.9x. We also illustrate that the truncated SVD computation is easily scalable
to terabyte-sized data by applying it to data sets of sizes up to 17.6TB.
Comment: Accepted for publication in Proceedings of the 24th ACM SIGKDD
International Conference on Knowledge Discovery and Data Mining, London, UK,
201
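For readers unfamiliar with the conjugate gradient method named in the abstract above, the following is a minimal single-machine sketch in NumPy. It is not Alchemist's API nor the authors' distributed implementation; the function name `conjugate_gradient`, the tolerance, and the iteration cap are illustrative assumptions, and the matrix is assumed symmetric positive definite.

```python
import numpy as np


def conjugate_gradient(A, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for a symmetric positive definite matrix A.

    Hypothetical helper for illustration only; names and defaults
    are assumptions, not taken from the paper.
    """
    x = np.zeros_like(b, dtype=float)  # initial guess x0 = 0
    r = b - A @ x                      # residual of the initial guess
    p = r.copy()                       # first search direction
    rs_old = r @ r                     # squared residual norm

    for _ in range(max_iter):
        if np.sqrt(rs_old) < tol:      # stop once the residual is small
            break
        Ap = A @ p
        alpha = rs_old / (p @ Ap)      # step length along p
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        p = r + (rs_new / rs_old) * p  # next A-conjugate direction
        rs_old = rs_new
    return x
```

Each iteration needs only one matrix-vector product, which is the kind of linear algebra kernel the abstract describes offloading to MPI-based libraries.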
Emission-Line Galaxies from the Hubble Space Telescope Probing Evolution and Reionization Spectroscopically (PEARS) Grism Survey. II: The Complete Sample
We present a full analysis of the Probing Evolution And Reionization
Spectroscopically (PEARS) slitless grism spectroscopic data obtained with the
Advanced Camera for Surveys on HST. PEARS covers fields within both the Great
Observatories Origins Deep Survey (GOODS) North and South fields, making it
ideal as a random survey of galaxies, with a wide variety of ancillary
observations available to support the spectroscopic results. Using
the PEARS data we are able to identify star forming galaxies within the
redshift volume 0< z<1.5. Star forming regions in the PEARS survey are
pinpointed independently of the host galaxy. This method allows us to detect
the presence of multiple emission line regions (ELRs) within a single galaxy.
1162 Hα, [OIII] and/or [OII] emission lines have been identified in the PEARS
sample of ~906 galaxies down to a limiting flux of ~1e-18 erg/s/cm^2. The ELRs
have also been compared to the properties of the host galaxy, including
morphology, luminosity, and mass. From this analysis we find three key results:
1) The computed line luminosities show evidence of a flattening in the
luminosity function with increasing redshift; 2) The star forming systems show
evidence of disturbed morphologies, with star formation occurring predominantly
within one effective (half-light) radius. However, the morphologies show no
correlation with host stellar mass; and 3) The number density of star forming
galaxies with M_* > 1e9 M_sun decreases by an order of magnitude at z<0.5
relative to the number at 0.5<z<0.9 in support of the argument for galaxy
downsizing.
Comment: Submitted. 48 pages. 19 figures. Accepted to Ap
Chronic thromboembolic pulmonary hypertension and impairment after pulmonary embolism: the FOCUS study
AIMS: To systematically assess late outcomes of acute pulmonary embolism (PE) and to investigate the clinical implications of post-PE impairment (PPEI) fulfilling prospectively defined criteria.
METHODS AND RESULTS: A prospective multicentre observational cohort study was conducted in 17 large-volume centres across Germany. Consecutive adult patients with confirmed acute symptomatic PE were followed with a standardized assessment plan and pre-defined visits at 3, 12, and 24 months. The co-primary outcomes were (i) diagnosis of chronic thromboembolic pulmonary hypertension (CTEPH), and (ii) PPEI, a combination of persistent or worsening clinical, functional, biochemical, and imaging parameters during follow-up. A total of 1017 patients (45% women, median age 64 years) were included in the primary analysis. They were followed for a median duration of 732 days after PE diagnosis. CTEPH was diagnosed in 16 (1.6%) patients, after a median of 129 days; the estimated 2-year cumulative incidence was 2.3% (1.2-4.4%). Overall, 880 patients were evaluable for PPEI; the 2-year cumulative incidence was 16.0% (95% confidence interval 12.8-20.8%). PPEI helped to identify 15 of the 16 patients diagnosed with CTEPH during follow-up (hazard ratio for CTEPH vs. no CTEPH 393; 95% confidence interval 73-2119). Patients with PPEI had a higher risk of re-hospitalization and death as well as worse quality of life compared with those without PPEI.
CONCLUSION: In this prospective study, the cumulative 2-year incidence of CTEPH was 2.3%, but PPEI diagnosed by standardized criteria was frequent. Our findings support systematic follow-up of patients after acute PE and may help to optimize guideline recommendations and algorithms for post-PE care.
Observation of High-Energy Astrophysical Neutrinos in Three Years of IceCube Data
A search for high-energy neutrinos interacting within the IceCube detector
between 2010 and 2012 provided the first evidence for a high-energy neutrino
flux of extraterrestrial origin. Results from an analysis using the same
methods with a third year (2012-2013) of data from the complete IceCube
detector are consistent with the previously reported astrophysical flux in the
100 TeV - PeV range at the level of per flavor and reject a
purely atmospheric explanation for the combined 3-year data at .
The data are consistent with expectations for equal fluxes of all three
neutrino flavors and with isotropic arrival directions, suggesting either
numerous or spatially extended sources. The three-year dataset, with a livetime
of 988 days, contains a total of 37 neutrino candidate events with deposited
energies ranging from 30 to 2000 TeV. The 2000 TeV event is the highest-energy
neutrino interaction ever observed.
Comment: 8 pages, 5 figures. Accepted by PRL. The event catalog, event
displays, and other data tables are included after the final page of the
article. Changed from the initial submission to reflect referee comments,
expanding the section on atmospheric backgrounds, and fixes offsets of up to
0.9 seconds in reported event times. Address correspondence to: J. Feintzeig,
C. Kopper, N. Whitehor
Measurement of atmospheric neutrino oscillations with IceCube
We present the first statistically significant detection of neutrino oscillations in the high-energy regime (>20 GeV) from an analysis of IceCube Neutrino Observatory data collected in 2010 and 2011. This measurement is made possible by the low-energy threshold of the DeepCore detector (~20 GeV) and benefits from the use of the IceCube detector as a veto against cosmic-ray-induced muon background. The oscillation signal was detected within a low-energy muon neutrino sample (20-100 GeV) extracted from data collected by DeepCore. A high-energy muon neutrino sample (100 GeV-10 TeV) was extracted from IceCube data to constrain systematic uncertainties. The disappearance of low-energy upward-going muon neutrinos was observed, and the nonoscillation hypothesis is rejected with more than 5σ significance. In a two-neutrino flavor formalism, our data are best described by the atmospheric neutrino oscillation parameters $|\Delta m^2_{32}| = (2.3^{+0.6}_{-0.5}) \times 10^{-3}\ \mathrm{eV}^2$ and $\sin^2(2\theta_{23}) > 0.93$, and maximum mixing is favored.
M. G. Aartsen ... G. C. Hill ... et al. (IceCube Collaboration
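As background for the two-neutrino flavor formalism mentioned in the abstract above, the standard textbook muon-neutrino survival probability is sketched below. The symbols $L$ (propagation distance in km) and $E$ (neutrino energy in GeV) do not appear in the abstract and are introduced here as assumptions for illustration; this is the generic two-flavor expression, not a reproduction of the collaboration's fit.

```latex
P(\nu_\mu \to \nu_\mu) \approx 1 - \sin^2(2\theta_{23})\,
  \sin^2\!\left( 1.27\,
    \frac{|\Delta m^2_{32}|\,[\mathrm{eV}^2]\; L\,[\mathrm{km}]}{E\,[\mathrm{GeV}]}
  \right)
```

Taking the quoted best-fit $|\Delta m^2_{32}| = 2.3 \times 10^{-3}\ \mathrm{eV}^2$ and assuming a vertically upward-going path of roughly one Earth diameter ($L \approx 12{,}700$ km), the first survival minimum, where the argument equals $\pi/2$, falls at roughly $E \approx 24$ GeV. That sits inside the 20-100 GeV low-energy sample described in the abstract.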
Measurement of the cosmic ray energy spectrum with IceTop-73
We report on the measurement of the all-particle cosmic ray energy spectrum with the IceTop air shower array in the energy range from 1.58 PeV to 1.26 EeV. The IceTop air shower array is the surface component of the IceCube Neutrino Observatory at the geographical South Pole. The analysis was performed using only information from IceTop. The data used in this work were taken from June 1, 2010 to May 13, 2011. During that period the IceTop array consisted of 73 stations, compared to 81 in its final configuration. The measured spectrum exhibits a clear deviation from a single power law above the knee around 4 PeV and below 1 EeV. We observe spectral hardening around 18 PeV and steepening around 130 PeV.
M. G. Aartsen ... G. C. Hill ... et al. (IceCube Collaboration
Riociguat treatment in patients with chronic thromboembolic pulmonary hypertension: Final safety data from the EXPERT registry
Objective: The soluble guanylate cyclase stimulator riociguat is approved for the treatment of adult patients with pulmonary arterial hypertension (PAH) and inoperable or persistent/recurrent chronic thromboembolic pulmonary hypertension (CTEPH) following Phase
- …