
    Analysis of GeV-band gamma-ray emission from SNR RX J1713.7-3946

    Full text link
    RX J1713.7-3946 is the brightest shell-type supernova remnant (SNR) of the TeV gamma-ray sky. Earlier Fermi-LAT results on low-energy gamma-ray emission suggested that, despite large uncertainties in the background determination, the spectrum is inconsistent with a hadronic origin. We update the GeV-band spectra using improved estimates for the diffuse Galactic gamma-ray emission and a more than doubled data volume. We further investigate the viability of hadronic emission models for RX J1713.7-3946. We produced a high-resolution map of the diffuse Galactic gamma-ray background corrected for HI self-absorption and used it in the analysis of more than 5 years' worth of Fermi-LAT data. We used hydrodynamic scaling relations and a kinetic transport equation to calculate the acceleration and propagation of cosmic rays in SNRs. We then determined spectra of hadronic gamma-ray emission from RX J1713.7-3946, separately for the SNR interior and the cosmic-ray precursor region of the forward shock, and computed flux variations that would allow the model to be tested against observations. We find that RX J1713.7-3946 is now detected by Fermi-LAT with very high statistical significance, and the source morphology is best described by that seen in the TeV band. The measured spectrum of RX J1713.7-3946 is hard, with index Gamma = 1.53 +/- 0.07, and the integral flux above 500 MeV is F = (5.5 +/- 1.1) x 10^-9 photons/cm^2/s. We demonstrate that scenarios based on hadronic emission from the cosmic-ray precursor region are acceptable for RX J1713.7-3946, and we predict a secular flux increase at a few hundred GeV at the level of around 15% over 10 years, which may be detectable with the upcoming CTA observatory. Comment: 9 pages, accepted for publication in Astronomy & Astrophysics
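
    As a back-of-the-envelope companion to the quoted numbers, the sketch below (C, illustrative only) normalizes a power-law photon spectrum dN/dE = K (E/E0)^-Gamma to the reported integral flux and converts the predicted ~15% rise over 10 years into an annual rate. The 3 TeV upper cutoff is an assumption of ours (the integral diverges without one for Gamma < 2), not a value from the paper.

        #include <math.h>
        #include <stdio.h>

        int main(void) {
            const double Gamma = 1.53;    /* photon index (from the abstract) */
            const double F500  = 5.5e-9;  /* ph/cm^2/s above 500 MeV (abstract) */
            const double E0    = 0.5;     /* GeV, lower bound of the integral */
            const double Emax  = 3000.0;  /* GeV, assumed cutoff */

            /* F(>E0) = K * E0/(Gamma-1) * (1 - (Emax/E0)^(1-Gamma))
               for dN/dE = K * (E/E0)^(-Gamma); solve for K. */
            double K = F500 * (Gamma - 1.0) /
                       (E0 * (1.0 - pow(Emax / E0, 1.0 - Gamma)));

            /* A 15% flux increase over 10 years, compounded annually. */
            double annual = pow(1.15, 0.1) - 1.0;

            printf("K = %.3e ph/cm^2/s/GeV at 500 MeV\n", K);
            printf("secular rise = %.2f%% per year\n", 100.0 * annual);
            return 0;
        }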

    An integrated compile-time/run-time software distributed shared memory system

    Get PDF
    On a distributed memory machine, hand-coded message passing leads to the most efficient execution, but it is difficult to use. Parallelizing compilers can approach the performance of hand-coded message passing by translating data-parallel programs into message passing programs, but efficient execution is limited to those programs for which precise analysis can be carried out. Shared memory is easier to program than message passing, and its domain is not constrained by the limitations of parallelizing compilers, but it lags in performance. Our goal is to close that performance gap while retaining the benefits of shared memory. In other words, our goal is (1) to make shared memory as efficient as message passing, whether hand-coded or compiler-generated, (2) to retain its ease of programming, and (3) to retain the broader class of applications it supports. To this end we have designed and implemented an integrated compile-time and run-time software DSM system. The programming model remains identical to the original pure run-time DSM system. No user intervention is required to obtain the benefits of our system. The compiler computes data access patterns for the individual processors. It then performs a source-to-source transformation, inserting calls into the program that inform the run-time system of the computed data access patterns. The run-time system uses this information to aggregate communication, to aggregate data and synchronization into a single message, to eliminate consistency overhead, and to replace global synchronization with point-to-point synchronization wherever possible. We extended the ParaScope programming environment to perform the required analysis, and we augmented the TreadMarks run-time DSM library to take advantage of the analysis. We used six Fortran programs to assess the performance benefits: Jacobi, 3D-FFT, Integer Sort, Shallow, Gauss, and Modified Gram-Schmidt, each with two different data set sizes. The experiments were run on an 8-node IBM SP/2 using user-space communication. Compiler optimization in conjunction with the augmented run-time system achieves substantial execution time improvements in comparison to the base TreadMarks, ranging from 4% to 59% on 8 processors. Relative to message passing implementations of the same applications, the compile-time/run-time system is 0-29% slower than message passing, while the base run-time system is 5-212% slower. For the five programs that XHPF could parallelize (all except IS), the execution times achieved by the compiler-optimized shared memory programs are within 9% of XHPF.
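
    To make the transformation concrete, here is a hedged sketch of what the inserted call might look like for a nearest-neighbor loop. tmk_push_access() and its arguments are hypothetical names of ours, standing in for whatever interface the ParaScope-generated code actually uses to describe a processor's upcoming accesses to the run-time system.

        #include <stddef.h>

        #define N 1024
        #define READ_ACCESS 0   /* hypothetical access-kind tag */

        /* Hypothetical run-time entry point (not the real interface). */
        extern void tmk_push_access(void *base, size_t len, int kind);

        /* Nearest-neighbor relaxation over rows [lo, hi) of an N x N grid
         * held in DSM-managed shared memory. */
        void relax_band(double (*a)[N], double (*b)[N], int lo, int hi)
        {
            /* Compiler-inserted hint: this processor is about to read rows
             * lo-1 .. hi of a, so the run time can fetch all of those pages
             * in one aggregated message -- or piggyback them on a preceding
             * barrier -- instead of paying one page fault and one round
             * trip per page. */
            tmk_push_access(&a[lo - 1][0],
                            (size_t)(hi - lo + 2) * N * sizeof(double),
                            READ_ACCESS);

            for (int i = lo; i < hi; i++)
                for (int j = 1; j < N - 1; j++)
                    b[i][j] = 0.25 * (a[i-1][j] + a[i+1][j] +
                                      a[i][j-1] + a[i][j+1]);
        }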

    TreadMarks: Distributed Shared Memory on Standard Workstations and Operating Systems

    Get PDF
    TreadMarks is a distributed shared memory (DSM) system for standard Unix systems such as SunOS and Ultrix. This paper presents a performance evaluation of TreadMarks running on Ultrix using DECstation-5000/240s that are connected by a 100-Mbps switch-based ATM LAN and a 10-Mbps Ethernet. Our objective is to determine the efficiency of a user-level DSM implementation on commercially available workstations and operating systems. We achieved good speedups on the 8-processor ATM network for Jacobi (7.4), TSP (7.2), Quicksort (6.3), and ILINK (5.7). For a slightly modified version of Water from the SPLASH benchmark suite, we achieved only moderate speedups (4.0) due to the high communication and synchronization rate. Speedups decline on the 10-Mbps Ethernet (5.5 for Jacobi, 6.5 for TSP, 4.2 for Quicksort, 5.1 for ILINK, and 2.1 for Water), reflecting the bandwidth limitations of the Ethernet. These results support the contention that, with suitable networking technology, DSM is a viable technique for parallel computation on clusters of workstations. To achieve these speedups, TreadMarks goes to great lengths to reduce the amount of communication performed to maintain memory consistency. It uses a lazy implementation of release consistency, and it allows multiple concurrent writers to modify a page, reducing the impact of false sharing. Great care was taken to minimize communication overhead. In particular, on the ATM network, we used a standard low-level protocol, AAL3/4, bypassing the TCP/IP protocol stack. Unix communication overhead, however, remains the main obstacle in the way of better performance for programs like Water. Compared to the Unix communication overhead, memory management cost (both kernel and user level) is small and wire time is negligible.
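
    For readers who have not seen TreadMarks code, the fragment below sketches the programming model for a Jacobi-style kernel. The Tmk_* entry points follow the interface described in the TreadMarks papers (Tmk_proc_id, Tmk_nprocs, Tmk_barrier), but the exact declarations here are reconstructed from memory and should be checked against the library's actual tmk.h.

        /* Hedged TreadMarks-style Jacobi band worker. */
        extern unsigned Tmk_proc_id, Tmk_nprocs;
        extern void Tmk_barrier(unsigned id);

        #define M 1024
        #define N 1024

        void jacobi(float (*a)[N], float (*b)[N], int iters)
        {
            int band = M / Tmk_nprocs;
            int lo = Tmk_proc_id * band + 1;   /* skip the fixed border */
            int hi = (Tmk_proc_id == Tmk_nprocs - 1) ? M - 1 : lo + band;

            for (int it = 0; it < iters; it++) {
                /* Each processor updates only its own band; concurrent
                 * writers to a page straddling two bands are legal because
                 * TreadMarks keeps per-writer diffs (multiple-writer
                 * protocol), which limits false-sharing traffic. */
                for (int i = lo; i < hi; i++)
                    for (int j = 1; j < N - 1; j++)
                        b[i][j] = 0.25f * (a[i-1][j] + a[i+1][j] +
                                           a[i][j-1] + a[i][j+1]);
                /* Lazy release consistency: crossing the barrier only
                 * obliges this processor to see its neighbors' boundary
                 * rows on the next read; no data moves until it is
                 * actually accessed. */
                Tmk_barrier(0);
                float (*t)[N] = a; a = b; b = t;   /* swap grids */
            }
        }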

    Supernova 1996cr: SN 1987A's Wild Cousin?

    Full text link
    We report on new VLT optical spectroscopic and multi-wavelength archival observations of SN1996cr, a previously identified ULX known as Circinus Galaxy X-2. Our optical spectrum confirms SN1996cr as a bona fide type IIn SN, while archival imaging isolates its explosion date to between 1995-02-28 and 1996-03-16. SN1996cr is one of the closest SNe (~3.8 Mpc) of the last several decades and in terms of flux ranks among the brightest radio and X-ray SNe ever detected. The wealth of optical, X-ray, and radio observations that exist for this source provides relatively detailed constraints on its post-explosion expansion and progenitor history, including a preliminary angular size constraint from VLBI. The archival X-ray and radio data imply that the progenitor of SN1996cr evacuated a large cavity just prior to exploding: the blast wave likely expanded for ~1-2 yr before eventually striking the dense circumstellar material which surrounds SN1996cr. The X-ray and radio emission, which trace the progenitor mass-loss rate, have respectively risen by a factor of ~2 and remained roughly constant over the past 7 yr. This behavior is reminiscent of the late rise of SN1987A, but ~1000 times more luminous and with a much more rapid onset. Complex oxygen line emission in the optical spectrum further hints at a possible concentric shell or ring-like structure. The discovery of SN1996cr suggests that a substantial fraction of the closest SNe observed in the last several decades have occurred in wind-blown bubbles. An Interplanetary Network position allows us to reject a tentative GRB association with BATSE 4B960202. [Abridged] Comment: 25 pages with tables, 12 figures (color), accepted to ApJ, comments welcome; v2 - updated to reflect the subsequent rejection of our tentative GRB association based on a revised error region from the Interplanetary Network (thanks to Kevin Hurley) and include a few additional references; v3 - corrected some errors in Tables 7 and

    SN 1993J VLBI (IV): A Geometric Determination of the Distance to M81 with the Expanding Shock Front Method

    Full text link
    We compare the angular expansion velocities, determined with VLBI, with the linear expansion velocities measured from optical spectra for supernova 1993J in the galaxy M81, over the period from 7 d to ~9 yr after shock breakout. We estimate the distance to SN 1993J using the Expanding Shock Front Method (ESM). We find that the best distance estimate is obtained by fitting the angular velocity of a point halfway between the contact surface and the outer shock front to the maximum observed hydrogen gas velocity. We obtain a direct, geometric distance estimate for M81 of D = 3.96 +/- 0.05 +/- 0.29 Mpc, with statistical and systematic error contributions, respectively, corresponding to a total standard error of +/- 0.29 Mpc. The upper limit of 4.25 Mpc corresponds to the hydrogen gas with the highest observed velocity reaching no farther out than the contact surface a few days after shock breakout. The lower limit of 3.67 Mpc corresponds to this hydrogen gas reaching as far out as the forward shock for the whole period, which would mean that Rayleigh-Taylor fingers had grown to the forward shock already a few days after shock breakout. Our distance estimate is 9 +/- 13% larger than that of 3.63 +/- 0.34 Mpc from the HST Key Project, which is near our lower limit but within the errors. Comment: 25 pages, 11 figures, accepted for publication in ApJ
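
    The arithmetic behind the method is compact: a distance follows directly from a linear velocity and an angular expansion rate, D = v / (d-theta/dt). The sketch below does the unit bookkeeping; the input values are illustrative placeholders of the right order of magnitude for SN 1993J, not numbers taken from the paper.

        #include <stdio.h>

        /* 1 AU/yr = 4.74 km/s, and 1 AU subtends 1 arcsec at 1 pc, so
           D [pc] = v [km/s] / (4.74 * mu [arcsec/yr]). */
        double distance_mpc(double v_kms, double mu_uas_per_yr)
        {
            double mu_arcsec_yr = mu_uas_per_yr * 1e-6;
            return v_kms / (4.74 * mu_arcsec_yr) / 1e6;
        }

        int main(void)
        {
            double v  = 19000.0;  /* km/s, illustrative max H velocity   */
            double mu = 1000.0;   /* micro-arcsec/yr, illustrative rate  */
            printf("D = %.2f Mpc\n", distance_mpc(v, mu));  /* ~4.0 Mpc */
            return 0;
        }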

    An Evaluation of Software Distributed Shared Memory for Next-Generation Processors and Networks

    Get PDF
    We evaluate the effect of processor speed, network characteristics, and software overhead on the performance of release-consistent software distributed shared memory. We examine five different protocols for implementing release consistency: eager update, eager invalidate, lazy update, lazy invalidate, and a new protocol called lazy hybrid. This lazy hybrid protocol combines the benefits of both lazy update and lazy invalidate. Our simulations indicate that, with the processors and networks that are becoming available, coarse-grained applications such as Jacobi and TSP perform well, more or less independent of the protocol used. Medium-grained applications, such as Water, can achieve good performance, but the choice of protocol is critical. For sixteen processors, the best protocol, lazy hybrid, performed more than three times better than the worst, eager update. Fine-grained applications such as Cholesky achieve little speedup regardless of the protocol used because of the frequency of synchronization operations and the high latency involved. While the use of relaxed memory models, lazy implementations, and multiple-writer protocols has reduced the impact of false sharing, synchronization latency remains a serious problem for software distributed shared memory systems. These results suggest that future work on software DSMs should concentrate on reducing the amount of synchronization or its effect.
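
    To make the protocol distinction concrete, the sketch below contrasts where the eager and lazy invalidate variants do their work; the data structures and function names are illustrative, not the simulator's actual interface.

        #include <stdbool.h>

        #define NPAGES 4096
        #define NPROCS 16

        static bool dirty[NPAGES];            /* written since last release */
        static bool invalid[NPROCS][NPAGES];  /* per-processor page state   */

        /* Eager invalidate: at RELEASE time, push a write notice for every
         * dirty page to every other processor, whether or not it will ever
         * touch the page. Communication happens up front. */
        void eager_release(int me)
        {
            for (int p = 0; p < NPAGES; p++)
                if (dirty[p]) {
                    for (int q = 0; q < NPROCS; q++)
                        if (q != me)
                            invalid[q][p] = true;  /* one notice per sharer */
                    dirty[p] = false;
                }
        }

        /* Lazy invalidate: nothing is sent at release. At ACQUIRE time the
         * acquirer pulls only the write notices that happened-before its
         * acquire, invalidates those pages locally, and fetches the data
         * later, on the first access miss. */
        void lazy_acquire(int me, const bool notices[NPAGES])
        {
            for (int p = 0; p < NPAGES; p++)
                if (notices[p])
                    invalid[me][p] = true;  /* data moves only on demand */
        }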

    Annihilation emission from young supernova remnants

    Get PDF
    A promising source of the positrons that contribute through annihilation to the diffuse Galactic 511 keV emission is the beta-decay of unstable nuclei like 56Ni and 44Ti synthesised by massive stars and supernovae. Although a large fraction of these positrons annihilate in the ejecta of SNe/SNRs, no point source of annihilation radiation appears in the INTEGRAL/SPI map of the 511 keV emission. We exploit the absence of detectable annihilation emission from young local SNe/SNRs to derive constraints on the transport of MeV positrons inside SN/SNR ejecta and their escape into the CSM/ISM, both aspects being crucial to the understanding of the observed Galactic 511 keV emission. We simulated the 511 keV light curves resulting from the annihilation of the decay positrons of 56Ni and 44Ti in SNe/SNRs and their surroundings using a simple model. We computed specific 511 keV light curves for Cas A, Tycho, Kepler, SN1006, G1.9+0.3 and SN1987A, and compared these to the upper limits derived from INTEGRAL/SPI observations. The predicted 511 keV signals from positrons annihilating in the ejecta are below the sensitivity of the SPI instrument by several orders of magnitude, but the predicted 511 keV signals for positrons escaping the ejecta and annihilating in the surrounding medium allowed us to derive upper limits on the positron escape fraction of ~13% for Cas A, ~12% for Tycho, ~30% for Kepler and ~33% for SN1006. The transport of ~MeV positrons inside SNe/SNRs cannot be constrained from current observations of the 511 keV emission from these objects, but the limits obtained on their escape fraction are consistent with a nucleosynthesis origin of the positrons that give rise to the diffuse Galactic 511 keV emission. Comment: 15 pages, 11 figures, accepted for publication in A&A
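
    As a toy version of the kind of estimate such a model makes, the sketch below converts a 44Ti yield into a present-day 511 keV flux under the crude assumption that every decay positron annihilates immediately into two 511 keV photons (no positronium, no delay, so it is an upper bound). The yield, age, and distance are illustrative Cas A-like values of ours, and the nuclear inputs (half-life ~59 yr and positron branching ~0.94 for the 44Ti chain) should be checked against current tables.

        #include <math.h>
        #include <stdio.h>

        int main(void)
        {
            const double MSUN_G = 1.989e33;
            const double AMU_G  = 1.661e-24;
            const double YR_S   = 3.156e7;
            const double PC_CM  = 3.086e18;
            const double LN2    = 0.6931471805599453;
            const double PI     = 3.141592653589793;

            double M44   = 1.0e-4 * MSUN_G;  /* 44Ti yield, illustrative      */
            double t     = 330.0;            /* yr since explosion, Cas A-like */
            double thalf = 59.0;             /* yr, approx 44Ti half-life      */
            double fbeta = 0.94;             /* positron branching of chain    */
            double d     = 3.4e3 * PC_CM;    /* distance, Cas A-like           */

            double N0     = M44 / (44.0 * AMU_G);    /* initial 44Ti atoms */
            double lambda = LN2 / (thalf * YR_S);    /* decay constant, /s */
            double q      = N0 * lambda * exp(-lambda * t * YR_S) * fbeta;

            /* Upper bound: 2 photons of 511 keV per positron, instantly. */
            double flux = 2.0 * q / (4.0 * PI * d * d);
            printf("e+ rate = %.2e /s, F511 <= %.2e ph/cm^2/s\n", q, flux);
            return 0;
        }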

    An Evaluation of Software Release-Consistent Protocols

    Get PDF
    This paper presents an evaluation of three software implementations of release consistency. Release-consistent protocols allow data communication to be aggregated and multiple writers to simultaneously modify a single page. We evaluated an eager invalidate protocol that enforces consistency when synchronization variables are released, a lazy invalidate protocol that enforces consistency when synchronization variables are acquired, and a lazy hybrid protocol that selectively uses update to reduce access misses. Our evaluation is based on implementations running on DECstation-5000/240s connected by an ATM LAN, and on an execution-driven simulator that allows us to vary network parameters. Our results show that the lazy protocols consistently outperform the eager protocol for all but one application, and that the lazy hybrid performs the best overall. However, the relative performance of the implementations is highly dependent on the relative speeds of the network, processor, and communication software. Lower bandwidths and high per-byte software communication costs favor the lazy invalidate protocol, while high bandwidths and low per-byte costs favor the hybrid. Performance of the eager protocol approaches that of the lazy protocols only when communication becomes essentially free.
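
    The multiple-writer support mentioned above rests on a twin/diff mechanism: on the first write to a protected page the run time saves a copy (the twin), and at a release it encodes only the bytes that differ. A minimal, self-contained sketch (names ours):

        #include <stdlib.h>
        #include <string.h>

        #define PAGE_SIZE 4096

        /* On the first write fault for a page, save a pristine copy;
         * the page is then write-enabled again. */
        unsigned char *make_twin(const unsigned char *page)
        {
            unsigned char *twin = malloc(PAGE_SIZE);
            memcpy(twin, page, PAGE_SIZE);
            return twin;
        }

        /* At release: encode the bytes this writer changed as (offset,
         * length, data) runs. Two writers of the same page produce
         * disjoint diffs as long as they touched different bytes, so both
         * diffs can be applied at a reader -- that is what makes
         * concurrent writers safe. */
        size_t encode_diff(const unsigned char *page,
                           const unsigned char *twin, unsigned char *out)
        {
            size_t n = 0;
            for (size_t i = 0; i < PAGE_SIZE; ) {
                if (page[i] == twin[i]) { i++; continue; }
                size_t start = i;
                while (i < PAGE_SIZE && page[i] != twin[i]) i++;
                /* run header: 2-byte offset, 2-byte length, then bytes */
                out[n++] = start >> 8;       out[n++] = start & 0xff;
                out[n++] = (i - start) >> 8; out[n++] = (i - start) & 0xff;
                memcpy(out + n, page + start, i - start);
                n += i - start;
            }
            return n;   /* bytes to ship instead of the whole page */
        }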

    Quantifying the Performance Differences Between PVM and TreadMarks

    Get PDF
    We compare two systems for parallel programming on networks of workstations: Parallel Virtual Machine (PVM), a message passing system, and TreadMarks, a software distributed shared memory (DSM) system. We present results for eight applications that were implemented using both systems. The programs are Water and Barnes-Hut from the SPLASH benchmark suite; 3-D FFT, Integer Sort (IS) and Embarrassingly Parallel (EP) from the NAS benchmarks; ILINK, a widely used genetic linkage analysis program; and Successive Over-Relaxation (SOR) and Traveling Salesman (TSP). Two different input data sets were used for five of the applications. We use two execution environments. The first is a 155-Mbps ATM network with eight Sparc-20 model 61 workstations; the second is an eight-processor IBM SP/2. The differences in speedup between TreadMarks and PVM are dependent on the application and, only to a much lesser extent, on the platform and the data set used. In particular, the TreadMarks speedup for six of the eight applications is within 15% of that achieved with PVM. For one application, the difference in speedup is between 15% and 30%, and for one application, the difference is around 50%. More important than the actual differences in speedups, we investigate the causes behind these differences. The cost of sending and receiving messages on current networks of workstations is very high, and previous work has identified communication costs as the primary source of overhead in software DSM implementations. The observed performance differences between PVM and TreadMarks are therefore primarily a result of differences in the amount of communication between the two systems. We identified four factors that contribute to the larger amount of communication in TreadMarks: 1) extra messages due to the separation of synchronization and data transfer, 2) extra messages to handle access misses caused by the use of an invalidate protocol, 3) false sharing, and 4) diff accumulation for migratory data. We have quantified the effect of the last three factors by measuring the performance gain when each is eliminated. Because the separation of synchronization and data transfer is a fundamental characteristic of the shared memory model, there is no way to measure its contribution to performance without completely deviating from the shared memory model. Of the three remaining factors, TreadMarks' inability to send data belonging to different pages in a single message is the most important. The effect of false sharing is quite limited. Reducing diff accumulation benefits migratory data only when the diffs completely overlap. When these performance impediments are removed, all of the TreadMarks programs perform within 25% of PVM, and for six out of eight experiments, TreadMarks is less than 5% slower than PVM.
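
    The diff-accumulation point deserves a small illustration. When a page migrates from processor to processor, each hop leaves a diff, and a later reader must fetch and apply the whole chain, even though for truly migratory data the last diff often rewrites the same bytes. Only when the diffs completely overlap can the chain be collapsed to its last element. A hedged sketch of why that is (types and names ours):

        #include <string.h>

        #define PAGE_SIZE 4096

        struct diff { size_t off, len; const unsigned char *data; };

        /* Apply a chain of diffs in the order they were created. */
        void apply_chain(unsigned char *page, const struct diff *d, int n)
        {
            for (int k = 0; k < n; k++)
                memcpy(page + d[k].off, d[k].data, d[k].len);
        }

        /* Collapsing is only safe when every earlier diff is contained in
         * the byte range of the last one; then the last write wins on
         * every byte and the earlier diffs need not be fetched at all. */
        int can_collapse(const struct diff *d, int n)
        {
            for (int k = 0; k < n - 1; k++)
                if (d[k].off < d[n-1].off ||
                    d[k].off + d[k].len > d[n-1].off + d[n-1].len)
                    return 0;   /* partial overlap: whole chain required */
            return 1;
        }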