33 research outputs found

    A Comparison of Photometric Redshift Techniques for Large Radio Surveys

    Future radio surveys will generate catalogs of tens of millions of radio sources, for which redshift estimates will be essential to achieving many of the science goals. However, spectroscopic data will be available for only a small fraction of these sources, and in most cases even the optical and infrared photometry will be of limited quality. Furthermore, radio sources tend to lie at higher redshift than most optical sources (most radio surveys have a median redshift greater than 1), and so a significant fraction of radio sources have hosts that differ from those for which most photometric-redshift templates are designed. We therefore need to develop new techniques for estimating the redshifts of radio sources. As a starting point, we evaluate a number of machine-learning techniques for estimating redshift, together with a conventional template-fitting technique. We pay special attention to how performance is affected by the incompleteness of the training sample, by the sparseness of the parameter space, and by the limited availability of ancillary multiwavelength data. As expected, we find that the quality of the photometric redshifts degrades as the quality of the photometry decreases, but that even with the limited photometry available for all-sky surveys, useful redshift information can be obtained for the majority of sources, particularly at low redshift. We find that a template-fitting technique performs best in the presence of high-quality and almost complete multi-band photometry, especially if radio sources that are also X-ray emitters are treated separately, using specific templates and priors. When we reduced the quality of the photometry to match that available for the EMU all-sky radio survey, the quality of the template fitting degraded and became comparable to that of some of the machine-learning methods. Machine-learning techniques currently perform better at low redshift than at high redshift, because of the incompleteness of the available training data at high redshifts.
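The machine-learning side of such a comparison can be sketched in a few lines. The following toy illustration (invented synthetic photometry, scikit-learn assumed; it does not reproduce the study's actual codes or data) trains a random-forest regressor on five-band colours and scores the normalised residuals Δz/(1+z):

```python
# Toy sketch (invented data, scikit-learn assumed): estimate redshift from
# broadband photometry with a machine-learning regressor, then score the
# normalised residuals dz = (z_phot - z_true) / (1 + z_true).
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
n = 2000
z_true = rng.uniform(0.0, 3.0, n)               # stand-in spectroscopic redshifts
# Fake five-band magnitudes whose colours correlate with redshift.
mags = 20.0 + z_true[:, None] * rng.uniform(0.2, 1.0, 5) + rng.normal(0.0, 0.1, (n, 5))
colours = np.diff(mags, axis=1)                 # many ML photo-z codes train on colours

model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(colours[:1500], z_true[:1500])        # "training sample" with known z
z_phot = model.predict(colours[1500:])          # photo-z for the held-out sources

dz = (z_phot - z_true[1500:]) / (1.0 + z_true[1500:])
print(f"NMAD scatter: {1.4826 * np.median(np.abs(dz)):.3f}")
```

Degrading the photometric noise or shrinking the training sample in such a sketch mimics the incompleteness effects the abstract describes.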

    Active Learning to Overcome Sample Selection Bias: Application to Photometric Variable Star Classification

    Despite the great promise of machine-learning algorithms to classify and predict astrophysical parameters for the vast numbers of astrophysical sources and transients observed in large-scale surveys, the peculiarities of the training data often manifest as strongly biased predictions on the data of interest. Typically, training sets are derived from historical surveys of brighter, more nearby objects than those in the more extensive, deeper surveys (the testing data). This sample selection bias can cause catastrophic errors in predictions on the testing data because (a) standard assumptions underlying machine-learned model selection procedures break down and (b) dense regions of the testing space may be completely devoid of training data. We explore possible remedies to sample selection bias, including importance weighting (IW), co-training (CT), and active learning (AL). We argue that AL, in which the data whose inclusion in the training set would most improve predictions on the testing set are queried for manual follow-up, is an effective approach that is appropriate for many astronomical applications. For a variable-star classification problem on a well-studied set of stars from Hipparcos and OGLE, AL is the optimal method in terms of error rate on the testing data, beating the off-the-shelf classifier by 3.4% and the other proposed methods by at least 3.0%. To aid with manual labeling of variable stars, we developed a web interface which allows for easy light-curve visualization and querying of external databases. Finally, we apply active learning to classify variable stars in the ASAS survey, finding dramatic improvement in our agreement with the ACVS catalog, from 65.5% to 79.5%, and a significant increase in the classifier's average confidence on the testing set, from 14.6% to 42.9%, after a few AL iterations.
Comment: 43 pages, 11 figures, submitted to Ap
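The active-learning loop described above can be sketched as follows. This toy version uses simple uncertainty sampling as a stand-in for the paper's criterion of querying the objects whose inclusion would most improve test-set predictions, and the simulated `y_test` labels play the role of manual follow-up:

```python
# Toy sketch of the active-learning remedy: repeatedly move the test objects
# the classifier is least certain about into the training set. Uncertainty
# sampling here is a simplified stand-in for the paper's query criterion, and
# y_test plays the role of the manual follow-up labels.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(1)
# Biased training set (think: brighter, nearby objects) vs. shifted test set.
X_train = rng.normal(0.0, 1.0, (200, 3))
y_train = (X_train[:, 0] > 0).astype(int)
X_test = rng.normal(1.5, 1.0, (400, 3))
y_test = (X_test[:, 0] > 0).astype(int)

for _ in range(5):                               # a few AL iterations
    clf = RandomForestClassifier(n_estimators=50, random_state=0)
    clf.fit(X_train, y_train)
    margin = np.abs(clf.predict_proba(X_test)[:, 1] - 0.5)
    query = np.argsort(margin)[:20]              # 20 most uncertain objects
    X_train = np.vstack([X_train, X_test[query]])          # "follow them up"
    y_train = np.concatenate([y_train, y_test[query]])

accuracy = clf.score(X_test, y_test)             # agreement on the test set
```

In a real application the queried labels would come from spectroscopic or visual follow-up rather than from `y_test`, and the final accuracy would be measured on objects never queried.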

    The Fifth Data Release of the Sloan Digital Sky Survey

    This paper describes the Fifth Data Release (DR5) of the Sloan Digital Sky Survey (SDSS). DR5 includes all survey-quality data taken through June 2005 and represents the completion of the SDSS-I project (whose successor, SDSS-II, will continue through mid-2008). It includes five-band photometric data for 217 million objects selected over 8000 square degrees, and 1,048,960 spectra of galaxies, quasars, and stars selected from 5713 square degrees of that imaging data. These numbers represent a roughly 20% increment over those of the Fourth Data Release; all the data from previous data releases are included in the present release. In addition to "standard" SDSS observations, DR5 includes repeat scans of the southern equatorial stripe, imaging scans across M31 and the core of the Perseus cluster of galaxies, and the first spectroscopic data from SEGUE, a survey to explore the kinematics and chemical evolution of the Galaxy. The catalog database incorporates several new features, including photometric redshifts of galaxies, tables of matched objects in overlap regions of the imaging survey, and tools that allow precise computations of survey geometry for statistical investigations.
Comment: ApJ Supp, in press, October 2007. This paper describes DR5. The SDSS Sixth Data Release (DR6) is now public, available from http://www.sdss.or

    The Seventh Data Release of the Sloan Digital Sky Survey

    This paper describes the Seventh Data Release of the Sloan Digital Sky Survey (SDSS), marking the completion of the original goals of the SDSS and the end of the phase known as SDSS-II. It includes 11663 deg^2 of imaging data, with most of the roughly 2000 deg^2 increment over the previous data release lying in regions of low Galactic latitude. The catalog contains five-band photometry for 357 million distinct objects. The survey also includes repeat photometry over 250 deg^2 along the Celestial Equator in the Southern Galactic Cap. A coaddition of these data goes roughly two magnitudes fainter than the main survey. The spectroscopy is now complete over a contiguous area of 7500 deg^2 in the Northern Galactic Cap, closing the gap that was present in previous data releases. There are over 1.6 million spectra in total, including 930,000 galaxies, 120,000 quasars, and 460,000 stars. The data release includes improved stellar photometry at low Galactic latitude. The astrometry has all been recalibrated with the second version of the USNO CCD Astrograph Catalog (UCAC-2), reducing the rms statistical errors at the bright end to 45 milliarcseconds per coordinate. A systematic error in bright-galaxy photometry is less severe than previously reported for the majority of galaxies. Finally, we describe a series of improvements to the spectroscopic reductions, including better flat-fielding and improved wavelength calibration at the blue end, better processing of objects with extremely strong narrow emission lines, and an improved determination of stellar metallicities. (Abridged)
Comment: 20 pages, 10 embedded figures. Accepted to ApJS after minor correction

    The Second Data Release of the Sloan Digital Sky Survey

    The Sloan Digital Sky Survey (SDSS) has validated and made publicly available its Second Data Release. This data release consists of 3324 deg^2 of five-band (ugriz) imaging data with photometry for over 88 million unique objects, 367,360 spectra of galaxies, quasars, stars, and calibrating blank sky patches selected over 2627 deg^2 of this area, and tables of measured parameters from these data. The imaging data reach a depth of r ≈ 22.2 (95% completeness limit for point sources) and are photometrically and astrometrically calibrated to 2% rms and 100 mas rms per coordinate, respectively. The imaging data have all been processed through a new version of the SDSS imaging pipeline, in which the most important improvement since the last data release is fixing an error in the model fits to each object. The result is that model magnitudes are now a good proxy for point-spread function magnitudes for point sources, and Petrosian magnitudes for extended sources. The spectroscopy extends from 3800 to 9200 Å at a resolution of 2000. The spectroscopic software now repairs a systematic error in the radial velocities of certain types of stars and has substantially improved spectrophotometry. All data included in the SDSS Early Data Release and First Data Release are reprocessed with the improved pipelines and included in the Second Data Release. Further characteristics of the data are described, as are the data products themselves and the tools for accessing them.

    Rapid, Machine-Learned Resource Allocation: Application to High-redshift GRB Follow-up

    As the number of observed Gamma-Ray Bursts (GRBs) continues to grow, follow-up resources need to be used more efficiently in order to maximize science output from limited telescope time. As such, it is becoming increasingly important to rapidly identify bursts of interest as soon as possible after the event, before the afterglows fade beyond detectability. Studying the most distant (highest redshift) events, for instance, remains a primary goal for many in the field. Here we present our Random forest Automated Triage Estimator for GRB redshifts (RATE GRB-z) for rapid identification of high-redshift candidates using early-time metrics from the three telescopes onboard Swift. While the basic RATE methodology is generalizable to a number of resource allocation problems, here we demonstrate its utility for telescope-constrained follow-up efforts with the primary goal to identify and study high-z GRBs. For each new GRB, RATE GRB-z provides a recommendation - based on the available telescope time - of whether the event warrants additional follow-up resources. We train RATE GRB-z using a set consisting of 135 Swift bursts with known redshifts, only 18 of which are z > 4. Cross-validated performance metrics on this training data suggest that ~56% of high-z bursts can be captured from following up the top 20% of the ranked candidates, and ~84% of high-z bursts are identified after following up the top ~40% of candidates. We further use the method to rank 200+ Swift bursts with unknown redshifts according to their likelihood of being high-z.
Comment: 44 pages, 11 figures, 6 tables, Accepted to Ap
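The triage idea can be sketched as a probabilistic ranking under a follow-up budget. This hypothetical illustration (invented features, thresholds, and data, not the actual Swift early-time metrics) trains a random forest on bursts with known redshifts and recommends the top 20% of new bursts by predicted high-z probability:

```python
# Hypothetical sketch of RATE-style triage: train a random forest on bursts
# with known redshifts, then rank new bursts by predicted probability of
# being high-z and follow up only what the telescope-time budget allows.
# Features, thresholds, and data are invented, not actual Swift metrics.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(2)
n = 135                                          # bursts with known redshift
X = rng.normal(size=(n, 4))                      # stand-ins for early-time metrics
high_z = X[:, 0] + 0.5 * X[:, 1] + rng.normal(0.0, 0.5, n) > 0.5   # "z > 4" proxy

clf = RandomForestClassifier(n_estimators=200, random_state=0)
clf.fit(X, high_z)

X_new = rng.normal(size=(50, 4))                 # bursts with unknown redshift
p_highz = clf.predict_proba(X_new)[:, 1]         # probability of being high-z
budget = int(0.2 * len(X_new))                   # telescope time for the top 20%
follow_up = np.argsort(p_highz)[::-1][:budget]   # indices recommended for follow-up
```

Sweeping the budget fraction and counting the captured high-z events (in cross-validation) would reproduce capture-rate curves of the kind quoted in the abstract.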

    PHAT: PHoto-z Accuracy Testing

    Original article can be found at: http://www.aanda.org Copyright European Southern Observatory.
    Context: Photometric redshifts (photo-z’s) have become an essential tool in extragalactic astronomy. Many current and upcoming observing programmes require great accuracy of photo-z’s to reach their scientific goals. Aims: Here we introduce PHAT, the PHoto-z Accuracy Testing programme, an international initiative to test and compare different methods of photo-z estimation. Methods: Two different test environments are set up: one (PHAT0) based on simulations to test the basic functionality of the different photo-z codes, and another (PHAT1) based on data from the GOODS survey, including 18-band photometry and ~2000 spectroscopic redshifts. Results: The accuracy of the different methods is expressed and ranked by the global photo-z bias, scatter, and outlier rates. While most methods agree very well on PHAT0, there are differences in the handling of the Lyman-α forest at higher redshifts. Furthermore, different methods produce photo-z scatters that can differ by up to a factor of two even in this idealised case. A larger spread in accuracy is found for PHAT1. Few methods benefit from the addition of mid-IR photometry; the accuracy of the other methods is unaffected or suffers when IRAC data are included. Remaining biases and systematic effects can be explained by shortcomings in the different template sets (especially in the mid-IR) and the use of priors on the one hand, and an insufficient training set on the other. Some strategies to overcome these problems are identified by comparing the methods in detail. Scatters of 4–8% in Δz / (1 + z) were obtained, consistent with other studies. However, somewhat larger outlier rates (>7.5% with Δz / (1 + z) > 0.15; >4.5% after cleaning) are found for all codes, which can only partly be explained by AGN or issues in the photometry or the spec-z catalogue. Some outliers were probably missed in past comparisons of photo-z’s to other, less complete spectroscopic surveys. There is a general trend that empirical codes produce smaller biases than template-based codes. Conclusions: The systematic, quantitative comparison of different photo-z codes presented here is a snapshot of the current state of the art of photo-z estimation and sets a standard for the assessment of photo-z accuracy in the future. The rather large outlier rates reported here for PHAT1 on real data should be investigated further, since they are most probably also present (and possibly hidden) in many other studies. The test data sets are publicly available and can be used to compare new, upcoming methods to established ones and to help guide future photo-z method development.
    Peer reviewed
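The ranking metrics used above (global bias, scatter, and outlier rate in Δz/(1+z), with outliers at |Δz/(1+z)| > 0.15) can be computed as in this minimal sketch; the NMAD scatter definition and the toy redshift values are illustrative assumptions, not the PHAT data:

```python
# Minimal sketch of PHAT-style accuracy metrics: global bias, NMAD scatter,
# and outlier rate of dz = (z_phot - z_spec) / (1 + z_spec), with outliers
# defined by |dz| > 0.15 as in the text. The redshift values are toy inputs.
import numpy as np

def photz_metrics(z_phot, z_spec, outlier_cut=0.15):
    dz = (np.asarray(z_phot) - np.asarray(z_spec)) / (1.0 + np.asarray(z_spec))
    bias = dz.mean()
    scatter = 1.4826 * np.median(np.abs(dz - np.median(dz)))   # NMAD
    outlier_rate = np.mean(np.abs(dz) > outlier_cut)
    return bias, scatter, outlier_rate

z_spec = np.array([0.5, 1.0, 1.5, 2.0, 2.5])
z_phot = np.array([0.52, 0.98, 1.55, 2.60, 2.45])   # one catastrophic outlier
bias, scatter, rate = photz_metrics(z_phot, z_spec)
```

Applying the same three numbers to every code on a common test set is what makes the PHAT-style ranking comparable across methods.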