Five Tales of Random Forest Regression

Author: Carliles Samuel Lee
Publication venue: 'The Busan Gyeongnam Mathematical Society'
Publication date: 19/04/2017

We present a set of variations on the theme of Random Forest regression: two applications to the problem of estimating galactic distances based on photometry, which produce results comparable to or better than all other current approaches to the problem; an extension of the methodology to produce error distribution variance estimates for individual regression estimates, a property that appears unique among non-parametric regression estimators; an exponential asymptotic improvement in algorithmic training speed over the current de facto standard implementation, derived from a theoretical model of the training process combined with competent software engineering; a massively parallel implementation of the regression algorithm for a GPGPU cluster integrated with a distributed database management system, resulting in a fast roundtrip ingest-analyze-archive procedure on a system with total power consumption under 1 kW; and a novel theoretical comparison of the methodology with that of kernel regression, relating the Random Forest bootstrap sample size to the kernel regression bandwidth parameter and resulting in a novel extension of the Random Forest methodology that offers lower mean-squared error than the standard methodology.

JScholarship

New Approaches To Photometric Redshift Prediction Via Gaussian Process Regression In The Sloan Digital Sky Survey

Author: A. N. Srivastava
Abazajian
Abazajian
Abazajian
Adelman-McCarthy
Ball
Bernstein
Blanton
Carliles
Csabai
D'Abrusco
Efron
Eisenstein
Foster
Giavalisco
Golub
Ivezic
Kaczmarczik
Kurtz
L. V. Foster
M. J. Way
Martin
Neal
P. R. Gazis
Rasmussen
Seeger
Skrutskie
Stoughton
Strauss
Suchkov
Sérsic
Wahba
Wang
Wang
Way
Wray
York
Publication venue: 'IOP Publishing'
Publication date: 26/10/2009

Expanding upon the work of Way and Srivastava 2006 we demonstrate how the use of training sets of comparable size continues to make Gaussian process regression (GPR) a competitive approach to that of neural networks and other least-squares fitting methods. This is possible via new large size matrix inversion techniques developed for Gaussian processes (GPs) that do not require that the kernel matrix be sparse. This development, combined with a neural-network kernel function, appears to give superior results for this problem. Our best fit results for the Sloan Digital Sky Survey (SDSS) Main Galaxy Sample using u,g,r,i,z filters give an rms error of 0.0201 while our results for the same filters in the luminous red galaxy sample yield 0.0220. We also demonstrate that there appears to be a minimum number of training-set galaxies needed to obtain the optimal fit when using our GPR rank-reduction methods. We find that morphological information included with many photometric surveys appears, for the most part, to make the photometric redshift evaluation slightly worse rather than better. This would indicate that most morphological information simply adds noise from the GP point of view in the data used herein. In addition, we show that cross-match catalog results involving combinations of the Two Micron All Sky Survey, SDSS, and Galaxy Evolution Explorer have to be evaluated in the context of the resulting cross-match magnitude and redshift distribution. Otherwise one may be misled into overly optimistic conclusions. Comment: 32 pages, ApJ in Press, 2 new figures, 1 new table of comparison methods, updated discussion, references and typos to reflect version in Pres

arXiv.org e-Print Archive

Crossref

ArborZ: Photometric Redshifts Using Boosted Decision Trees

Author: Abbott
Adam J. Sypniewski
Adelman-McCarthy
Ball
Ball
Baum
Benítez
Blanton
Breiman
Carliles
Coe
David W. Gerdes
Davis
Eisenstein
Faber
Fernández-Soto
Giavalisco
Hansen
Hastie
Hoecker
Ilbert
Ivezic
Jiangang Hao
Koo
Lin
Matthew R. Weis
Michael T. Busha
Mitchell
Newman
Oyaizu
Risa H. Wechsler
Scoville
Strauss
Timothy A. McKay
York
Zehavi
Zheng
Publication venue: 'IOP Publishing'
Publication date: 27/08/2009

Precision photometric redshifts will be essential for extracting cosmological parameters from the next generation of wide-area imaging surveys. In this paper we introduce a photometric redshift algorithm, ArborZ, based on the machine-learning technique of Boosted Decision Trees. We study the algorithm using galaxies from the Sloan Digital Sky Survey and from mock catalogs intended to simulate both the SDSS and the upcoming Dark Energy Survey. We show that it improves upon the performance of existing algorithms. Moreover, the method naturally leads to the reconstruction of a full probability density function (PDF) for the photometric redshift of each galaxy, not merely a single "best estimate" and error, and also provides a photo-z quality figure-of-merit for each galaxy that can be used to reject outliers. We show that the stacked PDFs yield a more accurate reconstruction of the redshift distribution N(z). We discuss limitations of the current algorithm and ideas for future work. Comment: 10 pages, 13 figures, submitted to Ap
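
The idea of stacking per-galaxy PDFs to estimate N(z) can be illustrated with a minimal sketch. This is not ArborZ code: the function names `stack_pdfs` and `peak_bin_histogram`, the shared redshift grid, and the equal weighting of galaxies are all assumptions made for illustration.

```python
def stack_pdfs(pdfs):
    """Average per-galaxy redshift PDFs that share one grid of redshift bins.

    `pdfs` is a list of lists; each inner list holds one galaxy's probability
    per bin and is assumed to sum to 1. (Hypothetical helper, not ArborZ.)
    """
    if not pdfs:
        raise ValueError("need at least one PDF")
    n_bins = len(pdfs[0])
    if any(len(p) != n_bins for p in pdfs):
        raise ValueError("all PDFs must use the same redshift grid")
    n_gal = len(pdfs)
    # Equal-weight average of the PDFs, bin by bin.
    return [sum(p[i] for p in pdfs) / n_gal for i in range(n_bins)]


def peak_bin_histogram(pdfs):
    """Estimate N(z) from single best estimates: one count per galaxy,
    placed in the bin where that galaxy's PDF peaks."""
    if not pdfs:
        raise ValueError("need at least one PDF")
    counts = [0] * len(pdfs[0])
    for p in pdfs:
        counts[p.index(max(p))] += 1
    return [c / len(pdfs) for c in counts]
```

For two galaxies whose PDFs are spread over neighbouring bins, `stack_pdfs` keeps the probability that each galaxy assigns away from its peak, while `peak_bin_histogram` discards it. That is the intuition behind comparing the two reconstructions.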

arXiv.org e-Print Archive

Crossref

Reconstructing galaxy fundamental distributions and scaling relations from photometric redshift surveys. Applications to the SDSS early-type sample

Author: Banerji
Bernardi
Bernardi
Blanton
Blanton
Bolton
Bolzonella
Bridle
Budavári
Budavári
Carliles
Changbom Park
Coleman
Collister
Connolly
Croom
Csabai
Faber
Feldmann
Fosalba
Frieman
Graziano Rossi
Hildebrandt
Hoaglin
Hyde
Ilbert
Jouvel
Kormendy
Krauss
Lilly
Lima
Lucy
Ma
Mandelbaum
Melbourne
Oyaizu
Oyaizu
Padmanabhan
Park
Ravi K. Sheth
Richards
Rossi
Sako
Salvato
Saracco
Schmidt
Sheth
Stabenau
Sun
Tully
van den Bosch
van der Wel
Yasuda
Publication venue: 'Wiley'
Publication date: 07/10/2009

Noisy distance estimates associated with photometric rather than spectroscopic redshifts lead to a mis-estimate of the luminosities, and produce a correlated mis-estimate of the sizes. We consider a sample of early-type galaxies from the SDSS DR6 for which both spectroscopic and photometric information is available, and apply the generalization of the V_max method to correct for these biases. We show that our technique recovers the true redshift, magnitude and size distributions, as well as the true size-luminosity relation. We find that using only 10% of the spectroscopic information randomly spaced in our catalog is sufficient for the reconstructions to be accurate within about 3%, when the photometric redshift error is dz = 0.038. We then address the problem of extending our method to deep redshift catalogs, where only photometric information is available. In addition to the specific applications outlined here, our technique impacts a broader range of studies, when at least one distance-dependent quantity is involved. It is particularly relevant for the next generation of surveys, some of which will only have photometric information. Comment: 14 pages, 12 figures, 1 table, new section 3.1 and appendix added, MNRAS in pres

arXiv.org e-Print Archive

Crossref

Automated measurement of redshift from mid-infrared low resolution spectroscopy

Author: Antonio Hernán-Caballero
Abramo
Assef
Avni
Babbedge
Baum
Benítez
Benítez
Bolzonella
Brodwin
Brunner
Bruzual
Bruzual
Carliles
Coleman
Collister
Connolly
Farrah
Feldmann
Fernández-Soto
Goicoechea
Gwyn
Hatziminaoglou
Hernán-Caballero
Hernán-Caballero
Houck
Houck
Ilbert
Imanishi
Koo
Lanzetta
Le Borgne
Lebouteiller
Matute
Murakami
Murphy
Negrello
Onaka
Oyaizu
Pirzkal
Press
Richards
Rowan-Robinson
Sawicki
Silva
Trouille
Wada
Wadadekar
Wang
Weedman
Weedman
Wirth
Wright
Yan
Publication venue: 'Wiley'
Publication date: 13/09/2011

We present a new SED-fitting based routine for redshift determination that is optimised for mid-infrared (MIR) low-resolution spectroscopy. Its flexible template scaling increases the sensitivity to slope changes and small scale features in the spectrum, while a new selection algorithm called Maximum Combined Pseudo-Likelihood (MCPL) provides increased accuracy and a lower number of outliers compared to the standard maximum-likelihood (ML) approach. Unlike ML, MCPL searches for local (instead of absolute) maxima of a 'pseudo-likelihood' (PL) function, and combines results obtained for all the templates in the library to weed out spurious redshift solutions. The capabilities of MCPL are demonstrated by comparing its results to those of regular ML and to the optical spectroscopic redshifts of a sample of 491 Spitzer/IRS spectra from sources at 0<z<3.7. MCPL achieves a redshift accuracy dz/(1+z)<0.005 for 78% of the galaxies in the sample compared to 68% for ML. The rate of outliers (dz/(1+z)>0.02) is 14% for MCPL and 22% for ML. chi^2 values for ML solutions are found to correlate with the SNR of the spectra, but not with redshift accuracy. By contrast, the peak value of the normalised combined PL (gamma) is found to provide a good indication of the reliability of the MCPL solution for individual sources. The accuracy and reliability of the redshifts depend strongly on the MIR SED. Sources with significant polycyclic aromatic hydrocarbon emission obtain much better results compared to sources dominated by AGN continuum. Nevertheless, for a given gamma the frequency of accurate solutions and outliers is largely independent of their SED type. This reliability indicator for MCPL solutions allows the selection of subsamples with highly reliable redshifts. In particular, a gamma>0.15 threshold retains 79% of the sources with dz/(1+z)<0.005 while reducing the outlier rate to 3.8% (abridged). Comment: 23 pages, 12 figures, 5 tables. Accepted for publication in MNRA
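
The two quality figures quoted in this abstract, the fraction of sources with dz/(1+z)<0.005 and the outlier rate with dz/(1+z)>0.02, can be computed as in the sketch below. It is an illustration only, not the authors' code. The function name `redshift_quality` is hypothetical, and two conventions are assumed: dz is taken as the absolute difference between estimated and reference redshift, and the z in the denominator is the reference (spectroscopic) value.

```python
def redshift_quality(z_est, z_ref, accurate_tol=0.005, outlier_tol=0.02):
    """Return (accurate_fraction, outlier_fraction) for paired redshifts.

    Assumptions (not stated in the abstract): dz = |z_est - z_ref| and the
    normalisation uses the reference redshift, i.e. dz / (1 + z_ref).
    """
    if len(z_est) != len(z_ref) or len(z_ref) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    # Normalised redshift error for every source.
    errors = [abs(ze - zr) / (1.0 + zr) for ze, zr in zip(z_est, z_ref)]
    n = len(errors)
    accurate = sum(1 for e in errors if e < accurate_tol) / n
    outliers = sum(1 for e in errors if e > outlier_tol) / n
    return accurate, outliers
```

Sources whose error falls between the two thresholds count as neither accurate nor outliers, so the two fractions need not sum to one.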

arXiv.org e-Print Archive

Crossref

Digital.CSIC

Scholar Commons - University of South Florida

Automated measurement of redshift from mid-infrared low resolution spectroscopy

Author: Abramo
Antonio Hernán-Caballero
Assef
Avni
Babbedge
Baum
Benítez
Benítez
Bolzonella
Brodwin
Brunner
Bruzual
Bruzual
Carliles
Coleman
Collister
Connolly
Farrah
Feldmann
Fernández-Soto
Goicoechea
Gwyn
Hatziminaoglou
Hernán-Caballero
Hernán-Caballero
Houck
Houck
Ilbert
Imanishi
Koo
Lanzetta
Le Borgne
Lebouteiller
Matute
Murakami
Murphy
Negrello
Onaka
Oyaizu
Pirzkal
Press
Richards
Rowan-Robinson
Sawicki
Silva
Trouille
Wada
Wadadekar
Wang
Weedman
Weedman
Wirth
Wright
Yan
Publication venue: 'Wiley'
Publication date: 04/09/2012

arXiv.org e-Print Archive

Crossref

A Comparison of Photometric Redshift Techniques for Large Radio Surveys

Author: Brescia M.
Budavari T.
Carliles S.
Cavuoti S.
Farrah D.
Geach J.
Longo G.
Luken K.
Musaeva A.
Norris Ray P.
Polsterer K.
Riccio G.
Salvato M.
Seymour N.
Smolčić V.
Vaccari M.
Zinn P.
Publication venue: 'Astronomical Society of the Pacific Conference Series'
Publication date: 01/01/2019

Future radio surveys will generate catalogs of tens of millions of radio sources, for which redshift estimates will be essential to achieve many of the science goals. However, spectroscopic data will be available for only a small fraction of these sources, and in most cases even the optical and infrared photometry will be of limited quality. Furthermore, radio sources tend to be at higher redshift than most optical sources (most radio surveys have a median redshift greater than 1) and so a significant fraction of radio source hosts differ from those for which most photometric redshift templates are designed. We therefore need to develop new techniques for estimating the redshifts of radio sources. As a starting point in this process, we evaluate a number of machine-learning techniques for estimating redshift, together with a conventional template-fitting technique. We pay special attention to how the performance is affected by the incompleteness of the training sample and by sparseness of the parameter space or by limited availability of ancillary multiwavelength data. As expected, we find that the quality of the photometric redshifts degrades as the quality of the photometry decreases, but that even with the limited quality of photometry available for all-sky surveys, useful redshift information is available for the majority of sources, particularly at low redshift. We find that a template-fitting technique performs best in the presence of high-quality and almost complete multi-band photometry, especially if radio sources that are also X-ray emitting are treated separately, using specific templates and priors. When we reduced the quality of photometry to match that available for the EMU all-sky radio survey, the quality of the template-fitting degraded and became comparable to some of the machine-learning methods.
Machine learning techniques currently perform better at low redshift than at high redshift, because of incompleteness of the currently available training data at high redshifts.

arXiv.org e-Print Archive

OA@INAF - Istituto Nazionale di Astrofisica

Caltech Authors

Western Sydney ResearchDirect

MPG.PuRe

A comparison of photometric redshift techniques for large radio surveys

Author: Brescia M.
Budavari T.
Carliles S.
Cavuoti S.
Farrah D.
Geach J.
Longo G.
Luken K.
Musaeva A.
Norris R. P.
Polsterer K.
Riccio G.
Salvato M.
Seymour N.
Smolcic V.
Vaccari M.
Zinn P.
Publication venue: 'IOP Publishing'
Publication date: 01/01/2019

Archivio della ricerca - Università degli studi di Napoli Federico II