Kernel Regression For Determining Photometric Redshifts From Sloan Broadband Photometry
We present a new approach, kernel regression, to determine photometric
redshifts for 399,929 galaxies in the Fifth Data Release of the Sloan Digital
Sky Survey (SDSS). In our case, kernel regression is a weighted average of
spectral redshifts of the neighbors for a query point, where higher weights are
associated with points that are closer to the query point. One important design
decision when using kernel regression is the choice of the bandwidth. We apply
10-fold cross-validation to choose the optimal bandwidth, taken as the value at
which the cross-validation error reaches its minimum. The experiments show that
the optimal bandwidth differs across input patterns. The lowest rms error of
photometric redshift estimation is 0.019, using color+eClass as the inputs; the
next lowest is 0.020, using ugriz+eClass as the inputs, followed by an rms
scatter of 0.021 with color+r as the inputs. Here eClass is a galaxy spectral
type.
Comment: 6 pages, 2 figures, accepted for publication in MNRA
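The procedure described in this abstract (a distance-weighted average of known redshifts, with the bandwidth picked by 10-fold cross-validation) can be illustrated with a minimal sketch. It assumes a Gaussian kernel and Euclidean distance, neither of which the abstract specifies, and it runs on synthetic data rather than SDSS photometry. The function names `kernel_regress`, `cv_rms`, and `choose_bandwidth` are hypothetical.

```python
import numpy as np


def kernel_regress(X_train, y_train, X_query, bandwidth):
    """Gaussian-kernel weighted average of training targets (Nadaraya-Watson)."""
    # Pairwise squared Euclidean distances, shape (n_query, n_train).
    d2 = ((X_query[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    # Shifting each row by its minimum rescales all weights in that row by the
    # same factor, so the weighted average is unchanged but never divides by 0.
    d2 = d2 - d2.min(axis=1, keepdims=True)
    w = np.exp(-d2 / (2.0 * bandwidth ** 2))
    return (w @ y_train) / w.sum(axis=1)


def cv_rms(X, y, bandwidth, n_folds=10, seed=0):
    """Root-mean-square prediction error estimated by k-fold cross-validation."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(X)), n_folds)
    sq_err = 0.0
    for k in range(n_folds):
        test = folds[k]
        train = np.concatenate([folds[j] for j in range(n_folds) if j != k])
        pred = kernel_regress(X[train], y[train], X[test], bandwidth)
        sq_err += ((pred - y[test]) ** 2).sum()
    return float(np.sqrt(sq_err / len(X)))


def choose_bandwidth(X, y, candidates, n_folds=10):
    """Return the candidate bandwidth with the smallest cross-validated rms."""
    errors = [cv_rms(X, y, h, n_folds) for h in candidates]
    best = int(np.argmin(errors))
    return candidates[best], errors[best]


# Synthetic stand-in for (inputs, spectroscopic redshift) pairs: an assumption
# made purely for illustration, not data from the paper.
rng = np.random.default_rng(1)
X = rng.uniform(0.0, 1.0, size=(300, 2))
y = np.sin(3.0 * X[:, 0]) + X[:, 1] ** 2 + rng.normal(0.0, 0.05, size=300)

candidates = [0.01, 0.03, 0.1, 0.3, 1.0]
best_h, best_rms = choose_bandwidth(X, y, candidates)
print(f"chosen bandwidth = {best_h}, cross-validated rms = {best_rms:.4f}")
```

Because the search is repeated for each set of inputs, different input patterns can end up with different bandwidths, which is consistent with what the abstract reports.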
Rule-based Machine Learning Methods for Functional Prediction
We describe a machine learning method for predicting the value of a
real-valued function, given the values of multiple input variables. The method
induces solutions from samples in the form of ordered disjunctive normal form
(DNF) decision rules. A central objective of the method and representation is
the induction of compact, easily interpretable solutions. This rule-based
decision model can be extended to search efficiently for similar cases prior to
approximating function values. Experimental results on real-world data
demonstrate that the new techniques are competitive with existing machine
learning and statistical methods and can sometimes yield superior regression
performance.
Comment: See http://www.jair.org/ for any accompanying file
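To make the representation concrete, here is a minimal sketch of how an ordered list of DNF rules could produce a real-valued prediction. It covers only rule evaluation, not the induction procedure, which the abstract does not detail. The `Rule` class, the `predict` function, the feature names `x1` and `x2`, and all thresholds and values are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Sample = Dict[str, float]
Condition = Callable[[Sample], bool]


@dataclass
class Rule:
    # Disjunctive normal form: the outer list is an OR over terms,
    # and each inner list is an AND over simple conditions.
    terms: List[List[Condition]]
    value: float  # predicted function value when the rule fires

    def matches(self, x: Sample) -> bool:
        return any(all(cond(x) for cond in term) for term in self.terms)


def predict(rules: List[Rule], default: float, x: Sample) -> float:
    """Return the value of the first rule that matches, else the default."""
    for rule in rules:  # order matters: earlier rules take precedence
        if rule.matches(x):
            return rule.value
    return default


# Hypothetical rule list over two illustrative features.
rules = [
    Rule(
        terms=[
            [lambda x: x["x1"] <= 2.0, lambda x: x["x2"] > 5.0],
            [lambda x: x["x1"] > 8.0],
        ],
        value=10.0,
    ),
    Rule(terms=[[lambda x: x["x2"] <= 5.0]], value=3.5),
]

print(predict(rules, 7.0, {"x1": 1.0, "x2": 6.0}))
print(predict(rules, 7.0, {"x1": 5.0, "x2": 4.0}))
```

Each rule stays readable as a short logical statement, which is the kind of compactness and interpretability the abstract names as a central objective.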
Weighted Polynomial Approximations: Limits for Learning and Pseudorandomness
Polynomial approximations to boolean functions have led to many positive
results in computer science. In particular, polynomial approximations to the
sign function underlie algorithms for agnostically learning halfspaces, as well
as pseudorandom generators for halfspaces. In this work, we investigate the
limits of these techniques by proving inapproximability results for the sign
function.
Firstly, the polynomial regression algorithm of Kalai et al. (SIAM J. Comput.
2008) shows that halfspaces can be learned with respect to log-concave
distributions in the challenging agnostic learning model. The
power of this algorithm relies on the fact that under log-concave
distributions, halfspaces can be approximated arbitrarily well by low-degree
polynomials. We ask whether this technique can be extended beyond log-concave
distributions, and establish a negative result. We show that polynomials of any
degree cannot approximate the sign function to within arbitrarily low error for
a large class of non-log-concave distributions on the real line, including
those with densities proportional to .
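The positive fact that the regression algorithm relies on can be illustrated numerically. The sketch below assumes a standard Gaussian on the real line as the log-concave distribution and fits polynomials of increasing degree to the sign function. It uses ordinary least squares in a Hermite basis as a simple stand-in, not the L1 polynomial regression of Kalai et al., and the function name `sign_fit_rms` is hypothetical.

```python
import numpy as np
from numpy.polynomial import hermite_e as H


def sign_fit_rms(x, degree):
    """RMS error of the least-squares degree-`degree` polynomial fit to sign(x)."""
    y = np.sign(x)
    coeffs = H.hermefit(x, y, degree)      # least-squares fit in the He_n basis
    residual = H.hermeval(x, coeffs) - y
    return float(np.sqrt(np.mean(residual ** 2)))


# Samples from a standard Gaussian, which is log-concave (an assumption chosen
# for illustration; the abstract covers general log-concave distributions).
rng = np.random.default_rng(0)
x = rng.standard_normal(20000)

for degree in (1, 3, 5, 9):
    print(degree, sign_fit_rms(x, degree))
```

On these samples the error shrinks as the degree grows. The negative result in the abstract says that for a large class of non-log-concave distributions no degree brings the error arbitrarily close to zero.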
Secondly, we investigate the derandomization of Chernoff-type concentration
inequalities. Chernoff-type tail bounds on sums of independent random variables
have pervasive applications in theoretical computer science. Schmidt et al.
(SIAM J. Discrete Math. 1995) showed that these inequalities can be established
for sums of random variables with only -wise independence,
for a tail probability of . We show that their results are tight up to
constant factors.
These results rely on techniques from weighted approximation theory, which
studies how well functions on the real line can be approximated by polynomials
under various distributions. We believe that these techniques will have further
applications in other areas of computer science.
Comment: 22 page