Robust regression with imprecise data
We consider the problem of regression analysis with imprecise data. By imprecise data we mean imprecise observations of precise quantities, given as sets of values. In this paper, we explore a recently introduced likelihood-based approach to regression with such data. The approach is very general: it covers all kinds of imprecise data (i.e., not only intervals) and is not restricted to linear regression. Its result is a set of functions, reflecting the entire uncertainty of the regression problem. Here we study in particular a robust special case of likelihood-based imprecise regression, which can be interpreted as a generalization of the method of least median of squares. Moreover, we apply it to data from a social survey and compare it with other approaches to regression with imprecise data. It turns out that the likelihood-based approach is the most generally applicable one and is the only approach that accounts for multiple sources of uncertainty at the same time.
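For orientation, the classical method of least median of squares that the abstract refers to can be sketched for ordinary precise data. This is only an illustration under stated assumptions, not the paper's likelihood-based imprecise method: the function name `lms_line` is hypothetical, and the search over lines through pairs of data points is a common approximation rather than an exact minimizer.

```python
from itertools import combinations
from statistics import median


def lms_line(xs, ys):
    """Approximate least-median-of-squares line for precise (x, y) data.

    Assumption: candidate lines are restricted to those passing through
    two data points, which approximates the true LMS minimizer.
    Returns (intercept, slope, median_squared_residual).
    """
    points = list(zip(xs, ys))
    best = None
    for (x1, y1), (x2, y2) in combinations(points, 2):
        if x1 == x2:
            continue  # a vertical line cannot be written as y = a + b*x
        slope = (y2 - y1) / (x2 - x1)
        intercept = y1 - slope * x1
        # LMS criterion: the median (not the sum) of squared residuals
        crit = median((y - (intercept + slope * x)) ** 2 for x, y in points)
        if best is None or crit < best[0]:
            best = (crit, intercept, slope)
    if best is None:
        raise ValueError("need at least two points with distinct x values")
    return best[1], best[2], best[0]
```

Because the criterion is a median, a minority of gross outliers does not pull the fitted line, which is the robustness property the abstract alludes to.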
Linear regression for numeric symbolic variables: an ordinary least squares approach based on Wasserstein Distance
In this paper we present a linear regression model for modal symbolic data.
The observed variables are histogram variables according to the definition
given in the framework of Symbolic Data Analysis and the parameters of the
model are estimated using the classic Least Squares method. An appropriate
metric is introduced in order to measure the error between the observed and the
predicted distributions. In particular, the Wasserstein distance is proposed.
Some properties of this metric are exploited to predict the response variable
as a direct linear combination of other independent histogram variables. Measures
of goodness of fit are discussed. An application to real data corroborates the
proposed method.
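As a rough illustration of the error measure mentioned above, the L2 Wasserstein distance between two one-dimensional distributions equals the L2 distance between their quantile functions. The sketch below approximates it numerically for two histograms. It rests on assumptions that are not from the paper: a histogram is represented as a pair `(edges, weights)`, density is uniform inside each bin, and the names `hist_quantile` and `wasserstein2` are hypothetical.

```python
from bisect import bisect_left
from itertools import accumulate


def hist_quantile(edges, weights, p):
    """Quantile at level p (0 < p < 1) of a histogram.

    Assumption: mass is spread uniformly inside each bin, so the
    quantile function is piecewise linear.
    """
    total = sum(weights)
    cum = [c / total for c in accumulate(weights)]
    # first bin whose cumulative mass reaches p (clamped for rounding)
    i = min(bisect_left(cum, p), len(weights) - 1)
    lo = cum[i - 1] if i > 0 else 0.0
    width = cum[i] - lo
    frac = (p - lo) / width if width > 0 else 0.0
    return edges[i] + frac * (edges[i + 1] - edges[i])


def wasserstein2(h1, h2, n=1000):
    """Approximate L2 Wasserstein distance between two histograms.

    Uses the midpoint rule over n quantile levels to integrate the
    squared difference of the two quantile functions.
    """
    sq = 0.0
    for k in range(n):
        p = (k + 0.5) / n
        d = hist_quantile(h1[0], h1[1], p) - hist_quantile(h2[0], h2[1], p)
        sq += d * d
    return (sq / n) ** 0.5
```

For example, two histograms with the same shape whose bins are shifted by a constant are separated by exactly that constant, since their quantile functions differ by it at every level.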
The first analytical expression to estimate photometric redshifts suggested by a machine
We report the first analytical expression purely constructed by a machine to
determine photometric redshifts of galaxies. A simple and
reliable functional form is derived using galaxies from the Sloan
Digital Sky Survey Data Release 10 (SDSS-DR10) spectroscopic sample. The method
automatically dropped two of the photometric bands, relying only on the other
three for the final solution. Applying this expression to other SDSS-DR10
galaxies with measured spectroscopic redshifts, we achieved a
mean […] and a scatter […] when averaged up to […]. The method was
also applied to the PHAT0 dataset, confirming the competitiveness of our
results when compared with other methods from the literature. This is the first
use of symbolic regression in cosmology, representing a leap forward in
the connection between astronomy and data mining.
Comment: 6 pages, 4 figures. Accepted for publication in MNRAS Letters.
Likelihood-based Imprecise Regression
We introduce a new approach to regression with imprecisely observed data, combining likelihood inference with ideas from imprecise probability theory, and thereby taking different kinds of uncertainty into account. The approach is very general and applicable to various kinds of imprecise data, not only to intervals.
In the present paper, we propose a regression method based on this approach, where no parametric distributional assumption is needed and interval estimates of quantiles of the error distribution are used to identify plausible descriptions of the relationship of interest. Therefore, the proposed regression method is very robust.
We apply our robust regression method to an interesting question in the social sciences. The analysis, based on survey data, yields a relatively imprecise result, reflecting the high degree of uncertainty inherent in the analyzed data set.