Monotonicity preserving approximation of multivariate scattered data
This paper describes a new method of monotone interpolation and smoothing of multivariate scattered data. It is based on the assumption that the function to be approximated is Lipschitz continuous. The method provides the optimal approximation in the worst-case scenario, together with tight error bounds. Smoothing of noisy data subject to monotonicity constraints is converted into a quadratic programming problem. Estimation of the unknown Lipschitz constant from the data by sample splitting and cross-validation is described, and an extension of the method to locally Lipschitz functions is presented.
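The quadratic program behind monotonicity-constrained smoothing can be made concrete in the one-dimensional special case: minimising the sum of squared deviations subject to a nondecreasing fit is solved exactly by the classic pool-adjacent-violators algorithm. The sketch below illustrates that 1-D QP only; it is not the paper's multivariate, Lipschitz-based method.

```python
def monotone_smooth(y):
    """Pool-adjacent-violators: solve min sum_i (m_i - y_i)^2
    subject to m_1 <= m_2 <= ... <= m_n (the 1-D monotone QP)."""
    blocks = []  # each block is [sum of pooled values, count]
    for v in y:
        blocks.append([v, 1])
        # Merge adjacent blocks while their means violate monotonicity.
        while len(blocks) > 1 and (blocks[-2][0] / blocks[-2][1]
                                   > blocks[-1][0] / blocks[-1][1]):
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    fit = []
    for total, count in blocks:
        fit.extend([total / count] * count)
    return fit
```

For input `[1, 3, 2, 4]` the violating pair (3, 2) is pooled to its mean 2.5, giving the nondecreasing fit `[1.0, 2.5, 2.5, 4.0]`.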
Functional Data Analysis in Electronic Commerce Research
This paper describes opportunities and challenges of using functional data
analysis (FDA) for the exploration and analysis of data originating from
electronic commerce (eCommerce). We discuss the special data structures that
arise in the online environment and why FDA is a natural approach for
representing and analyzing such data. The paper reviews several FDA methods and
motivates their usefulness in eCommerce research by providing a glimpse into
new domain insights that they allow. We argue that the wedding of eCommerce
with FDA leads to innovations both in statistical methodology, due to the
challenges and complications that arise in eCommerce data, and in online
research, by being able to ask (and subsequently answer) new research questions
that classical statistical methods are not able to address, and also by
expanding on research questions beyond the ones traditionally asked in the
offline environment. We describe several applications originating from online
transactions which are new to the statistics literature, and point out
statistical challenges accompanied by some solutions. We also discuss some
promising future directions for joint research efforts between researchers in
eCommerce and statistics. Comment: Published at
http://dx.doi.org/10.1214/088342306000000132 in Statistical Science
(http://www.imstat.org/sts/) by the Institute of Mathematical Statistics
(http://www.imstat.org).
Smoothing sparse and unevenly sampled curves using semiparametric mixed models: An application to online auctions
Functional data analysis can be challenging when the functional objects are sampled only very sparsely and unevenly. Most approaches rely on smoothing to recover the underlying functional object from the data, which can be difficult if the data is irregularly distributed. In this paper we present a new approach that can overcome this challenge. The approach is based on the ideas of mixed models. Specifically, we propose a semiparametric mixed model with boosting to recover the functional object. While the model can handle sparse and unevenly distributed data, it also results in conceptually more meaningful functional objects. In particular, we motivate our method within the framework of eBay's online auctions. Online auctions produce monotonically increasing price curves that are often correlated across auctions. The semiparametric mixed model accounts for this correlation in a parsimonious way. It also estimates the underlying increasing trend from the data without imposing model constraints. Our application shows that the resulting functional objects are conceptually more appealing. Moreover, when used to forecast the outcome of an online auction, our approach also results in more accurate price predictions compared to standard approaches. We illustrate our model on a set of 183 closed auctions for Palm M515 personal digital assistants.
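The pooling behaviour that makes a mixed model parsimonious can be sketched in a toy form: auction-specific deviations from a common mean are shrunk toward zero, with less shrinkage for auctions that contribute more observations. Everything below (the shrinkage parameter `lam`, the crude grand-mean "trend", the data) is an invented illustration of that random-effects idea, not the paper's boosted semiparametric model.

```python
def shrunken_offsets(curves, lam=2.0):
    """Toy random-effects sketch: y_ij = m + b_i + noise.
    The per-auction offset b_i is the raw mean residual shrunk by
    n_i / (n_i + lam), so sparsely observed auctions borrow strength
    from the pooled mean."""
    all_obs = [v for c in curves for v in c]
    grand_mean = sum(all_obs) / len(all_obs)   # crude common trend m
    offsets = []
    for c in curves:
        n = len(c)
        raw = sum(c) / n - grand_mean          # raw deviation of this auction
        offsets.append(n / (n + lam) * raw)    # shrinkage toward zero
    return grand_mean, offsets

# Two hypothetical auctions with sparse price observations:
m, b = shrunken_offsets([[10.0, 12.0], [14.0, 16.0, 18.0]])
```

The auction with only two observations has its offset shrunk more heavily (factor 2/4) than the one with three (factor 3/5), which is the borrowing-of-strength effect the abstract alludes to.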
L1 Control Theoretic Smoothing Splines
In this paper, we propose control theoretic smoothing splines with L1
optimality for reducing the number of parameters that describes the fitted
curve as well as removing outlier data. A control theoretic spline is a
smoothing spline that is generated as an output of a given linear dynamical
system. Conventional design requires exactly the same number of base functions
as given data, and the result is not robust against outliers. To solve these
problems, we propose to use L1 optimality, that is, we use the L1 norm for the
regularization term and/or the empirical risk term. The problem is formulated
as a convex optimization, which can be solved efficiently with numerical
optimization software. A numerical example shows the effectiveness of the
proposed method. Comment: Accepted for publication in IEEE Signal Processing
Letters. 4 pages (two-column), 5 figures.
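The sparsifying effect of the L1 regularization term is visible in the soft-thresholding (proximal) operator that drives many convex solvers for such problems: with an orthonormal design, each L1-penalised least-squares coefficient is the unpenalised one shrunk toward zero and clipped to exactly zero when small. This is a generic sketch of that operator, not the paper's control-theoretic formulation.

```python
def soft_threshold(z, tau):
    """Prox of tau*|.|: argmin_x 0.5*(x - z)**2 + tau*|x|.
    Shrinks z toward zero and sets it exactly to 0 when |z| <= tau,
    which is how an L1 penalty removes basis functions entirely."""
    if z > tau:
        return z - tau
    if z < -tau:
        return z + tau
    return 0.0

# With an orthonormal design, the L1-penalised fit thresholds each
# least-squares coefficient; small ones are zeroed, sparsifying the spline.
coeffs = [3.0, -0.4, 1.2, 0.05]
sparse = [soft_threshold(c, 0.5) for c in coeffs]  # approx [2.5, 0.0, 0.7, 0.0]
```

The two small coefficients are set exactly to zero, which is what reduces the number of parameters describing the fitted curve.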
Shape preserving approximation using least squares splines
Least squares polynomial splines are an effective tool for data fitting, but they may fail to preserve essential properties of the underlying function, such as monotonicity or convexity. The shape restrictions are translated into linear inequality conditions on the spline coefficients. The basis functions are selected in such a way that these conditions take a simple form, and the problem becomes a non-negative least squares problem, for which effective and robust methods of solution exist. Multidimensional monotone approximation is achieved by using tensor-product splines with the appropriate restrictions. Additional interpolation conditions can also be introduced. Conversion formulas to the traditional B-spline representation are provided.
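The reduction to non-negative least squares can be sketched in its simplest form: writing the fitted values as cumulative sums of increments, monotonicity of the fit is exactly non-negativity of the increment coefficients. The projected-gradient solver, step size, and iteration count below are illustrative stand-ins for the robust NNLS routines the abstract refers to.

```python
def monotone_fit(y, iters=5000, step=0.05):
    """Least-squares fit f_i = c + d_1 + ... + d_i with d_j >= 0,
    so the fitted sequence is nondecreasing by construction.
    Solved here by projected gradient descent on the convex objective."""
    n = len(y)
    c = 0.0
    d = [0.0] * (n - 1)
    for _ in range(iters):
        # Fitted values: intercept plus cumulative sums of the increments.
        f, run = [], 0.0
        for i in range(n):
            if i > 0:
                run += d[i - 1]
            f.append(c + run)
        r = [fi - yi for fi, yi in zip(f, y)]    # residuals
        c -= step * sum(r)                        # gradient w.r.t. intercept
        for j in range(n - 1):
            g = sum(r[j + 1:])                    # gradient w.r.t. increment d_j
            d[j] = max(0.0, d[j] - step * g)      # project onto d_j >= 0
    out, run = [], 0.0
    for i in range(n):
        if i > 0:
            run += d[i - 1]
        out.append(c + run)
    return out
```

On the data `[1, 3, 2, 4]` this converges to the monotone regression `[1, 2.5, 2.5, 4]`, and the fit is nondecreasing by construction regardless of the data.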
Conditional Transformation Models
The ultimate goal of regression analysis is to obtain information about the
conditional distribution of a response given a set of explanatory variables.
This goal is, however, seldom achieved because most established regression
models only estimate the conditional mean as a function of the explanatory
variables and assume that higher moments are not affected by the regressors.
The underlying reason for such a restriction is the assumption of additivity of
signal and noise. We propose to relax this common assumption in the framework
of transformation models. The novel class of semiparametric regression models
proposed herein allows transformation functions to depend on explanatory
variables. These transformation functions are estimated by regularised
optimisation of scoring rules for probabilistic forecasts, e.g. the continuous
ranked probability score. The corresponding estimated conditional distribution
functions are consistent. Conditional transformation models are potentially
useful for describing possible heteroscedasticity, comparing spatially varying
distributions, identifying extreme events, deriving prediction intervals and
selecting variables beyond mean regression effects. An empirical investigation
based on a heteroscedastic varying coefficient simulation model demonstrates
that semiparametric estimation of conditional distribution functions can be
more beneficial than kernel-based non-parametric approaches or parametric
generalised additive models for location, scale and shape.
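The scoring rule mentioned above has a standard sample-based form: for a forecast represented by an ensemble of draws, CRPS = E|X - y| - 0.5 * E|X - X'|. The implementation below is that generic estimator, not the authors' regularised optimisation of it.

```python
def crps_samples(samples, y):
    """Sample-based continuous ranked probability score.
    Lower is better; the score rewards forecasts that are both
    calibrated (centred on y) and sharp (low internal spread)."""
    n = len(samples)
    term1 = sum(abs(x - y) for x in samples) / n            # E|X - y|
    term2 = sum(abs(a - b)                                   # E|X - X'|
                for a in samples for b in samples) / (n * n)
    return term1 - 0.5 * term2

# A two-sample forecast {0, 1} scored against the observation y = 0:
crps_samples([0.0, 1.0], 0.0)  # 0.5 - 0.5*0.5 = 0.25
```

A degenerate forecast concentrated exactly on the observation scores 0, the best possible value.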
Monotone approximation of aggregation operators using least squares splines
The need for monotone approximation of scattered data often arises in problems of regression where monotonicity is semantically important. One such domain is fuzzy set theory, where membership functions and aggregation operators are order preserving. Least squares polynomial splines provide great flexibility when modeling non-linear functions, but may fail to be monotone. Linear restrictions on spline coefficients provide necessary and sufficient conditions for spline monotonicity. The basis for the splines is selected in such a way that these restrictions take an especially simple form. The resulting non-negative least squares problem can be solved by a variety of standard proven techniques. Additional interpolation requirements can also be imposed in the same framework. The method is applied to fuzzy systems, where membership functions and aggregation operators are constructed from empirical data.