A Better Alternative to Piecewise Linear Time Series Segmentation
Time series are difficult to monitor, summarize and predict. Segmentation
organizes time series into few intervals having uniform characteristics
(flatness, linearity, modality, monotonicity and so on). For scalability, we
require fast linear time algorithms. The popular piecewise linear model can
determine where the data goes up or down and at what rate. Unfortunately, when
the data does not follow a linear model, the computation of the local slope
creates overfitting. We propose an adaptive time series model where the
polynomial degree of each interval varies (constant, linear and so on). Given a
number of regressors, the cost of each interval is its polynomial degree:
constant intervals cost 1 regressor, linear intervals cost 2 regressors, and so
on. Our goal is to minimize the Euclidean (l_2) error for a given model
complexity. Experimentally, we investigate the model where intervals can be
either constant or linear. Over synthetic random walks, historical stock market
prices, and electrocardiograms, the adaptive model provides a more accurate
segmentation than the piecewise linear model without increasing the
cross-validation error or the running time, while providing a richer vocabulary
to applications. Implementation issues, such as numerical stability and
real-world performance, are discussed. Comment: to appear in SIAM Data Mining 200
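The adaptive model above admits a compact dynamic-programming formulation. The following is a minimal sketch under stated assumptions, not the authors' implementation: each interval may be constant (1 regressor) or linear (2 regressors), and we minimize total squared (l_2) error subject to a regressor budget. For clarity this brute-force search over split points is quadratic in the series length, whereas the paper requires linear-time algorithms; all function names are illustrative.

```python
import numpy as np

def interval_cost(y, degree):
    """Squared (l2) error of a least-squares polynomial fit of given degree."""
    x = np.arange(len(y))
    coeffs = np.polyfit(x, y, degree)
    resid = y - np.polyval(coeffs, x)
    return float(resid @ resid)

def adaptive_segmentation(y, budget):
    """DP over split points: each interval is constant (costs 1 regressor)
    or linear (costs 2); minimize total l2 error within the budget."""
    n = len(y)
    INF = float("inf")
    # best[i][k] = minimum error covering y[:i] using exactly k regressors
    best = [[INF] * (budget + 1) for _ in range(n + 1)]
    best[0][0] = 0.0
    choice = {}
    for i in range(1, n + 1):
        for j in range(i):
            seg = y[j:i]
            for deg, cost in ((0, 1), (1, 2)):
                if len(seg) <= deg:
                    continue  # need deg+1 points to fit degree deg
                err = interval_cost(seg, deg)
                for k in range(cost, budget + 1):
                    cand = best[j][k - cost] + err
                    if cand < best[i][k]:
                        best[i][k] = cand
                        choice[(i, k)] = (j, deg)
    # pick the cheapest regressor count, then walk back through the choices
    k = min(range(budget + 1), key=lambda kk: best[n][kk])
    total_err, segments, i = best[n][k], [], n
    while i > 0:
        j, deg = choice[(i, k)]
        segments.append((j, i, "constant" if deg == 0 else "linear"))
        k -= deg + 1
        i = j
    return segments[::-1], total_err
```

On a series that is flat and then rises linearly, a budget of three regressors suffices for one constant and one linear interval, illustrating the mixed vocabulary the abstract describes.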
Augmented Sparse Reconstruction of Protein Signaling Networks
The problem of reconstructing and identifying intracellular protein signaling
and biochemical networks is of critical importance in biology today. We sought
to develop a mathematical approach to this problem using, as a test case, one
of the most well-studied and clinically important signaling networks in biology
today, the epidermal growth factor receptor (EGFR) driven signaling cascade.
More specifically, we suggest a method, augmented sparse reconstruction, for
the identification of links among nodes of ordinary differential equation (ODE)
networks from a small set of trajectories with different initial conditions.
Our method builds a system of representation by using a collection of integrals
of all given trajectories and by attenuating blocks of terms in the
representation itself. The system of representation is then augmented with
random vectors, and minimization of the 1-norm is used to find sparse
representations for the dynamical interactions of each node. Augmentation by
random vectors is crucial, since sparsity alone is not able to handle the large
error-in-variables in the representation. Augmented sparse reconstruction
allows us to consider potentially very large spaces of models and is able to
detect with high accuracy the few relevant links among nodes, even when
moderate noise is added to the measured trajectories. After showing the
performance of our method on a model of the EGFR protein network, we sketch
briefly the potential future therapeutic applications of this approach. Comment: 24 pages, 6 figures
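The integral-plus-l1 idea can be illustrated in miniature. The sketch below is an assumption-laden toy, not the authors' code: each node's increment x_i(t) - x_i(0) is represented over trapezoid-rule integrals of candidate terms, the dictionary is augmented with random columns, and an l1-regularized fit (here plain iterative soft-thresholding, ISTA) selects the sparse links. Function names, the candidate-term interface, and the ISTA solver are all illustrative choices.

```python
import numpy as np

def ista(M, b, lam, iters=2000):
    """Minimize 0.5*||M w - b||^2 + lam*||w||_1 by iterative soft-thresholding."""
    L = np.linalg.norm(M, 2) ** 2                 # Lipschitz constant of the gradient
    w = np.zeros(M.shape[1])
    for _ in range(iters):
        w = w - M.T @ (M @ w - b) / L             # gradient step
        w = np.sign(w) * np.maximum(np.abs(w) - lam / L, 0.0)  # shrink toward zero
    return w

def recover_links(t, X, terms, n_random=10, lam=0.01, seed=0):
    """For each node i, express x_i(t) - x_i(0) as a sparse combination of
    integrated candidate terms; the dictionary is augmented with random
    columns before the l1 fit, in the spirit of augmented sparse reconstruction."""
    def cumtrapz(g):  # cumulative trapezoid-rule integral along the trajectory
        return np.concatenate(([0.0], np.cumsum(np.diff(t) * 0.5 * (g[1:] + g[:-1]))))
    A = np.column_stack([cumtrapz(f(X)) for f in terms])
    rng = np.random.default_rng(seed)
    M = np.column_stack([A, rng.standard_normal((len(t), n_random))])
    M = M / np.linalg.norm(M, axis=0)             # unit-norm columns
    W = np.column_stack([ista(M, X[:, i] - X[0, i], lam) for i in range(X.shape[1])])
    return W[: len(terms)]                        # coefficients on the candidate terms
```

On the two-node linear system dx1/dt = -x1, dx2/dt = x1, the fit assigns a large coefficient to the true x1 -> x2 link and a negligible one to the absent x2 -> x2 link.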
Implicitly Constrained Semi-Supervised Least Squares Classification
We introduce a novel semi-supervised version of the least squares classifier.
This implicitly constrained least squares (ICLS) classifier minimizes the
squared loss on the labeled data among the set of parameters implied by all
possible labelings of the unlabeled data. Unlike other discriminative
semi-supervised methods, our approach does not introduce explicit additional
assumptions into the objective function, but leverages implicit assumptions
already present in the choice of the supervised least squares classifier. We
show this approach can be formulated as a quadratic programming problem and its
solution can be found using a simple gradient descent procedure. We prove that,
in a certain sense, our method never leads to performance worse than the
supervised classifier. Experimental results corroborate this theoretical result
in the multidimensional case on benchmark datasets, also in terms of the error
rate. Comment: 12 pages, 2 figures, 1 table. The Fourteenth International Symposium
on Intelligent Data Analysis (2015), Saint-Etienne, France
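The constrained search the abstract describes can be sketched as projected gradient descent over soft labels for the unlabeled points: each labeling implies a supervised least-squares solution on all data, and we pick the labeling whose implied classifier has the smallest squared loss on the labeled data. This is a minimal sketch of that idea, assuming 0/1 targets and a pseudoinverse solve; names and step sizes are illustrative, not the authors' code.

```python
import numpy as np

def icls_fit(Xl, yl, Xu, steps=500, lr=0.05):
    """Implicitly constrained least squares (sketch): search over soft labels
    u in [0,1]^m for the unlabeled points, where each u implies the supervised
    least-squares solution on all data, and minimize the labeled squared loss."""
    X = np.vstack([Xl, Xu])
    P = np.linalg.pinv(X)                    # maps stacked targets [yl; u] to weights
    Pl, Pu = P[:, : len(yl)], P[:, len(yl):]
    base = Pl @ yl                           # contribution of the fixed labels
    u = np.full(len(Xu), 0.5)                # start from uninformative soft labels
    for _ in range(steps):
        w = base + Pu @ u                    # weights implied by current labeling
        r = Xl @ w - yl                      # residual on the labeled data only
        grad = Pu.T @ (Xl.T @ r)             # d/du of 0.5 * ||Xl w(u) - yl||^2
        u = np.clip(u - lr * grad, 0.0, 1.0) # project back onto [0,1]^m
    return base + Pu @ u
```

With 0/1 targets, predictions are thresholded at 0.5; on a small separable example the implied classifier keeps the labeled points correctly classified, consistent with the abstract's safety guarantee.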
Robust optimization in simulation: Taguchi and response surface methodology
Optimization of simulated systems is tackled by many methods, but most methods assume known environments. This article, however, develops a 'robust' methodology for uncertain environments. This methodology uses Taguchi's view of the uncertain world, but replaces his statistical techniques with Response Surface Methodology (RSM). George Box originated RSM, and Douglas Montgomery recently extended RSM to robust optimization of real (non-simulated) systems. We combine Taguchi's view with RSM for simulated systems. We illustrate the resulting methodology through classic Economic Order Quantity (EOQ) inventory models, which demonstrate that robust optimization may require order quantities that differ from the classic EOQ.
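The EOQ illustration can be sketched numerically. Assuming the textbook cost model C(Q) = aD/Q + hQ/2 (setup cost a, demand rate D, holding cost h), the classic EOQ is Q* = sqrt(2aD/h). A Taguchi-style robust choice instead scores each Q by mean cost plus a multiple of the cost's standard deviation over demand scenarios; because the variance term falls with Q, the robust order quantity exceeds the classic EOQ at mean demand. The scenario set, the mean-plus-std criterion, and the grid search below are illustrative simplifications of the article's RSM machinery.

```python
import math

def eoq(a, D, h):
    """Classic Economic Order Quantity: setup cost a, demand rate D, holding cost h."""
    return math.sqrt(2.0 * a * D / h)

def total_cost(Q, a, D, h):
    """Annual setup plus holding cost for order quantity Q."""
    return a * D / Q + h * Q / 2.0

def robust_q(a, D_scenarios, h, weight=1.0):
    """Taguchi-style robust choice: minimize mean cost plus `weight` times the
    standard deviation of cost over the demand scenarios (simple grid search)."""
    lo = eoq(a, min(D_scenarios), h)
    hi = 2.0 * eoq(a, max(D_scenarios), h)
    grid = [lo + i * (hi - lo) / 400 for i in range(401)]
    def score(Q):
        costs = [total_cost(Q, a, D, h) for D in D_scenarios]
        m = sum(costs) / len(costs)
        s = math.sqrt(sum((c - m) ** 2 for c in costs) / len(costs))
        return m + weight * s
    return min(grid, key=score)
```

With a = 100, h = 2 and demand scenarios centered on 1000, the robust order quantity comes out noticeably larger than the classic EOQ of about 316, mirroring the abstract's conclusion that robust optimization may require different order quantities.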