Computation of sum of squares polynomials from data points
We propose an iterative algorithm for the numerical computation of sums of
squares of polynomials approximating given data at prescribed interpolation
points. The method is based on the definition of a convex functional arising
from the dualization of a quadratic regression over the Cholesky factors of
the sum of squares decomposition. To justify the construction, the domain of
this functional, the boundary of that domain, and the behavior at infinity
are analyzed in detail. When the data interpolate a positive univariate
polynomial, we show that, in the context of the Lukacs sum of squares
representation, the functional is coercive and strictly convex, which yields
a unique critical point and a corresponding sum of squares decomposition.
For multivariate polynomials that admit a sum of squares decomposition, and
up to a small perturbation, the functional is always coercive, so its
minimum yields an approximate sum of squares decomposition. Various
unconstrained descent algorithms are proposed to minimize the functional.
Numerical examples are provided for univariate and bivariate polynomials.
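The Cholesky parameterization can be illustrated with a small numerical sketch: writing p(x) = m(x)^T C C^T m(x), with m(x) the monomial basis, makes p a sum of squares by construction, so the interpolation error can be minimized over the unconstrained factor C. The sketch below uses a generic unconstrained minimizer rather than the paper's dual functional; the function name, the degree-2 example, and all parameter values are our own illustrative choices.

```python
import numpy as np
from scipy.optimize import minimize

def sos_fit(xs, ys, degree=2):
    """Fit p(x) = m(x)^T C C^T m(x) to data points (xs, ys).

    m(x) = (1, x, ..., x^degree) is the monomial basis; C is an
    unconstrained square matrix, so the Gram matrix C C^T is positive
    semidefinite and p is a sum of squares by construction.
    """
    M = np.vander(xs, degree + 1, increasing=True)   # rows are m(x_i)
    n = degree + 1

    def loss(c):
        C = c.reshape(n, n)
        p = np.sum((M @ C) ** 2, axis=1)   # p(x_i) = ||C^T m(x_i)||^2
        return np.sum((p - ys) ** 2)       # squared interpolation error

    res = minimize(loss, 0.1 * np.eye(n).ravel(), method="BFGS")
    return res.x.reshape(n, n)

# Data sampled from a positive polynomial, hence trivially a sum of squares.
xs = np.linspace(-1.0, 1.0, 9)
ys = xs**2 + 1.0
C = sos_fit(xs, ys)
M = np.vander(xs, 3, increasing=True)
p_vals = np.sum((M @ C) ** 2, axis=1)     # nonnegative by construction
print(np.max(np.abs(p_vals - ys)))
```

Because C is unconstrained, any descent method can be applied directly; positive semidefiniteness of the Gram matrix never has to be enforced as a constraint.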
Certainty of outlier and boundary points processing in data mining
Data certainty is an issue in real-world applications caused by unwanted
noise in the data, and considerable attention has recently been paid to
overcoming it. We propose a new method based on neutrosophic set (NS)
theory to detect boundary and outlier points, which are challenging for
clustering methods. First, a certainty value is assigned to each data point
according to the proposed definition in NS. A certainty set is then built
for the proposed cost function in the NS domain by considering a set of
main clusters together with a noise cluster. The cost function is minimized
by gradient descent, and data points are clustered according to their
membership degrees: outlier points are assigned to the noise cluster, while
boundary points are assigned to the main clusters with nearly equal
membership degrees. To show the effectiveness of the proposed method, two
types of datasets are used: three scatter-type datasets and four UCI
datasets. Results demonstrate that the proposed cost function handles
boundary and outlier points with more accurate membership degrees and
outperforms existing state-of-the-art clustering methods. Comment:
conference paper, 6 pages.
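The abstract does not reproduce the NS cost function itself, but the general idea — main clusters plus an explicit noise cluster, where outliers are absorbed by the noise cluster and boundary points share membership almost equally — can be sketched with the classical fuzzy noise-clustering formulation. The code below is that generic stand-in, not the paper's method; all parameter values and the toy data are ours.

```python
import numpy as np

def noise_clustering(X, init_centers, delta=2.0, m=2.0, iters=100):
    """Fuzzy clustering with k main clusters plus one noise cluster.

    The noise cluster sits at a constant distance `delta` from every point,
    so far-away outliers receive high noise membership, while boundary
    points split their membership almost equally between adjacent main
    clusters. Alternating updates minimize a fuzzy c-means-style cost
    extended with the noise term.
    """
    centers = np.array(init_centers, dtype=float)
    k = len(centers)
    for _ in range(iters):
        d = np.linalg.norm(X[:, None, :] - centers[None], axis=2)  # (N, k)
        d = np.concatenate([d, np.full((len(X), 1), delta)], axis=1)
        d = np.maximum(d, 1e-12)
        u = d ** (-2.0 / (m - 1.0))
        u /= u.sum(axis=1, keepdims=True)   # memberships over k+1 clusters
        w = u[:, :k] ** m                   # main-cluster weights only
        centers = (w.T @ X) / w.sum(axis=0)[:, None]
    return u, centers

rng = np.random.default_rng(1)
A = rng.normal([0.0, 0.0], 0.3, (20, 2))    # main cluster 1
B = rng.normal([5.0, 0.0], 0.3, (20, 2))    # main cluster 2
X = np.vstack([A, B, [[20.0, 20.0]]])       # plus one gross outlier
u, centers = noise_clustering(X, init_centers=[A[0], B[0]])
print(np.argmax(u[-1]))   # the outlier's largest membership: noise cluster
```

A point midway between the two main clusters would receive nearly equal membership in both, which is exactly how boundary points are characterized in the abstract.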
Data Processing Protocol for Regression of Geothermal Time Series with Uneven Intervals
Regression of data generated in simulations or experiments has important
implications in sensitivity studies, uncertainty analysis, and prediction
accuracy. Depending on the nature of the physical model, data points may not be
evenly distributed. Choosing all available points for regression is often
impractical and does not always guarantee a better fit: fitness of the
model depends strongly on the number of data points and on their
distribution along the curve. In this study, the effect of the number of
points selected for regression is investigated, and various schemes for
processing regression data points are explored. Our prime interest is time
series data, i.e., output varying with time, and in particular the
temperature profile of an enhanced geothermal system. The objective of the
research is to find a scheme for choosing a fraction of the data points
from the entire set that achieves a better model fit without losing any
features or trends in the
data. A workflow is provided to summarize the entire protocol of data
preprocessing, regression of mathematical model using training data, model
testing, and error analysis. Six different schemes are developed to process
data by setting criteria such as equal spacing along axes (X and Y), equal
distance between two consecutive points on the curve, constraint in the angle
of curvature, etc. As an example application of the proposed schemes, 1 to
20% of the data generated from the temperature change of a typical
geothermal system is selected from a total of 9,939 points. It is shown
that, depending on the scheme, the number of data points has, to a degree,
a negligible effect on the fitted model. The proposed data processing
schemes are ranked in terms of their R2 and NRMSE values.
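Two of the selection criteria mentioned above — equal spacing along the X axis and equal distance along the curve — can be sketched as follows. The synthetic drawdown curve, the polynomial surrogate model, and all parameter values are illustrative stand-ins of ours, not the paper's.

```python
import numpy as np

def select_equal_x(t, y, k):
    """Scheme: k points evenly spaced along the time (X) axis."""
    targets = np.linspace(t[0], t[-1], k)
    return np.unique([np.argmin(np.abs(t - s)) for s in targets])

def select_equal_arclength(t, y, k):
    """Scheme: k points evenly spaced along the curve's arc length,
    which concentrates points where the curve bends or drops quickly."""
    s = np.concatenate([[0.0], np.cumsum(np.hypot(np.diff(t), np.diff(y)))])
    targets = np.linspace(0.0, s[-1], k)
    return np.unique([np.argmin(np.abs(s - v)) for v in targets])

def r2_nrmse(y_true, y_pred):
    ss_res = np.sum((y_true - y_pred) ** 2)
    r2 = 1.0 - ss_res / np.sum((y_true - y_true.mean()) ** 2)
    nrmse = np.sqrt(ss_res / len(y_true)) / (y_true.max() - y_true.min())
    return r2, nrmse

# Synthetic stand-in for a geothermal temperature drawdown: fast early
# decline, long flat tail, unevenly sampled in time.
t = np.sort(np.random.default_rng(0).uniform(0.0, 50.0, 2000))
y = 140.0 + 60.0 * np.exp(-t / 5.0)

idx = select_equal_arclength(t, y, k=40)      # ~2% of the points
coef = np.polyfit(t[idx], y[idx], deg=5)      # fit on the subsample only
r2, nrmse = r2_nrmse(y, np.polyval(coef, t))  # score on the full series
print(round(r2, 3), round(nrmse, 3))
```

In practice both axes would be normalized before computing arc length so the scheme is independent of the units of time and temperature.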
Detection and localization of change-points in high-dimensional network traffic data
We propose a novel and efficient method, which we shall call TopRank in
what follows, for detecting change-points in high-dimensional data. This
issue is of growing concern to the network security community since network
anomalies such as Denial of Service (DoS) attacks lead to changes in Internet
traffic. Our method consists of a data reduction stage based on record
filtering, followed by a nonparametric change-point detection test based on
rank statistics. Using this approach, we can address massive data streams and
perform anomaly detection and localization on the fly. We show how it applies
to some real Internet traffic provided by France-Télécom (a French Internet
service provider) in the framework of the ANR-RNRT OSCAR project. This approach
is very attractive since it benefits from a low computational load and is able
to detect and localize several types of network anomalies. We also assess the
performance of the TopRank algorithm using synthetic data and compare it with
alternative approaches based on random aggregation. Comment: Published in the
Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org), http://dx.doi.org/10.1214/08-AOAS232.
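The second stage — a nonparametric change-point test applied to a reduced stream — can be illustrated with a simple rank-based scan. The standardized rank-sum statistic below is a generic Wilcoxon/Mann-Whitney-style test, not necessarily the paper's exact statistic, and the synthetic "traffic" series is ours.

```python
import numpy as np

def rank_changepoint(x):
    """Locate a single change-point with a rank-sum scan.

    For each candidate split t, the rank-sum of x[:t] is compared with its
    expectation under the no-change hypothesis; the split with the largest
    standardized deviation is the estimated change-point. Ranks make the
    test nonparametric: it is insensitive to the marginal distribution.
    """
    n = len(x)
    r = np.argsort(np.argsort(x)) + 1.0        # ranks of x, 1..n
    best_t, best_z = 0, 0.0
    for t in range(2, n - 1):
        w = r[:t].sum()                        # rank-sum of first segment
        mu = t * (n + 1) / 2.0                 # E[w] under H0
        sigma = np.sqrt(t * (n - t) * (n + 1) / 12.0)
        z = abs(w - mu) / sigma
        if z > best_z:
            best_t, best_z = t, z
    return best_t, best_z

# Synthetic stream: a level shift halfway through, mimicking an anomaly
# that changes the distribution of a monitored traffic feature.
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0, 1, 200), rng.normal(2, 1, 200)])
t_hat, z = rank_changepoint(x)
print(t_hat, round(z, 1))
```

Because only ranks enter the statistic, the same scan works whatever the traffic feature's distribution, which is what makes this kind of test attractive for on-the-fly anomaly detection.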
