Computation of sum of squares polynomials from data points
We propose an iterative algorithm for the numerical computation of sums of
squares of polynomials approximating given data at prescribed interpolation
points. The method is based on the definition of a convex functional arising
from the dualization of a quadratic regression over the Cholesky factors of
the sum of squares decomposition. To justify the construction, the domain of
this functional, the boundary of that domain, and the behavior at infinity
are analyzed in detail. When the data interpolate a positive univariate
polynomial, we show that, in the context of the Lukacs sum of squares
representation, the functional is coercive and strictly convex, which yields
a unique critical point and a corresponding sum of squares decomposition.
For multivariate polynomials that admit a sum of squares decomposition, and
up to a small perturbation, the functional is always coercive, so its
minimum yields an approximate sum of squares decomposition. Various
unconstrained descent algorithms are proposed to minimize the functional.
Numerical examples are provided for univariate and bivariate polynomials.
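The Cholesky parameterization can be illustrated with a small numerical sketch: writing p(x) = m(x)^T C C^T m(x), with m(x) the monomial basis, makes p a sum of squares by construction, so the interpolation error can be minimized over the unconstrained factor C. The sketch below uses a generic unconstrained minimizer rather than the paper's dual functional; the function name, the degree-2 example, and all parameter values are our own illustrative choices.

```python
import numpy as np
from scipy.optimize import minimize

def sos_fit(xs, ys, degree=2):
    """Fit p(x) = m(x)^T C C^T m(x) to data points (xs, ys).

    m(x) = (1, x, ..., x^degree) is the monomial basis; C is an
    unconstrained square matrix, so the Gram matrix C C^T is positive
    semidefinite and p is a sum of squares by construction.
    """
    M = np.vander(xs, degree + 1, increasing=True)   # rows are m(x_i)
    n = degree + 1

    def loss(c):
        C = c.reshape(n, n)
        p = np.sum((M @ C) ** 2, axis=1)   # p(x_i) = ||C^T m(x_i)||^2
        return np.sum((p - ys) ** 2)       # squared interpolation error

    res = minimize(loss, 0.1 * np.eye(n).ravel(), method="BFGS")
    return res.x.reshape(n, n)

# Data sampled from a positive polynomial, hence trivially a sum of squares.
xs = np.linspace(-1.0, 1.0, 9)
ys = xs**2 + 1.0
C = sos_fit(xs, ys)
M = np.vander(xs, 3, increasing=True)
p_vals = np.sum((M @ C) ** 2, axis=1)     # nonnegative by construction
print(np.max(np.abs(p_vals - ys)))
```

Because C is unconstrained, any descent method can be applied directly; positive semidefiniteness of the Gram matrix never has to be enforced as a constraint.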
Certainty of outlier and boundary points processing in data mining
Data certainty is an issue in real-world applications caused by unwanted
noise in the data, and considerable attention has recently been paid to
overcoming it. We propose a new method based on neutrosophic set (NS)
theory to detect boundary and outlier points, which are challenging for
clustering methods. First, a certainty value is assigned to each data point
according to the proposed definition in NS. A certainty set is then built
for the proposed cost function in the NS domain by considering a set of
main clusters together with a noise cluster. The cost function is minimized
by gradient descent, and data points are clustered according to their
membership degrees: outlier points are assigned to the noise cluster, while
boundary points are assigned to the main clusters with nearly equal
membership degrees. To show the effectiveness of the proposed method, two
types of datasets are used: three scatter-type datasets and four UCI
datasets. Results demonstrate that the proposed cost function handles
boundary and outlier points with more accurate membership degrees and
outperforms existing state-of-the-art clustering methods. Comment:
conference paper, 6 pages.
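The abstract does not reproduce the NS cost function itself, but the general idea — main clusters plus an explicit noise cluster, where outliers are absorbed by the noise cluster and boundary points share membership almost equally — can be sketched with the classical fuzzy noise-clustering formulation. The code below is that generic stand-in, not the paper's method; all parameter values and the toy data are ours.

```python
import numpy as np

def noise_clustering(X, init_centers, delta=2.0, m=2.0, iters=100):
    """Fuzzy clustering with k main clusters plus one noise cluster.

    The noise cluster sits at a constant distance `delta` from every point,
    so far-away outliers receive high noise membership, while boundary
    points split their membership almost equally between adjacent main
    clusters. Alternating updates minimize a fuzzy c-means-style cost
    extended with the noise term.
    """
    centers = np.array(init_centers, dtype=float)
    k = len(centers)
    for _ in range(iters):
        d = np.linalg.norm(X[:, None, :] - centers[None], axis=2)  # (N, k)
        d = np.concatenate([d, np.full((len(X), 1), delta)], axis=1)
        d = np.maximum(d, 1e-12)
        u = d ** (-2.0 / (m - 1.0))
        u /= u.sum(axis=1, keepdims=True)   # memberships over k+1 clusters
        w = u[:, :k] ** m                   # main-cluster weights only
        centers = (w.T @ X) / w.sum(axis=0)[:, None]
    return u, centers

rng = np.random.default_rng(1)
A = rng.normal([0.0, 0.0], 0.3, (20, 2))    # main cluster 1
B = rng.normal([5.0, 0.0], 0.3, (20, 2))    # main cluster 2
X = np.vstack([A, B, [[20.0, 20.0]]])       # plus one gross outlier
u, centers = noise_clustering(X, init_centers=[A[0], B[0]])
print(np.argmax(u[-1]))   # the outlier's largest membership: noise cluster
```

A point midway between the two main clusters would receive nearly equal membership in both, which is exactly how boundary points are characterized in the abstract.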
Data Processing Protocol for Regression of Geothermal Time Series with Uneven Intervals
Regression of data generated in simulations or experiments has important
implications in sensitivity studies, uncertainty analysis, and prediction
accuracy. Depending on the nature of the physical model, data points may not be
evenly distributed. Choosing all available points for regression is often
impractical and does not always guarantee a better fit: fitness of the
model depends strongly on the number of data points and on their
distribution along the curve. In this study, the effect of the number of
points selected for regression is investigated, and various schemes for
processing regression data points are explored. Our prime interest is time
series data, i.e., output varying with time, and in particular the
temperature profile of an enhanced geothermal system. The objective of the
research is to find a scheme for choosing a fraction of the data points
from the entire set that achieves a better model fit without losing any
features or trends in the
data. A workflow is provided to summarize the entire protocol of data
preprocessing, regression of mathematical model using training data, model
testing, and error analysis. Six different schemes are developed to process
data by setting criteria such as equal spacing along axes (X and Y), equal
distance between two consecutive points on the curve, constraint in the angle
of curvature, etc. As an example application of the proposed schemes, 1 to
20% of the data generated from the temperature change of a typical
geothermal system is selected from a total of 9,939 points. It is shown
that, depending on the scheme, the number of data points has, to a degree,
a negligible effect on the fitted model. The proposed data processing
schemes are ranked in terms of their R2 and NRMSE values.
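Two of the selection criteria mentioned above — equal spacing along the X axis and equal distance along the curve — can be sketched as follows. The synthetic drawdown curve, the polynomial surrogate model, and all parameter values are illustrative stand-ins of ours, not the paper's.

```python
import numpy as np

def select_equal_x(t, y, k):
    """Scheme: k points evenly spaced along the time (X) axis."""
    targets = np.linspace(t[0], t[-1], k)
    return np.unique([np.argmin(np.abs(t - s)) for s in targets])

def select_equal_arclength(t, y, k):
    """Scheme: k points evenly spaced along the curve's arc length,
    which concentrates points where the curve bends or drops quickly."""
    s = np.concatenate([[0.0], np.cumsum(np.hypot(np.diff(t), np.diff(y)))])
    targets = np.linspace(0.0, s[-1], k)
    return np.unique([np.argmin(np.abs(s - v)) for v in targets])

def r2_nrmse(y_true, y_pred):
    ss_res = np.sum((y_true - y_pred) ** 2)
    r2 = 1.0 - ss_res / np.sum((y_true - y_true.mean()) ** 2)
    nrmse = np.sqrt(ss_res / len(y_true)) / (y_true.max() - y_true.min())
    return r2, nrmse

# Synthetic stand-in for a geothermal temperature drawdown: fast early
# decline, long flat tail, unevenly sampled in time.
t = np.sort(np.random.default_rng(0).uniform(0.0, 50.0, 2000))
y = 140.0 + 60.0 * np.exp(-t / 5.0)

idx = select_equal_arclength(t, y, k=40)      # ~2% of the points
coef = np.polyfit(t[idx], y[idx], deg=5)      # fit on the subsample only
r2, nrmse = r2_nrmse(y, np.polyval(coef, t))  # score on the full series
print(round(r2, 3), round(nrmse, 3))
```

In practice both axes would be normalized before computing arc length so the scheme is independent of the units of time and temperature.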
Detection and localization of change-points in high-dimensional network traffic data
We propose a novel and efficient method, which we shall call TopRank in
what follows, for detecting change-points in high-dimensional data. This
issue is of growing concern to the network security community since network
anomalies such as Denial of Service (DoS) attacks lead to changes in Internet
traffic. Our method consists of a data reduction stage based on record
filtering, followed by a nonparametric change-point detection test based on
rank statistics. Using this approach, we can address massive data streams and
perform anomaly detection and localization on the fly. We show how it applies
to some real Internet traffic provided by France-Télécom (a French Internet
service provider) in the framework of the ANR-RNRT OSCAR project. This approach
is very attractive since it benefits from a low computational load and is able
to detect and localize several types of network anomalies. We also assess the
performance of the TopRank algorithm using synthetic data and compare it with
alternative approaches based on random aggregation. Comment: Published in the
Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org), http://dx.doi.org/10.1214/08-AOAS232.
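The second stage — a nonparametric change-point test applied to a reduced stream — can be illustrated with a simple rank-based scan. The standardized rank-sum statistic below is a generic Wilcoxon/Mann-Whitney-style test, not necessarily the paper's exact statistic, and the synthetic "traffic" series is ours.

```python
import numpy as np

def rank_changepoint(x):
    """Locate a single change-point with a rank-sum scan.

    For each candidate split t, the rank-sum of x[:t] is compared with its
    expectation under the no-change hypothesis; the split with the largest
    standardized deviation is the estimated change-point. Ranks make the
    test nonparametric: it is insensitive to the marginal distribution.
    """
    n = len(x)
    r = np.argsort(np.argsort(x)) + 1.0        # ranks of x, 1..n
    best_t, best_z = 0, 0.0
    for t in range(2, n - 1):
        w = r[:t].sum()                        # rank-sum of first segment
        mu = t * (n + 1) / 2.0                 # E[w] under H0
        sigma = np.sqrt(t * (n - t) * (n + 1) / 12.0)
        z = abs(w - mu) / sigma
        if z > best_z:
            best_t, best_z = t, z
    return best_t, best_z

# Synthetic stream: a level shift halfway through, mimicking an anomaly
# that changes the distribution of a monitored traffic feature.
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0, 1, 200), rng.normal(2, 1, 200)])
t_hat, z = rank_changepoint(x)
print(t_hat, round(z, 1))
```

Because only ranks enter the statistic, the same scan works whatever the traffic feature's distribution, which is what makes this kind of test attractive for on-the-fly anomaly detection.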
