11,181 research outputs found

Change detection in categorical evolving data streams

Author: Aggarwal C. C.
Asuncion A.
Cao F.
Witten I. H.
Yu L.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2014

Detecting change in evolving data streams is a central issue for accurate adaptive learning. In real-world applications, data streams have categorical features, and changes induced in the data distribution of these categorical features have not been considered extensively so far. Previous work on change detection focused on detecting changes in the accuracy of the learners, without considering changes in the data distribution. To cope with these issues, we propose a new unsupervised change detection method, called CDCStream (Change Detection in Categorical Data Streams), well suited for categorical data streams. The proposed method is able to detect changes in a batch-incremental scenario. It is based on the following two characteristics: (i) a summarization strategy is proposed to compress the current batch by extracting a descriptive summary and (ii) a new segmentation algorithm is proposed to highlight changes and issue warnings for a data stream. To evaluate our proposal, we employ it in a learning task over real-world data and compare its results with state-of-the-art methods. We also report a qualitative evaluation in order to show the behavior of CDCStream.

CiteSeerX

Crossref

Research Commons@Waikato

A High-Fidelity Realization of the Euclid Code Comparison $N$-body Simulation with Abacus

Author: Eisenstein Daniel J.
Garrison Lehman H.
Pinto Philip A.
Publication venue: 'Oxford University Press (OUP)'
Publication date: 06/03/2019

We present a high-fidelity realization of the cosmological $N$-body simulation from the Schneider et al. (2016) code comparison project. The simulation was performed with our Abacus $N$-body code, which offers high force accuracy, high performance, and minimal particle integration errors. The simulation consists of $2048^3$ particles in a $500\ h^{-1}\mathrm{Mpc}$ box, for a particle mass of $1.2\times 10^9\ h^{-1}\mathrm{M}_\odot$ with $10\ h^{-1}\mathrm{kpc}$ spline softening. Abacus executed 1052 global time steps to $z=0$ in 107 hours on one dual-Xeon, dual-GPU node, for a mean rate of 23 million particles per second per step. We find Abacus is in good agreement with Ramses and Pkdgrav3 and less so with Gadget3. We validate our choice of time step by halving the step size and find sub-percent differences in the power spectrum and 2PCF at nearly all measured scales, with $<0.3\%$ errors at $k<10\ \mathrm{Mpc}^{-1}h$. On large scales, Abacus reproduces linear theory better than $0.01\%$. Simulation snapshots are available at http://nbody.rc.fas.harvard.edu/public/S2016.

Comment: 13 pages, 8 figures. Minor changes to match MNRAS accepted version
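
The mean rate quoted in the abstract can be cross-checked from the other figures it gives. The short Python sketch below only redoes that arithmetic; the variable names are illustrative and are not taken from the Abacus code.

```python
# Figures quoted in the abstract.
particles = 2048 ** 3        # number of particles
steps = 1052                 # global time steps to z = 0
wall_seconds = 107 * 3600    # 107 hours of wall-clock time

# Particle updates per second, averaged over the whole run.
mean_rate = particles * steps / wall_seconds
print(f"{mean_rate / 1e6:.1f} million particles per second")
```

The result lands between 23 and 24 million particle updates per second, consistent with the rounded figure in the abstract.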

arXiv.org e-Print Archive

The University of Arizona

SOM-based algorithms for qualitative variables

Author: Cottrell Marie
Ibbou Smail
Letrémy Patrick
Publication venue: 'Elsevier BV'
Publication date: 01/10/2004

It is well known that the SOM algorithm achieves a clustering of data which can be interpreted as an extension of Principal Component Analysis, because of its topology-preserving property. But the SOM algorithm can only process real-valued data. In previous papers, we have proposed several methods based on the SOM algorithm to analyze categorical data, as is typical of survey data. In this paper, we present these methods in a unified manner. The first one (Kohonen Multiple Correspondence Analysis, KMCA) deals only with the modalities, while the two others (Kohonen Multiple Correspondence Analysis with individuals, KMCA_ind, and Kohonen algorithm on DISJonctive table, KDISJ) can take into account the individuals and the modalities simultaneously.

Comment: Special issue following WSOM 03 in Kitakyushu
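
For readers unfamiliar with the term, a complete disjunctive table is the indicator (one-hot) coding of categorical data: one 0/1 column per modality of each variable. The Python sketch below builds such a table for a tiny, made-up survey; the function name and the data are hypothetical, and the sketch shows only the encoding, not the KDISJ algorithm or any SOM training.

```python
def disjunctive_table(rows):
    """One-hot encode categorical rows into a complete disjunctive table."""
    n_vars = len(rows[0])
    # Sorted list of modalities observed for each variable.
    modalities = [sorted({row[j] for row in rows}) for j in range(n_vars)]
    # One column per (variable index, modality) pair.
    header = [(j, m) for j in range(n_vars) for m in modalities[j]]
    table = [[1 if row[j] == m else 0 for (j, m) in header] for row in rows]
    return header, table


# Hypothetical survey: each row is one individual, each column one question.
survey = [("yes", "urban"), ("no", "rural"), ("yes", "rural")]
header, table = disjunctive_table(survey)
for row in table:
    print(row)
```

Each row contains exactly one 1 per variable, so every row sums to the number of variables. This turns categorical answers into the real-valued vectors that a SOM can process.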

arXiv.org e-Print Archive

HAL-Paris1

Improving the family orientation process in Cuban Special Schools trough Nearest Prototype classification

Author: Caballero-Mota Y.
García-Lorenzo M. M.
Rey-Benguría C.
Villuendas-Rey Y.
Publication venue: 'Universidad Internacional de La Rioja'
Publication date: 14/01/2020

Cuban Schools for children with Affective-Behavioral Maladies (SABM) aim to bring about a major change in children's behavior so that they can be integrated effectively into society. One of the key elements of this objective is giving adequate orientation to the children's families, because the family is one of the most important educational contexts in which children develop their personality. The family orientation process in SABM involves clustering and classification of mixed-type data with non-symmetric similarity functions. To improve this process, this paper includes some novel characteristics in clustering and prototype selection. The proposed approach uses a hierarchical clustering based on compact sets, making it suitable for dealing with non-symmetric similarity functions, as well as with mixed and incomplete data. The proposal obtains very good results on the SABM data and on repository databases.

Re-UNIR

Robust PCA as Bilinear Decomposition with Outlier-Sparsity Regularization

Author: Giannakis Georgios B.
Mateos Gonzalo
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 07/11/2011

Principal component analysis (PCA) is widely used for dimensionality reduction, with well-documented merits in various applications involving high-dimensional data, including computer vision, preference measurement, and bioinformatics. In this context, the fresh look advocated here permeates benefits from variable selection and compressive sampling, to robustify PCA against outliers. A least-trimmed squares estimator of a low-rank bilinear factor analysis model is shown closely related to that obtained from an $\ell_0$-(pseudo)norm-regularized criterion encouraging sparsity in a matrix explicitly modeling the outliers. This connection suggests robust PCA schemes based on convex relaxation, which lead naturally to a family of robust estimators encompassing Huber's optimal M-class as a special case. Outliers are identified by tuning a regularization parameter, which amounts to controlling sparsity of the outlier matrix along the whole robustification path of (group) least-absolute shrinkage and selection operator (Lasso) solutions. Beyond its neat ties to robust statistics, the developed outlier-aware PCA framework is versatile to accommodate novel and scalable algorithms to: i) track the low-rank signal subspace robustly, as new data are acquired in real time; and ii) determine principal components robustly in (possibly) infinite-dimensional feature spaces. Synthetic and real data tests corroborate the effectiveness of the proposed robust PCA schemes, when used to identify aberrant responses in personality assessment surveys, as well as unveil communities in social networks, and intruders from video surveillance data.

Comment: 30 pages, submitted to IEEE Transactions on Signal Processing
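
To make the idea of a sparse outlier matrix concrete, here is a minimal pure-Python sketch. It is not the authors' algorithm: it assumes a rank-1 bilinear model Y ≈ u vᵀ + O, replaces the $\ell_0$ penalty with an entrywise $\ell_1$ penalty (one simple convex relaxation), and alternates least-squares updates of u and v with soft-thresholding of O. The names `soft`, `robust_rank1`, and `lam`, and the toy data, are illustrative assumptions.

```python
def soft(x, t):
    """Soft-thresholding: minimizer over o of (x - o)^2 + 2*t*|o|."""
    if abs(x) <= t:
        return 0.0
    return x - t if x > 0 else x + t


def robust_rank1(Y, lam, iters=200):
    """Alternately minimize ||Y - u v^T - O||_F^2 + lam * ||O||_1."""
    n, m = len(Y), len(Y[0])
    u, v = [0.0] * n, [1.0] * m
    O = [[0.0] * m for _ in range(n)]
    for _ in range(iters):
        # Data with the current outlier estimate removed.
        R = [[Y[i][j] - O[i][j] for j in range(m)] for i in range(n)]
        # Least-squares updates of the two bilinear factors.
        vv = sum(x * x for x in v)
        u = [sum(R[i][j] * v[j] for j in range(m)) / vv for i in range(n)]
        uu = sum(x * x for x in u)
        v = [sum(R[i][j] * u[i] for i in range(n)) / uu for j in range(m)]
        # Entrywise shrinkage of the residual gives the sparse outlier matrix.
        O = [[soft(Y[i][j] - u[i] * v[j], lam / 2) for j in range(m)]
             for i in range(n)]
    return u, v, O


# Toy data: an exactly rank-1 matrix with one corrupted entry at (1, 2).
a, b = [1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 1.0, 2.0, 1.0]
Y = [[ai * bj for bj in b] for ai in a]
Y[1][2] += 10.0

u, v, O = robust_rank1(Y, lam=2.0)
flagged = max(((abs(O[i][j]), i, j) for i in range(len(O))
               for j in range(len(O[0]))))[1:]
print("largest outlier estimate at", flagged)
```

Raising `lam` drives more entries of `O` to zero, which mirrors the abstract's point that outliers are identified by tuning a regularization parameter.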

arXiv.org e-Print Archive

CiteSeerX

Crossref