8 research outputs found

SECURE MINING OF ASSOCIATION RULES OVER HORIZONTALLY PARTITIONED DATA IN DATA MINING

Authors: Dr. M. Mohan Rao, M. Laxmaiah
Publication venue: Global Journals Inc. (US)
Publication date: 12/06/2010

Global Journal of Computer Science and Technology (GJCST)

Ensembles based on Random Projection for gene expression data analysis

Author: R. Folgieri
Publication venue: Università degli Studi di Milano
Publication date: 01/01/2008

In this work we focused on methods for solving classification problems characterized by high-dimensional, low-cardinality data. These features are relevant in bio-molecular data analysis, particularly in class prediction with microarray data. Many methods have been proposed to approach this problem, which is characterized by the so-called curse of dimensionality (a term introduced by Richard Bellman (9)). Among them are gene selection methods, principal and independent component analysis, and kernel methods. In this work we propose and experimentally analyze two ensemble methods based on two randomized techniques for data compression: Random Subspaces and Random Projections. While Random Subspaces, originally proposed by T. K. Ho, is a technique related to feature subsampling, Random Projections is a feature extraction technique motivated by the Johnson-Lindenstrauss theory of distance-preserving random projections. The randomness underlying the proposed approach leads to diverse sets of extracted features corresponding to low-dimensional subspaces with low metric distortion and approximate preservation of the expected loss of the trained base classifiers. In the first part of the work we justify our approach with two theoretical results. The first concerns unsupervised learning: we prove that a clustering algorithm minimizing the (quadratic) objective function provides an ε-closed solution when applied to data compressed according to Johnson-Lindenstrauss theory. The second concerns supervised learning: we prove that polynomial kernels are approximately preserved by Random Projections, up to a degradation proportional to the square of the degree of the polynomial. In the second part of the work, we propose ensemble algorithms based on Random Subspaces and Random Projections, and we experimentally compare them with a single SVM and other state-of-the-art ensemble methods, using three gene expression data sets: Colon, Leukemia, and DLBL-FL (Diffuse Large B-cell and Follicular Lymphoma). The obtained results confirm the effectiveness of the proposed approach. Moreover, we observed a certain performance degradation of Random Projection methods when the base learners are SVMs with a high-degree polynomial kernel.
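
The two compression techniques contrasted in this abstract can be illustrated with a minimal sketch. It is not the author's implementation: the function names, the synthetic Gaussian data, and the sizes `d`, `n`, `k` are all assumptions chosen for illustration, and only the Python standard library is used. Random Subspaces keeps a random subset of the original features, while Random Projections mixes all features through a random Gaussian matrix scaled by 1/sqrt(k), which is the setting in which Johnson-Lindenstrauss-style distance preservation is usually stated.

```python
import math
import random


def random_subspace(X, k, rng):
    """Feature subsampling: keep k randomly chosen original columns."""
    d = len(X[0])
    cols = sorted(rng.sample(range(d), k))
    return [[row[j] for j in cols] for row in X]


def random_projection(X, k, rng):
    """Feature extraction: multiply each row by a k x d Gaussian matrix
    whose entries have variance 1/k."""
    d = len(X[0])
    R = [[rng.gauss(0.0, 1.0) / math.sqrt(k) for _ in range(d)]
         for _ in range(k)]
    return [[sum(r[j] * row[j] for j in range(d)) for r in R] for row in X]


def distance(u, v):
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


rng = random.Random(0)
d, n, k = 1000, 4, 200  # hypothetical sizes: many features, few samples
X = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]

X_rs = random_subspace(X, k, rng)    # k of the original features
X_rp = random_projection(X, k, rng)  # k new, mixed features

# Ratio of a pairwise distance after and before projection;
# values near 1 indicate low metric distortion.
ratio = distance(X_rp[0], X_rp[1]) / distance(X[0], X[1])
print(f"projected/original distance ratio: {ratio:.3f}")
```

In an ensemble setting, each base classifier would be trained on a different random draw of `X_rs` or `X_rp`; that training step is omitted here.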

AIR Università degli Studi di Milano

Spectral analysis of large dimensional random matrices

Author: ZHANG LIXIN
Publication date: 07/03/2007

Ph.D, DOCTOR OF PHILOSOPHY

ScholarBank@NUS

Random matrices in data analysis

Author: Dimitris Achlioptas

Abstract. We show how carefully crafted random matrices can achieve distance-preserving dimensionality reduction, accelerate spectral computations, and reduce the sample complexity of certain kernel methods.
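
As a rough illustration of distance-preserving dimensionality reduction with a random matrix, the sketch below uses one well-known sparse construction: entries equal to +sqrt(3/k), 0, or -sqrt(3/k) with probabilities 1/6, 2/3, and 1/6. That this particular construction is what the abstract has in mind is an assumption, and the function names, sizes, and synthetic data are hypothetical. The code checks empirically how far pairwise squared distances move after projection.

```python
import math
import random


def sparse_sign_matrix(k, d, rng):
    """k x d matrix with entries +s, 0, -s (probabilities 1/6, 2/3, 1/6),
    where s = sqrt(3/k), so each entry has mean 0 and variance 1/k."""
    s = math.sqrt(3.0 / k)

    def entry():
        u = rng.random()
        if u < 1.0 / 6.0:
            return s
        if u < 1.0 / 3.0:
            return -s
        return 0.0

    return [[entry() for _ in range(d)] for _ in range(k)]


def project(R, x):
    """Apply the random matrix R to the vector x."""
    return [sum(r_j * x_j for r_j, x_j in zip(row, x)) for row in R]


def sq_dist(u, v):
    """Squared Euclidean distance."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


rng = random.Random(1)
d, k, m = 1000, 300, 5  # hypothetical sizes: input dim, output dim, points
points = [[rng.uniform(-1.0, 1.0) for _ in range(d)] for _ in range(m)]

R = sparse_sign_matrix(k, d, rng)
low = [project(R, p) for p in points]

# Relative distortion of every pairwise squared distance.
ratios = [sq_dist(low[i], low[j]) / sq_dist(points[i], points[j])
          for i in range(m) for j in range(i + 1, m)]
worst = max(abs(r - 1.0) for r in ratios)
print(f"largest relative distortion over {len(ratios)} pairs: {worst:.3f}")
```

Because about two thirds of the entries are zero and the rest are a single constant up to sign, the projection needs only additions and subtractions followed by one scaling, which is the practical appeal of this kind of matrix.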

CiteSeerX