Search CORE

253,023 research outputs found

Projection Pursuit for Exploratory Supervised Classification

Author: Dianne Cook
Eun-Kyung Lee
Sigbert Klinke
Thomas Lumley
Publication venue
Publication date
Field of study

In high-dimensional data, one often seeks a few interesting low-dimensional projections that reveal important features of the data. Projection pursuit is a procedure for searching high-dimensional data for interesting low-dimensional projections via the optimization of a criterion function called the projection pursuit index. Very few projection pursuit indices incorporate class or group information in the calculation. Hence, they cannot be adequately applied in supervised classification problems to provide low-dimensional projections revealing class differences in the data . We introduce new indices derived from linear discriminant analysis that can be used for exploratory supervised classification.Data mining, Exploratory multivariate data analysis, Gene expression data, Discriminant analysis

Research Papers in Economics

An information-driven framework for image mining

Author: Hsu Wynne
Lee Mong Li
Zhang Ji
Publication venue: 'Springer Fachmedien Wiesbaden GmbH'
Publication date: 01/01/2001
Field of study

[Abstract]: Image mining systems that can automatically extract semantically meaningful information (knowledge) from image data are increasingly in demand. The fundamental challenge in image mining is to determine how low-level, pixel representation contained in a raw image or image sequence can be processed to identify high-level spatial objects and relationships. To meet this challenge, we propose an efficient information-driven framework for image mining. We distinguish four levels of information: the Pixel Level, the Object Level, the Semantic Concept Level, and the Pattern and Knowledge Level. High-dimensional indexing schemes and retrieval techniques are also included in the framework to support the flow of information among the levels. We believe this framework represents the first step towards capturing the different levels of information present in image data and addressing the issues and challenges of discovering useful patterns/knowledge from each level

CiteSeerX

Crossref

University of Southern Queensland ePrints

Foundational principles for large scale inference: Illustrations through correlation mining

Author: Alfred O. Hero
Alfred O. Hero
Alfred O. Hero
Bala Rajaratnam
Bala Rajaratnam
Bala Rajaratnam
Publication venue
Publication date: 18/05/2015
Field of study

When can reliable inference be drawn in the "Big Data" context? This paper presents a framework for answering this fundamental question in the context of correlation mining, with implications for general large scale inference. In large scale data applications like genomics, connectomics, and eco-informatics the dataset is often variable-rich but sample-starved: a regime where the number

n

of acquired samples (statistical replicates) is far fewer than the number

p

of observed variables (genes, neurons, voxels, or chemical constituents). Much of recent work has focused on understanding the computational complexity of proposed methods for "Big Data." Sample complexity however has received relatively less attention, especially in the setting when the sample size

n

is fixed, and the dimension

p

grows without bound. To address this gap, we develop a unified statistical framework that explicitly quantifies the sample complexity of various inferential tasks. Sampling regimes can be divided into several categories: 1) the classical asymptotic regime where the variable dimension is fixed and the sample size goes to infinity; 2) the mixed asymptotic regime where both variable dimension and sample size go to infinity at comparable rates; 3) the purely high dimensional asymptotic regime where the variable dimension goes to infinity and the sample size is fixed. Each regime has its niche but only the latter regime applies to exa-scale data dimension. We illustrate this high dimensional framework for the problem of correlation mining, where it is the matrix of pairwise and partial correlations among the variables that are of interest. We demonstrate various regimes of correlation mining based on the unifying perspective of high dimensional learning rates and sample complexity for different structured covariance models and different inference tasks

arXiv.org e-Print Archive

CiteSeerX

PubMed Central

eScholarship - University of California

A Near-linear Time Approximation Algorithm for Angle-based Outlier Detection in High-dimensional Data

Author: Pagh Rasmus
Pham Ninh Dang
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2012
Field of study

Outlier mining in d-dimensional point sets is a fundamental and well studied data mining task due to its variety of ap-plications. Most such applications arise in high-dimensional domains. A bottleneck of existing approaches is that implicit or explicit assessments on concepts of distance or nearest neighbor are deteriorated in high-dimensional data. Follow-ing up on the work of Kriegel et al. (KDD ’08), we inves-tigate the use of angle-based outlier factor in mining high-dimensional outliers. While their algorithm runs in cubic time (with a quadratic time heuristic), we propose a novel random projection-based technique that is able to estimate the angle-based outlier factor for all data points in time near-linear in the size of the data. Also, our approach is suitable to be performed in parallel environment to achieve a parallel speedup. We introduce a theoretical analysis of the quality of approximation to guarantee the reliability of our estima-tion algorithm. The empirical experiments on synthetic and real world data sets demonstrate that our approach is effi-cient and scalable to very large high-dimensional data sets

CiteSeerX

The IT University of Copenhagen's Repository

An Interactive Clustering Methodology for High Dimensional Data Mining

Author: Alidaee Bahram
Glover Fred
Kochenberger Gary
Wang Haibo
Publication venue: AIS Electronic Library (AISeL)
Publication date: 31/12/2004
Field of study

AIS Electronic Library (AISeL)