Statistical estimation of the intrinsic dimensionality of data collections
A realization fi(·) from a class F(·) can be represented as a point in a metric space, and the locus of all points belonging to F(·) lies on a surface in this space. The intrinsic dimensionality of F(·), defined as the least number of parameters needed to identify any fi(·) belonging to F(·), is equal to the topological dimensionality of this surface. Given a sample set of realizations fi(·) from F(·), a statistical method is presented for estimating the intrinsic dimensionality of F(·).
A Probabilistic Definition of Intrinsic Dimensionality for Images
In this paper we address the problem of appropriately representing the intrinsic dimensionality of image neighborhoods. This dimensionality describes the degrees of freedom of a local image patch, and it gives rise to some of the most often applied corner and edge detectors.
Feature selection with adjustable criteria
We present a study of a rough-set-based approach to feature selection. Instead of using significance or support, the Parameterized Average Support Heuristic (PASH) considers the overall quality of the potential set of rules. It produces a set of rules with a balanced support distribution over all decision classes. The adjustable parameters of PASH can help users with different levels of approximation needs to extract predictive rules that may be ignored by other methods. This paper fine-tunes the PASH heuristic and provides experimental results for PASH.
GA-Facilitated Knowledge Discovery and Pattern Recognition Optimization Applied to the Biochemistry of Protein Solvation
The authors present a GA optimization technique for cosine-based k-nearest neighbors classification that improves predictive accuracy in a class-balanced manner while simultaneously enabling knowledge discovery. The GA performs feature selection and extraction by searching for feature weights and offsets that maximize cosine classifier performance. The GA-selected feature weights determine the relevance of each feature to the classification task. This hybrid GA/classifier provides insight into a notoriously difficult problem in molecular biology: the correct treatment of water molecules mediating ligand binding to proteins. In distinguishing patterns of water conservation and displacement, this method achieves higher accuracy than previous techniques. The data mining capabilities of the hybrid system improve the understanding of the physical and chemical determinants governing favored protein-water binding.
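The fitness evaluation that a GA of this kind would maximize can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names, the chromosome layout (weights followed by offsets), the shift-then-scale transform, and the use of mean per-class recall as the class-balanced score are all assumptions made for the example.

```python
import numpy as np

def cosine_knn_predict(train_X, train_y, query_X, weights, offsets, k=1):
    """Majority vote among the k most cosine-similar training rows."""
    # Assumed transform: shift each feature by its offset, then scale by its weight.
    a = (np.asarray(train_X, dtype=float) - offsets) * weights
    b = (np.asarray(query_X, dtype=float) - offsets) * weights
    # Normalize rows to unit length; the floor avoids division by zero.
    a = a / np.maximum(np.linalg.norm(a, axis=1, keepdims=True), 1e-12)
    b = b / np.maximum(np.linalg.norm(b, axis=1, keepdims=True), 1e-12)
    sims = b @ a.T  # cosine similarities, shape (n_query, n_train)
    train_y = np.asarray(train_y)
    preds = []
    for row in sims:
        # Stable sort so that ties resolve to the earliest training row.
        nearest = np.argsort(-row, kind="stable")[:k]
        labels, counts = np.unique(train_y[nearest], return_counts=True)
        preds.append(labels[np.argmax(counts)])
    return np.array(preds)

def balanced_accuracy(y_true, y_pred):
    """Mean per-class recall, so every class contributes equally."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return float(np.mean([np.mean(y_pred[y_true == c] == c)
                          for c in np.unique(y_true)]))

def fitness(chromosome, train_X, train_y, val_X, val_y, k=1):
    """Score one candidate: first half holds weights, second half holds offsets."""
    n = len(chromosome) // 2
    weights = np.asarray(chromosome[:n], dtype=float)
    offsets = np.asarray(chromosome[n:], dtype=float)
    preds = cosine_knn_predict(train_X, train_y, val_X, weights, offsets, k)
    return balanced_accuracy(val_y, preds)

# Toy data (hypothetical): feature 0 carries the class, feature 1 is noise.
train_X = [[1, 5], [2, -5], [-1, 5], [-2, -5]]
train_y = [0, 0, 1, 1]
val_X = [[3, -4], [-3, 4]]
val_y = [0, 1]

good = fitness([1, 0, 0, 0], train_X, train_y, val_X, val_y)  # weight on feature 0
bad = fitness([0, 1, 0, 0], train_X, train_y, val_X, val_y)   # weight on feature 1
```

A GA would evolve a population of such chromosomes and keep those with higher fitness. Under these assumptions, a weight that ends up near zero marks a feature as having little relevance to the classification task, which is one way to read the knowledge-discovery aspect described above.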