Exact heat kernel on a hypersphere and its applications in kernel SVM
Many contemporary statistical learning methods assume a Euclidean feature
space. This paper presents a method for defining similarity based on
hyperspherical geometry and shows that it often improves the performance of
support vector machines compared with other competing similarity measures.
Specifically, the idea of using heat diffusion on a hypersphere to measure
similarity has been previously proposed, demonstrating promising results based
on a heuristic heat kernel obtained from the zeroth-order parametrix expansion;
however, how well this heuristic kernel agrees with the exact hyperspherical
heat kernel remains unknown. This paper presents a higher-order parametrix
expansion of the heat kernel on a unit hypersphere and discusses several
problems associated with this expansion method. We then compare the heuristic
kernel with an exact form of the heat kernel expressed in terms of a uniformly
and absolutely convergent series in high-dimensional angular momentum
eigenmodes. As a natural measure of similarity between sample points lying
on a hypersphere, the exact kernel often performs better in kernel SVM
classification tasks in text mining, tumor somatic mutation imputation, and
stock market analysis.
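The exact kernel can be illustrated with the standard eigenmode expansion of the heat kernel on the unit sphere S^d in R^(d+1): K_t(x, y) = (1/|S^d|) * sum_{l>=0} exp(-l(l+d-1)t) * ((l+alpha)/alpha) * C_l^alpha(x.y), with alpha = (d-1)/2 and C_l^alpha the Gegenbauer polynomials. The sketch below truncates this series and compares it with a Gaussian-in-geodesic-distance kernel (4*pi*t)^(-d/2) * exp(-theta^2/(4t)), which we assume as the form of the zeroth-order parametrix kernel. The paper's normalization and truncation strategy may differ, and all function names are hypothetical.

```python
import math


def gegenbauer(n: int, alpha: float, x: float) -> float:
    """Evaluate the Gegenbauer polynomial C_n^alpha(x) with the three-term recurrence."""
    if n == 0:
        return 1.0
    c_prev, c_curr = 1.0, 2.0 * alpha * x  # C_0 and C_1
    for k in range(2, n + 1):
        c_next = (2.0 * x * (k + alpha - 1.0) * c_curr
                  - (k + 2.0 * alpha - 2.0) * c_prev) / k
        c_prev, c_curr = c_curr, c_next
    return c_curr


def sphere_area(d: int) -> float:
    """Surface area of the unit sphere S^d embedded in R^(d+1)."""
    return 2.0 * math.pi ** ((d + 1) / 2) / math.gamma((d + 1) / 2)


def _cos_angle(x, y) -> float:
    """Cosine of the geodesic angle between two unit vectors, clipped to [-1, 1]."""
    dot = sum(a * b for a, b in zip(x, y))
    return max(-1.0, min(1.0, dot))


def exact_heat_kernel(x, y, t: float, n_terms: int = 60) -> float:
    """Truncated eigenmode expansion of the heat kernel on the unit sphere S^d (d >= 2)."""
    d = len(x) - 1  # unit vectors in R^(d+1) lie on S^d
    if d < 2:
        raise ValueError("this sketch assumes a sphere of dimension d >= 2")
    alpha = (d - 1) / 2.0
    c = _cos_angle(x, y)
    total = 0.0
    for l in range(n_terms):
        eigenvalue = l * (l + d - 1)  # Laplace-Beltrami eigenvalue of degree-l harmonics
        total += math.exp(-eigenvalue * t) * (l + alpha) / alpha * gegenbauer(l, alpha, c)
    return total / sphere_area(d)


def parametrix_kernel(x, y, t: float) -> float:
    """Assumed zeroth-order approximation: Gaussian in the geodesic distance theta."""
    d = len(x) - 1
    theta = math.acos(_cos_angle(x, y))
    return (4.0 * math.pi * t) ** (-d / 2.0) * math.exp(-theta ** 2 / (4.0 * t))


north = (0.0, 0.0, 1.0)
east = (1.0, 0.0, 0.0)
print(exact_heat_kernel(north, north, 0.01) / parametrix_kernel(north, north, 0.01))
print(exact_heat_kernel(north, east, 10.0), parametrix_kernel(north, east, 10.0))
```

For small t the two kernels nearly agree at x = y, while for large t the exact kernel flattens to the uniform value 1/|S^d| and the parametrix kernel decays toward zero. Smaller t requires more terms (`n_terms`) for the truncated series to converge.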
Population structure-learned classifier for high-dimension low-sample-size class-imbalanced problem
Classification of high-dimension low-sample-size (HDLSS) data is a
challenging problem, and class-imbalanced data are common in most
application fields. We term this setting imbalanced HDLSS (IHDLSS). Recent
theoretical results reveal that the classification criterion and tolerance
similarity are crucial to HDLSS; tolerance similarity emphasizes maximizing
within-class variance on the premise of class separability. Based on this idea,
a novel linear binary classifier, termed Population Structure-learned
Classifier (PSC), is proposed. The proposed PSC obtains better
generalization performance on IHDLSS by maximizing the sum of the inter-class
and intra-class scatter matrices on the premise of class separability, and by
assigning different intercept values to the majority and minority classes.
The salient features of the proposed approach are: (1) it works well on
IHDLSS; (2) the inverse of a high-dimensional matrix can be computed in a
low-dimensional space; (3) it determines the intercept term for each class
self-adaptively; (4) it has the same computational complexity as the SVM. A
series of evaluations is conducted on one simulated data set and eight
real-world IHDLSS benchmark data sets from gene analysis. Experimental results
demonstrate that PSC is superior to state-of-the-art methods on IHDLSS.
Comment: 41 pages, 10 figures, 10 tables
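The PSC criterion maximizes the sum of the inter-class and intra-class scatter matrices. A minimal sketch, assuming the usual class-size-weighted definitions S_b = sum_k n_k (mu_k - mu)(mu_k - mu)^T and S_w = sum_k sum_{i in class k} (x_i - mu_k)(x_i - mu_k)^T (the paper's own definitions may differ), checks numerically that this sum equals the total scatter matrix S_t. Under that assumption, the objective w^T (S_b + S_w) w is the total squared spread of the projected samples. The function name and toy data are hypothetical.

```python
import numpy as np


def scatter_matrices(X: np.ndarray, y: np.ndarray):
    """Return between-class S_b, within-class S_w, and total S_t scatter matrices.

    X has shape (n_samples, n_features); y holds the class labels.
    """
    mu = X.mean(axis=0)
    p = X.shape[1]
    S_b = np.zeros((p, p))
    S_w = np.zeros((p, p))
    for label in np.unique(y):
        X_k = X[y == label]
        mu_k = X_k.mean(axis=0)
        diff = (mu_k - mu)[:, None]
        S_b += X_k.shape[0] * (diff @ diff.T)  # weighted by class size n_k
        centered_k = X_k - mu_k
        S_w += centered_k.T @ centered_k
    centered = X - mu
    S_t = centered.T @ centered
    return S_b, S_w, S_t


# Toy imbalanced HDLSS data: 12 samples (9 majority, 3 minority) in 50 dimensions.
rng = np.random.default_rng(0)
X = rng.normal(size=(12, 50))
y = np.array([0] * 9 + [1] * 3)
S_b, S_w, S_t = scatter_matrices(X, y)

# For any direction w, w^T (S_b + S_w) w equals the sum of squared projected deviations.
w = rng.normal(size=50)
projected = (X - X.mean(axis=0)) @ w
print(np.allclose(S_b + S_w, S_t), np.isclose(w @ (S_b + S_w) @ w, np.sum(projected ** 2)))
```

In this HDLSS setting S_t has rank at most n - 1 = 11, far below p = 50, which is one reason such classifiers avoid inverting p x p matrices directly.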
The classification for High-dimension low-sample size data
Many applications in fields such as gene expression analysis and computer
vision involve high-dimensional low-sample-size (HDLSS) data sets, which pose
great challenges for standard
statistical and modern machine learning methods. In this paper, we propose a
novel classification criterion on HDLSS, tolerance similarity, which emphasizes
the maximization of within-class variance on the premise of class separability.
According to this criterion, a novel linear binary classifier is designed,
named the No-separated Data Maximum Dispersion classifier (NPDMD). The
objective of NPDMD is to find a projection direction w along which all training
samples are spread over as large an interval as possible. NPDMD has several
characteristics compared with state-of-the-art classification methods. First,
it works well on HDLSS. Second, it incorporates both sample statistics and
local structural information (support vectors) into the objective function to
find the projection direction in the whole feature space. Third, it computes
the inverse of a high-dimensional matrix in a low-dimensional space. Fourth, it
is relatively simple to implement with quadratic programming. Fifth, it is
robust to model specification across various real applications. The
theoretical properties of NPDMD are derived. We conduct a series of evaluations
on one simulated and six real-world benchmark data sets, including face
classification and mRNA classification. NPDMD outperforms widely used
approaches in most cases, or at least obtains comparable results.
Comment: arXiv admin note: text overlap with arXiv:1901.0137
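Both this abstract and the PSC abstract mention computing the inverse of a high-dimensional matrix in a low-dimensional space. One standard identity that achieves this is (X^T X + lam I_p)^(-1) X^T = X^T (X X^T + lam I_n)^(-1), which replaces a p x p solve with an n x n solve when p >> n. The sketch applies it to a hypothetical ridge-regularized least-squares direction; it is not the NPDMD quadratic-programming objective, and the function names and targets are illustrative assumptions.

```python
import numpy as np


def solve_primal(X: np.ndarray, b: np.ndarray, lam: float) -> np.ndarray:
    """w = (X^T X + lam I_p)^(-1) X^T b: solves a p x p system (costly when p >> n)."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ b)


def solve_dual(X: np.ndarray, b: np.ndarray, lam: float) -> np.ndarray:
    """Same w via X^T (X X^T + lam I_n)^(-1) b: only an n x n system is solved."""
    n = X.shape[0]
    return X.T @ np.linalg.solve(X @ X.T + lam * np.eye(n), b)


rng = np.random.default_rng(1)
X = rng.normal(size=(10, 500))         # n = 10 samples, p = 500 features
b = rng.choice([-1.0, 1.0], size=10)   # hypothetical +/-1 targets
w_primal = solve_primal(X, b, lam=0.1)
w_dual = solve_dual(X, b, lam=0.1)
print(np.allclose(w_primal, w_dual))
```

The dual form needs only the n x n Gram matrix X X^T, so it also extends naturally to kernelized variants.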
A survey on utilization of data mining approaches for dermatological (skin) diseases prediction
Due to recent technological advances, large volumes of medical data are being collected. These data contain valuable information, so data mining techniques can be used to extract useful patterns. This paper introduces data mining and its various techniques and surveys the available literature on medical data mining. We focus mainly on the application of data mining to skin diseases. A categorization is provided based on the different data mining techniques, and the utility of the various data mining methodologies is highlighted. Generally, association mining is suitable for extracting rules; it has been used especially in cancer diagnosis. Classification is a robust method in medical mining, and in this paper we summarize the different uses of classification in dermatology. It is one of the most important methods for diagnosing erythemato-squamous diseases, and methods such as neural networks, genetic algorithms, and fuzzy classification have been applied to this problem. Clustering is a useful method in medical image mining. Clustering techniques aim to find structure in the given data by grouping samples according to their similarity in terms of data characteristics. Clustering has some applications in dermatology. Besides introducing different mining methods, we investigate some challenges that exist in mining skin data.
Hardware-Amenable Structural Learning for Spike-based Pattern Classification using a Simple Model of Active Dendrites
This paper presents a spike-based model which employs neurons with
functionally distinct dendritic compartments for classifying high dimensional
binary patterns. The synaptic inputs arriving on each dendritic subunit are
nonlinearly processed before being linearly integrated at the soma, giving the
neuron a capacity to perform a large number of input-output mappings. The model
uses sparse synaptic connectivity, in which each synapse takes a binary value.
The optimal connection pattern of a neuron is learned using a simple,
hardware-friendly, margin-enhancing learning algorithm inspired by the
mechanism of structural plasticity in biological neurons. The learning
algorithm groups correlated synaptic inputs on the same dendritic branch. Since
the learning results in modified connection patterns, it can be incorporated
into current event-based neuromorphic systems with little overhead. This work
also presents a branch-specific spike-based version of this structural
plasticity rule. The proposed model is evaluated on benchmark binary
classification problems and its performance is compared against that achieved
using Support Vector Machine (SVM) and Extreme Learning Machine (ELM)
techniques. Our proposed method attains comparable performance while using
10 to 50% fewer computational resources than the other reported techniques.
Comment: Accepted for publication in Neural Computation
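The neuron model above (nonlinear processing on each dendritic branch, linear summation at the soma, binary synapses) can be sketched as follows. This is a minimal illustration, assuming a quadratic branch nonlinearity b(z) = z^2 / threshold; the abstract does not specify the nonlinearity, the structural-plasticity learning rule is not shown, and the function name and parameters are hypothetical. The usage compares placing two co-active inputs on the same branch versus on different branches, which illustrates why grouping correlated inputs on one branch can raise the neuron's response.

```python
import numpy as np


def dendritic_neuron(x: np.ndarray, connections: np.ndarray, threshold: float = 2.0) -> float:
    """Output of a neuron with m dendritic branches and k binary synapses per branch.

    x: binary input pattern of length d.
    connections: integer array of shape (m, k); row j lists the input indices
        wired to branch j (each present connection has weight 1).
    """
    branch_sums = x[connections].sum(axis=1)                  # linear summation within each branch
    branch_out = branch_sums.astype(float) ** 2 / threshold   # assumed quadratic branch nonlinearity
    return float(branch_out.sum())                            # linear integration at the soma


x = np.array([1, 1, 0, 0])             # inputs 0 and 1 are co-active (correlated)
grouped = np.array([[0, 1], [2, 3]])   # correlated inputs share one branch
spread = np.array([[0, 2], [1, 3]])    # correlated inputs split across branches
print(dendritic_neuron(x, grouped), dendritic_neuron(x, spread))  # prints 2.0 1.0
```

With a supralinear branch nonlinearity, co-active inputs on the same branch contribute more than the same inputs spread across branches, so rewiring connections alone (without graded weights) changes which patterns the neuron responds to most strongly.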