Temporal Feature Selection with Symbolic Regression
Building and discovering useful features when constructing machine learning models is the central task for the machine learning practitioner. Good features not only increase the predictive power of a model but also illuminate the underlying drivers of a target variable. In this research we propose a novel feature learning technique in which symbolic regression is endowed with a "Range Terminal" that allows it to explore functions of the aggregate of variables over time. We test the Range Terminal on a synthetic data set and on a real-world data set in which we predict seasonal greenness using satellite-derived temperature and snow data over a portion of the Arctic. On the synthetic data set, symbolic regression with the Range Terminal outperforms both standard symbolic regression and Lasso regression. On the Arctic data set, it outperforms standard symbolic regression and fails to beat Lasso regression, but it finds useful features describing the interaction between Land Surface Temperature, Snow, and seasonal vegetative growth in the Arctic.
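One way to picture the Range Terminal is as a terminal node that aggregates a single variable over a window of time steps. The sketch below is illustrative only: the name `range_terminal`, the use of the mean as the aggregate, the inclusive indexing, and the toy temperature values are all assumptions, since the abstract does not specify them.

```python
from statistics import mean


def range_terminal(series, start, end, aggregate=mean):
    """Aggregate one variable over time steps start..end (inclusive).

    Hypothetical helper: the abstract does not state which aggregate
    or indexing convention the Range Terminal uses.
    """
    if not 0 <= start <= end < len(series):
        raise ValueError("window must lie inside the series")
    return aggregate(series[start:end + 1])


# Toy weekly land surface temperature for one pixel (made-up values).
temperature = [-12.0, -8.0, -3.0, 1.0, 4.0, 7.0]

# A candidate feature a symbolic regressor could evolve:
# the mean temperature over weeks 3..5.
spring_warmth = range_terminal(temperature, 3, 5)
```

In a symbolic regression run, the window bounds would presumably be evolved along with the rest of the expression, so the search can discover which period of the season matters.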
Classifying sequences by the optimized dissimilarity space embedding approach: a case study on the solubility analysis of the E. coli proteome
We evaluate a version of the recently-proposed classification system named
Optimized Dissimilarity Space Embedding (ODSE) that operates in the input space
of sequences of generic objects. The ODSE system was originally presented
as a classification system for patterns represented as labeled graphs. However,
since ODSE is founded on the dissimilarity space representation of the input
data, the classifier can be easily adapted to any input domain where it is
possible to define a meaningful dissimilarity measure. Here we demonstrate the
effectiveness of the ODSE classifier for sequences by considering an
application dealing with the recognition of the solubility degree of the
Escherichia coli proteome. Solubility, or analogously aggregation propensity,
is an important property of protein molecules, which is intimately related to
the mechanisms underlying the chemico-physical process of folding. Each protein
of our dataset is initially associated with a solubility degree and it is
represented as a sequence of symbols, denoting the 20 amino acid residues. The
computational results obtained here, which we stress were achieved
with no context-dependent tuning of the ODSE system, confirm the validity and
generality of the ODSE-based approach for structured data classification.
Comment: 10 pages, 49 references
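The dissimilarity space representation mentioned above can be sketched as follows: each sequence is mapped to the vector of its dissimilarities to a set of prototype sequences, and any standard classifier can then operate on those vectors. This is a minimal sketch under stated assumptions. The Levenshtein edit distance is swapped in as the dissimilarity measure, the prototypes are fixed by hand, and the names `edit_distance` and `embed` are hypothetical. The optimization that ODSE performs is not shown.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings (dynamic programming)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            substitution = previous[j - 1] + (char_a != char_b)
            current.append(min(previous[j] + 1, current[j - 1] + 1, substitution))
        previous = current
    return previous[-1]


def embed(sequence, prototypes, dissimilarity=edit_distance):
    """Map a sequence to its vector of dissimilarities to the prototypes."""
    return [dissimilarity(sequence, p) for p in prototypes]


# Made-up amino acid fragments used as prototypes (assumption).
prototypes = ["MKV", "GAL"]
vector = embed("MKL", prototypes)
```

Because only `dissimilarity` touches the raw objects, the same embedding applies to graphs, sequences, or any other domain with a meaningful dissimilarity measure.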
Exact heat kernel on a hypersphere and its applications in kernel SVM
Many contemporary statistical learning methods assume a Euclidean feature
space. This paper presents a method for defining similarity based on
hyperspherical geometry and shows that it often improves the performance of
support vector machines compared to other competing similarity measures.
Specifically, the idea of using heat diffusion on a hypersphere to measure
similarity has been previously proposed, demonstrating promising results based
on a heuristic heat kernel obtained from the zeroth order parametrix expansion;
however, how well this heuristic kernel agrees with the exact hyperspherical
heat kernel remains unknown. This paper presents a higher order parametrix
expansion of the heat kernel on a unit hypersphere and discusses several
problems associated with this expansion method. We then compare the heuristic
kernel with an exact form of the heat kernel expressed in terms of a uniformly
and absolutely convergent series in high-dimensional angular momentum
eigenmodes. Being a natural measure of similarity between sample points
dwelling on a hypersphere, the exact kernel often shows superior performance in
kernel SVM classifications applied to text mining, tumor somatic mutation
imputation, and stock market analysis.
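To make the comparison concrete, the sketch below contrasts a Gaussian-in-angle heuristic kernel with an eigenmode series on the ordinary unit 2-sphere, where the angular momentum eigenmodes reduce to Legendre polynomials. Several things here are assumptions: the restriction to the 2-sphere (the paper treats general hyperspheres), the unnormalized form exp(-θ²/(4t)) for the heuristic kernel, the truncation at `max_degree`, and the function names.

```python
import math


def heuristic_kernel(cos_theta, t):
    """Gaussian in geodesic angle: exp(-theta^2 / (4 t)), unnormalized."""
    theta = math.acos(max(-1.0, min(1.0, cos_theta)))
    return math.exp(-theta * theta / (4.0 * t))


def exact_kernel_s2(cos_theta, t, max_degree=60):
    """Heat kernel on the unit 2-sphere as a truncated Legendre series.

    K_t = sum_l (2l + 1) / (4 pi) * exp(-l (l + 1) t) * P_l(cos theta)
    """
    x = cos_theta
    p_prev, p_curr = 1.0, x  # P_0 and P_1
    total = 1.0 / (4.0 * math.pi)  # l = 0 term
    for l in range(1, max_degree + 1):
        weight = (2 * l + 1) / (4.0 * math.pi) * math.exp(-l * (l + 1) * t)
        total += weight * p_curr
        # Bonnet recurrence: (l + 1) P_{l+1} = (2l + 1) x P_l - l P_{l-1}
        p_prev, p_curr = p_curr, ((2 * l + 1) * x * p_curr - l * p_prev) / (l + 1)
    return total
```

For large diffusion time `t` the series collapses to its constant term 1/(4π), the uniform density on the sphere, whereas the Gaussian form does not capture this saturation.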
Eddy current defect response analysis using sum of Gaussian methods
This dissertation studies methods to automatically detect eddy current differential coil defect signatures and approximate them as a sum of Gaussian functions (SoG). Datasets covering varying material, defect size, inspection frequency, and coil diameter were investigated. Dimensionally reduced representations of the defect responses were obtained using common existing reduction methods and novel enhancements to them based on SoG representations. The efficacy of the SoG-enhanced representations was studied using common interpretable Machine Learning (ML) classifier designs, with the SoG representations showing significant improvement in common analysis metrics.
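The sum-of-Gaussians form itself is simple to write down. The sketch below shows only the model and how its parameters can act as a low-dimensional representation. The fitting procedure is not described in the abstract and is not shown, and the function name, the two-lobe example, and all numeric values are made up for illustration.

```python
import math


def sum_of_gaussians(x, components):
    """Evaluate sum_k a_k * exp(-(x - mu_k)^2 / (2 sigma_k^2)).

    components: iterable of (amplitude, center, width) triples.
    """
    return sum(
        a * math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))
        for a, mu, sigma in components
    )


# Made-up two-lobe signature: one positive and one negative Gaussian.
components = [(1.0, -1.0, 0.5), (-1.0, 1.0, 0.5)]

# The flattened parameters serve as a low-dimensional representation
# that could be passed to a classifier.
features = [value for component in components for value in component]
```

Here a signal sampled at many points is summarized by six numbers, which is the kind of reduction an interpretable classifier could work with directly.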