Machine Learning vs Conventional Analysis Techniques for the Earth’s Magnetic Field Study
Abstract. Current techniques for calculating and generating the models used to analyze the Earth's magnetic field are laborious and time-consuming. We assert that machine learning can significantly speed up the building of magnetic field models at various levels of complexity, particularly for data cleansing and sorting. Our approach uses a reverse iterative multi-phase process for data cleansing in which, initially, CHAOS-6 model data are examined to determine whether machine learning can differentiate between data components useful for spherical harmonics and data noise. During this phase, six machine learning techniques are used and compared: two classification techniques (Convolutional Neural Network (CNN) and Support Vector Classification (SVC)) and four regression techniques (Random Forest Regression (RFR), Support Vector Regression (SVR), Logistic Regression, and Linear Regression). This initial phase focuses on understanding the accuracy of machine learning for model selection and uses relatively clean data. Future phases should assess the relevance of machine learning to the massive volume of data received from satellites. Applying machine learning to magnetic field datasets achieves 1) faster and more efficient computation when any given 30-day period contains millions of rows of data, and 2) reduced propagation of the errors that render some data useless in the spherical-harmonics computations used for model generation.
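As an illustration of the kind of side-by-side comparison the abstract describes (not the authors' actual pipeline), a minimal scikit-learn sketch on synthetic data might look like the following. The CNN is omitted because it requires a deep-learning framework; the dataset, hyperparameters, and cross-validation setup are assumptions for illustration only:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC, SVR
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LogisticRegression, LinearRegression

# Synthetic stand-in for magnetic-field samples: a few informative
# features ("signal") mixed with pure-noise columns.
X, y = make_classification(n_samples=500, n_features=10,
                           n_informative=4, random_state=0)

# Classifiers are scored by accuracy; regressors, fit to the 0/1
# labels, are scored by R^2 (cross_val_score uses each estimator's
# default .score()).
models = {
    "SVC": SVC(),
    "LogisticRegression": LogisticRegression(max_iter=1000),
    "RFR": RandomForestRegressor(n_estimators=50, random_state=0),
    "SVR": SVR(),
    "LinearRegression": LinearRegression(),
}
scores = {name: cross_val_score(m, X, y.astype(float), cv=5).mean()
          for name, m in models.items()}
for name, s in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{name:20s} {s:.3f}")
```

Note that the classification and regression families are not directly comparable on one metric; the sketch simply shows how one might run all five through a common cross-validation loop.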
Linear Time Feature Selection for Regularized Least-Squares
We propose a novel algorithm for greedy forward feature selection for
regularized least-squares (RLS) regression and classification, also known as
the least-squares support vector machine or ridge regression. The algorithm,
which we call greedy RLS, starts from the empty feature set, and on each
iteration adds the feature whose addition provides the best leave-one-out
cross-validation performance. Our method is considerably faster than the
previously proposed ones, since its time complexity is linear in the number of
training examples, the number of features in the original data set, and the
desired size of the set of selected features. Therefore, as a side effect we
obtain a new training algorithm for learning sparse linear RLS predictors which
can be used for large scale learning. This speed is possible due to matrix
calculus based short-cuts for leave-one-out and feature addition. We
experimentally demonstrate the scalability of our algorithm and its ability to
find good quality feature sets.
Comment: 17 pages, 15 figures
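The greedy procedure described above can be sketched as follows. This is a naive illustration that recomputes the closed-form leave-one-out (PRESS) residuals of ridge regression from scratch at every step, so it does not achieve the paper's linear-time complexity (that requires the matrix-calculus shortcuts the authors derive); the synthetic data and regularization value are assumptions:

```python
import numpy as np

def loo_error(X, y, lam):
    """Mean squared leave-one-out error of ridge regression.

    Uses the closed form e_i = (y_i - yhat_i) / (1 - H_ii),
    where H = X (X'X + lam*I)^{-1} X' is the hat matrix.
    """
    n, d = X.shape
    A = X.T @ X + lam * np.eye(d)
    H = X @ np.linalg.solve(A, X.T)
    resid = y - H @ y
    return np.mean((resid / (1.0 - np.diag(H))) ** 2)

def greedy_rls(X, y, k, lam=1.0):
    """Forward-select k features, each step adding the feature
    whose addition gives the lowest leave-one-out error."""
    selected, remaining = [], list(range(X.shape[1]))
    for _ in range(k):
        best_err, best_j = min(
            (loo_error(X[:, selected + [j]], y, lam), j)
            for j in remaining)
        selected.append(best_j)
        remaining.remove(best_j)
    return selected

# Toy usage: only features 0 and 2 carry signal.
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 6))
y = X[:, 0] + X[:, 2]
print(greedy_rls(X, y, k=2, lam=1e-6))  # selects features 0 and 2
```

The selection criterion matches the abstract (leave-one-out error after each candidate addition); only the per-step cost differs from the proposed algorithm.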
An Algorithmic Framework for Computing Validation Performance Bounds by Using Suboptimal Models
Practical model building processes are often time-consuming because many
different models must be trained and validated. In this paper, we introduce a
novel algorithm that can be used for computing the lower and the upper bounds
of model validation errors without actually training the model itself. A key
idea behind our algorithm is using side information available from a
suboptimal model. If a reasonably good suboptimal model is available, our
algorithm can compute lower and upper bounds of many useful quantities for
making inferences on the unknown target model. We demonstrate the advantage of
our algorithm in the context of model selection for regularized learning
problems.