107,465 research outputs found
Feature interval learning algorithms for classification
This paper presents Feature Interval Learning (FIL) algorithms, which represent multi-concept descriptions in the form of disjoint feature intervals. The FIL algorithms are batch supervised inductive learning algorithms and use feature projections of the training instances to represent the induced classification knowledge. The concept description is learned separately for each feature and takes the form of a set of disjoint intervals. The class of an unseen instance is determined by weighted-majority voting of the feature predictions. The basic FIL algorithm is enhanced with adaptive interval and feature weight schemes in order to handle noisy and irrelevant features. The algorithms are empirically evaluated on twelve data sets from the UCI repository and compared with the k-NN, k-NNFP, and NBC classification algorithms. The experiments demonstrate that the FIL algorithms are robust to irrelevant features and missing feature values, and achieve accuracy comparable to the best of the existing algorithms with significantly lower average running times. (C) 2010 Elsevier B.V. All rights reserved.
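A minimal sketch of the interval-based idea described in the abstract, assuming that consecutive same-class feature projections are merged into disjoint intervals; an unweighted majority vote stands in for the adaptive weighted-majority scheme:

```python
from collections import Counter

def fit_feature_intervals(X, y):
    """For each feature, merge consecutive same-class projections
    of the training instances into disjoint [lo, hi] intervals
    labelled with that class (a simplified FIL-style sketch)."""
    model = []
    for f in range(len(X[0])):
        pairs = sorted((row[f], cls) for row, cls in zip(X, y))
        intervals = []
        lo, hi, cls = pairs[0][0], pairs[0][0], pairs[0][1]
        for v, c in pairs[1:]:
            if c == cls:
                hi = v                      # extend the current interval
            else:
                intervals.append((lo, hi, cls))
                lo, hi, cls = v, v, c       # start a new interval
        intervals.append((lo, hi, cls))
        model.append(intervals)
    return model

def predict(model, x):
    """Each feature votes for the class of the interval containing
    its projection; the majority class wins (unweighted vote)."""
    votes = Counter()
    for f, intervals in enumerate(model):
        for lo, hi, cls in intervals:
            if lo <= x[f] <= hi:
                votes[cls] += 1
                break
    return votes.most_common(1)[0][0] if votes else None
```

Features whose projection falls outside every interval simply abstain, which is one way such a scheme can tolerate missing or irrelevant feature values.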
Batch learning of disjoint feature intervals
Ankara: Department of Computer Engineering and Information Science and the Institute of Engineering and Science of Bilkent University, 1996. Thesis (Master's) -- Bilkent University, 1996. Includes bibliographical references (leaves 98-104).
This thesis presents several learning algorithms for multi-concept descriptions
in the form of disjoint feature intervals, called Feature Interval Learning algorithms
(FIL). These algorithms are batch supervised inductive learning algorithms,
and use feature projections of the training instances for the representation
of the classification knowledge induced. These projections can be generalized
into disjoint feature intervals. Therefore, the concept description learned
is a set of disjoint intervals separately for each feature. The classification of
an unseen instance is based on the weighted majority voting among the local
predictions of features. In order to handle noisy instances, several extensions
are developed by placing weights to intervals rather than features. Empirical
evaluation of the FIL algorithms is presented and compared with some other
similar classification algorithms. The FIL algorithms achieve accuracies
comparable to those of the other algorithms, while their average running times
are much lower than those of the others.
This thesis also presents a new adaptation of the well-known k-NN classification
algorithm to the feature projections approach, called k-NNFP for
k-Nearest Neighbor on Feature Projections. It is based on a majority voting on individual
classifications made by the projections of the training set on each
feature, and is compared with the k-NN algorithm on some real-world and artificial
datasets.
Akkuş, Aynur. M.S.
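The k-NNFP voting rule described above can be illustrated with a short sketch (the distance metric and tie-breaking here are assumptions, not the thesis's exact choices):

```python
from collections import Counter

def knnfp_predict(X, y, x, k=3):
    """k-NN on Feature Projections: each feature independently finds
    the k training projections closest to the query on that feature
    and casts their class labels as votes; the final prediction is
    the majority class over all per-feature votes."""
    votes = Counter()
    for f in range(len(x)):
        nearest = sorted(range(len(X)), key=lambda i: abs(X[i][f] - x[f]))[:k]
        votes.update(y[i] for i in nearest)
    return votes.most_common(1)[0][0]
```

Because each feature is examined independently, only one-dimensional distances are ever computed, which is what makes the feature-projections variant cheap relative to plain k-NN in higher dimensions.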
Human-assisted vs. deep learning feature extraction: an evaluation of ECG features extraction methods for arrhythmia classification using machine learning
The success of arrhythmia classification tasks with Machine Learning (ML) algorithms relies on the handcrafted extraction of features from Electrocardiography (ECG) signals. However, feature extraction is a time-consuming trial-and-error approach. Deep Neural Network (DNN) algorithms bypass handcrafted feature extraction, since they extract features automatically in their hidden layers; however, they require access to a balanced dataset for training. In this exploratory research study, we compare evaluation metrics of one-dimensional Convolutional Neural Networks (1D-CNN) and Support Vector Machines (SVM) on a dataset built by merging the public ECG signal databases TNMG and CINC17. Results: Both algorithms showed good performance on the merged ECG database. The 1D-CNN algorithm achieved a precision of 93.04%, an accuracy of 93.07%, a recall of 93.20%, and an F1-score of 93.05%. The SVM classifier (λ = 10, C = 10 × 10⁹) achieved its best classification metrics with two combined handcrafted feature extraction methods, wavelet transforms and R-peak interval features, reaching an overall precision of 89.04%, accuracy of 92.00%, recall of 94.20%, and F1-score of 91.54%. With wavelet transforms as the only input feature, the SVM (λ = 10, C = 100) achieved precision, accuracy, recall, and F1-score of 86.15%, 85.33%, 81.16%, and 83.58%, respectively. Conclusion: Researchers face a challenge in finding broad datasets for evaluating ML models. One way to solve this problem, especially for deep learning models, is to combine several public datasets to increase the amount of data. The SVM and 1D-CNN algorithms showed positive results on the merged databases, with similar F1-score, precision, and recall during arrhythmia classification.
Despite the favorable results for both, it should be considered that for the SVM, feature selection is a time-consuming trial-and-error process, whereas CNN algorithms can reduce that workload significantly. The disadvantage of CNN algorithms is their higher computational processing cost; in the absence of access to powerful computational processing, the SVM can be a reliable solution.
“FCT – Fundação para a Ciência e Tecnologia” within the R&D Units Project Scope: UIDB/00319/2020
Random Feature-based Online Multi-kernel Learning in Environments with Unknown Dynamics
Kernel-based methods exhibit well-documented performance in various nonlinear
learning tasks. Most of them rely on a preselected kernel, whose prudent choice
presumes task-specific prior information. Especially when the latter is not
available, multi-kernel learning has gained popularity thanks to its
flexibility in choosing kernels from a prescribed kernel dictionary. Leveraging
the random feature approximation and its recent orthogonality-promoting
variant, the present contribution develops a scalable multi-kernel learning
scheme (termed Raker) to obtain the sought nonlinear learning function `on the
fly,' first for static environments. To further boost performance in dynamic
environments, an adaptive multi-kernel learning scheme (termed AdaRaker) is
developed. AdaRaker accounts not only for data-driven learning of kernel
combination, but also for the unknown dynamics. Performance is analyzed in
terms of both static and dynamic regrets. AdaRaker is uniquely capable of
tracking nonlinear learning functions in environments with unknown dynamics,
with analytic performance guarantees. Tests with synthetic and real
datasets are carried out to showcase the effectiveness of the novel algorithms.
Comment: 36 pages
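The random feature approximation underlying schemes like Raker can be sketched in its plain form: a random Fourier feature map whose inner products approximate a Gaussian kernel. The online multi-kernel machinery of Raker itself is omitted; this is only the building block:

```python
import math
import random

def random_fourier_features(X, D=2000, sigma=1.0, seed=0):
    """Map each input x to a D-dimensional feature z(x) such that
    z(x) . z(y) approximates the Gaussian kernel
    exp(-||x - y||^2 / (2 * sigma^2)).
    z(x)_i = sqrt(2/D) * cos(w_i . x + b_i),
    with w_i ~ N(0, I / sigma^2) and b_i ~ Uniform[0, 2*pi]."""
    rnd = random.Random(seed)
    d = len(X[0])
    W = [[rnd.gauss(0.0, 1.0 / sigma) for _ in range(d)] for _ in range(D)]
    b = [rnd.uniform(0.0, 2 * math.pi) for _ in range(D)]
    scale = math.sqrt(2.0 / D)
    return [[scale * math.cos(sum(w[j] * x[j] for j in range(d)) + bi)
             for w, bi in zip(W, b)]
            for x in X]
```

With the kernel replaced by an explicit finite-dimensional map, a nonlinear function can be learned "on the fly" with cheap linear updates, which is the scalability argument the abstract makes.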
Clustering for Different Scales of Measurement - the Gap-Ratio Weighted K-means Algorithm
This paper describes a method for clustering data that are spread out over
large regions and whose dimensions are on different scales of measurement. Such
an algorithm was developed to implement a robotics application consisting of
sorting and storing objects in an unsupervised way. The toy dataset used to
validate such application consists of Lego bricks of different shapes and
colors. The uncontrolled lighting conditions, together with the use of RGB color
features, produce data with a large spread and different levels of
measurement across data dimensions. To overcome the combination of these two
characteristics in the data, we have developed a new weighted K-means
algorithm, called gap-ratio K-means, which weights each dimension
of the feature space before running the K-means algorithm. The weight
associated with a feature is proportional to the ratio between the biggest gap
separating two consecutive data points and the average of all the other gaps.
This method is compared with two other variants of K-means on the Lego bricks
clustering problem as well as two other common classification datasets.
Comment: 13 pages, 6 figures, 2 tables. This paper is under the review process
for AIAP 201
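The gap-ratio weighting described above can be sketched as follows; the exact normalisation and tie handling used in the paper are assumptions here:

```python
def gap_ratio_weights(X):
    """One weight per dimension: the largest gap between consecutive
    sorted projections, divided by the mean of the remaining gaps
    (a sketch of the gap-ratio idea; the paper's exact normalisation
    may differ)."""
    weights = []
    for f in range(len(X[0])):
        vals = sorted(row[f] for row in X)
        gaps = [b - a for a, b in zip(vals, vals[1:])]
        rest = list(gaps)
        rest.remove(max(gaps))          # drop one copy of the biggest gap
        if rest:
            weights.append(max(gaps) / (sum(rest) / len(rest)))
        else:
            weights.append(1.0)         # a single gap carries no ratio
    return weights

def apply_weights(X, weights):
    """Scale each dimension by its weight before running plain K-means."""
    return [[w * v for w, v in zip(weights, row)] for row in X]
```

A dimension with one dominant gap (a likely cluster boundary) gets a large weight, while a dimension whose points are evenly spread gets a weight near 1, so the subsequent K-means run is driven by the informative dimensions regardless of their raw scale.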
Feature-based time-series analysis
This work presents an introduction to feature-based time-series analysis. The
time series as a data type is first described, along with an overview of the
interdisciplinary time-series analysis literature. I then summarize the range
of feature-based representations for time series that have been developed to
aid interpretable insights into time-series structure. Particular emphasis is
given to emerging research that facilitates wide comparison of feature-based
representations that allow us to understand the properties of a time-series
dataset that make it suited to a particular feature-based representation or
analysis algorithm. The future of time-series analysis is likely to embrace
approaches that exploit machine learning methods to partially automate human
learning to aid understanding of the complex dynamical patterns in the time
series we measure from the world.
Comment: 28 pages, 9 figures
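A toy example of the feature-based representation the survey discusses: mapping a series to a handful of interpretable summary statistics. The particular features chosen here are illustrative, not those of any specific toolbox:

```python
import math

def ts_features(x):
    """A tiny feature-based representation of a time series:
    mean, standard deviation, and lag-1 autocorrelation."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / n
    ac1 = (sum((x[i] - mean) * (x[i + 1] - mean) for i in range(n - 1))
           / (n * var)) if var else 0.0
    return {"mean": mean, "std": math.sqrt(var), "ac1": ac1}
```

Once every series is reduced to such a fixed-length feature vector, standard classifiers and clustering methods apply directly, and each feature retains an interpretable meaning.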
Ensemble Committees for Stock Return Classification and Prediction
This paper considers a portfolio trading strategy formulated by algorithms in
the field of machine learning. The profitability of the strategy is measured by
the algorithm's capability to consistently and accurately identify stock
indices with positive or negative returns, and to generate a preferred
portfolio allocation on the basis of a learned model. Stocks are characterized
by time series data sets consisting of technical variables that reflect market
conditions in a previous time interval, which are utilized to produce binary
classification decisions in subsequent intervals. The learned model is
constructed as a committee of random forest classifiers, a non-linear support
vector machine classifier, a relevance vector machine classifier, and a
constituent ensemble of k-nearest neighbors classifiers. The Global Industry
Classification Standard (GICS) is used to explore the ensemble model's efficacy
within the context of various fields of investment including Energy, Materials,
Financials, and Information Technology. Data from 2006 to 2012, inclusive, are
considered, which are chosen for providing a range of market circumstances for
evaluating the model. The model is observed to achieve an accuracy of
approximately 70% when predicting stock price returns three months in advance.
Comment: 15 pages, 4 figures, Neukom Institute Computational Undergraduate
Research prize - second place
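The committee's final decision can be sketched as a simple majority vote over its member classifiers; the actual model may weight members differently, so this is only an illustrative sketch:

```python
from collections import Counter

def committee_predict(classifiers, x):
    """Combine binary up/down return predictions (+1 / -1) from a
    committee of heterogeneous classifiers by unweighted majority vote."""
    votes = Counter(clf(x) for clf in classifiers)
    return votes.most_common(1)[0][0]
```

Here each member can be any callable, e.g. a random forest, an SVM, an RVM, or a k-NN ensemble wrapped behind a common prediction interface.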
Neural activity classification with machine learning models trained on interspike interval series data
The flow of information through the brain is reflected by the activity
patterns of neural cells. Indeed, these firing patterns are widely used as
input data to predictive models that relate stimuli and animal behavior to the
activity of a population of neurons. However, relatively little attention has been
paid to single neuron spike trains as predictors of cell or network properties
in the brain. In this work, we introduce an approach to neuronal spike train
data mining which enables effective classification and clustering of neuron
types and network activity states based on single-cell spiking patterns. This
approach is centered around applying state-of-the-art time series
classification/clustering methods to sequences of interspike intervals recorded
from single neurons. We demonstrate good performance of these methods in tasks
involving classification of neuron type (e.g. excitatory vs. inhibitory cells)
and/or neural circuit activity state (e.g. awake vs. REM sleep vs. non-REM sleep
states) on an open-access cortical spiking activity dataset.
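The preprocessing step implied above, turning spike timestamps into the interspike-interval series that feeds the time-series classifiers, can be sketched as:

```python
def interspike_intervals(spike_times):
    """Convert a time-sorted sequence of spike timestamps into the
    interspike-interval (ISI) series used as classifier input."""
    return [t2 - t1 for t1, t2 in zip(spike_times, spike_times[1:])]
```

Each neuron's recording thus becomes an ordinary one-dimensional series, to which off-the-shelf time-series classification and clustering methods apply.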