12 research outputs found
Attention mechanisms in the CHREST cognitive architecture
In this paper, we describe the attention mechanisms in CHREST, a computational architecture of human visual expertise. CHREST organises information acquired by direct experience from the world in the form of chunks. These chunks are searched for, and verified, by a unique set of heuristics, comprising the attention mechanism. We explain how the attention mechanism combines bottom-up and top-down heuristics from internal and external sources of information. We describe some experimental evidence demonstrating the correspondence of CHREST's perceptual mechanisms with those of human subjects. Finally, we discuss how visual attention can play an important role in actions carried out by human experts in domains such as chess.
A New Approach Based on Quantum Clustering and Wavelet Transform for Breast Cancer Classification: Comparative Study
Feature selection involves identifying a subset of the most useful features that produce the same results as the original set of features. In this paper, we present a new approach for improving classification accuracy. This approach is based on quantum clustering for feature subset selection and wavelet transform for feature extraction. The feature selection is performed in three steps. First, the mammographic image undergoes a wavelet transform and a set of features is extracted. In the second step, the original feature space is partitioned into clusters in order to group similar features. This operation is performed using the Quantum Clustering algorithm. The third step deals with the selection of a representative feature for each cluster. This selection is based on similarity measures such as the correlation coefficient (CC) and the mutual information (MI). The feature which maximizes this information (CC or MI) is chosen by the algorithm. This approach is applied to breast cancer classification. The K-nearest neighbors (KNN) classifier is used to achieve the classification. We present classification accuracy versus feature type, wavelet transform and the number of neighbors K in the KNN classifier. An accuracy of 100% was reached in some cases.
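The cluster-then-pick-a-representative step can be sketched without the quantum clustering machinery. The sketch below substitutes a simple greedy correlation grouping for the Quantum Clustering algorithm, then keeps from each group the feature with the highest absolute correlation coefficient (CC) with the class label. The function name, threshold, and synthetic data are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def select_representatives(X, y, corr_threshold=0.8):
    """Greedily group highly correlated features, then keep from each group
    the feature whose absolute correlation coefficient (CC) with the label
    is highest -- one representative per cluster."""
    n_features = X.shape[1]
    # Absolute pairwise correlation between features.
    feat_corr = np.abs(np.corrcoef(X, rowvar=False))
    clusters, assigned = [], set()
    for i in range(n_features):
        if i in assigned:
            continue
        group = [i] + [j for j in range(i + 1, n_features)
                       if j not in assigned and feat_corr[i, j] >= corr_threshold]
        assigned.update(group)
        clusters.append(group)
    # Correlation of each feature with the class label.
    label_corr = np.abs([np.corrcoef(X[:, j], y)[0, 1] for j in range(n_features)])
    return [max(group, key=lambda j: label_corr[j]) for group in clusters]

rng = np.random.default_rng(0)
base = rng.normal(size=(200, 2))
# Features 0 and 1 are near-duplicates; feature 2 is independent of them.
X = np.column_stack([base[:, 0],
                     base[:, 0] + 0.05 * rng.normal(size=200),
                     base[:, 1]])
y = (base[:, 0] > 0).astype(float)
print(select_representatives(X, y))  # two clusters -> two representatives
```

Swapping in a proper clustering algorithm (quantum clustering in the paper) changes only the grouping step; the representative-selection rule stays the same.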
Toward optimal feature selection using ranking methods and classification algorithms
We present a comparison between several feature ranking methods used on two real datasets. We consider six ranking methods that can be divided into two broad categories: statistical and entropy-based. Four supervised learning algorithms are adopted to build models, namely, IB1, Naive Bayes, the C4.5 decision tree and the RBF network. We show that the selection of ranking method can be important for classification accuracy. In our experiments, ranking methods combined with different supervised learning algorithms give quite different results for balanced accuracy. Our cases confirm that, in order to be sure that the subset of features giving the highest accuracy has been selected, the use of many different indices is recommended.
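Two of the ranking families compared above can be sketched in a few lines: an entropy-based score (information gain) and a statistical one (absolute Pearson correlation). The synthetic "informative" and "noise" features below are illustrative assumptions, not the paper's datasets.

```python
import numpy as np

def entropy(labels):
    """Shannon entropy of a discrete label vector, in bits."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(feature, labels):
    """Entropy-based ranking score: H(class) - H(class | feature)."""
    total = entropy(labels)
    cond = 0.0
    for v in np.unique(feature):
        mask = feature == v
        cond += mask.mean() * entropy(labels[mask])
    return total - cond

def correlation_score(feature, labels):
    """Statistical ranking score: absolute Pearson correlation."""
    return abs(np.corrcoef(feature, labels)[0, 1])

rng = np.random.default_rng(1)
labels = rng.integers(0, 2, size=500)
informative = labels ^ (rng.random(500) < 0.1)  # mostly copies the class
noise = rng.integers(0, 2, size=500)            # unrelated to the class
for name, f in [("informative", informative), ("noise", noise)]:
    print(name,
          round(information_gain(f, labels), 3),
          round(correlation_score(f, labels), 3))
```

Both scores rank the informative feature far above the noise feature here; the abstract's point is that on real data the two families can disagree, which is why using several indices is recommended.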
Elastic Step DQN: A novel multi-step algorithm to alleviate overestimation in Deep Q-Networks
The Deep Q-Networks (DQN) algorithm was the first reinforcement learning algorithm using a deep neural network to successfully surpass human-level performance in a number of Atari learning environments. However, divergent and unstable behaviour has been a long-standing issue in DQNs. The unstable behaviour is often characterised by overestimation of the Q-values, commonly referred to as the overestimation bias. To address the overestimation bias and the divergent behaviour, a number of heuristic extensions have been proposed. Notably, multi-step updates have been shown to drastically reduce unstable behaviour while improving the agent's training performance. However, agents are often highly sensitive to the selection of the multi-step update horizon (n), and our empirical experiments show that a poorly chosen static value for n can in many cases lead to worse performance than single-step DQN. Inspired by the success of n-step DQN and the effects that multi-step updates have on the overestimation bias, this paper proposes a new algorithm that we call `Elastic Step DQN' (ES-DQN). It dynamically varies the step-size horizon in multi-step updates based on the similarity of states visited. Our empirical evaluation shows that ES-DQN out-performs n-step DQN with fixed updates, Double DQN and Average DQN in several OpenAI Gym environments while at the same time alleviating the overestimation bias.
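For context, the fixed n-step target that ES-DQN builds on (and that a static n commits to) can be written in a few lines. The function below is a minimal sketch of the standard n-step bootstrapped target, not the authors' implementation; the reward and Q-value inputs are illustrative.

```python
import numpy as np

def n_step_target(rewards, q_next, gamma=0.99):
    """n-step DQN target: sum_{k=0}^{n-1} gamma^k * r_k + gamma^n * max_a Q(s_{t+n}, a).
    `rewards` holds the n rewards along the trajectory segment and `q_next`
    the Q-values of the state reached after the n-th step."""
    n = len(rewards)
    discounted = sum(gamma ** k * r for k, r in enumerate(rewards))
    return discounted + gamma ** n * np.max(q_next)

# One-step target reduces to the familiar r + gamma * max_a Q(s', a).
print(n_step_target([1.0], np.array([0.5, 2.0])))            # 1 + 0.99 * 2 = 2.98
print(n_step_target([1.0, 0.0, 1.0], np.array([0.5, 2.0])))  # 3-step horizon
```

A larger n spreads real rewards across more terms and shrinks the bootstrapped max-Q term by gamma^n, which is why multi-step updates dampen overestimation; ES-DQN's contribution is choosing n per update from state similarity rather than fixing it.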
Data Patterns Discovery Using Unsupervised Learning
Self-care activities classification poses significant challenges in identifying children's unique functional abilities and needs within the exceptional children healthcare system. The accuracy of diagnosing a child's self-care problem, such as toileting or dressing, is highly influenced by an occupational therapist's experience and time constraints. Thus, there is a need for objective means to detect and predict in advance the self-care problems of children with physical and motor disabilities. We use clustering to discover interesting information from self-care problems, perform automatic classification of binary data, and discover outliers. The advantages are twofold: the advancement of knowledge on identifying self-care problems in children and comprehensive experimental results on clustering binary healthcare data. By using various distances and linkage methods, resampling techniques for imbalanced data, and feature selection preprocessing in a clustering framework, we find associations among patients and an Adjusted Rand Index (ARI) of 76.26%.
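The ARI reported above is a chance-corrected agreement score between two partitions, computable from their contingency table (libraries such as scikit-learn provide it as `adjusted_rand_score`; the from-scratch version below is a minimal sketch with illustrative label vectors).

```python
import numpy as np

def adjusted_rand_index(a, b):
    """Adjusted Rand Index between two label assignments:
    (Index - ExpectedIndex) / (MaxIndex - ExpectedIndex)."""
    n = len(a)
    # Contingency table between the two partitions.
    ua, ub = np.unique(a), np.unique(b)
    table = np.array([[np.sum((a == x) & (b == y)) for y in ub] for x in ua])
    comb2 = lambda x: x * (x - 1) / 2          # "n choose 2", elementwise
    sum_ij = comb2(table).sum()
    sum_a = comb2(table.sum(axis=1)).sum()
    sum_b = comb2(table.sum(axis=0)).sum()
    expected = sum_a * sum_b / comb2(n)
    max_index = (sum_a + sum_b) / 2
    return (sum_ij - expected) / (max_index - expected)

truth = np.array([0, 0, 0, 1, 1, 1])
pred  = np.array([0, 0, 1, 1, 1, 1])
print(round(adjusted_rand_index(truth, pred), 4))  # one misplaced point -> well below 1
```

An ARI of 1 means the clustering matches the reference partition exactly, 0 means chance-level agreement, so the 76.26% figure indicates substantially better-than-chance recovery of the patient groups.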
Empirical Analysis of the Top 800 Cryptocurrencies using Machine Learning Techniques
The International Token Classification (ITC) Framework by the Blockchain Center in Frankfurt classifies 795 cryptocurrency tokens based on their economic, technological, legal and industry categorization. This work analyzes cryptocurrency data to evaluate the categorization with real-world market data. The feature space includes price, volume and market capitalization data. Additional metrics such as the moving average and the relative strength index are added to get a more in-depth understanding of market movements. The data set is used to build supervised and unsupervised machine learning models. The prediction accuracies varied among labels and all remained below 90%. The technological label had the highest prediction accuracy at 88.9% using Random Forests. The economic label could be predicted with an accuracy of 81.7% using K-Nearest Neighbors. The classification using machine learning techniques is not yet accurate enough to automate the classification process, but it can be improved by adding additional features. The unsupervised clustering shows that there are more layers to the data that can be added to the ITC. The additional categories are built upon a combination of token mining, maximal supply, volume and market capitalization data. As a result, we suggest that a data-driven extension of the categorization into a token profile would allow investors and regulators to gain a deeper understanding of token performance, maturity and usage.
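The two market indicators named above have standard definitions; the sketch below assumes the simple-average RSI variant (some implementations use Wilder's smoothing instead) and an illustrative price series, not the study's data.

```python
import numpy as np

def moving_average(prices, window):
    """Simple moving average over a sliding window."""
    return np.convolve(prices, np.ones(window) / window, mode="valid")

def rsi(prices, period=14):
    """Relative Strength Index (simple-average variant):
    RSI = 100 - 100 / (1 + avg_gain / avg_loss) over the last `period` moves."""
    deltas = np.diff(prices)
    gains = np.clip(deltas, 0, None)[-period:]
    losses = np.clip(-deltas, 0, None)[-period:]
    avg_gain, avg_loss = gains.mean(), losses.mean()
    if avg_loss == 0:
        return 100.0  # no down moves at all -> maximally overbought
    return 100.0 - 100.0 / (1.0 + avg_gain / avg_loss)

prices = np.array([10, 10.5, 10.2, 10.8, 11.0, 10.9, 11.4, 11.2,
                   11.6, 11.9, 11.7, 12.1, 12.4, 12.2, 12.6])
print(moving_average(prices, 3)[:3])
print(round(rsi(prices), 2))  # > 50 for this mostly rising series
```

Appended as columns alongside price, volume, and market capitalization, such indicators give the classifiers the trend and momentum context the raw series lacks.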
Feature Selection as a Preprocessing Step for Hierarchical Clustering
Although feature selection is a central problem in inductive learning, as suggested by the growing amount of research in this area, most of the work has been carried out under the supervised learning paradigm, paying little attention to unsupervised learning tasks and, particularly, clustering tasks. In this paper, we analyze the particular benefits that feature selection may provide in hierarchical clustering tasks and explore the power of feature selection methods applied as a preprocessing step under the proposed dimensions. Instead of only predicting class labels, the focus is on more general inference tasks over all the features. Empirical results suggest that feature selection as preprocessing only provides limited improvements in the performance task. In addition, they raise the problem of the notion of irrelevance in unsupervised settings.
High-dimensional Sparse Count Data Clustering Using Finite Mixture Models
Due to the massive amount of available digital data, automating its analysis and modeling for
different purposes and applications has become an urgent need. One of the most challenging tasks
in machine learning is clustering, which is defined as the process of assigning observations sharing
similar characteristics to subgroups. Such a task is significant, especially in implementing complex
algorithms to deal with high-dimensional data. Thus, the advancement of computational power in
statistical-based approaches is increasingly becoming an interesting and attractive research domain.
Among the successful methods, mixture models have been widely acknowledged and successfully
applied in numerous fields as they have been providing a convenient yet flexible formal setting for
unsupervised and semi-supervised learning. An essential problem with these approaches is to develop
a probabilistic model that represents the data well by taking into account its nature. Count
data are widely used in machine learning and computer vision applications where an object, e.g.,
a text document or an image, can be represented by a vector corresponding to the appearance frequencies
of words or visual words, respectively. Thus, they usually suffer from the well-known
curse of dimensionality as objects are represented with high-dimensional and sparse vectors, i.e., a
few thousand dimensions with a sparsity of 95 to 99%, which decline the performance of clustering
algorithms dramatically. Moreover, count data systematically exhibit the burstiness and overdispersion
phenomena, which both cannot be handled with a generic multinomial distribution, typically
used to model count data, due to its dependency assumption.
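The burstiness and overdispersion point can be made concrete with a simulation: a hierarchical Dirichlet-multinomial (each document draws its own word distribution before drawing counts) produces much larger per-word count variance than a plain multinomial with the same mean proportions. The concentration value and vocabulary below are illustrative assumptions, not taken from the thesis.

```python
import numpy as np

rng = np.random.default_rng(2)
n_docs, n_words, vocab = 5000, 100, 4
base_p = np.array([0.4, 0.3, 0.2, 0.1])

# Multinomial: every document is drawn from the same word distribution.
multi = rng.multinomial(n_words, base_p, size=n_docs)

# Dirichlet-multinomial: each document first draws its own distribution
# (small concentration -> bursty word reuse), then draws its counts from it.
alpha = 2.0 * base_p  # concentration parameter; small values exaggerate burstiness
thetas = rng.dirichlet(alpha, size=n_docs)
dirmult = np.array([rng.multinomial(n_words, t) for t in thetas])

# The hierarchical model exhibits far larger per-word count variance
# (overdispersion) than the plain multinomial allows for the same means.
print(multi.var(axis=0).round(1))
print(dirmult.var(axis=0).round(1))
```

The multinomial's per-word variance is capped at n*p*(1-p), so it cannot represent documents that heavily reuse a few words; conjugate priors such as the Dirichlet (or the scaled Dirichlet in the MSD model proposed below) lift that cap while keeping inference tractable.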
This thesis is constructed around six related manuscripts, in which we propose several approaches for high-dimensional sparse count data clustering via various mixture models based on hierarchical Bayesian modeling frameworks that are able to model the dependency of repetitive word occurrences. In such frameworks, a suitable distribution is used to introduce prior information into the construction of the statistical model, based on a distribution conjugate to the multinomial, e.g., the Dirichlet, the generalized Dirichlet, or the Beta-Liouville, which has numerous computational advantages. Thus, we proposed a novel model that we call the Multinomial Scaled Dirichlet (MSD), based on using the scaled Dirichlet as a prior to the multinomial to allow more modeling flexibility. Although these frameworks can model burstiness and overdispersion well, they share similar disadvantages that make their estimation procedures very inefficient when the collection size is large. To handle high dimensionality, we considered two approaches. First, we derived close approximations to the distributions in a hierarchical structure to bring them to the exponential-family form, aiming to combine the flexibility and efficiency of these models with the desirable statistical and computational properties of the exponential family of distributions, including sufficiency, which reduces the complexity and computational effort, especially for sparse and high-dimensional data. Second, we proposed a model-based unsupervised feature selection approach for count data to overcome several issues that may be caused by the high dimensionality of the feature space, such as over-fitting, low efficiency, and poor performance.
Furthermore, we handled two significant aspects of mixture-based clustering methods, namely, parameter estimation and model selection. We considered the Expectation-Maximization (EM) algorithm, a broadly applicable iterative algorithm for estimating the mixture model parameters, incorporating several techniques to avoid its initialization dependency and poor local maxima. For model selection, we investigated different approaches to finding the optimal number of components based on the Minimum Message Length (MML) philosophy. The effectiveness of our approaches is evaluated using challenging real-life applications, such as sentiment analysis, hate speech detection on Twitter, topic novelty detection, human interaction recognition in films and TV shows, facial expression recognition, face identification, and age estimation.
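The EM loop described above can be illustrated on the simplest relative of these models, a plain multinomial mixture over count vectors. This is a minimal sketch on synthetic two-topic data, assuming names and parameters of our own choosing; it is not the thesis's hierarchical models and it omits the MML model-selection step.

```python
import numpy as np

def em_multinomial_mixture(X, k, n_iter=50, seed=0):
    """EM for a k-component multinomial mixture over count vectors X
    (n_docs x vocab): alternate soft assignments (E-step) and weighted
    maximum-likelihood updates of weights and word probabilities (M-step)."""
    rng = np.random.default_rng(seed)
    n, v = X.shape
    weights = np.full(k, 1.0 / k)
    probs = rng.dirichlet(np.ones(v), size=k)          # k x v word distributions
    for _ in range(n_iter):
        # E-step: responsibilities from each component's log-likelihood.
        log_r = np.log(weights) + X @ np.log(probs).T  # n x k
        log_r -= log_r.max(axis=1, keepdims=True)      # stabilise the exp
        resp = np.exp(log_r)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: re-estimate mixing weights and smoothed word probabilities.
        weights = resp.mean(axis=0)
        probs = (resp.T @ X) + 1e-6
        probs /= probs.sum(axis=1, keepdims=True)
    return weights, probs, resp.argmax(axis=1)

rng = np.random.default_rng(3)
# Two synthetic "topics" with clearly different word distributions.
X = np.vstack([rng.multinomial(60, [0.7, 0.2, 0.05, 0.05], size=100),
               rng.multinomial(60, [0.05, 0.05, 0.2, 0.7], size=100)])
weights, probs, labels = em_multinomial_mixture(X, k=2)
print(weights.round(2))  # roughly balanced components on this balanced data
```

The initialization dependency mentioned above is visible here too: a different `seed` can land in a poorer local maximum, which is what the restart and initialization techniques in the thesis are designed to mitigate.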