GBG++: A Fast and Stable Granular Ball Generation Method for Classification
Granular ball computing (GBC), as an efficient, robust, and scalable learning
method, has become a popular research topic of granular computing. GBC includes
two stages: granular ball generation (GBG) and multi-granularity learning based
on the granular ball (GB). However, the stability and efficiency of existing
GBG methods need to be further improved due to their strong dependence on
k-means or k-division. In addition, GB-based classifiers only unilaterally
consider the GB's geometric characteristics to construct classification rules,
but the GB's quality is ignored. Therefore, in this paper, based on the
attention mechanism, a fast and stable GBG (GBG++) method is proposed first.
Specifically, the proposed GBG++ method only needs to calculate the distances
from the data-driven center to the undivided samples when splitting each GB
instead of randomly selecting the center and calculating the distances between
it and all samples. Moreover, an outlier detection method is introduced to
identify local outliers. Consequently, the GBG++ method can significantly
improve effectiveness, robustness, and efficiency while being absolutely
stable. Second, considering the influence of the sample size within the GB on
the GB's quality, based on the GBG++ method, an improved GB-based k-nearest
neighbors algorithm (GBkNN++) is presented, which can reduce
misclassification at the class boundary. Finally, the experimental results
indicate that the proposed method outperforms several existing GB-based
classifiers and classical machine learning classifiers on public benchmark
datasets.
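The splitting step summarized in this abstract can be sketched in a few lines. The sketch below is illustrative only (function names, the majority-class centre, and the mean/max-radius heuristic are assumptions, not the authors' implementation): the centre of each granular ball is chosen deterministically from the data rather than at random, and distances are computed in a single pass from that centre to the undivided samples.

```python
import numpy as np

def split_ball(X, y):
    """Split a granular ball (X, y) around a data-driven centre.

    Illustrative sketch: instead of a randomly selected centre (as in
    k-means), the centre is taken deterministically as the mean of the
    majority-class samples, and only distances from that centre to the
    undivided samples are computed.
    """
    labels, counts = np.unique(y, return_counts=True)
    majority = labels[np.argmax(counts)]
    center = X[y == majority].mean(axis=0)       # data-driven centre
    dist = np.linalg.norm(X - center, axis=1)    # one pass over samples
    radius = dist[y == majority].max()           # radius heuristic (assumed)
    inside = dist <= radius                      # samples kept in this ball
    return center, radius, inside

X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [3.0, 3.0]])
y = np.array([0, 0, 0, 1])
center, radius, inside = split_ball(X, y)
```

Because the centre is a deterministic function of the data, repeated runs produce identical balls, which is the stability property the abstract emphasizes.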
Automatic generation of fuzzy classification rules using granulation-based adaptive clustering
A central problem of fuzzy modelling is the generation of fuzzy rules that fit the data to the highest possible extent. In this study, we present a method for the automatic generation of fuzzy rules from data. The main advantage of the proposed method is its ability to perform data clustering without the need to predefine any parameters, including the number of clusters. The method creates data clusters at different levels of granulation and selects the best clustering results based on a set of measures; it involves merging clusters into new clusters that have a coarser granulation. To evaluate its performance, three different datasets are used to compare the proposed method to other classifiers: an SVM classifier, an FCM fuzzy classifier, and a subtractive clustering fuzzy classifier. Results show that the proposed method achieves better classification results than the other classifiers for all the datasets used.
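The level-selection idea described above, producing clusterings at several granulation levels and keeping the best one by some measure, can be sketched as follows. This is a minimal sketch under stated assumptions: the granulation levels are generated by agglomerative merging, and the selection measure is the silhouette score; the paper's actual merging rule and measures may differ.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering
from sklearn.metrics import silhouette_score

def best_granulation(X, max_clusters=6):
    """Cluster X at several granulation levels and keep the best level.

    Levels run from fine (many clusters) to coarse (few clusters);
    the silhouette score stands in for the paper's selection measures.
    """
    best = None
    for k in range(2, max_clusters + 1):
        labels = AgglomerativeClustering(n_clusters=k).fit_predict(X)
        score = silhouette_score(X, labels)
        if best is None or score > best[0]:
            best = (score, k, labels)
    return best  # (score, n_clusters, labels)

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.2, (20, 2)), rng.normal(3, 0.2, (20, 2))])
score, k, labels = best_granulation(X)
```

On two well-separated blobs, the coarse two-cluster level wins, which is the behaviour the method relies on: no cluster count is fixed in advance.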
Granular computing approach for the design of medical data classification systems
Granular computing is a computation theory that imitates human thinking and reasoning by dealing with information at different levels of abstraction/precision. Adopting a granular computing approach in the design of data classification systems improves their performance in dealing with data uncertainty and facilitates handling large volumes of data. In this paper, a new approach for the design of medical data classification systems is proposed. The approach makes use of data granulation in training the classifier: training data is granulated at different levels, and data from each level is used for constructing the classification system. To evaluate performance, a classification system based on a neural network is implemented. Four medical datasets are used to compare the proposed approach to other classifiers: a neural network classifier, an ANFIS classifier, and an SVM classifier. Results show that the proposed approach improves the classification performance of the neural network classifier and produces better accuracy and area under the curve than the other classifiers for most of the datasets used.
X-ray Astronomical Point Sources Recognition Using Granular Binary-tree SVM
The study on point sources in astronomical images is of special importance,
since most energetic celestial objects in the Universe exhibit a point-like
appearance. An approach to recognize the point sources (PS) in the X-ray
astronomical images using our newly designed granular binary-tree support
vector machine (GBT-SVM) classifier is proposed. First, all potential point
sources are located by peak detection on the image. The image and spectral
features of these potential point sources are then extracted. Finally, a
classifier to recognize the true point sources is built using the extracted
features. Experiments and applications of our approach on real X-ray
astronomical images are demonstrated. Comparisons between our approach and
other SVM-based classifiers are also carried out by evaluating the precision
and recall rates, which show that our approach performs better, achieving a
higher accuracy of around 89%. Comment: Accepted by ICSP201
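The first stage of the pipeline in this abstract, locating candidate point sources by peak detection on the image, can be sketched with a simple local-maximum filter. This is an illustrative stand-in only; the paper's detector, its feature extraction, and the GBT-SVM stage are not shown, and the threshold is an assumed parameter.

```python
import numpy as np
from scipy.ndimage import maximum_filter

def detect_peaks(image, threshold):
    """Locate candidate point sources as local maxima above a threshold.

    A pixel is a candidate if it equals the maximum of its 3x3
    neighbourhood and exceeds the intensity threshold.
    """
    local_max = maximum_filter(image, size=3) == image
    return np.argwhere(local_max & (image > threshold))

img = np.zeros((16, 16))
img[4, 5] = 10.0     # a bright, point-like source
img[12, 9] = 7.0     # a fainter one
peaks = detect_peaks(img, threshold=5.0)
```

Each detected peak would then feed the image/spectral feature extraction and the classifier described above.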
Face Alignment Using Boosting and Evolutionary Search
In this paper, we present a face alignment approach using granular features, boosting, and an evolutionary search algorithm. Active Appearance Models (AAM) integrate a shape-texture-combined morphable face model into an efficient fitting strategy, while Boosting Appearance Models (BAM) treat the face alignment problem as a process of maximizing the response from a boosting classifier. Inspired by AAM and BAM, we present a framework which implements improved boosting classifiers based on more discriminative features and exhaustive search strategies. Specifically, we utilize granular features to replace the conventional rectangular Haar-like features, gaining discriminability, computational efficiency, and a larger search space. At the same time, we adopt an evolutionary search process to address the difficulty of searching in the large feature space. Finally, we test our approach on a series of challenging datasets to show its accuracy and efficiency on versatile face images.
Probabilistic identification of cerebellar cortical neurones across species.
Despite our fine-grain anatomical knowledge of the cerebellar cortex, electrophysiological studies of circuit information processing over the last fifty years have been hampered by the difficulty of reliably assigning signals to identified cell types. We approached this problem by assessing the spontaneous activity signatures of identified cerebellar cortical neurones. A range of statistics describing firing frequency and irregularity were then used, individually and in combination, to build Gaussian Process Classifiers (GPC) leading to a probabilistic classification of each neurone type and the computation of equi-probable decision boundaries between cell classes. Firing frequency statistics were useful for separating Purkinje cells from granular layer units, whilst firing irregularity measures proved most useful for distinguishing cells within granular layer cell classes. Considered as single statistics, we achieved classification accuracies of 72.5% and 92.7% for granular layer and molecular layer units respectively. Combining statistics to form twin-variate GPC models substantially improved classification accuracies with the combination of mean spike frequency and log-interval entropy offering classification accuracies of 92.7% and 99.2% for our molecular and granular layer models, respectively. A cross-species comparison was performed, using data drawn from anaesthetised mice and decerebrate cats, where our models offered 80% and 100% classification accuracy. We then used our models to assess non-identified data from awake monkeys and rabbits in order to highlight subsets of neurones with the greatest degree of similarity to identified cell classes. In this way, our GPC-based approach for tentatively identifying neurones from their spontaneous activity signatures, in the absence of an established ground-truth, nonetheless affords the experimenter a statistically robust means of grouping cells with properties matching known cell classes. 
Our approach may therefore have broad application to a variety of future cerebellar cortical investigations, particularly in awake animals, where opportunities for definitive cell identification are limited.
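The twin-variate GPC models described above can be sketched with an off-the-shelf Gaussian process classifier on two firing statistics. Everything below is illustrative: the feature values are invented for the sketch (not the paper's data), and the two-class setup, the kernel, and the query point are assumptions.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessClassifier
from sklearn.gaussian_process.kernels import RBF

# Invented feature table: mean spike frequency (Hz) and log-interval
# entropy for two hypothetical cell classes (0 = Purkinje-like,
# 1 = granular-layer-like); values are for illustration only.
rng = np.random.default_rng(1)
class0 = np.column_stack([rng.normal(40, 5, 30), rng.normal(2.0, 0.2, 30)])
class1 = np.column_stack([rng.normal(5, 2, 30), rng.normal(3.0, 0.2, 30)])
X = np.vstack([class0, class1])
y = np.array([0] * 30 + [1] * 30)

# Twin-variate GPC: two statistics in, probabilistic class membership out.
gpc = GaussianProcessClassifier(kernel=1.0 * RBF(length_scale=1.0)).fit(X, y)
proba = gpc.predict_proba([[38.0, 2.1]])
```

The probabilistic output is the point: rather than a hard label, each unit gets a class-membership probability, from which equi-probable decision boundaries between cell classes can be drawn.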
Classifying sequences by the optimized dissimilarity space embedding approach: a case study on the solubility analysis of the E. coli proteome
We evaluate a version of the recently-proposed classification system named
Optimized Dissimilarity Space Embedding (ODSE) that operates in the input space
of sequences of generic objects. The ODSE system has been originally presented
as a classification system for patterns represented as labeled graphs. However,
since ODSE is founded on the dissimilarity space representation of the input
data, the classifier can be easily adapted to any input domain where it is
possible to define a meaningful dissimilarity measure. Here we demonstrate the
effectiveness of the ODSE classifier for sequences by considering an
application dealing with the recognition of the solubility degree of the
Escherichia coli proteome. Solubility, or analogously aggregation propensity,
is an important property of protein molecules, which is intimately related to
the mechanisms underlying the chemico-physical process of folding. Each protein
of our dataset is initially associated with a solubility degree and it is
represented as a sequence of symbols, denoting the 20 amino acid residues. The
computational results obtained here, which we stress have been achieved
with no context-dependent tuning of the ODSE system, confirm the validity and
generality of the ODSE-based approach for structured data classification. Comment: 10 pages, 49 references
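The dissimilarity space representation at the core of ODSE can be sketched as follows: each sequence is embedded as its vector of distances to a set of prototype sequences, after which any vector-space classifier applies. The sketch is a minimal illustration, not the ODSE system itself; the toy sequences and fixed prototypes are assumptions (in ODSE, prototype selection is part of what is optimized), and plain edit distance stands in for whatever dissimilarity measure is chosen.

```python
import numpy as np
from sklearn.svm import LinearSVC

def levenshtein(a, b):
    """Plain edit distance between two symbol sequences."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1,
                           prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def dissimilarity_embedding(seqs, prototypes):
    """Represent each sequence by its distances to the prototype set."""
    return np.array([[levenshtein(s, p) for p in prototypes] for s in seqs])

# Toy symbol sequences with labels (invented for the sketch).
seqs = ["AAAG", "AAAC", "GGGT", "GGGA"]
labels = [0, 0, 1, 1]
prototypes = ["AAAA", "GGGG"]   # fixed here; optimized in ODSE
X = dissimilarity_embedding(seqs, prototypes)
clf = LinearSVC().fit(X, labels)
pred = clf.predict(dissimilarity_embedding(["AAAT"], prototypes))
```

This is why the system adapts to any input domain: only the dissimilarity measure has to change, not the classifier.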
Learning to Predict with Highly Granular Temporal Data: Estimating individual behavioral profiles with smart meter data
Big spatio-temporal datasets, available through both open and administrative
data sources, offer significant potential for social science research. The
magnitude of the data allows for increased resolution and analysis at
individual level. While there are recent advances in forecasting techniques for
highly granular temporal data, little attention is given to segmenting the time
series and finding homogeneous patterns. In this paper, it is proposed to
estimate behavioral profiles of individuals' activities over time using
Gaussian Process-based models. In particular, the aim is to investigate how
individuals or groups may be clustered according to the model parameters. Such
a Bayesian non-parametric method is then tested by looking at the
predictability of the segments using a combination of models to fit different
parts of the temporal profiles. Model validity is then tested on a set of
holdout data. The dataset consists of half hourly energy consumption records
from smart meters from more than 100,000 households in the UK and covers the
period from 2015 to 2016. The methodological approach developed in the paper
may be easily applied to datasets of similar structure and granularity, for
example social media data, and may lead to improved accuracy in the prediction
of social dynamics and behavior.
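The Gaussian Process modelling step described above can be sketched on synthetic half-hourly data. The sketch is illustrative only: the consumption series is invented, and the periodic-kernel choice and the idea of treating fitted hyperparameters as a household's behavioural profile are assumptions about one plausible realization of the approach, not the paper's exact models.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import ExpSineSquared, WhiteKernel

# Synthetic half-hourly consumption for one household over two days
# (48 readings per day), with a daily cycle plus noise.
t = np.arange(96).reshape(-1, 1)
rng = np.random.default_rng(0)
load = 0.5 + 0.3 * np.sin(2 * np.pi * t.ravel() / 48) + rng.normal(0, 0.02, 96)

# A periodic kernel (period = 48 half-hours = one day) captures the
# daily pattern; the white-noise term absorbs measurement noise.
kernel = ExpSineSquared(length_scale=10.0, periodicity=48.0) + WhiteKernel(1e-4)
gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(t, load)

# The fitted kernel hyperparameters act as the household's behavioural
# profile; households could then be clustered on these parameters.
profile = gp.kernel_.get_params()
next_day = gp.predict(np.arange(96, 144).reshape(-1, 1))
```

Clustering households on such fitted parameters, rather than on raw readings, is what makes the segmentation scale to a dataset of this size.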