7,711 research outputs found
Recommended from our members
The robust selection of predictive genes via a simple classifier
Identifying genes that direct the mechanism of a disease from expression data is extremely useful in understanding how that mechanism works.
This in turn may lead to better diagnoses and potentially can lead to a cure for that disease. This task becomes extremely challenging when the
data are characterised by only a small number of samples and a high number of dimensions, as it is often the case with gene expression data.
Motivated by this challenge, we present a general framework that focuses on simplicity and data perturbation. These are the keys for the robust
identification of the most predictive features in such data. Within this framework, we propose a simple selective na¨ıve Bayes classifier discovered using a global search technique, and combine it with data perturbation to
increase its robustness to small sample sizes.
An extensive validation of the method was carried out using two applied datasets from the field of microarrays and a simulated dataset, all
confounded by small sample sizes and high dimensionality. The method has been shown capable of identifying genes previously confirmed or associated with prostate cancer and viral infections
Wavelet feature extraction and genetic algorithm for biomarker detection in colorectal cancer data
Biomarkers which predict patient’s survival can play an important role in medical diagnosis and
treatment. How to select the significant biomarkers from hundreds of protein markers is a key step in
survival analysis. In this paper a novel method is proposed to detect the prognostic biomarkers ofsurvival in colorectal cancer patients using wavelet analysis, genetic algorithm, and Bayes classifier. One dimensional discrete wavelet transform (DWT) is normally used to reduce the dimensionality of biomedical data. In this study one dimensional continuous wavelet transform (CWT) was proposed to extract the features of colorectal cancer data. One dimensional CWT has no ability to reduce
dimensionality of data, but captures the missing features of DWT, and is complementary part of DWT. Genetic algorithm was performed on extracted wavelet coefficients to select the optimized features, using Bayes classifier to build its fitness function. The corresponding protein markers were
located based on the position of optimized features. Kaplan-Meier curve and Cox regression model 2 were used to evaluate the performance of selected biomarkers. Experiments were conducted on colorectal cancer dataset and several significant biomarkers were detected. A new protein biomarker CD46 was found to significantly associate with survival time
AI Solutions for MDS: Artificial Intelligence Techniques for Misuse Detection and Localisation in Telecommunication Environments
This report considers the application of Articial Intelligence (AI) techniques to
the problem of misuse detection and misuse localisation within telecommunications
environments. A broad survey of techniques is provided, that covers inter alia
rule based systems, model-based systems, case based reasoning, pattern matching,
clustering and feature extraction, articial neural networks, genetic algorithms, arti
cial immune systems, agent based systems, data mining and a variety of hybrid
approaches. The report then considers the central issue of event correlation, that
is at the heart of many misuse detection and localisation systems. The notion of
being able to infer misuse by the correlation of individual temporally distributed
events within a multiple data stream environment is explored, and a range of techniques,
covering model based approaches, `programmed' AI and machine learning
paradigms. It is found that, in general, correlation is best achieved via rule based approaches,
but that these suffer from a number of drawbacks, such as the difculty of
developing and maintaining an appropriate knowledge base, and the lack of ability
to generalise from known misuses to new unseen misuses. Two distinct approaches
are evident. One attempts to encode knowledge of known misuses, typically within
rules, and use this to screen events. This approach cannot generally detect misuses
for which it has not been programmed, i.e. it is prone to issuing false negatives.
The other attempts to `learn' the features of event patterns that constitute normal
behaviour, and, by observing patterns that do not match expected behaviour, detect
when a misuse has occurred. This approach is prone to issuing false positives,
i.e. inferring misuse from innocent patterns of behaviour that the system was not
trained to recognise. Contemporary approaches are seen to favour hybridisation,
often combining detection or localisation mechanisms for both abnormal and normal
behaviour, the former to capture known cases of misuse, the latter to capture
unknown cases. In some systems, these mechanisms even work together to update
each other to increase detection rates and lower false positive rates. It is concluded
that hybridisation offers the most promising future direction, but that a rule or state
based component is likely to remain, being the most natural approach to the correlation
of complex events. The challenge, then, is to mitigate the weaknesses of
canonical programmed systems such that learning, generalisation and adaptation
are more readily facilitated
Self-adaptive attribute weighting for Naive Bayes classification
©2014 Elsevier Ltd. All rights reserved. Naive Bayes (NB) is a popular machine learning tool for classification, due to its simplicity, high computational efficiency, and good classification accuracy, especially for high dimensional data such as texts. In reality, the pronounced advantage of NB is often challenged by the strong conditional independence assumption between attributes, which may deteriorate the classification performance. Accordingly, numerous efforts have been made to improve NB, by using approaches such as structure extension, attribute selection, attribute weighting, instance weighting, local learning and so on. In this paper, we propose a new Artificial Immune System (AIS) based self-adaptive attribute weighting method for Naive Bayes classification. The proposed method, namely AISWNB, uses immunity theory in Artificial Immune Systems to search optimal attribute weight values, where self-adjusted weight values will alleviate the conditional independence assumption and help calculate the conditional probability in an accurate way. One noticeable advantage of AISWNB is that the unique immune system based evolutionary computation process, including initialization, clone, section, and mutation, ensures that AISWNB can adjust itself to the data without explicit specification of functional or distributional forms of the underlying model. As a result, AISWNB can obtain good attribute weight values during the learning process. Experiments and comparisons on 36 machine learning benchmark data sets and six image classification data sets demonstrate that AISWNB significantly outperforms its peers in classification accuracy, class probability estimation, and class ranking performance
SODE: Self-Adaptive One-Dependence Estimators for classification
© 2015 Elsevier Ltd. SuperParent-One-Dependence Estimators (SPODEs) represent a family of semi-naive Bayesian classifiers which relax the attribute independence assumption of Naive Bayes (NB) to allow each attribute to depend on a common single attribute (superparent). SPODEs can effectively handle data with attribute dependency but still inherent NB's key advantages such as computational efficiency and robustness for high dimensional data. In reality, determining an optimal superparent for SPODEs is difficult. One common approach is to use weighted combinations of multiple SPODEs, each having a different superparent with a properly assigned weight value (i.e., a weight value is assigned to each attribute). In this paper, we propose a self-adaptive SPODEs, namely SODE, which uses immunity theory in artificial immune systems to automatically and self-adaptively select the weight for each single SPODE. SODE does not need to know the importance of individual SPODE nor the relevance among SPODEs, and can flexibly and efficiently search optimal weight values for each SPODE during the learning process. Extensive experiments and comparisons on 56 benchmark data sets, and validations on image and text classification, demonstrate that SODE outperforms state-of-the-art weighted SPODE algorithms and is suitable for a wide range of learning tasks. Results also confirm that SODE provides an appropriate balance between runtime efficiency and accuracy
Application of artificial neural network in market segmentation: A review on recent trends
Despite the significance of Artificial Neural Network (ANN) algorithm to
market segmentation, there is a need of a comprehensive literature review and a
classification system for it towards identification of future trend of market
segmentation research. The present work is the first identifiable academic
literature review of the application of neural network based techniques to
segmentation. Our study has provided an academic database of literature between
the periods of 2000-2010 and proposed a classification scheme for the articles.
One thousands (1000) articles have been identified, and around 100 relevant
selected articles have been subsequently reviewed and classified based on the
major focus of each paper. Findings of this study indicated that the research
area of ANN based applications are receiving most research attention and self
organizing map based applications are second in position to be used in
segmentation. The commonly used models for market segmentation are data mining,
intelligent system etc. Our analysis furnishes a roadmap to guide future
research and aid knowledge accretion and establishment pertaining to the
application of ANN based techniques in market segmentation. Thus the present
work will significantly contribute to both the industry and academic research
in business and marketing as a sustainable valuable knowledge source of market
segmentation with the future trend of ANN application in segmentation.Comment: 24 pages, 7 figures,3 Table
- …