Granular Support Vector Machines Based on Granular Computing, Soft Computing and Statistical Learning
With the emergence of biomedical informatics, Web intelligence, and e-business, new challenges are arising for knowledge discovery and data mining. In this dissertation work, a framework named Granular Support Vector Machines (GSVM) is proposed to systematically and formally combine statistical learning theory, granular computing theory and soft computing theory to address challenging predictive data modeling problems effectively and/or efficiently, with a specific focus on binary classification problems. In general, GSVM works in three steps. Step 1 is granulation, building a sequence of information granules from the original dataset or from the original feature space. Step 2 is modeling Support Vector Machines (SVM) in some of these information granules where necessary. Finally, step 3 is aggregation, consolidating the information in these granules at a suitable level of abstraction. A good granulation method for finding suitable granules is crucial to modeling a good GSVM. Under this framework, several granulation algorithms, including GSVM-CMW (cumulative margin width), GSVM-AR (association rule mining), a family of GSVM-RFE (recursive feature elimination) algorithms, GSVM-DC (data cleaning) and GSVM-RU (repetitive undersampling), are designed for binary classification problems with different characteristics. Empirical studies in the biomedical domain and many other application domains demonstrate that the framework is promising. As a preliminary step, this dissertation work will be extended in the future into a Granular Computing based Predictive Data Modeling framework (GrC-PDM), with which we can create hybrid adaptive intelligent data mining systems for high-quality prediction.
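The three GSVM steps can be illustrated with a small hypothetical sketch in the spirit of GSVM-RU: information granules are built by repeatedly undersampling the majority class, one SVM is trained per granule, and the granule outputs are aggregated by voting. The synthetic data and every parameter below are illustrative assumptions, not the dissertation's actual setup:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Imbalanced binary toy problem (about 90% majority / 10% minority class).
rng = np.random.default_rng(0)
X, y = make_classification(n_samples=600, weights=[0.9, 0.1], random_state=0)

minority = np.where(y == 1)[0]
majority = np.where(y == 0)[0]

models = []
for _ in range(5):                               # step 1: build 5 information granules
    sampled = rng.choice(majority, size=len(minority), replace=False)
    idx = np.concatenate([minority, sampled])    # balanced granule
    m = SVC(kernel="rbf").fit(X[idx], y[idx])    # step 2: one SVM per granule
    models.append(m)

votes = np.mean([m.predict(X) for m in models], axis=0)
y_pred = (votes >= 0.5).astype(int)              # step 3: aggregate by majority vote
```

Each balanced granule lets its SVM see the minority class clearly, and aggregation recovers a decision over the whole dataset.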
Stable Feature Selection for Biomarker Discovery
Feature selection techniques have been used as the workhorse in biomarker
discovery applications for a long time. Surprisingly, the stability of feature
selection with respect to sampling variations has long been under-considered.
Only recently has this issue started to receive more attention.
In this article, we review existing stable feature selection methods for
biomarker discovery using a generic hierarchical framework. We have two
objectives: (1) providing an overview on this new yet fast growing topic for a
convenient reference; (2) categorizing existing methods under an expandable
framework for future research and development.
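As an illustration of what stability with respect to sampling variations can mean in practice, the sketch below (not taken from the article) scores a selector by the average pairwise Jaccard similarity of the feature subsets it picks on bootstrap resamples; the dataset, selector and subset size are all assumptions:

```python
import numpy as np
from itertools import combinations
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif

# 50 features, only 5 informative: a stable selector should keep finding them.
X, y = make_classification(n_samples=200, n_features=50, n_informative=5,
                           random_state=0)
rng = np.random.default_rng(0)

subsets = []
for _ in range(10):
    idx = rng.integers(0, len(y), size=len(y))            # bootstrap resample
    sel = SelectKBest(f_classif, k=5).fit(X[idx], y[idx])
    subsets.append(set(np.flatnonzero(sel.get_support())))

# Stability = mean Jaccard similarity over all pairs of selected subsets (0..1).
stability = np.mean([len(a & b) / len(a | b) for a, b in combinations(subsets, 2)])
```

A stability near 1 means the same biomarker candidates survive resampling; a value near 0 means the selection is an artifact of the particular sample.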
Breast cancer disease classification using fuzzy-ID3 algorithm based on association function
Breast cancer is the second leading cause of mortality among female cancer patients worldwide. Early detection is considered one of the most effective ways to prevent the disease from spreading and to enable correct decisions about subsequent treatment. Automatic diagnostic methods are frequently used in breast cancer diagnosis to increase the accuracy and speed of detection. The fuzzy-ID3 algorithm with association function implementation (FID3-AF) is proposed as a classification technique for breast cancer detection. The FID3-AF algorithm is a hybridisation of the fuzzy system, the iterative dichotomizer 3 (ID3) algorithm, and the association function. The fuzzy-neural dynamic-bottleneck-detection method (FUZZYDBD), an automatic fuzzy database definition method, aids the development of the fuzzy database for the data fuzzification process in FID3-AF. FID3-AF overcomes ID3's inability to handle continuous data, and the association function is implemented to minimise overfitting and enhance generalisation ability. The results indicate that FID3-AF is robust in breast cancer classification. A thorough comparison of FID3-AF with numerous existing methods was conducted to validate the proposed method's competency. This study established that FID3-AF performs well and outperforms other methods in breast cancer classification.
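A minimal sketch of the fuzzification step alone, assuming made-up triangular membership functions for one continuous attribute (the paper's FUZZYDBD method derives the fuzzy database automatically; the cut points and example values below are hypothetical):

```python
import numpy as np

def triangular(x, a, b, c):
    # Triangular membership: 0 at a, rises to 1 at b, falls back to 0 at c.
    return np.clip(np.minimum((x - a) / (b - a), (c - x) / (c - b)), 0.0, 1.0)

# Hypothetical continuous attribute values (e.g. tumour mean radius).
radius = np.array([8.0, 14.0, 20.0])

# Map each crisp value to low/medium/high memberships for ID3's fuzzy splits.
low    = triangular(radius,  5, 10, 15)
medium = triangular(radius, 10, 15, 20)
high   = triangular(radius, 15, 20, 25)
```

A value such as 14.0 then belongs mostly to "medium" (0.8) and slightly to "low" (0.2), which is how a fuzzy ID3 variant can split on continuous data that crisp ID3 cannot handle.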
Granular computing approach for intelligent classifier design
This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University London. Granular computing facilitates dealing with information by providing a theoretical framework that treats information as granules at different levels of granularity (different levels of specificity/abstraction). It aims to provide an abstract, explainable description of the data by forming granules that represent the features or the underlying structure of corresponding subsets of the data. In this thesis, a granular computing approach to the design of intelligent classification systems is proposed, and its efficiency is investigated across different classification systems. Fuzzy inference systems, neural networks, neuro-fuzzy systems and classifier ensembles are considered to evaluate the efficiency of the proposed approach. Each of the considered systems is designed using the proposed approach, and its classification performance is evaluated and compared to that of the standard system. The proposed approach is based on constructing information granules from data at multiple levels of granularity. The granulation process is performed using a modified fuzzy c-means algorithm that takes the classification problem into account. Clustering is followed by a coarsening process that merges small clusters into larger ones to form a lower granularity level. The resulting granules are used to build each of the considered binary classifiers in different settings and approaches.
Granules produced by the proposed granulation method are used to build a fuzzy classifier for each granulation level or set of levels. The performance of the classifiers is evaluated on real-life data sets using two classification performance measures: accuracy and area under the receiver operating characteristic curve. Experimental results show that fuzzy systems constructed using the proposed method achieve better classification performance. In addition, the proposed approach is used for the design of neural network classifiers. Granules resulting from one or more granulation levels are used to train the classifiers at different levels of specificity/abstraction. Using this approach, the classification problem is broken down into the modelling of classification rules represented by the information granules, resulting in a more interpretable system. Experimental results show that neural network classifiers trained using the proposed approach have better classification performance for most of the data sets. In a similar manner, the proposed approach is used for the training of neuro-fuzzy systems, resulting in a similar improvement in classification performance. Lastly, neural networks built using the proposed approach are used to construct a classifier ensemble. Information granules are used to generate and train the base classifiers, and the final ensemble output is produced by a weighted sum combiner. Based on the experimental results, the proposed approach improves the classification performance of the base classifiers for most of the data sets. Furthermore, a genetic algorithm is used to determine the combiner weights automatically. Higher Committee for Education Development in Iraq (HCED).
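The granulate-then-coarsen idea can be sketched as follows, with KMeans standing in for the thesis's modified fuzzy c-means and an assumed minimum granule size; both are simplifications, not the thesis's algorithm:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic data; granulate into 6 clusters at the finest level.
X, _ = make_blobs(n_samples=300, centers=6, random_state=0)
km = KMeans(n_clusters=6, n_init=10, random_state=0).fit(X)
labels, centers = km.labels_.copy(), km.cluster_centers_

# Coarsening: absorb any cluster below an assumed size threshold into the
# nearest sufficiently large cluster, producing a lower granularity level.
min_size = 30
sizes = np.bincount(labels, minlength=6)
for small in np.flatnonzero(sizes < min_size):
    large = [c for c in range(6) if sizes[c] >= min_size]
    nearest = min(large, key=lambda c: np.linalg.norm(centers[c] - centers[small]))
    labels[labels == small] = nearest
```

Repeating the merge step yields a sequence of granulation levels, each of which can train its own classifier, as the thesis describes.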
Prediction of Recurrence and Mortality of Oral Tongue Cancer Using Artificial Neural Network (a case study of 5 hospitals in Finland and 1 hospital from Sao Paulo, Brazil)
Cancer is a dreadful disease that has caused the deaths of millions of people. It is characterized by uncontrollable cell growth forming lumps or masses of tissue known as tumours. It is therefore a concern to everyone, as these tumours often release hormones that have a negative impact on the body. Data mining approaches, statistical methods and machine learning algorithms have been proposed for effective cancer data classification. Artificial Neural Networks (ANN) are used in this thesis to predict the recurrence and mortality of oral tongue cancer in patients. ANN were also used to examine the diagnostic and prognostic factors, with the aim of determining which of these factors influence the prediction of recurrence and mortality of oral tongue cancer. Three different ANN techniques were applied for the learning and testing phases in order to find the most effective one: Elman, feedforward, and layer-recurrent neural networks. The Elman neural network was not able to make acceptable predictions of the recurrence or mortality of tongue cancer based on the data. In contrast, the feedforward neural network captured the relationship between the prognostic factors and correctly predicted recurrence, but failed to predict mortality from the patient data. The layer-recurrent neural network was very effective and successfully predicted both the recurrence and the mortality of oral tongue cancer. The constructed layer-recurrent neural network was then used to investigate the correlation between the prognostic factors. It was found that, of the 11 prognostic factors in the data sheet, only 5 had a considerable impact on recurrence and mortality: grade, depth, budding, modified stage, and gender.
Time in months and disease-free months were also used to train the network.
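For illustration only, a feedforward classifier of the kind compared in the thesis can be sketched on synthetic data standing in for the 11 prognostic factors; the real patient cohort and the Elman and layer-recurrent variants are not reproduced here:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in: 11 features mimicking the prognostic factors
# (grade, depth, budding, modified stage, gender, ...), binary recurrence label.
X, y = make_classification(n_samples=400, n_features=11, n_informative=5,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

# A small feedforward network trained to predict recurrence.
net = MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000,
                    random_state=0).fit(X_tr, y_tr)
recurrence_acc = net.score(X_te, y_te)
```

Inspecting how accuracy degrades when individual input features are dropped is one simple way to probe which prognostic factors drive the prediction, in the spirit of the thesis's factor analysis.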
Risk prediction analysis for post-surgical complications in cardiothoracic surgery
Cardiothoracic surgery patients are at risk of developing surgical site infections
(SSIs), which cause hospital readmissions, increase healthcare costs and may lead to
mortality. The first 30 days after hospital discharge are crucial for preventing these
kinds of infections. As an alternative to a hospital-based diagnosis, an automatic digital
monitoring system can help with the early detection of SSIs by analyzing daily images
of patients' wounds. However, analyzing a wound automatically is one of the biggest
challenges in medical image analysis.
The proposed system is integrated into a research project called CardioFollow.AI,
which developed a digital telemonitoring service to follow up the recovery of cardiothoracic
surgery patients. This work aims to tackle the problem of SSIs by predicting
the existence of worrying alterations in wound images taken by patients, with the help of
machine learning and deep learning algorithms. The developed system is divided into a
segmentation model which detects the wound region area and categorizes the wound type,
and a classification model which predicts the occurrence of alterations in the wounds.
The dataset consists of 1337 images with chest wounds (WC), drainage wounds (WD)
and leg wounds (WL) from 34 cardiothoracic surgery patients. For segmenting the images,
an architecture with a MobileNet encoder and a U-Net decoder was used to obtain
the regions of interest (ROI) and attribute the wound class. The classification model was
divided into three sub-classifiers, one for each wound type, in order to improve the model's
performance. Color and textural features were extracted from the wounds' ROIs to feed
one of three machine learning classifiers (random forest, support vector machine and
k-nearest neighbors), which predicts the final output.
The segmentation model achieved a final mean IoU of 89.9%, a dice coefficient of
94.6% and a mean average precision of 90.1%, showing good results. As for the algorithms
that performed classification, the WL classifier exhibited the best results, with
87.6% recall and 52.6% precision, while the WC classifier achieved 71.4% recall and 36.0%
precision. The WD classifier had the worst performance, with 68.4% recall and 33.2% precision.
The obtained results demonstrate the feasibility of this solution, which can be a start for
preventing SSIs through image analysis with artificial intelligence.
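The reported segmentation metrics, mean IoU and the Dice coefficient, can be sketched on toy binary masks (the masks below are illustrative, not the project's wound data):

```python
import numpy as np

# Two overlapping 4x4 squares on an 8x8 grid stand in for predicted and
# ground-truth wound masks.
pred = np.zeros((8, 8), dtype=bool); pred[2:6, 2:6] = True   # 16 predicted pixels
true = np.zeros((8, 8), dtype=bool); true[3:7, 3:7] = True   # 16 ground-truth pixels

inter = np.logical_and(pred, true).sum()      # 9 overlapping pixels
union = np.logical_or(pred, true).sum()       # 23 pixels in either mask

iou  = inter / union                          # intersection over union: 9/23
dice = 2 * inter / (pred.sum() + true.sum())  # Dice coefficient: 18/32
```

Dice weights the overlap twice, so it is always at least as large as IoU for the same pair of masks, which is why the paper's 94.6% Dice sits above its 89.9% mean IoU.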
Protein Tertiary Model Assessment Using Granular Machine Learning Techniques
The automatic prediction of a protein's three-dimensional structure from its amino acid sequence has become one of the most important and heavily researched fields in bioinformatics. Because predicted models are not experimental structures determined with known accuracy, it is vital to estimate model quality. We attempt to solve this problem using machine learning techniques and information from both the sequence and the structure of the protein. The goal is to train a machine on structures from the PDB (Protein Data Bank) so that, given a new model, it predicts whether the model belongs to the same class as the PDB structures (correct or incorrect protein models). Different subsets of the PDB are considered for evaluating the prediction potential of the machine learning methods. Here we show two such machines, one using SVM (support vector machines) and another using fuzzy decision trees (FDT). With a preliminary encoding style, the SVM reached around 70% accuracy in protein model quality assessment, and an improved fuzzy decision tree (IFDT) reached above 80% accuracy. To reduce computational overhead, a multiprocessor environment and a basic feature selection method are used in the SVM-based machine learning algorithm.
Next, an enhanced scheme is introduced using a new encoding style. In the new style, information such as the amino acid substitution matrix, polarity, secondary structure and relative distances between alpha carbon atoms is collected through spatial traversal of the 3D structure to form training vectors. This guarantees that the properties of alpha carbon atoms that are close together in 3D space, and thus interacting, are used in vector formation. With the use of fuzzy decision trees, we obtained a training accuracy of around 90%. This is a significant improvement over the previous encoding technique in both prediction accuracy and execution time, and it motivates continued exploration of effective machine learning algorithms for accurate protein model quality assessment.
Finally, these machines are tested using CASP8 and CASP9 templates and compared with other CASP competitors, with promising results. We further discuss the importance of model quality assessment and other protein information that could be considered for the same purpose.
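The spatial-traversal idea, pairing alpha carbon atoms that are close in 3D space rather than merely adjacent in sequence, can be sketched with a k-d tree query over hypothetical coordinates (the real encoding also folds in substitution matrices, polarity and secondary structure, which are omitted here):

```python
import numpy as np
from scipy.spatial import cKDTree

# Hypothetical C-alpha coordinates for a 120-residue model (random toys,
# not a real PDB structure).
rng = np.random.default_rng(0)
ca_coords = rng.uniform(0, 50, size=(120, 3))

# For each residue, find its 5 nearest spatial neighbours (k=6 includes self).
tree = cKDTree(ca_coords)
dists, neighbours = tree.query(ca_coords, k=6)

# Drop the zero self-distance; the remaining distances form a simple
# spatially-aware feature vector per residue.
features = dists[:, 1:]
```

Features built this way capture which residues actually interact in the folded model, which is the property the enhanced encoding exploits.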
Streaming Feature Grouping and Selection (SFGS) for Big Data Classification
Real-time data has always been an essential element for organizations when the speed of data delivery is critical to their businesses. Today, organizations understand the importance of real-time data analysis for maintaining the benefits of their generated data. Real-time data analysis is also known as real-time analytics, streaming analytics, real-time streaming analytics, and event processing. Stream processing is the key to getting results in real time: it allows us to process a data stream as it arrives. Streaming data is generated dynamically, and the full stream is unknown or even infinite. Such data becomes massive and diverse, forming what is known as the big data challenge. In machine learning, streaming feature selection has long been a preferred method for preprocessing streaming data. Recently, feature grouping, which can measure the hidden information between selected features, has begun gaining attention. This dissertation's main contribution is solving the extremely high dimensionality of streaming big data by delivering a streaming feature grouping and selection algorithm. The literature review also presents a comprehensive survey of current streaming feature selection approaches and highlights the state-of-the-art algorithms in this area. The proposed algorithm groups similar features together to reduce redundancy and handles the stream of features in an online fashion. It has been implemented and evaluated on benchmark datasets against state-of-the-art streaming feature selection algorithms and feature grouping techniques. The results showed better prediction accuracy than state-of-the-art algorithms.
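A toy sketch of the online grouping idea, with an assumed correlation threshold and a first-seen-representative rule (these choices are illustrative, not the dissertation's SFGS algorithm):

```python
import numpy as np

# Simulate a feature stream: four features arrive one at a time, and the
# second is a near-duplicate of the first.
rng = np.random.default_rng(0)
base = rng.normal(size=(500, 3))
stream = [base[:, 0],
          base[:, 0] + 0.01 * rng.normal(size=500),   # redundant copy
          base[:, 1],
          base[:, 2]]

groups = []                                   # each group keeps arrivals in order
for f in stream:
    for g in groups:
        # Join the first group whose representative this feature correlates
        # with above an assumed threshold of 0.9.
        if abs(np.corrcoef(f, g[0])[0, 1]) > 0.9:
            g.append(f)
            break
    else:
        groups.append([f])                    # otherwise open a new group

selected = [g[0] for g in groups]             # one representative per group
```

Redundant features collapse into the same group as they stream in, so the selected set stays compact without ever seeing the full (potentially infinite) feature stream.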