Granular Support Vector Machines Based on Granular Computing, Soft Computing and Statistical Learning
With the emergence of biomedical informatics, Web intelligence, and e-business, new challenges are arising for knowledge discovery and data mining. In this dissertation work, a framework named Granular Support Vector Machines (GSVM) is proposed to systematically and formally combine statistical learning theory, granular computing theory and soft computing theory to address challenging predictive data modeling problems effectively and/or efficiently, with a specific focus on binary classification problems. In general, GSVM works in three steps. Step 1 is granulation, building a sequence of information granules from the original dataset or from the original feature space. Step 2 is modeling Support Vector Machines (SVM) in some of these information granules where necessary. Finally, step 3 is aggregation, consolidating the information in these granules at a suitable level of abstraction. A good granulation method for finding suitable granules is crucial to modeling a good GSVM. Under this framework, several granulation algorithms, including GSVM-CMW (cumulative margin width), GSVM-AR (association rule mining), a family of GSVM-RFE (recursive feature elimination) algorithms, GSVM-DC (data cleaning) and GSVM-RU (repetitive undersampling), are designed for binary classification problems with different characteristics. Empirical studies in the biomedical domain and many other application domains demonstrate that the framework is promising. As a preliminary step, this dissertation work will be extended in the future into a Granular Computing based Predictive Data Modeling framework (GrC-PDM), with which we can create hybrid adaptive intelligent data mining systems for high-quality prediction.
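The three GSVM steps can be illustrated with a small hypothetical sketch in the spirit of GSVM-RU: information granules are built by repeatedly undersampling the majority class, one SVM is trained per granule, and the granule outputs are aggregated by voting. The synthetic data and every parameter below are illustrative assumptions, not the dissertation's actual setup:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Imbalanced binary toy problem (about 90% majority / 10% minority class).
rng = np.random.default_rng(0)
X, y = make_classification(n_samples=600, weights=[0.9, 0.1], random_state=0)

minority = np.where(y == 1)[0]
majority = np.where(y == 0)[0]

models = []
for _ in range(5):                               # step 1: build 5 information granules
    sampled = rng.choice(majority, size=len(minority), replace=False)
    idx = np.concatenate([minority, sampled])    # balanced granule
    m = SVC(kernel="rbf").fit(X[idx], y[idx])    # step 2: one SVM per granule
    models.append(m)

votes = np.mean([m.predict(X) for m in models], axis=0)
y_pred = (votes >= 0.5).astype(int)              # step 3: aggregate by majority vote
```

Each balanced granule lets its SVM see the minority class clearly, and aggregation recovers a decision over the whole dataset.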
Stable Feature Selection for Biomarker Discovery
Feature selection techniques have been used as the workhorse in biomarker
discovery applications for a long time. Surprisingly, the stability of feature
selection with respect to sampling variations has long been under-considered.
Only recently has this issue started to receive more attention.
In this article, we review existing stable feature selection methods for
biomarker discovery using a generic hierarchical framework. We have two
objectives: (1) providing an overview on this new yet fast growing topic for a
convenient reference; (2) categorizing existing methods under an expandable
framework for future research and development.
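As an illustration of what stability with respect to sampling variations can mean in practice, the sketch below (not taken from the article) scores a selector by the average pairwise Jaccard similarity of the feature subsets it picks on bootstrap resamples; the dataset, selector and subset size are all assumptions:

```python
import numpy as np
from itertools import combinations
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif

# 50 features, only 5 informative: a stable selector should keep finding them.
X, y = make_classification(n_samples=200, n_features=50, n_informative=5,
                           random_state=0)
rng = np.random.default_rng(0)

subsets = []
for _ in range(10):
    idx = rng.integers(0, len(y), size=len(y))            # bootstrap resample
    sel = SelectKBest(f_classif, k=5).fit(X[idx], y[idx])
    subsets.append(set(np.flatnonzero(sel.get_support())))

# Stability = mean Jaccard similarity over all pairs of selected subsets (0..1).
stability = np.mean([len(a & b) / len(a | b) for a, b in combinations(subsets, 2)])
```

A stability near 1 means the same biomarker candidates survive resampling; a value near 0 means the selection is an artifact of the particular sample.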
Breast cancer disease classification using fuzzy-ID3 algorithm based on association function
Breast cancer is the second leading cause of mortality among female cancer patients worldwide. Early detection is considered one of the most effective ways to prevent the disease from spreading and to enable correct decisions about subsequent treatment. Automatic diagnostic methods are frequently used in breast cancer diagnosis to increase the accuracy and speed of detection. The fuzzy-ID3 algorithm with association function implementation (FID3-AF) is proposed as a classification technique for breast cancer detection. The FID3-AF algorithm is a hybridisation of the fuzzy system, the iterative dichotomizer 3 (ID3) algorithm, and the association function. The fuzzy-neural dynamic-bottleneck-detection method (FUZZYDBD), an automatic fuzzy database definition method, aids the development of the fuzzy database for the data fuzzification process in FID3-AF. FID3-AF overcomes ID3's inability to handle continuous data, and the association function is implemented to minimise overfitting and enhance generalisation ability. The results indicate that FID3-AF is robust in breast cancer classification. A thorough comparison of FID3-AF with numerous existing methods was conducted to validate the proposed method's competency. This study established that FID3-AF performs well and outperforms other methods in breast cancer classification.
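A minimal sketch of the fuzzification step alone, assuming made-up triangular membership functions for one continuous attribute (the paper's FUZZYDBD method derives the fuzzy database automatically; the cut points and example values below are hypothetical):

```python
import numpy as np

def triangular(x, a, b, c):
    # Triangular membership: 0 at a, rises to 1 at b, falls back to 0 at c.
    return np.clip(np.minimum((x - a) / (b - a), (c - x) / (c - b)), 0.0, 1.0)

# Hypothetical continuous attribute values (e.g. tumour mean radius).
radius = np.array([8.0, 14.0, 20.0])

# Map each crisp value to low/medium/high memberships for ID3's fuzzy splits.
low    = triangular(radius,  5, 10, 15)
medium = triangular(radius, 10, 15, 20)
high   = triangular(radius, 15, 20, 25)
```

A value such as 14.0 then belongs mostly to "medium" (0.8) and slightly to "low" (0.2), which is how a fuzzy ID3 variant can split on continuous data that crisp ID3 cannot handle.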
Granular computing approach for intelligent classifier design
This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University London. Granular computing facilitates dealing with information by providing a theoretical framework that treats information as granules at different levels of granularity (different levels of specificity/abstraction). It aims to provide an abstract, explainable description of the data by forming granules that represent the features or the underlying structure of corresponding subsets of the data. In this thesis, a granular computing approach to the design of intelligent classification systems is proposed, and its efficiency is investigated across different classification systems. Fuzzy inference systems, neural networks, neuro-fuzzy systems and classifier ensembles are considered to evaluate the efficiency of the proposed approach. Each of the considered systems is designed using the proposed approach, and its classification performance is evaluated and compared to that of the standard system. The proposed approach is based on constructing information granules from data at multiple levels of granularity. The granulation process is performed using a modified fuzzy c-means algorithm that takes the classification problem into account. Clustering is followed by a coarsening process that merges small clusters into larger ones to form a lower granularity level. The resulting granules are used to build each of the considered binary classifiers in different settings and approaches.
Granules produced by the proposed granulation method are used to build a fuzzy classifier for each granulation level or set of levels. The performance of the classifiers is evaluated on real-life data sets using two classification performance measures: accuracy and area under the receiver operating characteristic curve. Experimental results show that fuzzy systems constructed using the proposed method achieve better classification performance. In addition, the proposed approach is used for the design of neural network classifiers. Granules resulting from one or more granulation levels are used to train the classifiers at different levels of specificity/abstraction. Using this approach, the classification problem is broken down into the modelling of classification rules represented by the information granules, resulting in a more interpretable system. Experimental results show that neural network classifiers trained using the proposed approach have better classification performance for most of the data sets. In a similar manner, the proposed approach is used for the training of neuro-fuzzy systems, resulting in a similar improvement in classification performance. Lastly, neural networks built using the proposed approach are used to construct a classifier ensemble. Information granules are used to generate and train the base classifiers, and the final ensemble output is produced by a weighted sum combiner. Based on the experimental results, the proposed approach improves the classification performance of the base classifiers for most of the data sets. Furthermore, a genetic algorithm is used to determine the combiner weights automatically. Higher Committee for Education Development in Iraq (HCED).
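The granulate-then-coarsen idea can be sketched as follows, with KMeans standing in for the thesis's modified fuzzy c-means and an assumed minimum granule size; both are simplifications, not the thesis's algorithm:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic data; granulate into 6 clusters at the finest level.
X, _ = make_blobs(n_samples=300, centers=6, random_state=0)
km = KMeans(n_clusters=6, n_init=10, random_state=0).fit(X)
labels, centers = km.labels_.copy(), km.cluster_centers_

# Coarsening: absorb any cluster below an assumed size threshold into the
# nearest sufficiently large cluster, producing a lower granularity level.
min_size = 30
sizes = np.bincount(labels, minlength=6)
for small in np.flatnonzero(sizes < min_size):
    large = [c for c in range(6) if sizes[c] >= min_size]
    nearest = min(large, key=lambda c: np.linalg.norm(centers[c] - centers[small]))
    labels[labels == small] = nearest
```

Repeating the merge step yields a sequence of granulation levels, each of which can train its own classifier, as the thesis describes.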
Prediction of Recurrence and Mortality of Oral Tongue Cancer Using Artificial Neural Network (a case study of 5 hospitals in Finland and 1 hospital from Sao Paulo, Brazil)
Cancer is a dreadful disease that has caused the deaths of millions of people. It is characterized by uncontrollable cell growth forming lumps or masses of tissue known as tumours. It is therefore a concern to everyone, as these tumours often release hormones that have a negative impact on the body. Data mining approaches, statistical methods and machine learning algorithms have been proposed for effective cancer data classification. Artificial Neural Networks (ANN) are used in this thesis to predict the recurrence and mortality of oral tongue cancer in patients. ANN were also used to examine the diagnostic and prognostic factors, with the aim of determining which of these factors influence the prediction of recurrence and mortality of oral tongue cancer. Three different ANN techniques were applied for the learning and testing phases in order to find the most effective one: Elman, feedforward, and layer-recurrent neural networks. The Elman neural network was not able to make acceptable predictions of the recurrence or mortality of tongue cancer based on the data. In contrast, the feedforward neural network captured the relationship between the prognostic factors and correctly predicted recurrence, but failed to predict mortality from the patient data. The layer-recurrent neural network was very effective and successfully predicted both the recurrence and the mortality of oral tongue cancer. The constructed layer-recurrent neural network was then used to investigate the correlation between the prognostic factors. It was found that, of the 11 prognostic factors in the data sheet, only 5 had a considerable impact on recurrence and mortality: grade, depth, budding, modified stage, and gender.
Time in months and disease-free months were also used to train the network.
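For illustration only, a feedforward classifier of the kind compared in the thesis can be sketched on synthetic data standing in for the 11 prognostic factors; the real patient cohort and the Elman and layer-recurrent variants are not reproduced here:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in: 11 features mimicking the prognostic factors
# (grade, depth, budding, modified stage, gender, ...), binary recurrence label.
X, y = make_classification(n_samples=400, n_features=11, n_informative=5,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

# A small feedforward network trained to predict recurrence.
net = MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000,
                    random_state=0).fit(X_tr, y_tr)
recurrence_acc = net.score(X_te, y_te)
```

Inspecting how accuracy degrades when individual input features are dropped is one simple way to probe which prognostic factors drive the prediction, in the spirit of the thesis's factor analysis.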
Risk prediction analysis for post-surgical complications in cardiothoracic surgery
Cardiothoracic surgery patients are at risk of developing surgical site infections
(SSIs), which cause hospital readmissions, increase healthcare costs and may lead to
mortality. The first 30 days after hospital discharge are crucial for preventing these
kinds of infections. As an alternative to a hospital-based diagnosis, an automatic digital
monitoring system can help with the early detection of SSIs by analyzing daily images
of patients' wounds. However, analyzing a wound automatically is one of the biggest
challenges in medical image analysis.
The proposed system is integrated into a research project called CardioFollow.AI,
which developed a digital telemonitoring service to follow up the recovery of cardiothoracic
surgery patients. This work aims to tackle the problem of SSIs by predicting
the existence of worrying alterations in wound images taken by patients, with the help of
machine learning and deep learning algorithms. The developed system is divided into a
segmentation model which detects the wound region area and categorizes the wound type,
and a classification model which predicts the occurrence of alterations in the wounds.
The dataset consists of 1337 images with chest wounds (WC), drainage wounds (WD)
and leg wounds (WL) from 34 cardiothoracic surgery patients. For segmenting the images,
an architecture with a MobileNet encoder and a U-Net decoder was used to obtain
the regions of interest (ROI) and attribute the wound class. The classification model was
divided into three sub-classifiers, one for each wound type, in order to improve the model's
performance. Color and textural features were extracted from the wounds' ROIs to feed
one of three machine learning classifiers (random forest, support vector machine and
k-nearest neighbors), which predicts the final output.
The segmentation model achieved a final mean IoU of 89.9%, a dice coefficient of
94.6% and a mean average precision of 90.1%, showing good results. As for the algorithms
that performed classification, the WL classifier exhibited the best results, with
87.6% recall and 52.6% precision, while the WC classifier achieved 71.4% recall and 36.0%
precision. The WD classifier had the worst performance, with 68.4% recall and 33.2% precision.
The obtained results demonstrate the feasibility of this solution, which can be a start for
preventing SSIs through image analysis with artificial intelligence.
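The reported segmentation metrics, mean IoU and the Dice coefficient, can be sketched on toy binary masks (the masks below are illustrative, not the project's wound data):

```python
import numpy as np

# Two overlapping 4x4 squares on an 8x8 grid stand in for predicted and
# ground-truth wound masks.
pred = np.zeros((8, 8), dtype=bool); pred[2:6, 2:6] = True   # 16 predicted pixels
true = np.zeros((8, 8), dtype=bool); true[3:7, 3:7] = True   # 16 ground-truth pixels

inter = np.logical_and(pred, true).sum()      # 9 overlapping pixels
union = np.logical_or(pred, true).sum()       # 23 pixels in either mask

iou  = inter / union                          # intersection over union: 9/23
dice = 2 * inter / (pred.sum() + true.sum())  # Dice coefficient: 18/32
```

Dice weights the overlap twice, so it is always at least as large as IoU for the same pair of masks, which is why the paper's 94.6% Dice sits above its 89.9% mean IoU.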
Protein Tertiary Model Assessment Using Granular Machine Learning Techniques
The automatic prediction of a protein's three-dimensional structure from its amino acid sequence has become one of the most important and heavily researched fields in bioinformatics. Because predicted models are not experimental structures determined with known accuracy, it is vital to estimate model quality. We attempt to solve this problem using machine learning techniques and information from both the sequence and the structure of the protein. The goal is to train a machine on structures from the PDB (Protein Data Bank) so that, given a new model, it predicts whether the model belongs to the same class as the PDB structures (correct or incorrect protein models). Different subsets of the PDB are considered for evaluating the prediction potential of the machine learning methods. Here we show two such machines, one using SVM (support vector machines) and another using fuzzy decision trees (FDT). With a preliminary encoding style, the SVM reached around 70% accuracy in protein model quality assessment, and an improved fuzzy decision tree (IFDT) reached above 80% accuracy. To reduce computational overhead, a multiprocessor environment and a basic feature selection method are used in the SVM-based machine learning algorithm.
Next, an enhanced scheme is introduced using a new encoding style. In the new style, information such as the amino acid substitution matrix, polarity, secondary structure and relative distances between alpha carbon atoms is collected through spatial traversal of the 3D structure to form training vectors. This guarantees that the properties of alpha carbon atoms that are close together in 3D space, and thus interacting, are used in vector formation. With the use of fuzzy decision trees, we obtained a training accuracy of around 90%. This is a significant improvement over the previous encoding technique in both prediction accuracy and execution time, and it motivates continued exploration of effective machine learning algorithms for accurate protein model quality assessment.
Finally, these machines are tested using CASP8 and CASP9 templates and compared with other CASP competitors, with promising results. We further discuss the importance of model quality assessment and other protein information that could be considered for the same purpose.
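The spatial-traversal idea, pairing alpha carbon atoms that are close in 3D space rather than merely adjacent in sequence, can be sketched with a k-d tree query over hypothetical coordinates (the real encoding also folds in substitution matrices, polarity and secondary structure, which are omitted here):

```python
import numpy as np
from scipy.spatial import cKDTree

# Hypothetical C-alpha coordinates for a 120-residue model (random toys,
# not a real PDB structure).
rng = np.random.default_rng(0)
ca_coords = rng.uniform(0, 50, size=(120, 3))

# For each residue, find its 5 nearest spatial neighbours (k=6 includes self).
tree = cKDTree(ca_coords)
dists, neighbours = tree.query(ca_coords, k=6)

# Drop the zero self-distance; the remaining distances form a simple
# spatially-aware feature vector per residue.
features = dists[:, 1:]
```

Features built this way capture which residues actually interact in the folded model, which is the property the enhanced encoding exploits.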
Streaming Feature Grouping and Selection (SFGS) for Big Data Classification
Real-time data has always been an essential element for organizations when the speed of data delivery is critical to their businesses. Today, organizations understand the importance of real-time data analysis for maintaining the benefits of their generated data. Real-time data analysis is also known as real-time analytics, streaming analytics, real-time streaming analytics, and event processing. Stream processing is the key to getting results in real time: it allows us to process a data stream as it arrives. Streaming data is generated dynamically, and the full stream is unknown or even infinite. Such data becomes massive and diverse, forming what is known as the big data challenge. In machine learning, streaming feature selection has long been a preferred method for preprocessing streaming data. Recently, feature grouping, which can measure the hidden information between selected features, has begun gaining attention. This dissertation's main contribution is solving the extremely high dimensionality of streaming big data by delivering a streaming feature grouping and selection algorithm. The literature review also presents a comprehensive survey of current streaming feature selection approaches and highlights the state-of-the-art algorithms in this area. The proposed algorithm groups similar features together to reduce redundancy and handles the stream of features in an online fashion. It has been implemented and evaluated on benchmark datasets against state-of-the-art streaming feature selection algorithms and feature grouping techniques. The results showed better prediction accuracy than state-of-the-art algorithms.
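A toy sketch of the online grouping idea, with an assumed correlation threshold and a first-seen-representative rule (these choices are illustrative, not the dissertation's SFGS algorithm):

```python
import numpy as np

# Simulate a feature stream: four features arrive one at a time, and the
# second is a near-duplicate of the first.
rng = np.random.default_rng(0)
base = rng.normal(size=(500, 3))
stream = [base[:, 0],
          base[:, 0] + 0.01 * rng.normal(size=500),   # redundant copy
          base[:, 1],
          base[:, 2]]

groups = []                                   # each group keeps arrivals in order
for f in stream:
    for g in groups:
        # Join the first group whose representative this feature correlates
        # with above an assumed threshold of 0.9.
        if abs(np.corrcoef(f, g[0])[0, 1]) > 0.9:
            g.append(f)
            break
    else:
        groups.append([f])                    # otherwise open a new group

selected = [g[0] for g in groups]             # one representative per group
```

Redundant features collapse into the same group as they stream in, so the selected set stays compact without ever seeing the full (potentially infinite) feature stream.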