2,052 research outputs found

    Classification of Alzheimer's Disease from MRI Images

    This thesis uses machine learning techniques to classify MRI brain scans of people with Alzheimer's disease. It addresses binary classification between Alzheimer's Disease (AD) and Cognitively Normal (CN) subjects. Supervised learning algorithms were trained in the MATLAB Classification Learner App and their accuracies were compared. The dataset comes from the Alzheimer's Disease Neuroimaging Initiative (ADNI). Histograms were computed for all slices of all images, and the best-performing slices were selected for further examination. Majority voting and weighted voting were then applied and their accuracies calculated; the best result was 69.5%, obtained with majority voting.
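Majority voting and weighted voting over per-slice predictions can be sketched as follows. This is a minimal illustration, not the thesis's MATLAB implementation: the function names, the example predictions, and the use of per-slice validation accuracy as the weight are all assumptions made for the example.

```python
from collections import defaultdict


def majority_vote(slice_predictions):
    """Return the label predicted by the most slices (ties go to the label seen first)."""
    counts = defaultdict(int)
    for label in slice_predictions:
        counts[label] += 1
    return max(counts, key=counts.get)


def weighted_vote(slice_predictions, slice_weights):
    """Return the label with the largest total weight across slices."""
    totals = defaultdict(float)
    for label, weight in zip(slice_predictions, slice_weights):
        totals[label] += weight
    return max(totals, key=totals.get)


# Hypothetical per-slice predictions for one scan; the weights are made-up
# validation accuracies of the classifier trained on each slice.
predictions = ["AD", "CN", "CN", "AD", "CN"]
weights = [0.90, 0.55, 0.50, 0.85, 0.52]

majority_label = majority_vote(predictions)           # three of five slices say CN
weighted_label = weighted_vote(predictions, weights)  # AD total 1.75 vs CN total 1.57
```

In this toy case the two schemes disagree: the majority picks CN, while the two high-weight slices tip the weighted vote to AD.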

    Fast Machine Learning Algorithms for Massive Datasets with Applications in the Biomedical Domain

    The continuous increase in the size of datasets introduces computational challenges for machine learning algorithms. In this dissertation, we cover machine learning algorithms and applications for large-scale data analysis in manufacturing and healthcare. We begin by introducing a multilevel framework to scale the support vector machine (SVM), a popular supervised learning algorithm with few tunable hyperparameters and highly accurate prediction. The computational complexity of the nonlinear SVM is prohibitive on large-scale datasets, whereas the linear SVM scales better to massive datasets. The nonlinear SVM has been shown to produce significantly higher classification quality on complex and highly imbalanced datasets. However, this higher classification quality requires a computationally expensive quadratic programming solver and extra kernel parameters for model selection. We introduce a generalized fast multilevel framework for regular, weighted, and instance-weighted SVM that achieves similar or better classification quality compared to state-of-the-art SVM libraries such as LIBSVM. Our framework improves the runtime by more than two orders of magnitude on some well-known benchmark datasets. We cover multiple versions of our proposed framework and its implementation in detail. The framework is implemented using the PETSc library, which allows easy integration with scientific computing tasks. Next, we propose an adaptive multilevel learning framework for SVM to reduce the variance in prediction quality across the levels, improve the overall prediction accuracy, and reduce the runtime. We implement multi-threaded support to speed up parameter fitting, which results in more than an order of magnitude speed-up. We design an early stopping criterion to avoid extra computational cost once the expected prediction quality is achieved. This approach provides significant speed-up, especially for massive datasets.
Finally, we propose an efficient low-dimensional feature extraction method over massive knowledge networks. Knowledge networks are becoming more popular in the biomedical domain for knowledge representation. Each layer in a knowledge network can store information from one or multiple data sources. The relationships between concepts or between layers represent valuable information. The proposed feature engineering approach provides an efficient and highly accurate prediction of the relationship between biomedical concepts on massive datasets. Our proposed approach uses semantics and probabilities to reduce the search space that machine learning algorithms must explore. The calculation of probabilities scales well with the size of the knowledge network. The number of features is fixed and equal to the number of relationships or classes in the data. A comprehensive comparison of well-known classifiers such as random forest, SVM, and deep learning over various features extracted from the same dataset provides an overview of performance and computational trade-offs. Our source code, documentation and parameters will be available at https://github.com/esadr/

    Proceedings of the 2nd Computer Science Student Workshop: Microsoft Istanbul, Turkey, April 9, 2011


    A Study on Comparison of Classification Algorithms for Pump Failure Prediction

    Faults can compromise the reliability of pumps and impair their functionality. Detecting these faults is crucial, and many studies have used motor current signals for this purpose. However, because pumps are rotating equipment, vibrations also play a vital role in fault identification. Rising pump failures have led to increased maintenance costs and unavailability, emphasizing the need for cost-effective and dependable machinery operation. This study addresses defect classification through predictive modeling. Its objective is to evaluate the performance of five algorithms for accurate and efficient defect identification: Fine Decision Tree, Medium Decision Tree, Bagged Trees (Ensemble), RUS-Boosted Trees, and Boosted Trees. Each model was trained and tested on a comprehensive dataset and assessed by training accuracy, test accuracy, and Area Under the Curve (AUC). The results highlight the Fine Decision Tree (91.2% training accuracy, 74% test accuracy, AUC 0.80), the Bagged Trees ensemble (94.9% training accuracy, 99.9% test accuracy, AUC 1.00), and Boosted Trees (89.4% training accuracy, 72.2% test accuracy, AUC 0.79). Notably, Support Vector Machines (SVM), Artificial Neural Networks (ANN), and k-Nearest Neighbors (KNN) exhibited comparatively lower performance. The study offers insight into the efficacy of these algorithms, guiding practitioners toward model selection for defect classification, and lays a foundation for improved decision-making in quality control and predictive maintenance.

    Robust classification of advanced power quality disturbances in smart grids

    The insertion of new devices, increased data flow, intermittent generation and massive computerization have considerably increased the complexity of current electrical systems. This increase has made changes necessary, such as more intelligent electrical networks that can adapt to this new reality. Artificial Intelligence (AI) plays an important role in society, especially techniques based on learning, and its use extends to power systems. In the context of Smart Grids (SG), where information and innovative monitoring solutions are a primary concern, AI-based techniques have several applications. This dissertation investigates the use of advanced signal processing and machine learning (ML) algorithms to create a robust classifier of advanced Power Quality (PQ) disturbances in SG. For this purpose, known models of PQ disturbances were generated with random elements to approximate real applications. From these models, thousands of signals exhibiting these disturbances were generated. Signal processing based on the Discrete Wavelet Transform (DWT) was used to extract the main characteristics of each signal. This research aims to use ML algorithms to classify these data according to their respective features. The ML algorithms were trained, validated, and tested, and the accuracy and confusion matrix of each model were analyzed to explain the logic behind the results. Data generation, feature extraction and optimization were performed in MATLAB. The Classification Learner toolbox was used to train, validate and test 27 different ML algorithms and assess the performance of each. All stages of the work were planned in advance, enabling their correct development and execution. The results show that the Cubic Support Vector Machine (SVM) classifier achieved the highest accuracy of all algorithms, indicating the effectiveness of the proposed method for classification. The results are discussed to explain the performance of each technique, how the techniques relate, and the reasons behind them.
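The DWT feature-extraction step described in this abstract can be illustrated with a short sketch. The assumptions are stated loudly: the abstract does not name the mother wavelet, the decomposition depth, or the exact features, so the Haar wavelet, the four levels, the per-level energy features, and the synthetic sag signal below are all choices made for the example, and the dissertation itself used MATLAB.

```python
import math


def haar_dwt_step(signal):
    """One level of the orthonormal Haar DWT: returns (approximation, detail)."""
    scale = 1.0 / math.sqrt(2.0)
    pairs = range(0, len(signal) - 1, 2)
    approx = [(signal[i] + signal[i + 1]) * scale for i in pairs]
    detail = [(signal[i] - signal[i + 1]) * scale for i in pairs]
    return approx, detail


def wavelet_energy_features(signal, levels):
    """Energy of the detail coefficients at each level, then the final approximation energy."""
    features = []
    current = list(signal)
    for _ in range(levels):
        current, detail = haar_dwt_step(current)
        features.append(sum(c * c for c in detail))
    features.append(sum(c * c for c in current))
    return features


def synthetic_sag(n=256, cycles=8, depth=0.5):
    """Hypothetical voltage sag: a unit sine whose amplitude drops during the middle half."""
    samples = []
    for k in range(n):
        amplitude = 1.0 - depth if n // 4 <= k < 3 * n // 4 else 1.0
        samples.append(amplitude * math.sin(2.0 * math.pi * cycles * k / n))
    return samples


signal = synthetic_sag()
features = wavelet_energy_features(signal, levels=4)  # 4 detail energies + 1 approximation energy
```

Because the Haar transform used here is orthonormal and the signal length is a multiple of 2^4, the feature values sum to the total signal energy. A feature vector like this would then be passed to a classifier such as an SVM.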

    Machine learning for network based intrusion detection: an investigation into discrepancies in findings with the KDD cup '99 data set and multi-objective evolution of neural network classifier ensembles from imbalanced data.

    For the last decade it has become commonplace to evaluate machine learning techniques for network based intrusion detection on the KDD Cup '99 data set. This data set has served well to demonstrate that machine learning can be useful in intrusion detection. However, it has undergone some criticism in the literature, and it is out of date. Therefore, some researchers question the validity of the findings reported based on this data set. Furthermore, as identified in this thesis, there are also discrepancies in the findings reported in the literature. In some cases the results are contradictory. Consequently, it is difficult to analyse the current body of research to determine the value in the findings. This thesis reports on an empirical investigation to determine the underlying causes of the discrepancies. Several methodological factors, such as choice of data subset, validation method and data preprocessing, are identified and are found to affect the results significantly. These findings have also enabled a better interpretation of the current body of research. Furthermore, the criticisms in the literature are addressed and future use of the data set is discussed, which is important since researchers continue to use it due to a lack of better publicly available alternatives. Due to the nature of the intrusion detection domain, there is an extreme imbalance among the classes in the KDD Cup '99 data set, which poses a significant challenge to machine learning. In other domains, researchers have demonstrated that well known techniques such as Artificial Neural Networks (ANNs) and Decision Trees (DTs) often fail to learn the minor class(es) due to class imbalance. However, this has not been recognized as an issue in intrusion detection previously. This thesis reports on an empirical investigation that demonstrates that it is the class imbalance that causes the poor detection of some classes of intrusion reported in the literature. 
An alternative approach to training ANNs is proposed in this thesis, using Genetic Algorithms (GAs) to evolve the weights of the ANNs, referred to as an Evolutionary Neural Network (ENN). When employing evaluation functions that calculate the fitness proportionally to the instances of each class, thereby avoiding a bias towards the major class(es) in the data set, significantly improved true positive rates are obtained whilst maintaining a low false positive rate. These findings demonstrate that the issues of learning from imbalanced data are not due to limitations of the ANNs but rather to the training algorithm. Moreover, the ENN is capable of detecting a class of intrusion that has been reported in the literature to be undetectable by ANNs. One limitation of the ENN is a lack of control over the classification trade-off the ANNs obtain. This is identified as a general issue with current approaches to creating classifiers. Striving to create a single best classifier that obtains the highest accuracy may give an unfruitful classification trade-off, which is demonstrated clearly in this thesis. Therefore, an extension of the ENN is proposed, using a Multi-Objective GA (MOGA), which treats the classification rate on each class as a separate objective. This approach produces a Pareto front of non-dominated solutions that exhibit different classification trade-offs, from which the user can select one with the desired properties. The multi-objective approach is also utilised to evolve classifier ensembles, which yields an improved Pareto front of solutions. Furthermore, the selection of classifier members for the ensembles is investigated, demonstrating how this affects the performance of the resultant ensembles. This is key to explaining why some classifier combinations fail to give fruitful solutions
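Two ideas from this abstract lend themselves to a small sketch: a fitness function that weights every class equally, and the selection of non-dominated solutions when each class's classification rate is a separate objective. The code below is an illustration under assumptions, not the thesis's evaluation functions: mean per-class recall is one plausible reading of "fitness proportional to the instances of each class", and all function names and numbers are hypothetical.

```python
def per_class_recall(y_true, y_pred, classes):
    """Fraction of each class's instances that were classified correctly."""
    recalls = []
    for cls in classes:
        total = sum(1 for t in y_true if t == cls)
        correct = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        recalls.append(correct / total if total else 0.0)
    return recalls


def balanced_fitness(y_true, y_pred, classes):
    """Mean per-class recall, so every class contributes equally regardless of its size."""
    recalls = per_class_recall(y_true, y_pred, classes)
    return sum(recalls) / len(recalls)


def dominates(a, b):
    """True if a is at least as good as b on every objective and strictly better on one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(candidates):
    """Keep the objective vectors that no other candidate dominates."""
    return [c for c in candidates if not any(dominates(o, c) for o in candidates)]


# Toy imbalanced data: 8 "normal" instances and 2 "attack" instances.
y_true = ["normal"] * 8 + ["attack"] * 2
always_normal = ["normal"] * 10  # a classifier that ignores the minor class

accuracy = sum(t == p for t, p in zip(y_true, always_normal)) / len(y_true)
fitness = balanced_fitness(y_true, always_normal, ["normal", "attack"])

# Hypothetical (normal recall, attack recall) pairs for four evolved classifiers.
front = pareto_front([(0.99, 0.10), (0.90, 0.60), (0.85, 0.55), (0.70, 0.80)])
```

The classifier that always predicts the major class scores 0.8 on plain accuracy but only 0.5 on the balanced fitness, which is the bias the abstract describes. In the Pareto example, the third candidate is dropped because the second is better on both objectives; the remaining three represent different trade-offs.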

    Random Forest as a tumour genetic marker extractor

    Identifying tumour genetic markers is an essential task for biomedicine. In this thesis, we analyse a dataset of chromosomal rearrangements of cancer samples and present a methodology for extracting genetic markers from this dataset by using a Random Forest as a feature selection tool