7,135 research outputs found

    Decentralized and collaborative machine learning framework for IoT

    Decentralized machine learning has recently been proposed as a potential solution to the security issues of the canonical federated learning approach. In this paper, we propose a decentralized and collaborative machine learning framework oriented specifically to resource-constrained devices, which are common in IoT deployments. To this end, we propose the following building blocks. First, an incremental learning algorithm based on prototypes, implemented specifically to run on low-performance computing elements. Second, two random-based protocols to exchange the local models among the computing elements in the network. Finally, two algorithmic approaches for prediction and prototype creation. The proposal was compared with a typical centralized incremental learning approach in terms of accuracy, training time and robustness, with very promising results. Axencia Galega de Innovación | Ref. 25/IN606D/2021/2612348. Agencia Estatal de Investigación | Ref. PID2020-113795RB-C3

    On the Generation of Realistic and Robust Counterfactual Explanations for Algorithmic Recourse

    The recent widespread deployment of machine learning algorithms presents many new challenges. Machine learning algorithms are usually opaque and can be particularly difficult to interpret. When humans are involved, algorithmic and automated decisions can negatively impact people’s lives. Therefore, end users would like to be insured against potential harm. One popular way to achieve this is to provide end users access to algorithmic recourse, which gives end users negatively affected by algorithmic decisions the opportunity to reverse unfavorable decisions, e.g., from a loan denial to a loan acceptance. In this thesis, we design recourse algorithms to meet various end user needs. First, we propose methods for the generation of realistic recourses. We use generative models to suggest recourses likely to occur under the data distribution. To this end, we shift the recourse action from the input space to the generative model’s latent space, allowing us to generate counterfactuals that lie in regions with data support. Second, we observe that small changes to the prescribed recourses, which arise when end users implement them noisily in practice, are likely to invalidate the suggested recourse. Motivated by this observation, we design methods for the generation of robust recourses and for assessing the robustness of recourse algorithms to data deletion requests. Third, the lack of a commonly used code base for counterfactual explanation and algorithmic recourse algorithms, together with the vast array of evaluation measures in the literature, makes it difficult to compare the performance of different algorithms. To solve this problem, we provide an open source benchmarking library that streamlines the evaluation process and can be used for benchmarking, rapidly developing new methods, and setting up new experiments.
    In summary, our work contributes to a more reliable interaction between end users and machine-learned models by covering fundamental aspects of the recourse process, and it suggests new solutions for generating realistic and robust counterfactual explanations for algorithmic recourse

    Deep Clustering for Data Cleaning and Integration

    Deep Learning (DL) techniques now constitute the state-of-the-art for important problems in areas such as text and image processing, and there have been impactful results that deploy DL in several data management tasks. Deep Clustering (DC) has recently emerged as a sub-discipline of DL, in which data representations are learned in tandem with clustering, with a view to automatically identifying the features of the data that lead to improved clustering results. While DC has been used to good effect in several domains, particularly in image processing, the potential of DC for data management tasks remains unexplored. In this paper, we address this gap by investigating the suitability of DC for data cleaning and integration tasks, specifically schema inference, entity resolution and domain discovery, from the perspective of tables, rows and columns, respectively. In this setting, we compare and contrast several DC and non-DC clustering algorithms using standard benchmarks. The results show, among other things, that the most effective DC algorithms consistently outperform non-DC clustering algorithms for data integration tasks. Experiments also show consistently strong performance compared with state-of-the-art bespoke algorithms for each of the data integration tasks

    A novel approach for breast ultrasound classification using two-dimensional empirical mode decomposition and multiple features

    Aim: Breast cancer is a leading cause of female mortality worldwide, underscoring the critical need for precise and efficient diagnostic techniques. This research contributes to breast cancer classification from breast ultrasound (BUS) images by introducing a novel method based on two-dimensional empirical mode decomposition (biEMD). Classification performance is evaluated using various texture features of breast ultrasound images and their corresponding biEMD sub-bands. Methods: A total of 437 benign and 210 malignant breast ultrasound images were analyzed, preprocessed, and decomposed into three biEMD sub-bands. A variety of features, including the Gray Level Co-occurrence Matrix (GLCM), Local Binary Patterns (LBP), and Histogram of Oriented Gradients (HOG), were extracted, and feature selection was performed using the least absolute shrinkage and selection operator method. The study used these features with machine learning techniques, including artificial neural networks (ANN), k-nearest neighbors (kNN), the ensemble method, and statistical discriminant analysis, to classify benign and malignant cases. The classification performance, measured through Area Under the Curve (AUC), accuracy, and F1 score, was evaluated using a 10-fold cross-validation approach. Results: The study showed that using the ANN method and hybrid features (GLCM+LBP+HOG) from BUS images' biEMD sub-bands led to excellent performance, with an AUC of 0.9945, an accuracy of 0.9644, and an F1 score of 0.9668. Conclusion: The obtained results have revealed the effectiveness of the biEMD method for classifying breast tumor types from ultrasound images, demonstrating high-performance classification using the proposed approach
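
For readers unfamiliar with the three metrics named in this abstract, the following is a minimal Python sketch of how accuracy, F1 score, and AUC can be computed for a binary classifier. The labels, scores, and function names are hypothetical toy values chosen for illustration; they are not the study's data or code, and the AUC here uses the pairwise-ranking definition rather than whatever routine the authors used.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def auc(y_true, scores, positive=1):
    """Probability that a random positive is scored above a random
    negative (ties count as half), i.e. the area under the ROC curve."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy example: 1 = malignant, 0 = benign (made-up values).
y_true = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

print(accuracy(y_true, y_pred), f1_score(y_true, y_pred), auc(y_true, scores))
```

In a 10-fold cross-validation setting, these functions would be applied to the held-out fold in each round and the results averaged.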

    Smart city: an advanced framework for analyzing public sentiment orientation toward recycled water

    The coronavirus pandemic of the past several years has had a profound impact on all aspects of life, including resource utilization. One notable example is the increased demand for freshwater, a lifeblood of our planet. At the same time, the smart city vision aims to achieve smart water management by investing in innovative solutions such as recycled water systems. However, the problem lies in the public’s sentiment toward this new resource and willingness to use it, which discourages investors and hinders the development of this field. Therefore, in our work, we applied sentiment analysis using an extended version of the fuzzy logic and neural network model from our previous work, to find out the general public opinion regarding recycled water and to assess the effects of sentiments on the public’s readiness to use this resource. Our analysis was based on a dataset of over 1 million pieces of text content from 2013 to 2022. The results show, from spatio-temporal perspectives, that sentiment orientation and acceptance behavior towards using recycled water have become more positive. Additionally, the public is more concerned in areas driven by the smart city vision than in areas of medium and low economic development, where investment in awareness campaigns is needed

    Stroke Classification Comparison with KNN through Standardization and Normalization Techniques

    This study explores the impact of z-score standardization and min-max normalization on K-Nearest Neighbors (KNN) classification for strokes. Focused on managing diverse scales in health attributes within the stroke dataset, the research aims to improve classification model accuracy and reliability. Preprocessing involves z-score standardization, min-max normalization, and no data scaling. The KNN model is trained and evaluated using various methods. Results reveal comparable performance between z-score standardization and min-max normalization, with slight variations across data split ratios. Demonstrating the importance of data scaling, both z-score and min-max achieve 95.07% accuracy. Notably, normalization averages a higher accuracy (94.25%) than standardization (94.21%), highlighting the critical role of data scaling for robust machine learning performance and informed health decisions
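
The two scaling techniques compared in this abstract can be sketched in a few lines of Python. This is an illustrative sketch only: the feature values (an age column and a glucose-like column), the function names, and the tiny majority-vote KNN are assumptions made for the example, not the study's dataset or implementation. The scalers are fitted on the training rows and then applied to the query, so that no information from the query leaks into the scaling statistics.

```python
import math
from collections import Counter

def fit_zscore(rows):
    """Return a function that standardizes a row using each column's
    mean and population standard deviation computed from `rows`."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return lambda row: [(x - m) / s for x, m, s in zip(row, means, stds)]

def fit_minmax(rows):
    """Return a function that maps each column of `rows` onto [0, 1]."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    spans = [(max(c) - min(c)) or 1.0 for c in cols]
    return lambda row: [(x - lo) / sp for x, lo, sp in zip(row, lows, spans)]

def knn_predict(train_x, train_y, query, k=3):
    """Majority vote among the k training points closest to `query`
    under Euclidean distance."""
    order = sorted(range(len(train_x)),
                   key=lambda i: math.dist(train_x[i], query))
    return Counter(train_y[i] for i in order[:k]).most_common(1)[0][0]

# Hypothetical toy data: [age, glucose-like value], label 1 = stroke.
train_x = [[20, 100], [25, 110], [60, 300], [65, 290]]
train_y = [0, 0, 1, 1]
query = [62, 295]

predictions = {}
for name, fit in (("z-score", fit_zscore), ("min-max", fit_minmax)):
    scale = fit(train_x)
    scaled_train = [scale(row) for row in train_x]
    predictions[name] = knn_predict(scaled_train, train_y, scale(query), k=3)
    print(name, predictions[name])
```

Because KNN relies on distances, a column with a large numeric range would otherwise dominate the vote; both scalers put the columns on comparable ranges, which is consistent with the similar accuracies the abstract reports for the two methods.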

    An innovative network intrusion detection system (NIDS): Hierarchical deep learning model based on Unsw-Nb15 dataset

    With the increasing prevalence of network intrusions, the development of effective network intrusion detection systems (NIDS) has become crucial. In this study, we propose a novel NIDS approach that combines the power of long short-term memory (LSTM) and attention mechanisms to analyze the spatial and temporal features of network traffic data. We utilize the benchmark UNSW-NB15 dataset, which exhibits a diverse distribution of patterns, including a significant disparity in the size of the training and testing sets. Unlike traditional machine learning techniques like support vector machines (SVM) and k-nearest neighbors (KNN) that often struggle with limited feature sets and lower accuracy, our proposed model overcomes these limitations. Notably, existing models applied to this dataset typically require manual feature selection and extraction, which can be time-consuming and less precise. In contrast, our model achieves superior results in binary classification by leveraging the advantages of LSTM and attention mechanisms. Through extensive experiments and evaluations with state-of-the-art ML/DL models, we demonstrate the effectiveness and superiority of our proposed approach. Our findings highlight the potential of combining LSTM and attention mechanisms for enhanced network intrusion detection

    Advancing aviation safety through machine learning and psychophysiological data: a systematic review

    In the aviation industry, safety remains vital, often compromised by pilot errors attributed to factors such as workload, fatigue, stress, and emotional disturbances. To address these challenges, recent research has increasingly leveraged psychophysiological data and machine learning techniques, offering the potential to enhance safety by understanding pilot behavior. This systematic literature review rigorously follows a widely accepted methodology, scrutinizing 80 peer-reviewed studies out of 3352 studies from five key electronic databases. The paper focuses on behavioral aspects, data types, preprocessing techniques, machine learning models, and performance metrics used in existing studies. It reveals that the majority of research disproportionately concentrates on workload and fatigue, leaving behavioral aspects like emotional responses and attention dynamics less explored. Machine learning models such as tree-based methods and support vector machines are most commonly employed, but the utilization of advanced techniques like deep learning remains limited. Traditional preprocessing techniques dominate the landscape, underscoring the need for advanced methods. Data imbalance and its impact on model performance are identified as a critical, under-researched area. The review uncovers significant methodological gaps, including the unexplored influence of preprocessing on model efficacy, lack of diversification in data collection environments, and limited focus on model explainability. The paper concludes by advocating for targeted future research to address these gaps, thereby promoting both methodological innovation and a more comprehensive understanding of pilot behavior

    An improved Arabic text classification method using word embedding

    Feature selection (FS) is a widely used method for removing redundant or irrelevant features to improve classification accuracy and decrease the model’s computational cost. In this paper, we present an improved method (referred to hereafter as RARF) for Arabic text classification (ATC) that employs term frequency-inverse document frequency (TF-IDF) and the Word2Vec embedding technique to identify words that have a particular semantic relationship. In addition, we have compared our method with four benchmark FS methods, namely principal component analysis (PCA), linear discriminant analysis (LDA), chi-square, and mutual information (MI). Three machine learning algorithms are used in this work: support vector machine (SVM), k-nearest neighbors (K-NN), and naive Bayes (NB). Two different Arabic datasets are utilized to perform a comparative analysis of these algorithms. This paper also evaluates the efficiency of our method for ATC on the basis of four performance metrics, namely accuracy, precision, recall, and F-measure. Results revealed that the highest accuracy, 94.75%, was achieved by the SVM classifier on the Khaleej-2004 Arabic dataset, while the same classifier recorded an accuracy of 94.01% on the Watan-2004 Arabic dataset
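
As a rough illustration of the TF-IDF weighting mentioned in this abstract, the Python sketch below scores each term by its relative frequency in a document multiplied by the logarithm of the inverse document frequency. The abstract does not state which TF-IDF variant the authors use, so the unsmoothed formula, the function name `tf_idf`, and the three toy documents are all assumptions made for this example; the Word2Vec step of RARF is not shown.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Compute TF-IDF weights for a list of tokenized documents.

    tf  = term count / document length
    idf = ln(number of documents / number of documents containing the term)
    Returns one {term: weight} dictionary per document.
    """
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights

# Hypothetical tokenized documents (English placeholders for readability).
docs = [
    ["goal", "match", "goal"],
    ["market", "price"],
    ["match", "price"],
]
weights = tf_idf(docs)
print(weights[0])
```

Terms concentrated in few documents (here `goal`) receive higher weights than terms spread across many, and a term that appears in every document receives a weight of zero, which is what makes the weights usable as a signal for discarding uninformative features.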

    Face Emotion Recognition Based on Machine Learning: A Review

    Computers can now detect, understand, and evaluate emotions thanks to recent developments in machine learning and information fusion. Researchers across various sectors are increasingly intrigued by emotion identification, utilizing facial expressions, words, body language, and posture as means of discerning an individual's emotions. Nevertheless, the effectiveness of the first three methods may be limited, as individuals can consciously or unconsciously suppress their true feelings. This article explores various feature extraction techniques, encompassing the development of machine learning classifiers like k-nearest neighbour, naive Bayesian, support vector machine, and random forest, in accordance with the established standard for emotion recognition. The paper has three primary objectives: firstly, to offer a comprehensive overview of effective computing by outlining essential theoretical concepts; secondly, to describe in detail the current state of the art in emotion recognition; and thirdly, to highlight important findings and conclusions from the literature, with an emphasis on key obstacles and possible future paths, especially in the creation of state-of-the-art machine learning algorithms for the identification of emotions