4 research outputs found

    Analysis of Trustworthiness in Machine Learning and Deep Learning

    Trustworthy Machine Learning (TML) refers to a set of mechanisms and explainable layers that enrich a learning model so that it is clear, understandable, and therefore trusted by users. This paper presents a literature review that provides a comprehensive analysis of how TML is perceived. A quantitative study, accompanied by qualitative observations, is discussed by categorizing machine learning algorithms, with an emphasis on deep learning. Deep learning models have achieved very high performance as real-world function approximators (e.g., natural language and signal processing, robotics, etc.). However, to be fully adopted by humans, a level of transparency needs to be guaranteed, which is made harder by recent techniques (e.g., fully connected layers in neural networks, dynamic bias, parallelism, etc.). The paper covers work by both academics and practitioners and reports some promising results; the goal is a good transparency/accuracy trade-off on the way towards a reliable learning approach.

    R : A hybrid machine learning feature selection model—HMLFSM to enhance gene classification applied to multiple colon cancers dataset

    Colon cancer is a significant global health problem, and early detection is critical for improving survival rates. Traditional detection methods, such as colonoscopies, can be invasive and uncomfortable for patients. Machine Learning (ML) algorithms have emerged as a promising approach for non-invasive colon cancer classification, using genetic data or patient demographics and medical history to predict the likelihood of colon cancer. However, due to the challenges imposed by variable gene expression and the high dimensionality of cancer-related datasets, traditional transductive ML applications have limited accuracy and risk overfitting. In this paper, we propose a new hybrid feature selection model called HMLFSM (Hybrid Machine Learning Feature Selection Model) to improve colon cancer gene classification. We developed a multifilter hybrid model with a two-phase feature selection approach, combining Information Gain (IG) with Genetic Algorithms (GA), and minimum Redundancy Maximum Relevance (mRMR) with Particle Swarm Optimization (PSO). We critically tested our model on three colon cancer genetic datasets and found that the new framework outperformed other models with significant accuracy improvements (95%, ~97%, and ~94% accuracy for datasets 1, 2, and 3 respectively). The results show that our approach improves the classification accuracy of colon cancer detection by highlighting important and relevant genes, eliminating irrelevant ones, and revealing the genes that have a direct influence on the classification process. For colon cancer gene analysis, our experiments and literature review indicate that selective input feature extraction prior to feature selection is essential for improving predictive performance.
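To make the filter stage of such a pipeline concrete, the following is a minimal sketch of ranking genes by Information Gain alone. It is not the authors' HMLFSM implementation: the GA, mRMR, and PSO stages are omitted, the median binarization of expression values is an assumption, and the gene names and values are hypothetical toy data.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())

def information_gain(values, labels):
    """IG of one feature after splitting it at its (upper) median.

    Binarizing at the median is an assumption made for this sketch.
    """
    threshold = sorted(values)[len(values) // 2]
    low = [y for v, y in zip(values, labels) if v < threshold]
    high = [y for v, y in zip(values, labels) if v >= threshold]
    n = len(labels)
    conditional = (len(low) / n) * entropy(low) + (len(high) / n) * entropy(high)
    return entropy(labels) - conditional

def rank_features(columns, labels, k):
    """Return the names of the k features with the highest IG."""
    scores = {name: information_gain(vals, labels)
              for name, vals in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]

# Hypothetical toy data: three "genes" measured on six samples.
labels = [0, 0, 0, 1, 1, 1]
columns = {
    "gene_a": [0.1, 0.2, 0.3, 0.9, 1.0, 1.1],  # separates the classes
    "gene_b": [0.5, 0.9, 0.4, 0.6, 0.8, 0.3],  # weakly informative
    "gene_c": [1.0, 1.0, 1.0, 1.0, 1.0, 1.0],  # constant, uninformative
}
top = rank_features(columns, labels, k=2)
print(top)
```

In a two-phase design like the one described, a ranking of this kind would only shortlist candidate genes; a wrapper search would then evaluate subsets of the shortlist against a classifier.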

    An investigation into the deep learning approach in sentimental analysis using graph-based theories

    Sentiment analysis is a branch of natural language analytics that aims to correlate what is expressed, which normally comes in an unstructured format, with what is believed and learnt. Several approaches have tried to address this gap (e.g., Naive Bayes, RNN, LSTM, word embedding, etc.). Even though deep learning models have achieved high performance, their generative process remains a “black-box” that is not fully disclosed, due to the high-dimensional features and the non-deterministic assignment of weights. Meanwhile, graphs are becoming more popular for modeling complex systems while remaining traceable and understandable. Here, we show that a good trade-off between transparency and efficiency can be achieved with a Deep Neural Network by exploring the Credit Assignment Paths theory. To this end, we propose a novel algorithm which simplifies the feature extraction mechanism and attributes an importance level to selected neurons by applying deterministic edge/node embeddings with attention scores on the input unit and the backward path respectively. We experiment on the Twitter Health News dataset, where the model has been extended to handle different approximations (tweet/aspect and tweets’ source levels, frequency, polarity/subjectivity) while remaining transparent and traceable. Moreover, a comparison with four recent models on the same data corpus for tweet analysis showed rapid convergence, with an overall accuracy of ≈83% and 94% of true positive sentiments correctly identified. Therefore, weights can be assigned to specific active features by following the proposed method. In contrast to the other compared works, the inferred features are conditioned on the users’ preferences (i.e., frequency degree) and on the activation’s derivatives (i.e., a feature is rejected if it is not scored). Future work will address the inductive aspect of graph embeddings to include dynamic graph structures, and will extend the model's resiliency by considering other datasets such as SemEval task7, covid-19 tweets, etc.

    A new Learning Model with Graph-based Time Variant Changes during a Pandemic Period

    Covid-19 has become a worldwide pandemic showing multiple variants over time and across regions. This phenomenon has been the subject of deep learning investigation. Although promising performance has been achieved, these models fail to track the temporal evolution of the virus due to the lack of time-varying features. We propose a model called TvDNN (Time-varying Deep Neural Network) which incorporates time-variant parameters into covid-19 sentiment analysis. Central entities (variants) were detected within the input space. These entities were characterized by a high betweenness centrality and a provable time variance over the course of the Covid pandemic. We formalized the correspondence between the time-evolving state of the virus mutation, e.g., covid-19, and the publication of the corresponding tweets for accurate sentiment analysis learning. TvDNN was found to outperform the models reported in the literature on Covid-19 sentiment analysis, achieving 99.86% accuracy and 98% AUC/ROC. To conclude, characterizing the domain dynamically, e.g., through changes over time, is the key to success and leads to a DNN with better performance. More generally, dynamically changing inputs should be treated as one of the possible types of domain in neural network development.
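As an illustration of how central entities might be picked out by betweenness centrality, here is a minimal sketch using Brandes' algorithm on an unweighted, undirected graph. It is not the TvDNN implementation: the graph, its node names, and the idea of linking terms that co-occur in tweets are all hypothetical assumptions made for this example.

```python
from collections import deque

def betweenness_centrality(graph):
    """Brandes' algorithm for an unweighted, undirected graph.

    `graph` is an adjacency dict {node: [neighbours]}.
    Returns unnormalized betweenness scores.
    """
    score = {v: 0.0 for v in graph}
    for source in graph:
        # Breadth-first search that counts shortest paths from `source`.
        stack = []
        preds = {v: [] for v in graph}
        sigma = {v: 0 for v in graph}   # number of shortest paths
        dist = {v: -1 for v in graph}   # -1 marks "not visited yet"
        sigma[source] = 1
        dist[source] = 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in graph[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies in order of decreasing distance.
        delta = {v: 0.0 for v in graph}
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != source:
                score[w] += delta[w]
    # Each undirected pair was counted once from either endpoint.
    return {v: s / 2 for v, s in score.items()}

# Hypothetical graph of terms that co-occur in tweets.
graph = {
    "delta": ["fever", "vaccine", "omicron"],
    "fever": ["delta"],
    "vaccine": ["delta"],
    "omicron": ["delta", "booster"],
    "booster": ["omicron"],
}
scores = betweenness_centrality(graph)
central = max(scores, key=scores.get)
print(central, scores[central])
```

Recomputing such scores on graphs built from successive time windows would be one way to observe how the central entities change over time.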