
    Comprehensive Analysis of Various Big Data Classification Techniques: A Challenging Overview

    Data on the internet grows every day, and automatically mining essential information from such enormous volumes has become a challenging task for organisations with huge datasets. In recent years, big data has become a prominent technology in the domain of Information Technology (IT): largely unstructured data whose scale challenges the computational capacity of classical database systems. Such data arrives fast and in large volumes, typically from multiple independent sources. Three main challenges arise from big data volumes and applications: data access, semantics, and domain knowledge. One major limitation concerns the classification of big data. This paper introduces well-defined methodologies employed for big data classification. It reviews 50 research papers on big data classification methods and groups these methods into six categories: K-Nearest Neighbor (KNN), Support Vector Machine (SVM), fuzzy-based methods, Bayesian-based methods, Random Forest, and Decision Tree. A detailed analysis and discussion then consider the classification techniques, datasets used, evaluation metrics, semantic similarity measures, and publication years. Finally, research gaps and issues in several traditional big data classification techniques are explained to help investigators extend their work toward effective big data management.

    K-means clustering algorithms: A comprehensive review, variants analysis, and advances in the era of big data

    Recent advances in scientific data collection techniques in the era of big data allow large quantities of data to be accumulated systematically at various data-capturing sites. At the same time, the literature reports exponential growth in the development of data analysis approaches, among which the K-means algorithm remains the most popular and straightforward clustering algorithm. Its broad applicability across many clustering application areas can be attributed to its implementation simplicity and low computational complexity. However, the K-means algorithm faces many challenges that negatively affect its clustering performance. During initialization, users must specify the number of clusters in a dataset a priori, while the initial cluster centers are selected randomly. Furthermore, the algorithm's performance is sensitive to the choice of these initial centers, and for large datasets, determining the optimal number of clusters to start with is complex and very challenging. Moreover, because of its greedy nature, the random selection of initial cluster centers sometimes leads to convergence at a local minimum. A further limitation is that the similarity between data objects is determined from their features using the Euclidean distance metric, which limits the algorithm's robustness in detecting other cluster shapes and makes detecting overlapping clusters very challenging. Many research efforts reported in the literature aim to improve the K-means algorithm's performance and robustness. The current work presents an overview and taxonomy of the K-means clustering algorithm and its variants. The history of K-means, current trends, open issues and challenges, and recommended future research perspectives are also discussed.
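    The initialization and distance issues described above can be seen in a minimal Lloyd-style K-means sketch. It assumes points are plain numeric tuples, draws k random data points as the initial centers, and uses Euclidean distance. The function names (`kmeans`, `sse`) are illustrative and are not taken from the reviewed paper.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance, the similarity measure discussed in the abstract."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(points, k, max_iter=100, seed=None):
    """Lloyd-style K-means: k is given a priori, initial centers are random."""
    rng = random.Random(seed)
    # Initialization: k distinct data points chosen at random as centers.
    centers = [list(p) for p in rng.sample(points, k)]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest center (greedy choice).
        labels = [min(range(k), key=lambda j: euclidean(p, centers[j]))
                  for p in points]
        # Update step: move each center to the mean of its members.
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append([sum(c) / len(members) for c in zip(*members)])
            else:
                new_centers.append(centers[j])  # keep an empty cluster's center
        if new_centers == centers:
            break  # no center moved: converged, possibly to a local optimum
        centers = new_centers
    return centers, labels


def sse(points, centers, labels):
    """Within-cluster sum of squared errors, useful for comparing restarts."""
    return sum(euclidean(p, centers[lab]) ** 2 for p, lab in zip(points, labels))
```

    On well-separated toy data, every seed should reach the same partition. On overlapping or non-spherical data, different seeds can end in different local optima. Running several seeds and keeping the result with the lowest `sse` is one common, simple mitigation.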

    Hybrid CLAHE-CNN Deep Neural Networks for Classifying Lung Diseases from X-ray Acquisitions

    Chest and lung diseases are among the most serious chronic diseases in the world. They result from factors such as smoking, air pollution, or bacterial infection, which expose the respiratory system and chest to serious disorders. Chest diseases weaken the respiratory system, and patients need careful attention to alleviate the problem. Countries are interested in encouraging medical research and monitoring the spread of communicable diseases. They have therefore encouraged researchers to conduct studies to curb the diseases' spread and to devise methods for detecting and distinguishing lung diseases quickly and easily. In this paper, we propose a hybrid architecture of contrast-limited adaptive histogram equalization (CLAHE) and a deep convolutional network for the classification of lung diseases. We used X-ray images to build a convolutional neural network (CNN) for early identification and categorization of lung diseases. First, a support vector machine (SVM) was used to classify the images with and without the CLAHE equalizer, and the results were compared with those of CNN networks. Two further experiments then combined deep CNN networks with CLAHE as an image-enhancement preprocessing step. The experimental results indicate that the proposed hybrid architecture outperforms traditional methods by roughly 20% in accuracy.

    A comprehensive study of machine learning for predicting cardiovascular disease using Weka and Statistical Package for Social Sciences tools

    Artificial intelligence (AI) is the simulation of human intelligence processes by machines and software simulators to help humans make accurate, informed, and fast decisions based on data analysis. The medical field can benefit from such AI simulators because medical data records are enormous and contain many overlapping parameters. Applying in-depth classification techniques and data analysis can be the first step in identifying and reducing risk factors. In this research, we evaluate a dataset of cardiovascular abnormalities affecting a group of potential patients. We use AI simulators such as Weka to understand how each parameter affects the risk of cardiovascular disease (CVD). We use seven classifiers: baseline accuracy, naïve Bayes, k-nearest neighbor, decision tree, support vector machine, linear regression, and an artificial neural network multilayer perceptron. The classifiers are assisted by a correlation-based filter that selects the most influential attributes, which may help obtain higher classification accuracy. The results from Weka and the Statistical Package for Social Sciences (SPSS) are analysed in terms of sensitivity, specificity, accuracy, and precision. A decision tree method (J48) classified CVD cases with a high accuracy of 95.76%.
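    The four evaluation metrics named above all follow from a binary confusion matrix. A minimal sketch, assuming labels are encoded as 1 (CVD) and 0 (no CVD). It illustrates the standard definitions and does not reproduce the Weka or SPSS computations from the study.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary task."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1
            else:
                fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def _ratio(num, den):
    """Safe division: NaN when the denominator is zero."""
    return num / den if den else float("nan")


def binary_metrics(y_true, y_pred, positive=1):
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    return {
        "sensitivity": _ratio(tp, tp + fn),  # true positive rate (recall)
        "specificity": _ratio(tn, tn + fp),  # true negative rate
        "accuracy": _ratio(tp + tn, tp + tn + fp + fn),
        "precision": _ratio(tp, tp + fp),    # positive predictive value
    }
```

    For example, `binary_metrics([1, 1, 1, 0, 0, 0, 0, 1], [1, 0, 1, 0, 0, 1, 1, 1])` gives sensitivity 0.75, specificity 0.5, accuracy 0.625, and precision 0.6. A single accuracy figure such as 95.76% says nothing about how the errors split between missed CVD cases and false alarms, so sensitivity and specificity are reported alongside it.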

    Moth Flame Optimization: Theory, Modifications, Hybridizations, and Applications

    The moth flame optimization (MFO) algorithm belongs to the swarm intelligence family and is applied to complex real-world optimization problems in numerous domains. MFO and its variants are easy to understand and simple to operate, and they have successfully solved optimization problems in areas such as power and energy systems, engineering design, economic dispatch, image processing, and medical applications. This paper presents a comprehensive review of MFO, covering the classic version and its binary, modified, hybrid, and multi-objective variants, as well as applications of the algorithm in various sectors. The MFO algorithm is also evaluated against other algorithms to measure its performance. The main focus of this work is to survey MFO and its applications. The concluding section discusses possible future research directions for the MFO algorithm and its variants.

    A novel secure cryptography model for data transmission based on Rotor64 technique

    In recent years, many security vulnerabilities have threatened users and exposed their files. Meanwhile, Internet use has become pervasive and the number of digital network devices has grown. Maintaining the confidentiality and integrity of information has therefore become an urgent necessity, given the growing number of hackers and intruders and the daily emergence of new penetration methods. Data cryptography has proven to be a secure way to protect a user's data. However, many current cryptography algorithms are considered weak for data transmission over the Internet, so updated algorithms are in high demand. In this paper, we propose extending the historical rotor machine with the base64 encoding technique: the alphabet of the original rotor machine is replaced by the 64-character base64 alphabet. We also propose a key exchange based on a one-time password (OTP) code sent via SMS. An OTP is a password that can be used for only a single login, which overcomes the weakness of static passwords. Here, the OTP is also used to generate the subkeys for the rotor machines through hashing and random permutation techniques. The MD5 hash function is used to authenticate the original message. Finally, we tested these techniques by sending e-mails whose contents were encrypted with the proposed technique, and the technique achieved promising results.
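    The base64 rotor idea can be sketched as follows. The abstract states only the base64 alphabet, OTP-derived subkeys via hashing and random permutation, and MD5 authentication. Everything else here is an assumption: three rotors, odometer-style stepping, seeding Python's `random` with a SHA-256 digest of a hypothetical OTP string to permute each rotor, and the function names. Python's `random` is not a cryptographic generator and MD5 is no longer collision-resistant, so treat this as a teaching sketch rather than a usable cipher.

```python
import base64
import hashlib
import random

# Standard base64 alphabet (RFC 4648): its 64 symbols label the rotor contacts.
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
INDEX = {ch: i for i, ch in enumerate(ALPHABET)}
N = len(ALPHABET)  # 64


def make_rotors(otp, count=3):
    """Derive rotor wirings from a one-time code (hypothetical scheme)."""
    rotors = []
    for r in range(count):
        # Hash the OTP plus the rotor number to seed a per-rotor permutation.
        digest = hashlib.sha256(f"{otp}:{r}".encode("utf-8")).digest()
        wiring = list(range(N))
        random.Random(digest).shuffle(wiring)  # not a CSPRNG: illustration only
        inverse = [0] * N
        for i, w in enumerate(wiring):
            inverse[w] = i
        rotors.append((wiring, inverse))
    return rotors


def _step(offsets):
    """Odometer stepping: advance rotor 0 and carry into the next on wrap."""
    for k in range(len(offsets)):
        offsets[k] = (offsets[k] + 1) % N
        if offsets[k] != 0:
            break


def _run(text, rotors, backward):
    """Pass each base64 symbol through the rotors, stepping after each one."""
    offsets = [0] * len(rotors)
    order = range(len(rotors) - 1, -1, -1) if backward else range(len(rotors))
    out = []
    for ch in text:
        if ch not in INDEX:  # '=' padding passes through unchanged
            out.append(ch)
            continue
        i = INDEX[ch]
        for k in order:
            table = rotors[k][1] if backward else rotors[k][0]
            o = offsets[k]
            i = (table[(i + o) % N] - o) % N
        out.append(ALPHABET[i])
        _step(offsets)
    return "".join(out)


def encrypt(message, otp):
    b64 = base64.b64encode(message.encode("utf-8")).decode("ascii")
    return _run(b64, make_rotors(otp), backward=False)


def decrypt(ciphertext, otp):
    b64 = _run(ciphertext, make_rotors(otp), backward=True)
    return base64.b64decode(b64).decode("utf-8")


def md5_tag(message):
    """MD5 digest used as an integrity check, as in the abstract."""
    return hashlib.md5(message.encode("utf-8")).hexdigest()
```

    A sender would compute `md5_tag(message)` and `encrypt(message, otp)`. The receiver, holding the same OTP, calls `decrypt` and compares the tag. Because each rotor's offset changes after every symbol, repeated plaintext symbols generally encrypt to different ciphertext symbols. This stepping is the property that distinguishes a rotor machine from a fixed substitution.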

    Improved prairie dog optimization algorithm by dwarf mongoose optimization algorithm for optimization problems

    Optimization problems arise in many domains and require powerful search methods to address them. This paper proposes a novel hybrid optimization algorithm, called IPDOA, and applies it to various benchmark functions. The proposed method enhances the search process of the Prairie Dog Optimization Algorithm (PDOA) by using the primary updating mechanism of the Dwarf Mongoose Optimization Algorithm (DMOA). IPDOA aims to overcome the main weaknesses of the original methods: poor convergence ability, imbalance between the search phases, and premature convergence. Experiments are conducted on 23 standard benchmark functions, and the results are compared with similar methods from the literature. The results, reported as the best, worst, and average fitness function values, show that the proposed method handles the various problems more effectively than the other methods.

    Evolution of Machine Learning in Tuberculosis Diagnosis: A Review of Deep Learning-Based Medical Applications

    Tuberculosis (TB) is an infectious disease that has been a major global threat to human health, causing millions of deaths every year. Timely diagnosis and treatment are key to a patient's full recovery. Computer-aided diagnosis (CAD) has been a promising option for TB diagnosis. Many CAD approaches based on machine learning, a subfield of artificial intelligence (AI), have been applied to TB diagnosis and have contributed to the resurgence of AI in the medical field. Deep learning (DL), a major branch of AI, offers greater scope for diagnosing this deadly disease. This review focuses on the limitations of conventional TB diagnostics and gives a broad description of various machine learning algorithms and their applications in TB diagnosis. It also discusses deep learning methods integrated with other systems such as neuro-fuzzy logic, genetic algorithms, and artificial immune systems. Finally, several state-of-the-art tools such as CAD4TB, Lunit INSIGHT, qXR, and InferRead DR Chest are summarized to outline the future of AI-assisted TB diagnosis.

    A Comprehensive Review of Bat Inspired Algorithm: Variants, Applications, and Hybridization

    The bat algorithm (BA) is a promising metaheuristic algorithm. It has proved efficient on various optimization problems in diverse fields, such as power and energy systems, economic load dispatch, engineering design, image processing, and medical applications. This paper presents a comprehensive and exhaustive review of BA and evaluates its main characteristics by comparing it with other optimization algorithms. The review highlights the performance of BA in different applications and the modifications researchers have made to it (i.e., variants of BA). The conclusions summarize current work on BA, highlight its weaknesses, and suggest possible future research directions. The review will be helpful to researchers and practitioners of BA across a wide range of domains, including optimization, engineering, medicine, data mining, and clustering. It also covers a wealth of research on health, the environment, and public safety, and it points interested readers toward potential future research.