23 research outputs found

    A comprehensive study of machine learning for predicting cardiovascular disease using Weka and Statistical Package for Social Sciences tools

    Artificial intelligence (AI) simulates human intelligence processes using machines and software to help humans make accurate, informed, and fast decisions based on data analysis. The medical field can benefit from such AI tools because medical data records are enormous, with many overlapping parameters. In-depth classification techniques and data analysis can be a first step in identifying and reducing risk factors. In this research, we evaluate a dataset of cardiovascular abnormalities affecting a group of potential patients. We aim to employ AI tools such as Weka to understand the effect of each parameter on the risk of suffering from cardiovascular disease (CVD). We utilize seven classification approaches: a baseline accuracy measure, naïve Bayes, k-nearest neighbor, decision tree, support vector machine, linear regression, and an artificial neural network (multilayer perceptron). The classifiers are assisted by a correlation-based filter that selects the most influential attributes, which may help obtain higher classification accuracy. An analysis of the sensitivity, specificity, accuracy, and precision results from Weka and the Statistical Package for the Social Sciences (SPSS) is presented. A decision tree method (J48) demonstrated its ability to classify CVD cases with a high accuracy of 95.76%.
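
    None of the listed outputs include code, but the evaluation this abstract describes rests on four standard metrics. The sketch below (not the paper's code) computes sensitivity, specificity, accuracy, and precision from a binary confusion matrix; the toy label vectors are invented for the example.

```python
# Minimal sketch: the four metrics the study reports, computed from a binary
# confusion matrix. Labels: 1 = CVD present (positive), 0 = absent. The toy
# data below are invented, not from the cited dataset.

def confusion_counts(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn

def metrics(y_true, y_pred):
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return {
        "sensitivity": tp / (tp + fn),   # recall on the positive (CVD) class
        "specificity": tn / (tn + fp),   # recall on the negative class
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "precision": tp / (tp + fp),
    }

# toy example: tp=2, fn=1, tn=2, fp=1
m = metrics([1, 1, 1, 0, 0, 0], [1, 1, 0, 0, 0, 1])
```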

    Cyberbullying detection framework for short and imbalanced Arabic datasets

    Cyberbullying detection has attracted many researchers seeking to detect negative comments posted on communication platforms, as cyberbullying can take many forms: verbal, implicit, explicit, or even nonverbal. The rapid growth of social media in recent years has opened new perspectives on cyberbullying detection, although related research still encounters several challenges, such as data imbalance and implicit expression. In this paper, we propose an automated cyberbullying detection framework designed to produce satisfactory results, especially when imbalanced short texts and different dialects exist in Arabic text data. In the proposed framework, a new method is suggested to solve the imbalance problem: a modified simulated annealing optimization algorithm finds the optimal set of samples from the majority class to balance the training set. The method has been evaluated using traditional machine learning algorithms, including the support vector machine, and deep learning algorithms, including Long Short-Term Memory (LSTM) and Bidirectional LSTM (Bi-LSTM). To produce a framework that can detect Arabic-written cyberbullying on communication platforms, accuracy, recall, specificity, sensitivity, and mean squared error are used as the main performance indicators. The results indicate that the proposed framework improves the performance of the tested algorithms, and that Bi-LSTM outperforms the other methods for cyberbullying classification.
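
    The undersampling step described above can be sketched as a simulated-annealing search over majority-class subsets. This is a hypothetical toy version: the fitness used here (spread of the selected samples) is a placeholder, not the paper's actual objective, and the 1-D "samples" are invented.

```python
import math
import random

# Toy sketch of simulated-annealing undersampling: pick a subset of the
# majority class (same size as the minority class) to balance training data.
# The spread-based fitness is a stand-in for the paper's real objective.

def spread(indices, data):
    pts = [data[i] for i in indices]
    return sum(abs(a - b) for j, a in enumerate(pts) for b in pts[j + 1:])

def sa_undersample(majority, k, iters=500, t0=1.0, cooling=0.995, seed=0):
    rng = random.Random(seed)
    current = rng.sample(range(len(majority)), k)   # distinct start indices
    best, best_f = list(current), spread(current, majority)
    t = t0
    for _ in range(iters):
        new = rng.randrange(len(majority))
        if new not in current:                      # keep indices distinct
            cand = list(current)
            cand[rng.randrange(k)] = new            # swap one selected sample
            delta = spread(cand, majority) - spread(current, majority)
            # accept improvements always, worse moves with temperature-scaled odds
            if delta > 0 or rng.random() < math.exp(delta / t):
                current = cand
                f = spread(current, majority)
                if f > best_f:
                    best, best_f = list(current), f
        t *= cooling
    return [majority[i] for i in best]

majority = [0.0, 0.1, 0.2, 5.0, 5.1, 9.8, 9.9, 10.0]
balanced = sa_undersample(majority, k=3)  # 3 samples to match a 3-sample minority
```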

    Comparison of specific segmentation methods used for copy move detection

    In this digital age, the widespread use of digital images and the availability of image editors have made the credibility of images controversial. To confirm the credibility of digital images, many types of image forgery detection have arisen. Copy-move forgery consists of transforming an image by duplicating a part of it, to add or hide existing objects. Several methods have been proposed in the literature to detect copy-move forgery; these methods use keypoint-based and block-based techniques to find the duplicated areas. However, keypoint-based and block-based techniques have difficulty handling smooth regions. In addition, image segmentation plays a vital role in changing the representation of an image into a form that is more meaningful for analysis. Hence, we carry out a comparative study of segmentation based on two clustering algorithms: k-means, and superpixel segmentation with density-based spatial clustering of applications with noise (DBSCAN). The paper compares the methods in terms of the accuracy of detecting forged regions in digital images. K-means shows better performance than DBSCAN and other techniques in the literature.
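
    The k-means step underlying this segmentation comparison can be sketched in a few lines. This toy version clusters 2-D points; a real pipeline would cluster per-pixel colour/position feature vectors, and DBSCAN would replace the assignment loop in the competing method.

```python
import random

# Minimal k-means sketch (pure Python, 2-D points). Clusters returned are the
# assignments under the previous centres, which is sufficient for a sketch.

def kmeans(points, k, iters=20, seed=0):
    rng = random.Random(seed)
    centers = rng.sample(points, k)                  # initial centres from data
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            # assign each point to its nearest centre (squared distance)
            j = min(range(k),
                    key=lambda c: (p[0] - centers[c][0]) ** 2
                                + (p[1] - centers[c][1]) ** 2)
            clusters[j].append(p)
        # recompute centres as cluster means; keep old centre if cluster empty
        centers = [
            (sum(p[0] for p in cl) / len(cl), sum(p[1] for p in cl) / len(cl))
            if cl else centers[j]
            for j, cl in enumerate(clusters)
        ]
    return centers, clusters

# two well-separated blobs of invented points
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, clusters = kmeans(points, k=2)
```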

    Susceptible exposed infectious recovered-machine learning for COVID-19 prediction in Saudi Arabia

    Susceptible-exposed-infectious-recovered (SEIR) is among the epidemiological models used to forecast the spread of disease in large populations, and it is a fitting model for predicting the spread of coronavirus disease (COVID-19). However, in its original form, SEIR cannot measure the impact of lockdowns. Therefore, in the SEIR equation system used in this study, a variable was included to evaluate the impact of varying levels of social distancing on the transmission of COVID-19. Additionally, we applied artificial intelligence using the deep neural network (DNN) machine learning (ML) technique. This improved SEIR model was applied to the initial spread data for Saudi Arabia available up to June 25th, 2021. The study shows that, without a lockdown, around 3.1 million people in Saudi Arabia could have been infected at the peak of the spread, which lasts about 3 months beginning from the lockdown date (March 21st). On the other hand, the Kingdom's partial lockdown policy was estimated to cut the number of infections to 0.5 million over nine months. The data show that stricter lockdowns can successfully flatten the COVID-19 curve in Saudi Arabia. We successfully predicted the peaks and sizes of the COVID-19 epidemic using our modified DNN and SEIR model.
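
    The lockdown modification can be illustrated with a minimal discrete-time SEIR sketch in which a factor `ld` in [0, 1] scales the contact rate, standing in for the social-distancing variable the study adds. This is not the paper's exact equation system, and all parameter values below are illustrative only.

```python
# Discrete-time (Euler) SEIR sketch with a lockdown factor `ld` scaling the
# contact rate beta. Parameters (beta, sigma, gamma) are illustrative, not
# fitted to the Saudi data.

def seir_peak(days, N, beta=0.5, sigma=1 / 5.2, gamma=1 / 10, ld=0.0,
              E0=10, dt=1.0):
    S, E, I, R = N - E0, float(E0), 0.0, 0.0
    peak_I = 0.0
    for _ in range(int(days / dt)):
        beta_eff = beta * (1.0 - ld)            # lockdown reduces transmission
        new_exposed = beta_eff * S * I / N      # S -> E
        new_infectious = sigma * E              # E -> I
        new_recovered = gamma * I               # I -> R
        S -= dt * new_exposed
        E += dt * (new_exposed - new_infectious)
        I += dt * (new_infectious - new_recovered)
        R += dt * new_recovered
        peak_I = max(peak_I, I)
    return peak_I

no_lockdown = seir_peak(300, N=1_000_000)
partial = seir_peak(300, N=1_000_000, ld=0.5)
# a stricter lockdown flattens the curve: partial peak < no-lockdown peak
```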

    Improved multi-verse optimizer feature selection technique with application to phishing, spam, and denial of service attacks

    Intelligent classification systems have proved their merits in different fields, including cybersecurity. However, most cybercrime issues are dynamic rather than static classification problems, where the set of discriminative features keeps changing over time. This requires revising the cybercrime classification system and picking a group of features that preserves or enhances its performance. Moreover, system compactness is an important factor in judging the capability of any classification system, and cybercrime classification systems are no exception. The current research proposes an improved feature selection algorithm inspired by the well-known multi-verse optimizer (MVO) algorithm. The algorithm is applied to three different cybercrime classification problems, namely phishing websites, spam, and denial of service attacks. MVO is a population-based approach that simulates a well-known theory in physics, the multi-verse theory. MVO uses the white hole and black hole principles for exploration and the wormhole principle for exploitation. A roulette selection scheme is used to model the white hole and black hole principles in the exploration phase. This scheme is biased toward good solutions, so solutions tend to move toward the best solution and diversity may be lost; other solutions may contain important information but never get the chance to be improved. Thus, this research improves the exploration of MVO by introducing adaptive neighborhood search operations when updating the MVO solutions. In the classification phase, a classifier is used to evaluate the results and validate the selected features. Empirical outcomes confirm that the improved MVO (IMVO) algorithm enhances the search capability of MVO and outperforms the other algorithms involved in the comparison.
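
    The roulette-wheel (fitness-proportionate) selection that biases MVO's exploration toward good solutions can be sketched as follows. The cost values are invented, and minimisation is assumed, so wheel slices invert the cost; the adaptive neighborhood search the paper adds is not shown.

```python
import random

# Roulette-wheel selection sketch: fitter "universes" (lower cost) receive a
# larger slice of the wheel and are therefore selected more often.

def roulette_select(costs, rng):
    worst = max(costs)
    # invert costs so lower cost -> larger weight; epsilon avoids all-zero wheel
    weights = [worst - c + 1e-9 for c in costs]
    total = sum(weights)
    r = rng.random() * total
    acc = 0.0
    for i, w in enumerate(weights):
        acc += w
        if r <= acc:
            return i
    return len(costs) - 1   # guard against float rounding

rng = random.Random(1)
costs = [5.0, 1.0, 9.0]     # universe 1 is the fittest, universe 2 the worst
picks = [roulette_select(costs, rng) for _ in range(5000)]
# universe 1 dominates the selections; universe 2 is almost never picked
```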

    Learning trends in customer churn with rule-based and kernel methods

    In the present article, an attempt has been made to predict the occurrences of customers leaving, or 'churning' from, a business enterprise and to explain the possible causes of customer churn. Three different algorithms are used to predict churn: decision tree, support vector machine, and rough set theory. Two of these are rule-based learning methods, which lead to more interpretable results that might help the marketing division retain customers or hasten cross-selling; the third is a kernel-based classifier that separates the customers on a feature hyperplane. The nature of the predictions, and the rules obtained from them, provides a choice between a more focused or a more extensive program that the company may wish to implement as part of its customer retention efforts.
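
    The interpretability contrast can be illustrated with a one-rule "decision stump": a single threshold on one numeric feature yields a human-readable churn rule, whereas a kernel hyperplane does not. The feature name and data below are invented for the example, not taken from the article.

```python
# Toy one-rule learner: find the threshold on a single numeric feature
# (hypothetical "months inactive") that best separates churners, and print
# it as a readable rule -- the kind of output rule-based methods provide.

def best_stump(xs, ys):
    best_t, best_acc = None, -1.0
    pts = sorted(set(xs))
    # candidate thresholds midway between consecutive feature values;
    # the stump predicts churn when the feature exceeds the threshold
    for a, b in zip(pts, pts[1:]):
        t = (a + b) / 2
        acc = sum((x > t) == y for x, y in zip(xs, ys)) / len(xs)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t, best_acc

months = [1, 2, 3, 8, 9, 10]                          # invented feature values
churned = [False, False, False, True, True, True]     # invented labels
t, acc = best_stump(months, churned)
print(f"IF months_inactive > {t} THEN churn  (train accuracy {acc:.0%})")
```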

    Use of production functions in assessing the profitability of shares of insurance companies

    In this study, production functions (Cobb-Douglas, Zellner-Revankar, and the transcendental production function) have been used to assess the profitability of insurance companies, by reformulating these nonlinear functions through the introduction of a set of variables that increase the explanatory capacity of the model. The production function best suited to the nature of the variable representing the profitability of insurance companies was then chosen, in order to use it to assess the efficiency of their profitability against the use of different factors of production, and thus to allow its use in forecasting. It was found that the proposed model of the "Zellner-Revankar" production function best represents the profitability of the Tawuniya and Bupa insurance companies, while the proposed model of the Cobb-Douglas production function is suitable for the results of both the Enaya and Sanad Cooperative insurance companies. The explanatory capacity of the production functions also increased when the proposed variables (net subscribed premiums, net claims incurred) were added.
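
    The standard estimation trick for such functions is log-linearisation. As a sketch (with synthetic data, not the cited companies' figures), a one-input Cobb-Douglas function Q = A·X^a becomes linear after taking logs, log Q = log A + a·log X, so ordinary least squares recovers the scale A and the elasticity a.

```python
import math

# One-input Cobb-Douglas fit by log-linear OLS. The data are generated
# exactly from Q = 2 * X**0.7, so the fit recovers A = 2 and a = 0.7.

def fit_cobb_douglas(xs, qs):
    lx = [math.log(x) for x in xs]
    lq = [math.log(q) for q in qs]
    n = len(xs)
    mx, mq = sum(lx) / n, sum(lq) / n
    # OLS slope on the log-log data = output elasticity a
    a = (sum((u - mx) * (v - mq) for u, v in zip(lx, lq))
         / sum((u - mx) ** 2 for u in lx))
    A = math.exp(mq - a * mx)   # intercept back-transformed to the scale A
    return A, a

xs = [1.0, 2.0, 4.0, 8.0]                 # synthetic input levels
qs = [2.0 * x ** 0.7 for x in xs]         # exact Cobb-Douglas outputs
A, a = fit_cobb_douglas(xs, qs)
# A ≈ 2.0, a ≈ 0.7
```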

    A novel population-based local search for nurse rostering problem

    Population-based approaches are generally better than single-solution (local search) approaches at exploring the search space. However, the drawback of population-based approaches lies in exploiting the search space. Several hybrid approaches have proven their efficiency across different domains of optimization problems by integrating the strengths of population-based and local search approaches. Meanwhile, hybrid methods have the drawback of increasing the number of parameters to tune. Recently, a population-based local search (PB-LS) was proposed for the university course-timetabling problem with fewer parameters than existing approaches, and it proved its effectiveness. The approach employs two operators to intensify and diversify the search: the first operator is applied to a single solution, while the second is applied to all solutions. This paper investigates the performance of population-based local search on the nurse rostering problem. The INRC2010 database, with a dataset composed of 69 instances, is used to test the performance of PB-LS. A comparison was made between the performance of PB-LS and other existing approaches in the literature. The results show the good performance of the proposed approach compared to the others: population-based local search provided the best results in 55 of the 69 instances used in the experiments.
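
    The two-operator scheme can be sketched on a continuous toy objective (the real problem is a combinatorial nurse roster, so this is only an illustration of the structure): operator 1 intensifies around the current best solution with small steps, operator 2 perturbs every solution to diversify.

```python
import random

# Toy population-based local search on f(x) = x**2. Operator 1: small local
# step on the best solution (intensification). Operator 2: larger perturbation
# of every solution, keeping improvements (diversification).

def pb_ls(f, pop_size=10, iters=200, seed=0):
    rng = random.Random(seed)
    pop = [rng.uniform(-10, 10) for _ in range(pop_size)]
    for _ in range(iters):
        # operator 1: refine the single best solution
        best_i = min(range(pop_size), key=lambda i: f(pop[i]))
        trial = pop[best_i] + rng.gauss(0, 0.1)
        if f(trial) < f(pop[best_i]):
            pop[best_i] = trial
        # operator 2: perturb all solutions, accept improvements only
        for i in range(pop_size):
            cand = pop[i] + rng.gauss(0, 1.0)
            if f(cand) < f(pop[i]):
                pop[i] = cand
    return min(pop, key=f)

x_best = pb_ls(lambda x: x * x)   # converges near the optimum x = 0
```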

    Hybrid feature selection method based on particle swarm optimization and adaptive local search method

    Machine learning has been extensively studied, with data classification as the most popular research subject. The accuracy of prediction is affected by the data provided to the classification algorithm. Meanwhile, utilizing a large amount of data may incur costs, especially in data collection and preprocessing. Studies on feature selection have mainly aimed to establish techniques that decrease the number of features (attributes) used in classification; using data that generate accurate predictions is also important. Hence, a particle swarm optimization (PSO) algorithm is suggested in the current article for selecting the ideal set of features. The PSO algorithm has been shown to be superior in different domains at exploring the search space, while local search algorithms are good at exploiting search regions. Thus, we propose hybridizing the PSO algorithm with an adaptive local search technique that works based on the current PSO search state and is used to accept candidate solutions. This combination balances the local intensification and the global diversification of the search process. The suggested algorithm surpasses the original PSO algorithm and other comparable approaches in terms of performance.
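
    A compact binary-PSO sketch conveys the feature-selection idea: each particle is a 0/1 mask over features, moved by sigmoid-squashed velocities toward its personal best and the global best. The fitness below is a placeholder (it rewards two "informative" features of a synthetic problem and penalises mask size, rather than training a classifier), and the paper's adaptive local search is omitted for brevity.

```python
import math
import random

# Binary PSO sketch for feature selection. Placeholder fitness: reward
# keeping the informative features of a synthetic problem, penalise extras.

def fitness(mask, informative=(0, 3)):
    hits = sum(1 for i in informative if mask[i])
    return hits - 0.1 * sum(mask)

def binary_pso(n_feats, n_particles=8, iters=60, seed=0):
    rng = random.Random(seed)
    sig = lambda v: 1.0 / (1.0 + math.exp(-v))      # velocity -> bit probability
    X = [[rng.randint(0, 1) for _ in range(n_feats)] for _ in range(n_particles)]
    V = [[0.0] * n_feats for _ in range(n_particles)]
    P = [list(x) for x in X]                         # personal bests
    g = list(max(X, key=fitness))                    # global best
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(n_feats):
                # pull each bit toward the personal and global bests
                V[i][d] += (2 * rng.random() * (P[i][d] - X[i][d])
                            + 2 * rng.random() * (g[d] - X[i][d]))
                X[i][d] = 1 if rng.random() < sig(V[i][d]) else 0
            if fitness(X[i]) > fitness(P[i]):
                P[i] = list(X[i])
            if fitness(P[i]) > fitness(g):
                g = list(P[i])
    return g

mask = binary_pso(n_feats=6)
# ideally the mask converges to keep the informative features 0 and 3
```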

    Memory based cuckoo search algorithm for feature selection of gene expression dataset

    Cancer prediction has been shown to be important in cancer research. This importance has prompted many researchers to review machine learning approaches for predicting cancer outcomes using gene expression datasets. Such a dataset consists of many genes (features) that can mislead the prediction ability of machine learning methods, as some features may lead to confusion or inaccurate classification. Since finding the most informative genes for cancer prediction is challenging, feature selection techniques are recommended to pick the important and relevant features out of large and complex datasets. In this research, we propose the cuckoo search method as a feature selection algorithm, guided by a memory-based mechanism that saves the most informative features identified by the best solutions. The purpose of the memory is to keep track of the selected features at every iteration and to find the features that enhance classification accuracy. The suggested algorithm has been contrasted with the original algorithm using microarray datasets and has been shown to produce good results compared to the original and contemporary algorithms.
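
    The memory mechanism can be sketched alongside a heavily simplified cuckoo-search-like loop (Lévy flights omitted; a random "egg" replaces a worse nest). A per-feature counter records how often each feature appears in the best nest, approximating the idea of tracking the most informative genes across iterations. The fitness is a placeholder, not a trained classifier.

```python
import random

# Simplified memory-guided cuckoo search sketch for feature selection.
# `memory[d]` counts how often feature d appears in the best nest, an
# approximation of the paper's mechanism for keeping informative genes.

def memory_cuckoo(n_feats, fitness, nests=10, iters=100, seed=0):
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_feats)] for _ in range(nests)]
    memory = [0] * n_feats
    for _ in range(iters):
        # lay a new random egg in a random nest; keep it if fitter
        i = rng.randrange(nests)
        egg = [rng.randint(0, 1) for _ in range(n_feats)]
        if fitness(egg) > fitness(pop[i]):
            pop[i] = egg
        # record the features of the current best nest in the memory
        best = max(pop, key=fitness)
        for d in range(n_feats):
            memory[d] += best[d]
    return max(pop, key=fitness), memory

# placeholder fitness: features 1 and 2 are the "informative genes"
fit = lambda m: m[1] + m[2] - 0.2 * sum(m)
best, memory = memory_cuckoo(4, fit)
# the memory counts should favour the informative features 1 and 2
```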