5 research outputs found

    Modified Genetic Algorithm for Feature Selection and Hyper Parameter Optimization: Case of XGBoost in Spam Prediction

    Recently, spam on online social networks has attracted attention in the research and business worlds. Twitter has become the preferred medium for spreading spam content. Many research efforts have attempted to counter social network spam. Twitter brings extra challenges in the form of feature space size and imbalanced data distributions. Usually, the related research works focus on only some of these main challenges or produce black-box models. In this paper, we propose a modified genetic algorithm for simultaneous dimensionality reduction and hyperparameter optimization over imbalanced datasets. The algorithm initializes an eXtreme Gradient Boosting classifier and reduces the feature space of a tweets dataset to generate a spam prediction model. The model is validated using 50 times repeated 10-fold stratified cross-validation and analyzed using nonparametric statistical tests. The resulting prediction model attains on average 82.32% and 92.67% in terms of geometric mean and accuracy, respectively, utilizing less than 10% of the total feature space. The empirical results show that the modified genetic algorithm outperforms the Chi^2 and PCA feature selection methods. In addition, eXtreme Gradient Boosting outperforms many machine learning algorithms, including a BERT-based deep learning model, in spam prediction. Furthermore, the proposed approach is applied to SMS spam modeling and compared to related works.
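The abstract above reports both geometric mean and accuracy because accuracy alone can look strong on imbalanced data. The following is a minimal sketch, assuming the common definition of geometric mean as the square root of sensitivity times specificity; the abstract does not spell out its formula, and the function name and toy labels are hypothetical.

```python
import math

def geometric_mean_score(y_true, y_pred, positive=1):
    """G-mean = sqrt(sensitivity * specificity); assumed definition, binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    # Guard against empty classes so the function never divides by zero.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return math.sqrt(sensitivity * specificity)

def accuracy_score(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)

# Toy imbalanced example (hypothetical data): 2 spam, 8 non-spam.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
g_mean = geometric_mean_score(y_true, y_pred)
accuracy = accuracy_score(y_true, y_pred)
```

In this toy case accuracy is 0.9 while the geometric mean is about 0.71, because half of the minority class is missed.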

    Modeling the Telemarketing Process using Genetic Algorithms and Extreme Boosting: Feature Selection and Cost-Sensitive Analytical Approach

    Currently, almost all direct marketing activities take place virtually rather than in person, weakening interpersonal skills at an alarming pace. Furthermore, businesses have been striving to sense and foster the tendency of their clients to accept a marketing offer. The digital transformation and the increased virtual presence have forced firms to seek novel marketing research approaches. This research aims to leverage telemarketing data to model the willingness of clients to make a term deposit and to find the most significant characteristics of those clients. Real-world data from a Portuguese bank and national socio-economic metrics are used to model the telemarketing decision-making process. This research makes two key contributions. First, it proposes a novel genetic algorithm-based classifier that selects the best discriminating features and tunes classifier parameters simultaneously. Second, it builds an explainable prediction model. The best-generated classification models were intensively validated using 50 times repeated 10-fold stratified cross-validation, and the selected features were analyzed. The models significantly outperform the related works in terms of class-of-interest accuracy; they attained an average of 89.07% and 0.059 in terms of geometric mean and type I error, respectively. The model is expected to maximize the potential profit margin at the least possible cost and provide more insights to support marketing decision-making.
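Selecting features and tuning classifier parameters simultaneously is often done by packing both into one chromosome. The sketch below illustrates that general idea only; the encoding, the gene ranges, and the feature names are hypothetical and are not taken from the paper.

```python
import random

def random_chromosome(n_features, rng):
    """One candidate: a boolean feature mask plus two real-valued genes in [0, 1)."""
    mask = [rng.random() < 0.5 for _ in range(n_features)]
    genes = [rng.random(), rng.random()]
    return mask, genes

def decode(chromosome, feature_names):
    """Map a chromosome to (selected features, classifier parameters).

    The parameter ranges below are illustrative assumptions.
    """
    mask, genes = chromosome
    selected = [name for name, keep in zip(feature_names, mask) if keep]
    params = {
        "max_depth": 2 + int(genes[0] * 8),       # integer in 2..9
        "learning_rate": 0.01 + genes[1] * 0.29,  # float in [0.01, 0.30)
    }
    return selected, params

# Hypothetical feature names for a telemarketing dataset.
feature_names = ["age", "call_duration", "previous_contacts"]
selected, params = decode(([True, False, True], [0.5, 0.0]), feature_names)
candidate = random_chromosome(len(feature_names), random.Random(0))
```

A fitness function would then train a classifier on the selected columns with the decoded parameters and score it, for example by cross-validated geometric mean.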

    Multi-author document decomposition based on authorship

    University of Technology Sydney. Faculty of Engineering and Information Technology. Decomposing a document written by more than one author into sentences based on authorship is of great significance due to the increasing demand for plagiarism detection, forensic analysis, civil law (i.e., disputed copyright issues) and intelligence issues that involve disputed anonymous documents. Among the existing studies on document decomposition, some were limited to specific languages, depended on topics, or were restricted to documents by two authors, and their accuracies leave considerable room for improvement. In this thesis, we propose novel approaches for decomposing a multi-author document written in any language, regardless of topic, based on a Naive-Bayesian model and a Hidden Markov Model (HMM). The proposed Naive-Bayesian approaches aim to exploit the difference in posterior probability to improve decomposition performance. Two main procedures are proposed based on the Naive-Bayesian model: the Segment Elicitation Procedure and the Probability Indication Procedure. The Segment Elicitation Procedure is proposed to form a strongly labeled training dataset. The Probability Indication Procedure is developed to improve the purity of the sentence decomposition. The proposed HMM approaches strive to exploit the contextual correlation hidden among sentences when determining their authorship. This thesis is the first to consider the sequential patterns hidden among document elements for such a problem. To build and learn the HMM, a new unsupervised learning method is proposed to estimate its initial parameters. The proposed frameworks do not require any information about the authors or the document's context other than how many authors contributed to writing the document. The effectiveness of the proposed algorithms is demonstrated using benchmark datasets that are widely used for authorship analysis of documents. Furthermore, scientific papers are used to demonstrate the performance of the proposed approaches on authentic documents. Comparisons with recent state-of-the-art approaches are also presented to demonstrate the significance of our new ideas and the superior performance of the proposed approaches.

    Evaluation of sex hormone profiles and seminal fluid analysis in psoriatic patients and their correlation with psoriasis severity

    Background and objective: Psoriasis is a chronic inflammatory skin condition characterized by thick silvery plaques, commonly involving the elbows, knees, lower back, and scalp. Psoriasis also affects the reproductive systems of patients. Males with untreated psoriasis are at risk of impaired fertility due to chronic systemic inflammation, which might affect the hormonal profile and sexual accessory glands. In females, having psoriasis does not affect the chances of getting pregnant. This study aims to assess the effect of psoriasis, as a chronic inflammatory condition, on sex hormone profiles and seminal fluid parameters. Methods: 87 male patients aged 18−50 with psoriasis who fulfilled the inclusion criteria were included in the study and matched with healthy controls. Demographic and clinical data, including age, severity, duration, and body mass index (BMI), were recorded. All patients underwent a complete physical exam, including a skin and andrological exam, in addition to scrotal ultrasound and seminal fluid analysis. Blood tests were conducted for a complete hormonal profile, including luteinizing hormone (LH), follicle-stimulating hormone (FSH), testosterone, and estradiol. Results: The mean age of the case group was 39.5 ± 5.6 years, and the mean BMI was 24.0 ± 2.2. The mean duration of psoriasis was 6.5 ± 3.5 years. The mean levels of testosterone and LH of cases were lower than those of controls, whereas FSH and estradiol were abnormally higher in the case group. Sperm concentration, normal sperm motility, and normal sperm morphology were also found to be lower in the case group. Age and psoriasis area and severity index (PASI) scores were significant predictors of sperm concentration (P = 0.000). The BMI was negatively correlated with sperm concentration (−0.249, P = 0.01), motility (−0.198, P = 0.05), and morphology (−0.205, P = 0.05). A negative correlation was found between the PASI score and sperm concentration (−0.519, P = 0.01). Conclusion: The evaluation of seminal fluid analysis and hormone profiles among psoriasis patients showed marked variability. However, it was evident that the levels of sex hormones and seminal parameters were lower among patients with psoriasis than among healthy controls; this may indicate the possibility of developing sexual dysfunction and infertility among patients with untreated psoriasis. The level of estradiol was found to be abnormally high among psoriasis cases, which may reflect a compensatory mechanism in ongoing sexual dysfunction among psoriasis patients.

    Protecting Digital Images Using Keys Enhanced by 2D Chaotic Logistic Maps

    This research paper presents a novel digital color image encryption approach that ensures high-level security while remaining simple and efficient. The proposed method uses a 128-bit composite key (r and x) to create a low-dimensional private key (a chaotic map), which is then resized to match the dimensions of the color matrix. The proposed method is uncomplicated and can be applied to any image without modification. Image quality, sensitivity, security, correlation, speed, and attack robustness analyses are conducted to demonstrate the efficiency and security of the proposed method. The speed analysis shows that the proposed method improves the performance of image cryptography by minimizing encryption–decryption time and maximizing the throughput of color image encryption. The results demonstrate that the proposed method provides better throughput than existing methods. Overall, this research paper provides a new approach to digital color image encryption that is highly secure, efficient, and applicable to various images.
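To illustrate how a key made of r and x can drive a chaotic map, here is a minimal sketch that uses the classic one-dimensional logistic map x' = r·x·(1 − x) as a stand-in for the paper's 2D map. Combining the keystream with pixel bytes by XOR, the burn-in length, and the byte quantization are assumptions for illustration, not the paper's scheme, and the sketch is not a secure cipher.

```python
def logistic_keystream(r: float, x0: float, n: int, burn_in: int = 100) -> bytes:
    """Generate n key bytes from the 1D logistic map (stand-in for the 2D map)."""
    x = x0
    # Discard early iterates so the output depends less visibly on x0.
    for _ in range(burn_in):
        x = r * x * (1.0 - x)
    out = bytearray()
    for _ in range(n):
        x = r * x * (1.0 - x)
        out.append(int(x * 256) % 256)  # quantize the state to one byte
    return bytes(out)

def xor_cipher(data: bytes, r: float, x0: float) -> bytes:
    """XOR data with the keystream; applying it twice restores the input."""
    keystream = logistic_keystream(r, x0, len(data))
    return bytes(d ^ k for d, k in zip(data, keystream))

# Hypothetical 4x4 RGB image flattened to 48 bytes, with an illustrative key.
pixels = bytes(range(48))
cipher = xor_cipher(pixels, r=3.99, x0=0.3141)
restored = xor_cipher(cipher, r=3.99, x0=0.3141)
```

Because XOR is its own inverse, the same function serves for both encryption and decryption as long as r and x0 match exactly.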