    Breast tumor prediction and feature importance score finding using machine learning algorithms

    The subject of this study is breast tumor prediction and feature importance scoring using machine learning algorithms. The goal was to develop an accurate predictive model for identifying breast tumors and to determine the importance of various features in the prediction process. The tasks included collecting and preprocessing the original Wisconsin Breast Cancer Dataset (WBCD); dividing the dataset into training and testing sets; training models with machine learning algorithms such as Random Forest, Decision Tree (DT), Logistic Regression, Multi-Layer Perceptron, Gradient Boosting Classifier (GBC), and K-Nearest Neighbors; evaluating the models using performance metrics; and calculating feature importance scores. The results showed that the Random Forest model was the most reliable predictor, with 98.56% accuracy. The dataset contained 699 instances, of which 461 remained after data optimization. In addition, we ranked the top features of the dataset by feature importance score to determine how they affect the classification models. The models were also evaluated with 10-fold cross-validation for performance analysis and comparison. The study concludes that machine learning algorithms are effective for breast tumor prediction, achieving high accuracy and robust performance metrics. The feature importance analysis also provides insight into the key indicators of breast cancer development. These findings contribute to breast cancer diagnosis and prediction and may support early detection, personalized treatment strategies, and improved patient outcomes.
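    The abstract does not say how the 10-fold cross-validation was implemented. As a rough illustration only, here is a dependency-free sketch of k-fold cross-validation in Python. The helper names `k_fold_indices`, `cross_val_accuracy`, and `majority_baseline` are hypothetical, and `majority_baseline` is a trivial stand-in for the Random Forest and other models used in the study, not the authors' code. In practice a library routine such as scikit-learn's `KFold` or `cross_val_score` would normally be used instead.

```python
import random
from collections import Counter
from typing import Callable, List, Optional, Sequence, Tuple

Fold = Tuple[List[int], List[int]]  # (train indices, test indices)


def k_fold_indices(n: int, k: int = 10, seed: Optional[int] = None) -> List[Fold]:
    """Partition indices 0..n-1 into k folds; each fold serves once as the test set."""
    if not 2 <= k <= n:
        raise ValueError("k must satisfy 2 <= k <= n")
    order = list(range(n))
    if seed is not None:
        random.Random(seed).shuffle(order)  # shuffle so label-sorted data does not bias folds
    base, extra = divmod(n, k)
    folds: List[Fold] = []
    start = 0
    for i in range(k):
        size = base + (1 if i < extra else 0)  # the first `extra` folds get one extra item
        test = order[start:start + size]
        train = order[:start] + order[start + size:]
        folds.append((train, test))
        start += size
    return folds


def cross_val_accuracy(
    X: Sequence, y: Sequence, fit_predict: Callable, k: int = 10, seed: Optional[int] = None
) -> float:
    """Mean test-fold accuracy; fit_predict(X_train, y_train, X_test) returns predictions."""
    scores = []
    for train, test in k_fold_indices(len(y), k, seed):
        preds = fit_predict([X[i] for i in train], [y[i] for i in train], [X[i] for i in test])
        correct = sum(1 for p, i in zip(preds, test) if p == y[i])
        scores.append(correct / len(test))
    return sum(scores) / len(scores)


def majority_baseline(X_train, y_train, X_test):
    """Hypothetical stand-in model: predict the most common training label everywhere."""
    label = Counter(y_train).most_common(1)[0][0]
    return [label] * len(X_test)
```

    With 461 instances and `k=10`, one fold holds 47 test instances and the other nine hold 46. Passing a `seed` shuffles the order first. Without shuffling, a dataset sorted by label can place whole classes in a single fold and distort the estimate. This sketch is not stratified; stratified folds would also preserve the benign/malignant ratio in each fold.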

    Paddynet: An organized dataset of paddy leaves for a smart fertilizer recommendation system

    A Leaf Color Chart (LCC) dataset of paddy leaves has not been publicly available. To the best of the authors' knowledge, PaddyNet is the first dataset of paddy leaves based on the LCC. It was generated by collecting images from specific locations, namely Sajiali, Dogachia, and Shyamnagar in Jashore, Bangladesh. The dataset contains 560 images of Aman paddy leaves in 4 categories, captured with smartphones. Data collection followed the guidelines of the Bangladesh Agricultural Research Institute (BARI). We carefully categorized the entire dataset by LCC level and validated the labels with the assistance of domain specialists, so the images were analyzed and categorized according to established standards. The dataset can be used to recognize LCC levels, which can help in recommending nitrogen fertilizer for farmers' paddy fields.

    Temporal dynamics and fatality of SARS‐CoV‐2 variants in Bangladesh

    Abstract
    Background and Aims: Since the beginning of the SARS‐CoV‐2 pandemic, multiple new variants have emerged, posing an increased risk to global public health. This study aimed to investigate SARS‐CoV‐2 variants, their temporal dynamics, infection rate (IFR), and case fatality rate (CFR) in Bangladesh by analyzing published genomes.
    Methods: We retrieved 6610 complete whole‐genome sequences of SARS‐CoV‐2 for the period from March 2020 to October 2022 from the GISAID (Global Initiative on Sharing All Influenza Data) platform and performed various in‐silico bioinformatics analyses. Clades and Pango lineages were assigned using Nextclade v2.8.1. SARS‐CoV‐2 infection and fatality data were collected from the Institute of Epidemiology, Disease Control and Research (IEDCR), Bangladesh. The average IFR was calculated from monthly COVID‐19 cases and population size, while the average CFR was calculated from monthly deaths and confirmed COVID‐19 cases.
    Results: SARS‐CoV‐2 first emerged in Bangladesh on March 3, 2020, and has caused three pandemic waves to date. Phylogenetic analysis revealed multiple introductions of SARS‐CoV‐2 variant(s) into Bangladesh, with at least 22 Nextstrain clades and 107 Pangolin lineages relative to the SARS‐CoV‐2 reference genome Wuhan/Hu‐1/2019. Delta was the predominant variant (48.06%), followed by Omicron (27.88%), Beta (7.65%), Alpha (1.56%), Eta (0.33%), and Gamma (0.03%). The overall IFR and CFR across circulating variants were 13.59% and 1.45%, respectively. A time‐dependent monthly analysis showed significant variation in IFR (p = 0.012, Kruskal–Wallis test) and CFR (p = 0.032, Kruskal–Wallis test) throughout the study period. The highest IFR (14.35%) was found in 2020, when the Delta (20A) and Beta (20H) variants were circulating in Bangladesh. Notably, the highest CFR (1.91%) was recorded in 2021.
    Conclusion: Our findings highlight the importance of genomic surveillance for carefully monitoring the emergence of variants of concern, so that their relative IFR and CFR can be interpreted correctly and strengthened public health and social measures can be implemented to control the spread of the virus. Furthermore, the results of the present study may provide important context for sequence‐based inference on SARS‐CoV‐2 variant evolution and clinical epidemiology beyond Bangladesh.
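    The Methods describe the two rates only in words. Below is a minimal Python sketch of those definitions, assuming each rate is computed per month as a percentage and that the reported "average" is the arithmetic mean of the monthly values; the abstract does not say whether a pooled ratio was used instead. The function names are hypothetical, and IFR here follows this abstract's own definition (confirmed cases relative to population), not the infection fatality ratio that the abbreviation usually denotes.

```python
from statistics import mean
from typing import List, Optional, Tuple

MonthlyRate = Tuple[float, Optional[float]]  # (IFR %, CFR % or None if no cases)


def monthly_ifr_cfr(cases: List[int], deaths: List[int], population: int) -> List[MonthlyRate]:
    """Per-month rates as described in the Methods:
    IFR = confirmed cases / population * 100
    CFR = deaths / confirmed cases * 100
    """
    if len(cases) != len(deaths):
        raise ValueError("cases and deaths must cover the same months")
    if population <= 0:
        raise ValueError("population must be positive")
    rates: List[MonthlyRate] = []
    for c, d in zip(cases, deaths):
        ifr = 100.0 * c / population
        cfr = 100.0 * d / c if c > 0 else None  # CFR is undefined for a month with no cases
        rates.append((ifr, cfr))
    return rates


def average_rates(rates: List[MonthlyRate]) -> Tuple[float, float]:
    """Assumption: 'average' = arithmetic mean of the monthly percentages."""
    cfrs = [cfr for _, cfr in rates if cfr is not None]  # skip months where CFR is undefined
    return mean(ifr for ifr, _ in rates), mean(cfrs)
```

    A pooled alternative (total deaths divided by total cases over the whole period) weights busy months more heavily and can differ noticeably from the mean of monthly CFRs.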

    Phylogenetic diversity and functional potential of the microbial communities along the Bay of Bengal coast

    Abstract
    The Bay of Bengal (BoB), the world's largest bay, is bordered by populous countries and is rich in resources such as fisheries, oil, gas, and minerals; it also hosts diverse marine ecosystems, including coral reefs, mangroves, and seagrass beds. Regrettably, its microbial diversity and ecological significance have received limited research attention. Here, we present amplicon (16S and 18S) profiling and shotgun metagenomic data on microbial communities from two sites on the BoB’s eastern coast, Saint Martin and Cox’s Bazar, Bangladesh. In the 16S barcoding data, Proteobacteria appeared to be the dominant phylum at both locations, with Alteromonas, Methylophaga, Anaerospora, Marivita, and Vibrio dominating in Cox’s Bazar and Pseudoalteromonas, Nautella, Marinomonas, Vibrio, and Alteromonas dominating at Saint Martin. In the 18S barcoding data, Ochrophyta, Chlorophyta, and Protalveolata were among the most abundant eukaryotic divisions at both locations, with significantly higher abundances of Choanoflagellida, Florideophycidae, and Dinoflagellata in Cox’s Bazar. The shotgun sequencing data reveal that Alteromonas is the most prevalent bacterial genus at both locations, closely paralleling its dominance in the metabarcoding data, with Methylophaga also prominent in Cox’s Bazar and Vibrio in Saint Martin. Functional annotations revealed that the microbial communities in these samples harbor genes for biofilm formation, quorum sensing, xenobiotic degradation, antimicrobial resistance, and a variety of other processes. Together, these results provide the first molecular insight into the functional and phylogenetic diversity of microbes along the BoB coast of Bangladesh. This baseline understanding of microbial community structure and functional potential will be critical for assessing the impacts of climate change, pollution, and other anthropogenic disturbances on this ecologically and economically vital bay.
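    Statements such as "Proteobacteria appeared to be the dominant phylum" rest on relative abundance: each taxon's read count divided by the total reads in the sample. The sketch below shows only that arithmetic. It assumes a simple mapping from taxon name to read count and uses hypothetical helper names. Real amplicon pipelines derive these counts from an ASV or OTU table and often normalize or rarefy them first.

```python
from typing import Dict, Tuple


def relative_abundance(counts: Dict[str, int]) -> Dict[str, float]:
    """Convert raw read counts per taxon into percentages of the sample total."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample has no reads")
    return {taxon: 100.0 * n / total for taxon, n in counts.items()}


def dominant_taxon(counts: Dict[str, int]) -> Tuple[str, float]:
    """Return the most abundant taxon and its relative abundance (%).

    On a tie, the taxon that appears first in `counts` is returned.
    """
    abundance = relative_abundance(counts)
    top = max(abundance, key=abundance.__getitem__)
    return top, abundance[top]
```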