
    Data Mining Algorithms Predicting Different Types of Cancer: Integrative Literature Review

    According to the World Health Organization, cancer is the second leading cause of death globally and was responsible for an estimated 9.6 million deaths in 2018. About 1 in 6 deaths worldwide is due to cancer, and approximately 70% of cancer deaths occur in low- and middle-income countries. With accelerating developments in technology and the digitization of healthcare, a great deal of cancer data has been collected, and multiple cancer repositories have been created as a result; cancer has become a data-intensive area of research over the last decade. Many researchers have used data mining algorithms to predict different types of cancer, aiming to reduce the cost of the tests involved, especially in low- and middle-income countries. This paper reports a systematic examination of the literature on data mining algorithms for predicting different types of cancer, providing a thorough review, analysis, and synthesis of research published in the past 10 years. We follow the systematic literature review methodology to examine the theories, problems, methodologies, and major findings of related studies on data mining algorithms predicting cancer published between 2009 and 2019. Using thematic analysis, we develop a research taxonomy that summarizes the main algorithms used in existing research in the field, and we identify the data mining algorithms most frequently used to predict different types of cancer, as well as the algorithms used for each cancer type in the reviewed studies. We also identify the most popular types of cancer that researchers tackled using predictive analytics.

    Maximum Likelihood Methods in Biology Revisited with Tools of Computational Intelligence

    We investigate the problem of identifying genes correlated with the occurrence of diseases in a given population. The classical method of parametric linkage analysis is combined with newer tools, and results are achieved on a model problem. This traditional method has advantages over non-parametric methods, but those advantages have been difficult to realize due to its high computational cost. We study a class of Evolutionary Algorithms from the Computational Intelligence literature designed to cut such costs considerably for optimization problems. We outline the details of this algorithm, called Particle Swarm Optimization, and present all the equations and parameter values used in our optimization. We view this study as a launching point for a wider investigation into leveraging computational intelligence tools in the study of complex biological systems.
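The canonical Particle Swarm Optimization update can be sketched as follows. This is a generic textbook version with illustrative parameter values (`w`, `c1`, `c2`) and a simple sphere function standing in for the likelihood objective; it is not the specific equations or values the authors report.

```python
# Minimal Particle Swarm Optimization sketch (generic textbook form, not the
# paper's exact configuration). Minimizes `objective` over a box.
import random

def pso(objective, dim, n_particles=20, iters=100,
        w=0.7, c1=1.5, c2=1.5, bounds=(-5.0, 5.0)):
    lo, hi = bounds
    pos = [[random.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                       # each particle's best position
    pbest_val = [objective(p) for p in pos]
    gbest = pbest[min(range(n_particles), key=lambda i: pbest_val[i])][:]
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = random.random(), random.random()
                # canonical update: inertia + cognitive + social terms
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = objective(pos[i])
            if val < pbest_val[i]:                    # update personal/global bests
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < objective(gbest):
                    gbest = pos[i][:]
    return gbest

random.seed(0)
best = pso(lambda x: sum(v * v for v in x), dim=3)    # sphere test function
```

The swarm converges toward the global best while each particle is also pulled back toward its own best-seen position, which is what cuts the cost relative to exhaustive search of the likelihood surface.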

    Genetic Algorithm to Optimize k-Nearest Neighbor Parameter for Benchmarked Medical Datasets Classification

    Computer-assisted medical diagnosis is a major machine learning problem under active research. General classifiers learn from the data itself through a training process, which is useful when an expert lacks the experience to determine parameters by hand. This research proposes a methodology based on the machine learning paradigm that integrates a search heuristic inspired by natural evolution, the genetic algorithm, with one of the simplest and most widely used learning algorithms, k-nearest neighbor. The genetic algorithm is used for feature selection and parameter optimization, while k-nearest neighbor is used as the classifier. The proposed method is evaluated on five benchmark medical datasets from the University of California Irvine Machine Learning Repository and compared with the original k-NN and other feature selection algorithms, i.e., forward selection, backward elimination, and greedy feature selection. Experimental results show that the proposed method achieves good performance with a significant improvement (t-test p-value of 0.0011).
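The hybrid described above, a genetic algorithm evolving both a feature mask and the k parameter of a k-NN classifier, can be sketched roughly as follows. The toy dataset, fitness function, and all GA settings here are illustrative assumptions, not the paper's datasets or configuration.

```python
# Sketch: GA evolves a chromosome = [5 feature-mask bits, k in {1,3,5}];
# fitness = leave-one-out k-NN accuracy on the masked features (toy data).
import random

random.seed(1)

# Toy dataset: 2 informative features, 3 pure-noise features, binary labels.
def make_point(label):
    informative = [label + random.gauss(0, 0.3) for _ in range(2)]
    noise = [random.gauss(0, 1) for _ in range(3)]
    return informative + noise, label

data = [make_point(label) for label in (0, 1) for _ in range(30)]

def knn_accuracy(mask, k):
    """Leave-one-out accuracy of k-NN restricted to the masked features."""
    hits = 0
    for i, (x, y) in enumerate(data):
        dists = []
        for j, (x2, y2) in enumerate(data):
            if i == j:
                continue
            d = sum((a - b) ** 2 for m, a, b in zip(mask, x, x2) if m)
            dists.append((d, y2))
        votes = [y2 for _, y2 in sorted(dists)[:k]]
        if round(sum(votes) / k) == y:                # majority vote (k is odd)
            hits += 1
    return hits / len(data)

def fitness(chrom):
    mask, k = chrom[:5], chrom[5]
    return knn_accuracy(mask, k) if any(mask) else 0.0

pop = [[random.randint(0, 1) for _ in range(5)] + [random.choice([1, 3, 5])]
       for _ in range(12)]
for _ in range(15):                # elitist selection, crossover, mutation
    pop.sort(key=fitness, reverse=True)
    parents = pop[:6]
    children = []
    while len(children) < 6:
        a, b = random.sample(parents, 2)
        cut = random.randrange(1, 6)                  # one-point crossover
        child = a[:cut] + b[cut:]
        if random.random() < 0.3:                     # mutation
            i = random.randrange(6)
            child[i] = random.choice([1, 3, 5]) if i == 5 else 1 - child[i]
        children.append(child)
    pop = parents + children

best = max(pop, key=fitness)
```

Because the elite parents survive each generation, the best chromosome's fitness never decreases, and the GA tends to converge on masks that include the informative features.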

    Cancer-inspired Genomics Mapper Model for the Generation of Synthetic DNA Sequences with Desired Genomics Signatures

    Genome data are crucial in modern medicine, offering significant potential for diagnosis and treatment. Thanks to technological advancements, many millions of healthy and diseased genomes have already been sequenced; however, obtaining the most suitable data for a specific study, and especially for validation studies, remains challenging with respect to scale and access. In silico genomic sequence generators have therefore been proposed as a possible solution. However, current generators produce inferior data using mostly shallow (stochastic) connections, detected with limited computational complexity in the training data, which means they do not take into consideration the underlying biological relations and constraints that originally caused the observed connections. To address this issue, we propose the cancer-inspired genomics mapper model (CGMM), which combines genetic algorithm (GA) and deep learning (DL) methods to tackle this challenge. CGMM mimics the processes that generate genetic variations and mutations in order to transform readily available control genomes into genomes with desired phenotypes. We demonstrate that CGMM can generate synthetic genomes of selected phenotypes, such as ancestry and cancer, that are indistinguishable from real genomes of those phenotypes, based on unsupervised clustering. Our results show that CGMM outperforms four current state-of-the-art genomics generators on two different tasks, suggesting that CGMM will be suitable for a wide range of purposes in genomic medicine, especially for much-needed validation studies.
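The GA half of this idea, transforming a control sequence toward a desired genomic signature via mutation operators, can be illustrated with a toy sketch. The signature (GC content), fitness, and operators below are illustrative assumptions only; CGMM itself also uses deep learning, which this sketch omits entirely.

```python
# Toy GA: point-mutate an AT-rich "control" sequence toward a target GC
# content, a crude stand-in for a desired genomic signature (assumption:
# this is NOT the CGMM algorithm, only an illustration of the GA principle).
import random

random.seed(42)
BASES = "ACGT"

def gc_content(seq):
    """Fraction of G/C bases, a simple stand-in for a genomic signature."""
    return sum(b in "GC" for b in seq) / len(seq)

def evolve(seq, target_gc, generations=200, pop_size=30, mut_rate=0.02):
    pop = [seq] * pop_size
    score = lambda s: -abs(gc_content(s) - target_gc)   # closer is better
    for _ in range(generations):
        pop.sort(key=score, reverse=True)
        survivors = pop[:pop_size // 2]                 # elitist selection
        pop = survivors[:]
        for parent in survivors:                        # point-mutated offspring
            child = [random.choice(BASES) if random.random() < mut_rate else b
                     for b in parent]
            pop.append("".join(child))
    return max(pop, key=score)

control = "".join(random.choice("AT") for _ in range(200))   # AT-rich control
synthetic = evolve(control, target_gc=0.6)
```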

    Molecular Prognostic Prediction for Locally Advanced Nasopharyngeal Carcinoma by Support Vector Machine Integrated Approach

    BACKGROUND: Accurate prognostication of locally advanced nasopharyngeal carcinoma (NPC) will benefit patients through tailored therapy. Here, we addressed this issue by developing a mathematical algorithm based on the support vector machine (SVM) that integrates the expression levels of multiple biomarkers. METHODOLOGY/PRINCIPAL FINDINGS: Ninety-seven locally advanced NPC patients from a randomized controlled trial (RCT), with 5-year follow-up, were studied; 48 cases served as the training set and 49 as the testing set for the SVM models. We designed the SVM models by selecting variables from 38 tissue molecular biomarkers, representing 6 tumorigenesis signaling pathways, and 3 EBV-related serological biomarkers, building 3 SVM models to refine the prognosis of NPC at 5-year follow-up. SVM1 displayed high predictive sensitivity (sensitivity and specificity of 88.0% and 81.9%, respectively) by integrating the expression of 7 molecular biomarkers. SVM2 showed high predictive specificity (84.0% and 94.5%, respectively) by grouping the expression levels of 12 molecular biomarkers and 3 EBV-related serological biomarkers. SVM3, constructed by combining SVM1 with SVM2, displayed high predictive capacity (88.0% and 90.3%, respectively). All 3 SVM models had strong power in classifying prognosis. Moreover, Cox multivariate regression analysis confirmed that all 3 SVM models were significant independent prognostic models for overall survival in the testing set and in the overall patient group. CONCLUSIONS/SIGNIFICANCE: Our SVM prognostic models, designed within the RCT, displayed strong power in refining patient prognosis for locally advanced NPC, potentially directing future targeted therapy against the related signaling pathways.
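The sensitivity and specificity pairs quoted above are standard confusion-matrix ratios. A minimal sketch of how such pairs are computed from a classifier's predictions (the labels below are hypothetical, not the trial's data):

```python
# Sensitivity/specificity from binary predictions (toy labels, not the
# paper's patients or its SVM outputs).

def sensitivity_specificity(y_true, y_pred):
    """Sensitivity = TP/(TP+FN); specificity = TN/(TN+FP)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical predictions on a small test set (1 = poor prognosis).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
sens, spec = sensitivity_specificity(y_true, y_pred)   # 0.75, ~0.833
```

The paper's SVM1/SVM2/SVM3 trade-off (one model tuned for sensitivity, one for specificity, one combining both) is exactly a trade-off between these two ratios.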

    Examining applying high performance genetic data feature selection and classification algorithms for colon cancer diagnosis

    Background and Objectives: This paper examines the accuracy and efficiency (time complexity) of high-performance genetic data feature selection and classification algorithms for colon cancer diagnosis, motivated by the urgent and increasing need for accurate and efficient algorithms. Colon cancer is a leading cause of death worldwide; it is therefore vitally important that cancer tissues be expertly identified and classified rapidly, both to assure fast detection of the disease and to expedite the drug discovery process. Methods: A three-phase approach was proposed and implemented: Phases One and Two examined the feature selection and classification algorithms separately, and Phase Three examined the performance of their combination. Results: Phase One found that the Particle Swarm Optimization (PSO) algorithm performed best on the colon dataset for feature selection (29 genes selected), and Phase Two found that the Support Vector Machine (SVM) algorithm outperformed the other classifiers, with an accuracy of almost 86%. Phase Three found that the combined use of PSO and SVM surpassed the other algorithms in accuracy and performance, and was faster in terms of time analysis (94%). Conclusions: Applying feature selection algorithms prior to classification algorithms yields better accuracy than applying the latter alone. This conclusion is important and significant to industry and society.
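The conclusion that feature selection before classification improves accuracy can be illustrated with a much simpler stand-in for PSO + SVM: a correlation filter followed by a 1-NN classifier on toy data dominated by noise features. Everything below is an illustrative assumption, not the paper's pipeline or dataset.

```python
# Filter-then-classify sketch: drop low-correlation features, then run 1-NN,
# and compare against 1-NN on all features (toy data; a stand-in for the
# paper's PSO feature selection + SVM classification).
import random

random.seed(7)

# 60 samples: 1 informative feature followed by 9 pure-noise features.
X = [[y + random.gauss(0, 0.2)] + [random.gauss(0, 1) for _ in range(9)]
     for y in (0, 1) for _ in range(30)]
y = [0] * 30 + [1] * 30

def loo_1nn_accuracy(cols):
    """Leave-one-out accuracy of 1-NN using only the given feature columns."""
    hits = 0
    for i in range(len(X)):
        j = min((j for j in range(len(X)) if j != i),
                key=lambda j: sum((X[i][c] - X[j][c]) ** 2 for c in cols))
        hits += y[j] == y[i]
    return hits / len(X)

def correlation(c):
    """Absolute Pearson correlation between feature column c and the label."""
    xs = [row[c] for row in X]
    mx, my = sum(xs) / len(xs), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, y))
    vx = sum((a - mx) ** 2 for a in xs) ** 0.5
    vy = sum((b - my) ** 2 for b in y) ** 0.5
    return abs(cov / (vx * vy))

selected = [c for c in range(10) if correlation(c) > 0.5]   # filter phase
acc_all = loo_1nn_accuracy(list(range(10)))                 # classifier alone
acc_sel = loo_1nn_accuracy(selected)                        # filter + classifier
```

With the noise dimensions removed, nearest-neighbor distances reflect the signal rather than the noise, so `acc_sel` matches or beats `acc_all`, mirroring the paper's Phase Three finding.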

    Does One Size Fit All? Why Our Genes Show the Need for Tailor-Made Solutions

    Since the human genome was first sequenced in 2003, millions of consumers and medical professionals have swarmed the field of medical genetics, seeking to peer into the crystal ball and see what their own, or their patients’, futures may hold. Also rushing in are direct-to-consumer genetic testing companies like 23andMe and AncestryDNA, which can circumvent medical privacy laws by offering genetic testing without a medical provider. Medical privacy regulations, such as the Health Insurance Portability and Accountability Act of 1996 (HIPAA), the Genetic Information Nondiscrimination Act of 2008 (GINA), and those promulgated by the Federal Trade Commission, do not regulate these companies adequately, for a litany of reasons. These loopholes and shortcomings in regulation leave American consumers substantially less protected and less medically informed, and in some instances can jeopardize national security. This Note proposes that Congress enact legislation overhauling the current regulatory regime in at least three ways: (1) the “covered entity” approach should be abandoned and replaced with a data-driven model; (2) the Safe Harbor provision of HIPAA should explicitly exclude genomic data; and (3) consumers should be given a “right to be forgotten” and the ability to compel companies to delete their data. These reforms would significantly strengthen consumers’ genetic privacy and give them an escape hatch to safeguard the core of their identity.

    Comparing AI Archetypes and Hybrids Using Blackjack

    The discipline of artificial intelligence (AI) is a diverse field, with a vast variety of philosophies and implementations to consider. This work attempts to compare several of these paradigms, as well as their variations and hybrids, using the card game of blackjack as the field of competition. This is done with an automated blackjack emulator, written in Java, which accepts computer-controlled players of various AI philosophies and their variants, training them and finally pitting them against each other in a series of tournaments with customizable rule sets. To avoid bias toward any particular implementation, the system treats each group as a team, allowing each team to run optimally and handle its own evolution. The primary AI paradigms examined in this work are rule-based AI and genetic learning, drawing from the philosophies of fuzzy logic and intelligent agents. The rule-based AI teams apply various algorithms commonly used in real-world blackjack, ranging from the basic rules followed by a dealer to the situational rule-of-thumb formulas suggested to amateurs. The blackjack options of hit, stand, surrender, and double down are supported, but advanced options such as hand splitting and card counting are not examined. Various tests exploring possible configurations of genetic learning systems were devised, implemented, and analyzed. Future work would expand the variety and complexity of the teams, as well as implement advanced game features.
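As one concrete rule-based example, the fixed dealer rule mentioned above (hit until reaching 17) can be sketched as follows, including the ace handling any such emulator needs. This is the generic blackjack rule written in Python for illustration, not the emulator's actual Java implementation.

```python
# Rule-based blackjack sketch: hand valuation with soft aces, plus the
# standard dealer policy (hit below 17, stand at 17 or more).

def hand_value(cards):
    """Best blackjack value of a hand; aces (given as 11) demote to 1 as needed."""
    total = sum(cards)
    aces = cards.count(11)
    while total > 21 and aces:        # demote an ace from 11 to 1
        total -= 10
        aces -= 1
    return total

def dealer_should_hit(cards):
    """Basic dealer rule: hit on any total below 17, stand otherwise."""
    return hand_value(cards) < 17
```

A genetic-learning team in the same framework would evolve the hit/stand thresholds instead of fixing them, which is the comparison the tournaments above set up.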