
    A survey on utilization of data mining approaches for dermatological (skin) diseases prediction

    Recent advances in technology have made large volumes of medical data available. These data contain valuable information, so data mining techniques can be used to extract useful patterns. This paper introduces data mining and its various techniques and surveys the available literature on medical data mining. We focus mainly on the application of data mining to skin diseases. A categorization is provided based on the different data mining techniques, and the utility of the various methodologies is highlighted. Association mining is generally suitable for extracting rules and has been used especially in cancer diagnosis. Classification is a robust method in medical mining. In this paper, we summarize the different uses of classification in dermatology; it is one of the most important methods for diagnosis of erythemato-squamous diseases. Methods used in this area include neural networks, genetic algorithms, and fuzzy classification. Clustering is a useful method in medical image mining. The purpose of clustering techniques is to find structure in the given data by identifying similarities between data items according to their characteristics. Clustering has some applications in dermatology. Besides introducing different mining methods, we investigate some of the challenges that exist in mining skin data.

    Socioeconomic characteristics of cancer mortality in the United States of America: a spatial data mining approach

    Cancer is the second leading cause of death in the United States of America. Though it is generally known that cancer is influenced by environment, its relation to socioeconomic conditions is still widely debated. This research analyzed the spatial distribution of breast, colorectal, lung, and prostate cancer mortality and the associated socioeconomic characteristics using an association rule mining technique. The mortality patterns were analyzed at the county and health service area levels, corresponding to the years 1999–2002 and 1988–1992, respectively. Distinct socioeconomic characteristics of cancer mortality were revealed by the association rule mining technique. Counties that had very high rates of breast cancer mortality also had a very low percentage of whites who walked to work; very high rates of colorectal cancer mortality were associated with a very low percentage of foreign-born population; very high rates of lung cancer mortality were associated with a very low percentage of whites who walked to work; and counties that had very high prostate cancer mortality rates had a very low percentage of residents born in the West. The cancer mortality and socioeconomic variables were discretized using equal interval, natural breaks, and quantile discretization methods to analyze the impact that discretization techniques have on the cancer mortality and socioeconomic patterns obtained using association rule mining. The three discretization techniques produced patterns that involved different rates of cancer mortality and socioeconomic characteristics. Results of this analysis showed that a 5-class natural breaks discretization achieved the highest discretization accuracy, while the equal interval method produced association rules that had the highest support value. The research also analyzed the effect of scale on the patterns produced by the association rule technique. At the county level, breast and lung cancers were associated with mode of transportation to work, whereas colorectal and prostate cancers were associated with place of birth. At the health service area level, the association rule with the highest support value among the breast, colorectal, and prostate cancer mortality rates involved a household family characteristic, whereas high lung cancer mortality rates were associated with low educational attainment.
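    The effect of the discretization choice described in this abstract can be illustrated with a minimal sketch. It is not the authors' code: the mortality rates are hypothetical, the helper names are invented for illustration, and natural breaks (Jenks) is left out for brevity. It only shows how equal interval and quantile discretization assign different classes to the same skewed values.

```python
from bisect import bisect_right

def equal_interval_edges(values, k):
    """Interior class boundaries that split [min, max] into k equal-width bins."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    return [lo + i * width for i in range(1, k)]

def quantile_edges(values, k):
    """Interior class boundaries that put roughly equal counts in each bin."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(i * n) // k] for i in range(1, k)]

def discretize(values, edges):
    """Map each value to the index of the class it falls into."""
    return [bisect_right(edges, v) for v in values]

# Hypothetical mortality rates with one extreme outlier (not from the paper).
rates = [10.0, 12.0, 13.0, 14.0, 15.0, 16.0, 18.0, 60.0]

equal_classes = discretize(rates, equal_interval_edges(rates, 4))
quantile_classes = discretize(rates, quantile_edges(rates, 4))

print(equal_classes)     # the outlier stretches the bins, so most values share one class
print(quantile_classes)  # classes are balanced by count instead
```

    Because the class labels feed directly into the rule miner, the same county can end up as "low" under one scheme and "high" under another, which is one plausible reason the three techniques produced different patterns.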

    Mining Pure, Strict Epistatic Interactions from High-Dimensional Datasets: Ameliorating the Curse of Dimensionality

    Background: The interaction between loci to affect phenotype is called epistasis. It is strict epistasis if no proper subset of the interacting loci exhibits a marginal effect. For many diseases, it is likely that unknown epistatic interactions affect disease susceptibility. A difficulty when mining epistatic interactions from high-dimensional datasets concerns the curse of dimensionality. There are too many combinations of SNPs to perform an exhaustive search. A method that could locate strict epistasis without an exhaustive search can be considered the brass ring of methods for analyzing high-dimensional datasets. Methodology/Findings: A SNP pattern is a Bayesian network representing SNP-disease relationships. The Bayesian score for a SNP pattern is the probability of the data given the pattern, and has been used to learn SNP patterns. We identified a bound for the score of a SNP pattern. The bound provides an upper limit on the Bayesian score of any pattern that could be obtained by expanding a given pattern. We felt that the bound might enable the data to say something about the promise of expanding a 1-SNP pattern even when there are no marginal effects. We tested the bound using simulated datasets and semi-synthetic high-dimensional datasets obtained from GWAS datasets. We found that the bound was able to dramatically reduce the search time for strict epistasis. Using an Alzheimer's dataset, we showed that it is possible to discover an interaction involving the APOE gene based on its score because of its large marginal effect, but that the bound is most effective at discovering interactions without marginal effects. Conclusions/Significance: We conclude that the bound appears to ameliorate the curse of dimensionality in high-dimensional datasets. This is a very consequential result and could be pivotal in our efforts to reveal the dark matter of genetic disease risk from high-dimensional datasets. © 2012 Jiang, Neapolitan

    Validation of Results from Knowledge Discovery: Mass Density as a Predictor of Breast Cancer

    The purpose of our study is to identify and quantify the association between high breast mass density and breast malignancy using inductive logic programming (ILP) and conditional probabilities, and validate this association in an independent dataset. We ran our ILP algorithm on 62,219 mammographic abnormalities. We set the Aleph ILP system to generate 10,000 rules per malignant finding with a recall >5% and precision >25%. Aleph reported the best rule for each malignant finding. A total of 80 unique rules were learned. A radiologist reviewed all rules and identified potentially interesting rules. High breast mass density appeared in 24% of the learned rules. We confirmed each interesting rule by calculating the probability of malignancy given each mammographic descriptor. High mass density was the fifth highest ranked predictor. To validate the association between mass density and malignancy in an independent dataset, we collected data from 180 consecutive breast biopsies performed between 2005 and 2007. We created a logistic model with benign or malignant outcome as the dependent variable while controlling for potentially confounding factors. We calculated odds ratios based on dichotomized variables. In our logistic regression model, the independent predictors high breast mass density (OR 6.6, CI 2.5–17.6), irregular mass shape (OR 10.0, CI 3.4–29.5), spiculated mass margin (OR 20.4, CI 1.9–222.8), and subject age (β = 0.09, p < 0.0001) significantly predicted malignancy. Both ILP and conditional probabilities show that high breast mass density is an important adjunct predictor of malignancy, and this association is confirmed in an independent data set of prospectively collected mammographic findings.
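    The two quantities this abstract relies on, the probability of malignancy given a descriptor and an odds ratio with a confidence interval, can be sketched for a single dichotomized variable. This is a simplified, unadjusted illustration: the study used a logistic regression that controlled for confounders, whereas the sketch below uses a plain 2x2 table with a Wald interval, and all counts are hypothetical.

```python
import math

def conditional_probability(malignant_with, benign_with):
    """P(malignant | descriptor present) from raw counts."""
    return malignant_with / (malignant_with + benign_with)

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Unadjusted odds ratio and Wald confidence interval from a 2x2 table.

    a: descriptor present, malignant   b: descriptor present, benign
    c: descriptor absent,  malignant   d: descriptor absent,  benign
    z = 1.96 gives an approximate 95% interval.
    """
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper

# Hypothetical counts (not taken from the study).
a, b, c, d = 20, 10, 30, 120

p_malignant = conditional_probability(a, b)
odds_ratio, lower, upper = odds_ratio_ci(a, b, c, d)

print(f"P(malignant | high density) = {p_malignant:.3f}")
print(f"OR = {odds_ratio:.1f}, 95% CI {lower:.1f} to {upper:.1f}")
```

    An interval that excludes 1 suggests an association at roughly the 5% level; a wide interval, like the one reported for spiculated margin in the abstract, typically reflects a small count in one cell of the table.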

    ANN for Predicting DNA Lung Cancer

    Abstract: Lung cancer is the leading cause of cancer-associated deaths globally. Surgery is the typical treatment for early-stage non-small cell lung cancer (NSCLC). Advances in the understanding of the biology of non-small cell lung cancer have provided molecular evidence used for systemic cancer therapy targeting metastatic disease, with a significant impact on patients' overall survival (OS) and quality of life. However, a biopsy of overt metastases is an invasive technique that is restricted to certain locations and not easily accepted in the clinic. The examination of peripheral blood samples of cancer patients represents a new source of cancer-derived material, known as liquid biopsy, and its constituents (circulating tumour cells (CTCs), circulating free DNA (cfDNA), exosomes, and tumour-educated platelets (TEP)) may be obtained from nearly any body fluid. These constituents have been shown to reflect features of the status of both the primary and metastatic disease, helping clinicians move towards personalized medicine. In this paper, the causes of lung cancer are identified, along with the risk factors behind the increase in cases, for instance smoking, exposure to secondhand smoke, exposure to radon gas, exposure to asbestos and other compounds, and family history of lung cancer, as well as ways to reduce the spread of the disease and approaches to the management and prevention of lung cancer.

    Socioeconomic inequality of cancer mortality in the United States: a spatial data mining approach

    BACKGROUND: The objective of this study was to demonstrate the use of an association rule mining approach to discover associations between selected socioeconomic variables and the four leading causes of cancer mortality in the United States. An association rule mining algorithm was applied to extract associations between the 1988–1992 cancer mortality rates for colorectal, lung, breast, and prostate cancers defined at the Health Service Area level and selected socioeconomic variables from the 1990 United States census. Geographic information system technology was used to integrate these data, which were defined at different spatial resolutions, and to visualize and analyze the results from the association rule mining process. RESULTS: Health Service Areas with high rates of low education, high unemployment, and low-paying jobs were found to be associated with higher rates of cancer mortality. CONCLUSION: Association rule mining with geographic information technology helps reveal the spatial patterns of socioeconomic inequality in cancer mortality in the United States and identify regions that need further attention.
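    The measures by which an association rule such as "low education implies high mortality" is usually judged (support, confidence, and lift) can be sketched as follows. This is an illustration under stated assumptions, not the algorithm used in the study: each Health Service Area is modelled as a set of discretized attribute labels, and both the labels and the data are hypothetical.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Estimated P(consequent | antecedent)."""
    both = support(transactions, set(antecedent) | set(consequent))
    return both / support(transactions, antecedent)

def lift(transactions, antecedent, consequent):
    """Confidence relative to the baseline frequency of the consequent."""
    return confidence(transactions, antecedent, consequent) / support(
        transactions, consequent
    )

# Hypothetical areas, each described by discretized attribute labels.
areas = [
    {"low_education", "high_unemployment", "high_mortality"},
    {"low_education", "high_mortality"},
    {"low_education", "high_unemployment"},
    {"high_mortality"},
    {"low_education", "high_unemployment", "high_mortality"},
]

rule_support = support(areas, {"low_education", "high_mortality"})
rule_confidence = confidence(areas, {"low_education"}, {"high_mortality"})
rule_lift = lift(areas, {"low_education"}, {"high_mortality"})

print(rule_support, rule_confidence, rule_lift)
```

    A lift above 1 would indicate that the antecedent makes the consequent more likely than its baseline rate; a full miner such as Apriori enumerates candidate itemsets and keeps only the rules that clear chosen support and confidence thresholds.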

    Machine Learning Approach for Cancer Entities Association and Classification

    According to the World Health Organization (WHO), cancer is the second leading cause of death globally. Scientific research on different types of cancer grows at an ever-increasing rate, producing large volumes of research articles every year. Insight into and knowledge of the drugs, diagnostics, risks, symptoms, treatments, etc., related to genes are significant factors that help explore and advance cancer research. Manually screening such a large volume of articles to formulate a hypothesis is laborious and time-consuming. The study uses two of the most non-trivial Natural Language Processing (NLP) functions, entity recognition and text classification, to discover knowledge from biomedical literature. Named Entity Recognition (NER) recognizes and extracts the predefined entities related to cancer from unstructured text with the support of a user-friendly interface and built-in dictionaries. Text classification helps to explore the insights in the text and simplifies data categorization, querying, and article screening. Machine learning classifiers are also used to build the classification model, and Structured Query Language (SQL) is used to identify hidden relations that may lead to significant predictions.