
    A novel data-driven robust framework based on machine learning and knowledge graph for disease classification

    Because noncommunicable diseases (NCDs) are affected or controlled by diverse factors such as age, region, timeliness, and seasonality, they are difficult to treat accurately, which affects the daily life and work of patients. Although a number of researchers have made progress on certain diseases (through clinical and computer-based work), the current situation could still be improved with computer technologies such as data mining and deep learning. In addition, progress in NCD research has been hampered by the privacy of health and medical data. In this paper, a hierarchical approach is proposed to study the effects of various factors on diseases, and an extensible data-driven framework named d-DC is presented. d-DC classifies a disease according to occupation, given that the disease occurs in a certain region. To collect data, we combined personal or family medical records with traditional methods to build a data acquisition model. The model collects and replenishes data automatically, and it also tackles the cold start problem when relatively little data is available. The information gathered includes both structured data and unstructured data (such as plain text, images, or videos), which helps improve classification accuracy and the acquisition of new knowledge. Apart from adopting machine learning methods, d-DC employs a knowledge graph (KG) to classify diseases for the first time. Vectorizing medical texts with knowledge embedding is a novel consideration in disease classification. When results are singular, a medical expert system is proposed to resolve inconsistencies through knowledge bases or online experts. The results of d-DC are displayed using a combination of KG and traditional methods, which gives an intuitive, highly descriptive interpretation of the results. Experiments show that d-DC achieves higher accuracy than previous methods. In particular, a fusion method called RKRE, based on both ResNet and the expert system, attained an average correct proportion of 86.95%, which demonstrates its feasibility for disease classification.

    Attribute Interactions in Medical Data Analysis

    There is much empirical evidence about the success of naive Bayesian classification (NBC) in medical applications of attribute-based machine learning. NBC assumes conditional independence between attributes. In classification, such classifiers sum up the pieces of class-related evidence from individual attributes, independently of other attributes. The performance, however, deteriorates significantly when the “interactions” between attributes become critical. We propose an approach to handling attribute interactions within the framework of “voting” classifiers, such as NBC. We propose an operational test for detecting interactions in learning data and a procedure that takes the detected interactions into account while learning. This approach induces a structuring of the domain of attributes; it may lead to improved classifier performance and may provide useful novel information for the domain expert when interpreting the results of learning. We report on its application in data analysis and model construction for the prediction of clinical outcome in hip arthroplasty.
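The independence assumption described in this abstract can be illustrated with a minimal sketch. It is not the authors' procedure: the function names (`train_nbc`, `log_scores`, `predict`), the Laplace smoothing parameter `alpha`, and the toy XOR data are all assumptions made for illustration. The classifier adds one log-evidence term per attribute, so it cannot separate classes that depend only on the combination of two attributes. Treating the attribute pair as a single joint attribute is shown here as one simple way to account for such an interaction.

```python
import math
from collections import Counter, defaultdict


def train_nbc(rows, labels, alpha=1.0):
    """Count class and per-attribute value frequencies (hypothetical helper)."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (attribute index, class) -> value counts
    values = defaultdict(set)            # attribute index -> set of observed values
    for row, y in zip(rows, labels):
        for i, v in enumerate(row):
            value_counts[(i, y)][v] += 1
            values[i].add(v)
    return class_counts, value_counts, values, alpha


def log_scores(model, row):
    """Sum the log prior and one log-evidence term per attribute, independently."""
    class_counts, value_counts, values, alpha = model
    total = sum(class_counts.values())
    scores = {}
    for c, n_c in class_counts.items():
        s = math.log(n_c / total)
        for i, v in enumerate(row):
            # Laplace-smoothed estimate of P(attribute i = v | class c)
            num = value_counts[(i, c)][v] + alpha
            den = n_c + alpha * len(values[i])
            s += math.log(num / den)
        scores[c] = s
    return scores


def predict(model, row):
    scores = log_scores(model, row)
    return max(scores, key=scores.get)


# Toy data with a pure interaction: the class is the XOR of the two attributes.
xor_rows = [(0, 0), (0, 1), (1, 0), (1, 1)]
xor_labels = [0, 1, 1, 0]

# With independent attributes, each attribute alone carries no evidence,
# so both classes receive the same score for every input.
independent_model = train_nbc(xor_rows, xor_labels)
independent_scores = [log_scores(independent_model, r) for r in xor_rows]

# Joining the two attributes into one compound attribute exposes the interaction.
joined_rows = [((a, b),) for a, b in xor_rows]
joined_model = train_nbc(joined_rows, xor_labels)
joined_predictions = [predict(joined_model, r) for r in joined_rows]
```

On this toy data the independent model scores both classes identically for every row, while the model trained on the joined attribute recovers all four labels.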

    Machine Learning and Integrative Analysis of Biomedical Big Data.

    Recent developments in high-throughput technologies have accelerated the accumulation of massive amounts of omics data from multiple sources: genome, epigenome, transcriptome, proteome, metabolome, etc. Traditionally, data from each source (e.g., genome) is analyzed in isolation using statistical and machine learning (ML) methods. Integrative analysis of multi-omics and clinical data is key to new biomedical discoveries and advancements in precision medicine. However, data integration poses new computational challenges and exacerbates those associated with single-omics studies. Specialized computational approaches are required to effectively and efficiently perform integrative analysis of biomedical data acquired from diverse modalities. In this review, we discuss state-of-the-art ML-based approaches for tackling five specific computational challenges associated with integrative analysis: curse of dimensionality, data heterogeneity, missing data, class imbalance, and scalability issues.