
    Attribute Selection for Classification

    The selection of attributes used to construct a classification model is crucial in machine learning, particularly for instance-similarity methods. We present a new algorithm that selects and ranks attributes by weighting them according to their ability to help predict the class. The algorithm uses the same structure that holds the training records for classification. Attribute values and their classes are projected into a one-dimensional space to account for varying degrees of the relationship between them. Because the user decides on the degree of this relationship, any of several potential solutions can serve as the criterion for determining attribute relevance. This low-complexity algorithm increases predictive classification accuracy and also helps reduce the feature-dimensionality problem.
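The abstract does not spell out its one-dimensional projection, so the sketch below uses a plainly different, simpler stand-in to illustrate the general idea of ranking attributes by how well they predict the class: each attribute is scored by the accuracy of a one-attribute lookup predictor (every value predicts its majority class). The function names `attribute_weight` and `rank_attributes` are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Hashable, List, Sequence, Tuple


def attribute_weight(values: Sequence[Hashable], labels: Sequence[Hashable]) -> float:
    """Hypothetical relevance score (NOT the paper's projection method).

    Returns the fraction of records whose class equals the majority class
    of their attribute value: 1.0 means each value maps to a single class.
    """
    by_value = defaultdict(Counter)
    for v, y in zip(values, labels):
        by_value[v][y] += 1
    correct = sum(counts.most_common(1)[0][1] for counts in by_value.values())
    return correct / len(labels)


def rank_attributes(records: Sequence[Sequence[Hashable]],
                    labels: Sequence[Hashable]) -> List[Tuple[int, float]]:
    """Rank attribute indices from most to least class-predictive."""
    n_attrs = len(records[0])
    scores = [(j, attribute_weight([r[j] for r in records], labels))
              for j in range(n_attrs)]
    return sorted(scores, key=lambda t: t[1], reverse=True)


if __name__ == "__main__":
    records = [("sunny", "hot"), ("sunny", "mild"), ("rain", "hot"), ("rain", "mild")]
    labels = ["no", "no", "yes", "yes"]
    print(rank_attributes(records, labels))  # [(0, 1.0), (1, 0.5)]
```

One caveat of this stand-in: attributes with many distinct values (for example record IDs) trivially score 1.0, which is one reason more careful relevance criteria, such as the one the abstract describes, are needed.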

    A Social Network Image Classification Algorithm Based on Multimodal Deep Learning

    The complex data structure and massive volume of image data in social networks pose a huge challenge to mining the associations between pieces of social information. To classify social network images accurately, this paper proposes a social network image classification algorithm based on multimodal deep learning. First, a social network association clustering model (SNACM) was established and used to calculate trust and similarity, which represent the degree of similarity between users. Using an artificial ant colony algorithm, the SNACM was subjected to weighted stacking, and the social network image association network was constructed. Next, social network images in three modalities, i.e. RGB (red-green-blue) images, grayscale images, and depth images, were fused. Finally, a three-dimensional neural network (3D NN) was constructed to extract features from the multimodal social network images. Experiments demonstrated the validity and accuracy of the proposed algorithm. The results provide a reference for applying multimodal deep learning to image classification in other fields.
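The abstract says RGB, grayscale and depth images were fused but not how. As an illustration only, the sketch below assumes the simplest option, early fusion by channel concatenation, producing a 5-channel pixel grid (R, G, B, gray, depth) that a downstream network could consume. The grayscale channel is derived with the ITU-R BT.601 luma weights; the function names are hypothetical and images are plain nested lists to keep the example dependency-free.

```python
from typing import List, Sequence, Tuple

RGB = Tuple[float, float, float]


def rgb_to_gray(rgb: Sequence[Sequence[RGB]]) -> List[List[float]]:
    """Grayscale via ITU-R BT.601 luma weights (an assumed choice)."""
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row] for row in rgb]


def fuse_early(rgb: Sequence[Sequence[RGB]],
               depth: Sequence[Sequence[float]]) -> List[List[List[float]]]:
    """Hypothetical early fusion: stack R, G, B, gray, depth per pixel (H x W x 5)."""
    if len(depth) != len(rgb) or any(len(d) != len(r) for d, r in zip(depth, rgb)):
        raise ValueError("depth map must have the same resolution as the RGB image")
    gray = rgb_to_gray(rgb)
    return [
        [list(px) + [g, d] for px, g, d in zip(rgb_row, gray_row, depth_row)]
        for rgb_row, gray_row, depth_row in zip(rgb, gray, depth)
    ]


if __name__ == "__main__":
    image = [[(1.0, 1.0, 1.0), (0.0, 0.0, 0.0)]]  # 1 x 2 RGB image
    depth_map = [[0.5, 0.25]]                     # matching 1 x 2 depth map
    fused = fuse_early(image, depth_map)
    print(len(fused[0][0]))  # 5 channels per pixel
```

Late fusion (separate feature extractors per modality, merged before the classifier) is a common alternative; nothing in the abstract indicates which the authors used.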

    Improving ICD-based semantic similarity by accounting for varying degrees of comorbidity

    Finding similar patients is a common objective in precision medicine, facilitating treatment outcome assessment and clinical decision support. Choosing widely available patient features and appropriate mathematical methods for similarity calculations is crucial. International Statistical Classification of Diseases and Related Health Problems (ICD) codes are used worldwide to encode diseases and are available for nearly all patients. Aggregated as sets of primary and secondary diagnoses, they can indicate a degree of comorbidity and reveal comorbidity patterns. The similarity of patients can be computed from their ICD codes using semantic similarity algorithms. These algorithms have traditionally been evaluated on a single-term, expert-rated data set. However, real-world patient data often display varying degrees of documented comorbidity, which might impair algorithm performance. To account for this, we present a scale term that considers documented comorbidity variance. In this work, we compared the performance of 80 combinations of established algorithms in terms of semantic similarity based on ICD-code sets. The sets were extracted from patients with a primary diagnosis of C25.X (pancreatic cancer) and provide a variety of different ICD-code combinations. Using our scale term, we obtained the best results with a combination of level-based information content, Leacock & Chodorow concept similarity and bipartite graph matching for the set similarities, reaching a correlation of 0.75 with our expert's ground truth. Our results highlight the importance of accounting for comorbidity variance while demonstrating how well current semantic similarity algorithms perform.
    Comment: 11 pages, 6 figures, 1 table
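To make two of the named components concrete, the sketch below computes Leacock & Chodorow similarity, sim = -log(len / (2D)), where len is the shortest path between two codes counted in nodes and D is the taxonomy depth in nodes, and then scores two code sets with a maximum-weight bipartite matching. It is an illustration under stated assumptions, not the authors' pipeline: the tiny ICD-10 fragment is hypothetical and simplified, similarities are normalised to [0, 1] by dividing by log(2D), the set score is normalised by the larger set size, and the paper's level-based information content and comorbidity scale term are not reproduced.

```python
import math
from itertools import permutations
from typing import Dict, List, Sequence

# Hypothetical, simplified ICD-10 fragment: code -> root-to-leaf path.
HIERARCHY: Dict[str, List[str]] = {
    "C25.0": ["ROOT", "II", "C15-C26", "C25", "C25.0"],
    "C25.1": ["ROOT", "II", "C15-C26", "C25", "C25.1"],
    "C18.7": ["ROOT", "II", "C15-C26", "C18", "C18.7"],
    "E11.9": ["ROOT", "IV", "E10-E14", "E11", "E11.9"],
}
# Taxonomy depth D counted in nodes, so the longest path (2D - 1 nodes) keeps sim > 0.
MAX_DEPTH = max(len(p) for p in HIERARCHY.values())


def path_len_nodes(a: str, b: str) -> int:
    """Shortest path between two codes through their deepest common ancestor, in nodes."""
    pa, pb = HIERARCHY[a], HIERARCHY[b]
    common = 0
    for x, y in zip(pa, pb):
        if x != y:
            break
        common += 1
    edges = (len(pa) - common) + (len(pb) - common)
    return edges + 1


def lch(a: str, b: str) -> float:
    """Leacock & Chodorow similarity, normalised to [0, 1] by its maximum log(2D)."""
    raw = -math.log(path_len_nodes(a, b) / (2 * MAX_DEPTH))
    return raw / math.log(2 * MAX_DEPTH)


def set_similarity(A: Sequence[str], B: Sequence[str]) -> float:
    """Maximum-weight bipartite matching by brute force (small sets only).

    Dividing by the larger set size penalises unmatched codes; for larger sets
    a Hungarian-algorithm solver would replace the permutation search.
    """
    if not A or not B:
        return 0.0
    small, large = (A, B) if len(A) <= len(B) else (B, A)
    best = 0.0
    for perm in permutations(large, len(small)):
        best = max(best, sum(lch(a, b) for a, b in zip(small, perm)))
    return best / max(len(A), len(B))


if __name__ == "__main__":
    print(round(lch("C25.0", "C25.1"), 3))  # siblings under C25
    print(round(set_similarity(["C25.0", "E11.9"], ["C25.1"]), 3))
```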

    Clustering Mining Algorithm of Internet of Things Database Based on Python Language

    To address the reading delay in data mining of Internet of Things databases, a clustering mining algorithm for Internet of Things databases based on Python is proposed. We designed an improved crawler algorithm in Python based on the open-source structure of scratch, used a Bayesian classifier to judge the similarity of recruitment-data topics in the Internet of Things database, and crawled the recruitment data in the database: the number of keywords in the text space, the degree of keyword extraction, and the number of keyword data in the text space. A time series model is used to eliminate the delay of text features. On this basis, semi-supervised learning and semi-cluster analysis methods are used to construct the corresponding classifier, complete the adaptive classification of the text data stream, and realize Python-based clustering mining of the Internet of Things database. The experimental results show that this method has low reading delay and can mine the attention, number of posts and click-time frequency of the Internet of Things database from which the recruitment data are obtained.
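The abstract mentions a Bayesian classifier for judging topic similarity without naming the variant. The sketch below assumes a multinomial naive Bayes text classifier with add-one (Laplace) smoothing and whitespace tokenisation, which is one common choice; the class name `TinyNaiveBayes` and the example documents and labels are hypothetical.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List


class TinyNaiveBayes:
    """Hypothetical minimal multinomial naive Bayes with add-one smoothing."""

    def fit(self, docs: List[str], labels: List[str]) -> "TinyNaiveBayes":
        self.class_counts = Counter(labels)
        self.word_counts: Dict[str, Counter] = defaultdict(Counter)
        self.vocab = set()
        for doc, y in zip(docs, labels):
            tokens = doc.lower().split()  # naive whitespace tokenisation
            self.word_counts[y].update(tokens)
            self.vocab.update(tokens)
        self.total_words = {y: sum(self.word_counts[y].values()) for y in self.class_counts}
        return self

    def log_posteriors(self, doc: str) -> Dict[str, float]:
        """Unnormalised log P(class) + sum of log P(token | class)."""
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        scores = {}
        for y, n_y in self.class_counts.items():
            score = math.log(n_y / n_docs)
            for tok in doc.lower().split():
                # Add-one smoothing so an unseen token does not zero out the class.
                score += math.log((self.word_counts[y][tok] + 1) / (self.total_words[y] + v))
            scores[y] = score
        return scores

    def predict(self, doc: str) -> str:
        scores = self.log_posteriors(doc)
        return max(scores, key=scores.get)


if __name__ == "__main__":
    docs = ["iot sensor engineer job", "embedded iot firmware job",
            "football match score", "league match result"]
    labels = ["recruitment", "recruitment", "other", "other"]
    nb = TinyNaiveBayes().fit(docs, labels)
    print(nb.predict("iot job opening"))  # recruitment
```

Working in log space avoids floating-point underflow when documents are long; in a semi-supervised setting such as the one the abstract describes, confident predictions on unlabelled text could be fed back as extra training labels, though the paper's exact procedure is not given.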

    Email grouping method
