
    Blog Analysis with Fuzzy TFIDF

    These days blogs are becoming increasingly popular because they allow anyone to share personal diaries, opinions, and comments on the World Wide Web. Many blogs contain valuable information, but extracting it from a large number of blog comments is difficult. The goal is to analyze a large number of blog comments by clustering them into smaller groups according to their similarity, measured by keyword relevance. TF-IDF weighting has been used to classify documents by measuring how frequently each keyword appears in a document, but it is not effective at differentiating semantic similarities between words. Applying fuzzy semantics to TF-IDF yields fuzzy TF-IDF, which can rank semantic relevancy. Adapting fuzzy TF-IDF and fuzzy semantics extends the Vector Space Model (VSM) to a fuzzy VSM, which can be effective in exploring hidden relationships between blog comments. Fuzzy VSM can therefore cluster a large number of blog comments into a small number of groups based on document similarity and semantic relevancy.
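    The abstract above builds on standard TF-IDF weighting and the Vector Space Model. As a rough illustration only, the sketch below computes plain (non-fuzzy) TF-IDF vectors and cosine similarity for a few made-up comments. It is not the paper's fuzzy TF-IDF: the tokenizer, the `tf * log(N / df)` weighting variant, the function names, and the sample comments are all assumptions chosen for brevity.

```python
import math
from collections import Counter

PUNCT = ".,!?;:\"'()"


def tokenize(text):
    """Lowercase, split on whitespace, and strip surrounding punctuation."""
    words = (w.strip(PUNCT) for w in text.lower().split())
    return [w for w in words if w]


def tfidf_vectors(docs):
    """Return one {term: weight} dict per document, using tf * log(N / df)."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        vectors.append(
            {t: (c / total) * math.log(n_docs / df[t]) for t, c in counts.items()}
        )
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical blog comments, invented for this example.
comments = [
    "great post about cooking pasta",
    "I love cooking pasta at home",
    "the election results were surprising",
]
vectors = tfidf_vectors(comments)
sim_related = cosine(vectors[0], vectors[1])    # share "cooking" and "pasta"
sim_unrelated = cosine(vectors[0], vectors[2])  # share no terms
```

    Because plain TF-IDF only matches identical terms, two comments that use different words for the same idea score zero here. That is the limitation the abstract says fuzzy semantics is meant to address.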

    Mapping Big Data into Knowledge Space with Cognitive Cyber-Infrastructure

    Big data research has attracted great attention in science, technology, industry and society. It is developing alongside the evolving scientific paradigm, the fourth industrial revolution, and the transformational innovation of technologies. However, its nature and fundamental challenge have not been recognized, and its own methodology has not been formed. This paper explores and answers the following questions: What is big data? What are the basic methods for representing, managing and analyzing big data? What is the relationship between big data and knowledge? Can we find a mapping from big data into knowledge space? What kind of infrastructure is required to support not only big data management and analysis but also knowledge discovery, sharing and management? What is the relationship between big data and the scientific paradigm? What is the nature and fundamental challenge of big data computing? A multi-dimensional perspective is presented toward a methodology of big data computing. Comment: 59 pages

    Mining health knowledge graph for health risk prediction

    Classification models are now widely adopted in healthcare to support practitioners in disease diagnosis and to reduce human error. The challenge is to find effective methods for mining real-world data in the medical domain, as many different models have been proposed with varying results. A large number of researchers focus on the diversity problem of real-time data sets in classification models. Some previous works developed methods that use homogeneous graphs for knowledge representation and then knowledge discovery. However, such approaches are weak at discovering different relationships among elements. In this paper, we propose an innovative classification model for knowledge discovery from patients’ personal health repositories. The model discovers medical domain knowledge from the massive data in the National Health and Nutrition Examination Survey (NHANES). The knowledge is conceptualised in a heterogeneous knowledge graph. On the basis of the model, an innovative method is developed to help uncover potential diseases suffered by people and, furthermore, to classify patients’ health risk. The proposed model is evaluated in an empirical experiment by comparison with a baseline model also built on the NHANES data set. The performance of the proposed model is promising. The paper makes significant contributions to the advancement of knowledge in data mining with an innovative classification model specifically crafted for domain-based data. In addition, by assessing the patterns of various observations, the research contributes to the work of practitioners by providing a multifaceted understanding of individual and public health.