11 research outputs found

    A Survey Paper on Ontology-Based Approaches for Semantic Data Mining

    Semantic data mining refers to data mining tasks that systematically incorporate domain knowledge, in particular formal semantics, into the process. Numerous research efforts have demonstrated the benefits of incorporating domain knowledge into data mining, and at the same time the growth of knowledge engineering has enriched the body of domain knowledge, especially formal semantics and Semantic Web ontologies. An ontology is an explicit specification of a conceptualization and a formal way to characterize the semantics of information and data. The formal structure of an ontology makes it a natural way to encode domain knowledge for data mining use. This survey examines how ontologies can support semantic data mining and how the formal semantics in ontologies can be integrated into the data mining process. DOI: 10.17762/ijritcc2321-8169.16048

    Ontology-Based Social Media Talks Topic Classification (Twitter Case)

    In the era of digital communication, the use of Twitter as a customer service channel has become widespread. Companies have started to develop strategies around the effective use of Twitter, one of which is to identify the problems customers most frequently complain about. Tweets, being short and direct, tend to contain sentences with very specific and easily recognizable keywords. These characteristics can be used as a basis for classifying tweets into particular topics. With the help of an ontology, keyword-based classification can be done automatically. The purpose of this paper is to design an ontology used as a basis for classifying tweets into topics related to the 4G telecommunications network in Indonesia and to evaluate the performance of the proposed classifier model.
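
    As an illustration of the keyword-driven, ontology-based classification this abstract describes, the minimal Python sketch below assigns tweets to topics via a mapping from ontology classes to keyword sets. The class names and keywords are hypothetical placeholders, not the authors' 4G-network ontology:

    # Minimal sketch: classify tweets by matching keywords attached to ontology classes.
    # The topic names and keyword lists are illustrative examples only.
    TOPIC_KEYWORDS = {
        "NetworkCoverage": {"signal", "coverage", "no service", "dead zone"},
        "DataSpeed": {"slow", "speed", "lag", "buffering"},
        "Billing": {"bill", "charge", "quota", "top up"},
    }

    def classify_tweet(text: str) -> list[str]:
        """Return every topic whose keywords appear in the tweet (lower-cased match)."""
        text = text.lower()
        return [topic for topic, words in TOPIC_KEYWORDS.items()
                if any(word in text for word in words)] or ["Unclassified"]

    if __name__ == "__main__":
        print(classify_tweet("The 4G signal in my area is gone again, no service all day"))
        # -> ['NetworkCoverage']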

    Semantic-Based Model Analysis Towards Enhancing Information Values of Process Mining: Case Study of Learning Process Domain

    Process mining results can be enhanced by adding semantic knowledge to the derived models. Information discovered through semantic enrichment of the deployed process models can be used to lift process analysis from the syntactic level to a more conceptual level. The work in this paper corroborates that semantic-based process mining is a useful technique for improving the information value of models derived from large volumes of event logs about any process domain. We use a case study of a learning process to illustrate this notion. Our goal is to extract streams of event logs from a learning execution environment and describe formats that allow mining and improved process analysis of the captured data. The approach maps the learning model derived from mining the event data by semantically annotating the process elements, in real time, with the concepts they represent using process description languages, and by linking them to an ontology specifically designed for representing learning processes. The semantic analysis allows the meaning of the learning objects to be enhanced through the use of property characteristics and the classification of discoverable entities, generating inference knowledge that is used to determine useful learning patterns by means of the Semantic Learning Process Mining (SLPM) algorithm, technically described as the Semantic-Fuzzy Miner. To this end, we show how data from learning processes are extracted, semantically prepared, and transformed into mining-executable formats to enable the prediction of individual learning patterns through further semantic analysis of the discovered models.
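
    A minimal sketch of the semantic-annotation step outlined above: event-log activities are linked to concepts from a learning-process ontology so that analysis can work at the concept level rather than the raw label level. The concept mapping and log entries below are hypothetical stand-ins, not the ontology or data used in the paper:

    # Minimal sketch: annotate raw event-log activities with ontology concepts so that
    # process analysis can be lifted from activity labels to concepts.
    ACTIVITY_TO_CONCEPT = {
        "open_lecture_video": "ContentAccess",
        "submit_quiz": "Assessment",
        "post_forum_reply": "Collaboration",
    }

    event_log = [
        {"case": "student_01", "activity": "open_lecture_video", "timestamp": "2023-03-01T09:00"},
        {"case": "student_01", "activity": "submit_quiz", "timestamp": "2023-03-01T09:40"},
    ]

    # Enrich each event with the concept it represents; unknown activities stay unmapped.
    annotated_log = [
        {**event, "concept": ACTIVITY_TO_CONCEPT.get(event["activity"], "Unknown")}
        for event in event_log
    ]

    for event in annotated_log:
        print(event["case"], event["activity"], "->", event["concept"])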

    Stroke outcome measurements from electronic medical records : cross-sectional study on the effectiveness of neural and nonneural classifiers

    Background: With the rapid adoption of electronic medical records (EMRs), there is an ever-increasing opportunity to collect data and extract knowledge from EMRs to support patient-centered stroke management. Objective: This study aims to compare the effectiveness of state-of-the-art automatic text classification methods in classifying data to support the prediction of clinical patient outcomes and the extraction of patient characteristics from EMRs. Methods: Our study addressed the computational problems of information extraction and automatic text classification. We identified essential tasks to be considered in an ischemic stroke value-based program. The 30 selected tasks were classified (manually labeled by specialists) according to the following value agenda: tier 1 (achieved health care status), tier 2 (recovery process), care related (clinical management and risk scores), and baseline characteristics. The analyzed data set was retrospectively extracted from the EMRs of patients with stroke from a private Brazilian hospital between 2018 and 2019. A total of 44,206 sentences from free-text medical records in Portuguese were used to train and develop 10 supervised computational machine learning methods, including state-of-the-art neural and nonneural methods, along with ontological rules. As an experimental protocol, we used a 5-fold cross-validation procedure repeated 6 times, along with subject-wise sampling. A heatmap was used to display comparative result analyses according to the best algorithmic effectiveness (F1 score), supported by statistical significance tests. A feature importance analysis was conducted to provide insights into the results. Results: The top-performing models were support vector machines trained with lexical and semantic textual features, showing the importance of dealing with noise in EMR textual representations. The support vector machine models produced statistically superior results in 71% (17/24) of tasks, with an F1 score >80% regarding care-related tasks (patient treatment location, fall risk, thrombolytic therapy, and pressure ulcer risk), the process of recovery (ability to feed orally or ambulate and communicate), health care status achieved (mortality), and baseline characteristics (diabetes, obesity, dyslipidemia, and smoking status). Neural methods were largely outperformed by more traditional nonneural methods, given the characteristics of the data set. Ontological rules were also effective in tasks such as baseline characteristics (alcoholism, atrial fibrillation, and coronary artery disease) and the Rankin scale. The complementarity in effectiveness among models suggests that a combination of models could enhance the results and cover more tasks in the future. Conclusions: Advances in information technology capacity are essential for scalability and agility in measuring health status outcomes. This study allowed us to measure effectiveness and identify opportunities for automating the classification of outcomes of specific tasks related to clinical conditions of stroke victims, and thus ultimately assess the possibility of proactively using these machine learning techniques in real-world situations
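
    The evaluation protocol described above (a linear SVM over textual features, scored with F1 under 5-fold cross-validation repeated 6 times) can be approximated with scikit-learn as in the sketch below. The sentences and labels are toy placeholders, not the study's Portuguese EMR data, and the semantic features and subject-wise sampling used in the paper are omitted:

    # Minimal sketch of the evaluation protocol: TF-IDF features + linear SVM,
    # scored with F1 under 5-fold cross-validation repeated 6 times.
    from sklearn.pipeline import Pipeline
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.svm import LinearSVC
    from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score

    texts = [
        "patient ambulates without assistance", "independent oral feeding observed",
        "no communication deficit noted", "discharged walking unaided",
        "able to feed orally at discharge",
        "patient unable to ambulate", "dysphagia, oral feeding not possible",
        "severe aphasia, cannot communicate", "bedridden on discharge",
        "feeding via nasogastric tube",
    ]
    labels = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]  # 1 = outcome achieved, 0 = not achieved

    model = Pipeline([
        ("tfidf", TfidfVectorizer(ngram_range=(1, 2))),
        ("svm", LinearSVC()),
    ])

    cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=6, random_state=42)
    scores = cross_val_score(model, texts, labels, cv=cv, scoring="f1")
    print(f"Mean F1 over {len(scores)} folds: {scores.mean():.2f}")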

    A Knowledge-Based Topic Modeling Approach for Automatic Topic Labeling

    Probabilistic topic models, which aim to discover latent topics in text corpora, define each document as a multinomial distribution over topics and each topic as a multinomial distribution over words. Although humans can infer a proper label for a topic by looking at its top representative words, this is not feasible for machines. Automatic topic labeling techniques try to address this problem; their ultimate goal is to assign interpretable labels to the learned topics. In this paper, we take ontology concepts into consideration, rather than words alone, to improve the quality of the labels generated for each topic. Our work differs from previous efforts in this area, in which topics are usually represented by a batch of words selected from them. The main aspects of our approach are: 1) we incorporate ontology concepts with statistical topic modeling in a unified framework, where each topic is a multinomial probability distribution over concepts and each concept is represented as a distribution over words; and 2) we propose a topic labeling model that draws on the meaning of the ontology concepts included in the learned topics. The best topic labels are selected with respect to the semantic similarity of the concepts and their ontological categorizations. We demonstrate the effectiveness of using ontological concepts as richer aspects between topics and words through comprehensive experiments on two different data sets. In other words, representing topics via ontological concepts is an effective way of generating descriptive and representative labels for the discovered topics.
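
    A minimal sketch of the labeling idea: given a learned topic expressed as a distribution over ontology concepts, pick the label by combining each candidate concept's weight in the topic with its average semantic similarity to the other top concepts. The concepts, probabilities, and similarity values below are illustrative only, not the paper's model or data:

    # Minimal sketch: choose a topic label from ontology concepts by weighting each
    # candidate's probability in the topic with its average similarity to the others.
    topic_over_concepts = {"Cardiology": 0.35, "HeartDisease": 0.30,
                           "Hypertension": 0.20, "Nutrition": 0.15}

    # Hypothetical pairwise semantic similarities (e.g. from an ontology path measure).
    similarity = {
        ("Cardiology", "HeartDisease"): 0.9, ("Cardiology", "Hypertension"): 0.7,
        ("Cardiology", "Nutrition"): 0.3, ("HeartDisease", "Hypertension"): 0.8,
        ("HeartDisease", "Nutrition"): 0.4, ("Hypertension", "Nutrition"): 0.3,
    }

    def sim(a: str, b: str) -> float:
        return 1.0 if a == b else similarity.get((a, b), similarity.get((b, a), 0.0))

    def score(candidate: str) -> float:
        others = [c for c in topic_over_concepts if c != candidate]
        avg_sim = sum(sim(candidate, o) for o in others) / len(others)
        return topic_over_concepts[candidate] * avg_sim

    label = max(topic_over_concepts, key=score)
    print(label)  # -> 'Cardiology' under these illustrative numbers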

    KDC: a knowledge-based approach for document classification

    Master's dissertation - Universidade Federal de Santa Catarina, Centro Tecnológico, Programa de Pós-Graduação em Ciência da Computação, Florianópolis, 2015. Abstract: Document classification provides a way to organize information, enabling a better understanding and interpretation of the available data. The classification task is characterized by the association of class labels with documents, aiming to create semantic clusters. The exponential increase in the number of documents and digital data demands more precise, comprehensive and efficient ways to search for and organize information. In this context, improving document classification techniques with the use of semantic information is considered essential. Thus, this work proposes a knowledge-based approach to document classification. The technique uses terms extracted from documents, associating them with concepts from an open-domain knowledge base. The concepts are then generalized to a higher level of abstraction. Finally, a disparity value between the generalized concepts and the document is computed, and the concept with the lowest disparity is taken as the class label applicable to the document. The proposed technique offers advantages over conventional methods, including no need for training, the option to assign one or multiple classes to a document, and the ability to classify across different subjects without changing the classifier.
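
    A minimal sketch of the pipeline described above: extract terms, map them to knowledge-base concepts, generalize to broader concepts, and label the document with the generalized concept of lowest disparity. The term-to-concept and concept-hierarchy tables, and the simple disparity measure, are hypothetical stand-ins for the open-domain knowledge base used in the dissertation:

    # Minimal sketch of the knowledge-based labeling pipeline.
    # TERM_TO_CONCEPT and PARENT_CONCEPT are hypothetical knowledge-base fragments;
    # the disparity measure here is a simple (1 - coverage) placeholder.
    TERM_TO_CONCEPT = {"goalkeeper": "Football", "midfielder": "Football",
                       "guitar": "Music", "striker": "Football", "stadium": "Sports venue"}
    PARENT_CONCEPT = {"Football": "Sport", "Sports venue": "Sport", "Music": "Art"}

    def classify(document: str) -> str:
        terms = document.lower().split()
        # 1) Map extracted terms to knowledge-base concepts.
        concepts = [TERM_TO_CONCEPT[t] for t in terms if t in TERM_TO_CONCEPT]
        if not concepts:
            return "Unknown"
        # 2) Generalize each concept to a higher level of abstraction.
        generalized = [PARENT_CONCEPT.get(c, c) for c in concepts]
        # 3) Disparity: fraction of mapped terms NOT covered by the candidate label.
        def disparity(label: str) -> float:
            return 1.0 - generalized.count(label) / len(generalized)
        # 4) The label with the lowest disparity is assigned to the document.
        return min(set(generalized), key=disparity)

    print(classify("the goalkeeper and the striker trained at the stadium"))  # -> 'Sport'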

    Ontology-Based Text Classification into Dynamically Defined Topics

    No full text

    Models for automating analyses of clinical care outcomes

    Abstract not available

    Semantically aware hierarchical Bayesian network model for knowledge discovery in data : an ontology-based framework

    Several mining algorithms have been invented over recent decades. However, many of them are confined to generating frequent patterns and do not illustrate how to act upon them. Hence, many researchers have argued that existing mining algorithms have limitations with respect to performance and workability. Quantity and quality are the main limitations of the existing mining algorithms: quantity means that the generated patterns are abundant, while quality means that they cannot be integrated seamlessly into the business domain. Consequently, recent research has suggested that these limitations result from treating the mining process as an isolated, autonomous, data-driven trial-and-error exercise and from ignoring domain knowledge. Accordingly, the integration of domain knowledge into the mining process has become the goal of recent data mining algorithms. Domain knowledge can be represented using various techniques; however, recent research has stated that ontology is the natural way to represent knowledge for data mining use, and the structural nature of ontology makes it a very strong candidate for integrating domain knowledge with data mining algorithms. It has been claimed that ontology can play the following roles in the data mining process:
    • Bridging the semantic gap.
    • Providing prior knowledge and constraints.
    • Formally representing the data mining results.
    Although a variety of research has used ontology to enrich different tasks in the data mining process, recent work has revealed that a framework that systematically consolidates ontology and mining algorithms in an intelligent mining environment has not yet been realised. Hence, this thesis proposes an automatic, systematic and flexible framework that integrates the Hierarchical Bayesian Network (HBN) and domain ontology. The ultimate aim is a data mining framework that implicitly caters for the underpinning domain knowledge and eventually leads to a more intelligent and accurate mining process; to a certain extent, the proposed mining model simulates the human cognitive system. The similarity between ontology, the Bayesian Network (BN) and bioinformatics applications establishes a strong connection between these research disciplines, which can be summarised in the following points:
    • Both ontology and BN have a graph-based structure.
    • Biomedical applications are known for their uncertainty; likewise, BN is a powerful tool for reasoning under uncertainty.
    • The medical data involved in biomedical applications is comprehensive, and ontology is the right model for representing comprehensive data.
    Hence, the proposed ontology-based Semantically Aware Hierarchical Bayesian Network (SAHBN) is applied to eight biomedical data sets in the fields of predicting the effect of DNA repair genes on the human ageing process and identifying hub proteins. The performance of SAHBN was compared with that of existing Bayesian-based classification algorithms, and overall SAHBN demonstrated very competitive performance. The contributions of this thesis can be summarised in the following points:
    • It proposes an automatic, systematic and flexible framework to integrate ontology and the HBN. Based on the literature review, and to the best of our knowledge, no such framework has been proposed previously.
    • The complexity of learning the HBN structure from observed data is significant; the proposed SAHBN model uses domain knowledge in the form of ontology to overcome this challenge.
    • The proposed SAHBN model preserves the advantages of both ontology and Bayesian theory. It integrates the concept of Bayesian uncertainty with the deterministic nature of ontology without extending the ontology structure or adding probability-specific properties that violate the standard ontology structure.
    • The proposed SAHBN model uses domain knowledge in the form of ontology to define the semantic relationships between the attributes involved in the mining process, guide the HBN structure construction procedure, check the consistency of the training data set, and facilitate the calculation of the associated conditional probability tables (CPTs).
    • The proposed SAHBN model lays a solid foundation for integrating other semantic relations such as equivalence, disjointness, intersection and union.
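
    A heavily simplified sketch of the core idea: an ontology's subclass hierarchy is used to fix the structure of a Bayesian network (edges follow the concept-to-attribute relations), and conditional probability tables are then estimated from data. The tiny ontology fragment, attributes, and records below are hypothetical, and this is a plain Bayesian network rather than the hierarchical SAHBN model developed in the thesis:

    # Minimal sketch: derive a Bayesian-network structure from an ontology hierarchy
    # (parent concept -> attribute edges) and estimate CPTs by maximum likelihood.
    from collections import Counter, defaultdict
    from itertools import product

    # Hypothetical ontology fragment: each attribute's parent concept.
    ONTOLOGY_PARENT = {"DNARepairGene": "AgeingFactor", "OxidativeStress": "AgeingFactor"}

    # Network edges follow the ontology: parent concept -> attribute it subsumes.
    edges = [(parent, child) for child, parent in ONTOLOGY_PARENT.items()]

    # Toy training records (all variables binary for simplicity).
    records = [
        {"AgeingFactor": 1, "DNARepairGene": 1, "OxidativeStress": 1},
        {"AgeingFactor": 1, "DNARepairGene": 1, "OxidativeStress": 0},
        {"AgeingFactor": 0, "DNARepairGene": 0, "OxidativeStress": 0},
        {"AgeingFactor": 0, "DNARepairGene": 1, "OxidativeStress": 0},
    ]

    def cpt(child: str, parent: str) -> dict:
        """Estimate P(child | parent) by maximum likelihood from the records."""
        counts = defaultdict(Counter)
        for r in records:
            counts[r[parent]][r[child]] += 1
        return {(p, c): counts[p][c] / sum(counts[p].values())
                for p, c in product((0, 1), (0, 1)) if sum(counts[p].values())}

    for parent, child in edges:
        print(f"P({child} | {parent}) =", cpt(child, parent))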