153 research outputs found
Unsupervised Terminological Ontology Learning based on Hierarchical Topic Modeling
In this paper, we present hierarchical relationbased latent Dirichlet
allocation (hrLDA), a data-driven hierarchical topic model for extracting
terminological ontologies from a large number of heterogeneous documents. In
contrast to traditional topic models, hrLDA relies on noun phrases instead of
unigrams, considers syntax and document structures, and enriches topic
hierarchies with topic relations. Through a series of experiments, we
demonstrate the superiority of hrLDA over existing topic models, especially for
building hierarchies. Furthermore, we illustrate the robustness of hrLDA in the
settings of noisy data sets, which are likely to occur in many practical
scenarios. Our ontology evaluation results show that ontologies extracted from
hrLDA are very competitive with the ontologies created by domain experts
Distributional semantic modeling: a revised technique to train term/word vector space models applying the ontology-related approach
We design a new technique for the distributional semantic modeling with a
neural network-based approach to learn distributed term representations (or
term embeddings) - term vector space models as a result, inspired by the recent
ontology-related approach (using different types of contextual knowledge such
as syntactic knowledge, terminological knowledge, semantic knowledge, etc.) to
the identification of terms (term extraction) and relations between them
(relation extraction) called semantic pre-processing technology - SPT. Our
method relies on automatic term extraction from the natural language texts and
subsequent formation of the problem-oriented or application-oriented (also
deeply annotated) text corpora where the fundamental entity is the term
(includes non-compositional and compositional terms). This gives us an
opportunity to changeover from distributed word representations (or word
embeddings) to distributed term representations (or term embeddings). This
transition will allow to generate more accurate semantic maps of different
subject domains (also, of relations between input terms - it is useful to
explore clusters and oppositions, or to test your hypotheses about them). The
semantic map can be represented as a graph using Vec2graph - a Python library
for visualizing word embeddings (term embeddings in our case) as dynamic and
interactive graphs. The Vec2graph library coupled with term embeddings will not
only improve accuracy in solving standard NLP tasks, but also update the
conventional concept of automated ontology development. The main practical
result of our work is the development kit (set of toolkits represented as web
service APIs and web application), which provides all necessary routines for
the basic linguistic pre-processing and the semantic pre-processing of the
natural language texts in Ukrainian for future training of term vector space
models.Comment: In English, 9 pages, 2 figures. Not published yet. Prepared for
special issue (UkrPROG 2020 conference) of the scientific journal "Problems
in programming" (Founder: National Academy of Sciences of Ukraine, Institute
of Software Systems of NAS Ukraine
Patient triage by topic modelling of referral letters: Feasibility study
Background: Musculoskeletal conditions are managed within primary care but patients can be referred to secondary care if a specialist opinion is required. The ever increasing demand of healthcare resources emphasizes the need to streamline care pathways with the ultimate aim of ensuring that patients receive timely and optimal care. Information contained in referral letters underpins the referral decision-making process but is yet to be explored systematically for the purposes of treatment prioritization for musculoskeletal conditions. Objective: This study aims to explore the feasibility of using natural language processing and machine learning to automate triage of patients with musculoskeletal conditions by analyzing information from referral letters. Specifically, we aim to determine whether referral letters can be automatically assorted into latent topics that are clinically relevant, i.e. considered relevant when prescribing treatments. Here, clinical relevance is assessed by posing two research questions. Can latent topics be used to automatically predict the treatment? Can clinicians interpret latent topics as cohorts of patients who share common characteristics or experience such as medical history, demographics and possible treatments? Methods: We used latent Dirichlet allocation to model each referral letter as a finite mixture over an underlying set of topics and model each topic as an infinite mixture over an underlying set of topic probabilities. The topic model was evaluated in the context of automating patient triage. Given a set of treatment outcomes, a binary classifier was trained for each outcome using previously extracted topics as the input features of the machine learning algorithm. In addition, qualitative evaluation was performed to assess human interpretability of topics. Results: The prediction accuracy of binary classifiers outperformed the stratified random classifier by a large margin giving an indication that topic modelling could be used to predict the treatment thus effectively supporting patient triage. Qualitative evaluation confirmed high clinical interpretability of the topic model. Conclusions: The results established the feasibility of using natural language processing and machine learning to automate triage of patients with knee and/or hip pain by analyzing information from their referral letters
A Latent-Dirichlet-Allocation Based Extension for Domain Ontology of Enterprise’s Technological Innovation
This paper proposed a method for building enterprise's technological innovation domain ontology automatically from plain text corpus based on Latent Dirichlet Allocation (LDA). The proposed method consisted of four modules: 1) introducing the seed ontology for domain of enterprise's technological innovation, 2) using Natural Language Processing (NLP) technique to preprocess the collected textual data, 3) mining domain specific terms from document collections based on LDA, 4) obtaining the relationship between the terms through the defined relevant rules. The experiments have been carried out to demonstrate the effectiveness of this method and the results indicated that many terms in domain of enterprise's technological innovation and the semantic relations between terms are discovered. The proposed method is a process of continuously cycles and iterations, that is the obtained objective ontology can be re-iterated as initial seed ontology. The constant knowledge acquisition in the domain of enterprise's technological innovation to update and perfect the initial seed ontology
A case study on sepsis using PubMed and Deep Learning for ontology learning
We investigate the application of distributional semantics models for facilitating unsupervised extraction of biomedical terms from unannotated corpora.Term extraction is used as the first step of an ontology learning process that aims to (semi-)automatic annotation of biomedical concepts and relations from more than 300K PubMed titles and abstracts. We experimented with both traditional distributional semantics methods such as Latent Semantic Analysis (LSA) and Latent Dirichlet Allocation (LDA) as well as the neural language models CBOW and Skip-gram from Deep Learning. The evaluation conducted concentrates on sepsis, a major life-threatening condition, and shows that Deep Learning models outperform LSA and LDA with much higher precision
Knowledge Base Enrichment by Relation Learning from Social Tagging Data
There has been considerable interest in transforming unstructured social tagging data into structured knowledge for semantic-based retrieval and recommendation. Research in this line mostly exploits data co-occurrence and often overlooks the complex and ambiguous meanings of tags. Furthermore, there have been few comprehensive evaluation studies regarding the quality of the discovered knowledge. We propose a supervised learning method to discover subsumption relations from tags. The key to this method is quantifying the probabilistic association among tags to better characterise their relations. We further develop an algorithm to organise tags into hierarchies based on the learned relations. Experiments were conducted using a large, publicly available dataset, Bibsonomy, and three popular, human-engineered or data-driven knowledge bases: DBpedia, Microsoft Concept Graph, and ACM Computing Classification System. We performed a comprehensive evaluation using different strategies: relation-level, ontology-level, and knowledge base enrichment based evaluation. The results clearly show that the proposed method can extract knowledge of better quality than the existing methods against the gold standard knowledge bases. The proposed approach can also enrich knowledge bases with new subsumption relations, having the potential to significantly reduce time and human effort for knowledge base maintenance and ontology evolution
A data mining approach to ontology learning for automatic content-related question-answering in MOOCs.
The advent of Massive Open Online Courses (MOOCs) allows massive volume of registrants to enrol in these MOOCs. This research aims to offer MOOCs registrants with automatic content related feedback to fulfil their cognitive needs. A framework is proposed which consists of three modules which are the subject ontology learning module, the short text classification module, and the question answering module. Unlike previous research, to identify relevant concepts for ontology learning a regular expression parser approach is used. Also, the relevant concepts are extracted from unstructured documents. To build the concept hierarchy, a frequent pattern mining approach is used which is guided by a heuristic function to ensure that sibling concepts are at the same level in the hierarchy. As this process does not require specific lexical or syntactic information, it can be applied to any subject. To validate the approach, the resulting ontology is used in a question-answering system which analyses students' content-related questions and generates answers for them. Textbook end of chapter questions/answers are used to validate the question-answering system. The resulting ontology is compared vs. the use of Text2Onto for the question-answering system, and it achieved favourable results. Finally, different indexing approaches based on a subject's ontology are investigated when classifying short text in MOOCs forum discussion data; the investigated indexing approaches are: unigram-based, concept-based and hierarchical concept indexing. The experimental results show that the ontology-based feature indexing approaches outperform the unigram-based indexing approach. Experiments are done in binary classification and multiple labels classification settings . The results are consistent and show that hierarchical concept indexing outperforms both concept-based and unigram-based indexing. The BAGGING and random forests classifiers achieved the best result among the tested classifiers
A Hypergraph Data Model for Expert-Finding in Multimedia Social Networks
Online Social Networks (OSNs) have found widespread applications in every area of our life. A large number of people have signed up to OSN for different purposes, including to meet old friends, to choose a given company, to identify expert users about a given topic, producing a large number of social connections. These aspects have led to the birth of a new generation of OSNs, called Multimedia Social Networks (MSNs), in which user-generated content plays a key role to enable interactions among users. In this work, we propose a novel expert-finding technique exploiting a hypergraph-based data model for MSNs. In particular, some user-ranking measures, obtained considering only particular useful hyperpaths, have been profitably used to evaluate the related expertness degree with respect to a given social topic. Several experiments on Last.FM have been performed to evaluate the proposed approach's effectiveness, encouraging future work in this direction for supporting several applications such as multimedia recommendation, influence analysis, and so on
- …