
    Topic Distiller: distilling semantic topics from documents

    Abstract. This thesis details the design and implementation of a system that can find relevant and latent semantic topics in textual documents. The design of this system, named Topic Distiller, is inspired by research on automatic keyphrase extraction and automatic topic labeling, and it employs entity linking and knowledge bases to reduce text documents to their semantic topics. The Topic Distiller is evaluated using methods and datasets from information retrieval and automatic keyphrase extraction. In addition to the common datasets used in the literature, three new datasets are created to evaluate the system. The evaluation reveals that the Topic Distiller is able to find relevant and latent topics in textual documents, outperforming the state-of-the-art automatic keyphrase extraction methods on news articles and social media posts.
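
    As a rough illustration of the entity-linking approach the abstract describes, the following Python sketch links surface forms in a document to knowledge-base entities and treats the linked entities as candidate topics. It is not the thesis implementation: the use of the public DBpedia Spotlight REST endpoint, the confidence threshold, and the deduplication step are assumptions made for this example, and the ranking and latent-topic discovery steps are omitted.

        # Hypothetical sketch of entity-linking-based topic extraction,
        # in the spirit of the approach above (not the thesis implementation).
        # Assumes the public DBpedia Spotlight REST endpoint is reachable.
        import requests

        SPOTLIGHT_URL = "https://api.dbpedia-spotlight.org/en/annotate"  # assumed endpoint

        def extract_topics(text: str, confidence: float = 0.5) -> list[dict]:
            """Link surface forms in `text` to knowledge-base entities and
            return them as candidate semantic topics."""
            response = requests.get(
                SPOTLIGHT_URL,
                params={"text": text, "confidence": confidence},
                headers={"Accept": "application/json"},
                timeout=10,
            )
            response.raise_for_status()
            resources = response.json().get("Resources", [])
            # Deduplicate by entity URI; keep the surface form that triggered the link.
            topics = {}
            for r in resources:
                topics.setdefault(r["@URI"], {"uri": r["@URI"], "surface_form": r["@surfaceForm"]})
            return list(topics.values())

        if __name__ == "__main__":
            doc = "The European Central Bank raised interest rates amid inflation concerns."
            for topic in extract_topics(doc):
                print(topic["uri"], "<-", topic["surface_form"])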

    Proceedings of the 3rd Workshop on Domain-Specific Language Design and Implementation (DSLDI 2015)

    The goal of the DSLDI workshop is to bring together researchers and practitioners interested in sharing ideas on how DSLs should be designed, implemented, supported by tools, and applied in realistic application contexts. We are interested both in discovering how already well-known domains such as graph processing or machine learning can best be supported by DSLs, and in exploring new domains that could be targeted by DSLs. More generally, we are interested in building a community that can drive forward the development of modern DSLs. These informal post-proceedings contain the talk abstracts submitted to the 3rd DSLDI workshop (DSLDI'15) and a summary of the panel discussion on Language Composition.

    Knowledgeable Preference Alignment for LLMs in Domain-specific Question Answering

    Recently, the development of large language models (LLMs) has attracted wide attention in academia and industry. Deploying LLMs in real-world scenarios is one of the key directions in the current Internet industry. In this paper, we present a novel pipeline that applies LLMs to domain-specific question answering (QA) by incorporating domain knowledge graphs (KGs), addressing an important direction of LLM application. As a real-world application, the content generated by LLMs should be user-friendly in order to serve customers. Additionally, the model needs to use domain knowledge properly to generate reliable answers. These two issues are the major difficulties in applying LLMs, as vanilla fine-tuning cannot adequately address them. We argue that both requirements can be unified as a model preference problem: the model must be aligned with human preferences to be practically applicable. Thus, we introduce Knowledgeable Preference AlignmenT (KnowPAT), which constructs two kinds of preference sets, a style preference set and a knowledge preference set, to tackle the two issues. Besides, we design a new alignment objective to align the LLM's preferences with human preferences, aiming to train a better LLM for real-scenario domain-specific QA that generates reliable and user-friendly answers. Extensive experiments and comprehensive comparisons with 15 baseline methods demonstrate that KnowPAT is a superior pipeline for real-scenario domain-specific QA with LLMs. Our code is open-source at https://github.com/zjukg/KnowPAT. Comment: Work in progress. Code is available at https://github.com/zjukg/KnowPAT
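
    The abstract refers to a new alignment objective over style and knowledge preference sets but does not spell it out here. The PyTorch sketch below therefore shows only a generic ranking-style preference loss, an assumption made for illustration rather than the actual KnowPAT objective: it nudges the model to score a preferred answer above less-preferred answers drawn from a preference set.

        # Minimal, generic sketch of a preference-alignment training objective.
        # This is NOT the KnowPAT objective from the paper; it only illustrates the
        # idea of scoring a preferred answer above less-preferred ones.
        import torch
        import torch.nn.functional as F

        def preference_alignment_loss(
            preferred_logprob: torch.Tensor,      # log-likelihood of the preferred answer
            dispreferred_logprobs: torch.Tensor,  # log-likelihoods of less-preferred answers
            margin: float = 1.0,
        ) -> torch.Tensor:
            """Hinge-style ranking loss: push the preferred answer's score above
            every less-preferred answer's score by at least `margin`."""
            gaps = margin - (preferred_logprob - dispreferred_logprobs)
            return F.relu(gaps).mean()

        if __name__ == "__main__":
            # Toy scores standing in for sequence log-likelihoods under the LLM.
            preferred = torch.tensor(-2.0, requires_grad=True)
            dispreferred = torch.tensor([-2.5, -1.8, -3.0])
            loss = preference_alignment_loss(preferred, dispreferred)
            loss.backward()
            print(f"loss={loss.item():.3f}, grad wrt preferred score={preferred.grad.item():.3f}")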

    Using semantic technologies to resolve heterogeneity issues in sustainability and disaster management knowledge bases

    This thesis examines issues of semantic heterogeneity in the domains of sustainability indicators and disaster management. We propose a model that links the two domains with the following logic: while disaster management implies a proper and efficient response to a risk that has materialised as a disaster, sustainability can be defined as preparedness for unexpected situations, supported by measurements such as sustainability indicators. As a step in this direction, we investigate how semantic technologies can tackle the issues of heterogeneity in the aforementioned domains. First, we consider approaches to resolving the heterogeneity issues involved in representing the key concepts of sustainability indicator sets. To develop a knowledge base, we apply the METHONTOLOGY approach to guide the construction of two ontology design candidates: generic and specific. Of the two, the generic design is more abstract, with fewer classes and properties. Documents describing two indicator systems, the Global Reporting Initiative and the Organisation for Economic Co-operation and Development, are used in the design of both candidate ontologies. We then evaluate both ontology designs using the ROMEO approach to calculate their level of coverage against the seen indicators, as well as against an unseen third indicator set (from the United Nations Statistics Division). We also show that the use of existing structured approaches like METHONTOLOGY and ROMEO can reduce ambiguity in ontology design and evaluation for domain-level ontologies. It is concluded that where an ontology needs to be designed for both seen and unseen indicator systems, a generic and reusable design is preferable. Second, having addressed the heterogeneity issues at the data level of sustainability indicators in the first phase of the research, we then develop software for a sustainability reporting framework, Circles of Sustainability, which provides two mechanisms for browsing heterogeneous sustainability indicator sets: a Tabular view and a Circular view. In particular, the generic ontology design developed during the first phase of the research is applied in this software. Next, we evaluate the overall usefulness and ease of use of the presented software and the associated user interfaces by conducting a user study. The analysis of the quantitative and qualitative results of the user study concludes that most participants prefer the Circular view for browsing semantically heterogeneous indicators. Third, in the context of disaster management, we present a geotagger method for the OzCrisisTracker application that automatically detects and disambiguates the heterogeneous georeferences mentioned in tweet content, with three possible outcomes: definite, ambiguous and no-location. Our method semantically annotates the tweet components using existing and new ontologies. We also conclude that the geographic-focus accuracy of our geotagger is considerably higher than that of other systems. From a more general perspective, the research contributions can be articulated as follows: the knowledge bases developed in this research have been applied to the two domain applications, and the thesis therefore demonstrates how semantic technologies, such as ontology design patterns, browsing tools and geocoding, can untangle data representation and navigation issues of semantic heterogeneity in the sustainability and disaster management domains.
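
    As a toy illustration of the three-way outcome the geotagger produces (definite, ambiguous, no-location), the Python sketch below classifies a tweet by looking its tokens up in a small gazetteer. The gazetteer, tokenisation, and decision rule are invented for this example; they are not the OzCrisisTracker method, which relies on semantic annotation against ontologies.

        # Hypothetical sketch of three-way georeference disambiguation
        # (definite / ambiguous / no-location). Gazetteer and rules are invented.

        # Toy gazetteer: surface form -> candidate places (name, country).
        GAZETTEER: dict[str, list[tuple[str, str]]] = {
            "sydney": [("Sydney", "Australia"), ("Sydney", "Canada")],
            "canberra": [("Canberra", "Australia")],
        }

        def geotag(tweet: str) -> tuple[str, list[tuple[str, str]]]:
            """Classify the tweet's geographic focus as 'definite', 'ambiguous'
            or 'no-location', returning the candidate places found."""
            tokens = [t.strip(".,!?#@").lower() for t in tweet.split()]
            candidates = [place for t in tokens for place in GAZETTEER.get(t, [])]
            if not candidates:
                return "no-location", []
            if len(candidates) == 1:
                return "definite", candidates
            return "ambiguous", candidates

        if __name__ == "__main__":
            for text in ["Bushfire near Canberra right now",
                         "Flooding reported in Sydney",
                         "Stay safe everyone"]:
                label, places = geotag(text)
                print(f"{label:11s} {places}  <- {text}")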

    Understanding Patient Safety Reports via Multi-label Text Classification and Semantic Representation

    Medical errors are the result of problems in health care delivery. One of the key steps to eliminating errors and improving patient safety is patient safety event reporting. A patient safety report may record a number of critical factors involved in the care delivered when incidents, near misses, and unsafe conditions occur. Clinicians and risk managers can therefore generate actionable knowledge by harnessing useful information from these reports. To date, efforts have been made to establish a nationwide reporting and error-analysis mechanism. The increasing volume of reports has been driving improvement in quantitative measures of patient safety. For example, statistical distributions of errors across error types and health care settings have been well documented. Nevertheless, a shift to quality measures is in high demand. In a health care system, errors are likely to occur if one or more intrinsically associated components (e.g., procedures, equipment, etc.) go wrong. However, our understanding of what these components are and how they are connected is limited, for at least two reasons. First, patient safety reports are difficult to analyse in aggregate because they are large in volume and complicated in semantic representation. Second, an efficient and clinically valuable mechanism to identify and categorize these components is absent. I strive to make my contribution by investigating the multi-labeled nature of patient safety reports. To facilitate clinical implementation, I propose that machine learning and the semantic information in reports, e.g., semantic similarity between terms, can be used jointly to perform automated multi-label classification. My work is divided into three specific aims. In the first aim, I developed a patient safety ontology to enhance the semantic representation of patient safety reports. The ontology supports a number of applications, including automated text classification. In the second aim, I evaluated multi-label text classification algorithms on patient safety reports. The results identified a set of productive algorithms with balanced predictive power and efficiency. In the third aim, to improve classification performance, I developed a framework that incorporates semantic similarity into kernel-based multi-label text classification. Semantic similarity values produced by different semantic representation models are evaluated in the classification tasks. Both ontology-based and distributional semantic similarity had a positive influence on classification performance, but the latter was significantly more efficient at computing the semantic similarity measure. Our work provides insight into the nature of patient safety reports: a report can be labeled with the multiple components (e.g., different procedures, settings, error types, and contributing factors) it contains. Multi-labeled reports hold promise for disclosing system vulnerabilities, since they provide insight into the intrinsically correlated components of health care systems. I demonstrated the effectiveness and efficiency of automated multi-label text classification enriched with semantic similarity information on patient safety reports. The proposed solution has the potential to be incorporated into existing reporting systems, significantly reducing the workload of aggregate report analysis.
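
    For readers unfamiliar with the multi-label setup used in the second and third aims, the sketch below shows a plain multi-label text classification baseline in Python with scikit-learn: TF-IDF features and a one-vs-rest linear SVM. The example reports and label names are invented, and the ontology-based and semantic-similarity kernel enhancements described in the abstract are not included.

        # Minimal multi-label text classification baseline (illustrative only;
        # report snippets and labels are invented, no semantic-similarity kernel).
        from sklearn.feature_extraction.text import TfidfVectorizer
        from sklearn.multiclass import OneVsRestClassifier
        from sklearn.pipeline import make_pipeline
        from sklearn.preprocessing import MultiLabelBinarizer
        from sklearn.svm import LinearSVC

        # Hypothetical report snippets, each carrying multiple labels.
        reports = [
            "Wrong medication dose administered during night shift",
            "Patient fall near bed, no injury, bed rail not raised",
            "Infusion pump alarm ignored, delayed medication",
            "Patient fall in bathroom, head injury reported",
        ]
        labels = [
            {"medication", "staffing"},
            {"fall", "equipment"},
            {"medication", "equipment"},
            {"fall"},
        ]

        # Encode label sets as a binary indicator matrix.
        mlb = MultiLabelBinarizer()
        Y = mlb.fit_transform(labels)

        # TF-IDF features + one linear SVM per label (one-vs-rest).
        clf = make_pipeline(
            TfidfVectorizer(ngram_range=(1, 2), min_df=1),
            OneVsRestClassifier(LinearSVC()),
        )
        clf.fit(reports, Y)

        pred = clf.predict(["Patient fall while nurse adjusted the infusion pump"])
        print(mlb.inverse_transform(pred))  # predicted label set(s) for the new report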