Expertise Style Transfer: A New Task Towards Better Communication between Experts and Laymen
The curse of knowledge can impede communication between experts and laymen.
We propose a new task of expertise style transfer and contribute a manually
annotated dataset with the goal of alleviating such cognitive biases. Solving
this task not only simplifies the professional language, but also improves the
accuracy and expertise level of laymen descriptions using simple words. This is
a challenging task, unaddressed in previous work, as it requires the models to
have expert intelligence in order to modify text with a deep understanding of
domain knowledge and structures. We establish the benchmark performance of five
state-of-the-art models for style transfer and text simplification. The results
demonstrate a significant gap between machine and human performance. We also
discuss the challenges of automatic evaluation, to provide insights into future
research directions. The dataset is publicly available at
https://srhthu.github.io/expertise-style-transfer. Comment: 11 pages, 6 figures; to appear in ACL 2020
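The difficulty of automatic evaluation noted in the abstract can be illustrated with a naive surface-overlap metric. The unigram-precision score below (a simplified BLEU component, not the paper's own metric) gives almost no credit to a faithful lay paraphrase that shares few words with the expert reference; the example sentences are invented.

```python
from collections import Counter

def unigram_precision(candidate: str, reference: str) -> float:
    """Fraction of candidate tokens that also appear in the reference
    (clipped by reference counts), as in the unigram component of BLEU."""
    cand = candidate.lower().split()
    ref = Counter(reference.lower().split())
    overlap = sum(min(count, ref[word]) for word, count in Counter(cand).items())
    return overlap / max(len(cand), 1)

# An expert sentence and a faithful lay rewrite share almost no tokens,
# so surface overlap scores the rewrite near zero despite its adequacy.
expert = "The patient suffered an acute myocardial infarction"
lay = "The person had a heart attack"
print(unigram_precision(lay, expert))  # only "the" overlaps
```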
Computational Approaches to Assisting Patients' Medical Comprehension from Electronic Health Records
Patient-centered care has been established as a fundamental approach to improve the quality of health care in a seminal report by the Institute of Medicine published at the start of the century. Improved access to health information and demand for greater transparency contributed to its move into the mainstream. Research has also demonstrated that actively involving patients in the management of their own health can lead to better outcomes, and potentially lower costs. However, despite the efforts in many areas of medicine to embrace patient-centered care, engaging patients is still considered a challenge. One of the barriers is the lack of effective tools to help patients understand their health conditions, options and their consequences.
Patient portals are now widely adopted by hospitals and other healthcare practices to provide patients with the capabilities to view their own Electronic Health Records. They are a rich resource of information for patients. However, the language in the records is generally difficult for patients without training in medicine to understand. Furthermore, the amount of information can often be overwhelming as well. In this work, we propose computational approaches to foster patient engagement from three aspects by exploiting the rich information in the medical records.
First, we design a framework to automatically generate health literacy instruments to measure a patient's literacy levels. This framework exploits readily available large scale corpora to generate instruments in a commonly used test format. Second, we investigate methods that can determine the readability of complex documents such as health records. We propose to rank document readability, instead of assigning a grade level or a pre-defined difficulty category. Lastly, we examine the problem of finding targeted educational materials to facilitate patient comprehension of medical notes. We study methods to formulate effective queries from specialized and long clinical narratives. In addition, we propose a neural network based method to identify medical concepts that are important to patients.
The three aspects of this work address the issues of the overabundance and technical complexity of medical language in health records. We demonstrate that our approaches are effective with various experiments and evaluation metrics.
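The ranking view of readability in the second contribution can be sketched with a toy scorer. The two surface features and the hand-set weights below are illustrative assumptions, not the dissertation's model, which learns the ranking from data.

```python
def readability_features(text: str):
    """Two classic surface proxies for difficulty: average sentence length
    (in words) and average word length (in characters)."""
    sentences = [s for s in text.split(".") if s.strip()]
    words = text.split()
    avg_sent_len = len(words) / max(len(sentences), 1)
    avg_word_len = sum(len(w) for w in words) / max(len(words), 1)
    return avg_sent_len, avg_word_len

def difficulty_score(text: str, weights=(1.0, 2.0)) -> float:
    """Linear combination of the features; higher means harder to read.
    The weights are arbitrary for this sketch."""
    return sum(w * f for w, f in zip(weights, readability_features(text)))

def rank_by_readability(documents):
    """Order documents from easiest to hardest, instead of assigning
    each one a fixed grade level or difficulty category."""
    return sorted(documents, key=difficulty_score)
```

Ranking sidesteps the need to calibrate absolute difficulty labels: only the relative order of documents matters.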
Promoting understandability in consumer health information search
Nowadays, in the area of Consumer Health Information Retrieval, techniques and methodologies are still far from being effective in answering complex health queries. One main challenge comes from the varying and limited medical knowledge background of consumers; the existing language gap between non-expert consumers and the complex medical resources confuses them. So, returning not only topically relevant but also understandable health information to the user is a significant and practical challenge in this area.
In this work, the main research goal is to study ways to promote understandability in Consumer Health Information Retrieval. To help reach this goal, two research questions are posed: (i) how to bridge the existing language gap; (ii) how to return more understandable documents. Two modules are designed, each answering one research question. In the first module, a Medical Concept Model is proposed for use in health query processing; this model integrates Natural Language Processing techniques into state-of-the-art Information Retrieval. Moreover, aiming to integrate syntactic and semantic information, word embedding models are explored as query expansion resources. The second module is designed to learn understandability from past data; a two-stage learning to rank model is proposed, with rank aggregation methods applied on single field-based ranking models.
The proposed modules are assessed on the FIRE’2016 CHIS track data and the CLEF’2016-2018 eHealth IR data collections. Extensive experimental comparisons with state-of-the-art baselines on the considered data collections confirmed the effectiveness of the proposed approaches: regarding understandability relevance, the improvement is 11.5%, 9.3% and 16.3% in the RBP, uRBP and uRBPgr evaluation metrics, respectively; regarding topical relevance, the improvement is 7.8%, 16.4% and 7.6% in the P@10, NDCG@10 and MAP evaluation metrics, respectively.
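The embedding-based query expansion in the first module can be sketched as nearest-neighbour lookup in vector space. The tiny hand-made vectors below stand in for a trained word-embedding model (the thesis uses learned embeddings), so the vocabulary and values are purely illustrative.

```python
import math

# Toy word vectors standing in for a trained embedding model; the
# vocabulary and values are invented for illustration only.
TOY_EMBEDDINGS = {
    "heart": [0.90, 0.10, 0.00],
    "cardiac": [0.88, 0.15, 0.02],
    "attack": [0.20, 0.80, 0.10],
    "infarction": [0.25, 0.78, 0.12],
    "weather": [0.00, 0.10, 0.90],
}

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm

def expand_query(terms, k=1):
    """Append each term's k nearest embedding neighbours to the query,
    bridging consumer vocabulary and medical terminology."""
    expanded = list(terms)
    for term in terms:
        if term not in TOY_EMBEDDINGS:
            continue
        neighbours = sorted(
            (w for w in TOY_EMBEDDINGS if w not in terms),
            key=lambda w: -cosine(TOY_EMBEDDINGS[term], TOY_EMBEDDINGS[w]),
        )[:k]
        expanded.extend(neighbours)
    return expanded

print(expand_query(["heart", "attack"]))  # adds "cardiac" and "infarction"
```

A lay query such as "heart attack" is thereby enriched with expert terminology, so documents written in either register can match.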
Neural Representations of Concepts and Texts for Biomedical Information Retrieval
Information retrieval (IR) methods are an indispensable tool in the current landscape of exponentially increasing textual data, especially on the Web. A typical IR task involves fetching and ranking a set of documents (from a large corpus) in terms of relevance to a user's query, which is often expressed as a short phrase. IR methods are the backbone of modern search engines, where additional system-level aspects including fault tolerance, scale, user interfaces, and session maintenance are also addressed. In addition to fetching documents, modern search systems may also identify snippets within the documents that are potentially most relevant to the input query. Furthermore, current systems may also maintain preprocessed structured knowledge derived from textual data as so-called knowledge graphs, so certain types of queries that are posed as questions can be parsed as such; a response can be an output of one or more named entities instead of a ranked list of documents (e.g., what diseases are associated with EGFR mutations?). This refined setup is often termed question answering (QA) in the IR and natural language processing (NLP) communities.
In biomedicine and healthcare, specialized corpora are often at play, including research articles by scientists, clinical notes generated by healthcare professionals, consumer forums for specific conditions (e.g., cancer survivors network), and clinical trial protocols (e.g., www.clinicaltrials.gov). Biomedical IR is specialized because the types of queries and the variation in the texts differ from those of general Web documents. For example, scientific articles are more formal with longer sentences, but clinical notes tend to have less grammatical conformity and are rife with abbreviations. There is also a mismatch between the vocabulary of consumers and the lingo of domain experts and professionals. Queries are also different and can range from simple phrases (e.g., COVID-19 symptoms) to more complex implicitly fielded queries (e.g., chemotherapy regimens for stage IV lung cancer patients with ALK mutations). Hence, developing methods for different configurations (corpus, query type, user type) needs more deliberate attention in biomedical IR.
Representations of documents and queries are at the core of IR methods, and retrieval methodology involves coming up with these representations and matching queries with documents based on them. Traditional IR systems follow the approach of keyword-based indexing of documents (the so-called inverted index) and matching query phrases against the document index. It is not difficult to see that this keyword-based matching ignores the semantics of texts (synonymy at the lexeme level and entailment at phrase/clause/sentence levels), and this has led to dimensionality reduction methods such as latent semantic indexing that generally have scale-related concerns; such methods also do not address similarity at the sentence level. Since the resurgence of neural network methods in NLP, the IR field has also moved to incorporate advances in neural networks into current IR methods.
This dissertation presents four specific methodological efforts toward improving biomedical IR. Neural methods always begin with dense embeddings for words and concepts to overcome the limitations of one-hot encoding in traditional NLP/IR. In the first effort, we present a new neural pre-training approach to jointly learn word and concept embeddings for downstream use in applications. In the second study, we present a joint neural model for two essential subtasks of information extraction (IE): named entity recognition (NER) and entity normalization (EN). Our method detects biomedical concept phrases in texts and links them to the corresponding semantic types and entity codes. These first two studies provide essential tools to model textual representations as compositions of both surface forms (lexical units) and high level concepts, with potential downstream use in QA. In the third effort, we present a document reranking model that can help surface documents that are likely to contain answers (e.g., factoids, lists) to a question in a QA task. The model is essentially a sentence matching neural network that learns the relevance of a candidate answer sentence to the given question, parametrized with a bilinear map. In the fourth effort, we present another document reranking approach that is tailored for precision medicine use cases. It combines neural query-document matching and faceted text summarization. The main distinction of this effort from previous efforts is to pivot from a query manipulation setup to transforming candidate documents into pseudo-queries via neural text summarization. Overall, our contributions constitute nontrivial advances in biomedical IR using neural representations of concepts and texts.
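The bilinear sentence matching in the third effort scores a question vector q against a candidate answer vector a as s(q, a) = qᵀ W a. A minimal sketch with plain Python lists follows; in the actual model, W and the sentence encodings are learned jointly, whereas here they are fixed toy values.

```python
def bilinear_score(q, a, W):
    """Compute s(q, a) = q^T W a for an interaction matrix W."""
    Wa = [sum(W[i][j] * a[j] for j in range(len(a))) for i in range(len(W))]
    return sum(qi * wi for qi, wi in zip(q, Wa))

def rerank(question_vec, candidate_vecs, W):
    """Order candidate answer sentences by their bilinear relevance score,
    highest first, as in answer-sentence reranking for QA."""
    return sorted(
        candidate_vecs,
        key=lambda a: bilinear_score(question_vec, a, W),
        reverse=True,
    )
```

With W set to the identity matrix the score reduces to a plain dot product; a learned W lets the model reward interactions between different dimensions of the question and answer encodings.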
The Role of Vocabulary Mediation to Discover and Represent Relevant Information in Privacy Policies
To date, the effort made by existing vocabularies to provide a shared representation of the data protection domain is not fully exploited. Different natural language processing (NLP) techniques have been applied to the text of privacy policies without, however, taking advantage of existing vocabularies to provide those documents with a shared semantic superstructure. In this paper we show how a recently released domain-specific vocabulary, the Data Privacy Vocabulary (DPV), can be used to discover, in privacy policies, the information that is relevant with respect to the concepts modelled in the vocabulary itself. We also provide a machine-readable representation of this information to bridge the unstructured textual information to the formal taxonomy modelled in the vocabulary. This is the first approach to the automatic processing of privacy policies that relies on the DPV, fuelling further investigation into the applicability of existing semantic resources to promote the reuse of information and the interoperability between systems in the data protection domain.
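The concept-discovery step can be sketched as phrase matching against a lexicon keyed by vocabulary identifiers. The phrase list and the DPV-style IRIs below are illustrative assumptions for this sketch; the paper's mapping is derived from the vocabulary itself, not hand-written.

```python
# Hypothetical lexicon: surface phrases mapped to DPV-style concept IRIs.
# Both the phrases and the identifiers are illustrative assumptions.
LEXICON = {
    "email address": "dpv:EmailAddress",
    "ip address": "dpv:IPAddress",
    "marketing": "dpv:Marketing",
}

def annotate(policy_text: str):
    """Return machine-readable annotations linking spans of a privacy
    policy to vocabulary concepts via simple phrase matching."""
    lowered = policy_text.lower()
    annotations = []
    for phrase, concept in LEXICON.items():
        start = lowered.find(phrase)
        if start != -1:
            annotations.append(
                {"text": phrase, "concept": concept, "offset": start}
            )
    return annotations
```

The offsets let the structured output point back into the original policy text, bridging the unstructured document and the formal taxonomy.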
Knowledge Graph and Deep Learning-based Text-to-GQL Model for Intelligent Medical Consultation Chatbot
Text-to-GQL (Text2GQL) is a task that converts a user's questions into GQL (Graph Query Language) when a graph database is given. It is a semantic parsing task that transforms natural language problems into logical expressions, enabling more efficient direct communication between humans and machines. Existing related work mainly focuses on Text-to-SQL tasks, and there is no available semantic parsing method or dataset for graph databases. In order to fill the gaps in this field and better serve medical Human–Robot Interactions (HRI), we propose this task and a pipeline solution for it. The solution uses an Adapter pre-trained on the linking of GQL schemas and the corresponding utterances as an external knowledge-introduction plug-in. By inserting the Adapter into the language model, the mapping between logical language and natural language can be introduced faster and more directly, to better realize the end-to-end human–machine language translation task. The proposed Text2GQL model is constructed as an improved pipeline composed of a Language Model, the pre-trained Adapter plug-in, and a Pointer Network. This enables the model to copy objects' tokens from utterances, generate corresponding GQL statements for graph database retrieval, and build an adjustment mechanism to improve the final output. Experiments show that our proposed method is competitive on the counterpart datasets (Spider, ATIS, GeoQuery, and 39.net) converted from the Text2SQL task, and that it is also practical in medical scenarios.
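The Pointer Network's copy step can be sketched as attention over the source utterance: the decoder scores each input token, and the softmax of those scores gives the probability of copying that token into the generated GQL. The vectors below are toy stand-ins for learned encodings, and the example tokens are invented.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def copy_probabilities(decoder_state, source_encodings, source_tokens):
    """Dot-product attention over source tokens; the resulting distribution
    says which input token to copy into the output query."""
    scores = [
        sum(d * e for d, e in zip(decoder_state, enc))
        for enc in source_encodings
    ]
    return list(zip(source_tokens, softmax(scores)))
```

Copying entity mentions (e.g., a disease name) directly from the utterance avoids forcing rare tokens through a fixed output vocabulary.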
A survey on the development status and application prospects of knowledge graph in smart grids
With the advent of the electric power big data era, semantic interoperability and interconnection of power data have received extensive attention. Knowledge graph technology is a new method for describing the complex relationships between concepts and entities in the objective world, and it has attracted wide interest because of its robust knowledge inference ability. In particular, the proliferation of measurement devices and the exponential growth of electric power data give the electric power knowledge graph new opportunities to resolve the contradiction between massive power resources and the continuously increasing demand for intelligent applications. In an attempt to fulfil the potential of knowledge graphs, deal with the various challenges faced, and obtain insights for achieving business applications in smart grids, this work first presents a holistic study of knowledge-driven intelligent application integration. Specifically, a detailed overview of electric power knowledge mining is provided. Then, an overview of the knowledge graph in smart grids is introduced. Moreover, the architecture of the big knowledge graph platform for smart grids and its critical technologies are described. Furthermore, this paper comprehensively elaborates on the application prospects enabled by knowledge graphs oriented to smart grids: power consumer service, decision-making in dispatching, and operation and maintenance of power equipment. Finally, issues and challenges are summarised. Comment: IET Generation, Transmission & Distribution
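The knowledge-inference ability the survey highlights can be illustrated with the simplest case: transitive closure over a containment relation, computed by forward chaining. The triples are invented examples for a power-grid domain.

```python
def transitive_closure(triples, relation="partOf"):
    """Repeatedly add (a, r, c) whenever (a, r, b) and (b, r, c) hold,
    a minimal forward-chaining inference step over a triple set."""
    inferred = set(triples)
    changed = True
    while changed:
        changed = False
        for (a, r1, b) in list(inferred):
            for (b2, r2, c) in list(inferred):
                if r1 == r2 == relation and b == b2 \
                        and (a, relation, c) not in inferred:
                    inferred.add((a, relation, c))
                    changed = True
    return inferred

# Invented example triples for a power-grid knowledge graph.
facts = {
    ("transformer_T1", "partOf", "substation_S3"),
    ("substation_S3", "partOf", "regional_grid"),
}
```

From the two stated facts the closure derives that transformer_T1 is part of the regional grid, the kind of implicit relationship that supports dispatching and maintenance applications.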