Knowing What to Believe (When You Already Know Something)
Although much work in NLP has focused on simply determining what a document means, we also must know whether or not to believe it. Fact-finding algorithms attempt to identify the "truth" among competing claims in a corpus, but fail to take advantage of the user's prior knowledge and presume that truth itself is universal and objective rather than subjective. We introduce a framework for incorporating prior knowledge into any fact-finding algorithm, expressing both general "common-sense" reasoning and specific facts already known to the user as first-order logic and translating this into a tractable linear program. As our results show, this approach scales well even to large problems, both reducing error and allowing the system to determine truth relative to the user rather than the majority. Additionally, we introduce three new fact-finding algorithms capable of outperforming existing fact-finders in many of our experiments.
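The core loop of an iterative fact-finder, and the effect of injecting user prior knowledge, can be sketched in a few lines. This is a minimal illustration in the Sums/Hubs-and-Authorities style, with the prior modeled as a hard clamp on claims the user already believes; the claim names and the clamping shortcut are assumptions for the sketch, not the paper's linear-program formulation.

```python
# Minimal sketch of an iterative fact-finder with a user prior.
# Source trust and claim belief reinforce each other; claims the user
# already knows to be true are clamped to full belief each round.
# (Hypothetical data; the paper encodes priors as a linear program instead.)

def fact_find(claims_by_source, known_true, iters=20):
    """claims_by_source: {source: set of claim ids}; known_true: user's prior."""
    sources = list(claims_by_source)
    claims = {c for cs in claims_by_source.values() for c in cs}
    belief = {c: 1.0 for c in claims}
    trust = {}
    for _ in range(iters):
        # A source's trust is the (normalized) sum of its claims' beliefs.
        trust = {s: sum(belief[c] for c in claims_by_source[s]) for s in sources}
        norm = max(trust.values()) or 1.0
        trust = {s: t / norm for s, t in trust.items()}
        # A claim's belief is the (normalized) sum of its asserters' trust.
        belief = {c: sum(trust[s] for s in sources if c in claims_by_source[s])
                  for c in claims}
        bnorm = max(belief.values()) or 1.0
        belief = {c: b / bnorm for c, b in belief.items()}
        # Prior knowledge: pin user-known claims to full belief.
        for c in known_true:
            belief[c] = 1.0
    return belief, trust

votes = {"s1": {"obama_born_hawaii"},
         "s2": {"obama_born_kenya"},
         "s3": {"obama_born_kenya"}}
belief, trust = fact_find(votes, known_true={"obama_born_hawaii"})
```

Without the prior, the two-source majority wins; with it, the user's known fact keeps full belief regardless of the vote count, which is exactly the "truth relative to the user" behavior the abstract describes.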
People on Drugs: Credibility of User Statements in Health Communities
Online health communities are a valuable source of information for patients
and physicians. However, such user-generated resources are often plagued by
inaccuracies and misinformation. In this work we propose a method for
automatically establishing the credibility of user-generated medical statements
and the trustworthiness of their authors by exploiting linguistic cues and
distant supervision from expert sources. To this end we introduce a
probabilistic graphical model that jointly learns user trustworthiness,
statement credibility, and language objectivity. We apply this methodology to
the task of extracting rare or unknown side-effects of medical drugs --- this
being one of the problems where large scale non-expert data has the potential
to complement expert medical knowledge. We show that our method can reliably
extract side-effects and filter out false statements, while identifying
trustworthy users who are likely to contribute valuable medical information.
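The coupling the abstract describes — statement credibility, author trustworthiness, and language objectivity reinforcing each other, seeded by distant supervision from expert sources — can be illustrated with a toy fixed-point loop. The lexicon, scoring rules, and example statements below are hypothetical simplifications, not the paper's probabilistic graphical model.

```python
# Toy sketch: statement credibility depends on author trust and a linguistic
# objectivity cue; author trust is the mean credibility of their statements;
# distant supervision (an expert side-effect list) pins known-true statements.
# All data and the tiny subjectivity lexicon are illustrative assumptions.

SUBJECTIVE = {"horrible", "amazing", "terrible", "unbelievable"}

def objectivity(text):
    words = (w.strip(".,!") for w in text.lower().split())
    hits = sum(w in SUBJECTIVE for w in words)
    return 1.0 / (1.0 + hits)  # more subjective wording -> lower score

def credibility(statements, expert_kb, iters=10):
    """statements: list of (user, drug, effect, text) tuples."""
    trust = {u: 0.5 for u, *_ in statements}
    cred = {}
    for _ in range(iters):
        for u, drug, effect, text in statements:
            if (drug, effect) in expert_kb:        # distant supervision
                cred[(u, drug, effect)] = 1.0
            else:
                cred[(u, drug, effect)] = trust[u] * objectivity(text)
        # A user's trust is the mean credibility of their statements.
        for u in trust:
            mine = [c for (v, *_), c in cred.items() if v == u]
            trust[u] = sum(mine) / len(mine)
    return cred, trust

statements = [
    ("alice", "aspirin", "nausea", "I noticed mild nausea after taking aspirin."),
    ("alice", "aspirin", "tinnitus", "Some ringing in my ears, could be the aspirin."),
    ("bob", "aspirin", "hair_loss", "This horrible drug made my hair fall out, unbelievable!"),
]
expert_kb = {("aspirin", "nausea")}
cred, trust = credibility(statements, expert_kb)
```

A user whose statements agree with the expert source and are phrased objectively accumulates trust, which in turn lifts the credibility of their unverified statements — the joint-learning effect the abstract claims, in miniature.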
MedTruth: A Semi-supervised Approach to Discovering Knowledge Condition Information from Multi-Source Medical Data
A Knowledge Graph (KG) contains entities and the relations between entities.
Due to its representation ability, KG has been successfully applied to support
many medical/healthcare tasks. However, in the medical domain, knowledge holds
under certain conditions. For example, the symptom "runny nose" strongly
indicates the disease "whooping cough" when the patient is a baby rather than
a person of another age. Such conditions for medical knowledge are crucial for
decision-making in various medical applications, yet they are missing from
existing medical KGs. In this paper, we aim to discover medical knowledge
conditions from texts to enrich KGs.
Electronic Medical Records (EMRs) are systematized collections of clinical
data that contain detailed information about patients, so EMRs can be a good
resource for discovering medical knowledge conditions. Unfortunately, the
amount of available EMRs is limited for reasons such as privacy regulations.
Meanwhile, a
large amount of medical question answering (QA) data is available, which can
greatly help the studied task. However, the quality of medical QA data is quite
diverse, which may degrade the quality of the discovered medical knowledge
conditions. In light of these challenges, we propose a new truth discovery
method, MedTruth, for medical knowledge condition discovery, which incorporates
prior source quality information into the source reliability estimation
procedure, and also utilizes the knowledge triple information for trustworthy
information computation. We conduct a series of experiments on real-world medical
datasets to demonstrate that the proposed method can discover meaningful and
accurate conditions for medical knowledge by leveraging both EMR and QA data.
Further, the proposed method is tested on synthetic datasets to validate its
effectiveness under various scenarios. Comment: Accepted as a CIKM 2019 long paper.
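The two ingredients the abstract highlights — seeding source reliability with prior quality information (e.g. trusting EMR sources more than QA posts) and re-estimating reliability from agreement with the current consensus — fit a standard truth-discovery loop. The sketch below is a generic weighted-voting illustration under those assumptions; the source names, priors, and blending rule are invented for the example and are not MedTruth's actual update equations.

```python
# Generic truth-discovery sketch with prior source quality.
# Each round: (1) estimate truths by reliability-weighted voting,
# (2) re-estimate reliability as a blend of the prior and the source's
# agreement rate with the current truths. Data and alpha are illustrative.

from collections import defaultdict

def discover_truth(observations, prior_weight, iters=10, alpha=0.5):
    """observations: list of (source, item, value); prior_weight: {source: w0}."""
    weight = dict(prior_weight)
    for _ in range(iters):
        # Truth estimate: weighted majority vote per item.
        votes = defaultdict(lambda: defaultdict(float))
        for s, item, v in observations:
            votes[item][v] += weight[s]
        truth = {item: max(vals, key=vals.get) for item, vals in votes.items()}
        # Reliability: blend the prior with the observed agreement rate.
        agree = defaultdict(list)
        for s, item, v in observations:
            agree[s].append(1.0 if truth[item] == v else 0.0)
        weight = {s: alpha * prior_weight[s] + (1 - alpha) * sum(a) / len(a)
                  for s, a in agree.items()}
    return truth, weight

obs = [("emr1", "pertussis_age", "infant"),
       ("qa1", "pertussis_age", "any"),
       ("qa2", "pertussis_age", "any")]
truth, weight = discover_truth(obs, {"emr1": 0.9, "qa1": 0.3, "qa2": 0.3})
```

Here the high-prior EMR source outvotes two low-quality QA sources even though it is outnumbered, mirroring how prior source quality keeps noisy QA data from degrading the discovered conditions.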
Content Modelling for unbiased Information Analysis
Content is the form through which information is conveyed to meet the user's requirements. The volume of content is huge and expected to grow exponentially, which makes separating useful data from non-useful data a very tedious task. The search engine is the interface between content and user; therefore, content is designed with the search engine's perspective in mind. Content designed by organizations uses user data to promote their products and services, mostly through inorganic means that inflate the quality measures of the content and may make the information misleading. No reliable mechanism currently exists to analyse and disseminate the data. The gap between the results actually displayed to the user and the results the user expects can be minimized by introducing quality checks on the parameters used to assess content. This can help ensure content quality, so that popularity is not allowed to take precedence over quality. Social networking sites can support user modelling so that the qualitative dissemination of content can be validated.
A Bayesian Approach to Discovering Truth from Conflicting Sources for Data Integration
In practical data integration systems, it is common for the data sources
being integrated to provide conflicting information about the same entity.
Consequently, a major challenge for data integration is to derive the most
complete and accurate integrated records from diverse and sometimes conflicting
sources. We term this challenge the truth finding problem. We observe that some
sources are generally more reliable than others, and therefore a good model of
source quality is the key to solving the truth finding problem. In this work,
we propose a probabilistic graphical model that can automatically infer true
records and source quality without any supervision. In contrast to previous
methods, our principled approach leverages a generative process of two types of
errors (false positive and false negative) by modeling two different aspects of
source quality. In so doing, ours is also the first approach designed to merge
multi-valued attribute types. Our method is scalable, due to an efficient
sampling-based inference algorithm that needs very few iterations in practice
and enjoys linear time complexity, with an even faster incremental variant.
Experiments on two real-world datasets show that our new method outperforms
existing state-of-the-art approaches to the truth finding problem. Comment: VLDB 2012.
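The key modeling idea — capturing two distinct error types per source, false positives and false negatives, so that a source's silence about a value is itself evidence — can be sketched with a simplified EM-style loop over sensitivity and specificity. This is a toy illustration of that idea only, not the paper's collapsed Gibbs sampler; the entity, values, and smoothing constants are assumptions for the example.

```python
# Simplified two-error-type truth finder: each source has a sensitivity
# P(claims v | v true) and specificity P(silent on v | v false). The truth
# probability of each candidate value combines evidence from both the
# sources that claim it and those that do not. Toy data; not the paper's
# sampling-based inference.

import math

def truth_find(claims, sources, prior=0.5, iters=10):
    """claims: {(entity, value): set of claiming sources}; sources: all sources."""
    sens = {s: 0.8 for s in sources}   # P(claim | true)
    spec = {s: 0.8 for s in sources}   # P(no claim | false)
    p_true = {}
    for _ in range(iters):
        for key, claimants in claims.items():
            log_odds = math.log(prior / (1 - prior))
            for s in sources:
                if s in claimants:
                    log_odds += math.log(sens[s] / (1 - spec[s]))
                else:  # silence counts as (soft) evidence against the value
                    log_odds += math.log((1 - sens[s]) / spec[s])
            p_true[key] = 1 / (1 + math.exp(-log_odds))
        # Re-estimate per-source error rates from the soft truths (smoothed
        # to keep sensitivity/specificity strictly inside (0, 1)).
        for s in sources:
            tp = sum(p for k, p in p_true.items() if s in claims[k])
            fp = sum(1 - p for k, p in p_true.items() if s in claims[k])
            fn = sum(p for k, p in p_true.items() if s not in claims[k])
            tn = sum(1 - p for k, p in p_true.items() if s not in claims[k])
            sens[s] = (tp + 1) / (tp + fn + 2)
            spec[s] = (tn + 1) / (tn + fp + 2)
    return p_true, sens, spec

# Multi-valued attribute: several candidate children for one entity.
claims = {("jobs", "child:Lisa"): {"s1", "s2", "s3"},
          ("jobs", "child:Bill"): {"s4"}}
sources = {"s1", "s2", "s3", "s4"}
p_true, sens, spec = truth_find(claims, sources)
```

Because both claims and non-claims carry weight, a value asserted by several sources and contradicted by none ends up far more probable than a lone outlier claim, and the per-source rates separate reliable sources from unreliable ones without supervision.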