HealthTrust: Assessing the Trustworthiness of Healthcare Information on the Internet

Abstract

Healthcare information on the Internet is growing exponentially and is increasingly available to the public. Frequent users, such as medical professionals and patients, depend heavily on web sources to obtain appropriate information promptly. However, the trustworthiness of information on the web is always questionable because the Internet changes quickly and keeps expanding. Most search engines return pages relevant to the given keywords, but the results may contain unreliable or biased information. Consequently, a significant challenge associated with the information explosion is ensuring that information is used effectively. One way to improve search results is to accurately identify the more trustworthy data. Surprisingly, although the trustworthiness of sources is essential to a great number of daily users, little work has been done so far on healthcare information sources. In this dissertation, I propose a new system named HealthTrust, which automatically assesses the trustworthiness of healthcare information on the Internet. In the first phase, unsupervised clustering based on graph topology is applied to our data collection. The goal is to identify a relatively large and reliable set of trusted websites as a seed set without much human effort. A new ranking algorithm for structure-based assessment is then adopted. The basic hypothesis is that trustworthy pages are more likely to link to trustworthy pages. In this way, the original set of positive and negative seeds propagates over the Web graph. With the credibility-based discriminators, the global scoring is biased towards trusted websites and away from untrusted websites. In the second phase, the content consistency between general healthcare-related webpages and trusted sites is evaluated using information retrieval techniques that assess the content semantics of each webpage with respect to medical topics.
In addition, graph modeling is employed to generate a content-based ranking for each page based on the sentences in the seed pages. Finally, to integrate the two components, an iterative approach combines the credibility assessments from the structure-based and content-based methods to give a final verdict: a HealthTrust score for each webpage. Through HealthTrust, I demonstrate the first attempt to integrate structure-based and content-based approaches to automatically evaluate the credibility of online healthcare information, and I make fundamental contributions to both the information retrieval and healthcare informatics communities.
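
The structure-based propagation of positive and negative seeds can be illustrated with a minimal sketch. It is not the dissertation's ranking algorithm, whose details the abstract does not give. It assumes a TrustRank-style personalized PageRank for the positive seeds and the same propagation over reversed links for the negative seeds, and it combines the two by simple subtraction. All names here (`propagate`, `reverse`, `structure_score`, the damping value, the `.example` hosts) are hypothetical.

```python
def propagate(graph, seeds, damping=0.85, iterations=50):
    """Personalized PageRank: score mass restarts only at the seed pages.

    Assumes every page appears as a key in `graph`, mapped to the list
    of pages it links to.
    """
    nodes = list(graph)
    restart = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    score = dict(restart)
    for _ in range(iterations):
        nxt = {n: (1.0 - damping) * restart[n] for n in nodes}
        for src in nodes:
            out = graph[src]
            if out:
                # Split this page's score evenly among the pages it links to.
                share = damping * score[src] / len(out)
                for dst in out:
                    nxt[dst] += share
            else:
                # Dangling page: hand its score back to the seeds.
                for n in nodes:
                    nxt[n] += damping * score[src] * restart[n]
        score = nxt
    return score


def reverse(graph):
    """Return the graph with every link direction flipped."""
    rev = {n: [] for n in graph}
    for src, outs in graph.items():
        for dst in outs:
            rev[dst].append(src)
    return rev


def structure_score(graph, trusted, untrusted):
    """Trust flows along links; distrust flows back to pages that link
    to untrusted pages (an assumption of this sketch)."""
    trust = propagate(graph, trusted)
    distrust = propagate(reverse(graph), untrusted)
    return {n: trust[n] - distrust[n] for n in graph}


if __name__ == "__main__":
    web = {
        "trusted.example": ["clinic.example"],
        "clinic.example": ["trusted.example"],
        "blog.example": ["spam.example"],
        "spam.example": [],
    }
    scores = structure_score(web, {"trusted.example"}, {"spam.example"})
    for page, value in sorted(scores.items(), key=lambda kv: -kv[1]):
        print(f"{page}: {value:+.3f}")
```

In this toy graph, pages reachable from the trusted seed receive positive scores and pages that link to the untrusted seed receive negative ones, which is the sense in which the scoring is biased towards trusted sites and away from untrusted ones.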
