1,088 research outputs found

    Doctor of Philosophy

    Medical knowledge learned in medical school can quickly become outdated given the tremendous growth of the biomedical literature. It is the responsibility of medical practitioners to continuously update their knowledge with the most recent, best available clinical evidence to make informed decisions about patient care. However, clinicians often have little time to read the primary literature, even within their narrow specialty. As a result, they often rely on systematic evidence reviews developed by medical experts to meet their information needs. At present, systematic reviews of clinical research are manually created and updated, which is expensive, slow, and unable to keep up with the rapidly growing pace of medical literature. This dissertation research aims to enhance the traditional systematic review development process using computer-aided solutions. The first study investigates query expansion and scientific quality ranking approaches to enhance literature search on clinical guideline topics. The study showed that unsupervised methods can improve the retrieval performance of a popular biomedical search engine (PubMed). The proposed methods improve the comprehensiveness of literature search and increase the proportion of relevant studies found with reduced screening effort. The second and third studies aim to enhance the traditional manual data extraction process. The second study developed a framework to extract and classify texts from PDF reports. This study demonstrated that a rule-based multipass sieve approach is more effective than a machine-learning approach in categorizing document-level structures, and that classifying and filtering publication metadata and semistructured texts enhances the performance of an information extraction system. The proposed method could serve as a document processing step in any text mining research on PDF documents.
The third study proposed a solution for computer-aided data extraction by recommending relevant sentences and key phrases extracted from publication reports. This study demonstrated that using a machine-learning classifier to prioritize sentences for specific data elements performs as well as or better than an abstract screening approach, and might save time and reduce errors in the full-text screening process. In summary, this dissertation showed that there are promising opportunities for technology enhancement to assist in the development of systematic reviews. In this modern age, when computing resources are becoming cheaper and more powerful, the failure to apply computer technologies to assist and optimize the manual processes is a lost opportunity to improve the timeliness of systematic reviews. This research provides methodologies and tests hypotheses that can serve as the basis for further large-scale software engineering projects aimed at fully realizing the prospect of computer-aided systematic reviews.
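The sentence-prioritization idea described above can be sketched as follows. This is only an illustration of the concept: the seed terms, the data element ("sample size"), and the overlap score are hypothetical stand-ins for the dissertation's trained machine-learning classifier.

```python
import re

# Hypothetical seed terms for one data element ("sample size"); in the
# dissertation a trained classifier plays this role -- this keyword-overlap
# score is only a sketch of the sentence-prioritization idea.
SEED_TERMS = {"patients", "participants", "enrolled", "randomized", "arms"}

def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())

def score_sentence(sentence):
    # Fraction of seed terms that appear in the sentence.
    tokens = set(tokenize(sentence))
    return len(tokens & SEED_TERMS) / len(SEED_TERMS)

def prioritize(sentences, top_k=2):
    # Rank sentences so reviewers read the most likely candidates first.
    return sorted(sentences, key=score_sentence, reverse=True)[:top_k]

report = [
    "The trial enrolled 120 patients who were randomized to two arms.",
    "Funding was provided by a national research agency.",
    "Participants were followed for 24 months.",
]
top = prioritize(report)
```

Even this crude ranking illustrates the workflow benefit the study measures: a reviewer extracting a specific data element starts from the top-ranked sentences instead of reading the full text linearly.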

    CiteFinder: a System to Find and Rank Medical Citations

    This thesis presents CiteFinder, a system to find relevant citations for clinicians' written content. Including citations in clinical information content makes it more reliable by providing scientific articles as references, and enables clinicians to easily update their written content using new information. The proposed approach splits the content into sentences, identifies the sentences that need to be supported with citations by applying classification algorithms, and uses information retrieval and ranking techniques to extract and rank relevant citations from MEDLINE for any given sentence. Additionally, the system extracts snippets from the retrieved articles. We assessed our approach on 3,699 MEDLINE papers on the subject of heart failure. We implemented multi-level and weighted ranking algorithms to rank the citations. This study shows that using journal priority and study design type significantly improves results obtained with the traditional approach of using only the text of articles, by approximately 63%. We also show that using the full text, rather than just the abstract, leads to the extraction of higher-quality snippets.
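The weighted-ranking idea, combining text similarity with journal priority and study design type, can be sketched as below. The weights, priority tiers, and candidate records are illustrative assumptions, not CiteFinder's actual parameters.

```python
# Hypothetical weight tables -- CiteFinder's real journal priorities and
# study-design weights are not published in this abstract.
DESIGN_WEIGHT = {"RCT": 1.0, "cohort": 0.7, "case report": 0.3}
JOURNAL_PRIORITY = {"NEJM": 1.0, "Generic J": 0.5}

def rank(candidates, w_text=0.6, w_journal=0.2, w_design=0.2):
    # Blend a precomputed text-similarity score with journal and
    # study-design weights; unknown journals/designs get a neutral 0.4.
    def score(c):
        return (w_text * c["similarity"]
                + w_journal * JOURNAL_PRIORITY.get(c["journal"], 0.4)
                + w_design * DESIGN_WEIGHT.get(c["design"], 0.4))
    return sorted(candidates, key=score, reverse=True)

candidates = [
    {"pmid": "1", "similarity": 0.70, "journal": "Generic J", "design": "case report"},
    {"pmid": "2", "similarity": 0.65, "journal": "NEJM", "design": "RCT"},
]
ranked = rank(candidates)
```

Note how the second candidate outranks the first despite a lower text-similarity score: evidence quality signals can reorder a purely textual ranking, which is the effect the thesis quantifies.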

    A comparison of machine learning techniques for detection of drug target articles

    Important progress in treating diseases has been possible thanks to the identification of drug targets. Drug targets are the molecular structures whose abnormal, disease-associated activity can be modified by drugs, improving the health of patients. The pharmaceutical industry needs to prioritize their identification and validation in order to reduce long and costly drug development times. In the last two decades, our knowledge about drugs, their mechanisms of action, and drug targets has rapidly increased. Nevertheless, most of this knowledge is hidden in millions of medical articles and textbooks. Extracting knowledge from this large amount of unstructured information is a laborious job, even for human experts. The identification of drug target articles, a crucial first step toward the automatic extraction of information from texts, constitutes the aim of this paper. Several machine learning techniques were compared in order to obtain a satisfactory classifier for detecting drug target articles using semantic information from biomedical resources such as the Unified Medical Language System. The best result was achieved by a Fuzzy Lattice Reasoning classifier, which reaches a ROC area of 98%. This research was supported by projects TIN2007-67407-C03-01, S-0505/TIC-0267, and MICINN project TEXT-ENTERPRISE 2.0 TIN2009-13391-C04-03 (Plan I+D+i), as well as by the Juan de la Cierva program of the MICINN of Spain.
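The use of semantic information for this kind of classification can be sketched as below. The tiny type lexicon and the co-mention rule are illustrative stand-ins for UMLS semantic types and the paper's trained classifiers.

```python
# Toy lexicon mapping surface terms to coarse semantic types -- a stand-in
# for UMLS semantic-type lookups; the rule below is NOT the paper's
# Fuzzy Lattice Reasoning classifier, just a sketch of semantic features.
SEMANTIC_TYPES = {
    "kinase": "Protein", "receptor": "Protein",
    "inhibitor": "Drug", "agonist": "Drug",
    "fever": "Symptom",
}

def semantic_features(text):
    # Collect the semantic types mentioned in an abstract.
    return {SEMANTIC_TYPES[w] for w in text.lower().split() if w in SEMANTIC_TYPES}

def is_drug_target_article(text):
    # Minimal heuristic: a drug mention together with a protein mention.
    feats = semantic_features(text)
    return "Drug" in feats and "Protein" in feats

flag = is_drug_target_article("a selective kinase inhibitor reduced activity")
```

The point of the sketch is the feature representation: classifying on semantic types rather than raw words is what lets the compared machine learning techniques generalize across vocabulary.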

    Discovery of novel biomarkers and phenotypes by semantic technologies.

    Biomarkers and target-specific phenotypes are important to targeted drug design and individualized medicine, and thus constitute an important aspect of modern pharmaceutical research and development. Increasingly, the discovery of relevant biomarkers is aided by in silico techniques based on applying data mining and computational chemistry to large molecular databases. However, there is an even larger source of valuable information that can potentially be tapped for such discoveries: the repositories constituted by research documents.

    Content-rich biological network constructed by mining PubMed abstracts

    BACKGROUND: The integration of the rapidly expanding corpus of information about the genome, transcriptome, and proteome, engendered by powerful technological advances such as microarrays and the availability of genomic sequence from multiple species, challenges the grasp and comprehension of the scientific community. Despite the existence of text-mining methods that identify biological relationships based on the textual co-occurrence of gene/protein terms or similarities in abstract texts, knowledge of the underlying molecular connections on a large scale, which is prerequisite to understanding novel biological processes, lags far behind the accumulation of data. While computationally efficient, the co-occurrence-based approaches fail to characterize biological interactions (e.g., inhibition or stimulation, directionality). Programs with natural language processing (NLP) capability have been created to address these limitations; however, they are in general not readily accessible to the public. RESULTS: We present an NLP-based text-mining approach, Chilibot, which constructs content-rich relationship networks among biological concepts, genes, proteins, or drugs. Among its features is the ability to generate suggestions for new hypotheses. Lastly, we provide evidence that the connectivity of molecular networks extracted from the biological literature follows a power-law distribution, indicating scale-free topologies consistent with the results of previous experimental analyses. CONCLUSIONS: Chilibot distills scientific relationships from knowledge available throughout a wide range of biological domains and presents these in a content-rich graphical format, thus integrating general biomedical knowledge with the specialized knowledge and interests of the user. Chilibot can be accessed free of charge by academic users.
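The co-occurrence baseline that Chilibot improves on can be sketched as follows: link two gene/protein terms whenever they appear in the same abstract, with edge weight counting co-mentions. The term list and abstracts are toy examples, and this deliberately omits the NLP layer that lets Chilibot characterize interaction type and directionality.

```python
from collections import defaultdict
from itertools import combinations

# Toy gene/protein dictionary; a real system would use curated lexicons.
TERMS = {"tp53", "mdm2", "bcl2", "egfr"}

def build_network(abstracts):
    # Add an edge between every pair of terms co-mentioned in an abstract;
    # the weight is the number of abstracts in which the pair co-occurs.
    edges = defaultdict(int)
    for text in abstracts:
        present = sorted(TERMS & set(text.lower().split()))
        for a, b in combinations(present, 2):
            edges[(a, b)] += 1
    return edges

abstracts = [
    "tp53 is degraded after mdm2 binding",
    "mdm2 inhibitors restore tp53 activity",
    "egfr signaling modulates bcl2 expression",
]
net = build_network(abstracts)
```

The degree distribution of such a network, tallied over a large corpus, is what the RESULTS section examines when reporting power-law (scale-free) connectivity.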

    Ethical issues in autologous stem cell transplantation (ASCT) in advanced breast cancer: A systematic literature review

    BACKGROUND: An effectiveness assessment of ASCT in locally advanced and metastatic breast cancer identified serious ethical issues associated with this intervention. Our objective was to systematically review these aspects by means of a literature analysis. METHODS: We chose the reflexive Socratic approach as the review method using Hofmann's question list, conducted a comprehensive literature search in biomedical, psychological, and ethics bibliographic databases, and screened the resulting hits in a two-step selection process. Relevant arguments were assembled from the included articles, assessed, and assigned to the question list. Hofmann's questions were addressed by synthesizing these arguments. RESULTS: Of the 879 identified documents, 102 included arguments related to one or more questions from Hofmann's question list. The most important ethical issues were the implementation of ASCT in clinical practice on the basis of phase-II trials in the 1990s and the publication of falsified data in the first randomized controlled trials (the Bezwoda fraud), which caused significant negative effects on recruiting patients for further clinical trials and on the doctor-patient relationship. Recent meta-analyses report a marginal effect in prolonging disease-free survival, accompanied by severe harms, including death. ASCT in breast cancer remains a stigmatized technology. Reported health-related quality-of-life data are often at high risk of bias in favor of the survivors. Furthermore, little attention has been paid to those patients who were dying. CONCLUSIONS: The questions were addressed with different degrees of completeness. All arguments were assignable to the questions. The central ethical dimensions of ASCT could be discussed by reviewing the published literature.

    Matching Possible Mitigations to Cyber Threats: A Document-Driven Decision Support Systems Approach

    Despite more than a decade of heightened focus on cybersecurity, the threat continues. To reduce their potential impacts, cyber threats must be countered with appropriate mitigations. Mitigation catalogs exist in practice today, but they do not map mitigations to the specific threats they counter. Currently, mitigations are manually selected by cybersecurity experts (CSEs), who are in short supply. To reduce labor and improve repeatability, an automated approach is needed for matching mitigations to cyber threats. This research explores the application of supervised machine learning and text retrieval techniques to automate the matching of relevant mitigations to cyber threats where both are expressed as text, resulting in a novel method that combines two techniques: support vector machine classification and latent semantic analysis. In five test cases, the approach demonstrates high recall for known relevant mitigation documents, bolstering confidence that potentially relevant mitigations will not be overlooked. It automatically excludes 97% of non-relevant mitigations, greatly reducing the CSE's workload over purely manual matching.
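The matching pipeline can be sketched as below. To keep the sketch self-contained, it substitutes raw term-vector cosine similarity for latent semantic analysis and omits the SVM classifier entirely; it only illustrates the recall-oriented, threshold-based matching structure, not the dissertation's actual method.

```python
import math
from collections import Counter

def cosine(a, b):
    # Cosine similarity over raw term-frequency vectors -- a simplified
    # stand-in for LSA, which would first project into a latent space.
    ca, cb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(ca[t] * cb[t] for t in ca)
    norm = (math.sqrt(sum(v * v for v in ca.values()))
            * math.sqrt(sum(v * v for v in cb.values())))
    return dot / norm if norm else 0.0

def relevant_mitigations(threat, mitigations, threshold=0.2):
    # Rank mitigation documents by similarity to the threat description and
    # keep those above a deliberately low, recall-oriented threshold.
    scored = [(cosine(threat, m), m) for m in mitigations]
    return [m for s, m in sorted(scored, reverse=True) if s >= threshold]

threat = "phishing email credential theft"
mitigations = [
    "train staff to recognize phishing email",
    "patch operating system kernels monthly",
]
hits = relevant_mitigations(threat, mitigations)
```

The low threshold reflects the design goal stated in the abstract: favor recall so that potentially relevant mitigations are not overlooked, while still filtering out the bulk of non-relevant documents.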