4,702 research outputs found

    Medical WordNet: A new methodology for the construction and validation of information resources for consumer health

    A consumer health information system must be able to comprehend both expert and non-expert medical vocabulary and to map between the two. We describe an ongoing project to create a new lexical database called Medical WordNet (MWN), consisting of medically relevant terms used by and intelligible to non-expert subjects and supplemented by a corpus of natural-language sentences designed to provide medically validated contexts for MWN terms. The corpus derives primarily from online health information sources targeted to consumers and comprises two sub-corpora, Medical FactNet (MFN) and Medical BeliefNet (MBN). The former consists of statements accredited as true on the basis of a rigorous process of validation, the latter of statements that non-experts believe to be true. We summarize the MWN / MFN / MBN project and describe some of its applications.

    Answering clinical questions with knowledge-based and statistical techniques

    The combination of recent developments in question-answering research and the availability of unparalleled resources developed specifically for automatic semantic processing of text in the medical domain provides a unique opportunity to explore complex question answering in the domain of clinical medicine. This article presents a system designed to satisfy the information needs of physicians practicing evidence-based medicine. We have developed a series of knowledge extractors, which employ a combination of knowledge-based and statistical techniques, for automatically identifying clinically relevant aspects of MEDLINE abstracts. These extracted elements serve as the input to an algorithm that scores the relevance of citations with respect to structured representations of information needs, in accordance with the principles of evidence-based medicine. Starting with an initial list of citations retrieved by PubMed, our system can bring relevant abstracts into higher ranking positions, and from these abstracts generate responses that directly answer physicians' questions. We describe three separate evaluations: one focused on the accuracy of the knowledge extractors, one conceptualized as a document reranking task, and finally, an evaluation of answers by two physicians. Experiments on a collection of real-world clinical questions show that our approach significantly outperforms the already competitive PubMed baseline.
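The abstract describes scoring citations against a structured representation of the physician's question and then reranking PubMed's initial list, but it does not give the scoring function. The following is only a minimal Python sketch under stated assumptions: a PICO-style frame (population, intervention, comparison, outcome; a common evidence-based-medicine convention that the abstract does not name), simple set-overlap matching of extracted terms, and a hypothetical weight `alpha` that blends the frame match with a prior from the original PubMed rank. `Citation`, `match_score`, and `rerank` are illustrative names, not the authors' code.

```python
from dataclasses import dataclass, field

# Hypothetical PICO-style slots; the abstract only speaks of
# "structured representations of information needs".
SLOTS = ("population", "intervention", "comparison", "outcome")


@dataclass
class Citation:
    pmid: str
    original_rank: int  # 1-based position in the initial PubMed result list
    elements: dict[str, set[str]] = field(default_factory=dict)  # slot -> extracted terms


def match_score(question: dict[str, set[str]], cit: Citation) -> float:
    """Average, over the slots the question fills, of the share of question
    terms that the (assumed) knowledge extractors found in the citation."""
    filled = [s for s in SLOTS if question.get(s)]
    if not filled:
        return 0.0
    total = 0.0
    for slot in filled:
        wanted = question[slot]
        found = cit.elements.get(slot, set())
        total += len(wanted & found) / len(wanted)
    return total / len(filled)


def rerank(question: dict[str, set[str]], citations: list[Citation],
           alpha: float = 0.8) -> list[Citation]:
    """Sort by a blend of frame match and a 1/rank prior from PubMed.
    alpha is an illustrative weight, not a value from the paper."""
    def combined(c: Citation) -> float:
        return alpha * match_score(question, c) + (1 - alpha) / c.original_rank
    return sorted(citations, key=combined, reverse=True)


question = {"population": {"adults", "hypertension"},
            "intervention": {"ace inhibitor"},
            "outcome": {"stroke"}}
cits = [
    Citation("A", 1, {"population": {"children"}}),
    Citation("B", 2, {"population": {"adults", "hypertension"},
                      "intervention": {"ace inhibitor"},
                      "outcome": {"stroke"}}),
]
print([c.pmid for c in rerank(question, cits)])
```

Keeping a small prior from the original rank is one way to avoid discarding PubMed's already competitive ordering when two citations match the question frame equally well; here citation B moves ahead of A because it matches every filled slot.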

    Semi-automated Ontology Generation for Biocuration and Semantic Search

    Background: In the life sciences, the amount of literature and experimental data grows at a tremendous rate. In order to effectively access and integrate these data, biomedical ontologies – controlled, hierarchical vocabularies – are being developed. Creating and maintaining such ontologies is a difficult, labour-intensive, manual process. Many computational methods that can support ontology construction have been proposed in the past; however, good, validated systems are largely missing. Motivation: The biocuration community plays a central role in the development of ontologies. Any method that can support their efforts has the potential to have a huge impact in the life sciences. Recently, a number of semantic search engines were created that make use of biomedical ontologies for document retrieval. To transfer the technology to other knowledge domains, suitable ontologies need to be created. One area where ontologies may prove particularly useful is the search for alternative methods to animal testing, an area where comprehensive search is of special interest to determine the availability or unavailability of alternative methods. Results: The Dresden Ontology Generator for Directed Acyclic Graphs (DOG4DAG) developed in this thesis is a system that supports the creation and extension of ontologies by semi-automatically generating terms, definitions, and parent-child relations from text in PubMed, the web, and PDF repositories. The system is seamlessly integrated into OBO-Edit and Protégé, two widely used ontology editors in the life sciences. DOG4DAG generates terms by identifying statistically significant noun phrases in text. For definitions and parent-child relations, it employs pattern-based web searches. Each generation step has been systematically evaluated using manually validated benchmarks. Term generation yields high-quality terms that are also found in manually created ontologies. Definitions can be retrieved for up to 78% of terms, and child-ancestor relations for up to 54%. No other validated system exists that achieves comparable results. To improve the search for information on alternative methods to animal testing, an ontology has been developed that contains 17,151 terms, of which 10% were newly created and 90% were reused from existing resources. This ontology is the core of Go3R, the first semantic search engine in this field. When a user performs a search query with Go3R, the search engine expands this request using the structure and terminology of the ontology. The machine classification employed in Go3R distinguishes documents related to alternative methods from those that are not with an F-measure of 90% on a manual benchmark. Approximately 200,000 of the 19 million documents listed in PubMed were identified as relevant, either because they contained a specific term or through the automatic classification. The Go3R search engine is available online at www.Go3R.org.
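Go3R is described as expanding a query using the structure and terminology of its ontology. One simple way to do this, sketched below in Python under assumptions, is to add all of a term's descendants in the directed acyclic graph plus each term's synonyms, then join them into a Boolean OR query. The toy terms in `CHILDREN` and `SYNONYMS` and the helpers `expand_query` and `to_boolean_query` are hypothetical and not taken from Go3R.

```python
from collections import deque

# Toy ontology fragment (hypothetical terms, not taken from the Go3R ontology).
# CHILDREN maps each term to its direct children in the DAG.
CHILDREN: dict[str, list[str]] = {
    "alternative method": ["in vitro method", "in silico method"],
    "in vitro method": ["cell culture", "organ-on-a-chip"],
    "in silico method": ["QSAR model"],
}
SYNONYMS: dict[str, list[str]] = {
    "in silico method": ["computational method"],
    "QSAR model": ["quantitative structure-activity relationship"],
}


def expand_query(term: str) -> list[str]:
    """Return the term, its descendants (breadth-first), and each term's synonyms."""
    seen = {term}
    order = [term]
    queue = deque([term])
    while queue:
        node = queue.popleft()
        for child in CHILDREN.get(node, []):
            if child not in seen:  # a DAG node may have several parents; visit it once
                seen.add(child)
                order.append(child)
                queue.append(child)
    expanded: list[str] = []
    for t in order:
        expanded.append(t)
        expanded.extend(SYNONYMS.get(t, []))
    return expanded


def to_boolean_query(terms: list[str]) -> str:
    """Join terms into a quoted OR query in a generic Boolean syntax."""
    return " OR ".join(f'"{t}"' for t in terms)


print(to_boolean_query(expand_query("in silico method")))
```

Tracking visited nodes matters because a DAG, unlike a tree, lets a term have several parents, so the same descendant can be reached along more than one path.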

    The devices, experimental scaffolds, and biomaterials ontology (DEB): a tool for mapping, annotation, and analysis of biomaterials' data

    The size and complexity of the biomaterials literature make systematic data analysis an excruciating manual task. A practical solution is to create databases and information resources. Implant design and biomaterials research can greatly benefit from an open database for systematic data retrieval. Ontologies are pivotal to knowledge base creation, serving to represent and organize domain knowledge. To name but two examples, the Gene Ontology (GO) and the Chemical Entities of Biological Interest ontology (ChEBI), together with their associated databases, are central resources to their respective research communities. The creation of the devices, experimental scaffolds, and biomaterials ontology (DEB), an open resource for organizing information about biomaterials, their design, manufacture, and biological testing, is described. DEB was developed using text analysis to identify ontology terms from a biomaterials gold-standard corpus, systematically curated to represent the domain's lexicon. The topics covered are validated by members of the biomaterials research community. The ontology may be used for searching terms, annotating data for machine learning applications, standardized metadata indexing, and other cross-disciplinary data exploitation. Input from the biomaterials community to this effort to create data-driven, open-access research tools is encouraged and welcomed. Preprint.

    64-slice computed tomography angiography in the diagnosis and assessment of coronary artery disease : systematic review and meta-analysis

    Objective: To assess whether 64-slice computed tomography (CT) angiography might replace some coronary angiography (CA) for the diagnosis and assessment of coronary artery disease (CAD). Methods: We searched electronic databases and conference proceedings and scanned the reference lists of included studies. Eligible studies compared 64-slice CT with a reference standard of CA in adults with suspected or known CAD, reporting sensitivity and specificity or true and false positives and negatives. Data were pooled using the hierarchical summary receiver operating characteristic model. Results: Forty studies were included; 28 provided sufficient data for inclusion in the meta-analyses, all using a cutoff of ≥ 50% stenosis to define significant CAD. In patient-based detection (n=1286), 64-slice CT pooled sensitivity was 99% (95% credible interval (CrI) 97 to 99%), specificity 89% (95% CrI 83 to 94%), median positive predictive value (PPV) across studies 93% (range 64 to 100%) and negative predictive value (NPV) 100% (range 86 to 100%). In segment-based detection (n=14,199), 64-slice CT pooled sensitivity was 90% (95% CrI 85 to 94%), specificity 97% (95% CrI 95 to 98%), median PPV across studies 76% (range 44 to 93%) and NPV 99% (range 95 to 100%). Conclusions: 64-slice CT is highly sensitive for patient-based detection of CAD and has a high NPV. Its ability to rule out significant CAD means that it may have a role in the assessment of chest pain, particularly when the diagnosis remains uncertain despite clinical evaluation and simple non-invasive testing. Funding: UK National Institute for Health Research Health Technology Assessment programme (project number 06/15/01). The Health Services Research Unit is core funded by the Chief Scientist Office of the Scottish Government Health Directorates. Peer reviewed. Author version.
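Unlike sensitivity and specificity, PPV and NPV depend on the prevalence of CAD in each study population, which may help explain the wide range of PPVs across studies. By Bayes' theorem, with sensitivity Se, specificity Sp, and prevalence p, PPV = Se·p / (Se·p + (1 − Sp)(1 − p)) and NPV = Sp(1 − p) / (Sp(1 − p) + (1 − Se)·p). The short Python sketch below plugs in the pooled patient-based estimates; the prevalence values are illustrative assumptions, not figures from the review.

```python
def predictive_values(sensitivity: float, specificity: float,
                      prevalence: float) -> tuple[float, float]:
    """Return (PPV, NPV) for a test applied where disease prevalence is `prevalence`."""
    tp = sensitivity * prevalence              # diseased, test positive
    fn = (1 - sensitivity) * prevalence        # diseased, test negative
    tn = specificity * (1 - prevalence)        # healthy, test negative
    fp = (1 - specificity) * (1 - prevalence)  # healthy, test positive
    return tp / (tp + fp), tn / (tn + fn)


# Pooled patient-based estimates from the abstract; prevalences are illustrative.
for prev in (0.1, 0.3, 0.5):
    ppv, npv = predictive_values(0.99, 0.89, prev)
    print(f"prevalence {prev:.0%}: PPV {ppv:.1%}, NPV {npv:.1%}")
```

With these estimates, PPV falls to about 50% at 10% prevalence while NPV stays above 99%, which is consistent with the review's emphasis on the ability of 64-slice CT to rule out significant CAD.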

    Summarization from Medical Documents: A Survey

    Objective: The aim of this paper is to survey recent work in medical document summarization. Background: During the last decade, document summarization has received increasing attention from the AI research community. More recently, it has also attracted the interest of the medical research community, due to the enormous growth of information available to physicians and researchers in medicine through the large and growing number of published journals, conference proceedings, medical sites and portals on the World Wide Web, electronic medical records, etc. Methodology: This survey first gives a general background on document summarization, presenting the factors that summarization depends upon, discussing evaluation issues and briefly describing the various types of summarization techniques. It then examines the characteristics of the medical domain through the different types of medical documents. Finally, it presents and discusses the summarization techniques used so far in the medical domain, referring to the corresponding systems and their characteristics. Discussion and conclusions: The paper thoroughly discusses promising paths for future research in medical document summarization. It mainly focuses on scaling to large collections of documents in various languages and from different media, on personalization, on portability to new sub-domains, and on the integration of summarization technology into practical applications. Comment: 21 pages, 4 tables.
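As a concrete illustration of one type of summarization technique, here is a minimal Python sketch of a classic extractive approach: frequency-based sentence scoring in the spirit of Luhn. It is not a method taken from the survey; the stop-word list, the regex sentence splitter, and the names `tokenize` and `summarize` are simplifying assumptions.

```python
import re
from collections import Counter

# A deliberately tiny stop-word list; real systems use much larger ones.
STOPWORDS = {"the", "a", "an", "of", "and", "in", "to", "is", "was",
             "were", "with", "for", "on", "by", "as", "at"}


def tokenize(text: str) -> list[str]:
    """Lower-case alphabetic tokens with stop words removed."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def summarize(text: str, k: int = 2) -> list[str]:
    """Return the k sentences with the highest mean document word frequency,
    in their original order."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    freq = Counter(tokenize(text))

    def score(sentence: str) -> float:
        words = tokenize(sentence)
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    # sorted() is stable even with reverse=True, so ties keep document order.
    best = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)[:k]
    return [sentences[i] for i in sorted(best)]


doc = ("Aspirin reduces stroke risk. Aspirin increases bleeding risk. "
       "The weather was sunny.")
print(summarize(doc, k=2))
```

Averaging rather than summing word frequencies keeps long sentences from winning simply by length; medical systems typically replace raw words with domain concepts, which this sketch does not attempt.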

    Doctor of Philosophy

    Dissertation: Medical knowledge learned in medical school can quickly become outdated given the tremendous growth of the biomedical literature. Medical practitioners are responsible for continuously updating their knowledge with the most recent, best available clinical evidence in order to make informed decisions about patient care. However, clinicians often have little time to read the primary literature, even within their narrow specialty. As a result, they often rely on systematic evidence reviews developed by medical experts to fulfill their information needs. At present, systematic reviews of clinical research are created and updated manually, which is expensive, slow, and unable to keep pace with the rapidly growing medical literature. This dissertation research aims to enhance the traditional systematic review development process using computer-aided solutions. The first study investigates query expansion and scientific quality ranking approaches to enhance literature search on clinical guideline topics. The study showed that unsupervised methods can improve the retrieval performance of a popular biomedical search engine (PubMed). The proposed methods improve the comprehensiveness of literature search and increase the rate at which relevant studies are found while reducing screening effort. The second and third studies aim to enhance the traditional manual data extraction process. The second study developed a framework to extract and classify text from PDF reports. This study demonstrated that a rule-based multipass sieve approach is more effective than a machine-learning approach in categorizing document-level structures, and that classifying and filtering publication metadata and semi-structured text enhances the performance of an information extraction system. The proposed method could serve as a document processing step in any text mining research on PDF documents. The third study proposed a computer-aided data extraction approach that recommends relevant sentences and key phrases extracted from publication reports. This study demonstrated that using a machine-learning classifier to prioritize sentences for specific data elements performs as well as or better than an abstract screening approach, and might save time and reduce errors in the full-text screening process. In summary, this dissertation showed that there are promising opportunities for technology to assist in the development of systematic reviews. At a time when computing resources are getting cheaper and more powerful, failing to apply computer technologies to assist and optimize these manual processes is a lost opportunity to improve the timeliness of systematic reviews. This research provides methodologies and tests hypotheses that can serve as the basis for further large-scale software engineering projects aimed at fully realizing the prospect of computer-aided systematic reviews.
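The second study's rule-based multipass sieve is not specified in this abstract. The Python sketch below shows only the general pattern, under stated assumptions: an ordered list of rule passes, most precise first, where each pass labels only the text blocks that earlier passes left unlabeled, and a catch-all pass labels whatever remains. The labels, regular expressions, and function names are hypothetical and not taken from the dissertation.

```python
import re
from typing import Callable, Optional

# A pass inspects one text block and returns a label, or None to defer.
Sieve = Callable[[str], Optional[str]]


def metadata_pass(block: str) -> Optional[str]:
    """High-precision cues: DOIs, copyright notices, received/accepted dates."""
    if re.search(r"\bdoi:|©|copyright|received .* accepted", block, re.IGNORECASE):
        return "metadata"
    return None


def heading_pass(block: str) -> Optional[str]:
    """Optionally numbered standard section names, alone on the line."""
    pattern = (r"(\d+(\.\d+)*\.?\s+)?"
               r"(abstract|introduction|methods?|results|discussion|conclusions?|references)")
    if re.fullmatch(pattern, block.strip(), re.IGNORECASE):
        return "section_heading"
    return None


def table_pass(block: str) -> Optional[str]:
    """Blocks made mostly of numeric cells are treated as table text."""
    cells = block.split()
    numeric = sum(bool(re.fullmatch(r"[-+]?\d+(\.\d+)?%?", c)) for c in cells)
    if len(cells) >= 3 and numeric / len(cells) >= 0.6:
        return "table_text"
    return None


def fallback_pass(block: str) -> Optional[str]:
    """Catch-all: anything left is body text."""
    return "body_text"


# Ordered from most to least precise.
SIEVES: list[Sieve] = [metadata_pass, heading_pass, table_pass, fallback_pass]


def classify(blocks: list[str]) -> list[Optional[str]]:
    labels: list[Optional[str]] = [None] * len(blocks)
    for sieve in SIEVES:  # one full pass over the blocks per sieve
        for i, block in enumerate(blocks):
            if labels[i] is None:  # earlier, more precise passes take precedence
                labels[i] = sieve(block)
    return labels


print(classify(["doi:10.1000/xyz123", "2. Methods", "12 45 3.5 80%",
                "Patients were randomized to two arms."]))
```

Ordering passes by precision means a high-confidence rule, such as the DOI pattern, is never overridden by a looser rule applied later.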