
    Using Noun Phrases for Navigating Biomedical Literature on Pubmed: How Many Updates Are We Losing Track of?

    Author-supplied citations are a fraction of the related literature for a paper. The “related citations” list on PubMed is typically dozens or hundreds of results long and offers no hint as to why these results are related. Using noun phrases derived from the sentences of the paper, we show that it is possible to navigate to PubMed updates more transparently, through search terms that associate a paper with its citations. The algorithm that generates these search terms automatically extracts noun phrases from the paper using natural language processing tools and ranks them by the number of occurrences in the paper compared to the number of occurrences on the web. We define a search query as citation validated (CV) when at least one of the paper's author-supplied citations appears among the top 20 search results. When the overlapping citations were written by the same authors as the paper itself, we label the query CV-S; when they were written by different authors, we label it CV-D. For a systematic sample of 883 papers on PubMed Central, at least one of the search terms is CV-D for 86% of the papers, versus 65% for the top 20 PubMed “related citations.” We hypothesize that these quantities, computed for the 20 million papers on PubMed, would fall within 5% of these percentages. Averaged across all 883 papers, 5 search terms are CV-D, 10 search terms are CV-S, and 6 unique citations validate these searches. Potentially related literature uncovered by citation-validated searches (either CV-S or CV-D) is on the order of ten items per paper, and many more if the remaining searches that are not citation validated are taken into account. The significance and relationship of each search result to the paper can only be vetted and explained by a researcher with knowledge of or interest in that paper.
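The ranking and validation steps described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The `Citation` record, the function names, the smoothed ratio used as the ranking score, and the rule that "same authors" means sharing at least one author are all assumptions made for illustration, since the abstract does not specify them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Citation:
    """Hypothetical minimal record for a cited or retrieved paper."""
    paper_id: str
    authors: frozenset


def rank_noun_phrases(paper_counts, web_counts):
    """Rank phrases by frequency in the paper relative to frequency on the web.

    Assumption: the score is a plain ratio with add-one smoothing on the
    web count; the abstract does not give the exact formula.
    """
    def score(phrase):
        return paper_counts[phrase] / (web_counts.get(phrase, 0) + 1)

    return sorted(paper_counts, key=score, reverse=True)


def classify_query(paper_authors, cited, results, top_k=20):
    """Return the labels ("CV-S", "CV-D") earned by one search query.

    A query is citation validated if any of the top_k results is among
    the paper's author-supplied citations. An empty set means the query
    is not citation validated.
    """
    cited_by_id = {c.paper_id: c for c in cited}
    labels = set()
    for hit in results[:top_k]:
        match = cited_by_id.get(hit.paper_id)
        if match is None:
            continue  # this result is not one of the paper's citations
        # Assumption: "same authors" means at least one shared author.
        if match.authors & paper_authors:
            labels.add("CV-S")
        else:
            labels.add("CV-D")
    return labels
```

Returning a set allows a single query to be both CV-S and CV-D when its top results overlap with citations of both kinds. Whether the original study counted such queries in both categories is not stated in the abstract.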

    A Relevance Feedback-Based System For Quickly Narrowing Biomedical Literature Search Result

    Online literature is an important source of information. Its rapid growth makes manually searching for the most relevant information very time-consuming, because users must sift through many results to find the relevant ones. Existing search engines and online databases return a list of results that satisfy the user's search criteria. The list is often too long to go through hit by hit if the user does not know exactly what they want or does not have time to review every result. My focus is on how to find biomedical literature as quickly as possible. In this dissertation, I developed a biomedical literature search system that uses a relevance feedback mechanism, fuzzy logic, text mining techniques and the Unified Medical Language System (UMLS). The system extracts and decodes information from online biomedical documents and uses the extracted information first to filter out unwanted documents and then to rank the related ones according to the user's preferences. I used text mining techniques to extract PDF document features and used these features, with the help of fuzzy logic, to filter out unwanted documents. The system extracts meaning and semantic relations between texts and calculates the similarity between documents using these relations. Moreover, I developed a fuzzy literature ranking method that draws on fuzzy logic, text mining techniques and UMLS knowledge resources. The fuzzy ranking method uses semantic type and meaning concepts to map the relations between texts in documents. The relevance feedback-based biomedical literature search system is evaluated using a real biomedical data set built around the drug dobutamine. The data set contains 1,099 original documents. To obtain coherent and reliable evaluation results, two physicians took part in the system evaluation. Using “30-day mortality” as a specific query, the precision of the retrieved results improves by 87.7% over three rounds, which shows the effectiveness of using relevance feedback, fuzzy logic and UMLS in the search process. Moreover, the fuzzy-based ranking method is evaluated in terms of how well it ranks the biomedical search results. Experiments show that the fuzzy-based ranking method improves the average ranking order accuracy by 3.35% and 29.55% compared with the UMLS meaning and semantic type methods, respectively.

    Extensions of SNOMED taxonomy abstraction networks supporting auditing and complexity analysis

    The Systematized Nomenclature of Medicine – Clinical Terms (SNOMED CT) has been widely used as a standard terminology in various biomedical domains. Enhancing the quality of SNOMED contributes to the improvement of the medical systems that it supports. In previous work, the Structural Analysis of Biomedical Ontologies Center (SABOC) team defined the partial-area taxonomy, a hierarchical abstraction network consisting of units called partial-areas. Each partial-area comprises a set of SNOMED concepts that exhibit a particular relationship structure and are distinguished by a unique root concept. In this dissertation, some extensions and applications of the taxonomy framework are considered. Some concepts appearing in multiple partial-areas have been designated as complex because they constitute a tangled portion of a hierarchy and can be obstacles to users trying to gain an understanding of the hierarchy’s content. A methodology for partitioning the entire collection of these so-called overlapping complex concepts into singly-rooted groups is presented. A novel auditing methodology based on an enhanced abstraction network is described. In addition, the existing abstraction network relies heavily on the structure of the outgoing relationships of the concepts, but some SNOMED hierarchies (or subhierarchies) serve only as targets of relationships, with few or no outgoing relationships of their own. This situation impedes the applicability of the abstraction network. To deal with this problem, a variation of the above abstraction network, called the converse abstraction network (CAN), is defined and derived automatically from a given SNOMED hierarchy. An auditing methodology based on the CAN is formulated. Furthermore, a preliminary study of the complementary use of the abstraction network in description logic (DL) for quality assurance purposes pertaining to SNOMED is presented. Two complexity measures based on the abstraction network, a structural complexity measure and a hierarchical complexity measure, are introduced to quantify the complexity of a SNOMED hierarchy. An extension of the two measures is also used specifically to track the complexity of the versions of the SNOMED hierarchies before and after a sequence of auditing processes.

    Factors Associated with Cancer: An Ontology Proposal (Fatores associados a cancro: proposta de ontologia)

    Cancer appears as a consequence of multiple molecular events that result in cell disruption, the ultimate cause being the loss of cellular homeostasis. Some patients are susceptible to these deregulation events because of hereditary characteristics, others through contact with external factors that cause cell disruption. Cancer has the second-highest mortality rate worldwide, behind only cardiovascular diseases. However, if detected at an early stage, it becomes less debilitating and the survival rate increases. Despite the extensive amount of published data and the investigations dedicated to understanding the factors that affect carcinogenic pathology, the terminology is neither standardized nor organized, and this information remains difficult to interpret. Information systems that use Semantic Web technology allow the standardization of dispersed terminologies. The work performed under this dissertation aimed to create a system of ontologies describing the factors associated with cancer, which were then gathered in the OntoXcancer tool. OntoXcancer is a bioinformatics tool that allows the management and analysis of factors associated with cancer, and it allowed the construction of a structured database, FactorXcancer. Within this dissertation, the cancer-associated factors were cataloged according to the ontologies generated by OntoXcancer, and the bibliographic information concerning these factors allowed the establishment of a methodology for consulting the information available in FactorXcancer.