
    Mining the Medical and Patent Literature to Support Healthcare and Pharmacovigilance

    Recent advancements in healthcare practices and the increasing use of information technology in the medical domain have led to the rapid generation of free-text data in the form of scientific articles, e-health records, patents, and document inventories. This has spurred the development of sophisticated information retrieval and information extraction technologies. A fundamental requirement for the automatic processing of biomedical text is the identification of information-carrying units such as concepts or named entities. In this context, this work focuses on the identification of medical disorders (such as diseases and adverse effects), which form an important category of concepts in medical text. Two methodologies were investigated: dictionary-based and machine learning-based approaches. Furthermore, the capabilities of the concept recognition techniques were systematically exploited to build a semantic search platform for the retrieval of e-health records and patents. The system supports conventional text search as well as semantic and ontological searches. The performance of the adapted retrieval platform for e-health records and patents was evaluated within open assessment challenges (TRECMED and TRECCHEM, respectively), in which the system was rated best among several competing information retrieval platforms. Finally, from the medico-pharma perspective, a strategy for the identification of adverse drug events from medical case reports was developed. Qualitative evaluation as well as expert validation of the developed system's performance showed robust results. In conclusion, this thesis presents approaches for efficient information retrieval and information extraction from various biomedical literature sources in support of healthcare and pharmacovigilance. The applied strategies have the potential to enhance the literature searches performed by biomedical, healthcare, and patent professionals. This can promote literature-based knowledge discovery, improve the safety and effectiveness of medical practices, and drive research and development in the medical and healthcare arena.
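
    The dictionary-based approach mentioned in the abstract can be illustrated with a minimal sketch. It is not the thesis's system: the lexicon entries, the labels, and the function names `build_pattern` and `find_disorders` are all hypothetical, and a real pipeline would load a curated terminology and handle spelling variants, abbreviations, and negation.

```python
import re

# Hypothetical toy lexicon: surface form -> concept label.
# These entries are illustrative assumptions, not taken from the thesis.
DISORDER_LEXICON = {
    "myocardial infarction": "DISEASE",
    "type 2 diabetes": "DISEASE",
    "headache": "ADVERSE_EFFECT",
    "nausea": "ADVERSE_EFFECT",
}


def build_pattern(lexicon):
    """Compile one case-insensitive regex that matches any lexicon term."""
    # Longest terms first, so multi-word entries win over shorter ones.
    terms = sorted(lexicon, key=len, reverse=True)
    alternation = "|".join(re.escape(term) for term in terms)
    return re.compile(r"\b(?:" + alternation + r")\b", re.IGNORECASE)


def find_disorders(text, lexicon=DISORDER_LEXICON):
    """Return (start, end, matched_text, label) for each lexicon hit."""
    pattern = build_pattern(lexicon)
    return [
        (m.start(), m.end(), m.group(0), lexicon[m.group(0).lower()])
        for m in pattern.finditer(text)
    ]


if __name__ == "__main__":
    sentence = "Patient reported Nausea and headache after myocardial infarction."
    for hit in find_disorders(sentence):
        print(hit)
```

    Exact string lookup of this kind is simple to build and inspect but misses unseen variants, which is one common motivation for pairing it with a machine learning-based recognizer.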

    From Linguistic Resources to Ontology-Aware Terminologies: Minding the Representation Gap

    Terminological resources have proven crucial in many applications, ranging from Computer-Aided Translation tools to authoring software and multilingual and cross-lingual information retrieval systems. Nonetheless, with the exception of a few felicitous examples, such as the IATE (Interactive Terminology for Europe) Termbank, many terminological resources are not available in standard formats, such as Term Base eXchange (TBX), which prevents their sharing and reuse. Moreover, these terminologies could be improved by associating them with the corresponding ontology-based information. The research described in the present contribution demonstrates the process and the methodologies adopted for the automatic conversion of such resources into TBX, together with their semantic enrichment based on the formalization of ontological information into terminologies. We present a proof of concept using the Italian Linguistic Resource for the Archaeological domain (developed according to the Thesauri and Guidelines of the Italian Central Institute for the Catalogue and Documentation). Further, we introduce the conversion tool developed to support the process of creating ontology-aware terminologies for improving the interoperability and sharing of existing language technologies and data sets.

    Knowledge Rich Natural Language Queries over Structured Biological Databases

    Increasingly, keyword, natural language, and NoSQL queries are being used for information retrieval from traditional as well as non-traditional databases such as web, document, image, GIS, legal, and health databases. While their popularity is undeniable for obvious reasons, their engineering is far from simple. For the most part, a semantics- and intent-preserving mapping of a well-understood natural language query, expressed over a structured database schema, to a structured query language is still a difficult task, and research to tame the complexity is intense. In this paper, we propose a multi-level knowledge-based middleware to facilitate such mappings that separates the conceptual level from the physical level. We augment these multi-level abstractions with a concept reasoner and a query strategy engine to dynamically link arbitrary natural language querying to well-defined structured queries. We demonstrate the feasibility of our approach by presenting a Datalog-based prototype system, called BioSmart, that can compute responses to arbitrary natural language queries over arbitrary databases once a syntactic classification of the natural language query is made.

    A Novel ILP Framework for Summarizing Content with High Lexical Variety

    Summarizing content contributed by individuals can be challenging, because people make different lexical choices even when describing the same events. However, there remains a significant need to summarize such content. Examples include student responses to post-class reflective questions, product reviews, and news articles published by different news agencies about the same events. The high lexical diversity of these documents hinders a system's ability to effectively identify salient content and reduce summary redundancy. In this paper, we overcome this issue by introducing an integer linear programming-based summarization framework. It incorporates a low-rank approximation to the sentence-word co-occurrence matrix to intrinsically group semantically similar lexical items. We conduct extensive experiments on datasets of student responses, product reviews, and news documents. Our approach compares favorably to a number of extractive baselines as well as a neural abstractive summarization system. The paper finally sheds light on when and why the proposed framework is effective at summarizing content with high lexical variety. Comment: Accepted for publication in the journal of Natural Language Engineering, 201
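
    The idea of a low-rank approximation to a sentence-word co-occurrence matrix can be sketched as follows. This is an assumption-laden illustration, not the paper's method: it computes only a rank-1 approximation by power iteration in plain Python, the toy matrix and the name `rank1_approximation` are hypothetical, and the integer linear programming step is not shown.

```python
from math import sqrt


def rank1_approximation(matrix, iterations=100):
    """Best rank-1 approximation of `matrix`, found by power iteration.

    `matrix` is a list of rows (sentences) over columns (words).
    """
    n_rows, n_cols = len(matrix), len(matrix[0])
    v = [1.0] * n_cols  # initial guess for the top right singular vector
    for _ in range(iterations):
        # u = A v
        u = [sum(matrix[i][j] * v[j] for j in range(n_cols)) for i in range(n_rows)]
        # w = A^T u, so overall v <- A^T A v (then normalized)
        w = [sum(matrix[i][j] * u[i] for i in range(n_rows)) for j in range(n_cols)]
        norm = sqrt(sum(x * x for x in w))
        if norm == 0.0:  # all-zero matrix: the approximation is zero too
            return [[0.0] * n_cols for _ in range(n_rows)]
        v = [x / norm for x in w]
    # With v of unit length, A v equals sigma * u1, so (A v) v^T is the
    # rank-1 term sigma * u1 * v1^T.
    av = [sum(matrix[i][j] * v[j] for j in range(n_cols)) for i in range(n_rows)]
    return [[av[i] * v[j] for j in range(n_cols)] for i in range(n_rows)]


if __name__ == "__main__":
    # Hypothetical toy data. Columns: "bike", "bicycle", "hard".
    # Sentence 0 uses "bike" and "hard"; sentence 1 uses "bicycle" and "hard".
    cooccurrence = [
        [1.0, 0.0, 1.0],
        [0.0, 1.0, 1.0],
    ]
    for row in rank1_approximation(cooccurrence):
        print([round(x, 3) for x in row])
```

    In this toy case the approximation assigns sentence 0 a non-zero weight for "bicycle" although the word never occurs in it, because both sentences share the context word "hard". That smoothing effect is the intuition behind grouping lexical variants; a practical system would use a higher rank and a numerical library.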

    Exploiting and integrating rich features for biological literature classification

    Background: Efficient features play an important role in automated text classification, which facilitates access to large-scale data. In the bioscience field, biological structures and terminologies are described by a large number of features; domain-dependent features would significantly improve classification performance. How to effectively select and integrate different types of features to improve biological literature classification performance is the major issue studied in this paper. Results: To efficiently classify the biological literature, we propose a novel feature value schema TF*ML, features ranging from the lower-level, domain-independent “string feature” to the higher-level, domain-dependent “semantic template feature”, and proper integrations among the features. Compared to our previous approaches, the performance is improved in terms of AUC and F-Score by 11.5% and 8.8% respectively, and outperforms the best performance achieved in BioCreAtIvE 2006. Conclusions: Different types of features possess different discriminative capabilities in literature classification; proper integration of domain-independent and domain-dependent features would significantly improve the performance and overcome over-fitting on the data distribution.

    Knowledge organisation in LSP texts and dictionaries: a case study

    In LSP dictionaries the specialised knowledge contained and organised in texts is selected and restructured. This paper focuses on the analysis of a case study: the "Dizionario generale plurilingue del Lessico Metalinguistico" (DLM – General Multilingual Dictionary of the Metalinguistic Lexicon). The dictionary of linguistics terminology under examination is planned to complement the reference products available in this area of knowledge. It has a particular outline, as the materials it records are directly drawn from the most representative texts produced throughout the history of linguistic speculation (§ 2.). The plan of the DLM establishes that the terminological information stored (definitions, cross-references, formal variants, translations) is directly drawn from the original texts, and not elaborated by the compilers. Therefore, the definitions of the indexed terms are not produced by terminographers: they are ‘defining quotations’ identified and extracted by specialists from the source texts. Specialised texts play an essential role in this project, as they are analysed in order both to identify the core concepts used (or introduced) by their authors and to reconstruct the conceptual networks delineated in each of them. In the compilation of the DLM the problematic issues inherent in textual analysis clearly emerge (§ 3.). This is because texts are multifaceted units in which the various factors related to their structural organisation and informative content interact. The different degrees of ‘density’ of specialised information displayed in texts are determined, among other things, by the conceptual, communicative, pragmatic, structural, cognitive, and socio-cultural components of LSP texts (§ 3.1.). The procedures of retrieval and organisation of specialised knowledge carried out in the DLM project are analysed in this study through the consideration of a sub-section of its terminological inventory, i.e. the metalinguistic units extracted from a text in which focal linguistic issues are discussed (§ 4.). Although this book was produced in the pre-scientific period of the history of linguistics, it was chosen because, in addition to providing interesting contributions to linguistics terminology (considered also from a historical viewpoint), it yields a model for the arrangement of the conceptual relational network which is being implemented for the DLM (§ 4.1.). The bi-dimensional character of the terminological records of the DLM is being integrated with graphic representations of conceptual relations, which provide a multidimensional outline to the defining section of this dictionary. The visual representation of relational networks provides further terminological information and also gives users an effective instrument for acquiring a more thorough understanding of the specialised knowledge which is transferred from LSP texts into this dictionary. Author: N. Leonardi