1,708 research outputs found
Mining the Medical and Patent Literature to Support Healthcare and Pharmacovigilance
Recent advancements in healthcare practices and the increasing use of information technology in the medical domain have led to the rapid generation of free-text data in the form of scientific articles, e-health records, patents, and document inventories. This has spurred the development of sophisticated information retrieval and information extraction technologies. A fundamental requirement for the automatic processing of biomedical text is the identification of information-carrying units such as concepts or named entities. In this context, this work focuses on the identification of medical disorders (such as diseases and adverse effects), which form an important category of concepts in medical text. Two methodologies were investigated for this purpose: dictionary-based and machine learning-based approaches. Furthermore, the capabilities of the concept recognition techniques were systematically exploited to build a semantic search platform for the retrieval of e-health records and patents. The system supports conventional text search as well as semantic and ontological searches. The performance of the adapted retrieval platform for e-health records and patents was evaluated within open assessment challenges (TRECMED and TRECCHEM, respectively), wherein the system was best rated in comparison to several other competing information retrieval platforms. Finally, from the medico-pharma perspective, a strategy for the identification of adverse drug events from medical case reports was developed. Qualitative evaluation as well as an expert validation of the developed system's performance showed robust results. In conclusion, this thesis presents approaches for efficient information retrieval and information extraction from various biomedical literature sources in support of healthcare and pharmacovigilance.
The applied strategies have the potential to enhance the literature searches performed by biomedical, healthcare, and patent professionals. This can promote literature-based knowledge discovery, improve the safety and effectiveness of medical practices, and drive research and development in the medical and healthcare arena.
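The dictionary-based approach mentioned in the abstract above can be illustrated with a minimal sketch. The abstract does not describe the thesis's actual dictionaries or matching rules, so the term list, the labels, and the function names below are all hypothetical; the sketch only shows the general idea of matching known surface forms in free text.

```python
import re

# Hypothetical mini-dictionary: surface form -> concept label.
# A real system would load a full terminology resource instead.
DISORDER_DICTIONARY = {
    "myocardial infarction": "disease",
    "heart attack": "disease",
    "nausea": "adverse effect",
    "skin rash": "adverse effect",
}


def build_matcher(dictionary):
    """Compile one case-insensitive pattern, trying longer terms first."""
    terms = sorted(dictionary, key=len, reverse=True)
    alternatives = "|".join(re.escape(term) for term in terms)
    return re.compile(r"\b(?:" + alternatives + r")\b", re.IGNORECASE)


def find_disorders(text, dictionary=DISORDER_DICTIONARY):
    """Return (start, end, matched text, label) for each dictionary hit."""
    matcher = build_matcher(dictionary)
    return [
        (m.start(), m.end(), m.group(0), dictionary[m.group(0).lower()])
        for m in matcher.finditer(text)
    ]


if __name__ == "__main__":
    report = "Patient developed a skin rash and nausea after the heart attack."
    for hit in find_disorders(report):
        print(hit)
```

Exact string matching of this kind misses spelling variants and unseen terms, which is one reason a machine learning-based recognizer is usually studied alongside it.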
From Linguistic Resources to Ontology-Aware Terminologies: Minding the Representation Gap
Terminological resources have proven crucial in many applications, ranging from Computer-Aided Translation tools to authoring software and multilingual and cross-lingual information retrieval systems. Nonetheless, with the exception of a few felicitous examples, such as the IATE (Interactive Terminology for Europe) Termbank, many terminological resources are not available in standard formats, such as Term Base eXchange (TBX), which prevents their sharing and reuse. Moreover, these terminologies could be improved by associating them with the corresponding ontology-based information. The research described in the present contribution demonstrates the process and the methodologies adopted in the automatic conversion of such resources into TBX, together with their semantic enrichment based on the formalization of ontological information into terminologies. We present a proof-of-concept using the Italian Linguistic Resource for the Archaeological domain (developed according to the Thesauri and Guidelines of the Italian Central Institute for the Catalogue and Documentation). Further, we introduce the conversion tool developed to support the process of creating ontology-aware terminologies for improving interoperability and sharing of existing language technologies and data set.
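As a rough illustration of what converting a terminological record into TBX can look like, the sketch below builds a single `termEntry` with Python's standard `xml.etree.ElementTree`. It is not the conversion tool described in the abstract. The `termEntry`, `langSet`, `tig`, and `term` elements follow the usual TBX layout, while the `descrip` type `ontologyClass`, the sample terms, and the function name are assumptions made only for this example.

```python
import xml.etree.ElementTree as ET

# Qualified name that ElementTree serializes as the xml:lang attribute.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def entry_to_tbx(entry_id, terms_by_lang, ontology_class=None):
    """Build a TBX-style termEntry element from a simple record.

    terms_by_lang maps a language code to a list of terms.
    ontology_class is an optional label; the descrip type used to
    store it here is hypothetical, not a standard TBX data category.
    """
    term_entry = ET.Element("termEntry", {"id": entry_id})
    if ontology_class is not None:
        descrip = ET.SubElement(term_entry, "descrip", {"type": "ontologyClass"})
        descrip.text = ontology_class
    for lang, terms in terms_by_lang.items():
        lang_set = ET.SubElement(term_entry, "langSet", {XML_LANG: lang})
        for term in terms:
            tig = ET.SubElement(lang_set, "tig")
            ET.SubElement(tig, "term").text = term
    return term_entry


if __name__ == "__main__":
    # Hypothetical archaeological record, invented for illustration.
    entry = entry_to_tbx("e1", {"it": ["anfora"], "en": ["amphora"]}, "Container")
    print(ET.tostring(entry, encoding="unicode"))
```

A complete TBX file would wrap such entries in the document-level header and body elements, which are omitted here.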
Knowledge Rich Natural Language Queries over Structured Biological Databases
Increasingly, keyword, natural language and NoSQL queries are being used for
information retrieval from traditional as well as non-traditional databases
such as web, document, image, GIS, legal, and health databases. While their
popularity is undeniable for obvious reasons, their engineering is far from
simple. For the most part, semantics- and intent-preserving mapping of a well
understood natural language query expressed over a structured database schema
to a structured query language is still a difficult task, and research to tame
the complexity is intense. In this paper, we propose a multi-level
knowledge-based middleware to facilitate such mappings that separate the
conceptual level from the physical level. We augment these multi-level
abstractions with a concept reasoner and a query strategy engine to dynamically
link arbitrary natural language querying to well defined structured queries. We
demonstrate the feasibility of our approach by presenting a Datalog based
prototype system, called BioSmart, that can compute responses to arbitrary
natural language queries over arbitrary databases once a syntactic
classification of the natural language query is made.
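The idea of first classifying a question syntactically and then mapping conceptual names to physical schema elements can be sketched as follows. This is not BioSmart: the abstract describes a Datalog-based prototype, whereas this sketch emits parameterized SQL strings, and the query classes, the concept map, and the table and column names are all hypothetical.

```python
import re

# Hypothetical conceptual-to-physical mapping: concept -> (table, key column).
CONCEPT_MAP = {
    "gene": ("gene_tbl", "symbol"),
    "protein": ("protein_tbl", "name"),
}

# Hypothetical syntactic classes, each recognized by one pattern.
QUERY_CLASSES = [
    ("lookup", re.compile(r"^(?:what is|describe) (?:the )?(\w+) (\w+)\??$", re.I)),
    ("count", re.compile(r"^how many (\w+?)s? are there\??$", re.I)),
]


def translate(question):
    """Map a question to (SQL text, parameters) via its syntactic class."""
    text = question.strip()
    for label, pattern in QUERY_CLASSES:
        match = pattern.match(text)
        if not match:
            continue
        concept = match.group(1).lower()
        if concept not in CONCEPT_MAP:
            raise KeyError(f"unknown concept: {concept}")
        table, key = CONCEPT_MAP[concept]
        if label == "lookup":
            # The looked-up value is passed as a parameter, not inlined.
            return f"SELECT * FROM {table} WHERE {key} = ?", (match.group(2),)
        return f"SELECT COUNT(*) FROM {table}", ()
    raise ValueError("question does not fit any known syntactic class")


if __name__ == "__main__":
    print(translate("What is the gene BRCA1?"))
    print(translate("How many proteins are there?"))
```

Keeping the concept map separate from the patterns mirrors the separation of conceptual and physical levels: the same question classes can be reused over a different schema by swapping the map.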
A Novel ILP Framework for Summarizing Content with High Lexical Variety
Summarizing content contributed by individuals can be challenging, because
people make different lexical choices even when describing the same events.
However, there remains a significant need to summarize such content. Examples
include the student responses to post-class reflective questions, product
reviews, and news articles published by different news agencies related to the
same events. High lexical diversity of these documents hinders the system's
ability to effectively identify salient content and reduce summary redundancy.
In this paper, we overcome this issue by introducing an integer linear
programming-based summarization framework. It incorporates a low-rank
approximation to the sentence-word co-occurrence matrix to intrinsically group
semantically-similar lexical items. We conduct extensive experiments on
datasets of student responses, product reviews, and news documents. Our
approach compares favorably to a number of extractive baselines as well as a
neural abstractive summarization system. The paper finally sheds light on when
and why the proposed framework is effective at summarizing content with high
lexical variety.
Comment: Accepted for publication in the journal of Natural Language
Engineering, 201
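To make the selection problem behind ILP-based extractive summarization concrete, the sketch below uses one common formulation: pick sentences under a length budget so that the total weight of distinct covered concepts is maximal. This is not necessarily the authors' exact model, it omits the low-rank approximation of the sentence-word matrix entirely, and it replaces an ILP solver with exhaustive enumeration, which is only feasible for a handful of sentences. The function name, weights, and sample sentences are hypothetical.

```python
from itertools import combinations


def summarize(sentences, concept_weights, max_words):
    """Pick the sentence subset maximizing the weight of covered concepts.

    Each concept (here: a lowercase word) counts once no matter how many
    selected sentences contain it, so redundant sentences add nothing.
    """
    tokens = [set(s.lower().split()) for s in sentences]
    lengths = [len(s.split()) for s in sentences]
    best_score, best_subset = 0.0, ()
    for size in range(1, len(sentences) + 1):
        for subset in combinations(range(len(sentences)), size):
            if sum(lengths[i] for i in subset) > max_words:
                continue  # violates the length budget
            covered = set().union(*(tokens[i] for i in subset))
            score = sum(w for c, w in concept_weights.items() if c in covered)
            if score > best_score:
                best_score, best_subset = score, subset
    return [sentences[i] for i in best_subset], best_score


if __name__ == "__main__":
    reviews = [
        "the battery lasts long",
        "battery life is great",
        "the screen is sharp",
    ]
    weights = {"battery": 2.0, "screen": 1.5, "great": 0.5}
    print(summarize(reviews, weights, max_words=8))
```

Because concepts are matched as exact words, two sentences that describe the same thing with different vocabulary share no concepts in this sketch; that is the lexical-variety problem the abstract says the low-rank approximation is meant to address.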
Exploiting and integrating rich features for biological literature classification
Background: Efficient features play an important role in automated text classification, which facilitates access to large-scale data. In the bioscience field, biological structures and terminologies are described by a large number of features; domain-dependent features would significantly improve classification performance. How to effectively select and integrate different types of features to improve biological literature classification performance is the major issue studied in this paper.
Results: To efficiently classify the biological literature, we propose a novel feature value schema, TF*ML; features ranging from the lower-level, domain-independent “string feature” to the higher-level, domain-dependent “semantic template feature”; and proper integrations among the features. Compared to our previous approaches, performance in terms of AUC and F-Score is improved by 11.5% and 8.8% respectively, and exceeds the best performance achieved in BioCreAtIvE 2006.
Conclusions: Different types of features possess different discriminative capabilities in literature classification; proper integration of domain-independent and domain-dependent features would significantly improve performance and overcome over-fitting to the data distribution.
Knowledge organisation in LSP texts and dictionaries: a case study
In LSP dictionaries the specialised knowledge contained and organised in texts is selected and restructured. This paper is focused on the analysis of a case study: the "Dizionario generale plurilingue del Lessico Metalinguistico" (DLM, General Multilingual Dictionary of the Metalinguistic Lexicon). The dictionary of linguistics terminology under examination is planned to complement the reference products available in this area of knowledge. In fact, it has a particular outline as the materials it records are directly drawn from the most representative texts produced throughout the history of linguistic speculation (§ 2.). The plan of the DLM establishes that the terminological information stored (definitions, cross-references, formal variants, translations) is directly drawn from the original texts, and not elaborated by the compilers. Therefore, the definitions of the indexed terms are not produced by terminographers: they are “defining quotations” identified and extracted by specialists from the source texts.
Specialised texts play an essential role in this project, as they are analysed in order both to identify the core concepts used (or introduced) by their authors and to reconstruct the conceptual networks delineated in each of them. In the compilation of the DLM the problematic issues inherent in textual analysis clearly emerge (§ 3.). This is due to the fact that texts are multifaceted units where the various factors related to their structural organisation and informative content interact. The different degrees of “density” of specialised information displayed in texts are determined, among other things, by the conceptual, communicative, pragmatic, structural, cognitive, and socio-cultural components of LSP texts (§ 3.1.).
The procedures of retrieval and organisation of specialised knowledge carried out in the DLM project are analysed in this study through the consideration of a sub-section of its terminological inventory, i.e. the metalinguistic units extracted from a text in which focal linguistic issues are discussed (§ 4.). Although this book was produced in the pre-scientific period of the history of linguistics, it was chosen because, in addition to providing interesting contributions to linguistics terminology (considered also from a historical viewpoint), it yields a model for the arrangement of the conceptual relational network which is being implemented for the DLM (§ 4.1.). The bi-dimensional character of terminological records of the DLM is being integrated with graphic representations of conceptual relations, which provide a multidimensional outline to the defining section of this dictionary. The visual representation of relational networks provides further terminological information and it also makes available to the users an effective instrument for acquiring a more thorough understanding of the specialised knowledge which is transferred from LSP texts into this dictionary.
N. Leonardi