
    Alignment of vaccine codes using an ontology of vaccine descriptions

    BACKGROUND: Vaccine information in European electronic health record (EHR) databases is represented using various clinical and database-specific coding systems and drug vocabularies. The lack of harmonization constitutes a challenge in reusing EHR data in collaborative benefit-risk studies about vaccines. METHODS: We designed an ontology of the properties that are commonly used in vaccine descriptions, called Ontology of Vaccine Descriptions (VaccO), with a dictionary for the analysis of multilingual vaccine descriptions. We implemented five algorithms for the alignment of vaccine coding systems, i.e., the identification of corresponding codes from different coding systems, based on an analysis of the code descriptors. The algorithms were evaluated by comparing their results with manually created alignments in two reference sets including clinical and database-specific coding systems with multilingual code descriptors. RESULTS: The best-performing algorithm represented code descriptors as logical statements about entities in the VaccO ontology and used an ontology reasoner to infer common properties and identify corresponding vaccine codes. The evaluation demonstrated excellent performance of the approach (F-scores 0.91 and 0.96). CONCLUSION: The VaccO ontology allows the identification, representation, and comparison of heterogeneous descriptions of vaccines. The automatic alignment of vaccine coding systems can accelerate the readiness of EHR databases in collaborative vaccine studies.
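
The evaluation described in this abstract compares automatically produced alignments with manually created ones and reports F-scores. The sketch below shows how such a score could be computed, assuming the balanced F1 measure and treating an alignment as a set of code pairs. The abstract does not state which F-score variant was used, and the code identifiers (`A1`, `B7`, and so on) and the `f_score` helper are hypothetical illustrations, not taken from the study.

```python
def f_score(predicted, reference):
    """Return (precision, recall, F1) for predicted vs. reference code pairs."""
    true_positives = len(predicted & reference)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical alignments: each pair links a code in system A to a code in system B.
reference = {("A1", "B7"), ("A2", "B3"), ("A3", "B9"), ("A4", "B2")}
predicted = {("A1", "B7"), ("A2", "B3"), ("A3", "B5")}

precision, recall, f1 = f_score(predicted, reference)
print(f"precision={precision:.2f} recall={recall:.2f} F1={f1:.2f}")
```

Representing alignments as sets of pairs keeps the comparison order-independent and makes the true-positive count a simple set intersection.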

    Extending import detection algorithms for concept import from two to three biomedical terminologies

    Background: While enrichment of terminologies can be achieved in different ways, filling gaps in the IS-A hierarchy backbone of a terminology appears especially promising. To avoid difficult manual inspection, we started a research program in 2014, investigating terminology densities, where the comparison of terminologies leads to the algorithmic discovery of potentially missing concepts in a target terminology. While candidate concepts have to be approved for import by an expert, the human effort is greatly reduced by algorithmic generation of candidates. In previous studies, a single source terminology was used with one target terminology. Methods: In this paper, we are extending the algorithmic detection of “candidate concepts for import” from one source terminology to two source terminologies used in tandem. We show that the combination of two source terminologies relative to one target terminology leads to the discovery of candidate concepts for import that could not be found with the same “reliability” when comparing one source terminology alone to the target terminology. We investigate which triples of UMLS terminologies can be gainfully used for the described purpose and how many candidate concepts can be found for each individual triple of terminologies. Results: The analysis revealed a specific configuration of concepts, overlapping two source and one target terminology, for which we coined the name “fire ladder” pattern. The three terminologies in this pattern are tied together by a kind of “transitivity.” We provide a quantitative analysis of the discovered fire ladder patterns and we report on the inter-rater agreement concerning the decision of importing candidate concepts from source terminologies into the target terminology. We algorithmically identified 55 instances of the fire ladder pattern and two domain experts agreed on import for 39 instances. In total, 48 concepts were approved by at least one expert. 
In addition, 105 import candidate concepts from a single source terminology into the target terminology were also detected, as a “beneficial side-effect” of this method, increasing the cardinality of the result. Conclusion: We showed that pairs of biomedical source terminologies can be transitively chained to suggest possible imports of concepts into a target terminology.
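
The inter-rater figures reported in this abstract (55 fire ladder instances, 39 approved by both experts, 48 approved by at least one) imply a few further counts. The short sketch below derives them, assuming each expert made a binary approve/reject decision on every one of the 55 instances; the variable names are illustrative. A chance-corrected statistic such as Cohen's kappa cannot be derived from these figures alone, because the abstract does not say how the single-expert approvals split between the two raters.

```python
# Counts reported in the abstract.
total_instances = 55
approved_by_both = 39
approved_by_at_least_one = 48

# Derived counts, assuming binary decisions by both experts on every instance.
approved_by_exactly_one = approved_by_at_least_one - approved_by_both
approved_by_neither = total_instances - approved_by_at_least_one

# Raw (not chance-corrected) agreement: both approve or both reject.
observed_agreement = (approved_by_both + approved_by_neither) / total_instances

print(approved_by_exactly_one, approved_by_neither, round(observed_agreement, 3))
```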

    Worrying about the future: towards evidence-based prognosis in anxiety disorders.


    Vaccine semantics : Automatic methods for recognizing, representing, and reasoning about vaccine-related information

    Post-marketing management and decision-making about vaccines build on the early detection of safety concerns and changes in public sentiment, accurate access to established evidence, and the ability to promptly quantify effects and verify hypotheses about vaccine benefits and risks. A variety of resources provide relevant information, but they use different representations, which makes rapid evidence generation and extraction challenging. This thesis presents automatic methods for interpreting heterogeneously represented vaccine information. Part I evaluates social media messages for monitoring vaccine adverse events and public sentiment, using automatic methods for information recognition. Parts II and III develop and evaluate automatic methods and res

    A proteomic investigation of the heat stress response of the South African abalone Haliotis midae

    Includes bibliographical references. The abalone Haliotis midae has been fished to near-extinction on the South African coastline, primarily to satisfy a growing international market. In order to meet demands, H. midae has been produced in South Africa by aquaculture for several decades, and the South African abalone aquaculture industry continues to expand. Internationally, abalone aquaculture has been actively affected by the outbreak of bacterial and viral diseases, which spread rapidly and lead to high abalone mortality. There is evidence that environmental stresses on abalone farms may lead to immunosuppression, and thereby increase the severity of disease outbreaks. The water temperature on abalone farms fluctuates seasonally, and increased abalone mortality has been associated with warmer water during the summer months. However, the molecular mechanisms affecting the abalone during exposure to stress remain unclear. With advances in proteomics technology, it is possible to identify and quantify the expression of several hundred proteins simultaneously. This study therefore aimed to gain insight into the H. midae stress response by using proteomic tools to identify proteins that are differentially regulated in haemocytes during exposure to acute heat stress. Identifying which biochemical pathways are involved in the abalone stress response will give some insight into the molecular mechanism by which H. midae responds to heat stress.

    Application of information extraction techniques to pharmacological domain : extracting drug-drug interactions

    A drug interaction occurs when the effects of one drug are modified by the presence of another. The consequences can be harmful if the interaction increases the drug's toxicity or reduces its effect, and in the worst cases can even cause the patient's death. Drug interactions are not only a serious problem for patient safety but also entail a significant increase in medical costs. Health care staff currently have several interaction databases at their disposal that help avoid possible interactions when prescribing a treatment; however, these databases are incomplete. For this reason, physicians and pharmacists are forced to review a large number of scientific articles and drug safety reports to keep up to date with everything published on the subject. Unfortunately, the sheer volume of information overwhelms these professionals. The development of automatic methods for collecting, maintaining, and interpreting all this information is crucial for achieving a real improvement in the early detection of drug interactions. Information extraction could therefore reduce the time medical staff spend reviewing the medical literature. However, the extraction of drug interactions from biomedical texts has not been addressed until now. Motivated by these considerations, in this thesis we carried out a detailed study of various information extraction techniques applied to the pharmacological domain. Based on this study, we proposed two different approaches to extracting drug interactions from texts. 
Our first approach is a hybrid one that combines shallow parsing with the application of lexical patterns defined by a pharmacist. The second approach uses supervised learning, specifically kernel methods. In addition, the following auxiliary tasks were developed: (1) text analysis using the UMLS MetaMap Transfer (MMTx) tool, which provides syntactic and semantic information, (2) a process for identifying and classifying the drug names that occur in the texts, and (3) a process for recognizing anaphoric expressions that refer to drugs. A prototype was developed to integrate and combine the different techniques proposed in this thesis. To evaluate the two proposals, we developed and annotated a corpus of drug interactions with the help of a pharmacist. The DrugDDI corpus is one of the main contributions of the thesis, since it is the first corpus in the biomedical domain annotated with this type of information and because we believe it can encourage research on information extraction in the pharmacological domain. The experiments show that the kernel-based approach achieves better results than those reported for the approach that uses syntactic information and lexical patterns. Moreover, the kernels achieve results comparable to those obtained in similar domains such as protein-protein interactions. 
This thesis was carried out within the framework of the MAVIRCM research consortium (improving the access to and visibility of multilingual online information for the Community of Madrid, www.mavir.net), part of the Community of Madrid's 2005-2008 Programme of R&D Activities in Technologies (S-0505/TIC-0267), and within the research project BRAVO: "Búsqueda de Respuestas Avanzada Multimodal y Multilingüe" (TIN2007-67407-C03-01).
A drug-drug interaction occurs when one drug influences the level or activity of another drug. The detection of drug interactions is an important research area in patient safety since these interactions can become very dangerous and increase health care costs. Although there are different databases supporting health care professionals in the detection of drug interactions, this kind of resource is rarely complete. Drug interactions are frequently reported in journals of clinical pharmacology, making medical literature the most effective source for the detection of drug interactions. However, the increasing volume of the literature overwhelms health care professionals trying to keep an up-to-date collection of all reported drug-drug interactions. The development of automatic methods for collecting, maintaining and interpreting this information is crucial for achieving a real improvement in their early detection. Information Extraction (IE) techniques can provide an interesting way of reducing the time spent by health care professionals on reviewing the literature. Nevertheless, no approach has yet been developed to extract drug-drug interactions from biomedical texts. In this thesis, we have conducted a detailed study on various IE techniques applied to the biomedical domain. Based on this study, we have proposed two different approaches for the extraction of drug-drug interactions from texts. 
The first is a hybrid approach, which combines shallow parsing and pattern matching to extract relations between drugs from biomedical texts. The second is based on supervised machine learning, in particular kernel methods. In addition, we have created and annotated the first corpus of drug-drug interactions, DrugDDI, which allows us to evaluate and compare both approaches. To the best of our knowledge, the DrugDDI corpus is the only available corpus annotated for drug-drug interactions, and this thesis is the first work to address the problem of extracting drug-drug interactions from biomedical texts. We believe the DrugDDI corpus is an important contribution because it could encourage other research groups to investigate this problem. We have also defined three auxiliary processes that provide crucial information used by both approaches: (1) a process for text analysis based on the UMLS MetaMap Transfer tool (MMTx) to provide shallow syntactic and semantic information from texts, (2) a process for drug name recognition and classification, and (3) a process for drug anaphora resolution. Finally, we have developed a pipeline prototype which integrates the different auxiliary processes. The pipeline architecture allows us to easily integrate these modules with each of the approaches proposed in this thesis: pattern matching or kernels. Several experiments were performed on the DrugDDI corpus. They show that while the first approach, based on pattern matching, achieves low performance, the approach based on kernel methods achieves a performance comparable to that obtained by approaches to similar tasks such as the extraction of protein-protein interactions. 
This work has been partially supported by the Spanish research projects: MAVIR consortium (S-0505/TIC-0267, www.mavir.net), a network of excellence funded by the Madrid Regional Government, and TIN2007-67407-C03-01 (BRAVO: Advanced Multimodal and Multilingual Question Answering).

    Expression levels of human lysosomal enzymes in Oryza sativa under the control of different regulatory elements

    Over the last 20 years, a series of therapeutic enzyme proteins for a group of lysosomal disorders (Gaucher disease, mucopolysaccharidosis I, II, and VI, Anderson-Fabry disease, and Pompe disease) have been produced, tested, and subsequently marketed. The technology used in all these cases relies on recombinant DNA in human (fibroblastoid lines) or murine (Chinese Hamster Ovary) eukaryotic cells, which require high-volume mechanical bioreactors. This system suffers from limited production, high investment costs, and above all the possible contamination with infectious agents and their transmission to humans. In recent years, transgenic plants have been shown to be competitive systems for producing pharmaceutical proteins. Expressing a recombinant protein with therapeutic potential in plants could lead to a drug that is more accessible to patients affected by these diseases. This is particularly important for the treatment of rare conditions such as Gaucher and Anderson-Fabry diseases. To overcome these problems and guarantee all patients access to medicines, transgenic plants have been proposed as an alternative, safe, and lower-cost production system. The main aim of the present work, carried out at the company Transactiva Srl in collaboration with the Università degli studi di Udine, was to improve the expression of recombinant human enzymes such as β-glucocerebrosidase and α-galactosidase, exploring the bioproduction potential of plants. Rice (Oryza sativa L. ssp. japonica, var. CR W3) was chosen as the host species and the caryopsis as the storage organ for the lysosomal proteins of interest. 
To achieve the aim of the thesis, the work was divided into four phases: i) creation of synthetic versions of the human genes encoding the enzymes of interest, applying Codon Context criteria to obtain high expression in the endosperm of the rice caryopsis; ii) construction of the final expression vectors for rice transformation via Agrobacterium tumefaciens; iii) study of the effect of the different regulatory elements used (promoter, 5' UTR, and 3' UTR) on expression levels, by analysing the transformants with a DAS-ELISA immunoenzymatic assay; iv) verification of the actual production of these enzymes in the caryopsis, and of their molecular weight, by Western blot.

    Information retrieval and text mining technologies for chemistry

    Efficient access to chemical information contained in scientific literature, patents, technical reports, or the web is a pressing need shared by researchers and patent attorneys from different chemical disciplines. Retrieval of important chemical information in most cases starts with finding relevant documents for a particular chemical compound or family. Targeted retrieval of chemical documents is closely connected to the automatic recognition of chemical entities in the text, which commonly involves the extraction of the entire list of chemicals mentioned in a document, including any associated information. In this Review, we provide a comprehensive and in-depth description of fundamental concepts, technical implementations, and current technologies for meeting these information demands. A strong focus is placed on community challenges addressing systems performance, more particularly CHEMDNER and CHEMDNER patents tasks of BioCreative IV and V, respectively. Considering the growing interest in the construction of automatically annotated chemical knowledge bases that integrate chemical information and biological data, cheminformatics approaches for mapping the extracted chemical names into chemical structures and their subsequent annotation together with text mining applications for linking chemistry with biological information are also presented. Finally, future trends and current challenges are highlighted as a roadmap proposal for research in this emerging field. A.V. and M.K. acknowledge funding from the European Community’s Horizon 2020 Program (project reference: 654021 - OpenMinted). M.K. additionally acknowledges the Encomienda MINETAD-CNIO as part of the Plan for the Advancement of Language Technology. O.R. and J.O. thank the Foundation for Applied Medical Research (FIMA), University of Navarra (Pamplona, Spain). 
This work was partially funded by Consellería de Cultura, Educación e Ordenación Universitaria (Xunta de Galicia), and FEDER (European Union), and the Portuguese Foundation for Science and Technology (FCT) under the scope of the strategic funding of UID/BIO/04469/2013 unit and COMPETE 2020 (POCI-01-0145-FEDER-006684). We thank Iñigo García-Yoldi for useful feedback and discussions during the preparation of the manuscript.