7,837 research outputs found

    Development of an information retrieval tool for biomedical patents

    Supplementary material associated with this article can be found, in the online version, at doi: 10.1016/j.cmpb.2018.03.012. Background and objective. The volume of biomedical literature has been increasing in recent years. Patent documents have followed this trend and are important sources of biomedical knowledge, technical details and curated data, which are put together during the granting process. The field of Biomedical text mining (BioTM) has been creating solutions for the problems posed by the unstructured nature of natural language, which makes searching for information a challenging task. Several BioTM techniques can be applied to patents. Among them, Information Retrieval (IR) includes processes where relevant data are obtained from collections of documents. In this work, the main goal was to build a patent pipeline addressing IR tasks over patent repositories to make these documents amenable to BioTM tasks. Methods. The pipeline was developed within @Note2, an open-source computational framework for BioTM, by adding a number of modules to the core libraries, including patent metadata and full text retrieval, PDF to text conversion and optical character recognition. User interfaces were also developed for the main operations, materialized in a new @Note2 plug-in. Results. The integration of these tools in @Note2 opens opportunities to run BioTM tools over patent texts, including tasks from Information Extraction, such as Named Entity Recognition or Relation Extraction. We demonstrated the pipeline's main functions with a case study, using an available benchmark dataset from BioCreative challenges. We also show the use of the plug-in with a user query related to the production of vanillin. Conclusions. This work makes available all the relevant content from patents to the scientific community, drastically decreasing the time required for this task, and provides graphical interfaces to ease the use of these tools.
This work is co-funded by the Programa Operacional Regional do Norte, under the “Portugal2020”, through the European Regional Development Fund (ERDF), within project SISBI, Ref. NORTE-01-0247-FEDER-003381. This study was also supported by the Portuguese Foundation for Science and Technology (FCT) under the scope of the strategic funding of UID/BIO/04469/2013 unit and COMPETE 2020 (POCI-01-0145-FEDER-006684) and BioTecNorte operation (NORTE-01-0145-FEDER-000004) funded by European Regional Development Fund under the scope of Norte2020 - Programa Operacional Regional do Norte.

    Scientometric mapping as a strategic intelligence tool for the governance of emerging technologies

    How can scientometric mapping function as a tool of 'strategic intelligence' to aid the governance of emerging technologies? The present paper aims to address this question by focusing on a set of recently developed scientometric techniques, namely overlay mapping. We examine the potential these techniques have to inform, in a timely manner, analysts and decision-makers about relevant dynamics of technical emergence. We investigate the capability of overlay mapping in generating informed perspectives about emergence across three spaces: geographical, social, and cognitive. Our analysis relies on three empirical studies of emerging technologies in the biomedical domain: RNA interference (RNAi), Human Papilloma Virus (HPV) testing technologies for cervical cancer, and Thiopurine Methyltransferase (TPMT) genetic testing. The case studies are analysed and mapped longitudinally using publication and patent data. Results show the variety of 'intelligence' inputs overlay mapping can produce for the governance of emerging technologies. Overlay mapping also gives the investigation of emergence flexibility and granularity, in terms of adaptability to different sources of data and selection of the levels of analysis, respectively. These features make possible the integration and comparison of results from different contexts and cases, thus providing possibilities for a potentially more 'distributed' strategic intelligence. The generated perspectives allow triangulation of findings, which is important given the complexity of technical emergence and the limitations associated with the use of single scientometric approaches.

    Information retrieval and text mining technologies for chemistry

    Efficient access to chemical information contained in scientific literature, patents, technical reports, or the web is a pressing need shared by researchers and patent attorneys from different chemical disciplines. Retrieval of important chemical information in most cases starts with finding relevant documents for a particular chemical compound or family. Targeted retrieval of chemical documents is closely connected to the automatic recognition of chemical entities in the text, which commonly involves the extraction of the entire list of chemicals mentioned in a document, including any associated information. In this Review, we provide a comprehensive and in-depth description of fundamental concepts, technical implementations, and current technologies for meeting these information demands. A strong focus is placed on community challenges addressing systems performance, more particularly CHEMDNER and CHEMDNER patents tasks of BioCreative IV and V, respectively. Considering the growing interest in the construction of automatically annotated chemical knowledge bases that integrate chemical information and biological data, cheminformatics approaches for mapping the extracted chemical names into chemical structures and their subsequent annotation together with text mining applications for linking chemistry with biological information are also presented. Finally, future trends and current challenges are highlighted as a roadmap proposal for research in this emerging field.
A.V. and M.K. acknowledge funding from the European Community’s Horizon 2020 Program (project reference: 654021 - OpenMinted). M.K. additionally acknowledges the Encomienda MINETAD-CNIO as part of the Plan for the Advancement of Language Technology. O.R. and J.O. thank the Foundation for Applied Medical Research (FIMA), University of Navarra (Pamplona, Spain). This work was partially funded by Consellería de Cultura, Educación e Ordenación Universitaria (Xunta de Galicia), and FEDER (European Union), and the Portuguese Foundation for Science and Technology (FCT) under the scope of the strategic funding of UID/BIO/04469/2013 unit and COMPETE 2020 (POCI-01-0145-FEDER-006684). We thank Iñigo García-Yoldi for useful feedback and discussions during the preparation of the manuscript.

    Development of text mining tools for information retrieval from patents

    Biomedical literature is composed of an ever-increasing number of publications in natural language. Patents are a relevant fraction of those, being important sources of information due to all the curated data from the granting process. However, their unstructured nature makes searching for information a challenging task. To overcome this, Biomedical text mining (BioTM) creates methodologies to search and structure that data. Several BioTM techniques can be applied to patents. Among them, Information Retrieval is the process where relevant data is obtained from collections of documents. In this work, a patent pipeline was developed and integrated into …
Funder: FEDER - Federación Española de Enfermedades Raras (NORTE-01-0145-FEDER-000004)

    Overview of BioASQ 2021-MESINESP track. Evaluation of advance hierarchical classification techniques for scientific literature, patents and clinical trials

    CLEF 2021 – Conference and Labs of the Evaluation Forum, September 21–24, 2021, Bucharest, Romania. There is a pressing need to exploit recent advances in natural language processing technologies, in particular language models and deep learning approaches, to enable improved retrieval, classification and ultimately access to information contained in multiple, heterogeneous types of documents. This is particularly true for the field of biomedicine and clinical research, where medical experts and scientists need to carry out complex search queries against a variety of document collections, including literature, patents, clinical trials or other kinds of content such as EHRs. Indexing documents with structured controlled vocabularies used for semantic search engines and query expansion purposes is a critical task for enabling sophisticated user queries and even cross-language retrieval. Due to the complexity of the medical domain and the use of very large hierarchical indexing terminologies, implementing efficient automatic systems to aid manual indexing is extremely difficult. This paper provides a summary of the MESINESP task results on medical semantic indexing in Spanish (BioASQ/CLEF 2021 Challenge). MESINESP was carried out in direct collaboration with literature content databases and medical indexing experts using the DeCS vocabulary, a resource similar to MeSH terms. Seven participating teams used advanced technologies including extreme multilabel classification and deep language models to solve this challenge, which can be viewed as a multi-label classification problem. As MESINESP resources, we have released a Gold Standard collection of 243,000 documents with a total of 2179 manual annotations divided into train, development and test subsets covering literature, patents as well as clinical trial summaries, under a cross-genre training and data labeling scenario. Manual indexing of the evaluation subsets was carried out by three independent experts using a specially developed indexing interface called ASIT. Additionally, we have published a collection of large-scale automatic semantic annotations based on NER systems of these documents with mentions of drugs/medications (170,000), symptoms (137,000), diseases (840,000) and clinical procedures (415,000). In addition to a summary of the technologies used by the teams, this paper …

    Using Neural Networks for Relation Extraction from Biomedical Literature

    Using different sources of information to support the automated extraction of relations between biomedical concepts contributes to the development of our understanding of biological systems. The primary comprehensive source of these relations is biomedical literature. Several relation extraction approaches have been proposed to identify relations between concepts in biomedical literature, notably those using neural network algorithms. The use of multichannel architectures composed of multiple data representations, as in deep neural networks, is leading to state-of-the-art results. The right combination of data representations can eventually lead to even higher evaluation scores in relation extraction tasks. Thus, biomedical ontologies play a fundamental role by providing semantic and ancestry information about an entity. The incorporation of biomedical ontologies has already been shown to enhance previous state-of-the-art results.
Comment: Artificial Neural Networks book (Springer) - Chapter 1

    Mining the Medical and Patent Literature to Support Healthcare and Pharmacovigilance

    Recent advancements in healthcare practices and the increasing use of information technology in the medical domain have led to the rapid generation of free-text data in the form of scientific articles, e-health records, patents, and document inventories. This has urged the development of sophisticated information retrieval and information extraction technologies. A fundamental requirement for the automatic processing of biomedical text is the identification of information-carrying units such as concepts or named entities. In this context, this work focuses on the identification of medical disorders (such as diseases and adverse effects), which denote an important category of concepts in medical text. Two methodologies were investigated in this regard: dictionary-based and machine learning-based approaches. Furthermore, the capabilities of the concept recognition techniques were systematically exploited to build a semantic search platform for the retrieval of e-health records and patents. The system facilitates conventional text search as well as semantic and ontological searches. Performance of the adapted retrieval platform for e-health records and patents was evaluated within open assessment challenges (i.e. TRECMED and TRECCHEM respectively), wherein the system was best rated in comparison to several other competing information retrieval platforms. Finally, from the medico-pharma perspective, a strategy for the identification of adverse drug events from medical case reports was developed. Qualitative evaluation as well as an expert validation of the developed system's performance showed robust results. In conclusion, this thesis presents approaches for efficient information retrieval and information extraction from various biomedical literature sources in support of healthcare and pharmacovigilance. The applied strategies have the potential to enhance the literature searches performed by biomedical, healthcare, and patent professionals. This can promote literature-based knowledge discovery, improve the safety and effectiveness of medical practices, and drive research and development in the medical and healthcare arena.

    Facilitating Design-by-Analogy: Development of a Complete Functional Vocabulary and Functional Vector Approach to Analogical Search

    Design-by-analogy is an effective approach to innovative concept generation, but can be elusive at times due to the fact that few methods and tools exist to assist designers in systematically seeking and identifying analogies from general data sources, databases, or repositories, such as patent databases. A new method for extracting analogies from data sources has been developed to provide this capability. Building on past research, we utilize a functional vector space model to quantify analogous similarity between a design problem and the data source of potential analogies. We quantitatively evaluate the functional similarity between represented design problems and, in this case, patent descriptions of products. We develop a complete functional vocabulary to map the patent database to applicable functionally critical terms, using document parsing algorithms to reduce text descriptions of the data sources down to the key functions, and applying Zipf’s law on word count order reduction to reduce the words within the documents. The reduction of a document (in this case a patent) into functional analogous words enables the matching to novel ideas that are functionally similar, which can be customized in various ways. This approach thereby provides relevant sources of design-by-analogy inspiration. Although our implementation of the technique focuses on functional descriptions of patents and the mapping of these functions to those of the design problem, resulting in a set of analogies, we believe that this technique is applicable to other analogy data sources as well. As a verification of the approach, an original design problem for an automated window washer illustrates the distance range of analogical solutions that can be extracted, extending from very near-field, literal solutions to far-field cross-domain analogies. 
Finally, a comparison with a current patent search tool is performed to draw a contrast to the status quo and evaluate the effectiveness of this work.
Funding: National Science Foundation (U.S.) (grant number CMMI-0855510); National Science Foundation (U.S.) (grant number CMMI-0855326); National Science Foundation (U.S.) (grant number CMMI-0855293); SUTD-MIT International Design Centre (IDC)

    PatentMatrix: an automated tool to survey patents related to large sets of genes or proteins

    Background. The number of patents associated with genes and proteins and the amount of information contained in each patent often present a real obstacle to the rapid evaluation of the novelty of findings associated with genes from an intellectual property (IP) perspective. This assessment, normally carried out by expert patent professionals, can therefore become cumbersome and time consuming. Here we present PatentMatrix, a novel software tool for the automated analysis of patent sequence text entries. Methods and Results. PatentMatrix is written in the Awk language and requires installation of the Derwent GENESEQ™ patent sequence database under the sequence retrieval system SRS. The software takes two files as input: i) a list of genes or proteins with the associated GENESEQ™ patent sequence accession numbers; ii) a list of keywords describing the research context of interest (e.g. 'lung', 'cancer', 'therapeutics', 'diagnostics'). The GENESEQ™ database is interrogated through the SRS system and each patent entry of interest is screened for the occurrence of user-defined keywords. Moreover, the software extracts from the GENESEQ™ database the basic information useful for a preliminary assessment of the IP coverage of each patent. As output, two tab-delimited files are generated, which provide the user with a detailed and an aggregated view of the results. An example is given where the IP position of five genes is evaluated in the context of 'development of antibodies for cancer treatment'. Conclusion. PatentMatrix allows a rapid survey of patents associated with genes or proteins in a particular area of interest as defined by keywords. It can be efficiently used to evaluate the IP-related novelty of scientific findings and to rank genes or proteins according to their IP position.