47 research outputs found

    Detectors could spot plagiarism in research proposals

    Having all been involved in proposal evaluation, we believe the studies indicate that a text-matching analysis of research proposals could reduce plagiarism in subsequent publications. For instance, when European Commission evaluators have met in the past to evaluate research proposals, they received printed copies, which had to be returned before the panel members left, and had no computer access during deliberations. A plagiarism detector using text-mining methods could be used instead of the current security measures. Such a system could, in principle, detect similarities to previous submissions and uncited sources using advanced document segmentation. Only official agencies have access to confidential proposals and the funds to experiment with automated plagiarism detectors. They should therefore investigate these approaches to reduce the possibility of scientific misconduct.
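
    As a rough illustration of what a text-matching analysis can look like, the sketch below compares two texts by the overlap of their word n-grams (Jaccard similarity over "shingles"). This is an assumed, generic technique chosen for illustration; the abstract does not say which method a detector for proposals would use, and the function names are hypothetical.

```python
import re


def shingles(text, n=3):
    """Return the set of word n-grams (shingles) in a text, case-insensitive."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def jaccard(a, b, n=3):
    """Jaccard similarity of the shingle sets of two texts (0.0 to 1.0)."""
    sa, sb = shingles(a, n), shingles(b, n)
    if not sa or not sb:
        # Texts shorter than n words have no shingles to compare.
        return 0.0
    return len(sa & sb) / len(sa | sb)


if __name__ == "__main__":
    proposal = "the quick brown fox jumps"
    earlier = "the quick brown dog jumps"
    print(jaccard(proposal, earlier))
```

    A high score between a new proposal and an earlier submission would only flag the pair for human review; it is not evidence of misconduct on its own.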

    A Knowledge Engineering Approach to Recognizing and Extracting Sequences of Nucleic Acids from Scientific Literature

    In this paper we present a knowledge engineering approach to automatically recognize and extract genetic sequences from scientific articles. To carry out this task, we use a preliminary recognizer based on a finite state machine to extract all candidate DNA/RNA sequences. These candidates are then fed into a knowledge-based system that automatically discards false positives and refines noisy and incorrectly merged sequences. We created the knowledge base by manually analyzing different manuscripts containing genetic sequences. Our approach was evaluated using a test set of 211 full-text articles in PDF format containing 3134 genetic sequences. On this set, we achieved 87.76% precision and 97.70% recall. This method can facilitate different research tasks, including text mining, information extraction, and information retrieval research dealing with large collections of documents containing genetic sequences.
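
    The two-stage idea described above (a permissive recognizer followed by a rule-based filter) can be sketched as follows. This is a minimal illustration, not the authors' recognizer: the two-state scanner, the `min_len` threshold, and the filtering rule are all assumptions made for the example.

```python
NUCLEOTIDES = set("ACGTUacgtu")


def candidate_sequences(text, min_len=8):
    """Two-state scanner: either inside a run of nucleotide letters or outside.

    Every run of at least min_len letters is emitted as a candidate.
    """
    candidates = []
    current = []
    for ch in text + " ":  # trailing sentinel flushes the final run
        if ch in NUCLEOTIDES:
            current.append(ch.upper())  # state: inside a run
        else:
            if len(current) >= min_len:
                candidates.append("".join(current))
            current = []  # state: outside a run
    return candidates


def looks_plausible(seq):
    """Hypothetical filter rule standing in for a knowledge base.

    Rejects runs with fewer than three distinct bases and runs that mix
    T (DNA) with U (RNA).
    """
    return len(set(seq)) >= 3 and not ("T" in seq and "U" in seq)


if __name__ == "__main__":
    sample = "The primer 5'-ATGCGTACGTTAGC-3' was used; see the attached data."
    for seq in candidate_sequences(sample):
        print(seq, looks_plausible(seq))
```

    A real system would also need to handle sequences broken across lines or interrupted by spaces and digits, which is the kind of refinement the abstract assigns to the knowledge-based stage.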

    A Method for Indexing Biomedical Resources over the Internet

    A large number of biomedical resources are publicly available over the Internet, and this number grows every day. Biomedical researchers face the problem of locating, identifying and selecting the most appropriate resources according to their interests. Some resource indexes can be found on the Internet, but they only provide information and links related to resources created by the institution that owns each website. In this paper we propose a novel method for extracting information from the literature and creating a Resourceome, i.e. an index of biomedical resources (databases, tools and services), in a semi-automatic way. In this approach we consider only the information provided by the abstracts of relevant papers in the area. Building a comprehensive resource index is the first step towards the development of new methodologies for the automatic or semi-automatic construction of complex biomedical workflows that combine several resources to obtain higher-level functionalities.

    Using Machine Learning to Collect and Facilitate Remote Access to Biomedical Databases: Development of the Biomedical Database Inventory

    Background: Currently, existing biomedical literature repositories do not commonly provide users with specific means to locate and remotely access biomedical databases. Objective: To address this issue, we developed the Biomedical Database Inventory (BiDI), a repository linking to biomedical databases automatically extracted from the scientific literature. BiDI provides an index of data resources and a path to access them seamlessly. Methods: We designed an ensemble of deep learning methods to extract database mentions. To train the system, we annotated a set of 1242 articles that included mentions of database publications. This data set was used along with transfer learning techniques to train an ensemble of deep learning natural language processing models targeted at database publication detection. Results: The system obtained an F1 score of 0.929 on database detection, showing high precision and recall values. When applying this model to the PubMed and PubMed Central databases, we identified over 10,000 unique databases. The ensemble model also extracted the weblinks to the reported databases and discarded irrelevant links. For the extraction of weblinks, the model achieved a cross-validated F1 score of 0.908. We show two use cases: one related to “omics” and the other related to the COVID-19 pandemic. Conclusions: BiDI enables access to biomedical resources over the internet and facilitates data-driven research and other scientific initiatives. The repository is openly available online and will be regularly updated with an automatic text processing pipeline. The approach can be reused to create repositories of different types (i.e., biomedical and others). Funding: Proyecto colaborativo de integración de datos genómicos (collaborative genomic data integration project); PI17/0156.

    An Automatic Method for Retrieving and Indexing Catalogues of Biomedical Courses

    Although a great deal of information about Biomedical Informatics education and courses is available on different websites, it is usually not exhaustive and is difficult to keep up to date. We propose a new methodology based on information retrieval techniques for automatically extracting, indexing and retrieving information about educational offerings. A web application has been developed to make this information available as an inventory of courses and educational offerings.

    CDAPubMed: a browser extension to retrieve EHR-based biomedical literature

    Over the last few decades, the ever-increasing output of scientific publications has made it harder to keep up to date with the literature. In the biomedical area, this growth has introduced new requirements for professionals, e.g., physicians, who have to locate the exact papers that they need for their clinical and research work amongst a huge number of publications. Against this backdrop, novel information retrieval methods are even more necessary. While web search engines are widespread in many areas, facilitating access to all kinds of information, additional tools are required to automatically link information retrieved from these engines to specific biomedical applications. In the case of clinical environments, this also means considering aspects such as patient data security and confidentiality or structured contents, e.g., electronic health records (EHRs). In this scenario, we have developed a new tool to facilitate query building to retrieve scientific literature related to EHRs. Results: We have developed CDAPubMed, an open-source web browser extension to integrate EHR features in biomedical literature retrieval approaches. Clinical users can use CDAPubMed to: (i) load patient clinical documents, i.e., EHRs based on the Health Level 7-Clinical Document Architecture Standard (HL7-CDA), (ii) identify relevant terms for scientific literature search in these documents, i.e., Medical Subject Headings (MeSH), automatically driven by the CDAPubMed configuration, which advanced users can optimize to adapt to each specific situation, and (iii) generate and launch literature search queries to a major search engine, i.e., PubMed, to retrieve citations related to the EHR under examination. Conclusions: CDAPubMed is a platform-independent tool designed to facilitate literature searching using keywords contained in specific EHRs. CDAPubMed is visually integrated, as an extension of a widespread web browser, within the standard PubMed interface. It has been tested on a public dataset of HL7-CDA documents, returning significantly fewer citations since queries are focused on characteristics identified within the EHR. For instance, compared with the more than 200,000 citations retrieved by a search for breast neoplasm, fewer than ten citations were retrieved when ten patient features were added using CDAPubMed. This is an open-source tool that can be freely used for non-profit purposes and integrated with other existing systems.
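
    Step (iii) above, turning a set of MeSH terms into a PubMed search, can be sketched as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, the terms are combined with AND using PubMed's `[MeSH Terms]` field tag, and the abstract does not describe the query format that CDAPubMed itself generates.

```python
from urllib.parse import urlencode

PUBMED_SEARCH = "https://pubmed.ncbi.nlm.nih.gov/"


def build_pubmed_query(mesh_terms):
    """Join MeSH terms into a single conjunctive PubMed query string."""
    return " AND ".join(f'"{term}"[MeSH Terms]' for term in mesh_terms)


def pubmed_search_url(mesh_terms):
    """Return a PubMed search URL for the conjunctive query."""
    return PUBMED_SEARCH + "?" + urlencode({"term": build_pubmed_query(mesh_terms)})


if __name__ == "__main__":
    # Illustrative terms only; in the tool they would come from the patient's EHR.
    terms = ["Breast Neoplasms", "Female", "Middle Aged"]
    print(build_pubmed_query(terms))
    print(pubmed_search_url(terms))
```

    Each added term narrows the conjunctive query, which is consistent with the drop in retrieved citations reported in the abstract when patient features are added.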

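    Nanoinformatics: challenges and initiatives for managing the information generated in nanomedical research [Nanoinformática: retos e iniciativas para la gestión de la información generada en la investigación nanomédica]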

    Over the last decade, nanomedicine research has generated a large amount of heterogeneous data distributed across multiple information sources. Information and communication technologies (ICT) can facilitate medical research at the nanometric scale by providing mechanisms and tools for managing all these data intelligently. Whereas biomedical informatics covers the processing and management of information generated from the public health and clinical application level down to the molecular level, nanoinformatics extends this scope to include the “nano level”, managing and analyzing the results generated during nanomedicine research and developing new lines of work in this interdisciplinary space. In this new scientific area, nanoinformatics (which could become established as a genuine discipline in the coming years), the Biomedical Informatics Group (Grupo de Informática Biomédica, GIB) of the Universidad Politécnica de Madrid (UPM) participates in numerous initiatives, which are detailed below.

    Cloud Computing Service for Managing Large Medical Image Data-Sets Using Balanced Collaborative Agents

    Managing large medical image collections is an increasingly demanding issue in many hospitals and other medical settings. A huge amount of this information is generated daily, which requires robust and agile systems. In this paper we present a distributed multi-agent system capable of managing very large medical image datasets. In this approach, agents extract low-level information from images and store it in a data structure implemented in a relational database. The data structure can also store semantic information related to images and particular regions. A distinctive aspect of our work is that a single image can be divided so that the resulting sub-images can be stored and managed separately by different agents, improving performance in data access and processing. The system also makes it possible to apply some region-based operations and filters to images, facilitating image classification. These operations can be performed directly on the data structures in the database.
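
    The idea of dividing one image into sub-images handled by different agents can be sketched as follows. This is an illustrative assumption, not the system described in the paper: the image is a plain nested list of pixel values, the tile size is arbitrary, and the round-robin assignment is a stand-in for whatever balancing policy the agents actually use.

```python
def split_into_tiles(image, tile_h, tile_w):
    """Split a 2-D image (list of rows) into tiles.

    Returns a list of ((row, col), tile) pairs, where (row, col) is the
    top-left corner of the tile in the original image.
    """
    tiles = []
    for r in range(0, len(image), tile_h):
        for c in range(0, len(image[0]), tile_w):
            tile = [row[c:c + tile_w] for row in image[r:r + tile_h]]
            tiles.append(((r, c), tile))
    return tiles


def assign_round_robin(tiles, agents):
    """Assign tile origins to agents in turn (hypothetical balancing policy)."""
    assignment = {agent: [] for agent in agents}
    for i, (origin, _tile) in enumerate(tiles):
        assignment[agents[i % len(agents)]].append(origin)
    return assignment


if __name__ == "__main__":
    image = [[r * 4 + c for c in range(4)] for r in range(4)]  # toy 4x4 image
    tiles = split_into_tiles(image, 2, 2)
    print(assign_round_robin(tiles, ["agent_a", "agent_b"]))
```

    Keeping each tile's origin alongside its pixels is what allows the sub-images to be processed independently and the full image to be reassembled later.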

    Nanoinformatics: a new area of research in nanomedicine

    Over a decade ago, nanotechnologists began research on applications of nanomaterials for medicine. This research has revealed a wide range of different challenges, as well as many opportunities. Some of these challenges are strongly related to informatics issues, dealing, for instance, with the management and integration of heterogeneous information, defining nomenclatures, taxonomies and classifications for various types of nanomaterials, and research on new modeling and simulation techniques for nanoparticles. Nanoinformatics has recently emerged in the USA and Europe to address these issues. In this paper, we present a review of nanoinformatics, describing its origins, the problems it addresses, areas of interest, and examples of current research initiatives and informatics resources. We suggest that nanoinformatics could accelerate research and development in nanomedicine, as has occurred in the past in other fields. For instance, biomedical informatics served as a fundamental catalyst for the Human Genome Project, and other genomic and -omics projects, as well as the translational efforts that link resulting molecular-level research to clinical problems and findings.

    Cloud Computing in Health: A System for Managing Biomedical Images [Cloud Computing en salud: Sistema para Administrar Imagenes Biomedicas]

    In the field of biomedicine, an immense number of images is generated every day. Managing them requires robust and agile computer systems, which need large amounts of computational resources. This article presents a cloud computing service capable of handling large collections of biomedical images. With this service, organizations and users could manage their biomedical images without needing to own extensive computing resources. The service uses a distributed multi-agent system in which images are processed and the regions they contain, together with their features, are extracted and stored in a data structure. A novel feature of the system is that a single image can be divided, and the resulting sub-images can be stored separately by different agents. This feature helps improve system performance when searching for and retrieving stored images.