75 research outputs found

    Knowledge-Driven Methods for Geographic Information Extraction in the Biomedical Domain

    abstract: Accounting for over a third of all emerging and re-emerging infections, viruses represent a major public health threat, which researchers and epidemiologists across the world have been attempting to contain for decades. Recently, genomics-based surveillance of viruses through methods such as virus phylogeography has grown into a popular tool for infectious disease monitoring. When conducting such surveillance studies, researchers need to manually retrieve geographic metadata denoting the location of infected host (LOIH) of viruses from public sequence databases such as GenBank and from any publication related to their study. The large volume of semi-structured and unstructured information that must be reviewed for this task, along with the ambiguity of geographic locations, makes it especially challenging. Prior work has demonstrated that the majority of GenBank records lack sufficient geographic granularity concerning the LOIH of viruses. As a result, reviewing full-text publications is often necessary for conducting in-depth analysis of virus migration, which can be a very time-consuming process. Moreover, integrating geographic metadata pertaining to the LOIH of viruses from different sources, including different fields in GenBank records as well as full-text publications, and normalizing the integrated metadata to unique identifiers for subsequent analysis are also challenging tasks that often require expert domain knowledge. Automated information extraction (IE) methods could therefore significantly accelerate this process, positively impacting public health research. However, very few research studies have attempted the use of IE methods in this domain. This work explores the use of novel knowledge-driven geographic IE heuristics for extracting, integrating, and normalizing the LOIH of viruses based on information available in GenBank and related publications; when evaluated on manually annotated test sets, the methods were found to be highly accurate and adequate for addressing this challenging problem. It also presents GeoBoost, a pioneering software system for georeferencing GenBank records, as well as a large-scale database containing over two million virus GenBank records georeferenced using the algorithms introduced here. The methods, database and software developed here could help support diverse public health domains focusing on sequence-informed virus surveillance, thereby enhancing existing platforms for controlling and containing disease outbreaks. Dissertation/Thesis. Doctoral Dissertation, Biomedical Informatics, 201

    Biomedical Information Extraction Pipelines for Public Health in the Age of Deep Learning

    abstract: Unstructured texts containing biomedical information from sources such as electronic health records, scientific literature, discussion forums, and social media offer an opportunity to extract information for a wide range of applications in biomedical informatics. Building scalable and efficient pipelines for natural language processing and extraction of biomedical information plays an important role in the implementation and adoption of applications in areas such as public health. Advancements in machine learning and deep learning techniques have enabled rapid development of such pipelines. This dissertation presents entity extraction pipelines for two public health applications: virus phylogeography and pharmacovigilance. For virus phylogeography, geographical locations are extracted from biomedical scientific texts for metadata enrichment in the GenBank database containing 2.9 million virus nucleotide sequences. For pharmacovigilance, tools are developed to extract adverse drug reactions from social media posts to open avenues for post-market drug surveillance from non-traditional sources. Across these pipelines, high variance is observed in extraction performance among the entities of interest, even when using state-of-the-art neural network architectures. To explain the variation, linguistic measures are proposed to serve as indicators for entity extraction performance and to provide deeper insight into the domain complexity and the challenges associated with entity extraction. For both the phylogeography and pharmacovigilance pipelines presented in this work, the annotated datasets and applications are open source and freely available to the public to foster further research in public health. Dissertation/Thesis. Doctoral Dissertation, Biomedical Informatics, 201

    A metadata model for the annotation of epidemiological data

    Master's project work in Information Technologies Applied to Biological and Medical Sciences, presented to the Universidade de Lisboa through the Faculdade de Ciências, 2010. This thesis presents a metadata model for the integration, management and sharing of epidemiological data. The model incorporates elements from Dublin Core, a metadata annotation standard widely used on the internet, along with further elements that extend and structure the Dublin Core terms. It also includes new elements for describing epidemiological and related concepts more specifically. The model was developed for the Epidemic Marketplace, a data management and integration platform (digital library) for epidemic modeling systems under development within the EPIWORK research project. The deployed digital repository of the Epidemic Marketplace was implemented based on this model, using the Fedora Commons platform with Muradora as front-end. The annotation of resources is assisted by lists based on controlled vocabularies, help menus and automatic filling of metadata. The use of controlled vocabularies, often created from ontological term lists, keeps metadata consistent, improves its semantics, and facilitates its automatic interpretation.

    Planning for the Lifecycle Management and Long-Term Preservation of Research Data: A Federated Approach

    Outcomes of the grant are archived here. The “data deluge” is a recent but increasingly well-understood phenomenon of scientific and social inquiry. Large-scale research instruments extend our observational power by many orders of magnitude but at the same time generate massive amounts of data. Researchers work feverishly to document and preserve changing or disappearing habitats, cultures, languages, and artifacts, resulting in volumes of media in various formats. New software tools mine a growing universe of historical and modern texts and connect the dots in our semantic environment. Libraries, archives, and museums undertake digitization programs, creating broad access to unique cultural heritage resources for research. Global-scale research collaborations with hundreds or thousands of participants drive the creation of massive amounts of data, most of which cannot be recreated if lost. The University of Kansas (KU) Libraries, in collaboration with two partners, the Greater Western Library Alliance (GWLA) and the Great Plains Network (GPN), received an IMLS National Leadership Grant designed to leverage collective strengths and create a proposal for a scalable and federated approach to the lifecycle management of research data based on the needs of GPN and GWLA member institutions. Institute for Museum and Library Services LG-51-12-0695-1

    Proceedings of the 12th International Conference on Digital Preservation

    The 12th International Conference on Digital Preservation (iPRES) was held on November 2-6, 2015 in Chapel Hill, North Carolina, USA. There were 327 delegates from 22 countries. The program included 12 long papers, 15 short papers, 33 posters, 3 demos, 6 workshops, 3 tutorials and 5 panels, as well as several interactive sessions and a Digital Preservation Showcase.

    Proceedings of the 10th International Conference on Ecological Informatics: translating ecological data into knowledge and decisions in a rapidly changing world: ICEI 2018

    The Conference Proceedings are an impressive display of the current scope of Ecological Informatics. Whilst Data Management, Analysis, Synthesis and Forecasting have been lasting popular themes over the past nine biannual ICEI conferences, ICEI 2018 addresses distinctively novel developments in Data Acquisition enabled by cutting-edge in situ and remote sensing technology. The ICEI 2018 abstracts presented here capture well the current trends and challenges of Ecological Informatics: • regional, continental and global sharing of ecological data, • thorough integration of complementary monitoring technologies including DNA barcoding, • sophisticated pattern recognition by deep learning, • advanced exploration of valuable information in ‘big data’ by means of machine learning and process modelling, • decision-informing solutions for biodiversity conservation and sustainable ecosystem management in light of global changes.

    Prospects for Schistosomiasis Elimination

    Current efforts to limit the ravages of schistosomiasis are pushing the world closer to eliminating a chronic infection that has been associated with human life in the tropics since time immemorial. Nevertheless, the disease remains a scourge for large populations in sub-Saharan Africa, Latin America, and Southeast Asia, and the main part of this book is made up of papers dealing with its current distribution and discussing ways and means to establish and implement improved control approaches. While chemotherapy limits the symptoms caused by schistosomiasis, the number of infected people will not decrease until the parasite's life cycle is interrupted. To that end, some papers focus on the intermediate snail host, which is notoriously difficult to control, while others discuss human hygiene and sanitation. The latter approach not only prevents people from being infected by the snail but, more importantly, also stops people from infecting the snail by leaving contagious feces and urine in nature. With morbidity reduced by chemotherapy, the immediate target now is the interruption of transmission, to be achieved by new tools such as the novel chemotherapies, improved diagnostics (for humans, animals, and snails), and vaccines discussed in several of the papers. As made clear in this book, a complex infection requires new tools as well as work on many fronts; above all, however, a clear idea is needed as to how to skillfully combine the tools available and sustain implemented control activities.