1,144 research outputs found

    Tools for loading MEDLINE into a local relational database

    BACKGROUND: Researchers who use MEDLINE for text mining, information extraction, or natural language processing may benefit from having a copy of MEDLINE that they can manage locally. The National Library of Medicine (NLM) distributes MEDLINE in eXtensible Markup Language (XML)-formatted text files, but it is difficult to query MEDLINE in that format. We have developed software tools to parse the MEDLINE data files and load their contents into a relational database. Although the task is conceptually straightforward, the size and scope of MEDLINE make the task nontrivial. Given the increasing importance of text analysis in biology and medicine, we believe a local installation of MEDLINE will provide helpful computing infrastructure for researchers. RESULTS: We developed three software packages that parse and load MEDLINE, and ran each package to install separate instances of the MEDLINE database. For each installation, we collected data on loading time and disk-space utilization to provide examples of the process in different settings. Settings differed in terms of commercial database-management system (IBM DB2 or Oracle 9i), processor (Intel or Sun), programming language of installation software (Java or Perl), and methods employed in different versions of the software. The loading times for the three installations were 76 hours, 196 hours, and 132 hours, and disk-space utilization was 46.3 GB, 37.7 GB, and 31.6 GB, respectively. Loading times varied due to a variety of differences among the systems. Loading time also depended on whether data were written to intermediate files or not, and on whether input files were processed in sequence or in parallel. Disk-space utilization depended on the number of MEDLINE files processed, amount of indexing, and whether abstracts were stored as character large objects or truncated. 
CONCLUSIONS: Relational database (RDBMS) technology supports indexing and querying of very large datasets, and can accommodate a locally stored version of MEDLINE. RDBMS systems support a wide range of queries and facilitate certain tasks that are not directly supported by the application programming interface to PubMed. Because there is variation in hardware, software, and network infrastructures across sites, we cannot predict the exact time required for a user to load MEDLINE, but our results suggest that performance of the software is reasonable. Our database schemas and conversion software are publicly available at
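
As a rough illustration of the parse-and-load step described in this abstract, the following minimal sketch reads MEDLINE-style XML and writes citations and MeSH descriptors into relational tables. It is not the authors' software: SQLite stands in for IBM DB2 or Oracle 9i, the two-table schema and the names `create_schema` and `load_medline` are assumptions made for illustration, and the sample records are invented placeholders rather than real MEDLINE data.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Invented placeholder records that follow the MEDLINE XML element layout.
SAMPLE_XML = """<PubmedArticleSet>
  <PubmedArticle>
    <MedlineCitation>
      <PMID>1</PMID>
      <Article>
        <ArticleTitle>Example title one</ArticleTitle>
        <Abstract><AbstractText>Example abstract.</AbstractText></Abstract>
      </Article>
      <MeshHeadingList>
        <MeshHeading><DescriptorName>Humans</DescriptorName></MeshHeading>
        <MeshHeading><DescriptorName>Neoplasms</DescriptorName></MeshHeading>
      </MeshHeadingList>
    </MedlineCitation>
  </PubmedArticle>
  <PubmedArticle>
    <MedlineCitation>
      <PMID>2</PMID>
      <Article><ArticleTitle>Example title two</ArticleTitle></Article>
      <MeshHeadingList>
        <MeshHeading><DescriptorName>Humans</DescriptorName></MeshHeading>
      </MeshHeadingList>
    </MedlineCitation>
  </PubmedArticle>
</PubmedArticleSet>"""


def create_schema(conn):
    """Create a hypothetical two-table schema (an assumption, not the published one)."""
    conn.execute(
        "CREATE TABLE citation (pmid INTEGER PRIMARY KEY, title TEXT, abstract TEXT)"
    )
    conn.execute(
        "CREATE TABLE mesh_heading (pmid INTEGER REFERENCES citation(pmid), descriptor TEXT)"
    )
    # Index the descriptor column so term lookups do not scan the whole table.
    conn.execute("CREATE INDEX idx_mesh_descriptor ON mesh_heading(descriptor)")


def load_medline(conn, xml_text):
    """Parse MEDLINE-style XML and insert one row per citation and per MeSH heading."""
    root = ET.fromstring(xml_text)
    count = 0
    for cit in root.iter("MedlineCitation"):
        pmid = int(cit.findtext("PMID"))
        title = cit.findtext("Article/ArticleTitle")
        # findtext returns None when a citation has no abstract; stored as NULL.
        abstract = cit.findtext("Article/Abstract/AbstractText")
        conn.execute("INSERT INTO citation VALUES (?, ?, ?)", (pmid, title, abstract))
        for desc in cit.iterfind("MeshHeadingList/MeshHeading/DescriptorName"):
            conn.execute("INSERT INTO mesh_heading VALUES (?, ?)", (pmid, desc.text))
        count += 1
    conn.commit()
    return count


conn = sqlite3.connect(":memory:")
create_schema(conn)
loaded = load_medline(conn, SAMPLE_XML)

# A query of the kind a local relational copy makes easy: citations per descriptor.
rows = conn.execute(
    "SELECT descriptor, COUNT(*) FROM mesh_heading "
    "GROUP BY descriptor ORDER BY descriptor"
).fetchall()
print(loaded, rows)
```

A full distribution file would more plausibly be streamed with `xml.etree.ElementTree.iterparse` than read into memory at once. The abstract's remarks about intermediate files and parallel processing concern choices this sketch does not attempt to model.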

    botXminer: mining biomedical literature with a new web-based application

    This paper outlines botXminer, a publicly available application to search XML-formatted MEDLINE(®) data in a complete, object-relational schema implemented in Oracle(®) XML DB. An advantage offered by botXminer is that it can generate quantitative results with certain queries that are not feasible through the Entrez-PubMed(®) interface. After retrieving citations associated with user-supplied search terms, MEDLINE fields (title, abstract, journal, MeSH(®) and chemical) and terms (MeSH qualifiers and descriptors, keywords, author, gene symbol and chemical), these citations are grouped and displayed as tabulated or graphic results. This work represents an extension of previous research for integrating these citations with relational systems. botXminer has a user-friendly, intuitive interface that can be freely accessed at

    Multilingual query expansion in the Svemed+ bibliographic database : a case study

    SveMed+ is a bibliographic database covering Scandinavian medical journals. It is produced by the University Library of Karolinska Institutet in Sweden. The bibliographic references are indexed with terms from the Medical Subject Headings (MeSH) thesaurus. MeSH has been translated into several languages, including Swedish, making it suitable as the basis for multilingual tools in the medical field. The data structure of SveMed+ closely mimics that of PubMed/MEDLINE. Users of PubMed/MEDLINE and similar databases typically expect retrieval features that are not readily available off-the-shelf. The SveMed+ interface is based on a free text search engine (Solr) and a relational database management system (Microsoft SQL Server) containing the bibliographic database and a multilingual thesaurus database. The thesaurus database contains medical terms in three different languages and information about relationships between the terms. A combined approach involving the Solr free text index, the bibliographic database and the thesaurus database allowed the implementation of functionality such as automatic multilingual query expansion, faceting and hierarchical explode searches. The present paper describes how this was done in practice.
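
The multilingual query expansion and hierarchical explode search mentioned in this abstract can be sketched as follows. This is a hedged illustration, not the SveMed+ implementation: the in-memory `THESAURUS` dictionary stands in for the thesaurus database, its concept identifiers and entries are illustrative, only two languages are shown, and the Solr field name `mesh` and all function names are hypothetical.

```python
# Illustrative thesaurus: concept id -> labels per language and narrower concepts.
# Identifiers and structure are assumptions made for this sketch.
THESAURUS = {
    "C1": {"labels": {"en": "Neoplasms", "sv": "Tumörer"}, "children": ["C2"]},
    "C2": {"labels": {"en": "Breast Neoplasms", "sv": "Brösttumörer"}, "children": []},
}


def find_concepts(term):
    """Return ids of concepts with a label matching the term in any language."""
    wanted = term.casefold()
    return [
        cid
        for cid, concept in THESAURUS.items()
        if any(label.casefold() == wanted for label in concept["labels"].values())
    ]


def explode(concept_id):
    """Return the concept id plus all of its descendants (hierarchical explode)."""
    found = [concept_id]
    for child in THESAURUS[concept_id]["children"]:
        found.extend(explode(child))
    return found


def expand_query(term, explode_tree=True):
    """Expand a term into the labels, in every language, of the matching concepts."""
    labels = set()
    for cid in find_concepts(term):
        targets = explode(cid) if explode_tree else [cid]
        for target in targets:
            labels.update(THESAURUS[target]["labels"].values())
    # Fall back to the raw term when the thesaurus has no match.
    return sorted(labels) if labels else [term]


def to_solr_query(terms, field="mesh"):
    """Build a Lucene-style OR query over a hypothetical index field."""
    quoted = " OR ".join('"{}"'.format(t) for t in terms)
    return "{}:({})".format(field, quoted)


expanded = expand_query("tumörer")
query = to_solr_query(expand_query("Breast Neoplasms"))
print(expanded)
print(query)
```

A Swedish search term thus retrieves records indexed under the equivalent English heading and its narrower headings, while a term missing from the thesaurus passes through unchanged as a free-text query.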

    Developing a database for Genbank information.

    The thesis project, Gene Database, was undertaken to give the bioinformatics research group at the University of Louisville access to GenBank EST information in the form of a database. This database allows a programmable front end to be used to conduct further research with EST information. The database backend is Oracle and was populated through a custom Java program. The loader was created in lieu of using Oracle's SQL*Loader because of the limitations of SQL*Loader. Previous ways of accessing the GenBank information included downloading the compressed files and using them locally as raw files, or using the NCBI website remotely. The Gene Database provides a central location at the University of Louisville for GenBank information. The database was initially populated with the human EST information, and it is versatile enough to store other organisms as well. It also allows custom queries for specific research goals that arise from having this information readily available to researchers.

    PubFocus: semantic MEDLINE/PubMed citations analytics through integration of controlled biomedical dictionaries and ranking algorithm

    BACKGROUND: Understanding research activity within any given biomedical field is important. Search outputs generated by MEDLINE/PubMed are not well classified and require lengthy manual citation analysis. Automation of citation analytics can be very useful and timesaving for both novices and experts. RESULTS: The PubFocus web server automates analysis of MEDLINE/PubMed search queries by enriching them with two widely used human factor-based bibliometric indicators of publication quality: journal impact factor and volume of forward references. In addition to providing basic volumetric statistics, PubFocus also prioritizes citations and evaluates authors' impact on the field of search. PubFocus also analyses the presence and occurrence of biomedical key terms within citations by utilizing controlled vocabularies. CONCLUSION: We have developed a citation prioritisation algorithm based on journal impact factor, forward referencing volume, referencing dynamics, and author's contribution level. It can be applied either to the primary set of PubMed search results or to the subsets of these results identified through key terms from controlled biomedical vocabularies and ontologies. The NCI (National Cancer Institute) thesaurus and MGD (Mouse Genome Database) mammalian gene orthology have been implemented for key terms analytics. PubFocus provides a scalable platform for the integration of multiple available ontology databases. PubFocus analytics can be adapted for input sources of biomedical citations other than PubMed.

    A review of mentorship measurement tools

    © 2016 Elsevier Ltd. Objectives: To review mentorship measurement tools in various fields to inform nursing educators on the selection, application, and development of mentoring instruments. Design: A literature review informed by PRISMA 2009 guidelines. Data Sources: Six databases: CINAHL, Medline, PsycINFO, Academic Search Premier, ERIC, Business premier resource. Review Methods: Search terms and strategies used: mentor* N3 (behav* or skill? or role? or activit? or function* or relation*) and (scale or tool or instrument or questionnaire or inventory). The time limiter was set from January 1985 to June 2015. Extracted data were content of instruments, samples, psychometrics, theoretical framework, and utility. An integrative review method was used. Results: Twenty-eight papers linked to 22 scales were located, seven from business and industry, 11 from education, 3 from health science, and 1 focused on research mentoring. Mentorship measurement was pioneered by business with a universally accepted theoretical framework, i.e. career function and psychosocial function, and scale development is shifting from a focus on the positive side of mentorship toward negative mentoring experiences and challenges. Nursing educators mainly used instruments from business to assess mentorship among nursing teachers. In education and nursing, measurement has taken on a more specialised focus: researchers in different contexts have developed scales to measure different specific aspects of mentorship. Most tools show psychometric evidence of content homogeneity and construct validity but lack more comprehensive and advanced tests. Conclusion: Mentorship is widely used and conceptualised differently in different fields and is less mature in nursing than in business. Measurement of mentorship is heading toward a more specialised and comprehensive process. Business and education provided measurement tools to nursing educators to assess mentorship among staff, but a robust instrument to measure nursing students' mentorship is needed.

    Literature search

    The paper seeks to highlight the complexity of literature searching in online bibliographic databases and the importance of developing advanced search skills towards greater search efficiency. The lack of knowledge of the content, structure and operation of databases, poor search skills, and superficiality in assessing search results are discussed as the major obstacles to efficient literature searching. It is suggested that despite technical improvements towards adjusting search engines to natural language processing, the knowledge of traditional search strategies remains highly relevant.

    K2/Kleisli and GUS: Experiments in Integrated Access to Genomic Data Sources

    The integration of heterogeneous data sources and software systems is a major issue in the biomedical community and several approaches have been explored: linking databases, on-the-fly integration through views, and integration through warehousing. In this paper we report on our experiences with two systems that were developed at the University of Pennsylvania: an integration system called K2, which has primarily been used to provide views over multiple external data sources and software systems; and a data warehouse called GUS which downloads, cleans, integrates and annotates data from multiple external data sources. Although the view and warehouse approaches each have their advantages, there is no clear winner. Therefore, users must consider how the data is to be used, what the performance guarantees must be, and how much programmer time and expertise is available to choose the best strategy for a particular application.

    PubMed QUEST: The PubMed Query Search Tool. An informatics tool to aid cancer centers and cancer investigators in searching the PubMed databases

    Searching PubMed for citations related to a specific cancer center or group of authors can be labor-intensive. We have created a tool, PubMed QUEST, to aid in the rapid searching of PubMed for publications of interest. It was designed by taking into account the needs of entire cancer centers as well as individual investigators. The experience of using the tool by our institution’s cancer center administration and investigators has been favorable and we believe it could easily be adapted to other institutions. Use of the tool has identified limitations of automated searches for publications based on an author’s name, especially for common names. These limitations could likely be solved if the PubMed database assigned a unique identifier to each author