
    Finding scientific articles in a large digital archive: BioStor and the Biodiversity Heritage Library

    The Biodiversity Heritage Library (BHL) is a large digital archive of legacy biological literature, comprising over 31 million pages scanned from books, monographs, and journals. During the digitisation process, basic metadata about the scanned items is recorded, but not article-level metadata. Given that the article is the standard unit of citation, this makes it difficult to locate cited literature in BHL. Adding the ability to easily find articles in BHL would greatly enhance the value of the archive. A service was developed to locate articles in BHL based on matching article metadata to BHL metadata using approximate string matching, regular expressions, and string alignment. This article finding service is exposed as a standard OpenURL resolver on the BioStor web site at http://biostor.org/openurl/. This resolver can be used on the web, or called by bibliographic tools that support OpenURL. BioStor provides tools for extracting, annotating, and visualising articles from the Biodiversity Heritage Library. BioStor is available from http://biostor.org/.
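As a rough illustration of how a bibliographic tool might call such a resolver, the sketch below builds a query URL for a single citation. The base URL comes from the abstract; the parameter names (`genre`, `title`, `volume`, `spage`, `date`) are conventional OpenURL keys, and whether BioStor accepts exactly this set is an assumption. The journal name and numbers are hypothetical, and no request is actually sent.

```python
from urllib.parse import urlencode

# Resolver address as given in the abstract.
BIOSTOR_OPENURL = "http://biostor.org/openurl/"


def build_openurl(title, volume, start_page, year):
    """Return a resolver URL for one article citation.

    The keys below are conventional OpenURL parameter names; that the
    resolver accepts exactly these keys is an assumption of this sketch.
    """
    params = {
        "genre": "article",
        "title": title,        # journal title
        "volume": volume,
        "spage": start_page,   # first page of the article
        "date": year,
    }
    # urlencode escapes spaces and other reserved characters.
    return BIOSTOR_OPENURL + "?" + urlencode(params)


# Hypothetical citation, used only to show the shape of the URL.
url = build_openurl("Journal of Hypothetical Zoology", 12, 345, 1901)
print(url)
```

Keeping the URL construction separate from any network call makes it easy to check the query before sending it to a resolver.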

    Digital library search preferences amongst historians and genealogists: British History Online user survey

    This paper presents the results of a study of 1,439 users of British History Online (BHO). BHO is a digital library of key printed primary and secondary sources for the history of Britain and Ireland, with a principal focus on the period between 1300 and 1800. The collection currently contains 1,250 volumes, and 120,000 web pages of material. During a website rebuild in 2014, the project team asked its registered users about their preferences for searching and browsing the content in the collection. Respondents were asked about their current search and browsing behaviour, as well as their receptiveness to new navigation options, including fuzzy searching, proximity searching, limiting search to a subset of the collection, searching by publication metadata, and searching entities within the texts such as person names, place names, or footnotes. The study provides insight into the unique and often converging needs of the site’s academic and genealogical users, noting that the former tended to respond in favour of options that gave them greater control over the search process, whereas the latter generally opted for options to improve the efficacy of targeted keyword searching. Results and recommendations are offered for managers of similar digitally-driven repositories interested in understanding and improving user experience.

    Stuck in the middle: Developing research workflows for multi-scale text analysis


    COSPO/CENDI Industry Day Conference

    The conference's objective was to provide a forum where government information managers and industry information technology experts could exchange views openly, discuss their respective needs, and compare those needs with solutions that are available or soon to be available. Technical summaries and points of contact are provided for the following sessions: secure products, protocols, and encryption; information providers; electronic document management and publishing; information indexing, discovery, and retrieval (IIDR); automated language translators; IIDR - natural language capabilities; IIDR - advanced technologies; IIDR - distributed heterogeneous and large database support; and communications - speed, bandwidth, and wireless.

    Information retrieval and text mining technologies for chemistry

    Efficient access to chemical information contained in scientific literature, patents, technical reports, or the web is a pressing need shared by researchers and patent attorneys from different chemical disciplines. Retrieval of important chemical information in most cases starts with finding relevant documents for a particular chemical compound or family. Targeted retrieval of chemical documents is closely connected to the automatic recognition of chemical entities in the text, which commonly involves the extraction of the entire list of chemicals mentioned in a document, including any associated information. In this Review, we provide a comprehensive and in-depth description of fundamental concepts, technical implementations, and current technologies for meeting these information demands. A strong focus is placed on community challenges addressing systems performance, more particularly CHEMDNER and CHEMDNER patents tasks of BioCreative IV and V, respectively. Considering the growing interest in the construction of automatically annotated chemical knowledge bases that integrate chemical information and biological data, cheminformatics approaches for mapping the extracted chemical names into chemical structures and their subsequent annotation together with text mining applications for linking chemistry with biological information are also presented. Finally, future trends and current challenges are highlighted as a roadmap proposal for research in this emerging field. A.V. and M.K. acknowledge funding from the European Community’s Horizon 2020 Program (project reference: 654021 - OpenMinted). M.K. additionally acknowledges the Encomienda MINETAD-CNIO as part of the Plan for the Advancement of Language Technology. O.R. and J.O. thank the Foundation for Applied Medical Research (FIMA), University of Navarra (Pamplona, Spain).
This work was partially funded by Consellería de Cultura, Educación e Ordenación Universitaria (Xunta de Galicia), and FEDER (European Union), and the Portuguese Foundation for Science and Technology (FCT) under the scope of the strategic funding of UID/BIO/04469/2013 unit and COMPETE 2020 (POCI-01-0145-FEDER-006684). We thank Iñigo García-Yoldi for useful feedback and discussions during the preparation of the manuscript.

    Advanced satellite workstation: An integrated workstation environment for operational support of satellite system planning and analysis

    A prototype integrated environment, the Advanced Satellite Workstation (ASW), is described that has been developed and delivered for evaluation and operator feedback in an operational satellite control center. The current ASW hardware consists of a Sun Workstation and Macintosh II Workstation connected via an Ethernet Network Hardware and Software, Laser Disk System, Optical Storage System, and Telemetry Data File Interface. The central mission of ASW is to provide an intelligent decision support and training environment for operators and analysts of complex systems such as satellites. There have been many workstation implementations recently which incorporate graphical telemetry displays and expert systems. ASW is a considerably broader look at intelligent, integrated environments for decision support, based upon the premise that the central features of such an environment are intelligent data access and integrated toolsets. A variety of tools have been constructed in support of this prototype environment, including: an automated pass planner for scheduling vehicle support activities, an architectural modeler for hierarchical simulation and analysis of satellite vehicle subsystems, multimedia-based information systems that provide an intuitive and easily accessible interface to the Orbit Operations Handbook and other relevant support documentation, and a data analysis architecture that integrates user-modifiable telemetry display systems, expert systems for background data analysis, and interfaces to the multimedia system via inter-process communication.

    Old Books and Digital Publishing: Eighteenth-Century Collections Online

    This is a history of Eighteenth-Century Collections Online, a database of over 180,000 titles. Published by Gale in 2003, it has had an enormous impact on the study of the eighteenth century. An essential aspect of this Element is how it explores the socio-cultural and technological debates around access to old books.

    Micro-database for sustainability (ESG) indicators developed at the Banco de España (2022)

    In recent years, concern about social and environmental issues has been growing and, as a result, demand for sustainability data has increased exponentially. For this reason, the Statistics Department of the Banco de España has developed a micro-database of sustainability (ESG) indicators. This document presents two articles that analyse the process developed to capture this information, as well as the numerous limitations and difficulties encountered in the search for sustainability microdata. Specifically, the two articles are: “Analysing climate change data gaps” (presented at the 11th Biennial IFC Conference on “Post-pandemic landscape for central bank statistics”, 25-27 August 2022, in session 3.B “Environmental statistics”) and “Creation of a structured sustainability database from company reports: A web application prototype for information retrieval and storage” (presented at the IFC Bank of Italy workshop on “Data science in central banking”, 14-17 February 2022, in session 4.3 “Text Mining and ML utilized in Economic Research”) (Koblents and Morales (2022)). The first article focuses on the numerous limitations encountered and the achievements made in developing the micro-database of sustainability indicators for non-financial corporations. After analysing current ESG reporting standards in detail, consulting experts in the field, analysing regulatory obligations and carrying out a practical exercise in searching for this information, a list of the 39 most relevant indicators was selected to begin the search. To date, more than 15,000 data points for the period 2019-2020 have been collected using a semi-automatic information search tool developed in-house (presented in detail in the second article).
During the project, numerous difficulties were identified, such as the use of different metrics when reporting indicators, a lack of information and of digital formats for download, as well as comparability difficulties and regulatory restrictions. The second article focuses on the tool developed to create the micro-database presented in the first article. This web application aims to obtain, through semi-automatic extraction and storage, the sustainability indicators from the annual non-financial statements submitted by Spanish non-financial corporations. The purpose of the application is to make it easier for users to search for sustainability indicators across multiple documents and store them in a structured database. The tool incorporates a set of predefined search terms for each indicator, selected on the basis of expert knowledge and, in later developments, artificial intelligence. For each company and indicator, the tool suggests the most relevant text fragments to the user, who in turn identifies the correct value of the indicator and stores it in the database using the web user interface. The tool was created by two data scientists in three months, with the continuous support of a team of experts who contributed to defining requirements and proposing improvements, collecting data, and validating and testing the tool. The article describes the technical approach and the main modules of the implemented prototype, including text extraction, indexing and search, data storage and visualisation.
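The fragment-suggestion step described in this abstract (predefined search terms per indicator, with the most relevant text fragments shown to the user) can be sketched roughly as follows. This is a minimal illustration that assumes plain term counting as the relevance score; the abstract does not say how the actual tool ranks fragments, and the function name, sample fragments, and search terms are all hypothetical.

```python
import re


def rank_fragments(fragments, search_terms, top_n=3):
    """Return up to top_n fragments, ordered by how often the terms occur.

    Scoring by raw term counts is an assumption of this sketch, not the
    method used by the tool described in the abstract.
    """
    terms = [term.lower() for term in search_terms]
    scored = []
    for fragment in fragments:
        text = fragment.lower()
        # Count case-insensitive occurrences of every search term.
        score = sum(len(re.findall(re.escape(term), text)) for term in terms)
        if score > 0:
            scored.append((score, fragment))
    # Highest score first; the sort is stable, so ties keep document order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [fragment for _, fragment in scored[:top_n]]


# Hypothetical report fragments and search terms for one indicator.
fragments = [
    "Scope 1 emissions were 1,200 tonnes of CO2 in 2020.",
    "The board met four times during the year.",
    "Emissions intensity fell; total CO2 emissions are reported in tonnes.",
]
suggestions = rank_fragments(fragments, ["emissions", "CO2"])
for fragment in suggestions:
    print(fragment)
```

In a semi-automatic workflow of this kind, the ranked fragments are only suggestions: the user still reads them and records the correct indicator value.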

    A Relevance Feedback-Based System For Quickly Narrowing Biomedical Literature Search Result

    The online literature is an important source that helps people find information. The rapid growth of online literature makes the manual search for the most relevant information a very time-consuming task and leads to sifting through many results to find the relevant ones. The existing search engines and online databases return a list of results that satisfy the user's search criteria. The list is often too long for the user to go through every hit if he/she does not know exactly what he/she wants and/or does not have time to review them one by one. My focus is on how to find biomedical literature in the fastest way. In this dissertation, I developed a biomedical literature search system that uses a relevance feedback mechanism, fuzzy logic, text mining techniques and the Unified Medical Language System (UMLS). The system extracts and decodes information from online biomedical documents and uses the extracted information first to filter unwanted documents and then to rank the related ones based on the user's preferences. I used text mining techniques to extract PDF document features and used these features to filter unwanted documents with the help of fuzzy logic. The system extracts meaning and semantic relations between texts and calculates the similarity between documents using these relations. Moreover, I developed a fuzzy literature ranking method that uses fuzzy logic, text mining techniques and UMLS. The ranking process is based on fuzzy logic and UMLS knowledge resources. The fuzzy ranking method uses semantic type and meaning concepts to map the relations between texts in documents. The relevance feedback-based biomedical literature search system is evaluated using a real biomedical data set created using dobutamine (a drug name). The data set contains 1,099 original documents. To obtain coherent and reliable evaluation results, two physicians were involved in the system evaluation.
Using "30-day mortality" as a specific query, the precision of the retrieved results improves by 87.7% in three rounds, which shows the effectiveness of using relevance feedback, fuzzy logic and UMLS in the search process. Moreover, the fuzzy-based ranking method is evaluated in terms of ranking the biomedical search results. Experiments show that the fuzzy-based ranking method improves the average ranking order accuracy by 3.35% and 29.55% compared with the UMLS meaning and semantic type methods, respectively.