4,845 research outputs found

    Design and implementation of a workflow for quality improvement of the metadata of scientific publications

    In this paper, a detailed workflow for analyzing and improving the quality of metadata of scientific publications is presented and tested. The workflow was developed based on approaches from the literature. Frequently occurring types of errors from the literature were compiled and mapped to the data-quality dimensions most relevant for publication data – completeness, correctness, and consistency – and made measurable. Based on the identified data errors, a process for improving data quality was developed. This process includes parsing hidden data, correcting incorrectly formatted attribute values, enriching with external data, carrying out deduplication, and filtering erroneous records. The effectiveness of the workflow was confirmed in an exemplary application to publication data from Open Researcher and Contributor ID (ORCID), with 56% of the identified data errors corrected. The workflow will be applied to publication data from other source systems in the future to further increase its performance.
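
    As a concrete illustration of the workflow stages named above (parsing, correcting formats, deduplication, filtering), here is a minimal Python sketch; it is not the authors' implementation, and the record fields, normalization rules, and completeness criteria are illustrative assumptions.

```python
import re

# Hypothetical completeness criteria: which attributes must be present.
REQUIRED = ("title", "doi", "year")

def completeness(record):
    """Fraction of required attributes that are present and non-empty."""
    return sum(bool(record.get(f)) for f in REQUIRED) / len(REQUIRED)

def normalize_doi(doi):
    """Correct a common formatting error: strip resolver prefixes, lowercase."""
    return re.sub(r"^https?://(dx\.)?doi\.org/", "", doi.strip().lower())

def deduplicate(records):
    """Keep one record per normalized DOI, falling back to the title."""
    seen, unique = set(), []
    for r in records:
        key = normalize_doi(r["doi"]) if r.get("doi") else r.get("title", "").lower()
        if key not in seen:
            seen.add(key)
            unique.append(r)
    return unique

records = [
    {"title": "A", "doi": "https://doi.org/10.1000/XYZ", "year": 2021},
    {"title": "A", "doi": "10.1000/xyz", "year": 2021},  # duplicate of the first
    {"title": "B", "doi": "", "year": None},             # incomplete record
]
# Deduplicate, then filter out records that fail the completeness check.
clean = [r for r in deduplicate(records) if completeness(r) == 1.0]
print(clean)
```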

    Practical Guidance for Integrating Data Management into Long-Term Ecological Monitoring Projects

    Long-term monitoring and research projects are essential to understand ecological change and the effectiveness of management activities. An inherent characteristic of long-term projects is the need for consistent data collection over time, requiring rigorous attention to data management and quality assurance. Recent papers have provided broad recommendations for data management; however, practitioners need more detailed guidance and examples. We present general yet detailed guidance for the development of comprehensive, concise, and effective data management for monitoring projects. The guidance is presented as a graded approach, matching the scale of data management to the needs of the organization and the complexity of the project. We address the following topics: roles and responsibilities; consistent and precise data collection; calibration of field crews and instrumentation; management of tabular, photographic, video, and sound data; data completeness and quality; development of metadata; archiving data; and evaluation of existing data from other sources. This guidance will help practitioners execute effective data management, thereby improving the quality and usability of data for meeting project objectives as well as broader meta-analysis and macrosystem ecology research.
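
    The guidance on data completeness and quality lends itself to simple automated checks. The following sketch, assuming hypothetical column names and plausibility bounds, shows the kind of tabular QA step a monitoring project might script.

```python
import pandas as pd

# Toy monitoring table; the second value is missing, the third implausible.
df = pd.DataFrame({
    "site": ["A", "A", "B"],
    "water_temp_c": [12.4, None, 57.0],
})

# Completeness: share of non-missing values per column.
print(df.notna().mean())

# Plausibility rule (bounds are illustrative; set them per protocol).
flagged = df[(df["water_temp_c"] < -5) | (df["water_temp_c"] > 40)]
print(flagged)  # records routed to field-crew review, not silently deleted
```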

    Multicenter Collaborative Study to Optimize Mass Spectrometry Workflows of Clinical Specimens

    The foundation for integrating mass spectrometry (MS)-based proteomics into systems medicine is the development of standardized start-to-finish and fit-for-purpose workflows for clinical specimens. An essential step in this pursuit is to highlight the common ground in a diverse landscape of different sample preparation techniques and liquid chromatography-mass spectrometry (LC-MS) setups. With the aim to benchmark and improve the current best practices among the proteomics MS laboratories of the CLINSPECT-M consortium, we performed two consecutive round-robin studies with full freedom to operate in terms of sample preparation and MS measurements. The six study partners were provided with two clinically relevant sample matrices: plasma and cerebrospinal fluid (CSF). In the first round, each laboratory applied their current best practice protocol for the respective matrix. Based on the achieved results and following a transparent exchange of all lab-specific protocols within the consortium, each laboratory could advance their methods before measuring the same samples in the second acquisition round. Both time points are compared with respect to identifications (IDs), data completeness, and precision, as well as reproducibility. As a result, the individual performances of participating study centers were improved in the second measurement, emphasizing the effect and importance of the expert-driven exchange of best practices for direct practical improvements.
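
    One common way to quantify the precision compared between the two acquisition rounds is the coefficient of variation (CV) of protein intensities across replicate injections per laboratory. The sketch below assumes a simple wide-format intensity table with invented data; it is illustrative, not the consortium's analysis code.

```python
import numpy as np
import pandas as pd

# Toy wide-format table: rows are proteins, columns are replicate
# injections from one laboratory in one acquisition round.
rng = np.random.default_rng(0)
intensities = pd.DataFrame(
    rng.lognormal(mean=10, sigma=0.1, size=(100, 3)),
    columns=["rep1", "rep2", "rep3"],
)
intensities.iloc[0, 2] = np.nan  # simulate a missing identification

cv = intensities.std(axis=1) / intensities.mean(axis=1) * 100  # % CV per protein
completeness = intensities.notna().all(axis=1).mean()          # fully observed fraction

print(f"median CV: {cv.median():.1f}%  completeness: {completeness:.0%}")
```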

    Improving data preparation for the application of process mining

    Immersed in what is already known as the fourth industrial revolution, automation and data exchange are taking on a particularly relevant role in complex environments such as industrial manufacturing or logistics. This digitisation and transition to the Industry 4.0 paradigm is leading experts to analyse business processes from new perspectives. Consequently, where management and business intelligence used to dominate, process mining appears as a link, building a bridge between both disciplines to unite and improve them. This new perspective on process analysis helps to improve strategic decision-making and competitive capabilities. Process mining brings together data and process perspectives in a single discipline that covers the entire spectrum of process management. Through process mining, and based on observations of their actual operations, organisations can understand the state of their operations, detect deviations, and improve their performance based on what they observe. Process mining has thus become an ally, occupying a large part of current academic and industrial research. However, although this discipline is receiving more and more attention, it presents severe application problems when implemented in real environments. The variety of input data in terms of form, content, semantics, and levels of abstraction makes the execution of process mining tasks in industry an iterative, tedious, and manual process, requiring multidisciplinary experts with extensive knowledge of the domain, process management, and data processing. Currently, although there are numerous academic proposals, there are no industrial solutions capable of automating these tasks. For this reason, in this thesis by compendium we address the problem of improving business processes in complex environments through a study of the state of the art and a set of proposals that improve relevant aspects of the process life cycle, from log creation and preparation to process quality assessment and business process improvement. First, a systematic study of the literature was carried out to gain in-depth knowledge of the state of the art in this field and of the different challenges faced by the discipline. This analysis allowed us to detect a number of challenges that have not been addressed or have received insufficient attention, of which three were selected as the objectives of this thesis. The first challenge concerns assessing the quality of the input data, known as event logs, since the decision to apply techniques for improving an event log must be based on the quality of the initial data. This thesis therefore presents a methodology and a set of metrics that support the expert in selecting which technique to apply according to the quality estimate at each moment, addressing another challenge identified in our analysis of the literature. Likewise, a set of metrics for evaluating the quality of the resulting process models is also proposed, with the aim of assessing whether improving the quality of the input data has a direct impact on the final results. The second challenge identified is the need to improve the input data used in the analysis of business processes. As in any data-driven discipline, the quality of the results strongly depends on the quality of the input data, so the second challenge addressed is improving the preparation of event logs. The contribution in this area is the application of natural language processing techniques to relabel activities from textual descriptions of process activities, together with clustering techniques that simplify the results, generating models that are more understandable from a human point of view. Finally, the third challenge detected relates to process optimisation, and we contribute an approach for optimising the resources associated with business processes which, by incorporating decision-making into the creation of flexible processes, enables significant cost reductions. Furthermore, all the proposals in this thesis were designed and validated in collaboration with experts from different fields of industry and have been evaluated through real case studies in public and private projects with the aeronautical industry and the logistics sector.
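
    To make the event-log quality assessment concrete, the following sketch computes a few of the kinds of metrics such a methodology might include (missing attributes, inconsistent activity labels). The specific checks are assumptions for illustration, not the thesis's published metric set.

```python
import pandas as pd

# Toy event log with a missing activity, a missing timestamp, and the
# same activity written in two different ways.
log = pd.DataFrame({
    "case_id":   [1, 1, 1, 2, 2],
    "activity":  ["create order", "Create_Order", "ship", "create order", None],
    "timestamp": pd.to_datetime(["2023-01-01", "2023-01-02", None,
                                 "2023-01-03", "2023-01-04"]),
})

missing_activity  = log["activity"].isna().mean()   # completeness of labels
missing_timestamp = log["timestamp"].isna().mean()  # completeness of timestamps

# Label inconsistency: if normalization collapses distinct labels, the log
# is a candidate for activity relabeling.
normalized = log["activity"].str.lower().str.replace("_", " ")
inconsistent_labels = normalized.nunique() < log["activity"].nunique()

print(missing_activity, missing_timestamp, inconsistent_labels)
```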

    High Energy Physics Forum for Computational Excellence: Working Group Reports (I. Applications Software II. Software Libraries and Tools III. Systems)

    Computing plays an essential role in all aspects of high energy physics. As computational technology evolves rapidly in new directions, and data throughput and volume continue to follow a steep trend-line, it is important for the HEP community to develop an effective response to a series of expected challenges. In order to help shape the desired response, the HEP Forum for Computational Excellence (HEP-FCE) initiated a roadmap planning activity with two key overlapping drivers: 1) software effectiveness, and 2) infrastructure and expertise advancement. The HEP-FCE formed three working groups, 1) Applications Software, 2) Software Libraries and Tools, and 3) Systems (including systems software), to provide an overview of the current status of HEP computing and to present findings and opportunities for the desired HEP computational roadmap. The final versions of the reports are combined in this document, and are presented along with introductory material. Comment: 72 pages.

    Biodiversity Data: Refinement of Technology and Implementation Methods

    Biodiversity data consist of taxonomic specimens and information that inform our interpretations of ecosystems and life on Earth. Museum projects, exhibitions, and research utilize biodiversity data to construct answers and educational programming for staff and visitors. Cleaning and maintaining biodiversity data, however, is a difficult challenge that involves moderation and refinement of data entry, inventory, workflows, and protocols. Creating an ideal framework that combines technology with data-standards management practices will help in developing baseline recommendations for institutions struggling to maintain their biodiversity collections. Surveys were sent to listservs and museum professionals to gather perspectives and data on biodiversity data practices. From the survey results, three interviews/case studies were conducted, one each with a staff member from the University of Wyoming Museum of Vertebrates, the Bernice Pauahi Bishop Museum, and the Smithsonian National Museum of Natural History. These interviews and surveys, in conjunction with a literature review, were conducted to explore the processes and strategies currently used to develop biodiversity data frameworks. Results indicate a strong desire for customizable and malleable databases that integrate institutional-level decision-making and preventative error protocols. In addition, thorough documentation and active engagement with staff and volunteers contribute long-term benefits to data management standards.

    Building essential biodiversity variables (EBVs) of species distribution and abundance at a global scale

    Much biodiversity data is collected worldwide, but it remains challenging to assemble the scattered knowledge for assessing biodiversity status and trends. The concept of Essential Biodiversity Variables (EBVs) was introduced to structure biodiversity monitoring globally, and to harmonize and standardize biodiversity data from disparate sources to capture a minimum set of critical variables required to study, report and manage biodiversity change. Here, we assess the challenges of a ‘Big Data’ approach to building global EBV data products across taxa and spatiotemporal scales, focusing on species distribution and abundance. The majority of currently available data on species distributions derives from incidentally reported observations or from surveys where presence-only or presence–absence data are sampled repeatedly with standardized protocols. Most abundance data come from opportunistic population counts or from population time series using standardized protocols (e.g. repeated surveys of the same population from single or multiple sites). Enormous complexity exists in integrating these heterogeneous, multi-source data sets across space, time, taxa and different sampling methods. Integration of such data into global EBV data products requires correcting biases introduced by imperfect detection and varying sampling effort, dealing with different spatial resolution and extents, harmonizing measurement units from different data sources or sampling methods, applying statistical tools and models for spatial inter- or extrapolation, and quantifying sources of uncertainty and errors in data and models. To support the development of EBVs by the Group on Earth Observations Biodiversity Observation Network (GEO BON), we identify 11 key workflow steps that will operationalize the process of building EBV data products within and across research infrastructures worldwide. These workflow steps take multiple sequential activities into account, including identification and aggregation of various raw data sources, data quality control, taxonomic name matching and statistical modelling of integrated data. We illustrate these steps with concrete examples from existing citizen science and professional monitoring projects, including eBird, the Tropical Ecology Assessment and Monitoring network, the Living Planet Index and the Baltic Sea zooplankton monitoring. The identified workflow steps are applicable to both terrestrial and aquatic systems and a broad range of spatial, temporal and taxonomic scales. They depend on clear, findable and accessible metadata, and we provide an overview of current data and metadata standards. Several challenges remain to be solved for building global EBV data products: (i) developing tools and models for combining heterogeneous, multi-source data sets and filling data gaps in geographic, temporal and taxonomic coverage, (ii) integrating emerging methods and technologies for data collection such as citizen science, sensor networks, DNA-based techniques and satellite remote sensing, (iii) solving major technical issues related to data product structure, data storage, execution of workflows and the production process/cycle as well as approaching technical interoperability among research infrastructures, (iv) allowing semantic interoperability by developing and adopting standards and tools for capturing consistent data and metadata, and (v) ensuring legal interoperability by endorsing open data or data that are free from restrictions on use, modification and sharing. 
Addressing these challenges is critical for biodiversity research and for assessing progress towards conservation policy targets and sustainable development goals.
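
    The taxonomic name-matching workflow step, for example, has a small resolvable core: map each verbatim name onto an accepted backbone name before aggregation and route unmatched names to quality control. The synonym table below is a toy stand-in for a real backbone service such as GBIF's.

```python
# Hypothetical synonym table: verbatim name -> accepted backbone name.
SYNONYMS = {
    "Parus caeruleus": "Cyanistes caeruleus",
    "Cyanistes caeruleus": "Cyanistes caeruleus",
}

def match_name(verbatim):
    """Resolve a verbatim name, or flag it for manual quality control."""
    accepted = SYNONYMS.get(verbatim.strip())
    if accepted is None:
        return verbatim, "unmatched"  # route to taxonomic review
    return accepted, "matched"

for name in ["Parus caeruleus", "Cyanistes caeruleus", "Parus sp."]:
    print(name, "->", match_name(name))
```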

    Investigation of Key Factors Affecting Quality of Patient Data from National Antiretroviral Therapy Electronic Medical Record System in Malawi

    The Ministry of Health in Malawi implemented a National Antiretroviral Therapy Electronic Medical Record (EMR) system, currently deployed in over 150 health facilities, and expects it to produce quality, timely quarterly cohort reports. However, the raw electronic reports are rarely complete, accurate, and consistent; they require cleaning and are therefore delayed. Such reports are especially critical under the COVID-19 pandemic. Adopting a mixed-method approach, this study assessed the key factors that affect the quality of data entered in the EMR system and of the reports it produces. The study interviewed 134 health-care workers at 17 sites and 10 Baobab Health Trust officers; observations were conducted and secondary data analysed. The analysis shows that the EMR system lacks proper documentation, making it hard to maintain, and lacks validation rules, increasing the chance of duplicate entries. Coupled with a lack of trained personnel, it was also found that one set of login credentials is shared by multiple users and that vital data elements are left null, compromising security and completeness, respectively. At 40% of the sites, the EMR system was not used as a point-of-care system but as a back-data-entry tool. There is therefore a need to revise the system to include the necessary validations, security features, a back-data-entry form, and data-quality dashboards. Keywords: Electronic Medical Records system, Data Quality, System Quality, Information Quality
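
    The recommended validations can be sketched as record-level rules that run at data entry. The field names and rules below are illustrative assumptions, not the Malawi EMR schema.

```python
from datetime import date

def validate(record):
    """Return a list of validation errors for one patient record."""
    errors = []
    # Vital data elements that must never be null (illustrative set).
    for field in ("patient_id", "art_start_date", "regimen"):
        if not record.get(field):
            errors.append(f"missing {field}")
    # A simple plausibility rule on dates.
    if record.get("art_start_date") and record["art_start_date"] > date.today():
        errors.append("ART start date is in the future")
    return errors

record = {"patient_id": "MW-001", "art_start_date": None, "regimen": "TDF/3TC/DTG"}
print(validate(record))  # ['missing art_start_date']
```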

    Extending Health Information System Evaluation with an Importance‐Performance Map Analysis

    Evaluation of a health information system is necessary for determining effective use and for enhancing the productivity of medical practitioners. However, current system-evaluation toolkits do not point to the specific areas that require further improvement. The objective of this chapter was to identify the constructs and attributes that are the most suitable candidates for managerial intervention by applying partial least squares structural equation modeling (PLS-SEM). The quantitative survey was adapted from past studies and supplemented with newly created items, with system quality, records quality, service quality, and knowledge quality as predictors and effective use and user performance as outcomes. When the findings were extended with an importance-performance map analysis, two system-quality attributes (workflows fit and work styles fit) and all knowledge-quality attributes ranked highest in importance for managerial action. The chapter also provides recommendations for policy- and decision-makers at the managerial level on how to apply the proposed evaluation method to produce more effective strategic plans for system upgrades and new implementations at health facilities.
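
    The importance-performance map itself is straightforward to compute once the PLS-SEM model is fitted: importance is a predictor's total effect on the target construct, and performance is the construct's mean score rescaled to 0-100. The sketch below uses invented numbers and arbitrary thresholds purely for illustration.

```python
# Invented numbers for illustration: total effect on the target construct
# ("effective use") and mean construct score on a 1..7 Likert scale.
constructs = {
    "system quality":    (0.42, 5.1),
    "records quality":   (0.18, 5.8),
    "service quality":   (0.10, 6.0),
    "knowledge quality": (0.35, 4.2),
}

for name, (importance, mean_score) in constructs.items():
    performance = (mean_score - 1) / (7 - 1) * 100  # rescale to 0..100
    # High importance but low performance marks a priority for managerial
    # action; the thresholds here are arbitrary illustration.
    priority = importance > 0.30 and performance < 60
    print(f"{name:18s} importance={importance:.2f} "
          f"performance={performance:5.1f} {'<- priority' if priority else ''}")
```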