
    A Data Quality Framework for Process Mining of Electronic Health Record Data

    Reliable research demands data of known quality. This can be very challenging for electronic health record (EHR) based research, where data quality issues can be complex and often unknown. Emerging technologies such as process mining can reveal insights into how to improve care pathways, but only if technological advances are matched by strategies and methods to improve data quality. The aim of this work was to develop a care pathway data quality framework (CP-DQF) to identify, manage and mitigate EHR data quality issues in the context of process mining, using dental EHRs as an example. Objectives: 1) design a framework implementable within our e-health record research environments; 2) scale it to further dimensions and sources; 3) run code to mark the data; 4) mitigate issues and provide an audit trail. Methods: We reviewed the existing literature covering data quality frameworks for process mining and for data mining of EHRs, and constructed a unified data quality framework that met the requirements of both. We applied the framework to a practical case study mining primary care dental pathways from an EHR covering 41 dental clinics and 231,760 patients in the Republic of Ireland. Results: Applying the framework helped identify many potential data quality issues and mark up every data point affected. This enabled systematic assessment of the data quality issues relevant to mining care pathways. Conclusion: The complexity of data quality in an EHR-data research environment was addressed through a re-usable and comprehensible framework that met the needs of our case study. This structured approach saved time and brought rigor to the management and mitigation of data quality issues. The resulting metadata is being used within cohort selection, experiment and process mining software so that our research with this data is based on data of known quality. Our framework is a useful starting point for process mining researchers to address EHR data quality concerns.
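
    As an illustration of the mark-and-audit approach described above, the sketch below flags quality issues in a toy event log instead of deleting the affected rows, so every data point carries its known issues into later analysis. The column names, rules and data are illustrative assumptions, not the CP-DQF's actual rule set.

    ```python
    # Minimal sketch: rule-based data-quality mark-up of an EHR-style event
    # log, preserving an audit trail. All names, rules and data are hypothetical.
    import pandas as pd

    # Toy dental-EHR event log: one row per recorded treatment event.
    events = pd.DataFrame({
        "patient_id": [1, 1, 2, 3],
        "activity": ["exam", "filling", "exam", None],
        "timestamp": pd.to_datetime(
            ["2019-01-03", "2018-12-30", "2019-02-10", "2019-03-01"]),
    })

    # Each rule marks affected rows rather than removing them, so the
    # marks double as an audit trail and as metadata for cohort selection.
    flags = pd.DataFrame(index=events.index)
    flags["missing_activity"] = events["activity"].isna()
    # Timestamp earlier than the previous recorded event for the same patient.
    flags["out_of_order"] = (
        events.groupby("patient_id")["timestamp"].diff() < pd.Timedelta(0)
    )

    # Attach the flags so every data point carries its known quality issues.
    marked = events.join(flags.add_prefix("dq_"))
    print(marked)
    ```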

    Towards information profiling: data lake content metadata management

    There is currently a burst of Big Data (BD) processed and stored in huge raw data repositories, commonly called Data Lakes (DL). These BD require new techniques of data integration and schema alignment in order to make the data usable by its consumers and to discover the relationships linking their content. This can be provided by metadata services which discover and describe their content. However, a systematic approach to this kind of metadata discovery and management is currently lacking. Thus, we propose a framework for profiling the informational content stored in the DL, which we call information profiling. The profiles are stored as metadata to support data analysis. We formally define a metadata management process which identifies the key activities required to handle this effectively. We demonstrate the alternative techniques and performance of our process using a prototype implementation handling a real-life case study from the OpenML DL, which showcases the value and feasibility of our approach.
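
    As a concrete illustration of information profiling, the sketch below derives per-column content metadata from a raw dataset; the profile fields are assumptions chosen for illustration, not the paper's actual schema.

    ```python
    # Minimal sketch of information profiling: derive content metadata
    # from a raw data lake dataset. Profile fields are illustrative only.
    import json

    import pandas as pd

    def profile_dataset(df: pd.DataFrame) -> dict:
        """Compute a per-column content profile to store as metadata."""
        profile = {}
        for col in df.columns:
            s = df[col]
            profile[col] = {
                "dtype": str(s.dtype),
                "null_fraction": float(s.isna().mean()),
                "distinct_count": int(s.nunique()),
                # A small value sample supports later schema alignment.
                "sample": s.dropna().astype(str).head(3).tolist(),
            }
        return profile

    df = pd.DataFrame({"species": ["setosa", "virginica", None],
                       "petal_len": [1.4, 5.1, 4.2]})
    print(json.dumps(profile_dataset(df), indent=2))
    ```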

    Improving Knowledge Retrieval in Digital Libraries Applying Intelligent Techniques

    Nowadays an enormous quantity of heterogeneous and distributed information is stored in the digital university. Exploring online collections to find knowledge relevant to a user’s interests is a challenging task. Artificial intelligence and the Semantic Web provide a common framework that allows knowledge to be shared and reused in an efficient way. In this work we propose a comprehensive approach for discovering e-learning objects in large digital collections, based on analysis of the semantic metadata recorded in those objects and the application of expert system technologies. We have used the Case-Based Reasoning methodology to develop a prototype for supporting efficient knowledge retrieval from online repositories. We suggest a conceptual architecture for a semantic search engine. OntoUS is a collaborative effort that proposes a new form of interaction between users and digital libraries, where the latter are adapted to users and their surroundings.
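
    To make the retrieval idea concrete, here is a minimal sketch of the "retrieve" step of Case-Based Reasoning over learning-object metadata; the attributes, weights and similarity measure are assumptions for illustration, not OntoUS's design.

    ```python
    # Minimal sketch of CBR retrieval over learning-object metadata.
    # Attributes, weights and the similarity measure are hypothetical.
    from dataclasses import dataclass

    @dataclass
    class LearningObject:
        title: str
        keywords: set
        level: str  # e.g. "beginner", "advanced"

    def similarity(query: LearningObject, case: LearningObject) -> float:
        """Keyword overlap (Jaccard) blended with an exact level match."""
        union = query.keywords | case.keywords
        jaccard = len(query.keywords & case.keywords) / len(union) if union else 0.0
        return 0.7 * jaccard + 0.3 * (query.level == case.level)

    library = [
        LearningObject("Intro to RDF", {"semantic web", "rdf"}, "beginner"),
        LearningObject("OWL Reasoning", {"semantic web", "owl"}, "advanced"),
    ]
    query = LearningObject("", {"rdf", "semantic web"}, "beginner")

    # Rank stored cases by similarity to the query case.
    ranked = sorted(library, key=lambda c: similarity(query, c), reverse=True)
    print([c.title for c in ranked])  # ['Intro to RDF', 'OWL Reasoning']
    ```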

    Developing information architecture through records management classification techniques

    Purpose – This work aims to draw attention to information retrieval philosophies and techniques allied to the records management profession, advocating wider professional consideration of a functional approach to information management, in this instance in the development of information architecture. Design/methodology/approach – The paper draws on a hypothesis originally presented by the author, in which records management techniques traditionally applied to develop business classification schemes were offered as an additional solution to organising information resources and services (within a university intranet) where earlier approaches, notably subject- and administrative-based arrangements, were found to be lacking. The hypothesis was tested via work-based action learning and is presented here as an extended case study. The paper also draws on evidence submitted to the Joint Information Systems Committee in support of Abertay University's application for the JISC award for innovation in records and information management. Findings – The original hypothesis has been tested in the workplace. Information retrieval techniques allied to records management (functional classification) were the main influence in the development of pre- and post-coordinate information retrieval systems to support a wider information architecture, where the subject approach was found to be lacking. Their use within the workplace has since been extended. Originality/value – The paper advocates that the development of information retrieval as a discipline should include wider consideration of functional classification, as this alternative to the subject approach is largely ignored in mainstream IR works.
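
    As a sketch of what functional classification can look like in practice, the example below files resources by business function rather than subject and supports both browsing the file plan and combining facets at search time (post-coordination); the function hierarchy and resources are invented for illustration.

    ```python
    # Minimal sketch of functional (business-activity) classification as a
    # basis for retrieval. The scheme and resources are hypothetical.
    functional_scheme = {
        "Teaching": ["Course Delivery", "Assessment"],
        "Research": ["Grant Administration", "Ethics Approval"],
    }

    resources = [
        {"title": "Exam board minutes",
         "function": "Teaching", "activity": "Assessment"},
        {"title": "Ethics application form",
         "function": "Research", "activity": "Ethics Approval"},
    ]

    # A records-management control: filing must respect the scheme.
    for r in resources:
        assert r["activity"] in functional_scheme[r["function"]]

    def retrieve(function=None, activity=None):
        """Combine function/activity facets at search time."""
        return [r["title"] for r in resources
                if (function is None or r["function"] == function)
                and (activity is None or r["activity"] == activity)]

    print(retrieve(function="Research"))    # browse by function
    print(retrieve(activity="Assessment"))  # post-coordinate facet query
    ```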

    DIDET: Digital libraries for distributed, innovative design education and teamwork. Final project report

    The central goal of the DIDET Project was to enhance student learning opportunities by enabling students to partake in global, team-based design engineering projects, in which they directly experience different cultural contexts and access a variety of digital information sources via a range of appropriate technology. To achieve this overall goal, the project delivered on the following objectives: 1. Teach engineering information retrieval, manipulation and archiving skills to students studying on engineering degree programs. 2. Measure the use of those skills in design projects in all years of an undergraduate degree program. 3. Measure the learning performance in engineering design courses affected by the provision of access to information that would otherwise have been difficult to access. 4. Measure student learning performance in different cultural contexts that influence the use of alternative sources of information and varying forms of Information and Communications Technology. 5. Develop and provide workshops for staff development. 6. Use the measurement results to annually redesign course content and the digital libraries technology. The overall DIDET Project approach was to develop, implement, use and evaluate a testbed to improve the teaching and learning of students partaking in global team-based design projects. Digital libraries and virtual design studios were used to fundamentally change the way design engineering is taught at the collaborating institutions. This was done by implementing a digital library at the partner institutions to improve learning in the field of Design Engineering and by developing a Global Team Design Project run as part of assessed classes at Strathclyde, Stanford and Olin. Evaluation was carried out on an ongoing basis and fed back into project development, both on the class teaching model and on the LauLima system developed at Strathclyde to support teaching and learning. Major findings include the requirement to overcome technological, pedagogical and cultural issues for successful e-learning implementations. A need for strong leadership has been identified, particularly to exploit the benefits of cross-discipline team working. One major project output still being developed is a DIDET Project Framework for Distributed Innovative Design, Education and Teamwork, to encapsulate all project findings and outputs. The project achieved its goal of embedding major change in the teaching of Design Engineering, and Strathclyde's new Global Design class has been both successful and popular with students.

    Representing Dataset Quality Metadata using Multi-Dimensional Views

    Data quality is commonly defined as fitness for use. The problem of identifying the quality of data is faced by many data consumers, and data publishers often do not have the means to identify quality problems in their data. To make the task easier for both stakeholders, we have developed the Dataset Quality Ontology (daQ). daQ is a core vocabulary for representing the results of quality benchmarking of a linked dataset. It represents quality metadata as multi-dimensional and statistical observations using the Data Cube vocabulary. Quality metadata are organised as a self-contained graph, which can, e.g., be embedded into linked open datasets. We discuss the design considerations, give examples of extending daQ with custom quality metrics, and present use cases such as analysing data versions, browsing datasets by quality, and link identification. We finally discuss how data cube visualisation tools enable data publishers and consumers to better analyse the quality of their data.
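
    As an illustration, the sketch below writes one quality observation as RDF with rdflib; the property and metric names follow the daQ style but are simplified assumptions rather than a faithful rendering of the ontology.

    ```python
    # Minimal sketch of daQ-style quality metadata: one observation
    # recording the value of one metric computed on one dataset.
    # Property and metric names are simplified assumptions.
    from rdflib import Graph, Literal, Namespace, RDF, XSD

    DAQ = Namespace("http://purl.org/eis/vocab/daq#")
    EX = Namespace("http://example.org/quality/")

    g = Graph()
    g.bind("daq", DAQ)
    g.bind("ex", EX)

    obs = EX["obs1"]
    g.add((obs, RDF.type, DAQ.Observation))
    g.add((obs, DAQ.computedOn, EX["myDataset"]))
    g.add((obs, DAQ.metric, EX["completenessMetric"]))
    g.add((obs, DAQ.value, Literal(0.83, datatype=XSD.double)))

    # The quality graph is self-contained and can be embedded in a
    # linked open dataset alongside the data it describes.
    print(g.serialize(format="turtle"))
    ```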

    A framework for design engineering education in a global context

    This paper presents a framework for teaching design engineering in a global context, using innovative technologies to enable distributed teams to work together effectively across international and cultural boundaries. The DIDET Framework represents the findings of a 5-year project conducted by the University of Strathclyde, Stanford University and Olin College, which enhanced student learning opportunities by enabling students to partake in global, team-based design engineering projects, directly experiencing different cultural contexts and accessing a variety of digital information sources via a range of innovative technology. The use of innovative technology enabled the formalization of design knowledge within international student teams, as did the methods developed for students to store, share and reuse information. Coaching methods were used by teaching staff to support distributed teams, and evaluation work on relevant classes was carried out regularly to allow ongoing improvement of learning and teaching and to show improvements in student learning. Major findings of the 5-year project include the requirement to overcome technological, pedagogical and cultural issues for successful eLearning implementations. The DIDET Framework encapsulates all the conclusions relating to design engineering in a global context: each of the principles for effective distributed design learning is shown along with relevant findings and suggested metrics. The findings detailed in the paper were reached through a series of interventions in design engineering education at the collaborating institutions. Evaluation was carried out on an ongoing basis and fed back into project development, both on the pedagogical and the technological approaches.

    Bridging the Gap Between Traditional Metadata and the Requirements of an Academic SDI for Interdisciplinary Research

    Metadata has long been understood as a fundamental component of any Spatial Data Infrastructure, providing information relating to the discovery, evaluation and use of datasets and describing their quality. Having good metadata about a dataset is fundamental to using it correctly and to understanding the implications of issues such as missing data or incorrect attribution for the results of any analysis carried out. Traditionally, spatial data was created by expert users (e.g. national mapping agencies), who created metadata for the data. Increasingly, however, data used in spatial analysis comes from multiple sources and may be captured or used by non-expert users, for example academic researchers, many of whom come from non-GIS disciplinary backgrounds, are not familiar with metadata, and are perhaps working in geographically dispersed teams. This paper examines the applicability of metadata in this academic context, using a multi-national coastal/environmental project as a case study. The work to date highlights a number of suggestions for good practice, issues and research questions relevant to an Academic SDI, particularly given the increased levels of research data sharing and reuse required by UK and EU funders.
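
    As one possible shape for metadata aimed at non-expert academic users, the sketch below records a lightweight dataset description whose fields echo common ISO 19115 concepts such as lineage and completeness; the record itself is invented for illustration.

    ```python
    # Minimal sketch of a lightweight metadata record for a shared
    # research dataset. Fields and values are illustrative assumptions.
    import json

    record = {
        "title": "Coastal sediment samples, 2012 survey",
        "creator": "Project field team",
        "spatial_extent": {"west": -10.5, "east": -5.3,
                           "south": 51.4, "north": 55.4},  # WGS84 degrees
        "lineage": "Digitised from field sheets; GPS fixes post-processed.",
        "quality": {
            "completeness": "3 of 120 sample sites missing depth values",
            "attribute_accuracy": "Attribute codes checked against source lists",
        },
    }

    # Stored alongside the dataset so collaborators can judge the
    # implications of missing data before reusing it in analysis.
    print(json.dumps(record, indent=2))
    ```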