
    BlogForever D2.6: Data Extraction Methodology

    This report outlines an inquiry into the area of web data extraction, conducted within the context of blog preservation. The report reviews theoretical advances and practical developments for implementing data extraction. The inquiry is extended through an experiment that demonstrates the effectiveness and feasibility of implementing some of the suggested approaches. More specifically, the report discusses an approach based on unsupervised machine learning that employs the RSS feeds and HTML representations of blogs. It outlines the possibilities of extracting semantics available in blogs and demonstrates the benefits of exploiting available standards such as microformats and microdata. The report proceeds to propose a methodology for extracting and processing blog data to further inform the design and development of the BlogForever platform.
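    The extraction approach described above, pairing a blog's RSS feed with its HTML pages and harvesting embedded microdata, can be illustrated with a short script. The sketch below is not the BlogForever implementation; it assumes the third-party packages feedparser and beautifulsoup4 are available, and the feed URL is a placeholder.

        # Illustrative sketch: align RSS entries with their HTML pages and harvest
        # schema.org microdata (itemprop attributes) as blog semantics.
        import urllib.request

        import feedparser                 # third-party: RSS/Atom parsing
        from bs4 import BeautifulSoup     # third-party: HTML parsing

        FEED_URL = "https://example.org/blog/feed"  # placeholder feed URL

        feed = feedparser.parse(FEED_URL)
        for entry in feed.entries:
            with urllib.request.urlopen(entry.link) as resp:
                html = resp.read()
            soup = BeautifulSoup(html, "html.parser")

            # Every element carrying an itemprop attribute contributes a
            # (property, value) pair of machine-readable semantics.
            semantics = {}
            for el in soup.find_all(attrs={"itemprop": True}):
                semantics[el["itemprop"]] = el.get("content") or el.get_text(strip=True)

            print(entry.title, semantics)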

    The CAP cancer protocols – a case study of caCORE based data standards implementation to integrate with the Cancer Biomedical Informatics Grid

    BACKGROUND: The Cancer Biomedical Informatics Grid (caBIG™) is a network of individuals and institutions, creating a world wide web of cancer research. An important aspect of this informatics effort is the development of consistent practices for data standards development, using a multi-tier approach that facilitates semantic interoperability of systems. The semantic tiers include (1) information models, (2) common data elements, and (3) controlled terminologies and ontologies. The College of American Pathologists (CAP) cancer protocols and checklists are an important reporting standard in pathology, for which no complete electronic data standard is currently available. METHODS: In this manuscript, we provide a case study of a Cancer Common Ontologic Representation Environment (caCORE) data standard implementation of the CAP cancer protocols and checklists model – an existing and complex paper-based standard. We illustrate the basic principles, goals and methodology for developing caBIG™ models. RESULTS: Using this example, we describe the process required to develop the model, the technologies and data standards on which the process and models are based, and the results of the modeling effort. We address difficulties we encountered and modifications to caCORE that will address these problems. In addition, we describe four ongoing development projects that will use the emerging CAP data standards to achieve integration of tissue banking and laboratory information systems. CONCLUSION: The CAP cancer checklists can be used as the basis for an electronic data standard in pathology using the caBIG™ semantic modeling methodology.
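    The three semantic tiers named in the abstract (information models, common data elements, and controlled terminologies) can be pictured as nested data structures. The sketch below only illustrates that layering; the class names, the checklist item, and the terminology code are invented and are not actual caDSR or NCI Thesaurus content.

        # Illustrative layering of the three semantic tiers; all names and codes
        # below are placeholders, not real caBIG/caDSR artifacts.
        from dataclasses import dataclass


        @dataclass
        class ConceptReference:          # tier 3: controlled terminology / ontology
            terminology: str
            code: str
            preferred_name: str


        @dataclass
        class CommonDataElement:         # tier 2: common data element
            name: str
            datatype: str
            permissible_values: list[str]
            concept: ConceptReference


        @dataclass
        class ChecklistItem:             # tier 1: information-model class
            question: str
            cde: CommonDataElement
            answer: str | None = None


        histologic_type = CommonDataElement(
            name="Histologic Type",
            datatype="CODED",
            permissible_values=["Invasive ductal carcinoma", "Invasive lobular carcinoma"],
            concept=ConceptReference("NCI Thesaurus", "C000000", "Histologic Type"),  # placeholder code
        )

        item = ChecklistItem(question="Histologic type of the tumor", cde=histologic_type)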

    Generator breast datamart – the novel breast cancer data discovery system for research and monitoring: Preliminary results and future perspectives

    Background: Artificial Intelligence (AI) is increasingly used for process management in daily life. In the medical field, AI is becoming part of computerized systems that manage information and encourage the generation of evidence. Here we present the development and application of AI to the hospital's IT systems for the creation of a DataMart for the management of clinical and research processes in the field of breast cancer. Materials and methods: A multidisciplinary team of radiation oncologists, epidemiologists, medical oncologists, breast surgeons, data scientists, and data management experts worked together to identify relevant data and sources located inside the hospital system. Combinations of open-source data science packages and industry solutions were used to design the target framework. To validate the DataMart directly on real-life cases, the working team defined tumoral pathology and clinical purposes for proofs of concept (PoCs). Results: Data were classified into "not organized, not 'ontologized' data", "organized, not 'ontologized' data", and "organized and 'ontologized' data". The real-world data (RWD) archives identified were an ontology-based platform, the hospital data warehouse, PDF documents, and electronic reports. Data extraction was performed by direct connection to structured data or by text-mining technology. Two PoCs were performed, in which the waiting-time interval for radiotherapy and the performance index of the breast unit were tested and found to be available. Conclusions: The GENERATOR Breast DataMart was created to support breast cancer pathways of care. An AI-based process automatically extracts data from different sources and uses them to generate trend studies and clinical evidence. Further studies and more proofs of concept are needed to exploit the full potential of this system.
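    The routing described in the Results (direct connection for organized data, text mining for unstructured documents) can be sketched as a small dispatch function. The sketch below is a generic illustration under assumed names, not the GENERATOR Breast DataMart code; the source registry and the extractor bodies are placeholders.

        # Illustrative routing of data sources by their classification; the
        # registry and extractor bodies are placeholders, not the real pipeline.
        from typing import Callable, Dict


        def extract_structured(source: str) -> dict:
            # Placeholder for a direct database / data-warehouse connection.
            return {"source": source, "method": "direct connection"}


        def extract_text_mining(source: str) -> dict:
            # Placeholder for NLP over PDF documents or free-text reports.
            return {"source": source, "method": "text mining"}


        # Hypothetical registry mirroring the data classes named in the abstract.
        SOURCES: Dict[str, str] = {
            "hospital data warehouse": "organized",
            "ontology-based platform": "organized and ontologized",
            "PDF documents": "not organized",
            "electronic reports": "not organized",
        }


        def extract(source: str) -> dict:
            extractor: Callable[[str], dict] = (
                extract_structured
                if SOURCES[source].startswith("organized")
                else extract_text_mining
            )
            return extractor(source)


        records = [extract(name) for name in SOURCES]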

    A new knowledge sourcing framework to support knowledge-based engineering development

    New trends in Knowledge-Based Engineering (KBE) highlight the need for decoupling the automation aspect from the knowledge management side of KBE. In this direction, some authors argue that KBE is capable of effectively capturing, retaining and reusing engineering knowledge. However, there are limitations associated with some aspects of KBE that present a barrier to delivering the knowledge sourcing process requested by industry. To overcome some of these limitations, this research proposes a new methodology for efficient knowledge capture and effective management of the complete knowledge life cycle. Current knowledge capture procedures represent one of the main constraints limiting the wide use of KBE in industry, because knowledge is extracted from experts in high-cost knowledge capture sessions. To reduce the amount of time required from experts to extract relevant knowledge, this research uses Artificial Intelligence (AI) techniques capable of generating new knowledge from company assets. Moreover, the research reported here proposes the integration of AI methods and experts, increasing as a result the accuracy of the predictions and the reliability of using advanced reasoning tools. The proposed knowledge sourcing framework integrates two features: (i) use of advanced data mining tools and expert knowledge to create new knowledge from raw data, and (ii) adoption of a well-established and reliable methodology to systematically capture, transfer and reuse engineering knowledge. The methodology proposed in this research is validated through the development and implementation of two case studies aimed at the optimisation of wing design concepts. The results obtained in both use cases demonstrated the extended KBE capability for fast and effective knowledge sourcing. This evidence was provided by the experts working on the development of each of the case studies through structured quantitative and qualitative analyses.
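    The combination of data-mining output with expert knowledge described above can be sketched in a few lines: a model learns a relationship from legacy design data, and an expert-supplied constraint corrects its prediction before it is reused. The sketch assumes scikit-learn is available; the wing-design figures and the expert bound are invented for illustration and do not come from the case studies.

        # Illustrative sketch: a data-mining model plus an expert-in-the-loop rule.
        # All numbers below are invented; they are not case-study data.
        from sklearn.tree import DecisionTreeRegressor

        # Hypothetical legacy wing-design records: [span_m, sweep_deg] -> mass_kg
        X = [[30.0, 25.0], [32.0, 27.0], [35.0, 30.0], [28.0, 22.0]]
        y = [8200.0, 8900.0, 9800.0, 7600.0]

        model = DecisionTreeRegressor(max_depth=2).fit(X, y)


        def predict_mass(span_m: float, sweep_deg: float) -> float:
            estimate = float(model.predict([[span_m, sweep_deg]])[0])
            # Expert correction: a hypothetical domain rule caps the estimate at a
            # feasibility limit supplied by the expert in a short review session.
            expert_upper_bound_kg = 10_000.0
            return min(estimate, expert_upper_bound_kg)


        print(predict_mass(33.0, 28.0))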

    Information Systems: No Boundaries! A Concise Approach to Understanding Information Systems for All Disciplines

    This book was created to provide a different experience for students beginning their studies in information systems. Instead of being bombarded with information from a business systems perspective, the goal of this book is to provide a baseline of material regarding information systems in all disciplines, not just business systems - hence the name No Boundaries!

    The Impact of Artificial Intelligence on Strategic and Operational Decision Making

    Effective decision making lies at the core of organizational success. In the era of digital transformation, businesses are increasingly adopting data-driven approaches to gain a competitive advantage. According to existing literature, Artificial Intelligence (AI) represents a significant advancement in this area, with the ability to analyze large volumes of data, identify patterns, make accurate predictions, and provide decision support to organizations. This study aims to explore the impact of AI technologies on different levels of organizational decision making. By separating these decisions into strategic and operational according to their properties, the study provides a more comprehensive understanding of the feasibility, current adoption rates, and barriers hindering AI implementation in organizational decision making.

    Doctor of Philosophy

    Public health surveillance systems are crucial for the timely detection of and response to public health threats. Since the terrorist attacks of September 11, 2001, and the release of anthrax in the following month, there has been a heightened interest in public health surveillance. The years immediately following these attacks were met with increased awareness and funding from the federal government, which has significantly strengthened the United States' surveillance capabilities; however, despite these improvements, there are substantial challenges faced by today's public health surveillance systems. Problems with the current surveillance systems include: a) a failure to leverage unstructured public health data for surveillance purposes; and b) a lack of information integration and of the ability to leverage resources, applications, or other surveillance efforts, because systems are built on a centralized model. This research addresses these problems by focusing on the development and evaluation of new informatics methods to improve public health surveillance. To address the problems above, we first identified a current public health surveillance workflow that is affected by the problems described and has the opportunity for enhancement through current informatics techniques. The 122 Mortality Surveillance for Pneumonia and Influenza was chosen as the primary use case for this dissertation work. The second step involved demonstrating the feasibility of using unstructured public health data, in this case death certificates. For this we created and evaluated a pipeline, composed of a detection rule and a natural language processor, for the coding of death certificates and the identification of pneumonia and influenza cases. The second problem was addressed by presenting the rationale for creating a federated model by leveraging grid technology concepts and tools for the sharing and epidemiological analyses of public health data. As a case study of this approach, a secured virtual organization was created where users are able to access two grid data services, using death certificates from the Utah Department of Health, and two analytical grid services, MetaMap and R. A scientific workflow was created using the published services to replicate the mortality surveillance workflow. To validate these approaches, and provide proofs of concept, a series of real-world scenarios was conducted.
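    The first part of the pipeline described above, a detection rule applied to death-certificate text before natural language processing, can be sketched with a simple pattern match. The sketch below is a generic illustration, not the dissertation's actual rule or its MetaMap integration; the terms, the negation handling, and the sample certificates are invented.

        # Illustrative detection rule for pneumonia-and-influenza (P&I) surveillance
        # over death-certificate text; terms and examples are placeholders.
        import re

        P_AND_I = re.compile(r"\b(pneumonia|influenza|flu)\b", re.IGNORECASE)
        NEGATED = re.compile(r"\b(no|denies|without)\s+(pneumonia|influenza|flu)\b",
                             re.IGNORECASE)


        def is_p_and_i_case(cause_of_death_text: str) -> bool:
            """Flag a certificate whose cause-of-death text mentions P&I un-negated."""
            if NEGATED.search(cause_of_death_text):
                return False
            return bool(P_AND_I.search(cause_of_death_text))


        # Hypothetical example certificates
        print(is_p_and_i_case("Acute respiratory failure due to influenza A"))  # True
        print(is_p_and_i_case("Cardiac arrest; no pneumonia documented"))       # False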

    Sharing interoperable workflow provenance: A review of best practices and their practical application in CWLProv

    Background: The automation of data analysis in the form of scientific workflows has become a widely adopted practice in many fields of research. Computationally driven data-intensive experiments using workflows enable Automation, Scaling, Adaptation and Provenance support (ASAP). However, there are still several challenges associated with the effective sharing, publication and reproducibility of such workflows due to the incomplete capture of provenance and lack of interoperability between different technical (software) platforms. Results: Based on best practice recommendations identified from the literature on workflow design, sharing and publishing, we define a hierarchical provenance framework to achieve uniformity in provenance and to support comprehensive and fully re-executable workflows equipped with domain-specific information. To realise this framework, we present CWLProv, a standards-based format to represent any workflow-based computational analysis and produce workflow output artefacts that satisfy the various levels of provenance. We utilise open-source, community-driven standards: interoperable workflow definitions in Common Workflow Language (CWL), structured provenance representation using the W3C PROV model, and resource aggregation and sharing as workflow-centric Research Objects (RO) generated along with the final outputs of a given workflow enactment. We demonstrate the utility of this approach through a practical implementation of CWLProv and evaluation using real-life genomic workflows developed by independent groups. Conclusions: The underlying principles of the standards utilised by CWLProv enable semantically rich and executable Research Objects that capture computational workflows with retrospective provenance, such that any platform supporting CWL will be able to understand the analysis, re-use the methods for partial re-runs, or reproduce the analysis to validate the published findings. Submitted to GigaScience (GIGA-D-18-00483).
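    The kind of retrospective provenance CWLProv aggregates (which inputs and outputs a step touched, with checksums and timing, packaged alongside the results) can be approximated with a small recorder. The sketch below only illustrates the idea and is not the CWLProv or W3C PROV serialization; the step name, file paths, and manifest fields are invented.

        # Illustrative retrospective-provenance recorder; field names and paths are
        # placeholders and do not follow the CWLProv/Research Object layout.
        import hashlib
        import json
        import time
        from pathlib import Path


        def sha1_of(path: Path) -> str:
            return hashlib.sha1(path.read_bytes()).hexdigest()


        def record_step(step_id: str, inputs: list[Path], outputs: list[Path]) -> dict:
            return {
                "step": step_id,
                "ended_at": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
                "inputs": [{"path": str(p), "sha1": sha1_of(p)} for p in inputs],
                "outputs": [{"path": str(p), "sha1": sha1_of(p)} for p in outputs],
            }


        # Hypothetical usage after one workflow step has run:
        # manifest = record_step("align_reads",
        #                        inputs=[Path("sample.fastq")],
        #                        outputs=[Path("sample.bam")])
        # Path("provenance.json").write_text(json.dumps(manifest, indent=2))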

    Designing ubiquitous computing for reflection and learning in diabetes management

    This dissertation proposes principles for the design of ubiquitous health monitoring applications that support reflection and learning in the context of diabetes management. Due to the high individual differences between diabetes cases, each affected individual must find the optimal combination of lifestyle alterations and medication through reflective analysis of their personal disease history. This dissertation advocates using technology to enable individuals' proactive engagement in monitoring their health. In particular, it proposes promoting individuals' engagement in reflection by exploiting breakdowns in individuals' routines or understanding; supporting continuity in thinking that leads to a systematic refinement of ideas; and supporting articulation of thoughts and understanding that helps to transform insights into knowledge. The empirical evidence for these principles was gathered through deployment studies of three ubiquitous computing applications that help individuals with diabetes manage their disease. These deployment studies demonstrated that technology for reflection helps individuals achieve their personal disease management goals, such as diet goals. In addition, they showed that using the technology helps individuals embrace a proactive attitude towards their health, indicated by their adoption of an internal locus of control. Ph.D. Committee Chair: Elizabeth D. Mynatt; Committee Member: Abowd, Gregory; Committee Member: Bruckman, Amy; Committee Member: Dourish, Paul; Committee Member: Nersessian, Nancy