7 research outputs found

    Framework for data quality in knowledge discovery tasks

    The creation and consumption of data continue to grow by leaps and bounds. Due to advances in Information and Communication Technologies (ICT), the data explosion in the digital universe is a current trend, and Knowledge Discovery in Databases (KDD) has gained importance due to the abundance of available data. A successful knowledge discovery process requires careful data preparation: experts affirm that the preprocessing phase takes 50% to 70% of the total time of a knowledge discovery process. Software tools based on popular knowledge discovery methodologies offer algorithms for data preprocessing. According to the Gartner 2018 Magic Quadrant for Data Science and Machine Learning Platforms, KNIME, RapidMiner, SAS, Alteryx, and H2O.ai are the leading tools for knowledge discovery. These tools provide techniques that facilitate dataset evaluation; however, they lack a user-oriented process for addressing data quality issues and offer no guidance as to which techniques can or should be used in which contexts. Consequently, selecting suitable data cleaning techniques is a challenge for inexperienced users, who have no clear idea which methods can be confidently used and often resort to trial and error. This thesis presents three contributions to address these problems: (i) a conceptual framework that provides the user a guided process for addressing data quality issues in knowledge discovery tasks, (ii) a case-based reasoning system that recommends suitable algorithms for data cleaning, and (iii) an ontology that represents knowledge about data quality issues and data cleaning methods. This ontology also supports the case-based reasoning system in case representation and the reuse phase.
    Programa Oficial de Doctorado en Ciencia y Tecnología Informática. Committee: Chair: Fernando Fernández Rebollo; Secretary: Gustavo Adolfo Ramírez; Member: Juan Pedro Caraça-Valente Hernánde
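The recommendation idea described above can be sketched as the retrieval step of a case-based reasoning system: past cases pair dataset characteristics with the cleaning algorithm that worked, and a new dataset is matched to its nearest stored case. The case features and algorithm names below are illustrative assumptions, not taken from the thesis.

```python
import math

# Toy case base: each past case pairs dataset-quality features with the
# cleaning algorithm that proved suitable for it. Features are assumed to be
# (missing_ratio, outlier_ratio, duplicate_ratio) -- a hypothetical choice.
case_base = [
    {"features": (0.30, 0.02, 0.01), "algorithm": "kNN imputation"},
    {"features": (0.02, 0.15, 0.01), "algorithm": "IQR outlier removal"},
    {"features": (0.01, 0.01, 0.20), "algorithm": "record deduplication"},
]

def euclidean(a, b):
    """Euclidean distance between two feature tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def recommend(features):
    """Retrieve the most similar stored case and reuse its algorithm."""
    best = min(case_base, key=lambda c: euclidean(c["features"], features))
    return best["algorithm"]

# A dataset with many missing values matches the high-missing-ratio case.
print(recommend((0.25, 0.03, 0.02)))
```

In a full CBR cycle the retrieved solution would then be adapted (the reuse phase the ontology supports) rather than applied verbatim.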

    Development of benthic monitoring approaches for salmon aquaculture sites using machine learning, hydroacoustic data and bacterial eDNA

    Intensive caged salmon production can lead to localized perturbations of the seafloor environment where organic waste (flocculent matter) accumulates and disrupts ecological processes. As the aquaculture industry expands, the development of tools to rapidly detect changes in seafloor condition is critical. Here, we examine whether applying machine learning to two types of monitoring data could improve environmental assessments at aquaculture sites in Newfoundland. First, we apply machine learning to single beam echosounder data to detect flocculent matter at aquaculture sites over larger areas than currently achieved using drop-camera imaging. Then, we use machine learning to categorize sediments by levels of disturbance based on bacterial tetranucleotide frequency distributions generated from environmental DNA. While echosounder data can detect flocculent matter with moderate success in this region, bacterial tetranucleotide frequencies are highly effective classifiers of benthic disturbance; this simplified environmental DNA-based approach could be implemented within novel aquaculture benthic monitoring pipelines.
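The feature-extraction step behind the tetranucleotide approach can be sketched as follows: turn a DNA sequence into a 4-mer frequency vector over the 256 possible tetranucleotides, which a classifier can then use to categorize disturbance levels. The sequence below is a toy example, not real eDNA data.

```python
from collections import Counter
from itertools import product

def tetranucleotide_frequencies(seq):
    """Return the frequency of each of the 256 possible 4-mers in seq,
    using a sliding window of width 4 over the sequence."""
    kmers = [seq[i:i + 4] for i in range(len(seq) - 3)]
    counts = Counter(kmers)
    total = len(kmers)
    return {"".join(k): counts["".join(k)] / total
            for k in product("ACGT", repeat=4)}

# Toy sequence: "ACGT" repeated, so the 4-mer "ACGT" dominates.
freqs = tetranucleotide_frequencies("ACGTACGTACGT")
print(freqs["ACGT"])
```

Each sediment sample's 256-dimensional frequency vector would then serve as the input row for a standard supervised classifier.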

    A Learning Health System for Radiation Oncology

    The proposed research aims to address the challenges faced by clinical data science researchers in radiation oncology in accessing, integrating, and analyzing heterogeneous data from various sources. The research presents a scalable intelligent infrastructure, called the Health Information Gateway and Exchange (HINGE), which captures and structures data from multiple sources into a knowledge base with semantically interlinked entities. This infrastructure enables researchers to mine novel associations and gather relevant knowledge for personalized clinical outcomes. The dissertation discusses the design framework and implementation of HINGE, which abstracts structured data from treatment planning systems, treatment management systems, and electronic health records. It utilizes disease-specific smart templates for capturing clinical information in a discrete manner. HINGE performs data extraction, aggregation, and quality and outcome assessment functions automatically, connecting seamlessly with local IT/medical infrastructure. Furthermore, the research presents a knowledge graph-based approach to map radiotherapy data to an ontology-based data repository using FAIR (Findable, Accessible, Interoperable, Reusable) concepts. This approach ensures that the data is easily discoverable and accessible for clinical decision support systems. The dissertation explores the ETL (Extract, Transform, Load) process, data model frameworks, and ontologies, and provides a real-world clinical use case for this data mapping. To improve the efficiency of retrieving information from large clinical datasets, a search engine based on ontology-driven keyword searching and synonym-based term matching was developed. The hierarchical nature of ontologies is leveraged to retrieve patient records based on parent and children classes.
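The hierarchy-aware retrieval just described can be sketched as: expand a query class to itself plus all of its descendant classes in the ontology, then match patient records tagged with any class in that set. The tiny ontology and record tags below are illustrative assumptions, not drawn from HINGE.

```python
# Toy ontology: parent class -> direct children (hypothetical class names).
ontology = {
    "neoplasm": ["lung neoplasm", "prostate neoplasm"],
    "lung neoplasm": ["small cell lung cancer", "non-small cell lung cancer"],
}

def descendants(term):
    """Return the term plus all of its descendant classes, recursively."""
    result = {term}
    for child in ontology.get(term, []):
        result |= descendants(child)
    return result

# Toy patient records tagged with ontology classes.
records = {
    "patient-001": {"non-small cell lung cancer"},
    "patient-002": {"prostate neoplasm"},
    "patient-003": {"small cell lung cancer"},
}

def search(term):
    """Return record IDs tagged with the query class or any descendant."""
    terms = descendants(term)
    return sorted(pid for pid, tags in records.items() if tags & terms)

print(search("lung neoplasm"))
```

Searching a parent class thus retrieves records annotated only with its children, which is the benefit of leveraging the ontology hierarchy.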
Additionally, patient similarity analysis is conducted using vector embedding models (Word2Vec, Doc2Vec, GloVe, and FastText) to identify similar patients based on text corpus creation methods. Results from the analysis using these models are presented. The implementation of a learning health system for predicting radiation pneumonitis following stereotactic body radiotherapy is also discussed. 3D convolutional neural networks (CNNs) are utilized with radiographic and dosimetric datasets to predict the likelihood of radiation pneumonitis. DenseNet-121 and ResNet-50 models are employed for this study, along with integrated gradient techniques to identify salient regions within the input 3D image dataset. The predictive performance of the 3D CNN models is evaluated based on clinical outcomes. Overall, the proposed Learning Health System provides a comprehensive solution for capturing, integrating, and analyzing heterogeneous data in a knowledge base. It offers researchers the ability to extract valuable insights and associations from diverse sources, ultimately leading to improved clinical outcomes. This work can serve as a model for implementing LHS in other medical specialties, advancing personalized and data-driven medicine.
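The patient-similarity idea above can be sketched with plain bag-of-words vectors and cosine similarity; the dissertation uses trained embedding models (Word2Vec, Doc2Vec, GloVe, FastText), but the retrieval logic is the same: represent each patient's text corpus as a vector, then rank patients by vector similarity. The patient notes below are invented placeholders.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in set(a) & set(b))
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Hypothetical free-text snippets standing in for patient text corpora.
patients = {
    "P1": "stereotactic body radiotherapy lung lesion pneumonitis grade 2",
    "P2": "prostate radiotherapy hormone therapy",
    "P3": "lung sbrt radiation pneumonitis steroids",
}

def most_similar(query_id):
    """Return the ID of the patient whose corpus is closest to the query's."""
    q = Counter(patients[query_id].split())
    others = [(pid, cosine(q, Counter(txt.split())))
              for pid, txt in patients.items() if pid != query_id]
    return max(others, key=lambda x: x[1])[0]

print(most_similar("P1"))
```

A trained embedding model would replace the raw term counts with dense vectors that also capture synonyms (e.g. "sbrt" vs. "stereotactic body radiotherapy"), which simple counting cannot.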

    Wearable Sensors Applied in Movement Analysis

    Recent advances in electronics have led to sensors whose sizes and weights are such that they can be placed on living systems without impairing their natural motion and habits. They may be worn on the body as accessories or as part of the clothing and enable personalized mobile information processing. Wearable sensors open the way for nonintrusive and continuous monitoring of body orientation, movements, and various physiological parameters during motor activities in real-life settings. Thus, they may become crucial tools not only for researchers, but also for clinicians, as they have the potential to improve diagnosis, better monitor disease development, and thereby individualize treatment. Wearable sensors should obviously go unnoticed by the people wearing them and be intuitive to install. They should come with wireless connectivity and low power consumption. Moreover, the electronics system should be self-calibrating and deliver correct information that is easy to interpret. Cross-platform interfaces that provide secure data storage and easy data analysis and visualization are needed. This book contains a selection of research papers presenting new results addressing the above challenges.

    The value of magnetic resonance imaging in the assessment of degenerative lumbar spinal stenosis

    This thesis explores the role of magnetic resonance imaging (MRI) of the lumbar spine in patients with the main clinical feature of lumbar spinal stenosis (LSS): neurogenic claudication (NC). NC is thought to be caused by positional compression of the cauda equina in a spinal canal narrowed by degenerative change. MRI is the primary tool for demonstrating such degeneration, but no universally accepted, evidence-based imaging definition of LSS exists. Systematic reviews of the literature are presented: the first finds that the available studies comparing MRIs in NC patients to a control group have unsuitable methodologies for proposing a definition of stenosis, largely due to the use of imaging-based inclusion criteria. The second finds that the strength of the relationship between canal size and symptom severity in LSS patients is inconsistent across studies, with most papers using surgical patient cohorts likely to exclude those with minor symptoms. A diagnostic cross-sectional study, including both community- and secondary-care-based participants, is described, comparing MRIs in participants with NC and a separately recruited control group. Unlike prior studies, NC patients are selected for inclusion based upon their clinical presentation alone. NC patients are found to have smaller canals than the control group, but measurements of canal narrowing and qualitative judgements of nerve root compression generally fail to accurately predict NC symptoms, and various methods of combining the measurements, including machine learning techniques, fail to improve diagnostic accuracy. No convincing relationship between symptom severity and canal size is identified. A definition for radiological LSS is proposed for the central canal (grade C, Schizas et al. 2010), the lateral recess (grade 2 nerve root entrapment, Bartynski et al. 2003), and the neural exit foramen (neural exit foramen depth less than 4 mm), based upon the best-performing measurements and other pragmatic considerations.