7 research outputs found
Framework for data quality in knowledge discovery tasks
The creation and consumption of data continue to grow by leaps and bounds. Due to advances in Information and Communication Technologies (ICT), the data explosion in the digital universe is a new trend. Knowledge Discovery in Databases (KDD) has gained importance due to the abundance of available data. A successful knowledge discovery process requires preparing the data first: experts affirm that the preprocessing phase takes 50% to 70% of the total time of a knowledge discovery process.
Software tools based on popular knowledge discovery methodologies offer algorithms for data preprocessing. According to the Gartner 2018 Magic Quadrant for Data Science and Machine Learning Platforms, KNIME, RapidMiner, SAS, Alteryx, and H2O.ai are the leading tools for knowledge discovery. These tools provide a range of techniques that facilitate the evaluation of a dataset; however, they lack any kind of guidance as to which techniques can or should be used in which contexts. Consequently, selecting suitable data cleaning techniques is a headache for inexpert users, who have no idea which methods can be confidently used and often resort to trial and error.
This thesis presents three contributions to address the problems mentioned above: (i) a conceptual framework that guides the user in addressing data quality issues in knowledge discovery tasks, (ii) a case-based reasoning system that recommends suitable algorithms for data cleaning, and (iii) an ontology that represents knowledge about data quality issues and data cleaning methods. This ontology also supports the case-based reasoning system in the case representation and reuse phases.
Programa Oficial de Doctorado en Ciencia y Tecnología Informática. Committee: President: Fernando Fernández Rebollo; Secretary: Gustavo Adolfo Ramírez; Member: Juan Pedro Caraça-Valente Hernández
Development of benthic monitoring approaches for salmon aquaculture sites using machine learning, hydroacoustic data and bacterial eDNA
Intensive caged salmon production can lead to localized perturbations of the seafloor environment where organic waste (flocculent matter) accumulates and disrupts ecological processes. As the aquaculture industry expands, the development of tools to rapidly detect changes in seafloor condition is critical. Here, we examine whether applying machine learning to two types of monitoring data could improve environmental assessments at aquaculture sites in Newfoundland. First, we apply machine learning to single beam echosounder data to detect flocculent matter at aquaculture sites over larger areas than is currently achieved using drop-camera imaging. Then, we use machine learning to categorize sediments by levels of disturbance based on bacterial tetranucleotide frequency distributions generated from environmental DNA. While echosounder data can detect flocculent matter with moderate success in this region, bacterial tetranucleotide frequencies are highly effective classifiers of benthic disturbance; this simplified environmental DNA-based approach could be implemented within novel aquaculture benthic monitoring pipelines.
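The feature-extraction step behind the tetranucleotide approach can be sketched as follows: each sequence is reduced to a 256-dimensional vector of normalized 4-mer frequencies, which a downstream classifier (not shown here) would use to categorize disturbance levels. The sequence and function names are illustrative only:

```python
from collections import Counter
from itertools import product

def tetranucleotide_frequencies(sequence):
    """Return the normalized frequency of every 4-mer in a DNA sequence."""
    kmers = ["".join(p) for p in product("ACGT", repeat=4)]  # all 256 possible 4-mers
    # Slide a window of length 4 over the sequence and count each 4-mer.
    counts = Counter(sequence[i:i + 4] for i in range(len(sequence) - 3))
    total = max(sum(counts.values()), 1)
    return {k: counts[k] / total for k in kmers}

freqs = tetranucleotide_frequencies("ACGTACGTACGT")
print(freqs["ACGT"])  # "ACGT" fills 3 of the 9 sliding windows
```

In a full pipeline, vectors like these would be computed per sediment sample and fed to a supervised classifier trained on disturbance labels.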
A Learning Health System for Radiation Oncology
The proposed research aims to address the challenges faced by clinical data science researchers in radiation oncology when accessing, integrating, and analyzing heterogeneous data from various sources. The research presents a scalable intelligent infrastructure, called the Health Information Gateway and Exchange (HINGE), which captures and structures data from multiple sources into a knowledge base with semantically interlinked entities. This infrastructure enables researchers to mine novel associations and gather relevant knowledge for personalized clinical outcomes.
The dissertation discusses the design framework and implementation of HINGE, which abstracts structured data from treatment planning systems, treatment management systems, and electronic health records. It utilizes disease-specific smart templates for capturing clinical information in a discrete manner. HINGE performs data extraction, aggregation, and quality and outcome assessment functions automatically, connecting seamlessly with local IT/medical infrastructure.
Furthermore, the research presents a knowledge graph-based approach to map radiotherapy data to an ontology-based data repository using FAIR (Findable, Accessible, Interoperable, Reusable) concepts. This approach ensures that the data is easily discoverable and accessible for clinical decision support systems. The dissertation explores the ETL (Extract, Transform, Load) process, data model frameworks, ontologies, and provides a real-world clinical use case for this data mapping.
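The mapping step described above, taking a structured radiotherapy record into an ontology-based repository, can be sketched as the construction of subject-predicate-object triples. All term names here (`rt:treatmentSite`, etc.) are hypothetical placeholders, not the actual HINGE or ontology vocabulary:

```python
# Hypothetical structured radiotherapy record, e.g. abstracted from a
# treatment planning system during the ETL process.
record = {"patient_id": "P001", "site": "lung", "dose_gy": 60.0, "fractions": 30}

def to_triples(rec):
    """Map a flat record to subject-predicate-object triples."""
    subject = f"patient:{rec['patient_id']}"
    return [
        (subject, "rt:treatmentSite", rec["site"]),
        (subject, "rt:prescribedDoseGy", rec["dose_gy"]),
        (subject, "rt:numberOfFractions", rec["fractions"]),
    ]

triples = to_triples(record)
print(triples[0])  # ('patient:P001', 'rt:treatmentSite', 'lung')
```

A production system would emit RDF with resolvable IRIs so the data satisfies the FAIR findability and interoperability criteria; tuples stand in for that here.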
To improve the efficiency of retrieving information from large clinical datasets, a search engine based on ontology-based keyword searching and synonym-based term matching tool was developed. The hierarchical nature of ontologies is leveraged to retrieve patient records based on parent and children classes. Additionally, patient similarity analysis is conducted using vector embedding models (Word2Vec, Doc2Vec, GloVe, and FastText) to identify similar patients based on text corpus creation methods. Results from the analysis using these models are presented.
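The patient similarity analysis described above ranks patients by the closeness of their embedding vectors. A minimal sketch of that ranking step, assuming the vectors have already been produced by a model such as Doc2Vec (the patient IDs and vectors below are invented):

```python
from math import sqrt

def cosine_similarity(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

# Hypothetical patient embeddings (e.g., Doc2Vec vectors of clinical notes).
patients = {
    "P1": [0.9, 0.1, 0.0],
    "P2": [0.8, 0.2, 0.1],
    "P3": [0.0, 0.1, 0.9],
}

query = patients["P1"]
ranked = sorted((p for p in patients if p != "P1"),
                key=lambda p: cosine_similarity(query, patients[p]),
                reverse=True)
print(ranked)  # ['P2', 'P3']: P2's vector points nearly the same way as P1's
```

Real embeddings are typically hundreds of dimensions, but the ranking logic is unchanged.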
The implementation of a learning health system for predicting radiation pneumonitis following stereotactic body radiotherapy is also discussed. 3D convolutional neural networks (CNNs) are utilized with radiographic and dosimetric datasets to predict the likelihood of radiation pneumonitis. DenseNet-121 and ResNet-50 models are employed for this study, along with integrated gradient techniques to identify salient regions within the input 3D image dataset. The predictive performance of the 3D CNN models is evaluated based on clinical outcomes.
Overall, the proposed Learning Health System provides a comprehensive solution for capturing, integrating, and analyzing heterogeneous data in a knowledge base. It offers researchers the ability to extract valuable insights and associations from diverse sources, ultimately leading to improved clinical outcomes. This work can serve as a model for implementing an LHS in other medical specialties, advancing personalized and data-driven medicine.
Wearable Sensors Applied in Movement Analysis
Recent advances in electronics have led to sensors whose sizes and weights are such that they can be placed on living systems without impairing their natural motion and habits. They may be worn on the body as accessories or as part of the clothing and enable personalized mobile information processing. Wearable sensors open the way for nonintrusive and continuous monitoring of body orientation, movements, and various physiological parameters during motor activities in real-life settings. Thus, they may become crucial tools not only for researchers but also for clinicians, as they have the potential to improve diagnosis, better monitor disease development, and thereby individualize treatment. Wearable sensors should obviously go unnoticed by the people wearing them and be intuitive to install. They should come with wireless connectivity and low power consumption. Moreover, the electronics system should be self-calibrating and deliver correct information that is easy to interpret. Cross-platform interfaces that provide secure data storage and easy data analysis and visualization are needed. This book contains a selection of research papers presenting new results addressing the above challenges.
The value of magnetic resonance imaging in the assessment of degenerative lumbar spinal stenosis
This thesis explores the role of magnetic resonance imaging (MRI) of the lumbar spine in patients with the main clinical feature of lumbar spinal stenosis (LSS): neurogenic claudication (NC). NC is thought to be caused by positional compression of the cauda equina in a spinal canal narrowed by degenerative change. MRI is the primary tool for demonstrating such degeneration but no universally accepted and evidence based imaging definition of LSS exists.
Systematic reviews of the literature are presented: the first finds that the available studies comparing MRIs in NC patients to a control group have methodologies unsuitable for proposing a definition of stenosis, largely due to the use of imaging-based inclusion criteria. The second finds that the strength of the relationship between canal size and symptom severity in LSS patients is inconsistent across studies, but most papers use surgical patient cohorts, which are likely to exclude those with minor symptoms.
A diagnostic cross-sectional study, including both community and secondary care based participants, is described, comparing MRIs in participants with NC and a separately recruited control group. Unlike prior studies, NC patients are selected for inclusion based upon their clinical presentation alone. NC patients are found to have smaller canals than the control group, but measurements of canal narrowing and qualitative judgements of nerve root compression generally fail to accurately predict NC symptoms, and various methods of combining the measurements, including machine learning techniques, fail to improve diagnostic accuracy. No convincing relationship between symptom severity and canal size is identified.
A definition for radiological LSS is proposed for the central canal (grade C, Schizas et al. 2010), the lateral recess (grade 2 nerve root entrapment, Bartynski et al. 2003), and the neural exit foramen (foraminal depth less than 4 mm), based upon the best performing measurements and other pragmatic considerations.