Machine Learning Framework for Real-World Electronic Health Records Regarding Missingness, Interpretability, and Fairness
Machine learning (ML) and deep learning (DL) techniques have shown promising results in healthcare applications using Electronic Health Record (EHR) data. However, their adoption in real-world healthcare settings is hindered by three major challenges. First, real-world EHR data typically contain numerous missing values. Second, traditional ML/DL models are usually treated as black boxes, whereas interpretability is required for real-world healthcare applications. Finally, differences in data distributions may lead to unfairness and performance disparities across subpopulations.
This dissertation proposes methods to address these missing-data, interpretability, and fairness issues. The first work proposes an ensemble prediction framework for EHR data with high missing rates that builds multiple subsets with lower missing rates. The second method integrates medical knowledge graphs and a double attention mechanism with the long short-term memory (LSTM) model to enhance interpretability by providing knowledge-based model interpretation. The third method develops an LSTM variant that integrates medical knowledge graphs and additional time-aware gates to handle multi-variable temporal missingness and interpretability concerns. Finally, a transformer-based model is proposed to learn unbiased and fair representations of diverse subpopulations using domain classifiers and three attention mechanisms.
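As a rough illustration of the first method's idea (an ensemble built from feature subsets with lower missing rates), the following Python sketch uses scikit-learn; the subset selection, median imputation, and probability averaging are simplifying assumptions, not the dissertation's actual algorithm.

    # Hedged sketch: ensemble over low-missing-rate feature subsets (illustrative only).
    import numpy as np
    import pandas as pd
    from sklearn.linear_model import LogisticRegression

    def fit_subset_ensemble(X: pd.DataFrame, y, max_missing=0.2, n_subsets=5, seed=0):
        rng = np.random.default_rng(seed)
        usable = [c for c in X.columns if X[c].isna().mean() <= max_missing]  # columns with low missing rates
        models = []
        for _ in range(n_subsets):
            cols = list(rng.choice(usable, size=max(1, len(usable) // 2), replace=False))
            med = X[cols].median()                                # simple per-subset imputation values
            clf = LogisticRegression(max_iter=1000).fit(X[cols].fillna(med), y)
            models.append((cols, med, clf))
        return models

    def predict_subset_ensemble(models, X: pd.DataFrame):
        # Average the predicted risk across the subset models.
        return np.mean([clf.predict_proba(X[cols].fillna(med))[:, 1] for cols, med, clf in models], axis=0)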
Parallelizing support vector machines for scalable image annotation
This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University. Machine learning techniques have facilitated image retrieval by automatically classifying and annotating images with keywords. Among them, Support Vector Machines (SVMs) are used extensively due to their generalization properties. However, SVM training is a computationally intensive process, especially when the training dataset is large.
In this thesis, distributed computing paradigms have been investigated to speed up SVM training by partitioning a large training dataset into small data chunks and processing each chunk in parallel using the resources of a cluster of computers. A resource-aware parallel SVM algorithm is introduced for large-scale image annotation on a cluster of computers. A genetic algorithm based load-balancing scheme is designed to optimize the performance of the algorithm in heterogeneous computing environments.
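A minimal analogue of the chunk-wise parallel training described here, assuming scikit-learn and joblib as stand-ins for the thesis's cluster implementation (the resource-aware scheduling and genetic load balancing are not shown):

    # Illustrative sketch: split the training set into chunks and fit one SVM per chunk in parallel.
    import numpy as np
    from joblib import Parallel, delayed
    from sklearn.svm import SVC

    def _fit_svm(X_chunk, y_chunk):
        return SVC(kernel="rbf").fit(X_chunk, y_chunk)

    def train_chunked_svms(X, y, n_chunks=4, n_jobs=4):
        chunks = zip(np.array_split(X, n_chunks), np.array_split(y, n_chunks))
        return Parallel(n_jobs=n_jobs)(delayed(_fit_svm)(Xc, yc) for Xc, yc in chunks)

    def predict_majority(models, X):
        votes = np.stack([m.predict(X) for m in models]).astype(int)   # one row of predicted labels per model
        return np.apply_along_axis(lambda v: np.bincount(v).argmax(), 0, votes)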
SVM was initially designed for binary classification. However, most classification problems arising in domains such as image annotation usually involve more than two classes. A resource-aware parallel multiclass SVM algorithm for large-scale image annotation using a cluster of computers is therefore introduced.
The combination of classifiers leads to a substantial reduction of classification error in a wide range of applications. Among such combinations, SVM ensembles with bagging have been shown to outperform a single SVM in terms of classification accuracy. However, training SVM ensembles is a computationally intensive process, especially when the number of replicated samples generated by bootstrapping is large. A distributed SVM ensemble algorithm for image annotation is introduced which re-samples the training data by bootstrapping and trains an SVM on each sample in parallel using a cluster of computers.
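For the bagged-SVM idea, scikit-learn's BaggingClassifier offers a compact (if simplified) counterpart to the distributed ensemble described above; bootstrapping and parallel fitting are handled by the library rather than a cluster of computers.

    from sklearn.ensemble import BaggingClassifier
    from sklearn.svm import SVC

    # Bootstrap-resampled SVM ensemble; n_jobs=-1 fits the base SVMs in parallel processes.
    bagged_svm = BaggingClassifier(SVC(kernel="rbf"), n_estimators=10, bootstrap=True, n_jobs=-1)
    # bagged_svm.fit(X_train, y_train); bagged_svm.predict(X_test)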
The above algorithms are evaluated in both experimental and simulation environments, showing that the distributed SVM algorithm, the distributed multiclass SVM algorithm, and the distributed SVM ensemble algorithm reduce the training time significantly while maintaining a high level of classification accuracy.
New Fundamental Technologies in Data Mining
The progress of data mining technology and its broad public popularity establish a need for a comprehensive text on the subject. The book series entitled "Data Mining" addresses this need by presenting in-depth descriptions of novel mining algorithms and many useful applications. Beyond explaining each topic in depth, the two books provide useful hints and strategies for solving the problems discussed in the subsequent chapters. The contributing authors have highlighted many future research directions that will foster multi-disciplinary collaborations and hence lead to significant development in the field of data mining.
A decision-making tool for real-time prediction of dynamic positioning reliability index
The Dynamic Positioning (DP) system is a complex system with significant levels of integration between many sub-systems that perform diverse control functions. The extent of information managed by each sub-system is enormous, and the sophisticated level of integration between sub-systems creates an array of possible failure scenarios. A systematic analysis of all failure scenarios would be time-consuming, and handling any such catastrophic situation is hugely demanding for an operator. There are many accidents in which a failure in a DP system has resulted in fatalities and environmental pollution. Therefore, the reliability assessment of a DP system is critical for safe and efficient operation. The existing methods are time-consuming and involve considerable human effort, which imposes built-in uncertainty and risk in the system during complex operations.
This thesis proposes a framework for a state-of-the-art decision-making tool to assist the operator and prevent incidents by introducing a new concept, the Dynamic Positioning Reliability Index (DP-RI). The DP-RI concept covers three phases, leading to technical suggestions for the operator during complex operations, which are defined as Data, Knowledge, Intelligence, and Action. The proposed framework covers descriptive, diagnostic, predictive and prescriptive analytics. The first phase of the research involves descriptive and diagnostic analytics, performing big data analytics on the available databases to identify the sub-systems that play critical roles in DP system functionality. The second phase involves a novel approach in which predictive analytics are used for the weight assignment of the sub-systems, dynamic reliability modelling, and offline and real-time forecasting of the DP-RI. The third phase introduces prescriptive analytics to provide possible technical solutions to the operator within a short time during failures in the system, enabling them to respond quickly and prevent DP incidents. Thus, the DP-RI acts as an innovative, state-of-the-art decision-making tool that can suggest possible solutions to the DPO by applying analytics to the knowledge database. The results indicate that it would be a useful tool if implemented on an actual vessel with diligent integration with the DP control system.
Funding: Singapore Economic Development Board (EDB) and DNV GL Singapore Pte Ltd.
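Purely as an illustration of how a single reliability index might be composed from weighted sub-system reliabilities (the thesis's actual weighting comes from predictive analytics; the sub-system names and numbers below are hypothetical):

    def dp_reliability_index(reliability: dict, weights: dict) -> float:
        """Weighted average of per-sub-system reliabilities, scaled to 0-100 (illustrative only)."""
        total = sum(weights.values())
        return round(100 * sum(weights[s] * reliability[s] for s in weights) / total, 1)

    # Hypothetical sub-system reliabilities (0-1) and analyst-assigned weights.
    print(dp_reliability_index(
        {"thrusters": 0.98, "power": 0.95, "position_reference": 0.90, "control": 0.96},
        {"thrusters": 0.30, "power": 0.30, "position_reference": 0.25, "control": 0.15},
    ))  # -> 94.8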
Fault diagnosis and scheduling optimization in an e-maintenance framework
The main objective is to demonstrate the potential for improvement that techniques and methodologies related to prescriptive analytics can bring to industrial maintenance applications. The technologies developed can be grouped into three areas:
- E-maintenance, fundamentally concerned with the development of collaborative and intelligent platforms that allow the integration of new sensors, communication systems, standards and protocols, concepts, storage and analysis methods, etc., which continuously expand the range of possibilities and make it possible to keep improving asset and process optimization and interoperability between systems.
- Bayesian Networks (BNs), which, together with other information-gathering methodologies used in engineering, make it possible to automate fault diagnosis and prediction, as sketched below.
- The optimization of maintenance strategies through failure simulations and cost-effectiveness analyses, which support decision-making when selecting a suitable maintenance strategy for an asset. In addition, optimization algorithms are used to improve maintenance scheduling, reducing the time and cost of carrying out tasks across a fleet of assets.
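As a small, hedged illustration of the Bayesian-network diagnosis idea (not the thesis's actual models), a toy fault network in pgmpy could look like this; the variables and probabilities are invented:

    from pgmpy.models import BayesianNetwork
    from pgmpy.factors.discrete import TabularCPD
    from pgmpy.inference import VariableElimination

    # Toy network: a hidden Wear state influences observed Vibration and eventual Failure.
    model = BayesianNetwork([("Wear", "Vibration"), ("Wear", "Failure")])
    model.add_cpds(
        TabularCPD("Wear", 2, [[0.9], [0.1]]),
        TabularCPD("Vibration", 2, [[0.95, 0.3], [0.05, 0.7]], evidence=["Wear"], evidence_card=[2]),
        TabularCPD("Failure", 2, [[0.99, 0.6], [0.01, 0.4]], evidence=["Wear"], evidence_card=[2]),
    )
    infer = VariableElimination(model)
    # Posterior probability of failure given that abnormal vibration was observed.
    print(infer.query(["Failure"], evidence={"Vibration": 1}))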
Explainable AI for clinical risk prediction: a survey of concepts, methods, and modalities
Recent advancements in AI applications to healthcare have shown incredible promise in surpassing human performance in diagnosis and disease prognosis. With the increasing complexity of AI models, however, concerns have arisen regarding their opacity, potential biases, and the need for interpretability. To ensure trust and reliability in AI systems, especially in clinical risk prediction models, explainability becomes crucial. Explainability usually refers to an AI system's ability to provide a robust interpretation of its decision-making logic, or of the decisions themselves, to human stakeholders. In clinical risk prediction, other aspects of explainability such as fairness, bias, trust, and transparency also represent important concepts beyond interpretability alone. In this review, we address the relationship between these concepts, as they are often used together or interchangeably. The review also discusses recent progress in developing explainable models for clinical risk prediction, highlighting the importance of quantitative and clinical evaluation and validation across multiple modalities common in clinical practice. It emphasizes the need for external validation and for combining diverse interpretability methods to enhance trust and fairness. Adopting rigorous testing, such as using synthetic datasets with known generative factors, can further improve the reliability of explainability methods. Open access and code-sharing resources are essential for transparency and reproducibility, enabling the growth and trustworthiness of explainable research. While challenges exist, an end-to-end approach to explainability in clinical risk prediction, incorporating stakeholders from clinicians to developers, is essential for success.
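To make the interpretability discussion concrete, one commonly used post-hoc route for tabular clinical risk models is SHAP applied to a tree-based classifier; the sketch below uses a public scikit-learn dataset as a stand-in for clinical data and is not drawn from the survey itself:

    import shap
    from sklearn.datasets import load_breast_cancer
    from sklearn.ensemble import GradientBoostingClassifier
    from sklearn.model_selection import train_test_split

    X, y = load_breast_cancer(return_X_y=True, as_frame=True)      # stand-in tabular data, not EHR
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
    model = GradientBoostingClassifier().fit(X_tr, y_tr)
    explainer = shap.TreeExplainer(model)                           # model-specific, post-hoc explainer
    shap_values = explainer.shap_values(X_te)                       # per-sample, per-feature attributions
    shap.summary_plot(shap_values, X_te)                            # global view of feature influence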
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
In the last few years, Artificial Intelligence (AI) has achieved notable momentum that, if harnessed appropriately, may deliver the best of expectations across many application sectors. For this to occur in Machine Learning, the entire community stands in front of the barrier of explainability, an inherent problem of the latest techniques brought by sub-symbolism (e.g. ensembles or Deep Neural Networks) that were not present in the last hype of AI (namely, expert systems and rule-based models). Paradigms underlying this problem fall within the so-called eXplainable AI (XAI) field, which is widely acknowledged as a crucial feature for the practical deployment of AI models. The overview presented in this article examines the existing literature and contributions already made in the field of XAI, including a prospect toward what is yet to be reached. For this purpose we summarize previous efforts made to define explainability in Machine Learning, establishing a novel definition of explainable Machine Learning that covers such prior conceptual propositions with a major focus on the audience for which explainability is sought. Departing from this definition, we propose and discuss a taxonomy of recent contributions related to the explainability of different Machine Learning models, including those aimed at explaining Deep Learning methods, for which a second dedicated taxonomy is built and examined in detail. This critical literature analysis serves as the motivating background for a series of challenges faced by XAI, such as the interesting crossroads of data fusion and explainability. Our prospects lead toward the concept of Responsible Artificial Intelligence, namely, a methodology for the large-scale implementation of AI methods in real organizations with fairness, model explainability and accountability at its core. Our ultimate goal is to provide newcomers to the field of XAI with a thorough taxonomy that can serve as reference material to stimulate future research advances, but also to encourage experts and professionals from other disciplines to embrace the benefits of AI in their activity sectors without any prior bias against its lack of interpretability.
Funding: Basque Government; Consolidated Research Group MATHMODE, Department of Education of the Basque Government (IT1294-19); Spanish Government; European Commission (TIN2017-89517-P); BBVA Foundation, Ayudas Fundación BBVA a Equipos de Investigación Científica 2018 call (DeepSCOP project); European Commission 82561.
Multi-Agent Systems
A multi-agent system (MAS) is a system composed of multiple interacting intelligent agents. Multi-agent systems can be used to solve problems that are difficult or impossible for an individual agent or a monolithic system to solve. Agent systems are open and extensible systems that allow for the deployment of autonomous and proactive software components. Multi-agent systems have been developed and applied in several application domains.