521 research outputs found
Data Mining
Data mining is a branch of computer science that is used to automatically extract meaningful, useful knowledge and previously unknown, hidden, interesting patterns from a large amount of data to support the decision-making process. This book presents recent theoretical and practical advances in the field of data mining. It discusses a number of data mining methods, including classification, clustering, and association rule mining. This book brings together many different successful data mining studies in various areas such as health, banking, education, software engineering, animal science, and the environment.
Applicability and Interpretability of Logical Analysis of Data in Condition Based Maintenance
Summary
This thesis studies the applicability and adaptability of an artificial-intelligence-based data mining approach, proposed in [Hammer, 1986] and called Logical Analysis of Data (LAD), to diagnostic applications in the field of condition-based maintenance (CBM). Most of the technologies used to date for decision making in condition-based maintenance tend to automate the diagnostic process without offering any added knowledge that could be useful to the maintenance operation and to maintenance personnel. Compared with other decision-making techniques in CBM, LAD has two major advantages: (1) it is a non-statistical approach, so the data need not satisfy statistical assumptions, and (2) it generates interpretable patterns that could help solve maintenance problems. This research presents a study on the application of LAD in condition-based maintenance whose objectives are (1) to study the applicability of LAD in different situations that require special considerations regarding the types of input data and the maintenance decisions, (2) to adapt the LAD method to the particular requirements that arise from these applications, and (3) to improve the LAD methodology in order to increase diagnostic accuracy and the interpretability of results.
The innovative aspects of the research presented in this thesis are (1) the application of LAD in CBM for the first time, in applications that benefit from the unique properties of this technology, and (2) the innovative modifications to the LAD methodology, particularly in the area of pattern generation, in order to improve its performance within CBM and in multiclass classification.
The research conducted in this thesis followed an evolutionary approach in order to achieve the objectives stated above. LAD was used and adapted in three applications: (1) the detection of rogue components in the repairable spare-parts inventory of a commercial airline, (2) the detection and identification of faults in power transformers using DGA, and (3) the detection of faults in rotors using vibration signals. This research concludes that LAD is a promising decision-making approach that adds important benefits to the implementation of CBM in industry.
Abstract
This thesis studies the applicability and adaptability of a data mining artificial intelligence
approach called Logical Analysis of Data (LAD) to diagnostic applications in Condition Based
Maintenance (CBM). Most of the technologies used so far for decision support in CBM tend to
automate the diagnostic process without offering any added knowledge that could be helpful to
the maintenance operation and maintenance personnel. LAD possesses two key advantages over
other decision making technologies used in CBM: (1) it is a non-statistical approach; as such no
statistical assumptions are required for the input data, and (2) it generates interpretable patterns
that could help solve maintenance problems. A study on the implementation of LAD in CBM is
presented in this research, whose objectives are to study the applicability of LAD in different CBM
situations requiring special considerations regarding the types of input data and maintenance
decisions, adapt the LAD methodology to the particular requirements that arise from these
applications, and improve the LAD methodology in line with the above two objectives in order to
increase diagnosis accuracy and result interpretability.
The novelty of the research presented in this thesis is (1) the application of LAD to CBM for the
first time in applications that stand to benefit from the advantages that this technology provides;
and (2) the innovative modifications to LAD methodology, particularly in the area of pattern
generation, in order to improve its performance within the context of CBM.
The research conducted in this thesis followed an evolutionary approach in order to achieve the
objectives stated in the Introduction. The research applied LAD in three applications: (1) the
detection of Rogue components within the spare part inventory of reparable components in a
commercial airline company, (2) the detection and identification of faults in power transformers
using DGA, and (3) the detection of faults in rotor bearings using vibration signals. This research
concludes that LAD is a promising decision making approach that adds important benefits to the
implementation of CBM in the industry.
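To make the notion of an interpretable pattern concrete, the following is a minimal sketch, not the thesis's implementation. It assumes the observations have already been binarized, and it uses the standard LAD notion of a pure positive pattern: a conjunction of literals that covers at least one positive observation and no negative observation. The feature names (`high_vibration`, `high_temperature`) and the toy data are hypothetical.

```python
from typing import Dict, List

# An observation maps a binarized feature name to its value.
Observation = Dict[str, bool]
# A term maps a feature name to the value it requires; it reads as a conjunction of literals.
Term = Dict[str, bool]


def covers(term: Term, obs: Observation) -> bool:
    """Return True if the observation satisfies every literal in the term."""
    return all(obs[feature] == value for feature, value in term.items())


def is_positive_pattern(term: Term,
                        positives: List[Observation],
                        negatives: List[Observation]) -> bool:
    """A pure positive pattern covers at least one positive and no negative observation."""
    covers_some_positive = any(covers(term, o) for o in positives)
    covers_no_negative = not any(covers(term, o) for o in negatives)
    return covers_some_positive and covers_no_negative


def prevalence(term: Term, positives: List[Observation]) -> float:
    """Fraction of the positive observations that the term covers."""
    return sum(covers(term, o) for o in positives) / len(positives)


# Hypothetical, already-binarized condition-monitoring data.
faulty = [
    {"high_vibration": True, "high_temperature": True},
    {"high_vibration": True, "high_temperature": False},
]
healthy = [
    {"high_vibration": False, "high_temperature": True},
    {"high_vibration": False, "high_temperature": False},
]

pattern = {"high_vibration": True}      # covers both faulty units, no healthy unit
weak_term = {"high_temperature": True}  # also covers a healthy unit, so not a pattern
```

A pattern such as `high_vibration = True` can be read directly by maintenance personnel, which is the kind of interpretability the abstract contrasts with black-box diagnostic tools.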
Machine Learning in Manufacturing towards Industry 4.0: From “For Now” to “Four-Know”
While attracting increasing research attention in science and technology, Machine Learning (ML) is playing a critical role in the digitalization of manufacturing operations towards Industry 4.0. Recently, ML has been applied in several fields of production engineering to solve a variety of tasks with different levels of complexity and performance. However, in spite of the enormous number of ML use cases, there is no guidance or standard for developing ML solutions from ideation to deployment. This paper aims to address this problem by proposing an ML application roadmap for the manufacturing industry based on the state-of-the-art published research on the topic. First, this paper presents two dimensions for formulating ML tasks, namely, “Four-Know” (Know-what, Know-why, Know-when, Know-how) and “Four-Level” (Product, Process, Machine, System). These are used to analyze ML development trends in manufacturing. Then, the paper provides an implementation pipeline starting from the very early stages of ML solution development and summarizes the available ML methods, including supervised learning methods, semi-supervised methods, unsupervised methods, and reinforcement methods, along with their typical applications. Finally, the paper discusses the current challenges during ML applications and provides an outline of possible directions for future developments.
Ascertaining Pain in Mental Health Records: Combining Empirical and Knowledge-Based Methods for Clinical Modelling of Electronic Health Record Text
In recent years, state-of-the-art clinical Natural Language Processing (NLP), as in other domains, has been dominated by neural networks and other statistical models. In contrast to the unstructured nature of Electronic Health Record (EHR) text, biomedical knowledge is increasingly available in structured and codified forms, underpinned by curated databases, machine-readable clinical guidelines, and logically defined terminologies. This thesis examines the incorporation of external medical knowledge into clinical NLP and tests these methods on a use case of ascertaining physical pain in clinical notes of mental health records. Pain is a common reason for accessing healthcare resources and has been a growing area of research, especially its impact on mental health. Pain also presents a unique NLP problem due to its ambiguous nature and the varying circumstances in which the term can be used. For these reasons, pain was chosen as the use case for the methods explored in this thesis. Models are built by assimilating both structured medical knowledge and clinical NLP and leveraging the inherent relations that exist within medical ontologies. The data source used in this project is a mental health EHR database called CRIS, which contains de-identified patient records from the South London and Maudsley NHS Foundation Trust, one of the largest mental health providers in Western Europe. A lexicon of pain terms was developed to identify documents within CRIS mentioning pain-related terms. Gold standard annotations were created by conducting manual annotations on these documents.
These gold standard annotations were used to build models for a binary classification task, with the objective of classifying sentences from the clinical text as “relevant”, which indicates the sentence contains relevant mentions of pain, i.e., physical pain affecting the patient, or “not relevant”, which indicates the sentence does not contain mentions of physical pain, or the mention does not relate to the patient (e.g., someone else in physical pain). Two models incorporating structured medical knowledge were built: (1) a transformer-based model, SapBERT, that utilises a knowledge graph of the UMLS ontology, and (2) a knowledge graph embedding model that utilises embeddings from SNOMED CT, which was then used to build a random forest classifier. This was achieved by modelling the clinical pain terms and their relations from SNOMED CT into knowledge graph embeddings, thus combining the data-driven view of clinical language with the logical view of medical knowledge. These models have been compared with NLP models (binary classifiers) that do not incorporate such structured medical knowledge: (1) a transformer-based model, BERT_base, and (2) a random forest classifier model. Of the two transformer-based models, SapBERT performed better at the classification task (F1-score: 0.98), and of the two random forest models, the one incorporating knowledge graph embeddings performed better (F1-score: 0.94).
The SapBERT model was run on sentences from a cohort of patients within CRIS, with the objective of conducting a prevalence study to understand the distribution of pain based on sociodemographic and diagnostic factors. The contribution of this research is both methodological and practical, showing the difference between a conventional NLP approach of binary classification and one that incorporates external knowledge, and further utilising the models obtained from both these approaches in a prevalence study which was designed based on inputs from clinicians and a patient and public involvement group. The results emphasise the significance of going beyond the conventional approach to NLP when addressing complex issues such as pain.
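The F1-scores quoted above are the harmonic mean of precision and recall. As a reminder of how such a score is derived from a binary classifier's confusion counts, here is a minimal sketch; the counts are hypothetical and are not results from the thesis.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """Compute F1 = 2PR / (P + R), with P = tp / (tp + fp) and R = tp / (tp + fn)."""
    if tp == 0:
        # No true positives: precision and recall are zero or undefined, so report 0.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts for sentences labelled "relevant":
# 90 correctly flagged, 10 flagged in error, 10 missed.
example = f1_score(tp=90, fp=10, fn=10)
```

With these counts precision and recall are both 0.9, so the F1-score is also 0.9, up to floating-point rounding.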
ChatCAD+: Towards a Universal and Reliable Interactive CAD using LLMs
The integration of Computer-Assisted Diagnosis (CAD) with Large Language
Models (LLMs) holds great potential in clinical applications, specifically in
the roles of digital family doctors and clinic assistants. However, current
works in this field are plagued by limitations, specifically a restricted scope
of applicable image domains and the provision of unreliable medical advice. This
restricts their overall processing capabilities. Furthermore, the mismatch in
writing style between LLMs and radiologists undermines their practical
usefulness. To tackle these challenges, we introduce ChatCAD+, which is
designed to be universal and reliable. It is capable of handling medical images
from diverse domains and leveraging up-to-date information from reputable
medical websites to provide reliable medical advice. Additionally, it
incorporates a template retrieval system that improves report generation
performance via exemplar reports, enabling seamless integration into existing
clinical workflows. The source code is available at
https://github.com/zhaozh10/ChatCAD.
Comment: Authors Zihao Zhao, Sheng Wang, Jinchen Gu, Yitao Zhu contributed
equally to this work and should be considered co-first authors.
Improving data management through automatic information extraction model in ontology for road asset management
Roads are a critical component of transportation infrastructure, and their effective maintenance is paramount in ensuring their continued functionality and safety. This research proposes a novel information management approach based on state-of-the-art deep learning models and ontologies. The approach can automatically extract, integrate, complete, and search for project knowledge buried in unstructured text documents. On the one hand, it facilitates the implementation of modern management approaches, such as advanced work packaging, to deliver successful road management projects; on the other hand, it improves information management practices in the construction industry.