    Data Mining

    Data mining is a branch of computer science concerned with automatically extracting meaningful, useful knowledge and previously unknown, hidden, and interesting patterns from large amounts of data to support decision-making. This book presents recent theoretical and practical advances in the field of data mining. It discusses a number of data mining methods, including classification, clustering, and association rule mining, and brings together successful data mining studies in areas such as health, banking, education, software engineering, animal science, and the environment.
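Association rule mining, one of the methods listed above, scores a rule A -> C by its support (the fraction of transactions containing both A and C) and its confidence (support of A and C together divided by support of A). The sketch below is a minimal, brute-force illustration of those two measures, not Apriori or FP-growth; the function names and the toy shopping baskets are hypothetical and only suitable for tiny inputs.

```python
from itertools import combinations


def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """support(A and C) / support(A), i.e. an estimate of P(C | A)."""
    joint = support(transactions, set(antecedent) | set(consequent))
    base = support(transactions, antecedent)
    return joint / base if base else 0.0


def mine_rules(transactions, min_support=0.5, min_confidence=0.6):
    """Enumerate every rule A -> C whose itemset and confidence pass the thresholds.

    Brute force over all itemsets of size >= 2: exponential, toy data only.
    """
    items = sorted({i for t in transactions for i in t})
    found = []
    for size in range(2, len(items) + 1):
        for itemset in combinations(items, size):
            s = support(transactions, itemset)
            if s < min_support:
                continue
            # Split the frequent itemset into every antecedent/consequent pair.
            for k in range(1, size):
                for ante in combinations(itemset, k):
                    cons = tuple(i for i in itemset if i not in ante)
                    conf = confidence(transactions, ante, cons)
                    if conf >= min_confidence:
                        found.append((ante, cons, s, conf))
    return found


if __name__ == "__main__":
    baskets = [{"bread", "milk"}, {"bread", "butter"},
               {"bread", "milk", "butter"}, {"milk"}]
    for ante, cons, s, conf in mine_rules(baskets):
        print(f"{ante} -> {cons}  support={s:.2f}  confidence={conf:.2f}")
```

On these four baskets the sketch keeps four rules between bread, butter, and milk; practical miners prune the search with the downward-closure property of support instead of enumerating every itemset.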

    Applicability and Interpretability of Logical Analysis of Data in Condition Based Maintenance

    This thesis studies the applicability and adaptability of Logical Analysis of Data (LAD), a data mining artificial intelligence approach proposed in [Hammer, 1986], to diagnostic applications in Condition Based Maintenance (CBM). Most of the technologies used so far for decision support in CBM tend to automate the diagnostic process without offering any added knowledge that could help the maintenance operation and maintenance personnel. LAD has two key advantages over other decision-making technologies used in CBM: (1) it is a non-statistical approach, so the input data need not satisfy any statistical assumptions, and (2) it generates interpretable patterns that could help solve maintenance problems. The objectives of this research are (1) to study the applicability of LAD in different CBM situations that require special consideration of the types of input data and maintenance decisions, (2) to adapt the LAD methodology to the particular requirements that arise from these applications, and (3) to improve the LAD methodology in order to increase diagnostic accuracy and the interpretability of results.
The novelty of this research lies in (1) the first application of LAD to CBM, in applications that stand to benefit from the unique properties of this technology, and (2) innovative modifications to the LAD methodology, particularly in pattern generation, that improve its performance within CBM and in multi-class classification. The research followed an evolutionary approach to achieve the objectives stated in the Introduction. LAD was applied and adapted in three applications: (1) the detection of Rogue components within the spare-part inventory of repairable components at a commercial airline, (2) the detection and identification of faults in power transformers using dissolved gas analysis (DGA), and (3) the detection of faults in rotor bearings using vibration signals. This research concludes that LAD is a promising decision-making approach that adds important benefits to the implementation of CBM in industry.
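The interpretability claimed for LAD comes from its patterns: numeric features are binarized at cut-points, and a positive pattern is a conjunction of such conditions that covers at least one positive (e.g. faulty) observation and no negative (healthy) one. The sketch below illustrates only the simplest case, single-condition (degree-one) pure patterns with a simplified cut-point rule; it is not the thesis's modified pattern-generation method, and the feature meanings (vibration level, temperature) and data are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Literal:
    """One binarized condition: x[feature] >= cut if above, else x[feature] < cut."""
    feature: int
    cut: float
    above: bool

    def holds(self, x):
        return (x[self.feature] >= self.cut) == self.above


def covers(pattern, x):
    """A pattern (tuple of literals) covers x when every literal holds."""
    return all(lit.holds(x) for lit in pattern)


def is_positive_pattern(pattern, positives, negatives):
    """Pure positive pattern: covers at least one positive and no negative."""
    return (any(covers(pattern, x) for x in positives)
            and not any(covers(pattern, x) for x in negatives))


def candidate_cuts(positives, negatives, feature):
    """Simplified cut-points: midpoints between neighbouring distinct values
    that belong to different classes."""
    labelled = sorted([(x[feature], 1) for x in positives] +
                      [(x[feature], 0) for x in negatives])
    cuts = {(v1 + v2) / 2
            for (v1, c1), (v2, c2) in zip(labelled, labelled[1:])
            if v1 != v2 and c1 != c2}
    return sorted(cuts)


def degree_one_patterns(positives, negatives):
    """Enumerate single-literal positive patterns (full LAD searches higher degrees)."""
    found = []
    for f in range(len(positives[0])):
        for cut in candidate_cuts(positives, negatives, f):
            for above in (True, False):
                pattern = (Literal(f, cut, above),)
                if is_positive_pattern(pattern, positives, negatives):
                    found.append(pattern)
    return found


if __name__ == "__main__":
    # Hypothetical observations: (vibration level, temperature).
    faulty = [(0.9, 70), (1.1, 75), (1.0, 60)]
    healthy = [(0.3, 65), (0.4, 58), (0.5, 72)]
    for p in degree_one_patterns(faulty, healthy):
        print(p)
```

Each pattern found reads directly as a rule such as "temperature >= 73.5 indicates a fault", which is the kind of added knowledge for maintenance personnel that the abstract contrasts with black-box diagnosis.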

    Machine Learning in Manufacturing towards Industry 4.0: From ‘For Now’ to ‘Four-Know’

    While attracting increasing research attention in science and technology, Machine Learning (ML) is playing a critical role in the digitalization of manufacturing operations towards Industry 4.0. Recently, ML has been applied in several fields of production engineering to solve a variety of tasks with different levels of complexity and performance. However, despite the large number of ML use cases, there is no guidance or standard for developing ML solutions from ideation to deployment. This paper addresses this problem by proposing an ML application roadmap for the manufacturing industry based on state-of-the-art published research on the topic. First, it presents two dimensions for formulating ML tasks, namely ‘Four-Know’ (Know-what, Know-why, Know-when, Know-how) and ‘Four-Level’ (Product, Process, Machine, System), which are used to analyze ML development trends in manufacturing. Then, it provides an implementation pipeline starting from the very early stages of ML solution development and summarizes the available ML methods, including supervised, semi-supervised, unsupervised, and reinforcement learning methods, along with their typical applications. Finally, it discusses the current challenges in applying ML and outlines possible directions for future developments.

    Ascertaining Pain in Mental Health Records: Combining Empirical and Knowledge-Based Methods for Clinical Modelling of Electronic Health Record Text

    In recent years, state-of-the-art clinical Natural Language Processing (NLP), as in other domains, has been dominated by neural networks and other statistical models. In contrast to the unstructured nature of Electronic Health Record (EHR) text, biomedical knowledge is increasingly available in structured and codified forms, underpinned by curated databases, machine-readable clinical guidelines, and logically defined terminologies. This thesis examines the incorporation of external medical knowledge into clinical NLP and tests these methods on a use case of ascertaining physical pain in clinical notes of mental health records. Pain is a common reason for accessing healthcare resources and has been a growing area of research, especially regarding its impact on mental health. Pain also presents a distinctive NLP problem because of its ambiguous nature and the varied contexts in which the word is used, which makes it a good case study for the methods explored in this thesis. Models are built by combining structured medical knowledge with clinical NLP, leveraging the inherent relations that exist within medical ontologies. The data source used in this project is CRIS, a mental health EHR database containing de-identified patient records from the South London and Maudsley NHS Foundation Trust, one of the largest mental health providers in Western Europe. A lexicon of pain terms was developed to identify documents within CRIS mentioning pain-related terms, and gold standard annotations were created by manually annotating these documents.
These gold standard annotations were used to build models for a binary classification task: classifying sentences from the clinical text as “relevant”, meaning the sentence contains a relevant mention of pain (i.e., physical pain affecting the patient), or “not relevant”, meaning the sentence contains no mention of physical pain or the mention does not relate to the patient (e.g., someone else in physical pain). Two models incorporating structured medical knowledge were built:
1. a transformer-based model, SapBERT, that utilises a knowledge graph of the UMLS ontology, and
2. a knowledge graph embedding model that utilises embeddings from SNOMED CT, which was then used to build a random forest classifier.
The latter was achieved by modelling the clinical pain terms and their relations from SNOMED CT as knowledge graph embeddings, thus combining the data-driven view of clinical language with the logical view of medical knowledge. These models were compared with NLP models (binary classifiers) that do not incorporate such structured medical knowledge:
1. a transformer-based model, BERT_base, and
2. a random forest classifier.
Of the two transformer-based models, SapBERT performed better at the classification task (F1-score: 0.98), and of the two random forest models, the one incorporating knowledge graph embeddings performed better (F1-score: 0.94).
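The F1-scores reported above are the harmonic mean of precision and recall. The abstract does not say whether they are computed for the “relevant” class alone or averaged over both classes, so the sketch below assumes positive-class F1 with “relevant” as the positive label; the gold and predicted labels are invented for illustration.

```python
def f1_score(y_true, y_pred, positive="relevant"):
    """Positive-class F1 = 2 * precision * recall / (precision + recall)."""
    if len(y_true) != len(y_pred):
        raise ValueError("label sequences must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical sentence-level labels, not data from CRIS.
    gold = ["relevant", "relevant", "not relevant", "relevant", "not relevant"]
    pred = ["relevant", "not relevant", "not relevant", "relevant", "relevant"]
    print(round(f1_score(gold, pred), 3))
```

Here two true positives, one false positive, and one false negative give precision = recall = 2/3 and therefore F1 = 2/3; an F1 of 0.98 requires both precision and recall to be close to 1.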
The SapBERT model was run on sentences from a cohort of patients within CRIS to conduct a prevalence study of how pain is distributed across sociodemographic and diagnostic factors. The contribution of this research is both methodological and practical: it shows the difference between a conventional NLP approach to binary classification and one that incorporates external knowledge, and it further uses the models obtained from both approaches in a prevalence study designed with input from clinicians and a patient and public involvement group. The results emphasise the importance of going beyond the conventional approach to NLP when addressing complex issues such as pain.

    ChatCAD+: Towards a Universal and Reliable Interactive CAD using LLMs

    The integration of Computer-Assisted Diagnosis (CAD) with Large Language Models (LLMs) holds great potential in clinical applications, specifically in the roles of digital family doctors and clinic assistants. However, current work in this field is limited by a restricted scope of applicable image domains and by the provision of unreliable medical advice, which restricts its overall processing capabilities. Furthermore, the mismatch in writing style between LLMs and radiologists undermines their practical usefulness. To tackle these challenges, we introduce ChatCAD+, which is designed to be universal and reliable. It can handle medical images from diverse domains and leverages up-to-date information from reputable medical websites to provide reliable medical advice. Additionally, it incorporates a template retrieval system that improves report generation performance via exemplar reports, enabling seamless integration into existing clinical workflows. The source code is available at https://github.com/zhaozh10/ChatCAD.
    Comment: Authors Zihao Zhao, Sheng Wang, Jinchen Gu, and Yitao Zhu contributed equally to this work and should be considered co-first authors.
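The abstract does not describe how ChatCAD+ selects exemplar reports, so the following is only a generic illustration of template retrieval: rank a bank of exemplar reports by bag-of-words cosine similarity to a draft and return the top k. The similarity measure, the function names, and the exemplar sentences are all assumptions made for this sketch, not the ChatCAD+ implementation.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lower-cased word counts; a deliberately simple text representation."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def retrieve_templates(query, exemplars, k=1):
    """Return the k exemplar reports most similar to the query text."""
    q = bag_of_words(query)
    ranked = sorted(exemplars, key=lambda e: cosine(q, bag_of_words(e)),
                    reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    # Hypothetical exemplar bank, not taken from ChatCAD+.
    bank = [
        "No acute cardiopulmonary abnormality. Heart size is normal.",
        "Right lower lobe consolidation consistent with pneumonia.",
        "Small left pleural effusion with adjacent atelectasis.",
    ]
    draft = "possible pneumonia with consolidation in the right lower lobe"
    print(retrieve_templates(draft, bank))
```

A production system would more likely use learned text or image embeddings rather than word counts, but the retrieve-then-condition structure, in which the closest exemplars guide the wording of the generated report, is the same.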

    Improving data management through automatic information extraction model in ontology for road asset management

    Roads are a critical component of transportation infrastructure, and their effective maintenance is paramount to ensuring their continued functionality and safety. This research proposes a novel information management approach based on state-of-the-art deep learning models and ontologies. The approach can automatically extract, integrate, complete, and search for project knowledge buried in unstructured text documents. On the one hand, it facilitates the implementation of modern management approaches, namely advanced work packaging, to deliver successful road management projects; on the other hand, it improves information management practices in the construction industry.