
    The use of knowledge discovery databases in the identification of patients with colorectal cancer

    Colorectal cancer is one of the most common forms of malignancy, with 35,000 new patients diagnosed annually within the UK. Survival figures show that outcomes are less favourable in the UK than in the USA and Europe, with 1 in 4 patients having incurable disease at presentation as of data from 2000. Epidemiologists have demonstrated that the incidence of colorectal cancer is highest in the industrialised western world, with numerous contributory factors ranging from a genetic component to concurrent medical conditions and personal lifestyle. Data also demonstrate that environmental changes play a significant role, with immigrants rapidly reaching the incidence rates of the host country.

Detection of colorectal cancer remains an important and evolving aspect of healthcare, with the aim of improving outcomes through earlier diagnosis. This process was initially revolutionised within the UK in 2002 with the ACPGBI two-week wait guidelines to facilitate referrals from primary care, and has subsequently been augmented by other schemes such as bowel cancer screening to improve earlier detection rates. Whereas the national screening programme is dependent on FOBT, standard referral practice depends upon a number of trigger symptoms that qualify for an urgent referral to a specialist for further investigation. This process identifies only 25-30% of those with colorectal cancer and remains labour intensive, with only 10% of those seen in the two-week wait clinics having colorectal cancer.

This thesis hypothesises that a patient symptom questionnaire used in conjunction with knowledge discovery techniques, such as data mining and artificial neural networks, could identify patients at risk of colorectal cancer who therefore warrant urgent further assessment.

Artificial neural networks and data mining methods are used widely in industry to detect consumer patterns through an inbuilt ability to learn from previous examples within a dataset and to model often complex, non-linear patterns. Within medicine these methods have been utilised in a host of diagnostic settings, from myocardial infarcts to the Papnet cervical smear programme for cervical cancer detection.

A Likert-based questionnaire of those attending the two-week wait fast-track colorectal clinic was used to produce a 'symptoms' database, which was then correlated with individual patient diagnoses upon completion of their clinical assessment. A total of 777 patients were included in the study, and their diagnoses were categorised into a dichotomous variable to create a selection of datasets for analysis. From these, the author created four primary databases based on all questions, two-week wait trigger symptoms, best-knowledge questions, and symptoms identified as significant in univariate analysis. Each of these databases was entered into an artificial neural network programme, altering the number of hidden units and layers to obtain a selection of outcome models that could be further tested against a selection of set dichotomous outcomes. Outcome models were compared for sensitivity, specificity and risk. Further experiments were carried out with data mining techniques and the WEKA package to identify the most accurate model. Both were then compared with the accuracy of a colorectal specialist and a GP.

Analysis of the data identified that 24% of those referred on the two-week wait pathway failed to meet the referral criteria set out by the ACPGBI. The incidence of colorectal cancer was 9.5% (74 patients), in keeping with other studies, and the main symptoms were rectal bleeding, change in bowel habit and abdominal pain.

The optimal knowledge discovery database model was a back-propagation ANN using all variables for the outcome cancer/not cancer, with sensitivity 0.9, specificity 0.97 and LR 35.8. Artificial neural networks remained the more accurate modelling method for all the dichotomous outcomes. The comparison of GPs and colorectal specialists at predicting outcome demonstrated that the colorectal specialists were the more accurate predictors of cancer/not cancer, with sensitivity 0.27, specificity 0.97 (95% CI 0.6-0.97, PPV 0.75, NPV 0.83) and LR 10.6. When compared with the KDD models for predicting the same outcome, the ANN models were once again more accurate, the optimal model having sensitivity 0.63, specificity 0.98 (95% CI 0.58-1, PPV 0.71, NPV 0.96) and LR 28.7.

The results demonstrate that diagnosing colorectal cancer remains a challenging process, both for clinicians and for computational models. KDD models were shown to be consistently more accurate than clinicians in predicting colorectal cancer when used solely in conjunction with a questionnaire. It would be ill conceived to suggest that KDD models could replace clinician-patient interaction, but they may help accelerate some patients towards further investigation, or 'straight to test', if applied to those referred as routine patients.
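The figures quoted for each model (sensitivity, specificity, PPV, NPV and likelihood ratio) all derive from a 2x2 confusion matrix. As a point of reference, the arithmetic can be sketched as follows; the counts used here are purely illustrative, not the study's:

```python
# Hypothetical sketch: deriving the diagnostic metrics quoted above
# from a 2x2 confusion matrix. The counts are illustrative only.

def diagnostic_metrics(tp, fn, fp, tn):
    sensitivity = tp / (tp + fn)               # true positive rate
    specificity = tn / (tn + fp)               # true negative rate
    ppv = tp / (tp + fp)                       # positive predictive value
    npv = tn / (tn + fn)                       # negative predictive value
    lr_pos = sensitivity / (1 - specificity)   # positive likelihood ratio
    return {"sens": sensitivity, "spec": specificity,
            "ppv": ppv, "npv": npv, "lr+": lr_pos}

# Illustrative counts: 100 cancers and 100 non-cancers.
m = diagnostic_metrics(tp=90, fn=10, fp=3, tn=97)
print(m)
```

A high positive likelihood ratio, as reported for the optimal ANN model, indicates how strongly a positive prediction shifts the odds of the patient actually having cancer.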

    Machine Learning na previsão de Cancro Colorretal em função de alterações metabólicas

    In today's world, the amount of information available across the most varied sectors is constantly increasing. That is the case in healthcare, where the collection and processing of biomedical data seek to improve decision-making about the treatment to be applied to a patient, using Machine Learning-based tools.
Machine Learning is an area of Artificial Intelligence in which applying algorithms to a dataset makes it possible to predict results or even discover relationships that would be unnoticeable at first glance. This project's main objective is to study several Machine Learning algorithms and techniques in order to identify whether the acylcarnitine profile may constitute a new biochemical marker for the prediction and prognosis of Colorectal Cancer. In the course of the work, different algorithms and data preprocessing techniques were tested. Three different experiments were carried out to validate the predictions of the models built for different scenarios, namely: predicting whether the patient has Colorectal Cancer, predicting which disease the patient has (Colorectal Cancer or other metabolic diseases), and predicting whether the patient has any disease at all. As a first analysis, the developed models showed good results in Colorectal Cancer screening. The best results were obtained by the Random Forest and Gradient Boosting algorithms, together with data balancing and feature selection techniques, namely Random Oversampling, Synthetic Oversampling and Recursive Feature Selection.
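The abstract names its best-performing combination but gives no code; a minimal sketch of that kind of pipeline, with a synthetic stand-in dataset and hypothetical hyperparameters (nothing here is taken from the thesis itself), might look like:

```python
# Illustrative sketch (not the thesis code): a Random Forest screening
# model combining random oversampling of the minority class with
# recursive feature elimination, as the abstract describes.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE
from sklearn.model_selection import train_test_split
from sklearn.utils import resample

# Synthetic stand-in for an acylcarnitine-profile dataset:
# 20 features, heavily imbalanced classes (~95% healthy / 5% cancer).
X, y = make_classification(n_samples=1000, n_features=20,
                           weights=[0.95], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# Random oversampling: resample the minority class up to the majority size.
n_maj = int((y_tr == 0).sum())
upsampled = resample(X_tr[y_tr == 1], replace=True,
                     n_samples=n_maj, random_state=0)
X_bal = np.vstack([X_tr[y_tr == 0], upsampled])
y_bal = np.concatenate([np.zeros(n_maj), np.ones(len(upsampled))])

# Recursive feature elimination keeps the 10 most informative features
# before the final Random Forest fit.
model = RFE(RandomForestClassifier(n_estimators=100, random_state=0),
            n_features_to_select=10)
model.fit(X_bal, y_bal)
print("held-out accuracy:", model.score(X_te, y_te))
```

Balancing is applied only to the training split so that the held-out evaluation still reflects the real class distribution.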

    Surgical Subtask Automation for Intraluminal Procedures using Deep Reinforcement Learning

    Intraluminal procedures have opened up a new sub-field of minimally invasive surgery that uses flexible instruments to navigate through the complex luminal structures of the body, resulting in reduced invasiveness and improved patient benefits. One of the major challenges in this field is the accurate and precise control of the instrument inside the human body. Robotics has emerged as a promising solution to this problem. However, to achieve successful robotic intraluminal interventions, the control of the instrument needs to be automated to a large extent. The thesis first examines the state of the art in intraluminal surgical robotics and identifies the key challenges in this field, which include the need for safe and effective tool manipulation and the ability to adapt to unexpected changes in the luminal environment. To address these challenges, the thesis proposes several levels of autonomy that enable the robotic system to perform individual subtasks autonomously, while still allowing the surgeon to retain overall control of the procedure. This approach facilitates the development of specialised algorithms, such as Deep Reinforcement Learning (DRL), for subtasks like navigation and tissue manipulation to produce robust surgical gestures. Additionally, the thesis proposes a safety framework that provides formal guarantees to prevent risky actions. The presented approaches are evaluated through a series of experiments using simulation and robotic platforms. The experiments demonstrate that subtask automation can improve the accuracy and efficiency of tool positioning and tissue manipulation, while also reducing the cognitive load on the surgeon. The results of this research have the potential to improve the reliability and safety of intraluminal surgical interventions, ultimately leading to better outcomes for patients and surgeons.
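The reinforcement learning idea behind the navigation subtask can be illustrated in miniature. The toy below is a tabular Q-learning agent on a 1-D "lumen" corridor; it is a drastically simplified, hypothetical stand-in (the thesis uses deep networks and realistic simulation, none of which is reproduced here), but it shows the same learn-by-reward loop:

```python
# Toy illustration only: tabular Q-learning on a 1-D corridor, a
# hypothetical stand-in for the DRL navigation subtask described above.
import numpy as np

N_STATES, GOAL = 10, 9            # positions 0..9, target at the far end
ACTIONS = [-1, +1]                # retract / advance
q = np.zeros((N_STATES, len(ACTIONS)))
rng = np.random.default_rng(0)
alpha, gamma, eps = 0.5, 0.9, 0.1

for _ in range(500):              # episodes
    s = 0
    while s != GOAL:
        # epsilon-greedy action selection
        a = int(rng.integers(2)) if rng.random() < eps else int(q[s].argmax())
        s2 = min(max(s + ACTIONS[a], 0), N_STATES - 1)
        r = 1.0 if s2 == GOAL else -0.01      # small step cost, goal reward
        # standard Q-learning temporal-difference update
        q[s, a] += alpha * (r + gamma * q[s2].max() - q[s, a])
        s = s2

# After training, the greedy policy should advance at every position.
policy = q.argmax(axis=1)
print(policy[:GOAL])
```

In the thesis's setting the table is replaced by a deep network over sensor observations, but the reward-driven update and the safety constraints around it follow the same principle.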

    Uncertainty, interpretability and dataset limitations in Deep Learning

    [eng] Deep Learning (DL) has gained traction in recent years thanks to the exponential increase in compute power. New techniques and methods are published on a daily basis, and records are being set across multiple disciplines. Undeniably, DL has brought a revolution to the machine learning field and to our lives. However, not everything has been resolved, and some considerations must be taken into account. For instance, obtaining uncertainty measures and bounds is still an open problem. Models should be able to capture and express the confidence they have in their decisions, and Artificial Neural Networks (ANNs) are known to be lacking in this regard. Be it through out-of-distribution samples, adversarial attacks, or simply unrelated or nonsensical inputs, ANN models demonstrate an unfounded tendency to still output high probabilities. Likewise, interpretability remains an unresolved question. Some fields not only need but rely on being able to provide human interpretations of the thought process of models. ANNs, and especially deep models trained with DL, are hard to reason about. Last but not least, models are becoming deeper and more complex. At the same time, to cope with the increasing number of parameters, datasets are required to be of higher quality and, usually, larger. Not all research, and even fewer real-world applications, can keep up with these increasing demands. Taking the previous issues into account, the main aim of this thesis is therefore to provide methods and frameworks to tackle each of them. These approaches should be applicable to any suitable field and dataset, and they are employed with real-world datasets as proof of concept. First, we propose a method that provides interpretability with respect to the results through uncertainty measures. The model in question is capable of reasoning about the uncertainty inherent in data and leverages that information to progressively refine its outputs.
In particular, the method is applied to land cover segmentation, a classification task that aims to assign a type of land to each pixel in satellite images. The dataset and application serve to prove that the final uncertainty bound enables the end user to reason about possible errors in the segmentation result. Second, Recurrent Neural Networks are used to create models that are robust to lacking datasets, both in terms of size and class balance. We apply them to two different fields: road extraction in satellite images and Wireless Capsule Endoscopy (WCE). The former demonstrates that contextual information in the temporal axis of data can be used to create models that achieve results comparable to the state of the art while being less complex. The latter, in turn, proves that contextual information for polyp detection can be crucial to obtaining models that generalise better and achieve higher performance. Last, we propose two methods to leverage unlabeled data in the model creation process. Datasets are often easier to obtain than to label, which results in many wasted opportunities with traditional classification approaches. Our approaches, based on self-supervised learning, result in a novel contrastive loss that is capable of extracting meaningful information from pseudo-labeled data. Applying both methods to WCE data proves that the extracted inherent knowledge creates models that perform better on extremely unbalanced datasets and with a lack of data. To summarise, this thesis demonstrates potential solutions to obtain uncertainty bounds, provide reasonable explanations of the outputs, and combat a lack of data or unbalanced datasets. Overall, the presented methods have a positive impact on the DL field and could have a real and tangible effect on society.
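The abstract does not give the form of its novel contrastive loss. As a generic point of reference only, a standard NT-Xent (SimCLR-style) contrastive loss, which such self-supervised approaches typically build on, can be sketched in plain NumPy:

```python
# Generic sketch of a standard NT-Xent contrastive loss, shown only as
# background -- it is NOT the thesis's novel loss. z1 and z2 hold
# L2-normalised embeddings of two augmented "views" of n samples, (n, d).
import numpy as np

def ntxent_loss(z1, z2, temperature=0.5):
    n = z1.shape[0]
    z = np.vstack([z1, z2])                      # (2n, d) stacked views
    sim = (z @ z.T) / temperature                # scaled cosine similarities
    np.fill_diagonal(sim, -np.inf)               # a sample is not its own pair
    # the positive partner of row i is row i + n (and vice versa)
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(n)])
    m = sim.max(axis=1, keepdims=True)           # stable log-sum-exp
    log_den = m + np.log(np.exp(sim - m).sum(axis=1, keepdims=True))
    log_prob = sim - log_den
    return -log_prob[np.arange(2 * n), pos].mean()

rng = np.random.default_rng(0)
z = rng.normal(size=(8, 16))
z /= np.linalg.norm(z, axis=1, keepdims=True)
print(ntxent_loss(z, z))  # identical views: positives dominate, low loss
```

Each sample is pulled towards its augmented counterpart while the other 2n - 2 embeddings in the batch act as negatives, which is what lets unlabeled sequences contribute a training signal before the traditional classification stage.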

    Pattern Recognition

    Pattern recognition is a very wide research field. It involves factors as diverse as sensors, feature extraction, pattern classification, decision fusion, applications and others. The signals processed are commonly one-, two- or three-dimensional; the processing is done in real time or takes hours and days; some systems look for one narrow object class, while others search huge databases for entries with at least a small amount of similarity. No single person can claim expertise across the whole field, which develops rapidly, updates its paradigms and encompasses several philosophical approaches. This book reflects this diversity by presenting a selection of recent developments within the area of pattern recognition and related fields. It covers theoretical advances in classification and feature extraction as well as application-oriented work. The authors of these 25 works present and advocate recent achievements of their research in the field of pattern recognition.

    Deep Learning in Medical Image Analysis

    The accelerating power of deep learning in diagnosing diseases will empower physicians and speed up decision making in clinical environments. Applications of modern medical instruments and the digitalisation of medical care have generated enormous amounts of medical images in recent years. In this big-data arena, new deep learning methods and computational models for efficient data processing, analysis, and modeling of the generated data are crucially important for clinical applications and for understanding the underlying biological processes. This book presents and highlights novel algorithms, architectures, techniques, and applications of deep learning for medical image analysis.