141 research outputs found

    Diagnóstico no invasivo de patologías humanas combinando análisis de aliento y modelización con redes neuronales

    Unpublished thesis of the Universidad Complutense de Madrid, Facultad de Ciencias Químicas, defended on 09-09-2016. It is well established that the moment at which a disease is detected or diagnosed directly affects its consequences for the patient, since early detection is generally linked to a more favorable outcome. This concept is the basis of the present research, whose main goal is the development of mathematical tools based on computational artificial intelligence that detect multiple diseases safely and non-invasively. To build such tools, this research has focused on the breath analysis of patients with diverse diseases, using several analytical methodologies to extract the information contained in these samples, and multiple feature selection algorithms and neural networks for data analysis. Previous work has shown a correlation between the molecular composition of breath and the clinical status of a human being, demonstrating the existence of volatile biomarkers that can aid in disease detection through their presence or amount. During this research, two main types of analytical approaches were employed to study the gaseous samples: cross-reactive sensor arrays, based on silicon nanowire field-effect transistors (SiNW FETs) or gold nanoparticles (GNPs), both functionalized with organic chains, and proton transfer reaction-mass spectrometry (PTR-MS). The cross-reactive sensors analyze the bulk of the breath sample, offering global, fingerprint-like information, whereas PTR-MS quantifies the volatile molecules present in the samples. All of the analytical equipment employed generates large amounts of data per sample, so a meticulous mathematical analysis is needed to interpret the results adequately. In this work, two fundamental types of mathematical tools were utilized. First, a set of five filter-based feature selection algorithms (χ2 (chi2) score, Fisher's discriminant ratio, Kruskal-Wallis test, Relief-F algorithm, and information gain test) was employed to reduce the number of independent variables in the large databases to those with the greatest discriminative power for a subsequent modeling task. Second, for the mathematical modeling itself, artificial neural networks (ANNs), algorithms categorized as computational artificial intelligence, were employed. These non-linear tools were used to locate the relations between the independent variables of a system and the dependent ones in order to perform estimations or classifications. The type of ANN used in this thesis is the one most commonly employed in research, the supervised multilayer perceptron (MLP), chosen for its proven ability to create reliable models for many different applications...
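As an illustration of what a filter-based feature selector does, here is a minimal sketch of one of the five filters named in the abstract, Fisher's discriminant ratio, for a two-class problem. It is not the thesis's implementation: the function names, the two-class restriction, and the sensor readings are assumptions made up for this example.

```python
from statistics import mean, pvariance


def fisher_ratio(values_a, values_b):
    """Two-class Fisher discriminant ratio for a single feature.

    Squared distance between class means divided by the sum of the
    class variances: large values mean the feature separates the classes.
    """
    spread = pvariance(values_a) + pvariance(values_b)
    if spread == 0:
        # Degenerate case: no within-class spread at all.
        return float("inf") if mean(values_a) != mean(values_b) else 0.0
    return (mean(values_a) - mean(values_b)) ** 2 / spread


def rank_features(samples, labels):
    """Return (feature indices sorted best first, raw scores)."""
    classes = sorted(set(labels))
    if len(classes) != 2:
        raise ValueError("this sketch handles exactly two classes")
    n_features = len(samples[0])
    scores = []
    for j in range(n_features):
        a = [row[j] for row, y in zip(samples, labels) if y == classes[0]]
        b = [row[j] for row, y in zip(samples, labels) if y == classes[1]]
        scores.append(fisher_ratio(a, b))
    order = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return order, scores


# Made-up readings (not data from the thesis): feature 0 separates the
# two classes, feature 1 overlaps almost completely.
samples = [[1.0, 5.0], [1.2, 3.0], [0.9, 4.0],
           [3.0, 4.5], [3.1, 3.2], [2.9, 4.4]]
labels = ["healthy", "healthy", "healthy", "sick", "sick", "sick"]

order, scores = rank_features(samples, labels)
print(order)
```

In a workflow like the one the abstract describes, only the top-ranked features would then be passed on to the neural network model.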

    A Study on Deep Learning for Bioinformatics

    Bioinformatics, an interdisciplinary area of biology and computer science, handles large and complex data sets with linear and non-linear relationships between attributes. Deep learning has gained importance in recent years as a way to handle such relationships. This paper analyses different deep learning architectures and their applications in bioinformatics. It also addresses the limitations and challenges of deep learning.

    Prediction of lung tumor types based on protein attributes by machine learning algorithms


    Machine Learning Prediction of COVID-19 Severity Levels From Salivaomics Data

    The clinical spectrum of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), the strain of coronavirus that caused the COVID-19 pandemic, is broad, extending from asymptomatic infection to severe immunopulmonary reactions that, if not categorized properly, may be life-threatening. Researchers rate COVID-19 patients on a scale from 1 to 8 according to the severity level of COVID-19, 1 being healthy and 8 being extremely sick, based on a multitude of factors including number of clinic visits, days since the first sign of symptoms, and more. However, there are two issues with the current state of severity level designation. Firstly, there exists variation among researchers in determining these patient scores, which may lead to improper treatment. Secondly, researchers use a variety of metrics to determine patient severity level, including metrics involving plasma collection that require invasive procedures. This project aims to remedy both issues by introducing a machine learning framework that unifies severity level designations based on noninvasive saliva biomarkers. Our results show that we can successfully use machine learning on salivaomics data to predict the severity level of COVID-19 patients, indicating the presence of viral load using saliva biomarkers

    Machine learning application in personalised lung cancer recurrence and survivability prediction

    Machine learning is an important artificial intelligence technique that is widely applied in cancer diagnosis and detection. More recently, with the rise of personalised and precision medicine, there is a growing trend towards machine learning applications for prognosis prediction. However, to date, building reliable prediction models of cancer outcomes in everyday clinical practice is still a hurdle. In this work, we integrate genomic, clinical and demographic data of lung adenocarcinoma (LUAD) and squamous cell carcinoma (LUSC) patients from The Cancer Genome Atlas (TCGA) and introduce copy number variation (CNV) and mutation information of 15 selected genes to generate predictive models for recurrence and survivability. We compare the accuracy and benefits of three well-established machine learning algorithms: decision tree methods, neural networks and support vector machines. Although the accuracy of predictive models using the decision tree method has no significant advantage, the tree models reveal the most important predictors among genomic information (e.g. KRAS, EGFR, TP53), clinical status (e.g. TNM stage and radiotherapy) and demographics (e.g. age and gender) and how they influence the prediction of recurrence and survivability for both early stage LUAD and LUSC. The machine learning models have the potential to help clinicians to make personalised decisions on aspects such as follow-up timeline and to assist with personalised planning of future social care needs
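To make concrete how a tree model can reveal the most important predictors, the sketch below computes information gain, one common criterion by which decision trees choose which predictor to split on. The abstract does not say which criterion the authors used, so the entropy-based choice is an assumption, and the records and field names (`stage`, `gender`, `recurrence`) are invented toy values, not TCGA data.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total)
                for n in Counter(labels).values())


def information_gain(feature_values, labels):
    """Entropy reduction obtained by splitting on a categorical feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    # Weighted entropy of the subsets produced by the split.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Toy records, invented for illustration only.
records = [
    {"stage": "I", "gender": "F", "recurrence": "no"},
    {"stage": "I", "gender": "M", "recurrence": "no"},
    {"stage": "III", "gender": "F", "recurrence": "yes"},
    {"stage": "III", "gender": "M", "recurrence": "yes"},
]
outcome = [r["recurrence"] for r in records]
gains = {
    name: information_gain([r[name] for r in records], outcome)
    for name in ("stage", "gender")
}
print(gains)
```

In this toy set, the predictor with the higher gain would be placed nearer the root of the tree, which is what makes tree models straightforward to inspect.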

    Reconstrução e classificação de sequências de ADN desconhecidas

    The continuous advances in DNA sequencing technologies and metagenomics techniques require reliable reconstruction and accurate classification methodologies to expand the diversity of the natural repository while contributing to the description and organization of organisms. However, after sequencing and de-novo assembly, one of the most complex challenges comes from the DNA sequences that do not match or resemble any biological sequence from the literature. Three main reasons contribute to this exception: the organism's sequence diverges strongly from the known organisms in the literature, an irregularity has been created in the reconstruction process, or a new organism has been sequenced. The inability to efficiently classify these unknown sequences increases the uncertainty about the sample's constitution and wastes the opportunity to discover new species, since such sequences are often discarded. In this context, the main objective of this thesis is the development and validation of a tool that provides an efficient computational solution to these three challenges based on an ensemble of experts, namely compression-based predictors, the distribution of sequence content, and normalized sequence lengths. The method uses both DNA and amino acid sequences and provides efficient classification beyond standard referential comparisons. Unusually, it classifies DNA sequences without resorting directly to the reference genomes, relying instead on features that the biological sequences of a species share. Specifically, it only makes use of features extracted individually from each genome, without using sequence comparisons. Moreover, the pipeline is fully automatic and allows reference-free reconstruction of genomes from FASTQ reads, with the additional guarantee of secure storage of sensitive information. RFSC is thus a machine learning classification pipeline that relies on an ensemble of experts to provide efficient classification in metagenomic contexts. The pipeline was tested on synthetic and real data, achieving in both cases precise and accurate results that, at the time this thesis was developed, had not been reported in the state of the art. Specifically, it achieved an accuracy of approximately 97% in the domain/type classification. Mestrado em Engenharia de Computadores e Telemátic
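The idea of classifying a sequence from features extracted from that sequence alone, with no comparison against reference genomes, can be sketched as follows. This is not RFSC's code: the abstract does not specify its compression-based predictors, so the general-purpose `zlib` compressor is swapped in here as a crude stand-in, and the function name and both test sequences are made up for illustration.

```python
import random
import zlib


def sequence_features(seq):
    """Extract per-sequence features without any reference comparison."""
    if not seq:
        raise ValueError("empty sequence")
    seq = seq.upper()
    raw = seq.encode("ascii")
    # Stand-in for a compression-based predictor: how well a generic
    # compressor shrinks the sequence (lower ratio = more redundancy).
    compressed = zlib.compress(raw, 9)
    return {
        "compression_ratio": len(compressed) / len(raw),
        # Sequence-content feature: fraction of G and C bases.
        "gc_content": (seq.count("G") + seq.count("C")) / len(seq),
        "length": len(seq),
    }


# Synthetic sequences (assumptions for the example, not real genomes).
rng = random.Random(0)
repetitive = "ACGT" * 500
shuffled = "".join(rng.choice("ACGT") for _ in range(2000))

features_repetitive = sequence_features(repetitive)
features_shuffled = sequence_features(shuffled)
print(features_repetitive["compression_ratio"] < features_shuffled["compression_ratio"])
```

Feature vectors like these could then be fed to any standard classifier; a real pipeline would presumably also normalize the length feature, as the abstract mentions normalized sequence lengths.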