5 research outputs found

    Herramienta para analizar matrices de expresión génicas con machine learning [Tool for analyzing gene expression matrices with machine learning]

    In biomedical applications, making the generated models explainable to clinical staff is as important as achieving high accuracy. It is therefore essential to apply intelligent techniques that can learn effectively in these settings. This work develops software in R that provides a simple way to build an explanatory analysis of the causality between gene expression and patient condition. The software is highly automated and simplifies data input so that different expression matrices can be studied. It follows a linear workflow: data are read via a GEO code; preprocessing provides a hypothesis test, a normalization that makes the data comparable with one another, and a gene filter that reduces the computational cost of the subsequent training of the machine learning models; training applies different gene selection techniques; and model validation is used to detect the relationship between gene expression and the patient's condition and to report the genes actually involved in the response. I test the tool on one of the most topical issues in clinical diagnosis: detecting cancer from the gene expression of platelets. The data were obtained from the experiment with code GSE89843. AUC values above 90% are obtained with only 10 genes, which is a great advance in this field. The AUC can be interpreted as the probability that the model ranks a randomly chosen positive sample above a randomly chosen negative one. Because of its low cost, owing to the small number of genes, and its low invasiveness, the test could be carried out as a preventive measure and help reduce cancer mortality.
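
    The AUC reported in the abstract above can be read as a ranking statistic: the fraction of (positive, negative) sample pairs in which the positive sample receives the higher score. The sketch below illustrates that reading in Python. The function name `auc_from_scores` and the labels and scores are hypothetical, chosen only for illustration; they are not taken from the thesis software or from GSE89843.

```python
def auc_from_scores(labels, scores):
    """Return the AUC as the fraction of positive/negative pairs ranked correctly.

    A tie between a positive and a negative score counts as half a correct pair.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative sample")

    correct = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(positives) * len(negatives))


# Hypothetical classifier scores (1 = positive class, 0 = negative class).
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]
auc = auc_from_scores(labels, scores)
print(auc)
```

    In this toy example, 8 of the 9 positive/negative pairs are ordered correctly, so the AUC is 8/9. The pairwise loop is quadratic in the number of samples, which is fine for illustration but not how statistics libraries usually compute it.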

    Feature selection of gene expression data for Cancer classification using double RBF-kernels

    Background: Knowledge-based interpretation of omics data can not only yield essential information about various biological processes but also reflect the current physiological status of cells and tissue. The major challenge in analyzing gene expression data, which has a large number of genes and few samples, is extracting disease-related information from a massive amount of redundant data and noise. Gene selection, which eliminates redundant and irrelevant genes, has been a key step in addressing this problem. Results: The modified method was tested on four benchmark datasets with either two-class or multiclass phenotypes and outperformed previous methods, with relatively higher accuracy, true positive rate, false positive rate and reduced runtime. Conclusions: This paper proposes an effective feature selection method that combines double RBF-kernels with weighted analysis to extract feature genes from gene expression data by exploring its nonlinear mapping ability.
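
    For orientation, the standard Gaussian RBF kernel between two expression vectors x and y is exp(-||x - y||^2 / (2 * sigma^2)). The Python sketch below computes only that single kernel. The abstract does not say how the paper combines its two kernels or applies the weighting, so none of that is reproduced here; the function name `rbf_kernel`, the vectors and the sigma values are hypothetical.

```python
import math


def rbf_kernel(x, y, sigma):
    """Gaussian RBF kernel: exp(-||x - y||^2 / (2 * sigma^2))."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-squared_distance / (2.0 * sigma ** 2))


# Two made-up expression profiles over three genes (illustrative values only).
sample_a = [2.0, 0.5, 1.0]
sample_b = [1.0, 0.5, 3.0]

# The same pair looks dissimilar under a narrow kernel and similar under a wide one.
narrow = rbf_kernel(sample_a, sample_b, sigma=0.5)
wide = rbf_kernel(sample_a, sample_b, sigma=5.0)
print(narrow, wide)
```

    The kernel equals 1 for identical vectors and decays toward 0 as they move apart, with sigma setting how quickly; this is the nonlinear similarity measure that RBF-based methods build on.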

    Immersive analytics for oncology patient cohorts

    This thesis proposes a novel interactive immersive analytics tool, Virtual Reality to Observe Oncology data Models (VROOM), together with methods for interrogating cancer patient cohorts in an immersive virtual environment. The overall objective is to develop an immersive analytics platform that includes a data analytics pipeline from raw gene expression data to immersive visualisation on virtual and augmented reality platforms using a game engine. Unity3D has been used to implement the visualisation. The work could provide oncologists and clinicians with an interactive visualisation and visual analytics platform that helps them drive their analysis of treatment efficacy and achieve the goal of evidence-based personalised medicine. The thesis integrates the latest discoveries and developments in cancer patients' prognoses, immersive technologies, machine learning, decision support systems and interactive visualisation to form an immersive analytics platform for complex genomic data. The experimental paradigm followed is the understanding of transcriptomics in cancer samples. Specifically, the thesis investigates gene expression data to determine the biological similarity revealed by the transcriptomic profiles of patients' tumour samples, which show the genes active in different patients.
    In summary, the thesis contributes i) a novel immersive analytics platform for interrogating patient cohort data in a similarity space based on the patients' biological and genomic similarity; ii) an effective optimisation of the immersive environment design, based on a usability study of exocentric and egocentric visualisation and on audio and sound design optimisation; iii) an integration of trusted and familiar 2D biomedical visual analytics methods into the immersive environment; iv) a novel use of game theory as the decision-making engine that supports the analytics process, and an application of optimal transport theory to missing data imputation to ensure that the data distribution is preserved; and v) case studies that showcase the real-world application of the visualisation and its effectiveness.