44 research outputs found

    Large-Scale and Pan-Cancer Multi-omic Analyses with Machine Learning

    Multi-omic data analysis has been foundational in many fields of molecular biology, including cancer research. Investigating the relationships between different omic data types reveals patterns that cannot be found in any single data type alone. With recent technological advancements in mass spectrometry (MS), MS-based proteomics has enabled the quantification of thousands of proteins in hundreds of cell lines and human tissue samples. This thesis presents several machine learning-based methods that facilitate the integrative analysis of multi-omic data. First, we reviewed five existing multi-omic data integration methods and performed a benchmarking analysis using a large-scale multi-omic cancer cell line dataset. We evaluated the performance of these machine learning methods for drug response prediction and cancer type classification. Our results provide recommendations to researchers on selecting the optimal machine learning method for their applications. Second, we generated a pan-cancer proteomic map of 949 cancer cell lines across 40 cancer types and developed a machine learning method, DeeProM, to analyse the multi-omic information of these lines. This pan-cancer proteomic map (ProCan-DepMapSanger) is now publicly available and represents a major resource for the scientific community, for biomarker discovery and for the study of fundamental aspects of protein regulation. Third, we focused on publicly available multi-omic datasets of both cancer cell lines and human tissue samples and developed a Transformer-based deep learning method, DeePathNet, which integrates human knowledge with machine intelligence. We applied DeePathNet to three evaluation tasks, namely drug response prediction, cancer type classification and breast cancer subtype classification. Taken together, our analyses and methods allowed more accurate cancer diagnosis and prognosis.

    2022-2023 Program and Abstracts: Celebration of Student Scholarship

    The 2022-2023 Program and Abstracts for the Celebration of Student Scholarship at Morehead State University, held on April 19, 2023. A showcase of student research, scholarship, creative work, and performance arts.

    Multivariate Analysis and Modelling of multiple Brain endOphenotypes: Let's MAMBO!

    Imaging genetic studies aim to test how genetic information influences brain structure and function by combining neuroimaging-based brain features and genetic data from the same individual. Most studies focus on individual correlation and association tests between genetic variants and a single measurement of the brain. Despite the great success of univariate approaches, given the capacity of neuroimaging methods to provide a multiplicity of cerebral phenotypes, the development and application of multivariate methods become crucial. In this article, we review novel methods and strategies focused on the analysis of multiple phenotypes and genetic data. We also discuss relevant aspects of multi-trait modelling in the context of neuroimaging data.

    Data integration in inflammatory bowel disease

    INTRODUCTION: Inflammatory bowel disease is a complex intestinal disease with several genetic and environmental factors that can influence its course. The etiology and pathophysiology of the disease are not fully understood. There is some evidence that the microbiome can play a role. Finding relationships between the microbiome and the host's mucosa could help advance prevention, diagnosis or treatment. METHODS: We based our analysis on intestinal bacterial 16S rRNA and human transcriptome data from biopsies taken at multiple timepoints and intestinal segments. We extended regularized generalized canonical correlation analysis to find models that are coherent with previous knowledge of the disease, taking into account the samples' information. Multiple inflammatory bowel disease datasets covering different treatments and conditions were analysed, and the models defining those datasets were compared. The results were compared with multiple co-inertia analysis. RESULTS: Splitting sample variables into different blocks results in models of these relationships that show differences in the genes and microorganisms selected. The models generated using our new method, inteRmodel, outperformed multiple co-inertia analysis in classifying the samples according to their location. Despite being used on datasets from different sources, the resulting models show similar relationships between variables. DISCUSSION: Comparing multiple models helps uncover the relationships within datasets. Our method finds how strong the relationships between the microbiome, transcriptome and environmental variables are. The genes selected were common across different datasets. This approach is robust and flexible to different datasets and settings. CONCLUSION: With inteRmodel we found that the microbiome relates more closely to the sample location than to disease, but the transcriptome is highly related to the location of the sample in the intestine. There is a common transcriptome between datasets, while the microorganisms depend on the dataset. We can improve sample classification by taking into account both bacterial 16S rRNA and host transcriptome.

    Multi view based imaging genetics analysis on Parkinson disease

    Longitudinal studies integrating imaging and genetic data have recently become widespread among bioinformatics researchers. Combining such heterogeneous data allows a better understanding of the origins and causes of complex diseases. Through a multi-view based workflow proposal, we show the common steps and tools used in imaging genetics analysis, interpolating genotyping, neuroimaging and transcriptomic data. We describe the advantages of existing methods to analyze heterogeneous datasets, using Parkinson's Disease (PD) as a case study. Parkinson's disease is associated with both genetic and neuroimaging factors; however, such imaging genetics associations are at an early stage of investigation. Therefore, it is desirable to have a free and open source workflow that integrates different analysis flows in order to recover potential genetic biomarkers in PD, as in other complex diseases.

    STATISTICAL METHODS FOR BRAIN IMAGING GENOMICS

    Brain imaging genetic studies examine the genetic basis of brain images to better understand the genetic impact on behavior and disease phenotypes. Methods for identifying genetic associations with voxelwise brain imaging data have evolved from parallel analysis of each voxel to incorporating spatial smoothness and correlation to increase statistical detection power. Challenges still exist in the joint analysis of imaging data and genetic data, including imperfect alignment of affected regions and registration error, low signal-to-noise ratio in high-dimensional data, complex relationships, high computational complexity, and between-study heterogeneity. To address these issues, the following methods are proposed. First, to deal with imperfect alignment and registration error in brain imaging data, we proposed a region-based functional genome-wide association detection method, which also reduces the computational burden compared to standard voxelwise methods. The method summarizes regional voxelwise measurements into density curves. The non-parametric ball covariance test is then used to detect association between the log-quantile transformed regional densities and genetic markers. We compared the ball covariance test with other state-of-the-art methods on simulated datasets and demonstrated good sensitivity and specificity of our method. Second, we combined functional partial least squares with distance correlation to reduce the computational burden of high-dimensional data and allow flexible characterization of the imaging-genetic relationship. Third, given imaging-genetic data from more than one study, we theoretically compared the ensembled learner and merged learner in the prediction problem, where learners are trained using the multivariate varying coefficient model and multi-study data are assumed to come from a mixed model, where the mixed effect represents inter-study heterogeneity. Doctor of Philosophy

    Associating Multi-modal Brain Imaging Phenotypes and Genetic Risk Factors via A Dirty Multi-task Learning Method

    Brain imaging genetics, which integrates genetic variations and brain structures or functions to study the genetic basis of brain disorders, is becoming increasingly important in brain science. The multi-modal imaging data collected by different technologies, each measuring the same brain in a distinct way, might carry complementary information. Unfortunately, we do not know the extent to which the phenotypic variance is shared among multiple imaging modalities, which might in turn trace back to the complex genetic mechanism. In this paper, we propose a novel dirty multi-task sparse canonical correlation analysis (SCCA) to study imaging genetic problems involving multi-modal brain imaging quantitative traits (QTs). The proposed method takes advantage of multi-task learning and parameter decomposition. It can identify not only the shared imaging QTs and genetic loci across multiple modalities, but also the modality-specific imaging QTs and genetic loci, exhibiting a flexible capability of identifying complex multi-SNP-multi-QT associations. Benchmarked against the state-of-the-art multi-view SCCA and multi-task SCCA, the proposed method shows better or comparable canonical correlation coefficients and canonical weights on both synthetic and real neuroimaging genetic data. In addition, the identified modality-consistent biomarkers, as well as the modality-specific biomarkers, provide meaningful and interesting information, demonstrating that the dirty multi-task SCCA could be a powerful alternative method in multi-modal brain imaging genetics.