MSI-based mapping strategies in tumour-heterogeneity
Since the early 2000s, considerable innovations in MS technology and associated gene sequencing systems have enabled the "-omics" revolution. The data collected from multiple omics research can be combined to gain a better understanding of cancer's biological activity. Breast and ovarian cancer are among the most common cancers worldwide in women. Despite significant advances in diagnosis, treatment, and subtype identification, breast cancer remains the world's second leading cause of cancer-related deaths in women, with ovarian cancer ranking fifth. Tumour heterogeneity is a significant hurdle in cancer patient prognosis, response to therapy, and metastasis. As such, heterogeneity is one of the most significant and clinically relevant areas of cancer research nowadays. Metabolic reprogramming is a hallmark of malignancy that has been widely acknowledged in recent literature. Metabolic heterogeneity in tumours poses a challenge in developing therapies that exploit metabolic vulnerabilities.
Consequently, it is crucial to approach tumour heterogeneity with an unlabeled yet spatially specific read-out of metabolic and genetic information. The advantage of DESI-MSI technology originates from its untargeted nature, which allows for the investigation of thousands of component distributions, at a micrometre scale, in a single experiment. Most notably, using a DESI-MSI clustering approach could potentially offer novel insights into metabolism, providing a method to characterise metabolically distinct sub-regions and subsequently delineate the underlying genetic drivers through genomic analyses.
Hence, in this study, we aim to map the inter- and intra-tumour metabolic heterogeneity in breast and ovarian cancer by integrating multimodal MSI-based mapping strategies, comprising DESI and MALDI, with Imaging Mass Cytometry (IMC) analysis of the tumour section, using CyTOF, and high-throughput genetic characterisation of metabolically distinct regions by transcriptomics. The multimodal analysis workflow was initially performed on sequential breast cancer Patient-Derived Xenograft (PDX) models and was then extended to primary tumour sections. Moreover, a newly developed DESI-MSI-friendly hydroxypropyl-methylcellulose and polyvinylpyrrolidone (HPMC/PVP) hydrogel-based embedding was successfully established to allow simultaneous preparation and analysis of numerous fresh frozen core-size biopsies in the same Tissue Microarray (TMA) block for the investigation of tumour heterogeneity. Additionally, a single-section strategy was combined with DESI-MSI coupled to Laser Capture Microdissection (LCM) to integrate gene expression analysis and Liquid Chromatography-Mass Spectrometry (LC-MS) on the same tissue segment. The single-section methodology was then tested on ovarian tumours sampled from multiple regions. DESI-MSI-guided spatial transcriptomics was performed to co-register different omics datasets on the same regions of interest (ROIs). This co-registration could unravel possible interactions between distinct metabolic profiles and specific genetic drivers that can lead to intra-tumour heterogeneity.
Linking the findings from these MSI-based and MSI-guided strategies allows for a transition from a qualitative approach to a conceptual understanding of the architecture of the multiple molecular networks responsible for cellular metabolism in tumour heterogeneity.
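The clustering step mentioned in this abstract (grouping pixels into metabolically distinct sub-regions) can be illustrated with a minimal sketch. It assumes plain k-means on per-pixel ion intensities; the abstract does not say which clustering algorithm was used, and the pixel values below are invented for illustration only.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two spectra."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(pixels, k, iters=20, seed=0):
    """Plain k-means; returns one cluster label per pixel spectrum."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(pixels, k)]
    labels = [0] * len(pixels)
    for _ in range(iters):
        # Assign every pixel to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            for p in pixels
        ]
        # Move each centroid to the mean spectrum of its members.
        for c in range(k):
            members = [p for p, lab in zip(pixels, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Hypothetical pixels: intensities of two ions, three pixels from each of
# two tissue regions (values invented for illustration).
PIXELS = [(10.0, 1.0), (9.5, 1.2), (10.2, 0.8),
          (1.0, 9.0), (1.2, 9.5), (0.9, 10.1)]

labels = kmeans(PIXELS, k=2)
print(labels)
```

Pixels that share a label form one candidate sub-region. In a workflow like the one described, such regions would then be the targets for microdissection or spatial transcriptomics.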
Semantic Biclustering
This thesis focuses on the problem of finding interpretable and predictive patterns, expressed in the form of biclusters, with an orientation to biological data. The presented methods are collectively called semantic biclustering, a subfield of data mining. The term semantic biclustering is used because it reflects both the process of finding coherent subsets of rows and columns (biclusters) in a two-dimensional binary matrix and the shared semantic meaning of the elements in those biclusters. Although the work was motivated by biological data, the developed algorithms are generally applicable to any other research field; the only constraint is the format of the input data. The thesis introduces two novel, and in that context basic, approaches for finding semantic biclusters: Bicluster enrichment analysis and Rule and tree learning. Since these methods do not exploit the native hierarchical order of terms in the input ontologies, their run-time is generally long and an induced hypothesis may contain redundant terms. For this reason, a new refinement operator was developed. The operator was incorporated into the well-known CN2 algorithm and uses two reduction procedures, Redundant Generalization and Redundant Non-potential, both of which dramatically prune the rule space and consequently speed up rule induction compared with the traditional refinement operator presented in CN2. The complete algorithm, together with the reduction procedures, was published as an R package called sem1R. To show the practical use of semantic biclustering on real biological problems, the thesis also describes and specifically adapts the sem1R algorithm for two tasks. First, we study a practical application of sem1R in an analysis of E3 ubiquitin ligase in the gastrointestinal tract with respect to tissue regeneration potential. Second, besides discovering biclusters in gene expression data, we adapt sem1R to a different task, namely finding potentially pathogenic genetic variants in a cohort of patients.
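The abstract does not describe how Bicluster enrichment analysis scores a bicluster, so the following is only a generic sketch of one common way to test whether an ontology term is over-represented among the rows (genes) of a bicluster: a one-sided hypergeometric test. The function name, gene identifiers, and annotation set are all hypothetical.

```python
from math import comb

def enrichment_pvalue(universe, annotated, bicluster):
    """P(X >= k) under the hypergeometric distribution.

    universe:  all genes (rows) in the matrix
    annotated: genes carrying the ontology term of interest
    bicluster: genes (rows) belonging to the bicluster
    """
    N = len(universe)
    K = len(annotated & universe)
    n = len(bicluster)
    k = len(annotated & bicluster)
    total = comb(N, n)
    # Sum the upper tail: k or more annotated genes in a random draw of n.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return tail / total

# Hypothetical example: 20 genes, 5 annotated with the term,
# and a 4-gene bicluster containing 3 of the annotated genes.
universe = {f"g{i}" for i in range(20)}
annotated = {"g0", "g1", "g2", "g3", "g4"}
bicluster = {"g0", "g1", "g2", "g10"}

p = enrichment_pvalue(universe, annotated, bicluster)
print(p)
```

A small p-value suggests the term describes the bicluster better than chance. Testing many terms this way would itself require multiple-testing correction, which this sketch leaves out.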
Robust Computational Tools for Multiple Testing with Genetic Association Studies
Resolving the interplay of the genetic components of a complex disease is a challenging endeavor. Over the past several years, genome-wide association studies (GWAS) have emerged as a popular approach to locating common genetic variation within the human genome associated with disease risk. Assessing genotype-phenotype associations across hundreds of thousands of genetic markers using the GWAS approach introduces a potentially high number of false positive signals and requires statistical correction for multiple hypothesis testing. Permutation tests are considered the gold standard for multiple testing correction in GWAS, because they simultaneously provide unbiased Type I error control and high power. However, they demand heavy computational effort, especially with the large-scale data sets of modern GWAS. In recent years, the computational problem has been circumvented by using approximations to permutation tests, but several studies have described sampling conditions under which these approximations appear to be biased.
We have developed an optimized parallel algorithm for the permutation testing approach to multiple testing correction in GWAS, whose implementation essentially removes the computational problem. When applied to GWAS data, our algorithm yields rapid, precise, and powerful multiplicity adjustment, many orders of magnitude faster than existing GWAS statistical software.
Although GWAS have identified many potentially important genetic associations which will advance our understanding of human disease, the common variants with modest effects on disease risk discovered through this approach likely account for a small proportion of the heritability in complex disease. On the other hand, interactions between genetic and environmental factors could account for a substantial proportion of the heritability in a complex disease and are overlooked within the GWAS approach.
We have developed an efficient and easily implemented tool for genetic association studies, whose aim is to identify genes involved in a gene-environment interaction. Our approach is amenable to a wide range of association studies and to genetic marker panels of varying density, and incorporates resampling for multiple testing correction. Within the context of a case-control study design, we demonstrate by way of simulation that our proposed method offers greater statistical power to detect gene-environment interaction than several competing approaches to assessing this type of interaction.
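The permutation approach to multiple testing correction described in this abstract can be sketched as a serial single-step max-T procedure. This is not the optimized parallel algorithm of the thesis, whose details the abstract does not give. The test statistic (absolute difference in mean allele count between cases and controls) and the toy genotypes are assumptions made for illustration.

```python
import random

def mean_diff(genotypes, labels):
    """Absolute difference in mean allele count, cases (1) vs controls (0)."""
    cases = [g for g, y in zip(genotypes, labels) if y == 1]
    controls = [g for g, y in zip(genotypes, labels) if y == 0]
    return abs(sum(cases) / len(cases) - sum(controls) / len(controls))

def maxt_adjusted_pvalues(markers, labels, n_perm=999, seed=0):
    """Single-step max-T adjusted p-value for every marker."""
    rng = random.Random(seed)
    observed = [mean_diff(m, labels) for m in markers]
    exceed = [0] * len(markers)
    shuffled = list(labels)
    for _ in range(n_perm):
        # Permuting case/control labels breaks any genotype-phenotype link
        # while preserving the correlation between markers.
        rng.shuffle(shuffled)
        max_stat = max(mean_diff(m, shuffled) for m in markers)
        for j, obs in enumerate(observed):
            if max_stat >= obs:
                exceed[j] += 1
    return [(1 + e) / (n_perm + 1) for e in exceed]

# Toy data (invented): 6 cases then 6 controls, two markers coded 0/1/2.
LABELS = [1] * 6 + [0] * 6
MARKERS = [
    [2, 2, 2, 2, 2, 1, 0, 0, 0, 0, 0, 1],  # strongly associated marker
    [0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2],  # marker with no association
]

adjusted = maxt_adjusted_pvalues(MARKERS, LABELS)
print(adjusted)
```

Each permutation is independent of the others, which is what makes the method straightforward to parallelise. The cost grows with the number of permutations times the number of markers, and that is the computational burden the abstract refers to.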
Integrative Computational Genomics Based Approaches to Uncover the Tissue-Specific Regulatory Networks in Development and Disease
Indiana University-Purdue University Indianapolis (IUPUI)
Regulatory protein families such as transcription factors (TFs) and RNA-binding proteins (RBPs) are increasingly appreciated for their role in regulating their target genomic and transcriptomic elements, resulting in dynamic transcriptional regulatory networks (TRNs) and post-transcriptional regulatory networks (PTRNs) in higher eukaryotes. A mechanistic understanding of these two network types requires high-resolution, tissue-specific functional annotation of both the proteins and their target sites. This dissertation addresses the need to uncover the tissue-specific regulatory networks in development and disease. This work establishes multiple computational genomics-based approaches to enhance our understanding of regulatory circuits and decipher the associated mechanisms at several layers of biological processes. This study contributes to the research community by providing resources, including novel methods, web interfaces and software, which improve our ability to build high-quality, tissue-specific regulatory binding maps of RBPs and TFs using multi-omics datasets. The study deciphered the broad spectrum of temporal and evolutionary dynamics of the transcriptome and its regulation at the transcriptional and post-transcriptional levels. It also advances our ability to functionally annotate hundreds of RBPs and their RNA binding sites across tissues in the human genome, which helps in decoding the role of RBPs in the context of disease phenotypes, networks, and pathways.
The approaches developed in this dissertation are scalable and adaptable for investigating tissue-specific regulators in any biological system. Overall, this study contributes towards accelerating progress in molecular diagnostics and drug target identification using regulatory network analysis methods in disease and pathophysiology.
Detecting signals of selection in the genomes of Native Americans and admixed Latin Americans
The peopling of the Americas represents the last major expansion of human populations worldwide. As the first humans moved into the continent, they were exposed to new environments that required them to adapt. The subsequent colonization of the continent by Europeans, along with the African slave trade, involved a major admixture process that was accompanied by new selective pressures, most notably exposure to new pathogens. Applying current and novel methods to genome-wide SNP data from Native Americans and admixed Latin Americans, this PhD thesis provides an analysis of the adaptive history of the Americas. I show that, prior to European contact, candidate regions of selection in Native Americans include genes associated with metabolic traits, highlighting a possible adaptation to dietary changes. Using novel and existing methods to detect post-admixture selection, I show that genes related to immune response were probably under selection in admixed Latin Americans. As an example of the evolution of an adaptive trait, I also conduct a genome-wide association study (GWAS) of skin, eye and hair pigmentation in a sample of over 6,000 Latin Americans. I report eighteen independent genome-wide significant signals of association, including five novel variants. One of the novel variants associated with skin pigmentation is common in East Asians and Native Americans but almost absent everywhere else in the world. I show that this variant was selected in East Asians after their split from Europeans and was likely carried to the Americas by the first Americans.
Discerning Drivers of Cancer: Computational Approaches to Somatic Exome Sequencing Data
Paired tumor-normal sequencing of thousands of patients' exomes has revealed millions of somatic mutations, but functional characterization and clinical decision making are stymied because biologically neutral 'passenger' mutations greatly outnumber pathogenic 'driver' mutations. Since most mutations will return negative results if tested, conventional resource-intensive experiments are reserved for mutations that are observed in multiple patients or for rarer mutations found in well-established cancer genes. Most mutations are therefore never tested, diminishing the potential to discover new mechanisms of cancer development and new treatment opportunities. Computational methods that reliably prioritize mutations for testing would greatly increase the translation of sequencing results to clinical care. The goal of this thesis is to develop new approaches that use datasets of protein-coding somatic mutations to identify putative cancer-causing genes and mutations, and to validate these predictions in silico and experimentally. This work is divided into several inter-related efforts, which taken together will help experimental biologists and clinicians focus on hypotheses that can yield novel insights into cancer biology, development, and treatment.
Developing collaborative planning support tools for optimised farming in Western Australia
Land-use (farm) planning is a highly complex and dynamic process. A land-use plan can be optimal at one point in time, but it can quickly go out of date because the variables driving land-use decisions are themselves dynamic. These include external drivers such as weather and produce markets, which also interact with the biophysical processes and management activities of crop production.
The active environment of an annual farm planning process can be envisioned as being cone-like. At the beginning of the sowing year, the number of options open to the manager is huge, although uncertainty is high because future weather and market conditions cannot be foreseen. As the production year unfolds, the uncertainty around weather and markets decreases, and the impact of weather and management activities on future production levels becomes clearer. This restricts the number of alternative management options available to the farm manager. Moreover, every decision made, such as the crop type sown in a paddock, constrains the range of management activities possible in that paddock for the rest of the growing season.
This research has developed a prototype Land-use Decision Support System (LUDSS) to aid farm managers in their tactical farm management decision making. The prototype applies an innovative approach that mimics the way in which a farm manager and/or consultant would search for optimal solutions at a whole-farm level. The model captures the range of possible management activities available to the manager and the impact that both external (to the farm) and internal drivers have on crop production and the environment. It also captures the risk and uncertainty found in the decision space.
The developed prototype is based on a Multiple Objective Decision-Making (MODM) a posteriori approach incorporating an Exhaustive Search method. The model has two objectives: maximising profit and minimising environmental impact. Pareto optimisation theory was chosen as the method to select the optimal solution, and a Monte Carlo simulator is integrated into the prototype to incorporate the dynamic nature of the farm decision-making process. The prototype has a user-friendly front and back end to allow farmers to input data, drive the application and extract information easily.
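The combination of exhaustive search, Monte Carlo simulation and Pareto optimisation named in this abstract can be sketched as follows. This is a minimal illustration, not the LUDSS prototype: the crop table, the two-paddock farm, and the use of Monte Carlo draws to estimate expected profit are all assumptions, and every number is invented.

```python
import itertools
import random

# Hypothetical per-paddock options:
# crop -> (mean profit, profit standard deviation, environmental impact)
CROPS = {
    "wheat": (100.0, 30.0, 5.0),
    "canola": (120.0, 60.0, 8.0),
    "pasture": (40.0, 5.0, 1.0),
}

def expected_profit(plan, n_draws=500, seed=0):
    """Monte Carlo estimate of whole-farm profit under uncertain returns."""
    rng = random.Random(seed)  # same seed for every plan: common random numbers
    total = 0.0
    for _ in range(n_draws):
        total += sum(rng.gauss(CROPS[c][0], CROPS[c][1]) for c in plan)
    return total / n_draws

def impact(plan):
    """Total environmental impact of a plan (lower is better)."""
    return sum(CROPS[c][2] for c in plan)

def pareto_front(candidates):
    """Keep (profit, impact, plan) tuples that no other candidate dominates."""
    front = []
    for a in candidates:
        dominated = any(
            b[0] >= a[0] and b[1] <= a[1] and (b[0] > a[0] or b[1] < a[1])
            for b in candidates
        )
        if not dominated:
            front.append(a)
    return front

# Exhaustive search: every assignment of a crop to each of two paddocks.
candidates = [
    (expected_profit(plan), impact(plan), plan)
    for plan in itertools.product(CROPS, repeat=2)
]
front = pareto_front(candidates)
for profit, env, plan in sorted(front, key=lambda t: t[1]):
    print(plan, round(profit, 1), env)
```

In an a posteriori approach, the full set of non-dominated plans is produced first and the decision maker chooses among them afterwards, which is why the sketch returns the whole front and not a single weighted optimum. Exhaustive enumeration grows exponentially with the number of paddocks, so it is only practical here because the toy farm is tiny.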
Connectivity between MPAs: selecting appropriate taxa and assessing genetic connectivity in two benthic marine invertebrates
Connectivity is fundamental for the persistence of many populations of marine species and is formally identified as one of five key principles for designing an ecologically coherent network of Marine Protected Areas (MPAs) in European waters. However, assessing connectivity between MPAs, and deciding which taxa to include in such assessments, is challenging. Managers of MPAs have typically concentrated their efforts on species that are endangered or rare, or on so-called 'umbrella', 'keystone' or 'flagship' species; however, these species may not always be the best candidates for assessing the connectivity of an MPA network. In this thesis, a meta-analysis was first conducted to study genetic patterns across a broad range of coastal marine taxa in the northeast Atlantic. This meta-analysis provided insights into the biological and methodological information needed to ascertain which taxa may be considered good candidates for assessing genetic connectivity between MPAs across Britain and the wider northeast Atlantic. The knowledge gained from this literature survey facilitated the design of a set of criteria that identify the ideal traits of a candidate species for assessments of genetic connectivity between MPAs; subsequently, based on these criteria, two species were selected to assess connectivity between MPAs in the British network: the pink sea fan (Eunicella verrucosa) and the European lobster (Homarus gammarus). Using 13 microsatellites and 3,743 SNPs, the results for the pink sea fan indicated the presence of three distinct genetic groups, partitioned between sites from western Ireland, southern Portugal and Britain-France. For the European lobster, 86 SNPs indicated strong genetic differentiation between the northeast Atlantic, the middle Mediterranean and the eastern Mediterranean (Aegean Sea).
In addition, there was a pronounced genetic cline across the northeast Atlantic, suggesting that connectivity in the European lobster follows a stepping-stone model of dispersal, which was supported by simulations of larval dispersal. Taken together, the results from these two studies suggest that the MPA network in Britain is sufficient to maintain connectivity in the pink sea fan and the European lobster, and possibly in other species living in comparable habitats with similar life histories and dispersal traits. Moreover, the criteria applied in this thesis to select species appear to facilitate the identification of ideal surrogate taxa for assessing connectivity between MPAs, and could easily be applied to assessments of MPA network connectivity in other seas and oceans around the world.
Natural Environment Research Council (NERC)