1,011 research outputs found
Leveraging technology-driven strategies to untangle omics big data: circumventing roadblocks in clinical facets of oral cancer
Oral cancer is one of the most rapidly progressing cancers associated with significant mortality, owing to its extreme degree of invasiveness and aggressive inclination. The early occurrences of this cancer can be clinically deceiving, leading to a poor overall survival rate. The primary concerns from a clinical perspective include delayed diagnosis, rapid disease progression, resistance to various chemotherapeutic regimens, and aggressive metastasis, which collectively pose a substantial threat to prognosis. Conventional clinical practices observed since antiquity no longer offer the best possible options to circumvent these roadblocks. The world of current cancer research has been revolutionized with the advent of state-of-the-art technology-driven strategies that offer a ray of hope in confronting said challenges by highlighting the crucial underlying molecular mechanisms and drivers. In recent years, bioinformatics and Machine Learning (ML) techniques have enhanced the possibility of early detection, evaluation of prognosis, and individualization of therapy. This review elaborates on the application of the aforesaid techniques in unraveling potential hints from omics big data to address the complexities existing in various clinical facets of oral cancer. The first section demonstrates the utilization of omics data and ML to disentangle the impediments related to diagnosis. This includes the application of technology-based strategies to optimize early detection, classification, and staging via uncovering biomarkers and molecular signatures. Furthermore, breakthrough concepts such as salivaomics-driven non-invasive biomarker discovery and omics-complemented surgical interventions are articulated in detail. In the following part, the identification of novel disease-specific targets alongside potential therapeutic agents to confront oral cancer via omics-based methodologies is presented. Additionally, a special emphasis is placed on drug resistance, precision medicine, and drug repurposing. In the final section, we discuss the research approaches oriented toward unveiling the prognostic biomarkers and constructing prediction models to capture the metastatic potential of the tumors. Overall, we intend to provide a bird’s eye view of the various omics, bioinformatics, and ML approaches currently being used in oral cancer research through relevant case studies
Artificial intelligence for predictive biomarker discovery in immuno-oncology: a systematic review
Background: The widespread use of immune checkpoint inhibitors (ICIs) has revolutionised treatment of multiple cancer types. However, selecting patients who may benefit from ICI remains challenging. Artificial intelligence (AI) approaches allow exploitation of high-dimension oncological data in research and development of precision immuno-oncology. Materials and methods: We conducted a systematic literature review of peer-reviewed original articles studying the ICI efficacy prediction in cancer patients across five data modalities: genomics (including genomics, transcriptomics, and epigenomics), radiomics, digital pathology (pathomics), and real-world and multimodality data. Results: A total of 90 studies were included in this systematic review, with 80% published in 2021-2022. Among them, 37 studies included genomic, 20 radiomic, 8 pathomic, 20 real-world, and 5 multimodal data. Standard machine learning (ML) methods were used in 72% of studies, deep learning (DL) methods in 22%, and both in 6%. The most frequently studied cancer type was non-small-cell lung cancer (36%), followed by melanoma (16%), while 25% included pan-cancer studies. No prospective study design incorporated AI-based methodologies from the outset; rather, all implemented AI as a post hoc analysis. Novel biomarkers for ICI in radiomics and pathomics were identified using AI approaches, and molecular biomarkers have expanded past genomics into transcriptomics and epigenomics. Finally, complex algorithms and new types of AI-based markers, such as meta-biomarkers, are emerging by integrating multimodal/multi-omics data. Conclusion: AI-based methods have expanded the horizon for biomarker discovery, demonstrating the power of integrating multimodal data from existing datasets to discover new meta-biomarkers. While most of the included studies showed promise for AI-based prediction of benefit from immunotherapy, none provided high-level evidence for immediate practice change. 
A priori planned prospective trial designs are needed to cover all lifecycle steps of these software biomarkers, from development and validation to integration into clinical practice
Molecular genetic characterization of sarcomas to identify prognostic risk groups and potential therapeutic targets
Sarcomas are rare tumors characterized by considerable heterogeneity at the histological, molecular, and genetic levels. Despite all the advances in modern cancer treatment, patients with advanced-stage sarcoma still have limited therapeutic options and an unfavorable prognosis. Because examining the genetic profile makes it possible to identify not only prognostic but also therapy-relevant alterations in heterogeneous diseases, genetic analyses have become an indispensable part of modern cancer treatment.
In this study, we retrospectively analyzed the genetic profile of a real-life cohort of 53 sarcoma patients using a 720-gene panel.
Given the heterogeneity of sarcomas, several histopathological subtypes were analyzed, with leiomyosarcoma (17%) being the most common. The mean age of the patients at the time of analysis was 49 years. The mean interval from initial diagnosis to genetic analysis was 46.8 months. Mean overall survival was 55.9 months.
Every patient underwent tumor genome sequencing with a 720-gene panel. A low TMB value was found in 76.9% of patients. None of the patients was identified as microsatellite-unstable. 25% of patients showed homologous recombination deficiency (HRD). A fusion gene was detected in 30.8%, with EWSR1-FLI1 and EWSR1-WT1 being the most frequent. In total, 38 copy number alterations (CNAs) were found, indicating considerable genomic instability. Germline mutations were found in 15 patients, all of them treatment-relevant, with the mutation in the MUTYH gene being the most common. Therapy-relevant somatic mutations were found in 47 patients (3.2 mutations/patient). The most frequently affected genes were TP53, CDKN2A-C, CDK4, RB1, and ATRX.
Based on the NGS results, 39.6% of patients received a personalized antitumor therapy. The median overall survival (OS) of patients whose treatment was guided by the NGS analysis data was 43 months, versus 33 months for patients without targeted therapies.
Our NGS data from a heterogeneous cohort of 53 sarcoma patients suggest that personalized therapies based on the results of 720-gene panel sequencing could lead to improved clinical outcomes in sarcoma patients
Discovery of genetic factors for reading ability and dyslexia
The ability to read is critical to access wider learning and achieve qualifications, for accessing employment, and for adult life skills. Approximately one in ten individuals are affected by dyslexia, a learning difficulty which primarily impacts word reading and spelling. Specifically, phonological processing (the ability to decode phonemes) is impaired in dyslexia. Whilst some believe dyslexia represents the extreme end of a continuum of reading ability, others have suggested it is a distinct trait.
Variation in reading ability is a highly heritable (possibly 70%) complex trait caused by many genetic variants with a small effect size. However, the genetic architecture of reading ability and dyslexia is largely unknown due to a lack of quantitative genetic studies with sufficient statistical power to detect such small effect sizes. Previously, most genetic studies of reading ability have been conducted using samples of children with dyslexia, which tend to be modest in size. Whilst large samples of genotyped unselected adults have been collected (for example UK Biobank), phenotypic data on reading or language skills is rarely prioritised.
The overall aim of this thesis is to discover genetic variants associated with dyslexia and variation in reading skill in order to better understand the aetiology of reading difficulties, which in turn, may inform prediction, identification and intervention strategies in the future. Firstly, I will conduct a genome-wide association (GWA) study of over 50,000 adults with a self-reported dyslexia diagnosis and over 1 million controls to identify associated single nucleotide polymorphisms (SNPs). I will also explore ways to improve power for discovering genetic factors associated with reading ability. To do this, I will first investigate whether unselected adult samples are valid as a means to identify genetic factors associated with reading skill through a candidate gene approach. Secondly, I will investigate whether proxy reading phenotypes are also a means to gain power through large cohorts that have no quantitative measure of reading ability. Such samples may be informative for future GWA meta-analysis of quantitative reading ability.
In Chapter 1, I will first introduce reading ability and dyslexia. I will discuss how reading ability is a quantitative trait and how it can be measured before discussing how dyslexia is identified. Then, I will consider how dyslexia may relate to reading ability: whether it represents the extreme end of a continuum of reading or whether it is a distinct trait. I will then introduce the known causes of variation in reading ability and dyslexia, which includes both environmental and genetic factors. Next, I will present the history of genetic studies of reading ability and dyslexia and their limitations. Finally, I will discuss the current state of genetic research into reading ability and introduce the aims of my thesis in detail.
Chapter 2 is a publication in Nature Genetics entitled ‘Discovery of 42 genome-wide significant loci associated with dyslexia’ which includes GWA analysis of over 1 million 23andMe, Inc participants reporting on dyslexia diagnosis. I identify 42 independent genome-wide significant loci, 15 of which are in genes previously linked to cognitive ability and/or educational attainment, and 27 of which are novel and may be more specific to dyslexia. Extensive downstream biological analysis is performed alongside genetic correlations with other traits and dyslexia polygenic score prediction of quantitative reading scores.
Chapter 3 is a publication in Twin Research and Human Genetics on ‘The association of dyslexia and developmental speech and language disorder candidate genes with reading and language abilities in adults’ which analyses an adult population cohort with quantitative measures of reading and language ability to replicate previous associations of candidate genes and biological pathways with dyslexia. I demonstrate that unselected adult populations are a valid means by which to identify genes which have previously been associated with dyslexia and/or speech and language disorder.
Chapter 4 is a research chapter in which I construct a proxy reading phenotype from measures of reading frequency in an unselected adult sample for whom a quantitative measure of reading ability is not available. I find that a dyslexia polygenic score constructed from the dyslexia GWA analysis in Chapter 2 cannot explain variation in the proxy phenotype, suggesting that book reading is not a sufficient substitute for reading ability.
Finally, in Chapter 5, I integrate and discuss my research findings. I highlight the discovery of 42 variants associated with dyslexia through GWAS, in addition to the discovery of new genes and biological pathways which may form part of the biological basis of dyslexia. Following this, I consider what GWAS tells us about candidate gene findings. I discuss traits which are genetically correlated with dyslexia, including quantitative reading skills and ADHD. I consider the relationship between dyslexia and reading ability, and how genetic studies can help us to understand this better. I also consider the relationship between dyslexia and other developmental disorders, and how genetic studies can help us to understand this better. Lastly, I discuss methods to boost power for GWAS of reading ability
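The polygenic score mentioned in this abstract can be illustrated with a minimal sketch. It assumes the usual additive definition (a weighted sum of allele dosages, with GWA effect sizes as weights); the variant identifiers and weights below are hypothetical placeholders, not values from the thesis, and the thesis's own scoring pipeline is not described in the abstract.

```python
def polygenic_score(dosages, effect_sizes):
    """Additive polygenic score: sum of effect size x allele dosage.

    dosages: dict mapping variant id -> effect-allele count (0, 1 or 2)
    effect_sizes: dict mapping variant id -> per-allele GWA effect estimate
    Variants missing from either input are skipped.
    """
    shared = dosages.keys() & effect_sizes.keys()
    return sum(effect_sizes[v] * dosages[v] for v in shared)


# Hypothetical weights and genotypes, for illustration only.
example_weights = {"variant_1": 0.10, "variant_2": -0.05, "variant_3": 0.02}
example_person = {"variant_1": 2, "variant_2": 1, "variant_4": 2}
example_score = polygenic_score(example_person, example_weights)
```

Only variants present in both inputs contribute, so `variant_3` and `variant_4` are ignored in the example.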
The development of bioinformatics workflows to explore single-cell multi-omics data from T and B lymphocytes
The adaptive immune response is responsible for recognising, containing and eliminating viral infection, and protecting from further reinfection. This antigen-specific response is driven by T and B cells, which recognise antigenic epitopes via highly specific heterodimeric surface receptors, termed T-cell receptors (TCRs) and B cell receptors (BCRs). The theoretical diversity of the receptor repertoire that can be generated via homologous recombination of V, D and J genes is large enough (>10^15 unique sequences) that virtually any antigen can be recognised. However, only a subset of these are generated within the human body, and how they succeed in specifically recognising any pathogen(s) and distinguishing these from self-proteins remains largely unresolved.
The recent advances in applying single-cell genomics technologies to simultaneously measure the clonality, surface phenotype and transcriptomic signature of pathogen-specific immune cells have significantly improved understanding of these questions. Single-cell multi-omics permits the accurate identification of clonally expanded populations, their differentiation trajectories, the level of immune receptor repertoire diversity involved in the response and the phenotypic and molecular heterogeneity.
This thesis aims to develop a bioinformatic workflow utilising single-cell multi-omics data to explore, quantify and predict the clonal and transcriptomic signatures of the human T-cell response during and following viral infection. In the first aim, a web application, VDJView, was developed to facilitate the simultaneous analysis and visualisation of clonal, transcriptomic and clinical metadata of T and B cell multi-omics data. The application permits non-bioinformaticians to perform quality control and common analyses of single-cell genomics data integrated with other metadata, thus permitting the identification of biologically and clinically relevant parameters. The second aim pertains to analysing the functional, molecular and immune receptor profiles of CD8+ T cells in the acute phase of primary hepatitis C virus (HCV) infection. This analysis identified a novel population of progenitors of exhausted T cells, and lineage tracing revealed distinct trajectories with multiple fates and evolutionary plasticity. Furthermore, it was observed that high-magnitude IFN-γ CD8+ T-cell response is associated with the increased probability of viral escape and chronic infection. Finally, in the third aim, a novel analysis is presented based on the topological characteristics of a network generated on pathogen-specific, paired-chain, CD8+ TCRs. This analysis revealed how some cross-reactivity between TCRs can be explained via the sequence similarity between TCRs and that this property is not uniformly distributed across all pathogen-specific TCR repertoires. Strong correlations between the topological properties of the network and the biological properties of the TCR sequences were identified and highlighted.
The suite of workflows and methods presented in this thesis is designed to be adaptable to various T and B cell multi-omic datasets. The associated analyses contribute to understanding the role of T and B cells in the adaptive immune response to viral infection and cancer
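The idea of a network built on sequence similarity between TCRs can be sketched as follows. This is an illustration under stated assumptions, not the thesis's method: the abstract does not specify the similarity measure or threshold, so edit distance with a cutoff of 1 is assumed here, and the CDR3-like strings are made-up examples.

```python
from itertools import combinations


def edit_distance(a, b):
    """Levenshtein distance between two strings (row-by-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def build_similarity_network(sequences, max_distance=1):
    """Connect two sequences when their edit distance is <= max_distance (assumed rule)."""
    graph = {s: set() for s in sequences}
    for a, b in combinations(graph, 2):
        if edit_distance(a, b) <= max_distance:
            graph[a].add(b)
            graph[b].add(a)
    return graph


def connected_components(graph):
    """Return the connected components of the network as a list of sets."""
    seen, components = set(), []
    for start in graph:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(graph[node] - component)
        seen |= component
        components.append(component)
    return components


# Made-up example sequences, for illustration only.
example_sequences = ["CASSLGQYF", "CASSLGQFF", "CASSPGQFF", "CAWTRDNEQF"]
example_network = build_similarity_network(example_sequences)
example_components = connected_components(example_network)
```

Topological properties such as node degree (`len(example_network[s])`) or component size can then be compared with biological annotations of each sequence.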
Effects of municipal smoke-free ordinances on secondhand smoke exposure in the Republic of Korea
Objective: To reduce premature deaths due to secondhand smoke (SHS) exposure among non-smokers, the Republic of Korea (ROK) adopted changes to the National Health Promotion Act, which allowed local governments to enact municipal ordinances to strengthen their authority to designate smoke-free areas and levy penalty fines. In this study, we examined national trends in SHS exposure after the introduction of these municipal ordinances at the city level in 2010. Methods: We used interrupted time series analysis to assess whether the trends of SHS exposure in the workplace and at home, and the primary cigarette smoking rate changed following the policy adjustment in the national legislation in ROK. Population-standardized data for selected variables were retrieved from a nationally representative survey dataset and used to study the policy action’s effectiveness. Results: Following the change in the legislation, SHS exposure in the workplace reversed course from an increasing (18% per year) trend prior to the introduction of these smoke-free ordinances to a decreasing (−10% per year) trend after adoption and enforcement of these laws (β2 = 0.18, p-value = 0.07; β3 = −0.10, p-value = 0.02). SHS exposure at home (β2 = 0.10, p-value = 0.09; β3 = −0.03, p-value = 0.14) and the primary cigarette smoking rate (β2 = 0.03, p-value = 0.10; β3 = 0.008, p-value = 0.15) showed no significant changes in the sampled period. Although analyses stratified by sex showed that the allowance of municipal ordinances resulted in reduced SHS exposure in the workplace for both males and females, they did not affect the primary cigarette smoking rate as much, especially among females. Conclusion: Strengthening the role of local governments by giving them the authority to enact and enforce penalties on SHS exposure violation helped ROK to reduce SHS exposure in the workplace. However, smoking behaviors and related activities seemed to shift to less restrictive areas such as on the streets and in apartment hallways, negating some of the effects due to these ordinances. Future studies should investigate how smoke-free policies beyond public places can further reduce the SHS exposure in ROK
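Interrupted time series analysis of this kind is often fitted as a segmented regression. The sketch below assumes one common parametrisation, y = b0 + b1·t + b2·after + b3·after·(t − t0), fitted by ordinary least squares. The abstract does not state the exact model, so the coefficient layout here need not match the study's β2 and β3, and the data are synthetic.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_interrupted_time_series(times, values, change_point):
    """Least-squares fit of [intercept, pre-slope, level change, slope change]."""
    rows = []
    for t in times:
        after = 1.0 if t >= change_point else 0.0
        rows.append([1.0, float(t), after, after * (t - change_point)])
    k = 4
    # Normal equations: (X^T X) beta = X^T y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, values)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Synthetic, noise-free series: rising before t = 6, falling afterwards.
example_times = list(range(12))
example_values = [
    10 + 0.5 * t + (-1.0 - 0.8 * (t - 6) if t >= 6 else 0.0)
    for t in example_times
]
example_fit = fit_interrupted_time_series(example_times, example_values, 6)
```

Under this layout the post-intervention slope is the sum of the pre-slope and the slope-change coefficient, which is how a reversal from an increasing to a decreasing trend would show up.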
The developmental transcriptome for Lytechinus variegatus exhibits temporally punctuated gene expression changes
Embryonic development is arguably the most complex process an organism undergoes during its lifetime, and understanding this complexity is best approached with a systems-level perspective. The sea urchin has become a highly valuable model organism for understanding developmental specification, morphogenesis, and evolution. As a non-chordate deuterostome, the sea urchin occupies an important evolutionary niche between protostomes and vertebrates. Lytechinus variegatus (Lv) is an Atlantic species that has been well studied, and which has provided important insights into signal transduction, patterning, and morphogenetic changes during embryonic and larval development. The Pacific species, Strongylocentrotus purpuratus (Sp), is another well-studied sea urchin, particularly for gene regulatory networks (GRNs) and cis-regulatory analyses. A well-annotated genome and transcriptome for Sp are available, but similar resources have not been developed for Lv. Here, we provide an analysis of the Lv transcriptome at 11 timepoints during embryonic and larval development. Temporal analysis suggests that the gene regulatory networks that underlie specification are well-conserved among sea urchin species. We show that the major transitions in variation of embryonic transcription divide the developmental time series into four distinct, temporally sequential phases. Our work shows that sea urchin development occurs via sequential intervals of relatively stable gene expression states that are punctuated by abrupt transitions. National Science Foundation. First author draf
Distinct Assemblies of Heterodimeric Cytokine Receptors Govern Stemness Programs in Leukemia
Published first May 16, 2023. Leukemia stem cells (LSC) possess distinct self-renewal and arrested differentiation properties that are responsible for disease emergence, therapy failure, and recurrence in acute myeloid leukemia (AML). Despite AML displaying extensive biological and clinical heterogeneity, LSC with high interleukin-3 receptor (IL3R) levels are a constant yet puzzling feature, as this receptor lacks tyrosine kinase activity. Here, we show that the heterodimeric IL3Rα/βc receptor assembles into hexamers and dodecamers through a unique interface in the 3D structure, where high IL3Rα/βc ratios bias hexamer formation. Importantly, receptor stoichiometry is clinically relevant as it varies across the individual cells in the AML hierarchy, in which high IL3Rα/βc ratios in LSCs drive hexamer-mediated stemness programs and poor patient survival, while low ratios mediate differentiation. Our study establishes a new paradigm in which alternative cytokine receptor stoichiometries differentially regulate cell fate, a signaling mechanism that may be generalizable to other transformed cellular hierarchies and of potential therapeutic significance. Winnie L. Kan, Urmi Dhagat, Kerstin B. Kaufmann, Timothy R. Hercus, Tracy L. Nero, Andy G.X. Zeng, John Toubia, Emma F. Barry, Sophie E. Broughton, Guillermo A. Gomez, Brooks A. Benard, Mara Dottore, Karen S. Cheung Tung Shing, Héléna Boutzen, Saumya E. Samaraweera, Kaylene J. Simpson, Liqing Jin, Gregory J. Goodall, C. Glenn Begley, Daniel Thomas, Paul G. Ekert, Denis Tvorogov, Richard J. D'Andrea, John E. Dick, Michael W. Parker, and Angel F. Lope
Comparative genomics of recent adaptation in Candida pathogens
[eng] Fungal infections pose a serious health threat, affecting >1,000 million people and causing ~1.5 million deaths each year. The problem is growing due to insufficient diagnostic and therapeutic options, an increased number of susceptible patients, the expansion of pathogens partly linked to climate change and the rise of antifungal drug resistance. Among other fungal pathogens, Candida species are a major cause of severe hospital-acquired infections, with high mortality in immunocompromised patients. Various Candida pathogens constitute a public health issue, which requires further efforts to develop new drugs, optimize currently available treatments and improve diagnostics. Given the high dynamism of Candida genomes, a promising strategy to improve current therapies and diagnostics is to understand the evolutionary mechanisms of adaptation to antifungal drugs and to the human host. Previous work using in vitro evolution, population genomics, selection inferences and Genome Wide Association Studies (GWAS) has partially clarified such recent adaptation, but various open questions remain. In the three research articles that make up this PhD thesis we addressed some of these gaps from the perspective of comparative genomics.
First, we addressed methodological issues regarding the analysis of Candida genomes. Studying recent adaptation in these pathogens requires adequate bioinformatic tools for variant calling, filtering and functional annotation. Among other reasons, current methods are suboptimal due to limited accuracy in identifying structural variants from short read sequencing data. In addition, there is a need for easy-to-use, reproducible variant calling pipelines. To address these gaps we developed the “personalized Structural Variation detection” pipeline (perSVade), a framework to call, filter and annotate several variant types, including structural variants, directly from reads. PerSVade enables accurate identification of structural variants in any species of interest, such as Candida pathogens. In addition, our tool automatically predicts the structural variant calling accuracy on simulated genomes, which informs about the reliability of the calling process. Furthermore, perSVade can be used to analyze single nucleotide polymorphisms and copy-number variants, so that it facilitates multi-variant, reproducible genomic studies. This tool will likely boost variant analyses in Candida pathogens and beyond.
Second, we addressed open questions about recent adaptation in Candida, using perSVade for variant identification. On the one hand, we investigated the evolutionary mechanisms of drug resistance in Candida glabrata. For this, we used a large-scale in vitro evolution experiment to study adaptation to two commonly-used antifungals: fluconazole and anidulafungin. Our results show rapid adaptation to one or both drugs, with moderate fitness costs and through few mutations in a narrow set of genes. In addition, we characterize a novel role of ERG3 mutations in cross-resistance towards fluconazole in anidulafungin-adapted strains. These findings illuminate the mutational paths leading to drug resistance and cross-resistance in Candida pathogens. On the other hand, we reanalyzed ~2,000 public genomes and phenotypes to understand the signs of recent selection and drug resistance in six major Candida species: C. auris, C. glabrata, C. albicans, C. tropicalis, C. parapsilosis and C. orthopsilosis. We found hundreds of genes under recent selection, suggesting that clinical adaptation is diverse and complex. These involve species-specific but also convergently affected processes, such as cell adhesion, which could underlie conserved adaptive mechanisms. In addition, using GWAS we predicted known drivers of antifungal resistance alongside potentially novel players. Furthermore, our analyses reveal an important role of generally-overlooked structural variants, and suggest an unexpected involvement of (para)sexual recombination in the spread of resistance. Taken together, our findings provide novel insights on how Candida pathogens adapt to human-related environments and suggest candidate genes that deserve future attention. In summary, the results of this thesis improve our knowledge about the mechanisms of recent adaptation in Candida pathogens, which may enable improved therapeutic and diagnostic applications.
[cat] Fungal infections represent a serious health threat, affecting more than 1,000 million people and causing approximately 1.5 million deaths each year. The problem is growing because of insufficient therapeutic and diagnostic options, the increasing number of susceptible patients, the expansion of pathogens partly linked to climate change, and the rise of resistance to antifungal drugs. Among various pathogenic fungi, yeasts of the genus Candida are a major cause of nosocomial infections, with high mortality in immunocompromised patients. Several Candida species constitute a public health problem, which calls for further efforts to develop new drugs, optimize the available treatments, and improve diagnostics. Given the genomic dynamism of these pathogens, a promising strategy for improving current therapies and diagnostics is to understand the evolutionary mechanisms of adaptation to antifungal drugs and to the human host. Previous work using in vitro evolution, population genomics, selection inferences, and genome-wide association studies (GWAS) has partially clarified this recent adaptation, but several open questions remain. In the three articles that make up this doctoral thesis, we addressed some of these questions from the perspective of comparative genomics.
First, we addressed methodological questions concerning the analysis of the genomes of Candida species. Studying recent adaptation in these pathogens requires adequate bioinformatic tools for the detection, filtering, and functional annotation of genetic variants. Among other reasons, current methods are suboptimal because of their limited accuracy in identifying structural variants from short-read sequencing data. In addition, there is a need for computational variant-detection tools that are easy to use and reproducible. To address these shortcomings, we developed the bioinformatic method "personalized Structural Variation detection" (perSVade), a tool that enables the detection, filtering, and annotation of several types of variants, including structural variants, directly from reads. PerSVade enables the accurate identification of structural variants in any species of interest, such as Candida pathogens. In addition, our tool automatically predicts the accuracy of detecting these variants on simulated genomes, which informs about the reliability of the process. Finally, perSVade can be used to analyze other types of variants, such as single nucleotide polymorphisms or copy-number changes, thus facilitating comprehensive and reproducible genomic studies. This tool will probably boost genomic analyses in Candida pathogens and in other species as well.
Second, we addressed some of the open questions about recent adaptation in Candida yeasts, using perSVade for variant identification. On the one hand, we investigated the evolutionary mechanisms of resistance to antifungal drugs in Candida glabrata. For this, we used a large-scale in vitro evolution experiment to study adaptation to two common antifungals: fluconazole and anidulafungin. Our results show rapid adaptation to one or both drugs, with a moderate growth cost and through few mutations in a small number of genes. In addition, we characterized a novel role of ERG3 mutations in cross-resistance to fluconazole in anidulafungin-adapted strains. These findings clarify the mutational processes that lead to drug resistance and cross-resistance in Candida pathogens. On the other hand, we reanalyzed approximately 2,000 genomes and phenotypes available in public repositories to understand the genomic signals of recent selection and of resistance to antifungal drugs in six relevant Candida species: C. auris, C. glabrata, C. albicans, C. tropicalis, C. parapsilosis and C. orthopsilosis. We found hundreds of genes under recent selection, suggesting that clinical adaptation is diverse and complex. These genes are related to species-specific functions, but we also find processes altered in a similar way in different pathogens, such as cell adhesion, which indicates conserved adaptation phenomena. Separately, using GWAS we predicted expected mechanisms of antifungal resistance as well as possible new factors. In addition, our analyses reveal an important role of structural variants, which are generally understudied, and suggest an unexpected involvement of (para)sexual recombination in the spread of resistance. Taken together, our findings provide new perspectives on how Candida pathogens adapt to human environments and suggest candidate genes that deserve future investigation. In summary, the results of this thesis improve our knowledge of the mechanisms of recent adaptation in Candida pathogens, which may enable the design of new therapies and diagnostics
Addressing the data bottleneck in medical deep learning models using a human-in-the-loop machine learning approach
[Abstract]: Any machine learning (ML) model is highly dependent on the data it uses for learning, and this is even more important in the case of deep learning models. The problem is a data bottleneck, i.e. the difficulty in obtaining an adequate number of cases and quality data. Another issue is improving the learning process, which can be done by actively introducing experts into the learning loop, in what is known as human-in-the-loop (HITL) ML. We describe an ML model based on a neural network in which HITL techniques were used to resolve the data bottleneck problem for the treatment of pancreatic cancer. We first augmented the dataset using synthetic cases created by a generative adversarial network. We then launched an active learning (AL) process involving human experts as oracles to label both new cases and cases found by the network to be suspect. This AL process was carried out simultaneously with an interactive ML process in which feedback was obtained from humans in order to develop better synthetic cases for each iteration of training. We discuss the challenges involved in including humans in the learning process, especially in relation to human–computer interaction, which is acquiring great importance in building ML models and can condition the success of a HITL approach. This paper also discusses the methodological approach adopted to address these challenges. This work has been supported by the State Research Agency of the Spanish Government (Grant PID2019-107194GB-I00/AEI/10.13039/501100011033) and by the Xunta de Galicia (Grant ED431C 2022/44), supported in turn by the EU European Regional Development Fund. We wish to acknowledge support received from the Centro de Investigación de Galicia CITIC, funded by the Xunta de Galicia and the European Regional Development Fund (Galicia 2014–2020 Program; Grant ED431G 2019/01). Xunta de Galicia; ED431C 2022/44. Xunta de Galicia; ED431G 2019/0
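One round of the active learning loop described above can be sketched as follows. The abstract does not say how the network decides that a case is "suspect", so this example assumes plain uncertainty sampling (predicted probability closest to 0.5). The function names, case identifiers, and the stand-in oracle are all hypothetical.

```python
def select_for_expert_review(predicted_probabilities, budget):
    """Pick the `budget` cases whose predicted probability is closest to 0.5.

    predicted_probabilities: dict mapping case id -> model probability of the
    positive class. Ties are broken by case id so the result is deterministic.
    """
    ranked = sorted(
        predicted_probabilities.items(),
        key=lambda item: (abs(item[1] - 0.5), item[0]),
    )
    return [case_id for case_id, _ in ranked[:budget]]


def active_learning_round(labelled, unlabelled_probabilities, oracle, budget):
    """Query the oracle (a human expert in practice) for the most uncertain cases.

    Returns a new labelled dict plus the list of queried case ids; the caller
    would retrain the model on the enlarged labelled set before the next round.
    """
    queried = select_for_expert_review(unlabelled_probabilities, budget)
    updated = dict(labelled)
    for case_id in queried:
        updated[case_id] = oracle(case_id)
    return updated, queried


# Hypothetical probabilities and a stand-in oracle, for illustration only.
example_probabilities = {"case_a": 0.95, "case_b": 0.52, "case_c": 0.10, "case_d": 0.45}
example_labels, example_queried = active_learning_round(
    {"case_z": 0}, example_probabilities, oracle=lambda case_id: 1, budget=2
)
```

In a real setting the oracle call would be a request to a clinician, which is where the human–computer interaction issues raised in the abstract come in.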