    Viral Quasispecies Reconstruction Using Next Generation Sequencing Reads

    The genomic diversity of viral quasispecies is a subject of great interest, especially for chronic infections. Viral diversity can be characterized with high-throughput sequencing technology (454 Life Sciences, Illumina, SOLiD, Ion Torrent, etc.). Standard assembly software was originally designed for single-genome assembly and cannot be used to assemble closely related quasispecies sequences and estimate their frequencies. This work focuses on parsimonious and maximum likelihood models for assembling viral quasispecies and estimating their frequencies from 454 sequencing data. Our methods have been applied to several RNA viruses (HCV, IBV) as well as DNA viruses (HBV), genotyped using 454 Life Sciences amplicon and shotgun methods.

    Recent advances in inferring viral diversity from high-throughput sequencing data

    Rapidly evolving RNA viruses prevail within a host as a collection of closely related variants, referred to as viral quasispecies. Advances in high-throughput sequencing (HTS) technologies have facilitated the assessment of the genetic diversity of such virus populations at an unprecedented level of detail. However, analysis of HTS data from virus populations is challenging because the reads are short and error-prone. To account for the uncertainties originating from these limitations, several computational and statistical methods have been developed for studying the genetic heterogeneity of virus populations. Here, we review methods for the analysis of HTS reads, including approaches to local diversity estimation and global haplotype reconstruction. Challenges posed by aligning reads, as well as the impact of reference biases on diversity estimates, are also discussed. In addition, we address some of the experimental approaches designed to improve the biological signal-to-noise ratio. In the future, computational methods for the analysis of heterogeneous virus populations are likely to continue being complemented by technological developments. ISSN: 0168-170

    Algorithms for Transcriptome Quantification and Reconstruction from RNA-Seq Data

    Massively parallel whole transcriptome sequencing, with its ability to generate full transcriptome data at the single-transcript level, provides a powerful tool with multiple interrelated applications, including transcriptome reconstruction and gene/isoform expression estimation, also known as transcriptome quantification. As a result, whole transcriptome sequencing has become the technology of choice for performing transcriptome analysis, rapidly replacing array-based technologies. The most commonly used transcriptome sequencing protocol, referred to as RNA-Seq, generates short (single or paired) sequencing tags from the ends of randomly generated cDNA fragments. The RNA-Seq protocol reduces the sequencing cost and significantly increases data throughput, but reconstructing full-length transcripts and accurately estimating their abundances across all cell types remains computationally challenging. We focus on two main problems in transcriptome data analysis, namely, transcriptome reconstruction and quantification. Transcriptome reconstruction, also referred to as novel isoform discovery, is the problem of reconstructing the transcript sequences from the sequencing data. Reconstruction can be done de novo or it can be assisted by existing genome and transcriptome annotations. Transcriptome quantification refers to the problem of estimating the expression level of each transcript. We present genome-guided and annotation-guided transcriptome reconstruction methods as well as methods for transcript and gene expression level estimation. Empirical results on both synthetic and real RNA-Seq datasets show that the proposed methods improve transcriptome quantification and reconstruction accuracy compared to previous methods.

    Embracing heterogeneity: coalescing the Tree of Life and the future of phylogenomics

    Building the Tree of Life (ToL) is a major challenge of modern biology, requiring advances in cyberinfrastructure, data collection, theory, and more. Here, we argue that phylogenomics stands to benefit by embracing the many heterogeneous genomic signals emerging from the first decade of large-scale phylogenetic analysis spawned by high-throughput sequencing (HTS). Such signals include those most commonly encountered in phylogenomic datasets, such as incomplete lineage sorting, but also those reticulate processes emerging with greater frequency, such as recombination and introgression. Here we focus specifically on how phylogenetic methods can accommodate the heterogeneity incurred by such population genetic processes; we do not discuss phylogenetic methods that ignore such processes, such as concatenation or supermatrix approaches or supertrees. We suggest that methods of data acquisition and the types of markers used in phylogenomics will remain restricted until a posteriori methods of marker choice are made possible with routine whole-genome sequencing of taxa of interest. We discuss limitations and potential extensions of a model supporting innovation in phylogenomics today, the multispecies coalescent model (MSC). Macroevolutionary models that use phylogenies, such as character mapping, often ignore the heterogeneity on which building phylogenies increasingly relies; we suggest that assimilating such heterogeneity is an important goal moving forward. Finally, we argue that an integrative cyberinfrastructure linking all steps of the process of building the ToL, from specimen acquisition in the field to publication and tracking of phylogenomic data, as well as a culture that values contributors at each step, are essential for progress.

    Phylogenomic and population genomic insights on the evolutionary history of coffee leaf rust within the rust fungi

    Doctoral thesis, Biology and Ecology of Global Change (Genome Biology and Evolution), Universidade de Lisboa, Faculdade de Ciências, 2018. Fungi are currently responsible for more than 30% of the emerging diseases worldwide and rust fungi (Pucciniales, Basidiomycota) are one of the most destructive groups of plant pathogens. In this thesis, two genomic approaches were pursued to further our knowledge of these pathogenic fungi at the macro-evolutionary level, using phylogenomics, and at the micro-evolutionary level, using population genomics. At the macro-evolutionary level, a phylogenomics pipeline was developed with the aim of investigating the role of positive selection in the origin of the rusts, particularly in relation to their obligate biotrophic life-style and pathogenicity. With up to 30% of the ca. 1000 screened genes showing a signal of positive selection, these results revealed a pervasive role of natural selection in the origin of this fungal group, with an enrichment of functional classes involved in nutrient uptake and secondary metabolites. Furthermore, positive selection was detected on conserved amino acid sites, revealing an unexpected but potentially important role of natural selection in codon usage preferences. At the micro-evolutionary level, the focus was shifted to the coffee rust, Hemileia vastatrix, which is the causal agent of leaf rust disease and the main threat to Arabica coffee production worldwide. Using RAD sequencing to produce thousands of informative SNPs for a broad and unique sampling of this species, the aim was to investigate its evolutionary history and translate population genomic insights into recommendations for disease control. The results of this work overturned most of the preconceptions about the pathogen by revealing that instead of a single unstructured and large population, H. vastatrix is most likely a complex of cryptic species with marked host specialization. 
Moreover, genomic signatures of hybridization and introgression occurring between these lineages were uncovered, raising the possibility that virulence factors may be quickly exchanged. The most recent “domesticated” lineage exclusively infects the most important coffee species, and SNP linkage analyses revealed the presence of recombination among isolates that were previously thought to be clonal. Altogether, these results considerably raise the estimated evolutionary potential of this pathogen to overcome disease control measures in coffee crops. To undertake most of the tasks in this project, a new computational application called TriFusion was developed to streamline the gathering, processing and visualization of big genomic data.

    Statistical Population Genomics

    This open access volume presents state-of-the-art inference methods in population genomics, focusing on data analysis based on rigorous statistical techniques. After introducing general concepts related to the biology of genomes and their evolution, the book covers state-of-the-art methods for the analysis of genomes in populations, including demography inference, population structure analysis and detection of selection, using both model-based inference and simulation procedures. Last but not least, it offers an overview of the current knowledge acquired by applying such methods to a large variety of eukaryotic organisms. Written in the highly successful Methods in Molecular Biology series format, chapters include introductions to their respective topics, pointers to the relevant literature, step-by-step, readily reproducible laboratory protocols, and tips on troubleshooting and avoiding known pitfalls. Authoritative and cutting-edge, Statistical Population Genomics aims to promote and ensure successful applications of population genomic methods to an increasing number of model systems and biological questions.

    Characterization of vascular heterogeneity of astrocytomas grade 4 for supporting patient prognosis estimation, and treatment response assessment

    Brain tumors are among the most devastating diseases today because of the significant cognitive impairment suffered by patients, the high mortality rate, and the poor prognosis. Five-year survival for astrocytomas grade 4 is approximately 5% of diagnosed patients, making them the most aggressive and lethal tumors of the Central Nervous System (CNS). Astrocytomas grade 4 remain a complex, unresolved medical problem. Despite accounting for more than 60% of malignant brain tumors in adults, these tumors have a low relative prevalence and are considered an orphan disease, which makes it difficult to develop new drugs or treatments that might benefit patients. 
The aggressiveness of these tumors is due to several characteristics, such as strong angiogenesis, necrosis, vascular microproliferation, the capacity of the tumor cells to invade and infiltrate, and a particular immune microenvironment. In addition, because astrocytomas grade 4 progress rapidly, different specific regions that change over time coexist in the lesion area. This complex nature, along with the marked interpatient, intratumor, and longitudinal heterogeneity, complicates the success of a single effective treatment for all patients. Magnetic Resonance Imaging (MRI) is a useful technique for characterizing tumor morphology and vascularity. Using advanced and robust methods to analyze MR images collected in the initial stages of patient management allows the delineation of different regions of astrocytomas grade 4, making these methods useful tools for researchers, radiologists and neurosurgeons. In addition, the calculation of imaging vascular biomarkers, such as those proposed in this thesis, would facilitate tumor characterization, prognosis estimation and more personalized treatment approaches. This thesis proposes four fundamental pillars to advance the management of astrocytomas grade 4. These include I) the multilevel characterization of the tumor to improve classifications of high-grade CNS gliomas; II) the search for and development of robust biomarkers for estimating patient prognosis from the presurgical moment; III) as well as for evaluating the response to treatments and the selection of patients who may benefit from specific therapies; and IV) the design and implementation of clinical studies and protocols for long-term data collection from notable international cohorts of patients. To address these four pillars, an interdisciplinary approach has been used that combines medical imaging analysis, advanced artificial intelligence techniques, and molecular, histopathological, and clinical variables. 
In conclusion, we have addressed the influence of both interpatient and intratumor heterogeneity of astrocytoma grade 4 on tumor characterization and classification, patient prognosis estimation, and the prediction of treatment responses. In addition, different clinical studies have been designed and implemented, allowing the collection of multilevel data from international cohorts of patients with astrocytoma grade 4. Álvarez Torres, MDM. (2022). Characterization of vascular heterogeneity of astrocytomas grade 4 for supporting patient prognosis estimation, and treatment response assessment [Tesis doctoral]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/18895