501 research outputs found

    Bioinformatics for personal genomics: development and application of bioinformatic procedures for the analysis of genomic data

    In the last decade, the sharp decrease in sequencing costs brought about by high-throughput technologies has completely changed the approach to genetic problems. In particular, whole-exome and whole-genome sequencing are contributing to extraordinary progress in the study of human variants, opening up new perspectives in personalized medicine. Because this is a relatively new and fast-developing field, appropriate tools and specialized knowledge are required for efficient data production and analysis. In 2014, the University of Padua funded the BioInfoGen Strategic Project with the goal of developing technology and expertise in bioinformatics and molecular biology applied to personal genomics. The aim of my PhD was to contribute to this challenge by implementing a series of innovative tools and applying them to investigate, and possibly solve, the case studies included in the project. I first developed an automated pipeline for Illumina data that sequentially performs each step needed to go from raw reads to somatic or germline variant detection. The system's performance was tested by means of internal controls and by applying it to a cohort of patients affected by gastric cancer, obtaining interesting results. Once variants are called, they have to be annotated in order to define properties such as their position at the transcript and protein level, their impact on the protein sequence, their pathogenicity and more. As most of the publicly available annotators were affected by systematic errors causing low consistency in the final annotation, I implemented VarPred, a new tool for variant annotation that achieved the best accuracy (>99%) among the state-of-the-art programs compared, while also showing good processing times. To make VarPred easy to use, I equipped it with an intuitive web interface that allows not only graphical evaluation of results but also a simple filtering strategy.
Furthermore, for user-driven prioritization of human genetic variations, I developed QueryOR, a web platform suitable for searching among known candidate genes as well as for finding novel gene-disease associations. QueryOR combines several innovative features that make it comprehensive, flexible and easy to use. Prioritization is achieved by a global positive selection process that promotes the emergence of the most reliable variants, rather than filtering out those that do not satisfy the applied criteria. QueryOR has been used to analyze the two case studies framed within the BioInfoGen project. In particular, it allowed the detection of causative variants in patients affected by lysosomal storage diseases, also highlighting the efficacy of the designed sequencing panel. In addition, QueryOR simplified the recognition of the LRP2 gene as a possible candidate to explain subjects with a Dent disease-like phenotype but no mutation in the previously identified disease-associated genes, CLCN5 and OCRL. As a final corollary, an extensive analysis of recurrent exome variants was performed, showing that their origin can be mainly explained by inaccuracies in the reference genome, including misassembled regions and uncorrected bases, rather than by platform-specific errors.
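
The idea of prioritizing by positive selection instead of hard filtering can be illustrated with a minimal sketch. This is not QueryOR's implementation: the criteria names, weights and variants below are hypothetical, chosen only to show that every variant is kept and the ones satisfying more (or heavier) criteria rise to the top.

```python
from dataclasses import dataclass, field


@dataclass
class Variant:
    name: str
    # Hypothetical boolean annotations; the abstract does not list QueryOR's real criteria.
    criteria: dict = field(default_factory=dict)


def prioritize(variants, weights):
    """Rank variants by the summed weight of the criteria they satisfy.

    No variant is discarded: a variant failing a criterion simply scores lower.
    """
    def score(variant):
        return sum(w for name, w in weights.items() if variant.criteria.get(name, False))

    return sorted(variants, key=score, reverse=True)


# Toy data, invented for illustration only.
variants = [
    Variant("var_a", {"rare": True, "damaging": False, "in_candidate_gene": True}),
    Variant("var_b", {"rare": True, "damaging": True, "in_candidate_gene": True}),
    Variant("var_c", {"rare": False, "damaging": True, "in_candidate_gene": False}),
]
weights = {"rare": 1.0, "damaging": 2.0, "in_candidate_gene": 1.5}

ranked = prioritize(variants, weights)
ranked_names = [v.name for v in ranked]
```

A hard filter on all three criteria would have returned only `var_b`; the ranking keeps `var_a` and `var_c` visible for manual review.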

    Computational pan-genomics: status, promises and challenges

    Many disciplines, from human genetics and oncology to plant breeding, microbiology and virology, commonly face the challenge of analyzing rapidly increasing numbers of genomes. In the case of Homo sapiens, the number of sequenced genomes will approach hundreds of thousands in the next few years. Simply scaling up established bioinformatics pipelines will not be sufficient for leveraging the full potential of such rich genomic data sets. Instead, novel, qualitatively different computational methods and paradigms are needed. We will witness the rapid extension of computational pan-genomics, a new sub-area of research in computational biology. In this article, we generalize existing definitions and understand a pan-genome as any collection of genomic sequences to be analyzed jointly or to be used as a reference. We examine already available approaches to construct and use pan-genomes, discuss the potential benefits of future technologies and methodologies and review open challenges from the vantage point of the above-mentioned biological disciplines. As a prominent example of a computational paradigm shift, we particularly highlight the transition from the representation of reference genomes as strings to representations as graphs. We outline how this and other challenges from different application domains translate into common computational problems, point out relevant bioinformatics techniques and identify open problems in computer science. With this review, we aim to increase awareness that a joint approach to computational pan-genomics can help address many of the problems currently faced in various domains.
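
The shift from string references to graph references can be made concrete with a small sketch. The data structure below is an assumption for illustration (a plain sequence graph with labelled nodes, directed edges and one path per genome), not a format proposed in the article; all sequences are invented.

```python
# Node id -> sequence segment (toy data, invented for illustration).
nodes = {1: "ACG", 2: "T", 3: "G", 4: "TTA"}

# Directed edges between segments; nodes 2 and 3 are alternative alleles.
edges = {(1, 2), (1, 3), (2, 4), (3, 4)}

# Each genome in the collection is a walk through the graph.
paths = {"reference": [1, 2, 4], "sample_1": [1, 3, 4]}


def spell(path):
    """Return the linear sequence spelled by a path, checking every step is an edge."""
    for a, b in zip(path, path[1:]):
        if (a, b) not in edges:
            raise ValueError(f"no edge from node {a} to node {b}")
    return "".join(nodes[n] for n in path)


genomes = {name: spell(path) for name, path in paths.items()}
```

Shared segments (nodes 1 and 4) are stored once, while the single-base difference between the two genomes appears as a branch rather than as a mismatch against one privileged string.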

    Genomic analyses of the immune system

    Project 1. Genetic variation in key immune system components. Genes underpinning the diversity and plasticity of the human adaptive immune system, such as the HLA and immunoglobulins, are known for their complex structures and polymorphism. The emergence of long-read sequencing technologies has revolutionised genomics research, in particular the characterisation of segmental duplications and structural variation. Here, using long-read sequencing and additional genomics data from a healthy donor identified as HV31, I built two iterations of de novo personal genome assemblies for HV31 as a foundation to study the genetic variation of the immune system. I analysed complex structural variants found in genomic regions encoding key immune system components, and validated them against sequencing data. I also evaluated long-read sequencing accuracy and developed a tool for genomic data visualisation. Collectively, these efforts demonstrate the applications of personal genome assemblies in studying the immune system. Project 2. Effects of low-dose IL-2 immunotherapy in T and NK cells. Low-dose interleukin-2 (IL-2) immunotherapy is a promising treatment for type 1 diabetes (T1D). IL-2 suppresses autoimmune reactions by increasing the number of regulatory T cells (Tregs). To better understand the mechanism of action of low-dose IL-2 immunotherapy, I analysed single-cell multiomics data of T and NK cells collected from T1D patients before and after low-dose IL-2 treatment. I confirmed that low-dose IL-2 selectively expanded thymic-derived FOXP3+ HELIOS+ regulatory T cells and CD56br NK cells, and showed that the treatment reduced the frequency of IL-21-producing CD4+ T cells. In addition, I identified a long-lived gene expression signature induced by IL-2, which featured the upregulation of CISH and downregulation of AREG. Notably, I found that the signature remained detectable one month after the treatment.
Further analyses of publicly available COVID-19 cohort data revealed that SARS-CoV-2 infection induced opposite changes that persisted for several months after recovery. These findings suggested potential mechanisms of long COVID and longer-term benefits of IL-2 immunotherapy.

    Candidate Sequence Variants for Polyautoimmunity and Multiple Autoimmune Syndrome from a Colombian Genetic Isolate: Implications for Population Genetics

    Autoimmunity is an immunological disorder whereby patients have lost immunological tolerance to self-antigens. It carries an extreme financial and socioeconomic burden, with costs of over 100 billion dollars in the USA alone and an estimated prevalence of 9.4%; evidence indicates that this estimate has increased at a rate of 5% per year for the past 3 years. These phenotypes can manifest in more severe forms through polyautoimmunity, whereby patients carry 2 or more autoimmune conditions. In addition, there is the most extreme phenotype of autoimmunity, known as Multiple Autoimmune Syndrome (MAS), consisting of cases where patients have 3 or more autoimmune diseases. These extreme phenotypes are highly important for genetic research, as will be elaborated upon in this thesis. For more than 20 years, pedigrees from the world's largest known genetic isolate, from the Paisa region of Colombia, have been ascertained and thoroughly followed by Dr. Juan-Manuel Anaya and Dr. Mauricio Arcos-Burgos. This population has maintained its status as a genetic isolate since the 16th century, during the early colonization by the Spanish Conquistadors. In this thesis, our attempt to identify candidate variants potentially underpinning the genetic etiology of autoimmune conditions in this population is facilitated by the fact that families are derived from individuals carrying extreme phenotypes, from familial cohorts where genetic homogeneity is maximized. Candidates are identified in both sporadic and familial cases. This is primarily achieved through a combination of linkage analysis and association tests for both rare and common variants, which were derived from variant-calling pipelines and had undergone quality control, filtering and functional annotation via bioinformatic analyses.
Genes harbouring variants with significant evidence of linkage and association were primarily involved in negative regulation of apoptosis, phagocytosis, regulation of endopeptidase activity, response to lipopolysaccharides and plasminogen urokinase receptor activity. These findings, which were obtained by combining statistical and network-based analyses, have potentially relevant implications for autoimmunity and can be further supported with additional studies.

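    Investigación de la distribución de los alelos HLA en poblaciones sanas y enfermas mediante la aplicación de nuevas metodologías de secuenciación (Investigation of the distribution of HLA alleles in healthy and diseased populations through the application of new sequencing methodologies)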

    Unpublished thesis of the Universidad Complutense de Madrid, Facultad de Medicina, Departamento de Inmunología, Oftalmología y ORL, defended on 09/03/2021. Increasing our knowledge of the HLA system, including both the complete sequence description and the assessment of its diversity at the worldwide human population level, is of great importance for elucidating the molecular functional mechanisms of the immune system and its regulation in health and disease. Furthermore, assessment of the HLA allelic and haplotypic diversity of each human population is essential in the clinical histocompatibility and transplantation setting as well as in the pharmacogenetics, immunotherapy and anthropology fields. Nevertheless, the inherent vast polymorphism and high complexity of the HLA system have been an important challenge for its unambiguous and in-depth (high-resolution) characterization by previously available legacy molecular HLA genotyping methods (e.g. SSP, SSO and even SBT). The recent application of novel next-generation sequencing (NGS) technology to high-resolution molecular HLA genotyping has made it possible to obtain, in high-throughput mode and at larger scale, full-length and/or extended sequences and genotypes of all major HLA genes, thus overcoming most of these previous limitations. Objectives: I) Characterization of HLA allele and haplotype diversity of all major classical HLA genes (HLA-A, -B, -C, -DPA1, -DPB1, -DQA1, -DQB1, -DRB1 and -DRB3/4/5) by application of NGS to a first representative cohort of the Spanish population that could also serve as a healthy control reference group. The corresponding statistical analyses were performed on these immunogenetic population data.
II) Characterization of HLA allele and haplotype diversity of all major classical HLA genes (HLA-A, -B, -C, -DPA1, -DPB1, -DQA1, -DQB1, -DRB1 and -DRB3/4/5) by application of NGS to a corresponding cohort of multiple sclerosis (MS) patients in the Spanish population (recruited at the Department of Neurology, Hospital Clínic, Barcelona, Catalonia, Spain). A first case-control study was carried out to examine HLA-disease associations with MS in these Spanish population cohorts, as well as to attempt a fine-mapping of these allele and haplotype associations at full-gene resolution via NGS. In addition, a second analysis (i.e. test case) of this case-control study was carried out using an alternative healthy control dataset, drawn exclusively from the northeastern Spanish region of Catalonia, to evaluate possible differences in the findings of HLA-disease association with MS due to plausible regional HLA genetic variation within mainland Spain (i.e. as a statistical way of attempting to control for any existing population stratification)...

    A cooperative framework for molecular biology database integration using image object selection

    The concept of 'molecular biology database integration', and the problems associated with it, initiated the idea for this Ph.D. research. The available technologies make it possible to analyse data independently and discretely, but they fail to integrate the data resources into more meaningful information. This, along with the integration issues, created the scope for this research. The research reviewed 'database interoperability' problems and suggested a framework for integrating molecular biology databases. The framework proposes a cooperative environment in which molecular biology databases share information on the basis of a common purpose. The research also reviewed other implementation and interoperability issues for laboratory-based, dedicated and target-specific databases. It addressed the following issues: the diversity of molecular biology database schemas, schema constructs and schema implementation; multidatabase query using image object keying; database integration technologies using context graphs; and automated navigation among these databases. This thesis introduces a new approach to database implementation: an interoperable component database concept to initiate multidatabase queries on gene mutation data. A number of data models are proposed for gene mutation data; these are the basis for integrating the target-specific component database with the federated information system. The proposed data models cover genetic trait analysis, classification of gene mutation data, pathological lesion data and laboratory data. The main feature of this component database is its non-overlapping attributes, and it follows the non-redundant integration approach explained in the thesis.
This is achieved by storing only attributes that do not overlap (no union or intersection) with any attributes existing in public-domain molecular biology databases. Unlike the data warehousing technique, this feature is unique and novel. The component database will be integrated with other biological data sources for sharing information in a cooperative environment. This involves developing new tools. The thesis explains the role of these new tools, which are: a metadata extractor, a mapping linker, a query generator and a result interpreter. These tools are used for transparent integration without creating any global schema of the participating databases. The thesis also establishes the concept of image object keying for multidatabase query and proposes a relevant algorithm for matching protein spots in gel electrophoresis images. A spot in a gel electrophoresis image initiates the query when it is selected by the user, and the selected spot is matched with similar spots in other resource databases. This image object keying method is an alternative to conventional multidatabase query, which requires writing complex SQL scripts. The method also resolves the semantic conflicts that exist among molecular biology databases. The research proposes a new framework, based on the context of the web data, for interactions with different biological data resources. A formal description of the resource context is given in the thesis. Implementing the context in the Resource Description Framework (RDF) can increase interoperability by providing a description of the resources and a navigation plan for accessing the web-based databases. A higher-level construct (has, provide and access) is developed to implement the context in RDF for web interactions. The interactions within the resources are achieved by utilising an integration domain to extract the required information with a single instance and without writing any query scripts.
The integration domain makes it possible to navigate and to execute the query plan within the resource databases. An extractor module collects elements from different target websites and unifies them as a whole object in a single page. The proposed framework is tested by finding specific information, e.g. information on Alzheimer's disease, from public-domain biology resources such as the Protein Data Bank, Genome Data Bank, Online Mendelian Inheritance in Man and a local database. Finally, the thesis puts forward further propositions and plans for future work.

    Evolutionary Dynamics of Neoplastic Cell Populations in Barrett's Esophagus

    Cancer is a disease that develops over decades as a result of the acquisition of abnormalities in the genomes of otherwise normal cells. Acquired genomic heterogeneity in populations of cells within tissues allows cell-level Darwinian evolution that selects abnormal cellular genotypes encoding neoplastic (new benign growth), and in some cases cancerous (invasion within tissues and metastasis across tissues), cellular phenotypes. I studied neoplastic evolution over time in vivo in the pre-malignant condition Barrett's esophagus to address the puzzling clinical phenomenon that 90-95% of individuals with Barrett's stay benign over decades, compared to the remaining 5-10% who progress to esophageal adenocarcinoma. Some individuals with Barrett's use aspirin and other non-steroidal anti-inflammatory drugs (NSAIDs), which have been shown to reduce mortality from esophageal adenocarcinoma. I collaborated with the Seattle Barrett's Esophagus Research Program group to test the hypothesis that NSAIDs modulate genome evolution of neoplastic cells by reducing the acquisition rate of somatic genomic abnormalities (SGA). We used single nucleotide polymorphism (SNP) arrays to detect SGA, such as copy number abnormalities and loss of heterozygosity, in 161 biopsies from 13 individuals with Barrett's, obtained over 5-8 time points during 6-19 years of follow-up care. Over the follow-up period, each individual had a single change in NSAID use, allowing us to compare acquisition of SGA during periods on NSAIDs versus periods off NSAIDs within individuals. We found that the rate of accumulation of SGA was significantly lower (typically ten-fold lower) during periods on NSAIDs versus periods off NSAIDs. We also found that typically 1-3% of the genome had acquired SGA at baseline and that this percentage did not increase significantly over decades.
In one individual who progressed to esophageal adenocarcinoma, we detected a clonally expanded subpopulation of cells within the Barrett's tissue, which had massive SGA affecting 19% of the genome in the last 3 of 11 years of follow-up. In summary, these findings suggest that NSAID use may reduce the SGA acquisition rate and that neoplastic cell populations in Barrett's can maintain evolutionary stasis over decades, potentially explaining why 90-95% of individuals with Barrett's remain benign and never progress to esophageal adenocarcinoma.
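
The within-individual comparison described above amounts to computing an SGA acquisition rate for the period off NSAIDs and for the period on NSAIDs, then comparing the two. The sketch below shows only that arithmetic; the function name and all numbers are hypothetical and are not data from the study, which also applied statistical testing not reproduced here.

```python
def acquisition_rate(sga_start, sga_end, years):
    """New somatic genomic abnormalities acquired per year over one period."""
    if years <= 0:
        raise ValueError("period length must be positive")
    return (sga_end - sga_start) / years


# Hypothetical counts for one individual with a single change in NSAID use.
off_rate = acquisition_rate(sga_start=10, sga_end=50, years=4.0)  # period off NSAIDs
on_rate = acquisition_rate(sga_start=50, sga_end=55, years=5.0)   # period on NSAIDs

# Ratio of the two rates within the same individual.
fold_reduction = off_rate / on_rate
```

Because each individual serves as their own control, differences between individuals in baseline SGA burden do not enter the ratio.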