36 research outputs found

    Comparative gene prediction in human and mouse.

    The completion of the sequencing of the mouse genome promises to help predict human genes with greater accuracy. While current ab initio gene prediction programs are remarkably sensitive (i.e., they predict at least a fragment of most genes), their specificity is often low, predicting a large number of false-positive genes in the human genome. Sequence conservation at the protein level with the mouse genome can help eliminate some of those false positives. Here we describe SGP2, a gene prediction program that combines ab initio gene prediction with TBLASTX searches between two genome sequences to provide both sensitive and specific gene predictions. The accuracy of SGP2 when used to predict genes by comparing the human and mouse genomes is assessed on a number of data sets, including single-gene data sets, the highly curated human chromosome 22 predictions, and entire genome predictions from ENSEMBL. Results indicate that SGP2 outperforms purely ab initio gene prediction methods. Results also indicate that SGP2 works about as well with 3x shotgun data as it does with fully assembled genomes. SGP2 provides a high enough specificity that its predictions can be experimentally verified at a reasonable cost. SGP2 was used to generate a complete set of gene predictions for both human and mouse by comparing the genomes of these two species. Our results suggest that another few thousand human and mouse genes currently not in ENSEMBL are worth verifying experimentally.
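The abstract above contrasts sensitivity and specificity without defining them. As a minimal sketch, assuming the convention common in gene-prediction benchmarks (sensitivity = TP / (TP + FN), specificity = TP / (TP + FP), i.e. what other fields call precision), the two measures can be computed at the nucleotide level as follows. The function names and the toy coordinates are illustrative assumptions, not taken from SGP2 or its evaluation.

```python
def positions(intervals):
    """Expand inclusive (start, end) exon intervals into a set of positions."""
    return {p for start, end in intervals for p in range(start, end + 1)}


def nucleotide_accuracy(predicted, annotated):
    """Return (sensitivity, specificity) for two sets of coding positions.

    Assumption: specificity is TP / (TP + FP), the convention usually
    used in gene-prediction benchmarks, not TN / (TN + FP).
    """
    tp = len(predicted & annotated)   # coding and predicted coding
    fp = len(predicted - annotated)   # predicted coding, not annotated
    fn = len(annotated - predicted)   # annotated coding, missed
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tp / (tp + fp) if tp + fp else 0.0
    return sensitivity, specificity


# Hypothetical toy example: two annotated exons, three predicted exons.
annotated = positions([(100, 199), (300, 399)])
predicted = positions([(100, 199), (300, 349), (500, 549)])
sn, sp = nucleotide_accuracy(predicted, annotated)
print(f"sensitivity={sn:.2f} specificity={sp:.2f}")
```

Under this convention, a predictor that reports many spurious exons keeps its sensitivity but loses specificity, which is the failure mode the abstract attributes to purely ab initio programs.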

    Comparative gene finding in chicken indicates that we are closing in on the set of multi-exonic widely expressed human genes

    The recent availability of the chicken genome sequence poses the question of whether there are human protein-coding genes conserved in chicken that are currently not included in the human gene catalog. Here, we show, using comparative gene finding followed by experimental verification of exon pairs by RT-PCR, that the addition to the multi-exonic subset of this catalog could be as little as 0.2%, suggesting that we may be closing in on the human gene set. Our protocol, however, has two shortcomings: (i) the bioinformatic screening of the predicted genes, applied to filter out false positives, cannot handle intronless genes; and (ii) the experimental verification could fail to identify expression at a specific developmental time. This highlights the importance of developing methods that could provide a reliable estimate of the number of these two types of genes.

    Comparison of mouse and human genomes followed by experimental verification yields an estimated 1,019 additional genes.

    A primary motivation for sequencing the mouse genome was to accelerate the discovery of mammalian genes by using sequence conservation between mouse and human to identify coding exons. Achieving this goal proved challenging because of the large proportion of the mouse and human genomes that is apparently conserved but apparently does not code for protein. We developed a two-stage procedure that exploits the mouse and human genome sequences to produce a set of genes with a much higher rate of experimental verification than previously reported prediction methods. RT-PCR amplification and direct sequencing applied to an initial sample of mouse predictions that do not overlap previously known genes verified the regions flanking one intron in 139 predictions, with verification rates reaching 76%. On average, the confirmed predictions show more restricted expression patterns than the mouse orthologs of known human genes, and two-thirds lack homologs in fish genomes, demonstrating the sensitivity of this dual-genome approach to hard-to-find genes. We verified 112 previously unknown homologs of known proteins, including two homeobox proteins relevant to developmental biology, an aquaporin, and a homolog of dystrophin. We estimate that transcription and splicing can be verified for >1,000 gene predictions identified by this method that do not overlap known genes. This is likely to constitute a significant fraction of the previously unknown, multiexon mammalian genes.

    Immune cell profiling of the cerebrospinal fluid enables the characterization of the brain metastasis microenvironment

    Brain metastases are the most common tumor of the brain with a dismal prognosis. A fraction of patients with brain metastasis benefit from treatment with immune checkpoint inhibitors (ICI) and the degree and phenotype of the immune cell infiltration has been used to predict response to ICI. However, the anatomical location of brain lesions limits access to tumor material to characterize the immune phenotype. Here, we characterize immune cells present in brain lesions and matched cerebrospinal fluid (CSF) using single-cell RNA sequencing combined with T cell receptor genotyping. Tumor immune infiltration and specifically CD8+ T cell infiltration can be discerned through the analysis of the CSF. Consistently, identical T cell receptor clonotypes are detected in brain lesions and CSF, confirming cell exchange between these compartments. The analysis of immune cells of the CSF can provide a non-invasive alternative to predict the response to ICI, as well as identify the T cell receptor clonotypes present in brain metastasis. The use of CSF for diagnosis of metastatic brain tumors could be of clinical and patient benefit. Here the authors undertake a single-cell RNA analysis of CSF and brain to determine whether the phenotype in the CSF is reflective of the phenotype in the tumor.

    Integrated Analysis of Germline and Tumor DNA Identifies New Candidate Genes Involved in Familial Colorectal Cancer

    Colorectal cancer (CRC) shows aggregation in some families but no alterations in the known hereditary CRC genes. We aimed to identify new candidate genes which are potentially involved in germline predisposition to familial CRC. An integrated analysis of germline and tumor whole-exome sequencing data was performed in 18 unrelated CRC families. Deleterious single nucleotide variants (SNVs), short insertions and deletions (indels), copy number variants (CNVs) and loss of heterozygosity (LOH) were assessed as candidates for first germline or second somatic hits. Candidate tumor suppressor genes were selected when alterations were detected in both germline and somatic DNA, fulfilling Knudson's two-hit hypothesis. Somatic mutational profiling and signature analysis were also performed. A series of germline-somatic variant pairs were detected. In all cases, the first hit presented as a rare SNV/indel, whereas the second hit was either a different SNV (3 genes) or LOH affecting the same gene (141 genes). BRCA2, BLM, ERCC2, RECQL, REV3L and RIF1 were among the most promising candidate genes for germline CRC predisposition. The identification of new candidate genes involved in familial CRC could be achieved by our integrated analysis. Further functional studies and replication in additional cohorts are required to confirm the selected candidates.
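The two-hit selection rule described in the abstract above (keep a gene when a germline alteration and a somatic alteration affect it in the same family) can be sketched roughly as below. This is only an illustration under stated assumptions: the tuple layout, the function name `two_hit_candidates`, and the placeholder family and gene labels are hypothetical, and the somatic list is assumed to have already been filtered to tumor-only alterations. It is not the authors' pipeline.

```python
from collections import defaultdict


def two_hit_candidates(germline_hits, somatic_hits):
    """Pair germline first hits with somatic second hits per family and gene.

    Both inputs are iterables of (family, gene, alteration_type) tuples.
    Assumption: somatic_hits contains only alterations absent from the
    germline (for example a different SNV, or LOH).
    """
    somatic_by_key = defaultdict(set)
    for family, gene, kind in somatic_hits:
        somatic_by_key[(family, gene)].add(kind)

    candidates = {}
    for family, gene, kind in germline_hits:
        second_hits = somatic_by_key.get((family, gene))
        if second_hits:  # both hits present: candidate tumor suppressor
            candidates[(family, gene)] = (kind, sorted(second_hits))
    return candidates


# Hypothetical placeholder data, not results from the study.
germline = [("F1", "GENE_A", "SNV"), ("F1", "GENE_B", "indel"), ("F2", "GENE_A", "SNV")]
somatic = [("F1", "GENE_A", "LOH"), ("F2", "GENE_C", "SNV"), ("F1", "GENE_A", "SNV")]
result = two_hit_candidates(germline, somatic)
print(result)
```

Keying on the (family, gene) pair rather than on the gene alone keeps a germline variant in one family from being matched to a somatic event in another.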

    Systematic Collaborative Reanalysis of Genomic Data Improves Diagnostic Yield in Neurologic Rare Diseases

    Other funding: Generalitat de Catalunya, Departament de Salut; Generalitat de Catalunya, Departament d'Empresa i Coneixement and CERCA Program; Ministerio de Ciencia e Innovación; Instituto Nacional de Bioinformática; ELIXIR Implementation Studies (CNAG-CRG); Centro de Investigaciones Biomédicas en Red de Enfermedades Raras; Centro de Excelencia Severo Ochoa; European Regional Development Fund (FEDER).
    Many patients experiencing a rare disease remain undiagnosed even after genomic testing. Reanalysis of existing genomic data has been shown to increase diagnostic yield, although there are few systematic and comprehensive reanalysis efforts that enable collaborative interpretation and future reinterpretation. The Undiagnosed Rare Disease Program of Catalonia project collated previously inconclusive, good-quality genomic data (panels, exomes, and genomes) and standardized phenotypic profiles from 323 families (543 individuals) with a neurologic rare disease. The data were reanalyzed systematically to identify relatedness, runs of homozygosity, consanguinity, single-nucleotide variants, insertions and deletions, and copy number variants. Data were shared and collaboratively interpreted within the consortium through a customized Genome-Phenome Analysis Platform, which also enables future data reinterpretation. Reanalysis of existing genomic data provided a diagnosis for 20.7% of the patients, including 1.8% diagnosed after the generation of additional genomic data to identify a second pathogenic heterozygous variant. Diagnostic rate was significantly higher for family-based exome/genome reanalysis compared with singleton panels. Most new diagnoses were attributable to recent gene-disease associations (50.8%), additional or improved bioinformatic analysis (19.7%), and standardized phenotyping data integrated within the Undiagnosed Rare Disease Program of Catalonia Genome-Phenome Analysis Platform functionalities (18%).

    Computational identification of genes: ab initio and comparative approaches

    The work presented here studies the recognition of the signals that delimit and define protein-coding genes, as well as their applicability in gene prediction programs. The thesis also explores the use of comparative genomics to improve the identification of genes in different species simultaneously. It also describes the development of two computational gene prediction programs: geneid and sgp2. The geneid program identifies the genes encoded in an anonymous DNA sequence based on their intrinsic properties (mainly splicing signals and differential codon usage). sgp2 makes it possible to use the comparison between two genomes, which must be at a certain optimal evolutionary distance, to improve gene prediction, under the hypothesis that coding regions are more conserved than regions that do not code for protein.
    The motivation of this thesis is to give a little insight into how genes are encoded and recognized by the cell machinery and to use this information to find genes in unannotated genomic sequences. One of the objectives is the development of tools to identify eukaryotic genes through the modeling and recognition of their intrinsic signals and properties. This thesis addresses another problem: how the sequence of related genomes can contribute to the identification of genes. The value of comparative genomics is illustrated by the sequencing of the mouse genome for the purpose of annotating the human genome. Comparative gene prediction programs exploit these data under the assumption that conserved regions between related species correspond to functional regions (coding genes among them). Thus, this thesis also describes a gene prediction program that combines ab initio gene prediction with comparative information between two genomes to improve the accuracy of the predictions.