Bioinformatic characterization and analysis of polymorphic inversions in the human genome
Amid the great interest in characterizing genomic structural variants
(SVs) in the human genome, inversions present unique challenges and
have been little studied. This thesis develops "GRIAL", a new algorithm
designed specifically to detect and accurately map inversions from
paired-end mapping (PEM) data, the most widely used method
for detecting SVs. GRIAL uses geometrical rules to cluster, merge and
refine both breakpoints of putative inversions. In this way, we have
predicted hundreds of inversions in the human genome. In addition,
the different GRIAL quality scores have allowed us to
identify spurious PEM patterns and their causes, and to discard a large fraction
of the predicted inversions as false positives. Furthermore, we have created
"InvFEST", the first database of human polymorphic inversions,
which represents the most reliable catalogue of inversions and integrates
all the associated information from multiple sources. Currently, InvFEST
combines information from 30 different studies and contains 1092 candidate
inversions, which are categorized based on internal scores and manual
curation. Finally, the analysis of all the data generated has provided information
on the genomic patterns of inversions, contributing decisively to
the understanding of the map of human polymorphic inversions.

Within the study of structural variants in the human genome, inversions are
the variants whose results are least consolidated, and they constitute one of
the main current challenges. This thesis addresses the topic by implementing
"GRIAL", a new algorithm specifically designed for the most accurate possible
detection of inversions using paired-end mapping (PEM), the most widely used
method for studying structural variation. GRIAL relies on geometrical rules to
cluster the PEM patterns that signal a possible inversion breakpoint; it also
joins the breakpoints corresponding to each independent inversion and refines
their location as precisely as possible. Using it, we predicted hundreds of
inversions. A major contribution of our method is the creation of reliability
scores for the predictions, with which we identified incorrect inversion
patterns and their causes. This allowed us to filter our results, eliminating
a large number of likely false predictions. In addition, we created "InvFEST",
the first database devoted specifically to polymorphic inversions in the human
genome, which represents the most reliable catalogue of inversions and
integrates, for each known inversion, the associated information available.
InvFEST currently contains (and maintains a classification by level of
certainty) a catalogue of 1092 classified inversions, drawn from the data of
30 different studies. Finally, the analysis of all the information generated
allowed us to describe some patterns of polymorphic inversions in the human
genome, thereby contributing to the understanding of this structural variant
and of the state of its information in studies of the human genome.

Genomic inversion
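The abstract states that GRIAL applies geometrical rules to cluster PEM patterns and refine both breakpoints, but does not spell those rules out. As a rough illustration of the general idea only, the sketch below assumes a forward-reverse paired-end library, in which inversion-supporting pairs map with both reads on the same strand (++ or --). The `ReadPair` type, the greedy `cluster` function and the `max_dist` threshold are hypothetical simplifications, not GRIAL's actual rules, and read length and insert size are ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReadPair:
    """Hypothetical discordant read pair on one chromosome, with pos1 < pos2."""
    pos1: int
    strand1: str  # "+" or "-"
    pos2: int
    strand2: str


def is_inversion_pattern(pair: ReadPair) -> bool:
    # In a forward-reverse library, both reads on the same strand (++ or --)
    # is the classic PEM signature of an inversion.
    return pair.strand1 == pair.strand2


def cluster(pairs: list[ReadPair], max_dist: int = 500) -> list[list[ReadPair]]:
    # Greedy clustering (a simplification, not GRIAL's geometrical rules):
    # a pair joins the first cluster of the same orientation whose seed pair
    # has both ends within max_dist of it.
    clusters: list[list[ReadPair]] = []
    for pair in sorted(pairs, key=lambda p: (p.strand1, p.pos1)):
        for group in clusters:
            seed = group[0]
            if (seed.strand1 == pair.strand1
                    and abs(seed.pos1 - pair.pos1) <= max_dist
                    and abs(seed.pos2 - pair.pos2) <= max_dist):
                group.append(pair)
                break
        else:
            clusters.append([pair])
    return clusters


def breakpoint_intervals(plus: list[ReadPair], minus: list[ReadPair]):
    # ++ reads lie upstream of each breakpoint and -- reads downstream of it,
    # so each breakpoint is bounded by the innermost reads of the two clusters.
    left = (max(p.pos1 for p in plus), min(p.pos1 for p in minus))
    right = (max(p.pos2 for p in plus), min(p.pos2 for p in minus))
    return left, right


pairs = [
    ReadPair(1000, "+", 5000, "+"),
    ReadPair(1100, "+", 5150, "+"),
    ReadPair(1350, "-", 5450, "-"),
    ReadPair(1400, "-", 5500, "-"),
    ReadPair(2000, "+", 2300, "-"),  # concordant pair, filtered out
]
groups = cluster([p for p in pairs if is_inversion_pattern(p)])
plus = next(g for g in groups if g[0].strand1 == "+")
minus = next(g for g in groups if g[0].strand1 == "-")
print(breakpoint_intervals(plus, minus))  # ((1100, 1350), (5150, 5450))
```

Pairing a ++ cluster with a -- cluster is what lets both breakpoints be bounded from two sides; a single orientation would only give one-sided limits.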
InvFEST, a database integrating information of polymorphic inversions in the human genome
The newest genomic advances have uncovered an unprecedented degree of structural variation throughout genomes, with great amounts of data accumulating rapidly. Here we introduce InvFEST (http://invfestdb.uab.cat), a database combining multiple sources of information to generate a complete catalogue of non-redundant human polymorphic inversions. Due to the complexity of this type of change and the associated high false-positive discovery rate, it is necessary to integrate all the available data to obtain a reliable estimate of the real number of inversions. InvFEST automatically merges predictions into distinct inversions, refines the breakpoint locations, and finds associations with genes and segmental duplications. In addition, it includes data on experimental validation, population frequency, functional effects and evolutionary history. All this information is readily accessible through a complete and user-friendly web report for each inversion. In its current version, InvFEST combines information from 34 different studies and contains 1092 candidate inversions, which are categorized based on internal scores and manual curation. Therefore, InvFEST aims to represent the most reliable set of human inversions and to become a central repository to share information, guide future studies and contribute to the analysis of the functional and evolutionary impact of inversions on the human genome.

This work was supported by The European Research Council under the European Union Seventh Research Framework Programme [Starting Grant 243212 (INVFEST) to M.C.]; Ministerio de Asuntos Exteriores y Cooperación (Spain) [MAEC-AECI doctoral fellowship to A.M.F.]; Ministerio de Ciencia e Innovación (Spain) [BFU2007-60930 to M.C. and BFU2009-09504 to A.B.]. Funding for open access charge: European Research Council.
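The abstract says InvFEST merges predictions from different studies into distinct inversions and refines breakpoint locations, without giving the merging criteria. One plausible way to picture this step, assuming each prediction reports its two breakpoints as coordinate intervals, is to merge predictions whose breakpoint intervals both overlap and keep the intersection as the refined location. The `Prediction` and `Inversion` types, the overlap rule and the example coordinates are hypothetical, not InvFEST's documented procedure.

```python
from dataclasses import dataclass, field

Interval = tuple[int, int]  # (start, end), inclusive


@dataclass
class Prediction:
    study: str
    chrom: str
    bp1: Interval  # uncertainty interval for the first breakpoint
    bp2: Interval  # uncertainty interval for the second breakpoint


@dataclass
class Inversion:
    chrom: str
    bp1: Interval
    bp2: Interval
    studies: set[str] = field(default_factory=set)


def overlaps(a: Interval, b: Interval) -> bool:
    return a[0] <= b[1] and b[0] <= a[1]


def intersect(a: Interval, b: Interval) -> Interval:
    return (max(a[0], b[0]), min(a[1], b[1]))


def merge_predictions(predictions: list[Prediction]) -> list[Inversion]:
    merged: list[Inversion] = []
    for pred in predictions:
        for inv in merged:
            # Treated as the same inversion if both breakpoint intervals overlap.
            if (inv.chrom == pred.chrom and overlaps(inv.bp1, pred.bp1)
                    and overlaps(inv.bp2, pred.bp2)):
                # Refine each breakpoint to the region supported by all predictions.
                inv.bp1 = intersect(inv.bp1, pred.bp1)
                inv.bp2 = intersect(inv.bp2, pred.bp2)
                inv.studies.add(pred.study)
                break
        else:
            merged.append(Inversion(pred.chrom, pred.bp1, pred.bp2, {pred.study}))
    return merged


preds = [
    Prediction("A", "chr7", (100, 400), (9000, 9300)),
    Prediction("B", "chr7", (250, 600), (9100, 9500)),
    Prediction("C", "chr7", (5000, 5200), (20000, 20100)),
]
for inv in merge_predictions(preds):
    print(inv.chrom, inv.bp1, inv.bp2, sorted(inv.studies))
# chr7 (250, 400) (9100, 9300) ['A', 'B']
# chr7 (5000, 5200) (20000, 20100) ['C']
```

This greedy pass is order-dependent and keeps only the intersection, so a real merger would also need rules for chains of partially overlapping predictions and for conflicting breakpoints.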
PeSV-Fisher: identification of somatic and non-somatic structural variants using next-generation sequencing data
Next-generation sequencing technologies have accelerated research into efficient computational tools for identifying structural variants (SVs) and using them to study human diseases. As deeper sequencing data are obtained, the existence of more complex SVs in some genomes becomes more evident, but the detection and definition of most of these complex rearrangements is still in its infancy. The full characterization of SVs is key to discovering their biological implications. Here we present a pipeline (PeSV-Fisher) for the detection of deletions, gains, intra- and inter-chromosomal translocations, and inversions, at very reasonable computational cost. We further provide comprehensive information on the co-localization of SVs in the genome, a crucial aspect for studying their biological consequences. The algorithm combines methods based on paired-read and read-depth strategies. PeSV-Fisher has been designed to facilitate the identification of somatic variation and, as such, it can analyse two or more samples simultaneously, producing a list of non-shared variants between samples. We tested PeSV-Fisher on available sequencing data and compared its behaviour with that of frequently deployed tools (BreakDancer and VariationHunter). We have also tested this algorithm on our own sequencing data, obtained from a tumour and a normal blood sample of a patient with chronic lymphocytic leukaemia, and validated the results by targeted re-sequencing of different kinds of predictions. This allowed us to determine confidence parameters that influence the reliability of breakpoint predictions.

Availability: PeSV-Fisher is available at http://gd.crg.eu/tools

This work was supported by AGAUR (Generalitat de Catalunya, 2009 SGR 1502) (X.E.); CIBERESP (Instituto de Salud Carlos III) (G.E.); ESGI (European Commission, 262055_ESGI) (R.R., X.E.), ENGAGE (European Commission, ENGAGE_201413), TECHGENE (European Commission, TECHGENE_223143), and GEUVADIS (European Commission, 261123_GEUVADIS) (X.E.); NOVADIS (Ministerio de Ciencia y Tecnología, SAF2008-00357) (X.E.); Galicia Government Xunta de Galicia (Spain) through the project number 10PXIB918057 (J.M.C.T.); MAEC-AECI Predoctoral Fellowship (Ministerio de Asuntos Exteriores y Cooperación, Spain) (A.M.F.); and Ramón y Cajal position and grant BFU2007-60930 (Ministerio de Educación y Ciencia) (M.C.).
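PeSV-Fisher reports variants that are not shared between samples (for example, tumour versus normal) to highlight somatic events. The sketch below shows that comparison in its simplest form, assuming each call is reduced to a chromosome, an SV type and start/end coordinates, and that two calls match when chromosome and type agree and both coordinates fall within a tolerance. The `SV` type and the 200 bp default tolerance are hypothetical, not PeSV-Fisher's actual parameters; inter-chromosomal translocations, which involve two chromosomes, are not modelled.

```python
from typing import NamedTuple


class SV(NamedTuple):
    chrom: str
    svtype: str  # e.g. "DEL", "GAIN", "INV"
    start: int
    end: int


def same_event(a: SV, b: SV, tol: int) -> bool:
    # Two calls are treated as the same event if chromosome and type agree
    # and both breakpoints lie within tol bases of each other.
    return (a.chrom == b.chrom and a.svtype == b.svtype
            and abs(a.start - b.start) <= tol
            and abs(a.end - b.end) <= tol)


def non_shared(sample: list[SV], other: list[SV], tol: int = 200) -> list[SV]:
    # Calls in `sample` with no matching call in `other`; with sample = tumour
    # and other = normal, these are the candidate somatic variants.
    return [sv for sv in sample
            if not any(same_event(sv, o, tol) for o in other)]


tumour = [SV("chr13", "DEL", 1_000, 5_000), SV("chr2", "INV", 100, 900)]
normal = [SV("chr2", "INV", 150, 880)]
print(non_shared(tumour, normal))
# [SV(chrom='chr13', svtype='DEL', start=1000, end=5000)]
```

Running the comparison in both directions (tumour against normal and normal against tumour) yields the non-shared variants of each sample.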
Validation and Genotyping of Multiple Human Polymorphic Inversions Mediated by Inverted Repeats Reveals a High Degree of Recurrence
In recent years different types of structural variants (SVs) have been discovered in the human genome and their functional impact has become increasingly clear. Inversions, however, are poorly characterized and more difficult to study, especially those mediated by inverted repeats or segmental duplications. Here, we describe the results of a simple and fast inverse PCR (iPCR) protocol for high-throughput genotyping of a wide variety of inversions using a small amount of DNA. In particular, we analyzed 22 inversions predicted in humans, ranging from 5.1 kb to 226 kb and mediated by inverted repeat sequences of 1.6-24 kb. First, we validated 17 of the 22 inversions in a panel of nine HapMap individuals from different populations, and we genotyped them in 68 additional individuals of European origin, with correct genetic transmission in ∼12 mother-father-child trios. Global inversion minor allele frequency varied between 1% and 49%, and inversion genotypes were consistent with Hardy-Weinberg equilibrium. By analyzing the nucleotide variation and the haplotypes in these regions, we found that only four inversions have linked tag-SNPs and that in many cases there are multiple shared SNPs between standard and inverted chromosomes, suggesting an unexpectedly high degree of inversion recurrence during human evolution. iPCR was also used to check 16 of these inversions in four chimpanzees and two gorillas, and 10 showed both orientations either within or between species, providing additional support for their multiple origins. Finally, we have identified several inversions that include genes in the inverted or breakpoint regions, and at least one disrupts a potential coding gene. Thus, these results represent a significant advance in our understanding of inversion polymorphism in human populations and challenge the common view of a single origin of inversions, with important implications for inversion analysis in SNP-based studies.

This work was supported by the European Research Council (ERC) Starting Grant 243212 (INVFEST) under the European Union Seventh Research Framework Programme (FP7) to MC, a FPI PhD fellowship from the Ministerio de Educación y Ciencia (Spain) to MO, a PIF PhD fellowship from the Universitat Autònoma de Barcelona (Spain) to CGD, a MAEC-AECI PhD fellowship from the Ministerio de Asuntos Exteriores y Cooperación (Spain) to AMF, and a research PRIC grant from the Barcelona Zoo (Ajuntament de Barcelona, Spain) to ARH.
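The genotyping results above are summarized as minor allele frequencies and a check of Hardy-Weinberg equilibrium. A minimal sketch of both calculations from genotype counts (standard homozygotes SS, heterozygotes SI, inverted homozygotes II) follows. It uses the standard one-degree-of-freedom chi-square test; the abstract does not say which HWE test was applied, so that choice, the function names and the example counts are assumptions for illustration, not data from the study.

```python
import math


def inverted_allele_freq(n_ss: int, n_si: int, n_ii: int) -> float:
    # Each individual carries two chromosomes; count the inverted ones.
    n = n_ss + n_si + n_ii
    return (2 * n_ii + n_si) / (2 * n)


def minor_allele_freq(n_ss: int, n_si: int, n_ii: int) -> float:
    q = inverted_allele_freq(n_ss, n_si, n_ii)
    return min(q, 1 - q)


def hwe_chi_square(n_ss: int, n_si: int, n_ii: int) -> tuple[float, float]:
    """Chi-square HWE test (1 degree of freedom); returns (statistic, p-value)."""
    n = n_ss + n_si + n_ii
    q = inverted_allele_freq(n_ss, n_si, n_ii)
    p = 1 - q
    # Expected genotype counts under HWE: p^2 N, 2pq N, q^2 N.
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    observed = (n_ss, n_si, n_ii)
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # For 1 df, P(X > stat) = erfc(sqrt(stat / 2)).
    return stat, math.erfc(math.sqrt(stat / 2))


print(minor_allele_freq(36, 24, 4))  # 0.25
print(hwe_chi_square(36, 24, 4))     # (0.0, 1.0)
print(hwe_chi_square(10, 0, 10))     # statistic 20.0, p-value well below 0.05
```

With small samples or rare inversions, an exact HWE test is usually preferred over the chi-square approximation, because expected counts in the rare homozygote class become very small.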