19 research outputs found
RUbioSeq+: A multiplatform application that executes parallelized pipelines to analyse next-generation sequencing data
This is the peer-reviewed version of the following article: Computer Methods and Programs in Biomedicine 138 (2016): 73-81, which has been published in final form at http://dx.doi.org/10.1016/j.cmpb.2016.10.008
Background and objective: To facilitate routine analysis and to improve the reproducibility of the results, next-generation sequencing (NGS) analysis requires intuitive, efficient and integrated data processing pipelines.
Methods: We have selected well-established software to construct a suite of automated and parallelized workflows to analyse NGS data for DNA-seq (single-nucleotide variants (SNVs) and indels), CNA-seq, bisulfite-seq and ChIP-seq experiments.
Results: Here, we present RUbioSeq+, an updated and extended version of RUbioSeq, a multiplatform application that incorporates a suite of automated and parallelized workflows to analyse NGS data. This new version includes: (i) an interactive graphical user interface (GUI) that facilitates its use by both biomedical researchers and bioinformaticians, (ii) a new pipeline for ChIP-seq experiments, (iii) pair-wise comparisons (case–control analyses) for DNA-seq experiments, and (iv) improvements in the parallelized and multithreaded execution options. Results generated by our software have been experimentally validated and accepted for publication.
Conclusions: RUbioSeq+ is free and open to all users at http://rubioseq.bioinfo.cnio.es/.
M.R-C is funded by the BLUEPRINT Consortium (FP7/ 2007-2013) under grant agreement number 282510. J.M.F is funded by the INB Node 2 - CNIO, a member of Proteored - PRB2-ISCIII and is supported by grant PT13/0001, of the PE I+D+i 2013-2016, funded by ISCIII and FEDER. H.L-F is funded by a postdoctoral fellowship from the Xunta de Galicia. F.F-R and D.G-P are funded by the European Union's Seventh Framework Programme FP7/REGPOT 2012 2013.1 under grant agreement n° 316265 (BIOCAPS) and the "Platform of integration of intelligent techniques for analysis of biomedical information" project (TIN2013-47153-C3-3-R) financed by the Spanish Ministry of Economy and Competitiveness. C.FT is funded by the "Spanish National Youth Guarantee Implementation Plan" (2013/2016) financed by the Spanish Ministry of Economy and Competitiveness.
Analysis of Paired Primary-Metastatic Hormone-Receptor Positive Breast Tumors (HRPBC) Uncovers Potential Novel Drivers of Hormonal Resistance
We sought to identify genetic variants associated with disease relapse and failure of hormonal treatment in hormone-receptor positive breast cancer (HRPBC). We analyzed a series of HRPBC with distant relapse by sequencing pairs (n = 11) of tumors (primary and metastases) at >800X. Comparative genomic hybridization was performed as well. Top hits, based on the frequency of alteration and severity of the changes, were tested in the TCGA series. Genes determining the most parsimonious prognostic signature were studied for their functional role in vitro by performing cell growth assays in hormonal-deprivation conditions, a setting that mimics treatment with aromatase inhibitors. Severe alterations were recurrently found in 18 genes in the pairs. However, only MYC, DNAH5, CSFR1, EPHA7, ARID1B, and KMT2C preserved an independent prognostic impact and/or showed a significantly different incidence of alterations between relapsed and non-relapsed cases in the TCGA series. The signature composed of MYC, KMT2C, and EPHA7 best discriminated the clinical course (overall survival 90.7 vs. 144.5 months; p = 0.0001). Having an alteration in any of the genes of the signature implied a hazard ratio of death of 3.25 (p<0.0001) and early relapse during the adjuvant hormonal treatment. The presence of the D348N mutation in KMT2C and/or the T666I mutation in the kinase domain of EPHA7 conferred hormonal resistance in vitro. Novel inactivating mutations in KMT2C and EPHA7, which confer hormonal resistance, are linked to adverse clinical course in HRPBC.
3D chromatin connectivity underlies replication origin efficiency in mouse embryonic stem cells
In mammalian cells, chromosomal replication starts at thousands of origins at which replisomes are assembled. Replicative stress triggers additional initiation events from 'dormant' origins whose genomic distribution and regulation are not well understood. In this study, we have analyzed origin activity in mouse embryonic stem cells in the absence or presence of mild replicative stress induced by aphidicolin, a DNA polymerase inhibitor, or by deregulation of origin licensing factor CDC6. In both cases, we observe that the majority of stress-responsive origins are also active in a small fraction of the cell population in a normal S phase, and stress increases their frequency of activation. In a search for the molecular determinants of origin efficiency, we compared the genetic and epigenetic features of origins displaying different levels of activation, and integrated their genomic positions in three-dimensional chromatin interaction networks derived from high-depth Hi-C and promoter-capture Hi-C data. 
We report that origin efficiency is directly proportional to the proximity to transcriptional start sites and to the number of contacts established between origin-containing chromatin fragments, supporting the organization of origins in higher-level DNA replication factories.
MCIN/AEI/10.13039/501100011033 [BFU2016-80402-R and PID2019-106707RB-100 to JM; BFU2016-78849-P and PID2019-105949GB-I00 to MG]; ‘ERDF A way of making Europe’; ‘CNIO Friends’ postdoctoral fellowship (to V.P.); Fondation Toulouse Cancer Santé and the Pierre Fabre Research Institute as part of the Chair of Bioinformatics in Oncology of the CRCT; CNIO-La Caixa predoctoral fellowships (to K.J., M.R.); Portuguese Foundation for Science and Technology [FCT-SFRH/BD/81027/2011 to R.A.]; Spanish Ministry of Science and Innovation [BES-2014–070050 to J.M.F.-J.]; Foundation for Polish Science co-financed by the European Union ERFD funds [TEAM/2016–3/30 to K.J.]; Polish National Science Centre [2020/37/B/NZ2/03757 to K.J.]. Funding for open access charge: Spanish Ministry of Science and Innovation (PID2019-106707RB-100).
Comparison of genomic sequences and protein identification using FPGAs (Comparación de secuencias genómicas e identificación de proteínas utilizando FPGAs)
String comparison is an important part of many programs and applications, and its use is growing particularly in biology and scientific research. Thousands of sequences from enormous databases of genetic content are compared daily for this purpose. Fast algorithms that produce results as reliable as possible are therefore necessary.
Existing algorithms are based either on exact or on inexact search. Exact search checks whether a string is equal to another given string, while inexact search computes a cost or score that indicates how much one string differs from another. The Smith-Waterman algorithm belongs to the second group, and it is the one we have chosen to implement the comparison of DNA sequences because it is the best choice among inexact-search algorithms.
Using Smith-Waterman solves the reliability problem, but speed is also very important: the faster the result is obtained, the faster the work of researchers and programs proceeds. A software solution of the algorithm takes time roughly proportional to N*M, where N and M are the lengths of the strings to compare. A hardware solution that exploits the parallelism offered by architectures such as systolic arrays could obtain the result in time proportional to N+M. The improvement is therefore very noticeable for long strings such as DNA sequences. For this reason, we have chosen the hardware approach and implement the system using FPGAs.
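The N*M versus N+M comparison above can be illustrated with a minimal software sketch in Python. This is not the authors' implementation (their system targets FPGAs). The function names, the scoring values (match = 2, mismatch = -1) and the linear gap penalty (-1) are assumptions chosen only for illustration. The first function fills the Smith-Waterman matrix cell by cell, which costs N*M steps. The second counts the anti-diagonals of that matrix: cells on the same anti-diagonal do not depend on each other, so a systolic array can compute each anti-diagonal in one parallel step, which is where the roughly N+M figure comes from (N+M-1 when counted exactly).

```python
def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -1) -> int:
    """Best local-alignment score of a and b (illustrative scoring values).

    Sequential version: visits every cell of the N x M matrix once,
    keeping only the previous row in memory.
    """
    n, m = len(a), len(b)
    prev = [0] * (m + 1)          # row i-1 of the score matrix
    best = 0
    for i in range(1, n + 1):
        curr = [0] * (m + 1)      # row i of the score matrix
        for j in range(1, m + 1):
            # Score for aligning a[i-1] with b[j-1]
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: a cell never drops below zero
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def wavefront_steps(n: int, m: int) -> int:
    """Number of anti-diagonals in an n x m matrix.

    Each anti-diagonal can be computed in one parallel step,
    so this is the step count of an idealized systolic array.
    """
    return n + m - 1 if n > 0 and m > 0 else 0


if __name__ == "__main__":
    print(smith_waterman_score("GATTACA", "TTAC"))  # local match "TTAC"
    print(1000 * 1000, "cells vs", wavefront_steps(1000, 1000), "parallel steps")
```

For two sequences of 1000 bases each, the sequential version visits 1,000,000 cells, while the idealized wavefront needs 1999 parallel steps.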
Allelic Expansion of Variants Causing Severe Functional Protein Alterations from the Primary to the Metastatic Lesions.
Clinical Characteristics and Detected Deleterious Mutations in Each Patient Pair.