18 research outputs found

    Analysis of hypervariable DNA sequences by NGS technologies: QuasiFlow

    The development of Next Generation Sequencing (NGS) technologies has allowed deep characterization of highly variable sequences such as viral or mitochondrial genomes. In RNA and ssDNA viruses, low replication fidelity generates viral populations consisting of complex mutant spectra termed viral quasispecies. Their study is of special interest because they can be considered a phenotypic reservoir [1]. Similarly, heteroplasmy of human mitochondrial genomes, in which different sequences are found within a single individual, might have important clinical consequences. For the analysis of the mutant spectrum of such hypervariable sequences from NGS data, we have developed QuasiFlow, a workflow designed in AutoFlow [2] that uses Illumina reads. QuasiFlow provides information about DNA variability, such as SNPs, indels and recombination events (Figure 1). Furthermore, it allows haplotype reconstruction of viral quasispecies and characterization of their diversity through the normalized Shannon index, nucleotide diversity and mutation networks. QuasiFlow also performs a comparative study among samples, based on correlation, ANOVA and PCA analyses, in order to determine which parameters are affected by the experiment and how the samples behave according to their biological origin. In this work, we have applied QuasiFlow to analyze the population structure of an infectious clone of the begomovirus Tomato yellow leaf curl virus (TYLCV) inoculated in Arabidopsis thaliana plants, using HiSeq or MiSeq reads. The analysis detected minor quasispecies variants at frequencies of 10^-4 to 10^-5 and reconstructed the haplotypes present in the sample. In addition, QuasiFlow was used to discover variants and recombinants in mixed infections of tomato plants. These results show the fast generation of recombinant genomes in geminivirus mixed infections and demonstrate the potential of QuasiFlow for the analysis of mutant spectra using Illumina MiSeq sequencing data.
We have extended the use of QuasiFlow to the analysis of other highly variable sequences such as mitochondrial DNA. For that, we have analyzed Illumina MiSeq DNA reads from 47 human mitochondrial samples from different cell lines obtained from the NCBI SRA database. QuasiFlow automatically generated SNPs, SNP frequencies and indels, analyzed up to 23 variables using PCA, and performed a hierarchical clustering of the samples. Our analysis was able to detect pathological variants present at a frequency lower than 1%. Universidad de Málaga. Campus de Excelencia Internacional Andalucía Tech. This research was funded by Junta de Andalucía and EU through the ERDF 2014-2020, Projects P10-CVI-6075 to M. G.C. and P10-CVI-6561 to A.G-P
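
The two diversity measures named in this abstract can be illustrated with a minimal Python sketch. It is not QuasiFlow's code: the function names are hypothetical, the Shannon index is assumed to be normalized by the logarithm of the number of distinct haplotypes (other normalizations exist), and nucleotide diversity is assumed to be the frequency-weighted mean pairwise Hamming distance per site over aligned, equal-length haplotypes.

```python
import math
from itertools import combinations


def normalized_shannon(counts):
    """Shannon entropy of haplotype counts, scaled to [0, 1].

    Assumption: normalization by ln(number of distinct haplotypes).
    """
    total = sum(counts)
    freqs = [c / total for c in counts if c > 0]
    if len(freqs) < 2:
        return 0.0  # a single haplotype carries no diversity
    entropy = -sum(p * math.log(p) for p in freqs)
    return entropy / math.log(len(freqs))


def hamming(a, b):
    """Number of mismatching positions between two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("haplotypes must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def nucleotide_diversity(haplotypes):
    """Per-site nucleotide diversity from a {sequence: read_count} dict.

    Assumption: pi = sum over pairs of 2 * p_i * p_j * d_ij / L,
    with no small-sample correction.
    """
    total = sum(haplotypes.values())
    length = len(next(iter(haplotypes)))
    pi = 0.0
    for (s1, c1), (s2, c2) in combinations(haplotypes.items(), 2):
        pi += 2 * (c1 / total) * (c2 / total) * hamming(s1, s2) / length
    return pi


if __name__ == "__main__":
    # Toy haplotype spectrum: one dominant sequence and two minor variants.
    spectrum = {"ACGTACGT": 980, "ACGTACGA": 15, "ACCTACGT": 5}
    print(normalized_shannon(list(spectrum.values())))
    print(nucleotide_diversity(spectrum))
```

Both functions take read counts rather than frequencies, so reconstructed haplotype abundances can be passed in directly.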

    Analysis of viral quasispecies by NGS technologies: quasiflow

    We have developed QuasiFlow, a workflow designed in AutoFlow that takes advantage of NGS technologies to reconstruct quasispecies from Illumina reads. QuasiFlow characterises and computes several key parameters, such as recombination events, SNPs, transitions, transversions, indels, quasispecies reconstruction, normalized Shannon index, nucleotide diversity and mutation networks. Moreover, it performs a comparative study of the samples comprising correlation, ANOVA and PCA analyses of the previously obtained virus population parameters. Using QuasiFlow we have analysed Illumina MiSeq reads from DNA samples obtained from mixed infections of ssDNA begomoviruses in tomato plants and amplified by rolling circle amplification. Further, we have extended the use of QuasiFlow to the analysis of the highly variable mitochondrial DNA. For that, we have used Illumina MiSeq DNA reads from 47 human mitochondrial samples from different cell lines obtained from the NCBI SRA database. Universidad de Málaga. Campus de Excelencia Internacional Andalucía Tech
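
As an illustration of two of the parameters listed above, the following Python sketch classifies single-nucleotide differences as transitions (purine to purine, or pyrimidine to pyrimidine) or transversions (purine to pyrimidine, or the reverse). The function names are hypothetical and the comparison assumes a read already aligned to the reference without gaps; it does not reproduce how QuasiFlow calls variants.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(ref, alt):
    """Return 'transition' or 'transversion' for a single-base change."""
    ref, alt = ref.upper(), alt.upper()
    if ref == alt:
        raise ValueError("ref and alt are identical: not a substitution")
    if {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES:
        return "transition"
    if ref in PURINES | PYRIMIDINES and alt in PURINES | PYRIMIDINES:
        return "transversion"
    raise ValueError("only A, C, G and T are supported")


def ts_tv_counts(reference, read):
    """Count transitions and transversions between two aligned sequences.

    Assumption: equal length, no gaps, no ambiguity codes.
    """
    if len(reference) != len(read):
        raise ValueError("sequences must have the same length")
    transitions = transversions = 0
    for ref, alt in zip(reference.upper(), read.upper()):
        if ref == alt:
            continue
        if classify_substitution(ref, alt) == "transition":
            transitions += 1
        else:
            transversions += 1
    return transitions, transversions


if __name__ == "__main__":
    print(ts_tv_counts("ACGT", "GCTT"))
```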

    Integrating differential expression, co-expression and gene network analysis for the identification of common genes associated with tumor angiogenesis deregulation

    Angiogenesis is essential for tumor growth and cancer metastasis. Identifying the molecular pathways involved in this process is the first step in the rational design of new therapeutic strategies to improve cancer treatment. In recent years, RNA-seq data analysis has helped to determine the genetic and molecular factors associated with different types of cancer. In this work we performed an integrative analysis using RNA-seq data from human umbilical vein endothelial cells (HUVEC) and from patients with angiogenesis-dependent diseases, to find genes that serve as potential candidates to improve the prognosis of tumor angiogenesis deregulation and to understand how this process is orchestrated at the genetic and molecular level. We downloaded four RNA-seq datasets (including cellular models of tumor angiogenesis and ischaemic heart disease) from the Sequence Read Archive. Our integrative analysis includes a first step to determine differentially expressed and co-expressed genes. For this, we used the ExpHunter Suite, an R package that performs differential expression, co-expression and functional analysis of RNA-seq data. We used both differentially expressed and co-expressed genes to explore the human gene interaction network and determine which genes found across the different datasets may be key to angiogenesis deregulation. Finally, we performed drug repositioning analysis to find potential targets related to angiogenesis inhibition... This work was supported by the Spanish Ministry of Science, Innovation and Universities (grant PID2019-105010RB-I00, grant PID2019-108096RB-C21), the Andalusian Government and FEDER (grants UMA18-FEDERJA-102, UMA18-FEDERJA-220, PY20_00257, PY20_00372, RH-0079-2021 and funds from the group PAIDI BIO 267); the Spanish Ministry of Economy and Competitiveness (grant PID2019-108096RB-C21), the Institute of Health Carlos III (project IMPaCT-Data, exp. IMP/00019), co-funded by the European Union, European Regional Development Fund (ERDF, "A way to make Europe"); and the European Union (HORIZON-HLTH-2022-DISEASE-06, Project ID: 101080580) to JAGR. JRP holds a research grant from the Andalusian Government (Fundacion Progreso y Salud) [PI-0075-2017]. BM holds an Ayudas para la formación del profesorado universitario fellowship (FPU18/00755, Ministerio de Universidades). The "CIBER de Enfermedades Raras" is an initiative from the ISCIII (Spain). The funders had no role in the study design, data collection and analysis, decision to publish or preparation of the manuscript. Funding for open access charge: Universidad de Málaga / CBU

    Assigning protein function from domain-function associations using DomFun

    Background: Protein function prediction remains a key challenge. Domain composition affects protein function. Here we present DomFun, a Ruby gem that uses associations between protein domains and functions, calculated using multiple indices based on tripartite network analysis. These domain-function associations are combined at the protein level to generate protein-function predictions. Results: We analysed 16 tripartite networks connecting homologous superfamily and FunFam domains from CATH-Gene3D with functional annotations from the three Gene Ontology (GO) sub-ontologies, KEGG, and Reactome. We validated the results using the CAFA 3 benchmark platform for GO annotation, finding that, out of the multiple association metrics and domain datasets tested, the Simpson index for FunFam domain-function associations combined with Stouffer’s method leads to the best performance in almost all scenarios. We also found that using FunFams led to better performance than superfamilies, and that results were better for GO molecular function than for GO biological process terms. DomFun performed as well as the highest-performing method in certain CAFA 3 evaluation procedures in terms of Fmax and Smin. We also implemented our own benchmark procedure, Pathway Prediction Performance (PPP), which can be used to validate function prediction for additional annotation sources, such as KEGG and Reactome. Using PPP, we found similar results to those found with CAFA 3 for GO; moreover, we found good performance for the other annotation sources. As with CAFA 3, the Simpson index with Stouffer’s method led to the top performance in almost all scenarios. Conclusions: DomFun shows competitive performance with other methods evaluated in CAFA 3 when predicting protein function with GO, although results vary depending on the evaluation procedure. Through our own benchmark procedure, PPP, we have shown it can also make accurate predictions for KEGG and Reactome.
It performs best when using FunFams, combining Simpson index derived domain-function associations using Stouffer’s method. The tool has been implemented so that it can be easily adapted to incorporate other protein features, such as domain data from other sources, amino acid k-mers and motifs. The DomFun Ruby gem is available from https://rubygems.org/gems/DomFun. Code maintained at https://github.com/ElenaRojano/DomFun. Validation procedure scripts can be found at https://github.com/ElenaRojano/DomFun_project
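
The combination step described in this abstract can be sketched in Python under explicit assumptions. DomFun itself is a Ruby gem and this is not its code. The sketch assumes that the Simpson index is the overlap coefficient between the set of proteins carrying a domain and the set annotated with a function, and that each domain-function association has already been converted to a z-score (the abstract does not say how). All function names are hypothetical.

```python
import math


def simpson_index(proteins_with_domain, proteins_with_function):
    """Overlap coefficient: |A & B| / min(|A|, |B|).

    Assumption: this is the 'Simpson index' used for association scoring.
    """
    a, b = set(proteins_with_domain), set(proteins_with_function)
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))


def stouffer_combine(z_scores):
    """Unweighted Stouffer's method: Z = sum(z_i) / sqrt(k)."""
    z = list(z_scores)
    if not z:
        raise ValueError("at least one z-score is required")
    return sum(z) / math.sqrt(len(z))


def upper_tail_p(z):
    """One-sided p-value for a standard normal z-score."""
    return 0.5 * math.erfc(z / math.sqrt(2))


if __name__ == "__main__":
    # Toy example: a domain found in three proteins, a function in four.
    print(simpson_index({"P1", "P2", "P3"}, {"P2", "P3", "P4", "P5"}))
    # Hypothetical z-scores of one function for the domains of one protein.
    combined = stouffer_combine([1.2, 0.8, 2.1])
    print(combined, upper_tail_p(combined))
```

Stouffer's method is applied here across the domains of a single protein, so a function supported by several moderately associated domains can outrank one supported by a single strong association.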

    Identification of the probiotic Shewanella putrefaciens Pdp11 by PCR using the unique transposon that interrupts the gene encoding the phenazine protein.

    Shewanella putrefaciens Pdp11 is a strain described as a probiotic for fish of importance in aquaculture. Sequencing its genome has made it possible to draw genome-level comparisons with other, pathogenic strains of the same genus. As part of the study of the Pdp11 genome, 6 transposons were identified that are present in Pdp11 and absent from 7 Shewanella sp. strains (Pérez Gómez Olivia et al., 2021). This work proposes using the transposon that interrupts the gene for the PhzE protein, which is involved in phenazine biosynthesis, for the specific identification of SpPdp11, and optimizes the PCR to determine the sensitivity of the primers in identifying the probiotic. In aquaculture, Shewanella putrefaciens Pdp11 is administered to fish through their diet at concentrations of 10^9 CFU/g of feed; this work would enable future identification and quantification of the probiotic in intestinal samples, as well as the study of its colonization potential. Universidad de Málaga. Campus de Excelencia Internacional Andalucía Tech

    Creation and use of bioinformatics workflows

    In 2012 I began my doctorate in the Department of Molecular Biology and Biochemistry, although I was hosted at the Plataforma Andaluza de Bioinformática (PAB), located in the Supercomputing and Bioinnovation building at the Parque Tecnológico de Andalucía. This University of Málaga facility houses the supercomputing and ultra-sequencing resources (at that time equipped with a Roche Titanium FLX+ instrument, to which an Illumina NextSeq550 has recently been added). In gratitude for being able to work in such an environment, I have made the tools I developed available to the PAB, especially those that automate and optimize the protocols used to handle the information produced by sequencing experiments. Hence the work of my doctoral thesis has been directed at developing the tools and carrying out the studies needed to cover that task. Given the nature of the work, this thesis is technical and practical in character: most of the studies present new tools, or modifications of existing ones, with the aim of obtaining biological knowledge useful to the researcher. In fact, many of the studies presented here result from collaboration with other research groups that needed bioinformatic analyses of their sequencing data. Thus, the first task we set ourselves was to make it easier to create and manage automated workflows. To that end we created a tool that, with a simple syntax, allows users with little technical knowledge to create their own workflows, which guarantees the reproducibility of the results. In addition, it automatically configures much of the resource management, relieving the user of part of this work.
Because of the intensive use of workflows and the complexity of the information handled, analysis results inevitably accumulate at least as fast as they are generated. The only way to avoid being overwhelmed by this problem is to keep these data organized and make them available to the scientific community in an intuitive way. For this reason, we have set up two database systems, one for genomic information and another for transcriptomic information, which have been used in various research projects, in some of which I have taken an active part, such as the reproductive transcriptomes of olive and faba bean, and the genomes of Symphonia, maritime pine and chickpea. Specifically, for our transcriptomics studies we have developed workflows that automatically build transcriptomes from the information generated by the 454 and Illumina sequencing technologies. This in turn led to the development of a functional and structural annotation tool for working with transcriptomic sequences. In addition, we carried out a study of coding-sequence prediction tools in order to identify possible organism-specific sequences for which there is no evidence in current biological databases. Finally, regarding our genomics studies, we have devised new approaches for the genomes of non-model organisms. With these approaches, the exon-intron structure of certain genes can be obtained without sequencing the organism's whole genome. To this end, we have developed a tool that can build gene models from a set of reference proteins and the fragmented genome assembly of an organism. We have also developed a workflow to identify and annotate genes in draft genomes

    Bioinformatics Prediction for Network-Based Integrative Multi-Omics Expression Data Analysis in Hirschsprung Disease

    Hirschsprung’s disease (HSCR) is a rare developmental disorder in which enteric ganglia are missing along a portion of the intestine. HSCR has a complex inheritance, with RET as the major disease-causing gene. However, the pathogenesis of HSCR is still not completely understood. Therefore, we applied a computational approach based on multi-omics network characterization and clustering analysis for HSCR-related gene/miRNA identification and biomarker discovery. Protein–protein interaction (PPI) and miRNA–target interaction (MTI) networks were analyzed by DPClusO and BiClusO, respectively, and finally, the biomarker potential of miRNAs was computationally screened by miRNA-BD. In this study, a total of 55 significant gene–disease modules were identified, allowing us to propose 178 new HSCR candidate genes and two biological pathways. Moreover, we identified 12 key miRNAs with biomarker potential among 137 predicted HSCR-associated miRNAs. Functional analysis of the new candidates showed that the enriched gene ontology (GO) terms and pathways were associated with HSCR. In conclusion, this approach has allowed us to uncover new clues to the etiopathogenesis of HSCR, although further molecular experiments are needed for clinical validation

    Development of genomic tools in a widespread tropical tree, Symphonia globulifera L.f.: a new low-coverage draft genome, SNP and SSR markers

    Population genetic studies in tropical plants are often challenging because of limited information on taxonomy, phylogenetic relationships and distribution ranges, scarce genomic information and logistic challenges in sampling. We describe a strategy to develop robust and widely applicable genetic markers based on a modest development of genomic resources in the ancient tropical tree species Symphonia globulifera L.f. (Clusiaceae), a keystone species in African and Neotropical rainforests. We provide the first low-coverage (11X) fragmented draft genome sequenced on an individual from Cameroon, covering 1.027 Gbp or 67.5% of the estimated genome size. Annotation of 565 scaffolds (7.57 Mbp) resulted in the prediction of 1046 putative genes (231 of them containing a complete open reading frame) and 1523 exact simple sequence repeats (SSRs, microsatellites). Aligning a published transcriptome of a French Guiana population against this draft genome produced 923 high-quality single nucleotide polymorphisms. We also preselected genic SSRs in silico that were conserved and polymorphic across a wide geographical range, thus reducing marker development tests on rare DNA samples. Of 23 SSRs tested, 19 amplified and 18 were successfully genotyped in four S. globulifera populations from South America (Brazil and French Guiana) and Africa (Cameroon and São Tomé island, FST = 0.34). Most loci showed only population-specific deviations from Hardy–Weinberg proportions, pointing to local population effects (e.g. null alleles). The described genomic resources are valuable for evolutionary studies in Symphonia and for comparative studies in plants. The methods are especially interesting for widespread tropical or endangered taxa with limited DNA availability. © 2016 John Wiley & Sons Ltd
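
To make the notion of an exact SSR concrete, the following Python sketch scans a sequence for perfect tandem repeats with a regular expression. It is an illustration only, not the pipeline used in the study: the function name, the motif lengths (2 to 6 bp) and the minimum of five repeat units are all assumptions chosen for the example.

```python
import re


def is_primitive(motif):
    """True if the motif is not itself a repeat of a shorter unit."""
    return (motif + motif).find(motif, 1) == len(motif)


def find_exact_ssrs(sequence, min_repeats=5, motif_lengths=range(2, 7)):
    """Return (start, motif, repeat_count) for perfect tandem repeats.

    Assumptions: 0-based start positions, uppercase A/C/G/T only,
    thresholds chosen for illustration rather than taken from the paper.
    """
    sequence = sequence.upper()
    hits = []
    for k in motif_lengths:
        # One motif of length k followed by at least min_repeats - 1 copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_repeats - 1))
        for match in pattern.finditer(sequence):
            motif = match.group(1)
            if not is_primitive(motif):
                continue  # e.g. skip ATAT, already reported as AT
            repeats = len(match.group(0)) // k
            hits.append((match.start(), motif, repeats))
    return sorted(hits)


if __name__ == "__main__":
    toy = "GG" + "AT" * 6 + "GGC" + "CAG" * 5 + "T"
    for start, motif, repeats in find_exact_ssrs(toy):
        print(start, motif, repeats)
```

Restricting the search to primitive motifs avoids reporting the same locus twice, once as a dinucleotide repeat and again as a tetranucleotide repeat.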

    Transcriptomic response of the Solea senegalensis intestine after dietary administration of the probiotic Shewanella putrefaciens Pdp11

    The use of probiotics in aquaculture improves animal health and welfare; in the case of the probiotic SpPdp11, numerous benefits have been observed in terms of immunity, stress and intestinal microbiota. However, no transcriptomic analysis of the effect of this microorganism in the diet has been carried out at the intestinal level. RNA was extracted from an anterior and a posterior section of the intestine of Solea senegalensis and subsequently analysed by RNA-seq. Overall, the results showed a decrease in genes related to cell division in the anterior intestine, in contrast to an increase in lipid metabolism in the posterior part. Including SpPdp11 in the diet also led to a decrease in fatty acid oxidation, oxidative stress and inflammatory response in the anterior intestine of the fish studied. Universidad de Málaga. Campus de Excelencia Internacional Andalucía Tech

    Data from: Development of genomic tools in a widespread tropical tree, Symphonia globulifera L.f.: a new low-coverage draft genome, SNP and SSR markers

    Population genetic studies in tropical plants are often challenging because of limited information on taxonomy, phylogenetic relationships and distribution ranges, scarce genomic information and logistic challenges in sampling. We describe a strategy to develop robust and widely applicable genetic markers based on a modest development of genomic resources in the ancient tropical tree species Symphonia globulifera L.f. (Clusiaceae), a keystone species in African and Neotropical rainforests. We provide the first low-coverage (11X) fragmented draft genome sequenced on an individual from Cameroon, covering 1.027 Gbp or 67.5% of the estimated genome size. Annotation of 565 scaffolds (7.57 Mbp) resulted in the prediction of 1046 putative genes (231 of them containing a complete open reading frame) and 1523 exact simple sequence repeats (SSRs, microsatellites). Aligning a published transcriptome of a French Guiana population against this draft genome produced 923 high-quality single nucleotide polymorphisms. We also preselected genic SSRs in silico that were conserved and polymorphic across a wide geographical range, thus reducing marker development tests on rare DNA samples. Of 23 SSRs tested, 19 amplified and 18 were successfully genotyped in four S. globulifera populations from South America (Brazil and French Guiana) and Africa (Cameroon and São Tomé island, FST = 0.34). Most loci showed only population-specific deviations from Hardy–Weinberg proportions, pointing to local population effects (e.g. null alleles). The described genomic resources are valuable for evolutionary studies in Symphonia and for comparative studies in plants. The methods are especially interesting for widespread tropical or endangered taxa with limited DNA availability