12 research outputs found

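    Comparative genomics of Leishmania infantum isolates from Santa Catarina and Rio Grande do Norte (original title: Genômica comparativa de isolados de Leishmania infantum de Santa Catarina e Rio Grande do Norte)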

    No full text
    Dissertation (Master's) - Universidade Federal de Santa Catarina, Centro de Ciências Biológicas, Programa de Pós-Graduação em Biotecnologia e Biociências, Florianópolis, 2021. Leishmania infantum is a unicellular, flagellated protozoan and obligate parasite whose reservoirs are humans and canids. This parasite is the causative agent of visceral leishmaniasis in the Americas, which is considered a neglected tropical anthropozoonosis. The disease affects lymphoid organs such as the bone marrow, liver, and spleen, and can cause symptoms such as fever, weight loss, hepatosplenomegaly, and anemia. The omics sciences can be applied in studies to elucidate mechanisms of parasite biology or to identify regions that do or do not show similarity between different organisms. Comparative genomics studies fall within this context of similarity, making it possible to identify, for example, shared regions and essential genes and proteins present in the genomes of organisms of the same species. The aim of this work was to perform assembly, functional annotation, and comparative analysis from second-generation sequencing data of L. infantum samples collected in Santa Catarina and obtained from public databases for Rio Grande do Norte. Both the assembly and annotation steps used public data obtained from TriTrypDB for L. infantum strain JPCM5 as the reference. Preliminary assemblies were generated with SPAdes and then refined with SSPACE and GapFiller, for scaffold ordering and gap filling, respectively. The ordering of scaffolds into chromosomes was performed with SAMtools from alignments against the reference and refined with BWA-MEM and Pilon. Gene prediction was performed with AUGUSTUS and functional annotation with the AnnotaPipeline pipeline. The final assemblies had an average total size of 32.6 Mb and 2,790.49 unidentified bases distributed across 36 contiguous scaffolds, representing 99.5% of the reference genome size. The assemblies had an average of 8,695 predicted and annotated proteins, with an average of 2,983 proteins annotated as hypothetical and without functional annotation. From these assemblies and predictions, OrthoFinder was used to perform an orthology analysis among all annotated proteins, but the gene profile proved conserved, with no proteins specific to a single sample that would confer an evolutionary advantage. Functional annotation based on ontology terms allowed the identification of classical biological processes for the Santa Catarina samples, even in the absence of the classical vector in the municipality. Single nucleotide polymorphisms were detected with FreeBayes and their impacts were predicted with SnpEff. High-impact non-synonymous polymorphisms were infrequent and appeared in few samples. Finally, this work enabled a comparative analysis of several highly syntenic genomes based on genome assembly and annotation, generating functional information that can complement more traditional studies of variant identification and profiling.
    Abstract: Leishmania infantum is an intracellular parasite that infects mammalian hosts and sandflies. It is the main etiological agent for visceral leishmaniasis, which is considered a neglected tropical disease, in the Americas. Visceral leishmaniasis affects mainly lymphoid organs and manifests in hepatosplenomegaly, weight loss, fever and anaemia. Omics refers to a subfield in bioinformatics focused on biological sequences which can be employed in studies for biological insights. Comparative genomics can involve sequence similarity for genomic discoveries, such as syntenic regions and gene/protein discovery within a species. We employed comparative genomics in assembled and annotated genomes, generated from short reads, obtained in Santa Catarina and from public databases. Both genome assembly and annotation used the current reference genome from TriTrypDB, Leishmania infantum JPCM5. Draft assemblies were generated by SPAdes, and submitted to SSPACE and GapFiller for scaffolding/gap filling. Scaffolds were aligned to the reference genome and merged into polished chromosomes through BWA-MEM and Pilon. AUGUSTUS was used for gene prediction and AnnotaPipeline obtained functional annotations. Final assemblies presented 36 scaffolds, and an average of 32.6 Mb and 2,790.29 unidentified nucleotides. Gene predictions presented an average of 8,695 annotated proteins with 2,983 hypothetical proteins without functional annotations. Functional annotations, based on ontology, allowed the identification of classical biological processes despite the absence of classical vectors in Santa Catarina. Annotated proteins were submitted to OrthoFinder for orthology inference, which resulted in conserved gene profiles throughout all samples. Single nucleotide polymorphisms were detected by FreeBayes and annotated by SnpEff for potential impacts. High impact non-synonymous mutations were not frequent and not dispersed in multiple samples. In conclusion, comparative genomics resulted in highly syntenic genomes and functional annotations that could be incorporated in traditional studies involving variant analysis.

    Table2_AnnotaPipeline: An integrated tool to annotate eukaryotic proteins using multi-omics data.docx

    No full text
    Assignment of gene function has been a crucial, laborious, and time-consuming step in genomics. Due to a variety of sequencing platforms that generates increasing amounts of data, manual annotation is no longer feasible. Thus, the need for an integrated, automated pipeline allowing the use of experimental data towards validation of in silico prediction of gene function is of utmost relevance. Here, we present a computational workflow named AnnotaPipeline that integrates distinct software and data types on a proteogenomic approach to annotate and validate predicted features in genomic sequences. Based on FASTA (i) nucleotide or (ii) protein sequences or (iii) structural annotation files (GFF3), users can input FASTQ RNA-seq data, MS/MS data from mzXML or similar formats, as the pipeline uses both transcriptomic and proteomic information to corroborate annotations and validate gene prediction, providing transcription and expression evidence for functional annotation. Reannotation of the available Arabidopsis thaliana, Caenorhabditis elegans, Candida albicans, Trypanosoma cruzi, and Trypanosoma rangeli genomes was performed using the AnnotaPipeline, resulting in a higher proportion of annotated proteins and a reduced proportion of hypothetical proteins when compared to the annotations publicly available for these organisms. AnnotaPipeline is a Unix-based pipeline developed using Python and is available at: https://github.com/bioinformatics-ufsc/AnnotaPipeline.

    Table1_AnnotaPipeline: An integrated tool to annotate eukaryotic proteins using multi-omics data.XLSX

    No full text
    Assignment of gene function has been a crucial, laborious, and time-consuming step in genomics. Due to a variety of sequencing platforms that generates increasing amounts of data, manual annotation is no longer feasible. Thus, the need for an integrated, automated pipeline allowing the use of experimental data towards validation of in silico prediction of gene function is of utmost relevance. Here, we present a computational workflow named AnnotaPipeline that integrates distinct software and data types on a proteogenomic approach to annotate and validate predicted features in genomic sequences. Based on FASTA (i) nucleotide or (ii) protein sequences or (iii) structural annotation files (GFF3), users can input FASTQ RNA-seq data, MS/MS data from mzXML or similar formats, as the pipeline uses both transcriptomic and proteomic information to corroborate annotations and validate gene prediction, providing transcription and expression evidence for functional annotation. Reannotation of the available Arabidopsis thaliana, Caenorhabditis elegans, Candida albicans, Trypanosoma cruzi, and Trypanosoma rangeli genomes was performed using the AnnotaPipeline, resulting in a higher proportion of annotated proteins and a reduced proportion of hypothetical proteins when compared to the annotations publicly available for these organisms. AnnotaPipeline is a Unix-based pipeline developed using Python and is available at: https://github.com/bioinformatics-ufsc/AnnotaPipeline.

    Genomic Surveillance of SARS-CoV-2 in Healthcare Workers: A Critical Sentinel Group for Monitoring the SARS-CoV-2 Variant Shift

    No full text
    SARS-CoV-2 genome surveillance is important for monitoring risk groups and healthcare workers, alongside data on new cases and the mortality rate due to COVID-19. We characterized the circulation of SARS-CoV-2 variants from May 2021 to April 2022 in the state of Santa Catarina, southern Brazil, and evaluated the similarity between variants present in the population and healthcare workers (HCW). A total of 5291 sequenced genomes demonstrated the circulation of 55 strains and four variants of concern (Alpha, Delta, Gamma and Omicron, the latter as sublineages BA.1 and BA.2). The number of cases was relatively low in May 2021, but the number of deaths was higher with the Gamma variant. There was a significant increase in both numbers between December 2021 and February 2022, peaking in mid-January 2022, when the Omicron variant dominated. After May 2021, two distinct variant groups (Delta and Omicron) were observed, equally distributed among the five Santa Catarina mesoregions. Moreover, from November 2021 to February 2022, similar variant profiles between HCW and the general population were observed, along with a quicker shift from Delta to Omicron in HCW than in the general population. This demonstrates the importance of HCW as a sentinel group for monitoring disease trends in the general population.

    Emergence of Two Distinct SARS-CoV-2 Gamma Variants and the Rapid Spread of P.1-like-II SARS-CoV-2 during the Second Wave of COVID-19 in Santa Catarina, Southern Brazil

    No full text
    The western mesoregion of the state of Santa Catarina (SC), Southern Brazil, was heavily affected as a whole by the COVID-19 pandemic in early 2021. This study aimed to evaluate the dynamics of SARS-CoV-2 spreading patterns in SC from March 2020 to April 2021 using genomic surveillance. During this period, there were 23 distinct variants, including Beta and Gamma, among which Gamma and related lineages were predominant in the second pandemic wave within SC. A regionalization of P.1-like-II in the Western SC region was observed, concomitant with the increase in cases, mortality, and the case fatality rate (CFR) index. This is the first evidence of the regionalization of SARS-CoV-2 transmission in SC, and it highlights the importance of tracking the variants, dispersion, and impact of SARS-CoV-2 on the public health systems.

    ILC Reference Design Report Volume 1 - Executive Summary

    No full text
    The International Linear Collider (ILC) is a 200-500 GeV center-of-mass high-luminosity linear electron-positron collider, based on 1.3 GHz superconducting radio-frequency (SCRF) accelerating cavities. The ILC has a total footprint of about 31 km and is designed for a peak luminosity of 2x10^34 cm^-2s^-1. This report is the Executive Summary (Volume I) of the four volume Reference Design Report. It gives an overview of the physics at the ILC, the accelerator design and value estimate, the detector concepts, and the next steps towards project realization.

    ILC Reference Design Report Volume 4 - Detectors

    No full text
    This report, Volume IV of the International Linear Collider Reference Design Report, describes the detectors which will record and measure the charged and neutral particles produced in the ILC's high energy e+e- collisions. The physics of the ILC, and the environment of the machine-detector interface, pose new challenges for detector design. Several conceptual designs for the detector promise the needed performance, and ongoing detector R&D is addressing the outstanding technological issues. Two such detectors, operating in push-pull mode, perfectly instrument the ILC interaction region, and access the full potential of ILC physics.