    Nitrification and denitrification capacity of the ETAR Norte-SIMRIA wastewater treatment plant

    Master's degree in Environmental Engineering. Analyzing the operation of a wastewater treatment plant (WWTP; ETAR in Portuguese) is important for understanding how all phases of the treatment process interact and are managed, and for determining the appropriate type of management. The internship at SIMRIA's ETAR Norte took place between March and July 2013. Getting to know the plant's main units and how they operate was the first objective of the internship and was essential for a detailed characterization of the WWTP. This was followed by a detailed assessment of the conditions under which the plant currently operates, through the compilation and analysis of its 2012 operating data. Comparing the 2012 and 2013 data showed that managing a WWTP is not simply a matter of keeping certain parameters within ideal ranges: at times, and for certain periods, this is impossible because the process depends heavily on the quality of the influent. The plant's performance was evaluated against the parameters in its Water Resources Use License for Wastewater Discharge, and its general operation was analyzed against the values predicted in the corresponding detailed design and against the ideal ranges reported in the literature.
For the sizing of the nitrification and denitrification system, the first step was to validate a set of organic matter fractions and kinetic constants that, in the current scenario (organic matter removal only), yields a design volume equal or very close to the existing volume. On that basis, the nitrification and denitrification system was then sized as accurately as possible.
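The validation step described above is essentially a calibration: uncertain model inputs are adjusted until the design equation reproduces the volume that already exists. The abstract does not give the design model, so the sketch below assumes the generic textbook complete-mix activated-sludge volume equation and calibrates only the endogenous decay constant kd by bisection. The function names and every number are hypothetical placeholders, not ETAR Norte data.

```python
"""Sketch: calibrate one kinetic constant so that a textbook activated-sludge
design equation reproduces an existing aeration-tank volume.

Assumptions: the equation is the generic complete-mix form, not the thesis's
model, and every number below is a hypothetical placeholder.
"""


def aerobic_volume(q, s0, s, x, srt, y, kd):
    """Complete-mix activated-sludge volume (m3):
    V = SRT * Q * Y * (S0 - S) / (X * (1 + kd * SRT))

    q: influent flow (m3/d); s0, s: influent/effluent biodegradable COD (g/m3);
    x: biomass concentration in the reactor (g VSS/m3); srt: sludge age (d);
    y: yield coefficient (g VSS/g COD); kd: endogenous decay rate (1/d).
    """
    return srt * q * y * (s0 - s) / (x * (1.0 + kd * srt))


def bisect_monotone(f, target, lo, hi, tol=1e-10, max_iter=200):
    """Solve f(p) == target on [lo, hi] for a monotonic f by bisection."""
    f_lo = f(lo) - target
    if f_lo * (f(hi) - target) > 0:
        raise ValueError("target is not bracketed by [lo, hi]")
    mid = 0.5 * (lo + hi)
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        f_mid = f(mid) - target
        if f_mid == 0 or hi - lo < tol:
            break
        if f_lo * f_mid < 0:
            hi = mid  # sign change lies in the lower half
        else:
            lo, f_lo = mid, f_mid  # sign change lies in the upper half
    return mid


# Hypothetical plant data: everything is fixed except kd, which is calibrated
# so that the "organic matter removal only" design matches the existing tank.
params = dict(q=20_000.0, s0=300.0, s=5.0, x=3_000.0, srt=10.0, y=0.5)
v_existing = 4_000.0  # m3, hypothetical

kd_cal = bisect_monotone(lambda kd: aerobic_volume(kd=kd, **params), v_existing, 0.01, 0.5)
print(f"calibrated kd = {kd_cal:.4f} 1/d")
print(f"check volume  = {aerobic_volume(kd=kd_cal, **params):.1f} m3")
```

Because V is linear in Y, a yield coefficient could be calibrated in closed form. Bisection is used here so the same routine also works for parameters such as kd that enter the equation nonlinearly. The calibrated values would then be carried into the nitrification and denitrification design.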

    Preparation of well-dispersed chitosan/alginate hollow multilayered microcapsules for enhanced cellular internalization

    Hollow multilayered capsules have shown great potential for biomedical and biotechnological applications such as cellular internalization, intracellular trafficking, drug delivery and tissue engineering. In particular, hollow microcapsules produced from porous calcium carbonate sacrificial templates, natural-origin building blocks and Layer-by-Layer (LbL) technology have attracted increasing attention owing to their key features. However, these microcapsules show a strong tendency to aggregate, which is a major hurdle for cellular internalization and the intracellular delivery of therapeutics. Herein, we report the preparation of well-dispersed polysaccharide-based hollow multilayered microcapsules by combining the LbL technique with an optimized purification process. Cationic chitosan (CHT) and anionic alginate (ALG) were chosen as marine-origin polysaccharides because of their biocompatibility and structural similarity to the extracellular matrices of living tissues. The inexpensive and highly versatile LbL technology was used to fabricate core-shell microparticles and hollow multilayered microcapsules, with precise control over their composition and physicochemical properties, through repeated alternating deposition of the two materials. The microcapsule synthesis procedure was optimized to greatly reduce their natural tendency to aggregate, as shown by morphological analysis using advanced microscopy techniques. The well-dispersed microcapsules showed enhanced uptake by fibroblasts, opening new perspectives for cellular internalization.

    Pathogenic Escherichia coli, Salmonella spp. and Campylobacter spp. in Two Natural Conservation Centers of Wildlife in Portugal: Genotypic and Phenotypic Characterization

    This article belongs to the Section Food Microbiology. Human–wildlife coexistence may increase the risk of direct transmission of emerging or re-emerging zoonotic pathogens to humans. To assess the occurrence of three important foodborne pathogens in wild animals at two wildlife conservation centers in Portugal, we investigated 132 fecal samples for the presence of Escherichia coli (Shiga toxin-producing E. coli (STEC) and non-STEC), Salmonella spp. and Campylobacter spp. Isolates were characterized genotypically, by screening for virulence and antimicrobial resistance (AMR) genes using PCR and whole-genome sequencing (WGS), and phenotypically (serotyping and AMR profiles). Overall, 62 samples tested positive for at least one of these species: 27.3% for STEC, 11.4% for non-STEC, 3.0% for Salmonella spp. and 6.8% for Campylobacter spp. AMR was detected in four E. coli isolates and in the only Campylobacter coli isolated in this study. WGS analysis revealed that 57.7% (30/52) of pathogenic E. coli belonged to genetic clusters of closely related isolates (often involving different animal species), supporting the circulation and transmission of different pathogenic E. coli strains in the studied areas. These results support the idea that the health of humans, animals and ecosystems is interconnected, reinforcing the importance of a One Health approach to better monitor and control public health threats. This work was supported by funding from the European Union's Horizon 2020 Research and Innovation Programme under grant agreement No 773830: One Health European Joint Programme, as part of the DiSCoVeR project (Discovering the sources of Salmonella, Campylobacter, VTEC and Antimicrobial Resistance). S.R., R.C. and V.M. were beneficiaries of fellowships from the same Programme on behalf of the ADONIS (S.R.), FedAMR (R.C.) and BeOne (V.M.) projects.
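As a quick consistency check on the prevalence figures (27.3%, 11.4%, 3.0% and 6.8% among 132 samples, with 62 positive for at least one target), the percentages can be converted back into sample counts. This sketch assumes that every percentage is relative to all 132 samples; the variable names are ours.

```python
"""Back-calculate approximate sample counts from the reported percentages.

Assumes every percentage is relative to the 132 fecal samples tested.
"""

TOTAL_SAMPLES = 132
POSITIVE_ANY = 62  # samples positive for at least one target (reported)

reported_pct = {
    "STEC": 27.3,
    "non-STEC": 11.4,
    "Salmonella spp.": 3.0,
    "Campylobacter spp.": 6.8,
}

# Nearest whole number of samples for each reported percentage.
counts = {name: round(pct / 100 * TOTAL_SAMPLES) for name, pct in reported_pct.items()}

for name, n in counts.items():
    # Re-derive the percentage to confirm the rounding reproduces the report.
    print(f"{name:20s} {n:3d} samples ({100 * n / TOTAL_SAMPLES:.1f}%)")

detections = sum(counts.values())
print(f"total detections: {detections}; samples positive: {POSITIVE_ANY}")
print(f"excess detections from co-positive samples: {detections - POSITIVE_ANY}")
```

Under that assumption the per-target counts are 36, 15, 4 and 9, which add up to 64 detections: two more than the 62 positive samples, so at least one sample must have been positive for more than one target.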

    INNUENDO: A cross-sectoral platform for the integration of genomics in the surveillance of food-borne pathogens

    In response to the EFSA call "New approaches in identifying and characterizing microbial and chemical hazards", the INNUENDO project (https://sites.google.com/site/theinnuendoproject/) aimed to design an analytical platform and standard procedures for the use of whole-genome sequencing in the surveillance and outbreak investigation of food-borne pathogens. The project first sought to identify existing shortcomings and needs, and then to provide applicable cross-sectoral solutions. It focused on developing a platform for small countries with limited economic and human resources. To achieve these goals, we applied a user-centered design strategy that involved end-users, such as microbiologists in public health and veterinary authorities, in every step of the design, development and implementation phases. As a result, we delivered the INNUENDO Platform V1.0 (https://innuendo.readthedocs.io/en/latest/), a stand-alone, portable, open-source, end-to-end system for the management, analysis and sharing of bacterial genomic data. The platform uses the Nextflow workflow manager to assemble analytical software modules into species-specific protocols that can be run through a user-friendly interface. Reproducibility is ensured by using Docker containers and by annotating the whole process with an ontology. Several modules, available at https://github.com/TheInnuendoProject, have been developed, including: genome assembly and species confirmation; fast genome clustering; in silico typing; standardized species-specific phylogenetic frameworks for Campylobacter jejuni, Yersinia enterocolitica, Salmonella enterica and Escherichia coli based on an innovative gene-by-gene methodology; quality control measures from raw reads to allele calling; a reporting system; built-in communication protocols; and a strain classification system enabling smooth communication during outbreak investigations.
As a proof of concept, the proposed solutions were thoroughly tested under simulated outbreak conditions by several public health and veterinary agencies across Europe. The results have been widely disseminated through several channels (websites, scientific publications, workshops). The INNUENDO Platform V1.0 is effectively one of the models for the use of open-source software in genomic epidemiology.
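The gene-by-gene approach mentioned above compares isolates locus by locus. One simple way to turn allele profiles into clusters is to count the loci at which two isolates carry different alleles and to link isolates that fall within a distance threshold. The sketch below shows that generic idea in plain Python; it is not INNUENDO or chewBBACA code, and the profile layout (locus mapped to allele number, with 0 for a missing locus), the isolate names and the threshold are hypothetical.

```python
"""Sketch of gene-by-gene (cg/wgMLST) comparison and threshold clustering.

Profiles map locus -> allele number; 0 marks a missing locus. Locus names,
allele numbers and the threshold are hypothetical.
"""
from itertools import combinations


def allelic_distance(a, b):
    """Count loci with different alleles, skipping loci missing in either profile."""
    shared = [locus for locus in a if locus in b and a[locus] != 0 and b[locus] != 0]
    return sum(1 for locus in shared if a[locus] != b[locus])


def single_linkage_clusters(profiles, threshold):
    """Group isolates linked by pairwise allelic distance <= threshold (union-find)."""
    parent = {name: name for name in profiles}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u, v in combinations(profiles, 2):
        if allelic_distance(profiles[u], profiles[v]) <= threshold:
            parent[find(u)] = find(v)

    clusters = {}
    for name in profiles:
        clusters.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in clusters.values())


profiles = {
    "isolate_A": {"locus1": 1, "locus2": 4, "locus3": 2, "locus4": 7},
    "isolate_B": {"locus1": 1, "locus2": 4, "locus3": 3, "locus4": 7},
    "isolate_C": {"locus1": 5, "locus2": 9, "locus3": 0, "locus4": 1},
}
print(allelic_distance(profiles["isolate_A"], profiles["isolate_B"]))
print(single_linkage_clusters(profiles, threshold=1))
```

Isolates A and B differ at a single locus and are linked; C differs from both at three loci (locus3 is ignored because it is missing in C). Single linkage is only one possible clustering choice; it is used here because it is the simplest to follow.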

    SARS-CoV-2 introductions and early dynamics of the epidemic in Portugal

    Genomic surveillance of SARS-CoV-2 in Portugal was rapidly implemented by the National Institute of Health in the early stages of the COVID-19 epidemic, in collaboration with more than 50 laboratories distributed nationwide. Methods: By applying recent phylodynamic models that allow the integration of individual-based travel history, we reconstructed and characterized the spatio-temporal dynamics of SARS-CoV-2 introductions and early dissemination in Portugal. Results: We detected at least 277 independent SARS-CoV-2 introductions, mostly from European countries (namely the United Kingdom, Spain, France, Italy and Switzerland), consistent with the countries with the highest connectivity with Portugal. Although most introductions were estimated to have occurred in early March 2020, it is likely that SARS-CoV-2 was circulating silently in Portugal throughout February, before the first cases were confirmed. Conclusions: We conclude that earlier implementation of measures could have minimized the number of introductions and the subsequent expansion of the virus in Portugal. This study lays the foundation for the genomic epidemiology of SARS-CoV-2 in Portugal and highlights the need for systematic and geographically representative genomic surveillance.
We gratefully acknowledge Sara Hill and Nuno Faria (University of Oxford) and Joshua Quick and Nick Loman (University of Birmingham) for kindly providing the initial sets of Artic Network primers for NGS; Rafael Mamede (MRamirez team, IMM, Lisbon) for developing and sharing a bioinformatics script for sequence curation (https://github.com/rfm-targa/BioinfUtils); Philippe Lemey (KU Leuven) for guidance on the implementation of the phylodynamic models; Joshua L. Cherry (National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health) for guidance on the subsampling strategies; and all authors and originating and submitting laboratories who contributed genome data to GISAID (https://www.gisaid.org/), on which part of this research is based. The opinions expressed in this article are those of the authors and do not reflect the views of the National Institutes of Health, the Department of Health and Human Services, or the United States government. This study is co-funded by Fundação para a Ciência e Tecnologia and Agência de Investigação Clínica e Inovação Biomédica (234_596874175) on behalf of the Research 4 COVID-19 call. Some infrastructural resources used in this study come from the GenomePT project (POCI-01-0145-FEDER-022184), supported by COMPETE 2020 - Operational Programme for Competitiveness and Internationalisation (POCI), Lisboa Portugal Regional Operational Programme (Lisboa2020), Algarve Portugal Regional Operational Programme (CRESC Algarve2020), under the PORTUGAL 2020 Partnership Agreement, through the European Regional Development Fund (ERDF), and by Fundação para a Ciência e a Tecnologia (FCT).

    Analysis of the efficacy of the therapeutic use of trimetazidine in the main acute coronary syndromes

    Ischemia-reperfusion injury is a pathophysiological mechanism involved in the etiopathogenesis of myocardial conditions such as unstable angina. Cytoprotective adjuvant drugs such as trimetazidine (TMZ) are therefore used, with the aim of shortening hospital stay and improving cardiac function by acting prophylactically against this injury. However, despite its potential benefits in the treatment of acute coronary syndrome, its efficacy relative to other available therapies remains unclear. Accordingly, the aim of this study is to analyze the efficacy of the therapeutic use of trimetazidine in the main acute coronary syndromes. A systematic review was conducted using the PubMed, Cochrane Library and Embase databases. A total of three studies were included in the analysis. The results showed that trimetazidine therapy significantly reduced the incidence of major adverse cardiac events (MACE) (OR = 0.33, 95% CI 0.15-0.75, p = 0.007) and was associated with less myocardial damage (p < 0.05), a higher left ventricular ejection fraction, and fewer adverse events compared with the placebo group (p < 0.05). No significant differences were observed between the trimetazidine and control groups in all-cause mortality, cardiovascular mortality or the incidence of adverse events. These findings suggest that adjuvant trimetazidine therapy may improve clinical outcomes and cardiac function in patients with acute myocardial infarction (AMI) without increasing the risk of adverse events. However, further large-scale randomized clinical trials are needed to confirm these results and to determine the optimal duration and dose of trimetazidine therapy in this patient population.
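The reported MACE estimate (OR = 0.33, 95% CI 0.15-0.75, p = 0.007) can be checked for internal consistency. Under the usual normal approximation on the log-odds scale, the width of the 95% CI gives the standard error of ln(OR), from which a Wald z statistic and a two-sided p-value follow. This is a reader's sanity check, not the review's own method; the function name is ours.

```python
"""Check that a reported odds ratio, 95% CI and p-value are mutually consistent.

Normal approximation on the log-odds scale:
SE = (ln(upper) - ln(lower)) / (2 * 1.96), z = ln(OR) / SE.
"""
import math

Z_975 = 1.959963984540054  # 97.5th percentile of the standard normal


def p_from_or_ci(odds_ratio, lower, upper):
    """Return (SE of ln OR, Wald z, two-sided p) implied by an OR and its 95% CI."""
    se = (math.log(upper) - math.log(lower)) / (2 * Z_975)
    z = math.log(odds_ratio) / se
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return se, z, p_two_sided


se, z, p = p_from_or_ci(0.33, 0.15, 0.75)
print(f"SE(ln OR) = {se:.3f}, z = {z:.2f}, p = {p:.4f}")
# A CI that is symmetric on the log scale has its geometric midpoint at the OR.
print(f"geometric midpoint of CI = {math.sqrt(0.15 * 0.75):.3f}")
```

The recovered p-value is about 0.0069, in line with the reported 0.007, and the geometric midpoint of the interval (about 0.335) is close to the reported OR of 0.33, as expected for a log-symmetric confidence interval.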

    Bioinformatics study of expression from genomes of epidemiologically related MRSA CC398 isolates from human and wild animal samples

    One of the most important livestock-associated methicillin-resistant Staphylococcus aureus (LA-MRSA) genetic lineages is clonal complex (CC) 398, which can cause typical S. aureus-associated infections in people. In this work, whole-genome sequencing, RNA sequencing and gel-based comparative proteomics were applied to study the genetic characteristics of three MRSA CC398 isolates recovered from humans (strains C5621 and C9017) and from an animal (strain OR418). Of the three strains, C9017 presented the broadest resistance genotype, including resistance to fluoroquinolones, clindamycin, tiamulin, macrolides and aminoglycosides. The scn, sak and chp genes of the immune evasion cluster system were detected only in OR418. Pangenome analysis revealed a total of 288 strain-specific genes, most of which encode hypothetical or phage-related proteins. OR418 showed the most pronounced genetic differences. The RNAIII (δ-hemolysin) gene was clearly the most highly expressed gene in OR418 and C5621, but it was not detected in C9017. Significant differences in proteome profiles were found between strains; for example, the immunoglobulin-binding protein Sbi was more abundant in OR418. Given that Sbi is a multifunctional immune evasion factor in S. aureus, the results suggest that strain OR418 has high zoonotic potential. Overall, multi-omics biomarker signatures may play an important role in advancing precision medicine in the years to come. SIGNIFICANCE: MRSA is one of the most representative drug-resistant pathogens, and its dissemination is increasing owing to its capacity to establish new reservoirs. LA-MRSA is considered an emerging problem worldwide, and CC398 is one of its most important genetic lineages.
In this study, three MRSA CC398 isolates recovered from humans and from a wild animal were analyzed by whole-genome sequencing, RNA sequencing and gel-based comparative proteomics to gather systems-wide omics data, better understand the genetic characteristics of this lineage, and identify distinctive markers and genomic features of public health relevance. This work was supported by the Associate Laboratory for Green Chemistry-LAQV, which is financed by national funds from FCT/MCTES (UIDB/50006/2020 and UIDP/50006/2020), and by the projects UIDB/CVT/00772/2020 and LA/P/0059/2020 funded by the Portuguese Foundation for Science and Technology (FCT). H. S. acknowledges the Associate Laboratory for Green Chemistry-LAQV (LA/P/0008/2020), funded by Fundação para a Ciência e a Tecnologia, I.P., for his research contract. This work is a result of the GenomePT project (POCI-01-0145-FEDER-022184), supported by COMPETE 2020 - Operational Programme for Competitiveness and Internationalization (POCI), Lisboa Portugal Regional Operational Programme (Lisboa2020), Algarve Portugal Regional Operational Programme (CRESC Algarve2020), under the PORTUGAL 2020 Partnership Agreement, through the European Regional Development Fund (ERDF), and by Fundação para a Ciência e a Tecnologia (FCT).
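One simple way to identify strain-specific genes in a pangenome is from a gene presence/absence table: a gene is specific to a strain when no other strain carries it, and core when every strain carries it. The sketch below illustrates that set logic; it is not the pipeline used in the study. The strain names come from the abstract and scn, sak and chp are the genes reported only in OR418, but all other gene identifiers are hypothetical placeholders.

```python
"""Sketch: core and strain-specific genes from per-strain gene sets."""


def partition_pangenome(gene_sets):
    """Split a pangenome into core genes and per-strain unique genes.

    gene_sets maps strain name -> set of gene identifiers present in that strain.
    Returns (core, specific): core genes occur in every strain, and
    specific[strain] holds the genes found in that strain only.
    """
    core = set.intersection(*gene_sets.values())
    specific = {}
    for strain, genes in gene_sets.items():
        # Union of the genes carried by every other strain.
        others = set().union(*(g for s, g in gene_sets.items() if s != strain))
        specific[strain] = genes - others
    return core, specific


# Strain names come from the abstract; scn/sak/chp are the immune evasion
# cluster genes reported only in OR418. All other gene IDs are placeholders.
gene_sets = {
    "C5621": {"core_1", "core_2", "hyp_1"},
    "C9017": {"core_1", "core_2", "hyp_2", "hyp_3"},
    "OR418": {"core_1", "core_2", "scn", "sak", "chp"},
}

core, specific = partition_pangenome(gene_sets)
n_specific = sum(len(genes) for genes in specific.values())
print("core:", sorted(core))
for strain, genes in specific.items():
    print(strain, sorted(genes))
print("strain-specific genes in total:", n_specific)
```

In the study, the same kind of tally over the three real genomes gave the 288 strain-specific genes reported above; the toy input here yields six.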

    INNUENDO whole genome and core genome MLST schemas and datasets for Salmonella enterica

    Dataset
As a reference dataset, 4,307 publicly available draft or complete genome assemblies of Salmonella enterica, with their available metadata, were downloaded from public repositories (EnteroBase, https://enterobase.warwick.ac.uk/; the National Center for Biotechnology Information (NCBI), https://www.ncbi.nlm.nih.gov/; and the European Bioinformatics Institute (EMBL-EBI), https://www.ebi.ac.uk/; accessed April 2017). The collection includes 1,465 S. Enteritidis, 2,442 S. Typhimurium, and 400 genomes of other serovars frequently isolated in Europe. The dataset also includes 153 S. Typhimurium variant 4,[5],12:i:- genomes collected from different Italian regions between 2012 and 2014 during a surveillance study, and 129 S. Enteritidis genomes belonging to the INNUENDO sequence dataset (PRJEB27020, https://www.ebi.ac.uk/ena/data/view/PRJEB27020). These 282 additional genomes were assembled using INNUca v3.1 (https://github.com/B-UMMI/INNUca).
The file 'Metadata/Senterica_metadata.txt' contains metadata for each strain, including source classification, host taxon, year and country of isolation, serotype, classical pubMLST seven-gene ST classification, and the source/method of the assembly.
The directory 'Genomes' contains all 4,589 assemblies of the strains listed in 'Metadata/Senterica_metadata.txt'. Genomes marked as 'Enterobase' were downloaded from the EnteroBase website (http://enterobase.warwick.ac.uk).
Schema creation and validation
The wgMLST schema from EnteroBase (https://enterobase.warwick.ac.uk/species/senterica/download_data) was downloaded and curated using chewBBACA AutoAlleleCDSCuration (https://github.com/B-UMMI/chewBBACA/wiki/1.-Schema-Creation) to remove all alleles that are not coding sequences (CDS). The quality of the remaining loci was assessed using chewBBACA Schema Evaluation (https://github.com/B-UMMI/chewBBACA/wiki/1.-Schema-Creation), and the following loci were removed: loci with a single allele, loci with high length variability (i.e., more than one allele outside the mode ± 0.05 size), and loci present in less than 0.5% of the Salmonella genomes in EnteroBase (https://enterobase.warwick.ac.uk/species/index/senterica) at the date of the analysis (April 2017). The wgMLST schema was further curated by excluding all loci detected as "Repeated Loci" and loci annotated as "non-informative paralogous hit (NIPH/NIPHEM)" or "Allele Larger/Smaller than length mode (ALM/ASM)" by the chewBBACA Allele Calling engine (https://github.com/B-UMMI/chewBBACA/wiki/2.-Allele-Calling) in more than 1% of a dataset of 4,589 Salmonella genomes.
The file 'Schemas/Senterica_wgMLST_8558_schema.tar.gz' contains the wgMLST schema formatted for chewBBACA and includes a total of 8,558 loci.
The file 'Schemas/Senterica_cgMLST_3255_listGenes.txt' contains the list of genes from the wgMLST schema that defines the cgMLST schema. The cgMLST schema consists of 3,255 loci, defined as the loci present in at least 99% of the 4,589 Salmonella genomes. Genomes have no more than 2% missing loci.
The file 'Allele_Profles/Senterica_wgMLST_alleleProfiles.tsv' contains the wgMLST allelic profiles of the 4,589 Salmonella genomes in the dataset. Missing loci follow the annotation of the chewBBACA Allele Calling software.
The file 'Allele_Profles/Senterica_cgMLST_alleleProfiles.tsv' contains the cgMLST allelic profiles of the 4,589 Salmonella genomes in the dataset. Missing loci are indicated with a zero.
Additional citations
The schemas are prepared for use with chewBBACA (https://github.com/B-UMMI/chewBBACA/wiki). When using the schemas in this repository, please also cite:
Silva M, Machado M, Silva D, Rossi M, Moran-Gilad J, Santos S, Ramirez M, Carriço J (2018) chewBBACA: A complete suite for gene-by-gene schema creation and strain identification. Microb Genom 4(3). https://doi.org/10.1099/mgen.0.000166
The Salmonella enterica schema is derived from the EnteroBase (http://enterobase.warwick.ac.uk/) Salmonella wgMLST schema. When using the schemas in this repository, please also cite:
Alikhan N-F, Zhou Z, Sergeant MJ, Achtman M (2018) A genomic overview of the population structure of Salmonella. PLoS Genet 14(4):e1007261. https://doi.org/10.1371/journal.pgen.1007261
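The cgMLST definition given for the Salmonella schema (loci present in at least 99% of genomes, with genomes having no more than 2% missing loci) amounts to a column filter and a row check on the allele-profile matrix. A minimal sketch follows. It is not chewBBACA code: the dictionary-of-dictionaries layout, the function names and the 100 synthetic genomes are hypothetical, and missing calls are assumed to be already encoded as 0, as in the cgMLST profile file.

```python
"""Sketch: define a cgMLST locus set from a wgMLST allele-profile matrix.

profiles maps genome -> {locus: allele number}; 0 marks a missing locus.
The tiny synthetic matrix below is hypothetical.
"""


def core_loci(profiles, min_presence=0.99):
    """Return loci whose allele is non-missing (!= 0) in >= min_presence of genomes."""
    n_genomes = len(profiles)
    loci = next(iter(profiles.values())).keys()
    return [
        locus
        for locus in loci
        if sum(1 for prof in profiles.values() if prof[locus] != 0) / n_genomes >= min_presence
    ]


def missing_fraction(profile, loci):
    """Fraction of the given loci that are missing (0) in one genome's profile."""
    return sum(1 for locus in loci if profile[locus] == 0) / len(loci)


# 100 synthetic genomes: L1 is always present, L2 is missing once (99%),
# L3 is missing twice (98%).
profiles = {
    f"genome_{i}": {"L1": 1, "L2": 0 if i == 0 else 2, "L3": 0 if i < 2 else 3}
    for i in range(100)
}

cg = core_loci(profiles)
flagged = [g for g, prof in profiles.items() if missing_fraction(prof, cg) > 0.02]
print(cg, flagged)
```

L2 reaches exactly 99% presence and is kept, while L3 (98%) is dropped; genome_0 is flagged because it lacks one of the two core loci. The C. jejuni schema uses a stricter presence cut-off, which would correspond to calling core_loci with min_presence=0.999.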

    INNUENDO whole genome and core genome MLST schemas and datasets for Campylobacter jejuni

    Dataset
Raw reads deposited in the European Nucleotide Archive (ENA) or the NCBI Sequence Read Archive (SRA) as C. jejuni were retrieved in April 2017. In total, 5,691 genomes that passed the INNUca v3.1 pipeline were selected. Additionally, 566 raw read sets previously published in Kovanen et al., 2016 (https://www.ncbi.nlm.nih.gov/pubmed/27041390), Llarena et al., 2016 (https://www.ncbi.nlm.nih.gov/pubmed/28348829), Kovanen et al., 2014 (https://www.ncbi.nlm.nih.gov/pubmed/25232158), Kovanen et al., 2014 (https://www.ncbi.nlm.nih.gov/pubmed/24655229) and Garcia-Sanchez et al., 2017 (http://www.sciencedirect.com/science/article/pii/S0740002016310449?via=ihub) were included. The database also includes 269 C. jejuni genomes belonging to the INNUENDO Sequence Dataset (PRJEB27020, https://www.ebi.ac.uk/ena/data/view/PRJEB27020). Genomes were assembled using the INNUca v3.1 pipeline (https://github.com/INNUENDOCON/INNUca) and passed quality control (QC).
The file 'Metadata/Cjejuni_metadata.txt' contains metadata for each strain, including country and year of isolation, source classification, host taxon, and classical pubMLST seven-gene ST and CC classification.
The directory 'Genomes' contains all 6,526 INNUca v3.1 assemblies of the strains listed in 'Metadata/Cjejuni_metadata.txt'.
Schema creation and validation
Draft genome assemblies were annotated using Prokka, and an initial pangenome was defined using Roary. chewBBACA CreateSchema.py (https://github.com/B-UMMI/chewBBACA/wiki/1.-Schema-Creation) was used to create a whole-genome schema from the Roary pangenome. The schema initially comprised 5,447 loci and was populated with the 6,526 C. jejuni genomes. The quality of the loci was assessed using chewBBACA Schema Evaluation (https://github.com/B-UMMI/chewBBACA/wiki/1.-Schema-Creation). Loci with a single allele and loci with high length variability (i.e., more than one allele outside the mode ± 0.05 size) were removed. The wgMLST schema was further curated by excluding all loci detected as "Repeated Loci" and loci annotated as "non-informative paralogous hit (NIPH/NIPHEM)" or "Allele Larger/Smaller than length mode (ALM/ASM)" by the chewBBACA Allele Calling engine (https://github.com/B-UMMI/chewBBACA/wiki/2.-Allele-Calling) in more than 1% of the C. jejuni genome dataset.
The file 'Schema/Cjejuni_wgMLST_2795_schema.tar.gz' contains the wgMLST schema formatted for chewBBACA and includes a total of 2,795 loci.
The file 'Schema/Cjejuni_cgMLST_678_listGenes.txt' contains the list of genes from the wgMLST schema that defines the cgMLST schema. The cgMLST schema consists of 678 loci, defined as the loci present in at least 99.9% of the 6,526 C. jejuni genomes. Genomes have no more than 2% missing loci.
The file 'Allele_Profles/Cjejuni_wgMLST_alleleProfiles.tsv' contains the wgMLST allelic profiles of the 6,526 C. jejuni genomes in the dataset. Missing loci follow the annotation of the chewBBACA Allele Calling software.
The file 'Allele_Profles/Cjejuni_cgMLST_alleleProfiles.tsv' contains the cgMLST allelic profiles of the 6,526 C. jejuni genomes in the dataset. Missing loci are indicated with a zero.
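The schema-evaluation step removes loci with a single allele or with high length variability. The sketch below encodes one reading of that rule: "mode ± 0.05 size" is interpreted as a window of ±5% around the modal allele length, and a locus is dropped when more than one allele falls outside it. That interpretation, the function name and the example allele lengths are assumptions, not documented chewBBACA behavior.

```python
"""Sketch of the single-allele and length-variability checks on schema loci.

Assumption: 'mode +/- 0.05 size' means +/-5% around the modal allele length.
Each list holds one length (bp) per distinct allele; all values are hypothetical.
"""
from collections import Counter


def keep_locus(allele_lengths, tolerance=0.05, max_outliers=1):
    """Return True if the locus passes the single-allele and length-variability checks."""
    if len(allele_lengths) < 2:
        return False  # loci with a single allele are removed
    mode_len, _ = Counter(allele_lengths).most_common(1)[0]
    low, high = mode_len * (1 - tolerance), mode_len * (1 + tolerance)
    outliers = sum(1 for length in allele_lengths if not low <= length <= high)
    return outliers <= max_outliers


loci = {
    "locus_stable": [900, 900, 903, 897, 900],
    "locus_variable": [900, 900, 600, 1200, 750],
    "locus_single": [1500],
}
for name, lengths in loci.items():
    print(name, "kept" if keep_locus(lengths) else "removed")
```

With a modal length of 900 bp the window is 855-945 bp, so locus_variable has three out-of-window alleles and is removed, and locus_single fails the single-allele check. Ties for the mode are broken by first occurrence, which is how Counter.most_common orders equal counts.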