
    Synapse at CAp 2017 NER challenge: Fasttext CRF

    We present our system for the CAp 2017 NER challenge, which addresses named entity recognition on French tweets. Our system uses unsupervised learning on a larger dataset of French tweets to learn features that feed a CRF model. It ranked first, without using any gazetteer or structured external data, with an F-measure of 58.89%. To the best of our knowledge, it is the first system to use fasttext embeddings (which include subword representations) and an embedding-based sentence representation for NER.

    A draft genome sequence and functional screen reveals the repertoire of type III secreted proteins of Pseudomonas syringae pathovar tabaci 11528

    Background: Pseudomonas syringae is a widespread bacterial pathogen that causes disease on a broad range of economically important plant species. Pathogenicity of P. syringae strains depends on the type III secretion system, which secretes a suite of up to about thirty virulence 'effector' proteins into the host cytoplasm, where they subvert the eukaryotic cell physiology and disrupt host defences. P. syringae pathovar tabaci naturally causes disease on wild tobacco, the model member of the Solanaceae, a family that includes many crop species, as well as on soybean.
    Results: We used the 'next-generation' Illumina sequencing platform and the Velvet short-read assembly program to generate a 145X-deep, 6,077,921-nucleotide draft genome sequence for P. syringae pathovar tabaci strain 11528. From our draft assembly, we predicted 5,300 potential genes encoding proteins at least 100 amino acids long, of which 303 (5.72%) had no significant sequence similarity to those encoded by the three previously fully sequenced P. syringae genomes. Of the core set of Hrp Outer Proteins that are conserved in three previously fully sequenced P. syringae strains, most were also conserved in strain 11528, including AvrE1, HopAH2, HopAJ2, HopAK1, HopAN1, HopI, HopJ1, HopX1, HrpK1 and HrpW1. However, the hrpZ1 gene is partially deleted and hopAF1 is completely absent in 11528. The draft genome of strain 11528 also encodes close homologues of HopO1, HopT1, HopAH1, HopR1, HopV1, HopAG1, HopAS1, HopAE1, HopAR1, HopF1, and HopW1 and a degenerate HopM1'. Using a functional screen, we confirmed that hopO1, hopT1, hopAH1, hopM1', hopAE1, hopAR1, and hopAI1' are part of the virulence-associated HrpL regulon, though the hopAI1' and hopM1' sequences were degenerate with premature stop codons. We also discovered two additional HrpL-regulated effector candidates and an HrpL-regulated distant homologue of avrPto1.
    Conclusion: The draft genome sequence facilitates the continued development of P. syringae pathovar tabaci on wild tobacco as an attractive model system for studying bacterial disease on plants. The catalogue of effectors sheds further light on the evolution of pathogenicity and host-specificity, and provides a set of molecular tools for the study of plant defence mechanisms. We also discovered several large genomic regions in Pta 11528 that do not share detectable nucleotide sequence similarity with previously sequenced Pseudomonas genomes. These regions may include horizontally acquired islands that possibly contribute to pathogenicity or epiphytic fitness of Pta 11528.

    An improved, high-quality draft genome sequence of the Germination-Arrest Factor-producing Pseudomonas fluorescens WH6

    Background: Pseudomonas fluorescens is a genetically and physiologically diverse species of bacteria present in many habitats and in association with plants. This species produces a large array of secondary metabolites with potential as natural products. P. fluorescens isolate WH6 produces Germination-Arrest Factor (GAF), a predicted small peptide or amino acid analog with herbicidal activity that specifically inhibits germination of seeds of graminaceous species.
    Results: We used a hybrid next-generation sequencing approach to develop a high-quality draft genome sequence for P. fluorescens WH6. We employed automated, manual, and experimental methods to further improve the draft genome sequence. From this assembly of 6.27 megabases, we predicted 5876 genes, of which 3115 were core to P. fluorescens and 1567 were unique to WH6. Comparative genomic studies of WH6 revealed high similarity in synteny and orthology of genes with P. fluorescens SBW25. A phylogenomic study also placed WH6 in the same lineage as SBW25. In a previous non-saturating mutagenesis screen we identified two genes necessary for GAF activity in WH6. Mapping of their flanking sequences revealed genes that encode a candidate anti-sigma factor and an aminotransferase. Finally, we discovered several candidate virulence and host-association mechanisms, one of which appears to be a complete type III secretion system.
    Conclusions: The improved high-quality draft genome sequence of WH6 contributes towards resolving the P. fluorescens species, providing additional impetus for establishing two separate lineages in P. fluorescens. Despite the high levels of orthology and synteny to SBW25, WH6 still had a substantial number of unique genes and represents another source for the discovery of genes with implications for plant growth and health. Two genes are demonstrably necessary for GAF activity, and further characterization of their proteins is important for developing natural products as a control measure against grassy weeds. Finally, WH6 is the first isolate of P. fluorescens reported to encode a complete T3SS. This gives us the opportunity to explore the role of what has traditionally been thought of as a virulence mechanism in non-pathogenic interactions with plants.