Synapse at CAp 2017 NER challenge: Fasttext CRF
We present our system for the CAp 2017 NER challenge which is about named
entity recognition on French tweets. Our system leverages unsupervised learning
on a larger dataset of French tweets to learn features feeding a CRF model. It
was ranked first without using any gazetteer or structured external data, with
an F-measure of 58.89%. To the best of our knowledge, it is the first system
to use fasttext embeddings (which include subword representations) and an
embedding-based sentence representation for NER.
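The subword representations mentioned in this abstract can be illustrated with a minimal sketch of character n-gram extraction. It assumes the usual fastText convention of wrapping a word in `<` and `>` boundary markers and taking n-grams of length 3 to 6; the function name `char_ngrams` is hypothetical, and this is not the authors' feature pipeline.

```python
def char_ngrams(word, n_min=3, n_max=6):
    """Return the character n-grams of `word`, using '<' and '>' as boundary markers."""
    marked = f"<{word}>"
    return [
        marked[i:i + n]
        for n in range(n_min, n_max + 1)
        for i in range(len(marked) - n + 1)
    ]


# A misspelled or unseen token still shares many n-grams with known words,
# which is why subword features are attractive for noisy text such as tweets.
print(char_ngrams("paris", 3, 4))
print(sorted(set(char_ngrams("paris")) & set(char_ngrams("pariss"))))
```

In an embedding model of this kind, a word vector is typically built from the vectors of such n-grams, so out-of-vocabulary tokens still receive a representation.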
A draft genome sequence and functional screen reveals the repertoire of type III secreted proteins of Pseudomonas syringae pathovar tabaci 11528
Background: Pseudomonas syringae is a widespread bacterial pathogen that causes disease on a broad range of economically important plant species. Pathogenicity of P. syringae strains is dependent on the type III secretion system, which secretes a suite of up to about thirty virulence 'effector' proteins into the host cytoplasm where they subvert the eukaryotic cell physiology and disrupt host defences. P. syringae pathovar tabaci naturally causes disease on wild tobacco, the model member of the Solanaceae, a family that includes many crop species, as well as on soybean.
Results: We used the 'next-generation' Illumina sequencing platform and the Velvet short-read assembly program to generate a 145X deep 6,077,921 nucleotide draft genome sequence for P. syringae pathovar tabaci strain 11528. From our draft assembly, we predicted 5,300 potential genes encoding proteins at least 100 amino acids long, of which 303 (5.72%) had no significant sequence similarity to those encoded by the three previously fully sequenced P. syringae genomes. Of the core set of Hrp Outer Proteins that are conserved in three previously fully sequenced P. syringae strains, most were also conserved in strain 11528, including AvrE1, HopAH2, HopAJ2, HopAK1, HopAN1, HopI, HopJ1, HopX1, HrpK1 and HrpW1. However, the hrpZ1 gene is partially deleted and hopAF1 is completely absent in 11528. The draft genome of strain 11528 also encodes close homologues of HopO1, HopT1, HopAH1, HopR1, HopV1, HopAG1, HopAS1, HopAE1, HopAR1, HopF1, and HopW1 and a degenerate HopM1'. Using a functional screen, we confirmed that hopO1, hopT1, hopAH1, hopM1', hopAE1, hopAR1, and hopAI1' are part of the virulence-associated HrpL regulon, though the hopAI1' and hopM1' sequences were degenerate with premature stop codons. We also discovered two additional HrpL-regulated effector candidates and an HrpL-regulated distant homologue of avrPto1.
Conclusion: The draft genome sequence facilitates the continued development of P. syringae pathovar tabaci on wild tobacco as an attractive model system for studying bacterial disease on plants. The catalogue of effectors sheds further light on the evolution of pathogenicity and host-specificity as well as providing a set of molecular tools for the study of plant defence mechanisms. We also discovered several large genomic regions in Pta 11528 that do not share detectable nucleotide sequence similarity with previously sequenced Pseudomonas genomes. These regions may include horizontally acquired islands that possibly contribute to pathogenicity or epiphytic fitness of Pta 11528.
An improved, high-quality draft genome sequence of the Germination-Arrest Factor-producing Pseudomonas fluorescens WH6
Background: Pseudomonas fluorescens is a genetically and physiologically diverse species of bacteria present in many habitats and in association with plants. This species of bacteria produces a large array of secondary metabolites with potential as natural products. P. fluorescens isolate WH6 produces Germination-Arrest Factor (GAF), a predicted small peptide or amino acid analog with herbicidal activity that specifically inhibits germination of seeds of graminaceous species.
Results: We used a hybrid next-generation sequencing approach to develop a high-quality draft genome sequence for P. fluorescens WH6. We employed automated, manual, and experimental methods to further improve the draft genome sequence. From this assembly of 6.27 megabases, we predicted 5876 genes, of which 3115 were core to P. fluorescens and 1567 were unique to WH6. Comparative genomic studies of WH6 revealed high similarity in synteny and orthology of genes with P. fluorescens SBW25. A phylogenomic study also placed WH6 in the same lineage as SBW25. In a previous non-saturating mutagenesis screen we identified two genes necessary for GAF activity in WH6. Mapping of their flanking sequences revealed genes that encode a candidate anti-sigma factor and an aminotransferase. Finally, we discovered several candidate virulence and host-association mechanisms, one of which appears to be a complete type III secretion system.
Conclusions: The improved high-quality draft genome sequence of WH6 contributes towards resolving the P. fluorescens species, providing additional impetus for establishing two separate lineages in P. fluorescens. Despite the high levels of orthology and synteny to SBW25, WH6 still had a substantial number of unique genes and represents another source for the discovery of genes with implications in affecting plant growth and health. Two genes are demonstrably necessary for GAF, and further characterization of their proteins is important for developing natural products as control measures against grassy weeds. Finally, WH6 is the first isolate of P. fluorescens reported to encode a complete T3SS. This gives us the opportunity to explore the role of what has traditionally been thought of as a virulence mechanism for non-pathogenic interactions with plants.
Recurrent mutualism breakdown events in a legume rhizobia metapopulation.
Bacterial mutualists generate major fitness benefits for eukaryotes, reshaping the host phenotype and its interactions with the environment. Yet, microbial mutualist populations are predicted to generate mutants that defect from providing costly services to hosts while maintaining the capacity to exploit host resources. Here, we examined the mutualist service of symbiotic nitrogen fixation in a metapopulation of root-nodulating Bradyrhizobium spp. that associate with the native legume Acmispon strigosus. We quantified mutualism traits of 85 Bradyrhizobium isolates gathered from a 700 km transect in California spanning 10 sampled A. strigosus populations. We clonally inoculated each Bradyrhizobium isolate onto A. strigosus hosts and quantified nodulation capacity and net effects of infection, including host growth and isotopic nitrogen concentration. Six Bradyrhizobium isolates from five populations were categorized as ineffective because they formed nodules but did not enhance host growth via nitrogen fixation. Six additional isolates from three populations failed to form root nodules. Phylogenetic reconstruction inferred two types of mutualism breakdown, including three to four independent losses of effectiveness and five losses of nodulation capacity on A. strigosus. The evolutionary and genomic drivers of these mutualism breakdown events remain poorly understood.
Length Bias Correction in Gene Ontology Enrichment Analysis Using Logistic Regression
When assessing differential gene expression from RNA sequencing data, commonly used statistical tests tend to have greater power to detect differential expression of genes encoding longer transcripts. This phenomenon, called "length bias", will influence subsequent analyses such as Gene Ontology enrichment analysis. In the presence of length bias, Gene Ontology categories that include longer genes are more likely to be identified as enriched. These categories, however, are not necessarily biologically more relevant. We show that one can effectively adjust for length bias in Gene Ontology analysis by including transcript length as a covariate in a logistic regression model. The logistic regression model makes the statistical issue underlying length bias more transparent: transcript length becomes a confounding factor when it correlates with both the Gene Ontology membership and the significance of the differential expression test. The inclusion of the transcript length as a covariate allows one to investigate the direct correlation between the Gene Ontology membership and the significance of testing differential expression, conditional on the transcript length. We present both real and simulated data examples to show that the logistic regression approach is simple, effective, and flexible.
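The confounding argument in this abstract can be illustrated with a small simulation. The sketch below is not the authors' implementation: it assumes, for illustration only, that category membership is the binary response and that a differential expression score and standardized log transcript length are the predictors. All names (`fit_logistic`, `de_score`, `log_length`) and all simulation parameters are hypothetical. Membership is simulated to depend on length alone, so any association with the score is purely length bias.

```python
import math
import random


def solve(a, b):
    """Solve a small linear system a x = b by Gauss-Jordan elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_logistic(rows, y, n_iter=10):
    """Fit logistic regression by Newton-Raphson; each row must start with 1.0 (intercept)."""
    k = len(rows[0])
    beta = [0.0] * k
    for _ in range(n_iter):
        grad = [0.0] * k
        hess = [[0.0] * k for _ in range(k)]
        for x, yi in zip(rows, y):
            p = 1.0 / (1.0 + math.exp(-sum(b * xi for b, xi in zip(beta, x))))
            w = p * (1.0 - p)
            for i in range(k):
                grad[i] += x[i] * (yi - p)
                for j in range(k):
                    hess[i][j] += w * x[i] * x[j]
        beta = [b + s for b, s in zip(beta, solve(hess, grad))]
    return beta


rng = random.Random(0)
n_genes = 10000
# Simulated data (assumption): longer transcripts get larger DE scores and are
# more often in the category, but the score has no direct effect on membership.
log_length = [rng.gauss(0.0, 1.0) for _ in range(n_genes)]
de_score = [0.8 * l + rng.gauss(0.0, 1.0) for l in log_length]
in_category = [
    1.0 if rng.random() < 1.0 / (1.0 + math.exp(2.0 - l)) else 0.0
    for l in log_length
]

unadjusted = fit_logistic([[1.0, s] for s in de_score], in_category)
adjusted = fit_logistic(
    [[1.0, s, l] for s, l in zip(de_score, log_length)], in_category
)
print("DE-score coefficient without length:", round(unadjusted[1], 3))
print("DE-score coefficient with length as covariate:", round(adjusted[1], 3))
```

Without the length covariate the score coefficient is clearly positive, which would read as enrichment; with length included it shrinks toward zero, because the apparent association was carried entirely by transcript length.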
RNA-Seq analysis of resistant and susceptible potato varieties during the early stages of potato virus Y infection
Background: Potato virus Y (PVY) is one of the most important plant viruses affecting potato production. The
interactions between potato and PVY are complex and the outcome of the interactions depends on the potato
genotype, the PVY strain, and the environmental conditions. A potato cultivar can induce resistance to a specific
PVY strain, yet be susceptible to another. How a single potato cultivar responds to PVY in both compatible and
incompatible interactions is not clear.
Results: In this study, we used RNA-sequencing (RNA-Seq) to investigate and compare the transcriptional changes
in leaves of potato upon inoculation with PVY. We used two potato varieties: Premier Russet, which is resistant to
the PVY strain O (PVYᴼ) but susceptible to the strain NTN (PVYᴺᵀᴺ), and Russet Burbank, which is susceptible to all
PVY strains that have been tested. Leaves were inoculated with PVYᴼ or PVYᴺᵀᴺ, and samples were collected 4 and
10 h post inoculation (hpi). A larger number of differentially expressed (DE) genes were found in the compatible
reactions compared to the incompatible reaction. For all treatments, the majority of DE genes were down-regulated
at 4 hpi and up-regulated at 10 hpi. Gene Ontology enrichment analysis showed enrichment of the biological
process GO term “Photosynthesis, light harvesting” specifically in PVYᴼ-inoculated Premier Russet leaves, while
the GO term “nucleosome assembly” was largely overrepresented in PVYᴺᵀᴺ-inoculated Premier Russet leaves and
PVYᴼ-inoculated Russet Burbank leaves but not in PVYᴼ-inoculated Premier Russet leaves. Fewer genes were DE
over 4-fold in the incompatible reaction compared to the compatible reactions. Amongst these, five genes were
DE only in PVYᴼ-inoculated Premier Russet leaves, and all five were down-regulated. These genes are predicted to
encode a putative ABC transporter, a MYC2 transcription factor, a VQ-motif containing protein, a non-specific
lipid-transfer protein, and a xyloglucan endotransglucosylase-hydroxylase.
Conclusions: Our results show that the incompatible and compatible reactions in Premier Russet shared more
similarities, in particular during the initial response, than the compatible reactions in the two different hosts. Our
results identify potential key processes and genes that determine the fate of the reaction, compatible or
incompatible, between PVY and its host.
A draft genome sequence and functional screen reveals the repertoire of type III secreted proteins of Pseudomonas syringae pathovar tabaci 11528 [Correction]
Correction of previous article
The Rare Codon AGA Is Involved in Regulation of Pyoluteorin Biosynthesis in Pseudomonas protegens Pf-5
The soil bacterium Pseudomonas protegens Pf-5 can colonize root and seed surfaces of many plants, protecting them from infection by plant pathogenic fungi and oomycetes. The capacity to suppress disease is attributed to Pf-5's production of a large spectrum of antibiotics, which is controlled by complex regulatory circuits operating at the transcriptional and post-transcriptional levels. In this study, we analyzed the genomic sequence of Pf-5 for codon usage patterns and observed that the six rarest codons in the genome are present in all seven known antibiotic biosynthesis gene clusters. In particular, there is an abundance of rare codons in pltR, which encodes a member of the LysR transcriptional regulator family that controls the expression of pyoluteorin biosynthetic genes. To test the hypothesis that rare codons in pltR influence pyoluteorin production, we generated a derivative of Pf-5 in which 23 types of rare codons in pltR were substituted with synonymous preferred codons. The resultant mutant produced pyoluteorin at levels 15 times higher than those of the wild-type Pf-5. Accordingly, the promoter activity of the pyoluteorin biosynthetic gene pltL was 20 times higher in the codon-modified strain than in the wild type. pltR contains six copies of AGA, the rarest codon in the Pf-5 genome. Substitution of all six AGA codons with preferred Arg codons resulted in a variant of pltR that conferred increased pyoluteorin production and pltL promoter activity. Furthermore, overexpression of tRNA-Arg(UCU), the cognate tRNA for the AGA codon, significantly increased pyoluteorin production by Pf-5. A bias in codon usage has been linked to the regulation of many phenotypes in eukaryotes and prokaryotes but, to our knowledge, this is the first example of the role of a rare codon in the regulation of antibiotic production by a Gram-negative bacterium.
Keywords: Pseudomonas protegens, pyoluteorin, rare codon, AGA codon, regulation
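The two operations this abstract relies on, tallying codon usage and swapping a rare codon for a synonymous one, can be sketched in a few lines. This is an illustration only: the sequence `toy_cds` is made up and is not pltR, the function names are hypothetical, and CGC is chosen here simply as another arginine codon, since the abstract does not say which preferred Arg codons were used.

```python
from collections import Counter


def codons(cds):
    """Split a coding sequence into in-frame codons, dropping any trailing partial codon."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    return [cds[i:i + 3] for i in range(0, usable, 3)]


def codon_counts(cds):
    """Count how often each in-frame codon occurs."""
    return Counter(codons(cds))


def recode(cds, rare, preferred):
    """Replace every in-frame occurrence of `rare` with the synonymous codon `preferred`."""
    return "".join(preferred if c == rare else c for c in codons(cds))


# Made-up sequence: ATG AGA CGC AAG ACC AGA TAA.
# Note the out-of-frame "AGA" spanning AAG|ACC, which must not be touched.
toy_cds = "ATGAGACGCAAGACCAGATAA"
print(codon_counts(toy_cds).most_common())
print(recode(toy_cds, "AGA", "CGC"))
```

Working codon by codon rather than with a plain string replace keeps the reading frame intact, so the encoded protein is unchanged.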
Higher order asymptotics for negative binomial regression inferences from RNA-sequencing data
RNA sequencing (RNA-Seq) is the current method of choice for characterizing transcriptomes and
quantifying gene expression changes. This next generation sequencing-based method provides unprecedented
depth and resolution. The negative binomial (NB) probability distribution has been shown to be a
useful model for frequencies of mapped RNA-Seq reads and consequently provides a basis for statistical analysis
of gene expression. Negative binomial exact tests are available for two-group comparisons but do not
extend to negative binomial regression analysis, which is important for examining gene expression as a function
of explanatory variables and for adjusted group comparisons accounting for other factors. We address
the adequacy of available large-sample tests for the small sample sizes typically available from RNA-Seq
studies and consider a higher-order asymptotic (HOA) adjustment to likelihood ratio tests. We demonstrate
that 1) the HOA-adjusted likelihood ratio test is practically indistinguishable from the exact test in situations
where the exact test is available, 2) the type I error of the HOA test matches the nominal specification in
regression settings we examined via simulation, and 3) the power of the likelihood ratio test does not appear
to be affected by the HOA adjustment. This work helps clarify the accuracy of the unadjusted likelihood ratio
test and the degree of improvement available with the HOA adjustment. Furthermore, the HOA test may be
preferable even when the exact test is available because it does not require ad hoc library size adjustments.
Keywords: Regression, RNA-Seq, Overdispersion, Extra-Poisson variation, Negative binomial, Higher-order asymptotics
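For orientation, the unadjusted large-sample likelihood ratio test that this abstract takes as its starting point can be sketched for the simplest case, a two-group comparison of one gene. The sketch makes strong simplifying assumptions that are not taken from the paper: the dispersion is treated as known, library sizes are equal, and each group has at least one nonzero count. It does not implement the HOA adjustment, and the function names are hypothetical.

```python
import math


def nb_loglik(counts, mu, dispersion):
    """Negative binomial log-likelihood with mean `mu` and variance mu + dispersion * mu**2."""
    size = 1.0 / dispersion
    total = 0.0
    for y in counts:
        total += (
            math.lgamma(y + size) - math.lgamma(size) - math.lgamma(y + 1)
            + size * math.log(size / (size + mu))
            + y * math.log(mu / (size + mu))
        )
    return total


def nb_lrt_two_group(group_a, group_b, dispersion):
    """Likelihood ratio statistic and chi-square (1 df) p-value for equal group means."""
    def mean(xs):
        return sum(xs) / len(xs)

    pooled = list(group_a) + list(group_b)
    # With a fixed dispersion, the maximum likelihood estimate of the mean is the sample mean.
    ll_alt = (nb_loglik(group_a, mean(group_a), dispersion)
              + nb_loglik(group_b, mean(group_b), dispersion))
    ll_null = nb_loglik(pooled, mean(pooled), dispersion)
    stat = max(2.0 * (ll_alt - ll_null), 0.0)  # guard against rounding below zero
    p_value = math.erfc(math.sqrt(stat / 2.0))  # upper tail of chi-square with 1 df
    return stat, p_value


stat, p_value = nb_lrt_two_group([10, 12, 9], [40, 35, 50], dispersion=0.1)
print(f"LR statistic = {stat:.2f}, p = {p_value:.2g}")
```

The chi-square reference distribution used here is a large-sample approximation, so with three replicates per group the p-value should be read with caution; that small-sample accuracy is the question the abstract addresses.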
Alternative Splicing in the Obligate Biotrophic Oomycete Pathogen Pseudoperonospora cubensis
Pseudoperonospora cubensis is an obligate pathogen and
causative agent of cucurbit downy mildew. To help advance
our understanding of the pathogenicity of P. cubensis, we
used RNA-Seq to improve the quality of its reference
genome sequence. We also characterized the RNA-Seq
dataset to inventory transcript isoforms and infer alternative
splicing during different stages of its development. Almost
half of the original gene annotations were improved
and nearly 4,000 previously unannotated genes were identified.
We also demonstrated that approximately 24% of
the expressed genome and nearly 55% of the intron-containing
genes from P. cubensis had evidence for alternative
splicing. Our analyses revealed that intron retention is the
predominant alternative splicing type in P. cubensis, with
alternative 5′- and alternative 3′-splice sites occurring at
lower frequencies. Representatives of the newly identified
genes and predicted alternatively spliced transcripts were
experimentally validated. The results presented herein
highlight the utility of RNA-Seq for improving draft genome
annotations and, through this approach, we demonstrate
that alternative splicing occurs more frequently than
previously predicted. In total, the current study provides
evidence that alternative splicing plays a key role in transcriptome
regulation and proteome diversification in plant-pathogenic
oomycetes.