Synapse at CAp 2017 NER challenge: Fasttext CRF

Author: Alexandra J. Weisberg (4234153)
Briana S. Bullington (4234156)
Eric R. Moore (4234150)
Jeff Chang (228277)
Kimberly H. Halsey (208487)
Yuan Jiang (296541)
Publication date: 01/01/2017

We present our system for the CAp 2017 NER challenge, which concerns named entity recognition on French tweets. Our system leverages unsupervised learning on a larger dataset of French tweets to learn features that feed a CRF model. It was ranked first without using any gazetteer or structured external data, with an F-measure of 58.89%. To the best of our knowledge, it is the first system to use fasttext embeddings (which include subword representations) and an embedding-based sentence representation for NER.
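The subword representations mentioned in this abstract can be illustrated with a minimal sketch of fastText-style character n-gram extraction. The function name `char_ngrams` is hypothetical, and the default range of 3 to 6 characters is the fastText library default rather than a setting reported in the abstract; the `<` and `>` boundary markers follow the fastText convention.

```python
def char_ngrams(word, min_n=3, max_n=6):
    """Return the character n-grams of a word, fastText style.

    Assumption: this mirrors only the n-gram extraction step; fastText
    additionally keeps the whole marked word as its own unit and hashes
    n-grams into a fixed number of buckets.
    """
    marked = f"<{word}>"  # boundary markers distinguish prefixes from suffixes
    ngrams = []
    for n in range(min_n, max_n + 1):
        for start in range(len(marked) - n + 1):
            ngrams.append(marked[start:start + n])
    return ngrams


print(char_ngrams("where", 3, 3))
```

Because a word vector is built from the vectors of its n-grams, a misspelled or previously unseen token in a tweet still receives a representation from the n-grams it shares with known words.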

arXiv.org e-Print Archive

Directory of Open Access Journals

FigShare

A draft genome sequence and functional screen reveals the repertoire of type III secreted proteins of Pseudomonas syringae pathovar tabaci 11528

Author: Chang Jeff H
Dangl Jeffery L
Ibanez Selena Gimenez
MacLean Daniel
Rathjen John P
Studholme David J
Publication venue: BioMed Central
Publication date: 01/01/2009

Background: Pseudomonas syringae is a widespread bacterial pathogen that causes disease on a broad range of economically important plant species. Pathogenicity of P. syringae strains is dependent on the type III secretion system, which secretes a suite of up to about thirty virulence 'effector' proteins into the host cytoplasm where they subvert the eukaryotic cell physiology and disrupt host defences. P. syringae pathovar tabaci naturally causes disease on wild tobacco, the model member of the Solanaceae, a family that includes many crop species, as well as on soybean. Results: We used the 'next-generation' Illumina sequencing platform and the Velvet short-read assembly program to generate a 145X deep 6,077,921 nucleotide draft genome sequence for P. syringae pathovar tabaci strain 11528. From our draft assembly, we predicted 5,300 potential genes encoding proteins at least 100 amino acids long, of which 303 (5.72%) had no significant sequence similarity to those encoded by the three previously fully sequenced P. syringae genomes. Of the core set of Hrp Outer Proteins that are conserved in three previously fully sequenced P. syringae strains, most were also conserved in strain 11528, including AvrE1, HopAH2, HopAJ2, HopAK1, HopAN1, HopI, HopJ1, HopX1, HrpK1 and HrpW1. However, the hrpZ1 gene is partially deleted and hopAF1 is completely absent in 11528. The draft genome of strain 11528 also encodes close homologues of HopO1, HopT1, HopAH1, HopR1, HopV1, HopAG1, HopAS1, HopAE1, HopAR1, HopF1, and HopW1 and a degenerate HopM1'. Using a functional screen, we confirmed that hopO1, hopT1, hopAH1, hopM1', hopAE1, hopAR1, and hopAI1' are part of the virulence-associated HrpL regulon, though the hopAI1' and hopM1' sequences were degenerate with premature stop codons. We also discovered two additional HrpL-regulated effector candidates and an HrpL-regulated distant homologue of avrPto1. Conclusion: The draft genome sequence facilitates the continued development of P. syringae pathovar tabaci on wild tobacco as an attractive model system for studying bacterial disease on plants. The catalogue of effectors sheds further light on the evolution of pathogenicity and host-specificity as well as providing a set of molecular tools for the study of plant defence mechanisms. We also discovered several large genomic regions in Pta 11528 that do not share detectable nucleotide sequence similarity with previously sequenced Pseudomonas genomes. These regions may include horizontally acquired islands that possibly contribute to pathogenicity or epiphytic fitness of Pta 11528.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Carolina Digital Repository

Open Research Exeter

The Australian National University

An improved, high-quality draft genome sequence of the Germination-Arrest Factor-producing Pseudomonas fluorescens WH6

Author: Armstrong Donald J
Banowetz Gary M
Chang Jeff H
Creason Allison L
Givan Scott A
Halgren Anne B
Kimbrel Jeffrey A
Mills Dallice I
Publication venue: BioMed Central
Publication date: 01/01/2010

Background: Pseudomonas fluorescens is a genetically and physiologically diverse species of bacteria present in many habitats and in association with plants. This species of bacteria produces a large array of secondary metabolites with potential as natural products. P. fluorescens isolate WH6 produces Germination-Arrest Factor (GAF), a predicted small peptide or amino acid analog with herbicidal activity that specifically inhibits germination of seeds of graminaceous species. Results: We used a hybrid next-generation sequencing approach to develop a high-quality draft genome sequence for P. fluorescens WH6. We employed automated, manual, and experimental methods to further improve the draft genome sequence. From this assembly of 6.27 megabases, we predicted 5876 genes, of which 3115 were core to P. fluorescens and 1567 were unique to WH6. Comparative genomic studies of WH6 revealed high similarity in synteny and orthology of genes with P. fluorescens SBW25. A phylogenomic study also placed WH6 in the same lineage as SBW25. In a previous non-saturating mutagenesis screen we identified two genes necessary for GAF activity in WH6. Mapping of their flanking sequences revealed genes that encode a candidate anti-sigma factor and an aminotransferase. Finally, we discovered several candidate virulence and host-association mechanisms, one of which appears to be a complete type III secretion system. Conclusions: The improved high-quality draft genome sequence of WH6 contributes towards resolving the P. fluorescens species, providing additional impetus for establishing two separate lineages in P. fluorescens. Despite the high levels of orthology and synteny to SBW25, WH6 still had a substantial number of unique genes and represents another source for the discovery of genes with implications in affecting plant growth and health. Two genes are demonstrably necessary for GAF and further characterization of their proteins is important for developing natural products as control measures against grassy weeds. Finally, WH6 is the first isolate of P. fluorescens reported to encode a complete T3SS. This gives us the opportunity to explore the role of what has traditionally been thought of as a virulence mechanism for non-pathogenic interactions with plants.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Recommended from our members

Recurrent mutualism breakdown events in a legume rhizobia metapopulation.

Author: Al Moussawi Khadija
Chang Jeff H
Gano-Cohen Kelsey A
Quides Kenjiro W
Sachs Joel L
Stokes Peter J
Weisberg Alexandra J
Wendlandt Camille E
Publication venue: eScholarship, University of California
Publication date: 01/01/2020

Bacterial mutualists generate major fitness benefits for eukaryotes, reshaping the host phenotype and its interactions with the environment. Yet, microbial mutualist populations are predicted to generate mutants that defect from providing costly services to hosts while maintaining the capacity to exploit host resources. Here, we examined the mutualist service of symbiotic nitrogen fixation in a metapopulation of root-nodulating Bradyrhizobium spp. that associate with the native legume Acmispon strigosus. We quantified mutualism traits of 85 Bradyrhizobium isolates gathered from a 700 km transect in California spanning 10 sampled A. strigosus populations. We clonally inoculated each Bradyrhizobium isolate onto A. strigosus hosts and quantified nodulation capacity and net effects of infection, including host growth and isotopic nitrogen concentration. Six Bradyrhizobium isolates from five populations were categorized as ineffective because they formed nodules but did not enhance host growth via nitrogen fixation. Six additional isolates from three populations failed to form root nodules. Phylogenetic reconstruction inferred two types of mutualism breakdown, including three to four independent losses of effectiveness and five losses of nodulation capacity on A. strigosus. The evolutionary and genomic drivers of these mutualism breakdown events remain poorly understood.

eScholarship - University of California

Length Bias Correction in Gene Ontology Enrichment Analysis Using Logistic Regression

Author: Chang Jeff H.
Cumbie Jason S.
Di Yanming
Emerson Sarah
Mi Gu
Publication venue: Public Library of Science (PLoS)

When assessing differential gene expression from RNA sequencing data, commonly used statistical tests tend to have greater power to detect differential expression of genes encoding longer transcripts. This phenomenon, called "length bias", will influence subsequent analyses such as Gene Ontology enrichment analysis. In the presence of length bias, Gene Ontology categories that include longer genes are more likely to be identified as enriched. These categories, however, are not necessarily biologically more relevant. We show that one can effectively adjust for length bias in Gene Ontology analysis by including transcript length as a covariate in a logistic regression model. The logistic regression model makes the statistical issue underlying length bias more transparent: transcript length becomes a confounding factor when it correlates with both the Gene Ontology membership and the significance of the differential expression test. The inclusion of the transcript length as a covariate allows one to investigate the direct correlation between the Gene Ontology membership and the significance of testing differential expression, conditional on the transcript length. We present both real and simulated data examples to show that the logistic regression approach is simple, effective, and flexible.
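The confounding argument in this abstract can be sketched with a toy example. This is not the authors' implementation: the counts are invented, transcript length is reduced to a short/long indicator, and differential expression is reduced to a yes/no call, all of which are assumptions made to keep the arithmetic transparent. The helper names `solve` and `fit_logistic` are hypothetical. Within each length class the Gene Ontology membership is independent of differential expression, yet pooling the classes produces an apparent enrichment.

```python
import math


def solve(a, b):
    """Solve the small linear system a x = b by Gaussian elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        rest = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - rest) / m[r][r]
    return x


def fit_logistic(rows, n_iter=25):
    """Fit a logistic regression by Newton-Raphson.

    rows: list of (predictors, response, count) for grouped data.
    """
    p = len(rows[0][0])
    beta = [0.0] * p
    for _ in range(n_iter):
        grad = [0.0] * p
        info = [[0.0] * p for _ in range(p)]  # Fisher information matrix
        for x, y, w in rows:
            eta = sum(bj * xj for bj, xj in zip(beta, x))
            mu = 1.0 / (1.0 + math.exp(-eta))
            for i in range(p):
                grad[i] += w * (y - mu) * x[i]
                for j in range(p):
                    info[i][j] += w * mu * (1.0 - mu) * x[i] * x[j]
        step = solve(info, grad)
        beta = [bj + sj for bj, sj in zip(beta, step)]
    return beta


# Invented counts: (is_long, is_de, in_go_category, number_of_genes).
cells = [
    (0, 1, 1, 10), (0, 0, 1, 90), (0, 1, 0, 90), (0, 0, 0, 810),
    (1, 1, 1, 200), (1, 0, 1, 200), (1, 1, 0, 300), (1, 0, 0, 300),
]

# Response: GO membership. Predictors: intercept, DE call, (optionally) length.
unadjusted = fit_logistic([([1, de], go, n) for _, de, go, n in cells])
adjusted = fit_logistic([([1, de, lng], go, n) for lng, de, go, n in cells])

print("DE coefficient without length:", round(unadjusted[1], 4))
print("DE coefficient with length:   ", round(adjusted[1], 4))
```

In this constructed data set the coefficient on differential expression is clearly positive when length is ignored and vanishes once length enters the model, which is the pattern the abstract attributes to length bias. A real analysis would presumably use a continuous length covariate and a continuous significance measure.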

ScholarsArchive@OSU

RNA-Seq analysis of resistant and susceptible potato varieties during the early stages of potato virus Y infection

Author: Buchanan Alex
Chang Jeff H.
Crosslin James M.
Goyer Aymeric
Hamlin Launa
Publication venue: Springer Science and Business Media LLC

Background: Potato virus Y (PVY) is one of the most important plant viruses affecting potato production. The interactions between potato and PVY are complex and the outcome of the interactions depends on the potato genotype, the PVY strain, and the environmental conditions. A potato cultivar can induce resistance to a specific PVY strain, yet be susceptible to another. How a single potato cultivar responds to PVY in both compatible and incompatible interactions is not clear. Results: In this study, we used RNA-sequencing (RNA-Seq) to investigate and compare the transcriptional changes in leaves of potato upon inoculation with PVY. We used two potato varieties: Premier Russet, which is resistant to the PVY strain O (PVYᴼ) but susceptible to the strain NTN (PVYᴺᵀᴺ), and Russet Burbank, which is susceptible to all PVY strains that have been tested. Leaves were inoculated with PVYᴼ or PVYᴺᵀᴺ, and samples were collected 4 and 10 h post inoculation (hpi). A larger number of differentially expressed (DE) genes were found in the compatible reactions compared to the incompatible reaction. For all treatments, the majority of DE genes were down-regulated at 4 hpi and up-regulated at 10 hpi. Gene Ontology enrichment analysis showed enrichment of the biological process GO term “Photosynthesis, light harvesting” specifically in PVYᴼ-inoculated Premier Russet leaves, while the GO term “nucleosome assembly” was largely overrepresented in PVYᴺᵀᴺ-inoculated Premier Russet leaves and PVYᴼ-inoculated Russet Burbank leaves but not in PVYᴼ-inoculated Premier Russet leaves. Fewer genes were DE over 4-fold in the incompatible reaction compared to the compatible reactions. Amongst these, five genes were DE only in PVYᴼ-inoculated Premier Russet leaves, and all five were down-regulated. 
These genes are predicted to encode a putative ABC transporter, a MYC2 transcription factor, a VQ-motif containing protein, a non-specific lipid-transfer protein, and a xyloglucan endotransglucosylase-hydroxylase. Conclusions: Our results show that the incompatible and compatible reactions in Premier Russet shared more similarities, in particular during the initial response, than the compatible reactions in the two different hosts. Our results identify potential key processes and genes that determine the fate of the reaction, compatible or incompatible, between PVY and its host.

ScholarsArchive@OSU

A draft genome sequence and functional screen reveals the repertoire of type III secreted proteins of Pseudomonas syringae pathovar tabaci 11528 [Correction]

Author: Chang Jeff H
Dangl Jeffery L.
Ibanez Selena
MacLean Daniel
Rathjen John P.
Studholme David J
Publication venue: BioMed Central Ltd
Publication date: 01/12/2009

Correction of previous article.

Carolina Digital Repository

The Rare Codon AGA Is Involved in Regulation of Pyoluteorin Biosynthesis in Pseudomonas protegens Pf-5

Author: Chang Jeff H.
Hesse Cedar
Kohen Max
Loper Joyce E.
Philmus Benjamin
Yan Qing
Publication venue: Frontiers Media SA

The soil bacterium Pseudomonas protegens Pf-5 can colonize root and seed surfaces of many plants, protecting them from infection by plant pathogenic fungi and oomycetes. The capacity to suppress disease is attributed to Pf-5's production of a large spectrum of antibiotics, which is controlled by complex regulatory circuits operating at the transcriptional and post-transcriptional levels. In this study, we analyzed the genomic sequence of Pf-5 for codon usage patterns and observed that the six rarest codons in the genome are present in all seven known antibiotic biosynthesis gene clusters. In particular, there is an abundance of rare codons in pltR, which encodes a member of the LysR transcriptional regulator family that controls the expression of pyoluteorin biosynthetic genes. To test the hypothesis that rare codons in pltR influence pyoluteorin production, we generated a derivative of Pf-5 in which 23 types of rare codons in pltR were substituted with synonymous preferred codons. The resultant mutant produced pyoluteorin at levels 15 times higher than that of the wild-type Pf-5. Accordingly, the promoter activity of the pyoluteorin biosynthetic gene pltL was 20 times higher in the codon-modified strain than in the wild-type. pltR has six AGA codons; AGA is the rarest codon in the Pf-5 genome. Substitution of all six AGA codons with preferred Arg codons resulted in a variant of pltR that conferred increased pyoluteorin production and pltL promoter activity. Furthermore, overexpression of tRNA-Arg(UCU), the cognate tRNA for the AGA codon, significantly increased pyoluteorin production by Pf-5. A bias in codon usage has been linked to the regulation of many phenotypes in eukaryotes and prokaryotes but, to our knowledge, this is the first example of the role of a rare codon in the regulation of antibiotic production by a Gram-negative bacterium. Keywords: Pseudomonas protegens, pyoluteorin, rare codon, AGA codon, regulation
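The codon usage analysis described in this abstract starts from a tally of in-frame codons. A minimal sketch of that tally follows; the function names `codon_counts` and `rarest_codons` are hypothetical, the two sequences are invented toy strings rather than Pf-5 genes, and the abstract does not say which tools the authors used.

```python
from collections import Counter


def codon_counts(cds):
    """Count in-frame codons in a coding sequence assumed to start in frame."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    return Counter(cds[i:i + 3] for i in range(0, usable, 3))


def rarest_codons(sequences, k=6):
    """Return the k least-used codons observed across the given sequences."""
    total = Counter()
    for cds in sequences:
        total.update(codon_counts(cds))
    # Ties are broken alphabetically so the result is deterministic.
    return sorted(total, key=lambda codon: (total[codon], codon))[:k]


# Invented toy sequences, for illustration only.
toy_genes = ["ATGCGCCGCAGACTG", "ATGCGCCTGCTGCGC"]
print(rarest_codons(toy_genes, k=2))
```

This sketch ranks only the codons that actually occur in the input; a genome-wide analysis would also need to account for codons that never appear and would usually compare synonymous codons for the same amino acid.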

ScholarsArchive@OSU

Higher order asymptotics for negative binomial regression inferences from RNA-sequencing data

Author: Chang Jeff H.
Di Yanming
Emerson Sarah C.
Kimbrel Jeffrey A.
Schafer Daniel W.
Publication venue: Walter de Gruyter GmbH

RNA sequencing (RNA-Seq) is the current method of choice for characterizing transcriptomes and quantifying gene expression changes. This next generation sequencing-based method provides unprecedented depth and resolution. The negative binomial (NB) probability distribution has been shown to be a useful model for frequencies of mapped RNA-Seq reads and consequently provides a basis for statistical analysis of gene expression. Negative binomial exact tests are available for two-group comparisons but do not extend to negative binomial regression analysis, which is important for examining gene expression as a function of explanatory variables and for adjusted group comparisons accounting for other factors. We address the adequacy of available large-sample tests for the small sample sizes typically available from RNA-Seq studies and consider a higher-order asymptotic (HOA) adjustment to likelihood ratio tests. We demonstrate that 1) the HOA-adjusted likelihood ratio test is practically indistinguishable from the exact test in situations where the exact test is available, 2) the type I error of the HOA test matches the nominal specification in regression settings we examined via simulation, and 3) the power of the likelihood ratio test does not appear to be affected by the HOA adjustment. This work helps clarify the accuracy of the unadjusted likelihood ratio test and the degree of improvement available with the HOA adjustment. Furthermore, the HOA test may be preferable even when the exact test is available because it does not require ad hoc library size adjustments. Keywords: Regression, RNA-Seq, Overdispersion, Extra-Poisson variation, Negative binomial, Higher-order asymptotics

ScholarsArchive@OSU

Alternative Splicing in the Obligate Biotrophic Oomycete Pathogen Pseudoperonospora cubensis

Author: Buchanan Alex
Burkhardt Alyssa
Chang Jeff H.
Cumbie Jason S.
Day Brad
Savory Elizabeth A.
Publication venue: American Phytopathological Society

Pseudoperonospora cubensis is an obligate pathogen and causative agent of cucurbit downy mildew. To help advance our understanding of the pathogenicity of P. cubensis, we used RNA-Seq to improve the quality of its reference genome sequence. We also characterized the RNA-Seq dataset to inventory transcript isoforms and infer alternative splicing during different stages of its development. Almost half of the original gene annotations were improved and nearly 4,000 previously unannotated genes were identified. We also demonstrated that approximately 24% of the expressed genome and nearly 55% of the intron-containing genes from P. cubensis had evidence for alternative splicing. Our analyses revealed that intron retention is the predominant alternative splicing type in P. cubensis, with alternative 5′- and alternative 3′-splice sites occurring at lower frequencies. Representatives of the newly identified genes and predicted alternatively spliced transcripts were experimentally validated. The results presented herein highlight the utility of RNA-Seq for improving draft genome annotations and, through this approach, we demonstrate that alternative splicing occurs more frequently than previously predicted. In total, the current study provides evidence that alternative splicing plays a key role in transcriptome regulation and proteome diversification in plant-pathogenic oomycetes.

ScholarsArchive@OSU