    Letter to Sound Rules for Accented Lexicon Compression

    This paper presents trainable methods for generating letter to sound rules from a given lexicon, for use in pronouncing out-of-vocabulary words and as a method for lexicon compression. As the relationship between a string of letters and a string of phonemes representing its pronunciation is not trivial for many languages, we discuss two alignment procedures, one fully automatic and one hand-seeded, which produce reasonable alignments of letters to phones. Top Down Induction Tree models are trained on the aligned entries. We show how combined phoneme/stress prediction is better than separate prediction processes, and still better when including in the model the last phonemes transcribed and part of speech information. For the lexicons we have tested, our models have a word accuracy (including stress) of 78% for OALD, 62% for CMU and 94% for BRULEX. The extremely high scores on the training sets allow substantial size reductions (more than 1/20). WWW site: http://tcts.fpms.ac.be/synthesis/mbrdico Comment: 4 pages, 1 figure.
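
    The lexicon-compression idea in this abstract can be illustrated with a minimal sketch: store only the entries that a letter-to-sound model gets wrong, and regenerate the rest from the model. The predictor below (`toy_predict`) is a hypothetical per-letter lookup standing in for the trained tree models, and the phone symbols and word list are made up for illustration.

```python
def toy_predict(word):
    """Hypothetical stand-in for a trained letter-to-sound model.

    Maps each letter to one phone symbol; the paper's models are decision
    trees with context, which this sketch does not attempt to reproduce.
    """
    table = {"a": "AE", "b": "B", "c": "K", "e": "EH",
             "i": "IH", "k": "K", "t": "T"}
    return tuple(table.get(ch, "?") for ch in word)


def compress(lexicon, predict):
    """Keep only entries whose stored pronunciation differs from the prediction."""
    return {w: p for w, p in lexicon.items() if predict(w) != p}


def lookup(word, exceptions, predict):
    """Exceptions take priority; every other word is regenerated by the model."""
    return exceptions.get(word, predict(word))


# Illustrative lexicon (assumed data, not taken from OALD, CMU or BRULEX).
lexicon = {
    "cat": ("K", "AE", "T"),
    "bat": ("B", "AE", "T"),
    "kit": ("K", "IH", "T"),
    "bite": ("B", "AY", "T"),  # irregular for the toy model
}
exceptions = compress(lexicon, toy_predict)
```

    In this toy setting only the irregular entry is kept, and `lookup` still returns the stored pronunciation for every word in the original lexicon.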

    A Seeded Genetic Algorithm for RNA Secondary Structural Prediction with Pseudoknots

    This work explores a new approach to using a genetic algorithm to predict RNA secondary structures with pseudoknots. Since only a small portion of most RNA structures is comprised of pseudoknots, the majority of structural elements from an optimal pseudoknot-free structure are likely to be part of the true structure. Thus seeding the genetic algorithm with optimal pseudoknot-free structures will more likely lead it to the true structure than a randomly generated population. The genetic algorithm uses the known energy models with an additional augmentation to allow complex pseudoknots. The nearest-neighbor energy model is used in conjunction with Turner’s thermodynamic parameters for pseudoknot-free structures, and the H-type pseudoknot energy estimation for simple pseudoknots. Testing with known pseudoknot sequences from PseudoBase shows that it outperforms some of the current popular algorithms.
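
    The seeding step described in this abstract might look roughly like the sketch below. It assumes a structure is represented as a set of `(i, j)` base pairs and that the pseudoknot-free seed comes from an external folding step that is not implemented here. The function names, the `drop_prob` parameter and the example pairs are illustrative assumptions, not the authors' implementation, and no energy model is included.

```python
import random


def seeded_population(seed_pairs, size, rng, drop_prob=0.2):
    """Build an initial GA population from one pseudoknot-free structure.

    The first individual is the seed itself. The others are copies with some
    pairs randomly removed, which leaves room for later GA operators to add
    new (possibly crossing) pairs.
    """
    population = [frozenset(seed_pairs)]
    while len(population) < size:
        kept = frozenset(p for p in seed_pairs if rng.random() >= drop_prob)
        population.append(kept)
    return population


def crosses(p, q):
    """True if base pairs p and q interleave (i < k < j < l), i.e. form a pseudoknot."""
    (i, j), (k, l) = sorted([p, q])
    return i < k < j < l


rng = random.Random(0)
seed = {(0, 9), (1, 8), (2, 7)}  # assumed nested (pseudoknot-free) stem
population = seeded_population(seed, size=5, rng=rng)
```

    Every individual starts as a subset of the seed structure, so the search begins near the pseudoknot-free optimum rather than from random structures. A check such as `crosses` would be needed wherever the energy function treats pseudoknotted pairs differently.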

    Parallel Treebanks in Phrase-Based Statistical Machine Translation

    Given much recent discussion and the shift in focus of the field, it is becoming apparent that the incorporation of syntax is the way forward for the current state-of-the-art in machine translation (MT). Parallel treebanks are a relatively recent innovation and appear to be ideal candidates for MT training material. However, until recently there has been no other means to build them than by hand. In this paper, we describe how we make use of new tools to automatically build a large parallel treebank and extract a set of linguistically motivated phrase pairs from it. We show that adding these phrase pairs to the translation model of a baseline phrase-based statistical MT (PBSMT) system leads to significant improvements in translation quality. We describe further experiments on incorporating parallel treebank information into PBSMT, such as word alignments. We investigate the conditions under which the incorporation of parallel treebank data performs optimally. Finally, we discuss the potential of parallel treebanks in other paradigms of MT.

    Seeded Graph Matching via Large Neighborhood Statistics

    We study a well known noisy model of the graph isomorphism problem. In this model, the goal is to perfectly recover the vertex correspondence between two edge-correlated Erdős-Rényi random graphs, with an initial seed set of correctly matched vertex pairs revealed as side information. For seeded problems, our result provides a significant improvement over previously known results. We show that it is possible to achieve the information-theoretic limit of graph sparsity in time polynomial in the number of vertices $n$. Moreover, we show the number of seeds needed for exact recovery in polynomial-time can be as low as $n^{3\epsilon}$ in the sparse graph regime (with the average degree smaller than $n^{\epsilon}$) and $\Omega(\log n)$ in the dense graph regime. Our results also shed light on the unseeded problem. In particular, we give sub-exponential time algorithms for sparse models and an $n^{O(\log n)}$ algorithm for dense models for some parameters, including some that are not covered by recent results of Barak et al.
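
    To make the role of seeds concrete, here is a sketch of a much simpler heuristic than the one the abstract describes: it counts 1-hop "witnesses" (seed pairs adjacent to both candidates) and matches greedily by score. It does not use large neighborhood statistics and carries none of the paper's guarantees. The graphs and function names are illustrative assumptions.

```python
def make_adj(edges):
    """Build an undirected adjacency map: vertex -> set of neighbours."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def match_by_seed_witnesses(adj1, adj2, seeds):
    """Extend a seed matching (dict u -> v) using 1-hop witness counts.

    A seed pair (s, t) is a witness for candidate (u, v) if s is a neighbour
    of u in the first graph and t is a neighbour of v in the second.
    """
    used1, used2 = set(seeds), set(seeds.values())
    scores = []
    for u in adj1:
        if u in used1:
            continue
        for v in adj2:
            if v in used2:
                continue
            w = sum(1 for s, t in seeds.items()
                    if s in adj1[u] and t in adj2[v])
            if w > 0:
                scores.append((w, u, v))
    matching = dict(seeds)
    # Greedy: accept the highest-scoring candidates first.
    for w, u, v in sorted(scores, key=lambda item: -item[0]):
        if u not in used1 and v not in used2:
            matching[u] = v
            used1.add(u)
            used2.add(v)
    return matching


# Two copies of the same small graph under the relabelling 0->a, ..., 4->e.
g1 = make_adj([(0, 1), (3, 0), (3, 1), (4, 1), (4, 2)])
g2 = make_adj([("a", "b"), ("d", "a"), ("d", "b"), ("e", "b"), ("e", "c")])
seeds = {0: "a", 1: "b", 2: "c"}
matching = match_by_seed_witnesses(g1, g2, seeds)
```

    Here vertices 3 and 4 are recovered because each shares two witnesses with its true counterpart and only one with the wrong candidate. On noisy, sparse graphs such 1-hop counts become unreliable, which is the regime where larger neighborhoods are needed.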

    Identification and partial characterization of antifungal and antibacterial activities of two Bacillus sp. strains isolated from salt soil in Tunisia

    Two Bacillus sp. strains (B29 and B27) isolated from soil in the South of Tunisia were tested for their abilities to produce antimicrobial compounds. Both strains showed antimicrobial activity against Gram-positive and Gram-negative bacteria, yeasts and fungi. The produced compounds were extracted by using four different solvents. Hexane extraction gave the maximum activity for strain B29. The activity of strain B27 was not elucidated with any of the four solvents used. Bio-autography results of the B29 hexane extract revealed the presence of different antibiotic and antifungal compounds, with Rf values of 0.3 and 0.76 for the antifungal compounds and of 0.12, 0.14, 0.19 and 0.3 for the antibacterial ones. Two active fractions were isolated from the culture broth of strain B29 by semi-preparative high performance liquid chromatography (HPLC). Partial sequencing of the 16S rDNA gene was used to identify the two Bacillus strains. They may be assigned to new Bacillus species.

    Cephalosporinases associated with outer membrane vesicles released by Bacteroides spp. protect gut pathogens and commensals against beta-lactam antibiotics

    Objectives: To identify β-lactamase genes in gut commensal Bacteroides species and to assess the impact of these enzymes, when carried by outer membrane vesicles (OMVs), in protecting enteric pathogens and commensals. Methods: A deletion mutant of the putative class A β-lactamase gene (locus tag BT_4507) found in the genome of the human commensal Bacteroides thetaiotaomicron was constructed and a phenotypic analysis performed. A phylogenetic tree was built from an alignment of nine Bacteroides cephalosporinase protein sequences, using the maximum likelihood method. The rate of cefotaxime degradation after incubation with OMVs produced by different Bacteroides species was quantified using a disc susceptibility test. The resistance of Salmonella Typhimurium and Bifidobacterium breve to cefotaxime in liquid culture in the presence of B. thetaiotaomicron OMVs was evaluated by measuring bacterial growth. Results: The B. thetaiotaomicron BT_4507 gene encodes a β-lactamase related to the CepA cephalosporinase of Bacteroides fragilis. OMVs produced by B. thetaiotaomicron and several other Bacteroides species, except Bacteroides ovatus, carried surface-associated β-lactamases that could degrade cefotaxime. β-Lactamase-harbouring OMVs from B. thetaiotaomicron protected Salmonella Typhimurium and B. breve from an otherwise lethal dose of cefotaxime. Conclusions: The production of membrane vesicles carrying surface-associated β-lactamases by Bacteroides species, which constitute a major part of the human colonic microbiota, may protect commensal bacteria and enteric pathogens, such as Salmonella Typhimurium, against β-lactam antibiotics.

    Pathogen Response Genes Mediate Caenorhabditis elegans Innate Immunity

    Innate immunity is crucial in the response and defense against pathogens for invertebrates and vertebrates alike. The soil nematode Caenorhabditis elegans is a useful model to study the eukaryotic innate immune response to microbial pathogenesis. Prior research indicates that the protein receptor FSHR-1 plays an important role in the innate recognition of intestinal infection due to pathogen consumption. Determining what genes are controlled by FSHR-1 may uncover an unknown pathway that could increase not only the comprehension of the C. elegans immune system but also innate immunity generally. To characterize the function of FSHR-1, four candidate pathogen response genes that appear to be regulated by FSHR-1 were evaluated in worms infected with Pseudomonas aeruginosa. Although intestine-specific RNA interference of these four genes did not show immunity phenotypes, quantitative PCR suggests that FSHR-1 regulates the basal and/or infection-induced expression of three of the four genes. To explore this FSHR-1-dependent transcriptional induction, fluorescent transgenic reporters were constructed for the three candidate FSHR-1 target genes. The spatial expression of one putative pathogen response gene was characterized in transgenic worms under both control and pathogenic conditions. RNA interference was performed to assess the FSHR-1 dependency of this expression pattern.

    From sea to land and beyond: new insights into the evolution of euthyneuran Gastropoda (Mollusca)

    Background: The Euthyneura are considered to be the most successful and diverse group of Gastropoda. Phylogenetically, they are riven with controversy. Previous morphology-based phylogenetic studies have been greatly hampered by rampant parallelism in morphological characters or by incomplete taxon sampling. Based on sequences of nuclear 18S rRNA and 28S rRNA as well as mitochondrial 16S rRNA and COI DNA from 56 taxa, we reconstructed the phylogeny of Euthyneura utilising Maximum Likelihood and Bayesian inference methods. The evolution of colonization of freshwater and terrestrial habitats by pulmonate Euthyneura, considered crucial in the evolution of this group of Gastropoda, is reconstructed with Bayesian approaches. Results: We found several well supported clades within Euthyneura; however, we could not confirm the traditional classification, since Pulmonata are paraphyletic and Opisthobranchia are either polyphyletic or paraphyletic with several clades clearly distinguishable. Sacoglossa appear separately from the rest of the Opisthobranchia as sister taxon to basal Pulmonata. Within Pulmonata, Basommatophora are paraphyletic and Hygrophila and Eupulmonata form monophyletic clades. Pyramidelloidea are placed within Euthyneura rendering the Euthyneura paraphyletic. Conclusion: Based on the current phylogeny, it can be proposed for the first time that invasion of freshwater by Pulmonata is a unique evolutionary event and has taken place directly from the marine environment via an aquatic pathway. The origin of colonisation of terrestrial habitats is seeded in marginal zones and has probably occurred via estuaries or semi-terrestrial habitats such as mangroves.

    Likelihood-based inference of B-cell clonal families

    The human immune system depends on a highly diverse collection of antibody-making B cells. B cell receptor sequence diversity is generated by a random recombination process called "rearrangement" forming progenitor B cells, then a Darwinian process of lineage diversification and selection called "affinity maturation." The resulting receptors can be sequenced in high throughput for research and diagnostics. Such a collection of sequences contains a mixture of various lineages, each of which may be quite numerous, or may consist of only a single member. As a step to understanding the process and result of this diversification, one may wish to reconstruct lineage membership, i.e., to cluster sampled sequences according to which came from the same rearrangement events. We call this clustering problem "clonal family inference." In this paper we describe and validate a likelihood-based framework for clonal family inference based on a multi-hidden Markov Model (multi-HMM) framework for B cell receptor sequences. We describe an agglomerative algorithm to find a maximum likelihood clustering, two approximate algorithms with various trade-offs of speed versus accuracy, and a third, fast algorithm for finding specific lineages. We show that under simulation these algorithms greatly improve upon existing clonal family inference methods, and that they also give significantly different clusters than previous methods when applied to two real data sets.
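
    The agglomerative, likelihood-driven clustering mentioned in this abstract can be sketched as a greedy loop that merges the pair of clusters with the largest log-likelihood gain and stops when no merge helps. The likelihood below is a deliberately crude toy (uniform prior on an ancestor, per-position consensus, fixed mutation probability `MU`) assumed only for illustration. It is not the multi-HMM of the paper, and the reads are invented.

```python
import math
from collections import Counter

MU = 0.05  # assumed per-position mutation probability (toy value)


def log_lik(cluster):
    """Toy log-likelihood of one family of equal-length sequences.

    One unmutated ancestor is drawn uniformly, approximated here by the
    per-position consensus; each member then matches it with probability
    1 - MU per position or mutates to one of three other bases.
    """
    length = len(cluster[0])
    total = length * math.log(0.25)
    for pos in range(length):
        column = [seq[pos] for seq in cluster]
        consensus = Counter(column).most_common(1)[0][0]
        for base in column:
            total += math.log(1 - MU) if base == consensus else math.log(MU / 3)
    return total


def agglomerate(sequences):
    """Greedily merge clusters while some merge raises the total log-likelihood."""
    clusters = [[s] for s in sequences]
    while len(clusters) > 1:
        best_gain, best_pair = 0.0, None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                merged = clusters[a] + clusters[b]
                gain = log_lik(merged) - log_lik(clusters[a]) - log_lik(clusters[b])
                if gain > best_gain:
                    best_gain, best_pair = gain, (a, b)
        if best_pair is None:
            break  # no merge improves the likelihood
        a, b = best_pair
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


reads = ["ACGTACGT", "TTTTGGGG", "ACGTACGA", "TTTTGGGC"]  # invented example
families = agglomerate(reads)
```

    With these reads the loop groups the two near-identical pairs and then stops, because merging the two resulting families would lower the toy likelihood. Each pass recomputes all pairwise gains, which is quadratic in the number of clusters; approximate variants of the kind the abstract mentions would presumably reduce that cost.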