Letter to Sound Rules for Accented Lexicon Compression
This paper presents trainable methods for generating letter to sound rules
from a given lexicon for use in pronouncing out-of-vocabulary words and as a
method for lexicon compression.
As the relationship between a string of letters and a string of phonemes
representing its pronunciation for many languages is not trivial, we discuss
two alignment procedures, one fully automatic and one hand-seeded which produce
reasonable alignments of letters to phones.
Top Down Induction Tree models are trained on the aligned entries. We show
how combined phoneme/stress prediction is better than separate prediction
processes, and still better when including in the model the last phonemes
transcribed and part of speech information. For the lexicons we have tested,
our models have a word accuracy (including stress) of 78% for OALD, 62% for CMU
and 94% for BRULEX. The extremely high scores on the training sets allow
substantial size reductions (more than 1/20).
WWW site: http://tcts.fpms.ac.be/synthesis/mbrdico
Comment: 4 pages, 1 figure
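The compression idea in this abstract can be sketched as follows: if a trained letter-to-sound model reproduces most lexicon entries, only the entries it mispredicts need to be stored explicitly. This is a minimal illustration under stated assumptions. `toy_predict` is a hypothetical per-letter lookup standing in for the paper's decision-tree models, and the four-word lexicon is invented for the example.

```python
from typing import Callable, Dict

def compress_lexicon(lexicon: Dict[str, str],
                     predict: Callable[[str], str]) -> Dict[str, str]:
    """Keep only the entries whose pronunciation the model gets wrong."""
    return {word: pron for word, pron in lexicon.items()
            if predict(word) != pron}

def lookup(word: str, exceptions: Dict[str, str],
           predict: Callable[[str], str]) -> str:
    """Stored exceptions take priority; otherwise fall back to the model."""
    return exceptions.get(word, predict(word))

# Hypothetical toy model: one phone per letter (NOT the paper's method).
TOY_RULES = {"c": "k", "a": "a", "t": "t", "b": "b", "s": "s"}

def toy_predict(word: str) -> str:
    return " ".join(TOY_RULES.get(ch, ch) for ch in word)

# Invented lexicon; "ace" is the only word the toy model mispredicts.
toy_lexicon = {"cat": "k a t", "bat": "b a t", "cab": "k a b", "ace": "ei s"}
exceptions = compress_lexicon(toy_lexicon, toy_predict)
```

Every word in `toy_lexicon` is still recoverable through `lookup`, while only the mispredicted entries are stored. The achievable reduction depends entirely on how accurate the model is on the lexicon it was trained on.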
A Seeded Genetic Algorithm for RNA Secondary Structural Prediction with Pseudoknots
This work explores a new approach to using a genetic algorithm to predict RNA secondary structures with pseudoknots. Since only a small portion of most RNA structures is comprised of pseudoknots, the majority of structural elements from an optimal pseudoknot-free structure are likely to be part of the true structure. Thus seeding the genetic algorithm with optimal pseudoknot-free structures is more likely to lead it to the true structure than starting from a randomly generated population. The genetic algorithm uses the known energy models with an additional augmentation to allow complex pseudoknots. The nearest-neighbor energy model is used in conjunction with Turner’s thermodynamic parameters for pseudoknot-free structures, and the H-type pseudoknot energy estimation for simple pseudoknots. Testing with known pseudoknot sequences from PseudoBase shows that it outperforms some of the current popular algorithms.
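The seeding step described above can be sketched as follows. This is only an illustration of building an initial population from a given pseudoknot-free structure; it is not the authors' implementation. The representation (a structure as a set of index pairs), the `mutate` operator, and the example seed are all assumptions, and the sketch ignores base complementarity and the energy model entirely.

```python
import random
from typing import FrozenSet, Iterable, List, Tuple

Pair = Tuple[int, int]
Structure = FrozenSet[Pair]

def mutate(structure: Structure, seq_len: int, rng: random.Random) -> Structure:
    """Remove one random pair, or add a pair between two unpaired positions."""
    pairs = set(structure)
    paired = {i for p in pairs for i in p}
    free = [i for i in range(seq_len) if i not in paired]
    if pairs and (len(free) < 2 or rng.random() < 0.5):
        pairs.remove(rng.choice(sorted(pairs)))
    elif len(free) >= 2:
        i, j = sorted(rng.sample(free, 2))
        pairs.add((i, j))  # may cross existing pairs, i.e. form a pseudoknot
    return frozenset(pairs)

def seeded_population(seeds: Iterable[Structure], size: int,
                      seq_len: int, rng: random.Random) -> List[Structure]:
    """Start from the seed structures, then fill up with mutated copies."""
    seeds = list(seeds)  # assumed non-empty
    population = seeds[:size]
    while len(population) < size:
        population.append(mutate(rng.choice(seeds), seq_len, rng))
    return population

# Hypothetical seed: a three-pair stem on a sequence of length 12.
rng = random.Random(0)
seed = frozenset({(0, 9), (1, 8), (2, 7)})
population = seeded_population([seed], 6, 12, rng)
```

Because every individual is one mutation away from a seed, the population starts near the pseudoknot-free optimum instead of being spread over random structures.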
Parallel Treebanks in Phrase-Based Statistical Machine Translation
Given much recent discussion and the shift in focus of the field, it is becoming apparent that the incorporation of syntax is the way forward for the current state-of-the-art in machine translation (MT). Parallel treebanks are a relatively recent innovation and appear to be ideal candidates for MT training material. However, until recently there has been no other means to build them than by hand. In this paper, we describe how we make use of new tools to automatically build a large parallel treebank and extract a set of linguistically motivated phrase pairs from it. We show that adding these phrase pairs to the translation model of a baseline phrase-based statistical MT (PBSMT) system leads to significant improvements in translation quality. We describe further experiments on incorporating parallel treebank information into PBSMT, such as word alignments. We investigate the conditions under which the incorporation of parallel treebank data performs optimally. Finally, we discuss the potential of parallel treebanks in other paradigms of MT.
Seeded Graph Matching via Large Neighborhood Statistics
We study a well-known noisy model of the graph isomorphism problem. In this
model, the goal is to perfectly recover the vertex correspondence between two
edge-correlated Erdős-Rényi random graphs, with an initial seed set of
correctly matched vertex pairs revealed as side information. For seeded
problems, our result provides a significant improvement over previously known
results. We show that it is possible to achieve the information-theoretic limit
of graph sparsity in time polynomial in the number of vertices . Moreover,
we show the number of seeds needed for exact recovery in polynomial-time can be
as low as in the sparse graph regime (with the average degree
smaller than ) and in the dense graph regime.
Our results also shed light on the unseeded problem. In particular, we give
sub-exponential time algorithms for sparse models and an
algorithm for dense models for some parameters, including some that are not
covered by recent results of Barak et al.
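To make the seeded setting concrete, here is a minimal sketch of a simple one-hop "witness counting" baseline: an unmatched pair (u, v) scores one point for each seed pair (s1, s2) such that s1 is adjacent to u and s2 is adjacent to v, and pairs are then matched greedily by score. This is a swapped-in simplification for illustration, not the large-neighborhood statistic the paper studies. The function name and the two tiny graphs are assumptions.

```python
from itertools import product
from typing import Dict, Hashable, Set

Graph = Dict[Hashable, Set[Hashable]]

def witness_match(adj1: Graph, adj2: Graph,
                  seeds: Dict[Hashable, Hashable]) -> Dict[Hashable, Hashable]:
    """Extend a seed matching greedily by counting shared seed neighbors."""
    seeded1, seeded2 = set(seeds), set(seeds.values())
    scores = []
    for u, v in product(adj1, adj2):
        if u in seeded1 or v in seeded2:
            continue
        # A seed (s1, s2) is a witness if s1 ~ u in G1 and s2 ~ v in G2.
        w = sum(1 for s1, s2 in seeds.items()
                if s1 in adj1[u] and s2 in adj2[v])
        if w > 0:
            scores.append((w, u, v))
    matching = dict(seeds)
    used2 = set(seeded2)
    # Highest witness count first; the sort is stable, so ties keep input order.
    for w, u, v in sorted(scores, key=lambda t: -t[0]):
        if u not in matching and v not in used2:
            matching[u] = v
            used2.add(v)
    return matching

# Two hypothetical isomorphic graphs with seeds 0 -> "a" and 1 -> "b".
adj1 = {0: {2, 3}, 1: {2, 4}, 2: {0, 1}, 3: {0}, 4: {1}}
adj2 = {"a": {"c", "d"}, "b": {"c", "e"}, "c": {"a", "b"}, "d": {"a"}, "e": {"b"}}
matching = witness_match(adj1, adj2, {0: "a", 1: "b"})
```

In this toy case the two seeds are enough to separate the remaining vertices. With noisy, sparse graphs, one-hop counts become unreliable, which is the kind of limitation that motivates statistics over larger neighborhoods.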
Identification and partial characterization of antifungal and antibacterial activities of two Bacillus sp. strains isolated from salt soil in Tunisia
Two Bacillus sp. strains (B29 and B27) isolated from soil in the South of Tunisia were tested for their ability to produce antimicrobial compounds. Both strains showed antimicrobial activity against Gram-positive and Gram-negative bacteria, yeasts and fungi. The compounds produced were extracted using four different solvents. Hexane gave the highest activity for strain B29. The activity of strain B27 was not recovered by any of the four solvents used. Bioautography of the B29 hexane extract revealed several antibacterial and antifungal compounds, with Rf values of 0.3 and 0.76 for the antifungal compounds and 0.12, 0.14, 0.19 and 0.3 for the antibacterial ones. Two active fractions were isolated from the culture broth of strain B29 by semi-preparative high performance liquid chromatography (HPLC). Partial sequencing of the 16S rDNA gene was used to identify the two Bacillus strains. They may be assigned to new Bacillus species.
Cephalosporinases associated with outer membrane vesicles released by Bacteroides spp. protect gut pathogens and commensals against beta-lactam antibiotics
Objectives: To identify β-lactamase genes in gut commensal Bacteroides species and to assess the impact of these enzymes, when carried by outer membrane vesicles (OMVs), in protecting enteric pathogens and commensals. Methods: A deletion mutant of the putative class A β-lactamase gene (locus tag BT_4507) found in the genome of the human commensal Bacteroides thetaiotaomicron was constructed and a phenotypic analysis performed. A phylogenetic tree was built from an alignment of nine Bacteroides cephalosporinase protein sequences, using the maximum likelihood method. The rate of cefotaxime degradation after incubation with OMVs produced by different Bacteroides species was quantified using a disc susceptibility test. The resistance of Salmonella Typhimurium and Bifidobacterium breve to cefotaxime in liquid culture in the presence of B. thetaiotaomicron OMVs was evaluated by measuring bacterial growth. Results: The B. thetaiotaomicron BT_4507 gene encodes a β-lactamase related to the CepA cephalosporinase of Bacteroides fragilis. OMVs produced by B. thetaiotaomicron and several other Bacteroides species, except Bacteroides ovatus, carried surface-associated β-lactamases that could degrade cefotaxime. β-Lactamase-harbouring OMVs from B. thetaiotaomicron protected Salmonella Typhimurium and B. breve from an otherwise lethal dose of cefotaxime. Conclusions: The production of membrane vesicles carrying surface-associated β-lactamases by Bacteroides species, which constitute a major part of the human colonic microbiota, may protect commensal bacteria and enteric pathogens, such as Salmonella Typhimurium, against β-lactam antibiotics
Pathogen Response Genes Mediate Caenorhabditis elegans Innate Immunity
Innate immunity is crucial in the response and defense against pathogens for invertebrates and vertebrates alike. The soil nematode Caenorhabditis elegans is a useful model to study the eukaryotic innate immune response to microbial pathogenesis. Prior research indicates that the protein receptor FSHR-1 plays an important role in the innate recognition of intestinal infection due to pathogen consumption. Determining what genes are controlled by FSHR-1 may uncover an unknown pathway that could increase not only the comprehension of the C. elegans immune system but also innate immunity generally. To characterize the function of FSHR-1, four candidate pathogen response genes that appear to be regulated by FSHR-1 were evaluated in worms infected with Pseudomonas aeruginosa. Although intestine-specific RNA interference of these four genes did not show immunity phenotypes, quantitative PCR suggests that FSHR-1 regulates the basal and/or infection-induced expression of three of the four genes. To explore this FSHR-1-dependent transcriptional induction, fluorescent transgenic reporters were constructed for the three candidate FSHR-1 target genes. The spatial expression of one putative pathogen response gene was characterized in transgenic worms under both control and pathogenic conditions. RNA interference was performed to assess the FSHR-1 dependency of this expression pattern.
From sea to land and beyond: new insights into the evolution of euthyneuran Gastropoda (Mollusca)
Background: The Euthyneura are considered to be the most successful and diverse group of Gastropoda. Phylogenetically, they are riven with controversy. Previous morphology-based phylogenetic studies have been greatly hampered by rampant parallelism in morphological characters or by incomplete taxon sampling. Based on sequences of nuclear 18S rRNA and 28S rRNA as well as mitochondrial 16S rRNA and COI DNA from 56 taxa, we reconstructed the phylogeny of Euthyneura utilising Maximum Likelihood and Bayesian inference methods. The evolution of colonization of freshwater and terrestrial habitats by pulmonate Euthyneura, considered crucial in the evolution of this group of Gastropoda, is reconstructed with Bayesian approaches. Results: We found several well-supported clades within Euthyneura; however, we could not confirm the traditional classification, since Pulmonata are paraphyletic and Opisthobranchia are either polyphyletic or paraphyletic with several clades clearly distinguishable. Sacoglossa appear separately from the rest of the Opisthobranchia as sister taxon to basal Pulmonata. Within Pulmonata, Basommatophora are paraphyletic and Hygrophila and Eupulmonata form monophyletic clades. Pyramidelloidea are placed within Euthyneura, rendering the Euthyneura paraphyletic. Conclusion: Based on the current phylogeny, it can be proposed for the first time that invasion of freshwater by Pulmonata is a unique evolutionary event and has taken place directly from the marine environment via an aquatic pathway. The origin of colonisation of terrestrial habitats is seeded in marginal zones and has probably occurred via estuaries or semi-terrestrial habitats such as mangroves.
Likelihood-based inference of B-cell clonal families
The human immune system depends on a highly diverse collection of
antibody-making B cells. B cell receptor sequence diversity is generated by a
random recombination process called "rearrangement" forming progenitor B cells,
then a Darwinian process of lineage diversification and selection called
"affinity maturation." The resulting receptors can be sequenced in high
throughput for research and diagnostics. Such a collection of sequences
contains a mixture of various lineages, each of which may be quite numerous, or
may consist of only a single member. As a step to understanding the process and
result of this diversification, one may wish to reconstruct lineage membership,
i.e. to cluster sampled sequences according to which came from the same
rearrangement events. We call this clustering problem "clonal family
inference." In this paper we describe and validate a likelihood-based framework
for clonal family inference based on a multi-hidden Markov Model (multi-HMM)
framework for B cell receptor sequences. We describe an agglomerative algorithm
to find a maximum likelihood clustering, two approximate algorithms with
various trade-offs of speed versus accuracy, and a third, fast algorithm for
finding specific lineages. We show that under simulation these algorithms
greatly improve upon existing clonal family inference methods, and that they
also give significantly different clusters than previous methods when applied
to two real data sets.