
    Speech Recognition by Composition of Weighted Finite Automata

    We present a general framework based on weighted finite automata and weighted finite-state transducers for describing and implementing speech recognizers. The framework allows us to represent uniformly the information sources and data structures used in recognition, including context-dependent units, pronunciation dictionaries, language models and lattices. Furthermore, general but efficient algorithms can be used for combining information sources in actual recognizers and for optimizing their application. In particular, a single composition algorithm is used both to combine in advance information sources such as language models and dictionaries, and to combine acoustic observations and information sources dynamically during recognition. Comment: 24 pages, uses psfig.st

    Beyond Word N-Grams

    We describe, analyze, and evaluate experimentally a new probabilistic model for word-sequence prediction in natural language based on prediction suffix trees (PSTs). By using efficient data structures, we extend the notion of PST to unbounded vocabularies. We also show how to use a Bayesian approach based on recursive priors over all possible PSTs to efficiently maintain tree mixtures. These mixtures have provably and practically better performance than almost any single model. We evaluate the model on several corpora. The low perplexity achieved by relatively small PST mixture models suggests that they may be an advantageous alternative, both theoretically and practically, to the widely used n-gram models. Comment: 15 pages, one PostScript figure, uses psfig.sty and fullname.sty. Revised version of a paper in the Proceedings of the Third Workshop on Very Large Corpora, MIT, 199

    Similarity-Based Models of Word Cooccurrence Probabilities

    In many applications of natural language processing (NLP) it is necessary to determine the likelihood of a given word combination. For example, a speech recognizer may need to determine which of the two word combinations "eat a peach" and "eat a beach" is more likely. Statistical NLP methods determine the likelihood of a word combination from its frequency in a training corpus. However, the nature of language is such that many word combinations are infrequent and do not occur in any given corpus. In this work we propose a method for estimating the probability of such previously unseen word combinations using available information on "most similar" words. We describe probabilistic word association models based on distributional word similarity, and apply them to two tasks, language modeling and pseudo-word disambiguation. In the language modeling task, a similarity-based model is used to improve probability estimates for unseen bigrams in a back-off language model. The similarity-based method yields a 20% perplexity improvement in the prediction of unseen bigrams and statistically significant reductions in speech-recognition error. We also compare four similarity-based estimation methods against back-off and maximum-likelihood estimation methods on a pseudo-word sense disambiguation task in which we controlled for both unigram and bigram frequency to avoid giving too much weight to easy-to-disambiguate high-frequency configurations. The similarity-based methods perform up to 40% better on this particular task. Comment: 26 pages, 5 figure

    Principles and Implementation of Deductive Parsing

    We present a system for generating parsers based directly on the metaphor of parsing as deduction. Parsing algorithms can be represented directly as deduction systems, and a single deduction engine can interpret such deduction systems so as to implement the corresponding parser. The method generalizes easily to parsers for augmented phrase structure formalisms, such as definite-clause grammars and other logic grammar formalisms, and has been used for rapid prototyping of parsing algorithms for a variety of formalisms including variants of tree-adjoining grammars, categorial grammars, and lexicalized context-free grammars. Comment: 69 pages, includes full Prolog cod

    Detection of new eruptions in the Magellanic Clouds LBVs R 40 and R 110

    We performed a spectroscopic and photometric analysis to study new eruptions in two luminous blue variables (LBVs) in the Magellanic Clouds. We detected a strong new eruption in the LBV R 40 that reached V ∼ 9.2 in 2016, which is around 1.3 mag brighter than the minimum registered in 1985. During this new eruption, the star changed from an A-type to a late F-type spectrum. Based on photometric and spectroscopic empirical calibrations and synthetic spectral modeling, we determine that R 40 reached Teff = 5800-6300 K during this new eruption. This object is thereby probably one of the coolest identified LBVs. We could also identify an enrichment of nitrogen and r- and s-process elements. We detected a weak eruption in the LBV R 110 with a maximum of V ∼ 9.9 mag in 2011, that is, around 1.0 mag brighter than in the quiescent phase. On the other hand, this new eruption is about 0.2 mag fainter than the first eruption detected in 1990, but the temperature did not decrease below 8500 K. Spitzer spectra show indications of cool dust in the circumstellar environment of both stars, but no hot or warm dust was present, except for the probable presence of PAHs in R 110. We also discuss a possible post-red supergiant nature for both stars.

    Chemical analysis of giant stars in the young open cluster NGC 3114

    Context: Open clusters are very useful targets for examining possible trends in galactocentric distance and age, especially when young and old open clusters are compared. Aims: We carried out a detailed spectroscopic analysis to derive the chemical composition of seven red giants in the young open cluster NGC 3114. Abundances of C, N, O, Li, Na, Mg, Al, Ca, Si, Ti, Ni, Cr, Y, Zr, La, Ce, and Nd were obtained, as well as the carbon isotopic ratio. Methods: The atmospheric parameters of the studied stars and their chemical abundances were determined using high-resolution optical spectroscopy. We employed the local-thermodynamic-equilibrium model atmospheres of Kurucz and the spectral analysis code MOOG. The abundances of the light elements were derived using the spectral synthesis technique. Results: We found that NGC 3114 has a mean metallicity of [Fe/H] = -0.01 ± 0.03. The isochrone fit yielded a turn-off mass of 4.2 Msun. The [N/C] ratio is in good agreement with the models predicted by first dredge-up. We found that two stars, HD 87479 and HD 304864, have high rotational velocities of 15.0 km/s and 11.0 km/s; HD 87526 is a halo star and is not a member of NGC 3114. Conclusions: The carbon and nitrogen abundances in NGC 3114 agree with those of the field and cluster giants. The oxygen abundance in NGC 3114 is lower than in the field giants. The [O/Fe] ratio is similar to that of the giants in young clusters. We detected sodium enrichment in the analyzed cluster giants. As far as the other elements are concerned, their [X/Fe] ratios follow the same trend seen in giants with the same metallicity. Comment: 17 pages, 9 figures, 10 tables; accepted for publication in A&

    Retrofit Options for Increasing Energy Efficiency in Office Buildings- Methodology Review

    Portuguese buildings represented 35% of primary energy consumption in 2006, with the non-residential sector representing almost half of this number globally and around 65% in Lisbon city. Expected to grow 5% yearly in this period, the rehabilitation of non-residential buildings is a great opportunity for energy rehabilitation, given a stock of 800,000 buildings needing medium to high interventions. For this task to be successful, it is also urgent that procedures consider an accurate technical framework, where existing technologies and best case studies can be considered, in order to drive the retrofitting of passive measures forward. This paper presents an overview of the development of a methodology that aims to include the energy component in rehabilitation schemes through an integrated and comprehensive analysis, reaching all those directly involved in the building process (owners, consumers, public bodies, and the construction and project design industry) as well as important new players such as ESCOs.