361 research outputs found
Recommended from our members
An Entropy-based Assessment of the Unicode Encoding for Tibetan
This paper presents an analysis of the Unicode encoding scheme for Tibetan from the standpoint of morpheme entropy. We can speak of two levels of entropy in Tibetan: syllable entropy (a measure of the probability of the sequential occurrence of syllables) and morpheme entropy (a measure of the probability of the sequential occurrence of characters or morphemes), the latter being a measure of the redundancy of the language. Syllable entropy is a purely statistical calculation that is a function of the domain of the literature sampled, while morpheme entropy, we show, is relatively domain-independent given a statistically significant sample. Morpheme entropy can be calculated statistically, though a theoretical upper bound can also be postulated based on language-dependent morphology rules. This paper presents both theoretical and statistical estimates of the morpheme entropy for Tibetan, and explores the Tibetan Unicode encoding scheme in relation to data compression and other issues analyzed in light of entropy-based language modeling.
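As a rough illustration of what a statistical entropy estimate of this kind involves, the sketch below computes a zeroth-order Shannon entropy and a first-order conditional entropy over a sequence of symbols. It is a minimal sketch under stated assumptions, not the paper's method: the function names `shannon_entropy` and `conditional_entropy` are hypothetical, the toy input is ASCII rather than Tibetan, and the choice of symbol unit (character, morpheme, or syllable) is left to the caller.

```python
import math
from collections import Counter


def shannon_entropy(symbols):
    """Zeroth-order Shannon entropy, in bits per symbol.

    `symbols` is any sliceable sequence (a string or a list of tokens).
    Hypothetical helper for illustration only.
    """
    counts = Counter(symbols)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def conditional_entropy(symbols):
    """First-order conditional entropy H(X_n | X_{n-1}), in bits.

    Estimated from counts of adjacent pairs; lower values mean the next
    symbol is more predictable from the previous one (more redundancy).
    """
    pairs = Counter(zip(symbols, symbols[1:]))
    firsts = Counter(symbols[:-1])
    total = sum(pairs.values())
    h = 0.0
    for (prev, _nxt), c in pairs.items():
        # p(prev, nxt) * log2 p(nxt | prev)
        h -= (c / total) * math.log2(c / firsts[prev])
    return h


if __name__ == "__main__":
    toy = "abababab"  # toy data, not a real corpus
    print(shannon_entropy(toy))      # entropy of single symbols
    print(conditional_entropy(toy))  # entropy given the previous symbol
```

For Tibetan text, one plausible way to obtain syllable-level symbols is to split on the tsheg (U+0F0B), for example `text.split("\u0f0b")`, while character-level symbols are simply the code points of the string. Either choice is an assumption here rather than something the abstract specifies.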
Automatic Segmentation and Part-Of-Speech Tagging For Tibetan: A First Step Towards Machine Translation
This paper presents what we believe to be the first reported work on Tibetan machine translation (MT). Of the three conceptually distinct components of an MT system (analysis, transfer, and generation), the first phase, consisting of POS tagging, has been successfully completed. The combination POS tagger / word-segmenter was manually constructed as a rule-based multi-tagger relying on the Wilson formulation of Tibetan grammar. Partial parsing was also performed in combination with POS-tag sequence disambiguation. The component was evaluated on the task of document indexing for Information Retrieval (IR). Preliminary analysis indicated performance slightly better than (though statistically comparable to) n-gram-based approaches at a known-item IR task. Although segmentation is application-specific, error analysis placed segmentation accuracy at 99%; the accuracy of the POS tagger is also estimated at 99% based on IR error analysis and random sampling.
The Use of yig-cha and chos-kyi-rnam-grangs in Computing Lexical Cohesion for Tibetan Topic Boundary Detection
To properly implement a simple Tibetan Information Retrieval (IR) system, segmentation of one form or another (n-gram, POS-tagging, dictionary substring matching, etc.) must be performed (see Hackett (2000b)). To take Tibetan indexing to a more sophisticated level, however, some form of topic detection must be employed. This paper reports the results of a pilot study on the application to Tibetan of one technique for topic boundary detection: Lexical Cohesion. The resources developed and deployed, the theoretical model used, and its potential applications are discussed.
Document Translation for Cross-Language Text Retrieval at the University of Maryland
The University of Maryland participated in three TREC-6 tasks: ad hoc retrieval, cross-language retrieval, and spoken document retrieval. The principal focus of the work was evaluation of a cross-language text retrieval technique based on fully automatic machine translation. The results show that approaches based on document translation can be approximately as effective as approaches based on query translation, but that additional work will be needed to develop a solid basis for choosing between the two in specific applications. Ad hoc and spoken document retrieval results are also presented.
Mustard catch crop enhances denitrification in shallow groundwater beneath a spring barley field
The study was funded by Department of Agriculture and Food through the Research Stimulus Fund Programme (Grant RSF 06383) in collaboration with the Department of Civil, Structural & Environmental Engineering, Trinity College Dublin, Ireland. Peer-reviewed. Over-winter green cover crops have been reported to increase dissolved organic carbon (DOC) concentrations in groundwater, which can be used as an energy source for denitrifiers. This study investigates the impact of a mustard catch crop on in situ denitrification and nitrous oxide (N2O) emissions from an aquifer overlain by arable land. Denitrification rates and N2O-N/(N2O-N + N2-N) mole fractions were measured in situ with a push–pull method in shallow groundwater under a spring barley system in experimental plots with and without a mustard cover crop. The results suggest that a mustard cover crop could substantially enhance reduction of groundwater nitrate (NO3−-N) via denitrification without significantly increasing N2O emissions. Mean total denitrification (TDN) rates below mustard cover crop and no cover crop were 7.61 and 0.002 μg kg−1 d−1, respectively. Estimated N2O-N/(N2O-N + N2-N) ratios, being 0.001 and 1.0 below mustard cover crop and no cover crop respectively, indicate that denitrification below mustard cover crop reduces N2O to N2, unlike the plot with no cover crop. The observed enhanced denitrification under the mustard cover crop may result from the higher groundwater DOC under mustard cover crop (1.53 mg L−1) than no cover crop (0.90 mg L−1) being added by the root exudates and root masses of mustard. This study gives insights into the missing piece in agricultural nitrogen (N) balance and groundwater derived N2O emissions under arable land and thus helps minimise the uncertainty in agricultural N and N2O-N balances.
Beef production from feedstuffs conserved using new technologies to reduce negative environmental impacts
End of project report. Most (ca. 86%) Irish farms make some silage. Besides directly providing feed for livestock, the provision of grass silage within integrated grassland systems makes an important positive contribution to effective grazing management and improved forage utilisation by grazing animals, and to effective feed budgeting by farmers. It can also contribute to maintaining the content of desirable species in pastures, and to livestock not succumbing to parasites at sensitive times of the year. Furthermore, the optimal recycling of nutrients collected from housed livestock can often be best achieved by spreading the manures on the land used for producing the conserved feed. On most Irish farms, grass silage will remain the main conserved forage for feeding to livestock during winter for the foreseeable future. However, on some farms high yields of whole-crop (i.e. grain + straw) cereals such as wheat, barley and triticale, and of forage maize, will be an alternative option provided that losses during harvesting, storage and feedout are minimised and that input costs are restrained. These alternative forages have the potential to reliably support high levels of animal performance while avoiding the production of effluent. Their production and use, however, will need to integrate advantageously into ruminant production systems. A range of technologies can be employed for crop production and conservation, and for beef production, and the optimal options need to be identified. Beef cattle being finished indoors are offered concentrate feedstuffs at rates that range from modest inputs through to ad libitum access. Such concentrates frequently contain high levels of cereals such as barley or wheat. These cereals are generally between 14% and 18% moisture content and tend to be rolled shortly before being included in coarse rations or are more finely processed prior to pelleting.
Farmers thinking of using ‘high-moisture grain’ techniques for preserving and processing cereal grains destined for feeding to beef cattle need to know how the yield, conservation efficiency and feeding value of such grains compare with those of grains conserved using more conventional techniques. European Union policy strongly encourages a sustainable and multifunctional agriculture. Therefore, in addition to providing European consumers with quality food produced within approved systems, agriculture must also contribute positively to the conservation of natural resources and the upkeep of the rural landscape. Plastics are widely used in agriculture and their post-use fate on farms must not harm the environment; they must be managed to support the enduring sustainability of farming systems. There is an absence of information on the efficacy of some new options for covering and sealing silage with plastic sheeting and tyres, and an absence of an inventory of the use, re-use and post-use fate of plastic film on farms. Irish cattle farmers operate a large number of beef production systems, half of which use dairy bred calves. In the current, continuously changing production and market conditions, new beef systems must be considered. A computer package is required that will allow the rapid, repeatable simulation and assessment of alternative beef production systems using appropriate, standardised procedures. There is thus a need to construct, evaluate and utilise computer models of components of beef production systems and to develop mathematical relationships to link system components into a network that would support their integration into an optimal system model. This will provide a framework to integrate physical and financial on-farm conditions with models for estimating feed supply and animal growth patterns. Cash flow and profit/loss results will be developed.
This will help identify optimal systems, indicate the cause of failure of imperfect systems and identify areas where applied research data are currently lacking, or more basic research is required.
Saltwater necrotizing fasciitis following coral reef laceration possibly exacerbated by a long-haul flight: a case report
Neurotrophins promote revascularization by local recruitment of TrkB+ endothelial cells and systemic mobilization of hematopoietic progenitors
Conformational and Structural Relaxations of Poly(ethylene oxide) and Poly(propylene oxide) Melts: Molecular Dynamics Study of Spatial Heterogeneity, Cooperativity, and Correlated Forward-Backward Motion
Performing molecular dynamics simulations for all-atom models, we
characterize the conformational and structural relaxations of poly(ethylene
oxide) and poly(propylene oxide) melts. The temperature dependence of these
relaxation processes deviates from an Arrhenius law for both polymers. We
demonstrate that mode-coupling theory captures some aspects of the glassy
slowdown, but it does not enable a complete explanation of the dynamical
behavior. When the temperature is decreased, spatially heterogeneous and
cooperative translational dynamics are found to become more important for the
structural relaxation. Moreover, the transitions between the conformational
states cease to obey Poisson statistics. In particular, we show that, at
sufficiently low temperatures, correlated forward-backward motion is an
important aspect of the conformational relaxation, leading to strongly
nonexponential distributions for the waiting times of the dihedrals in the
various conformational states. Comment: 13 pages, 13 figures