2,751 research outputs found

    Introduction to Library Trends 47 (3) Winter 1999: Folkloristic Approaches in Library and Information Science

    published or submitted for publication

    Disambiguation strategies for data-oriented translation

    The Data-Oriented Translation (DOT) model, originally proposed in (Poutsma, 1998, 2003) and based on Data-Oriented Parsing (DOP) (e.g. (Bod, Scha, & Sima'an, 2003)), is best described as a hybrid model of translation, as it combines examples, linguistic information and a statistical translation model. Although theoretically interesting, it inherits the computational complexity associated with DOP. In this paper, we focus on one computational challenge for this model: efficiently selecting the 'best' translation to output. We present four different disambiguation strategies in terms of how they are implemented in our DOT system, along with experiments which investigate how they compare in terms of accuracy and efficiency.

    Seeing the wood for the trees: data-oriented translation

    Data-Oriented Translation (DOT), which is based on Data-Oriented Parsing (DOP), comprises an experience-based approach to translation, where new translations are derived with reference to grammatical analyses of previous translations. Previous DOT experiments [Poutsma, 1998, Poutsma, 2000a, Poutsma, 2000b] were small in scale because important advances in DOP technology were not incorporated into the translation model. Despite this, related work [Way, 1999, Way, 2003a, Way, 2003b] reports that DOT models are viable in that solutions to ‘hard’ translation cases are readily available. However, it has not been shown to date that DOT models scale to larger datasets. In this work, we describe a novel DOT system, inspired by recent advances in DOP parsing technology. We test our system on larger, more complex corpora than have been used heretofore, and present both automatic and human evaluations which show that high quality translations can be achieved at reasonable speeds

    Structured parameter estimation for LFG-DOP using Backoff

    Despite its state-of-the-art performance, the Data-Oriented Parsing (DOP) model has been shown to suffer from biased parameter estimation, and the good performance seems more the result of ad hoc adjustments than correct probabilistic generalization over the data. In recent work, we developed a new estimation procedure, called Backoff Estimation, for DOP models that are based on Phrase-Structure annotations (so-called Tree-DOP models). Backoff Estimation deviates from earlier methods in that it treats the model parameters as a highly structured space of correlated events (backoffs), rather than a set of disjoint events. In this paper we show that the problem of biased estimates also holds for DOP models that are based on Lexical-Functional Grammar annotations (i.e. LFG-DOP), and that the LFG-DOP parameters also constitute a hierarchically structured space. Subsequently, we adapt the Backoff Estimation algorithm from Tree-DOP to LFG-DOP models. Backoff Estimation turns out to be a natural solution to some of the specific problems of robust parsing under LFG-DOP.

    Data-oriented parsing and the Penn Chinese treebank

    We present an investigation into parsing the Penn Chinese Treebank using a Data-Oriented Parsing (DOP) approach. DOP comprises an experience-based approach to natural language parsing. Most published research in the DOP framework uses PS-trees as its representation schema. Drawbacks of the DOP approach centre around issues of efficiency. We incorporate recent advances in DOP parsing techniques into a novel DOP parser which generates a compact representation of all subtrees which can be derived from any full parse tree. We compare our work to previous work on parsing the Penn Chinese Treebank, and provide both a quantitative and qualitative evaluation. While our results in terms of Precision and Recall are slightly below those published in related research, our approach requires no manual encoding of head rules, nor is a development phase per se necessary. We also note that certain constructions which were problematic in this previous work can be handled correctly by our DOP parser. Finally, we observe that the ‘DOP Hypothesis’ is confirmed for parsing the Penn Chinese Treebank.

    Parallel Treebanks in Phrase-Based Statistical Machine Translation

    Given much recent discussion and the shift in focus of the field, it is becoming apparent that the incorporation of syntax is the way forward for the current state-of-the-art in machine translation (MT). Parallel treebanks are a relatively recent innovation and appear to be ideal candidates for MT training material. However, until recently there has been no other means to build them than by hand. In this paper, we describe how we make use of new tools to automatically build a large parallel treebank and extract a set of linguistically motivated phrase pairs from it. We show that adding these phrase pairs to the translation model of a baseline phrase-based statistical MT (PBSMT) system leads to significant improvements in translation quality. We describe further experiments on incorporating parallel treebank information into PBSMT, such as word alignments. We investigate the conditions under which the incorporation of parallel treebank data performs optimally. Finally, we discuss the potential of parallel treebanks in other paradigms of MT

    Comparing constituency and dependency representations for SMT phrase-extraction

    We consider the value of replacing and/or combining string-based methods with syntax-based methods for phrase-based statistical machine translation (PB-SMT), and we also consider the relative merits of using constituency-annotated vs. dependency-annotated training data. We automatically derive two subtree-aligned treebanks, dependency-based and constituency-based, from a parallel English–French corpus and extract syntactically motivated word- and phrase-pairs. We automatically measure PB-SMT quality. The results show that combining string-based and syntax-based word- and phrase-pairs can improve translation quality irrespective of the type of syntactic annotation. Furthermore, using dependency annotation yields greater translation quality than constituency annotation for PB-SMT.

    Stated Preferences for Ecotourism Alternatives On the Standing Rock Sioux Indian Reservation

    Despite favorable locations and the potential for economic development, Native American tribes have not developed their ecotourism markets substantially. This paper presents a choice experiment analysis of potential tourist and local resident preferences for alternative ecotourism development scenarios for the Standing Rock Sioux Indian Reservation. The choice experiment elicitation featured attributes of both cultural and nature-based tourist attractions. Survey results demonstrated that visitors interviewed at powwows had significantly different preferences from those interviewed at local tourist attractions. Results from all samples showed positive preferences towards an amphitheater, a nature trail, and a bison meal, and no preference toward an ATV trail. Non-powwow tourists had significant willingness to pay for a number of potential attractions, including nature trails, a road through the bison pasture, and an interpretive center with an amphitheater show.
    Keywords: choice experiments, ecotourism, Native Americans, Standing Rock Sioux, Resource/Energy Economics and Policy

    THE USE OF CHOICE EXPERIMENTS TO ANALYZE CONSUMER PREFERENCES FOR ORGANIC PRODUCE IN COSTA RICA

    Choice Experiments are used to elicit Costa Rican consumer preferences for different attributes of organic and conventional vegetables in a hypothetical market. Focus groups identified a primary concern with food safety and a secondary interest in the environmental impact of production practices. Two alternative national certification seals were proposed: 1) a "Blue Seal" certifying the Department of Public Health's approval for food safety; and 2) a "Green Seal" certifying the Ministry of Agriculture's approval for environmentally sound production practices. Three other attributes were selected: "Appearance", "Size", and "Price". These attributes, together with the proposed labels, were presented in different combinations to a sample of 432 Costa Rican consumers at ten supermarkets located in the urban Central Valley. The results of the multinomial logit model demonstrate that the attributes "Appearance" and "Price" have the strongest influence on the probability of choosing alternative scenarios. Also, there was a significant preference for the "Blue Seal" and for the "Blue Seal" and "Green Seal" combined. The socioeconomic variables were not significant in consumer choice. The results show a marginal willingness to pay (MWTP) of 20% for the "Blue Seal" certifying healthy produce, and an additional 19% for the "Green Seal". The favorable acceptance of the certification seals on the part of the Costa Rican consumer may imply a large internal market for organic and ecologically healthy produce.
    Keywords: Consumer/Household Economics

    Water Markets in Mexico: Opportunities and Constraints

    In 1992, the Government of Mexico initiated a new national water law which decentralised water resources management and allowed the market transfer of water-use concessions between individual irrigators. These reforms were expected to improve water resources management through greater user participation in irrigation management, as well as to increase irrigators' incentives to improve water-use efficiency. At the time of its proposal, the 1992 Federal Water Law was considered to be the first step in the establishment of limited water markets. This paper addresses the opportunities and constraints to improved water resource use and allocation through the market incentives that result from transferable water-use permits. The paper reviews water allocation institutions in Mexico and provides case studies of water allocation and decision-making.
    Keywords: Resource/Energy Economics and Policy