1,273 research outputs found

    A Generalized Constraint Approach to Bilingual Dictionary Induction for Low-Resource Language Families

    Get PDF
    The lack or absence of parallel and comparable corpora makes bilingual lexicon extraction a difficult task for low-resource languages. The pivot language and cognate recognition approaches have been proven useful for inducing bilingual lexicons for such languages. We propose constraint-based bilingual lexicon induction for closely related languages by extending constraints from the recent pivot-based induction technique and further enabling multiple symmetry assumption cycle to reach many more cognates in the transgraph. We fur- ther identify cognate synonyms to obtain many-to-many translation pairs. This article utilizes four datasets: one Austronesian low-resource language and three Indo-European high-resource languages. We use three constraint-based methods from our previous work, the Inverse Consultation method and translation pairs generated from Cartesian product of input dictionaries as baselines. We evaluate our result using the met- rics of precision, recall, and F-score. Our customizable approach allows the user to conduct cross validation to predict the optimal hyperparameters (cognate threshold and cognate synonym threshold) with various combination of heuristics and number of symmetry assumption cycles to gain the highest F-score. Our pro- posed methods have statistically significant improvement of precision and F-score compared to our previous constraint-based methods. The results show that our method demonstrates the potential to complement other bilingual dictionary creation methods like word alignment models using parallel corpora for high-resource languages while well handling low-resource languages

    Plan Optimization to Bilingual Dictionary Induction for Low-Resource Language Families

    Get PDF
    Creating bilingual dictionary is the first crucial step in enriching low-resource languages. Especially for the closely-related ones, it has been shown that the constraint-based approach is useful for inducing bilingual lexicons from two bilingual dictionaries via the pivot language. However, if there are no available machine-readable dictionaries as input, we need to consider manual creation by bilingual native speakers. To reach a goal of comprehensively create multiple bilingual dictionaries, even if we already have several existing machine-readable bilingual dictionaries, it is still difficult to determine the execution order of the constraint-based approach to reducing the total cost. Plan optimization is crucial in composing the order of bilingual dictionaries creation with the consideration of the methods and their costs. We formalize the plan optimization for creating bilingual dictionaries by utilizing Markov Decision Process (MDP) with the goal to get a more accurate estimation of the most feasible optimal plan with the least total cost before fully implementing the constraint-based bilingual lexicon induction. We model a prior beta distribution of bilingual lexicon induction precision with language similarity and polysemy of the topology as α\alpha and β\beta parameters. It is further used to model cost function and state transition probability. We estimated the cost of all investment plan as a baseline for evaluating the proposed MDP-based approach with total cost as an evaluation metric. After utilizing the posterior beta distribution in the first batch of experiments to construct the prior beta distribution in the second batch of experiments, the result shows 61.5\% of cost reduction compared to the estimated all investment plan and 39.4\% of cost reduction compared to the estimated MDP optimal plan. The MDP-based proposal outperformed the baseline on the total cost.Comment: 29 pages, 16 figures, 9 tables, accepted for publication in ACM TALLI

    Plan Optimization for Creating Bilingual Dictionaries of Low-Resource Languages

    Get PDF
    The constraint-based approach has been proven useful for inducing bilingual lexicons for closely-related low- resource languages. When we want to create multiple bilingual dictionaries linking several languages, we need to consider manual creation by bilingual language experts if there are no available machine-readable dictionaries are available as input. To overcome the difficulty in planning the creation of bilingual dictionaries, the consideration of various methods and costs, plan optimization is essential. We adopt the Markov Decision Process (MDP) in formalizing plan optimization for creating bilingual dictionaries; the goal is to better predict the most feasible optimal plan with the least total cost before fully implementing the constraint-based bilingual dictionary induction framework. We define heuristics based on input language characteristics to devise a baseline plan for evaluating our MDP-based approach with total cost as an evaluation metric. The MDP-based proposal outperformed heuristic planning on the total cost for all datasets examined

    近縁言語のための帰納的な対訳辞書生成フレームワーク

    Get PDF
    京都大学0048新制・課程博士博士(情報学)甲第21395号情博第681号新制||情||117(附属図書館)京都大学大学院情報学研究科社会情報学専攻(主査)教授 石田 亨, 教授 吉川 正俊, 教授 河原 達也学位規則第4条第1項該当Doctor of InformaticsKyoto UniversityDFA
    corecore