Designing a Collaborative Process to Create Bilingual Dictionaries of Indonesian Ethnic Languages
The constraint-based approach has proven useful for inducing bilingual dictionaries for closely related low-resource languages.
When we want to create multiple bilingual dictionaries linking several languages, we need to consider manual creation by a native
speaker if no machine-readable dictionaries are available as input. Because planning the creation of bilingual dictionaries
requires weighing various methods and their costs, plan optimization is essential. Utilizing both the constraint-based
approach and a plan optimizer, we design a collaborative process for creating 10 bilingual dictionaries from every combination of 5
languages, i.e., Indonesian, Malay, Minangkabau, Javanese, and Sundanese. We further design online collaborative dictionary
generation to bridge the spatial gap between native speakers. To evaluate our optimal plan, we define a heuristic plan that relies
only on manual investment by native speakers, with total cost as the evaluation metric. The optimal plan outperformed the heuristic plan with a
63.3% cost reduction.
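As a small illustration of the arithmetic in this abstract, the sketch below enumerates the unordered pairs of the five listed languages (giving the 10 dictionaries) and defines a relative cost reduction. The function names `language_pairs` and `cost_reduction` are hypothetical and not taken from the paper, and no actual plan costs are assumed.

```python
from itertools import combinations

# The five languages named in the abstract.
LANGUAGES = ["Indonesian", "Malay", "Minangkabau", "Javanese", "Sundanese"]


def language_pairs(languages):
    """Return every unordered pair of languages (one bilingual dictionary each)."""
    return list(combinations(languages, 2))


def cost_reduction(baseline_cost, plan_cost):
    """Relative cost reduction of a plan against a baseline, in percent.

    Assumption: both costs are expressed in the same unit and the
    baseline cost is positive.
    """
    return (baseline_cost - plan_cost) / baseline_cost * 100.0


pairs = language_pairs(LANGUAGES)
print(len(pairs))  # number of bilingual dictionaries to create
for source, target in pairs:
    print(f"{source} - {target}")
```

With five languages, C(5, 2) = 10 pairs result, matching the number of dictionaries stated in the abstract.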
Visualizing Language Lexical Similarity Clusters: A Case Study of Indonesian Ethnic Languages
Language similarity clusters are useful for computational linguistics research that relies on language similarity or cognate recognition. The existing language similarity clustering approach, which utilizes hierarchical clustering and k-means clustering, has difficulty creating clusters with a middle range of language similarity. Moreover, it lacks an interactive visualization that users can explore. To address these issues, we formalize a graph-based approach to creating and visualizing language lexical similarity clusters: we use the ASJP database to generate a language similarity matrix, then formalize the data as an undirected graph. To create the clusters, we apply a connected components algorithm with a threshold on the language similarity range. Our interactive online tool allows a user to dynamically create new clusters by changing the threshold of the language similarity range and to explore the data based on language similarity range and number of speakers. We provide an example implementation of our approach for 119 Indonesian ethnic languages. The experimental results show that under a low system execution burden, system performance was quite stable. Under a high system execution burden, performance fluctuated, but response times remained below 25 seconds, which is considered acceptable.
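The clustering step described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a symmetric similarity matrix with values in [0, 1], treats two languages as connected when their similarity falls inside a chosen range, and finds connected components with a breadth-first search. The names `similarity_clusters`, `NAMES`, and `SIM`, and the toy values, are hypothetical placeholders rather than ASJP data.

```python
from collections import deque


def similarity_clusters(names, sim, low, high=1.0):
    """Group items into connected components of a thresholded similarity graph.

    An undirected edge joins i and j when low <= sim[i][j] <= high.
    """
    n = len(names)
    adjacency = {i: [] for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if low <= sim[i][j] <= high:
                adjacency[i].append(j)
                adjacency[j].append(i)

    seen = set()
    clusters = []
    for start in range(n):
        if start in seen:
            continue
        # Breadth-first search collects one connected component.
        seen.add(start)
        queue = deque([start])
        component = []
        while queue:
            node = queue.popleft()
            component.append(names[node])
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        clusters.append(sorted(component))
    return clusters


# Toy placeholder data (not real language similarities).
NAMES = ["A", "B", "C", "D"]
SIM = [
    [1.0, 0.8, 0.1, 0.1],
    [0.8, 1.0, 0.6, 0.1],
    [0.1, 0.6, 1.0, 0.1],
    [0.1, 0.1, 0.1, 1.0],
]

print(similarity_clusters(NAMES, SIM, low=0.5))
print(similarity_clusters(NAMES, SIM, low=0.7))
```

Raising the lower bound of the range removes weaker edges, so clusters split apart; this mirrors how changing the threshold in an interactive tool would regenerate the clusters.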
Plan Optimization to Bilingual Dictionary Induction for Low-Resource Language Families
Creating a bilingual dictionary is the first crucial step in enriching
low-resource languages. Especially for closely related ones, it has been
shown that the constraint-based approach is useful for inducing bilingual
lexicons from two bilingual dictionaries via a pivot language. However, if
no machine-readable dictionaries are available as input, we need to
consider manual creation by bilingual native speakers. When the goal is to
comprehensively create multiple bilingual dictionaries, even if we already have
several existing machine-readable bilingual dictionaries, it is still difficult
to determine the execution order of the constraint-based approach that reduces
the total cost. Plan optimization is crucial in composing the order of
bilingual dictionary creation with consideration of the methods and their
costs. We formalize the plan optimization for creating bilingual dictionaries
as a Markov Decision Process (MDP), with the goal of obtaining a more accurate
estimate of the most feasible optimal plan with the least total cost before
fully implementing the constraint-based bilingual lexicon induction. We model a
prior beta distribution of bilingual lexicon induction precision with language
similarity and polysemy of the topology as the α and β parameters. It
is further used to model the cost function and the state transition probability. We
estimated the cost of an all-investment plan as a baseline for evaluating the
proposed MDP-based approach, with total cost as the evaluation metric. After
utilizing the posterior beta distribution from the first batch of experiments to
construct the prior beta distribution for the second batch of experiments, the
results show a 61.5% cost reduction compared to the estimated all-investment
plan and a 39.4% cost reduction compared to the estimated MDP optimal plan.
The MDP-based proposal outperformed the baseline on total cost.
Comment: 29 pages, 16 figures, 9 tables, accepted for publication in ACM TALLI
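The idea of reusing a posterior beta distribution from one batch as the prior for the next can be sketched with the standard Beta-Binomial conjugate update. This is only an illustration under stated assumptions: it treats each induced translation as judged correct or incorrect, and it does not reproduce how the paper derives the α and β parameters from language similarity and polysemy. The class name `Beta` and the counts below are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Beta:
    """Beta distribution over induction precision (a value in [0, 1])."""
    alpha: float
    beta: float

    def mean(self):
        """Expected precision under this distribution."""
        return self.alpha / (self.alpha + self.beta)

    def update(self, correct, incorrect):
        """Conjugate update after observing correct/incorrect translations."""
        return Beta(self.alpha + correct, self.beta + incorrect)


# Hypothetical prior and hypothetical first-batch evaluation counts.
prior = Beta(alpha=2.0, beta=2.0)
posterior = prior.update(correct=8, incorrect=2)

# The first-batch posterior serves as the second-batch prior.
second_batch_prior = posterior
print(prior.mean(), second_batch_prior.mean())
```

An expected precision of this kind could then feed a cost function or a state transition probability; how exactly that is done is specific to the paper and is not modeled here.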