53,957 research outputs found
Hybrid Approach to English-Hindi Name Entity Transliteration
Machine translation (MT) research in Indian languages is still in its
infancy. Not much work has been done in proper transliteration of name entities
in this domain. In this paper we address this issue. We have used English-Hindi
language pair for our experiments and have used a hybrid approach. At first we
have processed English words using a rule based approach which extracts
individual phonemes from the words and then we have applied statistical
approach which converts the English into its equivalent Hindi phoneme and in
turn the corresponding Hindi word. Through this approach we have attained
83.40% accuracy.Comment: Proceedings of IEEE Students' Conference on Electrical, Electronics
and Computer Sciences 201
Improving the translation environment for professional translators
When using computer-aided translation systems in a typical, professional translation workflow, there are several stages at which there is room for improvement. The SCATE (Smart Computer-Aided Translation Environment) project investigated several of these aspects, both from a human-computer interaction point of view, as well as from a purely technological side.
This paper describes the SCATE research with respect to improved fuzzy matching, parallel treebanks, the integration of translation memories with machine translation, quality estimation, terminology extraction from comparable texts, the use of speech recognition in the translation process, and human computer interaction and interface design for the professional translation environment. For each of these topics, we describe the experiments we performed and the conclusions drawn, providing an overview of the highlights of the entire SCATE project
PRIME: A System for Multi-lingual Patent Retrieval
Given the growing number of patents filed in multiple countries, users are
interested in retrieving patents across languages. We propose a multi-lingual
patent retrieval system, which translates a user query into the target
language, searches a multilingual database for patents relevant to the query,
and improves the browsing efficiency by way of machine translation and
clustering. Our system also extracts new translations from patent families
consisting of comparable patents, to enhance the translation dictionary
Improve and Implement an Open Source Question Answering System
A question answer system takes queries from the user in natural language and returns a short concise answer which best fits the response to the question. This report discusses the integration and implementation of question answer systems for English and Hindi as part of the open source search engine Yioop. We have implemented a question answer system for English and Hindi, keeping in mind users who use these languages as their primary language. The user should be able to query a set of documents and should get the answers in the same language. English and Hindi are very different when it comes to language structure, characters etc. We have implemented the Question Answer System so that it supports localization and improved Part of Speech tagging performance by storing the lexicon in the database instead of a file based lexicon. We have implemented a brill tagger variant for Part of Speech tagging of Hindi phrases and grammar rules for triplet extraction. We also improve Yioop’s lexical data handling support by allowing the user to add named entities. Our improvements to Yioop were then evaluated by comparing the retrieved answers against a dataset of answers known to be true. The test data for the question answering system included creating 2 indexes, 1 each for English and Hindi. These were created by configuring Yioop to crawl 200,000 wikipedia pages for each crawl. The crawls were configured to be domain specific so that English index consists of pages restricted to English text and Hindi index is restricted to pages with Hindi text. We then used a set of 50 questions on the English and Hindi systems. We recored, Hindi system to have an accuracy of about 55% for simple factoid questions and English question answer system to have an accuracy of 63%
Towards using web-crawled data for domain adaptation in statistical machine translation
This paper reports on the ongoing work focused on domain adaptation of statistical machine translation using domain-specific data obtained by domain-focused web crawling. We present a strategy for crawling monolingual and parallel data and their exploitation for testing, language modelling, and system tuning in a phrase--based machine translation framework. The proposed approach is evaluated on the domains of Natural Environment and Labour Legislation and two language
pairs: English–French and English–Greek
New Method for Optimization of License Plate Recognition system with Use of Edge Detection and Connected Component
License Plate recognition plays an important role on the traffic monitoring
and parking management systems. In this paper, a fast and real time method has
been proposed which has an appropriate application to find tilt and poor
quality plates. In the proposed method, at the beginning, the image is
converted into binary mode using adaptive threshold. Then, by using some edge
detection and morphology operations, plate number location has been specified.
Finally, if the plat has tilt, its tilt is removed away. This method has been
tested on another paper data set that has different images of the background,
considering distance, and angel of view so that the correct extraction rate of
plate reached at 98.66%.Comment: 3rd IEEE International Conference on Computer and Knowledge
Engineering (ICCKE 2013), October 31 & November 1, 2013, Ferdowsi Universit
Mashha
- …