12 research outputs found

    NLP Challenges for Machine Translation from English to Indian Languages

    This natural language processing work focuses on English-Kannada/Telugu translation. Kannada is a language of India, classified as Dravidian, Southern, Tamil-Kannada, Kannada. Regions spoken: Kannada is spoken in Karnataka, Andhra Pradesh, Tamil Nadu, and Maharashtra. Population: 35,346,000 speakers as of 1997. Alternate names: Kanarese, Canarese, Banglori, and Madrassi. Dialects: Bijapur, Jeinu Kuruba, and Aine Kuruba, among about 20 dialects; Badaga may be one. Kannada is the state language of Karnataka. About 9,000,000 people speak Kannada as a second language. The literacy rate among first-language speakers is about 60%, the same as among second-language speakers (in India). Kannada was used in the Bible from 1831 to 2000. Statistical machine translation (SMT) is a machine translation paradigm where translations are generated on the basis of statistical models whose parameters are derived from the analysis of bilingual text corpora. The statistical approach contrasts with the rule-based approaches to machine translation as well as with example-based machine translatio

    Kannada and Telugu Native Languages to English Cross Language Information Retrieval

    One of the crucial challenges in cross-lingual information retrieval is retrieving relevant information for a query expressed in a native language. While retrieving relevant documents is slightly easier, analysing the relevance of the retrieved documents and presenting the results to users are non-trivial tasks. To accomplish this, we present our Kannada-English and Telugu-English CLIR systems as part of the Ad-Hoc Bilingual task. We take a query-translation-based approach using bilingual dictionaries. When a query word is not found in the dictionary, it is transliterated using a simple rule-based approach which utilizes the corpus to return the 'k' closest English transliterations of the given Kannada/Telugu word. The resulting multiple translation/transliteration choices for each query word are disambiguated using an iterative page-rank style algorithm which, based on term-term co-occurrence statistics, produces the final translated query. Finally, we conduct experiments on these translated queries using a Kannada/Telugu document collection and a set of English queries; the performance achieved for each task is reported, along with a statistical analysis of the results
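The disambiguation step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `disambiguate`, the update rule, and the toy co-occurrence counts are all assumptions made for illustration, and the dictionary lookup and transliteration steps are assumed to have already produced the candidate lists.

```python
def disambiguate(candidates, cooc, iterations=10):
    """Pick one English candidate per query word.

    candidates: one non-empty list of candidate translations per query word.
    cooc: dict mapping (term, term) pairs to co-occurrence counts (hypothetical data).
    """
    def link(a, b):
        # Co-occurrence is treated as symmetric.
        return cooc.get((a, b), cooc.get((b, a), 0.0))

    # Start with a uniform weight over each word's candidates.
    weights = [{c: 1.0 / len(cands) for c in cands} for cands in candidates]

    for _ in range(iterations):
        updated = []
        for i, cands in enumerate(candidates):
            # A candidate is supported by the weighted candidates of the OTHER words.
            scores = {
                c: sum(
                    link(c, other) * w
                    for j, word_weights in enumerate(weights) if j != i
                    for other, w in word_weights.items()
                )
                for c in cands
            }
            total = sum(scores.values())
            # Keep the previous weights when no co-occurrence evidence exists.
            updated.append(
                {c: s / total for c, s in scores.items()} if total > 0 else dict(weights[i])
            )
        weights = updated

    return [max(w, key=w.get) for w in weights]


# Hypothetical candidate lists and co-occurrence counts, for illustration only.
CANDIDATES = [["bank", "shore"], ["money", "currency"], ["loan"]]
COOC = {
    ("bank", "money"): 8, ("bank", "loan"): 9, ("bank", "currency"): 4,
    ("shore", "money"): 1, ("money", "loan"): 7, ("currency", "loan"): 3,
}
print(disambiguate(CANDIDATES, COOC))
```

With these toy counts the candidates that reinforce one another ("bank", "money", "loan") gain weight at each iteration, which is the mutual-reinforcement idea behind page-rank style scoring.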

    Semantical and Syntactical Analysis of NLP

    Natural language processing describes the use and ability of systems to process sentences in a natural language such as English or any of the Indian languages, rather than in specialized artificial computer languages such as C or C++. This paper deals with syntactical and semantical analysis of Indian languages such as Kannada, which plays a vital role in accurate machine translation. Accurate machine translation in turn leads to accurate cross-language information retrieval. The syntactical and semantical structures for machine translation are presented with an example

    Phrase Structure Based English to Kannada Sentence Translation

    In order to build a natural language processing system, the words are first placed into a structured form that leads to a syntactically correct sentence. Syntactic analysis of a sentence is performed by a parsing technique. This paper explores a novel approach in which the shift-reduce parsing technique is used to translate English sentences into grammatically correct Kannada sentences by reordering the English parse tree structure and by generating and implementing a phrase structure grammar (PSG) for Kannada sentences. The recursive descent parsing technique is used to generate the English phrase tree structure, and terminal symbols are tagged with equivalent Kannada words; the shift-reduce parsing technique is then used to construct a Kannada sentence. A part-of-speech (POS) tagger is used to tag Kannada words to English words
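The reordering idea in this abstract can be sketched as follows. This covers only the tree-reordering and word-substitution steps, not the recursive descent or shift-reduce parsers the paper describes. The tuple-based tree format, the two reordering rules, and the tiny lexicon are assumptions for illustration; the lexicon ignores Kannada case marking and agreement.

```python
def is_leaf(node):
    # A leaf is a (POS tag, word) pair.
    return len(node) == 2 and isinstance(node[1], str)


def reorder(node):
    """Rearrange an English (SVO) parse tree toward Kannada (SOV) order."""
    if is_leaf(node):
        return node
    label, children = node[0], [reorder(child) for child in node[1:]]
    if label == "VP":
        # Verb-final order: move verb-tagged children after the other constituents.
        verbs = [c for c in children if c[0].startswith("VB")]
        rest = [c for c in children if not c[0].startswith("VB")]
        children = rest + verbs
    elif label == "PP":
        # Prepositions become postpositions.
        children = children[::-1]
    return (label, *children)


def leaves(node):
    if is_leaf(node):
        return [node[1]]
    return [word for child in node[1:] for word in leaves(child)]


def translate(tree, lexicon):
    # Substitute each word with its lexicon entry; unknown words pass through.
    return " ".join(lexicon.get(w.lower(), w) for w in leaves(reorder(tree)))


# Hand-built parse tree for "Rama reads a book" (assumed, not parser output).
TREE = ("S",
        ("NP", ("NNP", "Rama")),
        ("VP", ("VBZ", "reads"),
               ("NP", ("DT", "a"), ("NN", "book"))))

# Illustrative lexicon; entries are simplified and uninflected for case.
LEXICON = {"rama": "ರಾಮ", "reads": "ಓದುತ್ತಾನೆ", "a": "ಒಂದು", "book": "ಪುಸ್ತಕ"}

print(leaves(reorder(TREE)))
print(translate(TREE, LEXICON))
```

Moving the verb to the end of the verb phrase is the main structural difference the sketch captures; a fuller system would also need the morphological generation that a real phrase structure grammar for Kannada requires.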

    Language Identification: Contrivance Learning Process Using Web Based Disquisition

    Language identification is the foremost task in the study of linguistics. Language identification and conversion tools such as Google Translate, or any other hypothetical translator, work wonders. The mechanism these translators use to detect the language is a real marvel. Hence it is of primary importance to study the methods of identifying the language. In this paper, the methodologies for recognizing natural languages such as English, Kannada, Hindi and Telugu are explained on the basis of the N-gram algorithm, and the respective vowels and consonants of each language are retrieved and stored for building the syntactic structure of the corpus
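As an illustration of N-gram language identification, here is a minimal sketch using ranked character n-gram profiles and an out-of-place distance. The abstract does not specify the exact algorithm, so this ranking scheme, the function names, and the one-sentence training samples are assumptions; real profiles would be built from much larger corpora.

```python
from collections import Counter

TOP_K = 300  # profile size, also used as the penalty for an unseen n-gram


def char_ngrams(text, n_max=3):
    # Count all character n-grams of length 1..n_max, with padding spaces.
    padded = f" {text.strip().lower()} "
    return Counter(
        padded[i:i + n]
        for n in range(1, n_max + 1)
        for i in range(len(padded) - n + 1)
    )


def build_profile(text):
    # A profile is the list of n-grams ordered from most to least frequent.
    return [gram for gram, _ in char_ngrams(text).most_common(TOP_K)]


def out_of_place(doc_profile, lang_profile):
    # Sum of rank differences; n-grams missing from the language pay TOP_K.
    rank = {gram: r for r, gram in enumerate(lang_profile)}
    return sum(
        abs(r - rank[gram]) if gram in rank else TOP_K
        for r, gram in enumerate(doc_profile)
    )


def identify(text, profiles):
    doc_profile = build_profile(text)
    return min(profiles, key=lambda lang: out_of_place(doc_profile, profiles[lang]))


# Tiny illustrative samples ("this is a book" in each language).
SAMPLES = {
    "en": "this is a book about language",
    "kn": "ಇದು ಒಂದು ಪುಸ್ತಕ",
    "hi": "यह एक किताब है",
    "te": "ఇది ఒక పుస్తకం",
}
PROFILES = {lang: build_profile(text) for lang, text in SAMPLES.items()}

print(identify("ಪುಸ್ತಕ", PROFILES))
```

Because these four languages use different scripts, even single-character n-grams separate them; longer n-grams matter more when distinguishing languages that share a script.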

    Phonetic Dictionary for Natural Language Processing: Kannada

    India has 22 officially recognized languages, including Assamese, Bengali, English, Gujarati, Hindi, Kannada, Kashmiri, Konkani, Malayalam, Manipuri, Marathi, Nepali, Oriya, Punjabi, Sanskrit, Tamil, Telugu, and Urdu. Clearly, India faces a language diversity problem. In the age of the Internet, the multiplicity of languages makes it even more necessary to have sophisticated systems for natural language processing. In this paper we develop a phonetic dictionary for natural language processing, particularly for Kannada. Phonetics is the scientific study of speech sounds. Acoustic phonetics studies the physical properties of sounds and provides a language to distinguish one sound from another in quality and quantity. The Kannada language is one of the major Dravidian languages of India. The language uses forty-nine phonemic letters, divided into three groups: Swaragalu (thirteen letters); Yogavaahakagalu (two letters); and Vyanjanagalu (thirty-four letters). Swaragalu and Vyanjanagalu are similar to the vowels and consonants of English, respectively

    Natural Language Processing Semantical and Syntactical Analysis for English

    Natural language enables people to exchange ideas. These ideas converge to form the "meaning" of an utterance or text in the form of a series of sentences. The meaning of sentences is described as semantics. The input/output of an NLP system can be written text or speech. There are two major components of natural language processing: natural language understanding, which maps a given natural language input into a useful representation, and natural language generation, which produces natural language output on the basis of input data such as text. This paper deals with natural language understanding mainly on semantic