
    Computer-assisted language learning for Arabic using cognates

    People around the world speak many different languages, some of which date back hundreds or thousands of years. These languages arose in different ways, and some descended from other languages. Over time, some became global languages while others died out. Many languages have also undergone the changes that naturally affect any language over time, and new languages have emerged from older ones. There is considerable similarity among these natural languages, and its extent varies: in some languages, more than half of the words are derived from other languages.
The use of computers has expanded greatly in recent years and now extends to learning and teaching living languages. English has received much of this attention, whereas Arabic has received relatively little. In the past few years, many programs and applications have been developed to teach languages, including Arabic, through lessons, examples, competitions, questions and answers, and educational games. However, no existing program automatically detects words that are similar in pronunciation and meaning between Arabic and other languages and uses them to teach the language. Such detection can make learning faster, easier, more effective, and more enjoyable, because learners find it easier to remember words whose pronunciation resembles that of their mother tongue.
This research aims to use computers and information technology to accelerate the learning and teaching of Arabic to non-native speakers, taking English as the model language, by focusing on cognates (words similar in pronunciation and meaning) between Arabic and English. Through the program, learners will be able to create and add lessons, and the cognates in them will be extracted automatically. An analytical approach will be used to study the methods and algorithms for finding similarities between languages, and a new algorithm will then be developed to find similarities between Arabic and English. This approach is one of the techniques of Natural Language Processing.
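As an illustration only, cognate candidates could be flagged by reducing the English word and its Arabic equivalent in each lesson entry to a consonant skeleton and comparing the two skeletons with edit distance. The sketch below assumes that a lesson supplies (English, Arabic) word pairs; the letter tables, the helper names (english_skeleton, arabic_skeleton, cognate_score, find_cognates) and the 0.75 threshold are hypothetical choices, not the algorithm developed in the thesis.

```python
# Hypothetical sketch: flag Arabic-English cognate candidates by comparing
# consonant skeletons. Letter tables and threshold are assumptions.

# Arabic letter -> rough Latin consonant. Letters not listed (long vowels,
# glides, hamza forms, ta marbuta, ayn, diacritics) are dropped.
AR_TO_LATIN = {
    "ب": "b", "ت": "t", "ث": "t", "ج": "j", "ح": "h", "خ": "k",
    "د": "d", "ذ": "d", "ر": "r", "ز": "z", "س": "s", "ش": "s",
    "ص": "s", "ض": "d", "ط": "t", "ظ": "z", "غ": "g", "ف": "f",
    "ق": "k", "ك": "k", "ل": "l", "م": "m", "ن": "n", "ه": "h",
}
EN_VOWELS = set("aeiouy")
# English letters that Arabic does not distinguish are merged (assumption).
EN_MERGE = {"c": "k", "q": "k", "p": "b", "v": "f", "x": "ks"}
EN_DIGRAPHS = (("ph", "f"), ("th", "t"), ("sh", "s"), ("ch", "s"))


def collapse_repeats(s: str) -> str:
    """Merge runs of the same letter, e.g. 'kff' -> 'kf'."""
    out = []
    for ch in s:
        if not out or out[-1] != ch:
            out.append(ch)
    return "".join(out)


def english_skeleton(word: str) -> str:
    w = word.lower()
    for digraph, repl in EN_DIGRAPHS:
        w = w.replace(digraph, repl)
    kept = [EN_MERGE.get(ch, ch) for ch in w
            if ch.isalpha() and ch not in EN_VOWELS]
    return collapse_repeats("".join(kept))


def arabic_skeleton(word: str) -> str:
    return collapse_repeats("".join(AR_TO_LATIN.get(ch, "") for ch in word))


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance, one row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]


def cognate_score(english: str, arabic: str) -> float:
    """1.0 for identical skeletons, 0.0 for completely different ones."""
    e, a = english_skeleton(english), arabic_skeleton(arabic)
    longest = max(len(e), len(a))
    if longest == 0:
        return 0.0
    return 1.0 - levenshtein(e, a) / longest


def find_cognates(pairs, threshold=0.75):
    """Keep the (English, Arabic) lesson pairs that look like cognates."""
    return [(en, ar) for en, ar in pairs if cognate_score(en, ar) >= threshold]
```

Here computer and كمبيوتر both reduce to kmbtr and score 1.0, while book and كتاب (bk versus ktb) score 0.0. A pair such as sugar and سكر scores about 0.67 because g and k are kept apart, so the letter classes and the threshold would need tuning on real lesson data.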

    Effective retrieval techniques for Arabic text

    Arabic is a major international language, spoken in more than 23 countries, and the lingua franca of the Islamic world. The number of Arabic-speaking Internet users in the Middle East grew more than nine-fold between 2000 and 2007, yet research in Arabic Information Retrieval (AIR) has not advanced as far as in other languages such as English. In this thesis, we explore techniques that improve the performance of AIR systems. Stemming is considered one of the most important factors in the retrieval effectiveness of AIR systems. Most current stemmers remove affixes without checking whether the removed letters are actually affixes. We propose lexicon-based improvements to light stemming that distinguish core letters from genuine Arabic affixes. We devise rules to stem most affixes and show their effects on retrieval effectiveness. Using the TREC 2001 test collection, we show that applying relevance feedback with our rules produces significantly better results than light stemming. Techniques for Arabic information retrieval have been studied in depth on clean collections of newswire dispatches. However, their effectiveness is unknown on noisy collections, in which text is generated by automatic speech recognition (ASR) systems and queries are generated by machine translation (MT). Using noisy collections, we show that normalisation, stopping, and light stemming improve results, as they do on ordinary text collections, but that n-grams and root stemming decrease performance. Most recent AIR research has used collections far smaller than those used for English text retrieval; consequently, the significance of some published results is debatable. Using the LDC Arabic GigaWord collection, which contains more than 1,500,000 documents, we create a test collection of about 90 topics with their relevance judgements.
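The idea of checking removed letters against a lexicon can be sketched as follows. This is a minimal illustration under stated assumptions: the affix lists resemble those of common Arabic light stemmers but are chosen here for illustration, and light_stem, candidate_stems and lexicon_stem are hypothetical names, not the rules devised in the thesis.

```python
# Hypothetical sketch of lexicon-checked light stemming. Affix lists,
# minimum stem length and tie-breaking rule are illustrative assumptions.

PREFIXES = ("وال", "بال", "كال", "فال", "ال", "لل", "و")  # longest first
SUFFIXES = ("ها", "ان", "ات", "ون", "ين", "يه", "ية", "ه", "ة", "ي")
MIN_STEM = 2  # never leave fewer than two letters


def light_stem(word: str) -> str:
    """Plain light stemming: strip one prefix and one suffix, unchecked."""
    for p in PREFIXES:
        if word.startswith(p) and len(word) - len(p) >= MIN_STEM:
            word = word[len(p):]
            break
    for s in SUFFIXES:
        if word.endswith(s) and len(word) - len(s) >= MIN_STEM:
            word = word[:-len(s)]
            break
    return word


def candidate_stems(word: str) -> list[str]:
    """Every form obtained by removing at most one prefix and one suffix."""
    forms = []
    for p in ("",) + PREFIXES:
        if not word.startswith(p):
            continue
        base = word[len(p):]
        for s in ("",) + SUFFIXES:
            if base.endswith(s) and len(base) - len(s) >= MIN_STEM:
                forms.append(base[:len(base) - len(s)])
    return forms


def lexicon_stem(word: str, lexicon: set[str]) -> str:
    """Prefer the longest candidate found in the lexicon, so letters of a
    known stem are not mistaken for affixes; otherwise fall back to plain
    light stemming."""
    known = [f for f in candidate_stems(word) if f in lexicon]
    return max(known, key=len) if known else light_stem(word)
```

With the lexicon {"والد"} ("father"), light_stem("والدها") ("her father") strips وال as if it were a prefix and returns the non-word دها, whereas lexicon_stem keeps those core letters and returns والد. For an ordinary prefixed word such as الكتاب, both functions return كتاب.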
Using this test collection, we show empirically that, for a large collection, root stemming is not competitive. Of the approaches we studied, lexicon-based stemming performs better than light stemming alone. Arabic text commonly includes foreign words transliterated into Arabic script. Several transliterated forms may be in common use for a single foreign word, but users rarely search with more than one variant. We test the effectiveness of lexicons, Arabic patterns, and n-grams in distinguishing foreign words from native Arabic words. We introduce rules that help filter foreign words and improve the n-gram approach used in language identification. Our combined n-gram and lexicon approach identifies 80% of all foreign words with a precision of 93%. To find variants of a specific foreign word, we apply phonetic and string-similarity techniques and introduce novel algorithms to normalise these variants in Arabic text. We adapt phonetic techniques designed for English to suit Arabic, and compare several techniques to determine how effectively they find foreign-word variants. We show that our algorithms significantly improve recall. We also show that expanding queries with variants identified by our Soutex4 phonetic algorithm significantly improves both precision and recall. Together, the approaches described in this thesis represent an important step towards highly effective retrieval of Arabic text.
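Grouping transliteration variants with a phonetic key can be illustrated with a Soundex-style sketch. It is not Soutex4: the letter groups below are an assumption that merges Arabic letters commonly used for the same foreign sound (for example, Google is written جوجل, غوغل or قوقل), and phonetic_key, group_variants and expand_query are hypothetical helpers.

```python
# Hypothetical Soundex-style key for Arabic-script transliterations of
# foreign words. The letter groups are an assumption, not Soutex4.
from collections import defaultdict

_GROUPS = {
    "1": "بفڤپ",        # labials (b, f, v, p)
    "2": "جكقغخسشصز",   # velars and sibilants (g/j, k, q, kh, s, sh, z)
    "3": "تطدضثذظ",     # dentals (t, d, th)
    "4": "ل",           # l
    "5": "من",          # nasals (m, n)
    "6": "ر",           # r
}
CODES = {ch: code for code, letters in _GROUPS.items() for ch in letters}


def phonetic_key(word: str) -> str:
    """Map letters to group digits, skip unlisted letters (e.g. ا و ي ة ء ع
    ح ه and diacritics), and merge repeated digits, including repeats
    separated only by skipped letters."""
    key = []
    for ch in word:
        digit = CODES.get(ch, "")
        if digit and (not key or key[-1] != digit):
            key.append(digit)
    return "".join(key)


def group_variants(words: list[str]) -> dict[str, list[str]]:
    """Group words sharing a key; keep only keys with two or more spellings."""
    groups = defaultdict(list)
    for w in words:
        groups[phonetic_key(w)].append(w)
    return {k: v for k, v in groups.items() if len(v) > 1}


def expand_query(term: str, vocabulary: list[str]) -> list[str]:
    """Add every vocabulary word whose key matches the query term's key."""
    key = phonetic_key(term)
    return [term] + [w for w in vocabulary
                     if w != term and phonetic_key(w) == key]
```

Under these groups, all three spellings of Google share the key 24, so expand_query("جوجل", vocabulary) adds the other two spellings to the query. Because native words can also collide (ج and ك share a group here), a real system would presumably apply such a key only to words already identified as foreign by the identification step described above.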