Computer assisted language learning for Arabic using cognates
People around the world speak many different languages, some of which date back hundreds or thousands of years. These languages arose in different ways, and some descended from other languages. Over time, some languages spread to become global languages while others died out. Many languages have also undergone the changes that naturally affect any language over time, and new languages have emerged from older ones. As a result, there are substantial similarities between natural languages. The degree of similarity varies, and in some cases more than half of a language's words are considered to be derived from other languages. The use of computers has expanded greatly in recent years and now includes learning and teaching living languages. English has received a large share of this attention, while Arabic has received only a small part of it. In the past few years, many programs and applications have been developed to teach languages, including Arabic, through lessons, examples, competitions, questions and answers, and educational games. However, no program automatically detects words that are similar in pronunciation and meaning between Arabic and other languages and uses them in language teaching. Such detection can make learning faster, easier, more effective and more enjoyable, because learners find it easier to remember words whose pronunciation resembles words in their mother tongue. This research aims to use computers and information technology to accelerate the learning and teaching of Arabic for non-native speakers (with English as the model language in this research) by focusing on the similar words (cognates) between Arabic and English. Through the program, the learner will be able to create and add lessons, and the cognates will then be extracted automatically by computer. The analytical approach will be used to study the methods and algorithms used to find similarities between different languages, and then to develop a new algorithm to find similarities between Arabic and English. This method is considered one of the Natural Language Processing techniques.
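As a rough illustration of the kind of automatic cognate extraction this abstract describes, the sketch below compares consonant skeletons of Arabic and English words using a normalised edit distance. It is not the algorithm developed in the thesis. The romanisation table `ROMAN`, the helper names, the 0.6 threshold, and the assumption that each lesson already supplies Arabic/English translation pairs are all hypothetical choices made for this example.

```python
# Minimal cognate-detection sketch (illustrative assumptions throughout).
VOWELS = set("aeiou")

# Hypothetical, deliberately tiny romanisation table; a real system would
# need to cover the whole Arabic alphabet and handle diacritics.
ROMAN = {"س": "s", "ك": "k", "ر": "r", "ق": "q",
         "ط": "t", "ن": "n", "ب": "b", "ت": "t"}


def skeleton_ar(word):
    """Romanise the consonants of an Arabic word; unmapped letters are dropped."""
    return "".join(ROMAN.get(ch, "") for ch in word)


def skeleton_en(word):
    """Drop vowels and collapse doubled letters in an English word."""
    out = []
    prev = ""
    for ch in word.lower():
        if ch != prev and ch.isalpha() and ch not in VOWELS:
            out.append(ch)
        prev = ch
    return "".join(out)


def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def similarity(a, b):
    """Edit distance normalised to the range 0..1 (1 means identical)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 0.0
    return 1.0 - levenshtein(a, b) / longest


def find_cognates(pairs, threshold=0.6):
    """Return the (arabic, english, score) pairs whose skeletons are similar."""
    hits = []
    for ar, en in pairs:
        score = similarity(skeleton_ar(ar), skeleton_en(en))
        if score >= threshold:
            hits.append((ar, en, round(score, 2)))
    return hits


if __name__ == "__main__":
    lesson = [("سكر", "sugar"), ("قطن", "cotton"), ("كتاب", "book")]
    for ar, en, score in find_cognates(lesson):
        print(ar, en, score)
```

Comparing consonant skeletons is a design choice suited to unvocalised Arabic script, where short vowels are usually not written. A phonetic encoding could be substituted for the edit distance without changing the overall structure.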
Effective retrieval techniques for Arabic text
Arabic is a major international language, spoken in more than 23 countries, and the lingua franca of the Islamic world. The number of Arabic-speaking Internet users in the Middle East grew more than nine-fold between 2000 and 2007, yet research in Arabic Information Retrieval (AIR) has not advanced as far as research in other languages such as English. In this thesis, we explore techniques that improve the performance of AIR systems. Stemming is considered one of the most important factors in improving the retrieval effectiveness of AIR systems. Most current stemmers remove affixes without checking whether the removed letters are actually affixes. We propose lexicon-based improvements to light stemming that distinguish core letters from proper Arabic affixes. We devise rules to stem most affixes and show their effects on retrieval effectiveness. Using the TREC 2001 test collection, we show that applying relevance feedback with our rules produces significantly better results than light stemming. Techniques for Arabic information retrieval have been studied in depth on clean collections of newswire dispatches. However, the effectiveness of such techniques is not known on noisy collections in which text is generated using automatic speech recognition (ASR) systems and queries are generated using machine translation (MT). Using noisy collections, we show that normalisation, stopping and light stemming improve results as in normal text collections but that n-grams and root stemming decrease performance. Most recent AIR research has been undertaken using collections that are far smaller than the collections used for English text retrieval; consequently, the significance of some published results is debatable. Using the LDC Arabic GigaWord collection that contains more than 1 500 000 documents, we create a test collection of 90 topics with their relevance judgements. Using this test collection, we show empirically that for a large collection, root stemming is not competitive.
Of the approaches we have studied, lexicon-based stemming approaches perform better than light stemming approaches alone. Arabic text commonly includes foreign words transliterated into Arabic characters. Several transliterated forms may be in common use for a single foreign word, but users rarely use more than one variant during search tasks. We test the effectiveness of lexicons, Arabic patterns, and n-grams in distinguishing foreign words from native Arabic words. We introduce rules that help filter foreign words and improve the n-gram approach used in language identification. Our combined n-grams and lexicon approach successfully identifies 80% of all foreign words with a precision of 93%. To find variants of a specific foreign word, we apply phonetic and string similarity techniques and introduce novel algorithms to normalise them in Arabic text. We modify phonetic techniques used for English to suit the Arabic language, and compare several techniques to determine their effectiveness in finding foreign word variants. We show that our algorithms significantly improve recall. We also show that expanding queries using variants identified by our Soutex4 phonetic algorithm results in a significant improvement in precision and recall. Together, the approaches described in this thesis represent an important step towards realising highly effective retrieval of Arabic text.
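The contrast this abstract draws between blind light stemming and lexicon-checked stemming can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's rule set: the affix lists, the toy `LEXICON`, the function names, and the fallback to plain light stemming when no lexicon entry matches are all hypothetical.

```python
# Lexicon-guarded light stemming sketch (illustrative assumptions throughout).

# Illustrative affix lists, longest first; real light stemmers use larger sets.
PREFIXES = ("وال", "بال", "كال", "فال", "ال", "و")
SUFFIXES = ("ات", "ون", "ين", "ها", "ية", "ة", "ي")

# Hypothetical toy lexicon of known stems: "book" and "boy".
LEXICON = {"كتاب", "ولد"}


def light_stem(word, min_len=2):
    """Blindly strip one prefix and one suffix, whether or not they are real affixes."""
    for p in PREFIXES:
        if word.startswith(p) and len(word) - len(p) >= min_len:
            word = word[len(p):]
            break
    for s in SUFFIXES:
        if word.endswith(s) and len(word) - len(s) >= min_len:
            word = word[:-len(s)]
            break
    return word


def lexicon_stem(word, lexicon):
    """Strip affixes only when what remains is a known lexicon entry."""
    if word in lexicon:
        # The leading or trailing letters are core letters, not affixes.
        return word
    candidates = []
    for p in ("",) + PREFIXES:
        if not word.startswith(p):
            continue
        rest = word[len(p):]
        for s in ("",) + SUFFIXES:
            if not rest.endswith(s):
                continue
            core = rest[:len(rest) - len(s)]
            if core in lexicon:
                candidates.append(core)
    if candidates:
        # Prefer the candidate that removes the fewest letters.
        return max(candidates, key=len)
    # Assumption: fall back to plain light stemming for unknown words.
    return light_stem(word)


if __name__ == "__main__":
    for w in ("ولد", "والكتاب", "كتابها"):
        print(w, light_stem(w), lexicon_stem(w, LEXICON))
```

In this sketch the word for "boy" begins with a letter that also serves as the conjunction prefix, so blind stripping damages it while the lexicon check leaves it intact.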