19,620 research outputs found

    An analysis of machine translation errors on the effectiveness of an Arabic-English QA system

    The aim of this paper is to investigate how much the effectiveness of a Question Answering (QA) system is affected by the performance of Machine Translation (MT) based question translation. Nearly 200 questions were selected from TREC QA tracks and run through a question answering system, which answered 42.6% of them correctly in a monolingual run. These questions were then manually translated from English into Arabic, translated back into English using an MT system, and re-applied to the QA system, which answered only 10.2% of the translated questions correctly. An analysis of which kinds of translation error affected which questions concludes that factoid-type questions are less prone to translation error than others.
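The round-trip evaluation protocol described above can be sketched as follows. This is an illustrative outline only; `qa_system` and `translate` are placeholders for the actual systems used in the paper.

```python
def accuracy(answers, gold):
    """Fraction of answers that exactly match the gold answer."""
    correct = sum(1 for a, g in zip(answers, gold) if a == g)
    return correct / len(gold)

def round_trip_effect(questions, gold, qa_system, translate):
    """Compare QA accuracy on original questions vs. an EN->AR->EN round trip.

    qa_system: callable question -> answer (placeholder)
    translate: callable (text, src_lang, tgt_lang) -> text (placeholder)
    """
    mono = [qa_system(q) for q in questions]
    round_tripped = [qa_system(translate(translate(q, "en", "ar"), "ar", "en"))
                     for q in questions]
    return accuracy(mono, gold), accuracy(round_tripped, gold)
```

The gap between the two returned accuracies (42.6% vs. 10.2% in the paper) quantifies the translation-induced degradation.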

    EveTAR: Building a Large-Scale Multi-Task Test Collection over Arabic Tweets

    This article introduces a new language-independent approach for creating a large-scale high-quality test collection of tweets that supports multiple information retrieval (IR) tasks without running a shared-task campaign. The adopted approach (demonstrated over Arabic tweets) designs the collection around significant (i.e., popular) events, which enables the development of topics that represent frequent information needs of Twitter users for which rich content exists. That inherently facilitates the support of multiple tasks that generally revolve around events, namely event detection, ad-hoc search, timeline generation, and real-time summarization. The key highlights of the approach include diversifying the judgment pool via interactive search and multiple manually-crafted queries per topic, collecting high-quality annotations via crowd-workers for relevancy and in-house annotators for novelty, filtering out low-agreement topics and inaccessible tweets, and providing multiple subsets of the collection for better availability. Applying our methodology on Arabic tweets resulted in EveTAR, the first freely-available tweet test collection for multiple IR tasks. EveTAR includes a crawl of 355M Arabic tweets and covers 50 significant events for which about 62K tweets were judged with substantial average inter-annotator agreement (Kappa value of 0.71). We demonstrate the usability of EveTAR by evaluating existing algorithms in the respective tasks. Results indicate that the new collection can support reliable ranking of IR systems that is comparable to similar TREC collections, while providing strong baseline results for future studies over Arabic tweets.
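The agreement statistic reported for EveTAR (Kappa of 0.71) is Cohen's kappa, which corrects raw agreement for chance. A minimal self-contained computation from two annotators' label sequences, not the authors' exact tooling:

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two annotators over the same items."""
    assert len(labels_a) == len(labels_b)
    n = len(labels_a)
    # Observed agreement: fraction of items both annotators labeled identically.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected chance agreement from each annotator's marginal label distribution.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b.get(c, 0) for c in freq_a) / (n * n)
    return (observed - expected) / (1 - expected)
```

Values around 0.61-0.80 are conventionally described as "substantial" agreement, which matches the abstract's wording.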

    Optimal line length for reading schoolbook on screen

    Although experimental studies have shown a strong impact of text layout on the legibility of e-text, digital texts in eBooks and on the internet use many different designs, so the literature offers no straightforward answer on which one to follow when designing e-material. This paper therefore focuses on text layout, particularly the influence of line length on the reading performance of e-schoolbooks. 48 native Arabic students (24 male and 24 female), aged 9 to 13, volunteered for this experiment. Performance was assessed through two dependent variables: (1) time to complete each task; and (2) accuracy of the answers. Accuracy was based on the number of correct answers the students provided, out of a total score of 12 points. The experiment reported several findings: the time needed to complete all the question models was significantly lower for older students, and errors for all the question models were likewise significantly lower for older students. Comparing reading in a single column versus double columns showed that the reading process is affected by the students' age: older students read faster in double columns, while students aged 9 preferred the single column in both reading processes. The study recommends double columns for fast reading by students whose reading performance is satisfactory, while longer lines are suggested for students with reading difficulty.

    The Belebele Benchmark: a Parallel Reading Comprehension Dataset in 122 Language Variants

    We present Belebele, a multiple-choice machine reading comprehension (MRC) dataset spanning 122 language variants. Significantly expanding the language coverage of natural language understanding (NLU) benchmarks, this dataset enables the evaluation of text models in high-, medium-, and low-resource languages. Each question is based on a short passage from the Flores-200 dataset and has four multiple-choice answers. The questions were carefully curated to discriminate between models with different levels of general language comprehension. The English dataset on its own proves difficult enough to challenge state-of-the-art language models. Being fully parallel, this dataset enables direct comparison of model performance across all languages. We use this dataset to evaluate the capabilities of multilingual masked language models (MLMs) and large language models (LLMs). We present extensive results and find that despite significant cross-lingual transfer in English-centric LLMs, much smaller MLMs pretrained on balanced multilingual data still understand far more languages. We also observe that larger vocabulary size and conscious vocabulary construction correlate with better performance on low-resource languages. Overall, Belebele opens up new avenues for evaluating and analyzing the multilingual capabilities of NLP systems.
    Comment: 27 pages, 13 figures
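Because the dataset is fully parallel, one accuracy function applies uniformly to every language, making per-language scores directly comparable. A sketch of that evaluation loop, where `model` is a placeholder for any answer-selection function:

```python
def mc_accuracy(examples, model):
    """Accuracy on multiple-choice examples.

    examples: list of dicts with 'passage', 'question', 'choices',
              and 'answer' (the index of the correct choice).
    model: callable (passage, question, choices) -> chosen index (placeholder).
    """
    correct = sum(model(ex["passage"], ex["question"], ex["choices"]) == ex["answer"]
                  for ex in examples)
    return correct / len(examples)

def compare_languages(dataset, model):
    """dataset: mapping language -> parallel list of examples.
    Returns per-language accuracy, directly comparable because the
    questions are translations of one another."""
    return {lang: mc_accuracy(examples, model) for lang, examples in dataset.items()}
```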

    Readers reading practices of EFL Yemeni students: recommendations for the 21st century

    This paper investigates the reading practices of forty-five second-year EFL Yemeni undergraduate students using the Four Resources Model of multiliteracy practices. The Four Resources Model organizes reading into four key practices: code breaking, text participating, text using, and text analysing. Quantitative and qualitative methods, designed around the Four Resources Model constructs, were used to collect data from a sample of students studying English as a Foreign Language at a university in Yemen. Quantitative data was collected through a questionnaire, while qualitative data was gathered using semi-structured interviews guided by the research objectives. The findings reveal that the Yemeni students were medium users of the code-breaking and text-using practices, whereas the meaning-making and text-analysing practices saw low usage. On the whole, these early findings suggest that the reading practices and reading abilities of the Yemeni students are still limited even at the tertiary level and have not developed fully with regard to reading in English. This paper reports in detail the use of the Four Resources Model as a tool to determine reading efficacy while examining the aforementioned findings. Implications for the teaching of reading and its approaches in a Yemeni context are discussed, especially in view of the students' reading needs at the tertiary level in Yemen.

    AR2SPARQL: An Arabic Natural Language Interface for the Semantic Web

    With the growing interest in supporting the Arabic language on the Semantic Web (SW), there is an emerging need to enable Arab users to query ontologies and RDF stores without being challenged by the formal logic of the SW. For English, several efforts have provided Natural Language (NL) interfaces that enable ordinary users to query ontologies using NL queries. However, none of these efforts was designed to support the Arabic language, which has different morphological and semantic structures. As a step towards supporting Arabic Question Answering (QA) on the SW, this work presents AR2SPARQL, an NL interface that takes questions expressed in Arabic and returns answers drawn from an ontology-based knowledge base. The core of AR2SPARQL is the approach we propose to translate Arabic questions into triples, which are matched against RDF data to retrieve an answer. The system uses both linguistic and semantic features to resolve ambiguity when matching words to the ontology content. To overcome the limited support for Arabic Natural Language Processing (NLP), the system does not make intensive use of sophisticated linguistic methods. Instead, it relies more on the knowledge defined in the ontology and on the grammar rules we define to capture the structures of Arabic questions and to construct adequate RDF representations. AR2SPARQL has been tested with two different datasets, and results have shown that it achieves good retrieval performance in terms of precision and recall.
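The final stage of such a pipeline, serializing a parsed triple into a SPARQL query, can be illustrated as below. This is a sketch only; the prefix, entity, and property names are invented and not taken from the AR2SPARQL system.

```python
def triple_to_sparql(subject, prop, prefix="http://example.org/onto#"):
    """Build a SELECT query that retrieves the missing object of one
    (subject, property, ?answer) triple pattern.

    `subject` and `prop` are local names already matched against the
    ontology vocabulary (hypothetical examples here)."""
    return (
        f"PREFIX ex: <{prefix}>\n"
        "SELECT ?answer WHERE {\n"
        f"  ex:{subject} ex:{prop} ?answer .\n"
        "}"
    )
```

For instance, a question parsed into the triple (Jerusalem, locatedIn, ?x) would yield a query whose single triple pattern binds `?answer` to the matching RDF objects.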

    DEVELOPMENT OF MAHÂRAH AL-ISTIMÂ’ TEST INSTRUMENT FOR ELECTRONIC BASED ARABIC STUDENT USING THE KAHOOT! APPLICATION

    This study aimed to develop and assess the feasibility of a Mahârah al-Istimâ’ test instrument for electronic-based Arabic learning using the Kahoot! application at UIN Sunan Kalijaga Yogyakarta. The method used is the research and development (R&D) model of Borg and Gall, carried out through analysis, design, and testing. Eligibility is based on an expert validator test and a final operational field test. The study concludes that: 1) the test instrument was prepared from the 5 objectives or indicators of Mahmud Kamil an-Naqah into 50 questions in a product design using the Kahoot! application; 2) the expert validators rated the material quality at an average of 5.37 and the application's media quality at an average of 4.75 (very feasible); 3) in the main field test with 20 students, 6 questions (12%) were found not feasible because they were invalid and lacked a discrimination index; the author then revised them and re-tested the revised items with 70 students, with 100% valid results suitable for use.
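The item discrimination index mentioned above is commonly computed by comparing how often the top and bottom scorers answer an item correctly. A sketch using the standard upper/lower-group rule; this is a conventional formula, not necessarily the exact one used in the study:

```python
def discrimination_index(item_correct, total_scores, frac=0.27):
    """Discrimination index D for one test item.

    item_correct: per-student 1/0 correctness on this item.
    total_scores: per-student total test scores (same order).
    frac: fraction of students in each of the upper and lower groups
          (0.27 is the customary choice).
    Items with D near 0 or negative fail to discriminate and are flagged.
    """
    n = len(total_scores)
    k = max(1, round(n * frac))
    # Rank students by total score, descending.
    order = sorted(range(n), key=lambda i: total_scores[i], reverse=True)
    upper, lower = order[:k], order[-k:]
    p_upper = sum(item_correct[i] for i in upper) / k
    p_lower = sum(item_correct[i] for i in lower) / k
    return p_upper - p_lower
```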

    Evaluating the retrieval effectiveness of Web search engines using a representative query sample

    Search engine retrieval effectiveness studies are usually small-scale, using only limited query samples, and the queries are selected by the researchers. We address these issues by taking a random representative sample of 1,000 informational and 1,000 navigational queries from a major German search engine and comparing Google's and Bing's results on this sample. Jurors were found through crowdsourcing, and data was collected using specialised software, the Relevance Assessment Tool (RAT). We found that while Google outperforms Bing on both query types, the difference in performance on informational queries was rather low. However, for navigational queries, Google found the correct answer in 95.3 per cent of cases whereas Bing found it only 76.6 per cent of the time. We conclude that search engine performance on navigational queries is of great importance, as users can clearly identify whether such queries have returned the correct result. Performance on this query type may therefore contribute to explaining user satisfaction with search engines.
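The navigational success rate used above is binary per query: each navigational query has exactly one correct target page, and the engine either retrieves it or not. A minimal sketch with placeholder data structures (the study's actual tooling was the RAT software):

```python
def navigational_success_rate(results, correct_urls):
    """Fraction of navigational queries whose single correct URL appears
    in the engine's result list.

    results: query -> list of returned result URLs.
    correct_urls: query -> the one correct URL for that query.
    """
    hits = sum(1 for query, urls in results.items() if correct_urls[query] in urls)
    return hits / len(results)
```

Applied to each engine's result lists over the same 1,000-query sample, this yields figures directly comparable to the 95.3 vs. 76.6 per cent reported above.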

    Parallel corpus multi-stream question answering with applications to the Qur'an

    Question-Answering (QA) is an important research area concerned with developing an automated process that answers questions posed by humans in a natural language. QA is a shared task for the Information Retrieval (IR), Information Extraction (IE), and Natural Language Processing (NLP) communities. A technical review of different QA system models and methodologies reveals that a typical QA system consists of different components to accept a natural language question from a user and deliver its answer(s) back to the user. Existing systems have usually been aimed at structured/unstructured data collected from everyday English text, i.e. text collected from television programmes, news wires, conversations, novels and other similar genres. Despite all up-to-date research in the subject area, a notable fact is that none of the existing QA systems has been tested on a parallel corpus of religious text with the aim of question answering. Religious text has peculiar characteristics and features which make it more challenging for traditional QA methods than other kinds of text. This thesis proposes the PARMS (Parallel Corpus Multi Stream) Methodology: a novel method applying existing advanced IR techniques, combining them with NLP methods and additional semantic knowledge to implement QA for a parallel corpus. A parallel corpus involves the use of multiple forms of the same corpus where each form differs from the others in a certain aspect, e.g. translations of a scripture from one language to another by different translators. Additional semantic knowledge can be regarded as a stream of information related to a corpus. PARMS uses multiple streams of semantic knowledge, including a general ontology (WordNet) and domain-specific ontologies (QurTerms, QurAna, QurSim). This additional knowledge has been used in embedded form for Query Expansion, Corpus Enrichment and Answer Ranking.
The PARMS Methodology has wider applications. This thesis applies it to the Quran, the core text of Islam, as a first case study. The PARMS Method uses a parallel corpus comprising ten different English translations of the Quran. An individual Quranic verse is treated as an answer to questions asked in a natural language, English. This thesis also implements a PARMS QA Application as a proof of concept for the PARMS Methodology. The PARMS Methodology aims to evaluate the range of semantic knowledge streams separately and in combination, and also to evaluate alternative subsets of the data source: QA from one stream versus the parallel corpus. Results show that the use of a parallel corpus and multiple streams of semantic knowledge has obvious advantages. To the best of my knowledge, this method is developed for the first time and is expected to serve as a benchmark for further research in this area.
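The Query Expansion step described above, enriching each query term with synonyms drawn from a semantic resource, can be sketched as follows. A plain dictionary stands in for the WordNet/ontology lookup; this is an illustration of the general technique, not the thesis's exact implementation.

```python
def expand_query(terms, synonyms):
    """Expand query terms with synonyms from a semantic resource.

    terms: the original query terms, in order.
    synonyms: term -> list of related terms (here a dict standing in
              for a WordNet or domain-ontology lookup).
    Returns the original terms plus their synonyms, deduplicated in order,
    so the expanded query retrieves verses that use alternative wording.
    """
    expanded = []
    for term in terms:
        for candidate in [term, *synonyms.get(term, [])]:
            if candidate not in expanded:
                expanded.append(candidate)
    return expanded
```

Across ten parallel translations, this matters because different translators render the same concept with different English words; expansion lets one query reach all of them.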
