19,620 research outputs found
An analysis of machine translation errors on the effectiveness of an Arabic-English QA system
The aim of this paper is to investigate
how much the effectiveness of a Question
Answering (QA) system was affected
by the performance of Machine
Translation (MT) based question translation.
Nearly 200 questions were selected
from TREC QA tracks and run through a
question answering system. It was able to
answer 42.6% of the questions correctly
in a monolingual run. These questions
were then translated manually from English
into Arabic and back into English using
an MT system, and then re-applied to
the QA system. The system was able to
answer 10.2% of the translated questions.
An analysis of what sort of translation error
affected which questions was conducted,
concluding that factoid-type
questions are less prone to translation errors
than others
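The round-trip protocol described above can be sketched as follows; the toy QA system, questions, and gold answers here are invented stand-ins, not the TREC data or MT system used in the paper.

```python
# Minimal sketch of round-trip MT evaluation: score a QA system on the
# original questions, then on versions degraded by (simulated) Arabic
# round-trip machine translation, and compare accuracies.

def accuracy(qa_system, questions, gold_answers):
    """Fraction of questions the system answers correctly."""
    hits = sum(qa_system(q) == g for q, g in zip(questions, gold_answers))
    return hits / len(questions)

def toy_qa(question):
    # Stand-in for a real QA system: exact-match lookup (hypothetical).
    knowledge = {
        "what is the capital of france": "Paris",
        "who wrote hamlet": "Shakespeare",
    }
    return knowledge.get(question.lower().rstrip("?"), None)

original  = ["What is the capital of France?", "Who wrote Hamlet?"]
roundtrip = ["What the capital France is?",    "Who wrote Hamlet?"]  # simulated MT noise
gold      = ["Paris", "Shakespeare"]

print(accuracy(toy_qa, original, gold))   # 1.0
print(accuracy(toy_qa, roundtrip, gold))  # 0.5
```

The gap between the two numbers is the quantity the paper measures (42.6% monolingual vs. 10.2% after translation).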
EveTAR: Building a Large-Scale Multi-Task Test Collection over Arabic Tweets
This article introduces a new language-independent approach for creating a
large-scale high-quality test collection of tweets that supports multiple
information retrieval (IR) tasks without running a shared-task campaign. The
adopted approach (demonstrated over Arabic tweets) designs the collection
around significant (i.e., popular) events, which enables the development of
topics that represent frequent information needs of Twitter users for which
rich content exists. That inherently facilitates the support of multiple tasks
that generally revolve around events, namely event detection, ad-hoc search,
timeline generation, and real-time summarization. The key highlights of the
approach include diversifying the judgment pool via interactive search and
multiple manually-crafted queries per topic, collecting high-quality
annotations via crowd-workers for relevancy and in-house annotators for
novelty, filtering out low-agreement topics and inaccessible tweets, and
providing multiple subsets of the collection for better availability. Applying
our methodology on Arabic tweets resulted in EveTAR, the first
freely-available tweet test collection for multiple IR tasks. EveTAR includes a
crawl of 355M Arabic tweets and covers 50 significant events for which about
62K tweets were judged with substantial average inter-annotator agreement
(Kappa value of 0.71). We demonstrate the usability of EveTAR by evaluating
existing algorithms in the respective tasks. Results indicate that the new
collection can support reliable ranking of IR systems that is comparable to
similar TREC collections, while providing strong baseline results for future
studies over Arabic tweets
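The agreement figure reported above is Cohen's kappa; a small sketch of how such a value is computed from two annotators' relevance labels follows. The labels are illustrative, not drawn from EveTAR.

```python
# Cohen's kappa: chance-corrected agreement between two annotators'
# labels, as used to report the 0.71 inter-annotator agreement.
from collections import Counter

def cohens_kappa(a, b):
    """Kappa between two equal-length label sequences."""
    assert len(a) == len(b)
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n          # raw agreement
    ca, cb = Counter(a), Counter(b)
    expected = sum(ca[k] * cb[k] for k in set(a) | set(b)) / (n * n)
    return (observed - expected) / (1 - expected)

ann1 = ["rel", "rel", "not", "rel", "not", "not", "rel", "not"]
ann2 = ["rel", "rel", "not", "not", "not", "not", "rel", "rel"]
print(round(cohens_kappa(ann1, ann2), 2))  # 0.5
```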
Optimal line length for reading schoolbook on screen
Although experimental studies have shown a strong impact of text layout on the legibility of e-text, digital texts in eBooks or on the internet use many different designs, so the literature offers no straightforward answer on which layout to follow when designing e-material. This paper therefore focuses on text layout, particularly the influence of line length on the reading performance of e-schoolbooks. 48 native Arabic students (24 male and 24 female), aged 9 to 13, volunteered for this experiment. Student performance was assessed through two dependent variables: (1) time to complete each task; and (2) accuracy of the answers. Accuracy was based on the number of correct answers the students provided, out of a total score of 12 points. The experiment reported several findings: the time needed to complete all the question models was significantly lower for older students, and errors across all the question models were likewise significantly lower for older students. Comparing reading in a single column with reading in double columns showed that the reading process is affected by the students' age: older students were faster when reading through double columns, while students aged 9 preferred the single column in both reading processes. The study recommended the double-column (shorter-line) layout for fast reading by students whose reading performance is satisfactory, while a long line length is suggested for students with difficulty in reading
The Belebele Benchmark: a Parallel Reading Comprehension Dataset in 122 Language Variants
We present Belebele, a multiple-choice machine reading comprehension (MRC)
dataset spanning 122 language variants. Significantly expanding the language
coverage of natural language understanding (NLU) benchmarks, this dataset
enables the evaluation of text models in high-, medium-, and low-resource
languages. Each question is based on a short passage from the Flores-200
dataset and has four multiple-choice answers. The questions were carefully
curated to discriminate between models with different levels of general
language comprehension. The English dataset on its own proves difficult enough
to challenge state-of-the-art language models. Being fully parallel, this
dataset enables direct comparison of model performance across all languages. We
use this dataset to evaluate the capabilities of multilingual masked language
models (MLMs) and large language models (LLMs). We present extensive results
and find that despite significant cross-lingual transfer in English-centric
LLMs, much smaller MLMs pretrained on balanced multilingual data still
understand far more languages. We also observe that larger vocabulary size and
conscious vocabulary construction correlate with better performance on
low-resource languages. Overall, Belebele opens up new avenues for evaluating
and analyzing the multilingual capabilities of NLP systems. (Comment: 27 pages, 13 figures)
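Because the dataset is fully parallel, a single scorer against the shared gold answers yields directly comparable per-language accuracies. A minimal sketch, with invented predictions (not actual Belebele data or model outputs):

```python
# Scoring a fully parallel multiple-choice set: questions and gold
# answers are shared across languages, so per-language accuracies are
# directly comparable.

def mc_accuracy(predictions, gold):
    return sum(p == g for p, g in zip(predictions, gold)) / len(gold)

gold = ["A", "C", "B", "D"]
predictions = {
    "eng_Latn": ["A", "C", "B", "A"],  # hypothetical model outputs
    "arb_Arab": ["A", "B", "B", "A"],
}
scores = {lang: mc_accuracy(p, gold) for lang, p in predictions.items()}
print(scores)  # {'eng_Latn': 0.75, 'arb_Arab': 0.5}
```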
Readers' reading practices of EFL Yemeni students: recommendations for the 21st century
This paper investigates the reading practices of forty-five second-year EFL Yemeni undergraduate students using the Four Resources Model of multiliteracy practices. The Four Resources Model organizes reading practices into four key practices: code breaking, text participating, text using, and text analysing. Quantitative and qualitative methods, designed around the Four Resources Model constructs, were used to collect data from a sample of students studying English as a Foreign Language at a university in Yemen. Quantitative data was collected through a questionnaire, while qualitative data was gathered through semi-structured interviews guided by the research objectives. The findings reveal that the Yemeni students were medium users of the code-breaker and text-user practices, whereas the meaning-making and text-analysis practices were used at a low level. On the whole, these early findings suggest that the reading practices and reading abilities of the Yemeni students are still limited even at the tertiary level and have not developed fully with regard to reading in English. This paper reports in detail the use of the Four Resources Model as a tool to determine reading efficacy while examining the aforementioned findings. It also discusses the implications for the teaching of reading and its approaches in the Yemeni context, especially in view of the students' reading needs at the tertiary level in Yemen
AR2SPARQL: An Arabic Natural Language Interface for the Semantic Web
With the growing interest in supporting the Arabic language on the Semantic Web (SW), there is an emerging need to enable Arab users to query ontologies and RDF stores without being challenged by the formal logic of the SW. For the English language, several efforts have provided Natural Language (NL) interfaces to enable ordinary users to query ontologies using NL queries. However, none of these efforts were designed to support the Arabic language, which has different morphological and semantic structures.
As a step towards supporting Arabic Question Answering (QA) on the SW, this work presents AR2SPARQL, an NL interface that takes questions expressed in Arabic and returns answers drawn from an ontology-based knowledge base. The core of AR2SPARQL is the approach we propose to translate Arabic questions into triples, which are matched against RDF data to retrieve an answer. The system uses both linguistic and semantic features to resolve ambiguity when matching words to the ontology content. To overcome the limited support for Arabic Natural Language Processing (NLP), the system does not make intensive use of sophisticated linguistic methods. Instead, it relies more on the knowledge defined in the ontology and on the grammar rules we define to capture the structures of Arabic questions and to construct adequate RDF representations. AR2SPARQL has been tested with two different datasets, and results have shown that it achieves good retrieval performance in terms of precision and recall
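The final step the abstract describes, turning a matched triple into a query over RDF data, can be sketched as below. The URIs, the question-to-triple mapping, and the function name are all invented for illustration; they are not AR2SPARQL's actual implementation.

```python
# Sketch: once a question has been mapped to a (subject, predicate,
# object) pattern against the ontology, emit a SPARQL SELECT in which
# the unknown slot (None) becomes the projected variable.

def triple_to_sparql(subject, predicate, obj):
    var = "?answer"
    s = var if subject is None else f"<{subject}>"
    p = var if predicate is None else f"<{predicate}>"
    o = var if obj is None else f"<{obj}>"
    return f"SELECT {var} WHERE {{ {s} {p} {o} . }}"

# e.g. "Who wrote Hamlet?" -> (None, ex:wrote, ex:Hamlet)
q = triple_to_sparql(None, "http://example.org/wrote", "http://example.org/Hamlet")
print(q)
```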
DEVELOPMENT OF MAHĀRAH AL-ISTIMĀʿ TEST INSTRUMENT FOR ELECTRONIC-BASED ARABIC STUDENTS USING THE KAHOOT! APPLICATION
This study aimed to develop and assess the feasibility of the Mahārah al-Istimāʿ test instrument for electronic-based Arabic students using the Kahoot! application at UIN Sunan Kalijaga Yogyakarta. The method used in this research is the research-and-development model of Borg and Gall, covering analysis, design, and testing. Feasibility was established through an expert-validator test and a final operational field test. The results can be summarized as follows: 1) the test instrument was prepared around the 5 objectives or indicators of Mahmud Kamil an-Naqah, yielding 50 questions implemented in a product design using the Kahoot! application; 2) expert validation rated the material quality at an average of 5.37 and the media quality of the application at an average of 4.75 (very feasible); 3) in the main field test with 20 students, 6 questions (12%) proved unusable because they were invalid and lacked a discrimination index; the author then revised them and tested the revised version with 70 students, obtaining 100% valid results suitable for use
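A discrimination index of the kind used to flag the 6 unusable items is classically computed by comparing an item's correctness rate among the top and bottom scorers; a value near zero (or negative) marks an item that does not separate strong from weak students. A hedged sketch with invented data follows.

```python
# Classic upper/lower-group discrimination index for one test item.
# Scores and per-item results below are invented, not the study's data.

def discrimination_index(item_correct, total_scores, frac=0.27):
    """item_correct[i] is 1 if student i answered this item correctly;
    total_scores[i] is that student's overall test score."""
    n = max(1, round(len(total_scores) * frac))
    order = sorted(range(len(total_scores)), key=lambda i: total_scores[i])
    low, high = order[:n], order[-n:]
    p_high = sum(item_correct[i] for i in high) / n  # upper-group success rate
    p_low = sum(item_correct[i] for i in low) / n    # lower-group success rate
    return p_high - p_low

scores  = [3, 5, 6, 7, 8, 10, 11, 12, 13, 15]  # overall test scores
correct = [0, 0, 1, 0, 1, 1, 1, 1, 1, 1]       # this item, per student
print(round(discrimination_index(correct, scores), 2))  # 0.67
```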
Evaluating the retrieval effectiveness of Web search engines using a representative query sample
Search engine retrieval effectiveness studies are usually small-scale, using
only limited query samples. Furthermore, queries are selected by the
researchers. We address these issues by taking a random representative sample
of 1,000 informational and 1,000 navigational queries from a major German
search engine and comparing Google's and Bing's results based on this sample.
Jurors were found through crowdsourcing, data was collected using specialised
software, the Relevance Assessment Tool (RAT). We found that while Google
outperforms Bing in both query types, the difference in the performance for
informational queries was rather low. However, for navigational queries, Google
found the correct answer in 95.3 per cent of cases whereas Bing only found the
correct answer 76.6 per cent of the time. We conclude that search engine
performance on navigational queries is of great importance, as users in this
case can clearly identify queries that have returned correct results. So,
performance on this query type may contribute to explaining user satisfaction
with search engines
Parallel corpus multi stream question answering with applications to the Qur'an
Question Answering (QA) is an important research area concerned with developing an automated process that answers questions posed by humans in a natural language. QA is a shared task for the Information Retrieval (IR), Information Extraction (IE), and Natural Language Processing (NLP) communities. A technical review of different QA system models and methodologies reveals that a typical QA system consists of different components to accept a natural language question from a user and deliver its answer(s) back to the user. Existing systems have usually been aimed at structured/unstructured data collected from everyday English text, i.e. text collected from television programmes, news wires, conversations, novels and other similar genres. Despite all up-to-date research in the subject area, a notable fact is that none of the existing QA systems has been tested on a parallel corpus of religious text with the aim of question answering. Religious text has peculiar characteristics and features which make it more challenging for traditional QA methods than other kinds of text.
This thesis proposes the PARMS (Parallel Corpus Multi Stream) Methodology: a novel method applying existing advanced Information Retrieval (IR) techniques and combining them with Natural Language Processing (NLP) methods and additional semantic knowledge to implement Question Answering (QA) for a parallel corpus. A parallel corpus involves the use of multiple forms of the same corpus, where each form differs from the others in a certain aspect, e.g. translations of a scripture from one language to another by different translators. Additional semantic knowledge can be regarded as a stream of information related to a corpus. PARMS uses multiple streams of semantic knowledge, including a general ontology (WordNet) and domain-specific ontologies (QurTerms, QurAna, QurSim). This additional knowledge has been used in embedded form for Query Expansion, Corpus Enrichment and Answer Ranking.
The PARMS Methodology has wider applications. This thesis applies it to the Quran, the core text of Islam, as a first case study. The PARMS method uses a parallel corpus comprising ten different English translations of the Quran. An individual Quranic verse is treated as an answer to questions asked in a natural language, English. This thesis also implements a PARMS QA application as a proof of concept for the PARMS Methodology. The PARMS Methodology aims to evaluate the range of semantic knowledge streams separately and in combination, and also to evaluate alternative subsets of the data source: QA from one stream vs. the parallel corpus. Results show that the use of a parallel corpus and multiple streams of semantic knowledge has obvious advantages. To the best of my knowledge, this method has been developed for the first time, and it is expected to be a benchmark for further research in this area
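The query-expansion step driven by the general-ontology stream can be sketched as follows. A tiny hand-rolled synonym dictionary stands in for WordNet here, and the terms are invented; the actual PARMS system also draws on the domain ontologies (QurTerms, QurAna, QurSim), which this sketch does not model.

```python
# Sketch of ontology-driven query expansion: augment each query term
# with synonyms from a general ontology (a stand-in for WordNet).

SYNONYMS = {  # hypothetical stand-in for WordNet synsets
    "charity": ["almsgiving", "alms"],
    "fasting": ["abstinence"],
}

def expand_query(terms):
    expanded = list(terms)
    for t in terms:
        expanded.extend(SYNONYMS.get(t, []))
    return expanded

print(expand_query(["charity", "fasting"]))
# ['charity', 'fasting', 'almsgiving', 'alms', 'abstinence']
```

The expanded term list is then fed to the retrieval engine, increasing the chance that a relevant verse using a synonymous wording is matched.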
- …