Improved Arabic characters recognition by combining multiple machine learning classifiers.
In this paper, we investigate a range of strategies for combining multiple machine learning techniques for recognizing Arabic characters, where we are faced with imperfect and dimensionally variable input characters. Experimental results show that combined confidence-based backoff strategies can produce more accurate results than each technique produces by itself, and even more accurate results than the majority voting combination.
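The abstract above contrasts majority voting with confidence-based backoff but does not spell out either scheme. The following is a minimal sketch of the general idea only, not the authors' method: the function names, the 0.8 threshold, the priority ordering, and the sample predictions are all assumptions made for illustration.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the label chosen by the most classifiers (ties: first seen)."""
    labels = [label for label, _ in predictions]
    return Counter(labels).most_common(1)[0][0]


def confidence_backoff(predictions, threshold=0.8):
    """Try classifiers in priority order; back off while confidence is low.

    The threshold and the ordering are assumptions for this sketch.
    """
    for label, confidence in predictions:
        if confidence >= threshold:
            return label
    # No classifier was confident enough: use the most confident one.
    return max(predictions, key=lambda p: p[1])[0]


# Hypothetical (label, confidence) outputs of three classifiers
# for a single character image, listed in priority order.
predictions = [("ب", 0.55), ("ت", 0.93), ("ب", 0.60)]

voted = majority_vote(predictions)          # two weak votes win
backed_off = confidence_backoff(predictions)  # the one confident vote wins
```

In this made-up case the two schemes disagree: voting follows the two low-confidence classifiers, while backoff follows the single high-confidence one.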
Evaluation of Combining Data-Driven Dependency Parsers for Arabic
In recent years, there has been considerable interest in dependency parsing, for several reasons. First, dependency-based syntactic representations seem to be effective in many areas of NLP, such as machine translation, question answering, and relation extraction, thanks to their transparent encoding of predicate-argument structure. Second, dependency parsing is flexible enough to handle free word order languages (e.g. Arabic and Czech). Third, and most importantly, the dependency-based approach has led to the development of fast, robust, and reasonably accurate syntactic parsers for a number of languages. In this paper, we investigate the technique of combining multiple data-driven dependency parsers for parsing Arabic. Arabic has a number of characteristics, described throughout the paper, that make parsing it challenging. Experimental results show that combined parsers can produce more accurate results, even for imperfectly tagged text, than each parser produces by itself for texts with the gold-standard tags.
Optimal k-means clustering using artificial bee colony algorithm with variable food sources length
Clustering is a robust machine learning task that involves dividing data points into a set of groups with similar traits. One of the most widely used methods for this task is the k-means clustering algorithm, owing to its simplicity and effectiveness. However, this algorithm suffers from the problem of predicting the number and coordinates of the initial clustering centers. In this paper, a method based on the first artificial bee colony algorithm with variable-length individuals is proposed to overcome the limitations of the k-means algorithm. The proposed technique automatically predicts the number of clusters (the value of k) and determines the most suitable coordinates for the initial clustering centers, instead of requiring them to be preset manually. The results were encouraging: the proposed algorithm outperforms the traditional k-means algorithm on all three real-life clustering datasets tested.
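The sensitivity of k-means to its initial centers, which the abstract above names as the motivation, can be seen in a small sketch. This is plain Lloyd's k-means on made-up 1-D data, not the artificial bee colony method the paper proposes; the function names, the data, and both sets of starting centers are assumptions for illustration.

```python
def kmeans_1d(points, centers, iterations=20):
    """Plain Lloyd's k-means on 1-D data, starting from the given centers."""
    centers = list(centers)
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for p in points:
            # Assign each point to its nearest current center.
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # Move each center to its cluster mean; leave empty clusters in place.
        centers = [
            sum(c) / len(c) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
    return centers


def sse(points, centers):
    """Sum of squared distances from each point to its nearest center."""
    return sum(min((p - c) ** 2 for c in centers) for p in points)


# Three well-separated groups of made-up points.
points = [1.0, 2.0, 3.0, 11.0, 12.0, 13.0, 21.0, 22.0, 23.0]

spread_start = kmeans_1d(points, [0.0, 10.0, 20.0])  # one start per group
crowded_start = kmeans_1d(points, [1.0, 2.0, 3.0])   # all starts in one group
```

With k fixed at 3 in both runs, the crowded start settles in a poorer local optimum with a higher squared error than the spread start, which is the kind of outcome an automatic choice of k and initial centers is meant to avoid.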
A Dataset for Arabic Textual Entailment
There are fewer resources for textual entailment (TE) for Arabic than for other languages, and the manpower for constructing such a resource is hard to come by. We describe here a semi-automatic technique for creating a first dataset for TE systems for Arabic using an extension of the ‘headline-lead paragraph’ technique. We also sketch the difficulties inherent in volunteer annotators-based judgment, and describe a regime to ameliorate some of these.
