A Combined CNN and LSTM Model for Arabic Sentiment Analysis
Deep neural networks have shown good data modelling capabilities when dealing
with challenging and large datasets from a wide range of application areas.
Convolutional Neural Networks (CNNs) offer advantages in selecting good
features, and Long Short-Term Memory (LSTM) networks have proven effective at
learning sequential data. Both approaches have been reported to provide
improved results in areas such as image processing, voice recognition, language
translation and other Natural Language Processing (NLP) tasks. Sentiment
classification for short text messages from Twitter is a challenging task, and
the complexity increases for Arabic language sentiment classification tasks
because Arabic is a morphologically rich language. In addition, the limited
availability of accurate pre-processing tools for Arabic is another current
limitation, along with the limited research available in this area. In this
paper, we investigate the benefits of integrating CNNs and LSTMs and report
improved accuracy for Arabic sentiment analysis on different datasets.
Additionally, we seek to consider the morphological diversity of particular
Arabic words by using different sentiment classification levels.
Comment: Authors' accepted version of submission for CD-MAKE 201
Clustering Web Search Results For Effective Arabic Language Browsing
Browsing search results is one of the major problems with
traditional Web search engines for English, European, and other languages
in general, and for Arabic in particular. The process is
time-consuming and the browsing style is unattractive. Organizing Web
search results into clusters helps users browse the results
quickly. Traditional clustering techniques (data-centric clustering algorithms)
are inadequate because they do not generate clusters with highly readable names or
cluster labels. To solve this problem, description-centric algorithms such as
the Suffix Tree Clustering (STC) algorithm have been introduced and used
successfully and extensively, in different adapted versions, for English,
European, and Chinese languages. However, to our knowledge, at the time of writing
the STC algorithm has never been applied to clustering Arabic Web snippet
search results. In this paper, we first study how STC can
be applied to Arabic. We then illustrate by example that it is
impossible to apply STC after pre-processing Arabic snippets (stem or root
extraction) because the merging process yields many redundant clusters.
Secondly, to overcome this problem, we propose to integrate STC into a new scheme
that takes the properties of the Arabic language into account, in order to make the web
better adapted to Arabic users. The proposed approach automatically clusters
web search results into high-quality clusters with highly significant labels.
The resulting clusters are not only coherent but also convey their contents
to users concisely and accurately. Arabic users can therefore decide at
a glance whether the contents of a cluster are of interest....
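
The base-cluster and merging steps that the abstract refers to can be sketched roughly as follows. This is a simplified illustration, not the authors' scheme: it enumerates word n-grams instead of building a real suffix tree, and the function names, the `max_len` parameter, and the 0.5 overlap threshold are assumptions made for the example.

```python
from itertools import combinations


def base_clusters(snippets, max_len=3):
    """Map each shared word n-gram to the set of snippet ids containing it.

    Assumption: n-grams stand in for suffix-tree nodes to keep the sketch short.
    """
    phrases = {}
    for doc_id, text in enumerate(snippets):
        words = text.lower().split()
        for n in range(1, max_len + 1):
            for i in range(len(words) - n + 1):
                phrases.setdefault(tuple(words[i:i + n]), set()).add(doc_id)
    # A base cluster needs a phrase shared by at least two snippets.
    return {p: docs for p, docs in phrases.items() if len(docs) >= 2}


def merge_clusters(clusters, threshold=0.5):
    """Merge base clusters whose snippet sets overlap strongly in both directions."""
    labels = list(clusters)
    parent = {p: p for p in labels}

    def find(p):
        # Union-find lookup with path halving.
        while parent[p] != p:
            parent[p] = parent[parent[p]]
            p = parent[p]
        return p

    for a, b in combinations(labels, 2):
        shared = len(clusters[a] & clusters[b])
        if (shared / len(clusters[a]) > threshold
                and shared / len(clusters[b]) > threshold):
            parent[find(a)] = find(b)

    groups = {}
    for p in labels:
        groups.setdefault(find(p), []).append(p)
    # Each merged cluster: (set of label phrases, set of snippet ids).
    return [
        ({" ".join(p) for p in group},
         set().union(*(clusters[p] for p in group)))
        for group in groups.values()
    ]
```

In this sketch every phrase shared by two or more snippets becomes a candidate label, so any pre-processing that changes how many phrases snippets share also changes how many base clusters enter the merging step.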
Arabic Language Sentiment Analysis on Health Services
The social media network phenomenon leads to a massive amount of valuable
data that is available online and easy to access. Many users share images,
videos, comments, reviews, news and opinions on different social networks
sites, with Twitter being one of the most popular ones. Data collected from
Twitter is highly unstructured, and extracting useful information from tweets
is a challenging task. Twitter has a huge number of Arabic users who mostly
write their tweets in Arabic. While there has been a
lot of research on sentiment analysis in English, the amount of research and
the number of datasets for Arabic are limited. This paper introduces an Arabic
language dataset which is about opinions on health services and has been
collected from Twitter. The paper will first detail the process of collecting
the data from Twitter and also the process of filtering, pre-processing and
annotating the Arabic text in order to build a large sentiment analysis dataset
in Arabic. Several Machine Learning algorithms (Naive Bayes, Support Vector
Machine and Logistic Regression) alongside Deep and Convolutional Neural
Networks were used in our sentiment analysis experiments on our health
dataset.
Comment: Authors' accepted version of submission for ASAR 201
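
As an illustration of the simplest classifier family named in the abstract, here is a minimal multinomial Naive Bayes sketch with Laplace smoothing. It is not the authors' experimental setup: the class name, the `alpha` parameter, and whitespace tokenization are assumptions, and real input would be pre-processed Arabic tweets rather than plain whitespace-separated tokens.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial Naive Bayes over whitespace tokens (illustrative only)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # Laplace smoothing constant (assumed value)

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(doc.split())
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, doc):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + self.alpha * len(self.vocab)
            # Log prior plus smoothed log likelihood of each known token.
            score = math.log(n_docs / total_docs)
            for w in doc.split():
                if w in self.vocab:  # tokens unseen in training are skipped
                    score += math.log((counts[w] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```

A model is trained with `NaiveBayes().fit(docs, labels)` on parallel lists of texts and sentiment labels, and `predict(text)` returns the label with the highest smoothed log-probability.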
Hybrid Arabic–French machine translation using syntactic re-ordering and morphological pre-processing
This is an accepted manuscript of an article published by Elsevier BV in Computer Speech & Language on 08/11/2014, available online: https://doi.org/10.1016/j.csl.2014.10.007
The accepted version of the publication may differ from the final published version.
Arabic is a highly inflected, morpho-syntactically complex language that differs in many ways from several heavily studied languages. It may thus require good pre-processing, as it presents significant challenges for Natural Language Processing (NLP), specifically for Machine Translation (MT). This paper aims to examine how Statistical Machine Translation (SMT) can be improved using rule-based pre-processing and language analysis. We describe a hybrid translation approach coupling an Arabic–French statistical machine translation system using the Moses decoder with additional morphological rules that reduce the morphology of the source language (Arabic) to a level that makes it closer to that of the target language (French). Moreover, we introduce additional swapping rules for a structural matching between the source language and the target language. Two structural changes involving the positions of the pronouns and verbs in both the source and target languages have been attempted. The results show an improvement in the quality of translation and a gain in terms of BLEU score after introducing a pre-processing scheme for Arabic and applying these rules based on morphological variations and verb re-ordering (VS into SV constructions) in the source language (Arabic) according to their positions in the target language (French). Furthermore, a learning curve shows the improvement in terms of BLEU score under scarce- and large-resource conditions. The proposed approach is completed without increasing the amount of training data or radically changing the algorithms that can affect the translation or training engines.
This paper is based upon work supported by the Natural Sciences and Engineering Research Council of Canada (NSERC) Discovery Grant number 356097-08.
Published versio
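
The verb re-ordering idea (VS into SV) can be pictured with a toy rule over tagged tokens. This is only a sketch under stated assumptions, not the rules used in the paper: the tag names `VERB`, `SUBJ` and `OBJ`, the function name, and the transliterated example words are hypothetical, and a real system would obtain tags from a morphological analyser.

```python
def reorder_vs_to_sv(tagged):
    """Move a verb after the subject tokens that immediately follow it.

    `tagged` is a list of (word, tag) pairs; tag names are assumed for
    illustration.
    """
    out = []
    i = 0
    while i < len(tagged):
        word, tag = tagged[i]
        if tag == "VERB":
            # Find the run of subject tokens directly after the verb.
            j = i + 1
            while j < len(tagged) and tagged[j][1] == "SUBJ":
                j += 1
            if j > i + 1:
                out.extend(tagged[i + 1:j])  # subject first
                out.append((word, tag))      # then the verb
                i = j
                continue
        out.append((word, tag))
        i += 1
    return out


# Hypothetical transliterated example: verb-subject-object input.
sentence = [("kataba", "VERB"), ("al-walad", "SUBJ"), ("risala", "OBJ")]
reordered = reorder_vs_to_sv(sentence)
```

Applying such a rule to the source side before training and decoding brings the source word order closer to the subject-verb order of the target language, which is the motivation the abstract gives for the swapping rules.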
