612 research outputs found

Improving Statistical MT through Morphological Analysis

Author: Goldwater Sharon
McClosky David
Publication date: 01/01/2005

Crossref

Edinburgh Research Explorer

Minimally supervised induction of morphology through bitexts

Author: Moon Taesun, Ph. D.
Publication date: 01/12/2008

A knowledge of morphology can be useful for many natural language processing systems. Thus, much effort has been expended in developing accurate computational tools for morphology that lemmatize, segment and generate new forms. The most powerful and accurate of these have been manually encoded, and such endeavors are without exception expensive and time-consuming. Consequently, there have been many attempts to reduce the cost of developing morphological systems through unsupervised or minimally supervised algorithms and learning methods for the acquisition of morphology. These efforts have yet to produce a tool that approaches the performance of manually encoded systems. Here, I present a strategy for dealing with morphological clustering and segmentation in a minimally supervised manner, but one that will be more linguistically informed than previous unsupervised approaches. That is, this study will attempt to induce, from an unannotated text, clusters of words that are inflectional variants of each other. Then a set of inflectional suffixes by part-of-speech will be induced from these clusters. This level of detail is made possible by a method known as alignment and transfer (AT), among other names, an approach that uses aligned bitexts to transfer linguistic resources developed for one language (the source language) to another language (the target). This approach has a further advantage in that it allows a reduction in the amount of training data without a significant degradation in performance, making it useful in applications targeted at data collected from endangered languages. In the current study, however, I use English as the source and German as the target for ease of evaluation and for certain typological properties of German. The two main tasks, clustering and segmentation, are approached sequentially, with the clustering informing the segmentation to allow for greater accuracy in morphological analysis.
While the performance of these methods does not exceed that of the current roster of unsupervised or minimally supervised approaches to morphology acquisition, the study attempts to integrate more learning methods than previous studies. Furthermore, it attempts to learn inflectional morphology as opposed to derivational morphology, which is a crucial distinction in linguistics.

Texas ScholarWorks

An analysis of customer perception using lexicon-based sentiment analysis of Arabic Texts framework.

Author: Alam AS
Alsemaree O
Gill SS
Uhlig S
Publication venue: Elsevier
Publication date: 01/05/2024

Sentiment Analysis (SA) employing Natural Language Processing (NLP) is pivotal in determining the positivity and negativity of customer feedback. Although significant research in SA is focused on English texts, there is a growing demand for SA in other widely spoken languages, such as Arabic. This is predominantly due to the global reach of social media, which enables users to express opinions on products in any language and, in turn, necessitates a thorough understanding of customers' perceptions of new products based on social media conversations. However, current research studies fall short in providing text analysis for understanding the perceptions of Arabic customers towards coffee and coffee products. Therefore, this study proposes a comprehensive Lexicon-based Sentiment Analysis on Arabic Texts (LSAnArTe) framework applied to social media data, to understand customer perceptions of coffee, a widely consumed product in the Arabic-speaking world. The LSAnArTe Framework incorporates the existing AraSenTi dictionary, an Arabic database of sentiment scores for Arabic words, and lemmatizes unknown words using the Qalasadi open platform. It classifies each word as positive, negative or neutral before conducting sentence-level sentiment classification. Data collected from X (formerly known as Twitter), yielding a cleaned dataset of 10,769 tweets, is used to validate the proposed framework, which is then compared with Amazon Comprehend. The dataset was annotated manually to ensure maximum accuracy and reliability in validating the proposed LSAnArTe Framework. The results revealed that the proposed LSAnArTe Framework, with an accuracy score of 93.79%, outperformed the Amazon Comprehend tool, which had an accuracy of 51.90%.
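
The two-stage idea in this abstract (label each word from a lexicon, then label the sentence) can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the toy lexicon, its scores, the English stand-in tokens, and the names `TOY_LEXICON`, `word_polarity` and `sentence_polarity` are hypothetical and are not taken from AraSenTi or from the LSAnArTe implementation. The lemmatization step and the paper's actual aggregation rule are not reproduced; a simple count of positive versus negative words is used instead.

```python
# Hypothetical toy lexicon: words and scores are invented for illustration
# and do NOT come from the AraSenTi dictionary.
TOY_LEXICON = {"good": 1.0, "delicious": 1.5, "bad": -1.0, "bitter": -0.5}


def word_polarity(word, lexicon):
    """Label one word as positive, negative or neutral from its lexicon score."""
    score = lexicon.get(word.lower(), 0.0)  # unknown words count as neutral
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def sentence_polarity(sentence, lexicon):
    """Label a sentence by comparing its counts of positive and negative words.

    Assumption: whitespace tokenization and majority-of-polar-words voting;
    the abstract does not state how word labels are combined.
    """
    labels = [word_polarity(token, lexicon) for token in sentence.split()]
    balance = labels.count("positive") - labels.count("negative")
    if balance > 0:
        return "positive"
    if balance < 0:
        return "negative"
    return "neutral"
```

A real system would replace the toy dictionary with a full sentiment lexicon and lemmatize tokens before lookup, as the abstract describes.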

Queen Mary Research Online

Optimizing Deep Learning Model Parameters with the Bees Algorithm for Improved Medical Text Classification

Author: Alghfeli Maryam
Ibrahim Adham
Kashkash Mariam
Shaaban Mai A.
Publication date: 14/03/2023

This paper introduces a novel mechanism to obtain the optimal parameters of a deep learning model using the Bees Algorithm, a recent and promising swarm intelligence algorithm. The optimization problem is to maximize the accuracy of classifying ailments based on medical text, given initial hyper-parameters that are adjusted over a fixed number of iterations. Experiments included two different datasets: English and Arabic. The highest accuracy achieved is 99.63% on the English dataset using Long Short-Term Memory (LSTM) along with the Bees Algorithm, and 88% on the Arabic dataset using AraBERT.
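
For readers unfamiliar with the Bees Algorithm, the following is a minimal sketch of its basic form: scout bees sample random sites, the best sites receive recruited bees for local search, and the remaining bees keep scouting at random. It is not the paper's implementation. The function name `bees_algorithm`, all parameter names and defaults, and the toy objective `toy_accuracy` (a stand-in for validation accuracy over two hyper-parameters) are assumptions made for illustration.

```python
import random


def bees_algorithm(objective, bounds, n_scouts=20, n_best=5, n_elite=2,
                   recruits_elite=6, recruits_best=3, patch=0.1,
                   iterations=50, seed=0):
    """Maximize `objective` over box `bounds` with a basic Bees Algorithm."""
    rng = random.Random(seed)

    def random_point():
        return [rng.uniform(lo, hi) for lo, hi in bounds]

    def neighbour(point):
        # Sample inside a patch around the site, clipped to the bounds.
        return [min(hi, max(lo, x + rng.uniform(-patch, patch) * (hi - lo)))
                for x, (lo, hi) in zip(point, bounds)]

    sites = [random_point() for _ in range(n_scouts)]
    for _ in range(iterations):
        sites.sort(key=objective, reverse=True)
        next_sites = []
        for rank, site in enumerate(sites[:n_best]):
            # Elite sites get more recruited bees than the other best sites.
            n_recruits = recruits_elite if rank < n_elite else recruits_best
            candidates = [site] + [neighbour(site) for _ in range(n_recruits)]
            next_sites.append(max(candidates, key=objective))
        # Remaining bees scout new random sites (global search).
        next_sites += [random_point() for _ in range(n_scouts - n_best)]
        sites = next_sites
    return max(sites, key=objective)


def toy_accuracy(params):
    """Hypothetical stand-in for model accuracy; peaks at (0.3, 0.7)."""
    x, y = params
    return 1.0 - (x - 0.3) ** 2 - (y - 0.7) ** 2


best = bees_algorithm(toy_accuracy, bounds=[(0.0, 1.0), (0.0, 1.0)])
```

In a hyper-parameter setting, each call to the objective would train and evaluate a model, so the number of scouts, recruits and iterations directly controls the compute budget.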

arXiv.org e-Print Archive

Customer sentiment analysis for Arabic social media using a novel ensemble machine learning approach

Author: Habbat Nassera
Hicham Nouri
Karim Sabri
Publication venue: Institute of Advanced Engineering and Science
Publication date: 31/03/2023

Arabic’s complex morphology, orthography, and dialects make sentiment analysis difficult. These properties make it harder to extract text features from short conversations in order to evaluate tone. Analyzing and judging a person’s emotional state is complex. Because of these issues, interpreting sentiments accurately and identifying polarity can take considerable effort. Sentiment analysis extracts subjective information from text. This research evaluates machine learning (ML) techniques for understanding Arabic emotions. The sentiment analysis (SA) classifiers evaluated are a support vector machine (SVM), AdaBoost classifier (AC), maximum entropy (ME), k-nearest neighbors (KNN), decision tree (DT), random forest (RF), logistic regression (LR), and naive Bayes (NB). An ensemble-based sentiment model was developed. Ensemble classifiers (ECs) with 10-fold cross-validation outperformed the other machine learning classifiers in accuracy, specificity, precision, F1 score, and sensitivity.
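
The abstract does not say how the ensemble combines its base classifiers, so the sketch below assumes hard (majority) voting, one common choice, together with a plain 10-fold index split. The helper names `majority_vote` and `k_fold_indices` are hypothetical, and the base classifiers themselves are left out: the voting function only takes their predicted labels.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-classifier label lists into one label per sample.

    `predictions` holds one list of labels per base classifier, all of the
    same length. Assumption: hard voting; ties go to the label seen first.
    """
    return [Counter(labels).most_common(1)[0][0]
            for labels in zip(*predictions)]


def k_fold_indices(n_samples, k=10):
    """Yield (train_indices, test_indices) for k contiguous folds."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        stop = start + size
        test = list(range(start, stop))
        train = [i for i in range(n_samples) if i < start or i >= stop]
        yield train, test
        start = stop
```

Using an odd number of base classifiers on a two-class task avoids ties; in practice the data would also be shuffled (and usually stratified) before splitting into folds.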

Institute of Advanced Engineering and Science

A Novel Methodology for Topic Identification in Hadith

Author: Amina El Ganadi
Federico Ruozzi
Luca Gagliardelli
Sania Aftar
Sonia Bergamaschi
Publication date: 01/01/2024

Archivio istituzionale della ricerca - Università di Modena e Reggio Emilia

Connecting Dream Networks Across Cultures

Author: Menczer Filippo
Varol Onur
Publication venue: Association for Computing Machinery (ACM)
Publication date: 01/01/2014

Many species dream, yet there remain many open research questions in the study of dreams. The symbolism of dreams and their interpretation is present in cultures throughout history. Analysis of online data sources for dream interpretation using network science leads to understanding symbolism in dreams and their associated meaning. In this study, we introduce dream interpretation networks for English, Chinese and Arabic that represent different cultures from various parts of the world. We analyze communities in these networks, finding that symbols within a community are semantically related. The central nodes in communities give insight into cultures and symbols in dreams. The community structure of different networks highlights cultural similarities and differences. Interconnections between different networks are also identified by translating symbols from different languages into English. Structural correlations across networks point out relationships between cultures. Similarities between network communities are also investigated by analysis of sentiment in symbol interpretations. We find that interpretations within a community tend to have similar sentiment. Furthermore, we cluster communities based on their sentiment, yielding three main categories of positive, negative, and neutral dream symbols. Comment: 6 pages, 3 figures.

arXiv.org e-Print Archive

Crossref