Minimally supervised induction of morphology through bitexts
A knowledge of morphology can be useful for many natural language processing systems. Thus, much effort has been expended in developing accurate computational tools for morphology that lemmatize, segment and generate new forms. The most powerful and accurate of these have been manually encoded, and such endeavors are invariably expensive and time-consuming. Consequently, there have been many attempts to reduce the cost of developing morphological systems through unsupervised or minimally supervised algorithms and learning methods for acquiring morphology. These efforts have yet to produce a tool that approaches the performance of manually encoded systems.
Here, I present a strategy for morphological clustering and segmentation that is minimally supervised but more linguistically informed than previous unsupervised approaches. That is, this study attempts to induce, from an unannotated text, clusters of words that are inflectional variants of each other. A set of inflectional suffixes by part of speech is then induced from these clusters. This level of detail is made possible by a method known as alignment and transfer (AT), among other names: an approach that uses aligned bitexts to transfer linguistic resources developed for one language (the source language) to another (the target language). A further advantage of this approach is that it allows the amount of training data to be reduced without a significant degradation in performance, making it useful in applications targeted at data collected from endangered languages. In the current study, however, I use English as the source and German as the target, for ease of evaluation and because of certain typological properties of German. The two main tasks, clustering and segmentation, are approached sequentially, with the clustering informing the segmentation to allow greater accuracy in morphological analysis.
While the performance of these methods does not exceed that of the current roster of unsupervised or minimally supervised approaches to morphology acquisition, the study attempts to integrate more learning methods than previous studies. Furthermore, it attempts to learn inflectional morphology as opposed to derivational morphology, a crucial distinction in linguistics.
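As a rough illustration of the cluster-then-segment idea, the following Python sketch groups German forms that align to the same English lemma and part of speech, then treats each cluster's longest common prefix as the stem and the remainder as an inflectional suffix. It is not the author's algorithm: the `(lemma, POS, form)` input tuples, the toy data and all function names are hypothetical, and the alignments and English tags are assumed to come from existing source-language tools.

```python
from collections import Counter, defaultdict
from os.path import commonprefix
from typing import Dict, List, Set, Tuple

# Hypothetical input: (English lemma, English POS, aligned German word form).
# Assumes word alignment plus an English lemmatizer/tagger were run beforehand.
AlignedToken = Tuple[str, str, str]


def cluster_by_source_lemma(tokens: List[AlignedToken]) -> Dict[Tuple[str, str], Set[str]]:
    """Group target-language forms that align to the same source lemma and POS."""
    clusters: Dict[Tuple[str, str], Set[str]] = defaultdict(set)
    for lemma, pos, form in tokens:
        clusters[(lemma, pos)].add(form)
    return dict(clusters)


def segment_cluster(forms: Set[str]) -> Dict[str, Tuple[str, str]]:
    """Split each form into (stem, suffix), using the cluster's longest common prefix as stem."""
    stem = commonprefix(sorted(forms))  # character-wise common prefix
    return {form: (stem, form[len(stem):]) for form in forms}


def suffixes_by_pos(clusters: Dict[Tuple[str, str], Set[str]]) -> Dict[str, Counter]:
    """Count induced suffixes per part of speech ("" marks a bare stem)."""
    inventory: Dict[str, Counter] = defaultdict(Counter)
    for (_, pos), forms in clusters.items():
        if len(forms) < 2:
            continue  # a singleton cluster gives no evidence about suffixes
        for _, suffix in segment_cluster(forms).values():
            inventory[pos][suffix] += 1
    return dict(inventory)


if __name__ == "__main__":
    toy_tokens = [
        ("dog", "NOUN", "Hund"), ("dog", "NOUN", "Hunde"), ("dog", "NOUN", "Hundes"),
        ("play", "VERB", "spielen"), ("play", "VERB", "spielt"), ("play", "VERB", "spielte"),
    ]
    print(suffixes_by_pos(cluster_by_source_lemma(toy_tokens)))
```

A prefix-based stem is the simplest possible segmentation rule; it fails on stem-internal changes such as German umlaut (`Haus` and `Häuser` share only `H`), so a realistic system would need more than this.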
An analysis of customer perception using lexicon-based sentiment analysis of Arabic Texts framework.
Sentiment Analysis (SA) employing Natural Language Processing (NLP) is pivotal in determining the positivity and negativity of customer feedback. Although much SA research focuses on English texts, there is a growing demand for SA in other widely spoken languages, such as Arabic. This is predominantly due to the global reach of social media, which enables users to express opinions on products in any language and, in turn, makes it necessary to understand customers' perceptions of new products from social media conversations. However, current research falls short in providing text analysis that captures the perceptions of Arabic customers towards coffee and coffee products. Therefore, this study proposes a comprehensive Lexicon-based Sentiment Analysis on Arabic Texts (LSAnArTe) framework, applied to social media data, to understand customer perceptions of coffee, a widely consumed product in the Arabic-speaking world. The LSAnArTe Framework incorporates the existing AraSenTi dictionary, an Arabic database of sentiment scores for Arabic words, and lemmatizes unknown words using the Qalasadi open platform. It classifies each word as positive, negative or neutral before conducting sentence-level sentiment classification. Data collected from X (formerly known as Twitter), which resulted in a cleaned dataset of 10,769 tweets, is used to validate the proposed framework, which is then compared with Amazon Comprehend. The dataset was annotated manually to ensure maximum accuracy and reliability in validating the proposed LSAnArTe Framework. The results revealed that the proposed LSAnArTe Framework, with an accuracy of 93.79%, outperformed the Amazon Comprehend tool, which had an accuracy of 51.90%.
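The word-then-sentence classification described above can be sketched in Python as follows. This is a hedged illustration, not the LSAnArTe implementation: `TOY_LEXICON` is a tiny made-up stand-in for AraSenTi, `toy_lemmatize` is a dictionary stub standing in for Qalasadi (whose API is not used here), and the sentence-level rule (the majority of polar words wins, ties are neutral) is an assumption.

```python
from typing import Callable, Dict

# Toy stand-in for the AraSenTi lexicon (hypothetical entries and scores).
TOY_LEXICON: Dict[str, float] = {
    "لذيذ": 1.0,    # delicious
    "رائع": 1.0,    # wonderful
    "سيء": -1.0,    # bad
    "غالي": -0.5,   # expensive
}

# Toy stand-in for a lemmatizer such as Qalasadi (hypothetical mapping, not its API).
TOY_LEMMAS: Dict[str, str] = {"لذيذة": "لذيذ", "رائعة": "رائع", "سيئة": "سيء", "غالية": "غالي"}


def toy_lemmatize(word: str) -> str:
    return TOY_LEMMAS.get(word, word)


def word_polarity(word: str, lexicon: Dict[str, float], lemmatize: Callable[[str], str]) -> str:
    """Look the word up directly; if unknown, retry with its lemma; default to neutral."""
    score = lexicon.get(word)
    if score is None:
        score = lexicon.get(lemmatize(word), 0.0)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def sentence_polarity(sentence: str, lexicon: Dict[str, float], lemmatize: Callable[[str], str]) -> str:
    """Assumed aggregation rule: majority of polar words; ties or no polar words give neutral."""
    labels = [word_polarity(w, lexicon, lemmatize) for w in sentence.split()]
    pos, neg = labels.count("positive"), labels.count("negative")
    if pos > neg:
        return "positive"
    if neg > pos:
        return "negative"
    return "neutral"
```

For example, `sentence_polarity("القهوة لذيذة", TOY_LEXICON, toy_lemmatize)` only finds `لذيذة` after lemmatizing it to `لذيذ`, which is the role the lemmatization fallback plays for words missing from the lexicon.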
Optimizing Deep Learning Model Parameters with the Bees Algorithm for Improved Medical Text Classification
This paper introduces a novel mechanism to obtain the optimal parameters of a
deep learning model using the Bees Algorithm, a recent and promising swarm
intelligence algorithm. The optimization problem is to maximize the accuracy of
classifying ailments from medical text, starting from initial hyperparameters
that are adjusted over a fixed number of iterations. Experiments used
two datasets, one English and one Arabic. The highest accuracy achieved is
99.63% on the English dataset using Long Short-Term Memory (LSTM) along with
the Bees Algorithm, and 88% on the Arabic dataset using AraBERT.
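A basic Bees Algorithm loop for hyperparameter search might look like the Python sketch below. It is an assumption-laden illustration, not the paper's mechanism: the parameter names (`n_scouts`, `n_selected`, `n_elite`, the recruit counts, `patch_size`, `shrink`) follow the commonly described basic variant with neighbourhood shrinking, and `toy_validation_accuracy` is a hypothetical smooth function standing in for training the LSTM and measuring validation accuracy on hyperparameters rescaled to [0, 1].

```python
import random
from typing import Callable, List, Sequence, Tuple

Point = List[float]


def bees_algorithm(
    objective: Callable[[Point], float],
    bounds: Sequence[Tuple[float, float]],
    n_scouts: int = 20,
    n_selected: int = 5,
    n_elite: int = 2,
    elite_recruits: int = 7,
    selected_recruits: int = 3,
    patch_size: float = 0.1,
    shrink: float = 0.95,
    iterations: int = 50,
    seed: int = 0,
) -> Tuple[Point, float]:
    """Maximize `objective` over a box; returns (best point, best score)."""
    rng = random.Random(seed)

    def random_point() -> Point:
        return [rng.uniform(lo, hi) for lo, hi in bounds]

    def neighbour(center: Point, radius: float) -> Point:
        # Uniform sample in a patch of half-width radius * (hi - lo), clipped to the bounds.
        return [max(lo, min(hi, c + rng.uniform(-radius, radius) * (hi - lo)))
                for c, (lo, hi) in zip(center, bounds)]

    sites = [(objective(p), p) for p in (random_point() for _ in range(n_scouts))]
    best = max(sites, key=lambda s: s[0])
    radius = patch_size
    for _ in range(iterations):
        sites.sort(key=lambda s: s[0], reverse=True)
        new_sites = []
        # Local search: elite sites get more recruits than the other selected sites.
        for rank, (score, point) in enumerate(sites[:n_selected]):
            recruits = elite_recruits if rank < n_elite else selected_recruits
            candidates = [(score, point)]
            for _ in range(recruits):
                q = neighbour(point, radius)
                candidates.append((objective(q), q))
            new_sites.append(max(candidates, key=lambda s: s[0]))
        # Global search: the remaining bees scout uniformly at random.
        for _ in range(n_scouts - n_selected):
            p = random_point()
            new_sites.append((objective(p), p))
        sites = new_sites
        radius *= shrink  # neighbourhood shrinking
        best = max([best] + sites, key=lambda s: s[0])
    return best[1], best[0]


def toy_validation_accuracy(p: Point) -> float:
    """Hypothetical stand-in for validation accuracy; peaks at (0.3, 0.6)."""
    x, y = p
    return -((x - 0.3) ** 2 + (y - 0.6) ** 2)
```

Each objective call here is cheap; with a real network every call is a full training run, so the recruit counts and iteration budget dominate the cost. Integer hyperparameters such as layer sizes would need rounding after sampling.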
Customer sentiment analysis for Arabic social media using a novel ensemble machine learning approach
Arabic’s complex morphology, orthography, and dialects make sentiment analysis difficult. These properties make it harder to extract text attributes from short conversations to evaluate tone, and analyzing and judging a person’s emotional state is complex in itself. As a result, interpreting sentiments accurately and identifying polarity can require considerable effort. Sentiment analysis (SA) extracts subjective information from text. This research evaluates machine learning (ML) techniques for understanding Arabic emotions. The SA classifiers evaluated are support vector machine (SVM), AdaBoost classifier (AC), maximum entropy (ME), k-nearest neighbors (KNN), decision tree (DT), random forest (RF), logistic regression (LR), and naive Bayes (NB). An ensemble-based sentiment model was developed. Ensemble classifiers (ECs) with 10-fold cross-validation outperformed the other machine learning classifiers in accuracy, specificity, precision, F1 score, and sensitivity.
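To make the evaluation protocol concrete, here is a dependency-free Python sketch of a hard-voting ensemble scored with 10-fold cross-validation. The abstract does not state how the ensemble combines its members, so majority voting is an assumption, and the three tiny base learners are hypothetical stand-ins for the SVM, KNN, NB and other classifiers it lists.

```python
from collections import Counter, defaultdict
from typing import Callable, Iterator, List, Sequence, Tuple

Classifier = Callable[[str], str]
Trainer = Callable[[List[str], List[str]], Classifier]


def majority_vote(labels: Sequence[str]) -> str:
    """Hard voting: the most frequent label wins; ties go to the label seen first."""
    return Counter(labels).most_common(1)[0][0]


def train_majority_class(texts: List[str], labels: List[str]) -> Classifier:
    """Baseline learner: always predicts the most common training label."""
    top = majority_vote(labels)
    return lambda text: top


def train_word_vote(texts: List[str], labels: List[str]) -> Classifier:
    """Counts word/label co-occurrences, then predicts the label with the largest total."""
    counts = defaultdict(Counter)
    for text, label in zip(texts, labels):
        for word in text.split():
            counts[word][label] += 1
    fallback = majority_vote(labels)

    def predict(text: str) -> str:
        total = Counter()
        for word in text.split():
            total.update(counts.get(word, Counter()))
        return total.most_common(1)[0][0] if total else fallback
    return predict


def train_nearest_neighbour(texts: List[str], labels: List[str]) -> Classifier:
    """1-NN using Jaccard overlap between word sets."""
    sets = [set(t.split()) for t in texts]

    def predict(text: str) -> str:
        words = set(text.split())

        def jaccard(s: set) -> float:
            union = words | s
            return len(words & s) / len(union) if union else 0.0
        return labels[max(range(len(sets)), key=lambda i: jaccard(sets[i]))]
    return predict


def train_ensemble(trainers: Sequence[Trainer]) -> Trainer:
    """Trains every member on the same data and combines them by majority vote."""
    def train(texts: List[str], labels: List[str]) -> Classifier:
        members = [t(texts, labels) for t in trainers]
        return lambda text: majority_vote([m(text) for m in members])
    return train


def k_fold_indices(n: int, k: int = 10) -> Iterator[Tuple[List[int], List[int]]]:
    """Yields (train, test) index lists for k contiguous folds whose sizes differ by at most 1."""
    start = 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)
        test = list(range(start, start + size))
        yield list(range(start)) + list(range(start + size, n)), test
        start += size


def cross_val_accuracy(trainer: Trainer, texts: List[str], labels: List[str], k: int = 10) -> float:
    correct = 0
    for train_idx, test_idx in k_fold_indices(len(texts), k):
        clf = trainer([texts[i] for i in train_idx], [labels[i] for i in train_idx])
        correct += sum(clf(texts[i]) == labels[i] for i in test_idx)
    return correct / len(texts)
```

The folds are contiguous and unshuffled for reproducibility; a real evaluation would usually shuffle and stratify them by label.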
Connecting Dream Networks Across Cultures
Many species dream, yet there remain many open research questions in the
study of dreams. The symbolism of dreams and their interpretation is present in
cultures throughout history. Analysis of online data sources for dream
interpretation using network science helps in understanding symbolism in dreams
and their associated meaning. In this study, we introduce dream interpretation
networks for English, Chinese and Arabic that represent different cultures from
various parts of the world. We analyze communities in these networks, finding
that symbols within a community are semantically related. The central nodes in
communities give insight into cultures and symbols in dreams. The community
structure of different networks highlights cultural similarities and
differences. Interconnections between different networks are also identified by
translating symbols from different languages into English. Structural
correlations across networks point out relationships between cultures.
Similarities between network communities are also investigated by analysis of
sentiment in symbol interpretations. We find that interpretations within a
community tend to have similar sentiment. Furthermore, we cluster communities
based on their sentiment, yielding three main categories of positive, negative,
and neutral dream symbols. Comment: 6 pages, 3 figures
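One way to picture the final step, clustering communities by sentiment into three groups, is the Python sketch below. The abstract does not name the clustering method, so one-dimensional k-means with k = 3 on each community's average interpretation sentiment is an assumed stand-in, and the community names and scores are invented toy data.

```python
from statistics import mean
from typing import Dict, List, Tuple


def kmeans_1d(values: List[float], k: int = 3, max_iter: int = 100) -> Tuple[List[int], List[float]]:
    """Lloyd's k-means on scalars (k >= 2) with deterministic, evenly spaced initial centroids."""
    ordered = sorted(values)
    centroids = [ordered[round(i * (len(ordered) - 1) / (k - 1))] for i in range(k)]
    assignment: List[int] = []
    for _ in range(max_iter):
        # Assign each value to its nearest centroid.
        assignment = [min(range(k), key=lambda j: abs(v - centroids[j])) for v in values]
        # Move each centroid to the mean of its members (keep it if the cluster is empty).
        updated = []
        for j in range(k):
            members = [v for v, a in zip(values, assignment) if a == j]
            updated.append(mean(members) if members else centroids[j])
        if updated == centroids:
            break
        centroids = updated
    return assignment, centroids


def label_communities(scores: Dict[str, List[float]]) -> Dict[str, str]:
    """Average each community's interpretation sentiment, then cluster the averages into 3 groups."""
    names = list(scores)
    averages = [mean(scores[name]) for name in names]
    assignment, centroids = kmeans_1d(averages, k=3)
    # Name clusters by centroid order: lowest negative, middle neutral, highest positive.
    order = sorted(range(3), key=lambda j: centroids[j])
    tags = dict(zip(order, ["negative", "neutral", "positive"]))
    return {name: tags[a] for name, a in zip(names, assignment)}


if __name__ == "__main__":
    toy_scores = {  # hypothetical community -> per-interpretation sentiment scores
        "snake": [-0.8, -0.6], "falling": [-0.5, -0.7], "house": [0.1, -0.1],
        "road": [0.0, 0.1], "gold": [0.9, 0.7], "wedding": [0.5, 0.7],
    }
    print(label_communities(toy_scores))
```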
- …