1,170 research outputs found
Negative vaccine voices in Swedish social media
Vaccinations are one of the most significant interventions in public health, but vaccine hesitancy creates concerns for a portion of the population in many countries, including Sweden. Since discussions on vaccine hesitancy often take place on social networking sites, data from Swedish social media are used to study and quantify the sentiment among the discussants on the vaccination-or-not topic during phases of the COVID-19 pandemic. Out of all the posts analyzed, a majority showed a stronger negative sentiment, prevailing throughout the whole of the examined period, with some spikes or jumps due to the occurrence of certain vaccine-related events distinguishable in the results. Sentiment analysis can be a valuable tool to track public opinions regarding the use, efficacy, safety, and importance of vaccination.
Marginal contrast in loanword phonology: Production and perception
Though Dutch is usually described as lacking a voicing contrast at the velar place of articulation, due to intense language contact and heavy lexical borrowing, a contrast between /k/ and /g/ has recently been emerging. We explored the status of this contrast in Dutch speakers in both production and perception. We asked participants to produce loanwords containing a /g/ in the source language (e.g., goal) and found a range of productions, including a great many unadapted [g] tokens. We also tested the same speakers on their perception of the emerging [k] ~ [g] contrast and found that our participants were able to discriminate the emerging contrast well. We additionally explored the possibility that those speakers who use the new contrast more in production are also better at perceiving it, but we did not observe strong evidence of such a link. Overall, our results indicate that the adoption of the new sound is well advanced in the population we tested, but is still modulated by individual-level factors. We hold that contrasts emerging through borrowing, like other phonological contrasts, are subject to perceptual and functional constraints, and that these and other ‘marginal contrasts’ must be considered as full-fledged parts of phonology.
Hypocoristics: a derivational problem
This study investigates the two major schools of linguistics, formal and functional. It takes earlier versions of Generative Theory as representative of formal linguistics and contrasts them with Skousen’s computational model, which is taken as representative of functional linguistics. Each theory is described and evaluated by considering how it can be used to analyse hypocoristic data. A description of hypocoristics for 165 names collected from Kuwaiti Arabic speakers formed the basis for the analysis. The data were first given a general description to show how they can be accounted for in the two theories. The first approach was a rule-based approach used previously with Jordanian Arabic hypocoristics, which uses Semitic root-and-pattern morphology. The second approach was also rule-based and employed phonological processes to account for the derivation. Both were considered part of formal theories of analysis. The functional analysis, which uses a computational model that employs phonological features defined over statistically driven frequencies, was then used to model the data. An evaluation of the model showed low success rates, which led to a change of the model and the presentation of an alternative hybrid model that utilises both rules and analogy. The model was inspired by a rule-based theory which was not fleshed out; analogy was used to flesh it out and place it within a usage-based theory of language. Finally, the thesis ends with an open evaluative stand requiring further research on computational models from a computational perspective rather than a linguistic one.
New approach for Arabic named entity recognition on social media based on feature selection using genetic algorithm
Many features can be extracted from the massive volume of data of different types now available on social media. The growing demand for multimedia applications was an essential factor in this regard, particularly in the case of text data. Often, using the full feature set for each of these activities can be time-consuming and can also negatively impact performance. Because the number of features is large, it is challenging to find a subset that is useful for a given task. In this paper, we employ a feature selection approach using a genetic algorithm to identify the optimized feature set. Afterward, the best combination from the optimal feature set is used to identify and classify Arabic named entities (NEs) with a support vector-based classifier. Experimental results show that our system reaches state-of-the-art performance for Arabic NER on social media and significantly outperforms previous systems.
Word Knowledge and Word Usage
Word storage and processing define a multi-factorial domain of scientific inquiry whose thorough investigation goes well beyond the boundaries of traditional disciplinary taxonomies, to require synergic integration of a wide range of methods, techniques and empirical and experimental findings. The present book intends to approach a few central issues concerning the organization, structure and functioning of the Mental Lexicon, by asking domain experts to look at common, central topics from complementary standpoints, and discuss the advantages of developing converging perspectives. The book will explore the connections between computational and algorithmic models of the mental lexicon, word frequency distributions and information theoretical measures of word families, statistical correlations across psycho-linguistic and cognitive evidence, principles of machine learning and integrative brain models of word storage and processing. The main goal of the book is to map out the landscape of future research in this area, to foster the development of interdisciplinary curricula and to help single-domain specialists understand and address issues and questions as they are raised in other disciplines.
Unsupervised learning of Arabic non-concatenative morphology
Unsupervised approaches to learning the morphology of a language play an important role in computer processing of language from a practical and theoretical perspective, due to their minimal reliance on manually produced linguistic resources and human annotation. Such approaches have been widely researched for the problem of concatenative affixation, but less attention has been paid to the intercalated (non-concatenative) morphology exhibited by Arabic and other Semitic languages.
The aim of this research is to learn the root and pattern morphology of Arabic, with accuracy comparable to manually built morphological analysis systems. The approach is kept free from human supervision or manual parameter settings, assuming only that roots and patterns intertwine to form a word.
Promising results were obtained by applying a technique adapted from previous work in concatenative morphology learning, which uses machine learning to determine relatedness between words. The output, with probabilistic relatedness values between words, was then used to rank all possible roots and patterns to form a lexicon. Analysis using triliteral roots resulted in a correct root identification accuracy of approximately 86% for inflected words.
Although the machine learning-based approach is effective, it is conceptually complex. So an alternative, simpler and computationally efficient approach was then devised to obtain morpheme scores based on comparative counts of roots and patterns. In this approach, root and pattern scores are defined in terms of each other in a mutually recursive relationship, converging to an optimized morpheme ranking. This technique gives slightly better accuracy while being conceptually simpler and more efficient.
The approach, after further enhancements, was evaluated on a version of the Quranic Arabic Corpus, attaining a final accuracy of approximately 93%. A comparative evaluation shows this to be superior to two existing, well-used, manually built Arabic stemmers, thus demonstrating the practical feasibility of unsupervised learning of non-concatenative morphology.
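The mutually recursive scoring mentioned in this abstract can be illustrated with a minimal sketch. This is not the thesis's actual formulation, which the abstract does not give; it assumes a simple mutual-reinforcement update in which a root's score is the sum of the scores of the patterns it occurs with, and vice versa, renormalised on every pass. The function names, the `_` placeholder for root slots, and the toy word list (simplified transliterations without long vowels) are all hypothetical.

```python
from itertools import combinations


def candidate_splits(word, root_len=3):
    """Yield every (root, pattern) split of a word.

    The root is root_len letters kept in order; the pattern is the word
    with those letters replaced by '_'.
    """
    for positions in combinations(range(len(word)), root_len):
        chosen = set(positions)
        root = "".join(word[i] for i in positions)
        pattern = "".join("_" if i in chosen else ch for i, ch in enumerate(word))
        yield root, pattern


def rank_morphemes(words, iterations=20):
    """Score roots and patterns in terms of each other (assumed update rule)."""
    pairs = {split for w in words for split in candidate_splits(w)}
    roots = {r for r, _ in pairs}
    patterns = {p for _, p in pairs}
    root_score = dict.fromkeys(roots, 1.0)
    pattern_score = dict.fromkeys(patterns, 1.0)
    for _ in range(iterations):
        # A root is as good as the patterns it combines with ...
        new_root = dict.fromkeys(roots, 0.0)
        for r, p in pairs:
            new_root[r] += pattern_score[p]
        top = max(new_root.values())
        root_score = {r: s / top for r, s in new_root.items()}
        # ... and a pattern is as good as the roots it combines with.
        new_pattern = dict.fromkeys(patterns, 0.0)
        for r, p in pairs:
            new_pattern[p] += root_score[r]
        top = max(new_pattern.values())
        pattern_score = {p: s / top for p, s in new_pattern.items()}
    return root_score, pattern_score


def best_split(word, root_score, pattern_score):
    """Pick the split whose root and pattern scores have the largest product."""
    return max(
        candidate_splits(word),
        key=lambda rp: root_score[rp[0]] * pattern_score[rp[1]],
    )


# Toy data: three roots (ktb, drs, qtl), each in three shared patterns.
words = ["kataba", "katib", "maktub",
         "darasa", "daris", "madrus",
         "qatala", "qatil", "maqtul"]
root_score, pattern_score = rank_morphemes(words)
print(best_split("kataba", root_score, pattern_score))
```

On this toy list the roots and patterns that recur together reinforce each other, so the top split for `kataba` is root `ktb` with pattern `_a_a_a`. Real data would need the further enhancements the abstract alludes to; the sketch only shows why defining the two scores in terms of each other can converge to a ranking.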
One Model to Rule them all: Multitask and Multilingual Modelling for Lexical Analysis
When learning a new skill, you take advantage of your preexisting skills and
knowledge. For instance, if you are a skilled violinist, you will likely have
an easier time learning to play cello. Similarly, when learning a new language
you take advantage of the languages you already speak. For instance, if your
native language is Norwegian and you decide to learn Dutch, the lexical overlap
between these two languages will likely benefit your rate of language
acquisition. This thesis deals with the intersection of learning multiple tasks
and learning multiple languages in the context of Natural Language Processing
(NLP), which can be defined as the study of computational processing of human
language. Although these two types of learning may seem different on the
surface, we will see that they share many similarities.
The traditional approach in NLP is to consider a single task for a single
language at a time. However, recent advances allow for broadening this
approach, by considering data for multiple tasks and languages simultaneously.
This is an important approach to explore further as the key to improving the
reliability of NLP, especially for low-resource languages, is to take advantage
of all relevant data whenever possible. In doing so, the hope is that in the
long term, low-resource languages can benefit from the advances made in NLP
which are currently to a large extent reserved for high-resource languages.
This, in turn, may then have positive consequences for, e.g., language
preservation, as speakers of minority languages will have a lower degree of
pressure to use high-resource languages. In the short term, answering the
specific research questions posed should be of use to NLP researchers working
towards the same goal.
Comment: PhD thesis, University of Groningen