132 research outputs found
Participatory action research on climate risk management, Bangladesh
The rural populations of southern Bangladesh are some of the most vulnerable communities in the world to the future impacts of climate change. They are particularly at risk from floods, waterlogged soils, and increasing salinity of both land and water. The objective of this project was to analyze the vulnerability of people in four villages experiencing different levels of soil salinity. The study evaluated the strengths and weaknesses of current coping strategies, assessed the potential of an index-based insurance scheme, and designed diversification and better information products to improve adaptive capacity.
Team Oulu at SemEval-2020 Task 12: Multilingual Identification of Offensive Language, Type and Target of Twitter Post Using Translated Datasets
With the proliferation of social media platforms, anonymous discussions and easy online access, reports of offensive content have caused serious concern to both authorities and research communities. Although there is extensive research on identifying offensive language in online text, the dynamic discourse of social media content, as well as the emergence of new forms of offensive language, especially in multilingual settings, calls for further research on the issue. In this work, we tackled Tasks A, B, and C of the Offensive Language Challenge at SemEval-2020. We handled offensive language in five languages: English, Greek, Danish, Arabic, and Turkish. Specifically, we pre-processed all provided datasets and developed an appropriate strategy for Tasks A, B, and C to identify the presence/absence, type, and target of offensive language in social media. For this purpose, we used the OLID2019 and OLID2020 datasets and generated new datasets, which we made publicly available. We also used the provided unsupervised machine learning implementation for the automatically annotated datasets, together with online Google translation tools, to create new datasets. We discuss the limitations and successes of our machine-learning-based approach for all five languages. Our results for identifying offensive posts (Task A) yielded satisfactory accuracy: 0.92 for English, 0.81 for Danish, 0.84 for Turkish, 0.85 for Greek, and 0.89 for Arabic. For type detection (Task B), the results are significantly higher (0.87 accuracy) than for target detection (Task C), which yields 0.81 accuracy. Moreover, after using automated Google translation, the overall efficiency improved by 2% for Greek, Turkish, and Danish.
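As a rough illustration of the translation-based dataset expansion mentioned above, the sketch below translates labelled examples into a target language while keeping their labels. The `translate` helper is a stand-in for whatever machine-translation client is used and is an assumption, not the authors' code.

```python
# Hypothetical sketch of label-preserving, translation-based dataset expansion.
# `translate(text, target_lang)` is a placeholder for a real machine-translation
# client (e.g. a Google Translate API wrapper); it is assumed, not the authors' code.

def translate(text: str, target_lang: str) -> str:
    """Placeholder for a machine-translation call; plug in a real backend here."""
    raise NotImplementedError("replace with an actual translation client")

def expand_dataset(rows, target_lang):
    """Translate each (text, label) pair into target_lang, keeping the original label."""
    expanded = []
    for text, label in rows:
        try:
            expanded.append((translate(text, target_lang), label))
        except Exception:
            continue  # skip rows the translator cannot handle
    return expanded

if __name__ == "__main__":
    olid_sample = [("You are awful", "OFF"), ("Have a nice day", "NOT")]  # toy examples
    danish_rows = expand_dataset(olid_sample, "da")
    print(danish_rows)
```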
A systematic review of hate speech automatic detection using natural language processing
With the multiplication of social media platforms, which offer anonymity, easy access, online community formation and online debate, the issue of hate speech detection and tracking becomes a growing challenge to society, individuals, policy-makers and researchers. Despite efforts to leverage automatic techniques for detection and monitoring, their performance is still far from satisfactory, which constantly calls for further research on the issue. This paper provides a systematic review of the literature in this field, with a focus on natural language processing and deep learning technologies, highlighting the terminology, processing pipeline and core methods employed, with a focal point on deep learning architectures. Methodologically, we adopt the PRISMA guidelines for a systematic review of the last 10 years of literature from the ACM Digital Library and Google Scholar. Finally, existing surveys, limitations, and future research directions are extensively discussed.
Car Parking User’s Behavior Using News Articles Mining Based Approach
Studying individuals' parking choice behavior can contribute considerably towards evidence-based policing in urban areas. This study investigates evidence gathered by mining a Finnish news article API for car-parking-related topics in order to understand users' behavior and identify unforeseen circumstances that may impact users' decisions and preferences. The study follows a natural language processing research pipeline, emphasizing word co-occurrence analysis, sentiment scoring and named-entity monitoring. The results can be exploited by local authorities to develop further evidence-based policing in urban planning.
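To make the word co-occurrence step concrete, here is a minimal Python sketch that counts how often word pairs appear within a short window of each other; the tokenisation, window size and sample headlines are assumptions for illustration only, not the study's pipeline.

```python
# Minimal sketch of window-based word co-occurrence counting over news texts.
from collections import Counter
import re

def cooccurrences(docs, window=5):
    """Count how often two tokens appear within `window` positions of each other."""
    counts = Counter()
    for doc in docs:
        tokens = re.findall(r"\w+", doc.lower())
        for i, tok in enumerate(tokens):
            for other in tokens[i + 1 : i + window]:
                counts[tuple(sorted((tok, other)))] += 1
    return counts

articles = [
    "Parking fees in the city centre will rise next year",
    "Residents complain about rising parking fees near the centre",
]
for pair, n in cooccurrences(articles).most_common(5):
    print(pair, n)
```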
Finnish Hate-Speech Detection on Social Media Using CNN and FinBERT
There has been a lot of research on identifying hateful posts on social media because of their detrimental effects on both individuals and society. The majority of this research has concentrated on English, although multilingual detection tools such as multilingual BERT (mBERT) have emerged. However, there is a lack of hate speech datasets for other languages compared to English, and a multilingual pre-trained model often contains fewer tokens for those languages. This paper contributes to hate speech identification in Finnish by constructing a new hate speech dataset collected from a popular Finnish forum (Suomi24). Furthermore, we evaluate the performance of the pre-trained FinBERT model for Finnish hate speech detection against state-of-the-art mBERT and other approaches. In addition, we compare FinBERT with fastText embeddings employed with a Convolutional Neural Network (CNN). Our results show that FinBERT yields 91.7% accuracy and a 90.8% F1 score, outperforming all state-of-the-art models, including multilingual BERT and CNN.
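For readers unfamiliar with the setup, the sketch below shows what fine-tuning a Finnish BERT model for binary hate-speech classification can look like with the Hugging Face transformers library. The checkpoint name, toy texts and hyper-parameters are assumptions for illustration, not the paper's actual dataset or configuration.

```python
# Minimal sketch of fine-tuning a Finnish BERT checkpoint for binary classification.
import torch
from torch.optim import AdamW
from transformers import AutoModelForSequenceClassification, AutoTokenizer

MODEL_NAME = "TurkuNLP/bert-base-finnish-cased-v1"  # assumed FinBERT checkpoint

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=2)

texts = ["esimerkkiteksti 1", "esimerkkiteksti 2"]  # placeholder forum posts
labels = torch.tensor([1, 0])                        # 1 = hate, 0 = neutral

batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
optimizer = AdamW(model.parameters(), lr=2e-5)

model.train()
for _ in range(3):  # a few passes over the toy batch
    outputs = model(**batch, labels=labels)
    outputs.loss.backward()
    optimizer.step()
    optimizer.zero_grad()

model.eval()
with torch.no_grad():
    preds = model(**batch).logits.argmax(dim=-1)
print(preds.tolist())
```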
EFFECT OF SULPHUR AND ZINC ON THE GROWTH AND YIELD OF ONION
A thesis submitted to the Faculty of Agriculture, Sher-e-Bangla Agricultural University, Dhaka, in partial fulfilment of the requirements for the degree of Master of Science in Soil Science, Semester: July-December, 2008.
A field experiment was carried out at the Sher-e-Bangla Agricultural University Farm during the Rabi season of 2008 to investigate the effect of sulphur and zinc on the growth and yield of onion cv. Taherpuri. The red terrace soil of Tejgaon was silty loam in texture with pH 5.6. The experiment was laid out in a randomized complete block design (RCBD) with three replications. It comprised four levels of sulphur from gypsum (0, 10, 20 and 30 kg S ha⁻¹) and four levels of zinc from zinc oxide (0, 1, 3 and 4 kg Zn ha⁻¹), giving sixteen treatment combinations including the control (no fertilizer). It was observed that S and Zn, alone or in combination, significantly increased all the parameters studied. S20, Zn3 and Zn4 individually gave the highest results over the control in most cases. Maximum results were found with the S20 + Zn3 treatment combination in respect of all the studied parameters, and S0 + Zn0 produced the minimum results. The highest N, P, K, S and Zn content in bulb and leaf was also obtained with the S20Zn3 treatment combination. The findings therefore suggest that the combined use of 20 kg sulphur with 3 kg zinc per hectare produces maximum growth and yield of onion in the red terrace soil of the Tejgaon series. This fertilizer combination of sulphur and zinc not only gives maximum growth and yield of onion but also keeps the soil fertile and productive.
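For illustration only, the snippet below enumerates the 4 x 4 factorial treatment structure described in the abstract and the resulting plot count under an RCBD with three replications; the treatment labels are assumed shorthand, not the thesis's own notation.

```python
# Enumerate the sulphur x zinc factorial treatment combinations and plot count.
from itertools import product

sulphur_levels = [0, 10, 20, 30]  # kg S/ha from gypsum
zinc_levels = [0, 1, 3, 4]        # kg Zn/ha from zinc oxide
replications = 3

treatments = [f"S{s}Zn{z}" for s, z in product(sulphur_levels, zinc_levels)]
print(len(treatments), "treatment combinations")          # 16
print(len(treatments) * replications, "plots in total")   # 48
print(treatments)
```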
Hate and Offensive language detection using BERT for English Subtask A
This paper presents the results and main findings of the HASOC-2021 Hate/Offensive Language Identification Subtask A. The work consisted of fine-tuning pre-trained transformer networks such as BERT and building an ensemble of different models, including CNN and BERT. We used the HASOC-2021 English 3.8k annotated Twitter dataset. We compare current pre-trained transformer networks, with and without Masked Language Modelling (MLM) fine-tuning, on their performance for offensive language detection. Among the different models, MLM fine-tuned BERT-base, BERT-large, and ALBERT outperformed the others; however, a BERT and CNN ensemble classifier that applies majority voting performed best, achieving an 85.1% F1 score on both hate/non-hate labels. Our final submission achieved a 77.0 F1 score in the HASOC-2021 competition.
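The majority-voting idea behind such an ensemble can be sketched in a few lines; the model names and predictions below are invented for illustration and do not reproduce the paper's classifiers.

```python
# Minimal sketch of majority voting over per-model label predictions.
from collections import Counter

def majority_vote(predictions_per_model):
    """predictions_per_model: list of equally long label lists, one list per model."""
    ensembled = []
    for labels in zip(*predictions_per_model):
        ensembled.append(Counter(labels).most_common(1)[0][0])
    return ensembled

cnn_preds    = ["HOF", "NOT", "HOF", "NOT"]
bert_preds   = ["HOF", "HOF", "HOF", "NOT"]
albert_preds = ["NOT", "HOF", "HOF", "NOT"]

print(majority_vote([cnn_preds, bert_preds, albert_preds]))
# ['HOF', 'HOF', 'HOF', 'NOT']
```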
Offensive Language Identification Using Hindi-English Code-Mixed Tweets, and Code-Mixed Data Augmentation
Code-mixed text classification is challenging due to the lack of code-mixed labeled datasets and the non-existence of pre-trained models. This paper presents the HASOC-2021 offensive language identification results and main findings on the code-mixed (Hindi-English) Subtask 2. In this work, we propose a new method of code-mixed data augmentation using WordNet-based synonym replacement of Hindi and English words and phonetic conversion of Hinglish (Hindi-English) words. We used the 5.7k pre-annotated HASOC-2021 code-mixed dataset for training and data augmentation. The proposal's feasibility was tested with Logistic Regression (LR) as a baseline, a Convolutional Neural Network (CNN), and BERT, each with and without data augmentation. The research outcomes were promising, yielding an almost 3% increase in classifier accuracy and F1 scores compared to the baseline. Our official submission achieved a 66.56% F1 score and ranked 8th in the competition.
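As a hedged sketch of WordNet-based synonym replacement for the English portion of code-mixed text, the snippet below uses NLTK's WordNet interface; the replacement probability and the sample Hinglish sentence are assumptions, and the phonetic-conversion step of the method is not shown.

```python
# Sketch of synonym replacement: English tokens with WordNet synonyms may be
# swapped, while tokens without synsets (e.g. Hindi words) are left unchanged.
import random
from nltk.corpus import wordnet as wn  # requires: nltk.download("wordnet")

def synonym_replace(tokens, prob=0.3):
    """Randomly replace tokens that have WordNet synonyms with one of them."""
    out = []
    for tok in tokens:
        lemmas = {l.name().replace("_", " ") for s in wn.synsets(tok) for l in s.lemmas()}
        lemmas.discard(tok)
        if lemmas and random.random() < prob:
            out.append(random.choice(sorted(lemmas)))
        else:
            out.append(tok)
    return out

print(synonym_replace("yaar this movie was really good".split()))
```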
Data Expansion Using WordNet-based Semantic Expansion and Word Disambiguation for Cyberbullying Detection
Automatic identification of cyberbullying from textual content is known to be a challenging task. The challenges arise from the inherent structure of cyberbullying and the lack of a large-scale labeled corpus that would enable efficient machine-learning-based tools, including neural networks. This paper advocates a data-augmentation-based approach that can enhance the automatic detection of cyberbullying in social media texts. We use both word sense disambiguation and the synonymy relation in the WordNet lexical database to generate coherent equivalent utterances of the cyberbullying input data. The disambiguation and semantic expansion are intended to overcome the inherent limitations of social media posts, such as an abundance of unstructured constructs and limited semantic content. In addition, to test feasibility, a novel protocol was employed to collect cyberbullying traces from the AskFm forum, where a dataset of about 10K samples was manually labeled. Next, the problem of cyberbullying identification is viewed as a binary classification problem using an elaborated data augmentation strategy and an appropriate classifier. For the latter, a Convolutional Neural Network (CNN) architecture with fastText and BERT was put forward, and its results were compared against commonly employed Naïve Bayes (NB) and Logistic Regression (LR) classifiers with and without data augmentation. The research outcomes were promising and yielded almost 98.4% classifier accuracy, an improvement of more than 4% over the baseline results.
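A minimal sketch combining word sense disambiguation with synonym expansion, in the spirit of the approach described above, is given below using NLTK's simplified Lesk implementation and WordNet; the example post is invented and this is not the authors' implementation.

```python
# Sketch: disambiguate each token in context with simplified Lesk, then expand
# it with synonyms drawn from the selected WordNet sense.
from nltk.corpus import wordnet as wn   # requires: nltk.download("wordnet")
from nltk.wsd import lesk

def expand_with_disambiguation(tokens):
    """For each token, pick a WordNet sense in context and list its synonyms."""
    variants = []
    for tok in tokens:
        sense = lesk(tokens, tok)          # context-aware sense selection
        if sense is None:
            variants.append([tok])         # no synsets found; keep the token
            continue
        synonyms = [l.name().replace("_", " ") for l in sense.lemmas()]
        variants.append(sorted(set(synonyms + [tok])))
    return variants

post = "you are such a stupid loser".split()
for tok, options in zip(post, expand_with_disambiguation(post)):
    print(tok, "->", options)
```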
- …
