
    Deep Learning for Opinion Mining and Topic Classification of Course Reviews

    Student opinions about a course are important to educators and administrators, regardless of the course type or the institution. Reading and manually analyzing open-ended feedback becomes infeasible for the massive volumes of comments found at the institution level or on online forums. In this paper, we collected and pre-processed a large number of course reviews publicly available online. We applied machine learning techniques to gain insight into student sentiments and topics. Specifically, we utilized current Natural Language Processing (NLP) techniques, such as word embeddings and deep neural networks, and state-of-the-art BERT (Bidirectional Encoder Representations from Transformers), RoBERTa (Robustly optimized BERT approach), and XLNet (Generalized Autoregressive Pretraining). We performed extensive experimentation to compare these techniques against traditional approaches. This comparative study demonstrates how to apply modern machine learning approaches for sentiment polarity extraction and topic-based classification of course feedback. For sentiment polarity, the top model was RoBERTa with 95.5% accuracy and 84.7% F1-macro, while for topic classification, an SVM (Support Vector Machine) was the top classifier with 79.8% accuracy and 80.6% F1-macro. We also provided an in-depth exploration of the effect of certain hyperparameters on model performance and discussed our observations. These findings can be used by institutions and course providers as a guide for analyzing their own course feedback using NLP models towards self-evaluation and improvement. Comment: Accepted and published in Education and Information Technologies (accepted March 2023).
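The topic-classification setup described above (an SVM as the top classifier) can be sketched as a minimal TF-IDF + linear SVM pipeline. This is our illustration under assumed tooling (scikit-learn), not the paper's code; the toy reviews and topic labels are invented for demonstration.

```python
# Minimal sketch: TF-IDF features + linear SVM for topic classification
# of course reviews, assuming scikit-learn. Data and labels are toy examples.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

reviews = [
    "The lectures were clear and well organized",
    "Great instructor, explained concepts clearly",
    "The homework load was overwhelming",
    "Too many assignments every single week",
    "Grading felt arbitrary and slow",
    "Exams were graded unfairly",
]
topics = ["instruction", "instruction", "workload",
          "workload", "grading", "grading"]

# Unigram+bigram TF-IDF feeding a linear SVM, fit end to end.
model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LinearSVC())
model.fit(reviews, topics)

print(model.score(reviews, topics))  # training accuracy on the toy set
```

In practice the paper's comparison would use a held-out test split and tuned hyperparameters; this sketch only shows the pipeline shape.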

    ViCGCN: Graph Convolutional Network with Contextualized Language Models for Social Media Mining in Vietnamese

    Social media processing is a fundamental task in natural language processing with numerous applications. As Vietnamese social media and information science have grown rapidly, information-based mining of Vietnamese social media has become crucial. However, state-of-the-art research faces several significant drawbacks, including imbalanced and noisy data on social media platforms, two essential issues that need to be addressed for Vietnamese social media texts. Graph Convolutional Networks can address the problems of imbalanced and noisy data in text classification on social media by taking advantage of the graph structure of the data. This study presents a novel approach based on a contextualized language model (PhoBERT) and a graph-based method (Graph Convolutional Networks). In particular, the proposed approach, ViCGCN, jointly trains contextualized embeddings with Graph Convolutional Networks (GCN) to capture more syntactic and semantic dependencies and address those drawbacks. Extensive experiments on various Vietnamese benchmark datasets were conducted to verify our approach. We observe that applying GCN to BERTology models as the final layer significantly improves performance. Moreover, the experiments demonstrate that ViCGCN outperforms 13 powerful baseline models, including BERTology models, fused BERTology and GCN models, other baselines, and the SOTA, on three benchmark social media datasets. Our proposed ViCGCN approach demonstrates a significant improvement of up to 6.21%, 4.61%, and 2.63% over the best contextualized language models, both multilingual and monolingual, on the three benchmark datasets UIT-VSMEC, UIT-ViCTSD, and UIT-VSFC, respectively. Additionally, our integrated model ViCGCN achieves the best performance compared to other BERTology models integrated with GCN.
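The core operation the abstract describes, a graph convolution applied on top of contextualized embeddings, can be sketched in NumPy. This is our illustration, not the ViCGCN release: the adjacency matrix and feature dimensions are invented, and the random features stand in for PhoBERT embeddings. One GCN layer computes H' = ReLU(Â H W) with Â the symmetrically normalized adjacency with self-loops.

```python
# Minimal sketch of one graph convolution over contextual embeddings,
# using NumPy only. All sizes and the graph itself are toy assumptions.
import numpy as np

rng = np.random.default_rng(0)

n_nodes, d_in, d_out = 4, 8, 3           # 4 nodes, 8-dim features -> 3-dim output
H = rng.normal(size=(n_nodes, d_in))     # stand-in for contextualized embeddings
A = np.array([[0, 1, 1, 0],
              [1, 0, 0, 1],
              [1, 0, 0, 1],
              [0, 1, 1, 0]], dtype=float)

A_tilde = A + np.eye(n_nodes)            # add self-loops
D_inv_sqrt = np.diag(A_tilde.sum(axis=1) ** -0.5)
A_hat = D_inv_sqrt @ A_tilde @ D_inv_sqrt  # symmetric normalization

W = rng.normal(size=(d_in, d_out))       # learnable weight (random here)
H_next = np.maximum(A_hat @ H @ W, 0)    # ReLU(A_hat @ H @ W)

print(H_next.shape)                      # (4, 3)
```

In the paper's setting the input features would come from PhoBERT and W would be trained jointly with the language model; here both are placeholders.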

    English/Russian lexical cognates detection using NLP Machine Learning with Python

    Language learning is a remarkable endeavor that expands our horizons and allows us to connect with diverse cultures and people around the world. Traditionally, language education has relied on conventional methods such as textbooks, vocabulary drills, and language exchanges. However, with the advent of machine learning, a new era has dawned upon language instruction, offering innovative and efficient ways to accelerate language acquisition. One intriguing application of machine learning in language learning is the utilization of cognates, words that share similar meanings and spellings across different languages. To address this subject, this research paper proposes to facilitate the process of acquiring a second language with the help of artificial intelligence, particularly neural networks, which can identify and use words that are similar or identical in both the learner's first language and the target language. These words, known as lexical cognates, can facilitate language learning by providing a familiar point of reference for the learner and enabling them to associate new vocabulary with words they already know. By leveraging the power of neural networks to detect and utilize these cognates, learners will be able to accelerate their progress in acquiring a second language.
Although the study of semantic similarity across different languages is not a new topic, our objective is to adopt a different approach for identifying Russian-English lexical cognates and present the obtained results as a language learning tool, using a cross-language sample of lexical and semantic similarity data to build a lexical-cognate detection and word-association model. Subsequently, depending on our analysis and results, we present a word-association application that can be utilized by end users. Given that Russian and English are among the most widely spoken languages globally and that Russia is a popular destination for international students from around the world, this served as a significant motivation to develop an AI tool to assist English speakers learning Russian, or Russian speakers learning English.
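A simple orthographic baseline for the cognate-detection task described above can be sketched with a naive transliteration table and normalized edit distance. This is our illustration, not the paper's model; the transliteration mapping, threshold, and example words are all assumptions for demonstration.

```python
# Minimal sketch: flag Russian-English cognate candidates by normalized
# Levenshtein distance over a naive Cyrillic-to-Latin transliteration.
# The table is deliberately tiny; a real system would cover the alphabet.
TRANSLIT = {"с": "s", "т": "t", "у": "u", "д": "d", "е": "e", "н": "n",
            "п": "p", "р": "r", "о": "o", "а": "a", "м": "m",
            "л": "l", "к": "k", "и": "i", "б": "b"}

def transliterate(word: str) -> str:
    return "".join(TRANSLIT.get(ch, ch) for ch in word.lower())

def edit_distance(a: str, b: str) -> int:
    # Classic dynamic-programming Levenshtein distance, one row at a time.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def is_cognate_candidate(ru: str, en: str, threshold: float = 0.34) -> bool:
    t = transliterate(ru)
    return edit_distance(t, en.lower()) / max(len(t), len(en)) <= threshold

print(is_cognate_candidate("студент", "student"))  # True  (exact match)
print(is_cognate_candidate("собака", "dog"))       # False (unrelated forms)
```

A neural approach, as the paper proposes, would instead learn similarity from character or embedding features; this baseline only illustrates the candidate-pairing idea.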

    ACQUIRING SYNTACTIC VARIATION: REGULARIZATION IN WH-QUESTION PRODUCTION

    Children are often exposed to language-internal variation. Studying the acquisition of variation allows us to understand more about children’s ability to acquire probabilistic input, their preferences at choice points, and the factors contributing to such preferences. Using wh-variation as a case study, this dissertation explores the acquisition of syntactic variation through corpus analyses, behavioral experiments, and computational simulation. In English and some other languages (e.g., French, Brazilian Portuguese), information-seeking wh-questions allow for at least two variants: a wh-in-situ variant and a fronted-wh variant. How do English-speaking children acquire wh-variation, and what factors condition their course of acquisition? Experimental results show that 3- to 5-year-old children regularize to fronted wh-questions in their production even in contexts that allow both variants to be used interchangeably. Based on the characteristics of the variants, two factors are identified as potentially contributing to the preference for fronted wh-questions: frequency and discourse restrictions. Two artificial language learning (ALL) experiments are then conducted so that the effect of discourse can be studied separately from frequency. The results show that learners prefer the variant with fewer or no discourse restrictions (i.e., the fronted-wh variant) when frequency is controlled. Thus, regularization in language acquisition is conditioned by both domain-general factors, such as frequency, and language-specific factors, such as discourse markedness. The dissertation also looks into the motivation for regularization. One prominent hypothesis is that regularization serves as a means to reduce the cognitive burden associated with learning multiple variants at once. Instead of mastering all the variants, learners can simplify the learning process and minimize their chance of violating a constraint by producing the dominant variant.
This work provides additional evidence for the hypothesis in three ways. First, we replicate the finding that tasks that are more cognitively taxing induce more regularization. Second, we present new evidence that participants with a lower composite working memory score tend to have a higher regularization rate. Third, we provide a computational simulation showing that regularization behavior only emerges when an intake limit (reflecting limited working memory capacity) and a parsimony bias to reduce the cognitive burden are incorporated into the model.
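The simulation idea in the last point can be sketched as a toy model. This is our illustration, not the dissertation's simulation: a learner hears two wh-variants at a 70/30 input rate, an intake limit keeps only the first k tokens in memory, and a parsimony bias makes the learner always produce the dominant stored variant rather than probability-match.

```python
# Toy sketch: intake limit + parsimony bias yield regularized production;
# without the bias, production roughly matches the input distribution.
# All rates and sizes are illustrative assumptions.
import random

random.seed(7)

def simulate(input_rate=0.7, n_exposures=200, intake_limit=10, parsimony=True):
    exposures = ["fronted" if random.random() < input_rate else "in-situ"
                 for _ in range(n_exposures)]
    intake = exposures[:intake_limit]        # limited working memory capacity
    if parsimony:
        # Parsimony bias: always produce the dominant stored variant.
        dominant = max(set(intake), key=intake.count)
        productions = [dominant] * 100
    else:
        # Probability matching: reproduce the stored distribution.
        productions = [random.choice(intake) for _ in range(100)]
    return productions.count("fronted") / len(productions)

print(simulate(parsimony=True))    # regularized: all-or-nothing (0.0 or 1.0)
print(simulate(parsimony=False))   # roughly tracks the input rate instead
```

The dissertation's actual model is richer (it links the intake limit to measured working memory); this sketch only shows why both ingredients are needed for regularization to appear.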