5 research outputs found
A Framework for Personalized Content Recommendations to Support Informal Learning in Massively Diverse Information WIKIS
Personalization has been shown to achieve better learning outcomes by adapting to specific learners’ needs, interests, and/or preferences. Traditionally, most personalized learning software systems have focused on formal learning. However, learning personalization is not only desirable for formal learning; it is also required for informal learning, which is self-directed, does not follow a specified curriculum, and does not lead to formal qualifications. Wikis, and Wikipedia in particular, are among the informal learning platforms attracting increasing attention. The nature of wikis enables learners to navigate the learning environment freely and construct knowledge independently, without being forced to follow a predefined learning path, in accordance with constructivist learning theory. Nevertheless, navigation on information wikis suffers from several limitations. To support informal learning on Wikipedia and similar environments, it is important to provide easy and fast access to relevant content. Recommendation systems (RSs) have long been used to provide useful recommendations in different technology enhanced learning (TEL) contexts. However, the massive diversity of both the unstructured content and the user base of such information-oriented websites poses major challenges when designing recommendation models for these environments. In addition, evaluating TEL recommender systems for informal learning is challenging because of the inherent difficulty of measuring the impact of recommendations on informal learning in the absence of formal assessment and commonly used learning analytics. This research proposes a personalized content recommendation framework (PCRF) for information wikis, together with an evaluation framework for assessing the impact of personalized content recommendations on informal learning from wikis.
The presented recommendation framework models learners’ interests by continuously extrapolating topical navigation graphs from learners’ free navigation and applying graph structural analysis algorithms to extract topics of interest for individual users. It then integrates learners’ interest models with fuzzy thesauri to produce personalized content recommendations. Our evaluation approach encompasses two main activities. First, the impact of personalized recommendations on informal learning is evaluated by assessing conceptual knowledge in users’ feedback. Second, web analytics data is analyzed to gain insight into users’ progress and focus throughout the test session. Our evaluation revealed that PCRF generates highly relevant recommendations that are adaptive to changes in users’ interests using the HARD model, with rank-based mean average precision (MAP@k) scores ranging between 86.4% and 100%. In addition, the evaluation of informal learning revealed that users who used Wikipedia with personalized support achieved higher scores on the conceptual knowledge assessment, with an average score of 14.9 compared to 10.0 for the students who used the encyclopedia without any recommendations. The analysis of web analytics data shows that users who used Wikipedia with personalized recommendations visited a larger number of relevant pages than the control group (644 vs. 226). They were also able to make use of a larger number of concepts, make comparisons, and state relations between concepts.
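The abstract above reports rank-based MAP@k scores without defining the metric. The following is a minimal sketch of one common formulation, not necessarily the one the authors used: the function names, the page identifiers, and the choice to normalize by min(k, number of relevant items) are all assumptions made for illustration.

```python
from typing import List, Sequence, Set


def average_precision_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """AP@k for one user: sum precision@i at each rank i <= k that holds a relevant item."""
    hits = 0
    precision_sum = 0.0
    for i, item in enumerate(ranked[:k], start=1):
        if item in relevant:
            hits += 1
            precision_sum += hits / i
    # Assumed normalization: the best achievable number of hits within the top k.
    denom = min(k, len(relevant))
    return precision_sum / denom if denom else 0.0


def mean_average_precision_at_k(
    rankings: List[Sequence[str]], relevants: List[Set[str]], k: int
) -> float:
    """MAP@k: the mean of AP@k over all users (or sessions)."""
    scores = [average_precision_at_k(r, rel, k) for r, rel in zip(rankings, relevants)]
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical page identifiers, for illustration only.
ap = average_precision_at_k(["a", "b", "c"], {"a", "c"}, k=3)
print(ap)  # (1/1 + 2/3) / 2
```

A ranking that places every relevant page first scores 1.0 under this definition, which matches the upper end of the reported range.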
Tuning a Football Tweet Sentiment Analysis Model on a Small, Balanced Dataset (original title: Tuning Model Analisis Sentimen Tweeter Sepakbola Pada Dataset Kecil dan Seimbang)
Football supporters are people who support, motivate, and encourage the players of a football club, and who show positive or negative fanaticism both in the real world and on social media such as Twitter. This study produces a classification model for predicting football supporters' tweets from a small, balanced dataset. The classification model is built through exploratory data analysis and the selection of a baseline model based on null accuracy, polarity and subjectivity, feature selection, and linear and non-linear classification. The selected model is then tuned to obtain more precise and accurate results, and evaluated with a confusion matrix and a classification report to give deeper insight into the classifier's behavior beyond global accuracy. The study found that words with negative meaning appearing in the positive class had a polarization of 88%, with a frequency of 4% and a harmonic mean of 8%. The Multinomial Naïve Bayes classification model was selected as the best model, with 99% accuracy and 0.8% error on the training data, and 100% accuracy and 0% error on the validation data. An experiment testing the model on 30 new test entries produced predictions with 87% accuracy and 13% error, meaning only 4 predictions were wrong. Future work should test feature extraction models or apply boosting, bagging, and deep learning to find out whether the results improve.
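Two figures in the abstract above can be checked with simple arithmetic: the null-accuracy baseline on a balanced dataset, and the 87% test accuracy implied by 4 errors in 30 entries. The sketch below is illustrative only; the function names and the 15/15 class split are assumptions, since the abstract does not give the class counts.

```python
from collections import Counter
from typing import Sequence


def null_accuracy(labels: Sequence[str]) -> float:
    """Accuracy of a baseline that always predicts the most frequent class."""
    if not labels:
        raise ValueError("labels must not be empty")
    counts = Counter(labels)
    return max(counts.values()) / len(labels)


def accuracy_from_errors(n_total: int, n_errors: int) -> float:
    """Fraction of correct predictions given the number of errors."""
    return (n_total - n_errors) / n_total


# Hypothetical balanced label set (the 15/15 split is an assumption).
balanced = ["positive"] * 15 + ["negative"] * 15
print(null_accuracy(balanced))                 # 0.5
print(round(accuracy_from_errors(30, 4), 2))   # 0.87
```

On a perfectly balanced two-class dataset the null accuracy is 0.5, so any useful classifier has to clear that bar.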
Sentiment analysis in tweets
Advisor: Jacques Wainer. Dissertation (master's), Universidade Estadual de Campinas, Instituto de Computação.
Abstract: Sentiment analysis is a field of study that has recently grown in popularity due to the growth of the Internet and of the content generated by its users. More recently, social networks have emerged, where people post their opinions in colloquial and compact language. This is the case on Twitter, a communication tool that can easily be used as a source of information for various automatic sentiment inference tools. Research efforts have addressed sentiment analysis in social networks as a classification problem, where there is no consensus about which classifier is best or which configuration produced by the feature engineering process is best. The objective of this dissertation is to investigate the influence of some pre-processing techniques, the TF-IDF technique, the volume of the training set, and ensemble techniques on the accuracy of some supervised classifiers.
Degree: Master's in Computer Science (Mestre em Ciência da Computação).
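The dissertation above studies the effect of TF-IDF weighting on classifier accuracy. As a reminder of what the weighting does, here is a minimal sketch of one textbook variant (relative term frequency times log(N/df)); the dissertation does not state which variant it used, and the function name and toy tweets are assumptions.

```python
import math
from collections import Counter
from typing import Dict, List


def tf_idf(docs: List[List[str]]) -> List[Dict[str, float]]:
    """Return one {term: weight} dict per tokenized document.

    Assumed variant: tf = count / document length, idf = ln(N / document frequency).
    """
    n_docs = len(docs)
    doc_freq: Counter = Counter()
    for doc in docs:
        doc_freq.update(set(doc))  # count each term once per document
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append(
            {term: (c / len(doc)) * math.log(n_docs / doc_freq[term])
             for term, c in counts.items()}
        )
    return weights


# Hypothetical tokenized tweets, for illustration only.
tweets = [["good", "game"], ["bad", "game"]]
w = tf_idf(tweets)
print(w[0])  # "game" occurs in every tweet, so its weight is 0.0
```

Terms that occur in every document receive zero weight under this variant, which is why TF-IDF tends to suppress words that carry no class information.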
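Generation of a corpus for competitor detection in Spanish through comparative opinion mining. Case study: textile sector in the province of Azuay (original title: Generación de un corpus para detección de competidores en el idioma español mediante minería de opiniones comparativas. Caso de estudio: sector textil en la provincia del Azuay)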
With the advancement of technology, and even more so with the arrival of the pandemic, the use of digital platforms has increased. A 2020 study presented by the Ecuadorian Chamber of Electronic Commerce shows that, with the arrival of the pandemic, electronic commerce increased the use of online digital platforms at least 15-fold compared to 2019. As a result, companies must look for new sources of information for market research, and the internet has become an intangible input for any business strategy. A fundamental part of a business strategy is analyzing the competition. According to the literature, this analysis was previously carried out mostly through surveys, but with the arrival of digital platforms, data can now be extracted from the web and fed into a Competitive Intelligence (CI) process, which supports a complete analysis aimed at gaining a competitive advantage. CI comprises several steps. This research addresses all of them but focuses mainly on the initial step, data collection and analysis, which is fundamental to CI and currently suffers from problems such as the lack of Spanish-language corpora specialized for CI; as a result, researchers cannot easily implement machine learning models that would help them gain a competitive advantage. This research presents a methodology for creating a Spanish-language corpus that allows algorithms to be trained to detect competitors in the context of the textile sector. Two main results were produced: 1) a methodology that uses text mining techniques (comparative opinion mining and named entity recognition) to build a corpus focused on Competitive Intelligence; 2) a Spanish corpus in the domain of social network comments, which serves as a basis for future research on competitive intelligence, specifically competitor detection in Spanish, where CI had been severely restricted by the lack of a corpus. Finally, the usefulness of the corpus was evaluated through a dashboard built from a case study carried out in the context of the textile sector on social networks. The corpus was shown to be useful for the textile sector; however, a new validation with companies directly related to the textile sector is recommended to obtain more direct evidence, as is evaluation in other sectors.
Degree: Systems Engineer (Ingeniero de Sistemas). Cuenca.
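The methodology above combines comparative opinion mining with named entity recognition to find competitors mentioned in social network comments. The sketch below illustrates the idea in a deliberately simplified form and is not the authors' method: a small regular expression stands in for comparative sentence detection, a fixed list of hypothetical brand names stands in for a named entity recognizer, and all names are assumptions.

```python
import re
from typing import List, Tuple

# Hypothetical brand names; a real system would use a named entity recognizer.
BRANDS = ["MarcaA", "MarcaB", "MarcaC"]

# A few Spanish comparative patterns: "mejor que", "peor que", "más X que", "menos X que".
COMPARATIVE = re.compile(
    r"\b(mejor(?:es)?|peor(?:es)?|más\s+\w+|menos\s+\w+)\s+que\b",
    re.IGNORECASE,
)


def find_competitor_pairs(comment: str) -> List[Tuple[str, str]]:
    """Return brand pairs that co-occur in a comment containing a comparative pattern."""
    if not COMPARATIVE.search(comment):
        return []
    found = [
        brand for brand in BRANDS
        if re.search(rf"\b{re.escape(brand)}\b", comment, re.IGNORECASE)
    ]
    # Every unordered pair of brands found in the comparative comment.
    return [(a, b) for i, a in enumerate(found) for b in found[i + 1:]]


print(find_competitor_pairs("MarcaA es mejor que MarcaB"))
print(find_competitor_pairs("Me gustan MarcaA y MarcaB"))
```

Comments that mention two brands without a comparative construction yield no pairs, which is the distinction comparative opinion mining relies on.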
Sentiment analysis of dialectical Arabic social media content using a hybrid linguistic-machine learning approach
Despite the enormous increase in the number of Arabic posts on social networks, sentiment analysis research on extracting opinions from these posts lags behind that for English. This is largely attributed to the challenges of processing the morphologically complex Arabic language and the scarcity of Arabic NLP tools and resources. The task becomes harder still when analysing dialectal Arabic, which does not abide by formal grammatical structure. Building on semantic modelling of the target domain’s knowledge and multi-factor lexicon-based sentiment analysis, this research uses a hybrid approach that integrates linguistic and machine learning methods for sentiment classification of dialectal Arabic. First, a dataset of dialectal Arabic tweets focusing on the unemployment domain was collected and manually annotated. The tweets cover different Arabic dialects in Saudi Arabia, for which a comprehensive Arabic sentiment lexicon was constructed. The approach also integrates a novel light stemming mechanism to improve stemming of Saudi dialectal Arabic. Subsequently, a novel multi-factor lexicon-based sentiment analysis algorithm was developed for domain-specific social media posts written in dialectal Arabic. The algorithm considers several factors (emoji, intensifiers, negations, supplications) to improve classification accuracy. These techniques were then deployed to assess analytical performance across social media channels, which are vulnerable to semantic and colloquial variations.
Finally, this study presents a new hybrid approach to sentiment analysis in which domain knowledge is used in two ways to combine computational linguistics and machine learning: the first integrates the problem domain's semantic knowledge base into the machine learning training feature set, while the second uses the outcome of the lexicon-based sentiment classification in training the machine learning methods. Integrating these techniques into a single hybrid solution achieved greater accuracy and consistency than applying each approach independently, confirming a pragmatic solution to sentiment classification in dialectal Arabic text.
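The multi-factor lexicon-based scoring described in the abstract above can be illustrated with a small sketch. This is not the authors' algorithm: the lexicon entries are hypothetical English placeholders rather than Saudi dialectal Arabic, the weights are invented for illustration, and supplications are left out because the abstract does not say how they are handled.

```python
from typing import Dict, List

# Hypothetical placeholder entries; the real lexicon is dialectal Arabic.
LEXICON: Dict[str, float] = {"good": 1.0, "bad": -1.0, "hopeful": 1.0, "jobless": -1.0}
INTENSIFIERS: Dict[str, float] = {"very": 1.5, "extremely": 2.0}
NEGATIONS = {"not", "never"}
EMOJI: Dict[str, float] = {":)": 1.0, ":(": -1.0}


def score(tokens: List[str]) -> float:
    """Sum polarity values; intensifiers and negations modify the next polar token."""
    total = 0.0
    weight = 1.0
    negate = False
    for tok in tokens:
        if tok in NEGATIONS:
            negate = True
        elif tok in INTENSIFIERS:
            weight *= INTENSIFIERS[tok]
        elif tok in LEXICON or tok in EMOJI:
            value = LEXICON.get(tok, EMOJI.get(tok, 0.0)) * weight
            total += -value if negate else value
            weight, negate = 1.0, False  # modifiers apply to one polar token only
    return total


def classify(tokens: List[str]) -> str:
    """Map the aggregate score to a sentiment label."""
    s = score(tokens)
    return "positive" if s > 0 else "negative" if s < 0 else "neutral"


print(score(["not", "very", "good"]))     # -1.5
print(classify(["very", "bad", ":)"]))    # negative
```

In the second hybrid method the abstract describes, an output like `score(...)` or `classify(...)` would serve as an additional input when training the machine learning classifier.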