177 research outputs found

Normalized Web Distance and Word Similarity

Author: Cilibrasi Rudi L.
Vitanyi Paul M. B.
Publication date: 01/01/2009

There is a great deal of work in cognitive psychology, linguistics, and computer science on using word (or phrase) frequencies in context in text corpora to develop measures of word similarity or word association, going back to at least the 1960s. The goal of this chapter is to introduce the normalized web distance (NWD) method to determine similarity between words and phrases. It is a general way to tap the amorphous low-grade knowledge available for free on the Internet, typed in by local users aiming at personal gratification of diverse objectives, and yet globally achieving what is effectively the largest semantic electronic database in the world. Moreover, this database is available to all by using any search engine that can return aggregate page-count estimates for a large range of search queries. In the paper introducing the NWD it was called the "normalized Google distance (NGD)," but since Google no longer allows computer searches, we opt for the more neutral and descriptive NWD.

Comment: LaTeX, 20 pages, 7 figures, to appear in: Handbook of Natural Language Processing, Second Edition, Nitin Indurkhya and Fred J. Damerau Eds., CRC Press, Taylor and Francis Group, Boca Raton, FL, 2010, ISBN 978-142008592
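
As a rough illustration of how aggregate page counts can be turned into a distance: the abstract does not state the formula, so this sketch assumes the definition given in the original NGD paper, NWD(x, y) = (max{log f(x), log f(y)} - log f(x, y)) / (log N - min{log f(x), log f(y)}), where f(x) is the number of pages containing x, f(x, y) the number containing both terms, and N a normalizing total page count. The function name `nwd` and every count below are hypothetical, and no search engine is queried.

```python
import math


def nwd(fx: float, fy: float, fxy: float, n: float) -> float:
    """Normalized web distance from page counts (assumed NGD-paper formula).

    fx, fy : page counts for each term alone (must be positive)
    fxy    : page count for pages containing both terms
    n      : normalizing total page count (assumed larger than fx and fy)
    """
    if fx <= 0 or fy <= 0:
        raise ValueError("each term needs a positive page count")
    if fxy <= 0:
        # The terms never co-occur, so the distance is unbounded.
        return math.inf
    log_fx, log_fy, log_fxy = math.log(fx), math.log(fy), math.log(fxy)
    return (max(log_fx, log_fy) - log_fxy) / (math.log(n) - min(log_fx, log_fy))


# Hypothetical counts: two terms that always appear together,
# and two terms that rarely share a page.
always_together = nwd(1_000, 1_000, 1_000, 1_000_000)
rarely_together = nwd(1_000, 100, 10, 1_000_000)
print(always_together, rarely_together)
```

Under this definition, terms that always co-occur get distance 0, and the distance grows as the joint count shrinks relative to the individual counts. Real page-count estimates are noisy, so any practical use would need to handle inconsistent counts.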

arXiv.org e-Print Archive

CiteSeerX

CWI's Institutional Repository

International Migration, Integration and Social Cohesion online publications

Handling non-compositionality in multilingual CNLs

Author: Enache Ramona
Kolachina Prasanth
Listenmaa Inari
Publication date: 01/01/2014

In this paper, we describe methods for handling multilingual non-compositional constructions in the framework of GF. We specifically look at methods to detect and extract non-compositional phrases from parallel texts and propose methods to handle such constructions in GF grammars. We expect that the methods to handle non-compositional constructions will enrich CNLs by providing more flexibility in the design of controlled languages. We look at two specific use cases of non-compositional constructions: a general-purpose method to detect and extract multilingual multiword expressions and a procedure to identify nominal compounds in German. We evaluate our procedure for multiword expressions by performing a qualitative analysis of the results. For the experiments on nominal compounds, we incorporate the detected compounds in a full SMT pipeline and evaluate the impact of our method on the machine translation process.

Comment: CNL workshop in COLING 201

arXiv.org e-Print Archive

Crossref

The identification of improvement strategies in continuous assessment using sentiment analysis in the Operational Research course

Author: Durán-Peña Julián
Garcia-Franco Sergio
SALCEDO
Salcedo-Rugeles Kevin
Talero-Sarmiento Leonardo
Publication venue: 'Universitat Politecnica de Valencia'
Publication date: 10/04/2020

[EN] The aim of the university is to graduate professionals with high levels of competence who have a positive impact on society. Consequently, institutions apply different educational strategies focused on improving curricular competences until students master all of the competence topics. A widely applied alternative is continuous assessment, a form of educational examination that evaluates a student's progress throughout a prescribed course. A critical course in engineering education is Operational Research, which focuses on scientific management supported by mathematical models such as decision theory, stochastic scenarios, simulation, and mathematical optimization. The aim of this work is to diagnose the continuous assessment strategy applied to Industrial and Systems Engineering students enrolled in the Operational Research course. To do so, the research carries out a sentiment analysis, a text classification tool that analyses an incoming message (in this case a perception essay) and indicates whether the underlying sentiment is positive, negative, or neutral. Furthermore, the techniques applied group the emotions into anger, anticipation, disgust, fear, joy, negative, positive, sadness, surprise, and trust. Taking into account the initial results, the authors highlight alternatives such as the flipped classroom and gamification as educational strategies to implement in future courses, with the aim of improving the positive perception of continuous assessment.

Talero-Sarmiento, L.; Durán-Peña, J.; Salcedo-Rugeles, K.; Garcia-Franco, S.; SALCEDO (2020). The identification of improvement strategies in continuous assessment using sentiment analysis in the Operational Research course. Editorial Universitat Politècnica de València. 389-397. https://doi.org/10.4995/INN2019.2019.10246
OCS38939

RiuNet

Implementation of the Naïve Bayes Classifier and Support Vector Machine Algorithms for Sentiment Classification of Reviews of the Halodoc Telemedicine Service

Author: CIKANIA REYNALDA NABILA
Publication venue: 'Universitas Negeri Gorontalo - Fakultas Matematika dan IPA'
Publication date: 30/11/2021

Halodoc is a telemedicine-based healthcare application that connects patients with health practitioners such as doctors, pharmacies, and laboratories. Halodoc users have left both positive and negative comments. This indicates public interest in the Halodoc application, so it is necessary to analyze the sentiment of the comments on the Halodoc application service, especially during the COVID-19 pandemic, so that the service can be improved. The Naïve Bayes Classifier (NBC) and Support Vector Machine (SVM) algorithms are used to analyze the sentiment of users of Halodoc's telemedicine service application. Of 5,687 reviews, 12.33% were classified as negative and 87.67% as positive, which means that positive reviews outnumber negative ones. The Naïve Bayes Classifier achieved an accuracy of 87.77% with an AUC of 57.11% and a G-Mean of 40.08%, while the SVM with an RBF kernel achieved an accuracy of 86.1% with an AUC of 60.149% and a G-Mean of 49.311%. Based on these evaluation results, the SVM model with an RBF kernel is better than NBC at classifying the sentiment of user reviews of the Halodoc telemedicine service.
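
The gap between the reported accuracy (above 86% for both models) and the much lower G-Mean values is typical of imbalanced data, here roughly 88% positive reviews. A minimal sketch of how accuracy and G-Mean can diverge follows, assuming the common definition of G-Mean as the square root of sensitivity times specificity. The abstract does not say which definition the authors used, and the confusion-matrix counts below are hypothetical, not taken from the study.

```python
import math


def imbalance_metrics(tp: int, fn: int, tn: int, fp: int) -> dict:
    """Accuracy, sensitivity, specificity, and G-Mean from a binary confusion matrix.

    Assumes both classes are present (tp + fn > 0 and tn + fp > 0).
    """
    sensitivity = tp / (tp + fn)  # recall on the positive (majority) class
    specificity = tn / (tn + fp)  # recall on the negative (minority) class
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    g_mean = math.sqrt(sensitivity * specificity)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "g_mean": g_mean,
    }


# Hypothetical counts: 100 positive reviews, 20 negative reviews.
# The classifier gets almost every positive right but misses most negatives.
metrics = imbalance_metrics(tp=98, fn=2, tn=4, fp=16)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Because G-Mean multiplies the per-class recalls, a model that neglects the minority class is penalized even when its overall accuracy looks high. This is one reason to report G-Mean and AUC alongside accuracy when comparing classifiers on skewed data.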

E-Journals Universitas Negeri Gorontalo

Natural language processing libraries in a big data subject area

Author: Lutskiv A. M.
Popovych N. M.
Yurkevych Kh. B.
Луцків Андрій Мирославович
Попович Н. М.
Юркевич Х. Б.
Publication venue: TNTU
Publication date: 27/11/2019

Electronic archive of Ternopil National Ivan Puluj Technical University

Emoticon-based Ambivalent Expression: A Hidden Indicator for Unusual Behaviors in Weibo

Author: Hu Yue
Wu Junjie
Zhao Jichang
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 07/05/2015

Recent decades have witnessed online social media becoming a big-data window for quantitatively testing conventional social theories and exploring human behavioral patterns in much greater detail. In this paper, by tracing emoticon use in Weibo, a group of hidden "ambivalent users" is disclosed who frequently post ambivalent tweets containing both positive and negative emotions. Further investigation reveals that this ambivalent expression could be a novel indicator of many unusual social behaviors. For instance, ambivalent users, the majority of whom are female, tend to post at midnight or at weekends. They frequently mention their close friends in ambivalent tweets, which attract more replies and thus serve as a more private way of communicating. Ambivalent users also respond differently from others to public affairs and show more interest in entertainment and sports events. Moreover, the sentiment shift of words adopted in ambivalent tweets is more evident than usual and exhibits a clear "negative to positive" pattern. The above observations, though seemingly miscellaneous, actually point to the self-regulation of negative mood in Weibo, which finds its basis in the emotion management theories of sociology but makes an interesting extension to the online environment. Finally, as an interesting corollary, ambivalent users are found to be connected with compulsive buyers and turn out to be perfect targets for online marketing.

Comment: Data sets can be downloaded freely from www.datatang.com/data/47207 or http://pan.baidu.com/s/1mg67cbm. For any issues, feel free to contact [email protected]
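
One way to picture the notion of an "ambivalent tweet" is a post that carries at least one positive and at least one negative emoticon. The sketch below is only an illustration under stated assumptions: emoticons are assumed to appear in the raw text as bracketed labels such as `[laugh]`, the two label sets are a hypothetical toy lexicon, and the abstract does not describe the authors' actual detection procedure.

```python
import re
from typing import Iterable

# Hypothetical toy lexicon; a real study would need a curated emoticon list.
POSITIVE = frozenset({"laugh", "smile", "heart"})
NEGATIVE = frozenset({"cry", "angry", "sad"})

# Assumption: emoticons are written as bracketed labels, e.g. "[laugh]".
EMOTICON_TAG = re.compile(r"\[([^\[\]]+)\]")


def is_ambivalent(post: str) -> bool:
    """True if the post contains both a positive and a negative emoticon."""
    tags = set(EMOTICON_TAG.findall(post))
    return bool(tags & POSITIVE) and bool(tags & NEGATIVE)


def ambivalence_rate(posts: Iterable[str]) -> float:
    """Fraction of a user's posts that are ambivalent (0.0 for no posts)."""
    posts = list(posts)
    if not posts:
        return 0.0
    return sum(is_ambivalent(p) for p in posts) / len(posts)


# Made-up example posts for a single user.
sample_posts = [
    "late again [cry] but got the tickets [laugh]",
    "great game tonight [laugh][smile]",
    "no emoticons here",
    "[cry][heart]",
]
print(ambivalence_rate(sample_posts))
```

A per-user rate like this could then be thresholded to flag frequent ambivalent posters. The choice of threshold and lexicon would strongly affect who counts as an "ambivalent user".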

arXiv.org e-Print Archive

Directory of Open Access Journals

PubMed Central

FigShare