Urdu Speech and Text Based Sentiment Analyzer
Discovering what other people think has always been a key aspect of our
information-gathering strategy. People can now actively utilize information
technology to seek out and comprehend the ideas of others, thanks to the
increased availability and popularity of opinion-rich resources such as online
review sites and personal blogs. Because it is central to understanding
people's opinions, sentiment analysis (SA) is an important task.
Existing research, however, focuses primarily on English, with only a small
amount of work devoted to low-resource languages. This work presents a new
multi-class Urdu dataset for sentiment analysis, based on user reviews. The
Urdu data were collected from Twitter.
The proposed dataset includes 10,000 reviews that human experts have carefully
classified into two categories: positive and negative. The
primary purpose of this research is to construct a manually annotated dataset
for Urdu sentiment analysis and to establish baseline results. Five
lexicon- and rule-based algorithms (Naive Bayes, Stanza, TextBlob, VADER, and
Flair) are evaluated, and the experimental results show that Flair, with an
accuracy of 70%, outperforms the other tested algorithms.
Comment: Sentiment Analysis, Opinion Mining, Urdu language, polarity
assessment, lexicon-based metho
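To make the idea of a lexicon-based baseline scored by accuracy concrete, here is a minimal sketch. It is not the paper's implementation and does not use Stanza, TextBlob, VADER, or Flair. The lexicon entries, the English placeholder reviews, and the function names `lexicon_polarity` and `accuracy` are all hypothetical, chosen only for illustration.

```python
# Toy polarity lexicon: hypothetical entries for illustration only,
# not taken from the paper's dataset or from any real lexicon.
LEXICON = {"good": 1, "great": 1, "love": 1, "bad": -1, "poor": -1, "hate": -1}


def lexicon_polarity(text: str) -> str:
    """Label a text by summing the lexicon scores of its tokens."""
    score = sum(LEXICON.get(token, 0) for token in text.lower().split())
    # Ties (score 0) default to "negative" so the task stays binary.
    return "positive" if score > 0 else "negative"


def accuracy(predicted: list[str], gold: list[str]) -> float:
    """Fraction of predictions that match the human annotations."""
    correct = sum(p == g for p, g in zip(predicted, gold))
    return correct / len(gold)


# Hypothetical annotated reviews: (text, gold label).
reviews = [
    ("great phone love it", "positive"),
    ("bad battery poor screen", "negative"),
    ("good camera", "positive"),
    ("not good at all", "negative"),  # negation defeats a plain lexicon
]

predictions = [lexicon_polarity(text) for text, _ in reviews]
gold_labels = [label for _, label in reviews]
print(accuracy(predictions, gold_labels))
```

The last review shows one reason a plain lexicon lookup can fall short of human annotation: the word "good" is counted as positive even though the sentence negates it.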
A machine learning approach for Urdu text sentiment analysis
Product evaluations, ratings, and other forms of online expression have grown in popularity with the emergence of social networking sites and blogs. This rapidly expanding body of data has made sentiment analysis a new area of study for computational linguists. For English, the topic has been under discussion for about a decade. However, the scientific community has largely ignored other important languages, such as Urdu. Morphologically, Urdu is one of the most complex languages in the world. Its distinctive characteristics, such as unusual morphology and unrestricted word order, make Urdu language processing a difficult problem. This research provides a new framework for classifying sentiment in Urdu text. Its main contributions are to show the importance of this multidimensional research problem and to describe its technical components, such as the parsing algorithm, corpus, and lexicon. The new approach to Urdu text sentiment analysis covers data gathering, pre-processing, feature extraction, feature vector formation, and finally sentiment classification. The results and discussion section provides a comprehensive comparison of the proposed work with the standard baseline method in terms of precision, recall, F-measure, and accuracy on three different types of datasets. In the overall comparison of the models, the proposed work shows encouraging results in terms of accuracy and other metrics. Finally, that section also outlines emerging trends and possible future directions for the current work.
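The stages named in this abstract (pre-processing, feature extraction, feature vector formation, classification, and evaluation by precision, recall, and F-measure) can be sketched as follows. The abstract does not say which classifier the authors use, so this sketch assumes a multinomial Naive Bayes model with add-one smoothing over bag-of-words counts. The training data are hypothetical English placeholders, and all function names are illustrative.

```python
import math
from collections import Counter


def preprocess(text: str) -> list[str]:
    """Pre-processing: lowercase and split on whitespace."""
    return text.lower().split()


def train(samples: list[tuple[str, str]]):
    """Build per-class word counts (the feature vectors) and class counts."""
    word_counts: dict[str, Counter] = {}
    class_counts: Counter = Counter()
    for text, label in samples:
        class_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(preprocess(text))
    vocab = {w for counts in word_counts.values() for w in counts}
    return word_counts, class_counts, vocab


def classify(text: str, model) -> str:
    """Pick the class with the highest log-probability (add-one smoothing)."""
    word_counts, class_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(class_counts):
        score = math.log(class_counts[label] / total_docs)
        denom = sum(word_counts[label].values()) + len(vocab)
        for token in preprocess(text):
            score += math.log((word_counts[label][token] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def precision_recall_f1(predicted, gold, positive="positive"):
    """Precision, recall, and F-measure for the chosen positive class."""
    tp = sum(p == positive and g == positive for p, g in zip(predicted, gold))
    fp = sum(p == positive and g != positive for p, g in zip(predicted, gold))
    fn = sum(p != positive and g == positive for p, g in zip(predicted, gold))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical labelled data, for illustration only.
train_set = [
    ("good service", "positive"),
    ("great food good price", "positive"),
    ("bad service", "negative"),
    ("poor food bad price", "negative"),
]
test_set = [("good food", "positive"), ("bad food", "negative")]

model = train(train_set)
predicted = [classify(text, model) for text, _ in test_set]
gold = [label for _, label in test_set]
print(precision_recall_f1(predicted, gold))
```

Whitespace tokenization is only a stand-in here. Given the morphology and free word order the abstract describes, a real Urdu pipeline would likely need language-specific normalization and tokenization at the pre-processing stage.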
Desarrollo de una herramienta para la anotación semántica automática de documentos pdf basado en ontologías (Development of a tool for the automatic semantic annotation of PDF documents based on ontologies)
Currently, the Internet is one of the most accessible and widely used sources for finding
information on a given topic, through which people can connect to a large collection
of resources, services, and content. In this context, search engines are
indispensable for finding content that is specific and relevant to the user, that is,
precise information aligned with their topic of interest.
However, search engines can have difficulty providing the user with the desired
information. These difficulties arise for reasons such as characteristics inherent to
natural language, like polysemy, synonymy, and ambiguity, as well as a lack of
knowledge of the topics that interest the user. Another cause that
hinders the retrieval of relevant information is that the search is performed
syntactically, that is, by looking in the documents for exact matches of the terms
entered in the search string. Likewise, another important reason is that
content formats and interfaces are presented in forms understandable only by people
and not by a computer.
In response, this project proposes an alternative solution in which
documents contain additional information describing the main concepts and entities
of the content. This additional information will be added to the documents automatically
through semantic annotations based on a knowledge domain that is of interest to
the user. In this way, the project aims to support the concept of the Semantic Web, whose
proposal is to classify, structure, and annotate resources with explicit semantics so that
they can be processed by intelligent systems.
Tesi
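The contrast between exact-term matching and ontology-based annotation can be illustrated with a small sketch. It is not the thesis's tool. It assumes the PDF text has already been extracted into a plain string, and the `ONTOLOGY` dictionary, the `ex:` concept identifiers, and the `annotate` function are hypothetical.

```python
import re

# Hypothetical mini-ontology: concept identifier -> labels (synonyms included).
# The identifiers and labels are illustrative assumptions, not from the thesis.
ONTOLOGY = {
    "ex:SearchEngine": ["search engine", "web search tool"],
    "ex:SemanticWeb": ["semantic web"],
    "ex:Ontology": ["ontology", "ontologies"],
}


def annotate(text: str) -> list[dict]:
    """Return one annotation per ontology label found in the text."""
    annotations = []
    for concept, labels in ONTOLOGY.items():
        for label in labels:
            pattern = r"\b" + re.escape(label) + r"\b"
            for match in re.finditer(pattern, text, flags=re.IGNORECASE):
                annotations.append(
                    {"concept": concept, "start": match.start(),
                     "end": match.end(), "text": match.group()}
                )
    # Order annotations by their position in the document.
    return sorted(annotations, key=lambda a: a["start"])


page_text = "A search engine for the Semantic Web can use ontologies."
for annotation in annotate(page_text):
    print(annotation)
```

Because several labels map to one concept, a query for the concept can retrieve documents that use any of its synonyms, which a purely syntactic search for the literal query terms would miss.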