8 research outputs found

    Polarity Classification Tool for Sentiment Analysis in Malay Language

    Get PDF
    The popularity of the social media channels has increased the interest among researchers in the sentiment analysis (SA) area. One aspect of the SA research is the determination of the polarity of the comments in the social media, i.e. positive, negative, and neutral. However, there is a scarcity of Malay sentiment analysis tools because most of the work in the literature discuss the polarity classification tool in English. This paper presents the development of a polarity classification tool called Malay Polarity Classification Tool (MaCT). This tool is developed based on the AFINN sentiment lexicon for English language. We have attempted to translate each word in AFINN to its Malay equivalent and later, use the lexicon to collect the sentiment data from Twitter. The Twitter data are then classified into positive, negative, and neutral. For the validation purpose, we collect 400 positive tweets, 400 negative tweets, and 200 neutral tweets, and later, run the tweets through our sentiment lexicon and found 90% score for precision, recall and accuracy. Our main contribution in the research is the new AFINN translation for Malay language and also the classification of the sentiment data

    A quantitative and qualitative analysis of Nordic surnames

    Get PDF
    Proceedings of the 18th Nordic Conference of Computational Linguistics NODALIDA 2011. Editors: Bolette Sandford Pedersen, Gunta Nešpore and Inguna Skadiņa. NEALT Proceedings Series, Vol. 11 (2011), 74-81. © 2011 The editors and contributors. Published by Northern European Association for Language Technology (NEALT) http://omilia.uio.no/nealt . Electronically published at Tartu University Library (Estonia) http://hdl.handle.net/10062/16955

    U-Compare bio-event meta-service: compatible BioNLP event extraction services

    Get PDF
    AbstractBackgroundBio-molecular event extraction from literature is recognized as an important task of bio text mining and, as such, many relevant systems have been developed and made available during the last decade. While such systems provide useful services individually, there is a need for a meta-service to enable comparison and ensemble of such services, offering optimal solutions for various purposes.ResultsWe have integrated nine event extraction systems in the U-Compare framework, making them inter-compatible and interoperable with other U-Compare components. The U-Compare event meta-service provides various meta-level features for comparison and ensemble of multiple event extraction systems. Experimental results show that the performance improvements achieved by the ensemble are significant. ConclusionsWhile individual event extraction systems themselves provide useful features for bio text mining, the U-Compare meta-service is expected to improve the accessibility to the individual systems, and to enable meta-level uses over multiple event extraction systems such as comparison and ensemble.This research was partially supported by KAKENHI 18002007 [YK, MM, JDK, SP, TO, JT]; JST PRESTO and KAKENHI 21500130 [YK]; the Academy of Finland and computational resources were provided by CSC -- IT Center for Science Ltd [JB, FG]; the Research Foundation Flanders (FWO) [SVL]; UK Biotechnology and Biological Sciences, Research Council (BBSRC project BB/G013160/1 Automated Biological Event Extraction from the Literature for Drug Discovery) and JISC, National Centre for Text Mining [SA]; the Spanish grant BIO2010-17527 [MN, APM]; NIH Grant U54 DA021519 [AO, DRR]Peer Reviewe

    A case study in tagging case in german: an assessment of statistical approaches

    Full text link
    In this study, we assess the performance of purely statistical approaches using supervised machine learning for predicting case in German (nominative, accusative, dative, genitive, n/a). We experiment with two different treebanks containing morphological annotations: TIGER and TUEBA. An evaluation with 10-fold cross-validation serves as the basis for systematic comparisons of the optimal parametrizations of different approaches. We test taggers based on Hidden Markov Models (HMM), Decision Trees, and Conditional Random Fields (CRF). The CRF approach based on our hand-crafted feature model achieves an accuracy of about 94%. This outperforms all other approaches and results in an improvement of 11% compared to a baseline HMM trigram tagger and an improvement of 2% compared to a state-of-the-art tagger for rich morphological tagsets. Moreover, we investigate the effect of additional (morphological) categories (gender, number, person, part of speech) in the internal tagset used for the training. Rich internal tagsets improve results for all tested approaches

    The Archive Query Log: Mining Millions of Search Result Pages of Hundreds of Search Engines from 25 Years of Web Archives

    Full text link
    The Archive Query Log (AQL) is a previously unused, comprehensive query log collected at the Internet Archive over the last 25 years. Its first version includes 356 million queries, 166 million search result pages, and 1.7 billion search results across 550 search providers. Although many query logs have been studied in the literature, the search providers that own them generally do not publish their logs to protect user privacy and vital business data. Of the few query logs publicly available, none combines size, scope, and diversity. The AQL is the first to do so, enabling research on new retrieval models and (diachronic) search engine analyses. Provided in a privacy-preserving manner, it promotes open research as well as more transparency and accountability in the search industry.Comment: SIGIR 2023 resource paper, 13 page

    Meta-level sentiment models for big social data analysis

    Get PDF
    People react to events, topics and entities by expressing their personal opinions and emotions. These reactions can correspond to a wide range of intensities, from very mild to strong. An adequate processing and understanding of these expressions has been the subject of research in several fields, such as business and politics. In this context, Twitter sentiment analysis, which is the task of automatically identifying and extracting subjective information from tweets, has received increasing attention from the Web mining community. Twitter provides an extremely valuable insight into human opinions, as well as new challenging Big Data problems. These problems include the processing of massive volumes of streaming data, as well as the automatic identification of human expressiveness within short text messages. In that area, several methods and lexical resources have been proposed in order to extract sentiment indicators from natural language texts at both syntactic and semantic levels. These approaches address different dimensions of opinions, such as subjectivity, polarity, intensity and emotion. This article is the first study of how these resources, which are focused on different sentiment scopes, complement each other. With this purpose we identify scenarios in which some of these resources are more useful than others. Furthermore, we propose a novel approach for sentiment classification based on meta-level features. This supervised approach boosts existing sentiment classification of subjectivity and polarity detection on Twitter. Our results show that the combination of meta-level features provides significant improvements in performance. However, we observe that there are important differences that rely on the type of lexical resource, the dataset used to build the model, and the learning strategy. Experimental results indicate that manually generated lexicons are focused on emotional words, being very useful for polarity prediction. On the other hand, lexicons generated with automatic methods include neutral words, introducing noise in the detection of subjectivity. Our findings indicate that polarity and subjectivity prediction are different dimensions of the same problem, but they need to be addressed using different subspace features. Lexicon-based approaches are recommendable for polarity, and stylistic part-of-speech based approaches are meaningful for subjectivity. With this research we offer a more global insight of the resource components for the complex task of classifying human emotion and opinion

    Enhancing factoid question answering using frame semantic-based approaches

    Get PDF
    FrameNet is used to enhance the performance of semantic QA systems. FrameNet is a linguistic resource that encapsulates Frame Semantics and provides scenario-based generalizations over lexical items that share similar semantic backgrounds.Doctor of Philosoph
    corecore