
    Theory and Applications for Advanced Text Mining

    Owing to the growth of computer and web technologies, we can easily collect and store large amounts of text data, and it is reasonable to believe that these data contain useful knowledge. Text mining techniques have been studied intensively since the late 1990s in order to extract that knowledge. Although many important techniques have been developed, the text mining research field continues to expand to meet the needs arising from various application fields. This book is composed of 9 chapters introducing advanced text mining techniques, ranging from relation extraction to under-resourced or less-resourced languages. I believe that this book will bring new knowledge to the text mining field and help many readers open up new research fields.

    Comparing Statistical and Data Mining Techniques for Enrichment Ontology with Instances

    Enriching an ontology with instances is an important task because it extends the knowledge in the ontology to cover the domain of interest more extensively, so that greater benefits can be obtained. There are many techniques for classifying instances of concepts, two popular ones being statistical and data mining methods. The paper compares the use of the two methods to classify instances in order to enrich an ontology with greater domain knowledge, selecting conditional random fields as the statistical method and feature-weight k-nearest neighbor classification as the data mining method. The experiments are conducted on a tourism ontology. The results show that conditional random fields provide greater precision and recall than feature-weight k-nearest neighbor classification: the F1-measure is 74.09% for conditional random fields and 60.04% for feature-weight k-nearest neighbor classification.

    Sentiment Classification of Online Customer Reviews and Blogs Using Sentence-level Lexical Based Semantic Orientation Method

    Sentiment analysis is the process of extracting knowledge from people's opinions, appraisals and emotions toward entities, events and their attributes. These opinions strongly influence customers and ease their choices regarding online shopping, events, products and entities. With the rapid growth of online resources, a vast amount of new data in the form of customer reviews and opinions is being generated continuously. Hence, sentiment analysis methods are desirable for developing efficient and effective analysis and classification of customer reviews, blogs and comments. The main motivation for this thesis is to develop a high-performance, domain-independent sentiment classification method. This study focuses on sentiment analysis at the sentence level, using a lexical-based method for different types of data such as reviews and blogs. The proposed method is based on general lexicons, i.e. WordNet, SentiWordNet and user-defined lexical dictionaries, for sentiment orientation. The relations and glosses of these dictionaries provide a solution to the domain portability problem. The experiments are performed on various data sets such as customer reviews and blog comments. The results show that the proposed method with sentence contextual information is effective for sentiment classification. The proposed method performs better than word-level and text-level corpus-based machine learning methods for semantic orientation. The results highlight that the proposed method achieves an average accuracy of 86% at sentence level and 97% at feedback level for customer reviews. Similarly, it achieves an average accuracy of 83% at sentence level and 86% at feedback level for blog comments.

    Semantic discovery and reuse of business process patterns

    Patterns currently play an important role in modern information systems (IS) development, although their use has mainly been restricted to the design and implementation phases of the development lifecycle. Given the increasing significance of business modelling in IS development, patterns have the potential to provide a viable solution for promoting the reusability of recurrent generalized models in the very early stages of development. As a statement of research in progress, this paper focuses on business process patterns and proposes an initial methodological framework for the discovery and reuse of business process patterns within the IS development lifecycle. The framework borrows ideas from the domain engineering literature and proposes the use of semantics to drive both the discovery of patterns and their reuse.

    Emotion at the end of life: Semantic annotation and key domains in a pilot study audiovisual corpus

    This article focuses on emotion talk in English and the semantic annotation of emotions in a pilot study corpus about the end of life. It describes the process of compiling and annotating a corpus containing the transcript of the verbal component of audiovisual material regarding end-of-life care. The paper also aims to present a lexico-semantic analysis of emotion talk based on the combined use of two corpus processing tools: Wmatrix and Sketch Engine. The findings indicate that the limitations of semantic annotation can be overcome by concordance and collocational analysis. They also reveal that the lexis of emotion is commonly present at the end of life and show the main keywords and key concepts, the predominant semantic categories of emotion and the most frequent emotion words in the corpus. The results suggest that the most frequent emotions in the corpus are SADNESS, FEAR, LIKING, LOVE, HAPPINESS/RELIEF, WORRY, CALMNESS, ANGER, HOPE and CONFIDENCE. Funding: FEDER Andalucia A-HUM-131-UGR18; Andalusian Regional Government (Junta de Andalucia-Consejeria de Economia y Conocimiento); Spanish Government; European Commission A-HUM-131-UGR18; Universidad de Granada/CBUA PID2020-118775RB-C2

    Forms of hybridity in travel blogs

    The technological revolution has considerably changed not only the way people travel but also how they narrate their experiences. In this respect, the analysis of travel blogs can offer insights into the discursive and communicative practices which characterize this hybrid genre. This study is based on the investigation of a corpus of highly visited travel blogs and aims to observe their hybridity from a multitude of perspectives. More specifically, hybridity is seen in terms of genre, (a)synchronicity, collaboration, modes of communication and level of multimodality, style, orientation, levels of subjectivity and pragmatic functions. From a lexical perspective, specific attention is devoted to evaluative adjectives. In particular, the use of adjectives belonging to conceptual classes such as 'assessment' or 'deviance' is a widespread tool to express the blogger's subjectivity and may assume different communicative and pragmatic functions.

    Semi-automating the reading programme for a historical dictionary project

    This paper describes the resources and software procedures used or developed in a major enabling step towards the revision of the scholarly reference work A Dictionary of South African English on Historical Principles (DSAE, Silva et al. 1996), namely the semi-automatic generation of a digitally-sourced lexical database on which new and updated dictionary entries will be based; as well as the addition, in parallel, of a new corpus of South African English (SAE) to the project. Drawing on online data sources and an extensive list of known SAE word forms, we have developed a software toolchain to gather, encode, annotate and collate textual sources, producing: (i) a 3.1-billion part-of-speech-annotated corpus of South African English; (ii) a lexical database of illustrative quotations for over 20,000 known SAE word forms, available for selection at the entry-revision stage; and (iii) a list of potential new variant spellings and headword inclusion candidates. These steps replace, where recent electronic sources are concerned, the mechanical aspects of quotation gathering, normally undertaken manually through a reading programme requiring years of teamwork to acquire sufficient coverage (cf. Hicks 2010). Keywords: corpora, dictionary workflows, historical lexicography, language varieties, lexical databases, reading programmes, South African Englis