
    Concept Based Labeling of Text Documents Using Support Vector Machine

    Classification plays a vital role in many information management and retrieval tasks. Text classification uses labeled training data to learn a classifier and then automatically classifies the remaining text with the learned model. Classification involves several stages: text processing, feature extraction, feature vector construction, and final classification. The proposed mining model consists of sentence-based concept analysis, document-based concept analysis, corpus-based concept analysis, and a concept-based similarity measure. The model can efficiently find significant matching concepts between documents according to the semantics of their sentences, and the similarity between documents is calculated with the concept-based similarity measure. Rather than analyzing the document alone, as traditional approaches do, the model analyzes the terms that contribute to sentence semantics at the sentence, document, and corpus levels. A Support Vector Machine (SVM) is then applied to the feature vector extracted for each new document to classify it. The approach improves text classification accuracy.
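
As a rough illustration of the final classification step, the following is a minimal sketch of training a linear SVM on document feature vectors, under stated assumptions: plain term-frequency vectors stand in for the concept-based weights described above, the SVM is trained with the Pegasos stochastic sub-gradient method (a standard linear-SVM solver, not necessarily the one the authors used), and the documents, labels, and helper names are hypothetical.

```python
import random
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (a deliberate simplification)."""
    return text.lower().split()


def build_vocab(docs):
    """Assign each distinct training term a column index."""
    vocab = {}
    for doc in docs:
        for term in tokenize(doc):
            vocab.setdefault(term, len(vocab))
    return vocab


def vectorize(doc, vocab):
    """Term-frequency vector; terms unseen in training are ignored.
    Assumption: stands in for the paper's concept-based weights."""
    vec = [0.0] * len(vocab)
    for term, count in Counter(tokenize(doc)).items():
        if term in vocab:
            vec[vocab[term]] = float(count)
    return vec


def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Pegasos: stochastic sub-gradient descent on the L2-regularized hinge loss.
    Labels are +1/-1; the bias term is omitted for brevity."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    t = 0
    for _ in range(epochs):
        for i in rng.sample(range(len(X)), len(X)):  # one shuffled pass
            t += 1
            eta = 1.0 / (lam * t)
            margin = y[i] * dot(w, X[i])  # margin under the current weights
            w = [(1.0 - eta * lam) * wi for wi in w]  # regularization shrink
            if margin < 1:  # hinge loss active: step toward this example
                w = [wi + eta * y[i] * xi for wi, xi in zip(w, X[i])]
    return w


def predict(w, x):
    return 1 if dot(w, x) >= 0 else -1


# Invented toy corpus: +1 = sports, -1 = politics.
train_docs = [
    "striker scores goal in final match",
    "team wins match after late goal",
    "coach praises team defence",
    "parliament passes budget vote",
    "minister secures election vote",
    "senate debates new budget law",
]
labels = [1, 1, 1, -1, -1, -1]
vocab = build_vocab(train_docs)
X = [vectorize(d, vocab) for d in train_docs]
w = train_linear_svm(X, labels)
print(predict(w, vectorize("coach praises team defence after final match", vocab)))  # prints 1
```

A linear kernel is used because text feature vectors are typically high-dimensional and sparse; a library solver would normally replace the hand-written training loop.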

    Data Mining with Shallow vs. Linguistic Features to Study Diversification of Scientific Registers

    We present a methodology to analyze the linguistic evolution of scientific registers with data mining techniques, comparing the insights gained from shallow vs. linguistic features. The focus is on selected scientific disciplines at the boundary with computer science (computational linguistics, bioinformatics, digital construction, microelectronics). The data basis is the English Scientific Text Corpus (SCITEX), which covers a time range of roughly thirty years (1970/80s to early 2000s) (Degaetano-Ortlieb et al., 2013; Teich and Fankhauser, 2010). In particular, we investigate the diversification of scientific registers over time. Our theoretical basis is Systemic Functional Linguistics (SFL) and its specific incarnation of register theory (Halliday and Hasan, 1985). In terms of methods, we combine corpus-based methods of feature extraction and data mining techniques.

    Using text mining techniques for classical music scores analysis

    Music classification is an area of computational musicology that provides valuable insights into the evolution of composition patterns and assists in catalogue generation. The proposed work departs from previous work by classifying music based on music score information. Text mining techniques support music score processing, while classification techniques are used to construct decision models. Although the research is still at an early stage, the work already makes valuable contributions to the processing and subsequent analysis of symbolic music representations. Score processing involved counting ascending and descending chromatic intervals, note durations, and meta-information tagging. Analysis involved feature selection and the evaluation of several data mining algorithms, ensuring extensibility towards larger repositories or more complex problems. Experiments report the analysis of composition epochs on a subset of the Mutopia project open archive of classical LilyPond-annotated music scores.
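
The interval and duration counts described above can be sketched as a bag-of-features extractor, much like word counts in text mining. This assumes the scores have already been parsed from LilyPond into hypothetical (MIDI pitch, duration) pairs; the parsing step, the feature names, and the example melody are illustrative, not taken from the paper.

```python
from collections import Counter


def score_features(notes):
    """Count signed melodic intervals (in semitones) and note durations.

    `notes` is a hypothetical pre-parsed list of (midi_pitch, duration) pairs,
    e.g. (60, 4) for a middle-C quarter note; parsing LilyPond is not shown.
    """
    feats = Counter()
    for (p1, _), (p2, _) in zip(notes, notes[1:]):
        step = p2 - p1
        if step > 0:
            feats[f"asc_{step}"] += 1  # ascending interval of `step` semitones
        elif step < 0:
            feats[f"desc_{-step}"] += 1  # descending interval
        else:
            feats["repeat"] += 1  # repeated pitch
    for _, duration in notes:
        feats[f"dur_{duration}"] += 1
    return feats


def normalize(feats):
    """Relative frequencies, so scores of different lengths are comparable."""
    total = sum(feats.values())
    return {name: count / total for name, count in feats.items()} if total else {}


melody = [(60, 4), (62, 4), (64, 8), (62, 8), (62, 2)]
print(sorted(score_features(melody).items()))
# [('asc_2', 2), ('desc_2', 1), ('dur_2', 1), ('dur_4', 2), ('dur_8', 2), ('repeat', 1)]
```

The resulting dictionaries can be fed to any classifier that accepts sparse named features.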

    A Hybrid Model for Sentiment Analysis Based on Movie Review Datasets

    The classification of sentiments, often known as sentiment analysis, is now widely recognized as an open field of research. Over the past few years, a large body of work in this area has used a wide variety of research approaches. Because a high-dimensional feature set can degrade sentiment analysis performance, text mining demands careful attention to feature construction and selection. Sentiment analysis is the process of recognising and extracting subjective information from written data. It enables companies to understand the social sentiment around their brand, product, or service by monitoring the conversations that take place in internet chat rooms. To categorise people's attitudes or sentiments, this study proposes a hybrid model combining a Support Vector Machine, a Convolutional Neural Network, and Long Short-Term Memory. Applying the model to sentiment analysis on the movie review or Amazon review datasets shows that it achieves good classification performance. Preprocessing prepares the text for mining through punctuation removal and vocabulary generation; the model also uses GloVe for vectorization and TF-IDF for feature extraction. The results were compared with several baseline models, including KNN and MNB, and show that the hybrid model outperforms them.
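
The punctuation removal, vocabulary generation, and TF-IDF feature extraction mentioned above can be sketched with the standard library alone. This uses the unsmoothed weighting tf(t, d) = count(t, d) / len(d) and idf(t) = log(N / df(t)), which is one common TF-IDF variant; the paper does not state which variant it used, the review texts are invented, and the GloVe and hybrid-model stages are not shown.

```python
import math
import string
from collections import Counter


def preprocess(text):
    """Lowercase, strip ASCII punctuation, split on whitespace."""
    return text.lower().translate(str.maketrans("", "", string.punctuation)).split()


def tfidf(docs):
    """Return (vocabulary, one TF-IDF vector per document).

    tf(t, d) = count(t, d) / len(d); idf(t) = log(N / df(t)).
    """
    tokenized = [preprocess(d) for d in docs]
    vocab = sorted({t for toks in tokenized for t in toks})  # vocabulary generation
    n_docs = len(tokenized)
    df = Counter(t for toks in tokenized for t in set(toks))  # document frequency
    idf = {t: math.log(n_docs / df[t]) for t in vocab}
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        length = len(toks) or 1  # avoid dividing by zero for empty documents
        vectors.append([counts[t] / length * idf[t] for t in vocab])
    return vocab, vectors


reviews = ["A great, moving film!", "A dull film.", "Great acting; great score."]
vocab, vectors = tfidf(reviews)
print(vocab)  # ['a', 'acting', 'dull', 'film', 'great', 'moving', 'score']
```

Terms that occur in every document receive an IDF of zero under this variant; smoothed variants add a constant to avoid that.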

    Transforming Graph Representations for Statistical Relational Learning

    Relational data representations have become an increasingly important topic due to the recent proliferation of network datasets (e.g., social, biological, information networks) and a corresponding increase in the application of statistical relational learning (SRL) algorithms to these domains. In this article, we examine a range of representation issues for graph-based relational data. Since the choice of relational data representation for the nodes, links, and features can dramatically affect the capabilities of SRL algorithms, we survey approaches and opportunities for relational representation transformation designed to improve the performance of these algorithms. This leads us to introduce an intuitive taxonomy for data representation transformations in relational domains that incorporates link transformation and node transformation as symmetric representation tasks. In particular, the transformation tasks for both nodes and links include (i) predicting their existence, (ii) predicting their label or type, (iii) estimating their weight or importance, and (iv) systematically constructing their relevant features. We motivate our taxonomy through detailed examples and use it to survey and compare competing approaches for each of these tasks. We also discuss general conditions for transforming links, nodes, and features. Finally, we highlight challenges that remain to be addressed.
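
One of the link transformations in the taxonomy above, predicting link existence, can be illustrated with a neighborhood-overlap heuristic. The sketch below scores each missing link by the Jaccard similarity of the endpoints' neighbor sets and adds the top-k pairs to a copy of the graph; Jaccard scoring is a common baseline chosen here for illustration, not necessarily an approach from the survey, and the function names and toy graph are hypothetical.

```python
from itertools import combinations


def jaccard_scores(adj):
    """Score each non-adjacent node pair by the Jaccard similarity of its neighbor sets."""
    scores = {}
    for u, v in combinations(sorted(adj), 2):
        if v in adj[u]:
            continue  # link already exists
        union = adj[u] | adj[v]
        if union:
            scores[(u, v)] = len(adj[u] & adj[v]) / len(union)
    return scores


def add_predicted_links(adj, k=1):
    """Return a copy of the graph with the k highest-scoring missing links added."""
    scores = jaccard_scores(adj)
    # Highest score first; ties broken by node names for determinism.
    top = sorted(scores, key=lambda pair: (-scores[pair], pair))[:k]
    new_adj = {node: set(nbrs) for node, nbrs in adj.items()}
    for u, v in top:
        new_adj[u].add(v)
        new_adj[v].add(u)
    return new_adj, top


# Undirected toy graph as an adjacency dict: a-b, a-c, b-d, c-d, d-e.
graph = {
    "a": {"b", "c"},
    "b": {"a", "d"},
    "c": {"a", "d"},
    "d": {"b", "c", "e"},
    "e": {"d"},
}
augmented, added = add_predicted_links(graph, k=1)
print(added)  # [('b', 'c')]
```

Nodes b and c share both of their neighbors, so their pair scores 1.0 and is the link added; an SRL algorithm would then run on the augmented graph.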

    Opinion Mining on Non-English Short Text

    As the types and number of online venues such as microblogs increase, automated sentiment analysis of textual resources has become an essential data mining task. In this paper, we investigate the problem of mining opinions in collections of informal short texts, detecting both the positive and the negative sentiment strength of each text. We focus on a non-English language that has few resources for text mining; this approach can help improve sentiment analysis in languages for which no list of opinionated words exists. We propose a new method that projects text into dense, low-dimensional feature vectors according to the sentiment strength of its words, and we detect the mixture of positive and negative sentiments on a multi-variant scale. Empirical evaluation of the proposed framework on Turkish tweets shows that our approach achieves good results for opinion mining.
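
A projection of texts into low-dimensional vectors by word sentiment strength might look roughly like the following. This is an assumption-laden sketch, not the authors' method: each word's positive strength is estimated as a smoothed share of its occurrences in positive versus negative training texts, its negative strength is the complement, and a text becomes the two-dimensional vector (strongest positive word, strongest negative word), so a text mixing both sentiments scores high on both axes. The tiny Turkish training set and the function names are invented.

```python
from collections import Counter


def word_strengths(texts, labels, alpha=1.0):
    """Positive strength of each word: smoothed share of its occurrences in
    positive texts (label 1) versus negative texts (label 0)."""
    pos, neg = Counter(), Counter()
    for text, label in zip(texts, labels):
        (pos if label == 1 else neg).update(text.lower().split())
    return {w: (pos[w] + alpha) / (pos[w] + neg[w] + 2 * alpha) for w in set(pos) | set(neg)}


def project(text, strengths):
    """Map a text to (strongest positive word, strongest negative word), where a
    word's negative strength is 1 minus its positive strength. Unknown words are skipped."""
    scores = [strengths[w] for w in text.lower().split() if w in strengths]
    if not scores:
        return (0.0, 0.0)
    return (max(scores), max(1.0 - s for s in scores))


# Invented training tweets: harika = wonderful, güzel = nice, kötü = bad, berbat = awful.
train = ["harika film", "güzel film", "kötü film", "berbat film"]
strengths = word_strengths(train, [1, 1, 0, 0])
print(project("harika ama kötü", strengths))  # "ama" (but) is unseen and skipped
```

Because the strengths are learned from labeled texts, this kind of sketch needs no precompiled list of opinionated words, which matches the low-resource setting described above.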

    Automatic domain ontology extraction for context-sensitive opinion mining

    Automated analysis of the sentiments expressed in online consumer feedback can support both organizations’ business strategy development and individual consumers’ comparison shopping. However, existing opinion mining methods either adopt a context-free sentiment classification approach or rely on a large number of manually annotated training examples to perform context-sensitive sentiment classification. Guided by the design science research methodology, we illustrate the design, development, and evaluation of a novel context-sensitive opinion mining system based on a fuzzy domain ontology. Our novel ontology extraction mechanism, underpinned by a variant of Kullback-Leibler divergence, can automatically acquire contextual sentiment knowledge across various product domains to improve sentiment analysis. Evaluated on a benchmark dataset and real consumer reviews collected from Amazon.com, our system shows a marked performance improvement over the context-free baseline.
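
To illustrate how Kullback-Leibler divergence can surface domain-specific terms, the sketch below scores each term by its contribution p_D(t) log(p_D(t) / p_B(t)) to KL(P_D || P_B), where P_D and P_B are add-one smoothed unigram distributions of a domain corpus and a background corpus. The paper uses its own variant of KL divergence, which this generic per-term score does not reproduce; the corpora and function names are invented for illustration.

```python
import math
from collections import Counter


def term_distribution(docs, vocab, alpha=1.0):
    """Add-alpha smoothed unigram distribution over a shared vocabulary."""
    counts = Counter(t for d in docs for t in d.lower().split())
    total = sum(counts[t] for t in vocab) + alpha * len(vocab)
    return {t: (counts[t] + alpha) / total for t in vocab}


def kl_term_scores(domain_docs, background_docs):
    """Per-term contribution p_D(t) * log(p_D(t) / p_B(t)) to KL(P_D || P_B).
    Large positive scores flag terms over-represented in the domain corpus."""
    vocab = {t for d in domain_docs + background_docs for t in d.lower().split()}
    p_d = term_distribution(domain_docs, vocab)
    p_b = term_distribution(background_docs, vocab)
    return {t: p_d[t] * math.log(p_d[t] / p_b[t]) for t in vocab}


# Invented corpora: camera reviews (domain) vs. film reviews (background).
domain = ["battery life is short", "short battery life again", "the lens is sharp"]
background = ["the movie is long", "the plot is thin", "the lens of the story"]
scores = kl_term_scores(domain, background)
for term, score in sorted(scores.items(), key=lambda kv: -kv[1])[:3]:
    print(f"{term}: {score:.3f}")
```

Summing the per-term scores gives the full divergence, which is non-negative; individual terms that are rarer in the domain than in the background (here "the") score below zero.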