
5 research outputs found

Dynamic Document Annotation for Efficient Data Retrieval

Author: Deepali R Dagale, Prof. Poorna Shankar
Publication venue: 'Auricle Technologies, Pvt., Ltd.'
Publication date: 31/12/2016

Document annotation is considered one of the most popular methods for searching a large database of text documents using the metadata present in each document. Several application domains, such as scientific networks and blogs, share large amounts of information, usually as unstructured text documents. Manually annotating each document is a tedious task. Annotations help identify a document's topic and let the reader quickly overview and understand it. Dynamic document annotation provides a solution to this type of problem. Dynamic annotation of documents is generally treated as a semi-supervised learning task: documents are dynamically assigned to one of a set of predefined classes based on features extracted from their textual content. This paper presents a survey of the Collaborative Adaptive Data Sharing platform (CADS) for document annotation and of the use of the query workload to direct the annotation process. A key novelty of CADS is that it learns over time the most important data attributes of the application and uses this knowledge to guide data insertion and querying.

International Journal on Recent and Innovation Trends in Computing and Communication
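The abstract says CADS learns which attributes matter from the query workload and uses that to guide data insertion. As a rough illustration only, the sketch below ranks attributes by the fraction of past queries that mention them, so the most-queried attributes can be suggested when a new document is annotated. The workload, attribute names, and the `querying_value` / `suggest_attributes` helpers are hypothetical, and CADS's actual scoring is not reproduced here.

```python
from collections import Counter

# Hypothetical query workload: each query maps attribute names to the values searched for.
workload = [
    {"location": "Chennai", "event": "flood"},
    {"location": "Mumbai"},
    {"event": "flood", "severity": "high"},
    {"location": "Chennai", "event": "cyclone"},
]


def querying_value(workload):
    """Fraction of workload queries that reference each attribute (a simplified assumption)."""
    counts = Counter(attr for query in workload for attr in query)
    return {attr: n / len(workload) for attr, n in counts.items()}


def suggest_attributes(workload, k):
    """Up to k attributes to prompt for at insertion time, most-queried first, ties by name."""
    qv = querying_value(workload)
    return sorted(qv, key=lambda attr: (-qv[attr], attr))[:k]
```

With this workload, `suggest_attributes(workload, 2)` returns `['event', 'location']`, since each is referenced by three of the four queries; ties are broken alphabetically so the ranking is deterministic.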

A study of feature extraction techniques for classifying topics and sentiments from news posts

Author: Al-Dyani Wafa Zubair Abdullah
Publication date: 01/01/2014

Recently, many news channels have set up their own Facebook pages, on which news posts are released daily. These posts contain temporal opinions about social events that may change over time due to external factors, and they can also serve as a monitor of significant events around the world. As a result, much text mining research has been conducted in the area of Temporal Sentiment Analysis, one of whose most challenging tasks is to detect and extract key features from news posts that arrive continuously over time. Extracting these features is difficult because of the posts' complex properties; moreover, posts about a specific topic may grow or vanish over time, producing imbalanced datasets. This study therefore conducted a comparative analysis of feature extraction techniques (TF-IDF, TF, BTO, IG, Chi-square) with three n-gram feature types (unigram, bigram, trigram), using SVM as the classifier. The aim was to discover the optimal Feature Extraction Technique (FET) for achieving the best accuracy in both topic and sentiment classification. The analysis was conducted on datasets from three news channels. For topic classification, the experimental results showed Chi-square with unigrams to be the best FET. To address the imbalanced-data problem, the study then combined the best FET with an oversampling technique. This improved the classifier's performance, raising accuracy to 93.37%, 92.89%, and 91.92% for BBC, Al-Arabiya, and Al-Jazeera, respectively, compared with the results obtained on the original datasets. The same combination (Chi-square + unigram) was used for sentiment classification and achieved accuracies of 81.87%, 70.01%, and 77.36%.
However, testing the identified optimal FET on unseen, randomly selected news posts yielded very low accuracies for both topic and sentiment classification, owing to changes in topics and sentiments over time.

Universiti Utara Malaysia: UUM eTheses
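The best-performing combination above rests on two ideas: scoring unigrams with the chi-square statistic and rebalancing classes by oversampling. The sketch below shows both in plain Python under stated assumptions: a naive lowercase whitespace tokenizer, binary document presence for the 2x2 contingency table, and simple random oversampling (duplicating randomly chosen minority-class posts). The helper names and toy posts are hypothetical, and the thesis's actual preprocessing, oversampling variant, and SVM step are not reproduced.

```python
import random
from collections import Counter


def unigrams(text):
    # Naive lowercase whitespace tokenizer, assumed for illustration.
    return set(text.lower().split())


def chi_square(docs, labels, term, cls):
    """Chi-square score of `term` for class `cls` from a 2x2 document contingency table."""
    a = b = c = d = 0
    for doc, label in zip(docs, labels):
        has_term = term in unigrams(doc)
        in_cls = label == cls
        if has_term and in_cls:
            a += 1  # in class, contains term
        elif has_term:
            b += 1  # other class, contains term
        elif in_cls:
            c += 1  # in class, lacks term
        else:
            d += 1  # other class, lacks term
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return n * (a * d - c * b) ** 2 / denom if denom else 0.0


def top_terms(docs, labels, cls, k):
    """The k highest-scoring unigrams for `cls`, ties broken alphabetically."""
    vocab = set().union(*(unigrams(doc) for doc in docs))
    return sorted(vocab, key=lambda t: (-chi_square(docs, labels, t, cls), t))[:k]


def random_oversample(docs, labels, seed=0):
    """Duplicate randomly chosen docs of each smaller class until all classes match the largest."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_docs, out_labels = list(docs), list(labels)
    for cls, n in counts.items():
        pool = [doc for doc, label in zip(docs, labels) if label == cls]
        for _ in range(target - n):
            out_docs.append(rng.choice(pool))
            out_labels.append(cls)
    return out_docs, out_labels


# Hypothetical, deliberately imbalanced toy corpus.
docs = ["election vote result", "vote count election", "match goal score", "election campaign vote"]
labels = ["politics", "politics", "sport", "politics"]
```

On this toy corpus, `chi_square(docs, labels, "goal", "sport")` is 4.0 while `"result"` scores about 0.44. The statistic is two-sided: "election", which never appears in the sport post, also scores 4.0, so `top_terms(docs, labels, "sport", 3)` returns `['election', 'goal', 'match']`. `random_oversample` raises the sport class from one post to three.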

Classification of terrorism based on tweet text post on twitter using term weighting schemes

Author: Muhammad Muhammad Fikri Arif
Publication date: 01/01/2018

Social Network Services (SNS) have become the main platform for distributing information and sharing experience and knowledge. Twitter gained popularity very quickly after its founding, across all generations of users. This popularity has led to prominent media coverage, with instant news and advertisements from all over the world. However, the content of tweets posted on Twitter is not necessarily true and can sometimes be considered a threat to other users. Personnel involved in intelligence gathering face growing difficulty as the complexity of crime increases, compounded by human error and time constraints. It is therefore difficult to prevent undesired posts, such as terrorist posts intended to disseminate propaganda. To improve automated content-based classification, three term weighting schemes were investigated on two datasets. The study aims to improve content-based classification accuracy on Twitter by comparing term weighting schemes for classifying terrorism-related content. The three schemes, Entropy, Term Frequency Inverse Document Frequency (TF-IDF), and Term Frequency Relevance Frequency (TFRF), are used in the feature selection process for filtering Twitter posts. Their performance was examined on the datasets, with classification performed by a Support Vector Machine (SVM), and they were judged on accuracy, precision, recall, and F-score. The results showed that TFRF performed better than Entropy and TF-IDF. It is hoped that this study will give insight to other researchers, especially those who want to work in a similar area.

Universiti Teknologi Malaysia Institutional Repository
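To make the three weighting schemes concrete, here is a sketch using common textbook formulations, which may differ from the thesis's exact variants: TF-IDF as tf·log(N/df); TF-RF following the widely used tf.rf formulation of Lan et al., tf·log2(2 + a/max(1, c)), where a and c count positive- and negative-class documents containing the term; and log-entropy as log(1 + tf) scaled by 1 + Σ p·log p / log N. The tokenizer, helper names, and toy posts are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive lowercase whitespace tokenizer, assumed for illustration.
    return text.lower().split()


def tf_idf(term, doc, docs):
    """Raw term frequency times log(N / df), where df counts docs containing the term."""
    tf = Counter(tokenize(doc))[term]
    df = sum(1 for d in docs if term in tokenize(d))
    return tf * math.log(len(docs) / df) if df else 0.0


def tf_rf(term, doc, docs, labels):
    """tf * log2(2 + a / max(1, c)); a/c = positive/negative docs containing the term."""
    tf = Counter(tokenize(doc))[term]
    a = sum(1 for d, y in zip(docs, labels) if y and term in tokenize(d))
    c = sum(1 for d, y in zip(docs, labels) if not y and term in tokenize(d))
    return tf * math.log2(2 + a / max(1, c))


def log_entropy(term, doc, docs):
    """log(1 + tf) times the global entropy weight 1 + sum_j p_j log p_j / log N."""
    counts = [Counter(tokenize(d))[term] for d in docs]
    gf = sum(counts)  # total occurrences of the term across the corpus
    n = len(docs)
    if gf == 0 or n < 2:
        return 0.0
    entropy = sum((c / gf) * math.log(c / gf) for c in counts if c)
    global_weight = 1 + entropy / math.log(n)
    return math.log(1 + Counter(tokenize(doc))[term]) * global_weight


# Hypothetical toy corpus: True marks a post from the positive (e.g. terrorism-related) class.
docs = ["attack planned tonight", "join attack now", "nice weather tonight", "weather report now"]
labels = [True, True, False, False]
```

Here "attack" and "tonight" each occur in two of the four posts, so TF-IDF and log-entropy weight them identically in the first post (about 0.69 and 0.35, respectively). TF-RF, which uses the class labels, gives "attack" 2.0 because it appears only in positive posts, and "tonight" about 1.58. That class-aware distinction is one plausible reason a supervised scheme can outperform unsupervised ones in this kind of task.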

Lexicon Expansion System for Domain and Time Oriented Sentiment Analysis

Author: Nuno Ricardo Pinheiro da Silva Guimarães
Publication date: 28/11/2016

Repositório Aberto da Universidade do Porto

Classification of facebook news feeds and sentiment analysis

Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'

Crossref