3,134 research outputs found

    Simultaneous Inference of User Representations and Trust

    Inferring trust relations between social media users is critical for a number of applications wherein users seek credible information. The fact that available trust relations are scarce and skewed makes trust prediction a challenging task. To the best of our knowledge, this is the first work on exploring representation learning for trust prediction. We propose an approach that uses only a small amount of binary user-user trust relations to simultaneously learn user embeddings and a model to predict trust between user pairs. We empirically demonstrate that for trust prediction, our approach outperforms classifier-based approaches which use state-of-the-art representation learning methods like DeepWalk and LINE as features. We also conduct experiments which use embeddings pre-trained with DeepWalk and LINE each as an input to our model, resulting in further performance improvement. Experiments with a dataset of ~356K user pairs show that the proposed method can obtain a high F-score of 92.65%. Comment: To appear in the proceedings of ASONAM'17. Please cite that version.

    Conceptual Sentiment Analysis Model

    The bag-of-words approach is widely used for sentiment analysis. It maps the terms in the reviews to term-document vectors and thus disrupts the syntactic structure of sentences in the reviews. Neither the association among the terms nor the semantic structure of sentences is preserved. This research work focuses on classifying sentiments by considering the syntactic and semantic structure of the sentences in the review. To improve accuracy, sentiment classifiers based on relative frequency, average frequency and term frequency-inverse document frequency were proposed. To handle terms with apostrophes, preprocessing techniques were extended. To focus on opinionated content, subjectivity extraction was performed at the phrase level. Experiments were performed on Pang & Lee's, Kaggle's and UCI's datasets. Classifiers were also evaluated on UCI's Product and Restaurant dataset. Sentiment classification accuracy improved from 67.9% for a comparable term weighting technique, DeltaTFIDF, up to 77.2% for the proposed classifiers. Adding the proposed concept-based approach, subjectivity extraction and the extensions to preprocessing techniques improved the accuracy to 93.9%.

    Linguistic Geometries for Unsupervised Dimensionality Reduction

    Text documents are complex high-dimensional objects. To effectively visualize such data it is important to reduce its dimensionality and visualize the low-dimensional embedding as a 2-D or 3-D scatter plot. In this paper we explore dimensionality reduction methods that draw upon domain knowledge in order to achieve a better low-dimensional embedding and visualization of documents. We consider the use of geometries specified manually by an expert, geometries derived automatically from corpus statistics, and geometries computed from linguistic resources. Comment: 13 pages, 15 figures

    Hashing for Similarity Search: A Survey

    Similarity search (nearest neighbor search) is the problem of finding, in a large database, the data items whose distances to a query item are the smallest. Various methods have been developed to address this problem, and recently a lot of effort has been devoted to approximate search. In this paper, we present a survey on one of the main solutions, hashing, which has been widely studied since the pioneering work on locality sensitive hashing. We divide the hashing algorithms into two main categories: locality sensitive hashing, which designs hash functions without exploring the data distribution, and learning to hash, which learns hash functions according to the data distribution. We review them from various aspects, including hash function design, distance measure, and search scheme in the hash coding space.
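    As a rough illustration of the first category, the sketch below shows one data-independent scheme, random-hyperplane hashing for cosine similarity. It is not taken from the survey; the function names (`make_hyperplanes`, `hash_vector`, `hamming`, `build_index`) and all parameters are assumptions chosen for this example.

```python
import random


def make_hyperplanes(num_bits, dim, seed=0):
    """Draw num_bits random Gaussian hyperplanes, independent of the data."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]


def hash_vector(vec, hyperplanes):
    """One bit per hyperplane: which side of the plane the vector falls on."""
    bits = []
    for plane in hyperplanes:
        dot = sum(p * v for p, v in zip(plane, vec))
        bits.append(1 if dot >= 0 else 0)
    return tuple(bits)


def hamming(code_a, code_b):
    """Number of differing bits; small for vectors with a small angle."""
    return sum(x != y for x, y in zip(code_a, code_b))


def build_index(items, hyperplanes):
    """Group item keys into buckets keyed by their hash code."""
    index = {}
    for key, vec in items.items():
        index.setdefault(hash_vector(vec, hyperplanes), []).append(key)
    return index
```

    A query is hashed with the same hyperplanes, and candidates are taken from the matching bucket or from buckets within a small Hamming distance. Learning-to-hash methods would instead fit the hash functions to the data.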

    Exploring Public Sentiment: A Sentiment Analysis of GST Discourse on Twitter using Supervised Machine Learning Classifiers

    A key economic move that resulted in heated disputes was India's introduction of the Goods and Services Tax (GST). Social media channels offered a widely used forum for people to express their views on the GST, providing insightful data for gauging public mood and guiding future revisions. The sentiment of 5,629 GST-related tweets, collected through the Twitter Developer API, was assessed using the VADER lexicon. Text was vectorized with tf-idf, with 80% of the data used for training and the remaining 20% for testing. In this study, six well-known classifiers (the Ridge Classifier, Logistic Regression, Linear SVC, Perceptron, Decision Tree, and K-Nearest Neighbor) were thoroughly compared to evaluate their performance in a range of circumstances. The performance measurements included accuracy, precision, recall, f-score, and training and testing times. The study presented novel pre-processing methods and examined the training and testing times before concluding that the Ridge Classifier performed better than the others in terms of accuracy, precision, and efficiency.
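    The vectorization and split steps mentioned in this abstract can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the study's code: whitespace tokenization, the `log(N/df) + 1` idf variant, and the helper names `tfidf_vectors` and `split_80_20` are all choices made for this example.

```python
import math
import random
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse {term: weight} dict per document.

    Assumes whitespace tokenization and idf = log(N / df) + 1.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))  # count each term once per document
    idf = {term: math.log(n_docs / df) + 1.0 for term, df in doc_freq.items()}
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        vectors.append({t: (c / total) * idf[t] for t, c in counts.items()})
    return vectors


def split_80_20(items, seed=0):
    """Shuffle a copy of items and split it 80% train / 20% test."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * 0.8)
    return shuffled[:cut], shuffled[cut:]
```

    The resulting vectors and labels would then be passed to whichever classifiers are being compared. Terms that occur in fewer documents receive a larger idf and therefore a larger weight.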

    Multiple-Domain Sentiment Classification for Cantonese Using a Combined Approach

    In this study, we propose a combined approach that amalgamates machine learning and lexicon-based approaches for multiple-domain sentiment classification in support of Cantonese-based social media analysis. Our study contributes to the existing literature not only by investigating the effectiveness of the proposed combined approach for supporting social media analysis in the Cantonese context but also by verifying that the proposed method outperforms the baseline approaches commonly used in the literature. We demonstrate that social media network-based classifiers can be general classifiers that support multiple-domain sentiment classification.