Simultaneous Inference of User Representations and Trust
Inferring trust relations between social media users is critical for a number
of applications wherein users seek credible information. The fact that
available trust relations are scarce and skewed makes trust prediction a
challenging task. To the best of our knowledge, this is the first work to
explore representation learning for trust prediction. We propose an approach
that uses only a small amount of binary user-user trust relations to
simultaneously learn user embeddings and a model to predict trust between user
pairs. We empirically demonstrate that for trust prediction, our approach
outperforms classifier-based approaches that use embeddings from state-of-the-art
representation learning methods such as DeepWalk and LINE as features. We also
conduct experiments that use embeddings pre-trained with DeepWalk and LINE,
each as input to our model, resulting in further performance improvement.
Experiments with a dataset of 356K user pairs show that the proposed
method achieves a high F-score of 92.65%.
Comment: To appear in the proceedings of ASONAM'17. Please cite that version.
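The abstract does not specify the model, so the following is only a minimal sketch of the general idea of learning user embeddings and a trust predictor at the same time from binary user-user labels. It assumes a hypothetical scoring function P(i trusts j) = sigmoid(u_i . v_j + b), with separate "truster" and "trustee" vectors per user, trained by plain SGD on the logistic loss; the class name, dimensions, learning rate, and epoch count are all illustrative choices, not the authors'.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class TrustModel:
    """Hypothetical joint model: every user gets a 'truster' vector u and a
    'trustee' vector v, and P(i trusts j) = sigmoid(u_i . v_j + b).
    The embeddings and the bias are learned together from labeled pairs."""

    def __init__(self, n_users, dim=8, seed=0):
        rng = random.Random(seed)
        self.u = [[rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(n_users)]
        self.v = [[rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(n_users)]
        self.b = 0.0

    def predict(self, i, j):
        """Estimated probability that user i trusts user j."""
        score = sum(a * c for a, c in zip(self.u[i], self.v[j])) + self.b
        return sigmoid(score)

    def fit(self, pairs, epochs=300, lr=0.1, seed=0):
        """pairs: iterable of (truster, trustee, label) with label 1 or 0.
        Runs plain SGD on the logistic (binary cross-entropy) loss."""
        rng = random.Random(seed)
        data = list(pairs)
        for _ in range(epochs):
            rng.shuffle(data)
            for i, j, y in data:
                g = self.predict(i, j) - y  # d(loss)/d(score)
                ui, vj = self.u[i], self.v[j]
                for k in range(len(ui)):
                    old_u = ui[k]
                    ui[k] -= lr * g * vj[k]
                    vj[k] -= lr * g * old_u
                self.b -= lr * g
        return self


if __name__ == "__main__":
    # Toy labels: 1 = trusts, 0 = does not trust.
    pairs = [(0, 1, 1), (1, 2, 1), (0, 2, 1), (2, 0, 0), (1, 0, 0), (2, 1, 0)]
    model = TrustModel(n_users=3).fit(pairs)
    print(round(model.predict(0, 1), 3), round(model.predict(2, 0), 3))
```

Because the gradient flows into both the embeddings and the bias in the same update, there is no separate feature-extraction stage; pre-trained vectors (for example from DeepWalk or LINE) could be used to initialise `u` and `v` instead of the random values.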
Conceptual Sentiment Analysis Model
The bag-of-words approach is widely used for sentiment analysis. It maps the terms in reviews to term-document vectors and thus disrupts the syntactic structure of the sentences in the reviews; associations among terms, or the semantic structure of sentences, are not preserved either. This research work focuses on classifying sentiment by considering the syntactic and semantic structure of the sentences in each review. To improve accuracy, sentiment classifiers based on relative frequency, average frequency, and term frequency-inverse document frequency (TF-IDF) were proposed. To handle terms containing apostrophes, the preprocessing techniques were extended. To focus on opinionated content, subjectivity extraction was performed at the phrase level. Experiments were performed on the Pang & Lee, Kaggle, and UCI datasets, and the classifiers were also evaluated on UCI's Product and Restaurant datasets. Sentiment classification accuracy improved from 67.9% for a comparable term weighting technique, DeltaTFIDF, to as much as 77.2% for the proposed classifiers. Incorporating the proposed concept-based approach, subjectivity extraction, and the extensions to preprocessing improved accuracy to 93.9%.
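The abstract defines neither its apostrophe handling nor its relative-frequency classifier, so the following is only a minimal sketch under two stated assumptions: (a) contractions such as "didn't" are expanded before tokenization so the negation survives as a separate token, and (b) each term is weighted by how much more often it occurs, relative to class size, in positive than in negative training reviews. The contraction table, function names, and weighting rule are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical contraction rules; irregular forms are handled first.
SPECIAL = {"can't": "can not", "won't": "will not"}
SUFFIXES = [("n't", " not"), ("'re", " are"), ("'ll", " will"),
            ("'ve", " have"), ("'m", " am"), ("'d", " would")]


def expand_contractions(text):
    """Lower-case text and expand apostrophe contractions so that, e.g.,
    the negation in "didn't" survives tokenization as "did not"."""
    text = text.lower().replace("\u2019", "'")  # normalize curly apostrophes
    for word, full in SPECIAL.items():
        text = text.replace(word, full)
    for suffix, full in SUFFIXES:
        text = text.replace(suffix, full)
    return text


def tokenize(text):
    return re.findall(r"[a-z]+", expand_contractions(text))


def train_relative_frequency(pos_docs, neg_docs):
    """Weight each term by the difference of its relative frequencies in the
    positive and negative reviews, normalized to [-1, 1].
    Both document lists must be non-empty."""
    pos = Counter(t for d in pos_docs for t in tokenize(d))
    neg = Counter(t for d in neg_docs for t in tokenize(d))
    n_pos, n_neg = sum(pos.values()), sum(neg.values())
    weights = {}
    for term in pos.keys() | neg.keys():
        rp, rn = pos[term] / n_pos, neg[term] / n_neg
        weights[term] = (rp - rn) / (rp + rn)
    return weights


def classify(weights, review):
    """Sum the weights of known terms; unseen terms contribute nothing."""
    score = sum(weights.get(t, 0.0) for t in tokenize(review))
    return "positive" if score >= 0 else "negative"


if __name__ == "__main__":
    w = train_relative_frequency(
        ["great movie, loved it", "I loved the acting"],
        ["I didn't like it, boring", "boring and slow, didn't care"],
    )
    print(classify(w, "loved it"), classify(w, "boring, didn't care"))
```

Without the expansion step, "didn't" would be split by the tokenizer into the fragments "didn" and "t", and the negation would be lost as a reusable feature.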
Linguistic Geometries for Unsupervised Dimensionality Reduction
Text documents are complex high-dimensional objects. To effectively visualize
such data, it is important to reduce its dimensionality and visualize the
low-dimensional embedding as a 2-D or 3-D scatter plot. In this paper, we explore
dimensionality reduction methods that draw upon domain knowledge in order to
achieve a better low-dimensional embedding and visualization of documents. We
consider the use of geometries specified manually by an expert, geometries
derived automatically from corpus statistics, and geometries computed from
linguistic resources.
Comment: 13 pages, 15 figures.
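One common way to encode such a geometry, assumed here rather than taken from the paper, is a symmetric positive semi-definite term-similarity matrix H that defines the distance d_H(x, y) = sqrt((x - y)^T H (x - y)) between bag-of-words vectors. H = I recovers the ordinary Euclidean distance, while positive off-diagonal entries make documents that use related terms closer. The resulting distance matrix can then be handed to any distance-based embedding method (for example, multidimensional scaling) to produce the 2-D or 3-D layout. The vocabulary and similarity values below are hypothetical.

```python
import math


def geometry_distance(x, y, H):
    """Distance between term vectors x and y under the metric defined by a
    symmetric positive semi-definite term-similarity matrix H:
    d_H(x, y) = sqrt((x - y)^T H (x - y))."""
    diff = [a - b for a, b in zip(x, y)]
    n = len(diff)
    quad = sum(diff[i] * H[i][j] * diff[j] for i in range(n) for j in range(n))
    # Clamp tiny negative values caused by floating-point round-off.
    return math.sqrt(max(quad, 0.0))


def distance_matrix(docs, H):
    """Pairwise distances, ready for a distance-based embedding such as MDS."""
    return [[geometry_distance(a, b, H) for b in docs] for a in docs]


if __name__ == "__main__":
    # Hypothetical vocabulary: ["car", "automobile", "banana"].
    docs = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
    identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
    # An "expert" geometry declaring car and automobile near-synonyms.
    synonyms = [[1, 0.9, 0], [0.9, 1, 0], [0, 0, 1]]
    for name, H in (("euclidean", identity), ("synonyms", synonyms)):
        print(name, [round(d, 3) for d in distance_matrix(docs, H)[0]])
```

Under H = I, the "car" and "automobile" documents are sqrt(2) apart, exactly as far as the "banana" document. Under the synonym geometry their distance drops to sqrt(1 + 1 - 2 * 0.9) = sqrt(0.2), about 0.447, while "banana" stays at sqrt(2).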
Hashing for Similarity Search: A Survey
Similarity search (nearest neighbor search) is the problem of finding the data
items in a large database whose distances to a query item are the smallest.
Various methods have been developed to address this problem, and recently much
effort has been devoted to approximate search. In this paper, we present a
survey of one of the main solutions, hashing, which has been widely studied
since the pioneering work on locality sensitive hashing. We divide hashing
algorithms into two main categories: locality sensitive hashing, which designs
hash functions without exploring the data distribution, and learning to hash,
which learns hash functions according to the data distribution. We review them
from various aspects, including hash function design, distance measures, and
search schemes in the hash coding space.
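As an illustration of the first category, the following sketch uses random-hyperplane hashing, a locality sensitive hashing family for cosine similarity. This family was chosen here as an example and is not a method taken from the survey. The hash functions are drawn without looking at the data, and candidates are then ranked by Hamming distance between codes, one simple search scheme in the hash coding space. The dimension, bit count, and seed are arbitrary.

```python
import random


def make_hyperplanes(dim, n_bits, seed=0):
    """Random Gaussian hyperplanes, drawn without looking at any data."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def hash_vector(x, planes):
    """n_bits-bit code: bit k is 1 if x lies on the non-negative side of plane k.
    Vectors separated by a small angle tend to share most bits."""
    code = 0
    for p in planes:
        code <<= 1
        if sum(a * b for a, b in zip(x, p)) >= 0:
            code |= 1
    return code


def hamming(a, b):
    return bin(a ^ b).count("1")


def hamming_rank(query_vec, items, planes):
    """Rank item ids by Hamming distance between hash codes; a real system
    would precompute and index the item codes instead of hashing per query."""
    q = hash_vector(query_vec, planes)
    codes = {name: hash_vector(vec, planes) for name, vec in items.items()}
    return sorted(codes, key=lambda name: hamming(q, codes[name]))


if __name__ == "__main__":
    planes = make_hyperplanes(dim=3, n_bits=16)
    items = {"a": [1.0, 0.0, 0.0], "b": [0.9, 0.1, 0.0], "c": [-1.0, 0.0, 0.0]}
    print(hamming_rank([1.0, 0.0, 0.0], items, planes))
```

For this family, the probability that one bit differs between two vectors equals the angle between them divided by pi. Near-duplicates therefore collide on most bits, while a vector and its negation (almost surely) differ on every bit. A learning-to-hash method would instead fit the projections to the data distribution.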
Exploring Public Sentiment: A Sentiment Analysis of GST Discourse on Twitter using Supervised Machine Learning Classifiers
A key economic move that resulted in heated disputes was India's introduction of the Goods and Services Tax (GST). Social media channels offered a widely used forum for people to express their views on the GST, providing insightful data for gauging public mood and guiding future revisions. A total of 5,629 GST-related tweets were collected using the Twitter Developer API, and their sentiment was assessed using the VADER lexicon. TF-IDF features were used for text vectorization, with 80% of the data used for training and the remaining 20% for testing. In this study, six well-known classifiers (Ridge Classifier, Logistic Regression, Linear SVC, Perceptron, Decision Tree, and K-Nearest Neighbor) were thoroughly compared to evaluate their performance in a range of circumstances. The performance measures included accuracy, precision, recall, F-score, and training and testing times. The study presented novel preprocessing methods and examined training/testing times before concluding that the Ridge Classifier outperformed the others in terms of accuracy, precision, and efficiency.
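The labeling, vectorization, and splitting steps described above could be sketched without third-party dependencies as follows. In practice one would more likely use the `vaderSentiment` package's `SentimentIntensityAnalyzer.polarity_scores` to obtain the compound score, and scikit-learn's `TfidfVectorizer`, `train_test_split(test_size=0.2)`, and `RidgeClassifier` for the rest. The +/-0.05 compound thresholds, whitespace tokenization, seed, and function names here are assumptions, not details reported by the study.

```python
import math
import random
from collections import Counter


def label_from_compound(compound):
    """Map a VADER compound score to a label using the widely used +/-0.05
    cut-offs (assumed here; the study does not state its thresholds)."""
    if compound >= 0.05:
        return "positive"
    if compound <= -0.05:
        return "negative"
    return "neutral"


def split_80_20(rows, seed=42):
    """Shuffle, then keep 80% for training and 20% for testing."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    cut = int(0.8 * len(rows))
    return rows[:cut], rows[cut:]


def fit_idf(train_docs):
    """Smoothed IDF, idf(t) = ln((1 + n) / (1 + df(t))) + 1, fit on the
    training split only so no test information leaks into the features."""
    n = len(train_docs)
    df = Counter(t for doc in train_docs for t in set(doc.lower().split()))
    return {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}


def tfidf_vector(doc, idf):
    """Sparse, L2-normalized TF-IDF vector; terms unseen in training are dropped."""
    tf = Counter(doc.lower().split())
    vec = {t: c * idf[t] for t, c in tf.items() if t in idf}
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {t: v / norm for t, v in vec.items()}
```

The smoothed IDF formula and the L2 normalization match scikit-learn's `TfidfVectorizer` defaults, so swapping in the library later should not change the weighting scheme, although its default tokenizer differs from the whitespace split used here.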
Multiple-Domain Sentiment Classification for Cantonese Using a Combined Approach
In this study, we proposed a combined approach that amalgamates machine learning and lexicon-based approaches for multiple-domain sentiment classification to support Cantonese-based social media analysis. Our study contributes to the existing literature not only by investigating the effectiveness of the proposed combined approach for supporting social media analysis in the Cantonese context, but also by verifying that the proposed method outperforms the baseline approaches commonly used in the literature. We demonstrated that social media network-based classifiers can serve as general classifiers that support multiple-domain sentiment classification.
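The abstract does not say how the machine learning and lexicon-based components are combined. One simple possibility, sketched here purely as an assumption, is a weighted average of a lexicon polarity score and a classifier's probability. The tiny Cantonese lexicon, the character-level multinomial naive Bayes model, and the weight `alpha` are all hypothetical stand-ins.

```python
import math
from collections import Counter

# Hypothetical mini lexicon of Cantonese sentiment characters
# (正 "great", 好 "good", 差 "poor", 衰 "lousy").
LEXICON = {"正": 1.0, "好": 0.5, "差": -1.0, "衰": -1.0}


def lexicon_score(text):
    """Mean polarity of the lexicon characters in text, in [-1, 1]; 0 if none."""
    hits = [LEXICON[ch] for ch in text if ch in LEXICON]
    return sum(hits) / len(hits) if hits else 0.0


class CharNaiveBayes:
    """Multinomial naive Bayes over single characters with add-one smoothing.
    Labels are 1 (positive) and 0 (negative); both must occur in training."""

    def fit(self, texts, labels):
        self.counts = {0: Counter(), 1: Counter()}
        self.n_docs = Counter(labels)
        for text, y in zip(texts, labels):
            self.counts[y].update(text)  # counts each character
        self.vocab = set(self.counts[0]) | set(self.counts[1])
        return self

    def prob_positive(self, text):
        total_docs = sum(self.n_docs.values())
        log_joint = {}
        for y in (0, 1):
            n_chars = sum(self.counts[y].values())
            lp = math.log(self.n_docs[y] / total_docs)
            for ch in text:
                if ch in self.vocab:  # ignore characters never seen in training
                    lp += math.log((self.counts[y][ch] + 1) / (n_chars + len(self.vocab)))
            log_joint[y] = lp
        # Normalize the two log-joint scores into P(positive | text).
        m = max(log_joint.values())
        e0 = math.exp(log_joint[0] - m)
        e1 = math.exp(log_joint[1] - m)
        return e1 / (e0 + e1)


def combined_score(text, model, alpha=0.5):
    """Weighted blend of the lexicon score (rescaled to [0, 1]) and the
    model's probability; alpha is a hypothetical tuning weight."""
    lex = (lexicon_score(text) + 1.0) / 2.0
    return alpha * lex + (1.0 - alpha) * model.prob_positive(text)


def classify(text, model, alpha=0.5):
    return "positive" if combined_score(text, model, alpha) >= 0.5 else "negative"
```

Character-level features avoid the need for a Cantonese word segmenter. The lexicon term lets domain-independent sentiment words contribute even when the training data comes from a different domain, which is one plausible reason to combine the two sources.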