5,762 research outputs found
Highly Relevant Routing Recommendation Systems for Handling Few Data Using MDL Principle and Embedded Relevance Boosting Factors
A route recommendation system can provide better recommendation if it also
takes collected user reviews into account, e.g. places that generally get
positive reviews may be preferred. However, to classify sentiment, many
classification algorithms existing today suffer in handling small data items
such as short written reviews. In this paper we propose a model for a strongly
relevant route recommendation system that is based on an MDL-based (Minimum
Description Length) sentiment classification and show that such a system is
capable of handling small data items (short user reviews). Another highlight of
the model is the inclusion of a set of boosting factors in the relevance
calculation to improve the relevance in any recommendation system that
implements the model.Comment: ACM SIGIR 2018 Workshop on Learning from Limited or Noisy Data for
Information Retrieval (LND4IR'18), July 12, 2018, Ann Arbor, Michigan, USA, 8
pages, 9 figure
Recommended from our members
Beyond TREC's filtering track
Following the withdrawal of the filtering track from the latest TREC conferences, there is a niche for new evaluation standards. Towards this end, we suggest, based on variations of TREC's routing subtask, two new evaluation methodologies. The first can be used for evaluating single, multi-topic profiles and the second for testing the ability of a multi-topic profile to adapt to both modest variations and radical drifts in user interests
Machine Learning in Automated Text Categorization
The automated categorization (or classification) of texts into predefined
categories has witnessed a booming interest in the last ten years, due to the
increased availability of documents in digital form and the ensuing need to
organize them. In the research community the dominant approach to this problem
is based on machine learning techniques: a general inductive process
automatically builds a classifier by learning, from a set of preclassified
documents, the characteristics of the categories. The advantages of this
approach over the knowledge engineering approach (consisting in the manual
definition of a classifier by domain experts) are a very good effectiveness,
considerable savings in terms of expert manpower, and straightforward
portability to different domains. This survey discusses the main approaches to
text categorization that fall within the machine learning paradigm. We will
discuss in detail issues pertaining to three different problems, namely
document representation, classifier construction, and classifier evaluation.Comment: Accepted for publication on ACM Computing Survey
Deciding How to Decide: Dynamic Routing in Artificial Neural Networks
We propose and systematically evaluate three strategies for training
dynamically-routed artificial neural networks: graphs of learned
transformations through which different input signals may take different paths.
Though some approaches have advantages over others, the resulting networks are
often qualitatively similar. We find that, in dynamically-routed networks
trained to classify images, layers and branches become specialized to process
distinct categories of images. Additionally, given a fixed computational
budget, dynamically-routed networks tend to perform better than comparable
statically-routed networks.Comment: ICML 2017. Code at https://github.com/MasonMcGill/multipath-nn Video
abstract at https://youtu.be/NHQsDaycwy
Using online linear classifiers to filter spam Emails
The performance of two online linear classifiers - the Perceptron and Littlestoneās Winnow ā is explored for two anti-spam filtering benchmark corpora - PU1 and Ling-Spam. We study the performance for varying numbers of features, along with three different feature selection methods: Information Gain (IG), Document Frequency (DF) and Odds Ratio. The size of the training set and the number of training iterations are also investigated for both classifiers. The experimental results show that both the Perceptron and Winnow perform much better when using IG or DF than using Odds Ratio. It is further demonstrated that when using IG or DF, the classifiers are insensitive to the number of features and the number of training iterations, and not greatly sensitive to the size of training set. Winnow is shown to slightly outperform the Perceptron. It is also demonstrated that both of these online classifiers perform much better than a standard NaĆÆve Bayes method. The theoretical and implementation computational complexity of these two classifiers are very low, and they are very easily adaptively updated. They outperform most of the published results, while being significantly easier to train and adapt. The analysis and promising experimental results indicate that the Perceptron and Winnow are two very competitive classifiers for anti-spam filtering
- ā¦