10 research outputs found

    Regression and Learning to Rank Aggregation for User Engagement Evaluation

    User engagement refers to the amount of interaction an instance (e.g., a tweet, news item, or forum post) receives. Ranking the items on social media websites by the amount of user participation they attract can be used in different applications, such as recommender systems. In this paper, we consider a tweet containing a rating for a movie as an instance and focus on ranking the instances of each user based on their engagement, i.e., the total number of retweets and favorites they will gain. For this task, we define several features that can be extracted from the metadata of each tweet. The features are partitioned into three categories: user-based, movie-based, and tweet-based. We show that, in order to obtain good results, features from all categories should be considered. We exploit regression and learning to rank methods to rank the tweets and propose to aggregate the results of the regression and learning to rank methods to achieve better performance. We have run our experiments on an extended version of the MovieTweeting dataset provided by the ACM RecSys Challenge 2014. The results show that the learning to rank approach outperforms most of the regression models and that the combination improves performance significantly. Comment: In Proceedings of the 2014 ACM Recommender Systems Challenge, RecSysChallenge '14.
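The abstract does not spell out the aggregation scheme, so the snippet below is only a minimal sketch of the general idea: the outputs of a hypothetical regression model and a hypothetical learning-to-rank model are combined by averaging each tweet's rank under the two methods (a simple Borda-style aggregation). All item names and scores are invented for illustration.

```python
# Illustrative sketch (not the paper's exact method): combine a regression
# score and a learning-to-rank score for each tweet by averaging their
# per-user ranks (a simple Borda-style aggregation).

from typing import Dict, List

def rank_positions(scores: Dict[str, float]) -> Dict[str, int]:
    """Map each item id to its rank (0 = best) under the given scores."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {item: pos for pos, item in enumerate(ordered)}

def aggregate_rankings(regression_scores: Dict[str, float],
                       ltr_scores: Dict[str, float]) -> List[str]:
    """Rank items by the average of their ranks under the two methods."""
    reg_rank = rank_positions(regression_scores)
    ltr_rank = rank_positions(ltr_scores)
    combined = {item: (reg_rank[item] + ltr_rank[item]) / 2.0
                for item in regression_scores}
    return sorted(combined, key=combined.get)  # lower combined rank = better

# Toy example: predicted engagement for three tweets of one user.
regression_scores = {"tweet_a": 12.0, "tweet_b": 3.5, "tweet_c": 7.1}
ltr_scores = {"tweet_a": 0.8, "tweet_b": 0.9, "tweet_c": 0.1}
print(aggregate_rankings(regression_scores, ltr_scores))
```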

    Review Paper on Answers Selection and Recommendation in Community Question Answers System

    Nowadays, question answering systems make it convenient for users to ask a question online and receive an answer. However, since browsing is a primary need for every individual, the number of users asking questions grows, computation and waiting times increase, and the same types of questions are asked by different users, so the system has to provide the same answers repeatedly. To avoid this, we propose the PLANE technique, which quantitatively ranks answer candidates drawn from a pool of relevant previously answered questions. When a user asks a question, the system produces a ranked list of candidate answers and recommends the highest-ranked answer to the user. We also propose an expert recommendation system, in which an expert provides an answer to the question asked by the user, and we implement a sentence-level clustering technique so that, when a single question has multiple answers, the system returns the most suitable answer to the question asked by the user.
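The review does not describe PLANE's internals, so the following is only a rough, hypothetical illustration of the surrounding workflow: candidate answers gathered from a pool of related, previously answered questions are ranked by a simple overlap score with the new question, and the top-ranked answer is recommended. This is not the PLANE algorithm itself; all data are made up.

```python
# Rough illustration only: rank candidate answers from previously answered,
# related questions by simple token overlap with the new question, then
# recommend the top-ranked answer. This is not the PLANE algorithm itself.

def tokens(text: str) -> set:
    return set(text.lower().split())

def rank_candidates(question: str, candidate_answers: list) -> list:
    """Return candidate answers sorted by token overlap with the question."""
    q_tokens = tokens(question)
    scored = [(len(q_tokens & tokens(ans)) / (len(q_tokens) or 1), ans)
              for ans in candidate_answers]
    return [ans for score, ans in sorted(scored, reverse=True)]

# Hypothetical pool of answers gathered from similar questions.
pool = [
    "Restart the router and check the cable connection.",
    "Update the network driver from the device manager.",
    "The warranty covers accidental damage for one year.",
]
ranking = rank_candidates("How do I fix my router connection?", pool)
print("Recommended answer:", ranking[0])
```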

    Answer Extraction with Multiple Extraction Engines for Web-Based Question Answering

    Answer extraction for Web-based Question Answering aims to extract answers from the snippets retrieved by search engines. Search results contain a great deal of noisy and incomplete text, so the task is more challenging than traditional answer extraction over an offline corpus. In this paper we discuss the important role of employing multiple extraction engines for Web-based Question Answering. Aggregating multiple engines can ease the negative effect of noisy search results on any single method. We adopt a Pruned Rank Aggregation method which performs pruning while aggregating the candidate lists provided by multiple engines. It fully leverages redundancies within and across the lists to reduce noise in the candidate list without hurting answer recall. In addition, we rank the aggregated list with a learning to rank framework using similarity, redundancy, quality, and search features. Experimental results on TREC data show that our method is effective at reducing noise in the candidate list and greatly helps to improve answer ranking results. Our method outperforms a state-of-the-art answer extraction method and copes well with the noisy search snippets encountered in Web-based QA.
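The abstract does not give the exact pruning criteria or features, so the sketch below only illustrates the general shape of pruned rank aggregation under assumed rules: candidates proposed by too few engines are pruned, and the survivors are ordered by a summed reciprocal-rank score across the engines' lists. The support threshold and the example data are invented.

```python
# Schematic sketch of aggregating candidate answer lists from multiple
# extraction engines: prune candidates supported by too few engines,
# then order the survivors by summed reciprocal rank across the lists.

from collections import defaultdict

def pruned_rank_aggregation(candidate_lists, min_support=2):
    support = defaultdict(int)    # how many engines proposed the candidate
    score = defaultdict(float)    # summed reciprocal rank across engines
    for ranked_list in candidate_lists:
        for rank, candidate in enumerate(ranked_list, start=1):
            support[candidate] += 1
            score[candidate] += 1.0 / rank
    kept = [c for c in score if support[c] >= min_support]
    return sorted(kept, key=lambda c: score[c], reverse=True)

# Hypothetical output of three extraction engines for one question.
engine_outputs = [
    ["Paris", "Lyon", "France"],
    ["Paris", "France", "Marseille"],
    ["Lyon", "Paris"],
]
print(pruned_rank_aggregation(engine_outputs))  # ['Paris', 'Lyon', 'France']
```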

    Towards Supporting Visual Question and Answering Applications

    Visual Question Answering (VQA) is a new research area involving technologies ranging from computer vision and natural language processing to other sub-fields of artificial intelligence such as knowledge representation. The fundamental task is to take as input one image and one question (in text) related to the given image, and to generate a textual answer to the input question. There are two key research problems in VQA: image understanding and question answering. My research mainly focuses on developing solutions to these two problems. In image understanding, one important research area is semantic segmentation, which takes images as input and outputs the label of each pixel. Since much manual work is needed to label a useful training set, typical training sets for such supervised approaches are small. There are also approaches with a relaxed labeling requirement, called weakly supervised semantic segmentation, where only image-level labels are needed. With the development of social media, more and more user-uploaded images are available online. Such user-generated content often comes with labels like tags and may be coarsely labelled by various tools. To use this information for computer vision tasks, I propose a new graphical model that considers the neighborhood information and their interactions to obtain pixel-level labels for images with only incomplete image-level labels. The method was evaluated on both synthetic and real images. In question answering, my research centers on best answer prediction, which addresses two main research topics: feature design and model construction. In feature design, most existing work discusses how to design effective features for answer quality / best answer prediction, but little work considers how to design features based on the relationships between the answers of a given question. To fill this research gap, I designed new features to help improve prediction performance. In modeling, to exploit the structure of the feature space, I proposed an innovative learning-to-rank model based on the hierarchical lasso. Experiments comparing against the state of the art in the best answer prediction literature confirm that the proposed methods are effective and suitable for the research task. Doctoral Dissertation, Computer Science.
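The dissertation's concrete features are not listed in the abstract, so the snippet below is only a hedged sketch of the stated idea of designing features that consider the relationships between the answers of one question: each answer gets features computed relative to its sibling answers, such as relative length and mean lexical overlap with the rest of the group. Feature names and data are hypothetical.

```python
# Illustrative sketch (not the dissertation's exact features): for each answer
# to a question, compute features relative to the other answers of the same
# question, e.g. length relative to the group mean and mean Jaccard overlap
# with the sibling answers.

def relational_features(answers):
    token_sets = [set(a.lower().split()) for a in answers]
    mean_len = sum(len(t) for t in token_sets) / len(token_sets)
    features = []
    for i, toks in enumerate(token_sets):
        others = [t for j, t in enumerate(token_sets) if j != i]
        overlap = (sum(len(toks & o) / (len(toks | o) or 1) for o in others)
                   / (len(others) or 1))  # mean Jaccard with sibling answers
        features.append({
            "relative_length": len(toks) / (mean_len or 1.0),
            "mean_sibling_overlap": overlap,
        })
    return features

answers = [
    "Use a binary search over the sorted list.",
    "Binary search on the sorted list is the fastest option.",
    "I had the same problem last week.",
]
for a, f in zip(answers, relational_features(answers)):
    print(f, "<-", a)
```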

    Modelling input texts: from Tree Kernels to Deep Learning

    One of the core questions when designing modern Natural Language Processing (NLP) systems is how to model input textual data such that the learning algorithm is provided with enough information to estimate accurate decision functions. The mainstream approach is to represent input objects as feature vectors where each value encodes some of their aspects, e.g., syntax, semantics, etc. Feature-based methods have demonstrated state-of-the-art results on various NLP tasks. However, designing good features is a highly empirical process: it greatly depends on the task and requires a significant amount of domain expertise. Moreover, extracting features for complex NLP tasks often requires expensive pre-processing steps running a large number of linguistic tools, and relies on external knowledge sources that are often unavailable or hard to obtain. Hence, this process is not cheap and often constitutes one of the major challenges when attempting a new task or adapting to a different language or domain. The problem of modelling input objects is even more acute when the input examples are not single objects but pairs of objects, as in various learning to rank problems in Information Retrieval and Natural Language Processing. An alternative to feature-based methods is to use kernels, which are essentially non-linear functions mapping input examples into a high-dimensional space, thus allowing decision functions with higher discriminative power to be learned. Kernels implicitly generate a very large number of features by computing the similarity between input examples in that implicit space. A well-designed kernel function can greatly reduce the effort of designing a large set of manual features, often leading to superior results. However, in recent years the use of kernel methods in NLP has been greatly underestimated, primarily for the following reasons: (i) learning with kernels is slow, since it requires carrying out optimization in the dual space, leading to quadratic complexity; (ii) applying kernels to input objects encoded with vanilla structures, e.g., those generated by syntactic parsers, often yields only minor improvements over carefully designed feature-based methods. In this thesis, we adopt the kernel learning approach for solving complex NLP tasks and primarily focus on solutions to the aforementioned problems posed by the use of kernels. In particular, we design novel learning algorithms for training Support Vector Machines with structural kernels, e.g., tree kernels, considerably speeding up training compared with conventional SVM training methods. We show that the training algorithms developed in this thesis allow tree kernel models to be trained on large-scale datasets containing millions of instances, which was not possible before. Next, we focus on the problem of designing the input structures that are fed to tree kernel functions to automatically generate a large set of tree-fragment features. We demonstrate that the previously used plain structures generated by syntactic parsers, e.g., syntactic or dependency trees, are often a poor choice, compromising the expressivity offered by a tree kernel learning framework. We propose several effective design patterns for the input tree structures for various NLP tasks ranging from sentiment analysis to answer passage reranking. The central idea is to inject additional semantic information relevant for the task directly into the tree nodes and let the expressive kernels generate rich feature spaces.
    For opinion mining tasks, the additional semantic information injected into tree nodes can be word polarity labels, while for the more complex task of modelling text pairs, relational information about overlapping words in a pair significantly improves the accuracy of the resulting models. Finally, we observe that both feature-based and kernel methods typically treat words as atomic units, so that matching different yet semantically similar words is problematic. Conversely, the distributional idea of modelling words as vectors is much more effective at establishing a semantic match between words and phrases. While tree kernel functions do allow for more flexible matching between phrases and sentences by matching their syntactic contexts, their representation cannot be tuned on the training set, as is possible with distributional approaches. Recently, deep learning approaches have generalized the distributional word matching problem to matching sentences, taking it one step further by learning optimal sentence representations for a given task. Deep neural networks have already claimed state-of-the-art performance in many computer vision, speech recognition, and natural language tasks. Following this trend, this thesis also explores the virtue of deep learning architectures for modelling input texts and text pairs, where we build on some of the ideas for modelling input objects proposed within the tree kernel learning framework. In particular, we explore the idea of relational linking (proposed in the preceding chapters to encode text pairs using linguistic tree structures) to design a state-of-the-art deep learning architecture for modelling text pairs. The proposed deep learning models require even less manual intervention in the feature design process than the previously described tree kernel methods, which already offer a very good trade-off between feature-engineering effort and the expressivity of the resulting representation. Our deep learning models demonstrate state-of-the-art performance on recent benchmarks for Twitter Sentiment Analysis, Answer Sentence Selection and Microblog retrieval.
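As a concrete but simplified illustration of the relational linking idea described above (reduced to flat token sequences rather than parse-tree nodes, and not the thesis's exact encoding), the sketch below marks words shared between the two texts of a pair with a REL tag before they are handed to whatever model consumes the pair.

```python
# Minimal sketch of the relational linking idea: words that occur in both
# texts of a pair are marked with a REL tag, so the downstream model (tree
# kernel or neural network) can exploit the overlap directly. Real systems
# would mark parse-tree nodes rather than flat tokens.

def relational_link(text_a: str, text_b: str):
    tokens_a = text_a.lower().split()
    tokens_b = text_b.lower().split()
    shared = set(tokens_a) & set(tokens_b)
    tag = lambda toks: [f"REL::{t}" if t in shared else t for t in toks]
    return tag(tokens_a), tag(tokens_b)

question = "what year was the telephone invented"
answer = "the telephone was invented in 1876"
print(relational_link(question, answer))
```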

    Provision of better VLE learner support with a Question Answering System

    The focus of this research is the provision of user support to students who use electronic means of communication to aid their learning. The digital age has brought students anytime, anywhere access to learning resources. Most academic institutions, and also companies, use Virtual Learning Environments (VLEs) to provide their learners with learning material. All learners using the VLE have access to the same material and help, regardless of their existing knowledge and interests. This work uses the information in the learning materials of Virtual Learning Environments to answer questions and provide student help through a Question Answering System. The aim of this investigation is to establish whether a suitable combination of Question Answering, Information Retrieval and Automatic Summarisation techniques within a VLE helps and supports the student better than existing systems (full-text search engines).
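The abstract only names the ingredients of the proposed system, so the snippet below is a minimal sketch of just the retrieval step under assumed inputs: passages from hypothetical course material are ranked by TF-IDF cosine similarity with a student's question and the best match is returned. A full system would add answer extraction and summarisation on top; scikit-learn is assumed to be available.

```python
# Minimal sketch of the retrieval step only: rank passages from course
# material by TF-IDF cosine similarity with the student's question and
# return the best one. Requires scikit-learn.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def best_passage(question: str, passages: list) -> str:
    docs = passages + [question]
    matrix = TfidfVectorizer().fit_transform(docs)
    sims = cosine_similarity(matrix[len(passages)], matrix[:len(passages)])
    return passages[sims.ravel().argmax()]

# Hypothetical course material snippets.
course_material = [
    "A stack is a last-in, first-out data structure.",
    "A queue processes elements in first-in, first-out order.",
    "Recursion is a function calling itself with a smaller input.",
]
print(best_passage("What is a stack data structure?", course_material))
```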

    Étude algorithmique et combinatoire de la méthode de Kemeny-Young et du consensus de classements

    A permutation is a list that orders objects or candidates according to a preference or a criterion. Examples include the results returned by a web search engine, athlete rankings, lists of genes related to a disease produced by prediction methods, or simply preferences over activities for the coming weekend. One may wish to aggregate several permutations to obtain a consensus permutation. This problem is well known in political science, and many methods exist for aggregating permutations, each with its own mathematical properties. Among these methods, the Kemeny-Young method, also known as the median of permutations, finds a consensus that minimises the sum of the distances between that consensus and the set of input permutations. This method has many desirable properties; on the other hand, it is difficult to compute, which has opened the way for much research. A generalization of the problem considers rankings that may contain ties between the ranked objects and that may be incomplete, covering only a subset of the objects. In this thesis, we study the Kemeny-Young method from several angles. First, a search space reduction technique is proposed that improves the running time of exact approaches to the problem. Second, a well-parameterized heuristic is developed and used to guide an exact branch-and-bound algorithm; this algorithm also uses a new search space reduction. Third, the special case of the problem on three permutations is investigated: a graph-based search space reduction is proposed for this case, followed by a very tight lower bound, and two conjectures are stated that link this case to the 3-Hitting Set problem. Finally, a generalization of the problem is proposed that extends our search space reduction work to the rank aggregation problem.
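To make the objective concrete, here is a small, self-contained sketch (not the thesis's algorithms): the Kemeny score of a candidate ranking is the sum of its Kendall tau distances to the input permutations, and a median is any ranking minimising that score. The brute-force search shown here is only feasible for a handful of items, which is precisely why the search space reductions studied in the thesis matter.

```python
# Small-scale sketch of the Kemeny-Young method: score every candidate
# ranking by its summed Kendall tau distance to the input permutations and
# return a minimiser. Exhaustive search only works for very few items.

from itertools import combinations, permutations

def kendall_tau(r1, r2):
    """Number of item pairs ordered differently by the two rankings."""
    pos1 = {item: i for i, item in enumerate(r1)}
    pos2 = {item: i for i, item in enumerate(r2)}
    return sum(1 for a, b in combinations(r1, 2)
               if (pos1[a] < pos1[b]) != (pos2[a] < pos2[b]))

def kemeny_median(rankings):
    """Brute-force median permutation over the items of the first ranking."""
    items = rankings[0]
    return min(permutations(items),
               key=lambda cand: sum(kendall_tau(cand, r) for r in rankings))

votes = [("a", "b", "c", "d"),
         ("a", "c", "b", "d"),
         ("b", "a", "c", "d")]
print(kemeny_median(votes))  # ('a', 'b', 'c', 'd') for these votes
```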